# Representational Similarity Analysis

Eggs and marshmallows are not kept next to each other in the grocery store despite being a similar color and size. The eggs are usually placed with dairy items and marshmallows with baking supplies. This helps us shop easily, as we usually group our shopping by categories: fruits and vegetables, meat and dairy, frozen foods, and, somewhere far away, kitchen supplies, toys and sports. How does the brain represent meaningful conceptual groupings? Are patterns of neural activity for marshmallows and chocolate chips more similar to each other than the patterns for eggs and marshmallows?

The brain could conceivably group items based on attributes such as color and size. This would make the neural representations of marshmallows and eggs very similar to each other. In a brain region that cares about color, the neural similarity would be greater for white eggs and marshmallows, compared to white eggs and brown eggs. How can we determine the similarity between neural representations and which attributes are driving this similarity?

Representational similarity analysis (RSA) is a way to compare and contrast different brain states and the stimuli that elicited them. In RSA, we compute a similarity measure (often a correlation) between patterns of neural activity for all items being compared. Then, to examine whether neural patterns in a brain region are grouped by color, size, or category, we can order the similarity measure based on a model that groups by these attributes.
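To make this concrete, here is a minimal sketch using made-up numbers rather than real brain data. The item names, the random `patterns`, and the `color_model` matrix are all hypothetical and only illustrate the two steps described above: compute a similarity matrix across items, then compare it to a model of how the items might be grouped.

```python
import numpy as np

# Hypothetical activity patterns (items x voxels); random values for illustration only
items = ['white egg', 'brown egg', 'marshmallow', 'chocolate chip']
rng = np.random.default_rng(0)
patterns = rng.standard_normal((len(items), 50))

# Similarity between every pair of items: an items x items correlation matrix
rsm = np.corrcoef(patterns)

# Hypothetical model: 1 where two items share a color, 0 otherwise
color = ['white', 'brown', 'white', 'brown']
color_model = np.array([[float(a == b) for b in color] for a in color])

# Compare the neural and model matrices using only the cells above the diagonal
upper = np.triu_indices(len(items), k=1)
model_fit = np.corrcoef(rsm[upper], color_model[upper])[0, 1]
print('Correlation between neural similarity and color model: %0.2f' % model_fit)
```

Because the patterns here are random, the fit to the color model should be near zero; in a region that groups items by color, we would expect it to be reliably positive.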

RSA is a highly versatile tool: it can be used to compare brain activity to models, compare data across brain imaging techniques, and even to make cross-species comparisons. You can learn more about the RSA method [here](https://doi.org/10.3389/neuro.06.004.2008) and [here](https://doi.org/10.1016/j.tics.2013.06.007).

In today's notebook we are going to use RSA on movie data in two distinct ways. The first way is to compare responses to exemplars to assess categorical processing. This approach is not common in movie data, and so we will have to take steps like we did in the classification notebook in order to set up the data appropriately. The second approach is growing in popularity with movie data and uses it as a whole to compare between participants and a computer vision model.

We will be using the Sherlock dataset today. Review details about it from week 2 if you need a reminder.
    
## Goal of this script

>1. Learn how to perform RSA on a dataset
>2. Visualize similarity with multi-dimensional scaling (MDS)
>3. Learn how to create a timepoint by timepoint similarity matrix and consider the weaknesses of that approach

## Table of Contents  
[1. Prepare for RSA](#preprocessing)

[2. Run the RSA](#rsa)  

[3. Multi-dimensional scaling (MDS)](#mds)   

[4. Time-point by time-point similarity analysis](#tsm)

Exercises
>[Exercise 1](#ex1)   [2](#ex2)  [3](#ex3)  [4](#ex4)  [5](#ex5)  [6](#ex6)  [7](#ex7)  [8](#ex8)  [9](#ex9) [10](#ex10)

[Novel contribution](#novel)  

In [None]:
# Import utils
import sys
sys.path.insert(0, '..')
from utils import * 

%autosave 5

# Preset some information about the experiment and analysis. These won't necessarily make sense now, but we will refer to them later
TR_duration = 1.5 # How many seconds is each TR
lag_shift = 6 # How many seconds is the shift you will apply to the labels to account for hemodynamic lag?
first_segment_duration = 946 # How long was the first movie segment
ppt_num = 17 # How many participants are there?

## 1. Prepare for RSA <a id="preprocessing"></a> 

In the classification notebook, we converted information about each time point in the video into labels that we could use for classification. In this notebook we are going to do something similar, except that instead of making binarized labels, we will make several labels that we will test as categories. Like before, the first step is to load in our event information file. 

In [None]:
# Load in the CSV file
#pd.set_option('display.max_rows', 1000) # If you want to print out the entire data frame, uncomment this display
df = pd.read_csv('%s/derivatives/event_file.csv' % sherlock_dir)
df

For this exercise we are going to use the column `Name_Focus`, which reports the names of the characters that are in focus in the frame.

We are going to focus on eight characters: 'The Man', 'Lestrade', 'Anderson', 'Mike', 'Donovan', 'Anthea', 'Molly', 'Mrs. Hudson'. We are going to first capture and then compare the neural evoked response to each character. 

For our main test, we are going to see whether the neural evoked response to one male character is more similar to that of another male character than it is to the female characters, and vice versa. The first four characters in the list are male and the last four are female. 


In [None]:
# Category labels
categories = ['The Man', 'Lestrade', 'Anderson', 'Mike', 'Donovan', 'Anthea', 'Molly', 'Mrs. Hudson'] 

# Specify the corresponding gender label of each character
gender_label = ['M', 'M', 'M', 'M', 'F', 'F', 'F', 'F']

For each character, we are going to make a new column in our dataframe to say whether or not they are on screen, using a binary label (akin to whether they are speaking, like we tested in the classification notebooks). 

In [None]:
# Preset arrays for each of the categories
for key in categories:
    df['is_', key] = np.zeros((len(df),))

# Loop through all the segments
for segment_counter, name in enumerate(df['Name_Focus']):
    
    if name != name: # This means it was a nan
        # When no one is in view, then skip
        continue
    else:
        
        # Cycle through the character names
        for key in categories:
            
            # Is this character named in this segment
            if name.find(key) >= 0:
                
                # Is this segment selected
                df['is_', key][segment_counter] = 1

If you look at the `Name_Focus` column you will notice that there can be multiple names per segment, which makes sense: more than one character can be in the camera frame.

**Exercise 1:**<a id="ex1"></a> How many segments have more than one of the characters we specified in `categories` on the screen at once?

In [None]:
# Insert code here

We aren't going to subsample our time points like we did for the classification analysis, nor are we going to make sure that the conditions are separated in time. You could choose to do these things, but there is no reason you *must* do them, like there is for classification. This is because picking bad time points for this RSA is likely to hurt your results rather than artificially inflate them.

That said, there is a key concern we need to have based on our ultimate goal of this analysis: do the male and female characters appear on screen at different times? You could imagine a movie where two male characters are always on screen together. In this analysis, their neural evoked response would be the same since we would be comparing the brain activity for the same time points when comparing the characters. This similarity would be unrelated to their real similarity but would merely result from an artifact of the analysis. Hence, we need to check to what extent there is overlap when characters are on screen, particularly when they are from the same gender category.

To help us understand when these characters were on the screen, we can make a plot.

In [None]:
plt.figure()

# Loop through the categories
for list_counter, key in enumerate(categories):
    
    # What segments is the character on the screen for
    idxs = np.where(df['is_', key] == 1)[0]
    
    # Plot the data as a series of points
    if gender_label[list_counter] == 'M':
        plt.scatter(idxs, [list_counter + 1] * len(idxs), marker='.', color='r')
    else:
        plt.scatter(idxs, [list_counter + 1] * len(idxs), marker='.', color='g')
    
plt.yticks(range(len(categories) + 1), [''] + categories)
plt.ylabel('Character')
plt.xlabel('Segment of movie');

**Exercise 2:**<a id="ex2"></a> Report whether or not you think there is a problem for the contrast of male vs. female characters. Consider running analyses in which you count the overlapping segments between characters.

**A:**

In [None]:
# Insert code here

**Self-study:** If I were running this analysis for a real publication then I would only use time points where one character is on the screen. Consider re-running these analyses with that constraint. The easy way to do that would be to change the `if` statement when editing the notebook to make new columns. You will see that one of the characters has no segments where they are alone on screen.

One thing you might be wondering about is why Sherlock or John Watson aren't included in this list of characters, even though they are the main characters in the movie. Below I made the same figure as above but including Sherlock and John.

In [None]:
from IPython.display import Image
Image('Character_presence_all.png')

**Exercise 3:**<a id="ex3"></a> Considering this figure, why do you think we aren't including Sherlock or John?

**A:**

You will remember from the classification lecture that we had to convert the segments from the event file into TRs. We are going to do that below. Additionally, we are going to time shift our labels here to compensate for hemodynamic lag. The result will be a dictionary (`TR_condition`) containing, for each character, a binary label for every TR that specifies when the peak neural response to that character is expected.
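As a quick worked example of the conversion, the sketch below uses the same `TR_duration` and `lag_shift` values set at the top of the notebook, applied to a single hypothetical segment (the 30 s and 39 s times are made up for illustration).

```python
import numpy as np

TR_duration = 1.5  # Seconds per TR (same value as set at the top of the notebook)
lag_shift = 6      # Seconds of hemodynamic lag (same value as set at the top of the notebook)

# Hypothetical segment, for illustration only: a character is on screen from 30 s to 39 s
start_time, end_time = 30.0, 39.0

# Convert the lag from seconds into TRs: 6 s / 1.5 s = 4 TRs
TR_lag_shift = int(lag_shift / TR_duration)

# Convert the segment boundaries into TR indices and shift them by the lag
start_idx = int(np.round(start_time / TR_duration)) + TR_lag_shift
end_idx = int(np.round(end_time / TR_duration)) + TR_lag_shift

# Label a toy time course of 40 TRs
labels = np.zeros(40)
labels[start_idx:end_idx] = 1
print(start_idx, end_idx, int(labels.sum()))  # prints: 24 30 6
```

The segment starts at TR 20 in movie time, but the label is placed at TR 24 because the brain response is expected about four TRs later.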

In [None]:
# Build a TR-by-TR label for each character, shifted to account for hemodynamic lag
TR_lag_shift = int(lag_shift / TR_duration) # Convert the lag shift into TRs

TR_condition = {}
for key in categories:

    # Preset the variable where each TR is zero (i.e., the character is not on screen)
    TR_condition[key] = np.zeros((int(np.nanmax(df[' End Time (s) ']) / TR_duration, ))) # Make an array of zeros that is the length of the movie

    # Cycle through each condition
    for segment_counter, condition in enumerate(df['is_', key]):

        start_time = df['Start Time (s) '][segment_counter]
        end_time = df[' End Time (s) '][segment_counter]
        num_TR = np.round((end_time - start_time) / TR_duration)

        start_idx = int(np.round(start_time / TR_duration)) + TR_lag_shift
        end_idx = int(np.round(end_time / TR_duration)) + TR_lag_shift
        
        # Store the condition labels
        TR_condition[key][start_idx:end_idx] = condition

We want a function to load in the participant fMRI data, so we will use the same one from the classification notebook. The output is voxel by time data where the functional has been masked.

In [None]:
def prepare_sherlock(ppt, ROI_counter):
    # Load a Sherlock participant's data.
    # ppt is the participant ID e.g., sub-01
    # ROI_counter is the index of the Harvard-Oxford atlas you will use. To see all the labels, use `datasets.fetch_atlas_harvard_oxford('cort-maxprob-thr25-2mm', symmetric_split=False).labels`
    # These functions are pulled from week 2, with a tweak for loading a different mask

    # What is the file name for this participant?
    file = sherlock_dir + '/derivatives/movie_files/%s.nii.gz' % (ppt)

    # Create the nifti object that serves as a header
    func_nii = nib.load(file)

    # Get the dimensionality of the data
    func_dim = func_nii.shape

    # Get the functional voxel size
    func_voxel = func_nii.header.get_zooms()

    # Load the data volume
    print('Loading fMRI data for %s, this will take a minute' % ppt)
    func_vol = func_nii.get_fdata()
    print('Finished')

    # Specify the mask file
    mask_file = '%s/visfAtlas/nifti_volume/visfAtlas_MNI152_volume.nii.gz' % (atlas_path) 

    # Load the atlas
    atlas = datasets.fetch_atlas_harvard_oxford('cort-maxprob-thr25-2mm', symmetric_split=False)

    # Get the nifti corresponding to the atlas
    mask_nii = atlas.maps

    # Run the command to align the data. Also see how Python lets you put line breaks within arguments to make it more readable!
    mask_aligned_nii = processing.conform(mask_nii, 
                                          out_shape=[func_dim[0], func_dim[1], func_dim[2]], 
                                          voxel_size=(func_voxel[0], func_voxel[0], func_voxel[0]),
                                         )

    # Pull the volume data
    mask_aligned_vol = mask_aligned_nii.get_fdata()

    # Threshold the mask
    mask_aligned_vol = mask_aligned_vol == ROI_counter

    # Produce the voxel by time matrix of the data
    func_masked = func_vol[mask_aligned_vol == 1]
    
    # Return the masked functional data
    return func_masked


In [None]:
# Now mask the data from this participant
ppt = 'sub-01' # Specify the participant to load

ROI_counter = 23 # Corresponds to Lateral Occipital Cortex, inferior division (atlas.labels.index("Lateral Occipital Cortex, inferior division"))

# Generate the voxel by time masked data
func_masked = prepare_sherlock(ppt, ROI_counter)


So we have all our functional data and we have the timepoints corresponding to each condition (i.e., the times when each character is on the screen). Unlike for classification, where we wanted to store all of the observations that belong to each label, now we want to **average** across all of the time points to get a one-dimensional vector for each label category. This vector is thought of as the neural representation of the response to this category. In other words, this vector is the pattern of activity that is the key input into the RSA.

In [None]:
def select_timepoints_average(func_masked, TR_condition):
    # Average the time points that we have selected for each category
    # func_masked is the voxel by time matrix of masked data
    # TR_condition is a dictionary containing a binary time course for each category (1 when the category is on screen)
    # This returns a category by voxel matrix
    
    func_mat = np.zeros((len(TR_condition.keys()), func_masked.shape[0]))
    for key_counter, key in enumerate(TR_condition):

        # What time points are you pulling out for this chunk of data
        idxs = TR_condition[key]

        # Get the voxel by time matrix for the selected time points, then average across time to get one value per voxel
        voxel_data = np.mean(func_masked[:, idxs == 1], 1)

        # Store the data
        func_mat[key_counter, :] = voxel_data

    print('Output shape:', func_mat.shape)
    
    # Return the functional data
    return func_mat


In [None]:
# Generate the observation by feature matrix you will use for analyses
func_mat = select_timepoints_average(func_masked, TR_condition)

The data above is in the shape of categories by voxel. This is for one participant, representing the average neural response to each category.

## 2. Run the RSA <a id="rsa"></a> 

Now we can do the last step, which is to make an RSA matrix. This is very simple computationally, as you will see below. To understand what is going on: we take each neural representation (i.e., the average response to each label) and correlate it with all other representations.

In [None]:
# Compute the correlation between every pair of rows (conveniently, this function does this without us needing to make a for loop)
RSA = np.corrcoef(func_mat)

plt.imshow(RSA)
plt.yticks(range(len(categories)), categories)
plt.xticks(range(len(categories)), categories, rotation=45);

It can often help to remove the diagonal (which is always 1, since it is the correlation of a vector with itself) in order to visualize the variation more clearly. Below we turn the diagonal to nans so it shows up as white.

In [None]:
RSA[np.eye(len(categories)) == 1] = np.nan

plt.imshow(RSA)
plt.yticks(range(len(categories)), categories)
plt.xticks(range(len(categories)), categories, rotation=45);
plt.colorbar()

plt.title('RSA plot for characters in Sherlock for %s' % ppt);

Here we are using correlation to compare the patterns of activity. However, correlation is just one metric you could use to compare vectors. Correlation has flaws; for instance, it normalizes the mean and range of the values, which might be inappropriate for your analysis (<a href="https://pubmed.ncbi.nlm.nih.gov/23738883/">this paper</a> is very illustrative). Indeed, I (Cameron) spent weeks banging my head against an analytic wall trying to figure out why my results were changing so much after making what I thought was a trivial change in the metric used for RSA (Euclidean vs. correlation). To be clear, Euclidean isn't the only option either, and there are [interesting options](https://www.sciencedirect.com/science/article/pii/S105381192200413X) for weighting each voxel differently in order to compute the similarity between responses.
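The sketch below illustrates the normalization point with a hypothetical pair of patterns: `b` is the same pattern as `a`, only scaled and shifted. The values are random and the scaling factors are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(100)  # Hypothetical voxel pattern
b = 3 * a + 10                # The same pattern, but scaled and shifted

# Correlation ignores the scaling and the shift
r = np.corrcoef(a, b)[0, 1]

# Euclidean distance is sensitive to both
d = np.linalg.norm(a - b)

print('Correlation: %0.2f, Euclidean distance: %0.2f' % (r, d))
```

Correlation treats these two patterns as identical, whereas Euclidean distance treats them as far apart. Which behavior you want depends on whether overall response amplitude is meaningful for your question.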
    
Let's rerun this code with Euclidean distance as the metric to see how much that changes the RSA

In [None]:
# Get the euclidean distances between points
RSA_euclidean = metrics.pairwise.euclidean_distances(func_mat)

plt.imshow(RSA_euclidean)
plt.yticks(range(len(categories)), categories)
plt.xticks(range(len(categories)), categories, rotation=45);
plt.colorbar()


**Exercise 4:**<a id="ex4"></a> Comment on the similarities and differences of using correlation vs euclidean distance for the RSA.  
**A:**

The most common metric for doing RSA is correlation, so let's go back to that now. So far, we have only run this analysis on one participant, but we can't conclude much from a single participant. Let's run our analysis on all of the participants. This will take ~10 minutes.

In [None]:
RSA_all = np.zeros((len(categories), len(categories), ppt_num))

# Specify the region used
ROI_counter = 23 # Corresponds to Lateral Occipital Cortex, inferior division(atlas.labels.index("Lateral Occipital Cortex, inferior division"))

for sub_counter in np.arange(1, ppt_num + 1):

    ppt = 'sub-%02d' % sub_counter # Specify the participant to load

    # Generate the voxel by time masked data
    func_masked = prepare_sherlock(ppt, ROI_counter)

    # Generate the observation by feature matrix you will use for analyses
    func_mat = select_timepoints_average(func_masked, TR_condition)
    
    # Store the functional data
    RSA_all[:, :, sub_counter - 1] = np.corrcoef(func_mat)

Now that we have all of the RSA matrices for each participant, let's average them. We stored the data as a 3d array where the 3rd dimension is participant, so we can just average across that dimension to get our average matrix. 

**Self-study:** Statistical wisdom states that you should Fisher Z transform your correlation values before doing any arithmetic on them, like averaging. That is best practice, but is not typical in the community (or at least the papers often don't report that step). To do the transform, you can just take the inverse hyperbolic tangent (`np.arctanh`) of the data. Note that the diagonal of the RSA matrix is 1, which transforms to infinity, so exclude it first.
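Here is a minimal sketch of that averaging procedure, using three hypothetical correlation values in place of real participant data. The Fisher Z transform is `np.arctanh`, and `np.tanh` converts the averaged value back into a correlation.

```python
import numpy as np

# Hypothetical correlations from three participants for one cell of the RSA matrix
r_vals = np.array([0.2, 0.5, 0.9])

z_vals = np.arctanh(r_vals)      # Fisher Z transform
z_mean = np.mean(z_vals)         # Average in Z space
r_mean_fisher = np.tanh(z_mean)  # Transform the average back into a correlation

r_mean_naive = np.mean(r_vals)   # Averaging the raw correlations, for comparison

print('Naive mean: %0.3f, Fisher mean: %0.3f' % (r_mean_naive, r_mean_fisher))
```

The two averages differ most when some correlations are close to 1 or -1, because that is where the correlation scale is most compressed.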

In [None]:
# What is the average RSA matrix across participants
RSA_average = np.mean(RSA_all, 2)

# Remove the diagonal
RSA_average[np.eye(len(categories)) == 1] = np.nan

# plot the RSA
plt.imshow(RSA_average)
plt.yticks(range(len(categories)), categories)
plt.xticks(range(len(categories)), categories, rotation=45);
plt.colorbar()

plt.title('RSA plot for characters in Sherlock, averaged across participants');

We have the average matrix, but is this matrix reliable? In other words, are participants' RSA matrices similar enough to one another that we should trust them? There are a few approaches to assessing this, but we are going to adopt an approach that is in principle similar to what we did with the ISC methods. Specifically, we are going to test how similar each individual participant is to the average of the other participants. If participants are consistently similar to the average, then that is a sign that the individual participant RSA matrices are reliable and capture the signal.

Below we define a function that performs this test of reliability. It takes two steps. In the first step, it takes the upper triangle of each participant's RSA matrix (the cells above the diagonal) and converts it into a vector. In the second step, it loops through the participants, averages the vectors of the other N-1 participants, and correlates that average with the vector of the held-out participant. The resulting correlation tells you how similar each participant is to the average of the others.
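As an aside, the first step (turning the cells above the diagonal into a vector) can also be sketched with `np.triu_indices`. This is an alternative illustration on a hypothetical 4 x 4 matrix, not the implementation used in this notebook. One difference is that indexing keeps any cell that happens to equal exactly zero, whereas filtering out zeros would drop it.

```python
import numpy as np

# Hypothetical symmetric 4 x 4 similarity matrix, for illustration only
rng = np.random.default_rng(0)
patterns = rng.standard_normal((4, 20))
RSA_example = np.corrcoef(patterns)

# Row and column indices of every cell above the diagonal (k=1 skips the diagonal itself)
rows, cols = np.triu_indices(RSA_example.shape[0], k=1)

# Pull those cells out as a vector
RSA_vec = RSA_example[rows, cols]

print(RSA_vec.shape)  # N * (N - 1) / 2 values, i.e., 6 for N = 4
```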

In [None]:
def RSA_reliability(RSA_all, k=1, shuffle=False):
    # Do an ISC style analysis of the RSA matrix to test whether individuals are similar to the average
    # RSA_all is an NxNxP array where N is the number of categories, and P is the number of participants.
    # k is the distance from the diagonal that we will use for finding the off diagonal
    # shuffle determines whether to scramble the order of the vectors so that your correlation represents a test of the null.
    
    for ppt_counter in range(RSA_all.shape[2]):

        RSA = RSA_all[:, :, ppt_counter]

        # Convert the off-diagonal into a vector
        RSA_vec = np.triu(RSA, k=k).flatten()

        # Remove the zeroed values
        RSA_vec = RSA_vec[RSA_vec != 0]
        
        # Set the array
        if ppt_counter == 0:
            RSA_vecs = np.zeros((RSA_all.shape[2], len(RSA_vec)))

        # Store the off diagonals
        RSA_vecs[ppt_counter, :] = RSA_vec

    # Now do leave one out comparisons
    isc = []
    for ppt_counter in range(RSA_all.shape[2]):

        # Get all the participants but the one you are testing
        loo_ppts = np.setdiff1d(np.arange(0, RSA_all.shape[2]), ppt_counter)

        # What is the average of the LOO participants
        av_vecs = np.mean(RSA_vecs[loo_ppts, :], 0)
        
        # Get the vector for this participant (copied so that shuffling does not alter the stored data)
        ppt_vec = RSA_vecs[ppt_counter, :].copy()
        
        # If true, then you will shuffle the participant vector, creating a null that can be used for reference
        if shuffle == True:
            np.random.shuffle(ppt_vec)
        
        isc += [np.corrcoef(av_vecs, ppt_vec)[0, 1]]
    
    # Return the ISC values
    return isc

In [None]:
# Run the code
isc = RSA_reliability(RSA_all)
print('Mean ISC: %0.2f (range: %0.2f -- %0.2f)' % (np.mean(isc), np.min(isc), np.max(isc)))

So we get a correlation that seems pretty high, but it is hard to know whether it is significant. Like in past weeks, we **strongly encourage you to create your own null distribution**. In other words, what would be the correlation between participants if we jumbled up the conditions? Fortunately, the `RSA_reliability` code has that functionality built in.

**Exercise 5:**<a id="ex5"></a> Rerun the `RSA_reliability` code with `shuffle=True`, visualize the result and interpret it. Specifically, we would like you to do the following:  
> Run the `RSA_reliability` function with `shuffle=True` and store the output with a new name  
> Create a *useful* visualization to see whether the distribution of real reliability values is different from the shuffled reliability values. There are several types of plots you could do. Whatever you choose, it should show the distribution of values for both the real and shuffled data.  
> Interpret what the difference between real and shuffled reliability means.

In [None]:
# Insert code here

**A:**

With the reliability of RSA out of the way, we can now finally ask our question: is the neural representation of female characters different from the neural representation of male characters? To set this up, we need to find the elements of the RSA matrix that compare two characters of the same gender and those that compare characters of different genders. We do this below.

In [None]:
# Preset
within_pos = []
between_pos = []

# Cycle through the x and y positions to see what gender each category is
for x_counter, x_label in enumerate(gender_label):
    for y_counter, y_label in enumerate(gender_label):
        
        # Only take the elements from the upper triangle. This works because the results are mirror symmetric around the diagonal
        if y_counter < x_counter:
        
            # If the two gender labels are the same then this is a within gender comparison, if they are different then this is a between gender comparison
            if x_label == y_label:
                within_pos += [[y_counter, x_counter]] # Store the coordinates
            else:
                between_pos += [[y_counter, x_counter]] # Store the coordinates

To confirm this worked, let's mark up the figures with a black 'w' or red 'b' for the within or between gender comparisons, respectively.

In [None]:
plt.imshow(RSA_average)

# Overlay the text on the figure
for pos in within_pos:
    plt.text(pos[1], pos[0], 'w', c='k')
for pos in between_pos:
    plt.text(pos[1], pos[0], 'b', c='r')    

plt.yticks(range(len(categories)), categories)
plt.xticks(range(len(categories)), categories, rotation=45);
plt.colorbar()

That worked and confirmed that we have correctly identified the elements of the RSA that compare responses within gender category vs. between gender categories. Now let's pull out the within vs. between gender comparisons for each participant, average them and find the difference. This difference is stored in the `diff` variable, with each entry referring to an individual participant. We can then use that difference in a one-sample t-test to evaluate whether there is a significant difference.

In [None]:
diff = [] # Preset the list

# Iterate through participants
for sub_counter in np.arange(RSA_all.shape[2]):
    
    # Get all the within gender comparisons (i.e., the similarity at all the 'w' locations above)
    within_vals = []
    for pos in within_pos:
        within_vals += [RSA_all[pos[0], pos[1], sub_counter]]

    # Get all the between gender comparisons (i.e., the similarity at all the 'b' locations above)
    between_vals = []
    for pos in between_pos:
        between_vals += [RSA_all[pos[0], pos[1], sub_counter]]
        
    diff += [np.mean(within_vals) - np.mean(between_vals)]  

**Exercise 6:**<a id="ex6"></a> Run a t test and evaluate whether there is a significant difference for within vs. between comparisons. (Hint: You should use the `stats` package for t tests)

In [None]:
# Insert code here

**Exercise 7:**<a id="ex7"></a> Interpret the results from **Exercise 6**. Both consider what it means, and what limitations there are. One limitation you should discuss is whether there are potential confounds in the analyses (Hint: consider what might be going on in the scene other than the gender of the character).

**A:**

## 3. Multi-dimensional scaling (MDS) <a id="mds"></a>

The correlation matrix in RSA describes how similar each character is to each other character. This means that if two characters have a high positive correlation then they can be thought of as eliciting a very similar activation pattern across voxels. We can reframe this to be thought of as a distance in a high-dimensional space. From this perspective, items that are similar to one another will be grouped close together and will be far away from points that they are dissimilar to. 

To give you an intuition of this, consider the image below. On the right hand side is the RSA that we have been measuring up to this point, and on the left hand side is a way of visualizing the similarity relations of the exemplars via distance.
<img src="Similarity_space.png" style="height:20%;">

MDS allows us to visualize the similarity of our data in terms of the distances between the categories. Specifically, it allows us to generate a lower-dimensional image (e.g., 2-D or 3-D) in which the distances between points approximate the distances in the original high-dimensional data. There is an MDS [method](https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Multidimensional_Scaling.pdf) built into [scikit-learn](http://scikit-learn.org/stable/modules/manifold.html#multidimensional-scaling) that is easy to run.

Multi-dimensional scaling cares about the *dissimilarity*, not the similarity. Correlation measures similarity; hence, we need to convert the correlations into dissimilarity. The conventional way to do it, albeit hacky, is to just take 1 minus the correlation values. So if the correlation was 0.7 it becomes 0.3, if it was -0.2 it becomes 1.2, etc. Alternatively, you can avoid this issue entirely by using Euclidean distance instead of correlation.

In [None]:
# Make the MDS object
mds = manifold.MDS(n_components=2, dissimilarity='precomputed')

# Convert the SIMILARITY matrix into a DISSIMILARITY matrix
rdm = 1 - np.mean(RSA_all, 2)

# Get the coordinates of the transformation
coords = mds.fit_transform(rdm)

**Self-study:** We set `dissimilarity='precomputed'` because we have already made the matrix into a dissimilarity matrix that can be used directly. If this isn't set then the MDS code will compute the Euclidean distances between points. This can be fine or it can be problematic, depending on your data. Explore what happens to the MDS under the following settings, where `RSM` means just the RSA (i.e., the `RDM` without taking `1 -`).
1. RDM and dissimilarity="precomputed" - This is what we just did
2. RDM and dissimilarity="euclidean"
3. RSM and dissimilarity="precomputed"
4. RSM and dissimilarity="euclidean"

The output of applying the MDS is a set of coordinates providing the x and y position of each point in an embedding that captures their distance relationships. The dimensions of these points are arbitrary, so we just put them in a box.

In [None]:
# Plot the text signifying the gender and name label of each point in the embedding
plt.figure()
for i in range(len(gender_label)):
    if gender_label[i] == 'M':
        plt.scatter(coords[i, 0], coords[i, 1], alpha=0.5, c='r')
    else:
        plt.scatter(coords[i, 0], coords[i, 1], alpha=0.5, c='g')
    plt.text(coords[i, 0], coords[i, 1], '  %s - %s' % (gender_label[i], categories[i]))
    

plt.xlim([np.min(coords[:, 0]) * 1.15, np.max(coords[:, 0]) * 1.5])
plt.ylim([np.min(coords[:, 1]) * 1.15, np.max(coords[:, 1]) * 1.5])

plt.xticks([])
plt.yticks([]);

This visualization has interesting properties: it seems to suggest some grouping consistent with our gender categories. However, MDS can put any data into 2 dimensions, even when it is not a good fit. Our RSA matrix relates 8 characters to one another, so it may take more than 2 dimensions to capture those relationships, and it is possible that putting the data in 2 dimensions is squishing things together that shouldn't be put together. MDS gives us a tool to decide whether 2 dimensions is a good fit for our data. This is called 'stress', and as a rule of thumb a value below 0.2 is considered acceptable.
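One caveat when applying that rule of thumb: depending on the scikit-learn version and settings, the `stress_` attribute may be an unnormalized sum of squared errors, which depends on the scale of the dissimilarities and is not directly comparable to 0.2. The sketch below computes one common normalized variant by hand. The function name `normalized_stress` and the toy points are hypothetical, and the choice to normalize by the input dissimilarities is an assumption (other definitions normalize by the embedded distances).

```python
import numpy as np

def normalized_stress(dissimilarity, coords):
    # Pairwise Euclidean distances between the embedded points
    diffs = coords[:, None, :] - coords[None, :, :]
    embedded = np.sqrt(np.sum(diffs ** 2, axis=2))

    # Use each pair once (cells above the diagonal)
    upper = np.triu_indices(dissimilarity.shape[0], k=1)

    # Squared mismatch between the input dissimilarities and the embedded distances
    residual = np.sum((dissimilarity[upper] - embedded[upper]) ** 2)

    # Normalize by the size of the input dissimilarities
    return np.sqrt(residual / np.sum(dissimilarity[upper] ** 2))

# Toy check with hypothetical points: a perfect embedding has zero stress
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
diffs = points[:, None, :] - points[None, :, :]
toy_rdm = np.sqrt(np.sum(diffs ** 2, axis=2))

perfect = normalized_stress(toy_rdm, points)
stretched = normalized_stress(toy_rdm, points * 2)  # Every distance is doubled
print(perfect, stretched)
```

In this notebook, the equivalent call would be `normalized_stress(rdm, coords)`, using the matrix passed to `fit_transform` and the coordinates it returned.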

**Exercise 8:**<a id="ex8"></a> Interpret the MDS. In your interpretation, consider the stress of the MDS, accessed via the `stress_` of the `mds` object.

**A:**

## 4. Time-point by time-point similarity analysis <a id="tsm"></a>
 
RSA is not typically used with movie data, but is instead used for event-related designs with isolated stimuli. We have nonetheless made it work by twisting the nature of the stimuli. However, movie data affords a different type of analysis. In particular, rather than getting the neural response to individual items/exemplars (e.g., characters) we can instead compare the neural response across time. If we do that, we get a time-point by time-point similarity matrix (TSM), telling us how similar each time-point is to *all* others. This is done the same way as RSA. This kind of analysis is central to [event segmentation](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5558154/), but we are going to use it below to replicate an analysis from [Kumar et al., 2020](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008457).

Below we make a TSM using just the second half of the movie.

In [None]:
# We already have func_masked from the last participant loaded with `prepare_sherlock` so let's just use it
func_masked_2nd = func_masked[:, first_segment_duration:] # Take all the timepoints after the first segment

# Compute the TSM. We are transposing the data (that is what `.T` does), so that it becomes a time by time matrix
TSM = np.corrcoef(func_masked_2nd.T)

# Plot the TSM
plt.imshow(TSM)
plt.title('TSM for 2nd half')
plt.xlabel('TR')
plt.ylabel('TR');

Great, we have the TSM, now we want to use it for something. The beauty of RSA is that once we have the data in a matrix format, we can compare it to data from completely different sources. For instance, you can use RSA to compare between humans, animals and/or models. We can do the same with TSMs: we are going to compare the TSM from these participants to a TSM made with a computer vision model.

### Temporal smoothing confound

However, before we do that there is a confound that we need to be aware of: time points close to each other are similar to each other because the hemodynamic response is temporally smooth (AKA autocorrelated). We talked a lot about this topic in previous weeks and it matters again here. The problem caused by this temporal smoothing is that even when there is no task-evoked activity (e.g., the brain is at rest), the patterns of activity at adjacent time points will be similar to each other. This can then look like structure in our TSM without there actually being any.

To show this problem we will create a simulation (see how much we love simulations!). Below we simulate 10 voxels in 10 participants, where the voxels have no similarity across participants but have activity that is realistic for fMRI data (i.e., they are temporally smoothed).

In [None]:
# Specify the parameters for the simulation
num_voxels = 10 # How many voxels are you simulating
num_timepoints = TSM.shape[0] # How many time points are you making
num_sims = 10 # How many participants

# Simulate some random 'signal' and convolve it with the HRF
sim_sig = np.random.randn(num_timepoints + 1, num_voxels) # Simulate white noise for each time point (plus one extra) in each 'voxel'
sim_hrf = sim.convolve_hrf(sim_sig, tr_duration = TR_duration, temporal_resolution=1)[1:]

# Plot an example simulated voxel
plt.figure()
plt.plot(sim_hrf[:, 0])
plt.xlabel('TR')
plt.ylabel('Response')
plt.title('Example of simulated voxel')

# Plot the TSM from the simulated voxels
plt.figure()
TSM = np.corrcoef(sim_hrf)
plt.imshow(TSM)
plt.title('TSM of simulated data')
plt.xlabel('TR')
plt.ylabel('TR');


So if we run this simulation multiple times, each time generating new random data that should be unrelated to the last, we can compute the TSM for each one and compare them. Specifically, we will perform the test of reliability using the RSA method like before.

In [None]:
# Now cycle through making independent participants, all generated with noise
TSM = np.zeros((num_timepoints, num_timepoints, num_sims))
for i in range(num_sims):
    sim_sig = np.random.randn(num_timepoints + 1, num_voxels) # Simulate white noise for each time point (plus one extra) in each 'voxel'
    sim_hrf = sim.convolve_hrf(sim_sig, tr_duration = TR_duration, temporal_resolution=1)[1:, :]
    TSM[:, :, i] = np.corrcoef(sim_hrf)

# Now evaluate the reliability of the data    
k = 1 # How much of the off-diagonal should you ignore
isc = RSA_reliability(TSM, k=k)
print('Mean ISC: %0.2f (range: %0.2f -- %0.2f)' % (np.mean(isc), np.min(isc), np.max(isc)))

Those correlation values are low, but that is typical in TSM analyses and, importantly, they are very consistently above zero across the simulated "participants". That means that the TSMs generated from pure noise are similar to one another. In other words, when we generated a bunch of random data with the temporally smooth properties of fMRI and made a TSM out of each, the TSMs were similar to each other. This could mean that TSMs are useless for comparing movie data, since they make even noise seem similar.

But there is a way to save the analysis: we can ignore the parts of the similarity matrix that are near the diagonal. Elements near the diagonal are contaminated by autocorrelation, but elements far from the diagonal are not. For example, time points 10 and 11, which are near the diagonal, are correlated with each other because of mere temporal smoothing, but time points 10 and 30, which are far from the diagonal, will not be related because of temporal smoothing.
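To make the near-versus-far distinction concrete, here is a minimal sketch that measures the mean similarity at each distance (lag) from the diagonal in a TSM built from smoothed noise. It is only an illustration: the short triangular kernel is an assumed stand-in for the HRF rather than the simulator used above, and `toy_tsm` and `lag_profile` are names invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

num_timepoints, num_voxels = 200, 50

# White noise, then a crude temporal smoothing kernel standing in for the HRF
# (an assumption for illustration; not the kernel used by the simulator above)
noise = rng.standard_normal((num_timepoints, num_voxels))
kernel = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
kernel /= kernel.sum()
smoothed = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode='same'), 0, noise)

# Time-point by time-point similarity of the smoothed noise (time x time)
toy_tsm = np.corrcoef(smoothed)

# Mean similarity at each distance (lag) from the diagonal
lags = np.arange(0, 21)
lag_profile = np.array([np.mean(np.diagonal(toy_tsm, offset=lag)) for lag in lags])

for lag in (1, 2, 10, 20):
    print('Lag %d: mean r = %0.2f' % (lag, lag_profile[lag]))
```

In this sketch the similarity should be high at short lags and hover around zero once the lag exceeds the width of the smoothing kernel, which is the pattern the buffer is meant to exploit.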

How much buffer should we use? Here we will use 10 TRs, equivalent to 15s, which should be more than enough to avoid the worst effects of temporal smoothing. To make this buffer, we just need to set a `k` parameter in the `RSA_reliability` function.
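As a small illustration of what an offset like `k` does, the sketch below uses NumPy's `np.triu_indices` on a toy 6 x 6 matrix. This only shows the indexing idea under assumed toy sizes. It is not the implementation of `RSA_reliability`, and `toy_tsm` and `kept` are hypothetical names.

```python
import numpy as np

n = 6  # toy number of time points
toy_tsm = np.arange(n * n).reshape(n, n)  # placeholder values so each element is identifiable

for k in (0, 1, 3):
    rows, cols = np.triu_indices(n, k=k)  # positions at least k steps above the diagonal
    kept = toy_tsm[rows, cols]            # vectorized upper triangle, read row by row
    print('k=%d keeps %d of %d elements; closest kept pair is %d time points apart'
          % (k, kept.size, n * n, np.min(cols - rows)))
```

Larger values of `k` keep fewer elements, and every element that remains compares two time points that are at least `k` TRs apart.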

In [None]:
k = 10 # How much of the off-diagonal should you ignore

plt.imshow(np.triu(TSM[:, :, 0], k=k))
plt.title('Example TSM matrix with a buffer')

isc = RSA_reliability(TSM, k=k)
print('Mean ISC: %0.2f (range: %0.2f -- %0.2f)' % (np.mean(isc), np.min(isc), np.max(isc)))

Great, now that we set k to 10, the similarity between TSMs generated from noise is near zero. That means that adding the buffer eliminated the contribution of temporal smoothing.
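For readers who want to see the logic end to end without the helper functions, here is a self-contained sketch of the same comparison. It rests on several assumptions: `toy_reliability` is a hypothetical leave-one-out measure written for this example (it may differ from `RSA_reliability`), the smoothing kernel is a crude stand-in for the HRF, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_noise_tsm(num_timepoints, num_voxels):
    """TSM of white noise blurred in time (a crude stand-in for HRF smoothing)."""
    noise = rng.standard_normal((num_timepoints, num_voxels))
    kernel = np.array([1.0, 2.0, 3.0, 2.0, 1.0]) / 9.0
    smooth = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode='same'), 0, noise)
    return np.corrcoef(smooth)

def toy_reliability(tsms, k):
    """Correlate each participant's TSM (upper triangle, offset k) with the mean of the others."""
    rows, cols = np.triu_indices(tsms.shape[0], k=k)
    vecs = tsms[rows, cols, :]  # elements x participants
    scores = []
    for i in range(vecs.shape[1]):
        others = np.delete(vecs, i, axis=1).mean(axis=1)
        scores.append(np.corrcoef(vecs[:, i], others)[0, 1])
    return np.array(scores)

# Ten independent 'participants' made of nothing but smoothed noise
toy_tsms = np.stack([smoothed_noise_tsm(150, 50) for _ in range(10)], axis=2)

for k in (1, 10):
    print('k=%d: mean leave-one-out r = %0.2f' % (k, toy_reliability(toy_tsms, k).mean()))
```

With a small buffer the noise-only TSMs should still agree with each other, whereas with a buffer wider than the smoothing kernel the agreement should fall to around zero.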

**Exercise 9:**<a id="ex9"></a> Test how changing `k` affects this test of ISC. Show a plot of the mean ISC changing with `k` (ranging from 0 to 10). Report the value of `k` where the reliability is near chance and explain why this value reflects chance. *Hint 1:* You don't need to regenerate the `TSM`; the one that was made is sufficient. *Hint 2:* You will want to set `shuffle` to `True` in `RSA_reliability` to figure out chance.

In [None]:
# Insert code here

In [None]:
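## EXAMPLE CODE, DELETE ##
k_values = np.arange(0, 11)  # Buffer sizes from 0 to 10, as the exercise asks
real_isc = []
shuffle_isc = []
for k in k_values:
    # Mean reliability of the real TSMs and of the shuffled (chance) TSMs at this buffer size
    real_isc += [np.mean(RSA_reliability(TSM, k=k))]
    shuffle_isc += [np.mean(RSA_reliability(TSM, k=k, shuffle=True))]

plt.plot(k_values, real_isc, label='Real')
plt.plot(k_values, shuffle_isc, label='Shuffled')
plt.xlabel('k (TRs ignored from the diagonal)')
plt.ylabel('Mean ISC')
plt.legend();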

### Comparing a computer vision model to human vision

In the section above we learned what a TSM is and that we need a buffer along the diagonal in order to make fair comparisons. Now we are going to use the TSM to see whether human brain activity is similar to the activity of a computer vision model.

Starting approximately 10 years ago, convolutional neural networks (CNNs) proved to the world that it is possible to achieve human-level visual performance using deep learning algorithms. There is a lot of history and explanation on this topic that is outside the scope of this course, but here is an *extremely* useful [primer on how CNNs work](https://poloclub.github.io/cnn-explainer/). Please read through the site now. Even if you are a CNN Ninja who doesn't need a refresher, this website is frankly just *cool*.

How do we use CNNs for cognitive neuroscience? Well, as images are being processed through the neural network, at each layer of the network an image is 'evoking' a 'pattern' of 'activity' (I use quotes because the language is intentionally similar to language we use in neuroscience, but it is also how the literature talks about it). To give an intuition from the website, look at the first layer, called conv_1_1, and compare how the channels (AKA the images in the conv_1_1 column) are activated for each of the input images. For the koala, the most blue channel is the 5th, but for the red peppers, the most blue channel is the 8th or 10th. These differences reflect the pattern of activity evoked by each stimulus. We can compare that pattern of activity to humans to see if there is correspondence. Surprisingly, the internal representations of the CNN are similar to the neural representations we have while processing stimuli ([Yamins et al., 2014](https://pubmed.ncbi.nlm.nih.gov/24812127/); [Khaligh-Razavi & Kriegeskorte, 2014](https://pubmed.ncbi.nlm.nih.gov/25375136/)). These results were so impactful at the time because they suggest something deep about visual processing: there are only a few ways to do object processing well, and it seems that both machines and brains found similar algorithms.

The research described above (and the website) looks at the CNN's response to images, but we are using movies, so how do we make that work? Well, we can extract all the frames from a movie and then run each one through a CNN. We can then average the activity from the CNN within each TR (since there are many frames for each TR we collect). Finally, we can do a TSM computation where we ask how similar each time point of the movie is in the CNN to all the other time points.
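The averaging step can be sketched with random numbers standing in for CNN activations. This is only an illustration: the sizes, the frame rate, and the variable names (`frame_activity`, `tr_activity`, `model_tsm`) are assumptions, not the values used to build the model TSM for this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration
num_TRs = 20
frames_per_TR = 36      # e.g., an assumed 24 frames per second x 1.5 s per TR
num_features = 100      # units in one layer of the network

# Stand-in for the layer activations of every movie frame (frames x features)
frame_activity = rng.standard_normal((num_TRs * frames_per_TR, num_features))

# Average the frames belonging to each TR:
# (TRs x frames_per_TR x features) -> (TRs x features)
tr_activity = frame_activity.reshape(num_TRs, frames_per_TR, num_features).mean(axis=1)

# Correlate the pattern at each TR with the pattern at every other TR
model_tsm = np.corrcoef(tr_activity)
print(model_tsm.shape)
```

The reshape relies on the frames being stored in temporal order with the same number of frames in every TR. Real movies may need the frames to be assigned to TRs by timestamp instead.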

We have already done that step. Specifically, we used [AlexNet](https://en.wikipedia.org/wiki/AlexNet), one of the first deep learning models capable of human-level object classification, and extracted the pattern of activity in its `fc6` layer (a late layer in the network that should have lots of information about objects). We then computed a TSM for this layer of the model by comparing its response across all time points. We then removed the near diagonal of this data (temporal smoothing is a problem for CNNs too) and turned the TSM into a vector. The result is the `alex_fc6_rsm.npy` file. This is loaded in and visualized below.

In [None]:
# Get the model representations. These were stored as a vector for simplicity
model_vec = np.load('alex_fc6_rsm.npy')

# How much of the off-diagonal was used as a buffer
k = 10 

# We can convert the vector into a matrix
model_mat = np.triu(np.ones((func_masked_2nd.shape[1], func_masked_2nd.shape[1])), k=k) # Make a mask of all the off-diagonal points to be filled
model_mat[model_mat == 1] = model_vec # Insert the model values into the matrix

# Show the matrix
plt.imshow(model_mat)
plt.title('AlexNet FC6 time-point by time-point similarity for the second half of Sherlock')
plt.xlabel('TR')
plt.ylabel('TR');


So we have a TSM of the computer vision model and a TSM of human brain activity. How similar are these two things? The beauty of RSA is that to compare between these data formats (brains and models) you just need to do a simple correlation between the matrices. If the correlation is significantly above zero, that suggests that the model is capturing information about human brain activity. Let's do that below:

In [None]:
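# Compute the TSM
TSM = np.corrcoef(func_masked_2nd.T)

# Vectorize the off-diagonal elements, in the same row-by-row order used to fill model_mat above.
# Indexing by position, rather than dropping zeros, keeps any correlation that happens to be exactly 0
human_vec = TSM[np.triu_indices_from(TSM, k=k)]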

# Compute the correlation
r_val = np.corrcoef(human_vec, model_vec)[0, 1]

print('Correlation between AlexNet and brain activity: %0.2f' % r_val)


That is a small correlation value! You might look at this and assume that it is not significant. But hold on! Correlations in this kind of analysis can be very low because this is a strict test. Think about it: we are asking whether the time points of the movie that my brain treats as similar to each other are the same time points that the model treats as similar. If the model and I are off by even a single TR (e.g., I think time points 10 and 30 are similar but the model thinks 10 and 31 are similar), then this correlation will be reduced. Moreover, with as many elements as these vectors contain, even a small correlation can be reliable. Hence, we need to test whether this is significant.

**Exercise 10:**<a id="ex10"></a> Test whether the TSM is reliably correlated with the representation from AlexNet. Specifically, find the correlation between each participant and the model, and test whether the correlations are significantly above zero. To do this, you will have to write a lot of code, so below we offer you some comments to guide you through the steps of what to do.

In [None]:
# Pseudocode template for Exercise 10. Feel free to change as you see fit

# Preset the region being used, the matrix you will store data in and the model you will use

# Create a for loop where you loop through each participant. In the loop, do the following:
#    Mask the functional data
#    Crop to only include the second segment of data
#    Create a TSM for each participant and store the TSM

# Create a for loop where you loop through each participant. In the loop, do the following:
#    Take the off diagonal of each participant's TSM and make it a vector
#    Correlate the vectorized TSM to the model, and store the correlation

# Perform a t test of all the correlation values relative to zero

**Novel contribution:**<a id="novel"></a> Be creative and make one new discovery by adding an analysis, visualization, or optimization.

In [None]:
# Insert code here

## Contributions <a id="contributions"></a> 

C. Ellis adapted the RSA tutorial from BrainIAK.