# Assignment 10: Representational Similarity Analysis

Please submit this assignment to Canvas as a Jupyter notebook (.ipynb). In this assignment, you will analyze correlations between time points during a task.

In [None]:
%matplotlib inline
import pandas as pd
import seaborn as sns

# Question 1: Serial position similarity

In this first part, we will replicate some of the main results from [Manning et al. (2011)](https://www.pnas.org/doi/10.1073/pnas.1015174108). The first result that Manning et al. showed was that neural activity seemed to drift: the similarity between patterns of intracranial EEG decreased as a function of distance in the list.

<center>
    <img src="figures/MannEtal11A.png" width=400>
</center>

The following code loads the correlations of spectral features between pairs of encoding events:

In [None]:
data_path = '/data7/rsa_class/'
raw_WORD_rsa_df = pd.read_csv(data_path + 'raw_WORD_rsa_df.csv')

In both dataframe files, each row contains the correlation between two events: either two encoding events, or an encoding event and a retrieval event. In the dataframe comparing two encoding events, the metadata for each event are coded as `colname_WORD` and `colname_WORD2`, with `_WORD2` referring to the encoding event with the later serial position. In the dataframe comparing encoding to retrieval, the columns are coded as `colname_WORD` for the encoding event and `colname_REC_WORD` for the retrieval event. The codebook for the `colname` options is as follows:
* `subject`: The string identifier of the subject, e.g. R1001.
* `session`: The session number
* `list`: List number (1-25) during which the event occurred.
* `serialpos`: Indicates the serial position in which the word was presented during the study list
* `item`: The word being presented or recalled in a `WORD` or `REC_WORD` event.
* `category`: The category of the word being presented or recalled in a `WORD` or `REC_WORD` event.
* `recalled`: 1 if the item was subsequently recalled, 0 if it was not
* `outpos`: Indicates the output position in which the word was recalled during the recall period

We'll again do this analysis within list. Since we are interested in the encoding period, we will use the "WORD" events. First, use the pandas `query` method to select only the within-list comparisons, that is, only cases where `list_WORD` is the same as `list_WORD2`. We also want to select only cases where `serialpos_WORD` is before `serialpos_WORD2`.
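
As a minimal sketch of how `query` can compare two columns row by row, the toy dataframe below uses made-up values and only the four columns named above; it is an illustration, not the assignment data.

```python
import pandas as pd

# Hypothetical toy rows; the real dataframe has many more columns.
toy_df = pd.DataFrame({
    "list_WORD":       [1, 1, 1, 2],
    "list_WORD2":      [1, 1, 2, 2],
    "serialpos_WORD":  [1, 3, 1, 2],
    "serialpos_WORD2": [2, 3, 4, 5],
})

# Column names can be compared directly inside the query string.
within_list_toy = toy_df.query(
    "list_WORD == list_WORD2 and serialpos_WORD < serialpos_WORD2"
)
print(within_list_toy)
```

Only the rows whose two events share a list and whose first event has the earlier serial position survive the filter.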

In [None]:
# Question 1a
### YOUR CODE HERE

Now, just like in lecture, we want to subtract the mean of each list so that each correlation value is relative to its list mean. We'll save this in a variable called `corr_z_list_adj`.
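
One way to subtract a group mean is `groupby(...).transform("mean")`, which returns a value for every row rather than one per group. The sketch below runs on made-up numbers. It assumes that the similarity column is called `corr_z` and that a list is identified by `subject_WORD`, `session_WORD`, and `list_WORD`; check both assumptions against the real dataframe.

```python
import pandas as pd

# Hypothetical toy data; `corr_z` is an assumed name for the similarity column.
toy_df = pd.DataFrame({
    "subject_WORD": ["R1001"] * 4,
    "session_WORD": [0, 0, 0, 0],
    "list_WORD":    [1, 1, 2, 2],
    "corr_z":       [0.2, 0.4, 0.1, 0.5],
})

# A list number is only unique within a subject and session, so group on all three.
list_keys = ["subject_WORD", "session_WORD", "list_WORD"]
list_mean = toy_df.groupby(list_keys)["corr_z"].transform("mean")

# Each value is now expressed relative to the mean of its own list.
toy_df["corr_z_list_adj"] = toy_df["corr_z"] - list_mean
print(toy_df)
```

After the adjustment, the values within each list average to zero, so differences between lists no longer contribute to the comparison.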

In [None]:
# Question 1b
### YOUR CODE HERE

Then create a heatmap comparing all serial positions with each other, as we did in the lecture for categories. The relevant columns are `serialpos_WORD` and `serialpos_WORD2`.
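
A heatmap needs the data arranged as a matrix with one serial position on the rows and the other on the columns. The sketch below shows one way to build that matrix with `pivot_table`, using made-up values and assuming the adjusted similarity is stored in a column named `corr_z_list_adj`.

```python
import pandas as pd

# Hypothetical toy comparisons between serial positions 1-3.
toy_df = pd.DataFrame({
    "serialpos_WORD":  [1, 1, 2, 1, 1, 2],
    "serialpos_WORD2": [2, 3, 3, 2, 3, 3],
    "corr_z_list_adj": [0.3, -0.1, 0.2, 0.1, -0.3, 0.4],
})

# Average every comparison that shares the same pair of serial positions.
sim_matrix = toy_df.pivot_table(
    index="serialpos_WORD",
    columns="serialpos_WORD2",
    values="corr_z_list_adj",
    aggfunc="mean",
)
print(sim_matrix)
```

The resulting matrix can be passed to `sns.heatmap`. Pairs that never occur (here, position 2 with itself) are `NaN` and are left blank in the plot.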

In [None]:
# Question 1c
### YOUR CODE HERE

From this heatmap, we can clearly see how the similarity decreases as a function of distance. To replicate the Manning analysis more directly, we want to plot similarity with "Study distance" on the x-axis. Study distance is the difference in serial position between the two compared items. Recompute the within-list comparisons as a function of study distance for each subject. You should end up with a dataframe that has one row for each subject and each distance (from 1 to 11).
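
The split-apply-combine step described here can be sketched on toy data as follows. The values are made up, `serialpos_dist` is simply a name chosen for the new distance column, and `corr_z_list_adj` is assumed to hold the list-adjusted similarity.

```python
import pandas as pd

# Hypothetical toy comparisons for two subjects.
toy_df = pd.DataFrame({
    "subject_WORD":    ["R1001", "R1001", "R1001", "R1002", "R1002", "R1002"],
    "serialpos_WORD":  [1, 2, 1, 1, 2, 1],
    "serialpos_WORD2": [2, 3, 3, 2, 3, 3],
    "corr_z_list_adj": [0.3, 0.1, -0.2, 0.2, 0.4, -0.1],
})

# Study distance: how many positions apart the two items were studied.
toy_df["serialpos_dist"] = toy_df["serialpos_WORD2"] - toy_df["serialpos_WORD"]

# One mean per subject and distance; as_index=False keeps the keys as columns.
dist_df = toy_df.groupby(
    ["subject_WORD", "serialpos_dist"], as_index=False
)["corr_z_list_adj"].mean()
print(dist_df)
```

Keeping the grouping keys as ordinary columns makes the result easy to hand to a plotting function afterwards.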

In [None]:
# Question 1d
### YOUR CODE HERE

Use seaborn's [catplot](https://seaborn.pydata.org/generated/seaborn.catplot.html) function to replicate the Manning plot. Since each row is an independent observation from one subject, seaborn will compute the correct error bars by default when you draw a point or bar plot (`kind='point'` or `kind='bar'`; the default `kind='strip'` shows individual points without error bars). You just need to specify which variables go on the x- and y-axes.
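
A minimal sketch of such a call, on made-up per-subject means, is shown below. The column names `serialpos_dist` and `corr_z_list_adj` and the axis labels are assumptions for illustration.

```python
import pandas as pd
import seaborn as sns

# Hypothetical per-subject means: one row per subject and study distance.
dist_df = pd.DataFrame({
    "subject_WORD":    ["R1001"] * 3 + ["R1002"] * 3 + ["R1003"] * 3,
    "serialpos_dist":  [1, 2, 3] * 3,
    "corr_z_list_adj": [0.20, 0.05, -0.10, 0.30, 0.10, -0.05, 0.25, 0.00, -0.15],
})

# kind="point" draws the mean at each distance with an error bar across rows.
g = sns.catplot(data=dist_df, x="serialpos_dist", y="corr_z_list_adj", kind="point")
g.set_axis_labels("Study distance", "Relative similarity")
```

Because each row is already a subject-level mean, the error bars reflect variability across subjects rather than across individual comparisons.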

In [None]:
# Question 1e
### YOUR CODE HERE

We'll also show that being from the same category influences similarity on top of the serial position distance within a list. It's crucial to account for both at the same time because of the category structure of the list. You need to create a variable that indicates whether the categories of the two items are the same. Use the split-apply-combine technique with both `serialpos_dist` and `same_category` to get the mean within each possible combination. Use the [catplot](https://seaborn.pydata.org/generated/seaborn.catplot.html) parameter `hue` to display both variables on the same plot.
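
As a sketch of the two steps on made-up data: the comparison of the two category columns yields a boolean column, which can then serve as an extra grouping key. Grouping by subject as well is an assumption made here so that each row of the result is one subject's mean.

```python
import pandas as pd

# Hypothetical toy comparisons, all at a study distance of 1.
toy_df = pd.DataFrame({
    "subject_WORD":    ["R1001"] * 4,
    "category_WORD":   ["Fruit", "Fruit", "Tools", "Fruit"],
    "category_WORD2":  ["Fruit", "Tools", "Tools", "Tools"],
    "serialpos_dist":  [1, 1, 1, 1],
    "corr_z_list_adj": [0.4, 0.1, 0.2, -0.1],
})

# True when both items in the pair come from the same category.
toy_df["same_category"] = toy_df["category_WORD"] == toy_df["category_WORD2"]

# Mean similarity for every subject x distance x same_category combination.
cat_df = toy_df.groupby(
    ["subject_WORD", "serialpos_dist", "same_category"], as_index=False
)["corr_z_list_adj"].mean()
print(cat_df)
```

Passing `hue="same_category"` to `catplot` then draws one line per value of the boolean column.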

In [None]:
# Question 1f
### YOUR CODE HERE

# Question 2: Encoding-Retrieval RSA analysis

Now we'll look at the key analysis in Manning et al. (2011). They showed that right before you retrieve an item, the neural activity that displayed the drift above looks more similar to the item you're about to retrieve than to any other item. On top of that, it is also more similar to items from nearby serial positions at encoding. This suggests that you retrieve not only an item but also its general *context*. This is a key prediction of retrieved context theories of free recall.

<center>
    <img src="figures/MannEtal11_fig1.png" width=600>
</center>

<center>
    <img src="figures/MannEtal11B.png" width=400>
</center>

Here we will use comparisons between encoding events and recall events. They are stored in the following dataframe:

In [None]:
raw_REC_WORD_rsa_df = pd.read_csv(data_path + 'raw_REC_WORD_rsa_df.csv')

We will only look at comparisons within a list, and only at retrievals of a word that was actually on the list. The following code selects those rows for you:

In [None]:
within_list_correct_REC_WORD_rsa_df = raw_REC_WORD_rsa_df.query(
    'list_REC_WORD == list_WORD and serialpos_REC_WORD != -999')

Take a look at `within_list_correct_REC_WORD_rsa_df` and make sure you understand the information contained in that dataframe.

How would you compute the `serialpos_lag` used in the above plot?

In [None]:
# Q2a
### YOUR CODE HERE

Like above, we'll subtract out variance we are not interested in. Here, we will compute relative similarity not only within list but within each retrieval. We can use the `outpos_REC_WORD` column to do that since it indicates each retrieval.

In [None]:
# Q2b
### YOUR CODE HERE

Using that, we get a figure that looks a lot like the one from Manning et al.! As Manning et al. did, make sure you select only serial position lags between -5 and 5 before plotting.
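
The three steps for this question (computing a lag, adjusting within each retrieval, and restricting the lag range) can be sketched on toy data as below. Several things here are assumptions rather than facts about the dataset: the similarity column name `corr_z`, the new column name `corr_z_rec_adj`, the set of columns that identify one retrieval, and the sign convention for the lag.

```python
import pandas as pd

# Hypothetical toy data: one retrieval (output position 1) of the word studied
# at serial position 7, compared with every encoding event at positions 1-12.
toy_df = pd.DataFrame({
    "subject_WORD":       ["R1001"] * 12,
    "session_WORD":       [0] * 12,
    "list_WORD":          [1] * 12,
    "outpos_REC_WORD":    [1] * 12,
    "serialpos_REC_WORD": [7] * 12,
    "serialpos_WORD":     list(range(1, 13)),
    "corr_z":             [0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 0.6, 0.3, 0.2, 0.1, 0.1, 0.0],
})

# Assumed sign convention: a positive lag means the encoding item was studied
# after the item that is being recalled.
toy_df["serialpos_lag"] = toy_df["serialpos_WORD"] - toy_df["serialpos_REC_WORD"]

# Subtract the mean over all comparisons that belong to the same retrieval.
retrieval_keys = ["subject_WORD", "session_WORD", "list_WORD", "outpos_REC_WORD"]
retrieval_mean = toy_df.groupby(retrieval_keys)["corr_z"].transform("mean")
toy_df["corr_z_rec_adj"] = toy_df["corr_z"] - retrieval_mean

# Series.between is inclusive on both ends by default.
lag_df = toy_df[toy_df["serialpos_lag"].between(-5, 5)]
print(lag_df[["serialpos_lag", "corr_z_rec_adj"]])
```

In this toy retrieval the lags run from -6 to 5, so the filter drops exactly one row, and the adjusted similarity peaks at lag 0.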

In [None]:
# Q2c
### YOUR CODE HERE

Manning et al. (2012) showed that items that were semantically similar were *also* more similar to the moments just prior to item retrieval. Kragel et al. (2021) also showed that you could decode the category of the item that was about to be retrieved in the moments just before retrieval.

<center>
    <img src="figures/KragEtal21_fig3b_cat.png" width=400>
</center>

We are going to try to look at the same thing here. Just like above, recreate the previous plot as a function of both `serialpos_lag` and `same_category`.

In [None]:
# Q2d
### YOUR CODE HERE

# Question 3: Category similarity during encoding

Finally, we are going to attempt to replicate [Kuhl et al. (2012)](https://www.sciencedirect.com/science/article/pii/S0028393211004088). They asked subjects to do an image-word cued recall task and investigated neural representations of the images during encoding. They showed that when two images from the same category were both recalled, their temporal cortex representations were more similar than when both were forgotten, and that the opposite held for images from different categories.

<center>
    <img src="figures/KuhlEtal12_fig6.jpeg" width=400>
</center>

We have a lot of temporal cortex electrodes, so as a first pass we can try to replicate this effect using all electrodes. The RSA at encoding is already computed above, so we just need to make the right comparisons. You might find it useful to create a `pair_recalled` variable like this:

In [None]:
within_list_WORD_rsa_df['pair_recalled'] = within_list_WORD_rsa_df['recalled_WORD'] + within_list_WORD_rsa_df['recalled_WORD2']

Now try to make a plot similar to the left-hand side of the Kuhl plot, but slightly more complex. Make a plot where the x-axis represents study distance (like in Q1) and the hue reflects the `pair_recalled` variable. Make one plot for within-category comparisons and one for between-category comparisons. You may find it useful to use the `col` parameter of the `catplot` function. Don't be surprised if the results don't seem to totally replicate Kuhl's.
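
Because `recalled_WORD` and `recalled_WORD2` are each 0 or 1, their sum takes three values: 0 (both forgotten), 1 (exactly one recalled), and 2 (both recalled). The sketch below, on made-up rows, attaches readable labels to those values; the label text and the `pair_label` column name are hypothetical choices made for the legend.

```python
import pandas as pd

# Hypothetical toy pairs covering every recall combination.
toy_df = pd.DataFrame({
    "recalled_WORD":  [1, 1, 0, 0],
    "recalled_WORD2": [1, 0, 1, 0],
})
toy_df["pair_recalled"] = toy_df["recalled_WORD"] + toy_df["recalled_WORD2"]

# Hypothetical readable labels for the three possible sums.
pair_labels = {0: "both forgotten", 1: "one recalled", 2: "both recalled"}
toy_df["pair_label"] = toy_df["pair_recalled"].map(pair_labels)
print(toy_df)
```

A labeled column like this can be passed to `hue`, while a `same_category` column passed to `col` splits the figure into one panel for within-category pairs and one for between-category pairs.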

In [None]:
# Q3
### YOUR CODE HERE