COGS 4290 RSA

David Halpern 4/4/23

# How could we know if my mind represents concepts similarly to yours?

If I see a square and you see a square, do our internal representations look the same? [Shepard & Chipman (1970)](https://www.sciencedirect.com/science/article/abs/pii/0010028570900022) note, following arguments from Wittgenstein, that there is basically no way to know. "If there is, as we suppose, some internal event that corresponds to our perception of a square, our ability to form an association between this event and the word “square” requires only that this event have a regular relation to the external object is one of causality, not of structural isomorphism. Such an event could be the activation of some group of neurons, such, perhaps, as a “cell assembly” of the sort described by Hebb (1949). To insist, in addition, that these neurons must be spatially arranged in precisely the form of a square, themselves, does not in the least help to explain how they come to trigger the naming response “square.” On the contrary, it only attempts the absurdity of putting off until later the whole process of pattern recognition that must by definition precede the pivotal event in question. (With about as much logic, one might as well argue that the neurons that signal that the square is green should themselves be green!)"

So what can we know about the nature of your representations and my representations? Long before the advent of modern cognitive neuroscience, Shepard & Chipman tackled this problem in the following way: they gave participants a deck of 105 cards, each bearing the names of two states (105 is the number of distinct pairs among 15 states). They asked the participants to sort the deck from the most similar pair to the least similar pair. They then did the same thing again but with pictures of the states instead of names. From these rankings, they used a statistical procedure called multidimensional scaling to map out what the state representations must have looked like in order to generate the data. This is what they got:

<center>
    <img src="figures/ChipShep70_fig1.png" width=400>
</center>

Amazingly, the results looked pretty similar whether they were ranking by image or by name! And they could give a (rough) interpretation of the similarity space:

<center>
    <img src="figures/ChipShep70_fig2.png" width=400>
</center>
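To make the multidimensional scaling step a little more concrete, here is a minimal sketch of *classical* (metric) MDS in `numpy`. This is a simpler relative of the procedure applied to rank data and is not meant to reproduce Shepard & Chipman's analysis; the function name `classical_mds` and the simulated 2-d "true" configuration are assumptions made purely for illustration.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def classical_mds(dist, n_dims=2):
    """Recover coordinates (up to rotation/translation) from a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering the squared distances gives a matrix of inner products
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the n_dims largest eigenvalues and their eigenvectors
    order = np.argsort(eigvals)[::-1][:n_dims]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))


rng = np.random.default_rng(1)
true_points = rng.normal(size=(15, 2))   # hypothetical 2-d layout of 15 items
dist = pairwise_distances(true_points)   # all we "observe" are the distances

recovered = classical_mds(dist, n_dims=2)
recovered_dist = pairwise_distances(recovered)
print(np.allclose(recovered_dist, dist))
```

The recovered coordinates will generally be a rotated or reflected version of the originals, but the distances between items are preserved, and those distances are what the similarity data constrain.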

Most importantly, subjects were pretty consistent not only with themselves across the image vs. name rankings but also with the group!

<center>
    <img src="figures/ChipShep70_tab1.png" width=400>
</center>

The key insight of Shepard and Chipman was to look at the "second order isomorphism" -- it's difficult to know if my representation of Louisiana is similar to yours because we can't just ask. But if both of us think Louisiana is more similar to Florida than to Colorado, that gets us a lot of the way there. Of course, with modern cognitive neuroscience methods, we can now ask a new question: does my neural activity represent things similarly to yours?
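A toy illustration of second-order isomorphism, under made-up assumptions: "my" representation is a random set of points, and "yours" is the same set of points viewed through an arbitrary rotation/reflection. The variable names and the choice of 5 items in 3 dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "my" representation: 5 items embedded in a 3-d space
mine = rng.normal(size=(5, 3))

# Hypothetical "your" representation: same geometry, but passed through a
# random orthogonal matrix, so the raw coordinates look nothing alike
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
yours = mine @ q


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


# First order: do the representations themselves match?
first_order_match = np.allclose(mine, yours)
# Second order: do the relations among the representations match?
second_order_match = np.allclose(pairwise_distances(mine),
                                 pairwise_distances(yours))
print(first_order_match, second_order_match)
```

Comparing the raw coordinates fails, but comparing the item-by-item distance matrices succeeds, which is the sense in which two systems can agree at the second order without agreeing at the first.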

# How should we measure the similarity of neural activity?

With people, we just asked which of these are more similar. But with groups of neurons or electrodes or other sources of neural measurement (e.g. fMRI voxels), there are lots of ways to talk about similarity. For instance, consider A, B, C in figure A below:

<center>
    <img src="figures/geom_distance.png" width=400>
</center>

Imagine the points A, B and C are the responses of two electrodes to seeing the names of three states. How similar are these responses to each other? One natural way to think about similarity is Euclidean distance. This is the actual distance in the plane defined by the two electrode responses in figure A. How far apart are A and C? On Measure 1, A is 0 and C is 2, so they are a distance of 2 apart. On Measure 2, A is 0 and C is 2 again, so the same distance. If we draw a straight line between A and C (if you remember from high school geometry class), it would have a length of $\sqrt{2^2 + 2^2} = \sqrt{8} \approx 2.83$. B is the same distance from A because it is a distance of 2 along Measure 1 (but in the opposite direction from C) and 2 along Measure 2. B and C are the same on Measure 2 but 4 apart on Measure 1, so a line between them would have length $\sqrt{4^2 + 0^2} = 4$. So B and C are the same Euclidean distance from A. Therefore, we might say that, from the neural activity's perspective, the states at B and C are equally similar to the state at A.
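The Euclidean distances can be checked numerically. This sketch takes the coordinates from the description above, A = (0, 0), C = (2, 2) and B = (-2, 2) as (Measure 1, Measure 2); the helper name `euclidean` is just for illustration.

```python
import numpy as np

# (Measure 1, Measure 2) coordinates as described in the text
a = np.array([0.0, 0.0])
b = np.array([-2.0, 2.0])
c = np.array([2.0, 2.0])


def euclidean(p, q):
    """Straight-line distance between two points."""
    return np.sqrt(np.sum((p - q) ** 2))


d_ab = euclidean(a, b)
d_ac = euclidean(a, c)
d_bc = euclidean(b, c)
print(d_ab, d_ac, d_bc)
```

With these coordinates, B and C come out equally far from A (both $\sqrt{8}$), while B and C are 4 apart from each other.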

However, imagine we sampled a lot of data from the two neural measures and it looks like the black dots in figure B above. The two measures tend to be correlated. In some sense, this might imply that B should be less similar to A than C is, because it would be rare for the measures to take on B's values. A slightly more complicated distance measure, known as Mahalanobis distance, takes these statistical dependencies into account when computing distance. There is more detail about how it's calculated in your book.
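As a rough sketch of the idea (not the book's derivation), Mahalanobis distance replaces the plain squared difference with one weighted by the inverse covariance of the data. The correlation of 0.8 between the two measures, the sample size, and the point coordinates below are all assumptions chosen to mimic the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed: the two measures are positively correlated (r = 0.8)
true_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=5000)

# Estimate the covariance from the sampled data and invert it
cov_inv = np.linalg.inv(np.cov(samples, rowvar=False))


def mahalanobis(p, q, cov_inv):
    """Distance between p and q, accounting for the covariance of the data."""
    diff = p - q
    return np.sqrt(diff @ cov_inv @ diff)


a = np.array([0.0, 0.0])
b = np.array([-2.0, 2.0])   # runs against the correlation
c = np.array([2.0, 2.0])    # runs along the correlation

d_ab = mahalanobis(a, b, cov_inv)
d_ac = mahalanobis(a, c, cov_inv)
print(d_ab, d_ac)
```

Even though B and C are equally far from A in Euclidean terms, B ends up much farther from A in Mahalanobis terms because it lies in a direction the data rarely go.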

A separate concern that often comes up with neural measures is that the raw magnitudes of the numbers aren't always meaningful. For instance, if two people have slightly different skull thickness, then the voltage picked up by scalp EEG will vary in magnitude. If the magnitudes of the points in the figure above doubled, their Euclidean distances would double too. This is not ideal for comparing across subjects, and similar artifacts can make it hard to compare signals across sessions from the same subject or even within a session. Therefore, it is often preferable to measure similarity using angle-based measures. These measures, which include cosine similarity and correlation, look at the angle between the two vectors defined by two points relative to the origin. The cosine similarity between a point at \[1, 2\] and a point at \[10, 20\] is 1 because these points lie at exactly the same angle relative to the origin. The figure below, from [Walther et al. (2016)](https://www.sciencedirect.com/science/article/pii/S1053811915011258), shows how various changes to the signals affect distance-based and angle-based measures.
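A small sketch of the contrast between distance-based and angle-based measures. The points `p` and `q` are arbitrary values picked for illustration, and `cosine_similarity` is a hand-rolled helper rather than a library function.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v, measured from the origin."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Same direction, very different magnitudes
cos_same_direction = cosine_similarity(np.array([1.0, 2.0]),
                                       np.array([10.0, 20.0]))

# Doubling every response changes Euclidean distance but not the angle
p = np.array([1.0, 2.0])
q = np.array([3.0, 1.0])
dist_before = np.linalg.norm(p - q)
dist_after = np.linalg.norm(2 * p - 2 * q)
cos_before = cosine_similarity(p, q)
cos_after = cosine_similarity(2 * p, 2 * q)
print(cos_same_direction, dist_before, dist_after, cos_before, cos_after)
```

Scaling all the responses by a common factor scales the Euclidean distance by that factor, while the cosine similarity is unchanged.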

<center>
    <img src="figures/WaltEtal16_fig1_w_caption.png" width=400>
</center>

Walther et al. give an argument for why they prefer Euclidean/Mahalanobis distances to angle-based measures when investigating neural activity that is very selective for certain kinds of stimuli (e.g. FFA for faces/PPA for places), and they discuss cross-validating distance measures to reduce bias. However, these concerns are somewhat more relevant for perception studies than memory studies for reasons that are beyond the scope of this class. Because most of the historical literature has used correlation/angle-based measures, we will use those for the examples in this class.

In the 01_compute_rsa notebook, we compute the RSA matrices for several subjects in the categorized free recall dataset. In the 01_analyze_rsa notebook, we try to replicate various results in the literature.

In [1]:
import numpy as np
import xarray as xr

Generate a dataset where we observe 10 neural features across 15 states

In [2]:
# mean_features = np.random.normal(size=10)

In [3]:
A = np.diag(np.ones(15))  # a diagonal covariance matrix

In [4]:
A

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0.

Add in a couple true correlations across "states"

In [5]:
A[1, 2] = A[2, 1] = .5
A[6, 8] = A[6, 8] = .8
A

array([[1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 1. , 0.5, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 0.5, 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0.8, 0. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. ,
        0. , 0. ],
       [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. ,
       

In [6]:
features = np.random.multivariate_normal(size=10, mean=np.zeros(15), cov=A).T
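A matrix passed as `cov` should be symmetric and positive semidefinite. The following is a minimal sketch of a sanity check; the helper `is_valid_covariance` and its tolerance are hypothetical, not part of `numpy`.

```python
import numpy as np


def is_valid_covariance(mat, tol=1e-10):
    """True if mat is symmetric with no eigenvalue below -tol."""
    symmetric = np.allclose(mat, mat.T)
    # eigvalsh is only meaningful for symmetric input, so check symmetry first
    return bool(symmetric and np.linalg.eigvalsh(mat).min() >= -tol)


# Both mirrored entries set: a valid covariance matrix
cov = np.eye(15)
cov[1, 2] = cov[2, 1] = 0.5
cov[6, 8] = cov[8, 6] = 0.8
valid = is_valid_covariance(cov)

# Only one of the two mirrored entries set: not symmetric, so not valid
lopsided = np.eye(15)
lopsided[6, 8] = 0.8
invalid = is_valid_covariance(lopsided)
print(valid, invalid)
```

Running a check like this before sampling makes it easier to catch a covariance entry that was filled in on only one side of the diagonal.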


Now we want to look at two sets of features and see how similar they are

In [7]:
features[1], features[2]

(array([ 0.14010331,  2.41049863,  0.47318073,  0.34510661,  0.17460727,
        -0.30782859,  0.70971758, -2.12595088, -0.65893486, -1.43938892]),
 array([ 0.05503875,  2.2370451 , -0.35378325,  0.98279543, -0.26459798,
        -0.98762072,  0.62583672,  0.29198124, -0.81102996, -0.56434377]))

Generally `features[1]` is negative when `features[2]` is negative and positive when `features[2]` is positive, so the two feature vectors seem pretty similar (this is because we made these two states correlated when we built the covariance matrix). This should be reflected when we compute their correlation. [Pearson's correlation](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) is the covariance of the two variables divided by the product of their standard deviations.
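In symbols, writing $x$ and $y$ for the two feature vectors (notation introduced here only for illustration), with $n$ entries each and sample means $\bar{x}$ and $\bar{y}$, the sample versions of these quantities are:

$$
\operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
$$

Because the same $n-1$ appears in the numerator and the denominator of $r_{xy}$, it cancels, so the correlation comes out the same whether $n$ or $n-1$ is used, as long as the choice is consistent.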

Covariance is the expectation (aka expected value aka mean) of (feature 1 - mean(feature 1)) * (feature 2 - mean(feature 2)). This measure is going to be high if feature 1 is usually above its mean when feature 2 is also above its mean. Let's compute it.

In [8]:
np.mean(features[1]), np.mean(features[2])

(-0.027888912720931414, 0.12113215709427871)

In [10]:
(features[1] - np.mean(features[1])) 

array([ 0.16799222,  2.43838754,  0.50106965,  0.37299552,  0.20249618,
       -0.27993968,  0.73760649, -2.09806197, -0.63104595, -1.41150001])

In [9]:
(features[1] - np.mean(features[1])) * (features[2] - np.mean(features[2]))

array([-0.01110318,  5.15941576, -0.23796569,  0.32139654, -0.07810888,
        0.31038392,  0.37227336, -0.35845197,  0.58823713,  0.96754927])

Now we take the average of those products (but it's a sample statistic, so we use n-1 in the denominator instead of n=10)

In [11]:
feat_cov = (np.sum((features[1] - np.mean(features[1])) * 
        (features[2] - np.mean(features[2]))) / (10 - 1)) 
feat_cov 

0.7815140293730856

We can check it against `numpy`'s version

In [12]:
np.cov(features[1], features[2])

array([[1.53555569, 0.78151403],
       [0.78151403, 0.93892271]])

Now just divide that by the sample standard deviations

In [13]:
feat_cov / (np.std(features[1], ddof=1) * np.std(features[2], ddof=1))

0.6508622317713612

And check that against numpy's version of correlation

In [14]:
np.corrcoef(features[1], features[2])

array([[1.        , 0.65086223],
       [0.65086223, 1.        ]])

Now we want a way to compute these correlations for all of the states and keep track of all of the state info. Like we do with EEG, we'll store it using an xarray DataArray

In [15]:
states_da = xr.DataArray(data=features, 
             dims=["states", "features"], 
             coords={'features': ['feat_' + str(i) for i in range(10)],
                     'states': ['Minn.', 'Ore.', 'W.V.', 'Colo.', 'Ala.', 'Ill.',
                               'Nev.', 'Nebr.', 'Okla.', 'Ida.', 'Fla.', 'La.', 
                                'S.C.', 'Mo.', 'Me.']})

In [16]:
states_da

Now we can use the xr.corr function to compute the correlation of features between states, but it's not so straightforward because the states dimension has the same name in both arrays. Just plugging the array in twice like this gives the wrong answer, because xarray matches the two arrays along their shared dimensions. The result just correlates each state with itself, which is always going to be 1.

In [17]:
xr.corr(states_da, states_da, dim='features')

What we want is each state correlated with every other one. We need to rename the states variable to get the right answer

In [20]:
states_da2 = states_da.rename({'states': 'states2'})

In [21]:
states_corr = xr.corr(states_da, states_da2, dim='features')

In [22]:
states_corr

In [23]:
states_corr_df = states_corr.to_dataframe('corr').reset_index()
states_corr_df

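|     | states | states2 | corr      |
|-----|--------|---------|-----------|
| 0   | Minn.  | Minn.   | 1.000000  |
| 1   | Minn.  | Ore.    | -0.269354 |
| 2   | Minn.  | W.V.    | 0.049426  |
| 3   | Minn.  | Colo.   | 0.391610  |
| 4   | Minn.  | Ala.    | 0.368687  |
| ... | ...    | ...     | ...       |
| 220 | Me.    | Fla.    | 0.145198  |
| 221 | Me.    | La.     | 0.058229  |
| 222 | Me.    | S.C.    | 0.451794  |
| 223 | Me.    | Mo.     | 0.491934  |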


The entry for Ore. and W.V. is the same correlation we calculated by hand above!

In [24]:
states_corr_df.query('states == "Ore." and states2 == "W.V."')

|    | states | states2 | corr     |
|----|--------|---------|----------|
| 17 | Ore.   | W.V.    | 0.650862 |
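As a cross-check that does not depend on xarray, `np.corrcoef` applied to a states-by-features array returns the full state-by-state correlation matrix in one call. The sketch below regenerates its own data with a fixed seed (an assumption for reproducibility), so the numbers will differ from the outputs above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same kind of simulated data as above: 15 states x 10 features
cov = np.eye(15)
cov[1, 2] = cov[2, 1] = 0.5
cov[6, 8] = cov[8, 6] = 0.8
features = rng.multivariate_normal(mean=np.zeros(15), cov=cov, size=10).T

# np.corrcoef treats each row as a variable, giving a 15 x 15 matrix
corr_matrix = np.corrcoef(features)

# Hand computation for one pair of states, following the steps above
x, y = features[1], features[2]
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
manual_corr = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(corr_matrix.shape, corr_matrix[1, 2], manual_corr)
```

The matrix is symmetric with ones on the diagonal, and each off-diagonal entry matches the step-by-step calculation for that pair of states.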
