In [6]:
import warnings
from sklearn.exceptions import ConvergenceWarning

# Suppress warnings globally
warnings.filterwarnings("ignore", category=ConvergenceWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)

from utils import (
    load_index
)

from mapping import (
    consistency_across_trials,
    interanimal_consistency_1v1,
    interanimal_consistency_pool
)

In [7]:
index_df = load_index("../Preproc2/data/combined_index.csv")

# Theoretical background

## General framework

The mouse representations of each stimulus (i.e. image) are stored in tensors of shape $(T, I, U)$, where:
- $T$ is the number of trials (i.e. repetitions of the same stimulus)
- $I$ is the number of images (i.e. different stimuli)
- $U$ is the number of units (i.e. neurons) in a (specimen, area) pair


We want to compute the similarity between two representations of the same stimuli. Three cases are of interest: different trials of the same (specimen, area) pair; the same area in different specimens; and a model representation (e.g. one layer of AlexNet) compared with a (specimen, area) representation.
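
As a minimal illustration of the $(T, I, U)$ layout described above, the sketch below builds a random tensor and averages it over trials. The sizes and the variable names (`responses`, `trial_mean`) are made up for the example and do not come from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
T, I, U = 10, 20, 5
responses = rng.normal(size=(T, I, U))  # axes: (trials, images, units)

# Averaging over the trial axis gives an (images, units) matrix.
trial_mean = responses.mean(axis=0)
print(responses.shape, trial_mean.shape)
```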


## Corrected similarity

In all cases we have to deal with very noisy signals, so we use a statistical method to estimate the **corrected similarity**, exploiting the repetitions of the same stimulus (trials).

Let $A$ and $B$ be two representations of shape $(T, I, U_A)$ and $(T, I, U_B)$ respectively (e.g. $A$ could be a model representation, $B$ a neural representation, i.e. a specific specimen and area representation).

$$
\text{sim}_\text{corrected}(A,B) = \frac{\text{sim}_\text{observed}(A,B)}{\sqrt{\text{reliability}(A) \cdot \text{reliability}(B)}}
$$
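
The formula above amounts to dividing the observed similarity by the geometric mean of the two reliabilities. A minimal sketch follows; the function name `corrected_similarity` and the choice to return NaN when the reliability product is not positive are assumptions made for this example, not the behavior of the `mapping` module.

```python
import numpy as np

def corrected_similarity(sim_observed, reliability_a, reliability_b):
    """Divide an observed similarity by sqrt(reliability_a * reliability_b)."""
    product = reliability_a * reliability_b
    if product <= 0:
        # Assumption: the correction is treated as undefined in this case.
        return float("nan")
    return sim_observed / np.sqrt(product)

# Illustrative numbers only: sqrt(0.50 * 0.72) = 0.6, so 0.30 / 0.6 = 0.5.
value = corrected_similarity(0.30, 0.50, 0.72)
print(value)
```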

One way to estimate the reliability of a representation is to compute the similarity between two halves of the trials, for example using split-half correlation (Spearman-Brown corrected):

$$
\text{reliability}(A) = \text{SB}(\text{sim}_\text{observed}(A_1, A_2))
$$
$$
\text{SB}(r) = \frac{2r}{1+r}
$$

where $A_1$ and $A_2$ are defined by taking two halves of the trials of representation $A$, and computing the mean response over trials for each half. So $A_1$ and $A_2$ have shape $(I, U_A)$, i.e. they are the traditional **design matrices**.

The split is done by randomly assigning each trial to one of the two halves; the similarity is then averaged over multiple random splits (`n_bootstrap=100`).
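
The split-half procedure can be sketched as follows. This is an illustration under stated assumptions, not the code in `mapping`: the similarity used here (`flat_pearson`, a Pearson correlation of the flattened matrices) is a placeholder, the helper names are hypothetical, and the Spearman-Brown correction is applied to each split before averaging.

```python
import numpy as np

def spearman_brown(r):
    """Spearman-Brown correction: SB(r) = 2r / (1 + r)."""
    return 2.0 * r / (1.0 + r)

def split_half_means(x, rng):
    """Randomly split the trials of x (T, I, U) in two and average each half."""
    perm = rng.permutation(x.shape[0])
    half = x.shape[0] // 2
    return x[perm[:half]].mean(axis=0), x[perm[half:]].mean(axis=0)

def flat_pearson(m1, m2):
    """Placeholder similarity: Pearson correlation of the flattened matrices."""
    return np.corrcoef(m1.ravel(), m2.ravel())[0, 1]

def reliability(x, similarity=flat_pearson, n_bootstrap=100, seed=0):
    """Average the Spearman-Brown corrected split-half similarity over splits."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_bootstrap):
        x1, x2 = split_half_means(x, rng)  # each half has shape (I, U)
        values.append(spearman_brown(similarity(x1, x2)))
    return float(np.mean(values))

# Synthetic data: a fixed signal plus independent noise on every trial.
rng = np.random.default_rng(1)
signal = rng.normal(size=(1, 30, 8))
noisy = signal + rng.normal(size=(20, 30, 8))
rel = reliability(noisy, n_bootstrap=20)
print(rel)
```

Passing a different `similarity` callable (for example an RSA or CKA score) reuses the same splitting logic.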


### RSA, CKA similarity

$$
\text{RSA}_{\text{corrected}}(A,B) = \frac{\frac{1}{2}[\text{RSA}_{\text{observed}}(A_1,B_2) + \text{RSA}_{\text{observed}}(A_2,B_1)]}{\sqrt{\text{reliability}(A) \cdot \text{reliability}(B)}}
$$

The same correction applies analogously to CKA.
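
A minimal sketch of the cross-half RSA correction, assuming the four half-averaged matrices are already available. The specific RSA definition used here (correlation-distance RDMs compared with a Pearson correlation of their upper triangles) is an assumption for the example; the notebook's `mapping` module may use a different dissimilarity or a rank correlation.

```python
import numpy as np

def rdm(m):
    """Dissimilarity matrix: 1 - Pearson correlation between image rows of m (I, U)."""
    return 1.0 - np.corrcoef(m)

def rsa_observed(m1, m2):
    """Correlate the upper triangles of the two RDMs (assumed RSA definition)."""
    iu = np.triu_indices(m1.shape[0], k=1)
    return np.corrcoef(rdm(m1)[iu], rdm(m2)[iu])[0, 1]

def spearman_brown(r):
    return 2.0 * r / (1.0 + r)

def rsa_corrected(a1, a2, b1, b2):
    """Cross-half RSA divided by the geometric mean of the two reliabilities."""
    numerator = 0.5 * (rsa_observed(a1, b2) + rsa_observed(a2, b1))
    rel_a = spearman_brown(rsa_observed(a1, a2))
    rel_b = spearman_brown(rsa_observed(b1, b2))
    return numerator / np.sqrt(rel_a * rel_b)

# Synthetic example: A and B are noisy linear readouts of the same latent code.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 5))
w_a = rng.normal(size=(5, 40))
w_b = rng.normal(size=(5, 60))

def noisy_half(weights):
    return latent @ weights + 0.1 * rng.normal(size=(30, weights.shape[1]))

a1, a2 = noisy_half(w_a), noisy_half(w_a)
b1, b2 = noisy_half(w_b), noisy_half(w_b)
score = rsa_corrected(a1, a2, b1, b2)
print(score)
```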

### PLS regression

Given two representations $A$ and $B$, we fit a PLS regression model to predict $B$ from $A$ (with 25 components, as in the paper), and then we compute the corrected correlation between the predicted $\hat{B}$ and the true $B$.

We perform a train-test split on the images (50% train, 50% test), giving $A_{train}, A_{test}, B_{train}, B_{test}$.

We then split each of the train and test sets into two halves of trials: $A_{train,1}, A_{train,2}, A_{test,1}, A_{test,2}, B_{train,1}, B_{train,2}, B_{test,1}, B_{test,2}$.

We fit two PLS models, one on each half of the training trials:

$$
f_1: A_{train,1} \rightarrow B_{train,1}
$$

$$
f_2: A_{train,2} \rightarrow B_{train,2}
$$

We use the two models to predict the two halves of the test set:

$$
\hat{B}_1 = f_1(A_{test,1})
$$

$$
\hat{B}_2 = f_2(A_{test,2})
$$

Then for each unit $j$ in $B$, we compute the corrected correlation between the predicted and true responses:

$$
s_j = \frac{\text{corr}(\hat{B}_1^j, B_2^j)}{\sqrt{\text{SB}(\text{corr}(\hat{B}_1^j,\hat{B}_2^j)) \cdot \text{SB}(\text{corr}(B_1^j,B_2^j))}}
$$

Finally we take the median over all units:

$$
\text{PLS}_{\text{corrected}}(A,B) = \text{median}_j(s_j)
$$

## Comparing a mouse with itself across different areas

Using the above formalism, we compute the similarity between the representation of each (specimen, area) pair and itself. This is a sanity check that the code works as expected: the similarity should be 1 (or very close to 1).

In [None]:
consistency_PLS = consistency_across_trials(index_df)
consistency_PLS

# 1 vs 1 mapping: interanimal consistency

Now we can compute the similarity between each pair of different specimens, for the same area. This is the inter-animal consistency, as defined in the paper. Then we aggregate the results by taking the median across all pairs of specimens, for each area.

In [4]:
# interanimal_consistency_1v1_df_pls = interanimal_consistency_1v1(index_df, n_boot=5, n_splits=5)
# interanimal_consistency_1v1_df_pls.groupby('Area')['Mean'].mean() #median()

## Pooling version

In the pooling version, we compute the similarity between each specimen and the pooled representation of all the other specimens, for the same area. Then we aggregate the results across all specimens, for each area. The cells below aggregate with `.mean()` on the `Mean` column rather than with the median.

In [6]:
iac_pool_pls = interanimal_consistency_pool(index_df, n_boot=100, n_splits=10)
iac_pool_pls.groupby('Area')['Mean'].mean()



Area
VISal    0.545608
VISam    0.483868
VISl     0.329097
VISp     0.455990
VISpm    0.461907
VISrl    0.451550
Name: Mean, dtype: float64

In [9]:
iac_pool_pls = interanimal_consistency_pool(index_df, n_boot=5, n_splits=5)
print(iac_pool_pls.groupby('Area')['Mean'].mean())

Area
VISal    0.541422
VISam    0.484949
VISl     0.333043
VISp     0.457581
VISpm    0.464098
VISrl    0.461443
Name: Mean, dtype: float64


In [3]:
iac_pool_pls = interanimal_consistency_pool(index_df, n_boot=1, n_splits=1)
iac_pool_pls.groupby('Area')['Mean'].mean()



Area
VISal    0.558592
VISam    0.473066
VISl     0.354944
VISp     0.487118
VISpm    0.478385
VISrl    0.470565
Name: Mean, dtype: float64