# Week 3: representational similarity analysis (RSA)

This week's tutorial is about RSA! We'll be looking at how to transform patterns into RDMs using various distance measures, how to test the relation between feature-RDMs and brain-RDMs, and take a look at exploratory RDM visualization using multidimensional-scaling (MDS).

We will use data from the SharedStates dataset - a within-subject dataset used previously for a cross-decoding analysis (see [here](https://github.com/lukassnoek/SharedStates/blob/master/sharedstates_fullarticle_draft.pdf) for a draft of the corresponding article, which is currently in press). In the first couple of examples of this tutorial, we'll use the single-trial pattern estimates from the "self-task", in which subjects were shown short sentences about either emotional actions ("action" trials), emotional ("interoceptive") feelings ("interoception" trials), or emotional situations ("situation" trials). They were instructed to imagine as if they were experiencing/doing the actions/feelings/situations themselves (see figure below). <img src='self_task.png'>

The self-task was done twice (i.e. in two runs). Each run contained 20 trials of each condition (action, interoception, situation), so in total (across runs) we have 120 trials (20 trials \* 3 conditions \* 2 runs). While we applied a (cross-)decoding analysis on this dataset for the original study, we will apply some RSA techniques on this data for this week's tutorial. (The SharedStates dataset is, by the way, also one of the data-sets that you can use for your final project (more info on Blackboard/Week 4)).

In terms of skills, after this tutorial you are be able to:

* Create representational dissimilarity matrices (RDMs) from brain patterns using various metrics;
* Create custom "conceptual" feature-RDMs based on categorical conditions;
* Statistically test the similarity between feature- and brain-RDMS;
* Exploratively visualize brain-RDMs with [MDS](http://scikit-learn.org/stable/auto_examples/manifold/plot_mds.html);

### Names
student 1: fill in your name ...

student 2: ... and the name of the person you're working with

### 1. Loading in and organizing data (yet again ...)


In [1]:
def extract_condition_names(design_file):
    
    contrasts = sum(1 if 'ContrastName' in line else 0
                        for line in open(design_file))

    n_lines = sum(1 for line in open(design_file))

    df = pd.read_csv(design_file, delimiter='\t', header=None,
                     skipfooter=n_lines - contrasts, engine='python')

    cope_labels = list(df[1].str.strip())  # remove spaces

    # Here, numeric extensions of labels (e.g. 'positive_003') are removed
    labels = []
    for c in cope_labels:
        parts = [x.strip() for x in c.split('_')]
        if parts[-1].isdigit():
            label = '_'.join(parts[:-1])
            labels.append(label)
        else:
            labels.append(c)

    return labels
    
def generate_categorical_rdm(y):
    rdm = np.vstack([y == y_tmp for y_tmp in y])
    return rdm.astype(int) * -1

def kendalltau_a(a, b):
    
    n = a.size
    K = 0
    for k in range(n - 1):
        pairrel_a=np.sign(a[k]-a[k+1:n])
        pairrel_b=np.sign(b[k]-b[k+1:n])
        K += np.sum(pairrel_a * pairrel_b)
    return K/(n*(n-1) / 2.0)

def test_rdm(X, candidate_rdm, dist_func, corr_func, average=False, mask=None):
    
    if average:
        X = (X[::2, :] + X[1::2, :]) / 2.0
        candidate_rdm = candidate_rdm[::2, ::2]
    
    if mask is not None:
        X = X[:, mask]
    
    brain_rdm = pairwise_distances(mvp.X, metric=dist_func)
    brain_rdm = brain_rdm[np.triu_indices(brain_rdm.shape[0], k=1)]
    candidate_rdm = candidate_rdm[np.triu_indices(brain_rdm.shape[0], k=1)]
    score = corr_func(brain_rdm, candidate_rdm) 
    
    if isinstance(score, (list, tuple)):
        score = score[0]
    return score

In [2]:
import numpy as np
import nibabel as nib

class Mvp():
    """ Custom class to load, organize, and process multivoxel MRI patterns. """
    
    def __init__(self, paths):
        
        self.paths = paths
        
    def load(self, voxel_dims=(91, 109, 91)):
        
        X = np.zeros((len(self.paths), np.prod(voxel_dims)))

        # Start your loop here!
        for i, path in enumerate(self.paths):
    
            X[i, :] = nib.load(path).get_data().ravel()
        
        self.X = X
    
    def standardize(self):
        self.X = (self.X - self.X.mean(axis=0)) / self.X.std(axis=0)
        self.X[np.isnan(self.X)] = 0
        
    def apply_mask(self, path_to_mask, threshold):
        
        mask = nib.load(path_to_mask).get_data()
        mask_bool = mask > threshold
        self.X = self.X[:, mask_bool.ravel()]

In [3]:
from glob import glob
from skbold.utils import load_roi_mask, parse_roi_labels
import os.path as op

import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.metrics.pairwise import pairwise_distances
import pandas as pd
from scipy import stats



In [7]:
lud = {'Actie': 0,
       'Interoception': 1,
       'Situation': 2}
mask_names = sorted(parse_roi_labels(atlas_type='HarvardOxford-Cortical', lateralized=True).keys())
subs = sorted(glob('/media/lukas/data/PatternAnalysis/week_4/SharedStatesData/SELF/sub*'))

scores = np.zeros((len(subs), len(mask_names)))

for i, sub_path in enumerate(subs):
    sub = op.basename(sub_path)
    print(sub)
    reg_dir = sub_path + '/%s-self1.feat/reg' % sub
    path = sub_path + '/*.feat/stats/tstat*.nii.gz'
    paths = sorted(glob(path), key=lambda x: int(op.basename(x).split('.')[0].split('tstat')[-1]))
    labels = extract_condition_names(sub_path + '/%s-self1.feat/design.con' % sub)
    labels.extend(labels)
    labels = sorted(labels)
    labels = np.array([lud[tmp] for tmp in labels])
    cat_rdm = generate_categorical_rdm(labels)

    masks = load_roi_mask('all', atlas_name='HarvardOxford-Cortical',
                          lateralized=True, threshold=0, reg_dir=reg_dir, which_hemifield='left')

    for ii, mask in enumerate(masks):
        mvp = Mvp(paths)
        mvp.load(voxel_dims=(80, 80, 37))
        mvp.X = mvp.X[:,mask.ravel()]
        mvp.X = (mvp.X[0::2, :] + mvp.X[1::2, :]) / 2.0
        brain_rdm = pairwise_distances(mvp.X, metric='correlation')
        brain_rdm_triu = brain_rdm[np.triu_indices(brain_rdm.shape[0], k=1)]
        cat_rdm_tmp = cat_rdm[::2]
        cat_rdm_triu = cat_rdm_tmp[np.triu_indices(brain_rdm.shape[0], k=1)]
        scores[i, ii] = stats.spearmanr(brain_rdm_triu, cat_rdm_triu)[0]
print(scores)

sub007
sub002
sub003
sub004
sub005
sub006
sub007
sub008
sub009
sub010
sub011
sub012
sub013
sub018
sub020
sub021
sub022
sub023
sub024
sub025


In [16]:
mask_names[scores.mean(axis=0).argmax()]

u'Left Parietal Operculum Cortex'

In [17]:
scores[:, 30]

array([ 0.11582821, -0.13442059, -0.01340976,  0.01146322, -0.07725815,
        0.15712431,  0.01684405, -0.03401773, -0.0085345 ,  0.20194157,
        0.15010206, -0.07092855,  0.26829997,  0.20716204, -0.03940524,
       -0.01251444, -0.0116169 , -0.02123825, -0.0269398 , -0.05358783])