# Comparing RSA to Machine Learning

Han Tran, Timo Matzen, University of Amsterdam, Pattern Analysis 2017


<div class='alert alert-info'>
**General feedback**: Well done, Han & Timo! The analyses are well implemented, clear, and (as far as I can see) correct. Also, very interesting results! I liked that you embedded the analysis in the emotion-literature, but this kind of contrasted with your research question, which was more *methodological* in nature. Given that, I would have rather seen some more in depth elaboration on the issues you briefly mentioned when citing Friston and the Naselaris paper. I think this would have created a more coherent report. In terms of code clarity (and efficiency), you could have organized your code a little bit better (by e.g. summarizing your analysis in functions, and looping that across ROIs) to improve clarity. 

<br><br>
Argumentation analysis (10%):  6<br><br>
Embedding in literature (5%): 7<br><br>
Implementation analysis (55%): 9<br><br>
Clarity (30%): 8<br><br>
Total: 8.3
<br><br>
</div>

## Abstract

In the present study, subjects were shown short sentences about either emotional actions ("action" trials), emotional ("interoceptive") feelings ("interoception" trials), or emotional situations ("situation" trials) and were instructed to mentalize as if they were feeling or experiencing the states themselves. We investigated this self-focused emotion imagery task with multivariate pattern analysis using machine learning (ML) and representational similarity analysis (RSA) (Kriegeskorte, 2011; Kriegeskorte & Kievit, 2013; Naselaris, Kay, Nishimoto, & Gallant, 2011). We examined to what extent RSA and decoding can model emotional actions, emotional feelings and emotional situations. Three ROIs were examined with decoding (ML) and RSA: the Insular Cortex, the Anterior Cingulate Gyrus and the medial Prefrontal Cortex. The correlation between conceptual and brain representational dissimilarity matrices was compared to decoding accuracy using Spearman’s rank correlation. RSA and machine learning performed similarly: decoding predicted the categories significantly above chance, and RSA found significant correlations between the brain geometries and the conceptual geometries (categories). On average, the correlation between the RSA and decoding results across the three ROIs was 0.646. Significant correlations between the RSA correlations and the decoding accuracies were found for the *Anterior Cingulate Gyrus* and the *medial Prefrontal Cortex*; no significant correlation was found in the *Insular Cortex*.


<div class='alert alert-info'>
**Feedback**: Clear, but I kind of miss an answer to the "why" question: why did you suspect that RSA and decoding would correlate (hypothesis), and what did you conclude given your results (discussion/conclusion)?
</div>

## Introduction

“Clenching the jaw” as a sign of anger, “holding the breath” as a sign of fear, or “a smile” as a sign of happiness: the literature has long described a strong link between emotions and bodily responses. James and Lange (1884) already suggested that “emotional brain-processes not only resemble the ordinary sensorial brain-processes, but in very truth are nothing but such processes variously combined”, emphasizing the distinct bodily expressions of emotions. However, it would take nearly a century before researchers attempted to disentangle the relationship between bodily and mental states. Emotion research has since proposed that bodily responses and emotions are highly intertwined (Jennings, 1992; Richard Jennings & van der Molen, 2002). Using functional magnetic resonance imaging (fMRI), Critchley, Wiens, Rotshtein, Ohman, & Dolan (2004) found indications that emotions result from brain activity caused by bodily responses.

According to the findings of Critchley et al. (2004), the size and activity of the insular cortex were associated with subjects’ ability to experience specific emotions and with their interoception, the awareness of one’s own sensory signals from the body. Further studies linked the insular cortex to interoceptive awareness contributing to emotional experience (Bechara & Naqvi, 2004; Casanova et al., 2016; Gu, Hof, Friston, & Fan, 2013). Previous studies further demonstrated that the anterior cingulate cortex (ACC) is involved in emotional activity and interoception (Bush, Luu, & Posner, 2000; Etkin, Egner, & Kalisch, 2011; Zaki, Davis, & Ochsner, 2012), given the connection of the ACC to the limbic system and interoceptive markers (Jones, Minati, Harrison, Ward, & Critchley, 2011; Lavin et al., 2013). According to Etkin et al. (2011), the medial prefrontal cortex (mPFC) is involved in emotional processing as well, given that the mPFC is associated with mentalizing emotions and theory of mind (Perry, Hendler, & Shamay-Tsoory, 2012; Sebastian et al., 2012). In a neuroimaging study, Oosterwijk et al. (2015) demonstrated that the mPFC showed high engagement for internal emotion sentences, thereby further supporting previous studies (Hänsel & von Känel, 2008; Medford & Critchley, 2010). In fact, a meta-analysis by Lindquist, Wager, Kober, Bliss-Moreau, & Barrett (2012) demonstrated that the ventromedial prefrontal cortex, the temporal lobe, the precuneus and dorsomedial prefrontal cortex, also known as the default network, are engaged during tasks involving emotion experience and perception. Furthermore, the default network is involved in conceptual processing in general (Binder, Desai, Graves, & Conant, 2009).

Semantic embodiment, the idea that processing descriptions of mental states activates the neural circuits involved in experiencing those states, has found wide support (Moseley et al., 2015; Pulvermüller, 2013). In fact, previous research demonstrated that processing linguistic descriptions of emotions, or of actions related to specific emotions, can activate neural patterns frequently engaged in the representation and generation of internal states and actions (Lench, Flores, & Bench, 2011; Oosterwijk et al., 2015; Pulvermüller & Fadiga, 2010).

Oosterwijk, Snoek, Rotteveel, Barrett, & Scholte (2017) already demonstrated that the category in the self-focused task can be reliably decoded using a linear support vector machine classifier on multi-voxel patterns. However, representational similarity analyses were not applied, so we do not know to what extent the results of the classifier generalize to representational similarity analyses. According to Friston (2009), decoding and encoding are not fundamentally different in terms of effectiveness when evaluating significant performance above chance. If a decoding analysis leads to accurate discrimination between categories in an ROI, then that ROI must represent information about the decoded features (Naselaris et al., 2011). RSA and machine learning should therefore overlap in the information they pick up, which would show as high correlations between them.


<div class='alert alert-info'>
**Feedback**: good intro and great literature review. Although I think your research question was more methodological in nature, which does not warrant an extensive review of the emotion literature, you 'compensated' by the last paragraph, in which you explain and cite literature on why RSA and decoding could be equivalent - which I think was spot on! 
</div>

## Current study

In the present study subjects were shown short sentences about either emotional actions ("action" trials), emotional ("interoceptive") feelings ("interoception" trials), or emotional situations ("situation" trials) and instructed to mentalize as if they were feeling or experiencing the states themselves. We investigated the self-focused emotion imagery task with multivariate pattern analysis using machine learning (ML) and representational similarity analysis (RSA) (Kriegeskorte, 2011; Kriegeskorte & Kievit, 2013; Naselaris et al., 2011). The correlation between conceptual and brain representational dissimilarity matrices (RDM) was compared to decoding accuracy using Spearman’s rank correlation. The comparison between RSA correlation and decoding accuracy was applied to three different brain regions: The insular cortex, mPFC and the ACC. 

### Hypotheses

1. We expect significant decoding accuracy and significant correlation between conceptual and brain RDM in the following ROIs: Insular Cortex, ACC, mPFC.

2. We expect that it is possible to decode whether a participant was presented either the action, interoception or situation stimuli from the following ROIs: Insular Cortex, ACC, mPFC. 

3. Taking into consideration that RSA and decoding try to model similar phenomena through different pipelines (Friston, 2009; Naselaris et al., 2011), we expect a significant Spearman correlation between the RSA results and the decoding accuracy for the following ROIs: Insular Cortex, ACC, mPFC.
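
As an illustration of the third hypothesis, the sketch below correlates RSA correlations with decoding accuracies using Spearman's rank correlation. It assumes that both measures are paired per subject within one ROI; the values are simulated and the variable names (`rsa_corr`, `decoding_acc`) are hypothetical, so this is not the study data or necessarily the exact procedure we followed.

```python
import numpy as np
from scipy.stats import spearmanr

# Simulated per-subject values for one ROI (illustrative only, not study data).
rng = np.random.RandomState(0)
n_subjects = 19
rsa_corr = rng.normal(loc=0.03, scale=0.02, size=n_subjects)

# Simulated accuracies that partly track the RSA correlations, plus noise.
decoding_acc = 0.33 + 3.0 * rsa_corr + rng.normal(scale=0.03, size=n_subjects)

# Spearman's rank correlation between the two per-subject measures.
rho, p_value = spearmanr(rsa_corr, decoding_acc)
print("Spearman rho = %.3f, p = %.4f" % (rho, p_value))
```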


<div class='alert alert-info'>
**Feedback**: Clear.
</div>

## Methods

This study consists of secondary data analyses. The data used in this study was obtained as part of the research project "Shared States". Ethical approval was obtained from the Departmental Ethics Committee at the Department of Psychology, University of Amsterdam. Informed consent was obtained from all adult participants. 

### Analyses

We tested with an alpha level of 0.05. Because we performed 9 statistical tests, we applied a Bonferroni correction and divided 0.05 by 9, which gives a corrected threshold of 0.0056. This should prevent capitalizing on chance.


In [None]:
print ("The Bonferroni corrected threshold is: %f" %  (0.05/9))
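
A minimal sketch of how such a corrected threshold can be applied to a set of p-values. The p-values below are made up for illustration and are not our results.

```python
import numpy as np

alpha = 0.05
n_tests = 9  # number of statistical tests, as stated above
threshold = alpha / n_tests

# Hypothetical p-values (illustrative only).
p_values = np.array([0.0001, 0.003, 0.02])

# A test survives the Bonferroni correction if its p-value is below alpha / n_tests.
significant = p_values < threshold
print("Corrected threshold: %f" % threshold)
print(significant)
```

Note that a p-value of 0.02 would count as significant at the uncorrected level of 0.05 but not after correction.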

### Experimental stimuli and procedure

The study consisted of two parts, the self-focused emotion imagery task (SF-Task) and the other-focused emotion understanding task (OF-Task). In the current analyses we focused only on the SF-Task, in which subjects imagined performing or experiencing actions (e.g., “pushing someone away”), interoceptive sensations (e.g., “increased heart rate”) and situations (e.g., “alone in a park at night”) associated with emotion. 

Participants processed short linguistic cues that described actions (e.g., “pushing someone away”; “making a fist”), interoceptive sensations (e.g., “being out of breath”; “an increased heart rate”), or situations (e.g., “alone in a park at night”; “being falsely accused”) and were instructed to imagine experiencing it in order to elicit self-focused processing of actions, interoceptive or situational information.

The SF-Task was a fully randomized, event-related design and was performed in two runs: 60 sentences were presented (20 per condition) with a different randomization for each participant.  <img src='self_task.png'>


### Participants

Participants received 22.50 Euros per session. Standard exclusion criteria regarding MRI safety were applied and people who were on psychopharmacological medication were excluded a priori.

Twenty-two Dutch students from the University of Amsterdam (14 females; Mage = 21.48, SDage = 1.75) were tested. Two participants were excluded from the model validation dataset because there was not enough time to complete the experimental protocol or due to excessive movement (> 3 mm within data acquisition runs).


### Creating the custom *Mvp* Class and the *make_conceptual_rdm()* function


In [1]:
# Some imports 
import os.path as op
from glob import glob
import numpy as np
import nibabel as nib
import numpy.matlib
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.metrics.pairwise import pairwise_distances
from scipy import stats
from scipy.stats import spearmanr
from scipy.stats import wilcoxon
from skbold.utils import load_roi_mask

# Create the Mvp-Class
class Mvp():
    """ Custom class to load, organize, and process multivoxel MRI patterns. """
    
    def __init__(self, paths):
        
        self.paths = paths
    
    # Loads the nifti-files into a 2D (samples x voxels) array
    def load(self, voxel_dims=(91, 109, 91)):
        
        X = np.zeros((len(self.paths), np.prod(voxel_dims)))

        for i, path in enumerate(self.paths):
    
            # get_fdata() replaces get_data(), which is deprecated (and later removed) in newer nibabel releases
            X[i, :] = nib.load(path).get_fdata().ravel()
        
        self.X = X
    
    # Preprocessing: standardizes each feature (voxel) in the 2D `X` attribute
    def standardize(self):
        self.X = (self.X - self.X.mean(axis=0)) / self.X.std(axis=0)
        # Because the above line may divide by 0 for voxels outside the brain,
        # this leads to NaN (Not a Number) values. The line below sets all NaN values to 0.
        self.X[np.isnan(self.X)] = 0
        
    # Loads the mask (using nibabel), makes a boolean array of the mask,
    # ravels the result into a 1D vector, and finally uses this boolean mask-array to index and 
    # update the columns of the `X` attribute    
    def apply_mask(self, path_to_mask, threshold):
        
        mask = nib.load(path_to_mask).get_fdata()
        mask_bool = mask > threshold
        self.X = self.X[:, mask_bool.ravel()]
        
        
# Function that makes a conceptual RDM:
# 0 where two trials share the same label, 1 where their labels differ
def make_conceptual_rdm(labels):
    N = len(labels)
    new = np.zeros((N, N))
    for i in range(N):
        for p in range(N):
            if labels[i] != labels[p]:
                new[i, p] = 1
    return new
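
To make the masking and standardization steps of the class concrete, here is a small sketch on synthetic data. The toy volume shape (1 x 2 x 3) and the values are assumptions for illustration; the logic mirrors the boolean-mask indexing and the z-scoring with NaN replacement used above.

```python
import numpy as np

# Toy data: 4 samples (trials), each a flattened 1x2x3 "volume" of 6 voxels.
rng = np.random.RandomState(42)
X = rng.normal(size=(4, 6))
X[:, 0] = 0.0  # a constant voxel (e.g. outside the brain) with zero variance

# A 3D mask with the same shape as the volume; raveling it gives one
# boolean per column (voxel) of X.
mask = np.zeros((1, 2, 3))
mask[0, 0, :] = 1
mask_bool = mask > 0
X_masked = X[:, mask_bool.ravel()]

# z-score each voxel; the zero-variance voxel yields 0/0 = NaN, which is set to 0.
with np.errstate(divide='ignore', invalid='ignore'):
    X_std = (X_masked - X_masked.mean(axis=0)) / X_masked.std(axis=0)
X_std[np.isnan(X_std)] = 0

print(X_masked.shape)
print(X_std.mean(axis=0), X_std.std(axis=0))
```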



## RSA

RSA was applied on the single-trial pattern estimates from the SF-Task in which subjects were shown short sentences about either emotional actions, emotional feelings or emotional situations.

The conceptual RDM was created using these three conditions: actions, feelings and situations. Each sample (trial) in our *Mvp object* was matched to its specific condition. The conditions for all the contrasts (single-trial estimates) for both runs were extracted and stored in a text-file called *sample_labels.txt*. The self-built function *make_conceptual_rdm()*  then makes a RDM using the labels.
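
As a sketch of what the conceptual RDM looks like, the example below builds one for four toy labels with NumPy broadcasting. The function name `conceptual_rdm_vectorized` is hypothetical; it is intended to produce the same 0/1 matrix as *make_conceptual_rdm()*, without the double loop.

```python
import numpy as np

def conceptual_rdm_vectorized(labels):
    """Return an N x N matrix with 0 for same-label pairs and 1 otherwise."""
    labels = np.asarray(labels)
    # Compare every label with every other label via broadcasting.
    return (labels[:, None] != labels[None, :]).astype(float)

toy_labels = ['action', 'action', 'interoception', 'situation']
rdm = conceptual_rdm_vectorized(toy_labels)
print(rdm)
```

The two "action" trials get a dissimilarity of 0 with each other and 1 with every other trial.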

The brain RDM was created using pairwise correlation distances between samples (patterns) in *K*-dimensional space, where *K* is the number of voxels in the ROI and $p$ and $q$ are the patterns of two trials.

<div>
 \begin{align} \delta_{correlation} = 1-\frac{\sum_{i=1}^{K}(p_{i} - \bar{p})\cdot(q_{i} - \bar{q})}{\sqrt{\sum_{i=1}^{K}(p_{i} - \bar{p})^{2}}\cdot\sqrt{\sum_{i=1}^{K}(q_{i} - \bar{q})^{2}}} \end{align} 
</div>
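
The correlation distance (one minus the Pearson correlation between two patterns) can be checked by hand against `pairwise_distances`. This is a sketch on random data; the helper `correlation_distance` is a hypothetical name introduced for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

def correlation_distance(p, q):
    """1 minus the Pearson correlation between two patterns p and q."""
    p_c = p - p.mean()
    q_c = q - q.mean()
    return 1.0 - np.sum(p_c * q_c) / np.sqrt(np.sum(p_c ** 2) * np.sum(q_c ** 2))

# Random stand-in for 5 trial patterns over K = 50 voxels.
rng = np.random.RandomState(1)
patterns = rng.normal(size=(5, 50))

rdm = pairwise_distances(patterns, metric='correlation')
manual = correlation_distance(patterns[0], patterns[1])
print("manual: %f, pairwise_distances: %f" % (manual, rdm[0, 1]))
```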

According to Garrido, Vaziri-Pashkam, Nakayama, & Wilmer (2013), no standardization should be applied in RSA, because the resulting data transformation alters the relationships between conditions and makes generalization of the results invalid and unreliable. We therefore did not apply standardization for the RSA analyses. 

Taking the symmetry of the RDMs into consideration, only the $N\cdot(N-1)/2$ pairwise dissimilarity values on one side of the diagonal (excluding the diagonal itself) are selected for the significance test. In our code these are the upper-triangle values, which are flattened into a vector.
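
A short sketch of this extraction step on a small symmetric toy matrix. For $N = 120$ trials the same indexing yields $120\cdot119/2 = 7140$ values.

```python
import numpy as np

# Build a small symmetric toy RDM with a zero diagonal.
N = 5
rng = np.random.RandomState(2)
a = rng.rand(N, N)
rdm = (a + a.T) / 2.0
np.fill_diagonal(rdm, 0)

# Indices of the upper triangle, excluding the diagonal (k=1).
iu = np.triu_indices(N, k=1)
vec = rdm[iu]

# Because the RDM is symmetric, the lower triangle holds the same values
# (in a different order).
il = np.tril_indices(N, k=-1)
print(vec.shape)
```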

The conceptual RDM was correlated with the brain RDM using Spearman’s rank correlation in order to test whether the conceptual RDM significantly explains the brain RDM.
A correlation is preferred for comparing the conceptual and brain RDMs because it is invariant to differences in the mean and variability of the dissimilarities. We used Spearman’s rank correlation specifically because it does not assume a linear relationship between the conceptual and brain RDM (Kriegeskorte, Mur, & Bandettini, 2008). 
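
The sketch below illustrates why a rank correlation does not depend on a linear relationship: Spearman's rho is unchanged when the brain dissimilarities are passed through a monotonic (here exponential) transformation, whereas Pearson's r changes. The data are simulated and the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

rng = np.random.RandomState(3)

# Simulated vectorized RDMs: a binary conceptual model and noisy brain dissimilarities.
conceptual = rng.randint(0, 2, size=200).astype(float)
brain = 0.8 + 0.1 * conceptual + rng.normal(scale=0.1, size=200)

# A strictly monotonic, non-linear transformation of the brain dissimilarities.
brain_transformed = np.exp(5 * brain)

rho_raw = spearmanr(conceptual, brain)[0]
rho_transformed = spearmanr(conceptual, brain_transformed)[0]
r_raw = pearsonr(conceptual, brain)[0]
r_transformed = pearsonr(conceptual, brain_transformed)[0]

print("Spearman: %.4f vs %.4f" % (rho_raw, rho_transformed))
print("Pearson:  %.4f vs %.4f" % (r_raw, r_transformed))
```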



<div class='alert alert-info'>
**Feedback**: Clear.
</div>

### RSA: Insular Cortex

In [None]:

# Integer division (//) keeps the array shape an integer under both Python 2 and Python 3
brain_rdms_insular = np.zeros((120*(120-1)//2, 19)) # matrix of (N*(N-1)/2)-by-19 (19 subjects)

cor_insular = np.zeros(19)

# Path to subject directories data and sorting of the subject paths
sub_paths = glob("/home/nipa_8/SharedStates/SELF/sub0*/")
sub_paths = sorted(sub_paths)

# Path to reg - subject directories data and sorting of the subject paths for the mask
paths_reg = glob("/home/nipa_8/SharedStates/SELF/sub0*/self1.feat/reg" )
paths_reg = sorted(paths_reg)

N = 19 
for i in range(N):
    
    # Globbing over the subject paths to get the paths to tstat files for every subject.
    paths = glob(sub_paths[i] + "self*/stats/tstat*")
    # This sorts the paths so that it is ordered *tstat1.nii.gz (run1), *tstat1.nii.gz (run2), 
    # *tstat2.nii.gz (run1) etc.
    paths = sorted(paths, key=lambda x: int(op.basename(x).split('.')[0].split('tstat')[-1]))
    
    # Loading in the mask of the Insular Cortex ROI
    mask_insular = load_roi_mask(roi_name='Insular Cortex',atlas_name='HarvardOxford-Cortical',
                                    lateralized=False, threshold=20, reg_dir = paths_reg[i])[0]
    
    # Creating boolean array to index our subject data.
    bool_mask_insular = mask_insular > 0
    
    # Making an array of all the labels of the sharedstates data.
    labels = np.loadtxt(sub_paths[i] + "sample_labels.txt" ,dtype = str )
    labels = np.array(labels)
    
    # Using the Mvp class to load in the the data
    mvp = Mvp(paths = paths)
    
    # Changing the voxel dimensions.
    mvp.load(voxel_dims=(80, 80, 37))
    
    # Applying the mask of the insular over the subject data
    mvp.X = mvp.X[:, bool_mask_insular.ravel()]
    
    # Make conceptual RDM
    conc_rdm_insular = make_conceptual_rdm(labels)
    
    # We used the correlation-distance for our RDM
    brain_rdm_insular = pairwise_distances(mvp.X, metric = 'correlation')
    
    # Use np.triu_indices to index both conc_rdm and brain_rdm
    brain_rdm_triu_insular = brain_rdm_insular[np.triu_indices(120,1)]
    conc_rdm_triu_insular = conc_rdm_insular[np.triu_indices(120,1)]
    
    brain_rdms_insular[:,i] = brain_rdms_insular[:,i] + brain_rdm_triu_insular
    
    # calculate the correlation
    cor_insular[i] = cor_insular[i] + spearmanr(conc_rdm_triu_insular, brain_rdm_triu_insular)[0]

print(cor_insular)
print("The average correlation is: %f" % (np.mean(cor_insular)))


#### Calculating the noise ceiling

mean_brain_RDM_insular = np.mean(brain_rdms_insular, axis = 1)

cor_tot_insular = np.zeros(19)


for i in range(N):
    cor_tot_insular[i] = cor_tot_insular[i] + spearmanr(mean_brain_RDM_insular, brain_rdms_insular[:,i])[0]

print ("The noise ceiling is: %f" % (sum(cor_tot_insular)/N))
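
The noise ceiling above correlates each subject's brain RDM with the group mean that includes that subject, which is commonly treated as an upper bound. A minimal sketch on simulated data, assuming one wants to report a leave-one-subject-out lower bound alongside it; the array `subject_rdms` and its dimensions are hypothetical stand-ins for the per-subject RDM vectors.

```python
import numpy as np
from scipy.stats import spearmanr

# Simulated vectorized RDMs: a shared structure plus subject-specific noise.
rng = np.random.RandomState(4)
n_pairs, n_subjects = 45, 8
shared = rng.rand(n_pairs)
subject_rdms = shared[:, None] + rng.normal(scale=0.3, size=(n_pairs, n_subjects))

mean_all = subject_rdms.mean(axis=1)
upper, lower = [], []
for s in range(n_subjects):
    # Upper bound: the group mean includes the subject itself.
    upper.append(spearmanr(mean_all, subject_rdms[:, s])[0])
    # Lower bound: the group mean is computed from all other subjects.
    mean_others = np.delete(subject_rdms, s, axis=1).mean(axis=1)
    lower.append(spearmanr(mean_others, subject_rdms[:, s])[0])

print("Noise ceiling, upper bound: %f" % np.mean(upper))
print("Noise ceiling, lower bound: %f" % np.mean(lower))
```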




In [None]:
# Applying a one sample t-test
tstat_insular, pvalue_insular = stats.ttest_1samp(cor_insular, 0)
print (tstat_insular, pvalue_insular)


#### RSA: Insular Cortex
On average, the conceptual and brain RDMs were correlated at 0.026. A one-sample t-test showed that this mean correlation differed significantly from zero, *t* = 4.964, *p* = 0.0001.

### RSA: Anterior Cingulate Gyrus

In [None]:
# Integer division (//) keeps the array shape an integer under both Python 2 and Python 3
brain_rdms_ACC = np.zeros((120*(120-1)//2, 19)) # matrix of (N*(N-1)/2)-by-19 (19 subjects)

cor_ACC = np.zeros(19)

# Path to subject directories data and sorting of the subject paths
sub_paths = glob("/home/nipa_8/SharedStates/SELF/sub0*/")
sub_paths = sorted(sub_paths)

# Path to reg - subject directories data and sorting of the subject paths for the mask
paths_reg = glob("/home/nipa_8/SharedStates/SELF/sub0*/self1.feat/reg" )
paths_reg = sorted(paths_reg)

N = 19
for i in range(N):
    
    # Globbing over the subject paths to get the paths to tstat files for every subject.
    paths = glob(sub_paths[i] + "self*/stats/tstat*")
    # This sorts the paths so that it is ordered *tstat1.nii.gz (run1), *tstat1.nii.gz (run2), 
    # *tstat2.nii.gz (run1) etc.
    paths = sorted(paths, key=lambda x: int(op.basename(x).split('.')[0].split('tstat')[-1]))
    
    # Loading in the mask of the Anterior Cingulate Gyrus ROI
    mask_ACC = load_roi_mask(roi_name='Cingulate Gyrus, anterior division',atlas_name='HarvardOxford-Cortical',
                                    lateralized=False, threshold=20, reg_dir = paths_reg[i])[0]
    
    # Creating boolean array to index our subject data
    bool_mask_ACC = mask_ACC > 0
    
    # Making an array of all the labels of the sharedstates data.
    labels = np.loadtxt(sub_paths[i] + "sample_labels.txt" ,dtype = str )
    labels = np.array(labels)
    
    # Using the mvp class to load in the the data
    mvp = Mvp(paths = paths)
    
    # Changing Voxel Dimensions
    mvp.load(voxel_dims=(80, 80, 37))
    
    # Applying the mask of the Anterior Cingulate Gyrus over the subject data
    mvp.X = mvp.X[:, bool_mask_ACC.ravel()]
    
    # Make conceptual RDM
    conc_rdm_ACC = make_conceptual_rdm(labels)
    
    # We used the correlation-distance for our RDM
    brain_rdm_ACC = pairwise_distances(mvp.X, metric = 'correlation')
    
    # Use np.triu_indices to index both conc_rdm and brain_rdm
    brain_rdm_triu_ACC = brain_rdm_ACC[np.triu_indices(120,1)]
    conc_rdm_triu_ACC = conc_rdm_ACC[np.triu_indices(120,1)]
    
    brain_rdms_ACC[:,i] = brain_rdms_ACC[:,i] + brain_rdm_triu_ACC
    
    # Calculate correlation
    cor_ACC[i] = cor_ACC[i] + spearmanr(conc_rdm_triu_ACC, brain_rdm_triu_ACC )[0]

print(cor_ACC)
print("The average correlation is: %f" % (np.mean(cor_ACC)))

# noise ceiling

mean_brain_RDM_ACC = np.mean(brain_rdms_ACC, axis = 1)

cor_tot_ACC = np.zeros(19)


for i in range(N):
    cor_tot_ACC[i] = cor_tot_ACC[i] + spearmanr(mean_brain_RDM_ACC, brain_rdms_ACC[:,i])[0]

print ("The noise ceiling is: %f" % (sum(cor_tot_ACC)/N))






In [None]:
# Applying a one sample t-test
tstat_ACC, pvalue_ACC = stats.ttest_1samp(cor_ACC, 0)
print (tstat_ACC, pvalue_ACC)

#### RSA: Anterior Cingulate Gyrus
On average, the conceptual and brain RDMs were correlated at 0.029. A one-sample t-test showed that this mean correlation differed significantly from zero, *t* = 6.074, *p* < 0.001.

### RSA: Frontal Medial Cortex

In [None]:
# Integer division (//) keeps the array shape an integer under both Python 2 and Python 3
brain_rdms_mPFC = np.zeros((120*(120-1)//2, 19)) # matrix of (N*(N-1)/2)-by-19 (19 subjects)

cor_mPFC = np.zeros(19)

# Path to subject directories data and sorting of the subject paths
sub_paths = glob("/home/nipa_8/SharedStates/SELF/sub0*/")
sub_paths = sorted(sub_paths)

# Path to reg - subject directories data and sorting of the subject paths for the mask
paths_reg = glob("/home/nipa_8/SharedStates/SELF/sub0*/self1.feat/reg" )
paths_reg = sorted(paths_reg)

N = 19
for i in range(N):
    
    # Globbing over the subject paths to get the paths to tstat files for every subject
    paths = glob(sub_paths[i] + "self*/stats/tstat*")
    # This sorts the paths so that it is ordered *tstat1.nii.gz (run1), *tstat1.nii.gz (run2), 
    # *tstat2.nii.gz (run1) etc.
    paths = sorted(paths, key=lambda x: int(op.basename(x).split('.')[0].split('tstat')[-1]))
    
    # Loading in the mask of the Frontal Medial Cortex ROI
    mask_mPFC = load_roi_mask(roi_name='Frontal Medial Cortex',atlas_name='HarvardOxford-Cortical',
                                    lateralized=False, threshold=20, reg_dir = paths_reg[i])[0]
    
    # Creating boolean array to index our subject data
    bool_mask_mPFC = mask_mPFC > 0
    
    # Making an array of all the labels of the sharedstates data.
    labels = np.loadtxt(sub_paths[i] + "sample_labels.txt" ,dtype = str )
    labels = np.array(labels)
    
    # Using the Mvp class to load in the the data
    mvp = Mvp(paths = paths)
    
    # Changing the voxel dimensions
    mvp.load(voxel_dims=(80, 80, 37))
    
    # Applying the mask of the Frontal Medial Cortex over the subject data
    mvp.X = mvp.X[:, bool_mask_mPFC.ravel()]
    
    # Make conceptual RDM
    conc_rdm_mPFC = make_conceptual_rdm(labels)

    # We used the correlation-distance for our RDM
    brain_rdm_mPFC = pairwise_distances(mvp.X, metric = 'correlation')
    
    # Use np.triu_indices to index both conc_rdm and brain_rdm
    brain_rdm_triu_mPFC = brain_rdm_mPFC[np.triu_indices(120,1)]
    conc_rdm_triu_mPFC = conc_rdm_mPFC[np.triu_indices(120,1)]
    
    brain_rdms_mPFC[:,i] = brain_rdms_mPFC[:,i] + brain_rdm_triu_mPFC
    
    # Calculate correlation
    cor_mPFC[i] = cor_mPFC[i] + spearmanr(conc_rdm_triu_mPFC, brain_rdm_triu_mPFC)[0]

print(cor_mPFC)
print("The average correlation is: %f" % (np.mean(cor_mPFC)))

# noise ceiling

mean_brain_RDM_mPFC = np.mean(brain_rdms_mPFC, axis = 1)

cor_tot_mPFC = np.zeros(19)


for i in range(N):
    cor_tot_mPFC[i] = cor_tot_mPFC[i] + spearmanr(mean_brain_RDM_mPFC, brain_rdms_mPFC[:,i])[0]

print ("The noise ceiling is: %f" % (sum(cor_tot_mPFC)/N))



In [None]:
# Applying a one sample t-test
tstat_mPFC, pvalue_mPFC = stats.ttest_1samp(cor_mPFC, 0)
print (tstat_mPFC, pvalue_mPFC)

#### RSA: Frontal Medial Cortex
On average, the conceptual and brain RDMs were correlated at 0.025. A one-sample t-test showed that this mean correlation differed significantly from zero, *t* = 3.427, *p* = 0.003.

<div class='alert alert-info'>
**Feedback**: Code looks good and was clear to me. I also liked that you split up the analysis of the three ROIs in three parts (one 'section' for each ROI), but perhaps next time you can write a function - like analyze_rsa() - that does the entire analysis, and then call this separately for each ROI. This saves you from copy-pasting the code and minimizes the chance of mistakes when copy-pasting or altering the code. 
</div>
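
Following this suggestion, the per-ROI analysis could be wrapped in a single function along the lines of the sketch below. The function `analyze_rsa()` and the dictionary `roi_data` are hypothetical; the sketch runs on random stand-in patterns instead of the masked subject data and does not reproduce our results.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics.pairwise import pairwise_distances

def analyze_rsa(X, labels):
    """Correlate the conceptual RDM with the brain RDM for one subject and ROI.

    X: (n_trials, n_voxels) array of masked patterns.
    labels: sequence of n_trials condition labels.
    Returns Spearman's rho and the vectorized brain RDM.
    """
    labels = np.asarray(labels)
    conc_rdm = (labels[:, None] != labels[None, :]).astype(float)
    brain_rdm = pairwise_distances(X, metric='correlation')
    iu = np.triu_indices(len(labels), k=1)  # upper triangle without the diagonal
    rho = spearmanr(conc_rdm[iu], brain_rdm[iu])[0]
    return rho, brain_rdm[iu]

# Random stand-ins for the masked patterns of one subject in two ROIs.
rng = np.random.RandomState(5)
labels = np.repeat(['action', 'interoception', 'situation'], 4)
roi_data = {'Insular Cortex': rng.normal(size=(12, 30)),
            'Frontal Medial Cortex': rng.normal(size=(12, 20))}

results = {}
for roi_name, X in roi_data.items():
    rho, rdm_vec = analyze_rsa(X, labels)
    results[roi_name] = rho
    print("%s: rho = %.3f" % (roi_name, rho))
```

With such a function, the loop over subjects only needs to load and mask the data for each ROI, which removes the three near-identical code cells.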

### Visualization of conceptual and brain RDMs

In [None]:
# Note: the *_rdm_* variables hold the values of the last subject processed in the loops above.
rdm_pairs = [(conc_rdm_insular, brain_rdm_insular, 'Insular RDM'),
             (conc_rdm_ACC, brain_rdm_ACC, 'Anterior Cingulate Gyrus RDM'),
             (conc_rdm_mPFC, brain_rdm_mPFC, 'Frontal Medial Cortex RDM')]

# One figure per ROI: conceptual RDM on the left, brain RDM on the right.
for conc_rdm, brain_rdm, brain_title in rdm_pairs:

    f, axs = plt.subplots(1, 2, figsize=(10, 10))

    axs[0].imshow(conc_rdm, cmap='seismic')
    axs[0].set_title('Conceptual RDM')
    axs[0].axis('off')

    axs[1].imshow(brain_rdm, cmap='seismic')
    axs[1].set_title(brain_title)
    axs[1].axis('off')

    plt.show()



## Machine learning pipeline (ML)

We chose logistic regression (LR) over a support vector machine (SVM) classifier for our analysis. The methods do not differ much in their performance, but LR is preferred in our case as we have more than two classes (Pereira, Mitchell & Botvinick, 2009). We used K-fold cross-validation to fit and test our model on different partitions of the data, with K = 10 folds. There are no hard rules about the number of folds, but 10 is often used as it balances bias and variance. We had 120 samples for every subject, 40 samples for every class. In every fold we had 108 samples in the train set and 12 samples in the test (cross-validation) set. The learned hyper parameters of the train set were cross-validated on the test set in each fold and the accuracy for each fold was calculated. We reduced the number of features by applying an ROI mask for every subject, and we standardized our features column-wise to get all the features on the same scale. We used the *tstats* of the fitted HRFs as the features in our machine learning model. The method for standardization is included in our class; it subtracts the mean of each feature and divides by its standard deviation. We used three ROIs, namely the Insular Cortex, the ACC and the mPFC. The mean accuracy of our algorithm across all folds was calculated for each subject in each ROI, and for each ROI we tested whether our algorithm scored above chance. We had three possible classes: action, interoception and situation. This means that if our algorithm performed at chance, we would end up with an accuracy of .33 (1/3). The results and code for each ROI in our ML analysis are described below.
 

<div class='alert alert-info'>
**Feedback**: Clear. Couple of notes: every ML algorithm can in principle be applied in a multiclass setting. As far as I know, LR has no particular advantage over other algorithms in multiclass settings. That said, choosing LR is of course fine. Also, you state that the 'learned hyper parameters' are cross-validated on the test-set, but that's not the case: you cross-validate the *parameters* (i.e. $\beta$s)!
</div>
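
The fold arithmetic described in this section (120 samples, 10 folds, 108 train and 12 test samples per fold) can be checked with a short sketch. The feature matrix is a placeholder of zeros, since only the labels matter for how `StratifiedKFold` splits the data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# 40 samples per class, in the same layout as the label vector used for decoding.
y = np.repeat([0, 1, 2], 40)
X = np.zeros((120, 1))  # placeholder features

skf = StratifiedKFold(n_splits=10)
fold_sizes = []
for train_idx, test_idx in skf.split(X, y):
    # Record train size, test size and the number of test samples per class.
    fold_sizes.append((len(train_idx), len(test_idx), np.bincount(y[test_idx]).tolist()))

print(fold_sizes[0])
```

Stratification keeps the classes balanced, so every test fold contains 4 samples of each class.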

In [2]:
from sklearn.pipeline import Pipeline
from skbold.utils import print_mask_options
from skbold.utils import load_roi_mask
from glob import glob
import os.path as op
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

# Defining the folds we are going to use, K = 10.
skf = StratifiedKFold(n_splits=10)

# Path to subject directories data, sorted so we have the ordered subject paths.
paths = glob("/home/nipa_4/SharedStates/SELF/sub0*/")
paths = sorted(paths)

path_reg = glob("/home/nipa_4/SharedStates/SELF/sub0*/self1.feat/reg")
path_reg = sorted(path_reg)

# Making the y dataset: action is 0, interoception is 1 and situation is 2.
y = np.repeat([0, 1, 2], 40)

### Machine Learning: Insular Cortex

In [None]:
### Insular Cortex Decoding Analysis ###


clf = LogisticRegression()


Acc_Insular = np.zeros(19)


N = 19

for i in range(N):
    
    
    #Globbing over the subject paths to get the paths to tstat files for every subject.
    paths_t = glob(paths[i] + "self*/stats/tstat*")
    
    
    
    # Ordering the paths so tstat1(run1) comes before tstat2(run2).
    paths_t = sorted(paths_t, key=lambda x: int(op.basename(x).split('.')[0].split('tstat')[-1]))

    
    #Loading in the mask of the insular ROI.
    mask = load_roi_mask(roi_name='Insular Cortex', atlas_name='HarvardOxford-Cortical', 
                 lateralized=False,threshold=20, reg_dir = path_reg[i])[0]
    
    #Creating boolean array to index our subject data.
    mask = mask > 0
    
    #Using the mvp class to load in the the data
    mvp = Mvp(paths = paths_t)
    #Changing the voxel dimensions.
    mvp.load(voxel_dims=(80, 80, 37))
    
    #Applying the mask of the insular over the subject data.
    mvp.X = mvp.X[:, mask.ravel()]
    
    # Standardizing the data for our decoding analysis.
    mvp.standardize()
    
    
    # After masking and standardizing we had a problem with NaN values (probably caused by
    # dividing by a standard deviation of zero); this was problematic for fitting our decoding model.
    mvp.X = np.nan_to_num(mvp.X)
    
    accuracy = []
    
    # Splitting the outcome data and X data into folds.
    folds = skf.split(mvp.X, y)
    
    #Loop with the folds, changing the fold with every iteration.
    for fold in folds:
        
        train_idx, test_idx = fold
        
        X_train = mvp.X[train_idx]
        y_train = y[train_idx]
        
        X_test = mvp.X[test_idx]
        y_test = y[test_idx]
        
        # Fitting the logistic regression model on our train set for fold i.
        clf.fit(X_train, y_train)
        
        # Predicting on the test set using our fitted model for fold i.
        pred = clf.predict(X_test)
        
        # Calculating the accuracy for fold i.
        accuracy.append(sum(y_test == pred)/float(len(pred)))
        
    # Mean accuracy over all folds for subject i for the insular, saved in Acc_Insular.
    Acc_Insular[i] = Acc_Insular[i] + np.mean(accuracy)



In [None]:

print("The mean accuracies across folds for each subject were: ")
print(Acc_Insular)
print("The mean accuracy across subjects was: %f" % (np.mean(Acc_Insular)))
tstat_MLins, pvalue_MLins = stats.ttest_1samp(Acc_Insular, 0.33)
print("The test statistics for the t-test against chance in the Insular Cortex: %f %f" % (tstat_MLins, pvalue_MLins))

#### Machine Learning: Insular Cortex
The mean accuracy of our algorithm was 0.433 in the Insular Cortex. We tested whether the mean accuracies across subjects were above chance (0.33) with a one sample t-test. The t-test indicated that our algorithm made predictions above chance in the Insular Cortex, *t* = 6.747, *p* < 0.001. 
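
The test above is two-sided and uses 0.33 as the chance level. As a sketch of a possible variant, the example below tests simulated accuracies against exactly 1/3 and derives a one-sided p-value ("above chance") from the two-sided one. The accuracies are made up and the variable names are hypothetical; this is not the analysis we report.

```python
import numpy as np
from scipy import stats

# Simulated mean accuracies for 19 subjects (illustrative only).
rng = np.random.RandomState(6)
acc = rng.normal(loc=0.43, scale=0.06, size=19)

chance = 1.0 / 3  # three balanced classes
t_stat, p_two_sided = stats.ttest_1samp(acc, chance)

# One-sided p-value for the hypothesis that accuracy is above chance.
if t_stat > 0:
    p_one_sided = p_two_sided / 2.0
else:
    p_one_sided = 1.0 - p_two_sided / 2.0

print("t = %.3f, two-sided p = %.6f, one-sided p = %.6f" % (t_stat, p_two_sided, p_one_sided))
```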

<div class='alert alert-info'>
**Feedback**: Looks good. Also here a function (like decode_subjects() or something) would have been nice. (But your way is also fine.)
</div>
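
A function of the kind suggested here could look like the sketch below. The name `decode_subject()` is hypothetical and the data are simulated. Note one design difference that is an assumption of this sketch, not what our code does: the standardization sits inside a scikit-learn `Pipeline`, so it is fitted on the training folds only, whereas our analysis standardizes all samples before splitting.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def decode_subject(X, y, n_splits=10):
    """Return the mean cross-validated accuracy for one subject and ROI."""
    pipe = Pipeline([('scale', StandardScaler()),
                     ('clf', LogisticRegression())])
    cv = StratifiedKFold(n_splits=n_splits)
    return cross_val_score(pipe, X, y, cv=cv).mean()

# Simulated patterns: 120 trials x 50 voxels with a class-specific signal.
rng = np.random.RandomState(7)
y = np.repeat([0, 1, 2], 40)
X = rng.normal(size=(120, 50))
X[y == 1, :5] += 1.0    # voxels 0-4 respond to class 1
X[y == 2, 5:10] += 1.0  # voxels 5-9 respond to class 2

acc = decode_subject(X, y)
print("Mean accuracy across folds: %f" % acc)
```

Calling such a function once per subject and ROI would replace the repeated code cells and keep the three analyses identical by construction.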

### Machine Learning: Anterior Cingulate Cortex

In [None]:
clf = LogisticRegression()

Acc_Cing = np.zeros(19)


N = 19

for i in range(N):
    
    
    #Globbing over the subject paths to get the paths to tstat files for every subject.
    paths_t = glob(paths[i] + "self*/stats/tstat*")
    
    # Ordering the paths so tstat1(run1) comes before tstat2(run2).
    paths_t = sorted(paths_t, key=lambda x: int(op.basename(x).split('.')[0].split('tstat')[-1]))
    
    
    
    #Loading in the mask of the cingulate gyrus ROI.
    mask = load_roi_mask(roi_name='Cingulate Gyrus, anterior division', atlas_name='HarvardOxford-Cortical', 
                  lateralized=False,threshold=20, reg_dir = path_reg[i])[0]
    
    
    #Creating boolean array to index our subject data.
    mask = mask > 0
    
    #Using the Mvp class to load in the data
    mvp = Mvp(paths = paths_t)
    
    #Changing the voxel dimensions.
    mvp.load(voxel_dims=(80, 80, 37))
    
    
    #Applying the mask of the cingulate gyrus over the subject data.
    mvp.X = mvp.X[:, mask.ravel()]
    
    # Standardizing the data for our decoding analysis.
    mvp.standardize()
    
    # After masking and standardizing, the data contained NaNs (probably from dividing zero by zero
    # in voxels without variance), which was problematic for fitting our decoding model.
    mvp.X = np.nan_to_num(mvp.X)
    
    accuracy = []
    
    # Splitting the labels (y) and the data (X) into folds.
    folds = skf.split(mvp.X, y)
    
    #Loop with the folds, changing the fold with every iteration.
    for fold in folds:
        
        train_idx, test_idx = fold
        
        X_train = mvp.X[train_idx]
        y_train = y[train_idx]
        
        X_test = mvp.X[test_idx]
        y_test = y[test_idx]
        
        # Fitting the logistic regression model on our train set for fold i.
        clf.fit(X_train, y_train)
        
        # Predicting on the test set using our fitted model for fold i.
        pred = clf.predict(X_test)
        
        # Calculating the accuracy for fold i.
        accuracy.append(sum(y_test == pred)/float(len(pred)))
        
    # Mean accuracy over folds for subject i in the cingulate gyrus, saved in Acc_Cing.
    Acc_Cing[i] = Acc_Cing[i] + np.mean(accuracy)



In [None]:
print ("The mean accuracies across folds for each subject were: " )
print (Acc_Cing)
print ("The mean accuracy across subjects was: %f" % (np.mean(Acc_Cing)))
tstat_Cing, pvalue_Cing = stats.ttest_1samp(Acc_Cing, 0.33)
print ("The test statistics for the t-test against chance in the ACC: %f %f" % (tstat_Cing, pvalue_Cing))

#### Machine Learning: Anterior Cingulate Cortex
The mean accuracy of our algorithm was 0.437 in the ACC. We tested whether the mean accuracies across subjects were above chance (0.33) with a one-sample t-test. The t-test indicated that our algorithm made predictions above chance in the ACC, *t* = 7.061, *p* < 0.001. 

### Machine Learning: Frontal Medial Cortex

In [None]:
### Medial Frontal ###

clf = LogisticRegression()


Acc_Med = np.zeros(19)


N = 19

for i in range(N):
    
    
    #Globbing over the subject paths to get the paths to tstat files for every subject.
    paths_t = glob(paths[i] + "self*/stats/tstat*")
    
    
    
    # Ordering the paths so tstat1(run1) comes before tstat2(run2).
    paths_t = sorted(paths_t, key=lambda x: int(op.basename(x).split('.')[0].split('tstat')[-1]))

    
    #Loading in the mask of the frontal medial cortex ROI.
    mask = load_roi_mask(roi_name='Frontal Medial Cortex', atlas_name='HarvardOxford-Cortical', 
                  lateralized=False,threshold=20, reg_dir = path_reg[i])[0]
    
    #Creating boolean array to index our subject data.
    mask = mask > 0
    
    #Using the Mvp class to load in the data
    mvp = Mvp(paths = paths_t)
    #Changing the voxel dimensions.
    mvp.load(voxel_dims=(80, 80, 37))
    
    #Applying the mask of the frontal medial cortex over the subject data.
    mvp.X = mvp.X[:, mask.ravel()]
    
    # Standardizing the data for our decoding analysis.
    mvp.standardize()
    
    
    # After masking and standardizing, the data contained NaNs (probably from dividing zero by zero
    # in voxels without variance), which was problematic for fitting our decoding model.
    mvp.X = np.nan_to_num(mvp.X)
    
    accuracy = []
    
    # Splitting the labels (y) and the data (X) into folds.
    folds = skf.split(mvp.X, y)
    
    #Loop with the folds, changing the fold with every iteration.
    for fold in folds:
        
        train_idx, test_idx = fold
        
        X_train = mvp.X[train_idx]
        y_train = y[train_idx]
        
        X_test = mvp.X[test_idx]
        y_test = y[test_idx]
        
        # Fitting the logistic regression model on our train set for fold i.
        clf.fit(X_train, y_train)
        
        # Predicting on the test set using our fitted model for fold i.
        pred = clf.predict(X_test)
        
        # Calculating the accuracy for fold i.
        accuracy.append(sum(y_test == pred)/float(len(pred)))
        
    # Mean accuracy over folds for subject i in the Frontal Medial Cortex, saved in Acc_Med.
    Acc_Med[i] = Acc_Med[i] + np.mean(accuracy)





In [None]:
print ("The mean accuracies across folds for each subject were: " )
print (Acc_Med)
print ("The mean accuracy across subjects was: %f" % (np.mean(Acc_Med)))
tstat_Med, pvalue_Med = stats.ttest_1samp(Acc_Med, 0.33)
print ("The test statistics for the t-test against chance in the mPFC: %f %f" % (tstat_Med, pvalue_Med))

#### Machine Learning: Frontal Medial Cortex
The mean accuracy of our algorithm was 0.410 in the mPFC. We tested whether the mean accuracies across subjects were above chance (0.33) with a one-sample t-test. The t-test indicated that our algorithm made predictions above chance in the mPFC, *t* = 5.141, *p* < 0.001. 

## Results of the comparison between RSA and Machine Learning

In [None]:
### Correlation Between Decoding & RSA ###
from scipy.stats import spearmanr

cor_tot_insular, pvalue_tot_insular = spearmanr(Acc_Insular,cor_insular)
print ("The Spearman correlation test between RSA and ML for the Insular Cortex yielded : Rho = %f and P = %f" %(cor_tot_insular, pvalue_tot_insular))
cor_tot_ACC, pvalue_tot_ACC = spearmanr(Acc_Cing, cor_ACC)
print ("The Spearman correlation test between RSA and ML for the ACC yielded : Rho = %f and P = %f" %(cor_tot_ACC, pvalue_tot_ACC))
cor_tot_mPFC, pvalue_tot_mPFC = spearmanr(Acc_Med, cor_mPFC)
print ("The Spearman correlation test between RSA and ML for the mPFC yielded : Rho = %f and P = %f" %(cor_tot_mPFC, pvalue_tot_mPFC))



RSA and machine learning were compared using Spearman's rank correlation between each subject's decoding accuracy and that subject's RSA correlation (the correlation between the conceptual and brain RDMs). Averaged across the three ROIs, this correlation was 0.646. 
In the Insular Cortex, the correlation between decoding accuracy and the RSA correlation was not significant at our Bonferroni-corrected alpha level, *r*S = 0.556, *p* = 0.013. 
In the ACC, the correlation between decoding accuracy and the RSA correlation was significant, *r*S = 0.749, *p* = 0.0002.
In the mPFC, the correlation between decoding accuracy and the RSA correlation was significant, *r*S = 0.633, *p* = 0.004.
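
To make the comparison concrete, here is a small sketch showing that Spearman's rho is the Pearson correlation of the ranks, checked against `scipy.stats.spearmanr`. The per-subject values are invented for illustration, and `spearman_by_hand` is a hypothetical helper name. The last lines recompute the average of the three ROI coefficients reported for this comparison (0.556, 0.749, 0.633).

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def spearman_by_hand(a, b):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    ranks_a = rankdata(a)  # ties receive average ranks
    ranks_b = rankdata(b)
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


# Hypothetical per-subject values (not our data).
acc_values = [0.36, 0.41, 0.44, 0.39, 0.50, 0.47]
rsa_values = [0.02, 0.05, 0.09, 0.06, 0.11, 0.08]

rho_manual = spearman_by_hand(acc_values, rsa_values)
rho_scipy, p_scipy = spearmanr(acc_values, rsa_values)
print("manual rho = %.3f, scipy rho = %.3f" % (rho_manual, rho_scipy))

# The average reported in the text is the plain mean of the ROI coefficients.
mean_rho = float(np.mean([0.556, 0.749, 0.633]))
print("mean rho across ROIs = %.3f" % mean_rho)
```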

<div class='alert alert-info'>
**Feedback**: Wow, I am still amazed by these correlations between decoding/RSA! Very interesting. But on the other hand, as explained by Friston (2009), it does make sense... Anyway, well done.
</div>

## Discussion & conclusions

The hypothesis that using ML we could classify the three classes within subjects above chance could not be rejected for any of the ROIs investigated. The hypothesis that our conceptual RDM and brain RDM show a significant correlation could not be rejected for any of the ROIs either. From this we can conclude that the ROIs we chose represent a significant amount of information about the three classes, namely action, interoception and situation. The conclusions we can draw from both analyses are the same, and the Spearman rank correlation between the results of the analyses demonstrated how similar the results are. We already expected that ML and RSA would yield somewhat similar results, but in this case they proved to be even more similar than we expected. The most compelling evidence was found in the ACC, with a correlation of 0.749 between the two analyses. We did not find a significant correlation between the two analyses in the Insular Cortex, although the correlation was still rather high: 0.556. We think that the correlation in the Insular Cortex did not reach significance because of the conservative Bonferroni correction that we used to adjust our alpha value. This method is too conservative when not all tests are independent of each other; we used the same data and the same subjects for both the RSA and ML analyses, thus our tests were not entirely independent. However, the fact that most of our results were convincing even when using a correction that was too conservative shows how similar these methods are. Our analyses show that RSA with a simple conceptual RDM, in which the different classes are completely dissimilar from each other (not correlated or high Euclidean distance), yields (almost) the same results as a decoding classification algorithm. 
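
The kind of simple conceptual RDM referred to above can be sketched as follows: trial pairs from the same class get dissimilarity 0 and all other pairs get 1, and the upper triangle of this model RDM is rank-correlated with the brain RDM. This is an illustration on synthetic patterns, not our pipeline; the function names are hypothetical and the correlation distance used for the brain RDM is an assumption.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def categorical_rdm(labels):
    """Model RDM: 0 for trial pairs from the same class, 1 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)


def rsa_correlation(patterns, labels, metric="correlation"):
    """Spearman correlation between the brain RDM and the categorical RDM.

    Only the upper triangle (excluding the diagonal) is compared.
    """
    brain_rdm_vec = pdist(patterns, metric=metric)  # condensed upper triangle
    model_rdm = categorical_rdm(labels)
    upper = np.triu_indices(len(labels), k=1)  # same ordering as pdist
    rho, _ = spearmanr(brain_rdm_vec, model_rdm[upper])
    return float(rho)


# Synthetic demo: 30 trials, 50 voxels, class-specific mean patterns.
rng = np.random.RandomState(0)
labels_demo = np.repeat(["action", "interoception", "situation"], 10)
class_means = rng.randn(3, 50)
patterns_demo = class_means[np.repeat([0, 1, 2], 10)] + 0.5 * rng.randn(30, 50)
rsa_demo = rsa_correlation(patterns_demo, labels_demo)
print("RSA correlation: %.3f" % rsa_demo)
```

Because the model RDM only encodes class membership, it carries the same information as the class labels given to the classifier, which is one way to see why the two analyses can agree so closely.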

For future research it would be interesting to see how similar the results of these two analyses are when the model used for the conceptual RDM in RSA is more complex. If we could redo the analyses, we would also split the data before starting our pipelines, using one half for the RSA analysis and the other half for the ML analysis. A high correlation between the two analyses would then be a more compelling argument, and the split would also remove the problem of non-independent tests. Unfortunately, we did not think of this before running all our analyses.
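
A split of this kind could look like the sketch below, which divides trial indices into two disjoint, class-balanced halves. The function name and the demo labels are hypothetical, and with fMRI data one might prefer to split by run rather than by trial; that choice is left open here.

```python
import numpy as np


def stratified_half_split(labels, seed=0):
    """Split trial indices into two disjoint halves, balanced per class.

    Returns (idx_a, idx_b). If a class has an odd number of trials,
    the extra trial goes to the first half.
    """
    labels = np.asarray(labels)
    rng = np.random.RandomState(seed)
    idx_a, idx_b = [], []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)
        half = (len(cls_idx) + 1) // 2
        idx_a.extend(cls_idx[:half])
        idx_b.extend(cls_idx[half:])
    return np.sort(idx_a), np.sort(idx_b)


# Demo with 20 hypothetical trials per class.
split_labels = np.repeat(["action", "interoception", "situation"], 20)
rsa_idx, ml_idx = stratified_half_split(split_labels)
print("RSA half: %d trials, ML half: %d trials" % (len(rsa_idx), len(ml_idx)))
```

The RSA analysis would then use only the rows in `rsa_idx` and the decoding analysis only those in `ml_idx`.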

-------------

<div class='alert alert-info'>
**Feedback**: I think you got the wording about (rejecting) hypotheses kind of mixed up here. You reject the null-hypothesis, but you seem to imply that you reject your alternative hypothesis... Also, while I liked that you thought about and implemented multiple comparison correction, your explanation about (in)dependence between tests is not really valid. This is because the multiple testing arises from testing multiple ROIs, not necessarily because your test is between estimates from the same subjects (as you seem to imply). It could very well be that the ROIs yield very different (independent) estimates, so in that case it would be sensible to apply Bonferroni MCC... 
</div>
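
For reference, the Bonferroni correction discussed above divides alpha by the number of tests. The sketch below applies it to the three *p*-values of the RSA and decoding comparison. The number of tests entering the correction is not restated in this section, so `n_tests` is an assumption and two candidate values are shown; `bonferroni` is a hypothetical helper name.

```python
def bonferroni(p_values, n_tests=None, alpha=0.05):
    """Return (corrected_alpha, list of booleans) under Bonferroni correction.

    n_tests defaults to the number of p-values supplied.
    """
    if n_tests is None:
        n_tests = len(p_values)
    corrected_alpha = alpha / n_tests
    return corrected_alpha, [p < corrected_alpha for p in p_values]


# p-values of the three Spearman tests (Insular Cortex, ACC, mPFC).
p_rois = [0.013, 0.0002, 0.004]

# n_tests is an assumption of this sketch, so two values are compared.
for m in (3, 6):
    corrected, significant = bonferroni(p_rois, n_tests=m)
    print("m = %d, corrected alpha = %.4f, significant: %s" % (m, corrected, significant))
```

Which tests survive depends on how many tests are counted, so the number of comparisons behind the corrected alpha should be stated explicitly.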

## References

Bechara, A., & Naqvi, N. (2004). Listening to your heart: interoceptive awareness as a gateway to feeling. Nature Neuroscience, 7(2), 102–103. Retrieved from http://dx.doi.org/10.1038/nn0204-102
Binder, J. R., Desai, R. H., Graves, W. W., & Conant, L. L. (2009). Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cerebral Cortex (New York, N.Y. : 1991), 19(12), 2767–2796. http://doi.org/10.1093/cercor/bhp055
Bush, G., Luu, P., & Posner, M. I. (2000). Cognitive and emotional influences in anterior cingulate cortex. Trends in Cognitive Sciences, 4(6), 215–222. http://doi.org/10.1016/S1364-6613(00)01483-2
Casanova, J. P., Madrid, C., Contreras, M., Rodriguez, M., Vasquez, M., & Torrealba, F. (2016). A role for the interoceptive insular cortex in the consolidation of learned fear. Behavioural Brain Research, 296, 70–77. http://doi.org/10.1016/j.bbr.2015.08.032
Critchley, H. D., Wiens, S., Rotshtein, P., Ohman, A., & Dolan, R. J. (2004). Neural systems supporting interoceptive awareness. Nature Neuroscience, 7(2), 189–195. http://doi.org/10.1038/nn1176
Etkin, A., Egner, T., & Kalisch, R. (2011). Emotional processing in anterior cingulate and medial prefrontal cortex. Trends in Cognitive Sciences, 15(2), 85–93. http://doi.org/10.1016/j.tics.2010.11.004
Friston, K. J. (2009). Modalities, modes, and models in functional neuroimaging. Science (New York, N.Y.), 326(5951), 399–403. http://doi.org/10.1126/science.1174521
Garrido, L., Vaziri-Pashkam, M., Nakayama, K., & Wilmer, J. (2013). The consequences of subtracting the mean pattern in fMRI multivariate correlation analyses. Frontiers in Neuroscience. Retrieved from http://journal.frontiersin.org/article/10.3389/fnins.2013.00174
Gu, X., Hof, P. R., Friston, K. J., & Fan, J. (2013). Anterior Insular Cortex and Emotional Awareness. The Journal of Comparative Neurology, 521(15), 3371–3388. http://doi.org/10.1002/cne.23368
Hänsel, A., & von Känel, R. (2008). The ventro-medial prefrontal cortex: a major link between the autonomic nervous system, regulation of emotion, and stress reactivity? BioPsychoSocial Medicine, 2(1), 21. http://doi.org/10.1186/1751-0759-2-21
James, W., & Lange. (1884). Mind.
Jennings, J. R. (1992). Is it important that the mind is in a body? Inhibition and the heart. Psychophysiology, 29(4), 369–383.
Jennings, J. R., & van der Molen, M. W. (2002). Cardiac timing and the central regulation of action. Psychological Research, 66(4), 337–349. http://doi.org/10.1007/s00426-002-0106-5
Jones, C. L., Minati, L., Harrison, N. A., Ward, J., & Critchley, H. D. (2011). Under pressure: response urgency modulates striatal and insula activity during decision-making under risk. PloS One, 6(6), e20942. http://doi.org/10.1371/journal.pone.0020942
Kriegeskorte, N. (2011). Pattern-information analysis: from stimulus decoding to computational-model testing. NeuroImage, 56(2), 411–421. http://doi.org/10.1016/j.neuroimage.2011.01.061
Kriegeskorte, N., & Kievit, R. A. (2013). Representational geometry: integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8), 401–412. http://doi.org/10.1016/j.tics.2013.06.007
Kriegeskorte, N., Mur, M., & Bandettini, P. (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. Retrieved from http://journal.frontiersin.org/article/10.3389/neuro.06.004.2008
Lavin, C., Melis, C., Mikulan, E., Gelormini, C., Huepe, D., & Ibanez, A. (2013). The anterior cingulate cortex: an integrative hub for human socially-driven interactions. Frontiers in Neuroscience. Retrieved from http://journal.frontiersin.org/article/10.3389/fnins.2013.00064
Lench, H. C., Flores, S. A., & Bench, S. W. (2011). Discrete emotions predict changes in cognition, judgment, experience, behavior, and physiology: a meta-analysis of experimental emotion elicitations. Psychological Bulletin, 137(5), 834–855. http://doi.org/10.1037/a0024244
Lindquist, K. A., Wager, T. D., Kober, H., Bliss-Moreau, E., & Barrett, L. F. (2012). The brain basis of emotion: A meta-analytic review. The Behavioral and Brain Sciences, 35(3), 121–143. http://doi.org/10.1017/S0140525X11000446
Medford, N., & Critchley, H. D. (2010). Conjoint activity of anterior insular and anterior cingulate cortex: awareness and response. Brain Structure and Function, 214(5), 535–549. http://doi.org/10.1007/s00429-010-0265-x
Moseley, R. L., Shtyrov, Y., Mohr, B., Lombardo, M. V, Baron-Cohen, S., & Pulvermüller, F. (2015). Lost for emotion words: What motor and limbic brain activity reveals about autism and semantic theory. NeuroImage, 104, 413–422. http://doi.org/10.1016/j.neuroimage.2014.09.046
Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400–410. http://doi.org/10.1016/j.neuroimage.2010.07.073
Oosterwijk, S., Mackey, S., Wilson-Mendenhall, C., Winkielman, P., & Paulus, M. P. (2015). Concepts in context: Processing mental state concepts with internal or external focus involves different neural systems. Social Neuroscience, 10(3), 294–307. http://doi.org/10.1080/17470919.2014.998840
Oosterwijk, S., Snoek, L., Rotteveel, M., Barrett, L. F., & Scholte, H. S. (2017). Shared States: Using MVPA to test neural overlap between self-focused emotion imagery and other-focused emotion understanding. In Press.
Pereira, F., Detre, G., & Botvinick, M. (2011). Generating Text from Functional Brain Images. Frontiers in Human Neuroscience, 5, 72. http://doi.org/10.3389/fnhum.2011.00072
Perry, D., Hendler, T., & Shamay-Tsoory, S. G. (2012). Can we share the joy of others? Empathic neural responses to distress vs joy. Social Cognitive and Affective Neuroscience, 7(8), 909–916. Retrieved from http://dx.doi.org/10.1093/scan/nsr073
Pulvermüller, F. (2013). Semantic embodiment, disembodiment or misembodiment? In search of meaning in modules and neuron circuits. Brain and Language, 127(1), 86–103. http://doi.org/10.1016/j.bandl.2013.05.015
Pulvermüller, F., & Fadiga, L. (2010). Active perception: sensorimotor circuits as a cortical basis for language. Nature Reviews Neuroscience, 11(5), 351–360. Retrieved from http://dx.doi.org/10.1038/nrn2811
Sebastian, C. L., Fontaine, N. M. G., Bird, G., Blakemore, S.-J., De Brito, S. A., McCrory, E. J. P., & Viding, E. (2012). Neural processing associated with cognitive and affective Theory of Mind in adolescents and adults. Social Cognitive and Affective Neuroscience, 7(1), 53–63. Retrieved from http://dx.doi.org/10.1093/scan/nsr023
Zaki, J., Davis, J. I., & Ochsner, K. N. (2012). Overlapping activity in anterior insula during interoception and emotional experience. NeuroImage, 62(1), 493–499. http://doi.org/10.1016/j.neuroimage.2012.05.012





