# Final Exam Take Home (Part 1) - fMRI analysis

#### **All files needed for the final exam are in the folder final**

### The Final Exam is divided into 2 parts. The first part uses Linear Discriminant Analysis (LDA) to analyze fMRI data on working memory. It compares an **n-back (n=2)** task with **target detection**, using different types of visual stimuli known to activate different parts of the brain. In an n-back task, subjects are presented with a sequence of stimuli and must detect whether the current stimulus repeats the one presented n items earlier. This task is popular in studies of working memory because the subject has to hold the last n stimuli, in order, in memory, and the load on working memory can be varied parametrically by increasing n.
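The n-back rule can be made concrete: a trial is a target when its stimulus matches the one shown n trials earlier. The sketch below is illustrative only. The function name `nback_targets` and the example sequence are hypothetical and are not part of the exam data. Note that the '0bk' condition in this data set is target detection against a fixed target, which this function does not model.

```python
from typing import Hashable, List, Sequence


def nback_targets(stimuli: Sequence[Hashable], n: int) -> List[bool]:
    """Return True for each trial whose stimulus repeats the one n trials back.

    The first n trials can never be targets, because no stimulus exists
    n positions earlier to compare against.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


# Hypothetical sequence of face identities in a 2-back block.
sequence = ["A", "B", "A", "C", "A", "C"]
print(nback_targets(sequence, n=2))  # [False, False, True, False, True, True]
```

Raising n lengthens the span of items that must be held in memory to evaluate each comparison. This is the parametric load manipulation described above.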

### The task here was designed to manipulate separate aspects of working memory. First, the two tasks differ in how strongly they engage brain areas involved in working memory computations. Second, the different types of visual stimuli engage different parts of the brain that process the stimuli and may separately hold the stimulus representation in working memory.

### The data for this task is in the file 'WM_fmri_subjectaverage.mat'. For each of the 100 participants, the data has been averaged over fMRI scans, separately for each experimental condition.




## README 

### After you load the data (import loadmat from hdf5storage), you will get a dictionary with the following keys.  

#### condition_index - index for each data sample, indicating the experimental condition 
#### conditions - conditions in the experiment 
#### fmri - fMRI data averaged over scans within each participant, nregions x (nsubjects x nconditions)
#### nconditions - number of conditions (in this case, 8)
#### nregions - number of regions (always 360)
#### nsubjects - number of subjects (always 100)
#### subject - indexes which subject each average comes from. 
#### task - which task the data comes from.  

#### For the Working Memory data set, each condition is labeled in 2 ways.  
#### First the task is either
* #### '0bk' - target detection (condition_index 0-3) 
* #### '2bk' - working memory task (condition_index 4-7)
#### The stimulus category is one of 
* #### 'body' - body parts (condition_index = 0/4) 
* #### 'faces' - human faces (condition_index = 1/5)
* #### 'places' - landscapes (condition_index = 2/6) 
* #### 'tools' - common tools (condition_index = 3/7)
#### The 8 conditions reflect a combination of task and category labeled by condition_index 0 to 7 
* #### 0: '0bk_body'
* #### 1: '0bk_faces'
* #### 2: '0bk_places'
* #### 3: '0bk_tools'
* #### 4: '2bk_body'
* #### 5: '2bk_faces'
* #### 6: '2bk_places'
* #### 7: '2bk_tools'
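The eight codes are ordered task-major: the four 0bk categories come first, followed by the same four 2bk categories. Both labels can therefore be recovered from condition_index with integer arithmetic. This is a minimal sketch. It assumes condition_index is an integer array with values 0 to 7, which may need flattening after loading from the .mat file. The helper name `decode_conditions` is hypothetical.

```python
import numpy as np

TASKS = np.array(["0bk", "2bk"])
CATEGORIES = np.array(["body", "faces", "places", "tools"])


def decode_conditions(condition_index):
    """Split condition codes 0-7 into task and stimulus-category labels."""
    idx = np.asarray(condition_index).ravel().astype(int)  # flatten 2-D .mat vectors
    task_code = idx // 4       # 0 -> '0bk', 1 -> '2bk'
    category_code = idx % 4    # 0 body, 1 faces, 2 places, 3 tools
    return TASKS[task_code], CATEGORIES[category_code]


tasks, categories = decode_conditions([0, 5, 2, 7])
print(tasks)       # ['0bk' '2bk' '0bk' '2bk']
print(categories)  # ['body' 'faces' 'places' 'tools']
```

The same arithmetic gives the binary task label used for the '0bk' versus '2bk' contrast: `condition_index // 4`.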
    

#### Your task is to analyze this data using LDA to answer questions about working memory and brain networks. 

#### 1.  Analyze the effect of engaging working memory in the fMRI data by computing the difference in fMRI response between the '0bk' (target detection) and '2bk' (working memory) conditions. Identify the ROIs that show the strongest difference by ranking the ROIs by effect size. (Hint: don't forget to z-score the data, so that your differences reflect standardized effect sizes.) Make a table of the top 10 ROIs with the largest-magnitude difference. The table should show the ROI name, the network it belongs to, and the value of the standardized difference (including sign).
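One way to write the standardized difference the hint asks for is given below. It assumes that each ROI r is z-scored across all N = nsubjects × nconditions samples, and that the difference is taken as '0bk' minus '2bk':

$$
z_{r,i} = \frac{x_{r,i} - \bar{x}_r}{s_r},
\qquad
d_r = \frac{1}{|I_{0}|}\sum_{i \in I_{0}} z_{r,i} \;-\; \frac{1}{|I_{2}|}\sum_{i \in I_{2}} z_{r,i}
$$

Here $x_{r,i}$ is the response of ROI $r$ in sample $i$, and $\bar{x}_r$ and $s_r$ are that ROI's mean and standard deviation over all samples. $I_0$ is the set of samples with condition_index 0-3 ('0bk'), and $I_2$ is the set with condition_index 4-7 ('2bk'). Under this convention, $d_r$ is measured in units of the ROI's overall standard deviation, and a positive $d_r$ means a larger response in '0bk' than in '2bk'. Ranking uses $|d_r|$, while the table reports $d_r$ with its sign.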

In [None]:
import numpy as np
from matplotlib import pyplot as plt
from hdf5storage import loadmat, savemat
from scipy.stats import zscore

hcppath = '/home/ramesh/Teaching/classdata/fmri/hcp_task/'
datapath = hcppath + 'processed/'

regions = np.load('regions.npy')       # ROI metadata: column 0 = ROI name, column 1 = network label
roi_names = regions[:, 0]              # names of each of the 360 ROIs from the parcellation
network_names = regions[:, 1]          # the network each ROI "belongs" to
networks = np.unique(regions[:, 1])    # the unique network names

data = loadmat('WM_fmri_subjectaverage.mat')
# .mat vectors can load as 2-D (1 x N or N x 1) arrays; flatten them so they can index columns
condition_index = np.ravel(data['condition_index']).astype(int)
conditions = data['conditions']
fmri = data['fmri']                    # nregions x (nsubjects * nconditions)
nconditions = data['nconditions']
nregions = data['nregions']
nsubjects = data['nsubjects']
subject = np.ravel(data['subject'])
task = data['task']

# z-score each ROI across all samples (axis=1), so differences are in standard-deviation units
z = zscore(fmri, axis=1)
# standardized difference: positive means a larger response in '0bk' than in '2bk'
diff = np.mean(z[:, condition_index < 4], axis=1) - np.mean(z[:, condition_index > 3], axis=1)

n = 10
ordered_index = np.argsort(np.abs(diff))  # ascending order of |difference|
topn = ordered_index[::-1][:n]            # the n largest magnitudes, largest first
for j in topn:
    print(roi_names[j], network_names[j], diff[j])
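The question asks for a table rather than printed lines. The sketch below collects the top ROIs into rows and prints them as an aligned text table. The helper names `top_roi_table` and `format_table` are hypothetical. The toy arrays stand in for `roi_names`, `network_names`, and `diff` from the cell above and are not results from the exam data.

```python
import numpy as np


def top_roi_table(roi_names, network_names, diff, n=10):
    """Return (roi, network, signed difference) rows for the n largest |diff|."""
    order = np.argsort(np.abs(diff))[::-1][:n]  # indices sorted by descending magnitude
    return [(str(roi_names[i]), str(network_names[i]), float(diff[i])) for i in order]


def format_table(rows):
    """Render rows as a fixed-width text table with a header line."""
    lines = [f"{'ROI':<12}{'Network':<16}{'Std. diff':>10}"]
    lines += [f"{roi:<12}{net:<16}{d:>10.3f}" for roi, net, d in rows]
    return "\n".join(lines)


# Toy values for illustration only.
rois = np.array(["R_a", "R_b", "R_c", "R_d"])
nets = np.array(["Net1", "Net2", "Net1", "Net3"])
d = np.array([0.2, -1.5, 0.9, -0.1])
print(format_table(top_roi_table(rois, nets, d, n=3)))
```

The sign is kept in the table, but ranking uses the absolute value. A large negative difference, meaning a larger '2bk' response, therefore ranks alongside large positive ones.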

#### **In this text box, write a sentence that identifies the network that shows the strongest effects, and the direction of that effect**

#### 2. Use Linear Discriminant Analysis (LDA) to classify the data by **task**, target detection versus working memory ('0bk' versus '2bk'), combining the data across visual stimulus types. First, perform the analysis using all the brain data (all 360 ROIs). Second, perform the analysis separately on the subset of ROIs belonging to each of the 12 labeled networks (12 classifier models). The only output required is the performance of each LDA classifier under 5-fold cross-validation. Make a table showing classifier performance for each network and for the whole brain.
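The loop for Question 2 can be sketched as follows. This assumes scikit-learn's `LinearDiscriminantAnalysis` and `cross_val_score`; the exam does not prescribe a library. `fmri` is stored as regions x samples, but scikit-learn expects samples x features, so the array is transposed. The helper name `cv_accuracy_by_network` and the random toy data are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score


def cv_accuracy_by_network(fmri, labels, network_names, cv=5):
    """Mean k-fold LDA accuracy for the whole brain and for each network.

    fmri: (nregions, nsamples) array, the same layout as data['fmri'].
    labels: length-nsamples class labels.
    network_names: length-nregions array naming each ROI's network.
    """
    X = np.asarray(fmri).T                    # scikit-learn wants samples x features
    y = np.ravel(labels)
    nets = np.asarray(network_names)
    scores = {"whole brain": cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=cv).mean()}
    for net in np.unique(nets):
        mask = nets == net                    # ROIs belonging to this network
        model = LinearDiscriminantAnalysis()
        scores[str(net)] = cross_val_score(model, X[:, mask], y, cv=cv).mean()
    return scores


# Toy data: 6 regions in 2 networks, 40 samples; only "NetA" carries the label.
rng = np.random.default_rng(0)
y_toy = np.repeat([0, 1], 20)
fmri_toy = rng.normal(size=(6, 40))
fmri_toy[:3] += 2.0 * y_toy                   # shift the NetA ROIs for class 1
nets_toy = np.array(["NetA"] * 3 + ["NetB"] * 3)
for name, acc in cv_accuracy_by_network(fmri_toy, y_toy, nets_toy).items():
    print(f"{name:<12}{acc:.2f}")
```

With the exam data, the labels would be `condition_index > 3` (True for '2bk'). Each subject contributes eight samples, so a variant of this helper could pass `cv=GroupKFold(n_splits=5)` and `groups=subject` to `cross_val_score`. That keeps every subject out of either the training or the test fold. The question does not ask for this, so treat it as an optional design choice.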

#### **In this text box, identify the network that shows the strongest classification performance, and compare that performance to a whole brain model**

#### 3. Use LDA to classify all 8 experimental conditions, separately for the ROIs in each of the 12 labeled networks. Make a table presenting the classification performance of each network. Identify the network with the strongest classification performance, and make a plot (using imshow) of the confusion matrix for a model that uses only the ROIs in that network.
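For Question 3, cross-validated predictions can be turned into a confusion matrix. This sketch assumes scikit-learn's `cross_val_predict` and `confusion_matrix`. The helper names `cv_confusion` and `plot_confusion` and the toy data are hypothetical. Rows are normalized, so each row gives the fraction of one true condition assigned to each predicted condition.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict


def cv_confusion(X, y, cv=5):
    """Row-normalized confusion matrix and accuracy from k-fold LDA predictions."""
    y = np.ravel(y)
    y_pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y, cv=cv)
    cm = confusion_matrix(y, y_pred, normalize="true")  # each row sums to 1
    return cm, float(np.mean(y_pred == y))


def plot_confusion(cm, labels):
    """Show a confusion matrix with imshow (true condition on rows)."""
    from matplotlib import pyplot as plt  # imported here so the sketch runs without plotting
    plt.imshow(cm, cmap="viridis", vmin=0, vmax=1)
    plt.colorbar(label="fraction of true condition")
    plt.xticks(range(len(labels)), labels, rotation=90)
    plt.yticks(range(len(labels)), labels)
    plt.xlabel("predicted condition")
    plt.ylabel("true condition")
    plt.tight_layout()
    plt.show()


# Toy data: 8 classes, 10 samples each, 5 features with class-dependent means.
rng = np.random.default_rng(1)
y_toy = np.repeat(np.arange(8), 10)
X_toy = rng.normal(size=(80, 5)) + 0.5 * y_toy[:, None]
cm, acc = cv_confusion(X_toy, y_toy)
print(cm.shape, round(acc, 2))
```

With the exam data, X would be the transposed fMRI rows for one network's ROIs, and y would be `condition_index`. Calling `plot_confusion(cm, labels)` with the eight condition names as labels then draws the figure. In that figure, off-diagonal mass within the same task block (0-3 or 4-7) indicates category confusions, while mass between blocks indicates task confusions.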

#### **In this text box, comment on the pattern of the confusion matrix results. When there is an error, where does the misclassification occur?**

#### 4. Use LDA to build a classification model of all 8 experimental conditions that combines the ROIs of the network identified in Question 3 with those of the network identified in Question 2 as best classifying task. Compute and visualize a confusion matrix for this two-network model. For comparison, compute a classification model using all ROIs in the whole brain.
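Combining two networks only changes which ROIs enter the model. This sketch assumes that `network_names` is the per-ROI label array loaded earlier. The helper name `network_mask` and the network names in the toy example are hypothetical placeholders, because the networks chosen in Questions 2 and 3 depend on your results.

```python
import numpy as np


def network_mask(network_names, selected):
    """Boolean mask over ROIs that belong to any of the selected networks."""
    return np.isin(np.asarray(network_names), list(selected))


# Toy labels for 6 ROIs; replace them with the networks you identified.
names = np.array(["NetA", "NetB", "NetC", "NetA", "NetC", "NetB"])
mask = network_mask(names, ["NetA", "NetC"])
print(mask.tolist())  # [True, False, True, True, True, False]
print(mask.sum())     # 4
```

The mask selects rows of `fmri`. For example, `fmri[mask].T` serves as the feature matrix for the 8-condition LDA, and `fmri.T` gives the whole-brain comparison model.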

#### **In this text box, comment on the difference in performance between the two-network classification model and the whole-brain classification model. Does the two-network model show any systematic patterns in its confusion matrix, compared with your answer to Question 3?**