# Foundations 3: Homework for lab on connectivity and graph theory
- Due November 22, 2023
- Instructor: Ruben Sanchez-Romero
- T.A: Jaleesa Longfellow
- Fall 2023 CMBN

Save the notebook (*.ipynb) file with your name added to the filename (e.g., HW_ConnGraphTheory_Foundations3_RubenSanchez.ipynb)
and send it to [ruben.saro.at.rutgers.edu].
I will run the notebook to grade it.

For each question add cells as you deem necessary.

Important! Be aware that Amarel will be down for maintenance on Nov 21 & 22 (https://oarc.rutgers.edu/amarel-system-status/)

In [None]:
# necessary packages for the homework
import numpy as np
import matplotlib.pyplot as plt
import bct
from scipy import stats
import glob
import h5py

### For the homework we will work with empirical Human Connectome Project resting-state data

1. Load the preprocessed resting-state data for the first 10 subjects from the HCP dataset included in the Actflow Toolbox directory (it should be inside the Foundations3_fMRI_lab directory). (I will do it here as an example.)

In [None]:
# directory with the preprocessed HCP example data from the Actflow Toolbox
files_dir = '../ActflowToolbox/examples/HCP_example_data'
# find the 30 subject files containing preprocessed HCP resting-state run 1, sorted in ascending order
subject_files = sorted(glob.glob(f'{files_dir}/HCP_example_restrun1_subj*_data.h5'))
# check that there are indeed 30 files
print(f'{len(subject_files)} available subject files')

# array to store the data of each individual subject
num_nodes = 360        # 360 cortical nodes
num_timepoints = 1195  # timepoints for run 1 only
num_subj = 10          # we will only use 10 subjects here
rest_data = np.zeros((num_nodes, num_timepoints, num_subj))

print(f'using the first {num_subj} subjects')
# loop over the first 10 subjects only
for scount, subject in enumerate(subject_files[:num_subj]):
    # load the resting-state data; the with-block closes the file afterwards
    with h5py.File(subject, 'r') as f:
        rest_data[:, :, scount] = f['restdata'][:]

2. Compute the Pearson correlation FC for each subject individually. Save your results in a 360 x 360 x 10 matrix (nodes x nodes x subjects).
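A minimal sketch of question 2, assuming `rest_data` is shaped nodes x timepoints x subjects as in the loading cell. `np.corrcoef` treats each row as a variable, so each subject's nodes x timepoints slice yields a nodes x nodes correlation matrix. The helper name `compute_subject_fc` is hypothetical, and the usage runs on small synthetic data rather than the HCP files.

```python
import numpy as np

def compute_subject_fc(rest_data):
    """Pearson correlation FC for each subject (hypothetical helper).

    rest_data: array of shape (nodes, timepoints, subjects)
    returns:   array of shape (nodes, nodes, subjects)
    """
    num_nodes, _, num_subj = rest_data.shape
    fc = np.zeros((num_nodes, num_nodes, num_subj))
    for s in range(num_subj):
        # rows (nodes) are variables, columns (timepoints) are observations
        fc[:, :, s] = np.corrcoef(rest_data[:, :, s])
    return fc

# usage on synthetic data: 5 nodes, 100 timepoints, 3 subjects
rng = np.random.default_rng(0)
toy_data = rng.standard_normal((5, 100, 3))
toy_fc = compute_subject_fc(toy_data)
print(toy_fc.shape)  # (5, 5, 3)
```

With the homework data, `compute_subject_fc(rest_data)` would return a 360 x 360 x 10 array.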

3. Compute the group average FC (360 x 360 matrix). Do not forget to set the diagonal entries to zero.
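One way to do question 3, sketched under the assumption that the per-subject FC is stored as nodes x nodes x subjects: average over the last axis, then zero the diagonal, since self-correlations (always 1) are not edges of the graph. `group_average_fc` is a hypothetical helper name.

```python
import numpy as np

def group_average_fc(fc):
    """Average FC across subjects (last axis) and zero the diagonal (hypothetical helper)."""
    group_fc = np.mean(fc, axis=2)
    # self-correlations are not graph edges
    np.fill_diagonal(group_fc, 0)
    return group_fc

# usage on synthetic data: three 4 x 4 correlation matrices stacked along axis 2
rng = np.random.default_rng(1)
toy_fc = np.stack([np.corrcoef(rng.standard_normal((4, 50))) for _ in range(3)], axis=2)
toy_group_fc = group_average_fc(toy_fc)
print(toy_group_fc.shape)  # (4, 4)
```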

4. Plot the group average FC, including the colorbar to see the range of values.
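A possible plotting sketch for question 4 using `imshow` plus a colorbar. The symmetric color limits and the `seismic` colormap are choices made here (not requirements of the homework) so that zero correlation sits in the middle of the color scale. `plot_fc` is a hypothetical helper, shown on a synthetic matrix.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_fc(matrix, title):
    """Show an FC matrix with a colorbar (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(6, 5))
    # symmetric limits so that r = 0 maps to the center of the colormap
    vmax = np.max(np.abs(matrix))
    im = ax.imshow(matrix, cmap='seismic', vmin=-vmax, vmax=vmax)
    fig.colorbar(im, ax=ax, label='Pearson r')
    ax.set_title(title)
    ax.set_xlabel('nodes')
    ax.set_ylabel('nodes')
    return fig, ax

# usage on a synthetic 20 x 20 "group-average" FC with a zeroed diagonal
rng = np.random.default_rng(5)
toy_group_fc = np.corrcoef(rng.standard_normal((20, 100)))
np.fill_diagonal(toy_group_fc, 0)
fig, ax = plot_fc(toy_group_fc, 'toy group-average FC')
```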

### We will explore how graph metrics change as the density of the graph changes

5. For mean degree, mean clustering coefficient, characteristic path length and mean betweenness centrality, plot each graph metric (y-axis) against increasing graph density (x-axis). Consider densities of 5%, 10%, 20% and 40%. Show four plots, one for each graph metric. (As in the lab, only consider positive correlations when thresholding.) Do not forget to binarize the graph before computing the graph metrics.
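A sketch of the density sweep for question 5, assuming "density" means the fraction of all n(n-1)/2 possible undirected edges. The hypothetical `threshold_binarize` helper keeps the strongest positive correlations and binarizes; it is similar in spirit to `bct.threshold_proportional` followed by `bct.binarize`, but it is written in plain NumPy here and may not match the toolbox exactly (e.g., in how ties are handled). Only mean degree is computed; the comments show Brain Connectivity Toolbox calls that could provide the other three metrics.

```python
import numpy as np
import matplotlib.pyplot as plt

def threshold_binarize(fc, density):
    """Binary undirected graph keeping the strongest positive edges up to `density` (hypothetical helper)."""
    n = fc.shape[0]
    rows, cols = np.triu_indices(n, k=1)           # each undirected edge counted once
    weights = fc[rows, cols]
    num_keep = int(round(density * weights.size))
    strongest = np.argsort(weights)[::-1][:num_keep]
    strongest = strongest[weights[strongest] > 0]  # only positive correlations
    adj = np.zeros((n, n), dtype=int)
    adj[rows[strongest], cols[strongest]] = 1
    return adj + adj.T                             # symmetric, zero diagonal

densities = [0.05, 0.10, 0.20, 0.40]

# synthetic stand-in for the 360 x 360 group-average FC
rng = np.random.default_rng(2)
toy_fc = np.corrcoef(rng.standard_normal((30, 200)))
np.fill_diagonal(toy_fc, 0)

mean_degree = []
for d in densities:
    adj = threshold_binarize(toy_fc, d)
    mean_degree.append(adj.sum(axis=1).mean())
    # the other metrics could use Brain Connectivity Toolbox functions, e.g.
    #   np.mean(bct.clustering_coef_bu(adj))
    #   bct.charpath(bct.distance_bin(adj), include_infinite=False)[0]
    #   np.mean(bct.betweenness_bin(adj))

fig, ax = plt.subplots()
ax.plot([100 * d for d in densities], mean_degree, marker='o')
ax.set_xlabel('density (%)')
ax.set_ylabel('mean degree')
```

Because each threshold keeps a superset of the edges kept at lower densities, mean degree can only grow as density increases.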

Note: Thresholding can leave nodes that are not connected to any other node (disconnected nodes). The Brain Connectivity Toolbox sets the shortest path length to a disconnected node to Inf (i.e., there is no way to reach that node), so we need to exclude infinite path lengths when computing the characteristic path length:<br /> "bct.charpath(bct.distance_bin(graph), include_infinite=False)[0]"<br /> (type "bct.charpath?" for more details)
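To see what excluding Inf does, here is a small sketch that computes the mean shortest path length over reachable node pairs only, using SciPy's `shortest_path` with unweighted (hop-count) distances. To our understanding this mirrors `bct.charpath(bct.distance_bin(graph), include_infinite=False)[0]` with the default exclusion of the diagonal, but that equivalence is an assumption; `char_path_length_finite` is a hypothetical helper.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def char_path_length_finite(adj):
    """Mean shortest path length over off-diagonal, finite (reachable) pairs (hypothetical helper)."""
    dist = shortest_path(adj, directed=False, unweighted=True)  # Inf for unreachable pairs
    off_diag = ~np.eye(adj.shape[0], dtype=bool)
    finite = np.isfinite(dist) & off_diag
    return dist[finite].mean()

# toy graph: path 0-1-2 plus node 3 with no edges (a disconnected node)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]])
print(char_path_length_finite(adj))  # 8 / 6, about 1.333
```

Including the Inf entries instead would make the average itself Inf, which is why they are dropped.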

### We will explore if the resting-state FC graph metrics are larger than those from a null model without structure

6. Threshold the group average FC to 10% density and plot it. Then generate 500 null models. Use them to determine whether the resting-state correlation FC graph metrics (mean clustering coefficient, characteristic path length and mean betweenness centrality) are significantly larger than those from null models without structure. Report the p-values (computed as in the lab) and describe whether these p-values support the claim that the resting-state FC graph metrics are larger than those from a null model.
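A sketch of the null-model comparison for one metric (mean clustering coefficient), under stated assumptions: each null model is a random graph with the same number of nodes and edges as the thresholded FC graph (an Erdős–Rényi G(n, m) graph), and the p-value is the one-sided fraction of null values at least as large as the observed value. The lab may have used a different null model (e.g., degree-preserving rewiring with `bct.randmio_und`) or a different p-value formula, so treat this only as an illustration. All helper names are hypothetical, and the "observed" graph is a toy stand-in.

```python
import numpy as np

def random_graph_same_edges(n, num_edges, rng):
    """G(n, m) null graph: num_edges undirected edges placed uniformly at random (hypothetical helper)."""
    rows, cols = np.triu_indices(n, k=1)
    chosen = rng.choice(rows.size, size=num_edges, replace=False)
    adj = np.zeros((n, n), dtype=int)
    adj[rows[chosen], cols[chosen]] = 1
    return adj + adj.T

def mean_clustering_bu(adj):
    """Mean binary undirected clustering coefficient; nodes with degree < 2 count as 0."""
    a = adj.astype(float)
    triangles = np.diag(a @ a @ a) / 2.0   # triangles through each node
    deg = a.sum(axis=1)
    possible = deg * (deg - 1) / 2.0       # neighbor pairs of each node
    cc = np.divide(triangles, possible, out=np.zeros_like(triangles), where=possible > 0)
    return cc.mean()

def empirical_p_value(observed, null_values):
    """One-sided: fraction of null values >= the observed value (assumed formula)."""
    return np.mean(np.asarray(null_values) >= observed)

# toy "observed" graph: two 10-node cliques, i.e. very high clustering
observed_adj = np.zeros((20, 20), dtype=int)
observed_adj[:10, :10] = 1
observed_adj[10:, 10:] = 1
np.fill_diagonal(observed_adj, 0)
num_edges = observed_adj.sum() // 2

rng = np.random.default_rng(3)
observed_cc = mean_clustering_bu(observed_adj)
null_cc = [mean_clustering_bu(random_graph_same_edges(20, num_edges, rng)) for _ in range(500)]
p = empirical_p_value(observed_cc, null_cc)
print(observed_cc, p)
```

The same loop applies to characteristic path length and mean betweenness centrality by swapping in the corresponding metric function.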

### Finally, we will determine if the resting-state FC graph metrics differ between two different subsets of subjects of the HCP data

7. Repeat questions 1 to 4 but this time using the last 10 subjects from the HCP data.
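For question 7, the only change to the loading cell is which subjects are selected: the first-subjects slice becomes `subject_files[-num_subj:]`. The sketch below uses hypothetical file names standing in for the 30 sorted HCP files and checks that the two groups do not overlap; `split_first_last` is a hypothetical helper.

```python
def split_first_last(files, n):
    """Return the first n and the last n entries of a sorted file list (hypothetical helper)."""
    if 2 * n > len(files):
        raise ValueError('the two groups would overlap')
    return files[:n], files[-n:]

# hypothetical names standing in for the 30 sorted HCP subject files
fake_files = [f'HCP_example_restrun1_subj{i:03d}_data.h5' for i in range(30)]
first_files, last_files = split_first_last(fake_files, 10)
print(last_files[0])  # HCP_example_restrun1_subj020_data.h5
```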

8. Threshold the group average FC matrix at 10% density, as above, and then plot it.

9. Use the Wilcoxon signed-rank test (as in the lab) to determine whether the graph metrics of the first group's FC differ from those of the last group's FC. Consider both FCs at 10% density. Do the comparison for clustering coefficient, shortest path length (not characteristic path length!) and betweenness centrality. Report the output of the Wilcoxon function and describe whether the p-values indicate a significant difference between the first 10 subjects and the last 10 subjects.

As mentioned in question 5, disconnected nodes will have their shortest path length set to Inf. So, to properly compute the Wilcoxon test, we first need to remove the non-finite values of the difference (a difference involving an Inf value is either Inf or nan). We can do this using the numpy function np.isfinite:<br />
idx = np.isfinite(difference)<br />
difference = difference[idx]<br />
The resulting difference contains only finite values and can be used in the Wilcoxon test.<br />
(type np.isfinite? for details)
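A sketch of the paired comparison with the non-finite filtering described above, using `scipy.stats.wilcoxon` on the differences. It assumes the values are paired element by element: per node for clustering and betweenness, and per node pair (e.g., the upper-triangle entries of the two distance matrices) for shortest path length. `paired_wilcoxon` is a hypothetical helper and the path lengths below are made up for illustration.

```python
import numpy as np
from scipy import stats

def paired_wilcoxon(metric_a, metric_b):
    """Wilcoxon signed-rank test on paired values, dropping non-finite differences (hypothetical helper)."""
    difference = np.asarray(metric_a, dtype=float) - np.asarray(metric_b, dtype=float)
    idx = np.isfinite(difference)   # drops Inf (Inf - finite) and nan (Inf - Inf)
    difference = difference[idx]
    return stats.wilcoxon(difference), idx.sum()

# toy paired path lengths; np.inf marks node pairs unreachable in one of the graphs
paths_first = np.array([1, 2, 2, 3, np.inf, 2, 1, 3, 2, 4, 2, 3])
paths_last = np.array([2, 3, 2, 4, 3, np.inf, 2, 4, 3, 4, 3, np.inf])
result, n_used = paired_wilcoxon(paths_first, paths_last)
print(n_used, result.statistic, result.pvalue)
```

Here 3 of the 12 pairs are dropped, and by SciPy's default zero differences are also excluded from the ranking.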