# Using Prevent-AD data from sihnpy

If you want to practice using any module before moving on to your own data, everything is made available to you within `sihnpy` itself, with very little effort. Isn't that great?

At the time of writing, `sihnpy` contains data from 15 Prevent-AD participants who had at least one anatomical MRI image and at least one functional MRI image available. Currently, the basic demographic data as well as the functional connectivity matrices of participants at baseline and at 12 months of follow-up are available. Note that not all participants have all fMRI modalities and/or follow-up timepoints available, due to attrition or to changes in the protocol over the years.

The images were processed with fMRIPrep 20.2.0 and functional connectivity was generated with Nilearn. If you want more details on the preprocessing parameters, they are available in the last section of this page, "Additional information on brain imaging preprocessing".

**Note**: At the time of writing this document (Jan 2023), the fingerprinting module is the only one available. As such, the data currently available pertains specifically to the fingerprinting module. As other modules are implemented, this document will be updated.

## Using the datasets module in sihnpy

For each `sihnpy` module, there is an accompanying `dataset` composed of Prevent-AD data, unless specified otherwise. Its usage is very simple:
1. Import the module
2. Import the data

That's it!

Below is an example using the import function to test the fingerprinting module.

```python
from sihnpy.datasets import pad_fp_input

id_list, path_participant_list, path_data = pad_fp_input()
```

The output of the function always has three parts: the basic demographics of the participants, the path to the participant IDs file, and whatever else is needed to run the functions of the targeted module, often bundled in a single Python dictionary.

The first part will usually be identical across all modules, unless specified otherwise.

```python
id_list
```

| | participant_id | sex | test_language | handedness_score | handedness_interpretation |
|---|---|---|---|---|---|
| 0 | sub-1000173 | Male | French | 100 | Right-handed |
| 1 | sub-1002928 | Female | French | 100 | Right-handed |
| 2 | sub-1004359 | Female | French | 90 | Right-handed |
| 3 | sub-1016072 | Female | French | -100 | Left-handed |
| 4 | sub-1031654 | Male | French | 100 | Right-handed |
| 5 | sub-1072774 | Female | French | 100 | Right-handed |
| 6 | sub-1076159 | Female | French | 100 | Right-handed |
| 7 | sub-1121981 | Female | French | 100 | Right-handed |
| 8 | sub-1154932 | Male | French | 30 | Ambidextrous |
| 9 | sub-1176949 | Female | French | 80 | Right-handed |


In `id_list`, we see the IDs of the participants as well as their basic demographics. Note that `id_list` is a `pandas.DataFrame` object, with `participant_id` as the index column. Therefore, any Pandas methods you can think of will work on `id_list`.
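As a quick illustration of what "any pandas method" means in practice, here is a minimal sketch. To keep it runnable without `sihnpy` installed, it rebuilds a few of the rows displayed above by hand and assumes `participant_id` is the index, as described; with the real data you would use the `id_list` returned by `pad_fp_input()` directly (and call `set_index("participant_id")` first if your copy shows a numeric index).

```python
import pandas as pd

# Hand-copied subset of the rows displayed above; a stand-in for the real id_list
id_list = pd.DataFrame(
    {
        "participant_id": ["sub-1000173", "sub-1002928", "sub-1016072", "sub-1154932"],
        "sex": ["Male", "Female", "Female", "Male"],
        "test_language": ["French"] * 4,
        "handedness_score": [100, 100, -100, 30],
        "handedness_interpretation": [
            "Right-handed",
            "Right-handed",
            "Left-handed",
            "Ambidextrous",
        ],
    }
).set_index("participant_id")

# Count participants per sex
sex_counts = id_list["sex"].value_counts()

# Keep only the right-handed participants
right_handed = id_list[id_list["handedness_interpretation"] == "Right-handed"]

# Look up a single participant by ID (works because participant_id is the index)
one_participant = id_list.loc["sub-1016072"]

print(sex_counts)
print(right_handed.index.tolist())
print(one_participant["handedness_score"])
```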

Most `sihnpy` modules have their own import function. This is to ensure that whatever is fed to the module can be checked properly right from the start. The second output, `path_participant_list`, is the path to the participant IDs file; it is local to your own computer and depends on how `sihnpy` was installed.

```python
path_participant_list
```

```
'/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/participants.tsv'
```
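If you want to peek at that file yourself, a tab-separated file like this one can be read with `pandas.read_csv`. The sketch below is only an illustration: it writes a tiny, hypothetical `participants.tsv` to a temporary folder so that it runs anywhere, and the two columns it contains are assumptions based on the demographics shown earlier. With the real data, you would pass the `path_participant_list` returned by `pad_fp_input()` instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for the participants.tsv shipped with sihnpy
    path_participant_list = Path(tmp_dir) / "participants.tsv"
    path_participant_list.write_text(
        "participant_id\tsex\n"
        "sub-1000173\tMale\n"
        "sub-1002928\tFemale\n"
    )

    # Fail early with a clear message if the path does not point to a file
    if not path_participant_list.is_file():
        raise FileNotFoundError(f"No participant list at {path_participant_list}")

    # TSV files are tab-separated, hence sep="\t"
    participants = pd.read_csv(
        path_participant_list, sep="\t", index_col="participant_id"
    )

print(participants)
```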

## Additional information on brain imaging preprocessing

Most of the modules envisioned for `sihnpy` overlap with my first PhD project, which focused on functional magnetic resonance imaging. Incidentally, most of the data available through `sihnpy` is derived from fMRI.

[fMRIPrep v20.2.0](https://fmriprep.org/en/20.2.0/) (which is already kind of an old version) was used to preprocess the data. Then, [nilearn 0.9.2](https://nilearn.github.io/stable/index.html) was used to compute and derive the functional connectivity.

### fMRIPrep 
*The text below is copied almost verbatim from the boilerplate that fMRIPrep generates after preprocessing*

Results included in this manuscript come from preprocessing performed using fMRIPrep 20.2.0 (Esteban, Markiewicz, et al. (2018); Esteban, Blair, et al. (2018); RRID:SCR_016216), which is based on Nipype 1.5.1 (Gorgolewski et al. (2011); Gorgolewski et al. (2018); RRID:SCR_002502).

**Anatomical data preprocessing**
All available T1-weighted (T1w) images for each participant across visits were used. They were corrected for intensity non-uniformity (INU) with N4BiasFieldCorrection (Tustison et al. 2010), distributed with ANTs 2.3.3 (Avants et al. 2008, RRID:SCR_004757). The T1w-reference was then skull-stripped with a Nipype implementation of the antsBrainExtraction.sh workflow (from ANTs), using OASIS30ANTs as target template. Brain tissue segmentation of cerebrospinal fluid (CSF), white-matter (WM) and gray-matter (GM) was performed on the brain-extracted T1w using fast (FSL 5.0.9, RRID:SCR_002823, Zhang, Brady, and Smith 2001). A T1w-reference map was computed after registration of all T1w images (after INU-correction) using mri_robust_template (FreeSurfer 6.0.1, Reuter, Rosas, and Fischl 2010). Brain surfaces were reconstructed using recon-all (FreeSurfer 6.0.1, RRID:SCR_001847, Dale, Fischl, and Sereno 1999), and the brain mask estimated previously was refined with a custom variation of the method to reconcile ANTs-derived and FreeSurfer-derived segmentations of the cortical gray-matter of Mindboggle (RRID:SCR_002438, Klein et al. 2017). Volume-based spatial normalization to one standard space (MNI152NLin2009cAsym) was performed through nonlinear registration with antsRegistration (ANTs 2.3.3), using brain-extracted versions of both T1w reference and the T1w template. The following template was selected for spatial normalization: ICBM 152 Nonlinear Asymmetrical template version 2009c [Fonov et al. (2009), RRID:SCR_008796; TemplateFlow ID: MNI152NLin2009cAsym]. Note that while the Prevent-AD Open BIDS dataset does contain other brain imaging modalities that can be leveraged by fMRIPrep (e.g., FLAIR), these are not consistently available across participants. As such, preprocessing was restricted to T1w and EPI images only.

**Functional data preprocessing**
For each of the 12 BOLD runs found per subject (across all tasks and sessions), the following preprocessing was performed. First, a reference volume and its skull-stripped version were generated using a custom methodology of fMRIPrep. A B0-nonuniformity map (or fieldmap) was estimated based on a phase-difference map calculated with a dual-echo GRE (gradient-recall echo) sequence, processed with a custom workflow of SDCFlows inspired by the epidewarp.fsl script and further improvements in HCP Pipelines (Glasser et al. 2013). The fieldmap was then co-registered to the target EPI (echo-planar imaging) reference run and converted to a displacements field map (amenable to registration tools such as ANTs) with FSL’s fugue and other SDCflows tools. Based on the estimated susceptibility distortion, a corrected EPI (echo-planar imaging) reference was calculated for a more accurate co-registration with the anatomical reference. The BOLD reference was then co-registered to the T1w reference using bbregister (FreeSurfer) which implements boundary-based registration (Greve and Fischl 2009). Co-registration was configured with six degrees of freedom. Head-motion parameters with respect to the BOLD reference (transformation matrices, and six corresponding rotation and translation parameters) are estimated before any spatiotemporal filtering using mcflirt (FSL 5.0.9, Jenkinson et al. 2002). BOLD runs were slice-time corrected using 3dTshift from AFNI 20160207 (Cox and Hyde 1997, RRID:SCR_005927). The BOLD time-series (including slice-timing correction when applied) were resampled onto their original, native space by applying a single, composite transform to correct for head-motion and susceptibility distortions. These resampled BOLD time-series will be referred to as preprocessed BOLD in original space, or just preprocessed BOLD. The BOLD time-series were resampled into standard space, generating a preprocessed BOLD run in MNI152NLin2009cAsym space. 
Several confounding time-series were calculated based on the preprocessed BOLD: framewise displacement (FD), DVARS and three region-wise global signals. FD was computed using two formulations following Power (absolute sum of relative motions, Power et al. (2014)) and Jenkinson (relative root mean square displacement between affines, Jenkinson et al. (2002)). FD and DVARS are calculated for each functional run, both using their implementations in Nipype (following the definitions by Power et al. 2014). The three global signals are extracted within the CSF, the WM, and the whole-brain masks. Additionally, a set of physiological regressors were extracted to allow for component-based noise correction (CompCor, Behzadi et al. 2007). Principal components are estimated after high-pass filtering the preprocessed BOLD time-series (using a discrete cosine filter with 128s cut-off) for the two CompCor variants: temporal (tCompCor) and anatomical (aCompCor). tCompCor components are then calculated from the top 2% variable voxels within the brain mask. For aCompCor, three probabilistic masks (CSF, WM and combined CSF+WM) are generated in anatomical space. The implementation differs from that of Behzadi et al. in that instead of eroding the masks by 2 pixels on BOLD space, the aCompCor masks are subtracted a mask of pixels that likely contain a volume fraction of GM. This mask is obtained by dilating a GM mask extracted from the FreeSurfer’s aseg segmentation, and it ensures components are not extracted from voxels containing a minimal fraction of GM. Finally, these masks are resampled into BOLD space and binarized by thresholding at 0.99 (as in the original implementation). Components are also calculated separately within the WM and CSF masks.
For each CompCor decomposition, the k components with the largest singular values are retained, such that the retained components’ time series are sufficient to explain 50 percent of variance across the nuisance mask (CSF, WM, combined, or temporal). The remaining components are dropped from consideration. The head-motion estimates calculated in the correction step were also placed within the corresponding confounds file. The confound time series derived from head motion estimates and global signals were expanded with the inclusion of temporal derivatives and quadratic terms for each (Satterthwaite et al. 2013). Frames that exceeded a threshold of 0.5 mm FD or 1.5 standardised DVARS were annotated as motion outliers. All resamplings can be performed with a single interpolation step by composing all the pertinent transformations (i.e. head-motion transform matrices, susceptibility distortion correction when available, and co-registrations to anatomical and output spaces). Gridded (volumetric) resamplings were performed using antsApplyTransforms (ANTs), configured with Lanczos interpolation to minimize the smoothing effects of other kernels (Lanczos 1964). Non-gridded (surface) resamplings were performed using mri_vol2surf (FreeSurfer).
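
To make the motion-outlier rule above more concrete, here is a minimal NumPy sketch of the Power formulation of FD (absolute sum of relative motions) and of the 0.5 mm FD / 1.5 standardised DVARS thresholds. This is not fMRIPrep's code: the function names, the column order of the motion array, the 50 mm head radius used to convert rotations to millimetres, and setting the first frame's FD to 0 are all assumptions made for the example.

```python
import numpy as np


def framewise_displacement_power(motion, radius_mm=50.0):
    """Power-style FD from a (n_frames, 6) array of motion parameters.

    Assumed column order: three translations (mm), then three rotations (rad).
    """
    # Absolute frame-to-frame change of each of the six parameters
    diffs = np.abs(np.diff(motion, axis=0))
    # Convert rotations to mm as arc length on a sphere of radius `radius_mm`
    diffs[:, 3:] *= radius_mm
    fd = diffs.sum(axis=1)
    # The first frame has no previous frame; 0 is used here by convention
    return np.concatenate([[0.0], fd])


def flag_motion_outliers(fd, std_dvars, fd_thresh=0.5, dvars_thresh=1.5):
    """Flag frames exceeding either threshold (values taken from the text above)."""
    return (fd > fd_thresh) | (std_dvars > dvars_thresh)


# Toy data: four frames, made up for illustration
motion = np.array(
    [
        [0.0, 0.0, 0.0, 0.00, 0.0, 0.0],
        [0.1, 0.0, 0.0, 0.00, 0.0, 0.0],
        [0.1, 0.3, 0.0, 0.01, 0.0, 0.0],
        [0.1, 0.3, 0.0, 0.01, 0.0, 0.0],
    ]
)
std_dvars = np.array([1.0, 1.0, 1.2, 1.6])

fd = framewise_displacement_power(motion)
outliers = flag_motion_outliers(fd, std_dvars)
print(fd)
print(outliers)
```

In this toy example the third frame is flagged because of its FD and the fourth because of its standardised DVARS.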

Many internal operations of fMRIPrep use Nilearn 0.6.2 (Abraham et al. 2014, RRID:SCR_001362), mostly within the functional processing workflow. For more details of the pipeline, see the section corresponding to workflows in fMRIPrep’s documentation.

### Nilearn

Once preprocessed by fMRIPrep, confounds were removed from the images and frames with excessive motion were scrubbed using Nilearn. Timeseries were extracted from the 400 brain parcels of the Schaefer atlas (Schaefer et al. 2018), and the timeseries of each region was correlated with that of every other region using partial correlations to generate the functional connectivity matrices. This process yielded 400 × 400 matrices representing the functional links between the brain regions of the atlas.
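
For readers unfamiliar with partial correlations, the sketch below shows the idea with plain NumPy: the partial correlation between two regions is obtained from the inverse of the covariance matrix (the precision matrix). It is an illustration under assumptions, not the script used to build the `sihnpy` matrices. The function name and the random toy data are made up, and only 5 regions are used instead of 400. In Nilearn, `nilearn.connectome.ConnectivityMeasure` with `kind="partial correlation"` computes this kind of matrix, but which exact options were used for this dataset is not stated here.

```python
import numpy as np


def partial_correlation(timeseries):
    """Partial correlation matrix from a (n_frames, n_regions) array."""
    # Covariance between regions (columns are regions, rows are frames)
    covariance = np.cov(timeseries, rowvar=False)
    # Plain inversion requires more frames than regions
    precision = np.linalg.inv(covariance)
    # Partial correlation: -P_ij / sqrt(P_ii * P_jj)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


# Toy data: 200 frames, 5 regions (the real matrices use 400 regions)
rng = np.random.default_rng(0)
timeseries = rng.standard_normal((200, 5))

fc = partial_correlation(timeseries)
print(fc.shape)
```

With 400 regions and a typical number of fMRI frames, the empirical covariance is usually not invertible as is, which is why regularised (shrinkage) covariance estimators are commonly used in practice.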

### Scripts

While it is not the goal of `sihnpy` to focus on preprocessing, I wrote documentation on how to preprocess the fMRI data from Prevent-AD. More details on how to access and download Prevent-AD data, as well as how to process it, are available in the next section.