# Notebook for analysis of ABCD data

This notebook is the outline for the ABCD paper. As such, it is divided into sections as one would see in a journal article.

# Motivating articles

Tooley et al., 2019  
Cohen and D'Esposito, 2015  
https://www.sciencedirect.com/science/article/pii/S105381191730109X?via%3Dihub  
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002328

## Introduction

## Hypotheses
#### 1.	Assess the differences in functional connectivity, modularity, global connectivity, and participation coefficient between overweight, obese, and normal weight adolescents by pubertal status.   
a. Hypothesis. Obese children in the prepubertal group will show low functional connectivity between the insula, frontal operculum, left middle temporal cortex, and the dlPFC based on the previous work of Moreno-Lopez. 
i. Functional connectivity: minimum edge cut  
b. Hypothesis. Normal weight late pubertal children will show increased integration of the cingulo-opercular/salience network based on the work of Marek  
i.	Louvain algorithm for modularity  
ii.	Participation coefficient to assess integration   
iii.	ANOVA between groups controlling for age and race/ethnicity  
c.	Hypothesis. Obese children will show overall lower global efficiency    
i.	Global efficiency  
ii.	ANOVA to compare BMI groups controlling for age, pubertal status, race/ethnicity  
#### 2.	Assess the relationship between delayed gratification, topological metrics, BMI, and puberty   
a.	Hypothesis. Children who are both overweight/obese and in early puberty will show poor delayed gratification and this will be reflected in decreased participation coefficients in the cingulo-opercular cortex   
i.	2 way ANOVA between BMI and puberty with delayed gratification as an outcome measure  
ii.	Correlate delayed gratification with participation coefficients in the cingulo-opercular cortex  

### Preregistered at https://osf.io/9y73t

## Materials and Methods

### Participant Sample

* We used a random sample from the ABCD dataset. Of the !<TOTAL possible>! we randomly selected 4524 subjects 
* The data is from the 1.0 release 2/8/2018  
* Specifically, we used the Processed MRI Data (used for minimally processed data) fmriresults01 
* Participants were excluded if they were missing height or weight, or did not have enough data to determine a pubertal composite score.
* Participants were excluded if they were prepubertal or underweight.
* In order to get even group sizes, we randomly selected even numbered cells

### Missing subjects

* 387 were missing pubertal data
* 699 were prepubertal
* 181 were underweight. Of those, 25 were both underweight and prepubertal, 84 were underweight and early pubertal, and 59 were underweight and midpubertal. None of the late pubertal participants were underweight. 
* 1568 had scanning data available
* A total of 1149 met the above criteria and had scan data

### Measurement of BMI

####  Important Notebook

Exploring_imaging_data.ipyb

To assess BMI we used the average height and average weight variables. We calculated both raw BMI and BMI percentile based on sex and age. 
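
The raw BMI calculation can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `raw_bmi` is hypothetical, and it assumes height is recorded in inches and weight in pounds, which should be checked against the data dictionary. The BMI percentile step additionally needs sex- and age-specific growth reference tables and is not shown.

```python
def raw_bmi(height_in, weight_lb):
    """Raw BMI in kg/m^2 (hypothetical helper).

    Assumes height in inches and weight in pounds; adjust the
    conversions if the variables are stored in other units.
    """
    if height_in <= 0 or weight_lb <= 0:
        raise ValueError("height and weight must be positive")
    height_m = height_in * 0.0254        # inches to metres
    weight_kg = weight_lb * 0.45359237   # pounds to kilograms
    return weight_kg / height_m ** 2
```

For example, `raw_bmi(60, 100)` gives a BMI of roughly 19.5.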

### Measurement of Puberty

### General
* pds_ht2_y = Would you say that your growth in height?
* pds_skin2_y = Have you noticed any skin changes, especially pimples?
* pds_bdyhair_y = And how about the growth of your body hair? ("Body hair" means hair any place other than your head, such as under your arms) Would you say that your body hair growth:
#### Female specific
* pds_f4_2_y = Have you noticed that your breasts have begun to grow?
* pds_f5_y = Have you begun to menstruate (started to have your period)?
#### Male specific
* pds_m4_y = Have you noticed a deepening of your voice?
* pds_m5_y = Have you begun to grow hair on your face?
### Scoring Algorithms:
* For Items 1 through 4 on the girls’ version and all items on the boys’ version, response options were: not yet started (1 point); barely started (2 points); definitely started (3 points); seems complete (4 points); I don’t know (missing). 
* Yes on the menstruation item = 4 points; no = 1 point. 
* Point values are averaged for all items to give a Pubertal Development Scale (PDS) score.
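
The scoring rule above can be sketched in a few lines. The function names are hypothetical, and the handling of "I don't know" responses (skipped when averaging) is an assumption based on their being coded as missing.

```python
def menstruation_points(has_started):
    """Yes on the menstruation item = 4 points; no = 1 point."""
    return 4 if has_started else 1


def pds_score(item_points):
    """Average the item points into a PDS score (hypothetical helper).

    `None` marks an "I don't know" (missing) response. Assumption:
    missing items are dropped before averaging.
    """
    answered = [p for p in item_points if p is not None]
    if not answered:
        return None  # no usable items
    return sum(answered) / len(answered)
```

For a girl who answered 1, 2, 3, "I don't know", and yes to menstruation, `pds_score([1, 2, 3, None, menstruation_points(True)])` averages the four answered items.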

#### Puberty Category Scores are computed using the criteria of Crockett (1988, unpublished) by totaling the scale values given above.

#### To compute Puberty Category Scores for boys use body hair growth, voice change, and facial hair growth as follows:
* Prepubertal = 3
* Early Pubertal = 4 or 5 (no 3-point responses)
* Midpubertal = 6, 7, or 8 (no 4-points)
* Late pubertal = 9-11
* Postpubertal = 12
#### To compute Puberty Category Scores for girls use body hair growth, breast development, and menarche as follows:
* Prepubertal = 2 and no menarche
* Early Puberty = 3 and no menarche
* Midpubertal = > 3 and no menarche
* Late Puberty = <= 7 and menarche
* Postpubertal = 8 and menarche.
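
The category rules above translate into two small lookup functions. This is a sketch with hypothetical function names. One assumption is flagged in the code: when a boy's total falls in a range but violates the parenthetical condition (for example a total of 5 that includes a 3-point response), the criteria listed here do not say what to do, so the sketch returns `None`.

```python
def puberty_category_boys(body_hair, voice, facial_hair):
    """Category from the three boys' items, each scored 1-4."""
    items = (body_hair, voice, facial_hair)
    total = sum(items)
    if total == 3:
        return "prepubertal"
    if total in (4, 5):
        # "no 3-point responses"; unclassified otherwise (assumption)
        return "early pubertal" if max(items) < 3 else None
    if total in (6, 7, 8):
        # "no 4-points"; unclassified otherwise (assumption)
        return "midpubertal" if max(items) < 4 else None
    if 9 <= total <= 11:
        return "late pubertal"
    if total == 12:
        return "postpubertal"
    return None


def puberty_category_girls(body_hair, breast, menarche):
    """Category from body hair + breast items (1-4 each) and menarche."""
    total = body_hair + breast
    if not menarche:
        if total == 2:
            return "prepubertal"
        if total == 3:
            return "early pubertal"
        return "midpubertal"  # total > 3 and no menarche
    return "postpubertal" if total == 8 else "late pubertal"
```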

### Imaging Acquisition

See Casey et al. 2018 https://doi.org/10.1016/j.dcn.2018.03.001

### Imaging processing

#### Done by ABCD image processing (common to all fMRI)
* head motion corrected by registering each frame to the first using AFNI’s 3dvolreg (Cox, 1996) 
* B0 distortions were corrected using the reversing gradient method (Holland, et al., 2010)  
* displacement field estimated from spin-echo fieldmap scans 
* applied to gradient-echo images after adjustment for between-scan head motion  
* corrected for gradient nonlinearity distortions (Jovicich, et al., 2006) 
* between scan motion correction across all fMRI scans in imaging event 
* registration between T2-weighted, spin-echo B0 calibration scans and T1-weighted structural images performed using mutual information (Wells, et al., 1996) 

#### rs-fMRI specific pre-processing
* removal of initial volumes 
#### 3 scanner types
*  Siemens: 8 TRs 
*  Philips: 8 TRs 
*  GE DV25: 5 TRs 
*   GE DV26: 16 TRs 
#### normalization and demean 
*  divide by the mean of each voxel, subtract 1, multiply by 100 
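
As a sketch of the normalization step for a single voxel, the operation converts the time course to percent signal change around its own mean. The function name is hypothetical and the pure-Python list form is only for illustration; the ABCD pipeline applies this to every voxel.

```python
def percent_signal_change(timeseries):
    """Divide by the voxel mean, subtract 1, multiply by 100."""
    mean = sum(timeseries) / len(timeseries)
    if mean == 0:
        raise ValueError("voxel mean is zero; cannot normalize")
    return [(value / mean - 1.0) * 100.0 for value in timeseries]
```

A voxel with values 90, 100, 110 maps to approximately -10, 0, and 10 percent.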
#### regression 
*  linear regression to remove quadratic trends and signals correlated with 
*  motion and mean time courses of cerebral white matter, ventricles, and whole 
*  brain, plus first derivatives (Power, et al., 2014; Satterthwaite, et al., 2012) 
*  motion regression included 6 parameters plus derivatives and squares 
*  frames with displacement > 0.3 mm were excluded from the regression (Power, et al., 2014) 
#### temporal filtering 
*  band-pass filtered between 0.009 and 0.08 Hz (Hallquist, et al., 2013) 
#### surface sampling 
*  pre-processed time courses were sampled onto the cortical surface  
*  projecting 1mm into cortical gray matter along surface normal vector 
#### motion censoring 
*  motion censoring to reduce residual effects of head motion (Power, et al., 2012; Power, et al., 2014) 
*  motion estimates filtered to attenuate signals (0.31 - 0.43 Hz) associated with respiration (18.6 - 25.7 respirations / minute) 
*  time points with FD > 0.2 mm excluded from variance and correlation calculations 
*  time periods with < 5 contiguous, sub-threshold time points also excluded 
*  time points that were outliers in standard deviation across ROIs also excluded 
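
The two framewise-displacement rules above (drop frames with FD > 0.2 mm, then drop any surviving run shorter than 5 contiguous frames) can be sketched as a boolean keep-mask. The function name and defaults are hypothetical; the respiration filtering of the motion estimates and the ROI outlier check are not included.

```python
def censor_mask(fd, fd_threshold=0.2, min_run=5):
    """Return a list of booleans, True for frames that are kept.

    fd: framewise displacement per frame, in mm.
    """
    keep = [value <= fd_threshold for value in fd]
    run_start = None
    # A trailing False sentinel closes a run that reaches the last frame.
    for i, flag in enumerate(keep + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start < min_run:
                # Run of sub-threshold frames is too short: censor it too.
                for j in range(run_start, i):
                    keep[j] = False
            run_start = None
    return keep
```

With five quiet frames, one high-motion frame, and then three quiet frames, only the first five frames survive, because the final run is shorter than `min_run`.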

### Python/FSL Resting State Pipeline and Network Construction
Wrapped in makinCorrelations.py  
Chou et al. AJNR (2012), May; 33(5): 833–838  
https://wiki.biac.duke.edu/biac:analysis:resting_pipeline  

#### Step 3
* The functional run is averaged across time with fslmaths, then bet is applied. The resulting mask is then applied to the entire run of data
* if provided, the T1 anatomical is skull stripped. –anatbetfval is used to control the intensity threshold; the default is 0.5 (same as feat)

#### Step 4
* normalize the data using flirt
* if no options are specified, then the default is the standard MNI152_T1_2mm_brain used in feat
* if you have a specific template, you can define it with –ref (ie: a kid brain, or study specific template)
* if your subject has already been normalized during standard pre-processing of other runs, please provide the flirt matrix from pre-stats with –flirtmat ( most likely example_func2standard.mat)
* this will apply the previously determined flirt matrix to your functional data instead of trying to calculate a new matrix based on the functionals. the matrix from pre-stats was likely calculated using a high-resolution anatomical, also this will assure that your resting state runs are in the same space as the other runs from your subject.
* if you've provided a T1 anatomical image then this sequence is followed: func-2-t1 t1-2-standard
* flirt matrices are concatenated to create func-2-standard
* if no T1 is provided, then the functional is used for the flirt normalization
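
Concatenating the two flirt matrices amounts to multiplying two 4x4 affine matrices, with the transform that is applied first on the right. The sketch below shows the arithmetic only. The function name is hypothetical, and whether the pipeline does this internally or calls an FSL tool (FSL's `convert_xfm` with `-concat` performs the same operation) is not stated here.

```python
def concat_affines(func_to_t1, t1_to_standard):
    """Return the 4x4 func-to-standard matrix.

    The func-to-t1 transform is applied first, so the product is
    t1_to_standard @ func_to_t1.
    """
    size = 4
    return [
        [
            sum(t1_to_standard[i][k] * func_to_t1[k][j] for k in range(size))
            for j in range(size)
        ]
        for i in range(size)
    ]
```

As a check on the ordering: if func-2-t1 shifts x by 1 and t1-2-standard doubles every coordinate, the combined matrix maps the origin to x = 2, not x = 1.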


#### Step 7
* if defaults are used, then the aal_MNI_V4 label file is used to extract the average timeseries for each of the 116 regions
* FOR THIS ANALYSIS WE USED THE POWER ATLAS
* cross correlation coefficients are found for the entire matrix of 116×116 regions
* this step produces 5 files:
-- “subject.graphml” : graphml format with regions timecourse, zr_vals, r_vals  
-- “corrlabel_ts.txt”: extracted time series for each region  
-- “r_matrix.nii.gz”: the correlation coefficients  
-- “zr_matrix.nii.gz”: normalized correlation coefficients  
-- “mask_matrix.nii.gz”: an inclusion mask for everything below the intersect of the regions, which can be used at a higher level  
* since they are saved as NIfTI, they can be loaded into fslview.
* if the default labels are used, then you can load a custom atlas ( “AAL116 Correlation Atlas” ) into fslview which will allow you to see which regions each “voxel” represents
* this is loaded through the Atlas Toolbar, the atlas is installed on the cluster, but it can be provided with instructions to install elsewhere.
* if you want to provide your own label file, the –corrlabel option can be used.
* The input to this step would be a 3D image of ROIs the same size as your normalized data.
* Each individual ROI needs to have a unique intensity value for the timecourse extraction ( ie: 5 ROIs with integers as values from 1-5 ).
* This would be the standard type of thing saved from the pickatlas
* Also provide a label text –corrtext file with your labels intensity value and label name in tab delimited format
* If your space is different from mni please provide the anterior commissure point –refacpoint for centroid/XYZ conversion
* The correlation matrix would then be NumRois X NumRois and point 1,4 would be the corrcoef or zscore of the ROI with values 2 and 5 ( these are indexed starting at zero )
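
The NumRois X NumRois matrices can be sketched from the extracted ROI time series as follows. This is an illustration with hypothetical function names, not the pipeline's code. It assumes the "normalized" coefficients are Fisher r-to-z values, sets the z diagonal to 0, and clips r just inside (-1, 1) so the transform stays finite.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrices(roi_timeseries):
    """Return (r, z): correlation matrix and its Fisher z transform.

    roi_timeseries[k] is the average time series of the k-th ROI,
    so row/column k is zero-indexed (ROI value k + 1).
    """
    n = len(roi_timeseries)
    r = [[1.0] * n for _ in range(n)]
    z = [[0.0] * n for _ in range(n)]  # diagonal left at 0 (assumption)
    limit = 1.0 - 1e-12                # keeps atanh finite (assumption)
    for i in range(n):
        for j in range(i + 1, n):
            rij = pearson_r(roi_timeseries[i], roi_timeseries[j])
            rij = max(-limit, min(limit, rij))
            r[i][j] = r[j][i] = rij
            z[i][j] = z[j][i] = math.atanh(rij)
    return r, z
```

With this indexing, `r[1][4]` is the correlation between the ROIs with intensity values 2 and 5, matching the description above.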

### Network Statistics
Important notebooks
* ABCD_graph_analysis

#### Functional Connectivity

#### Clustering coefficient
A measure of local segregation
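
As a minimal sketch of the metric, the clustering coefficient of a node is the fraction of pairs of its neighbours that are themselves connected. The version below is for a binary, undirected graph stored as a dict of neighbour sets; both the function name and the binary form are assumptions, since the notebook does not say whether a thresholded or weighted variant was used.

```python
def clustering_coefficient(adjacency, node):
    """Binary clustering coefficient of `node`.

    adjacency: dict mapping each node to the set of its neighbours.
    """
    neighbours = adjacency[node]
    degree = len(neighbours)
    if degree < 2:
        return 0.0  # no neighbour pairs to close a triangle
    # Count each connected neighbour pair once (u < v).
    links = sum(
        1
        for u in neighbours
        for v in neighbours
        if u < v and v in adjacency[u]
    )
    return 2.0 * links / (degree * (degree - 1))
```

In a triangle with one extra pendant node attached to a corner, that corner has three neighbours and one connected pair, giving a coefficient of 1/3.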

#### Modularity
Used the community best partition value

#### Participation coefficient
Partition networks into the modules, calculate the PC per node within each group. Higher PC indicates more distributed between network connectivity, while a PC of 0 signifies a node’s links are completely within its home network (within network).
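
In the standard definition, the participation coefficient of node i is 1 minus the sum over modules of (k_im / k_i)^2, where k_im is the node's connection strength to module m and k_i its total strength. The sketch below implements that formula. The function name is hypothetical, and it assumes only positive weights are counted; how negative correlations were handled is not stated in this notebook.

```python
def participation_coefficient(weights, modules):
    """Participation coefficient per node.

    weights: symmetric matrix (list of lists) of edge weights.
    modules: modules[j] is the module label of node j.
    Only positive weights are used (assumption).
    """
    n = len(weights)
    coefficients = []
    for i in range(n):
        strength_by_module = {}
        for j in range(n):
            if i != j and weights[i][j] > 0:
                label = modules[j]
                strength_by_module[label] = (
                    strength_by_module.get(label, 0.0) + weights[i][j]
                )
        total = sum(strength_by_module.values())
        if total == 0:
            coefficients.append(0.0)  # isolated node
        else:
            coefficients.append(
                1.0 - sum((s / total) ** 2 for s in strength_by_module.values())
            )
    return coefficients
```

A node whose links all stay inside its own module gets 0, while a node that splits its strength evenly between two modules gets 0.5.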

#### Examine within larger scale networks
Yeo et al. 2011  
Calculate the average clustering coefficient across nodes within each parcel 


### Network Null Models

We used a conservative null model that preserves both the degree and strength distributions and is suitable for complex functional brain networks. This model preserves both the sign and approximate weight of connections when permuting edges. We generated a total of 100 instantiations of each null model per participant. 
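
To illustrate the general idea of permuting edges while keeping node degrees fixed, here is a deliberately simplified sketch. It is not the null model described above: it works on a binary, unsigned edge list and preserves degree only, not strength or sign. The function name, parameters, and seeding scheme are all hypothetical.

```python
import random


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize a binary undirected graph by double edge swaps.

    Each swap replaces edges (a, b) and (c, d) with (a, d) and (c, b),
    so every node keeps its degree. Self-loops and duplicate edges
    are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(edge)) for edge in edges]
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible pairings
        if len({a, b, c, d}) < 4:
            continue  # shared node would create a self-loop or no-op
        new_1 = tuple(sorted((a, d)))
        new_2 = tuple(sorted((c, b)))
        if new_1 in edge_set or new_2 in edge_set:
            continue  # would duplicate an existing edge
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.update((new_1, new_2))
        edge_list[i], edge_list[j] = new_1, new_2
        done += 1
    return edge_list
```

Under these assumptions, 100 instantiations for one participant could be drawn by calling the function with seeds 0 through 99 on that participant's edge list.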

### Statistical Modeling and testing

#### Test for non-linear effects

#### Test for linear effects

#### Test for the effect of distance

## Results

On average there were 3.7 scans per subject
1045 subjects


## Discussion 