# MacLaren test-retest comparison
Comparing results of running:

1. Samseg v6.1 (via THINQ v1.0.0-rc.11 results)
2. Samseg v7.1.1 (via recon-all in FreeSurfer v7.1.1)
3. FreeSurfer v6.1 (aseg)
4. FreeSurfer v7.1.1 (aseg)

On the [MacLaren test-retest dataset](https://openneuro.org/datasets/ds000239/versions/00001)



## Get Samseg v6.1 data (via THINQ v1.0.0-rc.11)
```
mkdir -p /home/paul/cmet/data/20200609-mclaren-1.0.0-rc.11-42-g8d976b0--take4
cd /home/paul/cmet/data/20200609-mclaren-1.0.0-rc.11-42-g8d976b0--take4
aws s3 cp s3://cmet-scratch/maclaren-cmeds/demographics.tsv .
aws s3 cp \
  --recursive \
  --exclude "*" \
  --include "*subject_info.json" \
  --include "*.pdf" \
  s3://cmet-scratch/20200609-mclaren-1.0.0-rc.11-42-g8d976b0--take4/maclaren-cmeds/ .
```

Get rid of cached `subject_info.json` files:

```
# -prune stops find from descending into directories it is about to delete
find . -type d -name 'cache' -prune -exec rm -rf {} +
```

## Get FreeSurfer v6.1 data 
```
mkdir -p /home/paul/cmet/data/20200714-maclaren-fs6/
cd /home/paul/cmet/data/20200714-maclaren-fs6/
aws s3 cp s3://cmet-scratch/maclaren-cmeds/demographics.tsv .
aws s3 cp \
  --recursive \
  --exclude "*" \
  --include "*.stats" \
  s3://cmet-scratch/20200714-maclaren-fs6/ .
```

## Get FreeSurfer v7.1.1 data (both Samseg and aseg)
```
mkdir -p /home/paul/cmet/data/20201006-maclaren-fs-7.1-samseg-aseg-long/
cd /home/paul/cmet/data/20201006-maclaren-fs-7.1-samseg-aseg-long/
aws s3 cp s3://cmet-scratch/maclaren-cmeds/demographics.tsv .
aws s3 cp \
  --recursive \
  --exclude "*" \
  --include "*.stats" \
  s3://cmet-scratch/20201006-maclaren-fs-7.1-samseg-aseg-long/ .
```

### Split data into separate subdirs

To facilitate recursive processing of `*.stats` files:
```
cd /home/paul/cmet/data/20201006-maclaren-fs-7.1-samseg-aseg-long/
mkdir cross
mkdir long
mkdir long-base
mv sub-??_run-?? ./cross/
mv sub-??_base ./long-base
mv sub* ./long/
```

### Rename the long dirs
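The longitudinal output directories are named `sub-XX_run-YY.long.sub-XX_base`. Strip the `.long.*` suffix so the names match the cross-sectional runs:
```
cd ./long
# For each longitudinal directory, drop everything from ".long." onwards
for DIR in *.long.*_base; do
  mv "$DIR" "${DIR%%.long.*}"
done
```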

In [32]:
import json
import os
import fnmatch
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# imports find_json_files(); load_json_file(); load_dataset(); load_fs_dataset()
from cmeds import *
# imports calc_cvs(); session_permute(); monte_carlo_perm_test
from test_retest import *

In [33]:
structs_of_interest = [
    'Left-Lateral-Ventricle',
    'Left-Hippocampus',
    'Left-Amygdala',
    'Left-Caudate',
    'Left-Putamen',
    'Right-Lateral-Ventricle',
    'Right-Hippocampus',
    'Right-Amygdala',
    'Right-Caudate',
    'Right-Putamen'
]

fs61aseg_demofile = '/home/paul/cmet/data/20200714-maclaren-fs6/demographics.tsv'
fs61aseg_datadir = '/home/paul/cmet/data/20200714-maclaren-fs6/'

fs61samseg_demofile = '/home/paul/cmet/data/20200609-mclaren-1.0.0-rc.11-42-g8d976b0--take4/demographics.tsv'
fs61samseg_datadir = '/home/paul/cmet/data/20200609-mclaren-1.0.0-rc.11-42-g8d976b0--take4/'

fs71aseg_demofile = '/home/paul/cmet/data/20201006-maclaren-fs-7.1-samseg-aseg-long/demographics.tsv'
fs71aseg_datadir = '/home/paul/cmet/data/20201006-maclaren-fs-7.1-samseg-aseg-long/cross'

fs71samseg_demofile = '/home/paul/cmet/data/20201006-maclaren-fs-7.1-samseg-aseg-long/demographics.tsv'
fs71samseg_datadir = '/home/paul/cmet/data/20201006-maclaren-fs-7.1-samseg-aseg-long/cross'

## Load data into pandas dataframes

In [34]:
maclaren_fs61aseg_df = load_fs_dataset(fs61aseg_datadir, fs61aseg_demofile, structs_of_interest)

Dropping the following subjects []
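
`load_fs_dataset()` comes from the local `cmeds` module and its implementation is not shown here. As a rough sketch of the kind of parsing involved, the snippet below reads the volume table from a FreeSurfer `aseg.stats` file with pandas. The function name `read_aseg_volumes` is hypothetical, and the in-memory example file uses made-up numbers purely to exercise the parser.

```python
import io
import pandas as pd

# Column layout of the data table in a FreeSurfer aseg.stats file
ASEG_COLUMNS = [
    "Index", "SegId", "NVoxels", "Volume_mm3", "StructName",
    "normMean", "normStdDev", "normMin", "normMax", "normRange",
]

def read_aseg_volumes(stats_file, structs):
    """Return {structure name: volume in mm^3} for the requested structures.

    Hypothetical helper, not part of cmeds.
    """
    table = pd.read_csv(
        stats_file,
        comment="#",      # metadata and header lines start with '#'
        sep=r"\s+",       # rows are whitespace-delimited
        names=ASEG_COLUMNS,
    )
    volumes = table.set_index("StructName")["Volume_mm3"]
    return volumes.reindex(structs).to_dict()

# Made-up two-row file, only to show the call
example = io.StringIO(
    "# Title Segmentation Statistics\n"
    "# ColHeaders Index SegId NVoxels Volume_mm3 StructName normMean normStdDev normMin normMax normRange\n"
    "  1   4   7000   7000.5  Left-Lateral-Ventricle  30.1 10.2  5.0  80.0 75.0\n"
    "  2  17   4100   4100.0  Left-Hippocampus        70.3  8.1 30.0 100.0 70.0\n"
)
vols = read_aseg_volumes(example, ["Left-Hippocampus"])
print(vols)
```

Passing a path to a real `aseg.stats` file instead of the `StringIO` object should work the same way, since `pd.read_csv` accepts both.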


In [35]:
maclaren_fs61samseg_df, maclaren_fs61samseg_df_normative = load_dataset(fs61samseg_datadir, fs61samseg_demofile, drop_subjects=[], vol_data_src='volume')

Ignoring Subject (did it error out?) sub-01_run-39
Ignoring Subject (did it error out?) sub-01_run-02
Ignoring Subject (did it error out?) sub-01_run-09
Ignoring Subject (did it error out?) sub-01_run-08
Ignoring Subject (did it error out?) sub-01_run-24
Ignoring Subject (did it error out?) sub-01_run-33
Ignoring Subject (did it error out?) sub-01_run-13
Ignoring Subject (did it error out?) sub-01_run-16
Ignoring Subject (did it error out?) sub-01_run-14
Ignoring Subject (did it error out?) sub-01_run-32
Ignoring Subject (did it error out?) sub-01_run-01
Ignoring Subject (did it error out?) sub-01_run-36
Ignoring Subject (did it error out?) sub-01_run-06
Ignoring Subject (did it error out?) sub-01_run-26
Ignoring Subject (did it error out?) sub-01_run-27
Ignoring Subject (did it error out?) sub-01_run-40
Ignoring Subject (did it error out?) sub-01_run-03
Ignoring Subject (did it error out?) sub-01_run-19
Ignoring Subject (did it error out?) sub-01_run-18
[output truncated]

In [36]:
maclaren_fs71aseg_df = load_fs_dataset(fs71aseg_datadir, fs71aseg_demofile, structs_of_interest)

Dropping the following subjects []


In [37]:
# The Samseg stats live in the same `cross` directory as the aseg stats
maclaren_fs71samseg_df = load_fssamseg_dataset(fs71aseg_datadir, fs71samseg_demofile, structs_of_interest)

Dropping the following subjects []


In [38]:
# Sum left and right hemisphere volumes so the results can be compared directly with Table 1 in MacLaren et al.
# https://www.nature.com/articles/sdata201437/tables/2
regions = [             
            [ ['Left-Hippocampus', 'Right-Hippocampus'],'Hippocampus' ],
            [ ['Left-Lateral-Ventricle', 'Right-Lateral-Ventricle'],'Lateral-Ventricles' ],
            [ ['Left-Amygdala', 'Right-Amygdala',],'Amygdala' ],
            [ ['Left-Putamen', 'Right-Putamen'],'Putamen' ],
            [ ['Left-Caudate', 'Right-Caudate'],'Caudate' ],
          ]

maclaren_fs61aseg_df = add_regions(maclaren_fs61aseg_df,regions)
maclaren_fs61samseg_df = add_regions(maclaren_fs61samseg_df,regions)
maclaren_fs71aseg_df = add_regions(maclaren_fs71aseg_df,regions)
maclaren_fs71samseg_df = add_regions(maclaren_fs71samseg_df,regions)
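
`add_regions()` is imported from one of the local modules and is not shown in this notebook. Judging from how it is called, it presumably appends one column per entry in `regions`, holding the sum of the listed source columns. A minimal sketch under that assumption (the name `add_regions_sketch` and the toy volumes are invented for illustration):

```python
import pandas as pd

def add_regions_sketch(df, regions):
    """Append one summed column per ([source columns], new name) pair.

    Assumed behaviour only; the real add_regions() may differ.
    """
    out = df.copy()
    for source_cols, new_name in regions:
        out[new_name] = out[source_cols].sum(axis=1)
    return out

# Toy data: two scans with made-up hemisphere volumes
toy = pd.DataFrame({
    "Left-Hippocampus": [4200.0, 4250.0],
    "Right-Hippocampus": [4300.0, 4280.0],
})
combined = add_regions_sketch(
    toy, [[["Left-Hippocampus", "Right-Hippocampus"], "Hippocampus"]]
)
print(combined["Hippocampus"].tolist())
```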

In [39]:
# Setup for permutation tests

# Samseg v6.1 results are missing for many sub-01 runs (see the load output above),
# so sub-01 is excluded from all analyses to keep the four pipelines comparable
session_list = list(range(1, 21))
subject_list = [2, 3]

# The column name that holds session info in the demographics.tsv
session_col='session'
# The column name that holds subject info in the demographics.tsv
subject_col='subject_num'

# To match with the rows of table 1 in https://www.nature.com/articles/sdata201437/tables/2
structs_of_interest = ['Hippocampus', 'Lateral-Ventricles', 'Amygdala', 'Putamen', 'Caudate']

In [45]:
# Run the permutation tests; runtime grows with n, so expect this to take a while at n = 10000
n = 10000
maclaren_fs61aseg_covs_df = monte_carlo_perm_test(maclaren_fs61aseg_df, subject_list, session_list, subject_col, session_col, structs_of_interest, n_itrs=n, method='gluer')
maclaren_fs61samseg_covs_df = monte_carlo_perm_test(maclaren_fs61samseg_df, subject_list, session_list, subject_col, session_col, structs_of_interest, n_itrs=n, method='gluer')
maclaren_fs71aseg_covs_df = monte_carlo_perm_test(maclaren_fs71aseg_df, subject_list, session_list, subject_col, session_col, structs_of_interest, n_itrs=n, method='gluer')
maclaren_fs71samseg_covs_df = monte_carlo_perm_test(maclaren_fs71samseg_df, subject_list, session_list, subject_col, session_col, structs_of_interest, n_itrs=n, method='gluer')
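
`monte_carlo_perm_test()` lives in the local `test_retest` module, so its internals are not visible here. The sketch below shows one way a Monte Carlo permutation test on coefficients of variation (CoV) could produce the four quantities reported in the tables: the CoV over all scans, a pooled within-session CoV, their absolute difference, and a p-value from shuffling session labels. All function names are hypothetical, the toy volumes are invented, and the root-mean-square pooling of per-session CoVs is an assumption that may not match what `method='gluer'` does.

```python
import numpy as np

def cov_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

def rms_session_cov(volumes, sessions):
    """Root-mean-square of the per-session CoVs (assumed pooling rule)."""
    volumes = np.asarray(volumes, dtype=float)
    sessions = np.asarray(sessions)
    covs = [cov_percent(volumes[sessions == s]) for s in np.unique(sessions)]
    return float(np.sqrt(np.mean(np.square(covs))))

def perm_test_sketch(volumes, sessions, n_itrs=1000, seed=0):
    """Shuffle session labels to test whether total and session CoV differ."""
    rng = np.random.default_rng(seed)
    total = cov_percent(volumes)
    session = rms_session_cov(volumes, sessions)
    observed = abs(total - session)
    hits = 0
    for _ in range(n_itrs):
        shuffled = rng.permutation(sessions)
        # Count shuffles whose difference is at least as large as the observed one
        if abs(total - rms_session_cov(volumes, shuffled)) >= observed:
            hits += 1
    return {
        "total-cov": total,
        "session-cov": session,
        "abs-diff-cov": observed,
        "p-val": hits / n_itrs,
    }

# Toy data: 10 sessions x 2 scans with a strong, invented between-session drift
sessions = np.repeat(np.arange(10), 2)
volumes = 1000.0 + 50.0 * sessions + np.tile([-1.0, 1.0], 10)
result = perm_test_sketch(volumes, sessions)
print(result)
```

With a drift this strong, within-session scatter is far smaller than the overall scatter, so almost no shuffle reproduces the observed difference and the p-value is small. In real data where session has no effect, the two CoVs are similar and the p-value is large.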

The tables below are comparable to [Table 1 in MacLaren et al.](https://www.nature.com/articles/sdata201437/tables/2)

In [46]:
maclaren_fs61aseg_covs_df

| | Hippocampus | Lateral-Ventricles | Amygdala | Putamen | Caudate |
|---|---|---|---|---|---|
| mean-vol | 8522.69875 | 13563.5025 | 3424.3325 | 9855.9875 | 6863.805 |
| total-cov | 2.053487 | 2.081874 | 3.230995 | 1.895118 | 1.706345 |
| session-cov | 2.258124 | 0.902939 | 3.159073 | 2.135972 | 1.557038 |
| abs-diff-cov | 0.204637 | 1.178935 | 0.071922 | 0.240854 | 0.149307 |
| p-vals | 0.2171 | 0.0 | 0.7825 | 0.1573 | 0.2911 |


In [47]:
maclaren_fs61samseg_covs_df

| | Hippocampus | Lateral-Ventricles | Amygdala | Putamen | Caudate |
|---|---|---|---|---|---|
| mean-vol | 8660.34375 | 15643.9925 | 3290.56625 | 11229.76125 | 7051.12625 |
| total-cov | 0.732712 | 1.428306 | 1.213812 | 1.103016 | 0.928835 |
| session-cov | 0.754715 | 0.950413 | 1.216737 | 0.979911 | 0.789903 |
| abs-diff-cov | 0.022003 | 0.477893 | 0.002925 | 0.123105 | 0.138931 |
| p-vals | 0.7116 | 0.0 | 0.9758 | 0.1617 | 0.0529 |


In [48]:
maclaren_fs71aseg_covs_df

| | Hippocampus | Lateral-Ventricles | Amygdala | Putamen | Caudate |
|---|---|---|---|---|---|
| mean-vol | 8590.835 | 13670.435 | 3510.1775 | 9883.4525 | 6867.28375 |
| total-cov | 1.670197 | 1.910059 | 3.013569 | 1.429833 | 1.996576 |
| session-cov | 1.749787 | 0.728072 | 3.255942 | 1.496864 | 1.644559 |
| abs-diff-cov | 0.079589 | 1.181987 | 0.242373 | 0.067031 | 0.352017 |
| p-vals | 0.572 | 0.0 | 0.3172 | 0.5657 | 0.0223 |


In [49]:
maclaren_fs71samseg_covs_df

| | Hippocampus | Lateral-Ventricles | Amygdala | Putamen | Caudate |
|---|---|---|---|---|---|
| mean-vol | 10770.732396 | 23208.657477 | 3779.890494 | 13517.24048 | 8726.35149 |
| total-cov | 1.38953 | 1.672704 | 1.556568 | 8.80864 | 2.07905 |
| session-cov | 1.069301 | 1.406157 | 1.675307 | 9.338202 | 2.004842 |
| abs-diff-cov | 0.320229 | 0.266547 | 0.118739 | 0.529562 | 0.074207 |
| p-vals | 0.0029 | 0.0285 | 0.3162 | 0.4659 | 0.6952 |
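
To read the four result tables side by side, it can help to collect one row from each into a single frame. The sketch below does this for `total-cov`, with the values copied from the outputs above. The column labels are my own shorthand for the four pipelines, not names used elsewhere in this notebook.

```python
import pandas as pd

structures = ["Hippocampus", "Lateral-Ventricles", "Amygdala", "Putamen", "Caudate"]

# total-cov rows copied from the four *_covs_df outputs
total_cov = pd.DataFrame(
    {
        "FS 6.1 aseg":   [2.053487, 2.081874, 3.230995, 1.895118, 1.706345],
        "Samseg 6.1":    [0.732712, 1.428306, 1.213812, 1.103016, 0.928835],
        "FS 7.1.1 aseg": [1.670197, 1.910059, 3.013569, 1.429833, 1.996576],
        "Samseg 7.1.1":  [1.38953, 1.672704, 1.556568, 8.80864, 2.07905],
    },
    index=structures,
)

# Pipeline with the smallest total-cov for each structure
lowest = total_cov.idxmin(axis=1)

print(total_cov.round(2))
print(lowest)
```

In the notebook itself the same frame could be built from the `.loc['total-cov']` row of each `*_covs_df`, assuming the row labels are the index of those frames.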
