## Data Quality Assurance (QA)

This script identifies low-quality data (e.g. runs with motion artifacts) and produces inclusion vectors specifying which subjects can be included in the second-level analysis. Script features:
- identifies high-motion subjects
- builds a DataFrame specifying the final sample with high-quality data and saves it as `exclusion.csv`

---
**Last update**: 19.02.2020 

In [None]:
import os
import sys
import pandas as pd
import numpy as np

from bids import BIDSLayout

path_root = os.environ.get('DECIDENET_PATH')
path_code = os.path.join(path_root, 'code')
if path_code not in sys.path:
    sys.path.append(path_code)
from dn_utils.behavioral_models import load_behavioral_data 

In [None]:
# Directory to save exclusion table
path_out = os.path.join(path_root, 
                        'data/main_fmri_study/derivatives/nistats/exclusion')
os.makedirs(path_out, exist_ok=True)

# Load behavioral data
path_beh = os.path.join(path_root, 'data/main_fmri_study/sourcedata/behavioral')
beh, meta = load_behavioral_data(path=path_beh)
n_subjects, n_conditions, n_trials, _ = beh.shape
n_scans = 730  # number of fMRI volumes per condition (used for the 20% FD criterion)

## Load confounds
- `conf_files`: list of two lists (one per task, in the order `prlrew`, `prlpun`), each containing paths to confound files sorted by subject number
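
Because later cells index `conf_files[con][sub]` by position, it can be worth confirming that each list is aligned with the subject order. The sketch below is only an illustration: the filenames, the `subject_label` helper and the `subjects_demo` list are hypothetical, and it assumes the BIDS `sub-<label>` entity matches the subject labels stored in the behavioral metadata.

```python
import re


def subject_label(path):
    """Return the BIDS subject label found in a filename, or None if absent."""
    match = re.search(r"sub-([a-zA-Z0-9]+)", path)
    return match.group(1) if match else None


# Hypothetical confound file lists, one list per task (not real paths)
conf_files_demo = [
    ["sub-m02_task-prlrew_desc-confounds_regressors.tsv",
     "sub-m03_task-prlrew_desc-confounds_regressors.tsv"],
    ["sub-m02_task-prlpun_desc-confounds_regressors.tsv",
     "sub-m03_task-prlpun_desc-confounds_regressors.tsv"],
]

# Hypothetical subject order, standing in for the behavioral metadata
subjects_demo = ["m02", "m03"]

# Every task list should yield exactly the expected subject sequence
labels_per_task = [[subject_label(f) for f in files] for files in conf_files_demo]
aligned = all(labels == subjects_demo for labels in labels_per_task)
print("confound files aligned with subjects:", aligned)
```

If `aligned` were False, positional indexing would silently pair a subject with another subject's confounds, so failing early is the safer choice.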

In [None]:
path_bids = os.path.join(path_root, 'data/main_fmri_study')

layout = BIDSLayout(
    root=path_bids,
    derivatives=True,
    index_metadata=False
)

conf_filter = {
    "extension": "tsv",
    "desc": "confounds",
    "return_type": "filename"
}

conf_files = []

for task_dict in [{"task": "prlrew"}, {"task": "prlpun"}]:
    conf_filter.update(task_dict)
    conf_files.append(layout.get(**conf_filter))

## Exclusion criteria

Two types of exclusion criteria are applied. First, subjects are excluded for excessive head motion. We consider three criteria based on volume-to-volume movement (**framewise displacement; FD**):
- mean FD should not exceed 0.2 mm (`thr_fd_mean`)
- max FD should not exceed 5 mm (`thr_fd_max`)
- number of volumes with FD > 0.5 mm should not exceed 20% of total volumes (`thr_fd_gt05`)
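
As a minimal sketch, the three motion criteria can be checked on a single FD time series as follows. The helper names (`fd_summary`, `passes_motion_criteria`) and the toy FD values are made up for illustration; the comparisons use a strict `<`, matching the code in this notebook, and the leading NaN mimics the undefined FD of the first volume.

```python
import numpy as np
import pandas as pd


def fd_summary(fd, thr_spike=0.5):
    """Summarize an FD series: mean, max, and count of volumes above thr_spike."""
    fd = pd.Series(fd, dtype=float)  # NaNs are skipped by mean() and max()
    return {
        "fd_mean": fd.mean(),
        "fd_max": fd.max(),
        "fd_gt05": int((fd > thr_spike).sum()),
    }


def passes_motion_criteria(stats, n_volumes,
                           thr_fd_mean=0.2, thr_fd_max=5, frac_gt05=0.2):
    """Return True only if all three motion criteria are satisfied."""
    return bool(
        stats["fd_mean"] < thr_fd_mean
        and stats["fd_max"] < thr_fd_max
        and stats["fd_gt05"] < frac_gt05 * n_volumes
    )


# Toy FD series in mm (hypothetical values)
fd_quiet = [np.nan, 0.05, 0.10, 0.08, 0.12]
fd_spiky = [np.nan, 0.10, 6.00, 0.10, 0.10]

quiet_ok = passes_motion_criteria(fd_summary(fd_quiet), n_volumes=len(fd_quiet))
spiky_ok = passes_motion_criteria(fd_summary(fd_spiky), n_volumes=len(fd_spiky))
print(quiet_ok, spiky_ok)
```

In the toy data the second series fails because a single 6 mm spike violates both the mean and the max criterion.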

Second, we excluded additional subjects based on other factors, e.g. acquisition errors or chance-level performance. We excluded:
- subject `m19` (both conditions): due to flipped response grips
- subject `m32` (pun condition): due to failed realignment (TODO: fix preprocessing for that subject)

Columns in `exclusion.csv` are coded as follows:
- `ok_fd_<condition>`: False for subjects with excessive movement during \<condition\>, True otherwise
- `ok_err_<condition>`: False for additional subjects excluded from a specific condition, True otherwise
- `ok_<condition>`: combines the movement and additional exclusion criteria for a specific condition; False for subjects excluded from analysis, True for subjects included in analysis
- `ok_all`: combines all exclusion criteria from both conditions; represents the final inclusion / exclusion vector for second-level analysis
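
The way these columns combine can be illustrated on a toy table. This is a sketch only: the motion flags and the subjects `m01` and `m02` are hypothetical, while the additional exclusions (`m19`, `m32`) follow the list above.

```python
import pandas as pd

# Toy table with made-up motion flags for four subjects
flags = pd.DataFrame({
    "sub": ["m01", "m02", "m19", "m32"],
    "ok_fd_rew": [True, True, True, True],
    "ok_fd_pun": [False, True, True, True],
})

# Additional exclusions: m19 in both conditions, m32 in the pun condition only
flags["ok_err_rew"] = ~flags["sub"].isin(["m19"])
flags["ok_err_pun"] = ~flags["sub"].isin(["m19", "m32"])

# A condition is usable only if both the motion and the error flag are True
for con in ("rew", "pun"):
    flags[f"ok_{con}"] = flags[f"ok_fd_{con}"] & flags[f"ok_err_{con}"]

# A subject enters the second-level analysis only if both conditions are usable
flags["ok_all"] = flags["ok_rew"] & flags["ok_pun"]

included = flags.loc[flags["ok_all"], "sub"].tolist()
print(included)
```

Here `m01` is dropped for motion in one condition, `m19` and `m32` for the additional criteria, so only `m02` remains in the final sample.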

In [None]:
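# Motion exclusion thresholds
thr_fd_mean = 0.2            # upper limit for mean FD (mm)
thr_fd_max = 5               # upper limit for FD of any single volume (mm)
thr_fd_gt05 = 0.2 * n_scans  # upper limit for number of volumes with FD > 0.5 mm (20% of volumes)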

# Additional exclusions
error_rew = ['m19']
error_pun = ['m19', 'm32']

In [None]:
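# FD summary statistics per subject and condition:
# mean FD, max FD, number of volumes with FD > 0.5 mm
fd_stats = np.zeros((n_subjects, n_conditions, 3))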

for sub in range(n_subjects):
    for con in range(n_conditions):

        df = pd.read_csv(conf_files[con][sub], sep='\t')

        fd_mean = df['framewise_displacement'].mean()
        fd_max = df['framewise_displacement'].max()
        fd_gt05 = (df['framewise_displacement'] > 0.5).sum()
        
        fd_stats[sub, con, :] = [fd_mean, fd_max, fd_gt05]

# Create exclusion DataFrame
df = pd.DataFrame(data=np.hstack((fd_stats[:,0,:], fd_stats[:,1,:])),
                  columns=['fd_mean_rew', 'fd_max_rew', 'fd_gt05_rew',
                           'fd_mean_pun', 'fd_max_pun', 'fd_gt05_pun'])
df.insert(0, 'sub', meta['dim1'])

# Apply exclusion criteria for motion
df['ok_fd_rew'] = (df['fd_mean_rew'] < thr_fd_mean) \
                   & (df['fd_max_rew'] < thr_fd_max) \
                   & (df['fd_gt05_rew'] < thr_fd_gt05)
df['ok_fd_pun'] = (df['fd_mean_pun'] < thr_fd_mean) \
                   & (df['fd_max_pun'] < thr_fd_max) \
                   & (df['fd_gt05_pun'] < thr_fd_gt05)

# Apply additional exclusion criteria (e.g. acquisition errors)
df['ok_err_rew'] = ~df['sub'].isin(error_rew)
df['ok_err_pun'] = ~df['sub'].isin(error_pun)

# Create summary for conditions & entire task
df['ok_rew'] = df['ok_fd_rew'] & df['ok_err_rew']
df['ok_pun'] = df['ok_fd_pun'] & df['ok_err_pun']
df['ok_all'] = df['ok_rew'] & df['ok_pun']

df.to_csv(os.path.join(path_out, 'exclusion.csv'))