# Preprocessing pipeline
Credits: https://miykael.github.io/nipype_tutorial/

At the moment, this notebook has only been tested on a single subject.
Other subjects can easily be added to the "subject_list". A few of the image names used in the notebook still need small changes before the script generalizes to all subjects.

Before preprocessing, the data were converted from DICOM to NIfTI with the command-line tool dcm2nii.

In [1]:
%matplotlib inline
from os.path import join as opj
import os
import json
from nipype.interfaces.fsl import (BET, ExtractROI, FAST, FLIRT, ImageMaths,
                                   MCFLIRT, SliceTimer, Threshold)
from nipype.interfaces.spm import Smooth
from nipype.interfaces.utility import IdentityInterface, Merge
from nipype.interfaces.io import SelectFiles, DataSink, FreeSurferSource
from nipype.algorithms.rapidart import ArtifactDetect
from nipype.pipeline.engine import Workflow, Node, MapNode
from nipype.interfaces.ants import Registration, ApplyTransforms
from nipype.interfaces.fsl import Info
from nipype.interfaces.freesurfer import FSCommand, MRIConvert, BBRegister
from nipype.interfaces.c3 import C3dAffineTool

# Additional imports for the normalization workflow
from nipype.interfaces.spm import Normalize12
from nipype.algorithms.misc import Gunzip

from nilearn import image, plotting
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import SVG

We first define all the directories and variables used throughout the notebook. The TR is the repetition time, i.e. the time interval between the acquisition of two consecutive volumes.
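
To make the role of the TR concrete, here is a minimal sketch that computes when each volume starts and how much scan time is lost when the first volumes are discarded. The TR of 2.4 s is the value set later in this notebook and `n_dummy = 4` mirrors the `t_min=4` passed to ExtractROI further down; `n_volumes` and the helper name `volume_onsets` are illustrative assumptions, not values taken from the dataset.

```python
# Minimal sketch: volume onset times derived from the repetition time (TR).
# n_volumes is a hypothetical value chosen only for illustration.
TR = 2.4          # seconds between the start of two consecutive volumes
n_volumes = 10    # hypothetical number of volumes in a run
n_dummy = 4       # number of initial volumes to discard


def volume_onsets(n_vols, tr):
    """Return the start time (in seconds) of each volume."""
    return [i * tr for i in range(n_vols)]


onsets = volume_onsets(n_volumes, TR)
discarded_seconds = n_dummy * TR    # scan time removed with the dummy volumes
kept_onsets = onsets[n_dummy:]      # volumes that remain after discarding

print(f"run duration: {n_volumes * TR:.1f} s")
print(f"discarded:    {discarded_seconds:.1f} s")
print(f"first kept volume starts at {kept_onsets[0]:.1f} s")
```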

In [2]:
# folder containing one sub-folder per subject (used below to build the subject list)
basedir = '/Volumes/myFatDrive/ada_project/workingdir/PPMI/'

experiment_dir = '/Volumes/myFatDrive/ada_project'

#experiment_dir = '/Users/elisabettamessina/Desktop/ADA/ada2017hw/project/ExampleFolder2'
working_dir = 'workingdir'
output_dir = 'output_folder'
input_dir_1st = 'output_folder'    # name of 1st-level output folder


# Tell nipype where to find SPM12 and how to start MATLAB
from nipype.interfaces.matlab import MatlabCommand
MatlabCommand.set_default_paths('/Users/elisabettamessina/spm12')
MatlabCommand.set_default_matlab_cmd("matlab -nodesktop -nosplash")

# location of template in form of a tissue probability map to normalize to
template = '/Users/elisabettamessina/spm12/tpm/TPM.nii'

In [3]:
# Specify variables

# list of subject identifiers: every entry in basedir whose name starts with 3 or 4
subject_list = []
for fn in os.listdir(basedir):
    if fn.startswith(('3', '4')):
        subject_list.append(fn)
#subject_list = ['3105']


# list of session identifiers
task_list = ['rs']

# Smoothing widths to apply
fwhm = [4, 8]

# TR of functional images
TR = 2.4 

# Isotropic resampling of functional images to this voxel size (in mm)
iso_size = 4


#experiment_dir = '~/nipype_tutorial'          # location of experiment folder
#input_dir_1st = 'output_fMRI_example_1st'     # name of 1st-level output folder
#output_dir = 'output_fMRI_example_norm_ants'  # name of norm output folder
#working_dir = 'workingdir_fMRI_example_norm_ants'  # name of norm working directory
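
The smoothing widths above are given as full width at half maximum (FWHM) in mm. As a small sketch of what these numbers mean, the snippet below converts each FWHM to the standard deviation of the corresponding Gaussian kernel, using the standard relation FWHM = 2·sqrt(2·ln 2)·sigma. The helper name `fwhm_to_sigma` is an assumption made for this illustration; it is not part of nipype or SPM.

```python
import math

fwhm_list = [4, 8]  # smoothing widths in mm, same values as `fwhm` above


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM (mm) to its standard deviation (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


for width in fwhm_list:
    print(f"FWHM {width} mm -> sigma {fwhm_to_sigma(width):.3f} mm")
```

With an isotropic voxel size of 4 mm, a 4 mm FWHM kernel spans about one voxel and an 8 mm kernel about two.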



### Defining nodes
In the following cell we define all the nodes of the preprocessing workflow, i.e. the separate steps that have to be performed.
The main transformations are motion correction (realignment), co-registration and smoothing; the remaining nodes support these main steps. In the future, the normalization node will be added to this workflow.

In [4]:
# ExtractROI - skip dummy scans
extract = Node(ExtractROI(t_min=4, t_size=-1, output_type='NIFTI'),
               name="extract")

# MCFLIRT - motion correction
mcflirt = Node(MCFLIRT(mean_vol=True,
                       save_plots=True,
                       output_type='NIFTI'),
               name="mcflirt")

# SliceTimer - correct for slice wise acquisition
slicetimer = Node(SliceTimer(index_dir=False,
                             interleaved=True,
                             output_type='NIFTI',
                             time_repetition=TR),
                  name="slicetimer")

# Smooth - image smoothing
smooth = Node(Smooth(), name="smooth")
smooth.iterables = ("fwhm", fwhm)

# Gunzip - unzip the structural image
gunzip_struct = Node(Gunzip(), name="gunzip_struct")

# Gunzip - unzip the contrast image
gunzip_con = MapNode(Gunzip(), name="gunzip_con",
                     iterfield=['in_file'])

# Normalize - normalizes functional and structural images to the MNI template
normalize = Node(Normalize12(jobtype='estwrite',
                             tpm=template,
                             write_voxel_sizes=[1, 1, 1]),
                 name="normalize")

# Artifact Detection - determines outliers in functional images
art = Node(ArtifactDetect(norm_threshold=2,
                          zintensity_threshold=3,
                          mask_type='spm_global',
                          parameter_source='FSL',
                          use_differences=[True, False],
                          plot_type='svg'),
           name="art")

# BET - Skullstrip anatomical Image
bet_anat = Node(BET(frac=0.5,
                    robust=True,
                    output_type='NIFTI_GZ'),
                name="bet_anat")

# FAST - Image Segmentation
segmentation = Node(FAST(output_type='NIFTI_GZ'),
                name="segmentation")

# Select WM segmentation file from segmentation output
def get_wm(files):
    return files[-1]

# Threshold - Threshold WM probability image
threshold = Node(Threshold(thresh=0.5,
                           args='-bin',
                           output_type='NIFTI_GZ'),
                name="threshold")

# FLIRT - pre-alignment of functional images to anatomical images
coreg_pre = Node(FLIRT(dof=6, output_type='NIFTI_GZ'),
                 name="coreg_pre")

# FLIRT - coregistration of functional images to anatomical images with BBR
coreg_bbr = Node(FLIRT(dof=6,
                       cost='bbr',
                       schedule=opj(os.getenv('FSLDIR'),
                                    'etc/flirtsch/bbr.sch'),
                       output_type='NIFTI_GZ'),
                 name="coreg_bbr")

# Apply coregistration warp to functional images
applywarp = Node(FLIRT(interp='spline',
                       apply_isoxfm=iso_size,
                       output_type='NIFTI'),
                 name="applywarp")

# Apply coregistration warp to mean file
applywarp_mean = Node(FLIRT(interp='spline',
                            apply_isoxfm=iso_size,
                            output_type='NIFTI_GZ'),
                 name="applywarp_mean")
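
The `get_wm` helper defined above relies on the order of FAST's partial volume outputs. For a T1-weighted image segmented into three classes, FAST conventionally writes CSF, gray matter and white matter as `pve_0`, `pve_1` and `pve_2`, so the last file is the white-matter map. The sketch below illustrates the selection; the file names are hypothetical and only follow that naming pattern.

```python
def get_wm(files):
    """Return the last partial volume file, assumed to be white matter."""
    return files[-1]


# Hypothetical FAST output list for a 3-class T1 segmentation
partial_volume_files = [
    "anat_brain_pve_0.nii.gz",  # CSF
    "anat_brain_pve_1.nii.gz",  # gray matter
    "anat_brain_pve_2.nii.gz",  # white matter
]

wm_file = get_wm(partial_volume_files)
print(wm_file)
```

If the number of tissue classes or the image type passed to FAST changes, the "last file is white matter" assumption should be checked again.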

### Coregistration workflow
The nipype library gives us a simple way to connect different nodes into a workflow. We first connect the nodes of the coregistration workflow, which is then added to the main preprocessing workflow.

In [5]:
# Create a coregistration workflow
coregwf = Workflow(name='coregwf')
coregwf.base_dir = opj(experiment_dir, working_dir)

# Connect all components of the coregistration workflow
coregwf.connect([(bet_anat, segmentation, [('out_file', 'in_files')]),
                 (segmentation, threshold, [(('partial_volume_files', get_wm),
                                             'in_file')]),
                 (bet_anat, coreg_pre, [('out_file', 'reference')]),
                 (threshold, coreg_bbr, [('out_file', 'wm_seg')]),
                 (coreg_pre, coreg_bbr, [('out_matrix_file', 'in_matrix_file')]),
                 (coreg_bbr, applywarp, [('out_matrix_file', 'in_matrix_file')]),
                 (bet_anat, applywarp, [('out_file', 'reference')]),
                 (coreg_bbr, applywarp_mean, [('out_matrix_file', 'in_matrix_file')]),
                 (bet_anat, applywarp_mean, [('out_file', 'reference')]),
                 ])
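
The connections above form a directed acyclic graph rather than a strictly linear chain. As a rough illustration, the sketch below rebuilds the same edges with the standard library and derives one valid execution order. It does not use nipype (which stores the workflow as a networkx graph); `graphlib` requires Python 3.9 or later, and the variable names are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (source, destination) pairs mirroring the coregwf connections
edges = [
    ("bet_anat", "segmentation"),
    ("segmentation", "threshold"),
    ("bet_anat", "coreg_pre"),
    ("threshold", "coreg_bbr"),
    ("coreg_pre", "coreg_bbr"),
    ("coreg_bbr", "applywarp"),
    ("bet_anat", "applywarp"),
    ("coreg_bbr", "applywarp_mean"),
    ("bet_anat", "applywarp_mean"),
]

# TopologicalSorter expects a mapping {node: set of nodes it depends on}
predecessors = {}
for src, dst in edges:
    predecessors.setdefault(src, set())
    predecessors.setdefault(dst, set()).add(src)

# One order in which every node runs after all of its inputs are available
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

Nodes without a dependency between them, such as `coreg_pre` and `segmentation`, can run in parallel when a multi-process plugin is used.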

Now we need to specify where the input data can be found and where and how to save the output data.
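
Before running the workflow it can help to check that the file templates resolve to existing files. The sketch below mimics, in plain Python, how a template with a `{subject_id}` placeholder is filled in and joined to a base directory. It works on a temporary folder so it runs anywhere; the helper `resolve_template` and the subject ID "9999" are hypothetical, and this is not the SelectFiles implementation itself.

```python
import os
import tempfile


def resolve_template(base_directory, template, **fields):
    """Fill the placeholders of a template and join it to the base directory."""
    return os.path.join(base_directory, template.format(**fields))


# Same layout as the functional template used in this notebook
template = os.path.join("{subject_id}", "ep2d_RESTING_STATE", "func", "rest.nii")

with tempfile.TemporaryDirectory() as base:
    # Create an empty placeholder file for subject 3105
    expected = resolve_template(base, template, subject_id="3105")
    os.makedirs(os.path.dirname(expected))
    with open(expected, "w"):
        pass

    found = os.path.exists(expected)
    # "9999" is a hypothetical subject with no data on disk
    missing = os.path.exists(resolve_template(base, template, subject_id="9999"))

print("3105 found:", found)
print("9999 found:", missing)
```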

In [6]:
# Infosource - a function free node to iterate over the list of subject names
infosource = Node(IdentityInterface(fields=['subject_id', 'task_name']),
                  name="infosource")
infosource.iterables = [('subject_id', subject_list),
                        ('task_name', task_list)]

# SelectFiles - to grab the data (alternative to DataGrabber)
anat_file = opj('{subject_id}', 'ep2d_RESTING_STATE', 'anat', 'anat.nii')
func_file = opj('{subject_id}', 'ep2d_RESTING_STATE', 'func',
                'rest.nii')

templates = {'anat': anat_file,
             'func': func_file}
selectfiles = Node(SelectFiles(templates,
                               base_directory=opj(experiment_dir, working_dir, 'PPMI')),
                   name="selectfiles")

# Datasink - creates output folder for important outputs
datasink = Node(DataSink(base_directory=experiment_dir,
                         container=output_dir),
                name="datasink")

## Use the following DataSink output substitutions
substitutions = [('_subject_id_', ''),
                 ('_task_name_', '/task-'),
                 ('_fwhm_', 'fwhm-'),
                 ('_roi', ''),
                 ('_mcf', ''),
                 ('_st', ''),
                 ('_flirt', ''),
                 ('.nii_mean_reg', '_mean'),
                 ('.nii.par', '.par'),
                 ]
subjFolders = [('fwhm-%s/' % f, 'fwhm-%s_' % f) for f in fwhm]
substitutions.extend(subjFolders)
datasink.inputs.substitutions = substitutions
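
To our understanding, DataSink applies these substitutions as plain string replacements, one after the other, on each output path. The sketch below reproduces that behaviour in plain Python so the effect of the list can be inspected without running the workflow. The input paths are hypothetical examples modelled on the folder names nipype generates, and `apply_substitutions` is a helper written for this illustration only.

```python
fwhm = [4, 8]

substitutions = [('_subject_id_', ''),
                 ('_task_name_', '/task-'),
                 ('_fwhm_', 'fwhm-'),
                 ('_roi', ''),
                 ('_mcf', ''),
                 ('_st', ''),
                 ('_flirt', ''),
                 ('.nii_mean_reg', '_mean'),
                 ('.nii.par', '.par'),
                 ]
substitutions.extend(('fwhm-%s/' % f, 'fwhm-%s_' % f) for f in fwhm)


def apply_substitutions(path, subs):
    """Apply each (old, new) pair in order as a plain string replacement."""
    for old, new in subs:
        path = path.replace(old, new)
    return path


# Hypothetical output paths before renaming
raw_paths = [
    "preproc/_subject_id_3105_task_name_rs/_fwhm_4/srest_roi_mcf_st_flirt.nii",
    "preproc/_subject_id_3105_task_name_rs/rest_roi.nii.par",
    "preproc/_subject_id_3105_task_name_rs/rest_roi_mcf.nii_mean_reg_flirt.nii.gz",
]
clean_paths = [apply_substitutions(p, substitutions) for p in raw_paths]
for raw, clean in zip(raw_paths, clean_paths):
    print(raw, "->", clean)
```

Because the replacements are plain substrings, a short pattern such as `_st` would also alter any other file or folder name that happens to contain it, so the order and specificity of the entries matter.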

### Main preprocessing workflow

In [7]:
# Create a preprocessing workflow
preproc = Workflow(name='preproc')
preproc.base_dir = opj(experiment_dir, working_dir)

# Connect all components of the preprocessing workflow
preproc.connect([(infosource, selectfiles, [('subject_id', 'subject_id'),
                                            ('task_name', 'task_name')]),
                 (selectfiles, extract, [('func', 'in_file')]),
                 (extract, mcflirt, [('roi_file', 'in_file')]),
                 (mcflirt, slicetimer, [('out_file', 'in_file')]),

                 (selectfiles, coregwf, [('anat', 'bet_anat.in_file'),
                                         ('anat', 'coreg_bbr.reference')]),
                 (mcflirt, coregwf, [('mean_img', 'coreg_pre.in_file'),
                                     ('mean_img', 'coreg_bbr.in_file'),
                                     ('mean_img', 'applywarp_mean.in_file')]),
                 (slicetimer, coregwf, [('slice_time_corrected_file', 'applywarp.in_file')]),
                 
                 (coregwf, smooth, [('applywarp.out_file', 'in_files')]),

                 (mcflirt, datasink, [('par_file', 'preproc.@par')]),
                 (smooth, datasink, [('smoothed_files', 'preproc.@smooth')]),
                 (coregwf, datasink, [('applywarp_mean.out_file', 'preproc.@mean')]),

                 (coregwf, art, [('applywarp.out_file', 'realigned_files')]),
                 (mcflirt, art, [('par_file', 'realignment_parameters')]),

                 (coregwf, datasink, [('coreg_bbr.out_matrix_file', 'preproc.@mat_file'),
                                      ('bet_anat.out_file', 'preproc.@brain')]),
                 (art, datasink, [('outlier_files', 'preproc.@outlier_files'),
                                  ('plot_files', 'preproc.@plot_files')]),
                 ])
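
Because `infosource` and `smooth` both carry iterables, the workflow is expanded into several branches when it runs. The following sketch counts those branches with plain Python: nodes upstream of `smooth` run once per subject and task combination, while `smooth` and everything downstream of it run once per combination and smoothing width. The second subject ID is hypothetical and only added to make the expansion visible.

```python
from itertools import product

subject_list = ["3105", "3107"]   # "3107" is a hypothetical second subject
task_list = ["rs"]
fwhm = [4, 8]

# One branch per (subject, task) pair for nodes that come before smoothing
upstream_runs = list(product(subject_list, task_list))

# Smoothing is repeated for every width within each branch
smoothing_runs = [(s, t, f) for (s, t) in upstream_runs for f in fwhm]

# Working directory names follow the pattern visible in the nipype logs
folder_names = [f"_subject_id_{s}_task_name_{t}" for s, t in upstream_runs]

print(len(upstream_runs), "upstream branches:", folder_names)
print(len(smoothing_runs), "smoothing runs:", smoothing_runs)
```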

In [8]:
# DEBUG MODE:
preproc.config['execution'] = {'stop_on_first_rerun': 'False',
                               'hash_method': 'timestamp'}
from nipype import config, logging
config.enable_debug_mode()
logging.update_logging(config)

### Let's run it!

In [9]:
preproc.run('MultiProc', plugin_args={'n_procs': 4})

171217-18:54:05,788 workflow DEBUG:
	 MultiProcPlugin starting 4 threads in pool
171217-18:54:05,804 workflow DEBUG:
	 Creating flat graph for workflow: preproc
171217-18:54:05,818 workflow DEBUG:
	 expanding workflow: preproc
171217-18:54:05,819 workflow DEBUG:
	 processing node: preproc.infosource
171217-18:54:05,821 workflow DEBUG:
	 processing node: preproc.selectfiles
171217-18:54:05,822 workflow DEBUG:
	 processing node: preproc.extract
171217-18:54:05,823 workflow DEBUG:
	 processing node: preproc.mcflirt
171217-18:54:05,824 workflow DEBUG:
	 processing node: preproc.slicetimer
171217-18:54:05,824 workflow DEBUG:
	 processing node: preproc.coregwf
171217-18:54:05,825 workflow DEBUG:
	 in: connections-> [('anat', 'bet_anat.in_file'), ('anat', 'coreg_bbr.reference')]
171217-18:54:05,826 workflow DEBUG:
	 in: ('anat', 'bet_anat.in_file')
171217-18:54:05,839 workflow DEBUG:
	 in edges: preproc.selectfiles anat coregwf.bet_anat in_file
171217-18:54:05,840 workflow DEBUG:
	 disconnect

because the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.



171217-19:05:29,124 workflow DEBUG:
	 Needed files: /Volumes/myFatDrive/ada_project/workingdir/preproc/_subject_id_3105_task_name_rs/art/art.rest_roi_mcf_st_flirt_outliers.txt;/Volumes/myFatDrive/ada_project/workingdir/preproc/_subject_id_3105_task_name_rs/art/global_intensity.rest_roi_mcf_st_flirt.txt;/Volumes/myFatDrive/ada_project/workingdir/preproc/_subject_id_3105_task_name_rs/art/norm.rest_roi_mcf_st_flirt.txt;/Volumes/myFatDrive/ada_project/workingdir/preproc/_subject_id_3105_task_name_rs/art/stats.rest_roi_mcf_st_flirt.txt;/Volumes/myFatDrive/ada_project/workingdir/preproc/_subject_id_3105_task_name_rs/art/plot.rest_roi_mcf_st_flirt.svg;/Volumes/myFatDrive/ada_project/workingdir/preproc/_subject_id_3105_task_name_rs/art/mask.rest_roi_mcf_st_flirt.nii;/Volumes/myFatDrive/ada_project/workingdir/preproc/_subject_id_3105_task_name_rs/art/mask.rest_roi_mcf_st_flirt.mat;/Volumes/myFatDrive/ada_project/workingdir/preproc/coregwf/_subject_id_3105_task_name_rs/applywarp/rest_roi_mcf_st_

<networkx.classes.digraph.DiGraph at 0x111844748>

### Normalization workflow

In [10]:
# Gunzip - unzip the structural image
gunzip_struct = Node(Gunzip(), name="gunzip_struct")

# Gunzip - unzip the mean functional image (the 'con' input selected below)
gunzip_con = MapNode(Gunzip(), name="gunzip_con",
                     iterfield=['in_file'])

171217-19:10:52,720 workflow DEBUG:
	 adding multipath trait: in_file
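
The Gunzip nodes are needed because SPM expects uncompressed `.nii` files, while some of the inputs are stored as `.nii.gz`. As a rough standalone illustration of what such a step does, the sketch below decompresses a gzip file with the standard library. The payload is a placeholder rather than real image data, and `gunzip_file` is a helper written for this example, not the nipype interface.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(src, dst):
    """Decompress the gzip file `src` into `dst` and return `dst`."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    # Write a small compressed placeholder file
    zipped = os.path.join(tmp, "anat.nii.gz")
    payload = b"placeholder bytes standing in for NIfTI data"
    with gzip.open(zipped, "wb") as f:
        f.write(payload)

    # Decompress it and read the result back
    unzipped = gunzip_file(zipped, os.path.join(tmp, "anat.nii"))
    with open(unzipped, "rb") as f:
        restored = f.read()

print("round trip ok:", restored == payload)
```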


In [11]:
# Normalize - normalizes functional and structural images to the MNI template
normalize = Node(Normalize12(jobtype='estwrite',
                             tpm=template,
                             write_voxel_sizes=[1, 1, 1]),
                 name="normalize")

In [12]:
# Specify Normalization-Workflow & Connect Nodes
normflow = Workflow(name='normflow')
normflow.base_dir = opj(experiment_dir, working_dir)

# Connect the SPM normalization components
normflow.connect([(gunzip_struct, normalize, [('out_file', 'image_to_align')]),
                  (gunzip_con, normalize, [('out_file', 'apply_to_files')]),
                  ])

171217-19:10:52,762 workflow DEBUG:
	 (normflow.gunzip_struct, normflow.normalize): No edge data
171217-19:10:52,764 workflow DEBUG:
	 (normflow.gunzip_struct, normflow.normalize): new edge data: {'connect': [('out_file', 'image_to_align')]}
171217-19:10:52,766 workflow DEBUG:
	 (normflow.gunzip_con, normflow.normalize): No edge data
171217-19:10:52,767 workflow DEBUG:
	 (normflow.gunzip_con, normflow.normalize): new edge data: {'connect': [('out_file', 'apply_to_files')]}


In [13]:
# Infosource - a function free node to iterate over the list of subject names
infosource = Node(IdentityInterface(fields=['subject_id']),
                  name="infosource")
infosource.iterables = [('subject_id', subject_list)]

# SelectFiles - to grab the data (alternative to DataGrabber)
anat_file = opj(working_dir, 'PPMI', '{subject_id}', 'ep2d_RESTING_STATE', 'anat', 'zipped', 'anat.nii.gz')
con_file = opj(output_dir, 'preproc', '{subject_id}', 'task-rs',
                        'rest_mean.nii.gz')


templates = {'anat': anat_file,
             'con': con_file,
             }
selectfiles = Node(SelectFiles(templates,
                               base_directory=experiment_dir),
                   name="selectfiles")

# Datasink - creates output folder for important outputs
datasink = Node(DataSink(base_directory=experiment_dir,
                         container=output_dir),
                name="datasink")

# Use the following DataSink output substitutions
substitutions = [('_subject_id_', '')]
datasink.inputs.substitutions = substitutions

# Connect SelectFiles and DataSink to the workflow
normflow.connect([(infosource, selectfiles, [('subject_id', 'subject_id')]),
                  (selectfiles, gunzip_struct, [('anat', 'in_file')]),
                  (selectfiles, gunzip_con, [('con', 'in_file')]),
                  (normalize, datasink, [('normalized_files',
                                          'normalized.@files'),
                                         ('normalized_image',
                                          'normalized.@image'),
                                         ('deformation_field',
                                          'normalized.@field'),
                                         ]),
                  ])

171217-19:10:52,809 workflow DEBUG:
	 (normflow.infosource, normflow.selectfiles): No edge data
171217-19:10:52,810 workflow DEBUG:
	 (normflow.infosource, normflow.selectfiles): new edge data: {'connect': [('subject_id', 'subject_id')]}
171217-19:10:52,812 workflow DEBUG:
	 (normflow.selectfiles, normflow.gunzip_struct): No edge data
171217-19:10:52,813 workflow DEBUG:
	 (normflow.selectfiles, normflow.gunzip_struct): new edge data: {'connect': [('anat', 'in_file')]}
171217-19:10:52,814 workflow DEBUG:
	 (normflow.selectfiles, normflow.gunzip_con): No edge data
171217-19:10:52,816 workflow DEBUG:
	 (normflow.selectfiles, normflow.gunzip_con): new edge data: {'connect': [('con', 'in_file')]}
171217-19:10:52,817 workflow DEBUG:
	 (normflow.normalize, normflow.datasink): No edge data
171217-19:10:52,818 workflow DEBUG:
	 (normflow.normalize, normflow.datasink): new edge data: {'connect': [('normalized_files', 'normalized.@files'), ('normalized_image', 'normalized.@image'), ('deformation_

In [14]:
# DEBUG MODE:
normflow.config['execution'] = {'stop_on_first_rerun': 'False',
                                'hash_method': 'timestamp'}
from nipype import config, logging
config.enable_debug_mode()
logging.update_logging(config)

In [15]:
normflow.run('MultiProc', plugin_args={'n_procs': 8})

171217-19:10:52,872 workflow DEBUG:
	 MultiProcPlugin starting 8 threads in pool
171217-19:10:52,917 workflow DEBUG:
	 Creating flat graph for workflow: normflow
171217-19:10:52,931 workflow DEBUG:
	 expanding workflow: normflow
171217-19:10:52,933 workflow DEBUG:
	 processing node: normflow.infosource
171217-19:10:52,934 workflow DEBUG:
	 processing node: normflow.selectfiles
171217-19:10:52,936 workflow DEBUG:
	 processing node: normflow.gunzip_con
171217-19:10:52,949 workflow DEBUG:
	 processing node: normflow.gunzip_struct
171217-19:10:52,950 workflow DEBUG:
	 processing node: normflow.normalize
171217-19:10:52,951 workflow DEBUG:
	 processing node: normflow.datasink
171217-19:10:52,953 workflow DEBUG:
	 finished expanding workflow: normflow
171217-19:10:52,954 workflow INFO:
	 Workflow normflow settings: ['check', 'execution', 'logging']
171217-19:10:52,962 workflow DEBUG:
	 PE: expanding iterables
171217-19:10:52,965 workflow DEBUG:
	 Detected iterable nodes [normflow.infosource]

<networkx.classes.digraph.DiGraph at 0x11197ffd0>

### Checking results
At this point we can look at the resulting images.

In the following, we look at the motion parameters and check whether the artifact detection algorithm was able to identify the outliers.
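
As a starting point for inspecting the motion parameters, the sketch below parses text in the layout of an MCFLIRT `.par` file (three rotations in radians followed by three translations in mm per volume) and computes a simple framewise displacement. Note that this is a common summary measure in which rotations are converted to mm on a 50 mm sphere; it is not the composite norm that ArtifactDetect computes. The motion values are made up, and the 2 mm threshold is borrowed from `norm_threshold` purely for illustration.

```python
def parse_par(text):
    """Parse rows of six whitespace-separated motion parameters."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]


def framewise_displacement(params, radius_mm=50.0):
    """Sum of absolute parameter changes between consecutive volumes.

    Rotations (first three columns, radians) are converted to mm by
    multiplying with `radius_mm`; the first volume gets a value of 0.
    """
    fd = [0.0]
    for prev, curr in zip(params, params[1:]):
        rot = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3])) * radius_mm
        trans = sum(abs(c - p) for p, c in zip(prev[3:], curr[3:]))
        fd.append(rot + trans)
    return fd


# Made-up motion parameters for four volumes
par_text = """0.000 0.000 0.000 0.00 0.00 0.00
0.001 0.000 0.000 0.10 0.00 0.00
0.001 0.000 0.020 0.10 1.50 0.00
0.001 0.000 0.020 0.10 1.50 0.00
"""

fd = framewise_displacement(parse_par(par_text))
threshold_mm = 2.0  # illustrative threshold
outliers = [i for i, value in enumerate(fd) if value > threshold_mm]

print("framewise displacement (mm):", [round(v, 3) for v in fd])
print("volumes above threshold:", outliers)
```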

## Next steps:
* Time courses extraction
* Preprocess all subjects 
* Machine learning

Have a look at the README for more details!