# FNGS Dataset Analysis
Here, we detail a basic framework for running the FNGS pipeline on every subject in a dataset at once. For the purposes of this tutorial, we assume that each subject has one anatomical and one resting-state scan per scan session.

## Subject List Maintenance
To begin, we collect lists of all the resting-state and anatomical NIfTI files and save each list to a text file. In your terminal (note that we are not in Python yet), run something like the following:

In [None]:
%%bash
find ../tests/data/ -maxdepth 1 -mindepth 1 -name "*fMRI*.nii*" | sort > testrest.txt
cat testrest.txt
find ../tests/data/ -maxdepth 1 -mindepth 1 -name "*MPRAGE*.nii*" | sort > testanat.txt
cat testanat.txt

Here, we pass `find` the directory where we expect the rest files to be, use `-mindepth 1 -maxdepth 1` to restrict the search to entries directly inside that directory, and give `-name` a pattern containing a keyword (here `fMRI`) that appears in our functional filenames but in no other filenames. We repeat this for our structural scans (keyword `MPRAGE`). The two specification files must match up row-wise, since the functional image on each line is analyzed with the structural scan on the same line of the other file, and both must come from the same subject. Because `find` returns files in no guaranteed order, we pipe its output through `sort`, which keeps the rows aligned as long as the functional and structural filenames sort in the same subject order.
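If you would rather build the specification files from Python, the same listing can be sketched with `pathlib`. This is a minimal sketch, not part of FNGS: the helper name `write_spec_file` is hypothetical, and it assumes the files sit directly inside the given directory, as in the `find -mindepth 1 -maxdepth 1` call. Sorting is what keeps the functional and structural lists aligned row by row.

```python
from pathlib import Path


def write_spec_file(directory, pattern, outfile):
    """List files directly inside `directory` whose names match the glob
    `pattern`, sort them, and write them to `outfile`, one path per line.

    Roughly mirrors `find DIR -mindepth 1 -maxdepth 1 -name PATTERN | sort`,
    except that only regular files are kept.
    """
    # A pattern without "/" is matched only against entries of `directory`
    # itself, so this search is not recursive.
    paths = sorted(str(p) for p in Path(directory).glob(pattern) if p.is_file())
    Path(outfile).write_text("".join(path + "\n" for path in paths))
    return paths
```

For example, `write_spec_file("../tests/data/", "*fMRI*.nii*", "testrest.txt")` followed by the same call with `"*MPRAGE*.nii*"` and `"testanat.txt"` would produce the two specification files.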

#### Expected Alternate Dataset Organization Methods

While processing brain graphs, you will probably come across several common dataset organization hierarchies. Here, we detail two common ones (from the CoRR dataset) and show how we might handle them. Suppose our data were organized as follows:

    +-- /BNU_1/ # directory where we have all our subjects  
    |    +-- 0025864/ # subject directory  
    |        +-- session_1/  
    |             +-- rest/BNU1_0025864_1_rest.nii.gz   
    |             +-- anat/BNU1_0025864_1_anat.nii.gz  
    |        +-- session_2/  
    |             +-- rest/BNU1_0025864_2_rest.nii.gz 
    |             +-- anat/BNU1_0025864_2_anat.nii.gz
    |    +-- 0021002/ and so on for all subjects...  

This organization requires only small changes to the procedure above. Because every session contains both a rest scan and an anatomical scan, the call is again very simple. Both kinds of files sit four levels below `./BNU_1/` (subject, session, scan-type directory, file), so we set both `-mindepth` and `-maxdepth` to 4:

In [None]:
%%bash
find ./BNU_1/ -maxdepth 4 -mindepth 4 -name "*rest*.nii.gz" | sort > bnurest.txt
find ./BNU_1/ -maxdepth 4 -mindepth 4 -name "*anat*.nii.gz" | sort > bnuanat.txt
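Because the two lists are matched only by row, it is worth checking the pairing before launching any processing. The sketch below assumes the CoRR-style naming shown in the tree above (`<site>_<subject>_<session>_<rest|anat>.nii.gz`, e.g. `BNU1_0025864_1_rest.nii.gz`); the regular expression and the helper names `scan_key` and `check_pairing` are our own assumptions, so adapt them to your filenames.

```python
import re
from pathlib import Path

# Assumed naming convention: <site>_<subject>_<session>_<modality>.nii.gz,
# e.g. BNU1_0025864_1_rest.nii.gz -> subject "0025864", session "1".
NAME_RE = re.compile(
    r"^[^_]+_(?P<subject>[^_]+)_(?P<session>[^_]+)_(rest|anat)\.nii(\.gz)?$"
)


def scan_key(path):
    """Return the (subject, session) pair encoded in a scan's filename."""
    match = NAME_RE.match(Path(path).name)
    if match is None:
        raise ValueError("unexpected filename: " + path)
    return match.group("subject"), match.group("session")


def check_pairing(restfile, anatfile):
    """Return the 1-indexed line numbers where the two spec files disagree."""
    with open(restfile) as rf, open(anatfile) as af:
        rest = [line.strip() for line in rf if line.strip()]
        anat = [line.strip() for line in af if line.strip()]
    if len(rest) != len(anat):
        raise ValueError("%d rest scans but %d anatomical scans" % (len(rest), len(anat)))
    return [i for i, (r, a) in enumerate(zip(rest, anat), start=1)
            if scan_key(r) != scan_key(a)]
```

An empty list from `check_pairing("bnurest.txt", "bnuanat.txt")` means every row pairs a rest scan with the anatomical scan of the same subject and session.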

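A second common organization, used by the NKI data, stores two resting-state scans acquired with different TRs in each session:

    +-- /NKI/ # directory where we have all our subjects  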
    |    +-- 0021001/ # subject directory  
    |        +-- session_1/  
    |             +-- rest_645/NKI_0021001_1_rest.nii.gz
    |             +-- rest_1400/NKI_0021001_1_rest.nii.gz
    |             +-- anat/NKI_0021001_anat.nii.gz  
    |        +-- session_2/  
    |             +-- rest_645/NKI_0021001_2_rest.nii.gz
    |             +-- rest_1400/NKI_0021001_2_rest.nii.gz
    |    +-- 0021002/ and so on for all subjects...  

Obtaining these files takes a more complex command. Note that here, each subject has only one anatomical scan, stored in `session_1`, even though there are two scanning sessions. The anatomical scan images only the structural properties of the brain, and in a typical fMRI study, where the sessions are conducted within a short time window, we would not expect any significant structural changes in healthy subjects; experimenters therefore often collect only one anatomical scan per subject, regardless of the number of sessions. With data organized like this, it takes a bit more work to make the functional and anatomical specification files match up properly. We might begin with a call like this:

In [None]:
%%bash
find ./NKI/ -maxdepth 4 -mindepth 4 -wholename "*rest_645*rest.nii.gz" | sort > nkirest.txt

Here, we use the `-wholename` option of `find` instead of `-name`. While `-name` matches only the filename itself, `-wholename` matches the pattern against the entire path, which lets us filter on directory names as well (`-wholename` is a GNU extension; the POSIX equivalent is `-path`). Since we only want the scans acquired with a TR of 645, we require `rest_645` to appear in the path. Then, to build the anatomical specification file, we look at the directory structure and apply substitutions to a copy of the rest file that turn each rest path into the corresponding anatomical path. This guarantees that the subjects match, since we only substitute within each subject's portion of the path and never between subjects. I have found this approach simplest:

In [None]:
%%bash
cp nkirest.txt nkianat.txt
# First, substitute the session in which the anatomical scan is found.
# Here, each subject's single anatomical scan lives in the session_1
# directory, so rest scans from session_2 must also point to session_1.
sed -i 's/session_2/session_1/g' nkianat.txt
# Then, substitute any keywords that differ between rest scans and
# anatomical scans. (`sed -i` as written is GNU sed; BSD/macOS sed
# needs `sed -i ''`.)
sed -i 's/rest_645/anat/g' nkianat.txt # replace the directory name
sed -i 's/1_rest/anat/g' nkianat.txt # replace the filename for sess_1
sed -i 's/2_rest/anat/g' nkianat.txt # replace the filename for sess_2
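The same filtering and substitution can be sketched in Python, which makes it easy to confirm that every derived anatomical path exists before processing. This sketch assumes the NKI layout shown above (one anatomical scan per subject, stored under `session_1/anat/` as `<site>_<subject>_anat.nii.gz`); the helpers `select_rest` and `anat_for_rest` are hypothetical. Like `find -wholename`, `fnmatch` lets `*` match across `/`, so the pattern is tested against the whole path.

```python
import re
from fnmatch import fnmatchcase

# Assumed layout:
#   <root>/<subject>/session_<n>/rest_645/<site>_<subject>_<n>_rest.nii.gz
REST_RE = re.compile(
    r"^(?P<root>.*)/(?P<subject>[^/]+)/session_[^/]+/rest_645/"
    r"(?P<site>[^_/]+)_(?P=subject)_[^_/]+_rest\.nii\.gz$"
)


def select_rest(paths, pattern="*rest_645*rest.nii.gz"):
    """Keep paths whose *full* path matches the pattern (like find -wholename)."""
    return [p for p in paths if fnmatchcase(p, pattern)]


def anat_for_rest(rest_path):
    """Map a rest scan from any session to the subject's session_1 anatomical scan."""
    match = REST_RE.match(rest_path)
    if match is None:
        raise ValueError("unexpected rest path: " + rest_path)
    return "{root}/{subject}/session_1/anat/{site}_{subject}_anat.nii.gz".format(
        **match.groupdict()
    )
```

After building the anatomical list with `anat_for_rest`, a quick `[a for a in anats if not os.path.exists(a)]` lists any paths the substitution got wrong.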

These are just a few of the directory structures you might come across, but nearly every dataset I have analyzed follows one of these two structures (or a similar one), so hopefully organizing your structural and functional specification files won't take much manipulation. If you have any questions, feel free to open an issue or email me at ericwb95@gmail.com.

## Multigraph Processing

### Python Scripts
Now that we have our two specification files, we are ready for multigraph processing. Here, we walk through the `fngs_multigraph.py` script found in this directory, beginning with its dependencies:

In [None]:
# fngs_pipeline runs one subject; Process lets us isolate each run
from ndmg.scripts.fngs_pipeline import fngs_pipeline
from multiprocessing import Process

Nothing too out of the ordinary here: we import the pipeline and the `multiprocessing` module, which lets us spawn a separate process for each subject; when that process exits, all of the memory it used is released. This matters because some packages used for quality control (e.g., matplotlib) do not reliably free their memory between runs, so running each subject in its own process, which is torn down in its entirety upon completion, avoids this problem. Next, we define the inputs that are shared across all subjects:


In [None]:
atlas = "/path/to/atlas.nii.gz"
atlas_brain = "/path/to/atlas/brain.nii.gz"
mask = "/path/to/atlas/mask.nii.gz"
labels = ["/path/to/label/in/atlas/brainspace.nii.gz"]


We use the same atlas, atlas brain, mask, and labeled atlases for every subject because, as detailed in the single-subject tutorial, all subjects must be registered to the same brain space for downstream inferences from their timeseries to be accurate. We then specify the resting-state and anatomical specification files computed above, along with the directory where outputs will be written:

In [None]:
restfile = 'bnurest.txt'
anatfile = 'bnuanat.txt'
outdir = '/path/to/outdir'

We are then ready to spawn one process per scan session:

In [None]:
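# Read the specification files; line i of the rest file must belong to
# the same subject and session as line i of the anat file.
with open(restfile) as rf, open(anatfile) as af:
    restpaths = [line.strip() for line in rf if line.strip()]
    anatpaths = [line.strip() for line in af if line.strip()]

if len(restpaths) != len(anatpaths):
    raise ValueError("rest and anat specification files differ in length")

for rest, anat in zip(restpaths, anatpaths):
    # one process per scan, so all of its memory is released on exit
    p = Process(target=fngs_pipeline,
                args=(rest, anat, atlas, atlas_brain, mask, labels, outdir),
                kwargs={'clean': False, 'fmt': 'graphml'})
    p.start()
    p.join()
    # exceptions raised in the child are not re-raised here, so check
    # the exit code rather than wrapping start()/join() in try/except
    if p.exitcode != 0:
        print("processing failed for {} (exit code {})".format(rest, p.exitcode))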
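Before committing hours of compute, you may want to dry-run the one-process-per-scan loop with a cheap stand-in for the pipeline. The sketch below is our own illustration, not FNGS code: `run_isolated` and `fake_pipeline` are hypothetical names, and it requests the `fork` start method explicitly (so it assumes Linux or macOS) to avoid re-importing the script in each child. It also shows that an exception raised inside a child is reported through `Process.exitcode` rather than re-raised in the parent, so a `try`/`except` around `start()`/`join()` will not catch pipeline failures.

```python
import multiprocessing as mp


def run_isolated(target, jobs):
    """Run target(*args) once per args tuple, each in a fresh child process.

    Returns (args, exitcode) pairs; a nonzero exit code means the job failed.
    Uses the 'fork' start method, which is unavailable on Windows.
    """
    ctx = mp.get_context("fork")
    results = []
    for args in jobs:
        p = ctx.Process(target=target, args=args)
        p.start()
        p.join()  # wait, so only one job occupies memory at a time
        results.append((args, p.exitcode))
    return results


def fake_pipeline(rest, anat):
    """Hypothetical stand-in for fngs_pipeline that fails on 'bad' inputs."""
    if "bad" in rest:
        raise RuntimeError("simulated failure for " + rest)


if __name__ == "__main__":
    jobs = [("sub1_rest.nii.gz", "sub1_anat.nii.gz"),
            ("bad_rest.nii.gz", "bad_anat.nii.gz")]
    for (rest, anat), code in run_isolated(fake_pipeline, jobs):
        print(rest, "ok" if code == 0 else "failed (exit code %d)" % code)
```

The failing job's traceback is printed by the child process itself, while the parent only sees exit code 1 and carries on with the next job.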

## Running from Bash
Because FNGS provides command-line entry points, the pipeline can alternatively be run for multigraph processing from a shell script:

In [None]:
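%%writefile fngs_multigraph.sh
#!/usr/bin/env bash
# fngs_multigraph.sh
# $1 = restfile
# $2 = anatfile
# $3 = /path/to/outdir

atlas='/path/to/atlas.nii.gz'
atlas_brain='/path/to/atlas/brain.nii.gz'
atlas_mask='/path/to/atlas/mask.nii.gz'
label='/path/to/labels.nii.gz'

# Read the two specification files in lockstep: file descriptor 4 holds
# the rest scans and file descriptor 5 the anatomical scans, so each
# iteration pairs line i of one file with line i of the other.
while read -r rest <&4 && read -r anat <&5; do
    fngs_pipeline "$rest" "$anat" "$atlas" "$atlas_brain" "$atlas_mask" "$3" "$label" -fmt graphml
done 4<"$1" 5<"$2"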

We can then execute this from the command line:

In [None]:
%%bash
bash fngs_multigraph.sh restfiles.txt anatfiles.txt /path/to/outputdir
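To see exactly which commands the shell loop would issue before launching it, a dry run can be sketched in Python. The argument order below simply mirrors the `fngs_pipeline` call in `fngs_multigraph.sh`; we have not checked it against the CLI's own documentation, so treat it as an assumption. The helper names `build_commands` and `dry_run` are hypothetical.

```python
import shlex


def build_commands(restfile, anatfile, outdir, atlas, atlas_brain, atlas_mask, label):
    """Return one fngs_pipeline argument list per (rest, anat) pair,
    in the same order as the call in fngs_multigraph.sh."""
    with open(restfile) as rf, open(anatfile) as af:
        rest = [line.strip() for line in rf if line.strip()]
        anat = [line.strip() for line in af if line.strip()]
    if len(rest) != len(anat):
        raise ValueError("specification files have different lengths")
    return [["fngs_pipeline", r, a, atlas, atlas_brain, atlas_mask,
             outdir, label, "-fmt", "graphml"]
            for r, a in zip(rest, anat)]


def dry_run(*args):
    """Return the commands as shell-quoted strings without running them."""
    return [shlex.join(cmd) for cmd in build_commands(*args)]
```

Printing the output of `dry_run("restfiles.txt", "anatfiles.txt", "/path/to/outputdir", atlas, atlas_brain, atlas_mask, label)` shows one line per subject and session, which makes a misaligned pair easy to spot. `shlex.join` requires Python 3.8 or later.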