# ACERTA-ABIDE

### Introduction

This notebook walks through the steps of the project in order. Except for the results tables near the end, each step runs a script that lives outside the notebook.

### Downloading the ABIDE dataset

The original authors provided a script to download the dataset from Amazon S3. The download can resume if it is interrupted, and the script will not download the data again once it has finished.

In [None]:
! python download_abide.py
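
The skip-if-finished behaviour described above can be illustrated with a minimal sketch. This is not the code in `download_abide.py`: the helper name `download_if_missing`, its `fetch` parameter, and the example URL and file path are all hypothetical. The sketch only covers skipping files that are already present, which is what lets a rerun after an interruption continue with the remaining files. It does not resume a partially written file.

```python
import os
import tempfile
import urllib.request


def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists.

    Returns True if a download happened, False if it was skipped.
    Hypothetical helper, for illustration only.
    """
    if os.path.exists(dest):
        return False  # already downloaded on an earlier run
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    # Write to a temporary name first so that an interrupted transfer
    # never leaves a truncated file that looks complete.
    tmp = dest + ".part"
    fetch(url, tmp)
    os.replace(tmp, dest)
    return True


# Stand-in for the real transfer, so the demonstration needs no network.
def fake_fetch(url, filename):
    with open(filename, "w") as fh:
        fh.write("placeholder")


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "functionals", "subject_0001.1D")  # made-up path
url = "https://example.org/subject_0001.1D"  # made-up URL
first = download_if_missing(url, target, fetch=fake_fetch)
second = download_if_missing(url, target, fetch=fake_fetch)
print(first, second)
```

The second call returns `False` because the file is already in place, so running the same loop over all files again only fetches what is still missing.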

### Preparing the datasets

The data can be prepared in several ways, and the scripts are set up so that multiple prepared datasets can exist side by side.
The first argument selects either the whole dataset or a subset of it.
The final argument is a derivative, which tells the script which labels to use. The default is cc200, which both this author and the original authors used. The other derivatives may or may not work currently.

In [None]:
# The commands below are examples of datasets that can be prepared. The uncommented lines are the ones used in this author's project.

# Possible arguments: --whole, --threshold, --male, --leave-site-out
# Possible derivatives: cc200, aal, ez, ho, tt, hosenbach160

#!python prepare_data.py --threshold cc200
#!python prepare_data.py --male cc200
! python prepare_data.py --whole cc200
! python prepare_data.py --leave-site-out cc200


### Training the models

The commands below run the script that trains the model, which may take a significant amount of time. The arguments and derivatives are the same as in the previous step. A given argument and derivative combination can only be trained after that same combination has been prepared in the previous section.

This author ran them on a system with a Ryzen 5 3600XT 6-core processor, 32 GB of DDR4 2600 MHz RAM, and an RTX 3060 with 12 GB of VRAM. The whole-dataset model took approximately 2.5 hours to train. The leave-site-out model took approximately 27 hours.

In [None]:
# The commands below are examples of how to train the models. The uncommented lines are the ones used in this author's project.

# Possible arguments: --whole, --threshold, --male, --leave-site-out
# Possible derivatives: cc200, aal, ez, ho, tt, hosenbach160

#!python nn.py --threshold cc200
#!python nn.py --male cc200
! python nn.py --whole cc200
! python nn.py --leave-site-out cc200
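
Because training depends on a matching preparation step, it can help to generate both commands from a single argument and derivative pair. The following is a minimal sketch under that assumption: the helper names `pipeline_commands` and `run_pipeline` are hypothetical, and only the script names and flags come from the cells above.

```python
import subprocess

# Dataset arguments listed in the cells above.
VARIANTS = ("whole", "threshold", "male", "leave-site-out")


def pipeline_commands(variant, derivative="cc200"):
    """Return the prepare and train commands, in the order they must run."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown dataset argument: {variant!r}")
    flag = "--" + variant
    return [
        ["python", "prepare_data.py", flag, derivative],  # must come first
        ["python", "nn.py", flag, derivative],
    ]


def run_pipeline(variant, derivative="cc200"):
    """Run preparation and then training for one combination."""
    for cmd in pipeline_commands(variant, derivative):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so training is never attempted after a failed preparation.
        subprocess.run(cmd, check=True)


# Only print the commands here; run_pipeline("whole") would execute them.
for cmd in pipeline_commands("whole"):
    print(" ".join(cmd))
```

Building both commands from the same pair keeps the prepared dataset and the trained model from drifting apart, for example by preparing `--male` and then training `--whole`.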

### Evaluating the models

This step reports the accuracy, precision, recall, F-score, sensitivity, and specificity of each model. It takes the same arguments and derivatives as above.

The results are saved to txt files that can be accessed later.
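
For reference, these metrics have standard definitions in terms of the confusion-matrix counts of a binary classifier. The sketch below uses those textbook definitions and assumes the F-score is the F1 score. Whether `nn_evaluate.py` computes them in exactly this way (for example, per fold and then averaged) is an assumption, and the counts in the example are made up.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from raw counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. Counts are assumed to be large
    enough that no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to sensitivity
    return {
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "prec": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),  # F1
        "sens": recall,
        "spec": tn / (tn + fp),
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=35, fn=15)
for name, value in metrics.items():
    print(f"{name:>6}: {value:.4f}")
```

Sensitivity and recall are the same quantity under these definitions, so the two columns always carry the same value.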

In [9]:
# The commands below are examples of how to evaluate the models. The uncommented lines are the ones used in this author's project.

# Possible arguments: --whole, --threshold, --male, --leave-site-out
# Possible derivatives: cc200, aal, ez, ho, tt, hosenbach160

! python nn_evaluate.py --whole cc200
! python nn_evaluate.py --leave-site-out cc200

2022-04-26 22:26:25.359502: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-04-26 22:26:25.384112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-04-26 22:26:25.384293: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-04-26 22:26:25.384637: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags

### Comparison models

These commands train a support vector machine (SVM) model and a random forest (RF) model as baselines for comparison with the models above.

In [20]:
!python svm.py --whole cc200
!python random_forest.py --whole cc200

exp               acc      prec    recall    fscore      sens      spec
-----------  --------  --------  --------  --------  --------  --------
cc200_whole  0.686034  0.687668  0.710147  0.698057  0.710147  0.661107
exp              acc      prec    recall    fscore      sens     spec
-----------  -------  --------  --------  --------  --------  -------
cc200_whole  0.64144  0.644651  0.674208  0.657564  0.674208  0.60825
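
To compare the two baselines numerically, the printed tables can be parsed back into dictionaries. The sketch below copies the two tables from the output above and assumes that the first one comes from `svm.py` and the second from `random_forest.py`, since that is the order in which the commands were run. The helper name `parse_results` is hypothetical.

```python
# Tables copied from the cell output above. Attributing the first to the
# SVM and the second to the random forest is an assumption based on the
# order of the commands.
SVM_OUTPUT = """\
exp               acc      prec    recall    fscore      sens      spec
-----------  --------  --------  --------  --------  --------  --------
cc200_whole  0.686034  0.687668  0.710147  0.698057  0.710147  0.661107
"""

RF_OUTPUT = """\
exp              acc      prec    recall    fscore      sens     spec
-----------  -------  --------  --------  --------  --------  -------
cc200_whole  0.64144  0.644651  0.674208  0.657564  0.674208  0.60825
"""


def parse_results(text):
    """Parse a whitespace-aligned results table into {exp: {metric: value}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = {}
    for line in lines[2:]:  # skip the header and the dashed separator
        fields = line.split()
        rows[fields[0]] = dict(zip(header[1:], map(float, fields[1:])))
    return rows


svm = parse_results(SVM_OUTPUT)["cc200_whole"]
rf = parse_results(RF_OUTPUT)["cc200_whole"]

# Positive differences mean the first table scored higher on that metric.
for metric in svm:
    print(f"{metric:>6}: {svm[metric] - rf[metric]:+.6f}")
```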


In [1]:
import os
import pandas as pd

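# Directory that holds the saved result files
filedir = os.path.join(os.getcwd(), "data")

# Collect the tab-separated .txt result files
txt_files = [f for f in os.listdir(filedir) if f.endswith(".txt")]

for t in txt_files:
    print(os.path.splitext(t)[0])  # file name without the .txt extension
    data = pd.read_csv(os.path.join(filedir, t), sep="\t")
    display(data)  # display() is provided by IPython/Jupyter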