<a href="https://colab.research.google.com/github/yecatstevir/teambrainiac/blob/main/source/DL/metrics/VisualizationCreation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visualization Playground
## For 3D Convolutional Neural Network on Group Brain fMRI

This notebook turns fMRI brain images from flat MATLAB files into 4D tensor objects for CNN training.

To start:
- Mount Google Drive in Colab, clone the fMRI repository locally, and create a path to AWS for saving and loading
- Select desired brain images by subject id, splitting into train, validation, and test sets

Note: Some additional data wrangling is needed to get the metric outputs from the Group3DCNN.ipynb notebook into the format used in this notebook.
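
As one illustration of that wrangling, the sketch below flattens a nested metrics dictionary of the shape printed in the Test Metrics section (`{'epoch_1': {'accuracy': [...], 'loss': [...]}, 'labels': ...}`) into a list of row dictionaries. The names `to_float` and `flatten_metrics` and the output keys are hypothetical, and the sketch assumes each per-epoch value is a list of plain numbers or of objects with an `.item()` method, as PyTorch tensors have. It is not the conversion that produced the CSV files read in this notebook.

```python
def to_float(value):
    """Return a plain float from a number or a tensor-like object with .item()."""
    return float(value.item()) if hasattr(value, "item") else float(value)


def flatten_metrics(metrics):
    """Flatten {'epoch_k': {'accuracy': [...], 'loss': [...]}} into row dicts.

    Top-level entries whose value is not a dict (for example 'labels') are skipped.
    """
    rows = []
    for epoch_name, epoch_metrics in metrics.items():
        if not isinstance(epoch_metrics, dict):
            continue
        pairs = zip(epoch_metrics["accuracy"], epoch_metrics["loss"])
        for index, (acc, loss) in enumerate(pairs):
            rows.append({
                "epoch": epoch_name,
                "index": index,
                "accuracy": to_float(acc),
                "loss": to_float(loss),
            })
    return rows


# Toy input with plain floats (hypothetical, trimmed to two entries)
example = {
    "epoch_1": {"accuracy": [0.6190, 0.7619], "loss": [0.8330, 0.4402]},
    "labels": [1, 1, 0],
}
rows = flatten_metrics(example)
```

Passing `rows` to `pd.DataFrame` gives one row per entry with `epoch`, `index`, `accuracy`, and `loss` columns, which is the long format Altair charts expect.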

## Mount Google Drive in Colab and Import Images

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/gdrive')  

Mounted at /content/gdrive


In [2]:
# Clone the entire repo.
!git clone -l -s https://github.com/yecatstevir/teambrainiac.git

# Change directory into cloned repo DL folder
%cd teambrainiac/source/DL

# !ls

Cloning into 'teambrainiac'...
remote: Enumerating objects: 2020, done.
remote: Counting objects: 100% (205/205), done.
remote: Compressing objects: 100% (176/176), done.
remote: Total 2020 (delta 112), reused 73 (delta 29), pack-reused 1815
Receiving objects: 100% (2020/2020), 110.42 MiB | 11.06 MiB/s, done.
Resolving deltas: 100% (1303/1303), done.
/content/teambrainiac/source/DL


### Load path_config.py to access AWS credentials

In [3]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving path_config.py to path_config.py
User uploaded file "path_config.py" with length 196 bytes


## Import Packages

In [24]:
# General Library Imports
import scipy.io
import pickle
import numpy as np
import pandas as pd
import tqdm
import random
from path_config import mat_path

import altair as alt

## Patient Time in Scanner

In [154]:
patient_status_df = pd.read_csv('/content/gdrive/My Drive/patient_status_in_scanner.csv')

reg_in_scanner = alt.Chart(patient_status_df).mark_tick(thickness=5).encode(
    x = 'image_index:Q',
    color = alt.Color('Patient Status:N', scale=alt.Scale(
        domain = ['Buffer (No Regulation)', 'Up Regulation', 'Down Regulation'],
        range = ['grey', '#446CCF', '#F58518'])
    )
).properties(
    width = 800,
    title = 'Patient Regulation in Scanner'
)

reg_in_scanner
# '#446CCF' = blue
# '#F58518' = orange

## Model Training
Recall the data was split into 4 files on AWS to save RAM. To further save RAM, each training file was split into two. 

The first training round went through each of the 8 partitions (4 files x 2 splits) for 10 epochs. As the chart below shows, the model trained faster on later partitions of the data, but accuracy stayed low on the first epoch of each partition. To avoid overfitting, we implemented early stopping that triggered when any batch of images reached perfect accuracy.

To further avoid overfitting on partitions, we ran a second training round through the data with only a single epoch for each partition. You can see the training results below.
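
The training schedule described above can be sketched as follows. This is a minimal illustration, not the code in Group3DCNN.ipynb: `run_training_round`, `train_batch`, and the partition layout are hypothetical names, and the rule that a perfect batch ends training on the current partition only (rather than the whole round) is an assumption.

```python
def run_training_round(partitions, train_batch, max_epochs):
    """Train on each partition for up to max_epochs epochs.

    partitions:  list of partitions, each a list of batches (hypothetical layout)
    train_batch: callable that trains on one batch and returns its accuracy
    Returns a list of (partition_index, epoch, batch_accuracy) records.
    """
    history = []
    for p_idx, batches in enumerate(partitions):
        stop = False
        for epoch in range(1, max_epochs + 1):
            for batch in batches:
                acc = train_batch(batch)
                history.append((p_idx, epoch, acc))
                # Early stopping: a perfect batch is treated as a sign of overfitting
                if acc >= 1.0:
                    stop = True
                    break
            if stop:
                break  # assumed behavior: move on to the next partition
    return history


# Toy stand-in for a real training step: each "batch" is simply its own accuracy
fake_partitions = [[0.6, 0.7], [0.8, 1.0, 0.9]]
first_round = run_training_round(fake_partitions, train_batch=lambda b: b, max_epochs=10)
second_round = run_training_round(fake_partitions, train_batch=lambda b: b, max_epochs=1)
```

Calling the same function with `max_epochs=1` mirrors the second training round, where each partition is seen only once.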

In [167]:
first_training_round = pd.read_csv('/content/gdrive/My Drive/10_epochs.csv')

first_training_accuracies = alt.Chart(first_training_round).mark_line().encode(
    x='epoch:O',
    y='accuracy:Q',
    color='test_set:N'
).properties(
    title='First Round Training Accuracies by Epoch',
    width=400
)

first_training_accuracies

In [176]:
second_training_round = pd.read_csv('/content/gdrive/My Drive/final_train_epochs.csv')
second_training_round['partition'] = second_training_round['index']


alt.Chart(second_training_round).mark_boxplot(extent='min-max', size=35).encode(
    x='partition:O',
    y=alt.Y('accuracy:Q', scale=alt.Scale(domain=[0.5,1]))
).properties(
    width=400,
    title='Second Round Training Accuracies by Partition'
)

## Test Metrics

In [178]:
# Load the saved test metrics; the context manager closes the file afterwards
with open('/content/gdrive/My Drive/metrics_dict_test_2.pkl', 'rb') as f:
  test = pickle.load(f)
test

{'epoch_1': {'accuracy': [tensor(0.6190),
   tensor(0.7619),
   tensor(0.7500),
   tensor(0.5595),
   tensor(0.5952),
   tensor(0.6429),
   tensor(0.7976),
   tensor(0.8452),
   tensor(0.5476),
   tensor(0.4762),
   tensor(0.7500),
   tensor(0.6190)],
  'loss': [tensor(0.8330, requires_grad=True),
   tensor(0.4402, requires_grad=True),
   tensor(0.7305, requires_grad=True),
   tensor(1.0419, requires_grad=True),
   tensor(0.8442, requires_grad=True),
   tensor(0.7074, requires_grad=True),
   tensor(0.4047, requires_grad=True),
   tensor(0.3713, requires_grad=True),
   tensor(0.8905, requires_grad=True),
   tensor(0.7976, requires_grad=True),
   tensor(0.5331, requires_grad=True),
   tensor(0.6473, requires_grad=True)]},
 'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
         1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1,

In [180]:
# Consecutive pairs of test entries share a subject number: 1, 1, 2, 2, ..., 6, 6
subject_number = [x // 2 for x in range(2, 14)]
accuracies = [x.item() for x in test['epoch_1']['accuracy']]
loss = [x.item() for x in test['epoch_1']['loss']]
# Build the frame column by column so 'subject' stays an integer
scatter_df = pd.DataFrame({'subject': subject_number, 'accuracy': accuracies, 'loss': loss})

test_scatter = alt.Chart(scatter_df).mark_circle(size=80).encode(
    x = alt.X('accuracy:Q', scale=alt.Scale(domain=[0,1])),
    y = alt.Y('loss:Q', scale=alt.Scale(domain=[0,1.2])),
    color = 'subject:N'
).properties(
    title='Test Set Accuracy and Loss by Subject'
)
test_scatter
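
The subject mapping in the cell above pairs consecutive test entries, which suggests two entries per subject. Under that assumption, a per-subject average can be sketched in plain Python as below. The accuracy values are copied from the printed metrics dictionary, while `by_subject`, `mean_accuracy`, `best_subject`, and `worst_subject` are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict

# Test accuracies as printed in the metrics dictionary above
accuracies = [0.6190, 0.7619, 0.7500, 0.5595, 0.5952, 0.6429,
              0.7976, 0.8452, 0.5476, 0.4762, 0.7500, 0.6190]

# Same mapping as the scatter-plot cell: entries 0-1 -> subject 1, 2-3 -> subject 2, ...
subject_number = [x // 2 for x in range(2, 2 + len(accuracies))]

# Collect the accuracies that belong to each subject
by_subject = defaultdict(list)
for subject, acc in zip(subject_number, accuracies):
    by_subject[subject].append(acc)

# Average per subject, then pick the subjects with the highest and lowest mean
mean_accuracy = {s: sum(v) / len(v) for s, v in by_subject.items()}
best_subject = max(mean_accuracy, key=mean_accuracy.get)
worst_subject = min(mean_accuracy, key=mean_accuracy.get)
```

Averaging over a subject's entries is one possible ranking; ranking individual entries instead would only need `max` and `min` over `accuracies` directly.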

In [None]:
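# Run 2 and Run 3 of best person
# Run 2 and Run 3 of worst person

# Assumed intent of this unfinished cell: locate the test entries with the
# highest and lowest accuracy
best = 0   # highest accuracy seen so far
worst = 1  # lowest accuracy seen so far
best_idx = worst_idx = None
for i, acc in enumerate(test['epoch_1']['accuracy']):
  acc = acc.item()
  if acc > best:
    best, best_idx = acc, i
  if acc < worst:
    worst, worst_idx = acc, i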

## To Do: Finishing Up
- Finish training the model on training data and save it
- Put all metrics in the same dictionary or dataframe for the first round of training with 10 epochs
- Build visualizations for epoch accuracies during training
- Talk about file deprecation


For Validation and Testing:
- Import validation dataset
- Change metrics dictionary to contain predictions
- Run and train on validation set


Other:
- Do write-up