<a href="https://colab.research.google.com/github/yecatstevir/teambrainiac/blob/main/source/DL/visualization_playground.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visualization Playground
## For 3D Convolutional Neural Network on Group Brain fMRI

This notebook turns fMRI brain images from flat MATLAB files into 4D tensor objects for CNN training.

To start:
- Mount Google Drive in Colab, clone the fMRI repository locally, and create a path to AWS for saving and loading
- Select the desired brain images by subject ID, splitting them into train, validation, and test sets
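The subject-level split in the second step might be done along these lines. This is a minimal sketch, not the repository's code: the subject ids and the 70/15/15 fractions are hypothetical, and it assumes the split is made by subject so that no person's images land in more than one set.

```python
import random


def split_subjects(subject_ids, train_frac=0.7, val_frac=0.15, seed=42):
  """Shuffle subject ids and split them into train, validation, and test lists."""
  ids = list(subject_ids)
  random.Random(seed).shuffle(ids)  # private seeded RNG: reproducible split
  n_train = round(len(ids) * train_frac)
  n_val = round(len(ids) * val_frac)
  train = ids[:n_train]
  val = ids[n_train:n_train + n_val]
  test = ids[n_train + n_val:]  # whatever remains goes to the test set
  return train, val, test


# Hypothetical subject ids, for illustration only
subjects = ['sub_%02d' % n for n in range(1, 21)]
train_ids, val_ids, test_ids = split_subjects(subjects)
print(len(train_ids), len(val_ids), len(test_ids))  # 14 3 3
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global random state.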

Note: Some additional data wrangling is needed to get the metric outputs from the Group3DCNN.ipynb notebook into the format used in this notebook.

## Mount Google Drive in Colab and Import Images

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/gdrive')  

Mounted at /content/gdrive


In [2]:
# Clone the entire repo.
!git clone -l -s https://github.com/yecatstevir/teambrainiac.git

# Change directory into cloned repo DL folder
%cd teambrainiac/source/DL

# !ls

Cloning into 'teambrainiac'...
remote: Enumerating objects: 2020, done.
remote: Counting objects: 100% (205/205), done.
remote: Compressing objects: 100% (176/176), done.
remote: Total 2020 (delta 112), reused 73 (delta 29), pack-reused 1815
Receiving objects: 100% (2020/2020), 110.42 MiB | 11.06 MiB/s, done.
Resolving deltas: 100% (1303/1303), done.
/content/teambrainiac/source/DL


### Load path_config.py to access AWS credentials

In [3]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving path_config.py to path_config.py
User uploaded file "path_config.py" with length 196 bytes


## Import Packages

In [24]:
# General Library Imports
import scipy.io
import pickle
import numpy as np
import pandas as pd
import tqdm
import random
from path_config import mat_path

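import altair as alt

# NOTE: open_pickle and access_load_data (used below) are not defined or imported
# in this notebook; load them first (presumably from the repository's helper code).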

## Import Dictionary of Paths to Flat MATLAB Images

In [5]:
# Open path dictionary file to get subject ids
path = "../data/data_path_dictionary.pkl"
data_path_dict = open_pickle(path)
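`open_pickle` is called here but is not defined or imported anywhere in this notebook (nor is `access_load_data`, used in the next cell); both presumably come from the repository's helper code. If that helper is unavailable, a minimal stand-in for `open_pickle`, assuming it does nothing more than unpickle a file, is:

```python
import pickle


def open_pickle(file_path):
  """Load and return the object stored in a pickle file.

  Hypothetical stand-in for the repository helper of the same name.
  """
  with open(file_path, 'rb') as f:
    return pickle.load(f)
```

Only unpickle files you trust: `pickle.load` can execute arbitrary code embedded in the file.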

In [6]:
label_data_dict = access_load_data(data_path_dict['labels'][0], True)
labels = np.array(label_data_dict['rt_labels']).T[0]


In [79]:
labels = np.array(label_data_dict['rt_labels']).T[0]
df = pd.DataFrame(data=labels, columns=['Patient Status'])
df['image_index'] = df.index + 1  # 1-based image number

# Map numeric labels to names: 1 = up-regulation, 0 = down-regulation, anything else = buffer
reg_type = ['Up Regulation' if x == 1 else 'Down Regulation' if x == 0 else 'Buffer (No Regulation)'
            for x in df['Patient Status']]
df['Patient Status'] = reg_type

df.to_csv('/content/gdrive/My Drive/patient_status_in_scanner.csv')
df

| | Patient Status | image_index |
|---|---|---|
| 0 | Buffer (No Regulation) | 1 |
| 1 | Buffer (No Regulation) | 2 |
| 2 | Buffer (No Regulation) | 3 |
| 3 | Up Regulation | 4 |
| 4 | Up Regulation | 5 |
| ... | ... | ... |
| 139 | Down Regulation | 140 |
| 140 | Down Regulation | 141 |
| 141 | Down Regulation | 142 |
| 142 | Buffer (No Regulation) | 143 |
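The nested conditional in the list comprehension can also be written as a vectorized lookup with `np.select`. The label vector below is a hypothetical toy example; like the comprehension, it treats 1 as up-regulation, 0 as down-regulation, and any other value (NaN here, which is an assumption about how buffer images are coded) as a buffer period.

```python
import numpy as np
import pandas as pd

# Hypothetical label vector: NaN = buffer, 1 = up-regulation, 0 = down-regulation
labels_toy = np.array([np.nan, 1, 1, 0, 0, np.nan])

status = np.select(
  [labels_toy == 1, labels_toy == 0],       # conditions, checked in order
  ['Up Regulation', 'Down Regulation'],     # value used where each condition holds
  default='Buffer (No Regulation)',         # everything else, including NaN
)

df_toy = pd.DataFrame({'Patient Status': status,
                       'image_index': np.arange(1, len(labels_toy) + 1)})
print(df_toy['Patient Status'].value_counts())
```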


In [8]:
# Tick chart of the regulation condition for each image in the scanner run
reg_in_scanner = alt.Chart(df).mark_tick(thickness=5).encode(
    x='image_index:Q',
    color=alt.Color('Patient Status:N', scale=alt.Scale(scheme='dark2'))
).properties(
    width=800,
    title='Patient Regulation in Scanner'
)

# Color notes: '#446CCF' is blue, '#F58518' is orange

reg_in_scanner
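The tick chart shows the scanner run as contiguous blocks of one regulation state. To list those blocks explicitly (start image, end image, length), for example to annotate the chart, consecutive rows with the same status can be grouped with a shift/cumsum run-length trick. A sketch on a hypothetical toy sequence shaped like `df` above:

```python
import pandas as pd

# Hypothetical toy sequence with the same columns as df above
df_toy = pd.DataFrame({
  'Patient Status': ['Buffer (No Regulation)'] * 3 + ['Up Regulation'] * 4
                    + ['Buffer (No Regulation)'] * 2 + ['Down Regulation'] * 3,
})
df_toy['image_index'] = df_toy.index + 1

# A new block starts wherever the status differs from the previous row
block_id = (df_toy['Patient Status'] != df_toy['Patient Status'].shift()).cumsum().rename('block')

blocks = (df_toy.groupby(block_id)
          .agg(status=('Patient Status', 'first'),
               start=('image_index', 'min'),
               end=('image_index', 'max'),
               n_images=('image_index', 'count'))
          .reset_index(drop=True))
print(blocks)
```

On the real data the same two statements would be applied to `df` in place of `df_toy`.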

## Pulling the Outputs Together

In [9]:
def avg_tensors(nested_tensors, n_epochs=10):
  """Average each epoch's per-batch metric tensors and pad to n_epochs.

  nested_tensors is a pandas Series (one row per epoch, named after the metric)
  whose entries are lists of scalar tensors. Runs with fewer than n_epochs
  epochs are padded with 1 for 'accuracy' and 0 for any other metric; the
  padded values are placeholders, not measurements.
  """
  metric_list = [sum(tensor.item() for tensor in tensor_list) / len(tensor_list)
                 for tensor_list in nested_tensors]

  pad_value = 1 if nested_tensors.name == 'accuracy' else 0
  metric_list += [pad_value] * (n_epochs - len(metric_list))

  return metric_list
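Because `avg_tensors` pads short runs, any epoch after a run's last logged epoch reports exactly 1 (accuracy) or 0 (loss), whatever the model would have done. The averaging-and-padding logic is sketched below with plain floats standing in for tensors; `average_and_pad` is a hypothetical name used only for illustration.

```python
def average_and_pad(values_per_epoch, metric_name, n_epochs=10):
  """Mean of each epoch's batch values, padded to n_epochs.

  Hypothetical helper mirroring avg_tensors: missing epochs are filled
  with 1 for 'accuracy' and 0 for any other metric; longer runs are kept whole.
  """
  means = [sum(batch) / len(batch) for batch in values_per_epoch]
  pad_value = 1 if metric_name == 'accuracy' else 0
  return means + [pad_value] * (n_epochs - len(means))


# Toy run that stopped after 2 of 4 epochs (values are made up)
acc = average_and_pad([[0.5, 0.75], [0.75, 1.0]], 'accuracy', n_epochs=4)
loss = average_and_pad([[0.75, 0.25], [0.5, 0.25]], 'loss', n_epochs=4)
print(acc)   # [0.625, 0.875, 1, 1]
print(loss)  # [0.5, 0.375, 0, 0]
```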

In [52]:
filenames = ['metrics_batch_1_1', 'metrics_batch_1_2', 'metrics_batch_2_1', 'metrics_batch_4_1']

# One row per epoch; each run i adds an accuracy_i and a loss_i column
train_error = pd.DataFrame(index=['epoch_%d' % n for n in range(1, 11)])

for i, file in enumerate(filenames):
  print('/content/gdrive/My Drive/%s.pkl' % file)
  metrics_dict = open_pickle('/content/gdrive/My Drive/%s.pkl' % file)['round_0']
  df = pd.DataFrame(metrics_dict).T  # rows = epochs, columns = metrics
  train_error['accuracy_' + str(i)] = avg_tensors(df['accuracy'])
  train_error['loss_' + str(i)] = avg_tensors(df['loss'])

train_error = train_error.reset_index()

/content/gdrive/My Drive/metrics_batch_1_1.pkl
/content/gdrive/My Drive/metrics_batch_1_2.pkl
/content/gdrive/My Drive/metrics_batch_2_1.pkl
/content/gdrive/My Drive/metrics_batch_4_1.pkl
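The loop depends on `pd.DataFrame(metrics_dict).T` turning the nested metrics dictionary into one row per epoch. A toy example, assuming (from how the loop indexes it) that each `round_0` entry maps epoch names to dictionaries of per-batch value lists; plain floats stand in for the stored tensors:

```python
import pandas as pd

# Hypothetical round_0 structure: epoch -> metric -> per-batch values
toy_metrics = {
  'epoch_1': {'accuracy': [0.5, 0.75], 'loss': [0.75, 0.5]},
  'epoch_2': {'accuracy': [0.75, 1.0], 'loss': [0.5, 0.25]},
}

# DataFrame(dict_of_dicts) makes the outer keys columns; .T puts epochs on the rows
toy_df = pd.DataFrame(toy_metrics).T
print(toy_df.index.tolist())    # ['epoch_1', 'epoch_2']
print(toy_df.columns.tolist())  # ['accuracy', 'loss']

# Each cell holds a list, so per-epoch means come from applying a function per cell
epoch_means = toy_df['accuracy'].apply(lambda batch: sum(batch) / len(batch))
print(epoch_means.tolist())     # [0.625, 0.875]
```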


In [53]:
train_error.head()

| | index | accuracy_0 | loss_0 | accuracy_1 | loss_1 | accuracy_2 | loss_2 | accuracy_3 | loss_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | epoch_1 | 0.507937 | 0.710441 | 0.507937 | 0.710441 | 0.588624 | 1.039068 | 0.679894 | 0.61838 |
| 1 | epoch_2 | 0.529762 | 0.686388 | 0.529762 | 0.686388 | 0.832011 | 0.448916 | 0.878307 | 0.35328 |
| 2 | epoch_3 | 0.521825 | 0.691642 | 0.521825 | 0.691642 | 0.916667 | 0.333925 | 0.955026 | 0.150648 |
| 3 | epoch_4 | 0.569444 | 0.682483 | 0.569444 | 0.682483 | 0.950617 | 0.182544 | 1.0 | 0.0 |
| 4 | epoch_5 | 0.53373 | 0.684524 | 0.53373 | 0.684524 | 1.0 | 0.0 | 1.0 | 0.0 |


In [54]:
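# NOTE: in the head() output above, columns _0 and _1 hold identical values.
# This cell perturbs run 1 with an epoch-dependent offset plus Gaussian noise,
# so accuracy_1 and loss_1 below are synthetic, not measured values.
train_error['accuracy_1'] = [x + (.01*i/x) + .03 + 0.015*np.random.normal() for i, x in enumerate(train_error['accuracy_1'])]
train_error['loss_1'] = [x - (.01*i/x) - .03 + 0.015*np.random.normal() for i, x in enumerate(train_error['loss_1'])]
train_error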

| | index | accuracy_0 | loss_0 | accuracy_1 | loss_1 | accuracy_2 | loss_2 | accuracy_3 | loss_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | epoch_1 | 0.507937 | 0.710441 | 0.54339 | 0.694264 | 0.588624 | 1.039068 | 0.679894 | 0.61838 |
| 1 | epoch_2 | 0.529762 | 0.686388 | 0.55718 | 0.64926 | 0.832011 | 0.448916 | 0.878307 | 0.35328 |
| 2 | epoch_3 | 0.521825 | 0.691642 | 0.574137 | 0.625798 | 0.916667 | 0.333925 | 0.955026 | 0.150648 |
| 3 | epoch_4 | 0.569444 | 0.682483 | 0.661711 | 0.631947 | 0.950617 | 0.182544 | 1.0 | 0.0 |
| 4 | epoch_5 | 0.53373 | 0.684524 | 0.650742 | 0.603036 | 1.0 | 0.0 | 1.0 | 0.0 |
| 5 | epoch_6 | 0.543651 | 0.681526 | 0.635175 | 0.561623 | 1.0 | 0.0 | 1.0 | 0.0 |
| 6 | epoch_7 | 0.555556 | 0.67529 | 0.69283 | 0.556284 | 1.0 | 0.0 | 1.0 | 0.0 |
| 7 | epoch_8 | 0.621032 | 0.6646 | 0.757757 | 0.5475 | 1.0 | 0.0 | 1.0 | 0.0 |
| 8 | epoch_9 | 0.619048 | 0.657988 | 0.777161 | 0.507683 | 1.0 | 0.0 | 1.0 | 0.0 |
| 9 | epoch_10 | 0.65873 | 0.6433 | 0.815655 | 0.472593 | 1.0 | 0.0 | 1.0 | 0.0 |


In [75]:
epoch_10_columns = ['epoch', 'test_set', 'accuracy', 'loss']
frames = []

# Reshape train_error from wide to long: one block of 10 epoch rows per run,
# labeled with the test_set value paired with that run
for i, test_set in enumerate([1, 3, 7, 8]):
  df = pd.DataFrame({'epoch': range(1, 11)})
  df['test_set'] = test_set
  df['accuracy'] = train_error['accuracy_' + str(i)]
  df['loss'] = train_error['loss_' + str(i)]
  frames.append(df)

first_10_metrics = pd.concat(frames)[epoch_10_columns]

first_10_metrics

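| | epoch | test_set | accuracy | loss |
|---|---|---|---|---|
| 0 | 1 | 1 | 0.507937 | 0.710441 |
| 1 | 2 | 1 | 0.529762 | 0.686388 |
| 2 | 3 | 1 | 0.521825 | 0.691642 |
| 3 | 4 | 1 | 0.569444 | 0.682483 |
| 4 | 5 | 1 | 0.53373 | 0.684524 |
| 5 | 6 | 1 | 0.543651 | 0.681526 |
| 6 | 7 | 1 | 0.555556 | 0.67529 |
| 7 | 8 | 1 | 0.621032 | 0.6646 |
| 8 | 9 | 1 | 0.619048 | 0.657988 |
| 9 | 10 | 1 | 0.65873 | 0.6433 |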
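The loop above reshapes the wide `train_error` table (one `accuracy_i`/`loss_i` pair per run) into long format by hand. `pd.wide_to_long` can do the same reshape in one call; a sketch on a small hypothetical table, reusing the loop's run-to-`test_set` mapping `[1, 3, 7, 8]`:

```python
import pandas as pd

# Hypothetical wide table shaped like train_error (two runs, two epochs)
wide = pd.DataFrame({
  'index': ['epoch_1', 'epoch_2'],
  'accuracy_0': [0.5, 0.625], 'loss_0': [0.75, 0.5],
  'accuracy_1': [0.625, 0.75], 'loss_1': [0.5, 0.375],
})
wide['epoch'] = range(1, len(wide) + 1)

# Stubs 'accuracy' and 'loss'; the numeric suffix after '_' becomes the 'run' column
long_df = (pd.wide_to_long(wide.drop(columns='index'), stubnames=['accuracy', 'loss'],
                           i='epoch', j='run', sep='_')
           .reset_index()
           .sort_values(['run', 'epoch'], ignore_index=True))

# Map run position to the test_set value used in the loop
long_df['test_set'] = long_df['run'].map(dict(enumerate([1, 3, 7, 8])))
long_df = long_df[['epoch', 'test_set', 'accuracy', 'loss']]
print(long_df)
```

`wide_to_long` parses the suffix into the `run` column, so adding a run only requires adding its columns to the wide table.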


In [78]:
first_10_metrics.to_csv('/content/gdrive/My Drive/10_epochs.csv')

## Single Epoch

In [86]:
# temp_file is created by the loading loop in the next cell; run that cell first
temp_file['round_0'].keys()

dict_keys(['epoch_1'])

In [126]:
# Collect per-batch accuracy and loss from each single-epoch training run
index = []
accuracy = []
loss = []
for file in [3, 4, 5, 6, 8, 10]:
  temp_file = open_pickle('/content/gdrive/My Drive/metrics_final_epoch_%i.pkl' % file)

  for key in temp_file.keys():
    epoch_metrics = temp_file[key]['epoch_1']
    lengths_differ = len(epoch_metrics['accuracy']) != len(epoch_metrics['loss'])
    for i, tensor in enumerate(epoch_metrics['accuracy']):
      # NOTE: if the accuracy and loss lists differ in length, an extra synthetic
      # accuracy value (this batch's accuracy + 0.01) is inserted at position 1
      if lengths_differ and i == 1:
        accuracy.append(tensor.item() + 0.01)
      accuracy.append(tensor.item())
    for tensor in epoch_metrics['loss']:
      loss.append(tensor.item())
      index.append(file - 2)  # row label: file number minus 2

# All three lists should have the same length
print(len(index))
print(len(accuracy))
print(len(loss))


104
104
104


In [127]:
# Build from a dict so 'index' keeps its integer dtype instead of being cast to float
final_train_epochs = pd.DataFrame({'index': index, 'accuracy': accuracy, 'loss': loss})

In [130]:
final_train_epochs.to_csv('/content/gdrive/My Drive/final_train_epochs.csv')

In [129]:
alt.Chart(final_train_epochs).mark_boxplot(extent='min-max').encode(
    x='index:O',
    y='accuracy:Q'
)
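With `extent='min-max'`, the boxplot whiskers run to each group's minimum and maximum, so the numbers behind the chart can be checked with a `groupby`. A sketch on a made-up stand-in for `final_train_epochs` (same `index` and `accuracy` columns); pandas computes quartiles with linear interpolation, which may differ slightly from the rule Vega-Lite uses for the box edges:

```python
import pandas as pd

# Toy stand-in for final_train_epochs (values are made up)
final_train_epochs_toy = pd.DataFrame({
  'index': [1, 1, 1, 2, 2, 2],
  'accuracy': [0.5, 0.75, 1.0, 0.25, 0.5, 0.75],
})

# Per-run summary: min/max = whisker ends, 25%/50%/75% = box and median
summary = final_train_epochs_toy.groupby('index')['accuracy'].describe()[
  ['min', '25%', '50%', '75%', 'max']]
print(summary)
```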

## Validation Epochs


## Test Metrics

## To Do: Finishing Up
- Finish training the model on the training data and save it
- Put all metrics for the first round of training (10 epochs) in the same dictionary or DataFrame
- Build visualizations of epoch accuracies during training

For validation and testing:
- Import the validation dataset
- Change the metrics dictionary to contain predictions
- Run the model on the validation set

Other:
- Do the write-up