<a href="https://colab.research.google.com/github/yecatstevir/teambrainiac/blob/main/BuildSingleSubjectSVM_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Support Vector Machine Training for Single-Subject Brain State Prediction
- Go to 'Runtime' in the Colab menu bar, select 'Change runtime type', and select 'High-RAM' under 'Runtime shape'.
- This notebook runs the whole pipeline.
- It requires datadictionary.pkl, which holds the paths to the subject .mat files.
- Steps for the single-subject SVM:
  1.   Load single-subject data from the .mat file and convert it into a NumPy matrix.
  2.   Load the mask data used for masking.
  3.   Indicate training runs: choose from the 4 runs (1-4).
  4.   Indicate testing runs: choose from the same 4 runs, keeping them separate from the training runs.
  5.   Choose a standardization strategy: 'psc', 'zscore', or 'nonorm'. Note: detrending is on (True) by default.
  6.   Choose mask indices from the data dictionary: 0 = whole-brain mask and whole brain minus regions of interest (ROIs), 1 = ROIs (see list below).
  7.   Indicate the kernel, C, and gamma parameters.
  8.   Set the path where models are saved.
  9.   Get subject IDs and subject paths to load the data.
  10.  Get labels and mask labels so that only the timepoints of interest are kept.
  11.  Loop over subjects: load each subject's data, mask it, scale it, train the model, and get predictions on the test runs.
  12.  Save the model, training data, labels, and predictions in a pickle file.


- ROIs
  1.   Nucleus Accumbens-Bilateral (NAcc)
  2.   Anterior Insula-Right (AI)
  3.   Anterior Cingulate Cortex-Bilateral (ACC)
  4.   Medial Prefrontal Cortex-Bilateral (mPFC)
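Step 5 above picks a standardization strategy, with detrending on by default. The notebook delegates this to `scale_data_single_subj` from single_subject.py, whose internals are not shown here (nilearn's `signal.clean`, imported below, offers comparable `detrend` and `standardize` options). As a rough, numpy-only sketch of what the three options usually mean for one run of shape (timepoints, voxels), the hypothetical `normalize_run` removes a linear trend per voxel and then either leaves the values as they are ('nonorm'), z-scores them ('zscore'), or expresses them as percent signal change relative to the voxel's mean ('psc'). It is a stand-in, not a reproduction of either the repository's function or nilearn's.

```python
import numpy as np


def normalize_run(run_data, norm="zscore", detrend=True):
    """Normalize one run of shape (n_timepoints, n_voxels).

    Hypothetical helper sketching step 5; the notebook itself calls
    scale_data_single_subj from single_subject.py.
    """
    data = np.asarray(run_data, dtype=float)
    baseline = data.mean(axis=0)  # per-voxel mean before detrending, used for PSC

    if detrend:
        # Subtract a least-squares linear trend from every voxel's time series
        t = np.arange(data.shape[0])
        slope, intercept = np.polyfit(t, data, deg=1)
        data = data - (np.outer(t, slope) + intercept)

    if norm == "nonorm":
        return data  # detrended (if requested) but not rescaled

    centered = data - data.mean(axis=0)
    if norm == "zscore":
        std = centered.std(axis=0)
        std[std == 0] = 1.0  # constant voxels stay at zero instead of dividing by zero
        return centered / std
    if norm == "psc":
        scale = np.abs(baseline)
        scale[scale == 0] = 1.0
        return centered / scale * 100.0  # percent of each voxel's mean signal
    raise ValueError(f"unknown norm: {norm!r}")


# Normalize each run separately (toy data: 4 runs, 20 timepoints, 3 voxels)
rng = np.random.default_rng(0)
runs = {f"run_0{i}": rng.normal(100.0, 5.0, size=(20, 3)) for i in range(1, 5)}
normed = {name: normalize_run(x, norm="zscore") for name, x in runs.items()}
```

Scaling run by run keeps baseline differences between runs from leaking into the classifier, which is the usual reason for normalizing each run on its own.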


### Mount Google Drive and clone repository
- Change into the source directory.

In [None]:
from google.colab import drive
# Mount Drive at /content/drive; save_data_path below assumes this mount point
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
# Clone the entire repo
!git clone https://github.com/yecatstevir/teambrainiac.git
# Change directory into cloned repo
%cd teambrainiac/source
!ls


Cloning into 'teambrainiac'...
remote: Enumerating objects: 1575, done.
remote: Counting objects: 100% (1575/1575), done.
remote: Compressing objects: 100% (1239/1239), done.
remote: Total 1575 (delta 1024), reused 627 (delta 319), pack-reused 0
Receiving objects: 100% (1575/1575), 88.14 MiB | 18.96 MiB/s, done.
Resolving deltas: 100% (1024/1024), done.
/content/teambrainiac/source
access_data.py			    __init__.py
AccuracyMeasures.ipynb		    process.py
brain_viz_single_subj.py	    single_subject.py
BuildSingleSubjectSVM_Models.ipynb  SingleSubjectSVM.ipynb
data				    streamlit
DataExploration_SingleSubj.ipynb    SubjectVisualization_Models_ZNORM.ipynb
DataExplorationVisuals.ipynb	    TestMask.ipynb
DL				    utils.py
group_svm			    VisualizationPlayground.ipynb
helper				    VisualizationsNotebook.ipynb


### Upload path_config.py
- We are already in `source`, so the file can be uploaded here without changing directories.

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving path_config.py to path_config.py
User uploaded file "path_config.py" with length 228 bytes


### Import Libraries

In [None]:


# Import libraries
!pip install boto3 nilearn nibabel  # for saving data and image visualizations
import pickle
# sklearn packages needed
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, auc, recall_score, precision_score, roc_curve, f1_score
# Important utility functions for loading, masking, and saving data
#from utils import *
from access_data import *
from single_subject import *
#from process import *
# Normal Python packages we use
import numpy as np
import pandas as pd
# For normalization
from nilearn.signal import clean

Collecting boto3
  Downloading boto3-1.21.42-py3-none-any.whl (132 kB)
Collecting nilearn
  Downloading nilearn-0.9.1-py3-none-any.whl (9.6 MB)
Collecting botocore<1.25.0,>=1.24.42
  Downloading botocore-1.24.42-py3-none-any.whl (8.7 MB)
Collecting s3transfer<0.6.0,>=0.5.0
  Downloading s3transfer-0.5.2-py3-none-any.whl (79 kB)
Collecting jmespath<2.0.0,>=0.7.1
  Downloading jmespath-1.0.0-py3-none-any.whl (23 kB)
Collecting urllib3<1.27,>=1.25.4
  Downloading urllib3-1.26.9-py2.py3-none-any.whl (138 kB)
Collecting scipy>=1.5
  Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)

In [None]:
#get data dictionary, using default path
data_path_dict = get_data_dict()

### Run Single-Subject SVM on All Subjects
* The code can be modified to run on only a few subjects, e.g. by slicing `subjs_id` and `subjs_paths` with the same indices.
* To run on a single mask, put its name in `mask_list`; the loop expects a nested list (e.g. `[['mask']]`), so make sure the name is nested.
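Each pass of the loop below fits one SVC per subject on the training runs and predicts the test runs. The repository's `run_single_subject_svm` and `get_predicts` are not shown in this notebook, so the following is a minimal stand-in under stated assumptions: runs are stored in a dict keyed by run name (e.g. 'run_01'), each holding an already masked and scaled (timepoints, voxels) array, with a matching label vector per run. `train_and_predict` is a hypothetical helper and the data are synthetic; only the SVC settings mirror the notebook's `svc_kernel`, `svc_c`, and `svc_gamma`.

```python
import numpy as np
from sklearn.svm import SVC


def train_and_predict(runs, labels, runs_train, runs_test, kernel="rbf", C=5, gamma="auto"):
    """Fit an SVC on the training runs and predict each held-out test run.

    runs:   dict run name -> array (n_timepoints, n_voxels), already masked and scaled
    labels: dict run name -> array (n_timepoints,) of binary brain-state labels
    Hypothetical helper; not the repository's run_single_subject_svm/get_predicts.
    """
    # Stack the training runs along the time axis so every timepoint is one sample
    X_train = np.concatenate([runs[r] for r in runs_train], axis=0)
    y_train = np.concatenate([labels[r] for r in runs_train], axis=0)
    clf = SVC(kernel=kernel, C=C, gamma=gamma)
    clf.fit(X_train, y_train)
    # Predict each test run separately so results can be reported per run
    predicts = {r: clf.predict(runs[r]) for r in runs_test}
    return clf, X_train, y_train, predicts


# Synthetic subject: 4 runs, 40 timepoints, 10 voxels, alternating brain states
rng = np.random.default_rng(0)
n_tp, n_vox = 40, 10
y = np.tile([0, 1], n_tp // 2)
runs, labels = {}, {}
for i in range(1, 5):
    name = f"run_0{i}"
    # Shift the signal by the label so the two classes are separable
    runs[name] = rng.normal(size=(n_tp, n_vox)) + 2.0 * y[:, None]
    labels[name] = y

clf, X_train, y_train, predicts = train_and_predict(
    runs, labels, ["run_01", "run_02"], ["run_03", "run_04"])
```

Keeping the test runs out of `X_train` entirely is what makes the per-run predictions an honest estimate of how well the model generalizes within a subject.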

In [None]:
## Set the mask list
# Masks we want to run the model on; the loop below needs a nested list
# Indices of the masks we want: 0 = whole-brain mask and masks minus ROIs, 1 = ROIs
mask_indices = [0]
mask_list = []
# for ind in mask_indices:
#   mask_dict = get_mask_data(data_path_dict, ind)
#   masks = list(mask_dict.keys())[3:]
#   mask_list.append(masks)
# Example of building your own list
mask_dict = get_mask_data(data_path_dict,0)
mask_list = [['mask']]
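The commented-out loop above keeps `list(mask_dict.keys())[3:]`. Assuming `get_mask_data` returns the dictionary produced by `scipy.io.loadmat` (not confirmed in this notebook), the three skipped keys would be the metadata entries `__header__`, `__version__`, and `__globals__` that loadmat adds next to the stored variables. Filtering those keys by name rather than by position is a slightly more robust way to build the nested `mask_list`. In this sketch `mask_names` is a hypothetical helper, and the in-memory .mat file and its variable names are made up as a stand-in for the real mask file.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat


def mask_names(mask_dict):
    """Return the variable names in a loadmat() result, skipping metadata keys."""
    # loadmat adds '__header__', '__version__' and '__globals__' alongside the data
    return [name for name in mask_dict if not name.startswith("__")]


# Build a small .mat file in memory as a stand-in for the real mask file
buffer = io.BytesIO()
savemat(buffer, {"mask": np.ones((2, 2, 2)), "nacc": np.zeros((2, 2, 2))})
buffer.seek(0)
mask_dict = loadmat(buffer)

mask_list = [mask_names(mask_dict)]  # nested list, as the model loop expects
```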


In [None]:
### Run the model with the best params across all masks
runs_train = ['run_01', 'run_02']  # runs we want to train on
runs_test = ['run_03', 'run_04']  # runs we want to test on
runs_list = [1, 2, 3, 4]  # runs we want to normalize
norm_type = 'zscore'  # normalization: 'psc', 'zscore', or 'nonorm'
svc_kernel = 'rbf'  # SVC kernel
svc_c = 5  # SVC C (regularization) parameter
svc_gamma = 'auto'  # SVC gamma parameter
save_data_path = '/content/drive/My Drive/data/singlesubjectmodels'  # where we want to store our models
# Get subject information
subjs_id, subjs_paths = get_subj_information(data_path_dict)
# Get mask labels so only the time series we care about are retrieved
mask_labels_indices, binary_labels, labels_t = get_labels(data_path_dict)
# Iterate over groups of masks
for masks in mask_list:
  print(masks)
  # Iterate over the masks in this group
  for mask_type in masks:
    print(mask_type)
    subj_mask_model = {mask_type: {}}
    mask = make_mask(mask_dict[mask_type])
    # Iterate over subjects
    for subj_id, subj_path in zip(subjs_id, subjs_paths):
      subj_data = access_load_data(subj_path, True)  # load subject data
      subj_mask_model[mask_type][subj_id] = {}  # initialize empty subject dict

      print(f'Running SVM on {subj_id} with mask {mask_type}')
      masked_data = mask_subject_data(subj_data, mask, mask_labels_indices)  # mask data
      scaled_data = scale_data_single_subj(masked_data, runs_list, norm=norm_type)  # scale data
      clf, X_train, y_train = run_single_subject_svm(scaled_data, runs_train, binary_labels, svc_kernel, svc_gamma, svc_c)  # run model
      # Store the model, training data, and predictions on the test runs
      subj_mask_model[mask_type][subj_id]['model'] = clf
      subj_mask_model[mask_type][subj_id]['X_train'] = X_train
      subj_mask_model[mask_type][subj_id]['y_train'] = y_train
      subj_mask_model[mask_type][subj_id]['predicts'] = get_predicts(clf, scaled_data, runs_test)
    full_path_name = f'{save_data_path}/{mask_type}_tr_1_2_for_real_subject_models.pkl'
    # The context manager closes the file even if pickling fails
    with open(full_path_name, 'wb') as filehandler:
      pickle.dump(subj_mask_model, filehandler)

['mask']
mask
Running SVM on 10004_08693 with mask mask
Running SVM on 10008_09924 with mask mask
Running SVM on 10009_08848 with mask mask
Running SVM on 10016_09694 with mask mask
Running SVM on 10017_08894 with mask mask
Running SVM on 10018_08907 with mask mask
Running SVM on 10021_08839 with mask mask
Running SVM on 10022_08854 with mask mask
Running SVM on 10023_09126 with mask mask
Running SVM on 10027_09455 with mask mask
Running SVM on 10033_08871 with mask mask
Running SVM on 10034_08879 with mask mask
Running SVM on 10035_08847 with mask mask
Running SVM on 10036_09800 with mask mask
Running SVM on 10037_09903 with mask mask
Running SVM on 10038_09063 with mask mask
Running SVM on 10039_08941 with mask mask
Running SVM on 10042_08990 with mask mask
Running SVM on 10043_09222 with mask mask
Running SVM on 10045_08968 with mask mask
Running SVM on 10046_09216 with mask mask
Running SVM on 10047_09030 with mask mask
Running SVM on 10050_09079 with mask mask
Running SVM on 10053