## Support Vector Machine (SVM)


This notebook computes **SVM** pipelines for macro-structural and micro-structural data.

Following the scikit-learn algorithm cheat-sheet, a **linear SVM** is recommended for the data at hand: the data contains more than 50 participants (samples), the data is labeled, and the aim is to predict a category (control/patient).

SVMs are a set of supervised learning methods that can be used for classification, regression and outlier detection. The basic idea is to find an optimal separating line (or hyperplane) that separates the data into two classes. The SVM algorithm looks for the data points from both classes that are closest to the hyperplane. These points are called support vectors. The distance between the support vectors and the hyperplane is called the margin. The optimal hyperplane is the one that maximizes this margin.
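To make the terms support vector and margin concrete, here is a minimal sketch on toy data. The six points, the variable names and the large `C` value (used to approximate a hard margin) are assumptions for illustration and are not part of the study data. For a linear SVM with weight vector `w`, the distance between the two margin boundaries is `2 / ||w||`.

```python
import numpy as np
from sklearn import svm

# Toy, linearly separable 2-D data (hypothetical values, not the study data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data.
clf = svm.SVC(kernel='linear', C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                         # normal vector of the separating hyperplane
b = clf.intercept_[0]                    # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin boundaries

print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", margin_width)
```

Only the points lying on the margin boundaries end up in `support_vectors_`; moving any other point (without crossing the margin) would leave the fitted hyperplane unchanged.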


Support vector classification (SVC) and linear support vector classification (LinearSVC) are SVM methods that perform binary or multi-class classification on a dataset. For this project, a linear SVM is fitted; the cells below use `svm.SVC` with `kernel='linear'`.
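The two estimators are related but not identical: `SVC(kernel='linear')` is based on libsvm and uses the hinge loss, whereas `LinearSVC` is based on liblinear, uses the squared hinge loss by default and also regularizes the intercept. The sketch below compares them on synthetic data; the dataset and all parameter values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic binary classification problem (not the study data).
X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=5, random_state=0)

svc = SVC(kernel='linear', C=1.0).fit(X, y)
lin = LinearSVC(C=1.0, max_iter=10000).fit(X, y)

# Fraction of samples on which both models predict the same class.
agreement = (svc.predict(X) == lin.predict(X)).mean()

print("Training accuracy SVC(kernel='linear'):", svc.score(X, y))
print("Training accuracy LinearSVC:", lin.score(X, y))
print("Fraction of identical predictions:", agreement)
```

The two models usually agree on most samples, but their coefficients and a few predictions near the decision boundary can differ because of the different loss functions and solvers.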

## Macro-structural data: Cortical Thickness

In [1]:
#import relevant modules

import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import metrics

In [2]:
#load data

CT_Dublin_path = os.path.join(os.pardir, 'data', 'PARC_500.aparc_thickness_Dublin.csv')
CT_Dublin = pd.read_csv(CT_Dublin_path)

In [3]:
#adjust dataframe

CT_Dublin_adj = CT_Dublin.drop(['Subject ID','Age', 'Sex'], axis=1)

In [4]:
#label group 1 as 0 and 2 as 1

CT_Dublin_adj['Group'] = CT_Dublin_adj['Group'].replace([1,2],[0, 1])

In [5]:
#print the feature names
#note: the slice [2:308] starts after the first feature column and yields 306 names (see output)

print("Features: ", CT_Dublin_adj.columns[2:308])

Features:  Index(['lh_bankssts_part2_thickness',
       'lh_caudalanteriorcingulate_part1_thickness',
       'lh_caudalmiddlefrontal_part1_thickness',
       'lh_caudalmiddlefrontal_part2_thickness',
       'lh_caudalmiddlefrontal_part3_thickness',
       'lh_caudalmiddlefrontal_part4_thickness', 'lh_cuneus_part1_thickness',
       'lh_cuneus_part2_thickness', 'lh_entorhinal_part1_thickness',
       'lh_fusiform_part1_thickness',
       ...
       'rh_supramarginal_part4_thickness', 'rh_supramarginal_part5_thickness',
       'rh_supramarginal_part6_thickness', 'rh_supramarginal_part7_thickness',
       'rh_frontalpole_part1_thickness', 'rh_temporalpole_part1_thickness',
       'rh_transversetemporal_part1_thickness', 'rh_insula_part1_thickness',
       'rh_insula_part2_thickness', 'rh_insula_part3_thickness'],
      dtype='object', length=306)


In [6]:
#print the labels (0 = group 1, 1 = group 2)

print("Labels: ", CT_Dublin_adj['Group'])

Labels:  0      0
1      0
2      0
3      1
4      1
      ..
103    1
104    1
105    1
106    1
107    1
Name: Group, Length: 108, dtype: int64


In [8]:
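#define data (features) and target (labels)
#note: the slice 1:308 selects columns 1 to 307 (the end index is exclusive)
#note: iloc[:,[0]] returns a 2-D column of shape (n_samples, 1)

CT_data = CT_Dublin_adj.iloc[:,1:308].values
CT_target = CT_Dublin_adj.iloc[:,[0]].values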
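Positional slices are easy to get off by one: if the dataframe holds `Group` followed by 308 regional columns, `iloc[:,1:308]` stops before the last region. A name-based selection avoids this. The following is a minimal sketch on a hypothetical miniature dataframe (the column names `region_a_thickness` etc. are made up for illustration); it also produces a 1-D label vector directly.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataframe: 'Group' first, then feature columns.
df = pd.DataFrame({
    'Group': [0, 0, 1, 1],
    'region_a_thickness': [2.5, 2.4, 2.1, 2.2],
    'region_b_thickness': [3.0, 3.1, 2.8, 2.7],
    'region_c_thickness': [2.9, 2.8, 2.6, 2.5],
})

X = df.drop(columns='Group').to_numpy()   # every column except the label
y = df['Group'].to_numpy()                # 1-D label vector, shape (n_samples,)

print(X.shape, y.shape)
```

Selecting by name keeps every feature column regardless of how many there are, and a 1-D `y` can be passed to scikit-learn estimators without reshaping.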

In [10]:
CT_data
CT_target

array([[0],
       [0],
       [0],
       [1],
       [1],
       ...,
       [1],
       [1],
       [1]])

In [73]:
CT_X_train, CT_X_test, CT_y_train, CT_y_test = train_test_split(CT_data, CT_target, test_size=0.3,random_state=109)
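The split above does not pass `stratify`, so the share of patients in the training and test sets depends on the random seed. With imbalanced groups, a stratified split keeps the class proportions similar in both parts. The sketch below uses placeholder features and hypothetical labels (80 controls, 28 patients); these values are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and hypothetical imbalanced labels (not the study data).
X = np.arange(108 * 2, dtype=float).reshape(108, 2)
y = np.array([0] * 80 + [1] * 28)

# stratify=y keeps the proportion of each class similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=109, stratify=y)

print("Patient share overall:", y.mean())
print("Patient share in train:", y_train.mean())
print("Patient share in test:", y_test.mean())
```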

In [74]:
classifier = svm.SVC(kernel='linear')

In [75]:
CT_target

array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [47]:
CT_target.shape

(108,)

In [48]:
#flatten the target to a 1-D array, as scikit-learn expects
CT_target = CT_target.ravel()

In [49]:
CT_target.shape

(108,)

In [78]:
classifier.fit(CT_X_train, CT_y_train)

SVC(kernel='linear')

In [81]:
CT_y_pred = classifier.predict(CT_X_test)

In [83]:
print("Accuracy:",metrics.accuracy_score(CT_y_test, CT_y_pred))
print("Precision:",metrics.precision_score(CT_y_test, CT_y_pred))
print("Recall:",metrics.recall_score(CT_y_test, CT_y_pred))

Accuracy: 0.8484848484848485
Precision: 1.0
Recall: 0.5454545454545454
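The three scores can be traced back to confusion-matrix counts. Assuming the 30% split of 108 participants yields 33 test samples, the counts below reproduce the reported values. They are inferred from the printed scores and are not taken from the notebook output.

```python
# Counts inferred from the reported scores (an assumption, not notebook output).
tp, fp, fn, tn = 6, 0, 5, 22   # true pos., false pos., false neg., true neg.

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of correct predictions
precision = tp / (tp + fp)                   # share of predicted patients that are patients
recall = tp / (tp + fn)                      # share of patients that were detected

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

Read this way, a precision of 1.0 means no control was classified as a patient, while a recall of about 0.55 means that nearly half of the patients in the test set were classified as controls.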


## Micro-structural data

### Mean Diffusivity

In [40]:
#read data

MD_Dublin_path = os.path.join(os.pardir, 'data', 'PARC_500.aparc_MD_cortexAv_mean_Dublin.csv')
MD_Dublin = pd.read_csv(MD_Dublin_path)

In [41]:
#adjust dataframe

MD_Dublin_adj = MD_Dublin.drop(['Subject ID','Age', 'Sex'], axis=1)

In [42]:
#label group 1 as 0 and 2 as 1

MD_Dublin_adj['Group'] = MD_Dublin_adj['Group'].replace([1,2],[0, 1])

In [43]:
MD_Dublin_adj

| | Group | lh_bankssts_part1_thickness | lh_bankssts_part2_thickness | lh_caudalanteriorcingulate_part1_thickness | lh_caudalmiddlefrontal_part1_thickness | lh_caudalmiddlefrontal_part2_thickness | lh_caudalmiddlefrontal_part3_thickness | lh_caudalmiddlefrontal_part4_thickness | lh_cuneus_part1_thickness | lh_cuneus_part2_thickness | ... | rh_supramarginal_part5_thickness | rh_supramarginal_part6_thickness | rh_supramarginal_part7_thickness | rh_frontalpole_part1_thickness | rh_temporalpole_part1_thickness | rh_transversetemporal_part1_thickness | rh_insula_part1_thickness | rh_insula_part2_thickness | rh_insula_part3_thickness | rh_insula_part4_thickness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.911 | 0.931 | 0.891 | 1.048 | 0.881 | 0.939 | 1.124 | 0.986 | 1.045 | ... | 0.928 | 1.067 | 1.096 | 0.892 | 1.238 | 1.021 | 1.166 | 0.900 | 0.907 | 0.937 |
| 1 | 0 | 0.861 | 0.913 | 0.846 | 0.927 | 0.888 | 0.894 | 0.924 | 1.040 | 1.093 | ... | 0.878 | 0.985 | 1.045 | 1.001 | 1.196 | 1.083 | 1.143 | 0.917 | 0.923 | 0.960 |
| 2 | 0 | 0.817 | 0.827 | 0.828 | 0.828 | 0.780 | 0.843 | 0.825 | 0.848 | 0.838 | ... | 0.847 | 0.849 | 0.819 | 0.952 | 0.933 | 0.942 | 1.059 | 0.794 | 0.834 | 0.860 |
| 3 | 0 | 0.887 | 0.905 | 0.878 | 0.932 | 0.820 | 0.888 | 0.970 | 0.918 | 0.900 | ... | 0.957 | 0.985 | 0.989 | 1.075 | 1.150 | 1.017 | 0.986 | 0.888 | 0.916 | 0.928 |
| 4 | 0 | 0.887 | 0.854 | 0.905 | 1.011 | 0.946 | 0.922 | 1.034 | 1.126 | 1.114 | ... | 0.871 | 0.952 | 0.987 | 1.325 | 0.996 | 1.094 | 1.064 | 0.966 | 0.989 | 0.977 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 110 | 1 | 0.843 | 0.855 | 0.940 | 1.017 | 0.954 | 0.840 | 1.128 | 1.012 | 0.997 | ... | 0.938 | 1.062 | 1.143 | 0.903 | 1.364 | 1.284 | 1.218 | 1.017 | 0.972 | 1.028 |
| 111 | 1 | 0.911 | 0.914 | 0.926 | 1.001 | 0.918 | 1.115 | 1.036 | 1.026 | 1.001 | ... | 0.957 | 1.085 | 1.098 | 1.059 | 1.268 | 1.089 | 1.173 | 0.990 | 1.065 | 1.021 |
| 112 | 1 | 0.890 | 0.899 | 0.886 | 0.930 | 0.883 | 0.882 | 0.883 | 1.190 | 1.101 | ... | 0.916 | 1.010 | 0.974 | 0.968 | 1.305 | 1.168 | 1.265 | 0.981 | 0.975 | 0.972 |
| 113 | 1 | 0.920 | 0.986 | 0.883 | 0.879 | 0.794 | 0.983 | 1.029 | 1.076 | 1.053 | ... | 0.942 | 0.985 | 0.990 | 1.199 | 1.353 | 1.187 | 1.444 | 0.947 | 1.047 | 1.085 |


In [84]:
#define input and output

MD_data = MD_Dublin_adj.iloc[:,1:308].values
MD_target = MD_Dublin_adj.iloc[:,[0]].values

In [85]:
MD_X_train, MD_X_test, MD_y_train, MD_y_test = train_test_split(MD_data, MD_target, test_size=0.3,random_state=109)

In [86]:
MD_classifier = svm.SVC(kernel='linear')

In [105]:
#flatten the (n_samples, 1) target to 1-D so that scikit-learn does not raise a DataConversionWarning
MD_classifier.fit(MD_X_train, MD_y_train.ravel())

SVC(kernel='linear')

In [91]:
MD_y_pred = MD_classifier.predict(MD_X_test)

In [106]:
print("Accuracy:",metrics.accuracy_score(MD_y_test, MD_y_pred))
print("Precision:",metrics.precision_score(MD_y_test, MD_y_pred))
print("Recall:",metrics.recall_score(MD_y_test, MD_y_pred))

Accuracy: 0.8571428571428571
Precision: 1.0
Recall: 0.5
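A single train/test split on roughly a hundred participants gives a noisy estimate, and SVMs are sensitive to feature scale. One possible extension, which is not the approach used in the cells of this notebook, is to chain a scaler and the classifier in a scikit-learn pipeline and evaluate it with stratified cross-validation. The sketch below runs on synthetic data; the dataset, the number of folds and the choice of balanced accuracy as the score are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: many features, few samples, imbalanced classes.
X, y = make_classification(n_samples=114, n_features=308, n_informative=10,
                           weights=[0.75, 0.25], random_state=0)

# The scaler is fitted on the training folds only, which avoids information leakage.
pipeline = make_pipeline(StandardScaler(), SVC(kernel='linear'))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring='balanced_accuracy')

print("Balanced accuracy per fold:", scores)
print("Mean balanced accuracy:", scores.mean())
```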


### Fractional Anisotropy

In [92]:
#read data

FA_Dublin_path = os.path.join(os.pardir, 'data', 'PARC_500.aparc_FA_cortexAv_mean_Dublin.csv')
FA_Dublin = pd.read_csv(FA_Dublin_path)

In [93]:
#adjust dataframe

FA_Dublin_adj = FA_Dublin.drop(['Subject ID','Age', 'Sex'], axis=1)

In [94]:
#label group 1 as 0 and 2 as 1

FA_Dublin_adj['Group'] = FA_Dublin_adj['Group'].replace([1,2],[0, 1])

In [97]:
#define input and output

FA_data = FA_Dublin_adj.iloc[:,1:308].values
FA_target = FA_Dublin_adj.iloc[:,[0]].values

In [99]:
#split data

FA_X_train, FA_X_test, FA_y_train, FA_y_test = train_test_split(FA_data, FA_target, test_size = 0.25, random_state = 0)

In [100]:
FA_classifier = svm.SVC(kernel='linear')

In [101]:
#flatten the (n_samples, 1) target to 1-D so that scikit-learn does not raise a DataConversionWarning
FA_classifier.fit(FA_X_train, FA_y_train.ravel())

SVC(kernel='linear')

In [102]:
FA_y_pred = FA_classifier.predict(FA_X_test)

In [107]:
print("Accuracy:",metrics.accuracy_score(FA_y_test, FA_y_pred))
print("Precision:",metrics.precision_score(FA_y_test, FA_y_pred))
print("Recall:",metrics.recall_score(FA_y_test, FA_y_pred))

Accuracy: 0.6896551724137931
Precision: 0.0
Recall: 0.0


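UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))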
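A precision and recall of 0.0 together with an accuracy of about 0.69 are consistent with a classifier that assigns every test sample to the control group. The sketch below reproduces this pattern with hypothetical labels: 20 controls and 9 patients, counts inferred from the accuracy of 20/29 and not taken from the notebook output. It also shows `zero_division=0`, which sets the undefined precision to 0 explicitly, and balanced accuracy, which exposes the problem more clearly than plain accuracy.

```python
import numpy as np
from sklearn import metrics

# Hypothetical test labels: 20 controls (0) and 9 patients (1).
y_true = np.array([0] * 20 + [1] * 9)
y_pred = np.zeros_like(y_true)   # every sample predicted as "control"

acc = metrics.accuracy_score(y_true, y_pred)
prec = metrics.precision_score(y_true, y_pred, zero_division=0)
rec = metrics.recall_score(y_true, y_pred)
bal_acc = metrics.balanced_accuracy_score(y_true, y_pred)

print(metrics.confusion_matrix(y_true, y_pred))
print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", rec)
print("Balanced accuracy:", bal_acc)
```

In this setting the accuracy only reflects the share of controls in the test set, so it should not be read as evidence that the classifier separates the groups.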
