#**Research for Classification of Brain Disease between 'Alzheimer's Disease', 'Mild Cognitive Impairment', 'Cognitive Normal' with Various Methods of Machine Learning.**
###**Introduction to Brain and Machine Learning, 2020-2**
###**Prof. Suk. , Korea University**
###**All codes are written by John Leo**
---

#**Used Libraries for the project**
---
###**Processing the dataframe**
1. **math**: For detecting the nan value
2. **pandas**: For processing total dataframe
3. **statistics**: For getting median value from the array
4. **numpy**: To create array

###**Constructing the model**
1. **Scikit Learn**

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import statistics
import math
import matplotlib.pyplot as plt

df = pd.read_csv('/content/drive/My Drive/train_data.csv')

#**PreProcessing of Input Features: Morphological phenotypes**
---
#**Classes**

###**::MissingTask(rawData):**

This class fills the missing features (nan) with a specific value and returns the dataframe.
Depending on the method (Mean, Median, Zero), each missing cell is filled with the mean or median of **the existing values of the column containing the missing cell**, or with zero.

**Methods**

- **Mean()** : Fill the missing values with the Mean of the **existing values of the column containing the missing value**
>
- **Zero()** : Fill the missing values with Zero
>
- **Median()** : Fill the missing values with the Median of the **existing values of the column containing the missing value**
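
The three fill strategies described above can be sketched on a toy frame. This is a minimal illustration under assumptions, not the notebook's implementation: the column names `featA` and `featB` are hypothetical, and pandas `fillna` is used in place of the explicit loops.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the feature columns (names are hypothetical).
toy = pd.DataFrame({
    "featA": [1.0, np.nan, 3.0, 8.0],
    "featB": [2.0, 4.0, np.nan, np.nan],
})

# Each strategy fills a NaN using only the observed values of its own column.
filled_mean = toy.fillna(toy.mean())      # column means, NaN skipped
filled_median = toy.fillna(toy.median())  # column medians, NaN skipped
filled_zero = toy.fillna(0)               # constant zero

print(filled_mean)
print(filled_median)
print(filled_zero)
```

Passing a Series to `fillna` maps each column name to its own fill value, so one call covers every column.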

---
###**::FullData(data)::**

This class consists of methods for separating the data into Cortical Volume (E-BV), Average Thickness (BW-EN), and the dataframe itself.

**Attributes**

- **DiagnosisGroup** : DataFrame of the Diagnosis columns of the FullData
>
- **CorticalVolume**: DataFrame of the Cortical Volume columns of the FullData
>
- **AverageThickness**: DataFrame of the Average Thickness columns of the Fulldata

**Methods**
- **getDiagnosisGroup()** : Return the Diagnosis column of the Full Data
>
- **getCorticalVolume()** : Return the Cortical Volume columns of the Full Data
>
- **getAverageThickness()** : Return the Average Thickness columns of the Full Data
>
- **getVolumeArray(type)**: Return the array of volumes selected by type.
> ##### **type == "All"** => return the array of all volumes (CV, AT).
> ##### **type == "CV"** => return the Cortical Volume array.
> ##### **type == "AT"** => return the Average Thickness array.
- **getScoreArray(type)**: Return the array of scores selected by type.
> ##### **type == "All"** => return the array of all scores (ADAS11, ADAS13, MMSE).
> ##### **type == "Question"** => return the ADAS11, ADAS13 array.
> ##### **type == "Mini"** => return the MMSE array.
- **getDiagnosisArray()**: Return the Diagnosis column of the Full Data as an array.
>
- **getVolumeArrayByColumns()**: Return the feature values of the first 10 rows as a list of per-column lists.
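
The getters above amount to positional slicing of one wide frame. The following sketch shows the idea on a toy layout; the column names, values, and the `n_cv` and `first_feature` variables are assumptions made for illustration (the notebook slices 70 volume columns with `iloc[:, 4:74]`).

```python
import pandas as pd

# Hypothetical miniature of the layout: label, three scores, then features.
toy = pd.DataFrame({
    "DX_bl": [0, 1, 2],
    "ADAS11": [9.0, 6.0, 16.33],
    "ADAS13": [14.0, 9.0, 29.33],
    "MMSE": [27, 29, 25],
    "v1CV": [3295, 3857, 3096],
    "v2CV": [1760, 2660, 2191],
    "t1TA": [2.098, 2.643, 2.253],
})

first_feature = 4  # features start after the label and the three scores
n_cv = 2           # number of volume columns in this toy frame

labels = toy["DX_bl"].to_numpy()
scores = toy.iloc[:, 1:first_feature].to_numpy()
volumes = toy.iloc[:, first_feature:first_feature + n_cv].to_numpy()
thickness = toy.iloc[:, first_feature + n_cv:].to_numpy()

print(labels.shape, scores.shape, volumes.shape, thickness.shape)
```

Positional slices depend on the column order of the CSV, so selecting by column name suffix would be a more defensive alternative if the layout can change.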

---
###**::ClassData(taskedData, diagnosisIndex)::**
This class separates the full data into the diagnosis groups, which are Cognitive Normal (0), Mild Cognitive Impairment (1), Alzheimer's Disease (2). Each class has methods for getting the full dataframe, the dataframe for cortical volume, the dataframe for average thickness, and the mean of the cortical volume and of the average thickness.

**Methods**
- **getData()** : Get the full dataframe of the class of the diagnosis group
>
- **getCorticalVolume()** : Get the Cortical Volume columns of the class
>
- **getAverageThickness()** : Get the Average Thickness columns of the class
>
- **getColumnArray(volumeType, type)** : Get the array of the Mean or Median of each column, together with the column names
>
- **MeanOfCV()** : Get the Mean of the Cortical Volume over all values of the class
>
- **MeanOfAT()** : Get the Mean of the Average Thickness over all values of the class
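
The per-class statistics listed above can also be obtained with a single `groupby`. This is a hedged sketch on toy data, not the notebook's class: the column names `v1CV` and `t1TA` and all values are invented for the example.

```python
import pandas as pd

# Toy data: two subjects per diagnosis group (values are made up).
toy = pd.DataFrame({
    "DX_bl": [0, 0, 1, 1, 2, 2],
    "v1CV": [10.0, 12.0, 9.0, 11.0, 6.0, 8.0],
    "t1TA": [2.5, 2.7, 2.4, 2.2, 2.0, 1.8],
})

# One row per diagnosis group, one column per feature.
per_class_mean = toy.groupby("DX_bl").mean()
per_class_median = toy.groupby("DX_bl").median()

# Mean over every cell of one block for one class, the quantity that
# MeanOfCV computes for the cortical volume block.
cn_rows = toy[toy["DX_bl"] == 0]
cn_overall_cv_mean = cn_rows[["v1CV"]].to_numpy().mean()

print(per_class_mean)
print(cn_overall_cv_mean)
```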

In [None]:
class MissingTask:
  def __init__(self, rawdata):
    self.data = rawdata.copy()
    self.columns = self.data.columns[4:]  # feature columns of the passed frame, not the global df
  
  def Mean(self):
    # Fill each NaN with the mean of the observed values in its own column.
    # DataFrame.mean() skips NaN by default, and assigning the filled block
    # back in one step avoids chained assignment (SettingWithCopyWarning)
    # and the division by zero the loop hit when a column started with NaN.
    features = self.data[self.columns]
    self.data[self.columns] = features.fillna(features.mean())
    return self.data


  def Zero(self):
    # Fill every NaN in the feature columns with 0.
    self.data[self.columns] = self.data[self.columns].fillna(0)
    return self.data

  def Median(self):
    # Fill each NaN with the median of the observed values in its own column.
    features = self.data[self.columns]
    self.data[self.columns] = features.fillna(features.median())
    return self.data

  @staticmethod
  def CheckIfMeanRight():
    sum = 0
    count = 0
    notcount = 0
    for volume in df['ST123CV']:
      if math.isnan(volume):
        volume = 0
        notcount = notcount + 1
      else:
        count = count + 1
      sum = sum + volume
    average = sum / count
    print(len(df), count, notcount, count+notcount)
    

class FullData:
  def __init__(self, data):
    self.data = data
    self.DiagnosisGroup = self.getDiagnosisGroup()
    self.CorticalVolume = self.getCorticalVolume()
    self.AverageThickness = self.getAverageThickness()

  def getDiagnosisGroup(self):
    return self.data['DX_bl']

  def getCorticalVolume(self):
    return pd.concat([self.DiagnosisGroup, self.data.iloc[:,4:74]], axis=1)

  def getAverageThickness(self):
    return pd.concat([self.DiagnosisGroup, self.data.iloc[:,74:]], axis=1)

  def getVolumeArray(self, type):
    # Compare strings with ==; `is` tests object identity and is unreliable here.
    if type == 'CV':
      return np.array(self.CorticalVolume.iloc[:,1:])
    elif type == 'AT':
      return np.array(self.AverageThickness.iloc[:,1:])
    elif type == 'All':
      return np.array(self.data.iloc[:,4:])
  
  def getScoreArray(self, type):
    if type == 'All':
      return np.array(self.data.iloc[:,1:4])
    elif type == 'Question':
      return np.array(self.data.iloc[:,1:3])
    elif type == 'Mini':
      return np.array(self.data.iloc[:,3])
                      
  def getDiagnosisArray(self):
    return np.array(self.DiagnosisGroup)

  def getVolumeArrayByColumns(self):
    data = self.data.iloc[:10,4:]
    columns = data.columns
    
    array = []

    for column in columns:
      inarray = []
      for volume in data[column]:
        inarray.append(volume)
      array.append(inarray)

    return array


class ClassData:
  def __init__(self, taskedData, diagnosisIndex):
    self.taskedData = taskedData
    self.diagnosisIndex = diagnosisIndex
    self.data = self.taskedData[taskedData['DX_bl'] == diagnosisIndex]
    self.CorticalVolume = self.getCorticalVolume()
    self.AverageThickness = self.getAverageThickness()

  def getData(self):
    return self.data
  
  def getCorticalVolume(self):
    cortical = self.data.iloc[:, 4:74]
    diagnosisColumn = self.data['DX_bl']
    return pd.concat([diagnosisColumn, cortical], axis =1)

  def getAverageThickness(self):
    average = self.data.iloc[:, 74:]
    diagnosisColumn = self.data['DX_bl']
    return pd.concat([diagnosisColumn, average], axis =1)

  def getColumnArray(self, volumeType, type):
    # volumeType: 'CV' selects cortical volume, anything else average thickness.
    # type: 'Mean' returns per-column means, anything else per-column medians.
    if volumeType == 'CV':
      data = self.CorticalVolume
    else:
      data = self.AverageThickness
    columns = data.columns[1:]  # skip the leading DX_bl column
    if type == 'Mean':
      array = [np.mean(data[column].to_numpy()) for column in columns]
    else:
      array = [statistics.median(data[column]) for column in columns]
    return array, columns



  def MeanOfCV(self):
    columns = self.CorticalVolume.columns[1:]
    totalcellnumber = len(columns) * len(self.CorticalVolume)
    total = 0
    for column in columns:
      sum = 0
      for volume in self.CorticalVolume[column]:
        sum = sum + volume
      total = total + sum
    return total / totalcellnumber

  def MeanOfAT(self):
    columns = self.AverageThickness.columns[1:]
    totalcellnumber = len(columns) * len(self.AverageThickness)
    total = 0
    for column in columns:
      sum = 0
      for volume in self.AverageThickness[column]:
        sum = sum + volume
      total = total + sum
    return total / totalcellnumber




# Define the data by the selection of the method of processing missing features of MISSING TASK CLASS
# Here the tasked data is defined with the Mean Processing missing features.

Data = FullData(MissingTask(df).Mean())

Data.data



Unnamed: 0,DX_bl,ADAS11,ADAS13,MMSE,ST102CV,ST103CV,ST104CV,ST105CV,ST106CV,ST107CV,ST108CV,ST109CV,ST110CV,ST111CV,ST113CV,ST114CV,ST115CV,ST116CV,ST117CV,ST118CV,ST119CV,ST121CV,ST123CV,ST129CV,ST130CV,ST13CV,ST14CV,ST15CV,ST23CV,ST24CV,ST25CV,ST26CV,ST31CV,ST32CV,ST34CV,ST35CV,ST36CV,ST38CV,ST39CV,ST40CV,...,ST34TA,ST35TA,ST36TA,ST38TA,ST39TA,ST40TA,ST43TA,ST44TA,ST45TA,ST46TA,ST47TA,ST48TA,ST49TA,ST50TA,ST51TA,ST52TA,ST54TA,ST55TA,ST56TA,ST57TA,ST58TA,ST59TA,ST60TA,ST62TA,ST64TA,ST72TA,ST73TA,ST74TA,ST82TA,ST83TA,ST84TA,ST85TA,ST90TA,ST91TA,ST93TA,ST94TA,ST95TA,ST97TA,ST98TA,ST99TA
0,0,9.00,14.00,27,3295,1760,2957,1975,3270,1871,9504,3219,12683,8167,1956,15106,18296,12517,11914,9771,2024,1221,6627.000000,6245,5844,2361,2161,4782,2116,1757.0,723,6714,11308,10841,1669,8168,6690,6040,3858,11530,...,2.098,1.965,2.666,1.690,2.328,2.636,2.159,2.354,2.174,2.331,2.335,1.353,1.757,2.270,2.189,2.099,2.695,2.112,2.537,2.004,2.698,2.445,3.461,2.216,1.004000,2.480,2.428,2.234,1.670,3.132,2.467,2.424,2.345,2.914,2.461,2.129,2.519,1.606,2.363,2.864
1,0,5.00,7.00,26,3644,1926,4376,2723,4806,1756,10159,2972,13170,10227,1754,14615,18758,12629,9276,10399,2057,842,6454.726316,7175,5860,2154,2159,5930,2979,1770.0,761,10683,10900,11519,2544,10561,7039,6141,4148,9033,...,2.719,2.031,2.429,2.204,2.079,2.852,2.367,2.887,2.543,2.395,2.431,1.712,2.034,2.602,2.598,2.246,2.401,2.142,2.574,2.169,2.877,2.602,3.375,2.382,1.043959,2.657,2.543,2.348,1.948,3.782,2.440,2.843,2.499,2.695,2.457,2.254,2.269,1.868,2.142,2.679
2,2,16.33,29.33,25,3096,2191,3784,2344,2675,2063,8561,3208,14310,7538,1571,14405,20381,11549,9488,8848,2175,1109,6150.000000,6085,5977,1828,1392,5996,3021,1562.0,489,8597,6703,8310,2281,11219,7499,7580,4451,7417,...,2.253,1.891,2.540,1.879,2.279,2.548,2.298,2.861,2.528,2.827,2.234,1.404,1.940,2.285,2.436,1.800,2.940,2.152,2.692,1.756,2.607,2.221,3.703,2.175,1.064000,2.058,2.168,2.414,1.885,3.303,2.702,2.617,2.108,2.526,2.151,2.003,2.534,1.996,2.163,2.643
3,1,6.00,9.00,29,3857,2660,3382,2751,3783,2202,9074,2504,11995,10484,1986,13690,20896,15095,10463,11313,2787,935,6454.726316,7009,7279,1719,2238,5620,2816,1934.0,925,11194,11349,10076,2299,11054,7071,6329,5498,8785,...,2.643,2.158,2.489,1.952,2.289,2.767,2.199,3.184,2.388,2.671,2.179,1.536,2.137,2.539,2.481,2.376,2.736,2.163,2.638,2.185,2.709,2.328,3.924,2.529,1.043959,2.663,2.990,2.469,1.922,4.232,2.546,2.937,2.514,2.875,2.837,2.379,2.315,1.954,2.326,2.989
4,0,5.00,8.00,28,3411,1578,2664,2433,3448,1550,7128,2500,9801,7732,1451,13050,17785,10460,8407,8573,1952,874,6454.726316,5275,6034,2070,1405,3591,2244,1110.0,652,9601,9964,8338,2059,8735,5546,5859,4480,8762,...,2.644,2.128,2.702,2.065,2.597,2.874,2.281,2.128,2.689,2.955,2.370,1.536,1.990,2.545,2.519,2.286,2.839,2.495,2.812,2.104,2.652,2.549,3.105,2.390,1.043959,2.483,2.381,2.498,1.981,3.331,2.954,2.825,2.440,2.848,2.447,2.256,2.497,2.052,2.498,2.829
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1602,1,16.00,27.00,28,3902,2108,4206,2151,3877,2426,7699,3485,11789,11747,2924,15620,19979,14960,9522,9381,1601,844,6454.726316,6546,7567,1978,1405,6207,3621,1393.0,792,8728,11456,9916,2942,12863,7089,6221,5619,10623,...,2.320,2.244,2.487,1.779,2.361,2.548,2.003,2.541,2.469,2.482,2.259,1.623,1.947,2.487,2.308,2.654,2.864,2.270,2.638,2.265,2.311,2.441,3.410,2.226,1.043959,2.421,2.452,2.482,1.851,2.286,2.455,2.473,2.538,2.726,2.092,2.342,2.461,1.837,2.305,2.716
1603,2,23.00,36.00,24,2713,1694,3153,1766,2685,2081,9003,2927,11843,9890,2308,12994,17177,11485,10095,7990,1763,798,6454.726316,6614,6391,1890,1683,5660,2538,1500.0,842,7753,8109,8618,2172,7058,6358,4903,5022,9294,...,2.187,1.522,2.373,1.599,2.422,2.468,2.347,2.838,2.221,2.426,2.058,1.505,1.662,2.792,2.121,2.253,2.750,1.976,2.314,1.767,2.364,2.052,3.201,2.170,1.043959,2.457,2.690,2.273,1.693,2.943,2.711,2.375,2.182,2.533,2.317,1.808,2.282,1.748,2.430,2.701
1604,2,28.00,40.00,23,3123,1630,4345,1937,3946,1767,8170,3380,13104,8393,2021,16168,18640,12868,11232,9629,2217,840,6454.726316,8379,8229,2714,1594,7492,2933,1453.0,668,10203,9111,10515,2090,12227,7888,6518,5305,9113,...,1.975,2.242,2.482,1.892,2.155,2.635,2.378,2.424,2.407,2.298,2.391,1.481,1.918,2.322,2.524,1.964,2.990,2.176,2.636,1.955,2.445,2.309,3.634,1.931,1.043959,2.581,2.374,2.228,1.670,2.462,2.628,2.551,2.169,2.873,2.063,2.170,2.732,1.894,2.291,2.803
1605,2,18.33,29.33,23,2411,1749,2297,1800,2979,1901,5947,2934,7588,7609,1662,11397,15116,9759,6644,6461,1663,469,5231.000000,5289,5055,1204,1098,4157,1528,1209.0,376,9532,6771,8745,2212,7365,5692,5525,3017,7082,...,2.520,1.648,2.414,1.736,1.962,2.296,1.784,2.468,1.769,2.079,1.926,1.232,1.501,2.476,1.831,1.939,3.250,1.900,2.403,1.680,2.193,1.968,3.351,1.494,0.997000,1.828,3.025,1.852,1.463,2.982,2.109,2.408,1.849,2.449,2.574,1.784,2.515,1.806,2.071,2.248


##**Define 10-Fold Cross Validation**

In [None]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict

kfold = KFold(n_splits=10, shuffle=True, random_state=10)

#**Task 1 (3class Classification)**
##**::Classifier(X, y)::**
This class consists of various models that classify 'DX_bl', whose labels are 0 (Cognitive Normal), 1 (Mild Cognitive Impairment), 2 (Alzheimer's Disease).
>
Each model can be either trained and scored on the whole data or evaluated with **10-fold cross validation**. Each of them also has a method for predicting.
>
The class has a total of 11 models for classifying, and each of them can be called by 

>```classifiers = Classifier(X, y)```
```classifiers.SgdClassifier(cross=True, predict=X[0])```

###**Model Parameters**###
```classifiers.model(cross = True, predict = None)```
####**cross = True(default)**
> This parameter decides whether the model is evaluated with 10 fold cross validation or trained and scored on the whole data set.

####**predict = None(default)**
> If a sample is passed as the predict parameter, the model also returns its predictions.

When the parameter ***cross*** is set to True, the model returns the following 2 items.

1. The **mean** of the test scores by 10 fold cross validation.
2. The array of the ten test scores by 10 fold cross validation.

When the parameter ***cross*** is set to False, the model returns the following item.

1. The score on the train dataset (the model is scored on the same data it was fitted on).

When the parameter ***predict*** is given, the model additionally returns
1. if **cross** = True, => The array of out-of-fold predictions for every sample, produced by 10 fold cross validation.
2. if **cross** = False, => The predicted value for the given sample from the model.
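
The difference between the per-fold scores and the per-sample predictions can be seen on synthetic data. The following is a minimal sketch, not the notebook's code: `X_demo`, `y_demo`, and the `LogisticRegression` settings are assumptions chosen only to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict, cross_validate

# Synthetic 3-class data standing in for the notebook's X and y.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5,
    n_classes=3, random_state=0,
)

cv = KFold(n_splits=10, shuffle=True, random_state=10)
model = LogisticRegression(max_iter=1000)

# cross_validate yields one accuracy per held-out fold.
results = cross_validate(model, X_demo, y_demo, cv=cv)
fold_scores = results["test_score"]
mean_score = fold_scores.mean()

# cross_val_predict yields one out-of-fold prediction per sample.
oof_predictions = cross_val_predict(model, X_demo, y_demo, cv=cv)

print(mean_score, fold_scores.shape, oof_predictions.shape)
```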
---
###**Models**

> **Logistic Regression Classifier**: Classifier using Logistic Regression. (max_iter=7600, C=1e5)
```classifiers.LogClassifier(cross=True)```

> **Decision Tree Classifier**: Classifier using DecisionTree.
```classifiers.DeciTreeClassifier(cross=True)```

> **Support Vector Machine Classifier**: Classifier using SVM. Preprocessed with the StandardScaler at scikit learn
```classifiers.SVMClassifier(cross=True)```

> **KNeighbor Classifier**: Classifier using K Neighbor. K is set to 40.
```classifiers.KNeighClassifier(cross=True)```

> **Random Forest Classifier**: Classifier using Random Forest. max depth is set to 40.
```classifiers.RanForestClassifier(cross=True)```

> **Stochastic Gradient Descent**: Classifier using Stochastic Gradient Descent. Preprocessed with the StandardScaler at scikit learn.
```classifiers.SgdClassifier(cross=True)```

> **Neural Network Classifier**: Classifier using Neural Network. Hidden layer is set to (5,2).
```classifiers.NeuClassifier(cross=True)```

> **Linear Support Vector Classifier**: Classifier using Linear Support Vector. Preprocessed with the StandardScaler at scikit learn.
```classifiers.LinSVCClassifier(cross=True)```

> **Perceptron Classifier**: Classifier using Perceptron.
```classifiers.PerceptronClassifier(cross=True)```

> **Polynomial Kernel SVM Classifier**: Classifier using SVM with Poly Kernel. Preprocessed with the StandardScaler at scikit learn.
```classifiers.PolyClassifier(cross=True)```

> **Gaussian Rbf Kernel Classifier**: Classifier using Gaussian Rbf Kernel. Preprocessed with the StandardScaler at scikit learn.
```classifiers.RbfClassifier(cross=True)```
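
Several of the models above are preprocessed with the StandardScaler. Wrapping the scaler and the estimator in one pipeline means the scaler is refit on the training part of every fold during cross validation. The sketch below illustrates this on synthetic data; `X_demo`, `y_demo`, and the rescaling factor are assumptions made for the example, and no particular score difference is claimed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=4,
    n_classes=3, random_state=0,
)
# Put half of the features on a much larger scale, loosely mimicking
# volumes (thousands) next to thicknesses (single digits).
X_demo[:, :5] *= 1000.0

cv = KFold(n_splits=10, shuffle=True, random_state=10)

# The pipeline is cloned and fitted per fold, scaler included.
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="auto"))
raw_svm = SVC(kernel="rbf", gamma="auto")

scaled_scores = cross_val_score(scaled_svm, X_demo, y_demo, cv=cv)
raw_scores = cross_val_score(raw_svm, X_demo, y_demo, cv=cv)

print("with scaling   :", scaled_scores.mean())
print("without scaling:", raw_scores.mean())
```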

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import Perceptron
from sklearn.tree import DecisionTreeClassifier


#Get the data of 140 Columns (including CV, AT)
X = Data.getVolumeArray('All')

#Get the diagnosis Column
y = Data.getDiagnosisArray()

class Classifier:
  def __init__(self, X, y):
    self.X = X
    self.y = y
  
  def common(self, classifier, cross = True, predict = None):
    if cross is True:
      # Evaluate on the arrays stored on the instance, not the global X and y.
      predictions = cross_val_predict(classifier, self.X, self.y, cv=kfold)
      crossValidationResults = cross_validate(classifier, self.X, self.y, cv=kfold, return_train_score=True)
      total = 0
      for index in range(10):
        total = total + crossValidationResults['test_score'][index]
      mean = total / 10

      if predict is not None:
        print('mean of cross validation test score:',mean, end="\n\n")
        print('cross validation test score:',crossValidationResults['test_score'], end="\n\n")
        print('predictions:',predictions, end="\n\n")
        print('return values ----------', end="\n\n")
        return mean, crossValidationResults['test_score'], predictions
      else:
        print('mean of cross validation test score:',mean)
        print('cross validation test score:',crossValidationResults['test_score'], end="\n\n")
        print('return values ----------', end="\n\n")
        return mean, crossValidationResults['test_score']

    elif cross is False:
      classifier.fit(self.X, self.y)
      if predict is not None:
        # Predict only when a sample was passed; predict([None]) would raise.
        prediction = classifier.predict([predict])
        print('score of full data:',classifier.score(self.X, self.y))
        print('prediction:', prediction, end="\n\n")
        print('return values ----------', end="\n\n")
        return classifier.score(self.X, self.y), prediction
      else:
        print('score of full data:',classifier.score(self.X, self.y), end="\n\n")
        print('return values ----------', end="\n\n")
        return classifier.score(self.X, self.y)

  #Logisitic Regression Classifier
  def LogClassifier(self, cross, predict= None):
    #c = 1e5 0.5108928571428571 #default 0.5102717391304348 #1e3 = 0.5102717391304348 #1e10 = 0.5108928571428571
    log = LogisticRegression(max_iter=7600, C=1e5)
    return self.common(log, cross, predict)

  #Decision Tree Classifier
  def DeciTreeClassifier(self, cross, predict= None):
    #0.4617546583850931
    dtc = DecisionTreeClassifier(random_state=0)
    return self.common(dtc, cross, predict)

  #Support Vector Machine with Linear Kernel
  def SVMClassifier(self, cross, predict = None):
    #0.547014751552795
    svm = make_pipeline(StandardScaler(), SVC(kernel='linear', degree=2, gamma='auto'))
    return self.common(svm, cross, predict)

  #KNeighbor
  def KNeighClassifier(self, cross, predict = None):
    #K = 3 => 0.45053959627329193
    #K = 40 => 0.4872437888198757
    #K = 10 => 0.46856754658385097
    #K = 200 => 0.5090217391304347
    neigh = KNeighborsClassifier(n_neighbors=40)
    return self.common(neigh, cross, predict)

  #Random Forest
  def RanForestClassifier(self, cross, predict = None):
    #0.5152445652173914 at maxdepth = 2
    #0.5494836956521738 at maxdepth = 40
    rfc = RandomForestClassifier(max_depth=40, random_state=0)
    return self.common(rfc, cross, predict)

  #Stochastic Gradient Descent
  def SgdClassifier(self, cross, predict = None):
    #0.4922282608695653
    sgd = make_pipeline(StandardScaler(), SGDClassifier(max_iter=1000))
    return self.common(sgd, cross, predict)

  #Neural Network
  def NeuClassifier(self, cross, predict = None):
    #0.5065062111801243
    neu = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
    return self.common(neu, cross, predict)

  #Linear Support Vector Classifier <=> SVM with linear kernel
  def LinSVCClassifier(self, cross, predict = None):
    #0.5028454968944099
    linsvc = make_pipeline(StandardScaler(), LinearSVC(tol=1e-5, dual=False))
    return self.common(linsvc, cross, predict)

  #Perceptron Classifier
  def PerceptronClassifier(self, cross, predict = None):
    #0.4647399068322981
    perc = Perceptron(tol=1e-3, random_state=0)
    return self.common(perc, cross, predict)

  #Polynomial Kernel SVM Classifier
  def PolyClassifier(self, cross, predict = None):
    #0.5221467391304349
    poly_kernel_svm_clf = Pipeline([
        ('scaler', StandardScaler()),
        ('svm_clf', SVC(kernel="poly", degree=3, coef0=1, C=5))
    ])
    return self.common(poly_kernel_svm_clf, cross, predict)

  #Gaussian RBF Kernel SVM Classifier
  def RbfClassifier(self, cross, predict = None):
    #0.5065062111801243
    rbf_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="rbf", gamma="auto", C=0.001))
    ])
    return self.common(rbf_kernel_svm_clf, cross, predict)

classifiers = Classifier(X, y)

classifiers.LinSVCClassifier(cross=True, predict=X[0])

mean of cross validation test score: 0.5028454968944099

cross validation test score: [0.59006211 0.43478261 0.43478261 0.47204969 0.47826087 0.49689441
 0.54037267 0.5625     0.50625    0.5125    ]

predictions: [1 0 1 ... 1 1 1]

return values ----------



(0.5028454968944099,
 array([0.59006211, 0.43478261, 0.43478261, 0.47204969, 0.47826087,
        0.49689441, 0.54037267, 0.5625    , 0.50625   , 0.5125    ]),
 array([1, 0, 1, ..., 1, 1, 1]))

##**Model Comparisons & Selection**

###**def ClassifierSelector**
This function builds a dictionary of each model's mean test score, derived by 10-Fold Cross Validation.
It computes these means separately for each version of the data, **depending on whether the missing features are filled by mean, median, or zero.**
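
A related design, sketched here as an alternative and not as what the notebook does, places the fill step inside the cross-validated pipeline with scikit-learn's `SimpleImputer`, so that the fill values are computed from the training part of each fold only. The synthetic data, the 5% missing rate, and the two models are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

X_demo, y_demo = make_classification(
    n_samples=300, n_features=12, n_informative=5,
    n_classes=3, random_state=0,
)
rng = np.random.default_rng(0)
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan  # knock out about 5% of cells

cv = KFold(n_splits=10, shuffle=True, random_state=10)

imputer_kwargs = {
    "Mean": {"strategy": "mean"},
    "Median": {"strategy": "median"},
    "Zero": {"strategy": "constant", "fill_value": 0},
}
models = {
    "Logistic": lambda: LogisticRegression(max_iter=1000),
    "Random Forest": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
}

# Nested dictionary: fill strategy -> model name -> mean test accuracy.
table = {}
for fill_name, kwargs in imputer_kwargs.items():
    table[fill_name] = {}
    for model_name, make_model in models.items():
        pipe = make_pipeline(SimpleImputer(**kwargs), make_model())
        scores = cross_val_score(pipe, X_demo, y_demo, cv=cv)
        table[fill_name][model_name] = scores.mean()

print(table)
```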

---



In [None]:
def ClassifierSelector():
  dictionary = {}
  for datatype in ["Mean", "Zero", "Median"]:
    dictionary[datatype] = {}
    if datatype == "Mean":
      Data = FullData(MissingTask(df).Mean())
    elif datatype == "Zero":
      Data = FullData(MissingTask(df).Zero())
    elif datatype == "Median":
      Data = FullData(MissingTask(df).Median())
    
    X = Data.getVolumeArray('All')
    y = Data.getDiagnosisArray()

    log = [LogisticRegression(max_iter=7600, C=1e5), 'Logistic']
    dtc = [DecisionTreeClassifier(random_state=0), 'Decision Tree']
    svm = [make_pipeline(StandardScaler(), SVC(gamma='auto')), 'SVM']
    neigh = [KNeighborsClassifier(n_neighbors=40), 'k Neigh']
    rfc = [RandomForestClassifier(max_depth=40, random_state=0), 'Random Forest']
    sgd = [make_pipeline(StandardScaler(), SGDClassifier(max_iter=1000)), 'SGD']
    neu = [MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1), 'Neural Network']
    linsvc = [make_pipeline(StandardScaler(), LinearSVC(tol=1e-5, dual=False)), 'Linear SVC']
    perc = [Perceptron(tol=1e-3, random_state=0), 'Perceptron']
    poly_kernel_svm_clf = [Pipeline([
        ('scaler', StandardScaler()),
        ('svm_clf', SVC(kernel="poly", degree=3, coef0=1, C=5))
    ]), 'Poly Kernel']
    rbf_kernel_svm_clf = [Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001))
    ]), 'Gaussian RBF Kernel']

    models = [log, dtc, svm, neigh, rfc, sgd, neu, linsvc, perc, poly_kernel_svm_clf, rbf_kernel_svm_clf]
    for model in models:
      sum = 0
      crossValidationResults = cross_validate(model[0], X, y, cv=kfold, return_train_score=True)
      for index in range(10):
        sum = sum + crossValidationResults['test_score'][index]
      mean = sum / 10
      dictionary[datatype][model[1]] = mean
  return dictionary
ClassifierSelector()



{'Mean': {'Decision Tree': 0.4617546583850931,
  'Gaussian RBF Kernel': 0.5065062111801243,
  'Linear SVC': 0.5028454968944099,
  'Logistic': 0.5108928571428571,
  'Neural Network': 0.5065062111801243,
  'Perceptron': 0.4647399068322981,
  'Poly Kernel': 0.5221467391304349,
  'Random Forest': 0.5494836956521738,
  'SGD': 0.47864130434782604,
  'SVM': 0.547014751552795,
  'k Neigh': 0.4872437888198757},
 'Median': {'Decision Tree': 0.45864130434782613,
  'Gaussian RBF Kernel': 0.5065062111801243,
  'Linear SVC': 0.5034666149068323,
  'Logistic': 0.5096506211180125,
  'Neural Network': 0.5065062111801243,
  'Perceptron': 0.4268827639751553,
  'Poly Kernel': 0.5215256211180125,
  'Random Forest': 0.5457453416149068,
  'SGD': 0.4816459627329192,
  'SVM': 0.547639751552795,
  'k Neigh': 0.48661878881987575},
 'Zero': {'Decision Tree': 0.45368012422360254,
  'Gaussian RBF Kernel': 0.5065062111801243,
  'Linear SVC': 0.5040683229813665,
  'Logistic': 0.5059549689440994,
  'Neural Network': 0.

##**Comparison of the mean test scores of each model across the differently filled dataframes (Mean, Median, Zero)**

The results below are the means of the 10 fold cross validation test scores on the dataframes whose missing values are filled with ZERO, MEAN, and MEDIAN.

###**Mean**###
On the dataframe whose missing values are filled with the ***mean***, **Random Forest** has the highest mean 10 fold cross validation test score.
> **Winner: Random Forest (0.5494836956521738)**

###**Median**###
On the dataframe whose missing values are filled with the ***median***, the **Support Vector Machine** has the highest mean 10 fold cross validation test score.
> **Winner: Support Vector Machine (0.547639751552795)**

###**Zero**###
On the dataframe whose missing values are filled with ***zero***, **Random Forest** has the highest mean 10 fold cross validation test score.
> **Winner: Random Forest (0.5544487577639752)**
---
##**The DataFrame whose missing values are filled with ZERO gives the highest test score**

> ***Zero(0.5544487577639752) > Mean(0.5494836956521738) > Median(0.547639751552795)***
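
The three winning scores differ by less than 0.01, so it can help to look at the spread across folds and at a trivial baseline before reading much into the ranking. The sketch below is an assumption-laden illustration on synthetic data, not a re-analysis of the notebook's results: it reports mean and standard deviation per model and compares against a `DummyClassifier` that always predicts the most frequent class.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic, imbalanced 3-class data (class proportions are made up).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=12, n_informative=5,
    n_classes=3, weights=[0.5, 0.3, 0.2], random_state=0,
)

cv = KFold(n_splits=10, shuffle=True, random_state=10)

candidates = {
    "Majority baseline": DummyClassifier(strategy="most_frequent"),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

summary = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_demo, y_demo, cv=cv)
    summary[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A model whose mean score matches the majority baseline is likely predicting a single class, and a gap between two models that is small relative to the fold-to-fold standard deviation is weak evidence for preferring one over the other.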

In [None]:
mean = [0.4617546583850931, 0.5065062111801243, 0.5028454968944099, 0.5108928571428571, 0.5065062111801243, 0.4647399068322981, 0.5221467391304349, 0.5494836956521738, 0.4785481366459628, 0.547014751552795, 0.4872437888198757]
#1. Random Forest #2. SVM
#0.5494836956521738
median = [0.45864130434782613, 0.5065062111801243, 0.5034666149068323, 0.5096506211180125, 0.5065062111801243, 0.4268827639751553, 0.5215256211180125, 0.5457453416149068, 0.4797437888198758,0.547639751552795, 0.48661878881987575]
#1. SVM #2. Random Forest
#0.547639751552795
zero = [0.45368012422360254, 0.5065062111801243, 0.5040683229813665, 0.5059549689440994, 0.5065062111801243, 0.45367624223602476, 0.5239868012422361, 0.5544487577639752, 0.4717041925465838, 0.5420225155279503, 0.4841187888198758]
#1. Random Forest #2.SVM
#0.5544487577639752

#Zero > Mean > Median
print('max of mean data:',np.max(mean))
print('max of median data:',np.max(median))
print('max of zero data:',np.max(zero))

max of mean data: 0.5494836956521738
max of median data: 0.547639751552795
max of zero data: 0.5544487577639752


#**Task 2 (3-logit Regression)**
##**::Regressor::**
This class consists of various models for regressing 'ADAS11', 'ADAS13', and 'MMSE'.
>
Each model can be either trained and scored on the whole data or evaluated with **10-fold cross validation**. Each of them also has a method for predicting.
>
The class has a total of 7 models for regression, and each of them can be called by 

>```regressors = Regressor(X, y)```
```regressors.LinearRegressor(cross=True, predict=X[0], predictIndex = None)```

###**Model Parameters**###
```regressors.model(cross = True, predict = None, predictIndex = None)```
####**cross = True(default)**
> This parameter decides whether the model is evaluated with 10 fold cross validation or trained and scored on the whole data set.

####**predict = None(default)**
> If a sample is passed as the predict parameter, the model also returns its predictions.

####**predictIndex = None(default)**
> This parameter is used to look up the actual value, y[predictIndex], for comparison with the prediction.

When the parameter ***cross*** is set to True, the model returns the following 2 items.

1. The **mean** of the test scores (R²) by 10 fold cross validation.
2. The array of the ten test scores by 10 fold cross validation.

When the parameter ***cross*** is set to False, the model returns the following item.

1. The score on the train dataset (the model is scored on the same data it was fitted on).

When the parameter ***predict*** is given, the model additionally reports
1. if **cross** = True, => 
  1. The array of out-of-fold predicted values for every sample by 10 fold cross validation.
  2. The actual values of ADAS11, ADAS13, MMSE
2. if **cross** = False, => 
  1. The predicted values from the model that was trained on the full data.
  2. The actual values of ADAS11, ADAS13, MMSE
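
The same evaluation pattern applies when the target has three columns. The following is a minimal sketch on synthetic data, assuming a `Ridge` model and an invented `sample_index`; it is not the notebook's `Regressor` class.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict, cross_validate

# Three synthetic targets standing in for ADAS11, ADAS13, and MMSE.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=15, n_targets=3,
    noise=10.0, random_state=0,
)

cv = KFold(n_splits=10, shuffle=True, random_state=10)
reg = Ridge(alpha=1.0)

# One R^2 per fold; for multi-output targets the 'r2' scorer averages
# the per-target R^2 values uniformly.
results = cross_validate(reg, X_demo, y_demo, cv=cv, scoring="r2")
fold_r2 = results["test_score"]

# Out-of-fold predictions: one row of three values per sample.
oof = cross_val_predict(reg, X_demo, y_demo, cv=cv)

sample_index = 0  # hypothetical sample to inspect
print("mean R^2 :", fold_r2.mean())
print("predicted:", oof[sample_index])
print("actual   :", y_demo[sample_index])
```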
---
###**Models**

> **Linear Regression**: Regression using Linear Regression.
```regressors.LinearRegressor(cross=True)```

> **Elastic Net Regression**: Regression using Elastic Net.
```regressors.ElasticRegressor(cross=True)```

> **Multi Task Elastic Net Regression**: Regression using Multi Task Elastic Net.
```regressors.MultiTaskElasticRegressor(cross= True)```

> **Ridge Regression**: Regression using Ridge Regression.
```regressors.RidRegressor(cross=True)```

> **Lasso Regression**: Linear model trained with an L1 prior as regularizer (= lasso).
```regressors.LasRegressor(cross=True)```

> **Multi Task Lasso Regression**: Multi Task Lasso Regression.
```regressors.MultiTaskLassoRegressor(cross=True)```

> **KNeighbors Regression**: Regression using K Neighbors. K is set to 40.
```regressors.KNeighRegressor(cross=True)```
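
The practical difference between the plain and the multi task variants lies in how they select features. `Lasso` fits each target independently, while `MultiTaskLasso` penalizes the coefficients of a feature jointly, so a feature is kept or dropped for all targets together. The sketch below shows this on synthetic data; the data, the `alpha` value, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, MultiTaskLasso

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 10))

# Each of the three targets depends on a different single feature.
true_coef = np.zeros((10, 3))
true_coef[0, 0] = 3.0
true_coef[1, 1] = 3.0
true_coef[2, 2] = 3.0
y_demo = X_demo @ true_coef + 0.1 * rng.normal(size=(120, 3))

lasso = Lasso(alpha=0.5).fit(X_demo, y_demo)
multi = MultiTaskLasso(alpha=0.5).fit(X_demo, y_demo)

# coef_ has shape (n_targets, n_features) for both estimators.
lasso_support = lasso.coef_ != 0
multi_support = multi.coef_ != 0

print("Lasso support per target:\n", lasso_support.astype(int))
print("MultiTaskLasso support per target:\n", multi_support.astype(int))
```

In the multi task output every column is either all ones or all zeros, which reflects the shared feature selection across targets.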


In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import Ridge
from sklearn import linear_model
from sklearn.pipeline import make_pipeline
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict, cross_validate

X = Data.getVolumeArray('All')
y = Data.getScoreArray('All')

class Regressor:
  def __init__(self, X, y):
    self.X = X
    self.y = y

  def common(self, regressor, cross = False, predict = None, predictIndex = None):
    if cross is True:
      # Use the data stored on the instance, not the notebook-level X and y.
      crossValidationResults = cross_validate(regressor, self.X, self.y, cv=kfold, scoring=('r2'), return_train_score=True)
      testScores = crossValidationResults['test_score']
      # Mean of the per-fold test scores.
      mean = sum(testScores) / len(testScores)

      print('mean of cross validation test score:',mean, end="\n\n")
      print('cross validation test score:',testScores, end="\n\n")
      if predict is not None:
        # Out-of-fold predictions, one row per sample.
        predictions = cross_val_predict(regressor, self.X, self.y, cv=kfold)
        print('predictions:',predictions, end="\n\n")
        print('actual value:',self.y[predictIndex], end="\n\n")
        print('return values ----------', end="\n\n")
        return mean, testScores, predictions
      else:
        print('return values ----------', end="\n\n")
        return mean, testScores

    elif cross is False:
      regressor.fit(self.X, self.y)
      # Score on the same data the model was fitted on.
      score = regressor.score(self.X, self.y)
      print('score of full data:',score, end="\n\n")
      if predict is not None:
        # Predict only when a feature vector was actually passed in.
        prediction = regressor.predict([predict])
        print('prediction:', prediction, end="\n\n")
        print('actual value:',self.y[predictIndex], end="\n\n")
        print('return values ----------', end="\n\n")
        return score, prediction
      else:
        print('return values ----------', end="\n\n")
        return score

  #Linear Regression
  def LinearRegressor(self, cross, predict = None, predictIndex = None):
    lrg = LinearRegression()
    return self.common(lrg, cross, predict, predictIndex)

  #ElasticNet Regression
  def ElasticRegressor(self, cross, predict = None, predictIndex = None):
    eln = ElasticNet(max_iter=2125)
    return self.common(eln, cross, predict, predictIndex)

  #Multi Task Elastic Regression
  def MultiTaskElasticRegressor(self, cross, predict = None, predictIndex = None):
    mte = linear_model.MultiTaskElasticNet(alpha=0.1, max_iter = 2000)
    return self.common(mte, cross, predict, predictIndex)
  
  #Multi Task Lasso Regression
  def MultiTaskLassoRegressor(self, cross, predict = None, predictIndex = None):
    mtl = linear_model.MultiTaskLasso(alpha=0.1, max_iter=2000)
    return self.common(mtl, cross, predict, predictIndex)

  #Ridge Regeression
  def RidRegressor(self, cross, predict = None, predictIndex = None):
    rid = Ridge(alpha=1.0)
    return self.common(rid, cross, predict, predictIndex)

  #Lasso Regression
  def LasRegressor(self, cross, predict = None, predictIndex = None):
    las = linear_model.Lasso(alpha=0.1, max_iter=2000)
    return self.common(las, cross, predict, predictIndex)

  #KNeighbor Regression
  def KNeighRegressor(self, cross, predict = None, predictIndex = None):
    neigh = KNeighborsRegressor(n_neighbors=40)
    return self.common(neigh, cross, predict, predictIndex)

regressors = Regressor(X, y)
regressors.MultiTaskElasticRegressor(cross=True, predict=X[2], predictIndex=2)

mean of cross validation test score: 0.3703170871869467

cross validation test score: [0.43407818 0.35480642 0.27694822 0.44954425 0.30628531 0.24354555
 0.43150638 0.41392495 0.40313057 0.38940104]

predictions: [[10.54889075 15.65601619 26.68079172]
 [ 9.55436866 15.79896773 28.28977027]
 [17.14575713 26.38577851 25.49440071]
 ...
 [16.89910219 27.05197709 24.58709307]
 [11.20646296 17.24517134 26.820145  ]
 [ 9.04889151 14.44490892 27.79371537]]

actual value: [16.33 29.33 25.  ]

return values ----------



(0.3703170871869467,
 array([0.43407818, 0.35480642, 0.27694822, 0.44954425, 0.30628531,
        0.24354555, 0.43150638, 0.41392495, 0.40313057, 0.38940104]),
 array([[10.54889075, 15.65601619, 26.68079172],
        [ 9.55436866, 15.79896773, 28.28977027],
        [17.14575713, 26.38577851, 25.49440071],
        ...,
        [16.89910219, 27.05197709, 24.58709307],
        [11.20646296, 17.24517134, 26.820145  ],
        [ 9.04889151, 14.44490892, 27.79371537]]))

##**Model Comparisons & Selection**
###**def RegressionSelector**
This function builds a dictionary of each model's mean test score from 10-fold cross validation.
It computes these mean test scores separately for each version of the data, **depending on whether the missing features are filled with the mean, the median, or zero.**
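
The comparison of fill strategies can be sketched without the notebook's `MissingTask` and `FullData` classes. The example below is only an illustration under stated assumptions: the data are synthetic, the column names `f0` to `f5` are hypothetical, and `Ridge` stands in for the full list of models.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic features with about 5% of the cells removed (hypothetical data).
features = pd.DataFrame(rng.normal(loc=5.0, size=(200, 6)),
                        columns=[f"f{i}" for i in range(6)])
target = features.sum(axis=1) + rng.normal(scale=0.5, size=200)
features_missing = features.mask(rng.random(features.shape) < 0.05)

# The three fill strategies, expressed with pandas' fillna.
filled = {
    "Mean": features_missing.fillna(features_missing.mean()),
    "Median": features_missing.fillna(features_missing.median()),
    "Zero": features_missing.fillna(0),
}

# Mean R^2 over 10 folds for each filled version of the data.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(Ridge(alpha=1.0), frame.to_numpy(), target.to_numpy(),
                          cv=cv, scoring="r2").mean()
    for name, frame in filled.items()
}
print(scores)
```

Filling before splitting, as done here, lets the fill statistics see the test folds. Placing scikit-learn's `SimpleImputer` inside a pipeline is one way to avoid that.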

---

In [None]:
def RegressionSelector():
  dictionary = {}
  for datatype in ["Mean", "Zero", "Median"]:
    dictionary[datatype] = {}
    if datatype == "Mean":
      Data = FullData(MissingTask(df).Mean())
    elif datatype == "Zero":
      Data = FullData(MissingTask(df).Zero())
    elif datatype == "Median":
      Data = FullData(MissingTask(df).Median())
    
    X = Data.getVolumeArray('All')
    y = Data.getScoreArray('All')

    lrg = [LinearRegression(), 'linear']
    eln = [ElasticNet(max_iter=2125), 'elastic net']
    mte = [linear_model.MultiTaskElasticNet(alpha=0.1, max_iter = 2000), 'multi task elastic net']
    rid = [Ridge(alpha=1.0), 'ridge']
    las = [linear_model.Lasso(alpha=0.1, max_iter=2000), 'lasso']
    mtl = [linear_model.MultiTaskLasso(alpha=0.1, max_iter=2000), 'multi task lasso']
    neigh = [KNeighborsRegressor(n_neighbors=40), 'k neighbor']

    models = [lrg, eln, mte, rid, las, mtl, neigh]

    for model in models:
      # Mean R^2 test score over the folds for this model on this version of the data.
      crossValidationResults = cross_validate(model[0], X, y, cv=kfold, scoring=("r2"), return_train_score=True)
      testScores = crossValidationResults['test_score']
      dictionary[datatype][model[1]] = sum(testScores) / len(testScores)
  return dictionary

RegressionSelector()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


{'Mean': {'elastic net': 0.28369656792203257,
  'k neighbor': 0.15656783877861807,
  'lasso': 0.35312534424768305,
  'linear': 0.34961901498407255,
  'multi task elastic net': 0.3703170871869467,
  'multi task lasso': 0.3684817569638511,
  'ridge': 0.3553057787909294},
 'Median': {'elastic net': 0.28376987486753946,
  'k neighbor': 0.15679314050856558,
  'lasso': 0.35307853612971635,
  'linear': 0.34950618715105325,
  'multi task elastic net': 0.37026894538961974,
  'multi task lasso': 0.3684371067360447,
  'ridge': 0.3552137266852444},
 'Zero': {'elastic net': 0.29742061536291925,
  'k neighbor': 0.16763792715945564,
  'lasso': 0.35994142574503807,
  'linear': 0.35380031294849146,
  'multi task elastic net': 0.3738753161956154,
  'multi task lasso': 0.37314815849067295,
  'ridge': 0.3595024221250281}}

##**Comparison of each model's mean test score across the three dataframes (Mean, Median, Zero)**

The mean test scores from 10-fold cross validation are compared below for the three dataframes whose missing values are filled with ZERO, MEAN, and MEDIAN.

###**Mean**###
On the dataframe whose missing values are filled with the ***mean***, the **Multi Task Elastic Net** has the highest mean 10-fold cross validation test score. 
> **Winner: Multi Task Elastic Net (0.3703170871869467)**

###**Median**###
On the dataframe whose missing values are filled with the ***median***, the **Multi Task Elastic Net** has the highest mean 10-fold cross validation test score.
> **Winner: Multi Task Elastic Net (0.37026894538961974)**

###**Zero**###
On the dataframe whose missing values are filled with ***zero***, the **Multi Task Elastic Net** has the highest mean 10-fold cross validation test score.
> **Winner: Multi Task Elastic Net (0.3738753161956154)**
---
##**The DataFrame whose missing values are filled with ZERO has the highest test score**

> ***Zero(0.3738753161956154) > Mean(0.3703170871869467) > Median(0.37026894538961974)***
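
The winner per fill strategy and the ordering of the three strategies can be checked directly from the dictionary printed by `RegressionSelector()`. The sketch below copies the reported scores; the variable names `scores`, `best`, and `ranking` are chosen here for illustration.

```python
# Mean test scores copied from the RegressionSelector() output above.
scores = {
    "Mean": {"elastic net": 0.28369656792203257, "k neighbor": 0.15656783877861807,
             "lasso": 0.35312534424768305, "linear": 0.34961901498407255,
             "multi task elastic net": 0.3703170871869467,
             "multi task lasso": 0.3684817569638511, "ridge": 0.3553057787909294},
    "Median": {"elastic net": 0.28376987486753946, "k neighbor": 0.15679314050856558,
               "lasso": 0.35307853612971635, "linear": 0.34950618715105325,
               "multi task elastic net": 0.37026894538961974,
               "multi task lasso": 0.3684371067360447, "ridge": 0.3552137266852444},
    "Zero": {"elastic net": 0.29742061536291925, "k neighbor": 0.16763792715945564,
             "lasso": 0.35994142574503807, "linear": 0.35380031294849146,
             "multi task elastic net": 0.3738753161956154,
             "multi task lasso": 0.37314815849067295, "ridge": 0.3595024221250281},
}

# Best (model, score) pair for each fill strategy.
best = {fill: max(models.items(), key=lambda item: item[1])
        for fill, models in scores.items()}

# Fill strategies ordered by their best score, highest first.
ranking = sorted(best, key=lambda fill: best[fill][1], reverse=True)
for fill in ranking:
    print(fill, *best[fill])
```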

In [None]:
mean = [0.28369656792203257, 0.15656783877861807, 0.35312534424768305, 0.34961901498407255, 0.3703170871869467, 0.3684817569638511, 0.3553057787909294]
#1.Multi Task Elastic Net 2. Multi Task Lasso
median = [0.28376987486753946, 0.15679314050856558, 0.35307853612971635, 0.34950618715105325, 0.37026894538961974, 0.3684371067360447, 0.3552137266852444]
#1.Multi Task Elastic Net 2. Multi Task Lasso
zero = [0.29742061536291925, 0.16763792715945564, 0.35994142574503807, 0.35380031294849146, 0.3738753161956154, 0.37314815849067295, 0.3595024221250281]
#1.Multi Task Elastic Net 2. Multi Task Lasso

#Zero > Mean > Median
print("max of mean:", np.max(mean))
print("max of median:", np.max(median))
print("max of zero:", np.max(zero))

max of mean: 0.3703170871869467
max of median: 0.37026894538961974
max of zero: 0.3738753161956154


#**Persist the model by weights file and download**
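
A persisted model is only useful if it predicts the same values after loading. The following is a minimal sketch of that round trip with `joblib`, assuming synthetic data and a temporary directory in place of the Colab download; the file name `elasticNet_demo.pkl` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))            # hypothetical features
y_demo = X_demo @ rng.normal(size=(4, 3))    # three hypothetical targets

model = MultiTaskElasticNet(alpha=0.1, max_iter=2000).fit(X_demo, y_demo)

# Write the fitted model to disk and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "elasticNet_demo.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give the same prediction as the original.
same = np.allclose(model.predict(X_demo[:1]), restored.predict(X_demo[:1]))
print(same)
```

A pickle file like this is generally expected to be loaded with the same scikit-learn version that wrote it.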


In [None]:
import joblib
from google.colab import files

Data = FullData(MissingTask(df).Zero())

X = Data.getVolumeArray('All')
yreg = Data.getScoreArray('All')
yclf = Data.getDiagnosisArray()


mte = linear_model.MultiTaskElasticNet(alpha=0.1, max_iter = 2000)
mte.fit(X, yreg)
cross_validate(mte, X, yreg, cv=kfold, scoring=('r2'), return_train_score=True)

rfc = RandomForestClassifier(max_depth=40, random_state=0)
rfc.fit(X, yclf)
cross_validate(rfc, X, yclf, cv=kfold, return_train_score=True)

joblib.dump(mte, 'elasticNet.pkl')
joblib.dump(rfc, 'randomForest.pkl')

#Regressor.predict([X[0]])
#Classifier.predict([X[0]])

files.download('elasticNet.pkl')
files.download('randomForest.pkl')





---
#**Analysis & Discussion**
---
> The morphological phenotypes of the brain in the data consist of the Cortical Volume (ST102CV - ST99CV) of 70 brain regions and the Average Thickness (ST102TA - ST99TA) of 70 brain regions.
> When subjects are grouped by DX_bl, the diagnosis group (0: Cognitive Normal, 1: Mild Cognitive Impairment, 2: Alzheimer's disease), the 70 Cortical Volume regions and the 70 Average Thickness regions show slightly different values between the diagnosis groups.

##**Comparisons of the morphological phenotypes by each diagnosis group**

In [None]:
class0 = ClassData(Data.data, 0)
class1 = ClassData(Data.data, 1)
class2 = ClassData(Data.data, 2)

print('Mean of Cortical Volume between classes',class0.MeanOfCV(), class1.MeanOfCV(), class2.MeanOfCV())
print('Mean of Average Thickness between classes',class0.MeanOfAT(), class1.MeanOfAT(), class2.MeanOfAT())

Mean of Cortical Volume between classes 6111.859006278583 6023.156696717224 5586.838403479871
Mean of Average Thickness between classes 2.3929713185024424 2.33627246365299 2.2235102472960797


---
> As above, the differences in Cortical Volume and Average Thickness between the diagnosis groups are not large, **but the group means still differ slightly**.

**Comparison of the mean Cortical Volume between diagnosis groups** (Missing features were filled with mean of each column)

> ***Cognitive Normal (6111.86) > Mild Cognitive Impairment (6023.16) > Alzheimer's disease (5586.84)***

**Comparison of the mean Average Thickness between diagnosis groups**

> ***Cognitive Normal (2.39) > Mild Cognitive Impairment (2.34) > Alzheimer's disease (2.22)***


> As the comparisons above show, the group means of both morphological phenotypes, Cortical Volume and Average Thickness, are related to the diagnosis group.
---

####**As the results,**

> The mean Cortical Volume and the mean Average Thickness differ between the diagnosis groups, and both get smaller as the condition of the brain gets worse. This suggests that both morphological phenotypes are related to Alzheimer's disease: smaller values across the 70 brain regions go together with a more severe diagnosis.
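
The group comparison above can be sketched with a pandas `groupby`. This is an illustration only: the six rows are invented, only two of the volume columns are shown, and how `ClassData.MeanOfCV()` averages internally is an assumption here (mean over all volume cells of a group).

```python
import pandas as pd

# Tiny hypothetical table: one diagnosis label and two volume columns per subject.
demo = pd.DataFrame({
    "DX_bl":   [0, 0, 1, 1, 2, 2],
    "ST102CV": [6200.0, 6000.0, 6100.0, 5900.0, 5600.0, 5500.0],
    "ST99CV":  [6100.0, 6300.0, 6000.0, 6200.0, 5700.0, 5400.0],
})

# Mean over both volume columns and all subjects of each diagnosis group.
group_means = demo.groupby("DX_bl")[["ST102CV", "ST99CV"]].mean().mean(axis=1)
print(group_means)
```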
---

##**Task 1 (3-class Classification): Predict the diagnosis group of subjects**
---

As shown above, the diagnosis groups differ slightly in the Cortical Volume and Average Thickness of the 70 brain regions, so these values can be used to predict the diagnosis group. The diagnosis group, stored under the label 'DX_bl' in the data, is predicted from the 70 Cortical Volume columns and the 70 Average Thickness columns.

For the classification task, a total of 11 classification models were used to predict the diagnosis group of subjects, and each of them shows a different accuracy.

---
####**Models of Classification Task**

- **Decision Tree Classifier**

- **Gaussian RBF Kernel SV Classifier**

- **Perceptron Classifier**

- **Polynomial Kernel SV Classifier**

- **Neural Network Classifier**

- **Logistic Regression Classifier**

- **Linear SV Classifier <=> SVM (linear kernel)**

- **Random Forest Classifier**

- **Stochastic Gradient Descent Classifier**

- **Support Vector Machine (linear kernel) Classifier**

- **KNeighbor Classifier**

---
#####Each model gave a different accuracy on the three data sets, whose missing features are filled with three different criteria (Mean, Median, Zero). The table below shows each model's mean accuracy from 10-fold cross validation on the data filled with Zero, Mean, and Median.
---

|            	| **Decision Tree** 	| **Gaussian RBF SVM** 	| **Linear SVC** 	| **Logistic** 	| **Neural Network** 	| **Perceptron** 	| **Polynomial SVM** 	| **Random Forest** 	|  **SGD**  	|  **SVM**  	| **K Neighbors** 	|
|:----------:	|:------------------:	|:---------------------------:	|:--------------:	|:------------:	|:------------------:	|:--------------:	|:-------------------------:	|:-----------------:	|:---------:	|:---------:	|:---------------:	|
|  **Mean**  	|      0.4617546     	|          0.5065062          	|    0.5028454   	|   0.5108928  	|      0.5065062     	|    0.4647399   	|         0.5221467         	|     0.5494836     	| 0.4786413 	| 0.5470147 	|    0.4872437    	|
|  **Zero**  	|      0.4536801     	|          0.5065062          	|    0.5040683   	|   0.5059549  	|      0.5065062     	|    0.4536762   	|         0.5239868         	|     0.5544487     	| 0.4952989 	| 0.5420225 	|    0.4841187    	|
| **Median** 	|      0.4586413     	|          0.5065062          	|    0.5034666   	|   0.5096506  	|      0.5065062     	|    0.4268827   	|         0.5215256         	|     0.5457453     	| 0.4816459 	| 0.5476397 	|    0.4866187    	|
|   **MAX**  	|      0.4617546     	|          0.5065062          	|    0.5040683   	|   0.5108928  	|      0.5065062     	|    0.4647399   	|         0.5239868         	|     0.5544487     	| 0.4952989 	| 0.5476397 	|    0.4872437    	|

---
As above, on the three data sets whose missing features are filled with different criteria (Zero, Mean, Median), each model shows different test scores. 
Among them, **the Random Forest showed the highest test score**, 0.5544487, on the **Zero-filled** data set. 
#####The model with the second highest score on the Zero-filled data was the **SVM (Linear Kernel)** with 0.5420225; on the Median-filled data it had the highest score, 0.5476397.
---

#####**Comparison of test scores between models at Zero-Filled Data**

> ***Random Forest (0.5544487) > SVM (Linear Kernel) (0.5420225) > SVM (Poly Kernel) (0.5239868)***

#####**Comparison of test scores between models at Mean-Filled Data**

> ***Random Forest (0.5494836) > SVM (Linear Kernel) (0.5470147) > SVM (Poly Kernel) (0.5221467)***

#####**Comparison of test scores between models at Median-Filled Data**

> ***SVM (Linear Kernel) (0.5476397) > Random Forest (0.5457453) >  SVM (Poly Kernel) (0.5215256)***

---

#####**As the results,** **the Random Forest** model on the data whose missing features are filled with **Zero** has the highest accuracy of all models across the three filled data sets.
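
A comparison like the one summarized above can be assembled with `cross_val_score`. The sketch below uses synthetic three-class data and only four of the eleven models; the hyperparameters other than `max_depth=40` and the `StratifiedKFold` splitter are assumptions, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the 3-class diagnosis problem.
X_demo, y_demo = make_classification(n_samples=300, n_features=20, n_informative=6,
                                     n_classes=3, random_state=0)

candidates = {
    "Random Forest": RandomForestClassifier(max_depth=40, random_state=0),
    "SVM (linear kernel)": SVC(kernel="linear"),
    "Logistic": LogisticRegression(max_iter=1000),
    "K Neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Mean accuracy over 10 stratified folds for each candidate.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
mean_accuracy = {
    name: cross_val_score(clf, X_demo, y_demo, cv=cv, scoring="accuracy").mean()
    for name, clf in candidates.items()
}

# Print the models from best to worst.
for name, acc in sorted(mean_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.4f}")
```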




##**Task 2 (3-logit Regression): Predict the cognitive assessment scores of subjects**
---

The regression task predicts the cognitive assessment scores of subjects, which are ADAS11, ADAS13, and MMSE. For ADAS11 and ADAS13 a higher score indicates more severe impairment, as in Alzheimer's disease, while for MMSE a higher score indicates normal cognition and a lower score indicates impairment.

For the regression task, a total of 7 regression models were used to predict the cognitive assessment scores of subjects, and each of them shows a different test score.

---

####**Models of Regression task**

- **Linear Regression**

- **Lasso Regression**

- **Multi Task Lasso Regression**

- **Ridge Regression**

- **Elastic Net Regression**

- **Multi Task Elastic Net Regression**

- **KNeighbors Regression**

---
As with the classification models above, each model gave a different test score on the three data sets, whose missing features are filled with three different criteria (Mean, Median, Zero). The table below shows each model's mean test score (R², from `scoring='r2'`) from 10-fold cross validation on the data filled with Zero, Mean, and Median.
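
Because there are three targets (ADAS11, ADAS13, MMSE), a single R² test score has to combine three per-target values. With scikit-learn's default setting, `r2_score` averages the per-target R² values with equal weight. The sketch below checks this on a few invented rows; the numbers are hypothetical and not taken from the data.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true and predicted values for three targets (columns).
y_true = np.array([[10.0, 15.0, 28.0],
                   [16.0, 29.0, 25.0],
                   [ 9.0, 14.0, 29.0],
                   [20.0, 31.0, 22.0]])
y_pred = np.array([[11.0, 17.0, 27.0],
                   [15.0, 26.0, 25.5],
                   [10.0, 15.0, 28.0],
                   [18.0, 30.0, 23.5]])

# Per-target R^2: 1 - SS_res / SS_tot, computed column by column.
ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
per_target = 1.0 - ss_res / ss_tot

# Default aggregation: the unweighted mean of the per-target values.
combined = r2_score(y_true, y_pred)
print(per_target, combined)
```

One consequence is that a model can reach a moderate combined score while predicting one of the three assessments much worse than the others, so `multioutput="raw_values"` is worth inspecting as well.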

---

|            	| **Elastic Net** 	| **Multi Task Elastic Net** 	| **K Neighbors** 	| **Lasso** 	| **Linear** 	| **Multi Task Lasso** 	| **Ridge** 	|
|:----------:	|:---------------:	|:--------------------------:	|:---------------:	|:---------:	|:----------:	|:--------------------:	|:---------:	|
|  **Mean**  	|    0.2836965    	|          0.3703170         	|    0.1565678    	| 0.3531253 	|  0.3496190 	|       0.3684817      	| 0.3553057 	|
|  **Zero**  	|    0.2974206    	|          0.3738753         	|    0.1676379    	| 0.3599414 	|  0.3538003 	|       0.3731481      	| 0.3595024 	|
| **Median** 	|    0.2837698    	|          0.3702689         	|    0.1567931    	| 0.3530785 	|  0.3495061 	|       0.3684371      	| 0.3552137 	|
|   **MAX**  	|    0.2974206    	|          0.3738753         	|    0.1676379    	| 0.3599414 	|  0.3538003 	|       0.3731481      	| 0.3595024 	|

---

As above, on the three data sets whose missing features are filled with different criteria (Zero, Mean, Median), each model shows different test scores. 
Among them, **Multi Task Elastic Net showed the highest test score**, 0.3738753, on the **Zero-filled** data set. The Multi Task Elastic Net won on all 3 data sets. 
#####The model with the second highest score was the **Multi Task Lasso** with 0.3731481 on the Zero-filled data.
---

#####**Comparison of test scores between models at Zero-Filled Data**

> ***Multi Task Elastic Net (0.3738753) > Multi Task Lasso (0.3731481) > Lasso (0.3599414)***

#####**Comparison of test scores between models at Median-Filled Data**

> ***Multi Task Elastic Net (0.3702689) > Multi Task Lasso (0.3684371) > Ridge (0.3552137)***

#####**Comparison of test scores between models at Mean-Filled Data**

> ***Multi Task Elastic Net (0.3703170) > Multi Task Lasso (0.3684817) >  Ridge (0.3553057)***

---

#####**As the results,** **the Multi Task Elastic Net** model on the data whose missing features are filled with **Zero** has the highest test score of all models across the three filled data sets.



##**Summary**
Since the morphological phenotypes, the Cortical Volume and Average Thickness of 70 brain regions, are related to the degree of cognitive disease, the diagnosis group and the assessment scores can be predicted from the 140 feature columns of the dataframe.

The 3 dataframes, whose missing features are filled with three different criteria (Mean, Median, Zero), gave slightly different scores in both the regression and the classification task.

Both the classification and the regression task led to the same result: **the best score was reached on the Zero-filled data** rather than on the Mean-filled or Median-filled data.

**In the classification task, the Random Forest classifier** on the Zero-filled data gave the best test score with 10-fold cross validation, an accuracy of 0.5544487.

**In the regression task, the Multi Task Elastic Net Regression** on the Zero-filled data gave the best test score with 10-fold cross validation, an R² score of 0.3738753.