### Data Analysis on MCI and ADD patient Data

Here, a machine learning model shall be built to predict whether a patient has MCI (Mild Cognitive Impairment), ADD, or neither. The steps taken are:
1. Creating suitable datasets from the three Excel files shared. The datasets are:
a) MCI and controls dataset
b) ADD and controls dataset
c) MCI, ADD and controls dataset

2. Pre-analysis of the data. We want to establish:
a) The number of features in each of the three datasets (Excel files)
b) The number of records in each dataset
c) The composition of the data: how many MCI, ADD and control records it contains
d) Whether there are any missing values

3. Data preprocessing
a) The datasets contain missing values, and some features have no values at all (100% missing). Features with 80% or more missing values were excluded from the subsequent steps.
b) After selecting the features used for modelling, records (rows) that still had missing values in any of those features were dropped from the datasets.
c) Label encoding was applied to the text-valued features. Most machine learning algorithms, including scikit-learn's estimators, require numeric input, so textual values were converted into integers. The encoded data is then used to build, train and test the models.

4. Model building, training and testing
Classification algorithms such as Logistic Regression, SVM and AdaBoost, among others, will be used to build the models.

5. Model evaluation and selection using cross-validation, AUC, etc.
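Step 5 can be sketched as follows. This is a minimal sketch, not the notebook's actual evaluation code. It assumes synthetic data from `make_classification` in place of the eye-tracking features. The model settings (`max_iter`, `n_splits=5`, the seeds) are illustrative choices. Each classifier is scored with stratified 5-fold cross-validated ROC AUC, and the one with the highest mean AUC is picked.

```python
# Sketch: compare a few of the classifiers named above with stratified k-fold,
# cross-validated ROC AUC. Synthetic data stands in for the eye-tracking features.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for one binary dataset (e.g. MCI vs controls)
X, y = make_classification(n_samples=500, n_features=18, n_informative=6, random_state=45)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "adaboost": AdaBoostClassifier(random_state=45),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=45)
results = {}
for name, model in models.items():
    # The 'roc_auc' scorer uses decision_function or predict_proba, whichever is available
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = (scores.mean(), scores.std())

for name, (mean_auc, std_auc) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:20s} AUC = {mean_auc:.3f} +/- {std_auc:.3f}")

# Model selection: keep the classifier with the highest mean cross-validated AUC
best_model = max(results, key=lambda name: results[name][0])
```

Every participant contributes many gaze samples. Splitting at the row level can therefore place samples from the same recording in both the training and test folds, which may overstate performance. Grouping the folds by participant would give a more conservative estimate, for example with scikit-learn's `GroupKFold` and the `participant_name` column.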



In [1]:
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import seaborn as sns
from numpy import interp  # scipy.interp was only a deprecated alias of numpy.interp
from pandas.plotting import table

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
np.random.seed(45)

# Preprocessing and feature selection
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, f_classif, chi2
from skfeature.function.similarity_based import fisher_score  # third-party scikit-feature package

# Model selection
from sklearn.model_selection import (StratifiedShuffleSplit, ShuffleSplit, KFold, train_test_split,
                                     cross_val_score, GridSearchCV, learning_curve)
from sklearn.pipeline import make_pipeline
from sklearn.utils import resample

# Classifiers
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier, VotingClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis

# Metrics
from sklearn.metrics import (accuracy_score, log_loss, roc_auc_score, confusion_matrix,
                             recall_score, precision_score, f1_score)

1. Helper functions that will come in handy

In [2]:
# Create a table summarising missing values per column (count and percentage), most-missing first
def draw_missing_data_table(df):
    total = df.isnull().sum().sort_values(ascending=False)
    percent = (df.isnull().sum()/df.isnull().count()).sort_values(ascending=False)*100
    missing_data = pd.concat([total, percent], axis=1, keys=['Total missing Values', '% of Missing Values'])
    return missing_data

# Rank features with the ANOVA F-test (f_classif) and return the top k (10 by default)
def view_best_features(features, labels, number_of_features = 10):
    selector = SelectKBest(score_func=f_classif, k = number_of_features)
    # fit() is enough here because only the selected feature names and scores are needed;
    # use fit_transform() instead if the reduced feature matrix is wanted for a classifier
    selector.fit(features, labels)
    names = features.columns.values[selector.get_support()]
    scores = selector.scores_[selector.get_support()]

    names_scores = list(zip(names, scores))
    ns_df = pd.DataFrame(data = names_scores, columns=['Feature', 'Scores'])
    # Sort the dataframe for better visualization
    ns_df_sorted = ns_df.sort_values('Scores', ascending=False)
    return ns_df_sorted


# Keep only the columns whose fraction of missing values is below `percentage` (e.g. 0.8 = 80%)
def drop_null_columns(data, percentage):
    return data[data.columns[data.isnull().mean() < percentage]]
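The `view_best_features` helper ranks features with a univariate ANOVA F-test. The sketch below re-creates it in a self-contained form and runs it on a small synthetic frame. The column names `signal`, `noise_a` and `noise_b` are hypothetical. `signal` depends on the label, while the other two are pure noise, so `signal` should come out on top.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif


def view_best_features(features, labels, number_of_features=10):
    # Score each column with the ANOVA F statistic and keep the k best
    selector = SelectKBest(score_func=f_classif, k=number_of_features)
    selector.fit(features, labels)
    mask = selector.get_support()
    ranked = pd.DataFrame({"Feature": features.columns[mask],
                           "Scores": selector.scores_[mask]})
    return ranked.sort_values("Scores", ascending=False)


# Synthetic data: 'signal' shifts with the class, the 'noise_*' columns do not
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
features = pd.DataFrame({
    "signal": labels * 2.0 + rng.normal(0, 0.5, size=200),
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})

top = view_best_features(features, labels, number_of_features=2)
print(top)
```

The F-test only measures how well each feature separates the class means on its own. Features that matter only in combination may be ranked low.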

In [3]:
def load_data(file_path):
    return pd.read_excel(file_path) 

# Add a 'label' column marking the class of every record (MCI = 1, ADD = -1, control = 0)
def label_data(data, label):
    data['label'] = label
    return data

# Fix messy column names
def fixing_column_names(data):
    # replace spaces in column names with underscores
    data.columns = data.columns.str.replace(' ', '_', regex=False)
    # remove characters like ( ) : [ ] from the column names;
    # regex=True is required in pandas >= 2.0, where the default became literal matching
    data.columns = data.columns.str.replace(r"[()\[\]:]", '', regex=True)
    # convert to lower case
    data.columns = data.columns.str.lower()
    return data
def combine_two_dataframes(first_dataframe, second_dataframe):
    return pd.concat([first_dataframe, second_dataframe], ignore_index=True, sort=False)

def encode_data(data):
    # Boolean mask of the categorical (text/object) columns
    categorical_feature_mask = data.dtypes == object
    # Filter the categorical columns using the mask and turn them into a list
    categorical_cols = data.columns[categorical_feature_mask].tolist()
    # Import and instantiate the LabelEncoder
    from sklearn.preprocessing import LabelEncoder
    le = LabelEncoder()
    # Apply the encoder to each categorical column (it is re-fitted for every column)
    data[categorical_cols] = data[categorical_cols].apply(lambda col: le.fit_transform(col))
    return data
def scale_training_set(X_train, X_test):
    scaling = MinMaxScaler(feature_range=(-1,1)).fit(X_train)
    X_train = scaling.transform(X_train)
    X_test = scaling.transform(X_test)
    return X_train, X_test
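`scale_training_set` fits the `MinMaxScaler` on the training split only and reuses it for the test split, so no information from the test data leaks into the scaling. Below is a minimal, self-contained check on synthetic pixel-scale values, which stand in for gaze coordinates. After scaling, every training column spans exactly [-1, 1], while test values can fall slightly outside that range.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def scale_training_set(X_train, X_test):
    # Fit on the training split only, then apply the same mapping to the test split
    scaling = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)
    return scaling.transform(X_train), scaling.transform(X_test)


# Synthetic, pixel-scale values standing in for gaze coordinates
rng = np.random.default_rng(45)
X = rng.normal(loc=500, scale=150, size=(200, 3))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    stratify=y, random_state=45)
X_train_scaled, X_test_scaled = scale_training_set(X_train, X_test)
print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```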

1. Loading the three datasets as-is

In [4]:
data_mci = load_data("data/AD_PAGN_GN_C01_00121_MCI.xlsx")
data_controls = load_data("data/AD_PAGN_GN_C01_00141_control.xlsx")
data_add = load_data("data/AD_PAGN_GN_C01_00240_AD.xlsx")


2. Observing the first two records of the MCI data

In [5]:
data_mci.head(2)

Unnamed: 0,Recording timestamp,Project name,Export date,Participant name,Gender,Glasses,Age,Recording name,Recording date,Recording start time,Recording duration,Timeline name,Recording Fixation filter name,Recording software version,Recording resolution height,Recording resolution width,Recording monitor latency,Average calibration accuracy (mm),Average calibration precision SD (mm),Average calibration precision RMS (mm),Average calibration accuracy (degrees),Average calibration precision SD (degrees),Average calibration precision RMS (degrees),Average calibration accuracy (pixels),Average calibration precision SD (pixels),Average calibration precision RMS (pixels),Eyetracker timestamp,Event,Event value,Gaze point X,Gaze point Y,Gaze point left X,Gaze point left Y,Gaze point right X,Gaze point right Y,Gaze direction left X,Gaze direction left Y,Gaze direction left Z,Gaze direction right X,Gaze direction right Y,Gaze direction right Z,Pupil diameter left,Pupil diameter right,Validity left,Validity right,Eye position left X (DACSmm),Eye position left Y (DACSmm),Eye position left Z (DACSmm),Eye position right X (DACSmm),Eye position right Y (DACSmm),Eye position right Z (DACSmm),Gaze point left X (DACSmm),Gaze point left Y (DACSmm),Gaze point right X (DACSmm),Gaze point right Y (DACSmm),Gaze point X (MCSnorm),Gaze point Y (MCSnorm),Gaze point left X (MCSnorm),Gaze point left Y (MCSnorm),Gaze point right X (MCSnorm),Gaze point right Y (MCSnorm),Presented Stimulus name,Presented Media name,Presented Media width,Presented Media height,Presented Media position X (DACSpx),Presented Media position Y (DACSpx),Original Media width,Original Media height,Eye movement type,Gaze event duration,Eye movement type index,Fixation point X,Fixation point Y,Fixation point X (MCSnorm),Fixation point Y (MCSnorm),AOI hit [TP_Lt - TP_Lt_C],AOI hit [TP_Lt - TP_Lt_L],AOI hit [TP_Lt - TP_Lt_R],AOI hit [TP_Rt - TP_Rt_C],AOI hit [TP_Rt - TP_Rt_L],AOI hit [TP_Rt - TP_Rt_R],AOI hit [CueG - CueG_C],AOI hit [CueG - CueG_L],AOI hit [CueG - CueG_R],AOI hit [TG_Rt - TG_Rt_C],AOI hit [TG_Rt - TG_Rt_L],AOI hit [TG_Rt - TG_Rt_R],AOI hit [CueP - CueP_C],AOI hit [CueP - CueP_L],AOI hit [CueP - CueP_R],AOI hit [TN_Rt - TN_Rt_C],AOI hit [TN_Rt - TN_Rt_L],AOI hit [TN_Rt - TN_Rt_R],AOI hit [FixG - FixG],AOI hit [TN_Lt - TN_Lt_C],AOI hit [TN_Lt - TN_Lt_L],AOI hit [TN_Lt - TN_Lt_R],AOI hit [TG_Lt - TG_Lt_C],AOI hit [TG_Lt - TG_Lt_L],AOI hit [TG_Lt - TG_Lt_R],AOI hit [FixN - FixN],AOI hit [CueN - CueN_C],AOI hit [CueN - CueN_L],AOI hit [CueN - CueN_R],AOI hit [TA_Rt - TA_Rt_C],AOI hit [TA_Rt - TA_Rt_L],AOI hit [TA_Rt - TA_Rt_R],AOI hit [CueA - CueA_C],AOI hit [CueA - CueA_L],AOI hit [CueA - CueA_R],AOI hit [FixP - FixP],AOI hit [TA_Lt - TA_Lt_C],AOI hit [TA_Lt - TA_Lt_L],AOI hit [TA_Lt - TA_Lt_R],AOI hit [FixA - FixA],AOI hit [Target:Correct],AOI hit [Target:Uncorrect],Client area position X (DACSpx),Client area position Y (DACSpx),Viewport position X,Viewport position Y,Viewport width,Viewport height,Full page width,Full page height,Mouse position X,Mouse position Y
0,0,AD_PAGN,03/05/2020 09:28:24,C01_121,Female,Yes,≥71,GN_C01_121,07/08/2019 13:22:56,13:22:56.310,323753,GNsession,Raw,1.118.21001,1080,1920,10,4,2.7,2,0.34,0.22,0.16,15,10,7,,RecordingStart,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,
1,82,AD_PAGN,03/05/2020 09:28:24,C01_121,Female,Yes,≥71,GN_C01_121,07/08/2019 13:22:56,13:22:56.310,323753,GNsession,Raw,1.118.21001,1080,1920,10,4,2.7,2,0.34,0.22,0.16,15,10,7,,Eye tracker TTL in,255.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,


3. Cleaning up the column names: replacing spaces with underscores, removing brackets and colons, and converting to lower case

In [6]:
data_mci_fixed_columns = fixing_column_names(data_mci)
data_controls_fixed_columns = fixing_column_names(data_controls)
data_add_fixed_columns = fixing_column_names(data_add)
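To check what the column-name cleaning does, the sketch below applies the same three steps to a few of the raw Tobii-style headers shown in the earlier output. Passing `regex=True` explicitly keeps the bracket-removal step working on pandas 2.0 and later, where `str.replace` matches literally by default. The empty `raw` frame is purely illustrative.

```python
import pandas as pd


def fixing_column_names(data):
    # spaces -> underscores
    data.columns = data.columns.str.replace(' ', '_', regex=False)
    # drop ( ) [ ] : characters; regex=True is needed on pandas >= 2.0
    data.columns = data.columns.str.replace(r"[()\[\]:]", '', regex=True)
    # lower-case everything
    data.columns = data.columns.str.lower()
    return data


# Empty frame carrying a few of the raw export headers
raw = pd.DataFrame(columns=[
    "Recording timestamp",
    "Average calibration accuracy (mm)",
    "AOI hit [TP_Lt - TP_Lt_C]",
    "AOI hit [Target:Correct]",
])
cleaned = fixing_column_names(raw)
print(list(cleaned.columns))
# ['recording_timestamp', 'average_calibration_accuracy_mm', 'aoi_hit_tp_lt_-_tp_lt_c', 'aoi_hit_targetcorrect']
```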

In [7]:
data_mci_fixed_columns.head(5)

Unnamed: 0,recording_timestamp,project_name,export_date,participant_name,gender,glasses,age,recording_name,recording_date,recording_start_time,recording_duration,timeline_name,recording_fixation_filter_name,recording_software_version,recording_resolution_height,recording_resolution_width,recording_monitor_latency,average_calibration_accuracy_mm,average_calibration_precision_sd_mm,average_calibration_precision_rms_mm,average_calibration_accuracy_degrees,average_calibration_precision_sd_degrees,average_calibration_precision_rms_degrees,average_calibration_accuracy_pixels,average_calibration_precision_sd_pixels,average_calibration_precision_rms_pixels,eyetracker_timestamp,event,event_value,gaze_point_x,gaze_point_y,gaze_point_left_x,gaze_point_left_y,gaze_point_right_x,gaze_point_right_y,gaze_direction_left_x,gaze_direction_left_y,gaze_direction_left_z,gaze_direction_right_x,gaze_direction_right_y,gaze_direction_right_z,pupil_diameter_left,pupil_diameter_right,validity_left,validity_right,eye_position_left_x_dacsmm,eye_position_left_y_dacsmm,eye_position_left_z_dacsmm,eye_position_right_x_dacsmm,eye_position_right_y_dacsmm,eye_position_right_z_dacsmm,gaze_point_left_x_dacsmm,gaze_point_left_y_dacsmm,gaze_point_right_x_dacsmm,gaze_point_right_y_dacsmm,gaze_point_x_mcsnorm,gaze_point_y_mcsnorm,gaze_point_left_x_mcsnorm,gaze_point_left_y_mcsnorm,gaze_point_right_x_mcsnorm,gaze_point_right_y_mcsnorm,presented_stimulus_name,presented_media_name,presented_media_width,presented_media_height,presented_media_position_x_dacspx,presented_media_position_y_dacspx,original_media_width,original_media_height,eye_movement_type,gaze_event_duration,eye_movement_type_index,fixation_point_x,fixation_point_y,fixation_point_x_mcsnorm,fixation_point_y_mcsnorm,aoi_hit_tp_lt_-_tp_lt_c,aoi_hit_tp_lt_-_tp_lt_l,aoi_hit_tp_lt_-_tp_lt_r,aoi_hit_tp_rt_-_tp_rt_c,aoi_hit_tp_rt_-_tp_rt_l,aoi_hit_tp_rt_-_tp_rt_r,aoi_hit_cueg_-_cueg_c,aoi_hit_cueg_-_cueg_l,aoi_hit_cueg_-_cueg_r,aoi_hit_tg_rt_-_tg_rt_c,aoi_hit_tg_rt_-_tg_rt_l,aoi_hit_tg_rt_-_tg_rt_r,aoi_hit_cuep_-_cuep_c,aoi_hit_cuep_-_cuep_l,aoi_hit_cuep_-_cuep_r,aoi_hit_tn_rt_-_tn_rt_c,aoi_hit_tn_rt_-_tn_rt_l,aoi_hit_tn_rt_-_tn_rt_r,aoi_hit_fixg_-_fixg,aoi_hit_tn_lt_-_tn_lt_c,aoi_hit_tn_lt_-_tn_lt_l,aoi_hit_tn_lt_-_tn_lt_r,aoi_hit_tg_lt_-_tg_lt_c,aoi_hit_tg_lt_-_tg_lt_l,aoi_hit_tg_lt_-_tg_lt_r,aoi_hit_fixn_-_fixn,aoi_hit_cuen_-_cuen_c,aoi_hit_cuen_-_cuen_l,aoi_hit_cuen_-_cuen_r,aoi_hit_ta_rt_-_ta_rt_c,aoi_hit_ta_rt_-_ta_rt_l,aoi_hit_ta_rt_-_ta_rt_r,aoi_hit_cuea_-_cuea_c,aoi_hit_cuea_-_cuea_l,aoi_hit_cuea_-_cuea_r,aoi_hit_fixp_-_fixp,aoi_hit_ta_lt_-_ta_lt_c,aoi_hit_ta_lt_-_ta_lt_l,aoi_hit_ta_lt_-_ta_lt_r,aoi_hit_fixa_-_fixa,aoi_hit_targetcorrect,aoi_hit_targetuncorrect,client_area_position_x_dacspx,client_area_position_y_dacspx,viewport_position_x,viewport_position_y,viewport_width,viewport_height,full_page_width,full_page_height,mouse_position_x,mouse_position_y
0,0,AD_PAGN,03/05/2020 09:28:24,C01_121,Female,Yes,≥71,GN_C01_121,07/08/2019 13:22:56,13:22:56.310,323753,GNsession,Raw,1.118.21001,1080,1920,10,4,2.7,2,0.34,0.22,0.16,15,10,7,,RecordingStart,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,
1,82,AD_PAGN,03/05/2020 09:28:24,C01_121,Female,Yes,≥71,GN_C01_121,07/08/2019 13:22:56,13:22:56.310,323753,GNsession,Raw,1.118.21001,1080,1920,10,4,2.7,2,0.34,0.22,0.16,15,10,7,,Eye tracker TTL in,255.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,
2,170,AD_PAGN,03/05/2020 09:28:24,C01_121,Female,Yes,≥71,GN_C01_121,07/08/2019 13:22:56,13:22:56.310,323753,GNsession,Raw,1.118.21001,1080,1920,10,4,2.7,2,0.34,0.22,0.16,15,10,7,1225511000.0,,,860.0,515.0,847.0,494.0,872.0,537.0,0.01611,0.01402,-0.99977,-0.06486,0.03449,-0.9973,3.91,3.48,Valid,Valid,222.7,126.8,636.1,281.5,125.5,641.7,232.9,135.7,239.8,147.7,,,,,,,,,,,,,,,Fixation,3.0,1.0,860.0,515.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,
3,173,AD_PAGN,03/05/2020 09:28:24,C01_121,Female,Yes,≥71,GN_C01_121,07/08/2019 13:22:56,13:22:56.310,323753,GNsession,Raw,1.118.21001,1080,1920,10,4,2.7,2,0.34,0.22,0.16,15,10,7,1225514000.0,,,,,,,,,,,,,,,,,Invalid,Invalid,,,,,,,,,,,,,,,,,,,,,,,,,EyesNotFound,3.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,
4,177,AD_PAGN,03/05/2020 09:28:24,C01_121,Female,Yes,≥71,GN_C01_121,07/08/2019 13:22:56,13:22:56.310,323753,GNsession,Raw,1.118.21001,1080,1920,10,4,2.7,2,0.34,0.22,0.16,15,10,7,1225517000.0,,,,,,,,,,,,,,,,,Invalid,Invalid,,,,,,,,,,,,,,,,,,,,,,,,,EyesNotFound,3.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,


4. Label the data for the subsequent development of the classification algorithms: patients with MCI are marked 1, those with ADD -1, and the controls (neither MCI nor ADD) 0.

In [8]:
data_mci_labelled = label_data(data_mci_fixed_columns, 1)
data_controls_labelled = label_data(data_controls_fixed_columns, 0)
data_add_labelled = label_data(data_add_fixed_columns, -1)
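With three labels in the combined dataset (MCI = 1, controls = 0, ADD = -1), the binary ROC AUC from step 5 no longer applies directly. The values -1, 0 and 1 are only class identifiers, not an ordering. One option is scikit-learn's one-vs-rest averaging via `roc_auc_score(..., multi_class="ovr")`, which needs per-class probabilities. The columns of `predict_proba` follow `clf.classes_`, i.e. the sorted labels [-1, 0, 1]. This is a sketch on synthetic data, and the mapping of the synthetic classes onto the notebook's labels is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the MCI + ADD + controls dataset
X, y_raw = make_classification(n_samples=600, n_features=10, n_informative=5,
                               n_classes=3, random_state=45)
# Hypothetical mapping of synthetic classes 0, 1, 2 onto the labels MCI (1), control (0), ADD (-1)
label_map = np.array([1, 0, -1])
y = label_map[y_raw]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    stratify=y, random_state=45)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba columns are ordered like clf.classes_, i.e. the sorted labels [-1, 0, 1]
proba = clf.predict_proba(X_test)
auc_ovr = roc_auc_score(y_test, proba, multi_class="ovr")
print(clf.classes_, round(auc_ovr, 3))
```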

In [9]:
data_mci_labelled.describe()

Unnamed: 0,recording_timestamp,recording_duration,recording_resolution_height,recording_resolution_width,recording_monitor_latency,average_calibration_accuracy_mm,average_calibration_precision_sd_mm,average_calibration_precision_rms_mm,average_calibration_accuracy_degrees,average_calibration_precision_sd_degrees,average_calibration_precision_rms_degrees,average_calibration_accuracy_pixels,average_calibration_precision_sd_pixels,average_calibration_precision_rms_pixels,eyetracker_timestamp,gaze_point_x,gaze_point_y,gaze_point_left_x,gaze_point_left_y,gaze_point_right_x,gaze_point_right_y,gaze_direction_left_x,gaze_direction_left_y,gaze_direction_left_z,gaze_direction_right_x,gaze_direction_right_y,gaze_direction_right_z,pupil_diameter_left,pupil_diameter_right,eye_position_left_x_dacsmm,eye_position_left_y_dacsmm,eye_position_left_z_dacsmm,eye_position_right_x_dacsmm,eye_position_right_y_dacsmm,eye_position_right_z_dacsmm,gaze_point_left_x_dacsmm,gaze_point_left_y_dacsmm,gaze_point_right_x_dacsmm,gaze_point_right_y_dacsmm,gaze_point_x_mcsnorm,gaze_point_y_mcsnorm,gaze_point_left_x_mcsnorm,gaze_point_left_y_mcsnorm,gaze_point_right_x_mcsnorm,gaze_point_right_y_mcsnorm,presented_media_width,presented_media_height,presented_media_position_x_dacspx,presented_media_position_y_dacspx,original_media_width,original_media_height,gaze_event_duration,eye_movement_type_index,fixation_point_x,fixation_point_y,fixation_point_x_mcsnorm,fixation_point_y_mcsnorm,aoi_hit_tp_lt_-_tp_lt_c,aoi_hit_tp_lt_-_tp_lt_l,aoi_hit_tp_lt_-_tp_lt_r,aoi_hit_tp_rt_-_tp_rt_c,aoi_hit_tp_rt_-_tp_rt_l,aoi_hit_tp_rt_-_tp_rt_r,aoi_hit_cueg_-_cueg_c,aoi_hit_cueg_-_cueg_l,aoi_hit_cueg_-_cueg_r,aoi_hit_tg_rt_-_tg_rt_c,aoi_hit_tg_rt_-_tg_rt_l,aoi_hit_tg_rt_-_tg_rt_r,aoi_hit_cuep_-_cuep_c,aoi_hit_cuep_-_cuep_l,aoi_hit_cuep_-_cuep_r,aoi_hit_tn_rt_-_tn_rt_c,aoi_hit_tn_rt_-_tn_rt_l,aoi_hit_tn_rt_-_tn_rt_r,aoi_hit_fixg_-_fixg,aoi_hit_tn_lt_-_tn_lt_c,aoi_hit_tn_lt_-_tn_lt_l,aoi_hit_tn_lt_-_tn_lt_r,aoi_hit_tg_lt_-_tg_lt_c,aoi_hit_tg_lt_-_tg_lt_l,aoi_hit_tg_lt_-_tg_lt_r,aoi_hit_fixn_-_fixn,aoi_hit_cuen_-_cuen_c,aoi_hit_cuen_-_cuen_l,aoi_hit_cuen_-_cuen_r,aoi_hit_ta_rt_-_ta_rt_c,aoi_hit_ta_rt_-_ta_rt_l,aoi_hit_ta_rt_-_ta_rt_r,aoi_hit_cuea_-_cuea_c,aoi_hit_cuea_-_cuea_l,aoi_hit_cuea_-_cuea_r,aoi_hit_fixp_-_fixp,aoi_hit_ta_lt_-_ta_lt_c,aoi_hit_ta_lt_-_ta_lt_l,aoi_hit_ta_lt_-_ta_lt_r,aoi_hit_fixa_-_fixa,aoi_hit_targetcorrect,aoi_hit_targetuncorrect,client_area_position_x_dacspx,client_area_position_y_dacspx,viewport_position_x,viewport_position_y,viewport_width,viewport_height,full_page_width,full_page_height,mouse_position_x,mouse_position_y,label
count,97889.0,97889.0,97889.0,97889.0,97889.0,97889.0,97889.0,97889.0,97889.0,97889.0,97889.0,97889.0,97889.0,97889.0,96971.0,91606.0,91606.0,91527.0,91527.0,91252.0,91252.0,91527.0,91527.0,91527.0,91252.0,91252.0,91252.0,91527.0,91252.0,91527.0,91527.0,91527.0,91252.0,91252.0,91252.0,91527.0,91527.0,91252.0,91252.0,79531.0,79531.0,79493.0,79493.0,79233.0,79233.0,84391.0,84391.0,84391.0,84391.0,84391.0,84391.0,97885.0,97885.0,92466.0,92466.0,80250.0,80250.0,0.0,0.0,0.0,0.0,0.0,0.0,14539.0,14539.0,14539.0,9954.0,9954.0,9954.0,0.0,0.0,0.0,7239.0,7239.0,7239.0,9148.0,6341.0,6341.0,6341.0,17194.0,17194.0,17194.0,4574.0,7277.0,7277.0,7277.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,97889.0,97889.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,184.0,184.0,97889.0
mean,161905.446026,323753.0,1080.0,1920.0,10.0,4.0,2.7,2.0,0.34,0.22,0.16,15.0,10.0,7.0,1387310000.0,919.683263,532.670524,916.759557,533.107793,924.468472,532.836562,0.044621,0.031374,-0.989673,-0.042487,0.032156,-0.990599,3.341274,2.876273,223.317975,126.829297,635.670062,281.885081,125.803387,642.045923,252.108835,146.604544,254.228991,146.530273,0.478337,0.50177,0.479053,0.505456,0.477654,0.496624,1920.0,1080.0,0.0,0.0,1981.0,1114.0,3.002809,43407.067467,919.732583,532.833279,0.478403,0.50172,,,,,,,0.78114,0.029094,0.022629,0.171288,0.030942,0.577758,,,,0.775107,0.0,0.161901,0.750219,0.785838,0.086422,0.0,0.193905,0.580144,0.007677,0.734587,0.865879,0.0,0.0,,,,,,,,,,,,0.268896,0.073542,,,,,,,,,-392.994565,877.668478,1.0
std,93365.948931,0.0,0.0,0.0,0.0,0.0,4.849035e-12,0.0,7.810459e-14,6.103482e-14,1.218198e-13,0.0,0.0,0.0,93324470.0,281.388489,143.797968,288.881001,148.387347,273.705874,139.9199,0.117256,0.058378,0.020386,0.111738,0.055775,0.016666,0.145892,0.142751,0.906945,0.440808,0.545161,0.880964,0.654056,1.352499,79.442116,40.8066,75.269151,38.477749,0.111748,0.032828,0.110769,0.024671,0.113483,0.042726,0.0,0.0,0.0,0.0,0.0,0.0,0.058956,27538.665809,280.518061,143.41399,0.111494,0.032866,,,,,,,0.413487,0.168076,0.148722,0.376779,0.17317,0.493942,,,,0.417541,0.0,0.368385,0.43291,0.410272,0.281008,0.0,0.395367,0.493549,0.087285,0.441601,0.340806,0.0,0.0,,,,,,,,,,,,0.443388,0.261026,,,,,,,,,167.356251,75.988667,0.0
min,0.0,323753.0,1080.0,1920.0,10.0,4.0,2.7,2.0,0.34,0.22,0.16,15.0,10.0,7.0,1225511000.0,-335.0,-564.0,-353.0,-584.0,53.0,-381.0,-0.41465,-0.37879,-0.99998,-0.37032,-0.31718,-0.99998,1.62,1.22,217.7,121.2,620.4,276.4,109.6,627.4,-97.0,-160.6,14.7,-104.9,0.1,0.3331,0.1,0.32,0.1968,0.3418,1920.0,1080.0,0.0,0.0,1981.0,1114.0,3.0,1.0,-335.0,-564.0,0.1,0.3331,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,0.0,0.0,,,,,,,,,-742.0,643.0,1.0
25%,81051.0,323753.0,1080.0,1920.0,10.0,4.0,2.7,2.0,0.34,0.22,0.16,15.0,10.0,7.0,1306509000.0,788.0,524.0,796.0,532.0,789.0,515.0,-0.006055,0.03027,-0.99767,-0.10007,0.02436,-0.99928,3.25,2.79,222.9,126.7,635.4,281.4,125.6,641.6,218.8,146.2,216.9,141.6,0.4317,0.487,0.4393,0.4939,0.4295,0.4788,1920.0,1080.0,0.0,0.0,1981.0,1114.0,3.0,18854.0,790.0,524.0,0.4331,0.4869,,,,,,,1.0,0.0,0.0,0.0,0.0,0.0,,,,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,,,,,,,,,,,,0.0,0.0,,,,,,,,,-493.0,821.75,1.0
50%,161904.0,323753.0,1080.0,1920.0,10.0,4.0,2.7,2.0,0.34,0.22,0.16,15.0,10.0,7.0,1387316000.0,956.0,536.0,953.0,543.0,959.0,529.0,0.06048,0.03512,-0.99688,-0.02871,0.0305,-0.99873,3.34,2.87,223.5,126.9,635.6,282.1,125.9,642.3,262.1,149.3,263.6,145.4,0.4978,0.4964,0.4965,0.5029,0.4991,0.4901,1920.0,1080.0,0.0,0.0,1981.0,1114.0,3.0,43111.0,956.0,536.0,0.4978,0.4963,,,,,,,1.0,0.0,0.0,0.0,0.0,1.0,,,,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,,,,,,,,,,,,0.0,0.0,,,,,,,,,-330.0,866.0,1.0
75%,242760.0,323753.0,1080.0,1920.0,10.0,4.0,2.7,2.0,0.34,0.22,0.16,15.0,10.0,7.0,1468127000.0,989.0,558.0,985.0,560.0,996.0,553.0,0.07405,0.04225,-0.993815,-0.01303,0.04016,-0.986417,3.42,2.95,223.8,127.0,635.9,282.3,126.1,642.7,270.9,154.0,273.8,152.0,0.5128,0.5134,0.5108,0.5164,0.515,0.5088,1920.0,1080.0,0.0,0.0,1981.0,1114.0,3.0,67360.0,989.0,558.0,0.5128,0.5133,,,,,,,1.0,0.0,0.0,0.0,0.0,1.0,,,,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,,,,,,,,,,,,1.0,0.0,,,,,,,,,-297.0,934.25,1.0
max,323753.0,323753.0,1080.0,1920.0,10.0,4.0,2.7,2.0,0.34,0.22,0.16,15.0,10.0,7.0,1548936000.0,2371.0,1495.0,1893.0,1111.0,2697.0,1495.0,0.41465,0.26403,-0.83001,0.55742,0.40645,-0.8271,3.91,3.63,226.2,132.6,647.7,284.7,130.7,686.8,520.5,305.6,741.6,411.1,0.8409,0.9482,0.8186,0.905,0.8409,0.9919,1920.0,1080.0,0.0,0.0,1981.0,1114.0,5.0,91606.0,2371.0,1495.0,0.8409,0.9482,,,,,,,1.0,1.0,1.0,1.0,1.0,1.0,,,,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,,,,,,,,,,,,1.0,1.0,,,,,,,,,-96.0,1003.0,1.0


# Understanding the Dataset

Establishing the extent of missing values in the datasets. As a start, features with 100% missing values can be dropped.

1. Dealing with the Mild Cognitive Impairment (MCI) data
A number of features in the data have 100% missing values. These, together with any other feature that is 80% or more empty, are dropped further below.

In [10]:
draw_missing_data_table(data_mci_labelled)

Unnamed: 0,Total missing Values,% of Missing Values
aoi_hit_tp_rt_-_tp_rt_l,97889,100.0
aoi_hit_cuep_-_cuep_r,97889,100.0
aoi_hit_cuep_-_cuep_c,97889,100.0
aoi_hit_tp_rt_-_tp_rt_r,97889,100.0
aoi_hit_tp_rt_-_tp_rt_c,97889,100.0
aoi_hit_ta_rt_-_ta_rt_c,97889,100.0
aoi_hit_ta_rt_-_ta_rt_l,97889,100.0
aoi_hit_ta_rt_-_ta_rt_r,97889,100.0
aoi_hit_cuea_-_cuea_c,97889,100.0
aoi_hit_cuea_-_cuea_l,97889,100.0


2. View the missing values of the controls dataset

In [11]:
draw_missing_data_table(data_controls_labelled)

Unnamed: 0,Total missing Values,% of Missing Values
aoi_hit_cuea_-_cuea_c,96554,100.0
aoi_hit_ta_lt_-_ta_lt_r,96554,100.0
aoi_hit_ta_rt_-_ta_rt_l,96554,100.0
aoi_hit_tp_lt_-_tp_lt_r,96554,100.0
aoi_hit_tp_lt_-_tp_lt_l,96554,100.0
aoi_hit_tp_lt_-_tp_lt_c,96554,100.0
aoi_hit_cuea_-_cuea_r,96554,100.0
aoi_hit_cuea_-_cuea_l,96554,100.0
aoi_hit_ta_rt_-_ta_rt_r,96554,100.0
aoi_hit_ta_lt_-_ta_lt_l,96554,100.0


3. View the missing values of the ADD dataset

In [12]:
draw_missing_data_table(data_add_labelled)

Unnamed: 0,Total missing Values,% of Missing Values
aoi_hit_cuea_-_cuea_c,94734,100.0
aoi_hit_tp_lt_-_tp_lt_c,94734,100.0
aoi_hit_tp_lt_-_tp_lt_r,94734,100.0
aoi_hit_cuea_-_cuea_r,94734,100.0
aoi_hit_cuea_-_cuea_l,94734,100.0
aoi_hit_ta_lt_-_ta_lt_r,94734,100.0
aoi_hit_ta_lt_-_ta_lt_l,94734,100.0
aoi_hit_ta_lt_-_ta_lt_c,94734,100.0
aoi_hit_tp_rt_-_tp_rt_r,94734,100.0
aoi_hit_fixa_-_fixa,94734,100.0


In [13]:
# Drop the columns in which 80% or more of the values are missing (this includes the 100%-missing columns above)
data_mci_with_minimal_missing_values = drop_null_columns(data_mci_labelled,0.8)
data_controls_with_minimal_missing_values = drop_null_columns(data_controls_labelled,0.8)
data_add_with_minimal_missing_values = drop_null_columns(data_add_labelled,0.8)
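The threshold semantics are easy to get wrong, so here is a small check on a made-up frame. `drop_null_columns(data, 0.8)` keeps a column only if its missing fraction is strictly below 0.8. A column that is exactly 80% empty is therefore dropped along with the fully empty ones. The toy column names are hypothetical.

```python
import numpy as np
import pandas as pd


def draw_missing_data_table(df):
    # Count and percentage of missing values per column, most-missing first
    total = df.isnull().sum().sort_values(ascending=False)
    percent = (df.isnull().sum() / df.isnull().count()).sort_values(ascending=False) * 100
    return pd.concat([total, percent], axis=1, keys=['Total missing Values', '% of Missing Values'])


def drop_null_columns(data, percentage):
    # Keep a column only if its fraction of missing values is strictly below `percentage`
    return data[data.columns[data.isnull().mean() < percentage]]


toy = pd.DataFrame({
    "always_present": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mostly_present": [1.0, np.nan, 3.0, 4.0, 5.0],                 # 20% missing
    "exactly_80_percent": [1.0, np.nan, np.nan, np.nan, np.nan],    # 80% missing
    "all_missing": [np.nan] * 5,                                    # 100% missing
})
table = draw_missing_data_table(toy)
print(table)
kept = drop_null_columns(toy, 0.8)
print(list(kept.columns))  # ['always_present', 'mostly_present']
```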

In [14]:
draw_missing_data_table(data_mci_with_minimal_missing_values)

Unnamed: 0,Total missing Values,% of Missing Values
gaze_point_right_x_mcsnorm,18656,19.058321
gaze_point_right_y_mcsnorm,18656,19.058321
gaze_point_left_x_mcsnorm,18396,18.792714
gaze_point_left_y_mcsnorm,18396,18.792714
gaze_point_x_mcsnorm,18358,18.753895
gaze_point_y_mcsnorm,18358,18.753895
fixation_point_y_mcsnorm,17639,18.019389
fixation_point_x_mcsnorm,17639,18.019389
presented_media_position_y_dacspx,13498,13.789088
presented_media_width,13498,13.789088


Let's create the datasets that will be used to develop the classification models. We'll create datasets comprising:
1. MCI and controls, labelled 1 and 0 respectively
2. ADD and controls, labelled -1 and 0 respectively
3. MCI, ADD and controls, labelled 1, -1 and 0 respectively

In [15]:
mci_controls_data = combine_two_dataframes(data_mci_with_minimal_missing_values, data_controls_with_minimal_missing_values)
add_controls_data = combine_two_dataframes(data_add_with_minimal_missing_values, data_controls_with_minimal_missing_values)
mci_add_controls_data = combine_two_dataframes(mci_controls_data, data_add_with_minimal_missing_values)
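Step 2c of the plan asks how many MCI, ADD and control records each dataset holds. The sketch below shows the labelling and concatenation pattern on tiny made-up frames, then counts records per label with `value_counts()`. On the real data these counts are gaze samples, not people. `mci_add_controls_data.groupby('label')['participant_name'].nunique()` would count participants per class instead; the toy frames omit that column.

```python
import pandas as pd


def label_data(data, label):
    data['label'] = label
    return data


def combine_two_dataframes(first_dataframe, second_dataframe):
    return pd.concat([first_dataframe, second_dataframe], ignore_index=True, sort=False)


# Tiny synthetic stand-ins for the three Excel exports (values are made up)
mci = label_data(pd.DataFrame({"pupil_diameter_left": [3.3, 3.4, 3.2]}), 1)
controls = label_data(pd.DataFrame({"pupil_diameter_left": [3.1, 3.0]}), 0)
add = label_data(pd.DataFrame({"pupil_diameter_left": [2.9, 3.5, 3.6, 3.0]}), -1)

mci_controls = combine_two_dataframes(mci, controls)
mci_add_controls = combine_two_dataframes(mci_controls, add)

# Composition check: overall shape and records per class
print(mci_add_controls.shape)  # (9, 2)
print(mci_add_controls['label'].value_counts().sort_index())
```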

# Data Preprocessing

1. Encoding the features (performed after feature selection, below)

# Feature Selection

Now that we have the data, we need to:
1. Select the best-performing features
2. Perform cross-validation
3. Build, train and test our models on the data
4. Choose the best-performing model

In [16]:
mci_controls_data.tail(4)

Unnamed: 0,recording_timestamp,project_name,export_date,participant_name,gender,glasses,age,recording_name,recording_date,recording_start_time,recording_duration,timeline_name,recording_fixation_filter_name,recording_software_version,recording_resolution_height,recording_resolution_width,recording_monitor_latency,average_calibration_accuracy_mm,average_calibration_precision_sd_mm,average_calibration_precision_rms_mm,average_calibration_accuracy_degrees,average_calibration_precision_sd_degrees,average_calibration_precision_rms_degrees,average_calibration_accuracy_pixels,average_calibration_precision_sd_pixels,average_calibration_precision_rms_pixels,eyetracker_timestamp,gaze_point_x,gaze_point_y,gaze_point_left_x,gaze_point_left_y,gaze_point_right_x,gaze_point_right_y,gaze_direction_left_x,gaze_direction_left_y,gaze_direction_left_z,gaze_direction_right_x,gaze_direction_right_y,gaze_direction_right_z,pupil_diameter_left,pupil_diameter_right,validity_left,validity_right,eye_position_left_x_dacsmm,eye_position_left_y_dacsmm,eye_position_left_z_dacsmm,eye_position_right_x_dacsmm,eye_position_right_y_dacsmm,eye_position_right_z_dacsmm,gaze_point_left_x_dacsmm,gaze_point_left_y_dacsmm,gaze_point_right_x_dacsmm,gaze_point_right_y_dacsmm,gaze_point_x_mcsnorm,gaze_point_y_mcsnorm,gaze_point_left_x_mcsnorm,gaze_point_left_y_mcsnorm,gaze_point_right_x_mcsnorm,gaze_point_right_y_mcsnorm,presented_stimulus_name,presented_media_name,presented_media_width,presented_media_height,presented_media_position_x_dacspx,presented_media_position_y_dacspx,original_media_width,original_media_height,eye_movement_type,gaze_event_duration,eye_movement_type_index,fixation_point_x,fixation_point_y,fixation_point_x_mcsnorm,fixation_point_y_mcsnorm,aoi_hit_targetcorrect,aoi_hit_targetuncorrect,label
194439,319140,AD_PAGN,03/11/2020 19:11:03,C01_00141,Female,No,≥70,GN_C01_00141,01/28/2020 15:38:16,15:38:16.120,319321,GNsession,Raw,1.118.21001,1080,1920,10,0.4,1.3,1.1,0.04,0.11,0.1,2,5,4,8827743000.0,,,,,,,,,,,,,,,Invalid,Invalid,,,,,,,,,,,,,,,,,GoodBye,GoodBye.BMP,1920.0,1080.0,0.0,0.0,1920.0,1080.0,EyesNotFound,3.0,840.0,,,,,0,0,0
194440,319143,AD_PAGN,03/11/2020 19:11:03,C01_00141,Female,No,≥70,GN_C01_00141,01/28/2020 15:38:16,15:38:16.120,319321,GNsession,Raw,1.118.21001,1080,1920,10,0.4,1.3,1.1,0.04,0.11,0.1,2,5,4,8827746000.0,,,,,,,,,,,,,,,Invalid,Invalid,,,,,,,,,,,,,,,,,GoodBye,GoodBye.BMP,1920.0,1080.0,0.0,0.0,1920.0,1080.0,EyesNotFound,3.0,841.0,,,,,0,0,0
194441,319160,AD_PAGN,03/11/2020 19:11:03,C01_00141,Female,No,≥70,GN_C01_00141,01/28/2020 15:38:16,15:38:16.120,319321,GNsession,Raw,1.118.21001,1080,1920,10,0.4,1.3,1.1,0.04,0.11,0.1,2,5,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0
194442,319321,AD_PAGN,03/11/2020 19:11:03,C01_00141,Female,No,≥70,GN_C01_00141,01/28/2020 15:38:16,15:38:16.120,319321,GNsession,Raw,1.118.21001,1080,1920,10,0.4,1.3,1.1,0.04,0.11,0.1,2,5,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0


In [17]:
mci_controls_data.columns

Index(['recording_timestamp', 'project_name', 'export_date', 'participant_name', 'gender', 'glasses', 'age', 'recording_name', 'recording_date', 'recording_start_time', 'recording_duration', 'timeline_name', 'recording_fixation_filter_name', 'recording_software_version', 'recording_resolution_height', 'recording_resolution_width', 'recording_monitor_latency', 'average_calibration_accuracy_mm', 'average_calibration_precision_sd_mm', 'average_calibration_precision_rms_mm', 'average_calibration_accuracy_degrees', 'average_calibration_precision_sd_degrees', 'average_calibration_precision_rms_degrees', 'average_calibration_accuracy_pixels', 'average_calibration_precision_sd_pixels', 'average_calibration_precision_rms_pixels', 'eyetracker_timestamp', 'gaze_point_x', 'gaze_point_y', 'gaze_point_left_x', 'gaze_point_left_y', 'gaze_point_right_x', 'gaze_point_right_y', 'gaze_direction_left_x', 'gaze_direction_left_y', 'gaze_direction_left_z', 'gaze_direction_right_x', 'gaze_direction_right_y',


We have considered the features below for building the model (plus the target column, 'label'):
'eyetracker_timestamp', 'gaze_point_x', 'gaze_point_y', 'gaze_point_left_x', 'gaze_point_left_y', 'gaze_point_right_x', 'gaze_point_right_y', 'gaze_direction_left_x', 'gaze_direction_left_y', 'gaze_direction_left_z', 'gaze_direction_right_x', 'gaze_direction_right_y', 'gaze_direction_right_z', 'pupil_diameter_left', 'pupil_diameter_right', 'validity_left', 'validity_right', 'fixation_point_x', 'fixation_point_y', 'label'

Based on inspection, the features that describe the configuration of the application recording the data (e.g. recording_software_version) were dropped.
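The configuration features above were identified by inspection. A possible programmatic check is sketched below, assuming that a column holding a single value within every class file describes the recording session rather than eye behaviour. If such a value differs between the class files, a model could separate the classes from it alone. `configuration_like_columns` is a hypothetical helper, and the toy values mirror the calibration accuracy shown in the outputs above (4 for the MCI file, 0.4 for the controls file).

```python
import pandas as pd


def configuration_like_columns(data, label_column="label"):
    """Hypothetical helper: list columns that hold one value within every class."""
    # Number of distinct values (NaN counted as a value) per column, per class
    per_class_unique = data.groupby(label_column).nunique(dropna=False)
    constant_within_class = (per_class_unique <= 1).all()
    return constant_within_class[constant_within_class].index.tolist()


toy = pd.DataFrame({
    "recording_software_version": ["1.118.21001"] * 4,
    "average_calibration_accuracy_mm": [4.0, 4.0, 0.4, 0.4],
    "pupil_diameter_left": [3.3, 3.4, 3.1, 2.9],
    "label": [1, 1, 0, 0],
})
print(configuration_like_columns(toy))
# ['recording_software_version', 'average_calibration_accuracy_mm']
```

This check will not flag columns that vary inside a file but still identify it. For example, the outputs above suggest `eyetracker_timestamp` lies roughly between 1.2e9 and 1.5e9 in the MCI file but around 8.8e9 in the controls file. It may therefore behave more like a recording identifier than a behavioural feature.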

In [18]:
mci_controls_data_selected_features = mci_controls_data[[
    'eyetracker_timestamp', 'gaze_point_x', 'gaze_point_y',
    'gaze_point_left_x', 'gaze_point_left_y', 'gaze_point_right_x', 'gaze_point_right_y',
    'gaze_direction_left_x', 'gaze_direction_left_y', 'gaze_direction_left_z',
    'gaze_direction_right_x', 'gaze_direction_right_y', 'gaze_direction_right_z',
    'pupil_diameter_left', 'pupil_diameter_right', 'validity_left', 'validity_right',
    'fixation_point_x', 'fixation_point_y', 'label'
]]

In [19]:
add_controls_data_selected_features = add_controls_data[[
    'eyetracker_timestamp', 'gaze_point_x', 'gaze_point_y',
    'gaze_point_left_x', 'gaze_point_left_y', 'gaze_point_right_x', 'gaze_point_right_y',
    'gaze_direction_left_x', 'gaze_direction_left_y', 'gaze_direction_left_z',
    'gaze_direction_right_x', 'gaze_direction_right_y', 'gaze_direction_right_z',
    'pupil_diameter_left', 'pupil_diameter_right', 'validity_left', 'validity_right',
    'fixation_point_x', 'fixation_point_y', 'label'
]]

In [20]:
mci_add_controls_data_selected_features= mci_add_controls_data[['eyetracker_timestamp', 'gaze_point_x', 'gaze_point_y', 'gaze_point_left_x', 'gaze_point_left_y', 'gaze_point_right_x', 
                                                             'gaze_point_right_y', 'gaze_direction_left_x', 'gaze_direction_left_y', 'gaze_direction_left_z', 'gaze_direction_right_x', 'gaze_direction_right_y',
       'gaze_direction_right_z', 'pupil_diameter_left', 'pupil_diameter_right', 'validity_left', 'validity_right',  'fixation_point_x', 'fixation_point_y', 'label'
                                                             ]]

In [21]:
# Remove every record (row) that has a missing value in any of the selected columns
mci_controls_data_selected = mci_controls_data_selected_features.dropna()
add_controls_data_selected = add_controls_data_selected_features.dropna()
mci_add_controls_data_selected = mci_add_controls_data_selected_features.dropna()
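With its default arguments (`how='any'`), `DataFrame.dropna()` removes every row that has at least one missing value in any column. A small illustration on a made-up frame (the column names mirror the real ones; the values are invented), also showing the `subset` option that only checks the listed columns:

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; only the column names come from the real data
toy = pd.DataFrame({
    "pupil_diameter_left": [3.1, np.nan, 2.9, 3.0],
    "fixation_point_x": [512.0, 498.0, np.nan, 505.0],
    "label": ["MCI", "Control", "MCI", "Control"],
})

# Default dropna(): drop any row that contains at least one NaN
complete_rows = toy.dropna()

# Alternative: only require the listed columns to be present
pupil_rows = toy.dropna(subset=["pupil_diameter_left"])

print(len(toy), len(complete_rows), len(pupil_rows))  # 4 2 3
```

Comparing `len()` before and after `dropna()` on the real datasets shows how many records were lost; sparsely populated columns can make the default behaviour discard a large share of rows.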

Let's encode the data: convert any text-based values into numbers so that the ML algorithms can use them.

In [22]:
mci_controls_encoded = encode_data(mci_controls_data_selected)
add_controls_encoded = encode_data(add_controls_data_selected)
mci_add_controls_encoded = encode_data(mci_add_controls_data_selected)
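`encode_data` is defined earlier in the notebook and is not shown in this section. A minimal sketch of what such a helper could look like, assuming it label-encodes every non-numeric column with scikit-learn's `LabelEncoder` (the helper body and the toy values below are illustrative, not the notebook's own code):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: label-encode every non-numeric column of df.

    Numeric columns are left untouched; a copy is returned so the input
    DataFrame is not modified.
    """
    encoded = df.copy()
    for column in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[column]):
            # Map each distinct text value to an integer 0..n-1 (alphabetical order)
            encoded[column] = LabelEncoder().fit_transform(encoded[column].astype(str))
    return encoded

# Invented example values
toy = pd.DataFrame({
    "validity_left": ["Valid", "Invalid", "Valid"],
    "pupil_diameter_left": [3.1, 2.8, 3.0],
    "label": ["MCI", "Control", "MCI"],
})
print(encode_data(toy))
```

Because `LabelEncoder` sorts the distinct values, "Control" becomes 0 and "MCI" becomes 1 here; keeping the fitted encoder (its `classes_` attribute) is what allows numeric predictions to be mapped back to class names later.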

Separate the features (independent variables) from the label (dependent/target variable).

In [23]:
#Dataset with MCI + Controls
mci_controls_features = mci_controls_encoded.drop(['label'], axis=1)
mci_controls_label = mci_controls_encoded['label']

#ADD + Controls
add_controls_features = add_controls_encoded.drop(['label'], axis=1)
add_controls_label = add_controls_encoded['label']

#MCI+ADD+Controls
mci_add_controls_features = mci_add_controls_encoded.drop(['label'], axis=1)
mci_add_controls_label = mci_add_controls_encoded['label']


In [24]:
mci_controls_features.columns

Index(['eyetracker_timestamp', 'gaze_point_x', 'gaze_point_y', 'gaze_point_left_x', 'gaze_point_left_y', 'gaze_point_right_x', 'gaze_point_right_y', 'gaze_direction_left_x', 'gaze_direction_left_y', 'gaze_direction_left_z', 'gaze_direction_right_x', 'gaze_direction_right_y', 'gaze_direction_right_z', 'pupil_diameter_left', 'pupil_diameter_right', 'validity_left', 'validity_right', 'fixation_point_x', 'fixation_point_y'], dtype='object')

# Feature Selection

Selecting the Right Features/Variables for the Model Training

I used a univariate measure, specifically the Analysis of Variance (ANOVA) F-test, to select variables based on their level of association with the target (MCI, ADD or Control).
The scikit-learn class SelectPercentile provides an automatic procedure
for keeping only a given percentage of the best-associated features.

The features, ranked by relevance to the target from most relevant (eyetracker_timestamp) to least relevant (fixation_point_x), are: eyetracker_timestamp, pupil_diameter_left, pupil_diameter_right, gaze_direction_right_y, gaze_direction_left_y, gaze_direction_right_z, gaze_direction_left_z, gaze_point_right_y, gaze_point_y, fixation_point_y, gaze_point_left_y, gaze_point_right_x, gaze_direction_right_x, gaze_point_x and fixation_point_x.
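For reference, the one-way ANOVA F-statistic (the score that scikit-learn's `f_classif`, imported at the top of the notebook, computes for each feature) compares the spread of the class means with the spread inside each class. With $K$ classes, $n_k$ samples in class $k$, $N$ samples in total, class mean $\bar{x}_k$ and overall mean $\bar{x}$:

$$
F = \frac{\displaystyle \sum_{k=1}^{K} n_k \left(\bar{x}_k - \bar{x}\right)^2 \Big/ (K-1)}{\displaystyle \sum_{k=1}^{K} \sum_{i \in k} \left(x_i - \bar{x}_k\right)^2 \Big/ (N-K)}
$$

A larger $F$ means the class means of that feature lie further apart relative to the within-class noise. The test looks at one feature at a time and only at differences in means, so it does not capture interactions between features.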

In [25]:
# Start by picking the 10 top-performing features; they are displayed in the next cells
mci_controls_best_features = view_best_features(mci_controls_features,mci_controls_label,10)
add_controls_best_features = view_best_features(add_controls_features,add_controls_label,10)
mci_add_controls_best_features = view_best_features(mci_add_controls_features,mci_add_controls_label,10)
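`view_best_features` is also defined outside this section. A minimal sketch of a helper with the same call shape, assuming it ranks features with `SelectKBest` and the ANOVA F-test and returns a `Feature`/`Scores` table like the one shown in the next cell (the implementation and the synthetic data are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

def view_best_features(features: pd.DataFrame, labels, k: int) -> pd.DataFrame:
    """Hypothetical re-implementation: rank the top-k features by ANOVA F-score."""
    selector = SelectKBest(score_func=f_classif, k=k)
    selector.fit(features, labels)
    scores = pd.DataFrame({"Feature": features.columns, "Scores": selector.scores_})
    # Keep the k highest-scoring features, best first
    return scores.nlargest(k, "Scores")

# Synthetic data: one feature depends on the class, two are pure noise
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
features = pd.DataFrame({
    "informative": labels + rng.normal(scale=0.5, size=100),
    "noise_a": rng.normal(size=100),
    "noise_b": rng.normal(size=100),
})
print(view_best_features(features, labels, 2))
```

Swapping `SelectKBest(f_classif, k=k)` for `SelectPercentile(f_classif, percentile=p)` keeps a fixed percentage of features instead of a fixed count; the scores themselves are the same.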



In [26]:
mci_controls_best_features.head(10)

|   | Feature | Scores |
|---|---------|--------|
| 0 | eyetracker_timestamp | 288072600.0 |
| 7 | pupil_diameter_left | 1001376.0 |
| 8 | pupil_diameter_right | 39059.18 |
| 5 | gaze_direction_right_y | 6773.853 |
| 3 | gaze_direction_left_y | 6647.441 |
| 6 | gaze_direction_right_z | 867.0755 |
| 4 | gaze_direction_left_z | 745.561 |
| 2 | gaze_point_right_y | 499.3661 |
| 1 | gaze_point_y | 482.7611 |
| 9 | fixation_point_y | 482.7611 |


# Model Training

1. Using the MCI + Controls dataset

In [27]:
X_train, X_test, y_train, y_test = train_test_split(mci_controls_features, mci_controls_label, test_size=0.3, random_state=40)

In [28]:
X_train, X_test = scale_training_set(X_train, X_test)

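`scale_training_set` is defined elsewhere in the notebook. A minimal sketch of such a helper, assuming it standardises features with scikit-learn's `StandardScaler` (an assumption; another scaler would follow the same pattern). The important detail is that the scaling statistics are learned from the training split only and then reused on the test split, so no information from the test data leaks into training:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_training_set(X_train, X_test):
    """Hypothetical helper: standardise features using training-set statistics only."""
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on the training data
    X_test_scaled = scaler.transform(X_test)        # apply the same mean/std to the test data
    return X_train_scaled, X_test_scaled

# Tiny example: column means are 2 and 20, population std is 1 and 10
X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_test = np.array([[2.0, 20.0]])
train_scaled, test_scaled = scale_training_set(X_train, X_test)
print(train_scaled)
print(test_scaled)
```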


Declare the base classifiers used to build the models

In [33]:
svmClassifier = SVC(kernel='linear',class_weight='balanced',probability=True)
lg = LogisticRegression()
dt = DecisionTreeClassifier()

In [36]:
#Support Vector Machine 
svmClassifier.fit(X_train,y_train)
svm_predict = svmClassifier.predict(X_test)
print("MCI-Controls(SVC Accuracy): %0.2f" % accuracy_score(y_test,svm_predict))

# Logistic Regression Classifier
lg.fit(X_train, y_train)
y_predict = lg.predict(X_test)
print("MCI-Controls(Logistic Regression Accuracy): %0.2f" % accuracy_score(y_test,y_predict))

# Decision Tree Classifier
dt.fit(X_train, y_train)
y_predict = dt.predict(X_test)
print("MCI-Controls(Decision Trees Accuracy): %0.2f" % accuracy_score(y_test,y_predict))

MCI-Controls(SVC Accuracy): 1.00
MCI-Controls(Logistic Regression Accuracy): 1.00
MCI-Controls(Decision Trees Accuracy): 1.00
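The same fit, predict and print steps are repeated for every dataset. A sketch of a small loop over a dictionary of classifiers that prints results in the same format; `evaluate_classifiers` is a hypothetical name, the data is synthetic, and `random_state=45` on the decision tree is an added choice for reproducibility. `sklearn.base.clone` gives each run an unfitted copy of the base classifier, which makes it explicit that nothing fitted on one dataset carries over to the next:

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def evaluate_classifiers(classifiers, X_train, X_test, y_train, y_test, dataset_name):
    """Fit a fresh copy of each classifier and return its test accuracy by name."""
    results = {}
    for name, base_model in classifiers.items():
        model = clone(base_model)  # unfitted copy with the same parameters
        model.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
        print("%s(%s Accuracy): %0.2f" % (dataset_name, name, results[name]))
    return results

classifiers = {
    "SVC": SVC(kernel="linear", class_weight="balanced", probability=True),
    "Logistic Regression": LogisticRegression(),
    "Decision Trees": DecisionTreeClassifier(random_state=45),
}

# Synthetic stand-in data; the notebook would pass its own scaled splits instead
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=40)
scores = evaluate_classifiers(classifiers, X_tr, X_te, y_tr, y_te, "Synthetic")
```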


2. Using the ADD + Controls dataset

In [37]:
X_train, X_test, y_train, y_test = train_test_split(add_controls_features, add_controls_label, test_size=0.3, random_state=40)

In [38]:
X_train, X_test = scale_training_set(X_train, X_test)



In [39]:
#Support Vector Machine 
svmClassifier.fit(X_train,y_train)
svm_predict = svmClassifier.predict(X_test)
print("ADD-Controls(SVC Accuracy): %0.2f " % accuracy_score(y_test,svm_predict))

#Logistic Regression
lg.fit(X_train, y_train)
y_predict = lg.predict(X_test)
print("ADD-Controls(Logistic Regression Accuracy): %0.2f" % accuracy_score(y_test,y_predict))


# Decision Tree Classifier
dt.fit(X_train, y_train)
y_predict = dt.predict(X_test)
print("ADD-Controls(Decision Trees Accuracy): %0.2f " % accuracy_score(y_test,y_predict))

ADD-Controls(SVC Accuracy): 1.00 
ADD-Controls(Logistic Regression Accuracy): 1.00
ADD-Controls(Decision Trees Accuracy): 1.00 
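Step 5 of the plan calls for cross-validation and AUC rather than a single 70/30 split. A minimal sketch for a binary dataset, using synthetic data in place of `add_controls_features` / `add_controls_label`; the fold count and seeds are illustrative choices. Putting the scaler inside a pipeline means it is re-fitted on each training fold, so the held-out fold never influences the scaling:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for a real features/label pair
X, y = make_classification(n_samples=300, n_features=6, random_state=1)

# Scaling is part of the model, so it is fitted on each training fold only
model = make_pipeline(StandardScaler(), LogisticRegression())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=45)

auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("ROC AUC: %0.3f +/- %0.3f" % (auc_scores.mean(), auc_scores.std()))
```

`StratifiedKFold` keeps the class proportions of each fold close to those of the full dataset, and reporting the mean and spread across folds says more than one accuracy figure.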


3. Using the MCI + ADD + Controls dataset

In [40]:
X_train, X_test, y_train, y_test = train_test_split(mci_add_controls_features, mci_add_controls_label, test_size=0.3, random_state=40)

In [41]:
X_train, X_test = scale_training_set(X_train, X_test)



In [42]:
#Support Vector Machine 
svmClassifier.fit(X_train,y_train)
svm_predict = svmClassifier.predict(X_test)
print("MCI-ADD-Controls(SVC Accuracy): %0.2f" % accuracy_score(y_test,svm_predict))

#Logistic Regression
lg.fit(X_train, y_train)
y_predict = lg.predict(X_test)
print("MCI-ADD-Controls(Logistic Regression Accuracy): %0.2f" % accuracy_score(y_test,y_predict))


# Decision Tree Classifier
dt.fit(X_train, y_train)
y_predict = dt.predict(X_test)
print("MCI-ADD-Controls(Decision Trees Accuracy): %0.2f" % accuracy_score(y_test,y_predict))


MCI-ADD-Controls(SVC Accuracy): 1.00
MCI-ADD-Controls(Logistic Regression Accuracy): 1.00
MCI-ADD-Controls(Decision Trees Accuracy): 1.00
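Every classifier scoring 1.00 on every dataset is worth cross-checking. The split above is made row by row, and each recording contributes many gaze samples, so rows from the same participant can end up in both the training and the test set; a feature that is close to constant within a recording (the very large F-score of eyetracker_timestamp makes it one candidate) can then identify the participant rather than the condition. One way to test this is to keep each participant's rows together with `GroupKFold`. The sketch below is synthetic and assumes a `groups` array aligned with the rows, which in the notebook could be built from the participant_name column before it is dropped. It compares a row-level split with a participant-level split on a feature that only identifies participants:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data: 12 hypothetical participants, 50 samples each, 3 classes (0, 1, 2)
n_participants, samples_each = 12, 50
participant_ids = np.repeat(np.arange(n_participants), samples_each)
y = (np.arange(n_participants) % 3)[participant_ids]

# One feature that is constant per participant and unrelated to the class
X = rng.normal(size=n_participants)[participant_ids].reshape(-1, 1)

model = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=45))

# Row-level split: the same participants appear in training and test folds
row_scores = cross_val_score(model, X, y, cv=KFold(n_splits=4, shuffle=True, random_state=0))

# Grouped split: each fold holds out whole participants
group_scores = cross_val_score(model, X, y, groups=participant_ids, cv=GroupKFold(n_splits=4))

print("Row-level accuracy:", row_scores.mean())
print("Participant-level accuracy:", group_scores.mean())
```

If the real data shows a similar gap, with near-perfect row-level scores but much lower participant-level scores, the perfect accuracies above reflect participant identification rather than a signal that generalises to new patients.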
