<a href="https://colab.research.google.com/github/harsh-hks-580/Alzheimer-s-Detetction/blob/main/b1370a31a0de55f67dc4747c3d29d9d7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>




<p style="text-align: center"><img src="https://gitlab.aicrowd.com/aicrowd/assets/-/raw/master/challenges/clock-decomposition/notebook-banner.jpg?inline=false" alt="Drawing" style="height: 400px;"/></p>

# What is the notebook about?

The challenge is to use the features extracted from the Clock Drawing Test to build an automated algorithm that predicts which of three phases each participant is in:

1)    Pre-Alzheimer’s (Early Warning)
2)    Post-Alzheimer’s (Detection)
3)    Normal (Not an Alzheimer’s patient)

In machine learning terms: this is a 3-class classification task.

# How to use this notebook? 📝

<p style="text-align: center"><img src="https://gitlab.aicrowd.com/aicrowd/assets/-/raw/master/notebook/aicrowd_notebook_submission_flow.png?inline=false" alt="notebook overview" style="width: 650px;"/></p>

- **Update the config parameters**. You can define the common variables here

Variable | Description
--- | ---
`AICROWD_DATASET_PATH` | Absolute path to the file containing the test data. The data is available under `/ds_shared_drive/` on the Aridhia workspace.
`AICROWD_PREDICTIONS_PATH` | Path to write the output to.
`AICROWD_ASSETS_DIR` | Directory for any additional files your notebook needs, such as model weights. Specify it as a relative path. The contents of this directory will be sent to AIcrowd for evaluation.
`AICROWD_API_KEY` | In order to submit your code to AIcrowd, you need to provide your account's API key. This key is available at https://www.aicrowd.com/participants/me

- **Installing packages**. Please use the [Install packages 🗃](#install-packages-) section to install the packages
- **Training your models**. All the code within the [Training phase ⚙️](#training-phase-) section will be skipped during evaluation. **Please make sure to save your model weights in the assets directory and load them in the predictions phase section** 
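
The save-then-load round trip described in the last bullet can be sketched as follows. This is a minimal illustration, not the notebook's actual model: `toy_model` and the file name `toy_model.pkl` are hypothetical, and a temporary directory stands in for the assets directory.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model; any picklable object behaves the same way.
toy_model = {"weights": [0.1, 0.2, 0.3], "classes": ["a", "b", "c"]}

# A temporary directory stands in for the assets directory in this sketch.
assets_dir = os.path.join(tempfile.mkdtemp(), "assets")
os.makedirs(assets_dir, exist_ok=True)  # create the directory if it is missing

model_path = os.path.join(assets_dir, "toy_model.pkl")

# Training phase: persist the artifact.
with open(model_path, "wb") as f:
    pickle.dump(toy_model, f)

# Prediction phase: restore it without retraining.
with open(model_path, "rb") as f:
    restored_model = pickle.load(f)

print(restored_model == toy_model)
```

Opening the files in `with` blocks ensures they are flushed and closed before the directory is bundled for submission.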

# Setup AIcrowd Utilities 🛠

We use this to bundle the files for submission and create a submission on AIcrowd. Do not edit this block.

In [None]:
!pip install -q -U aicrowd-cli

In [None]:
%load_ext aicrowd.magic

# AIcrowd Runtime Configuration 🧷

Define configuration parameters. Please include any files needed for the notebook to run under `AICROWD_ASSETS_DIR`. We will copy the contents of this directory to your final submission file 🙂

The dataset is available under `/ds_shared_drive` on the workspace.

In [None]:
import os

# Please use an absolute path for the location of the dataset.
# Or build one from a relative path with `os.path.join(os.getcwd(), "test_data/validation.csv")`
AICROWD_DATASET_PATH = os.getenv("DATASET_PATH", "/ds_shared_drive/validation.csv")
AICROWD_PREDICTIONS_PATH = os.getenv("PREDICTIONS_PATH", "predictions.csv")
AICROWD_ASSETS_DIR = "assets"
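
How the environment override and the fallback default interact can be sketched like this. The helper `resolve_path` and the variable name `EXAMPLE_DATASET_PATH` are hypothetical and exist only for illustration; the notebook itself calls `os.getenv` directly.

```python
import os

def resolve_path(env_name, default):
    """Return the environment override if set, otherwise the default, as an absolute path."""
    return os.path.abspath(os.getenv(env_name, default))

relative_default = os.path.join("test_data", "validation.csv")

# Case 1: the variable is not set, so the default is used and made absolute.
os.environ.pop("EXAMPLE_DATASET_PATH", None)
fallback = resolve_path("EXAMPLE_DATASET_PATH", relative_default)

# Case 2: the variable is set, so it takes precedence over the default.
os.environ["EXAMPLE_DATASET_PATH"] = os.path.join(os.sep, "tmp", "override.csv")
overridden = resolve_path("EXAMPLE_DATASET_PATH", relative_default)

print(fallback)
print(overridden)
```

`os.getcwd()` returns a path without a trailing separator, so `os.path.join` or `os.path.abspath` is safer than plain string concatenation when building the absolute path.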

# Install packages 🗃

Please add all package installations in this section.

In [None]:
!pip install scikit-learn
!pip install catboost



# Define preprocessing code 💻

The code that is common between the training and the prediction sections should be defined here. During evaluation, we completely skip the training section. Please make sure to add any common logic between the training and prediction sections here.
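
One way to keep that shared logic in a single place is a small helper that both phases call. The sketch below is an assumption about how it could look, not code from this notebook: the function name `prepare_features`, the `no_int_cast` parameter, and the column `some_continuous` are hypothetical, while the sentinel value 1000 mirrors the one used later for missing categorical values.

```python
import numpy as np
import pandas as pd

MISSING_CATEGORY = 1000  # sentinel for missing categorical values

def prepare_features(frame, categorical, no_int_cast=()):
    """Fill missing categorical values and cast the numeric-coded ones to int.

    Columns listed in `no_int_cast` are filled but keep their dtype.
    The input frame is left untouched; a modified copy is returned.
    """
    out = frame.copy()
    out[categorical] = out[categorical].fillna(MISSING_CATEGORY)
    int_cols = [c for c in categorical if c not in no_int_cast]
    out[int_cols] = out[int_cols].astype(int)
    return out

toy = pd.DataFrame({
    "missing_digit_1": [0.0, np.nan, 1.0],
    "hand_count_dummy": [2.0, 1.0, np.nan],
    "some_continuous": [0.5, np.nan, 2.0],  # hypothetical continuous column
})
prepared = prepare_features(toy, ["missing_digit_1", "hand_count_dummy"])
print(prepared)
```

Calling the same function on the training frame and on the test frame avoids the two phases drifting apart when the preprocessing changes.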

### Import common packages

Please import packages that are common for training and prediction phases here.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # data visualization
from sklearn.metrics import f1_score, log_loss, balanced_accuracy_score, roc_auc_score, make_scorer, confusion_matrix, ConfusionMatrixDisplay # plot_confusion_matrix was removed in scikit-learn 1.2
import pickle
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn import preprocessing
import catboost
from catboost import CatBoostClassifier, Pool
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.utils import class_weight
%matplotlib inline
# pd.set_option('display.max_rows', None)

# Training phase ⚙️

You can define your training code here. This section will be skipped during evaluation.

## Load training data

In [None]:
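# Read each file from its own path. Reusing os.getenv("DATASET_PATH", ...) for all three
# would load the same file three times whenever DATASET_PATH is set.
df2 = pd.read_csv("/ds_shared_drive/train.csv")
x_test = pd.read_csv("/ds_shared_drive/validation.csv")
y_test = pd.read_csv("/ds_shared_drive/validation_ground_truth.csv")
test = pd.merge(x_test, y_test, on='row_id')
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same stacked frame
df = pd.concat([df2, test])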
df.drop(columns=['row_id'],inplace=True)
col_names = df.columns

In [None]:
# col_drop = ['actual_hour_digit','actual_minute_digit','single_hand_length' ,'between_digits_angle_ccw_sum']
# col_names =list(set(df.columns)-set(col_drop))
cat_var = ['missing_digit_1', 'missing_digit_2', 'missing_digit_3', 'missing_digit_4', 'missing_digit_5',
           'missing_digit_6', 'missing_digit_7', 'missing_digit_8', 'missing_digit_9', 'missing_digit_10',
           'missing_digit_11', 'missing_digit_12', 'sequence_flag_cw', 'sequence_flag_ccw', 'hand_count_dummy',
           'intersection_pos_rel_centre', 'hour_pointing_digit', 'minute_pointing_digit', 'eleven_ten_error',
           'other_error', 'pred_tremor', 'centre_dot_detect']
cont_var = list(set(col_names)-set(cat_var)-set(['diagnosis']))
print(len(cat_var),len(cont_var))

22 98
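
A set difference does not preserve column order, so the continuous list can come out in a different order from run to run. If a stable order matters, a list comprehension is one alternative. The column names `feature_a` and `feature_b` below are made up for this sketch.

```python
# Hypothetical column layout: two continuous features, two categorical flags, one target.
all_columns = ["feature_a", "missing_digit_1", "feature_b", "diagnosis", "pred_tremor"]
categorical = ["missing_digit_1", "pred_tremor"]

# Everything that is neither categorical nor the target counts as continuous.
excluded = set(categorical) | {"diagnosis"}
continuous = [c for c in all_columns if c not in excluded]

print(len(categorical), len(continuous))
print(continuous)
```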


In [None]:
# df.drop(col_drop,axis=1,inplace=True)
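# Mark missing categorical values with a sentinel category (1000).
# Assigning the result back avoids the chained inplace fillna, which newer pandas versions warn about.
df[cat_var] = df[cat_var].fillna(1000)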
# def_val = {}
# for col in cont_var :
#     def_val[col] = df[col].mean()
#     df[col].fillna(def_val[col],inplace=True)

In [None]:
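# Persist the feature lists so the prediction phase can reload them
os.makedirs(AICROWD_ASSETS_DIR, exist_ok=True)  # make sure the assets directory exists
with open(AICROWD_ASSETS_DIR+"/cat_var.pkl", "wb") as a_file:
    pickle.dump(cat_var, a_file)
with open(AICROWD_ASSETS_DIR+"/cont_var.pkl", "wb") as a_file:
    pickle.dump(cont_var, a_file)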
# a_file = open(AICROWD_ASSETS_DIR+"/def_val.pkl", "wb")
# pickle.dump(def_val, a_file)
# a_file.close()

In [None]:
# df.drop(col_drop,axis=1,inplace=True)
df[[feature for feature in cat_var if feature != 'intersection_pos_rel_centre']] = df[[feature for feature in cat_var if feature != 'intersection_pos_rel_centre']].astype(int)
x_train, x_test, y_train, y_test = train_test_split(df.drop(['diagnosis'], axis=1), df['diagnosis'], 
                                                    test_size=0.01, stratify=df['diagnosis'], random_state=42)
# x_train = df.drop(['diagnosis'],axis=1).copy()
# y_train = df['diagnosis']
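
What `stratify` does in the split above can be checked on toy data. The sketch below uses made-up, imbalanced labels (the short class names are placeholders) and a larger `test_size` so the holdout is big enough to inspect.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy imbalanced labels; class names and counts are placeholders for this sketch.
labels = pd.Series(["normal"] * 900 + ["pre"] * 60 + ["post"] * 40)
features = pd.DataFrame({"x": np.arange(len(labels))})

x_tr, x_ho, y_tr, y_ho = train_test_split(
    features, labels, test_size=0.1, stratify=labels, random_state=42
)

# With stratification, class shares in the holdout track the full data set.
full_share = labels.value_counts(normalize=True)
holdout_share = y_ho.value_counts(normalize=True)
print(pd.DataFrame({"full": full_share, "holdout": holdout_share}))
```

Without `stratify`, a holdout as small as 1% of the data could easily contain very few or no samples of the rarer classes, which makes early stopping and metrics on it unreliable.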

In [None]:
x_train.shape

(32807, 120)

In [None]:
# ct = ColumnTransformer([
#         ('ColumnTransform', StandardScaler(), cont_var)
#     ], remainder='passthrough')

# x_train = ct.fit_transform(x_train)
# x_test = ct.transform(x_test)

In [None]:
y_train.head()

26094    normal
15378    normal
26099    normal
3149     normal
15206    normal
Name: diagnosis, dtype: object

## Train your model

In [None]:
my_model = CatBoostClassifier(iterations=300,learning_rate=0.05,max_depth = 7,
                              loss_function='MultiClassOneVsAll',auto_class_weights='SqrtBalanced',
                             early_stopping_rounds=10)
my_model.fit(x_train, y_train,eval_set=(x_test,y_test),
             cat_features=cat_var, verbose=100)

0:	learn: 0.6656025	test: 0.6658973	best: 0.6658973 (0)	total: 299ms	remaining: 1m 29s
100:	learn: 0.2936796	test: 0.3086743	best: 0.3084828 (96)	total: 27.5s	remaining: 54.2s
Stopped by overfitting detector  (10 iterations wait)

bestTest = 0.3051860565
bestIteration = 129

Shrink model to first 130 iterations.


<catboost.core.CatBoostClassifier at 0x7f1f5075a640>
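
The `auto_class_weights='SqrtBalanced'` option reweights classes to soften the imbalance. Assuming the formula given in the CatBoost documentation (the weight of a class is the square root of the largest class total divided by that class's total, with unit sample weights), the weights can be reproduced by hand. The label counts below are invented for illustration.

```python
import math
from collections import Counter

def sqrt_balanced_weights(labels):
    """Class weight = sqrt(size of largest class / size of this class).

    Assumes every sample has weight 1; this mirrors the documented
    SqrtBalanced formula but is not CatBoost's own code.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: math.sqrt(largest / n) for cls, n in counts.items()}

# Invented class counts, chosen so the square roots come out as round numbers.
toy_labels = ["normal"] * 900 + ["pre"] * 100 + ["post"] * 36
weights = sqrt_balanced_weights(toy_labels)
print(weights)
```

The square root keeps the rare classes from dominating the loss the way fully balanced weights (the same ratio without the root) could.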

In [None]:
# MultiClassOneVsAll scores each class independently, so rescale every row to sum to 1
preds = my_model.predict_proba(x_test)
preds = preds / preds.sum(axis=1, keepdims=True)
print(f1_score(y_test, my_model.predict(x_test), average='macro'), log_loss(y_test, preds))

0.4506883090199301 0.26749970379068166
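
The second number above is the multiclass log loss on row-normalised probabilities. A minimal NumPy sketch of both steps follows, using made-up scores; it assumes, as the normalisation in the cell suggests, that one-vs-all outputs are independent per-class probabilities whose rows need not sum to 1.

```python
import numpy as np

# Hypothetical one-vs-all scores: each column is an independent probability.
raw = np.array([[0.8, 0.3, 0.1],
                [0.2, 0.6, 0.2],
                [0.5, 0.5, 0.25]])

# Row normalisation: divide each row by its own sum.
normalised = raw / raw.sum(axis=1, keepdims=True)

# Multiclass log loss: mean negative log-probability assigned to the true class.
true_idx = np.array([0, 1, 2])  # index of the true class for each row
picked = normalised[np.arange(len(true_idx)), true_idx]
manual_log_loss = -np.mean(np.log(picked))

print(normalised.sum(axis=1))
print(manual_log_loss)
```

Because log loss only looks at the probability given to the true class, confident mistakes are penalised heavily, which is why the probabilities are rescaled before scoring.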


## Save your trained model

In [None]:
filename = AICROWD_ASSETS_DIR+"/finalized_model.sav"
with open(filename, 'wb') as model_file:
    pickle.dump(my_model, model_file)

# Prediction phase 🔎

Please make sure to save the weights from the training section in your assets directory and load them in this section

In [None]:
filename = AICROWD_ASSETS_DIR+"/finalized_model.sav"
# Context managers close each file even if unpickling fails
with open(filename, 'rb') as model_file:
    model = pickle.load(model_file)
with open(AICROWD_ASSETS_DIR+"/cat_var.pkl", "rb") as a_file:
    cat_var = pickle.load(a_file)
with open(AICROWD_ASSETS_DIR+"/cont_var.pkl", "rb") as a_file:
    cont_var = pickle.load(a_file)
# a_file = open(AICROWD_ASSETS_DIR+"/col_drop.pkl", "rb")
# col_drop = pickle.load(a_file)
# a_file.close()

## Load test data

In [None]:
test_data = pd.read_csv(AICROWD_DATASET_PATH)
# test_data.drop(col_drop,axis=1,inplace=True)

In [None]:
test_data[cat_var] = test_data[cat_var].fillna(1000)
test_data[[feature for feature in cat_var if feature != 'intersection_pos_rel_centre']] = test_data[[feature for feature in cat_var if feature != 'intersection_pos_rel_centre']].astype(int)

In [None]:
preds = model.predict_proba(test_data.drop(['row_id'], axis=1))

## Generate predictions

In [None]:
predictions = {
    "row_id":test_data["row_id"].values,
    "normal_diagnosis_probability": preds[:, 0],
    "post_alzheimer_diagnosis_probability": preds[:, 1],
    "pre_alzheimer_diagnosis_probability": preds[:, 2],
}

predictions_df = pd.DataFrame.from_dict(predictions)
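
The cell above relies on the probability columns arriving in the order normal, post, pre. A more defensive variant looks the columns up by class name. The sketch below assumes the fitted classifier exposes a scikit-learn style `classes_` attribute and that the labels are `normal`, `post_alzheimer`, and `pre_alzheimer`; only `normal` appears in the outputs shown earlier, the other two names are inferred from the submission column names. The arrays and row ids are made up.

```python
import numpy as np
import pandas as pd

# Stand-ins for model.classes_ and model.predict_proba(...), deliberately out of order.
classes = np.array(["pre_alzheimer", "normal", "post_alzheimer"])
proba = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.1, 0.3]])
row_ids = ["ROW_A", "ROW_B"]  # hypothetical ids

# Map each class name to its column position in the probability matrix.
class_index = {name: i for i, name in enumerate(classes)}

wanted = ["normal", "post_alzheimer", "pre_alzheimer"]
submission = pd.DataFrame({
    "row_id": row_ids,
    **{f"{name}_diagnosis_probability": proba[:, class_index[name]] for name in wanted},
})
print(submission)
```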

In [None]:
pred_sum = predictions_df['normal_diagnosis_probability'] + predictions_df['post_alzheimer_diagnosis_probability'] + predictions_df['pre_alzheimer_diagnosis_probability']
predictions_df['normal_diagnosis_probability'] /= pred_sum 
predictions_df['post_alzheimer_diagnosis_probability'] /= pred_sum 
predictions_df['pre_alzheimer_diagnosis_probability'] /= pred_sum
predictions_df['normal_diagnosis_probability'] + predictions_df['post_alzheimer_diagnosis_probability'] + predictions_df['pre_alzheimer_diagnosis_probability']

0      1.0
1      1.0
2      1.0
3      1.0
4      1.0
      ... 
357    1.0
358    1.0
359    1.0
360    1.0
361    1.0
Length: 362, dtype: float64

In [None]:
predictions_df.head()

| | row_id | normal_diagnosis_probability | post_alzheimer_diagnosis_probability | pre_alzheimer_diagnosis_probability |
| --- | --- | --- | --- | --- |
| 0 | LA9JQ1JZMJ9D2MBZV | 0.604323 | 0.263683 | 0.131994 |
| 1 | PSSRCWAPTAG72A1NT | 0.588046 | 0.220323 | 0.191631 |
| 2 | GCTODIZJB42VCBZRZ | 0.955143 | 0.024025 | 0.020832 |
| 3 | 7YMVQGV1CDB1WZFNE | 0.317031 | 0.581333 | 0.101636 |
| 4 | PHEQC6DV3LTFJYIJU | 0.58283 | 0.334544 | 0.082626 |


## Save predictions 📨

In [None]:
predictions_df.to_csv(AICROWD_PREDICTIONS_PATH, index=False)
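
Before submitting, it can help to run a few sanity checks on the predictions frame. The checks below are assumptions about what a well-formed file looks like, based on the columns written above; they are not AIcrowd's official validation, and `check_submission` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

PROB_COLS = [
    "normal_diagnosis_probability",
    "post_alzheimer_diagnosis_probability",
    "pre_alzheimer_diagnosis_probability",
]

def check_submission(frame):
    """Return a list of problems found; an empty list means all checks passed."""
    missing = [c for c in ["row_id"] + PROB_COLS if c not in frame.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if frame[PROB_COLS].isna().any().any():
        problems.append("NaN in probability columns")
    if not np.allclose(frame[PROB_COLS].sum(axis=1), 1.0):
        problems.append("rows do not sum to 1")
    if frame["row_id"].duplicated().any():
        problems.append("duplicate row_id values")
    return problems

# Made-up frames: one well-formed, one with an unnormalised row and a repeated id.
good = pd.DataFrame({"row_id": ["A", "B"],
                     PROB_COLS[0]: [0.7, 0.2],
                     PROB_COLS[1]: [0.2, 0.3],
                     PROB_COLS[2]: [0.1, 0.5]})
bad = good.assign(**{PROB_COLS[0]: [0.9, 0.2]}, row_id=["A", "A"])

print(check_submission(good))
print(check_submission(bad))
```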

# Submit to AIcrowd 🚀

**NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)**

In [None]:
!DATASET_PATH=$AICROWD_DATASET_PATH \
aicrowd notebook submit \
    --assets-dir $AICROWD_ASSETS_DIR \
    --challenge addi-alzheimers-detection-challenge

Using notebook: /home/desktop2/Desktop/workspace/harsh_submit.ipynb for submission...
Removing existing files from submission directory...
Scrubbing API keys from the notebook...
Collecting notebook...
Validating the submission...
Executing install.ipynb...
[NbConvertApp] Converting notebook /home/desktop2/Desktop/workspace/submission/install.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python
[NbConvertApp] Writing 4066 bytes to /home/desktop2/Desktop/workspace/submission/install.nbconvert.ipynb
Executing predict.ipynb...
[NbConvertApp] Converting notebook /home/desktop2/Desktop/workspace/submission/predict.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python
[NbConvertApp] Writing 16221 bytes to /home/desktop2/Desktop/workspace/submission/predict.nbconvert.ipynb
submission.zip ━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 5.9/5.8 MB • 2.5 MB/s • 0:00:00