<p style="text-align: center"><img src="https://gitlab.aicrowd.com/aicrowd/assets/-/raw/master/challenges/clock-decomposition/notebook-banner.jpg?inline=false" alt="Drawing" style="height: 400px;"/></p>


# What is the notebook about?

The challenge is to use the features extracted from the Clock Drawing Test to build an automated algorithm that predicts which of three phases each participant is in:

1)    Pre-Alzheimer’s (Early Warning)
2)    Post-Alzheimer’s (Detection)
3)    Normal (Not an Alzheimer’s patient)

In machine learning terms: this is a 3-class classification task.

# How to use this notebook? 📝

<p style="text-align: center"><img src="https://gitlab.aicrowd.com/aicrowd/assets/-/raw/master/notebook/aicrowd_notebook_submission_flow.png?inline=false" alt="notebook overview" style="width: 650px;"/></p>

- **Update the config parameters**. You can define the common variables here

Variable | Description
--- | ---
`AICROWD_DATASET_PATH` | Path to the file containing test data (the data will be available at `/ds_shared_drive/` on the Aridhia workspace). This should be an absolute path.
`AICROWD_PREDICTIONS_PATH` | Path to write the output to.
`AICROWD_ASSETS_DIR` | If your notebook needs additional files (model weights, for example), add them to a directory and specify the path to that directory here (please use a relative path). The contents of this directory will be sent to AIcrowd for evaluation.
`AICROWD_API_KEY` | In order to submit your code to AIcrowd, you need to provide your account's API key. This key is available at https://www.aicrowd.com/participants/me

- **Installing packages**. Please use the [Install packages 🗃](#install-packages-) section to install the packages
- **Training your models**. All the code within the [Training phase ⚙️](#training-phase-) section will be skipped during evaluation. **Please make sure to save your model weights in the assets directory and load them in the predictions phase section** 

# Setup AIcrowd Utilities 🛠

We use this to bundle the files for submission and create a submission on AIcrowd. Do not edit this block.

In [None]:
!pip install -q -U aicrowd-cli --use-feature=2020-resolver

In [None]:
%load_ext aicrowd.magic

The aicrowd.magic extension is already loaded. To reload it, use:
  %reload_ext aicrowd.magic


In [None]:
!pip install -q numpy pandas scikit-learn
!pip install -q -U fastcore fastai

# AIcrowd Runtime Configuration 🧷

Define configuration parameters. Please include any files needed for the notebook to run under `AICROWD_ASSETS_DIR`. We will copy the contents of this directory to your final submission file 🙂

The dataset is available under `/ds_shared_drive` on the workspace.

In [None]:
import os

# Please use an absolute path for the location of the dataset.
# Or you can build one from a relative path with `os.path.join(os.getcwd(), "test_data/validation.csv")`
AICROWD_DATASET_PATH = os.getenv("DATASET_PATH", "/ds_shared_drive/validation.csv")
AICROWD_PREDICTIONS_PATH = os.getenv("PREDICTIONS_PATH", "predictions.csv")
AICROWD_ASSETS_DIR = "assets"
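
The configuration comment above mentions turning a relative path into an absolute one. A minimal sketch of that step follows, assuming the same `DATASET_PATH` environment variable; the helper name `resolve_dataset_path` is hypothetical and not part of `aicrowd-cli`.

```python
import os


def resolve_dataset_path(default="/ds_shared_drive/validation.csv"):
    """Return an absolute dataset path (hypothetical helper for illustration)."""
    path = os.getenv("DATASET_PATH", default)
    # os.path.abspath resolves a relative path against the current working
    # directory and inserts the path separator that plain string
    # concatenation with os.getcwd() would miss.
    return os.path.abspath(path)
```

Calling `resolve_dataset_path()` with `DATASET_PATH=test_data/validation.csv` yields a path rooted at the current working directory, while an already absolute value passes through unchanged.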


# Install packages 🗃

Please add all package installations in this section.

In [None]:
!pip install numpy pandas



In [None]:
!pip install -q numpy pandas scikit-learn
!pip install -q -U fastcore fastai
!pip install tensorflow_decision_forests
!pip install wurlitzer



# Define preprocessing code 💻

The code that is common between the training and the prediction sections should be defined here. During evaluation, we completely skip the training section. Please make sure to add any common logic between the training and prediction sections here.

### Import common packages

Please import packages that are common for training and prediction phases here.

In [None]:
import numpy as np
import pandas as pd
import math
from fastai.tabular.all import *
import sklearn as sk
from sklearn.metrics import f1_score, log_loss
import tensorflow as tf
import tensorflow_decision_forests as tfdf
import pickle
from sklearn import preprocessing
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils import shuffle
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF


pd.set_option("display.max_columns", None)

# Training phase ⚙️

You can define your training code here. This section will be skipped during evaluation.

## Load training data

In [None]:
# The training section is skipped during evaluation, so read the files directly.
# Reusing the DATASET_PATH environment variable for all three reads would point
# them at the same file whenever that variable is set.
df = pd.read_csv("/ds_shared_drive/train.csv")
df_val = pd.read_csv("/ds_shared_drive/validation.csv")

# Attach the ground-truth labels to the validation features
df_val = pd.merge(df_val, pd.read_csv("/ds_shared_drive/validation_ground_truth.csv"), how='left', on='row_id')

# col =[]
# for i,name in enumerate(df.columns):
#     if i>0:
#         col.append(name)

# len(col)
# df.dropna(axis='rows',subset=col,inplace=True)
# df.reset_index(drop=True)
# load your data
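
The cell above left-joins the validation features with their ground-truth labels on `row_id`. A small sketch of how that join could be checked follows, using toy frames with made-up values; `validate` and `indicator` are standard `pd.merge` parameters, and the label spellings are assumptions.

```python
import pandas as pd

# Toy stand-ins for the validation features and the ground-truth file.
features = pd.DataFrame({"row_id": ["a", "b", "c"], "x": [1.0, 2.0, 3.0]})
truth = pd.DataFrame({"row_id": ["a", "b"], "diagnosis": ["normal", "pre_alzheimer"]})

# validate="one_to_one" raises if row_id is duplicated on either side;
# indicator=True adds a `_merge` column recording where each row came from.
merged = pd.merge(features, truth, how="left", on="row_id",
                  validate="one_to_one", indicator=True)

# Rows present in the features but absent from the ground truth.
unlabelled = merged.loc[merged["_merge"] == "left_only", "row_id"].tolist()
```

Rows listed in `unlabelled` would carry a missing `diagnosis` after the merge, which is worth knowing before computing validation metrics.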

In [None]:
# Treat a missing intersection position as its own category 'N'
df['intersection_pos_rel_centre'] = df['intersection_pos_rel_centre'].fillna('N')
df_val['intersection_pos_rel_centre'] = df_val['intersection_pos_rel_centre'].fillna('N')

# One-hot encode the categorical column; the `columns` argument only applies to
# DataFrame input, so it is omitted for a Series
df_dummies = pd.get_dummies(df['intersection_pos_rel_centre'], dummy_na=True).add_prefix('c_')

df_val_dummies = pd.get_dummies(df_val['intersection_pos_rel_centre'], dummy_na=True).add_prefix('c_')


#and then we drop the original ones from the datasets
df = df.drop('intersection_pos_rel_centre', axis=1)
df_val = df_val.drop('intersection_pos_rel_centre', axis=1)

#our new sets are the concatenation of the last ones
df = pd.concat([df, df_dummies], axis=1)
df_val = pd.concat([df_val, df_val_dummies], axis=1)
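
The encoding above calls `pd.get_dummies` separately on each split, so a split that lacks some category value would end up with fewer columns. One possible way to keep the columns identical is to declare the categories up front. This is a sketch under assumptions: the category values below are hypothetical, the helper name `one_hot_aligned` is invented for illustration, and the sketch omits the NaN indicator column that `dummy_na=True` adds.

```python
import pandas as pd


def one_hot_aligned(series, categories, prefix="c"):
    """One-hot encode `series` against a fixed, ordered list of categories."""
    dtype = pd.CategoricalDtype(categories=categories)
    # Missing values become the explicit category 'N', as in the notebook.
    encoded = series.fillna("N").astype(dtype)
    # A categorical Series produces one column per declared category,
    # including categories that never occur in this split.
    return pd.get_dummies(encoded, prefix=prefix, prefix_sep="_")


# Hypothetical category values, for illustration only.
categories = ["N", "TL", "TR", "BL", "BR"]
train_col = pd.Series(["TL", None, "BR", "TR"])
val_col = pd.Series(["TL", "TL", None])

train_dummies = one_hot_aligned(train_col, categories)
val_dummies = one_hot_aligned(val_col, categories)
```

Both frames now share the columns `c_N`, `c_TL`, `c_TR`, `c_BL`, `c_BR`, so a model fitted on one split sees the same feature layout on the other.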

In [None]:
X = df.drop(['row_id', 'diagnosis'], axis=1)
# we save the diagnosis as our target 
y = df['diagnosis']

# And we create our validation vectors
X_val = df_val.drop(['row_id', 'diagnosis'], axis=1)
y_val = df_val['diagnosis']

In [None]:
imputer = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=999)
imputer.fit(pd.concat([X,X_val]))
X_imputed = imputer.transform(X)
X_val = imputer.transform(X_val)
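
With `strategy='constant'`, the value written into the gaps is the fixed `fill_value` and does not depend on the data seen during `fit`. The toy arrays below are made up to illustrate that behaviour; they are not taken from the challenge data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0, np.nan], [np.nan, 4.0]])
X_new = np.array([[np.nan, 2.0]])

imputer = SimpleImputer(missing_values=np.nan, strategy="constant", fill_value=999)
# fit_transform fits on the training rows and fills their gaps in one call.
X_train_filled = imputer.fit_transform(X_train)
# The fitted imputer fills unseen rows with the same constant.
X_new_filled = imputer.transform(X_new)
```

Because the fill value is constant, a single fitted imputer can be reused for training, validation and test rows as long as the column layout is the same.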


## Train your model

In [None]:
# model.fit(train_data)
model = RandomForestClassifier(n_estimators=1000)
model.fit(X_imputed, y)

log_loss_value = log_loss(y_val, model.predict_proba(X_val))
f1_value = f1_score(y_val, model.predict(X_val), average='macro')

print("log_loss_value over validation = {}\nf1_value over validation = {}".format(log_loss_value, f1_value))

log_loss_value over validation = 0.715918873159399
f1_value over validation = 0.35465225811799633
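
The two numbers above measure different things: log loss scores the predicted probabilities, while macro F1 averages per-class F1 over the hard predictions with equal weight per class. The plain-Python sketch below spells out both definitions on made-up data; the label spellings are assumptions. One pattern that produces a low macro F1 is predicting the majority class for every row, although the output above does not establish that this is what happens here.

```python
import math


def multiclass_log_loss(y_true, proba, classes, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, proba):
        p = row[classes.index(label)]
        # Clip to avoid log(0) for a confidently wrong prediction.
        p = min(max(p, eps), 1.0)
        total += -math.log(p)
    return total / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1; an undefined F1 counts as 0."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Assumed label spellings and toy predictions, for illustration only.
classes = ["normal", "post_alzheimer", "pre_alzheimer"]
y_true = ["normal", "normal", "normal", "pre_alzheimer"]
y_pred = ["normal"] * 4
proba = [[0.9, 0.05, 0.05]] * 4

f1 = macro_f1(y_true, y_pred, classes)
ll = multiclass_log_loss(y_true, proba, classes)
```

In this toy case three of four predictions are correct, yet the macro F1 is below 0.3 because two of the three classes score zero.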


In [None]:
# some custom code block

## Save your trained model

In [None]:
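# Create the assets directory if needed, then serialize the fitted model
os.makedirs(AICROWD_ASSETS_DIR, exist_ok=True)
filename = f'{AICROWD_ASSETS_DIR}/haha'
torch.save(model, filename)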


# Prediction phase 🔎

Please make sure to save the weights from the training section in your assets directory and load them in this section

In [None]:
# model = load_model_from_assets_dir(AIcrowdConfig.ASSETS_DIR)
filename = f'{AICROWD_ASSETS_DIR}/haha'
model = torch.load(filename)
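
The save and load cells use `torch.save` and `torch.load`, which pickle the scikit-learn model under the hood. Depending on the installed PyTorch version, `torch.load` may refuse to unpickle non-tensor objects unless `weights_only=False` is passed. An alternative round trip with the standard `pickle` module is sketched below. The helper names, the file name `model.pkl` and the toy classifier are assumptions for illustration, not the notebook's method.

```python
import os
import pickle
import tempfile

from sklearn.tree import DecisionTreeClassifier


def save_model(model, assets_dir, name="model.pkl"):
    """Pickle `model` into `assets_dir`, creating the directory if needed."""
    os.makedirs(assets_dir, exist_ok=True)
    path = os.path.join(assets_dir, name)
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(path):
    """Load a pickled model; only unpickle files you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Toy classifier standing in for the trained model.
clf = DecisionTreeClassifier(random_state=0).fit([[0], [1], [2], [3]], [0, 0, 1, 1])
with tempfile.TemporaryDirectory() as tmp:
    path = save_model(clf, os.path.join(tmp, "assets"))
    restored = load_model(path)
```

Whichever serializer is used, the save and load cells have to agree on both the format and the file name.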

## Load test data

In [None]:
test_data = pd.read_csv(AICROWD_DATASET_PATH)

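# Apply the same encoding as in training: missing position becomes 'N'
test_data['intersection_pos_rel_centre'] = test_data['intersection_pos_rel_centre'].fillna('N')

# The `columns` argument only applies to DataFrame input, so it is omitted for a Series
test_dummies = pd.get_dummies(test_data['intersection_pos_rel_centre'], dummy_na=True).add_prefix('c_')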

test = test_data.drop(['row_id','intersection_pos_rel_centre'], axis=1)

test = pd.concat([test, test_dummies], axis=1)

imputer_test = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=999)
imputer_test.fit(test)
test = imputer_test.transform(test)

## Generate predictions

In [None]:
predict = model.predict_proba(test)

predictions = {
    "row_id": test_data["row_id"].values,
    "normal_diagnosis_probability": predict[:,0],
    "post_alzheimer_diagnosis_probability": predict[:,1],
    "pre_alzheimer_diagnosis_probability": predict[:,2],
}

predictions_df = pd.DataFrame.from_dict(predictions)
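
The cell above assigns `predict_proba` columns by position, which relies on the order of `model.classes_`. A sketch of looking each column up by class label instead follows; the label spellings in `column_for_class` are assumptions that should be checked against `model.classes_`, and the helper name `proba_to_frame` is hypothetical.

```python
import numpy as np
import pandas as pd


def proba_to_frame(row_ids, proba, classes, column_for_class):
    """Build the submission frame by looking up each class's column index."""
    classes = list(classes)
    data = {"row_id": list(row_ids)}
    for label, column in column_for_class.items():
        data[column] = proba[:, classes.index(label)]
    return pd.DataFrame(data)


# Assumed label spellings; verify against `model.classes_`.
column_for_class = {
    "normal": "normal_diagnosis_probability",
    "post_alzheimer": "post_alzheimer_diagnosis_probability",
    "pre_alzheimer": "pre_alzheimer_diagnosis_probability",
}

proba = np.array([[0.1, 0.7, 0.2]])
# Deliberately shuffled class order to show the lookup is order-independent.
shuffled_classes = ["pre_alzheimer", "normal", "post_alzheimer"]
frame = proba_to_frame(["A"], proba, shuffled_classes, column_for_class)
```

In the notebook this would be called as `proba_to_frame(test_data["row_id"].values, predict, model.classes_, column_for_class)`.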

In [None]:
predictions_df

index | row_id | normal_diagnosis_probability | post_alzheimer_diagnosis_probability | pre_alzheimer_diagnosis_probability
--- | --- | --- | --- | ---
0 | LA9JQ1JZMJ9D2MBZV | 0.867 | 0.093 | 0.040
1 | PSSRCWAPTAG72A1NT | 0.853 | 0.084 | 0.063
2 | GCTODIZJB42VCBZRZ | 0.997 | 0.003 | 0.000
3 | 7YMVQGV1CDB1WZFNE | 0.769 | 0.205 | 0.026
4 | PHEQC6DV3LTFJYIJU | 0.700 | 0.279 | 0.021
... | ... | ... | ... | ...
357 | SDM0DQJ0Z1L72FBQG | 0.999 | 0.001 | 0.000
358 | 3A7NVWPQEHUGYJUH0 | 0.796 | 0.167 | 0.037
359 | S36ZWGFUK77RAOSV1 | 0.949 | 0.037 | 0.014
360 | LFYFH8E7EP75VLWNW | 0.921 | 0.062 | 0.017


## Save predictions 📨

In [None]:
predictions_df.to_csv(AICROWD_PREDICTIONS_PATH, index=False)
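
Before writing the file, a few sanity checks on the predictions frame can catch obvious mistakes. The checks below are assumptions about what a well-formed file looks like, based on the columns shown above; they are not documented evaluator rules, and `check_predictions` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

PROBA_COLUMNS = [
    "normal_diagnosis_probability",
    "post_alzheimer_diagnosis_probability",
    "pre_alzheimer_diagnosis_probability",
]


def check_predictions(df, atol=1e-6):
    """Raise ValueError if the predictions frame looks malformed."""
    missing = [c for c in ["row_id"] + PROBA_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if df["row_id"].duplicated().any():
        raise ValueError("duplicate row_id values")
    probs = df[PROBA_COLUMNS].to_numpy(dtype=float)
    if np.isnan(probs).any():
        raise ValueError("NaN probabilities")
    if ((probs < 0) | (probs > 1)).any():
        raise ValueError("probabilities outside [0, 1]")
    if not np.allclose(probs.sum(axis=1), 1.0, atol=atol):
        raise ValueError("row probabilities do not sum to 1")
    return True


# First row of the output shown above.
sample = pd.DataFrame({
    "row_id": ["LA9JQ1JZMJ9D2MBZV"],
    "normal_diagnosis_probability": [0.867],
    "post_alzheimer_diagnosis_probability": [0.093],
    "pre_alzheimer_diagnosis_probability": [0.040],
})
ok = check_predictions(sample)
```

In the notebook, `check_predictions(predictions_df)` could run just before the `to_csv` call.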

# Submit to AIcrowd 🚀

**NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)**

In [None]:
!DATASET_PATH=$AICROWD_DATASET_PATH \
aicrowd notebook submit \
    --assets-dir $AICROWD_ASSETS_DIR \
    --challenge addi-alzheimers-detection-challenge

API Key valid
Saved API Key successfully!
Using notebook: /home/desktop0/haha.ipynb for submission...
Removing existing files from submission directory...
Scrubbing API keys from the notebook...
Collecting notebook...
Validating the submission...
Executing install.ipynb...
[NbConvertApp] Converting notebook /home/desktop0/submission/install.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python
[NbConvertApp] Writing 10217 bytes to /home/desktop0/submission/install.nbconvert.ipynb
Executing predict.ipynb...
[NbConvertApp] Converting notebook /home/desktop0/submission/predict.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python
2021-06-07 18:01:06.588982: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-06-07 18:01:06.589025: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] 