# Predicting diabetes 
This notebook uses the morpher toolkit to develop a range of models for the diabetes use case, based on the Pima Indians diabetes dataset (source: UCI repository).

In [None]:
import morpher
from morpher.jobs import *
from morpher.plots import *
from morpher.metrics import *
from morpher.config import (
    imputers,
    algorithms,
    explainers
)

### Basic definitions
Define the setup for this classification problem: the input filename and the name of the target column. The test size is set later, when the data is split.

In [None]:
filename = "diabetes.csv"
target = "diabetes"

### Loading and imputing data 
Load the dataset, impute missing values using the mean imputer, and split it into training and test sets. The dataset should contain only numeric or boolean features, and the target variable should be numeric, e.g., 0 for 'no' and 1 for 'yes'.

In [None]:
data = Load().execute(filename=filename)
data, _ = Impute().execute(data)

train, test = Split().execute(
    data, test_size=0.2
)
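
To make the imputation and split steps concrete, here is a minimal standard-library sketch of mean imputation followed by a hold-out split. It does not use morpher and is not a description of how `Impute` or `Split` work internally; the helper names `mean_impute` and `holdout_split`, the toy rows, and the fixed seed are all assumptions made for illustration.

```python
import random
from statistics import mean


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    columns = list(zip(*rows))
    # Raises StatisticsError if a column has no observed values at all.
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def holdout_split(rows, test_size=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a `test_size` fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy rows: [glucose, bmi, diabetes]; None marks a missing value.
rows = [
    [148, 33.6, 1],
    [85, None, 0],
    [None, 23.3, 1],
    [89, 28.1, 0],
    [137, 43.1, 1],
]

imputed = mean_impute(rows)
train_rows, test_rows = holdout_split(imputed, test_size=0.2)
```

The sketch mirrors the order used in this notebook (impute, then split). A common alternative is to compute the column means on the training rows only and reuse them for the test rows, so that no test-set information influences the imputed values.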

### Training different models
Now train models using three algorithms: a decision tree (DT), a random forest (RF), and gradient-boosted decision trees (GBDT):

In [None]:
models = Train().execute(
    train,
    target=target,
    algorithms=[algorithms.DT, algorithms.RF, algorithms.GBDT],
    verbose=True
)

### Evaluate the models
Now evaluate the trained models on the held-out test set obtained previously and plot their ROC curves.

In [None]:
results = Evaluate().execute(test, target=target, models=models)
plot_roc(results)
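
For readers unfamiliar with ROC curves, the following sketch shows how the points of such a curve and the area under it (AUC) can be computed from true labels and predicted scores. It is an independent, standard-library illustration, not the implementation behind `plot_roc`; the function names `roc_points` and `auc` and the toy labels and scores are assumptions.

```python
def roc_points(y_true, y_score):
    """Return (false positive rate, true positive rate) points for a threshold sweep."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Sort by descending score so that each step lowers the decision threshold.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve, computed with the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example: two negatives and two positives with predicted scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, y_score)
area = auc(points)
print(points)
print(area)
```

An AUC of 1.0 means every positive example is scored above every negative one, while 0.5 corresponds to random ranking. The sketch assumes both classes are present in `y_true`; otherwise the rates are undefined.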

### Explain the models
Now explain the models using three explainers (feature contribution, LIME, and mimic learning), then plot the explanations for the random forest (RF) model as a heatmap.

In [None]:
explanations = Explain().execute(
    data,
    models=models,
    explainers=[explainers.FEAT_CONTRIB, explainers.LIME, explainers.MIMIC],
    target=target,
    exp_args={'test': test}
)

plot_explanation_heatmap(explanations[algorithms.RF])
