# Fitting an ANN with *scikit-learn*

In this exercise we will build a simple neural network (NN) to detect diabetes using the well-known Pima Indians Diabetes dataset.

The dataset contains 768 observations and 9 variables, as described below:

1. `pregnancies` - Number of times pregnant.
2. `glucose` - Plasma glucose concentration.
3. `diastolic` - Diastolic blood pressure (mm Hg). This column is named `dloodPressure` in the data file used below.
4. `triceps` - Skinfold thickness (mm).
5. `insulin` - 2-hour serum insulin (mu U/ml).
6. `bmi` - Body mass index (weight in kg / (height in m)^2).
7. `dpf` - Diabetes pedigree function.
8. `age` - Age in years.
9. `diabetes` - “1” represents the presence of diabetes while “0” represents its absence. This is the target variable; it is named `outcome` in the data file used below.

We will proceed through the following steps:

1. Loading the required libraries and modules.
2. Reading the data and performing basic data checks.
3. Creating arrays for the features and the response variable.
4. Creating the training and test datasets.
5. Building, predicting with, and evaluating the neural network model.

## Loading required libraries and modules

The notebook uses standard Python libraries that may already be installed on your system. If they are not, you can install them with the `pip` command from the Anaconda console.

`scikit-learn` is a high-level library that provides functions for building and validating many types of machine learning models.
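
A quick way to see whether the libraries are available before running the notebook is to look them up without importing them. The sketch below uses only the standard library; the list of package names is an assumption based on the imports in the next cell.

```python
import importlib.util

# Import names assumed from the imports cell below; adjust the list as needed.
required = ["pandas", "numpy", "matplotlib", "sklearn"]

# find_spec returns None when a top-level module cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Note that the module `sklearn` is distributed under the package name `scikit-learn`, so it would be installed with `pip install scikit-learn`.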

In [263]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
from sklearn.neural_network import MLPClassifier

# Import the modules needed to split the data and evaluate a classifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

## Reading the data and performing basic checks



In [264]:
df = pd.read_csv('diabetes.csv') 
print(df.shape)
df.describe().transpose()

(768, 9)


| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| pregnancies | 768.0 | 3.845052 | 3.369578 | 0.0 | 1.0 | 3.0 | 6.0 | 17.0 |
| glucose | 768.0 | 120.894531 | 31.972618 | 0.0 | 99.0 | 117.0 | 140.25 | 199.0 |
| dloodPressure | 768.0 | 69.105469 | 19.355807 | 0.0 | 62.0 | 72.0 | 80.0 | 122.0 |
| triceps | 768.0 | 20.536458 | 15.952218 | 0.0 | 0.0 | 23.0 | 32.0 | 99.0 |
| insulin | 768.0 | 79.799479 | 115.244002 | 0.0 | 0.0 | 30.5 | 127.25 | 846.0 |
| bmi | 768.0 | 31.992578 | 7.88416 | 0.0 | 27.3 | 32.0 | 36.6 | 67.1 |
| dpf | 768.0 | 0.471876 | 0.331329 | 0.078 | 0.24375 | 0.3725 | 0.62625 | 2.42 |
| age | 768.0 | 33.240885 | 11.760232 | 21.0 | 24.0 | 29.0 | 41.0 | 81.0 |
| outcome | 768.0 | 0.348958 | 0.476951 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |


In [265]:
df.head()

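| | pregnancies | glucose | dloodPressure | triceps | insulin | bmi | dpf | age | outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |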


## Creating Arrays for the Features and the Response Variable

The code below sets up the features and the target used to build the predictor:
- `target_column` holds the name of the target variable, `outcome`.
- `predictors` lists all the features, that is, every column except `outcome`.
- `pred_scaled` normalizes the predictors by dividing each column by its maximum. Since no feature takes negative values, all the independent variables are now scaled between 0 and 1. The target variable remains unchanged.
- A `StandardScaler` is also instantiated as an alternative way of scaling, but the line that would apply it is commented out.
- The following cell displays the normalized data.

In [266]:
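from sklearn import preprocessing

# Name of the target variable
target_column = ['outcome'] 
# All remaining columns are features (a set difference does not preserve column order)
predictors = list(set(list(df.columns))-set(target_column))
# Scale each feature to [0, 1] by dividing by its column maximum
pred_scaled = df[predictors]/df[predictors].max()
# Alternative: standardize to zero mean and unit variance (not applied here)
scaler = preprocessing.StandardScaler()
# pred_scaled = scaler.fit_transform(df[predictors])
# df.head()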

In [267]:
pd.DataFrame(pred_scaled)

| | glucose | insulin | bmi | dpf | dloodPressure | pregnancies | age | triceps |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.743719 | 0.000000 | 0.500745 | 0.259091 | 0.590164 | 0.352941 | 0.617284 | 0.353535 |
| 1 | 0.427136 | 0.000000 | 0.396423 | 0.145041 | 0.540984 | 0.058824 | 0.382716 | 0.292929 |
| 2 | 0.919598 | 0.000000 | 0.347243 | 0.277686 | 0.524590 | 0.470588 | 0.395062 | 0.000000 |
| 3 | 0.447236 | 0.111111 | 0.418778 | 0.069008 | 0.540984 | 0.058824 | 0.259259 | 0.232323 |
| 4 | 0.688442 | 0.198582 | 0.642325 | 0.945455 | 0.327869 | 0.000000 | 0.407407 | 0.353535 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 763 | 0.507538 | 0.212766 | 0.490313 | 0.070661 | 0.622951 | 0.588235 | 0.777778 | 0.484848 |
| 764 | 0.613065 | 0.000000 | 0.548435 | 0.140496 | 0.573770 | 0.117647 | 0.333333 | 0.272727 |
| 765 | 0.608040 | 0.132388 | 0.390462 | 0.101240 | 0.590164 | 0.294118 | 0.370370 | 0.232323 |
| 766 | 0.633166 | 0.000000 | 0.448584 | 0.144215 | 0.491803 | 0.058824 | 0.580247 | 0.000000 |
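
The scaled values shown above can be checked by hand. The sketch below is a minimal pure-Python illustration, not the notebook's pandas code: it takes the first three `glucose` values from `df.head()` and the column maximum (199) from `df.describe()`, and divides one by the other. The helper name `max_scale` is introduced here for illustration only.

```python
# First three glucose values from df.head() and the column maximum from df.describe().
glucose = [148, 85, 183]
glucose_max = 199.0

def max_scale(values, column_max):
    """Divide every value by the column maximum."""
    return [v / column_max for v in values]

scaled = max_scale(glucose, glucose_max)
print([round(s, 6) for s in scaled])  # [0.743719, 0.427136, 0.919598]
```

This differs from min-max scaling, which also subtracts the column minimum. For columns whose minimum is not 0, such as `age` and `dpf`, dividing by the maximum leaves the smallest scaled value above 0.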


## Creating the test and training sets

The code below first counts the positive cases in the target variable and then splits the dataset into training (70%) and test (30%) subsets using the high-level function `train_test_split`.

In [268]:
df['outcome'].sum()

268
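
The count above tells us how unbalanced the classes are, which is useful for judging the classifier later. As a small worked calculation (using only the figures already shown: 768 rows and 268 positive cases), the sketch below computes the share of positive cases and the accuracy of a trivial model that always predicts the majority class.

```python
n_total = 768      # rows in the dataset
n_positive = 268   # result of df['outcome'].sum()
n_negative = n_total - n_positive

positive_rate = n_positive / n_total
# Accuracy of a trivial model that always predicts the majority class (0)
baseline_accuracy = max(n_positive, n_negative) / n_total

print(round(positive_rate, 6))      # 0.348958
print(round(baseline_accuracy, 6))  # 0.651042
```

The positive rate matches the mean of `outcome` reported by `df.describe()`. Any useful model should reach an accuracy clearly above the baseline of about 65%.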

In [269]:
X = pred_scaled
y = df['outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)
print(X_train.shape); print(X_test.shape)
print(y_train.shape); print(y_test.shape)

(537, 8)
(231, 8)
(537,)
(231,)
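
To see where the sizes 537 and 231 come from, the following sketch mimics the idea of a random 70/30 split with the standard library only. It is a conceptual illustration and not scikit-learn's implementation: the shuffling differs, so the rows selected will not match those chosen by `train_test_split`. The test share is rounded up here, which reproduces the sizes printed above.

```python
import math
import random

n_samples = 768
test_size = 0.30

# Round the test share up, so that 0.30 * 768 = 230.4 gives 231 test rows.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Shuffle the row indices with a fixed seed, then cut the list in two.
indices = list(range(n_samples))
random.Random(40).shuffle(indices)
test_idx, train_idx = indices[:n_test], indices[n_test:]

print(len(train_idx), len(test_idx))  # 537 231
```

Fixing the seed, as `random_state=40` does in the notebook, makes the split reproducible from one run to the next.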


## Building the Neural Network

The neural network model is built using scikit-learn's estimator object for the *Multi-Layer Perceptron Classifier*. 

- The first line of code (shown below) imports `MLPClassifier`.

- The second line instantiates the model with the following setup:
    -  `hidden_layer_sizes` is set to `(8,)`: a single hidden layer with as many neurons as there are features in the dataset.
    -  the activation function is set to 'logistic'.
    -  the solver (optimizer) is set to 'lbfgs', with at most 1000 iterations.

- The third line of code fits the model to the training data.

- Finally, the fitted model predicts on the training data and the confusion matrix is printed.

In [270]:
X_train

| | glucose | insulin | bmi | dpf | dloodPressure | pregnancies | age | triceps |
|---|---|---|---|---|---|---|---|---|
| 402 | 0.683417 | 0.104019 | 0.521610 | 0.118182 | 0.688525 | 0.294118 | 0.432099 | 0.414141 |
| 748 | 0.939698 | 0.236407 | 0.542474 | 0.168595 | 0.573770 | 0.176471 | 0.444444 | 0.222222 |
| 606 | 0.909548 | 0.346336 | 0.596125 | 0.519835 | 0.639344 | 0.058824 | 0.271605 | 0.424242 |
| 253 | 0.432161 | 0.000000 | 0.533532 | 0.098347 | 0.557377 | 0.000000 | 0.308642 | 0.323232 |
| 361 | 0.793970 | 0.000000 | 0.444113 | 0.085537 | 0.573770 | 0.294118 | 0.777778 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 440 | 0.949749 | 0.000000 | 0.511177 | 0.179752 | 0.852459 | 0.000000 | 0.506173 | 0.252525 |
| 165 | 0.522613 | 0.184397 | 0.445604 | 0.298347 | 0.606557 | 0.352941 | 0.506173 | 0.181818 |
| 7 | 0.577889 | 0.000000 | 0.526080 | 0.055372 | 0.000000 | 0.588235 | 0.358025 | 0.000000 |
| 219 | 0.562814 | 0.000000 | 0.563338 | 0.107851 | 0.540984 | 0.294118 | 0.506173 | 0.000000 |


In [280]:
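from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

# One hidden layer with 8 neurons, logistic activation, L-BFGS solver
# (batch_size only applies to the stochastic solvers; 'lbfgs' does not use minibatches)
mlp = MLPClassifier(hidden_layer_sizes = (8,), 
                    activation = 'logistic', 
                    solver = 'lbfgs', 
                    max_iter = 1000,
                    batch_size = 64)
# y_train = column_or_1d(y_train, warn=True)
fitted = mlp.fit(X_train, y_train)

# Confusion matrix on the training data (rows: true class, columns: predicted class)
predict_train = mlp.predict(X_train)
print(confusion_matrix(y_train, predict_train))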

[[317  41]
 [ 56 123]]
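
The usual summary metrics can be read off the confusion matrix above. The sketch below is a worked calculation in plain Python, assuming scikit-learn's layout in which rows are the true classes and columns the predicted classes, so the matrix reads `[[TN, FP], [FN, TP]]`.

```python
# Training confusion matrix printed above: rows are true classes, columns predicted.
cm = [[317, 41],
      [56, 123]]
tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found

print(round(accuracy, 4), round(precision, 4), round(recall, 4))  # 0.8194 0.75 0.6872
```

On the training data the model is therefore right about 82% of the time, but it still misses roughly 31% of the actual diabetes cases.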


In [241]:
mlp.coefs_

[array([[  8.12449678, -24.41027455,  -9.60421012, -24.93885738,
         -59.45760304, -59.07836272,  -4.72429356, -57.35673568],
        [  6.44645266, -40.32310698, -59.99330075, -55.76374263,
          14.93555669,   7.69152366,   5.99737188, -13.76571543],
        [ 18.49885551, -16.44901068,  42.70065744, -63.89723842,
          -6.25173006,  -9.72609964,  -4.02998066,  41.6067245 ],
        [-21.0176215 , -40.45678771,  42.65418196,   7.50157305,
         -95.32870936, -22.91233022, -37.41093163, -27.8969419 ],
        [ -3.6309969 ,  14.42694074, -26.32301232,  -7.35117144,
           2.01591876,  35.50960444, -11.84261993, -35.49181269],
        [-56.87178499, -42.03666058, -59.65194968, -10.81977177,
           6.78170888, -19.06741239,  51.38147966, -72.62716764],
        [ 56.08468658, -21.24723444,  20.59220012,   9.6999276 ,
          11.79418986, -86.41716214, -21.09760761,  29.64507588],
        [-25.92745049, -23.97048637,   2.32878738,  44.8552203 ,
         -25.23823
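
`coefs_` is a list of weight matrices, one per layer; the first one, partly shown above, connects the 8 inputs to the 8 hidden units. To illustrate how such weights turn one observation into a prediction, here is a minimal pure-Python sketch of a forward pass through a network of this shape. The weights and biases are hypothetical placeholders, not the fitted values, and the function names are introduced for illustration only. It assumes a logistic output unit, which is what `MLPClassifier` uses for binary targets.

```python
import math

def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Probability of class 1 for one observation x.

    w_hidden[i][j] connects input i to hidden unit j; w_out[j] connects
    hidden unit j to the single output unit.
    """
    hidden = [
        logistic(sum(x[i] * w_hidden[i][j] for i in range(len(x))) + b_hidden[j])
        for j in range(len(b_hidden))
    ]
    return logistic(sum(h * w for h, w in zip(hidden, w_out)) + b_out)

# First row of the scaled predictors shown earlier.
x = [0.743719, 0.0, 0.500745, 0.259091, 0.590164, 0.352941, 0.617284, 0.353535]

# Hypothetical weights, NOT the fitted values: 8 inputs, 8 hidden units, 1 output.
w_hidden = [[0.1] * 8 for _ in range(8)]
b_hidden = [0.0] * 8
w_out = [0.5] * 8
b_out = -2.0

p = forward(x, w_hidden, b_hidden, w_out, b_out)
print(p, int(p >= 0.5))  # probability of class 1 and the resulting class label
```

In the fitted model the corresponding values would come from `mlp.coefs_` and `mlp.intercepts_`.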

## Predicting and validating

Finally, we use the trained model to generate predictions on the training and test datasets.

In [140]:
predict_train = mlp.predict(X_train)
predict_test = mlp.predict(X_test)


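We then inspect the predictions on the training set and print the confusion matrices for the training and test sets. In the run shown below, the model predicts class 0 for every observation, so the second column of each confusion matrix is empty and no case of diabetes is detected. These outputs differ from the confusion matrix printed after fitting, which suggests that they come from a different fit of the model.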

In [141]:
predict_train

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

In [142]:
from sklearn.metrics import classification_report,confusion_matrix
# Flatten the targets to 1-D arrays before comparing them with the predictions
y_train2 = np.ravel(y_train)
print(confusion_matrix(y_train2, predict_train))
# print(classification_report(y_train,predict_train))
y_test2 = np.ravel(y_test)
print(confusion_matrix(y_test2, predict_test))
# print(classification_report(y_test,predict_test))

[[358   0]
 [179   0]]
[[142   0]
 [ 89   0]]
