# **AIM: Build an Artificial Neural Network to implement a binary classification task using the back-propagation algorithm, and test it on an appropriate dataset.**

# **Description**

The data used here is the '**Pima Indians Diabetes Dataset**', downloaded from: https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv

It is a binary (2-class) classification problem. There are 768 observations with 8 input variables and 1 output variable.

The variable names are as follows:

**1. Number of times pregnant.**

**2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test.**

**3. Diastolic blood pressure (mm Hg).**

**4. Triceps skinfold thickness (mm).**

**5. 2-Hour serum insulin (mu U/ml).**

**6. Body mass index (weight in kg/(height in m)^2).**

**7. Diabetes pedigree function.**

**8. Age (years).**

**9. Class variable (0 or 1).**


# **Data Import and Processing**


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import sklearn

In [None]:
# load data
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv'
data_pd = pd.read_csv(url, header=None)
data_pd.info()          # info() prints its summary itself and returns None, so no print() is needed
print(data_pd.head())

StandardScaler: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

In [None]:
# Scale the 8 numerical input columns to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
std = StandardScaler()
scaled = std.fit_transform(data_pd.iloc[:,0:8])
scaled = pd.DataFrame(scaled)
scaled.head()

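With its default settings, `StandardScaler` replaces each value x in a column by z = (x - mean) / std, where the mean and the population standard deviation (ddof=0) are computed per column over the rows it is fitted on; constant columns keep a scale of 1 instead of causing a division by zero. A minimal numpy sketch of that computation on a made-up 4x2 array; `standardize` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np


def standardize(X):
    """Hypothetical helper: column-wise z-scores, mirroring StandardScaler's defaults."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0          # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std, mean, std


# Toy data: 4 rows, 2 columns (illustrative values, not from the Pima dataset)
X_toy = np.array([[1.0, 100.0],
                  [2.0, 110.0],
                  [3.0, 120.0],
                  [4.0, 130.0]])
Z, mu, sigma = standardize(X_toy)
print(mu)                 # per-column means
print(Z.mean(axis=0))     # approximately 0 in every column
print(Z.std(axis=0))      # 1 in every column
```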
In [None]:
X_data = scaled.to_numpy()
print('X_data:', np.shape(X_data))
Y_data = data_pd.iloc[:, 8].to_numpy()   # class labels (0 or 1) as a NumPy array, so y_test[:5] slices by position
print('Y_data:', np.shape(Y_data))

In [None]:
# Split data into X_train, X_test, y_train, y_test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_data, Y_data, test_size=0.25, random_state=0)

In [None]:
# Check the dimension of the sets
print('X_train:',np.shape(X_train))
print('y_train:',np.shape(y_train))
print('X_test:',np.shape(X_test))
print('y_test:',np.shape(y_test))

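The cells above fit `StandardScaler` on all 768 rows before splitting, so statistics of the test rows feed into the scaling of the training data (a mild form of data leakage). A leakage-free order is to split the raw features first, then call `std.fit_transform` on the training split and `std.transform` on the test split. The numpy sketch below shows that order and the split sizes implied by `test_size=0.25` (ceil(0.25 x 768) = 192 test rows, 576 training rows). The random raw data, the seeded permutation (which does not reproduce the rows chosen by `random_state=0`), and names such as `X_raw` are assumptions for illustration.

```python
import math

import numpy as np

n_samples, test_size = 768, 0.25
n_test = math.ceil(test_size * n_samples)   # train_test_split rounds the test share up
n_train = n_samples - n_test
print(n_train, n_test)

# Hypothetical raw feature matrix standing in for data_pd.iloc[:, 0:8]
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=100.0, scale=20.0, size=(n_samples, 8))

# Shuffle row indices, then split (this does NOT reproduce sklearn's random_state=0 rows)
perm = rng.permutation(n_samples)
test_idx, train_idx = perm[:n_test], perm[n_test:]

# Fit the scaling statistics on the training rows only, then apply them to both splits
mean = X_raw[train_idx].mean(axis=0)
std = X_raw[train_idx].std(axis=0)
X_train_s = (X_raw[train_idx] - mean) / std
X_test_s = (X_raw[test_idx] - mean) / std
print(X_train_s.shape, X_test_s.shape)
```

The test split is then scaled with training-set statistics, which is what the model would face on genuinely unseen patients.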
# **Design the Model**

In [None]:
import keras
from keras.models import Sequential   # importing Sequential model
from keras.layers import Dense        # importing Dense layers

In [None]:
# declaring model
basic_model = Sequential()

For a worked example, see: https://github.com/urjeet/Pima-Diabetes-Keras-Model/blob/master/pima_diabetes_keras_model.py

In [None]:
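# Adding layers to the model

# Input: each sample has 8 standardized features.
basic_model.add(keras.Input(shape=(8,)))

# First layer: 8 neurons/perceptrons that take the input and use the 'sigmoid' activation function.
basic_model.add(Dense(8, activation='sigmoid'))

# Second layer: 4 neurons/perceptrons, 'sigmoid' activation function.
basic_model.add(Dense(4, activation='sigmoid'))

# Final layer: 1 neuron/perceptron with 'sigmoid' activation, outputting P(class = 1) for binary classification.
basic_model.add(Dense(1, activation='sigmoid'))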

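The 8-8-4-1 stack described in the layer comments can be checked by hand: a Dense layer with n_in inputs and n_out units holds n_in x n_out weights plus n_out biases, so `basic_model.summary()` should report 72 + 36 + 5 = 113 trainable parameters. The numpy sketch below computes that count and runs one forward pass (Dense, then sigmoid, per layer) on made-up inputs. The random weights, zero biases, and the helper names are assumptions chosen only to show the shapes.

```python
import numpy as np

layer_sizes = [8, 8, 4, 1]   # input features, first layer, second layer, output
# Each Dense layer holds (inputs x units) weights plus one bias per unit
params = [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(params, sum(params))   # [72, 36, 5] 113


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Assumed random weights and zero biases, only to show the shapes flowing through the stack
rng = np.random.default_rng(42)
weights = [rng.normal(scale=0.5, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]


def forward(X):
    a = X
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)   # Dense layer followed by sigmoid activation
    return a


X_batch = rng.normal(size=(5, 8))   # 5 made-up standardized samples
probs = forward(X_batch)
print(probs.shape)                  # one probability per sample: (5, 1)
```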

In [None]:
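# compiling the model: binary cross-entropy suits the single sigmoid output ('adam' is an assumed optimizer choice)
basic_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])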

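With one sigmoid output p per sample, binary cross-entropy averages -[y log p + (1 - y) log(1 - p)] over the batch, so confident wrong predictions are penalized heavily. A minimal numpy sketch of that formula; clipping p to [eps, 1 - eps] with eps = 1e-7 is an assumption modelled on Keras's default epsilon, and the example probabilities are made up.

```python
import numpy as np


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; clipping p avoids log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))


print(binary_cross_entropy([1, 0], [0.5, 0.5]))   # ln 2, about 0.6931
print(binary_cross_entropy([1, 0], [0.9, 0.1]))   # confident and correct: small loss
print(binary_cross_entropy([1, 0], [0.1, 0.9]))   # confident and wrong: large loss
```

An untrained network that outputs about 0.5 for every sample therefore starts near a loss of 0.69, which is a useful sanity check for the first epoch in the training log.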

# **Train the Model**

In [None]:
# training the model
epochs=120
history = basic_model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs)

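`fit` runs back-propagation internally through automatic differentiation. To make the algorithm named in the AIM explicit, the numpy sketch below writes it out by hand for the same 8-8-4-1 sigmoid network with binary cross-entropy loss. It is an illustration under stated assumptions, not what Keras executes: it uses synthetic data instead of the Pima features, plain full-batch gradient descent instead of the optimizer passed to `compile`, and an assumed initialization, learning rate and epoch count. A finite-difference check confirms one analytic gradient entry.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(layer_sizes, rng):
    # Assumed init: N(0, 0.5^2) weights and zero biases (Keras defaults to Glorot uniform)
    Ws = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
    bs = [np.zeros(n) for n in layer_sizes[1:]]
    return Ws, bs


def forward(X, Ws, bs):
    """Return the activations of every layer; the last entry is p = P(class 1)."""
    activations = [X]
    for W, b in zip(Ws, bs):
        activations.append(sigmoid(activations[-1] @ W + b))
    return activations


def bce_loss(y, p, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def backward(y, activations, Ws):
    """Back-propagate the mean BCE loss and return the gradients for every W and b."""
    n = y.shape[0]
    # Sigmoid output + BCE loss: dL/dz at the output simplifies to (p - y) / n
    delta = (activations[-1] - y) / n
    grads_W, grads_b = [None] * len(Ws), [None] * len(Ws)
    for layer in reversed(range(len(Ws))):
        grads_W[layer] = activations[layer].T @ delta
        grads_b[layer] = delta.sum(axis=0)
        if layer > 0:
            a = activations[layer]
            # Chain rule through the hidden sigmoid: sigma'(z) = a * (1 - a)
            delta = (delta @ Ws[layer].T) * a * (1.0 - a)
    return grads_W, grads_b


rng = np.random.default_rng(0)
# Synthetic stand-in for the standardized Pima features (NOT the real dataset)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float).reshape(-1, 1)
Ws, bs = init_params([8, 8, 4, 1], rng)

# Gradient check: compare one analytic gradient entry with a central finite difference
check_W, _ = backward(y, forward(X, Ws, bs), Ws)
analytic = check_W[0][0, 0]
h = 1e-5
Ws[0][0, 0] += h
loss_plus = bce_loss(y, forward(X, Ws, bs)[-1])
Ws[0][0, 0] -= 2 * h
loss_minus = bce_loss(y, forward(X, Ws, bs)[-1])
Ws[0][0, 0] += h                     # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * h)
print(f"analytic={analytic:.8f}  numeric={numeric:.8f}")

# Full-batch gradient descent driven by the back-propagated gradients
lr, n_epochs = 0.5, 3000
initial_loss = bce_loss(y, forward(X, Ws, bs)[-1])
for _ in range(n_epochs):
    grads_W, grads_b = backward(y, forward(X, Ws, bs), Ws)
    for i in range(len(Ws)):
        Ws[i] -= lr * grads_W[i]
        bs[i] -= lr * grads_b[i]
p_final = forward(X, Ws, bs)[-1]
final_loss = bce_loss(y, p_final)
accuracy = np.mean((p_final >= 0.5) == y)
print(f"loss {initial_loss:.4f} -> {final_loss:.4f}, training accuracy {accuracy:.3f}")
```

The key design point is that the error signal `delta` is computed once at the output and then pushed backwards layer by layer, reusing each layer's stored activation, so every gradient costs roughly one extra matrix product.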
# **Evaluate the Model**

In [None]:
# plot loss vs epochs
epochRange = range(1, epochs + 1)
plt.plot(epochRange,history.history['loss'])
plt.plot(epochRange,history.history['val_loss'])
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid()
plt.xlim((1,epochs))
plt.legend(['Train','Test'])
plt.show()

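If the validation ('Test') curve stops falling and starts rising while the training curve keeps dropping, the network is beginning to overfit. A minimal sketch for locating the epoch with the lowest validation loss; `best_epoch` is a hypothetical helper and the loss values are made up. In the notebook it would be called as `best_epoch(history.history['val_loss'])`. Keras can also stop training automatically via `keras.callbacks.EarlyStopping(monitor='val_loss', restore_best_weights=True)`.

```python
import numpy as np


def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss, matching epochRange above."""
    return int(np.argmin(val_loss)) + 1


# Made-up curve: validation loss falls, then rises as the model starts to overfit
val_loss_example = [0.69, 0.60, 0.52, 0.49, 0.48, 0.50, 0.53]
print(best_epoch(val_loss_example))   # 5
```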
In [None]:
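# Plot accuracy vs epochs (requires metrics=['accuracy'] in compile)
plt.plot(epochRange, history.history['accuracy'], label='Train')
plt.plot(epochRange, history.history['val_accuracy'], label='Test')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()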
In [None]:
# Evaluate loss and accuracy on the test set
loss_and_metrics = basic_model.evaluate(X_test, y_test)
print('Loss = ',loss_and_metrics[0])
print('Accuracy = ',loss_and_metrics[1])

## Classification Model Performance measures

<img src='https://editor.analyticsvidhya.com/uploads/99666confusion%20matrix.JPG' width=40%>

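The four cells of a confusion matrix can be counted directly from 0/1 labels. The sketch below uses made-up labels (not model output) and treats class 1 as the positive class; `confusion_counts` is a hypothetical helper. Once `y_pred` has been thresholded to 0/1, `sklearn.metrics.confusion_matrix(y_test, y_pred).ravel()` returns the same four numbers in the order TN, FP, FN, TP.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TN, FP, FN, TP) for 0/1 labels, with class 1 as the positive class."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    return tn, fp, fn, tp


y_true_toy = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred_toy = [0, 1, 1, 0, 1, 0, 1, 0]
tn, fp, fn, tp = confusion_counts(y_true_toy, y_pred_toy)
# Rows are true classes (0, 1), columns are predicted classes (0, 1), as in sklearn's layout
print(np.array([[tn, fp], [fn, tp]]))
```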
In [None]:
y_pred = basic_model.predict(X_test)
print(y_test[:5])
print(y_pred[:5])

In [None]:
y_pred = (y_pred >= 0.5).astype(int).ravel()   # threshold the sigmoid probabilities at 0.5
print(y_pred[:5])

In [None]:
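from sklearn.metrics import classification_report   # `import sklearn` alone does not guarantee sklearn.metrics is loaded
print(classification_report(y_test, y_pred))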
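`classification_report` lists, for each class c, precision = TP / (TP + FP), recall = TP / (TP + FN), their harmonic mean F1, and the support (number of true samples of c), followed by macro averages (plain mean over classes) and weighted averages (weighted by support). The numpy sketch below recomputes those numbers for made-up labels so the definitions are explicit; `per_class_report` is a hypothetical helper, and returning 0.0 on a zero denominator is an assumed convention that matches the value sklearn reports by default (sklearn additionally warns).

```python
import numpy as np


def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Precision, recall, F1 and support per class, plus macro and weighted averages."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rows = {}
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        predicted = np.sum(y_pred == c)     # TP + FP for class c
        actual = np.sum(y_true == c)        # TP + FN for class c (the support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows[c] = (precision, recall, f1, int(actual))
    metrics = np.array([rows[c][:3] for c in labels])
    support = np.array([rows[c][3] for c in labels])
    rows["macro avg"] = tuple(metrics.mean(axis=0)) + (int(support.sum()),)
    rows["weighted avg"] = tuple(np.average(metrics, axis=0, weights=support)) + (int(support.sum()),)
    return rows


report = per_class_report([0, 0, 1, 1, 1, 0, 1, 0, 0, 0],
                          [0, 1, 1, 0, 1, 0, 1, 0, 0, 1])
for name, (p, r, f, s) in report.items():
    print(f"{name!s:>12}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}  support={s}")
```

For binary labels the weighted-average recall always equals overall accuracy, because each class's recall is weighted by exactly the number of true samples it is computed over.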