# **AIM: Build an Artificial Neural Network for a Binary Classification task using the Back-propagation algorithm, and test it on an appropriate data set.**

# **Description**

The data used here is the **Pima Indians Diabetes Dataset**, downloaded from https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv

It is a binary (2-class) classification problem. There are 768 observations with 8 input variables and 1 output variable.

The variable names are as follows:

**1. Number of times pregnant.**

**2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test.**

**3. Diastolic blood pressure (mm Hg).**

**4. Triceps skinfold thickness (mm).**

**5. 2-Hour serum insulin (mu U/ml).**

**6. Body mass index (weight in kg/(height in m)^2).**

**7. Diabetes pedigree function.**

**8. Age (years).**

**9. Class variable (0 or 1).**


# **Data Import and Processing**


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import sklearn

In [3]:
# Load the data (the CSV file has no header row, so columns are labelled 0 to 8)
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv'
data_pd = pd.read_csv(url, header=None)
print(data_pd.info())
print(data_pd.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       768 non-null    int64  
 1   1       768 non-null    int64  
 2   2       768 non-null    int64  
 3   3       768 non-null    int64  
 4   4       768 non-null    int64  
 5   5       768 non-null    float64
 6   6       768 non-null    float64
 7   7       768 non-null    int64  
 8   8       768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
   0    1   2   3    4     5      6   7  8
0  6  148  72  35    0  33.6  0.627  50  1
1  1   85  66  29    0  26.6  0.351  31  0
2  8  183  64   0    0  23.3  0.672  32  1
3  1   89  66  23   94  28.1  0.167  21  0
4  0  137  40  35  168  43.1  2.288  33  1
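
Because the file has no header row, the frame above uses the integer labels 0 to 8. The following is a minimal sketch of attaching descriptive names at load time. The short names are illustrative choices (an assumption, not part of the dataset file), and the five rows are copied from the `head()` output above so that the sketch runs without a network connection.

```python
import io
import pandas as pd

# Illustrative short names for the nine variables listed in the description
# (these labels are an assumption; the CSV itself has no header row).
column_names = [
    'pregnancies', 'glucose', 'blood_pressure', 'skin_thickness',
    'insulin', 'bmi', 'pedigree', 'age', 'outcome',
]

# The first five rows of the dataset, copied from the head() output above.
sample_csv = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
    "8,183,64,0,0,23.3,0.672,32,1\n"
    "1,89,66,23,94,28.1,0.167,21,0\n"
    "0,137,40,35,168,43.1,2.288,33,1\n"
)

named = pd.read_csv(sample_csv, header=None, names=column_names)
print(named.dtypes)
print(named['outcome'].value_counts())
```

With the real file, the same `names=column_names` argument could be passed to the existing `pd.read_csv` call; cells that select columns by position with `iloc` would keep working unchanged.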


The input features are standardized with scikit-learn's `StandardScaler`. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

In [4]:
# Scale the 8 numerical input columns to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
std = StandardScaler()
scaled = std.fit_transform(data_pd.iloc[:,0:8])
scaled = pd.DataFrame(scaled)
scaled.head()

          0         1         2         3         4         5         6         7
0  0.639947  0.848324  0.149641  0.907270 -0.692891  0.204013  0.468492  1.425995
1 -0.844885 -1.123396 -0.160546  0.530902 -0.692891 -0.684422 -0.365061 -0.190672
2  1.233880  1.943724 -0.263941 -1.288212 -0.692891 -1.103255  0.604397 -0.105584
3 -0.844885 -0.998208 -0.160546  0.154533  0.123302 -0.494043 -0.920763 -1.041549
4 -1.141852  0.504055 -1.504687  0.907270  0.765836  1.409746  5.484909 -0.020496
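
To make the scaling step concrete, here is a small NumPy sketch of the same column-wise transformation: subtract each column's mean and divide by its population standard deviation (`ddof=0`, which is what `StandardScaler` uses). The helper name `standardize` and the toy matrix are made up for illustration, and the sketch does not handle zero-variance columns.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: subtract the mean, divide by the population std."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # ddof=0, the population standard deviation
    return (X - mean) / std

# Toy matrix standing in for the feature columns (values are made up).
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 60.0]])
Z = standardize(X_toy)
print(Z.mean(axis=0))   # approximately 0 for every column
print(Z.std(axis=0))    # approximately 1 for every column
```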


In [5]:
X_data = scaled.to_numpy()
print('X_data:',np.shape(X_data))
Y_data = data_pd.iloc[:,8]
print('Y_data:',np.shape(Y_data))

X_data: (768, 8)
Y_data: (768,)


In [None]:
# Split data into X_train, X_test, y_train, y_test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_data, Y_data, test_size=0.25, random_state=0)

In [None]:
# Check the dimension of the sets
print('X_train:',np.shape(X_train))
print('y_train:',np.shape(y_train))
print('X_test:',np.shape(X_test))
print('y_test:',np.shape(y_test))
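
With 768 rows and `test_size=0.25`, the test set gets 192 rows and the training set the remaining 576, so the shapes printed above should come out as (576, 8), (576,), (192, 8) and (192,). The sketch below illustrates the idea of a shuffled split with a hypothetical helper, `shuffle_split_indices`. It is not scikit-learn's implementation and will not select the same rows as `train_test_split`.

```python
import math
import numpy as np

def shuffle_split_indices(n_samples, test_size, seed):
    """Return (train_idx, test_idx) after one random shuffle of the row indices."""
    n_test = math.ceil(test_size * n_samples)   # test rows are rounded up
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return order[n_test:], order[:n_test]

train_idx, test_idx = shuffle_split_indices(n_samples=768, test_size=0.25, seed=0)
print(len(train_idx), len(test_idx))   # 576 192
```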

# **Design the Model**

In [None]:
import keras
from keras.models import Sequential   # importing Sequential model
from keras.layers import Dense        # importing Dense layers

In [None]:
# declaring model
basic_model = Sequential()

For a worked example, see: https://github.com/urjeet/Pima-Diabetes-Keras-Model/blob/master/pima_diabetes_keras_model.py

In [None]:
# Add layers to the model (DIY)

# First layer: 8 neurons (perceptrons) that take the 8 input features and use the 'sigmoid' activation function.

# Second layer: 4 neurons (perceptrons) with the 'sigmoid' activation function.

# Final layer: 1 neuron (perceptron) that produces the binary classification output.
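
Before writing the Keras layers, it can help to see what the 8-8-4-1 architecture described above computes. The following NumPy sketch is not the Keras solution to the exercise. It only traces a forward pass and counts the trainable parameters, and it assumes a sigmoid on the output neuron as well (the exercise text does not name the output activation). The random weights and input rows are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [8, 8, 4, 1]   # inputs, first layer, second layer, output

# One (weights, bias) pair per layer, randomly initialised for illustration.
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(X, params):
    """Propagate a batch through every layer, applying sigmoid each time."""
    activation = X
    for W, b in params:
        activation = sigmoid(activation @ W + b)
    return activation

X_batch = rng.normal(size=(5, 8))          # five made-up standardized rows
probs = forward(X_batch, params)
n_params = sum(W.size + b.size for W, b in params)
print(probs.shape)    # (5, 1)
print(n_params)       # 113
```

The parameter count breaks down as 8*8+8 = 72, 8*4+4 = 36 and 4*1+1 = 5, which is the total that `basic_model.summary()` should report once the three `Dense` layers are added.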


In [None]:
# compiling the model (DIY)
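
The evaluation cell further down reads `loss_and_metrics[0]` as the loss and `loss_and_metrics[1]` as the accuracy, so the compile step needs a loss function and an accuracy metric. Binary cross-entropy is the usual loss for a single sigmoid output; treating it as the loss here is an assumption, since the compile cell is left as an exercise. The NumPy sketch below shows what the two quantities measure, using made-up labels and probabilities.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1.0 - y_true) * np.log(1.0 - y_prob)))

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of rows whose thresholded probability matches the label."""
    return float(np.mean((y_prob >= threshold).astype(int) == y_true))

y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.6])   # made-up model outputs
bce = binary_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(bce)
print(acc)   # 0.5
```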


# **Train the Model**

In [None]:
# Train the model, tracking loss on the test set after every epoch
epochs = 120
history = basic_model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs)
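
Behind `fit`, each update runs a forward pass, propagates the error backwards through the layers, and adjusts the weights. The sketch below shows that loop in plain NumPy for a smaller network (one hidden layer of 4 sigmoid units) on made-up toy data. It uses full-batch gradient descent with a fixed learning rate, which is a simplification: Keras trains on mini-batches with whichever optimizer is chosen in the compile step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Made-up, linearly separable toy data: label is 1 when the two features sum above 0.
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

# One hidden layer of 4 sigmoid units and a sigmoid output unit.
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.5
losses = []
for epoch in range(300):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    p_safe = np.clip(p, 1e-7, 1 - 1e-7)
    losses.append(float(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))))

    # Backward pass: with a sigmoid output and cross-entropy loss,
    # the gradient at the output pre-activation is (p - y).
    d_out = (p - y) / len(X)
    grad_W2 = h.T @ d_out
    grad_b2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * h * (1 - h)   # chain rule through the hidden sigmoid
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0, keepdims=True)

    # Gradient-descent update
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1

print('first loss:', losses[0])
print('last loss:', losses[-1])
```

The list `losses` plays the same role as `history.history['loss']`: one training-loss value per epoch.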

# **Evaluate the Model**

In [6]:
# Plot training and test loss against the epoch number
epoch_range = range(1, epochs + 1)
plt.plot(epoch_range, history.history['loss'])
plt.plot(epoch_range, history.history['val_loss'])
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid()
plt.xlim((1, epochs))
plt.legend(['Train', 'Test'])
plt.show()


In [None]:
# Plot accuracy vs epochs (DIY)


In [None]:
# Loss and accuracy on the test set
loss_and_metrics = basic_model.evaluate(X_test, y_test)
print('Loss = ',loss_and_metrics[0])
print('Accuracy = ',loss_and_metrics[1])

## Classification Model Performance measures

<img src='https://editor.analyticsvidhya.com/uploads/99666confusion%20matrix.JPG' width=40%>

In [None]:
y_pred = basic_model.predict(X_test)
print(y_test[:5])
print(y_pred[:5])

In [None]:
# Threshold the predicted probabilities at 0.5 to obtain class labels (0 or 1)
y_pred = (y_pred.ravel() >= 0.5).astype(int)
print(y_pred[:5])

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
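
The precision, recall and F1 values in the report all derive from the four confusion-matrix counts pictured above. A minimal pure-Python sketch for the positive class follows. The helper name `binary_report` and the toy labels are made up for illustration, and the real report also lists the same metrics for class 0 together with averages.

```python
def binary_report(y_true, y_pred):
    """Confusion-matrix counts and the metrics derived from them for class 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {'tp': tp, 'tn': tn, 'fp': fp, 'fn': fn,
            'precision': precision, 'recall': recall, 'f1': f1, 'accuracy': accuracy}

# Made-up labels and predictions, only to exercise the function.
y_true_toy = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_toy = [1, 0, 0, 1, 0, 1, 1, 0]
report = binary_report(y_true_toy, y_pred_toy)
print(report)
```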