## TensorFlow
One of the main reasons for the success of neural networks is the availability of software packages designed specifically for building and training neural network models. These packages can train a model on a range of hardware, including CPUs, GPUs, and TPUs.  

In this notebook, we use TensorFlow to train a neural network model. The other main package for neural networks is PyTorch, a Python library that grew out of Torch.  
First, we need to install the package on the machine.

In [2]:
!pip3 install tensorflow

Collecting tensorflow
  Downloading tensorflow-2.6.0-cp38-cp38-macosx_10_11_x86_64.whl (199.0 MB)
Collecting grpcio<2.0,>=1.37.0
  Downloading grpcio-1.41.1-cp38-cp38-macosx_10_10_x86_64.whl (3.9 MB)
Collecting astunparse~=1.6.3
  Using cached astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting opt-einsum~=3.3.0
  Using cached opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Collecting google-pasta~=0.2
  Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting gast==0.4.0
  Using cached gast-0.4.0-py3-none-any.whl (9.8 kB)
Collecting keras~=2.6
  Downloading keras-2.6.0-p

## Binary Classification with Neural Networks
In this notebook, we consider a simple problem in healthcare: can we predict the onset of diabetes from other health measurements? For this problem we use the **Pima Indians onset of diabetes** dataset from the **UCI Machine Learning** repository. It is a structured (tabular) dataset.

### Pima Indians Onset of Diabetes Dataset
This dataset has 768 rows. The dependent variable, Onset, takes the value 1 for onset of diabetes and 0 for no sign of the disease.

#### Features
This dataset has 8 input variables. All features are numeric and there are no missing (null) values.

1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function (history in relatives)
8. Age (years)



### Reading the Data

In [1]:
import pandas as pd
df = pd.read_csv('pima-indians-diabetes.data.csv',header=None)
col_list = ['Pregnancy','Glucose','Blood_Pressure','Skin_fold','Insulin','BMI','Relatives_History','Age','Onset']
df.columns = col_list
print(df.shape)
print(df.info())
df.head()

(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Pregnancy          768 non-null    int64  
 1   Glucose            768 non-null    int64  
 2   Blood_Pressure     768 non-null    int64  
 3   Skin_fold          768 non-null    int64  
 4   Insulin            768 non-null    int64  
 5   BMI                768 non-null    float64
 6   Relatives_History  768 non-null    float64
 7   Age                768 non-null    int64  
 8   Onset              768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None


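|   | Pregnancy | Glucose | Blood_Pressure | Skin_fold | Insulin | BMI | Relatives_History | Age | Onset |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |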


In [2]:
# First, we separate the input and output data.
df_y = pd.DataFrame(df['Onset'])
df_x = df[['Pregnancy','Glucose','Blood_Pressure','Skin_fold','Insulin','BMI','Relatives_History','Age']]

### Unbalanced Dataset
There are 500 rows labeled 0 and 268 rows labeled 1, i.e., roughly twice as many 0s as 1s. This imbalance matters when we choose a probability threshold to separate 1s from 0s, and it means that a model that always predicts 0 already reaches about 65% accuracy.

In [3]:
from collections import Counter
print(Counter(df_y['Onset']))

Counter({0: 500, 1: 268})
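
To make the imbalance concrete, the following sketch computes the accuracy of a trivial majority-class predictor and one common choice of class weights from the counts printed above. The weighting formula, total / (number of classes * class count), is the "balanced" heuristic and is an assumption here, not something this notebook uses. A dictionary of this form could be passed to the class_weight argument of model.fit.

```python
# Label counts taken from the Counter output above.
counts = {0: 500, 1: 268}
total = sum(counts.values())

# Accuracy of a model that always predicts the majority class.
baseline_accuracy = max(counts.values()) / total
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2%}")

# "Balanced" class weights (an illustrative choice): the rarer class
# receives a proportionally larger weight.
class_weight = {label: total / (len(counts) * n) for label, n in counts.items()}
print(class_weight)
```

Any accuracy reported later in the notebook is best read against this baseline rather than against 50%.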


In [4]:
# Keep some data for out of sample testing

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df_x,df_y, test_size=0.2,random_state=42)
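
As a quick check on the split sizes, the arithmetic below reproduces how many rows end up in each part. It assumes that scikit-learn rounds the test size up and gives the remaining rows to the training set, which agrees with the 614 training rows reported later in this notebook.

```python
import math

n_samples = 768      # rows in the dataset
test_size = 0.2      # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)   # 153.6 is rounded up
n_train = n_samples - n_test                # the remaining rows
print(n_train, n_test)
```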

### Neural Network Model

Let's start with a feedforward network with 2 hidden layers. In TensorFlow, each layer is defined as a callable, and we create the model by composing these calls. The input layer is defined with the Input function. Then we have two hidden layers, each defined with the Dense function. The first of these layers has 12 nodes and the second has 8. A final Dense layer with a single sigmoid node outputs the predicted probability of onset.

The Model function takes the input and output layers and creates a Keras model for us (Keras is the high-level API bundled with TensorFlow). Our model needs a loss function and an optimizer (a variant of gradient descent). Here we use binary cross-entropy and Adam. The training evaluation metric is accuracy.
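
For reference, binary cross-entropy for a label y and a predicted probability p is -[y log(p) + (1 - y) log(1 - p)], averaged over the batch. A minimal NumPy sketch follows; the function name and the clipping constant are illustrative choices, not the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; eps keeps log() away from 0."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(np.mean(losses))

# Confident and correct predictions give a small loss ...
low = binary_crossentropy([1, 0], [0.9, 0.1])
# ... while confident and wrong predictions are penalized heavily.
high = binary_crossentropy([1, 0], [0.1, 0.9])
print(low, high)
```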

The summary function lists the layers of the model, the output shape of each layer, and the number of parameters in each layer. 

This is one of the simplest models that can be defined: a multilayer perceptron with two hidden layers.

In [5]:
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense,Input

input1 = Input(shape=(8,))
logits1 = Dense(units=12, activation="relu")(input1)
logits2 = Dense(units=8, activation="relu")(logits1)
#logits2 = Dropout(0.5)(logits1)
logits3 = Dense(units=1, activation="sigmoid")(logits2)

model = Model(inputs=input1, outputs=logits3)

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 8)]               0         
_________________________________________________________________
dense (Dense)                (None, 12)                108       
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 104       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 9         
Total params: 221
Trainable params: 221
Non-trainable params: 0
_________________________________________________________________
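
The parameter counts in the summary can be checked by hand: a Dense layer with n inputs and m units has n * m weights plus m biases. A short sketch of that arithmetic (the helper name is hypothetical):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [8, 12, 8, 1]  # input features, two hidden layers, output node
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(per_layer, sum(per_layer))
```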


### Learning Rate
The optimizer is created with a default learning rate. The command below prints this value.

One can change this value, or define a callback function that changes it according to a schedule. 

In [6]:
# learning_rate is the documented attribute name; lr is an older alias for it.
print(round(float(model.optimizer.learning_rate.numpy()), 5))

0.001
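
As an illustration of a schedule, the function below halves the learning rate every 5 epochs. The function name, the decay factor, and the interval are hypothetical choices. A function with this (epoch, lr) signature can be wrapped in tf.keras.callbacks.LearningRateScheduler and passed to model.fit through its callbacks argument. A fixed, non-default rate can instead be set by passing an optimizer object such as tf.keras.optimizers.Adam(learning_rate=0.01) to model.compile.

```python
def step_decay(epoch, lr):
    """Halve the learning rate at epochs 5, 10, 15, ... (epoch is 0-based)."""
    if epoch > 0 and epoch % 5 == 0:
        return lr * 0.5
    return lr

# Simulate the schedule for 20 epochs, starting from the default of 0.001.
lr = 0.001
history = []
for epoch in range(20):
    lr = step_decay(epoch, lr)
    history.append(lr)
print(history[0], history[5], history[19])

# With TensorFlow installed, the same function could be used as:
#   callback = tf.keras.callbacks.LearningRateScheduler(step_decay)
#   model.fit(..., callbacks=[callback])
```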


### Training

The number of epochs and the batch size are two of the most important hyperparameters of a neural network model. The number of epochs determines how many times the model passes over the full training data, and the batch size determines how many data points are used for each weight update within an epoch.

In [7]:
model.fit(x=X_train, y=y_train,
            validation_split=0.1,
            batch_size=10,
            shuffle=True,
            epochs=20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f97f920c160>
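
To see what these two settings mean in terms of weight updates, the sketch below counts the batches per epoch. It assumes that Keras sets aside the last validation_split fraction of the rows (rounding the training part down) and that a final, smaller batch still counts as one update. The helper name is hypothetical.

```python
import math

def count_updates(n_rows, validation_split, batch_size, epochs):
    """Return (rows used for fitting, updates per epoch, total updates)."""
    n_fit = math.floor(n_rows * (1.0 - validation_split))
    steps_per_epoch = math.ceil(n_fit / batch_size)
    return n_fit, steps_per_epoch, steps_per_epoch * epochs

# Settings of the fit call above: 614 training rows, batches of 10, 20 epochs.
print(count_updates(614, 0.1, 10, 20))
```

When the batch size is larger than the number of rows used for fitting, each epoch consists of a single update, so the number of epochs becomes the total number of updates.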

### Model Evaluation

In [8]:
_, accuracy = model.evaluate(X_test,y_test)
print('Accuracy: %.2f' % (accuracy*100))

Accuracy: 68.18


In [9]:
from sklearn.metrics import accuracy_score,confusion_matrix

y_train_pred = (model.predict(X_train) > 0.5).astype(int)
train_acc = accuracy_score(y_train,y_train_pred)
print('Train Accuracy:',round(train_acc * 100,2))

y_pred = (model.predict(X_test) > 0.5).astype(int)
acc = accuracy_score(y_test,y_pred)
cm = confusion_matrix(y_test,y_pred)

print('Accuracy:',round(acc*100,2))
print('Confusion Matrix:')
print(cm)

Train Accuracy: 69.06
Accuracy: 68.18
Confusion Matrix:
[[75 24]
 [25 30]]
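
Accuracy alone hides how the errors are split between the two classes. Assuming scikit-learn's convention (rows are true labels, columns are predicted labels), the confusion matrix above can be unpacked into precision and recall for the positive class:

```python
# Confusion matrix printed above: rows = true label, columns = predicted label.
cm = [[75, 24],
      [25, 30]]
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted positives, the share that is correct
recall = tp / (tp + fn)      # of the actual positives, the share that is found
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

A recall close to one half means that roughly half of the actual onset cases in the test set are missed at the 0.5 threshold.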


In [10]:
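from sklearn.preprocessing import MinMaxScaler
# Rescale every feature column to the [0, 1] range.
# Note: the scaler is fit on the full dataset, before the train/test split below,
# so the test rows influence the scaling parameters.
scaler = MinMaxScaler()
df[col_list[:-1]] = scaler.fit_transform(df[col_list[:-1]])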
print(df.shape)
print(df.info())
df.head()

(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Pregnancy          768 non-null    float64
 1   Glucose            768 non-null    float64
 2   Blood_Pressure     768 non-null    float64
 3   Skin_fold          768 non-null    float64
 4   Insulin            768 non-null    float64
 5   BMI                768 non-null    float64
 6   Relatives_History  768 non-null    float64
 7   Age                768 non-null    float64
 8   Onset              768 non-null    int64  
dtypes: float64(8), int64(1)
memory usage: 54.1 KB
None


|   | Pregnancy | Glucose | Blood_Pressure | Skin_fold | Insulin | BMI | Relatives_History | Age | Onset |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.352941 | 0.743719 | 0.590164 | 0.353535 | 0.0 | 0.500745 | 0.234415 | 0.483333 | 1 |
| 1 | 0.058824 | 0.427136 | 0.540984 | 0.292929 | 0.0 | 0.396423 | 0.116567 | 0.166667 | 0 |
| 2 | 0.470588 | 0.919598 | 0.52459 | 0.0 | 0.0 | 0.347243 | 0.253629 | 0.183333 | 1 |
| 3 | 0.058824 | 0.447236 | 0.540984 | 0.232323 | 0.111111 | 0.418778 | 0.038002 | 0.0 | 0 |
| 4 | 0.0 | 0.688442 | 0.327869 | 0.353535 | 0.198582 | 0.642325 | 0.943638 | 0.2 | 1 |
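
Min-max scaling maps each feature to [0, 1] via (x - min) / (max - min). The NumPy sketch below applies it to the first five Age values of the dataset plus one extra value. The helper names are hypothetical, and the extra value 81 is an assumption (the maximum age implied by the scaled output above). Keeping the fit and apply steps separate also makes it possible to compute the minimum and maximum on the training rows only and reuse them on the test rows.

```python
import numpy as np

def fit_min_max(x):
    """Return the minimum and maximum along the first axis."""
    return x.min(axis=0), x.max(axis=0)

def apply_min_max(x, lo, hi):
    """Scale x using a previously fitted minimum and maximum."""
    return (x - lo) / (hi - lo)

# First five Age values of the dataset, plus 81 (assumed dataset maximum).
ages = np.array([50.0, 31.0, 32.0, 21.0, 33.0, 81.0])
lo, hi = fit_min_max(ages)
scaled = apply_min_max(ages, lo, hi)
print(np.round(scaled, 6))
```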


In [11]:
# First, we separate the input and output data.
df_y = pd.DataFrame(df['Onset'])
df_x = df[['Pregnancy','Glucose','Blood_Pressure','Skin_fold','Insulin','BMI','Relatives_History','Age']]

In [12]:
from collections import Counter
print(Counter(df_y['Onset']))

Counter({0: 500, 1: 268})


In [53]:
# Keep some data for out of sample testing
import numpy as np
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df_x,df_y, test_size=0.2,random_state=42)

np.random.seed(42)

In [54]:
X_train.shape

(614, 8)

In [59]:
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense,Dropout,Input

logits1 = Input(shape=(8,))
logits2 = Dense(units=16, activation="relu")(logits1)
logits3 = Dense(units=32, activation="relu")(logits2)
#logits4 = Dense(units=32, activation="relu")(logits3)
#logits4 = Dropout(0.1)(logits3)
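# Note: binary_crossentropy expects a probability in [0, 1]. "sigmoid" is the usual
# activation for this output node; "relu" is unbounded above and exactly 0 for negative inputs.
logits20 = Dense(units=1, activation="relu")(logits3)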

model = Model(inputs=logits1, outputs=logits20)

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()

Model: "model_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_12 (InputLayer)        [(None, 8)]               0         
_________________________________________________________________
dense_34 (Dense)             (None, 16)                144       
_________________________________________________________________
dense_35 (Dense)             (None, 32)                544       
_________________________________________________________________
dense_36 (Dense)             (None, 1)                 33        
Total params: 721
Trainable params: 721
Non-trainable params: 0
_________________________________________________________________
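
The output layer in this second model uses a ReLU activation, whereas binary cross-entropy expects a probability between 0 and 1. The NumPy sketch below, with made-up pre-activation values, shows the difference: a sigmoid always yields a valid probability, while ReLU returns exactly 0 for every negative input and can exceed 1 for positive ones. If the pre-activations of the output node are mostly negative, thresholding at 0.5 would label every row 0.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: max(z, 0)."""
    return np.maximum(z, 0.0)

z = np.array([-2.0, -0.5, 0.3, 3.0])   # hypothetical pre-activation values

print("sigmoid:", np.round(sigmoid(z), 3))  # all strictly between 0 and 1
print("relu:   ", relu(z))                  # 0 for negatives, unbounded above

# Thresholding at 0.5, as done elsewhere in this notebook.
sigmoid_labels = (sigmoid(z) > 0.5).astype(int)
relu_labels = (relu(z) > 0.5).astype(int)
print(sigmoid_labels, relu_labels)
```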


In [60]:
model.fit(x=X_train, y=y_train,
            validation_split=0.1,
            batch_size=617,
            shuffle=True,
            epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f97decdd5b0>

In [61]:
_, accuracy = model.evaluate(X_test,y_test)
print('Accuracy: %.2f' % (accuracy*100))

Accuracy: 74.03


In [58]:
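# Note: this cell's execution count (58) is lower than that of the model cells above (59-61),
# so the output below was produced by an earlier model, not the one just trained.
from sklearn.metrics import accuracy_score,confusion_matrix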

y_train_pred = (model.predict(X_train) > 0.5).astype(int)
train_acc = accuracy_score(y_train,y_train_pred)
print('Train Accuracy:',round(train_acc * 100,2))

y_pred = (model.predict(X_test) > 0.5).astype(int)
acc = accuracy_score(y_test,y_pred)
cm = confusion_matrix(y_test,y_pred)

print('Accuracy:',round(acc*100,2))
print('Confusion Matrix:')
print(cm)

Train Accuracy: 65.31
Accuracy: 64.29
Confusion Matrix:
[[99  0]
 [55  0]]
