# **Keras Tutorial 1: Build a Classifier Using a Fully Connected Neural Network**
* Code adapted from University of Florida course *Biomedical Data Science*, College of Engineering (Parisa Rashidi 2021)

* [Dataset](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database): Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261--265). IEEE Computer Society Press.

In this notebook, we will implement a simple deep learning classifier to predict whether a given patient has diabetes from a structured dataset of medical predictors. We will use the Keras library for developing our models.

* Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation. *Being able to go from idea to result as fast as possible is key to doing good research.* (https://keras.io/about/)

In [1]:
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping

We will also be using some data processing functions from the library scikit-learn.
* Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities. (https://scikit-learn.org/stable/getting_started.html)

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix

Finally, we'll use methods we've already learned from the Pandas and NumPy libraries.

In [3]:
import pandas as pd
import numpy

Fix random seed for reproducibility.

In [4]:
seed = 7
numpy.random.seed(seed)
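
As a quick illustration of what seeding buys us, the sketch below reseeds NumPy's global generator and checks that the same draws come back. Note that this only controls NumPy. Keras weight initialization may draw from a different random source, so recent Keras versions also provide `keras.utils.set_random_seed` (not used in this notebook) for seeding Python, NumPy, and the backend together.

```python
import numpy

# Seeding the global NumPy generator makes later draws repeatable.
numpy.random.seed(7)
first_draw = numpy.random.rand(3)

# Reseeding with the same value restarts the same sequence.
numpy.random.seed(7)
second_draw = numpy.random.rand(3)

same_draws = bool(numpy.array_equal(first_draw, second_draw))
print(same_draws)  # True
```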

# 1. Load and explore dataset

Load dataset into a Pandas dataframe.

In [5]:
df = pd.read_csv('https://raw.githubusercontent.com/bshickel/student/main/data_mlp.csv')

HTTPError: HTTP Error 404: Not Found

In [None]:
df.head()

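| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | Age | Outcome |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 33 | 1 |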


In [None]:
df.describe()

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | Age | Outcome |
|---|---|---|---|---|---|---|---|---|
| count | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 |
| mean | 3.845052 | 120.894531 | 69.105469 | 20.536458 | 79.799479 | 31.992578 | 33.240885 | 0.348958 |
| std | 3.369578 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.88416 | 11.760232 | 0.476951 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 21.0 | 0.0 |
| 25% | 1.0 | 99.0 | 62.0 | 0.0 | 0.0 | 27.3 | 24.0 | 0.0 |
| 50% | 3.0 | 117.0 | 72.0 | 23.0 | 30.5 | 32.0 | 29.0 | 0.0 |
| 75% | 6.0 | 140.25 | 80.0 | 32.0 | 127.25 | 36.6 | 41.0 | 1.0 |
| max | 17.0 | 199.0 | 122.0 | 99.0 | 846.0 | 67.1 | 81.0 | 1.0 |


# 2. Convert Pandas dataframe into vectors used for machine learning

Convert Pandas dataframe to NumPy array.

In [None]:
data = df.values

Split dataset into features (X) and labels (Y).

In [None]:
X = data[:,0:7]
Y = data[:,7]
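
To make the slicing concrete, here is a small sketch that applies the same two slices to the five rows printed by `df.head()`. The names `sample`, `X_sample`, and `Y_sample` are only for illustration and are not part of the notebook.

```python
import numpy

# The five rows printed by df.head() above, as a plain NumPy array.
# Columns: Pregnancies, Glucose, BloodPressure, SkinThickness,
#          Insulin, BMI, Age, Outcome
sample = numpy.array([
    [6, 148, 72, 35,   0, 33.6, 50, 1],
    [1,  85, 66, 29,   0, 26.6, 31, 0],
    [8, 183, 64,  0,   0, 23.3, 32, 1],
    [1,  89, 66, 23,  94, 28.1, 21, 0],
    [0, 137, 40, 35, 168, 43.1, 33, 1],
])

X_sample = sample[:, 0:7]   # every row, columns 0..6 (the 7 predictors)
Y_sample = sample[:, 7]     # every row, column 7 (the Outcome label)

print(X_sample.shape)  # (5, 7)
print(Y_sample)        # [1. 0. 1. 0. 1.]
```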

# 3. Data Processing: Prepare data for ML algorithms

Scale each input feature to have zero mean and unit variance.

In [None]:
X = StandardScaler().fit_transform(X)
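
For intuition, the scaling step can be sketched by hand in NumPy: subtract each column's mean and divide by its standard deviation. This is a minimal sketch on three columns from the `df.head()` rows, assuming the population standard deviation (`ddof=0`); the variable names are illustrative only.

```python
import numpy

# Three columns (Glucose, BMI, Age) from the five rows shown by df.head().
features = numpy.array([
    [148, 33.6, 50],
    [ 85, 26.6, 31],
    [183, 23.3, 32],
    [ 89, 28.1, 21],
    [137, 43.1, 33],
])

# Per-column mean and population standard deviation (ddof=0).
column_mean = features.mean(axis=0)
column_std = features.std(axis=0)

# Each column now has (approximately) zero mean and unit variance.
scaled = (features - column_mean) / column_std

print(scaled.mean(axis=0).round(6))
print(scaled.std(axis=0).round(6))
```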

Split the dataset into a training set (sometimes referred to as a *development* set) and a testing set (sometimes referred to as a *validation* set, although later in this notebook "validation set" means a separate hold-out slice of the training data). Here we use a random 20% of samples as our testing set to evaluate model performance. We will also set an (optional) random state for reproducibility.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=1)

In [None]:
print("Train set shape: ", X_train.shape)
print("Test set shape: ", X_test.shape)

Train set shape:  (614, 7)
Test set shape:  (154, 7)
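
The shapes above can be checked with a short sketch of a shuffled split. It assumes the test count is rounded up and the remainder goes to training, which agrees with the 614/154 split printed above. It will not necessarily select the same rows as `train_test_split`, and the variable names are illustrative.

```python
import math
import numpy

n_samples = 768          # rows reported by df.describe()
test_size = 0.20

# Assumed rounding rule: test count rounded up, remainder used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Shuffle row indices with a dedicated generator, then cut once.
rng = numpy.random.RandomState(1)
shuffled = rng.permutation(n_samples)
test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]

print(n_train, n_test)  # 614 154
```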


# 4. Create our deep learning model (feed-forward neural network)
We will use Keras to create a fully-connected neural network for our prediction task. This type of deep learning architecture is sometimes also known as a multi-layer perceptron (MLP). It is a relatively simple deep learning model that passes input data through multiple hidden *layers* (each with a number of hidden *neurons* and a particular *activation* function) to produce an output. We will construct our model to perform a classification task from our dataset.

There are many ways to create a model using the Keras API. Here, we will create a Sequential object, which is a way to define a series of layers that make up our model. In a Sequential model, input data will flow from one layer to the next, in the order that we define our layers.

In [None]:
model = Sequential()

Each hidden layer of our neural network will be created using the **Dense** class from Keras. For each layer, we must define the number of hidden units (also known as neurons). There are several optional arguments we may also pass, which can be viewed in the [Keras documentation page](https://keras.io/api/layers/core_layers/dense/). We can add many layers to our deep learning model using the .add() function of the Sequential class. You can think of a Sequential container as a list of hidden layers.

For the first layer of our neural network, we must tell Keras how many variables to expect in each input vector. From our previous data exploration, we know that each patient is defined by 7 different variables, so the input dimension to our network is 7.

One reason why deep learning models are so powerful is their ability to model complex variable interactions through nonlinear activation functions. We have several choices for activation function. In our example, we will use the commonly chosen Rectified Linear Unit activation (ReLU).

In [None]:
model.add(Dense(units=15, input_dim=7, activation='relu'))

So far our model has a single hidden layer. Let's add one more hidden layer with 8 hidden units.

In [None]:
model.add(Dense(units=8, activation='relu'))

Once we are satisfied with the hidden layers of our model, we need to add an output layer for generating class predictions. Our output layer will also be a Dense layer, but it will only have a single (1) unit. Instead of ReLU, we will use a sigmoid activation function, which is typically chosen for binary classification problems such as ours. Using a sigmoid activation on our output layer allows us to interpret the output as a prediction probability, i.e. the probability that a given input vector belongs to class 1.

In [None]:
model.add(Dense(units=1, activation='sigmoid'))
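
To see what these three layers compute, here is a minimal NumPy sketch of the forward pass: each Dense layer is a matrix multiplication plus a bias, followed by its activation. The weights are hypothetical random values with the right shapes (7 to 15 to 8 to 1), not the weights Keras would initialize or learn.

```python
import numpy

def relu(z):
    # ReLU keeps positive values and clips negatives to zero.
    return numpy.maximum(z, 0.0)

def sigmoid(z):
    # Sigmoid squashes any real number into the range 0..1.
    return 1.0 / (1.0 + numpy.exp(-z))

rng = numpy.random.RandomState(0)

# Hypothetical random weights with the same shapes as the three Dense layers.
W1, b1 = rng.normal(scale=0.5, size=(7, 15)), numpy.zeros(15)
W2, b2 = rng.normal(scale=0.5, size=(15, 8)), numpy.zeros(8)
W3, b3 = rng.normal(scale=0.5, size=(8, 1)), numpy.zeros(1)

def forward(x):
    h1 = relu(x @ W1 + b1)        # first hidden layer, 15 units
    h2 = relu(h1 @ W2 + b2)       # second hidden layer, 8 units
    return sigmoid(h2 @ W3 + b3)  # output layer, 1 unit

batch = rng.normal(size=(4, 7))   # four made-up, already scaled patients
probabilities = forward(batch)
print(probabilities.shape)  # (4, 1)
```

Each row of `probabilities` is one value between 0 and 1, which is what lets us read the output as the probability of class 1.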

Now that we have defined the architecture of our neural network, we will use the .compile() function to build it. In our example we are defining a few arguments that are associated with the training of our model:
* We are using a binary cross-entropy loss. This is an appropriate choice for binary classification.
* We will be using the Adam optimizer, which is a popular adaptive variant of stochastic gradient descent (SGD).
* For this example, we are interested in our model's prediction accuracy, so we'll tell Keras to use the "accuracy" metric.

In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
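
For reference, binary cross-entropy averages -[y log(p) + (1 - y) log(1 - p)] over the samples, where y is the true label and p is the predicted probability of class 1. The sketch below is a plain NumPy version with made-up labels and predictions. The clipping constant `eps` is an assumption added here to avoid log(0), and the function is illustrative rather than the Keras implementation.

```python
import numpy

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    y_true = numpy.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    y_prob = numpy.clip(numpy.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    per_sample = -(y_true * numpy.log(y_prob)
                   + (1.0 - y_true) * numpy.log(1.0 - y_prob))
    return float(numpy.mean(per_sample))

labels = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]   # high probability on the true class
confident_wrong = [0.1, 0.9, 0.2, 0.8]   # high probability on the wrong class

loss_right = binary_crossentropy(labels, confident_right)
loss_wrong = binary_crossentropy(labels, confident_wrong)
print(loss_right, loss_wrong)
```

Confident correct predictions give a small loss, while confident wrong predictions are penalized heavily, which is the behavior we want the optimizer to exploit.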

# 5. Train our model

Now it's time to train our prediction model! We're going to use a couple of best practices:
* **Model checkpointing.** As we train our model, we're going to continuously save the best version. This can be important in case training gets interrupted and you would like to pick up where you left off at a later time. Here, after each epoch (i.e. each pass through our entire training data), we're going to look at the model's accuracy on a hold-out set that it is not trained on. If this accuracy is better than the best value seen so far, we'll save the model as our best-performing model and continue.
* **Early stopping.** Sometimes we do not wish to continue training if the model's performance is not improving. Similar to model checkpointing, after each epoch we're going to look at the model's performance on the hold-out data; here we use the loss on the hold-out data as our early stopping metric. We define a counter called patience, which is the number of epochs we are willing to wait without improvement in that metric. If the metric does not improve after an epoch, the patience counter decreases by 1. If the model does improve, the counter gets reset to the initial value. Once the counter reaches zero, training ends. Here we will tell the model to wait 4 epochs without improvement before ending training.

In [None]:
checkpoint = ModelCheckpoint('model.hdf5', monitor='val_accuracy', save_best_only=True)
earlystop = EarlyStopping(monitor='val_loss', patience=4)
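
The bookkeeping behind these two callbacks can be sketched in plain Python. The functions below follow the counter description above and are illustrative helpers, not the Keras source; the validation losses and accuracies are made-up numbers.

```python
def early_stopping_epoch(val_losses, patience=4):
    """Return the (1-based) epoch after which training would stop."""
    best_loss = float("inf")
    remaining = patience
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss          # improvement: remember it, reset counter
            remaining = patience
        else:
            remaining -= 1            # no improvement: use up one unit of patience
            if remaining == 0:
                return epoch
    return len(val_losses)            # patience never ran out

def best_checkpoint_epoch(val_accuracies):
    """Return the (1-based) epoch whose model would be the saved checkpoint."""
    best_value, best_epoch = float("-inf"), None
    for epoch, accuracy in enumerate(val_accuracies, start=1):
        if accuracy > best_value:     # only a strict improvement overwrites the file
            best_value, best_epoch = accuracy, epoch
    return best_epoch

val_losses = [0.70, 0.60, 0.55, 0.56, 0.57, 0.55, 0.58, 0.60, 0.61]
val_accuracies = [0.66, 0.70, 0.74, 0.73, 0.74, 0.72, 0.71]

print(early_stopping_epoch(val_losses))       # 7
print(best_checkpoint_epoch(val_accuracies))  # 3
```

Note that the two callbacks watch different metrics, so the epoch at which training stops is generally not the epoch whose model was saved.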

Fit the model using our training dataset. We will use the one-line function **.fit()** to train our entire deep learning model.
* We will tell Keras to train the model for 150 epochs, but since we're using the early stopping method, training will likely end well before we reach 150 epochs.
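* We will use a batch size of 10 samples. During each epoch, the model processes the training data 10 samples at a time and updates its weights after each batch.
* We will hold out 30% of the training dataset (here called the validation set) for computing metrics for our model checkpointing and early stopping. Keras takes these samples from the end of the training arrays rather than drawing them at random; since `train_test_split` already shuffled the rows, this still amounts to a random subset.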
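
A little arithmetic shows what these settings mean for our 614 training rows. The sketch assumes the number of rows kept for fitting is rounded down and the rest are held out for validation; `split_and_batches` is a hypothetical helper, not a Keras function.

```python
import math

def split_and_batches(n_samples, validation_split, batch_size):
    # Assumed rule: floor(n * (1 - split)) rows are used for fitting,
    # and the remaining rows are held out for validation.
    n_fit = math.floor(n_samples * (1.0 - validation_split))
    n_val = n_samples - n_fit
    # The last batch of an epoch may be smaller than batch_size.
    steps_per_epoch = math.ceil(n_fit / batch_size)
    return n_fit, n_val, steps_per_epoch

print(split_and_batches(614, 0.3, 10))  # (429, 185, 43)
```

Under these assumptions, each epoch performs 43 weight updates on 429 rows and then scores the model on the 185 held-out rows.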

In [None]:
model.fit(X_train, y_train, epochs=150, batch_size=10, validation_split=0.3, callbacks=[checkpoint, earlystop])

Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Epoch 29/150
Epoch 30/150
Epoch 31/150
Epoch 32/150
Epoch 33/150
Epoch 34/150
Epoch 35/150
Epoch 36/150
Epoch 37/150
Epoch 38/150
Epoch 39/150
Epoch 40/150
Epoch 41/150
Epoch 42/150
Epoch 43/150
Epoch 44/150
Epoch 45/150
Epoch 46/150
Epoch 47/150
Epoch 48/150
Epoch 49/150
Epoch 50/150
Epoch 51/150
Epoch 52/150
Epoch 53/150
Epoch 54/150
Epoch 55/150
Epoch 56/150


<keras.callbacks.History at 0x7fac20055ad0>

# 6. Evaluate our model

Let's check the performance of our trained model on the test set we already created. The model has never seen this particular data, so it can provide an idea of how well the model might perform in the future (generalizability to unseen data). We will use the Keras function .evaluate(), which will compute the loss, as well as any metrics we defined when compiling our model. Since we told Keras to use "accuracy" when we compiled, we will see the model's accuracy on the test data.

In [None]:
scores = model.evaluate(X_test, y_test)
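
For a binary classifier with a sigmoid output, accuracy is obtained by thresholding each predicted probability at 0.5 and comparing with the true label. The sketch below does this by hand on made-up labels and probabilities (not results from this model) and also tallies the four confusion-matrix counts, arranged with true classes as rows and predicted classes as columns.

```python
import numpy

# Hypothetical labels and sigmoid outputs for six patients.
y_true = numpy.array([1, 0, 1, 0, 1, 0])
y_prob = numpy.array([0.81, 0.30, 0.45, 0.62, 0.93, 0.08])

# Threshold the probabilities to get hard class predictions.
y_pred = (y_prob > 0.5).astype(int)
accuracy = float(numpy.mean(y_pred == y_true))

tn = int(numpy.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(numpy.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(numpy.sum((y_true == 1) & (y_pred == 0)))  # false negatives
tp = int(numpy.sum((y_true == 1) & (y_pred == 1)))  # true positives

confusion = numpy.array([[tn, fp],
                         [fn, tp]])
print(round(accuracy, 4))  # 0.6667
print(confusion)
```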



# 7. Let's try another neural network!

Create the Sequential model.

In [None]:
model = Sequential()

Add a hidden layer with 128 neurons. (Remember, since this is the first layer in our model, we also need to define the input dimension.)

In [None]:
model.add(Dense(units=128, input_dim=7, activation='sigmoid'))

Add 3 more hidden layers with the following attributes:
* A layer with 64 hidden units and "sigmoid" activation
* A layer with 32 hidden units and "sigmoid" activation
* A layer with 16 hidden units and "sigmoid" activation

In [None]:
model.add(Dense(units=64, activation='sigmoid'))
model.add(Dense(units=32, activation='sigmoid'))
model.add(Dense(units=16, activation='sigmoid'))

Add an output layer with "sigmoid" activation. (Remember, since we are doing binary classification, the final layer should have a single unit).

In [None]:
model.add(Dense(units=1, activation='sigmoid'))
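
One way to compare the two architectures is to count their trainable parameters. A Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper below is a hypothetical illustration of that arithmetic, not a Keras function.

```python
def dense_parameter_count(input_dim, layer_units):
    total = 0
    fan_in = input_dim
    for units in layer_units:
        total += fan_in * units + units   # weights + biases for this layer
        fan_in = units                    # this layer's output feeds the next
    return total

first_model = dense_parameter_count(7, [15, 8, 1])
second_model = dense_parameter_count(7, [128, 64, 32, 16, 1])

print(first_model)   # 257
print(second_model)  # 11905
```

The second network has far more parameters than the first, while the training set still has only 614 rows.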

Compile the model using binary cross-entropy loss, the Adam optimizer, and the accuracy metric. (Hint: this is the same way we compiled our first model.)

In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Train the model using the following training arguments:
* Train for 10 epochs.
* Use 10% of the training data for hold-out validation.
* Use a batch size of 32.
* **Do not** use any callbacks such as early stopping or checkpointing.

In [None]:
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fac90044190>

Evaluate the new model's performance on the test dataset.

In [None]:
scores = model.evaluate(X_test, y_test)

