<center><h1>Deep Neural Network Exploration for Diabetes Dataset</h1></center>

First, we start by importing all the tools needed.  The main libraries used are:

**Sklearn**-- For encoding the labels, scaling the data, splitting the data into training and testing sets, and applying PCA as necessary. 

**Keras**-- A deep learning library, originally developed by François Chollet at Google, used for quickly prototyping neural networks without the extra boilerplate code that comes with raw TensorFlow.

**Pandas**-- For reading the data into a DataFrame.

<center><h3>Step 1: Import all the Necessary Tools</h3></center>

Just a ton of import statements!

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import keras
from keras.layers import Dense
from keras.layers import Softmax
from keras.layers import Dropout
from keras.models import Sequential

Using TensorFlow backend.


<center><h3>Step 2: Read in the data and store the labels separately</h3></center>

Read in the dataset, store the labels in a `labels` variable, and then drop the labels from the dataset. We also drop the `Unnamed: 0` index column, which pandas added by default when we wrote the file in the data cleaning step.


In [2]:
dataset = pd.read_csv("cleaned_diabetes_data_v2")

In [3]:
labels = dataset['labels'].astype('str')
dataset.drop(['Unnamed: 0', 'labels'], axis=1, inplace=True)

<center><h3>Step 3: Preprocess the data/labels</h3></center>

The labels are currently stored as strings.  In this step, we use `sklearn.preprocessing.LabelEncoder()` to encode the labels as integer values instead.  

In this step, we also use `sklearn.preprocessing.StandardScaler()` to normalize the scale of our data (this just subtracts each column's mean from every value in that column, and then divides the result by the column's standard deviation).

In [4]:
encoder = LabelEncoder()
labels = encoder.fit_transform(labels)

scaler = StandardScaler()
scaler.fit(dataset)
scaled_dataset = scaler.transform(dataset)
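
To make the scaling step concrete, here is a minimal, dependency-free sketch of the same per-column arithmetic. The function name `standardize_column` and the toy values are made up for illustration, and this is not scikit-learn's actual implementation; it only assumes that the population standard deviation is used and that a constant column is left unscaled.

```python
import statistics

def standardize_column(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.mean(values)
    # Population std (divide by n). Fall back to 1.0 for a constant column
    # so we never divide by zero (an assumption made for this sketch).
    std = statistics.pstdev(values) or 1.0
    return [(x - mean) / std for x in values]

# Made-up toy column, not taken from the diabetes dataset.
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = standardize_column(column)
print(scaled)
```

After scaling, the column has mean 0 and standard deviation 1, which keeps features with large raw ranges from dominating the gradient updates.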

<center><h3>Step 4: Split the Data into Training and Testing sets</h3></center>

Split off 20% of the data to hold out as a testing set; the remaining 80% is used for training.  The data is randomly shuffled before it is split into training and testing sets.

In [5]:
X_train, X_test, y_train, y_test = train_test_split(scaled_dataset, labels, train_size=0.8)
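
As a rough illustration of what a shuffled 80/20 split does, the sketch below shuffles row indices and cuts them at the 80% mark. The helper `shuffle_split` and its `seed` parameter are hypothetical and this is not how `train_test_split` is implemented internally; the sample count of 101766 is simply the sum of the training and validation sizes reported in the training log later in this notebook.

```python
import math
import random

def shuffle_split(n_samples, train_fraction=0.8, seed=0):
    """Shuffle row indices, then cut them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = math.floor(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = shuffle_split(101766)
print(len(train_idx), len(test_idx))
```

Fixing a seed (in scikit-learn, via the `random_state` argument) makes the split repeatable across runs, which helps when comparing models.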



<center><h3>Step 5: Convert Labels from integer to categorical variables</h3></center>

This step one-hot encodes the labels, turning them into sparse vectors.  This ensures that our ground-truth labels will be the same shape as the output from our softmax layer, making for easy comparisons between the two. 

Below this, we also get the shape of the data contained within X_train and X_test (we omit index 0 because this corresponds to the number of data points, not the shape of them).  We'll use this to tell our DNN what shape our input layer should be (1 neuron corresponding to each dimension in the dataset).

In [6]:
y_train = keras.utils.to_categorical(y_train, num_classes=3)
y_test = keras.utils.to_categorical(y_test, num_classes=3)

In [7]:
X_train.shape[1:]

(100,)
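
The one-hot encoding described in this step can be sketched in plain Python as follows. The helper name `one_hot` is made up for illustration; it only mimics the basic behaviour of `keras.utils.to_categorical` for integer labels.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot vectors of length num_classes."""
    vectors = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        vectors.append(row)
    return vectors

encoded = one_hot([0, 2, 1], num_classes=3)
print(encoded)
```

Each row has the same length as the 3-way softmax output, so the loss function can compare predictions and ground truth element by element.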

<center><h3>Step 6: Define the Architecture of our Deep Neural Network</h3></center>

Following the conventions of Keras, we start by declaring a `Sequential()` object.

Each `model.add(...)` line in the code below corresponds, in order, to one layer in the neural network (see the diagram for a visualization of the network). The following types of layers are used:

**_Dense_**: These are the typical "fully-connected" layers in a Feed-Forward Neural Network.  The number of neurons in a Dense layer is specified as a hyperparameter when the Dense Layer is declared (e.g., the first Dense Layer of the model contains 256 neurons).  

**_Dropout_**: Dropout layers act as a very effective form of regularization in neural networks.  The parameter passed in during the creation of the dropout layer is the probability that any given neuron output feeding into this layer will be turned off (its output set to zero) during training.  This helps the network avoid overfitting by ensuring that certain important neurons don't "overwhelm" their neighbors (think about how the rest of a team gets more play time and learns more in games where the star player is injured!). 

**_Softmax_**: This is the output layer of the neural network (in the code below, it is applied as the activation of the final Dense layer), used for outputting a vector of probabilities, one per class, that describes how likely the network thinks it is that a given example belongs to each class.  The sum of all values in any vector from a Softmax will always be 1.  For instance, if the softmax layer looked at datapoint \[x\] and output the vector `[0.82, 0.03, 0.15]`, then this means that the model is 82% confident the data point belongs to class 0, 3% confident it belongs to class 1, and 15% confident it belongs to class 2.  The overall prediction is whatever class has the max value in the output vector (in the above example, class 0).
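
To illustrate the softmax description above, here is a small dependency-free sketch. The input scores are made-up numbers and the function is a textbook formulation, not Keras's internal code.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores for a 3-class problem.
probs = softmax([2.0, -1.3, 0.3])
# The predicted class is the index of the largest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```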


<center><h3>Model Summary and compilation choices</h3></center>

After declaring the architecture of the DNN, we printed out a summary of the model.  We can see that the model has 100,227 trainable parameters--these are the weights and biases that the model will (hopefully) learn optimal values for as the model trains. 
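
The parameter count can be checked by hand: a Dense layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases, and Dropout layers add no parameters. The short sketch below applies that formula to the layer widths used in this notebook (the variable names are just for illustration).

```python
# Input width followed by the width of each Dense layer in the model.
layer_sizes = [100, 256, 128, 128, 128, 64, 3]

# Weights (n_in * n_out) plus biases (n_out) for each consecutive pair.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)
print(params_per_layer, total_params)
```

The per-layer values line up with the `Param #` column in the model summary.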

For the optimizer, we chose `adam`, which has been considered one of the most robust gradient descent optimizers since its release in 2015 (Adam combines ideas from the gradient descent algorithms AdaGrad and RMSProp). 

For loss, we chose `categorical_crossentropy`.  

The only metric we'll track is classification `accuracy`.
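
As a rough sketch of what `categorical_crossentropy` measures for a single example, the snippet below computes the negative log of the probability assigned to the true class. The function is a textbook formulation written for illustration; the `eps` clipping value is an assumption, not a claim about Keras's exact internals.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted distribution."""
    # Clip probabilities away from 0 and 1 so log() stays finite.
    return -sum(
        t * math.log(min(max(p, eps), 1.0 - eps))
        for t, p in zip(y_true, y_pred)
    )

prediction = [0.82, 0.03, 0.15]  # the softmax example from earlier
loss_confident = categorical_crossentropy([1, 0, 0], prediction)  # true class 0
loss_wrong = categorical_crossentropy([0, 1, 0], prediction)      # true class 1
print(loss_confident, loss_wrong)
```

The loss is small when the model puts high probability on the true class and grows quickly when it does not, which is what pushes the weights toward confident, correct predictions.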



In [8]:
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(100,)))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 256)               25856     
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               32896     
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 128)               16512     
__________

<center><h3>Step 7: Create a Tensorboard Callback and fit the model</h3></center>

Since Keras uses TensorFlow as a backend, we have access to awesome tools such as `TensorBoard` to analyze the logs output by our model.  We use this by creating a _TensorBoard callback_ and passing it into the model during our `.fit()` call.

<center><h3>Fitting the Model</h3></center>

During the fit step, we originally tried about 25 epochs, but found that the model tended to converge in fewer than 10 epochs.  We also played around with the batch size, but found that smaller batches generally didn't have much of an effect on the performance of our model (smaller batches mean more updates to the trainable parameters per epoch, but also a longer run time).  We also pass in the validation data during this step so that the model can compute loss and accuracy on our testing set at the end of each epoch.  
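
The trade-off between batch size and the number of parameter updates is simple arithmetic. The sketch below uses the training-set size reported in the training log (81412 samples); the helper name `updates_per_epoch` and the alternative batch sizes 32 and 128 are only illustrative, not values reported as tried in this notebook.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of gradient updates in one pass over the training data."""
    return math.ceil(n_samples / batch_size)  # the last batch may be smaller

n_train = 81412  # training samples, as reported in the training log
steps = {b: updates_per_epoch(n_train, b) for b in (16, 32, 128)}
print(steps)
```

With a batch size of 16, each epoch performs several thousand updates, so 10 epochs already give the optimizer many chances to adjust the weights.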

In [9]:
tb_callback = keras.callbacks.TensorBoard(
    log_dir='./logs',
    histogram_freq=0,
    batch_size=16,
    write_graph=True,
    write_grads=False,
    write_images=False,
    embeddings_freq=0,
    embeddings_layer_names=None,
    embeddings_metadata=None,
)

In [10]:
model.fit(X_train, y_train, batch_size=16, epochs=10, verbose=1, callbacks=[tb_callback], validation_data=(X_test, y_test))

Train on 81412 samples, validate on 20354 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x122ae27b8>

<center><h3>Interpreting our Results</h3></center>

In the end, the model was only able to achieve a validation accuracy of 57.42%. This is better than chance: with 3 total categories that the model could predict, random guessing would have an accuracy of around 33%.  However, this is still quite low.  The deep learning approach also showed no significant improvement over shallow learning algorithms, and comes with a much higher computational cost. 
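
When judging an accuracy figure like this, it can help to compute simple baselines from the class counts. The sketch below is only an illustration: the function name is made up and the counts are hypothetical placeholders, not the actual class distribution of the diabetes dataset.

```python
def baseline_accuracies(class_counts):
    """Return (uniform-guess accuracy, majority-class accuracy)."""
    total = sum(class_counts)
    uniform_guess = 1.0 / len(class_counts)      # guess each class equally often
    majority_class = max(class_counts) / total   # always predict the largest class
    return uniform_guess, majority_class

# Hypothetical counts for three classes, for illustration only.
hypothetical_counts = [500, 300, 200]
uniform_acc, majority_acc = baseline_accuracies(hypothetical_counts)
print(uniform_acc, majority_acc)
```

If the classes are imbalanced, the majority-class baseline can sit well above the uniform-guess figure, so it is a useful second reference point when the real label counts are available.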