# Tuning Neural Networks in Keras

We will use the version of Keras that ships with the TensorFlow package, as it has the most up-to-date tools.

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from seaborn import heatmap

from sklearn.metrics import mean_squared_error, classification_report, ConfusionMatrixDisplay
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# New libraries
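# Import Keras from the TensorFlow package so every import uses the same Keras version
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense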

In [None]:
def evaluate_classification(y_true, y_pred, labels=None, normalize=None):
  print(classification_report(y_true, y_pred, target_names=labels))

  ConfusionMatrixDisplay.from_predictions(y_true, y_pred,
                                          display_labels=labels, 
                                          normalize=normalize,
                                          cmap='Blues')
  plt.show()

### Plot History

Since we will be plotting histories for all of our models, let's create a function to do it quickly.

In [None]:
#  You can use this function to see how your model improves over time
def plot_history(history, metrics=None):
  plt.plot(history.history['loss'], label='training')
  plt.plot(history.history['val_loss'], label='testing')
  plt.title('Loss')
  plt.legend()
  plt.show()
  if metrics:
    for metric in metrics:
      plt.plot(history.history[metric], label=f'training {metric}')
      plt.plot(history.history[f'val_{metric}'], label=f'testing {metric}')
      plt.legend()
      plt.title(metric)
      plt.show()



# Classification:

Classification models are built much like regression models, except that we need to adjust the final activation of the output layer, the loss function in the compile step, and the metrics we use to judge them.  Remember: MAE, MSE, RMSE, and R2 are regression metrics; accuracy, recall, precision, F1-score, and confusion matrices are classification metrics.
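
To make the classification metrics concrete, here is a minimal sketch on a made-up set of labels. The `y_true_toy` and `y_pred_toy` lists are illustrative only and are not taken from the diabetes data; the metric functions are the standard ones from scikit-learn.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_toy = [0, 0, 0, 1, 1, 1, 0, 0]
# This gives 2 true positives, 2 false negatives, 1 false positive, 3 true negatives

toy_accuracy = accuracy_score(y_true_toy, y_pred_toy)    # share of all predictions that are correct
toy_precision = precision_score(y_true_toy, y_pred_toy)  # share of predicted positives that are truly positive
toy_recall = recall_score(y_true_toy, y_pred_toy)        # share of true positives that the model found
toy_f1 = f1_score(y_true_toy, y_pred_toy)                # harmonic mean of precision and recall

print(f'accuracy:  {toy_accuracy:.3f}')
print(f'precision: {toy_precision:.3f}')
print(f'recall:    {toy_recall:.3f}')
print(f'f1:        {toy_f1:.3f}')
```

Notice that accuracy alone hides the two missed positives; recall is the metric that exposes them.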

## Classification Dataset
The classification dataset describes diabetes rates among Pima Indians.  Each row is a person, and the features are health-related measurements.  The target is binary and represents whether or not a person will be diagnosed with diabetes.  This is another old dataset, first presented in 1988.



In [None]:
classification_df = pd.read_csv('https://raw.githubusercontent.com/ninja-josh/image-storage/main/diabetes.csv')
classification_df.head()

In [None]:
classification_df.info()

In [None]:
classification_df.duplicated().any()

In [None]:
classification_df.describe()

We see minimums of 0 for Glucose, BloodPressure, SkinThickness, Insulin, and BMI.  Those values are impossible for humans, so let's drop the rows that contain them.

In [None]:
no_glucose = classification_df['Glucose'] == 0
no_blood = classification_df['BloodPressure'] == 0
no_skin = classification_df['SkinThickness'] == 0
no_insulin = classification_df['Insulin'] == 0
no_bmi = classification_df['BMI'] == 0

# class_df_clean excludes every row that has a 0 in any of the above columns
class_df_clean = classification_df[~(no_glucose |
                                     no_blood |
                                     no_skin |
                                     no_insulin |
                                     no_bmi)]
class_df_clean.describe()

We lost a lot of data, going from 768 samples to 392 samples.  In the future we might impute this data using means, medians, or other imputation strategies.  For this exercise we won't focus on that.
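
As a rough sketch of the imputation alternative mentioned above, the impossible zeros can be replaced with each column's median instead of dropping the rows. The `toy_df` frame and its values are invented for illustration (only the column names match the dataset), and `cols_with_invalid_zeros` is a hypothetical helper list.

```python
import numpy as np
import pandas as pd

# Invented stand-in for a few rows of the diabetes data
toy_df = pd.DataFrame({
    'Glucose': [148, 0, 183, 89],
    'BMI': [33.6, 26.6, 0.0, 28.1],
    'Outcome': [1, 0, 1, 0],
})

# Columns where a value of 0 really means "missing"
cols_with_invalid_zeros = ['Glucose', 'BMI']

# Compute each column's median while ignoring the zeros
medians = toy_df[cols_with_invalid_zeros].replace(0, np.nan).median()

# Replace the zeros with the matching median; no rows are dropped
imputed_df = toy_df.copy()
for col in cols_with_invalid_zeros:
    imputed_df[col] = toy_df[col].replace(0, medians[col])

print(imputed_df)
```

On the real data, the medians should be computed from the training split only and then applied to the test split, so that no information leaks from the test set.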

In [None]:
# Define X and y and train test split
X = class_df_clean.drop(columns = 'Outcome')
y = class_df_clean['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42, stratify = y)

In [None]:
# Scale
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

### Build your model

In [None]:
# Build your model

n_cols = X_train.shape[1]

# Instantiate the model
class_model = Sequential()

# Create the first layer; input_dim is the number of features in the dataset
class_model.add(Dense(10, activation = 'relu', input_dim = n_cols))

# Create hidden layers
class_model.add(Dense(10, activation = 'relu'))

# Create output layer 
# Since this is a binary classification, the activation function of our final layer needs to be 'sigmoid'. 

class_model.add(Dense(1, activation = 'sigmoid'))


In [None]:
# Compile your model

# Since this is binary classification set loss  = 'binary_crossentropy'
# Set the metrics = ['acc']
class_model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['acc'])


In [None]:
# fit your model
history = class_model.fit(X_train, y_train,
                        validation_data = (X_test, y_test),
                        epochs = 100)



In [None]:
# See how your model is doing
plot_history(history, metrics = ['acc'])

## Evaluation



In [None]:
# Make predictions and evaluate your model
# Define labels for the confusion matrix
labels = ['No Diabetes', 'Diabetes']

# Get predicted probabilities and round them to the nearest class label (0 or 1)
train_preds = np.rint(class_model.predict(X_train))
test_preds = np.rint(class_model.predict(X_test))

# Evaluate training set
print('Training Evaluation:\n')
evaluate_classification(y_train, train_preds, labels=labels,
                        normalize='true')
# Evaluate testing set
print('Testing Evaluation:\n')
evaluate_classification(y_test, test_preds, labels=labels,
                        normalize='true')

# 👉 Tuning an underfit model:
## Increase model complexity:
1. Add layers
2. Add nodes
3. Reduce other regularization
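
One possible way to add layers and nodes is sketched below. `build_larger_model` and its default sizes are hypothetical choices for illustration, not recommended values, and `n_features=8` is a placeholder; in this notebook you would pass `X_train.shape[1]`.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

def build_larger_model(n_features, n_hidden_layers=3, n_units=32):
    """Build a binary classifier with more layers and nodes than the baseline."""
    model = Sequential()
    model.add(Input(shape=(n_features,)))
    # More hidden layers and more nodes per layer both increase model capacity
    for _ in range(n_hidden_layers):
        model.add(Dense(n_units, activation='relu'))
    # A single sigmoid node for binary classification, as in the baseline model
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    return model

# Placeholder feature count; use X_train.shape[1] in the notebook
bigger_model = build_larger_model(n_features=8)
bigger_model.summary()
```

The model can then be fit and plotted exactly like `class_model`, which makes it easy to compare the two loss curves.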

# 👉 Tuning an overfit model:
## Reduce model complexity:
1. Reduce layers or nodes
2. Add dropout layers
3. Implement early stopping callback
4. Add L1 or L2 regularization
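
The sketch below shows how dropout, L2 regularization, and early stopping might be wired up in Keras. The three techniques are combined here only for brevity; when tuning, you would normally try them one at a time. `build_regularized_model`, the dropout rate, the L2 strength, and the patience are all illustrative assumptions, and `n_features=8` is a placeholder for `X_train.shape[1]`.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import regularizers

def build_regularized_model(n_features, dropout_rate=0.2, l2_strength=0.01):
    """Baseline-sized binary classifier with dropout and L2 weight penalties."""
    model = Sequential()
    model.add(Input(shape=(n_features,)))
    # The L2 penalty discourages large weights in each hidden layer
    model.add(Dense(10, activation='relu',
                    kernel_regularizer=regularizers.l2(l2_strength)))
    # Dropout randomly zeroes a fraction of activations during training only
    model.add(Dropout(dropout_rate))
    model.add(Dense(10, activation='relu',
                    kernel_regularizer=regularizers.l2(l2_strength)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    return model

# Stop training once the validation loss has not improved for 5 epochs,
# and roll the weights back to the best epoch seen
early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)

# Placeholder feature count; use X_train.shape[1] in the notebook
reg_model = build_regularized_model(n_features=8)

# In the notebook, the callback would be passed to fit like this:
# history = reg_model.fit(X_train, y_train,
#                         validation_data=(X_test, y_test),
#                         epochs=100, callbacks=[early_stop])
```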


# 🔧 Your Turn: Tune This Model!

* Choose one or more regularization techniques to improve this model.

* Make one change at a time.  Make a new cell for each change to keep a record of what you've tried.

## Ask yourselves:  Should we increase or decrease model complexity?
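
One rough way to approach this question is to compare the final training and validation losses. The helper below is a hypothetical rule of thumb, not a standard API: `diagnose_fit`, `gap_tolerance`, and `high_loss` are invented names, and both thresholds are arbitrary and should be adjusted for your problem. It accepts a plain dictionary shaped like `history.history`.

```python
def diagnose_fit(history_dict, gap_tolerance=0.1, high_loss=0.6):
    """Return a rough label based on the final training and validation loss."""
    train_loss = history_dict['loss'][-1]
    val_loss = history_dict['val_loss'][-1]

    # Validation loss far above training loss suggests the model memorized the training data
    if val_loss - train_loss > gap_tolerance:
        return 'overfit'
    # Training loss that is still high suggests the model is too simple or undertrained
    if train_loss > high_loss:
        return 'underfit'
    return 'ok'

# Made-up loss curves for illustration
print(diagnose_fit({'loss': [0.7, 0.2], 'val_loss': [0.7, 0.6]}))
print(diagnose_fit({'loss': [0.69, 0.68], 'val_loss': [0.69, 0.68]}))
```

In the notebook you could call `diagnose_fit(history.history)` after fitting, but the loss plots from `plot_history` remain the better guide, since they show the whole training curve and not just the last epoch.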

