# Optimizing Model Example
**(c) Feb 2025 Julie Fleischer**

This file contains optimizations to the model from my Example Solution from Deep Learning Class.  Details on the optimizations I made (and the aha moment that allowed me to get to 100% accuracy) are below.

Enjoy!

## 1 - Install libraries

In [None]:
# Install required libraries (if not already installed)
!pip install pandas
!pip install tensorflow
!pip install scikit-learn

## 2 - Load and pre-process input data

In this step, we load the data from the input file.

See the next step for details on why the input file changed.

In [39]:
# Load and pre-process input data
import pandas as pd

# Read in the file
#root_cause_data = pd.read_csv('deep_learning_sample_data.csv')
root_cause_data = pd.read_csv('deep_learning_sample_data_better.csv') # new file loaded - see next step for why

print("\n ----------------- Input file has been loaded -----------------\n")

# Convert ROOT_CAUSE column (the target data column) from string to ordinal number

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
root_cause_data['ROOT_CAUSE'] = le.fit_transform(root_cause_data['ROOT_CAUSE'])
print(" ----------------- Converted root cause to numeric -----------------\n")

# Create a numpy array from root_cause_data for use with Keras functions

root_cause_data_np = root_cause_data.to_numpy()
print(" ----------------- Data in numpy array -----------------\n")

# Create our input data (X data) array from columns 2-8 (the seven boolean columns)
# and our target data (Y data) from the last column (the ROOT_CAUSE column)

X_data = root_cause_data_np[:, 1:8]
Y_data = root_cause_data_np[:, 8]
print(" ----------------- X data and Y data extracted -----------------\n")

# Convert ROOT_CAUSE column (the target data column) from ordinal number to boolean matrix using one hot encoding

import tensorflow as tf
Y_data = tf.keras.utils.to_categorical(Y_data)
print(" ----------------- Y data converted to binary matrix -----------------\n")

# Split data into training data and test data.  Use 10% of the data for test.
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X_data, Y_data, test_size=0.1)
print(" ----------------- Data split into training and test data -----------------\n")
print("Training data shapes:", X_train.shape, Y_train.shape)
print("Test data shapes:", X_test.shape, Y_test.shape)



 ----------------- Input file has been loaded -----------------

 ----------------- Converted root cause to numeric -----------------

 ----------------- Data in numpy array -----------------

 ----------------- X data and Y data extracted -----------------

 ----------------- Y data converted to binary matrix -----------------

 ----------------- Data split into training and test data -----------------

Training data shapes: (900, 7) (900, 3)
Test data shapes: (100, 7) (100, 3)
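
For intuition, the two encoding steps above (string labels to ordinal numbers, then ordinal numbers to a one-hot matrix) can be mimicked in plain NumPy. This is only a sketch: the label strings are hypothetical stand-ins for the real ROOT_CAUSE values, and `np.unique` and `np.eye` are used here in place of `LabelEncoder` and `to_categorical`.

```python
import numpy as np

# Hypothetical label strings; the real ROOT_CAUSE values are not shown in this notebook
labels = np.array(["CAUSE_B", "CAUSE_A", "CAUSE_C", "CAUSE_A"])

# Step 1: map each distinct string to an ordinal number (classes come back sorted)
classes, codes = np.unique(labels, return_inverse=True)

# Step 2: one-hot encode by picking rows of the identity matrix
one_hot = np.eye(len(classes))[codes]

# Decoding goes the other way: argmax recovers the ordinal, which indexes into classes
decoded = classes[one_hot.argmax(axis=1)]

print(codes)
print(one_hot)
print(decoded)
```

In the notebook itself, `le.inverse_transform` would play the role of the final decoding step.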


## 3 - Create the Deep Learning Model

The optimization journey for this model is described below.

The original model (from file Example Solution from Deep Learning Class) used the following design decisions:
- I started with the best practice of a single hidden layer with a number of nodes equal to the average of the input and output layer sizes, a sigmoid activation function on hidden layers (to mirror the binary data), a softmax activation function on the output layer, and a categorical cross-entropy loss function (since this was a multi-class classification).  Ultimately, I ended up adding one more layer and a few more input nodes.
- I couldn't get accuracy higher than ~85% in training and testing on the class example data set, or higher than 86% in training and 76% in testing on my own data set.

Per community feedback (thanks to ChatGPT), I started playing around more with optimizations to this model.  Those are described below:
- I added more neurons in the first layer and made the count a power of 2 (like 16, 32).
- I added another layer.
- I played around with the activation function on hidden layers and moved from sigmoid to ReLU.

My findings from these optimizations were:
- A third hidden layer helped with accuracy.  A fourth made it worse.
- The best combination for neurons was 32 (layer 1), 32 (layer 2), 16 (layer 3).  I could get 78% accuracy during testing in that case (although only 83% during training).
- ReLU didn't help the accuracy, so I went back to sigmoid.  It may have helped if I had converted the input from binary to float, which I didn't end up trying out.

---
That said, none of these really moved the needle much.  This surprised me since I created the input file, and I knew I was using a basic linear function to directly map my independent variables to the target.  Given that I (for the sake of the exercise) gave my model data with no errors, I had expected it to eventually get to 100% accuracy.  Since it wasn't, I went back and explored my data.

It turns out that, while I thought I was using binary values for all independent variables to generate an integer from 0-2 (which I then converted to a string), I was actually using floating point values.  I just didn't notice this because I had formatted the Excel cells to display zero decimal places.

I went back and converted all my independent variables to true binary values (which were true zeros and ones) and then applied my linear function to them.  **Once I ran the model on my new data set, I got to 100% accuracy very quickly (~70 epochs).**
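
A quick check along these lines would have caught the problem earlier. It is a minimal sketch, assuming the feature values are available as a numeric array; the helper name `non_binary_columns` and the sample values are made up for illustration and are not taken from the actual input file.

```python
import numpy as np

def non_binary_columns(features):
    """Return the indices of columns holding any value other than exactly 0 or 1."""
    features = np.asarray(features, dtype=float)
    is_binary = np.isin(features, (0.0, 1.0))
    return [int(i) for i in np.where(~is_binary.all(axis=0))[0]]

# Made-up sample: the third column holds floats that a spreadsheet formatted
# to zero decimal places would display as 0, 1, and 1
sample = np.array([
    [1.0, 0.0, 0.4],
    [0.0, 1.0, 0.6],
    [1.0, 1.0, 1.0],
])

bad_columns = non_binary_columns(sample)
print("Columns with non-binary values:", bad_columns)
```

Running something like this on `X_data` right after loading would flag any column whose displayed zeros and ones are really rounded floats.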

In [40]:
# Create the DL model
from tensorflow import keras
from tensorflow.keras import layers

HIDDEN_LAYER1_NODES = 32
HIDDEN_LAYER2_NODES = 32
HIDDEN_LAYER3_NODES = 16
#HIDDEN_LAYER4_NODES = 16 # removed because decreases accuracy

# Create a simple sequential model that takes all 7 columns of input and delivers one of the three target output values
model = keras.Sequential([
    layers.Dense(HIDDEN_LAYER1_NODES, input_shape=(7,), name='HiddenLayer1', activation='sigmoid'), 
    layers.Dense(HIDDEN_LAYER2_NODES, name='HiddenLayer2', activation='sigmoid'), 
    layers.Dense(HIDDEN_LAYER3_NODES, name='HiddenLayer3', activation='sigmoid'), 
#    layers.Dense(HIDDEN_LAYER3_NODES, name='HiddenLayer4', activation='sigmoid'), 
    layers.Dense(3, name='OutputLayer', activation='softmax')
])

# Compile the model with categorical cross-entropy loss and monitor accuracy.
# No optimizer is passed, so Keras falls back to its default ('rmsprop').
model.compile(loss='categorical_crossentropy', metrics=['accuracy'])

# Print a summary of the model architecture
model.summary() 

Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 HiddenLayer1 (Dense)        (None, 32)                256       
                                                                 
 HiddenLayer2 (Dense)        (None, 32)                1056      
                                                                 
 HiddenLayer3 (Dense)        (None, 16)                528       
                                                                 
 OutputLayer (Dense)         (None, 3)                 51        
                                                                 
Total params: 1891 (7.39 KB)
Trainable params: 1891 (7.39 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
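
The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input-output pair plus one bias per output node. The short sketch below redoes that arithmetic for the layer sizes used above; the helper name `dense_params` is just for illustration.

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output node."""
    return (n_inputs + 1) * n_outputs

# Input width followed by the width of each Dense layer in the model
layer_sizes = [7, 32, 32, 16, 3]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [256, 1056, 528, 51] 1891
```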


## 4 - Train and Evaluate Model

We can now train and evaluate the model.  I used the following heuristics when choosing the hyperparameters for training:
- Batch size of a power of 2.
- Number of epochs started at 10 and grew to 300 to increase accuracy (70 was enough once the data was fixed).
- Validation split of 0.2, per best practice, so that roughly 20% of the training data is held out as validation data.
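
To make these numbers concrete, the sketch below works out how the 900 training rows from step 2 are divided up under the settings used in the next cell (batch size 16, 70 epochs, validation split 0.2). It assumes Keras holds out the last 20% of the rows passed to `fit` for validation and trains on the rest; the variable names are only illustrative.

```python
import math

TRAIN_ROWS = 900         # rows in X_train, from the shapes printed in step 2
BATCH_SIZE = 16
EPOCHS = 70
VALIDATION_SPLIT = 0.2

# Rows used for weight updates vs. rows held out for validation
fit_rows = math.floor(TRAIN_ROWS * (1 - VALIDATION_SPLIT))
val_rows = TRAIN_ROWS - fit_rows

# One weight update per batch; the last batch of an epoch may be partial
steps_per_epoch = math.ceil(fit_rows / BATCH_SIZE)
total_updates = steps_per_epoch * EPOCHS

print(fit_rows, val_rows, steps_per_epoch, total_updates)
```

So the model actually learns from 720 rows, and the remaining 180 are only used to report validation loss and accuracy after each epoch.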

In [41]:
# Train and evaluate the model

# Set key hyperparameters as constants so they are easy to modify

BATCH_SIZE = 16 # power of 2
EPOCH_SIZE = 70 
VALIDATION_SPLIT = 0.2 # roughly 20% of data is validation data

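# This is where training occurs.  Keras holds out the last VALIDATION_SPLIT fraction of the training data for validation.
# For each of the EPOCH_SIZE epochs, it trains on the remaining data in batches of BATCH_SIZE rows,
# then reports loss and accuracy on the held-out validation data (which is never used to update the weights).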

print("\n ----------------- Starting training -----------------\n")
model.fit(X_train, Y_train, epochs=EPOCH_SIZE, batch_size=BATCH_SIZE, verbose=1, validation_split=VALIDATION_SPLIT)
print("\n ----------------- Training finished -----------------\n")

# This is where we test our model and see how it did.

print("\n ----------------- Starting evaluation -----------------\n")
loss, accuracy = model.evaluate(X_test, Y_test, verbose=1)
print("\n ----------------- Evaluation results -----------------\n")
print(f"Delta between predicted and actual values for model (loss): {loss:.4f}")
print(f"Accuracy for model: {accuracy:.4f}")


 ----------------- Starting training -----------------

Epoch 1/70
...
Epoch 70/70

 ----------------- Training finished -----------------


 ----------------- Starting evaluation ---------------