**CSI 4106 Introduction to Artificial Intelligence** <br/>
*Assignment 3: Neural Networks*

# Identification

Name: Raj Badial<br/>
Student Number: 300173931<br/>
Tasks: Steps 1-7

Name: Shereen Etemad<br/>
Student Number: 300186291<br/>
Tasks: Steps 8 & 9

We split the work this way because Steps 8 and 9 took by far the longest. We also helped each other throughout the assignment.

## 1. Exploratory Analysis

### Loading the dataset

A custom dataset has been created for this assignment. It has been made available on a public GitHub repository:

- [github.com/turcotte/csi4106-f24/tree/main/assignments-data/a3](https://github.com/turcotte/csi4106-f24/tree/main/assignments-data/a3)

Access and read the dataset directly from this GitHub repository in your Jupyter notebook.

You can use this code cell for your import statements and other initializations.

In [1]:
# Code cell
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.dummy import DummyClassifier
from tensorflow.keras.models import Sequential # type: ignore
from tensorflow.keras.layers import Dense, Input, Dropout # type: ignore
from tensorflow.keras.callbacks import EarlyStopping # type: ignore
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, make_scorer
from sklearn.model_selection import cross_validate, GridSearchCV
import tensorflow as tf # type: ignore
import matplotlib.pyplot as plt
from tensorflow.keras.regularizers import l2 # type: ignore
from tensorflow.keras.optimizers import Adam # type: ignore

#1: Load the Dataset

# URLs for the datasets (replace with actual URLs if these are not correct)
train_url = 'https://raw.githubusercontent.com/turcotte/csi4106-f24/refs/heads/main/assignments-data/a3/cb513_train.csv'
validation_url = 'https://raw.githubusercontent.com/turcotte/csi4106-f24/refs/heads/main/assignments-data/a3/cb513_valid.csv'
test_url = 'https://raw.githubusercontent.com/turcotte/csi4106-f24/refs/heads/main/assignments-data/a3/cb513_test.csv'

# Load the datasets
train_data = pd.read_csv(train_url, header=None)
validation_data = pd.read_csv(validation_url, header=None)
test_data = pd.read_csv(test_url, header=None)

### Data Pre-Processing

2. **Shuffling the Rows**:

    - Since examples are generated by sliding a window across each protein sequence, most adjacent examples originate from the same protein and share 20 positions. To mitigate the potential negative impact on model training, the initial step involves shuffling the **rows** of the data matrix.

In [None]:
# Code cell
shuffled_training = train_data.sample(frac=1, random_state=42).reset_index(drop=True)
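To illustrate why adjacent rows overlap before shuffling, here is a minimal sketch of window extraction. The window length of 21 is an assumption inferred from the statement that neighbouring examples share 20 positions; `sliding_windows` and the toy sequence are hypothetical and are not part of the assignment's data pipeline.

```python
def sliding_windows(sequence, window=21):
    """Return every contiguous window of length `window` (stride 1)."""
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]

# Toy "protein" of 25 residues, labelled by position for readability.
protein = list(range(25))
windows = sliding_windows(protein)

# Two consecutive windows differ by one position at each end.
shared = len(set(windows[0]) & set(windows[1]))
print(len(windows), "windows; consecutive windows share", shared, "positions")
```

Because consecutive rows are nearly identical, an unshuffled split could place almost-duplicate examples next to each other in the same mini-batch, which is what the shuffle is meant to avoid.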

3. **Scaling of Numerical Features**:

    - Since all 462 features are proportions represented as values between 0 and 1, scaling may not be necessary. In our evaluations, using [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) actually degraded model performance. Within your pipeline, compare the effects of not scaling the data versus applying [MinMaxScaler](https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.MinMaxScaler.html). In the interest of time, a single experiment will suffice. It is important to note that when scaling is applied, a uniform method should be used across all columns, given their homogeneous nature.

In [None]:
# Code cell
y = shuffled_training.iloc[:, 0]    # target vector
X = shuffled_training.iloc[:, 1:]   # features
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Option 1: Without scaling
model = DecisionTreeClassifier(max_depth=10, random_state=42)
model.fit(X_train, y_train)
no_scaling_score = model.score(X_val, y_val)

# Option 2: With MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

model.fit(X_train_scaled, y_train)
scaling_score = model.score(X_val_scaled, y_val)

# Compare results
print("Score without scaling:", no_scaling_score)
print("Score with MinMaxScaler:", scaling_score)

Score without scaling: 0.5514195042456471 <br>
Score with MinMaxScaler: 0.5514195042456471
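The two scores being identical is expected for a decision tree: it splits on thresholds, and min-max scaling preserves the ordering of each feature's values. The sketch below illustrates this with plain Python; `min_max_scale` and `rank_order` are hypothetical helpers written for illustration, not scikit-learn functions.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]; constant columns map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rank_order(values):
    """Indices of the values sorted ascending (ties broken by index)."""
    return sorted(range(len(values)), key=lambda i: (values[i], i))

feature = [0.10, 0.45, 0.30, 0.05, 0.45]   # toy proportions in [0, 1]
scaled = min_max_scale(feature)

# Same ordering before and after scaling, so any threshold split
# separates the samples into the same two groups.
print(rank_order(feature) == rank_order(scaled))
```

A model that is sensitive to feature magnitudes, such as a neural network or KNN, would be a more informative choice for this scaling comparison.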

4. **Isolating the Target and the Data**:

    - In the CSV files, the target and data are combined. To prepare for our machine learning experiments, separate the training data $X$ and the target vector $y$ for each of the three datasets.

In [None]:
# Code cell
# Step 4: Isolating the data for each dataset (already done for training in step 3, labeled just as y and x)
# Note: X_val and y_val used in later steps come from the train_test_split in Step 3, not from these files.
y_validation = validation_data.iloc[:, 0]    # target vector
X_validation = validation_data.iloc[:, 1:]   # features

y_test = test_data.iloc[:, 0]    # target vector
X_test = test_data.iloc[:, 1:]   # features

### Model Development & Evaluation

5. **Model Development**:

    - **Dummy Model**: Implement a model utilizing the [DummyClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html). This model disregards the input data and predicts the majority class. Such a model is sometimes called a straw man model.

    - **Baseline Model**: As a baseline model, select one of the previously studied machine learning algorithms: Decision Trees, K-Nearest Neighbors (KNN), or Logistic Regression. Use the default parameters provided by scikit-learn to train each model as a baseline. Why did you choose this particular classifier? Why do you think it should be appropriate for this specific task?

    - **Neural Network Model**: Utilizing [Keras](https://keras.io) and [TensorFlow](https://www.tensorflow.org), construct a sequential model comprising an input layer, a hidden layer, and an output layer. The input layer should consist of 462 nodes, reflecting the 462 attributes of each example. The hidden layer should include 8 nodes and employ the default activation function. The output layer should contain three nodes, corresponding to the three classes: helix (0), sheet (1), and coil (2). Apply the softmax activation function to the output layer to ensure that the outputs are treated as probabilities, with their sum equaling 1 for each training example.

    We therefore have three models: dummy, baseline, and neural network.

In [None]:
# Code cell
# Step 5
# Implementing Dummy model
dummy_model = DummyClassifier(strategy="most_frequent")
dummy_model.fit(X_train, y_train)
y_val_pred_dummy = dummy_model.predict(X_val)

# Implementing Baseline model (with DecisionTree)
baseline_model = DecisionTreeClassifier(max_depth=10, random_state=42)
baseline_model.fit(X_train, y_train)
y_val_pred_baseline = baseline_model.predict(X_val)

# We chose a decision tree because many of the relationships are likely non-linear, which trees can model,
# whereas logistic regression is better suited to linear decision boundaries. KNN tends to work best on smaller
# datasets where the data cluster naturally around each class, and the high dimensionality (462 features) may hurt it.
# Decision trees are prone to overfitting, however.

# Implementing Neural Network Model
def create_neural_network(hidden_nodes):
    model = Sequential()
    model.add(Input(shape=(462,)))                                # Input layer with 462 nodes
    model.add(Dense(hidden_nodes, activation='relu'))             # Hidden layer with 8 nodes (using hidden_nodes to reuse this function later for step 8)
    model.add(Dense(3, activation='softmax'))                     # Output layer with 3 nodes
    
    # Compile the model
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

nn_model = create_neural_network(8)
history = nn_model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val)) # train the model
y_val_pred_nn = nn_model.predict(X_val).argmax(axis=1) # because we're using softmax in the NNM
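The role of softmax and the subsequent `argmax` can be sketched in plain Python. This is an illustrative re-implementation, not the Keras code path; the logits are made-up values.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]          # hypothetical raw scores for helix, sheet, coil
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # argmax

print([round(p, 3) for p in probs], "sum =", round(sum(probs), 6))
print("predicted class:", predicted_class)
```

The outputs are non-negative and sum to 1, and taking the index of the largest probability yields the predicted class label, which is what `.argmax(axis=1)` does row by row in the cell above.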

6. **Model Evaluation**:

    - Employ cross-validation to assess the performance of the baseline model. Select a small number of folds to prevent excessive computational demands.
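As a rough illustration of what k-fold cross-validation does, the sketch below partitions sample indices into contiguous folds. `k_fold_indices` is a hypothetical helper; note that scikit-learn uses stratified folds for classifiers when `cv` is an integer, which this simplified version does not attempt.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(12, k=5))
for train_idx, test_idx in folds:
    print("test:", test_idx)
```

Each sample appears in exactly one test fold, so every example contributes once to the averaged score.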

In [None]:
# Code cell
# Step 6: Evaluation of the models
# Dummy Model
accuracy_dummy = accuracy_score(y_val, y_val_pred_dummy)
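# Note: zero_division=1 assigns precision 1.0 to the classes the dummy never predicts,
# which inflates the weighted average.
precision_dummy = precision_score(y_val, y_val_pred_dummy, average='weighted', zero_division=1)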
recall_dummy = recall_score(y_val, y_val_pred_dummy, average='weighted')
f1_dummy = f1_score(y_val, y_val_pred_dummy, average='weighted')

print("\nDummy Model Performance:")
print(f"Accuracy: {accuracy_dummy:.2f}")
print(f"Precision: {precision_dummy:.2f}")
print(f"Recall: {recall_dummy:.2f}")
print(f"F1 Score: {f1_dummy:.2f}")

# Baseline model with 5-fold cross-validation
scoring = {
    'accuracy': make_scorer(accuracy_score), 
    'precision': make_scorer(precision_score, average='weighted'),
    'recall': make_scorer(recall_score, average='weighted'),
    'f1': make_scorer(f1_score, average='weighted')
}

# Perform cross-validation
cv_results = cross_validate(baseline_model, X_train, y_train, cv=5, scoring=scoring)

# Display average cross-validation scores
print("\nBaseline Model Performance:")
print(f"Accuracy: {cv_results['test_accuracy'].mean():.4f}")
print(f"Decision Tree CV Average Precision: {cv_results['test_precision'].mean():.4f}")
print(f"Decision Tree CV Average Recall: {cv_results['test_recall'].mean():.4f}")
print(f"Decision Tree CV Average F1-Score: {cv_results['test_f1'].mean():.4f}")

# NN Model
accuracy_nn = accuracy_score(y_val, y_val_pred_nn)
precision_nn = precision_score(y_val, y_val_pred_nn, average='weighted')
recall_nn = recall_score(y_val, y_val_pred_nn, average='weighted')
f1_nn = f1_score(y_val, y_val_pred_nn, average='weighted')

print("\nNeural Network Model Performance:")
print(f"Accuracy: {accuracy_nn:.2f}")
print(f"Precision: {precision_nn:.2f}")
print(f"Recall: {recall_nn:.2f}")
print(f"F1 Score: {f1_nn:.2f}")

Dummy Model Performance: <br>
Accuracy: 0.42<br>
Precision: 0.76<br>
Recall: 0.42<br>
F1 Score: 0.24<br>

Baseline Model Performance:<br>
Accuracy: 0.5552<br>
Decision Tree CV Average Precision: 0.5517<br>
Decision Tree CV Average Recall: 0.5552<br>
Decision Tree CV Average F1-Score: 0.5466<br>

Neural Network Model Performance:<br>
Accuracy: 0.70<br>
Precision: 0.70<br>
Recall: 0.70<br>
F1 Score: 0.70

    - Assess the models using metrics such as precision, recall, and F1-score.
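The dummy model's precision of 0.76 alongside an accuracy of 0.42 looks surprising, so here is a sketch of how support-weighted precision behaves when some classes are never predicted. `weighted_precision` is a hypothetical re-implementation for illustration; the 42% majority share mirrors the reported dummy accuracy, while the split of the remaining 58% between the other two classes is invented.

```python
from collections import Counter

def weighted_precision(y_true, y_pred, zero_division=0.0):
    """Support-weighted mean of per-class precision."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        # Precision is undefined when a class is never predicted.
        precision = correct / predicted if predicted else zero_division
        score += (count / total) * precision
    return score

# Toy labels: 42% of samples belong to class 2, which the dummy always predicts.
y_true = [2] * 42 + [0] * 33 + [1] * 25
y_pred = [2] * 100

print(round(weighted_precision(y_true, y_pred, zero_division=1.0), 2))  # 0.76
print(round(weighted_precision(y_true, y_pred, zero_division=0.0), 2))  # 0.18
```

Under these assumptions, the 0.76 figure is an artifact of counting undefined precision as 1.0; with undefined precision counted as 0, the dummy's weighted precision drops well below both other models.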

### Hyperparameter Optimization

7. **Baseline Model:**

    - To ensure a fair comparison for our baseline model, we will examine how varying hyperparameter values affect its performance. This prevents the erroneous conclusion that neural networks inherently perform better, when in fact, appropriate hyperparameter tuning could enhance the baseline model's performance.

    - Focus on the following relevant hyperparameters for each model:

        - [DecisionTreeClassifier](https://scikit-learn.org/dev/modules/generated/sklearn.tree.DecisionTreeClassifier.html): `criterion` and `max_depth`.
  
        - [LogisticRegression](https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LogisticRegression.html): `penalty`, `max_iter`, and `tol`.
  
        - [KNeighborsClassifier](https://scikit-learn.org/dev/modules/generated/sklearn.neighbors.KNeighborsClassifier.html): `n_neighbors` and `weights`.

    - Employ a grid search strategy or utilize scikit-learn's built-in methods [GridSearchCV](https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.GridSearchCV.html) to thoroughly evaluate all combinations of hyperparameter values. Cross-validation should be used to assess each combination.

    - Quantify the performance of each hyperparameter configuration using precision, recall, and F1-score as metrics.

    - Analyze the findings and offer insights into which hyperparameter configurations achieved optimal performance for each model.

In [None]:
# Code cell
# Step 7: Hyperparameter Optimization for Baseline model
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 10, 20, 30],   # Limits depth to avoid overfitting
    'min_samples_split': [2, 5, 10]    # Controls when a node will be split
}

scoring = {
    'precision': make_scorer(precision_score, average='weighted'),
    'recall': make_scorer(recall_score, average='weighted'),
    'f1': make_scorer(f1_score, average='weighted')
}

# Initialize GridSearchCV with the Decision Tree model and 5-fold cross-validation
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid=param_grid,
    scoring=scoring,
    refit='f1',                      # Refits on the best parameter setting based on F1 score
    cv=5,
    verbose=1,
    n_jobs=-1                        # Use all available CPU cores
)

grid_search.fit(X_train, y_train)

# best parameters found
print("Best parameters found:", grid_search.best_params_)

# average precision and recall scores for the best model
best_cv_results = grid_search.cv_results_
best_index = grid_search.best_index_
print("Best Cross-Validated Precision:", best_cv_results['mean_test_precision'][best_index])
print("Best Cross-Validated Recall:", best_cv_results['mean_test_recall'][best_index])
print("Best Cross-Validated F1 Score:", best_cv_results['mean_test_f1'][best_index])

Best parameters found: {'criterion': 'entropy', 'max_depth': 10, 'min_samples_split': 5}<br>
Best Cross-Validated Precision: 0.554962721731411<br>
Best Cross-Validated Recall: 0.5570424802596701<br>
Best Cross-Validated F1 Score: 0.5471584699757781
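To make the cost of the exhaustive search concrete, the sketch below enumerates the same grid with `itertools.product`. It only counts configurations and does not train anything.

```python
from itertools import product

param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
}

# Cartesian product of all value lists, one dict per configuration.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5
print(len(combinations), "configurations,", len(combinations) * n_folds, "cross-validation fits")
```

With `refit` enabled, the grid search also performs one additional fit of the selected configuration on the full training data.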

8. **Neural Network:**

    In our exploration and tuning of neural networks, we focus on the following hyperparameters:

    - **Single hidden layer, varying the number of nodes**. 

        - Start with a single node in the hidden layer. Use a graph to depict the progression of loss and accuracy for both the training and validation sets, with the horizontal axis representing the number of training epochs and the vertical axis showing loss and accuracy. Training this network should be relatively fast, so let's conduct training for 50 epochs. Observing the graph, what do you conclude? Is the network underfitting or overfitting? Why?

        - Repeat the above process using 2 and 4 nodes in the hidden layer. Use the same type of graph to document your observations regarding loss and accuracy.

        - Start with 8 nodes in the hidden layer and progressively double the number of nodes until it surpasses the number of nodes in the input layer. This results in seven experiments and corresponding graphs for the following configurations: 8, 16, 32, 64, 128, 256, and 512 nodes. Document your observations throughout the process.
        
        - Ensure that the **number of training epochs** is adequate for **observing an increase in validation loss**. **Tip**: During model development, start with a small number of epochs, such as 5 or 10. Once the model appears to perform well, test with larger values, like 40 or 80 epochs, which proved reasonable in our tests. Based on your observations, consider conducting further experiments, if needed. How many epochs were ultimately necessary?

In [None]:
# Code cell
# Step 8 (Neural Network Experimentation)
# Varying nodes in hidden layer:

# List of node counts to try in the hidden layer
node_counts = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

# Dictionary to store the validation accuracy for each configuration
results = {}

# Loop through each node configuration, train the model, and record the validation accuracy
for nodes in node_counts:
    print(f"\nTraining model with {nodes} hidden nodes...")
    nn_model = create_neural_network(hidden_nodes=nodes) # Same function from Step 5
    history = nn_model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val), verbose=0)
    
    # Plot training and validation loss and accuracy
    epochs = range(1, len(history.history['loss']) + 1)

    plt.figure(figsize=(12, 5))
    
    # Plot Loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, history.history['loss'], 'b', label='Training Loss')
    plt.plot(epochs, history.history['val_loss'], 'r', label='Validation Loss')
    plt.title(f'Loss for {nodes} Nodes in Hidden Layer')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    
    # Plot Accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, history.history['accuracy'], 'b', label='Training Accuracy')
    plt.plot(epochs, history.history['val_accuracy'], 'r', label='Validation Accuracy')
    plt.title(f'Accuracy for {nodes} Nodes in Hidden Layer')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    
    plt.tight_layout()
    plt.show()

![alt text](<Screenshot 2024-11-09 224237.png>) ![alt text](<Screenshot 2024-11-09 224340.png>) ![alt text](<Screenshot 2024-11-09 224450.png>) ![alt text](<Screenshot 2024-11-09 224608.png>) ![alt text](<Screenshot 2024-11-09 224728.png>) ![alt text](<Screenshot 2024-11-09 224849.png>) ![alt text](<Screenshot 2024-11-09 225010.png>) ![alt text](<Screenshot 2024-11-09 225149.png>) ![alt text](<Screenshot 2024-11-09 225350.png>) ![alt text](<Screenshot 2024-11-09 225640.png>)
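One way to answer "how many epochs were necessary?" is to locate the epoch at which validation loss stops improving. The sketch below applies a patience rule to a list of per-epoch validation losses such as `history.history['val_loss']`. The helper `best_epoch_with_patience` and the loss values are hypothetical; the notebook imports Keras's `EarlyStopping` callback, which offers a similar mechanism, but this is not its implementation.

```python
def best_epoch_with_patience(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs, or when the list is exhausted.
    """
    best_loss = float('inf')
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation-loss curve: improves, then drifts upward.
val_losses = [0.95, 0.82, 0.76, 0.73, 0.72, 0.74, 0.75, 0.77, 0.80, 0.84]
print(best_epoch_with_patience(val_losses, patience=3))  # (8, 5)
```

A rising validation loss while training loss keeps falling is the usual sign of overfitting, so the best epoch gives a reasonable upper bound on useful training time for that configuration.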

    - **Varying the number of layers**.

        - Conduct similar experiments as described above, but this time vary the number of layers from 1 to 4. Document your findings.

        - How many nodes should each layer contain? Test at least two scenarios. Traditionally, a common strategy involved decreasing the number of nodes from the input layer to the output layer, often by halving, to create a pyramid-like structure. However, recent experience suggests that maintaining a constant number of nodes across all layers can perform equally well. Describe your observations. It is acceptable if both strategies yield similar performance results.

        - Select one of your models that exemplifies overfitting. In our experiments, we easily constructed a model achieving nearly 100% accuracy on the training data, yet showing no similar improvement on the validation set. Present this neural network along with its accuracy and loss graphs. Explain the reasoning for concluding that the model is overfitting.

In [None]:
# Code cell
# Varying number of layers:
def create_neural_network2(num_layers, nodes_per_layer=8):
    model = Sequential()
    model.add(Input(shape=(462,)))  # Input layer with 462 features
    
    # Add the specified number of hidden layers with the same number of nodes per layer
    for _ in range(num_layers):
        model.add(Dense(nodes_per_layer, activation='relu'))
    
    model.add(Dense(3, activation='softmax'))  # Output layer with 3 nodes for 3 classes
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# List of layer counts to try
layer_counts = [1, 2, 3, 4, 5]

# Loop through each configuration, train the model, and plot the results
for layers in layer_counts:
    print(f"\nTraining model with {layers} hidden layers")
    nn_model = create_neural_network2(num_layers=layers)
    history = nn_model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val), verbose=0)
    
    # Plot training and validation loss and accuracy
    epochs = range(1, len(history.history['loss']) + 1)

    plt.figure(figsize=(12, 5))
    
    # Plot Loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, history.history['loss'], 'b', label='Training Loss')
    plt.plot(epochs, history.history['val_loss'], 'r', label='Validation Loss')
    plt.title(f'Loss for {layers} Hidden Layers')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    
    # Plot Accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, history.history['accuracy'], 'b', label='Training Accuracy')
    plt.plot(epochs, history.history['val_accuracy'], 'r', label='Validation Accuracy')
    plt.title(f'Accuracy for {layers} Hidden Layers')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    
    plt.tight_layout()
    plt.show()

![alt text](<Screenshot 2024-11-09 225759.png>) ![alt text](<Screenshot 2024-11-09 225913.png>) ![alt text](<Screenshot 2024-11-09 230024.png>) ![alt text](<Screenshot 2024-11-09 230139.png>) ![alt text](<Screenshot 2024-11-09 230256.png>)
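The pyramid and constant-width strategies described above can be compared by their layer sizes and trainable parameter counts. This is a sketch with hypothetical helpers (`pyramid_widths`, `constant_widths`, `dense_param_count`); the starting width of 64 is an arbitrary choice for illustration, not a configuration from the experiments.

```python
def pyramid_widths(first, n_layers):
    """Halve the width at each successive hidden layer (minimum 1)."""
    return [max(1, first // (2 ** i)) for i in range(n_layers)]

def constant_widths(width, n_layers):
    """Same width for every hidden layer."""
    return [width] * n_layers

def dense_param_count(n_inputs, hidden_widths, n_outputs):
    """Weights plus biases for a fully connected stack."""
    sizes = [n_inputs] + list(hidden_widths) + [n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))

for widths in (pyramid_widths(64, 3), constant_widths(64, 3)):
    print(widths, dense_param_count(462, widths, 3))
```

Either list of widths could be fed to a builder that adds one `Dense` layer per entry. Most parameters sit in the first layer because of the 462 inputs, so the two strategies differ less in capacity than the layer shapes might suggest.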

    - **Activation function**.

        - Present results for one of the configurations mentioned above by varying the activation function. Test at least `relu` (the default) and `sigmoid`. The choice of the specific model, including the number of layers and nodes, is at your discretion. Document your observations accordingly.

In [None]:
# Code cell
# Changing the activation function
def create_neural_network3(activation='relu', num_layers=2, nodes_per_layer=8):
    model = Sequential()
    model.add(Input(shape=(462,)))  # Input layer with 462 features
    
    # Hidden layers with the specified activation function
    for _ in range(num_layers):
        model.add(Dense(nodes_per_layer, activation=activation))
    
    model.add(Dense(3, activation='softmax'))  # Output layer with 3 nodes for 3 classes
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# List of activation functions
activation_functions = ['relu', 'sigmoid']

# Loop through each activation function, train the model, and plot the results
for activation in activation_functions:
    print(f"\nTraining model with {activation} activation function")
    nn_model = create_neural_network3(activation=activation)
    history = nn_model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val), verbose=0)
    
    # Plot training and validation loss and accuracy
    epochs = range(1, len(history.history['loss']) + 1)

    plt.figure(figsize=(12, 5))
    
    # Plot Loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, history.history['loss'], 'b', label='Training Loss')
    plt.plot(epochs, history.history['val_loss'], 'r', label='Validation Loss')
    plt.title(f'Loss with {activation} Activation Function')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    
    # Plot Accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, history.history['accuracy'], 'b', label='Training Accuracy')
    plt.plot(epochs, history.history['val_accuracy'], 'r', label='Validation Accuracy')
    plt.title(f'Accuracy with {activation} Activation Function')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    
    plt.tight_layout()
    plt.show()

![alt text](<Screenshot 2024-11-09 230403.png>) ![alt text](<Screenshot 2024-11-09 230510.png>)
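For reference, the two activation functions and their derivatives can be written out directly. This is a plain-Python sketch for intuition, not the Keras implementation.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0)."""
    return 1.0 if x > 0 else 0.0

def sigmoid_grad(x):
    """Derivative of sigmoid: s * (1 - s), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-4.0, 0.0, 4.0):
    print(x, relu(x), round(sigmoid(x), 4), relu_grad(x), round(sigmoid_grad(x), 4))
```

The sigmoid gradient never exceeds 0.25 and shrinks quickly away from zero, which may partly explain slower convergence with `sigmoid` in stacked hidden layers; whether that shows up in these particular runs depends on the plots above.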

    - **Regularization** in neural networks is a technique used to prevent overfitting.

        - One technique involves adding a penalty to the loss function to discourage excessively complex models. Apply an `l2` penalty to some or all layers. Exercise caution, as overly aggressive penalties have been problematic in our experiments. Begin with the default `l2` value of 0.01, then reduce it to 0.001 and 1e-4. Select a specific model from the above experiments and present a case where you successfully reduced overfitting. Include a pair of graphs comparing results with and without regularization. Explain your rationale to conclude that overfitting has been reduced. Do not expect to completely eliminate overfitting. Again, this is a challenging dataset to work with.

        - Dropout layers are a regularization technique in neural networks where a random subset of neurons is temporarily removed during training. This helps prevent overfitting by promoting redundancy and improving the network's ability to generalize to new data. Select a specific model from the above experiments where you have multiple layers and experiment with adding one or a few dropout layers to your network. Experiment with two different rates, say 0.25 and 0.5. Document your observations.
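The `l2` penalty mentioned above adds a term proportional to the sum of squared weights to the loss. The sketch below shows the arithmetic for the three suggested strengths; the weights and the data loss are made-up numbers, and `l2_penalty` is a hypothetical helper rather than the Keras regularizer.

```python
def l2_penalty(weights, lam):
    """lam * sum of squared weights, added to the data loss."""
    return lam * sum(w * w for w in weights)

weights = [0.8, -1.2, 0.05, 2.0]       # hypothetical kernel weights
data_loss = 0.70                       # hypothetical cross-entropy value

for lam in (0.01, 0.001, 1e-4):
    total = data_loss + l2_penalty(weights, lam)
    print(lam, round(total, 5))
```

Because the penalty is part of the reported loss, training and validation loss curves for regularized models sit higher than unregularized ones even when accuracy is similar, so accuracy gaps are the safer basis for comparing overfitting across configurations.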

In [None]:
# Code cell
# Regularization techniques
def create_neural_network4(l2_penalty=0.0, dropout_rate=0.0):
    model = Sequential()
    model.add(Input(shape=(462,)))
    model.add(Dense(8, activation='relu', kernel_regularizer=l2(l2_penalty)))  # L2 regularization on the hidden layer
    if dropout_rate > 0:
        model.add(Dropout(dropout_rate))  # Apply dropout if rate > 0
    model.add(Dense(3, activation='softmax'))
    
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# Define different regularization configurations to try
regularization_configs = [
    {'l2_penalty': 0.01, 'dropout_rate': 0.0}, # Only L2 regularization
    {'l2_penalty': 0.001, 'dropout_rate': 0.0}, # Smaller L2 regularization
    {'l2_penalty': 0.01, 'dropout_rate': 0.25}, # Both L2 and Dropout of 0.25
    {'l2_penalty': 0.001, 'dropout_rate': 0.25}, # Smaller L2 with 0.25 dropout
    {'l2_penalty': 0.01, 'dropout_rate': 0.5}, # L2 with 0.5 dropout
    {'l2_penalty': 0.001, 'dropout_rate': 0.5}, # Smaller L2 with 0.5 dropout 
    {'l2_penalty': 0.0, 'dropout_rate': 0.25}, # Only Dropout of 0.25
    {'l2_penalty': 0.0, 'dropout_rate': 0.5} # Only Dropout of 0.5
]

# Loop through each regularization configuration
for config in regularization_configs:
    l2_penalty = config['l2_penalty']
    dropout_rate = config['dropout_rate']
    print(f"\nTraining model with L2 penalty={l2_penalty} and Dropout rate={dropout_rate}")
    
    nn_model = create_neural_network4(l2_penalty=l2_penalty, dropout_rate=dropout_rate)
    history = nn_model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val), verbose=0)
    
    # Plot training and validation loss and accuracy
    epochs = range(1, len(history.history['loss']) + 1)

    plt.figure(figsize=(12, 5))
    
    # Plot Loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, history.history['loss'], 'b', label='Training Loss')
    plt.plot(epochs, history.history['val_loss'], 'r', label='Validation Loss')
    plt.title(f'Loss with L2={l2_penalty} and Dropout={dropout_rate}')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    
    # Plot Accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, history.history['accuracy'], 'b', label='Training Accuracy')
    plt.plot(epochs, history.history['val_accuracy'], 'r', label='Validation Accuracy')
    plt.title(f'Accuracy with L2={l2_penalty} and Dropout={dropout_rate}')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    
    plt.tight_layout()
    plt.show()

        - Summarize your experiments using a graphical representation such as Figure 6.15 [on this page](https://egallic.fr/Enseignement/ML/ECB/book/deep-learning.html).

![alt text](<Screenshot 2024-11-09 230623.png>) ![alt text](<Screenshot 2024-11-09 230734.png>) ![alt text](<Screenshot 2024-11-09 230846.png>) ![alt text](<Screenshot 2024-11-09 231000.png>) ![alt text](<Screenshot 2024-11-09 231114.png>) ![alt text](<Screenshot 2024-11-09 231222.png>) ![alt text](<Screenshot 2024-11-09 231333.png>) ![alt text](<Screenshot 2024-11-09 231440.png>)
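The dropout mechanism used in these runs can be sketched as follows. This is an illustrative "inverted dropout" in plain Python, where surviving activations are rescaled so their expected value is unchanged; the `dropout` helper, the seed, and the all-ones activations are assumptions for demonstration.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`,
    scale survivors by 1 / (1 - rate) so the expected value is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)                 # fixed seed for reproducibility
activations = [1.0] * 10000

for rate in (0.25, 0.5):
    dropped = dropout(activations, rate, rng)
    zero_fraction = dropped.count(0.0) / len(dropped)
    mean = sum(dropped) / len(dropped)
    print(rate, round(zero_fraction, 2), round(mean, 2))
```

With only 8 hidden nodes, a rate of 0.5 removes about four units per step on average, which may leave too little capacity; that is one plausible reason the lower rate was preferred in the final configuration.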

### Test

9. **Model Comparison**:

    - Evaluate the baseline model on the test set, using the optimal parameter set identified through grid search. Additionally, apply your best-performing neural network configuration to the test set.

    - Quantify the performance of the baseline model (best hyperparameter configuration) and your neural network (best configuration) using precision, recall, and F1-score as metrics. How do these two models compare to the dummy model?

    - Provide recommendations on which model(s) to choose for this task and justify your choices based on the analysis results.

The best Decision Tree hyperparameters were found in Step 7, so we reused them for the model comparison.

The neural network seemed to perform best with:<br>
2 hidden layers<br>
8 nodes per layer<br>
an L2 penalty of 0.01 and a dropout rate of 0.25

We trained for 30 epochs because several configurations showed a slight drop-off after that point, with accuracy mostly peaking between epochs 20 and 30.

In [None]:
# Code cell
# Step 9

# Decision Tree model with specified hyperparameters
decision_tree_model = DecisionTreeClassifier(criterion='entropy', min_samples_split=5, max_depth=10)

# Train the Decision Tree model
decision_tree_model.fit(X_train, y_train)

# Predict on the test set
y_pred_dt = decision_tree_model.predict(X_test)

# Calculate evaluation metrics
accuracy_dt = accuracy_score(y_test, y_pred_dt)
precision_dt = precision_score(y_test, y_pred_dt, average='weighted', zero_division=1)
recall_dt = recall_score(y_test, y_pred_dt, average='weighted')
f1_dt = f1_score(y_test, y_pred_dt, average='weighted')

# Print metrics
print("Decision Tree Model Evaluation")
print("Accuracy:", accuracy_dt)
print("Precision:", precision_dt)
print("Recall:", recall_dt)
print("F1 Score:", f1_dt)

# Define the neural network structure
def create_neural_network5(num_layers, l2_penalty, dropout_rate, nodes_per_layer):
    model = Sequential()
    model.add(Input(shape=(462,)))
    for _ in range(num_layers):
        model.add(Dense(nodes_per_layer, activation='relu', kernel_regularizer=l2(l2_penalty)))  # L2 regularization on the hidden layer
        model.add(Dropout(dropout_rate))
    model.add(Dense(3, activation='softmax'))
    
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

neural_network_model = create_neural_network5(l2_penalty=0.01, dropout_rate=0.25, nodes_per_layer=8, num_layers=2)

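# Train the model (adjust epochs and batch size as needed).
# Note: the test set is passed as validation_data only to monitor per-epoch metrics;
# with no callbacks attached, it does not influence the weight updates.
history = neural_network_model.fit(X_train, y_train, epochs=30, batch_size=32, validation_data=(X_test, y_test))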

# Predict and evaluate the model
y_pred_nn = neural_network_model.predict(X_test).argmax(axis=1)

# Calculate evaluation metrics
accuracy_nn = accuracy_score(y_test, y_pred_nn)
precision_nn = precision_score(y_test, y_pred_nn, average="weighted", zero_division=1)
recall_nn = recall_score(y_test, y_pred_nn, average='weighted')
f1_nn = f1_score(y_test, y_pred_nn, average='weighted')

# Print metrics
print("Neural Network Model Evaluation")
print("Accuracy:", accuracy_nn)
print("Precision:", precision_nn)
print("Recall:", recall_nn)
print("F1 Score:", f1_nn)

Decision Tree Model Evaluation<br>
Accuracy: 0.5589343379978472<br>
Precision: 0.5615071732072793<br>
Recall: 0.5589343379978472<br>
F1 Score: 0.555584610114814<br>

Neural Network Model Evaluation<br>
Accuracy: 0.7070775026910656<br>
Precision: 0.707159434552183<br>
Recall: 0.7070775026910656<br>
F1 Score: 0.7070984732289461
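The comparison against the dummy model is not computed on the test set in the cell above. A majority-class baseline on held-out labels can be sketched without scikit-learn as follows; the label lists are toy stand-ins for `y_train` and `y_test`, and both helpers are hypothetical.

```python
from collections import Counter

def majority_class(labels):
    """Most frequent label in the training targets."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and truth agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy stand-ins for y_train and y_test (0 = helix, 1 = sheet, 2 = coil).
y_train_toy = [2, 2, 0, 1, 2, 0, 2, 1, 0, 2]
y_test_toy = [2, 0, 2, 1, 2]

label = majority_class(y_train_toy)
y_pred_toy = [label] * len(y_test_toy)
print("majority label:", label, "test accuracy:", accuracy(y_test_toy, y_pred_toy))
```

On the real data, the validation results from Step 6 suggest the dummy sits near 0.42 accuracy, so both the decision tree (about 0.56) and the neural network (about 0.71) improve on it, with the neural network the stronger choice on every reported metric.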

# Resources

https://stackoverflow.com/questions/29576430/shuffle-dataframe-rows<br>
https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html <br>
https://www.geeksforgeeks.org/implementing-neural-networks-using-tensorflow/<br>
https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.GridSearchCV.html<br>

AI-generated code, provided when we asked how to display loss and accuracy for each node count in Step 8, part 1. We used it mainly to save time and to keep the code clean and consistent across similar parts:<br>
<br>

for nodes in node_counts:
    print(f"\nTraining model with {nodes} hidden nodes...")
    nn_model = create_neural_network(hidden_nodes=nodes)
    history = nn_model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val), callbacks=[early_stopping], verbose=0)
    
    # Plot training and validation loss and accuracy
    epochs = range(1, len(history.history['loss']) + 1)

    plt.figure(figsize=(12, 5))
    
    # Plot Loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, history.history['loss'], 'b', label='Training Loss')
    plt.plot(epochs, history.history['val_loss'], 'r', label='Validation Loss')
    plt.title(f'Loss for {nodes} Nodes in Hidden Layer')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    
    # Plot Accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, history.history['accuracy'], 'b', label='Training Accuracy')
    plt.plot(epochs, history.history['val_accuracy'], 'r', label='Validation Accuracy')
    plt.title(f'Accuracy for {nodes} Nodes in Hidden Layer')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    
    plt.tight_layout()
    plt.show()