
# Artificial Intelligence
# 464/664
# Assignment #7

## General Directions for this Assignment

00. We're using a Jupyter Notebook environment (tutorial available here: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html),
01. Output format should be exactly as requested (it is your responsibility to make sure the notebook looks as expected on Gradescope),
02. Check submission deadline on Gradescope,
03. Rename the file to Last_First_assignment_7,
04. Submit your notebook (as .ipynb, not PDF) using Gradescope, and
05. Do not submit any other files.

## Before You Submit...

1. Re-read the general instructions provided above, and
2. Hit "Kernel"->"Restart & Run All".

## Neural Networks: Architecture

For this assignment we will explore Neural Networks; in particular, we are going to explore model complexity. We will use the same dataset from Assignment #6 to classify a mushroom as either edible ('e') or poisonous ('p'). You are free to use PyTorch, TensorFlow, scikit-learn -- to name a few resources. The goal is to explore different model complexities (architectures) before declaring a winner. Either start with a simple network and make it more complex; or start with a complex model and pare it down. Either way, your submission should clearly demonstrate your exploration.


Your output for each model should look like the output of `cross_validate` from Assignment #6:

```
Fold: 0	Train Error: 15.38%	Validation Error: 0.00%
Fold: 1
...

Mean(Std. Dev.) over all folds:
-------------------------------
Train Error: 100.00%(0.00%) Test Error: 100.00%(0.00%)
```

Notice that "Test Error" has been replaced by "Validation Error." Split your dataset into train, test, and validation sets.


Start with a simple network. Train using the train set. Observe the model's performance using the validation set.


Increase the complexity of your network. Train using the train set. Observe the model's performance using the validation set.


Model complexity in Assignment #6 was depth limit. You can think of it here as the architecture of the network (number of layers and units per layer). Try at least three different network architectures.


We're trying to find a model complexity that generalizes well. (Recall high bias vs high variance discussion in class.)


Pick the network architecture that you deem best. Use the test set to report your winning model's performance. This is the ONLY time you use the test set.


Try at least three different models; more importantly, document your process: what the results were, how the winning model was determined, what was the winning model's performance on the test data. Clearly highlight these items to receive full credit.

## Importing necessary packages

# Documentation for Imports

### `import pandas as pd`
- **Purpose**:
  Pandas is a powerful library for data manipulation and analysis, particularly for handling structured data like DataFrames.
- **Usage**:
  - Data loading from files (e.g., CSV, Excel).
  - Data cleaning, transformation, and summarization.

---

### `import numpy as np`
- **Purpose**:
  NumPy provides support for efficient numerical computations, especially with arrays and matrices, and includes mathematical functions to operate on these structures.
- **Usage**:
  - Numerical operations on large datasets.
  - Efficient handling of array-based data.
  - Precision display settings for better readability of NumPy output.

**Note**: `np.set_printoptions(precision=3, suppress=True)` displays floating-point values with at most 3 decimal places and prints very small values in fixed-point rather than scientific notation, which makes array output easier to read.

---


### `import tensorflow as tf`
- **Purpose**:
  TensorFlow is a machine learning framework used for building, training, and deploying neural networks and other ML models.
- **Usage**:
  - Creating and training deep learning models.
  - Deploying models in production environments.

---

### `from tensorflow.keras import layers`
- **Purpose**:
  Provides access to TensorFlow's high-level Keras API for defining and building neural network layers.
- **Usage**:
  - Simplifies neural network architecture design.
  - Supports layers like Dense, Convolutional, and Recurrent layers.

---

### `from sklearn.model_selection import train_test_split`
- **Purpose**:
  Provides a convenient method to split datasets into training and testing subsets for machine learning workflows.
- **Usage**:
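  - Holds out data the model never sees during training, so overfitting can be detected when the model is evaluated.
  - Allows for stratified splits (via the `stratify` argument, which this notebook does not use) to maintain class distributions.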

---

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

# Make numpy values easier to read.
np.set_printoptions(precision=3, suppress=True)

## Utility Functions

<a id="create_folds"></a>

## create_folds

The `create_folds` function partitions a dataset into a specified number of folds for use in cross-validation. It ensures that both feature (`x_data`) and target (`y_data`) datasets are split consistently, maintaining their alignment.

### Parameters:
* **x_data** `Dict[str, List[Any]]`: A dictionary where keys are feature names and values are lists containing feature data.
* **y_data** `List[Any]`: A list containing the target values corresponding to the features in `x_data`.
* **num_folds** `int`: The number of folds to split the dataset into.

### Returns:
* **folds_x** `List[Dict[str, List[Any]]]`: A list of dictionaries, where each dictionary contains a subset of `x_data` split into one fold.
* **folds_y** `List[List[Any]]`: A list of lists, where each inner list contains a subset of `y_data` split into one fold.


In [2]:
def create_folds(x_data, y_data, num_folds):
  # Assert proper lengths of incoming data
  assert(len(x_data.items()) > 0 and len(y_data) > 0)
  assert(len(list(x_data.values())[0]) == len(y_data))

  # Create folds of each number of elements over entire data set
  k, m = divmod(len(y_data), num_folds)
  folds = {'x': [], 'y': []}
  for i in range(num_folds):
    start = i * k + min(i, m)
    stop = (i + 1) * k + min(i + 1, m)
    folds['x'].append({name:values[start:stop] for name, values in x_data.items()})
    folds['y'].append(y_data[start:stop])

  return folds['x'], folds['y']
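
The fold boundaries produced by the `divmod` scheme above can be checked in isolation. The following is a minimal sketch: `fold_bounds` is a hypothetical helper (not part of the notebook) that reproduces only the index arithmetic, and the 10-item, 4-fold input is an illustrative assumption.

```python
def fold_bounds(num_items, num_folds):
    """Return (start, stop) index pairs using the same divmod scheme as create_folds."""
    k, m = divmod(num_items, num_folds)  # k = base fold size, m = leftover items
    return [
        (i * k + min(i, m), (i + 1) * k + min(i + 1, m))
        for i in range(num_folds)
    ]


# Hypothetical example: 10 items split into 4 folds.
bounds = fold_bounds(10, 4)
sizes = [stop - start for start, stop in bounds]
print(bounds)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(sizes)   # [3, 3, 2, 2]
```

The first `m` folds receive one extra item each, so fold sizes never differ by more than one and every item lands in exactly one fold.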

<a id="evaluate"></a>
## evaluate

This function evaluates the accuracy of a trained model on a test dataset by comparing predicted labels to true labels.

### Parameters:
- **model** `tf.keras.Model`: The trained TensorFlow model used for making predictions.
- **x_test_features_dict** `dict`: A dictionary of test dataset features, where keys are feature names and values are their corresponding data arrays.
- **y_test_labels** `pandas.Series`: The ground truth labels for the test dataset.

### Returns:
- **accuracy** `float`: The proportion of correctly predicted labels to the total number of labels in the test dataset.

### Notes:
- **Prediction**: The model predicts probabilities for each test sample, which are rounded to the nearest integer to determine class labels.
- **Evaluation**: Compares the predicted labels with the actual labels to calculate accuracy.
- **Dependencies**:
  - The `model.predict` method must be compatible with the input format of `x_test_features_dict`.
  - `y_test_labels.to_list()` is used to convert the labels into a list format for comparison.


In [3]:
def evaluate(model, x_test_features_dict, y_test_labels):
  # Define variables to track performance
  correct_count = 0
  num_elements = len(y_test_labels)

  # Flatten data for easier access
  predictions = [label for labels in model.predict(x_test_features_dict, verbose=0) for label in labels]
  values = y_test_labels.to_list()

  # Check all predictions against ground truths
  for i in range(num_elements):
    if values[i] == round(predictions[i]): correct_count += 1

  return correct_count / num_elements
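
The rounding rule used in `evaluate` amounts to thresholding the sigmoid output at 0.5. Because Python's `round` rounds halves to the nearest even integer, a probability of exactly 0.5 becomes class 0, which a strict `> 0.5` comparison reproduces. The sketch below illustrates this with a hypothetical helper, `accuracy_from_probabilities`, and made-up probabilities; it is not the notebook's function.

```python
def accuracy_from_probabilities(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the 0/1 label."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    predicted = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(1 for pred, true in zip(predicted, labels) if pred == true)
    return correct / len(labels)


# Hypothetical sigmoid outputs and ground-truth labels (0 = edible, 1 = poisonous).
probs = [0.02, 0.97, 0.50, 0.80]
truth = [0, 1, 0, 0]
print(accuracy_from_probabilities(probs, truth))  # 0.75
```

Making the threshold an explicit parameter would also allow a more cautious cutoff to be tried, for example flagging a mushroom as poisonous at lower predicted probabilities.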

### Loading agaricus-lepiota.data

This cell performs the initial data loading and preprocessing for the mushroom classification task:

1. **Load Data**:  
   - The dataset (`agaricus-lepiota.data`) is read into a Pandas DataFrame with specified column names for clarity. The dataset contains features describing mushrooms and their labels (`poisonous` or `edible`).

2. **Shuffle Data**:  
   - The `sample(frac=1)` method shuffles the rows randomly, ensuring the data order does not bias model training. The `reset_index(drop=True)` resets the index after shuffling.

3. **Preview Data**:  
   - `mushrooms.head()` displays the first few rows of the shuffled dataset, providing a quick overview of its structure and contents.

This step ensures the data is ready for further preprocessing and model training.


In [4]:
# Implementation and exploration.
mushrooms = pd.read_csv("agaricus-lepiota.data", names=["poisonous", "cap-shape", "cap-surface", "cap-color", "bruises", "odor",
           "gill-attachment", "gill-spacing", "gill-size", "gill-color", "stalk-shape", "stalk-root", "stalk-surface-above-ring",
           "stalk-surface-below-ring", "stalk-color-above-ring", "stalk-color-below-ring", "veil-type", "veil-color",
           "ring-number", "ring-type", "spore-print-color", "population", "habitat"])

# Shuffle data
mushrooms = mushrooms.sample(frac=1).reset_index(drop=True)

# Preview data
mushrooms.head()

Unnamed: 0,poisonous,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,...,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,e,x,f,n,t,n,f,c,b,w,...,s,w,p,p,w,o,p,n,v,d
1,p,k,s,e,f,f,f,c,n,b,...,s,w,p,p,w,o,e,w,v,l
2,e,f,f,w,f,n,f,w,b,k,...,s,w,w,p,w,o,e,k,s,g
3,e,x,f,n,t,n,f,c,b,p,...,s,w,p,p,w,o,p,k,y,d
4,p,f,f,y,f,f,f,c,b,p,...,k,b,p,p,w,o,l,h,y,d


### Format agaricus-lepiota.data for preprocessing

This cell prepares the dataset for training and testing by splitting it into features and labels, and further into training and test sets:

1. **Separate Features and Labels**:  
   - A copy of the `mushrooms` dataset is created as `mushroom_features` to ensure the original dataset remains unmodified.
   - The target variable, `poisonous`, is extracted into `mushroom_labels` and mapped to binary values: `'e'` (edible) to `0` and `'p'` (poisonous) to `1`.

2. **Convert Features to Dictionary**:  
   - `mushroom_features_dict` converts the features into a dictionary format, where each column name is a key, and its values are stored as NumPy arrays. This format aligns with TensorFlow's input requirements.

3. **Train-Test Split**:  
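   - The data is split into training (80%) and testing (20%) subsets using `train_test_split`. `random_state=42` fixes the split for a given row order; because the earlier shuffle (`sample(frac=1)`) is unseeded, the exact rows in each subset can still differ between runs.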

4. **Prepare Dictionary Formats**:  
   - Both `x_train_features` and `x_test_features` are converted into dictionary formats (`x_train_features_dict` and `x_test_features_dict`), similar to the complete dataset.

This step ensures the data is appropriately formatted and split for model training, validation, and final evaluation.


In [5]:
# Create Pandas DataFrame and Series from data
mushroom_features = mushrooms.copy()
mushroom_labels = mushroom_features.pop('poisonous').map({'e': 0, 'p': 1})
mushroom_features_dict = {name: np.array(value)
                          for name, value in mushroom_features.items()}

# Split data into training and testing sets
x_train_features, x_test_features, y_train_labels, y_test_labels = train_test_split(
    mushroom_features, mushroom_labels, test_size=0.2, random_state=42
)

# Format data for TensorFlow
x_train_features_dict = {name: np.array(value) for name, value in x_train_features.items()}
x_test_features_dict = {name: np.array(value) for name, value in x_test_features.items()}
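
To illustrate how a seed makes a holdout split repeatable, here is a small standard-library sketch. It is not scikit-learn's algorithm: `holdout_indices` is a hypothetical helper, and rounding the test size up is an assumption made for the example.

```python
import math
import random


def holdout_indices(num_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed and split them into train/test lists."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)        # seeded, so repeatable
    num_test = math.ceil(test_fraction * num_rows)
    return indices[num_test:], indices[:num_test]


train_idx, test_idx = holdout_indices(10)
print(len(train_idx), len(test_idx))                 # 8 2
print(holdout_indices(10) == (train_idx, test_idx))  # True: same seed, same split
```

The split is only repeatable if the rows arrive in the same order each time, so any earlier shuffle would need its own seed as well.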

<a id="evaluate_mushroom_model"></a>
## evaluate_mushroom_model

This function builds, compiles, and trains a TensorFlow model for classifying mushrooms based on the provided training and validation datasets. The model uses one-hot encoding for feature preprocessing and supports customizable complexity via user-defined dense layer configurations.

### Parameters:
- **x_train** `dict`: A dictionary containing the training dataset features, where keys are feature names and values are their corresponding data arrays (strings).
- **y_train** `array-like`: The training dataset labels.
- **epochs** `int`: The number of epochs for training the model.
- **x_val** `dict`: A dictionary containing the validation dataset features, formatted similarly to `x_train`.
- **y_val** `array-like`: The validation dataset labels.
- **model_complexity** `list[int]`: A list defining the number of units in each dense layer, determining the model's architecture.
- **activation** `str`: The activation function used in the neural network layers (e.g., `"relu"`, `"sigmoid"`).
- **optimizer** `str or object`: The optimizer used for training the model (e.g., `"adam"`, `"sgd"`).
- **loss** `str or object`: The loss function used for training the model (e.g., `"binary_crossentropy"`, `"mean_squared_error"`).

### Returns:
- **model** `tf.keras.Model`: The trained TensorFlow model.
- **history** `tf.keras.callbacks.History`: The training history object, which includes metrics and loss values for each epoch.

### Notes:
- **Feature Preprocessing**: The function uses a `StringLookup` layer to convert string-based features into one-hot encoded vectors.
- **Model Complexity**: The architecture is determined dynamically based on the `model_complexity` parameter, allowing for flexible network designs.
- **Loss Function**: Set by the `loss` argument. The architecture experiments pass `"binary_crossentropy"`, the usual choice for binary classification.
- **Metrics**: Tracks accuracy during training and validation.

### Dependencies:
- Requires TensorFlow (`tf`) and Keras layers (`layers`) for building and training the model.


In [6]:
def evaluate_mushroom_model(x_train, y_train, epochs, x_val, y_val, model_complexity, activation, optimizer, loss):
    # Build inputs
    inputs = {name: tf.keras.Input(shape=(1,), name=name, dtype=tf.string) for name in x_train.keys()}

    # Preprocess inputs
    encoded_features = []
    for name in inputs.keys():
        # String lookup layer
        lookup = layers.StringLookup(output_mode='one_hot')
        lookup.adapt(x_train[name])
        encoded_feature = lookup(inputs[name])
        encoded_features.append(encoded_feature)

    # Concatenate all features
    all_features = layers.concatenate(encoded_features)

    # Build the model with the specified complexity
    x = all_features
    count = 0
    for units in model_complexity:
        x = layers.Dense(units, activation=activation, name=f'dense_layer_{count}')(x)
        count += 1
    output = layers.Dense(1, activation='sigmoid')(x)

    model = tf.keras.Model(inputs=inputs, outputs=output)

    # Compile the model
    model.compile(optimizer=optimizer,
                  loss=loss,
                  metrics=['accuracy'])

    # Fit the model
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=epochs,
                        verbose=0)

    return model, history
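
The one-hot preprocessing step can be pictured with a plain-Python approximation. This sketch assumes one slot reserved at index 0 for values not seen during adaptation and a vocabulary ordered by descending frequency; the helper names and the sample column are hypothetical, and the exact token ordering inside `StringLookup` is an assumption rather than something this notebook verifies.

```python
from collections import Counter


def build_vocabulary(values):
    """Order distinct tokens by descending frequency (ties broken alphabetically)."""
    counts = Counter(values)
    return sorted(counts, key=lambda token: (-counts[token], token))


def one_hot(value, vocabulary):
    """Encode a token as a one-hot list; index 0 is reserved for unseen tokens."""
    vector = [0] * (len(vocabulary) + 1)
    index = vocabulary.index(value) + 1 if value in vocabulary else 0
    vector[index] = 1
    return vector


# Hypothetical training column for one categorical feature.
vocab = build_vocabulary(["n", "f", "n", "a", "n", "f"])
print(vocab)                 # ['n', 'f', 'a']
print(one_hot("f", vocab))   # [0, 0, 1, 0]
print(one_hot("z", vocab))   # [1, 0, 0, 0]  (unseen value)
```

Concatenating one such vector per feature gives the input width of the first dense layer, which is why the vocabulary is built from the training fold only.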

<a id="run_experiment"></a>
## run_experiment

Runs a cross-validation experiment to evaluate a neural network model with the specified configuration. It trains and validates the model across multiple folds, calculates training and validation errors, and reports the performance.

### Parameters:
* **complexity** `list[int]`: A list defining the architecture of the neural network (e.g., layer sizes).
* **x_folds** `list[dict[str, np.ndarray]]`: A list where each element is a dictionary of feature arrays for a specific fold.
* **y_folds** `list[np.ndarray]`: A list where each element is an array of labels for a specific fold.
* **activation** `str`: The activation function used in the neural network layers (e.g., `"relu"`, `"sigmoid"`).
* **optimizer** `str or object`: The optimizer used for training the model (e.g., `"adam"`, `"sgd"`).
* **loss** `str or object`: The loss function used for training the model (e.g., `"binary_crossentropy"`, `"mean_squared_error"`).
* **num_epochs** `int`: Number of epochs to train the model.

### Returns:
* **model** `tf.keras.Model`: The model from the final fold iteration, trained on all folds except the last.
* **avg_val_error** `float`: Average validation error across all folds.


In [7]:
def run_experiment(complexity, x_folds, y_folds, activation, optimizer, loss, num_epochs):
  # Derive the fold count from the data instead of relying on a global variable
  num_folds = len(x_folds)

  # Create data structures to log model performance
  fold_train_errors = []
  fold_val_errors = []
  print(f"\nEvaluating model with layers: {complexity}")

  for i in range(num_folds):
      # Prepare validation data
      x_val_fold, y_val_fold = x_folds[i], y_folds[i]

      # Prepare training data by combining all other folds
      # (feature names come from the folds themselves, not a global DataFrame)
      x_train_folds = {
          key: np.concatenate([x_folds[j][key] for j in range(num_folds) if j != i])
          for key in x_folds[0].keys()
      }
      y_train_folds = np.concatenate([y_folds[j] for j in range(num_folds) if j != i])

      # Build and evaluate the model
      model, history = evaluate_mushroom_model(
          x_train_folds, y_train_folds, num_epochs, x_val_fold, y_val_fold,
          complexity, activation, optimizer, loss
      )

      # Calculate errors
      fold_train_errors.append(100 * (1 - np.mean(history.history['accuracy'])))
      fold_val_errors.append(100 * (1 - np.mean(history.history['val_accuracy'])))

  # Display fold results for this model complexity
  for i, (train_err, val_err) in enumerate(zip(fold_train_errors, fold_val_errors)):
      print(f"Fold: {i}    Train Error: {train_err:.2f}%    Validation Error: {val_err:.2f}%")

  # Report model performance
  mean_train_error = np.mean(fold_train_errors)
  std_train_error = np.std(fold_train_errors)
  mean_val_error = np.mean(fold_val_errors)
  std_val_error = np.std(fold_val_errors)

  print("\nMean(Std. Dev.) over all folds for this model:")
  print("-------------------------------")
  print(f"Train Error: {mean_train_error:.2f}%({std_train_error:.2f}%) Validation Error: {mean_val_error:.2f}%({std_val_error:.2f}%)")

  return model, np.mean(fold_val_errors)
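
The "Mean(Std. Dev.)" summary can be reproduced by hand from the per-fold numbers. The sketch below uses the four validation errors printed for the `[16]` model in this notebook's output and the standard library; it relies on the fact that `np.std` defaults to the population standard deviation (dividing by N), which `statistics.pstdev` also computes.

```python
import statistics

# Validation errors (%) printed for the [16] model in this notebook's output.
fold_val_errors = [3.57, 1.78, 4.06, 2.46]

mean_error = statistics.fmean(fold_val_errors)
pop_std = statistics.pstdev(fold_val_errors)    # divides by N, like np.std's default
sample_std = statistics.stdev(fold_val_errors)  # divides by N - 1

print(f"{mean_error:.2f}%({pop_std:.2f}%)")     # 2.97%(0.90%)
print(f"sample std: {sample_std:.2f}%")
```

With only four folds the sample standard deviation is noticeably larger than the population one, so the reported spread is on the optimistic side.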

## Setup experiment

This cell is a setup for the cross-validation experiments and exploration of different neural network architectures:

1. **Number of Folds**:  
   - `num_folds = 4`: The training data is divided into 4 folds for cross-validation. Each fold serves once as the validation set while the other three are used for training, which gives a more robust estimate of validation error than a single split.

2. **Number of Epochs**:  
   - `num_epochs = 1`: Each model is trained for a single epoch to quickly explore the effects of different architectures. While this limits convergence, it allows for faster experimentation.

3. **Create Folds**:  
   - `create_folds(x_train_features_dict, y_train_labels.values, num_folds)`: The training data is partitioned into folds for cross-validation. Each fold contains a subset of features (`x`) and corresponding labels (`y`).

4. **Model Complexities**:  
   - The `model_complexities` list defines three different architectures to explore:
     - `[16]`: A simple model with a single layer containing 16 units.
     - `[32, 16]`: A moderately complex model with two layers containing 32 and 16 units, respectively.
     - `[64, 32, 16]`: A high-complexity model with three layers containing 64, 32, and 16 units, respectively.
   - These architectures are chosen to systematically evaluate the impact of increasing network complexity on model performance.

This setup facilitates a systematic exploration of model performance across different architectures, allowing the best-performing configuration to be identified through cross-validation.

In [8]:
num_folds = 4
num_epochs = 1

# Create folds from the training data
x, y = create_folds(x_train_features_dict, y_train_labels.values, num_folds)

# Define different model complexities to try
model_complexities = [
    [16],                # Simple model
    [32, 16],            # Moderate complexity
    [64, 32, 16],        # High complexity
]
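
One way to make "complexity" concrete is to count trainable parameters for each architecture. The sketch below is an illustration only: `dense_parameter_count` is a hypothetical helper, and the input width of 100 is a placeholder, since the real width depends on the total size of the one-hot encoded features.

```python
def dense_parameter_count(input_width, hidden_units, output_units=1):
    """Count weights and biases in a fully connected network."""
    total = 0
    previous = input_width
    for units in list(hidden_units) + [output_units]:
        total += previous * units + units   # weight matrix + bias vector
        previous = units
    return total


# input_width=100 is a placeholder; the real width is the total one-hot size.
for architecture in ([16], [32, 16], [64, 32, 16]):
    print(architecture, dense_parameter_count(100, architecture))
# [16] 1633
# [32, 16] 3777
# [64, 32, 16] 9089
```

Under this assumption each step up roughly doubles or triples the parameter count, with most parameters sitting in the first hidden layer.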

## Conduct experiment for all three complexities

### **Experiment 1: Baseline Network ([16])**
- **Motivation**:  
  Start with a simple neural network architecture to establish a baseline. A single-layer model with 16 units is chosen to minimize complexity while still providing meaningful classification performance. Simple architectures are less prone to overfitting and provide insights into the dataset's separability.

- **Observations**:  
  - **Validation Performance**: Achieved a mean validation error of **2.97% (±0.90%)**.  
  - **Training Error**: Recorded a mean training error of **12.61% (±1.70%)**.

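- **Effect of Simplicity**:  
  - Low validation error indicates reasonable generalization to unseen data.  
  - Training error exceeding validation error does not by itself show underfitting here: with a single epoch, Keras reports training accuracy as a running average over batches while the weights are still changing, whereas validation accuracy is measured after the epoch ends.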

---

### **Experiment 2: Moderate Complexity ([32, 16])**
- **Motivation**:  
  Increase the model's capacity to capture more complex patterns by introducing an additional layer with 32 units. This moderate increase aims to balance underfitting and overfitting.

- **Observations**:  
  - **Validation Performance**: Mean validation error improved to **0.97% (±0.13%)**.  
  - **Training Error**: Reduced to **7.29% (±1.88%)**.

- **Impact of Increased Complexity**:  
  - Lower validation error compared to the baseline indicates improved generalization.  
  - Reduced training error confirms the model's ability to better capture data patterns.

---

### **Experiment 3: High Complexity ([64, 32, 16])**
- **Motivation**:  
  Further increase the model's capacity by adding a third layer with 64 units to test whether additional complexity improves performance or leads to overfitting.

- **Observations**:  
  - **Validation Performance**: Achieved the best mean validation error of **0.08% (±0.03%)**.  
  - **Training Error**: Further decreased to **5.58% (±1.06%)**.

- **Insights**:  
  - Improved validation performance suggests the model can generalize very well without overfitting.  
  - Reduced training error highlights the model's enhanced learning capacity.
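
In all three experiments the training error is higher than the validation error. One likely contributor is how the two numbers are measured: Keras reports the training metric as an average over the batches of the epoch, while validation is measured with the weights at the end of the epoch. The sketch below uses made-up per-batch accuracies and assumes equal batch sizes to show the size of that effect; it is not output from this notebook.

```python
def epoch_training_accuracy(batch_accuracies):
    """Average of per-batch accuracies, as a running training metric would report."""
    return sum(batch_accuracies) / len(batch_accuracies)


# Hypothetical per-batch accuracies during one epoch as the weights improve.
batch_accuracies = [0.60, 0.80, 0.90, 0.95, 0.99]

reported = epoch_training_accuracy(batch_accuracies)
end_of_epoch = batch_accuracies[-1]   # stand-in for accuracy with the final weights

print(f"reported training error: {100 * (1 - reported):.2f}%")
print(f"error with final weights: {100 * (1 - end_of_epoch):.2f}%")
```

Training for more than one epoch, or re-evaluating on the training folds after fitting, would make the two error columns directly comparable.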

In [9]:
best_validation_error = float('inf')
best_model = None
best_complexity = None

activation = 'relu'
optimizer = 'adam'
loss = 'binary_crossentropy'

# Conduct experiment on all model complexities
for complexity in model_complexities:
    model, avg_val_error = run_experiment(complexity, x, y, activation, optimizer, loss, num_epochs)

    # Update best model if current one is better
    if avg_val_error < best_validation_error:
        best_validation_error = avg_val_error
        best_model = model
        best_complexity = complexity


Evaluating model with layers: [16]
Fold: 0    Train Error: 14.71%    Validation Error: 3.57%
Fold: 1    Train Error: 12.00%    Validation Error: 1.78%
Fold: 2    Train Error: 13.54%    Validation Error: 4.06%
Fold: 3    Train Error: 10.17%    Validation Error: 2.46%

Mean(Std. Dev.) over all folds for this model:
-------------------------------
Train Error: 12.61%(1.70%) Validation Error: 2.97%(0.90%)

Evaluating model with layers: [32, 16]
Fold: 0    Train Error: 4.23%    Validation Error: 0.80%
Fold: 1    Train Error: 7.65%    Validation Error: 0.92%
Fold: 2    Train Error: 9.36%    Validation Error: 0.98%
Fold: 3    Train Error: 7.94%    Validation Error: 1.17%

Mean(Std. Dev.) over all folds for this model:
-------------------------------
Train Error: 7.29%(1.88%) Validation Error: 0.97%(0.13%)

Evaluating model with layers: [64, 32, 16]
Fold: 0    Train Error: 4.47%    Validation Error: 0.06%
Fold: 1    Train Error: 4.60%    Validation Error: 0.06%
Fold: 2    Train Error: 6.81%  

## Evaluate best model's performance on test data

This cell evaluates the best-performing model, determined during the cross-validation phase, on the independent test dataset:

1. **Identify the Best Model**:  
   - The `best_complexity` variable stores the architecture of the model that achieved the lowest validation error during cross-validation.
   - `print(f"Best model complexity: {best_complexity}")` displays the architecture of the chosen model for reference. In this case, the best model has three layers with 64, 32, and 16 units.

2. **Test Set Evaluation**:  
   - The `evaluate` function calculates the accuracy of the `best_model` using the preprocessed test dataset (`x_test_features_dict` and `y_test_labels`).
   - The test accuracy is used to compute the **test error** as `100 * (1 - test_accuracy)`.

3. **Report Test Error**:  
   - The `print` statement outputs the test error, which provides an estimate of the model's ability to generalize to completely unseen data.
   - In this instance, the test error is **0.06%**, indicating exceptional generalization performance.

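This step matters because the test set plays no part in model selection: it is evaluated only after the architecture has been chosen, so the reported error is an unbiased estimate of performance on unseen data. Note that `best_model` is the model returned from the last cross-validation fold, so it was trained on three of the four training folds rather than on the full training set.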


In [10]:
# After trying all models, evaluate the best one on the test set
print(f"Best model complexity: {best_complexity}")

# Evaluate the model using the preprocessed test data
test_accuracy = evaluate(best_model, x_test_features_dict, y_test_labels)
test_error = 100 * (1 - test_accuracy)
print(f"Test Error of the best model: {test_error:.2f}%")

Best model complexity: [64, 32, 16]
Test Error of the best model: 0.06%
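
A test error this small is easier to interpret as a count of mistakes. The sketch below assumes the standard UCI mushroom dataset size of 8124 rows (not stated in this notebook) and assumes the 20% test size is rounded up; `misclassified` is a hypothetical helper.

```python
import math

TOTAL_ROWS = 8124        # assumption: size of the standard UCI mushroom dataset
TEST_FRACTION = 0.2      # matches test_size=0.2 used in the split

# Assumption: the test size is rounded up.
num_test = math.ceil(TEST_FRACTION * TOTAL_ROWS)


def misclassified(error_percent, num_samples):
    """Nearest whole number of errors consistent with a rounded error percentage."""
    return round(error_percent / 100 * num_samples)


print(num_test)                        # 1625
print(misclassified(0.06, num_test))   # 1
print(f"{100 * 1 / num_test:.2f}%")    # 0.06%
```

Under these assumptions a 0.06% test error corresponds to a single misclassified mushroom, so small differences between models at this level rest on very few samples.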


## Experiment: Activation Function and Optimizer
Modify the 1) Activation function 2) Optimizer of any chosen model. Try at least one model for each modified component.

Explain the motivation behind the modifications you made.

Explore the effects on the performance.


### **Experiment 4: Modify Activation function from 'relu' to 'tanh' (optimizer 'adam')**
- **Motivation**:  
  Replace ReLU with **tanh** to explore its ability to capture non-linear patterns symmetrically, given the dataset features.

- **Observations**:  
  - **Validation Performance**: Slightly worsened to **0.25% (±0.20%)**.  
  - **Training Error**: Slightly improved to **4.13% (±0.48%)**.

- **Analysis**:  
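  - The shift to tanh led to a minor drop in generalization performance. The gap is small relative to the fold-to-fold spread (±0.20%), so it may reflect run-to-run variation as much as the activation itself; saturating tanh units (vanishing gradients) are one possible contributor.  
  - Training error decreased, but the higher validation error suggests no benefit over ReLU for this architecture.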


In [11]:
activation = 'tanh'
optimizer = 'adam'
loss = 'binary_crossentropy'
complexity = best_complexity

model, avg_val_error = run_experiment(complexity, x, y, activation, optimizer, loss, num_epochs)

# Evaluate the new model using the preprocessed test data
print(f"\nModel complexity: {complexity}")
test_accuracy = evaluate(model, x_test_features_dict, y_test_labels)
test_error = 100 * (1 - test_accuracy)
print(f"Test Error of the model: {test_error:.2f}%")


Evaluating model with layers: [64, 32, 16]
Fold: 0    Train Error: 4.31%    Validation Error: 0.00%
Fold: 1    Train Error: 3.39%    Validation Error: 0.18%
Fold: 2    Train Error: 4.72%    Validation Error: 0.25%
Fold: 3    Train Error: 4.12%    Validation Error: 0.55%

Mean(Std. Dev.) over all folds for this model:
-------------------------------
Train Error: 4.13%(0.48%) Validation Error: 0.25%(0.20%)

Model complexity: [64, 32, 16]
Test Error of the model: 0.25%
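
The vanishing-gradient concern mentioned for tanh can be made concrete by comparing the derivatives of the two activations. This is a minimal standard-library sketch with hypothetical helper names; it shows the general behavior of the functions, not a measurement from this model.

```python
import math


def tanh_gradient(z):
    """Derivative of tanh: 1 - tanh(z)^2, which shrinks toward 0 as |z| grows."""
    return 1.0 - math.tanh(z) ** 2


def relu_gradient(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


for z in (0.0, 1.0, 3.0):
    print(f"z={z}: tanh'={tanh_gradient(z):.4f}  relu'={relu_gradient(z):.1f}")
```

For large pre-activations the tanh gradient falls below 0.01 while the ReLU gradient stays at 1, which is the usual argument for ReLU in deeper networks. Whether saturation actually occurs in a three-layer network after one epoch is not established here.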


### **Experiment 5: Modify Optimizer from 'adam' to 'sgd' (stochastic gradient descent; activation function 'relu')**
- **Motivation**:  
  Replace Adam with **SGD** to evaluate the impact of a simpler, non-adaptive optimization algorithm. Passing the string `'sgd'` uses the Keras defaults (learning rate 0.01, no momentum), so momentum would have to be configured explicitly.

- **Observations**:  
  - **Validation Performance**: Validation error significantly rose to **11.51% (±1.48%)**.  
  - **Training Error**: Also significantly increased to **23.81% (±6.40%)**.

- **Insights**:  
  - With a single epoch and default settings, SGD converged much more slowly than Adam, resulting in considerably worse performance.  
  - Highlights the importance of adaptive optimizers like Adam for this problem.


In [12]:
activation = 'relu'
optimizer = 'sgd'
loss = 'binary_crossentropy'
complexity = best_complexity

model, avg_val_error = run_experiment(complexity, x, y, activation, optimizer, loss, num_epochs)

# Evaluate the new model using the preprocessed test data
print(f"\nModel complexity: {complexity}")
test_accuracy = evaluate(model, x_test_features_dict, y_test_labels)
test_error = 100 * (1 - test_accuracy)
print(f"Test Error of the model: {test_error:.2f}%")


Evaluating model with layers: [64, 32, 16]
Fold: 0    Train Error: 18.10%    Validation Error: 12.68%
Fold: 1    Train Error: 34.22%    Validation Error: 13.17%
Fold: 2    Train Error: 19.02%    Validation Error: 10.65%
Fold: 3    Train Error: 23.92%    Validation Error: 9.54%

Mean(Std. Dev.) over all folds for this model:
-------------------------------
Train Error: 23.81%(6.40%) Validation Error: 11.51%(1.48%)

Model complexity: [64, 32, 16]
Test Error of the model: 8.31%
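
For reference, the update rule behind SGD with an optional momentum term can be sketched on a toy problem. Everything here is illustrative: `sgd_minimize` is a hypothetical helper, the objective f(w) = w² stands in for a real loss, and the learning rate and momentum values are arbitrary.

```python
def sgd_minimize(gradient, start, learning_rate, momentum=0.0, steps=10):
    """Gradient descent with an optional momentum (velocity) term."""
    weight, velocity = start, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * gradient(weight)
        weight += velocity
    return weight


# Toy objective f(w) = w**2 with gradient 2w (hypothetical, not the mushroom model).
def grad(w):
    return 2.0 * w


print(sgd_minimize(grad, start=1.0, learning_rate=0.1))                # plain SGD
print(sgd_minimize(grad, start=1.0, learning_rate=0.1, momentum=0.9))  # with momentum
```

With `momentum=0.0` each step simply scales the weight by a constant factor, which is the behavior the string `'sgd'` gives by default. In Keras, momentum can be enabled by passing an optimizer object such as `tf.keras.optimizers.SGD(learning_rate=..., momentum=...)` instead of the string; that variant was not run in this experiment.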


## OPTIONAL. BONUS. Experiment: Loss Function

Modify the loss function of any chosen model.

Explain the motivation behind the modifications you made.

Explore the effects on the performance.


### **Experiment 6: Modify Loss from 'binary_crossentropy' to 'mean_squared_error' (activation function 'relu' and optimizer 'adam')**
- **Motivation**:  
  Replace binary cross-entropy with **mean squared error (MSE)** to analyze the effects of a regression-based loss for classification tasks.

- **Observations**:  
  - **Validation Performance**: Remained excellent at **0.17% (±0.13%)**.  
  - **Training Error**: Slightly better at **4.03% (±0.27%)**.

- **Analysis**:  
  - MSE performed similarly to binary cross-entropy, suggesting it was able to handle the binary classification task effectively.  
  - However, binary cross-entropy is theoretically more suitable for probabilistic outputs.


In [13]:
activation = 'relu'
optimizer = 'adam'
loss = 'mean_squared_error'
complexity = best_complexity

model, avg_val_error = run_experiment(complexity, x, y, activation, optimizer, loss, num_epochs)

# Evaluate the new model using the preprocessed test data
print(f"\nModel complexity: {complexity}")
test_accuracy = evaluate(model, x_test_features_dict, y_test_labels)
test_error = 100 * (1 - test_accuracy)
print(f"Test Error of the model: {test_error:.2f}%")


Evaluating model with layers: [64, 32, 16]
Fold: 0    Train Error: 4.02%    Validation Error: 0.00%
Fold: 1    Train Error: 3.78%    Validation Error: 0.37%
Fold: 2    Train Error: 3.86%    Validation Error: 0.18%
Fold: 3    Train Error: 4.47%    Validation Error: 0.12%

Mean(Std. Dev.) over all folds for this model:
-------------------------------
Train Error: 4.03%(0.27%) Validation Error: 0.17%(0.13%)

Model complexity: [64, 32, 16]
Test Error of the model: 0.06%
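
The theoretical difference between the two losses shows up most clearly on confident wrong predictions. The sketch below compares per-sample values for a poisonous mushroom (label 1) at a few hypothetical predicted probabilities; the helper names are illustrative and the numbers are not taken from this model.

```python
import math


def binary_cross_entropy(label, probability):
    """Per-sample BCE: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(label * math.log(probability) + (1 - label) * math.log(1 - probability))


def squared_error(label, probability):
    """Per-sample squared error: (y - p)^2."""
    return (label - probability) ** 2


# A poisonous mushroom (label 1) predicted with rising confidence.
for p in (0.01, 0.5, 0.99):
    print(f"p={p}: BCE={binary_cross_entropy(1, p):.4f}  MSE={squared_error(1, p):.4f}")
```

Squared error is bounded by 1, while cross-entropy grows without bound as the predicted probability of the true class approaches 0, so cross-entropy penalizes confident mistakes much more heavily. On an easily separable dataset such as this one, both losses can still reach similar accuracy.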


# **Conclusion**
- **Best Model**: The architecture with three layers ([64, 32, 16]) and ReLU activation, Adam optimizer, and binary cross-entropy loss performed the best.
- **Test Performance**: This model achieved a **test error of 0.06%**, demonstrating exceptional generalization.

No other directions for this assignment, other than what's here and in the "General Directions" section. You have a lot of freedom with this assignment. Don't get carried away. It is expected the results may vary, being better or worse, due to the limitations of the dataset. Graders are not going to run your notebooks. The notebook will be read as a report on how different models were explored. Since you'll be using libraries, the emphasis will be on your ability to communicate your findings.
