# Artificial Intelligence
# 464/664
# Assignment #7

## General Directions for this Assignment

00. We're using a Jupyter Notebook environment (tutorial available here: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html),
01. Output format should be exactly as requested (it is your responsibility to make sure the notebook looks as expected on Gradescope),
02. Check submission deadline on Gradescope,
03. Rename the file to Last_First_assignment_7,
04. Submit your notebook (as .ipynb, not PDF) using Gradescope, and
05. Do not submit any other files.

## Before You Submit...

1. Re-read the general instructions provided above, and
2. Hit "Kernel"->"Restart & Run All".

## Neural Networks: Architecture

For this assignment we will explore Neural Networks; in particular, we are going to explore model complexity. We will use the same dataset from Assignment #6 to classify a mushroom as either edible ('e') or poisonous ('p'). You are free to use PyTorch, TensorFlow, or scikit-learn, to name a few resources. The goal is to explore different model complexities (architectures) before declaring a winner. Either start with a simple network and make it more complex, or start with a complex model and pare it down. Either way, your submission should clearly demonstrate your exploration.


Your output for each model should look like the output of `cross_validate` from Assignment #6:

```
Fold: 0	Train Error: 15.38%	Validation Error: 0.00%
Fold: 1
...

Mean(Std. Dev.) over all folds:
-------------------------------
Train Error: 100.00%(0.00%) Validation Error: 100.00%(0.00%)
```

Notice that "Test Error" has been replaced by "Validation Error." Split your dataset into train, test, and validation sets.
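
As a minimal sketch of the report format above, the helper below builds the per-fold lines and the summary line from two lists of error fractions. The name `format_report`, its arguments, and the use of the population standard deviation are assumptions made for illustration; they are not taken from Assignment #6.

```python
from statistics import mean, pstdev

def format_report(train_errors, val_errors):
    """Build the per-fold report; errors are fractions in [0, 1], one per fold."""
    lines = []
    for fold, (train_err, val_err) in enumerate(zip(train_errors, val_errors)):
        lines.append(
            f"Fold: {fold}\tTrain Error: {train_err:.2%}\tValidation Error: {val_err:.2%}"
        )
    lines.append("")
    lines.append("Mean(Std. Dev.) over all folds:")
    lines.append("-------------------------------")
    # Assumption: population standard deviation (pstdev); swap in stdev if
    # the sample standard deviation is what the course expects.
    lines.append(
        f"Train Error: {mean(train_errors):.2%}({pstdev(train_errors):.2%}) "
        f"Validation Error: {mean(val_errors):.2%}({pstdev(val_errors):.2%})"
    )
    return "\n".join(lines)

# Made-up error fractions, used only to show the layout.
print(format_report([0.10, 0.20], [0.00, 0.10]))
```

Returning a string rather than printing inside the helper keeps the formatting easy to check separately from training.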


Start with a simple network. Train using the train set. Observe the model's performance using the validation set.


Increase the complexity of your network. Train using the train set. Observe the model's performance using the validation set.


Model complexity in Assignment #6 was depth limit. You can think of it here as the architecture of the network (number of layers and units per layer). Try at least three different network architectures.
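
One rough way to put a number on "architecture as complexity" is to count trainable parameters. The sketch below assumes fully connected layers with a bias per unit; the helper `count_dense_params`, the input width of 100, and the example hidden-layer widths are illustrative assumptions, not values prescribed by the assignment.

```python
def count_dense_params(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases in a fully connected network."""
    total = 0
    previous = n_inputs
    for units in list(hidden_units) + [n_outputs]:
        total += previous * units + units  # weight matrix plus bias vector
        previous = units
    return total

# Hypothetical input width; the real value depends on how the features are encoded.
n_features = 100
architectures = {
    "small": [16],
    "medium": [32, 32],
    "large": [64, 64, 64, 64],
}
for name, hidden in architectures.items():
    print(f"{name}: {count_dense_params(n_features, hidden)} parameters")
```

Parameter count is only a proxy: two networks with similar counts but different depth can still behave differently.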


We're trying to find a model complexity that generalizes well. (Recall the high-bias vs. high-variance discussion in class.)
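
The bias/variance distinction can be read off the train and validation errors. The following is a sketch only: `diagnose_fit` is a hypothetical helper, and both thresholds are arbitrary values chosen for illustration rather than anything defined in class.

```python
def diagnose_fit(train_error, val_error, high_error=0.10, max_gap=0.05):
    """Label a model from its error fractions; thresholds are arbitrary illustrations."""
    if train_error > high_error:
        # The model cannot even fit the training data well.
        return "high bias (underfitting)"
    if val_error - train_error > max_gap:
        # The model fits the training data but does not carry over to unseen data.
        return "high variance (overfitting)"
    return "reasonable fit"

# Made-up error pairs covering the three cases.
for train_err, val_err in [(0.30, 0.32), (0.01, 0.20), (0.02, 0.03)]:
    print(train_err, val_err, "->", diagnose_fit(train_err, val_err))
```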


Pick the network architecture that you deem best. Use the test set to report your winning model's performance. This is the ONLY time you use the test set.
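
A minimal sketch of that selection step: the winner is chosen from validation errors alone, and the test set only comes into play afterwards for the single chosen model. The function `pick_winner` and the error values are placeholders invented for illustration; they are not results from this notebook.

```python
def pick_winner(validation_errors):
    """Return the architecture name with the lowest validation error.

    Ties go to whichever entry appears first in the dict.
    """
    return min(validation_errors, key=validation_errors.get)

# Placeholder numbers for illustration only.
validation_errors = {"small": 0.04, "medium": 0.01, "large": 0.02}
winner = pick_winner(validation_errors)
print(f"Winner by validation error: {winner}")
# Only now would the held-out test set be evaluated, once, for `winner`.
```

Listing simpler architectures first means a tie is resolved in favor of the simpler model.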


Try at least three different models; more importantly, document your process: what the results were, how the winning model was determined, and what the winning model's performance was on the test data. Clearly highlight these items to receive full credit.

In [1]:
# Import necessary libraries
import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score
from typing import List
import numpy as np

# Load and preprocess the mushroom dataset
def load_and_preprocess_data(filepath='agaricus-lepiota.data'):
  # Define column names
  columns = ['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
              'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
              'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
              'stalk-surface-below-ring', 'stalk-color-above-ring',
              'stalk-color-below-ring', 'veil-type', 'veil-color',
              'ring-number', 'ring-type', 'spore-print-color',
              'population', 'habitat']
  
  # Load data
  df = pd.read_csv(filepath, names=columns)
  
  # Separate features and target
  X = df.drop('class', axis=1)
  y = df['class'].apply(lambda x: 1 if x == 'e' else 0)  # Encode target: edible=1, poisonous=0
  
  # One-hot encode features
  X_encoded = pd.get_dummies(X)
  
  return X_encoded, y

def create_folds(data: List, n: int) -> List[List[List]]:
  k, m = divmod(len(data), n)
  return list(data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n))

# Split the data into train, validation, and test sets
def split_data(X, y):
  X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=13)
  X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=13)
  return X_train, X_val, X_test, y_train, y_val, y_test

# Define a custom dense layer
class MyDenseLayer(tf.keras.layers.Layer):
  def __init__(self, input_dim, output_dim):
    super(MyDenseLayer, self).__init__()
    self.W = self.add_weight(shape=(input_dim, output_dim), initializer='random_normal')
    self.b = self.add_weight(shape=(output_dim,), initializer='zeros')

  def call(self, inputs):
    z = tf.matmul(inputs, self.W) + self.b
    output = tf.math.sigmoid(z)
    return output


In [2]:
X, y = load_and_preprocess_data()
X_train, X_val, X_test, y_train, y_val, y_test = split_data(X, y)

# Convert data to float32
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')

# Train Small Model

In [14]:
# Build a simple neural network model
model = tf.keras.Sequential([
  MyDenseLayer(input_dim=X_train.shape[1], output_dim=16),
  tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

print('X_train', type(X_train), X_train, 'y_train', type(y_train), y_train)

# do k-fold cross validation
def cross_validate(model, X, y, n_folds=5):
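  # Pair each row with its label; zipping a DataFrame directly would iterate over column names
  Xy = list(zip(np.asarray(X), np.asarray(y)))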
  folds = create_folds(Xy, n_folds)
  for i, fold in enumerate(folds):
    # Convert validation data to numpy arrays
    X_validate, y_validate = map(list, zip(*fold))  # Convert to lists first
    X_validate = np.array(X_validate)
    y_validate = np.array(y_validate)
    
    # Initialize training data
    X_train_data = []
    y_train_data = []
    
    # Collect training data from other folds
    for j, fold2 in enumerate(folds):
      if i == j: continue
      X_fold, y_fold = map(list, zip(*fold2))  # Convert to lists first
      X_train_data.extend(X_fold)
      y_train_data.extend(y_fold)
    
    # Convert training data to numpy arrays
    X_train = np.array(X_train_data)
    y_train = np.array(y_train_data)
    
    # Train and evaluate (note: the same model instance is reused, so weights carry over between folds)
    history = model.fit(X_train, y_train, epochs=10, validation_data=(X_validate, y_validate), verbose=0)
    
    # Report error (1 - accuracy) after the final epoch, in the format requested above
    train_err = 1 - history.history['accuracy'][-1]
    val_err = 1 - history.history['val_accuracy'][-1]
    print(f"Fold: {i}\tTrain Error: {train_err:.2%}\tValidation Error: {val_err:.2%}")

# cross_validate(model, X_train, y_train, n_folds=5)

model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val), verbose=0)

# Evaluate on test set
# y_pred = model.predict(X_test)
# y_pred_classes = (y_pred > 0.5).astype(int)
# test_accuracy = accuracy_score(y_test, y_pred_classes)
# print(f"Test Accuracy: {test_accuracy:.2f}")

X_train <class 'pandas.core.frame.DataFrame'>       cap-shape_b  cap-shape_c  cap-shape_f  cap-shape_k  cap-shape_s  \
3223          0.0          0.0          1.0          0.0          0.0   
5696          0.0          0.0          1.0          0.0          0.0   
5568          0.0          0.0          0.0          0.0          0.0   
6806          0.0          0.0          1.0          0.0          0.0   
2024          0.0          0.0          0.0          0.0          0.0   
...           ...          ...          ...          ...          ...   
2790          0.0          0.0          1.0          0.0          0.0   
7696          0.0          0.0          0.0          1.0          0.0   
74            1.0          0.0          0.0          0.0          0.0   
6320          0.0          0.0          0.0          0.0          0.0   
338           0.0          0.0          0.0          0.0          0.0   

      cap-shape_x  cap-surface_f  cap-surface_g  cap-surface_s  cap-surface_y

<keras.callbacks.History at 0x7fdfaa64fc90>

# Train Medium Model

In [4]:

# Build a medium neural network model (two hidden layers of 32 units)
model = tf.keras.Sequential([
  MyDenseLayer(input_dim=X_train.shape[1], output_dim=32),
  MyDenseLayer(input_dim=32, output_dim=32),
  tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))

# Evaluate on test set
y_pred = model.predict(X_test)
y_pred_classes = (y_pred > 0.5).astype(int)
test_accuracy = accuracy_score(y_test, y_pred_classes)
print(f"Test Accuracy: {test_accuracy:.2f}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Accuracy: 1.00


# Train Large Model

In [5]:
# Build a large neural network model (four hidden layers of 64 units)
model = tf.keras.Sequential([
  MyDenseLayer(input_dim=X_train.shape[1], output_dim=64),
  MyDenseLayer(input_dim=64, output_dim=64),
  MyDenseLayer(input_dim=64, output_dim=64),
  MyDenseLayer(input_dim=64, output_dim=64),
  tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))

# Evaluate on test set
y_pred = model.predict(X_test)
y_pred_classes = (y_pred > 0.5).astype(int)
test_accuracy = accuracy_score(y_test, y_pred_classes)
print(f"Test Accuracy: {test_accuracy:.2f}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Accuracy: 1.00


## Experiment: Activation Function and Optimizer
Modify the 1) Activation function 2) Optimizer of any chosen model. Try at least one model for each modified component.

Explain the motivation behind the modifications you made.

Explore the effects on the performance.
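
One possible motivation for changing the activation function is gradient size in deeper networks. The sketch below compares the derivative of the sigmoid with that of ReLU and shows how a product of sigmoid derivatives shrinks across several layers. It is a simplified illustration: it ignores the weight matrices, which also scale the gradient, and the function names and the depth of 4 are assumptions chosen for the example.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

depth = 4  # hypothetical number of hidden layers
# Best case for sigmoid: every unit sits at z = 0, where the derivative peaks.
print("sigmoid factor over", depth, "layers:", sigmoid_grad(0.0) ** depth)
# ReLU passes the gradient through unchanged for active units.
print("ReLU factor over", depth, "layers:", relu_grad(1.0) ** depth)
```

Whether this matters in practice depends on the model and data, which is why the effect on performance should be measured rather than assumed.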


In [6]:
# Implementation and exploration.


## OPTIONAL. BONUS. Experiment: Loss Function

Modify the loss function of any chosen model.

Explain the motivation behind the modifications you made.

Explore the effects on the performance.
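
To see why the choice of loss can matter, the sketch below compares binary cross-entropy with squared error for a single example whose true label is 1. The function names and the probabilities tried are assumptions for illustration; a library loss would normally be used in the actual model.

```python
import math

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Cross-entropy for one example; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def squared_error(y_true, p):
    """Squared error for one example."""
    return (y_true - p) ** 2

# True label is 1; the predictions range from confident-correct to confident-wrong.
for p in (0.9, 0.5, 0.01):
    print(f"p={p}: BCE={binary_cross_entropy(1, p):.3f}  SE={squared_error(1, p):.3f}")
```

Cross-entropy grows without bound as a confident prediction becomes wrong, while squared error is capped at 1 for probabilities in [0, 1].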


In [7]:
# Implementation and exploration.

There are no other directions for this assignment beyond what's here and in the "General Directions" section. You have a lot of freedom with this assignment. Don't get carried away. Expect results to vary, for better or worse, because of the limitations of the dataset. Graders are not going to run your notebooks. The notebook will be read as a report on how different models were explored. Since you'll be using libraries, the emphasis will be on your ability to communicate your findings.

## Before You Submit...

1. Re-read the general instructions provided above, and
2. Hit "Kernel"->"Restart & Run All".