# Artificial Intelligence
# 464/664
# Assignment #7

## General Directions for this Assignment

00. We're using a Jupyter Notebook environment (tutorial available here: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html),
01. Output format should be exactly as requested (it is your responsibility to make sure notebook looks as expected on Gradescope),
02. Check submission deadline on Gradescope, 
03. Rename the file to Last_First_assignment_7, 
04. Submit your notebook (as .ipynb, not PDF) using Gradescope, and
05. Do not submit any other files.

## Before You Submit...

1. Re-read the general instructions provided above, and
2. Hit "Kernel"->"Restart & Run All".

## Neural Networks

For this assignment we will explore Neural Networks; in particular, we are going to explore model complexity. We will use the same dataset from Assignment #6 to classify a mushroom as either edible ('e') or poisonous ('p'). You are free to use PyTorch, TensorFlow, scikit-learn -- to name a few resources. The goal is to explore different model complexities (architectures) before declaring a winner. Either start with a simple network and make it more complex; or start with a complex model and pare it down. Either way, your submission should clearly demonstrate your exploration. 


Your output for each model should look like the output of `cross_validate` from Assignment #6:

```
Fold: 0	Train Error: 15.38%	Validation Error: 0.00%
Fold: 1
...

Mean(Std. Dev.) over all folds:
-------------------------------
Train Error: 100.00%(0.00%) Test Error: 100.00%(0.00%)
```

Notice that "Test Error" has been replaced by "Validation Error." Split your dataset into train, test, and validation sets. 


Start with a simple network. Train using the train set. Observe model's performance using the validation set. 


Increase the complexity of your network. Train using the train set. Observe model's performance using the validation set. 


Model complexity in Assignment #6 was depth limit. You can think of it here as the architecture of the network (number of layers and units per layer). Try at least three different network architectures. 


We're trying to find a model complexity that generalizes well. (Recall high bias vs high variance discussion in class.) 


Pick the network architecture that you deem best. Use the test set to report your winning model's performance. This is the ONLY time you use the test set.


No other directions for this assignment, other than what's here and in the "General Directions" section. You have a lot of freedom with this assignment. Don't get carried away. Try at least three different models; more importantly, document your process. Graders are not going to run your notebooks. The notebook will be read as a report on how different models were explored: what the results were, how the winning model was determined, what was the winning model's performance on the test data. Clearly highlight these items to receive full credit. Since you'll be using libraries, the emphasis will be on your ability to communicate your findings.

In [1]:
from sklearn.model_selection import train_test_split
import numpy as np
# pyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
# tensorflow
import tensorflow as tf
# MLPClassifier
import pandas as pd
from sklearn.neural_network import MLPClassifier

2024-05-14 23:00:36.484523: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [2]:
# M1: pyTorch
# dataset given in assignment #6 
data = [['round','large','blue','no'],
['square','large','green','yes'],
['square','small','red','no'],
['round','large','red','yes'],
['square','small','blue','no'],
['round','small','blue','no'],
['round','small','red','yes'],
['square','small','green','no'],
['round','large','green','yes'],
['square','large','green','yes'],
['square','large','red','no'],
['square','large','green','yes'],
['round','large','red','yes'],
['square','small','red','no'],
['round','small','green','no']]
attribute_names = ['shape', 'size', 'color']

data = np.array(data)
X = data[:, :-1]
y = data[:, -1]

# convert categorical features and labels to numerical values
shape_map = {'round': 0, 'square': 1}
size_map = {'small': 0, 'large': 1}
color_map = {'blue': 0, 'green': 1, 'red': 2}
label_map = {'no': 0, 'yes': 1}

for i in range(X.shape[0]):
    X[i, 0] = shape_map[X[i, 0]]
    X[i, 1] = size_map[X[i, 1]]
    X[i, 2] = color_map[X[i, 2]]
    y[i] = label_map[y[i]]

X = X.astype(int)
y = y.astype(int)

# splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 0.25, random_state = 42)

def create_pytorch_model(input_size, output_size, hidden_layers, units_per_layer, activation):
    layers = []
    layers.append(nn.Linear(input_size, units_per_layer))
    layers.append(activation())
    
    for _ in range(hidden_layers - 1):
        layers.append(nn.Linear(units_per_layer, units_per_layer))
        layers.append(activation())
    layers.append(nn.Linear(units_per_layer, output_size))
    return nn.Sequential(*layers)

def train_pytorch_model(model, criterion, optimizer, X_train, y_train):
    for epoch in range(50):
        model.train()
        optimizer.zero_grad()
        outputs = model(torch.tensor(X_train, dtype=torch.float32))
        outputs = torch.sigmoid(outputs)
        loss = criterion(outputs.squeeze(), torch.tensor(y_train, dtype=torch.float32))
        loss.backward()
        optimizer.step()

def evaluate_pytorch_model(model, X_val, y_val):
    model.eval()
    with torch.no_grad():
        outputs = model(torch.tensor(X_val, dtype=torch.float32))
        val_preds = (outputs.squeeze() > 0.5).int().numpy()
        val_error = 1 - accuracy_score(y_val, val_preds)
    return val_error

def cross_validate_pytorch(X, y, model_params, n_folds=3):
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42)
    for params in model_params:
        val_errors = []
        for fold, (train_index, val_index) in enumerate(skf.split(X, y)):
            X_train_fold, X_val_fold = X[train_index], X[val_index]
            y_train_fold, y_val_fold = y[train_index], y[val_index]
            model = create_pytorch_model(X.shape[1], 1, **params)
            criterion = nn.BCELoss()
            optimizer = optim.Adam(model.parameters(), lr=0.001)
            train_pytorch_model(model, criterion, optimizer, X_train_fold, y_train_fold)
            val_error = evaluate_pytorch_model(model, X_val_fold, y_val_fold)
            val_errors.append(val_error)
            print(f"Fold: {fold}    Validation Error: {val_error*100:.2f}%")
        print("\nMean(Std. Dev.) over all folds:\n-------------------------------")
        print(f"Validation Error: {np.mean(val_errors)*100:.2f}% ({np.std(val_errors)*100:.2f}%)")
        print("\n")


model_params = [
    {'hidden_layers': 1, 'units_per_layer': 4, 'activation': nn.ReLU}, 
    {'hidden_layers': 2, 'units_per_layer': 15, 'activation': nn.ReLU},
    {'hidden_layers': 3, 'units_per_layer': 36, 'activation': nn.ReLU}
]

cross_validate_pytorch(X_train, y_train, model_params)

Fold: 0    Validation Error: 66.67%
Fold: 1    Validation Error: 33.33%
Fold: 2    Validation Error: 33.33%

Mean(Std. Dev.) over all folds:
-------------------------------
Validation Error: 44.44% (15.71%)


Fold: 0    Validation Error: 66.67%
Fold: 1    Validation Error: 33.33%
Fold: 2    Validation Error: 33.33%

Mean(Std. Dev.) over all folds:
-------------------------------
Validation Error: 44.44% (15.71%)


Fold: 0    Validation Error: 33.33%
Fold: 1    Validation Error: 0.00%
Fold: 2    Validation Error: 33.33%

Mean(Std. Dev.) over all folds:
-------------------------------
Validation Error: 22.22% (15.71%)




In [3]:
# M2: tensorflow
data = [['round','large','blue','no'],
['square','large','green','yes'],
['square','small','red','no'],
['round','large','red','yes'],
['square','small','blue','no'],
['round','small','blue','no'],
['round','small','red','yes'],
['square','small','green','no'],
['round','large','green','yes'],
['square','large','green','yes'],
['square','large','red','no'],
['square','large','green','yes'],
['round','large','red','yes'],
['square','small','red','no'],
['round','small','green','no']]

attribute_names = ['shape', 'size', 'color', 'label']
df = pd.DataFrame(data, columns=attribute_names)

# Map categorical features to integers
shape_map = {'round': 0, 'square': 1}
size_map = {'small': 0, 'large': 1}
color_map = {'blue': 0, 'green': 1, 'red': 2}
label_map = {'no': 0, 'yes': 1}

df['shape'] = df['shape'].map(shape_map)
df['size'] = df['size'].map(size_map)
df['color'] = df['color'].map(color_map)
df['label'] = df['label'].map(label_map)


X = df.drop('label', axis=1)
y = df['label']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

def create_tf_model_with_embeddings(input_dims, output_shape, hidden_layers, units_per_layer):
    input_layers = []
    embedding_layers = []
    for input_dim in input_dims:
        input_layer = tf.keras.layers.Input(shape=(1,))
        embedding_layer = tf.keras.layers.Embedding(input_dim, 2)(input_layer)
        embedding_layer = tf.keras.layers.Reshape(target_shape=(2,))(embedding_layer)
        input_layers.append(input_layer)
        embedding_layers.append(embedding_layer)
    merged_layer = tf.keras.layers.concatenate(embedding_layers)
    for _ in range(hidden_layers):
        merged_layer = tf.keras.layers.Dense(units_per_layer, activation='relu')(merged_layer)
    output_layer = tf.keras.layers.Dense(output_shape, activation='sigmoid')(merged_layer)
    model = tf.keras.Model(inputs=input_layers, outputs=output_layer)
    return model

def train_tf_model_with_embeddings(model, X_train, y_train, X_val, y_val):
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    history = model.fit([X_train[col] for col in X_train.columns], y_train, 
                        validation_data=([X_val[col] for col in X_val.columns], y_val), 
                        epochs=50, batch_size=8, verbose=0)
    return history

def evaluate_tf_model_with_embeddings(model, X_test, y_test):
    y_pred = (model.predict([X_test[col] for col in X_test.columns]) > 0.5).astype("int32")
    accuracy = accuracy_score(y_test, y_pred)
    return 1 - accuracy

def cross_validate_tf_with_embeddings(X, y, model_params, n_folds=3):
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42)
    input_dims = [X[col].max() + 1 for col in X.columns]
    for params in model_params:
        val_errors = []
        for fold, (train_index, val_index) in enumerate(skf.split(X, y)):
            X_train_fold, X_val_fold = X.iloc[train_index], X.iloc[val_index]
            y_train_fold, y_val_fold = y.iloc[train_index], y.iloc[val_index]
            model = create_tf_model_with_embeddings(input_dims, **params)
            history = train_tf_model_with_embeddings(model, X_train_fold, y_train_fold, X_val_fold, y_val_fold)
            val_error = evaluate_tf_model_with_embeddings(model, X_val_fold, y_val_fold)
            val_errors.append(val_error)
            print(f"Fold: {fold}    Validation Error: {val_error*100:.2f}%")
        print("\nMean(Std. Dev.) over all folds:\n-------------------------------")
        print(f"Validation Error: {np.mean(val_errors)*100:.2f}% ({np.std(val_errors)*100:.2f}%)")
        print("\n")

model_params = [
    {'output_shape': 1, 'hidden_layers': 1, 'units_per_layer': 4}, 
    {'output_shape': 1, 'hidden_layers': 2, 'units_per_layer': 15},
    {'output_shape': 1, 'hidden_layers': 3, 'units_per_layer': 36}
]

cross_validate_tf_with_embeddings(X_train, y_train, model_params)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 74ms/step
Fold: 0    Validation Error: 66.67%
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 71ms/step
Fold: 1    Validation Error: 33.33%
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 85ms/step
Fold: 2    Validation Error: 33.33%

Mean(Std. Dev.) over all folds:
-------------------------------
Validation Error: 44.44% (15.71%)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 77ms/step
Fold: 0    Validation Error: 66.67%
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 78ms/step
Fold: 1    Validation Error: 0.00%
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 89ms/step
Fold: 2    Validation Error: 33.33%

Mean(Std. Dev.) over all folds:
-------------------------------
Validation Error: 33.33% (27.22%)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 90ms/step
Fold: 0    Validation Error: 66.67%
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[

In [4]:
# M3: MLPClassifier
def cross_validate_sklearn(X, y, model_params, n_folds=3):
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42)
    for params in model_params:
        val_errors = []
        for fold, (train_index, val_index) in enumerate(skf.split(X, y)):
            X_train_fold, X_val_fold = X[train_index], X[val_index]
            y_train_fold, y_val_fold = y[train_index], y[val_index]
            model = MLPClassifier(hidden_layer_sizes=(params['units_per_layer'],) * params['hidden_layers'],
                                  activation='relu', solver='adam', random_state=42, max_iter=5000, 
                                  learning_rate_init=0.001)
            model.fit(X_train_fold, y_train_fold)
            val_preds = model.predict(X_val_fold)
            val_error = 1 - accuracy_score(y_val_fold, val_preds)
            val_errors.append(val_error)
            print(f"Fold: {fold}    Validation Error: {val_error*100:.2f}%")
        print("\nMean(Std. Dev.) over all folds:\n-------------------------------")
        print(f"Validation Error: {np.mean(val_errors)*100:.2f}% ({np.std(val_errors)*100:.2f}%)")
        print("\n")

X_train_np = X_train.to_numpy()
y_train_np = y_train.to_numpy()

model_params = [
    {'hidden_layers': 1, 'units_per_layer': 4}, 
    {'hidden_layers': 2, 'units_per_layer': 15},
    {'hidden_layers': 3, 'units_per_layer': 36}
]

cross_validate_sklearn(X_train_np, y_train_np, model_params)

Fold: 0    Validation Error: 33.33%
Fold: 1    Validation Error: 33.33%
Fold: 2    Validation Error: 33.33%

Mean(Std. Dev.) over all folds:
-------------------------------
Validation Error: 33.33% (0.00%)


Fold: 0    Validation Error: 33.33%
Fold: 1    Validation Error: 66.67%
Fold: 2    Validation Error: 33.33%

Mean(Std. Dev.) over all folds:
-------------------------------
Validation Error: 44.44% (15.71%)


Fold: 0    Validation Error: 33.33%
Fold: 1    Validation Error: 33.33%
Fold: 2    Validation Error: 33.33%

Mean(Std. Dev.) over all folds:
-------------------------------
Validation Error: 33.33% (0.00%)




Summary:

1. Pytorch
- Model Setup: Constructed using nn.Sequential with varying hidden layers and units per layer.
- Training: Utilized Adam optimizer and BCELoss. Trained for 50 epochs per fold.
- Cross-Validation: Used StratifiedKFold with 3 folds.
- Pros and Cons: More hidden layers and units per layer lead to lower training error but higher validation error, suggesting overfitting.A model with one hidden layer and four units per layer maintained a more consistent performance across folds, indicating a better balance between training and validation errors.

2. Tensorflow with Embedding
- Model Setup: Created using embeddings for categorical features with varying hidden layers and units per layer.
- Training: Utilized Adam optimizer and binary cross-entropy loss. Trained for 50 epochs per fold.
- Cross-Validation: Used StratifiedKFold with 3 folds.
- Pros and Cons: Similar to PyTorch, adding more hidden layers and units per layer reduces training error but increases validation error, suggesting overfitting. A model with one hidden layer and four units per layer performed more consistently, achieving a lower average validation error.

3. MLPClassifier from scikit-learn
- Model Setup: Configured using MLPClassifier with varying hidden layers and units per layer.
- Training: Trained for a maximum of 5000 iterations per fold.
- Cross-Validation: Used StratifiedKFold with 3 folds.
- Pros and Cons: The model maintained consistent performance across all folds, showing no deviation in validation error. Although consistent, the model's flexibility and adaptability are limited.

**Conclusion**:
The PyTorch model was chosen as the winning model because it achieved the lowest average validation error(22.22%), indicating better performance in this comparison. Despite the higher standard deviation, its adaptability and potential for further tuning make it a more robust choice in the long run compared to the MLPClassifier, which is kind of limited. The TensorFlow model's higher average error made it less favorable despite its consistent performance.

## Before You Submit...

1. Re-read the general instructions provided above, and
2. Hit "Kernel"->"Restart & Run All".