# EE 508 HW 1 Part 2: Classification

Your task in this Colab notebook is to complete the sections marked **TODO** (search for the keyword `TODO` to make sure you do not miss any).

## Cross Validation, Bias-Variance trade-off, Overfitting

In this section, we demonstrate data splitting and validation in a machine learning workflow, using the Iris dataset from the `sklearn` library.

Objective:
- Train a Fully-Connected Network (FCN) for classification.  
- Partition the data using three-fold cross-validation and report the training, validation, and testing accuracy.  
- Train the model using cross-entropy loss and evaluate it with 0/1 loss.  
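
The last objective separates the training loss from the evaluation metric. As a minimal sketch of that distinction, the logits and labels below are made-up illustrative values (not Iris data): cross-entropy is a differentiable surrogate that depends on how confident the logits are, while the 0/1 loss only counts wrong argmax predictions.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for 3 samples and 3 classes (illustrative values only)
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 2.5],
                       [1.0, 1.2, 0.1]])
labels = torch.tensor([0, 2, 0])

# Cross-entropy: differentiable surrogate used for training
ce_loss = F.cross_entropy(logits, labels).item()

# 0/1 loss: fraction of samples whose argmax prediction is wrong
preds = torch.argmax(logits, dim=1)
zero_one_loss = (preds != labels).float().mean().item()

print(f"cross-entropy: {ce_loss:.4f}, 0/1 loss: {zero_one_loss:.4f}")
```

Only the third sample is misclassified here, so the 0/1 loss is 1/3, while the cross-entropy also penalizes the correct but not fully confident predictions.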

In [1]:
# import required libraries and dataset
import numpy as np
# load sklearn for ML functions
from sklearn.datasets import load_iris
# load torch for training NNs
import torch
import torch.nn as nn
import torch.optim as optim
# plotting library
import matplotlib.pyplot as plt
import torch.nn.functional as F

%matplotlib inline
plt.style.use(['ggplot'])

### **TODO 1**: Implement the cross validation function

In this function:
1.  Shuffle the dataset using `np.random.shuffle`.
2.  Create partition indices using `np.linspace`.
3.  Loop through `n_folds`:
    *   Determine indices for `valid` set (from `partitions[i]` to `partitions[i+1]`).
    *   Determine indices for `train` set (the rest).
    *   Store the partitioned arrays `(x_train, y_train, x_valid, y_valid)` in `folds` list.
4.  Return `folds`.
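
To make step 2 concrete, here is a small sketch of how `np.linspace` can produce the partition boundaries. The sizes (10 samples, 3 folds) and the variable names are chosen for illustration only, and the indices are left unshuffled so the slices are easy to read.

```python
import numpy as np

n_samples, n_folds = 10, 3  # illustrative sizes, not the Iris dataset

# n_folds + 1 evenly spaced boundaries, truncated to integers
partitions = np.linspace(0, n_samples, n_folds + 1, dtype=int)
print("boundaries:", partitions.tolist())

indices = np.arange(n_samples)  # unshuffled so the slices are easy to read
for i in range(n_folds):
    start, end = partitions[i], partitions[i + 1]
    valid_idx = indices[start:end]
    train_idx = np.concatenate([indices[:start], indices[end:]])
    print(f"fold {i}: valid={valid_idx.tolist()}, train={train_idx.tolist()}")
```

Because the boundaries span the full range from 0 to `n_samples`, every sample lands in exactly one validation fold even when the dataset size is not divisible by `n_folds`.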

In [2]:
# Skeleton for TODO 1
def cross_validation(x: np.ndarray, y: np.ndarray, n_folds: int = 3):
    """
    Split the dataset into the given number of folds.

    Parameters:
    - x: Features of the dataset, with shape (n_samples, n_features)
    - y: Class labels of the dataset, with shape (n_samples,)
    - n_folds: the number of partitions

    Returns:
    - folds (list): List of tuples (x_train, y_train, x_valid, y_valid)
    """
    folds = []
    n = x.shape[0]

    # 1. Shuffle indices
    indices = np.arange(n)
    np.random.shuffle(indices)

    # 2. Divide indices into n_folds partitions (n_folds + 1 boundaries);
    #    spanning 0..n keeps leftover samples when n is not divisible by n_folds
    partitions = np.linspace(0, n, n_folds + 1, dtype=int)

    # 3. Loop through folds
    for i in range(n_folds):
        start, end = partitions[i], partitions[i + 1]
        # validation indices: the i-th partition
        idv = indices[start:end]
        # training indices: everything outside the i-th partition
        idt = np.concatenate([indices[:start], indices[end:]])
        folds.append((x[idt], y[idt], x[idv], y[idv]))

    return folds

In [3]:
# Skeleton for TODO 1 (continued)
# fix the random seed for reproducibility
np.random.seed(42)

# 1. Load Iris dataset
iris = load_iris()
x, y = iris.data, iris.target


# 2. Split into training and validation sets using cross_validation
# three_folds = ...
three_folds = cross_validation(x, y, n_folds= 3)
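
As a cross-check rather than a replacement for the hand-written function, scikit-learn's `KFold` can produce splits of the same shape. This is a sketch under the assumption that `shuffle=True` with a fixed `random_state` is an acceptable stand-in for "shuffle, then partition"; the shuffled order itself will differ from the NumPy-based version, and `sk_folds` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

x, y = load_iris(return_X_y=True)

# shuffle=True with a fixed random_state mirrors "shuffle, then partition"
kfold = KFold(n_splits=3, shuffle=True, random_state=42)

sk_folds = []
for train_idx, valid_idx in kfold.split(x):
    # same tuple layout as cross_validation: (x_train, y_train, x_valid, y_valid)
    sk_folds.append((x[train_idx], y[train_idx], x[valid_idx], y[valid_idx]))

for i, (x_tr, y_tr, x_va, y_va) in enumerate(sk_folds):
    print(f"fold {i}: train={x_tr.shape}, valid={x_va.shape}")
```

Comparing the fold shapes against `three_folds` is a quick way to confirm that each fold holds out one third of the 150 Iris samples.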


### **TODO 2**: Build a Fully-Connected Network with PyTorch

Define a PyTorch model `FCN_model`:
1.  In `__init__`, define 3 fully connected layers (`nn.Linear`):
    *   Layer 1: Input size 4 -> `n_hidden` units.
    *   Layer 2: `n_hidden` -> `n_hidden`.
    *   Layer 3: `n_hidden` -> 3 (output classes).
2.  In `forward`:
    *   Pass input through Layer 1, apply ReLU.
    *   Pass through Layer 2, apply ReLU.
    *   Pass through Layer 3 (return logits, no activation).
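
The reason the last layer returns raw logits is that `nn.CrossEntropyLoss` applies log-softmax internally. The sketch below illustrates this with random stand-in logits and hypothetical labels (neither comes from the notebook): the loss on raw logits matches the negative log-likelihood of log-softmax outputs, and the argmax prediction is unchanged by softmax.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 3)              # stand-in for model outputs (5 samples, 3 classes)
labels = torch.tensor([0, 1, 2, 1, 0])  # hypothetical labels

# Cross-entropy computed directly on raw logits ...
loss_from_logits = F.cross_entropy(logits, labels)

# ... matches negative log-likelihood on log-softmax outputs
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# Softmax is monotonic, so the predicted class is the same either way
same_prediction = torch.equal(
    torch.argmax(logits, dim=1),
    torch.argmax(F.softmax(logits, dim=1), dim=1),
)

print(loss_from_logits.item(), loss_manual.item(), same_prediction)
```

Adding a softmax inside `forward` would therefore apply the normalization twice during training, which is why the model stops at the linear output.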

In [4]:
# Skeleton for TODO 2
class FCN_model(nn.Module):
    # take the argument for the number of hidden units
    def __init__(self, n_hidden=32):
        super(FCN_model, self).__init__()
        # 1. Define Layer 1: Linear(4, n_hidden)
        self.fc1 = nn.Linear(4, n_hidden)

        # 2. Define Layer 2: Linear(n_hidden, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden)

        # 3. Define Output Layer: Linear(n_hidden, 3)
        self.fc3 = nn.Linear(n_hidden, 3)

    def forward(self, x):
        # Layers 1 and 2 are each followed by ReLU
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Output layer returns raw logits (no activation)
        x = self.fc3(x)
        return x

Set up the evaluation and training functions for the FCN models.

In [5]:
def eval(model: nn.Module,
         x: torch.Tensor,
         y: torch.Tensor) -> float:
    """Evaluate the model with the 0/1 loss.
    The predicted label is the class with the largest logit.

    Parameters:
    - model: the FCN model
    - x: input features
    - y: ground truth labels, dtype=long

    Returns:
    - loss: the average 0/1 loss (fraction of misclassified samples)
    """
    # Run inference without tracking gradients
    model.eval()
    with torch.no_grad():
        preds = torch.argmax(model(x), dim=1)

    # Count the misclassified samples and average over the set
    loss = (preds != y).sum().item() / preds.shape[0]
    print(f"Averaging 0/1 loss: {loss:.4f}")
    return loss

In [6]:
def train(model: nn.Module,
          x_train: torch.Tensor,
          y_train: torch.Tensor,
          x_valid: torch.Tensor,
          y_valid: torch.Tensor,
          epochs: int = 300):
    """Training process
    Parameters:
    - model: the FCN model
    - x_train, y_train: training features and labels (dtype=long)
    - x_valid, y_valid: validation features and labels (dtype=long)
    - epochs: number of training epochs
    """
    # To simplify the process
    # we do not take batches but use all the training samples
    # set up the objective function and the optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=1e-2)
    # training loop
    for epoch in range(epochs):
        model.train()
        # Forward pass
        outputs = model(x_train)
        loss = criterion(outputs, y_train)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (epoch + 1) % 100 == 0:
            print(f"Epoch [{epoch + 1}/{epochs}], Cross Entropy Loss: {loss.item():.4f}")
            print(f"[Train] ", end="")
            eval(model, x_train, y_train)
            print(f"[Valid] ", end="")
            eval(model, x_valid, y_valid)

### **TODO 3**: Conduct the training/validation process in each fold
We will use three-fold cross-validation.

1.  Instantiate lists `train_losses` and `valid_losses`.
2.  Loop through `three_folds`:
    *   Instantiate `FCN_model(n_hidden=32)`.
    *   Convert data arrays to `torch.Tensor` (and labels to `dtype=torch.long`).
    *   Train the model for 500 epochs using `train()`.
    *   Evaluate using `eval()` on training data and validation data, appending results to the lists.

In [7]:
# Skeleton for TODO 3
train_losses, valid_losses = [], []

for idx, (x_train, y_train, x_valid, y_valid) in enumerate(three_folds):
    print(f"===== Training Fold {idx} =====")
    model = FCN_model(n_hidden=32)

    # 2. Convert numpy arrays to torch tensors
    x_train_t = torch.tensor(x_train, dtype=torch.float32)
    y_train_t = torch.tensor(y_train, dtype=torch.long)

    x_valid_t = torch.tensor(x_valid, dtype=torch.float32)
    y_valid_t = torch.tensor(y_valid, dtype=torch.long)

    # 3. Train model (500 epochs)
    train(
        model=model,
        x_train=x_train_t,
        y_train=y_train_t,
        x_valid=x_valid_t,
        y_valid=y_valid_t,
        epochs=500
    )

    # 4. Evaluate and store results
    train_loss = eval(model, x_train_t, y_train_t)
    valid_loss = eval(model, x_valid_t, y_valid_t)

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

===== Training Fold 0 =====
Epoch [100/500], Cross Entropy Loss: 0.6993
[Train] Averaging 0/1 loss: 0.3300
[Valid] Averaging 0/1 loss: 0.2800
Epoch [200/500], Cross Entropy Loss: 0.4800
[Train] Averaging 0/1 loss: 0.0900
[Valid] Averaging 0/1 loss: 0.0800
Epoch [300/500], Cross Entropy Loss: 0.3720
[Train] Averaging 0/1 loss: 0.0400
[Valid] Averaging 0/1 loss: 0.0200
Epoch [400/500], Cross Entropy Loss: 0.2906
[Train] Averaging 0/1 loss: 0.0400
[Valid] Averaging 0/1 loss: 0.0000
Epoch [500/500], Cross Entropy Loss: 0.2283
[Train] Averaging 0/1 loss: 0.0300
[Valid] Averaging 0/1 loss: 0.0000
Averaging 0/1 loss: 0.0300
Averaging 0/1 loss: 0.0000
===== Training Fold 1 =====
Epoch [100/500], Cross Entropy Loss: 0.7244
[Train] Averaging 0/1 loss: 0.3300
[Valid] Averaging 0/1 loss: 0.3400
Epoch [200/500], Cross Entropy Loss: 0.4767
[Train] Averaging 0/1 loss: 0.1200
[Valid] Averaging 0/1 loss: 0.1800
Epoch [300/500], Cross Entropy Loss: 0.3630
[Train] Averaging 0/1 loss: 0.0300
[Valid] Avera

In [8]:
print(f"#Fold, training loss, validation loss")
for idx, (train_loss, valid_loss) in enumerate(zip(train_losses, valid_losses)):
    print(f"{idx:>5d},          {train_loss:.2f},            {valid_loss:.2f}")

#Fold, training loss, validation loss
    0,          0.03,            0.00
    1,          0.02,            0.06
    2,          0.03,            0.02


### **TODO 4**: Check overfitting with a more complex model
Repeat the procedure from TODO 3, but this time use a more complex model:
*   Set `n_hidden` to 2048.
*   Train for 500 epochs.
*   Store results in `train_overfit` and `valid_overfit`.
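
To get a sense of how much more complex this model is, the sketch below counts the weights and biases of the three `nn.Linear` layers described in TODO 2. `count_fcn_params` is a hypothetical helper written for this illustration, not part of the assignment skeleton.

```python
def count_fcn_params(n_hidden: int, n_in: int = 4, n_out: int = 3) -> int:
    """Weights plus biases of the three Linear layers described in TODO 2."""
    layer1 = n_in * n_hidden + n_hidden      # Linear(4, n_hidden)
    layer2 = n_hidden * n_hidden + n_hidden  # Linear(n_hidden, n_hidden)
    layer3 = n_hidden * n_out + n_out        # Linear(n_hidden, 3)
    return layer1 + layer2 + layer3

for n_hidden in (32, 2048):
    print(f"n_hidden={n_hidden}: {count_fcn_params(n_hidden):,} parameters")
```

With `n_hidden=32` the network has 1,315 parameters, and with `n_hidden=2048` it has 4,212,739, while each fold of the 150-sample Iris dataset provides only 100 training samples.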

In [9]:
# Skeleton for TODO 4
train_overfit, valid_overfit = [], []

for idx, (x_train, y_train, x_valid, y_valid) in enumerate(three_folds):
    print(f"===== Training Fold {idx} =====")
    model = FCN_model(n_hidden=2048)

    # 2. Convert to torch Tensors
    x_train_t = torch.tensor(x_train, dtype=torch.float32)
    y_train_t = torch.tensor(y_train, dtype=torch.long)

    x_valid_t = torch.tensor(x_valid, dtype=torch.float32)
    y_valid_t = torch.tensor(y_valid, dtype=torch.long)

    # 3. Train (500 epochs)
    train(
        model=model,
        x_train=x_train_t,
        y_train=y_train_t,
        x_valid=x_valid_t,
        y_valid=y_valid_t,
        epochs=500
    )

    # 4. Evaluate and store results
    train_loss = eval(model, x_train_t, y_train_t)
    valid_loss = eval(model, x_valid_t, y_valid_t)

    train_overfit.append(train_loss)
    valid_overfit.append(valid_loss)

===== Training Fold 0 =====
Epoch [100/500], Cross Entropy Loss: 0.3589
[Train] Averaging 0/1 loss: 0.1900
[Valid] Averaging 0/1 loss: 0.1600
Epoch [200/500], Cross Entropy Loss: 0.2593
[Train] Averaging 0/1 loss: 0.0800
[Valid] Averaging 0/1 loss: 0.1400
Epoch [300/500], Cross Entropy Loss: 0.1477
[Train] Averaging 0/1 loss: 0.0600
[Valid] Averaging 0/1 loss: 0.1000
Epoch [400/500], Cross Entropy Loss: 0.0904
[Train] Averaging 0/1 loss: 0.0200
[Valid] Averaging 0/1 loss: 0.0000
Epoch [500/500], Cross Entropy Loss: 0.0803
[Train] Averaging 0/1 loss: 0.0200
[Valid] Averaging 0/1 loss: 0.0200
Averaging 0/1 loss: 0.0200
Averaging 0/1 loss: 0.0200
===== Training Fold 1 =====
Epoch [100/500], Cross Entropy Loss: 0.3019
[Train] Averaging 0/1 loss: 0.1400
[Valid] Averaging 0/1 loss: 0.0800
Epoch [200/500], Cross Entropy Loss: 0.1435
[Train] Averaging 0/1 loss: 0.0700
[Valid] Averaging 0/1 loss: 0.0600
Epoch [300/500], Cross Entropy Loss: 0.0651
[Train] Averaging 0/1 loss: 0.0000
[Valid] Avera

In [10]:
print(f"#Fold, training loss, validation loss")
for idx, (train_loss, valid_loss) in enumerate(zip(train_overfit, valid_overfit)):
    print(f"{idx:>5d},          {train_loss:.2f},            {valid_loss:.2f}")

#Fold, training loss, validation loss
    0,          0.02,            0.02
    1,          0.01,            0.06
    2,          0.04,            0.02


### **TODO 5**: Compare the FCN with statistical ML models
Use `sklearn.naive_bayes.GaussianNB`:
1.  Loop through `three_folds`.
2.  Instantiate `GaussianNB`.
3.  Fit the model on training data.
4.  Calculate error (1 - accuracy) for training and validation sets using `model.score()`.
5.  Store errors in `train_nb` and `valid_nb`.
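
As background on what fitting a Gaussian Naive Bayes model involves, here is a rough NumPy sketch of the closed-form estimates and the inference rule, compared against `GaussianNB` on the full Iris dataset. The helper `predict` and the variable names are assumptions made for this illustration; scikit-learn's implementation additionally applies a small variance smoothing term, so tiny numerical differences are possible.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

x, y = load_iris(return_X_y=True)
classes = np.unique(y)

# "Training": closed-form estimates per class, no iterative optimization
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([x[y == c].mean(axis=0) for c in classes])
variances = np.array([x[y == c].var(axis=0) for c in classes])

def predict(x_new: np.ndarray) -> np.ndarray:
    """Pick the class maximizing log p(c) + sum_j log N(x_j | mean_cj, var_cj)."""
    diff = x_new[:, None, :] - means[None, :, :]  # (n_samples, n_classes, n_features)
    log_lik = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None] + diff ** 2 / variances[None],
        axis=2,
    )
    return classes[np.argmax(np.log(priors)[None, :] + log_lik, axis=1)]

manual_preds = predict(x)
sk_preds = GaussianNB().fit(x, y).predict(x)
agreement = np.mean(manual_preds == sk_preds)
print(f"agreement with GaussianNB: {agreement:.3f}")
```

The contrast with the FCN is that the fit here is a single pass over the data to compute means and variances, with no learning rate or number of epochs to choose.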

In [11]:
# Skeleton for TODO 5
from sklearn.naive_bayes import GaussianNB

train_nb, valid_nb = [], []
for idx, (x_train, y_train, x_valid, y_valid) in enumerate(three_folds):
    model = GaussianNB()

    # 2. Fit model on training data
    model.fit(x_train, y_train)

    # 3. Calculate error (1 - accuracy)
    train_error = 1.0 - model.score(x_train, y_train)
    valid_error = 1.0 - model.score(x_valid, y_valid)

    # 4. Store results
    train_nb.append(train_error)
    valid_nb.append(valid_error)

In [12]:
print(f"#Fold, training loss, validation loss")
for idx, (train_loss, valid_loss) in enumerate(zip(train_nb, valid_nb)):
    print(f"{idx:>5d},          {train_loss:.2f},            {valid_loss:.2f}")

#Fold, training loss, validation loss
    0,          0.05,            0.04
    1,          0.02,            0.06
    2,          0.04,            0.04
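
To compare the three models side by side, the per-fold 0/1 losses can be averaged. The numbers below are copied from the printed tables in this notebook (rounded to two decimals), and the `results` dictionary layout is simply one convenient way to hold them.

```python
import numpy as np

# 0/1 losses per fold, copied from the printed tables (rounded to two decimals)
results = {
    "FCN (n_hidden=32)":   {"train": [0.03, 0.02, 0.03], "valid": [0.00, 0.06, 0.02]},
    "FCN (n_hidden=2048)": {"train": [0.02, 0.01, 0.04], "valid": [0.02, 0.06, 0.02]},
    "GaussianNB":          {"train": [0.05, 0.02, 0.04], "valid": [0.04, 0.06, 0.04]},
}

summary = {}
for name, losses in results.items():
    # mean error across the three folds
    summary[name] = (float(np.mean(losses["train"])), float(np.mean(losses["valid"])))
    print(f"{name:<20s} train={summary[name][0]:.3f}  valid={summary[name][1]:.3f}")
```

Each validation fold holds only 50 samples, so a difference of 0.02 corresponds to a single misclassified sample; these averages are best read as rough indications rather than a definitive ranking.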


### **TODO 6**:
Answer the following questions in the next cell.  
1. What is the bias-variance trade-off in machine learning?
2. How can overfitting and underfitting be reduced?
3. How do the training and inference processes differ between the Naive Bayes model and a fully connected neural network?

Your answer:
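1. The expected generalization error decomposes into bias, variance, and irreducible noise. Simple models tend to have high bias and underfit, while complex models tend to have high variance and overfit. The goal is to choose a model complexity that balances the two and minimizes the generalization error.
2. Overfitting can be reduced with regularization (for example weight decay, dropout, or early stopping), a simpler model, or more training data. Underfitting can be reduced by increasing model capacity, adding more informative features, or training longer.
3. Naive Bayes is trained in a single pass: it estimates class priors and per-class feature distributions (Gaussian means and variances here) in closed form under a strong conditional-independence assumption, and at inference it picks the class with the highest posterior probability. An FCN requires iterative gradient-based training on a loss such as cross-entropy; at inference it runs a forward pass and takes the class with the largest logit, which lets it model complex nonlinear relationships.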