# Module 3: Training Models on Structured Data Using PyTorch

## What is Structured Data?
Structured data is data organized into a fixed schema, where every record has the same named fields. Typical examples are tables in relational databases (queried with SQL) and Excel spreadsheets. Structured data is typically tabular, with columns representing different attributes (features) and rows representing instances.

## Why Use Neural Networks for Structured Data?
While structured data has traditionally been handled with classical machine learning algorithms (like linear regression, decision trees, and SVMs), neural networks can provide significant benefits:

- Automatic Feature Learning: Neural networks can automatically learn to represent raw features in a way that is useful for a given task, reducing the need for manual feature engineering.
- Scalability: Neural networks can be trained on mini-batches, so they can scale to large and high-dimensional datasets that would be hard to process all at once.
- Flexibility and Customization: Neural networks can be designed to suit various complex tasks and are not restricted to specific data distributions.

## Goals of This Notebook

1. Understand how to load and preprocess structured data.
2. Learn how to build a simple neural network using PyTorch.
3. Train this network on structured data.
4. Evaluate the model’s performance and tune its hyperparameters.

## Designing the Neural Network Architecture

Let's assume we are working with a dataset having 10 features (input size), and we are solving a binary classification problem (output size of 1). Here’s how we can define a simple fully connected network with two hidden layers of 5 units each and a single sigmoid output unit:

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(10, 5)  # Input (10 features) -> first hidden layer (5 units)
        self.fc2 = nn.Linear(5, 5)   # First hidden layer -> second hidden layer (5 units)
        self.fc3 = nn.Linear(5, 1)   # Second hidden layer -> output (1 unit)

    def forward(self, x):
        x = F.relu(self.fc1(x))  # ReLU activation of the first hidden layer
        x = F.relu(self.fc2(x))  # ReLU activation of the second hidden layer
        x = torch.sigmoid(self.fc3(x))  # Sigmoid maps the output to a probability in (0, 1)
        return x

# Instantiate the model
model = SimpleNN()
print(model)

SimpleNN(
  (fc1): Linear(in_features=10, out_features=5, bias=True)
  (fc2): Linear(in_features=5, out_features=5, bias=True)
  (fc3): Linear(in_features=5, out_features=1, bias=True)
)
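
As a quick sanity check, we can pass a batch of random inputs through the same architecture and count its trainable parameters. The sketch below redefines the network so that it runs on its own; the batch size of 4, the random inputs, and the name `check_model` are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 5)
        self.fc3 = nn.Linear(5, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))

check_model = SimpleNN()

# A batch of 4 made-up samples with 10 features each
dummy_batch = torch.randn(4, 10)
with torch.no_grad():
    probs = check_model(dummy_batch)

# Weights and biases: (10*5 + 5) + (5*5 + 5) + (5*1 + 1) = 91
n_params = sum(p.numel() for p in check_model.parameters() if p.requires_grad)

print(probs.shape)  # one probability per sample: torch.Size([4, 1])
print(n_params)     # 91
```

Each row of `probs` is the model's estimated probability of class 1 for the corresponding input row.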


## Creating a Toy Structured Dataset

For demonstration purposes, let’s create a synthetic dataset using scikit-learn's `make_classification` function. This function allows us to create a multi-class or binary classification dataset:

In [2]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

N_CLASSES = 2

# Generate a synthetic dataset, then split it into training and test sets
features, target = make_classification(n_classes=N_CLASSES, n_informative=9,
    n_redundant=0, n_features=10, n_samples=1000)
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.1, random_state=1)

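# Set random seeds for later random operations (e.g. DataLoader shuffling).
# Note: the dataset above and the model created in cell 1 were generated
# before this point, so they are not affected by these seeds.
torch.manual_seed(0)
np.random.seed(0)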

# Convert data to PyTorch tensors
x_train = torch.from_numpy(features_train).float()
y_train = torch.from_numpy(target_train).float().view(-1,1)
x_test = torch.from_numpy(features_test).float()
y_test = torch.from_numpy(target_test).float().view(-1,1)

In this example:

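- `make_classification` is used to generate a synthetic binary classification dataset with 1,000 samples, 10 features (9 of them informative), and 2 classes (0 or 1).
- `train_test_split` is used to split the dataset into training and test sets, holding out 10% of the samples (100) for testing.

We then convert the NumPy arrays to float PyTorch tensors using `torch.from_numpy`, and reshape the targets to shape `(N, 1)` with `view(-1, 1)` so that they match the shape of the model's output.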
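
To make the conversion step concrete, here is a minimal sketch that repeats it on a freshly generated dataset and inspects the resulting shapes and dtypes. The `random_state=0` argument to `make_classification` and the short variable names (`x_tr`, `y_tr`, and so on) are assumptions added here for reproducibility and to avoid clashing with the notebook's variables.

```python
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same dataset settings as the notebook; random_state=0 is an added assumption
feats, labels = make_classification(n_classes=2, n_informative=9,
    n_redundant=0, n_features=10, n_samples=1000, random_state=0)
f_tr, f_te, l_tr, l_te = train_test_split(
    feats, labels, test_size=0.1, random_state=1)

# NumPy gives float64 features and integer labels; .float() casts both to float32
x_tr = torch.from_numpy(f_tr).float()
y_tr = torch.from_numpy(l_tr).float().view(-1, 1)  # (900,) -> (900, 1)
x_te = torch.from_numpy(f_te).float()
y_te = torch.from_numpy(l_te).float().view(-1, 1)  # (100,) -> (100, 1)

print(x_tr.shape, y_tr.shape)  # torch.Size([900, 10]) torch.Size([900, 1])
print(x_te.shape, y_te.shape)  # torch.Size([100, 10]) torch.Size([100, 1])
print(x_tr.dtype, y_tr.dtype)  # torch.float32 torch.float32
```

The targets are reshaped to a column because the model outputs one value per row, and the loss function compares outputs and targets element by element.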

## Loss Function and Optimizer Selection
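For our binary classification task, we will use the Binary Cross Entropy loss (`nn.BCELoss`), which expects probabilities between 0 and 1 and therefore pairs with the sigmoid output of our network. For optimization, we will use the popular Stochastic Gradient Descent (SGD) optimizer: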

In [3]:
# Loss function
criterion = nn.BCELoss()

# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
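
To see what the loss function computes, the sketch below evaluates binary cross entropy by hand on three made-up predictions and compares the result with `nn.BCELoss`. The probabilities and labels are invented for illustration only.

```python
import torch
import torch.nn as nn

# Made-up predicted probabilities and true labels for three samples
probs = torch.tensor([[0.9], [0.2], [0.6]])
labels = torch.tensor([[1.0], [0.0], [0.0]])

# Binary cross entropy: mean of -[y * log(p) + (1 - y) * log(1 - p)]
manual_bce = -(labels * torch.log(probs)
               + (1 - labels) * torch.log(1 - probs)).mean()

builtin_bce = nn.BCELoss()(probs, labels)

print(manual_bce.item(), builtin_bce.item())
print(torch.isclose(manual_bce, builtin_bce).item())
```

Confident, correct predictions (0.9 for a positive, 0.2 for a negative) contribute little to the loss, while the wrong-leaning prediction (0.6 for a negative) contributes the most.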

## Training the model
Training a neural network essentially consists of iterating through our dataset multiple times (epochs), and in each pass:

1. Making predictions (forward pass),
2. Calculating the loss (cost function),
3. Computing the gradients (backward pass),
4. Updating the weights (optimizer step).

Here’s the Python code that implements these steps:

In [4]:
# Number of epochs
n_epochs = 100

# Store the loss at each epoch
losses = []

# Training loop
for epoch in range(n_epochs):
    # Forward pass
    outputs = model(x_train)
    
    # Compute the loss
    loss = criterion(outputs, y_train)
    losses.append(loss.item())
    
    # Zero the gradients
    optimizer.zero_grad()
    
    # Backward pass
    loss.backward()
    
    # Update the weights
    optimizer.step()
    
    # Print loss every 10 epochs
    if epoch % 10 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch, n_epochs, loss.item()))

print('Training complete.')

Epoch [0/100], Loss: 0.7101
Epoch [10/100], Loss: 0.7049
Epoch [20/100], Loss: 0.7000
Epoch [30/100], Loss: 0.6953
Epoch [40/100], Loss: 0.6907
Epoch [50/100], Loss: 0.6863
Epoch [60/100], Loss: 0.6819
Epoch [70/100], Loss: 0.6776
Epoch [80/100], Loss: 0.6733
Epoch [90/100], Loss: 0.6690
Training complete.


In this loop:

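- Forward Pass: We start by running our data through the model (`outputs = model(x_train)`). This is the forward pass.
- Compute Loss: We then compute the loss/cost using our loss function (`loss = criterion(outputs, y_train)`).
- Zero the Gradients: PyTorch accumulates gradients across calls to `backward()`, so we reset them before each backward pass (`optimizer.zero_grad()`).
- Backward Pass: This step computes the gradient of the loss with respect to each parameter (`loss.backward()`).
- Update the Weights: Finally, we update the weights using the gradients with the optimizer step (`optimizer.step()`).

Because the whole training set is passed through the model as a single batch, each epoch performs exactly one weight update. With a learning rate of 0.01, this is why the loss decreases only slowly over the 100 epochs.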
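
To illustrate what the optimizer step does, the sketch below performs one update on a tiny linear layer and checks it against the plain SGD rule `w <- w - lr * grad`. The layer size, the random data, and the learning rate of 0.1 are arbitrary choices for this example, and the check assumes plain SGD without momentum or weight decay, as used in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lr = 0.1
layer = nn.Linear(3, 1)
sgd = torch.optim.SGD(layer.parameters(), lr=lr)

# Made-up batch: 8 samples, 3 features, binary labels
x = torch.randn(8, 3)
y = torch.randint(0, 2, (8, 1)).float()

# Forward pass, loss, and backward pass
loss = nn.BCELoss()(torch.sigmoid(layer(x)), y)
sgd.zero_grad()
loss.backward()

# What plain SGD should produce: parameter minus lr times its gradient
expected_weight = layer.weight.detach() - lr * layer.weight.grad
expected_bias = layer.bias.detach() - lr * layer.bias.grad

# Let the optimizer apply the update in place
sgd.step()

print(torch.allclose(layer.weight.detach(), expected_weight))
print(torch.allclose(layer.bias.detach(), expected_bias))
```

Both comparisons should print `True`: for plain SGD, `optimizer.step()` is just this subtraction applied to every parameter.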

## Evaluating the Model
After training the model, we need to evaluate it using the test set. We don’t compute gradients here, since we are not updating the weights during evaluation.

In [5]:
# Set the model to evaluation mode
model.eval()

# No gradient computation
with torch.no_grad():
    # Forward pass
    test_outputs = model(x_test)
    
    # Compute the loss
    test_loss = criterion(test_outputs, y_test)

# Convert to numpy arrays
test_outputs_np = test_outputs.numpy()
y_test_np = y_test.numpy()

# Calculate the accuracy of the model
predicted = (test_outputs_np > 0.5).astype(int)
accuracy = (predicted == y_test_np).mean()
    
print('Test Loss: {:.4f}'.format(test_loss.item()))
print('Test Accuracy: {:.2f}%'.format(accuracy * 100))

Test Loss: 0.6529
Test Accuracy: 66.00%


In this evaluation phase:

- We set the model to evaluation mode (`model.eval()`). This matters when the network has layers such as dropout or batch normalization, which behave differently during training and evaluation. `SimpleNN` has no such layers, so here the call is a good habit rather than a requirement.
- The `with torch.no_grad():` context manager disables gradient tracking, which reduces memory usage and speeds up computation.
- The model’s performance is assessed using the loss calculated from the test set and the accuracy metric, which is simply the proportion of the test set that was correctly classified.
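
The effect of these two calls is easier to see on a network that actually contains dropout. The following is a small sketch using a hypothetical `nn.Sequential` model (not the `SimpleNN` used in this notebook), with made-up inputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical network with a dropout layer, for illustration only
net = nn.Sequential(
    nn.Linear(10, 5), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(5, 1), nn.Sigmoid())
x = torch.randn(4, 10)

net.train()
mode_after_train = net.training  # True: dropout randomly zeroes activations

net.eval()
mode_after_eval = net.training   # False: dropout acts as the identity

with torch.no_grad():
    out_a = net(x)
    out_b = net(x)

# In eval mode, repeated forward passes give identical outputs
same_in_eval = torch.equal(out_a, out_b)
# Under no_grad, the outputs are not attached to a computation graph
tracks_grad = out_a.requires_grad

print(mode_after_train, mode_after_eval, same_in_eval, tracks_grad)
```

Note that `model.eval()` and `torch.no_grad()` are independent: the first changes layer behavior, the second turns off gradient tracking.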

## Hyperparameter Tuning
Hyperparameter tuning is the process of systematically searching for the combination of hyperparameters that gives the best model performance on a particular task. Common hyperparameters in neural network training include the learning rate, number of epochs, batch size, and the architecture of the network itself (e.g., number of layers, number of units in each layer).

### Using Grid Search
One common approach for hyperparameter tuning is grid search. In a grid search, we specify a set of possible values for each hyperparameter we want to tune, and then we train a new model for every possible combination of these hyperparameters.
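
To see what "every possible combination" means in practice, the short sketch below enumerates a grid with scikit-learn's `ParameterGrid`. The name `example_grid` and the specific values are used here only for illustration.

```python
from sklearn.model_selection import ParameterGrid

example_grid = ParameterGrid({
    'lr': [0.1, 0.01, 0.001],
    'batch_size': [16, 32, 64],
})

# 3 learning rates x 3 batch sizes = 9 combinations
print(len(example_grid))

# Each item is a dict holding one value per hyperparameter
for params in example_grid:
    print(params)
```

The number of combinations is the product of the list lengths, so the cost of a grid search grows quickly as more hyperparameters or values are added.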

Below is a simple example of hyperparameter tuning via grid search, where we try different learning rates and batch sizes:

In [6]:
from sklearn.model_selection import ParameterGrid

# Define the hyperparameters
param_grid = {
    'lr': [0.1, 0.01, 0.001],
    'batch_size': [16, 32, 64],
}

# Generate combinations of hyperparameters
grid = ParameterGrid(param_grid)

# Store the results
results = []

# Grid search
for params in grid:
    # Set the hyperparameters
    lr = params['lr']
    batch_size = params['batch_size']
    
    # Create a new model
    model = SimpleNN()
    
    # Loss function
    criterion = nn.BCELoss()

    # Optimizer
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    
    # Data loader
    train_loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(x_train, y_train), 
        batch_size=batch_size, shuffle=True)
    
    # Training loop (for simplicity, we'll use a fixed number of epochs)
    # Note that this will heavily affect which learning rate does best
    for epoch in range(100):
        for X_batch, y_batch in train_loader:
            outputs = model(X_batch)
            loss = criterion(outputs, y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    
    # Evaluate the model
    model.eval()
    with torch.no_grad():
        test_outputs = model(x_test)
        test_loss = criterion(test_outputs, y_test)
        
    # Convert to numpy arrays for calculating accuracy
    test_outputs_np = test_outputs.numpy()
    y_test_np = y_test.numpy()
    predicted = (test_outputs_np > 0.5).astype(int)
    accuracy = (predicted == y_test_np).mean()
    
    # Store the results
    results.append({
        'lr': lr,
        'batch_size': batch_size,
        'test_loss': test_loss.item(),
        'accuracy': accuracy
    })

# Print the results
for result in results:
    print('lr: {:.3f}, batch_size: {:d}, Test Loss: {:.4f}, Accuracy: {:.2f}%'.format(
        result['lr'], result['batch_size'], result['test_loss'], result['accuracy'] * 100))


lr: 0.100, batch_size: 16, Test Loss: 0.2363, Accuracy: 89.00%
lr: 0.010, batch_size: 16, Test Loss: 0.3046, Accuracy: 84.00%
lr: 0.001, batch_size: 16, Test Loss: 0.5044, Accuracy: 82.00%
lr: 0.100, batch_size: 32, Test Loss: 0.2931, Accuracy: 86.00%
lr: 0.010, batch_size: 32, Test Loss: 0.3672, Accuracy: 85.00%
lr: 0.001, batch_size: 32, Test Loss: 0.6239, Accuracy: 50.00%
lr: 0.100, batch_size: 64, Test Loss: 0.2078, Accuracy: 94.00%
lr: 0.010, batch_size: 64, Test Loss: 0.4551, Accuracy: 86.00%
lr: 0.001, batch_size: 64, Test Loss: 0.6167, Accuracy: 50.00%


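Above we can see how different combinations of hyperparameters performed after 100 epochs. However, a lower learning rate needs more epochs to converge, so with a fixed number of epochs we can't yet say that one learning rate is better than another. Each configuration was also trained only once from a random initialization, and the comparison is made on the test set; when actually selecting hyperparameters, a separate validation set should be used so that the test set remains an unbiased estimate of performance. This is simply meant to serve as an example of how we can train deep learning models while also searching for good hyperparameters for our data.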
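
If we wanted to use a grid search like this for real model selection, one common setup is to hold out a validation set in addition to the test set. The sketch below shows one way to create such a three-way split; the split sizes (800/100/100), the `random_state` values, and the variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same dataset settings as the notebook; random_state values are added assumptions
feats, labels = make_classification(n_classes=2, n_informative=9,
    n_redundant=0, n_features=10, n_samples=1000, random_state=0)

# Hold out 100 samples as the final test set
rest_x, test_x, rest_y, test_y = train_test_split(
    feats, labels, test_size=100, random_state=1)

# Carve a 100-sample validation set out of the remaining 900
fit_x, val_x, fit_y, val_y = train_test_split(
    rest_x, rest_y, test_size=100, random_state=1)

print(len(fit_x), len(val_x), len(test_x))  # 800 100 100
```

Each configuration would then be trained on `fit_x`/`fit_y` and compared on `val_x`/`val_y`, with the test set evaluated only once, for the configuration that was finally chosen.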