# Objective
The objective for this assignment is to take the in-class Multi-Layer Perceptron code we used for classification and modify it for regression.

# Approach
## MLP Class
We write an MLP class that inherits from `torch.nn.Module`, the base class for neural network modules in PyTorch, which provides the functionality we need to build a regression model.

The `torch.nn.Sequential` class is a container that chains modules in order, passing the output of each module as the input to the next. Here we use it to stack three `torch.nn.Linear` layers. The input size of the first layer must equal the number of features, and the output size of the last layer must be 1, since we predict a single continuous value. Between the linear layers we apply the Sigmoid activation function so that the model can produce a non-linear fit to the data. This fit is graphed later in the report.

In [None]:
import torch

class MLP(torch.nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.all_layers = torch.nn.Sequential(
            # 1st hidden layer
            torch.nn.Linear(num_features, 5),
            torch.nn.Sigmoid(),
            # 2nd hidden layer
            torch.nn.Linear(5, 2),
            torch.nn.Sigmoid(),
            # output layer: a single continuous value, no activation
            torch.nn.Linear(2, 1),
        )

    def forward(self, x):
        # For regression the output is a prediction, not a class logit
        preds = self.all_layers(x)
        return preds
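
To see why the Sigmoid layers matter, here is a small dependency-free sketch in plain Python rather than PyTorch. It compares a one-unit Linear -> Linear stack with a one-unit Linear -> Sigmoid -> Linear stack. The scalar weights and the helper names (`two_linear_layers`, `linear_sigmoid_linear`, `second_difference`) are made up for illustration and are not part of the class above.

```python
import math

def sigmoid(z):
    """Logistic function, the formula a sigmoid activation applies element-wise."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scalar weights and biases, chosen only for illustration.
W1, B1 = 2.0, -1.0
W2, B2 = 0.5, 3.0

def two_linear_layers(x):
    # Linear -> Linear with no activation in between.
    return W2 * (W1 * x + B1) + B2

def linear_sigmoid_linear(x):
    # Linear -> Sigmoid -> Linear, a one-unit analogue of the stack above.
    return W2 * sigmoid(W1 * x + B1) + B2

def second_difference(f, x, h=1.0):
    # Zero for any straight line; non-zero when the function curves.
    return f(x + h) - 2.0 * f(x) + f(x - h)

print(second_difference(two_linear_layers, 0.0))
print(second_difference(linear_sigmoid_linear, 0.0))
```

Without an activation, the two linear layers collapse into a single straight line, so the second difference is zero. With the Sigmoid in between, the function curves, which is what lets the MLP fit non-linear data.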

## Dataset, Data normalization, and Dataloader

Given the data below

In [None]:
X_train = torch.tensor([245.0, 273.0, 304.0, 331.0, 347.0, 360.0, 387.0, 438.0, 493.0, 547.0]).view(-1,1)
y_train = torch.tensor([232.3, 241.1, 257.4, 301.5, 324.6, 350.2, 362.3, 389.0, 398.2, 401.8])

We can create a `MyDataset` class modeled after the one we discussed in the lecture. This time, instead of inheriting from the `Dataset` class, we'll inherit from the `TensorDataset` class, since our data is already stored as tensors.

The `MyDataset` class is a map-style dataset and needs to implement the `__getitem__()` and `__len__()` protocols. By defining these methods, we enable the use of the `DataLoader` utility class, allowing us to easily iterate through the dataset in batches during our training loop.

In [None]:
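from torch.utils.data import TensorDataset

class MyDataset(TensorDataset):
    def __init__(self, X, y):
        # Let TensorDataset validate and store the tensors (sets self.tensors)
        super().__init__(X, y)
        self.features = X
        self.labels = y

    def __getitem__(self, index):
        # Return a single (feature, label) pair
        x = self.features[index]
        y = self.labels[index]
        return x, y

    def __len__(self):
        # Number of samples in the dataset
        return self.labels.shape[0]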
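
The map-style protocol is ordinary Python, so it can be illustrated without PyTorch. The sketch below uses a hypothetical `PairDataset` class and a hypothetical `iterate_batches` helper to show roughly how a loader can rely on nothing more than `__len__()` and `__getitem__()`. The real `DataLoader` does considerably more (shuffling, collating samples into tensors, optional worker processes), so treat this as a simplified analogy only.

```python
class PairDataset:
    """Plain-Python stand-in for a map-style dataset (hypothetical name)."""

    def __init__(self, features, labels):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels

    def __getitem__(self, index):
        # Return one (feature, label) pair, mirroring MyDataset.__getitem__.
        return self.features[index], self.labels[index]

    def __len__(self):
        return len(self.labels)


def iterate_batches(dataset, batch_size):
    """Yield lists of samples in order: an unshuffled, simplified loader."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


ds = PairDataset([245.0, 273.0, 304.0, 331.0], [232.3, 241.1, 257.4, 301.5])
batches = list(iterate_batches(ds, batch_size=3))
print(len(ds), ds[0], [len(b) for b in batches])
```

Note that the last batch is smaller when the dataset size is not a multiple of the batch size.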

The classification example from lecture did not normalize its data because the data was already centered at zero and had relatively small values.

In our case, we will normalize the data with z-score standardization, using the mean and standard deviation of the given data. We'll reuse these statistics later to normalize new inputs and un-normalize predictions when plotting the regression curve.

We then put our normalized data into the `MyDataset` class, which is then inserted into the `DataLoader` utility class for training.

In [None]:
from torch.utils.data import DataLoader

X_mean, X_std = X_train.mean(), X_train.std()
y_mean, y_std = y_train.mean(), y_train.std()

X_normalized = (X_train - X_mean) / X_std
y_normalized = (y_train - y_mean) / y_std

train_ds = MyDataset(X_normalized, y_normalized)

train_loader = DataLoader(
    dataset=train_ds,
    batch_size=5,
    shuffle=True,
)
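
As a quick sanity check of z-score standardization, the following plain-Python sketch standardizes the same feature values with the standard library's `statistics` module. `statistics.stdev` divides by n - 1 (the sample standard deviation), which to my understanding matches the default behavior of `torch.Tensor.std()`. The lowercase variable names are my own and are separate from the tensors above.

```python
import statistics

X = [245.0, 273.0, 304.0, 331.0, 347.0, 360.0, 387.0, 438.0, 493.0, 547.0]

x_mean = statistics.mean(X)
# statistics.stdev divides by n - 1 (sample standard deviation).
x_std = statistics.stdev(X)

# z-score: subtract the mean, then divide by the standard deviation.
X_standardized = [(x - x_mean) / x_std for x in X]

print(x_mean)
print(statistics.mean(X_standardized), statistics.stdev(X_standardized))
```

After standardization the values have a mean of approximately 0 and a standard deviation of approximately 1, up to floating-point error.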

## Training Loop

The training code is very similar to the classification example from lecture, so in this report I'd like to walk through what each part does.
Before the training loop, we do a bit of setup. We first set a seed by calling `torch.manual_seed()` to ensure reproducibility of results in this report.
We then initialize our MLP model and optimizer. The MLP model takes in the number of features (which is 1 in our case). Our optimizer uses the Stochastic Gradient Descent algorithm, configured with the parameters of our MLP model and a learning rate of 0.5.

### Stochastic Gradient Descent (SGD)
SGD is an optimization algorithm used to minimize the loss function in deep learning models. "Stochastic" refers to the fact that the gradient is computed, and the weights are updated, from one training batch at a time rather than from the entire training dataset.
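
As a rough formalization of the paragraph above (the symbols are my own notation, not taken from the lecture code): let $\theta$ be the model parameters, $\eta$ the learning rate, $B$ the current batch, and $f_\theta$ the model. Assuming a mean-squared-error loss, one SGD step can be written as:

$$
\theta \leftarrow \theta - \eta \, \nabla_\theta \left[ \frac{1}{|B|} \sum_{i \in B} \left( f_\theta(x_i) - y_i \right)^2 \right]
$$

Full-batch gradient descent is the special case where $B$ is the whole training set.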

### Learning Rate
The learning rate is a hyperparameter of the optimization algorithm that determines the step size of each iteration when moving towards a minimum of the loss function. It controls how much we adjust the model in response to the estimated error each time the model weights are updated.

A high learning rate can make the model converge faster, but it can also overshoot the optimal point. In the worst case the loss increases from step to step and the model fails to converge at all.

A low learning rate converges slowly, which may allow the model to settle more precisely into a loss minimum at the cost of training time. There's also a chance that the model gets stuck in an undesired local minimum.
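
The effect of the step size can be seen on a toy problem. The sketch below is plain Python, not the MLP from this report. It minimizes the hypothetical function f(w) = (w - 3)^2 with gradient descent for three illustrative learning rates. Because this toy loss is convex, it shows slow convergence and divergence, but not the local-minimum case.

```python
def gradient_descent(lr, steps=50, w0=0.0, target=3.0):
    """Minimize f(w) = (w - target)**2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= lr * grad             # step against the gradient
    return w

for lr in (0.001, 0.1, 1.1):
    w = gradient_descent(lr)
    print(f"lr={lr}: final w={w:.4g}, distance from minimum={abs(w - 3.0):.4g}")
```

With the smallest rate, `w` has barely moved after 50 steps. With the middle rate it lands very close to the minimum. With the largest rate each step overshoots by more than the previous one, so the distance from the minimum grows.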

In [None]:
torch.manual_seed(123)
model = MLP(num_features=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

Now comes the actual training loop. We declare the number of epochs and a list to store the loss value at the end of each epoch. These loss values can later be plotted to visualize how the loss decreases during training.

In [None]:
num_epochs = 200 
losses = []

We then create our training loop to run `num_epochs` times. At the beginning of each epoch, we set our `model` to training mode (as opposed to evaluation mode) with `model.train()`.

Within each epoch, an inner loop iterates over the batches produced by `train_loader`.

Each batch iteration retrieves the features and labels, passes the features into our model to get predictions, and calculates the loss for that batch using `torch.nn.functional.mse_loss()`. We then call `zero_grad()` on our optimizer to reset the gradients of all model parameters to zero, since gradients accumulate by default in PyTorch. We call `backward()` on the loss to compute its gradient with respect to the model parameters, and then update the parameters with those gradients by calling `step()` on the optimizer.

At the end of each epoch, we append the loss of the final batch to the `losses` list.

In [None]:
import torch.nn.functional as F

for epoch in range(num_epochs):
    model = model.train()
    for batch_idx, (features, labels) in enumerate(train_loader):
        preds = model(features)
        loss = F.mse_loss(preds.squeeze(), labels.float())
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    losses.append(loss.item())
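
To make the per-batch steps concrete without relying on autograd, here is a plain-Python sketch of the same loop structure. It deliberately swaps in a much simpler model (a single linear unit `w * x + b`) with hand-derived MSE gradients, and it uses illustrative hyperparameters rather than the report's settings. It is not equivalent to training the MLP.

```python
import random
import statistics

X = [245.0, 273.0, 304.0, 331.0, 347.0, 360.0, 387.0, 438.0, 493.0, 547.0]
y = [232.3, 241.1, 257.4, 301.5, 324.6, 350.2, 362.3, 389.0, 398.2, 401.8]

def standardize(values):
    mean, std = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / std for v in values]

Xn, yn = standardize(X), standardize(y)

w, b = 0.0, 0.0          # parameters of the stand-in linear model
lr, batch_size = 0.1, 5  # illustrative values, not the report's settings
rng = random.Random(123)
losses = []

for epoch in range(100):
    indices = list(range(len(Xn)))
    rng.shuffle(indices)  # plays the role of shuffle=True
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        # Forward pass: prediction errors for this batch.
        errors = [(w * Xn[i] + b) - yn[i] for i in batch]
        loss = sum(e * e for e in errors) / len(batch)  # mean squared error
        # Gradients of the MSE, derived by hand (autograd does this in PyTorch).
        grad_w = sum(2.0 * e * Xn[i] for e, i in zip(errors, batch)) / len(batch)
        grad_b = sum(2.0 * e for e in errors) / len(batch)
        # Parameter update: the analogue of optimizer.step().
        w -= lr * grad_w
        b -= lr * grad_b
    losses.append(loss)  # loss of the last batch in the epoch

print(f"w={w:.3f}, b={b:.3f}, last recorded loss={losses[-1]:.4f}")
```

Because the gradients are recomputed from scratch for every batch, there is nothing to reset here. In PyTorch the gradients are stored on the parameters and added to on each `backward()` call, which is why the real loop needs `zero_grad()`.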

## Conclusion - Evaluating the Model

We can now plot the regression curve! We start by generating a dense range of x values at which to evaluate the model.

In [None]:
X_range = torch.arange(200, 600, 0.1).view(-1, 1)

We then normalize these points using the `X_mean` and `X_std` computed earlier, and pass the normalized points to our now trained model to make predictions.

In [None]:
X_range_normalized = (X_range - X_mean) / X_std

with torch.no_grad():
    y_range_preds = model(X_range_normalized)

Then we un-normalize the prediction values using the original `y_mean` and `y_std`.

In [None]:
y_range_unnormalized = (y_range_preds.squeeze() * y_std) + y_mean
y_range_unnormalized = y_range_unnormalized.numpy()
X_range = X_range.numpy()
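
The normalize, predict, un-normalize sequence can be wrapped into one function so the training statistics are always applied consistently. The sketch below is plain Python with a hypothetical `make_predictor` helper, a hypothetical identity function standing in for the trained network, and round made-up statistics. None of these come from the notebook.

```python
def make_predictor(model, x_mean, x_std, y_mean, y_std):
    """Wrap a model that works in standardized units so it accepts raw inputs."""
    def predict(x_raw):
        x_norm = (x_raw - x_mean) / x_std  # same statistics as at training time
        y_norm = model(x_norm)             # prediction in standardized units
        return y_norm * y_std + y_mean     # invert the label standardization
    return predict

def identity_model(z):
    # Hypothetical stand-in for the trained network.
    return z

# Made-up statistics, chosen so the arithmetic is easy to follow.
predict = make_predictor(identity_model, x_mean=400.0, x_std=100.0,
                         y_mean=300.0, y_std=50.0)

print(predict(400.0))  # input at the feature mean maps to the label mean
print(predict(500.0))  # one standard deviation above the mean
```

The key point is that new inputs must be normalized with the statistics of the training data, not with statistics recomputed from the new inputs.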

This provides us with everything we need to plot the regression curve! Notice that the curve is non-linear because we used the Sigmoid activation function.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.scatter(X_train, y_train, color='blue', label='Training data', s=100)
plt.plot(X_range, y_range_unnormalized, color='red', label='Regression curve')
plt.xlabel('X values')
plt.ylabel('Predictions')
plt.title('MLP Regression Curve with Non-Linear Activation')
plt.legend()
plt.grid(True)
plt.show()