# Neural Network 101

The effectiveness of a deep learning model depends on several factors. These include the choice of architecture, the number of hidden layers, the number of neurons in each layer, the optimizer and activation functions used, and the number of training epochs.

Research literature and characteristics such as the type of dataset can guide the choice of model architecture. The remaining factors, such as the number of layers or training epochs, usually have to be chosen by trial and error.

In this tutorial, you will get hands-on experience with this process. For simplicity, we will restrict the model architecture to a neural network (NN). However, you will have the opportunity to experiment with different activation functions, optimizers, network configurations, and training epochs.

Below, you will find simplified definitions of important terms. We have intentionally kept the explanations non-technical, because theoretical exploration is beyond the scope of this project. If you would like a deeper understanding, see `https://www.geeksforgeeks.org/machine-learning/neural-networks-a-beginners-guide/`.


1) __NN__: A neural network (NN) is a deep learning model architecture inspired by the structure and functioning of the human brain. It consists of artificial neurons, which resemble biological brain cells. These neurons are organized into layers and are interconnected, mimicking the way brain cells are connected in the nervous system.

2) __Hidden Layer__: A neural network (NN) consists of three or more layers of neurons. The first and last layers are called the input and output layers, respectively. The layers between them are called hidden layers (the layers with neurons highlighted in red in the figure below).

![NN model architecture](../../Images/NN.png)

3) __Activation Function__: One of the key reasons for the widespread adoption of neural networks is their ability to capture non-linear patterns in data. This capability is made possible through the use of activation functions, which introduce non-linearity into the model.

4) __Optimizer__: An optimizer is the algorithm that adjusts the model's weights during training to reduce the loss, which is the error between the model's predicted values and the target values.

5) __Epochs__: An epoch is one complete pass through the entire training dataset by the model during training.

6) __Mean Squared Error (MSE)__: The average of the squared differences between predicted and actual values. Each error is squared, so large errors are penalized much more heavily than small ones. This makes MSE sensitive to outliers. A lower MSE indicates smaller prediction errors overall.
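The sketch below computes MSE in plain Python to make the definition concrete. The helper name `mse` is hypothetical; the tutorial code itself uses PyTorch's `torch.nn.MSELoss`, which by default also returns the mean of the squared errors. Both prediction sets have the same total absolute error, but the set with one large miss gets a four times larger MSE.

```python
def mse(predicted, actual):
    """Mean of the squared differences between predictions and targets."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


targets = [1.0, 2.0, 3.0, 4.0]

# Every prediction is off by 1, so the absolute errors sum to 4.
spread_out = [2.0, 3.0, 4.0, 5.0]
# Three perfect predictions and one off by 4; the absolute errors also sum to 4.
one_outlier = [1.0, 2.0, 3.0, 8.0]

print(mse(spread_out, targets))   # 1.0
print(mse(one_outlier, targets))  # 4.0
```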

In [1]:
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, Subset
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

class DatasetClass(Dataset):
    def __init__(self, num_samples=300, noise_std=0.8, seed=42):
        # Seed NumPy so the same noisy dataset is generated on every run
        np.random.seed(seed)
        self.x = np.sort(np.random.uniform(-3.5, 3.5, num_samples)).astype(np.float32)

        y = (
            np.sin(7 * self.x ** 3)
            + np.log(np.abs(self.x) + 1) * np.cos(3 * self.x ** 2)
            + np.tanh(self.x ** 4)
        ) * np.exp(-self.x ** 2 / 2) + 0.3 * self.x ** 5

        y += np.random.normal(0, noise_std, size=self.x.shape)

        self.x = torch.tensor(self.x, dtype=torch.float32).unsqueeze(1)
        self.y = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

class ModelClass(nn.Module):
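    def __init__(self, model_block: nn.Sequential, activation: str = "relu"):
        # `activation` is kept for reference only; the activation layers
        # are already part of `model_block`.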
        super().__init__()
        self.model = model_block

    def forward(self, x):
        return self.model(x)

class TrainClass:
    def __init__(self, model, dataset, optimizer_name="adam", lr=0.01, epochs=30, batch_size=32, test_size=0.2):
        self.model = model
        self.epochs = epochs
        self.loss_fn = torch.nn.MSELoss()
        self.optimizer = self._get_optimizer(optimizer_name, model.parameters(), lr)
        # Halves the learning rate every 50 epochs (no effect when epochs <= 50)
        self.scheduler = torch.optim.lr_scheduler.StepLR(self.optimizer, step_size=50, gamma=0.5)
        self.train_loader, self.test_loader = self._split_dataset(dataset, test_size, batch_size)
        self.train_losses = []

    def _get_optimizer(self, name, parameters, lr):
        name = name.lower()
        if name == "adam":
            return torch.optim.Adam(parameters, lr=lr)
        elif name == "sgd":
            return torch.optim.SGD(parameters, lr=lr)
        elif name == "adamw":
            return torch.optim.AdamW(parameters, lr=lr)
        else:
            raise ValueError(f"Unsupported optimizer: {name}")

    def _split_dataset(self, dataset, test_size, batch_size):
        indices = np.arange(len(dataset))
        train_idx, test_idx = train_test_split(indices, test_size=test_size, random_state=42)
        train_set = Subset(dataset, train_idx)
        test_set = Subset(dataset, test_idx)
        return (
            DataLoader(train_set, batch_size=batch_size, shuffle=True),
            DataLoader(test_set, batch_size=batch_size, shuffle=False)
        )

    def train(self):
        for epoch in range(1, self.epochs + 1):
            self.model.train()
            total_loss = 0
            for xb, yb in self.train_loader:
                pred = self.model(xb)
                loss = self.loss_fn(pred, yb)
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()
                total_loss += loss.item()

            self.scheduler.step()
            avg_loss = total_loss / len(self.train_loader)
            self.train_losses.append(avg_loss)

            if epoch % 5 == 0 or epoch == 1:
                print(f"Epoch {epoch} | Train Loss: {avg_loss:.6f}")

    def evaluate(self):
        self.model.eval()
        total_loss = 0
        preds, targets = [], []

        with torch.no_grad():
            for xb, yb in self.test_loader:
                pred = self.model(xb)
                loss = self.loss_fn(pred, yb)
                total_loss += loss.item()
                preds.append(pred)
                targets.append(yb)

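        # Mean of the per-batch MSEs; when the last batch is smaller, this differs
        # slightly from the MSE averaged over individual test samples
        final_mse = total_loss / len(self.test_loader)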
        print(f"\nFinal Test MSE: {final_mse:.6f}")
        return torch.cat(preds), torch.cat(targets)

class PlotClass:
    @staticmethod
    def loss(train_losses):
        plt.figure(figsize=(8, 4))
        plt.plot(train_losses, label="Train Loss")
        plt.xlabel("Epoch")
        plt.ylabel("MSE Loss")
        plt.title("Training Loss")
        plt.grid(True)
        plt.legend()
        plt.tight_layout()
        plt.show()

    @staticmethod
    def predictions(model, dataset):
        model.eval()
        with torch.no_grad():
            x_all = dataset.x
            y_true = dataset.y
            y_pred = model(x_all)

        plt.figure(figsize=(10, 4))
        plt.scatter(x_all.numpy(), y_true.numpy(), s=10, label="True", alpha=0.5)
        plt.plot(x_all.numpy(), y_pred.numpy(), color="red", label="Predicted", linewidth=2)
        plt.title("Regression Fit with Noise")
        plt.xlabel("x")
        plt.ylabel("y")
        plt.grid(True)
        plt.legend()
        plt.tight_layout()
        plt.show()

## Exercise 1:

In this exercise, there are 12 possible combinations of activation functions (relu, tanh, leaky_relu, gelu) and optimizers (adam, sgd, adamw). Randomly select five of these combinations. Train the model for 30 epochs with each one, and record the final training loss and the final test MSE. Identify which combination achieves the lowest final test MSE and which achieves the lowest final training loss.
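If you would rather let Python pick the five combinations, the sketch below enumerates all 12 activation/optimizer pairs and samples five without repetition. The seed value `0` is an arbitrary choice. You can then set `activation` and `optimizer` in the cell below to each printed pair in turn.

```python
import itertools
import random

activations = ["relu", "tanh", "leaky_relu", "gelu"]
optimizers = ["adam", "sgd", "adamw"]

# Every (activation, optimizer) pair: 4 x 3 = 12 combinations
all_combos = list(itertools.product(activations, optimizers))

# Draw five distinct combinations; a fixed seed makes the draw repeatable
rng = random.Random(0)
chosen = rng.sample(all_combos, k=5)

for activation, optimizer in chosen:
    print(f"activation={activation!r}, optimizer={optimizer!r}")
```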

In [None]:
if __name__ == "__main__":

    # TODO: Set the two variables below to one of your five chosen
    # activation/optimizer combinations, and rerun this cell for each one.
    activation = "relu"     # Options: relu, tanh, leaky_relu, gelu
    optimizer = "adam"      # Options: adam, sgd, adamw
    epochs = 30

    act_map = {
        "relu": nn.ReLU(),
        "tanh": nn.Tanh(),
        "leaky_relu": nn.LeakyReLU(),
        "gelu": nn.GELU(),
    }
    act_ftn = act_map[activation.lower()]
    
    model_block = nn.Sequential(
        nn.Linear(1, 16),
        act_ftn,
        nn.Linear(16, 4),
        act_ftn,
        nn.Linear(4, 1)
    )
    

    dataset = DatasetClass()
    model = ModelClass(model_block=model_block, activation=activation)
    trainer = TrainClass(model, dataset, optimizer_name=optimizer, epochs=epochs)

    trainer.train()
    trainer.evaluate()
    PlotClass.loss(trainer.train_losses)
    PlotClass.predictions(model, dataset)

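## Exercise 2:
Currently, the model has two hidden layers, with 16 and 4 neurons. It is built from three linear layers: nn.Linear(1, 16) maps the single input to the first hidden layer, nn.Linear(16, 4) maps the first hidden layer to the second, and nn.Linear(4, 1) maps the second hidden layer to the single output.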

```
model_block = nn.Sequential(
    nn.Linear(1, 16),
    act_ftn,
    nn.Linear(16, 4),
    act_ftn,
    nn.Linear(4, 1)
)
```

If you want to add an additional hidden layer, you can modify the architecture like this:
```
model_block = nn.Sequential(
    nn.Linear(1, 16),
    act_ftn,
    nn.Linear(16, 8), 
    act_ftn,
    nn.Linear(8, 4),
    act_ftn,
    nn.Linear(4, 1)
)
```

Notice that the output dimension of one layer must match the input dimension of the next layer. For example, the first hidden layer outputs 16 values, so the next layer must accept 16 as its input dimension. If the dimensions do not align, PyTorch raises a RuntimeError when data is passed through the model.
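The sketch below builds the layer stack from a single list of widths, so consecutive layers always agree on their shared dimension. `build_mlp` is a hypothetical helper and not part of the tutorial's classes. It takes the activation *class* (for example `nn.ReLU`) rather than an instance, so it can create a separate activation module for each layer. The sketch also reproduces the `RuntimeError` by passing data through a deliberately mismatched pair of layers.

```python
import torch
import torch.nn as nn


def build_mlp(widths, activation=nn.ReLU):
    """Build an nn.Sequential from layer widths, e.g. [1, 16, 8, 4, 1].

    Each nn.Linear maps widths[i] inputs to widths[i + 1] outputs, so the
    output size of one layer is always the input size of the next.
    An activation follows every linear layer except the last.
    """
    layers = []
    for i, (n_in, n_out) in enumerate(zip(widths[:-1], widths[1:])):
        layers.append(nn.Linear(n_in, n_out))
        if i < len(widths) - 2:  # no activation after the output layer
            layers.append(activation())
    return nn.Sequential(*layers)


# Same shape as the example above: an extra hidden layer of 8 neurons
model_block = build_mlp([1, 16, 8, 4, 1], nn.ReLU)
x = torch.zeros(5, 1)              # a batch of 5 scalar inputs
print(model_block(x).shape)        # torch.Size([5, 1])

# Mismatch: the first layer outputs 16 values, but the next expects 8
broken = nn.Sequential(nn.Linear(1, 16), nn.Linear(8, 1))
try:
    broken(x)
except RuntimeError as err:
    print("RuntimeError:", err)
```

In the notebook, `type(act_ftn)` returns the class of the chosen activation. For example, `build_mlp([1, 16, 8, 4, 1], type(act_ftn))` would rebuild the architecture shown above.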


In this exercise, add two additional hidden layers to the model architecture shown below. Then, using the best combination of optimizer and activation function identified in Exercise 1, train the updated model for 20 epochs.

```
model_block = nn.Sequential(
    nn.Linear(1, 128),
    act_ftn,
    nn.Linear(128, 16),
    act_ftn,
    nn.Linear(16, 1)
)
```

## Exercise 3:
I was able to achieve a final test MSE of 3.446659 in 30 epochs. Can you build a better model, using any combination of optimizer and activation function, that beats this result within the same 30-epoch limit?
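Before comparing against that number, note what the notebook seeds and what it does not. It seeds NumPy (in `DatasetClass`) and the train/test split (`random_state=42`), but not PyTorch. The initial weights and the `DataLoader` shuffling therefore change from run to run, and the final test MSE varies somewhat even with identical settings. The sketch below shows one way to make runs repeatable by calling `torch.manual_seed` before a model is built. The helper `init_weights` is hypothetical, and the seed values are arbitrary.

```python
import torch
import torch.nn as nn


def init_weights(seed):
    """Seed PyTorch's global RNG, build a fresh layer, and return a copy of its weights."""
    torch.manual_seed(seed)
    layer = nn.Linear(1, 16)
    return layer.weight.detach().clone()


same_a = init_weights(0)
same_b = init_weights(0)
different = init_weights(1)

print(torch.equal(same_a, same_b))     # True: same seed, same initial weights
print(torch.equal(same_a, different))  # False: different seed, different weights
```

In the training cells, the equivalent is a `torch.manual_seed(...)` call right after `if __name__ == "__main__":`, before `ModelClass` and `TrainClass` are created. Weight initialization and `DataLoader` shuffling then start from the same state on every run.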

In [None]:
if __name__ == "__main__":

    
    activation = "relu"     # Options: relu, tanh, leaky_relu, gelu
    optimizer = "adam"      # Options: adam, sgd, adamw
    epochs = 30

    act_map = {
        "relu": nn.ReLU(),
        "tanh": nn.Tanh(),
        "leaky_relu": nn.LeakyReLU(),
        "gelu": nn.GELU(),
    }
    act_ftn = act_map[activation.lower()]
    
    model_block = nn.Sequential(
        nn.Linear(1, 16),
        act_ftn,
        nn.Linear(16, 4),
        act_ftn,
        nn.Linear(4, 1)
    )
    

    dataset = DatasetClass()
    model = ModelClass(model_block=model_block, activation=activation)
    trainer = TrainClass(model, dataset, optimizer_name=optimizer, epochs=epochs)

    trainer.train()
    trainer.evaluate()
    PlotClass.loss(trainer.train_losses)
    PlotClass.predictions(model, dataset)

## Next Step: 
`/BuildingsBenchTutorial/Tutorials/Final-Project-Modules/`