course: https://learn.deeplearning.ai/courses/pytorch-fundamentals/

lesson: https://learn.deeplearning.ai/specializations/pytorch-for-deep-learning-professional-certificate/lesson/466v5tii/building-a-simple-neural-network

based off of lab: 
https://learn.deeplearning.ai/specializations/pytorch-for-deep-learning-professional-certificate/lesson/x77awl3j/modeling-non-linear-patterns-with-activation-functions 


LESSON CONCEPTS
Nonlinear Modeling with Activation Functions
+ 
Normalization


Expands on ```pytorch-fundamentals-1-basic-ml-pipeline.ipynb```

Activation functions
Great explanations and illustrations: https://learn.deeplearning.ai/specializations/pytorch-for-deep-learning-professional-certificate/lesson/hllrjryf/activation-functions

- Refer to screenshots for ReLU

- Standard activation functions: ReLU, Sigmoid, Tanh
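
To make the three functions above concrete, here is a minimal sketch that applies each one to the same small tensor. The input values are arbitrary and chosen only for illustration; `torch.relu`, `torch.sigmoid`, and `torch.tanh` are the functional counterparts of the `nn.ReLU`, `nn.Sigmoid`, and `nn.Tanh` modules.

```python
import torch

# Sample inputs spanning negative, zero, and positive values (arbitrary choice)
x = torch.tensor([-2.0, 0.0, 2.0])

relu_out = torch.relu(x)        # max(0, x): negatives become 0
sigmoid_out = torch.sigmoid(x)  # 1 / (1 + exp(-x)): squashes into (0, 1)
tanh_out = torch.tanh(x)        # squashes into (-1, 1), centered at 0

print("ReLU:   ", relu_out)
print("Sigmoid:", sigmoid_out)
print("Tanh:   ", tanh_out)
```

ReLU leaves positive inputs unchanged, while Sigmoid and Tanh saturate for large positive or negative inputs.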



Lesson verbiage:
In the last lab, your simple linear model performed well on bike-only data, but it struggled when cars were added. The reason was simple: your model could only learn straight lines, but the new data followed a curve. As you saw in the lectures, simply adding more linear neurons is not the solution. The model's output would still be a straight line.
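
The claim that more linear neurons still give a straight line can be checked directly. The sketch below is not part of the lab: it stacks two `nn.Linear` layers with no activation in between, then collapses them into a single affine map and compares the outputs. The seed and the sample inputs are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Two linear layers stacked with no activation in between
stacked = nn.Sequential(nn.Linear(1, 3), nn.Linear(3, 1))

# Collapse them into one equivalent affine map: y = W x + b
first, second = stacked[0], stacked[1]
with torch.no_grad():
    W = second.weight @ first.weight              # shape (1, 1)
    b = second.weight @ first.bias + second.bias  # shape (1,)

x = torch.linspace(-2.0, 2.0, 9).unsqueeze(1)  # column of sample inputs
with torch.no_grad():
    stacked_out = stacked(x)
    single_out = x @ W.T + b

# The two-layer stack and the single affine map agree on every input
print(torch.allclose(stacked_out, single_out, atol=1e-6))
```

Because the stack reduces to one slope and one intercept, it cannot bend, which is why an activation function is needed between the layers.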

This is where non-linear activation functions come in. They are the key to unlocking your model's ability to learn the complex, curved patterns found in real-world data. In this lab, you'll use the most popular and powerful activation function, ReLU (Rectified Linear Unit), to build a more sophisticated model. By adding a ReLU activation, your model can create multiple "bends" that can approximate the complex delivery time curve.

In this lab, you will:

- Prepare the combined bike and car delivery data, this time applying a technique called normalization to help your model train more effectively.
- Build a non-linear neural network using the ReLU activation function.
- Train your new model to learn the complex, curved relationship in the data.
- Predict delivery times using your new model and see if it can finally succeed where the linear one failed.


In [None]:
# Imports

import torch
import torch.nn as nn   # neural network
import torch.optim as optim # optimization algorithms

import matplotlib.pyplot as plt
import numpy as np

In [None]:
# 1. Data Ingestion 

# more complex data than first notebook for both bike and car deliveries

distances_data = [
    [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0], [5.5],
    [6.0], [6.5], [7.0], [7.5], [8.0], [8.5], [9.0], [9.5], [10.0], [10.5],
    [11.0], [11.5], [12.0], [12.5], [13.0], [13.5], [14.0], [14.5], [15.0], [15.5],
    [16.0], [16.5], [17.0], [17.5], [18.0], [18.5], [19.0], [19.5], [20.0]
]

delivery_times_data = [
    [6.96], [9.67], [12.11], [14.56], [16.77], [21.7], [26.52], [32.47], [37.15], [42.35],
    [46.1], [52.98], [57.76], [61.29], [66.15], [67.63], [69.45], [71.57], [72.8], [73.88],
    [76.34], [76.38], [78.34], [80.07], [81.86], [84.45], [83.98], [86.55], [88.33], [86.83],
    [89.24], [88.11], [88.16], [91.77], [92.27], [92.13], [90.73], [90.39], [92.98]
]

# input tensor
distances_tensor = torch.tensor(
    distances_data,
    dtype=torch.float32
)

# output tensor
delivery_times_tensor = torch.tensor(
    delivery_times_data,
    dtype=torch.float32
)


In [None]:
# 2. Data Preparation

# Normalization
"""
Lesson notes:

This is a standard technique that makes the training process more stable and effective by adjusting the scale of the data. This adjustment helps prevent large distance values from dominating the learning process and keeps gradients stable during training.

You will calculate the mean and standard deviation for the distances and times tensors.
You will then apply standardization to each tensor using its respective mean and standard deviation, which creates new normalized tensors named distances_norm and times_norm.
This specific technique is called standardization (or z-score normalization), which converts the original data from 1.0 to 20.0 miles and approximately 7 to 93 minutes into a new, normalized scale.
"""

distances_mean = distances_tensor.mean()
distances_std = distances_tensor.std()

delivery_times_mean = delivery_times_tensor.mean()
delivery_times_std = delivery_times_tensor.std()

# normalized tensors
distances_tensor_normalized = (distances_tensor - distances_mean) / distances_std
delivery_times_tensor_normalized = (delivery_times_tensor - delivery_times_mean) / delivery_times_std

# print(f"Distances normalized: {distances_tensor_normalized[:5]}")
# print(f"Delivery times normalized: {delivery_times_tensor_normalized[:5]}")
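
As a sanity check on the standardization step, the sketch below rebuilds the same distance grid (1.0 to 20.0 in steps of 0.5) and confirms that the normalized tensor has mean close to 0 and standard deviation close to 1. The names `normalized` and `restored` are illustrative and do not appear in the lab. Note that `Tensor.std()` applies Bessel's correction by default, so it divides by N - 1.

```python
import torch

# Same distance grid as the notebook: 1.0 to 20.0 in steps of 0.5 (39 values)
distances = torch.linspace(1.0, 20.0, 39).unsqueeze(1)

mean = distances.mean()
std = distances.std()  # unbiased by default (divides by N - 1)

normalized = (distances - mean) / std

# After standardization the data has mean ~0 and standard deviation ~1
print(f"mean: {normalized.mean().item():.6f}, std: {normalized.std().item():.6f}")

# Undo the transform to get back to the original units (miles)
restored = normalized * std + mean
```

The inverse transform on the last line is the same one needed later to turn a normalized prediction back into minutes.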


In [None]:
# 3. Model

# define model
"""
Lesson notes:
nn.Linear(1, 3): This is your first hidden layer. It consists of three neurons, each receiving one input feature (the normalized distance). This layer transforms the single input value into three separate values.
nn.ReLU() applies the ReLU activation function to the output of each of the three neurons from the hidden layer. This is the crucial non-linear step that allows your model to create "bends" and learn curves instead of just straight lines.
nn.Linear(3, 1): This is your output layer. It takes the three activated values from the previous step as its input and combines them to produce a single final output, which is your predicted (normalized) delivery time.
"""

torch.manual_seed(27)   # ensures results are reproducible

model = nn.Sequential(
    nn.Linear(in_features=1, out_features=3),  # hidden layer
    nn.ReLU(),                                # activation function
    nn.Linear(in_features=3, out_features=1) # output layer
)
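
To see what this architecture contains, the following sketch rebuilds the same model, counts its trainable parameters, and computes where each hidden neuron switches on. A neuron computing ReLU(w * x + b) changes slope at x = -b / w, so three hidden neurons give at most three bends. The variable names `num_params` and `bend_positions` are illustrative, and the positions printed are for the untrained, randomly initialized weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(27)  # same seed as the notebook

model = nn.Sequential(
    nn.Linear(in_features=1, out_features=3),
    nn.ReLU(),
    nn.Linear(in_features=3, out_features=1),
)

# Parameter count: (1*3 weights + 3 biases) + (3*1 weights + 1 bias) = 10
num_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {num_params}")

# Each hidden neuron computes ReLU(w * x + b), which changes slope where w * x + b = 0
hidden = model[0]
with torch.no_grad():
    bend_positions = -hidden.bias / hidden.weight.squeeze(1)
print(f"Bend positions (normalized distance): {bend_positions.tolist()}")

# A batch of shape (N, 1) goes in, predictions of shape (N, 1) come out
sample = torch.zeros(5, 1)
print(model(sample).shape)
```

Training moves these bend positions and the slopes between them so that the piecewise-linear output approximates the curve in the data.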

In [None]:
# 4. Training

# define loss function and optimizer
loss_function = nn.MSELoss()   # mean squared error loss
optimizer = optim.SGD(
    model.parameters(),
    lr=0.01   # learning rate
)
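
The loss and the optimizer step can be worked by hand on a toy problem. The sketch below is not from the lab: it uses a single parameter `w` and made-up numbers chosen so the arithmetic is easy to follow, with the same `nn.MSELoss` and `optim.SGD(lr=0.01)` setup as above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy one-parameter model: prediction = w * x (values chosen for easy arithmetic)
w = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([1.0, 2.0])
target = torch.tensor([2.0, 5.0])

loss_function = nn.MSELoss()
optimizer = optim.SGD([w], lr=0.01)

prediction = w * x                        # [2.0, 4.0]
loss = loss_function(prediction, target)  # mean([0^2, (-1)^2]) = 0.5

optimizer.zero_grad()
loss.backward()      # dL/dw = mean(2 * (pred - target) * x) = -2.0
grad = w.grad.clone()
optimizer.step()     # w <- w - lr * grad = 2.0 - 0.01 * (-2.0) = 2.02

print(f"loss={loss.item():.2f}, grad={grad.item():.2f}, updated w={w.item():.2f}")
# prints: loss=0.50, grad=-2.00, updated w=2.02
```

The training loop repeats exactly this sequence (forward pass, loss, `backward()`, `step()`) for every parameter of the model on every epoch.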


In [None]:
# Helper: plot_training_progress

def plot_training_progress(
    epoch,
    loss,
    model,
    input_data_normalized,
    target_data_normalized
):
    
    """
    Course notes:
    Plots the training progress of a model on normalized data,
    showing the current fit at each epoch.

    Args:
        epoch: The current training epoch number.
        loss: The loss value at the current epoch.
        model: The model being trained.
        input_data_normalized (distances_norm)
        target_data_normalized (times_norm)
    """

    # forward pass without tracking gradients (this call is for plotting only)
    with torch.no_grad():
        predicted_target_data_normalized = model(input_data_normalized)
    
    # convert tensors to NumPy arrays for plotting
    x_plot = input_data_normalized.numpy()
    y_plot = target_data_normalized.numpy()

    # detach predictions from the computation graph and convert to NumPy
    y_pred_plot = predicted_target_data_normalized.detach().numpy()

    # sort the data based on distance to ensure a smooth line plot
    sorted_indices = x_plot.argsort(axis=0).flatten()

    # create new figure for the plot
    plt.figure(figsize=(8, 6))

    # plot the original normalized data points
    plt.plot(x_plot, y_plot, color='orange', marker='o', linestyle='none', label='Actual Normalized Data')

    # plot the model's predictions as a line
    plt.plot(x_plot[sorted_indices], y_pred_plot[sorted_indices], color='green', label='Model Predictions')

    # set chart elements
    plt.title(f'Epoch: {epoch + 1} | Normalized Training Progress')
    plt.xlabel('Normalized Distance')
    plt.ylabel('Normalized Time')

    # show legend
    plt.legend()

    # show grid
    plt.grid(True)

    # show plot
    plt.show()

In [None]:
# 4. Training (continued): training loop

"""
Lesson notes:
You will run the training loop for 3000 epochs (more than Lab 1 because the non-linear pattern is more complex and requires more training). This will repeatedly feed the normalized data to your model, measure the error, and adjust the model's parameters to improve its predictions.
The second half of the code includes a live plot, allowing you to watch in real time as your model's prediction line adapts to fit the curved data. The live plot helps you see how your model gradually learns to fit the curve, starting with a poor fit and improving over time.
"""

# training loop
NUM_EPOCHS = 3000
for epoch in range(NUM_EPOCHS):

    # reset gradients each iteration; PyTorch accumulates them across backward() calls by default
    optimizer.zero_grad()

    # forward pass
    predicted_outputs = model(distances_tensor_normalized)

    # calculate loss
    loss = loss_function(
        predicted_outputs,
        delivery_times_tensor_normalized    # actual outputs
    )

    # backpropagation
    loss.backward()


    # update model parameters
    optimizer.step()

    # print and live plot training progress
    # plot every 50 epochs
    # TODO...
    if (epoch + 1) % 50 == 0:
        
        print(f"Epoch [{epoch+1}/{NUM_EPOCHS}], Loss: {loss.item():.4f}")

        # TODO...
        # plot_training_progress helper function
        plot_training_progress(
            epoch,
            loss,
            model,
            distances_tensor_normalized,
            delivery_times_tensor_normalized
        )

print("\nTraining complete.\n")
print(f"Final loss: {loss.item()}\n")
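
The lesson's last step, predicting a delivery time, is not shown in the cells above. Since the model was trained on normalized data, a new distance has to be normalized with the training statistics and the output converted back to minutes. The sketch below is one possible way to do that. `predict_delivery_time` is a hypothetical helper, not a function from the lab, and the demo uses a stand-in identity layer with placeholder statistics so the arithmetic can be checked by hand.

```python
import torch
import torch.nn as nn

def predict_delivery_time(model, distance, x_mean, x_std, y_mean, y_std):
    """Hypothetical helper: predict minutes for one distance in miles."""
    x = torch.tensor([[distance]], dtype=torch.float32)
    x_norm = (x - x_mean) / x_std            # same scaling used in training
    model.eval()
    with torch.no_grad():                    # inference only, no gradients
        y_norm = model(x_norm)
    return (y_norm * y_std + y_mean).item()  # back to minutes

# Stand-in for the trained network: an identity layer, so the result is easy to verify
identity = nn.Linear(1, 1)
with torch.no_grad():
    identity.weight.fill_(1.0)
    identity.bias.fill_(0.0)

# Placeholder statistics, NOT the values computed from the notebook's data
minutes = predict_delivery_time(
    identity, distance=12.0,
    x_mean=torch.tensor(10.0), x_std=torch.tensor(4.0),
    y_mean=torch.tensor(60.0), y_std=torch.tensor(20.0),
)
print(f"{minutes:.1f} minutes")  # (12 - 10) / 4 = 0.5, then 0.5 * 20 + 60 = 70.0
```

In the notebook, the same helper would presumably be called with the trained `model` and with `distances_mean`, `distances_std`, `delivery_times_mean`, and `delivery_times_std` from the data preparation cell.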