# TITLE

**Learning Source:**

freeCodeCamp.org : PyTorch for Deep Learning & Machine Learning – Full Course 

This notebook implements the classification example from the tutorial above.

**References:**
- Link to resource: https://www.youtube.com/watch?v=V_xro1bcAuA
- https://pytorch.org/tutorials/beginner/ptcheat.html
- https://www.geeksforgeeks.org/machine-learning

## 01. Setup Environment

### 01.01 Setting up environment

In [None]:
# install basic packages required for the project
%conda install numpy pandas matplotlib
%conda install pytorch -c pytorch
%conda install scikit-learn
%conda install tqdm

### 01.02 Importing Libraries to python

In [None]:
import torch

torch.__version__

### 01.03 Setting up Device Agnostic code and device selection

In [None]:
# Set up device-agnostic code
print("Setting up device-agnostic code ...")
if torch.cuda.is_available():              # Check if CUDA is available
    device = torch.device("cuda")          # Set device to CUDA
elif torch.backends.mps.is_available():    # Check if MPS (Apple Silicon GPU) is available
    device = torch.device("mps")           # Set device to MPS
else:                                      # Default device selection
    device = torch.device("cpu")           # Set device to CPU, the default behaviour

print(f"Selected device for processing : {device}")

### 01.04 Setting up some Helper Functions

In [None]:
import torch
import matplotlib.pyplot as plt
import numpy as np



def plot_decision_boundary(model: torch.nn.Module, X: torch.Tensor, y: torch.Tensor):
    """Plots decision boundaries of model predicting on X in comparison to y.

    Source - https://madewithml.com/courses/foundations/neural-networks/ (with modifications)
    """
    # Put everything to CPU (works better with NumPy + Matplotlib)
    model.to("cpu")
    X, y = X.to("cpu"), y.to("cpu")

    # Setup prediction boundaries and grid
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 101), np.linspace(y_min, y_max, 101))

    # Make features
    X_to_pred_on = torch.from_numpy(np.column_stack((xx.ravel(), yy.ravel()))).float()

    # Make predictions
    model.eval()
    with torch.inference_mode():
        y_logits = model(X_to_pred_on)

    # Test for multi-class or binary and adjust logits to prediction labels
    if len(torch.unique(y)) > 2:
        y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)  # multi-class
    else:
        y_pred = torch.round(torch.sigmoid(y_logits))  # binary

    # Reshape preds and plot
    y_pred = y_pred.reshape(xx.shape).detach().numpy()
    plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())


# Plot training data, test data, and (optionally) predictions for linear data
def plot_predictions(
    train_data, train_labels, test_data, test_labels, predictions=None
):
    """Plots linear training data and test data and compares predictions."""
    plt.figure(figsize=(10, 7))

    # Plot training data in blue
    plt.scatter(train_data, train_labels, c="b", s=4, label="Training data")

    # Plot test data in green
    plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data")

    if predictions is not None:
        # Plot the predictions in red (predictions were made on the test data)
        plt.scatter(test_data, predictions, c="r", s=4, label="Predictions")

    # Show the legend
    plt.legend(prop={"size": 14})

# Plot loss curves of a model
def plot_loss_curves(results):
    """Plots training curves of a results dictionary.

    Args:
        results (dict): dictionary containing list of values, e.g.
            {"train_loss": [...],
             "train_acc": [...],
             "test_loss": [...],
             "test_acc": [...]}
    """
    loss = results["train_loss"]
    test_loss = results["test_loss"]

    accuracy = results["train_acc"]
    test_accuracy = results["test_acc"]

    epochs = range(len(results["train_loss"]))

    plt.figure(figsize=(15, 7))

    # Plot loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, loss, label="train_loss")
    plt.plot(epochs, test_loss, label="test_loss")
    plt.title("Loss")
    plt.xlabel("Epochs")
    plt.legend()

    # Plot accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, accuracy, label="train_accuracy")
    plt.plot(epochs, test_accuracy, label="test_accuracy")
    plt.title("Accuracy")
    plt.xlabel("Epochs")
    plt.legend()


## 02. Data Preparation and Loading

### 02.01 Preparing (Generating) data

### 02.02 Train Test Split data

## 03. Build a Model

### 03.01 Setting up Helper Functions for Model

### 03.02 Setting up HyperParameters

### 03.03 Writing the model Class

## 04. Training

Feel free to reference the [ML Activation function cheatsheet website](https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html) 

### 04.00 Setting up helper methods for training

In [None]:
import torch
from timeit import default_timer as timer 

def print_train_time(start: float, end: float, device: torch.device = None):
    """Prints difference between start and end time.

    Args:
        start (float): Start time of computation (preferred in timeit format). 
        end (float): End time of computation.
        device (torch.device, optional): Device that compute is running on. Defaults to None.

    Returns:
        float: time between start and end in seconds (higher is longer).
    """
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time

# Calculate accuracy (a classification metric)
def accuracy_fn(y_true, y_pred):
    """
        Calculates accuracy between truth labels and predictions.

        Args:
            y_true (torch.Tensor): Truth labels for predictions.
            y_pred (torch.Tensor): Predictions to be compared to the truth labels.

        Returns:
            float: Accuracy between y_true and y_pred as a percentage, e.g. 78.45
    """
    correct = torch.eq(y_true, y_pred).sum().item()
    acc = (correct / len(y_pred)) * 100
    return acc

def logitsToPredictionActivationFn(logits, dim=1):
    """
        Converts logits to prediction probabilities (softmax) and then to prediction labels (argmax).

        Assumes one logit per class, i.e. logits of shape [n_samples, n_classes].

        Args:
            logits (torch.Tensor): Logits output from the model.
            dim (int): Dimension along which to apply softmax and argmax.
        
        Returns:
            torch.Tensor: Predicted class indices (integer dtype).
    """
    return torch.softmax(logits, dim).argmax(dim)

def train_step(model: torch.nn.Module,
                  X: torch.Tensor,
                  y: torch.Tensor,
                  loss_fn: torch.nn.Module,
                  optimizer: torch.optim.Optimizer,
                  device: torch.device = device) -> tuple:
    """
        Training step for a model. 

        Args:
            model (torch.nn.Module): Model to train. 
            X (torch.Tensor): Input Features. 
            y (torch.Tensor): Target Labels. 
            loss_fn (torch.nn.Module): Loss Function to use for training. 
            optimizer (torch.optim.Optimizer): Optimizer to use for training. 
            device (torch.device, optional): Device to train on. Defaults to the globally selected device.
        
        Returns:
            (torch.Tensor, float): Training loss (scalar tensor) and accuracy (percentage) tuple. 
    """
    
    # train_loss, train_acc = 0.0, 0.0
    # Move model and data to device
    model.to(device)
    X, y = X.to(device), y.to(device)

    model.train()                                       # Set model to training mode
    logits = model(X)                                   # Forward pass through the model
    pred = logitsToPredictionActivationFn(logits)       # Softmax + argmax: logits -> predicted class indices
    
    train_loss = loss_fn(logits, y)                     # Calculate loss on raw logits (losses such as CrossEntropyLoss and BCEWithLogitsLoss expect logits, not labels)
    train_acc = accuracy_fn(y_true=y, y_pred=pred)      # Calculate accuracy

    optimizer.zero_grad()                               # Clear gradients from the previous iteration

    train_loss.backward()                               # Backpropagation

    optimizer.step()                                    # Gradient descent: update the weights

    return train_loss, train_acc

def test_step(model: torch.nn.Module, 
              X: torch.Tensor,
              y: torch.Tensor,
              loss_fn: torch.nn.Module,
              device: torch.device = device) -> tuple:
    """
        Make predictions on a test set and calculate accuracy

        Args:
            model (torch.nn.Module): Model to evaluate. 
            X (torch.Tensor): Input Features. 
            y (torch.Tensor): Target Labels. 
            loss_fn (torch.nn.Module): Loss Function to use for evaluation.
            device (torch.device, optional): Device to evaluate on. Defaults to the globally selected device.
        Returns:
            (torch.Tensor, float): Testing loss (scalar tensor) and accuracy (percentage) tuple. 
    """
    test_loss, test_acc = 0.0, 0.0
    # Move model and data to device
    model.to(device)
    X, y = X.to(device), y.to(device)

    model.eval()                                            # Set model to evaluation mode
    with torch.inference_mode():                            # Disable gradient tracking for performance reasons
        logits = model(X)                                   # Forward pass through the model
        pred = logitsToPredictionActivationFn(logits)       # Softmax + argmax: logits -> predicted class indices
        
        test_loss = loss_fn(logits, y)                      # Calculate loss on raw logits (losses such as CrossEntropyLoss and BCEWithLogitsLoss expect logits, not labels)
        test_acc = accuracy_fn(y_true=y, y_pred=pred)       # Calculate accuracy
    
    return test_loss, test_acc
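
As a quick sanity check on the logits-to-labels conversion, here is a minimal sketch with hand-made logits (the values are made up for illustration). It mirrors what `logitsToPredictionActivationFn` and `accuracy_fn` compute, and adds the sigmoid-and-round variant that a model with a single output logit would typically need.

```python
import torch

# Multi-class case: logits of shape [n_samples, n_classes] (illustrative values)
multi_logits = torch.tensor([[2.0, 0.5, -1.0],
                             [0.1, 0.2, 3.0],
                             [-1.0, 4.0, 0.0],
                             [1.5, 1.0, 0.5]])
# Softmax turns logits into probabilities, argmax picks the most likely class
multi_labels = torch.softmax(multi_logits, dim=1).argmax(dim=1)

# Binary case with one logit per sample: sigmoid, then round to 0 or 1
binary_logits = torch.tensor([-1.2, 0.3, 2.0, -0.4])
binary_labels = torch.round(torch.sigmoid(binary_logits))

# Accuracy as a percentage, the same arithmetic as accuracy_fn
y_true = torch.tensor([0, 2, 1, 1])
correct = torch.eq(y_true, multi_labels).sum().item()
accuracy = correct / len(y_true) * 100

print(multi_labels)   # predicted class indices
print(binary_labels)  # predicted 0/1 labels
print(accuracy)       # 3 of 4 correct -> 75.0
```

Softmax followed by argmax over a single-column output would always return class 0, so the binary variant is the safer choice when the model emits one logit per sample.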


### 04.01 Setting up Loss Function and Optimizer

Different problem types require different loss functions. 

For example, for a regression problem (predicting a number) you might use mean absolute error (MAE) loss.

And for a binary classification problem (like ours), you'll often use [binary cross entropy](https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a) as the loss function.

However, the same optimizer function can often be used across different problem spaces.

For example, the stochastic gradient descent optimizer (SGD, `torch.optim.SGD()`) can be used for a range of problems, and the same applies to the Adam optimizer (`torch.optim.Adam()`). 

| Loss function/Optimizer | Problem type | PyTorch Code |
| ----- | ----- | ----- |
| Stochastic Gradient Descent (SGD) optimizer | Classification, regression, many others. | [`torch.optim.SGD()`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) |
| Adam Optimizer | Classification, regression, many others. | [`torch.optim.Adam()`](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) |
| Binary cross entropy loss | Binary classification | [`torch.nn.BCEWithLogitsLoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) or [`torch.nn.BCELoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) |
| Cross entropy loss | Multi-class classification | [`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) |
| Mean absolute error (MAE) or L1 Loss | Regression | [`torch.nn.L1Loss`](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html) | 
| Mean squared error (MSE) or L2 Loss | Regression | [`torch.nn.MSELoss`](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) |  

*Table of various loss functions and optimizers, there are more but these are some common ones you'll see.*
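
The losses in the table expect differently shaped inputs and targets, which is a common source of errors. The following is a rough sketch using made-up tensors (all values are illustrative) of what each loss takes.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Binary classification: one logit per sample, float 0/1 targets of the same shape
binary_logits = torch.randn(4)                        # shape [4]
binary_targets = torch.tensor([0.0, 1.0, 1.0, 0.0])   # dtype float32
bce_loss = nn.BCEWithLogitsLoss()(binary_logits, binary_targets)

# Multi-class classification: one logit per class, integer class-index targets
multi_logits = torch.randn(4, 3)                      # shape [4, 3]
multi_targets = torch.tensor([0, 2, 1, 2])            # dtype int64
ce_loss = nn.CrossEntropyLoss()(multi_logits, multi_targets)

# Regression: float predictions and targets of the same shape
preds = torch.tensor([2.5, 0.0, 2.0])
targets = torch.tensor([3.0, -0.5, 2.0])
mae = nn.L1Loss()(preds, targets)    # mean of |0.5|, |0.5|, |0| = 1/3
mse = nn.MSELoss()(preds, targets)   # mean of 0.25, 0.25, 0 = 1/6

print(bce_loss.item(), ce_loss.item(), mae.item(), mse.item())
```

By default each loss reduces to a single scalar (the mean over samples), which is what `loss.backward()` needs.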

Since we're working with a binary classification problem, let's use a binary cross entropy loss function.

> **Note:** Recall a **loss function** is what measures how *wrong* your model predictions are, the higher the loss, the worse your model.
>
> Also, PyTorch documentation often refers to loss functions as "loss criterion" or "criterion", these are all different ways of describing the same thing.

PyTorch has two binary cross entropy implementations:
1. [`torch.nn.BCELoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) - Creates a loss function that measures the binary cross entropy between the target (label) and the input (predicted probabilities, i.e. model outputs already passed through a sigmoid).
2. [`torch.nn.BCEWithLogitsLoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) - This is the same as above except it has a sigmoid layer ([`nn.Sigmoid`](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html)) built-in (we'll see what this means soon).

Which one should you use? 

The [documentation for `torch.nn.BCEWithLogitsLoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) states that it's more numerically stable than using `torch.nn.BCELoss()` after a `nn.Sigmoid` layer. 

So generally, implementation 2 is a better option. However for advanced usage, you may want to separate the combination of `nn.Sigmoid` and `torch.nn.BCELoss()` but that is beyond the scope of this notebook.
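
To make the relationship between the two implementations concrete, here is a small sketch with hand-picked logits (the values are illustrative). For moderate logits the two approaches agree; for an extreme logit, the sigmoid saturates in float32 and the two-step result is limited by the log clamping inside `nn.BCELoss`, while the fused version stays accurate.

```python
import torch
from torch import nn

targets = torch.tensor([0.0, 0.0, 1.0, 1.0, 1.0])
logits = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

# Moderate logits: fused loss and sigmoid + BCELoss give the same value
fused = nn.BCEWithLogitsLoss()(logits, targets)
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(fused, two_step, atol=1e-6))

# Extreme logit: a confidently wrong prediction (target 0, logit 200)
extreme_logit = torch.tensor([200.0])
extreme_target = torch.tensor([0.0])
fused_extreme = nn.BCEWithLogitsLoss()(extreme_logit, extreme_target)
two_step_extreme = nn.BCELoss()(torch.sigmoid(extreme_logit), extreme_target)

# sigmoid(200) rounds to exactly 1.0 in float32, so the two-step loss
# can no longer distinguish this logit from any other very large one
print(fused_extreme.item(), two_step_extreme.item())
```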

Knowing this, let's create a loss function and an optimizer. 

For the optimizer we'll use `torch.optim.SGD()` to optimize the model parameters with learning rate 0.1.

> **Note:** There's a [discussion on the PyTorch forums about the use of `nn.BCELoss` vs. `nn.BCEWithLogitsLoss`](https://discuss.pytorch.org/t/bceloss-vs-bcewithlogitsloss/33586/4). It can be confusing at first but as with many things, it becomes easier with practice.
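
A minimal sketch of that setup follows. The `nn.Linear(2, 1)` model is a hypothetical stand-in (two input features, one output logit), since the notebook's own model class is not defined in this section.

```python
import torch
from torch import nn

# Hypothetical stand-in model: 2 input features -> 1 output logit
model = nn.Linear(in_features=2, out_features=1)

# Binary cross entropy with the sigmoid built in (expects raw logits)
loss_fn = nn.BCEWithLogitsLoss()

# Stochastic gradient descent over the model's parameters, learning rate 0.1
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)
```

Note that the `train_step` and `test_step` helpers convert logits to labels with softmax and argmax, which assumes one logit per class. If the model outputs one logit per class, `nn.CrossEntropyLoss()` is likely the matching loss; with a single output logit, the label conversion would need sigmoid and rounding instead.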

### 04.02 Training Model

Okay, now we've got a loss function and optimizer ready to go, let's train a model.

Steps in training:

<details>
    <summary>PyTorch training loop steps</summary>
    <ol>
        <li><b>Forward pass</b> - The model goes through all of the training data once, performing its
            <code>forward()</code> function
            calculations (<code>model(x_train)</code>).
        </li>
        <li><b>Calculate the loss</b> - The model's outputs (predictions) are compared to the ground truth and evaluated
            to see how
            wrong they are (<code>loss = loss_fn(y_pred, y_train)</code>).</li>
        <li><b>Zero gradients</b> - The optimizer's gradients are set to zero (they are accumulated by default) so they
            can be
            recalculated for the specific training step (<code>optimizer.zero_grad()</code>).</li>
        <li><b>Perform backpropagation on the loss</b> - Computes the gradient of the loss with respect to every model
            parameter to
            be updated (each parameter
            with <code>requires_grad=True</code>). This is known as <b>backpropagation</b>, hence "backwards"
            (<code>loss.backward()</code>).</li>
        <li><b>Step the optimizer (gradient descent)</b> - Update the parameters with <code>requires_grad=True</code>
            with respect to the loss
            gradients in order to improve them (<code>optimizer.step()</code>).</li>
    </ol>
</details>
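
The five steps can be sketched as a bare training loop. This is an illustration under assumptions, not the notebook's actual training code: the toy data, the `nn.Linear(2, 1)` model, and the epoch count are all hypothetical stand-ins.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Hypothetical toy data: 100 points with 2 features, label 1 when x0 + x1 > 0
X_train = torch.randn(100, 2)
y_train = (X_train.sum(dim=1) > 0).float()

# Hypothetical stand-in model, loss function, and optimizer
model = nn.Linear(in_features=2, out_features=1)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(100):
    model.train()
    y_logits = model(X_train).squeeze(dim=1)  # 1. Forward pass
    loss = loss_fn(y_logits, y_train)         # 2. Calculate the loss
    optimizer.zero_grad()                     # 3. Zero gradients
    loss.backward()                           # 4. Backpropagation
    optimizer.step()                          # 5. Step the optimizer
    losses.append(loss.item())                # Store a plain float, not the graph

print(f"first loss: {losses[0]:.4f} | last loss: {losses[-1]:.4f}")
```

Storing `loss.item()` rather than the loss tensor keeps the history free of references to the computation graph, which also makes it straightforward to plot.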



### 04.03 Evaluating the Model

## 05. Storing & Loading Model

### 05.01 Saving Model state to file

### 05.02 Loading Model state from file

### 05.03 Testing loaded model

## 06. Conclusion