## 01.PyTorch Workflow Fundamentals

**A PyTorch Workflow**
1. Get data ready (turn into tensors)
2. Build or pick a pretrained model (to suit your problem)
   
   2.1 Pick a loss function & optimizer
   
   2.2 Build a training loop
3. Fit the model to the data and make a prediction
4. Evaluate the model
5. Improve through experimentation
6. Save and reload your trained model

We'll need ```torch```, ```torch.nn``` and ```matplotlib```, so let's import them.

In [None]:
import torch
from torch import nn
import matplotlib.pyplot as plt

### 1.Data (preparing and loading)

In [None]:
# Create *known* parameters
weight = 0.7
bias = 0.3

# Create data
start = 0
end = 1
step = 0.02
X = torch.arange(start, end, step).unsqueeze(dim=1)
y = weight * X + bias

X[:10], y[:10]

Output: (tensor([[0.0000],
         [0.0200],
         [0.0400],
         [0.0600],
         [0.0800],
         [0.1000],
         [0.1200],
         [0.1400],
         [0.1600],
         [0.1800]]),
 tensor([[0.3000],
         [0.3140],
         [0.3280],
         [0.3420],
         [0.3560],
         [0.3700],
         [0.3840],
         [0.3980],
         [0.4120],
         [0.4260]]))

#### Split data into training and test sets

|Split             |Purpose                                                                                      |Amount of total data        |How often is it used?                                                                                                                             |
|------------------|-----------------------------------------------------------------------------------------|-------|--------------------|
|**Training set**  |The model learns from this data (like the course materials you study during the semester).   |~60-80%|Always              |
|**Validation set**|The model gets tuned on this data (like the practice exam you take before the final exam).   |~10-20%|Often but not always|
|**Testing set**   |The model gets evaluated on this data to test what it has learned (like the final exam).     |~10-20%|Always              |

In [None]:
# Create train/test split
train_split = int(0.8 * len(X)) # 80% of data used for training set, 20% for testing
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]

len(X_train), len(y_train), len(X_test), len(y_test)

Output: (40, 40, 10, 10)
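The split above produces only training and test sets, while the table also lists a validation set. Here is a minimal sketch of a three-way split using the same sequential slicing. The 70/15/15 ratios and the names `train_end`, `val_end`, `X_val` and `y_val` are assumptions chosen for illustration.

```python
import torch

# Recreate the same data as above so this sketch runs on its own
weight, bias = 0.7, 0.3
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = weight * X + bias

# Assumed ratios for illustration: 70% train, 15% validation, 15% test.
# Integer arithmetic avoids floating-point rounding surprises in the indices.
train_end = len(X) * 70 // 100
val_end = len(X) * 85 // 100

X_train, y_train = X[:train_end], y[:train_end]
X_val, y_val = X[train_end:val_end], y[train_end:val_end]
X_test, y_test = X[val_end:], y[val_end:]

print(len(X_train), len(X_val), len(X_test))
```

The three slices are contiguous and non-overlapping, so every sample lands in exactly one set.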

Let's create a function to visualize it.

In [None]:
def plot_predictions(train_data=X_train,
                     train_labels=y_train,
                     test_data=X_test,
                     test_labels=y_test,
                     predictions=None):
    plt.figure(figsize=(10, 7))
    
    # Plot training data in blue
    plt.scatter(train_data, train_labels, c="b", s=4, label="Training data")
    
    # Plot test data in green
    plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data")
    
    if predictions is not None:
        # Plot the predictions in red
        plt.scatter(test_data, predictions, c="r", s=4, label="Predictions")
        
    # Show the legend
    plt.legend(prop={"size":14});

plot_predictions();

### 2.Build model

In [None]:
# Create a Linear Regression model class
class LinearRegressionModel(nn.Module): # <- almost everything in PyTorch is a nn.Module (think of this as neural network lego blocks)
    def __init__(self):
        super().__init__() 
        self.weights = nn.Parameter(torch.randn(1, # <- start with random weights (this will get adjusted as the model learns)
                                                dtype=torch.float), # <- PyTorch loves float32 by default
                                   requires_grad=True) # <- can we update this value with gradient descent?

        self.bias = nn.Parameter(torch.randn(1, # <- start with random bias (this will get adjusted as the model learns)
                                            dtype=torch.float), # <- PyTorch loves float32 by default
                                requires_grad=True) # <- can we update this value with gradient descent?

    # Forward defines the computation in the model
    def forward(self, x: torch.Tensor) -> torch.Tensor: # <- "x" is the input data (e.g. training/testing features)
        return self.weights * x + self.bias # <- this is the linear regression formula (y = m*x + b)

#### PyTorch model building essentials

PyTorch has 4 essential modules you can use to create almost any kind of neural network you can imagine. They are ```torch.nn```, ```torch.optim```, ```torch.utils.data.Dataset``` and ```torch.utils.data.DataLoader```.

Almost everything in a PyTorch neural network comes from ```torch.nn```, while ```torch.optim``` handles the parameter updates:
* ```nn.Module``` contains the larger building blocks (layers)
* ```nn.Parameter``` contains the **smaller parameters** like weights and biases (put these together to make ```nn.Module```(s))
* ```forward()``` tells the larger blocks **how to make calculations on inputs** (tensors full of data) within ```nn.Module```(s)
* ```torch.optim``` contains optimization methods that improve the parameters within ```nn.Parameter``` to better represent the input data

#### Checking the contents of a PyTorch model

Now let's create a model instance with the class we've made and **check its parameters** using ```.parameters()```

In [None]:
# Set manual seed since nn.Parameter values are randomly initialized
torch.manual_seed(42)

# Create an instance of the model
model_0 = LinearRegressionModel()

# Check the nn.Parameter(s) within the nn.Module subclass we created 
list(model_0.parameters())

Output: [Parameter containing:
 tensor([0.3367], requires_grad=True),
 Parameter containing:
 tensor([0.1288], requires_grad=True)]

We can also get the state of the model using ```.state_dict()```

In [None]:
# List named parameters
model_0.state_dict()

Output: OrderedDict([('weights', tensor([0.3367])), ('bias', tensor([0.1288]))])

#### Making predictions using ```torch.inference_mode()```

When we pass data to our model, it goes through the model's ```forward()``` method and produces a result using the computation we've defined.

In [None]:
# Make predictions with model
with torch.inference_mode():
    y_preds = model_0(X_test)

# Note: in older PyTorch code you might also see torch.no_grad()
with torch.no_grad():
    y_preds = model_0(X_test)

Note: In older PyTorch code, you may also see ```torch.no_grad()``` being used for inference. While ```torch.inference_mode()``` and ```torch.no_grad()``` do **similar** things, ```torch.inference_mode()``` is newer, potentially **faster and preferred**. 
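To make the difference concrete, here is a small sketch comparing the two context managers. The `nn.Linear` model is only a stand-in chosen for illustration, and the sketch assumes a PyTorch version that provides `Tensor.is_inference()`.

```python
import torch
from torch import nn

# Stand-in model for illustration (any nn.Module behaves the same way here)
model = nn.Linear(in_features=1, out_features=1)
x = torch.tensor([[0.5]])

# Outside any context manager, the output is tracked for gradient computation
tracked = model(x)

with torch.no_grad():
    no_grad_out = model(x)

with torch.inference_mode():
    inference_out = model(x)

print(tracked.requires_grad)         # True
print(no_grad_out.requires_grad)     # False
print(inference_out.requires_grad)   # False
print(no_grad_out.is_inference())    # False
print(inference_out.is_inference())  # True
```

Both contexts skip gradient tracking. Tensors created under ```torch.inference_mode()``` are additionally marked as inference tensors, which restricts their later use in autograd and is where the potential speed-up comes from.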

### 3.Train model

#### Creating a loss function and optimizer in PyTorch

|Function|What does it do?|Where does it live in PyTorch?|Common values|
|--------|----------------|------------------------------|-------------|
|**Loss function**|Measures how wrong your model's predictions are compared to the truth labels.|PyTorch has plenty of built-in loss functions in ```torch.nn```.|Mean absolute error (MAE) for regression problems (```torch.nn.L1Loss()```). Binary cross entropy for binary classification problems (```torch.nn.BCELoss()```).|
|**Optimizer**|Tells your model how to update its internal parameters to best lower the loss.|You can find various optimization function implementations in ```torch.optim```.|Stochastic gradient descent (```torch.optim.SGD()```). Adam optimizer (```torch.optim.Adam()```).|

In [None]:
# Create the loss function
loss_fn = nn.L1Loss()

# Create the optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(), # parameters of target model to optimize
                            lr=0.01)
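As a quick check of what ```nn.L1Loss()``` computes, the sketch below compares it with the mean absolute error written out by hand. The prediction and label values are made up for illustration.

```python
import torch
from torch import nn

# Made-up predictions and labels, purely for illustration
y_pred = torch.tensor([[0.5], [0.8], [1.0]])
y_true = torch.tensor([[0.4], [1.0], [1.3]])

# Built-in mean absolute error
loss_fn = nn.L1Loss()
builtin_mae = loss_fn(y_pred, y_true)

# The same quantity written out by hand: mean(|prediction - label|)
manual_mae = torch.mean(torch.abs(y_pred - y_true))

print(builtin_mae, manual_mae)
```

The absolute errors here are roughly 0.1, 0.2 and 0.3, so both values come out at about 0.2.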

#### Creating an optimization loop in PyTorch

It's time to create a **training loop** (and a **testing loop**).

The training loop involves the model going through the training data and learning the relationships between the ```features``` and ```labels```.

The testing loop involves going through the testing data and evaluating how good the patterns are that the model learned on the training data.

#### PyTorch training loop

|Number|Step name|What does it do?|Code example|
|------|---------|----------------|------------|
|1|Forward pass|The model goes through all of the training data once, performing its ```forward()``` function calculations.|```model(x_train)```|
|2|Calculate the loss|The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are.|```loss = loss_fn(y_pred, y_train)```|
|3|Zero gradients|The optimizer's gradients are set to zero (they are accumulated by default) so they can be recalculated for the specific training step.|```optimizer.zero_grad()```|
|4|Perform backpropagation on the loss|Computes the gradient of the loss with respect to every model parameter to be updated (each parameter with ```requires_grad=True```). This is known as **backpropagation**, hence "backwards".|```loss.backward()```|
|5|Update the optimizer (**gradient descent**)|Update the parameters with ```requires_grad=True``` with respect to the loss gradients in order to improve them.|```optimizer.step()```|

Note: On the ordering of things, the above is a good default order but you may see slightly different orders. Some rules of thumb:
* Calculate the loss (```loss = ...```) before performing backpropagation on it (```loss.backward()```).
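* Zero gradients (```optimizer.zero_grad()```) before computing new ones (```loss.backward()```), so that gradients left over from the previous step are not added to the current ones when the optimizer steps (```optimizer.step()```).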
* Step the optimizer (```optimizer.step()```) after performing backpropagation on the loss (```loss.backward()```).
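The reason for zeroing gradients is that PyTorch adds each new gradient to whatever is already stored in ```.grad```. The sketch below shows this with a single hand-made parameter `w` (a hypothetical tensor for illustration, not part of the model above).

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(3 * w)/dw = 3
(3 * w).sum().backward()
first_grad = w.grad.clone()

# Second backward pass WITHOUT zeroing: the new gradient is added to the old one
(3 * w).sum().backward()
accumulated_grad = w.grad.clone()

# Zero the gradient, then run backward again: back to the single-step value
w.grad.zero_()
(3 * w).sum().backward()
fresh_grad = w.grad.clone()

print(first_grad, accumulated_grad, fresh_grad)
# tensor([3.]) tensor([6.]) tensor([3.])
```

In a training loop, ```optimizer.zero_grad()``` does this reset for every parameter the optimizer manages.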

#### PyTorch testing loop

|Number|Step name|What does it do?|Code example|
|------|---------|----------------|------------|
|1|Forward pass|The model goes through all of the testing data once, performing its ```forward()``` function calculations.|```model(x_test)```|
|2|Calculate the loss|The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are.|```loss = loss_fn(y_pred, y_test)```|
|3|Calculate evaluation metrics (optional)|Alongside the loss value you may want to calculate other evaluation metrics such as accuracy on the test set.|Custom functions|

Notice the testing loop **doesn't perform backpropagation** (```loss.backward()```) or **step the optimizer** (```optimizer.step()```). No parameters in the model change during testing because they were already learned during training. For testing, we're only interested in the output of the forward pass through the model.

Let's put all of the above together and train our model for 100 epochs (passes through the training data), logging the train and test loss every 10 epochs.

In [None]:
torch.manual_seed(42)

# Set the number of epochs (how many times the model will pass over the training data)
epochs = 100

# Create empty loss lists to track values
train_loss_values = []
test_loss_values = []
epoch_count = []

for epoch in range(epochs):
    ### Training

    # Put model in training mode (this is the default state of a model)
    model_0.train()
    
    # 1. Forward pass on train data using forward() method inside
    y_pred = model_0(X_train)
    
    # 2. Calculate the loss 
    loss = loss_fn(y_pred, y_train)
    
    # 3. Zero grad of the optimizer
    optimizer.zero_grad()
    
    # 4. Loss backwards
    loss.backward()
    
    # 5. Progress the optimizer
    optimizer.step()

    ### Testing

    # Put the model in evaluation mode
    model_0.eval()

    with torch.inference_mode():
        # 1. Forward pass on test data
        test_pred = model_0(X_test)

        # 2. Calculate loss on test data
        test_loss = loss_fn(test_pred, y_test.type(torch.float))
        
        # Print out what's happening
        if epoch % 10 == 0:
            epoch_count.append(epoch)
            train_loss_values.append(loss.detach().numpy())
            test_loss_values.append(test_loss.detach().numpy())
            print(f"Epoch: {epoch} | MAE Train Loss: {loss} | MAE Test Loss: {test_loss} ")

Epoch: 0 | MAE Train Loss: 0.31288138031959534 | MAE Test Loss: 0.48106518387794495 
Epoch: 10 | MAE Train Loss: 0.1976713240146637 | MAE Test Loss: 0.3463551998138428 
Epoch: 20 | MAE Train Loss: 0.08908725529909134 | MAE Test Loss: 0.21729660034179688 
Epoch: 30 | MAE Train Loss: 0.053148526698350906 | MAE Test Loss: 0.14464017748832703 
Epoch: 40 | MAE Train Loss: 0.04543796554207802 | MAE Test Loss: 0.11360953003168106 
Epoch: 50 | MAE Train Loss: 0.04167863354086876 | MAE Test Loss: 0.09919948130846024 
Epoch: 60 | MAE Train Loss: 0.03818932920694351 | MAE Test Loss: 0.08886633068323135 
Epoch: 70 | MAE Train Loss: 0.03476089984178543 | MAE Test Loss: 0.0805937647819519 
Epoch: 80 | MAE Train Loss: 0.03132382780313492 | MAE Test Loss: 0.07232122868299484 
Epoch: 90 | MAE Train Loss: 0.02788739837706089 | MAE Test Loss: 0.06473556160926819 

### 4.Making predictions with a trained PyTorch model

There are 3 things to remember when making predictions with a PyTorch model:
1. Set the model in evaluation mode (```model.eval()```).
2. Make the predictions using the inference mode context manager (```with torch.inference_mode():...```).
3. All predictions should be made with objects on the same device.

In [None]:
# 1. Set the model in evaluation mode
model_0.eval()

# 2. Setup the inference mode context manager
with torch.inference_mode():
  # 3. Make sure the calculations are done with the model and data on the same device
  # in our case, we haven't setup device-agnostic code yet so our data and model are
  # on the CPU by default.
  # model_0.to(device)
  # X_test = X_test.to(device)
  y_preds = model_0(X_test)
y_preds

tensor([[0.8141],
        [0.8256],
        [0.8372],
        [0.8488],
        [0.8603],
        [0.8719],
        [0.8835],
        [0.8950],
        [0.9066],
        [0.9182]])

### 5.Saving and loading a PyTorch model

For saving and loading models in PyTorch, there are 3 main methods you should be aware of:

|PyTorch method|What does it do?|
|--------------|----------------|
|```torch.save```|Saves a serialized object to disk using Python's ```pickle``` utility. Models, tensors and various other Python objects like dictionaries can be saved using ```torch.save```.|
|```torch.load```|Uses ```pickle```'s unpickling features to deserialize and load pickled Python object files (like models, tensors or dictionaries) into memory. You can also set which device to load the object to (CPU, GPU etc).|
|```torch.nn.Module.load_state_dict```|Loads a model's parameter dictionary (```model.state_dict()```) using a saved ```state_dict()``` object.|

#### Saving a PyTorch model's ```state_dict()```

The **recommended** way for saving and loading a model for inference is by saving and loading a model's ```state_dict()```.

Let's see how we can do that in a few steps:
1. We'll **create a directory for saving models**, called ```models```, using Python's ```pathlib``` module.
2. We'll **create a file path to save the model** to.
3. We'll call ```torch.save(obj, f)``` where ```obj``` is the target model's ```state_dict()``` and ```f``` is the filename of where to save the model.

In [None]:
from pathlib import Path

# 1. Create models directory
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create model save path
MODEL_NAME = "01_pytorch_workflow_model_0.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# 3. Save the model state dict 
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_0.state_dict(), # saving the state_dict() stores only the model's learned parameters
           f=MODEL_SAVE_PATH)

Output: Saving model to: models/01_pytorch_workflow_model_0.pth

In [None]:
# Check the saved file path
!ls -l models/01_pytorch_workflow_model_0.pth

-rw-rw-r-- 1 daniel daniel 1063 Nov 10 16:07 models/01_pytorch_workflow_model_0.pth

#### Loading a saved PyTorch model's ```state_dict()```

Since we've now got a saved model ```state_dict()``` at ```models/01_pytorch_workflow_model_0.pth``` we can now load it in using ```torch.nn.Module.load_state_dict(torch.load(f))``` where ```f``` is the filepath of our saved model ```state_dict()```.

Why call ```torch.load()``` inside ```torch.nn.Module.load_state_dict()```?

Because we only saved the model's ```state_dict()``` which is a dictionary of learned parameters and not the entire model, we first have to load the ```state_dict()``` with ```torch.load()``` and then pass that ```state_dict()``` to a new instance of our model (which is a subclass of ```nn.Module```).

In [None]:
# Instantiate a new instance of our model (this will be instantiated with random weights)
loaded_model_0 = LinearRegressionModel()

# Load the state_dict of our saved model (this will update the new instance of our model with trained weights)
loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))
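One way to confirm that a load worked is to compare predictions from the original and the reloaded model on the same inputs. The sketch below does this as a self-contained round trip. The `nn.Linear` stand-in model, the temporary directory and the file name `example_model.pth` are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn

torch.manual_seed(42)

# Stand-in model (hypothetical; any nn.Module subclass works the same way)
original_model = nn.Linear(in_features=1, out_features=1)
X_sample = torch.arange(0, 1, 0.1).unsqueeze(dim=1)

with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = Path(tmp_dir) / "example_model.pth"  # hypothetical file name
    torch.save(obj=original_model.state_dict(), f=save_path)

    # A fresh instance starts with different random weights...
    restored_model = nn.Linear(in_features=1, out_features=1)
    # ...until the saved state_dict overwrites them
    restored_model.load_state_dict(torch.load(f=save_path))

original_model.eval()
restored_model.eval()
with torch.inference_mode():
    original_preds = original_model(X_sample)
    restored_preds = restored_model(X_sample)

print(torch.equal(original_preds, restored_preds))  # True
```

The same comparison applies to ```model_0``` and ```loaded_model_0``` above: identical predictions on ```X_test``` indicate the parameters were restored.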

### 6.Putting it all together

1. We'll start by **importing the standard libraries** we need.
2. Setup device (```device = "cuda" if torch.cuda.is_available() else "cpu"```)

#### 6.1 Data

1. Create some data
2. Split data into training and test sets
3. Visualize

#### 6.2 Building a PyTorch linear model

1. Make a model
2. Put the model on the GPU
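These two steps could look like the following sketch. It assumes ```nn.Linear``` is used in place of the hand-written parameters from earlier, and the names `LinearRegressionModelV2`, `linear_layer` and `model_1` are hypothetical. The model is moved to the GPU only if one is available.

```python
import torch
from torch import nn

# Device-agnostic setup: use the GPU if available, otherwise the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

class LinearRegressionModelV2(nn.Module):  # hypothetical name for this sketch
    def __init__(self):
        super().__init__()
        # nn.Linear creates the weight and bias parameters for us
        self.linear_layer = nn.Linear(in_features=1, out_features=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear_layer(x)

torch.manual_seed(42)
model_1 = LinearRegressionModelV2()

# Move the model's parameters to the target device
model_1.to(device)

print(next(model_1.parameters()).device)
```

Any data passed to the model then needs to live on the same device.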

#### 6.3 Training

1. Create a loss function and optimizer
2. Based on PyTorch training loop steps, train the model
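A compact, self-contained sketch of these steps in device-agnostic form follows. It reuses the synthetic data from the start of the notebook. The epoch count of 200, the bare ```nn.Linear``` model and the `first_train_loss` variable are assumptions for illustration.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(42)

# Same synthetic data as earlier in the notebook, moved to the target device
weight, bias = 0.7, 0.3
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = weight * X + bias
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split].to(device), y[:train_split].to(device)
X_test, y_test = X[train_split:].to(device), y[train_split:].to(device)

# Model, loss function and optimizer
model = nn.Linear(in_features=1, out_features=1).to(device)
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)

epochs = 200  # assumed value for this sketch
first_train_loss = None

for epoch in range(epochs):
    ### Training
    model.train()
    y_pred = model(X_train)          # 1. Forward pass
    loss = loss_fn(y_pred, y_train)  # 2. Calculate the loss
    optimizer.zero_grad()            # 3. Zero gradients
    loss.backward()                  # 4. Backpropagation
    optimizer.step()                 # 5. Update the parameters

    ### Testing
    model.eval()
    with torch.inference_mode():
        test_loss = loss_fn(model(X_test), y_test)

    if first_train_loss is None:
        first_train_loss = loss.item()
    if epoch % 50 == 0:
        print(f"Epoch: {epoch} | Train loss: {loss.item():.4f} | Test loss: {test_loss.item():.4f}")
```

Because the data is moved to ```device``` before the loop, the forward passes never mix CPU and GPU tensors.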

#### 6.4 Making predictions

We've got a trained model, so let's turn on its evaluation mode and make some predictions.

#### 6.5 Saving and loading a model

