# PyTorch Basics
> An introduction to the basics of using the PyTorch tools alongside NumPy
>
> **Table Of Contents:**
>> 1. Using AutoGrad for Stochastic Gradient Descent
>>
>> 2. Converting Data Between PyTorch And NumPy
>>
>> 3. The Input Pipeline
>>
>> 4. Running a Pre-Trained Model
>>
>> 5. Saving and Loading Models

## Using AutoGrad for Stochastic Gradient Descent

> **What is AutoGrad?**
>> AutoGrad (automatic gradient) is PyTorch's automatic differentiation tool. It records the history of operations applied to tensors so that it can follow that history backwards and apply the chain rule to compute gradients (https://pytorch.org/docs/stable/notes/autograd)
>
> **Why is the gradient important?**
>> As written by Goodfellow et al. (ch. 4.3), deep learning involves the optimization of an objective function. The gradient gives us the direction with the highest rate of increase at a specific point. By taking a step in the opposite direction (i.e., gradient descent), we can minimize the objective function (also known as the loss function), which results in a relatively good set of parameters.
>
> **How to import AutoGrad?**
>> AutoGrad is a part of the torch module (torch.autograd), so we only need to import torch to use the AutoGrad tool

In [2]:
#Import the torch module to use AutoGrad
import torch

> **How to use AutoGrad?**
>> As the name implies, AutoGrad works automatically in the background; however, we must specify which tensors it should keep a history for (via requires_grad=True)

In [3]:
#Create tensors, specifying that their gradients will be needed later
x = torch.tensor([1.0], requires_grad=True)
w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

#Print the current tensors made
print(f"x = {x}")
print(f"w = {w}")
print(f"b = {b}")

x = tensor([1.], requires_grad=True)
w = tensor([2.], requires_grad=True)
b = tensor([3.], requires_grad=True)


>**In what way do we "change" the tensors?**
>> We can apply forward propagation by using mathematical functions, such as a linear function, to combine our tensors and get a resulting tensor. This works out well because an artificial neural network is nothing more than a mathematical function that maps our input to the output
>
> **What is Forward Propagation?**
>> Forward propagation is the process of feeding our input data into a model to get the resulting output (i.e., get the prediction made by a model)

In [4]:
#Apply forward propagation via a simple linear equation and save the result
y = w*x + b
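
To see the history that AutoGrad records, one can inspect the result of the forward pass. The sketch below assumes the same toy values as the cell above; `grad_fn` and `torch.no_grad()` are standard PyTorch features, while the names `y_tracked` and `y_untracked` are only illustrative.

```python
import torch

# Same toy tensors as in the cell above
x = torch.tensor([1.0], requires_grad=True)
w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

# Forward pass with history tracking enabled (the default)
y_tracked = w * x + b
print(y_tracked.requires_grad)  # the result is part of the recorded graph
print(y_tracked.grad_fn)        # node describing the last recorded operation

# Forward pass with tracking switched off, e.g. when only predictions are needed
with torch.no_grad():
    y_untracked = w * x + b
print(y_untracked.requires_grad)
print(y_untracked.grad_fn)
```

A tensor computed inside `torch.no_grad()` carries no history, so calling `backward()` on it would not be possible.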

> **We have some history of changes made; now what?**
>> After applying forward propagation, the output tensor has a recorded history, so we can determine the gradient by using backward propagation
>
>**What is backward propagation?**
>> Backward propagation (backpropagation) is the process of working backwards from the model's output, via the chain rule, to compute how much each parameter contributed to the error between the output and the correct label. When the error is computed on a subset of the training data (a batch), the resulting gradient lets us take a single gradient descent step in the negative gradient direction to correct the parameter values (i.e., lessen the error of the model on the training data) (https://www.youtube.com/watch?v=Ilg3gGewQ5U)
>
> **What is Stochastic Gradient Descent?**
>> Stochastic Gradient Descent is the repeated process of computing the gradient on a batch with backward propagation and taking a step, until a certain number of passes through the entire dataset (i.e., epochs) have been made (https://www.youtube.com/watch?v=Ilg3gGewQ5U)

In [5]:
#Perform backward propagation from the y tensor
y.backward()

> **How can we think of the resulting gradient from backpropagation in a multi-dimensional space?**
>> We can think of the gradient in terms of its components. Each component tells us how sensitive the output is to a small change in the corresponding variable. An example is seen below: the x component is 2 because x is multiplied by the weight (which is 2), and the bias term does not affect it because it is an added constant. So if we increase x by 1, the output increases by 2. Likewise, the w component is 1 because w is multiplied by x (which is 1), and the b component is 1 because b is added directly to the output. (https://www.youtube.com/watch?v=tIeHLnjs5U8)

In [6]:
#Print the components of the gradient after performing backward propagation
print(f"x component of the gradient: {x.grad}")
print(f"w component of the gradient: {w.grad}")
print(f"b component of the gradient: {b.grad}")

x component of the gradient: tensor([2.])
w component of the gradient: tensor([1.])
b component of the gradient: tensor([1.])
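
The printed values can be checked by hand: for y = w*x + b the partial derivatives are dy/dx = w, dy/dw = x, and dy/db = 1. The following is a minimal sketch that compares these analytic values with what AutoGrad computed (the `expected_*` names are illustrative only).

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

y = w * x + b
y.backward()

# Analytic partial derivatives of y = w*x + b
expected_dx = w.detach()          # dy/dx = w
expected_dw = x.detach()          # dy/dw = x
expected_db = torch.ones_like(b)  # dy/db = 1

# Each comparison is True when AutoGrad agrees with the hand derivation
print(torch.equal(x.grad, expected_dx))
print(torch.equal(w.grad, expected_dw))
print(torch.equal(b.grad, expected_db))
```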


> **What does this look like in a more complex example?**
>> We will walk through the same steps as above, but rely more on the built-in functions and modules that PyTorch offers

In [13]:
#First import the necessary module
import torch

#Next create a toy dataset (random tensors)
x = torch.rand(10, 3) #A set of 10 feature vectors with 3 features each
y = torch.rand(10, 2) #A set of 10 label vectors with 2 labels each

#Create a linear model: a single fully connected layer that maps the input nodes directly to the output nodes
input_layer = 3 #3 input nodes
output_layer = 2 #2 output nodes
linearModel = torch.nn.Linear(input_layer, output_layer)

#Print the current values of the weights & biases for the linear model
print("Printing model parameter values:")
print(f"w: {linearModel.weight}")
print(f"b: {linearModel.bias}\n")

#With the model defined, we can now apply forward propagation to get the current model predictions
predictions = linearModel(x)

#With the model's predictions saved, we can now compare them to the true labels and get the total loss for the model
lossFunction = torch.nn.MSELoss() #First create an MSE Loss object to act as the loss function
loss = lossFunction(predictions, y) #Then compute the loss using the Mean-Square-Error (MSE)

#Print the current loss of the model
print(f"Current model loss: {loss}\n")

#With the loss calculated, we can use backward propagation to compute its gradient
loss.backward()

#Print the gradients that were computed using backward propagation
print("Printing gradient of first backward propagation:")
print(f"dL/dw: {linearModel.weight.grad}")
print(f"dL/db: {linearModel.bias.grad}\n")

#The gradient points in the direction of steepest increase of the loss, so we step in the opposite (negative) direction in our parameter space
step_size = 0.01
linearModel.weight.data -= step_size * linearModel.weight.grad.data
linearModel.bias.data -= step_size * linearModel.bias.grad.data

#With the weights updated, we can recompute the loss using the improved parameters
predictions = linearModel(x)
loss = lossFunction(predictions, y)
print(f"Loss after 1 Stochastic Gradient Descent step: {loss}")

Printing model parameter values:
w: Parameter containing:
tensor([[-0.4801,  0.0398, -0.5144],
        [ 0.2996, -0.0121,  0.4330]], requires_grad=True)
b: Parameter containing:
tensor([-0.0180,  0.2725], requires_grad=True)

Current model loss: 0.7636953592300415

Printing gradient of first backward propagation:
dL/dw: tensor([[-0.6175, -0.5759, -0.5153],
        [ 0.1304,  0.0595,  0.0957]])
dL/db: tensor([-1.1256,  0.1807])

Loss after 1 Stochastic Gradient Descent step: 0.740820050239563


>**Is there a simpler way to perform a single stochastic gradient descent (SGD) step?**
>> Yes, by adding one thing: an optimizer object. PyTorch provides an SGD optimizer that updates the parameters once the gradient has been computed

In [16]:
"""
#The gradient points in the direction of steepest increase of the loss, so we step in the opposite (negative) direction in our parameter space
step_size = 0.01
linearModel.weight.data -= step_size * linearModel.weight.grad.data
linearModel.bias.data -= step_size * linearModel.bias.grad.data
"""
#=========================
#Replace the code above with the code below to utilize the SGD optimizer
#=========================

#Define an optimizer for the model parameters using Stochastic Gradient Descent (SGD)
learning_rate = 0.01 #Set how quickly the model will learn (i.e., the factor multiplied with the gradient when updating the parameters)
optimizer = torch.optim.SGD(linearModel.parameters(), lr=learning_rate)

#Perform a single step of the SGD optimizer (uses the gradients stored by loss.backward())
optimizer.step()
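
Putting the pieces together, repeated SGD steps are commonly written as a loop. The sketch below is one assumed way to do this and is not part of the original notebook; it reuses the toy setup from above, and the seed, the step count, and the names `num_steps` and `losses` are arbitrary choices. It calls `optimizer.zero_grad()` before each backward pass because PyTorch accumulates gradients in `.grad` across `backward()` calls by default.

```python
import torch

torch.manual_seed(0)  # arbitrary seed so the run is repeatable

# Toy dataset and model, as in the earlier example
x = torch.rand(10, 3)
y = torch.rand(10, 2)
linearModel = torch.nn.Linear(3, 2)
lossFunction = torch.nn.MSELoss()
optimizer = torch.optim.SGD(linearModel.parameters(), lr=0.01)

num_steps = 100  # hypothetical number of update steps
losses = []
for step in range(num_steps):
    optimizer.zero_grad()                # clear gradients left over from the previous step
    predictions = linearModel(x)         # forward propagation
    loss = lossFunction(predictions, y)  # compare predictions with the labels
    loss.backward()                      # backward propagation fills .grad
    optimizer.step()                     # update parameters using the stored gradients
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Because the toy dataset is tiny, each step here uses all 10 samples; with a DataLoader, each step would typically use one mini-batch instead.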

## Converting Data Between PyTorch And NumPy

> **How can we input data to use in PyTorch?**
>> To convert data into PyTorch tensors, there is an easy conversion function that converts NumPy arrays into PyTorch tensors

In [19]:
#import necessary modules
import torch
import numpy as np

#Create a python list
data = [
    [1, 2],
    [3, 4]
]

#Convert python list into a numpy array
x = np.array(data)
print(f"x:\n{x}\n")

#Convert the numpy array to a pytorch tensor
y = torch.from_numpy(x)
print(f"y:\n{y}")

x:
[[1 2]
 [3 4]]

y:
tensor([[1, 2],
        [3, 4]], dtype=torch.int32)


>**Can I convert the PyTorch tensor back into a NumPy array?**
>> Yes! The conversion from PyTorch tensors to NumPy arrays is easy, only requiring one function for each type of conversion

In [20]:
#Convert the pytorch tensor into a numpy array
z = y.numpy()
print(f"z:\n{z}")

z:
[[1 2]
 [3 4]]
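
One detail worth knowing: for CPU tensors, `torch.from_numpy` and `Tensor.numpy()` do not copy the data. The array and the tensor share the same memory, so an in-place change to one is visible in the other. A small sketch (variable names are illustrative) that also shows `torch.tensor` as a way to get an independent copy:

```python
import numpy as np
import torch

x = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(x)  # shares memory with x
copied = torch.tensor(x)      # makes an independent copy

x[0, 0] = 100                 # modify the NumPy array in place

print(shared[0, 0].item())    # reflects the change made to x
print(copied[0, 0].item())    # still holds the original value

# Models usually expect floating-point input, so a dtype conversion is often needed
as_float = shared.float()
print(as_float.dtype)
```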


## The Input Pipeline

> **What is the sequence of the Input Pipeline?**
>> 1. Download dataset
>>
>> 2. Read first data vector to ensure the data was downloaded properly
>>
>> 3. Use the prebuilt PyTorch DataLoader for queues and threads
>>
>> 4. Iterate through the data and train the model
>
> **How to download a dataset?**
>> We simply need to get the data onto our local storage. For some common datasets, PyTorch has functions for downloading them
>
> **What happens if I already downloaded the dataset?**
>> As long as the root path is the same, PyTorch should be able to detect that the files already exist and prevent a redownload of the data (https://pytorch.org/vision/main/generated/torchvision.datasets.CIFAR10.html)

In [23]:
#Import torchvision module for image dataset and image to tensor transformation
import torchvision

#Download and construct the CIFAR-10 image dataset
training_dataset = torchvision.datasets.CIFAR10(root=".", 
                                                train=True, 
                                                transform=torchvision.transforms.ToTensor(),
                                                download=True)

Files already downloaded and verified


> **How to read the data?**
>> PyTorch both downloads and constructs the dataset. The dataset is saved in a variable and can be read with normal indexing operations. One thing to keep in mind is that each element is a tuple of the features and the label, which allows the two to be easily separated

In [31]:
#Read the first data vector in the dataset
image_tensor, label = training_dataset[0]

#Print the size of the image tensor
print(f"Tensor size of image 0: {image_tensor.size()}") #can also use tensor.shape to get the size
print(f"Label of image 0: {label}")

Tensor size of image 0: torch.Size([3, 32, 32])
Label of image 0: 6


> **How to use the prebuilt DataLoader?**
>> The DataLoader is a class that handles the queues and threads for the data in a very simple way. To use it, we simply need to provide our constructed dataset and some other parameters such as the batch size and whether or not to shuffle the data

In [33]:
#Create the dataloader object from the CIFAR-10 dataset with a batch size of 64 and shuffle the order of the data
training_dataloader = torch.utils.data.DataLoader(dataset=training_dataset,
                                                  batch_size=64,
                                                  shuffle=True)

> **How to iterate through the data?**
>> This can be done using a simple for loop over the dataloader object

In [34]:
#Use a for loop to iterate over each batch in the dataloader
for images, labels in training_dataloader:
    #In the for loop, we will put the code for our model to run on each batch of images/labels in the dataset
    pass
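
To see what each iteration yields without downloading anything, the sketch below swaps in a small random `TensorDataset` shaped like CIFAR-10 images. This is an assumption for illustration only; the sizes and names such as `fake_images` are made up.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 200 random "images" of shape 3x32x32 with labels in 0..9
fake_images = torch.rand(200, 3, 32, 32)
fake_labels = torch.randint(0, 10, (200,))
fake_dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset=fake_dataset, batch_size=64, shuffle=True)

batch_sizes = []
for images, labels in loader:
    # images has shape [batch, 3, 32, 32]; labels has shape [batch]
    batch_sizes.append(images.shape[0])

print(batch_sizes)  # the last batch is smaller when the dataset size is not a multiple of 64
print(len(loader))  # number of batches per epoch
```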

> **How can we create our own dataset that is compatible with the dataloader?**
>> We can create a custom class which inherits from the Dataset class used by PyTorch. In this derived class, we then need to override a few member functions so that they suit our new dataset

In [39]:
#Template for own custom dataset class below
class CustomDataset(torch.utils.data.Dataset):
    #Define the constructor
    def __init__(self):
        #We will need to initialize the filepath/list of files for our dataset
        pass
    #Define what to do for the indexing operator
    def __getitem__(self, index):
        #We will need to read the data from the file, preprocess the data, and then return the (feature, label) tuple
        pass
    #Define the length of the dataset
    def __len__(self):
        #replace the 999 with the actual length of the dataset
        return 999

> **Once we create a custom dataset, how do we use it with the dataloader?**
>> To use the custom dataset, we can simply instantiate the class and use it just like a built-in dataset such as the CIFAR-10 dataset

In [40]:
#Instantiate the custom dataset
custom_dataset = CustomDataset()

#now use the dataset with the dataloader as you would with any other dataset
training_dataloader = torch.utils.data.DataLoader(dataset=custom_dataset,
                                                  batch_size=64,
                                                  shuffle=True)
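
The template above only becomes usable once `__getitem__` returns real data. As a concrete but hypothetical illustration, the sketch below fills in the template with an in-memory dataset (`SquaresDataset`, a made-up name) whose features are the numbers 0..size-1 and whose labels are their squares, then passes it to a DataLoader.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical in-memory dataset: feature n, label n squared."""

    def __init__(self, size):
        # A real dataset would typically store file paths here instead
        self.features = torch.arange(size, dtype=torch.float32).unsqueeze(1)
        self.labels = self.features.squeeze(1) ** 2

    def __getitem__(self, index):
        # Return one (feature, label) tuple
        return self.features[index], self.labels[index]

    def __len__(self):
        return len(self.features)


squares = SquaresDataset(size=100)
feature, label = squares[3]
print(feature, label)

loader = DataLoader(dataset=squares, batch_size=32, shuffle=True)
features_batch, labels_batch = next(iter(loader))
print(features_batch.shape, labels_batch.shape)
```

The DataLoader stacks the individual (feature, label) tuples into batch tensors, which is why `__getitem__` should return items of a consistent shape.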

## Running a Pre-Trained Model

> **How to download and load a pre-trained model?**
>> PyTorch (through torchvision) has some models built in, with a specific parameter to indicate whether you want the model to be pre-trained or not

In [41]:
#Import the torchvision module for the resnet image model
import torchvision

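#Download and load the resnet18 pre-trained model
#Note: newer torchvision releases (0.13 and later) deprecate pretrained=True in favor of the weights argument,
#e.g. weights=torchvision.models.ResNet18_Weights.DEFAULT
resnet = torchvision.models.resnet18(pretrained=True)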

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to C:\Users\mike5/.cache\torch\hub\checkpoints\resnet18-f37072fd.pth
100.0%
