# Training Our First Neural Network with PyTorch

To train a neural network in PyTorch, you first need to understand the job of a loss function. You will then see that training a network means minimizing that loss function, which is done by calculating gradients. You will learn how to use these gradients to update your model's parameters, and finally, you will write your first training loop.


## Building a binary classifier in PyTorch

Recall that a small neural network with a single linear layer followed by a sigmoid function is a binary classifier. It acts just like a logistic regression.

In this exercise, you'll practice building this small network and interpreting the output of the classifier.


Instructions:

- Create a neural network that takes a tensor of dimensions 1x8 as input, and returns an output of the correct shape for binary classification.
- Pass the output of the linear layer to a sigmoid, which both takes in and returns a single float.


In [3]:
import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

In [55]:
input_tensor = torch.Tensor([[3, 4, 6, 2, 3, 6, 8, 9]])

# Implement a small neural network for binary classification
model = nn.Sequential(
    nn.Linear(8, 1),
    nn.Sigmoid()
)

output = model(input_tensor)
print(output)

tensor([[0.6703]], grad_fn=<SigmoidBackward0>)
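
The sigmoid output can be read as the model's estimated probability of the positive class. The following is a minimal sketch of turning that probability into a class label. It assumes a decision threshold of 0.5, which is a common convention and not something the exercise specifies. Because the weights are randomly initialized, the probability differs from run to run.

```python
import torch
import torch.nn as nn

input_tensor = torch.tensor([[3.0, 4.0, 6.0, 2.0, 3.0, 6.0, 8.0, 9.0]])

model = nn.Sequential(
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

# The weights are randomly initialized, so the probability changes between runs
probability = model(input_tensor)

# Assumed threshold of 0.5: at or above it predict class 1, otherwise class 0
threshold = 0.5
predicted_class = (probability >= threshold).int()

print(probability.shape)  # torch.Size([1, 1])
print(predicted_class)
```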


## From regression to multi-class classification

Recall that the models we have seen for binary classification, multi-class classification and regression have all been similar, barring a few tweaks to the model.

In this exercise, you'll start by building a model for regression, and then tweak the model to perform a multi-class classification.


Instructions:

- Create a neural network with exactly four linear layers, which takes the input tensor as input, and outputs a regression value, using any shapes you like for the hidden layers.
- A similar neural network to the one you just built is provided, containing four linear layers; update this network to perform a multi-class classification with four outputs.


In [56]:
input_tensor = torch.Tensor([[3, 4, 6, 7, 10, 12, 2, 3, 6, 8, 9]])

# Update network below to perform a multi-class classification with four labels
model = nn.Sequential(
  nn.Linear(11, 20),
  nn.Linear(20, 12),
  nn.Linear(12, 6),
  nn.Linear(6, 4), 
  nn.Softmax(dim=-1)
)

output = model(input_tensor)
print(output)

tensor([[0.2706, 0.2010, 0.2345, 0.2939]], grad_fn=<SoftmaxBackward0>)
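
The cell above shows only the classification version. As a rough sketch of the tweak the exercise describes, the two variants can be placed side by side: the regression model ends in a single output with no activation, while the classification model ends in four outputs followed by softmax. The hidden layer sizes are the ones used above, and the names `regression_model` and `classification_model` are chosen here for illustration.

```python
import torch
import torch.nn as nn

input_tensor = torch.tensor([[3.0, 4.0, 6.0, 7.0, 10.0, 12.0, 2.0, 3.0, 6.0, 8.0, 9.0]])

# Regression: the last linear layer has one output and no activation follows it
regression_model = nn.Sequential(
    nn.Linear(11, 20),
    nn.Linear(20, 12),
    nn.Linear(12, 6),
    nn.Linear(6, 1),
)

# Multi-class classification: four outputs, then softmax over the last dimension
classification_model = nn.Sequential(
    nn.Linear(11, 20),
    nn.Linear(20, 12),
    nn.Linear(12, 6),
    nn.Linear(6, 4),
    nn.Softmax(dim=-1),
)

regression_output = regression_model(input_tensor)
class_probabilities = classification_model(input_tensor)

print(regression_output.shape)    # torch.Size([1, 1])
print(class_probabilities.sum())  # the four probabilities sum to 1 (up to rounding)
```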


## Creating one-hot encoded labels

One-hot encoding is a technique that turns a single integer label into a vector of N elements, where N is the number of classes in your dataset. This vector only contains zeros and ones. In this exercise, you'll create the one-hot encoded vector of the label `y` provided.

You'll practice doing this manually, and then make your life easier by leveraging the help of PyTorch! Your dataset contains three classes.


Instructions:

- Manually create a one-hot encoded vector of the ground truth label y by filling in the NumPy array provided.
- Create a one-hot encoded vector of the ground truth label y using PyTorch.


In [57]:
y = 1
num_classes = 3

# Create the one-hot encoded vector using NumPy
one_hot_numpy = np.array([0, 1, 0])
print(one_hot_numpy)

# Create the one-hot encoded vector using PyTorch
one_hot_pytorch = F.one_hot(torch.tensor(y), num_classes)
print(one_hot_pytorch)

[0 1 0]
tensor([0, 1, 0])
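
Writing `[0, 1, 0]` by hand only works when the label is known in advance. A small sketch of building the same vector from `y` and `num_classes` follows, along with `F.one_hot` applied to a batch of labels. The batch `labels` is made up for illustration.

```python
import numpy as np
import torch
import torch.nn.functional as F

y = 1
num_classes = 3

# NumPy: start from zeros and set the position of the label to one
one_hot_numpy = np.zeros(num_classes, dtype=int)
one_hot_numpy[y] = 1
print(one_hot_numpy)  # [0 1 0]

# PyTorch: F.one_hot also accepts a batch of labels (these labels are made up)
labels = torch.tensor([0, 2, 1])
one_hot_batch = F.one_hot(labels, num_classes=num_classes)
print(one_hot_batch)
```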


## Calculating cross entropy loss

Cross entropy loss is the most commonly used loss for classification problems. In this exercise, you will create inputs and calculate cross entropy loss in PyTorch. You are provided with the ground truth label `y` and a vector of `scores` predicted by your model.

You'll start by creating a one-hot encoded vector of the ground truth label `y`, which puts `y` in the same shape as the scores predicted by your model. Next, you'll create a cross entropy loss function. Last, you'll call the loss function, which takes `scores` (model predictions before the final softmax function) and the one-hot encoded ground truth label as inputs. It outputs a single-element tensor containing the loss for that sample.


Instructions:

- Create the one-hot encoded vector of the ground truth label `y` and assign it to `one_hot_label`.
- Create the cross entropy loss function and store it as `criterion`.
- Calculate the cross entropy loss using the `one_hot_label` vector and the `scores` vector, by calling the `criterion` you created.


In [58]:
y = [2]
scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])

# Create a one-hot encoded vector of the label y
one_hot_label = F.one_hot(torch.tensor(y), num_classes=scores.shape[1])

# Create the cross entropy loss function
criterion = nn.CrossEntropyLoss()

# Calculate the cross entropy loss
loss = criterion(scores.double(), one_hot_label.double())
print(loss)

tensor(8.0619, dtype=torch.float64)
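
To make the value above less opaque, here is a sketch that reproduces it in three ways: with the one-hot target used in the exercise, with an integer class index as the target, and by hand as the negative log of the softmax probability assigned to the true class. It assumes a PyTorch version in which `nn.CrossEntropyLoss` accepts class-probability targets (1.10 or later, as far as I know).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

y = [2]
scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])

criterion = nn.CrossEntropyLoss()

# Form used in the exercise: one-hot (class probability) targets
one_hot_label = F.one_hot(torch.tensor(y), num_classes=scores.shape[1])
loss_one_hot = criterion(scores.double(), one_hot_label.double())

# Equivalent form: integer class indices as targets
loss_index = criterion(scores.double(), torch.tensor(y))

# Manual computation: negative log of the softmax probability of the true class
log_probs = F.log_softmax(scores.double(), dim=-1)
loss_manual = -log_probs[0, y[0]]

print(loss_one_hot, loss_index, loss_manual)
```

The loss is large here because the model gives the true class (index 2) the lowest score of the four.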


## Using derivatives to update model parameters


### Estimating a sample

In previous exercises, you used linear layers to build networks.

Recall that the operation performed by `nn.Linear()` is to take an input `X` and apply the transformation `W*X+b`, where `W` and `b` are two tensors (called the weight and bias).

A critical part of training PyTorch models is to calculate gradients of the weight and bias tensors with respect to a loss function.

In this exercise, you will calculate weight and bias tensor gradients using cross entropy loss and a sample of data.

The following tensors are provided:

- `weight`: a 2x9-element tensor
- `bias`: a 2-element tensor
- `preds`: a 1x2-element tensor containing the model predictions
- `target`: a 1x2-element one-hot encoded tensor containing the ground-truth label


In [59]:
weight = torch.tensor(
    [
        [0.4490, -2.7858, -1.3348, 0.7073, -0.1589, 1.6116, 0.3382, -0.8131, 1.6632],
        [0.1648, 0.3786, -0.8212, -0.1018, 0.7969, -1.7547, -0.9105, -0.0274, 1.0297],
    ],
    requires_grad=True,
)
bias = torch.tensor([-0.0209, 1.8303], requires_grad=True)

preds = torch.tensor([[-0.2349, 2.0564]])

target = torch.tensor([[1.0, 0.0]])

Instructions:

- Use the criterion you have defined to calculate the loss value with respect to the predictions and target values.
- Compute the gradients of the cross entropy loss.
- Display the gradients of the weight and bias tensors, in that order.


In [60]:
criterion = nn.CrossEntropyLoss()

# Calculate the loss
loss = criterion(preds, target)

# Compute the gradients of the loss
loss.backward()

# Display gradients of the weight and bias tensors in order
print(weight.grad)
print(bias.grad)

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
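
The `RuntimeError` above occurs because `preds` was created as a constant tensor rather than computed from `weight` and `bias`, so autograd has no graph to differentiate through. One possible way to reproduce the intended behavior is sketched below. It assumes a hypothetical input `sample` with 9 random features, which is not part of the original exercise, so the resulting gradient values are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

weight = torch.tensor(
    [
        [0.4490, -2.7858, -1.3348, 0.7073, -0.1589, 1.6116, 0.3382, -0.8131, 1.6632],
        [0.1648, 0.3786, -0.8212, -0.1018, 0.7969, -1.7547, -0.9105, -0.0274, 1.0297],
    ],
    requires_grad=True,
)
bias = torch.tensor([-0.0209, 1.8303], requires_grad=True)
target = torch.tensor([[1.0, 0.0]])

# Hypothetical input sample with 9 features (not part of the original exercise)
sample = torch.randn(1, 9)

# Computing preds from weight and bias records the operations autograd needs
preds = sample @ weight.T + bias

criterion = nn.CrossEntropyLoss()
loss = criterion(preds, target)
loss.backward()

print(weight.grad.shape)  # torch.Size([2, 9])
print(bias.grad.shape)    # torch.Size([2])
```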

### Accessing the model parameters

A PyTorch model created with `nn.Sequential()` is a module that contains the different layers of your network. Recall that each layer's parameters can be accessed by indexing the created model directly. In this exercise, you will practice accessing the parameters of different linear layers of a neural network. You won't be accessing the sigmoid, which has no learnable parameters.


Instructions:

- Access the `weight` parameter of the first linear layer.
- Access the `bias` parameter of the second linear layer.


In [None]:
model = nn.Sequential(nn.Linear(16, 8),
                      nn.Sigmoid(),
                      nn.Linear(8, 2))

# Access the weight of the first linear layer
weight_0 = model[0].weight

# Access the bias of the second linear layer
bias_1 = model[2].bias
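
As a quick check on what indexing returns, the sketch below prints the shapes of the two accessed tensors and then lists every learnable tensor with `named_parameters()`. Note that the second linear layer sits at index 2 because the sigmoid occupies index 1.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8),
                      nn.Sigmoid(),
                      nn.Linear(8, 2))

weight_0 = model[0].weight
bias_1 = model[2].bias

# nn.Linear stores its weight as (out_features, in_features)
print(weight_0.shape)  # torch.Size([8, 16])
print(bias_1.shape)    # torch.Size([2])

# named_parameters() lists every learnable tensor; the sigmoid at index 1 has none
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
```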

### Updating the weights manually

Now that you know how to access weights and biases, you will manually perform the job of the PyTorch optimizer. PyTorch functions can do what you're about to do, but it's helpful to do the work manually at least once, to understand what's going on under the hood.

A neural network of three layers has been created and stored as the `model` variable. This network has been used for a forward pass, and the loss and its derivatives have been calculated. A default learning rate, `lr`, has been chosen to scale the gradients when performing the update.


In [61]:
model = nn.Sequential(
    nn.Linear(in_features=16, out_features=8, bias=True),
    nn.Linear(in_features=8, out_features=4, bias=True),
    nn.Linear(in_features=4, out_features=2, bias=True),
)
lr = 0.001

Instructions:

- Create the gradient variables by accessing the local gradients of each weight tensor.
- Update the weights using the gradients scaled by the learning rate.


In [62]:
weight0 = model[0].weight
weight1 = model[1].weight
weight2 = model[2].weight

# Access the gradients of the weight of each linear layer
grads0 = model[0].weight.grad
grads1 = model[1].weight.grad
grads2 = model[2].weight.grad

# Update the weights using the learning rate and the gradients
weight0 = weight0 - lr * grads0
weight1 = weight1 - lr * grads1
weight2 = weight2 - lr * grads2

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'
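
The `TypeError` above arises because `.grad` stays `None` until `backward()` has been called on a loss computed from the model. The sketch below runs that missing forward and backward pass first. It assumes a hypothetical random `sample`, a one-hot `target`, and cross entropy as the loss, none of which come from the exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Linear(8, 4),
    nn.Linear(4, 2),
)
lr = 0.001

# Hypothetical sample and one-hot target, used only to produce gradients
sample = torch.randn(1, 16)
target = torch.tensor([[1.0, 0.0]])

loss = nn.CrossEntropyLoss()(model(sample), target)
loss.backward()  # populates .grad on every parameter

before = model[0].weight.detach().clone()

# Update in place, without recording the update itself in the autograd graph
with torch.no_grad():
    for layer in model:
        layer.weight -= lr * layer.weight.grad

print(torch.equal(before, model[0].weight))
```

The in-place update inside `torch.no_grad()` is a deliberate choice: an assignment such as `weight0 = weight0 - lr * grads0` binds the name to a new tensor and leaves the parameter stored in the model unchanged.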

### Using the PyTorch optimizer

In the previous exercise, you manually updated the weights of a network. You now know what's going on under the hood, but this approach does not scale to a network of many layers.

Thankfully, the PyTorch SGD optimizer does a similar job in a handful of lines of code. In this exercise, you will practice the last step to complete the training loop: updating the weights using a PyTorch optimizer.

A neural network has been created and provided as the `model` variable. This model was used to run a forward pass and create the tensor of predictions `pred`. The one-hot encoded tensor is named `target` and the cross entropy loss function is stored as `criterion`.


In [63]:
pred = torch.tensor([[-0.4732, -0.6021]], requires_grad=True)
target = torch.tensor([[1.0, 0.0]])
criterion = nn.CrossEntropyLoss()

Instructions:

- Use `optim` to create an SGD optimizer with a learning rate of your choice (must be less than one).
- Update the model's parameters using the optimizer.


In [64]:
# Create the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)

loss = criterion(pred, target)
loss.backward()

# Update the model's parameters using the optimizer
optimizer.step()
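
In the cell above, `pred` is a standalone tensor, so `loss.backward()` does not produce gradients for the model's parameters and `optimizer.step()` has nothing to apply. A self-contained sketch of one complete optimizer step follows. It assumes a hypothetical two-layer `model` and a random `sample`, since the exercise supplies its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical model and input; the exercise provides its own
model = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 2))
sample = torch.randn(1, 16)
target = torch.tensor([[1.0, 0.0]])
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=0.001)

before = model[0].weight.detach().clone()

optimizer.zero_grad()           # clear gradients left over from earlier steps
pred = model(sample)            # forward pass through the model
loss = criterion(pred, target)
loss.backward()                 # compute gradients for the model's parameters
optimizer.step()                # plain SGD: weight <- weight - lr * grad

# Compare with the manual update rule from the previous exercise
expected = before - 0.001 * model[0].weight.grad
print(torch.allclose(model[0].weight, expected))
```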

## Writing your first training loop


### Using the MSELoss

Recall that we can't use cross-entropy loss for regression problems. The mean squared error loss (MSELoss) is a common loss function for regression problems. In this exercise, you will practice calculating and observing the loss using NumPy as well as its PyTorch implementation.

Instructions: 
- Calculate the MSELoss using NumPy.
- Create an MSELoss function using PyTorch.
- Convert `y_hat` and `y` to tensors and then float data types, and then use them to calculate MSELoss using PyTorch as `mse_pytorch`.

In [2]:
y_hat = np.array(10)
y = np.array(1)

# Calculate the MSELoss using NumPy
mse_numpy = np.mean((y_hat - y) ** 2)

# Create the MSELoss function
criterion = nn.MSELoss()

# Calculate the MSELoss using the created loss function
mse_pytorch = criterion(torch.tensor(y_hat).float(), torch.tensor(y).float())
print(mse_pytorch)

tensor(81.)
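
With a single value, the mean in "mean squared error" has nothing to average. The sketch below uses three made-up predictions and targets to show that both implementations average the squared errors over all elements.

```python
import numpy as np
import torch
import torch.nn as nn

# Made-up predictions and targets for illustration
y_hat = np.array([10.0, 2.0, 5.0])
y = np.array([1.0, 2.0, 7.0])

# Squared errors are 81, 0 and 4, so the mean is 85 / 3
mse_numpy = np.mean((y_hat - y) ** 2)

criterion = nn.MSELoss()
mse_pytorch = criterion(torch.tensor(y_hat).float(), torch.tensor(y).float())

print(mse_numpy)
print(mse_pytorch)
```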


### Writing a training loop

In scikit-learn, the whole training loop is contained in the `.fit()` method. In PyTorch, however, you implement the loop manually. While this gives you control over the loop's contents, it requires a custom implementation.

You will write a training loop every time you train a deep learning model with PyTorch, which you'll practice in this exercise. The `show_results()` function provided will display some sample ground truth values and the model's predictions.

The package imports provided are: pandas as `pd`, `torch`, `torch.nn` as `nn`, `torch.optim` as `optim`, as well as `DataLoader` and `TensorDataset` from `torch.utils.data`.

The following variables have been created: `dataloader`, containing the dataloader; `model`, containing the neural network; `criterion`, containing the loss function, `nn.MSELoss()`; `optimizer`, containing the SGD optimizer; and `num_epochs`, containing the number of epochs.

Instructions:

- Write a for loop that iterates over the dataloader; this should be nested within a for loop that iterates over a range equal to the number of epochs.
- Set the gradients of the optimizer to zero.
- Write the forward pass.
- Compute the MSE loss value using the `criterion()` function provided.
- Compute the gradients.
- Update the model's parameters.

In [None]:
# Loop over the number of epochs and the dataloader
for i in range(num_epochs):
    for data in dataloader:
        # Set the gradients to zero
        optimizer.zero_grad()
        # Run a forward pass
        feature, target = data
        prediction = model(feature)
        # Calculate the loss
        loss = criterion(prediction, target)
        # Compute the gradients
        loss.backward()
        # Update the model's parameters
        optimizer.step()

show_results(model, dataloader)
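
The loop above depends on variables provided by the exercise environment. For running it outside that environment, here is a self-contained sketch under stated assumptions: the data is synthetic, the model is a single linear layer, and the learning rate, batch size and epoch count are arbitrary choices. The helper `mean_loss()` is hypothetical and stands in for `show_results()`, whose implementation is not shown in the exercise.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (hypothetical): target = 2*x1 - 3*x2 + 1
features = torch.randn(64, 2)
targets = features @ torch.tensor([[2.0], [-3.0]]) + 1.0

dataloader = DataLoader(TensorDataset(features, targets), batch_size=8, shuffle=True)
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)
num_epochs = 20


def mean_loss():
    # Loss over the full dataset, computed without tracking gradients
    with torch.no_grad():
        return criterion(model(features), targets).item()


initial_loss = mean_loss()

for epoch in range(num_epochs):
    for feature, target in dataloader:
        optimizer.zero_grad()                  # reset gradients from the last step
        prediction = model(feature)            # forward pass
        loss = criterion(prediction, target)   # MSE loss for this batch
        loss.backward()                        # compute gradients
        optimizer.step()                       # update parameters

final_loss = mean_loss()
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```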