# 2. Training Our First Neural Network with PyTorch

To train a neural network in PyTorch, you will first need to understand the job of a loss function. You will then realize that training a network requires minimizing that loss function, which is done by calculating gradients. You will learn how to use these gradients to update your model's parameters, and finally, you will write your first training loop.

### Preparing the environment

In [1]:
# Importing libraries
import expectexception

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.nn import CrossEntropyLoss
from torch.utils.data import TensorDataset, DataLoader

In [2]:
# Global variables
SEED = 42

## 2.1 Running a forward pass

### Binary classification: forward pass

In [3]:
# Create input data of shape 5x6
input_data = torch.tensor([
    [-0.4421, 1.5207, 2.0607, -0.3647, 0.4691, 0.0946],
    [-0.9155, -0.0475, -1.3645, 0.6336, -1.9520, -0.3398],
    [ 0.7406, 1.6763, -0.8511, 0.2432, 0.1123, -0.0633],
    [-1.6630, -0.0718, -0.1285, 0.5396, -0.0288, -0.8622],
    [-0.7413, 1.7920, -0.0883, -0.6685, 0.4745, -0.4245]
])

# Create binary classification model
model = nn.Sequential(
    nn.Linear(6, 4), # First linear layer
    nn.Linear(4, 1), # Second linear layer
    nn.Sigmoid() # Sigmoid activation function
)

# Pass input data through model
output = model(input_data)
print(output)

tensor([[0.4032],
        [0.3973],
        [0.3619],
        [0.3240],
        [0.3497]], grad_fn=<SigmoidBackward0>)


### Multi-class classification: forward pass

In [4]:
# Create input data of shape 5x6
input_data = torch.tensor([
    [-0.4421, 1.5207, 2.0607, -0.3647, 0.4691, 0.0946],
    [-0.9155, -0.0475, -1.3645, 0.6336, -1.9520, -0.3398],
    [ 0.7406, 1.6763, -0.8511, 0.2432, 0.1123, -0.0633],
    [-1.6630, -0.0718, -0.1285, 0.5396, -0.0288, -0.8622],
    [-0.7413, 1.7920, -0.0883, -0.6685, 0.4745, -0.4245]
])

# Specify model has three classes
n_classes = 3

# Create multiclass classification model
model = nn.Sequential(
    nn.Linear(6, 4), # First linear layer
    nn.Linear(4, n_classes), # Second linear layer
    nn.Softmax(dim=-1) # Softmax activation
)

# Pass input data through model
output = model(input_data)
print(output)
print(output.shape)

tensor([[0.1036, 0.4562, 0.4402],
        [0.3329, 0.4356, 0.2315],
        [0.1878, 0.6083, 0.2038],
        [0.1488, 0.4402, 0.4109],
        [0.1261, 0.5277, 0.3462]], grad_fn=<SoftmaxBackward0>)
torch.Size([5, 3])


### Ex.1 - Building a binary classifier in PyTorch
Recall that a small neural network with a single linear layer followed by a sigmoid function is a binary classifier. It acts just like a logistic regression.

In this exercise, you'll practice building this small network and interpreting the output of the classifier.

The torch package and the torch.nn package have already been imported for you.

**Instructions**

1. Create a neural network that takes a tensor of dimensions 1x8 as input, and returns an output of the correct shape for binary classification.
2. Pass the output of the linear layer to a sigmoid, which both takes in and return a single float.

In [5]:
import torch
import torch.nn as nn

input_tensor = torch.Tensor([[3, 4, 6, 2, 3, 6, 8, 9]])

# Implement a small neural network for binary classification
model = nn.Sequential(
  nn.Linear(8, 1),
  nn.Sigmoid()
)

output = model(input_tensor)
print(output)

tensor([[0.6407]], grad_fn=<SigmoidBackward0>)


### Ex.2 - From regression to multi-class classification
Recall that the models we have seen for binary classification, multi-class classification and regression have all been similar, barring a few tweaks to the model.

In this exercise, you'll start by building a model for regression, and then tweak the model to perform a multi-class classification.

**Instructions:**

1. Create a neural network with exactly four linear layers, which takes the input tensor as input, and outputs a regression value, using any shapes you like for the hidden layers.
2. A similar neural network to the one you just built is provided, containing four linear layers; update this network to perform a multi-class classification with four outputs.

In [6]:
# Create a neural network with exactly four linear layers, which takes the input tensor as input,
# and outputs a regression value, using any shapes you like for the hidden layers.
input_tensor = torch.Tensor([[3, 4, 6, 7, 10, 12, 2, 3, 6, 8, 9]])

# Implement a neural network with exactly four linear layers
model = nn.Sequential(
    nn.Linear(11, 20),
    nn.Linear(20, 12),
    nn.Linear(12, 8),
    nn.Linear(8, 1)
)

output = model(input_tensor)
print(output)

tensor([[0.1565]], grad_fn=<AddmmBackward0>)


In [7]:
# A similar neural network to the one you just built is provided, containing four linear layers;
# update this network to perform a multi-class classification with four outputs.
input_tensor = torch.Tensor([[3, 4, 6, 7, 10, 12, 2, 3, 6, 8, 9]])

# Update network below to perform a multi-class classification with four labels
model = nn.Sequential(
    nn.Linear(11, 20),
    nn.Linear(20, 12),
    nn.Linear(12, 6),
    nn.Linear(6, 4), 
    nn.Softmax(dim=-1)
)

output = model(input_tensor)
print(output)

tensor([[0.2372, 0.2076, 0.3518, 0.2034]], grad_fn=<SoftmaxBackward0>)


## 2.2 Using loss functions to assess model predictions

### One-hot encoding concepts

In [8]:
one_hot_numpy = np.array([1, 0, 0])
one_hot_numpy

array([1, 0, 0])

### Transforming labels with one-hot encoding

In [9]:
F.one_hot(torch.tensor(0), num_classes=3)

tensor([1, 0, 0])

In [10]:
F.one_hot(torch.tensor(1), num_classes=3)

tensor([0, 1, 0])

In [11]:
F.one_hot(torch.tensor(2), num_classes=3)

tensor([0, 0, 1])

### Cross entropy loss in PyTorch

In [12]:
%%expect_exception RuntimeError

# Integers are not allowed in `CrossEntropyLoss`
scores = torch.tensor([[-0.1211, 0.1059]])
one_hot_target = torch.tensor([[1, 0]])

criterion = CrossEntropyLoss()
criterion(scores, one_hot_target)

[1;31m---------------------------------------------------------------------------[0m
[1;31mRuntimeError[0m                              Traceback (most recent call last)
Cell [1;32mIn[12], line 6[0m
[0;32m      3[0m one_hot_target [38;5;241m=[39m torch[38;5;241m.[39mtensor([[[38;5;241m1[39m, [38;5;241m0[39m]])
[0;32m      5[0m criterion [38;5;241m=[39m CrossEntropyLoss()
[1;32m----> 6[0m [43mcriterion[49m[43m([49m[43mscores[49m[43m,[49m[43m [49m[43mone_hot_target[49m[43m)[49m

File [1;32mC:\ProgramData\anaconda3\envs\deep\lib\site-packages\torch\nn\modules\module.py:1553[0m, in [0;36mModule._wrapped_call_impl[1;34m(self, *args, **kwargs)[0m
[0;32m   1551[0m     [38;5;28;01mreturn[39;00m [38;5;28mself[39m[38;5;241m.[39m_compiled_call_impl([38;5;241m*[39margs, [38;5;241m*[39m[38;5;241m*[39mkwargs)  [38;5;66;03m# type: ignore[misc][39;00m
[0;32m   1552[0m [38;5;28;01melse[39;00m:
[1;32m-> 1553[0m     [38;5;28;01mreturn[39;00

In [13]:
# Casting all values to double
scores = torch.tensor([[-0.1211, 0.1059]])
one_hot_target = torch.tensor([[1, 0]])

criterion = CrossEntropyLoss()
criterion(scores.double(), one_hot_target.double())

tensor(0.8131, dtype=torch.float64)

In [14]:
# From the beginning only float values
scores = torch.tensor([[-0.1211, 0.1059]])
one_hot_target = torch.tensor([[1.0, 0.0]])

criterion = CrossEntropyLoss()
criterion(scores, one_hot_target)

tensor(0.8131)

### Ex.3 - Creating one-hot encoded labels

One-hot encoding is a technique that turns a single integer label into a vector of N elements, where N is the number of classes in your dataset. This vector only contains zeros and ones. In this exercise, you'll create the one-hot encoded vector of the label y provided.

You'll practice doing this manually, and then make your life easier by leveraging the help of PyTorch! Your dataset contains three classes.

NumPy is already imported as np, and torch.nn.functional as F. The torch package is also imported.

**Instructions**

1. Manually create a one-hot encoded vector of the ground truth label y by filling in the NumPy array provided.
2. Create a one-hot encoded vector of the ground truth label y using PyTorch.

In [15]:
y = 1
num_classes = 3

# Create the one-hot encoded vector using NumPy
one_hot_numpy = np.array([0, 1, 0])
print(one_hot_numpy)

# Create the one-hot encoded vector using PyTorch
one_hot_pytorch = F.one_hot(torch.tensor(1), num_classes=3)
print(one_hot_pytorch)

[0 1 0]
tensor([0, 1, 0])


In [16]:
one_hot_pytorch == one_hot_numpy

tensor([True, True, True])

### Ex.4 - Calculating cross entropy loss

Cross entropy loss is the most used loss for classification problems. In this exercise, you will create inputs and calculate cross entropy loss in PyTorch. You are provided with the ground truth label y and a vector of scores predicted by your model.

You'll start by creating a one-hot encoded vector of the ground truth label y, which is a required step to compare y with the scores predicted by your model. Next, you'll create a cross entropy loss function. Last, you'll call the loss function, which takes scores (model predictions before the final softmax function), and the one-hot encoded ground truth label, as inputs. It outputs a single float, the loss of that sample.

torch, CrossEntropyLoss, and torch.nn.functional as F have already been imported for you.

**Instructions**
    
1. Create the one-hot encoded vector of the ground truth label y and assign it to one_hot_label.
2. Create the cross entropy loss function and store it as criterion.

In [17]:
y = [2]
scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])
print('Scores:', scores)

# Create a one-hot encoded vector of the label y
one_hot_label = F.one_hot(torch.tensor(y), num_classes=scores.shape[1])
print('One Hot Label:', one_hot_label)

Scores: tensor([[ 0.1000,  6.0000, -2.0000,  3.2000]])
One Hot Label: tensor([[0, 0, 1, 0]])


In [18]:
torch.tensor(y)

tensor([2])

In [19]:
# Create the cross entropy loss function
criterion = CrossEntropyLoss()

# Calculate the cross entropy loss
loss = criterion(scores.double(), one_hot_label.double())
print(loss)

tensor(8.0619, dtype=torch.float64)


## 2.3 Using derivatives to update model parameters

### Backpropagation in PyTorch

In [20]:
sample = torch.randn(1, 16)
target = F.one_hot(torch.tensor([0]), num_classes=2)
print('Sample:', sample)
print('Target:', target)

Sample: tensor([[ 0.8796, -1.3361, -0.4426,  0.8302, -0.4967,  1.1242,  1.1429, -0.0919,
         -0.5407,  0.1484, -0.2874,  0.2823,  0.2772, -0.2768,  0.0566,  0.0611]])
Target: tensor([[1, 0]])


In [21]:
# Reproducibility
torch.manual_seed(SEED)

# Create the model and run a forward pass
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Linear(8, 4),
    nn.Linear(4, 2)
)

prediction = model(sample)
print('Prediction:', prediction)

Prediction: tensor([[0.2020, 0.3538]], grad_fn=<AddmmBackward0>)


In [22]:
# Calculate the loss and compute the gradients
criterion = CrossEntropyLoss()
loss = criterion(prediction.double(), target.double())
print('Loss:', loss)

Loss: tensor(0.7719, dtype=torch.float64, grad_fn=<DivBackward1>)


In [23]:
loss.backward()
print('Loss:', loss)

# Access each layer's gradients
model[0].weight.grad, model[0].bias.grad

Loss: tensor(0.7719, dtype=torch.float64, grad_fn=<DivBackward1>)


(tensor([[ 0.0239, -0.0363, -0.0120,  0.0226, -0.0135,  0.0306,  0.0311, -0.0025,
          -0.0147,  0.0040, -0.0078,  0.0077,  0.0075, -0.0075,  0.0015,  0.0017],
         [ 0.0241, -0.0365, -0.0121,  0.0227, -0.0136,  0.0307,  0.0313, -0.0025,
          -0.0148,  0.0041, -0.0079,  0.0077,  0.0076, -0.0076,  0.0015,  0.0017],
         [ 0.0912, -0.1385, -0.0459,  0.0861, -0.0515,  0.1165,  0.1185, -0.0095,
          -0.0560,  0.0154, -0.0298,  0.0293,  0.0287, -0.0287,  0.0059,  0.0063],
         [ 0.0268, -0.0407, -0.0135,  0.0253, -0.0151,  0.0342,  0.0348, -0.0028,
          -0.0165,  0.0045, -0.0088,  0.0086,  0.0084, -0.0084,  0.0017,  0.0019],
         [ 0.0432, -0.0656, -0.0217,  0.0408, -0.0244,  0.0552,  0.0561, -0.0045,
          -0.0266,  0.0073, -0.0141,  0.0139,  0.0136, -0.0136,  0.0028,  0.0030],
         [ 0.0960, -0.1459, -0.0483,  0.0906, -0.0542,  0.1228,  0.1248, -0.0100,
          -0.0590,  0.0162, -0.0314,  0.0308,  0.0303, -0.0302,  0.0062,  0.0067],
         [

In [24]:
weight = model[0].weight
weight.grad

tensor([[ 0.0239, -0.0363, -0.0120,  0.0226, -0.0135,  0.0306,  0.0311, -0.0025,
         -0.0147,  0.0040, -0.0078,  0.0077,  0.0075, -0.0075,  0.0015,  0.0017],
        [ 0.0241, -0.0365, -0.0121,  0.0227, -0.0136,  0.0307,  0.0313, -0.0025,
         -0.0148,  0.0041, -0.0079,  0.0077,  0.0076, -0.0076,  0.0015,  0.0017],
        [ 0.0912, -0.1385, -0.0459,  0.0861, -0.0515,  0.1165,  0.1185, -0.0095,
         -0.0560,  0.0154, -0.0298,  0.0293,  0.0287, -0.0287,  0.0059,  0.0063],
        [ 0.0268, -0.0407, -0.0135,  0.0253, -0.0151,  0.0342,  0.0348, -0.0028,
         -0.0165,  0.0045, -0.0088,  0.0086,  0.0084, -0.0084,  0.0017,  0.0019],
        [ 0.0432, -0.0656, -0.0217,  0.0408, -0.0244,  0.0552,  0.0561, -0.0045,
         -0.0266,  0.0073, -0.0141,  0.0139,  0.0136, -0.0136,  0.0028,  0.0030],
        [ 0.0960, -0.1459, -0.0483,  0.0906, -0.0542,  0.1228,  0.1248, -0.0100,
         -0.0590,  0.0162, -0.0314,  0.0308,  0.0303, -0.0302,  0.0062,  0.0067],
        [-0.0087,  0.0

In [25]:
model[1].weight.grad, model[1].bias.grad

(tensor([[-0.0349, -0.0738, -0.2834, -0.2535,  0.1177, -0.0723, -0.0695,  0.0597],
         [ 0.0163,  0.0346,  0.1329,  0.1189, -0.0552,  0.0339,  0.0326, -0.0280],
         [ 0.0359,  0.0760,  0.2919,  0.2611, -0.1212,  0.0745,  0.0716, -0.0615],
         [-0.0160, -0.0339, -0.1300, -0.1163,  0.0540, -0.0332, -0.0319,  0.0274]]),
 tensor([-0.3511,  0.1646,  0.3616, -0.1611]))

In [26]:
model[2].weight.grad, model[2].bias.grad

(tensor([[ 0.0043, -0.0242,  0.0798,  0.1621],
         [-0.0043,  0.0242, -0.0798, -0.1621]]),
 tensor([-0.5379,  0.5379]))

### Updating model parameters

In [27]:
# Learning rate is typically small
lr = 0.001

# Update the weights
weight = model[0].weight
weight_grad = model[0].weight.grad
weight = weight - lr * weight_grad

# Update the biases
bias = model[0].bias
bias_grad = model[0].bias.grad
bias = bias - lr * bias_grad

print('Weight:', weight)
print('Bias:', bias)

Weight: tensor([[ 0.1911,  0.2075, -0.0586,  0.2296, -0.0548,  0.0504, -0.1217,  0.1468,
          0.2204, -0.1834,  0.2173,  0.0468,  0.1847,  0.0339,  0.1205, -0.0353],
        [ 0.1927,  0.0370, -0.1167,  0.0637, -0.1152, -0.0293, -0.1016,  0.1658,
         -0.1973, -0.1153, -0.0706, -0.1503,  0.0236, -0.2469,  0.2258, -0.2124],
        [ 0.1929,  0.0417, -0.0811,  0.1544,  0.0390,  0.2019,  0.0272, -0.0788,
          0.0672, -0.0678,  0.1052,  0.2232,  0.1445, -0.1093,  0.1443,  0.0447],
        [ 0.1269, -0.1523, -0.2475, -0.0966, -0.1917,  0.2051,  0.0720,  0.1036,
          0.0791, -0.0044,  0.1957, -0.1776,  0.0157, -0.1706,  0.0771, -0.0861],
        [ 0.0766, -0.0520,  0.2074, -0.1482, -0.1491, -0.1492,  0.2248,  0.0833,
          0.2406, -0.2063, -0.2480, -0.1956, -0.1682,  0.1013,  0.0895,  0.2077],
        [-0.1292, -0.1703,  0.1327, -0.1011,  0.1518, -0.0594,  0.1429, -0.1942,
         -0.1261,  0.0762,  0.0529, -0.0638,  0.1490,  0.1700, -0.1813, -0.1335],
        [ 0.22

### Gradient descent

In [28]:
# Create the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
optimizer

SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    momentum: 0
    nesterov: False
    weight_decay: 0
)

In [29]:
# Optimizer handles updating model parameters (or weights) after calculation of local
# gradients
optimizer.step()

### Ex.5 - Estimating a sample
In previous exercises, you used linear layers to build networks.

Recall that the operation performed by `nn.Linear()` is to take an input $ X $ and apply the transformation $ W * X + b $, where $ W $ and $ b $ are two tensors (called the weight and bias).

A critical part of training PyTorch models is to calculate gradients of the weight and bias tensors with respect to a loss function.

In this exercise, you will calculate weight and bias tensor gradients using cross entropy loss and a sample of data.

The following tensors are provided:

- weight: a $ 2 \times 9 $ -element tensor
- bias: a $ 2 $ -element tensor
- preds: a $ 1 \times 2 $ -element tensor containing the model predictions
- target: a $ 1 \times 2 $ -element one-hot encoded tensor containing the ground-truth label

Note: If you see the error RuntimeError: Trying to backward through the graph a second time, please reload the page and try again.

**Instructions**

1. Use the criterion you have defined to calculate the loss value with respect to the predictions and target values.
2. Compute the gradients of the cross entropy loss.
3. Display the gradients of the weight and bias tensors, in that order.

In [30]:
# Model declaration
sample = torch.randn(1, 9)
target = F.one_hot(torch.tensor([0]), num_classes=2)
print('Sample:', sample)
print('Target:', target)

# Reproducibility
torch.manual_seed(SEED)

# Create the model and run a forward pass
# del model
model = nn.Sequential(
    nn.Linear(9, 2),
    nn.Linear(2, 2)
)

preds = model(sample)
print('Preds:', preds)

Sample: tensor([[-0.0662, -0.4235, -2.3768,  0.0641, -0.3435,  1.2287, -0.2754, -0.2109,
          0.9287]])
Target: tensor([[1, 0]])
Preds: tensor([[-0.6585, -0.3968]], grad_fn=<AddmmBackward0>)


In [31]:
# Getting the weight and bias
weight = model[0].weight
bias = model[0].bias

print('Weight:', weight)
print('Bias:', bias)

Weight: Parameter containing:
tensor([[ 0.2548,  0.2767, -0.0781,  0.3062, -0.0730,  0.0673, -0.1623,  0.1958,
          0.2938],
        [-0.2445,  0.2897,  0.0624,  0.2463,  0.0451,  0.1607, -0.0471,  0.2570,
          0.0493]], requires_grad=True)
Bias: Parameter containing:
tensor([-0.1556,  0.0850], requires_grad=True)


In [32]:
# Calculate the loss and compute the gradients
criterion = CrossEntropyLoss()

# Calculate the loss
loss = criterion(preds.double(), target.double())
print('Loss:', loss)

Loss: tensor(0.8325, dtype=torch.float64, grad_fn=<DivBackward1>)


In [33]:
# Compute the gradients of the loss
loss.backward()

# Display gradients of the weight and bias tensors in order
print(model[0].weight.grad)
print(model[0].bias.grad)

tensor([[-0.0014, -0.0092, -0.0518,  0.0014, -0.0075,  0.0268, -0.0060, -0.0046,
          0.0203],
        [-0.0207, -0.1321, -0.7414,  0.0200, -0.1071,  0.3832, -0.0859, -0.0658,
          0.2897]])
tensor([0.0218, 0.3119])


### Ex.6 - Accessing the model parameters
A PyTorch model created with the nn.Sequential() is a module that contains the different layers of your network. Recall that each layer parameter can be accessed by indexing the created model directly. In this exercise, you will practice accessing the parameters of different linear layers of a neural network. You won't be accessing the sigmoid.

**Instructions**

1. Access the weight parameter of the first linear layer.
2. Access the bias parameter of the second linear layer.

In [34]:
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Sigmoid(),
    nn.Linear(8, 2)
)

# Access the weight of the first linear layer
weight_0 = model[0].weight

# Access the bias of the second linear layer
bias_1 = model[2].bias

print('Weight of first layer:', weight_0)
print('Bias of second layer:', bias_1)

Weight of first layer: Parameter containing:
tensor([[-0.0706, -0.1503,  0.0236, -0.2469,  0.2258, -0.2124,  0.1930,  0.0416,
         -0.0812,  0.1545,  0.0390,  0.2020,  0.0273, -0.0788,  0.0672, -0.0678],
        [ 0.1052,  0.2232,  0.1445, -0.1093,  0.1443,  0.0447,  0.1270, -0.1524,
         -0.2475, -0.0966, -0.1918,  0.2051,  0.0720,  0.1036,  0.0791, -0.0043],
        [ 0.1957, -0.1776,  0.0157, -0.1706,  0.0771, -0.0861,  0.0766, -0.0521,
          0.2073, -0.1482, -0.1491, -0.1491,  0.2249,  0.0833,  0.2406, -0.2063],
        [-0.2480, -0.1956, -0.1682,  0.1013,  0.0895,  0.2077, -0.1291, -0.1704,
          0.1326, -0.1011,  0.1517, -0.0593,  0.1430, -0.1942, -0.1262,  0.0762],
        [ 0.0529, -0.0637,  0.1490,  0.1700, -0.1813, -0.1335,  0.2289, -0.0844,
         -0.0886, -0.2419, -0.1432,  0.0625, -0.0330, -0.1815,  0.0059, -0.1708],
        [-0.2121, -0.1377, -0.2188, -0.1592,  0.2499,  0.0472,  0.0770, -0.2332,
         -0.1642, -0.0832,  0.0391, -0.2200, -0.1077, -0.14

### Ex.7 - Updating the weights manually

Now that you know how to access weights and biases, you will manually perform the job of the PyTorch optimizer. PyTorch functions can do what you're about to do, but it's helpful to do the work manually at least once, to understand what's going on under the hood.

A neural network of three layers has been created and stored as the model variable. This network has been used for a forward pass and the loss and its derivatives have been calculated. A default learning rate, lr, has been chosen to scale the gradients when performing the update.

**Instructions**
1. Create the gradient variables by accessing the local gradients of each weight tensor.
2. Update the weights using the gradients scaled by the learning rate.

In [35]:
lr = 0.001

sample = torch.tensor([[
    -1.1258, -1.1524, -0.2506, -0.4339,  0.8487,  0.6920, -0.3160, -2.1152,
    0.3223, -1.2633,  0.3500,  0.3081,  0.1198,  1.2377,  1.1168, -0.2473
]])

target = torch.tensor([[1., 0.]])

print('Lr:', lr)
print('Sample:', sample)
print('Target:', target)

Lr: 0.001
Sample: tensor([[-1.1258, -1.1524, -0.2506, -0.4339,  0.8487,  0.6920, -0.3160, -2.1152,
          0.3223, -1.2633,  0.3500,  0.3081,  0.1198,  1.2377,  1.1168, -0.2473]])
Target: tensor([[1., 0.]])


In [36]:
# Reproducibility
torch.manual_seed(SEED)

# Setting the model
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Linear(8, 4),
    nn.Linear(4, 2)
)
print(model)

# Predictions
preds = model(sample)
print('Preds:', preds)

Sequential(
  (0): Linear(in_features=16, out_features=8, bias=True)
  (1): Linear(in_features=8, out_features=4, bias=True)
  (2): Linear(in_features=4, out_features=2, bias=True)
)
Preds: tensor([[0.1740, 0.2616]], grad_fn=<AddmmBackward0>)


In [37]:
# Calculate the loss and compute the gradients
criterion = CrossEntropyLoss()
loss = criterion(preds.double(), target.double())
print('Loss:', loss)

Loss: tensor(0.7379, dtype=torch.float64, grad_fn=<DivBackward1>)


In [38]:
# Compute the gradients of the loss
loss.backward()

weight0 = model[0].weight
weight1 = model[1].weight
weight2 = model[2].weight

# Access the gradients of the weight of each linear layer
grads0 = model[0].weight.grad
grads1 = model[1].weight.grad
grads2 = model[2].weight.grad

In [39]:
# Update the weights using the learning rate and the gradients
weight0 = weight0 - lr * grads0
weight1 = weight1 - lr * grads1
weight2 = weight2 - lr * grads2
print(weight0)

tensor([[ 0.1912,  0.2075, -0.0586,  0.2297, -0.0548,  0.0504, -0.1217,  0.1469,
          0.2204, -0.1834,  0.2173,  0.0468,  0.1847,  0.0338,  0.1205, -0.0353],
        [ 0.1928,  0.0370, -0.1167,  0.0637, -0.1152, -0.0293, -0.1015,  0.1659,
         -0.1974, -0.1152, -0.0706, -0.1503,  0.0236, -0.2470,  0.2257, -0.2124],
        [ 0.1931,  0.0417, -0.0812,  0.1545,  0.0389,  0.2019,  0.0274, -0.0786,
          0.0671, -0.0677,  0.1052,  0.2232,  0.1445, -0.1094,  0.1442,  0.0448],
        [ 0.1270, -0.1523, -0.2475, -0.0966, -0.1918,  0.2051,  0.0720,  0.1036,
          0.0791, -0.0043,  0.1956, -0.1776,  0.0157, -0.1707,  0.0771, -0.0861],
        [ 0.0767, -0.0520,  0.2074, -0.1482, -0.1491, -0.1491,  0.2249,  0.0834,
          0.2405, -0.2063, -0.2480, -0.1956, -0.1682,  0.1012,  0.0895,  0.2077],
        [-0.1290, -0.1703,  0.1327, -0.1010,  0.1516, -0.0594,  0.1430, -0.1940,
         -0.1262,  0.0764,  0.0528, -0.0638,  0.1490,  0.1698, -0.1814, -0.1334],
        [ 0.2289, -0.0

### Ex.8 - Using the PyTorch optimizer

In the previous exercise, you manually updated the weight of a network. You now know what's going on under the hood, but this approach is not scalable to a network of many layers.

Thankfully, the PyTorch SGD optimizer does a similar job in a handful of lines of code. In this exercise, you will practice the last step to complete the training loop: updating the weights using a PyTorch optimizer.

A neural network has been created and provided as the model variable. This model was used to run a forward pass and create the tensor of predictions pred. The one-hot encoded tensor is named target and the cross entropy loss function is stored as criterion.

**Instructions**

1. Use optim to create an SGD optimizer with a learning rate of your choice (must be less than one) for the model provided.
2. Update the model's parameters using the optimizer.

In [40]:
sample = torch.tensor([[
    -0.3165, -0.3995, -0.4551, -0.5769,  0.2253, -0.8436,  0.6609, -1.3375,
    -0.3312,  0.2476, -0.0099,  1.3701,  0.5060, -0.6079,  0.0933, -0.0922
]])

target = torch.tensor([[1., 0.]])

Lr = 0.001

print('Lr:', lr)
print('Sample:', sample)
print('Target:', target)

Lr: 0.001
Sample: tensor([[-0.3165, -0.3995, -0.4551, -0.5769,  0.2253, -0.8436,  0.6609, -1.3375,
         -0.3312,  0.2476, -0.0099,  1.3701,  0.5060, -0.6079,  0.0933, -0.0922]])
Target: tensor([[1., 0.]])


In [41]:
# Reproducibility
torch.manual_seed(SEED)

# Setting the model
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Linear(8, 4),
    nn.Linear(4, 2)
)
print(model)

# Predictions
preds = model(sample)
print('Preds:', preds)

Sequential(
  (0): Linear(in_features=16, out_features=8, bias=True)
  (1): Linear(in_features=8, out_features=4, bias=True)
  (2): Linear(in_features=4, out_features=2, bias=True)
)
Preds: tensor([[0.2134, 0.2295]], grad_fn=<AddmmBackward0>)


In [42]:
# Calculate the loss and compute the gradients
criterion = CrossEntropyLoss()
loss = criterion(preds.double(), target.double())
print('Loss:', loss)

Loss: tensor(0.7012, dtype=torch.float64, grad_fn=<DivBackward1>)


In [43]:
# Compute the gradients of the loss
loss.backward()

In [44]:
# Create the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)

In [45]:
# Update the model's parameters using the optimizer
optimizer.step()

## 2.4 Writing our first training loop

### Introducing the Data Science Salary dataset

- The target is salary in US dollars; it is not a category but a continuous quantity

In [46]:
df = pd.read_csv('data-sources/data-salaries.csv')
df = df[['experience_level', 'employment_type', 'remote_ratio',
         'company_size', 'salary_in_usd']]

# Encoding categorical variables
df['experience_level'] = df.experience_level.astype('category').cat.codes
df['employment_type'] = df.employment_type.astype('category').cat.codes
df['company_size'] = df.company_size.astype('category').cat.codes

# Normalizing numerical variables
df['remote_ratio'] = (df.remote_ratio-df.remote_ratio.min())/(df.remote_ratio.max()-df.remote_ratio.min())
df['salary_in_usd'] = (df.salary_in_usd-df.salary_in_usd.min())/(df.salary_in_usd.max()-df.salary_in_usd.min())

df.head()

Unnamed: 0,experience_level,employment_type,remote_ratio,company_size,salary_in_usd
0,0,2,0.5,0,0.102982
1,3,2,1.0,0,0.10978
2,1,2,0.0,1,0.137533
3,1,2,0.5,0,0.380363
4,0,2,1.0,2,0.20452


### Introducing the Mean Squared Error Loss

- The mean squared error loss (MSE loss) is the squared difference between the prediction and the ground truth.

In [47]:
def mean_squared_loss(prediction, target):
    return np.mean((prediction - target)**2)

target = np.array([[1., 0.]])
preds = np.array([[0.2134, 0.2295]])

# Prediction and target are float tensors
loss = mean_squared_loss(preds, target)
print('Loss:', loss)

Loss: 0.335704905


In [48]:
target = torch.tensor([[1., 0.]])
preds = torch.tensor([[0.2134, 0.2295]])

# in PyTorch
criterion = nn.MSELoss()

# Prediction and target are float tensors
loss = criterion(preds, target)
print('Loss:', loss)

Loss: tensor(0.3357)


### Before the training loop

In [49]:
# Separate into features y target
features = df.drop(columns='salary_in_usd')
target = df[['salary_in_usd']]

# Create the dataset and the dataloader
dataset = TensorDataset(torch.tensor(features.values).float(),
                        torch.tensor(target.values).float())
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

In [50]:
# Reproducibility
torch.manual_seed(SEED)

# Create the model
model = nn.Sequential(
    nn.Linear(4, 2),
    nn.Linear(2, 1)
)

# Create the loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

### The training loop

In [51]:
# Loop through the dataset multiple times
num_epochs = 10
for epoch in range(num_epochs):
    for data in dataloader:
        # Set the gradients to zero
        optimizer.zero_grad()

        # Get feature and target from the data loader
        feature, target = data

        # Run a forward pass
        pred = model(feature)

        # Compute loss and gradients
        loss = criterion(pred, target)
        loss.backward()

        # Update the parameters
        optimizer.step()

### Ex.9 - Using the MSELoss

Recall that we can't use cross-entropy loss for regression problems. The mean squared error loss (MSELoss) is a common loss function for regression problems. In this exercise, you will practice calculating and observing the loss using NumPy as well as its PyTorch implementation.

The torch package has been imported as well as numpy as np and torch.nn as nn.

**Instructions**

1. Calculate the MSELoss using NumPy.
2. Create a MSELoss function using PyTorch.
3. Convert y_hat and y to tensors and then float data types, and then use them to calculate MSELoss using PyTorch as `mse_pytorch`.

In [52]:
# Gettin y and y_hat
y_hat = np.array(10)
y = np.array(1)

# Calculate the MSELoss using NumPy
mse_numpy = np.mean((y - y_hat)**2)
print(mse_numpy)

# Create the MSELoss function
criterion = nn.MSELoss()

# Calculate the MSELoss using the created loss function
mse_pytorch = criterion(torch.tensor(y).double(),
                        torch.tensor(y_hat).double())
print(mse_pytorch)

81.0
tensor(81., dtype=torch.float64)


### Ex.10 - Writing a training loop

In scikit-learn, the whole training loop is contained in the .fit() method. In PyTorch, however, you implement the loop manually. While this provides control over loop's content, it requires a custom implementation.

You will write a training loop every time you train a deep learning model with PyTorch, which you'll practice in this exercise. The show_results() function provided will display some sample ground truth and the model predictions.

The package imports provided are: pandas as pd, torch, torch.nn as nn, torch.optim as optim, as well as DataLoader and TensorDataset from torch.utils.data.

The following variables have been created: dataloader, containing the dataloader; model, containing the neural network; criterion, containing the loss function, nn.MSELoss(); optimizer, containing the SGD optimizer; and num_epochs, containing the number of epochs.

**Instructions**

1. Write a for loop that iterates over the dataloader; this should be nested within a for loop that iterates over a range equal to the number of epochs.
2. Set the gradients of the optimizer to zero.
3. Write the forward pass.
4. Compute the MSE loss value using the criterion() function provided.
5. Compute the gradients.
6. Update the model's parameters.

In [53]:
# Preparing environment
def show_results(model, dataloader):
    model.eval()
    iter_loader = iter(dataloader)
    
    for _ in range(3):
        feature, target = next(iter_loader)
        preds = model(feature)
        
        for p, t in zip(preds, target):
            print(f'Ground truth salary: {t.item():.3f}. Predicted salary: {p.item():.3f}.')

# Dataset
df.head()

Unnamed: 0,experience_level,employment_type,remote_ratio,company_size,salary_in_usd
0,0,2,0.5,0,0.102982
1,3,2,1.0,0,0.10978
2,1,2,0.0,1,0.137533
3,1,2,0.5,0,0.380363
4,0,2,1.0,2,0.20452


In [54]:
# Separate into features y target
features_data = df.drop(columns='salary_in_usd')
target_data = df[['salary_in_usd']]

# Create the dataset and the dataloader
dataset = TensorDataset(torch.tensor(features_data.values).float(),
                        torch.tensor(target_data.values).float())
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

# Set the num of ephocs
num_epochs = 10

# Create the loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

In [55]:
# Define the model
model = nn.Sequential(
    nn.Linear(4, 2),
    nn.Sigmoid(),
    nn.Linear(2, 1)
)
model

Sequential(
  (0): Linear(in_features=4, out_features=2, bias=True)
  (1): Sigmoid()
  (2): Linear(in_features=2, out_features=1, bias=True)
)

In [56]:
# Reproducibility
torch.manual_seed(SEED)

# Loop over the number of epochs and the dataloader
for i in range(num_epochs):
    for data in dataloader:
        # Set the gradients to zero
        optimizer.zero_grad()
        
        # Run a forward pass
        feature, target = data
        prediction = model(feature)  
        
        # Calculate the loss
        loss = criterion(prediction, target) 
        
        # Compute the gradients
        loss.backward()
        
        # Update the model's parameters
        optimizer.step()
        
show_results(model, dataloader)

Ground truth salary: 0.246. Predicted salary: -0.204.
Ground truth salary: 0.297. Predicted salary: -0.186.
Ground truth salary: 0.051. Predicted salary: -0.189.
Ground truth salary: 0.106. Predicted salary: -0.179.
Ground truth salary: 0.063. Predicted salary: -0.159.
Ground truth salary: 0.075. Predicted salary: -0.197.
Ground truth salary: 0.389. Predicted salary: -0.204.
Ground truth salary: 0.246. Predicted salary: -0.192.
Ground truth salary: 0.022. Predicted salary: -0.139.
Ground truth salary: 0.287. Predicted salary: -0.204.
Ground truth salary: 0.182. Predicted salary: -0.165.
Ground truth salary: 0.157. Predicted salary: -0.197.


-----------------------------------