# Loss Functions

**Creating one-hot encoded labels**

One-hot encoding is a technique that turns a single integer label into a vector of N elements, where N is the number of classes in your dataset. This vector only contains zeros and ones. In this exercise, you'll create the one-hot encoded vector of the label y provided.

Your dataset contains three classes.

NumPy is already imported as np, and torch.nn.functional as F. The torch package is also imported.

* Manually create a one-hot encoded vector of the ground truth label y by filling in the NumPy array provided.

* Create a one-hot encoded vector of the ground truth label y using PyTorch.

In [2]:
import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F

In [3]:
y = 1
num_classes = 3

# Create the one-hot encoded vector using NumPy
one_hot_numpy = np.array([0, 1, 0])
print(one_hot_numpy)

# Create the one-hot encoded vector using PyTorch
one_hot_pytorch = F.one_hot(torch.tensor(y), num_classes=num_classes)

print(one_hot_pytorch)

[0 1 0]
tensor([0, 1, 0])
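
The NumPy vector above is typed in by hand. As a minimal sketch, the same vector can be derived from y and num_classes, and F.one_hot can encode a whole batch of labels at once. The batch labels below are made-up values for illustration.

```python
import numpy as np
import torch
import torch.nn.functional as F

y = 1
num_classes = 3

# Start from all zeros and set the position of the true class to one
one_hot_numpy = np.zeros(num_classes, dtype=int)
one_hot_numpy[y] = 1
print(one_hot_numpy)

# F.one_hot also works on a batch of labels (hypothetical labels for illustration)
batch_labels = torch.tensor([0, 2, 1])
one_hot_batch = F.one_hot(batch_labels, num_classes=num_classes)
print(one_hot_batch)
```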


If you implement a custom dataset, you can make it output the one-hot encoded label directly. Indeed, you can add the one-hot encoding step to the __getitem__ method such that the returned label is already one-hot encoded!
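
A minimal sketch of that idea follows. The class name OneHotDataset, its constructor arguments, and the random data are all hypothetical and chosen only for illustration.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset


class OneHotDataset(Dataset):
    """Hypothetical dataset that returns one-hot encoded labels."""

    def __init__(self, features, labels, num_classes):
        self.features = features        # tensor of shape (n_samples, n_features)
        self.labels = labels            # integer class indices, shape (n_samples,)
        self.num_classes = num_classes

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Encode the label at access time so callers receive a one-hot vector
        one_hot = F.one_hot(self.labels[idx], num_classes=self.num_classes)
        return self.features[idx], one_hot


# Made-up data: four samples with two features each, three classes
features = torch.randn(4, 2)
labels = torch.tensor([0, 2, 1, 2])
dataset = OneHotDataset(features, labels, num_classes=3)

x, label = dataset[1]
print(label)
```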

**Calculating cross entropy loss**

Cross entropy loss is the most commonly used loss function for classification problems. In this exercise, you will create inputs and calculate cross entropy loss in PyTorch. You are provided with the ground truth label y and a vector of scores predicted by your model.

You'll start by creating a one-hot encoded vector of the ground truth label y, which is a required step to compare y with the scores predicted by your model. Next, you'll create a cross entropy loss function. Last, you'll call the loss function, which takes scores (model predictions before the final softmax function), and the one-hot encoded ground truth label, as inputs. It outputs a single float, the loss of that sample.



In [4]:
from torch.nn import CrossEntropyLoss

y = [2]
scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])

# Create a one-hot encoded vector of the label y
one_hot_label = F.one_hot(torch.tensor(y), scores.shape[1])

# Create the cross entropy loss function
criterion = nn.CrossEntropyLoss()

# Calculate the cross entropy loss
loss = criterion(scores.double(), one_hot_label.double())
print(loss)

tensor(8.0619, dtype=torch.float64)


Cross entropy loss is one of the most commonly used loss functions for classification tasks, where the goal is to predict a probability distribution over a set of target classes.
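
To see where the value 8.0619 comes from, here is a minimal sketch that recomputes the loss by hand as the negative log-softmax score of the true class. It also passes the integer class index to nn.CrossEntropyLoss directly, which this loss accepts as an alternative to a one-hot (probability) target.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

y = [2]
scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])

# Manual computation: negative log-probability of the true class
log_probs = F.log_softmax(scores, dim=1)
manual_loss = -log_probs[0, y[0]]

# Built-in loss with the class index as target (no one-hot encoding needed)
criterion = nn.CrossEntropyLoss()
index_loss = criterion(scores, torch.tensor(y))

print(manual_loss, index_loss)
```

Both values should agree with the loss computed from the one-hot label in the exercise.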

# Loss and Gradient

Recall that the operation performed by nn.Linear() is to take an input X and apply the transformation W * X + b, where W and b are two tensors (called the weight and bias).
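
As a quick check of this transformation, the sketch below compares nn.Linear with the explicit matrix product. The layer size (9 inputs, 2 outputs) and the random input are assumptions chosen for illustration. Note that PyTorch stores the weight with shape (out_features, in_features), so for a batch of row vectors the product is written as x @ W.T + b.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(in_features=9, out_features=2)
x = torch.randn(1, 9)   # one sample with 9 features

# Explicit version of the layer's computation
manual = x @ layer.weight.T + layer.bias

print(torch.allclose(layer(x), manual, atol=1e-6))
```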

A critical part of training PyTorch models is calculating the gradients of the loss function with respect to the weight and bias tensors.

In this exercise, you will calculate weight and bias tensor gradients using cross entropy loss and a sample of data.

In [18]:
# Weight tensor (2x9)
weight = torch.tensor([
    [0.1, -0.2, 0.3, 0.4, -0.5, 0.6, -0.7, 0.8, -0.9],
    [-0.1, 0.2, -0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9]], requires_grad=True)

# Bias: A 2-element tensor
bias = torch.tensor([0.1, -0.1], requires_grad=True)

# Example input tensor (1x9), assuming we have 9 features
input_tensor = torch.tensor([[0.5, -0.1, 0.2, 0.4, -0.3, 0.6, -0.7, 0.8, -0.9]])

# Calculate the predictions using the weight and bias
preds = torch.matmul(input_tensor, weight.t()) + bias

# Target: A 1-element tensor containing the class index (not one-hot encoded)
target = torch.tensor([0])

# Define the criterion
criterion = nn.CrossEntropyLoss()

# Calculate the loss
loss = criterion(preds, target)

# Compute the gradients of the loss
loss.backward()

# Display gradients of the weight and bias tensors
print("Gradients of weight tensor:\n", weight.grad)
print("Gradients of bias tensor:\n", bias.grad)

Gradients of weight tensor:
 tensor([[-0.0017,  0.0003, -0.0007, -0.0014,  0.0010, -0.0020,  0.0024, -0.0027,
          0.0031],
        [ 0.0017, -0.0003,  0.0007,  0.0014, -0.0010,  0.0020, -0.0024,  0.0027,
         -0.0031]])
Gradients of bias tensor:
 tensor([-0.0034,  0.0034])
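
These numbers can be checked against the closed-form gradient of cross entropy applied to a linear layer. With p = softmax(preds) and a one-hot target t, the gradient with respect to the bias is p - t, and the gradient with respect to the weight is the outer product of (p - t) with the input. A minimal sketch that recomputes both and compares them with autograd:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

weight = torch.tensor([
    [0.1, -0.2, 0.3, 0.4, -0.5, 0.6, -0.7, 0.8, -0.9],
    [-0.1, 0.2, -0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9]], requires_grad=True)
bias = torch.tensor([0.1, -0.1], requires_grad=True)
input_tensor = torch.tensor([[0.5, -0.1, 0.2, 0.4, -0.3, 0.6, -0.7, 0.8, -0.9]])
target = torch.tensor([0])

# Autograd gradients, as in the exercise
preds = input_tensor @ weight.t() + bias
loss = nn.CrossEntropyLoss()(preds, target)
loss.backward()

# Closed-form gradients: (softmax - one-hot) for the bias,
# and its outer product with the input for the weight
with torch.no_grad():
    diff = F.softmax(preds, dim=1) - F.one_hot(target, num_classes=2)
    manual_bias_grad = diff.squeeze(0)
    manual_weight_grad = diff.t() @ input_tensor

print(torch.allclose(bias.grad, manual_bias_grad, atol=1e-6))
print(torch.allclose(weight.grad, manual_weight_grad, atol=1e-6))
```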


**Accessing the model parameters**

A PyTorch model created with nn.Sequential() is a module that contains the different layers of your network. Recall that each layer's parameters can be accessed by indexing the created model directly. In this exercise, you will practice accessing the parameters of different linear layers of a neural network. You won't be accessing the sigmoid.

In [20]:
model = nn.Sequential(nn.Linear(16, 8),
                      nn.Sigmoid(),
                      nn.Linear(8, 2))

# Access the weight of the first linear layer
weight_0 = model[0].weight

# Access the bias of the second linear layer
bias_1 = model[2].bias

print("Weights of the first linear layer:\n", weight_0)
print("Bias of the second linear layer:\n", bias_1)

Weights of the first linear layer:
 Parameter containing:
tensor([[-0.1423, -0.1478, -0.0884, -0.1141, -0.0578,  0.0359,  0.1297,  0.1802,
          0.0859,  0.2066, -0.1932,  0.1018,  0.0078,  0.1047,  0.1191, -0.0663],
        [-0.2249,  0.1581,  0.1306, -0.1430,  0.1553,  0.1029,  0.1422, -0.1246,
         -0.2159,  0.0912,  0.0607, -0.1059,  0.1942,  0.0917, -0.1724,  0.1290],
        [-0.1499, -0.0373, -0.0353,  0.1264, -0.2333,  0.1531, -0.0538,  0.0040,
         -0.1968, -0.1421,  0.2244, -0.0176,  0.1909, -0.0598,  0.2307, -0.2306],
        [-0.0700,  0.0447,  0.0781,  0.0728,  0.1200,  0.1893, -0.1334,  0.1353,
          0.1712,  0.2275,  0.1818, -0.0209, -0.1674, -0.2108,  0.1260,  0.1216],
        [-0.1871,  0.1757, -0.0785, -0.0004, -0.1525,  0.2074, -0.1847, -0.1274,
          0.1373,  0.0351, -0.1874, -0.0103,  0.1444,  0.0548, -0.1626, -0.1775],
        [ 0.1069,  0.2168, -0.1857,  0.0659, -0.0683,  0.0297, -0.2009, -0.1793,
          0.1772, -0.1218,  0.0102,  0.2044, -
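
Indexing works well for a few layers. As an alternative sketch, named_parameters() lists every learnable tensor of the model together with its name and shape. The Sigmoid layer has no parameters, so it does not appear in the listing.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8),
                      nn.Sigmoid(),
                      nn.Linear(8, 2))

# Each entry is (name, parameter); names follow the layer index in the Sequential
shapes = {name: tuple(param.shape) for name, param in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```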

**Updating the weights manually**

Now that you know how to access weights and biases, you will manually perform the job of the PyTorch optimizer. PyTorch functions can do what you're about to do, but it's helpful to do the work manually at least once, to understand what's going on under the hood.

A neural network of three layers has been created and stored as the model variable. This network has been used for a forward pass and the loss and its derivatives have been calculated. A default learning rate, lr, has been chosen to scale the gradients when performing the update.

In [26]:
# Define the model
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Sigmoid(),
    nn.Linear(8, 2)
)

# Example input tensor with 16 features
input_tensor = torch.randn(1, 16)

# Target tensor containing the class index
target = torch.tensor([0])

# Forward pass: compute predictions
preds = model(input_tensor)

# Define the criterion
criterion = nn.CrossEntropyLoss()

# Calculate the loss
loss = criterion(preds, target)

# Zero the gradients
model.zero_grad()

# Compute the gradients of the loss
loss.backward()

# Access the weight of each linear layer
weight0 = model[0].weight
weight2 = model[2].weight

# Access the gradients of the weight of each linear layer
grads0 = weight0.grad
grads2 = weight2.grad

# Learning rate used to scale the gradients
lr = 0.001

# Update the weights manually; no_grad() keeps the update out of the autograd graph
with torch.no_grad():
    weight0 -= lr * grads0
    weight2 -= lr * grads2

# Check the updated weights
print("Updated weights of the first linear layer:\n", model[0].weight)
print("Updated weights of the second linear layer:\n", model[2].weight)

Updated weights of the first linear layer:
 Parameter containing:
tensor([[-0.2220,  0.0037, -0.1200,  0.0879,  0.2013,  0.0693,  0.1950,  0.0752,
         -0.1194,  0.1314,  0.0226,  0.2360,  0.1127,  0.0841,  0.2249,  0.1622],
        [ 0.0961,  0.0146, -0.0532, -0.1262,  0.1990, -0.0035, -0.1104, -0.2489,
          0.1633, -0.1912,  0.0380,  0.0883,  0.1372, -0.0030,  0.1256,  0.1055],
        [ 0.0560,  0.0849, -0.1534,  0.0357, -0.0543,  0.1105,  0.1526,  0.1055,
          0.0965, -0.1959,  0.0978, -0.0289,  0.1946, -0.1748, -0.1036,  0.0813],
        [ 0.1952, -0.0433,  0.1138, -0.0903, -0.1646, -0.1234, -0.1440, -0.2027,
          0.1718,  0.2243,  0.0963,  0.0616,  0.0582, -0.2227, -0.2004,  0.0438],
        [-0.0135,  0.1822, -0.1623, -0.1451, -0.0274,  0.2098,  0.1412, -0.0593,
         -0.0494, -0.1888, -0.0807, -0.2328, -0.0128, -0.0886,  0.2111, -0.2062],
        [-0.1569, -0.0406,  0.1758, -0.1298,  0.1087,  0.0017,  0.0657,  0.0516,
          0.1509, -0.0811, -0.1792, -0
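
The update above touches only the two weight matrices and leaves the biases unchanged. Here is a minimal sketch of the same rule applied to every parameter, reusing the exercise's architecture and learning rate with a randomly generated input (the seed is an arbitrary choice):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(16, 8), nn.Sigmoid(), nn.Linear(8, 2))
input_tensor = torch.randn(1, 16)
target = torch.tensor([0])
lr = 0.001

loss = nn.CrossEntropyLoss()(model(input_tensor), target)
loss.backward()

# Keep a copy of the parameters to inspect the size of the step afterwards
before = [p.detach().clone() for p in model.parameters()]

# Gradient descent step on weights and biases alike
with torch.no_grad():
    for param in model.parameters():
        param -= lr * param.grad

# Print each parameter's shape and the largest change it received
for old, new in zip(before, model.parameters()):
    print(tuple(new.shape), (new - old).abs().max().item())
```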

**Using the PyTorch optimizer**

In the previous exercise, you manually updated the weights of a network. You now know what's going on under the hood, but this approach does not scale to a network with many layers.

Thankfully, the PyTorch SGD optimizer does a similar job in a handful of lines of code. In this exercise, you will practice the last step to complete the training loop: updating the weights using a PyTorch optimizer.
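
As a sanity check on that claim, the sketch below takes one step with optim.SGD on one copy of a model and one manual step on an identical copy, then compares the results. It assumes plain SGD without momentum or weight decay, which are the defaults. The input, seed, and learning rate are illustrative.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

model_sgd = nn.Sequential(nn.Linear(16, 8), nn.Sigmoid(), nn.Linear(8, 2))
model_manual = copy.deepcopy(model_sgd)   # identical starting weights

input_tensor = torch.randn(1, 16)
target = torch.tensor([0])
criterion = nn.CrossEntropyLoss()
lr = 0.001

# One step with the optimizer
optimizer = optim.SGD(model_sgd.parameters(), lr=lr)
optimizer.zero_grad()
criterion(model_sgd(input_tensor), target).backward()
optimizer.step()

# One manual step: param <- param - lr * grad
criterion(model_manual(input_tensor), target).backward()
with torch.no_grad():
    for param in model_manual.parameters():
        param -= lr * param.grad

same = all(torch.allclose(a, b, atol=1e-7)
           for a, b in zip(model_sgd.parameters(), model_manual.parameters()))
print(same)
```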

A neural network has been created and provided as the model variable. This model was used to run a forward pass and create the tensor of predictions preds. The tensor of ground truth class indices is named target and the cross entropy loss function is stored as criterion.

torch.optim as optim, and torch.nn as nn have already been loaded for you.

In [32]:
import torch.optim as optim

# Create the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.0001)

# Clear the gradients left over from the previous exercise
optimizer.zero_grad()

# Rerun the forward pass: the earlier loss.backward() freed the graph behind the old preds
preds = model(input_tensor)

# Calculate the loss
loss = criterion(preds, target)

# Backpropagation: compute gradients
loss.backward()

# Update the model's parameters using the optimizer
optimizer.step()


Note that calling backward() again on a graph that has already been backpropagated through, without a new forward pass, fails with the following error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
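
Putting the steps together, here is a minimal sketch of a complete training loop. The data, number of iterations, seed, and learning rate are made-up values for illustration. The key point is that each iteration clears the old gradients and runs a fresh forward pass before calling backward().

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(16, 8), nn.Sigmoid(), nn.Linear(8, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Hypothetical batch: 4 samples with 16 features and their class indices
inputs = torch.randn(4, 16)
targets = torch.tensor([0, 1, 0, 1])

losses = []
for step in range(20):
    optimizer.zero_grad()             # clear gradients from the previous step
    preds = model(inputs)             # fresh forward pass builds a new graph
    loss = criterion(preds, targets)
    loss.backward()                   # compute gradients
    optimizer.step()                  # update weights and biases
    losses.append(loss.item())

# Compare the loss at the first and last iteration
print(losses[0], losses[-1])
```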