In [3]:
import numpy as np
import torch
import torch.nn as nn

# Using the MSELoss

Recall that we can't use cross-entropy loss for regression problems. The mean squared error loss (MSELoss) is a common loss function for regression problems. In this exercise, you will practice calculating and observing the loss using NumPy as well as its PyTorch implementation.

* Calculate the MSELoss using NumPy.
* Create an MSELoss function using PyTorch.
* Convert y_hat and y to tensors and then float data types, and then use them to calculate MSELoss using PyTorch as mse_pytorch.
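
For reference, the quantity being computed can be written out as follows. This uses the standard definition of mean squared error over $N$ predictions $\hat{y}_i$ and targets $y_i$; the symbols $N$ and $i$ are notation introduced here, not names from the exercise.

```latex
\mathrm{MSE}(\hat{y}, y) = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2
```

With the single prediction and target used in this exercise ($N = 1$, $\hat{y} = 10$, $y = 1$), this reduces to $(10 - 1)^2$.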

In [4]:
y_hat = np.array(10)
y = np.array(1)

# Calculate the MSELoss using NumPy
mse_numpy = np.mean((y_hat - y)**2)
print("MSE using NumPy:", mse_numpy)

# Create the MSELoss function
criterion = nn.MSELoss()

# Calculate the MSELoss using the created loss function
# nn.MSELoss expects (prediction, target); MSE is symmetric, so the value is the same either way
mse_pytorch = criterion(torch.tensor(y_hat).float(), torch.tensor(y).float())
print(mse_pytorch)

MSE using NumPy: 81.0
tensor(81.)


The loss is 81, the square of 9 (the difference between the prediction 10 and the target 1), as expected! The MSE loss is also called L2 loss. Another common loss function for regression problems is the mean absolute error loss, also called L1 loss.
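
To make the difference between the two losses concrete, here is a minimal sketch that evaluates both on the same values. The two-element tensors are made up for illustration and are not part of the exercise. `nn.L1Loss` is PyTorch's mean absolute error loss.

```python
import torch
import torch.nn as nn

# Illustrative values (an assumption, not exercise data):
# the first prediction is off by 9, the second is exact.
prediction = torch.tensor([10.0, 2.0])
target = torch.tensor([1.0, 2.0])

# L2 loss: mean of squared differences -> (81 + 0) / 2
l2_loss = nn.MSELoss()(prediction, target)

# L1 loss: mean of absolute differences -> (9 + 0) / 2
l1_loss = nn.L1Loss()(prediction, target)

print("L2 (MSE) loss:", l2_loss.item())
print("L1 (MAE) loss:", l1_loss.item())
```

Because the error is squared, the L2 loss penalizes the single large error much more heavily than the L1 loss does.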

# Writing a training loop

In scikit-learn, the whole training loop is contained in the .fit() method. In PyTorch, however, you implement the loop manually. This gives you full control over the loop's content, but it means you have to write the loop yourself.

You will write a training loop every time you train a deep learning model with PyTorch, so you'll practice writing one in this exercise. The show_results() function provided will display some sample ground truth values and the corresponding model predictions.

* Write a for loop that iterates over the dataloader; this should be nested within a for loop that iterates over a range equal to the number of epochs.

* Set the gradients of the optimizer to zero.
* Write the forward pass.
* Compute the MSE loss value using the criterion() function provided.
* Compute the gradients.
* Update the model's parameters.



In [None]:
# Loop over the number of epochs and the dataloader
for i in range(num_epochs):
    for data in dataloader:
        # Set the gradients to zero
        optimizer.zero_grad()
        # Run a forward pass
        feature, target = data
        prediction = model(feature)
        # Calculate the loss
        loss = criterion(prediction, target)
        # Compute the gradients
        loss.backward()
        # Update the model's parameters
        optimizer.step()
# Display sample results once training has finished
show_results(model, dataloader)

Ground truth salary: 0.078. Predicted salary: 0.210.

Ground truth salary: 0.098. Predicted salary: 0.187.

Ground truth salary: 0.005. Predicted salary: 0.276.

Ground truth salary: 0.293. Predicted salary: 0.171.

Ground truth salary: 0.290. Predicted salary: 0.157.

Ground truth salary: 0.167. Predicted salary: 0.171.

Ground truth salary: 0.169. Predicted salary: 0.171.

Ground truth salary: 0.367. Predicted salary: 0.171.

Ground truth salary: 0.290. Predicted salary: 0.171.

Ground truth salary: 0.417. Predicted salary: 0.205.

Ground truth salary: 0.164. Predicted salary: 0.258.

Ground truth salary: 0.233. Predicted salary: 0.178.
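
The exercise relies on `model`, `dataloader`, `criterion`, `optimizer`, and `num_epochs` being provided. The sketch below is one way to run the same loop end to end. Everything it sets up is an assumption made for illustration, not the exercise's actual setup: the synthetic data, a single `nn.Linear` layer as the model, the SGD optimizer, and the hyperparameters.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (assumption): target = 2 * feature + 1 plus small noise
features = torch.rand(64, 1)
targets = 2 * features + 1 + 0.01 * torch.randn(64, 1)
dataloader = DataLoader(TensorDataset(features, targets), batch_size=8, shuffle=True)

# Hypothetical stand-ins for the objects the exercise provides
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
num_epochs = 50

epoch_losses = []
for epoch in range(num_epochs):
    running_loss = 0.0
    for feature, target in dataloader:
        optimizer.zero_grad()                 # reset gradients from the previous step
        prediction = model(feature)           # forward pass
        loss = criterion(prediction, target)  # MSE between prediction and target
        loss.backward()                       # compute gradients
        optimizer.step()                      # update parameters
        running_loss += loss.item()
    # Average loss over the batches of this epoch
    epoch_losses.append(running_loss / len(dataloader))

print(f"First epoch loss: {epoch_losses[0]:.4f}")
print(f"Last epoch loss:  {epoch_losses[-1]:.4f}")
```

Tracking the average loss per epoch is a simple way to check that the loop is actually learning: the value should go down over the course of training.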

# Implementing ReLU

The rectified linear unit (or ReLU) function is one of the most common activation functions in deep learning.

It overcomes some of the training problems linked with the sigmoid function you learned about, such as the vanishing gradients problem.
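
As a rough illustration of the vanishing gradients problem, the sketch below compares the gradients of sigmoid and ReLU at a large positive input. The input value 10.0 is chosen here only for illustration.

```python
import torch

# Gradient of sigmoid at a large positive input: the function is saturated there
x_sigmoid = torch.tensor(10.0, requires_grad=True)
torch.sigmoid(x_sigmoid).backward()
sigmoid_grad = x_sigmoid.grad.item()

# Gradient of ReLU at the same input: the function is linear for positive inputs
x_relu = torch.tensor(10.0, requires_grad=True)
torch.relu(x_relu).backward()
relu_grad = x_relu.grad.item()

print("Sigmoid gradient at 10:", sigmoid_grad)
print("ReLU gradient at 10:", relu_grad)
```

The sigmoid gradient is close to zero because the function flattens out for large inputs, whereas the ReLU gradient stays at 1 for any positive input.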

In [5]:
# Create a ReLU function with PyTorch
relu_pytorch = nn.ReLU()

# Apply your ReLU function on x, and calculate gradients
x = torch.tensor(-1.0, requires_grad=True)
y = relu_pytorch(x)
y.backward()

# Print the gradient of the ReLU function for x
gradient = x.grad
print(gradient)

tensor(0.)


Notice that the input value was -1, so the ReLU function returned zero. Recall from the graph in the video that for negative values of x, the output of ReLU is always zero. The gradient is therefore also zero for any negative x, because the output does not change as x varies in that region.
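
The same check can be repeated for several inputs at once. The following sketch uses illustrative values that are not part of the exercise, and computes the ReLU gradient for two negative and two positive inputs.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# Illustrative inputs (an assumption): two negative and two positive values
inputs = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
outputs = relu(inputs)

# Summing gives a scalar, so backward() fills in d(output_i)/d(input_i) for each element
outputs.sum().backward()
grads = inputs.grad

print("Outputs:  ", outputs.tolist())
print("Gradients:", grads.tolist())
```

The gradient is 0 for the negative inputs and 1 for the positive ones.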

# Implementing leaky ReLU

You've learned that ReLU is one of the most used activation functions in deep learning, and you will find it in many modern architectures. However, it has the drawback of outputting zero for negative inputs and therefore having zero gradients there. A unit whose input stays negative receives no gradient, so its weights may stop updating during training. Leaky ReLU addresses this by multiplying negative inputs by a small factor (the negative slope) instead of setting them to zero.

In [None]:
# Create a leaky relu function in PyTorch
leaky_relu_pytorch = nn.LeakyReLU(negative_slope = 0.05)

x = torch.tensor(-2.0)
# Call the above function on the tensor x
output = leaky_relu_pytorch(x)
print(output)
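
To see why the negative slope matters for training, the sketch below (not part of the exercise) repeats the call with `requires_grad=True` and also inspects the gradient. The variable names are chosen here for illustration.

```python
import torch
import torch.nn as nn

leaky_relu = nn.LeakyReLU(negative_slope=0.05)

# Same negative input as above, but tracked so we can inspect its gradient
x_leaky = torch.tensor(-2.0, requires_grad=True)
leaky_output = leaky_relu(x_leaky)
leaky_output.backward()
leaky_grad = x_leaky.grad

print("Output:", leaky_output.item())   # negative input scaled by the slope
print("Gradient:", leaky_grad.item())   # equals the negative slope, not zero
```

Unlike plain ReLU, the gradient for a negative input equals the negative slope rather than zero, so the parameters feeding this unit can still be updated.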

