# 2. PyTorch Basics - Writing a PyTorch Neural Network with Forward Propagation

### About this notebook

This notebook was used in the 50.039 Deep Learning course at the Singapore University of Technology and Design.

**Author:** Matthieu DE MARI (matthieu_demari@sutd.edu.sg)

**Version:** 1.0 (27/12/2022)

**Requirements:**
- Python 3 (tested on v3.9.6)
- Matplotlib (tested on v3.5.1)
- Numpy (tested on v1.22.1)
- Time
- Torch (tested on v1.13.0)

### Imports and CUDA

In [1]:
# Matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
# Numpy
import numpy as np
from numpy.random import default_rng
# Time
from time import time
# Torch
import torch

In [2]:
# Use GPU if available, else use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


### Mock dataset, with nonlinearity

As in the previous notebooks, we will reuse our nonlinear binary classification mock dataset and generate a training set with 1000 samples.

In [3]:
# All helper functions
eps = 1e-5
min_val = -1 + eps
max_val = 1 - eps
def val(min_val, max_val):
    return round(np.random.uniform(min_val, max_val), 2)
def class_for_val(val1, val2):
    k = np.pi
    return int(val2 >= -1/4 + 3/4*np.sin(val1*k))
def create_dataset(n_points, min_val, max_val):
    val1_list = np.array([val(min_val, max_val) for _ in range(n_points)])
    val2_list = np.array([val(min_val, max_val) for _ in range(n_points)])
    inputs = np.array([[v1, v2] for v1, v2 in zip(val1_list, val2_list)])
    outputs = np.array([class_for_val(v1, v2) for v1, v2 in zip(val1_list, val2_list)]).reshape(n_points, 1)
    return val1_list, val2_list, inputs, outputs

In [4]:
# Generate dataset (train)
np.random.seed(47)
n_points = 1000
train_val1_list, train_val2_list, train_inputs, train_outputs = create_dataset(n_points, min_val, max_val)

### Our previous Shallow Neural Net class, with everything

We will reuse our previous **ShallowNeuralNet** class from Week 2 Notebook 7, which:
- implements a shallow neural network using two fully connected layers and sigmoid activation functions,
- uses stochastic mini-batch gradient descent, with Adam as its optimizer,
- uses a random normal initialization,
- comes with a forward() method for predictions,
- comes with backward() and train() methods for backpropagation training,
- comes with a cross-entropy loss function and an accuracy function,
- comes with a display function, to show training curves on both the loss and the accuracy,
- comes with save and load functions.

For now, we will focus on replicating the init, forward, loss and accuracy methods in PyTorch.

In [5]:
class ShallowNeuralNet():
    
    def __init__(self, n_x, n_h, n_y):
        # Network dimensions
        self.n_x = n_x
        self.n_h = n_h
        self.n_y = n_y
        
        # Initialize parameters
        self.init_parameters_normal()
         
    def init_parameters_normal(self):
        # Weights and biases matrices (randomly initialized)
        self.W1 = np.random.randn(self.n_x, self.n_h)*0.1
        self.b1 = np.random.randn(1, self.n_h)*0.1
        self.W2 = np.random.randn(self.n_h, self.n_y)*0.1
        self.b2 = np.random.randn(1, self.n_y)*0.1

    def sigmoid(self, val):
        return 1/(1 + np.exp(-val))
    
    def forward(self, inputs):
        # Wx + b operation for the first layer
        Z1 = np.matmul(inputs, self.W1)
        Z1_b = Z1 + self.b1
        A1 = self.sigmoid(Z1_b)
        # Wx + b operation for the second layer
        Z2 = np.matmul(A1, self.W2)
        Z2_b = Z2 + self.b2
        y_pred = self.sigmoid(Z2_b)
        return y_pred
    
    def CE_loss(self, inputs, outputs):
        # Binary cross-entropy loss function
        # Reshape the targets into a column so they line up with the predictions
        outputs_re = outputs.reshape(-1, 1)
        pred = self.forward(inputs)
        # Small epsilon to avoid log(0)
        eps = 1e-10
        losses = outputs_re*np.log(pred + eps) + (1 - outputs_re)*np.log(1 - pred + eps)
        loss = -np.sum(losses)/outputs_re.shape[0]
        return loss
    
    def accuracy(self, inputs, outputs):
        # Calculate accuracy for given inputs and outputs
        pred = [int(val >= 0.5) for val in self.forward(inputs)]
        acc = sum([int(val1 == val2[0]) for val1, val2 in zip(pred, outputs)])/outputs.shape[0]
        return acc

We can then run the model with the commands below.

In [6]:
# Define a neural network structure
n_x = 2
n_h = 10
n_y = 1
np.random.seed(37)
shallow_neural_net = ShallowNeuralNet(n_x, n_h, n_y)
pred = shallow_neural_net.forward(train_inputs)
acc = shallow_neural_net.accuracy(train_inputs, train_outputs)
loss = shallow_neural_net.CE_loss(train_inputs, train_outputs)
print(pred.shape)
print(train_outputs.shape)
print(acc)
print(loss)

(1000, 1)
(1000, 1)
0.626
0.6853940202992042
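
The accuracy method above loops over the predictions in plain Python. As a side note, the same quantity can be computed with vectorized NumPy operations. The sketch below is only an illustration: the function name `accuracy_vectorized` and the four-sample toy arrays are assumptions, not part of the original class.

```python
import numpy as np

def accuracy_vectorized(pred, outputs):
    # Threshold the probabilities at 0.5, compare with the labels
    # elementwise, then average the matches
    return np.mean((pred >= 0.5).astype(int) == outputs)

# Toy predictions and labels, both of shape (4, 1)
pred = np.array([[0.9], [0.2], [0.6], [0.4]])
outputs = np.array([[1], [0], [0], [0]])

# Loop-based version, in the spirit of the accuracy method above
loop_acc = sum(int(int(p[0] >= 0.5) == o[0]) for p, o in zip(pred, outputs))/outputs.shape[0]
vec_acc = accuracy_vectorized(pred, outputs)
print(loop_acc, vec_acc)
```

Both versions agree on this toy example. The vectorized form is also the one that maps most directly onto tensor operations.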


### Rewriting our class using PyTorch operations instead of NumPy - Init, Forward, Loss and Accuracy

The main differences between the original class and the PyTorch version in the **ShallowNeuralNet_PT** class are:
- The PyTorch version of the class should inherit from torch.nn.Module and call its parent's init method using super(). This is necessary because PyTorch uses classes inherited from Module to keep track of the layers and their parameters in a neural network.
- Instead of using NumPy arrays for the weights and biases, the PyTorch version uses torch.nn.Parameter objects, which are tensors that the Module registers as trainable parameters so that PyTorch's optimizers can update them.
- The activation function sigmoid is replaced with PyTorch's torch.sigmoid function.
- In the CE_loss and accuracy methods, we will reuse the torch functions and methods as much as possible, instead of the Numpy ones.
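
To illustrate the first two points, here is a minimal sketch showing that a tensor wrapped in torch.nn.Parameter is registered by the module, whereas a plain tensor attribute is not. The class `TinyModule` and its attribute names are hypothetical and only serve this illustration.

```python
import torch

class TinyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapped in Parameter: registered and tracked by the module
        self.W = torch.nn.Parameter(torch.randn(2, 3, dtype=torch.float64)*0.1)
        # Plain tensor: a regular attribute, not listed among the parameters
        self.not_a_param = torch.randn(2, 3, dtype=torch.float64)

tiny = TinyModule()
param_names = [name for name, _ in tiny.named_parameters()]
print(param_names)
print(tiny.W.requires_grad)
```

Only `W` shows up in `named_parameters()`, and it requires gradients by default, so there is no need to pass `requires_grad=True` by hand when wrapping a tensor in a Parameter.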

In [7]:
# Our class will inherit from torch.nn.Module,
# the base class used to write all models in PyTorch
class ShallowNeuralNet_PT(torch.nn.Module):
    
    def __init__(self, n_x, n_h, n_y, device):
        # Super __init__ for inheritance
        super().__init__()
        
        # Network dimensions (as before)
        self.n_x = n_x
        self.n_h = n_h
        self.n_y = n_y
        
        # Device
        self.device = device
        
        # Initialize parameters using the torch.nn.Parameter type (a subclass of Tensor).
        # We immediately initialize the parameters using a random normal.
        # The random draw is done with torch.randn instead of the NumPy RNG.
        # We use float64 (the same float type used by NumPy to generate our data)
        # and create the tensors directly on our GPU/CPU device.
        self.W1 = torch.nn.Parameter(torch.randn(n_x, n_h, requires_grad = True, \
                                     dtype = torch.float64, device = device)*0.1)
        self.b1 = torch.nn.Parameter(torch.randn(1, n_h, requires_grad = True, \
                                     dtype = torch.float64, device = device)*0.1)
        self.W2 = torch.nn.Parameter(torch.randn(n_h, n_y, requires_grad = True, \
                                     dtype = torch.float64, device = device)*0.1)
        self.b2 = torch.nn.Parameter(torch.randn(1, n_y, requires_grad = True, \
                                     dtype = torch.float64, device = device)*0.1)
        self.W1.retain_grad()
        self.b1.retain_grad()
        self.W2.retain_grad()
        self.b2.retain_grad()
        
    def forward(self, inputs):
        # Instead of using np.matmul(), we use its equivalent in PyTorch,
        # which is torch.matmul()!
        # (Most NumPy matrix operations have their equivalent in torch, check it out!)
        # Wx + b operation for the first layer
        Z1 = torch.matmul(inputs, self.W1)
        Z1_b = Z1 + self.b1
        # Sigmoid is already implemented in PyTorch, feel free to reuse it!
        A1 = torch.sigmoid(Z1_b)
        
        # Wx + b operation for the second layer
        # (Same as first layer)
        Z2 = torch.matmul(A1, self.W2)
        Z2_b = Z2 + self.b2
        y_pred = torch.sigmoid(Z2_b)
        return y_pred
    
    def CE_loss(self, pred, outputs):
        # We will use an epsilon to avoid NaNs on the log() values
        eps = 1e-10
        # As before with matmul, most operations in NumPy have their equivalent in torch (e.g. log and sum)
        losses = outputs * torch.log(pred + eps) + (1 - outputs) * torch.log(1 - pred + eps)
        loss = -torch.sum(losses)/outputs.shape[0]
        return loss
    
    def accuracy(self, pred, outputs):
        # Calculate accuracy for given predictions and outputs
        # We will, again, rely as much as possible on the torch methods and functions. 
        return ((pred >= 0.5).int() == outputs).float().mean()
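
As a sanity check on the hand-written CE_loss above, the sketch below compares the same formula with PyTorch's built-in `torch.nn.functional.binary_cross_entropy` on a small random batch. The variable names and the toy data are assumptions made for this illustration. Note that the built-in function expects floating-point targets, so the labels are cast to float64 here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Toy probabilities kept away from 0 and 1, and random binary targets
pred = torch.rand(8, 1, dtype=torch.float64)*0.98 + 0.01
targets = torch.randint(0, 2, (8, 1)).to(torch.float64)

# Same formula as in the CE_loss method
eps = 1e-10
losses = targets*torch.log(pred + eps) + (1 - targets)*torch.log(1 - pred + eps)
manual = -torch.sum(losses)/targets.shape[0]

# Built-in binary cross-entropy (mean reduction by default)
builtin = F.binary_cross_entropy(pred, targets)
print(manual.item(), builtin.item())
```

The two values should match closely, but not necessarily to the last digit: the method above adds a small epsilon inside the logarithms, whereas the built-in function clamps the logarithm values instead.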

Before we can use this neural network on our dataset, we need to convert the dataset arrays to PyTorch Tensor objects and send them to the GPU (if available).

In [8]:
train_inputs_pt = torch.from_numpy(train_inputs).to(device)
train_outputs_pt = torch.from_numpy(train_outputs).to(device)
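
A short aside on what `torch.from_numpy` does, as a minimal sketch on toy arrays (the names `arr`, `labels`, `t` and `l` are assumptions for this illustration): the resulting tensor keeps the dtype of the NumPy array and, as long as it stays on the CPU, shares memory with it.

```python
import numpy as np
import torch

arr = np.array([[0.25, -0.5], [0.75, 1.0]])   # float64 by default
labels = np.array([[0], [1]])                  # integer dtype

t = torch.from_numpy(arr)
l = torch.from_numpy(labels)
print(t.dtype, l.dtype)

# On CPU, the tensor and the array share the same memory:
# modifying the array is visible through the tensor
arr[0, 0] = 9.0
print(t[0, 0].item())
```

This explains why the parameters of the model were created as float64: they must match the dtype of the inputs for torch.matmul to work. Moving a tensor to a CUDA device with `.to(device)` copies the data, so the memory sharing no longer applies afterwards.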

In [9]:
# Define a neural network structure
n_x = 2
n_h = 10
n_y = 1
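# Note: the parameters are now drawn with torch.randn, so the NumPy seed below
# does not control them; use torch.manual_seed(37) for a reproducible initialization.
np.random.seed(37)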
shallow_neural_net_pt = ShallowNeuralNet_PT(n_x, n_h, n_y, device)
train_pred = shallow_neural_net_pt.forward(train_inputs_pt)
acc = shallow_neural_net_pt.accuracy(train_pred, train_outputs_pt)
loss = shallow_neural_net_pt.CE_loss(train_pred, train_outputs_pt)
print(train_pred.shape)
print(train_outputs_pt.shape)
print(acc, loss)
print(acc.item(), loss.item())

torch.Size([1000, 1])
torch.Size([1000, 1])
tensor(0.6260, device='cuda:0') tensor(0.6881, device='cuda:0', dtype=torch.float64, grad_fn=<DivBackward0>)
0.6260000467300415 0.688087761172729


Also, a quick note: using the ShallowNeuralNet_PT object (or any torch.nn.Module object, for that matter) as a function calls the forward method we have defined. The method must therefore be named forward!

In [10]:
# This is therefore equivalent to train_pred = shallow_neural_net_pt.forward(train_inputs_pt)
train_pred = shallow_neural_net_pt(train_inputs_pt)
print(train_pred.shape)
print(train_outputs_pt.shape)

torch.Size([1000, 1])
torch.Size([1000, 1])
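
The two ways of calling the model return the same values, but they are not strictly identical: calling the object goes through the Module's `__call__` machinery, which also runs any registered hooks, while calling `forward` directly skips them. The sketch below illustrates this with a hypothetical `Doubler` module and a forward hook that simply records that it ran.

```python
import torch

class Doubler(torch.nn.Module):
    def forward(self, x):
        return 2*x

calls = []
model = Doubler()
# A forward hook receives (module, inputs, output); this one only logs a call
handle = model.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.ones(3)
out_call = model(x)             # goes through __call__, the hook runs
out_forward = model.forward(x)  # bypasses __call__, the hook does not run
handle.remove()
print(out_call, out_forward, len(calls))
```

For this reason, calling the model object directly is generally the preferred form.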


### Computation times comparison

Below, we run both models (the NumPy one and the PyTorch one) and ask each of them to compute the accuracy 1000 times.

On my machine (which is CUDA-enabled and uses an Nvidia GTX 1060), the PyTorch model was roughly 30 times faster!

The exact ratio will obviously depend on your machine.

In [14]:
# Calculate computation times (NumPy NN)
start = time()
for i in range(1000):
    train_acc = shallow_neural_net.accuracy(train_inputs, train_outputs)
end = time()
time_np = end - start
print(time_np)

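# Calculate computation times (PyTorch NN)
# The forward pass is included so that both loops measure the same work
# (the NumPy accuracy method calls forward internally).
# CUDA kernels run asynchronously, so we synchronize before reading the clock.
if device.type == "cuda":
    torch.cuda.synchronize()
start = time()
for i in range(1000):
    train_pred_pt = shallow_neural_net_pt(train_inputs_pt)
    train_acc_pt = shallow_neural_net_pt.accuracy(train_pred_pt, train_outputs_pt)
if device.type == "cuda":
    torch.cuda.synchronize()
end = time()
time_pt = end - start
print(time_pt)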

# Ratio
ratio = time_np/time_pt
print(ratio)

2.0732526779174805
0.07289958000183105
28.43984393141093
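
A word of caution when timing GPU code: CUDA operations are launched asynchronously, so the Python clock can be read before the GPU has actually finished its work. A common precaution is to call `torch.cuda.synchronize()` before starting and before stopping the timer. The sketch below wraps this in a hypothetical helper, `time_fn`, and times a toy matrix product; both the helper and the toy workload are assumptions for illustration.

```python
import time
import torch

def time_fn(fn, n_runs=100):
    # Hypothetical helper: time n_runs calls of fn,
    # waiting for pending GPU work before each clock reading
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.perf_counter() - start

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = torch.randn(256, 256, device=device)
elapsed = time_fn(lambda: torch.matmul(a, a))
print(elapsed)
```

On a CPU-only machine the synchronization calls are skipped and the helper behaves like a plain timer.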


### What's next?

In the next notebook, we will investigate how to implement the backpropagation mechanism using the PyTorch framework, and eventually use it to train our model.