<a target="_blank" href="https://colab.research.google.com/github/ntu-dl-bootcamp/deep-learning-2025/blob/main/SESSION2/session2_part1.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Deep Learning using PyTorch

In last week's session, you learnt the basics of Python and looked at some popular libraries for working with data. In this session, you will learn to work with PyTorch, one of the most popular Python frameworks for Deep Learning.

PyTorch is a Python-based Deep Learning platform that provides flexibility and speed for developing Deep Learning solutions. At its core, PyTorch provides the following key features:

* A multidimensional **Tensor** object, similar to a NumPy array but with GPU acceleration.
* An automatic differentiation engine called **Autograd** for efficiently and automatically computing derivatives needed for optimization.
* A clean, modular API for building and deploying deep learning models.

In this notebook, we'll first look at Tensors: what they are, and how to create, manipulate and perform operations on them. Real deep learning projects often involve very large datasets, so we'll also see how PyTorch's features make them easy to manage. We then walk through the whole workflow of building neural networks, training them, and testing (as well as monitoring) their performance. We will look at all the components that go into a deep learning solution, using the task of recognizing handwritten digits from the MNIST dataset as an example.


We have a few exercises after each section to help you apply the concepts discussed in the notebook and experiment with them. There are also some challenges at the end of the notebook; try them all out!

**Copy the notebook to your drive to start working on it**

A few shortcuts that might come in handy:

*   Run a cell: Ctrl + Enter
*   Run a cell and move to the next one: Shift + Enter
*   Navigate between cells: Arrow keys

# PyTorch Basics



Let's first import some useful libraries: PyTorch (torch) and NumPy.

In [2]:
import torch
import numpy as np

## Tensors

A PyTorch Tensor is a specialized data structure that is very similar to arrays and matrices: a multi-dimensional array with a uniform type (called a dtype) that can make use of GPUs or other hardware accelerators. Apart from the data itself, a tensor can, when required, also store gradient information (the derivative of some other quantity, such as a loss, with respect to the tensor), which is used during the optimization step. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.
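To make this description concrete, here is a small sketch (the tensor values and the names `t`, `w` and `loss` are illustrative, not part of the original notebook). It prints a tensor's shape, dtype and device, and shows that a tensor created with `requires_grad=True` holds a gradient in `.grad` after `backward()` is called on a result computed from it:

```python
import torch

# Every tensor carries its shape, a single dtype and the device it lives on
t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(t.shape)   # torch.Size([2, 2])
print(t.dtype)   # torch.float32
print(t.device)  # cpu

# requires_grad=True asks autograd to track operations involving w
w = torch.tensor([2.0, 3.0], requires_grad=True)
loss = (w ** 2).sum()  # loss = w0^2 + w1^2
loss.backward()        # computes d(loss)/dw = 2 * w and stores it in w.grad
print(w.grad)          # tensor([4., 6.])
```

Autograd is covered properly later; this only shows where the "additional information" mentioned above ends up.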

### Creating Tensors

There are various ways of creating tensors, and when working on any real deep learning project, we will usually use more than one.



In [3]:
# From Python iterables such as lists and tuples

# tensor from a list
a = torch.tensor([0, 1, 2])

# tensor from a tuple of tuples (nested iterables can also be handled as long as
# the dimensions are compatible)
b = ((1.0, 1.1), (1.2, 1.3))
b = torch.tensor(b)

# tensor from a numpy array
c = np.ones([2, 3])  # numpy array
c = torch.tensor(c)

print(f"a: {a}")
print(f"b: {b}")
print(f"c: {c}")

a: tensor([0, 1, 2])
b: tensor([[1.0000, 1.1000],
        [1.2000, 1.3000]])
c: tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
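Notice that `c` came out as `float64`: NumPy arrays default to 64-bit floats and `torch.tensor` keeps that dtype. A related detail is that `torch.tensor` always copies the array's data, whereas `torch.from_numpy` shares memory with it. A minimal sketch (the names `arr`, `copied`, `shared` and `as_float32` are illustrative):

```python
import numpy as np
import torch

arr = np.ones([2, 3])           # NumPy arrays default to float64

copied = torch.tensor(arr)      # torch.tensor always copies the data
shared = torch.from_numpy(arr)  # torch.from_numpy shares arr's memory

arr[0, 0] = 5.0                 # modify the NumPy array in place

print(copied[0, 0].item())      # 1.0: the copy is unaffected
print(shared[0, 0].item())      # 5.0: the shared tensor sees the change
print(shared.dtype)             # torch.float64: dtype carried over from NumPy

# Request float32 (PyTorch's default floating-point dtype) explicitly if needed
as_float32 = torch.tensor(arr, dtype=torch.float32)
print(as_float32.dtype)         # torch.float32
```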


In [4]:
# Tensor constructors

x = torch.ones(5, 3)  # tensor of the given shape with every element set to 1
y = torch.zeros(2)  # tensor of the given shape with every element set to 0
# empty() allocates a tensor of the given shape without initializing it, which
# can be slightly faster than ones() and zeros(). Its elements hold whatever
# values happened to be in memory (garbage), so never rely on them.
z = torch.empty(1, 1, 5)

print(f"x: {x}")
print(f"y: {y}")
print(f"z: {z}")

# ones_like() creates a tensor with the same shape as its input, with every
# element set to 1
a = torch.ones_like(y)
# zeros_like() creates a tensor with the same shape as its input, with every
# element set to 0
b = torch.zeros_like(torch.tensor([4, 5, 6, 7]))

print(f"\na: {a}")
print(f"b: {b}")

x: tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
y: tensor([0., 0.])
z: tensor([[[0., 0., 0., 0., 0.]]])

a: tensor([1., 1.])
b: tensor([0, 0, 0, 0])
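Note that `a` holds floats while `b` holds integers: the `*_like` constructors copy the dtype of their input as well as its shape. The sketch below (names are illustrative) also shows that shapes can be passed as a tuple and that most constructors accept a `dtype` argument:

```python
import torch

# Shapes can be passed as separate integers or as a single tuple
a = torch.ones(2, 3)
b = torch.ones((2, 3))
print(a.shape == b.shape)  # True

# Constructors accept a dtype argument (torch.float32 by default)
c = torch.zeros(2, dtype=torch.int64)
print(c)  # tensor([0, 0])

# zeros_like on an integer tensor gives an integer tensor...
ints = torch.tensor([4, 5, 6, 7])
d = torch.zeros_like(ints)
print(d.dtype)  # torch.int64

# ...unless the dtype is overridden explicitly
e = torch.zeros_like(ints, dtype=torch.float32)
print(e)  # tensor([0., 0., 0., 0.])
```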


In [5]:
# Random tensors

# Uniform distribution over [0, 1)
a = torch.rand(1, 3)

# Standard normal distribution (mean 0, standard deviation 1)
b = torch.randn(3, 4)

print(f"a: {a}")
print(f"b: {b}")

a: tensor([[0.9722, 0.1920, 0.1819]])
b: tensor([[-0.7039, -0.2266, -0.4688,  0.4628],
        [-1.3385,  1.3154, -0.2413,  0.6706],
        [-0.0798, -1.1860,  1.4470,  0.1781]])
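Your random values will differ from the ones printed above. If you need the same "random" numbers on every run (for example, to compare experiments), you can seed PyTorch's generator with `torch.manual_seed`. A minimal sketch (the seed value 0 and the variable names are arbitrary choices):

```python
import torch

torch.manual_seed(0)  # fix the state of PyTorch's random number generator
first = torch.rand(1, 3)

torch.manual_seed(0)  # reset it to the same state
second = torch.rand(1, 3)

print(torch.equal(first, second))  # True: same seed, same "random" numbers

# rand() samples from the uniform distribution on [0, 1)
samples = torch.rand(10_000)
print(samples.min().item() >= 0.0, samples.max().item() < 1.0)  # True True
```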


In [6]:
# Number ranges

a = torch.arange(
    0, 10, step=2
)  # creates a tensor with elements from 0 (included) to 10 (excluded), with step size 2
b = torch.linspace(
    0, 5, steps=11
)  # creates a tensor with 11 equally spaced elements from 0 to 5 (both included)

print(f"a: {a}\n")
print(f"b: {b}\n")

a: tensor([0, 2, 4, 6, 8])

b: tensor([0.0000, 0.5000, 1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000, 4.0000,
        4.5000, 5.0000])
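Because `arange` excludes its end point, reproducing the `linspace` result above with `arange` needs an end value just past 5. The sketch below also compares the dtypes the two functions return. It uses a step of 0.5, which is exactly representable in binary; with steps such as 0.1, floating-point rounding can make the length of an `arange` result surprising, which is one reason `linspace` is often preferred for float ranges.

```python
import torch

# arange excludes its end point, so the end must lie past 5.0 to include it
a = torch.arange(0, 5.5, step=0.5)
b = torch.linspace(0, 5, steps=11)

print(len(a), len(b))        # 11 11
print(torch.allclose(a, b))  # True

# With integer arguments arange returns integers; linspace returns floats
print(torch.arange(0, 10, step=2).dtype)     # torch.int64
print(torch.linspace(0, 5, steps=11).dtype)  # torch.float32
```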



### Tensor Operations

In [7]:
# Pointwise tensor arithmetic using regular Python operators

x = torch.tensor([1, 2, 4, 8])
y = torch.tensor([1, 2, 3, 4])

# x + y, x - y, x * y, x / y, x**y  # The `**` is the exponentiation operator

print(f"x + y: {x + y}")
print(f"x - y: {x - y}")
print(f"x * y: {x * y}")
print(f"x / y: {x / y}")
print(f"x**y: {x**y}")

x + y: tensor([ 2,  4,  7, 12])
x - y: tensor([0, 0, 1, 4])
x * y: tensor([ 1,  4, 12, 32])
x / y: tensor([1.0000, 1.0000, 1.3333, 2.0000])
x**y: tensor([   1,    4,   64, 4096])
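Notice that `x / y` returned floats even though `x` and `y` hold integers: `/` always performs true division. If you want integer results, floor division and the remainder work pointwise too. A minimal sketch, reusing the same `x` and `y` values:

```python
import torch

x = torch.tensor([1, 2, 4, 8])
y = torch.tensor([1, 2, 3, 4])

print((x / y).dtype)  # torch.float32: `/` is true division, even for integers
print(x // y)         # tensor([1, 1, 1, 2]): floor division keeps integers
print(torch.div(x, y, rounding_mode="floor"))  # the same as x // y
print(x % y)          # tensor([0, 0, 1, 0]): the remainder
```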


In [8]:
# Pointwise tensor arithmetic using built-in functions

# Adding tensors of the same shape
z = torch.add(x, y)
print(f"Sum of x and y: {z}\n")

# Pointwise multiplication of tensors. The optional 'out' parameter writes the
# result into an existing tensor (here z) instead of creating a new one
torch.multiply(x, y, out=z)

print(f"Pointwise multiplication of x and y: {z}")

Sum of x and y: tensor([ 2,  4,  7, 12])

Pointwise multiplication of x and y: tensor([ 1,  4, 12, 32])
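Besides the `out` parameter, many tensor methods have an in-place variant whose name ends in an underscore (for example `add_` and `mul_`), which modifies the tensor it is called on. A minimal sketch, reusing the same `x` and `y` values (`z` and `w` are illustrative names):

```python
import torch

x = torch.tensor([1, 2, 4, 8])
y = torch.tensor([1, 2, 3, 4])

z = x.clone()  # work on a copy so that x stays unchanged
z.add_(y)      # a trailing underscore marks an in-place operation
print(z)       # tensor([ 2,  4,  7, 12])

z.mul_(2)      # in-place multiplication by a scalar
print(z)       # tensor([ 4,  8, 14, 24])

# The versions without an underscore return a new tensor instead
w = torch.mul(x, y)
print(w)       # tensor([ 1,  4, 12, 32])
print(x)       # tensor([1, 2, 4, 8]): unchanged
```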


In [9]:
# Operations on elements within a tensor

x = torch.rand(3, 3)
print(x)

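# sum(): the axis argument is the dimension that gets collapsed. axis=0 adds
# up the entries of each column, axis=1 adds up the entries of each row.
# (PyTorch's own name for this argument is dim; axis is a NumPy-style alias.)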
print(f"\nSum of every element of x: {x.sum()}")
print(f"Sum of the columns of x: {x.sum(axis=0)}")
print(f"Sum of the rows of x: {x.sum(axis=1)}")

print(f"\nMean value of all elements of x {x.mean()}")
print(f"Mean values of the columns of x {x.mean(axis=0)}")
print(f"Mean values of the rows of x {x.mean(axis=1)}")

tensor([[0.3628, 0.3617, 0.3136],
        [0.7388, 0.7392, 0.1745],
        [0.2801, 0.4414, 0.7089]])

Sum of every element of x: 4.121111869812012
Sum of the columns of x: tensor([1.3817, 1.5424, 1.1970])
Sum of the rows of x: tensor([1.0382, 1.6524, 1.4305])

Mean value of all elements of x 0.45790132880210876
Mean values of the columns of x tensor([0.4606, 0.5141, 0.3990])
Mean values of the rows of x tensor([0.3461, 0.5508, 0.4768])
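Reductions like `sum` and `mean` drop the dimension they collapse. Passing `keepdim=True` keeps it with size 1, which makes the result line up with the original tensor. The sketch below (with illustrative fixed values instead of random ones) uses this to subtract each row's mean from that row; the subtraction relies on PyTorch's broadcasting rules, which follow NumPy's.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])  # shape (2, 3)

col_sums = x.sum(dim=0)  # collapses dimension 0
print(col_sums)          # tensor([5., 7., 9.])
print(col_sums.shape)    # torch.Size([3])

row_means = x.mean(dim=1, keepdim=True)  # keeps the collapsed dim with size 1
print(row_means.shape)                   # torch.Size([2, 1])

# The (2, 1) shape lines up with x, so each row's mean is subtracted from it
centered = x - row_means
print(centered)          # every row becomes [-1., 0., 1.]
```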


In [10]:
# Performing matrix multiplication on Tensors

x = torch.tensor(
    [  # 3x2 matrix
        [1, 2],
        [3, 4],
        [5, 6],
    ]
)

y = torch.tensor(
    [  # 2x3 matrix
        [1, 2, 3],
        [4, 5, 6],
    ]
)

z1 = torch.matmul(x, y)  # (3x2)*(2x3)-->(3x3)
z2 = torch.matmul(y, x)  # (2x3)*(3x2)-->(2x2)

# torch.matmul is different from torch.mul (pointwise multiplication); try what
# torch.mul does on a pair of (1x3) and (3x1) tensors

print(f"z1: {z1}")
print(f"z2: {z2}")

z1: tensor([[ 9, 12, 15],
        [19, 26, 33],
        [29, 40, 51]])
z2: tensor([[22, 28],
        [49, 64]])
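Python's `@` operator is shorthand for `torch.matmul`, and matrix multiplication only works when the inner dimensions match. A minimal sketch with the same `x` and `y` (the flag `matmul_failed` is just an illustrative way to record the error):

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [5, 6]])  # 3x2
y = torch.tensor([[1, 2, 3], [4, 5, 6]])    # 2x3

# The @ operator is shorthand for torch.matmul
print(torch.equal(x @ y, torch.matmul(x, y)))  # True

# The inner dimensions must match: (3x2) @ (3x2) fails because 2 != 3
matmul_failed = False
try:
    torch.matmul(x, x)
except RuntimeError:
    matmul_failed = True
print(matmul_failed)  # True

# Transposing with .T fixes the shapes: (3x2) @ (2x3) -> (3x3)
print((x @ x.T).shape)  # torch.Size([3, 3])
```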


### Manipulating Tensors

#### Indexing elements from the tensor

In [11]:
x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Index the third element of the second row using the [_, _] format
print(x[1, 2])
print(type(x[1, 2]))

# Indexing a single element returns a 0-dimensional (scalar) tensor. Call
# .item() on it to get the stored value as a regular Python number
print(x[1, 2].item())
print(type(x[1, 2].item()))
print("\n")

# Index the first row
print(x[0])

# Index a range of elements: rows 1 to 2 and columns 0 to 1 (slice ends are excluded)
print(x[1:3, 0:2])

# Changing the contents of a portion of the tensor

print(f"x = {x}")
x[0, 0] = -1
x[1:, 2] = 0
x[1:, 0:2] = torch.ones(2, 2)
print(f"\nAfter assignment, x:\n{x}")

tensor(6)
<class 'torch.Tensor'>
6
<class 'int'>


tensor([1, 2, 3])
tensor([[4, 5],
        [7, 8]])
x = tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

After assignment, x:
tensor([[-1,  2,  3],
        [ 1,  1,  0],
        [ 1,  1,  0]])
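A few more indexing patterns that come up often: negative indices, selecting a whole column, slicing with a step, and selecting elements with a boolean mask. A minimal sketch (variable names are illustrative):

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

last_row = x[-1]       # negative indices count from the end
second_col = x[:, 1]   # ':' selects everything along a dimension
corners = x[::2, ::2]  # a step of 2 takes every other row and column
print(last_row)        # tensor([7, 8, 9])
print(second_col)      # tensor([2, 5, 8])
print(corners)         # rows 0 and 2, columns 0 and 2: [[1, 3], [7, 9]]

# A boolean tensor of the same shape picks out the matching elements
mask = x > 5
print(x[mask])         # tensor([6, 7, 8, 9])
```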


#### Changing the shape of a tensor

In [12]:
x = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(f"x = {x}")

# flatten - lets you treat a multidimensional tensor as a single dimension one
y = x.flatten()
print(f"y = {y}")

# reshape - lets you treat a tensor as one with the given shape (x.view() does
# the same thing, but works only for contiguous tensors)
z = x.reshape(4, 3)
print(f"z = {z}\n")

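# Important: for a contiguous tensor like x, flatten and reshape return a view
# of the original tensor rather than a copy. They all share the same memory, so
# a change made through any one of them shows up in all of them. (When a view
# is not possible, reshape and flatten return a copy instead.)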

x[0, 0] = 0
print("After changing x:")
print(f"y = {y}")
print(f"z = {z}")

y[6:] = 0
print("\nAfter changing y:")
print(f"x = {x}")

x = tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
y = tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
z = tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])

After changing x:
y = tensor([ 0,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
z = tensor([[ 0,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])

After changing y:
x = tensor([[0, 2, 3, 4],
        [5, 6, 0, 0],
        [0, 0, 0, 0]])
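Two details about reshaping are worth seeing in action: passing `-1` lets PyTorch infer one dimension, and a non-contiguous tensor (for example, the transpose returned by `.t()`) cannot always be viewed with a new shape. In that case `view()` raises an error while `reshape()` silently makes a copy. A minimal sketch (`view_failed` is an illustrative flag):

```python
import torch

x = torch.arange(12).reshape(3, 4)

# -1 asks PyTorch to infer that dimension from the number of elements
print(x.reshape(2, -1).shape)  # torch.Size([2, 6])

# Transposing gives a non-contiguous tensor: same memory, different strides
t = x.t()
print(t.is_contiguous())       # False

# view() cannot flatten this tensor without copying, so it raises an error...
view_failed = False
try:
    t.view(12)
except RuntimeError:
    view_failed = True
print(view_failed)             # True

# ...while reshape() quietly falls back to making a copy
r = t.reshape(12)
r[0] = 100
print(x[0, 0].item())          # 0: x is unchanged because r is a copy
```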


#### Making copies of the tensor

In [13]:
x = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

y = torch.clone(x)

x[:, 0] = 0
y[0, :] = 0
print("After making the changes to x and y:")
print(f"x = {x}")
print(f"y = {y}")

# Calling flatten or reshape on a clone of the original tensor gives a result
# that does not share memory with the original, so changes to one do not affect the other

After making the changes to x and y:
x = tensor([[ 0,  2,  3,  4],
        [ 0,  6,  7,  8],
        [ 0, 10, 11, 12]])
y = tensor([[ 0,  0,  0,  0],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
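Combining `clone()` with `reshape()` gives a reshaped tensor that no longer shares memory with the original. A minimal sketch comparing it with a plain reshape (variable names are illustrative; `x.clone()` is the method form of `torch.clone(x)`):

```python
import torch

x = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view_of_x = x.reshape(4, 3)          # shares memory with x
copy_of_x = x.clone().reshape(4, 3)  # reshapes an independent copy

x[0, 0] = 0
print(view_of_x[0, 0].item())  # 0: the view sees the change
print(copy_of_x[0, 0].item())  # 1: the copy does not
```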


#### Adding elements to the tensor

In [14]:
# Tensors have a fixed size, so there is no append function. Instead, we use
# torch.cat to create a new tensor that contains the additional elements

x = torch.tensor([[1, 2], [3, 4]])

y = torch.tensor([[5, 6], [6, 7]])

z = torch.tensor([[5, 6, 7], [8, 9, 0]])

# torch.cat(tensors, dim=0) takes in a tuple of tensors to concatenate, and an
# optional dim parameter to specify the dimension along which to concatenate
print(torch.cat((x, y), dim=0))
print(torch.cat((x, z), dim=1))  # try with dim=-1 and guess what that means

# Note: cat(), concat() and concatenate() are all aliases of each other, they
# work in the exact same manner

tensor([[1, 2],
        [3, 4],
        [5, 6],
        [6, 7]])
tensor([[1, 2, 5, 6, 7],
        [3, 4, 8, 9, 0]])


In [15]:
print(torch.cat((x, z), dim=-1))  # dim=-1 means the last dimension, which for a 2D tensor is dim=1

tensor([[1, 2, 5, 6, 7],
        [3, 4, 8, 9, 0]])
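`torch.cat` joins tensors along a dimension they already have, so all other dimensions must match. A close relative, `torch.stack`, instead adds a new dimension and requires identical shapes. To append a single row with `cat`, the row first needs a leading dimension of size 1, which `unsqueeze(0)` adds. A minimal sketch (the `row` values are illustrative):

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6], [6, 7]])

# cat joins along an existing dimension; the other dimensions must match
joined = torch.cat((x, y), dim=0)
print(joined.shape)   # torch.Size([4, 2])

# stack inserts a new dimension; all inputs must have exactly the same shape
stacked = torch.stack((x, y), dim=0)
print(stacked.shape)  # torch.Size([2, 2, 2])

# To append a single row with cat, first give it a leading dimension of size 1
row = torch.tensor([8, 9])                          # shape (2,)
appended = torch.cat((x, row.unsqueeze(0)), dim=0)  # shape (3, 2)
print(appended)       # the rows of x followed by [8, 9]
```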


### Exercises

Practice the things learnt above using the following exercises:

In [16]:
x = torch.zeros(3, 3)
print(x)

# 1. Populate x with the values 0 to 8, keeping its 3x3 shape.

# 2. Print out the element at the second row and third column.

# 3. Change the value of the element at the first row and second column to 10.

# 4. Flatten the tensor and print the result.

# 5. Concatenate the tensor with another 1x3 tensor with values ranging from 9 to 11.

# 6. Reshape the concatenated tensor from exercise 5 (shape 4x3) to a 2x6 tensor and print the result.

# 7. First make a copy of the tensor, and then in the copy, have the elements appear in the reverse order.

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


## GPU vs CPU Execution



Colab gives you the option of running the notebook on a CPU or a GPU (GPU sessions have a limited session time). Here is how you can check whether a GPU is available:

In [17]:
import torch
import numpy as np

print(torch.cuda.is_available())

True



Go to the Runtime tab → Change runtime type and try switching between the T4 GPU and CPU options from the Hardware Accelerator list and rerun the above cell. Switch back to GPU and rerun the cell before moving on.

*Changing the runtime restarts the session, so you will need to import the necessary libraries again; that is why the import statements are repeated in the cell above.*

Let's store the device we want to use in a variable called DEVICE, which we shall use later. To make use of a GPU when one is available, we need to move the tensors (and, later, the model) onto it. We do this in the following manner:

In [18]:
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# we can specify a device when we first create our tensor
x = torch.randn(2, 2, device=DEVICE)
print("dtype of x:", x.dtype)
print("device on which x is stored:", x.device)

# we can also use the .to() method to change the device a tensor lives on
y = torch.randn(2, 2)
# y.type() returns the tensor type, which combines the dtype and the device
print(f"y before calling to() | device: {y.device} | type: {y.type()}")

y = y.to(DEVICE)
print(f"y after calling to() | device: {y.device} | type: {y.type()}")

dtype of x: torch.float32
device on which x is stored: cuda:0
y before calling to() | device: cpu | type: torch.FloatTensor
y after calling to() | device: cuda:0 | type: torch.cuda.FloatTensor
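In general, all tensors in an operation need to be on the same device (combining a CUDA tensor with a regular CPU tensor raises an error), and NumPy can only read CPU memory. The sketch below, assuming the same `DEVICE` convention as above, runs unchanged with or without a GPU; the tensors `a`, `b`, `c` and the use of `torch.full` are illustrative additions, not part of the original notebook.

```python
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.ones(2, 2).to(DEVICE)
b = torch.full((2, 2), 3.0, device=DEVICE)

# Both operands live on DEVICE, so the addition runs there
c = a + b

# NumPy only works with CPU memory, so move the result back with .cpu() first
c_np = c.cpu().numpy()
print(c_np)  # a 2x2 NumPy array filled with 4.0
```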


Resources used to make this tutorial: [Neuromatch Academy PyTorch Tutorial](https://deeplearning.neuromatch.io/tutorials/W1D1_BasicsAndPytorch/student/W1D1_Tutorial1.html), [PyTorch - Learn The Basics](https://pytorch.org/tutorials/beginner/basics/intro.html)