In [1]:
import torch
from time import time
import numpy as np


[PyTorch](https://pytorch.org/) is an open-source machine learning framework that accelerates the path from research prototyping to production deployment. For installation instructions, see the [PyTorch](https://pytorch.org/) website. We will import PyTorch and use it in today's lab to implement simple and convolutional neural networks.


Let's start by checking the version of PyTorch.

In [2]:
print("Using torch", torch.__version__)

Using torch 1.13.1+cpu


In many code examples you will see PyTorch's manual_seed call. It seeds the random number generator so that results are reproducible. Run the code below several times, with and without the manual seed line, and you will see that the random number generator produces the same numbers every time the seed is set.

In [6]:
#torch.manual_seed(1234)
torch.randn(2)

tensor([0.0461, 0.4024])
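
The effect of the seed can also be checked in a single run. The following is a minimal sketch; the variable names first, second and third are illustrative and not part of the lab code.

```python
import torch

# Seeding the global generator makes the next draws repeatable.
torch.manual_seed(1234)
first = torch.randn(2)

# Resetting to the same seed restarts the same sequence.
torch.manual_seed(1234)
second = torch.randn(2)

# Without re-seeding, the generator simply continues its sequence.
third = torch.randn(2)

print(torch.equal(first, second))  # same seed, same numbers
print(torch.equal(second, third))  # continued sequence, different numbers
```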

The basic data structure in PyTorch (and, for that matter, in most other deep learning libraries) is the Tensor. The simplest definition is: "a Tensor is a multi-dimensional matrix".

There are multiple ways of defining tensors in PyTorch; below we show two of them. Run the code and see what tensors are produced. For ease of notation, I use {variable name}_t to denote tensor variables in the code.

In [None]:
# Create PyTorch Tensor

In [None]:
x=[[1,0],[0,1]]
x_t=torch.Tensor(x)
print(x_t)
print(x_t.shape)

print(x_t.dtype)
x_t = x_t.int()
print(x_t.dtype)


x=np.random.rand(3,3)
x_t=torch.from_numpy(x)
print(x_t)

tensor([[1., 0.],
        [0., 1.]])
torch.Size([2, 2])
torch.float32
torch.int32
tensor([[0.8042, 0.1620, 0.9258],
        [0.0756, 0.9674, 0.1743],
        [0.5089, 0.7478, 0.2859]], dtype=torch.float64)
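
The two constructors above differ in how they pick a data type, which explains the float32 and float64 results in the output. The sketch below illustrates this under the default PyTorch settings (default dtype float32); it also uses torch.tensor, which is not used in the cell above, and all variable names are illustrative.

```python
import numpy as np
import torch

data = [[1, 0], [0, 1]]

# torch.Tensor(...) builds the default tensor type (float32 unless changed).
a = torch.Tensor(data)
# torch.tensor(...) infers the dtype from the data (int64 for Python ints).
b = torch.tensor(data)
# An explicit dtype can be requested as well.
c = torch.tensor(data, dtype=torch.float64)
print(a.dtype, b.dtype, c.dtype)

# torch.from_numpy keeps the NumPy dtype and shares memory with the array.
arr = np.zeros((2, 2))
t = torch.from_numpy(arr)
arr[0, 0] = 7.0           # modify the NumPy array in place
print(t[0, 0].item())     # the tensor sees the change
```

Because of the shared memory, an array that should stay untouched is better copied first.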


Tensors in PyTorch have several useful attributes (properties), including shape, requires_grad, dtype, and device. The shape attribute gives the shape of the Tensor, similar to the shape of a matrix. requires_grad is a flag that shows whether gradients w.r.t. the tensor are calculated. If requires_grad is True, PyTorch keeps track of the gradient of the [output] w.r.t. the Tensor variable. dtype gives the data type of the tensor. device gives the device on which the tensor resides: cpu or cuda (GPU). We will discuss why this is an important feature of the Tensor and how it helps us speed up our code.

In [None]:
# Tensor as an Object

In [None]:
x_t.requires_grad = True
print(x_t.requires_grad)

True


In [None]:
print(x_t.requires_grad)
x_t.requires_grad = True
print(x_t.requires_grad)
print(x_t.device)

x_t = x_t.to("cuda")
print(x_t.device)

True
True
cpu
cuda:0


In [None]:
print(x_t.shape)
print(x_t.requires_grad)
print(x_t.dtype)
print(x_t.device)

x_t=torch.from_numpy(x)
x_t.requires_grad = True

print(x_t.requires_grad)

torch.Size([3, 3])
True
torch.float64
cuda:0
True
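
One detail worth knowing when setting requires_grad is that only floating point (and complex) tensors can require gradients. The following minimal sketch, with illustrative names, reads the four attributes and shows what happens for an integer tensor.

```python
import torch

t = torch.ones(2, 3)
print(t.shape, t.dtype, t.device, t.requires_grad)

# Floating point tensors accept the flag.
t.requires_grad = True

# Integer tensors do not: PyTorch raises a RuntimeError.
int_t = torch.ones(2, 3, dtype=torch.int32)
raised = False
try:
    int_t.requires_grad = True
except RuntimeError as err:
    raised = True
    print("integer tensors cannot require gradients:", err)
```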


A GPU (graphics processing unit) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Source: [GPUs](https://en.wikipedia.org/wiki/Graphics_processing_unit#:~:text=A%20graphics%20processing%20unit%20(GPU,%2C%20workstations%2C%20and%20game%20consoles.)


GPUs are extremely beneficial for running matrix operations and, as we will see, most of deep learning is composed of matrix operations, so we can significantly speed up our code by using GPUs.

There are several methods and attributes in PyTorch to manage the device on which Tensors can reside. Below we look at a few.

GPUs are specialized, expensive pieces of equipment and might not be available on all machines. To check whether a GPU is available on a machine, we can use the torch.cuda.is_available() method, which returns True if a GPU is available.

Note: To use a GPU on Google Colab you can select the option under Edit -> Notebook settings.


To move a Tensor between CPU and GPU we can use the Tensor.to(...) method.

In [None]:
# Devices: GPU or CPU

In [None]:
if torch.cuda.is_available():
  print("I have a GPU in this machine")
else:
  print("No GPU is available in this machine")

I have a GPU in this machine


In [None]:
print(torch.cuda.device_count())

1


In [None]:
!nvidia-smi

Wed Jun  7 10:28:44 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P8     9W /  70W |      3MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
if torch.cuda.is_available():
  device = torch.device("cuda:0")

else:
  device = torch.device("cpu")

x_t = x_t.to(device)
print(x_t.device)

cuda:0


In [None]:
x_t = x_t.to("cpu")
print(x_t.device)

cpu


In [None]:
print(torch.cuda.is_available())

True


In [None]:
x_t = x_t.to("cuda")
print(x_t.device)
print(x_t*5)

x_w = x_t.to("cuda:0")
print(x_w*5)

cuda:0
tensor([[3.5955, 3.5288, 0.9141],
        [4.0810, 4.3005, 4.8160],
        [4.0865, 2.4850, 1.5268]], device='cuda:0', dtype=torch.float64,
       grad_fn=<MulBackward0>)
tensor([[3.5955, 3.5288, 0.9141],
        [4.0810, 4.3005, 4.8160],
        [4.0865, 2.4850, 1.5268]], device='cuda:0', dtype=torch.float64,
       grad_fn=<MulBackward0>)


In [None]:
print(torch.cuda.is_available())

print(torch.cuda.device_count())
if torch.cuda.is_available():
  device=torch.device("cuda:0")
else:
  device=torch.device("cpu")


print(device)
device=torch.device("cpu")
x_t_cpu=x_t.to(device)
print(x_t_cpu.device)
device=torch.device("cuda")
x_t_gpu=x_t.to(device)
print(x_t_gpu.device)

True
1
cuda:0
cpu
cuda:0


In [None]:
use_gpu=True
# fall back to the CPU when no GPU is present
if use_gpu and torch.cuda.is_available():
  device=torch.device("cuda:0")
else:
  device=torch.device("cpu")

PyTorch has dedicated functions to create tensors of particular types. Some of these are shown below: torch.zeros creates a tensor of zeros, torch.ones creates a tensor of ones, torch.eye creates an identity matrix, torch.rand creates a tensor of random values sampled from the uniform distribution on [0, 1), torch.randn creates a tensor of random values sampled from the standard normal (Gaussian) distribution, and torch.arange(N) creates the whole numbers from 0 to N-1.

In [None]:
# Create Common Tensors

In [None]:
x = torch.rand((1,3))
print(x)

tensor([[0.3904, 0.6009, 0.2566]])


In [None]:
x=torch.zeros(3,3)
print(x)
x=torch.ones(2,2)
print(x)
x=torch.eye(2)
print(x)
x=torch.rand(2,1)
print(x)
x=torch.randn((3,2))
print(x)
x=torch.arange(5)
print(x)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
tensor([[1., 1.],
        [1., 1.]])
tensor([[1., 0.],
        [0., 1.]])
tensor([[0.7936],
        [0.9408]])
tensor([[ 1.5231,  0.6647],
        [-1.0324, -0.2770],
        [-0.1671, -0.1079]])
tensor([0, 1, 2, 3, 4])
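
These factory functions accept a few more arguments than the cell above uses. As a small sketch (the names ref, ones_like_ref, steps and grid are illustrative), a dtype can be passed explicitly, the *_like variants copy shape and dtype from an existing tensor, and torch.arange takes a start, an exclusive end and a step.

```python
import torch

# Explicit dtype instead of the default float32.
ref = torch.zeros(2, 3, dtype=torch.float64)

# Same shape, dtype and device as ref.
ones_like_ref = torch.ones_like(ref)

# start=2, end=10 (exclusive), step=2
steps = torch.arange(2, 10, 2)

# 5 evenly spaced points from 0 to 1, both ends included.
grid = torch.linspace(0.0, 1.0, steps=5)

print(ones_like_ref.dtype, steps, grid)
```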


Tensor operations:
Most NumPy operations are also available in PyTorch, including addition (+), subtraction (-), multiplication (*), division (/), matrix multiplication (@), and power (**). Apart from @, these operators act element-wise.

In [None]:
# Tensor operations

In [None]:
x=torch.randn((3,3))
y=torch.eye(3)
print(x,y)

zsum=x+y
zdiff=x-y
zprod=x*y
zdiv=y/x
# @ is read as at
zmatmul=x@y
zpow=x**y

print(f"element-wise sum of the two tensors is {zsum}")
print(f"element-wise difference of the two tensors is {zdiff}")
print(f"element-wise product of the two tensors is {zprod}")
print(f"element-wise division of the two tensors is {zdiv}")
print(f"element-wise power of the two tensors is {zpow}")
print(f"matrix multiplication of the two tensors is {zmatmul}")



tensor([[-1.4285, -0.2810,  0.7489],
        [ 1.1164,  1.2931,  0.4137],
        [-0.5710, -0.9749,  0.1863]]) tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
element-wise sum of the two tensors is tensor([[-0.4285, -0.2810,  0.7489],
        [ 1.1164,  2.2931,  0.4137],
        [-0.5710, -0.9749,  1.1863]])
element-wise difference of the two tensors is tensor([[-2.4285, -0.2810,  0.7489],
        [ 1.1164,  0.2931,  0.4137],
        [-0.5710, -0.9749, -0.8137]])
element-wise product of the two tensors is tensor([[-1.4285, -0.0000,  0.0000],
        [ 0.0000,  1.2931,  0.0000],
        [-0.0000, -0.0000,  0.1863]])
element-wise division of the two tensors is tensor([[-0.7000, -0.0000,  0.0000],
        [ 0.0000,  0.7733,  0.0000],
        [-0.0000, -0.0000,  5.3664]])
element-wise power of the two tensors is tensor([[-1.4285,  1.0000,  1.0000],
        [ 1.0000,  1.2931,  1.0000],
        [ 1.0000,  1.0000,  0.1863]])
matrix multiplication of the two t
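
The difference between * and @ is easier to see on small, fixed tensors than on random ones. The sketch below uses illustrative values and names; it also shows broadcasting, where a 1-D tensor is added to every row of a matrix.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[0.0, 1.0], [1.0, 0.0]])

elementwise = a * b   # multiplies matching entries
matmul = a @ b        # row-by-column matrix product

# Broadcasting: the 1-D tensor is added to each row of a.
row = torch.tensor([10.0, 20.0])
shifted = a + row

print(elementwise)
print(matmul)
print(shifted)
```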

We can use Tensor1 [operation]= Tensor2 for in-place operations on Tensor1.

Several PyTorch functions also have an in-place version, with an underscore appended to the function name.

In [None]:
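# PyTorch also has in-place operations

x+=y  # x=x+y
x-=y  # x=x-y
y*=x  # y=y*x
y/=x  # y=y/x
#x**=y  # x=x**y
x@=y  # x=x@y


x.clamp(-0.5,0.5)  # out-of-place: returns a new tensor, x is unchanged
print(x)
x.clamp_(-0.5,0.5)  # in-place: x itself is clamped
print(x)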

tensor([[-1.4285, -0.2810,  0.7489],
        [ 1.1164,  1.2931,  0.4137],
        [-0.5710, -0.9749,  0.1863]])
tensor([[-0.5000, -0.2810,  0.5000],
        [ 0.5000,  0.5000,  0.4137],
        [-0.5000, -0.5000,  0.1863]])
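
One way to confirm that an underscore method really works in place is to compare the memory address of the tensor's data before and after the call. This is a minimal sketch using Tensor.data_ptr(); the variable names are illustrative.

```python
import torch

x = torch.tensor([-1.0, 0.2, 3.0])
ptr_before = x.data_ptr()   # address of the first element of x

y = x.clamp(-0.5, 0.5)      # out-of-place: new tensor, x is unchanged
x_unchanged = torch.equal(x, torch.tensor([-1.0, 0.2, 3.0]))

x.clamp_(-0.5, 0.5)         # in-place: x is overwritten
same_storage = x.data_ptr() == ptr_before

print(x_unchanged, same_storage, x)
```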


PyTorch uses the same indexing and slicing conventions as NumPy.

In [None]:
# Indexing and slicing Tensor, just like Numpy

In [None]:

x=torch.Tensor([[1,2,3],[4,5,6],[7,8,9]])
print(x)
print(x.shape)
# an element can be accessed by specifying the row and column
print(x[1,1])
# accessing rows and columns
print(x[1,:], x[:,1])
print(x[1,:].shape, x[:,1].shape)
# accessing rows and columns while keeping the number of dimensions
print(x[1:2,:], x[:,1:2])
print(x[1:2,:].shape, x[:,1:2].shape)


tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
torch.Size([3, 3])
tensor(5.)
tensor([4., 5., 6.]) tensor([2., 5., 8.])
torch.Size([3]) torch.Size([3])
tensor([[4., 5., 6.]]) tensor([[2.],
        [5.],
        [8.]])
torch.Size([1, 3]) torch.Size([3, 1])


In [None]:

x=torch.Tensor([[1,2,3],[4,5,6],[7,8,9]])
print(x)


print(x[1:,1:])


tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
tensor([[5., 6.],
        [8., 9.]])
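
A few more NumPy-style patterns carry over unchanged. The sketch below, with illustrative names, shows negative indices, strided slices and boolean masks on the same 3x3 tensor.

```python
import torch

x = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

last_row = x[-1]          # negative index counts from the end
corners = x[::2, ::2]     # every second row and column
large = x[x > 5]          # boolean mask, result is 1-D

print(last_row)
print(corners)
print(large)
```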


Other common Tensor operations in PyTorch are concatenation and reshaping.

In [None]:

z=torch.cat((x,y), dim=1)
print(z)

z2=z.reshape(9,2)
print(z2)
z2=torch.reshape(z,(9,2))
print(z2)

tensor([[1., 2., 3., 1., 0., 0.],
        [4., 5., 6., 0., 1., 0.],
        [7., 8., 9., 0., 0., 1.]])
tensor([[1., 2.],
        [3., 1.],
        [0., 0.],
        [4., 5.],
        [6., 0.],
        [1., 0.],
        [7., 8.],
        [9., 0.],
        [0., 1.]])
tensor([[1., 2.],
        [3., 1.],
        [0., 0.],
        [4., 5.],
        [6., 0.],
        [1., 0.],
        [7., 8.],
        [9., 0.],
        [0., 1.]])
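
The dim argument of torch.cat decides which axis grows, and reshape can infer one dimension when it is given as -1. The following is a small sketch with illustrative names and values.

```python
import torch

a = torch.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
b = torch.ones(2, 3, dtype=a.dtype)

rows = torch.cat((a, b), dim=0)        # stack below: shape (4, 3)
cols = torch.cat((a, b), dim=1)        # stack beside: shape (2, 6)

flat = cols.reshape(-1)                # flatten to 12 elements
auto = cols.reshape(3, -1)             # -1 is inferred as 4

print(rows.shape, cols.shape, flat.shape, auto.shape)
```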


A unit dimension can be removed with squeeze and added with unsqueeze.

In [None]:

x=torch.rand((1,2,3,1,5))
print(x.shape)

x_u=x.unsqueeze(dim=0)
print(x_u.shape)

x.unsqueeze_(dim=1)
print(x.shape)



print(x.shape)
x_s=x.squeeze()
print(x_s.shape)

torch.Size([1, 2, 3, 1, 5])
torch.Size([1, 1, 2, 3, 1, 5])
torch.Size([1, 1, 2, 3, 1, 5])
torch.Size([1, 1, 2, 3, 1, 5])
torch.Size([2, 3, 5])
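
Calling squeeze() without arguments removes every unit dimension, which is not always wanted. As a sketch with illustrative names, passing dim restricts the operation to a single axis, and squeezing an axis whose size is not 1 leaves the tensor unchanged.

```python
import torch

x = torch.rand(1, 2, 3, 1, 5)

only_first = x.squeeze(dim=0)     # removes axis 0 only
no_effect = x.squeeze(dim=1)      # axis 1 has size 2, nothing happens
moved = x.squeeze(dim=3).unsqueeze(dim=-1)  # drop axis 3, add one at the end

print(only_first.shape, no_effect.shape, moved.shape)
```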


In [None]:
# GPU vs CPU

Why use a GPU?
Example of matrix multiplication on CPU and GPU.

Important points to consider:
Always use vectorized operations
Always use PyTorch functions
When possible, always use GPUs for matrix operations

In [None]:
# use vectorized operations and broadcasting
# use GPU for vector/matrix operations
siz=1000

x=torch.randn((siz,siz))
y=torch.randn((siz,siz))
z=torch.zeros((siz,siz))
st=time()
for i in range(siz):
  for j in range(siz):
    z[i,j]=torch.dot(x[i,:],y[:,j])

ed=time()
print(f"time taken by the nested loop multiplication is {ed-st} seconds")
st=time()
for i in range(siz):
  z[i,:]=torch.mm(x[i,:].unsqueeze(dim=0),y)

ed=time()
print(f"time taken by the single loop multiplication is {ed-st} seconds")
st=time()
z=torch.mm(x,y)

ed=time()
print(f"time taken by multiplication on CPU is {ed-st} seconds")

device=torch.device("cuda:0")
x=x.to(device)
y=y.to(device)
z=z.to(device)
st=time()
z=torch.mm(x,y)
ed=time()
print(f"time taken by multiplication on GPU is {ed-st} seconds")


time taken by the nested loop multiplication is 23.62109661102295 seconds
time taken by the single loop multiplication is 0.16016221046447754 seconds
time taken by multiplication on CPU is 0.03321266174316406 seconds
time taken by multiplication on GPU is 0.1290431022644043 seconds
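
In the output above the GPU appears slower than the CPU. A likely reason is that the first CUDA operation includes one-time start-up cost, and that CUDA kernels are launched asynchronously, so a single un-synchronized measurement says little about steady-state speed. The sketch below is one possible way to time more carefully, not the lab's method: the helper timed_mm and its repeats parameter are hypothetical, it does a warm-up run, calls torch.cuda.synchronize() before reading the clock, and only touches the GPU when one is available.

```python
import torch
from time import perf_counter

def timed_mm(a, b, repeats=10):
    """Average wall-clock time of torch.mm(a, b) on the tensors' device."""
    on_gpu = a.is_cuda
    out = torch.mm(a, b)            # warm-up run, excluded from the timing
    if on_gpu:
        torch.cuda.synchronize()    # wait for the warm-up to finish
    start = perf_counter()
    for _ in range(repeats):
        out = torch.mm(a, b)
    if on_gpu:
        torch.cuda.synchronize()    # wait for all queued GPU kernels
    return (perf_counter() - start) / repeats, out

siz = 1000
x = torch.randn(siz, siz)
y = torch.randn(siz, siz)

cpu_time, z_cpu = timed_mm(x, y)
print(f"average CPU time: {cpu_time} seconds")

if torch.cuda.is_available():
    gpu_time, z_gpu = timed_mm(x.to("cuda"), y.to("cuda"))
    print(f"average GPU time: {gpu_time} seconds")
else:
    print("no GPU available, skipping the GPU measurement")
```

The actual numbers depend on the hardware, so no expected timings are given here.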


Autograd example with PyTorch

In [None]:
# Get derivatives automatically

In [None]:
x=torch.randn((1,1), requires_grad=True)
y=torch.Tensor([5.0])
y.requires_grad=True

z=2*x+y*y
# dz/dx = 2
# dz/dy = 2*y

print(x.grad)
print(x.grad_fn)
print(x.data)

z.backward()
print("After backpass")
print("dz/dx")
print(x.grad)
print(x.grad_fn)
print(x.data)


print(y.grad)
print(y.grad_fn)
print(y.data)

None
None
tensor([[1.6273]])
After backpass
dz/dx
tensor([[2.]])
None
tensor([[1.6273]])
tensor([10.])
None
tensor([5.])


In [None]:
x=torch.randn((1,1), requires_grad=True)
y=torch.Tensor([5.0])
y.requires_grad=True

z=2*x+torch.sin(y)
z.backward()
print(f"dz/dx is {x.grad}")
print(f"dz/dy is {y.grad}")

dz/dx is tensor([[2.]])
dz/dy is tensor([0.2837])


In [None]:
x=torch.randn((1,1), requires_grad=True)
y=torch.Tensor([5.0])
y.requires_grad=True

z=2*x+y*y*torch.exp(-y)

z.backward()
print("After backpass")
print("dz/dx")
print(x.grad)
print("dz/dy")
print(y.grad)


After backpass
dz/dx
tensor([[2.]])
dz/dy
tensor([-0.1011])
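
The value of dz/dy can be checked by hand. For the y-dependent term, the product rule gives d/dy (y^2 e^{-y}) = (2y - y^2) e^{-y}, which at y = 5 is -15 e^{-5}, roughly -0.1011. The sketch below compares this formula with autograd; the name analytic is illustrative.

```python
import math
import torch

y = torch.tensor([5.0], requires_grad=True)
z = y * y * torch.exp(-y)
z.backward()

# Product rule: d/dy (y^2 * e^{-y}) = (2y - y^2) * e^{-y}
analytic = (2 * 5.0 - 5.0 ** 2) * math.exp(-5.0)

print(y.grad.item(), analytic)
```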


In [None]:
x=torch.randn((1,1), requires_grad=True)
y=torch.Tensor([5.0])
y.requires_grad=True

z=2*x+y*y
# dz/dx = 2
# dz/dy = 2*y

print(x.grad)
print(x.grad_fn)
print(x.data)
z.backward()
print("After backpass")
print("dz/dx")
print(x.grad)
print(z.grad_fn)
print(x.data)

print("dz/dy")
print(y.grad)
print(z)
print(y.data)
print(z.requires_grad)
with torch.no_grad():
  z=2*x+y*y
  print(z.requires_grad)


# detaching the variable and converting back to numpy
x=x.detach().cpu().numpy()
print(type(x))

None
None
tensor([[0.0106]])
After backpass
dz/dx
tensor([[2.]])
<AddBackward0 object at 0x7f0404c33c10>
tensor([[0.0106]])
dz/dy
tensor([10.])
tensor([[25.0212]], grad_fn=<AddBackward0>)
tensor([5.])
True
False
<class 'numpy.ndarray'>
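
The detach() call in the last step is needed because PyTorch refuses to convert a tensor that requires gradients directly to NumPy. The following minimal sketch, with illustrative names, shows that behaviour together with the effect of torch.no_grad().

```python
import torch

x = torch.randn(1, 1, requires_grad=True)

# Direct conversion fails while x is tracked by autograd.
needs_detach = False
try:
    x.numpy()
except RuntimeError:
    needs_detach = True

# detach() returns a tensor without gradient tracking.
arr = x.detach().cpu().numpy()

# Inside no_grad, results are not tracked either.
with torch.no_grad():
    w = 2 * x

print(needs_detach, type(arr), w.requires_grad)
```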
