## Outline
1. PyTorch
2. What are tensors
3. Initialising, slicing, reshaping tensors
4. NumPy and PyTorch interfacing
5. GPU support for PyTorch + Enabling GPUs on Google Colab
6. Speed comparisons, NumPy - PyTorch - PyTorch on GPU
7. Autograd concepts and application
8. Writing basic learning loop using autograd
9. Exercises

In [0]:
import torch
import numpy as np
import matplotlib.pyplot as plt

Tensor operations are very similar to NumPy's.
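As a quick illustration of the parallel, here is a small sketch that puts a few NumPy calls next to their PyTorch counterparts. The particular functions are just examples, not a complete mapping. One difference worth knowing is that NumPy defaults to float64 while PyTorch defaults to float32.

```python
import numpy as np
import torch

# Equivalent constructors in NumPy and PyTorch
a_np = np.ones((3, 2))      # default dtype float64
a_pt = torch.ones(3, 2)     # default dtype float32

# Equivalent reductions and elementwise maths
print(a_np.sum(), a_pt.sum())              # NumPy gives a float, PyTorch a 0-d tensor
print(np.exp(a_np)[0], torch.exp(a_pt)[0])

# Shapes are reported in a similar way
print(a_np.shape, a_pt.shape)              # (3, 2) torch.Size([3, 2])
print(a_np.dtype, a_pt.dtype)              # float64 torch.float32
```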

# Initialise tensors

Initialise using built-in functions

In [0]:
x = torch.ones(3,2)
print(x)
x = torch.zeros(3,2)
print(x)
x = torch.rand(3,2)
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
tensor([[0.9766, 0.0918],
        [0.1453, 0.3810],
        [0.4336, 0.8751]])


In [0]:
x = torch.empty(3,2)
# Allocates memory without initialising it, so x holds whatever values were already there (possibly even NaN)
print(x)
y = torch.zeros_like(x)
print(y)

tensor([[2.1517e-36, 0.0000e+00],
        [3.3631e-44, 0.0000e+00],
        [       nan, 0.0000e+00]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
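The *_like constructors copy both the shape and the dtype of their argument, which is what makes zeros_like a convenient way to get a correctly sized buffer. A small sketch; the integer tensor is just an example input.

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [5, 6]])   # integer data -> dtype torch.int64

z = torch.zeros_like(x)                       # same shape (3, 2) and same dtype (int64)
o = torch.ones_like(x, dtype=torch.float32)   # the dtype can be overridden
e = torch.empty_like(x)                       # same shape, contents uninitialised
e.fill_(7)                                    # so overwrite every element before use

print(z.dtype, o.dtype)
print(e)
```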


In [0]:
x = torch.linspace(0,1, steps = 5)
# Both endpoints, 0 and 1, are included
print(x)

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
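For contrast, torch.arange takes a step size instead of a number of points and, like Python's range, excludes the end point. A minimal comparison:

```python
import torch

lin = torch.linspace(0, 1, steps=5)   # 5 evenly spaced points, end point included
ar = torch.arange(0, 1, 0.25)         # step of 0.25, end point (1) excluded

print(lin)
print(ar)
```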


Initialising manually

In [0]:
x = torch.tensor([[1,2],
                  [3,4],
                  [5,6]])
print(x)

tensor([[1, 2],
        [3, 4],
        [5, 6]])
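When a tensor is built from a Python list, its dtype is inferred from the values: integers give int64, floats give float32 (PyTorch's default float type). A short sketch showing how to override this with the dtype argument:

```python
import torch

ints = torch.tensor([[1, 2], [3, 4]])                           # Python ints   -> torch.int64
floats = torch.tensor([[1.0, 2.0], [3.0, 4.0]])                 # Python floats -> torch.float32
explicit = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)  # force a dtype

print(ints.dtype, floats.dtype, explicit.dtype)
```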


# Slicing tensors

In [0]:
print(x.size()) # same as x.shape, like .shape in NumPy
print(x[:,1])
print(x[0,:])

torch.Size([3, 2])
tensor([2, 4, 6])
tensor([1, 2])
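Other NumPy-style indexing works too: negative indices, slices with a start, and boolean masks. A small sketch on the same 3x2 tensor:

```python
import torch

x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])

print(x[-1])       # last row
print(x[1:, 0])    # rows 1 onwards, column 0
print(x[x > 3])    # boolean mask: matching elements, flattened in row-major order
```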


In [0]:
y = x[1,1]
print(y) # y is still a tensor (with zero dimensions)
print(y.item()) # .item() returns the plain Python number held by a one-element tensor

tensor(4)
4
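.item() only works on tensors with exactly one element; for larger tensors, .tolist() converts the whole tensor to nested Python lists. A sketch (the exception type has varied across PyTorch versions, so both RuntimeError and ValueError are caught):

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [5, 6]])

print(x[1, 1].item())   # a plain Python int
print(x.tolist())       # nested Python lists

raised = False
try:
    x.item()            # 6 elements, so .item() is not allowed
except (RuntimeError, ValueError) as err:
    raised = True
    print("error:", err)
```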


# Reshaping tensors

In [0]:
print(x)
y = x.view(2,3) # Similar to numpy.reshape()
print(y)

tensor([[1, 2],
        [3, 4],
        [5, 6]])
tensor([[1, 2, 3],
        [4, 5, 6]])


In [0]:
y = x.view(6, -1) # Fix one dimension to 6; -1 tells PyTorch to infer the other so that all elements fit
print(y)

tensor([[1],
        [2],
        [3],
        [4],
        [5],
        [6]])
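Two things view does differently from a copy: the result shares memory with the original, and it only works when the memory layout is compatible. After a transpose, for example, view can fail where reshape (which copies when it has to) succeeds. A sketch:

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [5, 6]])
y = x.view(2, 3)
y[0, 0] = 100              # view shares storage with x ...
print(x[0, 0].item())      # ... so x changes too

xt = x.t()                 # transpose: a non-contiguous view of shape (2, 3)
view_failed = False
try:
    xt.view(6)             # incompatible strides for a flat view
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = xt.reshape(6)       # reshape copies if needed, so this works
print(flat.tolist())       # [100, 3, 5, 2, 4, 6]
```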


Mismatched tensor dimensions are one of the biggest sources of bugs in tensor code and deep learning, so it is important to know what each axis means (e.g. batch, input features, weights).
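As a hypothetical example of why axis meanings matter: suppose a batch of 4 inputs, each with 3 features, is multiplied by a weight matrix mapping 3 features to 2 outputs. The sizes are made up for illustration; the point is that the inner dimensions must agree, and swapping the operands raises an error.

```python
import torch

batch = torch.rand(4, 3)     # axis 0: batch (4 examples), axis 1: input features (3)
weights = torch.rand(3, 2)   # axis 0: input features (3), axis 1: outputs (2)

out = batch @ weights        # (4, 3) @ (3, 2) -> (4, 2): inner dimensions match
print(out.shape)

mismatch = False
try:
    weights @ batch          # (3, 2) @ (4, 3): inner dimensions 2 and 4 differ
except RuntimeError as err:
    mismatch = True
    print("shape mismatch:", err)
```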

# Simple Tensor Operations

In [0]:
# Pointwise Operations
x = torch.ones([3,2])
y = torch.ones([3,2])
z = x + y
print(z)
z = x-y
print(z)
z = x*y
print(z)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
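Note that * is elementwise, not matrix multiplication; the matrix product uses @ (or torch.matmul). Pointwise operations also broadcast, as in NumPy. A sketch:

```python
import torch

x = torch.ones(3, 2)
y = torch.ones(3, 2)

elementwise = x * y                        # shape stays (3, 2), every entry 1 * 1 = 1
matrix = x @ y.T                           # (3, 2) @ (2, 3) -> (3, 3), entries 1*1 + 1*1 = 2
shifted = x + torch.tensor([10.0, 20.0])   # broadcasting: the (2,) row is added to every row

print(elementwise.shape, matrix.shape)
print(shifted)
```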


In [0]:
z = y.add(x)
print(z)
print(y)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])


In [0]:
z = y.add_(x) # Trailing _ means in-place: adds x into y and returns y itself, so z and y are the same tensor
print(z)
print(y)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])


In-place operations are useful if you don't want to allocate a new tensor every time.
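A sketch of two ways to avoid allocating new tensors: an in-place method returns the very tensor it modified (same memory), and many functions accept an out= argument to write into a preallocated buffer.

```python
import torch

x = torch.ones(3, 2)
y = torch.ones(3, 2)

z = y.add_(x)                          # in-place: y now holds y + x = 2
print(z.data_ptr() == y.data_ptr())    # z and y share the same memory

buf = torch.empty(3, 2)                # preallocated output buffer
torch.add(x, y, out=buf)               # writes x + y (1 + 2 = 3) into buf
print(buf)
```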

# NumPy <> PyTorch

Converting PyTorch to NumPy array

In [23]:
# x was initialised above as a 3x2 PyTorch tensor of ones
x_np = x.numpy()
print(type(x), type(x_np))
print(x_np)

<class 'torch.Tensor'> <class 'numpy.ndarray'>
[[1. 1.]
 [1. 1.]
 [1. 1.]]


Converting NumPy array to PyTorch tensor

In [24]:
a = np.random.randn(5)
print(a)
a_pt = torch.from_numpy(a)
print(type(a), type(a_pt))
print(a_pt)

[ 0.29573148 -0.15142621  1.43368774  0.96222957 -0.51295352]
<class 'numpy.ndarray'> <class 'torch.Tensor'>
tensor([ 0.2957, -0.1514,  1.4337,  0.9622, -0.5130], dtype=torch.float64)


In [25]:
np.add(a, 1, out=a)
print(a)
print(a_pt)

[1.29573148 0.84857379 2.43368774 1.96222957 0.48704648]
tensor([1.2957, 0.8486, 2.4337, 1.9622, 0.4870], dtype=torch.float64)


Although we only added to a, a_pt was updated too. torch.from_numpy bridges rather than copies: both objects refer to the same underlying memory, so a change in one is visible in the other. This is very useful when an operation is more convenient in one library than the other: existing NumPy code can keep working on the array while PyTorch operates on the same data as a tensor.
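To see the difference between bridging and copying, here is a sketch contrasting torch.from_numpy (shared memory) with torch.tensor (an independent copy). .numpy() on a CPU tensor shares memory in the same way. Note that from_numpy keeps NumPy's float64 dtype, as in the output above.

```python
import numpy as np
import torch

a = np.zeros(3)                 # float64 NumPy array
shared = torch.from_numpy(a)    # shares memory with a (dtype stays float64)
copied = torch.tensor(a)        # makes an independent copy

a += 1                          # in-place change to the NumPy array
print(shared)                   # reflects the change
print(copied)                   # still zeros

back = shared.numpy()           # .numpy() on a CPU tensor also shares memory
shared.mul_(10)                 # changes shared, a and back together
print(back)
```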

In [26]:
%%time
for i in range(100):
  a = np.random.randn(100, 100)
  b = np.random.randn(100, 100)
  c = a + b

CPU times: user 102 ms, sys: 105 µs, total: 103 ms
Wall time: 106 ms


In [27]:
%%time
for i in range(100):
  a = torch.randn([100,100])
  b = torch.randn([100,100])
  c = a + b

CPU times: user 21 ms, sys: 872 µs, total: 21.8 ms
Wall time: 26.5 ms


Both of these cells still run on the CPU, not the GPU.

In [29]:
%%time
for i in range(100):
  a = np.random.randn(100, 100)
  b = np.random.randn(100, 100)
  c = np.matmul(a, b)

CPU times: user 163 ms, sys: 117 ms, total: 279 ms
Wall time: 155 ms


In [28]:
%%time
for i in range(100):
  a = torch.randn([100,100])
  b = torch.randn([100,100])
  c = torch.matmul(a,b)

CPU times: user 24.1 ms, sys: 1.09 ms, total: 25.1 ms
Wall time: 84.1 ms


In [32]:
%%time
for i in range(10):
  a = np.random.randn(1000, 1000)
  b = np.random.randn(1000, 1000)
  c = a+b

CPU times: user 965 ms, sys: 10.4 ms, total: 976 ms
Wall time: 978 ms


In [34]:
%%time
for i in range(10):
  a = torch.randn([1000,1000])
  b = torch.randn([1000,1000])
  c = a + b

CPU times: user 169 ms, sys: 887 µs, total: 169 ms
Wall time: 173 ms


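NumPy is already a huge improvement over plain Python loops.  
In the runs above, PyTorch on the CPU was several times faster again than NumPy (e.g. 978 ms vs 173 ms wall time for ten 1000x1000 additions, timings that include generating the random matrices).  
And using CUDA on a GPU is dramatically faster still.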
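The Python-to-NumPy step is not timed above. A rough sketch for comparing all three on elementwise addition only (no random number generation inside the timer), using time.perf_counter. The matrix size is arbitrary and timings vary by machine, so no particular numbers are claimed; the results are checked for agreement instead.

```python
import time
import numpy as np
import torch

n = 1000
a_list = [[float(i + j) for j in range(n)] for i in range(n)]
b_list = [[float(i - j) for j in range(n)] for i in range(n)]
a_np, b_np = np.array(a_list), np.array(b_list)
a_pt, b_pt = torch.from_numpy(a_np), torch.from_numpy(b_np)

def timed(fn):
    """Return (result, seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

py_sum, t_py = timed(lambda: [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a_list, b_list)])
np_sum, t_np = timed(lambda: a_np + b_np)
pt_sum, t_pt = timed(lambda: a_pt + b_pt)

print(f"pure Python: {t_py:.4f} s, NumPy: {t_np:.4f} s, PyTorch (CPU): {t_pt:.4f} s")
```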

# CUDA support

CUDA is NVIDIA's language extension for programming GPUs directly.  
It extends C and C++, and can be used from Python through libraries such as PyTorch.
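A common pattern is to write device-agnostic code that uses the GPU when one is available and falls back to the CPU otherwise. A minimal sketch using torch.cuda.is_available():

```python
import torch

# Pick the first GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("running on", device)

a = torch.ones(3, 2, device=device)
b = torch.ones(3, 2, device=device)
c = a + b                 # computed on whichever device a and b live on
print(c.device)
```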

In [4]:
# Check whether a GPU is available (on Colab, first enable one via Edit > Notebook settings)
# Switching to a GPU runtime also gives more RAM and disk space
print(torch.cuda.device_count())

1


In [5]:
print(torch.cuda.device(0))
print(torch.cuda.get_device_name(0))

<torch.cuda.device object at 0x7f4f92793908>
Tesla P100-PCIE-16GB


In [0]:
# The GPU found above has index 0, so refer to it as 'cuda:0'
cuda0 = torch.device('cuda:0')

In [7]:
a = torch.ones(3, 2, device = cuda0)
b = torch.ones(3, 2, device = cuda0)
c = a + b
print(c)

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]], device='cuda:0')


In [8]:
print(a)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], device='cuda:0')
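Tensors generally need to be on the same device to be combined, and .numpy() only works on CPU tensors. A sketch of moving a tensor to the chosen device and back with .to() and .cpu(); it falls back to the CPU if no GPU is present.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

t_cpu = torch.ones(3, 2)
t_dev = t_cpu.to(device)     # copy to the chosen device (no copy if already there)
t_back = t_dev.cpu()         # copy back to host memory
arr = t_back.numpy()         # .numpy() requires a CPU tensor

print(t_dev.device, t_back.device, arr.shape)
```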


In [9]:
%%time
for i in range(10):
  a_cpu = torch.randn([10000,10000])
  b_cpu = torch.randn([10000,10000])
  b_cpu.add(a_cpu)

CPU times: user 14.2 s, sys: 484 ms, total: 14.6 s
Wall time: 14.7 s


In [10]:
%%time
for i in range(10):
  a_gpu = torch.randn([10000,10000],device = cuda0)
  b_gpu = torch.randn([10000,10000],device = cuda0)
  b_gpu.add(a_gpu)

CPU times: user 555 µs, sys: 2.06 ms, total: 2.61 ms
Wall time: 10.3 ms


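More than 1000 times faster in wall time (14.7 s vs 10.3 ms)!  
Caveat: CUDA operations run asynchronously, so the GPU loop can return before its kernels have finished, and this timing mostly measures how fast the work is queued. Calling torch.cuda.synchronize() before the timer stops gives a fair comparison; the GPU is still much faster, though likely by a smaller factor.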

In [11]:
%%time
for i in range(100):
  a_cpu = torch.randn([10000,10000])
  b_cpu = torch.randn([10000,10000])
  torch.matmul(a_cpu,b_cpu)

CPU times: user 22min 33s, sys: 1.05 s, total: 22min 34s
Wall time: 22min 34s


In [12]:
%%time
for i in range(100):
  a_gpu = torch.randn([10000,10000], device = cuda0)
  b_gpu = torch.randn([10000,10000], device = cuda0)
  torch.matmul(a_gpu,b_gpu)

CPU times: user 7 ms, sys: 2 ms, total: 9 ms
Wall time: 13.9 ms


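22 minutes versus 13.9 ms!  
As with the addition loop, the GPU time only covers queuing the asynchronous CUDA kernels, not finishing 100 large matrix products, so the true speedup, while still large, is smaller than this ratio.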
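A sketch of a fairer timing helper: it does one warm-up call and calls torch.cuda.synchronize() before starting and stopping the clock, so queued GPU work is included. time_matmul is a hypothetical helper name, and the default sizes are kept small so the CPU run finishes quickly; the GPU branch only runs if CUDA is available.

```python
import time
import torch

def time_matmul(device, n=1000, reps=5):
    """Average seconds per n x n matmul on device, waiting for the GPU to finish."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)                   # warm-up: the first CUDA call has setup cost
    if device.type == "cuda":
        torch.cuda.synchronize(device)   # finish queued work before starting the clock
    start = time.perf_counter()
    for _ in range(reps):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize(device)   # wait until every matmul has actually completed
    return (time.perf_counter() - start) / reps

cpu_time = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_time:.4f} s per matmul")
if torch.cuda.is_available():
    gpu_time = time_matmul(torch.device("cuda:0"))
    print(f"GPU: {gpu_time:.4f} s per matmul, speedup ~{cpu_time / gpu_time:.0f}x")
```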

# Autograd

In [13]:
x = torch.ones([3,2], requires_grad = True) # Tell PyTorch to record operations on x so gradients with respect to x can be computed
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], requires_grad=True)


In [14]:
y = x + 5
print(y)

tensor([[6., 6.],
        [6., 6.],
        [6., 6.]], grad_fn=<AddBackward0>)


In [15]:
z = y*y + 1
print(z)

tensor([[37., 37.],
        [37., 37.],
        [37., 37.]], grad_fn=<AddBackward0>)


In [16]:
# Like a forward pass: y is a function of x, z of y, and t of z
t = torch.sum(z)
print(t)

tensor(222., grad_fn=<SumBackward0>)


In [0]:
# Backward pass: t is a scalar that depends on x, so this computes dt/dx
t.backward()

In [19]:
# Print the derivative of t w.r.t. x
print(x.grad)

tensor([[12., 12.],
        [12., 12.],
        [12., 12.]])


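Since $t = \sum_i z_i = \sum_i (y_i^2 + 1)$ and $y_i = x_i + 5$, the derivative is $\frac{\partial t}{\partial x_i} = 2 y_i \cdot 1 = 2 y_i$.  
Here $x_i = 1$ (first matrix) and $y_i = 6$ (second matrix),  
hence every entry of x.grad is $2 \cdot 6 = 12$.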
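The hand calculation can be checked against autograd directly: recompute t, backpropagate, and compare x.grad with the analytic $2(x_i + 5)$. A short sketch:

```python
import torch

x = torch.ones([3, 2], requires_grad=True)
y = x + 5
t = torch.sum(y * y + 1)
t.backward()

analytic = 2 * (x.detach() + 5)           # dt/dx_i = 2 * y_i = 2 * (x_i + 5)
print(torch.allclose(x.grad, analytic))   # every entry is 12
```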

In [21]:
x = torch.ones([3,2], requires_grad = True)
y = x+5
r = 1/(1+torch.exp(-y))
print(r)
s = torch.sum(r) # Sum all entries of r into a single-element (scalar) tensor
print(s)
s.backward()
print(x.grad)

tensor([[0.9975, 0.9975],
        [0.9975, 0.9975],
        [0.9975, 0.9975]], grad_fn=<MulBackward0>)
tensor(5.9852, grad_fn=<SumBackward0>)
tensor([[0.0025, 0.0025],
        [0.0025, 0.0025],
        [0.0025, 0.0025]])
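Here r is the sigmoid $\sigma(y) = 1/(1 + e^{-y})$, whose derivative is $\sigma(y)(1 - \sigma(y))$; with $y = 6$ that is about 0.0025, matching x.grad. A sketch checking both facts against the built-in torch.sigmoid (a small atol is used because float32 rounding in $1 - \sigma(y)$ is comparable to the default tolerance):

```python
import torch

x = torch.ones([3, 2], requires_grad=True)
y = x + 5
r = 1 / (1 + torch.exp(-y))                  # hand-written sigmoid
print(torch.allclose(r, torch.sigmoid(y)))   # same function as the built-in

torch.sum(r).backward()
sig = torch.sigmoid(y.detach())
analytic = sig * (1 - sig)                   # d sigmoid / dy, and dy/dx = 1
print(torch.allclose(x.grad, analytic, atol=1e-6))
```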


So far we artificially summed r into s and called backward on s.  
Calling r.backward() directly instead raises an error,  
because when the tensor you call backward() on has more than one element, backward() must be given a gradient argument of the same shape.  
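A sketch reproducing that error: calling backward() with no argument on the 6-element r raises a RuntimeError, and no gradient is stored. The exact message may differ between PyTorch versions.

```python
import torch

x = torch.ones([3, 2], requires_grad=True)
r = 1 / (1 + torch.exp(-(x + 5)))

raised = False
try:
    r.backward()           # r has 6 elements and no gradient argument is given
except RuntimeError as err:
    raised = True
    print("error:", err)   # e.g. "grad can be implicitly created only for scalar outputs"
```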


In [22]:
x = torch.ones([3,2], requires_grad = True)
y = x+5
r = 1/(1+torch.exp(-y))
print(r)
a = torch.ones([3,2]) # Same shape as r
r.backward(a)
print(x.grad)

tensor([[0.9975, 0.9975],
        [0.9975, 0.9975],
        [0.9975, 0.9975]], grad_fn=<MulBackward0>)
tensor([[0.0025, 0.0025],
        [0.0025, 0.0025],
        [0.0025, 0.0025]])


Same result as summing r into s first.

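r.backward(a) computes the gradient of r w.r.t. x and multiplies it pointwise by a (here all ones, so the result equals the gradient of r w.r.t. x).  
Why is this needed? To cascade the chain rule through multiple functions,  
e.g. ds/dx = ds/dr * dr/dx.  
The backward pass through r supplies dr/dx,  
and a represents ds/dr, the gradient coming from some later function s further up the computation (we don't have one here; more on this later).  
In general backward(a) computes a vector-Jacobian product; because r is computed elementwise from x, it reduces to this pointwise product.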
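To see a playing the role of ds/dr, here is a sketch where s is an actual later function: a weighted sum of r with hypothetical weights w, so that ds/dr = w. Backpropagating all the way from s, or stopping at r and passing w by hand, should give the same gradient.

```python
import torch

w = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # hypothetical weights

# Route 1: autograd goes all the way from the scalar s back to x
x1 = torch.ones([3, 2], requires_grad=True)
r1 = torch.sigmoid(x1 + 5)
s = torch.sum(w * r1)        # ds/dr = w
s.backward()

# Route 2: stop at r and supply ds/dr = w ourselves
x2 = torch.ones([3, 2], requires_grad=True)
r2 = torch.sigmoid(x2 + 5)
r2.backward(w)               # (ds/dr) * (dr/dx), pointwise

print(torch.allclose(x1.grad, x2.grad))
```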

## Autograd example that looks like what we have been doing