## What is PyTorch?

PyTorch is a system for executing dynamic computational graphs over Tensor objects that behave much like NumPy `ndarray`s. It comes with a powerful **automatic differentiation engine** that removes the need for manual back-propagation.

## Why?

* Our code will now run on GPUs! Much faster training. When using a framework like PyTorch or TensorFlow you can harness the power of the GPU for your own custom neural network architectures without having to write CUDA code directly (which is beyond the scope of this class).

In PyTorch, the computational graph is built up as you execute the code, as opposed to static-graph frameworks (such as TensorFlow 1.x), where you define the graph first and then run it.
In PyTorch, you create your graph by running it.
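
To make "create your graph by running it" concrete, here is a minimal sketch. The helper `repeated_double` is a hypothetical name introduced only for this illustration, and the example previews `requires_grad` and `.backward()`, which are covered in Part 2. An ordinary Python loop decides how many operations are recorded, so each call can produce a different graph.

```python
import torch

def repeated_double(x, n_steps):
    # Plain Python control flow: the loop length decides how many
    # multiply nodes end up in the graph for this particular call.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)

out = repeated_double(x, n_steps=3)   # graph with three multiplications
out.backward()
grad_three_steps = x.grad.clone()     # d(out)/dx = 2**3 for every element

x.grad.zero_()                        # clear the accumulated gradient
out = repeated_double(x, n_steps=1)   # a different, shorter graph
out.backward()
grad_one_step = x.grad.clone()        # d(out)/dx = 2**1 for every element

print(grad_three_steps, grad_one_step)
```

No separate compilation step is involved here: the graph for each call exists only because the Python code ran.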

# Part 1: Tensors

Tensors are a specialized data structure that is very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that Tensors can also be used on a GPU to accelerate computing. In fact, tensors and NumPy arrays can often share the same underlying memory, eliminating the need to copy data (see [Bridge with NumPy](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#bridge-to-np-label)). Tensors are also optimized for automatic differentiation (we’ll see more about that later in the Autograd section). If you’re familiar with ndarrays, you’ll be right at home with the Tensor API.

In [1]:
import torch
import numpy as np

## Initializing a Tensor

Tensors can be initialized in various ways. Let's take a look at four popular ways:

### 1. With random or constant values:

### 1a. Construct a matrix filled with zeros and of dtype `long`:

In [2]:
x = torch.zeros(5, 3, dtype = torch.long) # or torch.zeros((5, 3), dtype = torch.long)
print(x)
print(type(x))

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
<class 'torch.Tensor'>


`torch.zeros` returns a tensor filled with the scalar value 0, with the shape defined by the variable argument `size`.

You can get the shape of the tensor using `x.size()` or `x.shape`.

**Note:** `torch.Size` is in fact a tuple, so it supports all tuple operations.

In [3]:
print(x.size())
print(x.shape)
print(type(x.shape))
a, b = x.shape
print(a, b)

torch.Size([5, 3])
torch.Size([5, 3])
<class 'torch.Size'>
5 3
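
Since `torch.Size` is a tuple subclass, the usual tuple operations apply to it. The short sketch below only uses standard tuple behaviour plus `Tensor.numel()`; the variable names are illustrative.

```python
import torch

x = torch.zeros(5, 3)
shape = x.shape

print(isinstance(shape, tuple))   # torch.Size subclasses tuple
rows, cols = shape                # tuple unpacking
print(len(shape), shape[-1])      # number of dimensions, size of the last dimension
print(rows * cols == x.numel())   # product of the sizes equals the element count
```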


### 1b. Uninitialized matrix

An uninitialized matrix is declared, but does not contain definite known values before it is used. When an uninitialized matrix is created, whatever values were in the allocated memory at the time will appear as the initial values.

Construct a 5x3 matrix, uninitialized:

In [4]:
x = torch.empty(5, 3)
print(x)
print(x.shape)

tensor([[-5.6448e+18,  4.5801e-41, -5.6448e+18],
        [ 4.5801e-41,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  1.8788e+31,  1.7220e+22],
        [ 2.1715e-18,  1.3592e+22,  2.1124e+20]])
torch.Size([5, 3])


### 1c. Construct a randomly initialized matrix:

In [5]:
x = torch.rand(5, 3) # or x = torch.rand((5, 3))
print(x)
print(x.shape)

tensor([[0.0279, 0.3598, 0.4221],
        [0.8735, 0.3325, 0.8817],
        [0.9714, 0.2689, 0.0279],
        [0.8448, 0.7483, 0.7711],
        [0.5739, 0.6520, 0.1653]])
torch.Size([5, 3])
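
The values above change on every run. If reproducible random tensors are needed, one common approach (a sketch, not the only option) is to seed the generator with `torch.manual_seed`; the seed value `0` here is arbitrary.

```python
import torch

torch.manual_seed(0)
a = torch.rand(5, 3)

torch.manual_seed(0)      # resetting the seed replays the same random stream
b = torch.rand(5, 3)

print(torch.equal(a, b))
print(a.min().item() >= 0.0, a.max().item() < 1.0)   # rand samples from [0, 1)
```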


### 1d. Identity Matrix

In [6]:
x = torch.eye(5)
print(x)

tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])


These are some of the most popular ways of creating a tensor with random or constant values.

### 2. Construct a tensor directly from data:

`torch.tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False)` → Tensor

Constructs a tensor with `data`.

In [8]:
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x)
print(x.size())
print(type(x))

tensor([[1, 2, 3],
        [4, 5, 6]])
torch.Size([2, 3])
<class 'torch.Tensor'>


### 3. Create a tensor based on an existing tensor.

These methods reuse the properties of the input tensor, e.g. `dtype`, unless new values are provided by the user.

In [9]:
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

# randn_like inherits the attributes (shape, dtype, device) of its argument
# unless they are explicitly overridden, as dtype is above.
# This is true in general for any *_like() function.
# Other useful ones are torch.ones_like() and torch.zeros_like().

x_ones = torch.ones_like(x) # retains the properties of x
print(f"Ones Tensor: \n {x_ones} \n")

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[-2.4393, -1.2357, -0.8650],
        [ 1.1607,  1.7652,  2.0316],
        [ 1.9713, -0.1562,  0.1154],
        [ 1.3006,  0.6381,  0.8290],
        [ 0.6763, -0.1623, -0.3528]])
Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]) 
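
As a quick check of which properties are inherited, the following sketch compares shapes and dtypes directly. The tensor `base` and the other variable names are made up for this illustration.

```python
import torch

base = torch.zeros(2, 3, dtype=torch.float64)

ones = torch.ones_like(base)                        # same shape and dtype as base
resized = base.new_ones(4)                          # new shape, dtype inherited
as_int = torch.zeros_like(base, dtype=torch.int32)  # same shape, dtype overridden

print(ones.shape, ones.dtype)
print(resized.shape, resized.dtype)
print(as_int.shape, as_int.dtype)
```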



### 4. From a NumPy array

Tensors can be created from NumPy arrays (and vice versa - see [Bridge with NumPy](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#bridge-to-np-label)).

In [12]:
data = [[1, 2],[3, 4]]

np_array = np.array(data)

x_np = torch.from_numpy(np_array)
x_np_2 = torch.tensor(np_array)
print(x_np, "\n", x_np_2)
print(type(x_np))

tensor([[1, 2],
        [3, 4]]) 
 tensor([[1, 2],
        [3, 4]])
<class 'torch.Tensor'>
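
The two calls above look interchangeable, but they differ in how they treat memory: `torch.from_numpy` shares the array's buffer, while `torch.tensor` copies the data. A small sketch that makes the difference visible (variable names are illustrative):

```python
import numpy as np
import torch

np_array = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(np_array)   # shares memory with np_array
copied = torch.tensor(np_array)       # always copies the data

np_array[0, 0] = 100                  # modify the NumPy array in place

print(shared[0, 0].item())   # reflects the change
print(copied[0, 0].item())   # still the original value
```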


### Attributes of a Tensor

The important attributes of a Tensor are its shape, datatype, and the device on which it is stored.

In [9]:
import torch

tensor = torch.rand(3,4)

if torch.cuda.is_available():
  tensor = tensor.to("cuda")

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cuda:0


PyTorch supports several different data types for tensors, each of which can be specified using the `dtype` argument when creating a tensor. Here are some of the most commonly used data types:

* `torch.float32`: 32-bit floating-point number (float)
* `torch.float64`: 64-bit floating-point number (double)
* `torch.float16`: 16-bit floating-point number (half-precision)
* `torch.int8`: 8-bit integer (signed)
* `torch.uint8`: 8-bit integer (unsigned)
* `torch.int16`: 16-bit integer (signed)
* `torch.int32`: 32-bit integer (signed)
* `torch.int64`: 64-bit integer (signed)
* `torch.bool`: boolean (True or False)

In [4]:
a = torch.tensor([1, 2, 3])
print(a.dtype)

a = torch.tensor([1.1, 2.1111, 3.2])
print(a.dtype)

a = torch.tensor([1, 2, 3], dtype=torch.float16)
print(a.dtype)

a = torch.tensor([1, 2, 3], dtype=torch.float64)
print(a.dtype)

a = torch.tensor([1, 2, 3], dtype=torch.int16)
print(a.dtype)

a = torch.tensor([1, 2, 3], dtype=torch.int32)
print(a.dtype)

a = torch.tensor([1, 2, 3], dtype=torch.uint8)
print(a.dtype)

a = torch.tensor([1, 1, 0], dtype=torch.bool)
print(a.dtype)

torch.int64
torch.float32
torch.float16
torch.float64
torch.int16
torch.int32
torch.uint8
torch.bool
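
Besides choosing a `dtype` at creation time, an existing tensor can be converted afterwards. The sketch below uses `Tensor.to` and the `Tensor.double` shorthand, and also shows that mixing an integer tensor with a floating-point tensor promotes the result to the floating-point type.

```python
import torch

ints = torch.tensor([1, 2, 3])                 # inferred as int64
as_float = ints.to(torch.float32)              # explicit conversion returns a new tensor
as_double = ints.double()                      # shorthand for .to(torch.float64)
mixed = ints + torch.tensor([0.5, 0.5, 0.5])   # int64 + float32 promotes to float32

print(ints.dtype, as_float.dtype, as_double.dtype, mixed.dtype)
```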


So far we have looked at what a Tensor is in PyTorch, its important attributes, and the different ways to create one.

### Operations on Tensors

Now let us look at various operations that can be done on a Tensor.

### 1. Addition

In [5]:
x = torch.ones(5, 3)
y = torch.ones(5, 3)

print(x + y) # 1st way

print(torch.add(x, y)) # 2nd way

result = torch.empty(5, 3) # 3rd way. Can provide an output tensor to store results
torch.add(x, y, out=result)
print(result)

tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])
tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])
tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])


In [6]:
# 4th way. Adds 1 to y
print(y.add(1)) # also an example of "broadcasting" in PyTorch

# 5th way
y.add_(x) # Adds in-place
print(y)

tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])
tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])


**Note:** Any operation that mutates a tensor in-place is post-fixed with an `_`. For example, `x.copy_(y)` and `x.t_()` will change `x`.
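
The following sketch illustrates two points from the addition examples: broadcasting between tensors of different shapes, and the difference between `add` (returns a new tensor) and `add_` (modifies its tensor). The tensor `row` and the other names are introduced only for this example.

```python
import torch

x = torch.ones(5, 3)
row = torch.tensor([1.0, 2.0, 3.0])   # shape (3,)

summed = x + row                      # row is broadcast across all 5 rows

y = torch.ones(5, 3)
out_of_place = y.add(1)               # returns a new tensor, y is unchanged
y_before = y.clone()
in_place = y.add_(1)                  # modifies y and returns it

print(summed[0])
print(torch.equal(y_before, torch.ones(5, 3)))
print(in_place.data_ptr() == y.data_ptr())
```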

### 2. Matrix Multiplication, Transpose and Inverse are similar to NumPy with slight variations.

In [7]:
m1 = torch.randn((5, 3))
m2 = torch.randn((5, 3))
print(m2.t() @ m1) # In NumPy this is equivalent to print(m2.T @ m1)
print(torch.inverse(m2.t() @ m1)) # In NumPy this is equivalent to print(np.linalg.inv(m2.T @ m1))
print((m2.t()).mm(m1)) # We can also use the mm() method to do matrix multiplications

tensor([[-2.1922,  2.8153, -0.9838],
        [ 1.7691, -0.5464, -1.9391],
        [ 3.3471, -1.1133, -0.6788]])
tensor([[ 0.1650, -0.2775,  0.5535],
        [ 0.4882, -0.4413,  0.5530],
        [ 0.0130, -0.6445,  0.3491]])
tensor([[-2.1922,  2.8153, -0.9838],
        [ 1.7691, -0.5464, -1.9391],
        [ 3.3471, -1.1133, -0.6788]])
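
Because the matrices above are random, their printed values are hard to verify by eye. A small sanity check on a fixed matrix is sketched below; the matrix `A` is an arbitrary, well-conditioned example, and `torch.linalg.inv` is the `torch.linalg` counterpart of `torch.inverse` in recent PyTorch versions.

```python
import torch

# A small, well-conditioned matrix chosen for illustration.
A = torch.tensor([[2.0, 1.0],
                  [1.0, 3.0]])

A_inv = torch.linalg.inv(A)
identity = A @ A_inv                  # should be close to the identity matrix

print(torch.allclose(identity, torch.eye(2), atol=1e-6))
print(torch.equal(A.t(), A.T))        # .t() and .T agree for 2-D tensors
print(torch.allclose(torch.matmul(A, A_inv), A.mm(A_inv)))
```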


You can use standard NumPy-like indexing with all bells and whistles!

In [10]:
print(x)
print(x.size())
print(x[0]) # 1st row
print(x[:, 0]) # 1st column
print(x[..., -1]) # last column
print(x[1, 2])
print(x[1][2])

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
torch.Size([5, 3])
tensor([1., 1., 1.])
tensor([1., 1., 1., 1., 1.])
tensor([1., 1., 1., 1., 1.])
tensor(1.)
tensor(1.)
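
Indexing a tensor of ones does not show much, so here is a sketch on a tensor with distinct values. It also covers slicing, boolean masks, and integer-list indexing, which work much like their NumPy counterparts. The tensor `t` is made up for this example.

```python
import torch

t = torch.arange(12).reshape(3, 4)   # values 0..11 laid out in 3 rows, 4 columns

first_row = t[0]
last_col = t[:, -1]
block = t[1:, :2]          # rows 1-2, columns 0-1
evens = t[t % 2 == 0]      # a boolean mask returns a 1-D tensor
picked = t[[0, 2]]         # integer-list indexing selects rows 0 and 2

print(first_row, last_col, block, evens, picked, sep="\n")
```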


### 3. Resizing: If you want to resize/reshape a tensor, you can use `Tensor.view` (e.g. `x.view(...)`):

In [11]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
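
Two properties of `view` are worth knowing: the result shares storage with the original tensor, and `view` only works when the memory layout allows it. The sketch below shows both, and uses `reshape` as the fallback that copies when needed. Variable names are illustrative.

```python
import torch

x = torch.arange(6).view(2, 3)
flat = x.view(-1)
flat[0] = 100              # a view shares storage, so x changes too

xt = x.t()                 # transpose is a non-contiguous view of x
try:
    xt.view(6)             # view cannot reinterpret this layout
    view_failed = False
except RuntimeError:
    view_failed = True

flat_t = xt.reshape(6)     # reshape falls back to copying when needed

print(x[0, 0].item(), xt.is_contiguous(), view_failed, flat_t)
```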


### 4. If you have a one element tensor, use `.item()` to get the value as a Python number

In [12]:
x = torch.randn(1)
print(x)
print(x.item())
# temp = torch.rand(5, 3)
# print(temp)
# print(temp.item()) -> Raises an error (ValueError or RuntimeError, depending on the PyTorch version): only one-element tensors can be converted to Python scalars

tensor([0.3449])
0.34486258029937744


**Check out Later:** 100+ Tensor operations, including transposing, indexing, slicing, mathematical operations, linear algebra, random numbers, etc., are described [here](https://pytorch.org/docs/torch).

Some important ones which are definitely worth checking out are `squeeze()`, `unsqueeze()`, `max()`, `clamp()`, `cat()`, `stack()`.
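
As a starting point for exploring these, here is a brief sketch of each on a small tensor. The tensor `a` and the chosen arguments are arbitrary.

```python
import torch

a = torch.tensor([[1.0, 5.0],
                  [3.0, 2.0]])

expanded = a.unsqueeze(0)               # add a leading dimension: (2, 2) -> (1, 2, 2)
restored = expanded.squeeze(0)          # remove it again: (1, 2, 2) -> (2, 2)
values, indices = a.max(dim=1)          # per-row maximum and its column index
clamped = a.clamp(min=2.0, max=4.0)     # limit every element to the range [2, 4]
joined = torch.cat([a, a], dim=0)       # concatenate along an existing dimension: (4, 2)
stacked = torch.stack([a, a], dim=0)    # stack along a new dimension: (2, 2, 2)

print(expanded.shape, restored.shape, joined.shape, stacked.shape)
print(values, indices, clamped, sep="\n")
```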

Each of these operations can be run on the GPU (at typically higher speeds than on a CPU). If you’re using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the `.to` method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!

### 5. Moving Tensor to GPU.

In [13]:
tensor = torch.rand(3,4)

if torch.cuda.is_available():
    tensor = tensor.to("cuda")

In [14]:
# A neater way of doing it
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(torch.zeros(2, 2).to(device))

tensor([[0., 0.],
        [0., 0.]], device='cuda:0')
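
A tensor can also be allocated directly on the target device with the `device=` argument, which avoids the extra copy from the CPU. The sketch below is device-agnostic: it falls back to the CPU when no GPU is available, so it should run either way.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 2, device=device)   # allocated directly on the target device
b = torch.ones(2, 2).to(device)       # created on the CPU, then moved
c = a + b                             # both operands live on the same device

result = c.cpu()                      # bring the result back to the CPU
print(c.device, result.device)
```

Operands of an operation generally need to be on the same device, which is why both `a` and `b` are placed on `device` before they are added.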


**NOTE:** `dim` in PyTorch is similar to `axis` in NumPy.
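
A minimal side-by-side sketch of `dim` and `axis`, using a sum reduction on a small example tensor:

```python
import numpy as np
import torch

t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

col_sums = t.sum(dim=0)                    # collapse the rows, one sum per column
row_sums = t.sum(dim=1)                    # collapse the columns, one sum per row
np_col_sums = np.sum(t.numpy(), axis=0)    # the NumPy equivalent of dim=0

print(col_sums, row_sums, np_col_sums)
```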

### 6. Converting a Torch Tensor to a NumPy Array and vice-versa

In [18]:
a = torch.ones(5)
print(a)
print(a.size())

b = a.numpy()
print(b)

# Notice how the NumPy array's values change
a.add_(1)
print(a)
print(b)

# This happens because a and b share the same underlying memory

tensor([1., 1., 1., 1., 1.])
torch.Size([5])
[1. 1. 1. 1. 1.]
tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


In [19]:
# Converting NumPy Array to Torch Tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
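
One caveat when converting to NumPy: `.numpy()` is not allowed on a tensor that is tracked by autograd, and NumPy arrays live in CPU memory. A commonly used pattern, sketched below, is `detach().cpu().numpy()`. It uses `requires_grad`, which is introduced in Part 2.

```python
import torch

t = torch.ones(3, requires_grad=True)

try:
    t.numpy()                    # not allowed while t is tracked by autograd
    direct_call_failed = False
except RuntimeError:
    direct_call_failed = True

arr = t.detach().cpu().numpy()   # detach from the graph, move to CPU, then convert

print(direct_call_failed, arr)
```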


This completes Part 1 of this notebook.

---



# Part 2: AUTOGRAD: Automatic Differentiation

PyTorch's `autograd` package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

### Some Background

Earlier we saw three important attributes of a tensor: its shape, datatype, and device.

Two more important attributes are `.requires_grad` and `.grad`.

`torch.Tensor` is the central class of the package. If you set its attribute `.requires_grad` to `True`, it starts to track all operations on it. When you finish your computation you can call `.backward()` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its `.grad` attribute.

To stop a tensor from tracking history, you can call `.detach()` to detach it from the computation history, and to prevent future computation from being tracked.

To prevent tracking history (and using memory), you can also wrap the code block in `with torch.no_grad():`. This can be particularly helpful when evaluating a model because the model may have trainable parameters with `requires_grad=True`, but for which we don’t need the gradients.

There’s one more class which is very important for autograd implementation - a `Function`.

`Tensor` and `Function` are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each tensor has a `.grad_fn` attribute that references a `Function` that has created the `Tensor` (except for Tensors created by the user - their `grad_fn is None`).

If you want to compute the derivatives, you can call `.backward()` on a `Tensor`. If `Tensor` is a scalar (i.e. it holds one element data), you don’t need to specify any arguments to `backward()`, however if it has more elements, you need to specify a `gradient` argument that is a tensor of matching shape.
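
For the non-scalar case, a small sketch may help. The vector `v` below is an arbitrary choice for illustration. Passing it as `gradient` makes `backward()` compute the vector-Jacobian product, i.e. the gradient of the weighted sum `(v * y).sum()` with respect to `x`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2                              # non-scalar output with three elements

v = torch.tensor([1.0, 0.1, 0.01])     # one weight per element of y
y.backward(gradient=v)                 # computes the vector-Jacobian product

print(x.grad_fn, y.grad_fn)            # x was created by the user, y by an operation
print(x.grad)                          # the Jacobian is 2 * identity, so x.grad equals 2 * v
```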

In [21]:
x = torch.ones(2, 2, requires_grad = True)
print(x.requires_grad)
x.requires_grad_(False) # Sets the requires_grad attribute of x to False, in place.
print(x.requires_grad)

True
False


In [22]:
# We can set the requires_grad attribute directly during the initialization itself

x = torch.ones(2, 2, requires_grad = True)
print(x)
print(x.requires_grad)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
True


Let's do a tensor operation on `x`:

In [23]:
y = x + 2
print(y)
print(y.requires_grad)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
True


Notice that even though we didn't explicitly set the `requires_grad` attribute of `y` to `True`, it is set automatically because `y` is the result of an operation on a Tensor whose `requires_grad` is `True`. In other words, PyTorch tracks every operation on a Tensor once you set its `requires_grad` to `True`.

Another important attribute is `grad_fn`. It stores the operation that led to the Tensor's creation. Notice how PyTorch builds a chain of operations through `grad_fn`.

In [24]:
print(y.grad_fn)

<AddBackward0 object at 0x7fde031f1090>


Let's do more operations on `y`

In [25]:
z = y * y * 3
print(z.grad_fn)
print(z.grad_fn.next_functions[0][0].next_functions[0][0])
out = z.mean()

print(z, out)

print(y.requires_grad)
print(z.requires_grad)
print(out.requires_grad)
# Notice that although we did not explicitly set the requires_grad attribute to
# "True" for the Tensors y, z and out, it is automatically set to True.

<MulBackward0 object at 0x7fde031f3dc0>
<AddBackward0 object at 0x7fde031f10c0>
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
True
True
True


### Gradients

Let’s backprop now. You can access the gradients of a Tensor (if it exists) using the `.grad` attribute.

In [26]:
print(x.grad) # We haven't backpropagated yet. Hence, it's None.

None


In [27]:
# Backpropagating in PyTorch is as simple as calling the
# `.backward()` method

out.backward()

This computes `d(out)/dx` and stores the resulting gradient in the `.grad` attribute of `x`.

In [32]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


The result obtained can be verified as follows :

![forward_prop](https://drive.google.com/uc?id=1Q0AiOmNZNlFTULPNswk-csiCAZe-CoK9)

![backprop](https://drive.google.com/uc?id=1z1ynWdNc1FwAVhf565kaX39OIbV-Qmsi)
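
In case the images do not load, the same check can be written out by hand from the code above, where `y = x + 2`, `z = y * y * 3`, and `out = z.mean()` over the four elements of `x`:

$$
out = \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,y_i^2 = 3\,(x_i + 2)^2
$$

$$
\frac{\partial\, out}{\partial x_i} = \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2), \qquad \left.\frac{\partial\, out}{\partial x_i}\right|_{x_i = 1} = \frac{3}{2}\cdot 3 = 4.5
$$

Every element of `x` equals 1, so every entry of `x.grad` is 4.5.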

In [29]:
# You can zero the gradient buffer as follows:
x.grad.zero_()
print(x.grad)

tensor([[0., 0.],
        [0., 0.]])
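
Zeroing matters because gradients are accumulated into `.grad` rather than overwritten. The sketch below runs the same backward pass twice on a toy expression (chosen only for illustration) to show the accumulation, and then resets the buffer.

```python
import torch

x = torch.ones(2, requires_grad=True)

(x * 3).sum().backward()
after_first = x.grad.clone()

(x * 3).sum().backward()       # gradients are added to the existing .grad
after_second = x.grad.clone()

x.grad.zero_()                 # reset before the next backward pass
after_zero = x.grad.clone()

print(after_first, after_second, after_zero)
```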


You can also stop autograd from tracking history on Tensors with `.requires_grad=True` either by wrapping the code block in `with torch.no_grad():`

In [30]:
print(x.requires_grad) # True
print((x ** 2).requires_grad) # True

with torch.no_grad():
    print(x.requires_grad) # True
    print((x ** 2).requires_grad) # False

True
True
True
False
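
A typical use of `torch.no_grad()` is inference. In the sketch below, `w` stands in for a trainable parameter and `inputs` for a batch of data; both are made up for this example and not part of any real model.

```python
import torch

w = torch.randn(3, requires_grad=True)   # stands in for a trainable parameter
inputs = torch.randn(4, 3)

tracked = inputs @ w                     # recorded in the graph
with torch.no_grad():
    preds = inputs @ w                   # same values, but no graph is built

print(tracked.requires_grad, preds.requires_grad, preds.grad_fn)
```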


Or by using `.detach()` to get a new Tensor with the same content but that does not require gradients:

In [33]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.requires_grad)
print(x.eq(y).all())

True
False
True
tensor(True)
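
Note that `.detach()` does not copy the data: the detached tensor shares storage with the original. The sketch below demonstrates this with an in-place edit. If an independent copy is needed, `x.detach().clone()` is one option.

```python
import torch

x = torch.ones(3, requires_grad=True)
d = x.detach()            # same storage, but no autograd tracking

d[0] = 5.0                # in-place edit through the detached tensor

print(x)                  # the first element of x has changed as well
print(d.requires_grad, d.data_ptr() == x.data_ptr())
```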


**Check out Later:**

Documentation of `autograd.Function` can be found at https://pytorch.org/docs/stable/autograd.html#function

This concludes the first session on PyTorch.

Thank you!

# References

1. [Deep Learning with PyTorch: A 60 minute blitz, Soumith Chintala](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)

2. [Automatic Differentiation Package - TORCH.AUTOGRAD](https://pytorch.org/docs/stable/autograd.html)

3. [TORCH](https://pytorch.org/docs/stable/torch.html)

4. [Stefan Otte: Deep Neural Networks with PyTorch | PyData Berlin 2018](https://www.youtube.com/watch?v=_H3aw6wkCv0&t=821s)

5. [CS231n: Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/)

6. [Tensor Attributes](https://pytorch.org/docs/stable/tensor_attributes.html)

7. [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor)