<a href="https://colab.research.google.com/github/sachin-rastogi/workshops/blob/master/PyTorch/pytorch_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![PyTorch Logo](https://raw.githubusercontent.com/pytorch/pytorch/master/docs/source/_static/img/pytorch-logo-dark.png)

What is PyTorch?  
===============
It’s a Python-based scientific computing package that serves as:

- A replacement for NumPy that can use the power of GPUs
- A deep learning research platform that provides maximum flexibility and speed
- A framework for deep neural networks built on a tape-based autograd system


At a granular level, PyTorch is a library that consists of the following components:

| Component | <div style="text-align: justify"> Description </div>|
| ---- | --- |
| [**torch**](https://pytorch.org/docs/stable/torch.html) | <div style="text-align: justify"> A Tensor library like NumPy, with strong GPU support </div>|
| [**torch.utils**](https://pytorch.org/docs/stable/data.html) | <div style="text-align: justify"> DataLoader and other utility functions for convenience </div>|
| [**torch.nn**](https://pytorch.org/docs/stable/nn.html) | <div style="text-align: justify"> A neural networks library deeply integrated with autograd designed for maximum flexibility </div>|
| [**torch.autograd**](https://pytorch.org/docs/stable/autograd.html) | <div style="text-align: justify"> A tape-based automatic differentiation library that supports all differentiable Tensor operations in torch. PyTorch uses it to calculate gradients for training neural networks. Autograd does all the work of backpropagation by calculating the gradient at each operation in the network, which we can then use to update the network weights. </div>|
| [**torch.jit**](https://pytorch.org/docs/stable/jit.html) | <div style="text-align: justify"> A compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code </div>|
| [**torch.multiprocessing**](https://pytorch.org/docs/stable/multiprocessing.html) | <div style="text-align: justify"> Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training </div>|


A GPU-Ready Tensor Library
------------------------------------------
If you use NumPy, then you have used Tensors (a.k.a. ndarray).

![Tensor illustration](https://raw.githubusercontent.com/pytorch/pytorch/master/docs/source/_static/img/tensor_illustration.png)

PyTorch provides Tensors that can live either on the CPU or the GPU; placing them on a GPU can accelerate computation substantially.

PyTorch provides a wide variety of tensor routines to accelerate and fit your scientific computation needs, such as slicing, indexing, math operations, linear algebra, and reductions.
And they are fast!


# Basic introduction to PyTorch  

We'll cover tensors, the main data structure of PyTorch:
- how to create tensors,
- how to do simple operations, and
- how tensors interact with NumPy.


## Tensors

**Tensors** are a generalization of matrices.
- A vector is a 1-dimensional tensor,
- a matrix is a 2-dimensional tensor,
- an array with three indices is a 3-dimensional tensor (RGB color images, for example).

Tensors are the basic data structure for neural networks, and PyTorch (as well as pretty much every other deep learning framework) is built around them.

*Neural network computations are just a bunch of linear algebra operations on tensors*.

<img src="assets/tensor_examples.svg" width=600px>



**Tensors** are similar to NumPy’s ndarrays, except that Tensors can also be used on a GPU to accelerate computing.

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
from IPython.display import display, Math, Latex

In [None]:
torch.__version__

**Tensor Parameters:**

Tensor creation functions such as `torch.zeros()` or `torch.rand()` accept the following parameters:

- `size`: Size of the tensor.
- `out=None`: An output tensor to write the result into.
- `dtype=None`: A `torch.dtype` is an object that represents the data type of a `torch.Tensor`.
- `layout=torch.strided`: A `torch.layout` is an object that represents the memory layout of a `torch.Tensor`. Currently, PyTorch supports `torch.strided` (dense Tensors) and has beta support for `torch.sparse_coo` (sparse COO Tensors).
- `device=None`: A `torch.device` is an object representing the device on which a `torch.Tensor` is or will be allocated.
- `requires_grad=False`: Whether autograd should record operations on the returned tensor, i.e. `True` if gradients need to be computed for this Tensor, `False` otherwise.

Further reading: https://pytorch.org/docs/stable/tensor_attributes.html
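As a minimal sketch of these parameters in use, the factory function `torch.zeros` accepts them as keyword arguments. The specific values below (`float64`, `"cpu"`, the names `t` and `buf`) are chosen only for illustration.

```python
import torch

# Keyword arguments shared by most tensor factory functions
t = torch.zeros(
    2, 3,                   # size
    dtype=torch.float64,    # data type of the elements
    layout=torch.strided,   # dense memory layout (the default)
    device="cpu",           # where the tensor is allocated
    requires_grad=True,     # let autograd record operations on it
)
print(t.shape, t.dtype, t.layout, t.device, t.requires_grad)

# out= writes the result into an existing tensor instead of allocating a new one
buf = torch.empty(2, 3)
torch.zeros(2, 3, out=buf)
print(buf)
```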

`torch.Tensor` is an alias for the default tensor type (`torch.FloatTensor`).

A tensor can be constructed from a Python list or sequence using the `torch.tensor()` constructor:

In [None]:
torch.tensor([1, 2, 3], dtype=torch.float32)
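The `torch.Tensor` alias and the `torch.tensor()` constructor behave differently when no `dtype` is given. This small sketch, assuming the default floating-point type has not been changed with `torch.set_default_dtype`, compares the two.

```python
import torch

# torch.tensor() infers the dtype from the data: Python ints become int64
a = torch.tensor([1, 2, 3])

# torch.Tensor(...) always produces the default floating-point type
b = torch.Tensor([1, 2, 3])

print(a.dtype, b.dtype, torch.get_default_dtype())
```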

In [None]:
# Construct a 5x3 matrix, uninitialized: When an uninitialized matrix is created, whatever values were in 
# the allocated memory at the time will appear as the initial values.

x = torch.empty(5, 3)
print(x, x.dtype)

In [None]:
# Construct a randomly initialized matrix:

x = torch.rand(5, 3)
print(x)

In [None]:
# Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1

xn = torch.randn(50,10, dtype=torch.float32)
print(torch.std_mean(xn))

In [None]:
# Construct a matrix filled with zeros and of dtype long:

x = torch.zeros(5, 3, dtype=torch.long)
print(x)

In [None]:
# Construct a matrix filled with ones and of dtype long:

x = torch.ones(5, 3, dtype=torch.long)
print(x)

In [None]:
# Construct a tensor directly from data:

x = torch.tensor([5.5, 3])
print(x)

In [None]:
# - Create a tensor based on an existing tensor.
# - These methods reuse properties of the input tensor,
# e.g. dtype, unless new values are provided by the user

x = x.new_ones(5, 3, dtype=torch.int16)      # new_* methods take sizes and return a new tensor
print(x)

y = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(y)
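To make the difference between the two families concrete, here is a short sketch (the tensor `base` is illustrative): `new_*` methods take an explicit size but inherit dtype and device, while `*_like` functions also copy the input's shape.

```python
import torch

base = torch.zeros(3, dtype=torch.int16)

# new_* methods take a size and inherit dtype (and device) from `base`
n = base.new_ones(2, 2)
print(n.dtype, n.shape)

# *_like functions take the shape from their input as well
l = torch.ones_like(base)
print(l.dtype, l.shape)

# Either kind can still override the dtype explicitly
f = base.new_zeros((2,), dtype=torch.float32)
print(f.dtype, f.shape)
```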

In [None]:
#Get its size: Use size or shape

print(x.size(), y.shape)

In [None]:
# If we have a one-element tensor, we can use .item() to get the value as a Python number

x1 = torch.randn(1)
print(x1, type(x1))

print(x1.item(), type(x1.item()))

In [None]:
# Operations
# There are multiple syntaxes for operations. 
# Addition: syntax 1

print(x + y)

In [None]:
# Addition: syntax 2

print(torch.add(x, y))

In [None]:
# Addition: providing an output tensor as argument

result = torch.empty(5, 3) #define output tensor

torch.add(x, y, out=result)
print(result)

In [None]:
# Addition: in-place
# adds x to y
# Any operation that mutates a tensor in-place is suffixed with an underscore (_).
# For example: x.copy_(y) and x.t_() will change x.

y.add_(x)
print(y)
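A minimal sketch of what "in-place" means in memory terms: `data_ptr()` returns the address of a tensor's first element, so comparing it before and after shows whether new memory was allocated.

```python
import torch

a = torch.ones(3)
ptr = a.data_ptr()          # address of a's underlying storage

b = a.add(1)                # out-of-place: returns a new tensor, a is unchanged here
a.add_(1)                   # in-place: modifies a's own storage

print(a, b)
print(a.data_ptr() == ptr)  # a still uses the same memory
print(b.data_ptr() == ptr)  # b lives in freshly allocated memory
```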

In [None]:
# Check which objects are tensors
a = np.array(5)
b = torch.tensor([2, 4, 6, 8, 10])

print(torch.is_tensor(a), " ", torch.is_tensor(b))

# Subtraction
print("Subtraction: {}\n".format(b.sub(b)))

# Element wise multiplication
print("Element wise multiplication: {}\n".format(torch.mul(b,b)))

# Element wise division
print("Element wise division: {}\n".format(torch.true_divide(b,b)))

In [None]:
# Mean
tensor = torch.Tensor([1,2,3,4,5])
print("Mean: {}".format(tensor.mean()))

# Standard deviation (std)
print("std: {}".format(tensor.std()))
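By default, `tensor.std()` computes the sample standard deviation, dividing by n - 1 (Bessel's correction) rather than by n. This sketch recomputes both versions by hand for the same data to show which one `std()` matches.

```python
import torch

t = torch.tensor([1., 2., 3., 4., 5.])
n = t.numel()
sq_dev = (t - t.mean()) ** 2                   # squared deviations from the mean

sample_std = (sq_dev.sum() / (n - 1)).sqrt()   # divides by n - 1 (Bessel's correction)
population_std = (sq_dev.sum() / n).sqrt()     # divides by n

print(t.std(), sample_std, population_std)
```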

In [None]:
# CUDA Tensors
# Tensors can be moved onto any device using the .to method.

# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU

if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print("\n")
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
    
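    # Mix and match: tensors on different devices cannot be combined directly
#     y1 = torch.ones_like(x, device=device)
#     y2 = torch.ones_like(x, device="cpu")
#     z1 = y1 + y2  # RuntimeError: the operands are on different devices
#     print(z1)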
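A common way to avoid the `if` block above is to choose the device once and use it everywhere. This is a sketch of that device-agnostic pattern; it runs on machines with or without a GPU.

```python
import torch

# Pick the GPU when one is present, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 2, device=device)   # created directly on `device`
y = torch.ones(2, 2).to(device)       # created on the CPU, then moved
z = x + y                             # both operands are on the same device

print(z.device, z.cpu())              # .cpu() copies the result back to host memory
```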

In [None]:
# TORCH.IS_FLOATING_POINT

c = torch.randn(5,3)
print(c, "\n", torch.is_floating_point(c))

In [None]:
# TORCH.IS_NONZERO

print(torch.is_nonzero(torch.tensor([0.])))
print("\n")
print(torch.is_nonzero(torch.tensor([1.5])))
print("\n")
print(torch.is_nonzero(torch.tensor([False])))
print("\n")
#torch.is_nonzero(torch.tensor([1, 3, 5])) # RuntimeError: bool value of Tensor with more than one value is ambiguous

In [None]:
# TORCH.ARANGE
# torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

print(torch.arange(5))
print(torch.arange(1, 4))
print(torch.arange(1, 2.5, 0.5))
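Note that `arange` excludes `end`. When both endpoints are wanted, `torch.linspace` takes the number of points instead of a step; this sketch contrasts the two over the same interval.

```python
import torch

# arange excludes `end`: values start, start+step, ... while < end
a = torch.arange(1, 2.5, 0.5)

# linspace includes both endpoints and takes the number of points instead of a step
l = torch.linspace(1, 2.5, steps=4)

print(a, l)
```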

In [None]:
# TORCH.GET_NUM_THREADS
# Returns the number of threads used for parallelizing CPU operations

torch.get_num_threads()

In [None]:
# TORCH.SET_NUM_THREADS
# Sets the number of threads used for intraop parallelism on CPU. 
# WARNING: To ensure that the correct number of threads is used, set_num_threads must be called before running 
# eager, JIT or autograd code.

torch.set_num_threads(16)

In [None]:
# TORCH.SIGMOID
# Returns a new tensor with the sigmoid of the elements of input.
display(Math(r'\text{out}_{i} = \frac{1}{1 + e^{-\text{input}_{i}}}'))

a = torch.randn(4)
print("Input Tensor :", a, "\n Output Tensor :", torch.sigmoid(a))
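To connect the formula above to the function, this sketch evaluates the expression element-wise with `torch.exp` and checks it against `torch.sigmoid`.

```python
import torch

a = torch.randn(4)
manual = 1 / (1 + torch.exp(-a))                   # the formula above, element-wise
print(torch.allclose(manual, torch.sigmoid(a)))    # the two agree
print(manual)                                      # every value lies in (0, 1)
```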

In [None]:
# TORCH.ARGMAX
# Returns the index of the maximum value of all elements in the input tensor
# (an index into the flattened tensor).
# With dim given, this matches the second value returned by torch.max(input, dim).

a = torch.randn(4, 4)
print(a, "\n", torch.argmax(a), "\n", a.dtype)


In [None]:
# dim=1 reduces along each row (one index per row); dim=0 reduces down each column (one index per column)
torch.argmax(a, dim=1) # One index per row

In [None]:
z = torch.argmax(a, dim=1, keepdim=True) # One index per row; keepdim keeps the reduced axis with length 1
print(z, "\n", z.size())

In [None]:
torch.argmax(a, dim=0) # One index per column

In [None]:
# TORCH.MAX
# Returns the maximum value of all elements in the input tensor.

print(a, "\n", torch.max(a))

In [None]:
# TORCH.MAX
# Maximum of each row, returned as (values, indices)
torch.max(a, dim=1)
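The relationship between `torch.max` and `torch.argmax` can be checked directly. This sketch also shows one way (plain `divmod`, an illustrative choice) to turn the flattened index returned by `argmax` without `dim` back into a (row, column) pair.

```python
import torch

a = torch.randn(4, 4)

values, indices = torch.max(a, dim=1)                # one (value, index) pair per row
print(torch.equal(indices, torch.argmax(a, dim=1)))  # same indices as argmax

# Without dim, argmax returns an index into the flattened tensor;
# divmod by the number of columns recovers (row, column)
flat_idx = torch.argmax(a).item()
row, col = divmod(flat_idx, a.shape[1])
print(flat_idx, (row, col), a[row, col] == a.max())
```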

In [None]:
# TORCH.MANUAL_SEED
# Sets the seed for generating random numbers. Returns a torch.Generator object.

torch.manual_seed(1234)
print(torch.rand(2))
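Seeding matters because it makes random results reproducible. This sketch shows that re-seeding restarts the same sequence, and that a separate `torch.Generator` can be seeded without touching the global state.

```python
import torch

torch.manual_seed(1234)
first = torch.rand(2)
torch.manual_seed(1234)          # re-seeding restarts the same random sequence
second = torch.rand(2)
print(torch.equal(first, second))

# A dedicated Generator gives reproducible numbers without touching the global seed
g1 = torch.Generator().manual_seed(42)
g2 = torch.Generator().manual_seed(42)
same = torch.equal(torch.rand(3, generator=g1), torch.rand(3, generator=g2))
print(same)
```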

In [None]:
# Indexing: the element at row 3, column 3, and the entire column with index 2
a[3,3], a[:,2]

### NumPy to Torch and back  
PyTorch makes it easy to convert between NumPy arrays and Torch tensors. To create a tensor from a NumPy array, use `torch.from_numpy()`. To convert a tensor to a NumPy array, use the `.numpy()` method. For CPU tensors, both conversions share memory instead of copying the data.

In [None]:
# NumPy Bridge
# The Torch Tensor and NumPy array will share their underlying memory locations (if the Torch Tensor is on CPU),
# and changing one will change the other.
# Converting a Torch Tensor to a NumPy Array

a = torch.ones(5)
b = a.numpy()

print(a, "\n\n", b)

In [None]:
# See how the numpy array changed in value.

a.add_(1) # Inplace addition

print(a, "\n\n", b) #  The Torch Tensor and NumPy array will share their underlying memory locations
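When the sharing is not wanted, copying the tensor first breaks the link. A minimal sketch, using `clone()` before `.numpy()`:

```python
import torch

a = torch.ones(5)
shared = a.numpy()               # shares memory with a
independent = a.clone().numpy()  # clone() copies the data first, so nothing is shared

a.add_(1)
print(shared)       # reflects the in-place change
print(independent)  # still all ones
```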

In [None]:
# Python list
array = [[1,2,3],[4,5,6]]

# numpy array
first_array = np.array(array) # 2x3 array

print("Array Type: {}".format(type(first_array))) # type
print("Array Shape: {}".format(np.shape(first_array))) # shape
print(type(first_array), "\n", first_array)
print(type(array), "\n", array)

In [None]:
# PyTorch tensor
tensor = torch.Tensor(array)

print("Tensor Type: {}".format(tensor.dtype)) # dtype
print("Tensor Shape: {}".format(tensor.shape)) # shape
print(tensor)

In [None]:
# numpy ones
print("Numpy {}\n".format(np.ones((2,3))))

# pytorch ones
print("PyTorch {}\n".format(torch.ones((2,3))))

In [None]:
# numpy random
print("Numpy {}\n".format(np.random.rand(2,3)))

# pytorch random
print("PyTorch {}\n".format(torch.rand(2,3)))

In [None]:
# You can use standard NumPy-like indexing with all bells and whistles!

print(y[:, 1])

In [None]:
# Converting NumPy Array to Torch Tensor
# See how changing the np array changed the Torch Tensor automatically

a = np.ones(5)
b = torch.from_numpy(a)

np.add(a, 1, out=a)

print(a, "\n\n", b)

In [None]:
# TORCH.IS_TENSOR

a = np.ones(5)
b = torch.from_numpy(a)

print(torch.is_tensor(a), " ", torch.is_tensor(b))

In [None]:
a = np.random.rand(4,3)
a

In [None]:
b = torch.from_numpy(a)
print(b, "\n", b.numpy())

In [None]:
# The memory is shared between the Numpy array and Torch tensor, so if we change the values in-place of one object,
# the other will change as well.
b.mul_(2)

In [None]:
a
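In the other direction, `torch.from_numpy()` shares memory and keeps the array's dtype, while `torch.tensor()` always copies. This sketch changes the array after both conversions to show the difference.

```python
import numpy as np
import torch

arr = np.ones(3)                  # float64 NumPy array
shared = torch.from_numpy(arr)    # shares memory, keeps dtype float64
copied = torch.tensor(arr)        # always copies the data

arr[0] = 7
print(shared, copied)
print(shared.dtype)
```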

### Flatten, Reshape, And Squeeze

In [None]:
# Reshaping: to resize/reshape a tensor, we can use reshape() or view().
# torch.randn samples from the standard normal distribution (mean 0, standard deviation 1).

x = torch.randn(3, 4)

In PyTorch the size and shape of a tensor mean the same thing.

The **rank** of a tensor is its number of dimensions, which is equal to the length of the tensor's shape. From the shape, we can also deduce the number of elements the tensor contains.

In [None]:
x.shape, len(x.shape) # So Rank is 2

In [None]:
# Deduce the number of elements contained within the tensor: it is equal to
# the product of the shape's component values.
torch.tensor(x.shape).prod()

In [None]:
# PyTorch also provides a dedicated method for this

x.numel()

The number of elements contained within a tensor is important for reshaping because the reshaping must account for the total number of elements present. Reshaping changes the tensor's shape but not the underlying data. Our tensor has 12 elements, so any reshaping must account for exactly 12 elements.
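A short sketch of that rule: a `-1` in the target shape is inferred from the element count, and a shape whose product differs from 12 is rejected with a `RuntimeError`.

```python
import math
import torch

x = torch.randn(3, 4)
print(x.numel(), math.prod(x.shape))   # both count the elements

ok = x.reshape(2, -1)   # -1 is inferred as 12 / 2 = 6
print(ok.shape)

failed = False
try:
    x.reshape(5, 3)     # 15 elements requested, but x only has 12
except RuntimeError as err:
    failed = True
    print("reshape failed:", err)
```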

In [None]:
# With Rank 2
x.reshape([1,12]), x.reshape([2,6]), x.reshape([3,-1]), x.reshape([-1,3]) 

In [None]:
# Increase the rank to 3

x.reshape(2,2,3), len(x.reshape(2,2,3).shape)

In [None]:
x # reshape() returned new tensors; x itself keeps its original shape

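PyTorch has another method, `view()`, that also reshapes a tensor. Unlike `reshape()`, `view()` never copies data: the result always shares memory with the original tensor, so it only works when the requested shape is compatible with the tensor's memory layout (for example, when the tensor is contiguous). `reshape()` returns a view when it can and a copy otherwise.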

In [None]:
y = x.view(12)

z = x.view(-1, 6)  # the size -1 is inferred from other dimensions

print(x.size(), y.size(), z.shape)

In [None]:
print(x, "\n"*2, y, "\n"*2, z)
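The difference between `view()` and `reshape()` shows up with non-contiguous tensors. In this sketch, a transpose (`.t()`) produces such a tensor: `view()` raises an error on it, while `reshape()` falls back to copying.

```python
import torch

x = torch.arange(12).reshape(3, 4)

# view() never copies: the result shares memory with x
v = x.view(12)
v[0] = 100
print(x[0, 0])                 # x changed too

# A transpose is not contiguous in memory, so view() cannot express it
xt = x.t()
print(xt.is_contiguous())
view_failed = False
try:
    xt.view(12)
except RuntimeError:
    view_failed = True

# reshape() copies when a view is impossible
flat = xt.reshape(12)
print(view_failed, flat)
```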

**Changing Shape By Squeezing And Unsqueezing:**  

Another way we can change the shape of our tensors is by `squeezing` and `unsqueezing` them.

- Squeezing a tensor removes the dimensions or axes that have a length of one.
- Unsqueezing a tensor adds a dimension with a length of one.  

These functions allow us to expand or shrink the rank (number of dimensions) of our tensor.  

In [None]:
x = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

print(x.reshape([1,12]))
print(x.reshape([1,12]).shape)

In [None]:
# squeeze
print(x.reshape([1,12]).squeeze(), "\n", 
      x.reshape([1,12]).squeeze().shape)

In [None]:
print(x.reshape([1, 3, 4]).squeeze(), "\n", 
      x.reshape([1, 3, 4]).squeeze().shape)

In [None]:
print(x.reshape([3, 4, 1]).squeeze(), "\n", 
      x.reshape([3, 4, 1]).squeeze().shape)

In [None]:
# unsqueeze
print(x.reshape([3, 4, 1]).squeeze().unsqueeze(dim=0), "\n", 
      x.reshape([3, 4, 1]).squeeze().unsqueeze(dim=0).shape)

In [None]:
# unsqueeze
print(x.reshape([3, 4, 1]).squeeze().unsqueeze(dim=1), "\n", 
      x.reshape([3, 4, 1]).squeeze().unsqueeze(dim=1).shape)
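Both operations also accept an explicit dimension. This sketch shows that `squeeze(dim)` removes only that axis (and leaves the tensor unchanged if its length is not 1), and that indexing with `None` is an alternative way to insert a leading axis.

```python
import torch

t = torch.zeros(1, 3, 1, 4)

print(t.squeeze().shape)       # all length-1 axes removed
print(t.squeeze(0).shape)      # only axis 0 removed
print(t.squeeze(1).shape)      # axis 1 has length 3, so nothing happens
print(t.unsqueeze(-1).shape)   # negative dims count from the end
print(torch.zeros(3, 4)[None].shape)  # indexing with None inserts an axis at 0
```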

Let’s look at a common use case for squeezing a tensor by building a flatten function.

**Flatten A Tensor**  

A flatten operation reshapes a tensor into a rank-1 tensor whose single axis has a length equal to the number of elements in the tensor. This is the same thing as a 1d-array of elements.

In [None]:
# Since the argument t can be any tensor, we pass -1 as the second argument to the reshape() function. 
# In PyTorch, the -1 tells the reshape() function to figure out what the value should be based on the number of 
# elements contained within the tensor.

def flatten(t):
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t

In [None]:
t = torch.ones(4, 3)
print(t.shape, "\n", flatten(t), "\n", flatten(t).shape)
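One edge case is worth knowing: because `squeeze()` removes every axis of length 1, the hand-written `flatten()` turns a one-element tensor into a 0-d tensor. This sketch repeats the function from above and compares it with `reshape(-1)` and the built-in `flatten()`, which both keep one axis.

```python
import torch

def flatten(t):               # the flatten() defined above
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t

one = torch.ones(1, 1)
print(flatten(one).shape)     # squeeze() removed both axes: a 0-d tensor
print(one.reshape(-1).shape)  # reshape(-1) keeps one axis of length 1
print(one.flatten().shape)    # so does the built-in flatten()
```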

**Concatenating Tensors:**  

We combine tensors using the `cat()` function; the shape of the resulting tensor depends on the shapes of the input tensors and on the dimension along which they are concatenated.

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
])
t2 = torch.tensor([
    [5,6],
    [7,8]
])

In [None]:
# We can combine t1 and t2 row-wise (axis-0) in the following way:
torch.cat((t1, t2), dim=0), torch.cat((t1, t2), dim=0).shape

In [None]:
# We can combine t1 and t2 column-wise (axis-1) in the following way:
torch.cat((t1, t2), dim=1), torch.cat((t1, t2), dim=1).shape
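The inputs do not need identical shapes: only the dimensions other than the concatenation dimension must match. A sketch with tensors of different row counts:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(4, 3)

# Along dim=0 only the other dimensions (here: 3 columns) must match
rows = torch.cat((a, b), dim=0)
print(rows.shape)              # lengths along dim 0 add up: 2 + 4

# Along dim=1 the row counts (2 vs 4) differ, so cat() refuses
cat_failed = False
try:
    torch.cat((a, b), dim=1)
except RuntimeError:
    cat_failed = True
print(cat_failed)
```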

**Practical example for flattening:**  
Let’s look now at a handwritten image of an eight from the MNIST dataset. This image has 2 distinct dimensions, height and width, of `18 x 18`. These dimensions tell us that this is a `cropped image`, because the MNIST dataset contains `28 x 28` images.  
Let’s see now how these two axes of height and width are flattened out into a single axis of length `324` (18 x 18 = 324).

<img src="assets/eight_13_13.PNG" width=300px>  

The image below shows our flattened output with a single axis of length 324. The white on the edges corresponds to the white at the top and bottom of the image.

<img src="assets/eight_flatten.PNG" width=900px>  

In this example, we are flattening the entire tensor image, but what if we want to flatten only specific axes within the tensor? This is typically required when working with CNNs. Let’s see how we can flatten out specific axes of a tensor in code with PyTorch.

Tensor inputs to a convolutional neural network (CNN) typically have 4 axes: one for batch size, one for color channels, and one each for height and width, i.e. **(Batch Size, Channels, Height, Width)**.

In [None]:
# Building A Tensor Representation For A Batch Of Gray Images

t1 = torch.tensor([
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1]
])

t2 = torch.tensor([
    [2,2,2,2],
    [2,2,2,2],
    [2,2,2,2],
    [2,2,2,2]
])

t3 = torch.tensor([
    [3,3,3,3],
    [3,3,3,3],
    [3,3,3,3],
    [3,3,3,3]
])

So we have three rank-2 tensors. For our purposes here, we’ll consider these to be three `4 x 4` images. We will use them to create a batch that can be passed to a CNN.

Remember, batches are represented using a single tensor, so we’ll need to combine these three tensors into a single larger tensor that has three axes instead of two.

In [None]:
t = torch.stack((t1, t2, t3))
t.shape

We have used the `stack()` method to concatenate our sequence of three tensors along a new axis. Since we have three tensors along a new axis, we know the length of this axis should be 3, and indeed, the shape shows 3 tensors, each with a height and width of 4.

The axis with a length of 3 represents the batch size while the axes of length 4 represent the height and width respectively.
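To see how `stack()` relates to `cat()`, this sketch rebuilds three `4 x 4` tensors (with `torch.full`, an illustrative shortcut for the literal tensors above) and checks that stacking equals concatenating after adding a new leading axis to each input.

```python
import torch

t1 = torch.full((4, 4), 1, dtype=torch.int64)
t2 = torch.full((4, 4), 2, dtype=torch.int64)
t3 = torch.full((4, 4), 3, dtype=torch.int64)

stacked = torch.stack((t1, t2, t3))              # new axis 0 of length 3
via_cat = torch.cat([t.unsqueeze(0) for t in (t1, t2, t3)], dim=0)

print(stacked.shape, torch.equal(stacked, via_cat))
print(torch.cat((t1, t2, t3)).shape)             # cat alone just lengthens axis 0
```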

In [None]:
t

So, we have a rank-3 tensor that contains a batch of three 4 x 4 images. Now we need to convert this tensor into the form a CNN expects, i.e. with an additional axis for the color channels.  

We basically have an implicit single color channel for each of these image tensors, so in practice, these would be grayscale images.

A CNN will expect to see an explicit color channel axis, so let’s add one by reshaping this tensor.

In [None]:
t = t.reshape(3,1,4,4)
t

- We have specified an axis of length 1 right after the batch size axis, followed by the height and width axes of length 4.
- Notice how the additional axis of length 1 doesn’t change the number of elements in the tensor. This is because the product of the component values doesn't change when we multiply by one.

- The first axis has 3 elements. Each element of the first axis represents an image. For each image, we have a single color channel on the channel axis. Each of these channels contains 4 arrays that contain 4 numbers or scalar components.
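The channel axis can also be added with `unsqueeze(1)`, which avoids hard-coding the sizes. A sketch, rebuilding the same batch with `torch.full` for brevity:

```python
import torch

t = torch.stack([torch.full((4, 4), i, dtype=torch.int64) for i in (1, 2, 3)])  # (3, 4, 4)

with_channel = t.reshape(3, 1, 4, 4)
print(torch.equal(with_channel, t.unsqueeze(1)))   # same result
print(t.numel(), with_channel.numel())              # element count is unchanged
```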

Let’s see this with code by indexing into this tensor.

In [None]:
# We have the first image
t[0]

In [None]:
# First channel in the first image
t[0][0]

In [None]:
# First row of pixels in the first channel of the first image
t[0][0][0]

In [None]:
#  First pixel value in the first row of the first channel of the first image.
t[0][0][0][0]
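Chained `[]` lookups like the ones above can also be written as a single index tuple. This sketch checks the equivalence and uses `.item()` to get a plain Python number.

```python
import torch

t = torch.stack([torch.full((4, 4), i, dtype=torch.int64) for i in (1, 2, 3)]).unsqueeze(1)  # (3, 1, 4, 4)

# A single index tuple is equivalent to chained [] lookups
print(torch.equal(t[0][0][0][0], t[0, 0, 0, 0]))
print(t[0, 0, 0, 0].item())     # .item() turns the 0-d result into a Python number
print(t[2, 0].shape)            # the single channel of the third image
```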

**Flattening The Tensor Batch:** 

Let’s see how to flatten the images in this batch. Remember the whole batch is a single tensor that will be passed to the CNN, so we don’t want to flatten the whole thing. We only want to flatten the image tensors within the batch tensor.

Let’s flatten the whole thing first just to see what it will look like.

In [None]:
# The flatten() function we wrote above
print(flatten(t), "\n", flatten(t).shape)

What we should notice about this output is that we have flattened the entire batch, and this smashes all the images together into a single axis.

Remember, the ones represent the pixels from the first image, the twos from the second image, and the threes from the third.  
This flattened batch won’t work well inside our CNN because we need individual predictions for each image within our batch tensor, and now we have a flattened mess.

So what is the solution now?  
The solution is to flatten each image while still maintaining the batch axis. This means we want to flatten only part of the tensor: the color channel axis together with the height and width axes, i.e. (C, H, W).


In [None]:
# Here we are using PyTorch's built-in flatten() method

t.flatten(start_dim=1).shape

In [None]:
t.flatten(start_dim=1)

Notice here how we have specified the `start_dim` parameter. This tells the `flatten()` method at which axis it should start the flatten operation. The one here is an index, so it refers to the second axis, which is the color channel axis. We skip over the batch axis, leaving it intact.

Checking the shape, we can see that we have a rank-2 tensor with three single-color-channel images that have each been flattened out into 16 pixels.
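Two equivalent ways to keep the batch axis while flattening the rest are `reshape(batch_size, -1)` and the `torch.nn.Flatten` layer, whose default `start_dim` is 1. A sketch comparing all three on the same batch:

```python
import torch
import torch.nn as nn

t = torch.stack([torch.full((4, 4), i, dtype=torch.int64) for i in (1, 2, 3)]).unsqueeze(1)  # (3, 1, 4, 4)

a = t.flatten(start_dim=1)
b = t.reshape(t.shape[0], -1)     # keep the batch axis, merge everything else
c = nn.Flatten()(t)               # nn.Flatten defaults to start_dim=1
print(a.shape, torch.equal(a, b), torch.equal(a, c))
```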

**Flattening An RGB Image:**  
If we flatten an RGB image, what happens to the color channels?

> Each color channel will be flattened first. Then, the flattened channels will be lined up side by side on a single axis of the tensor. Let's look at an example in code.

We'll build an example RGB image tensor with a height of two and a width of two.

In [None]:
r = torch.ones(1,2,2)
g = torch.ones(1,2,2) + 1
b = torch.ones(1,2,2) + 2

img = torch.cat(
    (r,g,b),
    dim=0
)

In [None]:
# This should be our desired tensor. We can verify this by checking the shape like so:
img.shape

In [None]:
# We have three color channels with a height and width of two. We can also inspect this tensor's data like so:
img

In [None]:
# We can see how this will look by flattening the image tensor.
# In this case, we are flattening the whole image.

img.flatten(start_dim=0)

In [None]:
# We can also flatten each channel separately, keeping the channel axis:

img.flatten(start_dim=1)
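This sketch confirms the channel-by-channel ordering described above. For contrast, it also moves the channel axis last with `permute(1, 2, 0)` before flattening (an extra step not used elsewhere in this notebook), which interleaves the values pixel by pixel as (R, G, B) triples.

```python
import torch

r = torch.ones(1, 2, 2)
g = torch.ones(1, 2, 2) + 1
b = torch.ones(1, 2, 2) + 2
img = torch.cat((r, g, b), dim=0)          # (C, H, W) = (3, 2, 2)

print(img.flatten().tolist())              # all red, then all green, then all blue
print(img.flatten(start_dim=1).shape)      # one flattened row per channel

# Moving channels last first gives per-pixel (R, G, B) triples instead
print(img.permute(1, 2, 0).flatten().tolist())
```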