
# PyTorch Basics: Tensors & Gradients

#### *Part 1 of "Pytorch: Zero to GANs"*

*This post is the first in a series of tutorials on building deep learning models with PyTorch, an open source neural networks library developed and maintained by Facebook. Check out the full series:*

1. [PyTorch Basics: Tensors & Gradients](https://jovian.ml/aakashns/01-pytorch-basics)
2. [Linear Regression & Gradient Descent](https://jovian.ml/aakashns/02-linear-regression)
3. [Image Classification using Logistic Regression](https://jovian.ml/aakashns/03-logistic-regression)
4. [Training Deep Neural Networks on a GPU](https://jovian.ml/aakashns/04-feedforward-nn)
5. [Image Classification using Convolutional Neural Networks](https://jovian.ml/aakashns/05-cifar10-cnn)
6. [Data Augmentation, Regularization and ResNets](https://jovian.ml/aakashns/05b-cifar10-resnet)
7. [Generating Images using Generative Adversarial Networks](https://jovian.ml/aakashns/06-mnist-gan)

This series attempts to make PyTorch a bit more approachable for people starting out with deep learning and neural networks. In this notebook, we’ll cover the basic building blocks of PyTorch models: tensors and gradients.

In [0]:
# Uncomment the command below if PyTorch is not installed
# !conda install pytorch cpuonly -c pytorch -y

In [0]:
import torch

## Tensors

At its core, PyTorch is a library for processing tensors. A tensor is a number, vector, matrix or any n-dimensional array. Let's create a tensor with a single number:

In [0]:
# Number
t1 = torch.tensor(4.)
t1

tensor(4.)

`4.` is a shorthand for `4.0`. It is used to indicate to Python (and PyTorch) that you want to create a floating point number. We can verify this by checking the `dtype` attribute of our tensor:

In [0]:
t1.dtype

torch.float32
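To see what the trailing dot changes, here is a minimal sketch that compares a tensor built from a Python integer with one built from a float. The names `t_int` and `t_float` are made up for this illustration, and the float result assumes PyTorch's default dtype has not been changed.

```python
import torch

# Without the trailing dot, PyTorch infers an integer dtype.
t_int = torch.tensor(4)

# With the trailing dot, PyTorch infers the default floating point dtype.
t_float = torch.tensor(4.)

print(t_int.dtype)    # torch.int64
print(t_float.dtype)  # torch.float32 (assuming the default dtype is unchanged)
```

This matters later on, because only floating point (and complex) tensors can have `requires_grad` set to `True`.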

Let's try creating slightly more complex tensors:

In [0]:
# Vector
t2 = torch.tensor([1., 2, 3, 4])
t2

tensor([1., 2., 3., 4.])

In [0]:
# Matrix
t3 = torch.tensor([[5., 6], [7, 8], [9, 10]])
t3

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])

In [0]:
# 3-dimensional array
t4 = torch.tensor([
    [[11, 12, 13], 
     [13, 14, 15]], 
    [[15, 16, 17], 
     [17, 18, 19.]]])
t4

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])

Tensors can have any number of dimensions, and different lengths along each dimension. We can inspect the length along each dimension using the `.shape` property of a tensor.

In [0]:
print(t1)
t1.shape

tensor(4.)


torch.Size([])

In [0]:
print(t2)
t2.shape

tensor([1., 2., 3., 4.])


torch.Size([4])

In [0]:
print(t3)
t3.shape

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])


torch.Size([3, 2])

In [0]:
print(t4)
t4.shape

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])


torch.Size([2, 2, 3])
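As a small sketch of two related helpers, the snippet below rebuilds `t4` and reads its number of dimensions with `ndim` and its total element count with `numel()`. It also illustrates that the lengths along a dimension must be consistent: a nested list with rows of different lengths is rejected. The flag `ragged_ok` is a hypothetical name used only to record what happened.

```python
import torch

t4 = torch.tensor([
    [[11, 12, 13],
     [13, 14, 15]],
    [[15, 16, 17],
     [17, 18, 19.]]])

print(t4.ndim)     # 3, the length of t4.shape
print(t4.numel())  # 12, the product 2 * 2 * 3

# Rows of different lengths cannot be combined into one tensor.
try:
    torch.tensor([[5., 6, 11], [7, 8], [9, 10]])
    ragged_ok = True
except ValueError as err:
    ragged_ok = False
    print("ValueError:", err)
```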

## Tensor operations and gradients

We can combine tensors with the usual arithmetic operations. Let's look at an example:

In [0]:
# Create tensors.
x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

We've created 3 tensors `x`, `w` and `b`, all numbers. `w` and `b` have an additional parameter `requires_grad` set to `True`. We'll see what it does in just a moment. 

Let's create a new tensor `y` by combining these tensors:

In [0]:
# Arithmetic operations
y = w * x + b
y

tensor(17., grad_fn=<AddBackward0>)

As expected, `y` is a tensor with the value `3 * 4 + 5 = 17`. What makes PyTorch special is that we can automatically compute the derivative of `y` w.r.t. the tensors that have `requires_grad` set to `True`, i.e. `w` and `b`. To compute the derivatives, we can call the `.backward` method on our result `y`.

In [0]:
# Compute derivatives
y.backward()

The derivatives of `y` w.r.t. the input tensors are stored in the `.grad` property of the respective tensors.

In [0]:
# Display gradients
print('dy/dx:', x.grad)
print('dy/dw:', w.grad)
print('dy/db:', b.grad)

dy/dx: None
dy/dw: tensor(3.)
dy/db: tensor(1.)


As expected, `dy/dw` has the same value as `x` i.e. `3`, and `dy/db` has the value `1`. Note that `x.grad` is `None`, because `x` doesn't have `requires_grad` set to `True`. 

The "grad" in `w.grad` is short for gradient, which is another term for derivative. The term is used mainly when dealing with vectors and matrices.
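One way to convince yourself that these values really are derivatives is to compare them with a numerical estimate. The sketch below is an illustration only: the helper `f`, the step size `eps` and the names `num_dw` and `num_db` are assumptions introduced here, and the central difference formula is a standard approximation rather than anything PyTorch does internally.

```python
import torch

def f(w, x, b):
    # Same expression as in the example above.
    return w * x + b

x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

# Derivatives computed by autograd.
f(w, x, b).backward()

# Central difference estimate: (f(v + eps) - f(v - eps)) / (2 * eps).
eps = 1e-3
with torch.no_grad():  # no graph is needed for the numerical estimate
    num_dw = (f(w + eps, x, b) - f(w - eps, x, b)) / (2 * eps)
    num_db = (f(w, x, b + eps) - f(w, x, b - eps)) / (2 * eps)

print('autograd dy/dw:', w.grad, 'numerical:', num_dw)
print('autograd dy/db:', b.grad, 'numerical:', num_db)
```

The numerical values only match approximately, because of the finite step size and `float32` rounding.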

## Interoperability with Numpy

[Numpy](http://www.numpy.org/) is a popular open source library used for mathematical and scientific computing in Python. It enables efficient operations on large multi-dimensional arrays, and has a large ecosystem of supporting libraries:

* [Matplotlib](https://matplotlib.org/) for plotting and visualization
* [OpenCV](https://opencv.org/) for image and video processing
* [Pandas](https://pandas.pydata.org/) for file I/O and data analysis

Instead of reinventing the wheel, PyTorch interoperates really well with Numpy to leverage its existing ecosystem of tools and libraries.

Here's how we create an array in Numpy:

In [0]:
import numpy as np

x = np.array([[1, 2], [3, 4.]])
x

array([[1., 2.],
       [3., 4.]])

We can convert a Numpy array to a PyTorch tensor using `torch.from_numpy`.

In [0]:
# Convert the numpy array to a torch tensor.
y = torch.from_numpy(x)
y

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)

Let's verify that the numpy array and the torch tensor have the same data type.

In [0]:
x.dtype, y.dtype

(dtype('float64'), torch.float64)

We can convert a PyTorch tensor to a Numpy array using the `.numpy` method of a tensor.

In [0]:
# Convert a torch tensor to a numpy array
z = y.numpy()
z

array([[1., 2.],
       [3., 4.]])
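One detail worth knowing about these conversions is that `torch.from_numpy` and `.numpy()` share the underlying memory instead of copying it, whereas `torch.tensor` makes a copy. The sketch below illustrates this with made-up names (`a`, `t_shared`, `t_copy`).

```python
import numpy as np
import torch

a = np.array([[1, 2], [3, 4.]])

t_shared = torch.from_numpy(a)  # shares memory with `a`
t_copy = torch.tensor(a)        # copies the data of `a`

# Modify the numpy array in place.
a[0, 0] = 100.

print(t_shared[0, 0])  # reflects the change: 100.
print(t_copy[0, 0])    # still the original value: 1.

# Converting back with .numpy() also shares memory with the tensor.
print(np.shares_memory(a, t_shared.numpy()))  # True
```

If you need an independent copy when going from a tensor to an array, one option is `t_shared.clone().numpy()`.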

## Gradient operations

`is_leaf`: A tensor is a leaf node if any of the following holds:
  - It was created explicitly by a function such as `x = torch.tensor(1.0)` or `x = torch.randn(1, 1)` (basically all the tensor creation methods discussed at the beginning of this post).
  - It is the result of operations on tensors that all have `requires_grad=False`.
  - It was created by calling the `.detach()` method on some tensor.
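The three cases above can be checked directly with the `is_leaf` attribute. This is a minimal sketch; the tensor names `a` to `e` are arbitrary and chosen only for this illustration.

```python
import torch

a = torch.tensor(1.0, requires_grad=True)  # created explicitly by the user
b = torch.randn(1, 1)                      # created explicitly, requires_grad=False
c = b * 2                                  # all inputs have requires_grad=False
d = a * 2                                  # result of an operation on a tensor that requires grad
e = d.detach()                             # detached from the computation graph

for name, t in [('a', a), ('b', b), ('c', c), ('d', d), ('e', e)]:
    print(name, 'is_leaf:', t.is_leaf, 'requires_grad:', t.requires_grad)

# a is_leaf: True  requires_grad: True
# b is_leaf: True  requires_grad: False
# c is_leaf: True  requires_grad: False
# d is_leaf: False requires_grad: True
# e is_leaf: True  requires_grad: False
```

Only `d` is a non-leaf tensor: it was produced by an operation involving a tensor that requires gradients.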
  

#### What if one or more of `x`, `w` or `b` were matrices, instead of numbers, in the above example? What would the result `y` and the gradients `w.grad` and `b.grad` look like in this case?
Ans: In this case `y` is no longer a single number, so we have to call `backward` with a tensor argument of the same shape as `y` (normally all ones if we don't want any scaling). After that, the `.grad` property of each input holds its gradient, with the same shape as that input.

In [18]:
# Create tensors.
x = torch.tensor(3., requires_grad=True)
w = torch.tensor([4.,2.], requires_grad=True)
b = torch.tensor(5., requires_grad=True)

c = w * x
y = c + b  # Addition needs no saved tensors for the backward pass; it only passes the incoming gradient on.

y.backward(torch.tensor([1.,1.]))

print('dy/dx:', x.grad) # w[0]*1 + w[1]*1 = 4*1 + 2*1 = 6
print('dy/dw:', w.grad) # 3*[1., 1.] = [3., 3.]
print('dy/db:', b.grad) # 1 + 1 = 2

dy/dx: tensor(6.)
dy/dw: tensor([3., 3.])
dy/db: tensor(2.)
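Passing a tensor of ones to `backward` gives the same gradients as first summing `y` to a single number and then calling `backward` without arguments. The sketch below checks this; the helper `make_inputs` is a hypothetical convenience function introduced here so that both variants start from fresh tensors.

```python
import torch

def make_inputs():
    # Fresh leaf tensors, so gradients from the two runs do not accumulate.
    x = torch.tensor(3., requires_grad=True)
    w = torch.tensor([4., 2.], requires_grad=True)
    b = torch.tensor(5., requires_grad=True)
    return x, w, b

# Variant 1: vector output, backward with a tensor of ones.
x1, w1, b1 = make_inputs()
(w1 * x1 + b1).backward(torch.ones(2))

# Variant 2: reduce to a scalar first, then plain backward().
x2, w2, b2 = make_inputs()
(w2 * x2 + b2).sum().backward()

print(x1.grad, x2.grad)  # both tensor(6.)
print(w1.grad, w2.grad)  # both tensor([3., 3.])
print(b1.grad, b2.grad)  # both tensor(2.)
```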


#### What if y was a matrix created using torch.tensor, with each element of the matrix expressed as a combination of numeric tensors x, w and b?

Ans: Here `y` is created explicitly with `torch.tensor`, which copies the values of `c` and `b` into a new tensor. `y` is therefore a leaf node (with `requires_grad` set to `True`) that has no connection to `x`, `w` and `b`: no operation is recorded, `grad_fn` is `None`, and calling `backward` on `y` leaves the gradients of `x`, `w` and `b` as `None`.

In [41]:
# Create tensors.
x = torch.tensor(3., requires_grad=True)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

c = w * x
y = torch.tensor([c, b], requires_grad=True)

y.backward(torch.tensor([1.,1.]))

print('dy/dx:', x.grad) # 
print('dy/dw:', w.grad) # 
print('dy/db:', b.grad) # 

print(y.grad_fn)
print(y.is_leaf)
print(y.requires_grad)

dy/dx: None
dy/dw: None
dy/db: None
None
True
True
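If the goal is a vector `y` whose elements stay connected to `x`, `w` and `b`, one option is to combine the intermediate tensors with `torch.stack` instead of `torch.tensor`. This is a swapped-in technique for illustration, not something the cell above does.

```python
import torch

x = torch.tensor(3., requires_grad=True)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

c = w * x
# torch.stack is a recorded operation, so y keeps its link to c and b.
y = torch.stack([c, b])

y.backward(torch.tensor([1., 1.]))

print('dy/dx:', x.grad)  # w = 4
print('dy/dw:', w.grad)  # x = 3
print('dy/db:', b.grad)  # 1

print(y.is_leaf)              # False
print(y.grad_fn is not None)  # True
```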


#### What if we had a chain of operations instead of just one i.e. y = x * w + b, z = l * y + m, w = c * z + d and so on? What would calling w.grad do?

Ans: In a chain of operations like this, every tensor that results from operations on other tensors (like `y`, `z` and the final `w`) is a non-leaf node. By default, the `grad` property of such tensors stays empty, because their gradients are not accumulated and are instead passed back to the tensors they were computed from. The gradients end up accumulated in the leaf nodes that the chain was built from. To inspect the gradient of a non-leaf tensor, call `.retain_grad()` on it before calling `backward`, as the code below does.

In [36]:
x = torch.tensor(3., requires_grad=True)
w = torch.tensor(4., requires_grad=True)
# w.retain_grad()
b = torch.tensor(5., requires_grad=True)

y = w * x + b
y.retain_grad()

l = torch.tensor(4., requires_grad=True)
m = torch.tensor(5., requires_grad=True)

z = l * y + m
z.retain_grad()

c = torch.tensor(4., requires_grad=True)
d = torch.tensor(4., requires_grad=True)

w = c * z + d  # Rebinds the name w: from here on, w is the final result, not the leaf tensor created above
w.retain_grad()

w.backward()

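print('dw/dx:', x.grad) # c * l * (original w) = 4 * 4 * 4 = 64
print('dw/dw:', w.grad) # w now names the final result, so this is 1
print('dw/db:', b.grad) # c * l = 4 * 4 = 16
print('dw/dl:', l.grad) # c * y = 4 * 17 = 68
print('dw/dm:', m.grad) # c = 4
print('dw/dy:', y.grad) # c * l = 4 * 4 = 16
print('dw/dc:', c.grad) # z = 73
print('dw/dz:', z.grad) # c = 4
print('dw/dd:', d.grad) # 1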

print(y.grad_fn)

dw/dx: tensor(64.)
dw/dw: tensor(1.)
dw/db: tensor(16.)
dw/dl: tensor(68.)
dw/dm: tensor(4.)
dw/dy: tensor(16.)
dw/dc: tensor(73.)
dw/dz: tensor(4.)
dw/dd: tensor(1.)
<AddBackward0 object at 0x7f3f0de76c18>
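Because the cell above reuses the name `w` for the final result, the gradient of the original weight is no longer reachable through `w.grad`. The sketch below repeats the same chain with the hypothetical name `out` for the final tensor, so the leaf `w` keeps its own gradient. It does not call `retain_grad()`, so only the leaf tensors are inspected.

```python
import torch

x = torch.tensor(3., requires_grad=True)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)
l = torch.tensor(4., requires_grad=True)
m = torch.tensor(5., requires_grad=True)
c = torch.tensor(4., requires_grad=True)
d = torch.tensor(4., requires_grad=True)

y = w * x + b    # 4 * 3 + 5 = 17
z = l * y + m    # 4 * 17 + 5 = 73
out = c * z + d  # 4 * 73 + 4 = 296; a new name, so `w` is not overwritten

out.backward()

# Chain rule: d(out)/dw = d(out)/dz * dz/dy * dy/dw = c * l * x
print('dout/dw:', w.grad)  # 4 * 4 * 3 = 48
print('dout/dx:', x.grad)  # c * l * w = 4 * 4 * 4 = 64

# The intermediate results are non-leaf tensors.
print(y.is_leaf, z.is_leaf, out.is_leaf)  # False False False
```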


