#### PyTorch has emerged as a major contender in the race to be the king of deep learning frameworks.

In this notebook I will go over some commonly used PyTorch snippets and techniques.

#### Compute basic gradients from the sample tensors using PyTorch

#### First, some PyTorch basics

**Autograd**: `torch.autograd` is PyTorch's engine for calculating derivatives (vector-Jacobian products, to be more precise). It records every operation performed on a gradient-enabled tensor in a directed acyclic graph called the dynamic computational graph. The leaves of this graph are the input tensors and the roots are the output tensors. Gradients are calculated by tracing the graph from the roots to the leaves and multiplying the local gradients along the way using the chain rule.
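To make the chain rule concrete, here is a minimal sketch with a single scalar input. The names `x`, `y`, `z` and the values are illustrative choices, not part of the notebook's later example.

```python
import torch

# Leaf tensor: an input for which we want a gradient
x = torch.tensor(2.0, requires_grad=True)

# Forward pass: autograd records each operation as it runs
y = x ** 2        # local derivative dy/dx = 2x
z = 3 * y + 1     # local derivative dz/dy = 3

# Backward pass: walk from the root (z) back to the leaf (x),
# multiplying local derivatives: dz/dx = 3 * 2x = 12 at x = 2
z.backward()

print(x.grad)     # tensor(12.)
```

The graph here has one leaf (`x`) and one root (`z`); nothing is differentiated until `backward()` is called.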

The `Variable` class wraps a tensor. You can access this tensor through the `.data` attribute of a Variable.

The Variable also stores the gradient of a scalar quantity (say, the loss) with respect to the parameter it holds. This gradient can be accessed through the `.grad` attribute. It is the gradient computed up to this particular node, and the gradient at every subsequent node can be computed by multiplying the edge weight (the local derivative) by the gradient computed at the node just before it.

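The third attribute a Variable holds is `grad_fn`, a Function object that records the operation which created the Variable. For tensors created directly by the user (leaf nodes), `grad_fn` is `None`.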
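A short sketch of these three attributes, written with plain tensors (which carry the same attributes in PyTorch 0.4 and later). The tensors `a` and `b` are made up for illustration.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)  # created by the user: a leaf
b = (a * 3).sum()                                  # created by operations on a

print(a.is_leaf, a.grad_fn)               # True None
print(b.is_leaf, b.grad_fn is not None)   # False True

# .data gives the underlying values without gradient tracking
print(a.data)                             # tensor([1., 2.])
print(a.data.requires_grad)               # False
```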

**Variable**: A Variable, just like a Tensor, is a class used to hold data. It differs, however, in the way it is meant to be used. Variables are tailored to hold values that change during the training of a neural network, i.e. the learnable parameters of the network. Tensors, on the other hand, are used to store values that are not learned; for example, a Tensor may be used to store the loss generated by each example. Note that since PyTorch 0.4 the two classes are merged: a tensor created with `requires_grad=True` plays the role of a Variable, and calling `Variable(...)` simply returns a Tensor.

Every **Variable** object has several members; one of them is **grad**:

**grad**: `grad` holds the value of the gradient. If `requires_grad` is False, it holds `None`. Even if `requires_grad` is True, it holds `None` until `.backward()` is called from some other node. For example, if you call `out.backward()` for some variable `out` that involved `x` in its calculations, then `x.grad` will hold ∂out/∂x.
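The behaviour of `.grad` described above can be checked with a small sketch. The tensors `x`, `c`, and `out` are hypothetical examples.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
c = torch.tensor([4.0, 5.0, 6.0])    # requires_grad defaults to False

print(x.grad)        # None: backward() has not been called yet

out = (x * c).sum()  # scalar that involves x in its calculation
out.backward()

print(x.grad)        # tensor([4., 5., 6.]), since d(out)/dx = c
print(c.grad)        # None: c does not require gradients
```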

**Backward() function**
`backward()` is the function that actually calculates the gradient by passing its argument (a 1x1 unit tensor by default) through the backward graph all the way to every leaf node traceable from the calling root tensor. The calculated gradients are then stored in `.grad` of every leaf node. Remember, the backward graph is already built dynamically during the forward pass; `backward()` only calculates the gradients using that graph and stores them in the leaf nodes.
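Two details follow from this description and are easy to miss: a non-scalar root needs an explicit argument to `backward()`, and gradients accumulate in `.grad` across calls. The sketch below illustrates both; the vector `v` is an arbitrary choice for demonstration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Non-scalar output: backward() needs an explicit vector v and
# computes the vector-Jacobian product, here 2 * x * v elementwise
y = x ** 2
v = torch.tensor([1.0, 0.1, 0.01])
y.backward(v)
first_grad = x.grad.clone()
print(first_grad)

# Gradients accumulate across backward() calls until they are reset
x.grad.zero_()
(x ** 2).sum().backward()
(x ** 2).sum().backward()
print(x.grad)    # twice 2 * x, i.e. values 4, 8, 12
```

This accumulation is why training loops reset the gradient after every update.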

In [2]:
import torch
from torch.autograd import Variable

def forward(x):
    # Linear model with a single weight and no bias
    return x * w

# In PyTorch 0.4+ Variable(...) simply returns a Tensor.
w = Variable(torch.Tensor([1.0]), requires_grad=True)
# With requires_grad=True, autograd tracks every operation applied to w
# in a dynamic computation graph (DCG) that is built during the forward pass.
# When you finish your computation you can call .backward() and have
# all the gradients computed automatically. The gradient for this tensor
# is accumulated into its .grad attribute.

# Now create the training data.
# By PyTorch's design, gradients can only be calculated
# for floating point tensors, which is why the values are written as floats.
x_data = [11.0, 22.0, 33.0]
y_data = [21.0, 14.0, 64.0]

def loss_function(x, y):
    # Squared error for a single sample
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)


# Now run the training loop: one gradient step per sample.
# Note: for these unscaled inputs a step size of 0.01 is too large,
# so the loss grows instead of shrinking (see the output below).
for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        l = loss_function(x_val, y_val)
        l.backward()
        print("\tgrad: ", x_val, y_val, w.grad.data[0])
        w.data = w.data - 0.01 * w.grad

        # Manually set the gradient to zero after updating weights,
        # otherwise the next backward() call would add to it
        w.grad.data.zero_()

        print('progress: ', epoch, l.data[0])

	grad:  11 21 tensor(-220.)
progress:  0 tensor(100.)
	grad:  22 14 tensor(2481.6001)
progress:  0 tensor(3180.9602)
	grad:  33 64 tensor(-51303.6484)
progress:  0 tensor(604238.8125)
	grad:  11 21 tensor(118461.7578)
progress:  1 tensor(28994192.)
	grad:  22 14 tensor(-671630.6875)
progress:  1 tensor(2.3300e+08)
	grad:  33 64 tensor(13114108.)
progress:  1 tensor(3.9481e+10)
	grad:  11 21 tensor(-30279010.)
progress:  2 tensor(1.8943e+12)
	grad:  22 14 tensor(1.7199e+08)
progress:  2 tensor(1.5279e+13)
	grad:  33 64 tensor(-3.3589e+09)
progress:  2 tensor(2.5900e+15)
	grad:  11 21 tensor(7.7553e+09)
progress:  3 tensor(1.2427e+17)
	grad:  22 14 tensor(-4.4050e+10)
progress:  3 tensor(1.0023e+18)
	grad:  33 64 tensor(8.6030e+11)
progress:  3 tensor(1.6991e+20)
	grad:  11 21 tensor(-1.9863e+12)
progress:  4 tensor(8.1519e+21)
	grad:  22 14 tensor(1.1282e+13)
progress:  4 tensor(6.5750e+22)
	grad:  33 64 tensor(-2.2034e+14)
progress:  4 tensor(1.1146e+25)
	grad:  11 21 tensor(5.0875e+14
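
The output above grows without bound because each update rescales the error in `w` by a factor of `1 - 2 * lr * x**2`, which has magnitude greater than 1 for every `x` in the data when `lr = 0.01`. As a hedged sketch, the same loop with a smaller step size stays bounded. The value `lr = 1e-4` is an assumption chosen so that `2 * lr * x**2 < 1` for the largest input, and the update is written with `torch.no_grad()` instead of `.data`.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
x_data = [11.0, 22.0, 33.0]
y_data = [21.0, 14.0, 64.0]
lr = 1e-4   # assumed step size; 2 * lr * 33**2 is about 0.22

for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        loss = (x_val * w - y_val) ** 2
        loss.backward()
        with torch.no_grad():      # update w without recording the step
            w -= lr * w.grad
        w.grad.zero_()             # reset the accumulated gradient
    print(epoch, w.item(), loss.item())
```

With this step size every update moves `w` part of the way toward `y / x` for the current sample, so `w` stays between the smallest and largest of those ratios. A single weight cannot fit all three samples exactly, so the loss does not reach zero.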

Weight initialization is an important task in training a neural network,
whether it is a convolutional neural network
(CNN), a deep neural network (DNN), or a recurrent neural network
(RNN). Let's look at some examples of initializing the weights.


Weight initialization can be done using various methods, including
random weight initialization.
Weight initialization based on a distribution
is done using
- Uniform distribution,
- Bernoulli distribution,
- Multinomial distribution, and
- Normal distribution.

To train a neural network, a set of initial weights is needed so that the
forward pass can compute the loss (and hence, the accuracy); backpropagation
then uses the gradients of that loss to update the weights. The selection of
an initialization method depends on the data type, the task, and the
optimization required for the model.
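As a minimal sketch of distribution-based initialization, the functions in `torch.nn.init` fill an existing parameter in place. The layer size and the ranges below are illustrative assumptions, not recommendations from this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)              # fixed seed so the draw is repeatable

layer = nn.Linear(4, 3)           # hypothetical layer: 4 inputs, 3 outputs

# Uniform initialization on [-0.1, 0.1]
nn.init.uniform_(layer.weight, a=-0.1, b=0.1)
uniform_weights = layer.weight.detach().clone()
print(uniform_weights)

# Normal initialization with mean 0 and standard deviation 0.02
nn.init.normal_(layer.weight, mean=0.0, std=0.02)
print(layer.weight)

# Biases are commonly set to zero
nn.init.zeros_(layer.bias)
```

Each call overwrites the previous values, so in practice only one scheme is applied per parameter.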

To sample from a Bernoulli distribution, we first create a 4x4 matrix of probabilities drawn from a uniform distribution on [0, 1) and pass it to `torch.bernoulli`, which returns a 4x4 tensor of 0s and 1s, as follows.

In [None]:
torch.bernoulli(torch.Tensor(4, 4).uniform_(0, 1))
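
A hedged variant of the same idea, using `torch.rand` to build the probability matrix and a fixed seed so the result is repeatable. The names `probs` and `samples` are introduced here for illustration.

```python
import torch

torch.manual_seed(0)

probs = torch.rand(4, 4)            # each entry is a probability in [0, 1)
samples = torch.bernoulli(probs)    # entry (i, j) is 1 with probability probs[i, j]
print(samples)

# Extreme probabilities give deterministic results
all_zero = torch.bernoulli(torch.zeros(2, 2))   # every entry is 0
all_one = torch.bernoulli(torch.ones(2, 2))     # every entry is 1
print(all_zero, all_one)
```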


### The generation of sample random values from a multinomial distribution

Note the signature of the `multinomial` function from the official documentation:

```python
torch.multinomial(input, num_samples, replacement=False, *, generator=None, out=None) → LongTensor
```
Returns a tensor where each row contains num_samples indices sampled from the multinomial probability distribution located in the corresponding row of tensor input.



In [None]:
# The entries act as unnormalized weights; they do not need to sum to one
sample_tensor = torch.tensor([10., 10., 13., 10., 34., 45., 65., 67., 87., 89., 87., 34.])
# Draw 3 distinct indices, favouring positions with larger weights
torch.multinomial(sample_tensor, 3)
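
The sketch below explores the `replacement` parameter and the row-wise behaviour mentioned in the description. The tensor names and the 2-D `batch` example are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)

weights = torch.tensor([10., 10., 13., 10., 34., 45., 65., 67., 87., 89., 87., 34.])

# Without replacement (the default): the 3 indices are distinct
idx = torch.multinomial(weights, 3)
print(idx)

# With replacement: an index may repeat, and num_samples
# may exceed the number of categories
idx_rep = torch.multinomial(weights, 20, replacement=True)
print(idx_rep)

# 2-D input: each row is treated as its own distribution.
# Each row here has a single nonzero weight, so the draw is deterministic.
batch = torch.tensor([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0]])
rows = torch.multinomial(batch, 1)
print(rows)
```

The returned values are indices into `weights`, not the weights themselves, which is why the result is a `LongTensor`.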

