<a href="https://colab.research.google.com/github/jay05Hawk/Pytorch_all/blob/main/Pytorch_1%2C0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What is PyTorch?

PyTorch is a Python-based scientific computing package that uses the power of graphics processing units (GPUs). It is also one of the preferred deep learning research platforms, built to provide maximum flexibility and speed. It is known for two high-level features: tensor computation with strong GPU acceleration, and deep neural networks built on a tape-based autograd system.

There are many Python libraries with the potential to change how deep learning and artificial intelligence are done, and PyTorch is one of them. One of the key reasons behind PyTorch’s success is that it is completely Pythonic, which makes building neural network models straightforward. It is still a young player compared to its competitors, but it is gaining momentum fast.

## A Brief History of PyTorch

Since its public release in January 2017, researchers have increasingly adopted PyTorch. It has quickly become a go-to library because it makes building extremely complex neural networks easy. It is giving TensorFlow tough competition, especially in research work. However, it may still take some time before it is adopted by the masses, because it still carries “new” and “under construction” tags.

PyTorch’s creators envisioned the library as highly imperative, so that numerical computations run immediately. This fits the Python programming style well. It lets deep learning scientists, machine learning developers, and neural network debuggers run and test parts of their code in real time, so they don’t have to wait for the entire program to execute to check whether it works.
You can always use your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when required. Now you might ask, why PyTorch? What’s so special about using it to build deep learning models?

The answer is quite simple: PyTorch is a dynamic library (very flexible, and adaptable to your requirements as they change) that is currently used by many researchers, students, and artificial intelligence developers. In a recent Kaggle competition, PyTorch was used by nearly all of the top 10 finishers.

Some of the key highlights of PyTorch include:

__Simple Interface:__ It offers an easy-to-use API, so it is as simple to operate and run as Python itself.

__Pythonic in nature:__ This library, being Pythonic, smoothly integrates with the Python data science stack. Thus it can leverage all the services and functionalities offered by the Python environment.

__Computational graphs:__ PyTorch builds dynamic computational graphs, so you can change them at runtime. This is highly useful when you do not know in advance how much memory a neural network model will require.
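
To make "dynamic" concrete, here is a minimal sketch in which the number of operations recorded in the graph depends on the data at run time. The function name `forward` and the threshold of 10 are assumptions chosen only for this illustration.

```python
import torch

def forward(x):
    # The loop length is decided while the code runs, so the recorded
    # graph can differ from one input to the next.
    steps = 0
    while x.norm() < 10:
        x = x * 2
        steps += 1
    return x, steps

x = torch.tensor([1.0, 1.0], requires_grad=True)
y, steps = forward(x)
y.sum().backward()       # gradients flow through however many steps were taken
print(steps, x.grad)
```

A different starting `x` would take a different number of doublings, and autograd would still differentiate through exactly the operations that were executed.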

### <mark style="">__Tensors__</mark>

A Tensor is similar to <mark style="background-color: Green">__NumPy's ndarray__</mark>, with the additional benefit that Tensors can be used on GPUs to accelerate computing.

In [None]:
from __future__ import print_function
import torch

__Note:__ An uninitialized matrix is declared but does not contain definite known values before it is used. When an uninitialized matrix is created, whatever values were already in the allocated memory appear as its initial values.

$\color{blue}{\text{Construct 6x3 matrix, uninitialized:}}$

In [None]:
a = torch.empty(6,3)
print(a)

$\color{green}{\text{Construct a randomly initialized matrix:}}$

In [None]:
a = torch.rand(4,3)
print(a)

$\color{green}{\text{Construct a matrix filled with zeros and of dtype long:}}$

In [None]:
a = torch.zeros(4,3, dtype=torch.long)
print(a)

In [None]:
#Construct a tensor with data:
a = torch.tensor([7.8, 5])
type(a)

Or we can create a new tensor from an existing tensor. These methods reuse the properties of the input tensor, e.g. dtype, unless we provide new values.

In [None]:
a = a.new_ones(6,5, dtype=torch.double)    # new methods take in sizes
print(a)

a = torch.randn_like(a, dtype=torch.float)  # override dtype
print(a)                                    # result will be the same size

print(a.size())

Note: <mark style="background-color: Yellow">torch.Size</mark> is actually a tuple, so it supports all tuple operations.
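
As a small sketch of what tuple behaviour means in practice, the object returned by `.size()` can be unpacked, indexed, and compared like any other tuple. The variable names here are only illustrative.

```python
import torch

t = torch.zeros(6, 5)
size = t.size()

# The size object subclasses tuple, so the usual tuple operations apply
rows, cols = size                 # unpacking
print(isinstance(size, tuple))
print(rows, cols, len(size))
print(size == (6, 5))             # comparison with a plain tuple
```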

### Operations

There are multiple syntaxes for operations. The following examples use the addition operation.


In [None]:

#Addition: syntax1
b = torch.rand(6,5)
print(a + b)

In [None]:

#Addition: syntax2
print(torch.add(a,b))

In [None]:
#Addition: providing an output as an argument
result = torch.empty(6, 5)
torch.add(a, b, out=result)
print(result)

In [None]:
#Addition: in place

# adds a to b
b.add_(a)                 # the result is stored in b, the tensor the method is called on
print(b)

__Note:__ Any operation that mutates a tensor in-place is post-fixed with an <mark style="highlight: red">_</mark>. For example: a.copy_(b) and a.t_() will change a.
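
The difference between the two forms can be seen side by side in a short sketch (the tensors and values are made up for the illustration):

```python
import torch

a = torch.ones(2, 2)
b = torch.full((2, 2), 3.0)

c = a.add(b)     # out-of-place: returns a new tensor, a is left unchanged
a_before = a.clone()
a.add_(b)        # in-place: a itself now holds a + b
print(a_before)
print(a)
```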

In [None]:
print(a[:,2])

__Resizing:__ To resize or reshape a tensor, use <mark style="background-color: Yellow">tensor.view</mark>:

In [None]:
a.view(30)  # a has shape (6, 5), so the flattened view must hold 6 * 5 = 30 elements

In [None]:
a = torch.randn(3, 3)
b = a.view(9) # flatten all 9 elements of the matrix into a 1-D tensor
c = a.view(-1, 9)  # the size -1 is inferred from other dimensions
print(a.size(), b.size(), c.size())
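
One detail worth knowing is that a view does not copy data: it shares storage with the tensor it was created from. The sketch below, using illustrative values, writes through the view and reads the change back from the original.

```python
import torch

a = torch.arange(9.0).reshape(3, 3)
flat = a.view(9)      # same underlying storage, different shape
flat[0] = 100.0       # writing through the view...
print(a[0, 0])        # ...is visible in the original tensor
```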

If you have a one-element tensor, use <mark style="background-color: Yellow">.item()</mark> to get the value as a Python number.


In [None]:
a = torch.randn(1)
print(a)
print(a.item())

### NumPy Bridge

Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array will share their underlying memory locations (if the Torch Tensor is on CPU), and changing one will change the other.

__Converting a Torch tensor to NumPy array__

In [None]:
x = torch.ones(4)
print(x)

In [None]:
y = x.numpy()
print(y)

__See how the NumPy array changed in value__

In [None]:
x.add_(1)
print(x)
print(y)

__Converting NumPy array to Torch tensor__

Let's see how changing the NumPy array changes the Torch Tensor automatically.

In [None]:
import numpy as np
f = np.ones(4)
g = torch.from_numpy(f)
np.add(f, 1, out=f)
print(f)
print(g)
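
For contrast, here is a minimal sketch of the difference between sharing and copying. It assumes the usual behaviour that `torch.tensor(...)` copies its input while `torch.from_numpy(...)` shares memory with it.

```python
import numpy as np
import torch

arr = np.ones(3)
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # makes an independent copy

arr += 1                        # modify the NumPy array in place
print(shared)                   # follows the change
print(copied)                   # keeps the original values
```

Use the copying form when the tensor must not be affected by later changes to the array.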

All the Tensors on the CPU except a CharTensor support converting to NumPy and back.

$\color{blue}{\text{************CUDA Tensors**********}}$

Tensors can be moved onto any device using the <mark style="background-color: Yellow">.to</mark> method.

In [None]:
# let us run this cell only if CUDA is available  --->GPU
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
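
A common variation, sketched here, picks the device once and falls back to the CPU, so the same code runs on machines with or without a GPU.

```python
import torch

# Choose the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(4, device=device)   # created directly on the chosen device
y = torch.ones_like(x)             # ones_like keeps the device of x
z = (x + y).to("cpu")              # bring the result back to the CPU
print(device, z)
```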


##  $\color{blue}{\text{AUTOGRAD : }}$ Automatic Differentiation

- Autograd is an engine for calculating derivatives. It records all the operations performed on a gradient-enabled tensor and builds a directed acyclic graph called the dynamic computational graph (DCG). The leaves of this graph are the input tensors and the roots are the output tensors. Gradients are calculated by tracing the graph from the roots to the leaves and multiplying the gradients along the way using the chain rule.
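
A tiny graph makes the leaf, root, and chain-rule vocabulary concrete. This is only a sketch with made-up values:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)  # leaf (input)
u = x * x                                  # intermediate node: u = x^2
y = 2 * u + 1                              # root (output): y = 2u + 1

y.backward()   # walk the graph from root to leaf, applying the chain rule
# dy/dx = dy/du * du/dx = 2 * 2x, which is 12 at x = 3
print(x.is_leaf, y.is_leaf)
print(x.grad)
```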

## Tensor

<mark style="background-color: Yellow">torch.Tensor</mark> is the central class of the package. If we set its attribute <mark style="background-color: Yellow">.requires_grad</mark> to <mark style="background-color: dark grey">True</mark>, it starts to track all operations on it. When you finish your computation you can call <mark style="background-color: Yellow">.backward()</mark> and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its <mark style="background-color: Yellow">.grad</mark> attribute.

To stop a tensor from tracking history, you can call <mark style="background-color: Yellow">.detach()</mark> to detach it from the computation history, and to prevent future computation from being tracked.


To prevent tracking history (and using memory), you can also wrap the code block in <mark style="background-color: Yellow">with torch.no_grad():</mark>. This can be particularly helpful when evaluating a model because the model may have trainable parameters with <mark style="background-color: Yellow">requires_grad=True</mark>, but for which we don’t need the gradients.

There's one more class which is very important in autograd implementation - a <mark style="background-color: Yellow">Function</mark>

<mark style="background-color: Yellow">Tensor</mark> and <mark style="background-color: Yellow">Function</mark> are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each tensor has a <mark style="background-color: Yellow">.grad_fn</mark> attribute that references a Function that has created the Tensor (except for Tensors created by the user - their <mark style="background-color: Yellow">grad_fn is None</mark>).

If you want to compute the derivatives, you can call <mark style="background-color: Yellow">.backward()</mark> on a <mark style="background-color: Yellow">Tensor</mark>. If the <mark style="background-color: Yellow">Tensor</mark> is a scalar (i.e. it holds a single element), you don’t need to specify any arguments to <mark style="background-color: Yellow">backward()</mark>. However, if it has more elements, you need to specify a <mark style="background-color: Yellow">gradient</mark> argument that is a tensor of matching shape.
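
The two calling conventions can be compared in a short sketch. The tensors `x1` and `x2` are illustrative; passing a `gradient` of ones to a non-scalar output gives the same result as summing the output first.

```python
import torch

x1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
(x1 ** 2).sum().backward()                 # scalar output: no argument needed

x2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y2 = x2 ** 2                               # non-scalar output, shape (3,)
y2.backward(gradient=torch.ones_like(y2))  # gradient must match the shape of y2

print(x1.grad, x2.grad)                    # both hold 2 * x
```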

In [None]:
import torch

Create a tensor and set $\color{green}{\text{requires_grad=True}}$ to track computation with it.

In [None]:
a = torch.ones(2, 2, requires_grad=True)
print(a)

In [None]:
#Do a tensor operation:
b = a + 2
print(b)

In [None]:

# b was created as a result of an operation, so it has a grad_fn
print(b.grad_fn)

In [None]:
c = b * b * 3
out = c.mean()

print(c, out)

<mark style="background-color: Yellow">.requires_grad_( ... )</mark>  changes an existing Tensor’s <mark style="background-color: Yellow">requires_grad</mark> flag in-place. The input flag defaults to False if not given.

In [None]:
p = torch.randn(3, 3)
p = ((p * 3) / (p - 1))
print(p.requires_grad)
p.requires_grad_(True)
print(p.requires_grad)
q = (p * p).sum()
print(q.grad_fn)

## Gradients

Let's backprop now. Because <mark style="background-color: magenta">out</mark> contains a single scalar, <mark style="background-color: Yellow">out.backward()</mark> is equivalent to <mark style="background-color: Yellow">out.backward(torch.tensor(1.))</mark>.

In [None]:
out.backward()

In [None]:
#Print gradients d(out)/dx
print(a.grad)

You should get a matrix filled with 4.5. Let’s call the out Tensor “o”. We have that

$o = \frac{1}{4} \sum_i c_i$, $c_i = 3(a_i+2)^2$, and $c_i|_{a_i=1} = 27$.

Therefore, $\frac{\partial o}{\partial a_i} = \frac{3}{2}(a_i+2)$,

hence
$\frac{\partial o}{\partial a_i}|_{a_i=1} = \frac{9}{2} = 4.5$
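
The derivation can be checked numerically. The sketch below rebuilds the same computation in one expression and compares the autograd result with the hand-derived gradient of 3/2 times (a_i + 2); the name `analytic` is just a label for that closed form.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
o = (3 * (a + 2) ** 2).mean()       # o = 1/4 * sum of 3 * (a_i + 2)^2
o.backward()

analytic = 1.5 * (a.detach() + 2)   # hand-derived gradient: 3/2 * (a_i + 2)
print(o.item())
print(a.grad)
print(torch.allclose(a.grad, analytic))
```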



In [None]:
#Now let's have a look at an example of vector-Jacobian product:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

Now in this case y is no longer a scalar. <mark style="background-color: Yellow">torch.autograd</mark> could not compute the full Jacobian directly, but if we just want the vector-Jacobian product, simply pass the vector to backward as argument:

In [None]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)
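
To see that `backward(v)` really yields a vector-Jacobian product, the sketch below compares it against an explicitly built Jacobian. It uses `torch.autograd.functional.jacobian` purely for the comparison, and the function `f` is a stand-in chosen for this illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return x ** 3                  # element-wise, so the Jacobian is diagonal

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

f(x).backward(v)                   # accumulates the product of v and J into x.grad
J = jacobian(f, x.detach())        # full 3x3 Jacobian, built only for comparison
print(x.grad)
print(v @ J)
```

Building the full Jacobian is expensive for large inputs, which is why the product form is what `backward` computes.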

You can also stop autograd from tracking history on Tensors with <mark style="background-color: Yellow">.requires_grad=True</mark> by wrapping the code block in <mark style="background-color: Yellow">with torch.no_grad()</mark>:

In [None]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

In [None]:
#Or by using .detach() to get a new Tensor with the same content but that does not require gradients:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

## Neural Networks

We can construct neural networks using the <mark style="background-color: Yellow">torch.nn</mark> package.

Now that you have had a glimpse of autograd, note that nn depends on autograd to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.

For example, look at this network that classifies digit images:
<img src="neural.png"> 


### convnet

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule: <mark style="background-color: light-blue">weight = weight - learning_rate * gradient</mark>
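
The steps above can be sketched end to end on a toy problem. Everything here (the linear model, the synthetic data for y = 2x + 1, the learning rate, and the number of epochs) is an assumption chosen to keep the example small; it is not the network defined later in this notebook.

```python
import torch

torch.manual_seed(0)

# Toy data for y = 2x + 1 (made up for this sketch)
inputs = torch.linspace(-1, 1, 20).unsqueeze(1)
targets = 2 * inputs + 1

model = torch.nn.Linear(1, 1)                  # 1. network with learnable parameters
learning_rate = 0.1
losses = []

for epoch in range(50):                        # 2. iterate over the dataset
    output = model(inputs)                     # 3. process input through the network
    loss = ((output - targets) ** 2).mean()    # 4. compute the loss
    model.zero_grad()                          #    clear gradients from the last step
    loss.backward()                            # 5. propagate gradients back
    with torch.no_grad():                      # 6. weight = weight - learning_rate * gradient
        for p in model.parameters():
            p -= learning_rate * p.grad
    losses.append(loss.item())

print(losses[0], losses[-1])
```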
        
### Define the network

Let's define the network:

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class network(nn.Module):

    def __init__(self):
        super(network, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = network()
print(net)

You just have to define the <mark style="background-color: yellow">forward</mark>,  and the <mark style="background-color: yellow">backward</mark> function (where gradients are computed) is automatically defined for you using autograd. You can use any of the Tensor operations in the <mark style="background-color: yellow">forward</mark> function.

The learnable parameters of a model are returned by <mark style="background-color: yellow">net.parameters()</mark>.

In [None]:
s = list(net.parameters())

In [None]:
len(s)

In [None]:
s[0].size()

In [None]:
s[1].size()

In [None]:
s[2].size()

In [None]:
s[3].size()

In [None]:
s[4].size()

In [None]:
s[5].size()

In [None]:
s[6].size()

In [None]:
s[7].size()

In [None]:
s[8].size()

In [None]:
s[9].size()

In [None]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

Let’s try a random 32x32 input. Note: expected input size of this net (LeNet) is 32x32. To use this net on the MNIST dataset, please resize the images from the dataset to 32x32.

In [None]:
inp = torch.randn(1, 1, 32, 32)
out = net(inp)
print(out)
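
The 32x32 requirement follows from the layer arithmetic. As a sketch, the snippet below traces the spatial size through two standalone convolution and pooling stages with the same settings as the network, ending at the 16 * 6 * 6 features that the first linear layer expects. The layers here are fresh copies created for the trace, not the ones inside `net`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)
x = F.max_pool2d(F.relu(conv1(x)), 2)
stage1 = x.shape                 # 32 -> 30 after the 3x3 conv, -> 15 after 2x2 pooling
x = F.max_pool2d(F.relu(conv2(x)), 2)
stage2 = x.shape                 # 15 -> 13 after the 3x3 conv, -> 6 after pooling (rounded down)
flat = x.flatten(1)              # 16 * 6 * 6 = 576 features per sample
print(stage1, stage2, flat.shape)
```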

Zero the gradient buffers of all parameters and backprop with random gradients:

In [None]:
net.zero_grad()
out.backward(torch.randn(1, 10))

__Note:__
<mark style="background-color: yellow">torch.nn</mark> only supports mini-batches. The entire <mark style="background-color: yellow">torch.nn</mark> package only supports inputs that are a mini-batch of samples, and not a single sample.
For example, <mark style="background-color: yellow">nn.Conv2d</mark> will take in a 4D Tensor of <mark style="background-color: yellow">nSamples x nChannels x Height x Width</mark>.
If you have a single sample, just use <mark style="background-color: yellow">input.unsqueeze(0)</mark> to add a fake batch dimension.
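
A minimal sketch of adding the batch dimension before a convolution, using a random tensor as a stand-in for a real image:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)

sample = torch.randn(1, 32, 32)   # one image: nChannels x Height x Width
batch = sample.unsqueeze(0)       # add a batch dimension of size 1 at position 0
out = conv(batch)
print(sample.shape, batch.shape, out.shape)
```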

### Loss Function

A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.
There are several different loss functions in the nn package. A simple one is <mark style="background-color: yellow">nn.MSELoss</mark>, which computes the mean-squared error between the input and the target.
    
For example:

In [None]:
output = net(inp)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
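
To make the definition concrete, the sketch below computes the same quantity by hand on small made-up tensors and compares it with `nn.MSELoss` using its default mean reduction.

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.0, 0.0, 5.0]])

loss = nn.MSELoss()(output, target)
manual = ((output - target) ** 2).mean()   # mean of the squared differences
print(loss.item(), manual.item())
```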

Now, if you follow loss in the backward direction, using its <mark style="background-color: yellow">.grad_fn</mark> attribute, you will see a graph of computations that looks like this:

input  ->   conv2d  ->   relu  ->   maxpool2d  ->   conv2d ->   relu  ->   maxpool2d
      
      -> view -> linear -> relu -> linear -> relu -> linear
      
      -> MSELoss
      
      -> loss

So, when we call <mark style="background-color: yellow">loss.backward()</mark>, the loss is differentiated with respect to the whole graph, and all Tensors in the graph that have requires_grad=True will have their .grad Tensor accumulated with the gradient.


For illustration, let us follow a few steps <mark style="background-color: yellow">backward</mark>:

In [None]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

### Backprop

To backpropagate the error, all we have to do is call <mark style="background-color: yellow">loss.backward()</mark>. You need to clear the existing gradients first, though, or the new gradients will be accumulated onto the existing ones.


Now we shall call <mark style="background-color: yellow">loss.backward()</mark>, and have a look at conv1’s bias gradients before and after the backward.

In [None]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
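
The accumulation behaviour is easy to see on a single tensor. In this sketch (the names `first`, `second`, and `third` are only there to capture the intermediate values), a second backward pass adds to the stored gradient until it is zeroed.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(w * 3).backward()
first = w.grad.clone()     # gradient of 3w is 3

(w * 3).backward()
second = w.grad.clone()    # the second call added another 3 on top

w.grad.zero_()             # clear the buffer, as net.zero_grad() does for a model
(w * 3).backward()
third = w.grad.clone()     # back to a single gradient

print(first, second, third)
```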

### Update the weights

The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):
   
 $\color{blue}{\text{ weight = weight - learning_rate * gradient}}$
   
We can implement this using simple Python code:

In [None]:
learning_rate = 0.01
with torch.no_grad():  # the update itself should not be tracked by autograd
    for f in net.parameters():
        f.sub_(f.grad * learning_rate)

However, as you use neural networks, you will want to use various update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, PyTorch provides a small package, torch.optim, that implements all these methods. Using it is very simple:

In [None]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(inp)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

Observe how gradient buffers had to be manually set to zero using <mark style="background-color:yellow">optimizer.zero_grad()</mark>. This is because gradients are accumulated as explained in the <mark style="font-color:red">Backprop</mark> section.
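
As a sanity check on what the optimizer does, the sketch below takes one plain SGD step in two ways, once with `optim.SGD` and once with the hand-written rule, starting from identical weights. The small `nn.Linear` model and random data are assumptions for the illustration; plain SGD here means no momentum or weight decay.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(3, 1)
model_b = copy.deepcopy(model_a)          # identical starting weights
inp = torch.randn(4, 3)
target = torch.randn(4, 1)
criterion = nn.MSELoss()
lr = 0.01

# model_a: one step driven by the optimizer
optimizer = optim.SGD(model_a.parameters(), lr=lr)
optimizer.zero_grad()
criterion(model_a(inp), target).backward()
optimizer.step()

# model_b: the same step written by hand
model_b.zero_grad()
criterion(model_b(inp), target).backward()
with torch.no_grad():
    for p in model_b.parameters():
        p -= lr * p.grad                  # weight = weight - learning_rate * gradient

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```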