# Learning PyTorch
## What is PyTorch?
PyTorch is a Python-based scientific computing package targeted at two sets of audiences:
1. A replacement for numpy to use the power of GPUs,
2. A deep learning research platform that provides maximum flexibility and speed.

## Getting Started:
### Tensors:
Tensors are similar to numpy's ndarray, with the addition that Tensors can also be used on a GPU to accelerate computing.


Construct a $5\times3$ matrix, uninitialized:

In [1]:
import torch
x = torch.Tensor(5, 3)
print(x)


 1596.3584     0.0000  1596.3584
    0.0000     0.0000     0.0000
    0.0000     0.0000     0.0000
    0.0000     0.0000     0.0000
    0.0000     0.0000     0.0000
[torch.FloatTensor of size 5x3]



Construct a randomly initialized matrix:

In [2]:
x = torch.rand(5, 3)
print(x)


 0.5157  0.6637  0.4432
 0.1964  0.8327  0.7791
 0.1398  0.4277  0.2781
 0.1079  0.1372  0.8483
 0.8859  0.9924  0.6634
[torch.FloatTensor of size 5x3]



Get its size:

In [3]:
print(x.size())

torch.Size([5, 3])


## Operations:
There are multiple syntaxes for operations. Let's look at addition as an example.
* Addition: syntax 1

In [4]:
y = torch.rand(5, 3)
print(x + y)


 1.0731  0.9349  0.6311
 0.4161  0.8820  1.4066
 0.3736  0.8647  0.7641
 0.5107  0.3337  1.2597
 0.9037  1.8274  1.5955
[torch.FloatTensor of size 5x3]



* Addition: syntax 2

In [5]:
print(torch.add(x, y))


 1.0731  0.9349  0.6311
 0.4161  0.8820  1.4066
 0.3736  0.8647  0.7641
 0.5107  0.3337  1.2597
 0.9037  1.8274  1.5955
[torch.FloatTensor of size 5x3]



* Addition: giving an output tensor

In [6]:
result = torch.Tensor(5, 3)
torch.add(x, y, out=result)
print(result)


 1.0731  0.9349  0.6311
 0.4161  0.8820  1.4066
 0.3736  0.8647  0.7641
 0.5107  0.3337  1.2597
 0.9037  1.8274  1.5955
[torch.FloatTensor of size 5x3]



* Addition: in-place

In [7]:
# adds x to y
y.add_(x)
print(y)


 1.0731  0.9349  0.6311
 0.4161  0.8820  1.4066
 0.3736  0.8647  0.7641
 0.5107  0.3337  1.2597
 0.9037  1.8274  1.5955
[torch.FloatTensor of size 5x3]



You can use standard numpy-like indexing:

In [8]:
print(x[:, 1])


 0.6637
 0.8327
 0.4277
 0.1372
 0.9924
[torch.FloatTensor of size 5]



## Numpy Bridge:
Converting a torch Tensor to a numpy array and vice versa is easy.
The torch Tensor and numpy array will share their underlying memory locations, and changing one will change the other.
### Converting torch Tensor to numpy Array

In [9]:
a = torch.ones(5)
print(a)
b = a.numpy()
print(b)


 1
 1
 1
 1
 1
[torch.FloatTensor of size 5]

[ 1.  1.  1.  1.  1.]


See how the numpy array changed in value.

In [10]:
a.add_(1)
print(a)
print(b)


 2
 2
 2
 2
 2
[torch.FloatTensor of size 5]

[ 2.  2.  2.  2.  2.]


### Converting numpy Array to torch Tensor
See how changing the numpy array changed the torch Tensor automatically.

In [11]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[ 2.  2.  2.  2.  2.]

 2
 2
 2
 2
 2
[torch.DoubleTensor of size 5]



### CUDA Tensors
Tensors can be moved onto the GPU using the **.cuda()** method.

In [12]:
# let us run this cell only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)


 1.5887  1.5986  1.0744
 0.6124  1.7147  2.1858
 0.5134  1.2925  1.0422
 0.6186  0.4709  2.1080
 1.7895  2.8198  2.2588
[torch.cuda.FloatTensor of size 5x3 (GPU 0)]



## Autograd: automatic differentiation
Central to all neural networks in PyTorch is the **autograd** package. The **autograd** package provides automatic differentiation for all operations on Tensors.

## Variable
**autograd.Variable** is the central class of the package. It wraps a Tensor, and supports nearly all operations defined on it. Once you finish your computation you can call **.backward()** and have all the gradients computed automatically.
You can access the raw tensor through the **.data** attribute, while the gradient w.r.t. this variable is accumulated into **.grad**.

In [13]:
import torch
from torch.autograd import Variable

Create a variable:

In [14]:
x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]



Do an operation on variable:

In [15]:
y = x + 2
print(y)

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]



Do more operations on y:

In [17]:
z = y * y * 3
out = z.mean()
print(z, out)

Variable containing:
 27  27
 27  27
[torch.FloatTensor of size 2x2]
 Variable containing:
 27
[torch.FloatTensor of size 1]



## Gradients
Let's backprop now. **out.backward()** is equivalent to doing **out.backward(torch.Tensor([1.0]))**.

In [18]:
out.backward()

Print the gradients d(out)/dx:

In [19]:
print(x.grad)

Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]
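
The value 4.5 can be checked by hand. Writing $o$ for **out** and $z_i$ for the entries of **z** (symbols introduced here only for the derivation), the operations above give:

$$
\begin{aligned}
o &= \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,(x_i + 2)^2, \\
\frac{\partial o}{\partial x_i} &= \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2), \\
\left.\frac{\partial o}{\partial x_i}\right|_{x_i = 1} &= \frac{9}{2} = 4.5 .
\end{aligned}
$$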



Another example:

In [27]:
x = torch.randn(3)
x = Variable(x, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y *= 2
print(y)

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
print(x.grad)

Variable containing:
-1373.8628
  -30.9696
  -40.8120
[torch.FloatTensor of size 3]

Variable containing:
  204.8000
 2048.0000
    0.2048
[torch.FloatTensor of size 3]
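
Because **y** is not a scalar here, **backward** needs a vector that weights each component of **y**, and what ends up in **x.grad** is that vector multiplied through the Jacobian of **y** with respect to **x**. The printed gradients suggest that in this particular run y = 2048 * x (for example 0.1 * 2048 = 204.8). The following is a minimal pure-Python sketch of that arithmetic, not PyTorch code; the helper name `vector_jacobian_product` and the factor 2048 read off the output above are assumptions made for illustration:

```python
def vector_jacobian_product(jacobian, v):
    """Return J^T v, where jacobian[i][j] = d y_i / d x_j (hypothetical helper)."""
    n_inputs = len(jacobian[0])
    return [
        sum(v[i] * jacobian[i][j] for i in range(len(v)))
        for j in range(n_inputs)
    ]

# Assumed from the run shown above: y = 2048 * x elementwise,
# so the Jacobian is 2048 on the diagonal and 0 elsewhere.
scale = 2048.0
jacobian = [[scale if i == j else 0.0 for j in range(3)] for i in range(3)]

# Same weighting vector that was passed to y.backward(...)
v = [0.1, 1.0, 0.0001]

grad = vector_jacobian_product(jacobian, v)
print(grad)
```

Since the number of doublings depends on the random starting value of **x**, a different run may produce a different power of two in place of 2048.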



## Neural Networks
Neural networks can be constructed using the **torch.nn** package.
**nn** depends on **autograd** to define models and differentiate them. An **nn.Module** contains layers, and a method **forward(input)** that returns the **output**.
A typical training procedure for a neural network is as follows:
* Define the neural network that has some learnable parameters (or weights)
* Iterate over a dataset of inputs
* Process the input through the network
* Compute the loss (how far is the output from being correct)
* Propagate gradients back into the network's parameters
* Update the weights of the network, typically using a simple update rule: $\text{weight} = \text{weight} - \text{learning\_rate} \times \text{gradient}$
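
The update rule in the last step can be sketched without any framework. The example below is a minimal illustration, assuming a model with a single parameter and the loss (weight - 3)^2; the function names, the target value 3.0, the learning rate 0.1, and the step count are all made up for this sketch and are not part of PyTorch:

```python
def loss(weight):
    # Hypothetical loss with its minimum at weight = 3.0
    return (weight - 3.0) ** 2

def gradient(weight):
    # Analytic derivative of the loss above: d/dw (w - 3)^2 = 2 * (w - 3)
    return 2.0 * (weight - 3.0)

weight = 0.0          # arbitrary starting value
learning_rate = 0.1   # assumed step size

for step in range(100):
    # weight = weight - learning_rate * gradient
    weight = weight - learning_rate * gradient(weight)

print(weight, loss(weight))
```

Each step moves the weight a fraction of the way toward the minimum, so after enough iterations it settles close to 3.0. In a real network the gradient would come from backpropagation rather than a hand-written derivative.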

### Define the network
Let's define this network:

In [None]:
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels
        # 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        # 6 input channels (the output of conv1), 16 output channels
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        