## What is PyTorch?
It’s a Python-based scientific computing package that serves two purposes:

* a replacement for NumPy that uses the power of GPUs
* a deep learning research platform that provides maximum flexibility and speed

### Tensors

Tensors are similar to NumPy’s ndarrays, except that Tensors can also be used on a GPU to accelerate computing.
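
As a minimal sketch of the GPU point above, the snippet below creates a tensor and moves it to a CUDA device when one is available, falling back to the CPU otherwise. The variable names are illustrative only.

```python
import torch

# Pick the GPU if PyTorch can see one, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 3)       # created on the CPU by default
x_dev = x.to(device)       # moved to the chosen device (no-op if already there)
y_dev = x_dev + x_dev      # the addition runs on that device

print(y_dev.device.type)   # "cuda" or "cpu", depending on the machine
```

Writing code against a `device` variable like this keeps the same script usable on machines with and without a GPU.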



In [0]:
import torch
import numpy as np

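## different ways to create a Tensor
x1 = torch.zeros(5,3, dtype=torch.long)   ## 5x3 matrix of zeros with integer (int64) dtype
x2 = torch.rand(5,3)                      ## 5x3 matrix of uniform random values in [0, 1)
x3 = torch.Tensor(5,3)                    ## 5x3 matrix of uninitialized memory (arbitrary values)
x4 = torch.Tensor(5,3).uniform_(-1,1)     ## 5x3 matrix with values between -1 and 1
x5 = torch.Tensor([1, 5 ,7 ,9])           ## 1-D float tensor built from a Python list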


In [0]:
# operations
y1 = x2 + x4
y2 = torch.add(x2,x4)

# add x4 to y1 in place
y1.add_(x4)

## You can use standard NumPy-like indexing with all bells and whistles!
print(y1[:,1])

tensor([-0.6745,  0.2601, -0.0147,  1.3664, -0.4267])


#### Note
Any operation that mutates a tensor in-place is post-fixed with an _. For example, x.copy_(y) and x.t_() will change x.
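
To make the difference concrete, here is a small sketch contrasting the out-of-place `add` with the in-place `add_`. The tensor values are arbitrary and chosen only for illustration.

```python
import torch

x = torch.ones(3)
y = torch.full((3,), 2.0)

z = x.add(y)          # out-of-place: returns a new tensor, x is unchanged
x_before = x.clone()  # snapshot of x before the in-place call

x.add_(y)             # in-place: x itself now holds x + y

print(z)         # tensor([3., 3., 3.])
print(x_before)  # tensor([1., 1., 1.])
print(x)         # tensor([3., 3., 3.])
```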

In [0]:
x = torch.randn(4,4)

y = x.view(16)
z = x.view(-1,8)  # the size -1 is inferred from other dimensions

print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])


#### NumPy Bridge
A Torch Tensor on the CPU and the NumPy array it is converted to (or from) share their underlying memory locations, so changing one will change the other.

In [0]:
a = torch.ones(5)
b = a.numpy()

print(a)
print(b)

a.add_(1)
print(a)
print(b)

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


In [0]:
a1 = np.ones(5)
b1 = torch.from_numpy(a1)

np.add(a1, 1, out=a1)
print(a1)
print(b1)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
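
When the shared memory is not wanted, one option is to copy the data. A small sketch, assuming `torch.tensor(...)` (which copies its input) fits the use case:

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data into new memory

a += 1                         # in-place change to the NumPy array

print(shared)  # tensor([2., 2., 2.], dtype=torch.float64)
print(copied)  # tensor([1., 1., 1.], dtype=torch.float64)
```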


#### Autograd: automatic differentiation
The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.


torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its .grad attribute.

In [0]:
x = torch.ones(2,2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [0]:
y = x+2
z = y*y*3
out = z.mean()
out

tensor(27., grad_fn=<MeanBackward0>)

In [0]:
out.backward()
x.grad

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
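
For this example the gradient can be checked by hand: out is the mean over four entries of 3(x_i + 2)^2, so d(out)/dx_i = (1/4) * 6 * (x_i + 2) = 4.5 at x_i = 1. Because .grad accumulates across calls, repeating the forward and backward pass without clearing it adds another 4.5 each time (13.5 after three passes). The sketch below illustrates this; the helper name `forward` is only for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

def forward(x):
    # Same computation as above: mean of 3 * (x + 2)^2
    y = x + 2
    return (y * y * 3).mean()

forward(x).backward()
after_one = x.grad.clone()      # every entry is 4.5

forward(x).backward()
forward(x).backward()
after_three = x.grad.clone()    # every entry is 13.5 (3 * 4.5)

x.grad.zero_()                  # clear the accumulated gradient
print(after_one, after_three, x.grad)
```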

### Neural Networks
Neural networks can be constructed using the torch.nn package.

nn.Module contains layers, and a method forward(input) that returns the output.

![alt text](https://pytorch.org/tutorials/_images/mnist.png)

A typical training procedure for a neural network is as follows:

* Define the neural network that has some learnable parameters (or weights)
* Iterate over a dataset of inputs
* Process input through the network
* Compute the loss (how far is the output from being correct)
* Propagate gradients back into the network’s parameters
* Update the weights of the network, typically using a simple update rule:
 >* weight = weight - learning_rate \* gradient
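
The steps above can be sketched end to end on a toy problem. This is a minimal illustration, assuming a one-parameter linear model and synthetic data (y = 2x + 1); none of these names come from the network defined below.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data (an assumption for this sketch): y = 2x + 1, no noise
inputs = torch.linspace(-1, 1, 20).unsqueeze(1)   # shape (20, 1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1)                  # 1. a network with learnable parameters
criterion = nn.MSELoss()
learning_rate = 0.1

losses = []
for step in range(100):                  # 2. iterate over the data
    output = model(inputs)               # 3. process input through the network
    loss = criterion(output, targets)    # 4. compute the loss
    model.zero_grad()                    #    clear gradients from the previous step
    loss.backward()                      # 5. propagate gradients back
    with torch.no_grad():                # 6. weight = weight - learning_rate * gradient
        for p in model.parameters():
            p -= learning_rate * p.grad
    losses.append(loss.item())

print(losses[0], losses[-1])             # the loss shrinks over the iterations
```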

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
  
  def __init__(self):
    super(Net, self).__init__()
    
    self.conv1 = nn.Conv2d(1, 6, 5) # 1 input channel, 6 output chan., 5x5 conv kernel
    self.conv2 = nn.Conv2d(6, 16, 5)
    self.fc1 = nn.Linear(16*5*5, 120)
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)
    
  def forward(self, x):
    # Max pooling over a (2,2) window or just 2
    x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
    x = F.max_pool2d(F.relu(self.conv2(x)), 2)
    x = x.view(-1, self.num_flat_features(x))
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
  
  def num_flat_features(self, x):
    size = x.size()[1:] # all dimensions except the batch dim.
    num_features = 1
    for s in size:
      num_features *= s
    return num_features
  
net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
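
The in_features=400 of fc1 follows from a 32x32 input: each 5x5 convolution (no padding, stride 1) shrinks each side by 4, and each 2x2 max pool halves it. A small sketch of that arithmetic; the helper names are illustrative and not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output side length of a convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    # Output side length of non-overlapping max pooling (stride == window)
    return size // window

side = 32
side = pool_out(conv_out(side, 5), 2)   # conv1 + pool: 32 -> 28 -> 14
side = pool_out(conv_out(side, 5), 2)   # conv2 + pool: 14 -> 10 -> 5

flat_features = 16 * side * side        # 16 channels come out of conv2
print(side, flat_features)              # 5 400
```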


The learnable parameters of a model are returned by net.parameters()

In [0]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 5, 5])
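
The count of 10 comes from five layers with a weight and a bias each. As a sketch, named_parameters() shows which tensor is which. The stand-in container below only reuses the same layer shapes; it is not the Net class itself and has no forward pass.

```python
import torch.nn as nn

# Same layer shapes as the network above, held in a plain container
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 6, 5),
    "conv2": nn.Conv2d(6, 16, 5),
    "fc1": nn.Linear(16 * 5 * 5, 120),
    "fc2": nn.Linear(120, 84),
    "fc3": nn.Linear(84, 10),
})

names = []
for name, p in layers.named_parameters():
    names.append(name)
    print(name, tuple(p.shape))          # e.g. conv1.weight (6, 1, 5, 5)

total = sum(p.numel() for p in layers.parameters())
print(len(names), total)                 # 10 61706
```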


Let's try a random 32x32 input.

In [0]:
input = torch.randn(1, 1, 32, 32)

out = net(input)
print(out)

tensor([[ 0.0414,  0.0925,  0.0325,  0.0966,  0.0247, -0.0236,  0.0521, -0.0203,
         -0.0744, -0.1122]], grad_fn=<AddmmBackward>)


#### Note 
The entire torch.nn package only supports inputs that are a mini-batch of samples, not a single sample.

For example, nn.Conv2d will take in a 4D Tensor of 
> nSamples x nChannels x Height x Width.

If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
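
A minimal sketch of that trick, assuming a single-channel 32x32 sample and a stand-alone convolution layer used only for illustration:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

sample = torch.randn(1, 32, 32)   # one image: channels x height x width
batch = sample.unsqueeze(0)       # add a batch dimension of size 1 at position 0

print(batch.shape)                # torch.Size([1, 1, 32, 32])
print(conv(batch).shape)          # torch.Size([1, 6, 28, 28])
```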

#### Loss Function
A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

In [0]:
output = net(input)

target = torch.randn(10)    ## a dummy target
target = target.view(1, -1) ## make it the same shape as output
criterion = nn.MSELoss()    ## compute mean-squared error

loss = criterion(output, target)
print(loss)


tensor(0.4255, grad_fn=<MseLossBackward>)
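
What nn.MSELoss computes can be checked by hand on a small example: it is the mean of the squared element-wise differences. The values below are arbitrary and chosen only to keep the arithmetic easy.

```python
import torch
import torch.nn as nn

output = torch.tensor([[0.5, -1.0, 2.0]])
target = torch.tensor([[0.0, 1.0, 2.0]])

by_hand = ((output - target) ** 2).mean()   # (0.25 + 4.0 + 0.0) / 3
by_module = nn.MSELoss()(output, target)

print(by_hand.item(), by_module.item())     # the two values agree
```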


Now, if you follow loss in the backward direction, using its **.grad_fn** attribute, you will see a graph of computations that looks like this:

* **input** -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
     >* -> view -> linear -> relu -> linear -> relu -> linear
     >* -> MSELoss
     >* -> loss

In [0]:
print(loss.grad_fn)
print(loss.grad_fn.next_functions[0][0]) # Linear (last layer)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # AccumulateGrad (a leaf parameter of that Linear layer)


<MseLossBackward object at 0x7fdef189ea90>
<AddmmBackward object at 0x7fdef189e9b0>
<AccumulateGrad object at 0x7fdef189ea90>


To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though; otherwise the new gradients will be accumulated into the existing ones.

In [0]:
net.zero_grad() # zeroes the gradient buffers of all params

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0210, -0.0019, -0.0038, -0.0027,  0.0046,  0.0106])


#### Update the weights
The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):
> weight = weight - learning_rate * gradient

In [0]:
lr = 0.01
with torch.no_grad():   # the update itself should not be tracked by autograd
  for f in net.parameters():
    f.sub_(f.grad * lr)

The **torch.optim** package implements different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.

In [0]:
import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01) ## create optimizer

optimizer.zero_grad() # zero the gradient buffers

output = net(input)
loss   = criterion(output, target)
print('loss:', loss.item())

loss.backward()
optimizer.step()  # does the update

output = net(input)
loss   = criterion(output, target)
print('loss after one cycle:', loss.item())

loss: 0.41491034626960754
loss after one cycle: 0.4058629870414734
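
As a sanity check on the update rule, the sketch below applies one manual step and one optim.SGD step (no momentum) to two identical copies of a small model and compares the results. The model, data, and names here are assumptions made for the illustration.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(4, 2)
model_b = copy.deepcopy(model_a)     # identical starting weights
x, y = torch.randn(8, 4), torch.randn(8, 2)
lr = 0.01

# Manual update on model_a: weight = weight - learning_rate * gradient
F.mse_loss(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad

# The same step on model_b through the optimizer
optimizer = optim.SGD(model_b.parameters(), lr=lr)
optimizer.zero_grad()
F.mse_loss(model_b(x), y).backward()
optimizer.step()

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)   # True
```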


### Training a Classifier

#### What about data?
Generally, when you have to deal with image, text, audio or video data, you can use standard Python packages that load data into a NumPy array. Then you can convert this array into a torch.*Tensor.

* For images, packages such as Pillow and OpenCV are useful
* For audio, packages such as SciPy and librosa are useful
* For text, either raw Python or Cython based loading, or NLTK and spaCy, are useful

Specifically for vision, we have created a package called **torchvision** that has data loaders for common datasets such as ImageNet, CIFAR10, MNIST, etc.
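
As a sketch of the array-to-tensor conversion described above, the snippet below uses a random array as a stand-in for an image loaded by Pillow or OpenCV (height x width x channels, uint8) and rearranges it into the channels-first float layout that the convolution layers here expect.

```python
import numpy as np
import torch

# Stand-in for a loaded image: height x width x channels, uint8 in [0, 255]
rng = np.random.default_rng(0)
image_np = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

image = torch.from_numpy(image_np)   # shape (32, 32, 3), shares memory with image_np
image = image.permute(2, 0, 1)       # channels first: (3, 32, 32)
image = image.float() / 255.0        # float values in [0, 1]

print(image.shape, image.dtype)
```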

#### We will do the following steps in order:

* Load and normalize the CIFAR10 training and test datasets using torchvision
* Define a Convolutional Neural Network
* Define a loss function
* Train the network on the training data
* Test the network on the test data

In [0]:
import torch
import torchvision
import torchvision.transforms as transforms

import numpy as np
import matplotlib.pyplot as plt

##### Load and normalize the CIFAR10 training and test datasets

In [0]:
## the outputs of torchvision datasets are PILImage images in the range [0, 1]
## so transform them to Tensors normalized to the range [-1, 1]
transform = transforms.Compose([transforms.ToTensor(),
                               transforms.Normalize((0.5,0.5,0.5), (0.5,0.5,0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, 
                                        download=True, transform=transform)

## data loader
train_loader = torch.utils.data.DataLoader(dataset=trainset, batch_size=64,
                                           shuffle=True, num_workers=2)

Files already downloaded and verified
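
Normalize subtracts the per-channel mean and divides by the per-channel standard deviation, so with 0.5 for both, values in [0, 1] are mapped to [-1, 1]. The sketch below reproduces that arithmetic with plain tensor operations on a random stand-in image; it does not call torchvision.

```python
import torch

mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)   # one value per channel
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

image = torch.rand(3, 32, 32)        # stand-in for a ToTensor() result in [0, 1)
normalized = (image - mean) / std    # (x - 0.5) / 0.5 = 2x - 1

print(normalized.min().item() >= -1.0, normalized.max().item() <= 1.0)
```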


In [0]:
## fetch one data pair
image, label = trainset[0]

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

print(image.size())
print('label: ',classes[label])

torch.Size([3, 32, 32])
label:  frog


In [0]:
# get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)

[classes[i] for i in labels[:10]]

['deer',
 'car',
 'horse',
 'deer',
 'truck',
 'frog',
 'ship',
 'frog',
 'deer',
 'bird']

##### Define a Convolutional Neural Network

In [0]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

##### Define a loss function

In [0]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
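
nn.CrossEntropyLoss expects raw network outputs (logits) and integer class indices as targets, which is why the network above ends in a plain Linear layer without a softmax. A small sketch with made-up logits shows that it matches log-softmax followed by the negative log-likelihood loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw outputs: 2 samples x 3 classes
labels = torch.tensor([0, 2])              # class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, labels)

# Equivalent computation: log-softmax, then negative log-likelihood
by_hand = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss.item(), by_hand.item())         # the two values agree
```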

##### Train the network on the training data

In [0]:
for epoch in range(2):    ## loop over the dataset multiple times
  
  running_loss = 0.0
  for i,data in enumerate(train_loader, 0):
    
    inputs, labels = data
    
    optimizer.zero_grad()  ## zero the params gradients
    
    ## forward + backward + optimize
    output = net(inputs)
    loss = criterion(output, labels)
    loss.backward()
    optimizer.step()
    
    running_loss += loss.item()
    if i%200 == 199:       ## report every 200 mini-batches
      print('[%d, %5d] loss:%.2f' % (epoch+1, i+1, running_loss/200))  ## mean loss over those 200 mini-batches
      running_loss = 0.0

print('Finished Training')

[1,   200] loss:1.16
[1,   400] loss:1.16
[1,   600] loss:1.13
[2,   200] loss:1.08
[2,   400] loss:1.05
[2,   600] loss:1.06
Finished Training
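
For the last step in the list, testing the network, a minimal sketch of how accuracy could be measured is shown below. It is demonstrated on a tiny synthetic dataset rather than CIFAR10; the function name `accuracy` and the toy identity model are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, loader):
    """Fraction of samples whose highest-scoring class matches the label."""
    correct, total = 0, 0
    model.eval()
    with torch.no_grad():                 # no gradients needed for evaluation
        for inputs, labels in loader:
            predicted = model(inputs).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Toy check: the identity "model" predicts class i for row i of the identity matrix
inputs = torch.eye(4)
labels = torch.tensor([0, 1, 2, 0])       # last label deliberately mismatched
loader = DataLoader(TensorDataset(inputs, labels), batch_size=2)

result = accuracy(nn.Identity(), loader)
print(result)                             # 0.75
```

With the real data, the same function would be called with the trained net and a DataLoader built from the CIFAR10 test split.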
