<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#2019-04-24_week09_fundamental-learning_pytorch-blitz" data-toc-modified-id="2019-04-24_week09_fundamental-learning_pytorch-blitz-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>2019-04-24_week09_fundamental-learning_pytorch-blitz</a></span><ul class="toc-item"><li><span><a href="#first-exploration-of-tensors" data-toc-modified-id="first-exploration-of-tensors-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>first exploration of tensors</a></span></li><li><span><a href="#autograd" data-toc-modified-id="autograd-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>autograd</a></span></li><li><span><a href="#Neural-networks-with-torch.nn" data-toc-modified-id="Neural-networks-with-torch.nn-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Neural networks with torch.nn</a></span><ul class="toc-item"><li><span><a href="#defining-the-network" data-toc-modified-id="defining-the-network-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>defining the network</a></span></li><li><span><a href="#Loss-function" data-toc-modified-id="Loss-function-1.3.2"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Loss function</a></span></li><li><span><a href="#backprop" data-toc-modified-id="backprop-1.3.3"><span class="toc-item-num">1.3.3&nbsp;&nbsp;</span>backprop</a></span></li><li><span><a href="#weight-update" data-toc-modified-id="weight-update-1.3.4"><span class="toc-item-num">1.3.4&nbsp;&nbsp;</span>weight update</a></span></li></ul></li><li><span><a href="#imgagenet-example-CIFAR10-using-a-convnet" data-toc-modified-id="imgagenet-example-CIFAR10-using-a-convnet-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>imgagenet example CIFAR10 using a convnet</a></span><ul class="toc-item"><li><span><a href="#manual-check-of-network-performance" data-toc-modified-id="manual-check-of-network-performance-1.4.1"><span class="toc-item-num">1.4.1&nbsp;&nbsp;</span>manual check of network 
performance</a></span></li><li><span><a href="#performance-across-whole-dataset" data-toc-modified-id="performance-across-whole-dataset-1.4.2"><span class="toc-item-num">1.4.2&nbsp;&nbsp;</span>performance across whole dataset</a></span></li></ul></li></ul></li></ul></div>

# 2019-04-24_week09_fundamental-learning_pytorch-blitz

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import torch
import numpy as np

## first exploration of tensors

In [None]:
torch.rand(5,3)

In [None]:
torch.zeros(5,3, dtype=torch.long)

In [None]:
x = torch.tensor([5.5, 3],dtype=torch.double)
x

In [None]:
x = x.new_ones(5,3) # takes a shape but inherits other properties like dtype and what it runs on ie GPU/CPU
x

In [None]:
x = torch.randn_like(x, dtype=torch.float) #result has same size
x

In [None]:
x.shape # equivalent to x.size()

In [None]:
y = torch.rand(5,3)

In [None]:
x+y

In [None]:
torch.add(x,y)

In [None]:
#can provide an output tensor as an argument ie
result = torch.empty(5,3)
torch.add(x, y, out=result)
result

In [None]:
# in place addition
y.add_(x)
y

NumPy-style indexing works on tensors.

In [None]:
x[:,1]

In [None]:
# use Tensor.view to reshape
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1,8) # -1 infers size from other dimensions
print(x.shape, y.shape, z.shape)

In [None]:
x[0,0]

In [None]:
x[0,0].item() #returns a python number

In [None]:
a = torch.ones(5)
a

In [None]:
b = a.numpy()
b

In [None]:
a.add_(1) #adds one to the tensor in place
print(a)
print(b)

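The NumPy array and the tensor share the same underlying memory (for CPU tensors), so changing one changes the other. The same holds for a tensor created from a NumPy array with `torch.from_numpy`: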

In [None]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a,1, out=a)
print(a)
print(b)
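
The memory sharing shown above applies to `torch.from_numpy` and `Tensor.numpy()`. By contrast, `torch.tensor(...)` copies its input. A minimal sketch of the difference (the names `shared` and `copied` are illustrative only):

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)  # shares memory with the NumPy array
copied = torch.tensor(a)      # independent copy of the data

a += 1  # in-place update of the NumPy array

print(shared)  # reflects the update
print(copied)  # still holds the original values
```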

## autograd 

`torch.Tensor` is the central class of PyTorch. It is more than a NumPy-like array: when `requires_grad=True`, it records the operations applied to it, and autograd uses that record to compute gradients.

In [None]:
# requires_grad=True tells autograd to track the operations on this tensor
x = torch.ones(2,2, requires_grad=True)
x

In [None]:
y=x+2
y

The `grad_fn` attribute references the Function that created the tensor.

In [None]:
z = y*y*3
out = z.mean()
print(z, out)

In [None]:
a = torch.randn(2,2)
a = ((a*3)/(a-1))
print(a, a.requires_grad)

In [None]:
a.requires_grad_(True)
a

In [None]:
b = (a**2).sum()
b

In [None]:
### backprop

In [None]:
x = torch.ones(2,2, requires_grad=True)

y=x+2

z = y*y*3

out = z.mean()

print(x)
print(y)
print(z)
print(out)

In [None]:
out.backward() # calculates the gradients via backprop

In [None]:
print(x.grad) # gradients d(out)/dx
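
The gradient values can be checked by hand. A short derivation, writing $x_i$ for the four entries of `x` and $z_i$ for the entries of `z`:

$$
\mathrm{out} = \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,(x_i + 2)^2
$$

$$
\frac{\partial\,\mathrm{out}}{\partial x_i} = \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2), \qquad \left.\frac{\partial\,\mathrm{out}}{\partial x_i}\right|_{x_i = 1} = 4.5
$$

so every entry of `x.grad` should be 4.5.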

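Now consider a vector-valued output. Here `backward` must be given a vector argument with the same shape as the output; autograd then computes the vector-Jacobian product. With a scalar output no argument is needed, since `out.backward()` is equivalent to `out.backward(torch.tensor(1.))`.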

In [None]:
x = torch.randn(3, requires_grad=True)

y = x *2

while y.data.norm()<1000:
    y = y *2

print(y)


In [None]:
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)
x.grad
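
Because `y` is just `x` multiplied by a power of two, the Jacobian is that scale times the identity, so the vector-Jacobian product should equal the scale times `v`. A small sketch that checks this (the `scale` bookkeeping variable is an addition for illustration):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
scale = 2.0  # tracks the total factor applied to x

while y.detach().norm() < 1000:
    y = y * 2
    scale *= 2

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

# y = scale * x, so the Jacobian is scale * I and v^T J = scale * v
print(x.grad)
print(scale * v)
```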

## Neural networks with torch.nn

`nn.Module` contains layers and a method `forward(input)` that returns `output`. `nn` depends on autograd to define models and differentiate them.

### defining the network

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super(Net,self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        # nn.Conv2d(in_channels, out_channels, kernel_size)
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6,16,5)
        # an affine operation: y=Wx+b
        #torch.nn.Linear(in_features, out_features)
        self.fc1 = nn.Linear(16*5*5,120)
        self.fc2 = nn.Linear(120,84)
        self.fc3 = nn.Linear(84,10)
        
    def forward(self,x):
        # applies 2d max pooling over an input signal composed of several input planes
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        #square can specify only one number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2) 
        x = x.view(-1, self.num_flat_features(x))
        # F.relu applies the relu function elementwise
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except batch
        num_features = 1
        for s in size:
            num_features*=s
        return num_features
        
        

In [None]:
net=Net()
net

In [None]:
for p in list(net.parameters()):
    print(p.shape)

In [None]:
for i, layer in enumerate(list(net.children())):
    print("==============")
    print(f"layer : {layer}")
    print(f"parameters : {list(net.parameters())[2*i].size()}")

Note on convolutional layers:
- In the first convolutional layer each kernel is 5x5, with 1 input channel and 6 output channels, so the weight tensor holds 5x5x1x6 = 150 values (PyTorch stores it with shape `[6, 1, 5, 5]`).
- In the *second* convolutional layer each kernel spans a 5x5x6 volume (since there are 6 input channels), and there are 16 such kernels, one per output channel. Its weight tensor therefore holds 5x5x6x16 = 2400 values (shape `[16, 6, 5, 5]`).
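
The weight shapes can be confirmed directly on standalone layers with the same arguments as in `Net`. A minimal sketch:

```python
import torch.nn as nn

conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

# Conv2d weights are stored as [out_channels, in_channels, kernel_h, kernel_w]
print(conv1.weight.shape)  # torch.Size([6, 1, 5, 5])
print(conv2.weight.shape)  # torch.Size([16, 6, 5, 5])
print(conv1.weight.numel(), conv2.weight.numel())  # 150 2400
```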

### Loss function

In [None]:
# net takes a 32x32 input image
input = torch.randn(1,1,32,32)
input.shape
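
The 32x32 input size is what makes the `16*5*5` input of `fc1` fit: each unpadded 5x5 convolution removes 4 pixels per side length and each 2x2 max pool halves it, giving 32 → 28 → 14 → 10 → 5. A sketch that traces the shapes with standalone layers (the names `p1`, `p2` and `flat` are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)
p1 = F.max_pool2d(F.relu(conv1(x)), 2)   # 32 -> 28 (conv) -> 14 (pool)
p2 = F.max_pool2d(F.relu(conv2(p1)), 2)  # 14 -> 10 (conv) -> 5 (pool)
flat = p2.view(p2.size(0), -1)           # flatten everything except the batch

print(p1.shape)    # torch.Size([1, 6, 14, 14])
print(p2.shape)    # torch.Size([1, 16, 5, 5])
print(flat.shape)  # torch.Size([1, 400])
```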

In [None]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out, out.shape)

In [None]:
# Zero the gradient buffers of all parameters and backprops with random gradients
net.zero_grad()
out.backward(torch.randn(1,10))

In [None]:
output = net(input)
target = torch.randn(10)
target = target.view(1,-1) #reshape to output shape
criterion = nn.MSELoss() # simple mean squared error loss function

loss = criterion(output,target)
loss

```python
from torchviz import make_dot
make_dot(loss, params=dict(net.named_parameters()))
``` 
can be used to visualise the computation graph. It's pretty big, so to keep things short here, the graph is:

```
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
```
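
The same graph can be walked backwards from the loss through `grad_fn` and `next_functions`. A minimal sketch, using a single `nn.Linear` as a hypothetical stand-in for the full network to keep it short:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in model, not the Net defined above
output = model(torch.randn(1, 4))
loss = nn.MSELoss()(output, torch.randn(1, 2))

node = loss.grad_fn                  # backward node of the MSE loss
previous = node.next_functions[0][0] # backward node that produced `output`

print(node)
print(previous)
```

The exact node names printed depend on the PyTorch version.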

### backprop

In [None]:
net.zero_grad() # zeros the gradient buffers else they will accumulate

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward() 
# backward clears intermediary results to reclaim memory when run unless 
# `retain_graph=True` is used.

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
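
To see why the gradient buffers need zeroing, here is a small sketch on a single tensor: calling `backward` twice without zeroing adds the second gradient onto the first.

```python
import torch

w = torch.ones(3, requires_grad=True)

(w * 2).sum().backward()
first = w.grad.clone()   # gradient of 2 per element

(w * 2).sum().backward()
second = w.grad.clone()  # accumulated: 2 + 2 = 4 per element

w.grad.zero_()           # reset the buffer in place
print(first, second, w.grad)
```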

### weight update

In [None]:
# using simple python
learning_rate =0.01
for f in net.parameters():
    f.data.sub_(f.grad.data*learning_rate)

In [None]:
# using torch.optim to implement various update rules
import torch.optim as optim

# use a stochastic gradient descent (SGD) optimizer in this case
optimizer = optim.SGD(net.parameters(), lr=0.01) 

optimizer.zero_grad() # zeros the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # update weights
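
With no momentum or weight decay, one `optim.SGD` step should match the manual rule `weight = weight - learning_rate * gradient`. A sketch that compares the two on a small hypothetical model (`model_a`, `model_b` and the random data are illustrative only):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(3, 1)
model_b = copy.deepcopy(model_a)  # identical starting weights
x, target = torch.randn(8, 3), torch.randn(8, 1)
lr = 0.01

# manual update
F.mse_loss(model_a(x), target).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad

# the same step through torch.optim
optimizer = optim.SGD(model_b.parameters(), lr=lr)
optimizer.zero_grad()
F.mse_loss(model_b(x), target).backward()
optimizer.step()

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```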

## imgagenet example CIFAR10 using a convnet

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(), 
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ]
)
# Normalize takes per-channel means and stds: Normalize((mean, mean, mean), (std, std, std))
# and returns (input[channel] - mean[channel]) / std[channel]
# since ToTensor gives values in the range [0,1] this will transform them to [-1,1]

trainset = torchvision.datasets.CIFAR10(root='data/CIFAR10', train= True, 
                                        download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                         shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='data/CIFAR10', train= False, 
                                        download=True, transform=transform)

testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
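
The arithmetic behind `Normalize` with mean 0.5 and std 0.5 can be reproduced on plain tensors. A sketch with a few made-up pixel values (this mimics the formula, it does not call torchvision):

```python
import torch

pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])  # example values in [0, 1]
mean, std = 0.5, 0.5

normalized = (pixels - mean) / std   # maps [0, 1] onto [-1, 1]
restored = normalized * std + mean   # inverse, useful before plotting images

print(normalized)
print(restored)
```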


In [None]:
dataiter = iter(trainloader)
images, labels = next(dataiter)

In [None]:
images.shape

In [None]:
class Net(nn.Module):
    
    def __init__(self,c1,c2):
        super(Net,self).__init__()
        self.c1=c1
        self.c2=c2
        # 3 input image channels (RGB), c1 output channels, 5x5 square convolution
        # kernel
        # nn.Conv2d(in_channels, out_channels, kernel_size)
        self.conv1 = nn.Conv2d(3, self.c1, 5)
        self.conv2 = nn.Conv2d(self.c1,self.c2,5)
        # an affine operation: y=Wx+b
        #torch.nn.Linear(in_features, out_features)
        self.fc1 = nn.Linear(self.c2*5*5,120)
        self.fc2 = nn.Linear(120,84)
        self.fc3 = nn.Linear(84,10)
        
    def forward(self,x):
        # applies 2d max pooling over an input signal composed of several input planes
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        #square can specify only one number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2) 
        x = x.view(-1, self.c2*5*5)
        # F.relu applies the relu function elementwise
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net=Net(6,16)

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
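
`nn.CrossEntropyLoss` expects raw, unnormalised scores and integer class indices, which is why the network's `forward` ends with `fc3` and no softmax. A sketch with made-up scores showing that it matches `log_softmax` followed by the negative log-likelihood loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # raw scores, no softmax applied
labels = torch.tensor([0, 2])             # class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss, manual)
```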

In [None]:
for epoch in range(2):
    running_loss = 0.0
    
    # each batch holds 4 images in a 4x3x32x32 tensor and
    # a length-4 tensor with their labels (batch size is 4)
    for i, data in enumerate(trainloader,0):
        # get the data and labels
        inputs, labels = data
        # forward, backward, update weights
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs,labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        if i %2000 == 1999:
            print(f"{epoch+1},{i+1} loss : {running_loss/2000:0.3f}")
            running_loss=0.0
        
print("done")
        
        

### manual check of network performance

In [None]:
data_iter = iter(testloader)
images, labels = next(data_iter)

print(f"Ground truth : {[classes[labels[j]] for j in range (4)]}")

In [None]:
outputs = net(images)
outputs[0]

In [None]:
# returns the highest score ("energy") in each row and the index of that class
energies, predicted = torch.max(outputs,1) 
print(energies)
print(predicted)
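
As a small illustration of what `torch.max(outputs, 1)` returns, here is a sketch on a made-up 2x3 score tensor: the first result holds the row-wise maxima and the second the column index of each maximum.

```python
import torch

scores = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.3, 0.9]])

values, indices = torch.max(scores, 1)  # reduce over dimension 1 (the classes)

print(values)   # maxima per row: 2.0 and 1.5
print(indices)  # their positions: 1 and 0
```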

In [None]:
print(f"Network output : {[classes[predicted[j]] for j in range (4)]}")

### performance across whole dataset

In [None]:
correct = 0 
total = 0

with torch.no_grad(): # disables gradient tracking during evaluation
    for data in testloader:
        images, labels = data
        outputs = net(images)
        energies, predicted = torch.max(outputs.data, 1)
        total +=labels.size(0)
        correct += (predicted == labels).sum().item()
f"accuracy : {correct/total*100}%"

In [None]:
class_correct=[0.]*10
class_total = [0.]*10
with torch.no_grad(): # disables gradient tracking during evaluation
    for data in testloader:
        images, labels = data
        outputs = net(images)
        energies, predicted = torch.max(outputs.data, 1)
        c = (predicted == labels)
        # iterate over the actual batch size rather than assuming 4
        for i in range(labels.size(0)):
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1
            


In [None]:
for i, name in enumerate(classes):
    accuracy = class_correct[i]/class_total[i]*100
    print(f"accuracy of {name:6s} : {accuracy:0.1f}%")
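
The per-class counts can also be computed without the inner Python loop. A sketch using `torch.bincount` on made-up labels and predictions for 3 classes (the data and `num_classes` are illustrative, not CIFAR10 results):

```python
import torch

labels = torch.tensor([0, 1, 1, 2, 2, 2])
predicted = torch.tensor([0, 1, 0, 2, 2, 1])
num_classes = 3

# count how often each class occurs, and how often it was predicted correctly
class_total = torch.bincount(labels, minlength=num_classes)
class_correct = torch.bincount(labels[predicted == labels], minlength=num_classes)
accuracy = class_correct.float() / class_total.float()

print(class_total, class_correct, accuracy)
```

In a real evaluation loop these counts would be accumulated across batches before dividing.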