## Overview


* About PyTorch
* Tensors
* Autograd
* Building a model
* Loading data
* Training


## Installing PyTorch




In [None]:
%pip install torch torchvision

Then, to use it in your Python code:



In [None]:
import torch
print(torch.cuda.is_available())


## What is PyTorch?

PyTorch is an open source ML framework for Python that accelerates the path from prototyping to deployment. It integrates with many other tools in the Python ML ecosystem.

PyTorch has lots of tools: NN layer types, activation and loss functions, optimizers, and a "vision" add-on (torchvision) if you want to do machine vision.

PyTorch offers more than 300 functions for operating on tensors.

## What is a Tensor?

A tensor is just a "multi-dimensional array", and a common standard for storing data in ML. Since we've played with Python and NumPy, you already have a pretty good idea. You access tensors through the Python API, but the operations actually run as highly optimized C++ code.

In [None]:
# make a tensor with 5 rows and 3 columns, and inspect it...
theZeros = torch.zeros(5, 3)
print(theZeros)
print(theZeros.dtype)  # notice it's a matrix of 32-bit floats

You can make a tensor of integers if you prefer, but we usually use floats.

In [None]:
i = torch.ones((5, 3), dtype=torch.int16)
print(i)

OK, let's make a small tensor, fill it with random data, and do some operations on it.

With tensors, we can apply basic scalar operations across all values. We can add, subtract, multiply.  We can get absolute values, apply trigonometric operations, statistical operations, and lots more.

In [None]:
torch.manual_seed(1729)  # rerun this, and you'll restart the random sequence
r1 = torch.rand(2, 2)
print('A random tensor:')
print(r1)

r = (torch.rand(2, 2) - 0.5) * 2 # values between -1 and 1
print('A random matrix, r:')
print(r)

# Common mathematical operations are supported:
print('\nAbsolute value of r:')
print(torch.abs(r))

# ...as are trigonometric functions:
print('\nInverse sine of r:')
print(torch.asin(r))

# ...and linear algebra operations like determinant and singular value decomposition
print('\nDeterminant of r:')
print(torch.det(r))
print('\nSingular value decomposition of r:')
print(torch.svd(r))

# ...and statistical and aggregate operations:
print('\nAverage and standard deviation of r:')
print(torch.std_mean(r))
print('\nMaximum value of r:')
print(torch.max(r))

Pay special attention to the shape of your tensors.
NOTE: The last line of this code fails on purpose, to illustrate what happens when shapes don't match.

In [None]:
ones = torch.ones(2, 3)
print(ones)

twos = torch.ones(2, 3) * 2 # every element is multiplied by 2
print(twos)

threes = ones + twos       # addition allowed because shapes match
print(threes)              # tensors are added element-wise
print(threes.shape)        # this has the same dimensions as input tensors

r1 = torch.rand(2, 3)
r2 = torch.rand(3, 2)
r3 = r1 + r2               # error because shapes don't match!
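
The failing addition above can be caught and inspected rather than left to stop the notebook. The following is a minimal sketch, not part of the original cell: the names `shapes_matched`, `fixed`, and `row` are made up for illustration. It also shows two ways the shapes can be made compatible, transposing one operand and broadcasting a single row.

```python
import torch

r1 = torch.rand(2, 3)
r2 = torch.rand(3, 2)

# Element-wise addition needs matching (or broadcastable) shapes.
try:
    r1 + r2
    shapes_matched = True
except RuntimeError as err:
    shapes_matched = False
    print('Addition failed:', err)

# Transposing r2 gives it shape (2, 3), so the addition now works.
fixed = r1 + r2.T
print(fixed.shape)

# Broadcasting: a (1, 3) row is stretched across both rows of r1.
row = torch.ones(1, 3)
print((r1 + row).shape)
```

Checking `.shape` before combining tensors is usually the quickest way to track down this kind of error.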

## Build LeNet Digit Classifier in PyTorch

One of the very first hand-written digit classifiers was called "LeNet".  [Learn more here](https://d2l.ai/chapter_convolutional-neural-networks/lenet.html)


LeNet consists of two parts: (i) a convolutional encoder consisting of two convolutional layers; and (ii) a dense block consisting of three fully connected layers.


![LeNet-5 architecture diagram](https://anatomiesofintelligence.github.io/img/l/lenet5-architecture.gif)


Here's how it works. The first layer (C1) is a convolutional layer, meaning that it scans the input image for features it learned during training. It outputs a map of where it saw each of its learned features in the image.

This "activation map" is downsampled in layer S2.

Layer C3 is another convolutional layer, this time scanning the downsampled activation map from S2 for combinations of features. It also puts out an activation map describing the spatial locations of these feature combinations, which is downsampled in layer S4.

Finally, the fully-connected layers at the end, F5, F6, and OUTPUT, are a classifier that takes the final activation map, and classifies it into one of ten bins representing the 10 digits.

### Let's take a look in Python:


In [None]:
import torch                     # for all things PyTorch
import torch.nn as nn            # for torch.nn.Module, the parent object for PyTorch models
import torch.nn.functional as F  # for the activation function

class LeNet(nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input image channel (B&W), 6 output channels, 3x3 convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)

        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

## Initialize our model (class)
We instantiate the LeNet class and print the net object.

A subclass of torch.nn.Module will report the layers it has created and their shapes and parameters. This can provide a handy overview of a model if you want to get the gist of its processing.
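
To make that overview concrete, here is a small sketch using a hypothetical two-layer model (`tiny` is not part of this notebook; it is only there to keep the numbers easy to check by hand). It prints the module, lists each parameter's shape, and counts the learnable parameters.

```python
import torch.nn as nn

# A tiny hypothetical model, used only to keep the example short.
tiny = nn.Sequential(
    nn.Linear(4, 3),   # 4*3 weights + 3 biases = 15 parameters
    nn.ReLU(),         # no learnable parameters
    nn.Linear(3, 2),   # 3*2 weights + 2 biases = 8 parameters
)
print(tiny)

# named_parameters() yields (name, tensor) pairs for every learnable tensor.
for name, param in tiny.named_parameters():
    print(name, tuple(param.shape))

total = sum(p.numel() for p in tiny.parameters())
print('Total learnable parameters:', total)
```

The same `named_parameters()` and `parameters()` calls should work on the LeNet instance, since it is also a subclass of torch.nn.Module.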

## Make a fake image

Next, we make a dummy input representing a 32x32 image with 1 color channel. Ordinarily we would load a real image and convert it to a tensor. This data is random, so the classifier will still score every digit, but don't expect the result to mean anything. Just don't expect an actual digit!

We added an extra dimension to our tensor - the "batch" dimension. PyTorch models assume they are working on batches of data - for example, a batch of 16 of our image tiles would have the shape (16, 1, 32, 32). Since we're only using one image, we create a batch of 1 with shape (1, 1, 32, 32).
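
As a rough illustration of the batch dimension, the sketch below starts from a single (channels, height, width) tensor and adds the batch dimension with unsqueeze(0), then builds a batch of 16 with torch.stack. The variable names are assumptions made for this example.

```python
import torch

image = torch.rand(1, 32, 32)          # one image: (channels, height, width)

# unsqueeze(0) inserts a new dimension of size 1 at the front.
batch_of_one = image.unsqueeze(0)
print(batch_of_one.shape)

# Stacking 16 such images along a new first dimension gives a batch of 16.
batch_of_16 = torch.stack([torch.rand(1, 32, 32) for _ in range(16)])
print(batch_of_16.shape)
```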

## Make an Inference

We ask the model for an inference by calling it like a function: net(input). The output of this call represents the model's confidence that the input represents a particular digit. (Since this instance of the model hasn't learned anything yet, we shouldn't expect to see any signal in the output.) Looking at the shape of output, we can see that it also has a batch dimension, the size of which should always match the input batch dimension. If we had passed in an input batch of 16 instances, output would have a shape of (16, 10).

Let's try a run and see what happens...

In [None]:
net = LeNet()
print(net)                         # what does the object tell us about itself?

input = torch.rand(1, 1, 32, 32)   # stand-in for a 32x32 black & white image
print('\nImage batch shape:')
print(input.shape)

output = net(input)                # we don't call forward() directly
print('\nRaw output:')
print(output)
print(output.shape)
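
The raw output is a row of 10 unnormalized scores per input, because the last layer is a plain linear layer. One common way to read such scores, sketched here on a random stand-in tensor (`fake_output` is hypothetical, not the real model output), is to apply softmax to get values that sum to 1 and argmax to pick the highest-scoring class.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
fake_output = torch.randn(16, 10)      # stand-in for the model output on a batch of 16

# Softmax over dim=1 turns each row of scores into values that sum to 1.
probs = F.softmax(fake_output, dim=1)
print(probs.sum(dim=1))

# argmax over dim=1 gives one predicted class index per input.
predicted = probs.argmax(dim=1)
print(predicted)
```

For an untrained model these numbers carry no meaning; the point is only the shapes and the normalization.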

## Loading Data

For a real example, we'll load a dataset from TorchVision. This will give us the chance to use the DataLoader to feed the model batches of data, since we typically break a dataset up into batches rather than feeding it to the model all at once.

In [None]:
%matplotlib inline

import torch
import torchvision
import torchvision.transforms as transforms

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

### Here, we specify two input transformations:

* transforms.ToTensor() converts images loaded by Pillow into PyTorch tensors.
* transforms.Normalize() subtracts the given mean (0.5) from each channel and divides by the given standard deviation (0.5). Since ToTensor() produces values between 0 and 1, the result lies between -1 and 1, roughly centered on zero. Most activation functions have their strongest gradients around x = 0, so centering our data there can speed learning.

There are many more transforms available, including cropping, centering, rotation, and reflection.
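
The arithmetic behind Normalize can be checked by hand. The sketch below does not call torchvision; it applies the same per-channel formula, (value - mean) / std, to a tiny made-up tensor (`img` is a hypothetical 3-channel 2x2 "image"), and then undoes it.

```python
import torch

# A fake 3-channel 2x2 "image" with values in [0, 1], as ToTensor() would produce.
img = torch.tensor([0.0, 0.25, 0.5, 1.0]).repeat(3, 1).reshape(3, 2, 2)

# One mean and one std per channel, shaped so they broadcast over height and width.
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

normalized = (img - mean) / std        # values now lie in [-1, 1]
print(normalized[0])

# Reversing the formula recovers the original pixel values.
restored = normalized * std + mean
print(restored[0])
```

With mean and std both 0.5, the reverse step is the same as dividing by 2 and adding 0.5.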

## Now let's get some sample images...

Next, we'll download and create an instance of the CIFAR10 dataset (it may take some time). These are 32x32 color images of 10 classes of objects: 6 of animals (bird, cat, deer, dog, frog, horse) and 4 of vehicles (airplane, automobile, ship, truck). This is an example of creating a dataset object in PyTorch.

Downloadable datasets (like CIFAR-10) are subclasses of torch.utils.data.Dataset.

In [None]:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)


When we instantiate our dataset, we need to tell the system:

* where we want data to go
* whether to download the data
* whether we're using data for training or testing
* what transformations to use

Once your dataset is ready, you can give it to the DataLoader:

In [None]:
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

## Key Points

1. A Dataset subclass wraps access to the data, and is specialized to the type of data it's serving.

2. The DataLoader knows nothing about the data, but organizes the input tensors served by the Dataset into batches with the parameters you specify.
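
These two points can be sketched without downloading anything. The example below defines a hypothetical Dataset subclass, `RandomImages`, that serves random tensors instead of real pictures, and lets a DataLoader batch them. The class name and sizes are assumptions made for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomImages(Dataset):
    """Hypothetical dataset: n random 3x32x32 'images' with random labels."""

    def __init__(self, n):
        self.images = torch.rand(n, 3, 32, 32)
        self.labels = torch.randint(0, 10, (n,))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Return one (image, label) pair; the DataLoader does the batching.
        return self.images[idx], self.labels[idx]

# num_workers is left at its default of 0, so loading happens in the main process.
loader = DataLoader(RandomImages(10), batch_size=4, shuffle=True)

batch_shapes = [tuple(images.shape) for images, labels in loader]
print(batch_shapes)   # 10 items in batches of 4: two full batches and one of 2
```

The DataLoader only relies on `__len__` and `__getitem__`, which is why it can stay ignorant of what the data actually is.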

We've asked a DataLoader to give us batches of 4 images from trainset, randomizing their order (shuffle=True), and we told it to spin up two workers to load data from disk.  Let's visualize:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
# Use next(dataiter) instead of dataiter.next()
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))


## Training your model

Let's put it all together to load and test some images from our dataset...

In [None]:
%matplotlib inline

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim  # new here: optimizers such as SGD

import torchvision
import torchvision.transforms as transforms

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [None]:
import matplotlib.pyplot as plt
import numpy as np

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
# Use next(dataiter) instead of dataiter.next()
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))


OK, we have a dataset to train our model. Now let's make the model.


In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
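
The 16 * 5 * 5 in fc1 comes from how each layer changes the image size. As a sketch, the standalone layers below use the same settings as Net and record the tensor shape after each step. The `shapes` list is just a helper for this example.

```python
import torch
import torch.nn as nn

x = torch.rand(4, 3, 32, 32)          # a batch of 4 CIFAR-10-sized images
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

shapes = []
x = conv1(x)                          # 5x5 kernel, no padding: 32 -> 28
shapes.append(tuple(x.shape))
x = pool(x)                           # 2x2 pooling halves it: 28 -> 14
shapes.append(tuple(x.shape))
x = conv2(x)                          # 5x5 kernel again: 14 -> 10
shapes.append(tuple(x.shape))
x = pool(x)                           # 10 -> 5
shapes.append(tuple(x.shape))
flat = x.view(-1, 16 * 5 * 5)         # 16 channels * 5 * 5 = 400 features per image
shapes.append(tuple(flat.shape))

for s in shapes:
    print(s)
```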

Next we need two things:
1. a loss function
2. an optimizer

The loss function measures how far the model's prediction was from our ideal output.

The optimizer is what drives the learning. This optimizer implements stochastic gradient descent, one of the more straightforward optimization algorithms.

Besides parameters of the algorithm, like the learning rate (lr) and momentum, we also pass in net.parameters(), which is a collection of all the learning weights in the model - which is what the optimizer adjusts.

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
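
To get a feel for the numbers this loss produces, here is a small sketch on hand-made scores (the tensors and the name `loss_fn` are assumptions for illustration). When all 10 classes get the same score, the cross-entropy loss equals ln(10), about 2.30; when the correct class gets a much higher score, the loss is close to zero.

```python
import math
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
labels = torch.tensor([3, 7])              # correct classes for two samples

# Equal scores for every class: the model is effectively guessing.
uniform_logits = torch.zeros(2, 10)
uniform_loss = loss_fn(uniform_logits, labels).item()
print(uniform_loss, math.log(10))

# A much higher score on the correct class: the loss drops toward 0.
confident_logits = torch.zeros(2, 10)
confident_logits[0, 3] = 10.0
confident_logits[1, 7] = 10.0
confident_loss = loss_fn(confident_logits, labels).item()
print(confident_loss)
```

This gives a useful reference point: a 10-class model that has learned nothing should report a loss near 2.3.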

Finally, all of this is assembled into the training loop. We will make two passes (epochs) over the training data, hence range(2).

In [None]:
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

Each pass has an inner loop that iterates over the training data (line 4), serving batches of transformed input images and their correct labels.

Zeroing the gradients (line 9) is an important step. Gradients are accumulated over a batch; if we do not reset them for every batch, they will keep accumulating, which will provide incorrect gradient values, making learning impossible.
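
The accumulation is easy to see on a single weight. In this sketch (the variable names are made up for the example), calling backward() twice without a reset doubles the stored gradient, and zeroing it restores the correct value.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# d(3w)/dw is 3, so the first backward pass stores a gradient of 3.
(3 * w).backward()
first = w.grad.item()

# A second backward pass adds to the stored gradient instead of replacing it.
(3 * w).backward()
accumulated = w.grad.item()

# Zeroing the gradient before the next pass gives the correct value again.
w.grad.zero_()
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```

In the training loop, optimizer.zero_grad() performs this reset for every parameter the optimizer manages.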

In line 12, we ask the model for its predictions on this batch. In the following line (13), we compute the loss - the difference between outputs (the model prediction) and labels (the correct output).

In line 14, we do the backward() pass, and calculate the gradients that will direct the learning.

In line 15, the optimizer performs one learning step - it uses the gradients from the backward() call to nudge the learning weights in the direction it thinks will reduce the loss.
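
A single learning step can be worked out by hand on a toy problem. The sketch below assumes plain SGD without momentum (unlike the optimizer configured above, which uses momentum=0.9) so that the update is simply weight minus learning rate times gradient. The names `w`, `opt`, and `updated` are for illustration only.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)
opt = optim.SGD([w], lr=0.1)           # plain SGD, no momentum

loss = (w ** 2).sum()                  # loss = w^2, so the gradient is 2w = 2.0
opt.zero_grad()
loss.backward()
opt.step()                             # w <- 1.0 - 0.1 * 2.0 = 0.8

updated = w.item()
print(updated)
```

The loss at the new weight is 0.64, down from 1.0, which is the kind of small improvement each step aims for.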

The remainder of the loop does some light reporting on the epoch number, how many training instances have been completed, and what the collected loss is over the training loop.

Take a look at the output of our test run. Note that the reported loss trends downward (small bumps from batch to batch are normal), indicating that our model is continuing to improve its performance on the training dataset.

As a final step, we should check that the model is actually doing *general* learning, and not simply "memorizing" the dataset. Memorizing is called **overfitting**, and it usually indicates that the dataset is too small (not enough examples for general learning), or that the model has more learning parameters than it needs to correctly model the dataset.

This is the reason datasets are split into training and test subsets - to test the generality of the model, we ask it to make predictions on data it hasn't trained on:



In [None]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest score = predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Running this, you should see that the model is roughly 53% accurate at this point, though the exact figure varies from run to run. That's not exactly state-of-the-art, but it's far better than the 10% accuracy we'd expect from a random output, and shows that some general learning did happen.