<a href="https://colab.research.google.com/github/jfogarty/machine-learning-intro-workshop/blob/master/misc/pytorch_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch with GPU in Colab

- From [Getting Started With Pytorch In Google Collab With Free GPU](https://www.marktechpost.com/2019/06/09/getting-started-with-pytorch-in-google-collab-with-free-gpu/) by Niranjan Kumar in [www.marktechpost.com](https://www.marktechpost.com).

Updated by [John Fogarty](https://github.com/jfogarty) for Python 3.6 and [Base2 MLI](https://github.com/base2solutions/mli) and [colab](https://colab.research.google.com) standalone evaluation.

**NOTE** This is currently a **Colab only** notebook. It will need significant changes to work locally.

## Colab has pytorch support built-in for Python 3 kernels.

You don't need to install any extra stuff.  Very nice.

## Pytorch – Tensors

Numpy operations run on the CPU and cannot use GPUs to accelerate their numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater. So, unfortunately, numpy alone won’t be enough for modern deep learning.

This is where PyTorch introduces the concept of a **Tensor**. A PyTorch Tensor is conceptually identical to an n-dimensional numpy array. Unlike numpy arrays, **PyTorch Tensors can utilize GPUs to accelerate their numeric computations**.

Let’s see how you can create a Pytorch Tensor. First, we will import the required libraries. Remember that torch, numpy and matplotlib are pre-installed in Colab’s virtual machine.

In [0]:
import torch
import numpy
import matplotlib.pyplot as plt

The default tensor type in PyTorch is a float tensor defined as **torch.FloatTensor**. We can create tensors by using the inbuilt functions present inside the torch package.

In [65]:
## creating a tensor of 3 rows and 2 columns consisting of ones
x = torch.ones(3,2)
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])


In [66]:
## creating a tensor of 3 rows and 2 columns consisting of zeros
x = torch.zeros(3,2)
print(x)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
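
The default type mentioned above can be checked through the `dtype` attribute. The following is a minimal sketch, assuming the default dtype has not been changed with `torch.set_default_dtype`; the variable names are illustrative only.

```python
import torch

# torch.ones / torch.zeros use the default floating-point dtype unless told otherwise.
ones = torch.ones(3, 2)
print(ones.dtype)        # normally torch.float32

# For Python integers the dtype is inferred from the values.
ints = torch.tensor([[1, 2], [3, 4]])
print(ints.dtype)        # torch.int64

# A dtype can also be requested explicitly.
doubles = torch.zeros(3, 2, dtype=torch.float64)
print(doubles.dtype)     # torch.float64
```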


### Creating tensors by random initialization:

In [67]:
# To increase the reproducibility, we often set the random seed to a specific value first.
torch.manual_seed(2)

x = torch.rand(3, 2) 
print(x)

tensor([[0.6147, 0.3810],
        [0.6371, 0.4745],
        [0.7136, 0.6190]])


In [68]:
#generating tensor randomly from normal distribution
x = torch.randn(3,3)
print(x)

tensor([[-2.1409, -0.5534, -0.5000],
        [-0.0815, -0.1633,  1.5277],
        [-0.4023,  0.0972, -0.5682]])


## Simple Tensor Operations

### Slicing of Tensors
You can slice PyTorch tensors the same way you slice numpy ndarrays.

In [69]:
#create a tensor
x = torch.tensor([[1, 2], 
                 [3, 4], 
                 [5, 6]])
print(x)
print(f"- Every row, only the last column : {x[:, 1]}")
print(f"-       Every column in first row : {x[0, :]}") 

y = x[1, 1] # take the element in the second row and second column (indices start at 0) as a 0-dim tensor
print(y)

tensor([[1, 2],
        [3, 4],
        [5, 6]])
- Every row, only the last column : tensor([2, 4, 6])
-       Every column in first row : tensor([1, 2])
tensor(4)
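
One detail worth knowing about slicing: as with ndarrays, a basic slice is a view on the original data rather than a copy. The sketch below illustrates this with hypothetical variable names; `clone()` is used when an independent copy is wanted.

```python
import torch

x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])

first_row = x[0, :]      # basic slicing returns a view, not a copy
first_row[0] = 100       # writing to the view also changes x
print(x[0, 0])           # tensor(100)

independent = x[0, :].clone()   # clone() makes a real copy
independent[1] = -1             # x is not affected by this write
print(x[0, 1])                  # tensor(2)
```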


### Reshape Tensor

Reshape a Tensor to different shape

In [70]:
x = torch.tensor([[1, 2], 
                 [3, 4], 
                 [5, 6]]) #(3 rows and 2 columns)
print(x)

print("\n- reshaping to 2 rows and 3 columns")
y = x.view(2, 3) #reshaping to 2 rows and 3 columns
y

tensor([[1, 2],
        [3, 4],
        [5, 6]])

- reshaping to 2 rows and 3 columns


tensor([[1, 2, 3],
        [4, 5, 6]])

Use of -1 to reshape the tensors.

-1 indicates that the size of that dimension will be inferred from the other dimensions and the total number of elements.

In the code snippet below, `x.view(6,-1)` results in a tensor of shape 6x1. Because we fixed the number of rows at 6 and the tensor holds 6 values, PyTorch infers that the column dimension must be 1 so that all the values fit.

In [71]:
x = torch.tensor([[1, 2], 
                 [3, 4], 
                 [5, 6]]) #(3 rows and 2 columns)
print(x)

print("- y shape will be 6x1")
y = x.view(6,-1)
y

tensor([[1, 2],
        [3, 4],
        [5, 6]])
- y shape will be 6x1


tensor([[1],
        [2],
        [3],
        [4],
        [5],
        [6]])
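
To make the inference rule concrete, here is a small sketch (variable names are illustrative) that shows a few uses of -1, and what happens when the known dimensions do not divide the number of elements evenly.

```python
import torch

x = torch.arange(1, 7)           # 6 elements: 1, 2, ..., 6

shape_a = x.view(2, -1).shape    # 6 / 2 = 3 columns
shape_b = x.view(-1, 1).shape    # 6 / 1 = 6 rows
shape_c = x.view(-1).shape       # flatten to one dimension
print(shape_a, shape_b, shape_c)

# 6 elements cannot be split into 4 equal rows, so the inference fails.
inference_failed = False
try:
    x.view(4, -1)
except RuntimeError as err:
    inference_failed = True
    print("cannot infer the missing dimension:", err)
```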

## Mathematical Operations

In [72]:
#Create two tensors
x = torch.ones([3, 2])
y = torch.ones([3, 2])

#adding two tensors
z = x + y #method 1
z = torch.add(x,y) #method 2
print(f"X\n{x}\n+\nY\n{y}\nis\n{z}")

#subtracting two tensors
z = x - y #method 1

print("\nElement-wise subtraction:")
torch.sub(x,y) #method 2

X
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
+
Y
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
is
tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])

Element-wise subtraction:


tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

In [73]:
# Scalar element-wise division
x / 2

tensor([[0.5000, 0.5000],
        [0.5000, 0.5000],
        [0.5000, 0.5000]])

### Inplace Operations

In Pytorch all operations on the tensor that operate in-place on it will have an **`_` postfix**. For example, **`add`** is the out-of-place version, and **`add_`** is the in-place version.

In [74]:
y.add_(x) #tensor y added with x and result will be stored in y

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]])
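
The difference between the two versions can be observed by looking at where the result is stored. This is a sketch using `data_ptr()`, which returns the address of a tensor's first element; the variable names are made up for the illustration.

```python
import torch

x = torch.ones(3, 2)
y = torch.ones(3, 2)

ptr_before = y.data_ptr()   # address of y's underlying data

z = y.add(x)                # out-of-place: y is unchanged, z is a new tensor
print(z.data_ptr() != ptr_before)   # True: z lives in different memory

y.add_(x)                   # in-place: y itself now holds the sum
print(y.data_ptr() == ptr_before)   # True: y still uses the same memory
```

In-place operations save memory, but they overwrite values, so they should be used with care on tensors that take part in gradient computation.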

## Pytorch to Numpy Bridge

Converting a **PyTorch tensor** to a **numpy ndarray** is sometimes very useful. Calling `.numpy()` on a tensor converts it to an ndarray.

In [75]:
x = torch.linspace(0 , 1, steps = 5) #creating a tensor using linspace
x_np = x.numpy() #convert tensor to numpy
print(type(x), type(x_np)) #check the types 

<class 'torch.Tensor'> <class 'numpy.ndarray'>


To convert a numpy ndarray to a PyTorch tensor, we can use `torch.from_numpy()`.

In [76]:
import numpy as np
a = np.random.randn(5) #generate a random numpy array
a_pt = torch.from_numpy(a) #convert numpy array to a tensor
print(type(a), type(a_pt)) 

<class 'numpy.ndarray'> <class 'torch.Tensor'>


After the conversion, the PyTorch tensor and the numpy ndarray share the same underlying memory (for tensors on the CPU), so changing one will change the other.
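
The shared memory can be demonstrated with a short sketch. It assumes CPU tensors; the names `a`, `t`, `t2`, `n2` and `copied` are illustrative. `torch.tensor()` is shown as one way to get an independent copy instead.

```python
import numpy as np
import torch

# ndarray -> tensor: from_numpy() shares memory with the array.
a = np.zeros(3)
t = torch.from_numpy(a)
a[0] = 7.0               # modify the array...
print(t)                 # ...and the tensor sees the change

# tensor -> ndarray: numpy() shares memory with the tensor.
t2 = torch.ones(3)
n2 = t2.numpy()
t2.add_(1)               # in-place change to the tensor
print(n2)                # [2. 2. 2.]

# torch.tensor() copies the data, so later changes to the array are not seen.
copied = torch.tensor(a)
a[1] = 9.0
print(copied)
```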

## CUDA Support

# IMPORTANT!

You **must** enable GPU support with **Runtime** | **Change Runtime Type** | **GPU** from the Colab menu before this will work.

To check how many CUDA-capable GPUs are connected to the machine, you can use the code snippet below. If you are executing the code in Colab you will get 1, which means that the Colab virtual machine is connected to one GPU. `torch.cuda` is used to set up and run CUDA operations. It keeps track of the currently selected GPU.

In [77]:
import torch
n = torch.cuda.device_count()
print(f"The number of CUDA devices available to Torch is {n}.")
if n == 0:
    print("*** ERROR! You need to enable GPU support first using Runtime | Change Runtime Type | GPU")

The number of CUDA devices available to Torch is 1.


In [78]:
!nvidia-smi

Sat Aug 17 00:56:59 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P0    27W /  70W |    891MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
+-------

In [79]:
print(torch.cuda.get_device_name(0))

Tesla T4


The important thing to note is that we can store a reference to this CUDA-capable GPU in a variable and pass that variable to PyTorch operations that accept a `device`.

All CUDA tensors you allocate will be created on that device. The selected GPU device can be changed with a [torch.cuda.device](https://pytorch.org/docs/stable/cuda.html#torch.cuda.device) context manager.

In [80]:
#Assign cuda GPU located at location '0' to a variable
cuda0 = torch.device('cuda:0')

#Performing the addition on GPU
a = torch.ones(3, 2, device=cuda0) #creating a tensor 'a' on GPU
b = torch.ones(3, 2, device=cuda0) #creating a tensor 'b' on GPU
c = a + b
print(c)


tensor([[2., 2.],
        [2., 2.],
        [2., 2.]], device='cuda:0')
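
The cell above fails when no GPU is attached. A common pattern, sketched below with illustrative variable names, is to pick the device once and fall back to the CPU when CUDA is not available, so the same code runs in both situations.

```python
import torch

# Use the first GPU if there is one, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

a = torch.ones(3, 2, device=device)   # create directly on the chosen device
b = torch.ones(3, 2).to(device)       # or create on the CPU and move it
c = a + b                             # both operands must be on the same device

result = c.cpu()                      # move back to the CPU, e.g. before calling .numpy()
print(result)
```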


## Automatic Differentiation

In this section, we will discuss an important PyTorch package for automatic differentiation, `autograd`. The `autograd` package gives us the ability to perform automatic differentiation or automatic gradient computation for all operations on tensors. It is a define-by-run framework, which means that your back-propagation is defined by how your code is run.

Let’s see how to perform automatic differentiation by using a simple example. First, we create a tensor with the `requires_grad` parameter set to `True` because we want to track all the operations performed on that tensor.

In [81]:
#create a tensor with requires_grad = True
x = torch.ones([3,2], requires_grad = True)
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], requires_grad=True)


Perform a simple tensor addition operation

In [82]:
y = x + 5 #tensor addition
print(y) #check the result

tensor([[6., 6.],
        [6., 6.],
        [6., 6.]], grad_fn=<AddBackward0>)


Because $y$ was created as a result of an operation on $x$, it has a $grad\_fn$. 

Perform more operations on $y$ and create a new tensor $z$.

In [83]:
z = y*y + 1
print(z)

print("- adding all the values in z")
t = torch.sum(z) 
print(t)

tensor([[37., 37.],
        [37., 37.],
        [37., 37.]], grad_fn=<AddBackward0>)
- adding all the values in z
tensor(222., grad_fn=<SumBackward0>)


## Back-Propagation

To perform back-propagation, you can just call `t.backward()`

In [0]:
t.backward() #perform backpropagation; PyTorch will not print any output.

In [85]:
t

tensor(222., grad_fn=<SumBackward0>)

In [86]:
x

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], requires_grad=True)

Print gradients: $$\frac{d(t)}{dx}$$

In [88]:
print(x.grad)

tensor([[12., 12.],
        [12., 12.],
        [12., 12.]])


`x.grad` will give you the **partial derivative of t with respect to x** : $\partial t / \partial x$

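If you are able to figure out how we got a tensor with all the values equal to 12, then you have understood automatic differentiation. If not, don’t worry, just follow along. When we execute `t.backward()` we are calculating the partial derivative of t with respect to x. 

Remember that $t$ is the sum of the elements of $z$, where $z = y^2 + 1$ and $y = x + 5$. Applying the chain rule to each element $x_{ij}$:

$$
\frac{\partial t}{\partial x_{ij}} = \frac{\partial t}{\partial z_{ij}} \cdot \frac{\partial z_{ij}}{\partial y_{ij}} \cdot \frac{\partial y_{ij}}{\partial x_{ij}} = 1 \cdot 2y_{ij} \cdot 1 = 2(x_{ij} + 5) = 12\ \text{at}\ x_{ij} = 1\ \text{and}\ y_{ij} = 6
$$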

The important point to note is that the value of the derivative is calculated at the point where we initialized the tensor $x$.

Since we initialized $x$ at a value equal to one, we get an output tensor with all the values equal to 12.
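
As a sanity check, the gradient from `autograd` can be compared with a numerical estimate. The sketch below is not part of the original notebook: the helper `t_of` and the step size `eps` are assumptions for illustration, and `float64` is used only to keep the finite-difference estimate accurate.

```python
import torch

def t_of(x):
    """Recompute t from x using the same steps as the notebook."""
    y = x + 5
    z = y * y + 1
    return z.sum()

x = torch.ones(3, 2, dtype=torch.float64, requires_grad=True)
t_of(x).backward()                 # autograd fills x.grad
print(x.grad)                      # every entry is 2 * (1 + 5) = 12

# Central finite difference for the single element x[0, 0].
eps = 1e-6
with torch.no_grad():
    bump = torch.zeros_like(x)
    bump[0, 0] = eps
    numeric = (t_of(x + bump) - t_of(x - bump)) / (2 * eps)
print(numeric.item())              # close to 12
```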

## Conclusion

In this post, we briefly looked at PyTorch and Google Colab, and we saw how to enable the GPU hardware accelerator in Colab. Then we saw how to create tensors in PyTorch and perform some basic operations on them, including on a CUDA-capable GPU. After that, we used a simple example to discuss the PyTorch autograd package, which gives us the ability to perform automatic gradient computation on tensors. If you have any issues or doubts while implementing the above code, feel free to ask them in the comment section below or send me a message on LinkedIn citing this article.

# BONUS!  TorchVision FNN Example

Let's try to get some actual work done with PyTorch.  Here we do a Feed Forward Network using the GPU if it's enabled. Very nice.

- From [Using PyTorch with GPU in Google Colab](https://jovianlin.io/pytorch-with-gpu-in-google-colab/) by Jovian Lin in [jovianlin.io](https://jovianlin.io).

- [This was his original Colab notebook](https://goo.gl/4U46tA) but it no longer works out of the box. Updated by [John Fogarty](https://github.com/jfogarty) for Python 3.6 and [Base2 MLI](https://github.com/base2solutions/mli) and [colab](https://colab.research.google.com) standalone evaluation.


### Upgrade to torchvision 0.4.0 first!

This was required to get around the following error; **[YMMV](https://dictionary.cambridge.org/us/dictionary/english/ymm)**
```
ImportError: /usr/local/lib/python3.6/dist-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at7getTypeERKNS_6TensorE
```

In [89]:
!pip install -U torchvision==0.4.0

Requirement already up-to-date: torchvision==0.4.0 in /usr/local/lib/python3.6/dist-packages (0.4.0)


In [0]:
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

## Initialize Hyper-parameters

In [0]:
input_size    = 784   # The image size = 28 x 28 = 784
hidden_size   = 500   # The number of nodes in the hidden layer
num_classes   = 10    # The number of output classes. In this case, from 0 to 9
num_epochs    = 5     # The number of passes over the entire training set
batch_size    = 100   # The number of samples processed in one iteration
learning_rate = 1e-3  # The step size used by the optimizer for each weight update

## Download MNIST Dataset
MNIST is a database of 70,000 images of handwritten digits (i.e. 0 to 9), split into 60,000 training and 10,000 test images, that is often used in image classification.

In [0]:
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

## Load the Dataset

Note: We shuffle the loading of train_dataset so that the learning process does not depend on the order of the data. The test_loader keeps the original order, because the order of the inputs should make no difference when we only evaluate the model.

In [0]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
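
To see what a `DataLoader` actually yields without downloading anything, here is a sketch that uses random tensors as a hypothetical stand-in for MNIST. The names `fake_images`, `fake_labels` and `fake_loader` are made up for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 250 random 1 x 28 x 28 "images" with labels 0 to 9.
fake_images = torch.rand(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
fake_dataset = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(dataset=fake_dataset, batch_size=100, shuffle=True)

# Each iteration yields one batch of images and the matching batch of labels.
batch_shapes = [(images.shape, labels.shape) for images, labels in fake_loader]
for image_shape, label_shape in batch_shapes:
    print(image_shape, label_shape)

# 250 samples with batch_size=100 give batches of 100, 100 and 50.
print(len(fake_loader))
```

The last batch is smaller because 250 is not a multiple of 100. With the real MNIST training set, 60,000 samples and a batch size of 100 give 600 full batches per epoch.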

## Build the Feedforward Neural Network

Feedforward Neural Network Model Structure
The FNN includes two fully-connected layers (i.e. fc1 & fc2) with a non-linear ReLU layer in between. This structure is normally called a 1-hidden-layer FNN, because the output layer (fc2) is not counted.

By running the forward pass, the input images ($x$) go through the neural network and generate an output ($out$) with one score per class, indicating how likely each image is to belong to each of the 10 classes.

For example, a cat image could get a score of 0.8 for a dog class and a score of 0.3 for an airplane class.

In [0]:
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()                    # Inherited from the parent class nn.Module
        self.fc1 = nn.Linear(input_size, hidden_size)  # 1st Fully-Connected Layer: 784 (input data) -> 500 (hidden node)
        self.relu = nn.ReLU()                          # Non-Linear ReLU Layer: max(0,x)
        self.fc2 = nn.Linear(hidden_size, num_classes) # 2nd Fully-Connected Layer: 500 (hidden node) -> 10 (output class)
    
    def forward(self, x):                              # Forward pass: stacking each layer together
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
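
The following sketch runs a forward pass through an untrained copy of this structure to show the shape of the output. The class name `DemoNet` and the random input batch are assumptions for illustration. Note that the network returns raw scores; `torch.softmax` is one way to turn them into probabilities that sum to 1.

```python
import torch
import torch.nn as nn

class DemoNet(nn.Module):
    """Same layer structure as the notebook's Net, repeated so the sketch is self-contained."""
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

demo_net = DemoNet(784, 500, 10)
fake_batch = torch.rand(4, 784)              # 4 random, already flattened "images"

with torch.no_grad():                        # no gradients needed for a quick check
    scores = demo_net(fake_batch)            # raw class scores, shape (4, 10)
    probs = torch.softmax(scores, dim=1)     # each row now sums to 1

print(scores.shape)
print(probs.sum(dim=1))
predicted = probs.argmax(dim=1)              # index of the most likely class per image
print(predicted)
```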

## Instantiate the FNN

We now create a real FNN based on our structure.

In [0]:
net = Net(input_size, hidden_size, num_classes)

## Enable GPU


In [96]:
import torch
n = torch.cuda.device_count()
print(f"The number of CUDA devices available to Torch is {n}.")
use_cuda = n > 0

The number of CUDA devices available to Torch is 1.


In [97]:
if use_cuda and torch.cuda.is_available():
    print("- Woo hoo! CUDA GPU is enabled and available.")
    net.cuda()

- Woo hoo! CUDA GPU is enabled and available.


## Choose the Loss Function and Optimizer

The loss function (criterion) measures how far the network's output is from the true class, which tells us how well or badly the neural network performs. The optimizer decides how the weights are updated so that training converges toward the best weights for this network.

In [0]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
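
To make the criterion less of a black box, this sketch computes the same loss by hand on a tiny, made-up batch. `nn.CrossEntropyLoss` takes raw scores and integer class labels, applies a log-softmax, and averages the negative log-probability of the correct class.

```python
import torch
import torch.nn as nn

# Made-up scores for 2 samples and 3 classes, with the correct class of each sample.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

demo_criterion = nn.CrossEntropyLoss()
loss = demo_criterion(logits, targets)

# The same value by hand: mean of -log(softmax(logits))[correct class].
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())
```

Because the softmax is applied inside the criterion, the network's `forward` returns raw scores and does not need its own softmax layer.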

## Training the FNN Model

This process might take around **3 to 5** minutes depending on the backend machine. The detailed explanations are listed as comments (#) in the following code.

In [99]:
print("Training Feed Forward Network.")
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):   # i is the batch index; each batch is a pair (images, labels)
        images = Variable(images.view(-1, 28*28))         # Flatten each 28 x 28 image into a vector of size 784 (Variable is a no-op wrapper since PyTorch 0.4)
        labels = Variable(labels)
        
        if use_cuda and torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        
        optimizer.zero_grad()                             # Reset the gradients left over from the previous step
        outputs = net(images)                             # Forward pass: compute the class scores for a batch of images
        loss = criterion(outputs, labels)                 # Compute the loss between the class scores and the true labels
        loss.backward()                                   # Backward pass: compute the gradients of the loss w.r.t. the weights
        optimizer.step()                                  # Optimizer: update all weights and biases using the gradients
        
        if (i+1) % 100 == 0:                              # Logging
            steps = len(train_dataset)//batch_size
            print(f'---- Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{steps}], Loss: {loss.item():.4f}')
            # JF - Note change in tensor structure. Zero indexing no longer allowed.
            #print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
            #     %(epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.data[0]))
    print(f"-- Completed Epoch [{epoch+1}/{num_epochs}]")
print("Training complete.")

Epoch [1/5], Step [100/600], Loss: 0.2547
Epoch [1/5], Step [200/600], Loss: 0.1884
Epoch [1/5], Step [300/600], Loss: 0.2279
Epoch [1/5], Step [400/600], Loss: 0.1420
Epoch [1/5], Step [500/600], Loss: 0.2475
Epoch [1/5], Step [600/600], Loss: 0.2016
Epoch [2/5], Step [100/600], Loss: 0.2127
Epoch [2/5], Step [200/600], Loss: 0.0693
Epoch [2/5], Step [300/600], Loss: 0.1538
Epoch [2/5], Step [400/600], Loss: 0.0974
Epoch [2/5], Step [500/600], Loss: 0.0714
Epoch [2/5], Step [600/600], Loss: 0.1544
Epoch [3/5], Step [100/600], Loss: 0.0812
Epoch [3/5], Step [200/600], Loss: 0.0385
Epoch [3/5], Step [300/600], Loss: 0.1517
Epoch [3/5], Step [400/600], Loss: 0.0511
Epoch [3/5], Step [500/600], Loss: 0.0502
Epoch [3/5], Step [600/600], Loss: 0.1696
Epoch [4/5], Step [100/600], Loss: 0.0512
Epoch [4/5], Step [200/600], Loss: 0.0301
Epoch [4/5], Step [300/600], Loss: 0.1154
Epoch [4/5], Step [400/600], Loss: 0.0186
Epoch [4/5], Step [500/600], Loss: 0.0817
Epoch [4/5], Step [600/600], Loss:

## Testing the FNN Model
Similar to training the neural network, we also need to load batches of test images and collect the outputs. The differences are that:

1. No loss or gradient calculation
2. No weights update
3. The number of correct predictions is counted

In [100]:
correct = 0
total = 0
with torch.no_grad():                                   # No gradients are needed for evaluation
    for images, labels in test_loader:
        images = Variable(images.view(-1, 28*28))       # Flatten each 28 x 28 image into a vector of size 784

        if use_cuda and torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()

        outputs = net(images)
        _, predicted = torch.max(outputs, 1)            # Choose the class with the highest score
        total += labels.size(0)                         # Increment the total count
        correct += (predicted == labels).sum().item()   # Increment the correct count (as a Python int)

print('Accuracy of the network on the 10K test images: %d %%' % (100 * correct / total))

Accuracy of the network on the 10K test images: 97 %


## Save the trained FNN Model for future use

We save the trained model as a pickle that can be loaded and used later.

In [0]:
torch.save(net.state_dict(), 'fnn_model.pkl')

The PyTorch serialization format is built off of pickle (.pkl) but it overrides some functionality to handle tensors that share the same backing storage.

You can use [`torch.load`](https://pytorch.org/docs/stable/torch.html) to reload this model if you've saved this file.

Note that if you call `torch.load()` on a file which contains GPU tensors, those tensors will be loaded to the GPU by default. You can call `torch.load(.., map_location='cpu')` and then `load_state_dict()` to avoid a GPU RAM surge when loading a model checkpoint.
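
A reload could look like the sketch below. It is self-contained for illustration: `DemoNet` repeats the notebook's layer structure, and a temporary directory is used instead of the notebook's working directory. Since only the `state_dict` was saved, a model with the same structure has to be created first.

```python
import os
import tempfile

import torch
import torch.nn as nn

class DemoNet(nn.Module):
    """Same layer structure as the notebook's Net."""
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

# Save the parameters of a model (standing in for the trained network).
trained = DemoNet(784, 500, 10)
path = os.path.join(tempfile.mkdtemp(), "fnn_model.pkl")
torch.save(trained.state_dict(), path)

# Reload onto the CPU, then copy the parameters into a fresh model.
state = torch.load(path, map_location="cpu")
restored = DemoNet(784, 500, 10)
restored.load_state_dict(state)
restored.eval()                      # switch to evaluation mode before inference
```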

## Congrats!

You have finished building your first Feedforward Neural Network!

### End of notebook.