---

# Practical Machine Learning with Python
# Chapter 9: Introduction to PyTorch
## Guillermo Avendaño-Franco  and Aldo Humberto Romero
## West Virginia University

### Machine Learning Workshop 2019

---

This notebook is based on a variety of sources, mostly other notebooks, and the material was adapted to the topics covered during the lessons. In some cases, the original notebooks were created for Python 2.x or for older versions of Scikit-learn or TensorFlow and had to be adapted.

## References

### Books

 * **Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems**, 1st Edition *Aurélien Géron*  (2017)

 * **Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow**, 2nd Edition, *Sebastian Raschka* and *Vahid Mirjalili* (2017)

 * **Deep Learning: A Practitioner's Approach**, *Josh Patterson* and *Adam Gibson*
 
 * **Deep Learning**, *Ian Goodfellow*, *Yoshua Bengio* and *Aaron Courville* (2016)

### Jupyter Notebooks

 * [Yale Digital Humanities Lab](https://github.com/YaleDHLab/lab-workshops)
 
 * Aurélien Géron, Hands-On Machine Learning with Scikit-Learn
   [First Edition](https://github.com/ageron/handson-ml)
   [Second Edition (In preparation)](https://github.com/ageron/handson-ml2)
   
 * [A progressive collection of notebooks of the Machine Learning course by the University of Turin](https://github.com/rugantio/MachineLearningCourse)
   
 * [A curated set of jupyter notebooks about many topics](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)
   
### Videos

 * [Caltech's "Learning from Data" by Professor Yaser Abu-Mostafa](https://work.caltech.edu/telecourse.html)
 
The support of the National Science Foundation and the US Department of Energy under projects DMREF-NSF 1434897, NSF OAC-1740111 and DOE DE-SC0016176 is acknowledged.

<div style="clear: both; display: table;">
<div style="border: none; float: left; width: 40%; padding: 10px">
<img src="fig/NSF.jpg" alt="National Science Foundation" style="width:50%" align="left">
    </div>
    <div style="border: none; float: right; width: 40%; padding: 10px">
<img src="fig/DOE.jpg" alt="US Department of Energy" style="width:50%" align="right">
</div>
</div>

## Setup

This Jupyter notebook was created to run on a Python 3 kernel. Some IPython magics were used:

In [1]:
# commands prefaced by a % in Jupyter are called "magics"
# these "magic" commands allow us to do special things that only apply to Jupyter

# %matplotlib inline - displays charts from the matplotlib library inside the notebook
# %load_ext autoreload - loads the autoreload extension
# %autoreload 2 - reloads all imported modules automatically before executing code
import matplotlib
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [2]:
%load_ext watermark
%watermark

2019-07-29T15:01:48-04:00

CPython 3.7.3
IPython 5.8.0

compiler   : GCC 8.3.0
system     : Linux
release    : 5.0.0-20-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 8
interpreter: 64bit


In [3]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import sklearn
import scipy
import torch

In [4]:
%watermark -iv

numpy      1.16.2
matplotlib 3.0.2
pandas     0.23.3
IPython    5.8.0
torch      1.1.0
sklearn    0.20.2
scipy      1.2.1



# Introduction to PyTorch

PyTorch is a Python-based scientific computing package that is very similar to NumPy and is intended to be used in two contexts:

 1. As a replacement for NumPy when you want to take advantage of the power of GPUs
    
 2. As a full-featured Deep Learning research platform that provides maximum flexibility and speed

The idea behind PyTorch is to build on the familiarity of NumPy ndarrays while allowing the processing to happen on GPUs when they are available.
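
As a minimal sketch of that idea, the snippet below picks a GPU when one is available and falls back to the CPU otherwise. The variable names (`device`, `a`, `a_dev`, `b_dev`, `b_cpu`) are illustrative only and are not used elsewhere in this notebook.

```python
import torch

# Pick the GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor on the CPU and move it to the chosen device.
a = torch.ones(2, 3)
a_dev = a.to(device)

# Operations run on the device where the tensors live.
b_dev = a_dev * 2 + 1

# Bring the result back to the CPU, for example before converting to NumPy.
b_cpu = b_dev.cpu()
print(device, b_cpu)
```

The same code runs unchanged on machines with or without a GPU, which is the main reason for selecting the device once and reusing it.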

## Creation

There are several tensor constructors, similar to those in NumPy.

In [2]:
x = torch.zeros(4,4)
x

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

There is a constructor for uninitialized tensors:

In [3]:
y = torch.empty(5,5)
y

tensor([[1.5471e-04, 2.6731e-06, 3.4183e-06, 1.6877e-07, 8.2728e+20],
        [2.0546e+20, 1.7279e-04, 1.7062e-07, 3.0792e-18, 1.9421e+31],
        [2.7491e+20, 6.1949e-04, 7.1856e+22, 4.3605e+27, 2.3329e-18],
        [1.9284e+31, 3.2314e-18, 4.3424e-05, 3.4198e+21, 8.2287e-10],
        [2.6879e-06, 1.6631e+22, 4.2193e-08, 1.3603e-05, 8.1998e-10]])

The values are the remnants of whatever data was stored in the memory locations where the tensor was created.

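A tensor with random values in its entries can also be created. `torch.rand` draws values uniformly from the interval [0, 1), and `torch.randint(low, high, size)` draws integers from `low` (inclusive) to `high` (exclusive):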

In [4]:
x = torch.rand(3,3)
x

tensor([[0.6617, 0.5053, 0.6741],
        [0.1677, 0.2567, 0.6331],
        [0.4644, 0.0215, 0.1473]])

In [5]:
x = torch.randint(0,9,(3,3))
x

tensor([[8, 0, 1],
        [7, 1, 4],
        [3, 2, 8]])

As in NumPy, new tensors can be created from lists and from lists of lists:

In [7]:
x = torch.tensor([3.14, 1.67])
x

tensor([3.1400, 1.6700])
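
The cell above only shows a flat list. A small sketch of the list-of-lists case follows; the names `m` and `f` are chosen here for illustration.

```python
import torch

# A list of lists becomes a 2-D tensor; each inner list is a row.
m = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(m.shape)   # torch.Size([2, 3])
print(m.dtype)   # torch.int64, inferred from the Python ints

# The dtype can be set explicitly instead of being inferred.
f = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print(f.dtype)   # torch.float32
```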

New tensors that share properties (such as the dtype) with an existing tensor are created with the `.new_*` methods:

In [8]:
x=torch.tensor(range(16), dtype=torch.int8).reshape(4,4)
x

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]], dtype=torch.int8)

In [9]:
x.new_empty((2,2))

tensor([[ 96, -29],
        [ 90, -60]], dtype=torch.int8)

In [10]:
x.new_full((2,2),2.3)

tensor([[2, 2],
        [2, 2]], dtype=torch.int8)

Note that the fill value 2.3 appears as 2 in the output above: the new tensor inherits the `int8` dtype of `x`, so the fractional part is discarded.

In [11]:
x.new_ones((2,2))

tensor([[1, 1],
        [1, 1]], dtype=torch.int8)

In [12]:
x.new_zeros((2,2))

tensor([[0, 0],
        [0, 0]], dtype=torch.int8)

## Operations

In [13]:
x=torch.rand((2,3))
print(x)
y=torch.rand((2,3))
print(y)

tensor([[0.8659, 0.3298, 0.9382],
        [0.4632, 0.1986, 0.1409]])
tensor([[0.0900, 0.3043, 0.3818],
        [0.8440, 0.2117, 0.3115]])


In [14]:
x+y

tensor([[0.9559, 0.6341, 1.3200],
        [1.3072, 0.4103, 0.4524]])

In [15]:
torch.add(x,y)

tensor([[0.9559, 0.6341, 1.3200],
        [1.3072, 0.4103, 0.4524]])

In [16]:
z=torch.empty((2,3))
torch.add(x,y, out=z)
z

tensor([[0.9559, 0.6341, 1.3200],
        [1.3072, 0.4103, 0.4524]])

In [17]:
y.add(x)

tensor([[0.9559, 0.6341, 1.3200],
        [1.3072, 0.4103, 0.4524]])

In [18]:
y.add_(x)
y

tensor([[0.9559, 0.6341, 1.3200],
        [1.3072, 0.4103, 0.4524]])
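
The cells above show four ways of adding two tensors. The difference between the last two is easy to miss, so here is a short sketch with fixed values (the names `x`, `y` and `s` are reused here only for illustration): `add` returns a new tensor, while `add_`, like other methods ending in an underscore, modifies the tensor in place.

```python
import torch

x = torch.ones(2, 3)
y = torch.full((2, 3), 2.0)

s = y.add(x)      # out-of-place: returns a new tensor, y is unchanged
print(y)          # still all 2s
print(s)          # all 3s

y.add_(x)         # in-place (trailing underscore): y itself is modified
print(y)          # now all 3s
```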

## NumPy-like operations

The usual NumPy indexing and slicing operations also apply to tensors:

In [19]:
x=torch.tensor(range(64)).reshape(8,8)
x

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31],
        [32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47],
        [48, 49, 50, 51, 52, 53, 54, 55],
        [56, 57, 58, 59, 60, 61, 62, 63]])

In [20]:
y=x[2:5,:]
y

tensor([[16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31],
        [32, 33, 34, 35, 36, 37, 38, 39]])

As in NumPy, these slices are views: changing the view changes the underlying tensor.

In [21]:
y[0,0]=100
x

tensor([[  0,   1,   2,   3,   4,   5,   6,   7],
        [  8,   9,  10,  11,  12,  13,  14,  15],
        [100,  17,  18,  19,  20,  21,  22,  23],
        [ 24,  25,  26,  27,  28,  29,  30,  31],
        [ 32,  33,  34,  35,  36,  37,  38,  39],
        [ 40,  41,  42,  43,  44,  45,  46,  47],
        [ 48,  49,  50,  51,  52,  53,  54,  55],
        [ 56,  57,  58,  59,  60,  61,  62,  63]])

In [22]:
y=x.view(64)
z=x.view(-1,16)
print(x.size(),y.size(),z.size())

torch.Size([8, 8]) torch.Size([64]) torch.Size([4, 16])
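
A small sketch of how `view` behaves, using a fresh 12-element tensor rather than the `x` above: a dimension given as `-1` is inferred from the total number of elements, and the result shares storage with the original tensor.

```python
import torch

x = torch.arange(12)          # tensor([0, 1, ..., 11])
v = x.view(3, -1)             # -1 is inferred as 12 / 3 = 4
print(v.shape)                # torch.Size([3, 4])

v[0, 0] = 100                 # writing through the view ...
print(x[0].item())            # ... changes the original: 100

# Both tensors point to the same underlying storage.
print(v.data_ptr() == x.data_ptr())   # True
```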


## Data extraction into Python numbers and lists

In [23]:
y=x[6,6]
y

tensor(54)

In [24]:
y.item()

54

In [25]:
x.tolist()

[[0, 1, 2, 3, 4, 5, 6, 7],
 [8, 9, 10, 11, 12, 13, 14, 15],
 [100, 17, 18, 19, 20, 21, 22, 23],
 [24, 25, 26, 27, 28, 29, 30, 31],
 [32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47],
 [48, 49, 50, 51, 52, 53, 54, 55],
 [56, 57, 58, 59, 60, 61, 62, 63]]

## Torch Tensors {to, from} NumPy Arrays

In [26]:
x=torch.tensor(range(36)).reshape(6,6)
x

tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35]])

In [27]:
y=x.numpy()
y

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

The Torch Tensor and NumPy array will share their underlying memory locations (if the Torch Tensor is on CPU), and changing one will change the other.

In [28]:
x[0,0]=999
y

array([[999,   1,   2,   3,   4,   5],
       [  6,   7,   8,   9,  10,  11],
       [ 12,  13,  14,  15,  16,  17],
       [ 18,  19,  20,  21,  22,  23],
       [ 24,  25,  26,  27,  28,  29],
       [ 30,  31,  32,  33,  34,  35]])

Conversion from NumPy to Torch that shares the same memory is also possible:


In [29]:
xn=np.random.rand(3,3)
xn

array([[0.4483372 , 0.16132143, 0.91520312],
       [0.64532699, 0.55950573, 0.23426107],
       [0.80146715, 0.92280189, 0.41816932]])

In [30]:
xt=torch.from_numpy(xn)
xt

tensor([[0.4483, 0.1613, 0.9152],
        [0.6453, 0.5595, 0.2343],
        [0.8015, 0.9228, 0.4182]], dtype=torch.float64)

In [31]:
xt[0,0]=100
xn

array([[100.        ,   0.16132143,   0.91520312],
       [  0.64532699,   0.55950573,   0.23426107],
       [  0.80146715,   0.92280189,   0.41816932]])
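
When shared memory is not wanted, `torch.tensor` can be used instead of `torch.from_numpy`, since it copies the data. The sketch below contrasts the two; the names `a`, `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

a = np.zeros(3)

shared = torch.from_numpy(a)   # shares memory with the NumPy array
copied = torch.tensor(a)       # makes an independent copy

a[0] = 7.0                     # modify the NumPy array in place

print(shared)   # first entry is now 7: the change is visible
print(copied)   # still all zeros: the copy is unaffected
```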

# Autograd: Automatic gradient evaluation

In [32]:
x = torch.ones(3, 3, requires_grad=True)
print(x)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)


In [33]:
y = x + 3.14
print(y)

tensor([[4.1400, 4.1400, 4.1400],
        [4.1400, 4.1400, 4.1400],
        [4.1400, 4.1400, 4.1400]], grad_fn=<AddBackward0>)


In [34]:
y.grad_fn

<AddBackward0 at 0x7febc456dda0>

In [35]:
z = y**2 * y * 3
out = z.mean()

print(z, out)

tensor([[212.8739, 212.8739, 212.8739],
        [212.8739, 212.8739, 212.8739],
        [212.8739, 212.8739, 212.8739]], grad_fn=<MulBackward0>) tensor(212.8739, grad_fn=<MeanBackward0>)


Gradient tracking can also be enabled after a tensor is created, using the in-place method `requires_grad_()`:

In [36]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x7febc4557b70>


## Gradients

In [37]:
out.backward()

In [38]:
x.grad

tensor([[17.1396, 17.1396, 17.1396],
        [17.1396, 17.1396, 17.1396],
        [17.1396, 17.1396, 17.1396]])
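
The value 17.1396 can be checked by hand. With `y = x + 3.14`, `z = 3 * y**3` and `out` the mean of the 9 entries of `z`, the derivative of `out` with respect to each entry of `x` is (1/9) * 9 * y**2 = y**2, and 4.14**2 = 17.1396. The sketch below repeats the computation in a self-contained form and compares the autograd result with that analytic expression (`expected` is a name introduced here).

```python
import torch

x = torch.ones(3, 3, requires_grad=True)
y = x + 3.14
z = y**2 * y * 3          # z = 3 * y**3, element-wise
out = z.mean()            # average over the 9 entries

out.backward()            # fills x.grad with d(out)/dx

# Analytic gradient: (1/9) * 9 * y**2 = y**2 for every entry.
expected = (x.detach() + 3.14) ** 2
print(x.grad)
print(torch.allclose(x.grad, expected))
```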

# Neural Networks

In [39]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)


Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


In [40]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 3, 3])
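
The count of 10 is consistent with five layers that each contribute a weight and a bias. As a minimal illustration on a single layer (a standalone `nn.Linear` with the same sizes as `fc3`, not the `net` object itself), `named_parameters()` shows both tensors and their shapes:

```python
import torch.nn as nn

# A standalone layer with the same sizes as fc3 in Net.
layer = nn.Linear(84, 10)

# Each Linear (and Conv2d) layer holds two parameter tensors by default.
shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```

The same method is available on a whole model, where the names are prefixed with the attribute name of each layer.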


In [41]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[-0.1307,  0.0773,  0.0707,  0.1349,  0.1128,  0.0438,  0.0938, -0.0394,
          0.0989,  0.0660]], grad_fn=<AddmmBackward>)


In [42]:
input.shape

torch.Size([1, 1, 32, 32])
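
The `16 * 6 * 6` input size of `fc1` follows from this 32x32 input. The sketch below traces the shapes through layers defined like `conv1` and `conv2` in `Net` (they are re-created here so the snippet runs on its own): each 3x3 convolution without padding shrinks the side by 2, and each 2x2 max pooling halves it, rounding down.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same layer definitions as conv1 and conv2 in Net.
conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)        # (batch, channels, height, width)
x = conv1(x)                          # 32 - 3 + 1 = 30
print(x.shape)                        # torch.Size([1, 6, 30, 30])
x = F.max_pool2d(F.relu(x), 2)        # 30 // 2 = 15
print(x.shape)                        # torch.Size([1, 6, 15, 15])
x = conv2(x)                          # 15 - 3 + 1 = 13
print(x.shape)                        # torch.Size([1, 16, 13, 13])
x = F.max_pool2d(F.relu(x), 2)        # 13 // 2 = 6
print(x.shape)                        # torch.Size([1, 16, 6, 6])

flat = x.view(1, -1)                  # 16 * 6 * 6 = 576 features
print(flat.shape)                     # torch.Size([1, 576])
```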

In [43]:
net.zero_grad()
out.backward(torch.randn(1, 10))

In [44]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(0.8719, grad_fn=<MseLossBackward>)


In [45]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear (fc3), recorded as Addmm
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # AccumulateGrad of a leaf tensor, per the output below


<MseLossBackward object at 0x7febc4571a58>
<AddmmBackward object at 0x7febc4571b00>
<AccumulateGrad object at 0x7febc4571a58>


In [46]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)


conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0336, -0.0101, -0.0126, -0.0177,  0.0230,  0.0273])
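
The call to `zero_grad()` matters because `backward()` adds to the existing `.grad` values rather than overwriting them. A minimal sketch with a single hypothetical tensor `w` (not part of `net`) makes the accumulation visible:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()        # gradient after one backward pass
print(first)                  # tensor([3., 3.])

(w * 3).sum().backward()
second = w.grad.clone()       # the second pass was added to the first
print(second)                 # tensor([6., 6.])

w.grad.zero_()                # reset the buffer before the next pass
print(w.grad)                 # tensor([0., 0.])
```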


In [47]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
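
The cell above performs a single update. A sketch of how those same four calls might sit inside a loop is shown below. To keep it short and self-contained it uses a small `nn.Linear` model and random data as hypothetical stand-ins for `net`, `input` and `target`; the names `model`, `inputs`, `targets` and `losses` are assumptions of this example.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)                      # reproducible toy data

# Hypothetical stand-ins for the network, input and target.
model = nn.Linear(4, 1)
inputs = torch.randn(16, 4)
targets = torch.randn(16, 1)

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss = criterion(model(inputs), targets)
    loss.backward()                       # compute gradients of the loss
    optimizer.step()                      # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])              # the loss should go down
```

In a real training run the fixed `inputs` and `targets` would be replaced by mini-batches drawn from a dataset.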
