## Chapter 3: Deep Learning with PyTorch

No RL in this chapter! Just an intro to using PyTorch for deep learning. I'm excited, as I've seen a lot of positive things about PyTorch, and I'm interested to see how it compares to TensorFlow and Keras.

#### Tensors

In [1]:
import torch
import numpy as np

In [2]:
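# Create a 3x2 float tensor. This allocates memory without initialising it, so
# the contents are arbitrary (here they happen to be zeros) rather than random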
a = torch.FloatTensor(3, 2)
a

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
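The `torch.FloatTensor(3, 2)` constructor only allocates memory and does not initialise it, so the zeros above are luck rather than a guarantee. Here is a minimal sketch of the more explicit factory functions, all standard `torch` calls. It assumes the default dtype has not been changed from `float32`.

```python
import torch

# Uninitialised: contents are whatever was already in memory
e = torch.empty(3, 2)
# Explicitly zero-filled
z = torch.zeros(3, 2)
# Uniform random values in [0, 1)
r = torch.rand(3, 2)
# Standard-normal random values
g = torch.randn(3, 2)

for t in (e, z, r, g):
    print(t.shape, t.dtype)  # torch.Size([3, 2]) torch.float32
```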

In [3]:
# Tensors have methods, e.g. set all values to 0 with .zero_()
# Trailing underscore indicates an in-place method
a.zero_()

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
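To make the trailing-underscore convention concrete, this sketch compares the out-of-place `add` with its in-place twin `add_`:

```python
import torch

a = torch.zeros(3, 2)

# Out-of-place: returns a new tensor and leaves `a` untouched
b = a.add(1)
print(a.sum().item(), b.sum().item())  # 0.0 6.0

# In-place (trailing underscore): modifies `a` itself and returns it
c = a.add_(1)
print(a.sum().item(), c.data_ptr() == a.data_ptr())  # 6.0 True
```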

In [4]:
# Alternatively we can convert a NumPy array. NumPy defaults to 64-bit floats,
# which is generally overkill for DL, so we ask for float32 instead
n = np.zeros(shape=(3, 2))
torch.tensor(n, dtype=torch.float32)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
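`torch.tensor()` always copies its input. `torch.from_numpy()` is the alternative: it shares memory with the array and keeps the array's dtype. The sketch below shows the difference:

```python
import numpy as np
import torch

n = np.zeros(shape=(3, 2))
print(n.dtype)  # float64

# torch.tensor() copies the data (and converts the dtype here)
copied = torch.tensor(n, dtype=torch.float32)
# torch.from_numpy() shares memory with the array and keeps float64
shared = torch.from_numpy(n)
print(shared.dtype)  # torch.float64

# Changing the array is visible through `shared` but not `copied`
n[0, 0] = 5.0
print(copied[0, 0].item(), shared[0, 0].item())  # 0.0 5.0

# .float() (shorthand for .to(torch.float32)) gives a float32 copy
as_f32 = shared.float()
```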

#### Gradient calculations

In [6]:
# If we want auto-calculated gradients, we have to explicitly say so.
# Any tensor computed from one with requires_grad=True then also requires grad
v1 = torch.tensor([1.0, 1.0], requires_grad=True)
v2 = torch.tensor([2.0, 2.0])

v_sum = v1 + v2
v_res = (v_sum*2).sum()
v_res

tensor(12., grad_fn=<SumBackward0>)
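Writing the computation out by hand confirms both the value above and the gradient we should expect for `v1`:

```latex
v_{\text{res}} = \sum_{i=1}^{2} 2\,(v_{1,i} + v_{2,i}) = 2\,\big[(1+2) + (1+2)\big] = 12,
\qquad
\frac{\partial v_{\text{res}}}{\partial v_{1,i}} = 2 \quad \text{for } i = 1, 2.
```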

In [7]:
# Note how the is_leaf and requires_grad attributes change. is_leaf means the tensor was
# created directly by the user rather than as the result of an operation on other tensors
print(v1.is_leaf, v2.is_leaf, v_sum.is_leaf, v_res.is_leaf)
print(v1.requires_grad, v2.requires_grad, v_sum.requires_grad, v_res.requires_grad)

True True False False
True False True True
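So `requires_grad` propagates forward through every operation. Tensors that don't require grad always count as leaves by convention, which is why `v2.is_leaf` is `True`. The sketch below shows the propagation and `.detach()`. That method is the standard way to get a tensor that shares the same data but is cut off from the graph:

```python
import torch

v1 = torch.tensor([1.0, 1.0], requires_grad=True)
v2 = torch.tensor([2.0, 2.0])

# A result computed from a tensor that requires grad also requires grad,
# and records the operation that produced it in .grad_fn
v_sum = v1 + v2
print(v_sum.requires_grad, v_sum.grad_fn is not None)  # True True

# .detach() gives a tensor outside the graph, so it is a leaf again
d = v_sum.detach()
print(d.requires_grad, d.is_leaf)  # False True
```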


In [8]:
# Tell PyTorch to calculate gradients: .backward() computes exact derivatives by reverse-mode
# automatic differentiation through the recorded graph (not numerical approximations)
v_res.backward()
v1.grad

tensor([2., 2.])
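Before writing training loops, note that `.backward()` *adds* into `.grad` rather than overwriting it. This is why gradient zeroing is a standard step in optimisation loops. A minimal sketch that runs the same graph twice:

```python
import torch

v1 = torch.tensor([1.0, 1.0], requires_grad=True)
v2 = torch.tensor([2.0, 2.0])

for _ in range(2):
    # Rebuild the graph each time; by default it is freed after backward()
    v_res = ((v1 + v2) * 2).sum()
    v_res.backward()

accumulated = v1.grad.clone()
print(accumulated)  # tensor([4., 4.])  (2 + 2: gradients accumulated)

# Reset before the next computation
v1.grad.zero_()
print(v1.grad)  # tensor([0., 0.])
```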

In [10]:
# We don't get gradients for anything that didn't require them (i.e. that we didn't
# explicitly mark with requires_grad=True), so this returns None and nothing is displayed
v2.grad
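The sketch below checks that `None` explicitly. It also shows `torch.no_grad()`, which switches off graph recording entirely. That is useful for inference, where no gradients are needed:

```python
import torch

v1 = torch.tensor([1.0, 1.0], requires_grad=True)
v2 = torch.tensor([2.0, 2.0])
((v1 + v2) * 2).sum().backward()

# v2 never asked for gradients, so nothing was stored for it
print(v2.grad)  # None

# Inside torch.no_grad() no graph is recorded, even for v1
with torch.no_grad():
    y = v1 * 2
print(y.requires_grad)  # False
```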

#### NN building blocks

There are loads of pre-implemented building blocks (layers, activation functions, loss functions and so on) in the `torch.nn` package.

In [12]:
import torch.nn as nn

# Randomly initialised fully connected (linear) layer with 2 inputs and 5 outputs
L = nn.Linear(2, 5)
v = torch.FloatTensor([1, 2])
L(v)

tensor([ 0.4857,  1.2639, -1.6472, -0.4163, -0.3356], grad_fn=<AddBackward0>)
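Under the hood, `nn.Linear(2, 5)` holds a weight matrix of shape (5, 2) and a bias of shape (5,), and computes x Wᵀ + b. This sketch recomputes the output by hand from the layer's own parameters. The seed is arbitrary and only makes reruns repeatable:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only so reruns match
L = nn.Linear(2, 5)
v = torch.tensor([1.0, 2.0])

print(L.weight.shape, L.bias.shape)  # torch.Size([5, 2]) torch.Size([5])

# Same affine map as L(v): v times the transposed weights, plus the bias
manual = v @ L.weight.T + L.bias
print(torch.allclose(L(v), manual))  # True
```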

In [13]:
# nn.Sequential chains modules so each one's output feeds the next, which is handy for simple multilayer networks
s = nn.Sequential(
    nn.Linear(2, 5),
    nn.ReLU(),
    nn.Linear(5, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
    nn.Dropout(p=0.3),
    nn.Softmax(dim=1)
)
s

Sequential(
  (0): Linear(in_features=2, out_features=5, bias=True)
  (1): ReLU()
  (2): Linear(in_features=5, out_features=20, bias=True)
  (3): ReLU()
  (4): Linear(in_features=20, out_features=10, bias=True)
  (5): Dropout(p=0.3, inplace=False)
  (6): Softmax(dim=1)
)
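Each `Linear(in, out)` contributes `in*out + out` parameters (weights plus biases), while ReLU, Dropout and Softmax have none. The network above should therefore have (2·5 + 5) + (5·20 + 20) + (20·10 + 10) = 15 + 120 + 210 = 345 trainable parameters. Here is a quick check over `s.parameters()`:

```python
import torch.nn as nn

s = nn.Sequential(
    nn.Linear(2, 5), nn.ReLU(),
    nn.Linear(5, 20), nn.ReLU(),
    nn.Linear(20, 10),
    nn.Dropout(p=0.3),
    nn.Softmax(dim=1),
)

# numel() is the number of elements in each weight/bias tensor
n_params = sum(p.numel() for p in s.parameters())
print(n_params)  # 345
```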

In [16]:
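# Pushing a tensor through it, just to prove it works
# NOTE: the input must be a 2D batch, written as nested lists [[row0], [row1], ...],
# because Softmax(dim=1) normalises along the second dimension. A one-row batch like this
# gives a (1, 10) output; the output shown below has two rows, so it came from a two-row batch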
s(torch.FloatTensor([[1, 2]]))

tensor([[0.1262, 0.0804, 0.1029, 0.0821, 0.0890, 0.1116, 0.0890, 0.0675, 0.1323,
         0.1189],
        [0.1299, 0.0917, 0.0917, 0.0845, 0.0734, 0.1149, 0.0945, 0.0917, 0.1362,
         0.0917]], grad_fn=<SoftmaxBackward>)
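Each output row is a probability distribution over 10 values. The repeated numbers within a row (e.g. 0.0917) most likely come from Dropout. Freshly created modules are in training mode, so some logits are zeroed, and zeroed positions all receive the same softmax value. The sketch below switches the network to evaluation mode with `.eval()`, which disables Dropout and makes the output deterministic. The two-row input batch and the seed are hypothetical values chosen for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for reproducibility
s = nn.Sequential(
    nn.Linear(2, 5), nn.ReLU(),
    nn.Linear(5, 20), nn.ReLU(),
    nn.Linear(20, 10),
    nn.Dropout(p=0.3),
    nn.Softmax(dim=1),
)
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # hypothetical two-row batch

print(s.training)  # True: new modules start in training mode

s.eval()  # puts every submodule in eval mode, which disables Dropout
with torch.no_grad():
    out1 = s(x)
    out2 = s(x)

print(out1.shape)               # torch.Size([2, 10])
print(out1.sum(dim=1))          # each row sums to 1 (up to float rounding)
print(torch.equal(out1, out2))  # True: no randomness in eval mode
```

Calling `s.train()` switches Dropout back on before further training.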