# Introduction to PyTorch

This notebook provides a gentle introduction to PyTorch. We will explore tensor creation, network definition, and learning.

In [1]:
import sys
# Uncomment the following if not running in conda
#!{sys.executable} -m pip install torch==1.0.1 seaborn==0.9.0 matplotlib numpy
!conda install pytorch==1.0.1 seaborn==0.9.0 matplotlib numpy -y > /dev/null
import torch







Let's start by exploring different ways of creating tensors using PyTorch:

In [2]:
# Creating tensors using the PyTorch constructor
# (the memory is uninitialized, so the first print shows arbitrary values)
data = torch.FloatTensor(3, 3)
print(data)
data.zero_()                     # In place
print(data)

# Creating tensors from numpy arrays
import numpy as np
np_data = np.random.random((3,3)).astype(np.float32)
data = torch.tensor(np_data)
print(data)

# Scalar tensors (0-dimensional tensors, a.k.a. a point)
scalar = torch.tensor(3.3)
print(scalar)

# We can get a scalar by reducing a tensor, e.g. summing its elements
scalar = data.sum()
print(scalar)
# and access the scalar value by using the special member function item()
print(scalar.item())

tensor([[ 4.3013e-08,  3.0674e-41,  4.3066e+21],
        [ 1.1824e+22,  4.3066e+21,  6.3828e+28],
        [ 3.8016e-39, -3.8913e-37,  6.7262e-44]])
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
tensor([[0.1178, 0.7866, 0.2842],
        [0.1321, 0.4596, 0.5513],
        [0.0543, 0.6711, 0.1467]])
tensor(3.3000)
tensor(3.2037)
3.2037463188171387
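
One detail worth knowing when creating tensors from numpy arrays: `torch.tensor` copies the data, whereas `torch.from_numpy` returns a tensor that shares memory with the array. The following is a minimal sketch of the difference; the variable names `copied` and `shared` are chosen here for illustration only.

```python
import numpy as np
import torch

np_data = np.zeros((2, 2), dtype=np.float32)

copied = torch.tensor(np_data)      # makes an independent copy of the data
shared = torch.from_numpy(np_data)  # shares memory with the numpy array

# Modify the numpy array in place after both tensors were created
np_data[0, 0] = 1.0

print(copied[0, 0].item())  # the copy does not see the change
print(shared[0, 0].item())  # the shared tensor does
```

Sharing memory avoids a copy for large arrays, at the cost of in-place changes being visible on both sides.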


## GPU tensors

Tensors can be backed by a CPU or a GPU device. CPU tensors reside in the computer's main memory, whereas GPU tensors reside in GPU memory. GPU support is restricted to CUDA-compatible devices.

In [None]:
# Create a CPU tensor
data = torch.FloatTensor(3, 3)
print(data.device)

# Create a GPU tensor. Note: this only works on a computer with a CUDA-compatible GPU,
# so we guard it with a runtime check
if torch.cuda.is_available():
    cu_data = torch.cuda.FloatTensor(3, 3)

    # Convert a CPU tensor to a GPU tensor
    cu_data = data.cuda()
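
A common way to write code that runs on machines with or without a GPU is to pick a device once and pass it around. The sketch below assumes nothing beyond `torch` itself; the names `device` and `moved` are illustrative.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device
data = torch.zeros(3, 3, device=device)

# .to() moves (or copies) a tensor to another device; moving to "cpu" always works
moved = data.to("cpu")

print(data.device)
print(moved.device)
```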

## Autograd

We are going to define the following expression:

$$
\begin{aligned}
a &= \langle a_0, a_1 \rangle \\
b &= \langle b_0, b_1 \rangle \\
v_{prod} &= \langle 2 (a_0 + b_0),\ 2 (a_1 + b_1) \rangle \\
v_{res} &= v_{prod,0} + v_{prod,1} \\
&= 2 (a_0 + b_0) + 2 (a_1 + b_1)
\end{aligned}
$$

and we will display the gradients with respect to the components of $a$ and $b$, which by simple calculus should be:

$$
\frac{\partial v_{res}}{\partial a_0} = 2, 
\frac{\partial v_{res}}{\partial a_1} = 2,
\frac{\partial v_{res}}{\partial b_0} = 2, 
\frac{\partial v_{res}}{\partial b_1} = 2
$$

This is all made very easy by the autograd (automatic gradient calculation) feature of PyTorch:

In [3]:
a = torch.tensor([1, 2], dtype=torch.float32, requires_grad=True)
b = torch.tensor([2, 2], dtype=torch.float32, requires_grad=True)
v_res = ((a + b) * 2).sum()
v_res.backward()
print(a.grad)
print(b.grad)

tensor([2., 2.])
tensor([2., 2.])
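
As a sanity check that does not depend on autograd, the same gradients can be approximated with central finite differences in plain Python. This is only a sketch: the helper `numerical_gradient` and the step size `eps` are assumptions made for illustration, not part of PyTorch.

```python
def v_res(a, b):
    # Same expression as above: sum over components of 2 * (a_i + b_i)
    return sum(2.0 * (a_i + b_i) for a_i, b_i in zip(a, b))

def numerical_gradient(f, values, eps=1e-4):
    # Central difference: (f(x + eps) - f(x - eps)) / (2 * eps), one component at a time
    grads = []
    for i in range(len(values)):
        up = list(values)
        down = list(values)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

a = [1.0, 2.0]
b = [2.0, 2.0]

grad_a = numerical_gradient(lambda x: v_res(x, b), a)
grad_b = numerical_gradient(lambda x: v_res(a, x), b)

# Both should be close to [2, 2], matching a.grad and b.grad
print(grad_a)
print(grad_b)
```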


## Building Neural Networks

This is the bread and butter of day-to-day PyTorch usage. Let's look at the building blocks first: a densely connected (linear) layer with three output units, followed by ReLU (Rectified Linear Unit) as the non-linear activation function:

In [4]:
import torch.nn as nn

# Linear combination operator
linear = nn.Linear(in_features=2, out_features=3, bias=True)
print(linear.state_dict())
# Apply operator to some data
data = torch.tensor([1, 2], dtype=torch.float32)
res = linear(data)
print(res)

# ReLU operator
relu = nn.ReLU()
output = relu(res)
print(output)

OrderedDict([('weight', tensor([[-0.2584, -0.6231],
        [ 0.4604,  0.5747],
        [ 0.4584,  0.5001]])), ('bias', tensor([-0.0767, -0.2505,  0.4687]))])
tensor([-1.5814,  1.3593,  1.9273], grad_fn=<AddBackward0>)
tensor([0.0000, 1.3593, 1.9273], grad_fn=<ThresholdBackward0>)
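
To make the two operators concrete, the printed output can be reproduced by hand. The sketch below uses the weights and bias exactly as printed above (rounded to four decimals, so the result matches only up to rounding) and computes $Wx + b$ followed by ReLU in plain Python.

```python
# Values copied from the printed state_dict above (rounded to 4 decimals)
weight = [[-0.2584, -0.6231],
          [0.4604, 0.5747],
          [0.4584, 0.5001]]
bias = [-0.0767, -0.2505, 0.4687]
x = [1.0, 2.0]

# Linear layer: one dot product plus a bias term per output unit
linear_out = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
              for row, b_i in zip(weight, bias)]

# ReLU: negative values are clamped to zero, positive values pass through
relu_out = [max(0.0, v) for v in linear_out]

print(linear_out)
print(relu_out)
```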


We can combine multiple operators sequentially to build multi-layer networks in a simple manner, using the `Sequential` class. For example, the following code defines a network with an input of 2 features, one hidden layer with 3 units, and an output layer with a single unit (_Note: a hidden layer is normally considered to be composed of a linear operation followed by a non-linear activation function_):

In [5]:
net = nn.Sequential(
    nn.Linear(2, 3),
    nn.ReLU(),
    nn.Linear(3, 1)
)

print(net)

res = net(data)
# backward() can be called without arguments because res holds a single element
res.backward()
print(res)


Sequential(
  (0): Linear(in_features=2, out_features=3, bias=True)
  (1): ReLU()
  (2): Linear(in_features=3, out_features=1, bias=True)
)
tensor([-0.4978], grad_fn=<AddBackward0>)
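
After `backward()`, each parameter of the network holds a gradient of the same shape as the parameter itself, which is what an optimizer would later use for learning. The following sketch rebuilds the same network and inspects those gradients; the names `grad_shapes` and `n_params` are illustrative.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(2, 3),
    nn.ReLU(),
    nn.Linear(3, 1)
)
data = torch.tensor([1, 2], dtype=torch.float32)

# Forward pass followed by backpropagation from the single output value
net(data).backward()

# Parameters are named "<layer index>.<weight|bias>" inside a Sequential
grad_shapes = {name: tuple(p.grad.shape) for name, p in net.named_parameters()}
for name, shape in grad_shapes.items():
    print(name, shape)

# Total number of trainable values: (2*3 + 3) + (3*1 + 1)
n_params = sum(p.numel() for p in net.parameters())
print(n_params)
```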
