# PyTorch Tutorial

PyTorch aims to replace NumPy for deep learning work. A replacement is needed because
 - NumPy doesn't have GPU support
 - NumPy doesn't have automatic differentiation (autodiff)
 - NumPy doesn't have utilities for distributing computation across devices (e.g., training a model on 100 GPUs simultaneously)
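
As a minimal sketch of the first two points, the snippet below places a tensor on a GPU when one is available and lets autodiff compute a gradient. The names `device` and `w` are illustrative, and the code falls back to the CPU when no GPU is visible.

```python
import torch

# Pick a GPU if one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created on (or moved to) a device explicitly.
w = torch.ones(2, 2, device=device, requires_grad=True)
loss = (3.0 * w).sum()

# Autodiff: d(loss)/dw is 3 for every entry of w.
loss.backward()
print(w.grad)
```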

## Array Creation
The common NumPy functions for array creation have PyTorch equivalents. In PyTorch, arrays are called *tensors*.

In [8]:
import torch
a = torch.empty(2, 3)
b = torch.ones(3, 2)
c = torch.zeros(3, 3)

print(c)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
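
A few more creation functions mirror their NumPy counterparts. The sketch below is illustrative rather than exhaustive. Note that `torch.empty` allocates memory without initializing it, so its contents are arbitrary until you write to them.

```python
import torch

# torch.empty allocates memory without initializing it, so its values are arbitrary.
a = torch.empty(2, 3)

# Counterparts of np.array, np.arange, np.linspace and np.eye.
from_list = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
steps = torch.arange(0, 10, 2)        # 0, 2, 4, 6, 8
grid = torch.linspace(0.0, 1.0, 5)    # 5 evenly spaced points in [0, 1]
identity = torch.eye(3)

print(a.shape, from_list.dtype, steps.dtype)
```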


## Interfacing with NumPy
Converting between NumPy arrays and PyTorch tensors is easy.

In [12]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
c = b.numpy()

a == c

array([ True,  True,  True,  True,  True])
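
One detail worth knowing: `torch.from_numpy` does not copy. The tensor and the array share memory, so an in-place change to one is visible in the other. The sketch below contrasts this with `torch.tensor`, which copies its input.

```python
import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)   # shares memory with a; dtype is float64, like the array

a[0] = 7.0                # modify the NumPy array in place
print(b[0])               # the tensor sees the change

c = torch.tensor(a)       # torch.tensor makes an independent copy
a[1] = -1.0
print(c[1])               # the copy is unaffected
```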

## Basic Math
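All of the typical mathematical operators are defined as expected. PyTorch is stricter about types than NumPy: several operations (e.g., `torch.dot` and `torch.mv`) require both operands to have the same dtype, and older releases performed no implicit casting at all, so it is safest to make dtypes match explicitly.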

In [21]:
x = torch.tensor([[2, 3], [4,5]], dtype=torch.float)
y = torch.ones(2, 2)
print(x)
print(y)
print(x + y)

tensor([[2., 3.],
        [4., 5.]])
tensor([[1., 1.],
        [1., 1.]])
tensor([[3., 4.],
        [5., 6.]])
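
The sketch below illustrates a dtype mismatch and the explicit cast that resolves it. It assumes a reasonably recent PyTorch release; the exact error message varies by version.

```python
import torch

x = torch.tensor([1, 2, 3])          # integer data defaults to int64
w = torch.tensor([0.5, 0.5, 0.5])    # floating-point data defaults to float32

try:
    torch.dot(x, w)                  # dot requires both vectors to share a dtype
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("dtype mismatch:", err)

# Casting explicitly makes the types agree.
result = torch.dot(x.float(), w)
print(result)
```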


## Automatic Differentiation
Let's build the simple neural network we derived: one hidden layer with a ReLU activation and a squared-error loss. Note that this example is missing a lot of small details (e.g., batching, iterative updating, regularization).

In [118]:
x = torch.tensor([5., 2., 10.])
y = torch.tensor([1.])

learning_rate = 1e-2

theta_1 = torch.rand(3, 3, requires_grad=True)
theta_2 = torch.rand(3, requires_grad=True)

print(theta_1)
print(theta_2)

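# Forward pass: hidden pre-activation, ReLU, scalar prediction, squared-error loss
z = torch.mv(theta_1, x)
h = torch.max(z, torch.zeros_like(z))  # element-wise max with 0, i.e. ReLU
y_hat = torch.dot(theta_2, h)
loss = 0.5 * (y - y_hat)**2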

print('Loss is: {}'.format(loss))

loss.backward()

with torch.no_grad():
    theta_1 -= learning_rate * theta_1.grad
    theta_2 -= learning_rate * theta_2.grad

print(theta_1)
print(theta_2)

# Forward pass again with the updated parameters
z = torch.mv(theta_1, x)
h = torch.max(z, torch.zeros_like(z))
y_hat = torch.dot(theta_2, h)
print(y_hat)
loss = 0.5 * (y - y_hat)**2

print('Loss is: {}'.format(loss))

tensor([[0.3624, 0.0351, 0.4195],
        [0.5525, 0.7437, 0.5118],
        [0.7857, 0.1524, 0.6084]], requires_grad=True)
tensor([0.9750, 0.8846, 0.2824], requires_grad=True)
Loss is: tensor([130.0210], grad_fn=<MulBackward>)
tensor([[-0.4238, -0.2794, -1.1527],
        [-0.1608,  0.4584, -0.9147],
        [ 0.5580,  0.0613,  0.1529]], requires_grad=True)
tensor([-0.0050, -0.6261, -1.3813], requires_grad=True)
tensor(-6.1360, grad_fn=<DotBackward>)
Loss is: tensor([25.4610], grad_fn=<MulBackward>)
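
To see what `loss.backward()` computes, the sketch below compares the autograd gradients with the chain-rule expressions for this network. The names `err`, `delta`, `grad_theta_1` and `grad_theta_2` are introduced here for illustration, and the seed is fixed only to make the run repeatable.

```python
import torch

torch.manual_seed(0)
x = torch.tensor([5., 2., 10.])
y = torch.tensor([1.])
theta_1 = torch.rand(3, 3, requires_grad=True)
theta_2 = torch.rand(3, requires_grad=True)

# Same forward pass as above; torch.relu(z) equals max(z, 0) element-wise.
z = torch.mv(theta_1, x)
h = torch.relu(z)
y_hat = torch.dot(theta_2, h)
loss = 0.5 * (y - y_hat) ** 2
loss.backward()

# Hand-derived gradients via the chain rule.
with torch.no_grad():
    err = y_hat - y                              # dL/dy_hat, shape (1,)
    grad_theta_2 = err * h                       # dL/dtheta_2, shape (3,)
    delta = err * theta_2 * (z > 0).float()      # dL/dz, shape (3,)
    grad_theta_1 = delta[:, None] * x[None, :]   # outer product, shape (3, 3)

print(torch.allclose(theta_2.grad, grad_theta_2))
print(torch.allclose(theta_1.grad, grad_theta_1))
```

Gradients accumulate in `.grad` across calls to `backward()`, so an iterative version of this update would need to reset them (for example with `theta_1.grad.zero_()`) before each new backward pass.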


## Neural Networks
PyTorch's `torch.nn` package provides modular components, such as layers and loss functions, that we can combine to build neural networks.

In [119]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense1 = nn.Linear(784, 200)
        self.dense2 = nn.Linear(200, 10)
    
    def forward(self, x):
        x = F.relu(self.dense1(x))
        return self.dense2(x)
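
A quick way to sanity-check a module like this is to pass a random batch through it and inspect the output shape and parameter count. The batch size of 4 below is arbitrary, and the class is repeated so that the snippet runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense1 = nn.Linear(784, 200)
        self.dense2 = nn.Linear(200, 10)

    def forward(self, x):
        x = F.relu(self.dense1(x))
        return self.dense2(x)

model = Net()
batch = torch.rand(4, 784)       # 4 random inputs with 784 features each
logits = model(batch)            # calling the module runs forward()
print(logits.shape)

# Weights and biases of both layers: 784*200 + 200 + 200*10 + 10
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```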

In [138]:
from load_data import load_data

train_features, test_features, train_targets, test_targets = load_data('mnist-multiclass')

x = torch.from_numpy(train_features.astype('float')).float()
y = torch.from_numpy(train_targets.astype('float')).long()

model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)
loss_fn = nn.CrossEntropyLoss()

for i in range(10):
    model.train()
    optimizer.zero_grad()
    y_hat = model(x)
    loss = loss_fn(y_hat, y)
    loss.backward()
    optimizer.step()
    print('Loss is {}'.format(loss.item()))

Loss is 2.368236541748047
Loss is 2.0589964389801025
Loss is 1.7804220914840698
Loss is 1.5231589078903198
Loss is 1.2872402667999268
Loss is 1.079508662223816
Loss is 0.9041410684585571
Loss is 0.7614465951919556
Loss is 0.6480257511138916
Loss is 0.5590713024139404
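
The loop above feeds the whole training set to the model at every step. A mini-batch variant might look like the sketch below, which uses `TensorDataset` and `DataLoader` from `torch.utils.data`. The random tensors are a hypothetical stand-in for the MNIST arrays returned by `load_data`, and `nn.Sequential` is used here as a compact equivalent of `Net`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Hypothetical stand-in for the MNIST arrays: 256 random inputs and labels.
x = torch.rand(256, 784)
y = torch.randint(0, 10, (256,))

model = nn.Sequential(nn.Linear(784, 200), nn.ReLU(), nn.Linear(200, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)
loss_fn = nn.CrossEntropyLoss()

# The loader shuffles the data and yields batches of 64 examples.
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

model.train()
for epoch in range(2):
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()
    print('Epoch {} last-batch loss is {}'.format(epoch, loss.item()))

# Evaluate without tracking gradients.
model.eval()
with torch.no_grad():
    accuracy = (model(x).argmax(dim=1) == y).float().mean().item()
print('Accuracy is {}'.format(accuracy))
```

Because the labels here are random, the accuracy is not meaningful. The point is only the structure of the batching and evaluation steps.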
