# Plan for today:
- PyTorch installation
- PyTorch basics
- Data loading and simple models in PyTorch (linear regression, images of digits)

# Install pytorch and torchvision
Installation instructions: https://pytorch.org/  
e.g. with conda:  
`conda install pytorch torchvision -c pytorch`  
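or with pip (the wheel below is the CPU-only build of version 1.0.1 for Python 3.6 on Linux x86_64; pick the one matching your setup):  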
`pip3 install https://download.pytorch.org/whl/cpu/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl`  
`pip3 install torchvision`  
  
The GPU (CUDA) version needs different commands; all of them are listed on the PyTorch website.

# PyTorch resources
- https://pytorch.org/tutorials/
- https://github.com/yunjey/pytorch-tutorial
- http://cs231n.stanford.edu/

# PyTorch basics

## Tensors

In [None]:
import torch

The basic data type in PyTorch is the tensor, whose syntax is very similar to a NumPy array: https://pytorch.org/docs/stable/tensors.html  
**A `torch.Tensor` is a multi-dimensional matrix containing elements of a single data type.**

In [None]:
data = torch.tensor([1, 2, 3])
print(data)
print(data.size())
print(data.dtype)

In [None]:
data = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)
print(data)
print(data.size())

In [None]:
torch.zeros((3, 4))

In [None]:
torch.ones((3, 4))

In [None]:
# Random numbers
torch.rand(3, 5)
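
Because the syntax is close to NumPy, indexing, slicing and reductions behave much as one would expect. A minimal sketch (the variable names are chosen only for illustration):

```python
import torch

data = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)

# Indexing and slicing follow NumPy conventions
first_row = data[0]          # first row of the matrix
last_column = data[:, -1]    # last element of every row

# Reshape to 3 rows and 2 columns (the number of elements must stay the same)
reshaped = data.reshape(3, 2)

# Reduce along dimension 0: one sum per column
column_sums = data.sum(dim=0)

print(first_row, last_column)
print(reshaped.size())
print(column_sums)
```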

## CPU vs GPU (CUDA)
Tensors can live on the CPU or, if torch is installed with Nvidia CUDA support, on a GPU. *For large tensors, CUDA computation is often much faster!*

In [None]:
torch.cuda.is_available()

To make code compatible with both GPU and CPU, you can set the device at the beginning of your code and always transfer your tensors and models to this device.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
print(device)

In [None]:
a = torch.tensor(5, device=device)
print(a)

In [None]:
a = a.to(device)
print(a)

In [None]:
a = a.to(torch.device('cpu'))
print(a)
print(a.device)
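
The same pattern applies to models. The sketch below assumes a tiny `torch.nn.Linear` layer purely for illustration; it is moved to the chosen device, and the input is created on that device so both live in the same place:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# A tiny linear model, used here only as an illustrative stand-in
model = torch.nn.Linear(4, 1).to(device)

# Create the input directly on the same device as the model
x = torch.rand(3, 4, device=device)
y = model(x)

print(y.size())
print(y.device)
```

Mixing devices (for example a CPU tensor fed to a model on the GPU) raises an error, which is why setting `device` once and using it everywhere is convenient.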

## Computation

In [None]:
a = torch.tensor(5.0)
b = torch.tensor(2.0)
print(a, b)

In [None]:
c = a + b
print(c)

In [None]:
c = a - b
print(c)

In [None]:
c = a * b
print(c)

In [None]:
c = a / b
print(c)

In [None]:
# Matrix computations
a = torch.rand(3, 4)
b = torch.rand(4, 1)
print(a)
print(b)

In [None]:
c = a.matmul(b)
print(c)

In [None]:
print(c + torch.rand(3, 1))
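
For a matrix product the inner dimensions must match: a (3, 4) matrix times a (4, 1) matrix gives a (3, 1) result. The sketch below checks the shapes, shows the `@` operator as a shorthand for `matmul`, and adds a broadcasting example (the tensor `row` is introduced only for illustration):

```python
import torch

a = torch.rand(3, 4)
b = torch.rand(4, 1)

c = a.matmul(b)   # shape (3, 1)
c_op = a @ b      # same product written with the @ operator

print(c.size())
print(torch.allclose(c, c_op))

# Element-wise addition broadcasts: (3, 1) + (1, 4) -> (3, 4)
row = torch.ones(1, 4)
print((c + row).size())
```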

# PyTorch computational graph
An important property of PyTorch tensor operations is that they create a **dynamic computational graph**.  
The tensor **c** knows how it was created from **a** and **b**.  
Thanks to this, derivatives (gradients) can be computed for tensors.  
Example:  
<img src="tree-eval.png" alt="Drawing" style="width: 600px;"/>  
*Source: http://colah.github.io/posts/2015-08-Backprop/ - good blog post to understand backpropagation*
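
"Dynamic" means the graph is rebuilt every time the code runs, so ordinary Python control flow can change which operations are recorded. A small sketch, with a hypothetical function `f` chosen only for illustration:

```python
import torch

def f(x):
    # Plain Python control flow decides which operations are recorded
    if x > 0:
        return x * x
    return -3 * x

x_pos = torch.tensor(2.0, requires_grad=True)
f(x_pos).backward()
print(x_pos.grad)  # gradient of the x * x branch at x = 2

x_neg = torch.tensor(-1.0, requires_grad=True)
f(x_neg).backward()
print(x_neg.grad)  # gradient of the -3 * x branch
```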

### Compute for a=2, b=1

In [None]:
# Create a and b
# Compute c, d, e
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
c = a + b
d = b + 1
e = c * d
print(a, b, c, d, e)

In [None]:
# grad_fn stores information about computation
print(c.grad_fn)
print(e.grad_fn)

### Now we can compute the derivatives of $e$ w.r.t. $a, b, c, d$
<img src="tree-eval-derivs.png" alt="Drawing" style="width: 600px;"/>  

\begin{equation} 
e = c * d
\end{equation}
\begin{equation} 
\frac {\partial e} {\partial c} = d = 2
\end{equation}
\begin{equation} 
\frac {\partial e} {\partial d} = c = 3
\end{equation}
\begin{equation} 
\frac {\partial e} {\partial a} = \frac {\partial e} {\partial c} \frac {\partial c} {\partial a} = 2 * 1 = 2
\end{equation}
\begin{equation} 
\frac {\partial e} {\partial b} = \frac {\partial e} {\partial c} \frac {\partial c} {\partial b} + \frac {\partial e} {\partial d} \frac {\partial d} {\partial b} = 2 * 1 + 3 * 1 = 5
\end{equation}
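
As a sanity check on the values derived above, the derivatives can be approximated with central finite differences in plain Python. This is only a numerical cross-check, not how PyTorch computes gradients; the helper `e_of` and the step size `h` are assumptions made for this sketch:

```python
def e_of(a, b):
    # Same computation as in the graph: c = a + b, d = b + 1, e = c * d
    c = a + b
    d = b + 1
    return c * d

h = 1e-4            # small step for the finite difference
a0, b0 = 2.0, 1.0   # point at which the derivatives are evaluated

de_da = (e_of(a0 + h, b0) - e_of(a0 - h, b0)) / (2 * h)
de_db = (e_of(a0, b0 + h) - e_of(a0, b0 - h)) / (2 * h)

# Should be close to the analytic values 2 and 5
print(de_da, de_db)
```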

Compute gradients with `torch.autograd.grad` https://pytorch.org/docs/stable/autograd.html

In [None]:
grad_a, grad_b, grad_c, grad_d = torch.autograd.grad(e, [a, b, c, d])

In [None]:
print(grad_a, grad_b, grad_c, grad_d)

In [None]:
# Another way: call the `backward` method, which stores the gradients in the `.grad` attribute of leaf tensors

In [None]:
# Create a and b
# Compute c, d, e
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
c = a + b
d = b + 1
e = c * d
print(a, b, c, d, e)

In [None]:
e.backward()

In [None]:
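# a and b are leaf tensors, so their gradients are stored in .grad;
# c and d are not leaf tensors, so c.grad and d.grad are None
# (recent PyTorch versions also print a warning when these are accessed)
print(a.grad, b.grad, c.grad, d.grad)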
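
If the gradients of intermediate (non-leaf) tensors are needed together with `backward`, one option is to call `retain_grad()` on them before the backward pass. A minimal sketch that reuses the same graph:

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
c = a + b
d = b + 1

# Ask autograd to keep the gradients of these non-leaf tensors
c.retain_grad()
d.retain_grad()

e = c * d
e.backward()

# All four gradients are now populated, matching the derivation above
print(a.grad, b.grad, c.grad, d.grad)
```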