# PyTorch tutorial

## What is PyTorch?

It is a Python-based scientific computing package targeted at two
audiences:

-  Users who want a replacement for NumPy that can use the power of GPUs
-  Researchers who need a deep learning platform that provides maximum
   flexibility and speed

## Basic operators

This section introduces basic concepts about tensors and the operators between them. The full list of available functionality can be found in the [PyTorch documentation](https://pytorch.org/docs/stable/index.html).




In [6]:
import torch
seed_num = 1234
torch.manual_seed(seed_num)

<torch._C.Generator at 0x10f531330>

In [7]:
torch.rand([2, 3])

tensor([[0.0290, 0.4019, 0.2598],
        [0.3666, 0.0583, 0.7006]])

In [8]:
torch.zeros([4, 5])

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [9]:
torch.randint(9, [2,5])

tensor([[6, 6, 7, 3, 3],
        [1, 7, 4, 0, 6]])

In [10]:
x = torch.tensor([[3.] * 4] * 2)
print(x)

tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.]])


In [18]:
y = 2*torch.ones([2, 4])
print(y)

tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.]])


In [19]:
x - y

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [20]:
x + y

tensor([[5., 5., 5., 5.],
        [5., 5., 5., 5.]])

In [21]:
x * y

tensor([[6., 6., 6., 6.],
        [6., 6., 6., 6.]])

In [22]:
x/y

tensor([[1.5000, 1.5000, 1.5000, 1.5000],
        [1.5000, 1.5000, 1.5000, 1.5000]])

In [40]:
y.t()

tensor([[2., 2.],
        [2., 2.],
        [2., 2.],
        [2., 2.]])

In [39]:
torch.mm(x, y.t())

tensor([[24., 24.],
        [24., 24.]])

## NumPy Bridge

Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array share their underlying memory
locations (if the Torch Tensor is on the CPU), so changing one changes the other.

In [46]:
a = torch.ones(5)
print(a)

tensor([1., 1., 1., 1., 1.])


In [47]:
b = a.numpy()
print(b)

[1. 1. 1. 1. 1.]


In [48]:
a.add_(1)
print("Tensor form: ", a)
print("Numpy form: ", b)

Tensor form:  tensor([2., 2., 2., 2., 2.])
Numpy form:  [2. 2. 2. 2. 2.]


In [50]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print("Numpy form: ", a)
print("Tensor form: ", b)

Numpy form:  [2. 2. 2. 2. 2.]
Tensor form:  tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
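
Because of this memory sharing, an in-place change on one side is visible on the other. When an independent copy is wanted instead, one option is ``torch.tensor``, which copies the data. The sketch below contrasts the two; the names ``shared`` and ``copied`` are illustrative and do not appear in the cells above.

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)  # shares memory with the NumPy array
copied = torch.tensor(a)      # copies the data into a new tensor

a += 1                        # in-place update of the NumPy array

print("shared:", shared)      # follows the change made to a
print("copied:", copied)      # keeps the original values
```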


## CUDA Tensors
Tensors can be moved onto any device using the ``.to`` method.

In [51]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
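
The cell above does nothing on a machine without a GPU. A common device-agnostic variant, sketched here on the assumption that falling back to the CPU is acceptable, picks the device once and reuses it, so the same code runs in both cases.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 4, device=device)  # create the tensor directly on the chosen device
y = torch.ones_like(x)               # ones_like inherits device and dtype from x
z = (x + y).to("cpu")                # move the result back to the CPU

print(device, z)
```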


## Autograd: Automatic Differentiation

Central to all neural networks in PyTorch is the ``autograd`` package.
We first take a brief look at it and then move on to training our
first neural network.


The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.

Let us see this in simpler terms with some examples.
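
As a first illustration of define-by-run, the sketch below uses a loop whose length depends on the data, so the recorded graph is only known once the code has run. The function name ``scale_until_large`` and the threshold of 100 are assumptions chosen for this example.

```python
import torch

def scale_until_large(x):
    # The number of iterations depends on the values in x, so the graph
    # that autograd records can differ from one call to the next.
    y = x * 2
    steps = 1
    while y.norm() < 100:
        y = y * 2
        steps += 1
    return y.sum(), steps

x = torch.ones(3, requires_grad=True)
out, steps = scale_until_large(x)
out.backward()

# Each step doubles x, so the gradient of the sum is 2 ** steps per element
print(steps, x.grad)
```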

### Tensor

``torch.Tensor`` is the central class of the package. If you set its attribute
``.requires_grad`` to ``True``, autograd starts to track all operations on it. When
you finish your computation you can call ``.backward()`` and have all the
gradients computed automatically. The gradient for this tensor will be
accumulated into the ``.grad`` attribute.
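
"Accumulated" means that repeated ``.backward()`` calls add to ``.grad`` rather than overwrite it. A minimal sketch, with the illustrative names ``w``, ``first`` and ``second``:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()    # gradient after one backward pass

(w * 3).sum().backward()  # the second pass adds to the existing .grad
second = w.grad.clone()

w.grad.zero_()            # reset before the next independent computation

print(first, second, w.grad)
```

This is why training loops usually zero the gradients once per iteration.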

To stop a tensor from tracking history, you can call ``.detach()`` to detach
it from the computation history, and to prevent future computation from being
tracked.

To prevent tracking history (and using memory), you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
model because the model may have trainable parameters with ``requires_grad=True``,
but for which we don't need the gradients.
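
The two mechanisms can be compared side by side. The sketch below uses illustrative names (``y_detached``, ``z``) and checks the ``requires_grad`` flag of each result.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

y = x * 2                # tracked: part of the computation graph
y_detached = y.detach()  # same values, but cut off from the graph

with torch.no_grad():
    z = x * 2            # computed without recording any history

print(y.requires_grad, y_detached.requires_grad, z.requires_grad)
```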

There’s one more class which is very important for autograd
implementation - a ``Function``.

``Tensor`` and ``Function`` are interconnected and build up an acyclic
graph that encodes a complete history of computation. Each tensor has
a ``.grad_fn`` attribute that references a ``Function`` that has created
the ``Tensor`` (except for Tensors created by the user - their
``grad_fn is None``).
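
A short sketch of the difference between a user-created tensor and the result of an operation (the names ``a`` and ``b`` are illustrative):

```python
import torch

a = torch.ones(2, requires_grad=True)  # created by the user
b = a + 1                              # created by an operation

print(a.grad_fn)             # None for a user-created tensor
print(b.grad_fn)             # the Function that produced b
print(a.is_leaf, b.is_leaf)  # user-created tensors are the leaves of the graph
```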

If you want to compute the derivatives, you can call ``.backward()`` on
a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single
element), you don’t need to specify any arguments to ``backward()``.
If it has more elements, however, you need to specify a ``gradient``
argument that is a tensor of matching shape.
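
For the non-scalar case, the ``gradient`` argument weights each output element before the derivatives are propagated back. A minimal sketch, where the ``weights`` values are arbitrary and chosen only for illustration:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x  # non-scalar output, dy_i/dx_i = 2 * x_i

# One weight per element of y, same shape as y
weights = torch.tensor([1.0, 0.1, 0.01])
y.backward(gradient=weights)

# x.grad holds weights * 2 * x
print(x.grad)
```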



In [52]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [53]:
def forward(x):
    y = x + 2
    print("y = ", y)
    z = y * y * 3
    print("z = ", z)
    return z.mean()

In [54]:
out = forward(x)
print("out = ", out)

y =  tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
z =  tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
out =  tensor(27., grad_fn=<MeanBackward1>)


In [55]:
out.backward()
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


You should get a matrix filled with ``4.5``. Let’s call the ``out``
*Tensor* “$o$”.  
We have that $o = \frac{1}{4}\sum_i z_i$,
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.  
Therefore,
$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$,  
hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
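
The closed-form derivative above can be checked against autograd. The sketch below recomputes the same expression and compares ``x.grad`` with $\frac{3}{2}(x_i+2)$; the name ``expected`` is introduced only for this check.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()  # same computation as forward(x), without the prints
out.backward()

# Analytic gradient: d(out)/dx_i = (3 / 2) * (x_i + 2)
expected = 1.5 * (x.detach() + 2)

print(torch.allclose(x.grad, expected))
```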

## Implementing a simple neural network
### Generate random data and parameters

In [60]:
import torch.nn as nn
# Input size, hidden layer size, output size, number of samples
n_in, n_h, n_out, batch_size = 10, 5, 1, 10

In [61]:
x = torch.randn(batch_size, n_in)
y = torch.tensor([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]])

### Construct a model

In [63]:
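# Two-layer network: linear -> ReLU -> linear -> sigmoid
model = nn.Sequential(nn.Linear(n_in, n_h),
                      nn.ReLU(),
                      nn.Linear(n_h, n_out),
                      nn.Sigmoid())
# Mean squared error between predictions and targets
criterion = nn.MSELoss()
# Stochastic gradient descent over all model parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)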

### Train the model

In [64]:
for epoch in range(10):
    # Forward Propagation
    y_pred = model(x)
    # Compute and print loss
    loss = criterion(y_pred, y)
    print('epoch: ', epoch,' loss: ', loss.item())
    # Zero the gradients
    optimizer.zero_grad()

    # Perform a backward pass (backpropagation)
    loss.backward()

    # Update the parameters
    optimizer.step()

epoch:  0  loss:  0.24630241096019745
epoch:  1  loss:  0.2461433708667755
epoch:  2  loss:  0.2459845244884491
epoch:  3  loss:  0.2458258718252182
epoch:  4  loss:  0.24566738307476044
epoch:  5  loss:  0.2455090880393982
epoch:  6  loss:  0.24535097181797028
epoch:  7  loss:  0.24519303441047668
epoch:  8  loss:  0.24503527581691742
epoch:  9  loss:  0.24487769603729248
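
Once training is done, predictions can be computed inside ``torch.no_grad()`` so that no history is tracked. The following is a self-contained sketch, not part of the original notebook: it rebuilds an untrained model with the same layout, and the 0.5 threshold used to turn sigmoid outputs into labels is an assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layout as the model above, rebuilt here so the sketch runs on its own
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1), nn.Sigmoid())
x = torch.randn(10, 10)

model.eval()                        # switch layers to evaluation behaviour
with torch.no_grad():               # no graph is recorded inside this block
    probs = model(x)                # sigmoid outputs in [0, 1]
    labels = (probs > 0.5).float()  # assumed decision threshold of 0.5

print(probs.requires_grad, labels.squeeze())
```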
