# **CIS 520: Machine Learning, Fall 2020**
# **Week 5, Worksheet 1**
## PyTorch Autograd Basics


- **Content Creators:** Mihir Parmar, Tejas Srivastava
- **Content Reviewers:**  Shaozhe Lyu, Michael Zhou




In this tutorial, we will cover:

*  [PyTorch](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html)
*  [Autograd](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py): Automatic Differentiation 




 
**Note**: Remember to change the runtime type to GPU so that the notebook can use hardware acceleration:
```
> Runtime > Change runtime type > Hardware accelerator > Select GPU
```


What is PyTorch?
================

It’s a Python-based scientific computing package targeted at two sets of
audiences:

-  A replacement for NumPy to use the power of GPUs
-  A deep learning platform that provides maximum flexibility
   and speed

At its core, PyTorch provides a few key features:

- A multidimensional **Tensor** object, similar to [numpy](https://numpy.org/) but with GPU acceleration.
- An optimized **autograd** engine for automatically computing derivatives
- A clean, modular API for building and deploying **deep learning models**

You can find more information about PyTorch by following one of the [official tutorials](https://pytorch.org/tutorials/) or by [reading the documentation](https://pytorch.org/docs/1.1.0/).





Getting Started
---------------

### Tensors

Tensors are similar to NumPy’s ndarrays.


In [None]:
import torch

Construct a 5x3 matrix, uninitialized:



In [None]:
x = torch.empty(5, 3)
print(x)

Construct a randomly initialized matrix:



In [None]:
x = torch.rand(5, 3)
print(x)

Construct a matrix filled with zeros and of dtype long:



In [None]:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)

Construct a tensor directly from data:



In [None]:
x = torch.tensor([5.5, 3])
print(x)

or create a tensor based on an existing tensor. These methods  
will reuse properties of the input tensor, e.g. dtype, unless  
new values are provided by the user.



In [None]:
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

Get its size.   
**Note**: torch.Size is a tuple, so it supports all tuple operations.




In [None]:
print(x.size())
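
Since ``torch.Size`` is a tuple, the usual tuple operations apply to it. A minimal sketch (the names ``size``, ``rows`` and ``cols`` are ours, chosen only for illustration):

```python
import torch

x = torch.zeros(5, 3)
size = x.size()

# torch.Size subclasses tuple, so unpacking, indexing and len() all work.
rows, cols = size
print(rows, cols)
print(size[0], len(size))

# It also compares equal to a plain tuple with the same entries.
print(size == (5, 3))
print(isinstance(size, tuple))
```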

### Operations
There are multiple syntaxes for operations. In the following  
example, we will take a look at the addition operation.

Addition: Syntax 1



In [None]:
y = torch.rand(5, 3)
print(x + y)

Addition: Syntax 2



In [None]:
print(torch.add(x, y))

Addition: Providing an output tensor as argument



In [None]:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

Addition: In-place



In [None]:
# adds x to y
y.add_(x) # _ at end means the operation mutates tensor y in-place
print(y)

**Read later:**


  100+ Tensor operations, including transposing, indexing, slicing,  
  mathematical operations, linear algebra, random numbers, etc.,  
  are described
  [here](http://pytorch.org/docs/torch).






  
You can use standard NumPy-like indexing with all bells and whistles!



In [None]:
print(x[:, 1])

If you want to resize/reshape a tensor, you can use ``Tensor.view``:



In [None]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())
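
One detail that is easy to miss: to the best of our knowledge, ``view`` does not copy data, so the reshaped tensor shares storage with the original. The sketch below uses a zero tensor (our choice, to make the effect easy to see) and writes through the view:

```python
import torch

x = torch.zeros(4, 4)
y = x.view(16)        # same data, seen as a flat vector of 16 elements

y[0] = 7.0            # write through the view ...
print(x[0, 0])        # ... and the original tensor sees the change

z = x.view(2, -1)     # -1 is inferred as 16 / 2 = 8
print(z.size())
```

If an independent copy is needed, calling ``.clone()`` on the result is one option.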

If you have a one element tensor, use ``.item()`` to get the value as a
Python number:



In [None]:
x = torch.randn(1)
print(x)
print(x.item())



NumPy Bridge
------------

Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

For tensors on the CPU, the Torch Tensor and NumPy array share their underlying memory  
locations, so changing one will change the other.

### Converting a Torch Tensor to a NumPy Array



In [None]:
a = torch.ones(5)
print(a)

In [None]:
b = a.numpy()
print(b)

See how the numpy array changed in value.



In [None]:
a.add_(1)
print(a)
print(b)

### Converting NumPy Array to Torch Tensor
See how changing the np array changed the Torch Tensor automatically:



In [None]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

All the Tensors on the CPU except a CharTensor support conversion to
NumPy and back.

CUDA Tensors
------------
One of the most important features of PyTorch is that it can use graphics processing units (GPUs) to accelerate its tensor operations.

Tensors can be moved onto any device using the ``.to()`` method.

Before doing so, we can easily check whether PyTorch is configured to use GPUs:




In [None]:
import torch

if torch.cuda.is_available():  # call the function; the bare function object is always truthy
  print('GPU is available for use')
else:
  print('Cannot use GPU.')

In [None]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
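
A common way to write code that runs with or without a GPU is to pick the device once and reuse it. This is only a sketch of that pattern; the variable names ``a``, ``b`` and ``c`` are illustrative:

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3, device=device)   # created directly on the chosen device
b = torch.rand(2, 3).to(device)       # created on the CPU, then moved
c = a + b                             # both operands live on the same device

print(c.device)
print(c.to("cpu").numpy())            # move back to the CPU before calling .numpy()
```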


Autograd: Automatic Differentiation
===================================

Central to all neural networks in PyTorch is the ``autograd`` package.  
Let’s first briefly visit this. We will train our
first neural network in another Worksheet.


The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.

Let us see this in simpler terms with some examples.

Tensor
--------

``torch.Tensor`` is the central class of the package. If you set its attribute
``.requires_grad`` as ``True``, it starts to track all operations on it. When  
you finish your computation you can call ``.backward()`` and have all the
gradients computed automatically. The gradient for this tensor will be accumulated into its ``.grad`` attribute.
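
The word "accumulated" matters here: each ``.backward()`` call adds to ``.grad`` rather than overwriting it. A small sketch with a single scalar ``w`` (a name we introduce only for this example):

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(2 * w).backward()            # d(2w)/dw = 2
first = w.grad.clone()
print(first)                  # tensor(2.)

(2 * w).backward()            # a second backward pass adds another 2
second = w.grad.clone()
print(second)                 # tensor(4.)

w.grad.zero_()                # reset before the next, unrelated computation
print(w.grad)                 # tensor(0.)
```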

To stop a tensor from tracking history, you can call ``.detach()``, which returns
a new tensor that is detached from the computation history, so that future
computation on it is not tracked.
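
As a quick illustration of ``.detach()`` (the variable names are ours):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 3                      # tracked: y has a grad_fn

y_detached = y.detach()        # same values, cut off from the graph
print(y.requires_grad, y_detached.requires_grad)

z = y_detached * 2             # operations on the detached tensor are not tracked
print(z.grad_fn)
```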

To prevent tracking history (and use of memory), you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
model because the model may have trainable parameters with `requires_grad=True`,
but for which we don't need the gradients.

There’s one more class that is very important for the autograd implementation: ``Function``.

<img src="https://miro.medium.com/max/1536/1*wE1f2i7L8QRw8iuVx5mOpw.png" alt="Function" width="600"/>

``Tensor`` and ``Function`` are interconnected and build up an acyclic
graph that encodes a complete history of computation. Each tensor has  
a ``.grad_fn`` attribute that references a ``Function`` that has created
the ``Tensor`` (except for Tensors created by the user - their  
``grad_fn is None``).
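
The difference between user-created tensors and tensors produced by an operation can be checked directly. This sketch also prints ``.is_leaf``, an attribute not discussed above, which marks tensors that sit at the start of the graph:

```python
import torch

a = torch.ones(2, requires_grad=True)   # created by the user
b = a * 2                               # created by an operation

print(a.grad_fn)     # None
print(a.is_leaf)     # True
print(b.grad_fn)     # a multiplication backward node; the exact repr may vary by version
print(b.is_leaf)     # False
```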

<img src="https://miro.medium.com/max/1684/1*FDL9Se9otGzz83F3rofQuA.png" alt="Computation Graph" width="500"/>

If you want to compute the derivatives, you can call ``.backward()`` on
a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single
element), you don’t need to specify any arguments to ``backward()``;
however, if it has more elements, you need to specify a ``gradient``
argument that is a tensor of matching shape.

<img src="https://miro.medium.com/max/1684/1*EWpoG5KayZSqkWmwM_wMFQ.png" alt="Computation Graph with Gradients" width="500"/>

In [None]:
import torch

Create a tensor and set requires_grad=True to track computation with it:



In [None]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

Do an operation on the tensor:



In [None]:
y = x + 2
print(y)

``y`` was created as a result of an operation, so it has a ``grad_fn``.



In [None]:
print(y.grad_fn)

Do more operations on y:



In [None]:
z = y * y * 3
out = z.mean()

print(z, out)

``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
flag in-place. A tensor's ``requires_grad`` flag defaults to ``False`` if it is not given at creation.



In [None]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

Gradients
---------
Let's do backprop now.
Because ``out`` contains a single scalar, ``out.backward()`` is
equivalent to ``out.backward(torch.tensor(1.))``.



In [None]:
out.backward()

Print gradients d(out)/dx:




In [None]:
print(x.grad)

You should have gotten a matrix of ``4.5``. Let’s call the ``out``
*Tensor* “$o$”.  
We have that $o = \frac{1}{4}\sum_i z_i$,  
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.  
Therefore,  
$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, hence  
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
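
The derivation above can be checked numerically. The sketch below rebuilds the same computation in one expression and compares autograd's result with the closed form $\frac{3}{2}(x_i+2)$; the name ``analytic`` is ours:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()    # o = (1/4) * sum_i 3 (x_i + 2)^2
out.backward()

analytic = 1.5 * (x.detach() + 2)  # do/dx_i = (3/2)(x_i + 2)
print(x.grad)
print(torch.allclose(x.grad, analytic))
```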



You can do many crazy things with autograd!



In [None]:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.detach().norm() < 1000:  # detach() is the recommended replacement for the legacy .data
    y = y * 2

print(y)

In [None]:
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)
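
When ``y`` is not a scalar, ``y.backward(v)`` computes the vector-Jacobian product $v^T J$ rather than the full Jacobian $J$. Because the example above starts from random values, here is a sketch with fixed inputs and a different function ($y = x^2$, our choice) so that the result can be verified by hand:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                                # non-scalar output; J = diag(2 * x)

v = torch.tensor([0.1, 1.0, 0.0001])      # same shape as y
y.backward(v)                             # accumulates v^T J into x.grad

expected = v * 2 * x.detach()             # elementwise, because J is diagonal
print(x.grad)
print(torch.allclose(x.grad, expected))
```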

You can also stop autograd from tracking history on Tensors
with ``.requires_grad=True`` by wrapping the code block in
``with torch.no_grad()``:



In [None]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

**Read Later:**

Documentation of ``autograd`` and ``Function`` is at
http://pytorch.org/docs/autograd



**References:**

- [PyTorch official tutorials](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
- [CIS 522 Spring '20](https://www.seas.upenn.edu/~cis522/index.html) Course Material