<a href="https://colab.research.google.com/github/joshualin24/NCSA_pytorch_tutorial/blob/main/NCSA_PyTorch_basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **PyTorch Basics Tutorial**

The following tutorial is adapted from the [DeepLearningForNLPInPytorch tutorial](https://github.com/rguthrie3/DeepLearningForNLPInPytorch/blob/master/Deep%20Learning%20for%20Natural%20Language%20Processing%20with%20Pytorch.ipynb) and the [Official PyTorch tutorials](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py).

## Installation

**Via Anaconda/Miniconda:**  
`conda install pytorch -c pytorch`

**Via pip:**  
`pip3 install torch`


In [None]:
import torch
print(torch.__version__)

1.9.0+cu102


## 1. Tensors
Tensors are similar to NumPy’s ndarrays, with the added ability to run on a GPU to accelerate computing.

### 1.1 Creating Tensors from data
Tensors can be created from Python lists with the `torch.Tensor()` constructor.

In [None]:
# Example with 1-D data
data = [1.0, 2.0, 3.0]
tensor = torch.Tensor(data)
print("Example with 1-D data")
print(tensor)

# Example with 2-D data
data = [[1., 2., 3.], [4., 5., 6]]
tensor = torch.Tensor(data)
print("\nExample with 2-D data")
print(tensor)

# Example with 3-D data
data = [[[1.,2.], [3.,4.]],
        [[5.,6.], [7.,8.]]]
tensor = torch.Tensor(data)
print("\nExample with 3-D data")
print(tensor)

Example with 1-D data
tensor([1., 2., 3.])

Example with 2-D data
tensor([[1., 2., 3.],
        [4., 5., 6.]])

Example with 3-D data
tensor([[[1., 2.],
         [3., 4.]],

        [[5., 6.],
         [7., 8.]]])
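As a side note, with default settings `torch.Tensor()` produces a `float32` tensor regardless of the input, while the lowercase factory `torch.tensor()` infers the dtype from the data. The short sketch below illustrates the difference; the variable names are chosen only for illustration.

```python
import torch

ints = [1, 2, 3]

a = torch.Tensor(ints)   # constructor: default float type
b = torch.tensor(ints)   # factory: dtype inferred from the data

print(a.dtype)  # torch.float32
print(b.dtype)  # torch.int64
```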


### 1.2 Initializing an empty Tensor
`torch.empty()` allocates a tensor without initializing its contents. Whatever values were already in the allocated memory appear as the initial values, so they should be overwritten before the tensor is used.


In [None]:
# Construct a 2x3 matrix, uninitialized
x = torch.empty(2, 3)
print(x)

tensor([[-2.1558e-28,  3.0901e-41,  3.3631e-44],
        [ 0.0000e+00,         nan,  6.0000e+00]])
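Because the contents of an uninitialized tensor are arbitrary, a typical pattern is to allocate it and then overwrite every element. A minimal sketch, where the fill value `7.0` is just an illustrative choice:

```python
import torch

x = torch.empty(2, 3)   # contents are arbitrary at this point
x.fill_(7.0)            # overwrite every element in place
print(x)
```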


### 1.3 Randomly initialized Tensor



In [None]:
x = torch.rand(2, 3)
print(x)

tensor([[0.2707, 0.0604, 0.9663],
        [0.4437, 0.8198, 0.2761]])
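`torch.rand` draws from a uniform distribution on [0, 1), while `torch.randn` draws from a standard normal distribution. The sketch below also shows one way to make the draws repeatable with `torch.manual_seed`; the seed value `0` is arbitrary.

```python
import torch

torch.manual_seed(0)          # fix the seed so the draws are repeatable
u1 = torch.rand(2, 3)         # uniform on [0, 1)
torch.manual_seed(0)
u2 = torch.rand(2, 3)         # same seed, same values

n = torch.randn(2, 3)         # standard normal (mean 0, variance 1)

print(torch.equal(u1, u2))    # True
print(n.size())
```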


### 1.4 Tensor with zeros or ones

In [None]:
# Create a matrix of all zeros
x = torch.zeros(2, 3)
print("Matrix of zeros")
print(x)

# Create a matrix of all zeros and explicitly set data type to be long int
x = torch.zeros(2, 3, dtype=torch.long)
print("\nMatrix of zeros typecasted to long")
print(x)

x = torch.ones(2, 3, dtype=torch.long)
print("\nMatrix of ones typecasted to long")
print(x)


Matrix of zeros
tensor([[0., 0., 0.],
        [0., 0., 0.]])

Matrix of zeros typecasted to long
tensor([[0, 0, 0],
        [0, 0, 0]])

Matrix of ones typecasted to long
tensor([[1, 1, 1],
        [1, 1, 1]])


### 1.5 Create Tensor based on existing Tensor

In [None]:
x = x.new_ones(2, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[-0.1537,  0.0114,  0.8903],
        [ 0.6393, -0.1981, -1.8608]])
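When no `dtype` is passed, the `new_*` methods and the `*_like` functions inherit properties from the existing tensor. A small sketch, assuming a source tensor named `base` (a name used here only for illustration):

```python
import torch

base = torch.zeros(2, 3, dtype=torch.long)

same = base.new_ones(4)         # dtype (and device) taken from base
like = torch.zeros_like(base)   # shape, dtype, and device taken from base

print(same.dtype, same.size())
print(like.dtype, like.size())
```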


### 1.6 Size of a Tensor
`torch.Size` is in fact a tuple, so it supports all tuple operations.

In [None]:
# Example with 1-D data
data = [1.0, 2.0, 3.0]
tensor = torch.Tensor(data)
print("Example with 1-D data")
print(tensor)
print(tensor.size())

# Example with 2-D data
data = [[1., 2., 3.], [4., 5., 6]]
tensor = torch.Tensor(data)
print("\nExample with 2-D data")
print(tensor)
print(tensor.size())

# Example with 3-D data
data = [[[1.,2.], [3.,4.]],
        [[5.,6.], [7.,8.]]]
tensor = torch.Tensor(data)
print("\nExample with 3-D data")
print(tensor)
print(tensor.size())

Example with 1-D data
tensor([1., 2., 3.])
torch.Size([3])

Example with 2-D data
tensor([[1., 2., 3.],
        [4., 5., 6.]])
torch.Size([2, 3])

Example with 3-D data
tensor([[[1., 2.],
         [3., 4.]],

        [[5., 6.],
         [7., 8.]]])
torch.Size([2, 2, 2])
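Since `torch.Size` behaves like a tuple, the usual tuple operations apply to it. A brief sketch:

```python
import torch

t = torch.zeros(2, 3)
size = t.size()

rows, cols = size                # tuple unpacking
print(isinstance(size, tuple))   # True
print(len(size), size[0])        # 2 2
print(t.size(1))                 # 3 (size along one dimension)
print(t.shape == size)           # True (.shape is an alias for .size())
```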


### 1.7 Operations with Tensors
Most operations are very similar to NumPy.

In [None]:
# Addition
x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])

# using arithmetic operation
z = x + y
print(z)

# using method
print(torch.add(x, y))

# using method and providing an output tensor as argument
output = torch.empty(3)  # same shape as x + y, so no resize is needed
torch.add(x, y, out=output)
print(output)

tensor([5., 7., 9.])
tensor([5., 7., 9.])
tensor([5., 7., 9.])


  


In [None]:
# In-place addition

x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])

y.add_(x)
print(y)

tensor([5., 7., 9.])


In [None]:
# elementwise multiplication

x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])

z = x * y 
print(z)

tensor([ 4., 10., 18.])


In [None]:
# Inner (dot) product of two 1-D tensors

x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])

z = torch.matmul(x, y)
print(z)

tensor(32.)
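For 2-D inputs, `torch.matmul` performs an ordinary matrix-matrix product, and the `@` operator is an equivalent shorthand. A small sketch with two illustrative 2x2 matrices:

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[5., 6.], [7., 8.]])

C = torch.matmul(A, B)   # matrix-matrix product
D = A @ B                # the @ operator gives the same result

print(C)
```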


In [None]:
# outer product

x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])

z = torch.outer(x, y)  # 1-D inputs need no transpose
print(z)  # 3 x 3 matrix

tensor([[ 4.,  5.,  6.],
        [ 8., 10., 12.],
        [12., 15., 18.]])


In [None]:
# activation function (ReLU)

import torch.nn as nn

x = torch.Tensor([ 1., 100., -100. ])
m = nn.ReLU()
z = m(x)
print(z)  # negative values are clipped to 0; positive values pass through unchanged


tensor([  1., 100.,   0.])


In [None]:
# activation function (Tanh)

x = torch.Tensor([ 1., 100., -100. ])
m = nn.Tanh()
z = m(x)

print(z)  # large positive / negative values saturate to 1 / -1

tensor([ 0.7616,  1.0000, -1.0000])


Any operation that mutates a tensor in-place is suffixed with an underscore `_`. For example, `x.copy_(y)` and `x.t_()` both change `x`.
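The difference between the two forms can be seen by comparing `add` with `add_`. A minimal sketch:

```python
import torch

x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])

z = x.add(y)    # out-of-place: returns a new tensor, x is unchanged
print(x)        # tensor([1., 2., 3.])

x.add_(y)       # in-place: x itself is overwritten
print(x)        # tensor([5., 7., 9.])
```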

See [the PyTorch official documentation](http://pytorch.org/docs/torch.html) for a complete list of the available operations, which extend beyond purely mathematical ones.


In [None]:
# Indexing

x = torch.Tensor([[1., 2., 3.], [4., 5., 6]])
print(x[:, 1]) # Gets column with index 1

tensor([2., 5.])
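Indexing follows NumPy conventions more generally. The sketch below shows a few further patterns (row selection, single elements, slices, and boolean masks) on the same 2x3 tensor:

```python
import torch

x = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

print(x[0])        # first row: tensor([1., 2., 3.])
print(x[1, 2])     # single element: tensor(6.)
print(x[:, :2])    # first two columns of every row
print(x[x > 3])    # boolean mask: tensor([4., 5., 6.])
```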


### 1.8 Reshaping Tensors

In [None]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

print('x', x.unsqueeze(0).shape)

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
x torch.Size([1, 4, 4])
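Note that `view` does not copy data: the result shares storage with the original tensor, so a write through one is visible through the other. A minimal sketch:

```python
import torch

x = torch.zeros(2, 3)
y = x.view(6)      # same data, different shape

y[0] = 1.0         # write through the view
print(x)           # x[0, 0] is now 1 as well
```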


In [None]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([-1.0118])
-1.0117835998535156


### 1.9 Converting to and from NumPy/List
Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array will **share their underlying memory locations** (if the Torch Tensor is on CPU), and **changing one will change the other**.



In [None]:
a = torch.ones(5)
print("Original a:", a)

b = a.numpy()
print("Original b:", b)

a.add_(1)
print("New a:", a)
print("New b:", b)

Original a: tensor([1., 1., 1., 1., 1.])
Original b: [1. 1. 1. 1. 1.]
New a: tensor([2., 2., 2., 2., 2.])
New b: [2. 2. 2. 2. 2.]


In [None]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
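If the shared memory is not wanted, one option is to copy the tensor before converting it. The sketch below uses `.clone()` for this; it is one way to do it, not the only one.

```python
import torch

a = torch.ones(5)
b = a.clone().numpy()   # clone first, so b owns its own memory

a.add_(1)
print(a)  # tensor([2., 2., 2., 2., 2.])
print(b)  # [1. 1. 1. 1. 1.]
```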


In [None]:
data = [[1, 2], [3, 4]]      # Python list
x_data = torch.tensor(data)  # list to PyTorch tensor

print('list', data)
print('tensor', x_data)

list [[1, 2], [3, 4]]
tensor tensor([[1, 2],
        [3, 4]])


### 1.10 CUDA Tensors

Tensors can be moved onto any device using the `.to` method.


### Enable GPU on your Colab notebook

Go to Edit -> Notebook Settings -> select GPU as Hardware accelerator

<img src="https://jovianlin.io/content/images/2018/01/Screen-Shot-2018-01-23-at-8.38.04-AM.png" width="300">

In [None]:
# Try to run this cell with both GPU support and without
import torch
print("CUDA available?", torch.cuda.is_available())

# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    x = torch.Tensor([1.0, 2.0, 3.0])
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!

CUDA available? True
tensor([2., 3., 4.], device='cuda:0')
tensor([2., 3., 4.], dtype=torch.float64)
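A common way to write code that runs with or without a GPU is to pick the device once and pass it along. A minimal sketch of that pattern, which falls back to the CPU when CUDA is unavailable:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([1.0, 2.0, 3.0], device=device)
y = torch.ones_like(x)          # created on the same device as x
z = (x + y).to("cpu")           # bring the result back to the CPU

print(device, z)
```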


## 2. Autograd: Automatic Differentiation

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

``torch.Tensor`` is the central class of the package. If you set its attribute
``.requires_grad`` to ``True``, it **starts to track all operations on it**. When
you finish your computation you can call ``.backward()`` and have **all the
gradients computed automatically**. The gradient for this tensor will be
accumulated into the ``.grad`` attribute.

To **stop a tensor from tracking history**, you can call ``.detach()`` to detach
it from the computation history, and to prevent future computation from being
tracked.

To **prevent tracking history (and using memory)**, you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
model because the model may have trainable parameters with `requires_grad=True`,
but for which we don't need the gradients.
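As a small illustration of `.detach()`, the sketch below compares a tracked result with its detached counterpart; the variable names are illustrative.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 2            # tracked: has a grad_fn
d = y.detach()       # same values, cut off from the graph

print(y.requires_grad, d.requires_grad)  # True False
print(d.grad_fn)                         # None
```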

In [None]:
import torch
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [None]:
y = x + 2
print(y)

# y was created as a result of an operation, so it has a grad_fn.

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [None]:
z = y * y * 3
out = z.mean()

print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
flag in-place. The input flag defaults to ``False`` if not given.



In [None]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x7f59b6701f50>


Let us consider a complete example now.

Let $out = \frac{1}{4}\sum_i z_i$,  
$z_i = 3(x_i+2)^2$  
and $z_i\bigr\rvert_{x_i=1} = 27$.  
Therefore,  
$\frac{\partial out}{\partial x_i} = \frac{1}{4}\frac{\partial z_i}{\partial x_i} = \frac{1}{4}\cdot 3\cdot 2(x_i+2) = \frac{3}{2}(x_i+2)$,  
 hence  
$\frac{\partial out}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.



In [None]:
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()

out.backward()
print(x.grad)


tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
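The analytic result $\frac{3}{2}(x_i+2)$ holds for any input, not just $x_i = 1$. As a quick check, the sketch below compares autograd against that formula on random inputs:

```python
import torch

x = torch.rand(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

expected = 1.5 * (x.detach() + 2)        # analytic gradient
print(torch.allclose(x.grad, expected))  # True
```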


Mathematically, if you have a vector-valued function $\vec{y}=f(\vec{x})$, then the gradient of $\vec{y}$ with respect to $\vec{x}$ is a Jacobian matrix:
\begin{split}J=\left(\begin{array}{ccc}
 \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
 \vdots & \ddots & \vdots\\
 \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
 \end{array}\right)\end{split}

Generally speaking, `torch.autograd` is an engine for computing vector-Jacobian products. That is, given any vector $v=\left(\begin{array}{cccc} v_{1} & v_{2} & \cdots & v_{m}\end{array}\right)^{T}$, it computes the product $v^{T}\cdot J$. If $v$ happens to be the gradient of a scalar function $l=g\left(\vec{y}\right)$, that is, $v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}$, then by the chain rule, the vector-Jacobian product would be the gradient of $l$ with respect to $\vec{x}$:
\begin{split}J^{T}\cdot v=\left(\begin{array}{ccc}
 \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
 \vdots & \ddots & \vdots\\
 \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
 \end{array}\right)\left(\begin{array}{c}
 \frac{\partial l}{\partial y_{1}}\\
 \vdots\\
 \frac{\partial l}{\partial y_{m}}
 \end{array}\right)=\left(\begin{array}{c}
 \frac{\partial l}{\partial x_{1}}\\
 \vdots\\
 \frac{\partial l}{\partial x_{n}}
 \end{array}\right)\end{split}

(Note that $v^{T}\cdot J$ gives a row vector which can be treated as a column vector by taking $J^{T}\cdot v$.)

This characteristic of the vector-Jacobian product makes it very convenient to feed external gradients into a model that has non-scalar output.
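For a non-scalar output, the vector $v$ is passed as the argument to `.backward()`. In the sketch below the function is $\vec{y} = 2\vec{x}$, so the Jacobian is $2I$ and the result should equal $2v$; the values in `v` are arbitrary.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2                              # non-scalar output; J = 2 * I

v = torch.tensor([0.1, 1.0, 0.0001])   # the external gradient
y.backward(v)                          # computes the vector-Jacobian product

print(x.grad)                          # equals 2 * v
```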

You can also stop autograd from tracking history on Tensors with `.requires_grad=True` by wrapping the code block in `with torch.no_grad():`

In [None]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


Every time a variable is backpropagated through, the gradient is accumulated instead of being replaced. Calling `tensor.grad.zero_()` resets the accumulated gradients to 0.
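The accumulation can be observed by calling `.backward()` twice on the same leaf tensor and then zeroing its gradient in place with `x.grad.zero_()`. A minimal sketch, where `first` and `second` are illustrative names for snapshots of the gradient:

```python
import torch

x = torch.ones(2, requires_grad=True)

(x * 3).sum().backward()
first = x.grad.clone()
print(first)         # tensor([3., 3.])

(x * 3).sum().backward()
second = x.grad.clone()
print(second)        # tensor([6., 6.]) (accumulated, not replaced)

x.grad.zero_()       # reset the accumulated gradient in place
print(x.grad)        # tensor([0., 0.])
```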
