In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
import torch
import numpy as np
torch.__version__

## 1. Tensor basics

PyTorch is a library for training deep neural networks, and much of it is based on the `Tensor`, an array type that is similar to NumPy arrays.

Under the hood, PyTorch runs on compiled C++ and, if available, CUDA and cuDNN code.

In [None]:
tensor = torch.Tensor([0, 1, 2, 3])

In [None]:
print(tensor.shape)
print(tensor.dtype)

In [None]:
tensor.sum()

It's easy to convert between NumPy arrays and PyTorch Tensors.

In [None]:
tensor.numpy()

In [None]:
# A better alternative to torch.Tensor(arr)
torch.from_numpy(np.arange(5))
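To illustrate why `torch.from_numpy` is usually the better choice, here is a minimal sketch comparing the two constructors. It relies on two behaviours: `torch.Tensor(...)` converts to the default float dtype (float32 unless changed), while `torch.from_numpy` keeps the NumPy dtype and shares memory with the source array. The variable names `arr`, `copied`, and `shared` are only illustrative.

```python
import numpy as np
import torch

arr = np.arange(5)

# torch.Tensor(...) converts the data to the default float dtype (a copy)
copied = torch.Tensor(arr)

# torch.from_numpy keeps the NumPy dtype and shares the underlying buffer
shared = torch.from_numpy(arr)

# Modify the NumPy array in place and see which tensor follows along
arr[0] = 100
print(copied.dtype, copied[0].item())
print(shared.dtype, shared[0].item())
```

Because `shared` aliases the NumPy buffer, in-place changes on either side are visible on the other; call `.clone()` on the tensor if an independent copy is needed.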

## 2. Using CUDA

**The following code only works if you have PyTorch set up for a GPU.**

In [None]:
# Note the parentheses: the bare function object is always truthy
assert torch.cuda.is_available() and torch.backends.cudnn.is_available()

In [None]:
x = torch.Tensor(range(5))
y = torch.Tensor(np.ones(5))

It's similarly easy to move Tensors onto a GPU.

In [None]:
x.cuda()

In [None]:
z = x.cuda() + y.cuda()
print(z)

In [None]:
z.cpu()

In [None]:
if torch.cuda.is_available() and torch.backends.cudnn.is_available():
    device = torch.device('cuda')
else:
    device = torch.device("cpu")

In [None]:
z = x.to(device) + y.to(device)
print(z)
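As a small variation on the device-agnostic pattern above, tensors can also be created directly on the target device instead of being created on the CPU and then moved. This is only a sketch; it checks CUDA availability alone and falls back to the CPU, so it should run on machines without a GPU as well.

```python
import torch

# Pick the GPU when one is usable, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate directly on the chosen device
x = torch.arange(5.0, device=device)
y = torch.ones(5, device=device)

z = x + y
print(z.device)

# Bring the result back to host memory, e.g. before converting to NumPy
z_cpu = z.cpu()
print(z_cpu)
```

Both operands of an operation must live on the same device, which is why `x` and `y` are created with the same `device` argument.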

## 3. Exercises

* (Taken from DS-GA 1011, Fall 2017)

1) Initialize random tensors A, B, C of size [2,3], [2,3], [3,3,2].

2) Fill tensor A with all 10s

3) Fill tensor B with elements sampled from the normal distribution

4) Point-wise multiply A with B, and put the result into tensor B

5) Print the mean and standard deviation of the elements of B

6) Fill tensor C with elements sampled from the uniform distribution U(-1,1). Print the dimensions of C.

7) Transpose the second and third dimension of tensor C, and put the result into tensor C itself (in-place). Print the dimensions of C.

8) Show the contiguity property of the tensors

9) Print the second column of the third dimension of tensor C (note zero-indexed)

10) Perform operation A+B+C (note the broadcasting)

## 4. Autograd

Autograd is a submodule in PyTorch that handles automatic differentiation and gradient computation. This allows you to simply define a model once, in a forward fashion, and the library handles the computation of all gradients in the computational graph.

Here, we create 2 Tensors, but we want PyTorch to compute gradients only with respect to $x$. By default, for arbitrary computations in PyTorch, no gradients are computed (e.g. for $y$).

In [None]:
x = torch.randn(5, requires_grad=True)
y = torch.arange(5.)

In [None]:
print(x)
print(x.grad)

In [None]:
print(y)
print(y.grad)

We define $z = x \cdot y$, the dot product of $x$ and $y$, written in code as `(x * y).sum()`. Then

$$\frac{dz}{dx} = y$$

Note `z.grad_fn`, which records the operation that produced $z$ and thereby captures its dependencies in the computation graph.

In [None]:
z = (x * y).sum()
print(z)
print(z.grad)
print(z.grad_fn)

At this point, no gradients have been computed yet. It is only when we call `z.backward()` that PyTorch computes the gradients and backpropagates them to every node in the graph that requires gradients (e.g. $x$).

In [None]:
z.backward()

As we can see, $x$ now has gradients associated with it, but $y$ does not.

In [None]:
print(x.grad)
print(y.grad)
print(z.grad)
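As a sanity check on the derivation $\frac{dz}{dx} = y$, the following self-contained sketch repeats the computation and compares the gradient stored on $x$ with $y$ directly.

```python
import torch

x = torch.randn(5, requires_grad=True)
y = torch.arange(5.)

z = (x * y).sum()
z.backward()

# dz/dx_i = y_i, so the gradient stored on x should equal y
print(torch.allclose(x.grad, y))

# y was created without requires_grad, so it has no gradient
print(y.grad is None)
```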

With just this, we can compute a very rudimentary form of gradient descent!

In [None]:
# A very silly case of gradient descent:
learning_rate = 0.01
x = torch.tensor([1000.], requires_grad=True)
x_values = []
for i in range(1000):
    
    # Our loss function is: We want x**2 to be small
    loss = x ** 2
    loss.backward()
    
    # Have to do something a little convoluted here to subtract the 
    #   gradient -- don't worry, we'll never do this again
    x.data.sub_(x.grad.data * learning_rate)
    
    # Remember to zero-out the gradient! 
    # PyTorch doesn't do it automatically.
    x.grad.zero_()
    x_values.append(x.item())

In [None]:
plt.plot(x_values)
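The reminder about zeroing the gradient in the loop above matters because `backward()` adds to whatever is already stored in `.grad`. A minimal sketch of that accumulation, using a small value of $x$ so the numbers are easy to follow:

```python
import torch

x = torch.tensor([3.0], requires_grad=True)

# First backward pass: d(x**2)/dx = 2x = 6
(x ** 2).sum().backward()
first = x.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(x ** 2).sum().backward()
accumulated = x.grad.clone()

# Reset the stored gradient in place
x.grad.zero_()
print(first, accumulated, x.grad)
```

Without the reset, each step of gradient descent would use the sum of all gradients seen so far rather than the current one.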

Lastly, sometimes you want to run things *without* computing gradients:

In [None]:
x = torch.tensor([1000.], requires_grad=True)

# With gradient computation:
loss = x ** 2
print(loss.requires_grad)


# Without gradient computation:
with torch.no_grad():
    loss = x ** 2
print(loss.requires_grad)
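`torch.no_grad()` also offers a cleaner way to write the manual update from the gradient descent example, since the parameter update itself should not be tracked by autograd. The following is a sketch of the same loop under that assumption, not a replacement for a proper optimizer.

```python
import torch

learning_rate = 0.01
x = torch.tensor([1000.], requires_grad=True)
x_values = []
for i in range(1000):
    loss = (x ** 2).sum()
    loss.backward()

    # The update is bookkeeping, not part of the model, so keep it
    # out of the computation graph
    with torch.no_grad():
        x -= learning_rate * x.grad

    # Gradients accumulate across backward() calls, so reset them
    x.grad.zero_()
    x_values.append(x.item())

print(x_values[0], x_values[-1])
```

Each step multiplies $x$ by $1 - 2 \cdot 0.01 = 0.98$, so the values decay geometrically toward zero.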

**Highly recommended**: https://pytorch.org/docs/stable/autograd.html