In [1]:
import torch
import numpy as np
import matplotlib.pyplot as plt

# Tensors

## Basics

PyTorch tensors work like NumPy arrays and can, for the most part, be used like them.

In [2]:
x = [[1, 2], [3, 4]]
xt = torch.tensor(x)
xn = np.array(x)

In [None]:
print(xt)
print(xn)

In [None]:
shape = (2, 3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

In [None]:
tensor = torch.ones(4, 4)
tensor[:,1] = 0
print(tensor)

### In-place operations

PyTorch has a number of operations that change the contents of a tensor in place. Similar to the `!` suffix convention in Julia, these are by convention marked by an underscore at the end of the name.

In [None]:
print(tensor, "\n")

tensor.add(5) # This is the same as writing "tensor + 5"
print(tensor, "\n") # Therefore, the tensor is unchanged

tensor.add_(5)
print(tensor)
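
As a small additional sketch of the naming convention, the same pattern applies to other operations such as `mul`/`mul_`, `zero_` and `fill_`. The variable names `t`, `u` and `v` are only chosen for illustration.

```python
import torch

t = torch.ones(3)

# Out-of-place: returns a new tensor, t keeps its values.
u = t.mul(2)

# In-place: modifies t and returns the very same tensor object.
v = t.mul_(2)

print(u is t)  # False
print(v is t)  # True

# Other in-place operations follow the same naming pattern.
t.zero_()
t.fill_(7)
print(t)
```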

PyTorch tensors can be located on either the CPU or GPU.

In [None]:
xt.device

PyTorch tensors (on the CPU) can often be used in place of NumPy arrays.

In [None]:
np.linalg.lstsq(xt, np.ones(2))

To get an actual NumPy array, call `.numpy()` on the tensor.

In [None]:
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

The tensor and the array share the same underlying data, which means that changes to one are reflected in the other.

In [None]:
# Modifying the tensor
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

In [None]:
# Modifying the numpy array
n[0] *= 10
print(f"t: {t}")
print(f"n: {n}")
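
Going in the other direction, whether memory is shared depends on how the tensor is built. The sketch below assumes a CPU array and uses the illustrative names `a`, `shared` and `copied`: `torch.from_numpy` shares memory with the array, while `torch.tensor` makes a copy.

```python
import numpy as np
import torch

a = np.zeros(3)
shared = torch.from_numpy(a)  # shares memory with a
copied = torch.tensor(a)      # copies the data

# Modify the NumPy array after both tensors were created
a[0] = 1.0
print(shared)  # reflects the change
print(copied)  # unaffected
```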

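WARNING: When a tensor is tracked for gradients, detach it first using `.detach()` before calling `.numpy()`. PyTorch refuses to convert a tensor that requires gradients, and in-place changes made through the shared NumPy array are invisible to the gradient tracking, so the gradients can be wrong.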
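
A minimal sketch of the detach pattern, with `w` and `w_np` as illustrative names. In the PyTorch versions I have used, calling `.numpy()` directly on a tensor that requires gradients raises a `RuntimeError`; treat that exact behaviour as an assumption for other versions.

```python
import torch

w = torch.ones(3, requires_grad=True)

# Direct conversion of a tensor that requires gradients is rejected.
try:
    w.numpy()
    raised = False
except RuntimeError:
    raised = True
print(raised)

# Detaching first gives a tensor without gradient tracking,
# which can then be converted (it still shares memory with w).
w_np = w.detach().numpy()
print(w_np)
```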

## Using a GPU

My main device is from NVIDIA, so we use the CUDA module here.

Existing CPU data can be moved to a GPU using `.to`.

In [12]:
if torch.cuda.is_available():
    xt = xt.to('cuda')
    print(f"Device tensor is stored on: {xt.device}")

With multiple GPUs, you specify which one to use, for example with `'cuda:0'` or `'cuda:1'`. The following lists them all and marks the current one.

In [13]:
for i in range(torch.cuda.device_count()):
    if i == torch.cuda.current_device():
        print("-> ", end="")
    else:
        print("   ", end="")
    print(torch.cuda.get_device_name(i))

We can set PyTorch to use the CUDA-enabled GPU by default.

In [14]:
if torch.cuda.is_available():
    torch.set_default_device('cuda')

For other devices, such as Apple's chips, calls have to be modified:

In [None]:
import torch
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
    torch.set_default_device('mps')
else:
    print("MPS device not found.")

In [None]:
xt = torch.tensor([[1, 2], [3, 4]])
xt.device

The data can be moved to and from the GPU if needed, for instance when you want to process the data using a non-PyTorch system.

In [None]:
np.linalg.lstsq(xt, np.ones(2)) # This does not work because GPU tensors don't work with numpy

In [None]:
np.linalg.lstsq(xt.cpu(), np.ones(2)) # Moving it to the CPU first works
xt.cpu().device

Do not move data between the GPU and CPU unnecessarily, as this is a relatively slow operation.
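
One way to avoid unnecessary transfers is to create tensors directly on the target device and to keep intermediate results there. The following is only a sketch; the name `device` and the fallback to the CPU when no GPU is found are choices made for this example.

```python
import torch

# Fall back to the CPU so that the example runs everywhere
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocated on the CPU first, then copied to the device
a = torch.ones(100, 100).to(device)

# Allocated directly on the device, no extra copy needed
b = torch.ones(100, 100, device=device)

# The matrix product and the sum both stay on the device;
# only the final scalar is transferred back with .item()
total = (a @ b).sum()
print(total.item())
```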

## Gradients

We will use simple linear models here as an example. The goal is to minimize the least squares objective by gradient descent. In other words,

\begin{equation}
    \min_\beta \| y - X \beta \|_2^2
\end{equation}

and we need to take gradients with respect to $\beta$.
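
For reference, this gradient also has a simple closed form. Writing the prediction as $X \beta$, which matches `X @ beta` in the code, we get

\begin{equation}
    \nabla_\beta \| y - X \beta \|_2^2 = -2 X^\top (y - X \beta)
\end{equation}

so that one gradient descent step with step size $\epsilon$ is $\beta \leftarrow \beta + 2 \epsilon X^\top (y - X \beta)$. The code uses the mean instead of the sum of squares, which only rescales the gradient by $1/N$.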

PyTorch can compute gradients for us. To tell PyTorch that we want gradients for a particular tensor, we add `requires_grad=True` when instantiating it, like for `beta` below. After this, *every* computation involving `beta` will be tracked for gradient computations.

In [20]:
# This example is not ideal for GPU computation...
# The .cpu() calls can be removed but are kept for generality.
torch.set_default_device('cpu')

In [21]:
beta = torch.zeros(2, requires_grad=True)
x = 42 + 2. * torch.arange(30) # Note the floating point 2. as opposed to just 2
y = 130 + 0.6 * x + 5 * torch.rand_like(x)
X = torch.column_stack((torch.ones_like(y), x))

After running computations with `beta` to compute the loss value `loss`, the gradient of `loss` with respect to `beta` can be computed using `loss.backward()`. This stores the gradient inside `beta.grad`, which we use to update `beta`.
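
As a sanity check, the gradient stored in `beta.grad` can be compared with the closed-form gradient of the mean squared error, $-\frac{2}{N} X^\top (y - X\beta)$. The sketch below uses a small made-up data set, so the values of `X` and `y` are assumptions for illustration only.

```python
import torch

torch.manual_seed(0)
X = torch.column_stack((torch.ones(5), torch.arange(5.0)))
y = torch.rand(5)
beta = torch.zeros(2, requires_grad=True)

# Autograd gradient
loss = torch.mean((y - X @ beta) ** 2)
loss.backward()

# Closed-form gradient of the mean squared error
with torch.no_grad():
    analytic = (-2 / len(y)) * (X.T @ (y - X @ beta))

print(beta.grad)
print(analytic)
print(torch.allclose(beta.grad, analytic))
```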

WARNING: This code is doing it manually for illustration purposes. Do not do it like this!!

In [22]:
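def gradient_step_(beta, eps):
    loss = torch.mean((y - X @ beta)**2)
    loss.backward()
    # Update beta outside of gradient tracking
    with torch.no_grad():
        beta -= eps * beta.grad
    # Reset the gradient, since backward() accumulates into beta.grad
    beta.grad.zero_()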

In [None]:
for _ in range(100000):
    gradient_step_(beta, 0.0001)
print(beta.detach())
plt.scatter(x.cpu(), y.cpu())
plt.plot(x.cpu(), (X @ beta.detach()).cpu())

Instead of manually updating the parameters, you should be using an optimizer. `torch.optim.SGD` is what we implemented above. In this example we use `Adam`, which adapts the step size for each parameter and is usually much easier to tune than plain gradient descent (try both and see how they behave as the learning rate changes).
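
To make the connection to `torch.optim.SGD` concrete, the sketch below performs one manual update and one `SGD` step from the same starting point and compares the results. The data and the names `b_manual`, `b_sgd` and `lr` are made up for this example, and `SGD` is used with its default settings (no momentum or weight decay).

```python
import torch

torch.manual_seed(0)
X = torch.rand(10, 2)
y = torch.rand(10)
lr = 0.1

# One manual gradient descent step
b_manual = torch.zeros(2, requires_grad=True)
torch.mean((y - X @ b_manual) ** 2).backward()
with torch.no_grad():
    b_manual -= lr * b_manual.grad

# The same step using the optimizer
b_sgd = torch.zeros(2, requires_grad=True)
opt = torch.optim.SGD([b_sgd], lr=lr)
opt.zero_grad()
torch.mean((y - X @ b_sgd) ** 2).backward()
opt.step()

print(b_manual.detach(), b_sgd.detach())
print(torch.allclose(b_manual, b_sgd))
```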

In [26]:
optimizer = torch.optim.Adam((beta,), lr=0.01)

In [None]:
def model(X):
    return X @ beta
def objective(y, yhat):
    return torch.mean((y - yhat)**2)

for _ in range(10000):
    optimizer.zero_grad()
    objective(y, model(X)).backward()
    optimizer.step()
print(beta)

# Neural network modules

PyTorch has built-in support for common transformations. For our example here, it means we do not have to construct `beta` manually and do not have to write the matrix multiplication manually.

Here we are interested in a `Linear` module/layer. Its two main arguments are the input dimension and output dimension. The input dimension is the number of features in our input and the output dimension is here 1.

In [None]:
beta = torch.nn.Linear(2, 1, bias = False)
torch.equal(beta(X), X @ beta.weight.T)

With `bias=True` (which is the default), we do not even need the constant column of `X`. However, note that PyTorch expects inputs to be $N \times p$ matrices.

In [None]:
x = x[:,torch.newaxis]
beta = torch.nn.Linear(1,1)
# allclose instead of equal, since the two computations may round differently
torch.allclose(beta(x), beta.bias + x @ beta.weight.T)
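
One thing to watch for with $N \times 1$ outputs is broadcasting against a target of shape $(N,)$. The sketch below uses small made-up tensors `y` and `yhat` to show what happens to the shapes.

```python
import torch

y = torch.arange(4.0)              # shape (4,)
yhat = torch.arange(4.0)[:, None]  # shape (4, 1), like the output of nn.Linear(p, 1)

# Broadcasting silently produces all pairwise differences
pairwise = y - yhat
print(pairwise.shape)

# Dropping the trailing dimension gives the intended elementwise difference
elementwise = y - yhat.squeeze(-1)
print(elementwise.shape)
```

A loss computed from the pairwise version still runs without errors, which makes this mistake easy to miss.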

In [None]:
optimizer = torch.optim.Adam(beta.parameters(), lr=0.01)

optimizer.zero_grad()
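# beta(x) has shape (N, 1) while y has shape (N,); squeeze so they do not broadcast to (N, N)
objective(y, beta(x).squeeze(-1)).backward()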
optimizer.step()

print(beta.bias.data)
print(beta.weight.data)

In [None]:
beta = torch.nn.Linear(4, 3)
print(beta.bias)
print(beta.weight)

# Neural networks

PyTorch follows Python design principles, and the typical way of constructing a neural network is by subclassing `torch.nn.Module` and defining a `forward` method.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super().__init__()
        # These define fully connected transformations
        # x -> Wx + b
        # Note that the output dimension of each layer
        # must match the input of the next.
        self.fc1 = nn.Linear(10, 11)
        self.fc2 = nn.Linear(11, 12)
        self.fc3 = nn.Linear(12, 2)

    def forward(self, x):
        # Defining forward is like defining __call__ for
        # regular Python classes, but specialized for PyTorch.
        # __call__ should not be overridden, as it takes care
        # of running hooks.

        # Apply each layer in turn, with a ReLU activation after
        # every layer except the last.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
print(net)

In [None]:
params = list(net.parameters())
print(len(params))
print(params[0].size()) # The size of the first layer weights (not bias) fc1.weight
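
To see where the parameters come from, it can help to list them by name and count them. The sketch below uses `nn.Sequential` with the same layer sizes as a stand-in for `Net`, so the parameter names differ (`0.weight` instead of `fc1.weight`), but the shapes and the counts are the same.

```python
import torch.nn as nn

# Stand-in for Net with the same layer sizes
net = nn.Sequential(
    nn.Linear(10, 11), nn.ReLU(),
    nn.Linear(11, 12), nn.ReLU(),
    nn.Linear(12, 2),
)

# Each Linear layer contributes a weight matrix and a bias vector
for name, p in net.named_parameters():
    print(name, tuple(p.shape))

n_tensors = len(list(net.parameters()))
n_values = sum(p.numel() for p in net.parameters())
print(n_tensors)  # 6
print(n_values)   # 291
```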

The neural network can be evaluated by calling it:

In [None]:
net(torch.rand(4, 10))
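
Training such a network works in the same way as for the linear model. The following is a sketch under made-up assumptions: random inputs and targets, a mean squared error loss via `nn.MSELoss`, and an `nn.Sequential` stand-in with the same layer sizes as `Net`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for Net with the same layer sizes
net = nn.Sequential(
    nn.Linear(10, 11), nn.ReLU(),
    nn.Linear(11, 12), nn.ReLU(),
    nn.Linear(12, 2),
)
inputs = torch.rand(4, 10)
targets = torch.rand(4, 2)  # made-up regression targets

optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(net(inputs), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```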