# Fitting: Computing an NLL

We will be using PyTorch this time.

In [None]:
!nvidia-smi

In [None]:
import numpy as np
import math

np.random.seed(42)

dist = np.hstack(
    [
        np.random.normal(loc=1, scale=2.0, size=500_000),
        np.random.normal(loc=1, scale=0.5, size=500_000),
    ]
)
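It helps to have a trusted reference for the Torch versions later in this notebook. Below is a minimal plain-NumPy sketch of the same two-Gaussian mixture NLL. The names `gaussian_np`, `nll_np`, and the smaller `sample` are illustrative and not part of the original notebook. The sample is drawn the same way as `dist`, only smaller.

```python
import numpy as np

def gaussian_np(x, mu, sigma):
    # Normal probability density evaluated elementwise on a NumPy array
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def nll_np(x, f_0, mean, sigma, sigma2):
    # Two-Gaussian mixture with a shared mean; f_0 weights the first component
    pdf = f_0 * gaussian_np(x, mean, sigma) + (1 - f_0) * gaussian_np(x, mean, sigma2)
    return -np.sum(np.log(pdf))

# Smaller sample generated like `dist`: equal halves, mean 1, widths 2.0 and 0.5
rng = np.random.default_rng(0)
sample = np.hstack([rng.normal(1, 2.0, 10_000), rng.normal(1, 0.5, 10_000)])

at_truth = nll_np(sample, 0.5, 1.0, 2.0, 0.5)  # generating parameters
shifted = nll_np(sample, 0.5, 3.0, 2.0, 0.5)   # mean moved away from the truth
print(at_truth, shifted)
```

You can evaluate the Torch `nll` on the same inputs and compare the result with `nll_np`. They should agree up to floating-point rounding.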

In [None]:
import torch

## Torch: CPU
By default, Torch tensors are created on the CPU unless they are explicitly sent to a GPU, so let's start with the CPU:

In [None]:
d_dist = torch.tensor(dist)
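`torch.tensor` copies the NumPy data and keeps its `float64` dtype. The related `torch.from_numpy` shares memory with the array instead. The small sketch below shows the difference, using a hypothetical three-element array `arr` in place of `dist`:

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])  # NumPy defaults to float64

copied = torch.tensor(arr)      # independent copy of the data
shared = torch.from_numpy(arr)  # views the same memory as arr (CPU only)

arr[0] = 100.0  # modify the NumPy array after creating both tensors
print(copied, shared)
print(copied.dtype, copied.device)
```

Only `shared` sees the change to `arr`, because only it shares the array's memory. Both tensors stay `float64` on the CPU.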

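This is similar to NumPy, though we have to be careful to use Torch's own functions, such as `torch.sqrt` and `torch.exp`, rather than the ones from `math`. For example, `math.sqrt` only accepts a single number, so it does not operate on a Torch tensor: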
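A quick check of that point on a small CPU tensor (`x` is just an illustrative value). The exact exception that `math.sqrt` raises has differed between PyTorch versions, so this sketch catches the plausible ones:

```python
import math
import torch

x = torch.tensor([1.0, 4.0, 9.0], dtype=torch.float64)

# torch.sqrt works elementwise and returns a Tensor with the same dtype
roots = torch.sqrt(x)
print(roots)

# math.sqrt needs a single Python number; a 3-element tensor cannot be converted to one
try:
    math.sqrt(x)
    math_sqrt_failed = False
except (TypeError, ValueError, RuntimeError):
    math_sqrt_failed = True
print("math.sqrt failed:", math_sqrt_failed)
```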

In [None]:
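def gaussian(x, μ, σ):
    # Normal probability density; math.pi (unlike np.pi) can also be compiled by TorchScript
    return (
        1 / torch.sqrt(2 * math.pi * σ**2) * torch.exp(-((x - μ) ** 2) / (2 * σ**2))
    )


def add(x, f_0, mean, sigma, sigma2):
    # Mixture of two Gaussians with a shared mean; f_0 is the weight of the first one
    return f_0 * gaussian(x, mean, sigma) + (1 - f_0) * gaussian(x, mean, sigma2)


# @torch.jit.script  # disabled for now; see the Exercise below
def nll(dist, f_0, mean, sigma, sigma2):
    # Negative log-likelihood of the whole dataset
    return -torch.sum(torch.log(add(dist, f_0, mean, sigma, sigma2)))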
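Written out, the three functions above compute the normal density, a two-component mixture with one shared mean, and the negative log-likelihood summed over the data points $x_i$. `dist` was generated as equal halves with mean 1 and widths 2.0 and 0.5. The NLL should therefore be smallest near $f_0 = 0.5$, $\mu = 1$, $\sigma = 2$, $\sigma_2 = 0.5$, or at the equivalent swap $\sigma = 0.5$, $\sigma_2 = 2$:

$$
\mathcal{N}(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

$$
p(x) = f_0\,\mathcal{N}(x \mid \mu, \sigma) + (1 - f_0)\,\mathcal{N}(x \mid \mu, \sigma_2)
$$

$$
\mathrm{NLL}(f_0, \mu, \sigma, \sigma_2) = -\sum_{i} \log p(x_i)
$$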

Now, let's check the performance:

In [None]:
%%timeit
vals = [torch.tensor(v) for v in np.random.rand(4)]
nll(d_dist, *vals)

## Torch: GPU

Moving this to the GPU is simple: we get a CUDA device and then use `.to` to send data to it. *Note that we do not have to send functions to the device, only data. If you are doing ML, models usually have to be sent to the device as well, because they contain weights, and weights are data.*
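The next cells hard-code `cuda:0`. A slightly more portable variant falls back to the CPU when no CUDA GPU is present. The sketch below shows both ways of placing data on a device. The arrays `x` and `mu` are illustrative values, not from the notebook:

```python
import torch

# Fall back to the CPU so the same code also runs on machines without a CUDA GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.linspace(-3.0, 3.0, 7, dtype=torch.float64)  # created on the CPU
x_dev = x.to(device)  # copy the data to the chosen device
mu = torch.tensor(0.5, dtype=torch.float64, device=device)  # create directly on the device

# Operations run where their inputs live, so keep all operands on the same device
shifted = x_dev - mu
result = shifted.cpu()  # bring the result back, e.g. before calling .numpy()
print(device, result)
```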

In [None]:
device = torch.device("cuda:0")

In [None]:
dev_dist = d_dist.to(device)

> Warning: in the current environment, this is a little broken; I believe the PyTorch and conda-forge packages are conflicting.

In [None]:
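%%timeit
# Create the parameters directly on the GPU instead of creating them on the CPU and copying
vals = [torch.tensor(v, device=device) for v in np.random.rand(4)]
nll(dev_dist, *vals)
# CUDA kernels run asynchronously; wait for them so the timing includes the actual work
torch.cuda.synchronize()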
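The `torch.cuda.synchronize()` call matters because CUDA operations are typically queued and return before the GPU has finished. Without it, a timer mostly measures kernel launches. Outside of IPython's `%%timeit`, the same idea can be wrapped in a small helper. `time_call` is a hypothetical name, and this is a rough sketch rather than a replacement for a proper benchmarking tool:

```python
import time
import torch

def time_call(fn, *args, repeat=10):
    """Average wall-clock seconds per call of fn(*args), synchronizing if CUDA is involved."""
    on_cuda = any(isinstance(a, torch.Tensor) and a.is_cuda for a in args)
    fn(*args)  # warm-up call, so one-time startup costs are not timed
    if on_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)  # on CUDA this typically just queues kernels and returns early
    if on_cuda:
        torch.cuda.synchronize()  # wait until all queued kernels have finished
    return (time.perf_counter() - start) / repeat

x = torch.ones(100_000, dtype=torch.float64)
print(f"{time_call(torch.sum, x) * 1e6:.1f} µs per call")
```

In this notebook, it could be called as `time_call(nll, dev_dist, *vals)`. Note that the helper only checks whether any input is a CUDA tensor, which is enough for `nll`.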

## Exercise

Try enabling the `torch.jit.script` decorator. What happens to the performance? How does it compare with the other methods now?

## PyTorch gradients

Torch's strong point (shared with TensorFlow) is its gradient functionality. If you create a tensor with `requires_grad=True`, Torch keeps a record of what happens to it during a calculation, called a tape. Calling `result.backward()` replays the tape in reverse and stores the gradient of `result` with respect to each such tensor in that tensor's `.grad` attribute. If `result` is not a scalar, you also pass a tensor of gradient weights, as in `result.backward(values)`. This is very powerful in fitting problems, such as those encountered in ML.
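Applied to the NLL from this notebook, one `backward()` call gives the gradient with respect to all four parameters at once. In this minimal sketch, the dataset is a smaller stand-in for `dist` drawn with `torch.randn`, and the starting values are arbitrary guesses. The central finite-difference cross-check is an addition for illustration and is not part of the original notebook:

```python
import math
import torch

torch.manual_seed(42)
# Smaller stand-in for `dist`: two Gaussians sharing mean 1, widths 2.0 and 0.5
data = torch.cat([
    1.0 + 2.0 * torch.randn(5_000, dtype=torch.float64),
    1.0 + 0.5 * torch.randn(5_000, dtype=torch.float64),
])

def gaussian(x, mu, sigma):
    return torch.exp(-((x - mu) ** 2) / (2 * sigma**2)) / torch.sqrt(2 * math.pi * sigma**2)

def nll(x, f_0, mean, sigma, sigma2):
    pdf = f_0 * gaussian(x, mean, sigma) + (1 - f_0) * gaussian(x, mean, sigma2)
    return -torch.sum(torch.log(pdf))

start = [0.3, 0.8, 1.5, 0.7]  # arbitrary guesses for f_0, mean, sigma, sigma2
params = [torch.tensor(v, dtype=torch.float64, requires_grad=True) for v in start]

loss = nll(data, *params)  # forward pass: operations on params are recorded
loss.backward()            # scalar loss, so no argument is needed
grads = [p.grad.item() for p in params]

def shifted(i, delta):
    # Parameter tensors (without gradient tracking) with parameter i moved by delta
    vals = list(start)
    vals[i] += delta
    return [torch.tensor(v, dtype=torch.float64) for v in vals]

h = 1e-6
with torch.no_grad():
    fd_grads = [
        ((nll(data, *shifted(i, h)) - nll(data, *shifted(i, -h))) / (2 * h)).item()
        for i in range(len(start))
    ]

print(grads)
print(fd_grads)
```

The two lists should agree closely. The gradient from `backward()` costs one reverse pass, while finite differences need two extra NLL evaluations per parameter.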