In [1]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

### Tutorial 4: Predictions

When one has a machine learning model, a natural next step is to make predictions with that model. Accordingly, `tangermeme` implements general-purpose functions for making predictions from PyTorch models in a memory-efficient manner, regardless of the number of inputs or outputs from the model. These functions can be used by themselves but their primary purpose is as building blocks for more complicated analysis functions. 

Although making predictions using a model is conceptually simple, there are several technical issues with doing so efficiently in practice. First, when your data is too big to fit in GPU memory, you cannot move it all over at once and must instead make predictions in batches: a small number of examples are moved from the CPU to the GPU, predictions are made, and the results are moved back to the CPU. The batch size can be tuned to the largest number of examples that fit in GPU memory. Second, when a model has multiple outputs, batched prediction yields a set of tensors for each batch. These tensors must be concatenated correctly across batches so that the final output has the same shape as if all examples had been run through the model at once. Third, some models have multiple inputs, so the prediction function must accept an optional set of additional arguments.
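The three issues above can be sketched in a few lines of plain PyTorch. This is only an illustration of the general pattern, not `tangermeme`'s actual implementation; the function name `batched_predict` and its parameters are hypothetical.

```python
import torch

def batched_predict(model, X, args=None, batch_size=32, device='cpu'):
    """Hypothetical helper: predict in batches and concatenate on the CPU."""
    model = model.to(device).eval()
    args = args or ()
    outputs = []

    with torch.no_grad():
        for start in range(0, len(X), batch_size):
            end = start + batch_size
            # Move only the current slice of each input to the device.
            X_batch = X[start:end].to(device)
            arg_batch = tuple(a[start:end].to(device) for a in args)

            y = model(X_batch, *arg_batch)
            # Wrap single outputs in a tuple so both cases share one code path.
            y = y if isinstance(y, (tuple, list)) else (y,)
            outputs.append(tuple(y_.cpu() for y_ in y))

    # Concatenate each output separately across batches.
    merged = [torch.cat(ys) for ys in zip(*outputs)]
    return merged[0] if len(merged) == 1 else merged

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)
X = torch.randn(7, 4)

y_batched = batched_predict(model, X, batch_size=2)
print(y_batched.shape)  # torch.Size([7, 3])
```

Only one batch lives on the device at a time, so peak device memory depends on `batch_size` rather than on the total number of examples.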

#### Predict

The simplest function that implements these ideas is `predict`, which takes in a model, data, and optional additional arguments, and makes batched predictions on the data given the model. The function can be run on GPUs, CPUs, or any other devices that work with PyTorch.

To demonstrate it, let's use a model that flattens its inputs before feeding them into a dense layer to make three predictions per example. The forward function takes two optional arguments: `alpha`, which gets added to the predictions, and `beta`, which multiplies the predictions (but not `alpha`). By default, these are set such that the predictions are returned without modification.

In [2]:
import torch

class FlattenDense(torch.nn.Module):
	def __init__(self, length=10):
		super(FlattenDense, self).__init__()
		self.dense = torch.nn.Linear(length*4, 3)

	def forward(self, X, alpha=0, beta=1):
		X = X.reshape(X.shape[0], -1)
		return self.dense(X) * beta + alpha

Now let's generate some random sequences and see what the model output looks like for them. We set `torch.manual_seed` so that the random initialization of the `torch.nn.Linear` layer is reproducible. As a side note, even though we are not training the model here, the usage is the same whether the model is randomly initialized or trained.

In [3]:
from tangermeme.utils import random_one_hot
torch.manual_seed(0)

X = random_one_hot((5, 4, 10), random_state=0).float()
model = FlattenDense()

y = model(X)
y

tensor([[-0.3154, -0.1625, -0.3183],
        [-0.0866,  0.5461, -0.0244],
        [ 0.3089, -0.2828, -0.1485],
        [ 0.1671, -0.1341, -0.3094],
        [-0.0627,  0.0088,  0.3471]], grad_fn=<AddBackward0>)

Calling the model directly is easy enough for a small model on a small amount of data. Let's try the built-in `predict` function with different batch sizes to demonstrate how one would make batched predictions.

In [4]:
from tangermeme.predict import predict

y0 = predict(model, X, batch_size=2)
y0

tensor([[-0.3154, -0.1625, -0.3183],
        [-0.0866,  0.5461, -0.0244],
        [ 0.3089, -0.2828, -0.1485],
        [ 0.1671, -0.1341, -0.3094],
        [-0.0627,  0.0088,  0.3471]])

In [5]:
y0 = predict(model, X, batch_size=100)
y0

tensor([[-0.3154, -0.1625, -0.3183],
        [-0.0866,  0.5461, -0.0244],
        [ 0.3089, -0.2828, -0.1485],
        [ 0.1671, -0.1341, -0.3094],
        [-0.0627,  0.0088,  0.3471]])

Note that the tensor no longer has `grad_fn=<AddBackward0>`, meaning that gradients were not calculated or stored. Specifically, the prediction loop is wrapped in `torch.no_grad`. By default, this function will move each batch to the GPU, but it doesn't have to: you can pass `device='cpu'` to have the predictions be made on the CPU.

In [6]:
y0 = predict(model, X, device='cpu')
y0

tensor([[-0.3154, -0.1625, -0.3183],
        [-0.0866,  0.5461, -0.0244],
        [ 0.3089, -0.2828, -0.1485],
        [ 0.1671, -0.1341, -0.3094],
        [-0.0627,  0.0088,  0.3471]])
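The effect of disabling gradient tracking can be checked with plain PyTorch, independently of `tangermeme`. A minimal sketch, with illustrative names (`layer`, `y_tracked`, `y_untracked`) that do not appear elsewhere in this tutorial:

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 3)
X_demo = torch.randn(5, 4)

# A normal forward pass records the operations needed for backpropagation.
y_tracked = layer(X_demo)

# Inside torch.no_grad, no graph is recorded, which saves memory.
with torch.no_grad():
    y_untracked = layer(X_demo)

print(y_tracked.grad_fn is not None)  # True
print(y_untracked.grad_fn is None)    # True
print(y_untracked.requires_grad)      # False
```

The values of the two tensors are identical; only the autograd bookkeeping differs.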

Next, let's consider the setting where you want to pass additional arguments into the forward function because the model is multi-input. Remember that our model can optionally take in `alpha` and `beta` parameters. All we have to do is pass in a tuple of `args` to the `predict` function where each element in `args` is a tensor containing values for one of the inputs to the model.

Let's start off by looking at just passing in `alpha` to the model.

In [7]:
torch.manual_seed(0)
alpha = torch.randn(5, 1)

y + alpha

tensor([[ 1.2256,  1.3785,  1.2227],
        [-0.3800,  0.2527, -0.3178],
        [-1.8699, -2.4616, -2.3273],
        [ 0.7355,  0.4344,  0.2591],
        [-1.1472, -1.0757, -0.7374]], grad_fn=<AddBackward0>)

In [8]:
predict(model, X, args=(alpha,))

tensor([[ 1.2256,  1.3785,  1.2227],
        [-0.3800,  0.2527, -0.3178],
        [-1.8699, -2.4616, -2.3273],
        [ 0.7355,  0.4344,  0.2591],
        [-1.1472, -1.0757, -0.7374]])

Now, let's try passing in both `alpha` and `beta`.

In [9]:
torch.manual_seed(1)
beta = torch.randn(5, 1)

y * beta + alpha

tensor([[ 1.3324,  1.4336,  1.3305],
        [-0.3165, -0.1477, -0.2999],
        [-2.1597, -2.1962, -2.1879],
        [ 0.6723,  0.4851,  0.3762],
        [-1.0562, -1.0885, -1.2414]], grad_fn=<AddBackward0>)

In [10]:
predict(model, X, args=(alpha, beta))

tensor([[ 1.3324,  1.4336,  1.3305],
        [-0.3165, -0.1477, -0.2999],
        [-2.1597, -2.1962, -2.1879],
        [ 0.6723,  0.4851,  0.3762],
        [-1.0562, -1.0885, -1.2414]])

This implementation is flexible. It makes no assumptions about the shape of the underlying data, except that every tensor in `args` has the same size as `X` along the first (batch) dimension, so we can pass in bigger tensors without having to modify the code.

In [11]:
torch.manual_seed(0)
alpha = torch.randn(5, 3)

y + alpha

tensor([[ 1.2256, -0.4559, -2.4970],
        [ 0.4818, -0.5384, -1.4230],
        [ 0.7123,  0.5552, -0.8678],
        [-0.2362, -0.7307, -0.1273],
        [-0.9193,  1.1094, -0.7241]], grad_fn=<AddBackward0>)

In [12]:
predict(model, X, args=(alpha,))

tensor([[ 1.2256, -0.4559, -2.4970],
        [ 0.4818, -0.5384, -1.4230],
        [ 0.7123,  0.5552, -0.8678],
        [-0.2362, -0.7307, -0.1273],
        [-0.9193,  1.1094, -0.7241]])

This means that if you have a model with one input that is a biological sequence and another input that is something more complicated, such as an image, you can easily pass both into the model. They just need to be passed in the same order as they are defined in the forward function.

#### Predict Cross

When handling models that have multiple inputs ($X$ and $Y$), sometimes you'd like to make predictions for each element of $X$ given each element of $Y$, i.e., to fill in a `len(X)` by `len(Y)` matrix of predictions. This can be achieved using just the `predict` function, but doing so requires an unnecessarily large amount of memory because every combination of elements in $X$ and elements in $Y$ has to be constructed up front.
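To make the memory cost concrete, here is a sketch of the naive approach using plain PyTorch. The function `toy_forward` is a hypothetical stand-in for a two-input model; the point is the size of the repeated tensors, which grows with `len(X) * len(Y)`.

```python
import torch

torch.manual_seed(0)
n_x, n_alpha = 5, 3
X_demo = torch.randn(n_x, 4, 10)
alpha_demo = torch.randn(n_alpha, 1)

# Materialize every (X[i], alpha[j]) pair explicitly.
X_rep = X_demo.repeat_interleave(n_alpha, dim=0)  # shape (15, 4, 10)
alpha_rep = alpha_demo.repeat(n_x, 1)             # shape (15, 1)

def toy_forward(X, alpha):
    # Hypothetical two-input "model": sum each example, then add alpha.
    return X.sum(dim=(1, 2)).unsqueeze(1) + alpha

# Predict on the flat list of pairs, then fold back into a grid.
y_flat = toy_forward(X_rep, alpha_rep)        # shape (15, 1)
y_grid = y_flat.reshape(n_x, n_alpha, -1)     # shape (5, 3, 1)
```

With 5 sequences and 3 values this is harmless, but the repeated copy of `X_demo` is `n_alpha` times larger than the original, which quickly becomes prohibitive for long sequences.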

The `predict_cross` function in `tangermeme` addresses this. Rather than requiring the tensors in `args` to be aligned with $X$, in that the batch dimension is the same, predictions are made for each element in $X$ given each element in `args`. This is implemented efficiently: batches of the cross are generated on the fly and discarded after use, so the entire cross never needs to be held in memory. An important note is that the tensors in `args` must all be aligned with each other, in that they must all have the same length.
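One way to generate batches of the cross without storing all of it is to index into the implicit `len(X)` by `len(args[0])` grid. The sketch below illustrates that idea only; it is not `tangermeme`'s implementation, and the names `cross_predict` and `AddScale` are hypothetical.

```python
import torch

def cross_predict(model, X, args, batch_size=32):
    """Hypothetical sketch: pair every X[i] with every j-th entry of args."""
    n_x, n_args = len(X), len(args[0])
    assert all(len(a) == n_args for a in args), "args must share a length"

    outputs = []
    with torch.no_grad():
        # Walk over flat indices into the (n_x, n_args) grid, batch by batch.
        for start in range(0, n_x * n_args, batch_size):
            idx = torch.arange(start, min(start + batch_size, n_x * n_args))
            x_idx = idx // n_args  # grid row: which example in X
            a_idx = idx % n_args   # grid column: which entry in each arg
            outputs.append(model(X[x_idx], *(a[a_idx] for a in args)))

    y = torch.cat(outputs)
    return y.reshape(n_x, n_args, *y.shape[1:])

class AddScale(torch.nn.Module):
    """Toy model mirroring the alpha/beta arguments used in this tutorial."""
    def __init__(self):
        super().__init__()
        self.dense = torch.nn.Linear(40, 3)

    def forward(self, X, alpha=0, beta=1):
        return self.dense(X.reshape(X.shape[0], -1)) * beta + alpha

torch.manual_seed(0)
toy_model = AddScale()
X_toy = torch.randn(5, 4, 10)
alpha_toy = torch.randn(3, 1)

y_toy = cross_predict(toy_model, X_toy, args=(alpha_toy,), batch_size=4)
print(y_toy.shape)  # torch.Size([5, 3, 3])
```

Only `batch_size` pairs exist at any moment, so memory use is independent of the size of the full cross.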

Let's see a basic example. Imagine that we want to make predictions for each element in $X$ paired with each value of `alpha`.

In [13]:
from tangermeme.predict import predict_cross

torch.manual_seed(0)
alpha = torch.randn(3, 1)

y_cross = predict_cross(model, X, args=(alpha,))
y_cross.shape

torch.Size([5, 3, 3])

Here, the shape is `(5, 3, 3)` because there are 5 sequences in $X$ and 3 values in `alpha`, and the model makes 3 predictions per example (the last dimension).

In [14]:
y_cross

tensor([[[ 1.2256,  1.3785,  1.2227],
         [-0.6089, -0.4559, -0.6117],
         [-2.4942, -2.3412, -2.4970]],

        [[ 1.4544,  2.0871,  1.5166],
         [-0.3800,  0.2527, -0.3178],
         [-2.2654, -1.6327, -2.2032]],

        [[ 1.8499,  1.2582,  1.3925],
         [ 0.0155, -0.5762, -0.4419],
         [-1.8699, -2.4616, -2.3273]],

        [[ 1.7081,  1.4069,  1.2316],
         [-0.1263, -0.4275, -0.6028],
         [-2.0117, -2.3129, -2.4882]],

        [[ 1.4783,  1.5498,  1.8881],
         [-0.3561, -0.2846,  0.0537],
         [-2.2414, -2.1700, -1.8317]]])

In [15]:
y.unsqueeze(1) + alpha.unsqueeze(0)

tensor([[[ 1.2256,  1.3785,  1.2227],
         [-0.6089, -0.4559, -0.6117],
         [-2.4942, -2.3412, -2.4970]],

        [[ 1.4544,  2.0871,  1.5166],
         [-0.3800,  0.2527, -0.3178],
         [-2.2654, -1.6327, -2.2032]],

        [[ 1.8499,  1.2582,  1.3925],
         [ 0.0155, -0.5762, -0.4419],
         [-1.8699, -2.4616, -2.3273]],

        [[ 1.7081,  1.4069,  1.2316],
         [-0.1263, -0.4275, -0.6028],
         [-2.0117, -2.3129, -2.4882]],

        [[ 1.4783,  1.5498,  1.8881],
         [-0.3561, -0.2846,  0.0537],
         [-2.2414, -2.1700, -1.8317]]], grad_fn=<AddBackward0>)

When multiple additional arguments are passed in, they have to be aligned with each other. Put another way, all of the tensors passed into `args` must have the same batch dimension, but that dimension does not have to match $X$'s batch size.

In [16]:
torch.manual_seed(0)
alpha = torch.randn(3, 1)
beta = torch.randn(3, 1)

y_cross = predict_cross(model, X, args=(alpha, beta))
y_cross

tensor([[[ 1.3617,  1.4486,  1.3601],
         [ 0.0487, -0.1172,  0.0517],
         [-1.7376, -1.9516, -1.7337]],

        [[ 1.4918,  1.8514,  1.5271],
         [-0.1995, -0.8857, -0.2670],
         [-2.0577, -2.9426, -2.1447]],

        [[ 1.7166,  1.3803,  1.4566],
         [-0.6284,  0.0133, -0.1324],
         [-2.6108, -1.7833, -1.9711]],

        [[ 1.6360,  1.4648,  1.3651],
         [-0.4747, -0.1480,  0.0421],
         [-2.4125, -1.9913, -1.7461]],

        [[ 1.5054,  1.5460,  1.7383],
         [-0.2255, -0.3030, -0.6699],
         [-2.0912, -2.1911, -2.6643]]])

In [17]:
y.unsqueeze(1) * beta.unsqueeze(0) + alpha.unsqueeze(0)

tensor([[[ 1.3617,  1.4486,  1.3601],
         [ 0.0487, -0.1172,  0.0517],
         [-1.7376, -1.9516, -1.7337]],

        [[ 1.4918,  1.8514,  1.5271],
         [-0.1995, -0.8857, -0.2670],
         [-2.0577, -2.9426, -2.1447]],

        [[ 1.7166,  1.3803,  1.4566],
         [-0.6284,  0.0133, -0.1324],
         [-2.6108, -1.7833, -1.9711]],

        [[ 1.6360,  1.4648,  1.3651],
         [-0.4747, -0.1480,  0.0421],
         [-2.4125, -1.9913, -1.7461]],

        [[ 1.5054,  1.5460,  1.7383],
         [-0.2255, -0.3030, -0.6699],
         [-2.0912, -2.1911, -2.6643]]], grad_fn=<AddBackward0>)