In [1]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

### Tutorial 11: Cartesian Product

Sometimes, when using multi-input models, one wants to run a function on the Cartesian product of the inputs. Put another way, given a sequence input $X$ with $n$ elements and some other input $Y$ with $m$ elements, one wants to make predictions for every pair $X_0 Y_0, X_0 Y_1, \ldots, X_{n-1} Y_{m-1}$. Of course, one could fix a value of $Y$, run the function across all of $X$ (or vice versa), and then repeat for each remaining value of $Y$. But that's not convenient, and it can be inefficient when the axis being batched over is small, e.g., if your batching setup runs all sequences through the model for each example in $Y$ but you are only considering a very small number of sequences.

Instead, `tangermeme` provides a general-purpose `apply_product` function that takes in a function, a model, and data, and handles running the function on the Cartesian product of the inputs in a batch- and memory-efficient manner. Conceptually, the simplest way to implement this is to unravel the entire product into CPU memory and then run the provided function on the whole thing. However, this can take a huge amount of memory, particularly if the product is over several inputs. In practice, it's better to construct each batch iteratively and run only one batch at a time through the model. That way, only the model predictions are stored in CPU memory, as opposed to the (usually much larger) inputs.
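
The batch-at-a-time idea can be sketched in plain Python. This is only an illustration of the strategy, not `tangermeme`'s implementation: the helper name `iter_product_batches` and the sizes used are assumptions made for the example.

```python
from itertools import islice, product

def iter_product_batches(lengths, batch_size):
    """Lazily yield batches of index tuples covering the Cartesian product.

    `lengths` holds the size of each input, e.g. (len(X), len(alpha), len(beta)).
    Only one batch of indices exists in memory at a time.
    """
    index_tuples = product(*(range(n) for n in lengths))
    while True:
        batch = list(islice(index_tuples, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical sizes: 5 sequences, 10 alpha values, 54 beta values.
n_batches = 0
n_examples = 0
last_batch_size = 0
for batch in iter_product_batches((5, 10, 54), batch_size=32):
    # A real implementation would gather the inputs for these indices,
    # run the model on them, and keep only the predictions.
    n_batches += 1
    n_examples += len(batch)
    last_batch_size = len(batch)

print(n_batches, n_examples, last_batch_size)  # 85 2700 12
```

Because the index tuples are generated on demand, memory use scales with the batch size rather than with the size of the full product.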

#### Apply Product

More concretely, to use the `apply_product` function, one provides a function (e.g., `predict` or `marginalize`), a model, the sequence input `X`, and any number of additional arguments in `args`. The first dimensions of these arguments do not have to match. Unlike core functions such as `predict`, the function is not simply applied to `X` and the arguments paired up example by example. Rather, the point of `apply_product` is to apply the function to the product of the arguments and return a tensor whose first dimensions are `len(X), len(args[0]), len(args[1]), ...`, followed by the dimensions of the model's predictions.
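
As a rough sketch of that shape rule, the helpers below compute the expected result shape and map a position in the flattened product back to one index per input. Both function names and the example sizes are hypothetical and are not part of `tangermeme`.

```python
def product_output_shape(n_sequences, arg_lengths, prediction_shape):
    """Expected result shape: one axis per input, then the prediction axes."""
    return (n_sequences, *arg_lengths, *prediction_shape)

def unravel(flat_index, lengths):
    """Map a position in the flattened product back to one index per input."""
    indices = []
    for n in reversed(lengths):
        flat_index, remainder = divmod(flat_index, n)
        indices.append(remainder)
    return tuple(reversed(indices))

# Hypothetical sizes: 5 sequences, two extra arguments of length 10 and 54,
# and a model that emits 3 values per example.
shape = product_output_shape(5, (10, 54), (3,))
print(shape)                     # (5, 10, 54, 3)
print(unravel(55, (5, 10, 54)))  # (0, 1, 1)
```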

Let's see this in action with a toy model that takes an input, flattens it, applies a dense layer, and then optionally scales and shifts the output.

In [2]:
import torch

class FlattenDense(torch.nn.Module):
	def __init__(self, length=10):
		super(FlattenDense, self).__init__()
		self.dense = torch.nn.Linear(length*4, 3)

	def forward(self, X, alpha=0, beta=1):
		X = X.reshape(X.shape[0], -1)
		return self.dense(X) * beta + alpha

This model has two optional inputs: `alpha`, which is an additive constant on the output from the dense layer, and `beta`, which is a multiplicative factor. Yes, it's redundant to have these factors after a dense layer which is doing a pretty similar thing, but this is meant just to demonstrate how to use the functions and to confirm that it's doing the expected thing.

##### Predict

Let's start off by generating some random one-hot encodings and running the model on them. 

In [3]:
from tangermeme.utils import random_one_hot
torch.manual_seed(0)

X = random_one_hot((5, 4, 10), random_state=0).float()
model = FlattenDense()

y = model(X)
y

tensor([[-0.3154, -0.1625, -0.3183],
        [-0.0866,  0.5461, -0.0244],
        [ 0.3089, -0.2828, -0.1485],
        [ 0.1671, -0.1341, -0.3094],
        [-0.0627,  0.0088,  0.3471]], grad_fn=<AddBackward0>)

We can confirm that the `apply_product` is working correctly by passing in `alpha` and `beta` values that are equal to their defaults.

In [4]:
from tangermeme.predict import predict
from tangermeme.product import apply_product

torch.manual_seed(0)
alpha = torch.zeros(1, 1)
beta = torch.ones(1, 1)

y_product = apply_product(predict, model, X, args=(alpha, beta))[:, 0, 0]
y_product

tensor([[-0.3154, -0.1625, -0.3183],
        [-0.0866,  0.5461, -0.0244],
        [ 0.3089, -0.2828, -0.1485],
        [ 0.1671, -0.1341, -0.3094],
        [-0.0627,  0.0088,  0.3471]])

Looks like the values are identical. We have to take the first element of the second and third dimensions because `apply_product` adds one dimension for each entry in `args`, and here each of those dimensions has size 1.

As mentioned repeatedly, `tangermeme` tries to be as assumption-free as possible. This means that `alpha` and `beta` can be any shape that works with the math in the model's implementation. Because three outputs are generated for each example, `alpha` and `beta` can also have a second dimension of size three, with one value per output.

In [5]:
alpha = torch.zeros(1, 3)
beta = torch.ones(1, 3)

y_product = apply_product(predict, model, X, args=(alpha, beta))
y_product.shape

torch.Size([5, 1, 1, 3])

If we wanted to scan over a range of `alpha` and `beta` values, we just need to pass in bigger tensors.

In [6]:
alpha = torch.zeros(10, 3)
beta = torch.ones(54, 3)

y_product = apply_product(predict, model, X, args=(alpha, beta))
y_product.shape

torch.Size([5, 10, 54, 3])

Now, we can check that the values are correct even when adding the additional arguments. Let's start off by considering only the `alpha` parameter.

In [7]:
torch.manual_seed(0)
alpha = torch.randn(3, 1)

y_product = apply_product(predict, model, X, args=(alpha,))
y_product

tensor([[[ 1.2256,  1.3785,  1.2227],
         [-0.6089, -0.4559, -0.6117],
         [-2.4942, -2.3412, -2.4970]],

        [[ 1.4544,  2.0871,  1.5166],
         [-0.3800,  0.2527, -0.3178],
         [-2.2654, -1.6327, -2.2032]],

        [[ 1.8499,  1.2582,  1.3925],
         [ 0.0155, -0.5762, -0.4419],
         [-1.8699, -2.4616, -2.3273]],

        [[ 1.7081,  1.4069,  1.2316],
         [-0.1263, -0.4275, -0.6028],
         [-2.0117, -2.3129, -2.4882]],

        [[ 1.4783,  1.5498,  1.8881],
         [-0.3561, -0.2846,  0.0537],
         [-2.2414, -2.1700, -1.8317]]])

Since all we are doing is adding a value in a broadcasted manner, we can easily check by adding in the appropriate dimensions and doing the addition outside the context of this function.

In [8]:
y.unsqueeze(1) + alpha.unsqueeze(0)

tensor([[[ 1.2256,  1.3785,  1.2227],
         [-0.6089, -0.4559, -0.6117],
         [-2.4942, -2.3412, -2.4970]],

        [[ 1.4544,  2.0871,  1.5166],
         [-0.3800,  0.2527, -0.3178],
         [-2.2654, -1.6327, -2.2032]],

        [[ 1.8499,  1.2582,  1.3925],
         [ 0.0155, -0.5762, -0.4419],
         [-1.8699, -2.4616, -2.3273]],

        [[ 1.7081,  1.4069,  1.2316],
         [-0.1263, -0.4275, -0.6028],
         [-2.0117, -2.3129, -2.4882]],

        [[ 1.4783,  1.5498,  1.8881],
         [-0.3561, -0.2846,  0.0537],
         [-2.2414, -2.1700, -1.8317]]], grad_fn=<AddBackward0>)

Same values. If we add in a `beta` value, we see the same thing.

In [9]:
torch.manual_seed(0)
alpha = torch.randn(3, 1)
beta = torch.randn(1, 1)

y_product = apply_product(predict, model, X, args=(alpha, beta))[:, :, 0]
y_product

tensor([[[ 1.3617,  1.4486,  1.3601],
         [-0.4727, -0.3858, -0.4743],
         [-2.3581, -2.2711, -2.3597]],

        [[ 1.4918,  1.8514,  1.5271],
         [-0.3427,  0.0170, -0.3073],
         [-2.2280, -1.8684, -2.1926]],

        [[ 1.7166,  1.3803,  1.4566],
         [-0.1178, -0.4542, -0.3779],
         [-2.0032, -2.3395, -2.2632]],

        [[ 1.6360,  1.4648,  1.3651],
         [-0.1984, -0.3696, -0.4693],
         [-2.0838, -2.2550, -2.3546]],

        [[ 1.5054,  1.5460,  1.7383],
         [-0.3290, -0.2884, -0.0961],
         [-2.2144, -2.1738, -1.9815]]])

In [10]:
y.unsqueeze(1) * beta.unsqueeze(0) + alpha.unsqueeze(0)

tensor([[[ 1.3617,  1.4486,  1.3601],
         [-0.4727, -0.3858, -0.4743],
         [-2.3581, -2.2711, -2.3597]],

        [[ 1.4918,  1.8514,  1.5271],
         [-0.3427,  0.0170, -0.3073],
         [-2.2280, -1.8684, -2.1926]],

        [[ 1.7166,  1.3803,  1.4566],
         [-0.1178, -0.4542, -0.3779],
         [-2.0032, -2.3395, -2.2632]],

        [[ 1.6360,  1.4648,  1.3651],
         [-0.1984, -0.3696, -0.4693],
         [-2.0838, -2.2550, -2.3546]],

        [[ 1.5054,  1.5460,  1.7383],
         [-0.3290, -0.2884, -0.0961],
         [-2.2144, -2.1738, -1.9815]]], grad_fn=<AddBackward0>)
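
The broadcasting checks above can also be written as explicit loops, which makes the index order of the result easier to see. The sketch below uses plain Python lists; `affine_product` is a hypothetical helper, and the `alphas` and `betas` values are made up for illustration.

```python
def affine_product(y, alphas, betas):
    """Return out[i][j][k][c] = y[i][c] * betas[k] + alphas[j]."""
    return [[[[v * b + a for v in row] for b in betas] for a in alphas] for row in y]

# First two rows of the model output shown earlier.
y = [[-0.3154, -0.1625, -0.3183],
     [-0.0866, 0.5461, -0.0244]]
alphas = [0.0, 1.0]       # hypothetical additive constants
betas = [1.0, 2.0, 3.0]   # hypothetical multiplicative factors

out = affine_product(y, alphas, betas)

# Axes follow the order (example, alpha, beta, output).
print(len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))  # 2 2 3 3
```

With `alpha = 0` and `beta = 1`, the entry `out[i][0][0]` reproduces row `i` of `y`, which mirrors the default-value check done with `apply_product` earlier.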

##### Marginalize

Naturally, `apply_product` is not restricted to working with the `predict` function. We can also use the `marginalize` function if we want to run marginalization experiments across a range of additional arguments. If we want to pass additional arguments into the function (as opposed to the model), we can list them after `X`. Here, we write the motif we want to marginalize over as a string, just as we would when calling `marginalize` directly. Importantly, arguments into the *model* have to be specified using the `args` keyword.

In [11]:
from tangermeme.marginalize import marginalize

y_before, y_after = apply_product(marginalize, model, X, "TGA", args=(alpha, beta))
y_before.shape, y_after.shape

(torch.Size([5, 3, 1, 3]), torch.Size([5, 3, 1, 3]))
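
To make the split between function arguments and model arguments concrete, here is a toy stand-in written in plain Python. It only mimics the calling convention and is not `tangermeme`'s implementation. Every name in it (`apply_product_sketch`, `toy_model`, `toy_marginalize`) is hypothetical, and the "motif insertion" is a placeholder.

```python
from itertools import product

def apply_product_sketch(func, model, X, *func_args, args=()):
    """Toy illustration of the calling convention only.

    Positional extras (`func_args`) are handed to `func` unchanged on every
    call; each entry of `args` is iterated over and reaches the model.
    """
    results = {}
    ranges = [range(len(X))] + [range(len(a)) for a in args]
    for idx in product(*ranges):
        model_args = tuple(a[j] for a, j in zip(args, idx[1:]))
        results[idx] = func(model, X[idx[0]], *func_args, args=model_args)
    return results

def toy_model(x, alpha=0.0, beta=1.0):
    # Hypothetical model: scaled and shifted sum of the input values.
    return sum(x) * beta + alpha

def toy_marginalize(model, x, motif, args=()):
    # Placeholder for motif insertion: append the motif length to the input.
    before = model(x, *args)
    after = model(x + [float(len(motif))], *args)
    return before, after

X = [[1.0, 2.0], [3.0, 4.0]]
alphas = [0.0, 10.0]
betas = [2.0]

out = apply_product_sketch(toy_marginalize, toy_model, X, "TGA", args=(alphas, betas))
diffs = {idx: after - before for idx, (before, after) in out.items()}
print(out[(0, 1, 0)])  # (16.0, 22.0)
print(diffs)
```

The motif string is passed once and reused for every combination, while the values in `args` are the ones being swept.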

Since `alpha` is additive, it cancels when we subtract, so the difference between `y_after` and `y_before` should be the same for every `alpha` value.

In [12]:
(y_after - y_before)[:, :, 0]

tensor([[[ 0.1444, -0.0518,  0.0818],
         [ 0.1444, -0.0518,  0.0818],
         [ 0.1444, -0.0518,  0.0818]],

        [[-0.0629,  0.0639,  0.1607],
         [-0.0629,  0.0639,  0.1607],
         [-0.0629,  0.0639,  0.1607]],

        [[-0.0593,  0.2247,  0.0418],
         [-0.0593,  0.2247,  0.0418],
         [-0.0593,  0.2247,  0.0418]],

        [[-0.0770,  0.0949,  0.1096],
         [-0.0770,  0.0949,  0.1096],
         [-0.0770,  0.0949,  0.1096]],

        [[ 0.0569,  0.0804,  0.0643],
         [ 0.0569,  0.0804,  0.0643],
         [ 0.0569,  0.0804,  0.0643]]])

However, if we pass in a bunch of different values for `beta`, we can see that the difference between `y_after` and `y_before` changes with the changing `beta` values. This is what you'd expect because `beta` is multiplicative.

In [13]:
torch.manual_seed(0)
alpha = torch.randn(1, 1)
beta = torch.randn(3, 1)

y_before, y_after = apply_product(marginalize, model, X, "TGA", args=(alpha, beta))
(y_after - y_before)[:, 0]

tensor([[[-0.0745,  0.0268, -0.0422],
         [-0.5534,  0.1987, -0.3134],
         [ 0.1444, -0.0518,  0.0818]],

        [[ 0.0325, -0.0330, -0.0830],
         [ 0.2411, -0.2448, -0.6161],
         [-0.0629,  0.0639,  0.1607]],

        [[ 0.0306, -0.1160, -0.0216],
         [ 0.2273, -0.8612, -0.1602],
         [-0.0593,  0.2247,  0.0418]],

        [[ 0.0397, -0.0490, -0.0566],
         [ 0.2951, -0.3636, -0.4200],
         [-0.0770,  0.0949,  0.1096]],

        [[-0.0294, -0.0415, -0.0332],
         [-0.2180, -0.3082, -0.2463],
         [ 0.0569,  0.0804,  0.0643]]])
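
Both observations follow directly from the form of the toy model. As a short derivation, write $f$ for the dense layer, $X$ for the original sequences, and $X'$ for the same sequences after the motif has been inserted (this assumes `marginalize` returns predictions before and after that insertion, as the names `y_before` and `y_after` suggest):

$$
\begin{aligned}
y_{\text{before}} &= \beta\, f(X) + \alpha \\
y_{\text{after}} &= \beta\, f(X') + \alpha \\
y_{\text{after}} - y_{\text{before}} &= \beta\,\bigl(f(X') - f(X)\bigr)
\end{aligned}
$$

The $\alpha$ terms cancel, so the difference is constant across `alpha` values, while it scales linearly with `beta`. In the output above, the third row of each block uses the same `beta` value as the earlier experiment and therefore reproduces the earlier differences.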

##### Other Functions

Any other function in `tangermeme` that takes in `args` can be wrapped using `apply_product`. 