# Masking Demonstration

This notebook demonstrates how the masking procedure works. A mask can be applied to entire cells (rows), to entire genes (columns), or to individual elements at random. Any combination of these masks can be applied together, and the sections below show how.

**Notations for this differ, but here a value of 0 means the element is masked and a value of 1 means it is NOT masked. The reasoning for this choice will be clear later.**

### Example Data

In [117]:
import torch

In [118]:
test = torch.tensor([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]])
test

tensor([[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]])

### Masking Cells (mask_cells_prop = 0.25)

In [119]:
new_test = test.clone()  # copy, so the in-place multiply below leaves `test` unchanged
new_test

tensor([[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]])

In [120]:
masked = ~(torch.rand((4,1)) < 0.25)
masked.type('torch.DoubleTensor')

tensor([[1.],
        [1.],
        [1.],
        [1.]], dtype=torch.float64)

In [121]:
masked = masked.type_as(new_test)

new_test *= masked
new_test

tensor([[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]])
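
No cell was masked in the run above. That is not unusual: in this code each of the 4 cells is masked independently with probability 0.25, so 0.25 is the expected proportion rather than an exact one. The check below is a small sketch in plain Python; the names `n_cells` and `p_no_cell_masked` are illustrative only.

```python
# Probability that independent draws with mask probability 0.25
# leave all 4 cells unmasked.
n_cells = 4
mask_cells_prop = 0.25

p_no_cell_masked = (1 - mask_cells_prop) ** n_cells
print(p_no_cell_masked)  # 0.31640625
```

So roughly one run in three leaves every cell untouched for a tensor this small.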

### Masking Genes (mask_genes_prop = 0.25)

In [122]:
test = torch.tensor([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]])
new_test = test.clone()  # copy, so the in-place multiply below leaves `test` unchanged
new_test

tensor([[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]])

In [123]:
masked = ~(torch.rand((1,4)) < 0.25)
masked.type('torch.DoubleTensor')

tensor([[1., 0., 1., 1.]], dtype=torch.float64)

In [124]:
masked = masked.type_as(new_test)

new_test *= masked
new_test

tensor([[1, 0, 3, 4],
        [1, 0, 3, 4],
        [1, 0, 3, 4],
        [1, 0, 3, 4]])

### Random Masking (mask_random_prop = 0.25)

In [125]:
test = torch.tensor([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]])
new_test = test.clone()  # copy, so the in-place multiply below leaves `test` unchanged
new_test

tensor([[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]])

In [126]:
masked = ~(torch.rand((4,4)) < 0.25)
masked.type('torch.DoubleTensor')

tensor([[1., 1., 0., 1.],
        [0., 1., 1., 0.],
        [1., 1., 1., 0.],
        [1., 1., 0., 0.]], dtype=torch.float64)

In [127]:
masked = masked.type_as(new_test)

new_test *= masked
new_test

tensor([[1, 2, 0, 4],
        [0, 2, 3, 0],
        [1, 2, 3, 0],
        [1, 2, 0, 0]])
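
The three cases above differ only in the shape of the random tensor: `(4, 1)` for cells, `(1, 4)` for genes, and `(4, 4)` for random masking. As a sketch, they could be wrapped in one helper. The function `make_keep_mask` and its parameter names are hypothetical and not part of the original code.

```python
import torch

def make_keep_mask(shape, mask_prop, generator=None):
    """Return a bool tensor of `shape`: False = masked, True = kept."""
    # torch.rand draws from [0, 1), so each element falls below
    # mask_prop (and is masked) with probability mask_prop.
    return ~(torch.rand(shape, generator=generator) < mask_prop)

n_cells, n_genes = 4, 4
keep_cells = make_keep_mask((n_cells, 1), 0.25)         # one flag per row
keep_genes = make_keep_mask((1, n_genes), 0.25)         # one flag per column
keep_random = make_keep_mask((n_cells, n_genes), 0.25)  # one flag per element
```

Passing a seeded `torch.Generator` as `generator` makes the draws repeatable between runs.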

### Putting it all Together

When we want to apply multiple masks at once, we need to ensure that every one of them is respected at loss calculation time. Because a 0 represents a mask being applied, element-wise multiplication across all the masks ensures that an element masked by any one of them is still masked at loss calculation time.
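
To see why the 0 = masked convention makes this work, here is a minimal sketch with two hypothetical masks (`mask_a` and `mask_b` are made-up values, not taken from the runs in this notebook). The product is 1 only where both masks are 1, which is the same as a logical AND of the "kept" flags.

```python
import torch

# Hypothetical 0/1 masks for four elements (0 = masked, 1 = kept).
mask_a = torch.tensor([1, 1, 0, 0])
mask_b = torch.tensor([1, 0, 1, 0])

combined = mask_a * mask_b
print(combined)  # tensor([1, 0, 0, 0])

# The product matches a logical AND of the boolean versions.
assert torch.equal(combined.bool(), mask_a.bool() & mask_b.bool())
```

With the opposite convention (1 = masked), a product would keep a mask only where every mask agreed, so a logical OR would be needed instead.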

In [128]:
test = torch.tensor([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]])
new_test = test.clone()  # copy, so the in-place multiply below leaves `test` unchanged
new_test

tensor([[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]])

In [129]:
masked_cells = ~(torch.rand((4,1)) < 0.25)
masked_cells.type('torch.DoubleTensor')

tensor([[1.],
        [0.],
        [0.],
        [1.]], dtype=torch.float64)

In [130]:
masked_genes = ~(torch.rand((1,4)) < 0.25)
masked_genes.type('torch.DoubleTensor')

tensor([[1., 0., 1., 1.]], dtype=torch.float64)

In [131]:
masked_at_random = ~(torch.rand((4,4)) < 0.25)
masked_at_random.type('torch.DoubleTensor')

tensor([[1., 1., 0., 1.],
        [1., 1., 1., 1.],
        [0., 0., 1., 0.],
        [0., 1., 1., 1.]], dtype=torch.float64)

In [132]:
new_test *= (masked_at_random * masked_genes * masked_cells)
new_test

tensor([[1, 0, 0, 4],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 3, 4]])

Note that broadcasting takes care of the fact that masked_cells and masked_genes are masks for entire rows and columns respectively.
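
The sketch below makes the broadcasting explicit by hard-coding the mask values printed in the run above (as integer tensors rather than the boolean ones produced by `torch.rand`). `expand` is used only to show what the `(4, 1)` and `(1, 4)` masks look like once stretched to `(4, 4)`; the names `cells_full` and `genes_full` are illustrative.

```python
import torch

test = torch.tensor([[1, 2, 3, 4]] * 4)
masked_cells = torch.tensor([[1], [0], [0], [1]])   # shape (4, 1)
masked_genes = torch.tensor([[1, 0, 1, 1]])         # shape (1, 4)
masked_at_random = torch.tensor([[1, 1, 0, 1],
                                 [1, 1, 1, 1],
                                 [0, 0, 1, 0],
                                 [0, 1, 1, 1]])     # shape (4, 4)

# What broadcasting does implicitly: repeat the row flags across
# columns and the column flags across rows.
cells_full = masked_cells.expand(4, 4)
genes_full = masked_genes.expand(4, 4)

combined = masked_at_random * masked_genes * masked_cells
assert torch.equal(combined, masked_at_random * genes_full * cells_full)

print(test * combined)
# tensor([[1, 0, 0, 4],
#         [0, 0, 0, 0],
#         [0, 0, 0, 0],
#         [0, 0, 3, 4]])
```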

### Calculating Losses with Masks

Suppose the true tensor was the test example from above.

$$\begin{bmatrix} 1 & 2 & 3 & 4\\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4\end{bmatrix}$$

In [133]:
import torch.nn.functional as F

In [134]:
test = torch.tensor([[1.,2.,3.,4.],[1.,2.,3.,4.],[1.,2.,3.,4.],[1.,2.,3.,4.]])

The tensor `new_test` represents our original tensor after all previous masks have been applied.

In [135]:
new_test

tensor([[1, 0, 0, 4],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 3, 4]])

In [136]:
F.l1_loss(test, new_test, reduction="mean")

tensor(1.7500)

^ The loss above also averages over the values that weren't masked, where the error is 0. This shrinks the loss towards 0 (here 28 / 16 = 1.75). We want to compare the masked values only.

In [147]:
masking_tensor = masked_at_random * masked_cells * masked_genes
masking_tensor.type('torch.DoubleTensor')

tensor([[1., 0., 0., 1.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 1., 1.]], dtype=torch.float64)

In [148]:
F.l1_loss(test[~masking_tensor], new_test[~masking_tensor])

tensor(2.3333)

In [149]:
test[~masking_tensor], new_test[~masking_tensor]

(tensor([2., 3., 1., 2., 3., 4., 1., 2., 3., 4., 1., 2.]),
 tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

^ Notice that taking the logical inverse of the masking tensor keeps only the indices that were masked, so the loss is averaged over the 12 masked elements alone (28 / 12 ≈ 2.3333). This achieves the desired loss calculation.
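
The calculation above can be wrapped in a small helper. This is a sketch under stated assumptions: `masked_l1_loss` is a hypothetical name, and `pred` here is just the target with zeros at the masked positions, standing in for a real prediction. The helper converts the mask to boolean first, because `~` on an integer tensor is a bitwise NOT rather than a logical inverse.

```python
import torch
import torch.nn.functional as F

def masked_l1_loss(pred, target, keep_mask):
    """Mean absolute error over masked positions only (keep_mask == 0)."""
    masked_positions = ~keep_mask.bool()  # True where the element was masked
    # Cast both operands to float so they share a dtype.
    return F.l1_loss(pred[masked_positions].float(),
                     target[masked_positions].float())

target = torch.tensor([[1., 2., 3., 4.]] * 4)
keep_mask = torch.tensor([[1, 0, 0, 1],
                          [0, 0, 0, 0],
                          [0, 0, 0, 0],
                          [0, 0, 1, 1]])
pred = target * keep_mask  # stand-in prediction: zeros where masked

loss = masked_l1_loss(pred, target, keep_mask)
print(loss)  # tensor(2.3333)
```

If no element is masked, the selection is empty and the mean is `nan`, so a caller may want to guard against that case.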