There are three main arguments to `torch.nn.functional.embedding_bag`. In the most basic case, they are:

* `weight` is a 2D tensor whose rows are vectors called "word vectors". In an application, each vector is associated with a different word, but the word string itself is not needed for the operation. All word vectors have the same number of elements.
* `input` is a list of indices into the first dimension of `weight`; in other words, a list of indices of word vectors. It can contain duplicates and can have any length.
* `offsets` splits the words listed in `input` into groups called "bags". Each element of `offsets` defines one bag: its value is the index into `input` of the first word in that bag, and the bag runs up to the start of the next bag (or to the end of `input` for the last bag).

For each bag specified by the combination of `offsets` and `input`, the word vectors of each word in the bag are reduced together (by mean, sum, etc.), and the return value gives the reduced vector for each bag.

In [1]:
import torch
import torch.nn.functional as F

# vocab of 5 words plus 1 dummy word, embedding dim = 3
weight = torch.tensor([
    [0.1, 0.2, 0.3],  # word 0
    [0.0, 0.1, 0.0],  # word 1
    [0.4, 0.0, 0.5],  # word 2
    [0.2, 0.3, 0.1],  # word 3
    [0.7, 0.9, 0.8],  # word 4
    [0.0, 0.0, 0.0],  # word 5: dummy word (used as padding later)
])

# 1D input indices for 2 bags: [1,2] and [0,3,4]
input = torch.tensor([1, 2, 0, 3, 4])
offsets = torch.tensor([0, 2])  # bag 1 starts at index 0 of `input`, bag 2 at index 2 of `input`

out = F.embedding_bag(input, weight, offsets, mode='mean')
print(out)


tensor([[0.2000, 0.0500, 0.2500],
        [0.3333, 0.4667, 0.4000]])


In the example above, we create two bags with mean reduction. The first bag contains words 1 and 2, so the first vector in the output is the mean of their word vectors, `[0.0, 0.1, 0.0]` and `[0.4, 0.0, 0.5]`: `[0.0 + 0.4, 0.1 + 0.0, 0.0 + 0.5] / 2 = [0.2, 0.05, 0.25]`. The second bag contains words 0, 3, and 4, and the mean of those three word vectors gives the second vector of the output.
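
The second bag can be checked by hand in the same way. The following is a minimal sketch that redefines `weight` so it runs on its own; the name `second_bag_mean` is only illustrative.

```python
import torch

# Same embedding table as in the example above.
weight = torch.tensor([
    [0.1, 0.2, 0.3],  # word 0
    [0.0, 0.1, 0.0],  # word 1
    [0.4, 0.0, 0.5],  # word 2
    [0.2, 0.3, 0.1],  # word 3
    [0.7, 0.9, 0.8],  # word 4
    [0.0, 0.0, 0.0],  # dummy word
])

# The second bag holds words 0, 3 and 4: gather their rows,
# then average over the bag dimension.
bag = weight[torch.tensor([0, 3, 4])]
second_bag_mean = bag.mean(dim=0)
print(second_bag_mean)  # approximately [0.3333, 0.4667, 0.4000]
```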

Below is a simple Python implementation of the operation that supports only `mean` reduction.

In [2]:
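def my_embedding_bag(input, weight, offsets, mode):
    # Only mean reduction is implemented in this sketch.
    if mode == 'mean':
        reduction_op = torch.mean
    else:
        raise NotImplementedError

    # One output row per bag, one column per embedding element.
    output = torch.empty(len(offsets), weight.shape[1])

    for bag_idx in range(len(offsets)):
        # A bag runs from its own offset to the next bag's offset;
        # the last bag runs to the end of `input` (end=None).
        start = offsets[bag_idx].item()
        end = offsets[bag_idx + 1].item() if (bag_idx + 1) < len(offsets) else None
        output[bag_idx] = reduction_op(weight[input[start:end]], dim=0)

    return output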

my_embedding_bag(input, weight, offsets, mode='mean')

tensor([[0.2000, 0.0500, 0.2500],
        [0.3333, 0.4667, 0.4000]])
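
The same loop extends naturally to the other reduction modes. The sketch below is one possible way to do that, not how PyTorch implements it: `embedding_bag_sketch` is a hypothetical name, and filling an empty bag with zeros is an assumption made here so that the reduction is always well defined. It compares its result against `F.embedding_bag` for `sum`, `mean`, and `max` on the example data.

```python
import torch
import torch.nn.functional as F


def embedding_bag_sketch(input, weight, offsets, mode="mean"):
    # Map each mode name to a reduction over the rows of one bag.
    reducers = {
        "sum": lambda rows: rows.sum(dim=0),
        "mean": lambda rows: rows.mean(dim=0),
        "max": lambda rows: rows.max(dim=0).values,
    }
    if mode not in reducers:
        raise ValueError(f"unsupported mode: {mode!r}")
    reduce_rows = reducers[mode]

    # Append the length of `input` so every bag has an explicit end.
    bounds = offsets.tolist() + [input.numel()]
    output = torch.zeros(len(offsets), weight.shape[1], dtype=weight.dtype)
    for bag_idx, (start, end) in enumerate(zip(bounds[:-1], bounds[1:])):
        if end > start:  # assumption: an empty bag stays all zeros
            output[bag_idx] = reduce_rows(weight[input[start:end]])
    return output


weight = torch.tensor([
    [0.1, 0.2, 0.3],
    [0.0, 0.1, 0.0],
    [0.4, 0.0, 0.5],
    [0.2, 0.3, 0.1],
    [0.7, 0.9, 0.8],
    [0.0, 0.0, 0.0],
])
input = torch.tensor([1, 2, 0, 3, 4])
offsets = torch.tensor([0, 2])

for mode in ("sum", "mean", "max"):
    ours = embedding_bag_sketch(input, weight, offsets, mode=mode)
    reference = F.embedding_bag(input, weight, offsets, mode=mode)
    print(mode, torch.allclose(ours, reference))
```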

However, the 1D `input` plus `offsets` form is only one way to call the operation. `input` can also be a 2D tensor, in which case `offsets` must be omitted and each row of `input` is one bag. Since every row has the same length, bags with different numbers of words are expressed with a sentinel "padding index", passed as `padding_idx`: entries equal to it are excluded from the reduction. For instance, the following example creates exactly the same bags as the previous examples with a 2D input, using index 5 (the dummy word) as padding.

In [3]:
# 2D input indices for 2 bags: [1,2] and [0,3,4]
input = torch.tensor([[1, 2, 5], [0, 3, 4]])
out = F.embedding_bag(input, weight, mode='mean', padding_idx=5)
print(out)


tensor([[0.2000, 0.0500, 0.2500],
        [0.3333, 0.4667, 0.4000]])
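
To make the relationship between the two calling conventions concrete, the padded 2D form can be translated into the 1D `input` plus `offsets` form. The helper below is a minimal sketch on plain Python lists; `padded_to_offsets` is a hypothetical name and not part of PyTorch.

```python
def padded_to_offsets(padded_bags, padding_idx):
    """Convert padded rows of indices into a flat index list plus bag offsets."""
    flat_input, offsets = [], []
    for bag in padded_bags:
        # Each bag starts where the flat list currently ends.
        offsets.append(len(flat_input))
        # Padding entries are dropped, so they never take part in the reduction.
        flat_input.extend(idx for idx in bag if idx != padding_idx)
    return flat_input, offsets


flat_input, offsets = padded_to_offsets([[1, 2, 5], [0, 3, 4]], padding_idx=5)
print(flat_input)  # [1, 2, 0, 3, 4]
print(offsets)     # [0, 2]
```

Wrapping both lists in `torch.tensor` recovers the `input` and `offsets` used in the first example.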
