fractalego/dispatcher

This code illustrates the Dispatcher algorithm as presented in the paper.

(Figure: shift_and_sum)

Installation

virtualenv --python=/usr/bin/python3 .env
source .env/bin/activate
pip install -r requirements.txt

Training the models

The models can be trained from scratch using the following scripts:

train_dispatcher_after_openwebtext_wikitext2.py
train_dispatcher_after_openwebtext_wikitext103.py
train_msa_wikitext2.py
train_msa_wikitext103.py
train_plain_dispatcher_on_wikitext2.py
train_plain_dispatcher_on_wikitext103.py 

Evaluation

The perplexity of the pre-trained models can be evaluated using the following scripts:

test_dispatcher_after_openwebtext_on_wikitext2.py
test_dispatcher_after_openwebtext_on_wikitext103.py
test_plain_dispatcher_on_wikitext2.py
test_plain_dispatcher_on_wikitext103.py

The plain Dispatcher has about 30% more parameters on Wikitext103 because of a slightly different tokenization technique; the token vocabulary on Wikitext2 is smaller to achieve better performance.
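As a reminder of the metric these scripts report, perplexity is the exponential of the mean token-level cross-entropy. The snippet below is a generic sketch, not code taken from the test scripts; the tensors are random placeholders:

import math
import torch
import torch.nn.functional as F

def perplexity(logits, targets):
    # logits: (num_tokens, vocab_size) raw scores, targets: (num_tokens,) token ids.
    # Perplexity is exp of the mean cross-entropy over the predicted tokens.
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return math.exp(nll.item())

# Illustrative call on random data (placeholder values, not repository outputs).
logits = torch.randn(10, 100)            # 10 predicted tokens, vocabulary of 100
targets = torch.randint(0, 100, (10,))
print(perplexity(logits, targets))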

Code

The Dispatcher is identical to the Transformer architecture with one crucial difference: the self-attention layer is substituted with the Dispatcher layer.

This algorithm, explained in the paper, is contained in the file dispatcher_model.py. The following code is this work's main contribution:

import math
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class DispatcherLayer(nn.Module):
    def __init__(self, embed_dim, num_heads, bptt, dropout=0.):
        super(DispatcherLayer, self).__init__()

        # One level per power of two that fits in the sequence length (bptt);
        # the shifts are 1, 2, 4, ..., 2^(levels - 1).
        self._levels = int(math.log(bptt, 2))
        self._shifts = [pow(2, i) for i in range(self._levels)]

        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.dropout = dropout

        self.linear_in = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.internal_attention = nn.Linear(self.head_dim, self._levels, bias=False)
        self.linear_out = nn.Linear(self.head_dim, self.head_dim, bias=False)

    def forward(self, value, mask):
        # value: (length, batch, embed_dim) -> (batch, length, embed_dim)
        inp = value.transpose(1, 0)
        batch_length = inp.shape[0]
        length = inp.shape[1]
        # Split the embedding into heads: (batch * num_heads, length, head_dim)
        inp = inp.reshape(batch_length * self.num_heads, length, self.head_dim)

        V = self.linear_in(inp)

        # One gating coefficient per position and per shift level; entries where
        # the mask is zero do not contribute.
        coefficient_tensor = F.sigmoid(self.internal_attention(inp)) * mask.detach()
        coefficient = torch.chunk(coefficient_tensor, chunks=self._levels, dim=2)

        # Shift and sum: at each level, roll the values by 2^i positions along the
        # sequence dimension and add them back, weighted by the learned coefficients.
        for c, shift in zip(coefficient, self._shifts):
            if shift > length:
                break
            # Level dropout: during training, skip a level with probability `dropout`.
            if self.training and random.uniform(0, 1) < self.dropout:
                continue
            V += c * torch.roll(V, shifts=shift, dims=1)

        out = self.linear_out(V)
        out = out.reshape(batch_length, length, self.embed_dim)
        return out.transpose(1, 0)

The main loop is in the forward() method, where the shift and sum steps are applied (see the paper): at each level the value tensor is rolled along the sequence dimension by the corresponding power of two and added back, weighted by the learned per-position coefficients.
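As a quick smoke test, the layer above can be run on random data. The all-ones mask below is only a placeholder with a broadcastable shape of (length, levels); the actual mask used by the training scripts in this repository may be constructed differently:

import math
import torch

embed_dim, num_heads, bptt = 64, 4, 128
layer = DispatcherLayer(embed_dim, num_heads, bptt)

length, batch = 32, 2
levels = int(math.log(bptt, 2))                  # 7 shift levels for bptt = 128
value = torch.randn(length, batch, embed_dim)    # (length, batch, embed_dim)
mask = torch.ones(length, levels)                # placeholder mask, broadcasts to (batch * heads, length, levels)

out = layer(value, mask)
print(out.shape)                                 # torch.Size([32, 2, 64])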

A second file contains the "standard" Masked Self-Attention model msa_model.py. The two models are nearly identical, with the exception of the Dispatcher layers.
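To make the "drop-in replacement" point concrete, here is a minimal sketch of a Transformer-style block with a pluggable token-mixing sub-layer. It is a generic illustration, not the exact block defined in dispatcher_model.py or msa_model.py:

import torch
import torch.nn as nn

class GenericBlock(nn.Module):
    # A generic Transformer-style block: a token-mixing sub-layer followed by a
    # feed-forward sub-layer, each with a residual connection and layer norm.
    # `mixer` is any module with signature mixer(x, mask) -> x, for instance the
    # DispatcherLayer above or a wrapper around masked self-attention.
    def __init__(self, embed_dim, mixer, ff_dim=256):
        super().__init__()
        self.mixer = mixer
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.ReLU(),
            nn.Linear(ff_dim, embed_dim),
        )

    def forward(self, x, mask):
        x = self.norm1(x + self.mixer(x, mask))
        x = self.norm2(x + self.ff(x))
        return x

# Swapping masked self-attention for the Dispatcher then amounts to changing
# which `mixer` is passed in:
block = GenericBlock(64, DispatcherLayer(64, 4, 128))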

Multihead Dispatcher

The DispatcherLayer code shown above is the one used in the paper. After submission, a new model was found that performs better with multiple heads; it can be found in this repository at dispatcher/dispatcher_model_multihead.py. Please use this newer model if you are aiming for competitive multi-head results.

Run the code

A notebook, dispatcher.ipynb, is included to run the code and generate text with the various models.
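If you prefer a plain script to the notebook, greedy decoding with a causal language model follows a simple loop. The sketch below is entirely generic: the model interface (token ids in, per-position logits out) is an assumption for illustration and is not the exact interface used in dispatcher.ipynb:

import torch

def greedy_generate(model, token_ids, steps, bptt):
    # Generic greedy decoding: feed the current sequence, take the argmax of the
    # logits at the last position, append it, and repeat. `model` is assumed to
    # map a (length, batch) tensor of token ids to (length, batch, vocab) logits;
    # this is a placeholder interface, not the one used in the notebook.
    model.eval()
    with torch.no_grad():
        for _ in range(steps):
            inp = token_ids[-bptt:].unsqueeze(1)   # keep at most bptt tokens, batch of 1
            logits = model(inp)                    # (length, 1, vocab_size)
            next_id = logits[-1, 0].argmax()
            token_ids = torch.cat([token_ids, next_id.view(1)])
    return token_ids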
