Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
240 changes: 240 additions & 0 deletions CONTRIBUTING_MODELS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,240 @@
# Model Contribution Guide

KerasNLP has a plethora of pre-trained large language models
ranging from BERT to OPT. We are always looking for more models and are always
open to contributions!

In this guide, we will walk you through the steps one needs to take in order to
contribute a new pre-trained model to KerasNLP. For illustration purposes, let's
assume that you want to contribute the DistilBERT model. Before we dive in, we encourage you to go through
[our getting started guide](https://keras.io/guides/keras_nlp/getting_started/)
for an introduction to the library, and our
[contribution guide](https://github.com/keras-team/keras-nlp/blob/master/CONTRIBUTING.md).

## Checklist

This to-do list is a brief outline of how a model can be contributed.
Keep this checklist handy!

### Step 1: Open an issue/find an issue

- [ ] Open an issue or find an issue to contribute a backbone model.

### Step 2: PR #1 - Add XXBackbone

- [ ] An `xx/xx_backbone.py` file which has the model graph \[[Example](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone.py)\].
- [ ] An `xx/xx_backbone_test.py` file which has unit tests for the backbone \[[Example](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone_test.py)\].
- [ ] A Colab notebook link in the PR description which matches the outputs of the implemented backbone model with the original source \[[Example](https://colab.research.google.com/drive/1SeZWJorKWmwWJax8ORSdxKrxE25BfhHa?usp=sharing)\].

### Step 3: PR #2 - Add XXTokenizer

- [ ] An `xx/xx_tokenizer.py` file which has the tokenizer for the model \[[Example](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_tokenizer.py)\].
- [ ] An `xx/xx_tokenizer_test.py` file which has unit tests for the model tokenizer \[[Example](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_tokenizer_test.py)\].
- [ ] A Colab notebook link in the PR description, demonstrating that the output of the tokenizer matches the original tokenizer \[[Example](https://colab.research.google.com/drive/1MH_rpuFB1Nz_NkKIAvVtVae2HFLjXZDA?usp=sharing)].

### Step 4: PR #3 - Add XX Presets

- [ ] An `xx/xx_presets.py` file with links to weights uploaded to a personal GCP bucket/Google Drive \[[Example](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets.py)\].
- [ ] An `xx/xx_presets_test.py` file with runnable tests for each preset \[[Example](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets_test.py)\].
- [ ] A `tools/checkpoint_conversion/convert_xx_checkpoints.py` which is reusable script for converting checkpoints \[[Example](https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/convert_distilbert_checkpoints.py)\].
- [ ] A Colab notebook link in the PR description, showing an end-to-end task such as text classification, etc. The task model can be built using the backbone model, with the task head on top \[[Example](https://gist.github.com/mattdangerw/bf0ca07fb66b6738150c8b56ee5bab4e)\].

### Step 5: PR #4 and Beyond - Add XX Tasks and Preprocessors

This PR is optional.

- [ ] An `xx/xx_<task>.py` file for adding a task model like classifier, masked LM, etc. \[[Example](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_classifier.py)\]
- [ ] An `xx/xx_<task>_preprocessor.py` file which has the preprocessor and can be used to get inputs suitable for the task model \[[Example](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_preprocessor.py)\].
- [ ] `xx/xx_<task>_test.py` file and `xx/xx_<task>_preprocessor_test.py` files which have unit tests for the above two modules \[[Example 1](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_classifier_test.py) and [Example 2](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_preprocessor_test.py)\].
- [ ] A Colab notebook link in the PR description, demonstrating that the output of the preprocessor matches the output of the original preprocessor \[[Example](https://colab.research.google.com/drive/1GFFC7Y1I_2PtYlWDToqKvzYhHWv1b3nC?usp=sharing)].

## Detailed Instructions

This section discusses, in details, every necessary step.

### Step 1: Open an issue/Find an open issue

Before getting started with the code, it's important to check if there are any
[open issues](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3Amodel-contribution)
related to the model you wish to contribute. If there is an open issue, you can
claim it by commenting on the issue and letting us know that you're interested
in working on it. This helps us keep track of who is working on what and avoid
duplicated effort.

If there aren't any open issues, you can create one by clicking the "New Issue"
button on our repository page.

Note that you need not have all the answers or complete knowledge of the inner
workings of the model at the time of opening the issue. But it is appreciated if
you can furnish as much detail as possible to enable us to help you with the
contribution! 🙂

### Step 2: PR #1 - Add XXBackbone

#### Add the backbone class

Once you are done identifying all the required layers, you should implement the
model backbone class.

To keep the code simple and readable, we follow
[Keras' functional style model](https://keras.io/guides/functional_api/) wrapped
around by a class to implement our models.

A model is typically split into three/four sections. We would recommend you to
compare this side-by-side with the
[`keras_nlp.layers.DistilBertBackbone` source code](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone.py)!

**Inputs to the model**

Generally, the standard inputs to any text model are:
- `token_ids`: tokenised inputs (An integer representation of the text sequence).
- `padding_mask`: Masks the padding tokens.

**Embedding layer(s)**

Standard layers used: `keras.layers.Embedding`,
`keras_nlp.layers.PositionEmbedding`, `keras_nlp.layers.TokenAndPositionEmbedding`.

**Encoder layers**

Standard layers used: `keras_nlp.layers.TransformerEncoder`, `keras_nlp.layers.FNetEncoder`.

**Decoder layers (possibly)**

Standard layers used: `keras_nlp.layers.TransformerDecoder`.

**Other layers which might be used**

`keras.layers.LayerNorm`, `keras.layers.Dropout`, `keras.layers.Conv1D`, etc.

<br/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure the break is really necessary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put a break because the paragraph at the bottom doesn't belong to the heading "Other layers...":

**Other layers which might be used**

`keras.layers.LayerNorm`, `keras.layers.Dropout`, `keras.layers.Conv1D`, etc.

<br/>

The standard layers provided in Keras and KerasNLP are generally enough for


The standard layers provided in Keras and KerasNLP are generally enough for
most of the usecases and it is recommended to do a thorough search
[here](https://keras.io/api/layers/) and [here](https://keras.io/api/keras_nlp/layers/).
However, sometimes, models have small tweaks/paradigm changes in their architecture.
This is when things might slightly get complicated.

If the model introduces a paradigm shift, such as using relative attention instead
of vanilla attention, the contributor will have to implement complete custom layers. A case
in point is `keras_nlp.models.DebertaV3Backbone` where we had to [implement layers
from scratch](https://github.com/keras-team/keras-nlp/tree/master/keras_nlp/models/deberta_v3).

On the other hand, if the model has a small tweak, something simpler can be done.
For instance, in the Whisper model, the self-attention and cross-attention mechanism
is exactly the same as vanilla attention, with the exception that the key projection
layer does not have a bias term. In this case, we can inherit the custom layer
from one of the standard layers and make minor modifications. See [this PR](https://github.com/keras-team/keras-nlp/pull/801/files#diff-8533ae3a7755c0dbe95ccbb71f85c677297f687bf3884fadefc64f1d0fdce51aR22) for
more details.

Since the first PR is only to add the model backbone class, you should omit the
`from_presets()` function; this will be added at a later stage when you open a PR
for adding presets.

#### Convert weights from the original source and check output!

Before you open a PR for adding the model backbone class, it is essential to check
whether the model has been implemented exactly as the source implementation. This
also helps in adding model "presets" at a later stage.

The preferred way of doing this is to add a Colab link in the PR description, which
1) converts the original preset weights to our format, and
2) checks whether the outputs of the original model and your implemented model are close enough.

It is okay if you demonstrate it for one preset at this stage; you can do the conversion
for the other presets when you officially add presets to the library at a later stage.

#### Add Unit Tests

It is essential to add units tests. These unit tests are basic and mostly check
whether the forward pass goes through successfully, whether the model can be saved
and loaded correctly, etc.

### Step 3: PR #2 - Add XXTokenizer

#### Tokenizer

Most text models nowadays use subword tokenizers such as WordPiece, SentencePiece
and BPE Tokenizer. Since KerasNLP has implementations of most of the popular
subword tokenizers, the model tokenizer layer typically inherits from a base
tokenizer class.

For example, DistilBERT uses the WordPiece tokenizer. So, we can introduce a new
class, `DistilBertTokenizer`, which inherits from `keras_nlp.tokenizers.WordPieceTokenizer`.
All the underlying actual tokenization will be taken care of by the superclass.

The important thing here is adding "special tokens". Most models have
special tokens such as beginning-of-sequence token, end-of-sequence token,
mask token, pad token, etc. These have to be
[added as member attributes](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_tokenizer.py#L91-L105)
to the tokenizer class. These member attributes are then accessed by the
preprocessor layers.

For a full list of the tokenizers KerasNLP offers, please visit
[this link](https://keras.io/api/keras_nlp/tokenizers/) and make use of the
tokenizer your model uses!

#### Unit Tests

The last step here is to add unit tests for the tokenizer. A dummy vocabulary is
created, and the output of both these layers is verified including tokenization,
detokenization, etc.

### Step 4: PR #3 - Add XX Presets

Once the backbone and tokenizer PRs have been merged, you can open a PR for
adding presets. For every model, we have a separate file where we mention our
preset configurations. This preset configuration has model-specific arguments
such as number of layers, number of attention heads; preprocessor-specific
arguments such as whether we want to lowercase the input text; checkpoint and
vocabulary file URLs, etc. In the PR description, you can add
Google Drive/personal GCP bucket links to the checkpoint and the vocabulary
files. These files will then be uploaded to GCP by us!

After wrapping up the preset configuration file, you need to
add the `from_preset` function to all three classes, i.e., `DistilBertBackbone`,
and `DistilBertTokenizer`. Here is an
[example](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone.py#L187-L189).

The testing for presets is divided into two: "large" and "extra large".
For "large" tests, we pick the smallest preset (in terms of number of parameters)
and verify whether the output is correct. For "extra large tests", we loop over
all the presets and just check whether the backbone and the tokenizer can
be called without any error.

Additionally, a checkpoint conversion script should be added. This script
demonstrates that the outputs of our backbone model and outputs of the source
model match. This should be done for all presets.

### Step 5: PR #4 and Beyond: Add XXTasks and XXPreprocessors

Once you are finished with Steps 1-4, you can add "task" models and
preprocessors.

### Task model

Task models are essentially models which have "task heads" on top of the backbone
models. For instance, for the text classification task, you can have a
feedforward layer on top of a backbone model like DistilBERT. Task models are
very essential since pretrained models are used extensively for downstream tasks
like text classification, token classification, text summarization, neural
machine translation, etc.

#### Preprocessor

The preprocessor class is responsible for making the inputs suitable for
consumption by the model - it packs multiple inputs together, i.e., given
multiple input texts, it will add appropriate special tokens, pad the inputs
and return the dictionary in the form expected by the model.

The preprocessor class might have a few intricacies depending on the model. For example,
the DeBERTaV3 tokenizer does not have the `[MASK]` in the provided sentencepiece
proto file, and we had to make some modifications [here](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/deberta_v3/deberta_v3_preprocessor.py). Secondly, we have
a separate preprocessor class for every task. This is because different tasks
might require different input formats. For instance, we have a [separate preprocessor](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_masked_lm_preprocessor.py)
for masked language modeling (MLM) for DistilBERT.

## Conclusion

Once all three PRs (and optionally, the fourth PR) have been merged, you have
successfully contributed a model to KerasNLP. Congratulations! 🔥