Conversation

@abheesht17 abheesht17 commented Mar 10, 2023

Very rudimentary guide (😭) for adding backbone models

Resolves #505

@abheesht17 abheesht17 marked this pull request as ready for review March 10, 2023 12:26
@mattdangerw mattdangerw left a comment

I think it might help if we start with a clear list of steps with checklists somewhere near the top. Something like

PR 1: Backbone & Colab

  • A xx/xx_backbone.py file in the style outlined below. [Example link].
  • A xx/xx_backbone_test.py file with a full set of unit tests for the backbone. [Example link].
  • In the PR description, a colab demonstrating we can use the backbone to match outputs with a known source. [Example link].

PR 2: Tokenizer & Colab

  • ... (similar to above)

PR 3: Presets & Conversion Script

  • A xx/xx_presets.py file with links to weights uploaded to a personal GCP bucket.
  • A xx/xx_presets_test.py file with runnable tests using each preset.
  • A tools/checkpoint_conversion/xx... reusable script for converting checkpoints.

PRs 4+: Preprocessors, Tasks, etc.

  • todo

Something like this at the top could be a good chance to really spell out specifically what we are looking for and make sure we get back quality PRs.


In this guide, we will walk you through the steps one needs to take in order to contribute
a backbone model. For illustration purposes, let's assume that you want to
contribute the DistilBERT model. Before we dive in, we encourage you to go through

mattdangerw (Member):

maybe say to read both the getting started guide and the CONTRIBUTING.md guide first, and link to both

@@ -0,0 +1,204 @@
# Model Contribution Guide

mattdangerw (Member):

Let's move this into the base of the repo; we don't really want to distribute random markdown with our package.


A model is typically split into three or four sections. We recommend you compare this side by side with the
[`keras_nlp.layers.DistilBERTBackbone` source code](https://github.com/keras-team/keras-nlp/blob/v0.4.1/keras_nlp/models/distil_bert/distil_bert_backbone.py#L108-L114)!

mattdangerw (Member):

maybe just link to the file on the master branch with no line linkages (we want to match the latest style, not the style at the 0.4.1 release!)


abheesht17 (Collaborator, Author):

Cool, made changes. However, in certain cases, I've kept the line linkages. I guess the DistilBERT code in the master branch is fairly stable.

<br/>

The standard layers provided in Keras and KerasNLP are generally enough for
99% of the use cases and it is recommended to do a thorough search

mattdangerw (Member):

99% -> most. I think sadly 99 would be a little overblown, new models love to mess with random bits of the arch.

If the model introduces a paradigm shift, such as using relative attention instead
of vanilla attention, the contributor will have to implement complete custom layers. A case
in point is `keras_nlp.models.DebertaV3Backbone` where we had to [implement layers
from scratch](https://github.com/keras-team/keras-nlp/tree/v0.4.1/keras_nlp/models/distil_bert).

mattdangerw (Member):

did you mean to link to deberta here?

3) "Add `DistilBertBackbone` Presets"


### Open an issue/Claim an open issue

mattdangerw (Member):

Claim -> find

Given how many people we have looking for issues, we should discourage a "stake your claim" vibe :)


abheesht17 (Collaborator, Author):

:P

us know that you're interested in working on it. This helps us keep track of who
is working on what and avoid duplicated effort.

If you open a new issue, please follow this template:

mattdangerw (Member):

Might cut this template below for now.


abheesht17 (Collaborator, Author):

As in, remove?


abheesht17 (Collaborator, Author):

Or maybe we can just call it an example instead of a template.

"A sample issue is given below: "

you can furnish as much detail as possible to enable us to help you with the
contribution! 🙂

Occasionally, if the model requires, say a variant of the original self-attention,

mattdangerw (Member):

This feels kind of repetitive with what is above, I wonder if we can restructure the flow a bit.


abheesht17 (Collaborator, Author):

Shifted it to the section above.

@abheesht17 abheesht17 requested a review from mattdangerw March 16, 2023 07:28
@mattdangerw mattdangerw left a comment


Ok, one more overall structure idea. What if the checklist and the detailed guide below had the same structure? Like this...

Checklist

Step 1: Find an issue.

  • Find or open an issue on KerasNLP for work on a new model architecture.

Step 2: Add a backbone.

  • A xx/xx_backbone.py file...
  • Other bullets...

Detailed instructions

Step 1: Find an issue.

Your description below about finding an issue.

Step 2: Add a backbone.

Your discussion below of backbone.

etc.

In this guide, we will walk you through the steps one needs to take in order to contribute
a backbone model. For illustration purposes, let's assume that you want to
contribute the DistilBERT model. Before we dive in, we encourage you to go through
[our guide](https://keras.io/guides/keras_nlp/getting_started/) on getting started

mattdangerw (Member):

[our getting started guide](...) for an introduction to the library, and our [contribution guide](...) for an overview of our contribution process.

cut last sentence

- [ ] A `xx/xx_backbone_test.py` file which has unit tests for the backbone [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone_test.py)].
- [ ] A Colab notebook link in the PR description which matches the outputs of the implemented backbone model with the original source [[Example Link](https://colab.research.google.com/drive/1SeZWJorKWmwWJax8ORSdxKrxE25BfhHa?usp=sharing)].

### PR #2: Add XXTokenizer and XXPreprocessor

mattdangerw (Member):

Often our preprocessors seem to come with a lot of extra discussion (as they are higher level), and for gpt, bart, t5 and whisper we have started with just the tokenizers. Maybe we should just formalize this in our process? And recommend just the tokenizer here.


abheesht17 (Collaborator, Author):

Cool, added a small note.

### PR #3: Add XX Presets
- [ ] A `xx/xx_presets.py` file with links to weights uploaded to a personal GCP bucket/Google Drive [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets.py)].
- [ ] A `xx/xx_presets_test.py` file with runnable tests for each preset [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets_test.py)].
- [ ] A `tools/checkpoint_conversion/convert_xx_checkpoints.py` which is a reusable script for converting checkpoints [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/convert_distilbert_checkpoints.py)].

mattdangerw (Member):

I would also recommend asking for a colab showing an end-to-end task here. E.g. write a classifier for distilbert using the low level backbone and tokenizer (as a preview for step 4 and validation for our presets).


abheesht17 (Collaborator, Author):

Do we have an example Colab for this?



abheesht17 (Collaborator, Author):

Thanks!

- [ ] A `xx/xx_presets_test.py` file with runnable tests for each preset [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets_test.py)].
- [ ] A `tools/checkpoint_conversion/convert_xx_checkpoints.py` which is a reusable script for converting checkpoints [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/convert_distilbert_checkpoints.py)].

### (Optional) PR #4: Add XX Tasks

mattdangerw (Member):

I would write this as PR #4 and beyond: Add Tasks and Preprocessors.

This PR is optional and is for adding tasks like classifier, masked LM, etc. [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_classifier.py)]

## How is a Backbone model structured in KerasNLP?
To keep the code simple and readable, we follow

mattdangerw (Member):

Generally keep one empty newline after headings.


abheesht17 (Collaborator, Author):

Yeah, I don't do that because I leave lines before headings...and then, it gets confusing if I leave lines before and after headings :P


mattdangerw (Member):

I think the linter I have dislikes this, which is probably this -> https://github.com/markdownlint/markdownlint

But doing a single space before and after would also fit with other docs in the library, e.g. https://github.com/keras-team/keras-nlp/blob/master/STYLE_GUIDE.md

Can we switch?


abheesht17 (Collaborator, Author):

Gotcha, done!


A model is typically split into three or four sections. We recommend you compare this side by side with the
[`keras_nlp.layers.DistilBERTBackbone` source code](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone.py)!

mattdangerw (Member):

DistilBertBackbone (fix case)

### Inputs to the model

Generally, the standard inputs to any text model are:
- `token_ids`: tokenised inputs (IDs in the form of text).

mattdangerw (Member):

"IDs in the form of text" is a little confusing. "An integer representation of the text sequence"?


abheesht17 (Collaborator, Author):

Wow, I have no clue why I wrote that. That's a disgusting description. Maybe, I was sleepy, lol
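To make "an integer representation of the text sequence" concrete, here is a toy sketch of how raw text becomes `token_ids` plus a `padding_mask`. The vocabulary is made up for illustration; real backbones use a learned subword vocabulary and the library's tokenizer layers.

```python
# Toy example of producing `token_ids` and a `padding_mask`.
# The word-level vocab below is hypothetical; real models use subword vocabs.
vocab = {"[PAD]": 0, "the": 1, "cat": 2, "sat": 3}
sequence_length = 6

def encode(text):
    ids = [vocab[word] for word in text.split()]
    padding = sequence_length - len(ids)
    token_ids = ids + [vocab["[PAD]"]] * padding
    # 1 marks real tokens, 0 marks padding positions.
    padding_mask = [1] * len(ids) + [0] * padding
    return token_ids, padding_mask

token_ids, padding_mask = encode("the cat sat")
print(token_ids)     # [1, 2, 3, 0, 0, 0]
print(padding_mask)  # [1, 1, 1, 0, 0, 0]
```

The backbone consumes both inputs, using the mask so attention ignores the padded positions.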

@abheesht17 abheesht17 requested a review from mattdangerw March 22, 2023 09:10
@mattdangerw mattdangerw self-assigned this Mar 22, 2023
@mattdangerw mattdangerw left a comment


Still should probably do a closer, careful read for typos, but here are some quick comments! Overall looking good.

@@ -0,0 +1,218 @@
# Model Contribution Guide

mattdangerw (Member):

silly nit, but we have CONTRIBUTING.md, should we make this CONTRIBUTING_MODELS.md or something to keep the filename parallel?


This PR is optional and is for adding tasks like classifier, masked LM, etc. [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_classifier.py)]


## How is a Backbone model structured in KerasNLP?

mattdangerw (Member):

I think this whole section could still go into the "detailed instructions" for Step 2 (creating a backbone).

## Detailed Instructions
This section discusses, in detail, every necessary step.

### Open an issue/Find an open issue

mattdangerw (Member):

Step 1: ...


@abheesht17 abheesht17 requested a review from mattdangerw March 23, 2023 02:24
@mattdangerw mattdangerw left a comment


LGTM! Gave a last pass with some copy edits and other comments.

@@ -0,0 +1,235 @@
# Model Contribution Guide

KerasNLP has a plethora of cutting-edge "backbone" model ranging from BERT and

mattdangerw (Member):

cutting-edge "backbone" model -> pre-trained large language models

# Model Contribution Guide

KerasNLP has a plethora of cutting-edge "backbone" model ranging from BERT and
RoBERTa to FNet and DeBERTa. We are always looking for more models and are always

mattdangerw (Member):

BERT and RoBERTa to FNet and DeBERTa -> BERT to OPT

open to contributions!

In this guide, we will walk you through the steps one needs to take in order to contribute
a backbone model. For illustration purposes, let's assume that you want to

mattdangerw (Member):

a backbone model -> a new pre-trained model to KerasNLP

(we are talking about more than just "backbones" below)


- [ ] An `xx/xx_backbone.py` file which has the model graph [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone.py)].
- [ ] An `xx/xx_backbone_test.py` file which has unit tests for the backbone [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone_test.py)].
- [ ] A Colab notebook link in the PR description which matches the outputs of the implemented backbone model with the original source [[Example Link](https://colab.research.google.com/drive/1SeZWJorKWmwWJax8ORSdxKrxE25BfhHa?usp=sharing)].

mattdangerw (Member):

Maybe say to include this in the PR description? Same for other colabs.


abheesht17 (Collaborator, Author):

I have. "A Colab notebook link in the PR description..."
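The "Colab matching outputs" requirement boils down to a tolerance-based numerical comparison. A minimal sketch, with toy arrays standing in for the real model outputs (in the actual Colab you would run identical inputs through the reference checkpoint and the KerasNLP backbone):

```python
import numpy as np

# Toy stand-ins for real outputs: `reference_output` would come from the
# original implementation, `our_output` from the new KerasNLP backbone.
reference_output = np.array([[0.1234567, -0.7654321], [0.5, -0.5]])
our_output = reference_output + 1e-7  # tiny float drift is expected

# Float32 forward passes rarely match bit-for-bit, so compare with a
# tolerance rather than exact equality.
np.testing.assert_allclose(our_output, reference_output, atol=1e-5)
```

If the assertion passes at a tight tolerance on several inputs, the port is considered numerically matched.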

- [ ] An `xx/xx_backbone_test.py` file which has unit tests for the backbone [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone_test.py)].
- [ ] A Colab notebook link in the PR description which matches the outputs of the implemented backbone model with the original source [[Example Link](https://colab.research.google.com/drive/1SeZWJorKWmwWJax8ORSdxKrxE25BfhHa?usp=sharing)].

### Step 3: PR #2 - Add XXTokenizer and XXPreprocessor [The preprocessor is optional at this stage]

mattdangerw (Member):

I still think we could keep this simple and just list XXTokenizer, leave preprocessor as a follow up.


For a full list of the tokenizers KerasNLP offers, please visit [this link](https://keras.io/api/keras_nlp/tokenizers/) and make use of the tokenizer your model uses!

#### Preprocessor

mattdangerw (Member):

If we ditch preprocessor from step 3 entirely, we could probably rework this whole section to be a bit shorter and move it to the final step

#### Tokenizer

Most text models nowadays use subword tokenizers such as WordPiece, SentencePiece
and BPE Tokenizer. Hence, it becomes easy to implement the tokenizer layer for

mattdangerw (Member):

Hence, it becomes easy to implement the tokenizer layer for the model; the tokenizer class inherits from one of these base tokenizer classes. -> Since KerasNLP has implementations of most popular subword tokenizers, usually the tokenizer class inherits from a base tokenizer class.

mask token, pad token, etc. These have to be
[added as member attributes](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_tokenizer.py#L91-L105)
to the tokenizer class. These member attributes are then accessed by the
preprocessor class.

mattdangerw (Member):

accessed by the preprocessor class. -> accessed by preprocessor layers.


### Step 4: PR #3 - Add XX Presets

Once the two PR have been merged, you can open a PR for adding presets. For every

mattdangerw (Member):

two PR -> backbone and tokenizer have been merged


After wrapping up the preset configuration file, you need to
add the `from_preset` function to all three classes, i.e., `DistilBertBackbone`,
`DistilBertTokenizer` and `DistilBertPreprocessor`,. Here is an

mattdangerw (Member):

remove trailing comma, and potentially DistilBertPreprocessor as mentioned above.
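Since the thread references the preset configuration file, here is a hypothetical sketch of the shape one entry in an `xx/xx_presets.py` dictionary might take. All field names, values, and the bucket URL below are illustrative; the exact schema should be copied from an existing `*_presets.py` file in the repo.

```python
# Hypothetical preset entry; copy the real field names from an existing
# xx_presets.py file rather than from this sketch.
backbone_presets = {
    "xx_base_en": {
        "config": {
            "vocabulary_size": 30522,
            "num_layers": 6,
            "num_heads": 12,
            "hidden_dim": 768,
        },
        # Weights uploaded to a personal GCP bucket (placeholder URL).
        "weights_url": "https://storage.googleapis.com/your-bucket/model.h5",
        "weights_hash": "abc123",  # checksum used to validate the download
    },
}

# `from_preset` would look the name up, build the model from `config`,
# then download and load the weights.
config = backbone_presets["xx_base_en"]["config"]
print(config["hidden_dim"])  # 768
```

Sharing one presets dictionary is what lets `from_preset` on the backbone, tokenizer, and preprocessor all resolve the same preset name.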

@mattdangerw mattdangerw mentioned this pull request Mar 28, 2023
@mattdangerw mattdangerw left a comment


Awesome work! We might be able to take this for a spin with the XLNet PR.

@mattdangerw mattdangerw merged commit 4d6d4b0 into keras-team:master Mar 28, 2023
Successfully merging this pull request may close these issues.

Add a Basic README for Conversion Scripts