Add model contribution guide #820
Conversation
I think it might help if we start with a clear list of steps with checklists somewhere near the top. Something like
PR 1: Backbone & Colab
- A `xx/xx_backbone.py` file in the style outlined below. [Example link].
- A `xx/xx_backbone_test.py` file with a full set of unit tests for the backbone. [Example link].
- In the PR description, a colab demonstrating we can use the backbone to match outputs with a known source. [Example link].
PR 2: Tokenizer & Colab
- ... (similar to above)
PR 3: Presets & Conversion Script
- A `xx/xx_presets.py` file with links to weights uploaded to a personal GCP bucket.
- A `xx/xx_presets_test.py` file with runnable tests using each preset.
- A `tools/checkpoint_conversion/xx...` reusable script for converting checkpoints.
PRs 4+: Preprocessors, Tasks, etc.
- todo
Something like this at the top could be a good chance to really spell out specifically what we are looking for and make sure we get back quality PRs.
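To make the checklist concrete, a rough sketch of where those files would land, using the `xx` placeholder from the bullets above (exact paths are an assumption, patterned on the DistilBERT example links elsewhere in this thread):

```
keras_nlp/models/xx/
├── xx_backbone.py          # PR 1: model graph
├── xx_backbone_test.py     # PR 1: unit tests for the backbone
├── xx_tokenizer.py         # PR 2: tokenizer
├── xx_presets.py           # PR 3: preset configs + weight links
└── xx_presets_test.py      # PR 3: runnable preset tests
tools/checkpoint_conversion/
└── convert_xx_checkpoints.py  # PR 3: checkpoint conversion script
```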
In this guide, we will walk you through the steps one needs to take in order to contribute
a backbone model. For illustration purposes, let's assume that you want to
contribute the DistilBERT model. Before we dive in, we encourage you to go through
maybe say to read both the getting started guide and the CONTRIBUTING.md guide first, and link to both
@@ -0,0 +1,204 @@
# Model Contribution Guide
Let's move this into the base of the repo, we don't really want to distribute random markdown with our package
A model is typically split into three/four sections. We would recommend you to
compare this side-by-side with the
[`keras_nlp.layers.DistilBERTBackbone` source code](https://github.com/keras-team/keras-nlp/blob/v0.4.1/keras_nlp/models/distil_bert/distil_bert_backbone.py#L108-L114)!
maybe just link to the file on the master branch with no line linkages (we want to match the latest style, not the style at the 0.4.1 release!)
Cool, made changes. However, in certain cases, I've kept the line linkages. I guess the DistilBERT code in the master
branch is fairly stable.
The standard layers provided in Keras and KerasNLP are generally enough for
99% of the usecases and it is recommended to do a thorough search
99% -> most. I think sadly 99 would be a little overblown, new models love to mess with random bits of the arch.
If the model introduces a paradigm shift, such as using relative attention instead
of vanilla attention, the contributor will have to implement complete custom layers. A case
in point is `keras_nlp.models.DebertaV3Backbone` where we had to [implement layers
from scratch](https://github.com/keras-team/keras-nlp/tree/v0.4.1/keras_nlp/models/distil_bert).
did you mean to link to deberta here?
3) "Add `DistilBertBackbone` Presets"
### Open an issue/Claim an open issue
Claim -> find
Given how many people we have looking for issues, we should discourage a "stake your claim" vibe :)
:P
us know that you're interested in working on it. This helps us keep track of who
is working on what and avoid duplicated effort.

In case you open a new issue, please follow this template:
Might cut this template below for now.
As in, remove?
Or maybe we can just call it an example instead of a template.
"A sample issue is given below: "
you can furnish as much detail as possible to enable us to help you with the
contribution! 🙂
Occasionally, if the model requires, say a variant of the original self-attention,
This feels kind of repetitive with what is above, I wonder if we can restructure the flow a bit.
Shifted it to the section above.
Ok one more overall structure idea. What about if the checklist and the detailed guide below have the same structure. Like this...
Checklist
Step 1: Find an issue.
- Find or open an issue on KerasNLP for work on a new model architecture.
Step 2: Add a backbone.
- A `xx/xx_backbone.py` file...
- Other bullets...
Detailed instructions
Step 1: Find an issue.
Your description below about finding an issue.
Step 2: Add a backbone.
Your discussion below of backbone.
etc.
MODEL_CONTRIBUTION_GUIDE.md
Outdated
In this guide, we will walk you through the steps one needs to take in order to contribute
a backbone model. For illustration purposes, let's assume that you want to
contribute the DistilBERT model. Before we dive in, we encourage you to go through
[our guide](https://keras.io/guides/keras_nlp/getting_started/) on getting started
[our getting started guide](...) for an introduction to the library, and our [contribution guide](...) for an overview of our contribution process.
cut last sentence
MODEL_CONTRIBUTION_GUIDE.md
Outdated
- [ ] A `xx/xx_backbone_test.py` file which has unit tests for the backbone [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone_test.py)].
- [ ] A Colab notebook link in the PR description which matches the outputs of the implemented backbone model with the original source [[Example Link](https://colab.research.google.com/drive/1SeZWJorKWmwWJax8ORSdxKrxE25BfhHa?usp=sharing)].

### PR #2: Add XXTokenizer and XXPreprocessor
Often our preprocessors seem to come with a lot of extra discussion (as they are higher level), and for gpt, bart, t5 and whisper we have started with just the tokenizers. Maybe we should just formalize this in our process? And recommend just the tokenizer here.
Cool, added a small note.
MODEL_CONTRIBUTION_GUIDE.md
Outdated
### PR #3: Add XX Presets
- [ ] A `xx/xx_presets.py` file with links to weights uploaded to a personal GCP bucket/Google Drive [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets.py)].
- [ ] A `xx/xx_presets_test.py` file with runnable tests for each preset [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets_test.py)].
- [ ] A `tools/checkpoint_conversion/convert_xx_checkpoints.py` which is a reusable script for converting checkpoints [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/convert_distilbert_checkpoints.py)].
I would also recommend asking for a colab showing an end to end task here. E.g. write a classifier for distilbert using the low level backbone and tokenizer (as a preview for step 4 and validation for our presets).
Do we have an example Colab for this?
I am not sure, but making one now -> https://gist.github.com/mattdangerw/bf0ca07fb66b6738150c8b56ee5bab4e
Thanks!
MODEL_CONTRIBUTION_GUIDE.md
Outdated
- [ ] A `xx/xx_presets_test.py` file with runnable tests for each preset [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets_test.py)].
- [ ] A `tools/checkpoint_conversion/convert_xx_checkpoints.py` which is a reusable script for converting checkpoints [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/convert_distilbert_checkpoints.py)].

### (Optional) PR #4: Add XX Tasks
I would write this as PR #4 and beyond: Add Tasks and Preprocessors.
MODEL_CONTRIBUTION_GUIDE.md
Outdated
This PR is optional and is for adding tasks like classifier, masked LM, etc. [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_classifier.py)]

## How is a Backbone model structured in KerasNLP?
To keep the code simple and readable, we follow
Generally keep one empty newline after headings.
Yeah, I don't do that because I leave lines before headings...and then, it gets confusing if I leave lines before and after headings :P
I think the linter I have dislikes this, which is probably this -> https://github.com/markdownlint/markdownlint
But doing a single space before and after would also fit with other docs in the library, e.g. https://github.com/keras-team/keras-nlp/blob/master/STYLE_GUIDE.md
Can we switch?
Gotcha, done!
MODEL_CONTRIBUTION_GUIDE.md
Outdated
A model is typically split into three/four sections. We would recommend you to
compare this side-by-side with the
[`keras_nlp.layers.DistilBERTBackbone` source code](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone.py)!
DistilBertBackbone (fix case)
MODEL_CONTRIBUTION_GUIDE.md
Outdated
### Inputs to the model

Generally, the standard inputs to any text model are:
- `token_ids`: tokenised inputs (IDs in the form of text).
"IDs in the form of text" is a little confusing. "An integer representation of the text sequence"?
Wow, I have no clue why I wrote that. That's a disgusting description. Maybe, I was sleepy, lol
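For what it's worth, the `token_ids`/`padding_mask` contract could be illustrated with a tiny plain-Python sketch (no KerasNLP dependency; `pack_inputs` is a hypothetical helper for illustration, not a library API):

```python
def pack_inputs(token_ids, sequence_length, pad_id=0):
    """Pad or truncate a list of token ids and build the matching mask.

    token_ids: an integer representation of the text sequence.
    padding_mask: 1 where a position holds a real token, 0 where it is padding.
    """
    ids = token_ids[:sequence_length]
    mask = [1] * len(ids)
    ids = ids + [pad_id] * (sequence_length - len(ids))
    mask = mask + [0] * (sequence_length - len(mask))
    return {"token_ids": ids, "padding_mask": mask}


inputs = pack_inputs([101, 7592, 2088, 102], sequence_length=8)
# inputs["token_ids"]    -> [101, 7592, 2088, 102, 0, 0, 0, 0]
# inputs["padding_mask"] -> [1, 1, 1, 1, 0, 0, 0, 0]
```

A real backbone would take a batched, tensor-valued version of this dict as its input.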
Still should probably do a closer careful read for typos, but here's some quick comments! Overall looking good.
@@ -0,0 +1,218 @@
# Model Contribution Guide
silly nit, but we have `CONTRIBUTING.md`, should we make this `CONTRIBUTING_MODELS.md` or something to keep the filename parallel?
MODEL_CONTRIBUTION_GUIDE.md
Outdated
This PR is optional and is for adding tasks like classifier, masked LM, etc. [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_classifier.py)]

## How is a Backbone model structured in KerasNLP?
To keep the code simple and readable, we follow
MODEL_CONTRIBUTION_GUIDE.md
Outdated
This PR is optional and is for adding tasks like classifier, masked LM, etc. [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_classifier.py)]

## How is a Backbone model structured in KerasNLP?
I think this whole section could still go into the "detailed instructions" for Step 2 (creating a backbone).
MODEL_CONTRIBUTION_GUIDE.md
Outdated
## Detailed Instructions
This section discusses, in detail, every necessary step.

### Open an issue/Find an open issue
Step 1: ...
MODEL_CONTRIBUTION_GUIDE.md
Outdated
### PR #3: Add XX Presets
- [ ] A `xx/xx_presets.py` file with links to weights uploaded to a personal GCP bucket/Google Drive [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets.py)].
- [ ] A `xx/xx_presets_test.py` file with runnable tests for each preset [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_presets_test.py)].
- [ ] A `tools/checkpoint_conversion/convert_xx_checkpoints.py` which is a reusable script for converting checkpoints [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/convert_distilbert_checkpoints.py)].
LGTM! Gave a last pass with some copy edits and other comments.
CONTRIBUTING_MODELS.md
Outdated
@@ -0,0 +1,235 @@
# Model Contribution Guide

KerasNLP has a plethora of cutting-edge "backbone" model ranging from BERT and
cutting-edge "backbone" model -> pre-trained large language models
CONTRIBUTING_MODELS.md
Outdated
# Model Contribution Guide

KerasNLP has a plethora of cutting-edge "backbone" model ranging from BERT and
RoBERTa to FNet and DeBERTa. We are always looking for more models and are always
BERT and RoBERTa to FNet and DeBERTa -> BERT to OPT
CONTRIBUTING_MODELS.md
Outdated
open to contributions!

In this guide, we will walk you through the steps one needs to take in order to contribute
a backbone model. For illustration purposes, let's assume that you want to
a backbone model -> a new pre-trained model to KerasNLP
(we are talking about more than just "backbones" below)
CONTRIBUTING_MODELS.md
Outdated
- [ ] An `xx/xx_backbone.py` file which has the model graph [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone.py)].
- [ ] An `xx/xx_backbone_test.py` file which has unit tests for the backbone [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone_test.py)].
- [ ] A Colab notebook link in the PR description which matches the outputs of the implemented backbone model with the original source [[Example Link](https://colab.research.google.com/drive/1SeZWJorKWmwWJax8ORSdxKrxE25BfhHa?usp=sharing)].
Maybe say to include this in the PR description? Same for other colabs.
I have. "A Colab notebook link in the PR description..."
CONTRIBUTING_MODELS.md
Outdated
- [ ] An `xx/xx_backbone_test.py` file which has unit tests for the backbone [[Example Link](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_backbone_test.py)].
- [ ] A Colab notebook link in the PR description which matches the outputs of the implemented backbone model with the original source [[Example Link](https://colab.research.google.com/drive/1SeZWJorKWmwWJax8ORSdxKrxE25BfhHa?usp=sharing)].

### Step 3: PR #2 - Add XXTokenizer and XXPreprocessor [The preprocessor is optional at this stage]
I still think we could keep this simple and just list XXTokenizer, leave preprocessor as a follow up.
CONTRIBUTING_MODELS.md
Outdated
For a full list of the tokenizers KerasNLP offers, please visit [this link](https://keras.io/api/keras_nlp/tokenizers/) and make use of the tokenizer your model uses!

#### Preprocessor
If we ditch preprocessor from step 3 entirely, we could probably rework this whole section to be a bit shorter and go in the final step
CONTRIBUTING_MODELS.md
Outdated
#### Tokenizer

Most text models nowadays use subword tokenizers such as WordPiece, SentencePiece
and BPE Tokenizer. Hence, it becomes easy to implement the tokenizer layer for
Hence, it becomes easy to implement the tokenizer layer for the model; the tokenizer class inherits from one of these base tokenizer classes. -> Since KerasNLP has implementation of most popular subword tokenizers, usually the tokenizer class inherits from a base tokenizer class.
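As a plain-Python illustration of that pattern (toy classes made up for this sketch, not the KerasNLP API): the model-specific tokenizer mostly just configures a shared subword base class and exposes its special tokens as member attributes:

```python
class SubwordTokenizerBase:
    """Toy stand-in for a shared subword tokenizer base class."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)

    def token_to_id(self, token):
        # Map a token string to its integer id in the vocabulary.
        return self.vocabulary.index(token)


class ToyBertTokenizer(SubwordTokenizerBase):
    """Model tokenizer: inherits the base class, pins down special tokens."""

    def __init__(self, vocabulary):
        super().__init__(vocabulary)
        # Special tokens exposed as member attributes so downstream
        # preprocessing code can read them, as the guide describes.
        self.cls_token_id = self.token_to_id("[CLS]")
        self.sep_token_id = self.token_to_id("[SEP]")
        self.pad_token_id = self.token_to_id("[PAD]")


tok = ToyBertTokenizer(["[PAD]", "[CLS]", "[SEP]", "hello", "world"])
# tok.pad_token_id == 0, tok.cls_token_id == 1, tok.sep_token_id == 2
```

The real tokenizer classes additionally handle the actual subword splitting; the point here is only the inheritance-plus-special-tokens structure.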
CONTRIBUTING_MODELS.md
Outdated
mask token, pad token, etc. These have to be
[added as member attributes](https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/distil_bert/distil_bert_tokenizer.py#L91-L105)
to the tokenizer class. These member attributes are then accessed by the
preprocessor class.
accessed by the preprocessor class. -> accessed by preprocessor layers.
CONTRIBUTING_MODELS.md
Outdated
### Step 4: PR #3 - Add XX Presets

Once the two PR have been merged, you can open a PR for adding presets. For every
two PR -> backbone and tokenizer have been merged
CONTRIBUTING_MODELS.md
Outdated
After wrapping up the preset configuration file, you need to
add the `from_preset` function to all three classes, i.e., `DistilBertBackbone`,
`DistilBertTokenizer` and `DistilBertPreprocessor`,. Here is an
remove trailing comma, and potentially `DistilBertPreprocessor` as mentioned above.
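A minimal plain-Python sketch of the `from_preset` pattern being discussed (illustrative only; the real KerasNLP method also restores pretrained weights, and the names below are made up):

```python
# Hypothetical preset registry: in the guide's layout, the xx_presets.py file
# maps preset names to architecture config plus links to uploaded weights.
PRESETS = {
    "toy_base": {"hidden_dim": 64, "num_layers": 2},
}


class ToyBackbone:
    def __init__(self, hidden_dim, num_layers):
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers

    @classmethod
    def from_preset(cls, name):
        # Look up the named config and instantiate the class from it.
        # (A real implementation would also download and load weights.)
        return cls(**PRESETS[name])


model = ToyBackbone.from_preset("toy_base")
# model.hidden_dim == 64, model.num_layers == 2
```

The same classmethod shape is what gets attached to the tokenizer (and, if kept, the preprocessor) classes.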
Awesome work! We might be able to take this for a spin with the XLNet PR.
Very rudimentary guide (😭) for adding backbone models
Resolves #505