Update the new model template (#6019)
sgugger committed Jul 24, 2020
1 parent 295466a commit a884b7f
Showing 5 changed files with 989 additions and 571 deletions.
templates/adding_a_new_model/README.md (61 additions, 35 deletions)
@@ -1,64 +1,90 @@
-# How to add a new model in 🤗Transformers
+# How to add a new model in 🤗 Transformers
 
-This folder describes the process to add a new model in 🤗Transformers and provide templates for the required files.
+This folder describes the process to add a new model in 🤗 Transformers and provides templates for the required files.
 
-The library is designed to incorporate a variety of models and code bases. As such the process for adding a new model usually mostly consists in copy-pasting to relevant original code in the various sections of the templates included in the present repository.
+The library is designed to incorporate a variety of models and code bases. As such, the process for adding a new model
+usually consists of copy-pasting the relevant original code into the various sections of the templates included in
+the present folder.
 
 One important point, though, is that the library has the following goals, which impact the way models are incorporated:
 
-- one specific feature of the API is the capability to run the model and tokenizer inline. The tokenization code thus often have to be slightly adapted to allow for running in the python interpreter.
-- the package is also designed to be as self-consistent and with a small and reliable set of packages dependencies. In consequence, additional dependencies are usually not allowed when adding a model but can be allowed for the inclusion of a new tokenizer (recent examples of dependencies added for tokenizer specificities include `sentencepiece` and `sacremoses`). Please make sure to check the existing dependencies when possible before adding a new one.
+- One specific feature of the API is the capability to run the model and tokenizer inline. The tokenization code thus
+often has to be slightly adapted to allow for running in the Python interpreter.
+- The package is also designed to be self-consistent, with a small and reliable set of package dependencies. As a
+consequence, additional dependencies are usually not allowed when adding a model, but can be allowed for the
+inclusion of a new tokenizer (recent examples of dependencies added for tokenizer specificities include
+`sentencepiece` and `sacremoses`). Please make sure to check the existing dependencies whenever possible before adding
+a new one.
 
-For a quick overview of the library organization, please check the [QuickStart section of the documentation](https://huggingface.co/transformers/quickstart.html).
+For a quick overview of the general philosophy of the library and its organization, please check the
+[Philosophy section of the documentation](https://huggingface.co/transformers/philosophy.html).
 
 # Typical workflow for including a model
 
 Here is an overview of the general workflow:
 
-- [ ] add model/configuration/tokenization classes
-- [ ] add conversion scripts
-- [ ] add tests
-- [ ] add @slow integration test
-- [ ] finalize
+- [ ] Add model/configuration/tokenization classes.
+- [ ] Add conversion scripts.
+- [ ] Add tests and a `@slow` integration test.
+- [ ] Document your model.
+- [ ] Finalize.
 
-Let's detail what should be done at each step
+Let's detail what should be done at each step.
 
 ## Adding model/configuration/tokenization classes
 
 Here is the workflow for adding model/configuration/tokenization classes:
 
-- [ ] copy the python files from the present folder to the main folder and rename them, replacing `xxx` with your model name,
-- [ ] edit the files to replace `XXX` (with various casing) with your model name
-- [ ] copy-paste or create a simple configuration class for your model in the `configuration_...` file
-- [ ] copy-paste or create the code for your model in the `modeling_...` files (PyTorch and TF 2.0)
-- [ ] copy-paste or create a tokenizer class for your model in the `tokenization_...` file
+- [ ] Copy the Python files from the present folder to the main folder and rename them, replacing `xxx` with your model
+name.
+- [ ] Edit the files to replace `XXX` (with various casing) with your model name.
+- [ ] Copy-paste or create a simple configuration class for your model in the `configuration_...` file.
+- [ ] Copy-paste or create the code for your model in the `modeling_...` files (PyTorch and TF 2.0).
+- [ ] Copy-paste or create a tokenizer class for your model in the `tokenization_...` file.
 
-# Adding conversion scripts
+## Adding conversion scripts
 
 Here is the workflow for the conversion scripts:
 
-- [ ] copy the conversion script (`convert_...`) from the present folder to the main folder.
-- [ ] edit this script to convert your original checkpoint weights to the current pytorch ones.
+- [ ] Copy the conversion script (`convert_...`) from the present folder to the main folder.
+- [ ] Edit this script to convert your original checkpoint weights to the current PyTorch ones.
 
-# Adding tests:
+## Adding tests
 
 Here is the workflow for adding tests:
 
-- [ ] copy the python files from the `tests` sub-folder of the present folder to the `tests` subfolder of the main folder and rename them, replacing `xxx` with your model name,
-- [ ] edit the tests files to replace `XXX` (with various casing) with your model name
-- [ ] edit the tests code as needed
+- [ ] Copy the Python files from the `tests` subfolder of the present folder to the `tests` subfolder of the main
+folder and rename them, replacing `xxx` with your model name.
+- [ ] Edit the test files to replace `XXX` (with various casing) with your model name.
+- [ ] Edit the test code as needed.
 
-# Final steps
+## Documenting your model
+
+Here is the workflow for documentation:
+
+- [ ] Make sure all your arguments are properly documented in your configuration and tokenizer.
+- [ ] Most of the documentation of the models is automatically generated; you just have to make sure that
+`XXX_START_DOCSTRING` contains an introduction to the model you're adding and a link to the original
+article, and that `XXX_INPUTS_DOCSTRING` contains all the inputs of your model.
+- [ ] Create a new page `xxx.rst` in the folder `docs/source/model_doc` and add this file to `docs/source/index.rst`.
+
+Make sure you have no Sphinx warnings when building the documentation locally, and follow our
+[documentation guide](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification).
+
+## Final steps
 
 You can then finish the addition step by adding imports for your classes in the common files:
 
-- [ ] add import for all the relevant classes in `__init__.py`
-- [ ] add your configuration in `configuration_auto.py`
-- [ ] add your PyTorch and TF 2.0 model respectively in `modeling_auto.py` and `modeling_tf_auto.py`
-- [ ] add your tokenizer in `tokenization_auto.py`
-- [ ] add your models and tokenizer to `pipeline.py`
-- [ ] add a link to your conversion script in the main conversion utility (in `commands/convert.py`)
-- [ ] edit the PyTorch to TF 2.0 conversion script to add your model in the `convert_pytorch_checkpoint_to_tf2.py` file
-- [ ] add a mention of your model in the doc: `README.md` and the documentation itself at `docs/source/pretrained_models.rst`.
-- [ ] upload the pretrained weights, configurations and vocabulary files.
-- [ ] create model card(s) for your models on huggingface.co. For those last two steps, check the [model sharing documentation](https://github.com/huggingface/transformers#quick-tour-of-model-sharing).
+- [ ] Add imports for all the relevant classes in `__init__.py`.
+- [ ] Add your configuration in `configuration_auto.py`.
+- [ ] Add your PyTorch and TF 2.0 models in `modeling_auto.py` and `modeling_tf_auto.py` respectively.
+- [ ] Add your tokenizer in `tokenization_auto.py`.
+- [ ] Add your models and tokenizer to `pipeline.py`.
+- [ ] Add a link to your conversion script in the main conversion utility (in `commands/convert.py`).
+- [ ] Edit the PyTorch to TF 2.0 conversion script to add your model in the `convert_pytorch_checkpoint_to_tf2.py`
+file.
+- [ ] Add a mention of your model to `README.md` and to the documentation itself
+in `docs/source/index.rst` and `docs/source/pretrained_models.rst`.
+- [ ] Upload the pretrained weights, configurations and vocabulary files.
+- [ ] Create model card(s) for your models on huggingface.co. For those last two steps, check the
+[model sharing documentation](https://huggingface.co/transformers/model_sharing.html).
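The first checklist items (copy the template Python files, rename them by replacing `xxx` with the model name, then replace `XXX` "with various casing" inside them) are mechanical enough to script. The following is a minimal sketch of a hypothetical helper, not something shipped with the templates. It assumes that template file names contain the lowercase placeholder `xxx` and that the placeholder appears in exactly three spellings, `XXX`, `Xxx` and `xxx`, mapped to the upper-case, capitalized and lower-case model name. That mapping is naive: a name with internal capitals cannot be derived from one lowercase string, and some `XXX` occurrences (such as `:class:`~transformers.XXXModel``) probably want the capitalized form rather than all caps, so the output still needs a manual review.

```python
import re
from pathlib import Path
from typing import List


def render_template(text: str, model_name: str) -> str:
    """Replace each casing of the ``xxx`` placeholder with the same casing of ``model_name``."""
    replacements = {
        "XXX": model_name.upper(),       # constants, e.g. XXX_START_DOCSTRING
        "Xxx": model_name.capitalize(),  # class names, e.g. XxxConfig
        "xxx": model_name.lower(),       # module names and model_type, e.g. "xxx"
    }
    # One case-sensitive pass over all three spellings, so text that has just
    # been substituted is never substituted a second time.
    pattern = re.compile("|".join(replacements))
    return pattern.sub(lambda match: replacements[match.group(0)], text)


def copy_templates(template_dir: Path, target_dir: Path, model_name: str) -> List[Path]:
    """Copy every ``*xxx*.py`` file of template_dir into target_dir, renamed and rendered."""
    written = []
    for src in sorted(template_dir.glob("*xxx*.py")):
        dst = target_dir / src.name.replace("xxx", model_name.lower())
        dst.write_text(render_template(src.read_text(encoding="utf-8"), model_name), encoding="utf-8")
        written.append(dst)
    return written
```

For example, `copy_templates(Path("templates/adding_a_new_model"), Path("src/transformers"), "mymodel")` would handle the model, configuration, tokenizer and conversion files, and a second call on the two `tests` subfolders would handle the tests. Both destination paths and the name `mymodel` are placeholders. A single regular-expression pass is used instead of three chained `str.replace` calls so that the order of the replacements never matters.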
templates/adding_a_new_model/configuration_xxx.py (62 additions, 70 deletions)
@@ -16,6 +16,7 @@
 
 
 import logging
+from typing import Callable, Union
 
 from .configuration_utils import PretrainedConfig
 
@@ -30,85 +31,76 @@
 
 class XxxConfig(PretrainedConfig):
     r"""
-    :class:`~transformers.XxxConfig` is the configuration class to store the configuration of a
-    `XxxModel`.
+    This is the configuration class to store the configuration of a :class:`~transformers.XXXModel`.
+    It is used to instantiate a XXX model according to the specified arguments, defining the model
+    architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of
+    the XXX `xxx-base-uncased <https://huggingface.co/xxx/xxx-base-uncased>`__ architecture.
+    Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used
+    to control the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig`
+    for more information.
 
-    Arguments:
-        vocab_size: Vocabulary size of `inputs_ids` in `XxxModel`.
-        hidden_size: Size of the encoder layers and the pooler layer.
-        num_hidden_layers: Number of hidden layers in the Transformer encoder.
-        num_attention_heads: Number of attention heads for each attention layer in
-            the Transformer encoder.
-        intermediate_size: The size of the "intermediate" (i.e., feed-forward)
-            layer in the Transformer encoder.
-        hidden_act: The non-linear activation function (function or string) in the
-            encoder and pooler. If string, "gelu", "relu", "swish" and "gelu_new" are supported.
-        hidden_dropout_prob: The dropout probabilitiy for all fully connected
-            layers in the embeddings, encoder, and pooler.
-        attention_probs_dropout_prob: The dropout ratio for the attention
-            probabilities.
-        max_position_embeddings: The maximum sequence length that this model might
-            ever be used with. Typically set this to something large just in case
-            (e.g., 512 or 1024 or 2048).
-        type_vocab_size: The vocabulary size of the `token_type_ids` passed into
-            `XxxModel`.
-        initializer_range: The sttdev of the truncated_normal_initializer for
-            initializing all weight matrices.
-        layer_norm_eps: The epsilon used by LayerNorm.
+    Args:
+        vocab_size (:obj:`int`, optional, defaults to 50257):
+            Vocabulary size of the XXX model. Defines the different tokens that
+            can be represented by the `input_ids` passed to the forward method of :class:`~transformers.XXXModel`.
+        hidden_size (:obj:`int`, optional, defaults to 1024):
+            Dimensionality of the encoder layers and the pooler layer.
+        num_hidden_layers (:obj:`int`, optional, defaults to 12):
+            Number of hidden layers in the Transformer encoder.
+        num_attention_heads (:obj:`int`, optional, defaults to 12):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        hidden_act (:obj:`str` or :obj:`function`, optional, defaults to :obj:`"gelu"`):
+            The non-linear activation function (function or string) in the encoder and pooler.
+            If string, :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
+        hidden_dropout_prob (:obj:`float`, optional, defaults to 0.1):
+            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+        attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0.1):
+            The dropout ratio for the attention probabilities.
+        max_position_embeddings (:obj:`int`, optional, defaults to 512):
+            The maximum sequence length that this model might ever be used with.
+            Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
+        type_vocab_size (:obj:`int`, optional, defaults to 2):
+            The vocabulary size of the `token_type_ids` passed into :class:`~transformers.XXXModel`.
+        initializer_range (:obj:`float`, optional, defaults to 0.02):
+            The standard deviation of the :obj:`truncated_normal_initializer` for initializing all weight matrices.
+        layer_norm_epsilon (:obj:`float`, optional, defaults to 1e-5):
+            The epsilon used by the layer normalization layers.
+        gradient_checkpointing (:obj:`bool`, optional, defaults to :obj:`False`):
+            If :obj:`True`, use gradient checkpointing to save memory at the expense of a slower backward pass.
+        kwargs:
+            Additional arguments for common configurations, passed to :class:`~transformers.PretrainedConfig`.
     """
     model_type = "xxx"
 
     def __init__(
         self,
-        vocab_size=50257,
-        n_positions=1024,
-        n_ctx=1024,
-        n_embd=768,
-        n_layer=12,
-        n_head=12,
-        resid_pdrop=0.1,
-        embd_pdrop=0.1,
-        attn_pdrop=0.1,
-        layer_norm_epsilon=1e-5,
-        initializer_range=0.02,
-        summary_type="cls_index",
-        summary_use_proj=True,
-        summary_activation=None,
-        summary_proj_to_labels=True,
-        summary_first_dropout=0.1,
+        vocab_size: int = 50257,
+        hidden_size: int = 1024,
+        num_hidden_layers: int = 12,
+        num_attention_heads: int = 12,
+        hidden_act: Union[str, Callable] = "gelu",
+        hidden_dropout_prob: float = 0.1,
+        attention_probs_dropout_prob: float = 0.1,
+        max_position_embeddings: int = 512,
+        type_vocab_size: int = 2,
+        initializer_range: float = 0.02,
+        layer_norm_epsilon: float = 1e-5,
+        gradient_checkpointing: bool = False,
         **kwargs
     ):
         super().__init__(**kwargs)
         self.vocab_size = vocab_size
-        self.n_ctx = n_ctx
-        self.n_positions = n_positions
-        self.n_embd = n_embd
-        self.n_layer = n_layer
-        self.n_head = n_head
-        self.resid_pdrop = resid_pdrop
-        self.embd_pdrop = embd_pdrop
-        self.attn_pdrop = attn_pdrop
-        self.layer_norm_epsilon = layer_norm_epsilon
+        self.hidden_size = hidden_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.hidden_act = hidden_act
+        self.hidden_dropout_prob = hidden_dropout_prob
+        self.attention_probs_dropout_prob = attention_probs_dropout_prob
+        self.max_position_embeddings = max_position_embeddings
+        self.type_vocab_size = type_vocab_size
         self.initializer_range = initializer_range
-        self.summary_type = summary_type
-        self.summary_use_proj = summary_use_proj
-        self.summary_activation = summary_activation
-        self.summary_first_dropout = summary_first_dropout
-        self.summary_proj_to_labels = summary_proj_to_labels
-
-    @property
-    def max_position_embeddings(self):
-        return self.n_positions
-
-    @property
-    def hidden_size(self):
-        return self.n_embd
-
-    @property
-    def num_attention_heads(self):
-        return self.n_head
-
-    @property
-    def num_hidden_layers(self):
-        return self.n_layer
+        self.layer_norm_epsilon = layer_norm_epsilon
+        self.gradient_checkpointing = gradient_checkpointing
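The README's documentation checklist asks that every argument be properly documented in the configuration. With the `Args:` style used in this docstring, each entry repeats its default (`defaults to ...`), and keeping those entries in sync with `__init__` by hand is easy to get wrong. Below is a hypothetical standalone check, not a transformers utility, built on `inspect.signature` and a regular expression for the `name (:obj:`type`, optional, defaults to value):` entry style. It assumes one entry per line in exactly that style, and it only compares defaults that `ast.literal_eval` can parse. `ToyConfig` is made up for the demonstration and deliberately contains one wrong default and one undocumented argument.

```python
import ast
import inspect
import re
from typing import Dict, List

# Matches an Args entry such as "vocab_size (:obj:`int`, optional, defaults to 30522):".
ENTRY = re.compile(r"^\s*(\w+) \((.*)\):\s*$", re.MULTILINE)


def check_config_docstring(cls) -> Dict[str, List[str]]:
    """Report __init__ arguments that are undocumented or documented with a wrong default."""
    doc = inspect.getdoc(cls) or ""
    documented = {name: spec for name, spec in ENTRY.findall(doc)}
    report: Dict[str, List[str]] = {"undocumented": [], "wrong_default": []}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        # Skip self, *args and **kwargs: only named arguments need an entry.
        if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        if name not in documented:
            report["undocumented"].append(name)
            continue
        match = re.search(r"defaults to (.+)$", documented[name])
        if match is None or param.default is inspect.Parameter.empty:
            continue
        # Strip Sphinx markup such as :obj:`"gelu"` before parsing the literal.
        text = re.sub(r":obj:|`", "", match.group(1)).strip()
        try:
            doc_default = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            continue  # not a plain Python literal, so it cannot be compared
        if doc_default != param.default:
            report["wrong_default"].append(name)
    return report


class ToyConfig:
    r"""
    Args:
        vocab_size (:obj:`int`, optional, defaults to 30522):
            Vocabulary size of the model.
        hidden_act (:obj:`str`, optional, defaults to :obj:`"gelu"`):
            The non-linear activation function.
    """

    def __init__(self, vocab_size: int = 50257, hidden_act: str = "gelu",
                 layer_norm_epsilon: float = 1e-5, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_act = hidden_act
        self.layer_norm_epsilon = layer_norm_epsilon


print(check_config_docstring(ToyConfig))
# {'undocumented': ['layer_norm_epsilon'], 'wrong_default': ['vocab_size']}
```

Because `ast.literal_eval` parses the documented value, `1e-5` and `1e-05` compare equal, and quoted strings such as `"gelu"` match the Python default `"gelu"`. Such a check could run alongside the model's tests, but where to hook it in is left open here.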