Add model like #14992
Conversation
This is really cool, looking forward to it!
Thank you for your PR, @sgugger! This is a fantastic addition which will be extremely helpful. Having it work perfectly is a complex problem, so I'm very impressed to see it perform so well already.
Questions/suggestions:
- It generates the conversion script, but I'm not sure this is relevant/exact in most cases. I would put it behind a question as well: "Would you like to have the same conversion script as XXX in order to convert from an original checkpoint?"
- `# Copied from` statements appear twice in case we're adding a model like another which already has copied-from statements. The two statements do seem to copy to the appropriate model name chosen, however.
- I tried to add a model like the `roberta` model; here are some attributes I would have expected to see changed, but which remained like the original:
  - The introduction of the model file: `"""PyTorch RoBERTa model."""`
  - The introduction to the model classes, e.g. in `src/transformers/models/roberta/modeling_roberta.py` (lines 878 to 880 at 5f3c57f):
    ```python
    @add_start_docstrings(
        """RoBERTa Model with a `language modeling` head on top for CLM fine-tuning.""",
        ROBERTA_START_DOCSTRING,
    )
    ```
- The copyright at the top is a complex question IMO, as the code is definitely inheriting from another so the copyright should be kept - but shouldn't the script ask for the organization which authored the model so that the copyright to that org is also respected? Maybe not, if they don't modify much of it. Open question!
- All copyrights should be changed to 2022
- The integration test generated didn't use the checkpoint I specified; instead it changed the model identifier to include the lowercase model name I chose. Not necessarily important, nor worth the (most likely complex!) change.
- I think the current approach is only set to work with NLP models? (At least models that have a tokenizer, even if non-NLP.) Trying it with `detr` for example throws a `KeyError`, as `detr` is not in `TOKENIZER_MAPPING_NAMES`.
- Absolutely love this:
  ```
  What is the model you would like to duplicate? s2t
  s2t is not a valid model type.
  What is the model you would like to duplicate? speech2text
  speech2text is not a valid model type.
  Did you mean speech_to_text or speech_to_text_2 or unispeech?
  ```
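Behavior like this is easy to get from the standard library. A minimal sketch using `difflib.get_close_matches` (the PR's actual implementation may differ, and `MODEL_TYPES` here is a small stand-in list, not the real registry):

```python
import difflib

# Stand-in for the library's registry of valid model types.
MODEL_TYPES = ["speech_to_text", "speech_to_text_2", "unispeech", "gpt2", "bert"]

def suggest(model_type):
    """Return None if the type is valid, else up to 3 close matches."""
    if model_type in MODEL_TYPES:
        return None  # valid, nothing to suggest
    return difflib.get_close_matches(model_type, MODEL_TYPES, n=3, cutoff=0.5)

print(suggest("speech2text"))
```

With the stand-in list above, `suggest("speech2text")` includes `speech_to_text`, mirroring the "Did you mean" prompt.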
`.github/workflows/add-model-like.yml` (outdated):

```yaml
      - "src/**"
      - "tests/**"
      - ".github/**"
    types: [assigned, opened, synchronize, reopened]
```
(I know this is in the original model templates too.) I don't think `assigned` is necessary, as it will relaunch the tests when the PR is assigned to someone, even though the tests will already have been run.
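With `assigned` dropped, the `pull_request` trigger could look like this (a sketch only; the path filters are the ones from the snippet above, the surrounding keys are the standard workflow syntax):

```yaml
on:
  pull_request:
    paths:
      - "src/**"
      - "tests/**"
      - ".github/**"
    types: [opened, synchronize, reopened]
```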
Will remove!
```json
"frameworks": [
    "pt",
    "tf",
    "flax"
]
```
Cool!
Once you're happy with it, we can then put it to practical use and give it a good real-case test by re-doing #14084, which got out of sync with all the recent revamps. So basically cloning
Note that GPT-2 is the model where this command is the most likely to fail, as its checkpoint name is the same as its model type / lower-cased model name, which then creates bad replacements I can't really control. It will also duplicate the
But I'm not modifying GPT-Neo or GPT-J. I don't understand why GPT to GPTMeg is different from GPTNeo to GPTMeg. Is it because GPTNeo has a suffix after the prefix GPT? Could you please give an example of what you mean when you say it'd fail? Perhaps it can be done in 2 steps: GPT to XYZ, and then a few one-liners to rename XYZ to the target?
Like I said, it's because the checkpoint for GPT-2 is named gpt2, which is also the model type prefix for GPT-2. It's the only model that conflates the two, which results in instances of the checkpoint being replaced not by the checkpoint of the new model but by the model type of the new model. So the generated files need careful proofreading.
Just run the command and look at the result.
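The collision described here can be shown with a toy string replacement (the file contents below are made up purely for illustration):

```python
# For GPT-2 the checkpoint name and the model type are the same string,
# so a blind replace that renames the model type also clobbers checkpoint
# identifiers, which should have stayed "gpt2".
source = 'checkpoint = "gpt2"  # model_type: gpt2'

# Renaming the model type gpt2 -> gpt_new with a plain string replace:
renamed = source.replace("gpt2", "gpt_new")
print(renamed)  # checkpoint = "gpt_new"  # model_type: gpt_new
```

For most models the two strings differ (e.g. model type `bert` vs. checkpoint `bert-base-uncased`), so the replacements can be kept apart; gpt2 is the case where they can't.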
Force-pushed from c3a6fa6 to 877f5fe.
This PR is now in a state that is good in my opinion. I have:
Also, there was a fix on the check-copies script on master which makes all the quality tests pass when the new model added includes the
I don't have much more time to spend on this, so for the following problems (or potential bugs) I'd like us to rely on the community. The test suite added is there to catch any regression.
This could be a nice feature to add, but it's also super easy to just remove the conversion scripts, since they are completely independent.
This is fixed.
I haven't touched the copyrights at all. The new model author should add themselves manually, but if nothing much is changed, I think it's good to keep the defaults to the authors of the copied model. For the change of year, we can make it a good first issue.
For all intents and purposes, this seems to work very well!
I tested with the following starting models:
- DETR
- BERT
- RoBERTa
- Wav2Vec2
- Vision encoder decoder
One bug I found with the vision encoder-decoder class (though it's questionable whether this class should be supported by this script) is that the resulting code still has the `[CONFIG_CLASS]` placeholder everywhere.
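A follow-up could guard against this with a post-generation scan for unsubstituted template markers. A sketch, where the placeholder pattern is an assumption based on the bug above, not the PR's actual code:

```python
import re

# Assumed shape of template placeholders, e.g. [CONFIG_CLASS].
PLACEHOLDER = re.compile(r"\[[A-Z_]+\]")

def leftover_placeholders(text):
    # Collect unique unsubstituted template markers in generated source.
    return sorted(set(PLACEHOLDER.findall(text)))

generated = 'class NewModel:\n    config_class = [CONFIG_CLASS]\n'
print(leftover_placeholders(generated))  # ['[CONFIG_CLASS]']
```

Running this over each generated file and failing when the list is non-empty would have caught the vision encoder-decoder case.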
Other than that, it looks great!
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Noted for the bug in
Failure is unrelated, so merging this!
What does this PR do?
This PR adds a new command, `transformers-cli add-new-model-like`, that creates a new model in the Transformers library exactly like an existing one, for models we want to add that are just tweaked versions of an existing model.

Fixes #14032
Note that this PR is still a draft; I'm just putting all of this here so that @stas00 can start playing around with it. There are still some rough edges, including:
Also, it needs to be cleaned up and tested before a merge, but I'm going on vacation and this way there is a prototype available to play with while I'm not here :-)
The whole thing works with string matches/replacements, so it can easily fall apart when a model has the same string of characters for its model type and its checkpoint (like `gpt2`), or for its camel-cased and upper-cased names (like GPT2). I have used gpt2 as a test model to make sure those are not out of control, but it's still a good idea to proofread the result.

Other than that the command:
There are some rough edges still:
Those can be fixed in followup PRs if necessary, but I think they are acceptable steps for a user to fix manually for now. The test added creates a DistilBert-like model (can't use Bert, as seen above) and checks that the added model passes all the quality checks and that its common PyTorch tests pass (running the TF/Flax tests is significantly slower, but they pass too).
To test right now, use an environment where the registered transformers library is the clone of the repo you are working on (it might work if it isn't, since the transformers module is only used to get the constants in the auto modules, but that's untested).
The easiest is to run `transformers-cli add-new-model-like` and follow the prompts. Otherwise, create a config.json file like the "add_new_model_config.json" one in the first commits of this PR (provided as an example, it has been removed in the final version) with content like this:
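Purely as an illustration of the shape such a file might take; apart from `frameworks`, which appears in the reviewed diff above, every field name below is an assumption rather than the actual schema, so refer to the example file in the PR's early commits for the real format:

```json
{
  "old_model_type": "distilbert",
  "new_model_name": "BertNew",
  "checkpoint": "huggingface/bert-new-base",
  "frameworks": [
    "pt",
    "tf",
    "flax"
  ]
}
```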
then run
Note that you may have to redo an editable install of the repo with this branch checked out to properly register the new CLI command.