MPNet: Masked and Permuted Pre-training for Language Understanding #8971
Conversation
* refactor
* further refactor
* fix the rest tomorrow
* save intermediate
* finish slow tokenizer
* make more tests pass
* finish refactor
* fix comment
* clean further
* fix name
* fix naming
* Update src/transformers/models/reformer/tokenization_reformer.py
* Apply suggestions from code review
* Apply suggestions from code review
* refactor
* fix init tokenizers
* refactor
* improve convert
* refactor
* correct convert slow tokenizer
* final fix for Pegasus Tok
* remove ipdb
* improve links
* Fix minor typos
* Additional typos
* Style fix

Co-authored-by: guyrosin <guyrosin@assist-561.cs.technion.ac.il>
* implement job skipping for doc-only PRs
* silent grep is crucial
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* let's add doc
* let's add code
* revert test commits
* restore
* Better name
* Better name
* Better name
* some more testing
* some more testing
* some more testing
* finish testing
* Migration guide from v3.x to v4.x
* Better wording
* Apply suggestions from code review
* Sylvain's comments
* Better wording.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add T5 Encoder class for feature extraction
* fix T5 encoder add_start_docstrings indent
* update init with T5 encoder
* update init with TFT5ModelEncoder
* remove TFT5ModelEncoder
* change T5ModelEncoder order in init
* add T5ModelEncoder to transformers init
* clean T5ModelEncoder
* update init with TFT5ModelEncoder
* add TFModelEncoder for Tensorflow
* update init with TFT5ModelEncoder
* Update src/transformers/models/t5/modeling_t5.py: change output from Seq2SeqModelOutput to BaseModelOutput
* remove encoder_outputs:
  1. remove encoder_outputs from the function call.
  2. remove the encoder_outputs if statement.
  3. remove isinstance from return_dict.
* Authorize missing decoder keys
* remove unnecessary input parameters (remove past_key_values and use_cache)
* remove use_cache from the forward method
* add docstring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING
* change return_dict to dot access
* add T5_ENCODER_INPUTS_DOCSTRING for TF T5
* change TFT5Encoder output type to BaseModelOutput
* remove unnecessary parameters for TFT5Encoder
* remove unnecessary if statement
* add import BaseModelOutput
* fix BaseModelOutput typo to TFBaseModelOutput
* update T5 doc with T5ModelEncoder
* add T5ModelEncoder to tests
* finish pytorch
* finish docs and mt5
* add mtf to init
* fix init
* remove n_positions
* finish PR
* Update src/transformers/models/mt5/modeling_mt5.py
* Update src/transformers/models/t5/modeling_t5.py
* Update src/transformers/models/t5/modeling_tf_t5.py
* Update src/transformers/models/mt5/modeling_tf_mt5.py
* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
… is_world_master() (huggingface#8828)
Related issue: huggingface#8837
* Use model.from_pretrained for DataParallel also

  When training on multiple GPUs, the code wraps the model with torch.nn.DataParallel. However, if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end. This commit uses the underlying model during load_best_model_at_end and re-wraps the loaded model with DataParallel. If you choose to reject this change, could you please move this logic to a function, e.g. `def load_best_model_checkpoint(best_model_checkpoint)` or something, so that it can be overridden?
* Fix silly bug
* Address review comments

  Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check whether `self.model` is an instance of `PreTrained`, otherwise we would still not get into that `if` section, right?
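The unwrapping described in that commit can be sketched in isolation like this (the helper name is my own, not the exact Trainer code):

```python
import torch.nn as nn

def unwrap_model(model: nn.Module) -> nn.Module:
    # torch.nn.DataParallel keeps the wrapped model on `.module`;
    # any custom `from_pretrained` logic lives on that underlying model,
    # so load from it and re-wrap afterwards.
    return model.module if isinstance(model, nn.DataParallel) else model

inner = nn.Linear(4, 2)
wrapped = nn.DataParallel(inner)
assert unwrap_model(wrapped) is inner   # unwraps DataParallel
assert unwrap_model(inner) is inner     # no-op for plain modules
```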
* Remove deprecated `evalutate_during_training`
* Update src/transformers/training_args_tf.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Slightly increase tolerance between pytorch and flax output
* test_multiple_sentences doesn't require torch
* Simplify parameterization on "jit" to use boolean rather than str
* Use `require_torch` on `test_multiple_sentences` because we pull the weight from the hub.
* Rename "jit" parameter to "use_jit" for (hopefully) making it self-documenting.
* Remove pytest.mark.parametrize which seems to fail in some circumstances
* Fix unused imports.
* Fix style.
* Give default parameters values for traced model.
* Review comment: Change sentences to sequences
* Apply suggestions from code review

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…ngface#8781)

* NerPipeline (TokenClassification) now outputs offsets of words

  - It happens that the offsets are missing, which forces the user to pattern-match the "word" from his input, which is not always feasible. For instance, if a sentence contains the same word twice, there is no way to know which is which.
  - This PR proposes to fix that by outputting two new keys for this pipeline's outputs, "start" and "end", which correspond to the string offsets of the word. That means we should always have the invariant:

  ```python
  input[entity["start"]: entity["end"]] == entity["entity_group"]  # or entity["entity"] if not grouped
  ```

* Fixing doc style
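As a concrete illustration of that invariant (the entity dict below is made up for the example, not real pipeline output):

```python
text = "HuggingFace is based in New York City"
# Hypothetical entity as the pipeline would now return it, with the new keys
entity = {"entity_group": "LOC", "word": "New York City", "start": 24, "end": 37}

# The offsets slice straight back into the original string, so a word that
# appears twice in the input is no longer ambiguous.
assert text[entity["start"]:entity["end"]] == "New York City"
```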
* fix DP case on multi-gpu
* make executable
* test all 3 modes
* use the correct check for distributed
* dp doesn't need a special case
* restore original name
* cleanup
* add CTRLForSequenceClassification
* pass local test
* merge with master
* fix modeling test for sequence classification
* fix deco
* fix assert
* Fix 2 typos: from_encoder_generator_configs --> from_question_encoder_generator_configs
* apply make style
…it contains. Fixes huggingface#6582. (huggingface#8860)

Update src/transformers/tokenization_utils_base.py with review fix

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* restore skip
* Revert "Remove deprecated `evalutate_during_training` (huggingface#8852)" (reverts commit 5530299)
* check that pipeline.git.base_revision is defined before proceeding
* Revert "Revert "Remove deprecated `evalutate_during_training` (huggingface#8852)"" (reverts commit dfec84d)
* check that pipeline.git.base_revision is defined before proceeding
* doc only
* doc + code
* restore
* restore
* typo
* Add a `distributed_env` property to TrainingArguments
* Change name
* Address comment
Impressive work, thanks a lot! There are a few last adjustments to make and then it will be good to merge!
Great work on this model!
```python
self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
self.bias = nn.Parameter(torch.zeros(config.vocab_size))

# Need a link between the two variables so that the bias is correctly resized with `resize_token_embeddings`
self.decoder.bias = self.bias
```
We were just saying today with @LysandreJik that this is a bad design, as it causes multiple problems with the equivalent PT/TF models afterwards. The variable `self.bias` is neither needed nor used, so L658 to L661 should be deleted.
@sgugger If I directly remove L658 to L661, it seems the model cannot pass `test_pt_tf_model_equivalence`,
since `self.decoder.bias` will then be randomly initialized. Maybe I can merge L658 to L661 into:

```python
self.decoder.bias = nn.Parameter(torch.zeros(config.vocab_size))
```

What is your opinion?
@sgugger It seems that only the original style (like L658 to L661) passes the tests.
Ok, then let's forget about this for now, we'll try to fix this on all models in another PR :-)
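For reference, the tying pattern under discussion can be reproduced in isolation; the sizes here are toy values, not the model config:

```python
import torch
import torch.nn as nn

hidden_size, vocab_size = 8, 32
decoder = nn.Linear(hidden_size, vocab_size, bias=False)
bias = nn.Parameter(torch.zeros(vocab_size))

# Link the two variables: after this assignment both names refer to the
# same tensor, which is why `resize_token_embeddings` (and the PT/TF
# equivalence test) sees a consistent bias instead of a random one.
decoder.bias = bias

assert decoder.bias is bias
bias.data[0] = 1.0
assert decoder.bias[0].item() == 1.0  # change is visible through both names
```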
utils/check_repo.py (Outdated)

```diff
@@ -37,6 +37,7 @@
     "TFDPRSpanPredictor",  # Building part of bigger (tested) model.
     "TFElectraMainLayer",  # Building part of bigger (tested) model (should it be a TFPreTrainedModel ?)
     "TFRobertaForMultipleChoice",  # TODO: fix
+    "TFMPNetForMultipleChoice",  # TODO: fix
```
Do we have someone working on this?
@sgugger I added one line to address this problem, but I am not sure its style is ok for the huggingface implementation.
The problem seems to be that the position ids in fairseq are different from the others (they start from 2). Since our model is also implemented in fairseq, it reports the same error as RoBERTa.
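The fairseq/RoBERTa-style offset mentioned here can be sketched as follows, mirroring the `create_position_ids_from_input_ids` helper used in the RoBERTa modeling code:

```python
import torch

def create_position_ids_from_input_ids(input_ids, padding_idx):
    # Real tokens get positions padding_idx + 1, padding_idx + 2, ...,
    # while padding tokens keep position padding_idx. With padding_idx = 1
    # this is why positions effectively start from 2.
    mask = input_ids.ne(padding_idx).int()
    incremental_indices = torch.cumsum(mask, dim=1) * mask
    return incremental_indices.long() + padding_idx

input_ids = torch.tensor([[5, 7, 9, 1, 1]])  # 1 is the padding index
print(create_position_ids_from_input_ids(input_ids, padding_idx=1))
# tensor([[2, 3, 4, 1, 1]])
```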
@sgugger Besides this, do you have any other comments?
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Great work!! I just left a few tiny comments to address before merging.
Co-authored-by: Julien Plu <plu.julien@gmail.com>
@jplu I have fixed your comments now.
Thanks!! @sgugger @patrickvonplaten @LysandreJik I'm seeing that the …
I observe that some models also lack …
Hey @StillKeepTry, we are super sorry: we had a problem yesterday with git, and this is why your git history is cluttered with wrong commits. I cleaned your PR and pushed it to a new branch on master here: #9004. However, if you want to be the main author of the PR (which is 100% understandable and what I would want!), you can do the following steps to open a new clean PR exactly like before: In your repo (https://github.com/StillKeepTry/transformers), assuming that the remote to the original Hugging Face repo (https://github.com/huggingface/transformers.git) is called
Lemme know if you need help doing this (or if you don't mind merging #9004 - but it would be fairer to you if you're also officially the main author!). Big sorry again!
@patrickvonplaten Never mind, just use your PR. I am fine with it as long as our work can be merged into master quickly.
Hello! I cannot see the data collator for permuted and masked language models. Was it also added inside HuggingFace? Is there an already proposed way to do this collation inside the trainer? Thanks!
@gaceladri we have an example for permutation language modeling, check it out here: https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_plm.py
Hi @LysandreJik, thank you for your kind response. The data collator you pointed me to is the permuted language modeling collator used in XLNet, right? I am unsure that this is a collator that can replicate MPNet, which masks tokens, not indices, and also does the permutation. Surely I am misunderstanding something...
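For what it's worth, the masked-and-permuted input construction being asked about could be sketched roughly like this. This is a toy illustration of the objective (function name and the fixed prediction ratio are my own choices), not the official MPNet collator:

```python
import torch

def permute_and_mask(input_ids, mask_token_id, pred_ratio=0.15):
    # Toy sketch: permute the sequence order, treat the tokens that land in
    # the last `pred_ratio` fraction of the permuted order as prediction
    # targets, and replace them with the mask token. Labels use -100 for
    # positions that should be ignored by the loss.
    seq_len = input_ids.size(0)
    perm = torch.randperm(seq_len)
    num_pred = max(1, int(seq_len * pred_ratio))
    pred_positions = perm[-num_pred:]

    labels = torch.full_like(input_ids, -100)
    labels[pred_positions] = input_ids[pred_positions]

    corrupted = input_ids.clone()
    corrupted[pred_positions] = mask_token_id
    return corrupted, perm, labels

ids = torch.arange(10, 20)
corrupted, perm, labels = permute_and_mask(ids, mask_token_id=0, pred_ratio=0.2)
```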
Model addition
MPNet
Model description
MPNet introduces a novel self-supervised objective named masked and permuted language modeling for language understanding. It inherits the advantages of both masked language modeling (MLM) and permuted language modeling (PLM) to address the limitations of each, and further reduces the inconsistency between the pre-training and fine-tuning paradigms.
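One toy way to see that last claim (my own illustration, not code from the paper): count how many input positions each objective can condition on when predicting the token at permuted step t, for a sequence of n tokens with the last m permuted positions predicted.

```python
def visible_context(objective, n, m, t):
    """Number of input positions visible at prediction step t (0-based)."""
    non_pred = n - m
    if objective == "mlm":
        return n                        # sees the whole (masked) sequence
    if objective == "plm":
        return non_pred + t             # only tokens before step t
    if objective == "mpnet":
        return non_pred + t + (m - t)   # plus mask placeholders for the rest
    raise ValueError(objective)

# n=10 tokens, m=2 predicted: MPNet conditions on full-sentence position
# information at every step (like fine-tuning inputs), while PLM sees
# only 8 positions at step 0 and 9 at step 1.
print([visible_context(o, 10, 2, 0) for o in ("mlm", "plm", "mpnet")])
# [10, 8, 10]
```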
What does this PR do?
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.