
MPNet: Masked and Permuted Pre-training for Language Understanding #8971

Closed
wants to merge 83 commits

Conversation

StillKeepTry
Contributor

Model addition

MPNet

Model description

MPNet introduces a novel self-supervised objective, masked and permuted language modeling, for language understanding. It inherits the advantages of both masked language modeling (MLM) and permuted language modeling (PLM), addressing the limitations of each, and further reduces the inconsistency between the pre-training and fine-tuning paradigms.
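
As a quick illustration of the model addition, here is a minimal usage sketch. It assumes the MPNetTokenizer and MPNetForMaskedLM classes introduced in this PR and the microsoft/mpnet-base checkpoint; names may differ from the final merged code.

```python
import torch
from transformers import MPNetTokenizer, MPNetForMaskedLM

# Load the pre-trained MPNet checkpoint (assumed to be published as microsoft/mpnet-base)
tokenizer = MPNetTokenizer.from_pretrained("microsoft/mpnet-base")
model = MPNetForMaskedLM.from_pretrained("microsoft/mpnet-base")

inputs = tokenizer("MPNet combines masked and <mask> language modeling.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Predict the token at the <mask> position
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_index].argmax(-1)))
```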

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

patrickvonplaten and others added 30 commits November 28, 2020 19:50
* refactor

* further refactor

* fix the rest tomorrow

* save intermediate

* finish slow tokenizer

* make more tests pass

* finish refactor

* fix comment

* clean further

* fix name

* fix naming

* Update src/transformers/models/reformer/tokenization_reformer.py

* Apply suggestions from code review

* Apply suggestions from code review

* refactor

* fix init tokenizers

* refactor

* improve convert

* refactor

* correct convert slow tokenizer

* final fix for Pegasus Tok

* remove ipdb

* improve links
* Fix minor typos

* Additional typos

* Style fix

Co-authored-by: guyrosin <guyrosin@assist-561.cs.technion.ac.il>
* implement job skipping for doc-only PRs

* silent grep is crucial

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* let's add doc

* let's add code

* revert test commits

* restore

* Better name

* Better name

* Better name

* some more testing

* some more testing

* some more testing

* finish testing
* Migration guide from v3.x to v4.x

* Better wording

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sylvain's comments

* Better wording.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add T5 Encoder class for feature extraction

* fix T5 encoder add_start_docstrings indent

* update init with T5 encoder

* update init with TFT5ModelEncoder

* remove TFT5ModelEncoder

* change T5ModelEncoder order in init

* add T5ModelEncoder to transformers init

* clean T5ModelEncoder

* update init with TFT5ModelEncoder

* add TFModelEncoder for Tensorflow

* update init with TFT5ModelEncoder

* Update src/transformers/models/t5/modeling_t5.py

change output from Seq2SeqModelOutput to BaseModelOutput

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* remove encoder_outputs

1. remove encoder_outputs from the function call.
2. remove the encoder_outputs If statement.
3. remove isinstance from return_dict.

* Authorize missing decoder keys

* remove unnecessary input parameters

remove pask_key_values and use_cache

* remove use_cache

remove use_cache from the forward method

* add doctoring for T5 encoder

add doctoring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING

* change return_dict to dot access

* add T5_ENCODER_INPUTS_DOCSTRING for TF T5

* change TFT5Encoder output type to BaseModelOutput

* remove unnecessary parameters for TFT5Encoder

* remove unnecessary if statement

* add import BaseModelOutput

* fix BaseModelOutput typo to TFBaseModelOutput

* update T5 doc with T5ModelEncoder

* add T5ModelEncoder to tests

* finish pytorch

* finish docs and mt5

* add mtf to init

* fix init

* remove n_positions

* finish PR

* Update src/transformers/models/mt5/modeling_mt5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/t5/modeling_tf_t5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/mt5/modeling_tf_mt5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
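
For context, a hedged usage sketch of the encoder-only T5 class added in the commits above (referred to there as T5ModelEncoder; it is assumed here to be exported as T5EncoderModel, with t5-small used purely for illustration):

```python
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5EncoderModel.from_pretrained("t5-small")

inputs = tokenizer("Studies have shown that owning a dog is good for you.", return_tensors="pt")
# Per the commits above, the encoder-only model returns a BaseModelOutput
outputs = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask)
features = outputs.last_hidden_state  # (batch, sequence_length, d_model)
```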
* Use model.from_pretrained for DataParallel also

When training on multiple GPUs, the code wraps a model with torch.nn.DataParallel. However if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end.

This commit uses the underlying model during load_best_model_at_end, and re-wraps the loaded model with DataParallel.

If you choose to reject this change, then could you please move this logic to a function, e.g. def load_best_model_checkpoint(best_model_checkpoint) or something similar, so that it can be overridden? (A sketch of such a helper follows below.)

* Fix silly bug

* Address review comments

Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check whether `self.model` is an instance of `PreTrainedModel`, otherwise we would still not get into that `if` section, right?
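
A hypothetical sketch of the load_best_model_checkpoint helper suggested above, not the actual Trainer code; it only illustrates unwrapping DataParallel, reloading with from_pretrained when available, and re-wrapping:

```python
import torch
from transformers import PreTrainedModel

def load_best_model_checkpoint(trainer, best_model_checkpoint):
    # Unwrap DataParallel so any custom from_pretrained logic applies to the real model
    model = trainer.model
    wrapped = isinstance(model, torch.nn.DataParallel)
    if wrapped:
        model = model.module

    if isinstance(model, PreTrainedModel):
        model = model.from_pretrained(best_model_checkpoint)
    else:
        state_dict = torch.load(f"{best_model_checkpoint}/pytorch_model.bin", map_location="cpu")
        model.load_state_dict(state_dict)

    # Re-wrap with DataParallel so training/evaluation continues as before
    trainer.model = torch.nn.DataParallel(model) if wrapped else model
    return trainer.model
```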
* Remove deprecated `evalutate_during_training`

* Update src/transformers/training_args_tf.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
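
For reference, a small sketch of the replacement for the removed flag, assuming the evaluation_strategy argument available in this version of TrainingArguments:

```python
from transformers import TrainingArguments

# Instead of the deprecated evaluate_during_training=True, pick an evaluation strategy
args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",  # evaluate every eval_steps training steps
    eval_steps=500,
)
```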
* Slightly increase tolerance between pytorch and flax output

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* test_multiple_sentences doesn't require torch

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Simplify parameterization on "jit" to use boolean rather than str

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use `require_torch` on `test_multiple_sentences` because we pull the weight from the hub.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rename "jit" parameter to "use_jit" for (hopefully) making it self-documenting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove pytest.mark.parametrize which seems to fail in some circumstances

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Give default parameters values for traced model.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Review comment: Change sentences to sequences

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…ngface#8781)

* NerPipeline (TokenClassification) now outputs offsets of words

- It happens that the offsets are missing, forcing the user to pattern-match the "word" in the input, which is not always feasible. For instance, if a sentence contains the same word twice, there is no way to tell which occurrence is which.
- This PR proposes to fix that by adding 2 new keys to this pipeline's outputs, "start" and "end", which correspond to the string offsets of the word. That means that we should always have the invariant:

```python
input[entity["start"]: entity["end"]] == entity["word"]
```

* Fixing doc style
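
A usage sketch of the new offsets described above, assuming the default NER pipeline checkpoint and the grouped_entities flag available at the time:

```python
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
text = "Hugging Face is based in New York City."
for entity in ner(text):
    # The new "start"/"end" keys recover the exact substring from the input
    print(entity["entity_group"], text[entity["start"]:entity["end"]])
```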
* fix DP case on multi-gpu

* make executable

* test all 3 modes

* use the correct check for distributed

* dp doesn't need a special case

* restore original name

* cleanup
* add CTRLForSequenceClassification

* pass local test

* merge with master

* fix modeling test for sequence classification

* fix deco

* fix assert
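
A minimal usage sketch for the new sequence classification head, assuming the ctrl checkpoint and two labels purely for illustration:

```python
from transformers import CTRLTokenizer, CTRLForSequenceClassification

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLForSequenceClassification.from_pretrained("ctrl", num_labels=2)

inputs = tokenizer("Reviews Rating: 4.0 A solid, well-paced movie.", return_tensors="pt")
logits = model(**inputs).logits  # shape: (batch, num_labels)
```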
* 2 typos - from_question_encoder_generator_configs

fix 2 typos
from_encoder_generator_configs --> from_question_encoder_generator_configs

* apply make style
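
For clarity, a sketch of the corrected helper name in use, with checkpoint names chosen only for illustration:

```python
from transformers import AutoConfig, RagConfig

question_encoder_config = AutoConfig.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
generator_config = AutoConfig.from_pretrained("facebook/bart-large")

# The fixed name: from_question_encoder_generator_configs (was from_encoder_generator_configs)
rag_config = RagConfig.from_question_encoder_generator_configs(
    question_encoder_config, generator_config
)
```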
…it contains. Fixes huggingface#6582. (huggingface#8860)

Update src/transformers/tokenization_utils_base.py with review fix

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* restore skip

* Revert "Remove deprecated `evalutate_during_training` (huggingface#8852)"

This reverts commit 5530299.

* check that pipeline.git.base_revision is defined before proceeding

* Revert "Revert "Remove deprecated `evalutate_during_training` (huggingface#8852)""

This reverts commit dfec84d.

* check that pipeline.git.base_revision is defined before proceeding

* doc only

* doc + code

* restore

* restore

* typo
* Add a `distributed_env` property to TrainingArguments

* Change name

* Address comment
Collaborator

@sgugger sgugger left a comment


Impressive work, thanks a lot! There are a few last adjustments to make and then it will be good to merge!
Great work on this model!

Comment on lines +657 to +661
self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
self.bias = nn.Parameter(torch.zeros(config.vocab_size))

# Need a link between the two variables so that the bias is correctly resized with `resize_token_embeddings`
self.decoder.bias = self.bias
Collaborator

We were just saying today with @LysandreJik that this is a bad design, as it causes multiple problems with the equivalent PT/TF models afterwards. The variable self.bias is neither needed nor used, so L658 to L661 should be deleted.

Contributor Author

@sgugger If I directly remove L658 to L661, it seems the model cannot pass test_pt_tf_model_equivalence, since self.decoder.bias would then be randomly initialized. Maybe I can merge L658 to L661 into:

self.decoder.bias = nn.Parameter(torch.zeros(config.vocab_size)). What is your opinion?
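
To make the trade-off discussed here concrete, a standalone sketch contrasting the two layouts (class names are illustrative only, not the PR's code):

```python
import torch
from torch import nn


class LMHeadWithTiedBias(nn.Module):
    """Layout kept in this PR: one tensor is exposed as both `bias` and
    `decoder.bias`, so the bias is correctly resized by `resize_token_embeddings`
    (see the code comment in the diff above)."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(vocab_size))
        self.decoder.bias = self.bias  # tie the two attributes to one tensor


class LMHeadWithMergedBias(nn.Module):
    """Layout proposed in the comment above: a single zero-initialized bias
    registered directly on the decoder, so only `decoder.bias` appears in the
    module's state dict."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        self.decoder.bias = nn.Parameter(torch.zeros(vocab_size))
```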

Contributor Author

@sgugger It seems that only the original style (L658 to L661) passes the tests.

Collaborator

Ok, then let's forget about this for now, we'll try to fix this on all models in another PR :-)

Additional review comments on src/transformers/models/mpnet/tokenization_mpnet.py (marked outdated/resolved).
@@ -37,6 +37,7 @@
"TFDPRSpanPredictor", # Building part of bigger (tested) model.
"TFElectraMainLayer", # Building part of bigger (tested) model (should it be a TFPreTrainedModel ?)
"TFRobertaForMultipleChoice", # TODO: fix
"TFMPNetForMultipleChoice", # TODO: fix
Collaborator

Do we have someone working on this?

Contributor Author

@sgugger I added one line to address this problem, but I am not sure its style fits the Hugging Face implementation.

https://github.com/StillKeepTry/transformers/blob/78dcc71fd96ec6a04739a422f38bab0532e42a8d/src/transformers/models/mpnet/modeling_tf_mpnet.py#L175

The problem seems to be that the position ids in fairseq are different from other implementations (they start from 2). Since our model is also implemented in fairseq, it hits the same error as RoBERTa.
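
For context, a sketch of the fairseq/RoBERTa convention being referred to, where real tokens are numbered from padding_idx + 1 (i.e. position ids start at 2 when padding_idx is 1); this mirrors the RoBERTa helper rather than the exact MPNet line linked above:

```python
import torch

def create_position_ids_from_input_ids(input_ids: torch.Tensor, padding_idx: int = 1) -> torch.Tensor:
    # Padding tokens keep `padding_idx`; real tokens are numbered from `padding_idx + 1`,
    # so the first real position id is 2, as in fairseq/RoBERTa.
    mask = input_ids.ne(padding_idx).int()
    incremental_indices = torch.cumsum(mask, dim=1) * mask
    return incremental_indices.long() + padding_idx
```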

Contributor Author

@sgugger Besides this, do you have any other comments?

StillKeepTry and others added 8 commits December 8, 2020 12:48
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Contributor

@jplu jplu left a comment

Great work!! I just left a few tiny comments to address before merging.

Additional review comments on src/transformers/models/mpnet/modeling_tf_mpnet.py (marked outdated/resolved).
StillKeepTry and others added 7 commits December 8, 2020 20:53
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
@StillKeepTry
Contributor Author

@jplu I have addressed your comments now.

@jplu
Contributor

jplu commented Dec 8, 2020

Thanks!!

@sgugger @patrickvonplaten @LysandreJik I see that TFMPNetForPreTraining and MPNetForPreTraining are missing from the TF and PT files. Should they be added? Otherwise it is fine for me :)

@StillKeepTry
Contributor Author

> Thanks!!
>
> @sgugger @patrickvonplaten @LysandreJik I see that TFMPNetForPreTraining and MPNetForPreTraining are missing from the TF and PT files. Should they be added? Otherwise it is fine for me :)

I notice that some other models also lack TFXXXForPreTraining and XXXForPreTraining. I am willing to add them in a follow-up.

@patrickvonplaten patrickvonplaten mentioned this pull request Dec 9, 2020
@patrickvonplaten
Contributor

Hey @StillKeepTry,

we are super sorry, we had a problem with git yesterday, which is why your git history got cluttered with wrong commits earlier. I cleaned your PR and pushed it to a new branch here: #9004 .
It should include all the commits you had earlier. I think we all gave our thumbs-up, so we could merge that other pull request to master (which would require the least amount of work from your side).

However, if you want to be the main author of the PR (which is 100% understandable and is what I would want!), can you follow these steps to open a new, clean PR exactly like the previous one:

In your repo (https://github.com/StillKeepTry/transformers), assuming that the remote pointing to the original Hugging Face repo (https://github.com/huggingface/transformers.git) is called upstream:

$ git fetch upstream
$ git checkout upstream/master
$ git checkout -b add_mp_net_new
# now we'll cherry-pick all of your commits
$ git cherry-pick 7361516^..78dcc71
$ git push
# => now you should be able to open a new PR with exactly the commits you had previously

Lemme know if you need help doing this (or if you don't mind merging #9004 - but it would be fairer to you if you're also officially the main author!).

Big sorry again!

@StillKeepTry
Contributor Author

@patrickvonplaten Never mind, just use your PR. I am happy as long as our work gets merged into master quickly.

@gaceladri

Hello! I cannot find the data collator for permuted and masked language modeling. Was it also added to Hugging Face? Is there already a proposed way to use this collator with the Trainer?

Thanks!

@LysandreJik
Member

@gaceladri we have an example for permutation language modeling, check it out here: https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_plm.py
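
For reference, a sketch of how the permutation language modeling collator used by run_plm.py is typically instantiated (xlnet-base-cased is assumed here only as an example checkpoint):

```python
from transformers import DataCollatorForPermutationLanguageModeling, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
data_collator = DataCollatorForPermutationLanguageModeling(
    tokenizer=tokenizer,
    plm_probability=1 / 6,  # ratio of masked-span length to surrounding context length
    max_span_length=5,      # maximum length of a span of masked tokens
)
```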

@gaceladri

Hi @LysandreJik, thank you for your kind response. The data collator you pointed me to is the permutation language modeling collator used for XLNet, right? I am not sure it can replicate MPNet, which masks tokens (not just indices) and also applies the permutation. Surely I am misunderstanding something...
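
To make the distinction concrete, here is a purely conceptual sketch (not part of this PR or the library) of the masking-plus-permutation step described in the MPNet paper: permute the sequence, mask the tail ~15% of the permuted order, and keep the original position ids:

```python
import torch

def mpnet_style_permute_and_mask(input_ids: torch.Tensor, mask_token_id: int, pred_ratio: float = 0.15):
    # Conceptual sketch only, not the official MPNet data collator.
    seq_len = input_ids.size(0)
    perm = torch.randperm(seq_len)                 # permuted order of positions
    num_pred = max(1, int(seq_len * pred_ratio))   # number of predicted tokens

    permuted_ids = input_ids[perm].clone()
    labels = torch.full_like(permuted_ids, -100)   # -100 is ignored by the loss
    labels[-num_pred:] = permuted_ids[-num_pred:]  # targets are the original tokens
    permuted_ids[-num_pred:] = mask_token_id       # mask the predicted part of the permutation

    position_ids = perm                            # original positions travel with the tokens
    return permuted_ids, position_ids, labels
```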
