[WIP] Add UniLM model #2160
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2160 +/- ##
==========================================
+ Coverage 80.32% 80.37% +0.04%
==========================================
Files 122 127 +5
Lines 18342 19000 +658
==========================================
+ Hits 14734 15272 +538
- Misses 3608 3728 +120
Continue to review full report at Codecov.
Thank you for the PR! I edited your post to add the guideline for adding a new model; we'll check the boxes as we go. I'll have a look at the code and come back to you quickly!
transformers/tokenization_unilm.py
@@ -0,0 +1,65 @@
# coding=utf-8
# Copyright 2018 The HuggingFace Inc. team.
You may want to change the license in the header to the license of the original code:
The MIT License (MIT)
Copyright (c) Microsoft Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
The license has been updated to MIT.
BertLayerNorm = torch.nn.LayerNorm


class BertSelfAttention(nn.Module):
What is the difference between BertSelfAttention defined here and the one defined in modeling_bert.py?
The forward() method has the argument history_states so that we can incrementally perform decoding without re-computing the hidden states of previous time steps.
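As an illustration of that caching pattern, here is a minimal, self-contained sketch (not the PR's actual code; the name history_states and the concatenation scheme follow the comment above):

```python
import torch
import torch.nn as nn


class CachedSelfAttention(nn.Module):
    # A toy self-attention layer: queries come only from the new time
    # steps, while keys/values also cover the cached history_states, so
    # earlier hidden states never have to be re-computed during decoding.
    def __init__(self, hidden_size, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def _split_heads(self, x):
        b, t, _ = x.size()
        return x.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, hidden_states, history_states=None):
        # Keys/values attend over history + new steps; queries only over new steps.
        kv_input = hidden_states if history_states is None else torch.cat(
            [history_states, hidden_states], dim=1)
        q = self._split_heads(self.query(hidden_states))
        k = self._split_heads(self.key(kv_input))
        v = self._split_heads(self.value(kv_input))
        scores = torch.matmul(q, k.transpose(-1, -2)) / self.head_dim ** 0.5
        context = torch.matmul(scores.softmax(dim=-1), v)
        b, _, t, _ = context.size()
        return context.transpose(1, 2).reshape(b, t, -1)
```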
        return context_layer


class BertAttention(nn.Module):
Same question?
The forward() method has the argument history_states so that we can incrementally perform decoding without re-computing the hidden states of previous time steps.
        return layer_output


class BertEncoder(nn.Module):
Same question?
We will confirm as soon as possible.
The forward() method has the arguments prev_embedding and prev_encoded_layers so that we can incrementally perform decoding without re-computing the hidden states of previous time steps.
@rlouf Sorry, I didn't find the
Reviewed
ready to merge
Ready to merge
Ready
Thanks, this is a huge contribution! Sorry for taking so long to circle back to it. I added a bunch of comments to try to make sure people after you don't break it :). Most important, I think, is using ModelTester.all_model_classes. Let me know if you need help getting that working :)
@require_torch
class UnilmModelTest(CommonTestCases.CommonModelTester):

    all_model_classes = ()
I think this should be set to the models you want to test, otherwise test_common doesn't actually hit any of your classes.
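For instance (a sketch; the class names UnilmModel and UnilmForSeq2SeqDecode are assumptions based on this PR, so list whatever classes it actually defines):

```python
if is_torch_available():
    from transformers import UnilmModel, UnilmForSeq2SeqDecode  # assumed names

@require_torch
class UnilmModelTest(CommonTestCases.CommonModelTester):

    # Register the concrete model classes so the shared tests in
    # test_common actually instantiate and exercise them.
    all_model_classes = (UnilmModel, UnilmForSeq2SeqDecode) if is_torch_available() else ()
```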
logger = logging.getLogger(__name__)


def detokenize(tk_list):
(nit) I prefer token_list as the name; I didn't know what tk was.
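For context, a rough sketch of what such a helper typically does, with the suggested name (an assumption; the PR's actual implementation may differ):

```python
def detokenize(token_list):
    # Rejoin WordPiece tokens: pieces prefixed with "##" are glued back
    # onto the preceding token.
    return " ".join(token_list).replace(" ##", "").strip()
```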
def get_best_sequence(sample, eos_id, pad_id, length_penalty=None, alpha=None, expect=None, min_len=None):
    # if not any((length_penalty, alpha, expect, min_len)):
delete or leave in
    return seq


def detokenize(tk_list):
This is defined elsewhere; please use that one.
@@ -0,0 +1,414 @@
# coding=utf-8
(nit) Should this be called train_seq2seq.py?
class UnilmForSeq2SeqDecode(UnilmPreTrainedModel):
    """refer to BertForPreTraining"""
So this is what you use for the pretraining task in the paper? A better docstring would help.
        return torch.cat(output_ids, dim=1)

    def beam_search(self, input_ids, token_type_ids, position_ids, attention_mask):
Possible to use PretrainedModel.generate?
def generate(
Either for this method or for forward.
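A sketch of what delegating to the shared decoding loop could look like (the keyword arguments here are assumptions and depend on the transformers version):

```python
# Instead of a hand-rolled beam_search, reuse PreTrainedModel's decoding loop:
output_ids = model.generate(
    input_ids,
    max_length=48,
    num_beams=5,
    length_penalty=1.0,
    eos_token_id=tokenizer.sep_token_id,  # UniLM seq2seq stops at [SEP]
)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
```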
# initialize new position embeddings
_k = 'bert.embeddings.position_embeddings.weight'
if _k in state_dict and config.max_position_embeddings != state_dict[_k].shape[0]:
    logger.info("config.max_position_embeddings != state_dict[bert.embeddings.position_embeddings.weight] ({0} - {1})".format(
OK, this part scares me a little because it can affect other models, so we need to be more cautious. It would be much better to resize after we load the model, with a resize_position_embeddings method, if possible. If that's not possible, I would rather only support the default max_position_embeddings than do this in such a central place.
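A minimal sketch of the suggested method (resize_position_embeddings is the reviewer's proposed name, not an existing API; the attribute path assumes a BERT-style model):

```python
import torch.nn as nn


def resize_position_embeddings(model, new_num_positions):
    # Swap in a larger (or smaller) position-embedding table, copying the
    # overlapping rows so the pretrained positions are preserved.
    old = model.bert.embeddings.position_embeddings
    new = nn.Embedding(new_num_positions, old.weight.size(1))
    new.weight.data.normal_(mean=0.0, std=model.config.initializer_range)
    num_to_copy = min(old.weight.size(0), new_num_positions)
    new.weight.data[:num_to_copy] = old.weight.data[:num_to_copy]
    model.bert.embeddings.position_embeddings = new
    model.config.max_position_embeddings = new_num_positions
```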
@require_torch
class UnilmModelTest(CommonTestCases.CommonModelTester):

    all_model_classes = ()
(major) This needs to include your classes so that the common tests run on them. Sorry if I said this elsewhere.
    def test_for_seq2seq_finetuning(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs_for_seq2seq_finetuning()
        self.model_tester.create_and_check_unilm_model_for_seq2seq_finetuning(*config_and_inputs)
Can you add an @slow test that hardcodes the features you want your pretrained transformer to produce (a la the RoBERTa tests) so that we know whether it changes somehow?
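A sketch of the kind of test being requested, modeled on the RoBERTa integration tests (the checkpoint name, token ids, and every expected value below are placeholders to be filled in from a trusted run):

```python
@require_torch
class UnilmModelIntegrationTest(unittest.TestCase):
    @slow
    def test_inference_no_head(self):
        model = UnilmModel.from_pretrained("unilm-base-cased")  # placeholder checkpoint name
        input_ids = torch.tensor([[101, 7592, 2088, 102]])  # placeholder token ids
        with torch.no_grad():
            sequence_output = model(input_ids)[0]
        # Hardcode a small slice of the expected activations so any silent
        # change to the pretrained model is caught (values are placeholders).
        expected_slice = torch.tensor(
            [[[-0.05, 0.12, -0.03], [-0.06, 0.14, 0.02], [-0.02, 0.15, 0.02]]]
        )
        self.assertTrue(torch.allclose(sequence_output[:, :3, :3], expected_slice, atol=1e-4))
```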
    attention_probs_dropout_prob=self.attention_probs_dropout_prob,
    max_position_embeddings=self.max_position_embeddings,
    type_vocab_size=self.type_vocab_size,
    is_decoder=False,
Is this ever True? I don't see decoders.
@sshleifer Thanks for the comments! We will incorporate them into the code. @addf400
Is anyone still working on this? @addf400 @donglixp @JetRunner
I'm also looking forward to applying the UniLM model via Huggingface Transformers!
It seems this pull request has been open for a year but is still not finished. Is someone still working on it?
Has this PR for the UniLM model been merged into Huggingface Transformers?
Thanks @stefan-it. I don't think MiniLM and UniLM are the same thing, although it all falls under one project. The MS papers are promising!
I'm also looking forward to applying the UniLM model via Huggingface Transformers!
It's 2022 and the UniLM model still hasn't been merged into the master branch.
I'm still looking forward to applying the UniLM model via Huggingface Transformers! 👻👻
I'm still looking forward to applying the UniLM model via Huggingface Transformers too!
Typical workflow for including a model
Here is an overview of the general workflow:

- [ ] Add model/configuration/tokenization classes
- [ ] Add conversion scripts
- [ ] Add tests
- [ ] Finalize

Let's detail what should be done at each step.

Adding model/configuration/tokenization classes

Here is the workflow for adding model/configuration/tokenization classes:

- [ ] Copy the python files from the present folder to the main folder and rename them, replacing xxx with your model name
- [ ] Edit the files to replace XXX (with various casing) with your model name
- [ ] Copy-paste or create a simple configuration class for your model in the configuration_... file
- [ ] Copy-paste or create the code for your model in the modeling_... files (PyTorch and TF 2.0)
- [ ] Copy-paste or create a tokenizer class for your model in the tokenization_... file

Adding conversion scripts

Here is the workflow for the conversion scripts:

- [ ] Copy the conversion script (convert_...) from the present folder to the main folder
- [ ] Edit this script to convert your original checkpoint weights to the current PyTorch ones

Adding tests

Here is the workflow for adding tests:

- [ ] Copy the python files from the tests sub-folder of the present folder to the tests subfolder of the main folder and rename them, replacing xxx with your model name
- [ ] Edit the test files to replace XXX (with various casing) with your model name
- [ ] Edit the test code as needed

Final steps

You can then finish the addition step by adding imports for your classes in the common files:

- [ ] Add imports for all the relevant classes in __init__.py
- [ ] Add your configuration in configuration_auto.py
- [ ] Add your PyTorch and TF 2.0 model respectively in modeling_auto.py and modeling_tf_auto.py
- [ ] Add your tokenizer in tokenization_auto.py
- [ ] Add your models and tokenizer to pipeline.py
- [ ] Add a link to your conversion script in the main conversion utility (currently in __main__ but will be moved to the commands subfolder in the near future)
- [ ] Edit the PyTorch to TF 2.0 conversion script to add your model in the convert_pytorch_checkpoint_to_tf2.py file
- [ ] Add a mention of your model in the doc: README.md and the documentation itself at docs/source/pretrained_models.rst