
[WIP] Add UniLM model #2160

Closed
wants to merge 26 commits into from

Conversation

@addf400 commented Dec 13, 2019

Typical workflow for including a model

Here is an overview of the general workflow:

  • add model/configuration/tokenization classes
  • add conversion scripts
  • add tests
  • finalize

Let's detail what should be done at each step.

Adding model/configuration/tokenization classes

Here is the workflow for adding model/configuration/tokenization classes:

  • copy the python files from the present folder to the main folder and rename them, replacing xxx with your model name,
  • edit the files to replace XXX (with various casing) with your model name
  • copy-paste or create a simple configuration class for your model in the configuration_... file (a minimal sketch follows this list)
  • copy-paste or create the code for your model in the modeling_... files (PyTorch and TF 2.0)
  • copy-paste or create a tokenizer class for your model in the tokenization_... file
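
As an illustration of the configuration step, here is a minimal sketch of what such a class can look like; the attribute names follow the usual BERT-style defaults and are not taken from this PR:

# configuration_xxx.py (sketch; illustrative only)
from .configuration_utils import PretrainedConfig


class XxxConfig(PretrainedConfig):
    """Stores the configuration of an XxxModel (illustrative only)."""

    def __init__(self, vocab_size=30522, hidden_size=768, num_hidden_layers=12,
                 num_attention_heads=12, intermediate_size=3072,
                 hidden_dropout_prob=0.1, max_position_embeddings=512,
                 type_vocab_size=2, initializer_range=0.02, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.intermediate_size = intermediate_size
        self.hidden_dropout_prob = hidden_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.type_vocab_size = type_vocab_size
        self.initializer_range = initializer_range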

Adding conversion scripts

Here is the workflow for the conversion scripts:

  • copy the conversion script (convert_...) from the present folder to the main folder.
  • edit this script to convert your original checkpoint weights to the current PyTorch ones (see the sketch below).
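
The following sketch shows the typical shape of such a script, assuming a plain torch checkpoint as input; the key-renaming rules are placeholders:

# convert_xxx_original_checkpoint_to_pytorch.py (sketch)
import argparse

import torch


def convert(original_checkpoint_path, pytorch_dump_path):
    original_state_dict = torch.load(original_checkpoint_path, map_location="cpu")
    new_state_dict = {}
    for old_key, tensor in original_state_dict.items():
        # map the original naming scheme onto the transformers one, e.g.
        # "encoder.layer_0.attn.q" -> "bert.encoder.layer.0.attention.self.query"
        new_key = old_key  # replace with the real renaming rules
        new_state_dict[new_key] = tensor
    torch.save(new_state_dict, pytorch_dump_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--original_checkpoint_path", required=True)
    parser.add_argument("--pytorch_dump_path", required=True)
    args = parser.parse_args()
    convert(args.original_checkpoint_path, args.pytorch_dump_path)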

Adding tests

Here is the workflow for adding tests:

  • copy the python files from the tests sub-folder of the present folder to the tests subfolder of the main folder and rename them, replacing xxx with your model name,
  • edit the tests files to replace XXX (with various casing) with your model name
  • edit the tests code as needed

Final steps

You can then finish the addition step by adding imports for your classes in the common files:

  • add imports for all the relevant classes in __init__.py (see the sketch after this list)
  • add your configuration in configuration_auto.py
  • add your PyTorch and TF 2.0 model respectively in modeling_auto.py and modeling_tf_auto.py
  • add your tokenizer in tokenization_auto.py
  • add your models and tokenizer to pipeline.py
  • add a link to your conversion script in the main conversion utility (currently in __main__ but will be moved to the commands subfolder in the near future)
  • edit the PyTorch to TF 2.0 conversion script to add your model in the convert_pytorch_checkpoint_to_tf2.py file
  • add a mention of your model in the doc: README.md and the documentation itself at docs/source/pretrained_models.rst.
  • upload the pretrained weights, configurations and vocabulary files.
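
As a sketch of the import step, the new lines in transformers/__init__.py would look roughly like this, assuming the module names introduced by this PR and class names that may differ in the final version:

# excerpt of transformers/__init__.py (sketch)
from .configuration_unilm import UnilmConfig
from .tokenization_unilm import UnilmTokenizer
from .modeling_unilm import UnilmModel, UnilmForSeq2SeqDecode

The auto classes (configuration_auto.py, modeling_auto.py, tokenization_auto.py) then need the same names added to their dispatch logic.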

@addf400 closed this Dec 13, 2019
@addf400 reopened this Dec 13, 2019
@codecov-io commented Dec 13, 2019

Codecov Report

Merging #2160 into master will increase coverage by 0.04%.
The diff coverage is 49.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2160      +/-   ##
==========================================
+ Coverage   80.32%   80.37%   +0.04%     
==========================================
  Files         122      127       +5     
  Lines       18342    19000     +658     
==========================================
+ Hits        14734    15272     +538     
- Misses       3608     3728     +120
Impacted Files Coverage Δ
transformers/modeling_utils.py 88.91% <15.38%> (-2.55%) ⬇️
transformers/configuration_auto.py 44.68% <33.33%> (-0.78%) ⬇️
transformers/tokenization_auto.py 59.18% <33.33%> (-1.69%) ⬇️
transformers/modeling_unilm.py 36.95% <36.95%> (ø)
transformers/modeling_auto.py 38.79% <53.84%> (+1.89%) ⬆️
transformers/tests/tokenization_unilm_test.py 55% <55%> (ø)
transformers/configuration_unilm.py 87.09% <87.09%> (ø)
transformers/tests/modeling_unilm_test.py 93.75% <93.75%> (ø)
transformers/tokenization_unilm.py 94.73% <94.73%> (ø)
... and 10 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f061606...bbacc86. Read the comment docs.

@rlouf changed the title from "UniLM update" to "[WIP] Add UniLM model" Dec 13, 2019
@rlouf (Contributor) commented Dec 13, 2019

Thank you for the PR! I edited your post to add the guideline for adding a new model; we'll check the boxes as we go. I'll have a look at the code and come back to you quickly!

@@ -0,0 +1,65 @@
# coding=utf-8
# Copyright 2018 The HuggingFace Inc. team.

Contributor:

You may want to replace the licence in the header with the licence of the original code:

The MIT License (MIT)

Copyright (c) Microsoft Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Author:

The licence has been updated to MIT.

transformers.code-workspace: outdated review thread (resolved)
BertLayerNorm = torch.nn.LayerNorm


class BertSelfAttention(nn.Module):

Contributor:

What is the difference between BertSelfAttention defined here and the one defined in modeling_bert.py?

Contributor:

The forward() method has the argument history_states so that we can incrementally perform decoding without re-computing the hidden states of previous time steps.
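
For readers following along, a minimal sketch of that pattern (simplified, not the PR's actual code): keys and values are computed over the cached states plus the new states, while queries cover only the new positions, so earlier time steps are never re-encoded.

import torch
import torch.nn as nn


class CachedSelfAttention(nn.Module):
    """Illustrative self-attention with a history_states cache (hypothetical class)."""

    def __init__(self, hidden_size, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def _split_heads(self, x):
        batch, seq_len, _ = x.size()
        return x.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, hidden_states, history_states=None):
        # keys/values see cached + new states; queries cover only the new positions
        kv_input = hidden_states if history_states is None else torch.cat(
            [history_states, hidden_states], dim=1)
        q = self._split_heads(self.query(hidden_states))
        k = self._split_heads(self.key(kv_input))
        v = self._split_heads(self.value(kv_input))
        scores = torch.matmul(q, k.transpose(-1, -2)) / self.head_dim ** 0.5
        context = torch.matmul(scores.softmax(dim=-1), v).transpose(1, 2).contiguous()
        batch, seq_len = context.size(0), context.size(1)
        return context.view(batch, seq_len, self.num_heads * self.head_dim)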

return context_layer


class BertAttention(nn.Module):

Contributor:

Same question?

Contributor:

The forward() method has the argument history_states so that we can incrementally perform decoding without re-computing the hidden states of previous time steps.

return layer_output


class BertEncoder(nn.Module):

Contributor:

Same question?

Author:

We will confirm as soon as possible.

Contributor:

The forward() method has the arguments prev_embedding and prev_encoded_layers so that we can incrementally perform decoding without re-computing the hidden states of previous time steps.

transformers/modeling_unilm.py: outdated review thread (resolved)
@donglixp (Contributor), quoting the checklist above:

Typical workflow for including a model

  • add your models and tokenizer to pipeline.py

@rlouf Sorry, I didn't find the pipeline.py file.

@addf400 (Author) left a comment:

Reviewed

@donglixp (Contributor) left a comment:

ready to merge

@addf400 (Author) left a comment:

Ready to merge

@addf400 (Author) left a comment:

Ready

@sshleifer (Contributor) left a comment:

Thanks, this is a huge contribution! Sorry for taking so long to circle back to it. I added a bunch of comments to try to make sure people after you don't break it :). Most important, I think, is using ModelTester.all_model_classes. Let me know if you need help getting that working :)

@require_torch
class UnilmModelTest(CommonTestCases.CommonModelTester):

all_model_classes = ()

Contributor:

I think this should be set to the models you want to test, otherwise test_common doesn't actually hit any of your classes.
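
A sketch of the requested change; the import paths assume the in-package tests layout of the time and the class names are taken from this PR, so treat both as assumptions:

from transformers import is_torch_available

from .modeling_common_test import CommonTestCases
from .utils import require_torch

if is_torch_available():
    from transformers.modeling_unilm import UnilmModel, UnilmForSeq2SeqDecode


@require_torch
class UnilmModelTest(CommonTestCases.CommonModelTester):

    # register the concrete classes so the shared tests actually exercise them;
    # with an empty tuple the common tests silently skip the new model
    all_model_classes = (UnilmModel, UnilmForSeq2SeqDecode) if is_torch_available() else ()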

logger = logging.getLogger(__name__)


def detokenize(tk_list):

Contributor:

(nit) I prefer token_list as the name; I didn't know what tk was.



def get_best_sequence(sample, eos_id, pad_id, length_penalty=None, alpha=None, expect=None, min_len=None):
# if not any((length_penalty, alpha, expect, min_len)):

Contributor:

delete or leave in

return seq


def detokenize(tk_list):

Contributor:

This is defined elsewhere; please use that one.

@@ -0,0 +1,414 @@
# coding=utf-8

Contributor:

(nit) should be called train_seq2seq.py?



class UnilmForSeq2SeqDecode(UnilmPreTrainedModel):
"""refer to BertForPreTraining"""

@sshleifer (Contributor) commented Feb 8, 2020:

So this is what you use for the pretraining task in the paper? A better docstring would help.


return torch.cat(output_ids, dim=1)

def beam_search(self, input_ids, token_type_ids, position_ids, attention_mask):

Contributor:

Possible to use PreTrainedModel.generate?

Contributor:

Either for this or for forward().
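
For illustration, the kind of call the reviewer has in mind, using current transformers API names and t5-small as a stand-in seq2seq checkpoint (nothing here is from this PR):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
output_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    num_beams=5,        # beam search handled by the shared generate() implementation
    max_length=32,
    early_stopping=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))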

# initialize new position embeddings
_k = 'bert.embeddings.position_embeddings.weight'
if _k in state_dict and config.max_position_embeddings != state_dict[_k].shape[0]:
logger.info("config.max_position_embeddings != state_dict[bert.embeddings.position_embeddings.weight] ({0} - {1})".format(

Contributor:

Ok, this part scares me a little because it can affect other models, so we need to be more cautious. It would be much better to resize after we load the model with a resize_position_embeddings method, if possible.

If that's not possible, I would rather only support the default max_position_embeddings than do this in such a central place.
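
A minimal sketch of the kind of helper the reviewer suggests (the name and its placement are hypothetical): build a larger position-embedding table, copy the trained rows, and freshly initialize the rest.

import torch
import torch.nn as nn


def resize_position_embeddings(old_emb: nn.Embedding, new_num_positions: int) -> nn.Embedding:
    """Return a position-embedding table with new_num_positions rows, reusing trained rows."""
    old_num_positions, dim = old_emb.weight.shape
    new_emb = nn.Embedding(new_num_positions, dim)
    new_emb.weight.data.normal_(mean=0.0, std=0.02)  # BERT-style init for the extra rows
    num_to_copy = min(old_num_positions, new_num_positions)
    with torch.no_grad():
        new_emb.weight[:num_to_copy] = old_emb.weight[:num_to_copy]
    return new_emb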

@require_torch
class UnilmModelTest(CommonTestCases.CommonModelTester):

all_model_classes = ()

Contributor:

(major) This needs to include your classes so that the common tests run on them. Sorry if I said this elsewhere.

def test_for_seq2seq_finetuning(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs_for_seq2seq_finetuning()
self.model_tester.create_and_check_unilm_model_for_seq2seq_finetuning(*config_and_inputs)

Contributor:

Can you add an @slow test that hardcodes the features you want your pretrained transformer to produce (à la the RoBERTa tests) so that we know whether it changes somehow?
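
A sketch of such a test, modeled loosely on the RoBERTa integration tests; the checkpoint id, the input ids and the expected values are placeholders that would have to be generated once from the reference implementation:

import unittest

import torch

from transformers import is_torch_available

from .utils import require_torch, slow  # test helpers; paths assume the in-package tests of the time

if is_torch_available():
    from transformers.modeling_unilm import UnilmModel  # class added by this PR


@require_torch
class UnilmModelIntegrationTest(unittest.TestCase):
    @slow
    def test_inference_no_head(self):
        model = UnilmModel.from_pretrained("unilm-base-cased")  # placeholder checkpoint id
        input_ids = torch.tensor([[101, 7592, 1010, 2088, 102]])
        with torch.no_grad():
            output = model(input_ids)[0]
        # freeze a small slice of the expected output so regressions are caught
        expected_slice = torch.tensor(
            [[[-0.05, 0.12, 0.31], [0.08, -0.21, 0.44], [0.17, 0.02, -0.09]]]
        )
        self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4))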

attention_probs_dropout_prob=self.attention_probs_dropout_prob,
max_position_embeddings=self.max_position_embeddings,
type_vocab_size=self.type_vocab_size,
is_decoder=False,

Contributor:

Is this ever True? I don't see any decoders.

@donglixp (Contributor):

@sshleifer Thanks for the comments! We will incorporate them into the code. @addf400

@JetRunner (Contributor):

Let's restart with a new pull request @addf400 @donglixp

@JetRunner closed this Jun 16, 2020
@cccntu (Contributor) commented Oct 4, 2020

Is anyone still working on this? @addf400 @donglixp @JetRunner
also @thomwolf from #1530

@Kyeongpil (Contributor) commented Oct 10, 2020

I'm also looking forward to applying the UniLM model via Huggingface Transformers!
@donglixp @JetRunner @thomwolf

@jind11 commented Dec 14, 2020

It seems this pull request has been open for a year but is still not finished? Is anyone still working on it?

@huu4ontocord commented Jan 22, 2021

Has this PR for the UniLM model been added to Huggingface Transformers?
@donglixp @JetRunner @thomwolf @sshleifer

@stefan-it (Collaborator) commented Jan 22, 2021

Hey @ontocord, I think the "minilm" model should work out-of-the-box:

#5777

Not sure if you're looking for this model 🤔

I haven't tried it yet, but the recent Microsoft papers (on language modeling) are looking really promising!

@huu4ontocord:

Thanks @stefan-it. I don't think MiniLM and UniLM are the same thing, although both fall under the same project. The MS papers are promising!

@AnShengqiang:

I'm also looking forward to applying the unilm model via Huggingface Transformers!

@boy-be-ambitious:

It's 2022, and the UniLM model has still not been merged into the master branch.

@AnShengqiang:

I'm still looking forward to applying the unilm model via Huggingface Transformers! 👻👻

@Alwin4Zhang:

I'm still looking forward to applying the unilm model via Huggingface Transformers too!
