
[WIP] Add UniLM model #2160

Closed
wants to merge 26 commits into from

Conversation

@addf400 commented Dec 13, 2019

Typical workflow for including a model

Here is an overview of the general workflow:

  • add model/configuration/tokenization classes
  • add conversion scripts
  • add tests
  • finalize

Let's detail what should be done at each step.

Adding model/configuration/tokenization classes

Here is the workflow for adding model/configuration/tokenization classes:

  • copy the python files from the present folder to the main folder and rename them, replacing xxx with your model name,
  • edit the files to replace XXX (with various casing) with your model name
  • copy-paste or create a simple configuration class for your model in the configuration_... file (a minimal sketch follows this list)
  • copy-paste or create the code for your model in the modeling_... files (PyTorch and TF 2.0)
  • copy-paste or create a tokenizer class for your model in the tokenization_... file
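
As an illustration of the configuration step, here is a minimal sketch of what such a class can look like; the attribute names follow the usual BERT-style defaults and are not taken from this PR:

# configuration_xxx.py (sketch; illustrative only)
from .configuration_utils import PretrainedConfig


class XxxConfig(PretrainedConfig):
    """Stores the configuration of an XxxModel (illustrative only)."""

    def __init__(self, vocab_size=30522, hidden_size=768, num_hidden_layers=12,
                 num_attention_heads=12, intermediate_size=3072,
                 hidden_dropout_prob=0.1, max_position_embeddings=512,
                 type_vocab_size=2, initializer_range=0.02, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.intermediate_size = intermediate_size
        self.hidden_dropout_prob = hidden_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.type_vocab_size = type_vocab_size
        self.initializer_range = initializer_range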

Adding conversion scripts

Here is the workflow for the conversion scripts:

  • copy the conversion script (convert_...) from the present folder to the main folder.
  • edit this script to convert your original checkpoint weights to the current PyTorch ones (see the sketch below).
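
The following sketch shows the typical shape of such a script, assuming a plain torch checkpoint as input; the key-renaming rules are placeholders:

# convert_xxx_original_checkpoint_to_pytorch.py (sketch)
import argparse

import torch


def convert(original_checkpoint_path, pytorch_dump_path):
    original_state_dict = torch.load(original_checkpoint_path, map_location="cpu")
    new_state_dict = {}
    for old_key, tensor in original_state_dict.items():
        # map the original naming scheme onto the transformers one, e.g.
        # "encoder.layer_0.attn.q" -> "bert.encoder.layer.0.attention.self.query"
        new_key = old_key  # replace with the real renaming rules
        new_state_dict[new_key] = tensor
    torch.save(new_state_dict, pytorch_dump_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--original_checkpoint_path", required=True)
    parser.add_argument("--pytorch_dump_path", required=True)
    args = parser.parse_args()
    convert(args.original_checkpoint_path, args.pytorch_dump_path)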

Adding tests

Here is the workflow for adding tests:

  • copy the python files from the tests sub-folder of the present folder to the tests subfolder of the main folder and rename them, replacing xxx with your model name,
  • edit the tests files to replace XXX (with various casing) with your model name
  • edit the tests code as needed

Final steps

You can then finish the addition step by adding imports for your classes in the common files:

  • add imports for all the relevant classes in __init__.py (see the sketch after this list)
  • add your configuration in configuration_auto.py
  • add your PyTorch and TF 2.0 model respectively in modeling_auto.py and modeling_tf_auto.py
  • add your tokenizer in tokenization_auto.py
  • add your models and tokenizer to pipeline.py
  • add a link to your conversion script in the main conversion utility (currently in __main__ but will be moved to the commands subfolder in the near future)
  • edit the PyTorch to TF 2.0 conversion script to add your model in the convert_pytorch_checkpoint_to_tf2.py file
  • add a mention of your model in the doc: README.md and the documentation itself at docs/source/pretrained_models.rst.
  • upload the pretrained weights, configurations and vocabulary files.
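
As a sketch of the import step, the new lines in transformers/__init__.py would look roughly like this, assuming the module names introduced by this PR and class names that may differ in the final version:

# excerpt of transformers/__init__.py (sketch)
from .configuration_unilm import UnilmConfig
from .tokenization_unilm import UnilmTokenizer
from .modeling_unilm import UnilmModel, UnilmForSeq2SeqDecode

The auto classes (configuration_auto.py, modeling_auto.py, tokenization_auto.py) then need the same names added to their dispatch logic.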

@addf400 closed this Dec 13, 2019
@addf400 reopened this Dec 13, 2019
@codecov-io commented Dec 13, 2019

Codecov Report

Merging #2160 into master will increase coverage by 0.04%.
The diff coverage is 49.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2160      +/-   ##
==========================================
+ Coverage   80.32%   80.37%   +0.04%     
==========================================
  Files         122      127       +5     
  Lines       18342    19000     +658     
==========================================
+ Hits        14734    15272     +538     
- Misses       3608     3728     +120
Impacted Files Coverage Δ
transformers/modeling_utils.py 88.91% <15.38%> (-2.55%) ⬇️
transformers/configuration_auto.py 44.68% <33.33%> (-0.78%) ⬇️
transformers/tokenization_auto.py 59.18% <33.33%> (-1.69%) ⬇️
transformers/modeling_unilm.py 36.95% <36.95%> (ø)
transformers/modeling_auto.py 38.79% <53.84%> (+1.89%) ⬆️
transformers/tests/tokenization_unilm_test.py 55% <55%> (ø)
transformers/configuration_unilm.py 87.09% <87.09%> (ø)
transformers/tests/modeling_unilm_test.py 93.75% <93.75%> (ø)
transformers/tokenization_unilm.py 94.73% <94.73%> (ø)
... and 10 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f061606...bbacc86. Read the comment docs.

@rlouf changed the title from "UniLM update" to "[WIP] Add UniLM model" Dec 13, 2019
@rlouf (Contributor) commented Dec 13, 2019

Thank you for the PR! I edited your post to add the guideline for adding a new model; we'll check the boxes as we go. I'll have a look at the code and come back to you quickly!

@@ -0,0 +1,65 @@
# coding=utf-8
# Copyright 2018 The HuggingFace Inc. team.

Contributor:

You may want to replace the licence in the header with the licence of the original code:

The MIT License (MIT)

Copyright (c) Microsoft Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Author:

The licence has been updated to MIT.

transformers.code-workspace: outdated review thread (resolved)
BertLayerNorm = torch.nn.LayerNorm


class BertSelfAttention(nn.Module):

Contributor:

What is the difference between BertSelfAttention defined here and the one defined in modeling_bert.py?

Contributor:

The forward() method has the argument history_states so that we can incrementally perform decoding without re-computing the hidden states of previous time steps.
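
For readers following along, a minimal sketch of that pattern (simplified, not the PR's actual code): keys and values are computed over the cached states plus the new states, while queries cover only the new positions, so earlier time steps are never re-encoded.

import torch
import torch.nn as nn


class CachedSelfAttention(nn.Module):
    """Illustrative self-attention with a history_states cache (hypothetical class)."""

    def __init__(self, hidden_size, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def _split_heads(self, x):
        batch, seq_len, _ = x.size()
        return x.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, hidden_states, history_states=None):
        # keys/values see cached + new states; queries cover only the new positions
        kv_input = hidden_states if history_states is None else torch.cat(
            [history_states, hidden_states], dim=1)
        q = self._split_heads(self.query(hidden_states))
        k = self._split_heads(self.key(kv_input))
        v = self._split_heads(self.value(kv_input))
        scores = torch.matmul(q, k.transpose(-1, -2)) / self.head_dim ** 0.5
        context = torch.matmul(scores.softmax(dim=-1), v).transpose(1, 2).contiguous()
        batch, seq_len = context.size(0), context.size(1)
        return context.view(batch, seq_len, self.num_heads * self.head_dim)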

return context_layer


class BertAttention(nn.Module):

Contributor:

Same question?

Contributor:

The forward() method has the argument history_states so that we can incrementally perform decoding without re-computing the hidden states of previous time steps.

return layer_output


class BertEncoder(nn.Module):

Contributor:

Same question?

Author:

We will confirm as soon as possible.

Contributor:

The forward() method has the arguments prev_embedding and prev_encoded_layers so that we can incrementally perform decoding without re-computing the hidden states of previous time steps.

transformers/modeling_unilm.py: outdated review thread (resolved)
@donglixp (Contributor), quoting the checklist above:

Typical workflow for including a model

  • add your models and tokenizer to pipeline.py

@rlouf Sorry, I didn't find the pipeline.py file.

@addf400 (Author) left a comment:

Reviewed

@donglixp (Contributor) left a comment:

ready to merge

@addf400 (Author) left a comment:

Ready to merge

@addf400 (Author) left a comment:

Ready

@sshleifer (Contributor) left a comment:

Thanks, this is a huge contribution! Sorry for taking so long to circle back to it. I added a bunch of comments to try to make sure people after you don't break it :). Most important, I think, is using ModelTester.all_model_classes. Let me know if you need help getting that working :)

@require_torch
class UnilmModelTest(CommonTestCases.CommonModelTester):

all_model_classes = ()

Contributor:

I think this should be set to the models you want to test, otherwise test_common doesn't actually hit any of your classes.
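
A sketch of the requested change; the import paths assume the in-package tests layout of the time and the class names are taken from this PR, so treat both as assumptions:

from transformers import is_torch_available

from .modeling_common_test import CommonTestCases
from .utils import require_torch

if is_torch_available():
    from transformers.modeling_unilm import UnilmModel, UnilmForSeq2SeqDecode


@require_torch
class UnilmModelTest(CommonTestCases.CommonModelTester):

    # register the concrete classes so the shared tests actually exercise them;
    # with an empty tuple the common tests silently skip the new model
    all_model_classes = (UnilmModel, UnilmForSeq2SeqDecode) if is_torch_available() else ()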

logger = logging.getLogger(__name__)


def detokenize(tk_list):

Contributor:

(nit) I prefer token_list as the name; I didn't know what tk was.



def get_best_sequence(sample, eos_id, pad_id, length_penalty=None, alpha=None, expect=None, min_len=None):
# if not any((length_penalty, alpha, expect, min_len)):

Contributor:

delete or leave in

return seq


def detokenize(tk_list):

Contributor:

This is defined elsewhere; please use that one.

@@ -0,0 +1,414 @@
# coding=utf-8

Contributor:

(nit) should be called train_seq2seq.py?



class UnilmForSeq2SeqDecode(UnilmPreTrainedModel):
"""refer to BertForPreTraining"""

@sshleifer (Contributor) commented Feb 8, 2020:

So this is what you use for the pretraining task in the paper? A better docstring would help.


return torch.cat(output_ids, dim=1)

def beam_search(self, input_ids, token_type_ids, position_ids, attention_mask):

Contributor:

Possible to use PreTrainedModel.generate?

Contributor:

Either for this or for forward().
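
For illustration, the kind of call the reviewer has in mind, using current transformers API names and t5-small as a stand-in seq2seq checkpoint (nothing here is from this PR):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
output_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    num_beams=5,        # beam search handled by the shared generate() implementation
    max_length=32,
    early_stopping=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))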

# initialize new position embeddings
_k = 'bert.embeddings.position_embeddings.weight'
if _k in state_dict and config.max_position_embeddings != state_dict[_k].shape[0]:
logger.info("config.max_position_embeddings != state_dict[bert.embeddings.position_embeddings.weight] ({0} - {1})".format(

Contributor:

Ok, this part scares me a little because it can affect other models, so we need to be more cautious. It would be much better to resize after we load the model with a resize_position_embeddings method, if possible.

If that's not possible, I would rather only support the default max_position_embeddings than do this in such a central place.
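
A minimal sketch of the kind of helper the reviewer suggests (the name and its placement are hypothetical): build a larger position-embedding table, copy the trained rows, and freshly initialize the rest.

import torch
import torch.nn as nn


def resize_position_embeddings(old_emb: nn.Embedding, new_num_positions: int) -> nn.Embedding:
    """Return a position-embedding table with new_num_positions rows, reusing trained rows."""
    old_num_positions, dim = old_emb.weight.shape
    new_emb = nn.Embedding(new_num_positions, dim)
    new_emb.weight.data.normal_(mean=0.0, std=0.02)  # BERT-style init for the extra rows
    num_to_copy = min(old_num_positions, new_num_positions)
    with torch.no_grad():
        new_emb.weight[:num_to_copy] = old_emb.weight[:num_to_copy]
    return new_emb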

@require_torch
class UnilmModelTest(CommonTestCases.CommonModelTester):

all_model_classes = ()

Contributor:

(major) This needs to include your classes so that the common tests run on them. Sorry if I said this elsewhere.

def test_for_seq2seq_finetuning(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs_for_seq2seq_finetuning()
self.model_tester.create_and_check_unilm_model_for_seq2seq_finetuning(*config_and_inputs)

Contributor:

Can you add an @slow test that hardcodes the features you want your pretrained transformer to produce (à la the RoBERTa tests) so that we know whether it changes somehow?
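
A sketch of such a test, modeled loosely on the RoBERTa integration tests; the checkpoint id, the input ids and the expected values are placeholders that would have to be generated once from the reference implementation:

import unittest

import torch

from transformers import is_torch_available

from .utils import require_torch, slow  # test helpers; paths assume the in-package tests of the time

if is_torch_available():
    from transformers.modeling_unilm import UnilmModel  # class added by this PR


@require_torch
class UnilmModelIntegrationTest(unittest.TestCase):
    @slow
    def test_inference_no_head(self):
        model = UnilmModel.from_pretrained("unilm-base-cased")  # placeholder checkpoint id
        input_ids = torch.tensor([[101, 7592, 1010, 2088, 102]])
        with torch.no_grad():
            output = model(input_ids)[0]
        # freeze a small slice of the expected output so regressions are caught
        expected_slice = torch.tensor(
            [[[-0.05, 0.12, 0.31], [0.08, -0.21, 0.44], [0.17, 0.02, -0.09]]]
        )
        self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4))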

attention_probs_dropout_prob=self.attention_probs_dropout_prob,
max_position_embeddings=self.max_position_embeddings,
type_vocab_size=self.type_vocab_size,
is_decoder=False,

Contributor:

Is this ever True? I don't see any decoders.

@donglixp (Contributor):

@sshleifer Thanks for the comments! We will incorporate them into the code. @addf400

@JetRunner (Contributor):

Let's restart with a new pull request @addf400 @donglixp

@JetRunner closed this Jun 16, 2020
@cccntu (Contributor) commented Oct 4, 2020

Is anyone still working on this? @addf400 @donglixp @JetRunner
also @thomwolf from #1530

@Kyeongpil (Contributor) commented Oct 10, 2020

I'm also looking forward to applying the UniLM model via Huggingface Transformers!
@donglixp @JetRunner @thomwolf

@jind11 commented Dec 14, 2020

It seems this pull request has been open for a year but is still not finished? Is anyone still working on it?

@huu4ontocord commented Jan 22, 2021

Has this PR for the UniLM model been added to Huggingface Transformers?
@donglixp @JetRunner @thomwolf @sshleifer

@stefan-it (Collaborator) commented Jan 22, 2021

Hey @ontocord, I think the "minilm" model should work out-of-the-box:

#5777

Not sure if you're looking for this model 🤔

I haven't tried it yet, but the recent Microsoft papers (on language modeling) are looking really promising!

@huu4ontocord:

Thanks @stefan-it. I don't think MiniLM and UniLM are the same thing, although both fall under the same project. The MS papers are promising!

@AnShengqiang:

I'm also looking forward to applying the unilm model via Huggingface Transformers!

@boy-be-ambitious:

It's 2022, and the UniLM model has still not been merged into the master branch.

@AnShengqiang:

I'm still looking forward to applying the unilm model via Huggingface Transformers! 👻👻

@Alwin4Zhang:

I'm still looking forward to applying the unilm model via Huggingface Transformers too!
