Add new model RoFormer (use rotary position embedding) #11684
LysandreJik merged 56 commits into huggingface:master from JunnYu:master
Conversation
patil-suraj
left a comment
Great addition, thanks a lot for adding this!
I left a few comments below; specifically:

- please add as many `Copied from` statements as possible
- it would be nice to refactor the sinusoidal embeds into their own module
Co-authored-by: Suraj Patil <surajp815@gmail.com>
…lPositionalEmbedding
@patil-suraj I have updated some code, please review again. Thanks~
patil-suraj
left a comment
Thanks for working on this, the PR is in a good shape!
I left a few more comments below; specifically:

- the docstring format should be fixed, which will make `build_doctests` pass
- resolve the merge conflicts
- fix the style issues; we can do this by running `make style` and `make quality`

Let me know if you need any help with this :)
```python
    TFRobertaModel,
)

# Add modeling imports here
```
This shouldn't be here
```python
# Copied from transformers.models.marian.modeling_marian.MarianSinusoidalPositionalEmbedding with Marian->RoFormer
class RoFormerSinusoidalPositionalEmbedding(nn.Embedding):
```
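For context, here is a quick NumPy sketch of the sin/cos table such a module holds. The layout is assumed to follow the Marian implementation (sin values in the first half of the channel dimension, cos values in the second); the function name is illustrative, not from the PR:

```python
import numpy as np

def sinusoidal_embedding_table(n_pos: int, dim: int) -> np.ndarray:
    """Illustrative sin/cos position table: first half sin, second half cos,
    assuming the Marian-style layout."""
    position = np.arange(n_pos)[:, None]                     # (n_pos, 1)
    div = np.power(10000, 2 * (np.arange(dim) // 2) / dim)   # (dim,)
    angles = position / div                                  # (n_pos, dim)
    out = np.empty((n_pos, dim), dtype=np.float32)
    half = dim // 2 + dim % 2
    out[:, :half] = np.sin(angles[:, 0::2])  # sin block
    out[:, half:] = np.cos(angles[:, 1::2])  # cos block
    return out

table = sinusoidal_embedding_table(8, 16)
print(table.shape)  # (8, 16)
```

At position 0 the sin half is all zeros and the cos half is all ones, which is a quick sanity check that the table is laid out as assumed.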
```python
# sin [batch_size, num_heads, sequence_length, embed_size_per_head//2]
# cos [batch_size, num_heads, sequence_length, embed_size_per_head//2]
sin, cos = sinusoidal_pos.chunk(2, dim=-1)
# sin [θ0, θ1, ..., θd/2-1] -> sin_pos [θ0, θ0, θ1, θ1, ..., θd/2-1, θd/2-1]
sin_pos = torch.repeat_interleave(sin, 2, dim=-1)
# cos [θ0, θ1, ..., θd/2-1] -> cos_pos [θ0, θ0, θ1, θ1, ..., θd/2-1, θd/2-1]
cos_pos = torch.repeat_interleave(cos, 2, dim=-1)
# rotate_half_query_layer [-q1, q0, -q3, q2, ..., -qd-1, qd-2]
rotate_half_query_layer = torch.stack(
    [-query_layer[..., 1::2], query_layer[..., ::2]], dim=-1
).reshape_as(query_layer)
query_layer = query_layer * cos_pos + rotate_half_query_layer * sin_pos
# rotate_half_key_layer [-k1, k0, -k3, k2, ..., -kd-1, kd-2]
rotate_half_key_layer = torch.stack(
    [-key_layer[..., 1::2], key_layer[..., ::2]], dim=-1
).reshape_as(key_layer)
key_layer = key_layer * cos_pos + rotate_half_key_layer * sin_pos
```
Thank you for reviewing this PR. ∩▂∩
Very cool, that was really quick @JunnYu !
Would be nice to add the fast tokenizer as well. Other than that looks good to me!
@LysandreJik the `run_tests_torch` job seems to be timing out, not sure why; could you please take a look?
```python
    The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
    The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
```
Can this be set to 2048? Would the model work with that?
I have set it to 1536.
The pre-training was done in multiple stages, with the batch size and maximum input sequence length changing between stages in order to adapt the model to various scenarios.
| stage | max_len | batch_size | train_step | loss | acc |
|---|---|---|---|---|---|
| 1 | 512 | 256 | 200k | 1.73 | 65.0% |
| 2 | 1536 | 256 | 12.5k | 1.61 | 66.8% |
| 3 | 256 | 256 | 120k | 1.75 | 64.6% |
| 4 | 128 | 512 | 80k | 1.83 | 63.4% |
| 5 | 1536 | 256 | 10k | 1.58 | 67.4% |
| 6 | 512 | 512 | 30k | 1.66 | 66.2% |
Also, the roformer_chinese_base model is a word-level model, not a char-level model.
| BERT | WoBERT | NEZHA | RoFormer | |
|---|---|---|---|---|
| tokenization level | char | word | char | word |
```python
        return "".join(output)


class RoFormerTokenizer(BertTokenizer):
```
Could we try not to inherit from BertTokenizer and instead just copy & paste all the functionality in here?
```python
key_layer = self.transpose_for_scores(self.key(hidden_states))
value_layer = self.transpose_for_scores(self.value(hidden_states))
if sinusoidal_pos is not None:
    # https://kexue.fm/archives/8265
```
Could we maybe put all of this logic into a staticmethod that takes sinusoidal_pos, query_layer, and key_layer as input? This would make it a bit more readable IMO and, more importantly, would allow us to easily test this layer.
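As a rough sketch of what such a factored-out helper computes, here is the same rotation logic in NumPy (the PR itself uses torch; this is an illustration only, mirroring the quoted code):

```python
import numpy as np

def apply_rotary_position_embeddings(sinusoidal_pos, query_layer, key_layer):
    """NumPy sketch of the rotary rotation applied to query/key states.
    sinusoidal_pos holds [sin | cos] halves along the last axis."""
    sin, cos = np.split(sinusoidal_pos, 2, axis=-1)
    # [t0, t1, ...] -> [t0, t0, t1, t1, ...]
    sin_pos = np.repeat(sin, 2, axis=-1)
    cos_pos = np.repeat(cos, 2, axis=-1)

    def rotate_half(x):
        # [x0, x1, x2, x3, ...] -> [-x1, x0, -x3, x2, ...]
        return np.stack([-x[..., 1::2], x[..., 0::2]], axis=-1).reshape(x.shape)

    rotated_query = query_layer * cos_pos + rotate_half(query_layer) * sin_pos
    rotated_key = key_layer * cos_pos + rotate_half(key_layer) * sin_pos
    return rotated_query, rotated_key

# rotate a single 2-dim head by 90 degrees (sin=1, cos=0)
q, k = apply_rotary_position_embeddings(
    np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])
)
print(q, k)  # [[0. 1.]] [[-1. 0.]]
```

Each consecutive pair of channels is rotated as a 2D vector, which is exactly what makes this easy to unit-test against an explicit rotation matrix.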
LysandreJik
left a comment
Very nice work! It's in very good shape, thanks @JunnYu!
One aspect that is important is the ability to save/load tokenizers. Once this issue is solved, then it looks good to merge for me!
```python
        return "".join(output)


class RoFormerTokenizer(BertTokenizer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        try:
            import jieba
```
This is the second occurrence of an import jieba in the library, I guess the third time we'll refactor that with dummy objects @sgugger?
tests/test_tokenization_roformer.py
Outdated
```python
def test_added_token_serializable(self):
    pass

def test_save_pretrained(self):
    pass

def test_pickle_tokenizer(self):
    pass

def test_save_and_load_tokenizer(self):
    pass

def test_encode_decode_with_spaces(self):
    pass
```
This is a bit of an issue; serializing a tokenizer is important! I guess one option to enable this is to delete `self.jieba` from the tokenizer before serializing it, as it is stateless?
tests/test_modeling_roformer.py
Outdated
```python
for type in ["absolute", "relative_key", "relative_key_query"]:
    config_and_inputs[0].position_embedding_type = type
    self.model_tester.create_and_check_model(*config_and_inputs)
```
Is this test supported? I don't think there's a position_embedding_type in the model
@LysandreJik @sgugger RoFormerTokenizer is not identical to BertTokenizer when tokenizing Chinese chars.
So we need to create a new file for RoFormerTokenizer.
```python
import jieba
from transformers import BertTokenizer, RoFormerTokenizer

zh_text = "今天天气非常好!"
jieba.lcut(zh_text, HMM=False)
# ['今天天气', '非常', '好', '!']

# difference
bert_tokenizer = BertTokenizer.from_pretrained("junnyu/roformer_chinese_base")
roformer_tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_base")
print(bert_tokenizer.tokenize(zh_text))
# ['今', '天', '天', '气', '非', '常', '好', '!']
print(roformer_tokenizer.tokenize(zh_text))
# ['今', '天', '天', '气', '非常', '好', '!']

# same
en_text = "I love Beijing, Beijing is the capital of China!"
print(bert_tokenizer.tokenize(en_text))
# ['i', 'love', 'be', '##i', '##jing', ',', 'be', '##i', '##jing', 'is', 'the', 'capital', 'of', 'china', '!']
print(roformer_tokenizer.tokenize(en_text))
# ['i', 'love', 'be', '##i', '##jing', ',', 'be', '##i', '##jing', 'is', 'the', 'capital', 'of', 'china', '!']
```

In this case, 今天天气 and 非常 are meaningful words, so we shouldn't blindly split them into single chars.
For 今天天气, the word does not exist in the vocabulary, so we fall back to the original BertTokenizer behavior and get 今, 天, 天, 气.
For 非常, the word does exist in the vocabulary, so we keep it whole rather than tokenizing it with the original BertTokenizer.
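That fallback strategy can be sketched with a toy vocabulary (illustrative only; the real tokenizer combines jieba segmentation with WordPiece, and the names below are made up for this sketch):

```python
def word_level_tokenize(segmented_words, vocab, char_tokenize):
    """Toy sketch of the RoFormer strategy: keep a segmented word if it is in
    the vocabulary, otherwise fall back to char/WordPiece tokenization.
    `segmented_words` stands in for jieba.lcut(text, HMM=False) output."""
    tokens = []
    for word in segmented_words:
        if word in vocab:
            tokens.append(word)          # known word: keep it whole
        else:
            tokens.extend(char_tokenize(word))  # unknown word: split it
    return tokens

vocab = {"非常", "今", "天", "气", "好", "!"}
words = ["今天天气", "非常", "好", "!"]
print(word_level_tokenize(words, vocab, list))
# ['今', '天', '天', '气', '非常', '好', '!']
```

Using `list` as the fallback splitter mimics char-level tokenization for the sake of the example.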
LysandreJik
left a comment
Minus the pickle issue, I think this looks great! Thanks for the efforts @JunnYu!
@patrickvonplaten and @sgugger could you take an additional look at this? Thank you
tests/test_tokenization_roformer.py
Outdated
```python
def test_alignement_methods(self):
    pass

def test_pickle_tokenizer(self):
```
If I understand correctly, you're doing something custom for this test because the PreTokenizer cannot be pickled.
This seems to put quite a large burden on the user were they to want to pickle their tokenizer. Wouldn't it be better to leverage __getstate__ and __setstate__, similar to what we do for SentencePiece-based tokenizers? See an example here: https://github.com/huggingface/transformers/blob/master/src/transformers/models/albert/tokenization_albert.py#L182-L195
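A minimal sketch of that __getstate__/__setstate__ pattern on a stand-in class (illustrative only, not the actual RoFormerTokenizer code; the real implementation re-imports jieba on unpickling):

```python
import pickle

class JiebaBackedTokenizer:
    """Stand-in class showing the pattern: drop the unpicklable segmenter
    handle in __getstate__ and re-create it in __setstate__."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.jieba = self._load_segmenter()

    def _load_segmenter(self):
        # stand-in for `import jieba; return jieba`
        return object()

    def __getstate__(self):
        state = self.__dict__.copy()
        state["jieba"] = None  # the module/engine handle cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__ = state
        self.jieba = self._load_segmenter()  # restore after unpickling

tok = JiebaBackedTokenizer({"非常"})
tok2 = pickle.loads(pickle.dumps(tok))
print(tok2.vocab)  # {'非常'}
```

The picklable state (here just the vocab) round-trips, while the segmenter is rebuilt fresh, so the user never has to think about it.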
Thanks, I have changed this. :)
sgugger
left a comment
Looking great apart from the issue with pickle! Thanks for adding the fast tokenizer!
```python
from typing import List, Optional, Tuple

from ...utils import logging
from ..bert.tokenization_bert import BasicTokenizer, PreTrainedTokenizer, WordpieceTokenizer, load_vocab
```
`PreTrainedTokenizer` should be imported from `...tokenization_utils`
```python
    return outputs


@staticmethod
def apply_rotary_position_embeddings(sinusoidal_pos, query_layer, key_layer, value_layer=None):
```
Great to factor it out here!
```python
@require_torch
class RoFormerSelfAttentionRotaryPositionEmbeddingTest(unittest.TestCase):
```
Looks very clean now! Great job @JunnYu
You might have to run `make style` once to solve the code quality check ;-)
@patrickvonplaten I have done it, thanks ;)

Tests are fine I think (PyTorch times out :-/).

Thanks a lot @JunnYu, fantastic addition!
…11684)

* add roformer
* Update docs/source/model_doc/roformer.rst (Co-authored-by: Suraj Patil <surajp815@gmail.com>)
* Update docs/source/model_doc/roformer.rst (Co-authored-by: Suraj Patil <surajp815@gmail.com>)
* update
* add TFRoFormerSinusoidalPositionalEmbedding and fix TFMarianSinusoidalPositionalEmbedding
* update docs
* make style and make quality
* roback
* unchanged
* rm copies from, this is an error in TFMarianSinusoidalPositionalEmbedding
* update Copyright year
* move `# Add modeling imports here` to the correct position
* max_position_embeddings can be set to 1536
* `# Copied from transformers.models.bert.modeling_bert.BertOutput with Bert->RoFormer`
* `# Copied from transformers.models.bert.modeling_bert.BertLayer.__init__ with Bert->RoFormer`
* update tokenization_roformer
* make style
* add staticmethod apply_rotary_position_embeddings
* add TF staticmethod apply_rotary_position_embeddings
* update torch apply_rotary_position_embeddings
* fix tf apply_rotary_position_embeddings error
* make style
* add pytorch RoFormerSelfAttentionRotaryPositionEmbeddingTest
* add TF rotary_position_embeddings test
* update test_modeling_rofomer
* Update docs/source/model_doc/roformer.rst (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/__init__.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/__init__.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/__init__.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/__init__.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/models/roformer/convert_roformer_original_tf_checkpoint_to_pytorch.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/models/roformer/modeling_roformer.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/models/roformer/modeling_roformer.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/models/roformer/modeling_tf_roformer.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* refact roformer tokenizer
* add RoFormerTokenizerFast
* add RoFormerTokenizationTest
* add require_jieba
* update Copyright
* update tokenizer & add copy from
* add option rotary_value
* use rust jieba
* use rjieba
* use rust jieba
* fix test_alignement_methods
* slice normalized_string is too slow
* add config.embedding_size when embedding_size != hidden_size
* fix pickle tokenizer
* Update docs/source/model_doc/roformer.rst (Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>)
* make style and make quality

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
What does this PR do?
Add new model RoFormer
RoFormer: Enhanced Transformer with Rotary Position Embedding, by Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen and Yunfeng Liu.
The original code can be found here.
The abstract from the paper is the following:
Position encoding in the transformer architecture provides supervision for dependency modeling between elements at
different positions in the sequence. We investigate various methods to encode positional information in
transformer-based language models and propose a novel implementation named Rotary Position Embedding (RoPE). The
proposed RoPE encodes absolute positional information with a rotation matrix and naturally incorporates explicit
relative position dependency in the self-attention formulation. Notably, RoPE comes with valuable properties such as
the flexibility of being expanded to any sequence length, decaying inter-token dependency with increasing relative
distances, and the capability of equipping linear self-attention with relative position encoding. As a result, the
enhanced transformer with rotary position embedding, or RoFormer, achieves superior performance in tasks with long
texts. We release the theoretical analysis along with some preliminary experiment results on Chinese data. The
ongoing experiments on English benchmarks will be updated soon.
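Concretely, the rotation the abstract describes acts on each pair of channels of a query (or key) vector at position m (this is the standard RoPE formulation from the paper; the notation here is the common one, with d the head dimension):

```latex
\begin{pmatrix} q'_{2i} \\ q'_{2i+1} \end{pmatrix}
=
\begin{pmatrix}
\cos m\theta_i & -\sin m\theta_i \\
\sin m\theta_i & \cos m\theta_i
\end{pmatrix}
\begin{pmatrix} q_{2i} \\ q_{2i+1} \end{pmatrix},
\qquad \theta_i = 10000^{-2i/d}
```

Because each pair is rotated by a position-dependent angle, the attention dot product between a query rotated by m and a key rotated by n depends only on the offset m - n, which is how absolute rotations yield the explicit relative position dependency mentioned above.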
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.