[M2M100] fix positional embeddings #10590

patil-suraj · 2021-03-08T08:58:11Z

What does this PR do?

The TorchScript tests for M2M100 are failing on master because the weights in M2M100SinusoidalPositionalEmbedding are not initially on the same device as the rest of the parameters.

This PR registers the weights as an nn.Parameter so that they are moved to the same device as the other parameters.
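As a rough illustration of the idea, here is a minimal sketch of a sinusoidal position table held as a non-trainable nn.Parameter. The class name `SinusoidalPositions`, its arguments, and the simplified frequency formula are assumptions made for this example and are not the code in modeling_m2m_100.py.

```python
import math

import torch
from torch import nn


class SinusoidalPositions(nn.Module):
    """Hypothetical, simplified stand-in for a sinusoidal positional embedding."""

    def __init__(self, num_positions: int, embedding_dim: int):
        super().__init__()
        table = self.build_table(num_positions, embedding_dim)
        # Wrapping the table in nn.Parameter registers it on the module, so
        # module.to(device) / module.double() convert it with everything else.
        # requires_grad=False keeps the fixed sinusoids out of training.
        self.weights = nn.Parameter(table, requires_grad=False)

    @staticmethod
    def build_table(num_positions: int, embedding_dim: int) -> torch.Tensor:
        # Simplified formula; assumes embedding_dim is even.
        half_dim = embedding_dim // 2
        scale = -math.log(10000.0) / half_dim
        freqs = torch.exp(torch.arange(half_dim, dtype=torch.float) * scale)
        positions = torch.arange(num_positions, dtype=torch.float).unsqueeze(1)
        angles = positions * freqs.unsqueeze(0)
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)

    def forward(self, position_ids: torch.Tensor) -> torch.Tensor:
        # Look up one row per position id and restore the input's leading shape.
        flat = self.weights.index_select(0, position_ids.view(-1))
        return flat.view(*position_ids.shape, -1)
```

A plain tensor attribute would be left behind by `module.to(device)`, whereas a registered parameter follows the module. A registered parameter also appears in the state dict.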

patil-suraj · 2021-03-08T09:02:39Z

src/transformers/models/m2m_100/modeling_m2m_100.py

+    _keys_to_ignore_on_save = [
+        r"model.encoder.embed_positions.weights",
+        r"model.decoder.embed_positions.weights",


Since M2M100 uses sinusoidal positional embeddings, we don't need to save the positional embedding weights.
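The effect of such an ignore list can be sketched in plain Python. The helper `drop_ignored_keys` is hypothetical, and the exact-name matching it uses is an assumption; this is not the library's saving code.

```python
# The two key names quoted in the diff above.
keys_to_ignore_on_save = [
    r"model.encoder.embed_positions.weights",
    r"model.decoder.embed_positions.weights",
]


def drop_ignored_keys(state_dict, ignored_keys):
    """Return a copy of state_dict without the ignored keys (exact-name match assumed)."""
    ignored = set(ignored_keys)
    return {name: value for name, value in state_dict.items() if name not in ignored}


# Toy state dict: the values stand in for tensors.
state_dict = {
    "model.encoder.embed_positions.weights": "sinusoid table",
    "model.decoder.embed_positions.weights": "sinusoid table",
    "model.shared.weight": "learned embedding",
}
to_save = drop_ignored_keys(state_dict, keys_to_ignore_on_save)
```

Only the learned entry remains in `to_save`; the sinusoidal tables are left out of the saved state.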


patrickvonplaten · 2021-03-08T10:35:15Z

tests/test_modeling_m2m_100.py

        decoder_input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)

+        # we need to clamp the input ids here to avoid having pad token in between
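A minimal sketch of the clamping that comment describes, assuming a hypothetical `pad_token_id` of 1 and random ids drawn with `torch.randint` in place of the test suite's `ids_tensor` helper:

```python
import torch


def clamp_above_pad(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Raise every id at or below pad_token_id to pad_token_id + 1."""
    return input_ids.clamp(min=pad_token_id + 1)


pad_token_id = 1  # hypothetical value, chosen for illustration
vocab_size = 99   # hypothetical value, chosen for illustration

torch.manual_seed(0)
input_ids = torch.randint(0, vocab_size, (4, 7))
clamped = clamp_above_pad(input_ids, pad_token_id)
```

After clamping, no randomly drawn id can equal the pad token, so padding never appears in the middle of a generated sequence.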


Great! Thanks for the very detailed explanation!

* fix tests * emb should be a parameter * fix positional embeddings * fix make_weights * don't save pos embeds * add comment to describe the clamping

patil-suraj added 5 commits March 8, 2021 11:07

fix tests

34781ba

emb should be a parameter

b12c956

fix positional embeddings

001400f

fix make_weights

63358e7

don't save pos embeds

fe1d024

patil-suraj changed the title from "Fix m2m100" to "[M2M100] fix positional embeddings" Mar 8, 2021

patil-suraj commented Mar 8, 2021


patil-suraj requested review from LysandreJik and patrickvonplaten March 8, 2021 09:03

patrickvonplaten reviewed Mar 8, 2021


add comment to describe the clamping

b4ee406

patil-suraj requested a review from patrickvonplaten March 8, 2021 09:39

patrickvonplaten approved these changes Mar 8, 2021


patrickvonplaten reviewed Mar 8, 2021


patil-suraj merged commit 2a737bf into huggingface:master Mar 8, 2021

patil-suraj deleted the fix-m2m100 branch March 8, 2021 10:37
