
Use cross_attention_hidden_size in Encoder-Decoder models #14378

Merged

Conversation


@ydshieh ydshieh commented Nov 12, 2021

What does this PR do?

  • Add a projection layer (enc_to_dec_proj) between the encoder and decoder in composite models, taking the attribute cross_attention_hidden_size into account (see the sketch after this list).

    • Add some PT/TF and PT/Flax equivalence tests in the TF/Flax composite model test scripts.
    • Also make some logging and ValueError messages consistent across the composite model scripts.
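
For readers unfamiliar with the attribute, here is a minimal, hedged sketch (not the PR's exact code) of what the projection does: when the encoder and decoder hidden sizes differ and the decoder does not declare cross_attention_hidden_size, the encoder hidden states are projected to the decoder's hidden size before they feed cross-attention.

import torch
from torch import nn

encoder_hidden_size = 768           # e.g. a BERT-base encoder (illustrative values)
decoder_hidden_size = 1024          # e.g. a larger decoder
cross_attention_hidden_size = None  # the decoder config attribute this PR checks

if encoder_hidden_size != decoder_hidden_size and cross_attention_hidden_size is None:
    enc_to_dec_proj = nn.Linear(encoder_hidden_size, decoder_hidden_size)

encoder_hidden_states = torch.randn(2, 16, encoder_hidden_size)  # (batch, seq_len, dim)
projected = enc_to_dec_proj(encoder_hidden_states)
print(projected.shape)  # torch.Size([2, 16, 1024])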

@ydshieh ydshieh changed the title [WIP] Use cross_attention_hidden_size in Encoder-Decoder models Use cross_attention_hidden_size in Encoder-Decoder models Nov 13, 2021
@ydshieh ydshieh marked this pull request as ready for review November 13, 2021 11:35

ydshieh commented Nov 13, 2021

I ran the slow tests for all the encoder-decoder model test scripts (e.g. RUN_SLOW=1 python -m pytest ...), and they pass.

BTW, is there an easy way to run all the cross tests in a test script, i.e. to disable @is_pt_tf_cross_test or @is_pt_flax_cross_test?

encoder(encoder.dummy_inputs)
decoder(decoder.dummy_inputs)
tf_model = TFEncoderDecoderModel(encoder=encoder, decoder=decoder)


@ydshieh ydshieh Nov 13, 2021


The TF encoder-decoder model family doesn't work smoothly with checkpoint loading and requires some hacks to make it work.

In the case here, if a TF composite model (whose weights are created under the scope of the top model) saves its encoder/decoder components separately, the two checkpoints will contain the top model name, i.e. the encoder/decoder checkpoint weights will begin with tf_encoder_decoder_model. This causes problems when we want to load them again, in particular in from_encoder_decoder_pretrained.

However, if a TF composite model is constructed from encoder and decoder models that are built first, their weight names don't contain the top model name, and we can save the two components and reload them.

P.S.: Once PR #14016 is merged, the equivalence tests need to be reworked in order to pass.
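
For context, here is a hedged, self-contained variant of the construction order shown in the hunk above (the BERT classes and randomly initialized configs are chosen purely for illustration and are not taken from the PR):

from transformers import BertConfig, TFBertModel, TFBertLMHeadModel, TFEncoderDecoderModel

encoder = TFBertModel(BertConfig())
decoder = TFBertLMHeadModel(BertConfig(is_decoder=True, add_cross_attention=True))

# Running the dummy inputs builds the weights under each component's own name
# scope rather than under the composite model's name, so the two components can
# later be saved and reloaded separately.
encoder(encoder.dummy_inputs)
decoder(decoder.dummy_inputs)
tf_model = TFEncoderDecoderModel(encoder=encoder, decoder=decoder)

# Quick check of the weight-name prefixes described above; per the explanation,
# no encoder weight should start with the composite model's name.
print(any(w.name.startswith(tf_model.name) for w in encoder.weights))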

# self.assertTrue(config.hidden_size != decoder_config.hidden_size)
# self.check_equivalence_pt_to_tf(config, decoder_config, inputs_dict)
# self.check_equivalence_tf_to_pt(config, decoder_config, inputs_dict)

@ydshieh (Collaborator Author)

There is no easy way to deal with enc_to_dec_proj for TF composite models with regard to checkpoint loading when we need to load the encoder/decoder components separately.

if (
    self.encoder.config.hidden_size != self.decoder.config.hidden_size
    and self.decoder.config.cross_attention_hidden_size is None
):

@ydshieh ydshieh Nov 13, 2021


Made this block the same as in other encoder/decoder models.

if (
    self.encoder.config.hidden_size != self.decoder.config.hidden_size
    and self.decoder.config.cross_attention_hidden_size is None
):

@ydshieh ydshieh Nov 13, 2021


Made this block the same as in other encoder/decoder models.

f"No `encoder_model` is passed to kwargs: {kwargs_encoder}. "
f"In this case make sure that `encoder_pretrained_model_name_or_path` defined"
"If `encoder_model` is not defined as an argument, a `encoder_pretrained_model_name_or_path` has "
"to be defined."
@ydshieh (Collaborator Author)

Changed it to be the same as the corresponding occurrence in the other encoder-decoder models (a sketch of the harmonized check follows).
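
For reference, a hedged sketch of what the harmonized check looks like; _check_encoder_args is a hypothetical helper used only for this illustration (the message mirrors the one quoted above), not a real function in the library:

def _check_encoder_args(encoder_model=None, encoder_pretrained_model_name_or_path=None):
    # Mirror of the harmonized error message used across the composite models.
    if encoder_model is None and encoder_pretrained_model_name_or_path is None:
        raise ValueError(
            "If `encoder_model` is not defined as an argument, a `encoder_pretrained_model_name_or_path` has "
            "to be defined."
        )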

@ydshieh ydshieh changed the title Use cross_attention_hidden_size in Encoder-Decoder models [WIP] Use cross_attention_hidden_size in Encoder-Decoder models Nov 14, 2021
@ydshieh ydshieh marked this pull request as draft November 14, 2021 12:46
@ydshieh ydshieh changed the title [WIP] Use cross_attention_hidden_size in Encoder-Decoder models Use cross_attention_hidden_size in Encoder-Decoder models Nov 14, 2021
@ydshieh ydshieh marked this pull request as ready for review November 14, 2021 15:07

@NielsRogge NielsRogge left a comment


LGTM! Thanks for adding this consistency.


patrickvonplaten commented Nov 29, 2021

Hey @ydshieh,

Sadly, we need to slightly update this PR for the speech encoder-decoder classes so that it is compatible with the newly introduced variable config.output_hidden_size, shown here:

self.encoder_output_dim = getattr(config.encoder, "output_hidden_size", config.encoder.hidden_size)

The other files can stay the same :-)


ydshieh commented Nov 29, 2021

> Hey @ydshieh,
>
> Sadly, we need to slightly update this PR for the speech encoder-decoder classes so that it is compatible with the newly introduced variable config.output_hidden_size, shown here:
>
> self.encoder_output_dim = getattr(config.encoder, "output_hidden_size", config.encoder.hidden_size)
>
> The other files can stay the same :-)

No problem, @patrickvonplaten. But I have a slight doubt about this line:

self.enc_to_dec_proj = nn.Linear(self.encoder.config.hidden_size, self.decoder.config.hidden_size)

Should it be

self.enc_to_dec_proj = nn.Linear(self.encoder_output_dim, self.decoder.config.hidden_size)

if config.output_hidden_size is introduced in the config and used here? I haven't gone through the speech model in detail, but it looks more natural to do so.
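
For illustration, a hedged sketch of the variant being asked about (the configs and sizes below are chosen for demonstration only, not taken from the PR): derive the encoder's output dimension with the getattr fallback quoted above, then build the projection from it rather than from hidden_size directly.

from torch import nn
from transformers import BertConfig, Wav2Vec2Config

# A speech encoder config that defines output_hidden_size (relevant when an
# adapter changes the encoder's output dimension); the values are illustrative.
config_encoder = Wav2Vec2Config(add_adapter=True, output_hidden_size=512)
config_decoder = BertConfig(hidden_size=1024)

encoder_output_dim = getattr(config_encoder, "output_hidden_size", config_encoder.hidden_size)
enc_to_dec_proj = nn.Linear(encoder_output_dim, config_decoder.hidden_size)
print(enc_to_dec_proj)  # Linear(in_features=512, out_features=1024, bias=True)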

@ydshieh ydshieh force-pushed the add_cross_attention_hidden_size branch from f5c0df5 to 7b9d31a on November 29, 2021 at 17:23

ydshieh commented Nov 29, 2021

I made the necessary updates where config.output_hidden_size is involved.
I didn't change the line

self.enc_to_dec_proj = nn.Linear(self.encoder.config.hidden_size, self.decoder.config.hidden_size)

despite the slight doubt mentioned above.


ydshieh commented Nov 29, 2021

(Fixed)

The failing TF/PyTorch test is due to #14016 being merged into master (and I rebased this PR onto master), which is expected. I will take care of this issue.

@ydshieh ydshieh force-pushed the add_cross_attention_hidden_size branch from d78af6a to f178e1d on December 2, 2021 at 16:11

@patrickvonplaten patrickvonplaten left a comment


Thanks a lot for working on this!

@patrickvonplaten patrickvonplaten merged commit 4cdb67c into huggingface:master Dec 6, 2021
@ydshieh ydshieh deleted the add_cross_attention_hidden_size branch May 5, 2022 10:36