
[EncoderDecoderModel] add a add_cross_attention boolean to config #6377

Conversation

@patrickvonplaten (Contributor) commented on Aug 10, 2020:

The EncoderDecoderModel uses models from AUTO_MODEL_FOR_CAUSAL_LM as its decoder. The problem is that these models can be used in two ways:

  1. As a stand-alone decoder model (like GPT2) without cross-attention layers
  2. As part of an EncoderDecoderModel with cross-attention layers.

Currently, the parameter config.is_decoder decides whether cross-attention layers should be added. The problem is that config.is_decoder is True for both 1) and 2), which is correct since both 1) and 2) should use a causal mask, but it means that for 1) cross-attention layers are added without ever being used.

This PR solves this problem by introducing a new config param called add_cross_attention which is only relevant for models in AUTO_MODEL_FOR_CAUSAL_LM.
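
To make the change concrete, here is a minimal sketch (not the actual transformers implementation) of how a decoder layer can gate its cross-attention sub-module on config.add_cross_attention; the DecoderLayer and SimpleAttention classes below are simplified, hypothetical stand-ins:

    import torch.nn as nn

    class SimpleAttention(nn.Module):
        """Hypothetical, simplified attention block used only for illustration."""
        def __init__(self, hidden_size):
            super().__init__()
            self.attn = nn.MultiheadAttention(hidden_size, num_heads=4, batch_first=True)

        def forward(self, hidden_states, key_value_states=None):
            kv = hidden_states if key_value_states is None else key_value_states
            out, _ = self.attn(hidden_states, kv, kv)
            return out

    class DecoderLayer(nn.Module):
        """Only builds cross-attention weights when config.add_cross_attention is True."""
        def __init__(self, config):
            super().__init__()
            self.self_attention = SimpleAttention(config.hidden_size)
            # Before this PR, the cross-attention block was created whenever
            # config.is_decoder was True, even if no encoder was ever attached.
            self.cross_attention = (
                SimpleAttention(config.hidden_size) if config.add_cross_attention else None
            )

        def forward(self, hidden_states, encoder_hidden_states=None):
            hidden_states = self.self_attention(hidden_states)
            if self.cross_attention is not None and encoder_hidden_states is not None:
                hidden_states = self.cross_attention(
                    hidden_states, key_value_states=encoder_hidden_states
                )
            return hidden_states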

I also played around with the idea of not having the flag in the config, but instead passing it along to the init function, such as:

        super().__init__(config, add_cross_attention=False)

and then setting this param to True for all encoder-decoder models. I decided to put the param in the config instead because:
a) the init signature does not have to change, and
b) EncoderDecoderModels make extensive use of AutoModelForCausalLM.from_pretrained(...), which would have meant that all models that are part of MODEL_FOR_CAUSAL_LM_MAPPING would have to accept this argument in their signature.

Taking all this into account, I think the first solution (putting add_cross_attention into the config) is the better way to go here.
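
In practice, the config-based approach lets the encoder-decoder wrapper set the flag on the decoder's config before instantiating it through the Auto* factories, so no __init__ signature has to change. A hedged sketch, assuming the decoder is loaded via AutoModelForCausalLM (the helper function name is illustrative, not part of the library):

    from transformers import AutoConfig, AutoModelForCausalLM

    def build_decoder_for_encoder_decoder(decoder_name_or_path):
        # Sketch of the config-based approach: set the flags on the decoder's
        # config and instantiate through the generic factory. No model class
        # needs an extra `add_cross_attention` argument in its __init__.
        decoder_config = AutoConfig.from_pretrained(decoder_name_or_path)
        decoder_config.is_decoder = True            # keep the causal mask
        decoder_config.add_cross_attention = True   # build cross-attention layers
        return AutoModelForCausalLM.from_pretrained(decoder_name_or_path, config=decoder_config)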

IMPORTANT: This PR introduces a breaking change. All EncoderDecoderModel models have to be updated with add_cross_attention=True.

=> All "bert2bert" models were updated: https://huggingface.co/models?search=bert2bert
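
For anyone hosting their own EncoderDecoderModel checkpoint, the update amounts to setting the flag on the decoder sub-config and saving the config again. A hedged sketch, assuming a local checkpoint directory (the path below is a placeholder):

    from transformers import EncoderDecoderConfig, EncoderDecoderModel

    checkpoint = "./my-bert2bert-checkpoint"  # placeholder path

    config = EncoderDecoderConfig.from_pretrained(checkpoint)
    config.decoder.add_cross_attention = True  # the decoder sub-config needs the new flag
    config.save_pretrained(checkpoint)

    # The model now loads with cross-attention layers enabled in the decoder.
    model = EncoderDecoderModel.from_pretrained(checkpoint)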

TODO:

After this, I think the framework is flexible enough to handle all other models and I can extend EncoderDecoderModel to GPT2, Roberta, Longformer and maybe Reformer as well.
EncoderDecoderModel is not yet officially released, I think, so this slight backwards-compatibility break is OK. I will update all Bert2Bert models on the model hub with add_cross_attention=True and add a bigger message to this PR once it is merged.

@codecov (bot) commented on Aug 10, 2020:

Codecov Report

Merging #6377 into master will decrease coverage by 0.02%.
The diff coverage is 90.00%.


@@            Coverage Diff             @@
##           master    #6377      +/-   ##
==========================================
- Coverage   78.38%   78.36%   -0.03%     
==========================================
  Files         148      148              
  Lines       27196    27202       +6     
==========================================
- Hits        21317    21316       -1     
- Misses       5879     5886       +7     
Impacted Files Coverage Δ
src/transformers/modeling_encoder_decoder.py 91.02% <66.66%> (-1.19%) ⬇️
src/transformers/configuration_encoder_decoder.py 100.00% <100.00%> (ø)
src/transformers/configuration_utils.py 96.57% <100.00%> (+0.02%) ⬆️
src/transformers/modeling_bert.py 88.45% <100.00%> (+0.05%) ⬆️
src/transformers/generation_tf_utils.py 85.21% <0.00%> (-1.26%) ⬇️
src/transformers/file_utils.py 82.18% <0.00%> (-0.26%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sgugger (Collaborator) left a comment:


Seems like a good idea to me. Just added a few nits.

Inline review comments on src/transformers/configuration_utils.py and src/transformers/modeling_bert.py (outdated, resolved).
@LysandreJik (Member) left a comment:


Cool, LGTM. Nice and explicit change!

@patrickvonplaten merged commit 3425936 into huggingface:master on Aug 10, 2020.
@xxbidiao (Contributor) commented:

> All EncoderDecoderModel models have to be updated with add_cross_attention=True.

How exactly do I do this? After updating to the newest release, I get AttributeError: 'GPT2Config' object has no attribute 'add_cross_attention'.

@patrickvonplaten (Contributor, Author) replied:

Hey @xxbidiao,

You have to set gpt2.config.add_cross_attention = True and then save this config. Or you can add the parameter add_cross_attention=True directly to the GPT-2 config.json file.
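
In code, that fix looks roughly like this; a minimal sketch, assuming the stock gpt2 checkpoint and a placeholder output directory:

    from transformers import GPT2Config

    config = GPT2Config.from_pretrained("gpt2")
    config.add_cross_attention = True  # flag introduced by this PR
    config.save_pretrained("./gpt2-decoder-config")  # writes an updated config.json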
