
[EncoderDecoderModel] add a add_cross_attention boolean to config #6377

Conversation

@patrickvonplaten (Contributor) commented on Aug 10, 2020:

The EncoderDecoderModel uses models from AUTO_MODEL_FOR_CAUSAL_LM as its decoder. The problem is that these models can be used in two ways:

  1. As a stand-alone decoder model (like GPT2) without cross-attention layers
  2. As part of an EncoderDecoderModel with cross-attention layers.

Currently, the parameter config.is_decoder decides whether cross-attention layers should be added. The problem is that config.is_decoder is True for both 1) and 2), which is correct since both 1) and 2) should use a causal mask, but it means that for 1) cross-attention layers are added without ever being used.

This PR solves this problem by introducing a new config param called add_cross_attention which is only relevant for models in AUTO_MODEL_FOR_CAUSAL_LM.
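
To make the change concrete, here is a minimal sketch (not the actual transformers implementation) of how a decoder layer can gate its cross-attention sub-module on config.add_cross_attention; the DecoderLayer and SimpleAttention classes below are simplified, hypothetical stand-ins:

    import torch.nn as nn

    class SimpleAttention(nn.Module):
        """Hypothetical, simplified attention block used only for illustration."""
        def __init__(self, hidden_size):
            super().__init__()
            self.attn = nn.MultiheadAttention(hidden_size, num_heads=4, batch_first=True)

        def forward(self, hidden_states, key_value_states=None):
            kv = hidden_states if key_value_states is None else key_value_states
            out, _ = self.attn(hidden_states, kv, kv)
            return out

    class DecoderLayer(nn.Module):
        """Only builds cross-attention weights when config.add_cross_attention is True."""
        def __init__(self, config):
            super().__init__()
            self.self_attention = SimpleAttention(config.hidden_size)
            # Before this PR, the cross-attention block was created whenever
            # config.is_decoder was True, even if no encoder was ever attached.
            self.cross_attention = (
                SimpleAttention(config.hidden_size) if config.add_cross_attention else None
            )

        def forward(self, hidden_states, encoder_hidden_states=None):
            hidden_states = self.self_attention(hidden_states)
            if self.cross_attention is not None and encoder_hidden_states is not None:
                hidden_states = self.cross_attention(
                    hidden_states, key_value_states=encoder_hidden_states
                )
            return hidden_states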

I also played around with the idea of not having the flag in the config, but instead passing it along to the init function, such as:

        super().__init__(config, add_cross_attention=False)

and then setting this param to True for all encoder-decoder models. I decided to put the param in the config instead because:
a) the init signature does not have to change, and
b) EncoderDecoderModels make extensive use of AutoModelForCausalLM.from_pretrained(...), which would have meant that all models that are part of MODEL_FOR_CAUSAL_LM_MAPPING would have to accept this argument in their signature.

Taking all this into account, I think the first solution (putting add_cross_attention into the config) is the better way to go here.
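
In practice, the config-based approach lets the encoder-decoder wrapper set the flag on the decoder's config before instantiating it through the Auto* factories, so no __init__ signature has to change. A hedged sketch, assuming the decoder is loaded via AutoModelForCausalLM (the helper function name is illustrative, not part of the library):

    from transformers import AutoConfig, AutoModelForCausalLM

    def build_decoder_for_encoder_decoder(decoder_name_or_path):
        # Sketch of the config-based approach: set the flags on the decoder's
        # config and instantiate through the generic factory. No model class
        # needs an extra `add_cross_attention` argument in its __init__.
        decoder_config = AutoConfig.from_pretrained(decoder_name_or_path)
        decoder_config.is_decoder = True            # keep the causal mask
        decoder_config.add_cross_attention = True   # build cross-attention layers
        return AutoModelForCausalLM.from_pretrained(decoder_name_or_path, config=decoder_config)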

IMPORTANT: This PR introduces a breaking change. All EncoderDecoderModel models have to be updated with add_cross_attention=True.

=> All "bert2bert" models were updated: https://huggingface.co/models?search=bert2bert
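
For anyone hosting their own EncoderDecoderModel checkpoint, the update amounts to setting the flag on the decoder sub-config and saving the config again. A hedged sketch, assuming a local checkpoint directory (the path below is a placeholder):

    from transformers import EncoderDecoderConfig, EncoderDecoderModel

    checkpoint = "./my-bert2bert-checkpoint"  # placeholder path

    config = EncoderDecoderConfig.from_pretrained(checkpoint)
    config.decoder.add_cross_attention = True  # the decoder sub-config needs the new flag
    config.save_pretrained(checkpoint)

    # The model now loads with cross-attention layers enabled in the decoder.
    model = EncoderDecoderModel.from_pretrained(checkpoint)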

TODO:

After this, I think the framework is flexible enough to handle all other models and I can extend EncoderDecoderModel to GPT2, Roberta, Longformer and maybe Reformer as well.
EncoderDecoderModel is not yet officially released, I think, so this slight backwards-compatibility break is OK. I will update all Bert2Bert models on the model hub with add_cross_attention=True and add a bigger message to this PR once it is merged.

@codecov (bot) commented on Aug 10, 2020:

Codecov Report

Merging #6377 into master will decrease coverage by 0.02%.
The diff coverage is 90.00%.


@@            Coverage Diff             @@
##           master    #6377      +/-   ##
==========================================
- Coverage   78.38%   78.36%   -0.03%     
==========================================
  Files         148      148              
  Lines       27196    27202       +6     
==========================================
- Hits        21317    21316       -1     
- Misses       5879     5886       +7     
Impacted Files Coverage Δ
src/transformers/modeling_encoder_decoder.py 91.02% <66.66%> (-1.19%) ⬇️
src/transformers/configuration_encoder_decoder.py 100.00% <100.00%> (ø)
src/transformers/configuration_utils.py 96.57% <100.00%> (+0.02%) ⬆️
src/transformers/modeling_bert.py 88.45% <100.00%> (+0.05%) ⬆️
src/transformers/generation_tf_utils.py 85.21% <0.00%> (-1.26%) ⬇️
src/transformers/file_utils.py 82.18% <0.00%> (-0.26%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sgugger (Collaborator) left a comment:


Seems like a good idea to me. Just added a few nits.

Inline review comments on src/transformers/configuration_utils.py and src/transformers/modeling_bert.py (outdated, resolved).
@LysandreJik (Member) left a comment:


Cool, LGTM. Nice and explicit change!

@patrickvonplaten merged commit 3425936 into huggingface:master on Aug 10, 2020.
@xxbidiao (Contributor) commented:

> All EncoderDecoderModel models have to be updated with add_cross_attention=True.

How exactly do I do this? After updating to the newest release, I get AttributeError: 'GPT2Config' object has no attribute 'add_cross_attention'.

@patrickvonplaten (Contributor, Author) replied:

Hey @xxbidiao,

You have to set gpt2.config.add_cross_attention = True and then save this config. Or you can add the parameter add_cross_attention=True directly to the GPT-2 config.json file.
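
In code, that fix looks roughly like this; a minimal sketch, assuming the stock gpt2 checkpoint and a placeholder output directory:

    from transformers import GPT2Config

    config = GPT2Config.from_pretrained("gpt2")
    config.add_cross_attention = True  # flag introduced by this PR
    config.save_pretrained("./gpt2-decoder-config")  # writes an updated config.json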
