[EncoderDecoderModel] add an add_cross_attention boolean to config #6377
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #6377      +/-   ##
==========================================
- Coverage   78.38%   78.36%   -0.03%
==========================================
  Files         148      148
  Lines       27196    27202       +6
==========================================
- Hits        21317    21316       -1
- Misses       5879     5886       +7
```
Continue to review full report at Codecov.
Seems like a good idea to me. Just added a few nits.
Cool, LGTM. Nice and explicit change!
How exactly do I do this? I got hit by …
Hey @xxbidiao, you have to set …
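For reference, a minimal sketch of what this can look like end to end, assuming the post-merge transformers API (the checkpoint name is only an example):

```python
from transformers import AutoConfig, AutoModel, AutoModelForCausalLM, EncoderDecoderModel

# The decoder needs both flags on its config: a causal mask (is_decoder)
# and cross-attention layers over the encoder outputs (add_cross_attention).
decoder_config = AutoConfig.from_pretrained("bert-base-uncased")
decoder_config.is_decoder = True
decoder_config.add_cross_attention = True

encoder = AutoModel.from_pretrained("bert-base-uncased")
decoder = AutoModelForCausalLM.from_pretrained("bert-base-uncased", config=decoder_config)
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
```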
The `EncoderDecoderModel` uses models from `AUTO_MODEL_FOR_CAUSAL_LM` as its decoder models. The problem is that these models can be used in two ways:

1) as stand-alone causal LMs with a causal mask but without cross-attention layers, and
2) as the decoder part of an `EncoderDecoderModel` with cross-attention layers.

Currently it is decided via the parameter `config.is_decoder` whether cross-attention layers should be added. The problem is that `config.is_decoder` is `True` for both 1) and 2), which is correct since both 1) and 2) should use a causal mask, but it means that for 1) cross-attention layers are added without ever being used.

This PR solves this problem by introducing a new config param called `add_cross_attention`, which is only relevant for models in `AUTO_MODEL_FOR_CAUSAL_LM`.
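To make the two cases concrete, here is a minimal sketch of the two config combinations, using BERT purely as an example of a model in `AUTO_MODEL_FOR_CAUSAL_LM`:

```python
from transformers import BertConfig

# Case 1): stand-alone causal LM -> causal mask, but no cross-attention layers
standalone_config = BertConfig.from_pretrained("bert-base-uncased")
standalone_config.is_decoder = True
standalone_config.add_cross_attention = False  # default

# Case 2): decoder of an EncoderDecoderModel -> causal mask plus cross-attention layers
decoder_config = BertConfig.from_pretrained("bert-base-uncased")
decoder_config.is_decoder = True
decoder_config.add_cross_attention = True  # the new flag introduced in this PR
```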
I also played around with the idea of not having the flag in the config, but just passing it along the `__init__` function (a hypothetical sketch of that alternative is shown below) and then setting this param to `True` for all encoder-decoder models. I decided to put the param in the config instead because:

a) the `__init__` signature does not have to change, and
b) `EncoderDecoderModel` makes extensive use of `AutoModelForCausalLM.from_pretrained(...)`, which would have meant that all models that are part of `MODEL_FOR_CAUSAL_LM_MAPPING` would have to have this signature.

Taking all this into account, I think the first solution (putting `add_cross_attention` into the config) is the better way to go here.
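For illustration only, the rejected `__init__`-based alternative might have looked roughly like this (the class name is hypothetical and the snippet is a sketch, not code from this PR):

```python
from transformers import PreTrainedModel

class SomeCausalLMModel(PreTrainedModel):
    # Hypothetical signature: the flag is passed to __init__ instead of living in the config.
    def __init__(self, config, add_cross_attention=False):
        super().__init__(config)
        # cross-attention layers would only be built when explicitly requested
        self.add_cross_attention = add_cross_attention
```

Every model in `MODEL_FOR_CAUSAL_LM_MAPPING` would then need this extra argument, and `AutoModelForCausalLM.from_pretrained(...)` would have to forward it, which is why the config flag was preferred.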
IMPORTANT: This PR introduces a breaking change. All `EncoderDecoderModel` models have to be updated with `add_cross_attention=True`.

=> All "bert2bert" models were updated: https://huggingface.co/models?search=bert2bert
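As a rough sketch (the checkpoint path is a placeholder), updating an existing bert2bert checkpoint amounts to flipping the new flag on the decoder part of its config and saving it back:

```python
from transformers import EncoderDecoderConfig

config = EncoderDecoderConfig.from_pretrained("path/to/your-bert2bert-checkpoint")
config.decoder.is_decoder = True
config.decoder.add_cross_attention = True  # the new flag introduced in this PR
config.save_pretrained("path/to/your-bert2bert-checkpoint")
```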
TODO:
After this, I think the framework is flexible enough to handle all other models and I can extend `EncoderDecoderModel` to GPT2, Roberta, Longformer and maybe Reformer as well. EncoderDecoder is not yet officially released, I think, so this slightly backwards-compatibility-breaking change is OK. I will update all Bert2Bert models on the model hub with `add_cross_attention=True` and add a bigger message in this PR when merged.