🐛 Summarization pipeline: T5-base much slower than BART-large #3605
Comments
Hi @colanim, thanks a lot for your speed comparison :-). It might be possible that the pipelines used different default parameters for generation.
If there is still a big difference in time, then I guess we have to take a closer look!
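For illustration, here is a minimal sketch of pinning both pipelines to the same generation parameters; the model identifiers and parameter values below are assumptions for the example, not the notebook's exact settings:

```python
from transformers import pipeline

# One shared set of generation parameters, so any remaining speed gap
# comes from the models themselves rather than differing pipeline defaults.
# These values are illustrative, not the notebook's actual settings.
gen_kwargs = dict(max_length=142, min_length=56, num_beams=4, early_stopping=True)

bart = pipeline("summarization", model="facebook/bart-large-cnn")
t5 = pipeline("summarization", model="t5-base")

article = "..."  # a CNN/DM article would go here
print(bart(article, **gen_kwargs))
print(t5(article, **gen_kwargs))
```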
Thanks for your fast answer, @patrickvonplaten. Here is the link to the modified notebook with the parameters you mentioned. Unfortunately, there is still a huge difference...
Ok, good to know! Thanks for doing the comparison @colanim. This might interest you as well @sshleifer :-) Oh, actually I just remembered that Bart caches the decoder key/value outputs when doing auto-regressive decoding (similar to GPT-2; check the visuals under "GPT-2 Masked Self-Attention" in this post), and I think T5 does not. But T5 could cache the decoder key/value outputs to speed up decoding as well, since it uses a causal mask for the decoder. This could definitely be a Feature Request. What do you think?
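To illustrate the caching idea with a toy sketch (this is a conceptual example, not the transformers implementation): without a cache, decoding step t re-projects keys and values for all t previous tokens; with a cache, only the newest token is projected and the result is appended.

```python
import torch

d = 64
k_proj = torch.nn.Linear(d, d)  # stand-ins for one decoder layer's projections
v_proj = torch.nn.Linear(d, d)

def attend(q, k, v):
    # q: (1, d), k/v: (t, d); the causal mask is implicit because the
    # cache only ever contains past positions.
    weights = torch.softmax(q @ k.T / d ** 0.5, dim=-1)
    return weights @ v

cache_k, cache_v = [], []
hidden = torch.randn(1, d)  # embedding of the newest decoder token
for step in range(10):
    # Only the newest token is projected; past keys/values are reused.
    cache_k.append(k_proj(hidden))
    cache_v.append(v_proj(hidden))
    hidden = attend(hidden, torch.cat(cache_k), torch.cat(cache_v))
```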
Sounds worth it!
🐛 Bug
Information
Model: bart-large-cnn and t5-base
Language: English
The problem arises when using: this Colab notebook, running both BART and T5 with the summarization pipeline.
Dataset: CNN/DM
To reproduce
Run the notebook and measure inference time for the two models. On my run, T5 was much slower than BART.
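A rough sketch of the kind of timing comparison meant here (a hypothetical snippet; the model names and generation settings are assumptions, and the notebook's exact numbers are not reproduced):

```python
import time
from transformers import pipeline

article = "..."  # a CNN/DM article would go here

for name in ("facebook/bart-large-cnn", "t5-base"):
    summarizer = pipeline("summarization", model=name)
    start = time.perf_counter()
    summarizer(article, max_length=142, min_length=56, num_beams=4)
    print(f"{name}: {time.perf_counter() - start:.1f} s per article")
```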
Expected behavior
I expected T5 to be at least as fast as BART, since it has fewer parameters (for the base version, at least). Instead, it takes much longer with T5...
@patrickvonplaten Do you happen to know why T5 is so slow?
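The parameter-count claim is easy to check with a small sketch (counts are approximate and depend on the exact checkpoints):

```python
from transformers import AutoModelForSeq2SeqLM

# t5-base is roughly 220M parameters and facebook/bart-large-cnn roughly
# 400M, so raw model size alone does not explain T5 being slower.
for name in ("t5-base", "facebook/bart-large-cnn"):
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```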