
🐛 Summarization pipeline: T5-base much slower than BART-large #3605

Closed · astariul opened this issue Apr 3, 2020 · 4 comments · Fixed by #3682
Labels: Core: Pipeline (Internals of the library; Pipeline.)

astariul (Contributor) commented Apr 3, 2020

🐛 Bug

Information

Model: bart-large-cnn and t5-base

Language: English

The problem arises when using: this Colab notebook, which runs both BART and T5 through the summarization pipeline.

Dataset: CNN/DM

To reproduce

Run the notebook and measure inference time for the two models. On my run, I get:

BART = 73s
T5 = 369s
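
For reference, a minimal sketch of this kind of comparison (the model identifiers, input articles, and min/max lengths below are placeholders, not values taken from the notebook):

```python
import time
from transformers import pipeline

articles = ["First CNN/DM article ...", "Second CNN/DM article ..."]  # placeholder texts

for model_name in ("bart-large-cnn", "t5-base"):
    summarizer = pipeline("summarization", model=model_name)
    start = time.time()
    for article in articles:
        summarizer(article, min_length=56, max_length=142)  # length settings are placeholders
    print(f"{model_name}: {time.time() - start:.1f}s")
```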

Expected behavior

I expected T5 to be at least as fast as BART, since it has fewer parameters (for the base version at least). Instead, it takes much longer with T5...

@patrickvonplaten Do you happen to know why T5 is so slow?

@BramVanroy BramVanroy added the Core: Pipeline Internals of the library; Pipeline. label Apr 3, 2020
patrickvonplaten (Contributor) commented:

Hi @colanim, thanks a lot for your speed comparison :-).

It might be that the pipelines use different default parameters for T5 and Bart under the hood, which strongly influence their running times.
Besides min_length and max_length, could you also insert these parameters into both T5 and Bart to overwrite the defaults:

      "early_stopping": True
      "length_penalty": 2.0
      "no_repeat_ngram_size": 3
      "num_beams": 4

If there is still a big difference in time, then I guess we have to take a closer look!
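
For illustration, a minimal sketch of passing those overrides through the pipeline (the input text and the min/max lengths are placeholders; the remaining keyword arguments are the ones listed above and are simply forwarded to generation):

```python
from transformers import pipeline

gen_kwargs = dict(
    min_length=56,            # placeholder
    max_length=142,           # placeholder
    early_stopping=True,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    num_beams=4,
)

for model_name in ("bart-large-cnn", "t5-base"):
    summarizer = pipeline("summarization", model=model_name)
    print(model_name, summarizer("Some long article ...", **gen_kwargs))
```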

astariul (Contributor, Author) commented Apr 3, 2020

Thanks for your fast answer, @patrickvonplaten.

Here is the link to the modified notebook, with the parameters you mentioned:
https://colab.research.google.com/drive/1kCm5ew8qDQqguZjbsC6Ujs9KZBaSfafi


Unfortunately, there is still a huge difference...

BART = 66s
T5 = 226s

@patrickvonplaten patrickvonplaten self-assigned this Apr 3, 2020
patrickvonplaten (Contributor) commented Apr 3, 2020

Ok, good to know! Thanks for doing the comparison, @colanim. This might interest you as well, @sshleifer :-)

Oh, actually I just remembered that Bart caches the decoder hidden key/value outputs when doing auto-regressive decoding (similar to GPT2 - check the visuals under "GPT-2 Masked Self-Attention" in this post), and I think T5 does not.

But T5 could cache the decoder key/value outputs to speed up decoding as well, since it uses a causal mask for the decoder. This could definitely be a Feature Request. What do you think, @sshleifer @craffel @thomwolf?
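
A toy illustration of the saving this caching gives (plain PyTorch, not the library's implementation; dimensions are arbitrary):

```python
import torch

d = 64                   # toy head dimension
W_k = torch.randn(d, d)  # key projection
W_v = torch.randn(d, d)  # value projection

def kv_without_cache(prefix_states):
    # Recompute K/V for the entire prefix at every step: O(t) projections per step.
    return prefix_states @ W_k, prefix_states @ W_v

def kv_with_cache(new_state, cache):
    # Project only the newest token and append it to the cache: O(1) projections per step.
    k_new, v_new = new_state @ W_k, new_state @ W_v
    if cache is None:
        return k_new, v_new
    return torch.cat([cache[0], k_new]), torch.cat([cache[1], v_new])

cache = None
prefix = torch.empty(0, d)
for step in range(5):                          # simulate 5 decoding steps
    new_state = torch.randn(1, d)
    prefix = torch.cat([prefix, new_state])
    K_slow, V_slow = kv_without_cache(prefix)  # recomputes everything each step
    cache = kv_with_cache(new_state, cache)    # reuses previously computed projections

# Both strategies produce the same keys/values; caching just avoids the redundant work.
assert torch.allclose(K_slow, cache[0]) and torch.allclose(V_slow, cache[1])
```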


craffel commented Apr 3, 2020

Sounds worth it!

@patrickvonplaten patrickvonplaten linked a pull request Apr 7, 2020 that will close this issue