
🐛 Summarization pipeline: T5-base much slower than BART-large #3605

Closed · astariul opened this issue Apr 3, 2020 · 4 comments · Fixed by #3682
Labels: Core: Pipeline (Internals of the library; Pipeline.)

astariul (Contributor) commented Apr 3, 2020

🐛 Bug

Information

Model: bart-large-cnn and t5-base

Language: English

The problem arises when using: this Colab notebook, which runs both BART and T5 through the summarization pipeline.

Dataset: CNN/DM

To reproduce

Run the notebook and measure inference time for the two models. On my run, I get:

BART = 73s
T5 = 369s
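
For reference, a minimal sketch of this kind of comparison (the model identifiers, input articles, and min/max lengths below are placeholders, not values taken from the notebook):

```python
import time
from transformers import pipeline

articles = ["First CNN/DM article ...", "Second CNN/DM article ..."]  # placeholder texts

for model_name in ("bart-large-cnn", "t5-base"):
    summarizer = pipeline("summarization", model=model_name)
    start = time.time()
    for article in articles:
        summarizer(article, min_length=56, max_length=142)  # length settings are placeholders
    print(f"{model_name}: {time.time() - start:.1f}s")
```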

Expected behavior

I expected T5 to be at least as fast as BART, since it has fewer parameters (for the base version at least). Instead, it takes much longer with T5...

@patrickvonplaten Do you happen to know why T5 is so slow?

@BramVanroy BramVanroy added the Core: Pipeline Internals of the library; Pipeline. label Apr 3, 2020
patrickvonplaten (Contributor) commented:

Hi @colanim, thanks a lot for your speed comparison :-).

It might be that the pipelines use different default parameters for T5 and Bart under the hood, which strongly influence their running times.
Besides min_length and max_length, could you also insert these parameters into both T5 and Bart to overwrite the defaults:

      "early_stopping": True
      "length_penalty": 2.0
      "no_repeat_ngram_size": 3
      "num_beams": 4

If there is still a big difference in time, then I guess we have to take a closer look!
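
For illustration, a minimal sketch of passing those overrides through the pipeline (the input text and the min/max lengths are placeholders; the remaining keyword arguments are the ones listed above and are simply forwarded to generation):

```python
from transformers import pipeline

gen_kwargs = dict(
    min_length=56,            # placeholder
    max_length=142,           # placeholder
    early_stopping=True,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    num_beams=4,
)

for model_name in ("bart-large-cnn", "t5-base"):
    summarizer = pipeline("summarization", model=model_name)
    print(model_name, summarizer("Some long article ...", **gen_kwargs))
```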

astariul (Contributor, Author) commented Apr 3, 2020

Thanks for your fast answer, @patrickvonplaten.

Here is the link to the modified notebook, with the parameters you mentioned:
https://colab.research.google.com/drive/1kCm5ew8qDQqguZjbsC6Ujs9KZBaSfafi


Unfortunately, there is still a huge difference...

BART = 66s
T5 = 226s

@patrickvonplaten patrickvonplaten self-assigned this Apr 3, 2020
patrickvonplaten (Contributor) commented Apr 3, 2020

Ok, good to know! Thanks for doing the comparison, @colanim. This might interest you as well, @sshleifer :-)

Oh, actually I just remembered that Bart caches the decoder hidden key/value outputs when doing auto-regressive decoding (similar to GPT2 - check the visuals under "GPT-2 Masked Self-Attention" in this post), and I think T5 does not.

But T5 could cache the decoder key/value outputs to speed up decoding as well, since it uses a causal mask for the decoder. This could definitely be a Feature Request. What do you think, @sshleifer @craffel @thomwolf?
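
A toy illustration of the saving this caching gives (plain PyTorch, not the library's implementation; dimensions are arbitrary):

```python
import torch

d = 64                   # toy head dimension
W_k = torch.randn(d, d)  # key projection
W_v = torch.randn(d, d)  # value projection

def kv_without_cache(prefix_states):
    # Recompute K/V for the entire prefix at every step: O(t) projections per step.
    return prefix_states @ W_k, prefix_states @ W_v

def kv_with_cache(new_state, cache):
    # Project only the newest token and append it to the cache: O(1) projections per step.
    k_new, v_new = new_state @ W_k, new_state @ W_v
    if cache is None:
        return k_new, v_new
    return torch.cat([cache[0], k_new]), torch.cat([cache[1], v_new])

cache = None
prefix = torch.empty(0, d)
for step in range(5):                          # simulate 5 decoding steps
    new_state = torch.randn(1, d)
    prefix = torch.cat([prefix, new_state])
    K_slow, V_slow = kv_without_cache(prefix)  # recomputes everything each step
    cache = kv_with_cache(new_state, cache)    # reuses previously computed projections

# Both strategies produce the same keys/values; caching just avoids the redundant work.
assert torch.allclose(K_slow, cache[0]) and torch.allclose(V_slow, cache[1])
```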


craffel commented Apr 3, 2020

Sounds worth it!

@patrickvonplaten patrickvonplaten linked a pull request Apr 7, 2020 that will close this issue