Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up beam search by ~2x #1851

Closed
yuyan2do opened this issue Mar 17, 2020 · 1 comment
Closed

speed up beam search by ~2x #1851

yuyan2do opened this issue Mar 17, 2020 · 1 comment

Comments

@yuyan2do
Copy link
Contributor

yuyan2do commented Mar 17, 2020

🚀 Feature Request

Speed up beam search ~2x by removing unnecessary reorder and merging small ops

Motivation

GPU utility is only ~40% during BART model inference. By profile, I see 2 issues in incremental generation.

  1. Half of time is used for transfer small data between GPU and CPU when no_repeat_ngram_size > 0. This pattern may apply to other seq2seq models, because the code cause small data transfer is in beam search part, not in model code.
  2. State reorder use as much time as computation in model forward, and many of these reorder are unnecessary.

Pitch

I created PR #1852 with below changes.

  1. Copy whole tensor from gpu to cpu once, instead of do it in for loop
  2. Ban ngram token in one kernel call, instead of in for loop
  3. Remove unnecessary reorder
    In encoder_decoder_attention, reorder only need when batch size change. Because encoder state
    is shared across beam size.

Additional context

Inference speed (sample/s) on CNN-DM dataset using V100

Before change After change Speed up
no_repeat_ngram_size=3 3.6 6.8 1.9X
no_repeat_ngram_size=0 5.3 8.3 1.6X

(beam=4, lenpen=2.0, max_len_b=140, min_len=55)

Profile data to compare before and after change.
image

To benchmark the speed, run "CUDA_VISIBLE_DEVICES=0 python generation_speed_test.py".
benchmark code modify from here
cnndm_128.txt
generation_speed_test.py.txt

@myleott
Copy link
Contributor

myleott commented Mar 17, 2020

Very nice!

moussaKam pushed a commit to moussaKam/language-adaptive-pretraining that referenced this issue Sep 29, 2020
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes facebookresearch#1851 .

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding �
Pull Request resolved: facebookresearch#1852

Reviewed By: ngoyal2707

Differential Revision: D20490964

Pulled By: myleott

fbshipit-source-id: 22f6c849408029f5432e531589da29d95e31d392
mgaido91 pushed a commit to mgaido91/FBK-fairseq-ST that referenced this issue Jan 12, 2021
Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes facebookresearch/fairseq#1851 .

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding �
Pull Request resolved: facebookresearch/fairseq#1852

Reviewed By: ngoyal2707

Differential Revision: D20490964

Pulled By: myleott

fbshipit-source-id: 22f6c849408029f5432e531589da29d95e31d392
facebook-github-bot pushed a commit that referenced this issue May 6, 2021
Summary:
see title

Pull Request resolved: fairinternal/fairseq-py#1851

Reviewed By: michaelauli, arbabu123

Differential Revision: D28226892

Pulled By: alexeib

fbshipit-source-id: e07641dda46be2708e1f9d0c0cbc5b8dedaa92e7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants