speed up beam search by ~2x #1851

yuyan2do · 2020-03-17T04:39:36Z

🚀 Feature Request

Speed up beam search ~2x by removing unnecessary reorder and merging small ops

Motivation

GPU utilization is only ~40% during BART model inference. Profiling shows 2 issues in incremental generation.

Half of the time is spent transferring small pieces of data between GPU and CPU when no_repeat_ngram_size > 0. This pattern may affect other seq2seq models as well, because the code that causes the small transfers is in the beam search, not in the model code.
Reordering state takes as much time as the computation in the model forward pass, and many of these reorders are unnecessary.
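
For context on the first issue: n-gram blocking has to read the generated token ids on the host to find which next tokens would repeat an earlier n-gram. Below is a minimal sketch in plain Python, not the fairseq code. Nested lists stand in for a token tensor that has already been copied to the CPU in a single call, and the names `banned_next_tokens` and `ban_repeated_ngrams` are hypothetical.

```python
import math
from collections import defaultdict


def banned_next_tokens(history, n):
    """Return the tokens that would complete an n-gram already seen in `history`."""
    if n <= 0 or len(history) < n:
        return []
    # Map every (n-1)-token prefix to the tokens that followed it so far.
    seen = defaultdict(list)
    for i in range(len(history) - n + 1):
        prefix = tuple(history[i:i + n - 1])
        seen[prefix].append(history[i + n - 1])
    # The last n-1 generated tokens form the prefix of the next n-gram.
    current = tuple(history[len(history) - n + 1:])
    return seen.get(current, [])


def ban_repeated_ngrams(scores, tokens, n):
    """Set the score of every banned (hypothesis, token) pair to -inf."""
    rows, cols = [], []
    for row, history in enumerate(tokens):
        for tok in banned_next_tokens(history, n):
            rows.append(row)
            cols.append(tok)
    # All index pairs are collected first. With tensors they could be applied
    # in one indexed assignment; this loop is the plain-Python stand-in.
    for r, c in zip(rows, cols):
        scores[r][c] = -math.inf
    return scores


tokens = [[1, 2, 3, 1, 2], [4, 5, 6, 7, 8]]   # two hypotheses
scores = [[0.0] * 10 for _ in tokens]          # vocabulary of 10 tokens
ban_repeated_ngrams(scores, tokens, n=3)
```

In the first hypothesis the last two tokens are `1, 2`, and `1, 2, 3` was already generated, so token 3 is banned there. The second hypothesis has no repeated prefix and its scores stay unchanged.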

Pitch

I created PR #1852 with the changes below.

Copy the whole tensor from GPU to CPU once, instead of copying inside a for loop
Ban n-gram tokens in one kernel call, instead of in a for loop
Remove unnecessary reorders
In encoder_decoder_attention, a reorder is only needed when the batch size changes, because the encoder state is shared across the beams.
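
The reorder point can be illustrated with a small sketch in plain Python. It is not the code from the PR: a list of strings stands in for the cached encoder-decoder attention state (one row per hypothesis), `reorder_cross_attention_cache` is a hypothetical name, and it assumes that when the number of hypotheses is unchanged, `new_order` only moves beams around within their own sentence.

```python
def reorder_cross_attention_cache(cache, new_order):
    """Reorder cached encoder-side state, skipping the work when it is a no-op.

    cache:     list with one entry per (sentence, beam) hypothesis
    new_order: list of source indices, one per surviving hypothesis
    """
    if len(new_order) == len(cache):
        # Assumption: same size means beams were only shuffled inside each
        # sentence. All beams of a sentence hold the same encoder state,
        # so the reordered cache would be identical to the current one.
        return cache
    # The batch shrank (some sentences finished): select the surviving rows.
    return [cache[i] for i in new_order]


# Two sentences, beam size 2: each sentence's encoder state is repeated per beam.
cache = ["enc_A", "enc_A", "enc_B", "enc_B"]

same_size = reorder_cross_attention_cache(cache, [1, 0, 3, 3])
shrunk = reorder_cross_attention_cache(cache, [2, 3])
```

With `new_order = [1, 0, 3, 3]` a full reorder would produce the same list, so returning the cache untouched loses nothing. With `new_order = [2, 3]` sentence A has finished and only the rows of sentence B are kept.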

Additional context

Inference speed (samples/s) on the CNN-DM dataset using a V100

| | Before change | After change | Speed up |
|---|---|---|---|
| no_repeat_ngram_size=3 | 3.6 | 6.8 | 1.9X |
| no_repeat_ngram_size=0 | 5.3 | 8.3 | 1.6X |

(beam=4, lenpen=2.0, max_len_b=140, min_len=55)

Profile data comparing before and after the change.

To benchmark the speed, run "CUDA_VISIBLE_DEVICES=0 python generation_speed_test.py".
The benchmark code is modified from here
cnndm_128.txt
generation_speed_test.py.txt

myleott · 2020-03-17T14:38:23Z

Very nice!

Summary:
# Before submitting
- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes facebookresearch#1851.

## PR review
Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding

Pull Request resolved: facebookresearch#1852
Reviewed By: ngoyal2707
Differential Revision: D20490964
Pulled By: myleott
fbshipit-source-id: 22f6c849408029f5432e531589da29d95e31d392

Summary: see title Pull Request resolved: fairinternal/fairseq-py#1851 Reviewed By: michaelauli, arbabu123 Differential Revision: D28226892 Pulled By: alexeib fbshipit-source-id: e07641dda46be2708e1f9d0c0cbc5b8dedaa92e7

yuyan2do added enhancement help wanted needs triage labels Mar 17, 2020

yuyan2do mentioned this issue Mar 17, 2020

Beam search perf improve #1852

Closed

4 tasks

myleott removed the needs triage label Mar 17, 2020

facebook-github-bot closed this as completed in a84cb78 Mar 20, 2020

yuyan2do mentioned this issue Apr 2, 2020

Increase beam search speed and reduce memory usage #1957

Closed

yuyan2do mentioned this issue Dec 18, 2020

RuntimeError: CUDA error: no kernel image is available for execution on the device microsoft/fastseq#70

Closed
