
[Whisper] Computing features on GPU in batch mode for whisper feature extractor. #29900



@vaibhavagg303 vaibhavagg303 commented Mar 27, 2024

What does this PR do?

This pull request adds support for computing audio features for the Whisper model in batch mode on the GPU. This substantially reduces feature-extraction latency and improves overall performance.
cc: @yashjogi

@sanchit-gandhi @ArthurZucker @hollance Please look into it.

fixes #29901
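For illustration, a minimal usage sketch of the batched GPU path this PR enables, using the `device` argument it adds to `WhisperFeatureExtractor.__call__` (the checkpoint name and dummy inputs below are just examples):

```python
import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")

# A batch of 8 dummy 30-second waveforms at 16 kHz.
batch = [np.random.randn(16000 * 30).astype(np.float32) for _ in range(8)]

# Compute log-mel features for the whole batch on the GPU; the default device is "cpu".
features = feature_extractor(
    batch,
    sampling_rate=16000,
    return_tensors="pt",
    device="cuda",
)
print(features.input_features.shape)  # torch.Size([8, 80, 3000])
```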


@ArthurZucker ArthurZucker left a comment


Very nice addition!


@ArthurZucker ArthurZucker left a comment


Looks good. Let's add a test that uses require_torch_gpu to make sure this runs on GPU!
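For reference, a rough sketch of the kind of test being suggested; `require_torch_gpu` lives in `transformers.testing_utils`, but the test name and body below are illustrative rather than the test that was eventually added:

```python
import numpy as np
from transformers import WhisperFeatureExtractor
from transformers.testing_utils import require_torch_gpu

@require_torch_gpu
def test_batched_feature_extraction_on_gpu():
    feature_extractor = WhisperFeatureExtractor()
    # Four dummy 5-second clips at 16 kHz.
    speech = [np.random.randn(16000 * 5).astype(np.float32) for _ in range(4)]
    features = feature_extractor(
        speech, sampling_rate=16000, return_tensors="pt", device="cuda"
    )
    # One set of log-mel features per input clip.
    assert features.input_features.shape[0] == 4
```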


@ArthurZucker ArthurZucker left a comment


LGTM, let's ask for @sanchit-gandhi 's approval as well here!


@sanchit-gandhi sanchit-gandhi left a comment


Thanks for this PR @vaibhavagg303 - performance improvements to Whisper are always most welcome! In general, I'm in favour of the changes towards faster feature extractors. Do you have any numbers on how much faster this CUDA variant is? My only reluctance in merging this PR would be if the performance gains are negligible but we add extra complexity through the potential new device argument. To check this, you can adapt and update the toy benchmark that was proposed as part of the first torch stft addition: #26119 (comment). Otherwise, I've left some small suggestions below.
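For concreteness, a toy benchmark along those lines could look like the sketch below (adapted in spirit from the #26119 benchmark; the checkpoint and timing setup are illustrative):

```python
import time
import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
batch = [np.random.randn(16000 * 30).astype(np.float32) for _ in range(8)]

for device in ("cpu", "cuda"):
    # Warm-up run so one-off CUDA initialisation isn't counted.
    feature_extractor(batch, sampling_rate=16000, return_tensors="pt", device=device)
    start = time.perf_counter()
    for _ in range(10):
        feature_extractor(batch, sampling_rate=16000, return_tensors="pt", device=device)
    print(f"{device}: {(time.perf_counter() - start) / 10:.3f} s per batch of 8")
```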

@sanchit-gandhi

cc @kamilakesbi for viz


yashjogi commented Apr 2, 2024

The details regarding the time improvement are given in #29901. This change reduces the end-to-end inference time for Whisper-small by around 25%, which is a significant improvement given the overall latency of Whisper models. We also see a large drop in feature-extraction time, from 1.5 seconds (unbatched, on CPU) to 0.02 seconds (batched, on GPU).
@sanchit-gandhi @ArthurZucker

@sanchit-gandhi

Thanks for the benchmark link @vaibhavagg303, that's most helpful! One follow-up question on your benchmark: you mention that torch GPU stft "cuts the average time for batches of 8 from 1.5 seconds to 0.25 seconds". How does the compute time change for bsz=1, since this is also a common use case for computing features in short-form mode (<30s)?


sanchit-gandhi commented Apr 2, 2024

As a follow-up PR: we'll need to update the ASR pipeline class to compute batched input features in order to leverage this speed-up, since it currently always uses bsz=1 for feature extraction:

processed = feature_extractor(chunk, sampling_rate=feature_extractor.sampling_rate, return_tensors="pt")
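Roughly, the idea would be to collect chunks and call the feature extractor once per batch on GPU instead; a simplified illustration, not the actual pipeline internals:

```python
import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")

# Hypothetical stand-ins for the 30-second chunks the pipeline would produce.
chunks = [np.random.randn(16000 * 30).astype(np.float32) for _ in range(4)]

# One batched call instead of one bsz=1 call per chunk.
processed = feature_extractor(
    chunks,
    sampling_rate=feature_extractor.sampling_rate,
    return_tensors="pt",
    device="cuda",
)
```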


yashjogi commented Apr 2, 2024

Thanks for the benchmark link @vaibhavagg303, that's most helpful! One follow-up question on your benchmark: you mention that torch GPU stft "cuts the average time for batches of 8 from 1.5 seconds to 0.25 seconds". How does the compute time change for bsz=1, since this is also a common use case for computing features in short-form mode (<30s)?

This is the average computation time for 8 audio samples.

| Mode | Time |
| --- | --- |
| CPU, batch size 1 | 1.5 seconds |
| CPU, batch size 8 | 0.25 seconds |
| GPU, batch size 8 | 0.02 seconds |


sanchit-gandhi commented Apr 2, 2024

Thanks @yashjogi. To clarify my question above, what's the speed up of CPU Batch Size 1 vs GPU Batch Size 1?


vaibhavagg303 commented Apr 2, 2024

Thanks @yashjogi. To clarify my question above, what's the speed up of CPU Batch Size 1 vs GPU Batch Size 1?

It's around 0.02291 seconds for GPU Batch Size 1 @sanchit-gandhi


yashjogi commented Apr 2, 2024

Also, we found that this is a huge bottleneck for training Whisper: this simple change reduced training time by almost 9x, depending on the CPU resources of the machine we're training on. This means it would significantly decrease the time to train distil-whisper as well! https://github.com/huggingface/distil-whisper/blob/main/training/run_distillation.py
@sanchit-gandhi

@sanchit-gandhi sanchit-gandhi left a comment


Thanks for the updates @vaibhavagg303 - just one small suggestion about the docstring, otherwise LGTM!

@vaibhavagg303

Hi @ArthurZucker, can you please review and approve these changes?


@ArthurZucker ArthurZucker left a comment


Thanks for getting through this and bringing fast whisper processing! 🤗

@ArthurZucker ArthurZucker merged commit 1ed93be into huggingface:main Apr 8, 2024
17 checks passed
@vaibhavagg303

Thanks, @sanchit-gandhi @ArthurZucker

@vaibhavagg303 vaibhavagg303 deleted the feature/optimize_feature_extractor branch April 8, 2024 09:01
@sanchit-gandhi

Thanks for your contribution @vaibhavagg303!

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Apr 12, 2024
… extractor. (huggingface#29900)

* add _torch_extract_fbank_features_batch function in feature_extractor_whisper

* reformat feature_extraction_whisper.py file

* handle batching in single function

* add gpu test & doc

* add batch test & device in each __call__

* add device arg in doc string

---------

Co-authored-by: vaibhav.aggarwal <vaibhav.aggarwal@sprinklr.com>
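For context, the core of the batched torch path looks roughly like the sketch below. This is an illustration of the technique (batched torch.stft, mel projection, and log compression), not the exact `_torch_extract_fbank_features_batch` code that was merged; the mel filter bank shape is an assumption stated in the comments:

```python
import torch

def batched_log_mel_spectrogram(
    waveforms: torch.Tensor,    # (batch, num_samples), float32, 16 kHz, padded to equal length
    mel_filters: torch.Tensor,  # assumed shape (num_mel_bins, 1 + n_fft // 2) filter bank
    device: str = "cuda",
    n_fft: int = 400,
    hop_length: int = 160,
) -> torch.Tensor:
    """Illustrative batched log-mel feature computation on the given device."""
    waveforms = waveforms.to(device)
    window = torch.hann_window(n_fft, device=device)
    # Batched STFT over the whole batch at once: (batch, n_fft // 2 + 1, num_frames)
    stft = torch.stft(waveforms, n_fft, hop_length, window=window, return_complex=True)
    magnitudes = stft[..., :-1].abs() ** 2
    # Project onto the mel filter bank, then apply Whisper-style log compression.
    mel_spec = mel_filters.to(device) @ magnitudes
    log_spec = torch.clamp(mel_spec, min=1e-10).log10()
    # Clamp each example to within 8 of its own maximum (in log10 units), then rescale.
    max_val = log_spec.amax(dim=(-2, -1), keepdim=True)
    log_spec = torch.maximum(log_spec, max_val - 8.0)
    return (log_spec + 4.0) / 4.0
```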
ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
… extractor. (#29900)

itazap pushed a commit that referenced this pull request May 14, 2024
… extractor. (#29900)

Successfully merging this pull request may close these issues.

[Whisper] Computing features on CPU slows down WhisperFeatureExtractor