
[Whisper] Computing features on GPU in batch mode for whisper feature extractor. #29900



@vaibhavagg303 vaibhavagg303 commented Mar 27, 2024

What does this PR do?

This pull request adds support for computing audio features for the Whisper model in batch mode on the GPU. This substantially reduces feature-extraction latency and improves overall performance.
cc: @yashjogi

@sanchit-gandhi @ArthurZucker @hollance Please look into it.

fixes #29901
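For illustration, a minimal usage sketch of the batched GPU path this PR enables, using the `device` argument it adds to `WhisperFeatureExtractor.__call__` (the checkpoint name and dummy inputs below are just examples):

```python
import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")

# A batch of 8 dummy 30-second waveforms at 16 kHz.
batch = [np.random.randn(16000 * 30).astype(np.float32) for _ in range(8)]

# Compute log-mel features for the whole batch on the GPU; the default device is "cpu".
features = feature_extractor(
    batch,
    sampling_rate=16000,
    return_tensors="pt",
    device="cuda",
)
print(features.input_features.shape)  # torch.Size([8, 80, 3000])
```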


@ArthurZucker ArthurZucker left a comment


Very nice addition!


@ArthurZucker ArthurZucker left a comment


Looks good. Let's add a test that uses require_torch_gpu to make sure this runs on GPU!
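For reference, a rough sketch of the kind of test being suggested; `require_torch_gpu` lives in `transformers.testing_utils`, but the test name and body below are illustrative rather than the test that was eventually added:

```python
import numpy as np
from transformers import WhisperFeatureExtractor
from transformers.testing_utils import require_torch_gpu

@require_torch_gpu
def test_batched_feature_extraction_on_gpu():
    feature_extractor = WhisperFeatureExtractor()
    # Four dummy 5-second clips at 16 kHz.
    speech = [np.random.randn(16000 * 5).astype(np.float32) for _ in range(4)]
    features = feature_extractor(
        speech, sampling_rate=16000, return_tensors="pt", device="cuda"
    )
    # One set of log-mel features per input clip.
    assert features.input_features.shape[0] == 4
```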


@ArthurZucker ArthurZucker left a comment


LGTM, let's ask for @sanchit-gandhi 's approval as well here!


@sanchit-gandhi sanchit-gandhi left a comment


Thanks for this PR @vaibhavagg303 - performance improvements to Whisper are always most welcome! In general, I'm in favour of the changes towards faster feature extractors. Do you have any numbers on how much faster this CUDA variant is? My only reluctance in merging this PR would be if the performance gains are negligible but we add extra complexity through the potential new device argument. To check this, you can adapt and update the toy benchmark that was proposed as part of the first torch stft addition: #26119 (comment). Otherwise, I've left some small suggestions below.
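For concreteness, a toy benchmark along those lines could look like the sketch below (adapted in spirit from the #26119 benchmark; the checkpoint and timing setup are illustrative):

```python
import time
import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
batch = [np.random.randn(16000 * 30).astype(np.float32) for _ in range(8)]

for device in ("cpu", "cuda"):
    # Warm-up run so one-off CUDA initialisation isn't counted.
    feature_extractor(batch, sampling_rate=16000, return_tensors="pt", device=device)
    start = time.perf_counter()
    for _ in range(10):
        feature_extractor(batch, sampling_rate=16000, return_tensors="pt", device=device)
    print(f"{device}: {(time.perf_counter() - start) / 10:.3f} s per batch of 8")
```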

@sanchit-gandhi

cc @kamilakesbi for viz


yashjogi commented Apr 2, 2024

The details regarding the time improvement are given in #29901. This change reduces the end-to-end inference time for Whisper-small by around 25%, which is a significant improvement given the overall latency of Whisper models. We also see a large drop in feature-extraction time, from 1.5 seconds (unbatched, on CPU) to 0.02 seconds (batched, on GPU).
@sanchit-gandhi @ArthurZucker

@sanchit-gandhi

Thanks for the benchmark link @vaibhavagg303, that's most helpful! One follow-up question on your benchmark: you mention that torch GPU stft "cuts the average time for batches of 8 from 1.5 seconds to 0.25 seconds". How does the compute time change for bsz=1, since this is also a common use case for computing features in short-form mode (<30s)?


sanchit-gandhi commented Apr 2, 2024

As a follow-up PR: we'll need to update the ASR pipeline class to compute batched input features in order to leverage this speed-up, since it currently always uses bsz=1 for feature extraction:

processed = feature_extractor(chunk, sampling_rate=feature_extractor.sampling_rate, return_tensors="pt")
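Roughly, the idea would be to collect chunks and call the feature extractor once per batch on GPU instead; a simplified illustration, not the actual pipeline internals:

```python
import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")

# Hypothetical stand-ins for the 30-second chunks the pipeline would produce.
chunks = [np.random.randn(16000 * 30).astype(np.float32) for _ in range(4)]

# One batched call instead of one bsz=1 call per chunk.
processed = feature_extractor(
    chunks,
    sampling_rate=feature_extractor.sampling_rate,
    return_tensors="pt",
    device="cuda",
)
```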


yashjogi commented Apr 2, 2024

Thanks for the benchmark link @vaibhavagg303, that's most helpful! One follow-up question on your benchmark: you mention that torch GPU stft "cuts the average time for batches of 8 from 1.5 seconds to 0.25 seconds". How does the compute time change for bsz=1, since this is also a common use case for computing features in short-form mode (<30s)?

This is the average computation time for 8 audio samples.

| Mode | Time |
| --- | --- |
| CPU, batch size 1 | 1.5 seconds |
| CPU, batch size 8 | 0.25 seconds |
| GPU, batch size 8 | 0.02 seconds |


sanchit-gandhi commented Apr 2, 2024

Thanks @yashjogi. To clarify my question above, what's the speed up of CPU Batch Size 1 vs GPU Batch Size 1?


vaibhavagg303 commented Apr 2, 2024

Thanks @yashjogi. To clarify my question above, what's the speed up of CPU Batch Size 1 vs GPU Batch Size 1?

It's around 0.02291 seconds for GPU Batch Size 1 @sanchit-gandhi


yashjogi commented Apr 2, 2024

Also, we found that this is a huge bottleneck for training Whisper: this simple change reduced training time by almost 9x, depending on the CPU resources of the machine we're training on. This means it would significantly decrease the time to train distil-whisper as well! https://github.com/huggingface/distil-whisper/blob/main/training/run_distillation.py
@sanchit-gandhi

@sanchit-gandhi sanchit-gandhi left a comment


Thanks for the updates @vaibhavagg303 - just one small suggestion about the docstring, otherwise LGTM!

@vaibhavagg303

Hi @ArthurZucker, can you please review and approve these changes?


@ArthurZucker ArthurZucker left a comment


Thanks for getting through this and bringing fast whisper processing! 🤗

@ArthurZucker ArthurZucker merged commit 1ed93be into huggingface:main Apr 8, 2024
17 checks passed
@vaibhavagg303

Thanks, @sanchit-gandhi @ArthurZucker

@vaibhavagg303 vaibhavagg303 deleted the feature/optimize_feature_extractor branch April 8, 2024 09:01
@sanchit-gandhi

Thanks for your contribution @vaibhavagg303!

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Apr 12, 2024
… extractor. (huggingface#29900)

* add _torch_extract_fbank_features_batch function in feature_extractor_whisper

* reformat feature_extraction_whisper.py file

* handle batching in single function

* add gpu test & doc

* add batch test & device in each __call__

* add device arg in doc string

---------

Co-authored-by: vaibhav.aggarwal <vaibhav.aggarwal@sprinklr.com>
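For context, the core of the batched torch path looks roughly like the sketch below. This is an illustration of the technique (batched torch.stft, mel projection, and log compression), not the exact `_torch_extract_fbank_features_batch` code that was merged; the mel filter bank shape is an assumption stated in the comments:

```python
import torch

def batched_log_mel_spectrogram(
    waveforms: torch.Tensor,    # (batch, num_samples), float32, 16 kHz, padded to equal length
    mel_filters: torch.Tensor,  # assumed shape (num_mel_bins, 1 + n_fft // 2) filter bank
    device: str = "cuda",
    n_fft: int = 400,
    hop_length: int = 160,
) -> torch.Tensor:
    """Illustrative batched log-mel feature computation on the given device."""
    waveforms = waveforms.to(device)
    window = torch.hann_window(n_fft, device=device)
    # Batched STFT over the whole batch at once: (batch, n_fft // 2 + 1, num_frames)
    stft = torch.stft(waveforms, n_fft, hop_length, window=window, return_complex=True)
    magnitudes = stft[..., :-1].abs() ** 2
    # Project onto the mel filter bank, then apply Whisper-style log compression.
    mel_spec = mel_filters.to(device) @ magnitudes
    log_spec = torch.clamp(mel_spec, min=1e-10).log10()
    # Clamp each example to within 8 of its own maximum (in log10 units), then rescale.
    max_val = log_spec.amax(dim=(-2, -1), keepdim=True)
    log_spec = torch.maximum(log_spec, max_val - 8.0)
    return (log_spec + 4.0) / 4.0
```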
ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
… extractor. (#29900)

itazap pushed a commit that referenced this pull request May 14, 2024
… extractor. (#29900)

Successfully merging this pull request may close these issues.

[Whisper] Computing features on CPU slows down WhisperFeatureExtractor