Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size #30637

kamilakesbi · 2024-05-03T14:43:48Z

This PR aims to fix issue #30611:

First: an error is now raised if the assistant and main model encoders don't have the same size and the assistant is loaded with AutoModelForCausalLM.
Second: this PR makes the pipeline work when the assistant (loaded with AutoModelForSpeechSeq2Seq) has a different encoder size from the main model:
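The first point can be sketched in plain Python. This is a minimal, hypothetical re-creation rather than the library's code: `SimpleNamespace` stands in for a model config, `check_assistant_encoder` is an illustrative name, and the attribute names (`is_encoder_decoder`, `encoder_layers`, `encoder_attention_heads`, `encoder_ffn_dim`) are Whisper config fields chosen as an assumed list of what to compare. Comparing config attributes instead of testing the model type keeps the idea applicable to any encoder-decoder model.

```python
from types import SimpleNamespace

# Encoder hyperparameters compared when a decoder-only assistant reuses the main
# model's encoder outputs (assumed list, chosen for illustration).
ENCODER_ATTRIBUTES = ("encoder_layers", "encoder_attention_heads", "encoder_ffn_dim")


def check_assistant_encoder(main_config, assistant_config):
    """Hypothetical check: raise if a decoder-only assistant cannot reuse the main encoder."""
    if not main_config.is_encoder_decoder or assistant_config.is_encoder_decoder:
        # Either there is no encoder to share, or the assistant has its own encoder.
        return
    # Only compare attributes that the assistant config actually defines.
    shared = [name for name in ENCODER_ATTRIBUTES if hasattr(assistant_config, name)]
    mismatched = [
        name for name in shared
        if getattr(main_config, name, None) != getattr(assistant_config, name)
    ]
    if mismatched:
        raise ValueError(
            f"Main model and assistant have different encoder sizes ({', '.join(mismatched)}); "
            "load the assistant with its own encoder (e.g. AutoModelForSpeechSeq2Seq) instead."
        )


# Illustrative configs loosely modelled on a large and a tiny Whisper checkpoint.
large = SimpleNamespace(is_encoder_decoder=True, encoder_layers=32,
                        encoder_attention_heads=20, encoder_ffn_dim=5120)
tiny_causal = SimpleNamespace(is_encoder_decoder=False, encoder_layers=4,
                              encoder_attention_heads=6, encoder_ffn_dim=1536)
tiny_seq2seq = SimpleNamespace(is_encoder_decoder=True, encoder_layers=4,
                               encoder_attention_heads=6, encoder_ffn_dim=1536)

check_assistant_encoder(large, tiny_seq2seq)  # no error: the assistant brings its own encoder
try:
    check_assistant_encoder(large, tiny_causal)
except ValueError as err:
    print(err)
```

The second case is the one this PR makes work: an encoder-decoder assistant is never forced to reuse the main model's encoder output, so no size check is needed for it.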

When using AutomaticSpeechRecognitionPipeline with an assistant whose encoder size differs from the main model's, the pipeline breaks with the following error message:

ValueError: Whisper expects the mel input features to be of length 3000, but found 1500. Make sure to pad the input mel features to 3000.
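The two numbers in this message are related by Whisper's convolutional front end: 30 s of 16 kHz audio with a 160-sample hop gives 3000 log-mel frames, and the encoder's second 1-D convolution has stride 2, so the encoder emits 1500 positions. The arithmetic below uses the standard `torch.nn.Conv1d` output-length formula, assuming kernel size 3 and padding 1 for both convolutions. A length of 1500 is therefore the length of an encoder output rather than of mel features, which fits the explanation that follows; the snippet only reproduces the arithmetic, not the pipeline's code path.

```python
def conv1d_out_len(length: int, kernel: int, stride: int, padding: int, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (same formula as the torch.nn.Conv1d docs)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


SAMPLE_RATE = 16_000  # Hz
HOP_LENGTH = 160      # samples per mel frame, i.e. 10 ms
CHUNK_SECONDS = 30    # Whisper's fixed input window

mel_frames = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH                          # 3000
after_conv1 = conv1d_out_len(mel_frames, kernel=3, stride=1, padding=1)         # 3000
encoder_positions = conv1d_out_len(after_conv1, kernel=3, stride=2, padding=1)  # 1500

print(mel_frames, encoder_positions)  # 3000 1500
```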

Explanation of the solution

When doing short-form generation with the pipeline, input_features aren't passed to the generate method, which instead receives the output of the main model's encoder.

If the main model and the assistant don't share the same encoder, the encoder output passed to generate cannot be used by the assistant, and generation fails.

The solution is to also pass the input_features to the generate method, so that the assistant can encode them itself.
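The idea of the fix can be illustrated with a toy model of which encoder output each decoder cross-attends to. Everything here is hypothetical: `ToyModel`, `encode` and `encoder_states_for` are illustrative stand-ins, not transformers APIs, and tensors are represented only by their shapes. An assistant with its own encoder must re-encode the input features; only a decoder-only assistant may reuse the main model's encoder output, and only when the hidden sizes agree.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FeatureShape = Tuple[int, int, int]  # (batch, mel_bins, frames)
EncoderShape = Tuple[int, int, int]  # (batch, positions, hidden_size)


@dataclass
class ToyModel:
    d_model: int                    # hidden size of the encoder output / cross-attention
    is_encoder_decoder: bool = True

    def encode(self, features: FeatureShape) -> EncoderShape:
        # Mimic a stride-2 convolutional stem: frames are halved, channels become d_model.
        batch, _, frames = features
        return (batch, frames // 2, self.d_model)


def encoder_states_for(assistant: ToyModel, main_encoder_output: EncoderShape,
                       features: Optional[FeatureShape]) -> EncoderShape:
    """Return the encoder output the assistant's decoder should cross-attend to."""
    if assistant.is_encoder_decoder:
        # The assistant owns an encoder: it needs the raw features, not the main model's output.
        if features is None:
            raise ValueError("input_features are required for an encoder-decoder assistant")
        return assistant.encode(features)
    # Decoder-only assistant: reusing the main encoder output only works if sizes agree.
    if main_encoder_output[-1] != assistant.d_model:
        raise ValueError("assistant hidden size does not match the main model's encoder output")
    return main_encoder_output


main = ToyModel(d_model=1280)
assistant = ToyModel(d_model=384)
features = (1, 80, 3000)
main_out = main.encode(features)                          # (1, 1500, 1280)
print(encoder_states_for(assistant, main_out, features))  # (1, 1500, 384)
```

Without the features (the pre-fix situation in this toy), the encoder-decoder assistant has nothing valid to encode and the call fails.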

Who can review?

@sanchit-gandhi

HuggingFaceDocBuilderDev · 2024-05-03T15:03:42Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sanchit-gandhi

Thanks for taking this up @kamilakesbi - left some comments below!

src/transformers/models/whisper/generation_whisper.py

src/transformers/pipelines/automatic_speech_recognition.py

sanchit-gandhi

Small comment regarding the generality of the check (note that in generation/utils.py, we are assuming that all checks + functionality can be applied to all models in the library that are generate-compatible, not just speech recognition ones)

src/transformers/generation/utils.py

sanchit-gandhi · 2024-05-08T10:12:43Z

Is there a test that confirms correctness after the fix? There's likely a relevant slow pipeline test that was either failing, or was not rigorous enough
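One property such a test can rely on: with greedy decoding, assisted (speculative) generation should return exactly the same tokens as generation without an assistant, because every kept token is the main model's own argmax; the assistant only changes how many tokens are accepted per step. The toy sketch below demonstrates this with hypothetical next-token functions (`toy_main`, `toy_assistant`) standing in for the two models. It is not the test added in this PR, and a real implementation scores all drafted tokens in a single forward pass of the main model, whereas this sketch calls it token by token for readability.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # toy "model": context -> greedy next token


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(model(seq))
    return seq


def assisted_greedy_decode(main: NextToken, assistant: NextToken, prompt: List[int],
                           max_new_tokens: int, num_draft: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns the sequence and the number of accepted draft tokens."""
    seq = list(prompt)
    target_len = len(prompt) + max_new_tokens
    accepted = 0
    while len(seq) < target_len:
        # 1. The assistant drafts up to `num_draft` tokens greedily.
        draft, ctx = [], list(seq)
        for _ in range(min(num_draft, target_len - len(seq))):
            ctx.append(assistant(ctx))
            draft.append(ctx[-1])
        # 2. The main model checks each drafted token; the first mismatch is replaced
        #    by the main model's own choice and the rest of the draft is discarded.
        for token in draft:
            expected = main(seq)
            seq.append(expected)
            if expected != token:
                break
            accepted += 1
    return seq, accepted


def toy_main(ctx: List[int]) -> int:
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_assistant(ctx: List[int]) -> int:
    # Agrees with the main model except at every third position.
    guess = toy_main(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 50


reference = greedy_decode(toy_main, [1, 2, 3], 20)
assisted, n_accepted = assisted_greedy_decode(toy_main, toy_assistant, [1, 2, 3], 20)
assert assisted == reference  # identical output; only the acceptance count differs
print(n_accepted)
```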

gante · 2024-05-09T12:11:08Z

Please have a look at #30726 for an alternative fix -- IMO, the root source of problems is the ASR pipeline doing a redundant operation, and not in generate :)

I hope you don't mind me crashing into the issue 🙌 (I only noticed this PR after opening #30726, when trying to link all related issues)

sanchit-gandhi

Pipeline changes LGTM, just some minor suggestions regarding the slow tests. Would love a second opinion from generate expert @gante on the assistant model validation!

src/transformers/generation/utils.py

tests/pipelines/test_pipelines_automatic_speech_recognition.py

kamilakesbi · 2024-05-14T09:53:05Z

I think this PR is ready to be merged!

cc @amyeroberts @gante if you want to have a look ;)

amyeroberts

Thanks for working on this!

I've just done a quick pass over and have some outstanding Qs. I'll review again once @gante has confirmed the update to the generation validation is OK

src/transformers/generation/utils.py

tests/pipelines/test_pipelines_automatic_speech_recognition.py

gante

LGTM, thank you for fixing! 💪 I've added a few minor nits to help with readability.

(I see that you've included the diff from #30726, I'm going to close that PR :D)

src/transformers/generation/utils.py

kamilakesbi · 2024-05-20T12:33:58Z

Thanks @gante for the review :)
@amyeroberts could you please merge this PR?

amyeroberts · 2024-05-20T16:45:18Z

@kamilakesbi It still needs a final core maintainer review and approval before merge (+ resolution of conflicts) :). I'll review now

amyeroberts

Thanks for adding and iterating on this!

Only thing left to add are tests for _validate_assistant - there should be tests making sure that it correctly raises exceptions for the two cases it's checking for

kamilakesbi · 2024-05-21T15:04:43Z

hi @amyeroberts, I've added a slow test, which passes :) I think the quality-check failures are unrelated to this PR and will be fixed by #30932.

amyeroberts

Thanks for adding the tests - looks great!

tests/generation/test_utils.py

sanchit-gandhi · 2024-05-21T16:41:10Z

Quick checklist before merge:

Resolve all comment threads that have been addressed
Fix the merge conflict in automatic_speech_recognition.py
Rebase onto main to get the style fixes from update ruff version #30932 (once the PR is merged)
Ping me to get this merged!

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

sanchit-gandhi · 2024-05-23T08:59:34Z

Nice work @kamilakesbi!

…encoder size (#30637)

* fiw input to generate in pipeline
* fixup
* pass input_features to generate with assistant
* error if model and assistant with different enc size
* fix
* apply review suggestions
* use self.config.is_encoder_decoder
* pass inputs to generate directly
* add slow tests
* Update src/transformers/generation/utils.py
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* apply review
* Update src/transformers/generation/utils.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* apply code review
* update attributes encoder_xyz to check
* Update src/transformers/generation/utils.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* add slow test
* solve conflicts

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

sanchit-gandhi reviewed May 3, 2024

src/transformers/models/whisper/generation_whisper.py (outdated, resolved)

src/transformers/pipelines/automatic_speech_recognition.py (outdated, resolved)

sanchit-gandhi reviewed May 8, 2024

src/transformers/generation/utils.py (outdated, resolved)

gante mentioned this pull request May 9, 2024

Whisper: fix asr pipeline with seq2seq assistant model #30726

Closed

sanchit-gandhi approved these changes May 10, 2024

kamilakesbi changed the title ~~[WIP] - Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size~~ Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size May 15, 2024

amyeroberts reviewed May 15, 2024

kamilakesbi requested a review from gante May 16, 2024 15:20

kamilakesbi added the Audio and Good Second Issue labels May 17, 2024

kamilakesbi self-assigned this May 17, 2024

gante approved these changes May 20, 2024

src/transformers/generation/utils.py (outdated, resolved)

src/transformers/generation/utils.py (outdated, resolved)

gante reviewed May 20, 2024

src/transformers/generation/utils.py (outdated, resolved)

amyeroberts reviewed May 20, 2024

amyeroberts approved these changes May 21, 2024

tests/generation/test_utils.py (resolved)

sanchit-gandhi mentioned this pull request May 21, 2024

Assistant model not working for different sized openai models when using pipeline for ASR #30407

Closed

kamilakesbi force-pushed the speculative_decoding_asr branch 2 times, most recently from f8fad64 to 2c48d6c May 22, 2024 11:06

kamilakesbi added 5 commits May 23, 2024 10:17

fiw input to generate in pipeline

864df8d

fixup

ff0c638

pass input_features to generate with assistant

749cfaa

error if model and assistant with different enc size

f3011b0

fix

404f67b

kamilakesbi and others added 20 commits May 23, 2024 10:17

apply review suggestions

fd492a7

use self.config.is_encoder_decoder

e41d519

pass inputs to generate directly

27242a6

add slow tests

726f53f

Update src/transformers/generation/utils.py

c7f3f1c

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

405606c

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

2c8c039

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

5b6f297

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

03d2c3e

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

f1c8c8a

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

apply review

d1571a9

Update src/transformers/generation/utils.py

83e17f6

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

29046c6

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

apply code review

d216376

update attributes encoder_xyz to check

87b08e9

Update src/transformers/generation/utils.py

a43e202

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

Update src/transformers/generation/utils.py

b23f1f3

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

Update src/transformers/generation/utils.py

5547aef

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

add slow test

e2bdde1

solve conflicts

3a35145

kamilakesbi force-pushed the speculative_decoding_asr branch from 06e5839 to 3a35145 May 23, 2024 08:17

sanchit-gandhi merged commit eb1a77b into huggingface:main May 23, 2024
21 checks passed

kamilakesbi mentioned this pull request May 23, 2024

Speculative Decoding Snippet Not Working #29869

Closed

sanchit-gandhi mentioned this pull request Jul 3, 2024

[pipeline] fix padding for 1-d tensors #31776

Open