Codelab issue Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers #1794

Closed · ghost opened this issue Feb 1, 2024 · 2 comments

ghost commented Feb 1, 2024

A quick run through your codelab produced this error during evaluation:

=================

Due to a bug fix in huggingface/transformers#28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.

ValueError: Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language.

ghost (Author) commented Feb 1, 2024

Found a solution: enforce a single language in the model config as below:

```python
MODEL_NAME = "openai/whisper-medium"
LANGUAGE = "Vietnamese"
TASK = "transcribe"
MAX_LENGTH = 224

# Force the decoder prompt to a single language/task so Whisper skips
# per-batch language detection during generation.
model.config.forced_decoder_ids = processor.tokenizer.get_decoder_prompt_ids(
    language=LANGUAGE, task=TASK
)
model.config.suppress_tokens = []
model.generation_config.forced_decoder_ids = processor.tokenizer.get_decoder_prompt_ids(
    language=LANGUAGE, task=TASK
)
model.generation_config.suppress_tokens = []
```
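For reference, recent transformers releases also accept `language` and `task` directly in the `generate()` call, which likewise bypasses the batch language-detection step that raises this error. A minimal sketch (not from the codelab, and it downloads the full checkpoint, so treat it as illustrative; the silent audio is a placeholder):

```python
# Sketch: pin Whisper to one language at generation time instead of via
# forced_decoder_ids. Downloads the openai/whisper-medium checkpoint.
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_NAME = "openai/whisper-medium"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME)

# Placeholder input: one second of silence; replace with real 16 kHz audio.
audio = np.zeros(16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Passing language/task here forces a single target language, so a batch
# with mixed detected languages no longer raises the ValueError above.
predicted_ids = model.generate(
    inputs.input_features,
    language="vi",      # Vietnamese, matching LANGUAGE in the config fix
    task="transcribe",
)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```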

ghost closed this as completed Feb 1, 2024
RitchieP commented

Hi, I tried implementing your solution but I'm still stuck. I've set transformers to log verbosely, and this is what I see:

[screenshot: evaluation run stalls with no further log output]

It just stops and shows nothing when running evaluation. Any idea why this happens?
