Evaluate trainer on Code-Switched Speech fails with "ValueError: Multiple languages detected when trying to predict the most likely target language for transcription." #30654

sproocht · 2024-05-04T21:09:16Z

System Info

transformers version: 4.41.0.dev0
Platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
Python version: 3.10.12
Huggingface_hub version: 0.23.0
Safetensors version: 0.4.3
Accelerate version: 0.30.1.dev0
Accelerate config: not found
PyTorch version (GPU?): 2.3.0+cu121 (True)
Tensorflow version (GPU?): 2.13.1 (True)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: Yes
Using distributed or parallel set-up in script?: No

Who can help?

@sanchit-gandhi
@ArthurZucker
@muellerzr

This issue is related to finetuning Whisper on datasets that may contain switches from a base language to other languages, or simply low resource languages for which language identification by the pre-trained model is not accurate enough. So the issue may be reproduced by mixing a few audio utterances from French into a German dataset, for example, and running "trainer.evaluate" on it .

Up until transformers version 4.37.2, fine-tuning and evaluating on these types of datasets did not raise any issues and the fine-tuning result was very acceptable. In more recent versions, starting with 4.38.0, model evaluation systematically fails on such datasets (in transformers/models/whisper/generation_whisper.py)

I can understand the idea of forcing a single language in a batch, but in real-life situations, people use many languages concurrently in their daily interactions and this is reflected in the datasets. However, this issue prohibits fine-tuning for languages such as Luxembourgish, where it is frequent to mix Luxembourgish with English, French or German in the same utterances. Many other cases concerns Spanglish or Hinglish cases, or low resource languages borrowing words or phrases from other high-resource languages. So, it could prevent using the transformers library to fine-tune for such languages.

The only workaround that I have at the moment, is to stick to version 4.37.2 . Please have a look at this regression.

Thank you in advance!

Here is the full error code and messages:

`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_12853/1263219524.py in
1 # Get initial evaluation results
----> 2 trainer.evaluate()

~/.local/lib/python3.10/site-packages/transformers/trainer_seq2seq.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix, **gen_kwargs)
178 self.gather_function = self.accelerator.gather
179 self._gen_kwargs = gen_kwargs
--> 180 return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
181
182 def predict(

~/.local/lib/python3.10/site-packages/transformers/trainer.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3513
3514 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3515 output = eval_loop(
3516 eval_dataloader,
3517 description="Evaluation",

~/.local/lib/python3.10/site-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
3696
3697 # Prediction step
-> 3698 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
3699 main_input_name = getattr(self.model, "main_input_name", "input_ids")
3700 inputs_decode = self._prepare_input(inputs[main_input_name]) if args.include_inputs_for_metrics else None

~/.local/lib/python3.10/site-packages/transformers/trainer_seq2seq.py in prediction_step(self, model, inputs, prediction_loss_only, ignore_keys, **gen_kwargs)
308 k: v for k, v in inputs.items() if k not in ("decoder_input_ids", "decoder_attention_mask")
309 }
--> 310 generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)
311
312 # Temporary hack to ensure the generation config is not initialized for each iteration of the evaluation loop

~/.local/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py in generate(self, input_features, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, return_timestamps, task, language, is_multilingual, prompt_ids, prompt_condition_type, condition_on_prev_tokens, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, num_segment_frames, attention_mask, time_precision, return_token_timestamps, return_segments, return_dict_in_generate, **kwargs)
528
529 # pass self.config for backward compatibility
--> 530 init_tokens = self._retrieve_init_tokens(
531 input_features,
532 generation_config=generation_config,

_~/.local/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py in _retrieve_init_tokens(self, input_features, generation_config, config, num_segment_frames, kwargs)
1167
1168 if torch.unique(lang_ids).shape[0] > 1:
-> 1169 raise ValueError(
1170 "Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing language='...' or make sure all input audio is of the same language."
1171 )

ValueError: Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing language='...' or make sure all input audio is of the same language.`_

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Run : trainer.evaluate() on a dataset containing a mix of languages.

Expected behavior

Works in transformers versions up to 4.37.2

The text was updated successfully, but these errors were encountered:

amyeroberts · 2024-05-07T09:39:18Z

cc @kamilakesbi

kamilakesbi · 2024-05-10T08:01:35Z

Hi @sproocht,

Thanks for sharing this error! It will be solved with PR #29688.

sproocht · 2024-05-11T11:07:54Z

Hi @kamilakesbi,
Perfect! Thank you for confirming and for working on this.
Best regards,

sanchit-gandhi · 2024-05-16T13:43:05Z

Hey @sproocht - thanks for reporting! This issue was in-fact closed by #29938 for the Transformers example, and huggingface/blog#1944 for the blog post.

If you copy the latest example script and use the latest version of Transformers, you should be able to force the language token by setting the --language argument, which will bypass the automatic language detection.

Hope that helps!

sanchit-gandhi · 2024-05-16T15:15:43Z

Hey @sproocht - I battle-tested this a bit and found you're indeed correct, the generation config is still not correctly updated. This PR should fix this once and for all: #30865

leophill · 2024-05-16T15:28:30Z

Hey @sanchit-gandhi,
That's great! Thank you for the updates. I look forward to testing the fix once the PR is merged.

sproocht · 2024-05-16T15:41:55Z

Hey @sanchit-gandhi,
Nice job! Thanks for confirming. I will definitely give it a try after the PR is merged.
Best regards,

amyeroberts added the Audio label May 7, 2024

kamilakesbi closed this as completed May 14, 2024

amyeroberts reopened this May 14, 2024

sanchit-gandhi closed this as completed May 16, 2024

sanchit-gandhi mentioned this issue May 16, 2024

[whisper] fix multilingual fine-tuning #30865

Merged

sanchit-gandhi reopened this May 16, 2024

sanchit-gandhi closed this as completed in #30865 May 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluate trainer on Code-Switched Speech fails with "ValueError: Multiple languages detected when trying to predict the most likely target language for transcription." #30654

Evaluate trainer on Code-Switched Speech fails with "ValueError: Multiple languages detected when trying to predict the most likely target language for transcription." #30654

sproocht commented May 4, 2024 •

edited

Loading

amyeroberts commented May 7, 2024

kamilakesbi commented May 10, 2024

sproocht commented May 11, 2024

sanchit-gandhi commented May 16, 2024

sanchit-gandhi commented May 16, 2024

leophill commented May 16, 2024

sproocht commented May 16, 2024

Evaluate trainer on Code-Switched Speech fails with "ValueError: Multiple languages detected when trying to predict the most likely target language for transcription." #30654

Evaluate trainer on Code-Switched Speech fails with "ValueError: Multiple languages detected when trying to predict the most likely target language for transcription." #30654

Comments

sproocht commented May 4, 2024 • edited Loading

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

amyeroberts commented May 7, 2024

kamilakesbi commented May 10, 2024

sproocht commented May 11, 2024

sanchit-gandhi commented May 16, 2024

sanchit-gandhi commented May 16, 2024

leophill commented May 16, 2024

sproocht commented May 16, 2024

sproocht commented May 4, 2024 •

edited

Loading