Collaborator

@ydshieh ydshieh commented Sep 3, 2025

What does this PR do?

tests/models/voxtral/test_modeling_voxtral.py::VoxtralForConditionalGenerationModelTest::test_prompt_lookup_decoding_matches_greedy_search is flaky. Example failure:

https://app.circleci.com/pipelines/github/huggingface/transformers/144445/workflows/829f1347-398e-493f-a531-36c3178da153/jobs/1909491

    @can_return_tuple
    @auto_docstring
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        input_features: Optional[torch.FloatTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Cache] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        cache_position: Optional[torch.LongTensor] = None,
        logits_to_keep: Union[int, torch.Tensor] = 0,
        **kwargs: Unpack[TransformersKwargs],
    ) -> CausalLMOutputWithPast:
        if inputs_embeds is None:
            inputs_embeds = self.get_input_embeddings()(input_ids)
    
        if input_features is not None:
            audio_embeds = self.get_audio_embeds(input_features)
    
            # replace text-audio token placeholders with audio embeddings
            audio_token_mask = input_ids == self.config.audio_token_id
>           inputs_embeds[audio_token_mask] = audio_embeds
E           RuntimeError: shape mismatch: value tensor of shape [30, 32] cannot be broadcast to indexing result of shape [32, 32]

/usr/local/lib/python3.9/site-packages/transformers/models/voxtral/modeling_voxtral.py:512: RuntimeError
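
The failing assignment only works when the number of audio placeholder tokens in `input_ids` equals the number of rows in `audio_embeds`. In the traceback the mask selects 32 positions while only 30 embedding rows are available. Here is a minimal sketch of that invariant, using plain Python lists in place of tensors. The helper `count_mismatch` and the token id `24` are hypothetical and for illustration only:

```python
def count_mismatch(input_ids, audio_token_id, num_audio_embeds):
    """Return (num_placeholders, num_audio_embeds) if they differ, else None.

    Hypothetical helper: plain lists stand in for the tensors used in
    modeling_voxtral.py, so this only illustrates the counting logic.
    """
    num_placeholders = sum(
        1 for row in input_ids for token in row if token == audio_token_id
    )
    if num_placeholders != num_audio_embeds:
        return (num_placeholders, num_audio_embeds)
    return None


AUDIO_TOKEN_ID = 24  # made-up id, not the real Voxtral value

# 32 placeholder positions, as in the failing run, followed by two text tokens
input_ids = [[AUDIO_TOKEN_ID] * 32 + [5, 6]]

print(count_mismatch(input_ids, AUDIO_TOKEN_ID, num_audio_embeds=30))  # (32, 30)
print(count_mismatch(input_ids, AUDIO_TOKEN_ID, num_audio_embeds=32))  # None
```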

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@zucchini-nlp zucchini-nlp left a comment

Thanks for fixing, it's been annoying me for a while as well!

Now that we have audio models, the list is growing huge 🙃 Maybe we can update this by checking whether the config has any audio/image/video token id and skipping the test if so?
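
A rough sketch of that idea, assuming the check lives in a small helper. The function name `has_multimodal_token_id` is hypothetical, and only `audio_token_id` is confirmed by the traceback above; `image_token_id` and `video_token_id` are assumed by analogy. This is not the library's actual test code:

```python
from types import SimpleNamespace

# Attribute names to probe. Only `audio_token_id` appears in the traceback;
# the other two are assumptions made by analogy.
MULTIMODAL_TOKEN_ATTRS = ("audio_token_id", "image_token_id", "video_token_id")


def has_multimodal_token_id(config):
    """Return True if the config defines any multimodal placeholder token id."""
    return any(
        getattr(config, name, None) is not None for name in MULTIMODAL_TOKEN_ATTRS
    )


# Stand-in configs for illustration (not real transformers config classes)
text_only = SimpleNamespace(vocab_size=100)
audio_model = SimpleNamespace(vocab_size=100, audio_token_id=24)

print(has_multimodal_token_id(text_only))    # False
print(has_multimodal_token_id(audio_model))  # True
```

Inside the test, a guard such as `if has_multimodal_token_id(config): self.skipTest(...)` could then stand in for the hard-coded model list.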

@ydshieh
Collaborator Author

ydshieh commented Sep 3, 2025

> Maybe we can update this by checking whether the config has any audio/image/video token id and skipping the test if so?

Let me add a TODO comment and do it later 🙏

Too many things this week 😢

@ydshieh ydshieh enabled auto-merge (squash) September 3, 2025 11:36
@ydshieh ydshieh merged commit c485c52 into main Sep 3, 2025
25 checks passed
@ydshieh ydshieh deleted the fix_voxtral branch September 3, 2025 11:45