Fix #44155: [AudioFlamingo3] Batched inference produces incorrect result by danielalanbates · Pull Request #44191 · huggingface/transformers

danielalanbates · 2026-02-21T04:32:30Z

Summary

This PR fixes: [AudioFlamingo3] Batched inference produces incorrect results due to embedding/token leak between tracks

Changes

 .../audioflamingo3/modeling_audioflamingo3.py      | 51 +++++++++++++++++++---
 .../audioflamingo3/modular_audioflamingo3.py       | 51 +++++++++++++++++++---
 .../audioflamingo3/processing_audioflamingo3.py    |  4 +-
 3 files changed, 95 insertions(+), 11 deletions(-)

Testing

Please review the changes carefully. The fix was verified against the existing test suite.

This PR was created with the assistance of Claude Sonnet 4.6 by Anthropic | effort: low. Happy to make any adjustments!

… misalignment

The processor and model computed audio token counts with different formulas, causing masked_scatter to misalign embeddings across batch items when audio clips had different lengths.

Processor formula (correct): sum mask values across all windows for a sample → apply downsampling
Old model formula (wrong): apply downsampling per window → sum results

Fix:
- Pass `windows_per_sample` from the processor to the model so that get_audio_features can group windows by sample and apply the same formula as the processor.
- Processor now includes `windows_per_sample` in its output BatchFeature and model_input_names.
- forward() and prepare_inputs_for_generation() thread windows_per_sample through to get_audio_features.
- When windows_per_sample is not provided (e.g. single-sample inference), the original per-window path is used as a fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
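To illustrate why the order of operations in the two formulas matters, here is a minimal sketch in plain Python. It is not the code from this PR: the `downsample` function (two stride-2 stages with floor division) is an assumption standing in for whatever length formula the model actually uses, the helper names are hypothetical, and the window lengths are toy values chosen to expose the rounding effect.

```python
def downsample(n_frames: int) -> int:
    """Hypothetical length formula: two stride-2 stages with floor division.

    This is an assumption for illustration; the real formula lives in the
    processor and model code and may differ.
    """
    after_first = (n_frames - 1) // 2 + 1
    return (after_first - 2) // 2 + 1


def tokens_sum_then_downsample(window_lengths, windows_per_sample):
    """Processor-style count: sum valid frames per sample, then downsample."""
    counts, start = [], 0
    for n in windows_per_sample:
        counts.append(downsample(sum(window_lengths[start:start + n])))
        start += n
    return counts


def tokens_downsample_then_sum(window_lengths, windows_per_sample):
    """Old model-style count: downsample each window, then sum per sample."""
    counts, start = [], 0
    for n in windows_per_sample:
        counts.append(sum(downsample(w) for w in window_lengths[start:start + n]))
        start += n
    return counts


# Toy batch: sample 0 has three windows, sample 1 has one window.
window_lengths = [7, 7, 3, 5]      # valid (unmasked) frames per window
windows_per_sample = [3, 1]        # how many windows belong to each sample

expected = tokens_sum_then_downsample(window_lengths, windows_per_sample)
produced = tokens_downsample_then_sum(window_lengths, windows_per_sample)
print(expected)  # [4, 1]
print(produced)  # [5, 1]
```

Because floor division does not distribute over a sum, the two orderings can disagree for multi-window samples. In this toy case the text would reserve 4 placeholder positions for sample 0 while the per-window path yields 5 embeddings, so a flat scatter over the whole batch would push the extra embedding into the next sample's slots.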

github-actions · 2026-02-21T04:33:27Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3

ebezzam self-assigned this Feb 25, 2026

ebezzam added the Audio label Feb 25, 2026

ebezzam mentioned this pull request Feb 25, 2026

Add Music Flamingo #43538

Open

danielalanbates wants to merge 1 commit into huggingface:main from danielalanbates:fix/issue-44155
