
Fix #44155: [AudioFlamingo3] Batched inference produces incorrect result #44191

Closed
danielalanbates wants to merge 1 commit into huggingface:main from danielalanbates:fix/issue-44155


@danielalanbates

Fixes #44155

Summary

This PR fixes issue #44155, "[AudioFlamingo3] Batched inference produces incorrect results due to embedding/token leak between tracks."

Changes

.../audioflamingo3/modeling_audioflamingo3.py      | 51 +++++++++++++++++++---
 .../audioflamingo3/modular_audioflamingo3.py       | 51 +++++++++++++++++++---
 .../audioflamingo3/processing_audioflamingo3.py    |  4 +-
 3 files changed, 95 insertions(+), 11 deletions(-)

Testing

Please review the changes carefully. The fix was verified against the existing test suite.


This PR was created with the assistance of Claude Sonnet 4.6 by Anthropic | effort: low. Happy to make any adjustments!

… misalignment

The processor and model computed audio token counts with different
formulas, causing masked_scatter to misalign embeddings across batch
items when audio clips had different lengths.
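To see why a count mismatch leaks embeddings from one batch item into another, here is a minimal pure-Python sketch that mimics how `masked_scatter` fills placeholder positions: the source embeddings form one flat sequence that is consumed in order across the whole batch, not per sample. The placeholder string `<au>`, the toy prompts, and the helper `masked_scatter_rows` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

PLACEHOLDER = "<au>"  # hypothetical audio placeholder token


def masked_scatter_rows(rows: Sequence[Sequence[str]], embeddings: Sequence[str]) -> List[List[str]]:
    """Fill placeholder slots row by row, consuming `embeddings` in order.

    Like torch.Tensor.masked_scatter, the source is one flat sequence, so a
    wrong per-sample count shifts every later embedding into the wrong slot.
    """
    needed = sum(tok == PLACEHOLDER for row in rows for tok in row)
    if len(embeddings) < needed:
        raise ValueError(f"need {needed} embeddings, got {len(embeddings)}")
    source = iter(embeddings)
    return [[next(source) if tok == PLACEHOLDER else tok for tok in row] for row in rows]


# The processor inserted 2 placeholders for sample A and 3 for sample B...
prompts = [
    ["<s>", PLACEHOLDER, PLACEHOLDER, "A?"],
    ["<s>", PLACEHOLDER, PLACEHOLDER, PLACEHOLDER, "B?"],
]
# ...but the model produced 3 embeddings for A and only 2 for B.
model_embeddings = ["a0", "a1", "a2", "b0", "b1"]

filled = masked_scatter_rows(prompts, model_embeddings)
print(filled[1])  # sample B starts with A's leftover embedding "a2"
```

The totals match (5 placeholders, 5 embeddings), so nothing fails loudly; the error only shows up as sample B attending to audio from sample A.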

Processor formula (correct):
  sum mask values across all windows for a sample → apply downsampling

Old model formula (wrong):
  apply downsampling per window → sum results
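The two orderings can disagree because the downsampling step rounds: rounding each window separately does not always add up to rounding the total. A minimal sketch, assuming a made-up `downsample` rule (a single stride-2 stage with ceiling rounding) rather than AudioFlamingo3's actual encoder arithmetic, and toy per-window mask sums:

```python
from typing import Sequence


def downsample(num_frames: int) -> int:
    """Hypothetical length rule: one stride-2 stage that rounds up.

    The real AudioFlamingo3 rule depends on its encoder's conv/pooling
    layers; this stand-in only needs to involve integer rounding.
    """
    return (num_frames - 1) // 2 + 1


def tokens_sum_then_downsample(window_mask_sums: Sequence[int]) -> int:
    # Processor formula: add up valid frames over all windows, downsample once.
    return downsample(sum(window_mask_sums))


def tokens_downsample_then_sum(window_mask_sums: Sequence[int]) -> int:
    # Old model formula: downsample each window separately, then add.
    return sum(downsample(n) for n in window_mask_sums)


windows = [5, 5]  # toy valid-frame counts for two windows of one sample
print(tokens_sum_then_downsample(windows))  # 5
print(tokens_downsample_then_sum(windows))  # 6
```

With a single window (for example `[7]`) both functions return 4, so a discrepancy can only appear once a sample spans several windows.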

Fix:
- Pass `windows_per_sample` from the processor to the model so that
  get_audio_features can group windows by sample and apply the same
  formula as the processor.
- Processor now includes `windows_per_sample` in its output
  BatchFeature and model_input_names.
- forward() and prepare_inputs_for_generation() thread
  windows_per_sample through to get_audio_features.
- When windows_per_sample is not provided (e.g. single-sample
  inference), the original per-window path is used as a fallback.
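The grouping step can be sketched in plain Python. This is a hypothetical helper, not the PR's `get_audio_features` code: `audio_token_counts`, `ceil_half`, and the list-based inputs are assumptions, whereas the real model operates on tensors of attention masks.

```python
from typing import Callable, List, Optional, Sequence


def ceil_half(num_frames: int) -> int:
    # Hypothetical stand-in for the encoder's real length rule.
    return (num_frames - 1) // 2 + 1


def audio_token_counts(
    window_mask_sums: Sequence[int],
    windows_per_sample: Optional[Sequence[int]] = None,
    downsample: Callable[[int], int] = ceil_half,
) -> List[int]:
    """Return audio token counts, grouped per sample when possible."""
    if windows_per_sample is None:
        # Fallback: the original per-window path, one count per window.
        return [downsample(n) for n in window_mask_sums]
    if sum(windows_per_sample) != len(window_mask_sums):
        raise ValueError("windows_per_sample does not cover every window")
    counts: List[int] = []
    start = 0
    for num_windows in windows_per_sample:
        # Same order as the processor: sum this sample's windows, then downsample.
        counts.append(downsample(sum(window_mask_sums[start:start + num_windows])))
        start += num_windows
    return counts


# Two samples: the first spans two windows, the second spans one.
print(audio_token_counts([5, 5, 3], windows_per_sample=[2, 1]))  # [5, 2]
print(audio_token_counts([5, 5, 3]))  # [3, 3, 2]
```

Validating that `windows_per_sample` covers every window turns a silent misalignment into an explicit error if the processor and model ever disagree about the batch layout.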

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3

@ebezzam self-assigned this on Feb 25, 2026
@ebezzam added the Audio label on Feb 25, 2026
@ebezzam ebezzam mentioned this pull request Feb 25, 2026


Development

Successfully merging this pull request may close these issues.

[AudioFlamingo3] Batched inference produces incorrect results due to embedding/token leak between tracks