Conversation

@vasqu (Contributor) commented Nov 13, 2025

The issue was that the cache wasn't utilized at all, even though several generation methods rely on it. This resulted in a mismatch between the actual input and the attention mask:

  • the mask was correctly expanded one token at a time
  • after the prefill phase, the cache wasn't used, so the key/value states always had length 1

Why did it even work?

  • because we silently cut the mask to fit the kv shape, i.e. a silent bug and another good reason why enforcing the correct shape is needed (see the sketch below)
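A minimal toy sketch of that silent failure mode (illustrative only, not the actual BLT attention code; tensor names and shapes here are assumptions):

```python
import torch

# toy decode step without a working cache: only the newest token's key/value states exist
batch, heads, head_dim = 1, 4, 8
key_states = torch.randn(batch, heads, 1, head_dim)  # kv length stays 1 at every step
attention_mask = torch.ones(batch, 10)               # generate() keeps growing this to the full length

# silently slicing the mask down to the kv length makes the shapes line up again,
# so no error is raised even though the past tokens are never attended to
attention_mask = attention_mask[:, -key_states.shape[-2]:]
print(attention_mask.shape)  # torch.Size([1, 1]) -- the mismatch is hidden, not fixed
```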

The fix:

  • apply self-attention caches for both the encoder and the decoder (a rough sketch follows below)
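As a rough illustration of why this fixes the mismatch (a hedged sketch using transformers' `DynamicCache`, not the exact BLT code), a per-layer self-attention cache lets the key/value length grow with every generated token so it matches the full attention mask again:

```python
import torch
from transformers import DynamicCache

batch, heads, head_dim, layer_idx = 1, 4, 8, 0
cache = DynamicCache()

for step in range(3):
    # each decode step produces key/value states for the new token only
    new_key = torch.randn(batch, heads, 1, head_dim)
    new_value = torch.randn(batch, heads, 1, head_dim)
    # the cache concatenates them with the states stored on previous steps
    key_states, value_states = cache.update(new_key, new_value, layer_idx)
    print(step, key_states.shape[-2])  # kv length is now 1, 2, 3 and tracks the mask length
```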

Future considerations:

  • add a cross-attention cache for both the encoder and the decoder → not used at the moment (but it also wasn't used before, as far as I can see)

@vasqu (Contributor, Author) commented Nov 13, 2025

run-slow: blt

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/blt"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu (Contributor, Author) commented Nov 13, 2025

run-slow: blt

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

⚠️ No test being reported (jobs are skipped or cancelled)!

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/blt"]
quantizations: []

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: blt, mllama

@vasqu (Contributor, Author) commented Nov 14, 2025

run-slow: blt

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/blt"]
quantizations: []

@vasqu (Contributor, Author) commented Nov 14, 2025

run-slow: blt

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

⚠️ No test being reported (jobs are skipped or cancelled)!

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/blt"]
quantizations: []

@vasqu marked this pull request as ready for review November 14, 2025 13:11
@vasqu (Contributor, Author) commented Nov 14, 2025

Fixes the normal fast tests; there's a weird issue at the moment on main where it takes significantly more memory / isn't properly cleaned up. Locally, the slow tests pass except for those that already didn't.

@zucchini-nlp (Member) left a comment

Very interesting model! Approving since the PR fixes CI.

I think it's fine if we don't have a cache on the cross-attention module, and instead add a cache for the global transformer. I skimmed over the model's tech report, and it seems that the global transformer is a much bigger module than the local encoder and decoder.

@ArthurZucker (Collaborator) left a comment

thanks

@vasqu merged commit 309180f into huggingface:main on Nov 14, 2025 (18 of 19 checks passed)
@vasqu deleted the fix-blt branch on November 14, 2025 14:58
@github-actions (Contributor)

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

CrazyDubya pushed a commit to CrazyDubya/transformers that referenced this pull request Nov 18, 2025
Remove incorrect TODO comments (batch 4)

seamless_m4t_v2: Remove TODO comments that claimed docstrings were missing
- The docstrings for t2u_variance_predictor parameters already exist
- Lines 175-182 in the class docstring document all these parameters
- Simply removed the incorrect TODO comments

Impact:
- Removes 4 incorrect TODO comments
- No actual changes needed - documentation already complete