Conversation

@vasqu (Contributor) commented Nov 13, 2025

The issue was that the cache wasn't utilized at all, even though several generation methods rely on it. This resulted in a mismatch between the actual input and the attention mask:

  • the mask was correctly expanded one token at a time
  • after the prefill phase, the cache wasn't used, so the key/value states always had length 1

Why did it even work?

  • because we silently cut the mask to fit the kv shape, i.e. a silent bug and another good reason why enforcing the correct shape is needed (see the sketch below)
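A minimal toy sketch of that silent failure mode (illustrative only, not the actual BLT attention code; tensor names and shapes here are assumptions):

```python
import torch

# toy decode step without a working cache: only the newest token's key/value states exist
batch, heads, head_dim = 1, 4, 8
key_states = torch.randn(batch, heads, 1, head_dim)  # kv length stays 1 at every step
attention_mask = torch.ones(batch, 10)               # generate() keeps growing this to the full length

# silently slicing the mask down to the kv length makes the shapes line up again,
# so no error is raised even though the past tokens are never attended to
attention_mask = attention_mask[:, -key_states.shape[-2]:]
print(attention_mask.shape)  # torch.Size([1, 1]) -- the mismatch is hidden, not fixed
```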

The fix:

  • apply self-attention caches for both the encoder and the decoder (a rough sketch follows below)
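As a rough illustration of why this fixes the mismatch (a hedged sketch using transformers' `DynamicCache`, not the exact BLT code), a per-layer self-attention cache lets the key/value length grow with every generated token so it matches the full attention mask again:

```python
import torch
from transformers import DynamicCache

batch, heads, head_dim, layer_idx = 1, 4, 8, 0
cache = DynamicCache()

for step in range(3):
    # each decode step produces key/value states for the new token only
    new_key = torch.randn(batch, heads, 1, head_dim)
    new_value = torch.randn(batch, heads, 1, head_dim)
    # the cache concatenates them with the states stored on previous steps
    key_states, value_states = cache.update(new_key, new_value, layer_idx)
    print(step, key_states.shape[-2])  # kv length is now 1, 2, 3 and tracks the mask length
```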

Future considerations:

  • add a cross-attention cache for both the encoder and the decoder → not used at the moment (but it also wasn't used before, as far as I can see)

@vasqu (Contributor, Author) commented Nov 13, 2025

run-slow: blt

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/blt"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu (Contributor, Author) commented Nov 13, 2025

run-slow: blt

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

⚠️ No test being reported (jobs are skipped or cancelled)!

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/blt"]
quantizations: []

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: blt, mllama

@vasqu (Contributor, Author) commented Nov 14, 2025

run-slow: blt

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/blt"]
quantizations: []

@vasqu (Contributor, Author) commented Nov 14, 2025

run-slow: blt

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

⚠️ No test being reported (jobs are skipped or cancelled)!

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/blt"]
quantizations: []

@vasqu marked this pull request as ready for review November 14, 2025 13:11
@vasqu (Contributor, Author) commented Nov 14, 2025

Fixes the normal fast tests; there's a weird issue at the moment on main where it takes significantly more memory / isn't properly cleaned up. Locally, the slow tests pass except for those that already didn't.

@zucchini-nlp (Member) left a comment

Very interesting model! Approving since the PR fixes CI.

I think it's fine if we don't have a cache on the cross-attention module, and instead add a cache for the global transformer. I skimmed over the model's tech report, and it seems that the global transformer is a much bigger module than the local encoder and decoder.

@ArthurZucker (Collaborator) left a comment

thanks

@vasqu merged commit 309180f into huggingface:main on Nov 14, 2025 (18 of 19 checks passed)
@vasqu deleted the fix-blt branch on November 14, 2025 14:58
@github-actions (Contributor)

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

CrazyDubya pushed a commit to CrazyDubya/transformers that referenced this pull request Nov 18, 2025
Remove incorrect TODO comments (batch 4)

seamless_m4t_v2: Remove TODO comments that claimed docstrings were missing
- The docstrings for t2u_variance_predictor parameters already exist
- Lines 175-182 in the class docstring document all these parameters
- Simply removed the incorrect TODO comments

Impact:
- Removes 4 incorrect TODO comments
- No actual changes needed - documentation already complete