[BLT] Fix cache usage #42188
Conversation
run-slow: blt

This comment contains models: ["models/blt"]

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

CI Results: ✅ No failing test specific to this PR 🎉!

[For maintainers] Suggested jobs to run (before merge): run-slow: blt, mllama
Fixes the normal fast tests. There's a weird issue at the moment on main where it takes significantly more memory and the memory isn't properly cleaned up. Locally, the slow tests pass except for those that were already failing.
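A minimal way to observe the memory behaviour described above, assuming a CUDA machine (this is a generic measurement sketch, not part of the PR):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the fast test or a generate() call here ...
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
torch.cuda.empty_cache()  # release cached allocator blocks between runs
```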
zucchini-nlp left a comment
Very interesting model! Approving since the PR fixes CI.
I think it's fine if we don't have a cache on the cross-attention module and instead add a cache for the global transformer. I skimmed the model's tech report, and it seems the global transformer is a much bigger module than the local encoder-decoders.
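Not the PR's actual implementation, but a minimal sketch of that idea, with hypothetical names and shapes: cache keys/values only in the global transformer's self-attention and let the local modules' cross-attention recompute each step.

```python
import torch

class GlobalSelfAttentionSketch(torch.nn.Module):
    """Toy stand-in for one global-transformer layer's self-attention."""

    def __init__(self, hidden=64):
        super().__init__()
        self.q = torch.nn.Linear(hidden, hidden)
        self.k = torch.nn.Linear(hidden, hidden)
        self.v = torch.nn.Linear(hidden, hidden)

    def forward(self, x, past_kv=None):
        # x: (batch, new_tokens, hidden); past_kv: optional (k, v) pair.
        q, k, v = self.q(x), self.k(x), self.v(x)
        if past_kv is not None:
            # Append only the new tokens' K/V; earlier positions are reused.
            # An uncached cross-attention module would skip this and
            # recompute its K/V on every call instead.
            k = torch.cat([past_kv[0], k], dim=1)
            v = torch.cat([past_kv[1], v], dim=1)
        scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
        out = torch.softmax(scores, dim=-1) @ v
        return out, (k, v)  # hand the grown cache back to the caller
```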
ArthurZucker left a comment
thanks
The issue was that the cache wasn't used at all, while several generation methods rely on it. This resulted in a mismatch between the actual input and the attention mask.
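A hedged illustration of that mismatch, with hypothetical numbers: generation loops grow the attention mask by one position per step on the assumption that the model consumes the cache and receives only the newest token.

```python
import torch

cache_len, new_tokens = 7, 1  # hypothetical step during decoding
attention_mask = torch.ones(1, cache_len + new_tokens)

# Buggy path: the cache is ignored, so the model only sees `new_tokens`
# positions while the mask still claims `cache_len + new_tokens` columns.
seen_by_model = new_tokens
assert attention_mask.shape[1] != seen_by_model  # mask no longer matches input
```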
Why did it even work?
The fix:
Future considerations: