Blockwise mask fn as opt arg in all masking functions #45477
zucchini-nlp wants to merge 9 commits into huggingface:main
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma3, gemma4, git, paligemma, pi0
run-slow: gemma3, gemma4, git, paligemma, pi0
This comment contains models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"]
CI Results
Model CI Report: ❌ 17 new failed tests from this PR 😭
Oh no, I forgot about conversion. Will wait until merged and trigger the tests again.
What does this PR do?
As per title, I think this pattern is used quite often and deserves to be a public mask fn. It is currently used in the gemma/paligemma family, GIT and PI0, and will be used in two upcoming models (deepseekOcr and Molmo2).
This PR allows all these models to go the non-vmap path, which iirc is preferable for us. I opted for adding 'blockwise_mask' as an arg to the existing mask fns, since it doesn't seem like a new mask type in itself. Otherwise we might have to create create_blockwise_causal_mask, create_blockwise_sliding_causal_mask, etc.
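For readers unfamiliar with the pattern, here is a rough, library-free sketch of what a blockwise mask fn does when OR-ed with a causal one: tokens in the same block (e.g. the tokens of one image) see each other bidirectionally, everything else stays causal. All names here (`make_blockwise_mask_fn`, `or_mask_fns`, `build_mask`, `block_ids`) are made up for illustration and are not the API added in this PR; the nested loops are only for readability and say nothing about how the non-vmap path builds the mask.

```python
from typing import Callable, List

# Assumed signature for a mask fn: (batch_idx, head_idx, q_idx, kv_idx) -> bool
MaskFn = Callable[[int, int, int, int], bool]


def causal_mask_fn(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
    # A query may attend to itself and to earlier positions only.
    return kv_idx <= q_idx


def make_blockwise_mask_fn(block_ids: List[List[int]]) -> MaskFn:
    # Hypothetical helper: block_ids[b][i] is the block id of token i in
    # batch item b, or -1 if the token does not belong to any block.
    def blockwise(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
        q_block = block_ids[batch_idx][q_idx]
        kv_block = block_ids[batch_idx][kv_idx]
        # Full (bidirectional) attention only inside the same block.
        return q_block >= 0 and q_block == kv_block

    return blockwise


def or_mask_fns(*fns: MaskFn) -> MaskFn:
    # Attention is allowed if any of the given mask fns allows it.
    def combined(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
        return any(fn(batch_idx, head_idx, q_idx, kv_idx) for fn in fns)

    return combined


def build_mask(mask_fn: MaskFn, batch_size: int, seq_len: int) -> List[List[List[bool]]]:
    # Materialize mask[b][q][kv] for a single head (head_idx fixed to 0).
    return [
        [[mask_fn(b, 0, q, kv) for kv in range(seq_len)] for q in range(seq_len)]
        for b in range(batch_size)
    ]


# One sequence: a text token, three tokens of one image block, two text tokens.
block_ids = [[-1, 0, 0, 0, -1, -1]]
mask_fn = or_mask_fns(causal_mask_fn, make_blockwise_mask_fn(block_ids))
mask = build_mask(mask_fn, batch_size=1, seq_len=6)

for row in mask[0]:
    print("".join("1" if allowed else "0" for allowed in row))
```

Passing the blockwise fn as an optional arg means the same OR-composition can be reused with the sliding-window variant instead of adding a separate create_* function per mask type.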