Blockwise mask fn as opt arg in all masking functions#45477

Open
zucchini-nlp wants to merge 9 commits into huggingface:main from zucchini-nlp:split-out-gemma-style-mask

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Apr 16, 2026

What does this PR do?

As per the title: I think this pattern is used often enough that it deserves to be a public mask function. It is currently used in the gemma/paligemma family, GIT, and PI0, and will be used in two upcoming models (deepseekOcr and Molmo2).

This PR lets all of these models take the non-vmap path, which IIRC is preferable for us.

I opted to add 'blockwise_mask' as an argument to the existing mask functions, since it doesn't seem like a new mask type in itself. Otherwise we might have to create separate functions: create_blockwise_causal_mask, create_blockwise_sliding_causal_mask, etc.
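
For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of the idea. It is not the code in this PR: the names (`make_blockwise_mask_fn`, `or_mask_fns`, `materialize`), the `(batch_idx, head_idx, q_idx, kv_idx)` signature, and the `block_ids` layout are all assumptions made for illustration. It shows tokens in the same block (e.g. one image) attending to each other in both directions, on top of an ordinary causal mask.

```python
from typing import Callable, List, Sequence

# Hypothetical signature for illustration: returns True when the query
# position q_idx is allowed to attend to the key/value position kv_idx.
MaskFn = Callable[[int, int, int, int], bool]


def causal_mask_fn(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
    """Standard causal rule: attend only to the current and earlier positions."""
    return kv_idx <= q_idx


def make_blockwise_mask_fn(block_ids: Sequence[Sequence[int]]) -> MaskFn:
    """Build a mask fn that allows attention between tokens of the same block.

    block_ids[b][i] is the block id of token i in batch item b, or -1 when the
    token belongs to no block (assumed layout, e.g. plain text tokens).
    """

    def blockwise_mask_fn(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
        q_block = block_ids[batch_idx][q_idx]
        kv_block = block_ids[batch_idx][kv_idx]
        return q_block >= 0 and q_block == kv_block

    return blockwise_mask_fn


def or_mask_fns(*mask_fns: MaskFn) -> MaskFn:
    """Combine mask fns: a position is visible if any of them allows it."""

    def combined(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
        return any(fn(batch_idx, head_idx, q_idx, kv_idx) for fn in mask_fns)

    return combined


def materialize(mask_fn: MaskFn, batch_idx: int, seq_len: int) -> List[List[bool]]:
    """Evaluate the mask fn on every (q, kv) pair; rows are queries."""
    return [
        [mask_fn(batch_idx, 0, q_idx, kv_idx) for kv_idx in range(seq_len)]
        for q_idx in range(seq_len)
    ]


# One sequence of 5 tokens: text, three tokens of block 0, text.
block_ids = [[-1, 0, 0, 0, -1]]
mask_fn = or_mask_fns(causal_mask_fn, make_blockwise_mask_fn(block_ids))
mask = materialize(mask_fn, batch_idx=0, seq_len=5)
```

In this sketch the first block token (position 1) can see positions 2 and 3 even though they come later, while text tokens stay strictly causal. The nested-loop `materialize` is only there to make the rule easy to inspect; it says nothing about how the masking functions in the library actually build their tensors.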

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title from "[WIP] Add blockwise mask fn as opt arg for all masking functions" to "Blockwise mask fn as opt arg for all masking functions" Apr 17, 2026
@zucchini-nlp zucchini-nlp changed the title from "Blockwise mask fn as opt arg for all masking functions" to "Blockwise mask fn as opt arg in all masking functions" Apr 17, 2026
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3, gemma4, git, paligemma, pi0

@zucchini-nlp
Member Author

run-slow: gemma3, gemma4, git, paligemma, pi0

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit   | Description                    |
| RUN     | 36c679a5 | workflow commit (merge commit) |
| PR      | e8f06b29 | branch commit (from PR)        |
| main    | 77de8dd8 | base commit (on main)          |

Model CI Report

17 new failed tests from this PR 😭

  • gemma3:
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_batch (❌ ⟹ ❌)
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_batch_crops (❌ ⟹ ❌)
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_bf16 (❌ ⟹ ❌)
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_crops (❌ ⟹ ❌)
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_multiimage (❌ ⟹ ❌)

  • gemma4:
    tests/models/gemma4/test_modeling_gemma4.py::Gemma4IntegrationTest::test_export_text_only (❌ ⟹ ❌)

  • paligemma:
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_integration_detection_bug (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_multiimage (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_VQA (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_batched (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_batched_bf16 (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_batched_f16 (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_empty_prompt (❌ ⟹ ❌)

  • pi0:
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_libero (❌ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_reference_values (❌ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_train_pi0_base_libero (❌ ⟹ ❌)

@zucchini-nlp
Member Author

Oh no, I forgot about conversion. Will wait until merged and trigger the tests again.
