[Gemma4] Replace one-hot matmul with F.embedding in position embeddings by Sriniketh24 · Pull Request #46176 · huggingface/transformers

Sriniketh24 · 2026-05-24T08:00:46Z

Summary

Gemma4VisionPatchEmbedder._position_embeddings materializes a one-hot tensor of shape [batch, num_patches, 2, position_embedding_size] in int64, casts it to the table's dtype, and matrix-multiplies it against position_embedding_table. For typical training configs (position_embedding_size=10240, batch=40, 2520 patches), this allocates ~19 GiB of GPU memory for what is mathematically a 2-row embedding lookup.
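The ~19 GiB figure can be reproduced from the stated shape. The snippet below is a back-of-the-envelope check, assuming 8 bytes per int64 element and 2 bytes per bf16 element; the variable names are illustrative and do not come from the PR.

```python
# Sizes taken from the PR summary.
batch = 40
num_patches = 2520
num_axes = 2  # x and y position per patch
position_embedding_size = 10240

# Number of elements in the one-hot tensor
# [batch, num_patches, 2, position_embedding_size].
num_elements = batch * num_patches * num_axes * position_embedding_size

GIB = 1024**3
int64_gib = num_elements * 8 / GIB  # int64 one-hot tensor
bf16_gib = num_elements * 2 / GIB   # copy after casting to bf16
total_gib = int64_gib + bf16_gib

print(f"int64 one-hot: {int64_gib:.2f} GiB")
print(f"bf16 copy:     {bf16_gib:.2f} GiB")
print(f"total:         {total_gib:.2f} GiB")
```

The two intermediates together account for the roughly 19 GiB quoted above; the output of the lookup itself is not included because it depends on hidden_size, which the summary does not state.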

This PR replaces the F.one_hot + matmul pattern with two F.embedding calls (one per spatial axis), summed. The change:

- Eliminates the 15.38 GiB int64 one-hot tensor and its 3.85 GiB bf16 cast copy
- Preserves numerical equivalence (embedding lookup is the same operation)
- Clamps position IDs to [0, position_embedding_size - 1] (the original only clamped min=0; F.embedding would crash on OOB unlike F.one_hot which silently handled it)
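The equivalence between the two patterns can be checked on a small example. This is a minimal sketch, not the code from the PR: `one_hot_path` is an assumed reconstruction of the original pattern based on the shapes given in the summary, `embedding_path` mirrors the two F.embedding calls shown in the diff, and the tensor sizes are arbitrary small values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Small illustrative sizes (the real config is far larger).
batch, num_patches, position_embedding_size, hidden_size = 2, 5, 16, 8

# Table layout from the docstring: (2, position_embedding_size, hidden_size).
table = torch.randn(2, position_embedding_size, hidden_size)
# In-range (x, y) position per patch: [batch, num_patches, 2].
position_ids = torch.randint(0, position_embedding_size, (batch, num_patches, 2))


def one_hot_path(ids: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    """Assumed reconstruction of the one-hot + matmul pattern."""
    one_hot = F.one_hot(ids, num_classes=table.shape[1])   # [B, N, 2, P], int64
    one_hot = one_hot.permute(0, 2, 1, 3).to(table.dtype)  # [B, 2, N, P]
    per_axis = one_hot @ table                             # [B, 2, N, H]
    return per_axis.sum(dim=1)                             # [B, N, H]


def embedding_path(ids: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    """Two lookups, one per spatial axis, summed."""
    x_emb = F.embedding(ids[..., 0], table[0])  # [B, N, H]
    y_emb = F.embedding(ids[..., 1], table[1])  # [B, N, H]
    return x_emb + y_emb


reference = one_hot_path(position_ids, table)
lookup = embedding_path(position_ids, table)
print(torch.allclose(reference, lookup))
```

The lookup path never builds a tensor with a position_embedding_size dimension, so its intermediates scale with hidden_size rather than with the table length.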

Coordination

Issue discussion and approval: [Gemma4] Gemma4VisionPatchEmbedder._position_embeddings materializes a ~19 GiB one-hot tensor that's mathematically a 2-row embedding lookup #46175 (comment)
Explicit approval from issue author @kuso2006: [Gemma4] Gemma4VisionPatchEmbedder._position_embeddings materializes a ~19 GiB one-hot tensor that's mathematically a 2-row embedding lookup #46175 (comment)
No duplicate PRs exist (checked via gh pr list --search "46175 in:body")

Test

python -m pytest tests/models/gemma4/test_modeling_gemma4.py::Gemma4VisionPatchEmbedderTest -xvs

PASSED test_no_one_hot_intermediate
PASSED test_padding_zeroed
PASSED test_negative_positions_clamped
PASSED test_oob_positions_clamped
4 passed in 18.11s

AI-assisted contribution (Claude Code).

Replace the one-hot encoding + matrix multiply pattern in Gemma4VisionPatchEmbedder._position_embeddings with two F.embedding lookups (one per spatial axis) summed together. This is mathematically equivalent but avoids materializing a ~19 GiB intermediate one-hot tensor (int64) and its bf16 cast copy during training with large batch sizes.

Fixes huggingface#46175

AI-assisted contribution (Claude Code).

zucchini-nlp · 2026-05-25T02:30:25Z

+        (shape ``(2, position_embedding_size, hidden_size)``).  The result is the
+        sum of the x- and y-embeddings for each patch.
+        """
+        clamped_positions = pixel_position_ids.clamp(min=0, max=self.position_embedding_size - 1)


I am not sure about clamping the upper bound; F.one_hot raises an error when values exceed the total number of classes.

Can you check what exactly happened, and why we need the upper bound clamping?

Good catch. I removed the upper-bound clamp in the latest commit. The original code only had clamp(min=0) to handle negative padding sentinels, and F.one_hot would have raised on out-of-bounds values. I've matched that behavior: clamp(min=0) only, so valid positions remain in-range by construction and any out-of-bounds input would surface the same way it did before.
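The behavior discussed in this thread can be illustrated with a small CPU-only sketch. It is not taken from the PR: the 4-row table, the `-1` sentinel value, and the `raises` helper are all hypothetical. On CPU, both F.one_hot and F.embedding raise for an index at or beyond the table size; on CUDA, the failure may surface differently (for example as a device-side assertion).

```python
import torch
import torch.nn.functional as F

table = torch.randn(4, 3)  # hypothetical table with 4 positions

# -1 stands in for a negative padding sentinel (assumed value).
ids = torch.tensor([-1, 0, 3])
clamped = ids.clamp(min=0)  # lower bound only, as in the final version of the PR
embedded = F.embedding(clamped, table)  # succeeds: all indices are in [0, 3]


def raises(fn) -> bool:
    """Return True if calling fn raises an indexing or runtime error."""
    try:
        fn()
    except (RuntimeError, IndexError):
        return True
    return False


out_of_bounds = torch.tensor([4])  # one past the last valid index
one_hot_raises = raises(lambda: F.one_hot(out_of_bounds, num_classes=4))
embedding_raises = raises(lambda: F.embedding(out_of_bounds, table))

print(clamped.tolist(), one_hot_raises, embedding_raises)
```

Under these assumptions, dropping the upper clamp keeps the failure mode for out-of-range positions loud in both the old and new code paths, rather than silently mapping them to the last row.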

zucchini-nlp · 2026-05-25T02:30:43Z

+        x_emb = F.embedding(clamped_positions[..., 0], self.position_embedding_table[0])
+        y_emb = F.embedding(clamped_positions[..., 1], self.position_embedding_table[1])


zucchini-nlp · 2026-05-25T02:31:19Z

+
+
+@require_torch
+class Gemma4VisionPatchEmbedderTest(unittest.TestCase):
+    """Unit tests for Gemma4VisionPatchEmbedder._position_embeddings."""
+
+    def _make_embedder(self, position_embedding_size=64, hidden_size=32):
+        from transformers import Gemma4VisionConfig


Please remove this; we don't need a test, imo. I will trigger slow CI to check that the model isn't broken.

Done. I removed the Gemma4VisionPatchEmbedderTest class entirely in the latest commit. Happy to let the slow CI integration tests cover this.

zucchini-nlp · 2026-05-25T02:31:32Z

run-slow: gemma4

github-actions · 2026-05-25T02:32:51Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/gemma4"]
quantizations: []

HuggingFaceDocBuilderDev · 2026-05-25T02:42:09Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2026-05-25T03:07:53Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	fbdd74d7	workflow commit (merge commit)
PR	af5be20a	branch commit (from PR)
main	10555512	base commit (on `main`)

Model CI Report

❌ 1 new failed tests from this PR 😭

gemma4:
tests/models/gemma4/test_modeling_gemma4.py::Gemma4IntegrationTest::test_export_text_only (❌ ⟹ ❌)

- Drop max= from pixel_position_ids.clamp(): only negative values (padding sentinels) need guarding; valid positions are in-range by construction, matching the original F.one_hot behavior.
- Remove Gemma4VisionPatchEmbedderTest per reviewer request; slow CI integration tests are sufficient to catch regressions.

zucchini-nlp · 2026-05-27T11:31:53Z

run-slow: gemma4

github-actions · 2026-05-27T11:32:34Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma4

github-actions · 2026-05-27T11:33:14Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/gemma4"]
quantizations: []

github-actions · 2026-05-27T12:12:06Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	b15cc21c	workflow commit (merge commit)
PR	8696bd9b	branch commit (from PR)
main	f39b5c8b	base commit (on `main`)

Model CI Report

❌ 1 new failed tests from this PR 😭

gemma4:
tests/models/gemma4/test_modeling_gemma4.py::Gemma4IntegrationTest::test_export_text_only (❌ ⟹ ❌)

zucchini-nlp

Merging, thanks for iterating!

…gs (huggingface#46176)

* [Gemma4] Replace one-hot matmul with F.embedding in position embeddings

Replace the one-hot encoding + matrix multiply pattern in Gemma4VisionPatchEmbedder._position_embeddings with two F.embedding lookups (one per spatial axis) summed together. This is mathematically equivalent but avoids materializing a ~19 GiB intermediate one-hot tensor (int64) and its bf16 cast copy during training with large batch sizes.

Fixes huggingface#46175

AI-assisted contribution (Claude Code).

* Address review feedback: remove upper-bound clamp and test class

- Drop max= from pixel_position_ids.clamp(): only negative values (padding sentinels) need guarding; valid positions are in-range by construction, matching the original F.one_hot behavior.
- Remove Gemma4VisionPatchEmbedderTest per reviewer request; slow CI integration tests are sufficient to catch regressions.

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

zucchini-nlp reviewed May 25, 2026

Sriniketh24 and others added 2 commits May 26, 2026 11:25

Merge branch 'main' into fix/gemma4-position-embeddings-memory

8696bd9

zucchini-nlp approved these changes May 28, 2026

zucchini-nlp enabled auto-merge May 28, 2026 11:37

zucchini-nlp added this pull request to the merge queue May 28, 2026

Merged via the queue into huggingface:main with commit bc8f70a May 28, 2026
23 of 24 checks passed
