SeedVR2 native RoPE: rewire NaMMRotaryEmbedding3d.forward to apply_rope1 directly (closes pollockjj/mydevelopment#224)#52
Merged
Conversation
Land 4 regression tests for the upcoming rewrite of NaMMRotaryEmbedding3d.forward to call apply_rope1 directly (matching the pattern used by flux/hidream/kandinsky5/lumina/qwen_image/wan/sam3) and NaMMRotaryEmbedding3d.get_freqs to emit freqs in flux-canonical shape [..., d/2, 2, 2] with cos/-sin/sin/cos baked in. Pre-rewrite baseline: - test_namm_forward_calls_apply_rope1_directly: RED — wrapper.call_count == 4 - test_get_freqs_emits_flux_canonical_shape: RED — current shape is 2D [L, dim] - test_namm_forward_output_tensor_equal_against_legacy_oracle: GREEN — trivially passes since forward IS the wrapper path pre-rewrite; this test becomes the post-rewrite equivalence gate. - test_namm_forward_ast_has_no_apply_rotary_emb_calls: RED — 4 calls at lines 536/537/543/544. Self-contained: no .pt fixture. Oracle is computed inside the test by reproducing the pre-rewrite get_freqs body and feeding the unchanged apply_rotary_emb wrapper (Shape B keeps the wrapper for RotaryEmbedding3d.forward and the lucidrains-RotaryEmbedding staticmethod registration).
…ope1 directly NaMMRotaryEmbedding3d.forward now calls comfy.ldm.flux.math.apply_rope1 directly via a thin _apply_rope1_partial helper that handles the partial- rope passthrough case (rope_dim=128 → per-axis 42 → total 126; head_dim=128; trailing 2 dims pass through unrotated, mirroring the legacy apply_rotary_emb wrapper's t_left/t_middle/t_right contract). NaMMRotaryEmbedding3d.get_freqs now emits freqs in flux-canonical layout [..., d/2, 2, 2] with cos/-sin/sin/cos baked in, via a new _to_flux_freqs_cis helper. The lucidrains-interleaved upstream layout [θ0, θ0, θ1, θ1, ...] is un-interleaved via the standard ::2 stride slice. Pattern matches the 7 other native ComfyUI native-DiT models (flux, hidream, kandinsky5, lumina, qwen_image, wan, sam3) that call apply_rope1 directly. Eliminates the wasteful freqs_mat construction + terminal torch.cat in the legacy apply_rotary_emb wrapper that OOM'd on VideoLQ_000 (1280x960x100). Wrapper retained for RotaryEmbedding3d.forward (non-multimodal video-only path) and the lucidrains RotaryEmbedding staticmethod registration. Tests added in test_seedvr_rope_rewrite.py: - test_namm_forward_calls_apply_rope1_directly - test_get_freqs_emits_flux_canonical_shape - test_namm_forward_output_tensor_equal_against_legacy_oracle - test_namm_forward_partial_rope_passthrough_matches_wrapper_oracle (dim=128) - test_namm_forward_ast_has_no_apply_rotary_emb_calls All five GREEN; pre-rewrite baseline had 3 RED (call-graph, get_freqs shape, AST) and 1 GREEN (oracle, trivially equal pre-rewrite). Existing test_seedvr_rope_delegation.py (8 parametrized cases of the wrapper itself) remains GREEN — wrapper unchanged. Net diff: comfy/ldm/seedvr/model.py +27/-4. Single file, no new dep. Tracking: pollockjj/mydevelopment#224
There was a problem hiding this comment.
Pull request overview
Reworks SeedVR2’s multimodal 3D RoPE path (NaMMRotaryEmbedding3d) to bypass the legacy apply_rotary_emb wrapper and call Flux’s apply_rope1 directly using flux-canonical freqs ([..., d/2, 2, 2]), reducing unnecessary intermediate allocations that contributed to OOMs.
Changes:
- Add helpers to convert lucidrains-interleaved freqs to flux-canonical
freqs_cis, and to apply RoPE only to the leading rotated sub-dimension (preserving the “partial-rope” tail passthrough contract). - Rewire
NaMMRotaryEmbedding3d.forwardto use_apply_rope1_partial(...)(4 call sites) and updateget_freqsto emit flux-canonical freqs. - Add a new unit test suite to lock in call-graph, shape/layout, numerical equivalence vs the legacy wrapper, partial-rope behavior, and AST-level enforcement.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
comfy/ldm/seedvr/model.py |
Adds flux-canonical freqs conversion + partial RoPE helper; rewires NaMMRotaryEmbedding3d to use apply_rope1 directly. |
tests-unit/comfy_test/test_seedvr_rope_rewrite.py |
New regression tests for the RoPE rewrite (behavioral, structural, and partial-rope coverage). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from pathlib import Path | ||
| from unittest.mock import patch | ||
|
|
||
| import pytest |
| msg="txt_k partial-rope output diverges from wrapper oracle") | ||
|
|
||
|
|
||
| # Test 4 — drives AC-4 statically: AST walk over NaMMRotaryEmbedding3d.forward |
Ruff F401 — `pytest` was imported but never referenced (no decorators, no fixtures, no parametrize). All 5 tests are plain `def test_*` functions that pytest discovers via filename + function-name conventions; they don't need the `pytest` symbol at module scope. Verified locally: `ruff check` clean; all 5 tests still GREEN.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Rewires
NaMMRotaryEmbedding3d.forward(the multimodal RoPE used by SeedVR2 inference) to callcomfy.ldm.flux.math.apply_rope1directly via a thin_apply_rope1_partialhelper, withNaMMRotaryEmbedding3d.get_freqsemitting freqs in flux-canonical shape[..., d/2, 2, 2](cos/-sin/sin/cos baked in). Matches the pattern used by 7 other native ComfyUI native-DiT models (flux, hidream, kandinsky5, lumina, qwen_image, wan, sam3) and eliminates the wastefulfreqs_matconstruction + terminaltorch.catin the legacyapply_rotary_embwrapper that OOM'd on the largest cell of the SeedVR2 native_3b non-tiled corpus (VideoLQ_000 1280×960×100 on RTX 5090 32 GB).Tracking issue: pollockjj/mydevelopment#224.
Diagnosis
Pre-rewrite OOM site
NaMMRotaryEmbedding3d.forwardcalls Yusef'sapply_rotary_embwrapper atcomfy/ldm/seedvr/model.py:466-504. The wrapper builds a 2×2 rotation matrix via threetorch.stacks and ends intorch.cat((t_left, t_middle_out, t_right), dim=-1). For VideoLQ_000 (320×240 LR → 1280×960×100 target):```
torch.OutOfMemoryError: Allocation on device 0 would exceed allowed memory.
Currently allocated : 26.98 GiB
Requested : 2.38 GiB
Device limit : 31.36 GiB
Free (according to CUDA): 19.69 MiB
File "comfy/ldm/seedvr/model.py", line 503, in apply_rotary_emb
out = torch.cat((t_left, t_middle_out, t_right), dim=-1)
```
Canonical (vendor
seedvr_repo/models/dit_v2/rope.py) and numz (ComfyUI-SeedVR2_VideoUpscaler/src/models/dit_3b/rope.py) both clear the same cell with the same fp16 weights. Both callfrom rotary_embedding_torch import apply_rotary_embdirectly — the upstream lucidrains library implementation. Native is the only leg that hand-rolls the wrapper.Prior signal
Kosinkadink raised this exact concern as item #2 of his upstream PR review (Comfy-Org#11294):
Yusef's response confirmed the matrix-form refactor that this PR straightens out:
The wrapper's
apply_rope1call was preserved; the wasteful matrix construction around it is what this PR removes.Implementation
Single file changed:
comfy/ldm/seedvr/model.py(one new helper + one rewrite + 4 forward sites rewired)._to_flux_freqs_cis(freqs_interleaved)(new helper)Converts lucidrains-interleaved freqs
[..., d]([θ0, θ0, θ1, θ1, ...]fromRotaryEmbedding.forward'srepeat(freqs, '... n -> ... (n r)', r=2)) into flux-canonicalfreqs_cisof shape[..., d/2, 2, 2]with cos/-sin/sin/cos baked. Mirrorscomfy/ldm/flux/math.py:rope(line 27)._apply_rope1_partial(t, freqs_cis)(new helper)Applies
apply_rope1to the leadingrot_d = 2 * freqs_cis.shape[-3]components oft's last dim, passing through the remaining dims untouched. Mirrors the partial-rope contract of the legacy wrapper'st_left/t_middle/t_rightsplit. Critical for SeedVR2-3B:rope_dim=128integer-divides into 3 axes as128 // 3 = 42per-axis, total42 * 3 = 126;head_dimis 128, so the trailing 2 head dims are unrotated. The fast path triggers whenrot_d == t.shape[-1]and avoids the cat entirely.NaMMRotaryEmbedding3d.get_freqsrewriteComputes interleaved freqs via the existing parent-class
get_axial_freqscalls (unchanged), then post-processes through_to_flux_freqs_cisso the returned tensors have the flux-canonical shapeapply_rope1expects.NaMMRotaryEmbedding3d.forwardrewireThe 4 call sites (vid_q, vid_k, txt_q, txt_k) at the previous lines 536-544 now call
_apply_rope1_partial(t, freqs_cis)directly. The legacyapply_rotary_embwrapper is no longer on the active SeedVR2 inference path.Out of scope
RotaryEmbedding3d.forward(lines 421-438; the non-multimodal video-only path) still uses the wrapper. No OOM observed there. Defer to a follow-up if benchmarking shows benefit.MMRotaryEmbeddingBase.__init__staticmethod registration (line 323). Stays.apply_rotary_embwrapper,slice_at_dim,rotate_half,existshelpers all remain — still used by the non-rewired paths.Tests
New file:
tests-unit/comfy_test/test_seedvr_rope_rewrite.py(5 tests, all GREEN post-rewrite; 3 RED on baseline before the implementation, 1 GREEN trivially as oracle gate, +1 partial-rope test added during implementation):test_namm_forward_calls_apply_rope1_directly— call-graph spy:apply_rotary_emb.call_count == 0,apply_rope1.call_count == 4.test_get_freqs_emits_flux_canonical_shape— shape[..., d/2, 2, 2]with cos/-sin/sin/cos pattern.test_namm_forward_output_tensor_equal_against_legacy_oracle— fp32 CPU oracle from the unchanged wrapper fed with legacy-shape freqs;torch.testing.assert_close(rtol=0, atol=0).test_namm_forward_partial_rope_passthrough_matches_wrapper_oracle— dim=128, rot_d=126 < head_dim=128; exercises the partial-rope path explicitly with the sametorch.equalgate. Catches the partial-rope bug that the dim=192 oracle (test 3) cannot.test_namm_forward_ast_has_no_apply_rotary_emb_calls— AST walk over forward, asserts zero references toapply_rotary_emb.Existing
test_seedvr_rope_delegation.py(8 parametrized cases of the wrapper itself) remains GREEN — wrapper unchanged.Net diff
comfy/ldm/seedvr/model.py: +27 / -4 (one new helper, one wrapper, 4 forward sites rewired)tests-unit/comfy_test/test_seedvr_rope_rewrite.py: +297 / -0 (new file, 5 tests)Single file in the production path. No new dependency. Acquires the
comfy.quant_ops.ck.apply_rope1quantized fast path for free (same as the other 7 native models).Test plan
test_seedvr_rope_delegation.py(existing wrapper tests) remain GREEN.apply_rotary_emb'storch.cat(26.98 GiB allocated, requested 2.38 GiB) reproduced on RTX 5090 32 GB cold start.concat_win(comfy/ldm/seedvr/model.py:148, called frommodel.py:852v=concat_win(vid_v, txt_v)) — that is a separate native-vs-upstream divergence (upstream casts q/k/v to bfloat16 before the cat; native does not) tracked outside this PR.pollockjj/mydevelopment:main:github_issues/220/05_outputs/*_native_3b.{mp4,log}post-rewrite, matching the pre-rewrite passing-cell count. The 22nd (VideoLQ_000) hits the separate concat_win wall.Linked