Conversation
Issue 130: Clean SeedVR2 native runtime fixes
There was a problem hiding this comment.
Pull request overview
Adds a new unit regression test to ensure comfy.ldm.seedvr.model.apply_rotary_emb continues to implement RoPE by constructing the 2x2 rotation matrix and delegating the actual rotation to comfy.ldm.flux.math.apply_rope1.
Changes:
- Adds a parametrized pytest covering several CPU dtypes and a CUDA-guarded case.
- Recomputes
freqs_matinline and compares the wrapper output to a directapply_rope1call.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @pytest.fixture(scope="session", autouse=True) | ||
| def _print_apply_rotary_emb_params(pytestconfig): | ||
| names = list(inspect.signature(apply_rotary_emb).parameters) | ||
| line = f"params: {json.dumps(names)}" | ||
| tr = pytestconfig.pluginmanager.get_plugin("terminalreporter") | ||
| if tr is not None: | ||
| tr.write_line(line) | ||
| else: | ||
| print(line) | ||
| yield |
There was a problem hiding this comment.
The autouse session fixture prints the apply_rotary_emb parameter list to the test output, which adds noise to CI logs and doesn't assert anything. Consider removing this fixture, or turning it into an assertion (if you want to lock the signature), or gating the printing behind an explicit debug flag/CLI option.
There was a problem hiding this comment.
ACCEPT (already applied) — addressed in commit c51764a1 (Round 0, before this round-1 review request): the autouse session fixture and params: [...] print line were removed entirely. The current head SHA 5be4c76d (round 1 fix) does not contain any autouse=True, pytestconfig, or tr.write_line references.
| tol = _TOL[dtype] | ||
| assert torch.allclose(wrapper_out, direct_out, atol=tol) |
There was a problem hiding this comment.
The PR description calls out “byte-equivalence”, but the test currently uses torch.allclose(...) with dtype-dependent atol and the default rtol. If you really want to lock in exact delegation behavior, consider asserting exact equality (torch.equal) or using torch.testing.assert_close with rtol=0, atol=0 (or explicitly set rtol=0 if keeping atol), so small drifts don’t accidentally pass.
There was a problem hiding this comment.
ACCEPT (already applied) — addressed in commit c51764a1 (Round 0): switched from torch.allclose(..., atol=tol) with per-dtype atol literals to torch.testing.assert_close(rtol=0, atol=0). Round 1 commit 5be4c76d keeps that and rewords "byte-exact" → "exact tensor equality" per Copilot comment 3150100175.
| pytest.param("cpu", torch.float32, (1, 8, 16), id="cpu-float32-1x8x16"), | ||
| pytest.param("cpu", torch.float16, (1, 8, 16), id="cpu-float16-1x8x16"), | ||
| pytest.param("cpu", torch.bfloat16, (1, 8, 16), id="cpu-bfloat16-1x8x16"), | ||
| pytest.param("cpu", torch.float32, (2, 16, 32), id="cpu-float32-2x16x32"), | ||
| pytest.param( | ||
| "cuda", | ||
| torch.float16, | ||
| (1, 8, 16), | ||
| id="cuda-float16-1x8x16", | ||
| marks=pytest.mark.skipif(not torch.cuda.is_available(), reason="no cuda"), | ||
| ), | ||
| ] | ||
|
|
||
|
|
||
| @pytest.mark.parametrize("device,dtype,shape", _CASES) | ||
| def test_apply_rotary_emb_delegates_to_apply_rope1(device, dtype, shape): | ||
| torch.manual_seed(0) | ||
| t = torch.randn(*shape, dtype=dtype, device=device) | ||
| freqs = torch.randn(shape[-2], shape[-1], dtype=dtype, device=device) | ||
|
|
||
| wrapper_out = apply_rotary_emb(freqs, t) | ||
|
|
||
| rot_feats = freqs.shape[-1] | ||
| t_middle = t[..., 0:rot_feats] | ||
| angles = freqs.to(t_middle.device)[..., ::2] | ||
| cos = torch.cos(angles) * 1.0 | ||
| sin = torch.sin(angles) * 1.0 | ||
| col0 = torch.stack([cos, sin], dim=-1) | ||
| col1 = torch.stack([-sin, cos], dim=-1) | ||
| freqs_mat = torch.stack([col0, col1], dim=-1) | ||
| direct_out = apply_rope1(t_middle, freqs_mat) |
There was a problem hiding this comment.
This test only covers the default apply_rotary_emb call path where rot_feats == t.shape[-1] (so there is no left/right concatenation) and scale/start_index remain at defaults. If the goal is to lock in the wrapper logic around slicing/concatenation and scaling as well as apply_rope1 delegation, add at least one case where t has extra trailing features (t_right non-empty) and/or pass a non-default start_index / scale, and compare against a corresponding direct reproduction.
| pytest.param("cpu", torch.float32, (1, 8, 16), id="cpu-float32-1x8x16"), | |
| pytest.param("cpu", torch.float16, (1, 8, 16), id="cpu-float16-1x8x16"), | |
| pytest.param("cpu", torch.bfloat16, (1, 8, 16), id="cpu-bfloat16-1x8x16"), | |
| pytest.param("cpu", torch.float32, (2, 16, 32), id="cpu-float32-2x16x32"), | |
| pytest.param( | |
| "cuda", | |
| torch.float16, | |
| (1, 8, 16), | |
| id="cuda-float16-1x8x16", | |
| marks=pytest.mark.skipif(not torch.cuda.is_available(), reason="no cuda"), | |
| ), | |
| ] | |
| @pytest.mark.parametrize("device,dtype,shape", _CASES) | |
| def test_apply_rotary_emb_delegates_to_apply_rope1(device, dtype, shape): | |
| torch.manual_seed(0) | |
| t = torch.randn(*shape, dtype=dtype, device=device) | |
| freqs = torch.randn(shape[-2], shape[-1], dtype=dtype, device=device) | |
| wrapper_out = apply_rotary_emb(freqs, t) | |
| rot_feats = freqs.shape[-1] | |
| t_middle = t[..., 0:rot_feats] | |
| angles = freqs.to(t_middle.device)[..., ::2] | |
| cos = torch.cos(angles) * 1.0 | |
| sin = torch.sin(angles) * 1.0 | |
| col0 = torch.stack([cos, sin], dim=-1) | |
| col1 = torch.stack([-sin, cos], dim=-1) | |
| freqs_mat = torch.stack([col0, col1], dim=-1) | |
| direct_out = apply_rope1(t_middle, freqs_mat) | |
| pytest.param( | |
| "cpu", | |
| torch.float32, | |
| (1, 8, 16), | |
| 16, | |
| 0, | |
| 1.0, | |
| id="cpu-float32-1x8x16-default", | |
| ), | |
| pytest.param( | |
| "cpu", | |
| torch.float16, | |
| (1, 8, 16), | |
| 16, | |
| 0, | |
| 1.0, | |
| id="cpu-float16-1x8x16-default", | |
| ), | |
| pytest.param( | |
| "cpu", | |
| torch.bfloat16, | |
| (1, 8, 16), | |
| 16, | |
| 0, | |
| 1.0, | |
| id="cpu-bfloat16-1x8x16-default", | |
| ), | |
| pytest.param( | |
| "cpu", | |
| torch.float32, | |
| (2, 16, 32), | |
| 16, | |
| 4, | |
| 0.5, | |
| id="cpu-float32-2x16x32-partial-start4-scale0p5", | |
| ), | |
| pytest.param( | |
| "cuda", | |
| torch.float16, | |
| (1, 8, 16), | |
| 16, | |
| 0, | |
| 1.0, | |
| id="cuda-float16-1x8x16-default", | |
| marks=pytest.mark.skipif(not torch.cuda.is_available(), reason="no cuda"), | |
| ), | |
| ] | |
| @pytest.mark.parametrize( | |
| "device,dtype,shape,rotary_dim,start_index,scale", | |
| _CASES, | |
| ) | |
| def test_apply_rotary_emb_delegates_to_apply_rope1( | |
| device, dtype, shape, rotary_dim, start_index, scale | |
| ): | |
| torch.manual_seed(0) | |
| t = torch.randn(*shape, dtype=dtype, device=device) | |
| freqs = torch.randn(shape[-2], rotary_dim, dtype=dtype, device=device) | |
| wrapper_out = apply_rotary_emb(freqs, t, start_index=start_index, scale=scale) | |
| rot_feats = freqs.shape[-1] | |
| end_index = start_index + rot_feats | |
| t_left = t[..., :start_index] | |
| t_middle = t[..., start_index:end_index] | |
| t_right = t[..., end_index:] | |
| angles = freqs.to(t_middle.device)[..., ::2] | |
| cos = torch.cos(angles) * scale | |
| sin = torch.sin(angles) * scale | |
| col0 = torch.stack([cos, sin], dim=-1) | |
| col1 = torch.stack([-sin, cos], dim=-1) | |
| freqs_mat = torch.stack([col0, col1], dim=-1) | |
| direct_middle = apply_rope1(t_middle, freqs_mat) | |
| direct_out = torch.cat((t_left, direct_middle, t_right), dim=-1) |
There was a problem hiding this comment.
MODIFY (already applied) — addressed in commit c51764a1 (Round 0): non-default start_index=4 and scale=0.5 cases were added (cpu-float32-non-empty-left-and-right-slices and cpu-float32-non-default-scale). Round 1 commit 5be4c76d adds a further freqs-longer-than-seq case per Copilot comment 3150100225. The parametrize signature kept freqs_shape as a tuple rather than the suggested rotary_dim int — equivalent coverage with explicit shape control.
| torch.manual_seed(0) | ||
| t = torch.randn(*shape, dtype=dtype, device=device) | ||
| freqs = torch.randn(shape[-2], shape[-1], dtype=dtype, device=device) |
There was a problem hiding this comment.
torch.manual_seed(0) mutates the global RNG state for the entire test process, which can make unrelated tests order-dependent if they also use torch randomness. Prefer using a local torch.Generator (passed to torch.randn) or wrapping in torch.random.fork_rng() so this test is deterministic without changing global state.
| torch.manual_seed(0) | |
| t = torch.randn(*shape, dtype=dtype, device=device) | |
| freqs = torch.randn(shape[-2], shape[-1], dtype=dtype, device=device) | |
| generator = torch.Generator(device=device).manual_seed(0) | |
| t = torch.randn(*shape, dtype=dtype, device=device, generator=generator) | |
| freqs = torch.randn(shape[-2], shape[-1], dtype=dtype, device=device, generator=generator) |
There was a problem hiding this comment.
ACCEPT (already applied) — addressed in commit c51764a1 (Round 0): switched from torch.manual_seed(0) to a local torch.Generator(device=device).manual_seed(0) passed to torch.randn via the generator= kwarg. No global RNG mutation.
…rtol=0/atol=0, local Generator, non-default start_index/scale coverage, drop autouse params print
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| torch.testing.assert_close( | ||
| wrapper_out, | ||
| direct_out, | ||
| rtol=0, | ||
| atol=0, | ||
| msg=lambda m: ( | ||
| f"apply_rotary_emb does not byte-match direct apply_rope1 reproduction " | ||
| f"(device={device}, dtype={dtype}, t_shape={t_shape}, " | ||
| f"freqs_shape={freqs_shape}, start_index={start_index}, scale={scale}): {m}" | ||
| ), |
There was a problem hiding this comment.
The test and messages repeatedly say “byte-exact/byte-match”, but torch.testing.assert_close(rtol=0, atol=0) checks numerical equality and can still treat some bitwise differences as equal (e.g., +0.0 vs -0.0, and NaN payload/sign differences). If true byte-level stability is required, compare bit patterns explicitly; otherwise, consider rewording the docstring/message to “exact equality” or “zero-tolerance closeness” to avoid overstating what’s asserted.
There was a problem hiding this comment.
MODIFY — applied in commit 5be4c76d. Reworded "byte-exact"/"byte-match" to "exact tensor equality" in the module docstring, the assertion message, and the _direct_reproduction docstring. assert_close(rtol=0, atol=0) is exact numerical equality (+0.0 vs -0.0 and NaN payloads can still match), not bit-equality — agreed; the wording now reflects that.
| torch.testing.assert_close( | ||
| wrapper_out, | ||
| direct_out, | ||
| rtol=0, | ||
| atol=0, | ||
| msg=lambda m: ( |
There was a problem hiding this comment.
PR description mentions per-dtype atol literals (e.g., 1e-6/1e-3/1e-2), but the test currently uses rtol=0, atol=0 for all dtypes. Either update the description to match the implementation or adjust the test to the intended per-dtype tolerances (keeping in mind the stated goal of catching drift).
| """Regression test: comfy.ldm.seedvr.model.apply_rotary_emb must delegate to | ||
| comfy.ldm.flux.math.apply_rope1 with byte-exact equality across the wrapper's | ||
| slicing, scaling, and concatenation logic. Drift between the wrapper and the | ||
| delegate would silently corrupt SeedVR2's RoPE; this test fails loudly on any | ||
| future drift. |
There was a problem hiding this comment.
The test/docstring claims to enforce that apply_rotary_emb delegates to apply_rope1, but the assertion only checks output equivalence. If apply_rotary_emb were rewritten to compute the same result without calling apply_rope1, this test would still pass. To actually lock in the delegation invariant, consider monkeypatching/spying comfy.ldm.seedvr.model.apply_rope1 (the symbol used by the wrapper) and asserting it is invoked (and/or invoked with the expected freqs_mat shape) during the call.
There was a problem hiding this comment.
ACCEPT — applied in commit 5be4c76d. The test now wraps each apply_rotary_emb call inside unittest.mock.patch.object(seedvr_model, "apply_rope1", wraps=seedvr_model.apply_rope1) as spy: and asserts spy.call_count >= 1 per case. A future change that inlines the math and stops calling the imported apply_rope1 will leave call_count == 0 and fail the test. Same fix addresses codex P2 finding.
| """Byte-for-byte reproduction of comfy/ldm/seedvr/model.py:471-505 | ||
| apply_rotary_emb body, calling apply_rope1 directly on the middle slice. | ||
| """ | ||
| rot_feats = freqs.shape[-1] | ||
| end_index = start_index + rot_feats | ||
| t_left = t[..., :start_index] | ||
| t_middle = t[..., start_index:end_index] | ||
| t_right = t[..., end_index:] | ||
| angles = freqs.to(t_middle.device)[..., ::2] | ||
| cos = torch.cos(angles) * scale |
There was a problem hiding this comment.
_direct_reproduction is documented as a byte-for-byte reproduction of apply_rotary_emb (model.py:471-505), but it omits the earlier freqs_seq_dim inference and slice_at_dim(..., slice(-seq_len, None), ...) logic. Either include those steps in the reproduction and add a case where freqs is longer than seq_len (so that slicing actually changes the input), or adjust the docstring to match what’s reproduced today.
There was a problem hiding this comment.
MODIFY — applied in commit 5be4c76d. Extended _direct_reproduction to mirror the wrapper's implicit freqs_seq_dim=0 slicing path: when freqs.ndim == 2 and t.ndim == 3 and freqs.shape[0] > t.shape[seq_dim], the reproduction now slices freqs = freqs[-seq_len:] before computing freqs_mat. Coupled with the new cpu-float32-freqs-longer-than-seq parametrize case (per Copilot 3150100225), the slice_at_dim path is now covered. Updated the _direct_reproduction docstring to describe what is and isn't reproduced.
| pytest.param("cpu", torch.float32, (2, 16, 32), (16, 32), 0, 1.0, | ||
| id="cpu-float32-larger"), | ||
| pytest.param("cpu", torch.float32, (1, 8, 24), (8, 16), 4, 1.0, | ||
| id="cpu-float32-non-empty-left-and-right-slices"), |
There was a problem hiding this comment.
The current parameter set never exercises the freqs = slice_at_dim(freqs, slice(-seq_len, None), ...) behavior in apply_rotary_emb, because every CPU/CUDA case uses freqs_shape[0] == seq_len. Add at least one case where freqs has a longer sequence dimension than t.shape[seq_dim] so the wrapper’s freqs-slicing path is actually covered by this regression test.
| id="cpu-float32-non-empty-left-and-right-slices"), | |
| id="cpu-float32-non-empty-left-and-right-slices"), | |
| pytest.param("cpu", torch.float32, (1, 8, 16), (12, 16), 0, 1.0, | |
| id="cpu-float32-freqs-longer-than-seq"), |
There was a problem hiding this comment.
ACCEPT — applied in commit 5be4c76d. Added pytest.param("cpu", torch.float32, (1, 8, 16), (12, 16), 0, 1.0, id="cpu-float32-freqs-longer-than-seq") so freqs.shape[0] = 12 > t.shape[seq_dim] = 8 actually exercises the wrapper's slice_at_dim(freqs, slice(-seq_len, None), dim=0) path. _direct_reproduction was extended to mirror this slicing (per Copilot 3150100217). All 8 parametrized cases pass locally.
| torch.testing.assert_close( | ||
| wrapper_out, | ||
| direct_out, | ||
| rtol=0, | ||
| atol=0, | ||
| msg=lambda m: ( | ||
| f"apply_rotary_emb does not byte-match direct apply_rope1 reproduction " | ||
| f"(device={device}, dtype={dtype}, t_shape={t_shape}, " | ||
| f"freqs_shape={freqs_shape}, start_index={start_index}, scale={scale}): {m}" | ||
| ), |
There was a problem hiding this comment.
torch.testing.assert_close’s msg parameter is documented as a string; passing a lambda here risks producing an unhelpful message (e.g., the function repr) or even raising a TypeError if a failure occurs. Prefer building the formatted message string eagerly (or using pytest’s assertion introspection) so failures report cleanly.
| torch.testing.assert_close( | |
| wrapper_out, | |
| direct_out, | |
| rtol=0, | |
| atol=0, | |
| msg=lambda m: ( | |
| f"apply_rotary_emb does not byte-match direct apply_rope1 reproduction " | |
| f"(device={device}, dtype={dtype}, t_shape={t_shape}, " | |
| f"freqs_shape={freqs_shape}, start_index={start_index}, scale={scale}): {m}" | |
| ), | |
| msg = ( | |
| f"apply_rotary_emb does not byte-match direct apply_rope1 reproduction " | |
| f"(device={device}, dtype={dtype}, t_shape={t_shape}, " | |
| f"freqs_shape={freqs_shape}, start_index={start_index}, scale={scale})" | |
| ) | |
| torch.testing.assert_close( | |
| wrapper_out, | |
| direct_out, | |
| rtol=0, | |
| atol=0, | |
| msg=msg, |
There was a problem hiding this comment.
ACCEPT — applied in commit 5be4c76d. Replaced the msg=lambda m: ... callable with an eagerly-built string assigned to a local msg variable and passed to torch.testing.assert_close(..., msg=msg). Failure messages will now be plain strings, not callable reprs.
Codex Review -- Round 1 -- KICKOFFRunner: Prompt sent to codex reviewAwaiting codex output. The result will be posted as a separate comment on this PR when codex exits. |
Codex Review -- Round 1 -- RESULTPR: #21 Prompt sent to codex reviewVerbatim codex review outputThis PR was reviewed using The added regression test does not actually prove the delegation invariant that the PR is intended to lock, so a core target of the test can regress while CI remains green. Review comment:
Codex runner stderr (non-evidence) |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # (device, dtype, t_shape, freqs_shape, start_index, scale) | ||
| _CASES = [ | ||
| pytest.param("cpu", torch.float32, (1, 8, 16), (8, 16), 0, 1.0, | ||
| id="cpu-float32-base"), | ||
| pytest.param("cpu", torch.float16, (1, 8, 16), (8, 16), 0, 1.0, | ||
| id="cpu-float16-base"), | ||
| pytest.param("cpu", torch.bfloat16, (1, 8, 16), (8, 16), 0, 1.0, | ||
| id="cpu-bfloat16-base"), | ||
| pytest.param("cpu", torch.float32, (2, 16, 32), (16, 32), 0, 1.0, | ||
| id="cpu-float32-larger"), | ||
| pytest.param("cpu", torch.float32, (1, 8, 24), (8, 16), 4, 1.0, | ||
| id="cpu-float32-non-empty-left-and-right-slices"), | ||
| pytest.param("cpu", torch.float32, (1, 8, 16), (8, 16), 0, 0.5, | ||
| id="cpu-float32-non-default-scale"), |
There was a problem hiding this comment.
All current parametrized cases make freqs_shape[0] == t_shape[1], so the wrapper’s freqs = slice_at_dim(freqs, slice(-seq_len, None), ...) path is effectively a no-op. Consider adding at least one case where freqs has a longer sequence dimension than t (and update the direct reproduction accordingly) so the test will catch drift in the wrapper’s freqs slicing behavior as well as the apply_rope1 delegation.
There was a problem hiding this comment.
ACCEPT (already applied) — addressed in commit 5be4c76d (round 1 fix): added pytest.param("cpu", torch.float32, (1, 8, 16), (12, 16), 0, 1.0, id="cpu-float32-freqs-longer-than-seq") so freqs.shape[0] = 12 > t.shape[seq_dim] = 8 exercises the wrapper's slice_at_dim(freqs, slice(-seq_len, None), dim=0) path. _direct_reproduction was extended in lock-step to mirror that slicing. Same fix as Copilot comment 3150100225.
…5/3150100250 — apply_rope1 spy, freqs-longer-than-seq case, exact-equality wording, eager msg string
Codex Review -- Round 2 -- KICKOFFRunner: Prompt sent to codex reviewAwaiting codex output. The result will be posted as a separate comment on this PR when codex exits. |
Codex Review -- Round 2 -- KICKOFFRunner: Prompt sent to codex reviewAwaiting codex output. The result will be posted as a separate comment on this PR when codex exits. |
Codex Review -- Round 2 -- RESULTPR: #21 Prompt sent to codex reviewVerbatim codex review outputThis PR was reviewed using APPROVE Codex runner stderr (non-evidence) |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| import pytest | ||
| import torch | ||
| import torch.testing |
There was a problem hiding this comment.
import torch.testing is unused in this file (the code uses torch.testing.assert_close via the torch module). Removing the unused import keeps the test module clean and avoids lint noise if/when linting is enabled for tests.
| import torch.testing |
There was a problem hiding this comment.
ACCEPT — applied in commit ef0d3166. Removed the redundant import torch.testing line; import torch is sufficient on this torch version (torch.testing.assert_close resolves via torch's implicit attribute exposure of the testing submodule). Verified locally: import torch; torch.testing.assert_close(...) resolves cleanly without the explicit submodule import, and pytest still passes 8/8.
| math and stops calling ``apply_rope1`` fails the test (Copilot review on | ||
| PR #21 comment 3150100205; codex P2). | ||
| 2. Compares the wrapper's output against a hand-rolled reproduction using | ||
| ``torch.testing.assert_close(rtol=0, atol=0)`` -- exact tensor equality, | ||
| not bit-equality (``+0.0`` vs ``-0.0`` and NaN payloads can still match); | ||
| the assertion catches any future kernel-precision drift in the | ||
| ``apply_rope1`` dispatch (Copilot review on PR #21 comments 3149914528 and | ||
| 3150100175). | ||
|
|
||
| The test uses a local ``torch.Generator`` so global RNG state is not mutated | ||
| (Copilot review on PR #21 comment 3149914599). Parametrization covers | ||
| non-default ``start_index`` and ``scale`` and a case where | ||
| ``freqs.shape[0] > t.shape[seq_dim]`` so the wrapper's | ||
| ``slice_at_dim(freqs, slice(-seq_len, None), dim=0)`` path is exercised | ||
| (Copilot review on PR #21 comments 3149914553, 3150100217, 3150100225). |
There was a problem hiding this comment.
The module docstring contains references to PR numbers and specific Copilot comment IDs (e.g., “PR #21 comment …”). These references are brittle (they won’t be meaningful outside that PR context) and can become stale over time. Consider replacing them with a stable issue link (e.g., Issue Comfy-Org#120) or removing them so the docstring focuses on the test’s invariants and rationale.
| math and stops calling ``apply_rope1`` fails the test (Copilot review on | |
| PR #21 comment 3150100205; codex P2). | |
| 2. Compares the wrapper's output against a hand-rolled reproduction using | |
| ``torch.testing.assert_close(rtol=0, atol=0)`` -- exact tensor equality, | |
| not bit-equality (``+0.0`` vs ``-0.0`` and NaN payloads can still match); | |
| the assertion catches any future kernel-precision drift in the | |
| ``apply_rope1`` dispatch (Copilot review on PR #21 comments 3149914528 and | |
| 3150100175). | |
| The test uses a local ``torch.Generator`` so global RNG state is not mutated | |
| (Copilot review on PR #21 comment 3149914599). Parametrization covers | |
| non-default ``start_index`` and ``scale`` and a case where | |
| ``freqs.shape[0] > t.shape[seq_dim]`` so the wrapper's | |
| ``slice_at_dim(freqs, slice(-seq_len, None), dim=0)`` path is exercised | |
| (Copilot review on PR #21 comments 3149914553, 3150100217, 3150100225). | |
| math and stops calling ``apply_rope1`` fails the test. | |
| 2. Compares the wrapper's output against a hand-rolled reproduction using | |
| ``torch.testing.assert_close(rtol=0, atol=0)`` -- exact tensor equality, | |
| not bit-equality (``+0.0`` vs ``-0.0`` and NaN payloads can still match); | |
| the assertion catches any future kernel-precision drift in the | |
| ``apply_rope1`` dispatch. | |
| The test uses a local ``torch.Generator`` so global RNG state is not mutated. | |
| Parametrization covers non-default ``start_index`` and ``scale`` and a case | |
| where ``freqs.shape[0] > t.shape[seq_dim]`` so the wrapper's | |
| ``slice_at_dim(freqs, slice(-seq_len, None), dim=0)`` path is exercised. |
There was a problem hiding this comment.
MODIFY — applied in commit ef0d3166. Replaced the inline per-comment-id and per-finding-number references in the module docstring with a single stable anchor: "Test design rationale and per-decision review trail are recorded on the tracking issue: https://github.com/pollockjj/mydevelopment/issues/120". The two assertion bullets now describe the invariants directly (delegation spy + exact-equality output) without referencing PR #21 comment IDs. Per-comment review history remains in commit messages and inline replies for archaeology.
…testing', replace per-comment-id docstring refs with stable issue link
Codex Review -- Round 3 -- KICKOFFRunner: Prompt sent to codex reviewAwaiting codex output. The result will be posted as a separate comment on this PR when codex exits. |
Codex Review -- Round 3 -- RESULTPR: #21 Note: This result comment is being posted manually because the original Prompt sent to codex reviewVerbatim codex review outputThis PR was reviewed using The SeedVR2 tiled preprocessing path uses the generic VAE tiler with an incompatible 5D layout, and the VAE decode path depends on state that is not initialized outside the custom preprocessing node. These are merge-blocking functional issues for supported user workflows. Full review comments:
Codex runner stderrThe full unabridged codex stderr (749,015 bytes) is preserved at |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| torch._dynamo``), so running the test against the real modules is the | ||
| fail-loud-from-real-state approach this repo's tests follow. |
There was a problem hiding this comment.
The module docstring states that “running the test against the real modules is the fail-loud-from-real-state approach this repo's tests follow”, but other unit tests explicitly stub/mask heavy imports to avoid side effects (e.g. tests-unit/comfy_extras_test/image_stitch_test.py uses patch.dict(sys.modules, ...) to prevent CUDA init). Consider rewording this to describe the choice for this test/module specifically (or dropping the repo-wide generalization) to avoid misleading future readers.
| torch._dynamo``), so running the test against the real modules is the | |
| fail-loud-from-real-state approach this repo's tests follow. | |
| torch._dynamo``), so this test intentionally runs against the real modules | |
| to fail loudly if that import path or runtime state drifts. |
There was a problem hiding this comment.
ACCEPT — applied in commit acfeb68f. Narrowed the docstring claim from a repo-wide assertion ("the fail-loud-from-real-state approach this repo's tests follow") to a test-local choice, and explicitly cited tests-unit/comfy_extras_test/image_stitch_test.py as a counter-example of stub usage so future readers see both patterns coexist. Did not adopt the verbatim suggestion (which trimmed the rationale paragraphs); instead kept the import-chain explanation and reworded only the overgeneralization. pytest 8/8 still passes, Ruff clean.
| pytest.param("cpu", torch.float16, (1, 8, 16), (8, 16), 0, 1.0, | ||
| id="cpu-float16-base"), | ||
| pytest.param("cpu", torch.bfloat16, (1, 8, 16), (8, 16), 0, 1.0, | ||
| id="cpu-bfloat16-base"), |
There was a problem hiding this comment.
The parametrization includes CPU float16 and bfloat16 cases, but the in-repo SeedVR2 call sites cast inputs to float32 before calling apply_rotary_emb (e.g. comfy/ldm/seedvr/model.py:435-436 and 537-545 use q.float()/k.float()). If apply_rotary_emb is only exercised with float32 on CPU in practice, these extra CPU dtypes add a stronger (and potentially unnecessary) compatibility constraint to the test. Consider limiting CPU cases to float32 and keeping the float16 coverage on CUDA (where half precision is expected).
| pytest.param("cpu", torch.float16, (1, 8, 16), (8, 16), 0, 1.0, | |
| id="cpu-float16-base"), | |
| pytest.param("cpu", torch.bfloat16, (1, 8, 16), (8, 16), 0, 1.0, | |
| id="cpu-bfloat16-base"), |
There was a problem hiding this comment.
REJECT — the issue Comfy-Org#120 plan body's Slice 2 AC-2 (https://github.com/pollockjj/mydevelopment/issues/120) literally requires this dtype coverage: AC-2: Parametrize coverage: the test in AC-1 is parametrized over at minimum these (device, dtype, shape) combinations: (cpu, torch.float32, (1, 8, 16)), (cpu, torch.float16, (1, 8, 16)), (cpu, torch.bfloat16, (1, 8, 16)), (cpu, torch.float32, (2, 16, 32)), and one CUDA case (cuda, torch.float16, (1, 8, 16)). The CPU float16 / bfloat16 cases were probed live (GTP-O) at max_abs_delta = 0.0 and locked into the AC by the planner before this PR was opened. Removing them would deviate from the approved plan AC and shrink the regression's coverage of any future caller that does pass non-float32 dtypes. The q.float()/k.float() casts at comfy/ldm/seedvr/model.py:435-436 and 537-545 are existing call-site choices, not contracts on apply_rotary_emb — the wrapper accepts arbitrary dtypes per its signature, and the regression intentionally locks behavior across all three. Skill rule reference: rejection rationale must reference a concrete project fact / plan AC (.claude/skills/pr-review/SKILL.md); plan Comfy-Org#120 Slice 2 AC-2 is the concrete fact.
…nvention to test-local choice; cite image_stitch_test.py as a counter-example of stub usage in this repo
Codex Review — Round 4 — DISPUTEThe two PR diff scope (live re-verification)Still one file: Per-finding disputeFinding 1 — DISPUTE. Same finding family as the round-3 dispute (which codex worded as "Route SeedVR tiling through the SeedVR tiler" against the same lines). Finding 2 — DISPUTE. Skill rule referenced
Both findings are recorded as DISPUTE in Codex review pattern observation (4 rounds)Codex review behavior on PR #21 across four rounds:
Rounds 3 and 4 codex reviews are surfacing pre-existing concerns in the broader SeedVR2 ecosystem that fall outside both this PR's diff and codex's own prompt rule against pre-existing bugs. Without a way to constrain codex to the PR diff, the FSM cannot autonomously converge on PR #21 by issuing more rounds — additional rounds will likely surface more pre-existing concerns in the SeedVR2 ecosystem rather than new test-file findings. Recommendation to userThe PR-scoped review state is:
If the user agrees the 4 codex P2 disputes are out-of-scope-for-this-PR, override them and the FSM transitions to READY. Otherwise the user makes the merge call directly. The slice LLM is exiting the FSM reentrantly at this point — the loop cannot autonomously converge while codex keeps surfacing pre-existing-and-out-of-scope concerns. Re-invoking |
Codex Review -- Round 5 -- KICKOFFRunner: Prompt sent to codex reviewAwaiting codex output. The result will be posted as a separate comment on this PR when codex exits. |
Codex Review -- Round 5 -- RESULTPR: #21 Prompt sent to codex reviewVerbatim codex review outputThis PR was reviewed using No merge-blocking issues were found in the PR-scoped test-only diff. Codex runner stderr (non-evidence) |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # (device, dtype, t_shape, freqs_shape, start_index, scale) | ||
| _CASES = [ | ||
| pytest.param("cpu", torch.float32, (1, 8, 16), (8, 16), 0, 1.0, | ||
| id="cpu-float32-base"), | ||
| pytest.param("cpu", torch.float16, (1, 8, 16), (8, 16), 0, 1.0, | ||
| id="cpu-float16-base"), |
There was a problem hiding this comment.
The parametrization includes a CPU torch.float16 case. On the repo’s unit-test CI this runs with the CPU PyTorch wheels (.github/workflows/test-unit.yml installs from the /whl/cpu index), and some PyTorch builds don’t implement torch.cos/torch.sin for float16 tensors on CPU (raising at runtime). Consider either removing the CPU-float16 case, or guarding it with a skip that checks whether torch.cos(torch.zeros(1, dtype=torch.float16)) works on the current runtime so CI remains stable across PyTorch versions/platforms.
| # (device, dtype, t_shape, freqs_shape, start_index, scale) | |
| _CASES = [ | |
| pytest.param("cpu", torch.float32, (1, 8, 16), (8, 16), 0, 1.0, | |
| id="cpu-float32-base"), | |
| pytest.param("cpu", torch.float16, (1, 8, 16), (8, 16), 0, 1.0, | |
| id="cpu-float16-base"), | |
| def _cpu_float16_trig_supported(): | |
| """Return whether CPU float16 trig ops are available on this runtime.""" | |
| try: | |
| torch.cos(torch.zeros(1, dtype=torch.float16)) | |
| except (RuntimeError, TypeError): | |
| return False | |
| return True | |
| # (device, dtype, t_shape, freqs_shape, start_index, scale) | |
| _CASES = [ | |
| pytest.param("cpu", torch.float32, (1, 8, 16), (8, 16), 0, 1.0, | |
| id="cpu-float32-base"), | |
| pytest.param( | |
| "cpu", torch.float16, (1, 8, 16), (8, 16), 0, 1.0, | |
| id="cpu-float16-base", | |
| marks=pytest.mark.skipif( | |
| not _cpu_float16_trig_supported(), | |
| reason="torch.cos/torch.sin unsupported for float16 tensors on CPU", | |
| ), | |
| ), |
There was a problem hiding this comment.
ACCEPT — applied in commit f80a8727. Adopted the suggested skip-if-trig-unsupported pattern. Generalized to a helper _cpu_trig_supported(dtype) so the same guard applies to both cpu-float16-base and cpu-bfloat16-base cases (the bfloat16 case is in plan Comfy-Org#120 Slice 2 AC-2 and has the same potential CPU-wheel portability issue). Module-level probes _CPU_FP16_TRIG_OK / _CPU_BF16_TRIG_OK evaluate at collection time; cases skip cleanly with a clear reason on builds that don't implement those trig ops. Plan AC-2 dtype coverage is preserved (the cases stay in the parametrize list); only the runtime fan-out is adapted to the build's capability matrix. pytest 8/8 still passes locally on amor-fati (CUDA + working CPU fp16/bf16); Ruff clean.
…h pytest.mark.skipif(not _cpu_trig_supported(dtype)) so PyTorch CPU wheels that don't implement torch.cos/sin for those dtypes skip cleanly instead of failing CI; cases preserved per plan Comfy-Org#120 Slice 2 AC-2
Codex Review -- Round 6 -- KICKOFFRunner: Prompt sent to codex reviewAwaiting codex output. The result will be posted as a separate comment on this PR when codex exits. |
Codex Review -- Round 6 -- RESULTPR: #21 Prompt sent to codex reviewVerbatim codex review outputThis PR was reviewed using No P0/P1/P2/P3 issues were found in the latest PR state. Codex runner stderr (non-evidence) |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| the wrapper and the delegate would silently corrupt SeedVR2's RoPE; this test | ||
| fails loudly on any future drift. | ||
|
|
||
| Each parametrized case both: |
There was a problem hiding this comment.
Module docstring sentence "Each parametrized case both:" is grammatically incomplete. Consider rephrasing to "Each parametrized case does both:" (or similar) so the intro reads cleanly before the numbered list.
| Each parametrized case both: | |
| Each parametrized case does both: |
There was a problem hiding this comment.
ACCEPT — applied in commit 960d0ce5. Updated to "Each parametrized case does both:" per the suggestion. pytest 8/8 still passes, Ruff clean.
…se both:' -> 'each parametrized case does both:'
Codex Review -- Round 7 -- KICKOFFRunner: Prompt sent to codex reviewAwaiting codex output. The result will be posted as a separate comment on this PR when codex exits. |
Codex Review -- Round 7 -- RESULTPR: #21 Prompt sent to codex reviewVerbatim codex review outputThis PR was reviewed using No P0/P1/P2/P3 issues were found in the latest PR state. Codex runner stderr (non-evidence) |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codex Review -- Round 1 -- KICKOFFRunner: Prompt sent to codex reviewAwaiting codex output. The result will be posted as a separate comment on this PR when codex exits. |
|
Superseded by #30 — consolidated into a single test PR against the clean |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 41 out of 44 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """Regression test: ``comfy.ldm.seedvr.model.apply_rotary_emb`` must delegate | ||
| to ``comfy.ldm.flux.math.apply_rope1`` and produce exact-equality output | ||
| across the wrapper's slicing, scaling, and concatenation logic. Drift between | ||
| the wrapper and the delegate would silently corrupt SeedVR2's RoPE; this test | ||
| fails loudly on any future drift. | ||
|
|
||
| Each parametrized case does both: | ||
|
|
||
| 1. Patches ``comfy.ldm.seedvr.model.apply_rope1`` with a ``wraps``-style spy | ||
| and asserts ``spy.call_count >= 1`` so a future change that inlines the | ||
| math and stops calling ``apply_rope1`` fails the test. | ||
| 2. Compares the wrapper's output against a hand-rolled reproduction using | ||
| ``torch.testing.assert_close(rtol=0, atol=0)`` -- exact tensor equality, | ||
| not bit-equality (``+0.0`` vs ``-0.0`` and NaN payloads can still match); | ||
| the assertion catches any future kernel-precision drift in the | ||
| ``apply_rope1`` dispatch. | ||
|
|
||
| The test uses a local ``torch.Generator`` so global RNG state is not mutated. | ||
| Parametrization covers non-default ``start_index`` and ``scale`` and a case | ||
| where ``freqs.shape[0] > t.shape[seq_dim]`` so the wrapper's | ||
| ``slice_at_dim(freqs, slice(-seq_len, None), dim=0)`` path is exercised. | ||
| Imports are taken at module level. Heavy-import stubbing of | ||
| ``comfy.model_management`` was attempted but is insufficient on this live | ||
| import chain (``comfy.ldm.seedvr.model`` pulls | ||
| ``comfy.ldm.modules.diffusionmodules.model -> comfy.ops -> | ||
| comfy.memory_management -> comfy.quant_ops -> comfy_kitchen.tensor -> | ||
| torch._dynamo``), so this test intentionally runs against the real modules | ||
| to fail loudly if that import path or runtime state drifts. Other tests in | ||
| this repo (e.g. ``tests-unit/comfy_extras_test/image_stitch_test.py``) do | ||
| stub via ``patch.dict(sys.modules, ...)`` for narrower targets; the choice | ||
| here is local to this regression and not a repo-wide convention. | ||
|
|
||
| Test design rationale and per-decision review trail are recorded on the | ||
| tracking issue: https://github.com/pollockjj/mydevelopment/issues/120 | ||
| """ |
There was a problem hiding this comment.
PR description says this repo change is “test only”, but this PR also adds/changes substantial runtime code paths (e.g., new SeedVR2 model/VAE integration, attention backend changes, nodes loading). Please update the PR description (or split into separate PRs) so reviewers and release notes correctly reflect the scope and risk of these changes.
There was a problem hiding this comment.
SeedVR2 latent format sets latent_dimensions = 16, which is inconsistent with other video latent formats in this file (e.g., HunyuanVideo15 uses latent_dimensions = 3 for (T,H,W)). If SeedVR2 latents are video-like (the VAE path sets latent_dim = 3), this value is likely incorrect and can break latent shape handling. Please confirm the intended latent dimensionality and adjust latent_dimensions accordingly (likely 3).
| @torch.inference_mode() | ||
| def tiled_vae(x, vae_model, tile_size=(512, 512), tile_overlap=(64, 64), temporal_size=16, encode=True, **kwargs): | ||
|
|
||
| gc.collect() | ||
| torch.cuda.empty_cache() | ||
|
|
||
| x = x.to(next(vae_model.parameters()).dtype) |
There was a problem hiding this comment.
tiled_vae() unconditionally calls torch.cuda.empty_cache(). On CPU-only PyTorch builds (no CUDA compiled in), this can raise at runtime; it also bypasses ComfyUI’s device-aware cache clearing helpers. Prefer using comfy.model_management.soft_empty_cache() (or at least guarding with if torch.cuda.is_available():) so this works across CPU/MPS/XPU/NPU and doesn’t crash on CPU-only installs.
| def clear_vae_memory(vae_model): | ||
| for module in vae_model.modules(): | ||
| if hasattr(module, "memory"): | ||
| module.memory = None | ||
| gc.collect() | ||
| torch.cuda.empty_cache() | ||
|
|
There was a problem hiding this comment.
clear_vae_memory() unconditionally calls torch.cuda.empty_cache(). This can raise on CPU-only PyTorch builds and doesn’t handle non-CUDA accelerators. Prefer comfy.model_management.soft_empty_cache() (or guard with torch.cuda.is_available()) so the node works reliably across environments.
Issue Comfy-Org#120: Verify SeedVR2 RoPE implementation reuse
Tracking issue: https://github.com/pollockjj/mydevelopment/issues/120
Final QA PASS: https://github.com/pollockjj/mydevelopment/issues/120#issuecomment-4329927195 (qa-agent-seveneves[bot], Slice 2 PASS)
Affected Repository (this PR)
Summary
Adds a parametrized regression test that locks in the wrapper-to-
apply_rope1delegation invariant ofcomfy.ldm.seedvr.model.apply_rotary_emb. The wrapper builds a 2x2 rotation matrixfreqs_matfromfreqs[..., ::2]and delegates the rotation toapply_rope1fromcomfy.ldm.flux.math(live-verified atmax_abs_delta = 0.0per GTP-J/O of the plan). The test exercises the wrapper across CPU float32 / float16 / bfloat16, larger and partial-slice shapes, non-defaultstart_indexandscale, afreqs.shape[0] > t.shape[seq_dim]case to cover the wrapper'sslice_at_dimpath, and a CUDA-guarded float16 case.The test asserts two invariants per case:
comfy.ldm.seedvr.model.apply_rope1is patched with aunittest.mock.patch.object(..., wraps=...)spy. After the wrapper runs, the test assertsspy.call_count >= 1so a future change that inlines the math and stops callingapply_rope1fails the test.torch.testing.assert_close(wrapper_out, direct_out, rtol=0, atol=0)— exact tensor equality (not bit-equality;+0.0/-0.0and NaN payloads can still match) — against a hand-rolled reproduction of the wrapper's body. Livemax_abs_deltaonissue_101HEAD is0.0for every case in the plan's GTP-O probe, so any future precision drift in theapply_rope1dispatch is also caught.Validation Summary
<comfyui_repo>/.venv/Scripts/pytest.exe tests-unit/comfy_test/test_seedvr_rope_delegation.py -v→8 passed in 1.88son amor-fati (Windows, Python 3.13.5, pytest 9.0.3, CUDA available).pollockjj/mydevelopmentissue_120github_issues/120/slice2/mutation_proof.md: a no-op early-return at a temporary local-only branch produced exit code 1 with assertion failures ontorch.testing.assert_close(rtol=0, atol=0); the unmutatedissue_120produced exit code 0 with all cases passing.comfy/ldm/seedvr/model.pyonissue_120. The mutation branch was deleted after capture.What This PR Does Not Do
This PR does not merge and does not close issue Comfy-Org#120. It opens the test file from
issue_120againstissue_101for review, in line with the issue's parent issue #101 single-PR target. Branch and issue remain open until the parent flow handles them.