[Issue 101][eval-and-fix] Keep batch/time axes distinct for single-frame decode (closes pollockjj/mydevelopment#189) #34
…hed image-mode decode VideoAutoencoderKLWrapper.decode applied .squeeze(2) followed by an ndim==4 unsqueeze(0) and a size(1)==1 heuristic that mis-routed the batch axis into channels and the channel axis into time when B>1 and T_dec==1. For B=2,T_orig=1 the output became (1,2,2,H,W) instead of (1,3,2,H,W); for B=4,T_orig=1 it became (1,4,3,H,W) instead of (1,3,4,H,W). Drop the squeeze/unsqueeze dance and keep the post-decoder tensor 5D [B,C,T_dec,H,W] through the post-processing pipeline so the "b c t h w -> (b t) c h w" rearrange resolves axes consistently for all (B, T_orig) cells, including B>1, T_orig==1. Latent-side prep block (b, tc, h, w = z.shape ; latent.view(b, 16, ...) ; scale=0.9152 ; shift=0 ; latent / scale + shift ; ndim==4 unsqueeze(2)) is byte-identical to base. CodeRabbit finding: Comfy-Org#11294 (comment) Adds tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.py with six cases pinning shape and per-sample ordering across B in {1,2,4} x T_orig in {1,3,5}, including a stacked-vs-individual ordering check for the previously-broken B=2, T_orig=1 path.
…ssion Replace tuple(out.shape) == (1, 3, B * T_orig, 16, 16) with the per-cell literal numeric form so the AC-2 grep contract for test_module_internals.log matches verbatim. Also flatten the two patch.object(...) context managers onto single source lines so the fixture-pattern grep matches each patch.object(...decode_) and patch.object(...lab_color_transfer) call without crossing newlines. Behavior unchanged: 6 passed in 1.81s.
Pull request overview
Fixes the SeedVR2 VAE decode shape bug where single-frame batched decodes were swapping batch/time semantics, and adds regression tests around the non-tiled decode path.
Changes:
- Removes the single-frame squeeze/unsqueeze heuristic from `VideoAutoencoderKLWrapper.decode` so decoder outputs stay 5D through reshape logic.
- Simplifies the post-decode flattening path to always treat decoder output as `b c t h w`.
- Adds regression tests covering several `(B, T_orig)` shape cases plus stacked-vs-individual ordering for non-tiled decode.
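The axis mix-up that these changes remove can be traced with a shape-only model. The sketch below is an illustration, not the code in `comfy/ldm/seedvr/vae.py`: the helper names `squeeze`, `old_flatten`, and `new_flatten` are hypothetical, and the model only follows tensor shapes up to the `(b t) c h w` rearrange, as described in the commit message.

```python
def squeeze(shape, dim):
    """Mimic Tensor.squeeze(dim): drop the axis only when its size is 1."""
    return shape[:dim] + shape[dim + 1:] if shape[dim] == 1 else shape


def old_flatten(decoder_shape):
    """Shape produced by the removed heuristic, up to the '(b t) c h w' step."""
    x = squeeze(tuple(decoder_shape), 2)   # [B, C, H, W] when T_dec == 1
    if len(x) == 4:
        x = (1,) + x                       # unsqueeze(0) -> [1, B, C, H, W]
    if x[1] == 1:
        b, t, c, h, w = x                  # "b t c h w -> (b t) c h w"
    else:
        b, c, t, h, w = x                  # "b c t h w -> (b t) c h w"
    return (b * t, c, h, w)


def new_flatten(decoder_shape):
    """Shape when the decoder output stays 5D: always 'b c t h w'."""
    b, c, t, h, w = decoder_shape
    return (b * t, c, h, w)


# Decoder outputs [B, C, T_dec, H, W] for a few (B, T_dec) cells.
for shape in [(1, 3, 1, 16, 16), (2, 3, 1, 16, 16), (4, 3, 1, 16, 16), (2, 3, 3, 16, 16)]:
    print(shape, "old:", old_flatten(shape), "new:", new_flatten(shape))
```

In this model the two paths agree for `B == 1` and for `T_dec > 1`, and diverge only for `B > 1` with `T_dec == 1`, where the old path puts the channel count in the flattened leading axis and the batch count in the channel axis.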
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `comfy/ldm/seedvr/vae.py` | Adjusts SeedVR decode output handling to preserve batch/time axes. |
| `tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.py` | Adds CPU-only regression tests for single-frame and multi-frame non-tiled decode cases. |
  else:
-     x = super().decode_(latent).squeeze(2)
+     x = super().decode_(latent)

  if self.enable_tiling:
-     x = tiled_vae(latent, self, **self.tiled_args, encode=False).squeeze(2)
+     x = tiled_vae(latent, self, **self.tiled_args, encode=False)
[P2] Preserve 4D tiled outputs for single-frame sf_t=1 decodes
When tiling is enabled on a wrapper configured with temporal_downsample_factor == 1 and a single latent frame, tiled_vae() still returns a 4D tensor because it squeezes the temporal axis before returning. This patch removes the only x.ndim == 4 normalization, so the next rearrange(x, "b c t h w -> (b t) c h w") raises instead of decoding that tiled image case; keep or replace the 4D-to-5D normalization for the tiled path.
…rmalization on tiled-decode branch for sf_t==1 + T_lat==1
tiled_vae() at comfy/ldm/seedvr/vae.py:179-180 explicitly squeezes the
temporal axis when temporal_downsample_factor == 1 AND the latent has a
single temporal frame:
if x.shape[2] == 1 and sf_t == 1:
result = result.squeeze(2)
The Issue 189 patch removed the prior `if x.ndim == 4: x = x.unsqueeze(0)`
heuristic (which had its own batch/time confusion bug). On the tiled
branch with that wrapper configuration, the unconditional rearrange
"b c t h w -> (b t) c h w" then receives a 4D tensor and raises
EinopsError instead of decoding the tiled image case.
Re-add the 4D->5D normalization scoped to the tiled branch only.
The non-tiled path stays unchanged because super().decode_(latent)
returns 5D unconditionally.
Adds test_decode_tiled_sf_t1_single_frame_4d_output_normalized() to
tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.py: patches
vae_mod.tiled_vae with a 4D-returning stub mimicking the squeeze
branch and asserts the decode returns the expected 5D shape with the
per-sample fingerprint preserved.
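The normalization this commit describes can be illustrated on shapes alone. This is a sketch under stated assumptions: `tiled_result_shape` and `normalize_tiled` are hypothetical helpers that model only the squeeze condition quoted above and the `unsqueeze(2)` re-added on the tiled branch, not the real `tiled_vae` implementation.

```python
def tiled_result_shape(result_shape, latent_frames, sf_t):
    """Model the tail of tiled_vae: squeeze dim 2 when T_lat == 1 and sf_t == 1."""
    shape = tuple(result_shape)
    if latent_frames == 1 and sf_t == 1 and shape[2] == 1:
        shape = shape[:2] + shape[3:]          # [B, C, 1, H, W] -> [B, C, H, W]
    return shape


def normalize_tiled(shape):
    """Re-add the temporal axis at position 2, mirroring x.unsqueeze(2)."""
    if len(shape) == 4:
        return shape[:2] + (1,) + shape[2:]    # [B, C, H, W] -> [B, C, 1, H, W]
    return shape


squeezed = tiled_result_shape((2, 3, 1, 16, 16), latent_frames=1, sf_t=1)
print(squeezed)                   # 4D: batch at 0, channels at 1
print(normalize_tiled(squeezed))  # 5D again, batch and channels unmoved
print((1,) + squeezed)            # what unsqueeze(0) would give: batch pushed to axis 1
```

Inserting the axis at position 2 rather than position 0 is what keeps batch at axis 0 and channels at axis 1, so the later `b c t h w` rearrange reads the axes as intended.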
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
def test_decode_tiled_sf_t1_single_frame_4d_output_normalized():
    """Codex P2 / Copilot finding on PR #34: ``tiled_vae`` returns 4D
    when ``temporal_downsample_factor == 1`` AND latent T == 1, so the
    wrapper must re-add the temporal axis on the tiled branch before
    the rearrange ``b c t h w -> (b t) c h w``. Pre-fix this case raised
    an einops ``EinopsError`` because the patch removed the only
    ``x.ndim == 4`` normalization.
x = tiled_vae(latent, self, **self.tiled_args, encode=False)
if x.ndim == 4:
    # tiled_vae squeezes the temporal axis when
    # temporal_downsample_factor == 1 AND latent T == 1
    # (see tiled_vae line 179-180); re-add it so the post-decode
    # pipeline can keep batch and time distinct on the tiled path.
    x = x.unsqueeze(2)
…n coverage Copilot follow-up on PR #34: the tiled-path 4D->5D normalization (commit 3b3c150) was only covered for B=1, leaving the entire reason the issue exists — the batched single-frame batch/time axis swap — unprotected on the tiled path. A future change could reintroduce the batched single-frame bug on tiled decode without failing this suite. Adds test_decode_tiled_sf_t1_b2_t1_per_sample_ordering: mirrors test_decode_b2_t1_fixes_batch_time_axes for the enable_tiling=True path with sf_t==1 + T_lat==1, asserting tuple(out.shape) == (1, 3, 2, 16, 16) and per-sample fingerprint preservation (out[0,0,0,0,0]=1.0, out[0,0,1,0,0]=2.0).
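The shape and fingerprint values asserted by these tests follow from how `(b t)` flattens batch-major. A minimal sketch, assuming the stubbed decoder fills sample `b` with `float(b + 1)` and emits `T_orig` frames per sample; `expected_shape` and `expected_fingerprint` are hypothetical helper names, not functions from the test module.

```python
def expected_shape(batch, t_orig, height=16, width=16):
    """Post-fix output shape: samples and frames share the time axis."""
    return (1, 3, batch * t_orig, height, width)


def expected_fingerprint(batch, t_orig):
    """Value expected at out[0, 0, i, 0, 0] for each flattened frame index i.

    '(b t)' puts b outermost, so frame i belongs to sample i // t_orig.
    """
    return [float(i // t_orig + 1) for i in range(batch * t_orig)]


print(expected_shape(2, 1), expected_fingerprint(2, 1))
print(expected_shape(2, 3), expected_fingerprint(2, 3))
```

For `B=2, T_orig=1` this gives the two values quoted in the commit message, `1.0` at index 0 and `2.0` at index 1.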
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
…n contract (closes pollockjj/mydevelopment#192) (#46) * [Issue 101][eval-and-fix] Fix VideoAutoencoderKLWrapper.forward return contract (closes pollockjj/mydevelopment#192) VideoAutoencoderKLWrapper.forward() called self.decode(z).sample, but VideoAutoencoderKLWrapper.decode() returns a plain torch.Tensor (post-Comfy-Org#189 / PR #34 normalization). Direct wrapper invocation therefore raised AttributeError on the tensor return. Drop the .sample dereference; use the tensor return directly. forward() now returns the (x_out, z, p) triple as documented in the wrapper's encode/decode contract. Sister to mydevelopment#190 (PR #44) which fixed the parent class VideoAutoencoderKL.forward for the same bug class flagged on Comfy-Org#11294 by CodeRabbit (thread r2959796348, "Also applies to: 2083-2087" trailer). Adds tests-unit/comfy_test/seedvr_vae_wrapper_forward_test.py — four CPU-only regression tests that build a wrapper standin via __new__ + nn.Module.__init__, register a single dummy parameter so the wrapper's encode dtype lookup resolves, set original_image_video / img_dims / tiled_args so the wrapper.decode guards pass, patch the parent VideoAutoencoderKL.encode / decode_ plus the module-level lab_color_transfer with fingerprint-tagged stubs, then call wrapper.forward(x) end-to-end and assert (1) the return is a 3-tuple of tensors with no AttributeError raised, (2) x_out has exact expected shape, dtype, and value-fingerprint under torch.equal, (3) z matches the encode-side posterior squeezed on dim 2 under torch.equal, and (4) inspect.getsource(wrapper.forward) contains no ".sample" string. 
* [Issue 101][eval-and-fix] Tighten VideoAutoencoderKLWrapper.forward regression test to two-test contract (pollockjj/mydevelopment#192) The regression suite for VideoAutoencoderKLWrapper.forward now defines exactly the two contract-named tests: - test_wrapper_forward_returns_tensor_triple monkeypatches VideoAutoencoderKLWrapper.encode and VideoAutoencoderKLWrapper.decode directly on the class, builds the wrapper standin via __new__ + nn.Module.__init__, sets original_image_video to a 5-D tensor and img_dims to a 2-tuple, invokes wrapper.forward(x), and asserts the full return-contract: 3-tuple, three torch.Tensor types, binary shape equality (x_out.shape == decode_out.shape, z.shape == posterior.squeeze(2).shape), tensor equality (torch.equal(x_out, decode_out), torch.equal(z, posterior.squeeze(2))), and identity (p is posterior). - test_wrapper_forward_source_has_no_sample_access asserts ".sample" not in inspect.getsource(VideoAutoencoderKLWrapper.forward) so the failing pre-fix body raises an explicit assertion failure on the literal forbidden token. Stubbing on the wrapper class (not the parent) bypasses the parent encode/decode_ entirely, removes the lab_color_transfer monkey-patch, and makes the AttributeError pre-fix path surface directly on self.decode(z).sample because the stubbed decode returns a plain tensor.
Plan: Keep batch/time axes distinct for single-frame batched decode
Overview
The SeedVR2 native VAE wrapper `comfy.ldm.seedvr.vae.VideoAutoencoderKLWrapper.decode` corrupts batched image-mode decode output when `B > 1` and `T == 1`. The decoder's 5D output `[B, C, 1, H, W]` is flattened by `super().decode_(latent).squeeze(2)` to `[B, C, H, W]`; the existing `if x.ndim == 4: x = x.unsqueeze(0)` block then makes the tensor `[1, B, C, H, W]`, after which the `x.size(1) == 1` gate routes batched cases through the wrong `rearrange` expression that re-interprets the batch axis as channels and the channel axis as time. The result for `B=2, T=1` is `[1, 2, 2, H, W]` instead of `[1, 3, 2, H, W]`; for `B=4, T=1` it is `[1, 4, 3, H, W]` instead of `[1, 3, 4, H, W]`. The latent-side prep block immediately above (`b, tc, h, w = z.shape; latent = z.view(b, 16, -1, h, w); ...`) was already corrected upstream by commit `af6c5d6` and is OUT OF SCOPE; this plan asserts it byte-identical to base as a stop-condition guard. The plan first records the pre-fix shape signature on `pollockjj/ComfyUI:issue_101` HEAD as a baseline decision packet, then lands the output-side fix on `pollockjj/ComfyUI:issue_189` together with a five-case regression test (`B ∈ {1, 2, 4} × T_orig ∈ {1, 3, 5}`) and a stacked-vs-individual per-sample-ordering test, re-runs the same probe to record the corrected shapes, applies a hygiene gate (ruff + production-code-shape independence), and emits a post-fix provenance decision packet. PR creation from `issue_189` into `issue_101` is delegated to the `/pr` skill after slice completion.

Diagnosis Summary
pollockjj/ComfyUI:issue_101HEAD4e8836ed, callingVideoAutoencoderKLWrapper.decode(z)withz.shape == (B, 16, H, W)(image-mode latent, T_orig=1) forB > 1returns a tensor whose final shape contains the batch countBin the channel axis and the channel countC(3 for RGB) in the time axis. Live probe (Phase 0):B=2, T_orig=1→ output shape(1, 2, 2, 16, 16)(correct shape would be(1, 3, 2, 16, 16));B=4, T_orig=1→ output shape(1, 4, 3, 16, 16)(correct shape would be(1, 3, 4, 16, 16)). The per-batch fingerprint values that the probe injects via the mockeddecode_(float(b + 1)for batch indexb) appear in the size-2 (resp. size-4) axis instead of the size-2 (resp. size-4) time axis they belong in./home/johnj/dev_cuda_1/ComfyUIon issue_101 HEAD with the ComfyUI venv active: importcomfy.ldm.seedvr.vae, monkey-patchVideoAutoencoderKL.decode_to a fingerprint-tagged stub returning[B, 3, T_dec, H_in*8, W_in*8]filled withfloat(b + 1)per batch index, monkey-patchlab_color_transferto a passthrough, construct a wrapper instance viaVideoAutoencoderKLWrapper.__new__(VideoAutoencoderKLWrapper)+nn.Module.__init__(instance)withtiled_args = {"enable_tiling": False},original_image_video = torch.zeros(B, 3, T_orig, 16, 16),img_dims = (16, 16), then callwrapper.decode(torch.zeros(B, 16*T_orig, 2, 2)). For(B=2, T_orig=1)the output shape is(1, 2, 2, 16, 16); for(B=4, T_orig=1)the output shape is(1, 4, 3, 16, 16). Live-probed Phase 0; bug reproduces deterministically.decode_block inVideoAutoencoderKLWrapper.decodeapplies.squeeze(2)to the 5D[B, C, T_dec, H, W]decoder output. WhenT_dec == 1the tensor collapses to[B, C, H, W](4D). The next gateif x.ndim == 4: x = x.unsqueeze(0)adds a new outer axis at position 0, producing[1, B, C, H, W]. 
The subsequent gateif x.size(1) == 1: exp = "b t c h w -> (b t) c h w" else: exp = "b c t h w -> (b t) c h w"then mis-interprets the axes forB > 1(becausex.size(1) == B != 1), routing throughb c t h w -> (b t) c h wwhich treats the new outer 1 asb, the batch countBasc, and the channel countCast. The final reshape to[1, C, B*T_orig, H, W]therefore produces[1, B, C, H, W](axes swapped). TheB == 1, T_orig == 1path accidentally works becausex.size(1) == 1matches theb t c h w -> (b t) c h wexpression which collapses the leading two1s to a single1. TheT_orig > 1paths work because.squeeze(2)is a no-op whenT_dec > 1and the tensor stays 5D, bypassing the buggyunsqueeze(0)branch entirely.B > 1AND (b) the decoder's temporal outputT_dec == 1(image-mode reconstruction). TheT_orig > 1paths and theB == 1, T_orig == 1path are unaffected — they have their own complete coverage in the existing tile/non-tile decode paths and are exercised by the existing test surface (tests-unit/comfy_test/test_seedvr_vae_tiled_args_no_mutate.pyfor tiled-args invariant; the liveB=1, T=5andB=2, T=3probe results show the working shapes).decode_invariant that the tensor remains 5D[B, C, T_dec, H, W](or equivalently[B, C, T_out, H, W]after time-dim trimming) through the entire post-processing pipeline regardless ofT_dec, so that the per-sample axis stays at position 0 and the channel axis stays at position 1 before the finalb t c h w -> b c t h wreshape. The plan does not prescribe the exact source-line edit (the slicer chooses); the plan asserts the observable post-fix behavior (output shape(1, 3, B*T_orig, H, W)for every cell; per-sample fingerprint at the post-fix-correct batch position) and the structural stop-condition guard (latent-side prep block lines unchanged from base).issue_101HEAD (no fix applied) and records per-(B, T_orig)-cell shape, dtype, and per-batch fingerprint axis value into a JSON named-key artifact, capturing the buggy shapes for theB > 1, T_orig == 1cells. 
Slice 2 lands the source fix onpollockjj/ComfyUI:issue_189, adds the regression test module, re-runs the same probe to record the corrected shapes, and emits a before/after delta artifact whoseshape_mismatch_countnamed key drops from≥ 2to0. Slice 3 runs ruff and a no-detuning grep over the touched files. Slice 4 binds the issue branch HEADs, canonical-reference URLs, and verbatim invocation commands into an architect-readable provenance markdown.Affected Repositories
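| Repository | Local path | Base branch | Issue branch |
|---|---|---|---|
| `pollockjj/mydevelopment` | `/home/johnj/dev_cuda_1/mydevelopment` | `main` | `issue_189` |
| `pollockjj/ComfyUI` | `/home/johnj/dev_cuda_1/ComfyUI` | `issue_101` | `issue_189` |

Research and Methodology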
Plan Foundations Comment URL: https://github.com/pollockjj/mydevelopment/issues/189#issuecomment-4368270401
Detected Scope: `none`. This is a single-block shape-handling fix in `VideoAutoencoderKLWrapper.decode` where the existing working `B==1, T==1`, `B==1, T>1`, and `B>1, T>1` paths ARE the spec; the patch aligns the broken `B>1, T==1` path to that same spec without introducing any new metric or methodology.

The linked comment is the load-bearing artifact for every measurement, equivalence rule, canonical-protocol, tool version, and pipeline-stage citation in this plan. Every AC below that names a metric, threshold, equivalence rule, canonical protocol, tool version, or pipeline stage traces by quote or URL to an entry in that comment's `## Research and Methodology` section. qa-plan verifies the comment exists and contains the required subsections for the detected scope.

Tools, Pipeline, and Measurements
Plan Foundations Comment URL: https://github.com/pollockjj/mydevelopment/issues/189#issuecomment-4368270401
The linked comment's `## Tools, Pipeline, and Measurements` section enumerates: (1) Existing Tooling Inventory with `REUSE` / `KNOWN-GOOD-REF` / `NEW` / `WAIVED` status per tool; (2) Pipeline stages from input to output artifact; (3) Measurements table with tool+version, invocation, score range, tolerance; (4) Single-probe-before-sweep / boundary-bracketing requirements (both `N/A` for this plan and explicitly justified). Every AC below that names a tool, version, pipeline stage, or measurement traces by quote or URL to an entry in that section.

Ground Truth Probes
For every literal API surface that any AC references — attribute, method, function signature, parameter list, CLI flag, file path, env var, magic constant — captured live on the issue's base branch
pollockjj/ComfyUI:issue_101(HEAD4e8836ed) on 2026-05-04 from/home/johnj/dev_cuda_1/ComfyUI. Plan-introduced literals (probe JSON keys, test names, paths created by this plan) are anchored separately in## Created Surface Contract, not here.cd /home/johnj/dev_cuda_1/ComfyUI && python -c "from comfy.cli_args import args; args.cpu=True; import comfy.ldm.seedvr.vae as v, inspect; print(inspect.signature(v.VideoAutoencoderKLWrapper.decode)); print(inspect.signature(v.VideoAutoencoderKL.decode_)); print([c.__name__ for c in v.VideoAutoencoderKLWrapper.__mro__])"(self, z)/(self, z: torch.Tensor, return_dict: bool = True)/['VideoAutoencoderKLWrapper', 'VideoAutoencoderKL', 'Module', 'object']cd /home/johnj/dev_cuda_1/ComfyUI && python -c "from comfy.cli_args import args; args.cpu=True; import comfy.ldm.seedvr.vae as v; print(v.lab_color_transfer); print(v.rearrange); print(v.tiled_vae)"<function lab_color_transfer at 0x...>/<function rearrange at 0x...>/<function tiled_vae at 0x...>cd /home/johnj/dev_cuda_1/ComfyUI && python -c "from comfy.cli_args import args; print(type(args.cpu).__name__, args.cpu); args.cpu=True; print(args.cpu)"bool False/Truecd /home/johnj/dev_cuda_1/ComfyUI && python -c "from comfy.cli_args import args; args.cpu=True; import comfy.ldm.seedvr.vae as v, torch.nn as nn, torch; w=v.VideoAutoencoderKLWrapper.__new__(v.VideoAutoencoderKLWrapper); nn.Module.__init__(w); print(type(w).__name__); print(hasattr(w, 'tiled_args')); print(hasattr(w, 'original_image_video')); print(hasattr(w, 'img_dims'))"VideoAutoencoderKLWrapper/False/False/False(these attributes are populated in the wrapper's__init__; the test sets them manually after__new__+nn.Module.__init__)cd /home/johnj/dev_cuda_1/ComfyUI && pythonfollowed by the inline reproduction script described in §Diagnosis Summary / Reproduction. 
Live raw outputs:(B=1, T_orig=1) → shape (1, 3, 1, 16, 16);(B=1, T_orig=5) → shape (1, 3, 5, 16, 16);(B=2, T_orig=1) → shape (1, 2, 2, 16, 16);(B=4, T_orig=1) → shape (1, 4, 3, 16, 16);(B=2, T_orig=3) → shape (1, 3, 6, 16, 16).pollockjj/ComfyUI:issue_101HEAD4e8836ed). The cells(B=2,T_orig=1)and(B=4,T_orig=1)exhibit the bug: the channel position 1 holds the batch count instead of3, and the time position 2 holds the RGB count instead of the batch count.git -C /home/johnj/dev_cuda_1/ComfyUI rev-parse origin/issue_101; git -C /home/johnj/dev_cuda_1/ComfyUI ls-tree origin/issue_101 -- comfy/ldm/seedvr/vae.py4e8836ed0a53467e9c433d58320c4992c9c34d2d/100644 blob b3f0f5d719862ae119ebc037ab104bca7785ce71 comfy/ldm/seedvr/vae.pyls /home/johnj/dev_cuda_1/ComfyUI/tests-unit/comfy_test/folder_path_test.py/model_detection_test.py/__pycache__/test_seedvr_groupnorm_limit.py/test_seedvr_rope_delegation.py/test_seedvr_vae_tiled_args_no_mutate.pygrep -nE 'args\.cpu = True|torch\.cuda\.is_available|patch\.object' /home/johnj/dev_cuda_1/ComfyUI/tests-unit/comfy_test/test_seedvr_groupnorm_limit.py41:from unittest.mock import patch/48:if not torch.cuda.is_available():/49: cli_args.cpu = True/104: with patch.object(vae_mod.F, "group_norm", side_effect=_group_norm_spy):/157: with patch.object(vae_mod.F, "group_norm", side_effect=_group_norm_spy):cd /home/johnj/dev_cuda_1/ComfyUI && .venv/bin/pip list 2>/dev/null | grep -E "^(torch|einops)\\b"; .venv/bin/python -m pytest --version 2>&1 | head -1einops 0.8.2/torch 2.11.0+cu130/pytest 9.0.3cat /home/johnj/dev_cuda_1/ComfyUI/pytest.ini[pytest]/markers =/inference: mark as inference test (deselect with '-m "not inference"')/execution: mark as execution test (deselect with '-m "not execution"')/testpaths =/tests/tests-unit/addopts = -s/pythonpath = .git -C /home/johnj/dev_cuda_1/mydevelopment rev-parse HEAD; git -C /home/johnj/dev_cuda_1/mydevelopment rev-parse --abbrev-ref HEAD1e8218e23d2616060e41bf710bc196e6ea81fd4a/issue_189sed -n '2245,2295p' 
/home/johnj/dev_cuda_1/ComfyUI/comfy/ldm/seedvr/vae.py(issue_101 HEAD source)def decode(self, z):body starting at line 2245 — latent-side prep block at 2245-2253 (b, tc, h, w = z.shape/latent = z.view(b, 16, -1, h, w)/scale = 0.9152/shift = 0/latent = latent / scale + shift/ blank /if latent.ndim == 4:/latent = latent.unsqueeze(2)); output-side block at 2255-2294 startingself.device = latent.deviceand ending withreturn xafter theb t c h w -> b c t h wrearrange and the even-dims trim.ruffcd /home/johnj/dev_cuda_1/ComfyUI && python -m ruff --version; which ruff; git -C /home/johnj/dev_cuda_1/ComfyUI ls-tree origin/issue_101 -- .github/workflows/ruff.ymlruff 0.15.9//home/johnj/.local/bin/ruff/100644 blob b24d86a6ba55b1f7b90539d6267a72afa5a7b73c .github/workflows/ruff.ymlIf a planned literal cannot be probed (slot offline, source absent, tool missing) the AC is rewritten to remove the unverified literal or the slice is rescoped. No probe → no AC.
Created Surface Contract
These literals are introduced by this plan and therefore are NOT subject to Phase 0.5 ground-truth probing. They become probable after the slice that creates them lands.
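As a rough illustration of the probe-script CLI contract listed in this section, the three flags and the `B,T_orig` cell format could be parsed as below. This is a sketch, not the committed probe script: `parse_bt_cells` and `build_parser` are hypothetical names, and only the literal form of `--bt-cells` is handled (the path form is not modeled).

```python
import argparse


def parse_bt_cells(spec):
    """Parse a literal like '1,1;1,5;2,1' into [(B, T_orig), ...]."""
    cells = []
    for pair in spec.split(";"):
        batch, t_orig = pair.split(",")
        cells.append((int(batch), int(t_orig)))
    return cells


def build_parser():
    """Argument parser exposing exactly the three flags named in the contract."""
    parser = argparse.ArgumentParser(description="SeedVR decode batch-axes probe (sketch)")
    parser.add_argument("--out", required=True, help="path of the UTF-8 JSON to write")
    parser.add_argument("--bt-cells", required=True, help="semicolon-separated B,T_orig pairs")
    parser.add_argument("--label", required=True, help="value stored under the JSON 'label' key")
    return parser


args = build_parser().parse_args(
    ["--out", "baseline.json", "--bt-cells", "1,1;1,5;2,1;4,1;2,3", "--label", "baseline"]
)
print(args.label, parse_bt_cells(args.bt_cells))
```

A malformed pair raises `ValueError` from `int()` or tuple unpacking, which would surface as a non-zero exit if left unhandled, in line with the exit-code rule stated in the contract.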
Probe-script API (Slice 1 introduces; Slice 2 reuses): `--out PATH` CLI flag (writes UTF-8 JSON to `PATH`); `--bt-cells PATH_OR_LITERAL` CLI flag (semicolon-separated list of `B,T_orig` integer pairs, e.g. `"1,1;1,5;2,1;4,1;2,3"`; the probe iterates each cell and records its result); `--label LABEL` CLI flag (string written into the JSON's `label` key, e.g. `"baseline"` or `"post_fix"`); script exit code `0` on success; non-zero on any cell raising an unhandled exception.

Probe JSON named keys (Slice 1 introduces baseline; Slice 2 emits post-fix): Top-level:
label(string),mydevelopment_head_sha(string),comfyui_head_sha(string),comfyui_branch(string),vae_blob_sha(string fromgit ls-tree HEAD -- comfy/ldm/seedvr/vae.py),bt_cells(list of{B, T_orig, T_dec, out_shape, out_dtype, channel_axis_value, time_axis_value, fingerprint_at_correct_batch_position}objects). Per-cell named keys:B(int),T_orig(int),T_dec(int),out_shape(list of int),out_dtype(string e.g."torch.float32"),channel_axis_value(int —out_shape[1]),time_axis_value(int —out_shape[2]),fingerprint_at_correct_batch_position(list of float, lengthB*T_orig, value at indexiisout[0, 0, i, 0, 0].item(); for the buggy cells this list is corrupted; for the correct cells it equals[float(b//T_orig + 1) for b in range(B*T_orig)]interleaved byT_orig).Comparison JSON named keys (Slice 2 introduces):
`prior_artifact_path` (string), `new_artifact_path` (string), `prior_label` (string), `new_label` (string), `shape_mismatch_count_prior` (int: number of cells whose `out_shape` differs from the per-cell expected `(1, 3, B*T_orig, 16, 16)`), `shape_mismatch_count_new` (int), `per_cell_delta` (list of `{B, T_orig, prior_out_shape, new_out_shape, expected_out_shape, prior_pass, new_pass}` objects).

Test names (Slice 2 introduces in `tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.py`): `test_decode_b1_t1_shape_and_ordering_correct`, `test_decode_b1_t5_video_shape_unchanged`, `test_decode_b2_t1_fixes_batch_time_axes`, `test_decode_b4_t1_fixes_batch_time_axes`, `test_decode_b2_t3_multi_frame_batch_unchanged`, `test_decode_b2_t1_stacked_equals_individual_per_sample_ordering`. All six functions live in module `tests_unit.comfy_test.test_seedvr_vae_decode_batch_axes`.

File paths (Slices 1–4 introduce):
pollockjj/mydevelopment#issue_189:github_issues/189/probe_seedvr_decode_batch_axes.py(S1)github_issues/189/slice1/decode_batch_axes_baseline.json(S1)github_issues/189/slice1/probe_run.log(S1)github_issues/189/slice1/probe_argparse_flags.log(S1 — grep evidence proving the probe script defines exactly the three CLI flags and zero extras)github_issues/189/slice1/baseline_json_keys.log(S1 —python -m json.toolparse + sorted-key dump proving the baseline JSON's top-level and per-cell named-key sets match the contract verbatim)github_issues/189/slice1/provenance.md(S1)github_issues/189/slice2/decode_batch_axes_post_fix.json(S2)github_issues/189/slice2/before_after_comparison.json(S2)github_issues/189/slice2/probe_run.log(S2)github_issues/189/slice2/pytest_decode_batch_axes.log(S2)github_issues/189/slice2/pytest_seedvr_regression_guard.log(S2)github_issues/189/slice2/test_module_internals.log(S2 — grep evidence proving the regression test module's six function definitions and the per-test assertion shape constants match the AC contract verbatim)github_issues/189/slice2/vae_diff.patch(S2)github_issues/189/slice2/latent_side_unchanged.log(S2)github_issues/189/slice3/ruff.log(S3)github_issues/189/slice3/no_detuning.log(S3)github_issues/189/slice4/provenance.md(S4)pollockjj/ComfyUI#issue_189:comfy/ldm/seedvr/vae.py(modified — output-side block ofVideoAutoencoderKLWrapper.decodeonly; latent-side prep block lines 2245-2253 byte-identical to base) (S2)tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.py(S2 introduces)Latent-side guard token set (Slice 2 AC-1 protects these tokens; the diff MUST NOT add or remove lines containing these tokens):
b, tc, h, w = z.shape;latent = z.view(b, 16, -1, h, w);scale = 0.9152;shift = 0;latent = latent / scale + shift;if latent.ndim == 4:;latent = latent.unsqueeze(2). These seven tokens appear consecutively as the body of the latent-side prep block onpollockjj/ComfyUI:issue_101HEAD per GTP-12.Asset Readiness
pollockjj/ComfyUI:issue_101HEAD4e8836ed/home/johnj/dev_cuda_1/ComfyUIgit rev-parse origin/issue_101returns4e8836ed0a53467e9c433d58320c4992c9c34d2dcomfy/ldm/seedvr/vae.py(VideoAutoencoderKLWrapper.decodebody at lines 2245-2295; latent-side prep at 2245-2253; output-side block at 2255-2294)/home/johnj/dev_cuda_1/ComfyUI/comfy/ldm/seedvr/vae.py(blobb3f0f5d7)issue_101HEADcomfy.ldm.seedvr.vae.VideoAutoencoderKL.decode_parent-class methodissue_101HEADcomfy.ldm.seedvr.vae.lab_color_transfermodule-level functionissue_101HEADcomfy.ldm.seedvr.vae.rearrange(re-export ofeinops.rearrange)issue_101HEADtests-unit/comfy_test/and existing seedvr tests/home/johnj/dev_cuda_1/ComfyUI/tests-unit/comfy_test/issue_101HEADtests-unit/comfy_test/test_seedvr_groupnorm_limit.py(args.cpu = Trueblock beforecomfy.ldm.*import;unittest.mock.patch.objectusage)/home/johnj/dev_cuda_1/ComfyUI/tests-unit/comfy_test/test_seedvr_groupnorm_limit.pyissue_101HEADtorch==2.11.0+cu130,einops==0.8.2,pytest==9.0.3)/home/johnj/dev_cuda_1/ComfyUI/.venvruff0.15.9 binary/home/johnj/.local/bin/ruff(also resolvable viapython -m ruffagainst the deliverable venv)python -m ruff --versionfrom/home/johnj/dev_cuda_1/ComfyUI; ComfyUI CI invokes the same binary atpollockjj/ComfyUI:issue_101HEAD.github/workflows/ruff.yml(blobb24d86a6ba55b1f7b90539d6267a72afa5a7b73c, line 24run: ruff check .)comfy.cli_args.args.cpuflag/home/johnj/dev_cuda_1/ComfyUI/comfy/cli_args.pyissue_101HEADpytest.inidiscovery config (pythonpath = .,testpaths = tests tests-unit,addopts = -s)/home/johnj/dev_cuda_1/ComfyUI/pytest.iniissue_101HEADpollockjj/mydevelopment:mainparent context HEAD1e8218e2/home/johnj/dev_cuda_1/mydevelopmentIC_kwDOR2e1q88AAAABBF6EQQtdd-agent[bot]post 2026-05-04T04:37:08Z, edited 2026-05-04T05:28:59ZNo external assets (model weights, video/image datasets, OS packages) are required. The plan is CPU-only by construction.
Slices
Slice 1: Pre-fix Baseline Reproduction Probe
Kind: decision-packet
Objective: Produce the baseline reproduction decision packet recording the buggy output shape and per-batch fingerprint position for every
(B, T_orig)cell in the bug taxonomy onpollockjj/ComfyUI:issue_101HEAD4e8836edbefore any production fix is applied.Acceptance Criteria
AC-1: File
github_issues/189/probe_seedvr_decode_batch_axes.pyis committed onpollockjj/mydevelopment:issue_189and is a Python module that exposes a--out PATHCLI flag, a--bt-cells STRCLI flag (semicolon-separatedB,T_originteger pairs), a--label STRCLI flag, exits 0 on success, and writes a UTF-8 JSON file toPATHcontaining exactly the named-key set{label, mydevelopment_head_sha, comfyui_head_sha, comfyui_branch, vae_blob_sha, bt_cells}with eachbt_cellsentry containing exactly the per-cell named-key set{B, T_orig, T_dec, out_shape, out_dtype, channel_axis_value, time_axis_value, fingerprint_at_correct_batch_position}— verified by (a) committed artifactgithub_issues/189/slice1/probe_argparse_flags.logcapturing the stdout ofgrep -nE "add_argument\(['\"](--out\|--bt-cells\|--label)['\"]" /home/johnj/dev_cuda_1/mydevelopment/github_issues/189/probe_seedvr_decode_batch_axes.pyreturning exactly three matching lines (one per flag) and zero extraadd_argument(lines outside that set, with the log's final line recordingARGPARSE_FLAGS_EXIT: 0; (b) committed artifactgithub_issues/189/slice1/baseline_json_keys.logcapturing the stdout ofpython3 -c 'import json,sys; d=json.load(open(sys.argv[1])); print("TOP_KEYS:", sorted(d.keys())); print("CELL_KEYS:", sorted(d["bt_cells"][0].keys())); print("CELL_COUNT:", len(d["bt_cells"]))' /home/johnj/dev_cuda_1/mydevelopment/github_issues/189/slice1/decode_batch_axes_baseline.jsonwhose three printed lines are verbatimTOP_KEYS: ['bt_cells', 'comfyui_branch', 'comfyui_head_sha', 'label', 'mydevelopment_head_sha', 'vae_blob_sha'],CELL_KEYS: ['B', 'T_dec', 'T_orig', 'channel_axis_value', 'fingerprint_at_correct_batch_position', 'out_dtype', 'out_shape', 'time_axis_value'], andCELL_COUNT: 5, with the log's final line recordingJSON_KEYS_EXIT: 0; (c) committed artifactgithub_issues/189/slice1/probe_run.logcapturing the stdout+stderr of the AC-2 probe invocation including the verbatim exit-status linePROBE_EXIT: 0; AND (d) the JSON artifact existing at the path 
declared in AC-2.(B, T_orig)cell in the bug taxonomy onpollockjj/ComfyUI:issue_101HEAD4e8836edbefore any production fix is applied."AC-2: Probe is invoked as
cd /home/johnj/dev_cuda_1/ComfyUI && python3 /home/johnj/dev_cuda_1/mydevelopment/github_issues/189/probe_seedvr_decode_batch_axes.py --out /home/johnj/dev_cuda_1/mydevelopment/github_issues/189/slice1/decode_batch_axes_baseline.json --bt-cells "1,1;1,5;2,1;4,1;2,3" --label baselineagainst the issue branch's working tree at HEAD (which is byte-identical toorigin/issue_101HEAD4e8836edbecause no source fix has been applied in Slice 1) and the resultinggithub_issues/189/slice1/decode_batch_axes_baseline.jsonis committed onpollockjj/mydevelopment:issue_189with the per-cellout_shapenamed-key values: cellB=1, T_orig=1→[1, 3, 1, 16, 16]; cellB=1, T_orig=5→[1, 3, 5, 16, 16]; cellB=2, T_orig=1→[1, 2, 2, 16, 16]; cellB=4, T_orig=1→[1, 4, 3, 16, 16]; cellB=2, T_orig=3→[1, 3, 6, 16, 16]. Thelabelvalue is"baseline". Thecomfyui_head_shavalue is"4e8836ed0a53467e9c433d58320c4992c9c34d2d". Thevae_blob_shavalue is"b3f0f5d719862ae119ebc037ab104bca7785ce71". Verified by committed artifactgithub_issues/189/slice1/decode_batch_axes_baseline.jsonparsed and key-checked, plus committed artifactgithub_issues/189/slice1/probe_run.logcapturing the probe's stdout/stderr.(B, T_orig)cell in the bug taxonomy onpollockjj/ComfyUI:issue_101HEAD4e8836edbefore any production fix is applied."AC-3: File
github_issues/189/slice1/provenance.mdis committed onpollockjj/mydevelopment:issue_189containing exactly the named-line keysdecision_summary:,recommended_action:,mydevelopment_head_sha:,comfyui_head_sha:,comfyui_branch:,vae_blob_sha:,coderabbit_url:,upstream_pr_url:,parent_findings_comment_url:,probe_command:,probe_output_path:,bug_cells_observed:, withdecision_summaryvaluebaseline_b_gt_1_t_eq_1_decode_shape_corrupted,recommended_actionvalueapply_output_side_fix_on_pollockjj/ComfyUI_issue_189,comfyui_head_shavalue4e8836ed0a53467e9c433d58320c4992c9c34d2d,comfyui_branchvalueissue_101,vae_blob_shavalueb3f0f5d719862ae119ebc037ab104bca7785ce71,coderabbit_urlvaluehttps://github.com/Comfy-Org/ComfyUI/pull/11294#discussion_r2959796352,upstream_pr_urlvaluehttps://github.com/Comfy-Org/ComfyUI/pull/11294,parent_findings_comment_urlvaluehttps://github.com/pollockjj/mydevelopment/issues/101#issuecomment-4305426643,probe_commandvalue matching the AC-2 command verbatim,probe_output_pathvaluegithub_issues/189/slice1/decode_batch_axes_baseline.json,bug_cells_observedvalueB=2,T_orig=1 -> shape (1,2,2,16,16); B=4,T_orig=1 -> shape (1,4,3,16,16)— verified by committed artifactgithub_issues/189/slice1/provenance.mdandgrep '^<key>: 'parseability against the listed key set.(B, T_orig)cell in the bug taxonomy onpollockjj/ComfyUI:issue_101HEAD4e8836edbefore any production fix is applied."Slice 2: Output-side Shape Fix, Regression Test Module, Post-Fix Probe and Comparison
Kind: implementation
Objective: Prove that modifying the output-side block of VideoAutoencoderKLWrapper.decode in comfy/ldm/seedvr/vae.py makes batch and time axes distinct for every (B, T_orig) cell including B>1, T_orig==1 while leaving the latent-side prep block byte-identical to base, by landing the source change on pollockjj/ComfyUI:issue_189, adding a six-test regression module that fails pre-fix on the B>1, T_orig==1 cells, re-running the same probe to record corrected shapes, and emitting a before/after delta artifact whose shape_mismatch_count_new named-key is 0 while shape_mismatch_count_prior is 2.
Acceptance Criteria
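The before/after delta named in the objective can be sketched in a few lines. This is a minimal illustration, not the plan's actual tooling: the helper names `expected_shape` and `compare` are hypothetical, the JSON key names are taken from the AC-5 key set, and the shapes are the per-cell values this plan states for the baseline and post-fix probes (H = W = 16).

```python
import json

def expected_shape(b, t_orig, h=16, w=16):
    # Expected decode output per the plan: (1, 3, B*T_orig, H, W).
    return [1, 3, b * t_orig, h, w]

def compare(prior_cells, new_cells):
    # Hypothetical helper: builds the per-cell delta and the two mismatch counts.
    per_cell = []
    for (b, t), prior_shape in prior_cells.items():
        new_shape = new_cells[(b, t)]
        exp = expected_shape(b, t)
        per_cell.append({
            "B": b,
            "T_orig": t,
            "prior_out_shape": prior_shape,
            "new_out_shape": new_shape,
            "expected_out_shape": exp,
            "prior_pass": prior_shape == exp,
            "new_pass": new_shape == exp,
        })
    return {
        "shape_mismatch_count_prior": sum(not c["prior_pass"] for c in per_cell),
        "shape_mismatch_count_new": sum(not c["new_pass"] for c in per_cell),
        "per_cell_delta": per_cell,
    }

# Shapes as stated in this plan for the five (B, T_orig) cells.
baseline = {
    (1, 1): [1, 3, 1, 16, 16],
    (1, 5): [1, 3, 5, 16, 16],
    (2, 1): [1, 2, 2, 16, 16],  # corrupted pre-fix
    (4, 1): [1, 4, 3, 16, 16],  # corrupted pre-fix
    (2, 3): [1, 3, 6, 16, 16],
}
post_fix = {
    (1, 1): [1, 3, 1, 16, 16],
    (1, 5): [1, 3, 5, 16, 16],
    (2, 1): [1, 3, 2, 16, 16],
    (4, 1): [1, 3, 4, 16, 16],
    (2, 3): [1, 3, 6, 16, 16],
}

report = compare(baseline, post_fix)
print(json.dumps(report, indent=2))
```

The sketch omits the path and label keys (prior_artifact_path, new_artifact_path, prior_label, new_label); a real comparison file would carry those alongside the counts.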
comfy/ldm/seedvr/vae.pyonpollockjj/ComfyUI:issue_189HEAD has a non-empty diff againstorigin/issue_101restricted to the output-side block ofVideoAutoencoderKLWrapper.decode(the lines starting atself.device = latent.devicethrough the finalreturn xafter the even-dims trim, per GTP-12); the latent-side prep block tokens enumerated in## Created Surface Contract(the seven-token setb, tc, h, w = z.shape;latent = z.view(b, 16, -1, h, w);scale = 0.9152;shift = 0;latent = latent / scale + shift;if latent.ndim == 4:;latent = latent.unsqueeze(2)) appear in zero+lines and zero-lines of the diff — verified by committed artifactgithub_issues/189/slice2/vae_diff.patchcontaining the output ofgit -C /home/johnj/dev_cuda_1/ComfyUI diff origin/issue_101...HEAD -- comfy/ldm/seedvr/vae.py(non-zero size) AND committed artifactgithub_issues/189/slice2/latent_side_unchanged.logcontaining the output ofgrep -nE '^[+-][[:space:]]+(b, tc, h, w = z\.shape\|latent = z\.view\(b, 16, -1, h, w\)\|scale = 0\.9152\|shift = 0\|latent = latent / scale \+ shift\|if latent\.ndim == 4:\|latent = latent\.unsqueeze\(2\))$' /home/johnj/dev_cuda_1/mydevelopment/github_issues/189/slice2/vae_diff.patchreturning zero matching lines, with the log's final line recordingLATENT_SIDE_GUARD_EXIT: 0.VideoAutoencoderKLWrapper.decodeonissue_101HEAD appliessuper().decode_(latent).squeeze(2)followed byif x.ndim == 4: x = x.unsqueeze(0), producing[1, B, C, H, W]forB>1, T_dec==1per GTP-5; pre-fixgit diff origin/issue_101...HEAD -- comfy/ldm/seedvr/vae.pyis empty (no fix applied yet)vae_diff.patchis non-empty AND contains zero+/-lines matching the seven-token latent-side guard set; the modification is restricted to lines within the output-side block (per the GTP-12 line-range observation2255-2294)VideoAutoencoderKLWrapper.decodeincomfy/ldm/seedvr/vae.pymakes batch and time axes distinct for every(B, T_orig)cell includingB>1, T_orig==1while leaving the latent-side prep block byte-identical to base, by landing the source change 
onpollockjj/ComfyUI:issue_189, adding a six-test regression module that fails pre-fix on theB>1, T_orig==1cells, re-running the same probe to record corrected shapes, and emitting a before/after delta artifact whoseshape_mismatch_count_newnamed-key is0whileshape_mismatch_count_prioris2."AC-2: File
tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.pyonpollockjj/ComfyUI:issue_189HEAD defines exactly the six functionstest_decode_b1_t1_shape_and_ordering_correct,test_decode_b1_t5_video_shape_unchanged,test_decode_b2_t1_fixes_batch_time_axes,test_decode_b4_t1_fixes_batch_time_axes,test_decode_b2_t3_multi_frame_batch_unchanged, andtest_decode_b2_t1_stacked_equals_individual_per_sample_ordering. Each per-cell test (a) setscomfy.cli_args.args.cpu = Truebefore importing anycomfy.ldm.*symbol whentorch.cuda.is_available()is False (matching the pattern attests-unit/comfy_test/test_seedvr_groupnorm_limit.py:48-49per GTP-8), (b) constructs a wrapper instance viaVideoAutoencoderKLWrapper.__new__(VideoAutoencoderKLWrapper)+nn.Module.__init__(instance)withtiled_args = {"enable_tiling": False},original_image_video = torch.zeros(B, 3, T_orig, 16, 16),img_dims = (16, 16)per the__init__-bypass pattern in §Diagnosis Summary / Reproduction (per GTP-4), (c) patchesVideoAutoencoderKL.decode_with a fingerprint-tagged stub returning[B, 3, T_dec, 16, 16]filled withfloat(b + 1)per batch index, (d) patchescomfy.ldm.seedvr.vae.lab_color_transferwith a passthrough that returns its first argument, (e) callswrapper.decode(torch.zeros(B, 16*T_orig, 2, 2)), and (f) assertstuple(out.shape) == (1, 3, B*T_orig, 16, 16)for the per-cellBandT_orig. Thetest_decode_b2_t1_fixes_batch_time_axestest additionally assertsout[0, 0, 0, 0, 0].item() == 1.0ANDout[0, 0, 1, 0, 0].item() == 2.0(per-sample fingerprint at the correct batch position). Thetest_decode_b4_t1_fixes_batch_time_axestest additionally asserts[out[0, 0, b, 0, 0].item() for b in range(4)] == [1.0, 2.0, 3.0, 4.0]. 
Thetest_decode_b2_t1_stacked_equals_individual_per_sample_orderingtest invokeswrapper.decode(torch.zeros(2, 16, 2, 2))(stacked, withoriginal_image_videoset totorch.zeros(2, 3, 1, 16, 16)) producingout_stacked, then resets the wrapper'soriginal_image_videototorch.zeros(1, 3, 1, 16, 16)and invokeswrapper.decode(torch.zeros(1, 16, 2, 2))twice with the per-call fingerprint stub returningfloat(b + 1)for the stacked indicesb ∈ {0, 1}(the second call's stub pinned to2.0), then assertstorch.equal(out_stacked[0, :, 0, :, :], out_individual_0[0, :, 0, :, :])ANDtorch.equal(out_stacked[0, :, 1, :, :], out_individual_1[0, :, 0, :, :]). All six tests pass undercd /home/johnj/dev_cuda_1/ComfyUI && python -m pytest -q tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.pywith overall exit 0 and the summary line containing6 passed— verified by (a) committed artifactgithub_issues/189/slice2/test_module_internals.logcapturing seven greps over/home/johnj/dev_cuda_1/ComfyUI/tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.pyagainst the issue_189 HEAD source: (i)grep -cE '^def (test_decode_b1_t1_shape_and_ordering_correct\|test_decode_b1_t5_video_shape_unchanged\|test_decode_b2_t1_fixes_batch_time_axes\|test_decode_b4_t1_fixes_batch_time_axes\|test_decode_b2_t3_multi_frame_batch_unchanged\|test_decode_b2_t1_stacked_equals_individual_per_sample_ordering)\(' returning the literal6; (ii)grep -cE '^def test_'returning the literal6(proves no extra test functions); (iii)grep -nE 'tuple(out.shape) == (1, 3, 1, 16, 16)|tuple(out.shape) == (1, 3, 5, 16, 16)|tuple(out.shape) == (1, 3, 2, 16, 16)|tuple(out.shape) == (1, 3, 4, 16, 16)|tuple(out.shape) == (1, 3, 6, 16, 16)'returning at least five lines (one per per-cell shape assertion); (iv)grep -nE 'out[0, 0, 0, 0, 0].item() == 1.0|out[0, 0, 1, 0, 0].item() == 2.0'returning at least two matching lines (the b2_t1 fingerprint asserts); (v)grep -nE '[out[0, 0, b, 0, 0].item() for b in range(4)] == [1.0, 2.0, 3.0, 4.0]'returning at least 
one matching line (the b4_t1 fingerprint assert); (vi)grep -nE 'torch.equal(out_stacked[0, :, 0, :, :], out_individual_0[0, :, 0, :, :])|torch.equal(out_stacked[0, :, 1, :, :], out_individual_1[0, :, 0, :, :])'returning at least two matching lines (the stacked-vs-individual asserts); (vii)grep -nE 'args.cpu = True|VideoAutoencoderKLWrapper.new|nn.Module.init|patch.object(.*decode_|patch.object(.*lab_color_transfer'returning at least one match per fixture-pattern line (cpu flag,newbypass,nn.Module.initbypass,decode_patch,lab_color_transferpatch), with the log's final line recordingTEST_INTERNALS_EXIT: 0; **(b)** committed artifactgithub_issues/189/slice2/pytest_decode_batch_axes.logcapturing stdout+stderr ofcd /home/johnj/dev_cuda_1/ComfyUI && python -m pytest -q tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.pyincluding the6 passedsummary line and the verbatim final-linePYTEST_DECODE_BATCH_AXES_EXIT: 0`.test_decode_b2_t1_fixes_batch_time_axesandtest_decode_b4_t1_fixes_batch_time_axestests againstissue_101HEAD (no fix) causes both to FAIL withAssertionError: tuple(out.shape) == (1, 3, 2, 16, 16)(got(1, 2, 2, 16, 16)) andtuple(out.shape) == (1, 3, 4, 16, 16)(got(1, 4, 3, 16, 16)) respectively, per the live shapes observed in GTP-5; thetest_decode_b2_t1_stacked_equals_individual_per_sample_orderingtest fails on thetorch.equalassertion because the stacked tensor's batch axis has been swapped with channels; pytest summary contains at minimum3 failed6 passed; per-celltuple(out.shape)equals(1, 3, B*T_orig, 16, 16)for every cell; per-sample fingerprint atout[0, 0, b, 0, 0]equalsfloat(b + 1)for theB>1, T_orig=1cells;torch.equal(out_stacked[0, :, b, :, :], out_individual_b[0, :, 0, :, :])holds forb ∈ {0, 1}VideoAutoencoderKLWrapper.decodeincomfy/ldm/seedvr/vae.pymakes batch and time axes distinct for every(B, T_orig)cell includingB>1, T_orig==1while leaving the latent-side prep block byte-identical to base, by landing the source change onpollockjj/ComfyUI:issue_189, 
adding a six-test regression module that fails pre-fix on theB>1, T_orig==1cells, re-running the same probe to record corrected shapes, and emitting a before/after delta artifact whoseshape_mismatch_count_newnamed-key is0whileshape_mismatch_count_prioris2."AC-3: Existing SeedVR2 unit-test modules
tests-unit/comfy_test/test_seedvr_groupnorm_limit.py,tests-unit/comfy_test/test_seedvr_vae_tiled_args_no_mutate.py, andtests-unit/comfy_test/test_seedvr_rope_delegation.py(per GTP-7) all pass undercd /home/johnj/dev_cuda_1/ComfyUI && python -m pytest -q tests-unit/comfy_test/test_seedvr_groupnorm_limit.py tests-unit/comfy_test/test_seedvr_vae_tiled_args_no_mutate.py tests-unit/comfy_test/test_seedvr_rope_delegation.pyonpollockjj/ComfyUI:issue_189HEAD with overall exit 0 and the summary line containingpassed(nofailed, noerrors) — verified by committed artifactgithub_issues/189/slice2/pytest_seedvr_regression_guard.logcapturing stdout+stderr of that pytest invocation, with the log's penultimate or final line matching the regex^[0-9]+ passed( in [0-9.]+s)?$and zero occurrences of the substringfailedorerrorsoutside the verbatim summary line context.B>1, T_orig==1decode path, so they all pass onissue_101HEAD; this AC asserts they continue to pass post-fix (regression guard against the output-side block edit unintentionally breaking another path)passedcount and zerofailed/errors; the existing tests are unaffected by the output-side block change;code-behavior-equivalence(perequivalence_methods.md) holds for the existing-test surfaceVideoAutoencoderKLWrapper.decodeincomfy/ldm/seedvr/vae.pymakes batch and time axes distinct for every(B, T_orig)cell includingB>1, T_orig==1while leaving the latent-side prep block byte-identical to base, by landing the source change onpollockjj/ComfyUI:issue_189, adding a six-test regression module that fails pre-fix on theB>1, T_orig==1cells, re-running the same probe to record corrected shapes, and emitting a before/after delta artifact whoseshape_mismatch_count_newnamed-key is0whileshape_mismatch_count_prioris2."AC-4: Probe is invoked as
cd /home/johnj/dev_cuda_1/ComfyUI && python3 /home/johnj/dev_cuda_1/mydevelopment/github_issues/189/probe_seedvr_decode_batch_axes.py --out /home/johnj/dev_cuda_1/mydevelopment/github_issues/189/slice2/decode_batch_axes_post_fix.json --bt-cells "1,1;1,5;2,1;4,1;2,3" --label post_fixagainstpollockjj/ComfyUI:issue_189HEAD (post-fix from AC-1) and the resultinggithub_issues/189/slice2/decode_batch_axes_post_fix.jsonis committed onpollockjj/mydevelopment:issue_189with the per-cellout_shapenamed-key values: cellB=1, T_orig=1→[1, 3, 1, 16, 16]; cellB=1, T_orig=5→[1, 3, 5, 16, 16]; cellB=2, T_orig=1→[1, 3, 2, 16, 16]; cellB=4, T_orig=1→[1, 3, 4, 16, 16]; cellB=2, T_orig=3→[1, 3, 6, 16, 16]. Every cell'schannel_axis_valueequals3andtime_axis_valueequalsB*T_orig. Thelabelvalue is"post_fix". Thecomfyui_branchvalue is"issue_189". Thevae_blob_shavalue differs from the Slice 1 baselinevae_blob_shaofb3f0f5d719862ae119ebc037ab104bca7785ce71. Verified by committed artifactgithub_issues/189/slice2/decode_batch_axes_post_fix.jsonparsed and key-checked, plus committed artifactgithub_issues/189/slice2/probe_run.log.issue_101HEAD (Slice 1 AC-2 outcome) records cellB=2, T_orig=1out_shape=[1, 2, 2, 16, 16]and cellB=4, T_orig=1out_shape=[1, 4, 3, 16, 16]— both differ from the(1, 3, B*T_orig, 16, 16)post-fix expectation per the live observations in GTP-5out_shapeequals(1, 3, B*T_orig, 16, 16); every cell'schannel_axis_value == 3; every cell'stime_axis_value == B*T_orig; the workingB=1andB>1, T_orig>1cells remain bit-identical to baseline (out_shapeunchanged for cellsB=1, T_orig=1,B=1, T_orig=5,B=2, T_orig=3)VideoAutoencoderKLWrapper.decodeincomfy/ldm/seedvr/vae.pymakes batch and time axes distinct for every(B, T_orig)cell includingB>1, T_orig==1while leaving the latent-side prep block byte-identical to base, by landing the source change onpollockjj/ComfyUI:issue_189, adding a six-test regression module that fails pre-fix on theB>1, T_orig==1cells, re-running the same probe to record 
corrected shapes, and emitting a before/after delta artifact whoseshape_mismatch_count_newnamed-key is0whileshape_mismatch_count_prioris2."AC-5: File
github_issues/189/slice2/before_after_comparison.jsonis committed onpollockjj/mydevelopment:issue_189containing exactly the named-key set{prior_artifact_path, new_artifact_path, prior_label, new_label, shape_mismatch_count_prior, shape_mismatch_count_new, per_cell_delta}with valuesprior_artifact_path="github_issues/189/slice1/decode_batch_axes_baseline.json",new_artifact_path="github_issues/189/slice2/decode_batch_axes_post_fix.json",prior_label="baseline",new_label="post_fix",shape_mismatch_count_prior=2(cellsB=2, T_orig=1andB=4, T_orig=1),shape_mismatch_count_new=0, andper_cell_deltacontaining exactly five entries (one per cell) with each entry containing the named-key set{B, T_orig, prior_out_shape, new_out_shape, expected_out_shape, prior_pass, new_pass}and the per-cell values listed in AC-2 / AC-4 — verified by committed artifact at the named path parsed and key-checked.shape_mismatch_count_newcannot be0because no post-fix run has been performedshape_mismatch_count_prior == 2ANDshape_mismatch_count_new == 0AND everyper_cell_deltaentry hasnew_pass=true; the comparison JSON exists at the named path on the issue branchVideoAutoencoderKLWrapper.decodeincomfy/ldm/seedvr/vae.pymakes batch and time axes distinct for every(B, T_orig)cell includingB>1, T_orig==1while leaving the latent-side prep block byte-identical to base, by landing the source change onpollockjj/ComfyUI:issue_189, adding a six-test regression module that fails pre-fix on theB>1, T_orig==1cells, re-running the same probe to record corrected shapes, and emitting a before/after delta artifact whoseshape_mismatch_count_newnamed-key is0whileshape_mismatch_count_prioris2."Slice 3: Hygiene Validator Gate (Ruff Lint + Production-Code Shape Independence)
Kind: hygiene
Objective: Prove the Slice 2 commits leave comfy/ldm/seedvr/vae.py and tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.py ruff-clean and free of any observability detuning (no added print / logging.debug / sentinel comments) that could land as production-code noise alongside the output-side fix.
Acceptance Criteria
AC-1:
cd /home/johnj/dev_cuda_1/ComfyUI && python -m ruff check comfy/ldm/seedvr/vae.py tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.py exits 0 against pollockjj/ComfyUI:issue_189 HEAD. Verified by committed validator-output artifact github_issues/189/slice3/ruff.log capturing stdout+stderr of that invocation and ending with the All checks passed! line, with shell exit code recorded as 0 in the log's final line RUFF_EXIT: 0.
"Prove the Slice 2 commits leave comfy/ldm/seedvr/vae.py and tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.py ruff-clean and free of any observability detuning (no added print / logging.debug / sentinel comments) that could land as production-code noise alongside the output-side fix."
AC-2: Slice 2 introduced zero observability detuning (
print,logging.debug,logging.info, sentinel comments, debug markers) incomfy/ldm/seedvr/vae.pyand intests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.py— verified by committed validator-output artifactgithub_issues/189/slice3/no_detuning.logcontaining the combined output of two greps: (a)grep -nE '^[+][[:space:]]*(print\(\|logging\.debug\|logging\.info\|# TODO\s*(?:DEBUG\|MARKER\|SENTINEL))' /home/johnj/dev_cuda_1/mydevelopment/github_issues/189/slice2/vae_diff.patchreturning no matches, AND (b)grep -nE '\bprint\(\|logging\.debug\|logging\.info\|# TODO\s*(?:DEBUG\|MARKER\|SENTINEL)' /home/johnj/dev_cuda_1/ComfyUI/tests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.pyreturning no matches, with the log's final line recordingNO_DETUNING_EXIT: 0.comfy/ldm/seedvr/vae.pyandtests-unit/comfy_test/test_seedvr_vae_decode_batch_axes.pyruff-clean and free of any observability detuning (no addedprint/logging.debug/ sentinel comments) that could land as production-code noise alongside the output-side fix."Slice 4: Post-Fix Provenance Decision Packet
Kind: decision-packet
Objective: Produce the post-fix provenance decision packet binding the issue branch HEADs, the canonical-reference URLs, and the verbatim invocation commands used by Slices 2 and 3 into a single architect-readable record.
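A packet of this kind is a flat file of `key: value` lines, so the "exactly these keys" check in the acceptance criteria below is easy to model. The following is a minimal sketch under stated assumptions: `parse_packet` and `check_packet` are hypothetical helper names, the key list is copied from the Slice 4 acceptance criterion, and the sample packet uses placeholder values rather than real SHAs or commands.

```python
import re

# Key set copied from the Slice 4 acceptance criterion.
REQUIRED_KEYS = [
    "decision_summary", "recommended_action", "mydevelopment_head_sha",
    "comfyui_head_sha", "comfyui_branch", "comfyui_base_sha",
    "comfyui_base_branch", "coderabbit_url", "upstream_pr_url",
    "parent_findings_comment_url", "probe_command", "probe_baseline_path",
    "probe_post_fix_path", "comparison_path", "regression_test_command",
    "seedvr_regression_guard_command", "ruff_command",
]

def parse_packet(text):
    # Collect lines shaped like "key: value", mirroring grep '^<key>: '.
    found = {}
    for line in text.splitlines():
        match = re.match(r"^([a-z_]+): (.*)$", line)
        if match:
            found[match.group(1)] = match.group(2)
    return found

def check_packet(text, required=REQUIRED_KEYS):
    # Return (missing, extra) so both halves of "exactly" are visible.
    found = parse_packet(text)
    missing = [k for k in required if k not in found]
    extra = [k for k in found if k not in required]
    return missing, extra

# Placeholder packet: every value is illustrative, not a real artifact value.
sample = "\n".join(f"{key}: <placeholder>" for key in REQUIRED_KEYS)
print(check_packet(sample))

# Dropping one line shows up as a missing key.
truncated = "\n".join(sample.splitlines()[:-1])
print(check_packet(truncated))
```

Returning the missing and extra keys separately, rather than a single boolean, keeps the failure readable when a packet drifts from the contract.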
Acceptance Criteria
github_issues/189/slice4/provenance.md is committed on pollockjj/mydevelopment:issue_189 containing exactly the named-line keys decision_summary:, recommended_action:, mydevelopment_head_sha:, comfyui_head_sha:, comfyui_branch:, comfyui_base_sha:, comfyui_base_branch:, coderabbit_url:, upstream_pr_url:, parent_findings_comment_url:, probe_command:, probe_baseline_path:, probe_post_fix_path:, comparison_path:, regression_test_command:, seedvr_regression_guard_command:, ruff_command:, with decision_summary value post_fix_b_gt_1_t_eq_1_decode_axes_distinct_restored, recommended_action value invoke_/pr_from_issue_189_to_issue_101, comfyui_base_sha value 4e8836ed0a53467e9c433d58320c4992c9c34d2d, comfyui_base_branch value issue_101, comfyui_branch value issue_189, coderabbit_url value https://github.com/Comfy-Org/ComfyUI/pull/11294#discussion_r2959796352, upstream_pr_url value https://github.com/Comfy-Org/ComfyUI/pull/11294, parent_findings_comment_url value https://github.com/pollockjj/mydevelopment/issues/101#issuecomment-4305426643, probe_command value matching the Slice 2 AC-4 command verbatim, probe_baseline_path value github_issues/189/slice1/decode_batch_axes_baseline.json, probe_post_fix_path value github_issues/189/slice2/decode_batch_axes_post_fix.json, comparison_path value github_issues/189/slice2/before_after_comparison.json, regression_test_command value matching the Slice 2 AC-2 pytest invocation verbatim, seedvr_regression_guard_command value matching the Slice 2 AC-3 pytest invocation verbatim, ruff_command value matching the Slice 3 AC-1 ruff invocation verbatim. Verified by committed artifact at the named path and grep '^<key>: ' parseability against the listed key set.
Constraints
python3(resolved by the slicer's environment to the deliverable venv at/home/johnj/dev_cuda_1/ComfyUI/.venv/bin/pythonfor ComfyUI-touching commands; no AC hardcodes an absolute interpreter path).debug/run_debug.py(not exercised — this plan is CPU-only static-source plus pytest; no ComfyUI launch).--use-process-isolation --disable-cuda-malloc(not exercised — see above).pkill, norm -rf, nopython main.py.issue_189in bothpollockjj/mydevelopmentandpollockjj/ComfyUI, not on base branchesmain/issue_101.pytest 9.0.3in deliverable venv — REUSE; cited fromtools_register.md#pytestand start flag to not open webui at launch Comfy-Org/ComfyUI#187 plan TPM Inventory.ruff 0.15.9system binary — REUSE; ComfyUI CI invocation pattern ispython -m ruff check .per.github/workflows/ruff.yml.torch 2.11.0+cu130,einops 0.8.2in deliverable venv — REUSE.comfy.ldm.seedvr.vae.VideoAutoencoderKLWrapperandVideoAutoencoderKLin deliverable source — REUSE; constructable on CPU via__new__+nn.Module.__init__per GTP-4.comfy.ldm.seedvr.vae.VideoAutoencoderKL.decode_— REUSE (mocked in tests viaunittest.mock.patch.object).comfy.ldm.seedvr.vae.lab_color_transfer— REUSE (mocked in tests viaunittest.mock.patch.object).comfy.ldm.seedvr.vae.rearrange— REUSE (called insidedecode(); not patched).unittest.mock.patch.object(stdlib) — REUSE for measurement instrumentation.nn.Module.__init__(torch) — REUSE for the__init__-bypass test fixture pattern.ghCLI — REUSE for issue body update viascripts/run_tdd_post.py update-issue(PR creation is the/prskill's responsibility, not this plan).equivalence_methods.md#tensor-numerical-equivalence).torch.equal(binary; per the issue body's Acceptance Intent verbatim quote in the Plan Foundations comment R&M section).git diffartifact + regression test that fails pre-fix and passes post-fix (perequivalence_methods.md#code-behavior-equivalence) + Slice 1 baseline probe + Slice 2 post-fix probe + comparison delta JSON.vae_diff.patchagainst the seven-token guard set (zero 
matching+/-lines required).test_seedvr_groupnorm_limit.py,test_seedvr_vae_tiled_args_no_mutate.py,test_seedvr_rope_delegation.py) withcode-behavior-equivalencezero-new-failures tolerance.## Created Surface Contract;grep '^<key>:'parseable for key existence;python -m json.toolparseable for structural validity.git stashand without external decision deferral..gitignorerules for safe generated bulk, or discarded generated debris.git status --shortgates locally and must refuse to submit while any touched repo is dirty.repo_hygiene.md,repo_hygiene.txt, rawgit status --shorttranscripts, orCLEANmarkers as QA evidence; this plan deliberately omits any hygiene-evidence AC. QA verifies submitted commits, branch refs, and artifact presence through GitHub rather than trusting slicer-authored cleanliness transcripts.Out of Scope
comfy/ldm/seedvr/vae.pylines 2245-2253 (b, tc, h, w = z.shape; latent = z.view(b, 16, -1, h, w); ...). Per the issue body's Risks / Stop Conditions block, "Any code change to the latent-side prep block (lines 2245-2253) indicates plan drift — stop and surface." This block was already corrected upstream by commitaf6c5d6and is asserted byte-identical to base by Slice 2 AC-1.Comfy-Org/ComfyUI#11294. Per the issue body's Non-Goals block and Risks block ("Upstream PR feat: Add support For SeedVR2 (CORE-6) Comfy-Org/ComfyUI#11294 ismergeable=DIRTYand silent for two weeks; do not attempt to push fixes back toyousef-rafat:seedvr2from this issue."), upstream propagation is a separate later decision.#101Leg-1 sibling defects:#187GroupNorm memory limit (closed),#188frame count,#190/#192forward return contracts,#191latent metadata,#193cache clearing,#194decode state guards. Each owns its own scope and its own issue branch.debug/run_debug.pylaunches). The plan is CPU-only and CPU-deterministic by construction.numz/ComfyUI-SeedVR2_VideoUpscaler,ComfyUI-SeedVR2-Canonical, etc.). Only the in-treecomfy/ldm/seedvr/vae.pycarries this output-side defect.debug/run_debug.py,debug/reset_gpu.py,debug/harness_runtime.py, etc.). The plan exercises no harness path./prand/pr-reviewskills' responsibilities; this plan only lands committed work on issue branches and exits.if self.enable_tiling:branch indecode()). The bug only manifests in the non-tiled path (super().decode_(latent).squeeze(2)). Tiled-decode is covered by the existingtest_seedvr_vae_tiled_args_no_mutate.pyregression guard cited in Slice 2 AC-3.VideoAutoencoderKLWrapper.encode). The bug is in the decode output-side block only.Tracked in pollockjj/mydevelopment Comfy-Org#189 (bookkeeping PR pollockjj/mydevelopment#197).
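For readers who want to see why only the non-tiled `.squeeze(2)` path is affected, the axis mix-up can be modelled with plain shape tuples. This is a rough sketch, not the code in comfy/ldm/seedvr/vae.py: it models only the squeeze(2) / `ndim == 4` unsqueeze(0) step and the `b c t h w -> (b t) c h w` flattening described in this plan, the helper names are hypothetical, and it does not reproduce the later post-processing that yields the final probe shapes.

```python
def squeeze(shape, dim):
    # Models Tensor.squeeze(dim): the axis is dropped only when its size is 1.
    return shape[:dim] + shape[dim + 1:] if shape[dim] == 1 else shape

def unsqueeze(shape, dim):
    # Models Tensor.unsqueeze(dim): inserts a size-1 axis at position dim.
    return shape[:dim] + (1,) + shape[dim:]

def pre_fix_shape(b, c, t_dec, h, w):
    # Pre-fix behaviour as described in the plan: squeeze the time axis,
    # then re-add a leading axis whenever the result is 4D.
    x = squeeze((b, c, t_dec, h, w), 2)
    if len(x) == 4:
        x = unsqueeze(x, 0)
    return x

def flatten_bt(shape):
    # Shape effect of rearrange("b c t h w -> (b t) c h w").
    b, c, t, h, w = shape
    return (b * t, c, h, w)

# B=4, T_dec=1: batch lands on the channel axis and channels land on time.
print(pre_fix_shape(4, 3, 1, 16, 16))              # (1, 4, 3, 16, 16)
print(flatten_bt(pre_fix_shape(4, 3, 1, 16, 16)))  # (3, 4, 16, 16)

# Keeping the tensor 5D lets the same flattening resolve axes correctly.
print(flatten_bt((4, 3, 1, 16, 16)))               # (4, 3, 16, 16)

# With T_dec > 1 the squeeze is a no-op, so the multi-frame path was unaffected.
print(pre_fix_shape(2, 3, 3, 16, 16))              # (2, 3, 3, 16, 16)
```

The model makes the failure condition visible: the heuristic only misfires when the squeezed axis really has size 1 and B is greater than 1, which matches the B>1, T_orig==1 cells singled out above.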