fix: while/do-while loop condition reads stale iteration-0 step output by mnriem · Pull Request #2662 · github/spec-kit

mnriem · 2026-05-21T14:15:04Z

Summary

Fixes #2592 — while/do-while loop conditions always read the iteration-0 step output, ignoring updates from subsequent iterations.

Root cause

The engine namespaces nested step IDs per iteration (e.g., my-loop:my-gate:1) to preserve per-iteration history. However, the condition resolver (evaluate_condition → _resolve_dot_path) looks up step output by the original unprefixed key (steps.my-gate), which is only written during iteration 0 and never updated afterward.

Fix

Keep namespaced step IDs for execution (preserving unique log entries, state keys, and on_step_start callback IDs per iteration), but execute each body step individually and immediately alias its result back to the unprefixed key after it completes successfully. This ensures:

evaluate_condition() sees the latest iteration's output
Inter-step references within multi-step loop bodies work correctly (step B sees step A's output from the current iteration, not a stale previous one)
Namespaced history entries are preserved for debugging (parent:child:1, parent:child:2, etc.)
Pause/fail is checked after each step — aliasing only occurs for successfully completed steps
Fully backward-compatible — same namespaced key format, same log IDs, no changes to condition syntax or workflow YAML

Changes

src/specify_cli/workflows/engine.py — Execute loop body steps one at a time with namespaced IDs, aliasing each result back to the unprefixed key immediately after successful completion
tests/test_workflows.py — 5 new engine-level regression tests in TestWorkflowEngine:
- test_while_loop_condition_reads_latest_iteration (positive: loop stops early)
- test_do_while_loop_condition_reads_latest_iteration (positive: loop stops early)
- test_while_loop_runs_to_max_when_condition_stays_true (negative: loop exhausts max_iterations)
- test_do_while_loop_runs_to_max_when_condition_stays_true (negative: loop exhausts max_iterations)
- test_while_loop_multi_step_body_inter_step_refs (multi-step body: step B sees step A's current-iteration output)

Testing

All 2998 existing + new tests pass (2998 passed, 1 skipped, 0 failures).

After executing namespaced loop body steps, copy each iteration's results back to the original unprefixed step key so that evaluate_condition() sees the latest values instead of stale iteration-0 data. Fixes github#2592

Copilot

Pull request overview

Fixes a workflow-engine bug where while / do-while loop conditions kept reading stale iteration-0 nested-step outputs by copying the latest namespaced nested-step results back onto the original (unprefixed) step IDs after each loop iteration.

Changes:

Update loop execution in the workflow engine to map namespaced nested-step IDs back to original IDs so condition evaluation sees the latest outputs.
Add regression tests covering early loop termination and max-iteration exhaustion for both while and do-while.

Show a summary per file

File	Description
`src/specify_cli/workflows/engine.py`	Copies namespaced loop-iteration nested-step results back to the original step IDs so conditions read the latest values.
`tests/test_workflows.py`	Adds regression tests ensuring loop conditions observe updated nested-step outputs across iterations.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 5

- Rewrite shell scripts in tests to use Python via script files instead of POSIX syntax, so they pass on Windows CI. - Snapshot iteration-0 nested-step results under a namespaced key (parent:child:0) before the first copy-back overwrite, preserving complete per-iteration history for debugging.

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 1

Move the status check before the copy-back so that partial results from paused or failed nested steps (e.g., a gate awaiting input) do not overwrite the unprefixed key. This preserves correct resume behavior.

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 4

Quote both the Python executable and script file paths in the run: commands to handle spaces in paths on Windows.

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 1

Instead of namespacing step IDs for execution and copying results back, execute the loop body with original (unprefixed) step IDs so results naturally land at the right keys. Snapshot previous iteration results to namespaced keys (parent:child:N) for history only. This fixes multi-step loop bodies where step B references step A's output within the same iteration — previously step B would see stale data until the copy-back ran after the entire iteration.

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 2

Revert to namespaced step IDs for execution (preserving unique log entries and state keys per iteration) but copy each step's result back to the unprefixed key immediately after it completes. This preserves backward compatibility (same namespaced key format, same log IDs) while fixing both the condition evaluation bug and inter-step references within multi-step loop bodies.

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 2

doquanghuy

Thanks @mnriem! I checked out the branch locally and ran the new regression tests — all 4 pass. The fix logic is sound: per-step aliasing of the namespaced ID back to the unprefixed key cleanly handles the condition-resolver gap I reported in #2592.

A few observations for consideration, none are blockers:

1. Test placement (worth fixing)

The 4 new tests are inserted into the TestWorkflowValidation class (at lines 1892, 1951, 1996, 2047 — TestWorkflowValidation spans lines 1208 → ~2099 in this PR's view), but they exercise engine execution behaviour, not validation. Future maintainers grepping for loop tests in TestWhileStep (line 910) or TestDoWhileStep (line 965) won't find them, and the file's validation-vs-execution boundary gets muddied.

A small move to those existing classes (or a new TestLoopIterationOutput adjacent to them) keeps the file organisation clean.

2. Coverage of a claim in the PR description

The PR description states:

Inter-step references within multi-step loop bodies work correctly (step B sees step A's output from the current iteration, not a stale previous one)

The logic supports this — the per-step for ns in result.next_steps aliases after each step before the next runs — but no test exercises a multi-step body. A small steps: [stepA, stepB] test where stepB reads {{ steps.stepA.output.stdout }} would lock that contract explicitly.

3. The canonical bug-report scenario goes untested

My issue's reproducer used a gate step inside the loop (pause-resume across iterations). The PR's tests use only shell steps, so the pause-then-resume path through the new aliasing logic isn't exercised. Not a regression risk (resume re-runs the loop from scratch, so any aliased PAUSED-state value gets overwritten on the next iter-0), but the canonical use case from the bug report goes uncovered. A monkey-patched gate test would lock that no surprises lurk in the pause-then-alias ordering.

4. Minor: PR description vs. code

PR description:

Pause/fail is checked after each step — no aliasing of partial results

But the code does alias before checking status:

self._execute_steps([ns_copy], context, state, registry, step_offset=-1)
if orig and ns_copy["id"] in context.steps:
    context.steps[orig] = context.steps[ns_copy["id"]]
    state.step_results[orig] = context.steps[ns_copy["id"]]
if state.status in (RunStatus.PAUSED, RunStatus.FAILED, RunStatus.ABORTED):
    return

This is fine in practice — _execute_steps writes context.steps[step_id] at engine.py:606 before the pause check at :619, so the alias picks up a valid (if PAUSED-state) result, and resume re-runs the loop from scratch anyway. But the description implies a stricter guarantee than the code. Either tighten the wording or guard the alias with a status check (if state.status not in (PAUSED, FAILED, ABORTED):) — both reach the same end state.

Otherwise, the fix is exactly what #2592 asked for. Thanks for jumping on it!

AI disclosure: drafted with Claude Opus, human-reviewed.

@doquanghuy

- Move per-step aliasing below the PAUSED/FAILED/ABORTED status check so partial results from incomplete steps are not aliased back to the unprefixed key. - Add test_while_loop_multi_step_body_inter_step_refs to exercise a multi-step loop body where step B reads step A's output within the same iteration, verifying per-step aliasing works correctly. Addresses feedback from @doquanghuy (items 2 & 4) and Copilot review on commit 9d0a222.

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 2

- Use enumerate() for stable fallback IDs when loop body steps lack an explicit id (step-0, step-1, etc. instead of always step-0). - Rewrite multi-step body test so step B uses expression substitution ({{ steps.step-a.output.stdout }}) instead of reading the counter file directly, making it a true regression test for per-step aliasing.

Copilot