fix: while/do-while loop condition reads stale iteration-0 step output#2662
Conversation
After executing namespaced loop body steps, copy each iteration's results back to the original unprefixed step key so that evaluate_condition() sees the latest values instead of stale iteration-0 data. Fixes github#2592
There was a problem hiding this comment.
Pull request overview
Fixes a workflow-engine bug where while / do-while loop conditions kept reading stale iteration-0 nested-step outputs by copying the latest namespaced nested-step results back onto the original (unprefixed) step IDs after each loop iteration.
Changes:
- Update loop execution in the workflow engine to map namespaced nested-step IDs back to original IDs so condition evaluation sees the latest outputs.
- Add regression tests covering early loop termination and max-iteration exhaustion for both
whileanddo-while.
Show a summary per file
| File | Description |
|---|---|
src/specify_cli/workflows/engine.py |
Copies namespaced loop-iteration nested-step results back to the original step IDs so conditions read the latest values. |
tests/test_workflows.py |
Adds regression tests ensuring loop conditions observe updated nested-step outputs across iterations. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 5
- Rewrite shell scripts in tests to use Python via script files instead of POSIX syntax, so they pass on Windows CI. - Snapshot iteration-0 nested-step results under a namespaced key (parent:child:0) before the first copy-back overwrite, preserving complete per-iteration history for debugging.
Move the status check before the copy-back so that partial results from paused or failed nested steps (e.g., a gate awaiting input) do not overwrite the unprefixed key. This preserves correct resume behavior.
Quote both the Python executable and script file paths in the run: commands to handle spaces in paths on Windows.
Instead of namespacing step IDs for execution and copying results back, execute the loop body with original (unprefixed) step IDs so results naturally land at the right keys. Snapshot previous iteration results to namespaced keys (parent:child:N) for history only. This fixes multi-step loop bodies where step B references step A's output within the same iteration — previously step B would see stale data until the copy-back ran after the entire iteration.
Revert to namespaced step IDs for execution (preserving unique log entries and state keys per iteration) but copy each step's result back to the unprefixed key immediately after it completes. This preserves backward compatibility (same namespaced key format, same log IDs) while fixing both the condition evaluation bug and inter-step references within multi-step loop bodies.
doquanghuy
left a comment
There was a problem hiding this comment.
Thanks @mnriem! I checked out the branch locally and ran the new regression tests — all 4 pass. The fix logic is sound: per-step aliasing of the namespaced ID back to the unprefixed key cleanly handles the condition-resolver gap I reported in #2592.
A few observations for consideration, none are blockers:
1. Test placement (worth fixing)
The 4 new tests are inserted into the TestWorkflowValidation class (at lines 1892, 1951, 1996, 2047 — TestWorkflowValidation spans lines 1208 → ~2099 in this PR's view), but they exercise engine execution behaviour, not validation. Future maintainers grepping for loop tests in TestWhileStep (line 910) or TestDoWhileStep (line 965) won't find them, and the file's validation-vs-execution boundary gets muddied.
A small move to those existing classes (or a new TestLoopIterationOutput adjacent to them) keeps the file organisation clean.
2. Coverage of a claim in the PR description
The PR description states:
Inter-step references within multi-step loop bodies work correctly (step B sees step A's output from the current iteration, not a stale previous one)
The logic supports this — the per-step for ns in result.next_steps aliases after each step before the next runs — but no test exercises a multi-step body. A small steps: [stepA, stepB] test where stepB reads {{ steps.stepA.output.stdout }} would lock that contract explicitly.
3. The canonical bug-report scenario goes untested
My issue's reproducer used a gate step inside the loop (pause-resume across iterations). The PR's tests use only shell steps, so the pause-then-resume path through the new aliasing logic isn't exercised. Not a regression risk (resume re-runs the loop from scratch, so any aliased PAUSED-state value gets overwritten on the next iter-0), but the canonical use case from the bug report goes uncovered. A monkey-patched gate test would lock that no surprises lurk in the pause-then-alias ordering.
4. Minor: PR description vs. code
PR description:
Pause/fail is checked after each step — no aliasing of partial results
But the code does alias before checking status:
self._execute_steps([ns_copy], context, state, registry, step_offset=-1)
if orig and ns_copy["id"] in context.steps:
context.steps[orig] = context.steps[ns_copy["id"]]
state.step_results[orig] = context.steps[ns_copy["id"]]
if state.status in (RunStatus.PAUSED, RunStatus.FAILED, RunStatus.ABORTED):
returnThis is fine in practice — _execute_steps writes context.steps[step_id] at engine.py:606 before the pause check at :619, so the alias picks up a valid (if PAUSED-state) result, and resume re-runs the loop from scratch anyway. But the description implies a stricter guarantee than the code. Either tighten the wording or guard the alias with a status check (if state.status not in (PAUSED, FAILED, ABORTED):) — both reach the same end state.
Otherwise, the fix is exactly what #2592 asked for. Thanks for jumping on it!
AI disclosure: drafted with Claude Opus, human-reviewed.
- Move per-step aliasing below the PAUSED/FAILED/ABORTED status check so partial results from incomplete steps are not aliased back to the unprefixed key. - Add test_while_loop_multi_step_body_inter_step_refs to exercise a multi-step loop body where step B reads step A's output within the same iteration, verifying per-step aliasing works correctly. Addresses feedback from @doquanghuy (items 2 & 4) and Copilot review on commit 9d0a222.
- Use enumerate() for stable fallback IDs when loop body steps lack
an explicit id (step-0, step-1, etc. instead of always step-0).
- Rewrite multi-step body test so step B uses expression
substitution ({{ steps.step-a.output.stdout }}) instead of
reading the counter file directly, making it a true regression
test for per-step aliasing.
Summary
Fixes #2592 —
while/do-whileloop conditions always read the iteration-0 step output, ignoring updates from subsequent iterations.Root cause
The engine namespaces nested step IDs per iteration (e.g.,
my-loop:my-gate:1) to preserve per-iteration history. However, the condition resolver (evaluate_condition→_resolve_dot_path) looks up step output by the original unprefixed key (steps.my-gate), which is only written during iteration 0 and never updated afterward.Fix
Keep namespaced step IDs for execution (preserving unique log entries, state keys, and
on_step_startcallback IDs per iteration), but execute each body step individually and immediately alias its result back to the unprefixed key after it completes successfully. This ensures:evaluate_condition()sees the latest iteration's outputparent:child:1,parent:child:2, etc.)Changes
src/specify_cli/workflows/engine.py— Execute loop body steps one at a time with namespaced IDs, aliasing each result back to the unprefixed key immediately after successful completiontests/test_workflows.py— 5 new engine-level regression tests inTestWorkflowEngine:test_while_loop_condition_reads_latest_iteration(positive: loop stops early)test_do_while_loop_condition_reads_latest_iteration(positive: loop stops early)test_while_loop_runs_to_max_when_condition_stays_true(negative: loop exhausts max_iterations)test_do_while_loop_runs_to_max_when_condition_stays_true(negative: loop exhausts max_iterations)test_while_loop_multi_step_body_inter_step_refs(multi-step body: step B sees step A's current-iteration output)Testing
All 2998 existing + new tests pass (2998 passed, 1 skipped, 0 failures).