Summary
When a node fails on visit 1 and succeeds on visit 2 via loop recovery, the workflow completes successfully and produces correct output, but the `--report` summary says `Status: failed` and lists the node in the Errors section. This is a cosmetic issue in trace event aggregation, not a runtime correctness bug.
Background
Found during Task 148 verification while testing loop recovery semantics. The Task 148 invariant is upheld correctly at runtime (`__failures__` is cleared on loop re-entry, `shared[node_id]` holds the fresh success data, the workflow exits 0, template resolution sees success data). The issue is entirely in how `--report` / trace consumers aggregate per-visit events.
Task 148's spec covered the runtime invariant and the direct failed-node reference UX, but explicitly did not address trace aggregation for loop recovery. The only `--report` acceptance criterion in Task 148 was for the non-loop #208 repro. Filing this as a follow-up so it's tracked.
Reproduction
Reproducer: `scratchpads/task148-verification/09_loop_recovery.pflow.md`, run with `--report`.
CLI output (correct): `succeeded-on-retry`, exit 0, workflow completed successfully.
`summary.md` (wrong): the `Execution Report: 09_loop_recovery` summary says `Status: failed`, and its Errors section lists the visit-1 failure.
The workflow succeeded and produced `succeeded-on-retry`, but the report claims it failed and lists a stale error from visit 1.
Root cause
`src/pflow/runtime/workflow_trace.py:275-288` (`_determine_trace_status`):
```python
failed = any(not e.get("success", True) for e in self.events)
if failed:
    return "failed"
```
This is monotonic over the full event list: each node execution appends its own event, and events are never removed. Visit 1's `success: False` event persists forever in `self.events`, so `any()` returns True even though visit 2 succeeded.
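A toy illustration of the monotonicity, assuming events are plain dicts with the `success` key shown above:

```python
# Visit 1 fails, visit 2 recovers via loop re-entry.
events = [{"node_id": "maybe-fail", "success": False}]
events.append({"node_id": "maybe-fail", "success": True})

# The visit-1 event is never removed, so the flag can only go one way.
failed = any(not e.get("success", True) for e in events)
assert failed is True  # status reads "failed" despite the recovery
```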
`src/pflow/runtime/workflow_trace.py:408-421` (`save_to_file`):
```python
failed_nodes = [e for e in self.events if not e.get("success", True)]
trace_data = {
    "final_status": final_status,
    "nodes_executed": len(self.events),
    "nodes_failed": len(failed_nodes),
    ...
}
```
Same pattern: this counts failed events, not nodes whose final state is failed.
`src/pflow/core/trace_report.py:223-229` (`_collect_errors`) filters events the same way, which is why the Errors section lists visit 1's exit 9 even after visit 2 recovered.
The trace format currently conflates "this node ever failed during the run" with "this node's final state is failed". For non-loop workflows the two are identical; for loop recovery they diverge.
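For illustration, a final-state-aware filter of the kind `_collect_errors` could use; the `node_id` key and the helper's shape are assumptions, not the current code:

```python
def collect_errors(events: list[dict]) -> list[dict]:
    """Keep failure events only for nodes whose final event also failed,
    so a recovered node's stale visit-1 error is dropped."""
    final_ok = {}  # node_id -> success flag of that node's last event
    for e in events:
        final_ok[e["node_id"]] = e.get("success", True)
    return [
        e
        for e in events
        if not e.get("success", True) and not final_ok[e["node_id"]]
    ]
```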
Design question (not prescribing a fix)
There are legitimate uses for both interpretations:
- Final-state view: An operator wants to know "did the workflow succeed?" → yes. Loop recovery should report green.
- Audit view: An agent debugging flakiness wants to know "did this node ever fail during the run?" → yes. Loop recovery should surface the visit-1 failure.
A reasonable fix would report both: derive the top-level `final_status` from each node's final state while preserving per-event history (the pipeline table already shows both visits). Options to consider:
1. Group events by `node_id` and take the last event's `success` flag as each node's final state. Compute `final_status` from that. Pipeline table and per-node files stay unchanged.
2. Walk the engine's `shared["execution"]["completed_nodes"]` and `shared["failures"]` at save time to compute final state from the runtime invariant rather than from event history. Requires threading shared state into the trace collector.
3. Tag each event with a `visit_number` and have the report aggregator distinguish "final" from "prior". More intrusive, but preserves all the data.
The first option is the smallest change and matches what users intuitively expect from `--report`; a sketch follows.
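A minimal sketch of the first option, assuming each event carries a `node_id` alongside the `success` flag shown in the snippets above; `"completed"` stands in for whatever non-failed status `_determine_trace_status` returns today:

```python
def final_status_from_events(events: list[dict]) -> str:
    """Option 1: compute status from each node's *last* event, so a
    recovered node no longer drags the whole run to "failed"."""
    last_event: dict[str, dict] = {}
    for e in events:
        last_event[e["node_id"]] = e  # later visits overwrite earlier ones
    failed = any(not e.get("success", True) for e in last_event.values())
    return "failed" if failed else "completed"

# Visit 1 fails, visit 2 recovers: the run should report green.
events = [
    {"node_id": "maybe-fail", "success": False},  # visit 1
    {"node_id": "maybe-fail", "success": True},   # visit 2
]
assert final_status_from_events(events) == "completed"
```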
Acceptance criteria
- Running the reproducer with `--report` produces `Status: succeeded` (or equivalent) in `summary.md`.
- The pipeline table still shows both `maybe-fail` visits (first FAILED, second ok); per-visit history is audit data, not rewritten.
- The Errors section does NOT list `maybe-fail` as an error, because its final state is success.
- Regression test added that runs the loop recovery reproducer and asserts `summary.md` contains `Status: succeeded` (or the `trace.final_status` equivalent), not `failed` (see the sketch below).
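A sketch of such a test; the `pflow run <file> --report <dir>` invocation and the `summary.md` location are assumptions to adapt to the real CLI and report layout:

```python
import subprocess
from pathlib import Path

REPRO = "scratchpads/task148-verification/09_loop_recovery.pflow.md"

def test_loop_recovery_report_shows_success(tmp_path: Path) -> None:
    # Runtime behavior is already correct: the workflow must exit 0.
    result = subprocess.run(
        ["pflow", "run", REPRO, "--report", str(tmp_path)],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0

    # The report must reflect the node's final state, not visit 1.
    summary = (tmp_path / "summary.md").read_text()
    assert "Status: succeeded" in summary
    assert "Status: failed" not in summary
```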
Related
- Task 148 (failed-node invariant fix): this issue is explicitly out of scope per `task-148.md`; only the non-loop `--report` criterion was specified.
- Verified during Task 148 manual testing that runtime correctness is not affected: `shared["failures"]` is correctly empty after loop recovery, `${maybe-fail.stdout}` correctly resolves to `succeeded-on-retry`, and the workflow exits 0. Only report/trace aggregation is affected.
Files likely to change
- `src/pflow/runtime/workflow_trace.py` (`_determine_trace_status`, `save_to_file`)
- `src/pflow/core/trace_report.py` (`_collect_errors`)