Trace aggregation reports workflow as 'failed' after loop recovery #240

@spinje

Description

Summary

When a node fails on visit 1 and succeeds on visit 2 via loop recovery, the workflow completes successfully and produces correct output, but the `--report` summary says `Status: failed` and lists the node in the Errors section. This is a cosmetic issue in trace event aggregation, not a runtime correctness bug.

Background

Found during Task 148 verification while testing loop recovery semantics. The Task 148 invariant is upheld correctly at runtime (__failures__ is cleared on loop re-entry, shared[node_id] holds the fresh success data, workflow exits 0, template resolution sees success data). The issue is entirely in how --report / trace consumers aggregate per-visit events.

Task 148's spec covered the runtime invariant and direct failed-node reference UX, but explicitly did not address trace aggregation for loop recovery. The only --report acceptance criterion in Task 148 was for the non-loop #208 repro. Filing this as a follow-up so it's tracked.

Reproduction

scratchpads/task148-verification/09_loop_recovery.pflow.md:

````markdown
### setup
Remove any pre-existing marker so the test is repeatable.
- type: shell
- next: maybe-fail
- cache: false
```shell command
rm -f /tmp/pflow-loop-marker
```

### maybe-fail
Fails first time, creates marker, succeeds on retry.
- type: shell
- on-error: retry
- next: end
- cache: false
```shell command
if [ -f /tmp/pflow-loop-marker ]; then
  echo "succeeded-on-retry"
else
  touch /tmp/pflow-loop-marker
  exit 9
fi
```

### retry
Loops back to maybe-fail.
- type: shell
- next: maybe-fail
- cache: false
```shell command
echo "retrying"
```

## Outputs
### result
- source: ${maybe-fail.stdout}
````

Run:

```bash
rm -f /tmp/pflow-loop-marker
uv run pflow scratchpads/task148-verification/09_loop_recovery.pflow.md --no-cache --report
```

CLI output (correct): `succeeded-on-retry`, exit 0, workflow completed successfully.

summary.md (wrong):

```
Execution Report: 09_loop_recovery

  • Status: failed
  • Nodes: 4

Pipeline

  #  Node        Type       Time  Status
  1  setup       ShellNode  6ms   ok
  2  maybe-fail  ShellNode  5ms   FAILED
  3  retry       ShellNode  3ms   ok
  4  maybe-fail  ShellNode  3ms   ok

Errors

  • maybe-fail (ShellNode): Command failed with exit code 9
```

The workflow succeeded and produced `succeeded-on-retry`, but the report claims it failed and lists a stale error from visit 1.

Root cause

src/pflow/runtime/workflow_trace.py:275-288 (_determine_trace_status):

```python
failed = any(not e.get("success", True) for e in self.events)
if failed:
return "failed"
```

This is monotonic over the full event list. Each node execution appends its own event, and events are never removed. Visit 1's success: False event persists forever in self.events, so any() returns True even though visit 2 succeeded.
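Concretely, for the reproducer the event list looks something like this (field names reduced to `node_id`/`success` for illustration; the real events carry more fields):

```python
# Illustrative event sequence from the loop recovery reproducer.
events = [
    {"node_id": "setup",      "success": True},
    {"node_id": "maybe-fail", "success": False},  # visit 1: exit 9
    {"node_id": "retry",      "success": True},
    {"node_id": "maybe-fail", "success": True},   # visit 2: recovered
]

# Current aggregation: one historical failure anywhere poisons the status.
any(not e.get("success", True) for e in events)  # True -> "failed"
```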

src/pflow/runtime/workflow_trace.py:408-421 (save_to_file):

```python
failed_nodes = [e for e in self.events if not e.get("success", True)]
trace_data = {
"final_status": final_status,
"nodes_executed": len(self.events),
"nodes_failed": len(failed_nodes),
...
}
```

Same pattern: it counts failed events, not nodes whose final state is failed.

src/pflow/core/trace_report.py:223-229 (_collect_errors) filters events the same way, which is why the Errors section lists visit 1's exit 9 even after visit 2 recovered.

The trace format currently conflates "this node ever failed during the run" with "this node's final state is failed". For non-loop workflows the two are identical; for loop recovery they diverge.

Design question (not prescribing a fix)

There are legitimate uses for both interpretations:

  1. Final-state view: An operator wants to know "did the workflow succeed?" → yes. Loop recovery should report green.
  2. Audit view: An agent debugging flakiness wants to know "did this node ever fail during the run?" → yes. Loop recovery should surface the visit-1 failure.

A reasonable fix would report both: top-level final_status from the final per-node state, but preserve per-event history (the pipeline table already shows both visits). Options to consider:

  • Group events by `node_id`, take the last event's `success` flag as each node's final state. Compute `final_status` from that. Pipeline table and per-node files stay unchanged.
  • Walk the engine's `shared["execution"]["completed_nodes"]` and `shared["failures"]` at save time to compute final state from runtime invariant rather than event history. Requires threading shared state into the trace collector.
  • Tag each event with a `visit_number` and have the report aggregator distinguish "final" from "prior". More intrusive but preserves all the data.

The first option is the smallest change and matches what users intuitively expect from `--report`.
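A minimal sketch of that first option, assuming trace events carry `node_id` and `success` keys as in the snippets above. Function names and the `completed` string are illustrative placeholders, not the existing API:

```python
def last_event_per_node(events):
    """Collapse the event history to each node's most recent visit."""
    last = {}
    for event in events:
        last[event["node_id"]] = event  # later visits overwrite earlier ones
    return last

def determine_final_status(events):
    """Top-level status from per-node final state, not full history."""
    final = last_event_per_node(events)
    if any(not e.get("success", True) for e in final.values()):
        return "failed"
    return "completed"  # placeholder for the actual success status string

def collect_final_errors(events):
    """Errors section: only nodes whose final visit failed."""
    final = last_event_per_node(events)
    return [e for e in final.values() if not e.get("success", True)]
```

With the reproducer's events, `determine_final_status` no longer returns `failed` and `collect_final_errors` returns an empty list, while `self.events` still holds both `maybe-fail` visits for the pipeline table.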

Acceptance criteria

  • Running the reproducer with `--report` produces `Status: succeeded` (or equivalent) in `summary.md`.
  • The pipeline table still shows both `maybe-fail` visits (first FAILED, second ok) — per-visit history is audit data, not rewritten.
  • The Errors section does NOT list `maybe-fail` as an error, because its final state is success.
  • Non-loop workflows are unaffected: the #208 repro ("Coalesce (`??`) resolves to empty string instead of falling through when on-error source node failed") and any single-pass workflow with a truly failed node still report `Status: failed` with the failed node in the Errors section.
  • Regression test added that runs the loop recovery reproducer and asserts `summary.md` contains `Status: succeeded` (or checks the `trace.final_status` equivalent), not `failed`; see the sketch after this list.
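A sketch of the shape such a test could take, reusing the illustrative aggregation helpers from the fix sketch above rather than the real module API:

```python
def test_loop_recovery_final_status_is_success():
    # Event sequence from the loop recovery reproducer: maybe-fail
    # fails on visit 1 and succeeds on visit 2.
    events = [
        {"node_id": "setup", "success": True},
        {"node_id": "maybe-fail", "success": False},
        {"node_id": "retry", "success": True},
        {"node_id": "maybe-fail", "success": True},
    ]
    assert determine_final_status(events) != "failed"
    assert collect_final_errors(events) == []

def test_single_pass_failure_still_fails():
    # Non-loop guard (#208-style): a node that never recovers must
    # still surface as failed.
    events = [{"node_id": "broken", "success": False}]
    assert determine_final_status(events) == "failed"
    assert len(collect_final_errors(events)) == 1
```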

Related

  • Task 148 — Failed-node invariant fix. This issue is explicitly out of scope per `task-148.md` (only non-loop `--report` criterion was specified).
  • #235: Replace `runner._extract_runtime_warnings` canned suggestions with actionable signal (related UX cleanup in the same area).
  • Verified during Task 148 manual testing that runtime correctness is not affected: `shared["failures"]` is correctly empty after loop recovery, `${maybe-fail.stdout}` correctly resolves to `succeeded-on-retry`, and the workflow exits 0. Only report/trace aggregation is affected.

Files likely to change

  • `src/pflow/runtime/workflow_trace.py` — `_determine_trace_status`, `save_to_file` (per-node-final-state aggregation)
  • `src/pflow/core/trace_report.py` — `_collect_errors` (filter by final state)
  • Tests: `tests/test_runtime/test_workflow_trace.py` or a new `tests/test_core/test_trace_report.py` for the loop recovery regression case
