
fix(outcome-collector): update report for new Safe Output Outcome Evaluation #35552

Merged
mnkiefer merged 2 commits into main from copilot/update-outcome-report-workflow
May 28, 2026


Copilot AI commented May 28, 2026

The outcome-collector workflow had stale paths and a report format that predated the typed outcome evaluators introduced in ADR-35218 (add_reviewer, submit_pull_request_review, update_issue/update_pull_request) and the signals that come with them: evidence strength, the ignored state, and fallback_exists_only.

Path fixes

  • Telemetry export never ran: the Export outcome telemetry step checked /tmp/gh-aw/agent/outcome-evaluations.jsonl, but evaluate_outcomes.cjs writes to /tmp/gh-aw/outcome-evaluations.jsonl, so the file was never found and telemetry was silently skipped every cycle
  • Agent read paths: Removed spurious /agent/ prefix from outcome-summary.json and outcomes/run-*.json references in the markdown body
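
The path mismatch above can be illustrated with a minimal sketch. It is not the workflow's actual code: `evaluationsPath`, `writeEvaluations`, `exportTelemetry`, and the record shape are hypothetical names used only for illustration, and a temporary directory stands in for /tmp/gh-aw. The idea is that writer and reader derive the file location from one helper, and that the export step reports a skip instead of staying silent.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: the single place that decides where the file lives.
function evaluationsPath(baseDir) {
  return path.join(baseDir, "outcome-evaluations.jsonl");
}

// Stand-in for the writer side (evaluate_outcomes.cjs in the PR).
function writeEvaluations(baseDir, records) {
  fs.mkdirSync(baseDir, { recursive: true });
  const body = records.map((r) => JSON.stringify(r)).join("\n") + "\n";
  fs.writeFileSync(evaluationsPath(baseDir), body);
}

// Stand-in for the export step: a missing file is reported, not ignored.
function exportTelemetry(baseDir) {
  const file = evaluationsPath(baseDir);
  if (!fs.existsSync(file)) {
    return { exported: 0, skipped: true, reason: `missing ${file}` };
  }
  const lines = fs.readFileSync(file, "utf8").split("\n").filter(Boolean);
  return { exported: lines.length, skipped: false };
}

// Demo: the writer uses the base directory, a stale reader looks under "agent".
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-demo-"));
writeEvaluations(demoDir, [{ type: "add_reviewer", result: "accepted" }]);

const okResult = exportTelemetry(demoDir);
const staleResult = exportTelemetry(path.join(demoDir, "agent"));
console.log(okResult, staleResult);
```

Returning a `reason` on skip is a design choice of this sketch; it makes a stale path visible in logs on the first run rather than after many silent cycles.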

Report format updates

Scorecard — added Ignored row; expanded Accepted into evidence-strength sub-rows:

| Accepted        | 42 / 60  | —                          |
| — strong        | 28       | merged, completed, approved |
| — medium        | 11       | engaged, retained           |
| — weak          | 3        | existence only              |
| Rejected        | 8        |                            |
| Ignored         | 10       | no observable follow-up     |
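
As a rough sketch of how the sub-rows relate to the summary fields, the snippet below recomputes the Accepted total and the share of each evidence strength from the example numbers in the table. The field names `accepted_medium` and `rejected` are assumptions (the PR only spells out `accepted_strong`, `accepted_weak`, and `ignored`), and treating the scorecard denominator as accepted + rejected + ignored is inferred from the example (42 + 8 + 10 = 60), not confirmed by the source.

```javascript
// Hypothetical helper: derives scorecard totals from summary counts.
function evidenceBreakdown(summary) {
  const strong = summary.accepted_strong ?? 0;
  const medium = summary.accepted_medium ?? 0; // assumed field name
  const weak = summary.accepted_weak ?? 0;
  const accepted = strong + medium + weak;
  // Assumed denominator: every item that was accepted, rejected, or ignored.
  const total = accepted + (summary.rejected ?? 0) + (summary.ignored ?? 0);
  // Share of accepted items at a given strength, as a whole percentage.
  const share = (n) => (accepted === 0 ? 0 : Math.round((n / accepted) * 100));
  return {
    accepted,
    total,
    strongPct: share(strong),
    mediumPct: share(medium),
    weakPct: share(weak),
  };
}

// Numbers taken from the example scorecard above.
const scorecardExample = {
  accepted_strong: 28,
  accepted_medium: 11,
  accepted_weak: 3,
  rejected: 8,
  ignored: 10,
};
const breakdown = evidenceBreakdown(scorecardExample);
console.log(`Accepted ${breakdown.accepted} / ${breakdown.total}`, breakdown);
```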

Median resolution — field is median_resolution_sec (int, seconds); scorecard now shows the conversion formula inline (÷ 3600 → hours, ÷ 60 → minutes)
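
The conversion can be sketched as a small formatter. Only the divisors (3600 and 60) come from the PR; the function name `formatResolution`, the choice of when to switch units, the one-decimal rounding, and the "n/a" label for a missing value are assumptions of this sketch.

```javascript
// Hypothetical formatter for median_resolution_sec (integer seconds, possibly null).
function formatResolution(medianResolutionSec) {
  if (medianResolutionSec === null || medianResolutionSec === undefined) {
    return "n/a"; // assumed label when no resolution time is available
  }
  if (medianResolutionSec >= 3600) {
    return `${(medianResolutionSec / 3600).toFixed(1)} h`; // seconds ÷ 3600 → hours
  }
  if (medianResolutionSec >= 60) {
    return `${(medianResolutionSec / 60).toFixed(1)} min`; // seconds ÷ 60 → minutes
  }
  return `${medianResolutionSec} s`;
}

console.log(formatResolution(5400)); // 1.5 h
console.log(formatResolution(90));   // 1.5 min
console.log(formatResolution(45));   // 45 s
```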

Evidence Quality section — surfaces fallback_exists_only_count with context: items evaluated via the generic existence fallback inflate accepted_weak and may overstate acceptance

Action items — two new checks: ignored rate >30% (targeting/quality signal) and fallback evaluation rate >20% (data quality signal)
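
The two thresholds could be checked along these lines. This is a sketch under stated assumptions, not the report's actual logic: the PR gives the thresholds (30% and 20%) but not the denominators, so dividing both counts by accepted + rejected + ignored is a guess, and `accepted_medium`, `rejected`, and the function name `actionItems` are hypothetical.

```javascript
const IGNORED_RATE_THRESHOLD = 0.3;  // from the PR: ignored rate >30%
const FALLBACK_RATE_THRESHOLD = 0.2; // from the PR: fallback evaluation rate >20%

// Hypothetical check that turns summary counts into action-item strings.
function actionItems(summary) {
  const accepted =
    (summary.accepted_strong ?? 0) +
    (summary.accepted_medium ?? 0) + // assumed field name
    (summary.accepted_weak ?? 0);
  // Assumed denominator for both rates: all evaluated items.
  const evaluated = accepted + (summary.rejected ?? 0) + (summary.ignored ?? 0);
  const items = [];
  if (evaluated === 0) return items;

  const ignoredRate = (summary.ignored ?? 0) / evaluated;
  const fallbackRate = (summary.fallback_exists_only_count ?? 0) / evaluated;

  if (ignoredRate > IGNORED_RATE_THRESHOLD) {
    items.push(`Ignored rate ${(ignoredRate * 100).toFixed(0)}%: targeting/quality signal`);
  }
  if (fallbackRate > FALLBACK_RATE_THRESHOLD) {
    items.push(`Fallback evaluation rate ${(fallbackRate * 100).toFixed(0)}%: data quality signal`);
  }
  return items;
}

// Healthy example (scorecard numbers plus an illustrative fallback count of 3).
const healthyItems = actionItems({
  accepted_strong: 28, accepted_medium: 11, accepted_weak: 3,
  rejected: 8, ignored: 10, fallback_exists_only_count: 3,
});
// Illustrative unhealthy example: 4 of 10 ignored, 5 of 10 via fallback.
const flaggedItems = actionItems({
  accepted_strong: 0, accepted_medium: 0, accepted_weak: 5,
  rejected: 1, ignored: 4, fallback_exists_only_count: 5,
});
console.log(healthyItems, flaggedItems);
```

Both comparisons are strict, so a rate of exactly 30% or 20% does not raise an item, which matches the ">" wording in the PR.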

Per-workflow table — added Ignored column; removed Reactions (not present in summary JSON)

Field reference table — documents all summary JSON fields produced by evaluate_outcomes.cjs, including accepted_strong/medium/weak, fallback_exists_only_count, ignored, and noop

Copilot AI and others added 2 commits May 28, 2026 18:54
…luation

- Fix path bug: /tmp/gh-aw/agent/outcome-evaluations.jsonl →
  /tmp/gh-aw/outcome-evaluations.jsonl in Export outcome telemetry step
- Fix paths in markdown body (remove spurious /agent/ prefix)
- Add field reference table documenting all summary JSON fields
- Add evidence strength breakdown to scorecard (accepted_strong/medium/weak)
- Add ignored count to scorecard and per-workflow breakdown table
- Add Evidence Quality section for fallback_exists_only_count signal
- Fix median_resolution display: field is median_resolution_sec (seconds),
  add conversion guidance in guidelines
- Replace Reaction Summary section with Evidence Quality section
- Update action items: add ignored rate and fallback evaluation checks
- Recompile lock file

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
- Fix escaped pipe in markdown table (int|null)
- Clarify median_resolution_sec conversion formula inline in scorecard
- Rephrase Evidence Quality evaluator list ending

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
mnkiefer marked this pull request as ready for review May 28, 2026 19:04
Copilot AI review requested due to automatic review settings May 28, 2026 19:04
mnkiefer merged commit 3cde543 into main May 28, 2026
1 check failed
mnkiefer deleted the copilot/update-outcome-report-workflow branch May 28, 2026 19:04
Copilot AI review requested due to automatic review settings May 28, 2026 19:26