fix(outcome-collector): update report for new Safe Output Outcome Evaluation #35552
Merged
…luation

- Fix path bug: /tmp/gh-aw/agent/outcome-evaluations.jsonl → /tmp/gh-aw/outcome-evaluations.jsonl in Export outcome telemetry step
- Fix paths in markdown body (remove spurious /agent/ prefix)
- Add field reference table documenting all summary JSON fields
- Add evidence strength breakdown to scorecard (accepted_strong/medium/weak)
- Add ignored count to scorecard and per-workflow breakdown table
- Add Evidence Quality section for fallback_exists_only_count signal
- Fix median_resolution display: field is median_resolution_sec (seconds), add conversion guidance in guidelines
- Replace Reaction Summary section with Evidence Quality section
- Update action items: add ignored rate and fallback evaluation checks
- Recompile lock file

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
- Fix escaped pipe in markdown table (int|null)
- Clarify median_resolution_sec conversion formula inline in scorecard
- Rephrase Evidence Quality evaluator list ending

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Copilot created this pull request from a session on behalf of mnkiefer on May 28, 2026 19:03.
The outcome-collector workflow had stale paths and a report format that predated the typed outcome evaluators introduced in ADR-35218 (`add_reviewer`, `submit_pull_request_review`, `update_issue`/`update_pull_request` with evidence strength, `ignored` state, `fallback_exists_only` signal).

Path fixes
- The `Export outcome telemetry` step checked `/tmp/gh-aw/agent/outcome-evaluations.jsonl`, but `evaluate_outcomes.cjs` writes to `/tmp/gh-aw/outcome-evaluations.jsonl`, so telemetry was silently skipped every cycle
- Removed the spurious `/agent/` prefix from `outcome-summary.json` and `outcomes/run-*.json` references in the markdown body

Report format updates
- Scorecard: added an `Ignored` row; expanded `Accepted` into evidence-strength sub-rows (`accepted_strong`, `accepted_medium`, `accepted_weak`)
- Median resolution: the field is `median_resolution_sec` (int, seconds); the scorecard now shows the conversion formula inline (`÷ 3600` → hours, `÷ 60` → minutes)
- Evidence Quality section: surfaces `fallback_exists_only_count` with context, since items evaluated via the generic existence fallback inflate `accepted_weak` and may overstate acceptance
- Action items: two new checks, ignored rate >30% (targeting/quality signal) and fallback evaluation rate >20% (data quality signal)
- Per-workflow table: added an `Ignored` column; removed `Reactions` (not present in summary JSON)
- Field reference table: documents all summary JSON fields produced by `evaluate_outcomes.cjs`, including `accepted_strong/medium/weak`, `fallback_exists_only_count`, `ignored`, and `noop`
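The path bug went unnoticed because a missing file was treated as "nothing to export". The sketch below shows a reader that pins the corrected path and reports the skip explicitly. It is a minimal illustration, not the actual workflow step: `readOutcomeEvaluations`, `OUTCOME_EVALUATIONS_PATH`, and the warning text are hypothetical, and it assumes the `.jsonl` file holds one JSON object per line.

```javascript
const fs = require("fs");

// Corrected location written by evaluate_outcomes.cjs (no /agent/ segment).
// Hypothetical constant name, used for illustration only.
const OUTCOME_EVALUATIONS_PATH = "/tmp/gh-aw/outcome-evaluations.jsonl";

// Hypothetical helper: returns an array of parsed records, or null when the
// file does not exist, so a caller can log a visible skip instead of
// silently exporting nothing.
function readOutcomeEvaluations(path = OUTCOME_EVALUATIONS_PATH) {
  if (!fs.existsSync(path)) {
    console.warn(`outcome telemetry skipped: ${path} not found`);
    return null;
  }
  return fs
    .readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "") // tolerate blank/trailing lines
    .map((line) => JSON.parse(line)); // assumes one JSON object per line
}
```

Returning `null` rather than an empty array keeps "file missing" distinguishable from "file present but empty", which is exactly the distinction the original bug hid.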
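The scorecard's seconds conversion and the two new action-item thresholds can be sketched as small pure functions. This assumes the summary JSON exposes `median_resolution_sec`, `ignored`, and `fallback_exists_only_count` as described in the report format updates, that `median_resolution_sec` may be `null` when nothing resolved, and that both rates are taken over a caller-supplied total, since the PR does not name the denominator. The function names, unit-selection cutoffs, and message wording are illustrative only.

```javascript
// Hypothetical display helper for median_resolution_sec (int seconds, or
// null by assumption). Uses the scorecard's rule: ÷ 3600 → hours,
// ÷ 60 → minutes. Picking the largest unit >= 1 is an illustrative choice.
function formatMedianResolution(sec) {
  if (sec === null || sec === undefined) return "n/a";
  if (sec >= 3600) return `${(sec / 3600).toFixed(1)} h`;
  if (sec >= 60) return `${(sec / 60).toFixed(1)} min`;
  return `${sec} s`;
}

// Thresholds from the new action items.
const IGNORED_RATE_THRESHOLD = 0.3; // targeting/quality signal
const FALLBACK_RATE_THRESHOLD = 0.2; // data quality signal

// Hypothetical check: `total` is an assumed denominator supplied by the caller.
function actionItems(summary, total) {
  const items = [];
  if (!total) return items; // avoid dividing by zero on an empty cycle
  const ignoredRate = (summary.ignored ?? 0) / total;
  const fallbackRate = (summary.fallback_exists_only_count ?? 0) / total;
  if (ignoredRate > IGNORED_RATE_THRESHOLD) {
    items.push(
      `Ignored rate ${(ignoredRate * 100).toFixed(0)}% exceeds 30%: review targeting/quality`
    );
  }
  if (fallbackRate > FALLBACK_RATE_THRESHOLD) {
    items.push(
      `Fallback evaluation rate ${(fallbackRate * 100).toFixed(0)}% exceeds 20%: accepted_weak may overstate acceptance`
    );
  }
  return items;
}
```

Keeping the cutoffs as named constants makes the 30% and 20% thresholds easy to audit against the action-item text in the report.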