docs: Add outcome span attributes and outcomes reference #34627
Pull request overview
This PR updates the outcomes documentation/specs for GitHub Agentic Workflows, aiming to clarify outcome evaluation principles, provide a consolidated outcomes reference page, and document OpenTelemetry outcome-related attributes.
Changes:
- Refines outcome evaluation principles and adds a provenance/actor-visibility section in the outcome evaluation spec.
- Adds a new Outcomes reference page summarizing outcome states, acceptance meaning by safe output type, and rollups.
- Adds an OpenTelemetry reference section documenting outcome-related span attributes.
| File | Description |
|---|---|
| specs/safe-output-outcome-evaluation.md | Updates evaluation principles/provenance guidance and adds a “current default acceptance map” summary table. |
| docs/src/content/docs/reference/outcomes.md | Introduces a new Outcomes reference page summarizing states, acceptance meaning, and related links. |
| docs/src/content/docs/reference/open-telemetry.md | Adds an “Outcome span attributes” section to the OTel reference. |
Copilot's findings
Comments suppressed due to low confidence (7)
docs/src/content/docs/reference/outcomes.md:46
- The `/gh-aw/specs/safe-output-outcome-evaluation/` link likely won’t resolve in the docs site: no other docs pages link under `/gh-aw/specs/`, and the spec content lives in the repo-root `specs/` folder (not under `docs/src/content/docs`). Consider linking to the published spec location (or to the GitHub file), or adding the spec into the docs content so this route exists.
Rows marked `fallback rule` use a generic existence check, not a type-specific rule. For exact rules, edge cases, and conformance details, see [Safe Output Outcome Evaluation Specification](/gh-aw/specs/safe-output-outcome-evaluation/).
docs/src/content/docs/reference/outcomes.md:55
- The `add_labels` row says `accepted` means “label retention”, but the current Go evaluator can’t verify retention because the manifest doesn’t record which labels were added; it returns `pending` when any labels exist and only `rejected` when none exist (`pkg/cli/outcome_eval_label.go`). Please adjust this row to match current behavior (or clarify that it’s a non-accepting/indeterminate check today).
| `add_comment` | reacted to or replied to | dedicated rule |
| `add_labels` | label retention | limited check |
| `add_reviewer` | review target exists | fallback rule |
specs/safe-output-outcome-evaluation.md:93
- In the “Current Default Acceptance Map”, the `add_labels` row says the evaluator accepts based on “label retention”, but `evalAddLabels` currently returns `pending` when labels are present (it cannot verify retention because added labels aren’t recorded) and only rejects on an empty label list (`pkg/cli/outcome_eval_label.go:37-47`). Please update the table to match current runtime behavior.
| `create_issue` | `evalCreateIssue` | completed/closed |
| `add_comment` | `evalAddComment` | reacted to or replied to |
| `add_labels` | `evalAddLabels` | label retention |
| `add_reviewer` | `evalGenericSticky` fallback | review target exists |
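The `add_labels` behavior described in the two comments above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `pkg/cli/outcome_eval_label.go`: the `Outcome` type, its constants, and the `evalAddLabelsSketch` name and signature are hypothetical, and only the pending/rejected decision is taken from the review text.

```go
package main

import "fmt"

// Outcome is a hypothetical stand-in for the evaluator's result type.
type Outcome string

const (
	OutcomePending  Outcome = "pending"
	OutcomeRejected Outcome = "rejected"
)

// evalAddLabelsSketch mirrors the decision described in the review comment.
// The manifest does not record which labels were added, so retention cannot
// be verified: any labels present yields pending, and an empty label list
// yields rejected. The sketch never returns "accepted".
func evalAddLabelsSketch(currentLabels []string) Outcome {
	if len(currentLabels) == 0 {
		return OutcomeRejected
	}
	return OutcomePending
}

func main() {
	fmt.Println(evalAddLabelsSketch([]string{"bug"}))
	fmt.Println(evalAddLabelsSketch(nil))
}
```

Because no branch returns `accepted`, a table entry of “label retention” would overstate what such a check can establish.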
docs/src/content/docs/reference/outcomes.md:63
- These rows describe “discussion target exists” as the fallback behavior, but the Go fallback evaluator (`evalGenericSticky`) only calls `GET /issues/{number}`. That won’t validate a Discussion’s existence and will likely produce an error outcome. Consider updating the table to reflect the real fallback behavior/limitations for non-issue/PR objects.
| `close_discussion` | none yet | no implemented rule yet |
| `create_discussion` | none yet | no implemented rule yet |
| `update_discussion` | discussion target exists | fallback rule |
| `create_pull_request_review_comment` | none yet | no implemented rule yet |
specs/safe-output-outcome-evaluation.md:95
- The `evalGenericSticky` fallback only checks whether an issue/PR exists via `GET /issues/{number}` (it does not validate “review target exists”). Please update the `accepted at a glance` wording for fallback rows like `add_reviewer` to match the actual behavior (generic issue/PR existence only).
| `add_labels` | `evalAddLabels` | label retention |
| `add_reviewer` | `evalGenericSticky` fallback | review target exists |
| `update_issue` | `evalGenericSticky` fallback | issue still exists |
| `update_pull_request` | `evalGenericSticky` fallback | PR still exists |
docs/src/content/docs/reference/outcomes.md:74
- Several “fallback rule” entries here (e.g. `dispatch_workflow`, code scanning alerts, `link_sub_issue`) imply the evaluator checks the existence of those specific resources. In Go, unknown types fall back to `evalGenericSticky`, which only calls `GET /issues/{number}`; for non-issue/PR resources this won’t confirm existence and will likely result in an error or `pending` outcome. Consider adjusting these summaries to reflect the current fallback limitations.
| `dispatch_workflow` | dispatch target exists | fallback rule |
| `autofix_code_scanning_alert` | alert target exists | fallback rule |
| `create_code_scanning_alert` | alert target exists | fallback rule |
| `link_sub_issue` | sub-issue link target exists | fallback rule |
| `hide_comment` | none yet | no implemented rule yet |
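The fallback limitation raised in these comments can be made concrete with a small sketch. It assumes, per the review text, that the fallback issues the same `GET /issues/{number}` request for every safe output type. The names `issueBacked`, `fallbackRequest`, and `fallbackCanConfirm` are hypothetical, and the type list is an illustrative subset rather than the evaluator's real dispatch table.

```go
package main

import "fmt"

// issueBacked lists safe output types whose target is an issue or a pull
// request. Illustrative subset only; the real evaluator may differ.
var issueBacked = map[string]bool{
	"update_issue":        true,
	"update_pull_request": true,
}

// fallbackRequest returns the single request the fallback is described as
// making. The safe output type is deliberately not an input.
func fallbackRequest(number int) string {
	return fmt.Sprintf("GET /issues/%d", number)
}

// fallbackCanConfirm reports whether that request can say anything about
// the target of the given safe output type.
func fallbackCanConfirm(outputType string) bool {
	return issueBacked[outputType]
}

func main() {
	types := []string{"update_issue", "update_discussion", "dispatch_workflow"}
	for _, t := range types {
		fmt.Printf("%-20s %s confirms=%v\n", t, fallbackRequest(42), fallbackCanConfirm(t))
	}
}
```

Under these assumptions, a row such as “dispatch target exists” describes a check the fallback cannot perform, which is the mismatch the comments ask to correct.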
specs/safe-output-outcome-evaluation.md:104
- For `update_discussion` and other non-issue/PR types, the `evalGenericSticky` fallback won’t validate the target because it calls the Issues REST endpoint (`GET /issues/{number}`). The table currently implies resource-specific existence checks (“discussion target exists”, “review target exists”, etc.); please revise the `accepted` summaries so they match actual fallback behavior/limitations.
| `update_discussion` | `evalGenericSticky` fallback | discussion target exists |
| `create_pull_request_review_comment` | `evalReviewComment` | none yet |
| `submit_pull_request_review` | `evalGenericSticky` fallback | PR still exists |
| `reply_to_pull_request_review_comment` | `evalGenericSticky` fallback | review target exists |
| `resolve_pull_request_review_thread` | `evalResolveThread` | none yet |
- Files reviewed: 3/3 changed files
- Comments generated: 3
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
✅ smoke-ci: safeoutputs CLI comment + comment-memory run (26397041759)
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
@copilot Address remaining review comments and resolve merge conflicts.
Addressed. Remaining review feedback is handled in
docs: Add outcome span attributes and outcomes reference
Summary
This PR introduces documentation and specification refinements for the Outcomes model in gh-aw. It adds a new Outcomes reference page, documents `gh-aw.outcome.*` OTel span attributes in the OpenTelemetry reference, and hardens the safe-output outcome evaluation spec with tightened provenance language and normative rules.
What changed and why
`docs/src/content/docs/reference/outcomes.md` (new file)
A new 110-line reference page covering the Outcomes model end-to-end:
- `accepted`, `rejected`, `pending`, `ignored`, `noop`
This gives operators and integrators a single canonical reference for understanding how gh-aw evaluates and categorises workflow outputs.
`docs/src/content/docs/reference/open-telemetry.md` (modified)
Added a new "Outcome span attributes" section documenting the `gh-aw.outcome.*` OTel span attributes emitted for workflow-level outcome rollups:
- `gh-aw.outcome.total`
- `gh-aw.outcome.accepted`
- `gh-aw.outcome.rejected`
- `gh-aw.outcome.pending`
- `gh-aw.outcome.ignored`
- `gh-aw.outcome.acceptance_rate`
- `gh-aw.outcome.waste_rate`
The table is documented as non-exhaustive, with a cross-reference to the new Outcomes reference page.
`specs/safe-output-outcome-evaluation.md` (modified)
Significant provenance model refinements throughout the spec:
- `ghaw.outcome.*` → `gh-aw.outcome.*` (see breaking change below)
- `human_comments`, `human_edits`, `zero_touch` are explicitly noted as historical field names
OTel span attribute namespace renamed
The span attribute prefix has changed from `ghaw.outcome.*` to `gh-aw.outcome.*`.
Impact: Any consumer, dashboard, or alert rule reading or emitting outcome span attributes using the old `ghaw.*` prefix must be updated to use `gh-aw.*`.
Files changed
- `docs/src/content/docs/reference/outcomes.md`
- `docs/src/content/docs/reference/open-telemetry.md`
- `specs/safe-output-outcome-evaluation.md`
Checklist
- `gh-aw.outcome.*` attribute table (non-exhaustive)
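For consumers affected by the span attribute prefix rename described in this PR, one possible migration shim is sketched below. It is an assumption-laden illustration, not part of gh-aw: `migrateOutcomeKeys` is a hypothetical helper that operates on a plain map of attribute keys and rewrites the old `ghaw.outcome.` prefix to `gh-aw.outcome.`. If both the old and the new key are present, the sketch keeps the new key's value.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldPrefix = "ghaw.outcome."
	newPrefix = "gh-aw.outcome."
)

// migrateOutcomeKeys returns a copy of attrs in which keys using the old
// prefix are renamed to the new prefix. Keys that already use the new
// prefix, and unrelated keys, are copied unchanged and take precedence
// over renamed ones.
func migrateOutcomeKeys(attrs map[string]any) map[string]any {
	out := make(map[string]any, len(attrs))
	// First pass: copy everything that does not use the old prefix.
	for k, v := range attrs {
		if !strings.HasPrefix(k, oldPrefix) {
			out[k] = v
		}
	}
	// Second pass: rename old-prefix keys unless the new key already exists.
	for k, v := range attrs {
		if strings.HasPrefix(k, oldPrefix) {
			renamed := newPrefix + strings.TrimPrefix(k, oldPrefix)
			if _, exists := out[renamed]; !exists {
				out[renamed] = v
			}
		}
	}
	return out
}

func main() {
	attrs := map[string]any{
		"ghaw.outcome.total":    3,
		"ghaw.outcome.accepted": 2,
		"service.name":          "example",
	}
	fmt.Println(migrateOutcomeKeys(attrs))
}
```

A shim like this would only be a stopgap during a transition; dashboards and alert rules still need to query the `gh-aw.*` names directly.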