
docs: Add outcome span attributes and outcomes reference#34627

Merged: mnkiefer merged 7 commits into main from add-outcome-docs on May 25, 2026
Collaborator

@mnkiefer mnkiefer commented May 25, 2026

docs: Add outcome span attributes and outcomes reference

Summary

This PR introduces documentation and specification refinements for the Outcomes model in gh-aw. It adds a new Outcomes reference page, documents gh-aw.outcome.* OTel span attributes in the OpenTelemetry reference, and hardens the safe-output outcome evaluation spec with tightened provenance language and normative rules.


What changed and why

docs/src/content/docs/reference/outcomes.md (new file)

A new 110-line reference page covering the Outcomes model end-to-end:

  • Defines all outcome states: accepted, rejected, pending, ignored, noop
  • Introduces outcome efficiency (effective tokens per accepted outcome)
  • Provides the per-safe-output-type acceptance table
  • Documents telemetry integration, cost/rollup guidance, and links to related pages

This gives operators and integrators a single canonical reference for understanding how gh-aw evaluates and categorises workflow outputs.
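
As a rough illustration of the outcome efficiency metric described above (effective tokens per accepted outcome), here is a minimal Go sketch. The function name `OutcomeEfficiency` and the handling of the zero-accepted case are assumptions made for this example; they are not taken from the gh-aw codebase or the new reference page.

```go
package main

import "fmt"

// OutcomeEfficiency returns the effective tokens spent per accepted outcome.
// Hypothetical helper: the name and signature are assumptions for illustration.
// ok is false when there are no accepted outcomes, because the ratio is
// undefined in that case (how gh-aw reports this case is not stated in the PR).
func OutcomeEfficiency(effectiveTokens, accepted int) (ratio float64, ok bool) {
	if accepted <= 0 {
		return 0, false
	}
	return float64(effectiveTokens) / float64(accepted), true
}

func main() {
	// Example numbers are made up: 120000 effective tokens, 4 accepted outcomes.
	if eff, ok := OutcomeEfficiency(120000, 4); ok {
		fmt.Printf("%.0f effective tokens per accepted outcome\n", eff)
	}
}
```

A lower value means fewer tokens were spent for each outcome that was ultimately accepted.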

docs/src/content/docs/reference/open-telemetry.md (modified)

Added a new "Outcome span attributes" section documenting the gh-aw.outcome.* OTel span attributes emitted for workflow-level outcome rollups:

| Attribute | Description |
| --- | --- |
| `gh-aw.outcome.total` | Total outcome count |
| `gh-aw.outcome.accepted` | Accepted count |
| `gh-aw.outcome.rejected` | Rejected count |
| `gh-aw.outcome.pending` | Pending count |
| `gh-aw.outcome.ignored` | Ignored count |
| `gh-aw.outcome.acceptance_rate` | Acceptance rate |
| `gh-aw.outcome.waste_rate` | Waste rate |

The table is documented as non-exhaustive, with a cross-reference to the new Outcomes reference page.

specs/safe-output-outcome-evaluation.md (modified) ⚠️ Breaking

Significant provenance model refinements throughout the spec:

  • Language hardening: replaced "human" with "visible non-bot actor" / "actor-visible" throughout, including conformance test rows — removes ambiguity around bot-authored actions
  • New "Provenance Limits" section: five normative rules governing what counts as valid provenance
  • OTel prefix rename: ghaw.outcome.* → gh-aw.outcome.* (see breaking change below)
  • Historical field name clarifications: human_comments, human_edits, zero_touch are explicitly noted as historical field names
  • "Current Default Acceptance Map" table: maps each output type to its runtime evaluator function and acceptance behaviour

⚠️ Breaking change

OTel span attribute namespace renamed

The span attribute prefix has changed from ghaw.outcome.* to gh-aw.outcome.*.

Impact: Any consumer, dashboard, or alert rule reading or emitting outcome span attributes using the old ghaw.* prefix must be updated to use gh-aw.*.
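
For consumers that need to migrate stored attribute keys or dashboard queries, the rename can be sketched as a simple prefix rewrite. The Go snippet below is only an illustration: `RenameOutcomeAttr` is a hypothetical helper, not part of gh-aw, and it assumes that only keys under the old `ghaw.outcome.` prefix need rewriting.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldPrefix = "ghaw.outcome."  // legacy namespace named in this PR
	newPrefix = "gh-aw.outcome." // new namespace named in this PR
)

// RenameOutcomeAttr rewrites a legacy outcome attribute key to the new
// namespace. Keys that do not start with the old prefix are returned unchanged.
// Hypothetical helper for illustration only.
func RenameOutcomeAttr(key string) string {
	if strings.HasPrefix(key, oldPrefix) {
		return newPrefix + strings.TrimPrefix(key, oldPrefix)
	}
	return key
}

func main() {
	keys := []string{"ghaw.outcome.accepted", "ghaw.outcome.waste_rate", "http.method"}
	for _, k := range keys {
		fmt.Printf("%s -> %s\n", k, RenameOutcomeAttr(k))
	}
}
```

Keys already using the new prefix pass through untouched, so the rewrite is safe to apply more than once.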


Files changed

| File | Change | Impact |
| --- | --- | --- |
| `docs/src/content/docs/reference/outcomes.md` | Added | Medium |
| `docs/src/content/docs/reference/open-telemetry.md` | Modified | Medium |
| `specs/safe-output-outcome-evaluation.md` | Modified | High / Breaking |

Checklist

  • New Outcomes reference page covers all outcome states, efficiency metric, and telemetry
  • OTel reference updated with gh-aw.outcome.* attribute table (non-exhaustive)
  • Spec provenance language de-ambiguated ("visible non-bot actor")
  • Normative Provenance Limits section added
  • Default Acceptance Map table added to spec
  • Breaking OTel prefix rename documented

Generated by PR Description Updater for issue #34627 · sonnet46 1.3M ·

@mnkiefer mnkiefer self-assigned this May 25, 2026
Copilot AI review requested due to automatic review settings May 25, 2026 10:52
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the outcomes documentation/specs for GitHub Agentic Workflows, aiming to clarify outcome evaluation principles, provide a consolidated outcomes reference page, and document OpenTelemetry outcome-related attributes.

Changes:

  • Refines outcome evaluation principles and adds a provenance/actor-visibility section in the outcome evaluation spec.
  • Adds a new Outcomes reference page summarizing outcome states, acceptance meaning by safe output type, and rollups.
  • Adds an OpenTelemetry reference section documenting outcome-related span attributes.

Summary per file:

| File | Description |
| --- | --- |
| `specs/safe-output-outcome-evaluation.md` | Updates evaluation principles/provenance guidance and adds a “current default acceptance map” summary table. |
| `docs/src/content/docs/reference/outcomes.md` | Introduces a new Outcomes reference page summarizing states, acceptance meaning, and related links. |
| `docs/src/content/docs/reference/open-telemetry.md` | Adds an “Outcome span attributes” section to the OTel reference. |

Copilot's findings

Comments suppressed due to low confidence (7)

docs/src/content/docs/reference/outcomes.md:46

  • The /gh-aw/specs/safe-output-outcome-evaluation/ link likely won’t resolve in the docs site: no other docs pages link under /gh-aw/specs/, and the spec content lives in the repo-root specs/ folder (not under docs/src/content/docs). Consider linking to the published spec location (or to the GitHub file), or adding the spec into the docs content so this route exists.
Rows marked `fallback rule` use a generic existence check, not a type-specific rule. For exact rules, edge cases, and conformance details, see [Safe Output Outcome Evaluation Specification](/gh-aw/specs/safe-output-outcome-evaluation/).

docs/src/content/docs/reference/outcomes.md:55

  • The add_labels row says accepted means “label retention”, but the current Go evaluator can’t verify retention because the manifest doesn’t record which labels were added; it returns pending when any labels exist and only rejected when none exist (pkg/cli/outcome_eval_label.go). Please adjust this row to match current behavior (or clarify that it’s a non-accepting/indeterminate check today).
| `add_comment` | reacted to or replied to | dedicated rule |
| `add_labels` | label retention | limited check |
| `add_reviewer` | review target exists | fallback rule |
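
To make the behavior described in this comment concrete, here is a minimal Go sketch of the decision it attributes to the label evaluator: `pending` when any labels exist, `rejected` when none exist. This is not the code in pkg/cli/outcome_eval_label.go; the `Outcome` type and `evalLabelsSketch` function are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Outcome is a hypothetical type for this sketch, not the gh-aw type.
type Outcome string

const (
	OutcomePending  Outcome = "pending"
	OutcomeRejected Outcome = "rejected"
)

// evalLabelsSketch mirrors the behavior described in the review comment:
// the manifest does not record which labels were added, so retention cannot
// be verified. Any labels present yield pending; no labels yield rejected.
// The sketch never returns an accepted state.
func evalLabelsSketch(currentLabels []string) Outcome {
	if len(currentLabels) == 0 {
		return OutcomeRejected
	}
	return OutcomePending
}

func main() {
	fmt.Println(evalLabelsSketch([]string{"bug"})) // labels present
	fmt.Println(evalLabelsSketch(nil))             // no labels
}
```

Under this reading, a summary such as "label retention" overstates what the check can establish, which is the point the comment raises.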

specs/safe-output-outcome-evaluation.md:93

  • In the “Current Default Acceptance Map”, the add_labels row says the evaluator accepts based on “label retention”, but evalAddLabels currently returns pending when labels are present (it cannot verify retention because added labels aren’t recorded) and only rejects on an empty label list (pkg/cli/outcome_eval_label.go:37-47). Please update the table to match current runtime behavior.
| `create_issue` | `evalCreateIssue` | completed/closed |
| `add_comment` | `evalAddComment` | reacted to or replied to |
| `add_labels` | `evalAddLabels` | label retention |
| `add_reviewer` | `evalGenericSticky` fallback | review target exists |

docs/src/content/docs/reference/outcomes.md:63

  • These rows describe “discussion target exists” as the fallback behavior, but the Go fallback evaluator (evalGenericSticky) only calls GET /issues/{number}. That won’t validate a Discussion’s existence and will likely produce an error outcome. Consider updating the table to reflect the real fallback behavior/limitations for non-issue/PR objects.
| `close_discussion` | none yet | no implemented rule yet |
| `create_discussion` | none yet | no implemented rule yet |
| `update_discussion` | discussion target exists | fallback rule |
| `create_pull_request_review_comment` | none yet | no implemented rule yet |

specs/safe-output-outcome-evaluation.md:95

  • The evalGenericSticky fallback only checks whether an issue/PR exists via GET /issues/{number} (it does not validate “review target exists”). Please update the accepted at a glance wording for fallback rows like add_reviewer to match the actual behavior (generic issue/PR existence only).
| `add_labels` | `evalAddLabels` | label retention |
| `add_reviewer` | `evalGenericSticky` fallback | review target exists |
| `update_issue` | `evalGenericSticky` fallback | issue still exists |
| `update_pull_request` | `evalGenericSticky` fallback | PR still exists |

docs/src/content/docs/reference/outcomes.md:74

  • Several “fallback rule” entries here (e.g. dispatch_workflow, code scanning alerts, link_sub_issue) imply the evaluator checks the existence of those specific resources. In Go, unknown types fall back to evalGenericSticky, which only calls GET /issues/{number}; for non-issue/PR resources this won’t confirm existence and will likely error/pending. Consider adjusting these summaries to reflect the current fallback limitations.
| `dispatch_workflow` | dispatch target exists | fallback rule |
| `autofix_code_scanning_alert` | alert target exists | fallback rule |
| `create_code_scanning_alert` | alert target exists | fallback rule |
| `link_sub_issue` | sub-issue link target exists | fallback rule |
| `hide_comment` | none yet | no implemented rule yet |

specs/safe-output-outcome-evaluation.md:104

  • For update_discussion and other non-issue/PR types, the evalGenericSticky fallback won’t validate the target because it calls the Issues REST endpoint (GET /issues/{number}). The table currently implies resource-specific existence checks (“discussion target exists”, “review target exists”, etc.); please revise the accepted summaries so they match actual fallback behavior/limitations.
| `update_discussion` | `evalGenericSticky` fallback | discussion target exists |
| `create_pull_request_review_comment` | `evalReviewComment` | none yet |
| `submit_pull_request_review` | `evalGenericSticky` fallback | PR still exists |
| `reply_to_pull_request_review_comment` | `evalGenericSticky` fallback | review target exists |
| `resolve_pull_request_review_thread` | `evalResolveThread` | none yet |
  • Files reviewed: 3/3 changed files
  • Comments generated: 3

Comment thread docs/src/content/docs/reference/outcomes.md Outdated
Comment thread specs/safe-output-outcome-evaluation.md Outdated
Comment thread docs/src/content/docs/reference/open-telemetry.md Outdated
mnkiefer and others added 2 commits May 25, 2026 12:58
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@github-actions
Contributor

✅ smoke-ci: safeoutputs CLI comment + comment-memory run (26397041759)

Generated by 🧪 Smoke CI for issue #34627 ·

Copilot AI and others added 2 commits May 25, 2026 11:11
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
@mnkiefer
Collaborator Author

@copilot Address remaining review comments and resolve merge conflicts.

Copilot AI and others added 2 commits May 25, 2026 11:23
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Contributor

Copilot AI commented May 25, 2026

> @copilot Address remaining review comments and resolve merge conflicts.

Addressed. Remaining review feedback is handled in a00caf2, and merge conflicts were resolved by merging main and fixing lockfile conflicts in 8ca9be9. Branch is now updated with latest main.

@mnkiefer mnkiefer merged commit 209fa60 into main May 25, 2026
@mnkiefer mnkiefer deleted the add-outcome-docs branch May 25, 2026 11:45