
Commit af26e46

feat: add deployment-incident-monitor example workflow and deployment_status state filter (#28549)
1 parent 7512713 commit af26e46

16 files changed

Lines changed: 1900 additions & 11 deletions

.github/aw/github-agentic-workflows.md

Lines changed: 50 additions & 0 deletions

.github/workflows/deployment-incident-monitor.lock.yml

Lines changed: 1308 additions & 0 deletions
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+---
+description: Monitors deployment failures and automatically creates deduplicated incident issues with root cause analysis.
+on:
+  deployment_status:
+    state: [error, failure]
+  skip-if-match: "is:issue is:open label:incident label:deployment-failure"
+permissions:
+  contents: read
+  actions: read
+  deployments: read
+engine: copilot
+tools:
+  github:
+    toolsets: [repos, actions]
+safe-outputs:
+  create-issue:
+    expires: 7d
+    title-prefix: "[Incident] "
+    labels: [incident, deployment-failure]
+    close-older-issues: true
+  noop:
+timeout-minutes: 10
+---
+
+# Deployment Incident Monitor
+
+A deployment to **${{ github.event.deployment.environment }}** has failed with state `${{ github.event.deployment_status.state }}`.
+
+## Your Task
+
+Perform a root cause analysis of this deployment failure and create a focused incident issue.
+
+## Deployment Context
+
+- **Environment**: ${{ github.event.deployment.environment }}
+- **Status**: ${{ github.event.deployment_status.state }}
+- **Repository**: ${{ github.repository }}
+
+## Investigation Steps
+
+1. **Check for an existing open incident issue**: Look for open issues with both `incident` and `deployment-failure` labels. If one already exists for this environment and recent timeframe, call `noop` with a brief explanation.
+
+2. **Gather context** using the available GitHub MCP tools:
+   - Look up recent workflow runs and job logs in the `actions` toolset to identify what failed
+   - Review recent commits to the deployed branch to identify changes that may have caused the failure
+   - Check if there were any related CI failures preceding the deployment
+
+3. **Create an incident issue** if no duplicate exists. The issue should include:
+   - **Environment** and the deployment failure state
+   - **Summary** of likely root cause based on available evidence
+   - **Evidence**: relevant log excerpts, failing steps, or recent commits linked to the failure
+   - **Suggested remediation** steps for the on-call team
+   - A link to the failing deployment for quick access
+
+## Output Guidelines
+
+- Use `noop` if a duplicate open incident issue already exists.
+- Keep the issue concise and actionable — focus on what the on-call engineer needs to know immediately.
+- Do not create speculative issues; only create one when there is concrete evidence of a failure.
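
The investigation steps ask the agent to call `noop` when an open incident already exists "for this environment and recent timeframe", but the prompt defines neither term. The sketch below shows one way that check could be made deterministic. It assumes incident titles take the form `[Incident] <environment>: <summary>` and that "recent" means 24 hours; `decideIncidentAction` and the issue object shape are hypothetical and not part of gh-aw. Note that the `skip-if-match` query in the frontmatter appears to skip the run whenever any open issue carries both labels, regardless of environment, so a per-environment check like this would mostly matter if that gate were narrowed.

```javascript
// Hypothetical dedup check for the deployment-incident-monitor prompt.
// Assumptions (not defined by the workflow): titles look like
// "[Incident] <environment>: <summary>", and "recent" means 24 hours.
const REQUIRED_LABELS = ["incident", "deployment-failure"];
const TITLE_PREFIX = "[Incident] "; // same as safe-outputs.create-issue.title-prefix
const DEFAULT_WINDOW_MS = 24 * 60 * 60 * 1000;

/**
 * @param {Array<{title: string, labels: string[], state: string, createdAt: string}>} issues
 * @param {string} environment value of github.event.deployment.environment
 * @param {Date} now reference time for the "recent" window
 * @param {number} [windowMs] how far back an open issue still counts as a duplicate
 * @returns {{action: "noop" | "create", reason: string}}
 */
function decideIncidentAction(issues, environment, now, windowMs = DEFAULT_WINDOW_MS) {
  const duplicate = issues.find((issue) => {
    const hasLabels = REQUIRED_LABELS.every((label) => issue.labels.includes(label));
    const sameEnv = issue.title.startsWith(`${TITLE_PREFIX}${environment}:`);
    const ageMs = now.getTime() - new Date(issue.createdAt).getTime();
    return issue.state === "open" && hasLabels && sameEnv && ageMs >= 0 && ageMs <= windowMs;
  });
  if (duplicate) {
    return { action: "noop", reason: `Open incident already tracks this failure: ${duplicate.title}` };
  }
  return { action: "create", reason: `No open incident for ${environment} within the window` };
}

// Example: an open production incident from three hours ago suppresses a new one.
const exampleNow = new Date("2026-04-26T12:00:00Z");
const exampleIssues = [
  {
    title: "[Incident] production: migration step failed",
    labels: ["incident", "deployment-failure"],
    state: "open",
    createdAt: "2026-04-26T09:00:00Z",
  },
];
console.log(decideIncidentAction(exampleIssues, "production", exampleNow).action); // noop
console.log(decideIncidentAction(exampleIssues, "staging", exampleNow).action); // create
```

Matching on a title prefix keeps the check cheap, but it only works if the agent is told to use that title format; a label per environment would be a sturdier key.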

.github/workflows/dev-hawk.lock.yml

Lines changed: 1 addition & 1 deletion

actions/setup/js/aw_context.cjs

Lines changed: 9 additions & 0 deletions
@@ -107,6 +107,7 @@ function resolveItemContext(payload) {
  * item_number: string,
  * comment_id: string,
  * comment_node_id: string,
+ * deployment_state: string,
  * otel_trace_id: string,
  * otel_parent_span_id: string
  * }}
@@ -122,6 +123,10 @@ function resolveItemContext(payload) {
  * Only populated for discussion/discussion_comment events. Can be passed
  * as reply_to_id in add_comment to thread responses under the triggering
  * comment when a dispatched specialist workflow replies to a discussion.
+ * - deployment_state: The deployment status state value (e.g. "failure", "error",
+ * "success") when the workflow was triggered by a deployment_status event.
+ * Empty string for all other event types. Propagated to child workflows via
+ * workflow_call so they can identify which state triggered the parent.
  * - otel_trace_id: OTLP trace ID from the parent workflow's setup span.
  * Empty string when OTLP is not configured or the parent setup step has
  * not yet run. Used by child workflow setup steps to continue the same
@@ -150,6 +155,10 @@ function buildAwContext() {
     item_number,
     comment_id,
     comment_node_id,
+    // deployment_state carries the GitHub deployment_status state value when the
+    // triggering event is deployment_status. Empty string for all other events.
+    // Propagated to called workflows so they can access the deployment state.
+    deployment_state: context.eventName === "deployment_status" ? (context.payload?.deployment_status?.state ?? "") : "",
     // Propagate the current OTLP trace ID to dispatched child workflows so that
     // composite actions share the same trace as their parent. Empty string when
     // OTLP is not configured or the parent setup step has not run yet.

actions/setup/js/generate_aw_info.cjs

Lines changed: 8 additions & 0 deletions
@@ -86,6 +86,14 @@ async function main(core, ctx) {
     awInfo.cli_version = cliVersion;
   }
 
+  // Include deployment_state when triggered by a deployment_status event.
+  // This makes the deployment state available to the agent without requiring it to
+  // read the raw event payload, and is propagated to child workflows via aw_context.
+  const deploymentState = ctx.payload?.deployment_status?.state;
+  if (deploymentState && typeof deploymentState === "string") {
+    awInfo.deployment_state = deploymentState;
+  }
+
   // Include custom token weights when set (engine.token-weights in workflow frontmatter).
   // Deep structure validation is intentionally minimal here: the JSON schema and Go parser
   // already validate the structure at compile time. We only verify the top-level type to
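
The two setup scripts publish the deployment state under slightly different contracts. `buildAwContext()` always sets `deployment_state`, falling back to `""` unless `context.eventName` is `deployment_status`, while `generate_aw_info.cjs` adds the key only when the payload carries a non-empty string and does not check the event name. The standalone functions below copy those two expressions so the difference can be exercised directly; the function names and the plain `context`/`ctx` stand-in objects are illustrative.

```javascript
// Copies of the two expressions added in this commit, lifted into
// standalone functions for comparison. Names are illustrative.

// aw_context.cjs: always a string; "" unless the event is deployment_status.
function contextDeploymentState(context) {
  return context.eventName === "deployment_status" ? (context.payload?.deployment_status?.state ?? "") : "";
}

// generate_aw_info.cjs: key added only for a non-empty string state;
// the event name is not checked.
function awInfoWithDeploymentState(awInfo, ctx) {
  const deploymentState = ctx.payload?.deployment_status?.state;
  if (deploymentState && typeof deploymentState === "string") {
    awInfo.deployment_state = deploymentState;
  }
  return awInfo;
}

const failedDeploy = { eventName: "deployment_status", payload: { deployment_status: { state: "failure" } } };
const push = { eventName: "push", payload: {} };
console.log(contextDeploymentState(failedDeploy)); // failure
console.log(JSON.stringify(contextDeploymentState(push))); // ""
console.log("deployment_state" in awInfoWithDeploymentState({}, push)); // false
```

Because one producer emits `""` and the other omits the key, consumers should treat "missing" and "empty" identically, which is what the `||` chains in `send_otlp_span.cjs` do.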

actions/setup/js/runtime_import.cjs

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ const ALLOWED_EXPRESSIONS = [
   "github.event.comment.id",
   "github.event.deployment.id",
   "github.event.deployment_status.id",
+  "github.event.deployment_status.state",
   "github.event.head_commit.id",
   "github.event.installation.id",
   "github.event.issue.number",

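The new `ALLOWED_EXPRESSIONS` entry is presumably what lets the example workflow's prompt reference `${{ github.event.deployment_status.state }}`. This diff does not show how `runtime_import.cjs` applies the list, so the sketch below assumes a plain exact-match check over every `${{ ... }}` placeholder; `findDisallowedExpressions` and the abbreviated allowlist are illustrative only.

```javascript
// Illustrative subset of ALLOWED_EXPRESSIONS; the real list is longer.
const ALLOWED = new Set([
  "github.event.comment.id",
  "github.event.deployment.id",
  "github.event.deployment_status.id",
  "github.event.deployment_status.state",
  "github.event.issue.number",
]);

// Hypothetical checker assuming exact-match semantics: returns every
// expression inside ${{ ... }} that is not in the allowlist.
function findDisallowedExpressions(text, allowed = ALLOWED) {
  const placeholder = /\$\{\{\s*([^}]+?)\s*\}\}/g;
  const rejected = [];
  for (const match of text.matchAll(placeholder)) {
    if (!allowed.has(match[1])) rejected.push(match[1]);
  }
  return rejected;
}

console.log(findDisallowedExpressions("Deployment failed with state ${{ github.event.deployment_status.state }}")); // []
console.log(findDisallowedExpressions("Token: ${{ secrets.GITHUB_TOKEN }}")); // [ 'secrets.GITHUB_TOKEN' ]
```

An exact-match allowlist keeps the check simple and auditable: a field is usable in imported prompts only after someone adds it explicitly, which is the pattern this commit follows in both the JavaScript and Go lists.
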
actions/setup/js/send_otlp_span.cjs

Lines changed: 14 additions & 0 deletions
@@ -512,6 +512,13 @@ async function sendJobSetupSpan(options = {}) {
   if (eventName) {
     attributes.push(buildAttr("gh-aw.event_name", eventName));
   }
+  // Deployment state: prefer the env var (set from github.event.deployment_status.state
+  // in the compiled workflow), fall back to aw_context propagation via awInfo.
+  const deploymentStateSetup =
+    process.env.GH_AW_GITHUB_EVENT_DEPLOYMENT_STATUS_STATE || (typeof awInfo.deployment_state === "string" ? awInfo.deployment_state : "") || (typeof awInfo.context?.deployment_state === "string" ? awInfo.context.deployment_state : "");
+  if (deploymentStateSetup) {
+    attributes.push(buildAttr("gh-aw.deployment.state", deploymentStateSetup));
+  }
   attributes.push(buildAttr("gh-aw.staged", staged));
 
   const resourceAttributes = [buildAttr("github.repository", repository), buildAttr("github.run_id", runId)];
@@ -743,6 +750,13 @@ async function sendJobConclusionSpan(spanName, options = {}) {
   if (jobName) attributes.push(buildAttr("gh-aw.job.name", jobName));
   if (engineId) attributes.push(buildAttr("gh-aw.engine.id", engineId));
   if (eventName) attributes.push(buildAttr("gh-aw.event_name", eventName));
+  // Deployment state: prefer the env var (set from github.event.deployment_status.state
+  // in the compiled workflow), fall back to aw_info.deployment_state or aw_context propagation.
+  const deploymentStateConclusion =
+    process.env.GH_AW_GITHUB_EVENT_DEPLOYMENT_STATUS_STATE || (typeof awInfo.deployment_state === "string" ? awInfo.deployment_state : "") || (typeof awInfo.context?.deployment_state === "string" ? awInfo.context.deployment_state : "");
+  if (deploymentStateConclusion) {
+    attributes.push(buildAttr("gh-aw.deployment.state", deploymentStateConclusion));
+  }
   attributes.push(buildAttr("gh-aw.staged", staged));
   if (!isNaN(effectiveTokens) && effectiveTokens > 0) {
     attributes.push(buildAttr("gh-aw.effective_tokens", effectiveTokens));
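
Both span functions repeat the same three-step fallback verbatim. One way to keep them in sync is to factor it into a helper; `resolveDeploymentState` below is a hypothetical refactor, not part of this commit. It keeps the original precedence (environment variable, then `awInfo.deployment_state`, then `awInfo.context.deployment_state`), lets empty strings fall through like the original `||` chain, and additionally tolerates a null `awInfo`.

```javascript
// Hypothetical refactor of the fallback chain duplicated in
// sendJobSetupSpan() and sendJobConclusionSpan(). Precedence:
//   1. GH_AW_GITHUB_EVENT_DEPLOYMENT_STATUS_STATE (compiled-in env var)
//   2. awInfo.deployment_state (written by generate_aw_info.cjs)
//   3. awInfo.context.deployment_state (propagated aw_context)
// Empty or non-string values fall through, matching the original `||` chain.
function resolveDeploymentState(env, awInfo) {
  const candidates = [
    env.GH_AW_GITHUB_EVENT_DEPLOYMENT_STATUS_STATE,
    awInfo?.deployment_state,
    awInfo?.context?.deployment_state,
  ];
  for (const value of candidates) {
    if (typeof value === "string" && value !== "") return value;
  }
  return "";
}

// Usage inside either span function (buildAttr and attributes as in send_otlp_span.cjs):
//   const deploymentState = resolveDeploymentState(process.env, awInfo);
//   if (deploymentState) attributes.push(buildAttr("gh-aw.deployment.state", deploymentState));
console.log(resolveDeploymentState({}, { context: { deployment_state: "error" } })); // error
```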
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+# ADR-28549: Compile `deployment_status.state` Filter into GitHub Actions `if:` Condition
+
+**Date**: 2026-04-26
+**Status**: Draft
+**Deciders**: Unknown (generated from PR diff — [PR #28549](https://github.com/github/gh-aw/pull/28549))
+
+---
+
+## Part 1 — Narrative (Human-Friendly)
+
+### Context
+
+The gh-aw compiler translates a higher-level Markdown-based workflow DSL into GitHub Actions YAML. The GitHub `deployment_status` event fires for every state change in an external deployment (pending, queued, in_progress, success, failure, error, inactive, waiting). For DevOps incident automation — the primary use-case for this trigger — only the terminal failure states (`error`, `failure`) are actionable, but GitHub Actions provides no native trigger-level filter for `deployment_status` by state. Without compiler support, workflow authors must write raw `if:` expressions manually, which is inconsistent with the DSL's abstraction level and causes agents to default to suboptimal triggers when generating workflows.
+
+### Decision
+
+We will add a `state:` field to the `deployment_status` trigger in the gh-aw DSL schema and compiler. When present, the compiler reads `on.deployment_status.state` (accepting a single string or an array) and synthesises the equivalent GitHub Actions expression (`github.event.deployment_status.state == 'error' || ...`), merging it into the job-level `if:` condition. The `state:` lines are commented out in the compiled lock file with an explanatory note. We will also introduce natural-language trigger shorthands (e.g., `"deployment failed"`, `"deployment failed or error"`) in `trigger_parser.go` that expand to the same `deployment_status` trigger with the appropriate `state` condition, enabling both the declarative YAML form and a concise prose form.
+
+### Alternatives Considered
+
+#### Alternative 1: Document the Pattern Without Compiler Changes
+
+Add a canonical example using a manually written `if: github.event.deployment_status.state == 'failure'` expression and document the approach in the workflow guide, leaving the compiler unchanged.
+
+This was not chosen because it keeps the filtering burden on workflow authors, is inconsistent with other trigger abstractions in the DSL (e.g., `issue.state`), and does not enable natural-language shorthands. Agents generating workflows from prose descriptions would still lack a declarative signal to use.
+
+#### Alternative 2: Runtime Filtering Inside the Agent Prompt
+
+Instead of compile-time condition synthesis, instruct the agent (via its system prompt or workflow description) to exit early when `github.event.deployment_status.state` is not a failure state.
+
+This was not chosen because it consumes agent tokens on every non-failure deployment event, increases latency, and places correctness-critical control flow inside an LLM response rather than in deterministic compiled infrastructure. It also makes no-op runs indistinguishable from real activations in the audit log.
+
+### Consequences
+
+#### Positive
+- Workflow authors can express state-filtered deployment triggers declaratively (`state: [error, failure]`), consistent with other DSL trigger filters.
+- Natural-language shorthands (`on: "deployment failed or error"`) lower the barrier for DevOps automation, enabling agents to generate correct workflows from prose intent.
+- Compile-time `if:` conditions prevent unnecessary agent invocations for non-failure events, reducing cost and noise.
+- A canonical, compilable example (`deployment-incident-monitor.md`) gives teams a tested starting point.
+
+#### Negative
+- The hardcoded `state` enum (`error`, `failure`, `pending`, `success`, `inactive`, `in_progress`, `queued`, `waiting`) must be kept in sync with GitHub's deployment status API; additions or renames require a compiler update.
+- Each new trigger type with semantic sub-fields (like `state:`) increases the surface area of the compiler's extraction logic, adding maintenance burden.
+- The natural-language parser introduces implicit mappings (`"deployment failed"` → `state == 'failure'`) that are opaque unless documented; future contributors may not know the shorthand exists.
+
+#### Neutral
+- The `state:` lines are intentionally commented out in the compiled lock file, which may surprise contributors inspecting the generated YAML.
+- `TriggerIR.Conditions` propagation through `schedule_preprocessing.go` is a prerequisite change that affects all future NL trigger shorthands, not just `deployment_status`.
+
+---
+
+## Part 2 — Normative Specification (RFC 2119)
+
+> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).
+
+### Schema and Validation
+
+1. The `deployment_status` trigger object **MUST** accept an optional `state` property that is either a single string or an array of strings.
+2. Each value in `state` **MUST** be one of the enumerated GitHub deployment status values: `error`, `failure`, `pending`, `success`, `inactive`, `in_progress`, `queued`, `waiting`.
+3. An unrecognised `state` value **SHOULD** produce a compiler warning and **MUST NOT** be silently ignored.
+
+### Compilation
+
+1. When `on.deployment_status.state` is present, the compiler **MUST** synthesise a GitHub Actions expression of the form `github.event.deployment_status.state == '<value>'`, joining multiple values with ` || `.
+2. The synthesised expression **MUST** be merged into the job-level `if:` condition of the activation job.
+3. The `state:` lines in the compiled lock file **MUST** be commented out with an explanatory note indicating that state filtering was compiled into the `if:` condition.
+4. The compiled lock file **MUST NOT** include a native `deployment_status.state` filter under `on:`, as GitHub Actions does not support trigger-level state filtering for this event.
+
+### Natural-Language Trigger Parsing
+
+1. The natural-language trigger parser **MUST** recognise the phrase `"deployment failed"` and expand it to a `deployment_status` trigger with `state == 'failure'`.
+2. The natural-language trigger parser **MUST** recognise the phrase `"deployment error"` and expand it to a `deployment_status` trigger with `state == 'error'`.
+3. The natural-language trigger parser **MUST** recognise the phrase `"deployment failed or error"` (and semantically equivalent phrasings) and expand it to a `deployment_status` trigger with `state == 'failure' || state == 'error'`.
+4. Natural-language expansions **MUST** produce conditions that are propagated through `TriggerIR.Conditions` into the frontmatter `if:` field.
+5. New natural-language deployment shorthands **SHOULD** be added to this parser rather than handled inline in calling code.
+
+### Conformance
+
+An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.
+
+---
+
+*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/24955643779) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
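
The ADR's compilation and parsing rules can be illustrated with a short sketch. The real implementation lives in the Go compiler (`trigger_parser.go` and related files), so everything below is an assumption: the function names, the `&&` used to merge with an existing `if:`, and the choice to keep unrecognised values in the expression while warning about them. The shorthand table covers only the three literal phrases the ADR names, not the "semantically equivalent phrasings" it also allows.

```javascript
// Sketch of the ADR's compile-time behaviour. The real code is Go; every
// name here is hypothetical and the merge semantics are assumed.
const DEPLOYMENT_STATES = new Set([
  "error", "failure", "pending", "success", "inactive", "in_progress", "queued", "waiting",
]);

// Only the three literal phrases named in the ADR.
const NL_SHORTHANDS = new Map([
  ["deployment failed", ["failure"]],
  ["deployment error", ["error"]],
  ["deployment failed or error", ["failure", "error"]],
]);

/** Expand a prose trigger into state values, or return null if unknown. */
function expandShorthand(phrase) {
  return NL_SHORTHANDS.get(phrase.trim().toLowerCase()) ?? null;
}

/**
 * Normalise `on.deployment_status.state` (string or string[]) into an
 * Actions expression, collecting warnings for values outside the enum.
 */
function synthesizeStateCondition(state) {
  const values = (Array.isArray(state) ? state : [state]).map(String);
  const warnings = [];
  if (values.length === 0) {
    warnings.push("deployment_status.state is empty; no filter generated");
    return { expr: "", warnings };
  }
  for (const v of values) {
    if (!DEPLOYMENT_STATES.has(v)) {
      warnings.push(`unrecognised deployment_status state: ${JSON.stringify(v)}`);
    }
  }
  // Actions expressions escape a literal single quote by doubling it.
  const expr = values
    .map((v) => `github.event.deployment_status.state == '${v.replace(/'/g, "''")}'`)
    .join(" || ");
  return { expr, warnings };
}

/** Merge into an existing job-level `if:`; `&&` is an assumption. */
function mergeIfCondition(existing, stateExpr) {
  if (!existing) return stateExpr;
  if (!stateExpr) return existing;
  return `(${existing}) && (${stateExpr})`;
}

// Example: the prose shorthand and the declarative form yield the same filter.
const { expr: fromProse } = synthesizeStateCondition(expandShorthand("deployment failed or error"));
const { expr: fromYaml } = synthesizeStateCondition(["failure", "error"]);
console.log(fromProse === fromYaml); // true
console.log(fromProse);
// github.event.deployment_status.state == 'failure' || github.event.deployment_status.state == 'error'
```

Keeping an unrecognised value in the expression, rather than dropping it, avoids a failure mode where every listed value is invalid and the filter collapses to nothing, which would let every `deployment_status` event through.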

pkg/constants/tool_constants.go

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ var AllowedExpressions = []string{
 	"github.event.comment.id",
 	"github.event.deployment.id",
 	"github.event.deployment_status.id",
+	"github.event.deployment_status.state", // enum-like: "error", "failure", "success", "pending", "inactive", "in_progress", "queued", "waiting"
 	"github.event.head_commit.id",
 	"github.event.installation.id",
 	"github.event.issue.number",
