50 changes: 50 additions & 0 deletions .github/aw/github-agentic-workflows.md

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1,308 changes: 1,308 additions & 0 deletions .github/workflows/deployment-incident-monitor.lock.yml

Large diffs are not rendered by default.

59 changes: 59 additions & 0 deletions .github/workflows/deployment-incident-monitor.md
@@ -0,0 +1,59 @@
---
description: Monitors deployment failures and automatically creates deduplicated incident issues with root cause analysis.
on:
  deployment_status:
Collaborator

@copilot

  • Extend front matter support with a `state` field that declares the states, which are then compiled into the `if` expression.
  • Also add support in the `on: ...` short-form natural-language parser.

Contributor Author

Done in 3f33d38. Two additions were made:

1. `state:` field in frontmatter: `deployment_status` now accepts a `state:` property that is compiled into the activation job's `if:` condition automatically, with the `state:` lines commented out in the lock file:

on:
  deployment_status:
    state: [error, failure]   # → compiled to: github.event.deployment_status.state == 'error' || ...

2. Natural language shorthand — the trigger parser now handles deployment patterns:

on: "deployment failed or error"    # → deployment_status trigger with state condition
on: "deployment error"              # → single-state condition

The example workflow deployment-incident-monitor.md was updated to use state: [error, failure] instead of the explicit if: condition, and the TriggerIR.Conditions are now propagated into the frontmatter if: field for all NL trigger shorthands.

    state: [error, failure]
  skip-if-match: "is:issue is:open label:incident label:deployment-failure"
permissions:
  contents: read
  actions: read
  deployments: read
engine: copilot
tools:
  github:
    toolsets: [repos, actions]
safe-outputs:
  create-issue:
    expires: 7d
    title-prefix: "[Incident] "
    labels: [incident, deployment-failure]
    close-older-issues: true
  noop:
timeout-minutes: 10
---

# Deployment Incident Monitor

A deployment to **${{ github.event.deployment.environment }}** has failed with state `${{ github.event.deployment_status.state }}`.

## Your Task

Perform a root cause analysis of this deployment failure and create a focused incident issue.

## Deployment Context

- **Environment**: ${{ github.event.deployment.environment }}
- **Status**: ${{ github.event.deployment_status.state }}
- **Repository**: ${{ github.repository }}

## Investigation Steps

1. **Check for an existing open incident issue**: Look for open issues with both `incident` and `deployment-failure` labels. If one already exists for this environment and recent timeframe, call `noop` with a brief explanation.

2. **Gather context** using the available GitHub MCP tools:
   - Look up recent workflow runs and job logs in the `actions` toolset to identify what failed
   - Review recent commits to the deployed branch to identify changes that may have caused the failure
   - Check if there were any related CI failures preceding the deployment

3. **Create an incident issue** if no duplicate exists. The issue should include:
   - **Environment** and the deployment failure state
   - **Summary** of likely root cause based on available evidence
   - **Evidence**: relevant log excerpts, failing steps, or recent commits linked to the failure
   - **Suggested remediation** steps for the on-call team
   - A link to the failing deployment for quick access

## Output Guidelines

- Use `noop` if a duplicate open incident issue already exists.
- Keep the issue concise and actionable — focus on what the on-call engineer needs to know immediately.
- Do not create speculative issues; only create one when there is concrete evidence of a failure.
2 changes: 1 addition & 1 deletion .github/workflows/dev-hawk.lock.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions actions/setup/js/aw_context.cjs
@@ -107,6 +107,7 @@ function resolveItemContext(payload) {
* item_number: string,
* comment_id: string,
* comment_node_id: string,
* deployment_state: string,
* otel_trace_id: string,
* otel_parent_span_id: string
* }}
@@ -122,6 +123,10 @@ function resolveItemContext(payload) {
* Only populated for discussion/discussion_comment events. Can be passed
* as reply_to_id in add_comment to thread responses under the triggering
* comment when a dispatched specialist workflow replies to a discussion.
* - deployment_state: The deployment status state value (e.g. "failure", "error",
* "success") when the workflow was triggered by a deployment_status event.
* Empty string for all other event types. Propagated to child workflows via
* workflow_call so they can identify which state triggered the parent.
* - otel_trace_id: OTLP trace ID from the parent workflow's setup span.
* Empty string when OTLP is not configured or the parent setup step has
* not yet run. Used by child workflow setup steps to continue the same
@@ -150,6 +155,10 @@ function buildAwContext() {
    item_number,
    comment_id,
    comment_node_id,
    // deployment_state carries the GitHub deployment_status state value when the
    // triggering event is deployment_status. Empty string for all other events.
    // Propagated to called workflows so they can access the deployment state.
    deployment_state: context.eventName === "deployment_status" ? (context.payload?.deployment_status?.state ?? "") : "",
    // Propagate the current OTLP trace ID to dispatched child workflows so that
    // composite actions share the same trace as their parent. Empty string when
    // OTLP is not configured or the parent setup step has not run yet.
8 changes: 8 additions & 0 deletions actions/setup/js/generate_aw_info.cjs
@@ -86,6 +86,14 @@ async function main(core, ctx) {
    awInfo.cli_version = cliVersion;
  }

  // Include deployment_state when triggered by a deployment_status event.
  // This makes the deployment state available to the agent without requiring it to
  // read the raw event payload, and is propagated to child workflows via aw_context.
  const deploymentState = ctx.payload?.deployment_status?.state;
  if (deploymentState && typeof deploymentState === "string") {
    awInfo.deployment_state = deploymentState;
  }

  // Include custom token weights when set (engine.token-weights in workflow frontmatter).
  // Deep structure validation is intentionally minimal here: the JSON schema and Go parser
  // already validate the structure at compile time. We only verify the top-level type to
1 change: 1 addition & 0 deletions actions/setup/js/runtime_import.cjs
@@ -54,6 +54,7 @@ const ALLOWED_EXPRESSIONS = [
"github.event.comment.id",
"github.event.deployment.id",
"github.event.deployment_status.id",
"github.event.deployment_status.state",
"github.event.head_commit.id",
"github.event.installation.id",
"github.event.issue.number",
14 changes: 14 additions & 0 deletions actions/setup/js/send_otlp_span.cjs
@@ -512,6 +512,13 @@ async function sendJobSetupSpan(options = {}) {
  if (eventName) {
    attributes.push(buildAttr("gh-aw.event_name", eventName));
  }
  // Deployment state: prefer the env var (set from github.event.deployment_status.state
  // in the compiled workflow), fall back to aw_context propagation via awInfo.
  const deploymentStateSetup =
    process.env.GH_AW_GITHUB_EVENT_DEPLOYMENT_STATUS_STATE || (typeof awInfo.deployment_state === "string" ? awInfo.deployment_state : "") || (typeof awInfo.context?.deployment_state === "string" ? awInfo.context.deployment_state : "");
  if (deploymentStateSetup) {
    attributes.push(buildAttr("gh-aw.deployment.state", deploymentStateSetup));
  }
  attributes.push(buildAttr("gh-aw.staged", staged));

  const resourceAttributes = [buildAttr("github.repository", repository), buildAttr("github.run_id", runId)];
@@ -743,6 +750,13 @@ async function sendJobConclusionSpan(spanName, options = {}) {
  if (jobName) attributes.push(buildAttr("gh-aw.job.name", jobName));
  if (engineId) attributes.push(buildAttr("gh-aw.engine.id", engineId));
  if (eventName) attributes.push(buildAttr("gh-aw.event_name", eventName));
  // Deployment state: prefer the env var (set from github.event.deployment_status.state
  // in the compiled workflow), fall back to aw_info.deployment_state or aw_context propagation.
  const deploymentStateConclusion =
    process.env.GH_AW_GITHUB_EVENT_DEPLOYMENT_STATUS_STATE || (typeof awInfo.deployment_state === "string" ? awInfo.deployment_state : "") || (typeof awInfo.context?.deployment_state === "string" ? awInfo.context.deployment_state : "");
  if (deploymentStateConclusion) {
    attributes.push(buildAttr("gh-aw.deployment.state", deploymentStateConclusion));
  }
  attributes.push(buildAttr("gh-aw.staged", staged));
  if (!isNaN(effectiveTokens) && effectiveTokens > 0) {
    attributes.push(buildAttr("gh-aw.effective_tokens", effectiveTokens));
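The setup and conclusion span builders in `send_otlp_span.cjs` repeat the same three-step lookup for the deployment state. As a minimal sketch of that precedence (environment variable first, then `awInfo.deployment_state`, then `awInfo.context.deployment_state`), the chain could be expressed as one helper. The function name `resolveDeploymentState` and its `(env, awInfo)` signature are hypothetical and are not part of this PR.

```javascript
// Hypothetical helper (not in the PR): mirrors the fallback chain used inline
// by sendJobSetupSpan and sendJobConclusionSpan.
function resolveDeploymentState(env, awInfo) {
  // 1. Env var set by the compiled workflow from github.event.deployment_status.state.
  const fromEnv = env.GH_AW_GITHUB_EVENT_DEPLOYMENT_STATUS_STATE || "";
  // 2. Top-level value written by generate_aw_info.cjs.
  const fromInfo = typeof awInfo?.deployment_state === "string" ? awInfo.deployment_state : "";
  // 3. Value propagated from a parent workflow through aw_context.
  const fromContext = typeof awInfo?.context?.deployment_state === "string" ? awInfo.context.deployment_state : "";
  // The first non-empty source wins; an empty string means "not a deployment_status run".
  return fromEnv || fromInfo || fromContext;
}
```

Because every source falls back to an empty string, a caller can keep the existing `if (state) { ... }` guard before pushing the `gh-aw.deployment.state` attribute.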
@@ -0,0 +1,83 @@
# ADR-28549: Compile `deployment_status.state` Filter into GitHub Actions `if:` Condition

**Date**: 2026-04-26
**Status**: Draft
**Deciders**: Unknown (generated from PR diff — [PR #28549](https://github.com/github/gh-aw/pull/28549))

---

## Part 1 — Narrative (Human-Friendly)

### Context

The gh-aw compiler translates a higher-level Markdown-based workflow DSL into GitHub Actions YAML. The GitHub `deployment_status` event fires for every state change in an external deployment (pending, queued, in_progress, success, failure, error, inactive, waiting). For DevOps incident automation — the primary use-case for this trigger — only the terminal failure states (`error`, `failure`) are actionable, but GitHub Actions provides no native trigger-level filter for `deployment_status` by state. Without compiler support, workflow authors must write raw `if:` expressions manually, which is inconsistent with the DSL's abstraction level and causes agents to default to suboptimal triggers when generating workflows.

### Decision

We will add a `state:` field to the `deployment_status` trigger in the gh-aw DSL schema and compiler. When present, the compiler reads `on.deployment_status.state` (accepting a single string or an array) and synthesises the equivalent GitHub Actions expression (`github.event.deployment_status.state == 'error' || ...`), merging it into the job-level `if:` condition. The `state:` lines are commented out in the compiled lock file with an explanatory note. We will also introduce natural-language trigger shorthands (e.g., `"deployment failed"`, `"deployment failed or error"`) in `trigger_parser.go` that expand to the same `deployment_status` trigger with the appropriate `state` condition, enabling both the declarative YAML form and a concise prose form.
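To make the synthesis step concrete, here is a minimal sketch of turning a `state` value (a single string or an array) into the expression described above. The real compiler is written in Go and its internals are not shown in this diff, so the function name `stateFilterToExpression` is hypothetical and the code only illustrates the documented output shape.

```javascript
// Hypothetical illustration of the documented behaviour, not the gh-aw compiler.
function stateFilterToExpression(state) {
  // Accept either "failure" or ["error", "failure"].
  const states = Array.isArray(state) ? state : [state];
  if (states.length === 0) {
    throw new Error("state must contain at least one value");
  }
  // One equality test per state, joined with a logical OR.
  return states
    .map((s) => `github.event.deployment_status.state == '${s}'`)
    .join(" || ");
}

console.log(stateFilterToExpression(["error", "failure"]));
// → github.event.deployment_status.state == 'error' || github.event.deployment_status.state == 'failure'
```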

### Alternatives Considered

#### Alternative 1: Document the Pattern Without Compiler Changes

Add a canonical example using a manually written `if: github.event.deployment_status.state == 'failure'` expression and document the approach in the workflow guide, leaving the compiler unchanged.

This was not chosen because it keeps the filtering burden on workflow authors, is inconsistent with other trigger abstractions in the DSL (e.g., `issue.state`), and does not enable natural-language shorthands. Agents generating workflows from prose descriptions would still lack a declarative signal to use.

#### Alternative 2: Runtime Filtering Inside the Agent Prompt

Instead of compile-time condition synthesis, instruct the agent (via its system prompt or workflow description) to exit early when `github.event.deployment_status.state` is not a failure state.

This was not chosen because it consumes agent tokens on every non-failure deployment event, increases latency, and places correctness-critical control flow inside an LLM response rather than in deterministic compiled infrastructure. It also makes no-op runs indistinguishable from real activations in the audit log.

### Consequences

#### Positive
- Workflow authors can express state-filtered deployment triggers declaratively (`state: [error, failure]`), consistent with other DSL trigger filters.
- Natural-language shorthands (`on: "deployment failed or error"`) lower the barrier for DevOps automation, enabling agents to generate correct workflows from prose intent.
- Compile-time `if:` conditions prevent unnecessary agent invocations for non-failure events, reducing cost and noise.
- A canonical, compilable example (`deployment-incident-monitor.md`) gives teams a tested starting point.

#### Negative
- The hardcoded `state` enum (`error`, `failure`, `pending`, `success`, `inactive`, `in_progress`, `queued`, `waiting`) must be kept in sync with GitHub's deployment status API; additions or renames require a compiler update.
- Each new trigger type with semantic sub-fields (like `state:`) increases the surface area of the compiler's extraction logic, adding maintenance burden.
- The natural-language parser introduces implicit mappings (`"deployment failed"` → `state == 'failure'`) that are opaque unless documented; future contributors may not know the shorthand exists.

#### Neutral
- The `state:` lines are intentionally commented out in the compiled lock file, which may surprise contributors inspecting the generated YAML.
- `TriggerIR.Conditions` propagation through `schedule_preprocessing.go` is a prerequisite change that affects all future NL trigger shorthands, not just `deployment_status`.

---

## Part 2 — Normative Specification (RFC 2119)

> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

### Schema and Validation

1. The `deployment_status` trigger object **MUST** accept an optional `state` property that is either a single string or an array of strings.
2. Each value in `state` **MUST** be one of the enumerated GitHub deployment status values: `error`, `failure`, `pending`, `success`, `inactive`, `in_progress`, `queued`, `waiting`.
3. An unrecognised `state` value **SHOULD** produce a compiler warning and **MUST NOT** be silently ignored.
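The validation rules above can be sketched as a small check that never drops an unknown value silently. This is an illustration only: the names `DEPLOYMENT_STATES` and `validateStateField` are hypothetical, and in the PR itself the enumeration is enforced by the JSON schema rather than by code like this.

```javascript
// State values taken from the enum in this ADR.
const DEPLOYMENT_STATES = ["error", "failure", "pending", "success", "inactive", "in_progress", "queued", "waiting"];

// Hypothetical validator: returns a list of problems, empty when the value is valid.
function validateStateField(state) {
  const values = Array.isArray(state) ? state : [state];
  const problems = [];
  if (values.length === 0) {
    problems.push("state must list at least one value");
  }
  for (const value of values) {
    if (typeof value !== "string") {
      problems.push(`state values must be strings, got ${typeof value}`);
    } else if (!DEPLOYMENT_STATES.includes(value)) {
      // Report the unknown value instead of ignoring it.
      problems.push(`unrecognised deployment state: ${value}`);
    }
  }
  return problems;
}
```

Returning the problems as a list leaves the caller free to surface them as warnings or as hard errors.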

### Compilation

1. When `on.deployment_status.state` is present, the compiler **MUST** synthesise a GitHub Actions expression of the form `github.event.deployment_status.state == '<value>'`, joining multiple values with ` || `.
2. The synthesised expression **MUST** be merged into the job-level `if:` condition of the activation job.
3. The `state:` lines in the compiled lock file **MUST** be commented out with an explanatory note indicating that state filtering was compiled into the `if:` condition.
4. The compiled lock file **MUST NOT** include a native `deployment_status.state` filter under `on:`, as GitHub Actions does not support trigger-level state filtering for this event.
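Rules 2 and 3 can be sketched as two small text transformations. Several details here are assumptions, since the ADR does not spell them out: that merging means AND-ing the synthesised expression with any existing condition, the wording of the explanatory note, and that the input is only the `deployment_status` trigger block with `state:` written on a single line. Both function names are hypothetical.

```javascript
// Assumption: an existing condition is combined with the state filter using &&.
function mergeActivationCondition(existingIf, stateExpr) {
  if (!existingIf || existingIf.trim() === "") {
    return stateExpr;
  }
  return `(${existingIf}) && (${stateExpr})`;
}

// Assumption: single-line `state:` entries only; the note text is illustrative.
function commentOutStateLines(yamlText) {
  return yamlText
    .split("\n")
    .map((line) => {
      const match = line.match(/^(\s*)(state:.*)$/);
      // Keep the original indentation so the commented line stays aligned.
      return match ? `${match[1]}# ${match[2]} # compiled into the activation job's if: condition` : line;
    })
    .join("\n");
}
```

A block-style `state:` list spanning several lines would need a YAML-aware pass rather than this line-by-line rewrite.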

### Natural-Language Trigger Parsing

1. The natural-language trigger parser **MUST** recognise the phrase `"deployment failed"` and expand it to a `deployment_status` trigger with `state == 'failure'`.
2. The natural-language trigger parser **MUST** recognise the phrase `"deployment error"` and expand it to a `deployment_status` trigger with `state == 'error'`.
3. The natural-language trigger parser **MUST** recognise the phrase `"deployment failed or error"` (and semantically equivalent phrasings) and expand it to a `deployment_status` trigger with `state == 'failure' || state == 'error'`.
4. Natural-language expansions **MUST** produce conditions that are propagated through `TriggerIR.Conditions` into the frontmatter `if:` field.
5. New natural-language deployment shorthands **SHOULD** be added to this parser rather than handled inline in calling code.
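The three required phrases can be covered by a small table-driven parser. The sketch below is not the logic of `trigger_parser.go`, which this diff does not show: the word table, the `or` splitting, and the returned object shape are all assumptions chosen to reproduce the expansions listed above.

```javascript
// Assumed word-to-state table covering only the phrases named in this ADR.
const DEPLOYMENT_WORD_TO_STATE = new Map([
  ["failed", "failure"],
  ["error", "error"],
]);

// Hypothetical parser: returns null when the text is not a deployment shorthand.
function parseDeploymentShorthand(text) {
  const match = text.trim().toLowerCase().match(/^deployment\s+(.+)$/);
  if (!match) return null;
  const states = [];
  for (const word of match[1].split(/\s+or\s+/)) {
    const state = DEPLOYMENT_WORD_TO_STATE.get(word);
    if (state === undefined) return null; // unknown word: leave it to other parsers
    if (!states.includes(state)) states.push(state);
  }
  return {
    event: "deployment_status",
    states,
    condition: states.map((s) => `github.event.deployment_status.state == '${s}'`).join(" || "),
  };
}
```

Keeping the vocabulary in one table means a new shorthand is a one-line addition, which matches the intent of rule 5.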

### Conformance

An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.

---

*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/24955643779) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
1 change: 1 addition & 0 deletions pkg/constants/tool_constants.go
@@ -10,6 +10,7 @@ var AllowedExpressions = []string{
"github.event.comment.id",
"github.event.deployment.id",
"github.event.deployment_status.id",
"github.event.deployment_status.state", // enum-like: "error", "failure", "success", "pending", "inactive", "in_progress", "queued", "waiting"
"github.event.head_commit.id",
"github.event.installation.id",
"github.event.issue.number",
21 changes: 20 additions & 1 deletion pkg/parser/schemas/main_workflow_schema.json
@@ -1255,7 +1255,26 @@
       },
       {
         "type": "object",
-        "additionalProperties": false
+        "additionalProperties": false,
+        "properties": {
+          "state": {
+            "description": "Filter to specific deployment states (compiled into if condition). Use a string for one state or an array for multiple states.",
+            "oneOf": [
+              {
+                "type": "string",
+                "enum": ["error", "failure", "pending", "success", "inactive", "in_progress", "queued", "waiting"]
+              },
+              {
+                "type": "array",
+                "items": {
+                  "type": "string",
+                  "enum": ["error", "failure", "pending", "success", "inactive", "in_progress", "queued", "waiting"]
+                },
+                "minItems": 1
+              }
+            ]
+          }
+        }
       }
     ]
   },