feat: add deployment-incident-monitor example workflow and deployment_status state filter #28549
Merged: pelikhan merged 8 commits into main from copilot/add-deployment-status-workflow-example on Apr 26, 2026
Commits:
- 6a912c4 Initial plan (Copilot)
- cb97266 feat: add deployment-incident-monitor example workflow and allowed ex… (Copilot)
- 6cb86fd fix: address code review - use explicit 7d expires format and documen… (Copilot)
- 3f33d38 feat: add deployment_status state: field and natural language trigger… (Copilot)
- 161844b docs(adr): add draft ADR-28549 for deployment_status state filter com… (github-actions[bot])
- 71dd7ab fix: strip ${{ }} wrappers before merging if conditions; add conditio… (Copilot)
- f81c851 feat: guard state condition by event_name, add deployment_state to ag… (Copilot)
- f7c3d40 feat: store deployment_state in OTLP span attributes (Copilot)
.github/workflows/deployment-incident-monitor.lock.yml (1,308 changes: 1,308 additions & 0 deletions)
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| --- | ||
| description: Monitors deployment failures and automatically creates deduplicated incident issues with root cause analysis. | ||
| on: | ||
| deployment_status: | ||
| state: [error, failure] | ||
| skip-if-match: "is:issue is:open label:incident label:deployment-failure" | ||
| permissions: | ||
| contents: read | ||
| actions: read | ||
| deployments: read | ||
| engine: copilot | ||
| tools: | ||
| github: | ||
| toolsets: [repos, actions] | ||
| safe-outputs: | ||
| create-issue: | ||
| expires: 7d | ||
| title-prefix: "[Incident] " | ||
| labels: [incident, deployment-failure] | ||
| close-older-issues: true | ||
| noop: | ||
| timeout-minutes: 10 | ||
| --- | ||
| | ||
| # Deployment Incident Monitor | ||
| | ||
| A deployment to **${{ github.event.deployment.environment }}** has failed with state `${{ github.event.deployment_status.state }}`. | ||
| | ||
| ## Your Task | ||
| | ||
| Perform a root cause analysis of this deployment failure and create a focused incident issue. | ||
| | ||
| ## Deployment Context | ||
| | ||
| - **Environment**: ${{ github.event.deployment.environment }} | ||
| - **Status**: ${{ github.event.deployment_status.state }} | ||
| - **Repository**: ${{ github.repository }} | ||
| | ||
| ## Investigation Steps | ||
| | ||
| 1. **Check for an existing open incident issue**: Look for open issues with both `incident` and `deployment-failure` labels. If one already exists for this environment and recent timeframe, call `noop` with a brief explanation. | ||
| | ||
| 2. **Gather context** using the available GitHub MCP tools: | ||
| - Look up recent workflow runs and job logs in the `actions` toolset to identify what failed | ||
| - Review recent commits to the deployed branch to identify changes that may have caused the failure | ||
| - Check if there were any related CI failures preceding the deployment | ||
| | ||
| 3. **Create an incident issue** if no duplicate exists. The issue should include: | ||
| - **Environment** and the deployment failure state | ||
| - **Summary** of likely root cause based on available evidence | ||
| - **Evidence**: relevant log excerpts, failing steps, or recent commits linked to the failure | ||
| - **Suggested remediation** steps for the on-call team | ||
| - A link to the failing deployment for quick access | ||
| | ||
| ## Output Guidelines | ||
| | ||
| - Use `noop` if a duplicate open incident issue already exists. | ||
| - Keep the issue concise and actionable — focus on what the on-call engineer needs to know immediately. | ||
| - Do not create speculative issues; only create one when there is concrete evidence of a failure. | ||
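The deduplication rule in step 1 of the investigation steps (and in `skip-if-match`) can be pictured with a small Go sketch. It is illustrative only: the `Issue` type and the `hasOpenIncident` helper are hypothetical names, not part of gh-aw or the GitHub MCP tools, and in the real workflow the check is made by the agent and the compiled trigger guard rather than by code like this.

```go
package main

import "fmt"

// Issue is a hypothetical, simplified view of a GitHub issue.
type Issue struct {
	Title  string
	Open   bool
	Labels []string
}

// hasLabel reports whether the issue carries the given label.
func hasLabel(is Issue, want string) bool {
	for _, l := range is.Labels {
		if l == want {
			return true
		}
	}
	return false
}

// hasOpenIncident reports whether any open issue carries BOTH the
// "incident" and "deployment-failure" labels, mirroring the query
// "is:issue is:open label:incident label:deployment-failure".
func hasOpenIncident(issues []Issue) bool {
	for _, is := range issues {
		if is.Open && hasLabel(is, "incident") && hasLabel(is, "deployment-failure") {
			return true
		}
	}
	return false
}

func main() {
	issues := []Issue{
		{Title: "[Incident] staging deploy failed", Open: true,
			Labels: []string{"incident", "deployment-failure"}},
	}
	if hasOpenIncident(issues) {
		fmt.Println("noop: a duplicate incident is already open")
	} else {
		fmt.Println("create-issue")
	}
}
```

Requiring both labels, rather than either one, matches the search string in the frontmatter, where each `label:` term narrows the result.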
docs/adr/28549-compile-deployment-status-state-filter-into-if-condition.md (83 changes: 83 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| # ADR-28549: Compile `deployment_status.state` Filter into GitHub Actions `if:` Condition | ||
| | ||
| **Date**: 2026-04-26 | ||
| **Status**: Draft | ||
| **Deciders**: Unknown (generated from PR diff — [PR #28549](https://github.com/github/gh-aw/pull/28549)) | ||
| | ||
| --- | ||
| | ||
| ## Part 1 — Narrative (Human-Friendly) | ||
| | ||
| ### Context | ||
| | ||
| The gh-aw compiler translates a higher-level Markdown-based workflow DSL into GitHub Actions YAML. The GitHub `deployment_status` event fires for every state change in an external deployment (pending, queued, in_progress, success, failure, error, inactive, waiting). For DevOps incident automation — the primary use-case for this trigger — only the terminal failure states (`error`, `failure`) are actionable, but GitHub Actions provides no native trigger-level filter for `deployment_status` by state. Without compiler support, workflow authors must write raw `if:` expressions manually, which is inconsistent with the DSL's abstraction level and causes agents to default to suboptimal triggers when generating workflows. | ||
| | ||
| ### Decision | ||
| | ||
| We will add a `state:` field to the `deployment_status` trigger in the gh-aw DSL schema and compiler. When present, the compiler reads `on.deployment_status.state` (accepting a single string or an array) and synthesises the equivalent GitHub Actions expression (`github.event.deployment_status.state == 'error' || ...`), merging it into the job-level `if:` condition. The `state:` lines are commented out in the compiled lock file with an explanatory note. We will also introduce natural-language trigger shorthands (e.g., `"deployment failed"`, `"deployment failed or error"`) in `trigger_parser.go` that expand to the same `deployment_status` trigger with the appropriate `state` condition, enabling both the declarative YAML form and a concise prose form. | ||
| | ||
| ### Alternatives Considered | ||
| | ||
| #### Alternative 1: Document the Pattern Without Compiler Changes | ||
| | ||
| Add a canonical example using a manually written `if: github.event.deployment_status.state == 'failure'` expression and document the approach in the workflow guide, leaving the compiler unchanged. | ||
| | ||
| This was not chosen because it keeps the filtering burden on workflow authors, is inconsistent with other trigger abstractions in the DSL (e.g., `issue.state`), and does not enable natural-language shorthands. Agents generating workflows from prose descriptions would still lack a declarative signal to use. | ||
| | ||
| #### Alternative 2: Runtime Filtering Inside the Agent Prompt | ||
| | ||
| Instead of compile-time condition synthesis, instruct the agent (via its system prompt or workflow description) to exit early when `github.event.deployment_status.state` is not a failure state. | ||
| | ||
| This was not chosen because it consumes agent tokens on every non-failure deployment event, increases latency, and places correctness-critical control flow inside an LLM response rather than in deterministic compiled infrastructure. It also makes no-op runs indistinguishable from real activations in the audit log. | ||
| | ||
| ### Consequences | ||
| | ||
| #### Positive | ||
| - Workflow authors can express state-filtered deployment triggers declaratively (`state: [error, failure]`), consistent with other DSL trigger filters. | ||
| - Natural-language shorthands (`on: "deployment failed or error"`) lower the barrier for DevOps automation, enabling agents to generate correct workflows from prose intent. | ||
| - Compile-time `if:` conditions prevent unnecessary agent invocations for non-failure events, reducing cost and noise. | ||
| - A canonical, compilable example (`deployment-incident-monitor.md`) gives teams a tested starting point. | ||
| | ||
| #### Negative | ||
| - The hardcoded `state` enum (`error`, `failure`, `pending`, `success`, `inactive`, `in_progress`, `queued`, `waiting`) must be kept in sync with GitHub's deployment status API; additions or renames require a compiler update. | ||
| - Each new trigger type with semantic sub-fields (like `state:`) increases the surface area of the compiler's extraction logic, adding maintenance burden. | ||
| - The natural-language parser introduces implicit mappings (`"deployment failed"` → `state == 'failure'`) that are opaque unless documented; future contributors may not know the shorthand exists. | ||
| | ||
| #### Neutral | ||
| - The `state:` lines are intentionally commented out in the compiled lock file, which may surprise contributors inspecting the generated YAML. | ||
| - `TriggerIR.Conditions` propagation through `schedule_preprocessing.go` is a prerequisite change that affects all future NL trigger shorthands, not just `deployment_status`. | ||
| | ||
| --- | ||
| | ||
| ## Part 2 — Normative Specification (RFC 2119) | ||
| | ||
| > The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119). | ||
| | ||
| ### Schema and Validation | ||
| | ||
| 1. The `deployment_status` trigger object **MUST** accept an optional `state` property that is either a single string or an array of strings. | ||
| 2. Each value in `state` **MUST** be one of the enumerated GitHub deployment status values: `error`, `failure`, `pending`, `success`, `inactive`, `in_progress`, `queued`, `waiting`. | ||
| 3. An unrecognised `state` value **SHOULD** produce a compiler warning and **MUST NOT** be silently ignored. | ||
| | ||
| ### Compilation | ||
| | ||
| 1. When `on.deployment_status.state` is present, the compiler **MUST** synthesise a GitHub Actions expression of the form `github.event.deployment_status.state == '<value>'`, joining multiple values with ` || `. | ||
| 2. The synthesised expression **MUST** be merged into the job-level `if:` condition of the activation job. | ||
| 3. The `state:` lines in the compiled lock file **MUST** be commented out with an explanatory note indicating that state filtering was compiled into the `if:` condition. | ||
| 4. The compiled lock file **MUST NOT** include a native `deployment_status.state` filter under `on:`, as GitHub Actions does not support trigger-level state filtering for this event. | ||
| | ||
| ### Natural-Language Trigger Parsing | ||
| | ||
| 1. The natural-language trigger parser **MUST** recognise the phrase `"deployment failed"` and expand it to a `deployment_status` trigger with `state == 'failure'`. | ||
| 2. The natural-language trigger parser **MUST** recognise the phrase `"deployment error"` and expand it to a `deployment_status` trigger with `state == 'error'`. | ||
| 3. The natural-language trigger parser **MUST** recognise the phrase `"deployment failed or error"` (and semantically equivalent phrasings) and expand it to a `deployment_status` trigger with `state == 'failure' || state == 'error'`. | ||
| 4. Natural-language expansions **MUST** produce conditions that are propagated through `TriggerIR.Conditions` into the frontmatter `if:` field. | ||
| 5. New natural-language deployment shorthands **SHOULD** be added to this parser rather than handled inline in calling code. | ||
| | ||
| ### Conformance | ||
| | ||
| An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance. | ||
| | ||
| --- | ||
| | ||
| *This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/24955643779) workflow. The PR author must review, complete, and finalize this document before the PR can merge.* |
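The Schema and Compilation rules in the ADR can be sketched in Go roughly as follows. This is not the gh-aw compiler code: `buildStateCondition`, `stripWrapper`, and `mergeIf` are hypothetical names, returning an error for an unknown state is an assumption (the ADR only asks that it not be silently ignored), and joining an existing condition with `&&` and parentheses is likewise assumed rather than taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// validStates mirrors the state enum listed in the ADR.
var validStates = map[string]bool{
	"error": true, "failure": true, "pending": true, "success": true,
	"inactive": true, "in_progress": true, "queued": true, "waiting": true,
}

// buildStateCondition turns a list of states into a GitHub Actions
// expression, joining one equality test per state with " || ".
// Assumption: unknown states are rejected with an error.
func buildStateCondition(states []string) (string, error) {
	if len(states) == 0 {
		return "", fmt.Errorf("state list is empty")
	}
	parts := make([]string, 0, len(states))
	for _, s := range states {
		if !validStates[s] {
			return "", fmt.Errorf("unrecognised deployment_status state %q", s)
		}
		parts = append(parts,
			fmt.Sprintf("github.event.deployment_status.state == '%s'", s))
	}
	return strings.Join(parts, " || "), nil
}

// stripWrapper removes a surrounding ${{ ... }} from an expression, if present.
func stripWrapper(expr string) string {
	e := strings.TrimSpace(expr)
	if strings.HasPrefix(e, "${{") && strings.HasSuffix(e, "}}") {
		e = strings.TrimSpace(e[3 : len(e)-2])
	}
	return e
}

// mergeIf combines an existing if: expression with the synthesised one.
// Assumption: the two are ANDed, each side parenthesised.
func mergeIf(existing, cond string) string {
	existing = stripWrapper(existing)
	if existing == "" {
		return cond
	}
	return fmt.Sprintf("(%s) && (%s)", existing, cond)
}

func main() {
	cond, err := buildStateCondition([]string{"error", "failure"})
	if err != nil {
		panic(err)
	}
	// The repository name below is a made-up placeholder.
	fmt.Println(mergeIf("${{ github.repository == 'octo/app' }}", cond))
}
```

Parenthesising both sides keeps the `||` chain from binding incorrectly against an author-written condition, which is the main risk when two expressions are concatenated.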
@copilot
Done in 3f33d38. Two additions were made:

1. `state:` field in frontmatter: `deployment_status` now accepts a `state:` property that is compiled into the activation job's `if:` condition automatically, with the lines commented out in the lock file.
2. Natural language shorthand: the trigger parser now handles deployment patterns.

The example workflow `deployment-incident-monitor.md` was updated to use `state: [error, failure]` instead of the explicit `if:` condition, and the `TriggerIR.Conditions` are now propagated into the frontmatter `if:` field for all NL trigger shorthands.
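
As a rough illustration of the natural-language shorthand, the Go sketch below maps the three phrases named in the ADR to a `deployment_status` trigger with the matching states. It does not reproduce `trigger_parser.go`: the `TriggerSketch` type, the `shorthands` table, and `parseDeploymentShorthand` are hypothetical, and the case and whitespace normalisation is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// TriggerSketch is a hypothetical stand-in for the compiler's trigger IR.
type TriggerSketch struct {
	Event  string
	States []string
}

// shorthands lists the phrases named in the ADR and the states they imply.
var shorthands = map[string][]string{
	"deployment failed":          {"failure"},
	"deployment error":           {"error"},
	"deployment failed or error": {"failure", "error"},
}

// parseDeploymentShorthand looks up a phrase after lower-casing it and
// collapsing runs of whitespace (an assumed normalisation). The boolean
// result is false when the phrase is not a known shorthand.
func parseDeploymentShorthand(phrase string) (TriggerSketch, bool) {
	key := strings.Join(strings.Fields(strings.ToLower(phrase)), " ")
	states, ok := shorthands[key]
	if !ok {
		return TriggerSketch{}, false
	}
	return TriggerSketch{Event: "deployment_status", States: states}, true
}

func main() {
	trig, ok := parseDeploymentShorthand("Deployment failed or error")
	fmt.Println(trig.Event, trig.States, ok)
}
```

A fixed lookup table keeps the implicit mappings in one place, which speaks to the ADR's concern that the shorthands are opaque unless documented.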