specs: daily SPDD work plan 2026-05-17 — safeguards, DriftRecord schema, sync notes, conformance tests, lifecycle sections by Copilot · Pull Request #32844 · github/gh-aw

Copilot · 2026-05-17T16:17:34Z

Five spec files had thin or missing Safeguards sections, no entity schemas for internal data structures, absent Sync Notes, and incomplete conformance test coverage. This PR addresses all 10 tasks from the daily SPDD rotation (2 P0, 3 P1, 5 P2).

`specs/aw-harness.md`

§11.4 Degraded Mode & Safeguards — 4 numbered MUST/SHOULD statements covering budget-exhaustion orderly shutdown, non-fatal observability failure behavior, fail-secure exit-code contract (0/1/2), and step-summary degraded-mode annotation
## Sync Notes — 6-row table mapping spec sections to actions/setup/js/aw_harness.cjs, pkg/workflow/, actions/setup/js/, and pkg/cli/workflows/

`specs/awf-config-sources-spec.md`

§4.5 DriftRecord Entity Schema — formal JSON Schema block with property_path, drift_category (enum: missing_in_ghaw | missing_in_schema | spec_mismatch), suggested_action, detected_at; field reference table and usage example with corrective PR embedding requirement
CR-06a — escalation-owner assignment rule: default owner is last maintainer to touch the drifted property's implementation file; must acknowledge within 1 business day; issue must not be left unassigned

`specs/compiler-threat-detection-spec.md`

§6.4 False-Positive Handling — suppression via frontmatter threat-detection-suppress annotation (mandatory reason, optional expires), lock-file audit trail requirement, 10-business-day SLA with expired-suppression enforcement
§6.5 Threat Category Lifecycle — 3 normative stages: Experimental (prototype, no prod failures) → Candidate (feature-flagged, one release cycle, evidence collection) → Normative (full PR: rule + mapping + test + changelog, security maintainer review required)

`specs/safe-output-outcome-evaluation.md`

### Conformance Test Table — one row per all 29 safe-output types with ghaw.outcome.type OTel attribute, pass condition, and fail condition
### OTel Backend Unavailability — 2 MUST statements: persist outcome to local audit log before OTLP export attempt; exponential back-off retry (5s→5min, max 5 attempts) with export_failed flag on exhaustion

`specs/security-architecture-spec.md`

## Sync Notes — cross-references specs/security-architecture-spec-validation.md; defines 4 revalidation triggers (minor version bump, MUST change, CVE attribution, implementation change)
T-SG07-001 / T-SG07-002 — fail-secure compliance tests for SG-07 added to §12.2.7: schema validation error → compile failure + no lock file written; MUST-level requirement violation → compile failure + rule ID in stderr. Both added to compliance checklist at Level 1 Required.

…riftRecord schema, sync notes, conformance tests, and lifecycle sections Agent-Logs-Url: https://github.com/github/gh-aw/sessions/10531a0b-f21b-4ae0-8c29-d02fef519698 Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

Copilot

Pull request overview

This PR updates several specifications with safeguards, schema definitions, sync notes, lifecycle guidance, and conformance coverage for gh-aw workflows and related security/telemetry behavior.

Changes:

Adds degraded-mode, fail-secure, false-positive, and lifecycle requirements across harness and threat-detection specs.
Adds DriftRecord schema and escalation-owner rules for AWF config drift.
Adds sync notes and conformance/compliance test tables for security architecture and safe-output outcome evaluation.

Show a summary per file

File	Description
`specs/aw-harness.md`	Adds degraded-mode safeguards and implementation sync notes for the AW harness.
`specs/awf-config-sources-spec.md`	Adds escalation ownership rules and a DriftRecord JSON schema/usage section.
`specs/compiler-threat-detection-spec.md`	Adds false-positive suppression handling and threat-category lifecycle stages.
`specs/safe-output-outcome-evaluation.md`	Adds safe-output conformance table and OTLP backend unavailability requirements.
`specs/security-architecture-spec.md`	Adds SG-07 fail-secure tests, checklist coverage, and validation sync notes.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (16)

specs/safe-output-outcome-evaluation.md:649

This row treats an open issue as satisfying the accepted pass condition, but Section 2 classifies open issues as pending when engaged or ignored when there are no human comments; accepted is only state_reason == completed. The conformance test would assert the wrong outcome.

| `create_issue` | `create_issue` | Issue exists in open state, or was closed by human action (not bot policy) within the evaluation window | Issue closed-as-not-planned by human within the evaluation window, or deleted |

specs/safe-output-outcome-evaluation.md:650

This row says a comment merely existing is the accepted pass condition, while Section 3 requires replies or reactions for accepted and classifies an existing comment with no engagement as ignored. This weakens the conformance test relative to the evaluator definition.

| `add_comment` | `add_comment` | Comment exists on the target object at evaluation time | Comment was deleted or hidden by a human (non-bot) actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:651

This row accepts the outcome when at least one applied label remains, but Section 4 requires all added labels to remain and marks partial removal as rejected. The conformance test would incorrectly pass partial label rollback.

| `add_labels` | `add_labels` | At least one of the bot-applied labels is still present on the target object at evaluation time | All bot-applied labels were removed by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:652

This row treats a still-pending reviewer request as accepted, but Section 5 requires the assigned reviewer to submit a review for accepted; no review while the PR is open is pending. The pass condition should not include the requested-reviewer state by itself.

| `add_reviewer` | `add_reviewer` | Requested reviewer is still listed as a requested reviewer, or has already submitted a review | Reviewer request was removed by a human actor before any review was submitted |

specs/safe-output-outcome-evaluation.md:655

This conformance row uses accepted for any issue that remains closed, but Section 8 classifies bot/lifecycle closes as lifecycle_close and non-lifecycle closes as rejected; it does not define a closed issue as accepted. This conflicts with the existing outcome taxonomy.

| `close_issue` | `close_issue` | Issue remains closed at evaluation time | Issue was reopened by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:656

This row treats a PR remaining closed as accepted, but Section 9 classifies a still-closed PR using lifecycle/non-lifecycle logic and marks it lifecycle_close or rejected, not accepted. The conformance pass condition contradicts the evaluator definition.

| `close_pull_request` | `close_pull_request` | PR remains closed (not merged) at evaluation time | PR was reopened or merged after the bot closed it within the evaluation window |

specs/safe-output-outcome-evaluation.md:658

This row says a discussion merely existing is sufficient for accepted, but Section 11 requires replies or an accepted answer; a discussion with no replies and no answer is ignored. The conformance test would over-count successful discussion creations.

| `create_discussion` | `create_discussion` | Discussion exists and has not been deleted or locked within the evaluation window | Discussion was deleted or permanently locked (preventing any responses) within the evaluation window |

specs/safe-output-outcome-evaluation.md:660

This row treats the review comment's existence as the accepted condition, while Section 13 requires the thread to be resolved or have replies and otherwise classifies it as ignored. The conformance table should match that engagement-based outcome.

| `create_pull_request_review_comment` | `create_pull_request_review_comment` | Review comment exists on the PR diff at evaluation time | Review comment was deleted by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:661

This row accepts any submitted PR review record, but Section 14 defines accepted based on follow-up commits or the PR being merged; a review that simply exists with no follow-up remains pending. This conformance condition would mark unresolved review feedback as accepted.

| `submit_pull_request_review` | `submit_pull_request_review` | PR review record exists with the submitted state (APPROVED, CHANGES_REQUESTED, COMMENT) at evaluation time | Review was dismissed by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:662

This row defines a reply comment existing as accepted, but Section 15 only accepts the outcome when the thread is resolved after the reply or others continue the conversation; no further activity is ignored. The conformance row drops that required engagement signal.

| `reply_to_pull_request_review_comment` | `reply_to_pull_request_review_comment` | Reply comment exists in the review thread at evaluation time | Reply comment was deleted by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:665

This row accepts any PR that is no longer draft, but Section 18 requires at least one review submitted after marking ready for review; a ready PR with no reviews is pending or ignored depending on state. This conformance test would pass too early.

| `mark_pull_request_as_ready_for_review` | `mark_pull_request_as_ready_for_review` | PR is no longer in draft state at evaluation time | PR was converted back to draft by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:666

This row accepts the assignment record itself, but Section 19 bases accepted on the agent producing a merged PR or the issue being resolved; no agent PR is ignored and an open agent PR is pending. This pass condition measures dispatch rather than outcome.

| `assign_to_agent` | `assign_to_agent` | Assignment record exists on the target issue/PR at evaluation time | Assignment was removed by a human actor before the assigned agent acted on it |

specs/safe-output-outcome-evaluation.md:667

This row accepts dispatched workflow runs that reach either success or failure, but Section 20 explicitly classifies conclusion == "failure" as rejected. A failed workflow run should be the fail condition, not part of the pass condition.

| `dispatch_workflow` | `dispatch_workflow` | The dispatched workflow run exists and reached a terminal state (success or failure) within the evaluation window | The dispatched workflow run was cancelled before reaching a terminal state; or no corresponding run record is found |

specs/safe-output-outcome-evaluation.md:668

This row treats dismissed code scanning alerts as accepted, but Section 21 classifies alert dismissal as rejected and only fixed alerts or merged linked PRs as accepted. The conformance row contradicts the evaluator rules.

| `autofix_code_scanning_alert` | `autofix_code_scanning_alert` | Code scanning alert is in a fixed or dismissed state at evaluation time | Alert was re-opened or the fix commit was reverted within the evaluation window |

specs/safe-output-outcome-evaluation.md:669

This row accepts the mere existence of a created code scanning alert, while Section 22 only treats fixed or dismissed-with-reason alerts as accepted and leaves open alerts pending. The conformance condition would mark untriaged alerts as successful.

| `create_code_scanning_alert` | `create_code_scanning_alert` | Alert record exists in the repository's code scanning results at evaluation time | Alert was immediately dismissed (within the evaluation window) with no investigation action |

specs/safe-output-outcome-evaluation.md:664

This row accepts a pushed commit that is still present in the PR branch, but Section 17 classifies an open PR as pending and only PR merged as accepted (with closed/reverted cases rejected). Presence in branch history is not sufficient for acceptance under the existing evaluator rules.

| `push_to_pull_request_branch` | `push_to_pull_request_branch` | The pushed commit SHA is still present in the PR branch history at evaluation time | The commit was force-pushed out of the branch history by a human actor within the evaluation window |

Files reviewed: 5/5 changed files
Comments generated: 18

+
+| Output type | Expected `ghaw.outcome.type` OTel attribute | Pass condition | Fail condition |
+|---|---|---|---|
+| `create_pull_request` | `create_pull_request` | PR exists in open or merged state; was not closed-as-not-planned or reverted within the evaluation window | PR closed-as-not-planned, reverted, or deleted within the evaluation window |


+| `noop` | `noop` | Evaluation is skipped; no outcome is computed | N/A — `noop` always results in `ignored` |
+| `missing_tool` | `missing_tool` | Evaluation is skipped; no outcome is computed | N/A — `missing_tool` always results in `ignored` |


+    "property_path": {
+      "type": "string",
+      "description": "Dot-notation path to the drifted configuration property (e.g., 'apiProxy.anthropicAutoCache').",
+      "examples": ["apiProxy.anthropicAutoCache", "container.dockerHostPathPrefix"]


+    "drift_category": {
+      "type": "string",
+      "enum": ["missing_in_ghaw", "missing_in_schema", "spec_mismatch"],
+      "description": "Classification of the drift condition. 'missing_in_ghaw': property exists in canonical schema but gh-aw has no coverage. 'missing_in_schema': gh-aw generates a field not present in either schema. 'spec_mismatch': CLI mapping in gh-aw disagrees with the normative spec description."


+| `property_path` | `string` | **MUST** | Dot-notation config property path (e.g., `apiProxy.anthropicAutoCache`) |
+| `drift_category` | `enum` | **MUST** | One of `missing_in_ghaw`, `missing_in_schema`, or `spec_mismatch` (see Section 4.2, Step 4) |
+| `suggested_action` | `string` | **MUST** | Actionable remediation text; **MUST NOT** be empty |
+| `detected_at` | `string` (ISO 8601) | **MUST** | UTC timestamp of detection; filesystem-safe format **SHOULD** use `YYYY-MM-DDTHH:MM:SSZ` |


+
+#### 4.5.3 Usage
+
+The drift detection procedure (Section 4.2, Step 5) **MUST** produce a list of zero or more `DriftRecord` objects. When any record has `drift_category` of `missing_in_ghaw` or `spec_mismatch`, the detecting automation **MUST** open a corrective PR (CR-05) and, if the SLA window is exceeded, an escalation issue (CR-06). The corrective PR description **MUST** embed the full `DriftRecord` list as JSON.


+    "detected_at": {
+      "type": "string",
+      "format": "date-time",
+      "description": "ISO 8601 timestamp (UTC) when this drift item was first detected in the current run."
+    }


+
+---
+
+## Sync Notes


+
+The table below specifies one conformance test row per safe-output type. Each row defines the expected OTel attribute value emitted by a correct evaluator, the pass condition (what must be true for `accepted`), and the fail condition (what signals `rejected`). Implementations **MUST** satisfy the pass condition and **MUST** not emit `accepted` when the fail condition is observed.
+
+| Output type | Expected `ghaw.outcome.type` OTel attribute | Pass condition | Fail condition |


+
+2. **Audit trail requirement**: Every active suppression annotation **MUST** be recorded in the compiled lock file (`.lock.yml`) manifest section so that reviewers can audit which rules are suppressed and why. The lock file **MUST** include the full `rule`, `reason`, and `expires` values for each suppression. Suppressions absent from the lock file manifest **MUST** be treated by subsequent compilations as unapproved and re-evaluated against the current CTR rule.
+
+3. **SLA for resolution**: Suppressions marked as false positives that affect a `MUST`-level security control (as defined in Section 5.1 — specifically those rules whose compiler action is `reject` in non-strict mode) **SHOULD** be resolved within **10 business days** — either by confirming the suppression is correct and updating the rule's detection logic to eliminate the false positive, or by removing the suppression when the workflow is corrected. The daily optimizer **SHOULD** surface unresolved suppressions older than 10 business days in its daily output. A suppression **MUST** be re-evaluated and explicitly renewed if the `expires` date passes; expired suppressions **MUST** be treated by the compiler as if they do not exist.


Initial plan

f88e48c

Copilot AI assigned Copilot and gh-aw-bot May 17, 2026

Copilot started work on behalf of gh-aw-bot May 17, 2026 16:17 View session

Copilot AI linked an issue May 17, 2026 that may be closed by this pull request

[spdd] Daily spec work plan - 2026-05-17 #32843

Closed

11 tasks

Copilot AI changed the title ~~[WIP] Review daily SPDD specs for improvements~~ specs: daily SPDD work plan 2026-05-17 — safeguards, DriftRecord schema, sync notes, conformance tests, lifecycle sections May 17, 2026

pelikhan approved these changes May 17, 2026

View reviewed changes

Merge branch 'main' into copilot/review-daily-spec-work-plan

bb41d98

pelikhan marked this pull request as ready for review May 17, 2026 16:29

Copilot AI review requested due to automatic review settings May 17, 2026 16:29

pelikhan merged commit e497729 into main May 17, 2026
1 check passed

pelikhan deleted the copilot/review-daily-spec-work-plan branch May 17, 2026 16:29

Copilot stopped work on behalf of gh-aw-bot due to an error May 17, 2026 16:29
The session was cancelled by the user.

Copilot AI requested a review from gh-aw-bot May 17, 2026 16:29

Copilot started reviewing on behalf of pelikhan May 17, 2026 16:30 View session

Copilot AI reviewed May 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

specs: daily SPDD work plan 2026-05-17 — safeguards, DriftRecord schema, sync notes, conformance tests, lifecycle sections#32844

specs: daily SPDD work plan 2026-05-17 — safeguards, DriftRecord schema, sync notes, conformance tests, lifecycle sections#32844
pelikhan merged 3 commits into
mainfrom
copilot/review-daily-spec-work-plan

Copilot AI commented May 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		\| `noop` \| `noop` \| Evaluation is skipped; no outcome is computed \| N/A — `noop` always results in `ignored` \|
		\| `missing_tool` \| `missing_tool` \| Evaluation is skipped; no outcome is computed \| N/A — `missing_tool` always results in `ignored` \|


		#### 4.5.3 Usage

		The drift detection procedure (Section 4.2, Step 5) MUST produce a list of zero or more `DriftRecord` objects. When any record has `drift_category` of `missing_in_ghaw` or `spec_mismatch`, the detecting automation MUST open a corrective PR (CR-05) and, if the SLA window is exceeded, an escalation issue (CR-06). The corrective PR description MUST embed the full `DriftRecord` list as JSON.


		The table below specifies one conformance test row per safe-output type. Each row defines the expected OTel attribute value emitted by a correct evaluator, the pass condition (what must be true for `accepted`), and the fail condition (what signals `rejected`). Implementations MUST satisfy the pass condition and MUST not emit `accepted` when the fail condition is observed.

		\| Output type \| Expected `ghaw.outcome.type` OTel attribute \| Pass condition \| Fail condition \|


		2. Audit trail requirement: Every active suppression annotation MUST be recorded in the compiled lock file (`.lock.yml`) manifest section so that reviewers can audit which rules are suppressed and why. The lock file MUST include the full `rule`, `reason`, and `expires` values for each suppression. Suppressions absent from the lock file manifest MUST be treated by subsequent compilations as unapproved and re-evaluated against the current CTR rule.

		3. SLA for resolution: Suppressions marked as false positives that affect a `MUST`-level security control (as defined in Section 5.1 — specifically those rules whose compiler action is `reject` in non-strict mode) SHOULD be resolved within 10 business days — either by confirming the suppression is correct and updating the rule's detection logic to eliminate the false positive, or by removing the suppression when the workflow is corrected. The daily optimizer SHOULD surface unresolved suppressions older than 10 business days in its daily output. A suppression MUST be re-evaluated and explicitly renewed if the `expires` date passes; expired suppressions MUST be treated by the compiler as if they do not exist.

Conversation

Copilot AI commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

specs/aw-harness.md

specs/awf-config-sources-spec.md

specs/compiler-threat-detection-spec.md

specs/safe-output-outcome-evaluation.md

specs/security-architecture-spec.md

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Copilot AI commented May 17, 2026 •

edited

Loading

`specs/aw-harness.md`

`specs/awf-config-sources-spec.md`

`specs/compiler-threat-detection-spec.md`

`specs/safe-output-outcome-evaluation.md`

`specs/security-architecture-spec.md`