specs: daily SPDD work plan 2026-05-17 — safeguards, DriftRecord schema, sync notes, conformance tests, lifecycle sections#32844
Conversation
…riftRecord schema, sync notes, conformance tests, and lifecycle sections Agent-Logs-Url: https://github.com/github/gh-aw/sessions/10531a0b-f21b-4ae0-8c29-d02fef519698 Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates several specifications with safeguards, schema definitions, sync notes, lifecycle guidance, and conformance coverage for gh-aw workflows and related security/telemetry behavior.
Changes:
- Adds degraded-mode, fail-secure, false-positive, and lifecycle requirements across harness and threat-detection specs.
- Adds DriftRecord schema and escalation-owner rules for AWF config drift.
- Adds sync notes and conformance/compliance test tables for security architecture and safe-output outcome evaluation.
Show a summary per file
| File | Description |
|---|---|
specs/aw-harness.md |
Adds degraded-mode safeguards and implementation sync notes for the AW harness. |
specs/awf-config-sources-spec.md |
Adds escalation ownership rules and a DriftRecord JSON schema/usage section. |
specs/compiler-threat-detection-spec.md |
Adds false-positive suppression handling and threat-category lifecycle stages. |
specs/safe-output-outcome-evaluation.md |
Adds safe-output conformance table and OTLP backend unavailability requirements. |
specs/security-architecture-spec.md |
Adds SG-07 fail-secure tests, checklist coverage, and validation sync notes. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (16)
specs/safe-output-outcome-evaluation.md:649
- This row treats an open issue as satisfying the
acceptedpass condition, but Section 2 classifies open issues aspendingwhen engaged orignoredwhen there are no human comments;acceptedis onlystate_reason == completed. The conformance test would assert the wrong outcome.
| `create_issue` | `create_issue` | Issue exists in open state, or was closed by human action (not bot policy) within the evaluation window | Issue closed-as-not-planned by human within the evaluation window, or deleted |
specs/safe-output-outcome-evaluation.md:650
- This row says a comment merely existing is the
acceptedpass condition, while Section 3 requires replies or reactions foracceptedand classifies an existing comment with no engagement asignored. This weakens the conformance test relative to the evaluator definition.
| `add_comment` | `add_comment` | Comment exists on the target object at evaluation time | Comment was deleted or hidden by a human (non-bot) actor within the evaluation window |
specs/safe-output-outcome-evaluation.md:651
- This row accepts the outcome when at least one applied label remains, but Section 4 requires all added labels to remain and marks partial removal as
rejected. The conformance test would incorrectly pass partial label rollback.
| `add_labels` | `add_labels` | At least one of the bot-applied labels is still present on the target object at evaluation time | All bot-applied labels were removed by a human actor within the evaluation window |
specs/safe-output-outcome-evaluation.md:652
- This row treats a still-pending reviewer request as
accepted, but Section 5 requires the assigned reviewer to submit a review foraccepted; no review while the PR is open ispending. The pass condition should not include the requested-reviewer state by itself.
| `add_reviewer` | `add_reviewer` | Requested reviewer is still listed as a requested reviewer, or has already submitted a review | Reviewer request was removed by a human actor before any review was submitted |
specs/safe-output-outcome-evaluation.md:655
- This conformance row uses
acceptedfor any issue that remains closed, but Section 8 classifies bot/lifecycle closes aslifecycle_closeand non-lifecycle closes asrejected; it does not define a closed issue asaccepted. This conflicts with the existing outcome taxonomy.
| `close_issue` | `close_issue` | Issue remains closed at evaluation time | Issue was reopened by a human actor within the evaluation window |
specs/safe-output-outcome-evaluation.md:656
- This row treats a PR remaining closed as
accepted, but Section 9 classifies a still-closed PR using lifecycle/non-lifecycle logic and marks itlifecycle_closeorrejected, notaccepted. The conformance pass condition contradicts the evaluator definition.
| `close_pull_request` | `close_pull_request` | PR remains closed (not merged) at evaluation time | PR was reopened or merged after the bot closed it within the evaluation window |
specs/safe-output-outcome-evaluation.md:658
- This row says a discussion merely existing is sufficient for
accepted, but Section 11 requires replies or an accepted answer; a discussion with no replies and no answer isignored. The conformance test would over-count successful discussion creations.
| `create_discussion` | `create_discussion` | Discussion exists and has not been deleted or locked within the evaluation window | Discussion was deleted or permanently locked (preventing any responses) within the evaluation window |
specs/safe-output-outcome-evaluation.md:660
- This row treats the review comment's existence as the
acceptedcondition, while Section 13 requires the thread to be resolved or have replies and otherwise classifies it asignored. The conformance table should match that engagement-based outcome.
| `create_pull_request_review_comment` | `create_pull_request_review_comment` | Review comment exists on the PR diff at evaluation time | Review comment was deleted by a human actor within the evaluation window |
specs/safe-output-outcome-evaluation.md:661
- This row accepts any submitted PR review record, but Section 14 defines
acceptedbased on follow-up commits or the PR being merged; a review that simply exists with no follow-up remainspending. This conformance condition would mark unresolved review feedback as accepted.
| `submit_pull_request_review` | `submit_pull_request_review` | PR review record exists with the submitted state (APPROVED, CHANGES_REQUESTED, COMMENT) at evaluation time | Review was dismissed by a human actor within the evaluation window |
specs/safe-output-outcome-evaluation.md:662
- This row defines a reply comment existing as
accepted, but Section 15 only accepts the outcome when the thread is resolved after the reply or others continue the conversation; no further activity isignored. The conformance row drops that required engagement signal.
| `reply_to_pull_request_review_comment` | `reply_to_pull_request_review_comment` | Reply comment exists in the review thread at evaluation time | Reply comment was deleted by a human actor within the evaluation window |
specs/safe-output-outcome-evaluation.md:665
- This row accepts any PR that is no longer draft, but Section 18 requires at least one review submitted after marking ready for review; a ready PR with no reviews is
pendingorignoreddepending on state. This conformance test would pass too early.
| `mark_pull_request_as_ready_for_review` | `mark_pull_request_as_ready_for_review` | PR is no longer in draft state at evaluation time | PR was converted back to draft by a human actor within the evaluation window |
specs/safe-output-outcome-evaluation.md:666
- This row accepts the assignment record itself, but Section 19 bases
acceptedon the agent producing a merged PR or the issue being resolved; no agent PR isignoredand an open agent PR ispending. This pass condition measures dispatch rather than outcome.
| `assign_to_agent` | `assign_to_agent` | Assignment record exists on the target issue/PR at evaluation time | Assignment was removed by a human actor before the assigned agent acted on it |
specs/safe-output-outcome-evaluation.md:667
- This row accepts dispatched workflow runs that reach either success or failure, but Section 20 explicitly classifies
conclusion == "failure"asrejected. A failed workflow run should be the fail condition, not part of the pass condition.
| `dispatch_workflow` | `dispatch_workflow` | The dispatched workflow run exists and reached a terminal state (success or failure) within the evaluation window | The dispatched workflow run was cancelled before reaching a terminal state; or no corresponding run record is found |
specs/safe-output-outcome-evaluation.md:668
- This row treats dismissed code scanning alerts as
accepted, but Section 21 classifies alert dismissal asrejectedand only fixed alerts or merged linked PRs asaccepted. The conformance row contradicts the evaluator rules.
| `autofix_code_scanning_alert` | `autofix_code_scanning_alert` | Code scanning alert is in a fixed or dismissed state at evaluation time | Alert was re-opened or the fix commit was reverted within the evaluation window |
specs/safe-output-outcome-evaluation.md:669
- This row accepts the mere existence of a created code scanning alert, while Section 22 only treats
fixedor dismissed-with-reason alerts asacceptedand leaves open alertspending. The conformance condition would mark untriaged alerts as successful.
| `create_code_scanning_alert` | `create_code_scanning_alert` | Alert record exists in the repository's code scanning results at evaluation time | Alert was immediately dismissed (within the evaluation window) with no investigation action |
specs/safe-output-outcome-evaluation.md:664
- This row accepts a pushed commit that is still present in the PR branch, but Section 17 classifies an open PR as
pendingand onlyPR mergedasaccepted(with closed/reverted cases rejected). Presence in branch history is not sufficient for acceptance under the existing evaluator rules.
| `push_to_pull_request_branch` | `push_to_pull_request_branch` | The pushed commit SHA is still present in the PR branch history at evaluation time | The commit was force-pushed out of the branch history by a human actor within the evaluation window |
- Files reviewed: 5/5 changed files
- Comments generated: 18
|
|
||
| | Output type | Expected `ghaw.outcome.type` OTel attribute | Pass condition | Fail condition | | ||
| |---|---|---|---| | ||
| | `create_pull_request` | `create_pull_request` | PR exists in open or merged state; was not closed-as-not-planned or reverted within the evaluation window | PR closed-as-not-planned, reverted, or deleted within the evaluation window | |
| | `noop` | `noop` | Evaluation is skipped; no outcome is computed | N/A — `noop` always results in `ignored` | | ||
| | `missing_tool` | `missing_tool` | Evaluation is skipped; no outcome is computed | N/A — `missing_tool` always results in `ignored` | |
| "property_path": { | ||
| "type": "string", | ||
| "description": "Dot-notation path to the drifted configuration property (e.g., 'apiProxy.anthropicAutoCache').", | ||
| "examples": ["apiProxy.anthropicAutoCache", "container.dockerHostPathPrefix"] |
| "drift_category": { | ||
| "type": "string", | ||
| "enum": ["missing_in_ghaw", "missing_in_schema", "spec_mismatch"], | ||
| "description": "Classification of the drift condition. 'missing_in_ghaw': property exists in canonical schema but gh-aw has no coverage. 'missing_in_schema': gh-aw generates a field not present in either schema. 'spec_mismatch': CLI mapping in gh-aw disagrees with the normative spec description." |
| | `property_path` | `string` | **MUST** | Dot-notation config property path (e.g., `apiProxy.anthropicAutoCache`) | | ||
| | `drift_category` | `enum` | **MUST** | One of `missing_in_ghaw`, `missing_in_schema`, or `spec_mismatch` (see Section 4.2, Step 4) | | ||
| | `suggested_action` | `string` | **MUST** | Actionable remediation text; **MUST NOT** be empty | | ||
| | `detected_at` | `string` (ISO 8601) | **MUST** | UTC timestamp of detection; filesystem-safe format **SHOULD** use `YYYY-MM-DDTHH:MM:SSZ` | |
|
|
||
| #### 4.5.3 Usage | ||
|
|
||
| The drift detection procedure (Section 4.2, Step 5) **MUST** produce a list of zero or more `DriftRecord` objects. When any record has `drift_category` of `missing_in_ghaw` or `spec_mismatch`, the detecting automation **MUST** open a corrective PR (CR-05) and, if the SLA window is exceeded, an escalation issue (CR-06). The corrective PR description **MUST** embed the full `DriftRecord` list as JSON. |
| "detected_at": { | ||
| "type": "string", | ||
| "format": "date-time", | ||
| "description": "ISO 8601 timestamp (UTC) when this drift item was first detected in the current run." | ||
| } |
|
|
||
| --- | ||
|
|
||
| ## Sync Notes |
|
|
||
| The table below specifies one conformance test row per safe-output type. Each row defines the expected OTel attribute value emitted by a correct evaluator, the pass condition (what must be true for `accepted`), and the fail condition (what signals `rejected`). Implementations **MUST** satisfy the pass condition and **MUST** not emit `accepted` when the fail condition is observed. | ||
|
|
||
| | Output type | Expected `ghaw.outcome.type` OTel attribute | Pass condition | Fail condition | |
|
|
||
| 2. **Audit trail requirement**: Every active suppression annotation **MUST** be recorded in the compiled lock file (`.lock.yml`) manifest section so that reviewers can audit which rules are suppressed and why. The lock file **MUST** include the full `rule`, `reason`, and `expires` values for each suppression. Suppressions absent from the lock file manifest **MUST** be treated by subsequent compilations as unapproved and re-evaluated against the current CTR rule. | ||
|
|
||
| 3. **SLA for resolution**: Suppressions marked as false positives that affect a `MUST`-level security control (as defined in Section 5.1 — specifically those rules whose compiler action is `reject` in non-strict mode) **SHOULD** be resolved within **10 business days** — either by confirming the suppression is correct and updating the rule's detection logic to eliminate the false positive, or by removing the suppression when the workflow is corrected. The daily optimizer **SHOULD** surface unresolved suppressions older than 10 business days in its daily output. A suppression **MUST** be re-evaluated and explicitly renewed if the `expires` date passes; expired suppressions **MUST** be treated by the compiler as if they do not exist. |
Five spec files had thin or missing Safeguards sections, no entity schemas for internal data structures, absent Sync Notes, and incomplete conformance test coverage. This PR addresses all 10 tasks from the daily SPDD rotation (2 P0, 3 P1, 5 P2).
specs/aw-harness.md§11.4 Degraded Mode & Safeguards— 4 numbered MUST/SHOULD statements covering budget-exhaustion orderly shutdown, non-fatal observability failure behavior, fail-secure exit-code contract (0/1/2), and step-summary degraded-mode annotation## Sync Notes— 6-row table mapping spec sections toactions/setup/js/aw_harness.cjs,pkg/workflow/,actions/setup/js/, andpkg/cli/workflows/specs/awf-config-sources-spec.md§4.5 DriftRecord Entity Schema— formal JSON Schema block withproperty_path,drift_category(enum:missing_in_ghaw|missing_in_schema|spec_mismatch),suggested_action,detected_at; field reference table and usage example with corrective PR embedding requirementCR-06a— escalation-owner assignment rule: default owner is last maintainer to touch the drifted property's implementation file; must acknowledge within 1 business day; issue must not be left unassignedspecs/compiler-threat-detection-spec.md§6.4 False-Positive Handling— suppression via frontmatterthreat-detection-suppressannotation (mandatoryreason, optionalexpires), lock-file audit trail requirement, 10-business-day SLA with expired-suppression enforcement§6.5 Threat Category Lifecycle— 3 normative stages: Experimental (prototype, no prod failures) → Candidate (feature-flagged, one release cycle, evidence collection) → Normative (full PR: rule + mapping + test + changelog, security maintainer review required)specs/safe-output-outcome-evaluation.md### Conformance Test Table— one row per all 29 safe-output types withghaw.outcome.typeOTel attribute, pass condition, and fail condition### OTel Backend Unavailability— 2 MUST statements: persist outcome to local audit log before OTLP export attempt; exponential back-off retry (5s→5min, max 5 attempts) withexport_failedflag on exhaustionspecs/security-architecture-spec.md## Sync Notes— cross-referencesspecs/security-architecture-spec-validation.md; defines 4 revalidation triggers (minor version bump, MUST change, CVE attribution, implementation change)T-SG07-001/T-SG07-002— fail-secure compliance tests for SG-07 added to §12.2.7: schema validation error → compile failure + no lock file written; MUST-level requirement violation → compile failure + rule ID in stderr. Both added to compliance checklist at Level 1 Required.