Skip to content

specs: daily SPDD work plan 2026-05-17 — safeguards, DriftRecord schema, sync notes, conformance tests, lifecycle sections#32844

Merged
pelikhan merged 3 commits into
mainfrom
copilot/review-daily-spec-work-plan
May 17, 2026
Merged

specs: daily SPDD work plan 2026-05-17 — safeguards, DriftRecord schema, sync notes, conformance tests, lifecycle sections#32844
pelikhan merged 3 commits into
mainfrom
copilot/review-daily-spec-work-plan

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 17, 2026

Five spec files had thin or missing Safeguards sections, no entity schemas for internal data structures, absent Sync Notes, and incomplete conformance test coverage. This PR addresses all 10 tasks from the daily SPDD rotation (2 P0, 3 P1, 5 P2).

specs/aw-harness.md

  • §11.4 Degraded Mode & Safeguards — 4 numbered MUST/SHOULD statements covering budget-exhaustion orderly shutdown, non-fatal observability failure behavior, fail-secure exit-code contract (0/1/2), and step-summary degraded-mode annotation
  • ## Sync Notes — 6-row table mapping spec sections to actions/setup/js/aw_harness.cjs, pkg/workflow/, actions/setup/js/, and pkg/cli/workflows/

specs/awf-config-sources-spec.md

  • §4.5 DriftRecord Entity Schema — formal JSON Schema block with property_path, drift_category (enum: missing_in_ghaw | missing_in_schema | spec_mismatch), suggested_action, detected_at; field reference table and usage example with corrective PR embedding requirement
  • CR-06a — escalation-owner assignment rule: default owner is last maintainer to touch the drifted property's implementation file; must acknowledge within 1 business day; issue must not be left unassigned

specs/compiler-threat-detection-spec.md

  • §6.4 False-Positive Handling — suppression via frontmatter threat-detection-suppress annotation (mandatory reason, optional expires), lock-file audit trail requirement, 10-business-day SLA with expired-suppression enforcement
  • §6.5 Threat Category Lifecycle — 3 normative stages: Experimental (prototype, no prod failures) → Candidate (feature-flagged, one release cycle, evidence collection) → Normative (full PR: rule + mapping + test + changelog, security maintainer review required)

specs/safe-output-outcome-evaluation.md

  • ### Conformance Test Table — one row per all 29 safe-output types with ghaw.outcome.type OTel attribute, pass condition, and fail condition
  • ### OTel Backend Unavailability — 2 MUST statements: persist outcome to local audit log before OTLP export attempt; exponential back-off retry (5s→5min, max 5 attempts) with export_failed flag on exhaustion

specs/security-architecture-spec.md

  • ## Sync Notes — cross-references specs/security-architecture-spec-validation.md; defines 4 revalidation triggers (minor version bump, MUST change, CVE attribution, implementation change)
  • T-SG07-001 / T-SG07-002 — fail-secure compliance tests for SG-07 added to §12.2.7: schema validation error → compile failure + no lock file written; MUST-level requirement violation → compile failure + rule ID in stderr. Both added to compliance checklist at Level 1 Required.

Copilot AI linked an issue May 17, 2026 that may be closed by this pull request
11 tasks
…riftRecord schema, sync notes, conformance tests, and lifecycle sections

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/10531a0b-f21b-4ae0-8c29-d02fef519698

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot AI changed the title [WIP] Review daily SPDD specs for improvements specs: daily SPDD work plan 2026-05-17 — safeguards, DriftRecord schema, sync notes, conformance tests, lifecycle sections May 17, 2026
@pelikhan pelikhan marked this pull request as ready for review May 17, 2026 16:29
Copilot AI review requested due to automatic review settings May 17, 2026 16:29
@pelikhan pelikhan merged commit e497729 into main May 17, 2026
1 check passed
@pelikhan pelikhan deleted the copilot/review-daily-spec-work-plan branch May 17, 2026 16:29
Copilot stopped work on behalf of gh-aw-bot due to an error May 17, 2026 16:29
Copilot AI requested a review from gh-aw-bot May 17, 2026 16:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates several specifications with safeguards, schema definitions, sync notes, lifecycle guidance, and conformance coverage for gh-aw workflows and related security/telemetry behavior.

Changes:

  • Adds degraded-mode, fail-secure, false-positive, and lifecycle requirements across harness and threat-detection specs.
  • Adds DriftRecord schema and escalation-owner rules for AWF config drift.
  • Adds sync notes and conformance/compliance test tables for security architecture and safe-output outcome evaluation.
Show a summary per file
File Description
specs/aw-harness.md Adds degraded-mode safeguards and implementation sync notes for the AW harness.
specs/awf-config-sources-spec.md Adds escalation ownership rules and a DriftRecord JSON schema/usage section.
specs/compiler-threat-detection-spec.md Adds false-positive suppression handling and threat-category lifecycle stages.
specs/safe-output-outcome-evaluation.md Adds safe-output conformance table and OTLP backend unavailability requirements.
specs/security-architecture-spec.md Adds SG-07 fail-secure tests, checklist coverage, and validation sync notes.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (16)

specs/safe-output-outcome-evaluation.md:649

  • This row treats an open issue as satisfying the accepted pass condition, but Section 2 classifies open issues as pending when engaged or ignored when there are no human comments; accepted is only state_reason == completed. The conformance test would assert the wrong outcome.
| `create_issue` | `create_issue` | Issue exists in open state, or was closed by human action (not bot policy) within the evaluation window | Issue closed-as-not-planned by human within the evaluation window, or deleted |

specs/safe-output-outcome-evaluation.md:650

  • This row says a comment merely existing is the accepted pass condition, while Section 3 requires replies or reactions for accepted and classifies an existing comment with no engagement as ignored. This weakens the conformance test relative to the evaluator definition.
| `add_comment` | `add_comment` | Comment exists on the target object at evaluation time | Comment was deleted or hidden by a human (non-bot) actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:651

  • This row accepts the outcome when at least one applied label remains, but Section 4 requires all added labels to remain and marks partial removal as rejected. The conformance test would incorrectly pass partial label rollback.
| `add_labels` | `add_labels` | At least one of the bot-applied labels is still present on the target object at evaluation time | All bot-applied labels were removed by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:652

  • This row treats a still-pending reviewer request as accepted, but Section 5 requires the assigned reviewer to submit a review for accepted; no review while the PR is open is pending. The pass condition should not include the requested-reviewer state by itself.
| `add_reviewer` | `add_reviewer` | Requested reviewer is still listed as a requested reviewer, or has already submitted a review | Reviewer request was removed by a human actor before any review was submitted |

specs/safe-output-outcome-evaluation.md:655

  • This conformance row uses accepted for any issue that remains closed, but Section 8 classifies bot/lifecycle closes as lifecycle_close and non-lifecycle closes as rejected; it does not define a closed issue as accepted. This conflicts with the existing outcome taxonomy.
| `close_issue` | `close_issue` | Issue remains closed at evaluation time | Issue was reopened by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:656

  • This row treats a PR remaining closed as accepted, but Section 9 classifies a still-closed PR using lifecycle/non-lifecycle logic and marks it lifecycle_close or rejected, not accepted. The conformance pass condition contradicts the evaluator definition.
| `close_pull_request` | `close_pull_request` | PR remains closed (not merged) at evaluation time | PR was reopened or merged after the bot closed it within the evaluation window |

specs/safe-output-outcome-evaluation.md:658

  • This row says a discussion merely existing is sufficient for accepted, but Section 11 requires replies or an accepted answer; a discussion with no replies and no answer is ignored. The conformance test would over-count successful discussion creations.
| `create_discussion` | `create_discussion` | Discussion exists and has not been deleted or locked within the evaluation window | Discussion was deleted or permanently locked (preventing any responses) within the evaluation window |

specs/safe-output-outcome-evaluation.md:660

  • This row treats the review comment's existence as the accepted condition, while Section 13 requires the thread to be resolved or have replies and otherwise classifies it as ignored. The conformance table should match that engagement-based outcome.
| `create_pull_request_review_comment` | `create_pull_request_review_comment` | Review comment exists on the PR diff at evaluation time | Review comment was deleted by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:661

  • This row accepts any submitted PR review record, but Section 14 defines accepted based on follow-up commits or the PR being merged; a review that simply exists with no follow-up remains pending. This conformance condition would mark unresolved review feedback as accepted.
| `submit_pull_request_review` | `submit_pull_request_review` | PR review record exists with the submitted state (APPROVED, CHANGES_REQUESTED, COMMENT) at evaluation time | Review was dismissed by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:662

  • This row defines a reply comment existing as accepted, but Section 15 only accepts the outcome when the thread is resolved after the reply or others continue the conversation; no further activity is ignored. The conformance row drops that required engagement signal.
| `reply_to_pull_request_review_comment` | `reply_to_pull_request_review_comment` | Reply comment exists in the review thread at evaluation time | Reply comment was deleted by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:665

  • This row accepts any PR that is no longer draft, but Section 18 requires at least one review submitted after marking ready for review; a ready PR with no reviews is pending or ignored depending on state. This conformance test would pass too early.
| `mark_pull_request_as_ready_for_review` | `mark_pull_request_as_ready_for_review` | PR is no longer in draft state at evaluation time | PR was converted back to draft by a human actor within the evaluation window |

specs/safe-output-outcome-evaluation.md:666

  • This row accepts the assignment record itself, but Section 19 bases accepted on the agent producing a merged PR or the issue being resolved; no agent PR is ignored and an open agent PR is pending. This pass condition measures dispatch rather than outcome.
| `assign_to_agent` | `assign_to_agent` | Assignment record exists on the target issue/PR at evaluation time | Assignment was removed by a human actor before the assigned agent acted on it |

specs/safe-output-outcome-evaluation.md:667

  • This row accepts dispatched workflow runs that reach either success or failure, but Section 20 explicitly classifies conclusion == "failure" as rejected. A failed workflow run should be the fail condition, not part of the pass condition.
| `dispatch_workflow` | `dispatch_workflow` | The dispatched workflow run exists and reached a terminal state (success or failure) within the evaluation window | The dispatched workflow run was cancelled before reaching a terminal state; or no corresponding run record is found |

specs/safe-output-outcome-evaluation.md:668

  • This row treats dismissed code scanning alerts as accepted, but Section 21 classifies alert dismissal as rejected and only fixed alerts or merged linked PRs as accepted. The conformance row contradicts the evaluator rules.
| `autofix_code_scanning_alert` | `autofix_code_scanning_alert` | Code scanning alert is in a fixed or dismissed state at evaluation time | Alert was re-opened or the fix commit was reverted within the evaluation window |

specs/safe-output-outcome-evaluation.md:669

  • This row accepts the mere existence of a created code scanning alert, while Section 22 only treats fixed or dismissed-with-reason alerts as accepted and leaves open alerts pending. The conformance condition would mark untriaged alerts as successful.
| `create_code_scanning_alert` | `create_code_scanning_alert` | Alert record exists in the repository's code scanning results at evaluation time | Alert was immediately dismissed (within the evaluation window) with no investigation action |

specs/safe-output-outcome-evaluation.md:664

  • This row accepts a pushed commit that is still present in the PR branch, but Section 17 classifies an open PR as pending and only PR merged as accepted (with closed/reverted cases rejected). Presence in branch history is not sufficient for acceptance under the existing evaluator rules.
| `push_to_pull_request_branch` | `push_to_pull_request_branch` | The pushed commit SHA is still present in the PR branch history at evaluation time | The commit was force-pushed out of the branch history by a human actor within the evaluation window |
  • Files reviewed: 5/5 changed files
  • Comments generated: 18


| Output type | Expected `ghaw.outcome.type` OTel attribute | Pass condition | Fail condition |
|---|---|---|---|
| `create_pull_request` | `create_pull_request` | PR exists in open or merged state; was not closed-as-not-planned or reverted within the evaluation window | PR closed-as-not-planned, reverted, or deleted within the evaluation window |
Comment on lines +675 to +676
| `noop` | `noop` | Evaluation is skipped; no outcome is computed | N/A — `noop` always results in `ignored` |
| `missing_tool` | `missing_tool` | Evaluation is skipped; no outcome is computed | N/A — `missing_tool` always results in `ignored` |
"property_path": {
"type": "string",
"description": "Dot-notation path to the drifted configuration property (e.g., 'apiProxy.anthropicAutoCache').",
"examples": ["apiProxy.anthropicAutoCache", "container.dockerHostPathPrefix"]
"drift_category": {
"type": "string",
"enum": ["missing_in_ghaw", "missing_in_schema", "spec_mismatch"],
"description": "Classification of the drift condition. 'missing_in_ghaw': property exists in canonical schema but gh-aw has no coverage. 'missing_in_schema': gh-aw generates a field not present in either schema. 'spec_mismatch': CLI mapping in gh-aw disagrees with the normative spec description."
| `property_path` | `string` | **MUST** | Dot-notation config property path (e.g., `apiProxy.anthropicAutoCache`) |
| `drift_category` | `enum` | **MUST** | One of `missing_in_ghaw`, `missing_in_schema`, or `spec_mismatch` (see Section 4.2, Step 4) |
| `suggested_action` | `string` | **MUST** | Actionable remediation text; **MUST NOT** be empty |
| `detected_at` | `string` (ISO 8601) | **MUST** | UTC timestamp of detection; filesystem-safe format **SHOULD** use `YYYY-MM-DDTHH:MM:SSZ` |

#### 4.5.3 Usage

The drift detection procedure (Section 4.2, Step 5) **MUST** produce a list of zero or more `DriftRecord` objects. When any record has `drift_category` of `missing_in_ghaw` or `spec_mismatch`, the detecting automation **MUST** open a corrective PR (CR-05) and, if the SLA window is exceeded, an escalation issue (CR-06). The corrective PR description **MUST** embed the full `DriftRecord` list as JSON.
Comment on lines +224 to +228
"detected_at": {
"type": "string",
"format": "date-time",
"description": "ISO 8601 timestamp (UTC) when this drift item was first detected in the current run."
}

---

## Sync Notes

The table below specifies one conformance test row per safe-output type. Each row defines the expected OTel attribute value emitted by a correct evaluator, the pass condition (what must be true for `accepted`), and the fail condition (what signals `rejected`). Implementations **MUST** satisfy the pass condition and **MUST** not emit `accepted` when the fail condition is observed.

| Output type | Expected `ghaw.outcome.type` OTel attribute | Pass condition | Fail condition |

2. **Audit trail requirement**: Every active suppression annotation **MUST** be recorded in the compiled lock file (`.lock.yml`) manifest section so that reviewers can audit which rules are suppressed and why. The lock file **MUST** include the full `rule`, `reason`, and `expires` values for each suppression. Suppressions absent from the lock file manifest **MUST** be treated by subsequent compilations as unapproved and re-evaluated against the current CTR rule.

3. **SLA for resolution**: Suppressions marked as false positives that affect a `MUST`-level security control (as defined in Section 5.1 — specifically those rules whose compiler action is `reject` in non-strict mode) **SHOULD** be resolved within **10 business days** — either by confirming the suppression is correct and updating the rule's detection logic to eliminate the false positive, or by removing the suppression when the workflow is corrected. The daily optimizer **SHOULD** surface unresolved suppressions older than 10 business days in its daily output. A suppression **MUST** be re-evaluated and explicitly renewed if the `expires` date passes; expired suppressions **MUST** be treated by the compiler as if they do not exist.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[spdd] Daily spec work plan - 2026-05-17

4 participants