Merged
27 changes: 27 additions & 0 deletions specs/aw-harness.md
@@ -1057,6 +1057,18 @@ The following matrix records where each normative harness requirement is enforce
| §11.2.2 | MUST hard-abort on budget limit; MUST NOT continue turns after hard limit | `actions/setup/js/aw_harness.cjs` cost-tracker abort gate and post-abort turn guard | Pending (`aw_harness.cjs` not present) |
| §11.2.3 | MUST isolate crashing user extensions; built-in extension failures are fatal | `actions/setup/js/aw_harness.cjs` extension loader policy checks | Pending (`aw_harness.cjs` not present) |

### 11.4 Degraded Mode & Safeguards

This section defines normative safeguard requirements for scenarios where the harness enters a degraded operating mode due to resource exhaustion, infrastructure unavailability, or partial subsystem failure. A conforming implementation **MUST** apply all safeguards numbered below.

1. **Budget-exhaustion shutdown path**: When the effective token budget is exhausted (hard limit reached), the harness **MUST** execute an orderly shutdown sequence: (a) immediately abort the active `AgentSession` turn via the session abort API; (b) flush all in-progress JSONL events and the step-summary buffer to their respective sinks; (c) emit a `budget_exceeded` event with `forced_termination: true` and the final cumulative token count; and (d) exit with code `1`. The harness **MUST NOT** start a new turn or accept additional tool calls after the hard-limit threshold is crossed, even if the session's internal queue contains pending callbacks.

2. **Partial observability failure behavior**: When the OTLP exporter or the context-provenance file writer fails (e.g., network unreachable, disk full, OTLP endpoint returns a non-retryable error), the harness **MUST** continue session execution and **MUST NOT** abort the session or exit with a non-zero code solely due to the observability failure. The harness **SHOULD** emit a structured JSONL warning event to stderr identifying the failed observability sink and the error reason. Observability subsystem failures **MUST** be treated as non-fatal degraded-mode conditions; data loss in telemetry **MUST NOT** propagate as a session-level failure.

3. **Fail-secure exit codes**: The harness **MUST** use the following exit-code contract so that downstream consumers can unambiguously detect the failure class:
   - exit code `0`: clean session completion with no budget abort and no fatal errors;
   - exit code `1`: session-level failure, including hard-limit budget abort, unrecovered agent error, or failed session finalization;
   - exit code `2`: invocation or infrastructure failure, including Pi SDK load failure, missing required configuration, or fatal built-in extension failure.

   The harness **MUST NOT** mask an exit code `1` or `2` condition by exiting `0`, even if the step summary was written successfully.

4. **Degraded-mode marking**: When the harness enters any degraded mode (observability failure, extension skip, or partial artifact flush), it **MUST** annotate the step summary (if `$GITHUB_STEP_SUMMARY` is set) with a visible degraded-mode notice that identifies which subsystem is degraded and what data may be incomplete. This notice **SHOULD** include a remediation hint (e.g., "check OTLP endpoint connectivity" or "extension X was skipped due to error Y").
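
The exit-code contract and the non-fatal treatment of degraded subsystems can be sketched as one resolution function. This is a minimal illustration in Python, not the harness's JavaScript; `SessionOutcome`, `resolve_exit_code`, and `degraded_notice` are hypothetical names, and the precedence of exit code `2` over `1` when both conditions hold is an assumption this section does not state.

```python
from dataclasses import dataclass, field
from typing import List, Optional

EXIT_OK = 0               # clean session completion
EXIT_SESSION_FAILURE = 1  # budget abort, unrecovered agent error, failed finalization
EXIT_INFRA_FAILURE = 2    # SDK load failure, missing config, fatal built-in extension


@dataclass
class SessionOutcome:
    """Hypothetical summary of a harness run (not defined by the spec)."""
    budget_exceeded: bool = False
    agent_error: bool = False
    finalization_failed: bool = False
    infrastructure_failure: bool = False
    # Degraded subsystems (e.g. "otlp-exporter") are recorded but never fatal.
    degraded_subsystems: List[str] = field(default_factory=list)


def resolve_exit_code(outcome: SessionOutcome) -> int:
    # Assumption: infrastructure failures take precedence over session failures.
    if outcome.infrastructure_failure:
        return EXIT_INFRA_FAILURE
    if outcome.budget_exceeded or outcome.agent_error or outcome.finalization_failed:
        return EXIT_SESSION_FAILURE
    # Degraded subsystems alone never change the exit code.
    return EXIT_OK


def degraded_notice(outcome: SessionOutcome) -> Optional[str]:
    """Return a step-summary line naming degraded subsystems, or None."""
    if not outcome.degraded_subsystems:
        return None
    return "> **Degraded mode**: " + ", ".join(outcome.degraded_subsystems)
```

Under these assumptions, a run whose only problem is a failed OTLP exporter resolves to exit code `0` and still produces a visible notice for the step summary.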

---

## 12. Compliance Tests
@@ -1174,3 +1186,18 @@ OpenTelemetry specification for distributed tracing. <https://opentelemetry.io/d

**[gh-aw]**
GitHub Agentic Workflows — the gh-aw CLI extension that compiles Markdown workflow files to GitHub Actions YAML. <https://github.com/github/gh-aw>

---

## Sync Notes

This section maps normative spec sections to their primary implementation files and directories in the `github/gh-aw` repository. Maintainers **SHOULD** keep this table updated whenever implementation files are added, renamed, or removed.

| Spec section | Implementation file / directory | Notes |
|---|---|---|
| §5 Harness Invocation Contract; §6 Workflow Definition; §7 Single-Session Execution Model | `actions/setup/js/aw_harness.cjs` | Primary harness entry point. All session lifecycle, config loading, and prompt execution logic lives here. Pending creation (see §11.3). |
| §8 Extensions (provider-setup, cost-tracker, steering, repair, observability) | `actions/setup/js/aw_harness.cjs` (inline extension registrations) | Built-in Pi extensions are implemented as inline factory functions exported from or co-located with the harness. When extracted, each extension SHOULD move to a sibling file named `aw_ext_{name}.cjs`. |
| §10 Build and Deployment; §10.1 esbuild configuration | `actions/setup/js/` (directory); `package.json` build scripts in `github/gh-aw` | JavaScript build toolchain. The harness is compiled with esbuild; build configuration and bundle output paths are tracked here. |
| §9 Model Resolution; §11.1 General Security Requirements (token/credential handling) | `pkg/workflow/` (Go compiler — `aw_engine.go` or equivalent) | The `engine: aw` compilation path in Go generates the `config.json` that specifies the model, provider credentials, and feature flags consumed by the harness at runtime. |
| §11.2 Safeguards; §11.4 Degraded Mode & Safeguards | `actions/setup/js/aw_harness.cjs` | Budget-gating, observability-failure recovery, and fail-secure exit-code enforcement are all implemented inside the harness. |
| §12 Compliance Tests (T-AW-001 through T-AW-007) | `pkg/cli/workflows/` (integration test workflows); `actions/setup/js/*.test.cjs` (unit tests) | Harness lifecycle integration tests live in `pkg/cli/workflows/`. Unit-level tests for harness helpers reside alongside the JavaScript source. |
67 changes: 67 additions & 0 deletions specs/awf-config-sources-spec.md
@@ -82,6 +82,8 @@ The key words **MUST**, **MUST NOT**, **SHOULD**, and **MAY** in this section ar

**CR-06**: Drift categorized as "missing in gh-aw" or "spec mismatch" MUST be remediated (merged or explicitly waived with rationale) within **5 business days** of detection. For this requirement, business days are Monday through Friday, evaluated in UTC. If this SLA is missed, maintainers MUST open (or update) an escalation tracking issue within 1 business day. The escalation issue MUST include an owner, unblock plan, and revised ETA.
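
The CR-06 deadline arithmetic can be sketched as follows. This is an illustrative Python helper, not part of gh-aw; `add_business_days` and `sla_missed` are hypothetical names. It assumes dates are already UTC calendar dates, that public holidays are not excluded (the requirement names only weekends), and that the SLA counts as missed once the deadline day has passed.

```python
from datetime import date, timedelta

SLA_BUSINESS_DAYS = 5


def add_business_days(start: date, days: int) -> date:
    """Return the date that falls `days` business days (Mon-Fri) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def sla_missed(detected_on: date, today: date) -> bool:
    """True once `today` is past the 5-business-day remediation deadline."""
    return today > add_business_days(detected_on, SLA_BUSINESS_DAYS)
```

For drift detected on Monday 2026-05-18, the deadline under these assumptions is Monday 2026-05-25, and escalation becomes due from 2026-05-26.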

**CR-06a (Escalation Owner Assignment)**: When opening or updating an escalation tracking issue under CR-06, the assignee **SHOULD** be determined as follows: (a) the maintainer who merged the last change to the drifted property's corresponding implementation file in `pkg/workflow/` or `actions/setup/` is the **default escalation owner** (implementation guidance: this can be determined via `git log` on the relevant file, or through PR merge history); (b) if no such maintainer is identifiable (e.g., the property has never been implemented), the escalation owner **SHOULD** default to the on-call maintainer for the `github/gh-aw` repository at the time of escalation; (c) the assigned owner **MUST** be recorded in the `Owner` field of the escalation issue template and **MUST** acknowledge the assignment by commenting on the issue within 1 business day of assignment. The escalation issue **MUST NOT** be left unassigned.
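
The fallback order in CR-06a reduces to a small selection rule. The sketch below is a hypothetical Python helper; how the last merger or the on-call maintainer is looked up is outside its scope, and both are passed in as plain usernames.

```python
from typing import Optional


def pick_escalation_owner(last_merger: Optional[str], on_call: Optional[str]) -> str:
    """Apply CR-06a: (a) last merger of the implementation file, else (b) on-call."""
    owner = last_merger or on_call
    if not owner:
        # CR-06a: the escalation issue MUST NOT be left unassigned.
        raise ValueError("no escalation owner could be determined")
    return owner
```

Raising instead of returning an empty value keeps automation from opening an unassigned escalation issue.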

---

## 4. Drift Detection Procedure
@@ -190,6 +192,71 @@ To satisfy CR-06 tracking obligations, drift escalation records SHOULD use:

The scheduled schema consistency workflow SHOULD open or update one such issue when drift remains unresolved beyond 5 business days.

### 4.5 DriftRecord Entity Schema

A `DriftRecord` represents a single detected schema drift item produced by the drift detection procedure (Section 4.2, Step 5). All automation and agents that produce or consume drift reports **MUST** use this schema for structured drift output.

#### 4.5.1 Formal Schema (JSON Schema)

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "DriftRecord",
  "description": "A single detected configuration drift item between gh-aw-firewall canonical sources and gh-aw implementation.",
  "type": "object",
  "required": ["property_path", "drift_category", "suggested_action", "detected_at"],
  "properties": {
    "property_path": {
      "type": "string",
      "description": "Dot-notation path to the drifted configuration property (e.g., 'apiProxy.anthropicAutoCache').",
      "examples": ["apiProxy.anthropicAutoCache", "container.dockerHostPathPrefix"]
    },
    "drift_category": {
      "type": "string",
      "enum": ["missing_in_ghaw", "missing_in_schema", "spec_mismatch"],
      "description": "Classification of the drift condition. 'missing_in_ghaw': property exists in canonical schema but gh-aw has no coverage. 'missing_in_schema': gh-aw generates a field not present in either schema. 'spec_mismatch': CLI mapping in gh-aw disagrees with the normative spec description."
    },
    "suggested_action": {
      "type": "string",
      "description": "Human-readable remediation recommendation for this drift item (e.g., 'Add coverage for apiProxy.anthropicAutoCache in pkg/workflow/ and reconcile with docs/awf-config-spec.md CLI mapping table').",
      "minLength": 1
    },
    "detected_at": {
      "type": "string",
      "format": "date-time",
      "description": "ISO 8601 timestamp (UTC) when this drift item was first detected in the current run."
    }
  },
  "additionalProperties": false
}
```

#### 4.5.2 Field Reference

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `property_path` | `string` | **MUST** | Dot-notation config property path (e.g., `apiProxy.anthropicAutoCache`) |
| `drift_category` | `enum` | **MUST** | One of `missing_in_ghaw`, `missing_in_schema`, or `spec_mismatch` (see Section 4.2, Step 4) |
| `suggested_action` | `string` | **MUST** | Actionable remediation text; **MUST NOT** be empty |
| `detected_at` | `string` (ISO 8601) | **MUST** | UTC timestamp of detection; filesystem-safe format **SHOULD** use `YYYY-MM-DDTHH:MM:SSZ` |
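
A consumer of drift reports could check records against the schema without a JSON Schema library. The following is a minimal standard-library sketch; `validate_drift_record` is a hypothetical name. It is deliberately stricter than the schema in one respect: it accepts only the recommended `YYYY-MM-DDTHH:MM:SSZ` timestamp form, whereas `format: date-time` admits other ISO 8601 variants.

```python
from datetime import datetime

REQUIRED = ("property_path", "drift_category", "suggested_action", "detected_at")
CATEGORIES = {"missing_in_ghaw", "missing_in_schema", "spec_mismatch"}


def validate_drift_record(record: dict) -> list:
    """Return a list of error strings; an empty list means the record conforms."""
    errors = []
    for key in REQUIRED:
        if key not in record:
            errors.append(f"missing required property: {key}")
        elif not isinstance(record[key], str):
            errors.append(f"{key} must be a string")
    for key in record:
        if key not in REQUIRED:
            # Mirrors "additionalProperties": false in the schema.
            errors.append(f"unexpected property: {key}")
    if errors:
        return errors
    if record["drift_category"] not in CATEGORIES:
        errors.append("drift_category is not a known category")
    if len(record["suggested_action"]) < 1:
        errors.append("suggested_action must not be empty")
    try:
        # Assumption: only the recommended timestamp layout is accepted here.
        datetime.strptime(record["detected_at"], "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        errors.append("detected_at is not in YYYY-MM-DDTHH:MM:SSZ form")
    return errors
```

Returning all structural errors at once, instead of failing on the first, makes the output easier to embed in a corrective PR or a CI log.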

#### 4.5.3 Usage

The drift detection procedure (Section 4.2, Step 5) **MUST** produce a list of zero or more `DriftRecord` objects. When any record has `drift_category` of `missing_in_ghaw` or `spec_mismatch`, the detecting automation **MUST** open a corrective PR (CR-05) and, if the SLA window is exceeded, an escalation issue (CR-06). The corrective PR description **MUST** embed the full `DriftRecord` list as JSON.

**Example output (Step 5 of the drift detection procedure):**

```json
[
  {
    "property_path": "apiProxy.anthropicAutoCache",
    "drift_category": "missing_in_ghaw",
    "suggested_action": "Add coverage for apiProxy.anthropicAutoCache in pkg/workflow/ and reconcile CLI mapping in docs/awf-config-spec.md.",
    "detected_at": "2026-05-17T16:00:00Z"
  }
]
```
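
The usage rules above can be sketched as two small helpers: one decides whether a corrective PR is required, the other embeds the full record list in the PR description. The function names and the description layout are hypothetical; the spec only requires that the list be embedded as JSON.

```python
import json

ACTIONABLE = {"missing_in_ghaw", "spec_mismatch"}
FENCE = "`" * 3  # built at runtime so this sketch nests cleanly inside Markdown


def needs_corrective_pr(records: list) -> bool:
    """True when any record is in a category that requires a corrective PR."""
    return any(r["drift_category"] in ACTIONABLE for r in records)


def pr_description(records: list) -> str:
    """Embed the full DriftRecord list as a fenced JSON block."""
    payload = json.dumps(records, indent=2)
    return f"## Drift records\n\n{FENCE}json\n{payload}\n{FENCE}\n"


records = [
    {
        "property_path": "apiProxy.anthropicAutoCache",
        "drift_category": "missing_in_ghaw",
        "suggested_action": "Add coverage for apiProxy.anthropicAutoCache in pkg/workflow/.",
        "detected_at": "2026-05-17T16:00:00Z",
    }
]

if needs_corrective_pr(records):
    print(pr_description(records))
```

A list containing only `missing_in_schema` records does not trigger a corrective PR under this reading of Section 4.5.3.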

## 5. Safeguards

When canonical sources in `github/gh-aw-firewall` are unavailable (GitHub outage, auth failure, transient fetch errors), agents and automation MUST apply the following safeguards:
20 changes: 20 additions & 0 deletions specs/compiler-threat-detection-spec.md
@@ -197,6 +197,26 @@ The optimizer MUST produce one of:
- A pull request containing required spec and/or implementation updates, or
- A noop report explicitly stating no new threat coverage actions were required

### 6.4 False-Positive Handling

False positives occur when a CTR rule triggers on a workflow input that is not actually unsafe. This section defines normative requirements for suppressing, auditing, and resolving false-positive detections.

1. **Author suppression mechanism**: When a workflow author believes a compiler diagnostic is a false positive, they **MUST** add an inline suppression annotation in the workflow frontmatter using the `threat-detection-suppress` key. The value **MUST** be a list of objects, each with a `rule` field (the `CTR-*` identifier), a `reason` field (human-readable explanation of why the flagged pattern is safe in this context), and an optional `expires` field (ISO 8601 date after which the suppression is no longer valid). A suppression without a `reason` **MUST NOT** be accepted by the compiler; the compiler **MUST** emit a validation error if `reason` is absent or empty.

2. **Audit trail requirement**: Every active suppression annotation **MUST** be recorded in the compiled lock file (`.lock.yml`) manifest section so that reviewers can audit which rules are suppressed and why. The lock file **MUST** include the full `rule`, `reason`, and `expires` values for each suppression. Suppressions absent from the lock file manifest **MUST** be treated by subsequent compilations as unapproved and re-evaluated against the current CTR rule.

3. **SLA for resolution**: Suppressions marked as false positives that affect a `MUST`-level security control (as defined in Section 5.1 — specifically those rules whose compiler action is `reject` in non-strict mode) **SHOULD** be resolved within **10 business days** — either by confirming the suppression is correct and updating the rule's detection logic to eliminate the false positive, or by removing the suppression when the workflow is corrected. The daily optimizer **SHOULD** surface unresolved suppressions older than 10 business days in its daily output. A suppression **MUST** be re-evaluated and explicitly renewed if the `expires` date passes; expired suppressions **MUST** be treated by the compiler as if they do not exist.
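
The suppression rules above (mandatory `reason`, optional `expires`, expired entries ignored) can be sketched as a single validation pass. This is illustrative Python, not the gh-aw compiler; `active_suppressions` is a hypothetical name and the `CTR-001`-style identifiers used with it are placeholders. It assumes `expires` arrives as a `YYYY-MM-DD` string, that a suppression remains valid through its `expires` date, and that a whitespace-only `reason` counts as empty.

```python
from datetime import date


def active_suppressions(entries: list, today: date) -> list:
    """Validate `threat-detection-suppress` entries; return rules still suppressed."""
    active = []
    for entry in entries:
        rule = str(entry.get("rule") or "")
        if not rule.startswith("CTR-"):
            raise ValueError(f"invalid rule identifier: {rule!r}")
        if not str(entry.get("reason") or "").strip():
            # A suppression without a reason is a validation error.
            raise ValueError(f"{rule}: suppression requires a non-empty reason")
        expires = entry.get("expires")
        if expires is not None and date.fromisoformat(expires) < today:
            continue  # expired: treated as if the suppression does not exist
        active.append(rule)
    return active
```

Entries that survive this pass are the ones a compiler would then record, with `rule`, `reason`, and `expires`, in the lock file manifest.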

### 6.5 Threat Category Lifecycle

New threat categories do not immediately become normative rules. This section defines the lifecycle stages a threat category **MUST** pass through before it is added to the CTR rule catalog in Section 5.1.

1. **Experimental stage**: A threat class is identified (via security research, incident analysis, or operational observation) and a tracking issue is opened in `github/gh-aw`. An experimental prototype detection implementation **MAY** be added to the compiler behind a feature flag. The threat class **MUST NOT** appear in the normative CTR catalog while in Experimental stage; it **SHOULD** be documented in a separate scratchpad or issue thread. Experimental detections **MUST NOT** cause compilation failures in production.

2. **Candidate stage**: The threat class has a concrete detection trigger, an agreed compiler action (reject, rewrite, or warn), a stable diagnostic ID reserved in a draft spec update, and at least one test case demonstrating the detection. A Candidate threat **SHOULD** be deployed behind a feature flag for a minimum of one release cycle. During Candidate stage, maintainers **MUST** collect evidence (false-positive reports, affected workflow patterns) and document findings in the tracking issue. A Candidate threat **SHOULD NOT** be promoted to Normative without at least one successful deployment in a non-strict production workflow.

3. **Normative stage**: The threat class is formally added to Section 5.1 and Section 8.1 via a pull request that includes: the CTR rule definition, the implementation mapping in Section 7.1, at least one test ID in Section 8.1, and a change-log entry in Section 10. The pull request **MUST** be reviewed by at least one security-focused maintainer. Once merged, the rule **MUST** be enforced by all conforming implementations. Any feature flag used during Candidate stage **MUST** be removed in the same pull request that adds the Normative definition.
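
The entry criteria for the Candidate stage read naturally as a checklist. The sketch below encodes them in Python; `ThreatClass` and `candidate_blockers` are hypothetical names, and the field layout is an assumption made for illustration only.

```python
from dataclasses import dataclass

COMPILER_ACTIONS = {"reject", "rewrite", "warn"}


@dataclass
class ThreatClass:
    """Hypothetical record of a threat class tracked through the lifecycle."""
    detection_trigger: str = ""
    compiler_action: str = ""
    diagnostic_id: str = ""
    test_cases: int = 0


def candidate_blockers(threat: ThreatClass) -> list:
    """List what is still missing before a threat class can enter Candidate stage."""
    blockers = []
    if not threat.detection_trigger:
        blockers.append("no concrete detection trigger")
    if threat.compiler_action not in COMPILER_ACTIONS:
        blockers.append("no agreed compiler action")
    if not threat.diagnostic_id:
        blockers.append("no reserved diagnostic ID")
    if threat.test_cases < 1:
        blockers.append("no test case demonstrating the detection")
    return blockers
```

An empty result means the four Candidate prerequisites are met; promotion to Normative still requires the pull request and review described in item 3.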

---

## 7. Implementation Mapping