15 changes: 15 additions & 0 deletions docs/src/content/docs/reference/effective-tokens-specification.md
@@ -238,6 +238,15 @@ The root invocation MUST have `parent_id = null`. It represents the user-facing

Each sub-agent invocation MUST reference a valid `parent_id`. Sub-agent invocations MAY recursively spawn further invocations.

For execution graphs deeper than two levels, implementations MUST aggregate descendant Effective
Tokens in stable post-order: fully observed leaf descendants first, then their nearest observed
ancestors, and finally the parent node's local invocation cost. When a parent has incomplete or
unobservable descendants, the implementation MUST report the partial sum accumulated from the
deepest observed descendants before adding any shallower fallback estimates, and SHOULD keep the
parent node flagged until all known descendants are either observed or explicitly marked
unobservable. Repeated computations over the same partially observed graph MUST produce the same
partial-ordering and subtotal sequence.

---

## 7. Reporting
@@ -292,6 +301,12 @@ integer interoperability in cross-language pipelines.
reported `summary.effective_tokens` value to the ceiling and **MUST** emit a warning indicating
that capping occurred.

**R-SAFE-003A**: When ET capping occurs, implementations **MUST** record a deterministic overflow
condition using either `flagged.code = "ET_OVERFLOW"` on the affected root/subtree node or a
deterministic error when no structured flag channel is available. The error/flag payload **MUST**
include the ceiling value `9007199254740991` so operators can distinguish overflow from missing
usage data.
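
The capping and flagging rules in R-SAFE-003A can be sketched in Go as follows. The `Flag` struct and its `Ceiling` field are hypothetical stand-ins for the `flagged` payload; the ceiling constant equals `2^53 - 1`, the largest integer a JavaScript `Number` represents exactly.

```go
package main

import "fmt"

// etCeiling is the reporting ceiling, 2^53 - 1.
const etCeiling int64 = 9007199254740991

// Flag is a hypothetical shape for the flagged.code channel.
type Flag struct {
	Code    string
	Ceiling int64 // carried so operators can tell overflow from missing data
}

// CapET clamps a raw ET total to the ceiling. When capping occurs it returns
// an ET_OVERFLOW flag that includes the ceiling, plus a warning message.
func CapET(raw int64) (int64, *Flag, string) {
	if raw <= etCeiling {
		return raw, nil, ""
	}
	warn := fmt.Sprintf("effective_tokens capped at %d (raw total %d)", etCeiling, raw)
	return etCeiling, &Flag{Code: "ET_OVERFLOW", Ceiling: etCeiling}, warn
}
```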

**R-SAFE-004**: For long multi-agent chains, implementations **SHOULD** aggregate ET in a
streaming manner (incremental updates per invocation) and **SHOULD** emit an early warning when
running totals exceed 80% of the ceiling.
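
A minimal sketch of R-SAFE-004's streaming aggregation, assuming a hypothetical `StreamAggregator` that receives one ET value per invocation. Emitting the early warning only once per chain is a design choice of this sketch, not a requirement of the text.

```go
package main

// StreamAggregator is a hypothetical incremental ET accumulator.
type StreamAggregator struct {
	Ceiling int64
	Total   int64
	warned  bool
}

// Add folds one invocation's ET into the running total and reports true
// only for the first update that pushes the total above 80% of the ceiling.
func (s *StreamAggregator) Add(et int64) bool {
	s.Total += et
	// floor(4*Ceiling/5), computed without risking int64 overflow.
	threshold := s.Ceiling/5*4 + (s.Ceiling%5)*4/5
	if !s.warned && s.Total > threshold {
		s.warned = true
		return true
	}
	return false
}
```
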
13 changes: 8 additions & 5 deletions docs/src/content/docs/reference/experiments-specification.md
@@ -38,17 +38,20 @@ implementation (gh-aw v1.x) satisfies all conformance requirements below.
Promotion from **Draft** to **Candidate Recommendation** requires all of the following:

1. **Reference implementation completeness**: 100% of normative requirements in §§4–12 are
implemented in `gh-aw` and mapped to concrete implementation files.
implemented in `gh-aw` and mapped to concrete implementation files (**tracking issue**:
[#31983](https://github.com/github/gh-aw/issues/31983)).
2. **Compliance coverage**: At least 95% of normative requirements have automated tests, and
all MUST/MUST NOT requirements have at least one passing automated test.
all MUST/MUST NOT requirements have at least one passing automated test (**tracking issue**:
[#31983](https://github.com/github/gh-aw/issues/31983)).
3. **CI stability window**: The experiments-related test suite passes on the default branch for
30 consecutive days with no unresolved regression in variant selection, persistence, or
reporting behavior.
reporting behavior (**tracking issue**: [#31983](https://github.com/github/gh-aw/issues/31983)).
4. **Interoperability evidence**: At least two production workflows using `experiments:` run for
a minimum of 500 total assignments each with valid assignment artifacts and reproducible
audit output.
audit output (**tracking issue**: [#31983](https://github.com/github/gh-aw/issues/31983)).
5. **Review sign-off**: Written approval from at least two gh-aw maintainers that Sections 10–14
are complete, internally consistent, and suitable for Candidate Recommendation publication.
are complete, internally consistent, and suitable for Candidate Recommendation publication
(**tracking issue**: [#31983](https://github.com/github/gh-aw/issues/31983)).

### Sync

75 changes: 42 additions & 33 deletions docs/src/content/docs/reference/forecast-specification.md
@@ -208,6 +208,7 @@ If a provided `workflow_id` does not match any discovered workflow, the implemen
| `--days` | int | `30` | Length of the historical sampling window in days. Permitted values: `7`, `30`. |
| `--period` | string | `"month"` | Projection period length. Permitted values: `"week"`, `"month"`. |
| `--sample` | int | `100` | Maximum number of completed runs to sample per workflow. MUST be ≥ 1. |
| `--max-age` | int | `90` | Maximum age in days for historical runs eligible for sampling. Implementations SHOULD discard runs older than this bound unless the caller overrides it. MUST be ≥ 1. |
| `--repo` | string | (none) | Target a repository other than the current working directory, in `owner/repo` format. Enables remote mode. |
| `--json` | bool | `false` | Emit machine-readable JSON output instead of console tables. |
| `--verbose` | bool | `false` | Emit verbose diagnostic output to stderr during processing. |
@@ -220,6 +221,7 @@ Implementations MUST validate all flag values before beginning any API calls or
- **R-CLI-002**: If `--period` is not one of `{"week", "month"}`, the implementation MUST exit with a non-zero status and an error message specifying the permitted values.
- **R-CLI-003**: If `--sample` is less than 1, the implementation MUST exit with a non-zero status.
- **R-CLI-004**: If `--repo` is provided, it MUST match the pattern `owner/repo` (two non-empty components separated by `/`). An invalid format MUST produce a non-zero exit with a descriptive error.
- **R-CLI-005**: If `--max-age` is provided and is less than 1, the implementation MUST exit with a non-zero status and a descriptive error.
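
The validation rules above can be sketched as a single pre-flight check in Go. `ForecastFlags` and `Validate` are hypothetical names; the `--days` check follows the permitted values in the flag table, and the caller is assumed to turn a non-nil error into a non-zero exit before any API call.

```go
package main

import (
	"fmt"
	"strings"
)

// ForecastFlags is a hypothetical holder for parsed forecast flag values.
type ForecastFlags struct {
	Days   int
	Period string
	Sample int
	MaxAge int
	Repo   string
}

// Validate rejects invalid flag values with a descriptive error.
func Validate(f ForecastFlags) error {
	if f.Days != 7 && f.Days != 30 { // permitted values from the flag table
		return fmt.Errorf("invalid --days %d: permitted values are 7, 30", f.Days)
	}
	if f.Period != "week" && f.Period != "month" {
		return fmt.Errorf("invalid --period %q: permitted values are \"week\", \"month\"", f.Period)
	}
	if f.Sample < 1 {
		return fmt.Errorf("invalid --sample %d: must be >= 1", f.Sample)
	}
	if f.MaxAge < 1 {
		return fmt.Errorf("invalid --max-age %d: must be >= 1", f.MaxAge)
	}
	if f.Repo != "" {
		parts := strings.Split(f.Repo, "/")
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return fmt.Errorf("invalid --repo %q: expected owner/repo", f.Repo)
		}
	}
	return nil
}
```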

### 4.5 Exit Codes

@@ -250,6 +252,9 @@ gh aw forecast --repo owner/repo

# Forecast a specific workflow in a remote repository
gh aw forecast --repo owner/repo ci-doctor

# Ignore historical runs older than 90 days (default)
gh aw forecast --max-age 90
```

---
@@ -283,9 +288,10 @@ Frontmatter enrichment is OPTIONAL; absence of a corresponding source file MUST

In remote mode (when `--repo owner/repo` is specified), the implementation MUST:

1. **R-DISC-010**: Call the GitHub Actions API (`GET /repos/{owner}/{repo}/actions/workflows`) to enumerate workflows in the target repository.
1. **R-DISC-010**: Call the GitHub Actions API (`GET /repos/{owner}/{repo}/actions/workflows`) to enumerate workflows in the target repository. If workflow discovery hits a primary or secondary GitHub API rate limit, the implementation SHOULD back off and retry before failing.
2. **R-DISC-011**: Filter the returned workflows to those identified as agentic (e.g., by inspecting file-path conventions, labels, or other implementation-defined heuristics).
3. **R-DISC-012**: Match any caller-supplied `workflow_id` positional arguments against workflow display names and file-path basenames using case-insensitive string comparison.
4. **R-DISC-013**: If rate-limit retries are exhausted while at least one caller-supplied workflow identifier can still be attempted, the implementation MUST continue with that subset as a partial result set and MUST emit a warning identifying the degraded discovery mode.

In remote mode, frontmatter metadata (triggers, concurrency, experiment variants) is UNAVAILABLE because the workflow source files are not accessible. The implementation MUST degrade gracefully: fields that depend on frontmatter MUST be omitted from output or reported as their zero/empty values rather than causing an error.

@@ -308,14 +314,15 @@ For each discovered workflow (or each workflow in the filtered set), the impleme

1. **R-SAMP-001**: Query completed workflow runs within the historical window using the equivalent of `gh run list --workflow <id> --status completed --limit <sample> --created >=<cutoff>`.
2. **R-SAMP-002**: Limit the returned run set to at most `--sample` runs.
3. **R-SAMP-003**: For each run in the sample, derive the per-run metrics defined in Section 6.2.
4. **R-SAMP-004**: Record the count of runs with a successful conclusion separately from the total sampled count.
3. **R-SAMP-003**: Implementations SHOULD discard historical runs older than 90 days by default, even when a broader sampling window is requested, and SHOULD expose this bound through a `--max-age` flag so operators can opt in to older samples when needed.
4. **R-SAMP-004**: For each run in the sample, derive the per-run metrics defined in Section 6.2.
5. **R-SAMP-005**: Record the count of runs with a successful conclusion separately from the total sampled count.

If the historical window yields zero completed runs for a workflow, the implementation MUST:

- **R-SAMP-005**: Return `nil` (or a sentinel empty result) for that workflow's Monte Carlo projection.
- **R-SAMP-006**: Include the workflow in output with `sampled_runs: 0` and all projection fields set to zero.
- **R-SAMP-007**: SHOULD emit a warning indicating that no historical data is available for the workflow.
- **R-SAMP-006**: Return `nil` (or a sentinel empty result) for that workflow's Monte Carlo projection.
- **R-SAMP-007**: Include the workflow in output with `sampled_runs: 0` and all projection fields set to zero.
- **R-SAMP-008**: SHOULD emit a warning indicating that no historical data is available for the workflow.

### 6.2 Per-Run Metric Derivation

@@ -807,6 +814,7 @@ Because the forecast command is marked **Experimental**:
- **T-FC-011**: Local mode: no lock files found exits with code `3`.
- **T-FC-012**: Remote mode: calls GitHub Actions API and matches workflow IDs case-insensitively.
- **T-FC-013**: Remote mode: missing frontmatter fields default to zero/empty without error.
- **T-FC-030**: Remote mode: on GitHub API rate-limit exhaustion during workflow discovery, the implementation backs off and emits a warning before continuing with caller-supplied workflow IDs as partial results.

#### 12.1.3 Data Sampling Tests

@@ -817,23 +825,23 @@ Because the forecast command is marked **Experimental**:

#### 12.1.4 Monte Carlo Engine Tests

- **T-FC-030**: With `λ ≤ 15`, Knuth's algorithm is used for Poisson draw (verifiable by seeded PRNG in test mode).
- **T-FC-031**: With `λ > 15`, Normal approximation is used; drawn value is non-negative.
- **T-FC-032**: With `λ = 0`, projected tokens is exactly `0` for all trials.
- **T-FC-033**: Bootstrap resampling draws with replacement from historical ET observations.
- **T-FC-034**: Only successful Bernoulli draws contribute ET to the trial total.
- **T-FC-035**: 10,000 trials are executed per workflow.
- **T-FC-036**: P10 ≤ P50 ≤ P90 for all non-zero projections.
- **T-FC-037**: `projected_effective_tokens` equals `p50_projected_effective_tokens`.
- **T-FC-038**: Boundary crossover: `λ = 15` uses Knuth's exact branch.
- **T-FC-039**: Boundary crossover: `λ > 15` uses Normal approximation branch.
- **T-FC-031**: With `λ ≤ 15`, Knuth's algorithm is used for Poisson draw (verifiable by seeded PRNG in test mode).
- **T-FC-032**: With `λ > 15`, Normal approximation is used; drawn value is non-negative.
- **T-FC-033**: With `λ = 0`, projected tokens is exactly `0` for all trials.
- **T-FC-034**: Bootstrap resampling draws with replacement from historical ET observations.
- **T-FC-035**: Only successful Bernoulli draws contribute ET to the trial total.
- **T-FC-036**: 10,000 trials are executed per workflow.
- **T-FC-037**: P10 ≤ P50 ≤ P90 for all non-zero projections.
- **T-FC-038**: `projected_effective_tokens` equals `p50_projected_effective_tokens`.
- **T-FC-039**: Boundary crossover: `λ = 15` uses Knuth's exact branch.
- **T-FC-040**: Boundary crossover: `λ > 15` uses Normal approximation branch.

#### 12.1.5 Episode Analysis Tests

- **T-FC-040**: Runs sharing `headSha` and `headBranch` are grouped into the same episode.
- **T-FC-041**: `runs_per_episode` equals `sampled_run_count / sampled_episodes`.
- **T-FC-042**: Episode table is printed in console output when any workflow has `runs_per_episode > 1`.
- **T-FC-043**: Episode table is suppressed when all workflows have `runs_per_episode = 1.0`.
- **T-FC-041**: Runs sharing `headSha` and `headBranch` are grouped into the same episode.
- **T-FC-042**: `runs_per_episode` equals `sampled_run_count / sampled_episodes`.
- **T-FC-043**: Episode table is printed in console output when any workflow has `runs_per_episode > 1`.
- **T-FC-044**: Episode table is suppressed when all workflows have `runs_per_episode = 1.0`.
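
Episode grouping reduces to counting distinct `(headSha, headBranch)` pairs. The `EpisodeRun` type and `RunsPerEpisode` function are hypothetical, and returning `0` for an empty sample is an assumption, since the tests above do not cover that case.

```go
package main

// EpisodeRun is a hypothetical run record carrying the two grouping keys.
type EpisodeRun struct {
	HeadSha    string
	HeadBranch string
}

// RunsPerEpisode groups runs sharing (headSha, headBranch) into episodes and
// returns the episode count and sampled_run_count / sampled_episodes.
func RunsPerEpisode(runs []EpisodeRun) (episodes int, ratio float64) {
	seen := make(map[[2]string]struct{})
	for _, r := range runs {
		seen[[2]string{r.HeadSha, r.HeadBranch}] = struct{}{}
	}
	if len(seen) == 0 {
		return 0, 0
	}
	return len(seen), float64(len(runs)) / float64(len(seen))
}
```

A console renderer could then print the episode table only when some workflow's ratio exceeds `1`, which is the behavior T-FC-043 and T-FC-044 check.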

#### 12.1.6 Output Format Tests

@@ -851,20 +859,21 @@ Because the forecast command is marked **Experimental**:
| Flag validation | T-FC-001–005 | 1 | Required |
| Local workflow discovery | T-FC-010–011 | 1 | Required |
| Remote workflow discovery | T-FC-012–013 | 2 | Required |
| Remote discovery rate-limit backoff and partial results | T-FC-030 | 2 | Required |
| Data sampling with limit and window | T-FC-020–021 | 1 | Required |
| Missing artifact graceful handling | T-FC-022 | 1 | Required |
| Nil projection for empty sample | T-FC-023 | 1 | Required |
| Knuth Poisson algorithm (λ ≤ 15) | T-FC-030 | 1 | Required |
| Normal approximation (λ > 15) | T-FC-031 | 1 | Required |
| Zero-λ projection | T-FC-032 | 1 | Required |
| Bootstrap resampling | T-FC-033 | 1 | Required |
| Bernoulli success filtering | T-FC-034 | 1 | Required |
| 10,000 trial count | T-FC-035 | 1 | Required |
| Percentile ordering | T-FC-036 | 1 | Required |
| P50 field consistency | T-FC-037 | 1 | Required |
| λ crossover threshold enforcement | T-FC-038–039 | 1 | Required |
| Episode grouping | T-FC-040–041 | 2 | Required |
| Episode table display logic | T-FC-042–043 | 2 | Required |
| Knuth Poisson algorithm (λ ≤ 15) | T-FC-031 | 1 | Required |
| Normal approximation (λ > 15) | T-FC-032 | 1 | Required |
| Zero-λ projection | T-FC-033 | 1 | Required |
| Bootstrap resampling | T-FC-034 | 1 | Required |
| Bernoulli success filtering | T-FC-035 | 1 | Required |
| 10,000 trial count | T-FC-036 | 1 | Required |
| Percentile ordering | T-FC-037 | 1 | Required |
| P50 field consistency | T-FC-038 | 1 | Required |
| λ crossover threshold enforcement | T-FC-039–040 | 1 | Required |
| Episode grouping | T-FC-041–042 | 2 | Required |
| Episode table display logic | T-FC-043–044 | 2 | Required |
| Console output columns | T-FC-050 | 1 | Required |
| JSON schema conformance | T-FC-051–054 | 2 | Required |
| Experimental status warning | T-FC-055 | 1 | Required |
@@ -879,8 +888,8 @@ This section maps normative forecast requirements to implementation files.
|---|---|
| Monte Carlo engine (Poisson/Bootstrap/Bernoulli) | `pkg/cli/forecast_montecarlo.go` |
| Forecast command orchestration and output fields | `pkg/cli/forecast.go`, `pkg/cli/forecast_command.go` |
| Workflow/run sampling and API handling | `pkg/cli/forecast.go` |
| Monte Carlo compliance tests (including λ threshold) | `pkg/cli/forecast_montecarlo_test.go` |
| Workflow discovery, rate-limit backoff, and run sampling | `pkg/cli/forecast.go` |
| Forecast compliance tests (including rate-limit backoff and λ thresholds) | `pkg/cli/forecast_montecarlo_test.go` |

Sync procedure:
1. Update this specification when changing projection algorithms or thresholds.
@@ -344,14 +344,15 @@ Version changes will be documented and backward compatibility maintained where p
Per the **Resolution (2026-05-08)** in Implementation Notes, the text-based algorithm remains
authoritative until a dedicated migration milestone is approved.

Tracking issue: [#31983](https://github.com/github/gh-aw/issues/31983)

The project **MUST NOT** schedule a v2.0.0 migration to the field-selection model until all of
the following prerequisites are complete:
the following tracked tasks are complete:

1. A selective field-exclusion use case is confirmed and documented.
2. A migration guide is drafted, including lock-file invalidation and recompilation steps.
3. Cross-language (Go + JavaScript) test vectors for the candidate v2.0.0 behavior are written
and pass in CI.
4. A rollout plan is approved by maintainers, including backward-compatibility impact analysis.
- [ ] Confirm and document a selective field-exclusion use case in [#31983](https://github.com/github/gh-aw/issues/31983).
- [ ] Draft a migration guide in [#31983](https://github.com/github/gh-aw/issues/31983), including lock-file invalidation and recompilation steps.
- [ ] Write candidate v2.0.0 cross-language test vectors in [#31983](https://github.com/github/gh-aw/issues/31983) and verify they pass in CI.
- [ ] Approve a rollout plan in [#31983](https://github.com/github/gh-aw/issues/31983), including backward-compatibility impact analysis.

Until these prerequisites are met, implementations **MUST** continue using the text-based
algorithm and **MUST NOT** selectively exclude frontmatter fields from hash input.
35 changes: 34 additions & 1 deletion docs/src/content/docs/reference/fuzzy-schedule-specification.md
@@ -35,6 +35,7 @@ This document is governed by the GitHub Agentic Workflows project specifications
9. [Error Handling](#9-error-handling)
10. [Compliance Testing](#10-compliance-testing)
11. [Sync Notes](#11-sync-notes)
12. [Calendar Output Schema](#12-calendar-output-schema)

---

@@ -1220,7 +1221,7 @@ This section maps the fuzzy schedule specification to implementation files.
| Frontmatter schedule parsing and grammar handling | `pkg/parser/schedule_parser.go` |
| Deterministic fuzzy scattering and peak-minute avoidance | `pkg/parser/schedule_fuzzy_scatter.go` |
| Parser/scatter conformance tests | `pkg/parser/schedule_parser_test.go`, `pkg/parser/schedule_fuzzy_scatter_test.go` |
| Calendar/cron visualization support for compile tooling | `pkg/cli/compile_schedule_calendar.go` |
| Calendar/cron visualization support for compile tooling (see §12) | `pkg/cli/compile_schedule_calendar.go` |

After changing fuzzy schedule semantics:
1. Update this specification section and any affected normative clauses.
@@ -1229,6 +1230,38 @@ After changing fuzzy schedule semantics:

---

## 12. Calendar Output Schema

The compile-time schedule calendar emitted by `pkg/cli/compile_schedule_calendar.go` documents the
aggregate UTC trigger density of scheduled workflows. A conforming implementation MUST treat the
calendar as a human-readable console artifact rather than a machine-readable file format.

| Element | Requirement |
|---|---|
| Output stream | MUST be written to `stderr` only, and MUST NOT be emitted in JSON output mode. |
| Emission condition | MUST be omitted when no scheduled workflows are present. |
| Title line | MUST render the heading `Schedule Heatmap (UTC)`. |
| Hour header | MUST contain 24 UTC hour labels from `00` through `23`, in ascending order. |
| Day rows | MUST render exactly seven rows in `Mon`, `Tue`, `Wed`, `Thu`, `Fri`, `Sat`, `Sun` order. |
| Cells | MUST render one glyph per hour slot using the implementation's intensity mapping (`·`, `░`, `▒`, `▓`, `█`). |
| Legend | MUST explain the trigger-count buckets for each glyph after the grid. |
| File output | MUST NOT create a separate file; the calendar is an inline stderr rendering only. |

Implementations SHOULD preserve a fixed-width grid so adjacent cells remain visually aligned in
plain-text terminals. ANSI styling MAY be applied when stderr is a terminal, but the unstyled text
content MUST preserve the same row/column structure.

### Version 1.2.0 (Draft) — 2026-05-12

- **Changed**: Daily, weekly, bi-weekly, and tri-weekly scattering now share the weighted 622-slot
pool introduced in Sections 6.3.1 and 6.3.5–6.3.6.
- **Added**: Peak-minute avoidance rules in Section 6.4 to steer schedules away from `:00`, `:15`,
`:30`, and `:45` hotspot minutes during documented peak windows.
- **Added**: Calendar output schema requirements (Section 12) for the compile-time heatmap rendered
by `compile_schedule_calendar.go`.

---

## References

### Normative References