github · pelikhan · May 13, 2026
diff --git a/docs/src/content/docs/reference/effective-tokens-specification.md b/docs/src/content/docs/reference/effective-tokens-specification.md
@@ -238,6 +238,15 @@ The root invocation MUST have `parent_id = null`. It represents the user-facing
 
 Each sub-agent invocation MUST reference a valid `parent_id`. Sub-agent invocations MAY recursively spawn further invocations.
 
+For execution graphs deeper than two levels, implementations MUST aggregate descendant Effective
+Tokens in stable post-order: fully observed leaf descendants first, then their nearest observed
+ancestors, and finally the parent node's local invocation cost. When a parent has incomplete or
+unobservable descendants, the implementation MUST report the partial sum accumulated from the
+deepest observed descendants before adding any shallower fallback estimates, and SHOULD keep the
+parent node flagged until all known descendants are either observed or explicitly marked
+unobservable. Repeated computations over the same partially observed graph MUST produce the same
+partial-sum order and subtotal sequence.
+
 ---
 
 ## 7. Reporting
@@ -292,6 +301,12 @@ integer interoperability in cross-language pipelines.
 reported `summary.effective_tokens` value to the ceiling and **MUST** emit a warning indicating
 that capping occurred.
 
+**R-SAFE-003A**: When ET capping occurs, implementations **MUST** record the overflow
+deterministically, either by setting `flagged.code = "ET_OVERFLOW"` on the affected root or subtree
+node or, when no structured flag channel is available, by raising a deterministic error. The flag
+or error payload **MUST** include the ceiling value `9007199254740991` so operators can distinguish
+overflow from missing usage data.
+
 **R-SAFE-004**: For long multi-agent chains, implementations **SHOULD** aggregate ET in a
 streaming manner (incremental updates per invocation) and **SHOULD** emit an early warning when
 running totals exceed 80% of the ceiling.

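Review note on the effective-tokens hunks above: the post-order aggregation rule and the R-SAFE-003A overflow flag can be illustrated together. What follows is a minimal Go sketch, not the gh-aw implementation. The `Node` and `Result` types, the `Aggregate` function, and the `visit` callback are hypothetical names introduced for illustration. Counting an unobserved node as zero local cost is also an assumption; the spec's "shallower fallback estimates" are not modeled here.

```go
package main

import "fmt"

// MaxET is the ceiling value quoted in R-SAFE-003A.
const MaxET uint64 = 9007199254740991

// Node is a hypothetical invocation record; field names are illustrative.
type Node struct {
	ID       string
	LocalET  uint64
	Observed bool
	Children []*Node
}

// Result is the subtotal for one subtree plus the flags raised while summing.
type Result struct {
	Total   uint64
	Partial bool   // true when any node in the subtree was unobserved
	Flag    string // "ET_OVERFLOW" when the subtotal was capped
}

// addCapped adds two values and reports whether the sum hit the ceiling.
func addCapped(a, b uint64) (uint64, bool) {
	if a > MaxET || b > MaxET-a {
		return MaxET, true
	}
	return a + b, false
}

// Aggregate walks children in slice order (stable), sums descendants first,
// and adds the node's own cost last. visit receives each subtotal in order.
func Aggregate(n *Node, visit func(id string, subtotal uint64)) Result {
	var res Result
	for _, c := range n.Children {
		cr := Aggregate(c, visit)
		sum, over := addCapped(res.Total, cr.Total)
		res.Total = sum
		res.Partial = res.Partial || cr.Partial
		if over || cr.Flag != "" {
			res.Flag = "ET_OVERFLOW"
		}
	}
	if n.Observed {
		sum, over := addCapped(res.Total, n.LocalET)
		res.Total = sum
		if over {
			res.Flag = "ET_OVERFLOW"
		}
	} else {
		// Assumption: an unobserved node adds no local cost, only a flag.
		res.Partial = true
	}
	visit(n.ID, res.Total)
	return res
}

// sampleGraph builds a three-level graph with one unobserved child.
func sampleGraph() *Node {
	return &Node{ID: "root", LocalET: 10, Observed: true, Children: []*Node{
		{ID: "a", LocalET: 5, Observed: true, Children: []*Node{
			{ID: "a1", LocalET: 1, Observed: true},
			{ID: "a2", LocalET: 2, Observed: true},
		}},
		{ID: "b", LocalET: 100, Observed: false},
	}}
}

func main() {
	res := Aggregate(sampleGraph(), func(id string, subtotal uint64) {
		fmt.Printf("%s subtotal=%d\n", id, subtotal)
	})
	fmt.Printf("total=%d partial=%v flag=%q\n", res.Total, res.Partial, res.Flag)
}
```

Because children are visited in slice order and no map iteration is involved, repeated runs over the same graph yield the same subtotal sequence, which is the property the new paragraph requires.
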
diff --git a/docs/src/content/docs/reference/experiments-specification.md b/docs/src/content/docs/reference/experiments-specification.md
@@ -38,17 +38,20 @@ implementation (gh-aw v1.x) satisfies all conformance requirements below.
 Promotion from **Draft** to **Candidate Recommendation** requires all of the following:
 
 1. **Reference implementation completeness**: 100% of normative requirements in §§4–12 are
-   implemented in `gh-aw` and mapped to concrete implementation files.
+   implemented in `gh-aw` and mapped to concrete implementation files (**tracking issue**:
+   [#31983](https://github.com/github/gh-aw/issues/31983)).
 2. **Compliance coverage**: At least 95% of normative requirements have automated tests, and
-   all MUST/MUST NOT requirements have at least one passing automated test.
+   all MUST/MUST NOT requirements have at least one passing automated test (**tracking issue**:
+   [#31983](https://github.com/github/gh-aw/issues/31983)).
 3. **CI stability window**: The experiments-related test suite passes on the default branch for
    30 consecutive days with no unresolved regression in variant selection, persistence, or
-   reporting behavior.
+   reporting behavior (**tracking issue**: [#31983](https://github.com/github/gh-aw/issues/31983)).
 4. **Interoperability evidence**: At least two production workflows using `experiments:` run for
    a minimum of 500 total assignments each with valid assignment artifacts and reproducible
-   audit output.
+   audit output (**tracking issue**: [#31983](https://github.com/github/gh-aw/issues/31983)).
 5. **Review sign-off**: Written approval from at least two gh-aw maintainers that Sections 10–14
-   are complete, internally consistent, and suitable for Candidate Recommendation publication.
+   are complete, internally consistent, and suitable for Candidate Recommendation publication
+   (**tracking issue**: [#31983](https://github.com/github/gh-aw/issues/31983)).
 
 ### Sync
 

diff --git a/docs/src/content/docs/reference/forecast-specification.md b/docs/src/content/docs/reference/forecast-specification.md
@@ -208,6 +208,7 @@ If a provided `workflow_id` does not match any discovered workflow, the implemen
 | `--days` | int | `30` | Length of the historical sampling window in days. Permitted values: `7`, `30`. |
 | `--period` | string | `"month"` | Projection period length. Permitted values: `"week"`, `"month"`. |
 | `--sample` | int | `100` | Maximum number of completed runs to sample per workflow. MUST be ≥ 1. |
+| `--max-age` | int | `90` | Maximum age in days of historical runs eligible for sampling. Implementations SHOULD discard runs older than this bound; callers change the bound by passing a different value. MUST be ≥ 1. |
 | `--repo` | string | (none) | Target a repository other than the current working directory, in `owner/repo` format. Enables remote mode. |
 | `--json` | bool | `false` | Emit machine-readable JSON output instead of console tables. |
 | `--verbose` | bool | `false` | Emit verbose diagnostic output to stderr during processing. |
@@ -220,6 +221,7 @@ Implementations MUST validate all flag values before beginning any API calls or
 - **R-CLI-002**: If `--period` is not one of `{"week", "month"}`, the implementation MUST exit with a non-zero status and an error message specifying the permitted values.
 - **R-CLI-003**: If `--sample` is less than 1, the implementation MUST exit with a non-zero status.
 - **R-CLI-004**: If `--repo` is provided, it MUST match the pattern `owner/repo` (two non-empty components separated by `/`). An invalid format MUST produce a non-zero exit with a descriptive error.
+- **R-CLI-005**: If `--max-age` is provided and is less than 1, the implementation MUST exit with a non-zero status and a descriptive error.
 
 ### 4.5 Exit Codes
 
@@ -250,6 +252,9 @@ gh aw forecast --repo owner/repo
 
 # Forecast a specific workflow in a remote repository
 gh aw forecast --repo owner/repo ci-doctor
+
+# Ignore historical runs older than 90 days (default)
+gh aw forecast --max-age 90
 ```
 
 ---
@@ -283,9 +288,10 @@ Frontmatter enrichment is OPTIONAL; absence of a corresponding source file MUST
 
 In remote mode (when `--repo owner/repo` is specified), the implementation MUST:
 
-1. **R-DISC-010**: Call the GitHub Actions API (`GET /repos/{owner}/{repo}/actions/workflows`) to enumerate workflows in the target repository.
+1. **R-DISC-010**: Call the GitHub Actions API (`GET /repos/{owner}/{repo}/actions/workflows`) to enumerate workflows in the target repository. If workflow discovery hits a primary or secondary GitHub API rate limit, the implementation SHOULD back off and retry before failing.
 2. **R-DISC-011**: Filter the returned workflows to those identified as agentic (e.g., by inspecting file-path conventions, labels, or other implementation-defined heuristics).
 3. **R-DISC-012**: Match any caller-supplied `workflow_id` positional arguments against workflow display names and file-path basenames using case-insensitive string comparison.
+4. **R-DISC-013**: If rate limits are exhausted but at least one caller-supplied workflow identifier can still be attempted, the implementation MUST continue with that subset as a partial result set and MUST emit a warning identifying the degraded discovery mode.
 
 In remote mode, frontmatter metadata (triggers, concurrency, experiment variants) is UNAVAILABLE because the workflow source files are not accessible. The implementation MUST degrade gracefully: fields that depend on frontmatter MUST be omitted from output or reported as their zero/empty values rather than causing an error.
 
@@ -308,14 +314,15 @@ For each discovered workflow (or each workflow in the filtered set), the impleme
 
 1. **R-SAMP-001**: Query completed workflow runs within the historical window using the equivalent of `gh run list --workflow <id> --status completed --limit <sample> --created >=<cutoff>`.
 2. **R-SAMP-002**: Limit the returned run set to at most `--sample` runs.
-3. **R-SAMP-003**: For each run in the sample, derive the per-run metrics defined in Section 6.2.
-4. **R-SAMP-004**: Record the count of runs with a successful conclusion separately from the total sampled count.
+3. **R-SAMP-003**: Implementations SHOULD discard historical runs older than 90 days by default, even when a broader sampling window is requested, and SHOULD expose this bound through a `--max-age` flag so operators can opt in to older samples when needed.
+4. **R-SAMP-004**: For each run in the sample, derive the per-run metrics defined in Section 6.2.
+5. **R-SAMP-005**: Record the count of runs with a successful conclusion separately from the total sampled count.
 
 If the historical window yields zero completed runs for a workflow, the implementation MUST:
 
-- **R-SAMP-005**: Return `nil` (or a sentinel empty result) for that workflow's Monte Carlo projection.
-- **R-SAMP-006**: Include the workflow in output with `sampled_runs: 0` and all projection fields set to zero.
-- **R-SAMP-007**: SHOULD emit a warning indicating that no historical data is available for the workflow.
+- **R-SAMP-006**: Return `nil` (or a sentinel empty result) for that workflow's Monte Carlo projection.
+- **R-SAMP-007**: Include the workflow in output with `sampled_runs: 0` and all projection fields set to zero.
+- **R-SAMP-008**: SHOULD emit a warning indicating that no historical data is available for the workflow.
 
 ### 6.2 Per-Run Metric Derivation
 
@@ -807,6 +814,7 @@ Because the forecast command is marked **Experimental**:
 - **T-FC-011**: Local mode: no lock files found exits with code `3`.
 - **T-FC-012**: Remote mode: calls GitHub Actions API and matches workflow IDs case-insensitively.
 - **T-FC-013**: Remote mode: missing frontmatter fields default to zero/empty without error.
+- **T-FC-030**: Remote mode: on GitHub API rate-limit exhaustion during workflow discovery, the implementation backs off and emits a warning before continuing with caller-supplied workflow IDs as partial results.
 
 #### 12.1.3 Data Sampling Tests
 
@@ -817,23 +825,23 @@ Because the forecast command is marked **Experimental**:
 
 #### 12.1.4 Monte Carlo Engine Tests
 
-- **T-FC-030**: With `λ ≤ 15`, Knuth's algorithm is used for Poisson draw (verifiable by seeded PRNG in test mode).
-- **T-FC-031**: With `λ > 15`, Normal approximation is used; drawn value is non-negative.
-- **T-FC-032**: With `λ = 0`, projected tokens is exactly `0` for all trials.
-- **T-FC-033**: Bootstrap resampling draws with replacement from historical ET observations.
-- **T-FC-034**: Only successful Bernoulli draws contribute ET to the trial total.
-- **T-FC-035**: 10,000 trials are executed per workflow.
-- **T-FC-036**: P10 ≤ P50 ≤ P90 for all non-zero projections.
-- **T-FC-037**: `projected_effective_tokens` equals `p50_projected_effective_tokens`.
-- **T-FC-038**: Boundary crossover: `λ = 15` uses Knuth's exact branch.
-- **T-FC-039**: Boundary crossover: `λ > 15` uses Normal approximation branch.
+- **T-FC-031**: With `λ ≤ 15`, Knuth's algorithm is used for Poisson draw (verifiable by seeded PRNG in test mode).
+- **T-FC-032**: With `λ > 15`, Normal approximation is used; drawn value is non-negative.
+- **T-FC-033**: With `λ = 0`, projected tokens is exactly `0` for all trials.
+- **T-FC-034**: Bootstrap resampling draws with replacement from historical ET observations.
+- **T-FC-035**: Only successful Bernoulli draws contribute ET to the trial total.
+- **T-FC-036**: 10,000 trials are executed per workflow.
+- **T-FC-037**: P10 ≤ P50 ≤ P90 for all non-zero projections.
+- **T-FC-038**: `projected_effective_tokens` equals `p50_projected_effective_tokens`.
+- **T-FC-039**: Boundary crossover: `λ = 15` uses Knuth's exact branch.
+- **T-FC-040**: Boundary crossover: `λ > 15` uses Normal approximation branch.
 
 #### 12.1.5 Episode Analysis Tests
 
-- **T-FC-040**: Runs sharing `headSha` and `headBranch` are grouped into the same episode.
-- **T-FC-041**: `runs_per_episode` equals `sampled_run_count / sampled_episodes`.
-- **T-FC-042**: Episode table is printed in console output when any workflow has `runs_per_episode > 1`.
-- **T-FC-043**: Episode table is suppressed when all workflows have `runs_per_episode = 1.0`.
+- **T-FC-041**: Runs sharing `headSha` and `headBranch` are grouped into the same episode.
+- **T-FC-042**: `runs_per_episode` equals `sampled_run_count / sampled_episodes`.
+- **T-FC-043**: Episode table is printed in console output when any workflow has `runs_per_episode > 1`.
+- **T-FC-044**: Episode table is suppressed when all workflows have `runs_per_episode = 1.0`.
 
 #### 12.1.6 Output Format Tests
 
@@ -851,20 +859,21 @@ Because the forecast command is marked **Experimental**:
 | Flag validation | T-FC-001–005 | 1 | Required |
 | Local workflow discovery | T-FC-010–011 | 1 | Required |
 | Remote workflow discovery | T-FC-012–013 | 2 | Required |
+| Remote discovery rate-limit backoff and partial results | T-FC-030 | 2 | Required |
 | Data sampling with limit and window | T-FC-020–021 | 1 | Required |
 | Missing artifact graceful handling | T-FC-022 | 1 | Required |
 | Nil projection for empty sample | T-FC-023 | 1 | Required |
-| Knuth Poisson algorithm (λ ≤ 15) | T-FC-030 | 1 | Required |
-| Normal approximation (λ > 15) | T-FC-031 | 1 | Required |
-| Zero-λ projection | T-FC-032 | 1 | Required |
-| Bootstrap resampling | T-FC-033 | 1 | Required |
-| Bernoulli success filtering | T-FC-034 | 1 | Required |
-| 10,000 trial count | T-FC-035 | 1 | Required |
-| Percentile ordering | T-FC-036 | 1 | Required |
-| P50 field consistency | T-FC-037 | 1 | Required |
-| λ crossover threshold enforcement | T-FC-038–039 | 1 | Required |
-| Episode grouping | T-FC-040–041 | 2 | Required |
-| Episode table display logic | T-FC-042–043 | 2 | Required |
+| Knuth Poisson algorithm (λ ≤ 15) | T-FC-031 | 1 | Required |
+| Normal approximation (λ > 15) | T-FC-032 | 1 | Required |
+| Zero-λ projection | T-FC-033 | 1 | Required |
+| Bootstrap resampling | T-FC-034 | 1 | Required |
+| Bernoulli success filtering | T-FC-035 | 1 | Required |
+| 10,000 trial count | T-FC-036 | 1 | Required |
+| Percentile ordering | T-FC-037 | 1 | Required |
+| P50 field consistency | T-FC-038 | 1 | Required |
+| λ crossover threshold enforcement | T-FC-039–040 | 1 | Required |
+| Episode grouping | T-FC-041–042 | 2 | Required |
+| Episode table display logic | T-FC-043–044 | 2 | Required |
 | Console output columns | T-FC-050 | 1 | Required |
 | JSON schema conformance | T-FC-051–054 | 2 | Required |
 | Experimental status warning | T-FC-055 | 1 | Required |
@@ -879,8 +888,8 @@ This section maps normative forecast requirements to implementation files.
 |---|---|
 | Monte Carlo engine (Poisson/Bootstrap/Bernoulli) | `pkg/cli/forecast_montecarlo.go` |
 | Forecast command orchestration and output fields | `pkg/cli/forecast.go`, `pkg/cli/forecast_command.go` |
-| Workflow/run sampling and API handling | `pkg/cli/forecast.go` |
-| Monte Carlo compliance tests (including λ threshold) | `pkg/cli/forecast_montecarlo_test.go` |
+| Workflow discovery, rate-limit backoff, and run sampling | `pkg/cli/forecast.go` |
+| Forecast compliance tests (including rate-limit backoff and λ thresholds) | `pkg/cli/forecast_montecarlo_test.go` |
 
 Sync procedure:
 1. Update this specification when changing projection algorithms or thresholds.

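Review note on the forecast hunks above: R-CLI-005 and R-SAMP-003 together describe a validate-then-filter step for `--max-age`. A minimal Go sketch of that step follows. The `Run` type and the `ValidateMaxAge`, `FilterByMaxAge`, and `sampleRuns` functions are hypothetical and do not come from `pkg/cli/forecast.go`. Treating a run created exactly at the cutoff as eligible, and using exit status `1`, are assumptions; the spec only requires a non-zero status.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Run is a hypothetical, trimmed-down view of a completed workflow run.
type Run struct {
	ID        int64
	CreatedAt time.Time
}

// ValidateMaxAge mirrors R-CLI-005: values below 1 are rejected.
func ValidateMaxAge(days int) error {
	if days < 1 {
		return fmt.Errorf("invalid --max-age %d: must be >= 1", days)
	}
	return nil
}

// FilterByMaxAge keeps runs created at or after now minus maxAgeDays.
// Assumption: the boundary is inclusive.
func FilterByMaxAge(runs []Run, now time.Time, maxAgeDays int) []Run {
	cutoff := now.AddDate(0, 0, -maxAgeDays)
	kept := make([]Run, 0, len(runs))
	for _, r := range runs {
		if !r.CreatedAt.Before(cutoff) {
			kept = append(kept, r)
		}
	}
	return kept
}

// sampleRuns returns a fixed clock and three runs aged 10, 90, and 120 days.
func sampleRuns() (time.Time, []Run) {
	now := time.Date(2026, time.May, 13, 0, 0, 0, 0, time.UTC)
	return now, []Run{
		{ID: 1, CreatedAt: now.AddDate(0, 0, -10)},
		{ID: 2, CreatedAt: now.AddDate(0, 0, -90)},
		{ID: 3, CreatedAt: now.AddDate(0, 0, -120)},
	}
}

func main() {
	maxAge := 90 // default from the flag table
	if err := ValidateMaxAge(maxAge); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	now, runs := sampleRuns()
	for _, r := range FilterByMaxAge(runs, now, maxAge) {
		fmt.Println("eligible run", r.ID)
	}
}
```

Passing the clock in as a parameter keeps the filter deterministic under test, which matters for a compliance test that checks the age boundary.
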
diff --git a/docs/src/content/docs/reference/frontmatter-hash-specification.md b/docs/src/content/docs/reference/frontmatter-hash-specification.md
@@ -344,14 +344,15 @@ Version changes will be documented and backward compatibility maintained where p
 Per the **Resolution (2026-05-08)** in Implementation Notes, the text-based algorithm remains
 authoritative until a dedicated migration milestone is approved.
 
+Tracking issue: [#31983](https://github.com/github/gh-aw/issues/31983)
+
 The project **MUST NOT** schedule a v2.0.0 migration to the field-selection model until all of
-the following prerequisites are complete:
+the following tracked prerequisites are complete:
 
-1. A selective field-exclusion use case is confirmed and documented.
-2. A migration guide is drafted, including lock-file invalidation and recompilation steps.
-3. Cross-language (Go + JavaScript) test vectors for the candidate v2.0.0 behavior are written
-   and pass in CI.
-4. A rollout plan is approved by maintainers, including backward-compatibility impact analysis.
+- [ ] Confirm and document a selective field-exclusion use case in [#31983](https://github.com/github/gh-aw/issues/31983).
+- [ ] Draft a migration guide in [#31983](https://github.com/github/gh-aw/issues/31983), including lock-file invalidation and recompilation steps.
+- [ ] Write candidate v2.0.0 cross-language test vectors in [#31983](https://github.com/github/gh-aw/issues/31983) and verify they pass in CI.
+- [ ] Approve a rollout plan in [#31983](https://github.com/github/gh-aw/issues/31983), including backward-compatibility impact analysis.
 
 Until these prerequisites are met, implementations **MUST** continue using the text-based
 algorithm and **MUST NOT** selectively exclude frontmatter fields from hash input.

diff --git a/docs/src/content/docs/reference/fuzzy-schedule-specification.md b/docs/src/content/docs/reference/fuzzy-schedule-specification.md
@@ -35,6 +35,7 @@ This document is governed by the GitHub Agentic Workflows project specifications
 9. [Error Handling](#9-error-handling)
 10. [Compliance Testing](#10-compliance-testing)
 11. [Sync Notes](#11-sync-notes)
+12. [Calendar Output Schema](#12-calendar-output-schema)
 
 ---
 
@@ -1220,7 +1221,7 @@ This section maps the fuzzy schedule specification to implementation files.
 | Frontmatter schedule parsing and grammar handling | `pkg/parser/schedule_parser.go` |
 | Deterministic fuzzy scattering and peak-minute avoidance | `pkg/parser/schedule_fuzzy_scatter.go` |
 | Parser/scatter conformance tests | `pkg/parser/schedule_parser_test.go`, `pkg/parser/schedule_fuzzy_scatter_test.go` |
-| Calendar/cron visualization support for compile tooling | `pkg/cli/compile_schedule_calendar.go` |
+| Calendar/cron visualization support for compile tooling (see §12) | `pkg/cli/compile_schedule_calendar.go` |
 
 After changing fuzzy schedule semantics:
 1. Update this specification section and any affected normative clauses.
@@ -1229,6 +1230,38 @@ After changing fuzzy schedule semantics:
 
 ---
 
+## 12. Calendar Output Schema
+
+The compile-time schedule calendar emitted by `pkg/cli/compile_schedule_calendar.go` documents the
+aggregate UTC trigger density of scheduled workflows. A conforming implementation MUST treat the
+calendar as a human-readable console artifact rather than a machine-readable file format.
+
+| Element | Requirement |
+|---|---|
+| Output stream | MUST be written to `stderr` only, and MUST NOT be emitted in JSON output mode. |
+| Emission condition | MUST be omitted when no scheduled workflows are present. |
+| Title line | MUST render the heading `Schedule Heatmap (UTC)`. |
+| Hour header | MUST contain 24 UTC hour labels from `00` through `23`, in ascending order. |
+| Day rows | MUST render exactly seven rows in `Mon`, `Tue`, `Wed`, `Thu`, `Fri`, `Sat`, `Sun` order. |
+| Cells | MUST render one glyph per hour slot using the implementation's intensity mapping (`·`, `░`, `▒`, `▓`, `█`). |
+| Legend | MUST explain the trigger-count buckets for each glyph after the grid. |
+| File output | MUST NOT create a separate file; the calendar is an inline stderr rendering only. |
+
+Implementations SHOULD preserve a fixed-width grid so adjacent cells remain visually aligned in
+plain-text terminals. ANSI styling MAY be applied when stderr is a terminal, but the unstyled text
+content MUST preserve the same row/column structure.
+
+### Version 1.2.0 (Draft) — 2026-05-12
+
+- **Changed**: Daily, weekly, bi-weekly, and tri-weekly scattering now share the weighted 622-slot
+  pool introduced in Sections 6.3.1 and 6.3.5–6.3.6.
+- **Added**: Peak-minute avoidance rules in Section 6.4 to steer schedules away from `:00`, `:15`,
+  `:30`, and `:45` hotspot minutes during documented peak windows.
+- **Added**: Calendar output schema requirements (Section 12) for the compile-time heatmap rendered
+  by `compile_schedule_calendar.go`.
+
+---
+
 ## References
 
 ### Normative References
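
Review note on the new §12 table in the fuzzy-schedule hunk above: the layout requirements (title line, 24-hour header, seven day rows, one glyph per slot, legend after the grid) can be sketched in Go as shown below. This is not the code in `pkg/cli/compile_schedule_calendar.go`. The `RenderHeatmap`, `glyphFor`, and `renderToString` names, the bucket thresholds, and the legend wording are all assumptions. Using a zero total trigger count as the signal that no scheduled workflows are present is also an assumption.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

var dayNames = [7]string{"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}

// glyphFor maps a trigger count to an intensity glyph.
// The thresholds are assumed; the spec leaves the mapping to the implementation.
func glyphFor(count int) string {
	switch {
	case count <= 0:
		return "·"
	case count == 1:
		return "░"
	case count <= 3:
		return "▒"
	case count <= 6:
		return "▓"
	default:
		return "█"
	}
}

// RenderHeatmap writes the grid to w and reports whether anything was emitted.
func RenderHeatmap(w io.Writer, counts [7][24]int) bool {
	total := 0
	for _, day := range counts {
		for _, c := range day {
			total += c
		}
	}
	if total == 0 {
		return false // nothing scheduled: emit no calendar at all
	}
	fmt.Fprintln(w, "Schedule Heatmap (UTC)")
	var hdr strings.Builder
	hdr.WriteString("    ")
	for h := 0; h < 24; h++ {
		fmt.Fprintf(&hdr, "%02d ", h)
	}
	fmt.Fprintln(w, strings.TrimRight(hdr.String(), " "))
	for d, name := range dayNames {
		var row strings.Builder
		row.WriteString(name + " ")
		for h := 0; h < 24; h++ {
			// Three columns per cell so glyphs line up under the hour labels.
			row.WriteString(glyphFor(counts[d][h]) + "  ")
		}
		fmt.Fprintln(w, strings.TrimRight(row.String(), " "))
	}
	fmt.Fprintln(w, "Legend: · 0  ░ 1  ▒ 2-3  ▓ 4-6  █ 7+")
	return true
}

// renderToString is a small helper for inspecting the output in tests.
func renderToString(counts [7][24]int) string {
	var sb strings.Builder
	RenderHeatmap(&sb, counts)
	return sb.String()
}

func main() {
	var counts [7][24]int
	counts[0][9] = 2  // Monday 09:00 UTC
	counts[2][14] = 8 // Wednesday 14:00 UTC
	RenderHeatmap(os.Stderr, counts)
}
```

Writing to an `io.Writer` rather than directly to `os.Stderr` keeps the stderr-only rule a caller decision and lets a test capture the unstyled text.
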