SPDD: close safeguards/consistency gaps across 5 specs and add targeted compliance vectors by Copilot · Pull Request #33365 · github/gh-aw

Copilot · 2026-05-19T16:56:46Z

This PR addresses the daily SPDD findings across the Effective Tokens, Forecast, Frontmatter Hash, Fuzzy Schedule, and MCP Scripts specifications. It fills gaps in safeguards and threat-model coverage, normalizes retry, error, and lower-bound semantics, and adds the compliance vectors and tests that the specs reference but did not yet have.

Effective Tokens spec: deep-graph ordering + partial observability
- Added explicit implementation ordering constraints for multi-invocation aggregation in deep graphs.
- Added concrete partial-observability examples (including deterministic subtotal sequencing).
- Added new compliance requirement T-ET-032 and aligned checklist/counts.
- Added test coverage for T-ET-032 in pkg/cli/effective_tokens_compliance_test.go.
Forecast spec: safeguards format + discovery race handling + lower-bound disclosure
- Added remote discovery race-condition requirement (R-DISC-014) and mitigation guidance.
- Extended episode-analysis JSON schema with episode_count_is_lower_bound (normative disclosure).
- Reframed output-ordering norms so R-OUT-010 is explicit lower-bound disclosure and shifted ordering requirements to R-OUT-011..013.
- Reworked appendices to include:
  - Appendix E: workflow discovery race conditions
  - Appendix F: safeguards (threat model, mitigations, residual risk)
Frontmatter Hash spec: sync note authority + diamond-import tie-breaking + new vector
- Updated sync notes to explicitly mark the 2026-05-08 text-based resolution as authoritative, with 2026-05-12 as verification.
- Added deterministic BFS diamond-import tie-breaking requirement to the sync checklist.
- Added FH-TV-004 (agent import / diamond path scenario) and expected hash.
- Added matching test vector in pkg/parser/frontmatter_hash_cross_language_test.go.
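As an illustration of what deterministic BFS tie-breaking over a diamond import could look like, here is a minimal sketch. It does not reproduce the spec's actual rule or the repository's parser: the `bfsImportOrder` helper, the map-based graph, the file names, and the choice of lexicographic sibling ordering with first-visit-wins are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// bfsImportOrder returns a deterministic visit order for an import graph.
// Hypothetical helper: siblings are sorted lexicographically, and a node
// reachable through several paths (a "diamond") is visited only once, at
// its first occurrence in breadth-first order.
func bfsImportOrder(root string, imports map[string][]string) []string {
	visited := map[string]bool{root: true}
	queue := []string{root}
	var order []string

	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		order = append(order, node)

		// Copy before sorting so the caller's slices are not mutated.
		children := append([]string(nil), imports[node]...)
		sort.Strings(children)
		for _, child := range children {
			if !visited[child] {
				visited[child] = true
				queue = append(queue, child)
			}
		}
	}
	return order
}

func main() {
	// main.md imports a.md and b.md; both import shared.md (the diamond).
	graph := map[string][]string{
		"main.md": {"b.md", "a.md"},
		"a.md":    {"shared.md"},
		"b.md":    {"shared.md"},
	}
	fmt.Println(bfsImportOrder("main.md", graph))
}
```

Because siblings are sorted before being enqueued, the declaration order of imports in the source file does not affect the traversal, which is the property a stable hash input needs.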
Fuzzy Schedule spec: retry/backoff norms + DST behavior + minor-version hash stability
- Added DST-transition behavior guidance for abbreviation-based schedules.
- Added R-HASH-001 requiring hash-derived scatter stability across minor version upgrades.
- Added Section 9.6 with bounded retry/backoff norms for collision/contention paths.
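The idea behind hash-derived scatter can be sketched as follows. This is only an illustration under stated assumptions: the PR does not quote the hash function or the scatter range, so FNV-1a, the 60-minute window, and the `scatterMinute` name are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// scatterMinute maps a workflow identifier to a minute offset in [0, 60).
// Assumption: FNV-1a stands in for whatever hash the spec mandates. The
// point is that the offset depends only on the identifier, so it stays
// the same as long as the hash function and the range are unchanged.
func scatterMinute(workflowID string) int {
	h := fnv.New32a()
	h.Write([]byte(workflowID))
	return int(h.Sum32() % 60)
}

func main() {
	first := scatterMinute("daily-report.md")
	second := scatterMinute("daily-report.md")
	fmt.Println(first == second) // same input, same offset
}
```

Under this reading, a stability requirement such as R-HASH-001 amounts to freezing the hash function, its input encoding, and the modulus within a major version.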
MCP Scripts spec: safeguards completion + JS global side-effect constraints + dependency failure semantics
- Clarified dependency failure behavior: fail-fast for deterministic failures; bounded retry only for transient failures.
- Added explicit side-effect constraint for JS globals (no workflow-control mutators like core.setFailed() from tool code).
- Added safeguards appendix with threat model/mitigations/residual risk.
- Added compliance ID T-MCP-050 (Go sandbox network isolation) and checklist entry.
- Added corresponding targeted test in pkg/workflow/mcp_scripts_firewall_test.go.

Example of newly normalized JSON disclosure in Forecast episode analysis:

"episode_analysis": {
  "sampled_episodes": 40,
  "episode_count_is_lower_bound": true,
  "runs_per_episode": 1.05,
  "avg_effective_tokens_per_episode": 13100,
  "observed_episodes_per_period": 36.7
}
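A consumer of this fragment might decode it and surface the lower-bound flag roughly as follows. The JSON keys come from the example above; the Go type, the function names, and the wording of the console note are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EpisodeAnalysis mirrors the keys of the JSON fragment above.
// The Go identifiers are illustrative, not taken from the repository.
type EpisodeAnalysis struct {
	SampledEpisodes              int     `json:"sampled_episodes"`
	EpisodeCountIsLowerBound     bool    `json:"episode_count_is_lower_bound"`
	RunsPerEpisode               float64 `json:"runs_per_episode"`
	AvgEffectiveTokensPerEpisode float64 `json:"avg_effective_tokens_per_episode"`
	ObservedEpisodesPerPeriod    float64 `json:"observed_episodes_per_period"`
}

// parseEpisodeAnalysis decodes a single episode_analysis object.
func parseEpisodeAnalysis(data []byte) (EpisodeAnalysis, error) {
	var e EpisodeAnalysis
	err := json.Unmarshal(data, &e)
	return e, err
}

// lowerBoundNote renders the episode count, flagging it when it is only
// a lower bound.
func lowerBoundNote(e EpisodeAnalysis) string {
	if e.EpisodeCountIsLowerBound {
		return fmt.Sprintf("at least %d episodes (lower bound)", e.SampledEpisodes)
	}
	return fmt.Sprintf("%d episodes", e.SampledEpisodes)
}

func main() {
	raw := []byte(`{
		"sampled_episodes": 40,
		"episode_count_is_lower_bound": true,
		"runs_per_episode": 1.05,
		"avg_effective_tokens_per_episode": 13100,
		"observed_episodes_per_period": 36.7
	}`)
	e, err := parseEpisodeAnalysis(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(lowerBoundNote(e))
}
```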

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/61093a3c-8544-44a5-ac04-a50b0d10418b
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

Copilot

Pull request overview

This PR closes daily SPDD findings across five specifications (Effective Tokens, Forecast, Frontmatter Hash, Fuzzy Schedule, MCP Scripts) by adding missing safeguards/threat-model sections, normalizing retry/error/lower-bound semantics, and adding three targeted compliance tests (T-ET-032, FH-TV-004, T-MCP-050).

Changes:

- Spec edits add new normative requirements (R-DISC-014, R-OUT-010 lower-bound disclosure, R-HASH-001, R-ERR-050..052) and safeguards/threat-model appendices for Forecast and MCP Scripts.
- Adds compliance test IDs T-ET-032 (deep-graph post-order), FH-TV-004 (diamond-import hash vector), and T-MCP-050 (Go sandbox network isolation), plus checklist/count updates.
- Renumbers Forecast R-OUT-010..012 → R-OUT-011..013 and reuses R-OUT-010 for a different (lower-bound disclosure) requirement.

Show a summary per file

| File | Description |
|---|---|
| docs/src/content/docs/reference/effective-tokens-specification.md | Adds deep-graph ordering constraints, partial-observability example (A.5), and T-ET-032 compliance entry. |
| docs/src/content/docs/reference/forecast-specification.md | Adds R-DISC-014 race-condition norm, lower-bound JSON field, renumbers R-OUT-*, and adds Appendix E/F (race conditions, safeguards). |
| docs/src/content/docs/reference/frontmatter-hash-specification.md | Marks 2026-05-08 text-based resolution as authoritative, adds BFS diamond-import tie-breaking note, adds FH-TV-004 vector. |
| docs/src/content/docs/reference/fuzzy-schedule-specification.md | Adds DST behavior guidance, R-HASH-001 hash stability across minor versions, and §9.6 bounded retry/backoff norms. |
| docs/src/content/docs/reference/mcp-scripts-specification.md | Clarifies dependency failure semantics, adds JS global side-effect constraints, adds Appendix D safeguards and T-MCP-050. |
| pkg/cli/effective_tokens_compliance_test.go | Adds T-ET-032 test (currently exercises a test-local walker, not production aggregation code). |
| pkg/parser/frontmatter_hash_cross_language_test.go | Adds FH-TV-004 test with hardcoded expected hash for the agent-import diamond scenario. |
| pkg/workflow/mcp_scripts_firewall_test.go | Adds T-MCP-050 test asserting no wildcard/`0.0.0.0/0` in computed allowed-domain string. |
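The kind of assertion described for T-MCP-050 can be sketched as a small predicate. This is not the repository's test: the comma-separated format of the allowed-domain string and the `hasWildcardEgress` name are assumptions, and the sketch flags only a bare `*` entry or the `0.0.0.0/0` CIDR.

```go
package main

import (
	"fmt"
	"strings"
)

// hasWildcardEgress reports whether an allowed-domain string contains an
// entry that would effectively allow all outbound traffic.
// Assumption: entries are comma-separated; only a bare "*" and the
// "0.0.0.0/0" CIDR are treated as allow-all.
func hasWildcardEgress(allowed string) bool {
	for _, entry := range strings.Split(allowed, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "*" || entry == "0.0.0.0/0" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasWildcardEgress("github.com, api.github.com"))
	fmt.Println(hasWildcardEgress("github.com, 0.0.0.0/0"))
}
```

A stricter variant could also reject subdomain patterns such as `*.example.com`; whether that is appropriate depends on what the firewall configuration legitimately permits.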

Copilot's findings

Files reviewed: 8/8 changed files
Comments generated: 3

+	type testNode struct {
+		id       string
+		localET  float64
+		observed bool
+		children []*testNode
+		fallback float64
+	}
+
+	// root
+	// ├─ planner
+	// │  ├─ retrieval
+	// │  │  └─ shard-1
+	// │  └─ shard-2 (unobservable fallback)
+	// └─ synthesis
+	shard1 := &testNode{id: "shard-1", localET: 60, observed: true}
+	retrieval := &testNode{id: "retrieval", localET: 120, observed: true, children: []*testNode{shard1}}
+	shard2 := &testNode{id: "shard-2", observed: false, fallback: 25}
+	planner := &testNode{id: "planner", localET: 30, observed: true, children: []*testNode{retrieval, shard2}}
+	synthesis := &testNode{id: "synthesis", localET: 40, observed: true}
+	root := &testNode{id: "root", localET: 10, observed: true, children: []*testNode{planner, synthesis}}
+
+	var order []string
+	var subtotals []float64
+	running := 0.0
+
+	var walk func(n *testNode)
+	walk = func(n *testNode) {
+		for _, child := range n.children {
+			walk(child)
+		}
+
+		value := n.localET
+		if !n.observed {
+			value = n.fallback
+		}
+
+		running += value
+		order = append(order, n.id)
+		subtotals = append(subtotals, running)
+	}
+
+	walk(root)
+
+	assert.Equal(t,
+		[]string{"shard-1", "retrieval", "shard-2", "planner", "synthesis", "root"},
+		order,
+		"T-ET-032: aggregation order must be stable post-order for deep graphs")
+
+	assert.Equal(t,
+		[]float64{60, 180, 205, 235, 275, 285},
+		subtotals,
+		"T-ET-032: partial subtotals must remain deterministic under partial observability")


+- **R-OUT-010**: JSON output MUST disclose episode lower-bound semantics by including
+  `episode_analysis.episode_count_is_lower_bound` for every workflow. Console output SHOULD include
+  a note when this field is `true`.
+- **R-OUT-011**: In both console and JSON output, workflows MUST be ordered by
+  `projected_effective_tokens` (P50 value) in descending order.
+- **R-OUT-012**: Workflows with zero projected tokens MUST appear after all workflows with non-zero projections.
+- **R-OUT-013**: Among workflows with equal projected tokens, the ordering SHOULD be deterministic (e.g., alphabetical by workflow ID).
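The ordering rules quoted above (descending projected tokens, zero projections last, deterministic tie-break) can be expressed as a single comparator. The `forecast` type and `sortForecasts` function are hypothetical, and the sketch assumes projections are never negative, so that descending order already places zero projections last.

```go
package main

import (
	"fmt"
	"sort"
)

// forecast is an illustrative record; field names are not from the repository.
type forecast struct {
	WorkflowID               string
	ProjectedEffectiveTokens float64 // P50 projection, assumed >= 0
}

// sortForecasts orders workflows by projected tokens, highest first, and
// breaks ties alphabetically by workflow ID. With non-negative values,
// zero projections naturally sort after every non-zero projection.
func sortForecasts(fs []forecast) {
	sort.SliceStable(fs, func(i, j int) bool {
		a, b := fs[i], fs[j]
		if a.ProjectedEffectiveTokens != b.ProjectedEffectiveTokens {
			return a.ProjectedEffectiveTokens > b.ProjectedEffectiveTokens
		}
		return a.WorkflowID < b.WorkflowID
	})
}

func main() {
	fs := []forecast{
		{"nightly-triage", 0},
		{"weekly-digest", 1200},
		{"daily-report", 1200},
		{"release-notes", 5000},
	}
	sortForecasts(fs)
	for _, f := range fs {
		fmt.Println(f.WorkflowID, f.ProjectedEffectiveTokens)
	}
}
```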


 | Field | Type | Required | Description |
 |---|---|---|---|
 | `sampled_episodes` | integer | MUST | Distinct episode count. Lower-bound estimate when artifact linkage is unavailable. |
+| `episode_count_is_lower_bound` | boolean | MUST | `true` when episode linkage data is incomplete (for example, remote mode without artifacts); otherwise `false`. |


pelikhan · 2026-05-19T19:28:03Z

@copilot review all comments and reviews

Copilot · 2026-05-19T19:38:04Z

@copilot review all comments and reviews

Done. I reviewed the feedback and pushed follow-up fixes:

9019554: updated T-ET-032 to exercise production ET aggregation code, restored Forecast requirement ID compatibility (R-OUT-010..012 unchanged), and made episode_count_is_lower_bound migration-safe (SHOULD) with changelog clarification.
28e64f1: added clarification in T-ET-032 about why multipliers are fixed at 1.0 for deterministic-order isolation.
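One way a consumer could tolerate the field being absent, which is what a SHOULD-level requirement implies for older producers, is to decode it into a pointer. This is a sketch under assumptions: the `episodeAnalysis` type, the `lowerBoundState` helper, and the three state labels are hypothetical and not taken from the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// episodeAnalysis decodes the flag into a *bool so that "absent" can be
// told apart from an explicit false. Names are illustrative.
type episodeAnalysis struct {
	SampledEpisodes          int   `json:"sampled_episodes"`
	EpisodeCountIsLowerBound *bool `json:"episode_count_is_lower_bound,omitempty"`
}

// lowerBoundState classifies a payload as "unknown" (field missing, for
// example from an older producer), "lower-bound", or "exact".
func lowerBoundState(data []byte) (string, error) {
	var e episodeAnalysis
	if err := json.Unmarshal(data, &e); err != nil {
		return "", err
	}
	switch {
	case e.EpisodeCountIsLowerBound == nil:
		return "unknown", nil
	case *e.EpisodeCountIsLowerBound:
		return "lower-bound", nil
	default:
		return "exact", nil
	}
}

func main() {
	for _, raw := range []string{
		`{"sampled_episodes": 40}`,
		`{"sampled_episodes": 40, "episode_count_is_lower_bound": true}`,
		`{"sampled_episodes": 40, "episode_count_is_lower_bound": false}`,
	} {
		state, err := lowerBoundState([]byte(raw))
		fmt.Println(state, err)
	}
}
```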

Targeted validation run: go test -run TestETCompliance_T_ET_032_DeepGraphPostOrderAggregation ./pkg/cli/ passed.

Initial plan

07b31c4

Copilot AI assigned Copilot and gh-aw-bot May 19, 2026

Copilot started work on behalf of gh-aw-bot May 19, 2026 16:56 View session

Copilot AI linked an issue May 19, 2026 that may be closed by this pull request

[spdd] Daily spec work plan - 2026-05-19 #33364

Closed

17 tasks

docs: complete daily SPDD spec updates and targeted compliance tests

ca1e456

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/61093a3c-8544-44a5-ac04-a50b0d10418b
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix gaps in specifications for SPDD completeness~~ SPDD: close safeguards/consistency gaps across 5 specs and add targeted compliance vectors May 19, 2026

Copilot finished work on behalf of gh-aw-bot May 19, 2026 17:14

Copilot AI requested a review from gh-aw-bot May 19, 2026 17:14

pelikhan marked this pull request as ready for review May 19, 2026 18:33

Copilot AI review requested due to automatic review settings May 19, 2026 18:33

Copilot started reviewing on behalf of pelikhan May 19, 2026 18:34 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

Copilot started work on behalf of pelikhan May 19, 2026 19:28 View session

Copilot AI and others added 3 commits May 19, 2026 19:32

chore: start review-feedback follow-up

289c37c

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

fix: address review feedback on ET compliance test and forecast requi…

9019554

…rement IDs Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

chore: clarify deterministic multiplier setup in T-ET-032

28e64f1

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 19, 2026 19:38

Copilot AI requested a review from pelikhan May 19, 2026 19:38

Merge branch 'main' into copilot/spdd-daily-spec-work-plan-2026-05-19

dfaa0e6

pelikhan merged commit 2e27970 into main May 19, 2026
24 of 25 checks passed

pelikhan deleted the copilot/spdd-daily-spec-work-plan-2026-05-19 branch May 19, 2026 19:47

pelikhan merged 6 commits into main from copilot/spdd-daily-spec-work-plan-2026-05-19


4 participants
