SPDD: close safeguards/consistency gaps across 5 specs and add targeted compliance vectors#33365
Merged
Merged
Conversation
17 tasks
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/61093a3c-8544-44a5-ac04-a50b0d10418b Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Fix gaps in specifications for SPDD completeness
SPDD: close safeguards/consistency gaps across 5 specs and add targeted compliance vectors
May 19, 2026
Contributor
There was a problem hiding this comment.
Pull request overview
This PR closes daily SPDD findings across five specifications (Effective Tokens, Forecast, Frontmatter Hash, Fuzzy Schedule, MCP Scripts) by adding missing safeguards/threat-model sections, normalizing retry/error/lower-bound semantics, and adding three targeted compliance tests (T-ET-032, FH-TV-004, T-MCP-050).
Changes:
- Spec edits add new normative requirements (R-DISC-014, R-OUT-010 lower-bound disclosure, R-HASH-001, R-ERR-050..052) and safeguards/threat-model appendices for Forecast and MCP Scripts.
- Adds compliance test IDs T-ET-032 (deep-graph post-order), FH-TV-004 (diamond-import hash vector), and T-MCP-050 (Go sandbox network isolation), plus checklist/count updates.
- Renumbers Forecast
R-OUT-010..012→R-OUT-011..013and reusesR-OUT-010for a different (lower-bound disclosure) requirement.
Show a summary per file
| File | Description |
|---|---|
| docs/src/content/docs/reference/effective-tokens-specification.md | Adds deep-graph ordering constraints, partial-observability example (A.5), and T-ET-032 compliance entry. |
| docs/src/content/docs/reference/forecast-specification.md | Adds R-DISC-014 race-condition norm, lower-bound JSON field, renumbers R-OUT-*, and adds Appendix E/F (race conditions, safeguards). |
| docs/src/content/docs/reference/frontmatter-hash-specification.md | Marks 2026-05-08 text-based resolution as authoritative, adds BFS diamond-import tie-breaking note, adds FH-TV-004 vector. |
| docs/src/content/docs/reference/fuzzy-schedule-specification.md | Adds DST behavior guidance, R-HASH-001 hash stability across minor versions, and §9.6 bounded retry/backoff norms. |
| docs/src/content/docs/reference/mcp-scripts-specification.md | Clarifies dependency failure semantics, adds JS global side-effect constraints, adds Appendix D safeguards and T-MCP-050. |
| pkg/cli/effective_tokens_compliance_test.go | Adds T-ET-032 test (currently exercises a test-local walker, not production aggregation code). |
| pkg/parser/frontmatter_hash_cross_language_test.go | Adds FH-TV-004 test with hardcoded expected hash for the agent-import diamond scenario. |
| pkg/workflow/mcp_scripts_firewall_test.go | Adds T-MCP-050 test asserting no wildcard/0.0.0.0/0 in computed allowed-domain string. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 8/8 changed files
- Comments generated: 3
Comment on lines
+258
to
+309
| type testNode struct { | ||
| id string | ||
| localET float64 | ||
| observed bool | ||
| children []*testNode | ||
| fallback float64 | ||
| } | ||
|
|
||
| // root | ||
| // ├─ planner | ||
| // │ ├─ retrieval | ||
| // │ │ └─ shard-1 | ||
| // │ └─ shard-2 (unobservable fallback) | ||
| // └─ synthesis | ||
| shard1 := &testNode{id: "shard-1", localET: 60, observed: true} | ||
| retrieval := &testNode{id: "retrieval", localET: 120, observed: true, children: []*testNode{shard1}} | ||
| shard2 := &testNode{id: "shard-2", observed: false, fallback: 25} | ||
| planner := &testNode{id: "planner", localET: 30, observed: true, children: []*testNode{retrieval, shard2}} | ||
| synthesis := &testNode{id: "synthesis", localET: 40, observed: true} | ||
| root := &testNode{id: "root", localET: 10, observed: true, children: []*testNode{planner, synthesis}} | ||
|
|
||
| var order []string | ||
| var subtotals []float64 | ||
| running := 0.0 | ||
|
|
||
| var walk func(n *testNode) | ||
| walk = func(n *testNode) { | ||
| for _, child := range n.children { | ||
| walk(child) | ||
| } | ||
|
|
||
| value := n.localET | ||
| if !n.observed { | ||
| value = n.fallback | ||
| } | ||
|
|
||
| running += value | ||
| order = append(order, n.id) | ||
| subtotals = append(subtotals, running) | ||
| } | ||
|
|
||
| walk(root) | ||
|
|
||
| assert.Equal(t, | ||
| []string{"shard-1", "retrieval", "shard-2", "planner", "synthesis", "root"}, | ||
| order, | ||
| "T-ET-032: aggregation order must be stable post-order for deep graphs") | ||
|
|
||
| assert.Equal(t, | ||
| []float64{60, 180, 205, 235, 275, 285}, | ||
| subtotals, | ||
| "T-ET-032: partial subtotals must remain deterministic under partial observability") |
Comment on lines
+737
to
+743
| - **R-OUT-010**: JSON output MUST disclose episode lower-bound semantics by including | ||
| `episode_analysis.episode_count_is_lower_bound` for every workflow. Console output SHOULD include | ||
| a note when this field is `true`. | ||
| - **R-OUT-011**: In both console and JSON output, workflows MUST be ordered by | ||
| `projected_effective_tokens` (P50 value) in descending order. | ||
| - **R-OUT-012**: Workflows with zero projected tokens MUST appear after all workflows with non-zero projections. | ||
| - **R-OUT-013**: Among workflows with equal projected tokens, the ordering SHOULD be deterministic (e.g., alphabetical by workflow ID). |
| | Field | Type | Required | Description | | ||
| |---|---|---|---| | ||
| | `sampled_episodes` | integer | MUST | Distinct episode count. Lower-bound estimate when artifact linkage is unavailable. | | ||
| | `episode_count_is_lower_bound` | boolean | MUST | `true` when episode linkage data is incomplete (for example, remote mode without artifacts); otherwise `false`. | |
Collaborator
|
@copilot review all comments and reviews |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…rement IDs Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor
Author
Done. I reviewed the feedback and pushed follow-up fixes:
Targeted validation run: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR addresses the daily SPDD findings across Effective Tokens, Forecast, Frontmatter Hash, Fuzzy Schedule, and MCP Scripts specifications. It closes missing safeguards/threat-model coverage, normalizes key norms (retry/error/lower-bound semantics), and adds missing compliance vectors/tests referenced by the specs.
Effective Tokens spec: deep-graph ordering + partial observability
T-ET-032and aligned checklist/counts.T-ET-032inpkg/cli/effective_tokens_compliance_test.go.Forecast spec: safeguards format + discovery race handling + lower-bound disclosure
R-DISC-014) and mitigation guidance.episode_count_is_lower_bound(normative disclosure).R-OUT-010is explicit lower-bound disclosure and shifted ordering requirements toR-OUT-011..013.Frontmatter Hash spec: sync note authority + diamond-import tie-breaking + new vector
FH-TV-004(agent import / diamond path scenario) and expected hash.pkg/parser/frontmatter_hash_cross_language_test.go.Fuzzy Schedule spec: retry/backoff norms + DST behavior + minor-version hash stability
R-HASH-001requiring hash-derived scatter stability across minor version upgrades.MCP Scripts spec: safeguards completion + JS global side-effect constraints + dependency failure semantics
core.setFailed()from tool code).T-MCP-050(Go sandbox network isolation) and checklist entry.pkg/workflow/mcp_scripts_firewall_test.go.Example of newly normalized JSON disclosure in Forecast episode analysis: