SPDD 2026-05-28: P0/P1/P2 spec work — implement SM-IS-01 10 KB limit (all transports), fix R-IMPL-040, add norms/vectors/safeguards#35528
Conversation
…c notes Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
…, Go go.mod requirements Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
…hen effective-tokens §8.5 Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
…lidateStringInputLengths Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR advances SPDD sync work by aligning selected implementation behavior with spec requirements and adding/clarifying normative specification content across forecast, MCP scripts, frontmatter hash, fuzzy schedule, and effective tokens.
Changes:
- Suppresses the forecast experimental warning for JSON output and adds a regression test.
- Adds MCP scripts per-string input length validation for the HTTP server path with unit tests.
- Expands several reference specifications with promotion tracking, safeguards, precision rules, test vectors, norms, and Go module requirements.
Show a summary per file
| File | Description |
|---|---|
pkg/cli/forecast.go |
Gates the experimental warning on non-JSON output. |
pkg/cli/forecast_test.go |
Adds stderr-capture coverage for R-IMPL-040 warning behavior. |
actions/setup/js/mcp_scripts_validation.cjs |
Adds string input byte-limit validation helper and constant. |
actions/setup/js/mcp_scripts_validation.test.cjs |
Adds unit tests for string length validation. |
actions/setup/js/mcp_scripts_mcp_server_http.cjs |
Applies string length validation before HTTP tool handler invocation. |
docs/src/content/docs/reference/mcp-scripts-specification.md |
Updates SM-IS-01 status and adds Go module dependency requirements. |
docs/src/content/docs/reference/fuzzy-schedule-specification.md |
Adds calendar schema subsections and new norms. |
docs/src/content/docs/reference/frontmatter-hash-specification.md |
Adds BFS import test vectors and S-6 sync verification. |
docs/src/content/docs/reference/forecast-specification.md |
Adds promotion tracking and R-IMPL-040 sync note. |
docs/src/content/docs/reference/effective-tokens-specification.md |
Adds precision/rounding rules and expanded safeguards. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 10/10 changed files
- Comments generated: 8
| // SM-IS-01: Validate per-string input length limits (10 KB max per string parameter). | ||
| const oversized = validateStringInputLengths(args, tool.inputSchema); | ||
| if (oversized.length) { | ||
| const details = oversized.map(v => `'${v.field}' (${v.byteLength} bytes)`).join(", "); | ||
| throw new Error(`Input string parameter(s) exceed the 10 KB limit for tool '${tool.name}': ${details}`); |
| **Sync note (2026-05-28)**: SM-IS-01 enforcement is now implemented. `validateStringInputLengths` | ||
| has been added to `actions/setup/js/mcp_scripts_validation.cjs` and is called by the HTTP server | ||
| in `actions/setup/js/mcp_scripts_mcp_server_http.cjs` immediately after required-field validation. | ||
| Any string-typed input parameter whose UTF-8 byte length exceeds `MAX_STRING_INPUT_BYTES` (10 KB = | ||
| 10,240 bytes) causes the tool invocation to be rejected with a descriptive error before the handler | ||
| is called. Implementations MUST NOT silently truncate oversized inputs. Verified by unit tests in |
| **Reference implementation behavior**: In the Go reference implementation | ||
| (`pkg/parser/effective_tokens.go`), `derived.effective_tokens` is computed as | ||
| `int(math.Floor(base_weighted_tokens * model_multiplier))`. The `math.Floor` call implements | ||
| truncation for non-negative values; the result is cast to `int` and stored in a 64-bit integer |
| top-level fields when processing empty telemetry. Callers MAY emit a non-fatal warning to indicate | ||
| that no usage data was available, but MUST NOT treat an empty graph as an error. | ||
|
|
||
| Normative requirements: **R-SAFE-005** ET aggregation logic **MUST** detect overflow and non-finite arithmetic states |
| Go and JavaScript implementations. These vectors exercise the three required corpus cases from | ||
| **R-XLANG-001**: empty frontmatter, single-level import, and the diamond-import pattern. | ||
|
|
||
| All hashes are verified in CI via `pkg/parser/frontmatter_hash_cross_language_test.go`. |
| - Special characters and escaping | ||
| - All workflows in the repository | ||
|
|
||
| ### 5.2 BFS Diamond-Import Test Vectors |
|
|
||
| Additional imports MAY be added by the user in their code. | ||
|
|
||
| #### 6.4.5 Module and Dependency Requirements |
|
|
||
| When sub-agents are not fully observable, implementations MUST still report aggregate totals. Invocation nodes with incomplete data SHOULD be flagged to indicate missing information. | ||
|
|
||
| ### 8.6 Precision and Rounding |
|
@copilot review all comments |
|
🧪 Test Quality Sentinel completed test quality analysis. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #35528 does not have the 'implementation' label and has only 37 new lines of code in business logic directories (below the 100-line threshold). |
|
✅ PR Code Quality Reviewer completed the code quality review. |
🧪 Test Quality Sentinel Report✅ Test Quality Score: 87/100 — Excellent
📊 Metrics & Test Classification (10 tests analyzed)
Test Classification Details
Language SupportTests analyzed:
Verdict
Score breakdown: Behavioral coverage 40/40 · Edge case coverage 27/30 · Duplication 20/20 · Proportional growth 0/10 (both test files exceed the 2:1 inflation ratio, which is expected given that boundary/UTF-8 tests are inherently verbose relative to minimal production changes). 📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators. References: §26592668846
|
There was a problem hiding this comment.
🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.4M
Comments that could not be inline-anchored
pkg/cli/forecast_test.go:756
captureStderr captures a closed *os.File as orig on its second call, leaving os.Stderr in a broken state between test cleanups.
<details>
<summary>💡 Details and suggested fix</summary>
captureStderr registers t.Cleanup(func() { os.Stderr = orig }) but then immediately closes w (the write-end pipe it just assigned to os.Stderr) before returning. t.Cleanup callbacks run after the test body finishes, not after each call to captureStderr.
On the second call:
orig…
</details>
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /grill-with-docs, /tdd, and /diagnose — requesting changes for two spec correctness bugs and three code-quality issues.
📋 Key Themes & Highlights
Blocking Issues
- §8.6 section ordering (
effective-tokens-specification.md): §8.6 Precision and Rounding is inserted before §8.5 Safeguards in the file, causing mis-ordered rendering and broken internal narrative. - Malformed R-SAFE-001 label (
effective-tokens-specification.md): The S-6c normative-requirements footer was accidentally merged with the opening line of the existing**R-SAFE-001**paragraph, erasing the requirement identifier.
Non-Blocking Suggestions
argsnull guard (mcp_scripts_validation.cjs):validateStringInputLengthswill throw ifargsisnull/undefined; the schema null-path is tested but the args null-path is not.await import()inconsistency (mcp_scripts_validation.test.cjs): All 10 new tests use dynamicimport()where the file otherwise usesrequire. Easy to align.os.Stderrglobal mutation (forecast_test.go): The test replaces the process-globalos.Stderr; consider threading aStderr io.WriterthroughForecastConfiginstead.
Positive Highlights
- ✅ SM-IS-01 implementation is clean, well-scoped to top-level string properties, and correctly uses
Buffer.byteLengthfor UTF-8 byte counting. - ✅ Excellent test coverage for the boundary, multi-byte, custom-limit, and missing-value paths.
- ✅ R-IMPL-040 fix is minimal and correctly placed; the test directly verifies both the suppressed and non-suppressed paths.
- ✅ BFS diamond-import test vectors in
frontmatter-hash-specification.mdare thorough and include verified SHA-256 hashes — a strong cross-language conformance asset.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 2.1M
|
|
||
| ### 8.4 Partial Visibility | ||
|
|
||
| When sub-agents are not fully observable, implementations MUST still report aggregate totals. Invocation nodes with incomplete data SHOULD be flagged to indicate missing information. |
There was a problem hiding this comment.
[/grill-with-docs] §8.6 is numbered out of order — it appears before §8.5 in the file, so the published docs will render §8.6 (Precision and Rounding) before §8.5 (Safeguards), breaking the forward-reference from S-6 back to §8.5.
💡 Fix
Move the entire §8.6 block so it appears after §8.5 Safeguards:
### 8.5 Safeguards
...(existing content + new S-6a/S-6b/S-6c sub-sections)...
### 8.6 Precision and Rounding
Alternatively, renumber the new section to §8.6 only after §8.5's last paragraph.
| Normative requirements: **R-SAFE-005** ET aggregation logic **MUST** detect overflow and non-finite arithmetic states | ||
| (`NaN`, `+Inf`, `-Inf`) before serializing output. | ||
|
|
||
| **R-SAFE-002**: Implementations **MUST** enforce a maximum ET ceiling of |
There was a problem hiding this comment.
[/grill-with-docs] Malformed normative reference: the S-6c closing line and the pre-existing **R-SAFE-001** paragraph appear to have been accidentally merged into a single line, destroying the **R-SAFE-001**: label.
💡 What happened and how to fix it
The diff shows:
-**R-SAFE-001**: ET aggregation logic **MUST** detect overflow...
+Normative requirements: **R-SAFE-005** ET aggregation logic **MUST** detect overflow...
The intended result should be two separate lines:
Normative requirements: **R-SAFE-005**
**R-SAFE-001**: ET aggregation logic **MUST** detect overflow and non-finite arithmetic statesAs written, R-SAFE-001 loses its identifier label, which breaks every spec cross-reference that cites R-SAFE-001.
| if (typeof value === "string") { | ||
| const byteLength = Buffer.byteLength(value, "utf8"); | ||
| if (byteLength > limit) { | ||
| violations.push({ field, byteLength }); |
There was a problem hiding this comment.
[/tdd] args is not guarded against null/undefined — args[field] will throw a TypeError if a caller passes null as the first argument.
💡 Suggested fix and missing test
The schema null-guard is tested, but there is no parallel test for a null args. Add a guard:
function validateStringInputLengths(args, inputSchema, maxBytes) {
if (!args || typeof args !== 'object') return [];
// ... rest of function
}And add a test:
it('should return empty array when args is null', async () => {
const { validateStringInputLengths } = await import('./mcp_scripts_validation.cjs');
const schema = { type: 'object', properties: { msg: { type: 'string' } } };
expect(validateStringInputLengths(null, schema)).toEqual([]);
expect(validateStringInputLengths(undefined, schema)).toEqual([]);
});The calling code in mcp_scripts_mcp_server_http.cjs always passes args from a validated tool invocation, so this is unlikely to fire in practice — but the validator is a public export and the omission makes it fragile.
| }); | ||
|
|
||
| describe("validateStringInputLengths", () => { | ||
| it("should return empty array when no string properties exceed limit", async () => { |
There was a problem hiding this comment.
[/tdd] All 10 new tests use await import(...) to load the module, but the existing validateRequiredFields tests above use require at the top of the file. This inconsistency can mask test-isolation issues and is harder to read.
💡 Suggestion
Follow the existing pattern used in the validateRequiredFields describe block: require the module once at the top of the file (or at the top of the describe block) and destructure what you need:
const { validateStringInputLengths, MAX_STRING_INPUT_BYTES } = require('./mcp_scripts_validation.cjs');
describe('validateStringInputLengths', () => {
it('should return empty array when no string properties exceed limit', () => {
// No async, no await import — just call the function
const violations = validateStringInputLengths(...);
expect(violations).toEqual([]);
});
});This also eliminates the need for all the it callbacks to be async, making the test suite faster and the error stack traces simpler.
| require.NoError(t, err) | ||
| defer r.Close() | ||
| orig := os.Stderr | ||
| os.Stderr = w |
There was a problem hiding this comment.
[/diagnose] os.Stderr is replaced on the global os package, which is not test-parallel-safe. If another test in the same package ever calls t.Parallel(), this will race.
💡 Safer alternative
The standard pattern for testing stderr in this codebase is to thread an io.Writer through the config struct, then redirect in the test without touching globals:
// In ForecastConfig, add:
Stderr io.Writer // nil means os.Stderr
// In RunForecast:
stderr := config.Stderr
if stderr == nil {
stderr = os.Stderr
}
fmt.Fprintln(stderr, console.FormatWarningMessage(...))
// In the test:
var buf bytes.Buffer
_ = RunForecast(ForecastConfig{..., Stderr: &buf})
assert.Contains(t, buf.String(), "experimental")This avoids the os.Pipe() ceremony and the global-state mutation entirely, and is safe to run in parallel.
…ort, fix spec section ordering, R-PREC-001 rounding, S-6c merge, and BFS test coverage Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed all 8 review comments in commit Code changes:
Spec fixes:
|
|
@copilot precision of effective token does not matter. Integer level precision is good enough. It is an approximation anyway. |
…ffective-tokens spec Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Removed the entire §8.5 Precision and Rounding section (R-PREC-001/002/003) in the latest commit. The Safeguards section is now §8.5 directly after §8.4. |
Addresses the full daily SPDD work plan: 3 P0 sync/impl items, 4 P1 spec-generation items, and 3 P2 analysis/clarification items across
forecast,frontmatter-hash,fuzzy-schedule,effective-tokens, andmcp-scriptsspecs.P0 — Code Fixes + Sync Verification
forecast.go— R-IMPL-040 complianceThe experimental-status warning was unconditionally emitted to stderr; spec requires it be suppressed when
--jsonis set (automated pipeline callers). Fixed with a!config.JSONOutputguard. New testTestRunForecast_R_IMPL_040_ExperimentalWarningcovers both paths.mcp_scripts_validation.cjs+mcp_server_core.cjs— SM-IS-01 implementation (closes #35257)validateStringInputLengthsadded: iterates top-leveltype: "string"schema properties and returns violations where UTF-8 byte length exceedsMAX_STRING_INPUT_BYTES(10 240). Wired into both the HTTP server (mcp_scripts_mcp_server_http.cjs) and the stdio/core server (mcp_server_core.cjs) immediately after required-field validation, before any handler is called. SM-IS-01 is now enforced across all MCP transports. 10 new unit tests cover boundary, multi-byte UTF-8, custom limit override, and null-schema paths.S-6 (frontmatter-hash 1 MiB limit) — verified as already implemented via
validateFrontmatterHashInputSizein Go andvalidateNormalizedFrontmatterHashInputSizein JS;TestFrontmatterHashInputSizeLimitcovers the boundary. Sync note added.P1 — Spec Generation
fuzzy-schedule-specification.md§13 Norms (new section): N-DET-001/002 determinism guarantees across recompilations; R-SAFE-003/004 operator-visible error codes table; N-ZERO-001/002 zero-width scatter window behavior.frontmatter-hash-specification.md§5.1 BFS Diamond-Import Test Vectors: three concrete vectors (FH-BFS-001 empty, FH-BFS-002 single-import, FH-BFS-003 diamondroot→a,b→shared) with verified SHA-256 hex strings, canonical JSON, and deduplication notes. All hashes verified against the Go implementation and covered in CI by newTestFrontmatterHashVectorFH_BFS_002andTestFrontmatterHashVectorFH_BFS_003tests inpkg/parser/frontmatter_hash_cross_language_test.go.mcp-scripts-specification.md§6.4.4 Go Module Requirements:go.modis required for external dependencies (inline or viadependenciesfield); minimum supported Go version is 1.21; R-GO-001/R-GO-002 normative requirements on toolchain satisfaction and clean module cache per invocation.P2 — Analysis/Clarification
forecast-specification.md§2: Added promotion tracking table (run_id,repo,date,outcome) for tracking the 3-production-use requirement toward Draft promotion.fuzzy-schedule-specification.md§12: Restructured Calendar Output Schema into §12.1 minimum-required fields, §12.2 optional extended fields, and §12.3 example output.effective-tokens-specification.md§8.5 S-6: Three new sub-guards — S-6a (NaN/Inf on computed ET must clamp + flag before serialization), S-6b (base_weighted_tokensoverflow for very large token counts), S-6c (zero-invocation graph must produce well-formed zero-valued output, not an error). Precision and rounding are intentionally not normatively specified; integer-level precision is sufficient for this approximation.