TML-2717: Drive trace reader + diagnostics (scripts/drive-diagnostics)#613
Conversation
…gnostics) Slice 3 (TML-2717, close-out slice) ships the deterministic reader for the trace.jsonl that slices 1-2 emit: a scripts/drive-diagnostics/ TypeScript tool (schema+loader, ~14 diagnostic metrics, assertion library over I1-I12 + 8 cascade rules + brief-discipline, markdown report generator, best-effort post-hoc transcript parser), closing by grading this project on its own trace. Architecture decision: scripts/drive-diagnostics/ (meta-tooling, not product; stays out of packages/ layering). Seven strictly-sequential M-sized dispatches. Continues dogfooding: spec-authored + plan-authored emitted to the live trace. Signed-off-by: Will Madden <madden@prisma.io>
scripts/drive-diagnostics/schema.ts transcribes the 17 arktype event schemas
from the drive-record-traces vocabulary; load.ts parses + validates trace.jsonl
line-by-line into {events, unknown, errors}, never throwing on malformed input.
Tests (node --test) cover the real trace fixture (32 events, 0 errors), a
malformed line, an unknown event_type, and empty input.
Orchestrator fixups in-round: as-const on the arktype envelope so .infer keeps
literal types; events kept schema-faithful (TraceEvent[], origin tracked by the
load source not stamped on events) to stay cast-free under the no-bare-cast
ratchet.
Signed-off-by: Will Madden <madden@prisma.io>
metrics.ts computes the project-DoD metric set over a validated TraceEvent[]: rework (rounds_per_dispatch, first-pass acceptance, backtrack ratio, brief stability, tier mix, wall-clock), planning quality (spec/plan stability, I12 halt rate, triage stability), artefact churn (write amplification, time-to- stability), lifecycle/cadence (project/slice wall-clock, health-check cadence, retro distribution), and a null operator-turn placeholder (post-hoc only). Each metric degrades to null-with-note on absent signal; none throw on partial traces. 79 node --test cases with hand-checked expected values; cast-free. Signed-off-by: Will Madden <madden@prisma.io>
…I1-I12 assertions/invariants.ts: one checker per Drive invariant I1-I12, returning pass/fail/not-checkable with evidence pointers into the trace. Observable (I1/I4/I6/I8/I10) carry real checks; the rest are honestly not-checkable with a one-line gap rationale (the project-DoD "coverage gaps named" requirement). assertions/types.ts holds the shared AssertionResult shape (D4 reuses it). Also hardens S3-D2 real-trace metric tests: switch from magic-count pins to internally-consistent assertions, so the suite survives the live trace growing as this project keeps dogfooding its own instrumentation. Signed-off-by: Will Madden <madden@prisma.io>
…es + brief-discipline assertions/cascade.ts (the 8 artifact-cascade-redesign rules) + brief.ts (the brief-discipline anti-patterns), reusing the D3 AssertionResult shape, plus assertions/index.ts runAssertions() aggregating invariants + cascade + brief (31 results on the real trace). Most cascade/brief rules are authoring-process concerns with no trace signal: marked not-checkable with a one-line rationale (the named coverage gaps). One heuristic check (brief byte-length vs spec). 94 node --test cases; tsc 0, 0 casts. Signed-off-by: Will Madden <madden@prisma.io>
…json wiring report.ts renders a deterministic markdown dashboard (header + origin/parse- health banners, metrics tables with n/a-no-signal cells, assertion sections grouped pass/fail/not-checkable with evidence pointers). cli.ts wires load -> computeMetrics -> runAssertions -> renderReport, import-guarded for tests. Root package.json gains drive:diagnose + the five suites in test:scripts. pnpm test:scripts green (374). Wall-clock means rounded for clean output. Signed-off-by: Will Madden <madden@prisma.io>
…arser posthoc.ts reconstructs dispatch-start / spec-authored / plan-authored events + an operator-turn count from a Cursor agent transcript, each stamped origin:post-hoc with a confidence grade; it never invents timestamps and emits a no-signal note when no Drive structure is detected. cli.ts gains --posthoc (origin native/post-hoc/mixed; operator-turn count threaded into the report). 23 posthoc node --test cases + a committed transcript fixture; pnpm test:scripts green (397). Known limitation: reconstructed (timestamp-less) events are surfaced via origin + operator count but do not feed the metric computations. Signed-off-by: Will Madden <madden@prisma.io>
…landed lesson Close slice 3 (trace reader + diagnostics). Ran the finished tool on this project's own trace.jsonl and committed the self-grade report (59 events, 7 assertions pass / 0 fail, clean rework + planning metrics). Added manual-qa.md (7 checks + pre-flight gate) and qa-run-01.md (PASS, no Blockers). Landed the self-grade lesson in drive/retro/findings.md: a self-grade over a hand-emitted trace validates the reader, not the methodology — plus the live-artefact-test-coupling and arktype-as-const gotchas. Also commits scripts/drive-diagnostics/test/cascade-brief.test.ts, which D4 created but never staged. All SDoD1-9 checked; D1-D7 SATISFIED. Signed-off-by: Will Madden <madden@prisma.io>
…mentation-s3-trace-reader
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR introduces the ChangesDiagnostics Pipeline Implementation
🎯 4 (Complex) | ⏱️ ~60 minutes Possibly Related PRs
Suggested Reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Comment |
@prisma-next/extension-author-tools
@prisma-next/mongo-runtime
@prisma-next/family-mongo
@prisma-next/sql-runtime
@prisma-next/family-sql
@prisma-next/extension-arktype-json
@prisma-next/extension-cipherstash
@prisma-next/middleware-cache
@prisma-next/mongo
@prisma-next/extension-paradedb
@prisma-next/extension-pgvector
@prisma-next/extension-postgis
@prisma-next/postgres
@prisma-next/sql-orm-client
@prisma-next/sqlite
@prisma-next/target-mongo
@prisma-next/adapter-mongo
@prisma-next/driver-mongo
@prisma-next/contract
@prisma-next/utils
@prisma-next/config
@prisma-next/errors
@prisma-next/framework-components
@prisma-next/operations
@prisma-next/ts-render
@prisma-next/contract-authoring
@prisma-next/ids
@prisma-next/psl-parser
@prisma-next/psl-printer
@prisma-next/cli
@prisma-next/cli-telemetry
@prisma-next/emitter
@prisma-next/migration-tools
prisma-next
@prisma-next/vite-plugin-contract-emit
@prisma-next/mongo-codec
@prisma-next/mongo-contract
@prisma-next/mongo-value
@prisma-next/mongo-contract-psl
@prisma-next/mongo-contract-ts
@prisma-next/mongo-emitter
@prisma-next/mongo-schema-ir
@prisma-next/mongo-query-ast
@prisma-next/mongo-orm
@prisma-next/mongo-query-builder
@prisma-next/mongo-lowering
@prisma-next/mongo-wire
@prisma-next/sql-contract
@prisma-next/sql-errors
@prisma-next/sql-operations
@prisma-next/sql-schema-ir
@prisma-next/sql-contract-psl
@prisma-next/sql-contract-ts
@prisma-next/sql-contract-emitter
@prisma-next/sql-lane-query-builder
@prisma-next/sql-relational-core
@prisma-next/sql-builder
@prisma-next/target-postgres
@prisma-next/target-sqlite
@prisma-next/adapter-postgres
@prisma-next/adapter-sqlite
@prisma-next/driver-postgres
@prisma-next/driver-sqlite
commit: |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (3)
package.json (1)
41-41: ⚡ Quick winConsider using glob patterns instead of explicit file enumeration.
The
test:scriptscommand is now very long (~900 characters) and requires manual updates each time a test file is added. Node's--testflag supports glob patterns, which would make this more maintainable:- "test:scripts": "node --test scripts/lint-workflow-triggers.test.mjs scripts/validate-skills.test.mjs scripts/determine-version-utils.test.ts scripts/check-upgrade-coverage.test.mjs scripts/set-version-utils.test.ts scripts/check-publish-deps-pn-pins.test.mjs scripts/publish-packages-utils.test.mjs scripts/check-clean-tree.test.mjs scripts/lint-casts.test.mjs scripts/sync-agent-rules.test.mjs scripts/drive-diagnostics/test/load.test.ts scripts/drive-diagnostics/test/metrics.test.ts scripts/drive-diagnostics/test/invariants.test.ts scripts/drive-diagnostics/test/cascade-brief.test.ts scripts/drive-diagnostics/test/report.test.ts scripts/drive-diagnostics/test/posthoc.test.ts", + "test:scripts": "node --test 'scripts/**/*.test.{mjs,ts}'",However, note that glob patterns may discover tests in different orders or include unintended files, so verify behavior before adopting this pattern.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@package.json` at line 41, The npm script "test:scripts" is brittle because it lists every test file explicitly; change it to use a glob with Node's --test (e.g., replace the long explicit file enumeration in the "test:scripts" script with a glob such as one matching your scripts test tree like scripts/**/*.{test}.{mjs,ts} or scripts/**/*.test.*) so new tests are picked up automatically; after updating the "test:scripts" value, run the test command to verify it discovers the intended files and adjust the glob to exclude any unwanted matches.scripts/drive-diagnostics/test/cascade-brief.test.ts (1)
383-429: ⚡ Quick winAdd BD-8 test with mixed project+slice specs for the same run.
This will lock in that BD-8 uses slice-spec context and won’t regress if project specs are present.
Suggested test addition
+describe('BD-8 — uses slice specs (project spec should not drive comparison)', () => { + const events: TraceEvent[] = [ + { ...mkSpecAuthored('project-spec.md', 2000), spec_kind: 'project' as const }, + mkSpecAuthored('slice-spec.md', 8000), + mkBriefIssued('d-001', 3000), + ]; + const results = checkBriefDiscipline(events); + + it('status is pass (brief compared against slice spec, not project spec)', () => { + const r = results.find((x) => x.id === 'BD-8'); + assert.ok(r !== undefined); + assert.equal(r.status, 'pass'); + }); +});🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/drive-diagnostics/test/cascade-brief.test.ts` around lines 383 - 429, The new test suite must ensure BD-8 behavior when both a project-level spec and a slice-level spec appear in the same run so the rule uses the slice-spec context; add a test in the BD-8 block that calls checkBriefDiscipline with events including mkSpecAuthored for a project spec and mkSpecAuthored for the slice (distinct filenames/ids) plus mkBriefIssued, then assert the BD-8 result uses the slice spec (status and evidence match the slice-based expectations, e.g., pass when brief shorter than slice spec and fail/heuristic when >= slice spec) by inspecting results.find(x => x.id === 'BD-8'), its status, note, and evidence fields to lock in slice-spec precedence.scripts/drive-diagnostics/test/invariants.test.ts (1)
493-508: ⚡ Quick winAdd I10 regression coverage for
spec-amended-only project traces.Current I10 tests won’t detect the early-return path where project
spec-amendedevents are ignored whenspec-authored(project)is absent.Suggested test addition
+describe('I10 — fail: project spec-amended with dod_items_count = 0 and no project spec-authored', () => { + const events: TraceEvent[] = [ + { + ...mkSpecAuthored('spec.md', 'slice', 5), + event_type: 'spec-amended' as const, + spec_kind: 'project' as const, + dod_items_count: 0, + }, + ]; + const results = checkInvariants(events); + + it('status is fail', () => { + const r = results.find((x) => x.id === 'I10'); + assert.ok(r !== undefined); + assert.equal(r.status, 'fail'); + }); +});🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/drive-diagnostics/test/invariants.test.ts` around lines 493 - 508, Add a new test case that covers the "spec-amended"-only project trace path so I10 detects the early-return; instead of using mkSpecAuthored create events with mkSpecAmended('spec.md','project',0) (or equivalent spec-amended TraceEvent) and pass them to checkInvariants, then assert that the result with id 'I10' exists, has status 'fail', and its evidence note mentions 'dod_items_count'; place this alongside the existing I10 describe block and reference mkSpecAmended, TraceEvent, checkInvariants and the 'I10' id to ensure the code path where spec-authored is absent is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/drive-diagnostics/assertions/brief.ts`:
- Around line 139-146: The current minSpecByteLength calculation uses all
'spec-authored' events including project specs; restrict the pool to slice specs
only by filtering specAuthored to include only events that denote a slice (e.g.,
check the event's spec type/scope property such as e.spec_type or e.scope or an
is_slice flag) before building minSpecByteLength; update the creation of
specAuthored or apply .filter(...) so the loop that sets minSpecByteLength only
iterates over slice spec events (refer to specAuthored, eventsOfType, and
minSpecByteLength).
In `@scripts/drive-diagnostics/assertions/index.ts`:
- Line 7: Remove the banned barrel-style re-export line "export type {
AssertionResult, AssertionStatus, TraceRef } from './types.ts'" from
assertions/index.ts; instead either have callers import those types directly
from './types.ts' or add a proper barrel under the repository's exports/ folder
and re-export them there (do not re-export from this module). Ensure no other
"export ... from './types.ts'" remains in assertions/index.ts.
In `@scripts/drive-diagnostics/assertions/invariants.ts`:
- Around line 259-270: The current matching uses only dispatch_id
(briefDispatchIds) so it can incorrectly match briefs from other runs or from
later timestamps; change the logic in the loop over dispatchStarts to require a
brief-issued with the same dispatch_id AND the same run_id that also occurs
before the dispatch-start (compare timestamps), e.g. build a lookup keyed by
`${run_id}:${dispatch_id}` or search briefIssueds for an entry where
brief.dispatch_id === ds.dispatch_id && brief.run_id === ds.run_id &&
brief.timestamp <= ds.timestamp; use that check instead of
briefDispatchIds.has(...) when deciding to push ref(ds) into matched or to push
the orphan message via ref(ds, ...).
- Around line 46-62: The note for invariant I1 incorrectly estimates the number
of affected slices using Math.floor(duplicates.length / 2); instead compute the
actual number of unique slugs with more than one event (e.g., count keys in
bySlug where slugEvents.length > 1 or build a Set of slugs pushed to duplicates)
and use that count in the note; update the returned object (id 'I1', title 'A
slice or direct change delivers exactly one PR.') to use this correct count
instead of duplicates.length / 2 while leaving evidence as the duplicates array
and keeping the rest of the structure unchanged.
- Around line 309-338: The early return when projectSpecs.length === 0 prevents
the later spec-amended validation from running; change the flow so spec-amended
events are always checked: remove or bypass the early return and instead run the
projectSpecAmended check (eventsOfType(..., 'spec-amended') filtered by
spec_kind === 'project') even if projectSpecs is empty, push failures into
failing via ref(e, ...) as currently done, and update the result generation (id
'I10', title, status, evidence, note) to account for combined outcomes of
projectSpecs and projectSpecAmended checks.
In `@scripts/drive-diagnostics/cli.ts`:
- Around line 25-33: The argument parser loop that sets outPath and posthocPath
(the for loop iterating over args and assignments to outPath/posthocPath)
currently takes the next token without validating it; update the branches
handling '--out' and '--posthoc' to ensure the next token exists and does NOT
start with '--' before assigning to outPath or posthocPath, and if validation
fails emit a clear error (or throw) and exit instead of consuming another flag
as the value. Ensure you reference and validate args[i+1] and retain the
existing i++ behavior only after a successful validation.
- Around line 56-88: When calling parseTranscript(posthocPath) and
writeFileSync(outPath, ...), errors currently propagate and crash; wrap the
parseTranscript invocation (when posthocPath is provided) in a try/catch that on
error writes a clear message to stderr (e.g. using console.error or
process.stderr.write) including the posthocPath and error.message, then call
process.exit(1); likewise wrap the file write branch (writeFileSync) in a
try/catch that logs a descriptive stderr message including outPath and the error
and exits with code 1; reference parseTranscript, posthocResult, writeFileSync,
outPath, and process.exit when making these guarded changes.
In `@scripts/drive-diagnostics/posthoc.ts`:
- Around line 259-262: parseTranscript currently calls readFileSync(path) and
will throw for missing/unreadable files; wrap the file read + call to
parseTranscriptFromString in a try/catch inside parseTranscript(path) and on
error return a PostHocResult representing an empty/neutral diagnostic with a
note (include the path and error message) instead of throwing; update
parseTranscript to call parseTranscriptFromString only on success and construct
a minimal PostHocResult when catching errors so the diagnostics flow remains
best-effort.
In `@scripts/drive-diagnostics/report.ts`:
- Around line 43-49: The mdTable implementation is interpolating raw cell values
into the Markdown table which breaks when cells contain '|' or newlines; update
the mdTable(header, rows) function to escape pipe characters and normalize
newlines for each cell before joining (e.g., map each cell through an escape
function that replaces '|' with '\|' and converts or removes newlines like '\n'
-> '<br>' or a space), then use those escaped values in the existing
rows.map((r) => ...) logic so the table layout remains valid when values contain
special characters.
In `@scripts/drive-diagnostics/test/metrics.test.ts`:
- Around line 769-775: The test currently asserts the raw length of
m.planning_quality.plan_accuracy.dispatch_size_distributions equals
countOf('plan-authored'), but the metric contract allows
dispatch_size_distributions entries to be null; update the assertion to count
only non-null distributions before comparing. Specifically, compute the number
of non-null entries from
m.planning_quality.plan_accuracy.dispatch_size_distributions (e.g., by filtering
out null/undefined) and assert that filtered length equals
countOf('plan-authored'); this targets the computeMetrics output and keeps the
expectation aligned with the contract.
---
Nitpick comments:
In `@package.json`:
- Line 41: The npm script "test:scripts" is brittle because it lists every test
file explicitly; change it to use a glob with Node's --test (e.g., replace the
long explicit file enumeration in the "test:scripts" script with a glob such as
one matching your scripts test tree like scripts/**/*.{test}.{mjs,ts} or
scripts/**/*.test.*) so new tests are picked up automatically; after updating
the "test:scripts" value, run the test command to verify it discovers the
intended files and adjust the glob to exclude any unwanted matches.
In `@scripts/drive-diagnostics/test/cascade-brief.test.ts`:
- Around line 383-429: The new test suite must ensure BD-8 behavior when both a
project-level spec and a slice-level spec appear in the same run so the rule
uses the slice-spec context; add a test in the BD-8 block that calls
checkBriefDiscipline with events including mkSpecAuthored for a project spec and
mkSpecAuthored for the slice (distinct filenames/ids) plus mkBriefIssued, then
assert the BD-8 result uses the slice spec (status and evidence match the
slice-based expectations, e.g., pass when brief shorter than slice spec and
fail/heuristic when >= slice spec) by inspecting results.find(x => x.id ===
'BD-8'), its status, note, and evidence fields to lock in slice-spec precedence.
In `@scripts/drive-diagnostics/test/invariants.test.ts`:
- Around line 493-508: Add a new test case that covers the "spec-amended"-only
project trace path so I10 detects the early-return; instead of using
mkSpecAuthored create events with mkSpecAmended('spec.md','project',0) (or
equivalent spec-amended TraceEvent) and pass them to checkInvariants, then
assert that the result with id 'I10' exists, has status 'fail', and its evidence
note mentions 'dod_items_count'; place this alongside the existing I10 describe
block and reference mkSpecAmended, TraceEvent, checkInvariants and the 'I10' id
to ensure the code path where spec-authored is absent is exercised.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: c8b12dd8-9aea-4783-adb3-6c054c3c01b4
⛔ Files ignored due to path filters (6)
projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/manual-qa.mdis excluded by!projects/**projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/plan.mdis excluded by!projects/**projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/qa-run-01.mdis excluded by!projects/**projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/self-grade-report.mdis excluded by!projects/**projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/spec.mdis excluded by!projects/**projects/drive-instrumentation/trace.jsonlis excluded by!projects/**
📒 Files selected for processing (20)
drive/retro/findings.mdpackage.jsonscripts/drive-diagnostics/assertions/brief.tsscripts/drive-diagnostics/assertions/cascade.tsscripts/drive-diagnostics/assertions/index.tsscripts/drive-diagnostics/assertions/invariants.tsscripts/drive-diagnostics/assertions/types.tsscripts/drive-diagnostics/cli.tsscripts/drive-diagnostics/load.tsscripts/drive-diagnostics/metrics.tsscripts/drive-diagnostics/posthoc.tsscripts/drive-diagnostics/report.tsscripts/drive-diagnostics/schema.tsscripts/drive-diagnostics/test/cascade-brief.test.tsscripts/drive-diagnostics/test/fixtures/sample-transcript.jsonlscripts/drive-diagnostics/test/invariants.test.tsscripts/drive-diagnostics/test/load.test.tsscripts/drive-diagnostics/test/metrics.test.tsscripts/drive-diagnostics/test/posthoc.test.tsscripts/drive-diagnostics/test/report.test.ts
…ings + fix confabulated ticket ref Add a principal-engineer-lens finding on the slice-3 self-grade report: metric naming/polarity inversion (TML-2719), no computable run verdict + token/correctness vocabulary gaps (TML-2720), and the deterministic drive-trace-emit emitter (TML-2721). Also fix two stale references in the D7 self-grade finding: the follow-up work is the Drive — Judge + live-experiment harness project, not TML-2705 (which is an unrelated, completed ticket). Signed-off-by: Will Madden <madden@prisma.io>
…-2720) Rename planning-quality metrics to count-named, polarity-correct fields (spec_amendments, plan_amendments, i12_halts) and label dispatch-size distributions by plan path, so a value of 0 reads as "the artefact held" rather than an inverted stability score. Add the report-side honesty surfaces: an assertion-coverage headline (checkable vs not-observable-from-trace), an origin-keyed provenance caveat (native values are author-asserted, not independently verified), a "Run verdict: Not computable" section naming the missing axes (correctness, tokens, baseline), and an explicit token-usage = not instrumented row. Signed-off-by: Will Madden <madden@prisma.io>
Signed-off-by: Will Madden <madden@prisma.io>
Relocate the trace reader from scripts/drive-diagnostics to skills-contrib/drive-diagnostics so it travels with the rest of the drive-* skill cluster instead of being stranded in this repo. The deciding axis is portability: the Drive methodology lives in portable skills (symlinked into the harness skill trees by the pnpm install prepare hook), and the read-side reader belongs alongside the emit-side drive-record-traces contract skill. Adds a SKILL.md (usage, output-interpretation caveats, arktype prerequisite for out-of-repo use, by-name reference to drive-record-traces as the vocabulary source); repoints the drive:diagnose and test:scripts wiring and all in-repo references to the new path. arktype is retained (correctness over a hand-rolled validator rewrite); portability is handled by a documented install step. Signed-off-by: Will Madden <madden@prisma.io>
Declare arktype at the workspace root (catalog:) so the relocated skill resolves its only external dependency in a clean install — skills-contrib is not a workspace package, so the catalog dep was previously only resolvable via an already-hoisted node_modules. Address review threads: - BD-8 restricts the brief-restates-spec heuristic to slice specs. - Drop the barrel-style type re-export from assertions/index.ts; import the types directly from their module. - I1 counts distinct duplicated slugs rather than halving the evidence list. - I8 keys on project_run_id::dispatch_id and requires the brief-issued to precede the dispatch-start, closing cross-run and late-brief false-passes. - I10 still validates spec-amended events when no project spec-authored exists. - cli validates --out/--posthoc values and fails with a clear stderr message and exit 1 instead of crashing on parse/write errors. - posthoc returns a structured empty result with a note for unreadable transcripts instead of throwing. - mdTable escapes pipe and newline characters in cell values. Signed-off-by: Will Madden <madden@prisma.io>
08e3942 to
42a5b92
Compare
|
Addressed all CodeRabbit review threads in
Separately, the Lint failure was |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
skills-contrib/drive-diagnostics/SKILL.md (2)
25-27: ⚡ Quick winAdd language specifiers to shell code blocks.
Per static analysis and Markdown best practices, shell commands should specify the language.
📝 Proposed fix for code block language specifiers
-``` +```bash node skills-contrib/drive-diagnostics/cli.ts <trace.jsonl> [--posthoc <transcript>] [--out <output.md>]```diff -``` +```bash pnpm drive:diagnose <trace.jsonl></details> Also applies to: 31-33 <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@skills-contrib/drive-diagnostics/SKILL.mdaround lines 25 - 27, Update the
Markdown code fences for the shell examples so they include a language specifier
(bash); specifically change the fenced blocks containing the commands "node
skills-contrib/drive-diagnostics/cli.ts <trace.jsonl> [--posthoc ]
[--out <output.md>]" and "pnpm drive:diagnose <trace.jsonl>" (and the other
same-style blocks mentioned at lines 31-33) to usebash at the start of each fence instead of plain, leaving the command text unchanged.</details> --- `51-53`: _⚡ Quick win_ **Add language specifier to npm install code block.** Per static analysis and Markdown best practices, shell commands should specify the language. <details> <summary>📝 Proposed fix</summary> ```diff -``` +```bash npm install arktype ``` ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@skills-contrib/drive-diagnostics/SKILL.mdaround lines 51 - 53, The fenced
code block containing the single-line command "npm install arktype" in SKILL.md
should include a language specifier for shell; update the backticks from ``` tobest practices and static analysis rules.skills-contrib/drive-diagnostics/test/report.test.ts (1)
3-3: ⚡ Quick winDrop the
.tssuffix in this TypeScript import.Line 3 should use extensionless import style to match repo conventions.
Proposed fix
-import type { AssertionResult } from '../assertions/types.ts'; +import type { AssertionResult } from '../assertions/types';As per coding guidelines,
**/*.{ts,tsx}: "Never add file extensions to imports in TypeScript."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills-contrib/drive-diagnostics/test/report.test.ts` at line 3, The import statement for the AssertionResult type includes a .ts extension; update the import in the test file so it uses the extensionless module specifier (import type { AssertionResult } from '../assertions/types';) to follow the repo's TypeScript import convention and avoid file extensions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@skills-contrib/drive-diagnostics/report.ts`:
- Line 1: The import statements in report.ts include explicit ".ts" extensions
(e.g., the import of AssertionResult from './assertions/types.ts' and the
imports from './load.ts' and './metrics.ts'); remove the ".ts" suffixes so they
read './assertions/types', './load', and './metrics' respectively to match
TypeScript/ESM import conventions. Update the import lines at the top of
report.ts that reference AssertionResult and any symbols imported from load.ts
and metrics.ts (look for imports like "import type { AssertionResult } from
'./assertions/types.ts'" and similar) to use the paths without the ".ts"
extension.
In `@skills-contrib/drive-diagnostics/SKILL.md`:
- Line 45: Update the Node version requirement in SKILL.md to match
package.json's authoritative engines field; replace the "Node 22+" text with
"Node 24+" (or ">=24") so SKILL.md and the package.json engines field ("node":
">=24") are consistent.
In `@skills-contrib/drive-diagnostics/test/metrics.test.ts`:
- Around line 769-772: The test assumes dispatch_sizes contains one entry per
"plan-authored" event but computePlanningQuality only emits entries when
dispatch_size_distribution !== null; update the expectation in the test that
calls computeMetrics(events) to count only plan-authored events with a non-null
dispatch_size_distribution (e.g., filter events for type === 'plan-authored' &&
dispatch_size_distribution !== null) so that
planning_quality.dispatch_sizes.length matches that filtered count; reference
symbols: computeMetrics, computePlanningQuality,
planning_quality.dispatch_sizes, dispatch_size_distribution, and the
"plan-authored" event type.
---
Nitpick comments:
In `@skills-contrib/drive-diagnostics/SKILL.md`:
- Around line 25-27: Update the Markdown code fences for the shell examples so
they include a language specifier (bash); specifically change the fenced blocks
containing the commands "node skills-contrib/drive-diagnostics/cli.ts
<trace.jsonl> [--posthoc <transcript>] [--out <output.md>]" and "pnpm
drive:diagnose <trace.jsonl>" (and the other same-style blocks mentioned at
lines 31-33) to use ```bash at the start of each fence instead of plain ```,
leaving the command text unchanged.
- Around line 51-53: The fenced code block containing the single-line command
"npm install arktype" in SKILL.md should include a language specifier for shell;
update the backticks from ``` to ```bash so the block reads as a bash/shell code
block and adheres to Markdown best practices and static analysis rules.
In `@skills-contrib/drive-diagnostics/test/report.test.ts`:
- Line 3: The import statement for the AssertionResult type includes a .ts
extension; update the import in the test file so it uses the extensionless
module specifier (import type { AssertionResult } from '../assertions/types';)
to follow the repo's TypeScript import convention and avoid file extensions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: e12cae6a-b2cf-4ca9-8c4b-532abec3376e
⛔ Files ignored due to path filters (6)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yamlprojects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/manual-qa.mdis excluded by!projects/**projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/plan.mdis excluded by!projects/**projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/qa-run-01.mdis excluded by!projects/**projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/self-grade-report.mdis excluded by!projects/**projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/spec.mdis excluded by!projects/**
📒 Files selected for processing (21)
drive/retro/findings.mdpackage.jsonskills-contrib/drive-diagnostics/SKILL.mdskills-contrib/drive-diagnostics/assertions/brief.tsskills-contrib/drive-diagnostics/assertions/cascade.tsskills-contrib/drive-diagnostics/assertions/index.tsskills-contrib/drive-diagnostics/assertions/invariants.tsskills-contrib/drive-diagnostics/assertions/types.tsskills-contrib/drive-diagnostics/cli.tsskills-contrib/drive-diagnostics/load.tsskills-contrib/drive-diagnostics/metrics.tsskills-contrib/drive-diagnostics/posthoc.tsskills-contrib/drive-diagnostics/report.tsskills-contrib/drive-diagnostics/schema.tsskills-contrib/drive-diagnostics/test/cascade-brief.test.tsskills-contrib/drive-diagnostics/test/fixtures/sample-transcript.jsonlskills-contrib/drive-diagnostics/test/invariants.test.tsskills-contrib/drive-diagnostics/test/load.test.tsskills-contrib/drive-diagnostics/test/metrics.test.tsskills-contrib/drive-diagnostics/test/posthoc.test.tsskills-contrib/drive-diagnostics/test/report.test.ts
💤 Files with no reviewable changes (8)
- skills-contrib/drive-diagnostics/assertions/index.ts
- skills-contrib/drive-diagnostics/load.ts
- skills-contrib/drive-diagnostics/test/load.test.ts
- skills-contrib/drive-diagnostics/assertions/types.ts
- skills-contrib/drive-diagnostics/test/fixtures/sample-transcript.jsonl
- skills-contrib/drive-diagnostics/schema.ts
- skills-contrib/drive-diagnostics/assertions/cascade.ts
- skills-contrib/drive-diagnostics/test/cascade-brief.test.ts
✅ Files skipped from review due to trivial changes (1)
- drive/retro/findings.md
🚧 Files skipped from review as they are similar to previous changes (1)
- package.json
Rename the skill directory drive-diagnostics -> drive-diagnose-run to match the drive-<verb>-<noun> convention of its siblings (drive-record-traces, drive-qa-run, drive-run-retro); repoint the drive:diagnose script, test globs, and all references. Review fixes: - SKILL.md states the actual Node engine requirement (24+). - metrics dispatch_sizes test counts only plan-authored events with a non-null distribution, matching the implementation. The .ts import extensions are intentionally retained: the tool runs via Node native TypeScript execution, which does not resolve extensionless relative imports (same convention as scripts/*.ts). Signed-off-by: Will Madden <madden@prisma.io>
|
Addressed the latest review threads in
Also renamed the skill |
…mentation-s3-trace-reader Signed-off-by: Will Madden <madden@prisma.io> # Conflicts: # drive/retro/findings.md
Linked issue
Refs TML-2717. Third and final slice of the Drive — Skill instrumentation + diagnostics project; builds on the slice-1 trace vocabulary (#604) and the slice-2 lifecycle/cadence instrumentation. The follow-up is the Drive — Judge + live-experiment harness project.
At a glance
Point the new CLI at any Drive
trace.jsonland it renders a deterministic markdown dashboard — rework, planning-quality, and lifecycle metrics, plus a pass/fail/not-checkable audit of the Drive invariants:Until now, the trace events emitted by the Drive skills had no reader — the numbers existed on disk but nothing turned them into a verdict. This slice is that reader, and it leads with an honest "verdict not computable" rather than implying that an all-green metrics table means the run was good.
Decision
This PR ships
skills-contrib/drive-diagnostics/— a pure, LLM-free analytics tool over an emitted Drive trace, packaged as a portable skill. Five pieces:0reads as "the artefact held", not an inverted "stability" score). Each degrades tonull/0with a note rather than crashing when its source events are absent.pass/fail/not-checkable; everynot-checkablecarries a one-line rationale, so the report doubles as an honest map of what the current trace vocabulary cannot observe.pnpm drive:diagnose.origin: post-hoc+ a confidence and never inventing timestamps.The code lives in
skills-contrib/drive-diagnostics/as a portable skill (aSKILL.mdplus the bundled TypeScript), sibling to the emit-sidedrive-record-tracesskill. The deciding axis is portability: the whole Drive methodology lives inskills-contrib/skills that travel to other repos (symlinked into.agents/skills/and.claude/skills/by thepnpm installprepare hook), so the read-side reader ships alongside the cluster rather than being stranded in this repo'sscripts/. It's still methodology meta-tooling — not apackages/product surface — so it stays outside thepnpm lint:depslayering graph and runs directly vianode+node --test. Its one external dependency isarktype; for out-of-repo use theSKILL.mddocuments the install step.Reviewer notes
self-grade-report.md) and the numbers are spotless (100% first-pass, zero amendments). That's because the trace was hand-emitted by the orchestrator while building the slices (dogfooding the emission protocol), not produced by an unattended skill-driven run. The self-grade proves the reader is correct over a conformant trace; it does not validate the methodology. This is recorded as the slice's landed lesson indrive/retro/findings.mdand is exactly the gap the Drive — Judge + live-experiment harness project closes.not-checkable, by design. 24 of 31 invariant/cascade/brief checks can't be observed from the current trace vocabulary (scope/purpose immutability, sizing-by-INVEST, brief content sections). The not-checkable list with rationales is a deliverable — the honest coverage-gap map — not a TODO.metrics.ts(547) and its test (795). The real-trace tests assert internal consistency (derived from the loaded events), not hardcoded counts — deliberately, because the live trace keeps growing as we dogfood (see the testing-discipline note in the landed finding).schema.tsliteral types needas const. arktype schemas transcribed from markdown silently inferneverwithout it; a realtscpass over the tool (not justnode --test) is the gate that catches it.cascade-brief.test.tsis committed in the D7 commit — it was created in D4 but never staged; folded in here rather than rewriting history.What lands in this PR
slice-3 spec + planS3-D1S3-D2S3-D3S3-D4S3-D5pnpm drive:diagnosewiring.S3-D6--posthocflag.S3-D7self-grade-report.md,manual-qa.md(7 checks) +qa-run-01.md(PASS), landed lesson.scripts/drive-diagnostics/toskills-contrib/drive-diagnostics/with aSKILL.md, so it travels with the drive-* cluster instead of being stranded in this repo (git mv, history preserved).Verification
pnpm test:scripts— 407 pass / 0 fail (load, metrics, invariants, cascade-brief, report, posthoc suites).tsc --noEmit --strictoverskills-contrib/drive-diagnostics/**— clean.biome check skills-contrib/drive-diagnostics— clean, 0no-bare-castdiagnostics.pnpm lint:skills(SKILL.md frontmatter) +pnpm lint:deps(layering) — both clean after the relocation.manual-qa.mdexecuted end-to-end →qa-run-01.md, PASS, no Blockers (native run, malformed/empty traces, post-hoc reconstruction, assertion coverage, directory boundary, self-grade committed).origin/main; re-ranpnpm test:scripts+ CLI smoke green post-merge.Follow-ups
drive-trace-emitemitter (kills hand-emission drift). Not in this PR.Alternatives considered
packages/. Rejected — it's methodology meta-tooling, not a product surface; living underpackages/would drag it into the layering graph and the published build for no benefit.scripts/. Rejected —scripts/strands the reader in this repo while the rest of the Drive methodology (skills) travels to other repos. Shipping it as askills-contrib/skill keeps the cluster coherent and portable. The cost is one documented dependency (arktype) for out-of-repo use.arktypedependency (fully zero-dep skill). Deferred — hand-rolling 17 validators risks silent divergence from the emitted vocabulary, and arktype is the repo standard. Correctness-first; a zero-dep validator is a possible future optimization noted in theSKILL.md.scripts/already runs vianode+node --test; matching that keeps the diagnostics tool dependency-light and runnable without the workspace build.Checklist
git commit -s). (Merge commit exempt.)node --testsuites, 407 cases).TML-NNNN: <sentence-case title>form.Summary by CodeRabbit
New Features
Documentation
Tests
Chores