TML-2717: Drive trace reader + diagnostics (scripts/drive-diagnostics) #613

Merged
wmadden merged 17 commits into main from tml-2717-drive-instrumentation-s3-trace-reader
May 29, 2026

@wmadden wmadden commented May 29, 2026

Linked issue

Refs TML-2717. Third and final slice of the Drive — Skill instrumentation + diagnostics project; builds on the slice-1 trace vocabulary (#604) and the slice-2 lifecycle/cadence instrumentation. The follow-up is the Drive — Judge + live-experiment harness project.

At a glance

Point the new CLI at any Drive trace.jsonl and it renders a deterministic markdown dashboard — rework, planning-quality, and lifecycle metrics, plus a pass/fail/not-checkable audit of the Drive invariants:

pnpm drive:diagnose projects/drive-instrumentation/trace.jsonl
# Drive Diagnostics Report
**Events:** 59   **Run IDs:** drive-instrumentation   **Origin:** native
**Assertion coverage:** 7/31 checkable (24 not observable from the trace)

> Provenance: events were appended through the trace path; their values are
> author-asserted, not independently verified.

## Run verdict
**Not computable** from this trace alone — this report describes *what happened*,
not *how good the run was*: no correctness signal, no token instrumentation, no
baseline. Treat all-green metrics as "no recorded problems", not "verified good".

## Assertions
### Pass (7)
| I1 | A slice or direct change delivers exactly one PR. |
| I8 | Every dispatch has a DoD in its brief AND a DoR satisfied before it starts. |
### Not Checkable (24)
| I2 | A project's scope is bounded by its project spec. | No trace event records scope expansion/contraction. |

Until now, the trace events emitted by the Drive skills had no reader — the numbers existed on disk but nothing turned them into a verdict. This slice is that reader, and it leads with an honest "verdict not computable" rather than implying that an all-green metrics table means the run was good.

Decision

This PR ships skills-contrib/drive-diagnostics/ — a pure, LLM-free analytics tool over an emitted Drive trace, packaged as a portable skill. Five pieces:

  1. Schema + loader — the 17 trace event types transcribed to executable arktype schemas, plus a JSONL loader that validates line-by-line and never throws (malformed lines and unknown event types are collected, not fatal).
  2. Diagnostic metrics — ~14 metrics the operator cares about: rework rate, first-pass acceptance, rounds-per-dispatch, backtrack ratio, tier mix, spec amendments, plan amendments, I12 halts, triage stability, wall-clock distributions, and more. Metric names are count-named and polarity-correct (a 0 reads as "the artefact held", not an inverted "stability" score). Each degrades to null/0 with a note rather than crashing when its source events are absent.
  3. Assertion library — checkers for Drive invariants I1–I12, the 8 cascade-redesign rules, and the brief-discipline anti-patterns. Each returns pass / fail / not-checkable; every not-checkable carries a one-line rationale, so the report doubles as an honest map of what the current trace vocabulary cannot observe.
  4. Report generator + CLI — a deterministic markdown dashboard, wired as pnpm drive:diagnose.
  5. Best-effort post-hoc parser — reconstructs trace-shaped events (dispatch starts, spec/plan authoring, operator-turn count) from a raw Cursor transcript for runs that predate native emission, tagging every reconstructed event origin: post-hoc + a confidence and never inventing timestamps.
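
The non-throwing loader in piece 1 can be sketched roughly as follows. This is a minimal illustration, not the code in load.ts: the real tool validates with arktype schemas, whereas this sketch hand-checks a single field so that it stays dependency-free. `KNOWN_EVENT_TYPES`, `isRecord`, and the exact field shapes of `LoadResult` are assumptions made for the example; only the `{events, unknown, errors}` split comes from the PR.

```typescript
// Sketch only: the real loader validates against arktype schemas from schema.ts.
// KNOWN_EVENT_TYPES and the LoadResult field shapes are hypothetical.
type TraceEvent = { event_type: string; [key: string]: unknown };

interface LoadResult {
  events: TraceEvent[]; // lines that parsed and matched a known event type
  unknown: { line: number; eventType: string }[]; // well-formed, unrecognised type
  errors: { line: number; message: string }[]; // malformed JSON or wrong shape
}

// Stand-in for the 17 event types; three are listed for brevity.
const KNOWN_EVENT_TYPES: ReadonlySet<string> = new Set([
  "dispatch-start",
  "brief-issued",
  "spec-authored",
]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function loadTrace(jsonl: string): LoadResult {
  const result: LoadResult = { events: [], unknown: [], errors: [] };
  jsonl.split("\n").forEach((raw, index) => {
    const line = index + 1; // 1-based line number for error reporting
    if (raw.trim() === "") return; // blank lines are skipped, not reported
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch (err) {
      // Collect the parse failure instead of propagating it.
      result.errors.push({ line, message: err instanceof Error ? err.message : String(err) });
      return;
    }
    const eventType = isRecord(parsed) ? parsed["event_type"] : undefined;
    if (!isRecord(parsed) || typeof eventType !== "string") {
      result.errors.push({ line, message: "expected an object with a string event_type" });
      return;
    }
    if (!KNOWN_EVENT_TYPES.has(eventType)) {
      result.unknown.push({ line, eventType });
      return;
    }
    result.events.push({ ...parsed, event_type: eventType });
  });
  return result;
}
```

Every line ends up in exactly one of the three buckets or is skipped as blank, so a caller can report parse health without wrapping the call in a try/catch.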

The code lives in skills-contrib/drive-diagnostics/ as a portable skill (a SKILL.md plus the bundled TypeScript), sibling to the emit-side drive-record-traces skill. The deciding axis is portability: the whole Drive methodology lives in skills-contrib/ skills that travel to other repos (symlinked into .agents/skills/ and .claude/skills/ by the pnpm install prepare hook), so the read-side reader ships alongside the cluster rather than being stranded in this repo's scripts/. It's still methodology meta-tooling — not a packages/ product surface — so it stays outside the pnpm lint:deps layering graph and runs directly via node + node --test. Its one external dependency is arktype; for out-of-repo use the SKILL.md documents the install step.

Reviewer notes

  • The clean self-grade is not a methodology result — read the caveat. The tool grades this project's own trace (self-grade-report.md) and the numbers are spotless (100% first-pass, zero amendments). That's because the trace was hand-emitted by the orchestrator while building the slices (dogfooding the emission protocol), not produced by an unattended skill-driven run. The self-grade proves the reader is correct over a conformant trace; it does not validate the methodology. This is recorded as the slice's landed lesson in drive/retro/findings.md and is exactly the gap the Drive — Judge + live-experiment harness project closes.
  • Most assertions are not-checkable, by design. 24 of 31 invariant/cascade/brief checks can't be observed from the current trace vocabulary (scope/purpose immutability, sizing-by-INVEST, brief content sections). The not-checkable list with rationales is a deliverable — the honest coverage-gap map — not a TODO.
  • Largest files to spot-check: metrics.ts (547) and its test (795). The real-trace tests assert internal consistency (derived from the loaded events), not hardcoded counts — deliberately, because the live trace keeps growing as we dogfood (see the testing-discipline note in the landed finding).
  • schema.ts literal types need as const. arktype schemas transcribed from markdown silently infer never without it; a real tsc pass over the tool (not just node --test) is the gate that catches it.
  • cascade-brief.test.ts is committed in the D7 commit — it was created in D4 but never staged; folded in here rather than rewriting history.
  • Post-hoc limitation: reconstructed (timestamp-less) events surface via the origin banner + operator-turn count but don't yet feed the metric computations. Acceptable under the best-effort framing; noted in the D6 commit.
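
To make the "degrades to null with a note" and "internal consistency, not hardcoded counts" points concrete, here is a hedged sketch of one count-named metric. It is not taken from metrics.ts: the `round-started` event name, the `dispatch_id` field, and the `Metric` shape are assumptions chosen for illustration.

```typescript
// Sketch only: the event name "round-started" and the dispatch_id field
// are hypothetical, not the real trace vocabulary.
type TraceEvent = { event_type: string; dispatch_id?: string };

interface Metric {
  value: number | null; // null means "no signal", never "zero problems"
  note?: string; // explains why the value is null
}

// Count-named and polarity-correct: 0 reads as "no dispatch needed rework".
function reworkedDispatchCount(events: readonly TraceEvent[]): Metric {
  const roundsPerDispatch = new Map<string, number>();
  for (const e of events) {
    if (e.event_type !== "round-started" || e.dispatch_id === undefined) continue;
    roundsPerDispatch.set(e.dispatch_id, (roundsPerDispatch.get(e.dispatch_id) ?? 0) + 1);
  }
  if (roundsPerDispatch.size === 0) {
    // Absent source events degrade to null with a note rather than throwing.
    return { value: null, note: "no round events in trace" };
  }
  let reworked = 0;
  roundsPerDispatch.forEach((rounds) => {
    if (rounds > 1) reworked += 1;
  });
  return { value: reworked };
}
```

A real-trace test in the internally consistent style would assert a relation derived from the loaded events, for example that the count never exceeds the number of distinct dispatch IDs, rather than pinning a literal number that breaks when the live trace grows.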

What lands in this PR

| Commit | What it adds |
| --- | --- |
| slice-3 spec + plan | Slice spec (deliverables, code-home decision, SDoD) + 7-dispatch plan. |
| S3-D1 | arktype schemas for the 17 event types + non-throwing JSONL loader. |
| S3-D2 | Diagnostic-metrics module (~14 metrics, graceful degradation). |
| S3-D3 | Assertion library A: invariants I1–I12. |
| S3-D4 | Assertion library B: 8 cascade rules + brief-discipline anti-patterns. |
| S3-D5 | Report generator + CLI + pnpm drive:diagnose wiring. |
| S3-D6 | Best-effort post-hoc transcript parser + --posthoc flag. |
| S3-D7 | Self-grade run + self-grade-report.md, manual-qa.md (7 checks) + qa-run-01.md (PASS), landed lesson. |
| legibility + honesty | Count-named, polarity-correct metric names; report now leads with assertion-coverage headline, provenance caveat, and a "Run verdict: Not computable" section (TML-2719 + report-side of TML-2720). |
| portable skill | Relocate the tool from scripts/drive-diagnostics/ to skills-contrib/drive-diagnostics/ with a SKILL.md, so it travels with the drive-* cluster instead of being stranded in this repo (git mv, history preserved). |

Verification

  • pnpm test:scripts → 407 pass / 0 fail (load, metrics, invariants, cascade-brief, report, posthoc suites).
  • tsc --noEmit --strict over skills-contrib/drive-diagnostics/** — clean.
  • biome check skills-contrib/drive-diagnostics — clean, 0 no-bare-cast diagnostics.
  • pnpm lint:skills (SKILL.md frontmatter) + pnpm lint:deps (layering) — both clean after the relocation.
  • Manual QA: 7 checks in manual-qa.md executed end-to-end → qa-run-01.md, PASS, no Blockers (native run, malformed/empty traces, post-hoc reconstruction, assertion coverage, directory boundary, self-grade committed).
  • Merged current origin/main; re-ran pnpm test:scripts + CLI smoke green post-merge.

Follow-ups

  • Drive — Judge + live-experiment harness — the unattended skill-emitted run that turns this reader from "validated against a hand-emitted trace" into a real methodology signal; also the home for cross-run aggregation and LLM-based correctness/F-mode judging.
  • TML-2719 — metric naming/polarity + report legibility. Landed in this PR (the legibility + honesty commit).
  • TML-2720 — run-verdict synthesis + token/correctness vocabulary gaps. The report-side honesty (verdict-not-computable, coverage headline, provenance caveat) landed in this PR; the actual token/correctness instrumentation + cross-run baseline remain in the Judge + live-experiment harness project.
  • TML-2721 — deterministic drive-trace-emit emitter (kills hand-emission drift). Not in this PR.

Alternatives considered

  • Put the code in packages/. Rejected — it's methodology meta-tooling, not a product surface; living under packages/ would drag it into the layering graph and the published build for no benefit.
  • Leave it in scripts/. Rejected — scripts/ strands the reader in this repo while the rest of the Drive methodology (skills) travels to other repos. Shipping it as a skills-contrib/ skill keeps the cluster coherent and portable. The cost is one documented dependency (arktype) for out-of-repo use.
  • Rewrite validation to drop the arktype dependency (fully zero-dep skill). Deferred — hand-rolling 17 validators risks silent divergence from the emitted vocabulary, and arktype is the repo standard. Correctness-first; a zero-dep validator is a possible future optimization noted in the SKILL.md.
  • Use Vitest like the rest of the repo. Rejected for this tree — scripts/ already runs via node + node --test; matching that keeps the diagnostics tool dependency-light and runnable without the workspace build.
  • Skip the not-checkable assertions (only ship what's observable). Rejected — the explicit coverage-gap map is the most useful output for deciding what the next trace-vocabulary additions should be.
  • Make the post-hoc parser feed metrics. Deferred — reconstructed events lack timestamps, so feeding them into wall-clock/rework metrics would fabricate precision. Best-effort reconstruction surfaces origin + operator-turn count only.
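
The decision to keep not-checkable assertions can be illustrated with a small sketch of a three-state result type and the coverage headline derived from it. The names `AssertionResult` and `AssertionStatus` appear in this PR, but the field shapes and the `coverageHeadline` helper below are assumptions for illustration, not the contents of assertions/types.ts or report.ts.

```typescript
// Sketch only: field shapes and coverageHeadline are hypothetical.
type AssertionStatus = "pass" | "fail" | "not-checkable";

// A discriminated union makes the one-line rationale mandatory
// whenever a check cannot be observed from the trace.
type AssertionResult =
  | { id: string; status: "pass" | "fail" }
  | { id: string; status: "not-checkable"; rationale: string };

function countByStatus(results: readonly AssertionResult[], status: AssertionStatus): number {
  return results.filter((r) => r.status === status).length;
}

// Produces a headline in the same form as the report's "Assertion coverage" line.
function coverageHeadline(results: readonly AssertionResult[]): string {
  const notCheckable = countByStatus(results, "not-checkable");
  const checkable = results.length - notCheckable;
  return `${checkable}/${results.length} checkable (${notCheckable} not observable from the trace)`;
}
```

Because not-checkable results stay in the list, the denominator reflects every rule the methodology defines, and the rationales double as the coverage-gap map described above.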

Checklist

  • All commits are signed off (git commit -s). (Merge commit exempt.)
  • I read CONTRIBUTING.md and the change is scoped to one logical concern.
  • Tests are updated (6 node --test suites, 407 cases).
  • The PR title is in TML-NNNN: <sentence-case title> form.
  • Skill update: n/a — internal methodology tooling only. No user-facing CLI/API/config/error-code surface; the Drive skills that emit traces were instrumented in slices 1–2.

Summary by CodeRabbit

  • New Features

    • Added a new diagnostics tool that analyzes trace data and generates structured reports with metrics and assertion results.
    • Added a new command-line interface to run diagnostics on trace files.
  • Documentation

    • Added comprehensive documentation explaining the diagnostics tool, report interpretation, and usage.
  • Tests

    • Added extensive test coverage for assertions, metrics, trace loading, and report generation.
  • Chores

    • Added new development dependency.
    • Updated build configuration to include new test scripts.

wmadden added 9 commits May 29, 2026 09:22
…gnostics)

Slice 3 (TML-2717, close-out slice) ships the deterministic reader for the
trace.jsonl that slices 1-2 emit: a scripts/drive-diagnostics/ TypeScript tool
(schema+loader, ~14 diagnostic metrics, assertion library over I1-I12 + 8
cascade rules + brief-discipline, markdown report generator, best-effort
post-hoc transcript parser), closing by grading this project on its own trace.

Architecture decision: scripts/drive-diagnostics/ (meta-tooling, not product;
stays out of packages/ layering). Seven strictly-sequential M-sized dispatches.

Continues dogfooding: spec-authored + plan-authored emitted to the live trace.

Signed-off-by: Will Madden <madden@prisma.io>
scripts/drive-diagnostics/schema.ts transcribes the 17 arktype event schemas
from the drive-record-traces vocabulary; load.ts parses + validates trace.jsonl
line-by-line into {events, unknown, errors}, never throwing on malformed input.
Tests (node --test) cover the real trace fixture (32 events, 0 errors), a
malformed line, an unknown event_type, and empty input.

Orchestrator fixups in-round: as-const on the arktype envelope so .infer keeps
literal types; events kept schema-faithful (TraceEvent[], origin tracked by the
load source not stamped on events) to stay cast-free under the no-bare-cast
ratchet.

Signed-off-by: Will Madden <madden@prisma.io>
metrics.ts computes the project-DoD metric set over a validated TraceEvent[]:
rework (rounds_per_dispatch, first-pass acceptance, backtrack ratio, brief
stability, tier mix, wall-clock), planning quality (spec/plan stability, I12
halt rate, triage stability), artefact churn (write amplification, time-to-
stability), lifecycle/cadence (project/slice wall-clock, health-check cadence,
retro distribution), and a null operator-turn placeholder (post-hoc only).
Each metric degrades to null-with-note on absent signal; none throw on partial
traces. 79 node --test cases with hand-checked expected values; cast-free.

Signed-off-by: Will Madden <madden@prisma.io>
…I1-I12

assertions/invariants.ts: one checker per Drive invariant I1-I12, returning
pass/fail/not-checkable with evidence pointers into the trace. Observable
(I1/I4/I6/I8/I10) carry real checks; the rest are honestly not-checkable with a
one-line gap rationale (the project-DoD "coverage gaps named" requirement).
assertions/types.ts holds the shared AssertionResult shape (D4 reuses it).

Also hardens S3-D2 real-trace metric tests: switch from magic-count pins to
internally-consistent assertions, so the suite survives the live trace growing
as this project keeps dogfooding its own instrumentation.

Signed-off-by: Will Madden <madden@prisma.io>
…es + brief-discipline

assertions/cascade.ts (the 8 artifact-cascade-redesign rules) + brief.ts (the
brief-discipline anti-patterns), reusing the D3 AssertionResult shape, plus
assertions/index.ts runAssertions() aggregating invariants + cascade + brief
(31 results on the real trace). Most cascade/brief rules are authoring-process
concerns with no trace signal: marked not-checkable with a one-line rationale
(the named coverage gaps). One heuristic check (brief byte-length vs spec).
94 node --test cases; tsc 0, 0 casts.

Signed-off-by: Will Madden <madden@prisma.io>
…json wiring

report.ts renders a deterministic markdown dashboard (header + origin/parse-
health banners, metrics tables with n/a-no-signal cells, assertion sections
grouped pass/fail/not-checkable with evidence pointers). cli.ts wires
load -> computeMetrics -> runAssertions -> renderReport, import-guarded for
tests. Root package.json gains drive:diagnose + the five suites in test:scripts.
pnpm test:scripts green (374). Wall-clock means rounded for clean output.

Signed-off-by: Will Madden <madden@prisma.io>
…arser

posthoc.ts reconstructs dispatch-start / spec-authored / plan-authored events +
an operator-turn count from a Cursor agent transcript, each stamped
origin:post-hoc with a confidence grade; it never invents timestamps and emits
a no-signal note when no Drive structure is detected. cli.ts gains --posthoc
(origin native/post-hoc/mixed; operator-turn count threaded into the report).
23 posthoc node --test cases + a committed transcript fixture; pnpm test:scripts
green (397). Known limitation: reconstructed (timestamp-less) events are
surfaced via origin + operator count but do not feed the metric computations.

Signed-off-by: Will Madden <madden@prisma.io>
…landed lesson

Close slice 3 (trace reader + diagnostics). Ran the finished tool on
this project's own trace.jsonl and committed the self-grade report
(59 events, 7 assertions pass / 0 fail, clean rework + planning
metrics). Added manual-qa.md (7 checks + pre-flight gate) and
qa-run-01.md (PASS, no Blockers).

Landed the self-grade lesson in drive/retro/findings.md: a self-grade
over a hand-emitted trace validates the reader, not the methodology —
plus the live-artefact-test-coupling and arktype-as-const gotchas.

Also commits scripts/drive-diagnostics/test/cascade-brief.test.ts,
which D4 created but never staged.

All SDoD1-9 checked; D1-D7 SATISFIED.

Signed-off-by: Will Madden <madden@prisma.io>
@wmadden wmadden requested a review from a team as a code owner May 29, 2026 09:01

coderabbitai Bot commented May 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR introduces the drive-diagnose-run skill, a deterministic LLM-free diagnostics pipeline that validates Drive JSONL traces against a typed schema, aggregates metrics across four domains (rework, planning quality, artefact churn, lifecycle), runs 31 structured assertions (12 invariants, 8 cascade rules, 11 brief-discipline checks), and renders a comprehensive Markdown report. The implementation includes trace loading with error classification, post-hoc event reconstruction from transcripts, CLI orchestration, and extensive test coverage.

Changes

Diagnostics Pipeline Implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Trace Schema and Assertion Type Contracts | skills-contrib/drive-diagnose-run/schema.ts, skills-contrib/drive-diagnose-run/assertions/types.ts | Defines 17 ArkType event schemas (dispatch, round, brief, spec, plan, triage, falsified-assumption, project, slice, health-check, retro) with field constraints, unions into TraceEvent, and exports AssertionResult/TraceRef/AssertionStatus types. |
| Trace Loading and Post-hoc Reconstruction | skills-contrib/drive-diagnose-run/load.ts, skills-contrib/drive-diagnose-run/posthoc.ts | Loads JSONL traces with per-line JSON parsing, schema validation, and outcome routing into events/unknown/errors; reconstructs synthetic dispatch/spec/plan events from transcript tool_use items with deterministic IDs and confidence ratings. |
| Metrics Computation | skills-contrib/drive-diagnose-run/metrics.ts | Aggregates events into typed Metrics across rework (rounds, acceptance, backtrack, brief stability, tier mix, wall-clock), planning quality (amendments, dispatch sizes, i12 halts, verdict stability), artefact churn (write amplification, time-to-stability), and lifecycle (wallclock, health-check cadence, retro distributions). |
| Invariant Assertions (I1–I12) | skills-contrib/drive-diagnose-run/assertions/invariants.ts | Runs 12 invariant checks: I1 detects duplicate slice-completed events; I4 validates project-started has child work or direct-change verdict; I6 ensures spec+plan precede non-direct dispatches; I8 ensures brief-issued precedes dispatch-start; I10 validates project specs have non-zero DoD; others are static not-checkable. |
| Cascade Rules and Brief Discipline (Cascade-1..8, BD-1..11) | skills-contrib/drive-diagnose-run/assertions/cascade.ts, skills-contrib/drive-diagnose-run/assertions/brief.ts | Implements 8 cascade rules (Cascade-3 evaluates triage verdict distribution, Cascade-4 surfaces falsified-assumption evidence, others not-checkable) and 11 brief-discipline assertions (BD-8 heuristically compares brief_byte_length vs spec, BD-9 surfaces amended brief-disposition, others not-checkable or static). |
| Assertions Aggregation and Report Rendering | skills-contrib/drive-diagnose-run/assertions/index.ts, skills-contrib/drive-diagnose-run/report.ts | Aggregates invariant/cascade/brief results into single assertion array; renders Markdown report sections for metrics (rework, planning, churn, lifecycle, operator), assertion coverage, provenance, and verdict explanation with nullability handling. |
| CLI Entrypoint and Orchestration | skills-contrib/drive-diagnose-run/cli.ts | Parses CLI arguments, loads native and post-hoc traces, computes metrics and assertions, derives run metadata (origin, project IDs), renders report, and writes to file or stdout with error handling. |
| Invariant, Load, Metrics, Post-hoc, Cascade-Brief, and Report Test Suites | skills-contrib/drive-diagnose-run/test/invariants.test.ts, test/load.test.ts, test/metrics.test.ts, test/posthoc.test.ts, test/cascade-brief.test.ts, test/report.test.ts | Comprehensive test coverage for each module with synthetic trace builders, empty/partial/malformed input scenarios, real fixture integration tests, and structural guarantee assertions. |
| Documentation, Configuration, and Test Fixtures | skills-contrib/drive-diagnose-run/SKILL.md, package.json, drive/retro/findings.md, test/fixtures/sample-transcript.jsonl | Adds SKILL.md documentation, npm scripts (drive:diagnose, expanded test:scripts), arktype dev dependency, retro findings entries documenting diagnostics gaps, and sample JSONL transcript fixture. |

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

  • prisma/prisma-next#610: Main PR's trace schema and metrics (ProjectStarted/Closed, SliceStarted/Completed, HealthCheckFired, RetroLanded) analyze lifecycle/cadence/direct-change events introduced in the base PR.
  • prisma/prisma-next#604: Main PR's schema/metrics/assertions consume the exact JSONL events that the base PR instruments (brief-issued, triage-verdict, falsified-assumption, spec/plan authored/amended).

Suggested Reviewers

  • aqrln

🐰 Hop, skip, and a diagnostic jump!
Traces now speak truth with every trace-event thump,
Assertions count high, metrics aggregate low,
A pipeline pristine from input to final Markdown show!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.45% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely identifies the primary deliverable: a drive trace reader and diagnostics tool integrated as a pnpm script, with the ticket reference (TML-2717) and the skill module location provided for context. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


pkg-pr-new Bot commented May 29, 2026

@prisma-next/extension-author-tools

npm i https://pkg.pr.new/@prisma-next/extension-author-tools@613

@prisma-next/mongo-runtime

npm i https://pkg.pr.new/@prisma-next/mongo-runtime@613

@prisma-next/family-mongo

npm i https://pkg.pr.new/@prisma-next/family-mongo@613

@prisma-next/sql-runtime

npm i https://pkg.pr.new/@prisma-next/sql-runtime@613

@prisma-next/family-sql

npm i https://pkg.pr.new/@prisma-next/family-sql@613

@prisma-next/extension-arktype-json

npm i https://pkg.pr.new/@prisma-next/extension-arktype-json@613

@prisma-next/extension-cipherstash

npm i https://pkg.pr.new/@prisma-next/extension-cipherstash@613

@prisma-next/middleware-cache

npm i https://pkg.pr.new/@prisma-next/middleware-cache@613

@prisma-next/mongo

npm i https://pkg.pr.new/@prisma-next/mongo@613

@prisma-next/extension-paradedb

npm i https://pkg.pr.new/@prisma-next/extension-paradedb@613

@prisma-next/extension-pgvector

npm i https://pkg.pr.new/@prisma-next/extension-pgvector@613

@prisma-next/extension-postgis

npm i https://pkg.pr.new/@prisma-next/extension-postgis@613

@prisma-next/postgres

npm i https://pkg.pr.new/@prisma-next/postgres@613

@prisma-next/sql-orm-client

npm i https://pkg.pr.new/@prisma-next/sql-orm-client@613

@prisma-next/sqlite

npm i https://pkg.pr.new/@prisma-next/sqlite@613

@prisma-next/target-mongo

npm i https://pkg.pr.new/@prisma-next/target-mongo@613

@prisma-next/adapter-mongo

npm i https://pkg.pr.new/@prisma-next/adapter-mongo@613

@prisma-next/driver-mongo

npm i https://pkg.pr.new/@prisma-next/driver-mongo@613

@prisma-next/contract

npm i https://pkg.pr.new/@prisma-next/contract@613

@prisma-next/utils

npm i https://pkg.pr.new/@prisma-next/utils@613

@prisma-next/config

npm i https://pkg.pr.new/@prisma-next/config@613

@prisma-next/errors

npm i https://pkg.pr.new/@prisma-next/errors@613

@prisma-next/framework-components

npm i https://pkg.pr.new/@prisma-next/framework-components@613

@prisma-next/operations

npm i https://pkg.pr.new/@prisma-next/operations@613

@prisma-next/ts-render

npm i https://pkg.pr.new/@prisma-next/ts-render@613

@prisma-next/contract-authoring

npm i https://pkg.pr.new/@prisma-next/contract-authoring@613

@prisma-next/ids

npm i https://pkg.pr.new/@prisma-next/ids@613

@prisma-next/psl-parser

npm i https://pkg.pr.new/@prisma-next/psl-parser@613

@prisma-next/psl-printer

npm i https://pkg.pr.new/@prisma-next/psl-printer@613

@prisma-next/cli

npm i https://pkg.pr.new/@prisma-next/cli@613

@prisma-next/cli-telemetry

npm i https://pkg.pr.new/@prisma-next/cli-telemetry@613

@prisma-next/emitter

npm i https://pkg.pr.new/@prisma-next/emitter@613

@prisma-next/migration-tools

npm i https://pkg.pr.new/@prisma-next/migration-tools@613

prisma-next

npm i https://pkg.pr.new/prisma-next@613

@prisma-next/vite-plugin-contract-emit

npm i https://pkg.pr.new/@prisma-next/vite-plugin-contract-emit@613

@prisma-next/mongo-codec

npm i https://pkg.pr.new/@prisma-next/mongo-codec@613

@prisma-next/mongo-contract

npm i https://pkg.pr.new/@prisma-next/mongo-contract@613

@prisma-next/mongo-value

npm i https://pkg.pr.new/@prisma-next/mongo-value@613

@prisma-next/mongo-contract-psl

npm i https://pkg.pr.new/@prisma-next/mongo-contract-psl@613

@prisma-next/mongo-contract-ts

npm i https://pkg.pr.new/@prisma-next/mongo-contract-ts@613

@prisma-next/mongo-emitter

npm i https://pkg.pr.new/@prisma-next/mongo-emitter@613

@prisma-next/mongo-schema-ir

npm i https://pkg.pr.new/@prisma-next/mongo-schema-ir@613

@prisma-next/mongo-query-ast

npm i https://pkg.pr.new/@prisma-next/mongo-query-ast@613

@prisma-next/mongo-orm

npm i https://pkg.pr.new/@prisma-next/mongo-orm@613

@prisma-next/mongo-query-builder

npm i https://pkg.pr.new/@prisma-next/mongo-query-builder@613

@prisma-next/mongo-lowering

npm i https://pkg.pr.new/@prisma-next/mongo-lowering@613

@prisma-next/mongo-wire

npm i https://pkg.pr.new/@prisma-next/mongo-wire@613

@prisma-next/sql-contract

npm i https://pkg.pr.new/@prisma-next/sql-contract@613

@prisma-next/sql-errors

npm i https://pkg.pr.new/@prisma-next/sql-errors@613

@prisma-next/sql-operations

npm i https://pkg.pr.new/@prisma-next/sql-operations@613

@prisma-next/sql-schema-ir

npm i https://pkg.pr.new/@prisma-next/sql-schema-ir@613

@prisma-next/sql-contract-psl

npm i https://pkg.pr.new/@prisma-next/sql-contract-psl@613

@prisma-next/sql-contract-ts

npm i https://pkg.pr.new/@prisma-next/sql-contract-ts@613

@prisma-next/sql-contract-emitter

npm i https://pkg.pr.new/@prisma-next/sql-contract-emitter@613

@prisma-next/sql-lane-query-builder

npm i https://pkg.pr.new/@prisma-next/sql-lane-query-builder@613

@prisma-next/sql-relational-core

npm i https://pkg.pr.new/@prisma-next/sql-relational-core@613

@prisma-next/sql-builder

npm i https://pkg.pr.new/@prisma-next/sql-builder@613

@prisma-next/target-postgres

npm i https://pkg.pr.new/@prisma-next/target-postgres@613

@prisma-next/target-sqlite

npm i https://pkg.pr.new/@prisma-next/target-sqlite@613

@prisma-next/adapter-postgres

npm i https://pkg.pr.new/@prisma-next/adapter-postgres@613

@prisma-next/adapter-sqlite

npm i https://pkg.pr.new/@prisma-next/adapter-sqlite@613

@prisma-next/driver-postgres

npm i https://pkg.pr.new/@prisma-next/driver-postgres@613

@prisma-next/driver-sqlite

npm i https://pkg.pr.new/@prisma-next/driver-sqlite@613

commit: ebd1daa

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 10

🧹 Nitpick comments (3)
package.json (1)

41-41: ⚡ Quick win

Consider using glob patterns instead of explicit file enumeration.

The test:scripts command is now very long (~900 characters) and requires manual updates each time a test file is added. Node's --test flag supports glob patterns, which would make this more maintainable:

-    "test:scripts": "node --test scripts/lint-workflow-triggers.test.mjs scripts/validate-skills.test.mjs scripts/determine-version-utils.test.ts scripts/check-upgrade-coverage.test.mjs scripts/set-version-utils.test.ts scripts/check-publish-deps-pn-pins.test.mjs scripts/publish-packages-utils.test.mjs scripts/check-clean-tree.test.mjs scripts/lint-casts.test.mjs scripts/sync-agent-rules.test.mjs scripts/drive-diagnostics/test/load.test.ts scripts/drive-diagnostics/test/metrics.test.ts scripts/drive-diagnostics/test/invariants.test.ts scripts/drive-diagnostics/test/cascade-brief.test.ts scripts/drive-diagnostics/test/report.test.ts scripts/drive-diagnostics/test/posthoc.test.ts",
+    "test:scripts": "node --test 'scripts/**/*.test.{mjs,ts}'",

However, note that glob patterns may discover tests in different orders or include unintended files, so verify behavior before adopting this pattern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@package.json` at line 41, The npm script "test:scripts" is brittle because it
lists every test file explicitly; change it to use a glob with Node's --test
(e.g., replace the long explicit file enumeration in the "test:scripts" script
with a glob such as one matching your scripts test tree like
scripts/**/*.{test}.{mjs,ts} or scripts/**/*.test.*) so new tests are picked up
automatically; after updating the "test:scripts" value, run the test command to
verify it discovers the intended files and adjust the glob to exclude any
unwanted matches.
scripts/drive-diagnostics/test/cascade-brief.test.ts (1)

383-429: ⚡ Quick win

Add BD-8 test with mixed project+slice specs for the same run.

This will lock in that BD-8 uses slice-spec context and won’t regress if project specs are present.

Suggested test addition
+describe('BD-8 — uses slice specs (project spec should not drive comparison)', () => {
+  const events: TraceEvent[] = [
+    { ...mkSpecAuthored('project-spec.md', 2000), spec_kind: 'project' as const },
+    mkSpecAuthored('slice-spec.md', 8000),
+    mkBriefIssued('d-001', 3000),
+  ];
+  const results = checkBriefDiscipline(events);
+
+  it('status is pass (brief compared against slice spec, not project spec)', () => {
+    const r = results.find((x) => x.id === 'BD-8');
+    assert.ok(r !== undefined);
+    assert.equal(r.status, 'pass');
+  });
+});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/drive-diagnostics/test/cascade-brief.test.ts` around lines 383 - 429,
The new test suite must ensure BD-8 behavior when both a project-level spec and
a slice-level spec appear in the same run so the rule uses the slice-spec
context; add a test in the BD-8 block that calls checkBriefDiscipline with
events including mkSpecAuthored for a project spec and mkSpecAuthored for the
slice (distinct filenames/ids) plus mkBriefIssued, then assert the BD-8 result
uses the slice spec (status and evidence match the slice-based expectations,
e.g., pass when brief shorter than slice spec and fail/heuristic when >= slice
spec) by inspecting results.find(x => x.id === 'BD-8'), its status, note, and
evidence fields to lock in slice-spec precedence.
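
A minimal sketch of the slice-only pooling this comment asks for. The event shapes are assumptions: only `spec_kind`, the `spec-authored` and `brief-issued` event names, and the "fail when the brief is >= the slice spec" rule come from this thread; `byte_length`, the type names, and both function names are hypothetical.

```typescript
// Hypothetical event shapes for illustration; the real TraceEvent type is not shown in this thread.
type SpecAuthoredSketch = {
  event_type: 'spec-authored';
  spec_kind: 'project' | 'slice';
  byte_length: number; // assumed field name
};
type BriefIssuedSketch = {
  event_type: 'brief-issued';
  dispatch_id: string;
  byte_length: number; // assumed field name
};
type Bd8Event = SpecAuthoredSketch | BriefIssuedSketch;

// Smallest slice-spec size, or null when the trace contains no slice spec.
// Project specs are filtered out so they cannot drive the comparison.
function minSliceSpecBytes(events: Bd8Event[]): number | null {
  const sizes = events
    .filter((e): e is SpecAuthoredSketch => e.event_type === 'spec-authored')
    .filter((e) => e.spec_kind === 'slice')
    .map((e) => e.byte_length);
  return sizes.length === 0 ? null : Math.min(...sizes);
}

// Dispatch ids whose brief is at least as long as the smallest slice spec.
function briefsRestatingSpec(events: Bd8Event[]): string[] {
  const min = minSliceSpecBytes(events);
  if (min === null) return [];
  return events
    .filter((e): e is BriefIssuedSketch => e.event_type === 'brief-issued')
    .filter((b) => b.byte_length >= min)
    .map((b) => b.dispatch_id);
}
```

With the mixed fixture from the suggested test (project spec 2000, slice spec 8000, brief 3000), this sketch flags nothing, whereas pooling the project spec would have flagged the brief.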
scripts/drive-diagnostics/test/invariants.test.ts (1)

493-508: ⚡ Quick win

Add I10 regression coverage for spec-amended-only project traces.

Current I10 tests won’t detect the early-return path where project spec-amended events are ignored when spec-authored(project) is absent.

Suggested test addition
+describe('I10 — fail: project spec-amended with dod_items_count = 0 and no project spec-authored', () => {
+  const events: TraceEvent[] = [
+    {
+      ...mkSpecAuthored('spec.md', 'slice', 5),
+      event_type: 'spec-amended' as const,
+      spec_kind: 'project' as const,
+      dod_items_count: 0,
+    },
+  ];
+  const results = checkInvariants(events);
+
+  it('status is fail', () => {
+    const r = results.find((x) => x.id === 'I10');
+    assert.ok(r !== undefined);
+    assert.equal(r.status, 'fail');
+  });
+});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/drive-diagnostics/test/invariants.test.ts` around lines 493 - 508,
Add a new test case that covers the "spec-amended"-only project trace path so
I10 detects the early-return; instead of using mkSpecAuthored create events with
mkSpecAmended('spec.md','project',0) (or equivalent spec-amended TraceEvent) and
pass them to checkInvariants, then assert that the result with id 'I10' exists,
has status 'fail', and its evidence note mentions 'dod_items_count'; place this
alongside the existing I10 describe block and reference mkSpecAmended,
TraceEvent, checkInvariants and the 'I10' id to ensure the code path where
spec-authored is absent is exercised.
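
A sketch of the control flow this regression test is meant to pin down, assuming I10 fails whenever a project-level spec event records `dod_items_count` of 0. The type and function names are hypothetical, and the `not-checkable` result for a trace with no project spec events is an assumption rather than the tool's confirmed behaviour.

```typescript
// Hypothetical, trimmed-down event shape; field names are taken from the suggested test above.
type SpecEventSketch = {
  event_type: 'spec-authored' | 'spec-amended';
  spec_kind: 'project' | 'slice';
  dod_items_count: number;
};

type I10StatusSketch = 'pass' | 'fail' | 'not-checkable';

function checkI10Sketch(events: SpecEventSketch[]): {
  status: I10StatusSketch;
  failing: SpecEventSketch[];
} {
  // Pool authored AND amended project specs, so an amended-only trace is still validated
  // instead of being skipped by an early return on "no spec-authored".
  const projectSpecEvents = events.filter((e) => e.spec_kind === 'project');
  if (projectSpecEvents.length === 0) {
    return { status: 'not-checkable', failing: [] };
  }
  const failing = projectSpecEvents.filter((e) => e.dod_items_count === 0);
  return { status: failing.length > 0 ? 'fail' : 'pass', failing };
}
```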
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/drive-diagnostics/assertions/brief.ts`:
- Around line 139-146: The current minSpecByteLength calculation uses all
'spec-authored' events including project specs; restrict the pool to slice specs
only by filtering specAuthored to include only events that denote a slice (e.g.,
check the event's spec type/scope property such as e.spec_type or e.scope or an
is_slice flag) before building minSpecByteLength; update the creation of
specAuthored or apply .filter(...) so the loop that sets minSpecByteLength only
iterates over slice spec events (refer to specAuthored, eventsOfType, and
minSpecByteLength).

In `@scripts/drive-diagnostics/assertions/index.ts`:
- Line 7: Remove the banned barrel-style re-export line "export type {
AssertionResult, AssertionStatus, TraceRef } from './types.ts'" from
assertions/index.ts; instead either have callers import those types directly
from './types.ts' or add a proper barrel under the repository's exports/ folder
and re-export them there (do not re-export from this module). Ensure no other
"export ... from './types.ts'" remains in assertions/index.ts.

In `@scripts/drive-diagnostics/assertions/invariants.ts`:
- Around line 259-270: The current matching uses only dispatch_id
(briefDispatchIds) so it can incorrectly match briefs from other runs or from
later timestamps; change the logic in the loop over dispatchStarts to require a
brief-issued with the same dispatch_id AND the same run_id that also occurs
before the dispatch-start (compare timestamps), e.g. build a lookup keyed by
`${run_id}:${dispatch_id}` or search briefIssueds for an entry where
brief.dispatch_id === ds.dispatch_id && brief.run_id === ds.run_id &&
brief.timestamp <= ds.timestamp; use that check instead of
briefDispatchIds.has(...) when deciding to push ref(ds) into matched or to push
the orphan message via ref(ds, ...).
- Around line 46-62: The note for invariant I1 incorrectly estimates the number
of affected slices using Math.floor(duplicates.length / 2); instead compute the
actual number of unique slugs with more than one event (e.g., count keys in
bySlug where slugEvents.length > 1 or build a Set of slugs pushed to duplicates)
and use that count in the note; update the returned object (id 'I1', title 'A
slice or direct change delivers exactly one PR.') to use this correct count
instead of duplicates.length / 2 while leaving evidence as the duplicates array
and keeping the rest of the structure unchanged.
- Around line 309-338: The early return when projectSpecs.length === 0 prevents
the later spec-amended validation from running; change the flow so spec-amended
events are always checked: remove or bypass the early return and instead run the
projectSpecAmended check (eventsOfType(..., 'spec-amended') filtered by
spec_kind === 'project') even if projectSpecs is empty, push failures into
failing via ref(e, ...) as currently done, and update the result generation (id
'I10', title, status, evidence, note) to account for combined outcomes of
projectSpecs and projectSpecAmended checks.

In `@scripts/drive-diagnostics/cli.ts`:
- Around line 25-33: The argument parser loop that sets outPath and posthocPath
(the for loop iterating over args and assignments to outPath/posthocPath)
currently takes the next token without validating it; update the branches
handling '--out' and '--posthoc' to ensure the next token exists and does NOT
start with '--' before assigning to outPath or posthocPath, and if validation
fails emit a clear error (or throw) and exit instead of consuming another flag
as the value. Ensure you reference and validate args[i+1] and retain the
existing i++ behavior only after a successful validation.
- Around line 56-88: When calling parseTranscript(posthocPath) and
writeFileSync(outPath, ...), errors currently propagate and crash; wrap the
parseTranscript invocation (when posthocPath is provided) in a try/catch that on
error writes a clear message to stderr (e.g. using console.error or
process.stderr.write) including the posthocPath and error.message, then call
process.exit(1); likewise wrap the file write branch (writeFileSync) in a
try/catch that logs a descriptive stderr message including outPath and the error
and exits with code 1; reference parseTranscript, posthocResult, writeFileSync,
outPath, and process.exit when making these guarded changes.

In `@scripts/drive-diagnostics/posthoc.ts`:
- Around line 259-262: parseTranscript currently calls readFileSync(path) and
will throw for missing/unreadable files; wrap the file read + call to
parseTranscriptFromString in a try/catch inside parseTranscript(path) and on
error return a PostHocResult representing an empty/neutral diagnostic with a
note (include the path and error message) instead of throwing; update
parseTranscript to call parseTranscriptFromString only on success and construct
a minimal PostHocResult when catching errors so the diagnostics flow remains
best-effort.

In `@scripts/drive-diagnostics/report.ts`:
- Around line 43-49: The mdTable implementation is interpolating raw cell values
into the Markdown table which breaks when cells contain '|' or newlines; update
the mdTable(header, rows) function to escape pipe characters and normalize
newlines for each cell before joining (e.g., map each cell through an escape
function that replaces '|' with '\|' and converts or removes newlines like '\n'
-> '<br>' or a space), then use those escaped values in the existing
rows.map((r) => ...) logic so the table layout remains valid when values contain
special characters.

In `@scripts/drive-diagnostics/test/metrics.test.ts`:
- Around line 769-775: The test currently asserts the raw length of
m.planning_quality.plan_accuracy.dispatch_size_distributions equals
countOf('plan-authored'), but the metric contract allows
dispatch_size_distributions entries to be null; update the assertion to count
only non-null distributions before comparing. Specifically, compute the number
of non-null entries from
m.planning_quality.plan_accuracy.dispatch_size_distributions (e.g., by filtering
out null/undefined) and assert that filtered length equals
countOf('plan-authored'); this targets the computeMetrics output and keeps the
expectation aligned with the contract.
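
For the `cli.ts` flag-handling comment above, the validation could look roughly like the sketch below. It is not the tool's actual parser: `parseCliArgsSketch` and the option type are hypothetical, and the sketch throws where the real CLI might instead print to stderr and exit.

```typescript
// Hypothetical option bag; the real cli.ts variable names are only partly visible in this review.
type CliOptionsSketch = {
  tracePath: string | undefined;
  outPath: string | undefined;
  posthocPath: string | undefined;
};

function parseCliArgsSketch(args: string[]): CliOptionsSketch {
  const opts: CliOptionsSketch = { tracePath: undefined, outPath: undefined, posthocPath: undefined };
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === '--out' || arg === '--posthoc') {
      const value = args[i + 1];
      // Reject a missing value or another flag instead of silently consuming it.
      if (value === undefined || value.startsWith('--')) {
        throw new Error(`${arg} requires a value`);
      }
      if (arg === '--out') opts.outPath = value;
      else opts.posthocPath = value;
      i++; // consume the value only after validation succeeds
    } else if (opts.tracePath === undefined) {
      opts.tracePath = arg; // first positional argument is the trace path
    }
  }
  return opts;
}
```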


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: c8b12dd8-9aea-4783-adb3-6c054c3c01b4

📥 Commits

Reviewing files that changed from the base of the PR and between 485d437 and dcf0296.

⛔ Files ignored due to path filters (6)
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/manual-qa.md is excluded by !projects/**
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/plan.md is excluded by !projects/**
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/qa-run-01.md is excluded by !projects/**
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/self-grade-report.md is excluded by !projects/**
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/spec.md is excluded by !projects/**
  • projects/drive-instrumentation/trace.jsonl is excluded by !projects/**
📒 Files selected for processing (20)
  • drive/retro/findings.md
  • package.json
  • scripts/drive-diagnostics/assertions/brief.ts
  • scripts/drive-diagnostics/assertions/cascade.ts
  • scripts/drive-diagnostics/assertions/index.ts
  • scripts/drive-diagnostics/assertions/invariants.ts
  • scripts/drive-diagnostics/assertions/types.ts
  • scripts/drive-diagnostics/cli.ts
  • scripts/drive-diagnostics/load.ts
  • scripts/drive-diagnostics/metrics.ts
  • scripts/drive-diagnostics/posthoc.ts
  • scripts/drive-diagnostics/report.ts
  • scripts/drive-diagnostics/schema.ts
  • scripts/drive-diagnostics/test/cascade-brief.test.ts
  • scripts/drive-diagnostics/test/fixtures/sample-transcript.jsonl
  • scripts/drive-diagnostics/test/invariants.test.ts
  • scripts/drive-diagnostics/test/load.test.ts
  • scripts/drive-diagnostics/test/metrics.test.ts
  • scripts/drive-diagnostics/test/posthoc.test.ts
  • scripts/drive-diagnostics/test/report.test.ts

wmadden added 5 commits May 29, 2026 11:41
…ings + fix confabulated ticket ref

Add a principal-engineer-lens finding on the slice-3 self-grade report:
metric naming/polarity inversion (TML-2719), no computable run verdict
+ token/correctness vocabulary gaps (TML-2720), and the deterministic
drive-trace-emit emitter (TML-2721).

Also fix two stale references in the D7 self-grade finding: the
follow-up work is the Drive — Judge + live-experiment harness project,
not TML-2705 (which is an unrelated, completed ticket).

Signed-off-by: Will Madden <madden@prisma.io>
…-2720)

Rename planning-quality metrics to count-named, polarity-correct
fields (spec_amendments, plan_amendments, i12_halts) and label
dispatch-size distributions by plan path, so a value of 0 reads as
"the artefact held" rather than an inverted stability score.

Add the report-side honesty surfaces: an assertion-coverage headline
(checkable vs not-observable-from-trace), an origin-keyed provenance
caveat (native values are author-asserted, not independently verified),
a "Run verdict: Not computable" section naming the missing axes
(correctness, tokens, baseline), and an explicit token-usage = not
instrumented row.

Signed-off-by: Will Madden <madden@prisma.io>
Relocate the trace reader from scripts/drive-diagnostics to
skills-contrib/drive-diagnostics so it travels with the rest of the
drive-* skill cluster instead of being stranded in this repo. The
deciding axis is portability: the Drive methodology lives in portable
skills (symlinked into the harness skill trees by the pnpm install
prepare hook), and the read-side reader belongs alongside the
emit-side drive-record-traces contract skill.

Adds a SKILL.md (usage, output-interpretation caveats, arktype
prerequisite for out-of-repo use, by-name reference to
drive-record-traces as the vocabulary source); repoints the
drive:diagnose and test:scripts wiring and all in-repo references to
the new path. arktype is retained (correctness over a hand-rolled
validator rewrite); portability is handled by a documented install
step.

Signed-off-by: Will Madden <madden@prisma.io>
Declare arktype at the workspace root (catalog:) so the relocated skill
resolves its only external dependency in a clean install — skills-contrib
is not a workspace package, so the catalog dep was previously only
resolvable via an already-hoisted node_modules.

Address review threads:
- BD-8 restricts the brief-restates-spec heuristic to slice specs.
- Drop the barrel-style type re-export from assertions/index.ts; import
  the types directly from their module.
- I1 counts distinct duplicated slugs rather than halving the evidence list.
- I8 keys on project_run_id::dispatch_id and requires the brief-issued to
  precede the dispatch-start, closing cross-run and late-brief false-passes.
- I10 still validates spec-amended events when no project spec-authored exists.
- cli validates --out/--posthoc values and fails with a clear stderr message
  and exit 1 instead of crashing on parse/write errors.
- posthoc returns a structured empty result with a note for unreadable
  transcripts instead of throwing.
- mdTable escapes pipe and newline characters in cell values.

Signed-off-by: Will Madden <madden@prisma.io>
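
The I8 fix described in this commit message (run-scoped key plus ordering) can be sketched as follows. The event shape is an assumption: `project_run_id`, `dispatch_id`, and `timestamp` are named in the thread, but the type, the function name, and the premise that timestamps are uniform ISO-8601 UTC strings (so string comparison matches time order) are hypothetical.

```typescript
// Hypothetical event shape; timestamps are assumed to be uniform ISO-8601 UTC strings.
type DispatchEventSketch = {
  event_type: 'brief-issued' | 'dispatch-start';
  project_run_id: string;
  dispatch_id: string;
  timestamp: string;
};

// Returns dispatch-start events with no brief-issued in the SAME run at or before them.
function orphanDispatchStarts(events: DispatchEventSketch[]): DispatchEventSketch[] {
  // Earliest brief-issued timestamp per run-scoped dispatch key.
  const earliestBrief = new Map<string, string>();
  for (const e of events) {
    if (e.event_type !== 'brief-issued') continue;
    const key = `${e.project_run_id}::${e.dispatch_id}`;
    const seen = earliestBrief.get(key);
    if (seen === undefined || e.timestamp < seen) earliestBrief.set(key, e.timestamp);
  }
  return events.filter((e) => {
    if (e.event_type !== 'dispatch-start') return false;
    const briefAt = earliestBrief.get(`${e.project_run_id}::${e.dispatch_id}`);
    // Orphan when there is no brief in this run, or the brief arrived after the start.
    return briefAt === undefined || briefAt > e.timestamp;
  });
}
```

Keying on the run id closes the cross-run false-pass, and the timestamp comparison closes the late-brief one.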
@wmadden wmadden force-pushed the tml-2717-drive-instrumentation-s3-trace-reader branch from 08e3942 to 42a5b92 Compare May 29, 2026 11:25

wmadden commented May 29, 2026

Addressed all CodeRabbit review threads in 42a5b9242 (and fixed the failing CI). Mapping:

| Thread | Fix |
| --- | --- |
| `assertions/brief.ts` — BD-8 pools project specs | Heuristic now restricted to slice specs (`spec_kind === 'slice'`). |
| `assertions/index.ts` — barrel re-export | Removed the `export type … from './types.ts'`; consumers import the types directly. |
| `assertions/invariants.ts` — I1 duplicate-slug count | Counts distinct duplicated slugs instead of `floor(len/2)`. |
| `assertions/invariants.ts` — I8 false-pass | Keys on `project_run_id::dispatch_id` and requires `brief-issued` to precede `dispatch-start`. |
| `assertions/invariants.ts` — I10 early return | `spec-amended` events are still validated when no project `spec-authored` exists. |
| `cli.ts` — `--out`/`--posthoc` token consumption | Validates the value exists and isn't another flag; clear stderr + exit 1. |
| `cli.ts` — uncaught parse/write errors | `main()` wrapped: failures print a message and exit 1 instead of a stack trace. |
| `posthoc.ts` — unreadable transcript throws | Returns a structured empty result with a note (best-effort preserved). |
| `report.ts` — unescaped Markdown cells | `escapeMdCell` escapes pipe and newline characters in cell values. |

Separately, the Lint failure was arktype not resolving from skills-contrib/ in a clean install (it's a catalog dep not declared at the workspace root); now declared as a root devDependency (catalog:). New tests cover the I1/I8/I10, cli, posthoc, and mdTable changes; pnpm test:scripts is green (425). All four branch commits are now signed off (DCO).
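
As an illustration of the Markdown-cell escaping mentioned in the mapping above, a minimal sketch is shown here. It is not the code from `report.ts`: the `Sketch`-suffixed names are hypothetical, and rendering newlines as `<br>` is one of the options the review suggested, not a confirmed detail of `escapeMdCell`.

```typescript
// Escape characters that would break a Markdown table row.
// Assumption: pipes become "\|" and newlines become "<br>".
function escapeMdCellSketch(value: string): string {
  return value.replace(/\|/g, '\\|').replace(/\r?\n/g, '<br>');
}

// Build a Markdown table from a header row and data rows, escaping every cell.
function mdTableSketch(header: string[], rows: string[][]): string {
  const line = (cells: string[]): string => `| ${cells.map(escapeMdCellSketch).join(' | ')} |`;
  const divider = `| ${header.map(() => '---').join(' | ')} |`;
  return [line(header), divider, ...rows.map(line)].join('\n');
}
```

Escaping at the cell level keeps callers of the table builder unchanged, which is presumably why the fix landed in the table helper rather than at each call site.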

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
skills-contrib/drive-diagnostics/SKILL.md (2)

25-27: ⚡ Quick win

Add language specifiers to shell code blocks.

Per static analysis and Markdown best practices, shell commands should specify the language.

📝 Proposed fix for code block language specifiers

````diff
-```
+```bash
 node skills-contrib/drive-diagnostics/cli.ts <trace.jsonl> [--posthoc <transcript>] [--out <output.md>]
````

````diff
-```
+```bash
 pnpm drive:diagnose <trace.jsonl>
````

Also applies to: 31-33

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills-contrib/drive-diagnostics/SKILL.md` around lines 25 - 27, Update the
Markdown code fences for the shell examples so they include a language specifier
(bash); specifically change the fenced blocks containing the commands "node
skills-contrib/drive-diagnostics/cli.ts <trace.jsonl> [--posthoc <transcript>]
[--out <output.md>]" and "pnpm drive:diagnose <trace.jsonl>" (and the other
same-style blocks mentioned at lines 31-33) to use `` ```bash `` at the start of
each fence instead of plain `` ``` ``, leaving the command text unchanged.

51-53: ⚡ Quick win

Add language specifier to npm install code block.

Per static analysis and Markdown best practices, shell commands should specify the language.

📝 Proposed fix

````diff
-```
+```bash
 npm install arktype
 ```
````

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills-contrib/drive-diagnostics/SKILL.md` around lines 51 - 53, The fenced
code block containing the single-line command "npm install arktype" in SKILL.md
should include a language specifier for shell; update the backticks from
`` ``` `` to `` ```bash `` so the block reads as a bash/shell code block and
adheres to Markdown best practices and static analysis rules.
skills-contrib/drive-diagnostics/test/report.test.ts (1)

3-3: ⚡ Quick win

Drop the .ts suffix in this TypeScript import.

Line 3 should use extensionless import style to match repo conventions.

Proposed fix
-import type { AssertionResult } from '../assertions/types.ts';
+import type { AssertionResult } from '../assertions/types';

As per coding guidelines, **/*.{ts,tsx}: "Never add file extensions to imports in TypeScript."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills-contrib/drive-diagnostics/test/report.test.ts` at line 3, The import
statement for the AssertionResult type includes a .ts extension; update the
import in the test file so it uses the extensionless module specifier (import
type { AssertionResult } from '../assertions/types';) to follow the repo's
TypeScript import convention and avoid file extensions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills-contrib/drive-diagnostics/report.ts`:
- Line 1: The import statements in report.ts include explicit ".ts" extensions
(e.g., the import of AssertionResult from './assertions/types.ts' and the
imports from './load.ts' and './metrics.ts'); remove the ".ts" suffixes so they
read './assertions/types', './load', and './metrics' respectively to match
TypeScript/ESM import conventions. Update the import lines at the top of
report.ts that reference AssertionResult and any symbols imported from load.ts
and metrics.ts (look for imports like "import type { AssertionResult } from
'./assertions/types.ts'" and similar) to use the paths without the ".ts"
extension.

In `@skills-contrib/drive-diagnostics/SKILL.md`:
- Line 45: Update the Node version requirement in SKILL.md to match
package.json's authoritative engines field; replace the "Node 22+" text with
"Node 24+" (or ">=24") so SKILL.md and the package.json engines field ("node":
">=24") are consistent.

In `@skills-contrib/drive-diagnostics/test/metrics.test.ts`:
- Around line 769-772: The test assumes dispatch_sizes contains one entry per
"plan-authored" event but computePlanningQuality only emits entries when
dispatch_size_distribution !== null; update the expectation in the test that
calls computeMetrics(events) to count only plan-authored events with a non-null
dispatch_size_distribution (e.g., filter events for type === 'plan-authored' &&
dispatch_size_distribution !== null) so that
planning_quality.dispatch_sizes.length matches that filtered count; reference
symbols: computeMetrics, computePlanningQuality,
planning_quality.dispatch_sizes, dispatch_size_distribution, and the
"plan-authored" event type.

---

Nitpick comments:
In `@skills-contrib/drive-diagnostics/SKILL.md`:
- Around line 25-27: Update the Markdown code fences for the shell examples so
they include a language specifier (bash); specifically change the fenced blocks
containing the commands "node skills-contrib/drive-diagnostics/cli.ts
<trace.jsonl> [--posthoc <transcript>] [--out <output.md>]" and "pnpm
drive:diagnose <trace.jsonl>" (and the other same-style blocks mentioned at
lines 31-33) to use ```bash at the start of each fence instead of plain ```,
leaving the command text unchanged.
- Around line 51-53: The fenced code block containing the single-line command
"npm install arktype" in SKILL.md should include a language specifier for shell;
update the backticks from ``` to ```bash so the block reads as a bash/shell code
block and adheres to Markdown best practices and static analysis rules.

In `@skills-contrib/drive-diagnostics/test/report.test.ts`:
- Line 3: The import statement for the AssertionResult type includes a .ts
extension; update the import in the test file so it uses the extensionless
module specifier (import type { AssertionResult } from '../assertions/types';)
to follow the repo's TypeScript import convention and avoid file extensions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: e12cae6a-b2cf-4ca9-8c4b-532abec3376e

📥 Commits

Reviewing files that changed from the base of the PR and between 0c98886 and 42a5b92.

⛔ Files ignored due to path filters (6)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/manual-qa.md is excluded by !projects/**
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/plan.md is excluded by !projects/**
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/qa-run-01.md is excluded by !projects/**
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/self-grade-report.md is excluded by !projects/**
  • projects/drive-instrumentation/slices/03-trace-reader-and-diagnostics/spec.md is excluded by !projects/**
📒 Files selected for processing (21)
  • drive/retro/findings.md
  • package.json
  • skills-contrib/drive-diagnostics/SKILL.md
  • skills-contrib/drive-diagnostics/assertions/brief.ts
  • skills-contrib/drive-diagnostics/assertions/cascade.ts
  • skills-contrib/drive-diagnostics/assertions/index.ts
  • skills-contrib/drive-diagnostics/assertions/invariants.ts
  • skills-contrib/drive-diagnostics/assertions/types.ts
  • skills-contrib/drive-diagnostics/cli.ts
  • skills-contrib/drive-diagnostics/load.ts
  • skills-contrib/drive-diagnostics/metrics.ts
  • skills-contrib/drive-diagnostics/posthoc.ts
  • skills-contrib/drive-diagnostics/report.ts
  • skills-contrib/drive-diagnostics/schema.ts
  • skills-contrib/drive-diagnostics/test/cascade-brief.test.ts
  • skills-contrib/drive-diagnostics/test/fixtures/sample-transcript.jsonl
  • skills-contrib/drive-diagnostics/test/invariants.test.ts
  • skills-contrib/drive-diagnostics/test/load.test.ts
  • skills-contrib/drive-diagnostics/test/metrics.test.ts
  • skills-contrib/drive-diagnostics/test/posthoc.test.ts
  • skills-contrib/drive-diagnostics/test/report.test.ts
💤 Files with no reviewable changes (8)
  • skills-contrib/drive-diagnostics/assertions/index.ts
  • skills-contrib/drive-diagnostics/load.ts
  • skills-contrib/drive-diagnostics/test/load.test.ts
  • skills-contrib/drive-diagnostics/assertions/types.ts
  • skills-contrib/drive-diagnostics/test/fixtures/sample-transcript.jsonl
  • skills-contrib/drive-diagnostics/schema.ts
  • skills-contrib/drive-diagnostics/assertions/cascade.ts
  • skills-contrib/drive-diagnostics/test/cascade-brief.test.ts
✅ Files skipped from review due to trivial changes (1)
  • drive/retro/findings.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • package.json

Rename the skill directory drive-diagnostics -> drive-diagnose-run to
match the drive-<verb>-<noun> convention of its siblings
(drive-record-traces, drive-qa-run, drive-run-retro); repoint the
drive:diagnose script, test globs, and all references.

Review fixes:
- SKILL.md states the actual Node engine requirement (24+).
- metrics dispatch_sizes test counts only plan-authored events with a
  non-null distribution, matching the implementation.

The .ts import extensions are intentionally retained: the tool runs via
Node native TypeScript execution, which does not resolve extensionless
relative imports (same convention as scripts/*.ts).

Signed-off-by: Will Madden <madden@prisma.io>

wmadden commented May 29, 2026

Addressed the latest review threads in ce8ea8ec4:

  • SKILL.md Node version — corrected "Node 22+" → "Node 24+" to match engines.node.
  • metrics.test.ts dispatch_sizes — expectation now counts only plan-authored events with a non-null dispatch_size_distribution, matching computePlanningQuality.
  • report.ts .ts import extensions: declined. This tool runs via Node's native TypeScript execution, which does not resolve extensionless relative .ts imports; the explicit .ts extensions are required at runtime (same convention as the existing node-run scripts/set-version.ts, scripts/determine-version.ts, scripts/bump-minor.ts). Removing them would break node …/cli.ts.

Also renamed the skill drive-diagnostics → drive-diagnose-run to match the drive-<verb>-<noun> convention.
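
The corrected `dispatch_sizes` expectation from the second bullet can be sketched like this. The event types are hypothetical and trimmed to the two fields the thread names; the value shape of `dispatch_size_distribution` is an assumption.

```typescript
// Hypothetical event shapes; only the event type and field name come from the review thread.
type PlanAuthoredSketch = {
  event_type: 'plan-authored';
  dispatch_size_distribution: Record<string, number> | null; // value shape is assumed
};
type OtherEventSketch = { event_type: 'spec-authored' | 'brief-issued' };
type MetricsTraceEventSketch = PlanAuthoredSketch | OtherEventSketch;

// Number of entries the metric should contain: one per plan-authored event
// that actually recorded a distribution (null distributions emit nothing).
function expectedDispatchSizeEntries(events: MetricsTraceEventSketch[]): number {
  return events.filter(
    (e) => e.event_type === 'plan-authored' && e.dispatch_size_distribution !== null,
  ).length;
}
```

A test would then compare the length of the computed metric against this filtered count, not against the raw number of `plan-authored` events.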

wmadden added 2 commits May 29, 2026 14:39
…mentation-s3-trace-reader

Signed-off-by: Will Madden <madden@prisma.io>

# Conflicts:
#	drive/retro/findings.md
@wmadden wmadden merged commit 2e645ea into main May 29, 2026
8 of 9 checks passed
@wmadden wmadden deleted the tml-2717-drive-instrumentation-s3-trace-reader branch May 29, 2026 13:04