This issue was created from a legacy filesystem backlog card. GitHub Issues are now the live work tracker; repository docs remain Method evidence.
Source backlog: docs/method/backlog/cool-ideas/PLATFORM_test-timing-history.md
Original lane: cool-ideas
Original legend: PLATFORM
Original backlog card
Per-test timing history: never discover a 96-second test by vibes again
Legend: PLATFORM
The pain we just lived
The test_toy_contract_no_std_generated_output_checks_in_consumer_crate
integration test took ~96 seconds per run for many cycles before
anyone felt it strongly enough to act. That test is now ~1.14s after
the speedup work in PR #383, but the discovery path was wrong:
96-second test → multiple hours of compounded irritation → rage fix
That is barbarism. The cost should have been visible in numbers
before it became rage fuel.
(To be fair: rage fuel did ship PR #383. Don't make this a
recurring shipping mechanism.)
The proposal
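Every test run records per-test wall-clock to a structured log:
One JSON object per test result, append-only:
{ "ts": "2026-05-31T02:02:29Z", "repo": "echo", "branch": "cycle/0025-sessions-as-causal-contexts", "test_binary": "echo-wesley-gen-generation", "test": "test_toy_contract_no_std_generated_output_checks_in_consumer_crate", "duration_ms": 1140, "status": "pass" }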
Capture happens via a cargo test wrapper or a libtest reporter
hook (structured libtest JSON output is a nightly/unstable path;
on stable, use a wrapper strategy that records per-test timings
from supported output/reporting surfaces). On Node side (jedit), wrap the node --test JSON reporter.
The JSONL log is gitignored — it is local history, not shared
state.
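A minimal sketch of the append-only write, assuming records carry the fields ts, repo, branch, test_binary, test, duration_ms, and status. The names TimingRecord, json_escape, and append_record are hypothetical, and the sketch uses only the standard library; a real xtask might prefer a JSON crate instead of hand-rolled escaping.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// One test result (hypothetical type; field names follow the proposal).
pub struct TimingRecord {
    pub ts: String,
    pub repo: String,
    pub branch: String,
    pub test_binary: String,
    pub test: String,
    pub duration_ms: u64,
    pub status: String,
}

/// Escape the characters that a JSON string literal cannot hold raw.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl TimingRecord {
    /// Render the record as one JSON object with no trailing newline.
    pub fn to_json_line(&self) -> String {
        format!(
            "{{\"ts\":\"{}\",\"repo\":\"{}\",\"branch\":\"{}\",\"test_binary\":\"{}\",\"test\":\"{}\",\"duration_ms\":{},\"status\":\"{}\"}}",
            json_escape(&self.ts),
            json_escape(&self.repo),
            json_escape(&self.branch),
            json_escape(&self.test_binary),
            json_escape(&self.test),
            self.duration_ms,
            json_escape(&self.status),
        )
    }
}

/// Append one record as a single line, creating the log file if it is missing.
pub fn append_record(path: &Path, record: &TimingRecord) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", record.to_json_line())
}
```

Opening the file in append mode means earlier history is never rewritten, which matches the "local history" framing: the wrapper only ever adds lines.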
The query surface
A new xtask slow-tests subcommand reads the log and surfaces
anomalies:
xtask slow-tests --top 20 # slowest tests overall
xtask slow-tests --branch current # slowest on current branch
xtask slow-tests --since 7d # slowest in the last week
xtask slow-tests --regressed-since 7d # got slower vs prior baseline
xtask slow-tests --test <name> --history # timing history for one test
The --regressed-since query is the load-bearing one. It catches
the case where a test was fast yesterday and is slow today —
exactly the signal that could have shipped #383 weeks earlier than it
did.
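The simpler queries, such as --top and --branch, reduce to a filter and a sort over the recorded samples. The following is a sketch under the assumption that the log has already been parsed into memory; Sample and slowest are hypothetical names, not part of any existing xtask.

```rust
/// Minimal in-memory view of one log record (only the fields these queries need).
#[derive(Debug, Clone, PartialEq)]
pub struct Sample {
    pub test: String,
    pub branch: String,
    pub duration_ms: u64,
}

/// Return the `n` slowest recordings, slowest first.
/// When `branch` is `Some`, only recordings from that branch are considered.
pub fn slowest<'a>(samples: &'a [Sample], branch: Option<&str>, n: usize) -> Vec<&'a Sample> {
    let mut picked: Vec<&Sample> = samples
        .iter()
        .filter(|s| branch.map_or(true, |b| s.branch == b))
        .collect();
    // Stable sort: recordings with equal durations keep their log order.
    picked.sort_by(|a, b| b.duration_ms.cmp(&a.duration_ms));
    picked.truncate(n);
    picked
}
```

This ranks individual recordings rather than tests, so one slow test can occupy several of the top slots; grouping by test name first would be a reasonable alternative.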
Why this matters
Cost surfacing is the only sustainable defense against test-loop
decay. Optimizing once is not enough; you have to notice when it
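starts decaying again.
Without a visibility layer, the next slow test enters as
background friction and only escapes at rage-fuel volume.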
Cost: one JSONL file per repo + a few hundred lines of xtask
code. Lower than the wins it protects.
Out of scope here
A CI pipeline that fails on regression. Local visibility first.
CI gating is the natural step after a useful baseline exists.
A flamegraph / span tracer for slow tests. Different tool; this
card is about discovery, not root-cause attribution.
Trigger / acceptance
Resolve this card when:
cargo test runs in echo produce / update target/test-timing-history.jsonl automatically (via wrapper or libtest reporter).
node --test runs in jedit do the same.
xtask slow-tests --top 20 returns the 20 slowest test
recordings, sorted by duration.
xtask slow-tests --regressed-since 7d returns tests whose
median duration over the past 7d exceeded their prior-7d median
by some configurable threshold.
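The --regressed-since criterion can be sketched as a comparison of two adjacent windows. This sketch assumes timestamps are already parsed to Unix seconds and that the threshold is a relative fraction (0.5 means "more than 50% slower"); the card does not specify either. TimedRun, Regression, and regressed_since are hypothetical names.

```rust
use std::collections::BTreeMap;

/// One timing sample with its timestamp already parsed to Unix seconds.
pub struct TimedRun {
    pub test: String,
    pub ts_secs: u64,
    pub duration_ms: u64,
}

/// A test whose recent median exceeded its prior median by more than the threshold.
#[derive(Debug, PartialEq)]
pub struct Regression {
    pub test: String,
    pub prior_median_ms: u64,
    pub recent_median_ms: u64,
}

/// Median of the values (mean of the two middle values for even counts).
fn median(values: &mut [u64]) -> Option<u64> {
    if values.is_empty() {
        return None;
    }
    values.sort_unstable();
    let mid = values.len() / 2;
    Some(if values.len() % 2 == 1 {
        values[mid]
    } else {
        (values[mid - 1] + values[mid]) / 2
    })
}

/// Compare the window `[now - window, now]` against `[now - 2*window, now - window)`.
/// Tests with no samples in either window are skipped, since there is no baseline.
pub fn regressed_since(
    runs: &[TimedRun],
    now_secs: u64,
    window_secs: u64,
    threshold: f64,
) -> Vec<Regression> {
    let recent_start = now_secs.saturating_sub(window_secs);
    let prior_start = recent_start.saturating_sub(window_secs);
    // test name -> (prior-window durations, recent-window durations)
    let mut buckets: BTreeMap<&str, (Vec<u64>, Vec<u64>)> = BTreeMap::new();
    for run in runs {
        if run.ts_secs > now_secs || run.ts_secs < prior_start {
            continue;
        }
        let entry = buckets.entry(run.test.as_str()).or_default();
        if run.ts_secs >= recent_start {
            entry.1.push(run.duration_ms);
        } else {
            entry.0.push(run.duration_ms);
        }
    }
    let mut out = Vec::new();
    for (test, (mut prior, mut recent)) in buckets {
        if let (Some(p), Some(r)) = (median(&mut prior), median(&mut recent)) {
            if (r as f64) > (p as f64) * (1.0 + threshold) {
                out.push(Regression {
                    test: test.to_string(),
                    prior_median_ms: p,
                    recent_median_ms: r,
                });
            }
        }
    }
    out
}
```

Using medians rather than means keeps a single noisy run from triggering a report, and skipping tests without a prior baseline avoids flagging brand-new tests as regressions.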
Companion
docs/method/backlog/cool-ideas/PLATFORM_wesley-gen-test-loop-speedup.md
— this card is the regression-defense layer for that speedup work.
The speedup made the loop fast; this card keeps it that way.