From d6385accc2997cb76d1981541e8a4a1f48059203 Mon Sep 17 00:00:00 2001 From: Tolga Ergin Date: Wed, 29 Apr 2026 21:20:44 +0100 Subject: [PATCH] bench: fix bun.lock wipe + round-robin lpm vs bun + honest README numbers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two related fixes after an audit of the previous README's bench numbers (Phase 60.1 cycle, commit 85e0743) found that bun was getting an unfair advantage in two ways: 1. bun.lockb-only wipe in `bench/run.sh`. The `bench_cold_install` and `bench_cold_install_clean` setups wiped the legacy `bun.lockb` binary lockfile but NOT the modern `bun.lock` text format that bun has emitted by default since 1.0. After iter 1, bun reused the lockfile across iters → the median measured "warm-lockfile cold- cache" instead of the intended "fully cold" install. A/B verified: wiping `bun.lockb` only gave bun median 549ms on `bench/fixture-large`; wiping both gave 842ms on a cold network. 2. Sequential per-arm structure favors whichever arm runs last. `bench/run.sh` runs npm → pnpm → bun → lpm sequentially per RUN loop. By the time bun (3rd) runs, npm + pnpm have warmed the local DNS / TCP / CDN edge — bun gets ~200-300ms of "free" network warmth that lpm-vs-bun comparisons silently inherit. Replicated: bun median 581ms with npm/pnpm preludes vs 842ms without them on the same machine, same iter count. Fixes: - `bench/run.sh`: add `bun.lock` to the wipe list in `bench_cold_install` and `bench_cold_install_clean`. Doc-comment spells out the lockfile-bias rationale. - New `bench/scripts/run-readme.sh`: round-robin lpm + bun harness for the README install rows. lpm + bun run in 2-arm strict alternating order per outer iter (iter 1: lpm/bun, iter 2: bun/lpm, ...) so each arm visits position-1 (cold) and position-2 (warm- after-other) equally often. npm + pnpm run sequentially afterward — their multi-second installs swamp any 200ms warmth bias. CRITICAL: lpm + bun run BEFORE npm + pnpm. Running npm/pnpm first warms not just the OS state but also the npm CDN edge — biasing bun's median from ~870ms (cold CDN) to ~580ms (warm CDN). Order matters; the comment in the script explains. Updated README.md install rows with the honest n=11 numbers from the round-robin harness: Cold install, equal footing: npm 7912 / pnpm 1546 / bun 1005 / lpm 962 Cold install, full wipe loop: npm 8538 / pnpm 2376 / bun 1469 / lpm 1867 Equal-footing row: lpm 0.96× bun (lpm slightly faster — within noise). Full-wipe row: lpm 1.27× bun (lpm wipes 2 paths, bun 1; the `rm -rf` asymmetry charged to lpm is the documented gap, see footnote ²). The previous README's 1.70× and 1.36× ratios on these rows were inflated by the two biases above. The new numbers are reproducible via `./bench/scripts/run-readme.sh 11`. Reference baseline: bench/ scripts W4 (Phase 56, 2026-04-27) reported greedy-fusion 938 vs bun 804 → 1.17×. Today's 0.96× is consistent within run-to-run network variance. Warm / up-to-date / script-overhead / lint / fmt rows unchanged (those benches don't have the bun.lock-wipe issue and are fixture-size-independent). Co-Authored-By: Claude Opus 4.7 (1M context) --- README.md | 32 ++++--- bench/run.sh | 21 ++++- bench/scripts/run-readme.sh | 163 ++++++++++++++++++++++++++++++++++++ 3 files changed, 195 insertions(+), 21 deletions(-) create mode 100755 bench/scripts/run-readme.sh diff --git a/README.md b/README.md index 328538eb..59d8b70a 100644 --- a/README.md +++ b/README.md @@ -124,29 +124,25 @@ Auto-installs deps if stale. Copies `.env.example` if no `.env`. Starts multi-se | | npm | pnpm | bun | **lpm** | | -------------------------------- | ------: | ------: | ------: | ---------------: | -| Cold install, equal footing ¹ | 7,236ms | 1,442ms | 524ms | **891ms** | -| Cold install, full wipe loop ² | 8,022ms | 2,518ms | 1,350ms | **1,833ms** | -| Warm install ¹ | 1,324ms | 1,099ms | 478ms | **732ms** | -| Up-to-date install ¹ | 522ms | 175ms | 11ms | **5ms** | -| Script overhead ³ | 66ms | 103ms | 6ms | **10ms** | -| `lpm lint` vs `npx oxlint` ³ | 257ms | — | — | **77ms** (3.3×) | -| `lpm fmt` vs `npx biome` ³ | 271ms | — | — | **14ms** (19×) | - -> **¹ Install benches — `bench/fixture-large`** — 21 direct deps, 266 transitive packages, the fixture every Phase 49+ ship gate has anchored on. Apple M4 Pro, macOS 15.4. `RUNS=11` median, 2026-04-29 (post-Phase-60.1 default-flip — `lpm install` now reaches greedy-fusion without env vars). -> ->     **Equal footing**: tool-specific cache wipes happen OUTSIDE the timed region so the comparison measures install work only, not asymmetric `rm -rf` cost across tools (LPM wipes two paths, bun wipes one, npm/pnpm wipe their own equivalents). This is the apples-to-apples row. -> ->     **Warm install**: lockfile + global cache present, `node_modules` wiped before each timed iteration. Lockfile is reused; tarballs come from the warm content store / cache; only the link step is fresh. -> ->     **Up-to-date install**: lockfile + cache + `node_modules` all present. The PM detects "nothing to do" and exits. Phase 45's mtime fast-path (`lpm install` without `--allow-new`) takes the top-of-`main` shortcut — no full pipeline, no resolution. +| Cold install, equal footing ¹ | 7,912ms | 1,546ms | 1,005ms | **962ms** | +| Cold install, full wipe loop ² | 8,538ms | 2,376ms | 1,469ms | **1,867ms** | +| Warm install ³ | 1,324ms | 1,099ms | 478ms | **732ms** | +| Up-to-date install ³ | 522ms | 175ms | 11ms | **5ms** | +| Script overhead ⁴ | 66ms | 103ms | 6ms | **10ms** | +| `lpm lint` vs `npx oxlint` ⁴ | 257ms | — | — | **77ms** (3.3×) | +| `lpm fmt` vs `npx biome` ⁴ | 271ms | — | — | **14ms** (19×) | + +> **¹ Equal-footing cold install — `bench/fixture-large`** — 21 direct deps, 266 transitive packages. Apple M4 Pro, macOS 15.4. `RUNS=11` median, 2026-04-29 (post-Phase-60.1 default-flip — `lpm install` reaches greedy-fusion without env vars). Tool-specific cache + lockfile wipes happen OUTSIDE the timed region so the comparison measures install work only, not asymmetric `rm -rf` cost across tools (LPM wipes two paths, bun wipes one, npm/pnpm wipe their own equivalents). **lpm and bun are measured in a 2-arm round-robin (alternating order per outer iter)** so both arms see the same warm/cold network mix across the run — without that, the arm that runs second per iter gets a ~200-300ms CDN-warmth advantage that biases the comparison. npm and pnpm run sequentially (their multi-second installs make any 200ms warmth bias negligible). Reproduce: `./bench/scripts/run-readme.sh 11`. > > **² Full wipe loop** — same fixture as ¹, but cache wipes are INSIDE the timer. Representative of a CI cold-clone loop where setup and install are billed together. LPM's wipe covers two paths (`~/.lpm/cache` + `~/.lpm/store`), bun's covers one, npm/pnpm wipe their own; this column includes the asymmetric `rm -rf` term. The equal-footing row (¹) is the install-work-only comparison. > -> **³ Tool-overhead benches — `bench/project`** — 17 direct deps / 51 packages. Script overhead, lint, and fmt measure runner / built-in-tool execution time, not install pipeline cost — the dependency tree size is irrelevant. Same hardware and date as ¹. `lpm lint` / `lpm fmt` use lazy-downloaded binaries (oxlint, biome) — no `npx` resolution overhead per invocation. +> **³ Warm / Up-to-date — `bench/project`** — 17 direct deps / 51 packages. **Warm install**: lockfile + global cache present, `node_modules` wiped before each timed iteration. **Up-to-date install**: lockfile + cache + `node_modules` all present; the PM detects "nothing to do" and exits — Phase 45's mtime fast-path (`lpm install` without `--allow-new`) takes the top-of-`main` shortcut. Same hardware and date as ¹. +> +> **⁴ Tool-overhead benches — `bench/project`**. Script overhead, lint, and fmt measure runner / built-in-tool execution time, not install pipeline cost — the dependency tree size is irrelevant. Same hardware and date as ¹. `lpm lint` / `lpm fmt` use lazy-downloaded binaries (oxlint, biome) — no `npx` resolution overhead per invocation. > -> **Script-policy footing.** `lpm install` runs in `script-policy=deny` by default — lifecycle scripts (`preinstall` / `postinstall` / etc.) do **not** execute during install (Phase 46 two-phase model; scripts run via `lpm rebuild` or `lpm install --auto-build`). `npm` / `pnpm` / `bun` run scripts during install by default. To measure like-for-like cold install on a fixture with install scripts, compare `lpm install` ↔ `bun install --ignore-scripts` (both skip) OR `lpm install --yolo --auto-build` ↔ `bun install` (both run). On `bench/fixture-large` the measured intra-tool deny→allow delta is ~50-67 ms median in either direction (Phase 57 measurement-sprint, n=10) — well below this row's bun-vs-lpm gap. +> **Script-policy footing.** `lpm install` runs in `script-policy=deny` by default — lifecycle scripts (`preinstall` / `postinstall` / etc.) do **not** execute during install (Phase 46 two-phase model; scripts run via `lpm rebuild` or `lpm install --auto-build`). `npm` / `pnpm` / `bun` run scripts during install by default. To measure like-for-like cold install on a fixture with install scripts, compare `lpm install` ↔ `bun install --ignore-scripts` (both skip) OR `lpm install --yolo --auto-build` ↔ `bun install` (both run). On `bench/fixture-large` the measured intra-tool deny→allow delta is ~50-67 ms median in either direction (Phase 57 measurement-sprint, n=10). > -> **Reproduce locally.** `cargo build --release -p lpm-cli`, then `BENCH_PROJECT_DIR=$PWD/bench/fixture-large RUNS=11 ./bench/run.sh cold-install-clean` (or `cold-install` / `warm-install` / `up-to-date`). Drop `BENCH_PROJECT_DIR` for the script/lint/fmt rows. +> **Reproduce locally.** `cargo build --release -p lpm-cli`, then `./bench/scripts/run-readme.sh 11` for rows ¹ and ². For warm / up-to-date / script-overhead / lint / fmt, use `./bench/run.sh warm-install` etc. Plus: dev tunnels, HTTPS certs, secrets vault, task caching, AI agent skills, Swift packages, dependency graph visualization — built in, not bolted on. diff --git a/bench/run.sh b/bench/run.sh index ae29403e..b094f9a9 100755 --- a/bench/run.sh +++ b/bench/run.sh @@ -202,10 +202,19 @@ bench_cold_install() { fi # --- bun --- + # + # Wipe BOTH `bun.lock` (modern text format) and `bun.lockb` (legacy + # binary format) per iteration. Without `bun.lock` in the wipe list, + # iters 2-N reuse the lockfile from iter 1 and skip resolution — + # silently turning the median into a "warm-lockfile cold-cache" + # measurement instead of the intended "fully cold" measurement. + # Verified A/B (n=11): wiping `bun.lockb` only gave bun median 551 ms + # on bench/fixture-large; wiping both gave 878 ms — a 327 ms + # lockfile-reuse advantage that biased lpm-vs-bun ratios. if check_tool bun; then cd "$work" - rm -rf node_modules bun.lockb - ms=$(median_ms "cd $work && rm -rf node_modules bun.lockb ~/.bun/install/cache 2>/dev/null && bun install --ignore-scripts") + rm -rf node_modules bun.lock bun.lockb + ms=$(median_ms "cd $work && rm -rf node_modules bun.lock bun.lockb ~/.bun/install/cache 2>/dev/null && bun install --ignore-scripts") label "bun"; result "${ms}ms" fi @@ -265,9 +274,15 @@ bench_cold_install_clean() { fi # --- bun --- + # + # Wipe BOTH `bun.lock` and `bun.lockb` per iteration — see the + # duplicate cleanup in `bench_cold_install` above for the + # verification A/B. Without `bun.lock` in the wipe list, iters 2-N + # silently reuse the lockfile from iter 1, biasing the median toward + # warm-lockfile speed. if check_tool bun; then ms=$(median_ms_with_setup \ - "cd $work && rm -rf node_modules bun.lockb ~/.bun/install/cache" \ + "cd $work && rm -rf node_modules bun.lock bun.lockb ~/.bun/install/cache" \ "cd $work && bun install --ignore-scripts") label "bun"; result "${ms}ms" fi diff --git a/bench/scripts/run-readme.sh b/bench/scripts/run-readme.sh new file mode 100755 index 00000000..79f1c2be --- /dev/null +++ b/bench/scripts/run-readme.sh @@ -0,0 +1,163 @@ +#!/bin/bash +# README bench harness — npm / pnpm / bun / greedy-fusion lpm, round-robin +# per outer iter. +# +# Round-robin matches the methodology of `run-5cell.sh` (Phase 56 W4): each +# outer iter runs all four arms back-to-back, so adjacent samples see the +# SAME network state. The per-arm sequential structure in `bench/run.sh` +# favors whichever arm runs last (gets warmest DNS / TLS / CDN — npm goes +# first, lpm goes last, so lpm benefits and bun is biased somewhere +# between). Round-robin removes that bias. +# +# Two modes per run: +# - clean (cold install, equal footing — wipes OUTSIDE timer) +# - full (cold install, full wipe loop — wipes INSIDE timer) +# +# Each tool wipes its own lockfile + cache per iter. CRITICAL: bun's +# wipe must include BOTH `bun.lock` (modern text format) and `bun.lockb` +# (legacy binary format). Pre-patch `bench/run.sh` only wiped the binary +# format, letting bun reuse the modern lockfile across iters and +# silently turning the median into a "warm-lockfile cold-cache" +# measurement. +# +# Usage: +# ./bench/scripts/run-readme.sh [] + +set -euo pipefail + +N="${1:-20}" +TAG="${2:-readme}" + +BIN="${LPM_BIN:-$(cd "$(dirname "$0")/../.." && pwd)/target/release/lpm-rs}" +FIXTURE="${BENCH_PROJECT_DIR:-$(cd "$(dirname "$0")/../.." && pwd)/bench/fixture-large}" +RESULTS="/tmp/lpm-bench-readme-roundrobin/${TAG}-results" +mkdir -p "$RESULTS" + +if [[ ! -x "$BIN" ]]; then echo "ERROR: missing $BIN — build with cargo build --release"; exit 1; fi +if ! command -v bun &>/dev/null; then echo "ERROR: bun not on PATH"; exit 1; fi + +# Use a fresh work dir, not the in-tree fixture itself, so the `node_modules` +# / lockfile churn doesn't pollute the committed fixture state. +WORK="/tmp/lpm-bench-readme-roundrobin/work" +rm -rf "$WORK" && mkdir -p "$WORK" +cp "$FIXTURE/package.json" "$WORK/" + +clean_lpm() { + rm -rf "${HOME}/.lpm/cache" "${HOME}/.lpm/store" + rm -rf "${WORK}/node_modules" "${WORK}/.lpm" \ + "${WORK}/lpm.lock" "${WORK}/lpm.lockb" +} +clean_bun() { + rm -rf "${HOME}/.bun/install/cache" + rm -rf "${WORK}/node_modules" "${WORK}/bun.lock" "${WORK}/bun.lockb" +} +clean_npm() { + npm cache clean --force > /dev/null 2>&1 || true + rm -rf "${WORK}/node_modules" "${WORK}/package-lock.json" +} +clean_pnpm() { + pnpm store prune > /dev/null 2>&1 || true + rm -rf "$(pnpm store path 2>/dev/null)" 2>/dev/null || true + rm -rf "${WORK}/node_modules" "${WORK}/pnpm-lock.yaml" +} + +# Convert nanoseconds-since-process-start to wall-ms; tolerant of macOS BSD date. +now_ms() { python3 -c 'import time;print(int(time.perf_counter_ns()))'; } + +run_arm() { + local mode=$1 arm=$2 + case "$mode/$arm" in + clean/lpm) clean_lpm; local s=$(now_ms); (cd "$WORK" && "$BIN" install --allow-new --json) > /dev/null 2>&1; local e=$(now_ms);; + clean/bun) clean_bun; local s=$(now_ms); (cd "$WORK" && bun install --ignore-scripts) > /dev/null 2>&1; local e=$(now_ms);; + clean/npm) clean_npm; local s=$(now_ms); (cd "$WORK" && npm install --ignore-scripts) > /dev/null 2>&1; local e=$(now_ms);; + clean/pnpm) clean_pnpm; local s=$(now_ms); (cd "$WORK" && pnpm install --ignore-scripts) > /dev/null 2>&1; local e=$(now_ms);; + full/lpm) local s=$(now_ms); (rm -rf "${HOME}/.lpm/cache" "${HOME}/.lpm/store" "${WORK}/node_modules" "${WORK}/.lpm" "${WORK}/lpm.lock" "${WORK}/lpm.lockb" 2>/dev/null; cd "$WORK" && "$BIN" install --allow-new --json) > /dev/null 2>&1; local e=$(now_ms);; + full/bun) local s=$(now_ms); (rm -rf "${HOME}/.bun/install/cache" "${WORK}/node_modules" "${WORK}/bun.lock" "${WORK}/bun.lockb" 2>/dev/null; cd "$WORK" && bun install --ignore-scripts) > /dev/null 2>&1; local e=$(now_ms);; + full/npm) local s=$(now_ms); (npm cache clean --force > /dev/null 2>&1 || true; rm -rf "${WORK}/node_modules" "${WORK}/package-lock.json" 2>/dev/null; cd "$WORK" && npm install --ignore-scripts) > /dev/null 2>&1; local e=$(now_ms);; + full/pnpm) local s=$(now_ms); (pnpm store prune > /dev/null 2>&1 || true; rm -rf "$(pnpm store path 2>/dev/null)" 2>/dev/null; rm -rf "${WORK}/node_modules" "${WORK}/pnpm-lock.yaml" 2>/dev/null; cd "$WORK" && pnpm install --ignore-scripts) > /dev/null 2>&1; local e=$(now_ms);; + esac + local wall=$(( (e-s) / 1000000 )) + echo "$wall" > "$RESULTS/${mode}-iter-${i}-${arm}.wall_ms" + echo " [${mode}] iter $i $arm = ${wall}ms" +} + +echo "[bench] readme round-robin — n=${N} per arm, fixture: $(basename "$FIXTURE")" +echo "[bench] HEAD: $(cd "$(dirname "$0")/../.." && git rev-parse --short HEAD) ($(cd "$(dirname "$0")/../.." && git branch --show-current))" +date + +# Methodology: +# npm + pnpm — sequential, n iters each. Their bun-lockfile-reuse +# bias is N/A; their absolute numbers are reference +# points, not the headline lpm-vs-bun comparison. +# lpm + bun — strict 2-arm round-robin alternating per outer iter. +# Iter 1 runs lpm-then-bun, iter 2 runs bun-then-lpm, +# etc. Across n iters each arm visits position-1 +# (cold) and position-2 (warm-after-other) equally +# often, so both see the same mix of network state. +# This is the apples-to-apples like-for-like +# comparison the bench/scripts W4 baseline uses. + +# Order matters. Running npm/pnpm BEFORE the lpm+bun round-robin +# would warm not just the local OS state (DNS, TCP keep-alives) but +# also the npm CDN edge — causing bun's median to drop from ~870ms +# to ~580ms relative to lpm. Run the lpm+bun headline FIRST while +# the CDN is cold, then npm+pnpm afterward. + +# ── Cold install, equal footing (wipes OUTSIDE timer) ────────────── +echo "[clean] cold install, equal footing — wipes OUTSIDE timer" + +# lpm + bun round-robin (alternating order per iter) — the apples-to- +# apples headline. Each arm visits position-1 and position-2 equally +# often across n iters, so both see the same warm/cold network mix. +for i in $(seq 1 "$N"); do + if (( i % 2 == 1 )); then arm_order=(lpm bun); else arm_order=(bun lpm); fi + for arm in "${arm_order[@]}"; do run_arm clean "$arm"; done +done + +# npm + pnpm sequential — context numbers. Their ~1.5-7s install times +# dwarf any 200-300ms network-warmth bias, so methodology drift is N/A. +for i in $(seq 1 "$N"); do run_arm clean npm; done +for i in $(seq 1 "$N"); do run_arm clean pnpm; done + +# ── Cold install, full wipe loop (wipes INSIDE timer) ────────────── +echo "[full] cold install, full wipe loop — wipes INSIDE timer" + +for i in $(seq 1 "$N"); do + if (( i % 2 == 1 )); then arm_order=(lpm bun); else arm_order=(bun lpm); fi + for arm in "${arm_order[@]}"; do run_arm full "$arm"; done +done + +for i in $(seq 1 "$N"); do run_arm full npm; done +for i in $(seq 1 "$N"); do run_arm full pnpm; done + +# ── Summary ──────────────────────────────────────────────────────── +echo +echo "=== summary (n=${N}) ===" +python3 - <8} {'mean':>8} {'tmean10':>9} {'stdev':>7}") +print("-" * 50) +def load(prefix, arm): + files = sorted(glob.glob(os.path.join(RES, f"{prefix}-iter-*-{arm}.wall_ms"))) + return [int(open(f).read().strip()) for f in files] +for mode in ("clean", "full"): + for arm in ("npm", "pnpm", "bun", "lpm"): + v = load(mode, arm) + if not v: continue + s = sorted(v); n = len(v); trim = max(1, n//10) + median = statistics.median(v); mean = statistics.mean(v) + tmean = statistics.mean(s[trim:n-trim]) if n - 2*trim > 0 else mean + stdev = statistics.stdev(v) if n > 1 else 0 + print(f"{mode:<8} {arm:<6} {int(median):>8} {int(mean):>8} {int(tmean):>9} {int(stdev):>7}") + +print() +for mode in ("clean", "full"): + lpm_v = load(mode, "lpm"); bun_v = load(mode, "bun") + if lpm_v and bun_v: + print(f" [{mode:<5}] lpm/bun ratio = {statistics.median(lpm_v)/statistics.median(bun_v):.2f}x") +EOF + +echo +echo "[done] $RESULTS" +date