Skip to content

fix: prevent repeated failed rebundle attempts and TOCTOU on cache fallback#878

Merged
stack72 merged 1 commit intomainfrom
fix/bundle-cache-rebundle-perf
Mar 26, 2026
Merged

fix: prevent repeated failed rebundle attempts and TOCTOU on cache fallback#878
stack72 merged 1 commit intomainfrom
fix/bundle-cache-rebundle-perf

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Mar 26, 2026

Summary

  • When a rebundle fails for pulled extensions with bare specifiers, touch the cached bundle's mtime so subsequent loads short-circuit at the freshness check instead of re-invoking the Deno bundler on every cold start
  • Fix a pre-existing TOCTOU race where the cache file could be deleted between the initial Deno.stat() and the fallback Deno.readTextFile()
  • Applied consistently to all 4 extension loaders: models, vaults, drivers, and datastores

Problem

PR #876 changed extension loaders to attempt rebundling before falling back to cached bundles, fixing a bug where stale caches were served even when source files had changed. However, CI review identified a performance regression:

For pulled extensions with bare specifiers (e.g. from "zod" resolved via a project's deno.json), repos without the import map can't rebundle. The old code detected bare specifiers upfront and returned the cache immediately. The new code attempts bundleExtension(), which invokes the Deno bundler, waits for it to fail, then falls back — adding seconds of latency per extension on every load.

This is because the cache file's mtime was never updated after a fallback, so the next mtime check would see the cache as stale again and trigger the same failed rebundle cycle.

User Impact

Users with many pulled extensions using bare specifiers (e.g. @swamp/hetzner-cloud with 11 models) would see significantly slower cold starts — the Deno bundler would be invoked and fail 11 times on every swamp command, adding potentially 10+ seconds of startup latency.

Fix

Mtime touch: After falling back to the cached bundle on rebundle failure, Deno.utime() touches the cache file's mtime. The next load sees the cache as "fresh" and returns it immediately at the mtime check — no bundler invocation.

  • First load with stale bare-specifier bundle: tries rebundle → fails → falls back → touches cache (one-time cost)
  • Subsequent loads: mtime check sees cache as fresh → returns immediately (no cost)
  • If source changes again later: mtime check detects newer source → tries rebundle again

TOCTOU fix: The bundleExists flag was set via Deno.stat() early in the method, but the fallback Deno.readTextFile() runs in a later catch block. If the bundle is deleted between the stat and the read (e.g. concurrent garbage collection), readTextFile would throw unhandled. The fallback read is now wrapped in a try/catch — if the file is gone, it falls through to throw the original bundleError.

Test plan

  • deno check passes on all 4 loaders
  • deno lint passes on all 4 loaders
  • deno fmt passes
  • 3562 unit tests pass
  • Verified pulled extensions with bare specifiers still load (@swamp/hetzner-cloud — 11 models in a repo without deno.json)
  • Verified stale bundles are replaced when source changes and deno.json is present (local dev scenario)

🤖 Generated with Claude Code

…llback

When a rebundle fails (e.g. pulled extensions with bare specifiers and no
deno.json import map), the fallback to the cached bundle left the cache
mtime unchanged. This meant every subsequent cold start would re-detect
the bundle as stale, invoke the Deno bundler, wait for it to fail, then
fall back again — adding seconds of latency per extension on every load.

Touch the cache file's mtime after a successful fallback so the next
mtime check sees it as fresh and short-circuits immediately.

Also fix a pre-existing TOCTOU race: bundleExists was set via Deno.stat()
early in the method, but the fallback Deno.readTextFile() runs later in a
catch block. If the bundle is deleted between the stat and the read (e.g.
concurrent GC), readTextFile would throw unhandled. Wrap the fallback read
in a try/catch so deletion between stat and read falls through to the
original bundleError instead of crashing.

Applied consistently to all 4 extension loaders: models, vaults, drivers,
and datastores.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None.

Medium

None.

Low

  1. Code duplication across 4 loaders — The identical 11-line catch block is now copy-pasted in user_model_loader.ts:474-492, user_vault_loader.ts:257-270, user_driver_loader.ts:250-270, and user_datastore_loader.ts:252-270. This is pre-existing duplication (the method was already copy-pasted), so it's not introduced by this PR, but worth noting that a future refactor to extract bundleWithCache into a shared utility would reduce the surface area for divergence bugs.

  2. No new test coverage for the TOCTOU path or the utime touch — The existing tests don't exercise the scenario where readTextFile fails after stat succeeded (TOCTOU), or verify that utime is called after a fallback. These are defensive paths that are hard to unit-test without mocking Deno APIs, so this is informational rather than blocking.

Verdict

PASS — The changes are correct and well-reasoned. The read-before-utime ordering ensures the cached content is captured even if the file disappears before the mtime touch. The inner try/catch around Deno.utime correctly treats touch failure as non-fatal. The outer try/catch around readTextFile correctly falls through to throw bundleError when the cache is gone (TOCTOU). No security, data integrity, or logic concerns.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. Duplicated logic across 4 loaders: The cache fallback + mtime touch + TOCTOU guard is now a ~15-line block duplicated identically in all 4 files. Consider extracting a shared helper (e.g. fallbackToCachedBundle(bundlePath, relativePath, logger)) to reduce duplication. Not blocking since the current approach is consistent and correct.

  2. Missing user_driver_loader_test.ts: The driver loader is the only one of the 4 without a test file. Pre-existing, but worth noting as a gap.

  3. No unit tests for the new utime/TOCTOU paths: The mtime touch and TOCTOU catch are defensive and hard to test without filesystem stubs, so this is understandable — but if the project ever adds filesystem abstraction, these would be good candidates.

Overall: well-scoped fix with clear comments, correct error handling, and no security concerns. The approach of touching mtime to short-circuit subsequent loads is sound, and the TOCTOU fix properly handles the race condition by falling through to throw the original error.

@stack72 stack72 merged commit 18a9b7a into main Mar 26, 2026
10 checks passed
@stack72 stack72 deleted the fix/bundle-cache-rebundle-perf branch March 26, 2026 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant