
perf(cache): passthrough uncached source files instead of tar-packing #127

Merged
raphaelvigee merged 2 commits into master from perf/fs-file-passthrough-no-pack
Jun 25, 2026

@raphaelvigee commented Jun 25, 2026

Problem

CI stat: //@heph/fs:file@f=mgmt/go/go.mod cache write 14s — a single tiny source file, uncached, taking 14s on the "cache write" phase.

Root cause: an uncached @heph/fs:file target (one per source file) tar-packs its single source file into the in-memory tmp cache during cache write. The pack does a synchronous file read inline on the tokio worker pool (block_or_inline is inline on Linux). At CI scale, thousands of these saturate the disk and starve the async runtime, so each tiny read stalls for seconds. The work is also pure waste — the artifact just re-exposes an immutable workspace file.

Fix

Produced outputs now travel the result pipeline as a new ResultArtifact { content: Arc<dyn Content>, group, r#type } instead of Vec<CacheArtifact>. A producer sets ContentFile.passthrough = true when source_path is a durable workspace file; execute_and_cache_inner partitions those outputs out of cache_locally entirely and carries them as their raw OutputArtifact — which already implements Content, walking to the single source file at its out_path.
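
As a rough illustration of the partition described above, the sketch below routes flagged outputs around the packing step and wraps both kinds in one `ResultArtifact`. Everything except the names `ResultArtifact`, `OutputArtifact`, `CacheArtifact`, `Content`, `passthrough`, `source_path`, `group` and `r#type` is an assumption: the trait method `kind`, the helper `to_results`, and the field types are hypothetical, and the `tmp` condition is left out for brevity.

```rust
use std::path::PathBuf;
use std::sync::Arc;

/// Assumed, heavily simplified shape of the engine's `Content` trait.
pub trait Content {
    /// Hypothetical label, used here only to tell the two carriers apart.
    fn kind(&self) -> &'static str;
}

/// Stand-in for a producer output backed by a file on disk.
pub struct OutputArtifact {
    pub source_path: PathBuf,
    pub passthrough: bool,
}

/// Stand-in for an artifact that was packed into the local cache.
pub struct CacheArtifact {
    pub packed_from: PathBuf,
}

impl Content for OutputArtifact {
    fn kind(&self) -> &'static str {
        "passthrough"
    }
}

impl Content for CacheArtifact {
    fn kind(&self) -> &'static str {
        "cached"
    }
}

/// Field names follow the PR text; the field types are assumptions.
pub struct ResultArtifact {
    pub content: Arc<dyn Content>,
    pub group: String,
    pub r#type: String,
}

/// Hypothetical helper: flagged outputs are carried as-is, the rest are "packed".
pub fn to_results(outputs: Vec<OutputArtifact>, group: &str) -> Vec<ResultArtifact> {
    // Split before any packing work so flagged outputs never touch the cache path.
    let (pass, pack): (Vec<OutputArtifact>, Vec<OutputArtifact>) =
        outputs.into_iter().partition(|o| o.passthrough);

    let mut results = Vec::new();
    for o in pass {
        // The raw output itself is the content: no read, no tar, no copy.
        let content: Arc<dyn Content> = Arc::new(o);
        results.push(ResultArtifact { content, group: group.to_string(), r#type: "output".to_string() });
    }
    for o in pack {
        // Placeholder for the real packing step, which this sketch does not model.
        let content: Arc<dyn Content> = Arc::new(CacheArtifact { packed_from: o.source_path });
        results.push(ResultArtifact { content, group: group.to_string(), r#type: "output".to_string() });
    }
    results
}
```

Because both carriers sit behind `Arc<dyn Content>`, a consumer in this sketch does not need to know whether an artifact was packed or passed through.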

Result: no file read, no tar, no copy, no manifest, no CacheArtifact, and no LocalCacheWrite span — so an @heph/fs:file no longer shows up as "cache write" at all. seekable_reader/file_path stay None, so the FUSE tar-index path is bypassed and consumers materialize via the generic unpack-from-walk() fallback.

Safety — gated two ways

  • Producer flag — other drivers' Content::File (pluginexec log, etc.) point into sandboxes cleaned after caching; a path ref would dangle. Only fs file/glob source artifacts set it.
  • tmp only — a cacheable revision must own a durable copy of its bytes, since source_path may change across runs.

The flag is Rust-only (no proto field yet), so out-of-process plugins always pack.
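
The two gates can be pictured as a single predicate. This is a minimal sketch, not the actual `is_passthrough` from the change: the `Content` enum shape, the `Bytes` variant, and the `CacheMode` type are assumptions made for illustration.

```rust
use std::path::PathBuf;

/// Assumed, simplified content variants; the real enum likely has more.
pub enum Content {
    File { source_path: PathBuf, passthrough: bool },
    /// Hypothetical non-file variant, present only to exercise the gate.
    Bytes(Vec<u8>),
}

/// Assumed cache mode of the producing target.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum CacheMode {
    Tmp,
    Cacheable,
}

/// Passthrough only when the target is tmp AND the producer flagged a file.
pub fn is_passthrough(mode: CacheMode, content: &Content) -> bool {
    mode == CacheMode::Tmp && matches!(content, Content::File { passthrough: true, .. })
}
```

Under these assumptions, a cacheable target, an unflagged file, and any non-file content all fall through to the regular packing path.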

Tests

  • is_passthrough_gates_on_tmp_and_producer_flag — gating: only tmp + flagged Content::File; cacheable / unflagged / non-file all pack.
  • passthrough_result_artifact_reads_source_without_cache — a passthrough ResultArtifact is never a CacheArtifact, carries no seekable/file_path, and walk() yields the source content at out_path.

Local: lint clean (clippy -D warnings + fmt); engine + builtins + plugin-abi suites green (322 tests incl. 2 new). Full tst was disk-bound on the dev box (No space left on device in plugingo-e2e codegen tests that build the whole Go stdlib) — unrelated to this change, which reduces cache writes.

🤖 Generated with Claude Code

@raphaelvigee force-pushed the perf/fs-file-passthrough-no-pack branch 2 times, most recently from e5091eb to 11d43f3 on June 25, 2026 07:50
An uncached `@heph/fs:file` target (one per source file) tar-packed its
single source file into the in-memory tmp cache on the "cache write" hot
path. The pack does a synchronous file read INLINE on the tokio worker
pool (`block_or_inline` is inline on Linux), so at CI scale thousands of
these saturate the disk and starve the runtime — a tiny go.mod "cache
write" was observed taking 14s. The work is also pure waste: the artifact
just re-exposes an immutable workspace file.

Carry produced outputs through the result pipeline as a new `ResultArtifact`
{ content: Arc<dyn Content>, group, r#type } instead of `Vec<CacheArtifact>`.
A producer sets `ContentFile.passthrough` when `source_path` is a durable
workspace file; `execute_and_cache_inner` then partitions such outputs out of
`cache_locally` entirely and carries them as their raw `OutputArtifact` (which
already implements `Content`, walking to the single source file at its
`out_path`). No file read, no tar, no copy, no manifest, no `CacheArtifact` —
and no LocalCacheWrite span, so it no longer shows up as "cache write" at all.
`seekable_reader`/`file_path` stay `None`, so the FUSE tar-index path is
bypassed and consumers materialize via the generic unpack-from-`walk()` path.

Gated two ways: the producer flag (other drivers' `Content::File` points into
sandboxes cleaned after caching, which would dangle) and `tmp` (a cacheable
revision must own a durable copy of its bytes, since `source_path` may change
across runs). The flag is Rust-only — it does not cross the plugin ABI yet, so
out-of-process plugins always pack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@raphaelvigee force-pushed the perf/fs-file-passthrough-no-pack branch from 11d43f3 to 20f033f on June 25, 2026 09:02
A passthrough source artifact (`@heph/fs:file`/`fs:glob`) is referenced by
path and read live on consume, never snapshotted into the cache. If the
workspace file is modified between when it was hashed (the value folded into
the target's `hashin` cache key) and when a consumer reads it, the live bytes
silently diverge from the cache key — poisoning every downstream entry.

Wrap the passthrough content in `PassthroughContent`, whose reader/walk tee
the bytes through a `VerifyingReader` that re-hashes as they stream (no extra
I/O — the consumer reads them anyway) and, at EOF, compares the digest against
the recorded `hashout`. Mismatch returns an explicit `InvalidData` error
naming the file, turning silent corruption into a hard failure. The hash is
byte-for-byte identical to `hwalk::file_hashout` (xxh3 over content + exec-bit
marker); a guard test pins them together.
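
A hash-while-streaming reader of this kind can be sketched with only the standard library. This is an illustration under stated assumptions, not the `VerifyingReader` from the change: it uses FNV-1a as a dependency-free stand-in for xxh3, omits the exec-bit marker, and the constructor signature and error message are hypothetical.

```rust
use std::io::{self, Read};

/// FNV-1a 64-bit, used here only as a stand-in for xxh3.
#[derive(Clone, Copy)]
pub struct Fnv1a(u64);

impl Fnv1a {
    pub fn new() -> Self {
        Fnv1a(0xcbf29ce484222325)
    }
    pub fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }
    pub fn finish(self) -> u64 {
        self.0
    }
}

/// One-shot digest, standing in for the hash recorded when the file was first hashed.
pub fn digest(bytes: &[u8]) -> u64 {
    let mut h = Fnv1a::new();
    h.update(bytes);
    h.finish()
}

/// Wraps a reader, hashes the bytes as they stream, and checks the digest at EOF.
pub struct VerifyingReader<R> {
    inner: R,
    hasher: Fnv1a,
    expected: u64,
    name: String,
}

impl<R: Read> VerifyingReader<R> {
    pub fn new(inner: R, expected: u64, name: impl Into<String>) -> Self {
        VerifyingReader { inner, hasher: Fnv1a::new(), expected, name: name.into() }
    }
}

impl<R: Read> Read for VerifyingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n > 0 {
            // Hash exactly the bytes handed to the consumer; no second pass over the file.
            self.hasher.update(&buf[..n]);
            return Ok(n);
        }
        // A zero-length read into a non-empty buffer means EOF: compare digests now.
        if !buf.is_empty() && self.hasher.finish() != self.expected {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("{}: content changed since it was hashed", self.name),
            ));
        }
        Ok(0)
    }
}
```

In this sketch a consumer that copies the stream to its destination gets either the full, verified bytes or an `InvalidData` error at the end of the copy, so a partially written output should be treated as invalid on error.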

`seekable_reader`/`file_path` stay `None`, so FUSE bails the slot to the
unpack-into-upper fallback (`io::copy` over `walk()`), routing FUSE through the
same verified copy as OS mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@raphaelvigee enabled auto-merge (squash) on June 25, 2026 21:24
@raphaelvigee merged commit e2254df into master on Jun 25, 2026
9 checks passed
@raphaelvigee deleted the perf/fs-file-passthrough-no-pack branch on June 25, 2026 21:33