Skip to content

feat(depot-client): add vfs staging cache ttl#4930

Closed
NathanFlurry wants to merge 1 commit intosqlite-soak/wire-optimization-flagsfrom
sqlite-soak/vfs-staging-cache-ttl
Closed

feat(depot-client): add vfs staging cache ttl#4930
NathanFlurry wants to merge 1 commit intosqlite-soak/wire-optimization-flagsfrom
sqlite-soak/vfs-staging-cache-ttl

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 4, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 4, 2026

Code Review: feat(depot-client): add vfs staging cache ttl

Overview

This PR converts the SQLite VFS page cache from a general-purpose second-level cache into a short-lived staging cache for speculative (prefetch/startup) pages. Target pages fetched for xRead are no longer retained after SQLite receives them; speculative pages are evicted on first target read with the TTL as a fallback. The design follows the spec in .agent/specs/sqlite-vfs-staging-cache-ttl.md and the approach is sound.


Issues

Dead code after the refactor

  • SqliteVfsPageCacheMode::caches_target_pages() is now unreachable. should_cache_page unconditionally returns false for PageCacheInsertKind::Target, so the mode variants Target, Startup, Prefetch, and All no longer gate target-page caching. The method should be removed or the docs should note the mode is now a kill switch for speculative pages only.

  • vfs_protected_cache_pages in SqliteOptimizationFlags is parsed from env, validated, and stored but VfsConfig::from_optimization_flags hardcodes protected_cache_pages: 0 regardless of its value. An operator setting RIVETKIT_SQLITE_OPT_VFS_PROTECTED_CACHE_PAGES will silently have no effect. Either remove the field/env constant or add a deprecation note.

committed_page_cache uses the same TTL as page_cache

Both are built with build_page_cache(config), so committed dirty pages and speculative prefetch pages share staging_cache_ttl_ms. Speculative pages that SQLite never requests should expire quickly; committed pages need to survive long enough for SQLite to re-read them if its internal pager evicts them. Consider building committed_page_cache without a TTL (or with a longer one) so a tight speculative TTL does not also cause committed-page eviction races.

cache_committed_page is a no-op when staging_cache_ttl_ms == 0

When TTL is 0, committed dirty pages are not staged anywhere. If SQLite pager evicts a recently-written page and re-issues an xRead, the VFS will incur a network round-trip for data the actor just wrote. The spec says staging_cache_ttl_ms = 0 disables speculative retention. It is worth deciding explicitly whether it should also disable committed-page staging, and documenting that decision with a test or comment.

Page-1 synthesis is duplicated

The empty-database page-1 synthesis now lives in two separate arms of resolve_pages:

  1. Success path: fetched.bytes.is_none() && commit_total == 0 && pgno == 1
  2. Error path: is_initial_main_page_missing(&error.message)

Both paths independently re-derive the same invariant. Extracting the guard into a shared helper would keep them in sync.

Unused _protected_page_cache parameter

The standalone cache_page function takes _protected_page_cache: &SccHashMap<u32, Vec<u8>> but never uses it. The parameter can be dropped from the signature entirely.


Test coverage gaps

The three new tests cover the main TTL/eviction matrix well. Still missing:

  • Committed-page read-back: Write data, commit, evict from SQLite pager, verify xRead returns the committed bytes without a network round-trip. This is the core correctness guarantee for committed_page_cache.
  • VFS_PAGE_CACHE_MODE=off still fetches lazily: Mentioned in the spec under Tests but not in the added suite.
  • TTL expiry with time control: The spec suggests tokio::time::pause / advance for deterministic TTL assertions.

Minor notes

  • The seed_page_as_cold_ref_for_harness_test fix is correct since cached_page now looks across all cache tiers. The updated error message from "strict VFS cache" to "VFS cache" correctly reflects this broader lookup.
  • The added tracing::error! in io_read for GetPagesError::Other is a welcome improvement; it surfaces page-fetch failures that were previously silent at the call site.
  • for pgno in requested_pages.iter().copied() cleanup is correct.

Summary

The staging-cache model is the right direction and the implementation matches the spec. Main items to address before merge: the shared TTL for committed vs. speculative pages, the silent no-op for committed pages when TTL=0, the dead caches_target_pages / vfs_protected_cache_pages code, and the missing committed-page read-back test.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 4, 2026

Code Review: feat(depot-client): add vfs staging cache ttl

Overview

This PR converts the SQLite VFS page cache from a persistent second-level pager into a short-lived staging cache. Target pages fetched for xRead are no longer retained in VFS memory; speculative (prefetch and startup) pages are cached with a configurable TTL (default 30 s) and evicted when SQLite first reads them. Post-commit dirty pages go into a separate committed_page_cache with the same TTL. The direction is sound. Retaining target pages permanently was redundant since SQLite's internal pager already owns them.

Issues

  1. CLAUDE.md violation: underscore-prefixed unused parameter

The module-level cache_page function accepts _protected_page_cache: &SccHashMap<u32, Vec> (vfs.rs:899) but never uses it. CLAUDE.md says to avoid backwards-compatibility hacks like renaming unused _vars. Remove the parameter from both the module-level function and the VfsState::cache_page call site that still passes &self.protected_page_cache.

  1. Dead field: protected_page_cache is never written to

Nothing writes into protected_page_cache after this PR. The eviction in evict_target_read_pages and the clear in invalidate_page_cache are therefore no-ops. Either remove the field entirely or document why it is kept.

  1. Double effective cache capacity

build_page_cache is called twice with the same config.cache_capacity_pages, so the total VFS memory ceiling is now 2x the configured limit. This is probably intentional but is undocumented and could surprise operators relying on VFS_PAGE_CACHE_CAPACITY_PAGES to bound memory.

  1. vfs_protected_cache_pages env var is silently ignored

SqliteOptimizationFlags still parses RIVETKIT_SQLITE_OPT_VFS_PROTECTED_CACHE_PAGES, but VfsConfig::from_optimization_flags unconditionally sets protected_cache_pages: 0 and never reads that field. Either deprecate the env var or emit a warning when it is non-zero.

Missing test coverage

The spec listed five tests; three are present. The two absent:

  • VFS_PAGE_CACHE_MODE=off still lazily fetches pages.
  • TTL=0 lazy target fetches work end-to-end. vfs_staging_cache_ttl_zero_disables_speculative_retention only checks state-level cache membership, not that a real resolve_pages / xRead round-trip succeeds.

There is also no test that dirty pages after a commit are readable from committed_page_cache without a depot round-trip.

Minor observations

  • unsafe zeroed in test: evicted_empty_page_one_can_be_synthesized_before_first_commit passes std::mem::zeroed() to VfsContext::new. A comment explaining what is zeroed and why it is safe would help.
  • Good fix: changing for pgno in requested_pages to for pgno in requested_pages.iter().copied() is a necessary correctness fix. The old form consumed the Vec, making requested_pages unavailable for the subsequent evict_target_read_pages call.
  • Good addition: the structured error log in io_read before mark_dead follows CLAUDE.md conventions (actor_id as a structured field, lowercase message).
  • Page-1 bootstrap: the two-site synthesize logic looks correct and is well-tested by evicted_empty_page_one_can_be_synthesized_before_first_commit.

Summary

The staging-cache direction is right. Before merging: remove the _protected_page_cache parameter, clean up or remove the dead protected_page_cache field, document or halve the doubled capacity, and either deprecate VFS_PROTECTED_CACHE_PAGES or emit a warning when it is set. Adding the two missing spec tests would also be valuable.

@NathanFlurry
Copy link
Copy Markdown
Member Author

Landed in main via stack-merge fast-forward push. Commits are in main; closing to match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant