Skip to content

fix(depot-client): bound sqlite vfs cache#4908

Closed
NathanFlurry wants to merge 1 commit intomainfrom
sqlite-soak/depot-vfs-cache-metrics
Closed

fix(depot-client): bound sqlite vfs cache#4908
NathanFlurry wants to merge 1 commit intomainfrom
sqlite-soak/depot-vfs-cache-metrics

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link
Copy Markdown

railway-app Bot commented May 3, 2026

🚅 Deployed to the rivet-pr-4908 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-cloud ❌ Build Failed (View Logs) Web May 3, 2026 at 10:48 pm
mcp-hub ✅ Success (View Logs) Web May 3, 2026 at 10:48 pm
frontend-inspector ❌ Build Failed (View Logs) Web May 3, 2026 at 10:48 pm
website ✅ Success (View Logs) Web May 3, 2026 at 10:47 pm
kitchen-sink ❌ Build Failed (View Logs) Web May 3, 2026 at 10:47 pm
ladle 🕒 Building (View Logs) Web May 3, 2026 at 10:47 pm

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 3, 2026

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 3, 2026

Code Review: fix(depot-client): bound sqlite vfs cache

Overview

This PR makes several significant changes across three concerns:

  1. SQLite VFS page cache bounding — reduces the default cache from 50,000 to 4,096 pages (~200 MB → ~16 MB at 4 KB/page), adds a SqliteVfsPageCacheMode to gate which page types are cached, evicts post-EOF pages after vacuum/truncate, and seeds the main page (page 1) from the envoy before VFS registration.
  2. Preload hint flush removal — entirely stubs out the periodic and shutdown preload hint flush from rivetkit-core.
  3. UniversalDB / RocksDB housekeeping — reduces RocksDB write buffer (256 MiB → 4 MiB), removes substitute_versionstamp_if_incomplete from both RocksDB and Postgres transaction drivers, and adds tests for versionstamp-like byte preservation and range boundary correctness.

Code Quality

Positive:

  • The caches_target_pages() / caches_prefetched_pages() abstraction cleanly centralises the caching predicate instead of spreading conditionals.
  • Adding page_cache_entries, write_buffer_dirty_pages, db_size_pages to SqliteVfsMetricsSnapshot is a good observability improvement.
  • run_pending_tasks() after invalidate_all() in handle_truncate is a correct moka fix — moka's invalidations are deferred without this call.
  • The test coverage for the vacuum/EOF eviction invariant is exactly the right thing to assert.
  • Moving versionstamp substitution out of the Rust driver layer (the "TODO: calculate on the sql side" comment) is a clean simplification.
  • fetch_initial_main_page correctly restricted to #[cfg(test)] since it used a sync transport; the new fetch_initial_main_page_from_envoy is the proper async envoy path.

Style (minor): VfsConfig::from_optimization_flags had a pre-existing extra-indent inconsistency on the trailing fields. The PR fixes it — good cleanup.


Potential Issues

1. VfsConfig::Default must cover page_cache_mode
The struct gains page_cache_mode: SqliteVfsPageCacheMode but the diff doesn't show the impl Default for VfsConfig block being modified. If that block is exhaustive (no ..Default::default()), this won't compile. Please confirm the default for page_cache_mode is explicitly set and reflects a sensible behaviour (likely All or equivalent).

2. max_capacity(config.cache_capacity_pages.max(1)) with capacity zero
When cache_capacity_pages = 0, moka is still created with capacity 1. seed_main_page writes page 1 unconditionally regardless of caches_target_pages() — likely intentional since SQLite must always read page 1 — but deserves an explicit comment. The test direct_engine_opens_empty_database_with_page_cache_disabled only asserts page_cache_capacity_pages == 0; asserting entry_count == 1 would make the invariant explicit and catch regressions.

3. Preload hint flush silently removed
ensure_preload_hint_flush_task now returns Ok(()) immediately and flush_preload_hints_before_close is empty. Hints are never persisted, so the startup preload warm-up is effectively dead. If this is temporary, add a // TODO with a follow-up ticket. If permanent, the now-inert hint collection code (snapshot_preload_hints, VfsPreloadHintSnapshot) can be cleaned up too.

4. RocksDB write buffer reduction needs justification
The original 256 MiB had an explicit comment: // 256MiB for conflict detection. The PR reduces this to 4 MiB per-CF / 16 MiB DB-level with no explanation. Optimistic transaction conflict detection relies on in-memory write buffers — undersizing can increase conflict retry rates. Please add a comment explaining why 4 MiB is safe here, or confirm this has been stress-tested under concurrent writes.

5. Duplicate dirty-page commit block
The dirty-page update + evict_pages_after_eof() block appears identically in both commit_to_envoy and commit_atomic_to_envoy. Consider extracting it into a private helper (e.g. apply_dirty_pages_from_commit) to prevent silent divergence if one path is updated later.


Performance Considerations

  • The 50k → 4,096 page cache reduction is significant at ~200 MB → ~16 MB. Hot actors with large databases will see more envoy round-trips. This is presumably intentional to bound per-actor memory, but worth monitoring p99 read latency in staging.
  • fetch_initial_main_page_from_envoy adds one extra envoy round-trip on every actor open. The cost for a single page should be small, but worth confirming at the call site.

Test Coverage

  • Good: vfs_config_honors_page_cache_optimization_flags and the disabled-cache open path.
  • Good: vacuum/EOF eviction assertions test exactly the right invariant.
  • Missing: a test for a mode that caches target pages but not prefetched ones (if such a variant exists), verifying the split behaviour inside get_pages_from_envoy.
  • The new RocksDB tests (rocksdb_empty_range_*) are well-structured and test important key-boundary conditions.

Summary

The core VFS cache-bounding logic is well-structured. Main items before merging: (1) confirm Default for VfsConfig covers page_cache_mode; (2) justify the RocksDB write buffer reduction; (3) document or clean up the preload hint flush removal. The duplicate dirty-page block is a low-priority cleanup. PR is a draft so these may already be planned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant