feat: reduce low-tail latency via root-meta cache and IO tuning#320

Closed
thweetkomputer wants to merge 137 commits into main from fix-regression-zc

Conversation

thweetkomputer (Collaborator) commented Jan 28, 2026

  • Reworked root-meta caching/mapping so eviction no longer blocks active writers: chunked arenas, mapper handles, and manifest serialization now let writers keep running
    while evictions happen in the background.
  • Made io_uring bootstrapping explicit and background-safe, added bounded compaction concurrency, and shrank write/IO batch sizes so background work stops starving foreground
    tasks—cutting long-tail stalls.
  • Tightened task/runtime checks (e.g., coroutine guardrails, safer pools) and updated replayer/tests to reflect the new mapping plumbing.
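The smaller write batches mentioned above can be pictured with a minimal sketch. It is not taken from the PR: `WriteInBatches`, its parameters, and the callback shapes are hypothetical names chosen for illustration. It only shows the general idea of capping each batch and inserting a preemption point between batches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper (not from the PR): writes `total_pages` in batches of at
// most `max_batch_pages` and calls `yield` between batches so that foreground
// tasks get a chance to run. Returns the number of batches issued.
inline std::size_t WriteInBatches(
    std::size_t total_pages,
    std::size_t max_batch_pages,
    const std::function<void(std::size_t first, std::size_t count)> &write_batch,
    const std::function<void()> &yield)
{
    assert(max_batch_pages > 0);
    std::size_t batches = 0;
    for (std::size_t first = 0; first < total_pages; first += max_batch_pages)
    {
        const std::size_t count = std::min(max_batch_pages, total_pages - first);
        write_batch(first, count);
        ++batches;
        if (first + count < total_pages)
        {
            yield();  // preemption point between batches, skipped after the last one
        }
    }
    return batches;
}
```

With a cap of 32 pages, a 100-page write becomes four batches with three yields in between; a cap of 256 would issue it as a single uninterrupted batch.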

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/eloqstore#issue_id
  • Reference the link of RFC if exists
  • Pass ctest --test-dir build/tests/

Summary by CodeRabbit

Release Notes

  • New Features

    • Added configurable concurrent compaction throttling via new max_compaction_in_progress option.
    • Introduced chunk-based mapping storage system with arena-backed memory management.
  • Performance Improvements

    • Optimized I/O operations through batched read/write processing.
    • Enhanced task scheduling with explicit preemption points and time-budget enforcement.
    • Improved memory allocation efficiency via early reservation and deque-based pools.
  • Configuration Changes

    • Adjusted default values: max_write_batch_pages (256→32), direct_io_buffer_pool_size (4→128).
    • Removed mapping_arena_size configuration option.
  • Stability & Bug Fixes

    • Enhanced error handling for partial I/O and zero-byte writes.
    • Strengthened time-based comparisons and added defensive initialization checks.
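As a rough illustration of the configuration changes and the compaction throttle listed above, the sketch below mirrors the new defaults in a hypothetical `KvOptionsSketch` struct and gates compactions with a simple counter. `CompactionGate` and its methods are assumptions made for this example, not the PR's code, and the gate assumes a single-threaded shard.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical option struct; only the default values come from the summary.
struct KvOptionsSketch
{
    std::uint32_t max_write_batch_pages = 32;        // previously 256
    std::uint32_t direct_io_buffer_pool_size = 128;  // previously 4
    std::uint32_t max_compaction_in_progress = 1;    // new option
};

// Hypothetical guard: admits a compaction only while fewer than `limit` are
// running. Not thread-safe; assumes all calls come from one shard thread.
class CompactionGate
{
public:
    explicit CompactionGate(std::uint32_t limit) : limit_(limit) {}

    bool TryEnter()
    {
        if (in_progress_ >= limit_)
        {
            return false;  // throttled: caller should retry later
        }
        ++in_progress_;
        return true;
    }

    void Leave()
    {
        assert(in_progress_ > 0);
        --in_progress_;
    }

    std::uint32_t InProgress() const { return in_progress_; }

private:
    std::uint32_t limit_;
    std::uint32_t in_progress_ = 0;
};
```

With the default limit of 1, a second compaction is refused until the first one calls `Leave()`.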


coderabbitai bot commented Jan 28, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

This PR implements arena-backed chunked storage for mapping tables, introduces background job initialization infrastructure for io-uring rings, optimizes pools and pages with deque-based storage and lazy initialization, adds batch write operations with yielding checkpoints, and adjusts configuration defaults for performance tuning across multiple subsystems.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Background Job Initialization: include/async_io_manager.h, include/storage/shard.h, src/async_io_manager.cpp, src/storage/shard.cpp | Introduces BackgroundJobInited() override and InitBackgroundJob() override in IouringMgr; adds BootstrapRing() to configure io-uring with IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_COOP_TASKRUN; centralizes background initialization via new InitBackgroundJob() helper in Shard; refactors work loop and ExecuteReadyTasks to enforce time budgets and track initialization state via ring_inited_ and low_priority_tasks_io_submitted_ flags. |
| Mapping Table Arena Architecture: include/storage/page_mapper.h, src/storage/page_mapper.cpp, include/replayer.h, src/replayer.cpp, include/storage/index_page_manager.h, src/storage/index_page_manager.cpp | Replaces flat vector storage with chunked, arena-managed mapping via new MappingArena and MappingChunkArena classes; adds chunk-aware methods (AcquireChunk, ReleaseChunk, EnsureChunkCount, ResizeInternal) and copy semantics (CopyFrom, ApplyPendingTo, StartCopying); adds SetVectorArena(), SetChunkArena(), operator== to MappingTbl; updates constructors to pass arena pointers; replaces direct push_back/operator[] with PushBack() and Set() APIs; introduces MapperChunkArena() accessor in IndexPageManager. |
| Pool Storage & Page Initialization: include/pool.h, include/storage/page.h, src/storage/page.cpp | Switches internal container from std::vector<T> to std::deque<T> in Pool; changes max_cached == 0 semantics to allow unlimited caching; introduces lazy Init() method, initial_pages_ and initialized_ state tracking in PagesPool; defers pool extension from constructor to explicit Init() call; reduces default Extend() batch from 1024 to 8 pages with logging; zero-initializes allocated chunks via memset. |
| Configuration Changes: include/kv_options.h, include/common.h | Reduces max_write_batch_pages default from 256 to 32; adds new max_compaction_in_progress option (default 1); increases direct_io_buffer_pool_size from 4 to 128; removes mapping_arena_size option; adjusts SerializeFileIdTermMapping pre-allocation from size() * 4 to size() * 8. |
| Write & Compaction Operations: src/async_io_manager.cpp, src/tasks/background_write.cpp, src/tasks/batch_write_task.cpp, src/tasks/write_task.cpp | Implements batched write/read operations with configurable batch sizes tied to page alignment; adds zero-byte write checks and partial-write recovery; introduces compaction throttling guard using compaction_in_progress_ counter; adds yield points (YieldToLowPQ()) at key intervals (batch boundaries, file rounds); optimizes batch boundaries with bitwise operations (batch & (ops - 1)). |
| Task & IO Optimizations: src/tasks/task.cpp, src/tasks/task.h | Adds <butil/time.h> include for time utilities; marks low-priority IO as submitted in WaitIo(); applies __builtin_expect hints for error paths in LoadPage/LoadDataPage/LoadOverflowPage. |
| File & Meta Management: src/storage/root_meta.cpp, src/file_gc.cpp | Adds timing instrumentation around manifest delete buffer resize with 500µs threshold logging; pre-reserves vector capacity in ClassifyFiles and DeleteUnreferencedLocalFiles; switches mapping iteration from range-based to indexed Get(page_id) access pattern. |
| Tests & Validation: tests/manifest.cpp, tests/manifest_payload.cpp, src/eloq_store.cpp | Updates test to use MappingTbl API with PushBack() instead of push_back(); changes root cache size from 256 to 5000; replaces base vector comparison with direct MappingTbl equality; removes blank line formatting in error log. |
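The "zero-byte write checks and partial-write recovery" item in the Write & Compaction Operations row describes a common pattern, sketched below under stated assumptions. `WriteAll`, `WriteFn`, and `WriteStatus` are hypothetical names; the writer callback is modelled on the return convention of POSIX write(2), and treating a zero-byte write as an error instead of retrying is an assumption, not something the summary specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Modelled on POSIX write(2): returns the number of bytes written, or a
// negative value on error. Injected so the loop can be exercised without a file.
using WriteFn = std::function<long(const char *buf, std::size_t len)>;

enum class WriteStatus
{
    kOk,
    kError,       // the writer reported a failure
    kNoProgress,  // zero-byte write: assumed fatal here to avoid spinning
};

// Hypothetical helper: keeps writing the unwritten tail until all `len` bytes
// are accepted, the writer fails, or the writer makes no progress.
inline WriteStatus WriteAll(const WriteFn &write_fn, const char *buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
    std::size_t done = 0;
    while (done < len)
    {
        const long n = write_fn(buf + done, len - done);
        if (n < 0)
        {
            return WriteStatus::kError;
        }
        if (n == 0)
        {
            return WriteStatus::kNoProgress;
        }
        done += static_cast<std::size_t>(n);  // short write: resume after it
    }
    return WriteStatus::kOk;
}
```

A writer that accepts at most 3 bytes per call finishes a 10-byte buffer in four calls; a writer that always returns 0 is reported as `kNoProgress` after the first call.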

Sequence Diagrams

sequenceDiagram
    actor Client
    participant Shard
    participant IouringMgr
    participant IOUring
    
    Client->>Shard: WorkOneRound / WorkLoop
    activate Shard
    Shard->>Shard: Check if InitBackgroundJob needed
    alt First time or ring not initialized
        Shard->>IouringMgr: InitBackgroundJob()
        activate IouringMgr
        IouringMgr->>IouringMgr: BootstrapRing(shard)
        IouringMgr->>IOUring: io_uring_queue_init with SINGLE_ISSUER + COOP_TASKRUN
        IOUring-->>IouringMgr: ring initialized
        IouringMgr->>IouringMgr: Set ring_inited_ = true
        IouringMgr-->>Shard: success
        deactivate IouringMgr
    end
    Shard->>Shard: ExecuteReadyTasks with time budget enforcement
    Note over Shard: Track ts_, enforce max_processing_time_microseconds
    Shard->>Client: Return control
    deactivate Shard
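The "ExecuteReadyTasks with time budget enforcement" step in the diagram above could look roughly like the following. This is a sketch under assumptions, not the Shard implementation: `RunWithBudget` is a hypothetical name, and the clock is passed in as a callback so the loop can be tested deterministically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch: runs queued tasks until the queue is empty or the time
// budget is used up. `now_us` returns a monotonic timestamp in microseconds.
// Returns the number of tasks executed in this round.
inline std::size_t RunWithBudget(std::deque<std::function<void()>> &ready,
                                 std::uint64_t budget_us,
                                 const std::function<std::uint64_t()> &now_us)
{
    assert(now_us);
    const std::uint64_t start = now_us();
    std::size_t executed = 0;
    while (!ready.empty())
    {
        std::function<void()> task = std::move(ready.front());
        ready.pop_front();
        task();
        ++executed;
        if (now_us() - start >= budget_us)
        {
            break;  // budget spent: leave the remaining tasks for the next round
        }
    }
    return executed;
}
```

The budget is checked after each task, so at least one task runs per round and a single long task can still overshoot the budget; tasks left in the queue are picked up by the next call.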
sequenceDiagram
    participant Client
    participant MappingTbl
    participant MappingArena
    participant MappingChunkArena
    
    Client->>MappingTbl: Constructor(vector_arena, chunk_arena)
    activate MappingTbl
    MappingTbl->>MappingTbl: Store arena pointers
    deactivate MappingTbl
    
    Client->>MappingTbl: EnsureSize(page_id)
    activate MappingTbl
    MappingTbl->>MappingTbl: Calculate required chunks
    alt Chunks needed
        MappingTbl->>MappingTbl: EnsureChunkCount(count)
        loop For each new chunk
            MappingTbl->>MappingChunkArena: AcquireChunk()
            activate MappingChunkArena
            MappingChunkArena->>MappingChunkArena: Pop from pool or allocate
            MappingChunkArena-->>MappingTbl: unique_ptr<Chunk>
            deactivate MappingChunkArena
            MappingTbl->>MappingTbl: Add chunk to base_
        end
    end
    MappingTbl-->>Client: Size ensured
    deactivate MappingTbl
    
    Client->>MappingTbl: Set(page_id, value)
    activate MappingTbl
    MappingTbl->>MappingTbl: Locate chunk via page_id >> kChunkShift
    MappingTbl->>MappingTbl: Update value in chunk
    deactivate MappingTbl
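The chunk lookup in the diagram above (`page_id >> kChunkShift` to pick a chunk, then an offset inside it) can be sketched as follows. `ChunkedTable` is a hypothetical stand-in for MappingTbl, the chunk size of 1024 entries is an assumed value, and chunks are allocated directly instead of coming from an arena.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch of chunked storage; sizes are illustrative only.
class ChunkedTable
{
public:
    static constexpr std::size_t kChunkShift = 10;  // assumed: 1024 entries per chunk
    static constexpr std::size_t kChunkSize = std::size_t{1} << kChunkShift;
    static constexpr std::size_t kChunkMask = kChunkSize - 1;
    using Chunk = std::array<std::uint64_t, kChunkSize>;

    // Grows the table, one chunk at a time, until `page_id` is addressable.
    void EnsureSize(std::size_t page_id)
    {
        const std::size_t needed = (page_id >> kChunkShift) + 1;
        while (chunks_.size() < needed)
        {
            chunks_.push_back(std::make_unique<Chunk>());  // zero-filled
        }
        if (page_id >= size_)
        {
            size_ = page_id + 1;
        }
    }

    void Set(std::size_t page_id, std::uint64_t value)
    {
        EnsureSize(page_id);
        (*chunks_[page_id >> kChunkShift])[page_id & kChunkMask] = value;
    }

    std::uint64_t Get(std::size_t page_id) const
    {
        assert(page_id < size_);
        return (*chunks_[page_id >> kChunkShift])[page_id & kChunkMask];
    }

    std::size_t Size() const { return size_; }
    std::size_t ChunkCount() const { return chunks_.size(); }

private:
    std::vector<std::unique_ptr<Chunk>> chunks_;
    std::size_t size_ = 0;
};
```

Growing such a table only appends chunk pointers, so existing entries are never copied into a larger contiguous buffer; that is the general property a chunked layout offers over a flat vector.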

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • eatbreads
  • liunyl
  • xiexiaoy

🐰 Hop through the chunks with grace,
Arenas now manage mapping's place,
Pools defer, and yields abound,
Background jobs initialize sound,
Write and compaction dance in sync,
One batch hop, then a yielding wink!

  private:
      size_t max_cached_;
-     std::vector<T> pool_;
+     std::deque<T> pool_;

[cpplint] reported by reviewdog 🐶
Add #include for deque<> [build/include_what_you_use] [4]
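For context on this comment, here is a minimal sketch of a deque-backed pool with the `<deque>` include that cpplint asks for. `PoolSketch` and its method names are hypothetical; only the `std::deque<T> pool_` member, the `max_cached_` member, and the "max_cached == 0 means unlimited caching" rule come from the PR summary.

```cpp
#include <cstddef>
#include <deque>    // the include cpplint reports as missing
#include <utility>  // std::move

// Hypothetical sketch of an object pool; not the PR's Pool class.
template <typename T>
class PoolSketch
{
public:
    explicit PoolSketch(std::size_t max_cached) : max_cached_(max_cached) {}

    // Reuses a cached object if one exists, otherwise creates a fresh one.
    T Acquire()
    {
        if (pool_.empty())
        {
            return T{};
        }
        T obj = std::move(pool_.back());
        pool_.pop_back();
        return obj;
    }

    // Caches the object unless the cap is reached; 0 means no cap.
    void Release(T obj)
    {
        if (max_cached_ != 0 && pool_.size() >= max_cached_)
        {
            return;  // over the cap: let `obj` be destroyed
        }
        pool_.push_back(std::move(obj));
    }

    std::size_t Cached() const { return pool_.size(); }

private:
    std::size_t max_cached_;
    std::deque<T> pool_;
};
```

Unlike `std::vector`, `std::deque` grows at the ends without relocating existing elements, which may be why the container was switched; the PR text does not state the reason.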

void Extend();

private:
std::deque<std::unique_ptr<MappingSnapshot::MappingTbl::Chunk>> pool_;

[cpplint] reported by reviewdog 🐶
Add #include for deque<> [build/include_what_you_use] [4]
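The flagged member is a deque of owned chunks, which suggests an acquire/release recycling scheme. A minimal sketch of that idea follows; `ChunkArenaSketch` and `ChunkSketch` are hypothetical types, and zeroing a recycled chunk before handing it out is an assumption of this example, not a documented behavior of MappingChunkArena.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical chunk type; the real Chunk layout is not shown in the PR page.
struct ChunkSketch
{
    std::array<std::uint64_t, 1024> slots{};
};

// Hypothetical arena: hands out chunks, preferring recycled ones.
class ChunkArenaSketch
{
public:
    std::unique_ptr<ChunkSketch> AcquireChunk()
    {
        if (pool_.empty())
        {
            return std::make_unique<ChunkSketch>();  // allocate on a pool miss
        }
        std::unique_ptr<ChunkSketch> chunk = std::move(pool_.back());
        pool_.pop_back();
        chunk->slots.fill(0);  // assumption: recycled chunks are returned zeroed
        return chunk;
    }

    void ReleaseChunk(std::unique_ptr<ChunkSketch> chunk)
    {
        if (chunk)
        {
            pool_.push_back(std::move(chunk));
        }
    }

    std::size_t Pooled() const { return pool_.size(); }

private:
    std::deque<std::unique_ptr<ChunkSketch>> pool_;
};
```

Releasing a chunk and acquiring again returns the same allocation, so steady-state resizing of mapping tables would not need to hit the allocator.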

  return std::tuple{replayer.root_,
                    replayer.ttl_root_,
-                   replayer.mapping_tbl_,
+                   std::move(replayer.mapping_tbl_),

[cpplint] reported by reviewdog 🐶
Add #include for move [build/include_what_you_use] [4]
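The change under review moves the mapping table into the returned tuple instead of copying it, and `std::move` is declared in `<utility>`. A small sketch of the same pattern is below; `ReplayerSketch` and `TakeResult` are hypothetical, and `std::make_tuple` is used in place of the `std::tuple{...}` deduction form so the sketch also builds before C++17.

```cpp
#include <tuple>
#include <utility>  // std::move, the include cpplint asks for
#include <vector>

// Hypothetical stand-in for the replayer state.
struct ReplayerSketch
{
    int root = 0;
    std::vector<int> mapping_tbl;
};

// Moves the table out of the replayer instead of copying it. After the call,
// `replayer.mapping_tbl` is empty and must be refilled before reuse.
inline std::tuple<int, std::vector<int>> TakeResult(ReplayerSketch &replayer)
{
    return std::make_tuple(replayer.root, std::move(replayer.mapping_tbl));
}
```

The trade-off is that the source object gives up its table, which is acceptable only if the replayer is not read again afterwards.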

@thweetkomputer thweetkomputer deleted the fix-regression-zc branch January 28, 2026 04:14