
add raii for memindexpage #351

Merged
thweetkomputer merged 7 commits into main from raii2
Feb 14, 2026

Conversation

@eatbreads
Collaborator

@eatbreads eatbreads commented Feb 9, 2026

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the related issue using fixes eloqdb/eloqstore#issue_id
  • Reference the RFC link if one exists
  • Pass ctest --test-dir build/tests/

Summary by CodeRabbit

  • Refactor
    • Replaced raw page pointers with managed handles across indexing, writing, scanning, batching and background compaction flows.
    • Iteration and stack entries now store handles, removing manual pin/unpin and simplifying lifetime management.
    • Improves stability, ownership semantics and reduces risk of resource leaks—no user-facing behavior changes.

@coderabbitai

coderabbitai bot commented Feb 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Replaces raw MemIndexPage* with MemIndexPage::Handle across storage, async I/O, and task layers. Updates the VarPage variant, iterator constructors, swizzling APIs, and many method signatures; shifts the pin/unpin lifecycle to RAII-style handle semantics and propagates handles through traversal, write, and compaction flows.

Changes

  • Handle type & iterator (include/storage/mem_index_page.h, src/storage/mem_index_page.cpp): Add a MemIndexPage::Handle nested class; change IndexPageIter to accept a MemIndexPage::Handle and lazily access the underlying page via Get().
  • Variant & async IO (include/async_io_manager.h, src/async_io_manager.cpp): Replace MemIndexPage * in VarPage with MemIndexPage::Handle; update VarPage helpers and IO request storage to move/store handles.
  • Index page manager (include/storage/index_page_manager.h, src/storage/index_page_manager.cpp): FindPage now returns std::pair<MemIndexPage::Handle, KvError>; internal swizzling/attach/load flows use and return handles (empty handle on errors).
  • Page mapper / swizzling (include/storage/page_mapper.h, src/storage/page_mapper.cpp): Rename GetSwizzlingPointer to GetSwizzlingHandle; return MemIndexPage::Handle (empty handle when no swizzle) and adjust callers.
  • Task/public headers (include/tasks/..., include/write_tree_stack.h): Multiple public/private signatures changed to use MemIndexPage::Handle (BatchWriteTask, WriteTask, ScanTask, WriteTreeStack): Pop, FlushIndexPage, FinishIndexPage, Redistribute, TruncateIndexPage, WritePage, and stack entries; removed DirtyIndexPage.
  • Batch/scan/write implementations (src/tasks/batch_write_task.cpp, src/tasks/scan_task.cpp, src/tasks/write_task.cpp): Propagate handles through seek/pop/apply/write/redistribute/truncate flows; remove explicit Unpin calls; use move/reset of handles and adapt iterators/builders.
  • Background/compaction paths (src/tasks/background_write.cpp): MovingCachedPages and the compaction write path store MemIndexPage::Handle in entries; Add/Finish and cleanup paths use handle.Reset() semantics.

Sequence Diagram(s)

sequenceDiagram
    participant Task as BatchWriteTask
    participant Manager as IndexPageManager
    participant Mapper as MappingSnapshot
    participant Storage as Storage/IO

    Task->>Manager: FindPage(mapping, page_id)
    Manager->>Mapper: GetSwizzlingHandle(page_id)
    alt swizzle exists
        Mapper-->>Manager: MemIndexPage::Handle(existing)
        Manager-->>Task: return handle
    else no swizzle
        Manager->>Storage: Allocate/Read page
        Storage-->>Manager: MemIndexPage*
        Manager->>Mapper: Attach/swizzle (store handle)
        Manager-->>Task: return MemIndexPage::Handle(new_page)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • write buffer #346 — Directly related: also updates VarPage/async IO usages to incorporate MemIndexPage::Handle.

Suggested reviewers

  • liunyl
  • eatbreads

Poem

🐇 I swapped my pointers for a gentler hand,
Handles hop softly across the land,
No frantic pins to drop or chase,
They hold the pages, keep the pace,
I nibble bugs and leave no trace.

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check (⚠️ Warning): the PR description contains only a checklist template, with no substantive content, issue references, RFC links, or implementation details explaining the RAII changes or their impact. Resolution: add a detailed description explaining the RAII pattern introduced, how MemIndexPage::Handle works, and why it improves the code (e.g., automatic pinning/unpinning); reference any related issues and confirm testing status.
  • Docstring coverage (⚠️ Warning): docstring coverage is 3.70%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): the title 'add raii for memindexpage' is concise and directly describes the main change: introducing RAII management for MemIndexPage through a new Handle class, which aligns with the refactoring visible across all modified files.
  • Merge conflict detection (✅ Passed): no merge conflicts detected when merging into main.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/storage/index_page_manager.cpp (1)

349-354: ⚠️ Potential issue | 🟠 Major

Potential pin conflict: concurrent waiter holds a pin when this error path tries to free the page.

When Task A's IO fails here, it calls FreeIndexPage(new_page) which asserts !page->IsPinned(). However, a concurrent Task B may have already obtained a handle to this page via GetSwizzlingHandle at Line 331 (which pins it), found it detached at Line 360, and is now waiting at Line 363. Task B's handle keeps the page pinned, so FreeIndexPage will trip the assertion.

Before RAII, GetSwizzlingPointer returned a raw pointer without pinning, so FreeIndexPage would succeed (though Task B would hold a dangling pointer — handled by the re-loop). Now the automatic pin prevents the free.

Consider resetting any external pins before freeing, or deferring the free until the page is fully unpinned (e.g., enqueue to the free list only when ref_cnt reaches 0).

src/tasks/write_task.cpp (1)

251-264: ⚠️ Potential issue | 🔴 Critical

Reset the handle before freeing to unpin the page.

In the error path (line 262), FreeIndexPage(idx_page) is called while handle (line 253) still holds a pin on the page. FreeIndexPage asserts !page->IsPinned(), which will fire. Additionally, when handle goes out of scope after the switch statement, the IndexPageHandle destructor will call Unpin() on an already-freed page.

🐛 Proposed fix
         else
         {
+            handle.Reset();
             shard->IndexManager()->FreeIndexPage(idx_page);
         }
🤖 Fix all issues with AI agents
In `@src/tasks/batch_write_task.cpp`:
- Around lines 1540-1555: the error path frees a pinned page while an IndexPageHandle still holds it, causing a use-after-free. In the error branch after WritePage (the block that checks err != KvError::NoError), call new_handle.Reset() (or otherwise Unpin() and clear the handle) immediately before shard->IndexManager()->FreeIndexPage(new_handle.Get()), so that AllocIndexPage()/IndexPageHandle semantics are respected and the handle destructor won't touch a freed page.

In `@src/tasks/scan_task.cpp`:
- Around lines 139-148: the code moves `it` into `index_stack_` and then calls `it.GetPageId()` and `it.IsPointingToLeaf()`, a use-after-move. After `index_stack_.emplace_back(std::move(handle), std::move(it))`, retrieve the iterator from the newly added stack entry (e.g. the iterator member of `index_stack_.back()`) and call `GetPageId()` and `IsPointingToLeaf()` on that stack-held iterator instead of the local `it`.
- Around lines 99-108: the code moves idx_it into index_stack_ and then reads idx_it.GetPageId() from the moved-from iterator, which is undefined behavior. Query the page id before the move: take `PageId page_id = idx_it.GetPageId()` (and check it against MaxPageId) before `index_stack_.emplace_back(std::move(handle), std::move(idx_it))`, then use `page_id` for the MaxPageId check and the child_id assignment. For any subsequent access, go through the stack entry, e.g. `index_stack_.back().second.GetPageId()` (or the actual member name for the stored iterator) and `index_stack_.back().first->IsPointingToLeaf()` (or the stored handle member), instead of reading idx_it after it was moved.
🧹 Nitpick comments (7)
include/write_tree_stack.h (1)

20-29: Initialization order is correct but fragile — consider a comment.

The member initializer list relies on idx_page_iter_ being declared before handle_ so that handle (the parameter) is still valid when passed to IndexPageIter. If someone ever reorders the member declarations, this silently breaks because handle would already be moved-from. The current code is correct since declaration order matches, but a brief comment would prevent a future foot-gun.

src/storage/mem_index_page.cpp (1)

108-135: Redundant handle.Get() calls across three IILEs — consider simplifying.

Each IILE independently calls handle.Get() to fetch the same MemIndexPage* and null-checks it. While correct, this could be simplified by extracting the pointer once. A helper or a single conditional init block would reduce repetition.

♻️ Possible simplification
-IndexPageIter::IndexPageIter(const IndexPageHandle &handle,
-                             const KvOptions *opts)
-    : comparator_(opts->comparator_),
-      page_(
-          [&handle, opts]()
-          {
-              const MemIndexPage *index_page = handle.Get();
-              return index_page == nullptr
-                         ? std::string_view{}
-                         : std::string_view{index_page->PagePtr(),
-                                            opts->data_page_size};
-          }()),
-      restart_num_(
-          [&handle]()
-          {
-              const MemIndexPage *index_page = handle.Get();
-              return index_page == nullptr ? 0 : index_page->RestartNum();
-          }()),
-      restart_offset_(
-          [&handle]()
-          {
-              const MemIndexPage *index_page = handle.Get();
-              return index_page == nullptr
-                         ? 0
-                         : index_page->ContentLength() -
-                               (1 + index_page->RestartNum()) *
-                                   sizeof(uint16_t);
-          }()),
-      curr_offset_(MemIndexPage::leftmost_ptr_offset)
-{
-}
+IndexPageIter::IndexPageIter(const IndexPageHandle &handle,
+                             const KvOptions *opts)
+    : IndexPageIter(handle.Get(), opts)
+{
+}
+
+IndexPageIter::IndexPageIter(const MemIndexPage *index_page,
+                             const KvOptions *opts)
+    : comparator_(opts->comparator_),
+      page_(index_page == nullptr
+                ? std::string_view{}
+                : std::string_view{index_page->PagePtr(),
+                                   opts->data_page_size}),
+      restart_num_(index_page == nullptr ? 0 : index_page->RestartNum()),
+      restart_offset_(index_page == nullptr
+                          ? 0
+                          : index_page->ContentLength() -
+                                (1 + index_page->RestartNum()) *
+                                    sizeof(uint16_t)),
+      curr_offset_(MemIndexPage::leftmost_ptr_offset)
+{
+}

This delegates to a private constructor taking a raw const MemIndexPage*, avoiding the repeated handle.Get() calls and IILEs.

include/async_io_manager.h (1)

30-41: Redundant forward declaration after full include.

Line 30 includes "storage/mem_index_page.h" which defines IndexPageHandle. The forward declaration class IndexPageHandle; at line 41 is therefore redundant. It's harmless but adds noise.

♻️ Remove redundant forward declaration
 class WriteReq;
 class WriteTask;
 class MemIndexPage;
-class IndexPageHandle;
 class CloudStorageService;
 class Shard;
include/storage/mem_index_page.h (1)

211-211: Verify that callers keep IndexPageHandle alive for the lifetime of IndexPageIter.

IndexPageIter captures a std::string_view into the page buffer during construction. Since it only takes a const IndexPageHandle & (not ownership), the caller must ensure the handle outlives the iterator. This is satisfied in the current code where both are stored together in stack frames or in the same local scope — but it's an implicit contract worth documenting.

src/tasks/write_task.cpp (1)

106-113: Redundant SetChecksum call when delegating to the two-argument overload.

WritePage(IndexPageHandle &page) calls SetChecksum at Line 108, then delegates to WritePage(IndexPageHandle &page, FilePageId) which calls SetChecksum again at Line 117. Since SetPageId/SetFilePageId modify member variables (not the page buffer), the page buffer is unchanged and the second checksum call recomputes the same result.

♻️ Remove the redundant SetChecksum from the two-argument overload
 KvError WriteTask::WritePage(IndexPageHandle &page, FilePageId file_page_id)
 {
-    SetChecksum({page->PagePtr(), Options()->data_page_size});
     // Create a temporary handle for VarPage to keep pinning during IO.
     IndexPageHandle io_handle(page.Get());
     return WritePage(VarPage(std::move(io_handle)), file_page_id);
 }

Note: if WritePage(IndexPageHandle &, FilePageId) is also called directly (e.g. from background_write.cpp), the caller must ensure SetChecksum is called beforehand. Alternatively, keep it here and remove it from the single-argument overload.

include/tasks/batch_write_task.h (1)

47-58: The Doxygen comment above Redistribute (lines 47–54) documents the DataPage overload, not the IndexPageHandle overload.

The @param prev_page description references "The previous page" as a DataPage, but the overload at line 56 takes IndexPageHandle &prev_handle. Consider adding a brief doc block for this overload to avoid confusion.

src/tasks/batch_write_task.cpp (1)

937-955: Redundant null check on prev_handle at line 944.

We are already inside the if (prev_handle) block from line 937, so the prev_handle && guard at line 944 is always true. Harmless, but slightly noisy.

Proposed cleanup
-        if (cur_page_len < one_quarter && prev_handle &&
-            prev_handle->RestartNum() > 1 &&
-            prev_handle->ContentLength() > three_quarter)
+        if (cur_page_len < one_quarter &&
+            prev_handle->RestartNum() > 1 &&
+            prev_handle->ContentLength() > three_quarter)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/tasks/write_task.cpp`:
- Around lines 115-121: the CHECK asserting the page is unpinned in WritePageCallback crashes when SubmitMergedWrite fails synchronously, because the caller's IndexPageHandle still holds a pin. In WriteTask::WritePage(IndexPageHandle &page, FilePageId) you create io_handle and move it into VarPage; in WritePageCallback, handle.Reset() unpins only the io_handle. Replace the hard CHECK(!idx_page->IsPinned()) with a conditional guard that frees the page only when no pins remain, e.g. "if (!idx_page->IsPinned()) { FreeIndexPage(...); } else { defer the free and let the last handle destruction release it }", so synchronous error paths from SubmitMergedWrite/OpenOrCreateFD do not crash and pages are freed later when the final handle is destroyed.
🧹 Nitpick comments (2)
src/tasks/write_task.cpp (1)

106-121: Redundant SetChecksum at line 108 — the result is immediately invalidated.

SetChecksum at line 108 hashes the page content before SetPageId/SetFilePageId (lines 110–111) modify it. The two-arg overload at line 117 then recomputes the checksum over the now-correct content. The first call is wasted work.

Proposed fix
 KvError WriteTask::WritePage(IndexPageHandle &page)
 {
-    SetChecksum({page->PagePtr(), Options()->data_page_size});
     auto [page_id, file_page_id] = AllocatePage(page->GetPageId());
     page->SetPageId(page_id);
     page->SetFilePageId(file_page_id);
     return WritePage(page, file_page_id);
 }
src/tasks/batch_write_task.cpp (1)

896-899: Unguarded dereference of handle_ when builder is empty.

Line 899 accesses stack_.back()->handle_->GetPageId() without checking if handle_ is non-null. While it's likely safe in practice (a non-empty base page implies a non-null handle), an assertion or guard would make the precondition explicit and prevent a crash if future changes violate this invariant.

Proposed defensive assertion
     if (idx_page_builder_.IsEmpty())
     {
+        assert(stack_.back()->handle_);
         FreePage(stack_.back()->handle_->GetPageId());

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/tasks/batch_write_task.cpp`:
- Around lines 896-906: stack_.back()->handle_ is dereferenced without checking for emptiness in the idx_page_builder_.IsEmpty() branch, which can null-dereference if the IndexPageHandle is empty. Add a defensive check that stack_.back()->handle_ is valid before calling GetPageId(), and only call FreePage and record the parent change (parent->changes_.emplace_back(..., prev_page_id, WriteOp::Delete)) when the handle exists; otherwise skip FreePage and ensure prev_page_id is set appropriately. Protect the uses of stack_.back()->handle_, stack_[stack_.size() - 2]->idx_page_iter_.Key(), and prev_page_id accordingly.
🧹 Nitpick comments (5)
include/write_tree_stack.h (1)

20-29: Member ordering creates a fragile initialization dependency.

idx_page_iter_ is initialized from the handle parameter before handle_ takes ownership via move (both in the initializer list and in actual C++ initialization order, since members are initialized in declaration order). This works correctly today, but if someone later swaps the declaration order of handle_ and idx_page_iter_, the parameter will be moved-from before the iterator reads it, causing undefined behavior.

Consider reordering the member declarations so that handle_ is declared (and thus initialized) first, and adjust the IndexPageIter constructor call to read from the member instead:

Suggested reordering
-    IndexStackEntry(IndexPageHandle handle, const KvOptions *opts)
-        : idx_page_iter_(handle, opts), handle_(std::move(handle))
+    IndexStackEntry(IndexPageHandle handle, const KvOptions *opts)
+        : handle_(std::move(handle)), idx_page_iter_(handle_, opts)
     {
     }

-    IndexPageIter idx_page_iter_;
-    IndexPageHandle handle_{};
+    IndexPageHandle handle_{};
+    IndexPageIter idx_page_iter_;
src/storage/mem_index_page.cpp (2)

93-94: const_cast to create an IndexPageHandle from this in a const method pins/unpins a logically-const object.

String() is const, but IndexPageHandle's constructor calls Pin() (a non-const operation) via const_cast. This is technically UB if the object were truly const-qualified, though in practice MemIndexPage instances are always heap-allocated mutable objects. Consider adding a const-aware factory or a PinGuard that doesn't require casting away const, to avoid this pitfall as the codebase evolves.


108-136: Consider simplifying the triple IILE pattern.

handle.Get() is called three times identically across three immediately-invoked lambdas. A single local extraction would be simpler and more readable, though it would require switching to a delegating constructor or a body-initialized approach.

Example simplification using a static helper
+namespace {
+struct IterInitData {
+    std::string_view page;
+    uint16_t restart_num;
+    uint16_t restart_offset;
+};
+
+IterInitData MakeIterData(const IndexPageHandle &handle, const KvOptions *opts) {
+    const MemIndexPage *p = handle.Get();
+    if (p == nullptr) return {{}, 0, 0};
+    return {
+        {p->PagePtr(), opts->data_page_size},
+        p->RestartNum(),
+        static_cast<uint16_t>(p->ContentLength() - (1 + p->RestartNum()) * sizeof(uint16_t))
+    };
+}
+} // namespace
+
 IndexPageIter::IndexPageIter(const IndexPageHandle &handle,
                              const KvOptions *opts)
-    : comparator_(opts->comparator_),
-      page_(
-          [&handle, opts]() { ... }()),
-      restart_num_(
-          [&handle]() { ... }()),
-      restart_offset_(
-          [&handle]() { ... }()),
+    : comparator_(opts->comparator_),
+      page_([&] { auto d = MakeIterData(handle, opts); return d.page; }()),
+      restart_num_([&] { auto d = MakeIterData(handle, opts); return d.restart_num; }()),
+      restart_offset_([&] { auto d = MakeIterData(handle, opts); return d.restart_offset; }()),
       curr_offset_(MemIndexPage::leftmost_ptr_offset)

Or more directly, use a body-initialized approach if the members aren't const.

src/tasks/batch_write_task.cpp (2)

946-950: Redundant prev_handle check on line 948.

The prev_handle truthiness check at line 948 is redundant — we're already inside the if (prev_handle) block from line 941.

Suggested cleanup
-        if (cur_page_len < one_quarter && prev_handle &&
-            prev_handle->RestartNum() > 1 &&
+        if (cur_page_len < one_quarter &&
+            prev_handle->RestartNum() > 1 &&

1126-1200: string_view lifetime across Swap is subtle but correct.

Redistribute returns new_page (a string_view from idx_builder.Finish()), then idx_page_builder_.Swap(idx_builder) transfers the backing buffer into idx_page_builder_. The returned view still points to valid memory (now owned by idx_page_builder_), and the caller (FinishIndexPage) consumes it via memcpy before any further mutation. Consider a brief comment near line 1180 noting this lifetime dependency, since a future refactor that modifies idx_page_builder_ between the swap and the copy would silently invalidate the view.

Collaborator

@thweetkomputer thweetkomputer left a comment

CI cannot pass

@thweetkomputer thweetkomputer merged commit 8b5e4e2 into main Feb 14, 2026
6 checks passed
@thweetkomputer thweetkomputer deleted the raii2 branch February 14, 2026 05:18