Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReplaces raw Changes
Sequence Diagram(s)sequenceDiagram
participant Task as BatchWriteTask
participant Manager as IndexPageManager
participant Mapper as MappingSnapshot
participant Storage as Storage/IO
Task->>Manager: FindPage(mapping, page_id)
Manager->>Mapper: GetSwizzlingHandle(page_id)
alt swizzle exists
Mapper-->>Manager: MemIndexPage::Handle(existing)
Manager-->>Task: return handle
else no swizzle
Manager->>Storage: Allocate/Read page
Storage-->>Manager: MemIndexPage*
Manager->>Mapper: Attach/swizzle (store handle)
Manager-->>Task: return MemIndexPage::Handle(new_page)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/storage/index_page_manager.cpp (1)
349-354:⚠️ Potential issue | 🟠 MajorPotential pin conflict: concurrent waiter holds a pin when this error path tries to free the page.
When Task A's IO fails here, it calls
FreeIndexPage(new_page)which asserts!page->IsPinned(). However, a concurrent Task B may have already obtained a handle to this page viaGetSwizzlingHandleat Line 331 (which pins it), found it detached at Line 360, and is now waiting at Line 363. Task B's handle keeps the page pinned, soFreeIndexPagewill trip the assertion.Before RAII,
GetSwizzlingPointerreturned a raw pointer without pinning, soFreeIndexPagewould succeed (though Task B would hold a dangling pointer — handled by the re-loop). Now the automatic pin prevents the free.Consider resetting any external pins before freeing, or deferring the free until the page is fully unpinned (e.g., enqueue to the free list only when
ref_cntreaches 0).src/tasks/write_task.cpp (1)
251-264:⚠️ Potential issue | 🔴 CriticalReset the handle before freeing to unpin the page.
In the error path (line 262),
FreeIndexPage(idx_page)is called whilehandle(line 253) still holds a pin on the page.FreeIndexPageasserts!page->IsPinned(), which will fire. Additionally, whenhandlegoes out of scope after the switch statement, theIndexPageHandledestructor will callUnpin()on an already-freed page.🐛 Proposed fix
else { + handle.Reset(); shard->IndexManager()->FreeIndexPage(idx_page); }
🤖 Fix all issues with AI agents
In `@src/tasks/batch_write_task.cpp`:
- Around line 1540-1555: The code frees a pinned page while an IndexPageHandle
still holds it, causing a use-after-free in the error path; before calling
shard->IndexManager()->FreeIndexPage(new_handle.Get()) you must clear/unpin the
handle so the page isn't still pinned: call new_handle.Reset() (or otherwise
Unpin() and clear the handle) immediately prior to FreeIndexPage() in the error
branch of the write failure (the block that checks err != KvError::NoError after
WritePage), ensuring AllocIndexPage()/IndexPageHandle semantics are respected
and the handle destructor won't touch a freed page.
In `@src/tasks/scan_task.cpp`:
- Around line 139-148: The code moves `it` into `index_stack_` then calls
`it.GetPageId()` and `it.IsPointingToLeaf()` causing a use-after-move; after
emplacing with `index_stack_.emplace_back(std::move(handle), std::move(it))`
retrieve the iterator from the newly added stack entry (e.g. the iterator member
of `index_stack_.back()` or the `.second` element) and call `GetPageId()` and
`IsPointingToLeaf()` on that stack-held iterator (and no longer use the local
`it`)—update the calls that read `GetPageId()`/`IsPointingToLeaf()` to reference
the stack entry instead.
- Around line 99-108: The code moves idx_it into index_stack_ then reads from
the moved-from iterator (idx_it.GetPageId()), causing UB; fix by querying the
page id before moving and/or accessing the iterator via the stack entry after
move: call PageId page_id = idx_it.GetPageId() (and check against MaxPageId)
before index_stack_.emplace_back(std::move(handle), std::move(idx_it)), then use
page_id for the MaxPageId check and child_id assignment; for any subsequent
iterator access use index_stack_.back().second.GetPageId() (or the actual member
name for the stored iterator) and keep handle checks via
index_stack_.back().first->IsPointingToLeaf() (or the stored handle member)
instead of reading idx_it after it was moved.
🧹 Nitpick comments (7)
include/write_tree_stack.h (1)
20-29: Initialization order is correct but fragile — consider a comment.The member initializer list relies on
idx_page_iter_being declared beforehandle_so thathandle(the parameter) is still valid when passed toIndexPageIter. If someone ever reorders the member declarations, this silently breaks becausehandlewould already be moved-from. The current code is correct since declaration order matches, but a brief comment would prevent a future foot-gun.src/storage/mem_index_page.cpp (1)
108-135: Redundanthandle.Get()calls across three IILEs — consider simplifying.Each IILE independently calls
handle.Get()to fetch the sameMemIndexPage*and null-checks it. While correct, this could be simplified by extracting the pointer once. A helper or a single conditional init block would reduce repetition.♻️ Possible simplification
-IndexPageIter::IndexPageIter(const IndexPageHandle &handle, - const KvOptions *opts) - : comparator_(opts->comparator_), - page_( - [&handle, opts]() - { - const MemIndexPage *index_page = handle.Get(); - return index_page == nullptr - ? std::string_view{} - : std::string_view{index_page->PagePtr(), - opts->data_page_size}; - }()), - restart_num_( - [&handle]() - { - const MemIndexPage *index_page = handle.Get(); - return index_page == nullptr ? 0 : index_page->RestartNum(); - }()), - restart_offset_( - [&handle]() - { - const MemIndexPage *index_page = handle.Get(); - return index_page == nullptr - ? 0 - : index_page->ContentLength() - - (1 + index_page->RestartNum()) * - sizeof(uint16_t); - }()), - curr_offset_(MemIndexPage::leftmost_ptr_offset) -{ -} +IndexPageIter::IndexPageIter(const IndexPageHandle &handle, + const KvOptions *opts) + : IndexPageIter(handle.Get(), opts) +{ +} + +IndexPageIter::IndexPageIter(const MemIndexPage *index_page, + const KvOptions *opts) + : comparator_(opts->comparator_), + page_(index_page == nullptr + ? std::string_view{} + : std::string_view{index_page->PagePtr(), + opts->data_page_size}), + restart_num_(index_page == nullptr ? 0 : index_page->RestartNum()), + restart_offset_(index_page == nullptr + ? 0 + : index_page->ContentLength() - + (1 + index_page->RestartNum()) * + sizeof(uint16_t)), + curr_offset_(MemIndexPage::leftmost_ptr_offset) +{ +}This delegates to a private constructor taking a raw
const MemIndexPage*, avoiding the repeatedhandle.Get()calls and IILEs.include/async_io_manager.h (1)
30-41: Redundant forward declaration after full include.Line 30 includes
"storage/mem_index_page.h"which definesIndexPageHandle. The forward declarationclass IndexPageHandle;at line 41 is therefore redundant. It's harmless but adds noise.♻️ Remove redundant forward declaration
class WriteReq; class WriteTask; class MemIndexPage; -class IndexPageHandle; class CloudStorageService; class Shard;include/storage/mem_index_page.h (1)
211-211: Verify that callers keepIndexPageHandlealive for the lifetime ofIndexPageIter.
IndexPageItercaptures astd::string_viewinto the page buffer during construction. Since it only takes aconst IndexPageHandle &(not ownership), the caller must ensure the handle outlives the iterator. This is satisfied in the current code where both are stored together in stack frames or in the same local scope — but it's an implicit contract worth documenting.src/tasks/write_task.cpp (1)
106-113: RedundantSetChecksumcall when delegating to the two-argument overload.
WritePage(IndexPageHandle &page)callsSetChecksumat Line 108, then delegates toWritePage(IndexPageHandle &page, FilePageId)which callsSetChecksumagain at Line 117. SinceSetPageId/SetFilePageIdmodify member variables (not the page buffer), the page buffer is unchanged and the second checksum call recomputes the same result.♻️ Remove the redundant SetChecksum from the two-argument overload
KvError WriteTask::WritePage(IndexPageHandle &page, FilePageId file_page_id) { - SetChecksum({page->PagePtr(), Options()->data_page_size}); // Create a temporary handle for VarPage to keep pinning during IO. IndexPageHandle io_handle(page.Get()); return WritePage(VarPage(std::move(io_handle)), file_page_id); }Note: if
WritePage(IndexPageHandle &, FilePageId)is also called directly (e.g. frombackground_write.cpp), the caller must ensureSetChecksumis called beforehand. Alternatively, keep it here and remove it from the single-argument overload.include/tasks/batch_write_task.h (1)
47-58: The Doxygen comment aboveRedistribute(lines 47–54) documents theDataPageoverload, not theIndexPageHandleoverload.The
@param prev_pagedescription references "The previous page" as aDataPage, but the overload at line 56 takesIndexPageHandle &prev_handle. Consider adding a brief doc block for this overload to avoid confusion.src/tasks/batch_write_task.cpp (1)
937-955: Redundant null check onprev_handleat line 944.We are already inside the
if (prev_handle)block from line 937, so theprev_handle &&guard at line 944 is always true. Harmless, but slightly noisy.Proposed cleanup
- if (cur_page_len < one_quarter && prev_handle && - prev_handle->RestartNum() > 1 && - prev_handle->ContentLength() > three_quarter) + if (cur_page_len < one_quarter && + prev_handle->RestartNum() > 1 && + prev_handle->ContentLength() > three_quarter)
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/tasks/write_task.cpp`:
- Around line 115-121: The CHECK that asserts the page is unpinned in
WritePageCallback causes crashes when SubmitMergedWrite fails synchronously
because the caller's IndexPageHandle still holds a pin; change the hard
CHECK(!idx_page->IsPinned()) into a conditional guard that only calls
FreeIndexPage (or the existing free path) when the page has no remaining pins.
Specifically, in WriteTask::WritePage(IndexPageHandle &page, FilePageId) you
create io_handle and move it into VarPage; in WritePageCallback where
handle.Reset() currently unpins only the io_handle, replace the CHECK with logic
like "if (!idx_page->IsPinned()) { FreeIndexPage(...); } else { defer freeing
(no-op) and let the last handle destruction free it }" so synchronous error
paths from SubmitMergedWrite/OpenOrCreateFD do not crash and pages are freed
later when the final handle is destroyed.
🧹 Nitpick comments (2)
src/tasks/write_task.cpp (1)
106-121: RedundantSetChecksumat line 108 — the result is immediately invalidated.
SetChecksumat line 108 hashes the page content beforeSetPageId/SetFilePageId(lines 110–111) modify it. The two-arg overload at line 117 then recomputes the checksum over the now-correct content. The first call is wasted work.Proposed fix
KvError WriteTask::WritePage(IndexPageHandle &page) { - SetChecksum({page->PagePtr(), Options()->data_page_size}); auto [page_id, file_page_id] = AllocatePage(page->GetPageId()); page->SetPageId(page_id); page->SetFilePageId(file_page_id); return WritePage(page, file_page_id); }src/tasks/batch_write_task.cpp (1)
896-899: Unguarded dereference ofhandle_when builder is empty.Line 899 accesses
stack_.back()->handle_->GetPageId()without checking ifhandle_is non-null. While it's likely safe in practice (a non-empty base page implies a non-null handle), an assertion or guard would make the precondition explicit and prevent a crash if future changes violate this invariant.Proposed defensive assertion
if (idx_page_builder_.IsEmpty()) { + assert(stack_.back()->handle_); FreePage(stack_.back()->handle_->GetPageId());
e9e18fa to
4c6eb27
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/tasks/batch_write_task.cpp`:
- Around line 896-906: stack_.back()->handle_ is dereferenced without checking
for emptiness in the block where idx_page_builder_.IsEmpty() is true, which can
cause a null dereference if the IndexPageHandle is empty; add a defensive check
that stack_.back()->handle_ (the IndexPageHandle) is valid before calling
GetPageId() and only call FreePage and record the parent change
(parent->changes_.emplace_back(..., prev_page_id, WriteOp::Delete)) when the
handle exists, otherwise skip FreePage and avoid using page id or ensure
prev_page_id is set appropriately; locate this logic around the
idx_page_builder_.IsEmpty() branch and protect uses of stack_.back()->handle_,
stack_[stack_.size() - 2]->idx_page_iter_.Key(), and prev_page_id accordingly.
🧹 Nitpick comments (5)
include/write_tree_stack.h (1)
20-29: Member ordering creates a fragile initialization dependency.
idx_page_iter_is initialized from thehandleparameter beforehandle_takes ownership via move (both in the initializer list and in actual C++ initialization order, since members are initialized in declaration order). This works correctly today, but if someone later swaps the declaration order ofhandle_andidx_page_iter_, the parameter will be moved-from before the iterator reads it, causing undefined behavior.Consider reordering the member declarations so that
handle_is declared (and thus initialized) first, and adjust theIndexPageIterconstructor call to read from the member instead:Suggested reordering
- IndexStackEntry(IndexPageHandle handle, const KvOptions *opts) - : idx_page_iter_(handle, opts), handle_(std::move(handle)) + IndexStackEntry(IndexPageHandle handle, const KvOptions *opts) + : handle_(std::move(handle)), idx_page_iter_(handle_, opts) { } - IndexPageIter idx_page_iter_; - IndexPageHandle handle_{}; + IndexPageHandle handle_{}; + IndexPageIter idx_page_iter_;src/storage/mem_index_page.cpp (2)
93-94:const_castto create anIndexPageHandlefromthisin a const method pins/unpins a logically-const object.
String()isconst, butIndexPageHandle's constructor callsPin()(a non-const operation) viaconst_cast. This is technically UB if the object were truly const-qualified, though in practiceMemIndexPageinstances are always heap-allocated mutable objects. Consider adding a const-aware factory or aPinGuardthat doesn't require casting away const, to avoid this pitfall as the codebase evolves.
108-136: Consider simplifying the triple IILE pattern.
handle.Get()is called three times identically across three immediately-invoked lambdas. A single local extraction would be simpler and more readable, though it would require switching to a delegating constructor or a body-initialized approach.Example simplification using a static helper
+namespace { +struct IterInitData { + std::string_view page; + uint16_t restart_num; + uint16_t restart_offset; +}; + +IterInitData MakeIterData(const IndexPageHandle &handle, const KvOptions *opts) { + const MemIndexPage *p = handle.Get(); + if (p == nullptr) return {{}, 0, 0}; + return { + {p->PagePtr(), opts->data_page_size}, + p->RestartNum(), + static_cast<uint16_t>(p->ContentLength() - (1 + p->RestartNum()) * sizeof(uint16_t)) + }; +} +} // namespace + IndexPageIter::IndexPageIter(const IndexPageHandle &handle, const KvOptions *opts) - : comparator_(opts->comparator_), - page_( - [&handle, opts]() { ... }()), - restart_num_( - [&handle]() { ... }()), - restart_offset_( - [&handle]() { ... }()), + : comparator_(opts->comparator_), + page_([&] { auto d = MakeIterData(handle, opts); return d.page; }()), + restart_num_([&] { auto d = MakeIterData(handle, opts); return d.restart_num; }()), + restart_offset_([&] { auto d = MakeIterData(handle, opts); return d.restart_offset; }()), curr_offset_(MemIndexPage::leftmost_ptr_offset)Or more directly, use a body-initialized approach if the members aren't const.
src/tasks/batch_write_task.cpp (2)
946-950: Redundantprev_handlecheck on line 948.The
prev_handletruthiness check at line 948 is redundant — we're already inside theif (prev_handle)block from line 941.Suggested cleanup
- if (cur_page_len < one_quarter && prev_handle && - prev_handle->RestartNum() > 1 && + if (cur_page_len < one_quarter && + prev_handle->RestartNum() > 1 &&
1126-1200:string_viewlifetime acrossSwapis subtle but correct.
Redistributereturnsnew_page(astring_viewfromidx_builder.Finish()), thenidx_page_builder_.Swap(idx_builder)transfers the backing buffer intoidx_page_builder_. The returned view still points to valid memory (now owned byidx_page_builder_), and the caller (FinishIndexPage) consumes it viamemcpybefore any further mutation. Consider a brief comment near line 1180 noting this lifetime dependency, since a future refactor that modifiesidx_page_builder_between the swap and the copy would silently invalidate the view.
252abc2 to
22fb86c
Compare
22fb86c to
a27f0bf
Compare
Here are some reminders before you submit the pull request
fixes eloqdb/eloqstore#issue_idctest --test-dir build/tests/Summary by CodeRabbit