fix: oom bug caused by removing temporary data page number restriction#278
fix: oom bug caused by removing temporary data page number restriction#278thweetkomputer merged 2 commits intomainfrom
Conversation
WalkthroughRefactors page allocation and error handling across storage and tasks: removes Page's bool conversion, drops PagesPool capacity tracking and changes Extend to void, and simplifies BatchWriteTask::Redistribute to return a Page. Adjusts related allocation/read paths and tests to match the new semantics. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
f8649b9 to
7245bd5
Compare
7245bd5 to
a9fdc71
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/index_page_manager.cpp (1)
91-96: Verify the memory sizing calculation aligns with actual page memory usage.The
IsFull()check multipliesindex_pages_.size()bydata_page_size, but this accounts only for the page buffers allocated from thePagePool. EachMemIndexPageobject has significant per-object overhead—the struct itself contains pointers (next_,prev_,tbl_ident_), IDs, reference counts, and aWaitingZonemember—which adds 80+ bytes per page. Additionally, thestd::vector<std::unique_ptr<MemIndexPage>>container has its own overhead.This underestimation means actual memory pressure can exceed the calculated threshold, potentially exhausting the buffer pool before
IsFull()returns true.
🤖 Fix all issues with AI agents
In @src/tasks/batch_write_task.cpp:
- Around line 1100-1106: The allocation of Page new_page(true) can return a null
pointer and Ptr() is used unguarded in the memcpy; add an immediate null check
after creating new_page (check new_page.Ptr() or new_page.IsValid()) and handle
the failure path (e.g., log/error/throw or return an appropriate sentinel Page)
before calling std::memcpy; update the code around the Page allocation in this
function so the corner-case branch that copies cur_page.data() only runs when
new_page.Ptr() is non-null, otherwise perform the chosen failure handling.
- Line 1143: FinishDataPage() uses the Page returned by Redistribute() without
validating allocation, causing a null dereference in production; modify
FinishDataPage() to check the Page returned by Redistribute() (inspect
page.Ptr() or page.is_valid()) before constructing DataPageIter or calling
iter.HasNext(), and if allocation failed propagate an error/return a failure (or
nullptr/new_page error path) rather than proceeding; reference Redistribute(),
Page (from PagePool()->Allocate()), FinishDataPage(), DataPageIter, page.Ptr(),
and iter.HasNext() when adding the early null-check and error propagation.
🧹 Nitpick comments (1)
src/storage/page.cpp (1)
110-120: Consider making the extension batch size configurable.The hardcoded
1024pages (4MB at default page size) is a reasonable default, but could benefit from being a configurable option inKvOptionsto allow tuning for different workload profiles.💡 Optional: Make extension size configurable
char *PagesPool::Allocate() { if (free_head_ == nullptr) { - Extend(1024); // Extend the pool with 1024 pages if free list is empty. + // Extend the pool if free list is empty. + Extend(options_->page_pool_extend_size); assert(free_head_ != nullptr); }Add to
KvOptions:uint32_t page_pool_extend_size = 1024;
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
include/kv_options.hinclude/storage/page.hinclude/tasks/batch_write_task.hsrc/async_io_manager.cppsrc/storage/data_page.cppsrc/storage/index_page_manager.cppsrc/storage/page.cppsrc/tasks/batch_write_task.cpptests/batch_write.cpptests/concurrency.cpp
💤 Files with no reviewable changes (2)
- tests/concurrency.cpp
- src/async_io_manager.cpp
🧰 Additional context used
🧬 Code graph analysis (5)
include/storage/page.h (1)
src/storage/page.cpp (2)
Extend(95-108)Extend(95-95)
include/tasks/batch_write_task.h (1)
src/tasks/batch_write_task.cpp (4)
Redistribute(1075-1144)Redistribute(1075-1076)Redistribute(1146-1221)Redistribute(1146-1148)
src/storage/index_page_manager.cpp (1)
src/tasks/batch_write_task.cpp (2)
new_page(1067-1067)new_page(1699-1699)
src/storage/data_page.cpp (1)
include/storage/data_page.h (1)
page_(204-205)
src/tasks/batch_write_task.cpp (4)
include/storage/page.h (1)
Page(30-49)src/storage/page.cpp (4)
Page(33-43)Page(45-48)Page(50-53)Page(66-69)src/tasks/write_task.cpp (8)
WritePage(59-64)WritePage(59-59)WritePage(66-71)WritePage(66-66)WritePage(73-80)WritePage(73-73)WritePage(82-124)WritePage(82-82)src/storage/data_page.cpp (3)
OverflowPage(518-522)OverflowPage(524-554)OverflowPage(556-559)
🪛 Cppcheck (2.19.0)
src/storage/page.cpp
[error] 102-102: Mismatching allocation and deallocation
(mismatchAllocDealloc)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (17)
include/kv_options.h (1)
42-45: LGTM!The documentation update accurately reflects the semantic shift from a fixed-capacity buffer pool to a cache-oriented model for index pages.
include/storage/page.h (1)
62-62: LGTM!The signature change from
booltovoidaligns with the shift to assertion-based validation and unbounded pool growth. Callers no longer need to handle allocation failures explicitly since the pool now grows dynamically.include/tasks/batch_write_task.h (1)
52-60: LGTM!The simplified return type from
std::pair<Page, KvError>toPageis consistent with the broader refactoring pattern shifting from explicit error returns to assertion-based validation. The documentation accurately describes the new behavior.src/storage/page.cpp (3)
100-102: Static analysis false positive:aligned_allocwithstd::freeis valid.The cppcheck warning about mismatching allocation/deallocation is incorrect. Per C11/C++17 standards, memory allocated via
aligned_alloccan be freed withfree(). TheUPtrcorrectly uses&std::freeas the deleter.
85-93: LGTM!The constructor initialization is clean with a sensible initial allocation strategy. Starting with
buf_ring_sizepages (or 1 minimum) provides a reasonable baseline for io_uring operations while enabling lazy growth.
95-108: Verify abort-on-OOM is acceptable behavior.The assertion at line 101 (
assert(ptr)) will abort the process ifaligned_allocfails. This is a deliberate shift from potentially returning an error. Ensure this fail-fast behavior is acceptable for all deployment scenarios.src/storage/data_page.cpp (2)
39-42: LGTM!The explicit
page_.Ptr() == nullptrcheck is clearer than relying on an implicit bool conversion and aligns with the removal ofoperator bool()from thePageclass.
524-554: LGTM!The constructor correctly relies on the assertion-based validation in
PagesPool::Allocate. If allocation fails, the program aborts before reaching this code, sopage_.Ptr()is guaranteed valid when used.src/storage/index_page_manager.cpp (1)
51-53: LGTM! Simplified allocation logic.The change from explicit
unique_ptrconstruction and push_back toemplace_backis cleaner and correctly assigns the newly created page tonext_free.tests/batch_write.cpp (4)
130-130: LGTM! Tightened buffer constraint for OOM test.Setting
buffer_pool_size = 4096(equal todata_page_size) correctly constrains the buffer pool to a singleMemIndexPage, which is necessary to trigger theOutOfMemcondition being tested.
164-164: LGTM! Increased initial entries for abort test.Using 4 entries with large values (3500 bytes each) ensures the first write establishes enough state to pin pages before the second write triggers the OOM condition.
190-191: LGTM! Reasonable buffer sizing for concurrent partition test.The buffer pool of
4096 * 400pages provides sufficient capacity to test 3000 partitions while still allowing some OOM conditions to occur under contention.
317-317: LGTM! Minimal buffer for stress test.Setting
buffer_pool_size = opts.data_page_sizecreates a single-page buffer that maximizes memory pressure, appropriately stressing the abort and cleanup paths.src/tasks/batch_write_task.cpp (4)
1041-1042: LGTM! Simplified redistribution call.The change from
std::pair<Page, KvError>to directPagereturn is cleaner. ThePageclass handles allocation internally, and failure can be detected by checkingPage::Ptr()if needed.
1050-1050: LGTM! Direct page allocation.Using
Page(true)directly inSetPageis straightforward. Per thePageconstructor insrc/storage/page.cpp, passingtrueallocates from the page pool.
1371-1372: LGTM! Inlined OverflowPage construction.Constructing
OverflowPagedirectly in theWritePagecall is cleaner than creating a temporary variable. TheOverflowPageconstructor (persrc/storage/data_page.cpplines 523-553) handlesPage(true)allocation internally.
1396-1396: LGTM! Consistent with the pattern above.Same inline construction pattern for
OverflowPagewithout pointers.
Here are some reminders before you submit the pull request
fixes eloqdb/eloqstore#issue_idctest --test-dir build/tests/Summary by CodeRabbit
Documentation
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.