Skip to content

Insert YieldToNextRound in compaction to avoid affecting online tasks #293

Merged
thweetkomputer merged 1 commit intomainfrom
yield_compact_zc
Jan 16, 2026
Merged

Insert YieldToNextRound in compaction to avoid affecting online tasks #293
thweetkomputer merged 1 commit intomainfrom
yield_compact_zc

Conversation

@thweetkomputer
Copy link
Collaborator

@thweetkomputer thweetkomputer commented Jan 16, 2026

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/eloqstore#issue_id
  • Reference the link of RFC if exists
  • Pass ctest --test-dir build/tests/

Summary by CodeRabbit

  • Refactor
    • Improved file garbage collection internals for faster, more memory-efficient set handling and cooperative yielding to reduce long blocking work.
    • Enhanced background write/compaction with yield-aware sorting, more granular progress points, and clearer log messages for better responsiveness and observability.
    • Mapping snapshot handling updated to better support shared usage patterns during background tasks.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 16, 2026

Walkthrough

Replaces std::unordered_set with absl::flat_hash_set across file-GC APIs, adds cooperative yielding to GC and background compaction (including a heap-based yielding sort), and makes MappingSnapshot inherit enable_shared_from_this for shared_ptr use.

Changes

Cohort / File(s) Summary
File GC API changes
include/file_gc.h, src/file_gc.cpp
Replaced std::unordered_set<FileId> with absl::flat_hash_set<FileId> in public/internal GC signatures (GetRetainedFiles, ExecuteLocalGC, ExecuteCloudGC, DeleteUnreferencedLocalFiles, DeleteUnreferencedCloudFiles) and added absl/container/flat_hash_set.h.
Background write / compaction
include/tasks/background_write.h, src/tasks/background_write.cpp
Added BackgroundWrite::HeapSortFpIdsWithYield(std::vector<std::pair<FilePageId, PageId>>& ), replaced direct std::sort with heap-based yielding sort, inserted multiple YieldToNextRound points, and adjusted compaction logging.
WriteTask GC trigger updates
src/tasks/write_task.cpp
Switched retained-files container to absl::flat_hash_set<FileId>, precomputed/reserved capacity, copied mapping snapshots into a local array, and yields between snapshot processing in TriggerFileGC.
MappingSnapshot ownership
include/storage/page_mapper.h
MappingSnapshot now publicly inherits from std::enable_shared_from_this<MappingSnapshot> to support shared_from_this usage.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant WriteTask
participant SnapArray as MappingSnapshots
participant FileGC as GetRetainedFiles
participant GCExec as ExecuteLocal/ExecuteCloudGC
participant StoreMgr as IouringMgr/CloudStoreMgr

WriteTask->>SnapArray: copy/reserve mapping snapshots
loop per snapshot
  WriteTask->>SnapArray: request snapshot
  SnapArray->>FileGC: iterate pages (yield periodically)
  FileGC-->>WriteTask: add FileId to retained_files
  WriteTask->>WriteTask: YieldToNextRound
end
WriteTask->>GCExec: call ExecuteLocalGC / ExecuteCloudGC with retained_files
GCExec->>StoreMgr: delete/unreference files (may yield)
StoreMgr-->>GCExec: report results
GCExec-->>WriteTask: completion status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • eatbreads

Poem

🐇
I hop through hashed fields, small and bright,
Yield between chores from morning to night,
Heap-sort hums softly, pages align,
Files tucked away in a neat little line,
A happy burrow — code sleeps tight. 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description includes the template checklist but lacks critical details: no issue reference, no RFC reference, and no actual implementation details or rationale explaining why these changes are needed. Add linked issue reference using 'fixes eloqdb/eloqstore#issue_id' format and provide a brief description of the changes made and their rationale.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: inserting YieldToNextRound calls into the compaction process to prevent it from blocking online tasks.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch yield_compact_zc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@include/file_gc.h`:
- Line 3: Remove the now-unused include <unordered_set> from include/file_gc.h:
locate the top-of-file include list (where `#include` <unordered_set> appears) and
delete that line since the code now uses absl::flat_hash_set<FileId> (and
references to FileId) instead of std::unordered_set, ensuring no other parts of
include/file_gc.h depend on <unordered_set>.

In `@include/tasks/task.h`:
- Around line 3-4: Remove the unused include by deleting the line '#include
<butil/time.h>' from the header; the Task-related declarations (e.g., class Task
and the Yield() and YieldToNextRound() prototypes) do not depend on butil/time.h
since their implementations in src/tasks/task.cpp only use coroutine resumption,
so simply remove that include from include/tasks/task.h to avoid an unnecessary
dependency.

In `@src/storage/shard.cpp`:
- Line 3: Remove the unused include directive "#include <butil/time.h>" from the
top of the file; the file uses std::chrono for time operations (see usages
around functions handling time near lines with std::chrono) so delete the
include to minimize headers and rebuild to ensure no butil symbols are
referenced (check functions like any time-related helpers in shard.cpp).
🧹 Nitpick comments (3)
src/tasks/background_write.cpp (2)

161-169: Consider reducing consecutive yields.

Lines 161, 164, and 169 have three YieldToNextRound() calls in quick succession with minimal work between them (sorting already yielded, and reserve is typically fast). Consider consolidating to a single yield after HeapSortFpIdsWithYield completes.

♻️ Suggested consolidation
     }
-    YieldToNextRound();
     assert(fp_ids.size() == mapping_cnt);
     HeapSortFpIdsWithYield(fp_ids);
     YieldToNextRound();

     constexpr uint8_t max_move_batch = max_read_pages_batch;
     std::vector<Page> move_batch_buf;
     move_batch_buf.reserve(max_move_batch);
-    YieldToNextRound();
     std::vector<FilePageId> move_batch_fp_ids;

182-194: Redundant reset of round_cnt after yield.

When (round_cnt & 0xFFFF) == 0 is true (including when round_cnt == 0), you yield and then reset round_cnt = 0. Since the condition already guarantees round_cnt is a multiple of 65536 (or 0), resetting to 0 loses the count history. This means:

  1. On the first iteration with round_cnt = 0, you yield immediately before any work
  2. For files with no pages in the inner loop, you'll yield on every outer iteration

If the intent is to yield periodically based on cumulative work, simply remove the reset. If the intent is to always yield after processing 65536 items, the current logic works but the initial yield (when round_cnt == 0) may be unintentional.

♻️ Option A: Remove reset to track cumulative work
     size_t round_cnt = 0;
     for (FileId file_id = begin_file_id; file_id < end_file_id; file_id++)
     {
-        if ((round_cnt & 0xFFFF) == 0)
+        if (round_cnt >= 0x10000)
         {
             YieldToNextRound();
             round_cnt = 0;
         }
♻️ Option B: Initialize to 1 to avoid immediate yield
-    size_t round_cnt = 0;
+    size_t round_cnt = 1;
     for (FileId file_id = begin_file_id; file_id < end_file_id; file_id++)
     {
         if ((round_cnt & 0xFFFF) == 0)
         {
             YieldToNextRound();
-            round_cnt = 0;
+            round_cnt = 1;
         }
src/file_gc.cpp (1)

24-41: Guard ThdTask() before yielding.
ThdTask() is dereferenced unconditionally; if GetRetainedFiles ever runs outside a task context (e.g., background/offline GC or shutdown), this becomes a hard crash. Please confirm the invariant or add a nullptr guard.

♻️ Defensive guard
-        if ((page_id & 0xFF) == 0)
-        {
-            ThdTask()->YieldToNextRound();
-        }
+        if ((page_id & 0xFF) == 0)
+        {
+            if (auto *task = ThdTask())
+            {
+                task->YieldToNextRound();
+            }
+        }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9a9675 and e3fb838.

📒 Files selected for processing (7)
  • include/file_gc.h
  • include/tasks/background_write.h
  • include/tasks/task.h
  • src/file_gc.cpp
  • src/storage/shard.cpp
  • src/tasks/background_write.cpp
  • src/tasks/write_task.cpp
🧰 Additional context used
🧬 Code graph analysis (5)
include/file_gc.h (1)
src/file_gc.cpp (4)
  • GetRetainedFiles (24-43)
  • GetRetainedFiles (24-26)
  • ExecuteCloudGC (529-572)
  • ExecuteCloudGC (529-531)
src/file_gc.cpp (1)
src/tasks/task.cpp (2)
  • ThdTask (350-353)
  • ThdTask (350-350)
include/tasks/background_write.h (1)
src/tasks/background_write.cpp (2)
  • HeapSortFpIdsWithYield (61-93)
  • HeapSortFpIdsWithYield (61-62)
src/tasks/background_write.cpp (1)
src/tasks/task.cpp (2)
  • YieldToNextRound (29-33)
  • YieldToNextRound (29-29)
src/tasks/write_task.cpp (1)
src/file_gc.cpp (2)
  • GetRetainedFiles (24-43)
  • GetRetainedFiles (24-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (13)
include/tasks/background_write.h (2)

3-5: LGTM!

The added includes for <utility> and <vector> are necessary for the new HeapSortFpIdsWithYield method signature.


25-27: LGTM!

The private helper method declaration is well-placed and appropriately scoped. The implementation (visible in src/tasks/background_write.cpp) correctly uses incremental heap operations with periodic yields to enable cooperative multitasking during sorting.

src/tasks/background_write.cpp (3)

48-59: LGTM!

The comparator correctly orders by FilePageId first, then by PageId for ties. Placing it in an anonymous namespace appropriately limits its linkage.


61-93: LGTM - well-designed cooperative sorting.

The heap sort implementation correctly:

  • Early-returns for trivial cases
  • Incrementally builds the heap with push_heap
  • Extracts in sorted order with pop_heap
  • Yields every 256 operations to give online tasks a chance to run

95-284: Good addition of cooperative yielding in compaction.

The overall approach of inserting YieldToNextRound() at key points in the compaction flow (page collection, sorting, main loop) effectively breaks up the long-running operation to avoid starving online tasks. The log messages at start and finish provide useful observability.

include/file_gc.h (5)

6-6: LGTM!

absl::flat_hash_set is a good choice over std::unordered_set for this use case - it provides better cache locality and typically faster iteration/lookup performance.


17-19: LGTM!

The signature change to use absl::flat_hash_set<FileId> is consistent with the implementation in src/file_gc.cpp which also adds cooperative yielding during the file enumeration.


28-35: LGTM!

The GC execution functions consistently use the new absl::flat_hash_set<FileId> type for retained files.


42-47: LGTM!

Consistent type update for DeleteUnreferencedLocalFiles.


74-79: LGTM!

Consistent type update for DeleteUnreferencedCloudFiles.

src/file_gc.cpp (1)

11-11: Flat_hash_set migration looks consistent.
Nice end‑to‑end switch to absl::flat_hash_set across GC APIs. Please confirm Abseil’s container target is linked for all binaries compiling this file to avoid missing‑symbol or include issues.

Also applies to: 49-50, 374-375, 452-453, 530-531

src/tasks/write_task.cpp (2)

10-12: Includes align with new usage.
The added headers match the new snapshot_array and flat_hash_set usage; please ensure the Abseil container dependency is linked for this target as well.


393-413: Verify mapping snapshot lifetime across yields.
snapshot_array stores raw MappingSnapshot* and the loop yields after each snapshot. If another task can retire snapshots during a yield, these pointers may become invalid. Please confirm snapshots are pinned/ref‑counted for the duration or hold strong refs before yielding.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

format

fix

fix
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/file_gc.cpp (1)

24-41: Guard YieldToNextRound against missing task context.

GetRetainedFiles is a public API (include/file_gc.h:16) with no documented task context requirement, yet it unconditionally calls ThdTask()->YieldToNextRound() at line 40. Since ThdTask() (src/tasks/task.cpp:350-353) returns shard->running_ with no null check, and shard->running_ is explicitly set to nullptr when no task is active (src/storage/shard.cpp:390), the function will crash if invoked outside task context. Add a defensive guard:

-        if ((page_id & 0xFF) == 0)
-        {
-            ThdTask()->YieldToNextRound();
-        }
+        if ((page_id & 0xFF) == 0)
+        {
+            if (auto *task = ThdTask())
+            {
+                task->YieldToNextRound();
+            }
+        }

Alternatively, document that GetRetainedFiles requires execution under a KvTask context.

🤖 Fix all issues with AI agents
In `@src/tasks/write_task.cpp`:
- Around line 393-413: meta->mapping_snapshots_ may contain raw MappingSnapshot*
so calling mapping->shared_from_this() can throw std::bad_weak_ptr; inspect how
MappingSnapshot instances are created and owned and ensure they derive from
std::enable_shared_from_this and are always created/held in std::shared_ptr for
the duration of GC, or stop calling shared_from_this and change
meta->mapping_snapshots_ to hold std::shared_ptr<MappingSnapshot> directly.
Concretely, update the producer code that populates mapping_snapshots_ to use
std::make_shared<MappingSnapshot> (or otherwise transfer ownership into a
std::shared_ptr), change the mapping_snapshots_ container type to
std::vector<std::shared_ptr<MappingSnapshot>> (or similar) so the loop that
builds snapshot_array can copy those shared_ptrs instead of calling
MappingSnapshot::shared_from_this(), and ensure MappingSnapshot inherits
enable_shared_from_this if shared_from_this is still needed.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e80a5c8 and 548ca0c.

📒 Files selected for processing (6)
  • include/file_gc.h
  • include/storage/page_mapper.h
  • include/tasks/background_write.h
  • src/file_gc.cpp
  • src/tasks/background_write.cpp
  • src/tasks/write_task.cpp
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/tasks/background_write.cpp
  • include/storage/page_mapper.h
  • include/file_gc.h
  • include/tasks/background_write.h
🧰 Additional context used
🧬 Code graph analysis (2)
src/tasks/write_task.cpp (1)
src/file_gc.cpp (2)
  • GetRetainedFiles (24-43)
  • GetRetainedFiles (24-26)
src/file_gc.cpp (1)
src/tasks/task.cpp (2)
  • ThdTask (350-353)
  • ThdTask (350-350)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (3)
src/tasks/write_task.cpp (1)

10-12: LGTM on the new includes.

src/file_gc.cpp (2)

11-11: LGTM.


48-50: LGTM on the signature propagation.

Also applies to: 371-375, 449-453, 529-531

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants