Add Merge gRPC method to index node writer #1860
This pull request has been linked to Shortcut Story #8997: Add |
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1860   +/-   ##
=======================================
  Coverage   84.34%   84.34%
=======================================
  Files         328      328
  Lines       18773    18773
=======================================
  Hits        15834    15834
  Misses       2939     2939

Flags with carried forward coverage won't be shown.
Looks great!
@jotare can you add some tests to make sure all the wiring between layers is correct?
It doesn't need to be a functional test, just an integration one.
let force_merge_capacity = 100;
let mut live_segments: Vec<_> = state.dpid_iter().collect();
let mut buffer = Vec::with_capacity(force_merge_capacity);

while buffer.len() < force_merge_capacity {
    let Some(journal) = live_segments.pop() else {
        break;
    };
    buffer.push((state.delete_log(journal), journal.id()));
}
Maybe we should limit by the number of nodes per segment? E.g.:
- let force_merge_capacity = 100;
- let mut live_segments: Vec<_> = state.dpid_iter().collect();
- let mut buffer = Vec::with_capacity(force_merge_capacity);
- while buffer.len() < force_merge_capacity {
-     let Some(journal) = live_segments.pop() else {
-         break;
-     };
-     buffer.push((state.delete_log(journal), journal.id()));
- }
+ let force_merge_capacity = 50_000;
+ let mut live_segments: Vec<_> = state.dpid_iter().collect();
+ let mut buffer = Vec::with_capacity(force_merge_capacity);
+ let mut nodes_to_merge = 0;
+ while nodes_to_merge < force_merge_capacity {
+     let Some(journal) = live_segments.pop() else {
+         break;
+     };
+     if journal.no_nodes() > force_merge_capacity {
+         live_segments.push(journal);
+     } else {
+         buffer.push((state.delete_log(journal), journal.id()));
+         nodes_to_merge += journal.no_nodes();
+     }
+ }
Or maybe something simpler: just skip segments that are large enough.
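For illustration, the simpler "skip large segments" idea could look roughly like the sketch below. `SegmentInfo`, `pick_small_segments`, and both limits are hypothetical stand-ins; the real segment type and thresholds in this PR are not shown in the thread.

```rust
/// Hypothetical stand-in for a live segment (not the PR's real type).
#[derive(Debug, Clone, Copy)]
struct SegmentInfo {
    id: u64,
    no_nodes: usize,
}

/// Returns the ids of at most `max_segments` segments, ignoring any segment
/// that holds more than `max_nodes_per_segment` nodes.
fn pick_small_segments(
    live: &[SegmentInfo],
    max_segments: usize,
    max_nodes_per_segment: usize,
) -> Vec<u64> {
    live.iter()
        .rev() // visit from the back, mirroring the `pop()` order used above
        .filter(|s| s.no_nodes <= max_nodes_per_segment) // skip large segments
        .take(max_segments) // keep the original cap on segment count
        .map(|s| s.id)
        .collect()
}
```

This keeps the segment-count cap, which is easy to test, and adds the size filter on top of it.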
Seems like another good metric, definitely useful. However, to make the right decision we should get more info for the threshold, right? I went with this because testing is easier, tbh...
I will add the skipping of large segments though!
If a segment is big enough this is an infinite loop though:
while nodes_to_merge < force_merge_capacity {
    let Some(journal) = live_segments.pop() else {
        break;
    };
    if journal.no_nodes() > force_merge_capacity {
        live_segments.push(journal);
    } else {
        buffer.push((state.delete_log(journal), journal.id()));
        nodes_to_merge += journal.no_nodes();
    }
}
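The loop never ends because an oversized segment is popped and pushed straight back onto the same vector, so the next `pop()` returns it again and `nodes_to_merge` never grows. One possible fix, sketched here with a hypothetical `Segment` type and `select_for_merge` helper (neither is from the PR), is to drop oversized segments from consideration instead of re-queuing them:

```rust
/// Hypothetical stand-in for a live segment; the real type is not shown here.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Segment {
    id: u64,
    no_nodes: usize,
}

/// Collects segments until at least `capacity` nodes are gathered or the
/// candidates run out. Oversized segments are discarded rather than pushed
/// back, so each iteration consumes one candidate and the loop terminates.
fn select_for_merge(mut live_segments: Vec<Segment>, capacity: usize) -> Vec<Segment> {
    let mut buffer = Vec::new();
    let mut nodes_to_merge = 0;
    while nodes_to_merge < capacity {
        let Some(segment) = live_segments.pop() else {
            break; // no candidates left
        };
        if segment.no_nodes > capacity {
            continue; // too large to merge: skip it for good
        }
        nodes_to_merge += segment.no_nodes;
        buffer.push(segment);
    }
    buffer
}
```

With a single vector, progress is guaranteed only if every iteration removes an element; keeping skipped segments in a second vector would work equally well if they are needed afterwards.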
Yeah, I somehow thought live_segments was two separate vecs! 😓
Threshold added!
Co-authored-by: Javier Torres <javier@javiertorres.eu>
Description
Describe the proposed changes made in this PR.
How was this PR tested?
Describe how you tested this PR.