Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Merge gRPC method to index node writer #1860

Conversation

jotare
Copy link
Contributor

@jotare jotare commented Feb 21, 2024

Description

Describe the proposed changes made in this PR.

How was this PR tested?

Describe how you tested this PR.

@jotare jotare requested a review from a team February 21, 2024 10:18
Copy link

This pull request has been linked to Shortcut Story #8997: Add Merge gRPC method to index node writer interface.

Copy link

codecov bot commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e1e1c23) 84.34% compared to head (f381071) 84.34%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1860   +/-   ##
=======================================
  Coverage   84.34%   84.34%           
=======================================
  Files         328      328           
  Lines       18773    18773           
=======================================
  Hits        15834    15834           
  Misses       2939     2939           
Flag Coverage Δ
ingest 69.72% <ø> (ø)
node-sidecar 95.32% <ø> (ø)
sdk 87.85% <ø> (ø)
utils 81.81% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@hermeGarcia hermeGarcia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Copy link
Contributor

@lferran lferran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jotare can you add some tests to make sure all the wiring between layers is correct?
Doesn't need to be a functional test, just a integration one

Comment on lines 295 to 304
let force_merge_capacity = 100;
let mut live_segments: Vec<_> = state.dpid_iter().collect();
let mut buffer = Vec::with_capacity(force_merge_capacity);

while buffer.len() < force_merge_capacity {
let Some(journal) = live_segments.pop() else {
break;
};
buffer.push((state.delete_log(journal), journal.id()));
}
Copy link
Contributor

@javitonino javitonino Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should limit by number of nodes per segment? e.g:

Suggested change
let force_merge_capacity = 100;
let mut live_segments: Vec<_> = state.dpid_iter().collect();
let mut buffer = Vec::with_capacity(force_merge_capacity);
while buffer.len() < force_merge_capacity {
let Some(journal) = live_segments.pop() else {
break;
};
buffer.push((state.delete_log(journal), journal.id()));
}
let force_merge_capacity = 50_000;
let mut live_segments: Vec<_> = state.dpid_iter().collect();
let mut buffer = Vec::with_capacity(force_merge_capacity);
let mut nodes_to_merge = 0;
while nodes_to_merge < force_merge_capacity {
let Some(journal) = live_segments.pop() else {
break;
};
if journal.no_nodes() > force_merge_capacity {
live_segments.push(journal);
} else {
buffer.push((state.delete_log(journal), journal.id()));
nodes_to_merge += journal.no_nodes();
}
}

Or maybe something more simple and just skip large enough segments.

Copy link
Contributor

@hermeGarcia hermeGarcia Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like another good metric, definitely useful, however to make the right decision we should get more info for the threshold right? I went with this because testing is easier tbh...
I will add the skipping of large segments though!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a segment is big enough this is an infinite loop though:

while nodes_to_merge < force_merge_capacity {
            let Some(journal) = live_segments.pop() else {
                break;
            };
            if journal.no_nodes() > force_merge_capacity {
              live_segments.push(journal);
            } else {
              buffer.push((state.delete_log(journal), journal.id()));
              nodes_to_merge += journal.no_nodes();
            }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I somehow thought live_segments was two separate vecs! 😓

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Threshold added!

hermeGarcia and others added 2 commits February 21, 2024 14:03
Co-authored-by: Javier Torres <javier@javiertorres.eu>
@jotare jotare marked this pull request as ready for review February 21, 2024 13:54
@jotare jotare merged commit 20126e2 into main Feb 21, 2024
105 checks passed
@jotare jotare deleted the joanantoniriera4168/sc-8997/add-merge-grpc-method-to-index-node-writer branch February 21, 2024 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants