Pick connections based on batch first statement's shard #508
scylla/src/transport/session.rs
@@ -1315,7 +1344,7 @@ impl Session {
             .await
     }
 
-    fn calculate_token(
+    pub fn calculate_token(
I'm not sure if this (plus the added `Hash` for `Token`) is enough to resolve #468; insight welcome.
Hmm, it looks like it's not, as #468 specifically asks for the shard and not only the token (which looks like it would improve batching abilities).
This is a good first step though.
It's unclear to me how this should be done, and it would likely conflict with #491.
Don't worry about conflicts with #491, it's a larger effort and we'll make sure that it works correctly with all previously added features. As for #468, our driver is internally capable of computing which shard owns a particular token (if all the information is properly fetched and available), take a look at this test case of a Sharder:
scylla-rust-driver/scylla/src/routing.rs
Lines 144 to 166 in fd06928
#[cfg(test)]
mod tests {
    use super::Token;
    use super::{ShardCount, Sharder};
    use std::collections::HashSet;

    #[test]
    fn test_shard_of() {
        /* Test values taken from the gocql driver. */
        let sharder = Sharder::new(ShardCount::new(4).unwrap(), 12);
        assert_eq!(
            sharder.shard_of(Token {
                value: -9219783007514621794
            }),
            3
        );
        assert_eq!(
            sharder.shard_of(Token {
                value: 9222582454147032830
            }),
            3
        );
    }
}
So if you go deep enough, it's possible to expose this sharding information publicly.
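For illustration, the computation behind `Sharder::shard_of` can be sketched as a free function. This assumes Scylla's sharding algorithm for the Murmur3 partitioner (bias the token by 2^63, shift left by the "ignore MSB" bit count, then take the high 64 bits of the product with the shard count); the function name and parameters are hypothetical and not the driver's API.

```rust
/// Hypothetical standalone version of the token-to-shard mapping used by
/// Scylla with the Murmur3 partitioner (not the driver's actual `Sharder`).
/// `ignore_msb` corresponds to the `12` passed to `Sharder::new` in the test
/// above and must be less than 64.
pub fn shard_of(token: i64, nr_shards: u32, ignore_msb: u32) -> u32 {
    // Move the signed token range [-2^63, 2^63) onto [0, 2^64).
    let biased = (token as u64).wrapping_add(1u64 << 63);
    // Discard the `ignore_msb` most significant bits; `<<` drops shifted-out bits.
    let shifted = biased << ignore_msb;
    // Scale into [0, nr_shards): keep the high 64 bits of the 128-bit product.
    ((shifted as u128 * nr_shards as u128) >> 64) as u32
}
```

With the gocql-derived values from the test, `shard_of(-9219783007514621794, 4, 12)` and `shard_of(9222582454147032830, 4, 12)` both return 3.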
`Sharder` appears to be a property of a `Node`, so it looks like I'd have to build the query plan, get the corresponding node (what if the plan outputs several nodes? just take the first one?), then get the sharder for that node, and eventually return something like a (node, shard) pair. However, I don't know what we could use as a node identifier to allow users to group by it.
Still, providing an interface for computing the token looks like a good first step: it allows grouping at least by partition, if not by shard, and exposing the shard interface can be done in a later PR.
As it's unlikely I'll have time to implement the shard interface anytime soon, I'd suggest moving forward with this PR without that additional feature.
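A token-only interface is already enough for partition-level grouping. The sketch below assumes a hypothetical `token_of` closure standing in for a call to the now-public `calculate_token` (its exact signature is not shown in this thread), and represents a token as a plain `i64`.

```rust
use std::collections::HashMap;

/// Groups rows by the token they map to, so each group targets one partition.
/// `token_of` is a hypothetical stand-in for a `calculate_token` call.
pub fn group_by_token<R, F>(rows: Vec<R>, mut token_of: F) -> HashMap<i64, Vec<R>>
where
    F: FnMut(&R) -> i64,
{
    let mut groups: HashMap<i64, Vec<R>> = HashMap::new();
    for row in rows {
        // Rows sharing a token land in the same group, in input order.
        groups.entry(token_of(&row)).or_default().push(row);
    }
    groups
}
```

Each resulting group can then be sent as its own batch; grouping by shard instead only changes the key.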
Force-pushed from b7c82e5 to 06a5897.
Force-pushed from c19d684 to 4fb5ca9.
Force-pushed from 4fb5ca9 to d45a6c5.
Alright, I had a go at the patch. My initial reaction was that it's way too complicated, but after thinking it through, I agree that this is the way. Here is a summary to gather my thoughts and make reviewing easier for others. We want a few things from the new
Seeing these requirements, one could attempt to write:
trait BatchValues {
    type BatchValuesIter: BatchValuesIterator;
    fn iter(&self) -> Self::BatchValuesIter;
    fn len(&self) -> usize;
}

trait BatchValuesIterator: Clone {
    fn write_next_to_request(&mut self, buf: &mut impl BufMut) -> Option<()>;
    fn next_serialized(&mut self) -> Option<SerializedResult<'_>>;
}
But there is a problem when the type
trait BatchValues {
    type BatchValuesIter<'a>: BatchValuesIterator<'a>;
    fn iter<'a>(&'a self) -> Self::BatchValuesIter<'a>;
}
But because there are no GATs, we end up with the workaround (which is really cool btw). Looks very nice; I will read the rest and submit a review.
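The GAT workaround pattern can be sketched in isolation: the lifetime moves onto a helper trait, and `BatchValues` requires that helper for every lifetime through a higher-ranked supertrait bound. All names below and the simplified `&[i32]` payload are hypothetical stand-ins, not the driver's definitions. (GATs were later stabilized in Rust 1.65, after which `type BatchValuesIter<'a>` can be written directly.)

```rust
/// Simplified stand-in for an iterator over per-statement values.
/// `Clone` lets a caller peek ahead and then iterate again from the start.
pub trait BatchValuesIterator<'a>: Clone {
    fn next_values(&mut self) -> Option<&'a [i32]>;
}

/// Helper trait carrying the lifetime that a GAT would otherwise provide.
pub trait BatchValuesGatWorkaround<'a> {
    type Iter: BatchValuesIterator<'a>;
}

/// Requires the helper for every lifetime via a higher-ranked supertrait bound.
pub trait BatchValues: for<'a> BatchValuesGatWorkaround<'a> {
    fn batch_values_iter<'a>(&'a self) -> <Self as BatchValuesGatWorkaround<'a>>::Iter;
    fn len(&self) -> usize;
}

#[derive(Clone)]
pub struct SliceIter<'a> {
    inner: std::slice::Iter<'a, Vec<i32>>,
}

impl<'a> BatchValuesIterator<'a> for SliceIter<'a> {
    fn next_values(&mut self) -> Option<&'a [i32]> {
        self.inner.next().map(|v| v.as_slice())
    }
}

impl<'a> BatchValuesGatWorkaround<'a> for Vec<Vec<i32>> {
    type Iter = SliceIter<'a>;
}

impl BatchValues for Vec<Vec<i32>> {
    fn batch_values_iter<'a>(&'a self) -> <Self as BatchValuesGatWorkaround<'a>>::Iter {
        SliceIter { inner: self.iter() }
    }
    fn len(&self) -> usize {
        self.as_slice().len()
    }
}

/// Generic code can peek the first statement (e.g. to pick its shard) and
/// then walk all statements again, without knowing the concrete iterator.
pub fn first_and_all<B: BatchValues>(values: &B) -> (Option<Vec<i32>>, Vec<Vec<i32>>) {
    let mut iter = values.batch_values_iter();
    let first = iter.clone().next_values().map(|v| v.to_vec());
    let mut all = Vec::new();
    while let Some(v) = iter.next_values() {
        all.push(v.to_vec());
    }
    (first, all)
}
```

The iterator type still borrows from `&self` as a GAT would allow, while `first_and_all` stays fully generic over any `B: BatchValues`.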
That was exactly my thought process, thanks a lot for the detailed write-up! 😃
Yeah, at this point it has saved us more than once within Diesel 😊
Looks good, I only left some style comments.
Style is highly subjective, but let's try to keep it consistent with the rest of the code.
I'm worried that our tests might not test all code paths here. In cases where the first
Especially useful in conjunction with batching, even more so with shard-aware batching (scylladb#508)
Returns scylla-rust-driver/scylla/src/transport/session.rs Lines 1346 to 1348 in e484adc
Anyway, I'm just handling it the exact same way as is done in
scylla-rust-driver/scylla/src/transport/session.rs Lines 740 to 743 in e484adc
There is a merge conflict with
Also, could you squash these two commits together? Both of them contain changes related to review comments, but only one is named accordingly.
Apart from rebasing, one more thing I would love to see before merging is #508 (comment). I will be on vacation next week, so I won't be able to merge it then.
Force-pushed from 59cd0c5 to cd22a26.
To achieve this, it was useful to rework the `BatchValues` trait. With this rework, it's also possible to pretty easily provide cloneable `Iterator`s over `ValueList` as `BatchValues` (through `BatchValuesFromIterator`).
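The cloneable-iterator idea can be illustrated with a small adapter: because the wrapped iterator is `Clone`, the batch can inspect the first statement's values (to choose the target shard) and later make a fresh pass over all statements for serialization, without collecting them into a `Vec` first. `FromCloneableIter` and the `shard_for` lookup are hypothetical names; this is not the driver's `BatchValuesFromIterator`.

```rust
/// Hypothetical adapter exposing any cloneable iterator of per-statement
/// values. Cloning is cheap for typical adapters such as `slice::Iter` + `map`.
pub struct FromCloneableIter<I> {
    pub iter: I,
}

impl<I> FromCloneableIter<I>
where
    I: Iterator + Clone,
{
    /// Shard chosen from the first statement only, mirroring the
    /// "first statement's shard" rule; `shard_for` is a hypothetical lookup.
    pub fn first_statement_shard<F>(&self, shard_for: F) -> Option<u32>
    where
        F: Fn(&I::Item) -> u32,
    {
        self.iter.clone().next().map(|first| shard_for(&first))
    }

    /// A fresh pass over every statement, e.g. for serialization.
    pub fn statements(&self) -> I {
        self.iter.clone()
    }
}
```

For example, wrapping `rows.iter().map(|&(pk, _)| pk)` lets the first key decide the shard while the original slice is never copied.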
Force-pushed from cd22a26 to 24ee954.
The `None` case could be added to the other cases as well, but let's avoid spending extra time on the details.
Helps (or arguably fixes) scylladb#468.
For smarter batch composition (following up on scylladb#448), it is necessary to have access to the token a particular statement would be directed to. However, that is pretty difficult to do without access to `calculate_token`. Exposing it was initially part of scylladb#508, but I planned to put it in a separate PR to keep that one minimal (scylladb#508 (comment)). It seems I had forgotten to open that separate PR, and I ended up getting bitten by that now that I want to compose my batches in a smarter way. So here it is.
I'm only writing "helps" above because I think we may also want to expose query planning (`session.plan(prepared_statement, serialized_values) -> impl Iterator<Item = (Arc<Node>, ShardID)>`), as that may make it significantly easier, but I'd like to keep this PR, which just *enables* the ideal behavior, as simple as possible.
Resolves scylladb#468
This is a follow-up on scylladb#508 and scylladb#658:
- To minimize CPU usage related to network operations when inserting a very large number of rows, it is worth batching.
- To batch in the most efficient manner, these batches have to be shard-aware. Since scylladb#508, `batch` picks the shard of the first statement to send the query to. However, it is left to the user to build the batches so that the target shard is the same for all the elements of the batch.
- This was made *possible* by scylladb#658, but it was still very boilerplate-ish. I was waiting for scylladb#612 to be merged (amazing work btw! 😃) to implement a more direct and factored API, since that API would build on it.
- The new ~`Session::first_shard_for_statement(self, &PreparedStatement, &SerializedValues) -> Option<(Node, Option<Shard>)>` makes shard-aware batching easy for users by providing access to the first node and shard of the query plan.
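With a per-statement lookup of the target node and shard, shard-aware batching reduces to grouping rows by that target and splitting each group into batches of bounded size, so every statement in a batch shares the first statement's shard. A minimal sketch, assuming a hypothetical `target_of` closure in place of `Session::first_shard_for_statement` (whose final signature may differ) and a caller-chosen `max_batch_size`:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Splits rows into batches where every batch targets a single key
/// (e.g. a `(node, shard)` pair) and holds at most `max_batch_size` rows.
pub fn shard_aware_batches<R, K, F>(
    rows: Vec<R>,
    max_batch_size: usize,
    mut target_of: F,
) -> Vec<(K, Vec<R>)>
where
    K: Eq + Hash + Clone,
    F: FnMut(&R) -> K,
{
    assert!(max_batch_size > 0, "max_batch_size must be positive");
    let mut open: HashMap<K, Vec<R>> = HashMap::new();
    let mut done = Vec::new();
    for row in rows {
        let key = target_of(&row);
        let batch = open.entry(key.clone()).or_default();
        batch.push(row);
        // Flush a batch as soon as it is full so memory stays bounded.
        if batch.len() == max_batch_size {
            done.push((key, std::mem::take(batch)));
        }
    }
    // Flush the remaining, partially filled batches.
    done.extend(open.into_iter().filter(|(_, b)| !b.is_empty()));
    done
}
```

The order of the leftover batches follows `HashMap` iteration order, which is unspecified; sort the result if deterministic ordering matters.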
Fixes: #448
Helps #468
To achieve this, it was useful to rework the `BatchValues` trait.
With this rework, it's also possible to pretty easily provide cloneable `Iterator`s over `ValueList` as `BatchValues` (through `BatchValuesFromIterator`), so #499 may get closed if this gets merged (or this PR will require an update if #499 gets merged first).

Pre-review checklist
`Fixes:` annotations to PR description.