
Add rack-awareness to default LBP #666

Merged — 3 commits merged on Apr 5, 2023

Conversation

@Gor027 (Contributor) commented Mar 20, 2023

Added a rack-awareness option to prefer replicas in the local rack when picking a coordinator. It allows creating a top-level tier of local-datacenter, local-rack replicas in the fallback plan.

Fixes: #604

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I added appropriate Fixes: annotations to PR description.

@piodul (Collaborator) left a comment

You added the implementation of rack awareness, but I can't see a way to enable it from the user's perspective. The DefaultPolicyBuilder should be adjusted.
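For context, a minimal sketch of how the builder could expose this (the prefer_rack name mirrors what the PR adds later in this thread; the preferred_rack field on the builder is assumed):

    impl DefaultPolicyBuilder {
        /// Prefer replicas from the given rack within the preferred datacenter.
        /// Has no effect unless a preferred datacenter is also set.
        pub fn prefer_rack(mut self, rack_name: String) -> Self {
            // `preferred_rack` is an assumed builder field holding the option.
            self.preferred_rack = Some(rack_name);
            self
        }
    }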

@@ -1066,6 +1104,31 @@ mod tests {
.group([B, C, E]) // remote nodes
.build(),
},
// Keyspace SS with RF=2 with enabled DC failover
Collaborator:

Please adjust this description to mention that this test also enables rack-awareness

// us eu eu us eu eu us
// r2 r1 r1 r1 r2 r1 r1
expected_groups: ExpectedGroupsBuilder::new()
.group([G, B]) // pick + fallback local replicas
Collaborator:

Shouldn't those be in separate groups? G is in the local rack but B is not.
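For illustration, the split being requested would look roughly like this in the test (a sketch reusing the ExpectedGroupsBuilder calls already present; the remaining tiers are elided):

    expected_groups: ExpectedGroupsBuilder::new()
        .group([G]) // local rack replica
        .group([B]) // remaining local DC replica
        // ...remaining tiers unchanged
        .build(),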

@Gor027 (Contributor, author) commented Mar 21, 2023

V2:

  • The pick() function will prefer local rack replicas, and if there are none, it will prioritize local DC replicas.
  • The fallback() function will have a new tier in the load balancing order prioritizing local DC local rack replicas in the fallback plan.

Addressed all suggestions mentioned above.

@piodul (Collaborator) commented Mar 22, 2023

Please rebase on main. The PR doesn't seem to account for the newly added option that disables replica shuffling.

@Gor027 force-pushed the rack-awareness branch 2 times, most recently from 1e9e7cf to acb2eda on March 22, 2023 at 10:36
@wprzytula (Collaborator):
Now that we want to filter replicas not only by DC but also by rack, we should consider adding this option to the replica locator, similarly to how filtering by DC is currently implemented. WDYT @havaker @cvybhu?
Alternatively, we can filter replicas by rack at a higher level (as this commit attempts atm), but this is an inconsistency, and a probable source of inefficiency.

Comment on lines 137 to 161
let local_rack_replicas = self.replicas(
ts,
true,
|node| {
Self::is_alive(node)
&& self.preferred_rack.is_some()
&& DefaultPolicy::rack_predicate(node, &self.preferred_rack)
},
cluster,
);
let local_replicas = self.shuffled_replicas(
ts,
true,
|node| {
Self::is_alive(node)
&& (self.preferred_rack.is_none()
|| !DefaultPolicy::rack_predicate(node, &self.preferred_rack))
},
cluster,
);
Contributor:

Why are local_rack_replicas not chosen with self.shuffled_replicas?


Comment on lines 309 to 326
fn pick_local_rack_replica<'a>(
&'a self,
ts: &TokenWithStrategy<'a>,
should_be_local: bool,
predicate: &impl Fn(&NodeRef<'a>) -> bool,
cluster: &'a ClusterData,
) -> Option<NodeRef<'a>> {
if let Some(rack) = &self.preferred_rack {
return self
.nonfiltered_replica_set(ts, should_be_local, cluster)
.choose_filtered(&mut thread_rng(), |node| {
predicate(&node) && node.rack.as_deref() == Some(rack)
});
}

None
}

Contributor:

In my opinion, creating a distinct function for selecting local rack replicas is unnecessary. Instead, the desired outcome can be accomplished by utilizing a suitable predicate with the pick_replicas function.

Furthermore, it appears that the modifications introduced by rebasing were not taken into account. While pick_replica has the ability to select from an unshuffled sequence, this capability is not present in pick_local_rack_replica, which results in a lack of support for the enable_shuffling_replicas(false) option.

Overall, I believe this reinforces the argument against replicating the functionality of pick_replica and instead utilizing a custom predicate to choose local rack replicas.
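A rough sketch of the suggested reuse, assuming pick_replica takes the same (ts, should_be_local, predicate, cluster) arguments as pick_local_rack_replica quoted above (the exact signature may differ):

    // Hypothetical call site: select a local-rack replica through pick_replica
    // by tightening the predicate, instead of using a dedicated helper.
    let picked = self.pick_replica(
        ts,
        true, // restrict to the local DC
        &|node: &NodeRef<'a>| {
            Self::is_alive(node)
                && self.preferred_rack.is_some()
                && node.rack == self.preferred_rack
        },
        cluster,
    );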

// us eu eu us eu eu us
// r2 r1 r1 r1 r2 r1 r1
expected_groups: ExpectedGroupsBuilder::new()
.group([G]) // pick local rack replicas
Contributor:

Having a single element rack group fails to test shuffling.

Contributor:

Please add a test that checks shuffling of local-rack replicas.

Contributor (author):

As I understand from reading the ExpectedGroups::assert_proper_grouping_in_plans code, it does not test the shuffling of nodes; it just verifies that the plan contains all expected nodes in the group. So does it matter whether there are one or more replicas if you collect them with group?

Contributor:

I think it does test shuffling.

ExpectedGroup::NonDeterministic(s) => {
// The group is supposed to have non-deterministic
// ordering. If the group size is larger than one,
// then expect there to be more than one group
// in the set.
if gots.len() > 1 && s.len() > 1 {
assert!(sets.len() > 1);
}
}

After some thought, though, I realized that it would be quite difficult to test whether the returned local-rack replicas are shuffled. It is because local-rack replicas are always first in the plan (if available), and hence the first element of the plan is a picked local-rack replica. If pick chooses nodes in a random way but fallback doesn't, the returned group in the plan is not deterministic anyway. Our testing framework has no way of detecting that a group is only partially shuffled.

I don't know if adding new functionality to the testing framework is worth it. I guess that I'm withdrawing my request for a test that checks shuffling of local-rack replicas. @wprzytula any thoughts on this?

Collaborator:

I think it would be nice to at least verify that pick chooses rack-local replicas randomly. Having a test with the "local rack replicas" group of size 2 should check this.

Later we can consider adjusting the tests - for example, in addition to checking plans that combine pick and fallback we could also check plans returned by fallback only.

/// in the preferred datacenter, and then the other replicas in the datacenter.
///
/// When a preferred datacenter is not set, setting preferred rack will not have any effect.
pub fn prefer_rack(mut self, rack_name: String) -> Self {
Contributor:

Please add this function's documentation to the doc book.

Comment on lines 142 to 143
&& self.preferred_rack.is_some()
&& DefaultPolicy::rack_predicate(node, &self.preferred_rack)
Contributor:

I would prefer to move the check that preferred_rack is not None into rack_predicate. It would reduce the complexity of this expression.

@@ -353,6 +405,10 @@ impl DefaultPolicy {
node.is_enabled()
}

fn rack_predicate(node: &NodeRef<'_>, preferred_rack: &Option<String>) -> bool {
Contributor:

The other predicates are named after the condition they check, such as is_alive, which verifies that a node is alive (it doesn't at the moment, but it will ;)). It may be worthwhile to rename rack_predicate to adhere to this convention, e.g. is_rack_local.
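A minimal sketch combining both suggestions above (folding the None check in and naming the predicate after the condition it checks); illustrative only, not the actual patch:

    fn is_rack_local(node: &NodeRef<'_>, preferred_rack: &Option<String>) -> bool {
        // With no preferred rack configured, no node counts as rack-local.
        match preferred_rack {
            Some(rack) => node.rack.as_deref() == Some(rack.as_str()),
            None => false,
        }
    }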

@havaker (Contributor) commented Mar 22, 2023

Now that we want filter replicas not only by DC, but also by rack, we should consider adding this option to the replica locator, similarly to how filtering by DC is currently implemented. WDYT @havaker @cvybhu? Alternatively, we can filter replicas by rack on the higher level (as this commit attempts atm), but this is an inconsistency, and a probable source of inefficiency.

I was the one who suggested filtering replicas by rack at the higher level to @Gor027 ;)

The idea of putting more filtering options into the locator came to my mind, but I'm not a fan of it.
The required modifications would be complex (and API-breaking):

  • Rack aware filtering does not fit into the current locator design as easily as DC filtering.
  • It is not always desired to precompute rack-filtered replica sets - session-wide configuration options would need to be exposed.

To simplify the scope of this pull request I would go with predicate filtering. We could later assess the performance gains of moving such filtering to the locator module and implement it if needed.

@Gor027 (Contributor, author) commented Mar 26, 2023

V3:

  • Picking a local rack replica is now implemented using pick_replica:
    As @wprzytula suggested above, the should_be_local parameter of pick_replica is replaced by a ReplicaLocation enum, which resulted in the following change in the function:
        let (predicate, should_be_local): (Box<dyn Fn(&NodeRef<'a>) -> bool>, bool) =
            match replica_location {
                ReplicaLocation::Any => (Box::new(|node| predicate(node)), false),
                ReplicaLocation::Datacenter => (Box::new(|node| predicate(node)), true),
                ReplicaLocation::DatacenterAndRack => (
                    Box::new(|node| predicate(node) && node.rack == self.preferred_rack),
                    true,
                ),
            };
    
    
    Even though the signatures of the closures in the above match arms are the same, their types are different, so I could not find a more elegant solution than to wrap the returned predicates in Box. As a consequence, the predicate variable has a complex type and Clippy started complaining about it, so for now the warning is suppressed with #[allow(clippy::type_complexity)] (a boxing-free alternative is sketched after this list).
  • Added tests for rack awareness and fixed support for the option that enables/disables replica shuffling
  • Added documentation in the docs book
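A sketch of the boxing-free alternative referenced in the first bullet (this is essentially the shape the final patch uses, as quoted further down in the thread): a single closure branches on replica_location internally, so no Box or #[allow(clippy::type_complexity)] is needed:

    // Sketch: one closure, branching on replica_location inside, avoids boxing.
    let predicate = move |node| match replica_location {
        ReplicaLocation::Any | ReplicaLocation::Datacenter => predicate(&node),
        ReplicaLocation::DatacenterAndRack => {
            predicate(&node) && node.rack == self.preferred_rack
        }
    };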

@@ -13,6 +13,12 @@ use scylla_cql::{errors::QueryError, frame::types::SerialConsistency, Consistenc
use std::{fmt, sync::Arc, time::Duration};
use tracing::warn;

pub enum ReplicaLocation {
Contributor:

Why is it public?

Contributor (author):

It was added by mistake, fixed it.

Comment on lines 153 to 161
let local_rack_replicas = self.replicas(
ts,
true,
|node| {
Self::is_alive(node)
&& self.preferred_rack.is_some()
&& node.rack == self.preferred_rack
},
cluster,
);
let local_replicas = self.shuffled_replicas(
ts,
true,
|node| {
Self::is_alive(node)
&& (self.preferred_rack.is_none() || node.rack != self.preferred_rack)
},
cluster,
);
Contributor:

self.replicas, self.shuffled_replicas and self.pick_replica had very similar signatures. I feel that it's a bit inconsistent that you introduced a ReplicaLocation enum only in self.pick_replica. Was there any specific reason behind it?

Contributor (author):

I changed the implementation of retrieving local rack replicas to use shuffled_replicas and appropriately changed the should_be_local: bool to replica_location: ReplicaLocation enum in all of the methods.

Contributor:

After your changes, the semantics of self.shuffled_replicas and self.pick_replica differ: self.pick_replica modifies its predicate according to the value of the ReplicaLocation argument, but self.shuffled_replicas does not. Why?

Contributor (author):

You're right; the predicate logic was passed in the fallback function, so I thought it was redundant to modify the predicate in self.replicas. I fixed it, but the predicate modification required the ReplicaLocation enum to implement Clone and Copy.



Contributor:

Please update the Creating a DefaultPolicy section.

Contributor (author):

Done.

@havaker (Contributor) commented Mar 29, 2023

Our convention regarding commit names is something like module_name: short description of changes. I don't see you sticking to it :/

@Gor027 force-pushed the rack-awareness branch 3 times, most recently from 280a1e1 to 90edd07 on March 30, 2023 at 12:34
@@ -68,6 +69,21 @@ or refrain from preferring datacenters (which may ban all other datacenters, if
}
}
if let Some(ts) = &routing_info.token_with_strategy {
// Try to pick some alive local rack random replica.
// If preferred rack is not specified, try to pick local DC replica.
if let Some(_local_rack) = &self.preferred_rack {
Contributor:

if let is a bit superfluous here; you are creating an unused _local_rack variable. if self.preferred_rack.is_some() would probably be a better choice there.
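The suggested form would look roughly like this (a sketch based on the quoted diff, with the picking logic elided):

    if let Some(ts) = &routing_info.token_with_strategy {
        // Try to pick some alive local-rack replica at random.
        // If a preferred rack is not specified, try to pick a local DC replica.
        if self.preferred_rack.is_some() {
            // ...pick a local-rack replica
        }
        // ...
    }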

Comment on lines +312 to +317
let predicate = move |node| match replica_location {
ReplicaLocation::Any | ReplicaLocation::Datacenter => predicate(&node),
ReplicaLocation::DatacenterAndRack => {
predicate(&node) && node.rack == self.preferred_rack
}
};
Contributor:

I think that it belongs to the previous commit, where you added the ReplicaLocation parameter to pick_replica, replicas and shuffled_replicas.

This is because without this predicate modification, those functions do not fully respect ReplicaLocation values - replicas(..., ReplicaLocation::DatacenterAndRack, ...) won't return a set that is restricted to the local rack.

Comment on lines +334 to +339
let predicate = |node| match replica_location {
ReplicaLocation::Any | ReplicaLocation::Datacenter => predicate(&node),
ReplicaLocation::DatacenterAndRack => {
predicate(&node) && node.rack == self.preferred_rack
}
};
Contributor:

ditto

let should_be_local = match replica_location {
ReplicaLocation::Any => false,
ReplicaLocation::Datacenter | ReplicaLocation::DatacenterAndRack => true,
};
self.nonfiltered_replica_set(ts, should_be_local, cluster)
.into_iter()
.filter(predicate)
.filter(move |node: &NodeRef<'a>| predicate(node))
Contributor:

This change should also be applied to the previous commit - it does not compile without it.

prioritize nodes based on their availability zones in the preferred datacenter.
When a preferred rack is set, the policy will first return replicas in the local rack
in the preferred datacenter, and then the other replicas in the datacenter.
When a preferred datacenter is not set, setting preferred rack will not have any effect.
Contributor:

Please add a test that checks this behavior.

Gor Stepanyan added 2 commits on April 2, 2023 at 21:17:

Changed `should_be_local` parameter in `pick_replica`, `replicas` and `shuffled_replicas` to `ReplicaLocation` enum. It is more expressive than a boolean and allows for more ways of filtering while picking a replica.
Modified pick and fallback plan to prefer local rack replicas.
// eu eu us eu eu us us
// r1 r1 r1 r2 r1 r2 r1
expected_groups: ExpectedGroupsBuilder::new()
.deterministic([B, C]) // pick local rack + fallback replicas
Collaborator:

Nit: instead of this single .deterministic, you should do:

.deterministic([B])
.deterministic([C])

The .deterministic() method does not check that the load balancing policy always returns nodes in the specified order - e.g. in your case it does not check that it always returns B and then C. It only checks that the nodes within the group are always returned in an unspecified but consistent order.

In your case, both [B, C] and [C, B] orders will be accepted by the test as long as the load balancing policy always sticks to the same order. I assume that you wanted to check that B and then C is always returned - in that case you should use two groups, as I suggested.

@havaker (Contributor) left a comment

LGTM

There's one minor thing left that I think is worth fixing - the main thing the transport: load_balancing: Add ReplicationLocation enum commit introduces is rack-awareness in the default policy internals, but it's not mentioned anywhere in the commit message.

@Gor027 (Contributor, author) commented Apr 5, 2023

LGTM

There's one minor thing left that I think is worth fixing - the main thing the transport: load_balancing: Add ReplicationLocation enum commit introduces is rack-awareness in the default policy internals, but it's not mentioned anywhere in the commit message.

It's mentioned in a short sentence: Modified pick and fallback plan to prefer local rack replicas.

@piodul merged commit 91a1b8e into scylladb:main on Apr 5, 2023
8 checks passed