query_partition_key_range_concurrent: strict predecessor of the exclusive right boundary is equal to vnode #12566
It seems that here the condition for the interval being 'nonwrapping' can be violated, and there are no checks/asserts in
@gleb-cloudius please comment
I do not remember anything about this code, but what does "and 1 is equal to vnode" mean? What I do remember is that due to CQL capabilities
It means 1 is equal to one of the boundary tokens from the token ring.
CQL can have
Introduced here: 692a0bd (by @gleb-cloudius)
I think we should try to create a test case that shows the problem here.
Yes, it would be helpful.
Ok, I'll try to reproduce it in a test.
`if (start_token(cr) == end_token(cr)) {` should be `is_singular()`
`if (end_token(cr) == upper_bound_token) {` should use `interval::contains()`
In short, we should use interval primitives, not look at the bounds directly.
The code was likely just translated from Cassandra: https://github.com/apache/cassandra/blob/bf599fb5b062cbcc652da78b7d699e7a01b949ad/src/java/org/apache/cassandra/service/StorageProxy.java#L1908
I looked, and that code doesn't appear to have the notion of inclusive/exclusive bounds.
I think origin doesn't have the problem, because it doesn't have exclusive bounds. I'm not sure how they avoid exclusive bounds, since with ByteOrderedPartitioner tokens do not form a finite field (so there are no next/previous tokens).
Yes.
Let the initial range passed to `query_partition_key_range` be `[1, 2)`, where 2 is the successor of 1 in terms of `ring_position` order and 1 is equal to vnode. Then `query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)]`, so we get an empty range `(1, 2)` and will subsequently make a redundant data request with this empty range in `storage_proxy::query_partition_key_range_concurrent`. The patch checks for this condition after the main loop in `process_one_range`. A test case is added for this scenario in `test_get_restricted_ranges`. The helper lambda `check` is changed so as not to limit the number of ranges to the length of the expected ranges, since otherwise the check passes even without the change in `process_one_range`. Fixes: scylladb#12566
Prepared a test case. I'm not sure if a range of the form
Let the initial range passed to `query_partition_key_range` be `[1, 2)`, where 2 is the successor of 1 in terms of `ring_position` order and 1 is equal to vnode. Then `query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)]`, so we get an empty range `(1, 2)` and will subsequently make a redundant data request with this empty range in `storage_proxy::query_partition_key_range_concurrent`. The patch adds a check for this condition after making a split in the main loop in `process_one_range`. The patch does not attempt to handle cases where the original ranges were empty, since that check is the responsibility of the caller; we only take care not to add empty ranges to the result as an unintentional artifact of the algorithm in `query_ranges_to_vnodes_generator`. A test case is added in `test_get_restricted_ranges`. The helper lambda `check` is changed so as not to limit the number of ranges to the length of the expected ranges, since otherwise the check passes even without the change in `process_one_range`. Fixes: scylladb#12566
```cpp
/// Computes partition-key ranges from token atoms in ex.
dht::partition_range_vector partition_ranges_from_token(const expr::expression& ex, const query_options& options) {
    auto values = possible_lhs_values(nullptr, ex, options);
    if (values == expr::value_set(expr::value_list{})) {
        return {};
    }
    const auto bounds = expr::to_range(values);
    const auto start_token = bounds.start() ? bounds.start()->value().with_linearized([] (bytes_view bv) { return dht::token::from_bytes(bv); })
                                            : dht::minimum_token();
    auto end_token = bounds.end() ? bounds.end()->value().with_linearized([] (bytes_view bv) { return dht::token::from_bytes(bv); })
                                  : dht::maximum_token();
    const bool include_start = bounds.start() && bounds.start()->is_inclusive();
    const auto include_end = bounds.end() && bounds.end()->is_inclusive();
    auto start = dht::partition_range::bound(include_start
                                             ? dht::ring_position::starting_at(start_token)
                                             : dht::ring_position::ending_at(start_token));
    auto end = dht::partition_range::bound(include_end
                                           ? dht::ring_position::ending_at(end_token)
                                           : dht::ring_position::starting_at(end_token));
    return {{std::move(start), std::move(end)}};
}
```

So it looks like the interval is always inclusive on both ends (`[x, y]`), and there is no user impact. I don't think Alternator can specify tokens. @nyh ?
(the code that ensures it's inclusive is the
What's the justification for optimizing for this corner case? It seems unlikely to occur in practice, and if it does occur, no harm is done. |
I ran into this corner case while investigating the code in
The discussion was about this code:
Here `r.end()` can hold `ring_position::starting_at(t)`, so `end_token()` would return a token which is actually excluded by the range. Looking at the current uses of `end_token()`, it doesn't seem to matter: even if the range is empty, its `end_token()` will be the correct token to get the replica set for. The patch doesn't change the fact that you can get ranges which exclude the token but are non-empty.
It didn't try to; it just made sure we don't make unnecessary requests for empty ranges.
Ok. Given that we're contemplating reverting it in #13387, no backport is needed.
While discussing this PR, the question arose about how `storage_proxy::query_partition_key_range_concurrent` handles the case when the strict predecessor of the exclusive right boundary is equal to `vnode`. For example, let the initial range passed to `query_partition_key_range` be `[1, 2)`, where 2 is the successor of 1 in terms of `ring_position` order and 1 is equal to `vnode`. Then `query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)]`. I don't see the check for an empty range in either `query_ranges_to_vnodes_generator` or `query_partition_key_range_concurrent`, so we will probably make a request for an empty range. Maybe we should add this check?