query_partition_key_range_concurrent: strict predecessor of the exclusive right boundary is equal to vnode #12566
It seems that here the condition for the interval being 'nonwrapping' can be violated, and there are no checks/asserts in
@gleb-cloudius please comment
I do not remember anything about this code, but what does "and 1 is equal to vnode" mean? What I do remember is that due to CQL capabilities
It means 1 is equal to one of the boundary tokens from the token ring.
CQL can have
Introduced here: 692a0bd (by @gleb-cloudius)
I think we should try to create a test case that shows the problem here.
Yes, it would be helpful.
Ok, I'll try to reproduce it in a test.
`if (start_token(cr) == end_token(cr)) {` should be `is_singular()`
`if (end_token(cr) == upper_bound_token) {` should use `interval::contains()`
In short, we should use interval primitives, not look at the bounds directly.
The code was likely just translated from Cassandra: https://github.com/apache/cassandra/blob/bf599fb5b062cbcc652da78b7d699e7a01b949ad/src/java/org/apache/cassandra/service/StorageProxy.java#L1908
I looked, and that code doesn't appear to have the notion of inclusive/exclusive bounds.
I think origin doesn't have the problem, because it doesn't have exclusive bounds. I'm not sure how they avoid exclusive bounds, since with ByteOrderedPartitioner tokens do not form a finite field (so there are no next/previous tokens).
Yes.
Let the initial range passed to `query_partition_key_range` be `[1, 2)`, where 2 is the successor of 1 in terms of `ring_position` order and 1 is equal to vnode. Then `query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)]`, so we get an empty range `(1, 2)` and will subsequently make a redundant data request with this empty range in `storage_proxy::query_partition_key_range_concurrent`. The patch checks for this condition after the main loop in `process_one_range`. A test case is added for this scenario in `test_get_restricted_ranges`. The helper lambda `check` is changed so as not to limit the number of ranges to the length of the expected ranges, since otherwise the check passes even without the change in `process_one_range`. Fixes: scylladb#12566
Prepared a test case. I'm not sure if a range of the form
Let the initial range passed to `query_partition_key_range` be `[1, 2)`, where 2 is the successor of 1 in terms of `ring_position` order and 1 is equal to vnode. Then `query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)]`, so we get an empty range `(1, 2)` and will subsequently make a redundant data request with this empty range in `storage_proxy::query_partition_key_range_concurrent`. The patch adds a check for this condition after making a split in the main loop in `process_one_range`. The patch does not attempt to handle cases where the original ranges were empty, since that check is the responsibility of the caller; we only take care not to add empty ranges to the result as an unintentional artifact of the algorithm in `query_ranges_to_vnodes_generator`. A test case is added in `test_get_restricted_ranges`. The helper lambda `check` is changed so as not to limit the number of ranges to the length of the expected ranges, since otherwise the check passes even without the change in `process_one_range`. Fixes: scylladb#12566
```cpp
/// Computes partition-key ranges from token atoms in ex.
dht::partition_range_vector partition_ranges_from_token(const expr::expression& ex, const query_options& options) {
    auto values = possible_lhs_values(nullptr, ex, options);
    if (values == expr::value_set(expr::value_list{})) {
        return {};
    }
    const auto bounds = expr::to_range(values);
    const auto start_token = bounds.start() ? bounds.start()->value().with_linearized([] (bytes_view bv) { return dht::token::from_bytes(bv); })
                                            : dht::minimum_token();
    auto end_token = bounds.end() ? bounds.end()->value().with_linearized([] (bytes_view bv) { return dht::token::from_bytes(bv); })
                                  : dht::maximum_token();
    const bool include_start = bounds.start() && bounds.start()->is_inclusive();
    const auto include_end = bounds.end() && bounds.end()->is_inclusive();
    auto start = dht::partition_range::bound(include_start
                                             ? dht::ring_position::starting_at(start_token)
                                             : dht::ring_position::ending_at(start_token));
    auto end = dht::partition_range::bound(include_end
                                           ? dht::ring_position::ending_at(end_token)
                                           : dht::ring_position::starting_at(end_token));
    return {{std::move(start), std::move(end)}};
}
```

So it looks like the interval is always inclusive on both ends (`[x, y]`), and there is no user impact. I don't think Alternator can specify tokens. @nyh ?
(the code that ensures it's inclusive is the
What's the justification for optimizing for this corner case? It seems unlikely to occur in practice, and if it does occur, no harm is done. |
I ran into this corner case while investigating the code in
The discussion was about this code:
Here `r.end()` can hold `ring_position::starting_at(t)`, so `end_token()` would return a token which is actually excluded by the range. Looking at the current uses of `end_token()`, it doesn't seem to matter: even if the range is empty, its `end_token()` will be the correct token to get the replica set for. The patch doesn't change the fact that you can get ranges which exclude the token but are non-empty.
It didn't try to; it just made sure we don't make unnecessary requests for empty ranges.
Ok. Given that we're contemplating reverting it in #13387, no backport is needed.
While discussing this PR, the question arose about how `storage_proxy::query_partition_key_range_concurrent` handles the case when the strict predecessor of the exclusive right boundary is equal to `vnode`. For example, let the initial range passed to `query_partition_key_range` be `[1, 2)`, where 2 is the successor of 1 in terms of `ring_position` order and 1 is equal to `vnode`. Then `query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)]`. I don't see the check for an empty range in either `query_ranges_to_vnodes_generator` or `query_partition_key_range_concurrent`, so we will probably make a request for an empty range. Maybe we should add this check?