Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query_partition_key_range_concurrent: strict predecessor of the exclusive right boundary is equal to vnode #12566

Closed
gusev-p opened this issue Jan 19, 2023 · 24 comments
Assignees
Milestone

Comments

@gusev-p
Copy link

gusev-p commented Jan 19, 2023

While discussing this PR The question arose about how storage_proxy::query_partition_key_range_concurrent handles the case when the strict predecessor of the exclusive right boundary is equal to vnode.

For example, let the initial range passed to query_partition_key_range be [1, 2) where 2 is the successor of 1 in terms of ring_position order and 1 is equal to vnode. Then query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)]. I don't see the check for empty range in either query_ranges_to_vnodes_generator or query_partition_key_range_concurrent, so we will probably make a request for an empty range. Maybe we should add this check?

@gusev-p
Copy link
Author

gusev-p commented Jan 23, 2023

It seems that here the condition for interval being 'nonwrapping' can be violated, and there are no checks/asserts in nonwrapping_interval itself that could catch this. Maybe we should add such checks for debug mode.

@avikivity
Copy link
Member

@gleb-cloudius please comment

@gleb-cloudius
Copy link
Contributor

I do not remember anything about this code, but what does it mean "and 1 is equal to vnode"?

What I do remember is that due to CQL capabilities query_partition_key_range input cannot be arbitrary. It does not mean the problem above cannot happen since I still do not understand it.

@gusev-p
Copy link
Author

gusev-p commented Jan 23, 2023

what does it mean "and 1 is equal to vnode"?

it means 1 is equal to one of the boundary tokens from the token ring

@avikivity
Copy link
Member

CQL can have token(pk) < ? and token(pk) <= ?, so both inclusive and exclusive bounds.

@avikivity
Copy link
Member

Introduced here: 692a0bd (by @gleb-cloudius )

@avikivity
Copy link
Member

I think we should try to create a test case that shows the problem here.

@gleb-cloudius
Copy link
Contributor

I think we should try to create a test case that shows the problem here.

Yes, it would be helpful.

@gusev-p
Copy link
Author

gusev-p commented Jan 23, 2023

ok, I'll try to reproduce it in a test

@avikivity
Copy link
Member

    if (start_token(cr) == end_token(cr)) {

should be is_singular()

@avikivity
Copy link
Member

        if (end_token(cr) == upper_bound_token) {

should use interval::contains()

@avikivity
Copy link
Member

In short, we should use interval primitives, not look at bounds directly.

@gleb-cloudius
Copy link
Contributor

@avikivity
Copy link
Member

I looked, and that code doesn't appear to have the notion of inclusive/exclusive bounds.

@gleb-cloudius
Copy link
Contributor

I looked, and that code doesn't appear to have the notion of inclusive/exclusive bounds.

You mean it is translated incorrectly or the origin has the problem as well? BTW 692a0bd does not introduces the code, just moves it. The trunslation is done here: ed0faff

@avikivity
Copy link
Member

I looked, and that code doesn't appear to have the notion of inclusive/exclusive bounds.

You mean it is translated incorrectly or the origin has the problem as well?

I think origin doesn't have the problem, because it doesn't have exclusive bounds:

https://vscode.dev/github.com/apache/cassandra/blob/bf599fb5b062cbcc652da78b7d699e7a01b949ad/src/java/org/apache/cassandra/dht/RingPosition.java#L25-L32

https://vscode.dev/github.com/apache/cassandra/blob/bf599fb5b062cbcc652da78b7d699e7a01b949ad/src/java/org/apache/cassandra/dht/AbstractBounds.java#L33-L47

Not sure how they avoid exclusive bounds, since with ByteOrderedPartitioner tokens do not form a finite field (so there is no next/previous tokens).

BTW 692a0bd does not introduces the code, just moves it. The trunslation is done here: ed0faff

Yes.

@DoronArazii DoronArazii added this to the 5.x milestone Jan 24, 2023
gusev-p pushed a commit to gusev-p/scylla that referenced this issue Feb 6, 2023
Let the initial range passed to query_partition_key_range
be [1, 2) where 2 is the successor of 1 in terms
of ring_position order and 1 is equal to vnode.
Then query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)],
so we get an empty range (1,2) and subsequently will
make a data request with this empty range in
storage_proxy::query_partition_key_range_concurrent,
which will be redundant.

The patch checks for this condition after the main loop
in process_one_range.

A test case is added for this scenario in
test_get_restricted_ranges. The helper
lambda check is changed so that not to limit
the number of ranges to the length of expected
ranges, otherwise this check passes without
the change in process_one_range.

Fixes: scylladb#12566
@gusev-p
Copy link
Author

gusev-p commented Feb 6, 2023

Prepared a test case. I'm not sure if a range of the form [dht::ring_position::ending_at(endpoint_token), dht::ring_position::starting_at(next_token)) can actually be provided by the user.

gusev-p pushed a commit to gusev-p/scylla that referenced this issue Feb 7, 2023
Let the initial range passed to query_partition_key_range
be [1, 2) where 2 is the successor of 1 in terms
of ring_position order and 1 is equal to vnode.
Then query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)],
so we get an empty range (1,2) and subsequently will
make a data request with this empty range in
storage_proxy::query_partition_key_range_concurrent,
which will be redundant.

The patch adds a check for this condition after
making a split in the main loop in process_one_range.

The patch does not attempt to handle cases where the
original ranges were empty, since this check is the
responsibility of the caller. We only take care
not to add empty ranges to the result as an
unintentional artifact of the algorithm in
query_ranges_to_vnodes_generator.

A test case is added in test_get_restricted_ranges.
The helper lambda check is changed so that not to limit
the number of ranges to the length of expected
ranges, otherwise this check passes without
the change in process_one_range.

Fixes: scylladb#12566
@avikivity
Copy link
Member

/// Computes partition-key ranges from token atoms in ex.
dht::partition_range_vector partition_ranges_from_token(const expr::expression& ex, const query_options& options) {
    auto values = possible_lhs_values(nullptr, ex, options);
    if (values == expr::value_set(expr::value_list{})) {
        return {};
    }
    const auto bounds = expr::to_range(values);
    const auto start_token = bounds.start() ? bounds.start()->value().with_linearized([] (bytes_view bv) { return dht::token::from_bytes(bv); })
            : dht::minimum_token();
    auto end_token = bounds.end() ? bounds.end()->value().with_linearized([] (bytes_view bv) { return dht::token::from_bytes(bv); })
            : dht::maximum_token();
    const bool include_start = bounds.start() && bounds.start()->is_inclusive();
    const auto include_end = bounds.end() && bounds.end()->is_inclusive();

    auto start = dht::partition_range::bound(include_start
                                             ? dht::ring_position::starting_at(start_token)
                                             : dht::ring_position::ending_at(start_token));
    auto end = dht::partition_range::bound(include_end
                                           ? dht::ring_position::ending_at(end_token)
                                           : dht::ring_position::starting_at(end_token));

    return {{std::move(start), std::move(end)}};
}

So it looks like the interval is always inclusive on both ends ([x, y]), and there is no user impact. I don't think Alternator can specify tokens. @nyh ?

@avikivity
Copy link
Member

(the code that ensures it's inclusive is the return line, which doesn't mention intervals, but that's part of the return type)

@DoronArazii DoronArazii modified the milestones: 5.x, 5.3 Feb 15, 2023
@tgrabiec
Copy link
Contributor

What's the justification for optimizing for this corner case? It seems unlikely to occur in practice, and if it does occur, no harm is done.

@gusev-p
Copy link
Author

gusev-p commented Mar 30, 2023

What's the justification for optimizing for this corner case? It seems unlikely to occur in practice, and if it does occur, no harm is done.

I ran into this corner case while investigating the code in storage_proxy and trying to outline the guarantees query_ranges_to_vnodes_generator provides. In particular, the original discussion was about whether in storage_proxy::remote::handle_read_data we can be sure that the requested range is not empty.

@tgrabiec
Copy link
Contributor

The discussion was about this code:

static inline
const dht::token& end_token(const dht::partition_range& r) {
    static const dht::token max_token = dht::maximum_token();
    return r.end() ? r.end()->value().token() : max_token;

Here r.end() can hold ring_position::starting_at(t), so end_token() would return a token which is actually excluded by the range. Looking at the current uses of end_token(), it doesn't seem to matter. Even if the range is empty, it's end_token() will be the correct token to get the replica set for.

The patch doesn't change the fact that you can get ranges which exclude the token, but are non-empty.

@gusev-p
Copy link
Author

gusev-p commented Mar 31, 2023

The patch doesn't change the fact that you can get ranges which exclude the token, but are non-empty.

It didn't try to, it just made sure we don't make unnecessary requests for empty ranges.

@avikivity
Copy link
Member

Ok. Given that we're contemplating reverting it in #13387, no backport is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants