Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multishard_mutation_query: add tablets support #16802

Merged

Conversation

denesb
Copy link
Contributor

@denesb denesb commented Jan 16, 2024

When reading a list of ranges with tablets, we don't need a multishard reader. Instead, we intersect the range list with the local nodes tablet ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with database::query_mutations, merging the results into a single
instance. This makes the code simple. For tablets multishard_mutation_query.cc is no longer on the hot paths, range scans
on tables with tablets fork off to a different code-path in the coordinator. The only code using multishard_mutation_query.cc are forced, replica-local scans, like those used by SELECT * FROM MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests, so we optimize for simplicity, not performance.

Fixes: #16484

@avikivity
Copy link
Member

When reading a list of ranges with tablets, we don't need a multishard reader. Instead, we intersect the range list with the local nodes tablet ranges, then read each range from the respective shard. The individual ranges are read sequentially, with database::query_mutations, merging the results into a single instance. This makes the code simple. For tablets multishard_mutation_query.cc is no longer on the hot paths, range scans on tables with tablets fork off to a different code-path in the coordinator. The only code using multishard_mutation_query.cc are forced, replica-local scans, like those used by SELECT * FROM MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests, so we optimize for simplicity, not performance.

Fixes: #16484

Could we change those code paths not to use multishard_mutation_query?

It will make it easier to get rid of it eventually.

We don't have to do it all at once (and this would be a first step).

mutation_query.cc Outdated Show resolved Hide resolved
@avikivity
Copy link
Member

Missing: tests. What gets unbroken here? SELECT mutation_fragments?

@avikivity
Copy link
Member

If it's just mutation_fragments, perhaps we can short-circuit the whole thing and move the code there. It's bad since it's replica code in the coordinator, but it lets us isolate multishard_mutation_query for eventual removal. It's okay for SELECT mutation_fragments to not cross tablet boundaries.

@avikivity
Copy link
Member

But: cqlsh paging should be improved. If it received a page smaller than the terminal size, it should fetch the next page. If it receives a page larger than the page size, it should ask the user to confirm and not dump the whole page. @fruch

@fruch
Copy link
Contributor

fruch commented Jan 16, 2024

But: cqlsh paging should be improved. If it received a page smaller than the terminal size, it should fetch the next page. If it receives a page larger than the page size, it should ask the user to confirm and not dump the whole page. @fruch

I'm not sure how pagination works in cqlsh, but having empty pages is quite common, and we handle it across the different test environments, so having more cases that would generate empty pages would probably go unnoticed.

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 2 hr 0 min
  • Builder: spider3.cloudius-systems.com

@denesb
Copy link
Contributor Author

denesb commented Jan 17, 2024

Missing: tests. What gets unbroken here? SELECT mutation_fragments?

Yes, they break with tablets. With this PR, they don't anymore.

@denesb
Copy link
Contributor Author

denesb commented Jan 17, 2024

If it's just mutation_fragments, perhaps we can short-circuit the whole thing and move the code there. It's bad since it's replica code in the coordinator, but it lets us isolate multishard_mutation_query for eventual removal. It's okay for SELECT mutation_fragments to not cross tablet boundaries.

I considered it, but then realized we have new users for this code in mind, see #16478.
I think multishard_mutation_query (however unfortunate the name is) is a useful abstraction. It is useful to be able to scan all data a local replica owns, or a subset of it and without having completely separate methods for vnodes and tablets The only reason we want to get rid of this code, is the complexity of its vnode variant. We can just drop that code-path once vnodes are retired and keep the simple tablets code-path.

@denesb denesb force-pushed the multishard-mutation-query-tablets branch from f6b65e3 to d83cd80 Compare January 18, 2024 11:20
@denesb
Copy link
Contributor Author

denesb commented Jan 18, 2024

New in v2:

  • s/reconcilable_result::merge()/reconcilable_result::merge_disjoint()/
  • Add comment to locator::tablet_range_splitter, warning user to pin tablet mapping via erm

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 1 hr 49 min
  • Builder: spider7.cloudius-systems.com

@denesb
Copy link
Contributor Author

denesb commented Jan 22, 2024

New in v3:

  • Rebased on master
  • Remove skip_with_tablets from the tests and drop the fixture altogether (no-one uses it anymore)
  • Add a way to parameterize test_keyspace and use it in test_select_from_mutation_fragments.py.

@denesb
Copy link
Contributor Author

denesb commented Jan 22, 2024

@nyh please review the last few patches, I demonstrate how to parameterize tests to run with both vnodes and tablets.

@avikivity
Copy link
Member

Wouldn't LOCAL_ONLY run through storage_proxy?

mutation_fragments has an excuse to bypass storage_proxy since it reads mutation fragments, not row results, but LOCAL_ONLY should go through storage_proxy.

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 1 hr 56 min
  • Builder: spider3.cloudius-systems.com

@denesb
Copy link
Contributor Author

denesb commented Jan 23, 2024

Wouldn't LOCAL_ONLY run through storage_proxy?

mutation_fragments has an excuse to bypass storage_proxy since it reads mutation fragments, not row results, but LOCAL_ONLY should go through storage_proxy.

Going through storage proxy would involve doing a range pre-processing in the CQL layer, to make sure to only pass ranges that the local node owns, to storage proxy. This would be awkward to do because the storage-proxy query API accepts a single range at a time, so the CQL layer would end up stiching together partial results and doing limit accounting.
Another possibility is a new storage proxy API which forces a local read, but at that point why even go though storage proxy at all, we might as well go to database or multishard_mutation_query.cc directly.

@bhalevy
Copy link
Member

bhalevy commented Feb 18, 2024

ping, let's finalize this PR and get it merged

@denesb denesb force-pushed the multishard-mutation-query-tablets branch from 9647966 to fc8b75d Compare February 19, 2024 12:05
@denesb
Copy link
Contributor Author

denesb commented Feb 19, 2024

New in v4:

  • Remove assumption about token-range bounds, use dht::to_partition_range() to convert from token range to partition range.
  • Applied a small fix to the aforementioned dht::to_partition_range() (first patch).

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 2 hr 6 min
  • Builder: spider3.cloudius-systems.com

@denesb denesb force-pushed the multishard-mutation-query-tablets branch from fc8b75d to 944096a Compare February 20, 2024 09:13
@denesb
Copy link
Contributor Author

denesb commented Feb 20, 2024

v5:

  • Add interval version of interval<T>::before(), and use it in the range splitter, so inclusiveness of either bounds can be taken into account.

interval.hh Outdated Show resolved Hide resolved
@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 2 hr 3 min
  • Builder: spider7.cloudius-systems.com

@denesb denesb force-pushed the multishard-mutation-query-tablets branch from 944096a to 1dee6b7 Compare February 20, 2024 13:08
@denesb
Copy link
Contributor Author

denesb commented Feb 20, 2024

v6:

  • interval: s/before/other_is_before/

@avikivity
Copy link
Member

other_is_before is awkward but at least it doesn't generate the wrong impression, so let's go with that.

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Container Test
❌ - dtest
✅ - Unit Tests

Failed Tests (2/26087):

Build Details:

  • Duration: 2 hr 6 min
  • Builder: spider8.cloudius-systems.com

The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
…inclusive

Consider the inclusiveness of the token-range's start and end bounds and
copy the flag to the output bounds, instead of assuming they are always
inclusive.
Given a list of partition-ranges, yields the intersection of this
range-list, with that of that tablet-ranges, for tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets who have replicas on the local node.
Merging two disjoint reconcilable_result instances.
…r factory

This param was used by the query-result builder, to set the
last-position on end-of-stream. Instead, do this via a new ResultBuilder
method, maybe_set_last_position(), which is called from read_page(),
which has access to the compaction-state.
With this, the ResultBuilder can be created without a compaction-state
at hand. This will be important in the next patch.
When reading a list of ranges with tablets, we don't need a multishard
reader. Instead, we intersect the range list with the local nodes tablet
ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with
database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets,
multishard_mutation_query.cc is no longer on the hot paths, range scans
on tables with tablets fork off to a different code-path in the
coordinator. The only code using multishard_mutation_query.cc are
forced, replica-local scans, like those used by SELECT * FROM
MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests,
so we optimize for simplicity, not performance.
…with_tablets

The underlying functionality was fixed, the tests should now pass with
tablets.
…tests

To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.

Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
All tests that used it are fixed, and we should not add any new tests
failing with tablets from now on, so remove.
@denesb denesb force-pushed the multishard-mutation-query-tablets branch from 1dee6b7 to ca58590 Compare February 21, 2024 07:09
@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest with topology changes
✅ - Unit Tests
✅ - dtest

Build Details:

  • Duration: 2 hr 19 min
  • Builder: spider1.cloudius-systems.com

@scylladb-promoter scylladb-promoter merged commit 4be70bf into scylladb:master Feb 21, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SELECT FROM MUTATION FRAGMENTS doesn't work with tablets
5 participants