Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cql: reject ALTER tablets KS with repl opts #18772

Closed
wants to merge 1 commit into from

Conversation

ptrsmrn
Copy link
Contributor

@ptrsmrn ptrsmrn commented May 20, 2024

The full support for ALTERing a tablets-enabled KEYSPACE is not yet
implemented, and we don't want to only change the schema without changing
any tablets, so the statement has to be explicitly rejected for cases
that won't work, so every time any replication option is provided.

Fixes: #18795
References: #16129

@ptrsmrn ptrsmrn requested a review from xemul May 20, 2024 21:43
@ptrsmrn ptrsmrn added the backport/none Backport is not required label May 20, 2024
@ptrsmrn ptrsmrn requested a review from bhalevy May 20, 2024 21:50
@nyh
Copy link
Contributor

nyh commented May 20, 2024

If we have any tests for ALTER KEYSPACE, won't this break these tests when we run these tests with tablets enabled?
It seems to me we'll need to mark these tests to run without tablets (we already did this in the past in other areas).

@ptrsmrn
Copy link
Contributor Author

ptrsmrn commented May 20, 2024

If we have any tests for ALTER KEYSPACE, won't this break these tests when we run these tests with tablets enabled? It seems to me we'll need to mark these tests to run without tablets (we already did this in the past in other areas).

@bhalevy this change will, of course, fail many tests, what to do about them - just disable? Seems to me that if our tests test the default configuration, and by default tablets will be enabled and ALTER won't be supported, and tests have to pass, the ones that will fail because of this change have to be disabled, at least by marking them with xfail pytag.

@nyh
Copy link
Contributor

nyh commented May 20, 2024

We don't want to skip them - we want to continue running them, but without tablets. @denesb and I already did this to many tests in cql-pytest and dtest, as we are unfortunately seeing more and more features being rejected in tablets but we still need to continue testing them.

@bhalevy
Copy link
Member

bhalevy commented May 21, 2024

We don't want to skip them - we want to continue running them, but without tablets. @denesb and I already did this to many tests in cql-pytest and dtest, as we are unfortunately seeing more and more features being rejected in tablets but we still need to continue testing them.

Yes, I did that for LWT as well (see #18103)
In addition @yarongilor should send a PR to skip dtests when tablets are enabled and #16129 is opened

@yarongilor
Copy link

We don't want to skip them - we want to continue running them, but without tablets. @denesb and I already did this to many tests in cql-pytest and dtest, as we are unfortunately seeing more and more features being rejected in tablets but we still need to continue testing them.

Yes, I did that for LWT as well (see #18103) In addition @yarongilor should send a PR to skip dtests when tablets are enabled and #16129 is opened

@bhalevy , there are about 60 occurrences of a code like session.execute("ALTER KEYSPACE .... Should all associated tests be skipped in these conditions?
It consist of the following tests:

audit_table_migration.py
audit_test.py
auth_test.py
cdc_batch_test.py
cdc_snapshot_operation_test.py
cdc_static_row_test.py
cdc_tracing_info_test.py
cdc_ttl_test.py
cdc_types_test.py
commitlog_test.py
compaction_additional_test.py
cql_additional_tests.py
cql_system_clients_test.py
cql_tests.py
encryption_at_rest_test.py
lwt_destructive_ddl_test.py
materialized_views_test.py
nodetool_additional_test.py
read_repair_test.py
rebuild_test.py
repair_additional_test.py
replication_test.py
schema_test.py
ttl_test.py
update_cluster_layout_tests.py

@bhalevy
Copy link
Member

bhalevy commented May 21, 2024

@bhalevy , there are about 60 occurrences of a code like session.execute("ALTER KEYSPACE .... Should all associated tests be skipped in these conditions?

@yarongilor it depends if they change the replication factor, or if the keyspace uses tablets (system keyspace do not use tablets)

For example, in test_alter_keyspace_then_table_with_udt:

            session.execute(f"ALTER KEYSPACE {keyspace_name} WITH replication = " "{'class': 'NetworkTopologyStrategy', 'dc1': 2};")

attempts to change rf for a user keyspace and therefore is should be skipped for tablets.

audit_test.verify_keyspace is borderline, since it uses:

            "ALTER KEYSPACE ks WITH replication = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 1 } AND DURABLE_WRITES = false",

to change DURABLE_WRITES but it doesn't actually change the replication factor for dc1, as it is the same as the one used for creating the keyspace:

            "CREATE KEYSPACE ks WITH replication = { 'class':'SimpleStrategy', 'replication_factor':1} AND DURABLE_WRITES = true",

So disabling it would depend on how fine-grained validation we'll so for rejecting the ALTER KEYSPACE.

Some cases are already skipped, for example test_streaming_on_rebuild_multidc has:

    @pytest.mark.skip_if(with_feature("tablets") & issue_open("#17572"))

@scylladb-promoter
Copy link
Contributor

Docs Preview 📖

Docs Preview for this pull request is available here

Changed Files:

Note: This preview will be available for 30 days and will be automatically deleted after that period. You can manually trigger a new build by committing changes.

@nyh
Copy link
Contributor

nyh commented May 21, 2024

@ptrsmrn there is something I don't understand... You said that "the full support for ALTER KEYSPACE" is not available for tablets. But... there are dozens of ALTER KEYSPACE in tests. Are really all of them not supported? Isn't there just a specific type of ALTER KEYSPACE which we don't support? Can't we forbid just the specific changes that we don't support, and allow the rest, and break fewer tests and cause a smaller mess?

By the way, if ALTER KEYSPACE is broken with tablets, we need an issue about it. Not this issue, about rejecting ALTER KEYSPACE - we need an issue explaining exactly what's broken and why. And this issue should refer to that issue.

CC @bhalevy

@bhalevy
Copy link
Member

bhalevy commented May 21, 2024

@ptrsmrn there is something I don't understand... You said that "the full support for ALTER KEYSPACE" is not available for tablets. But... there are dozens of ALTER KEYSPACE in tests. Are really all of them not supported? Isn't there just a specific type of ALTER KEYSPACE which we don't support? Can't we forbid just the specific changes that we don't support, and allow the rest, and break fewer tests and cause a smaller mess?

By the way, if ALTER KEYSPACE is broken with tablets, we need an issue about it. Not this issue, about rejecting ALTER KEYSPACE - we need an issue explaining exactly what's broken and why. And this issue should refer to that issue.

CC @bhalevy

@nyh there's #16129 about the support for replication factor changes (not ALTER KEYSPACE as a whole).
We can open an issue for disallowing ALTER KEYSPACE to change the tablets property in 6.0
Later on, we should allow enabling it for vnode-based keyspace when we support online migration to tablets.

@nyh
Copy link
Contributor

nyh commented May 21, 2024

@nyh there's #16129 about the support for replication factor changes (not ALTER KEYSPACE as a whole).

So if that were the only bug, only that should be disallowed, not ALTER KEYSPACE as a whole.
My question if there are additional buggy use cases.

We can open an issue for disallowing ALTER KEYSPACE to change the tablets property in 6.0 Later on, we should allow enabling it for vnode-based keyspace when we support online migration to tablets.

My request wasn't to open an issue to disallow things (this is what this issue is about), but to open an issue for specific cases that work incorrectly, that will need to be eventually fixed.

When we understand which cases don't work correctly (an understanding that currently I don't possess), we can go back to this issue and decide what we should temporarily forbid. But clearly the end-game won't be to "temporarily forbid" these things, but to implement some, and permanently forbid other things that don't make sense (if there is such a thing, I don't know...).

@bhalevy
Copy link
Member

bhalevy commented May 21, 2024

@nyh there's #16129 about the support for replication factor changes (not ALTER KEYSPACE as a whole).

So if that were the only bug, only that should be disallowed, not ALTER KEYSPACE as a whole. My question if there are additional buggy use cases.

nothing that I know of, other than the tablets property discussed below

We can open an issue for disallowing ALTER KEYSPACE to change the tablets property in 6.0 Later on, we should allow enabling it for vnode-based keyspace when we support online migration to tablets.

My request wasn't to open an issue to disallow things (this is what this issue is about), but to open an issue for specific cases that work incorrectly, that will need to be eventually fixed.

Currently, we simply allow to set/unset tablets using ALTER KEYSPACE, and do nothing with it.
I'm not even sure what this can lead to, but it definitely works incorrectly, therefore it needs to be rejected.

When we understand which cases don't work correctly (an understanding that currently I don't possess), we can go back to this issue and decide what we should temporarily forbid. But clearly the end-game won't be to "temporarily forbid" these things, but to implement some, and permanently forbid other things that don't make sense (if there is such a thing, I don't know...).

disabling tablets for an existing keyspace is probably going to be permanently disallowed, unless we're convinced that we must support it.

@bhalevy
Copy link
Member

bhalevy commented May 21, 2024

#18795

@ptrsmrn
Copy link
Contributor Author

ptrsmrn commented May 21, 2024

My request wasn't to open an issue to disallow things (this is what this issue is about), but to open an issue for specific cases that work incorrectly, that will need to be eventually fixed.

Currently, we simply allow to set/unset tablets using ALTER KEYSPACE, and do nothing with it. I'm not even sure what this can lead to, but it definitely works incorrectly, therefore it needs to be rejected.

Right, I don't see this option stored anywhere in the schema (shown below). Moreover, one can pass (seemingly conflicting) combination of tablets = {'initial': 2, 'enabled': false}; as shown here:

cqlsh> ALTER KEYSPACE ma WITH tablets = {'enabled': false};
// both ALTERs have the same effect
cqlsh> ALTER KEYSPACE ma WITH tablets = {'initial': 2, 'enabled': false};
cqlsh> DESC ma;

CREATE KEYSPACE ma WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '1'}  AND durable_writes = true;

CREATE TABLE ma.temp (
    p int PRIMARY KEY,
    s int
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND speculative_retry = '99.0PERCENTILE';

cqlsh> select * from system_schema.keyspaces  WHERE keyspace_name = 'ma';

 keyspace_name | durable_writes | replication
---------------+----------------+---------------------------------------------------------------------------------------
            ma |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1'}

(1 rows)
cqlsh> select * from system_schema.scylla_keyspaces WHERE keyspace_name = 'ma';

 keyspace_name | initial_tablets | storage_options | storage_type
---------------+-----------------+-----------------+--------------
            ma |               3 |            null |         null

(1 rows)
cqlsh> 

Besides that, ks_prop_defs object only offers get_initial_tablets() method, which returns std::optional<unsigned> with initial tablets integer, so I'm guessing I somehow need to extract whether tablets were enabled or disabled via ALTER.

@ptrsmrn
Copy link
Contributor Author

ptrsmrn commented May 21, 2024

I'm also confused why ALTER is not reflected in system_schema.scylla_keyspaces:

cqlsh> CREATE KEYSPACE ma WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 1} and tablets = {'initial': 8};

Warnings :
Tables in this keyspace will be replicated using tablets, and will not support the CDC feature (issue #16317) and LWT may suffer from issue #5251 more often. If you want to use CDC or LWT, please drop this keyspace and re-create it without tablets, by adding AND TABLETS = {'enabled': false} to the CREATE KEYSPACE statement.

Using Replication Factor datacenter1=1 lower than the minimum_replication_factor_warn_threshold=3 is not recommended.

cqlsh> select * from system_schema.scylla_keyspaces WHERE keyspace_name = 'ma';

 keyspace_name | initial_tablets | storage_options | storage_type
---------------+-----------------+-----------------+--------------
            ma |               8 |            null |         null

(1 rows)
cqlsh> ALTER KEYSPACE ma WITH tablets = {'initial': 2, 'enabled': true};
cqlsh> select * from system_schema.scylla_keyspaces WHERE keyspace_name = 'ma';

 keyspace_name | initial_tablets | storage_options | storage_type
---------------+-----------------+-----------------+--------------
            ma |               8 |            null |         null

(1 rows)
cqlsh> 

EDIT:
can anyone confirm that?

@xemul
Copy link
Contributor

xemul commented May 21, 2024

That's weird, we have a test for that

def test_alter_changes_initial_tablets(cql, skip_without_tablets):
    ksdef = "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1} AND tablets = {'initial': 1};"
    with new_test_keyspace(cql, ksdef) as keyspace:
        cql.execute(f"ALTER KEYSPACE {keyspace} WITH replication = {{'class': 'NetworkTopologyStrategy', 'replication_factor': 1}} AND tablets = {{'initial': 2}};")
        res = cql.execute(f"SELECT * FROM system_schema.scylla_keyspaces WHERE keyspace_name = '{keyspace}'").one()
        assert res.initial_tablets == 2

@xemul
Copy link
Contributor

xemul commented May 21, 2024

I guess it's because you don't specify the WITH replication = ... so this code

lw_shared_ptr<data_dictionary::keyspace_metadata> ks_prop_defs::as_ks_metadata_update(lw_shared_ptr<data_dictionary::keyspace_metadata> old, const locator::token_metadata& tm, const gms::feature_service& feat) {
    std::map<sstring, sstring> options;
    const auto& old_options = old->strategy_options();
    auto sc = get_replication_strategy_class();
    std::optional<unsigned> initial_tablets;
    if (sc) {
        initial_tablets = get_initial_tablets(*sc, old->initial_tablets().has_value());
        options = prepare_options(*sc, tm, get_replication_options(), initial_tablets, old_options);
    } else {
        sc = old->strategy_name();
        options = old_options;
        initial_tablets = old->initial_tablets();
    }

    return data_dictionary::keyspace_metadata::new_keyspace(old->name(), *sc, options, initial_tablets, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
}

takes the else branch and inherits initial-tables from current metadata

@ptrsmrn
Copy link
Contributor Author

ptrsmrn commented May 21, 2024

Yeah, this test is a bit different, when specifying any replication options, it works, but if they are skipped, it doesn't:

cqlsh> ALTER KEYSPACE ma WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3} AND tablets = {'initial': 3};
cqlsh> select * from system_schema.scylla_keyspaces WHERE keyspace_name = 'ma';

 keyspace_name | initial_tablets | storage_options | storage_type
---------------+-----------------+-----------------+--------------
            ma |               3 |            null |         null

(1 rows)
cqlsh> ALTER KEYSPACE ma WITH  tablets = {'initial': 9};
cqlsh> select * from system_schema.scylla_keyspaces WHERE keyspace_name = 'ma';

 keyspace_name | initial_tablets | storage_options | storage_type
---------------+-----------------+-----------------+--------------
            ma |               3 |            null |         null

(1 rows)
cqlsh> 

@xemul
Copy link
Contributor

xemul commented May 21, 2024

@ptrsmrn , #18772 (comment)

@bhalevy
Copy link
Member

bhalevy commented May 21, 2024

@ptrsmrn , #18772 (comment)

@xemul can you please open an issue about that?
We should be able to update initial tablets without having to specify the replication options.

@xemul
Copy link
Contributor

xemul commented May 21, 2024

#18801

@ptrsmrn ptrsmrn force-pushed the tablets_reject_alter branch 2 times, most recently from 22d4b9a to 7fe4692 Compare May 22, 2024 07:12
cql3/statements/alter_keyspace_statement.cc Outdated Show resolved Hide resolved
test/cql-pytest/test_keyspace.py Outdated Show resolved Hide resolved
test/cql-pytest/test_keyspace.py Outdated Show resolved Hide resolved
test/cql-pytest/test_keyspace.py Outdated Show resolved Hide resolved
test/cql-pytest/test_tablets.py Outdated Show resolved Hide resolved
test/topology_custom/test_tablets.py Outdated Show resolved Hide resolved
@xemul xemul mentioned this pull request May 28, 2024
}
if (ks.get_replication_strategy().uses_tablets() && !_attrs->get_replication_options().empty()) {
throw exceptions::invalid_request_exception("Changing replication options via ALTER KEYSPACE statement is not yet supported for keyspaces with tablets.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that recently promoted #18899 is prerequisite to this restriction, isn't it?
Maybe it's worth moving this one lower, after the

auto new_ks = _attrs->as_ks_metadata_update(ks.metadata(), *qp.proxy().get_token_metadata_ptr(), qp.proxy().features());

is done, and then later we can relax the restriction to be

    if (ks.uses_tablets() && new_ks->strategy_options().options != ks.strategy_options().options) {
        throw invalid_request_exception("Changing replication options is not supported");
    }

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ptrsmrn , you put 👍 and resolved the issue, but the code remained as it was before my suggestion, so I'm totally confused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xemul sorry, I just want to move quick, maybe I'm too quick and cut corners:

I assume that recently promoted #18899 is prerequisite to this restriction, isn't it?

Yes, it's a prerequisite.

As for the 2nd part of your comment, I assumed it's only about refactoring, and since I didn't want to refactor the code, build, test, push, re-run the CI (not to waste time), I ignored it.

@ptrsmrn ptrsmrn added backport/6.0 should be backported to 6.0 and removed backport/none Backport is not required labels May 28, 2024
@ptrsmrn ptrsmrn changed the base branch from master to branch-6.0 May 28, 2024 19:25
@ptrsmrn
Copy link
Contributor Author

ptrsmrn commented May 28, 2024

@nyh @xemul rejecting ALTER should be merged only to 6.0 (per #16723 (comment)), hence I changed the merge-to branch to branch-6.0.

@ptrsmrn ptrsmrn changed the title cql: reject ALTER KEYSPACE statement for keyspaces with tablets cql: reject ALTER tablets KS with repl opts May 28, 2024
@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 cql-pytest/test_keyspace
🔹 cql-pytest/test_tablets
🔹 topology_custom/test_tablets
✅ - Container Test
✅ - dtest with topology changes
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 8 hr 57 min
  • Builder: i-08183f659f37abf87 (r5ad.8xlarge)

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 cql-pytest/test_keyspace::*
🔹 cql-pytest/test_tablets::*
🔹 topology_custom/test_tablets::*
✅ - Container Test
✅ - dtest with topology changes
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 3 hr 25 min
  • Builder: spider3.cloudius-systems.com

@ptrsmrn ptrsmrn changed the base branch from branch-6.0 to master May 29, 2024 09:10
The full support for ALTERing a tablets-enabled KEYSPACE is not yet
implemented, and we don't want to only change the schema without changing
any tablets, so the statement has to be explicitly rejected for cases
that won't work, so every time any replication option is provided.

Fixes: scylladb#18795
References: scylladb#16129
@ptrsmrn
Copy link
Contributor Author

ptrsmrn commented May 29, 2024

@nyh @xemul rejecting ALTER should be merged only to 6.0 (per #16723 (comment)), hence I changed the merge-to branch to branch-6.0.

@nyh @xemul sorry for the confusion, we'll follow a regular procedure here, so rejecting ALTER tablets KS is going to be merged to master and then backported to 6.0.

Copy link
Contributor

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm fine with this patch, but I'm lost on where/why we want to merge it. We want it in master? Did @mykaul approve?

@mykaul
Copy link
Contributor

mykaul commented May 29, 2024

Ok, I'm fine with this patch, but I'm lost on where/why we want to merge it. We want it in master? Did @mykaul approve?

I don't need to approve. If I need to - yes, I approve.

@nyh
Copy link
Contributor

nyh commented May 29, 2024

Ok, I'm fine with this patch, but I'm lost on where/why we want to merge it. We want it in master? Did @mykaul approve?

I don't need to approve. If I need to - yes, I approve.

Ok. This patch, like sadly a few before it, is removing a feature that Scylla 5.4 used to have from 6.0. Yes, a user could work around this but will need some forethought (to create a table without tablets). It's not just a technical decision of whether the patch has correct code and appropriate tests.

@xemul
Copy link
Contributor

xemul commented May 29, 2024

Ok, I'm fine with this patch, but I'm lost on where/why we want to merge it. We want it in master? Did @mykaul approve?

I'm not yet fine with the patch, but I'm lost just as well. If we merge this into master, then #16723 should not be merged, otherwise tests in it would get immediately broken.

Copy link
Contributor

@xemul xemul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've a comment about the replication options restrictions here

@ptrsmrn
Copy link
Contributor Author

ptrsmrn commented May 29, 2024

Ok, I'm fine with this patch, but I'm lost on where/why we want to merge it. We want it in master? Did @mykaul approve?

I'm not yet fine with the patch, but I'm lost just as well. If we merge this into master, then #16723 should not be merged, otherwise tests in it would get immediately broken.

@xemul I'm assuming #16723 would have to be extended with a revert of this patch.

@xemul
Copy link
Contributor

xemul commented May 29, 2024

Ok, I'm fine with this patch, but I'm lost on where/why we want to merge it. We want it in master? Did @mykaul approve?

I'm not yet fine with the patch, but I'm lost just as well. If we merge this into master, then #16723 should not be merged, otherwise tests in it would get immediately broken.

@xemul I'm assuming #16723 would have to be extended with a revert of this patch.

Too late, I already queued it. Presumably this PR now needs to be extended with a revert of this patch? 🥴

@ptrsmrn
Copy link
Contributor Author

ptrsmrn commented May 29, 2024

Closing (without merging) per #16723 (comment), so it accidentally doesn't get merged. Can be reopened if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/6.0 should be backported to 6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disabling tablets using ALTER KEYSPACE should be forbidden
7 participants