Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove consistent-topology-changes experimental feature #17802

Closed
kbr-scylla opened this issue Mar 14, 2024 · 11 comments
Closed

Remove consistent-topology-changes experimental feature #17802

kbr-scylla opened this issue Mar 14, 2024 · 11 comments
Assignees
Labels
area/topology changes status/release blocker Preventing from a release to be promoted
Milestone

Comments

@kbr-scylla
Copy link
Contributor

kbr-scylla commented Mar 14, 2024

Raft-based topology is planned GA for 6.0, so we need to take off the experimental feature.

Best to do it before branching.

We'll need to coordinate the change in ScyllaDB repo with changes in scylla-dtest and scylla-pkg, maybe more repositories. They all refer to this experimental feature.

Similar effort was done when we removed consistent_cluster_management: #16334, https://github.com/scylladb/scylla-pkg/pull/3722, https://github.com/scylladb/scylla-dtest/pull/3770 (@patjed41)

cc @mykaul @avikivity @kostja


We decided that we will replace the experimental flag with a flag which will allow disabling raft-topology when starting new cluster (basically, the reverse of the current experimental feature).

The flag is not to be used by customers/users, the purpose is to continue testing gossip-mode operations in dtests and SCT.

The dtest jobs will have to be adjusted as well; we'll need to continue running the gossip-mode jobs.

@kbr-scylla
Copy link
Contributor Author

Not sure if we want to wait with this until we fix all dtest failures.

Fixing dtest failures first may make the process simpler -- there should be fewer adjustments needed for scylla-dtest in that case.

@kbr-scylla
Copy link
Contributor Author

After discussion on today's status, we decided that we will replace the experimental flag with a flag which will allow disabling raft-topology when starting new cluster (basically, the reverse of the current experimental feature).

The flag is not to be used by customers/users, the purpose is to continue testing gossip-mode operations in dtests and SCT.

cc @temichus @enaydanov @aleksbykov @avikivity @mykaul @kostja @bhalevy

@patjed41
Copy link
Contributor

The flag is not to be used by customers/users

Have we ever introduced such a flag?

What value_status should it have? I think the only reasonable choice is Used, but then this flag will be automatically documented and visible to users. Maybe we need a new Internal value?

cc @avikivity

@tzach
Copy link
Contributor

tzach commented Mar 26, 2024

Have we ever introduced such a flag?

4f23eec

@bhalevy
Copy link
Member

bhalevy commented Mar 29, 2024

Please see https://github.com/scylladb/scylla-dtest/pull/4066#issuecomment-2027757489
We need to make sure not to break tablets in the process.

@patjed41
Copy link
Contributor

patjed41 commented Apr 15, 2024

edit: renamed disable-consistent-topology-changes to force-gossip-topology-changes after accepting force-gossip-topology-changes as the final name, added links to PRs

Since fixing this issue will require merging multiple PRs in different repos, I post the plan, so we don't get lost.

We will make consistent-topology-changes unused and introduce force-gossip-topology-changes that allows testing with raft-topology disabled. These changes require corresponding adjustments in dtest:

  • tests using raft-topology should stop using consistent-topology-changes
  • tests not using raft-topology must start using force-gossip-topology-changes

We want to merge changes to Scylla and to dtest in a way that ensures that all tests are run correctly during the whole process. If we merged all changes to Scylla first, all tests would run with raft-topology and the ones excluded in raft-topology would fail. We also can't merge all changes to dtest first.

Here is the proposed merging plan:

  1. scylladb: introduce unused force-gossip-topology-changes; db: config: introduce force-gossip-topology-changes #18284
  2. scylla-dtest: use force-gossip-topology-changes in jobs that run without raft-topology (this change won't have any effect for now); https://github.com/scylladb/scylla-dtest/pull/4179
  3. scylladb: add all required changes, in particular making consistent-topology-changes unused and force-gossip-topology-changes used; db: config: move consistent-topology-changes out of experimental and make it the default for new clusters #18285
  4. scylla-dtest: remove redundant uses of the consistent-topology-changes flag and make the consistent-topology-changes feature (the one responsible for including and excluding tests if raft-topology is used) depend on passing force-gossip-topology-changes; https://github.com/scylladb/scylla-dtest/pull/4186

After merging all changes above, we will still have a small issue. Using raft-topology will become default, but Jenkins jobs' names won't reflect it. For example, we will have gating-dtest-release (not using raft-topology) and gating-dtest-release-with-consistent-topology-changes, but we would like to have gating-dtest-release (using raft-topology) and gating-dtest-release-with-gossip-topology-changes. This issue can be fixed later to simplify the initial process. At least, we will need to:

  • scylla-pkg: rename gating-dtest-release-with-consistent-topology-changes to gating-dtest-release-with-gossip-topology-changes
  • scylla-dtest: stop passing force-gossip-topology-changes to jobs not using raft-topology, start passing it to jobs using it (the ones named with "with-raft-topology"), and rename "with-raft-topology" jobs to "with-gossip-topology"
  • Jenkins: fix Script Path in jobs' configurations according to the changes in scylla-dtest above

@scylladb/scylla-maint @fruch @yaronkaikov

patjed41 added a commit to patjed41/scylladb that referenced this issue Apr 17, 2024
We are going to remove the `consistent-topology-changes` experimental
feature in 6.0. However, the topology upgrade procedure will be
manual and voluntary, so some 6.0 clusters will be using the
gossip-based topology. Therefore, we need to continue testing the
gossip-based topology. The solution is introducing a new flag,
`disable-consistent-topology-changes`, that will enforce the
gossip-based topology in a fresh cluster.

In this patch, we only introduce the parameter without any effect.
Here is the explanation. Removing `consistent-topology-changes`
and introducing `disable-consistent-topology-changes` requires
adjustments in scylla-dtest. We want to merge changes to scylladb
and scylla-dtest in a way that ensures all tests are run correctly
during the whole process. If we merged all changes to scylladb first,
before merging the scylla-dtest changes, all tests would run with
the raft-based topology and the ones excluded in the raft-based
topology would fail. We also can't merge all changes to scylla-dtest
first. However, we can follow this plan:
1. scylladb: merge this patch
2. scylla-dtest: start using `disable-consistent-topology-changes`
   in jobs that run without the raft-based topology
3. scylladb: merge the rest of the changes
4. scylla-dtest: merge the rest of the changes

Ref scylladb#17802
@kostja
Copy link
Contributor

kostja commented Apr 18, 2024

@kbr-scylla unless Tzach explicitly requested "disable-" I join Avi's opinion and prefer to avoid double negation.

@tzach
Copy link
Contributor

tzach commented Apr 18, 2024

AFAIK, I did not request a "disable-" fleg

@patjed41
Copy link
Contributor

The new proposal is force-gossip-topology-changes. See discussion in #18284.

patjed41 added a commit to patjed41/scylladb that referenced this issue Apr 22, 2024
We are going to remove the `consistent-topology-changes` experimental
feature in 6.0. However, the topology upgrade procedure will be
manual and voluntary, so some 6.0 clusters will be using the
gossip-based topology. Therefore, we need to continue testing the
gossip-based topology. The solution is introducing a new flag,
`disable-consistent-topology-changes`, that will enforce the
gossip-based topology in a fresh cluster.

In this patch, we only introduce the parameter without any effect.
Here is the explanation. Removing `consistent-topology-changes`
and introducing `disable-consistent-topology-changes` requires
adjustments in scylla-dtest. We want to merge changes to scylladb
and scylla-dtest in a way that ensures all tests are run correctly
during the whole process. If we merged all changes to scylladb first,
before merging the scylla-dtest changes, all tests would run with
the raft-based topology and the ones excluded in the raft-based
topology would fail. We also can't merge all changes to scylla-dtest
first. However, we can follow this plan:
1. scylladb: merge this patch
2. scylla-dtest: start using `disable-consistent-topology-changes`
   in jobs that run without the raft-based topology
3. scylladb: merge the rest of the changes
4. scylla-dtest: merge the rest of the changes

Ref scylladb#17802
patjed41 added a commit to patjed41/scylladb that referenced this issue Apr 22, 2024
We are going to remove the `consistent-topology-changes` experimental
feature in 6.0. However, the topology upgrade procedure will be
manual and voluntary, so some 6.0 clusters will be using the
gossip-based topology. Therefore, we need to continue testing the
gossip-based topology. The solution is introducing a new flag,
`disable-consistent-topology-changes`, that will enforce the
gossip-based topology in a fresh cluster.

In this patch, we only introduce the parameter without any effect.
Here is the explanation. Removing `consistent-topology-changes`
and introducing `disable-consistent-topology-changes` requires
adjustments in scylla-dtest. We want to merge changes to scylladb
and scylla-dtest in a way that ensures all tests are run correctly
during the whole process. If we merged all changes to scylladb first,
before merging the scylla-dtest changes, all tests would run with
the raft-based topology and the ones excluded in the raft-based
topology would fail. We also can't merge all changes to scylla-dtest
first. However, we can follow this plan:
1. scylladb: merge this patch
2. scylla-dtest: start using `disable-consistent-topology-changes`
   in jobs that run without the raft-based topology
3. scylladb: merge the rest of the changes
4. scylla-dtest: merge the rest of the changes

Ref scylladb#17802
patjed41 added a commit to patjed41/scylladb that referenced this issue Apr 22, 2024
We are going to make the `consistent-topology-changes` experimental
feature unused in 6.0. However, the topology upgrade procedure will
be manual and voluntary, so some 6.0 clusters will be using the
gossip-based topology. Therefore, we need to continue testing the
gossip-based topology. The solution is introducing a new flag,
`force-gossip-topology-changes`, that will enforce the gossip-based
topology in a fresh cluster.

In this patch, we only introduce the parameter without any effect.
Here is the explanation. Making `consistent-topology-changes` unused
and introducing `force-gossip-topology-changes` requires adjustments
in scylla-dtest. We want to merge changes to scylladb and scylla-dtest
in a way that ensures all tests are run correctly during the whole
process. If we merged all changes to scylladb first, before merging
the scylla-dtest changes, all tests would run with the raft-based
topology and the ones excluded in the raft-based topology would fail.
We also can't merge all changes to scylla-dtest first. However, we
can follow this plan:
1. scylladb: merge this patch
2. scylla-dtest: start using `force-gossip-topology-changes`
   in jobs that run without the raft-based topology
3. scylladb: merge the rest of the changes
4. scylla-dtest: merge the rest of the changes

Ref scylladb#17802
patjed41 added a commit to patjed41/scylladb that referenced this issue Apr 22, 2024
We are going to make the `consistent-topology-changes` experimental
feature unused in 6.0. However, the topology upgrade procedure will
be manual and voluntary, so some 6.0 clusters will be using the
gossip-based topology. Therefore, we need to continue testing the
gossip-based topology. The solution is introducing a new flag,
`force-gossip-topology-changes`, that will enforce the gossip-based
topology in a fresh cluster.

In this patch, we only introduce the parameter without any effect.
Here is the explanation. Making `consistent-topology-changes` unused
and introducing `force-gossip-topology-changes` requires adjustments
in scylla-dtest. We want to merge changes to scylladb and scylla-dtest
in a way that ensures all tests are run correctly during the whole
process. If we merged all changes to scylladb first, before merging
the scylla-dtest changes, all tests would run with the raft-based
topology and the ones excluded in the raft-based topology would fail.
We also can't merge all changes to scylla-dtest first. However, we
can follow this plan:
1. scylladb: merge this patch
2. scylla-dtest: start using `force-gossip-topology-changes`
   in jobs that run without the raft-based topology
3. scylladb: merge the rest of the changes
4. scylla-dtest: merge the rest of the changes

Ref scylladb#17802
kbr-scylla pushed a commit that referenced this issue Apr 23, 2024
We are going to make the `consistent-topology-changes` experimental
feature unused in 6.0. However, the topology upgrade procedure will
be manual and voluntary, so some 6.0 clusters will be using the
gossip-based topology. Therefore, we need to continue testing the
gossip-based topology. The solution is introducing a new flag,
`force-gossip-topology-changes`, that will enforce the gossip-based
topology in a fresh cluster.

In this patch, we only introduce the parameter without any effect.
Here is the explanation. Making `consistent-topology-changes` unused
and introducing `force-gossip-topology-changes` requires adjustments
in scylla-dtest. We want to merge changes to scylladb and scylla-dtest
in a way that ensures all tests are run correctly during the whole
process. If we merged all changes to scylladb first, before merging
the scylla-dtest changes, all tests would run with the raft-based
topology and the ones excluded in the raft-based topology would fail.
We also can't merge all changes to scylla-dtest first. However, we
can follow this plan:
1. scylladb: merge this patch
2. scylla-dtest: start using `force-gossip-topology-changes`
   in jobs that run without the raft-based topology
3. scylladb: merge the rest of the changes
4. scylla-dtest: merge the rest of the changes

Ref #17802

Closes #18284
kbr-scylla added a commit that referenced this issue Apr 26, 2024
…al and make it the default for new clusters' from Patryk Jędrzejczak

We move consistent cluster management out of experimental and
make it the default for new clusters in 6.0. In code, we make the
`consistent-topology-changes` flag unused and assumed to be true.

In 6.0, the topology upgrade procedure will be manual and
voluntary, so some clusters will still be using the gossip-based
topology even though they support the raft-based topology.
Therefore, we need to continue testing the gossip-based topology.
This is possible by using the `force-gossip-topology-changes` flag
introduced in #18284.

Ref #17802

Closes #18285

* github.com:scylladb/scylladb:
  docs: raft.rst: update after removing consistent-topology-changes
  treewide: fix indentation after the previous patch
  db: config: make consistent-topology-changes unused
  test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls
  test: test_read_required_hosts: run with force-gossip-topology-changes
  storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes
  storage_service: join_token_ring: fix finish_setup_after_join calls
@kbr-scylla
Copy link
Contributor Author

Step 4 of the plan #17802 (comment) was done, so I'm closing the issue as done.

@patjed41 please create a follow-up issue for the cleanups you mentioned at the end.

@patjed41
Copy link
Contributor

patjed41 commented May 9, 2024

Step 4 of the plan #17802 (comment) was done, so I'm closing the issue as done.

@patjed41 please create a follow-up issue for the cleanups you mentioned at the end.

I opened #18600

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/topology changes status/release blocker Preventing from a release to be promoted
Projects
None yet
Development

No branches or pull requests

6 participants