Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RBAC: Enable the role store to survive a controller snapshot #17562

Merged
merged 2 commits into from
Apr 3, 2024

Conversation

oleiman
Copy link
Member

@oleiman oleiman commented Apr 2, 2024

Snapshot machinery for roles was missing previously.

Updates a ducktape test to ensure that a controller snapshot occurs before restarting the cluster.

Validator (test should fail absent these changes): #17560

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 3, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/47252#018ea170-f5b9-4602-aefa-32803461f2db:

"rptest.tests.controller_snapshot_test.ControllerSnapshotTest.test_upgrade_compat"

new failures in https://buildkite.com/redpanda/redpanda/builds/47252#018ea170-f5b0-40c5-91f0-510401a699b0:

"rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_and_upgrade"

new failures in https://buildkite.com/redpanda/redpanda/builds/47252#018ea177-5fec-46b5-8e89-d46c63b91ce8:

"rptest.tests.transactions_test.TxUpgradeTest.upgrade_does_not_change_tx_coordinator_assignment_test"
"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=True.upgrade=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/47252#018ea177-5ff4-4c27-bf95-e18fb54c6341:

"rptest.tests.controller_upgrade_test.ControllerUpgradeTest.test_updating_cluster_when_executing_operations"

new failures in https://buildkite.com/redpanda/redpanda/builds/47252#018ea177-5ff1-4485-a228-2c99e985aee3:

"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_cross_shard.num_to_upgrade=2.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=2.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_and_upgrade"
"rptest.tests.controller_snapshot_test.ControllerSnapshotTest.test_upgrade_compat"
"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=False.upgrade=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/47252#018ea177-5fef-468c-9d8f-e57ae1dd174d:

"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_cross_shard.num_to_upgrade=2.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.upgrade_test.UpgradeFromPriorFeatureVersionCloudStorageTest.test_rolling_upgrade.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.workload_upgrade_runner_test.RedpandaUpgradeTest.test_workloads_through_releases.cloud_storage_type=CloudStorageType.S3"

@oleiman oleiman marked this pull request as draft April 3, 2024 02:24
@oleiman
Copy link
Member Author

oleiman commented Apr 3, 2024

force push for controller_snapshot_parts::security_t version compat.

@oleiman oleiman marked this pull request as ready for review April 3, 2024 02:57
@oleiman
Copy link
Member Author

oleiman commented Apr 3, 2024

force push improve test

@@ -166,6 +189,10 @@ security_t::serde_async_read(iobuf_parser& in, serde::header const h) {
in, h._bytes_left_limit);
acls = co_await read_vector_async_nested<decltype(acls)>(
in, h._bytes_left_limit);
if (h._version > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

tag_t<write_tag>,
iobuf& out,
cluster::controller_snapshot_parts::security_t::named_role t) {
write(out, std::move(t.name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I figured it wasn't worth versioning the key by making named_role an envelope.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this seems sensible to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenPope @oleiman Skipping the envelope saves only 2 bytes (the version and compact version are both int8_t). For that savings you have to write more code, and you can never change the encoding. I agree it's unlikely it will change, but why risk it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, when you put it that way...may as well just knock it out while we still can

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (const auto& role : roles) {
security::role_name name{role};
snapshot.roles.emplace_back(name, *_roles.local().get(name));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: Intentionally without ss::maybe_yield, roles is a view.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, I did recently make role_store::range return a full container (frag vec) to clean up the header, but the items are are string_views.

@@ -907,6 +927,8 @@ def test_role_survives_restart(self):

assert r.name == rand_role

self._wait_for_everything_snapshotted(self.redpanda.nodes, admin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's possible for the snapshot to sneak in between the last command update_role_members and this line. Maybe this should go above the wait on 922?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, yeah I'll play with it a bit locally

@oleiman
Copy link
Member Author

oleiman commented Apr 3, 2024

/ci-repeat 3
skip-redpanda-build
skip-units
dt-repeat=20
tests/rptest/tests/rbac_test.py::RolePersistenceTest

BenPope and others added 2 commits April 3, 2024 10:32
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
As a spot check that the role store has been properly integrated into
controller snapshot machinery.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman
Copy link
Member Author

oleiman commented Apr 3, 2024

force push to tweak the test slightly

@oleiman oleiman requested a review from BenPope April 3, 2024 18:11
@oleiman oleiman merged commit da7972f into redpanda-data:dev Apr 3, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants