cluster: invoke config_frontend methods on controller shard #17088

pgellert · 2024-03-14T09:31:58Z

Various places in the code were calling do_patch directly without regard
to the requirement that do_patch has to be called on the controller
shard.

This caused a fixture test to fail: it tried to invoke do_patch on all
shards, which violates the assertion in do_patch (config_frontend.cc)
and fails with the error message "Must be called on version_shard".

This fixes it by changing config_frontend::patch() to invoke do_patch on
the controller shard, and by changing all callers of do_patch to call
patch instead.
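For readers unfamiliar with the pattern, here is a minimal sketch of "public method forwards to the owner shard, private do_ method checks it is on the owner shard". It is plain C++ with simulated shards, not the actual change: toy_frontend, toy_shards, the synchronous invoke_on stand-in, and the choice of shard 0 as version_shard are all assumptions made for illustration, and the real code asserts where this sketch throws.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not Seastar or Redpanda APIs.
using shard_id = std::size_t;
constexpr shard_id version_shard = 0; // assumed owner shard in this sketch

class toy_shards; // owns one toy_frontend per simulated shard

class toy_frontend {
public:
    toy_frontend(shard_id self, toy_shards* container)
      : _self(self), _container(container) {}

    // Public entry point: safe to call on any shard, because it forwards
    // the work to the instance living on version_shard.
    int patch(const std::string& update);

    std::size_t applied_count() const { return _applied.size(); }

private:
    // Must run on version_shard. The sketch throws instead of asserting so
    // that a violation is observable without aborting the process.
    int do_patch(const std::string& update) {
        if (_self != version_shard) {
            throw std::logic_error("Must be called on version_shard");
        }
        _applied.push_back(update);
        return static_cast<int>(_applied.size()); // toy "config version"
    }

    shard_id _self;
    toy_shards* _container;
    std::vector<std::string> _applied;
};

class toy_shards {
public:
    explicit toy_shards(std::size_t n) {
        _instances.reserve(n);
        for (shard_id s = 0; s < n; ++s) {
            _instances.emplace_back(s, this);
        }
    }
    toy_shards(const toy_shards&) = delete;
    toy_shards& operator=(const toy_shards&) = delete;

    toy_frontend& local(shard_id s) { return _instances.at(s); }

    // Synchronous stand-in for a cross-shard call: run f against the
    // instance that lives on the target shard.
    template<typename F>
    auto invoke_on(shard_id target, F&& f) {
        return f(_instances.at(target));
    }

private:
    std::vector<toy_frontend> _instances;
};

inline int toy_frontend::patch(const std::string& update) {
    return _container->invoke_on(
      version_shard,
      [&update](toy_frontend& owner) { return owner.do_patch(update); });
}
```

With this shape, a caller on any simulated shard can write shards.local(3).patch("a=1") and the update lands on shard 0. Because do_patch is private, it cannot be reached from outside the class on the wrong shard.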

Backports Required

Release Notes

Bug Fixes

Fixes a bug where config_frontend methods could be called on shards other than the controller shard.

pgellert · 2024-03-14T09:34:30Z

I am not sure why this test started failing now, because I can't see any recent changes that would explain it. But based on the code it seems to me that this is how we should fix the failing test.

src/v/kafka/server/tests/alter_config_test.cc

vbotbuildovich · 2024-03-14T12:35:26Z

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46174#018e3ca9-0fed-49d5-bd21-1f96d0cc2ed3

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46190#018e3d59-3145-421a-9c59-388bd910662d


BenPope

LGTM

Maybe @dotnwat has an opinion, too?

It might be worth backporting this; the change in the metrics reporter might not be inconsequential.

pgellert · 2024-03-15T13:49:28Z

Makes sense. I've updated the description now to make this a backport + added a bug-fix description.

dotnwat · 2024-03-16T00:32:58Z

src/v/cluster/config_frontend.cc

-        co_return co_await do_patch(std::move(update), timeout);
+        co_return co_await container().invoke_on(


why are we switching cores here to invoke do_patch rather than requiring the caller to invoke patch on the correct core, like in cluster::service::config_update?

to be clear, i'm not saying it should be one way or another. but i do think it should be consistent. i'd probably choose whichever pattern aligned with a majority of the callers, and then add a comment expressing the expectation on callers (if there are any), and an assertion to check it.

BenPope · 2024-03-16

It clearly has been used incorrectly; I can't imagine a downside of making do_patch private and dispatching to the correct core within patch, but I may have missed something.

The primary advantage is that it makes the API harder to misuse.

As far as I can tell, requiring the caller to know which core to call it on is error prone and has no advantage.

I may be missing something, though.

pgellert · 2024-03-18

Agree with what Ben said above.

This also improves consistency locally, by making all of the public methods of config_frontend callable from any shard, not just set_status.

Globally across *_frontend.h public methods, there is an inconsistency where some methods are callable from any shard while some aren't. I would think that since invoke_on seems cheap, we should standardise making it the *_frontend classes' responsibility to delegate work to the correct core. But that's a larger undertaking.

dotnwat · 2024-03-19

Yes, allowing the method to be invoked on any core is great.

Internally we should also be consistent:

     auto leader = _leaders.local().get_leader(model::controller_ntp);
     if (!leader) {
         co_return patch_result{
           .errc = errc::no_leader_controller, .version = config_version_unset};
     }
     if (leader == _self) {
-        co_return co_await do_patch(std::move(update), timeout);
+        co_return co_await container().invoke_on(

here we are combining state from different cores. it's benign in this case, but in general, it should be consistent.
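To make the general concern concrete, here is a small sketch in plain C++ that contrasts reading the leader on the calling shard with reading it on the owner shard. Everything in it is hypothetical (toy_shard_state, patch_mixed, patch_on_owner, the node ids), and it assumes per-shard views of the leader can differ. The thread does not claim that for this code path and calls the current mix benign.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical types for illustration only; not Redpanda or Seastar APIs.
using node_id = int;

struct toy_shard_state {
    std::optional<node_id> leader_view; // this shard's view of the leader
    int patches_applied = 0;
};

enum class toy_result { applied, forwarded_to_leader, no_leader };

// Mixed pattern: the leader check reads the calling shard's state, while
// the patch itself mutates the owner shard's state.
toy_result patch_mixed(
  std::vector<toy_shard_state>& shards,
  std::size_t caller,
  std::size_t owner,
  node_id self) {
    const auto leader = shards.at(caller).leader_view;
    if (!leader) {
        return toy_result::no_leader;
    }
    if (*leader == self) {
        ++shards.at(owner).patches_applied;
        return toy_result::applied;
    }
    return toy_result::forwarded_to_leader;
}

// Consistent pattern: hop to the owner shard first, then make every
// decision from that shard's state.
toy_result patch_on_owner(
  std::vector<toy_shard_state>& shards, std::size_t owner, node_id self) {
    auto& state = shards.at(owner);
    if (!state.leader_view) {
        return toy_result::no_leader;
    }
    if (*state.leader_view == self) {
        ++state.patches_applied;
        return toy_result::applied;
    }
    return toy_result::forwarded_to_leader;
}
```

If the two shards disagree about the leader, the mixed version's outcome depends on which shard the caller happened to run on, while the owner-only version always decides from a single shard's state.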

vbotbuildovich · 2024-03-19T17:35:50Z

/backport v23.3.x

vbotbuildovich · 2024-03-19T17:35:51Z

/backport v23.2.x

vbotbuildovich · 2024-03-19T17:36:54Z

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-17088-v23.2.x-498 remotes/upstream/v23.2.x
git cherry-pick -x b7202f11b98f555a5403c77ff65f91e5a2f24f67 ed04fc3eba2b2e66ed68861d84ecb0a26bc2055a

Workflow run logs.

github-actions bot added the area/redpanda label Mar 14, 2024

BenPope reviewed Mar 14, 2024

src/v/kafka/server/tests/alter_config_test.cc (outdated, resolved)

pgellert force-pushed the fix-fixture-test-invoke branch 2 times, most recently from dfc93a2 to 79fd14d Compare March 14, 2024 12:58

pgellert changed the title ~~kafka/test: invoke patch on controller shard~~ cluster: invoke do_patch on controller shard Mar 14, 2024

pgellert force-pushed the fix-fixture-test-invoke branch from 79fd14d to 28ed6b7 Compare March 14, 2024 13:26

pgellert changed the title ~~cluster: invoke do_patch on controller shard~~ cluster: invoke config_frontend methods on controller shard Mar 14, 2024

pgellert requested a review from BenPope March 14, 2024 13:27

cluster: invoke set_next_version on controller shard

ed04fc3

Just like patch, we also have to call do_set_next_version on the same shard because config_frontend::set_next_version might be called from a background fiber.

pgellert force-pushed the fix-fixture-test-invoke branch from 28ed6b7 to ed04fc3 Compare March 14, 2024 13:29

pgellert self-assigned this Mar 14, 2024

BenPope approved these changes Mar 15, 2024

pgellert requested a review from dotnwat March 15, 2024 15:22

dotnwat reviewed Mar 16, 2024

pgellert requested a review from dotnwat March 18, 2024 13:00

pgellert merged commit 09304e5 into redpanda-data:dev Mar 19, 2024
17 checks passed

This was referenced Mar 19, 2024

[v23.3.x] cluster: invoke config_frontend methods on controller shard #17184

Merged

[v23.2.x] cluster: invoke config_frontend methods on controller shard #17185

Closed

pgellert mentioned this pull request Mar 20, 2024

[v23.2.x] cluster: invoke config_frontend methods on controller shard #17211

Merged

renovate bot mentioned this pull request May 4, 2024

feat(github-release)!: Update redpanda-operator to v24.1.6 otosky/home-ops#1232

Merged
