Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manage STM creation outside of cluster::partition #15424

Merged
merged 20 commits into from
Jan 3, 2024

Conversation

mmaslankaprv
Copy link
Member

@mmaslankaprv mmaslankaprv commented Dec 13, 2023

Changed the way how state machines are managed in Redpanda. Previously all the STMs were a part of cluster::partition. This was convenient as the partition STM lifecycle is in general bound with the partition lifecycle. There were two main problems with that approach:

  1. all the state machines had to be defined in cluster module or other modules that the cluster module dependent on
  2. cluster::partition interface and behaviour was customised in runtime based solely on the partition name.

The new approach introduced with this PR makes it possible to create STMs outside of the cluster module as the STM interface is generalised at the cluster module level. Newly introduced state_machine_registry allows registering an STM factory. Factory decide whether the stm is required for given Raft group and if that is the case the STM is then created.

With the new approach STM implementer can simply create an STM by implementing either a raft::state_machine_base interface or extending raft::persisted_stm. Implementer provides and register a factory in partition manager instance.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

src/v/raft/fwd.h Outdated Show resolved Hide resolved
@mmaslankaprv mmaslankaprv marked this pull request as ready for review December 14, 2023 17:28
@mmaslankaprv mmaslankaprv marked this pull request as draft December 14, 2023 17:28
@mmaslankaprv
Copy link
Member Author

/dt

@mmaslankaprv mmaslankaprv force-pushed the stm-registry branch 2 times, most recently from 4e07d39 to d3e24cb Compare December 15, 2023 09:13
@mmaslankaprv
Copy link
Member Author

/ci-repeat 1

@redpanda-data redpanda-data deleted a comment from vbotbuildovich Dec 15, 2023
@mmaslankaprv mmaslankaprv force-pushed the stm-registry branch 2 times, most recently from 1d66e8a to 8b95a3e Compare December 15, 2023 13:40
@mmaslankaprv
Copy link
Member Author

/ci-repeat 1

@mmaslankaprv
Copy link
Member Author

/ci-repeat 1

@vbotbuildovich
Copy link
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/42950#018c77c9-8961-4121-8f44-eb3541208d67:

"rptest.tests.delete_records_test.DeleteRecordsTest.test_delete_records_concurrent_truncations.cloud_storage_enabled=True"

@mmaslankaprv
Copy link
Member Author

/ci-repeat 1

@mmaslankaprv mmaslankaprv marked this pull request as ready for review December 18, 2023 17:31
@mmaslankaprv mmaslankaprv force-pushed the stm-registry branch 2 times, most recently from 240256f to e41ef2e Compare December 20, 2023 08:59
* Must return true if STM should be created for a partition underlaid by
* passed raft group
*/
virtual bool is_applicable_for(const raft::consensus&) const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this in context of deletion of unmanaged partitions - it would be nice to be able to determine if the stm is applicable for a partition without creating a consensus instance. AFAICS currently all STMs require just an ntp, maybe we should start with just ntp as a parameter (and maybe add something more if the need arises, but never a full consensus instance).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was considering passing ntp_config which is what is currently required, will change that


void id_allocator_stm_factory::create(
raft::state_machine_manager_builder& builder, raft::consensus* raft) {
builder.create_stm<id_allocator_stm>(clusterlog, raft);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the line

raft->log()->stm_manager()->add_stm(stm);

required here?

Maybe it could be automated for all persisted stms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually id_allocator_stm manages eviction on its own hence it is not registered in STM manager

@@ -198,8 +191,8 @@ class partition {
/// that any truncation-delaying activitiy (like uploading to tiered
/// storage) could be expedited to enable local disk space to be reclaimed.
std::optional<model::offset> eviction_requested_offset() {
if (_log_eviction_stm) {
return _log_eviction_stm->eviction_requested_offset();
if (log_eviction_stm()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not a big deal, but doing a lookup each time seems wasteful. Maybe it is better to keep this and tm_stm as fields (initialized with stm_manager()->get() once)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see that it is there, but unused for some reason?

auto stm = partition->transform_offsets_stm();
auto stm
= partition->raft()->stm_manager()->get<transform_offsets_stm_t>(
"distributed_kv_stm");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mismatch between the name and the type looks weird and implies that we can't have e.g. more than 1 distributed_kv_stm per partition. Maybe it is better to pass the stm name when registering the factory instead of tying it to the stm type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, i am not a fan of those names either, will add ability to pass a name as a part of factory registration interface

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to do partition->raft()->stm_manager()->get<stm_type>() right? And key everything off the class type?

So the underlying container is absl::flat_hash_map<std::type_index, state_machine_manager::named_stm> and the get method can create the index using: std::type_index(typeid(T)).

This assumes that each stm is it's own type, which I think is fine. For the case of the distributed_kv, I'd expect them to all have different keys and values, otherwise we can add a tag type to distinguish

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've ended up with adding a static name field to stm this way we do not need to relay ion typeid in human readable parts f.e. snapshot map or logs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah much better - love it!

"When creating transform stm the topic configuration must exists");

builder.create_stm<transform_offsets_stm_t>(
cfg->partition_count, tlog, raft);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stm_manager registration needed?

Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for this 🎉 Looking forward to be able to untangle more deps and clean up the build graph because of this.

src/v/raft/state_machine_manager.h Outdated Show resolved Hide resolved
Removed `get_name` method from `raft::state_machine_base` and made it a
static variable. This way called does not have to provide a name and we
can easily query state machine by type rather than by name.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added an API method that will allow caller to query for an stm with a
specific name. The `dynamic_cast` is used to provide a basic level of
runtime correctness check.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
State machine registry is a class holding all registered state machines
factories. Registry is used by partition_manger whenever a new partition
instance is created. Registry builds all state machines that are
required for a give partition instance.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Some of the state machines in Redpanda may persists its own local state.
State machine manager must be able to handle the local state i.e. query
for its size and remove it if required.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Use stm manager APIs to query the size and delete stm local persistent
state

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added transactional manager state machine factory and using a state
machine registry to create manager stm only if partition requires it.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Introduced `fixed_string` helper allowing to use string literals as
template parameters.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Now that the stm creation is decoupled from its implementation we may
move transform stm implementation outside the cluster module.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Creating `mem_tracker` inside of archival stm to make it independent
from partition instance.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Now as the state machines are created outside of the partition we may
cleanup the dependencies used by partition abstraction.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
We no longer need `topic_configuration` to create partition instance as
the stm creation process is managed outside of the partition.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mmaslankaprv
Copy link
Member Author

ci failure:
#15042

@mmaslankaprv mmaslankaprv merged commit 0806712 into redpanda-data:dev Jan 3, 2024
17 of 19 checks passed
@mmaslankaprv mmaslankaprv deleted the stm-registry branch January 3, 2024 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants