Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[transaction] Clear old pids from rm_stm state in overflow #7057

Merged
merged 9 commits into from
Dec 2, 2022

Conversation

VadimPlh
Copy link
Contributor

@VadimPlh VadimPlh commented Nov 2, 2022

Cover letter

Problem

User can use transaction/idempotency in different way. Some of them can be cause of problem. For example: user can create producer per request. Idempotency in kafka protocol works per session, the session is scoped by a producer so if user create a producer per request we will store all info for each producer in Idempotency case. It does not provide any benefits for user, but will create big maps in internal state for rm_stm. Also the same things for transaction.

We need to prevent this type overflow. So idea is add new settings (max_concurrent_producer_ids). And when rm_stm will reach this limits we will spawn clear process for log and mame state.

Fix

For Idempotency we use _log_state.seq_table, where we store session info. The simplest way to track order of last apply_data for pid. We will store seq_entry inside intrusive list to store replication order.

For transaction is a little bit harder. We can not delete any info for ongoing transaction. But we can understand do user finish transaction or not, For this we just need to check _log_state.tx_seqs. Because on abort or commit we clear this map. So for all commited/aborted txs we just can delete all info from internal maps

Fixes #5321

Backport Required

  • not a bug fix
  • issue does not exist in previous branches
  • papercut/not impactful enough to backport
  • v22.2.x
  • v22.1.x
  • v21.11.x

UX changes

  • none

Release notes

Features

  • Adding new redpanda setting (max_concurrent_producer_ids) to control how much sessions for transaction/idempotency will be saved

@VadimPlh VadimPlh changed the title Issue 5321 [transaction] Clear old pids from rm_stm state in overflow Nov 3, 2022
@mmedenjak mmedenjak added kind/bug Something isn't working area/raft and removed area/redpanda labels Nov 3, 2022
@VadimPlh VadimPlh force-pushed the issue-5321 branch 3 times, most recently from 2e116dc to 43dde1b Compare November 3, 2022 16:47
@VadimPlh VadimPlh marked this pull request as ready for review November 3, 2022 16:48
@piyushredpanda piyushredpanda added this to the v22.3.1-rc5 milestone Nov 4, 2022
Copy link
Contributor

@rystsov rystsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tx houskeeping looks good while the idempotency has issues. also please add tests for the idempotency, something like:

  1. create a producer, produce
  2. create n producers to exhaust the limit and kick out the first producer
  3. produce via the first producer
  4. observe error

src/v/cluster/rm_stm.cc Show resolved Hide resolved
src/v/cluster/rm_stm.cc Outdated Show resolved Hide resolved
src/v/cluster/rm_stm.cc Outdated Show resolved Hide resolved
src/v/cluster/rm_stm.h Outdated Show resolved Hide resolved
src/v/cluster/rm_stm.h Outdated Show resolved Hide resolved
@mmaslankaprv
Copy link
Member

I am wondering if we want to remove the oldest PIDs from all of them ?)
If so we may not need to keep another index but instead use an intrusive list to keep track of an insertion order and add a timestamp to existing object this way we would iterate (according to insertion order) up to the point where PIDs are more recent than the the requested timestamp and remove all previous (or if it is just number of elements then we wouldn't even have to track timestamp).

Similar approach is used in fetch_session.h

@VadimPlh VadimPlh force-pushed the issue-5321 branch 2 times, most recently from 080dd5e to 86aac9c Compare November 4, 2022 15:15
@VadimPlh VadimPlh force-pushed the issue-5321 branch 2 times, most recently from 5e9115e to d89123e Compare November 7, 2022 14:26
rystsov
rystsov previously approved these changes Dec 1, 2022
Copy link
Contributor

@rystsov rystsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need @dotnwat's feedback on the name of the property but otherwise it looks good!

@@ -2232,6 +2234,8 @@ void rm_stm::apply_data(model::batch_identity bid, model::offset last_offset) {
_log_state.lru_idempotent_pids.iterator_to(seq_it->second));
}
_log_state.lru_idempotent_pids.push_back(seq_it->second);
spawn_background_clean_for_pids(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop rm_stm::? sometimes you use clear_type sometimes not

@dotnwat
Copy link
Member

dotnwat commented Dec 1, 2022

I think we need @dotnwat's feedback on the name of the property but otherwise it looks good!

@rystsov I chimed in on that comment thread. but i wasn't thinking it was about the name, but rather the default value.

We need to prevent overflow for some
state maps, so we need to delete info
for pids which does not have tx.
For this we just should check is pid
inside tx_seqs map or not. If not it means
tx was commited or aborted
To track old session for idempotent request
we need to store replication order for them
for delete old session in right order.
So idea is clear internal state for oldest
session if we have overflow. we
just need to delete info for oldest
pid in out state
New option will control the max size of
internal maps in rm_stm.

User can use transaction/idempotency in different way.
Some of them can be cause of problem. For example:
user can create producer per request.
Idempotency in kafka protocol works per session,
the session is scoped by a producer so if user create
a producer per request we will store all info for
each producer in Idempotency case.
It does not provide any benefits for user,
but will create big maps in internal state for rm_stm.
Also the same things for transaction.
Run cleaning for old session in
replicate_seq and apply_data
@VadimPlh VadimPlh added the ci-repeat-5 repeat tests 5x concurrently to check for flakey tests; self-cancelling label Dec 2, 2022
@vbotbuildovich vbotbuildovich removed the ci-repeat-5 repeat tests 5x concurrently to check for flakey tests; self-cancelling label Dec 2, 2022
@dotnwat dotnwat merged commit c6afd9d into redpanda-data:dev Dec 2, 2022
@VadimPlh
Copy link
Contributor Author

VadimPlh commented Dec 2, 2022

/backport v22.3.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

waiting for offset=8546353 failed with raft::offset_monitor::wait_aborted
7 participants