-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
transform: gc committed offsets #16526
Conversation
ea93e12
to
31343d0
Compare
31343d0
to
297864d
Compare
new failures in https://buildkite.com/redpanda/redpanda/builds/44903#018d8a18-b4e3-4667-bc3f-e7586c03af85:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a high level thought on holding write lock, lgtm otherwise.
src/v/cluster/distributed_kv_stm.h
Outdated
|
||
ss::future<errc> remove_all(ss::noncopyable_function<bool(Key)> pred) { | ||
auto holder = _gate.hold(); | ||
auto units = co_await _snapshot_lock.hold_read_lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this could hold a write lock since remove_all seems like a rare operation and the side effects of it are easier to explain if the entire operation sees a consistent snapshot of the state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it is a rare case, so probably cleaner to make it see consistent state. I've pushed an update.
297864d
to
8776bf6
Compare
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Based on a set transform IDs, this will be used to GC deleted transforms. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This will be used to GC offsets when transforms are deleted. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
note the comments in the transform::service::garbage_collect_committed_offsets about the consistency of these operations. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
With all the caveats mentioned in the `transform::service` documentation Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This makes sure our debug endpoint to cleanup offsets works. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
In accordance with new documentation standards document some more methods in this stm. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
8776bf6
to
cfef68c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/45625#018e0a7c-58db-4ee6-ab18-aceb9539c21d |
Prior to this patch set - it was not possible to cleanup committed offsets. This means that if lots of transforms were created and then deleted one could run into the limit with no escape hatch. This patch set implements an escape hatch to cleanup offsets.
This is implemented as a manual GC mechanism because this can race with deploys and delete offsets that are still in use. A way to get around that is to expose a max transform id or something from the controller log so that we don't ever delete any ids that the target shard has not yet seen. Since IDs are assigned from the first deploy and their offset within the controller log, we could do this by the committed offset of the controller log. However, instead of implementing that and running it periodically, we just make it a manual endpoint to hit, the automatic GC could happen at a later date.
This is a followup to: #16185
Backports Required
Release Notes