transform: gc committed offsets #16526

Merged on Mar 4, 2024 (8 commits)

Conversation

@rockwotj rockwotj commented Feb 7, 2024

Prior to this patch set, it was not possible to clean up committed offsets. This means that if lots of transforms were created and then deleted, one could run into the limit with no escape hatch. This patch set implements an escape hatch to clean up offsets.

This is implemented as a manual GC mechanism because GC can race with deploys and delete offsets that are still in use. One way around that is to expose a max transform ID (or something similar) from the controller log so that we never delete any IDs that the target shard has not yet seen. Since IDs are assigned on first deploy from their offset within the controller log, we could do this using the committed offset of the controller log. However, instead of implementing that and running it periodically, we just expose a manual endpoint to hit; automatic GC could happen at a later date.
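To make the race concrete, here is a minimal standalone sketch of the "max transform ID" guard described above. It is not the code in this PR: `offset_key`, `committed_offsets`, `gc_committed_offsets` and `max_seen_id` are hypothetical names, and plain standard-library containers stand in for the real state machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical stand-ins for the real types; names are illustrative only.
using transform_id = std::int64_t;
using partition_id = std::int32_t;
using offset = std::int64_t;

struct offset_key {
    transform_id id;
    partition_id partition;
    bool operator<(const offset_key& o) const {
        return id != o.id ? id < o.id : partition < o.partition;
    }
};

using committed_offsets = std::map<offset_key, offset>;

// Removes every committed offset whose transform id is not in `live`.
// Ids greater than `max_seen_id` are always kept: they may belong to a
// deploy that the caller has not observed yet, so deleting them could
// drop offsets that are still in use.
// Returns the number of entries removed.
inline std::size_t gc_committed_offsets(
  committed_offsets& offsets,
  const std::set<transform_id>& live,
  transform_id max_seen_id) {
    std::size_t removed = 0;
    for (auto it = offsets.begin(); it != offsets.end();) {
        const transform_id id = it->first.id;
        const bool maybe_unseen = id > max_seen_id;
        if (!maybe_unseen && live.count(id) == 0) {
            it = offsets.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Without the `max_seen_id` check, an entry for a just-deployed transform that is missing from `live` would be deleted, which is the race that makes this a manual operation for now.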

This is a followup to: #16185

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

@github-actions github-actions bot added area/redpanda area/wasm WASM Data Transforms labels Feb 7, 2024
@rockwotj rockwotj self-assigned this Feb 7, 2024
@rockwotj rockwotj marked this pull request as ready for review February 8, 2024 02:12
@vbotbuildovich

new failures in https://buildkite.com/redpanda/redpanda/builds/44903#018d8a18-b4e3-4667-bc3f-e7586c03af85:

"rptest.tests.consumer_group_recovery_test.ConsumerOffsetsRecoveryTest.test_consumer_offsets_partition_recovery"

bharathv previously approved these changes Feb 20, 2024

@bharathv bharathv left a comment

just a high level thought on holding write lock, lgtm otherwise.


ss::future<errc> remove_all(ss::noncopyable_function<bool(Key)> pred) {
auto holder = _gate.hold();
auto units = co_await _snapshot_lock.hold_read_lock();
I feel like this could hold a write lock since remove_all seems like a rare operation and the side effects of it are easier to explain if the entire operation sees a consistent snapshot of the state.

@rockwotj (author) replied:

Yeah it is a rare case, so probably cleaner to make it see consistent state. I've pushed an update.
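For readers outside the codebase, the trade-off discussed here can be sketched with standard-library primitives. This is only an illustration under assumptions: `kv_store` is a hypothetical class, and `std::shared_mutex` stands in for the Seastar lock behind `_snapshot_lock`; it does not reproduce the actual state machine.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <shared_mutex>

// Hypothetical key/value store guarded by a reader/writer lock.
template <typename Key, typename Value>
class kv_store {
public:
    void put(const Key& key, const Value& value) {
        std::unique_lock<std::shared_mutex> lock(_lock);
        _data.insert_or_assign(key, value);
    }

    bool has(const Key& key) const {
        std::shared_lock<std::shared_mutex> lock(_lock);
        return _data.count(key) > 0;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(_lock);
        return _data.size();
    }

    // Removes every entry whose key matches `pred` and returns the count.
    // The exclusive lock is held for the whole scan, so the predicate is
    // evaluated against one consistent view of the map. That is acceptable
    // here because bulk removal is expected to be rare.
    std::size_t remove_all(const std::function<bool(const Key&)>& pred) {
        std::unique_lock<std::shared_mutex> lock(_lock);
        std::size_t removed = 0;
        for (auto it = _data.begin(); it != _data.end();) {
            if (pred(it->first)) {
                it = _data.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

private:
    mutable std::shared_mutex _lock;
    std::map<Key, Value> _data;
};
```

Holding the lock exclusively blocks other readers and writers for the duration of the scan, which is the cost being accepted in exchange for easier-to-explain side effects.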

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Based on a set of transform IDs, this will be used to GC deleted
transforms.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This will be used to GC offsets when transforms are deleted.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Note the comments in
transform::service::garbage_collect_committed_offsets about the
consistency of these operations.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
With all the caveats mentioned in the `transform::service` documentation

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This makes sure our debug endpoint to clean up offsets works.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
In accordance with the new documentation standards, document some more
methods in this stm.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@bharathv bharathv left a comment

lgtm.

@rockwotj rockwotj merged commit f8dd25f into redpanda-data:dev Mar 4, 2024
17 checks passed
@rockwotj rockwotj deleted the gc-committed-offsets branch March 4, 2024 20:16