Kafka: Add support for the delete-records API #10061

graphcareful · 2023-04-13T16:37:46Z

This PR adds the delete-records feature to Redpanda. It allows a Kafka client to delete all data in a topic partition before a given offset. Supporting this in Redpanda required two non-trivial pieces of work:

Addition of a new stm (and a corresponding special batch type) to process prefix truncation events.
Addition of a new concept, visible_start_offset: a delta into the first segment of the log that marks the first offset a fetch request can read from.
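
To illustrate the visible_start_offset idea, here is a minimal Python sketch. It is not the code in this PR: `Segment`, `Log`, `start_delta`, `delete_records` and `fetch` are hypothetical names, every record takes exactly one offset, and the log is assumed to be non-empty. Whole segments below the requested offset are dropped, and a delta hides the remaining prefix of the first segment from fetches.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int
    records: list

    @property
    def last_offset(self) -> int:
        return self.base_offset + len(self.records) - 1


@dataclass
class Log:
    segments: list = field(default_factory=list)
    start_delta: int = 0  # records hidden at the front of the first segment

    def start_offset(self) -> int:
        return self.segments[0].base_offset

    def visible_start_offset(self) -> int:
        return self.start_offset() + self.start_delta

    def end_offset(self) -> int:
        # One past the last record, i.e. where the next record would land.
        return self.segments[-1].last_offset + 1

    def delete_records(self, offset: int) -> int:
        """Make `offset` the first readable offset; return the new low watermark."""
        if offset > self.end_offset():
            raise ValueError("offset out of range")
        if offset <= self.visible_start_offset():
            return self.visible_start_offset()  # nothing new to truncate
        # Drop segments that lie entirely below the requested offset,
        # always keeping the last (active) segment.
        while len(self.segments) > 1 and self.segments[0].last_offset < offset:
            self.segments.pop(0)
        # Whatever remains below `offset` is hidden by the delta.
        self.start_delta = offset - self.start_offset()
        return self.visible_start_offset()

    def fetch(self, offset: int) -> list:
        if offset < self.visible_start_offset():
            raise ValueError("offset out of range")
        out = []
        for seg in self.segments:
            for i, rec in enumerate(seg.records):
                if seg.base_offset + i >= offset:
                    out.append(rec)
        return out
```

With segments covering offsets 0..2 and 3..5, `delete_records(4)` drops the first segment and leaves a delta of 1 into the second, so fetches may start at offset 4 but not at 3.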

Note: as of this branch, delete-records requests for cloud-enabled topics will be rejected. Work on supporting this is slated to be completed in a follow-up PR, ~~which depends on a currently still in review PR~~ #9994

Backports Required

Release Notes

Features

Adds support for the delete-records API

graphcareful · 2023-04-15T00:42:08Z

Added more ducktape tests

emaxerrno · 2023-04-17T03:05:01Z

wondering if franz-go has some delete records test, or if the librdkafka test suite should be enabled for this or some other external-API project that exercises this path for edge case detection.

jcsp

Looks like lots of tests are failing, perhaps related to the timeout change in the PR.

tests/rptest/tests/prefix_truncate_test.py

tests/rptest/util.py

src/v/cluster/partition.h

src/v/features/feature_table.h

src/v/kafka/server/handlers/delete_records.cc

tests/rptest/tests/prefix_truncate_test.py

src/v/raft/log_eviction_stm.cc

graphcareful · 2023-04-17T19:12:54Z

Changes in force-push

Modify prefix_truncate name schema to delete_records
Fixed bug where truncating with -1 (high watermark) may cause a crash
Fixed bug that made offset translation not work correctly due to not excluding the new special batch type in translation accounting
Fixed bug that made delete_records request return early due to off-by-one error in firing new min offset event notification. Returned low_watermark would appear lower than expected.
Fixed bug that made subsequent requests to prefix truncate within the same segment fail due to not updating conditionals in consensus::write_snapshot
Fixed bug that would cause a crash when writing a snapshot at the first segment. consensus::write_snapshot expects get_term to always work, which was previously true since write_snapshot would never be called at a last_included_index value of 0. Now this is not the case since write_snapshot can be called just to increment the new log start delta.
Fixed bug where log_eviction_stm would start processing events from start_offset() instead of visible_start_offset(). No issues arose because events would be ignored, after correctly determining that the older events aren't applicable to a log with a start_offset + delta ahead of the requested.
Fixed bug where delete-records at the high watermark - 1 offset (truncate all data) failed due to off-by-one error
Fixed bug where log_eviction_stm was truncating the underlying log at segment start offsets rather than segment end offsets (only for events arising from delete-records requests)
Fixed bug that caused some CloudStorage tests in CI to fail, random default timeout was incorrectly modified
Modify KCL::delete_records for better error condition handling
Many more ducktape tests
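
Several of the fixes above concern the boundary cases of a delete-records request. As a rough illustration only, the sketch below encodes the Kafka-side rules as I understand them: -1 means "truncate at the high watermark", an offset above the high watermark is out of range, and an offset at or below the current start is a no-op that reports the existing low watermark. `resolve_truncation_offset` and `OffsetOutOfRange` are hypothetical names, and the real checks in delete_records.cc may differ.

```python
HIGH_WATERMARK = -1  # sentinel offset meaning "truncate everything"


class OffsetOutOfRange(Exception):
    """Raised when the requested offset cannot be a valid start offset."""


def resolve_truncation_offset(requested: int, start_offset: int, high_watermark: int) -> int:
    """Return the low watermark that a delete-records request should produce."""
    if requested == HIGH_WATERMARK:
        return high_watermark
    if requested < 0 or requested > high_watermark:
        raise OffsetOutOfRange(f"{requested} not in [0, {high_watermark}]")
    # Requests at or below the current start do not move it backwards.
    return max(requested, start_offset)
```

For a log holding a single record (start 0, high watermark 1), truncating at 0 leaves the log untouched and truncating at 1 empties it; both are valid requests.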

graphcareful · 2023-04-17T19:25:59Z

graphcareful · 2023-04-18T14:54:22Z

Test failures look like:

FAIL: <NodeCrash docker-rp-17: ERROR 2023-04-18 01:52:13,290 [shard 1] assert - Assert failure: (/var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-0ffc8dc9474419720-1/redpanda/redpanda/src/v/raft/consensus.cc:2273) 'config.has_value()' Configuration for offset 7919 must be available in configuration manager: {configurations: { offset: {9049}, idx: {1},cfg: {current: {voters: {{id: {3}, revision: {25}}, {id: {1}, revision: {25}}, {id: {2}, revision: {25}}}, learners: {}}, old:{nullopt}, revision: 25, update: {nullopt}, version: 5}} } 
>

Note this must indicate a bug in the changes I've made to log_eviction_stm, when handling GC events from storage.

graphcareful · 2023-04-18T22:02:21Z

Changes in force-push

Rebase against dev to resolve conflicts in feature_table (another change added a new feature)
Modified log_eviction housekeeping loop to periodically sleep when there is no work to do, or in the case the work item was re-enqueued due to the offset being greater than the max collectible.
Fixed a bug that was resulting in test failures: an incorrect variable was being referred to in log_eviction_stm::do_process_log_eviction
Fixed a newly written flaky ducktape test by repeating if NOT_LEADER is observed
Made some changes around previous off-by-one fixes to remove confusing logic around subtracting an extra 1 here and there.
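
For the flaky-test fix above, the retry pattern looks roughly like the following sketch. It is not the actual ducktape test code: `NotLeaderError`, `retry_on_not_leader` and the injectable `sleep` parameter are hypothetical and only show the shape of "repeat the request if NOT_LEADER is observed".

```python
import time


class NotLeaderError(Exception):
    """Stand-in for a client error carrying the NOT_LEADER error code."""


def retry_on_not_leader(op, attempts=5, backoff_s=0.5, sleep=time.sleep):
    """Call `op` until it succeeds, retrying only on NotLeaderError."""
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except NotLeaderError:
            if attempt == attempts:
                raise  # leadership never settled; surface the error
            sleep(backoff_s)  # give the cluster time to elect a leader
```

Only the leadership error is retried, so a genuine failure such as an out-of-range offset still fails the test immediately.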

VladLazar

There's still some things I don't understand, so I'll need to come back to it.

src/v/raft/log_eviction_stm.h

src/v/storage/disk_log_impl.cc

src/v/raft/log_eviction_stm.cc

graphcareful · 2023-04-20T00:15:20Z

Changes in force-push

Fix for CI issues which were causing some retention-based tests to fail. The issue was writing a snapshot with offset::min as the value of log_start_delta
Fixed issue where truncation at bounds wasn't entirely respected; a new ducktape test was written for this case (log of one record, truncate at 0 and 1)
Expanded a ducktape test to truncate at the high watermark; previously only truncation at 1 before the high watermark was tested.

graphcareful · 2023-04-20T01:22:45Z

Changes in force-push

Reduce eviction housekeeping interval from 1s to 200ms. Fixed a failing raft unit test that had a 2s timeout on checking a segment's eviction. 1s also seemed a bit too long; open to discussion on this.
Removed a useless tracelog from log_eviction_stm

src/v/raft/log_eviction_stm.cc

src/v/raft/consensus.cc

src/v/raft/log_eviction_stm.h

src/v/raft/log_eviction_stm.cc

src/v/kafka/server/handlers/delete_records.cc

src/v/kafka/protocol/schemata/delete_records_request.json

src/v/cluster/log_eviction_stm.cc

- This commit modifies the log_eviction_stm to be a proper stm that will scan the underlying log for new special batches of type 'prefix_truncate'.
- These new special batches are indicators to modify the start offset. Since this is replicated to peers, eventually all peers will modify their respective start offsets as well.
- There is a new background processing fiber added to this class as well that attempts to write a raft snapshot, reaping as much data as possible as close to the current start offset.
- The class is a subclass of persisted_stm and persists to the kvstore its last applied offset and the current start offset. In the case of failures these pieces of information can be used to start up from the last processed offset.
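
A toy model of the behaviour described in this commit message, for readers unfamiliar with the pattern. It is only a sketch under stated assumptions: `EvictionStm`, `apply` and the dict standing in for the kvstore are hypothetical and do not mirror the real persisted_stm interface, and the background snapshot fiber is left out.

```python
PREFIX_TRUNCATE = "prefix_truncate"


class EvictionStm:
    """Applies replicated prefix_truncate batches and persists its progress."""

    def __init__(self, kvstore: dict):
        self.kv = kvstore
        # Recover progress from the kvstore so a restart resumes where it left off.
        self.last_applied = kvstore.get("last_applied", -1)
        self.start_offset = kvstore.get("start_offset", 0)

    def apply(self, offset: int, batch_type: str, new_start=None) -> None:
        if offset <= self.last_applied:
            return  # already processed, e.g. a replay after restart
        if batch_type == PREFIX_TRUNCATE:
            # The start offset only ever moves forward.
            self.start_offset = max(self.start_offset, new_start)
        self.last_applied = offset
        self.kv["last_applied"] = self.last_applied
        self.kv["start_offset"] = self.start_offset
```

Because every replica applies the same batches in the same order, each one eventually arrives at the same start offset, and replaying the log after a restart does not change the result.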

- The kafka layer presents the prefix_truncate abstraction: it performs bounds checking on the given kafka offset and translates it to a Redpanda model::offset before invoking cluster::prefix_truncate, which boils down to replicating the prefix truncate special batch.
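
The translation step exists because Kafka clients only count data records, while the underlying log also holds special batches (including the new prefix truncate batch). A simplified sketch of that accounting follows; it is not Redpanda's offset translator. `OffsetTranslator` is a hypothetical name and each batch is assumed to occupy exactly one log offset.

```python
from bisect import bisect_left, bisect_right


class OffsetTranslator:
    """Maps between log offsets and kafka offsets by skipping non-data batches."""

    def __init__(self, non_data_offsets):
        self._non_data = sorted(non_data_offsets)

    def to_kafka(self, log_offset: int) -> int:
        # Subtract the number of non-data offsets below this log offset.
        return log_offset - bisect_left(self._non_data, log_offset)

    def from_kafka(self, kafka_offset: int) -> int:
        # Grow the guess until it accounts for every non-data offset at or
        # below it; the guess only increases, so the loop terminates.
        log_offset = kafka_offset
        while True:
            candidate = kafka_offset + bisect_right(self._non_data, log_offset)
            if candidate == log_offset:
                return log_offset
            log_offset = candidate
```

If a special batch type is left out of `non_data_offsets`, every later kafka offset is shifted by one, which is the kind of translation bug mentioned earlier in this PR.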

- This stm has a conditional in its last_stable_offset() method that returns an invalid offset in the case it hasn't completed bootstrapping.
- The issue is that this bootstrap phase isn't considered finished after bootstrapping from apply_snapshot(). This would cause other stms to pause, thinking the rm_stm had work to do at offset 0, causing that other stm to time out and fail processing of said event.
- Solution is simple: set `_boostrap_committed_offset` within the `apply_snapshot()` method
- Fixes: redpanda-data#11131
- Fixes: redpanda-data#11130

- relevant kafka git commit: e7acf3c

- Adds support for the DeleteRecords API

- Now that the start offset is not an independent concept across all nodes, it is desirable to ensure the leader node has caught up with the previous leader with respect to processing of prefix truncate special batches before returning the start offset.
- This avoids returning a stale start_offset in the case leadership has recently changed and the log_eviction_stm hasn't yet caught up.

graphcareful · 2023-06-21T23:21:09Z

CI failures

Investigating a leak seen in RackAwarePlacementTest.test_replica_placement

graphcareful · 2023-06-22T14:32:05Z

/ci-repeat 1
debug
skip-units
dt-repeat=20
tests/rptest/tests/rack_aware_replica_placement_test.py::RackAwarePlacementTest.test_replica_placement

graphcareful · 2023-06-22T16:28:42Z

/ci-repeat 1
debug
skip-units
dt-repeat=20
tests/rptest/tests/memory_stress_test.py::MemoryStressTest.test_fetch_with_many_partitions

graphcareful · 2023-06-22T17:40:15Z

/ci-repeat 1
debug
skip-units
dt-repeat=200
tests/rptest/tests/memory_stress_test.py::MemoryStressTest.test_fetch_with_many_partitions

graphcareful requested review from dotnwat, jcsp, ztlpn, NyaliaLui, VladLazar and mmaslankaprv April 13, 2023 16:37

github-actions bot added the area/redpanda label Apr 13, 2023

jcsp reviewed Apr 17, 2023

graphcareful force-pushed the delete-records branch from bc82906 to 4d086ba Compare April 17, 2023 19:04

graphcareful force-pushed the delete-records branch from 4d086ba to 8a2d84b Compare April 17, 2023 19:19

graphcareful requested a review from jcsp April 17, 2023 19:31

graphcareful force-pushed the delete-records branch from 8a2d84b to a765617 Compare April 18, 2023 00:03

graphcareful force-pushed the delete-records branch 3 times, most recently from 1626948 to ebb4b29 Compare April 18, 2023 21:57

VladLazar reviewed Apr 19, 2023

src/v/raft/log_eviction_stm.h Outdated

src/v/storage/disk_log_impl.cc Outdated

src/v/raft/log_eviction_stm.cc Outdated

graphcareful force-pushed the delete-records branch from ebb4b29 to 3a05354 Compare April 20, 2023 00:11

graphcareful requested a review from VladLazar April 20, 2023 00:11

graphcareful force-pushed the delete-records branch from 3a05354 to 2fb1f65 Compare April 20, 2023 01:21

VladLazar reviewed Apr 20, 2023

NyaliaLui reviewed Apr 20, 2023

src/v/kafka/server/handlers/delete_records.cc Outdated

src/v/kafka/protocol/schemata/delete_records_request.json

graphcareful force-pushed the delete-records branch from 2fb1f65 to 9dd1e60 Compare April 20, 2023 14:57

graphcareful force-pushed the delete-records branch 2 times, most recently from b32b63c to 7896fbb Compare June 21, 2023 01:09

andrwng mentioned this pull request Jun 21, 2023

kafka: fix cloud storage support for delete-records #11579

Merged

7 tasks

graphcareful force-pushed the delete-records branch from 7896fbb to 25cd9e6 Compare June 21, 2023 16:01

mmaslankaprv reviewed Jun 21, 2023

src/v/cluster/log_eviction_stm.cc Outdated

mmaslankaprv reviewed Jun 21, 2023

src/v/cluster/log_eviction_stm.cc Outdated

mmaslankaprv reviewed Jun 21, 2023

src/v/cluster/log_eviction_stm.cc Outdated

mmaslankaprv reviewed Jun 21, 2023

src/v/cluster/log_eviction_stm.cc Outdated

mmaslankaprv self-requested a review June 21, 2023 17:39

mmaslankaprv previously approved these changes Jun 21, 2023

graphcareful dismissed mmaslankaprv’s stale review via 7a4e047 June 21, 2023 18:03

graphcareful force-pushed the delete-records branch from 25cd9e6 to 7a4e047 Compare June 21, 2023 18:03

mmaslankaprv self-requested a review June 21, 2023 18:04

mmaslankaprv previously approved these changes Jun 21, 2023

graphcareful added 9 commits June 21, 2023 19:15

kafka/p/schemata: Add delete_records_req/response

c3635fa

- relevant kafka git commit: e7acf3c

kafka/s/h: delete_records API request handler

82844d2

- Adds support for the DeleteRecords API

features: Add prefix_truncation feature barrier

8c82466

rptest/clients: add kcl support for delete-records

7cf2047

rptest: Adding delete-records ducktape tests

e029aa5

graphcareful dismissed mmaslankaprv’s stale review via e029aa5 June 21, 2023 23:15

graphcareful force-pushed the delete-records branch from 7a4e047 to e029aa5 Compare June 21, 2023 23:15

piyushredpanda merged commit 01e5279 into redpanda-data:dev Jun 22, 2023
30 of 31 checks passed
