
Non-polling fetch implementation #15328

Merged: 7 commits merged into redpanda-data:dev on Feb 6, 2024

Conversation

@ballard26 (Contributor) commented Dec 6, 2023

This PR implements a new fetch_plan_executor that doesn't repeatedly poll every partition in the fetch for new data. Instead, it registers callbacks with raft::consensus::visible_offset_monitor to learn when a partition has new data and is therefore worth querying again.

The gist of the implementation is as follows (see the sketch after this list):

  • A fetch "coordinator" (see kafka::nonpolling_fetch_plan_executor::execute_plan) is created on the shard that received the fetch request. This coordinator is responsible for creating fetch workers and for determining when the fetch request is complete.
  • A fetch "worker" (see kafka::nonpolling_fetch_plan_executor::shard_fetch_worker) is created on every shard that holds a partition from the request. It is responsible for querying those partitions for data; if no partition has any, it registers with raft::consensus::visible_offset_monitor for each of them and waits until the visible offset increases for one or more. The worker returns only on an error, on an abort, or once it has queried enough data to meet or exceed the lower limit the coordinator specified.
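
Below is a minimal, self-contained sketch of the wait-for-new-data mechanism, using std::condition_variable as a stand-in for raft::consensus::visible_offset_monitor. Redpanda itself is built on seastar futures, so every name and signature here is illustrative, not the actual API:

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Stand-in for the per-partition visible-offset monitor: fetch workers block
// until the partition's visible offset moves past the offset they last saw.
class visible_offset_monitor {
public:
    // Called by a fetch worker after an empty query: park until there is
    // something newer than last_seen to read, instead of re-polling.
    void wait_past(int64_t last_seen) {
        std::unique_lock lk(_m);
        _cv.wait(lk, [&] { return _visible > last_seen; });
    }

    // Called on the raft apply path when new records become visible: advance
    // the offset and wake any workers parked on this partition.
    void advance_to(int64_t offset) {
        {
            std::lock_guard lk(_m);
            if (offset > _visible) { _visible = offset; }
        }
        _cv.notify_all();
    }

private:
    std::mutex _m;
    std::condition_variable _cv;
    int64_t _visible = -1;
};
```

A worker that finds no data calls wait_past(last_seen) instead of sleeping and re-querying every partition; the polling strategy's per-partition busy loop is replaced by a single wakeup when raft actually has something new to read.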

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

Improvements

  • Introduces a new non-polling fetch execution strategy that reduces both the CPU utilization and the latency of fetch requests.
  • Adds a new cluster configuration property, fetch_read_strategy, which determines the fetch execution strategy Redpanda uses to fulfill fetch requests. The newly introduced non_polling strategy is the default; the previous polling strategy is retained to make backporting possible.
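
For example, the previous behavior could be restored cluster-wide with rpk (a usage sketch assuming the standard rpk cluster-config workflow; the property name and values come from this PR):

```
rpk cluster config set fetch_read_strategy polling
```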

@ballard26 (Contributor, Author) commented:

/dt

@ballard26 ballard26 force-pushed the offset-table branch 3 times, most recently from 88515d5 to 0b7dad1 on January 9, 2024 02:06
@github-actions github-actions bot removed the area/rpk label Jan 9, 2024
@ballard26 ballard26 changed the title Draft: Nonpolling fetch implementation Non-polling fetch implementation Jan 9, 2024
@ballard26 ballard26 marked this pull request as ready for review January 9, 2024 02:07
@ballard26 ballard26 assigned ballard26 and unassigned ballard26 Jan 9, 2024
@travisdowns (Member) left a comment:

Looks good to me, but at least some of the test failures look legit with BadLogLines related to the switch to ERROR logging.

As you add cases to handle those, can you also add comments as to why we expect certain error types in that function?

Implements a fetch_plan_executor that doesn't repeatedly poll every partition in the fetch for new data. Instead it relies on registering callbacks with raft::consensus::visible_offset_monitor to know when a partition has new data and therefore would be worth querying once again.
@travisdowns (Member) left a comment:

Awesome, let the new non-polling fetch era begin!

@ballard26 ballard26 merged commit ef31392 into redpanda-data:dev Feb 6, 2024
17 checks passed
@vbotbuildovich (Collaborator) commented:

/backport v23.3.x

```cpp
    errored_partitions.emplace_back(i, req.ktp().get_partition());
    continue;
}
last_visible_indexes[i] = consensus->last_visible_index();
```
A contributor commented on this hunk:

FWIW I feel like this is a little error-prone. Mostly in this layer we're using kafka offsets (which have gone through translation), but here we're explicitly using raft offsets.

We should probably just prioritize using kafka::offset properly so this footgun is more explicit, but some documentation on this might help.
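
A sketch of the stronger typing the comment is pointing toward (kafka::offset is a real type in the codebase; the stripped-down types below are illustrative stand-ins, not Redpanda's definitions):

```cpp
#include <cstdint>

// Both offset spaces are plain 64-bit integers underneath, so nothing stops a
// raft offset from being stored where a kafka (translated) offset is expected
// unless the two are given distinct types.
enum class kafka_offset : int64_t {};  // client-visible, post-translation
enum class raft_offset : int64_t {};   // raw log offset, pre-translation

// Hypothetical bookkeeping function that works in kafka-offset space.
void record_progress(kafka_offset o);

void example(raft_offset last_visible) {
    (void)last_visible;
    // record_progress(last_visible);                // would not compile
    // record_progress(kafka_offset{
    //     /* translate(last_visible) first */});    // explicit conversion required
}
```

With distinct types, mixing the two offset spaces becomes a compile error rather than a silent bookkeeping bug.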

The same contributor added:

Otherwise a quick look at this PR looks good. Nice work!
