Non-polling fetch implementation #15328
Conversation
Looks good to me, but at least some of the test failures look legit with BadLogLines related to the switch to ERROR logging.
As you add cases to handle those, can you also add comments as to why we expect certain error types in that function?
Implements a fetch_plan_executor that doesn't repeatedly poll every partition in the fetch for new data. Instead, it relies on registering callbacks with raft::consensus::visible_offset_monitor to know when a partition has new data and is therefore worth querying again.
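The callback-registration pattern described above can be sketched as follows. This is a single-threaded illustration of the idea, not Redpanda's actual API: the class name mirrors `visible_offset_monitor`, but the signatures and internals here are assumptions.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical sketch: instead of polling each partition, a waiter registers
// a callback that fires once the visible offset reaches the requested value.
class visible_offset_monitor {
public:
    using callback = std::function<void(int64_t)>;

    // Register interest in offsets >= min_offset.
    void wait(int64_t min_offset, callback cb) {
        if (min_offset <= _visible) {
            cb(_visible); // already satisfied, fire immediately
            return;
        }
        _waiters.emplace(min_offset, std::move(cb));
    }

    // Called by the log layer when new data becomes visible.
    void notify(int64_t new_visible) {
        _visible = new_visible;
        // Fire every waiter whose requested offset is now visible.
        auto end = _waiters.upper_bound(new_visible);
        for (auto it = _waiters.begin(); it != end; ++it) {
            it->second(new_visible);
        }
        _waiters.erase(_waiters.begin(), end);
    }

private:
    int64_t _visible = -1;
    std::multimap<int64_t, callback> _waiters;
};
```

A fetch worker in this scheme would register a waiter per partition and go idle until one fires, rather than re-querying every partition on a timer.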
Awesome, let the new non-polling fetch era begin!
/backport v23.3.x
errored_partitions.emplace_back(i, req.ktp().get_partition());
    continue;
}
last_visible_indexes[i] = consensus->last_visible_index();
FWIW I feel like this is a little error prone. Mostly in this layer we're using Kafka offsets (that have gone through translation), but here we're explicitly using raft offsets.
We should probably just prioritize using kafka::offset properly so this footgun is more explicit, but some documentation on this might help.
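One way to make the footgun the reviewer describes explicit is to wrap the two offset domains in distinct types, so passing a raft offset where a Kafka offset is expected fails to compile. A minimal sketch (these struct names and the translation helper are illustrative, not Redpanda's actual kafka::offset):

```cpp
#include <cstdint>

// Illustrative strong types: raft log offsets and translated Kafka offsets
// become distinct, so accidental mixing is a compile error rather than a
// silent off-by-N bug.
struct raft_offset { int64_t value; };
struct kafka_offset { int64_t value; };

// Hypothetical translation helper: real offset translation accounts for
// non-data batches (e.g. raft configuration batches) that precede the
// offset; here that count is simply passed in for illustration.
kafka_offset to_kafka(raft_offset o, int64_t non_data_batches) {
    return kafka_offset{o.value - non_data_batches};
}
```

With types like these, a line such as `last_visible_indexes[i] = consensus->last_visible_index();` would have to state explicitly which domain the stored value belongs to.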
Otherwise, from a quick look this PR looks good. Nice work!
This PR implements a new fetch_plan_executor that doesn't repeatedly poll every partition in the fetch for new data. Instead, it relies on registering callbacks with raft::consensus::visible_offset_monitor to know when a partition has new data and is therefore worth querying again.

The gist of how this is implemented is as follows:
- A coordinator (kafka::nonpolling_fetch_plan_executor::execute_plan) is created on the shard that received the fetch request. This coordinator is responsible for creating fetch workers and determining when a fetch request is completed.
- A fetch worker (kafka::nonpolling_fetch_plan_executor::shard_fetch_worker) is created on every shard that has a partition from the request. It's responsible for querying partitions for data. If no partitions have data, it registers with the raft::consensus::visible_offset_monitor for those partitions and waits until the visible offset increases for one or more of them. The worker only returns on errors, on aborts, or when it has queried enough data to meet or exceed the lower limit the coordinator specified.

Backports Required
Release Notes

Improvements
- A new property, fetch_read_strategy, has been introduced. This property determines which fetch execution strategy Redpanda will use to fulfill a fetch request. The newly introduced non_polling execution strategy is the default for this property, with the polling strategy being included to make backporting possible.
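Assuming fetch_read_strategy is exposed like other Redpanda cluster configuration properties, it could presumably be changed with rpk. The property name comes from this PR; the exact values shown are inferred from the description above, so treat this as an illustration rather than documented usage:

```shell
# Fall back to the pre-existing polling strategy (hypothetical usage)
rpk cluster config set fetch_read_strategy polling

# Restore the new default
rpk cluster config set fetch_read_strategy non_polling
```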