feat: additional options to allow for batching and asynchronous batch handling for BroadwayAdapter #103

Merged: 28 commits into main, Oct 7, 2024

Conversation

@seungjinstord (Contributor) commented Sep 26, 2024

Related Ticket(s)

SIGNAL-7088


Problem

Using Kafee's BroadwayAdapter with the FirehoseConsumer in wms-service, I ran a comparison test: a dummy handler on ProcessAdapter versus an equivalent dummy handler on BroadwayAdapter.

The ProcessAdapter version took 7 seconds, with a LOT of DBConnection pool errors; the BroadwayAdapter version took 66 seconds (at 12 partitions).

That is roughly a 10x difference, so a crude way to close the gap would be to raise the partition count from 12 to 120.

Naturally, I raised the partition count to 120 while keeping consumer_concurrency at 12, and it did help: the run took 44 seconds.

Then I raised consumer_concurrency to 120 as well, and the DBConnection pool errors came back, because my local database couldn't sustain a pool of 120 connections even after raising the limit in dev.exs. So this "vanilla" approach to scaling won't work.

Details

My first thought was to check whether naively tweaking the batch config values would work: changing the concurrency, the batch size, or the partitioning function. The partitioning function (overriding Kafka's partitioning) is locked down by BroadwayKafka, so that's not an option; the number of concurrent processes is capped at the number of partitions, which keeps us within Kafka's ordering rules. Batching, however, does expose a place where chunked work can happen.

Our events are idempotent and do not depend on strict chronological ordering; this is already battle tested, since the current ProcessAdapter runs the event handlers asynchronously.

With batching, we mimic exactly the pattern that ProcessAdapter goes through:

  • ProcessAdapter records the events as they happen for a given request_id, in chronological order, in its process state. It then runs them through the event handlers asynchronously.
  • The BroadwayAdapter in this PR gains the ability to group messages arriving in chronological order from the partitions into batches, and a new configuration option allows the messages in each batch to be handled asynchronously.

To keep the code paths simple, the pragmatic approach was chosen: messages always go through batching, with a default batch size of 1 unless overriding config options are passed.
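
As a rough sketch (the helper name and default values here are assumed for illustration, not taken from the actual implementation), that fallback could look like this:

    # Sketch only: resolve the `batching` options with a size-1 default so
    # every message flows through the same batcher code path.
    defp batching_options(adapter_options) do
      Keyword.get(adapter_options, :batching,
        concurrency: 1,
        size: 1,
        async_run: false
      )
    end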

Local test result

Using the same 400-threshold automation config trigger (see the description in this PR), running asynchronously with the following settings took 7 seconds (down from 66 seconds) to process the same number of events, with NO DBConnection errors popping up! The batch options used were:

    batching: [
      concurrency: System.schedulers_online() * 2,
      size: 100,
      timeout: 500,
      async_run: true
    ]
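
For context, a consumer wired up with these options might look like the sketch below. The module name, the handler body, and the exact consumer API shape are assumptions for illustration; check Kafee's docs for the real option names.

    # Hypothetical usage sketch; module and handler names are made up.
    defmodule MyApp.FirehoseConsumer do
      use Kafee.Consumer,
        adapter:
          {Kafee.Consumer.BroadwayAdapter,
           batching: [
             concurrency: System.schedulers_online() * 2,
             size: 100,
             timeout: 500,
             async_run: true
           ]}

      # With `async_run: true`, each message in a batch is handled in a
      # fire-and-forget task rather than sequentially.
      def handle_message(message) do
        MyApp.EventHandler.handle(message)
      end
    end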

@seungjinstord seungjinstord added the hold Do not merge this pull request label Sep 26, 2024
@seungjinstord seungjinstord self-assigned this Sep 26, 2024
if batch_config[:async_run] do
  # No need for Task.Supervisor, as this is not running under a GenServer,
  # and Kafee.Consumer.Adapter.push_message already has error handling.
  Enum.each(messages, &Task.async(fn -> do_consumer_work(&1) end))

@seungjinstord (Contributor, Author):

This matches how ProcessAdapter runs its event processing asynchronously: it doesn't wait for the tasks to complete, it just fires off the async operations.

@seungjinstord seungjinstord removed the hold Do not merge this pull request label Sep 27, 2024
@seungjinstord seungjinstord marked this pull request as ready for review September 27, 2024 18:22
@seungjinstord seungjinstord requested a review from a team as a code owner September 27, 2024 18:22
@seungjinstord seungjinstord requested review from btkostner and a team September 27, 2024 18:22
@btkostner (Contributor) left a comment:

Some cleanup comments to hopefully make this easier to use. Otherwise I like the idea and it should allow us to make something super fast.

Review threads:

  • lib/kafee/consumer/broadway_adapter.ex (resolved)
  • lib/kafee/consumer/broadway_adapter.ex (resolved)
  • lib/kafee/consumer/broadway_adapter.ex (outdated, resolved)
  • lib/kafee/consumer/broadway_adapter.ex (resolved)
  • lib/kafee/consumer/broadway_adapter.ex (outdated, resolved)
  • test/kafee/consumer/broadway_adapter_integration_test.exs (outdated, resolved)
@seungjinstord seungjinstord requested a review from a team October 2, 2024 17:31
@seungjinstord (Contributor, Author) commented Oct 3, 2024:

@btkostner reminder for another round of review, thanks 🙏

@@ -100,34 +197,65 @@ defmodule Kafee.Consumer.BroadwayAdapter do

   @doc false
   @impl Broadway
-  def handle_message(:default, %Broadway.Message{data: value, metadata: metadata} = message, %{
+  def handle_message(:default, %Broadway.Message{metadata: metadata} = message, %{

Do we still need this implementation if batching is always going to be on? 🤔

@seungjinstord (Contributor, Author) replied Oct 7, 2024:

Probably still needed, since the Broadway behaviour requires handle_message (it's not defoverridable). But while reading the code again I did find an optimization point: I don't actually need to inject the additional metadata at this point; I can probably pull it from the context at handle_batch!
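
A minimal sketch of that idea, using Broadway's handle_batch/4 callback shape (the context keys and the push_message arity here are assumptions, not the merged code):

    # Sketch only: read the consumer and its options from the Broadway
    # context instead of injecting them into each message's metadata.
    @impl Broadway
    def handle_batch(:default, messages, _batch_info, %{
          consumer: consumer,
          options: options
        }) do
      Enum.each(messages, fn message ->
        Kafee.Consumer.Adapter.push_message(consumer, options, message)
      end)

      # Broadway expects the (possibly updated) messages back.
      messages
    end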

@seungjinstord (Contributor, Author):

Optimization done in a913991

@trideepgogoi previously approved these changes Oct 7, 2024:

Looks safe to me!

… because it's unnecessary work as context already has it

@trideepgogoi previously approved these changes Oct 7, 2024:

LGTM

@trideepgogoi left a comment:

LGTM

@seungjinstord seungjinstord enabled auto-merge (squash) October 7, 2024 16:19
@cdipesa-stord cdipesa-stord requested review from cdipesa-stord and btkostner and removed request for btkostner and cdipesa-stord October 7, 2024 16:52
@seungjinstord seungjinstord merged commit f003f0b into main Oct 7, 2024
12 checks passed
@seungjinstord seungjinstord deleted the SIGNAL-7088-broadway-and-batch-option branch October 7, 2024 16:53
seungjinstord pushed a commit that referenced this pull request Oct 7, 2024
An automated release has been created for you.
---


## [3.3.0](v3.2.0...v3.3.0) (2024-10-07)


### Features

* Additional options to allow for batching and asynchronous batch handling for BroadwayAdapter ([#103](#103)) ([f003f0b](f003f0b))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).