fix: SIGNAL-7090 UPDATE async worker id setting per partition #97

seungjinstord · 2024-09-22T00:09:56Z

Related Ticket(s)

Checklist

Code conforms to the Elixir Styleguide

Problem

While trying to use the BroadwayAdapter, I found that something in the AsyncAdapter was pushing all messages to a single partition.

Details

I was able to trace it back to the get_or_create_worker function. Unless the :id is customized, every worker is started with the default id, AsyncWorker. In that case it doesn't matter which distinct name the worker is registered under in the Registry: the same worker pid is retrieved every time. So even though the AsyncAdapter picks a different partition, retrieving the associated worker pid returns the first worker created, for every subsequent lookup.
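
A minimal Elixir sketch of this failure mode, assuming get_or_create_worker first looks the worker up in a Registry and otherwise calls Supervisor.start_child/2, treating {:error, {:already_started, pid}} as success. Sketch.Worker, Sketch.get_or_create/4 and the registry names are hypothetical stand-ins, not Kafee's actual code. Because use Agent gives every worker the same default child id, the supervisor hands back the first worker's pid for every partition, even though each worker registers under its own partition key; an id that includes the partition gives each partition its own worker.

```elixir
# Hypothetical stand-in for AsyncWorker. `use Agent` generates child_spec/1
# with the default `id: Sketch.Worker`, i.e. the same id for every partition.
defmodule Sketch.Worker do
  use Agent

  def start_link({registry, partition}) do
    # Each worker registers under a distinct, partition-specific name.
    name = {:via, Registry, {registry, partition}}
    Agent.start_link(fn -> partition end, name: name)
  end
end

defmodule Sketch do
  # Assumed get-or-create flow: Registry lookup first, then start_child.
  def get_or_create(sup, registry, partition, child_spec) do
    case Registry.lookup(registry, partition) do
      [{pid, _value}] ->
        pid

      [] ->
        case Supervisor.start_child(sup, child_spec) do
          {:ok, pid} -> pid
          # A running child with the same id already exists: the supervisor
          # discards the new spec and returns the existing pid instead.
          {:error, {:already_started, pid}} -> pid
        end
    end
  end

  # Starts a fresh registry and supervisor, then fetches a worker per partition.
  def run(registry, spec_fun) do
    {:ok, _} = Registry.start_link(keys: :unique, name: registry)
    {:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)
    for p <- 0..2, do: get_or_create(sup, registry, p, spec_fun.(registry, p))
  end
end

# Default id for every partition: partitions 1 and 2 get partition 0's worker.
buggy_pids =
  Sketch.run(Sketch.BuggyRegistry, fn reg, p -> Sketch.Worker.child_spec({reg, p}) end)

# Partition-specific id: each partition gets its own worker.
fixed_pids =
  Sketch.run(Sketch.FixedRegistry, fn reg, p ->
    Supervisor.child_spec({Sketch.Worker, {reg, p}}, id: {Sketch.Worker, p})
  end)

IO.inspect(Enum.map(buggy_pids, fn pid -> Agent.get(pid, & &1) end), label: "buggy")
# buggy: [0, 0, 0]
IO.inspect(Enum.map(fixed_pids, fn pid -> Agent.get(pid, & &1) end), label: "fixed")
# fixed: [0, 1, 2]
```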

The downstream effect is slower consumption: a single partition ends up holding all of the messages for the consumer group. Since each partition is assigned to only one consumer within a group, no matter how many consumers the group has, only one of them handles all of the messages pushed to the topic.

The fix uses a custom id that appends the partition number to AsyncWorker. I built it by string concatenation and did not convert it to an atom; no conversion is needed, since a supervisor child id can be any term.
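
A small sketch showing that a concatenated string id is accepted and stored exactly as given, with no conversion to an atom. It assumes a worker module with the default child_spec/1 from use Agent; Sketch.Counter and the "Sketch.Counter-<partition>" id format are hypothetical and may differ from the id format used in the PR.

```elixir
# Hypothetical worker; `use Agent` gives it a default child_spec/1.
defmodule Sketch.Counter do
  use Agent

  def start_link(partition), do: Agent.start_link(fn -> partition end)
end

{:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)

for partition <- 0..2 do
  # String id built by concatenation (illustrative format), left as a string.
  id = inspect(Sketch.Counter) <> "-" <> Integer.to_string(partition)
  spec = Supervisor.child_spec({Sketch.Counter, partition}, id: id)
  {:ok, _pid} = Supervisor.start_child(sup, spec)
end

# The supervisor keeps each id exactly as it was given.
ids =
  sup
  |> Supervisor.which_children()
  |> Enum.map(fn {id, _pid, _type, _modules} -> id end)
  |> Enum.sort()

IO.inspect(ids)
# ["Sketch.Counter-0", "Sketch.Counter-1", "Sketch.Counter-2"]
```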

This probably went undetected in production because of the BEAM's robustness: whenever a crashed supervisor tree healed itself, whichever partition's worker happened to be created first became the shared worker, so over time messages still ended up distributed across partitions.

Because firehose passes many more messages, the problem became more visible there.

TL;DR: I think this will improve overall production consumption performance of Kafka messages wherever the AsyncAdapter is used, in particular for the services that consume from the wms-service topic, which are GAS and WMS Bridge.

btkostner

Very nice find!


An automated release has been created for you. --- ## [3.1.1](v3.1.0...v3.1.1) (2024-09-23) ### Bug Fixes * SIGNAL-7090 UPDATE async worker id setting per partition ([#97](#97)) ([060dbd7](060dbd7)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

UPDATE: async worker id setting per partition

e68c6bd

seungjinstord self-assigned this Sep 22, 2024

seungjinstord requested a review from a team September 23, 2024 18:59

seungjinstord marked this pull request as ready for review September 23, 2024 19:00

seungjinstord requested a review from a team as a code owner September 23, 2024 19:00

btkostner requested changes Sep 23, 2024


lib/kafee/producer/async_adapter.ex (outdated, resolved)

test/kafee/producer/async_adapter_test.exs (outdated, resolved)

test/kafee/producer/async_adapter_test.exs (outdated, resolved)

seungjinstord added 5 commits September 23, 2024 13:23

UPDATE: child id to include better nomenclature

79c9e59

UPDATE: refactor to use new BrodApi.generate_producer_partitioned_message_list

d4ae3d1

UPDATE: generate_producer_partitioned_message_list to refactor test code setup for partitioned messages

7180121

ADD: BrodApi.generate_producer_partitioned_message_list

a72b3be

UPDATE: use test helper function assert_registry_lookup_pid to assert registry lookup

a2683bb

seungjinstord requested a review from btkostner September 23, 2024 20:57

btkostner approved these changes Sep 23, 2024


seungjinstord merged commit 060dbd7 into main Sep 23, 2024
12 checks passed

seungjinstord deleted the SIGNAL-7090-async-adapter-to-start-partition-based-async-worker branch September 23, 2024 22:26

stord-engineering-account mentioned this pull request Sep 23, 2024

chore(main): release 3.1.1 #100

Merged
