Multithreaded KafkaItemReader on lines of JdbcPagingItemReader [BATCH-2855] #760

Closed
spring-issuemaster opened this issue Nov 6, 2019 · 1 comment

Comments

@spring-issuemaster
Collaborator

@spring-issuemaster spring-issuemaster commented Nov 6, 2019

Abhinav Nigam opened BATCH-2855 and commented

Hello Mike

 

Is it possible to have a multithreaded item reader that reads from a Kafka topic/partition, similar to JdbcPagingItemReader?

 

Regards

Abhinav


No further details from BATCH-2855

@benas
Contributor

@benas benas commented Feb 28, 2020

The KafkaItemReader is based on a KafkaConsumer, which is not thread-safe. Here is an excerpt from its Javadoc:

The Kafka consumer is NOT thread-safe.
All network I/O happens in the thread of the application making the call.
It is the responsibility of the user to ensure that multi-threaded access
is properly synchronized. 

Hence, the KafkaItemReader is not thread-safe either. To use it in a multi-threaded step, you can decorate it with a SynchronizedItemStreamReader, which synchronizes calls to read(). In a partitioned step (for example, with a partitioner that creates one Spring Batch partition for each partition of a given Kafka topic), a step-scoped reader should be safe as well, because each partition gets its own reader instance and therefore its own consumer.
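
To illustrate what the decoration buys you, here is a minimal sketch of the same idea using only JDK classes. It is not the Spring Batch API: `SimpleReader`, `UnsafeListReader`, `SynchronizedReader` and `SynchronizedReaderDemo` are hypothetical names made up for this example, and the list-backed reader merely stands in for a reader that owns a single, non-thread-safe consumer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an item reader: read() returns null when exhausted.
interface SimpleReader<T> {
    T read();
}

// Not thread-safe: plays the role of a reader backed by one consumer.
class UnsafeListReader<T> implements SimpleReader<T> {
    private final Iterator<T> items;

    UnsafeListReader(List<T> items) {
        this.items = items.iterator();
    }

    @Override
    public T read() {
        return items.hasNext() ? items.next() : null;
    }
}

// Decorator that lets only one thread at a time call the delegate.
class SynchronizedReader<T> implements SimpleReader<T> {
    private final SimpleReader<T> delegate;

    SynchronizedReader(SimpleReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T read() {
        return delegate.read();
    }
}

class SynchronizedReaderDemo {
    // Drains the reader from several threads and returns everything that was read.
    static List<Integer> drainConcurrently(SimpleReader<Integer> reader, int threads) {
        ConcurrentLinkedQueue<Integer> seen = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                Integer item;
                while ((item = reader.read()) != null) {
                    seen.add(item);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(seen);
    }
}
```

Note that the threads still take turns reading, so this only helps when the work done after reading (processing, writing) is what dominates.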

That said, to answer the question about the "possibility to have a multithreaded item reader to read from a Kafka topic/partition", I will base my answer on the following section of the same Kafka consumer Javadoc:

We have intentionally avoided implementing a particular threading model for processing.
This leaves several options for implementing multi-threaded processing of records.

1. One Consumer Per Thread
A simple option is to give each thread its own consumer instance
[...]
2. Decouple Consumption and Processing
Another alternative is to have one or more consumer threads that do all data consumption
and hands off ConsumerRecords instances to a blocking queue consumed by
a pool of processor threads that actually handle the record processing.
[...]
  • To implement option 1: Create multiple KafkaItemReaders (each one with its own Kafka consumer), then make each thread use a different reader
  • To implement option 2: Use a single KafkaItemReader and couple it with an AsyncItemProcessor/AsyncItemWriter (that is, reading is single-threaded while processing and writing are multi-threaded)
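
Both options can be sketched with plain JDK concurrency to show the threading shape. This is an illustration only, not Spring Batch or Kafka code: the class `KafkaThreadingSketch` and its methods are hypothetical, and an `Iterator` stands in for a reader that must be used by only one thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class KafkaThreadingSketch {

    // Option 1: every thread owns its own reader, so no reader is ever shared.
    static <I> List<I> oneReaderPerThread(List<Iterator<I>> readers) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, readers.size()));
        try {
            List<Future<List<I>>> results = new ArrayList<>();
            for (Iterator<I> reader : readers) {
                Callable<List<I>> task = () -> {
                    List<I> items = new ArrayList<>();
                    while (reader.hasNext()) {
                        items.add(reader.next());
                    }
                    return items;
                };
                results.add(pool.submit(task));
            }
            List<I> all = new ArrayList<>();
            for (Future<List<I>> result : results) {
                all.addAll(await(result));
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    // Option 2: the calling thread does all the reading; processing is handed
    // to a pool as futures, which are unwrapped in read order before "writing".
    static <I, O> List<O> singleReaderAsyncProcessing(
            Iterator<I> reader, Function<I, O> processor, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<O>> pending = new ArrayList<>();
            while (reader.hasNext()) {
                I item = reader.next(); // only this thread touches the reader
                Callable<O> task = () -> processor.apply(item);
                pending.add(pool.submit(task));
            }
            List<O> written = new ArrayList<>();
            for (Future<O> future : pending) {
                written.add(await(future));
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    // Waits for a future and converts checked exceptions to unchecked ones.
    private static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

In the second method the futures are awaited in submission order, so the output keeps the read order even though items are processed concurrently.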

I'm closing this issue for now, as I have explained why the KafkaItemReader cannot be made thread-safe and given some alternative options.
