Description
In what version(s) of Spring for Apache Kafka are you seeing this issue?
2.8.10
Describe the bug
This is a conceptual description of a Kafka duplicate consumption scenario that we are seeing in our load test environment:
GIVEN Two application instances on separate EC2 instances, each with one Kafka consumer and the same consumer group ID.
GIVEN Consumption with a Spring BatchAcknowledgingMessageListener that synchronously consumes a batch and then calls Acknowledgment.acknowledge().
GIVEN A single-threaded ConcurrentMessageListenerContainer with AckMode MANUAL_IMMEDIATE and syncCommits enabled (see the configuration sketch after this scenario).
WHEN One of the application instances shuts down (its consumer leaves the consumer group).
THEN The consumer group is rebalanced so that the remaining consumer gets all of the partitions.
WHEN The first application instance starts again (its consumer joins the consumer group).
THEN The consumer group is rebalanced so that each consumer gets half of the partitions.
THEN A handful of messages are successfully consumed by both consumers.
THEN The consumer that gets half of its partitions revoked logs one or more messages of the following type before the partition revocation is logged:
2022-11-17 08:24:21,809 INFO [consumer-0-C-1] o.a.k.c.c.i.ConsumerCoordinator [ConsumerCoordinator.java:1156] [Consumer clientId=consumer-xms-batch-mt-callback-3, groupId=xms-batch-mt-callback] Failing OffsetCommit request since the consumer is not part of an active group
This happens sometimes when executing a test case with an application instance restart under moderate traffic load (consumption of 1000 records per second). By contrast, when there is no duplication, the partition revocation is logged first, and only after that do the failing offset commits appear in the log.
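For clarity, here is a minimal sketch of the container setup described in the GIVENs above, assuming Spring for Apache Kafka 2.8.x. The broker address, topic name, and String deserializers are placeholder assumptions; the group ID is taken from the log line:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.BatchAcknowledgingMessageListener;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.ContainerProperties;

public class ContainerSetupSketch {

    public static void main(String[] args) {
        // Placeholder broker address; group ID taken from the log line above.
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "xms-batch-mt-callback");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        DefaultKafkaConsumerFactory<String, String> consumerFactory =
                new DefaultKafkaConsumerFactory<>(props);

        // "some-topic" is a placeholder for the real topic name.
        ContainerProperties containerProps = new ContainerProperties("some-topic");
        containerProps.setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        containerProps.setSyncCommits(true);
        containerProps.setMessageListener((BatchAcknowledgingMessageListener<String, String>)
                (records, ack) -> {
                    // ... synchronous processing of the whole batch happens here ...
                    ack.acknowledge(); // commits the batch offsets immediately and synchronously
                });

        ConcurrentMessageListenerContainer<String, String> container =
                new ConcurrentMessageListenerContainer<>(consumerFactory, containerProps);
        container.setConcurrency(1); // single-threaded: one consumer per application instance
        container.start();
    }
}
```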
To Reproduce
- Set up two KafkaMessageListenerContainers with BatchAcknowledgingMessageListeners, belonging to the same consumer group, consuming from the same topic with at least two partitions, with the non-default consumer configuration settings heartbeat.interval.ms = 1500 and session.timeout.ms = 6000 (a sketch of these settings follows below).
- Start sending records with a serialized payload of roughly 100 bytes at a rate of 1000 per second.
- Stop one of the consumers, wait for a minute, then start it again.
In our test setup this triggers duplicate consumption of a handful of messages roughly two times out of ten.
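For reference, the two non-default settings map to the following ConsumerConfig keys; this sketch extends the props map from the configuration sketch above, with everything else left at its default:

```java
// Non-default timeouts used in the reproduction; all other consumer settings
// are left at their defaults.
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 1500); // heartbeat.interval.ms
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 6000);    // session.timeout.ms
```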
Expected behavior
Every produced record is consumed exactly once, by one of the two consumers.
Sample
TBD.