Possible deadlock between DefaultKafkaProducerFactory.expire and DefaultKafkaProducerFactory.removeProducer #2744

Closed
Nick-The-Uncharted opened this issue Jul 17, 2023 · 1 comment · Fixed by #2747


Nick-The-Uncharted commented Jul 17, 2023

In what version(s) of Spring for Apache Kafka are you seeing this issue?

2.8.11

Describe the bug

The following log was printed:

Jul 16, 2023 @ 12:08:41.972 [Producer clientId=producer-4] Proceeding to force close the producer since pending requests could not be completed within timeout 30000 ms.
Jul 16, 2023 @ 12:08:11.971 [Producer clientId=producer-4] Closing the Kafka producer with timeoutMillis = 30000 ms.

After 4 hours the closing thread was still pending, and jstack shows a deadlock:


# waits for the factory lock and never finishes
"kafka-producer-network-thread | producer-4" #17753 daemon prio=5 os_prio=0 tid=0x00007fa500089000 nid=0x456f waiting for monitor entry [0x00007fa4f37fa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory.removeProducer(DefaultKafkaProducerFactory.java:841)
	- waiting to lock <0x00000000ea598598> (a org.springframework.kafka.core.DefaultKafkaProducerFactory)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory$$Lambda$726/64605090.test(Unknown Source)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.close(DefaultKafkaProducerFactory.java:1201)
	at org.springframework.kafka.core.KafkaTemplate.closeProducer(KafkaTemplate.java:633)
	at org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$6(KafkaTemplate.java:706)
	at org.springframework.kafka.core.KafkaTemplate$$Lambda$1907/1089124032.onCompletion(Unknown Source)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion(DefaultKafkaProducerFactory.java:1095)
	at org.apache.kafka.clients.producer.KafkaProducer$AppendCallbacks.onCompletion(KafkaProducer.java:1505)
	at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:270)
	at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:234)
	at org.apache.kafka.clients.producer.internals.ProducerBatch.complete(ProducerBatch.java:180)
	at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:692)
	at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:663)
	at org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:589)
	at org.apache.kafka.clients.producer.internals.Sender$$Lambda$1933/812525772.accept(Unknown Source)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
	at org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:576)
	at org.apache.kafka.clients.producer.internals.Sender$$Lambda$1932/466903717.accept(Unknown Source)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:576)
	at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:850)
	at org.apache.kafka.clients.producer.internals.Sender$$Lambda$1926/1711686194.onComplete(Unknown Source)
	at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154)
	at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:594)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:586)
	at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:256)
	at java.lang.Thread.run(Thread.java:750)

# holds the factory lock and waits for the other thread to finish
"consumer-xxx-0-C-1" #88 prio=5 os_prio=0 tid=0x00007fa5c268b800 nid=0x6f in Object.wait() [0x00007fa4aa7ed000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1257)
	- locked <0x00000000ef416040> (a org.apache.kafka.common.utils.KafkaThread)
	at java.lang.Thread.join(Thread.java:1331)
	at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1342)
	at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1303)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.closeDelegate(DefaultKafkaProducerFactory.java:1207)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory.expire(DefaultKafkaProducerFactory.java:887)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory.doCreateProducer(DefaultKafkaProducerFactory.java:754)
	- locked <0x00000000ea598598> (a org.springframework.kafka.core.DefaultKafkaProducerFactory)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory.createProducer(DefaultKafkaProducerFactory.java:733)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory.createProducer(DefaultKafkaProducerFactory.java:727)
	at org.springframework.kafka.core.KafkaTemplate.getTheProducer(KafkaTemplate.java:760)
	at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:644)
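
The two traces above have a common shape: one thread holds the factory monitor while joining a second thread, and that second thread needs the same monitor before it can exit. The following is a minimal sketch of that shape only, not Spring Kafka code. `DeadlockSketch`, `factoryMonitor`, and `reproduce` are hypothetical names, and a bounded `join` is used so the example terminates, whereas the trace shows an untimed `Thread.join`.

```java
import java.util.concurrent.CountDownLatch;

final class DeadlockSketch {

    // Stand-in for the DefaultKafkaProducerFactory instance that both threads lock.
    private final Object factoryMonitor = new Object();

    /**
     * Returns true if the simulated network thread was still blocked on the
     * monitor when the bounded join gave up. joinTimeoutMs must be greater
     * than 0, because join(0) waits forever and would hang like the report.
     */
    boolean reproduce(long joinTimeoutMs) {
        CountDownLatch monitorHeld = new CountDownLatch(1);

        // Plays the producer network thread: a send callback ends up in a
        // synchronized method (removeProducer in the trace).
        Thread networkThread = new Thread(() -> {
            try {
                monitorHeld.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (factoryMonitor) {
                // removeProducer(...) would do its work here
            }
        }, "network-thread-sketch");
        networkThread.setDaemon(true);
        networkThread.start();

        try {
            boolean stillBlocked;
            // Plays the consumer thread: holds the monitor (doCreateProducer -> expire)
            // while waiting for the network thread to exit.
            synchronized (factoryMonitor) {
                monitorHeld.countDown();
                networkThread.join(joinTimeoutMs);
                stillBlocked = networkThread.isAlive();
            }
            // The monitor is released now, so the other thread can finish.
            networkThread.join();
            return stillBlocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked while monitor held: " + new DeadlockSketch().reproduce(200));
    }
}
```

With an untimed join in place of `join(joinTimeoutMs)`, neither thread could make progress, which matches the BLOCKED and WAITING states in the jstack output.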

garyrussell added this to the 3.0.9 milestone Jul 17, 2023
garyrussell added a commit to garyrussell/spring-kafka that referenced this issue Jul 17, 2023
Resolves spring-projects#2744

Possible deadlock if `removeProducer` is called on the producer network thread.

Move resetting the global shared producer to the creation logic.

Also ensure the delegates of any thread-bound producers are closed.

Add try/catch around the delegate close.

**cherry-pick to 2.9.x**
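
One way to read the commit message above is that the callback path stops taking the factory lock, and the creation path becomes the only place that replaces a closed shared producer. The sketch below illustrates that idea under stated assumptions; it is not the actual change. `SharedProducerSketch`, `FakeProducer`, `markClosed`, and `closeDelegate` are hypothetical names and not Spring Kafka types.

```java
final class SharedProducerSketch {

    /** Hypothetical stand-in for a producer wrapper. */
    static final class FakeProducer {
        // volatile: written by a callback thread, read by the creating thread
        volatile boolean closed;

        /** Safe to call from a callback thread because it takes no factory lock. */
        void markClosed() {
            this.closed = true;
        }

        void closeDelegate() {
            // A real delegate close could throw; the sketch has nothing to release.
        }
    }

    private FakeProducer shared;

    /** Creation logic: the only place that resets the shared producer. */
    synchronized FakeProducer createProducer() {
        if (this.shared != null && this.shared.closed) {
            try {
                this.shared.closeDelegate();
            } catch (RuntimeException e) {
                // Swallowed so a failed close does not prevent creating a replacement.
            }
            this.shared = null;
        }
        if (this.shared == null) {
            this.shared = new FakeProducer();
        }
        return this.shared;
    }
}
```

Because `markClosed` only flips a flag, a thread that holds the factory monitor and waits for the callback thread to exit is no longer waiting on a thread that needs that monitor.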
garyrussell (Contributor) commented

Thanks for reporting @Nick-The-Uncharted

Please note that 2.8.x is no longer supported as OSS (see https://spring.io/projects/spring-kafka#support). The fix will be in 2.9.10 and 3.0.9, which should be out today. 2.9.x is fully compatible with Boot 2.7.x.

artembilan pushed a commit that referenced this issue Jul 17, 2023
Resolves #2744

Possible deadlock if `removeProducer` is called on the producer network thread.

Move resetting the global shared producer to the creation logic.

Also ensure the delegates of any thread-bound producers are closed.

Add try/catch around the delegate close.

**cherry-pick to 2.9.x**
artembilan pushed a commit that referenced this issue Jul 17, 2023
Resolves #2744

Possible deadlock if `removeProducer` is called on the producer network thread.

Move resetting the global shared producer to the creation logic.

Also ensure the delegates of any thread-bound producers are closed.

Add try/catch around the delegate close.

**cherry-pick to 2.9.x**
# Conflicts:
#	spring-kafka/src/test/java/org/springframework/kafka/core/DefaultKafkaProducerFactoryTests.java