IMap.add/remove- EntryListener randomly hangs #11470

Closed
medvand opened this issue Sep 27, 2017 · 3 comments

Comments

@medvand commented Sep 27, 2017

According to the thread dump, the problem is as follows:

  • The client registers and unregisters entry listeners via IMap.addEntryListener/removeEntryListener. This is handled by ClientSmartListenerService, which blocks the calling thread until registration is done.
  • ClientSmartListenerService performs the registration itself on a single thread by calling ClientInvocation.invokeUrgent and blocking that thread until a response is received from the server.
  • For some reason, the client does not receive the registration/deregistration response from the server (the thread dump shows ResponseThread waiting on an empty incoming-message queue).
  • Therefore, every calling thread stays blocked until the pending registration/deregistration completes. If something goes wrong during that period, the entire server hangs forever.
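
The blocking pattern described in the bullets above can be modelled with plain `java.util.concurrent` types. This is only a simplified sketch, not Hazelcast's actual code: `ListenerHangSketch`, `register`, and the `CountDownLatch` standing in for a server response are hypothetical names invented for illustration. The sketch also bounds the caller's wait with a timeout so that it terminates; per the thread dump, the real callers block with no such bound.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ListenerHangSketch {

    // Hypothetical stand-in for the single registration thread.
    private final ExecutorService registrationExecutor =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "registration-thread");
                t.setDaemon(true); // let the JVM exit even if a task is still blocked
                return t;
            });

    /**
     * Queues a registration that blocks the registration thread until the
     * simulated server response arrives, then waits for it from the calling
     * thread for at most timeoutMillis.
     *
     * @return true if the registration completed, false if the caller gave up
     */
    public boolean register(CountDownLatch serverResponse, long timeoutMillis)
            throws Exception {
        Future<?> registration = registrationExecutor.submit(() -> {
            serverResponse.await(); // blocks until the "response" is received
            return null;
        });
        try {
            registration.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the task stays queued or running on the single thread
        }
    }

    public static void main(String[] args) throws Exception {
        ListenerHangSketch sketch = new ListenerHangSketch();
        CountDownLatch lostResponse = new CountDownLatch(1);    // response not yet received
        CountDownLatch arrivedResponse = new CountDownLatch(0); // response already received

        // The first registration never gets its response, so the caller times out.
        boolean first = sketch.register(lostResponse, 200);
        // The second one is healthy but sits in the queue behind the first.
        boolean second = sketch.register(arrivedResponse, 200);
        System.out.println("first=" + first + ", second=" + second);

        // Once the missing response shows up, the queue drains again.
        lostResponse.countDown();
        System.out.println("third=" + sketch.register(arrivedResponse, 2000));
    }
}
```

In this model a single lost response stalls every later registration, healthy or not, because they all share one thread. With an unbounded wait in place of the timed `get`, the callers would stay blocked for as long as the response is missing.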

@sancar (Member) commented Sep 28, 2017

Hi @medvand,
ClientSmartListenerService uses a single thread to avoid thread-safety issues. It could have been multithreaded, but that would be complicated and error-prone. Being single-threaded means operations may wait for each other; that is expected.
A registration response never arriving from the server is not expected, though. Either the server closes the connection or sends a response, or the client detects heartbeat issues and closes the connection. These timeouts can be long depending on your configuration, but none of them should cause a permanent hang.
Some default timeouts after which your invocations may expire:
"hazelcast.client.invocation.timeout.seconds", 120, SECONDS
"hazelcast.client.heartbeat.timeout", 60000, MILLISECONDS
Do you have a reproducer for us to try?
If not, can you share the server and client logs from a problematic run so we can investigate?
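
For anyone who wants these two timeouts to fire sooner while investigating, one option is to override them as JVM system properties before the client is created. The sketch below rests on assumptions: that the client reads `hazelcast.*` properties from system properties, that the class and method names (`ClientTimeoutSketch`, `applyShorterTimeouts`) are hypothetical, and that the values 30 and 20000 are arbitrary illustrations rather than recommendations.

```java
public class ClientTimeoutSketch {

    // Property names as quoted in the comment above.
    static final String INVOCATION_TIMEOUT = "hazelcast.client.invocation.timeout.seconds";
    static final String HEARTBEAT_TIMEOUT = "hazelcast.client.heartbeat.timeout";

    /**
     * Overrides both timeouts via JVM system properties.
     * Assumption: this must run before the Hazelcast client is instantiated.
     */
    static void applyShorterTimeouts(int invocationSeconds, long heartbeatMillis) {
        System.setProperty(INVOCATION_TIMEOUT, String.valueOf(invocationSeconds));
        System.setProperty(HEARTBEAT_TIMEOUT, String.valueOf(heartbeatMillis));
    }

    public static void main(String[] args) {
        applyShorterTimeouts(30, 20_000L); // illustrative values only
        System.out.println(INVOCATION_TIMEOUT + "=" + System.getProperty(INVOCATION_TIMEOUT));
        System.out.println(HEARTBEAT_TIMEOUT + "=" + System.getProperty(HEARTBEAT_TIMEOUT));
    }
}
```

The same values could presumably be passed as `-D` flags on the command line instead; either way, shorter timeouts only make a stuck invocation fail faster and do not explain why the response went missing.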

@medvand (Author) commented Sep 29, 2017

Hi Sancar. The problem occurs randomly, roughly several times a week. We have not been able to write a sample that reproduces it. We are using version 3.8.5 Community Edition. The following properties were changed:

hazelcast.io.thread.count=8
hazelcast.event.thread.count=8
hazelcast.event.queue.timeout.millis=250
hazelcast.operation.thread.count=64
hazelcast.map.entry.filtering.natural.event.types=true
hazelcast.max.no.heartbeat.seconds=600
hazelcast.max.no.master.confirmation.seconds=600
hazelcast.slow.operation.detector.stacktrace.logging.enabled=true
hazelcast.slow.operation.detector.log.retention.seconds=86400

Our topology: 16 server nodes (8 GB RAM each), 4 client nodes (8 GB RAM each)

@mmedenjak mmedenjak removed the To Triage label Jan 29, 2018
@mmedenjak mmedenjak added this to the 3.10 milestone Feb 19, 2018
@sancar (Member) commented Mar 16, 2018

Hi @medvand, the only known bug in ClientSmartListenerService after 3.8.6 is #11763, which does not seem directly related to your issue.
It is fixed in 3.9.1, if you would like to try that version.

If you don't have a reproducer, logs from the client and the member could help us reason about the problem.
I am closing the issue for now. If you have anything new (logs, etc.), please share it here so that we can identify the problem.
