Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hazelcast terminates listener incorrectly. #9766

Closed
tqt9209 opened this issue Jan 26, 2017 · 3 comments

Comments

Projects
None yet
4 participants
@tqt9209
Copy link

commented Jan 26, 2017

I have this Hazelcast set up

Application ->hzClient (3.7.5 version) ------------------- hzServer1 (3.7.5)
------------------- hzServer2 (3.7.5)
hzServer1 and hzServer2 are members in the same cluster.
hzClient registers message listener with the cluster to receive messages from reliable topic.

If I restart hzServer2 then I see the exception below and the listener is terminated incorrectly. There is no call back to tell application that the listener is terminated.
I end up with situation where application no longer receives any message from the topic.

I have the following suggestions.

  1. ReliableMessageListener should not be terminated on exception. If exception occurs when trying to read message from the dead member, don't terminate the listener, try re-creating the listener on different member. If there is no alive member, wait for the connection to re-established (could be just a temporary glitch) then re-create listeners again.
  2. Look like there is existing isTerminal(Throwable failure). Should that be used to feed the exception to application?
  3. If the listener has to be terminated (no other choice), there should be a new method introduced into ReliableMessageListener interface (ex: listenerTerminated(...)) to tell application that the listener is terminated.
    Terminate listener silently without telling application doesn't make any sense at all because application assumes everything is well when it is not.

        hz.client_0 [hzc1] [3.7.5] Terminating MessageListener com.broadsoft.persistence.hazelcast.BWHazelcastInstance$TopicContainer$ReliableMessageListenerImpl@1f739e95 on topic: profileManagementUpdate. Reason: Unhandled exception, message: 
                                          com.hazelcast.spi.exception.TargetDisconnectedException: Disconnecting from member [10.16.176.21]:5701 due to heartbeat problems. Current time: 2017-01-26 12:50:57.655. Last heartbeat requested: 2017-01-26 12:50:55.660. Last heartbeat received: 2017-01-26 12:50:55.662. Last read: 2017-01-26 12:50:57.137. Connection ClientConnection{live=false, connectionId=2, socketChannel=SSLSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[closed]}, remoteEndpoint=[10.16.176.21]:5701, lastReadTime=2017-01-26 12:50:57.137, lastWriteTime=2017-01-26 12:50:55.661, closedTime=2017-01-26 12:50:57.137, lastHeartbeatRequested=2017-01-26 12:50:55.660, lastHeartbeatReceived=2017-01-26 12:50:55.662, connected server version=3.7.5} 
<java.util.concurrent.ExecutionException: com.hazelcast.spi.exception.TargetDisconnectedException: Disconnecting from member [10.16.176.21]:5701 due to heartbeat problems. Current time: 2017-01-26 12:50:57.655. Last heartbeat requested: 2017-01-26 12:50:55.660. Last heartbeat received: 2017-01-26 12:50:55.662. Last read: 2017-01-26 12:50:57.137. Connection ClientConnection{live=false, connectionId=2, socketChannel=SSLSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[closed]}, remoteEndpoint=[10.16.176.21]:5701, lastReadTime=2017-01-26 12:50:57.137, lastWriteTime=2017-01-26 12:50:55.661, closedTime=2017-01-26 12:50:57.137, lastHeartbeatRequested=2017-01-26 12:50:55.660, lastHeartbeatReceived=2017-01-26 12:50:55.662, connected server version=3.7.5}>
 java.util.concurrent.ExecutionException: com.hazelcast.spi.exception.TargetDisconnectedException: Disconnecting from member [10.16.176.21]:5701 due to heartbeat problems. Current time: 2017-01-26 12:50:57.655. Last heartbeat requested: 2017-01-26 12:50:55.660. Last heartbeat received: 2017-01-26 12:50:55.662. Last read: 2017-01-26 12:50:57.137. Connection ClientConnection{live=false, connectionId=2, socketChannel=SSLSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[closed]}, remoteEndpoint=[10.16.176.21]:5701, lastReadTime=2017-01-26 12:50:57.137, lastWriteTime=2017-01-26 12:50:55.661, closedTime=2017-01-26 12:50:57.137, lastHeartbeatRequested=2017-01-26 12:50:55.660, lastHeartbeatReceived=2017-01-26 12:50:55.662, connected server version=3.7.5}
        at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolve(ClientInvocationFuture.java:66)
        at com.hazelcast.spi.impl.AbstractInvocationFuture$1.run(AbstractInvocationFuture.java:246)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)
Caused by: com.hazelcast.spi.exception.TargetDisconnectedException: Disconnecting from member [10.16.176.21]:5701 due to heartbeat problems. Current time: 2017-01-26 12:50:57.655. Last heartbeat requested: 2017-01-26 12:50:55.660. Last heartbeat received: 2017-01-26 12:50:55.662. Last read: 2017-01-26 12:50:57.137. Connection ClientConnection{live=false, connectionId=2, socketChannel=SSLSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[closed]}, remoteEndpoint=[10.16.176.21]:5701, lastReadTime=2017-01-26 12:50:57.137, lastWriteTime=2017-01-26 12:50:55.661, closedTime=2017-01-26 12:50:57.137, lastHeartbeatRequested=2017-01-26 12:50:55.660, lastHeartbeatReceived=2017-01-26 12:50:55.662, connected server version=3.7.5}
        at com.hazelcast.spi.exception.TargetDisconnectedException.newTargetDisconnectedExceptionCausedByHeartbeat(TargetDisconnectedException.java:66)
        at com.hazelcast.client.spi.impl.ClientInvocationServiceSupport$CleanResourcesTask.run(ClientInvocationServiceSupport.java:221)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        ... 5 more
Caused by: java.io.EOFException: Remote socket closed!
        at com.hazelcast.client.connection.nio.ClientReadHandler.handle(ClientReadHandler.java:87)
        at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.handleSelectionKey(NonBlockingIOThread.java:345)
        at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.handleSelectionKeys(NonBlockingIOThread.java:330)
        at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.selectLoop(NonBlockingIOThread.java:248)
        at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.run(NonBlockingIOThread.java:201)

@sancar sancar added this to the 3.7.6 milestone Jan 27, 2017

@ihsandemir ihsandemir self-assigned this Jan 27, 2017

@ihsandemir

This comment has been minimized.

Copy link
Contributor

commented Jan 31, 2017

I examined the problem. The ringbuffer readMany is not marked as retryable (https://github.com/hazelcast/hazelcast-client-protocol/blob/master/hazelcast/src/main/java/com/hazelcast/client/impl/protocol/template/RingbufferCodecTemplate.java#L132) This probably caused the problem. I will submit a fix PR for this.

ihsandemir added a commit to ihsandemir/hazelcast that referenced this issue Feb 7, 2017

Test for issue hazelcast#9766 . The new client-protocol (1.4.0-12) li…
…brary release will provide the fix which makes the idempotent requests retryable.

ihsandemir added a commit to ihsandemir/hazelcast that referenced this issue Feb 7, 2017

Test for issue hazelcast#9766 . The new client-protocol (1.4.0-12) li…
…brary release will provide the fix which makes the idempotent requests retryable.

ihsandemir added a commit that referenced this issue Feb 8, 2017

Test for issue #9766 . The new client-protocol (1.4.0-12) library rel…
…ease will provide the fix which makes the idempotent requests retryable. (#9811)
@tqt9209

This comment has been minimized.

Copy link
Author

commented Mar 17, 2017

The problem still exists in version 3.7.6. I am not sure how you guys tested and closed it. I am using the enterprise version. I am going to open new ticket:

2017.03.17 11:15:09:298 EDT | Warn | Hazelcast

    hz.client_0 [hzc1] [3.7.6] Terminating MessageListener com.broadsoft.persistence.hazelcast.BWHazelcastInstance$TopicContainer$ReliableMessageListenerImpl@111bed78 on topic: profileManagementUpdate. Reason: Unhandled exception, message: com.hazelcast.spi.exception.TargetDisconnectedException: Disconnecting from member [10.16.176.21]:5701 due to heartbeat problems. Current time: 2017-03-17 11:15:09.265. Last heartbeat requested: 2017-03-17 11:14:49.266. Last heartbeat received: 2017-03-17 11:14:49.268. Last read: 2017-03-17 11:15:08.561. Connection ClientConnection{live=false, connectionId=2, socketChannel=SSLSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[closed]}, remoteEndpoint=[10.16.176.21]:5701, lastReadTime=2017-03-17 11:15:08.561, lastWriteTime=2017-03-17 11:14:59.443, closedTime=2017-03-17 11:15:08.561, lastHeartbeatRequested=2017-03-17 11:14:49.266, lastHeartbeatReceived=2017-03-17 11:14:49.268, connected server version=3.7.6} <java.util.concurrent.ExecutionException: com.hazelcast.spi.exception.TargetDisconnectedException: Disconnecting from member [10.16.176.21]:5701 due to heartbeat problems. Current time: 2017-03-17 11:15:09.265. Last heartbeat requested: 2017-03-17 11:14:49.266. Last heartbeat received: 2017-03-17 11:14:49.268. Last read: 2017-03-17 11:15:08.561. Connection ClientConnection{live=false, connectionId=2, socketChannel=SSLSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[closed]}, remoteEndpoint=[10.16.176.21]:5701, lastReadTime=2017-03-17 11:15:08.561, lastWriteTime=2017-03-17 11:14:59.443, closedTime=2017-03-17 11:15:08.561, lastHeartbeatRequested=2017-03-17 11:14:49.266, lastHeartbeatReceived=2017-03-17 11:14:49.268, connected server version=3.7.6}>java.util.concurrent.ExecutionException: com.hazelcast.spi.exception.TargetDisconnectedException: Disconnecting from member [10.16.176.21]:5701 due to heartbeat problems. Current time: 2017-03-17 11:15:09.265. Last heartbeat requested: 2017-03-17 11:14:49.266. Last heartbeat received: 2017-03-17 11:14:49.268. Last read: 2017-03-17 11:15:08.561. Connection ClientConnection{live=false, connectionId=2, socketChannel=SSLSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[closed]}, remoteEndpoint=[10.16.176.21]:5701, lastReadTime=2017-03-17 11:15:08.561, lastWriteTime=2017-03-17 11:14:59.443, closedTime=2017-03-17 11:15:08.561, lastHeartbeatRequested=2017-03-17 11:14:49.266, lastHeartbeatReceived=2017-03-17 11:14:49.268, connected server version=3.7.6}
    at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolve(ClientInvocationFuture.java:66)
    at com.hazelcast.spi.impl.AbstractInvocationFuture$1.run(AbstractInvocationFuture.java:246)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
    at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)

Caused by: com.hazelcast.spi.exception.TargetDisconnectedException: Disconnecting from member [10.16.176.21]:5701 due to heartbeat problems. Current time: 2017-03-17 11:15:09.265. Last heartbeat requested: 2017-03-17 11:14:49.266. Last heartbeat received: 2017-03-17 11:14:49.268. Last read: 2017-03-17 11:15:08.561. Connection ClientConnection{live=false, connectionId=2, socketChannel=SSLSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[closed]}, remoteEndpoint=[10.16.176.21]:5701, lastReadTime=2017-03-17 11:15:08.561, lastWriteTime=2017-03-17 11:14:59.443, closedTime=2017-03-17 11:15:08.561, lastHeartbeatRequested=2017-03-17 11:14:49.266, lastHeartbeatReceived=2017-03-17 11:14:49.268, connected server version=3.7.6}
at com.hazelcast.spi.exception.TargetDisconnectedException.newTargetDisconnectedExceptionCausedByHeartbeat(TargetDisconnectedException.java:66)
at com.hazelcast.client.spi.impl.ClientInvocationServiceSupport$CleanResourcesTask.run(ClientInvocationServiceSupport.java:221)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
... 5 more
Caused by: java.io.EOFException: Remote socket closed!
at com.hazelcast.client.connection.nio.ClientReadHandler.handle(ClientReadHandler.java:87)
at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.handleSelectionKey(NonBlockingIOThread.java:345)
at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.handleSelectionKeys(NonBlockingIOThread.java:330)
at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.selectLoop(NonBlockingIOThread.java:248)
at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.run(NonBlockingIOThread.java:201)

@tqt9209

This comment has been minimized.

Copy link
Author

commented Mar 17, 2017

New ticket created:
#10101

@sancar sancar modified the milestones: 3.8, 3.7.6 Mar 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.