MessageListener fails to receive messages after client reconnect to cluster #14123

Closed
kdohz opened this issue Nov 14, 2018 · 3 comments

@kdohz kdohz commented Nov 14, 2018

Report issue:

When the client reconnects (transition from CONNECTED -> DISCONNECTED -> CONNECTED), the MessageListener added to the client no longer receives messages. However, the MapListener continues to receive map entry events after reconnection.
The HZ version in use is 3.11.

Scenario details:

  1. HZ cluster has 2 members 41 and 43.
  2. HZ client is configured as follows:
      ClientConnectionStrategyConfig connectionStrategy = clientConfig.getConnectionStrategyConfig();
      ConnectionRetryConfig connectionRetry = connectionStrategy.getConnectionRetryConfig();
      
      connectionStrategy.setAsyncStart(true); // default is false
      connectionStrategy.setReconnectMode(ClientConnectionStrategyConfig.ReconnectMode.ASYNC); // default is OFF
      connectionRetry.setInitialBackoffMillis(1000);
      connectionRetry.setMaxBackoffMillis(1000); // default is 30000
      connectionRetry.setMultiplier(2); // default is 2
      connectionRetry.setFailOnMaxBackoff(false); // default is false. 
      connectionRetry.setJitter(0.2); // default is 0.2
      connectionRetry.setEnabled(true); // default is false
  3. HZ client is CONNECTED to the cluster with 41 as the owner member.
  4. MessageListener is successfully registered and receives messages.
  5. MapListener is successfully registered and receives notifications when an entry is added to, updated in, or removed from the map.
  6. Disconnect 41 from the network.
  7. HZ client transitions from DISCONNECTED to CONNECTED, with 43 now the owner member.
  8. Adding, updating, or removing a map entry still triggers the corresponding MapListener event.
  9. MessageListener receives nothing when a message is published.
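
For readers trying to reproduce the timing, here is a rough sketch of what the retry settings in step 2 imply. It assumes the usual capped exponential backoff (start at the initial value, multiply per attempt, never exceed the maximum) with jitter applied as a symmetric fraction of the delay. `BackoffSketch` and its methods are hypothetical names for illustration, not Hazelcast API, and the real client may compute the delay differently.

```java
/** Illustrative only: models capped exponential backoff, not Hazelcast's actual implementation. */
final class BackoffSketch {

    /** Delay before jitter for the given zero-based attempt, capped at max. */
    static long baseDelayMillis(int attempt, long initial, long max, double multiplier) {
        double delay = initial;
        // Multiply once per earlier attempt, stopping as soon as the cap is reached.
        for (int i = 0; i < attempt && delay < max; i++) {
            delay *= multiplier;
        }
        return (long) Math.min(delay, max);
    }

    /** Lower and upper bound of the delay once a symmetric jitter fraction is applied. */
    static long[] jitterBounds(long base, double jitter) {
        return new long[] {
            Math.round(base * (1 - jitter)),
            Math.round(base * (1 + jitter))
        };
    }
}
```

Under these assumptions, setting the maximum backoff equal to the initial backoff (both 1000 ms here) means the multiplier never takes effect, so every reconnect attempt waits roughly 800 to 1200 ms.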

Note that the same issue is observed when the cluster has only one member and the client is reconnected.

Observation
Removing the old message listener and adding a new one works around the problem, i.e. the new MessageListener starts to receive published messages.
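
The workaround can be sketched roughly as follows. This is a self-contained illustration rather than Hazelcast code: `Topic` is a hypothetical stand-in for the two `ITopic` calls involved (`addMessageListener`, which returns a registration id in 3.x, and `removeMessageListener`), and `onClientConnected` is assumed to be called from a lifecycle listener whenever the client reports that it is connected again.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for the ITopic listener calls used by the workaround. */
interface Topic<E> {
    /** Registers the listener and returns a registration id. */
    String addMessageListener(Consumer<E> listener);

    /** Removes the listener registered under the given id. */
    boolean removeMessageListener(String registrationId);
}

/** Drops the old registration and registers the listener again after each reconnect. */
final class ListenerReregistrar<E> {
    private final Topic<E> topic;
    private final Consumer<E> listener;
    private String registrationId; // null until the first registration

    ListenerReregistrar(Topic<E> topic, Consumer<E> listener) {
        this.topic = topic;
        this.listener = listener;
    }

    /** Assumed to be invoked on every "client connected" event, including the first one. */
    synchronized void onClientConnected() {
        if (registrationId != null) {
            // The old registration may already be dead; remove it anyway to avoid duplicates.
            topic.removeMessageListener(registrationId);
        }
        registrationId = topic.addMessageListener(listener);
    }

    synchronized String currentRegistrationId() {
        return registrationId;
    }
}
```

Messages published between the disconnect and the new registration may be missed with this approach unless the listener resumes from a stored sequence.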

@sancar sancar added the Team: Client label Nov 15, 2018
Member

@sancar sancar commented Nov 15, 2018

Hi @kdohz
I assume that you are using ReliableTopic, because the ITopic MessageListener and MapListener work the same way.
A ReliableTopic listener can terminate when events are lost. I am not sure what happened in your case. You can check the logs to see whether it was terminated. Messages like "Terminating MessageListener ... " are logged at warning level.

Also, to have more control over ReliableTopic, you can use ReliableMessageListener instead of MessageListener.

@sancar sancar self-assigned this Nov 15, 2018
Author

@kdohz kdohz commented Nov 15, 2018

Here is some detailed information that could help diagnose the problem:

1.  Yes, ReliableTopic is being used:

          ITopic<byte[]> topic = hazelcastClient.getReliableTopic(topicName);

2.  Yes, ReliableMessageListener is being used:

    // queueMessage(), handleMessagesLost() and msgSequence are defined outside
    // this snippet (presumably in the enclosing class).
    private class ReliableMessageListenerImpl implements ReliableMessageListener<byte[]>
    {
      @Override
      public void onMessage(Message<byte[]> m)
      {
        queueMessage(m.getMessageObject());
      }

      /**
       * Called when we try to read messages using a stale sequence.
       * A stale sequence occurs when we lose connectivity to the cluster or when the
       * publisher is much faster than us (unlikely). Return true to prevent Hazelcast
       * from terminating the observer.
       */
      @Override
      public boolean isLossTolerant()
      {
        // notify the observer that some messages may have been lost
        handleMessagesLost();
        return true;
      }

      /**
       * Tells Hazelcast whether the observer should be terminated for the given failure.
       * Return false to prevent Hazelcast from terminating the observer.
       */
      @Override
      public boolean isTerminal(Throwable failure)
      {
        return false;
      }

      /**
       * Called once when we register the observer.
       * @return -1 if no sequence has been stored yet, otherwise the stored sequence + 1
       */
      @Override
      public long retrieveInitialSequence()
      {
        return (msgSequence == -1) ? msgSequence : msgSequence + 1;
      }

      /**
       * Stores the sequence for failure recovery purposes.
       */
      @Override
      public void storeSequence(long sequence)
      {
        msgSequence = sequence;
      }
    }
3.  Finally, right after the client receives the CLIENT_DISCONNECTED event, this warning is logged:
        hz.client_0 [as] [3.11] Terminating MessageListener com.broadsoft.persistence.hazelcast.BWHazelcastInstance$TopicContainer$ReliableMessageListenerImpl@7dc4a2ed on topic: profileManagementUpdate. Reason: Unhandled exception, message: Client is offline. <com.hazelcast.client.HazelcastClientOfflineException: Client is offline.>com.hazelcast.client.HazelcastClientOfflineException: Client is offline.
        at com.hazelcast.client.connection.nio.DefaultClientConnectionStrategy.beforeGetConnection(DefaultClientConnectionStrategy.java:58)
        at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl.checkAllowed(ClientConnectionManagerImpl.java:335)
        at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl.getConnection(ClientConnectionManagerImpl.java:307)
        at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl.getOrTriggerConnect(ClientConnectionManagerImpl.java:298)
        at com.hazelcast.client.spi.impl.SmartClientInvocationService.getOrTriggerConnect(SmartClientInvocationService.java:73)
        at com.hazelcast.client.spi.impl.SmartClientInvocationService.invokeOnPartitionOwner(SmartClientInvocationService.java:48)
        at com.hazelcast.client.spi.impl.ClientInvocation.invokeOnSelection(ClientInvocation.java:163)
        at com.hazelcast.client.spi.impl.ClientInvocation.retry(ClientInvocation.java:194)
        at com.hazelcast.client.spi.impl.ClientInvocation.run(ClientInvocation.java:179)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
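
The sequence bookkeeping done by `retrieveInitialSequence()` and `storeSequence()` in the listener above can be isolated into a small helper, sketched below. `SequenceTracker` is a hypothetical name, not part of Hazelcast. The sketch uses an `AtomicLong` on the assumption that the sequence may be stored and read from different threads, and it relies on the documented convention that returning -1 from `retrieveInitialSequence()` means "start from the next published message".

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative helper: remembers the last processed sequence and derives where to resume. */
final class SequenceTracker {
    /** Marker for "nothing stored yet"; mirrors the -1 convention used by the listener. */
    static final long NO_SEQUENCE = -1L;

    private final AtomicLong lastProcessed = new AtomicLong(NO_SEQUENCE);

    /** Records the sequence of the message that was just handed to the listener. */
    void store(long sequence) {
        lastProcessed.set(sequence);
    }

    /** Sequence to resume from: -1 if nothing was processed, otherwise the one after the last. */
    long initialSequence() {
        long last = lastProcessed.get();
        return last == NO_SEQUENCE ? NO_SEQUENCE : last + 1;
    }
}
```

A listener could delegate `storeSequence(sequence)` to `store(sequence)` and `retrieveInitialSequence()` to `initialSequence()`, so a re-registered listener continues after the last message it saw.
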
Member

@sancar sancar commented Nov 16, 2018

This seems like a bug. It makes sense to retry on HazelcastClientOfflineException.
I will prepare a fix for the next patch release.
Thanks for the report.
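
The idea of retrying instead of terminating can be illustrated with a small stand-alone sketch. It is not the actual fix: `ClientOfflineException`, `SequencedSource` and `RetryingReader` are hypothetical stand-ins, and a real client would presumably wait between retries rather than loop immediately. The point shown is only that an "offline" failure leaves the read position unchanged, so reading resumes from the last known sequence.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for HazelcastClientOfflineException. */
final class ClientOfflineException extends RuntimeException {
    ClientOfflineException() {
        super("Client is offline.");
    }
}

/** Hypothetical message source addressed by sequence number. */
interface SequencedSource {
    /** Returns the message at the sequence, or null if there is none; may throw ClientOfflineException. */
    String read(long sequence);
}

final class RetryingReader {
    /** Reads from startSequence until the source is exhausted, retrying reads that fail while offline. */
    static List<String> drain(SequencedSource source, long startSequence, int maxOfflineRetries) {
        List<String> received = new ArrayList<>();
        long sequence = startSequence;
        int retries = 0;
        while (true) {
            try {
                String message = source.read(sequence);
                if (message == null) {
                    return received;
                }
                received.add(message);
                sequence++;   // advance only after a successful read
                retries = 0;  // a successful read resets the retry budget
            } catch (ClientOfflineException e) {
                if (++retries > maxOfflineRetries) {
                    throw e;  // give up: this is where the listener would otherwise be terminated
                }
                // Otherwise fall through and retry the same sequence.
            }
        }
    }
}
```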

@sancar sancar added this to the 3.11.1 milestone Nov 16, 2018
@sancar sancar added the Type: Defect label Nov 16, 2018
sancar added a commit to sancar/hazelcast that referenced this issue Nov 16, 2018
ReliableTopic was getting terminated in case of
HazelcastClientOfflineException. With this PR, reliable
topic continues from last known sequence id, in case
of `HazelcastClientOfflineException`.

fixes hazelcast#14123
sancar added a commit to sancar/hazelcast that referenced this issue Nov 16, 2018
ReliableTopic was getting terminated in case of
HazelcastClientOfflineException. With this PR, reliable
topic continues from last known sequence id, in case
of `HazelcastClientOfflineException`.

fixes hazelcast#14123

(cherry picked from commit 1be1e43)
blazember added a commit to blazember/hazelcast that referenced this issue Dec 11, 2018
ReliableTopic was getting terminated in case of
HazelcastClientOfflineException. With this PR, reliable
topic continues from last known sequence id, in case
of `HazelcastClientOfflineException`.

fixes hazelcast#14123