New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client and RedoOperations #8545

Closed
juanavelez opened this Issue Jul 18, 2016 · 9 comments

Comments

Projects
None yet
4 participants
@juanavelez

juanavelez commented Jul 18, 2016

Tested both using HZ 3.6.4 and latest 3.7-SNAPSHOT. I understand that when using a Hazelcast Client instance if the redoOperation is set to true, operations which failed to be sent to the Cluster are retried but failures from executing the operation itself in the cluster are not. I am testing Split Brain scenarios and merges and I keep getting this exception which I would think would fall under the category of operations handled by the redo-operation.

Jul 18, 2016 10:39:19 AM com.acc.test.hazelcast.main.GettingStartedClient
INFO: Working on transaction 10
Exception in thread "main" com.hazelcast.core.HazelcastException: java.io.IOException: Packet not send to [hc4.dev.xxx.com]:5701
    at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:73)
    at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:63)
    at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:52)
    at com.hazelcast.util.ExceptionUtil.rethrow(ExceptionUtil.java:83)
    at com.hazelcast.client.proxy.txn.TransactionProxy.invoke(TransactionProxy.java:151)
    at com.hazelcast.client.proxy.txn.TransactionProxy.begin(TransactionProxy.java:83)
    at com.hazelcast.client.proxy.txn.TransactionContextProxy.beginTransaction(TransactionContextProxy.java:78)
    at com.acc.test.hazelcast.main.GettingStartedClient.testTransactionalPuts(GettingStartedClient.java:80)
    at com.acc.test.hazelcast.main.GettingStartedClient.main(GettingStartedClient.java:31)
Caused by: java.io.IOException: Packet not send to [hc4.dev.xxx.com]:5701
    at com.hazelcast.client.spi.impl.ClientInvocationServiceSupport.send(ClientInvocationServiceSupport.java:121)
    at com.hazelcast.client.spi.impl.ClientSmartInvocationServiceImpl.invokeOnConnection(ClientSmartInvocationServiceImpl.java:99)
    at com.hazelcast.client.spi.impl.ClientInvocation.invokeOnSelection(ClientInvocation.java:140)
    at com.hazelcast.client.spi.impl.ClientInvocation.invoke(ClientInvocation.java:122)
    at com.hazelcast.client.proxy.txn.TransactionProxy.invoke(TransactionProxy.java:148)
    at com.hazelcast.client.proxy.txn.TransactionProxy.begin(TransactionProxy.java:83)
    at com.hazelcast.client.proxy.txn.TransactionContextProxy.beginTransaction(TransactionContextProxy.java:78)
    at com.acc.test.hazelcast.main.GettingStartedClient.testTransactionalPuts(GettingStartedClient.java:80)
    at com.acc.test.hazelcast.main.GettingStartedClient.main(GettingStartedClient.java:31)
    at ------ submitted from ------.(Unknown Source)
    at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolveAndThrow(ClientInvocationFuture.java:74)
    at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolveAndThrow(ClientInvocationFuture.java:30)
    at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:150)
    at com.hazelcast.client.proxy.txn.TransactionProxy.invoke(TransactionProxy.java:149)

This is the test scenario: I have 4 nodes communicating via tcp-ip. There are two partition groups (G!=N1,N2, G2=N3,N4). I start the 4 nodes and the 4-node cluster is formed. I start my client test application which executes 10 different transactions of which each transaction inserts 10K in a transactional map. Midway, I simulate a network failure between G1 and G2. In the middle of the transaction, I can see the new state of the cluster and I handle the TransactionException/RetryableHazelcastException. When retrying the transaction I get the exception above. The client is set up with both smart routing and redo operation enabled and the addresses of the 4 nodes.

        ClientConfig clientConfig = new ClientConfig();
        ClientNetworkConfig networkConfig = new ClientNetworkConfig();
        networkConfig.setSmartRouting(true);
        networkConfig.setRedoOperation(true);
        String[] addresses = { "hc1.dev.xxx.com", "hc2.dev.xxx.com",
                               "hc3.dev.xxx.com", "hc4.dev.xxx.com" };
        networkConfig.setAddresses(Arrays.asList(addresses));
        clientConfig.setNetworkConfig(networkConfig);
        final HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        IMap<Integer, String> customersMap = client.getMap("customers");
        TransactionContext txCtx = client.newTransactionContext();
        LoggerFactory loggerFactory = new StandardLoggerFactory();
        ILogger logger = loggerFactory.getLogger(GettingStartedClient.class.getName());
        int maxTransactions = 10;
        int maxEntriesPerTx = 1000;
        int startKey = 0;
        logger.info("Starting " + maxTransactions + " transactions");
        for (int i = 0; i < maxTransactions; i++) {
            logger.info("Working on transaction " + (i+1));
            txCtx.beginTransaction();
            TransactionalMap<Integer, String> customersTxMap = txCtx.getMap("customers");
            try {
                for (int j = 0; j < maxEntriesPerTx; j++) {
                    int entryKey = startKey + maxEntriesPerTx * i + j;
                    customersTxMap.put(new Integer(entryKey), new Integer(entryKey).toString());
                }
                txCtx.commitTransaction();
                logger.info("Put " + (maxEntriesPerTx * (i+1)) + " entries so far");
                logger.info("Retrieving size for customers map");
                logger.info("Size=" + customersMap.size());
            } catch (TransactionException | RetryableHazelcastException e) {
                logger.warning("TransactionException/RetryableHazelcastException occurred. Rolling back and retrying transaction "
                               + i--, e);
                txCtx.rollbackTransaction();
            }
        }
        logger.info("Put " + (maxEntriesPerTx * maxTransactions) + " entries total");
        logger.info("Retrieving size for customers map");
        logger.info("Size=" + customersMap.size());
        client.shutdown();
INFO: Working on transaction 10
Jul 18, 2016 10:39:18 AM com.hazelcast.client.spi.impl.ClientMembershipListener
INFO: hz.client_0 [dev] [3.7-SNAPSHOT] 

Members [3] {
    Member [hc1.dev.xxx.com]:5701 - c31023bd-7d48-4299-8d3c-512b6c0f291c
    Member [hc2.dev.xxx.com]:5701 - 93248e07-8a36-4d51-8651-dea1e6f2cb76
    Member [hc4.dev.xxx.com]:5701 - 7e9ebb02-f71e-44e1-9c0e-aa028c4df604
}

Jul 18, 2016 10:39:18 AM com.hazelcast.client.connection.nio.ClientConnection
WARNING: hz.client_0 [dev] [3.7-SNAPSHOT] Connection [hc3.dev.xxx.com/172.16.100.24:5701] lost. Reason: com.hazelcast.spi.exception.TargetDisconnectedException[Closing connection to member [hc3.dev.xxx.com]:5701. The client has closed the connection to this member, after receiving a member left event from the cluster. Connection=ClientConnection{live=true, writeHandler=com.hazelcast.client.connection.nio.ClientWriteHandler@129eee57, readHandler=com.hazelcast.client.connection.nio.ClientReadHandler@579c6260, connectionId=4, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/172.19.131.36:49406 remote=hc3.dev.xxx.com/172.16.100.24:5701]}, remoteEndpoint=[hc3.dev.xxx.com]:5701}]
Jul 18, 2016 10:39:18 AM com.hazelcast.client.spi.impl.ClientMembershipListener
INFO: hz.client_0 [dev] [3.7-SNAPSHOT] 

Members [2] {
    Member [hc1.dev.xxx.com]:5701 - c31023bd-7d48-4299-8d3c-512b6c0f291c
    Member [hc2.dev.xxx.com]:5701 - 93248e07-8a36-4d51-8651-dea1e6f2cb76
}

Jul 18, 2016 10:39:18 AM com.hazelcast.client.connection.nio.ClientConnection
WARNING: hz.client_0 [dev] [3.7-SNAPSHOT] Connection [hc4.dev.xxx.com/172.16.100.25:5701] lost. Reason: com.hazelcast.spi.exception.TargetDisconnectedException[Closing connection to member [hc4.dev.xxx.com]:5701. The client has closed the connection to this member, after receiving a member left event from the cluster. Connection=ClientConnection{live=true, writeHandler=com.hazelcast.client.connection.nio.ClientWriteHandler@388705c, readHandler=com.hazelcast.client.connection.nio.ClientReadHandler@37b39d2f, connectionId=2, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/172.19.131.36:49403 remote=hc4.dev.xxx.com/172.16.100.25:5701]}, remoteEndpoint=[hc4.dev.xxx.com]:5701}]
Jul 18, 2016 10:39:19 AM com.acc.test.hazelcast.main.GettingStartedClient
WARNING: TransactionException/RetryableHazelcastException occurred. Rolling back and retrying transaction 9
com.hazelcast.spi.exception.TargetDisconnectedException: Disconnecting from member [hc4.dev.xxx.com]:5701 due to heartbeat problems. Current time: 2016-07-18 10:39:19.547. Last heartbeat: never. Last read: 2016-07-18 10:39:15.549. Connection ClientConnection{live=false, writeHandler=com.hazelcast.client.connection.nio.ClientWriteHandler@388705c, readHandler=com.hazelcast.client.connection.nio.ClientReadHandler@37b39d2f, connectionId=2, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[closed]}, remoteEndpoint=[hc4.dev.xxx.com]:5701}
    at com.hazelcast.spi.exception.TargetDisconnectedException.newTargetDisconnectedExceptionCausedByHeartBeat(TargetDisconnectedException.java:62)
    at com.hazelcast.client.spi.impl.ClientInvocationServiceSupport$CleanResourcesTask.run(ClientInvocationServiceSupport.java:221)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
    at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)
    at ------ submitted from ------.(Unknown Source)
    at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolveAndThrow(ClientInvocationFuture.java:74)
    at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolveAndThrow(ClientInvocationFuture.java:30)
    at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:158)
    at com.hazelcast.client.proxy.txn.ClientTxnProxy.invoke(ClientTxnProxy.java:49)
    at com.hazelcast.client.proxy.txn.ClientTxnMapProxy.put(ClientTxnMapProxy.java:120)
    at com.hazelcast.client.proxy.txn.ClientTxnMapProxy.put(ClientTxnMapProxy.java:110)
    at com.acc.test.hazelcast.main.GettingStartedClient.testTransactionalPuts(GettingStartedClient.java:85)
    at com.acc.test.hazelcast.main.GettingStartedClient.main(GettingStartedClient.java:31)
Caused by: com.hazelcast.spi.exception.TargetDisconnectedException: Closing connection to member [hc4.dev.xxx.com]:5701. The client has closed the connection to this member, after receiving a member left event from the cluster. Connection=ClientConnection{live=true, writeHandler=com.hazelcast.client.connection.nio.ClientWriteHandler@388705c, readHandler=com.hazelcast.client.connection.nio.ClientReadHandler@37b39d2f, connectionId=2, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/172.19.131.36:49403 remote=hc4.dev.xxx.com/172.16.100.25:5701]}, remoteEndpoint=[hc4.dev.xxx.com]:5701}
    at com.hazelcast.spi.exception.TargetDisconnectedException.newTargetDisconnectedExceptionCausedByMemberLeftEvent(TargetDisconnectedException.java:71)
    at com.hazelcast.client.spi.impl.ClientMembershipListener.memberRemoved(ClientMembershipListener.java:167)
    at com.hazelcast.client.spi.impl.ClientMembershipListener.handle(ClientMembershipListener.java:75)
    at com.hazelcast.client.impl.protocol.codec.ClientAddMembershipListenerCodec$AbstractEventHandler.handle(ClientAddMembershipListenerCodec.java:168)
    at com.hazelcast.client.spi.impl.ClientMembershipListener.handle(ClientMembershipListener.java:1)
    at com.hazelcast.client.spi.impl.listener.ClientListenerServiceImpl$ClientEventProcessor.run(ClientListenerServiceImpl.java:124)
    at com.hazelcast.util.executor.StripedExecutor$Worker.process(StripedExecutor.java:187)
    at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:171)

Jul 18, 2016 10:39:19 AM com.acc.test.hazelcast.main.GettingStartedClient
INFO: Working on transaction 10
Exception in thread "main" com.hazelcast.core.HazelcastException: java.io.IOException: Packet not send to [hc4.dev.xxx.com]:5701
    at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:73)
    at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:63)
    at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:52)
    at com.hazelcast.util.ExceptionUtil.rethrow(ExceptionUtil.java:83)
    at com.hazelcast.client.proxy.txn.TransactionProxy.invoke(TransactionProxy.java:151)
    at com.hazelcast.client.proxy.txn.TransactionProxy.begin(TransactionProxy.java:83)
    at com.hazelcast.client.proxy.txn.TransactionContextProxy.beginTransaction(TransactionContextProxy.java:78)
    at com.acc.test.hazelcast.main.GettingStartedClient.testTransactionalPuts(GettingStartedClient.java:80)
    at com.acc.test.hazelcast.main.GettingStartedClient.main(GettingStartedClient.java:31)
Caused by: java.io.IOException: Packet not send to [hc4.dev.xxx.com]:5701
    at com.hazelcast.client.spi.impl.ClientInvocationServiceSupport.send(ClientInvocationServiceSupport.java:121)
    at com.hazelcast.client.spi.impl.ClientSmartInvocationServiceImpl.invokeOnConnection(ClientSmartInvocationServiceImpl.java:99)
    at com.hazelcast.client.spi.impl.ClientInvocation.invokeOnSelection(ClientInvocation.java:140)
    at com.hazelcast.client.spi.impl.ClientInvocation.invoke(ClientInvocation.java:122)
    at com.hazelcast.client.proxy.txn.TransactionProxy.invoke(TransactionProxy.java:148)
    at com.hazelcast.client.proxy.txn.TransactionProxy.begin(TransactionProxy.java:83)
    at com.hazelcast.client.proxy.txn.TransactionContextProxy.beginTransaction(TransactionContextProxy.java:78)
    at com.acc.test.hazelcast.main.GettingStartedClient.testTransactionalPuts(GettingStartedClient.java:80)
    at com.acc.test.hazelcast.main.GettingStartedClient.main(GettingStartedClient.java:31)
    at ------ submitted from ------.(Unknown Source)
    at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolveAndThrow(ClientInvocationFuture.java:74)
    at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolveAndThrow(ClientInvocationFuture.java:30)
    at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:150)
    at com.hazelcast.client.proxy.txn.TransactionProxy.invoke(TransactionProxy.java:149)
    ... 4 more

My question is, would this IOException be handled by the redoOperation?

@mustafaiman

This comment has been minimized.

Member

mustafaiman commented Aug 5, 2016

Hello @juanavelez
Client side transactions are bounded to a single server connection to provide atomicity of the transaction. When client loses connection to that server, the transaction cannot be retried. Therefore, this exception will not be handled by redoOperation.

@juanavelez

This comment has been minimized.

juanavelez commented Aug 22, 2016

Hello @mustafaiman. Thanks for you reply. So what is the best approach to handling transactions from a Client perspective? In this case, what would the approach be? As I stated, I am already handling Transaction Exceptions/Retryable Exceptions. But if I get an IO Exception (as you can see there is no mention on transaction within the stack of the exception itself, it is just plain IO) what are my options?

@juanavelez

This comment has been minimized.

juanavelez commented Feb 13, 2017

Any updates on this?

@sancar sancar added this to the 3.8.1 milestone Feb 14, 2017

@sancar

This comment has been minimized.

Member

sancar commented Mar 14, 2017

Hi @juanavelez , sorry for late response.
You will need to do something like this:

   TransactionContext txCtx = client.newTransactionContext();

    for (int i = 0; i < maxTransactions; i++) {
        try{
               txCtx.beginTransaction();

                try {
                   .....
                  txCtx.commitTransaction();
           
                  } catch (TransactionException | RetryableHazelcastException e) {
                      .....  
                        txCtx.rollbackTransaction();
                  } 
        }catch(HazelcastException e){
                if(e.getCause() instanceof IOException){
                      txCtx = client.newTransactionContext();       <<== get a new transactionContext in case of IOException
               }
        }
       
    }

Currently, When beginTransaction throws an IOException that means you can not use that TransactionContext anymore. TransactionContext is bound to a random connection when it first created. If connection is closed you can not use it anymore.

You can get HazelcastException(IOException) from any of the operations inside try block.
Difference with TransactionException is that

  1. You don't need to do rollback
    Transaction will be rolled back at the remote server automatically when connection is closed.
  2. Also you can't do rollback.
    Since connection is not there anymore, you can't tell server to rollback.
    Calling txCtx.rollbackTransaction() is harmless and does not throw any exception(do a logging at the moment).

If you have further questions, don't hesitate to post another question here.

@juanavelez

This comment has been minimized.

juanavelez commented Mar 14, 2017

@sancar Thank you for your advice!

@juanavelez

This comment has been minimized.

juanavelez commented Mar 14, 2017

@sancar So I tried your suggestion and it is a mixed outcome: When the network is split, as you indicated the HazelcastException is caught and the checks for it being an IOException succeed and I can retry successfully after recreating the Transaction Context. However, when the network is fixed and the other nodes can again be seen, some of my clients get "HazelcastNotActiveException" when executing any of the transaction-related operation.

@sancar

This comment has been minimized.

Member

sancar commented Mar 15, 2017

@juanavelez . If you share full stacktrace/message of the exception, I will try to understand why you got that. Is it HazelcastClientNotActiveException or HazelcastInstanceNotActiveException ?

@juanavelez

This comment has been minimized.

juanavelez commented Mar 15, 2017

@sancar I indeed wrote the wrong exception name and the correct one is HazelcastInstanceNotActiveException. I am using Hazelcast Simulator (0.10-SNAPSHOT) to perform my tests. I have 4 VMs, and each VM runs both a Hazelcast Server and a Client. The test itself is executed on the Client Workers. I mimic the network being sever (Servers 1 and 2 can't see Servers 3 and 4 and vice-versa using iptables) after ~90-100 seconds of the test starting. After another 100 seconds I restore the network and it takes around 30-60 seconds for the original cluster to form back(and merge) and that's around the time I get the exception. This is the stack trace I get:

com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not ready yet!
	at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.initializeAndProcessMessage(AbstractMessageTask.java:122)
	at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.run(AbstractMessageTask.java:108)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
	at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)
	at ------ submitted from ------.(Unknown Source)
	at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolveAndThrow(ClientInvocationFuture.java:74)
	at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolveAndThrow(ClientInvocationFuture.java:30)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:158)
	at com.hazelcast.client.proxy.txn.ClientTxnProxy.invoke(ClientTxnProxy.java:49)
	at com.hazelcast.client.proxy.txn.ClientTxnMapProxy.get(ClientTxnMapProxy.java:81)
	at net.intermedia.hz.tests.MapTransactionReadWriteTest.get(MapTransactionReadWriteTest.java:147)
	at MapTransactionReadWriteTestRunnerMapTransactionReadWriteTest.timeStepLoop(MapTransactionReadWriteTestRunnerMapTransactionReadWriteTest.java:40)
	at com.hazelcast.simulator.worker.testcontainer.TimeStepRunner.run(TimeStepRunner.java:84)
	at java.lang.Thread.run(Thread.java:745)
	at com.hazelcast.simulator.utils.ThreadSpawner$ReportExceptionThread.run(ThreadSpawner.java:181)```

Thank you, Juan

@sancar sancar modified the milestones: 3.8.2, 3.8.1 Mar 17, 2017

@sancar sancar reopened this Mar 17, 2017

@sancar

This comment has been minimized.

Member

sancar commented Mar 17, 2017

HazelcastInstanceNotActiveException normally retried by client.
"hazelcast.client.invocation.timeout.seconds" 120 SECONDS
After 120 seconds , it will throw the last exception it gets from invocation. txnMap.get in this case.

But in this case, it is not retried because transactions are bound to a connection.
https://github.com/hazelcast/hazelcast/blob/v3.8/hazelcast-client/src/main/java/com/hazelcast/client/spi/impl/ClientInvocation.java#L179
which seems wrong to me. HazelcastInstanceNotActiveException should have been retried in this case.

Our latest fix regarding retryable exceptions should fix this problem. When bound to single connection like transactions, only IOException is not retried.(which means connection is problematic). The other exceptions can be retried.
https://github.com/hazelcast/hazelcast/pull/9823/files#diff-9c81ad0eddfeb7a319a4dbc161d99801R174

Fixed with #9823

@sancar sancar closed this Mar 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment