Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IMap GetOperation invoked in 3.5 EA2-Snapshot produced IllegalStateException with wrong partition but not occurred in 3.4.2 #5341

Closed
bwzhang2011 opened this issue May 21, 2015 · 11 comments · Fixed by #5353
Labels
Source: Community PR or issue was opened by a community user Team: Core Type: Defect
Milestone

Comments

@bwzhang2011
Copy link

hzteam, while upgraded to the latest 3.5 EA2 snapshot, I got the exception stack trace as follows while in 3.4.2 is OK, here comes my stack traces and hope someone could help me locate the problem.

五月 21, 2015 4:29:05 下午 com.hazelcast.map.impl.operation.GetOperation
严重: [127.0.0.1]:5880 [dev] [3.5-EA2-SNAPSHOT] wrong partition, expected: 225 but found:157
java.lang.IllegalStateException: wrong partition, expected: 225 but found:157
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.ensureNoPartitionProblems(OperationRunnerImpl.java:232)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:127)
    at com.hazelcast.spi.impl.operationexecutor.classic.ClassicOperationExecutor.runOnCallingThread(ClassicOperationExecutor.java:376)
    at com.hazelcast.spi.impl.operationexecutor.classic.ClassicOperationExecutor.runOnCallingThreadIfPossible(ClassicOperationExecutor.java:340)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:247)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:231)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invokeInternal(Invocation.java:198)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:168)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationBuilderImpl.invoke(InvocationBuilderImpl.java:51)
    at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:431)
    at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:254)
    at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:82)
    at com.zjht.channel.server.cache.AbstractHazelcastCache.getObject(AbstractHazelcastCache.java:50)
    at org.apache.ibatis.cache.decorators.LoggingCache.getObject(LoggingCache.java:57)
    at org.apache.ibatis.cache.decorators.TransactionalCache.getObject(TransactionalCache.java:68)
    at org.apache.ibatis.cache.TransactionalCacheManager.getObject(TransactionalCacheManager.java:35)
    at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:94)
    at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:82)
    at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:49)
    at com.zjht.channel.server.interceptor.mybatis.AbstractMyBatisInterceptor.intercept(AbstractMyBatisInterceptor.java:53)
    at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
    at com.sun.proxy.$Proxy53.query(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:63)
    at com.sun.proxy.$Proxy53.query(Unknown Source)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:120)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:113)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:73)
    at sun.reflect.GeneratedMethodAccessor83.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:386)
    at com.sun.proxy.$Proxy25.selectOne(Unknown Source)
    at org.mybatis.spring.SqlSessionTemplate.selectOne(SqlSessionTemplate.java:165)
    at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:69)
    at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:53)
    at com.sun.proxy.$Proxy28.selectByPrimaryKey(Unknown Source)
    at com.zjht.channel.platform.service.process.entity.impl.ChannelPlatformEntityProcessImpl.getChannelAppParaValue(ChannelPlatformEntityProcessImpl.java:101)
    at com.zjht.channel.platform.service.cache.questore.DefaultChannelQueueStoreFactoryBean$1.getTransWorkDate(DefaultChannelQueueStoreFactoryBean.java:78)
    at com.zjht.channel.integration.service.store.impl.AbstractChannelQueueStoreImpl.loadAllKeys(AbstractChannelQueueStoreImpl.java:106)
    at com.hazelcast.collection.impl.queue.QueueStoreWrapper.loadAllKeys(QueueStoreWrapper.java:270)
    at com.hazelcast.collection.impl.queue.QueueContainer.init(QueueContainer.java:98)
    at com.hazelcast.collection.impl.queue.QueueService.getOrCreateContainer(QueueService.java:138)
    at com.hazelcast.collection.impl.queue.operations.QueueOperation.getOrCreateContainer(QueueOperation.java:69)
    at com.hazelcast.collection.impl.queue.operations.PollOperation.shouldWait(PollOperation.java:92)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.waitingNeeded(OperationRunnerImpl.java:161)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:133)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processOperation(OperationThread.java:154)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:110)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:101)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:76)
@bwzhang2011 bwzhang2011 changed the title IMap GetOperation invoked in 3.5 EA2-Snapshot produced IllegalStateException with partition but not occurred in 3.4.2 IMap GetOperation invoked in 3.5 EA2-Snapshot produced IllegalStateException with wrong partition but not occurred in 3.4.2 May 21, 2015
@bwzhang2011
Copy link
Author

@pveentjer, or someone else. I tried it many times this afternoon that it threw such exception shown above while hz3.4.2 would not. i didn't test with the E-A release but for the latest snapshot artifact. so
please help me locate whether there's proper changes e.s.p for the partition check or not.

@Donnerbart
Copy link
Contributor

I didn't have an extensive look at the stacktrace yet, but if we have to reproduce your problem at our side it would be great if you could give us some more information and maybe a code snippet :)

  • map key/value types
  • map key/value length (if String or Byte)
  • map key/value count (how many different keys/values)
  • cluster layout (members, clients)
  • from where did you execute the call, member or client?
  • if client: old or new one?
  • code snippet of the method call which creates this stacktrace

@bwzhang2011
Copy link
Author

@Donnerbart, thanks for view and advice. as compared to hz3.4.2, I did think that hz3.5 changed for the interval innovation from the stack trace. what's the more I will introduce the procedure as follows:

  1. I just set up the hz instance in local mode with any other client connecting and the client is the old one(I just use mvn clean install -DskipTests=true to build the community of hz3.5)
  2. the map key/value is the simple prototype in java and also with the standard configuration without any special definition.
  3. as the stack tracing for calling, here comes the info as follows:
    3.1 I wrote some queue processing related to the queue store
    3.2 the queue store will invoke the querying related to the current day
    3.3 the querying above will call the hazelcast-mybatis integration for datbase querying
  4. the code run well both in 3.4.2 or 3.4.3 snapshot as usual, but in master branch it showed us
    OperationRunnerImpl.ensureNoPartitionProblems
    I don't know why it would induce ensureNoPartitionProblems.and would you mind leaving me more message for such comment.

@pveentjer pveentjer modified the milestones: 3.6, 3.5 May 22, 2015
@pveentjer
Copy link
Contributor

I think something fishy is going on when within an operation another invocation is being done. This should not be a problem for the same partition, but is not allowed for a different one due to deadlocks.

Apparently it isn't detected correctly and therefore this illegal call is allowed and only when executed you illegal call is detected. The top call is a queue PollOperation and this triggers an internal invocation map.get invocation.

I'll have a closer look why it isn't immediately detected.

@pveentjer
Copy link
Contributor

I think I found the cause:

We have a method that checks if an invocation can be done from the calling thread and the operation executed as well from the calling thread.

  @Override
    public boolean isInvocationAllowedFromCurrentThread(Operation op, boolean isAsync) {
          .... stuff

        // we are only allowed to invoke from a PartitionOperationThread if the operation belongs to that
        // PartitionOperationThread.
        PartitionOperationThread partitionThread = (PartitionOperationThread) currentThread;
        return toPartitionThreadIndex(op.getPartitionId()) == partitionThread.threadId;
    }

In this case we check if the calling thread is owner of that partition and apparently 225 and 157 are mapped to the same thread.

However.. instead of offloading the inner operation to the correct OperationRunner (every partition has its own), I guess we are using the OperationRunner of the outer operation and there we have a hard check that the partitions need to be the same.

So there is a discrepancy

@pveentjer
Copy link
Contributor

I managed to reproduce it:

 @Test
    public void test() throws ExecutionException, InterruptedException {
        Config config = new Config();
        config.setProperty(GroupProperties.PROP_PARTITION_COUNT, "2");
        config.setProperty(GroupProperties.PROP_PARTITION_OPERATION_THREAD_COUNT, "1");
        HazelcastInstance hz = createHazelcastInstance(config);
        final OperationService operationService = getOperationService(hz);

        int innerPartitionId = 0;
        int outerPartitionId = 1;

        final Operation innerOperation = new AbstractOperation() {
            @Override
            public void run() throws Exception {
            }
        };
        innerOperation.setPartitionId(innerPartitionId);

        Operation outerOperation = new AbstractOperation() {
            @Override
            public void run() throws Exception {
               Future f =  operationService.invokeOnPartition(null, innerOperation, innerOperation.getPartitionId());
                f.get();
            }
        };
        outerOperation.setPartitionId(outerPartitionId);

        Future f = operationService.invokeOnPartition(null, outerOperation, outerOperation.getPartitionId());
        f.get();
    }
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: wrong partition, expected: 1 but found:0
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:357)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:204)
    at com.hazelcast.spi.impl.operationexecutor.classic.RunOnCallingThreadIfPossibleTest$2.run(RunOnCallingThreadIfPossibleTest.java:39)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:139)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processOperation(OperationThread.java:154)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:110)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:101)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:76)
Caused by: java.lang.IllegalStateException: wrong partition, expected: 1 but found:0
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.ensureNoPartitionProblems(OperationRunnerImpl.java:234)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:129)
    at com.hazelcast.spi.impl.operationexecutor.classic.ClassicOperationExecutor.runOnCallingThread(ClassicOperationExecutor.java:376)
    at com.hazelcast.spi.impl.operationexecutor.classic.ClassicOperationExecutor.runOnCallingThreadIfPossible(ClassicOperationExecutor.java:340)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:247)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:231)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invokeInternal(Invocation.java:198)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:168)
    at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnPartition(OperationServiceImpl.java:271)
    at com.hazelcast.spi.impl.operationexecutor.classic.RunOnCallingThreadIfPossibleTest$2.run(RunOnCallingThreadIfPossibleTest.java:38)
    ... 5 more

@bwzhang2011
Copy link
Author

@pveentjer, thanks for view and accurate location and for producing such. I wondered that why it's not occurred in 3.4.x does it changed a lot since 3.5 or else if I used it wrong I will make some modification.

@pveentjer
Copy link
Contributor

The 3.5 release has a OperationRunnerImpl for every partition; something not available in the 3.4 release.

@bwzhang2011
Copy link
Author

@pveentjer, thanks for feed back and hope it could be fixed as expected.

pveentjer added a commit to pveentjer/hazelcast that referenced this issue May 22, 2015
pveentjer added a commit to pveentjer/hazelcast that referenced this issue May 22, 2015
The problem was that the system didn't deal correctly with an inner call for 2 different partitions, but mapped to the same thread. The error was that the OperationRunnerImpl for the inner operation was not obtained using
the partition id of the inner operation, but by accessing the OperationThread.currentOperationRunner. So the inner operation would run on the outer OperationRunner and then you get the exception.# Also tests have been
added for local and remote calls where this behaviour is tested. 1 change is ClassicOperationScheduler and 6 changes are in the tests.
@pveentjer
Copy link
Contributor

Fixed:

#5353

@bwzhang2011
Copy link
Author

@pveentjer, thank for quick fix and hope it will be merged into master quickly.

jerrinot pushed a commit to jerrinot/hazelcast that referenced this issue May 22, 2015
The problem was that the system didn't deal correctly with an inner call for 2 different partitions, but mapped to the same thread. The error was that the OperationRunnerImpl for the inner operation was not obtained using
the partition id of the inner operation, but by accessing the OperationThread.currentOperationRunner. So the inner operation would run on the outer OperationRunner and then you get the exception.# Also tests have been
added for local and remote calls where this behaviour is tested. 1 change is ClassicOperationScheduler and 6 changes are in the tests.
sancar pushed a commit to sancar/hazelcast that referenced this issue May 27, 2015
The problem was that the system didn't deal correctly with an inner call for 2 different partitions, but mapped to the same thread. The error was that the OperationRunnerImpl for the inner operation was not obtained using
the partition id of the inner operation, but by accessing the OperationThread.currentOperationRunner. So the inner operation would run on the outer OperationRunner and then you get the exception.# Also tests have been
added for local and remote calls where this behaviour is tested. 1 change is ClassicOperationScheduler and 6 changes are in the tests.
@mmedenjak mmedenjak added the Source: Community PR or issue was opened by a community user label Aug 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Source: Community PR or issue was opened by a community user Team: Core Type: Defect
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants