Regression: cacheManager.getCache("defined-in-xml") can rarely return null #7208

Closed
Danny-Hazelcast opened this issue Dec 29, 2015 · 18 comments

@Danny-Hazelcast
Member

commented Dec 29, 2015

I think we have a regression in

cacheManager.getCache("defined-in-xml");

It looks like it sometimes returns null.

I think it's very rare. I saw it happen when starting the first round of Simulator tests for 3.6-EA3,

and again just now while running some other JCache tests.
The latest one was:

http://54.87.52.100/~jenkins/workspace/hd-jcache-nearCache/3.6-SNAPSHOT/2015_12_29-17_10_41/output/2015_12_29-17_10_59/failures-2015-12-29__17_11_00.txt

Failure[
   message='Worked ran into an unhandled exception'
   type=WORKER_EXCEPTION
   timestamp=1451401890070
   workerAddress=C_A1_W5
   agentAddress=52.90.126.123
   hzAddress=client:52.90.126.123
   workerId=worker-52.90.126.123-5-client
   test=TestCase{
        id=icache-sim,
        class=com.hazelcast.enterprise.tests.jcache.JcacheSim,
        asyncWait=50,
        cacheBaseNamesStr=nat-NoNear,
        cachesPerName=4,
        getAsyncThreds=4,
        getkeyDomainMax=1000000,
        loadCachePhase=true,
        loadkeyDomainMax=1000000,
        multiCacheHitCount=3,
        putAsyncThreds=16,
        putkeyDomainMax=1500000
    }
   cause=java.lang.NullPointerException
    at com.hazelcast.enterprise.tests.jcache.JcacheSim.validateCaceContent(JcacheSim.java:132)
    at com.hazelcast.enterprise.tests.jcache.JcacheSim.setup(JcacheSim.java:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.hazelcast.simulator.utils.ReflectionUtils.invokeMethod(ReflectionUtils.java:105)
    at com.hazelcast.simulator.worker.TestContainer.invoke(TestContainer.java:157)
    at com.hazelcast.simulator.protocol.processors.TestOperationProcessor$1.doRun(TestOperationProcessor.java:115)
    at com.hazelcast.simulator.protocol.processors.TestOperationProcessor$OperationThread.run(TestOperationProcessor.java:182)
]

https://github.com/hazelcast/hazelcast-ee-simulator-tests/blob/master/ee-tests/src/main/java/com/hazelcast/enterprise/tests/jcache/JcacheSim.java#L132

I have run this test many times, and at random we hit the NPE when using the cache.

From 3.6-EA3:

Failure[
   message='Worked ran into an unhandled exception'
   type=WORKER_EXCEPTION
   timestamp=1450455278435
   workerAddress=C_A1_W4
   agentAddress=54.165.157.166
   hzAddress=client:54.165.157.166
   workerId=worker-54.165.157.166-4-client
   test=TestCase{
        id=icacheMaxSmall,
        class=com.hazelcast.simulator.tests.icache.EvictionICacheTest,
        basename=maxCachSmall1
    }
   cause=java.lang.NullPointerException
    at com.hazelcast.simulator.tests.icache.EvictionICacheTest.setup(EvictionICacheTest.java:95)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.hazelcast.simulator.utils.ReflectionUtils.invokeMethod(ReflectionUtils.java:105)
    at com.hazelcast.simulator.worker.TestContainer.invoke(TestContainer.java:157)
    at com.hazelcast.simulator.protocol.processors.TestOperationProcessor$1.doRun(TestOperationProcessor.java:115)
    at com.hazelcast.simulator.protocol.processors.TestOperationProcessor$OperationThread.run(TestOperationProcessor.java:182)
]

https://github.com/hazelcast/hazelcast-simulator/blob/80b8b1c6a43f5b9900551408da8bce20a9076ec3/tests/src/main/java/com/hazelcast/simulator/tests/icache/EvictionICacheTest.java#L95

@serkan-ozal

Contributor

commented Dec 29, 2015

I have seen "java.lang.IllegalStateException: Cache operations can not be performed. The cache closed" exceptions after this log line: INFO 2015-12-29 15:11:30,823 [WorkerShutdownThread] com.hazelcast.core.LifecycleService: HazelcastClient[hz.client_0_workers][3.6-SNAPSHOT] is SHUTTING_DOWN

The cache is closed because HazelcastClientCacheManager registers itself with the LifecycleService and closes itself and the caches it owns when the SHUTTING_DOWN event is received.

@serkan-ozal

Contributor

commented Dec 29, 2015

Also, I can see the following log lines:

INFO  2015-12-29 15:11:30,742 [nioEventLoopGroup-3-1] com.hazelcast.simulator.protocol.processors.TestOperationProcessor: --------------------------- Skipping run of icache-sim (member is passive) ---------------------------
INFO  2015-12-29 15:11:30,763 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
INFO  2015-12-29 15:11:30,763 [WorkerShutdownThread] com.hazelcast.core.LifecycleService: [10.0.0.38]:5701 [workers] [3.6-SNAPSHOT] Address[10.0.0.38]:5701 is SHUTTING_DOWN

in the logs at http://54.87.52.100/~jenkins/workspace/hd-jcache-nearCache/3.6-SNAPSHOT/2015_12_29-17_10_41/output/2015_12_29-17_10_59/2015-12-29__17_11_00/worker-52.90.223.101-1-member/worker.log

and the time is very close to the client's setup phase, where the caches are obtained.

@Danny-Hazelcast

Member Author

commented Dec 30, 2015

@Donnerbart @serkan-ozal has proposed that this issue could well be related to Simulator rather than to Hazelcast, which I first suspected.

@Danny-Hazelcast

Member Author

commented Dec 30, 2015

I have seen another occurrence of this null cache issue while running tests for the lease lock hang.
I ran 25 runs of 30 minutes each, using the normal Simulator test.properties test set, and 1 of the 25 hit this issue.

Failure[
   message='Worked ran into an unhandled exception'
   type=WORKER_EXCEPTION
   timestamp=1451433673433
   workerAddress=C_A14_W4
   agentAddress=54.175.249.6
   hzAddress=client:54.175.249.6
   workerId=worker-54.175.249.6-4-client
   test=TestCase{
        id=iCacheCas,
        class=com.hazelcast.simulator.tests.icache.CasICacheTest,
        basename=iCacheCas,
        keyCount=1000,
        threadCount=3
    }
   cause=java.lang.NullPointerException
    at com.hazelcast.simulator.tests.icache.CasICacheTest$Worker.timeStep(CasICacheTest.java:112)
    at com.hazelcast.simulator.worker.tasks.AbstractMonotonicWorker.run(AbstractMonotonicWorker.java:35)
    at java.lang.Thread.run(Thread.java:745)
    at com.hazelcast.simulator.utils.ThreadSpawner$ReportExceptionThread.run(ThreadSpawner.java:183)
]

https://github.com/hazelcast/hazelcast-simulator/blob/80b8b1c6a43f5b9900551408da8bce20a9076ec3/tests/src/main/java/com/hazelcast/simulator/tests/icache/CasICacheTest.java#L112

@serkan-ozal

Contributor

commented Jan 5, 2016

I don't know whether this is expected behaviour, but here are my findings about Simulator for this issue:

It seems that members are passive by default: https://github.com/hazelcast/hazelcast-simulator/blob/25870c8a63cc7a49937c8797d0d977b92994d162/simulator/src/main/java/com/hazelcast/simulator/coordinator/CoordinatorParameters.java#L51

The "RUN phase completed" operation is sent here: https://github.com/hazelcast/hazelcast-simulator/blob/25870c8a63cc7a49937c8797d0d977b92994d162/simulator/src/main/java/com/hazelcast/simulator/protocol/processors/TestOperationProcessor.java#L138

Then the member instance is stopped, as seen in this log message: INFO 2015-12-29 15:11:30,763 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...

@Donnerbart Donnerbart assigned Donnerbart and unassigned serkan-ozal Jan 7, 2016

@Donnerbart

Contributor

commented Jan 8, 2016

Passive members just skip the run phase, no other phases. The phases between all workers are synchronized for each test, so the members won't just rush through if the clients are still in the run phase. The only exception is if there was a critical test failure, then we abort the testsuite to fail fast.

Since there was an NPE in the setup phase, the testsuite got aborted:

INFO  2015-12-29 17:11:28,689 [Thread-1] com.hazelcast.simulator.coordinator.TestCaseRunner: icache-sim Starting Test setup
ERROR 2015-12-29 17:11:30,082 [nioEventLoopGroup-2-3] com.hazelcast.simulator.coordinator.FailureContainer: Failure #1 C_A1_W5 icache-sim WORKER_EXCEPTION[java.lang.NullPointerException]
INFO  2015-12-29 17:11:30,722 [Thread-1] com.hazelcast.simulator.coordinator.TestCaseRunner: icache-sim Waiting for setup completion aborted (critical failure)
INFO  2015-12-29 17:11:30,725 [Thread-1] com.hazelcast.simulator.coordinator.TestCaseRunner: icache-sim Completed Test setup
INFO  2015-12-29 17:11:30,725 [Thread-1] com.hazelcast.simulator.coordinator.TestCaseRunner: Completed TestPhase setup
INFO  2015-12-29 17:11:30,728 [Thread-1] com.hazelcast.simulator.coordinator.TestCaseRunner: icache-sim Skipping Test local warmup (critical failure)
INFO  2015-12-29 17:11:30,731 [Thread-1] com.hazelcast.simulator.coordinator.TestCaseRunner: icache-sim Skipping Test global warmup (critical failure)
INFO  2015-12-29 17:11:30,734 [Thread-1] com.hazelcast.simulator.coordinator.TestCaseRunner: icache-sim Starting Test start (passive members)

The following IllegalStateExceptions are very annoying, but they should not be the cause of the issue:

ERROR 2015-12-29 17:11:31,055 [nioEventLoopGroup-2-2] com.hazelcast.simulator.coordinator.FailureContainer: Failure #2 C_A3_W5 WORKER_EXCEPTION[java.lang.IllegalStateException: Tried to start RUN for test icache-sim, but SETUP is still running!]

It could still be a Simulator issue, but it shouldn't be, since there is phase synchronization. Also the abort makes sense after the first NPE in the setup phase. I'll check this further today.

@Donnerbart

Contributor

commented Jan 8, 2016

Also, the Hazelcast instance is shut down when the whole worker is shut down, not after the RUN phase (we still need Hazelcast in the VERIFY and TEARDOWN phases). Of course, on a testsuite abort the workers are shut down quite soon after the SETUP phase has started, so the timestamps are expected to be close.

@Donnerbart

Contributor

commented Jan 8, 2016

I just checked the timings from the logs Serkan analysed before, and they look as they should.

worker-52.90.126.123-5-client/worker.log:INFO  2015-12-29 15:11:28,688 [nioEventLoopGroup-3-1] com.hazelcast.simulator.protocol.processors.OperationProcessor: [C] icache-sim Starting Test setup
worker-52.90.126.123-5-client/worker.log:ERROR 2015-12-29 15:11:30,070 [WorkerJvmFailureMonitorThread] com.hazelcast.simulator.agent.workerjvm.WorkerJvmFailureMonitor: Detected failure on Worker worker-52.90.126.123-5-client: Failure #1 C_A1_W5 icache-sim WORKER_EXCEPTION[java.lang.NullPointerException]

worker-52.90.139.156-6-client/worker.log:INFO  2015-12-29 15:11:30,757 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.139.156-2-client/worker.log:INFO  2015-12-29 15:11:30,758 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.139.156-5-client/worker.log:INFO  2015-12-29 15:11:30,758 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.139.156-7-client/worker.log:INFO  2015-12-29 15:11:30,759 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.223.101-5-client/worker.log:INFO  2015-12-29 15:11:30,759 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.158.172-5-client/worker.log:INFO  2015-12-29 15:11:30,759 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.13.80-6-client/worker.log:INFO  2015-12-29 15:11:30,760 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.139.156-4-client/worker.log:INFO  2015-12-29 15:11:30,760 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.223.101-3-client/worker.log:INFO  2015-12-29 15:11:30,760 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.223.101-6-client/worker.log:INFO  2015-12-29 15:11:30,760 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.158.172-6-client/worker.log:INFO  2015-12-29 15:11:30,760 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.223.101-4-client/worker.log:INFO  2015-12-29 15:11:30,761 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.139.156-1-member/worker.log:INFO  2015-12-29 15:11:30,761 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.139.156-3-client/worker.log:INFO  2015-12-29 15:11:30,761 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.158.172-3-client/worker.log:INFO  2015-12-29 15:11:30,761 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.13.80-2-client/worker.log:INFO  2015-12-29 15:11:30,761 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.13.80-3-client/worker.log:INFO  2015-12-29 15:11:30,762 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.223.101-2-client/worker.log:INFO  2015-12-29 15:11:30,762 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.223.101-7-client/worker.log:INFO  2015-12-29 15:11:30,762 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.158.172-1-member/worker.log:INFO  2015-12-29 15:11:30,762 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.158.172-2-client/worker.log:INFO  2015-12-29 15:11:30,762 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.223.101-1-member/worker.log:INFO  2015-12-29 15:11:30,763 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.13.80-1-member/worker.log:INFO  2015-12-29 15:11:30,763 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.13.80-4-client/worker.log:INFO  2015-12-29 15:11:30,763 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.126.123-2-client/worker.log:INFO  2015-12-29 15:11:30,764 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.158.172-4-client/worker.log:INFO  2015-12-29 15:11:30,764 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.158.172-7-client/worker.log:INFO  2015-12-29 15:11:30,764 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.13.80-5-client/worker.log:INFO  2015-12-29 15:11:30,764 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.91.13.80-7-client/worker.log:INFO  2015-12-29 15:11:30,765 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.126.123-6-client/worker.log:INFO  2015-12-29 15:11:30,765 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.126.123-7-client/worker.log:INFO  2015-12-29 15:11:30,766 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.126.123-3-client/worker.log:INFO  2015-12-29 15:11:30,767 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.126.123-4-client/worker.log:INFO  2015-12-29 15:11:30,767 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.126.123-1-member/worker.log:INFO  2015-12-29 15:11:30,780 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...
worker-52.90.126.123-5-client/worker.log:INFO  2015-12-29 15:11:30,796 [WorkerShutdownThread] com.hazelcast.simulator.worker.MemberWorker: Stopping HazelcastInstance...

The first event seems to be the NPE, then we abort the testsuite (which causes those IllegalStateExceptions), then the instances are shut down. The workers are not shut down in a particular order yet, so some clients lost the connection because the members went down first.

@serkan-ozal

Contributor

commented Jan 8, 2016

Thanks for the investigation, David. I am taking this issue back.

@serkan-ozal serkan-ozal assigned serkan-ozal and unassigned Donnerbart Jan 8, 2016

@serkan-ozal

Contributor

commented Jan 8, 2016

I think I have found the issue.

The problem is that the cache config can be requested in parallel by different clients. If the cache config does not exist, the service tries to load it from the XML config. In the current implementation this logic is racy: only one of the requests can put the cache config into the configs held by the cache service, and only that request sets the response to the config. The others don't set the response, so they return null. A null response means the cache config couldn't be found, so getCache returns null. It doesn't throw an exception, in line with the JCache spec (@return the Cache or null if it does exist or can't be pre-configured).
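To make the race above concrete, here is a minimal sketch in plain Java. It is not the Hazelcast code: the class CacheConfigLookup, its methods, the String-typed "config", and the injected xmlLoader are all hypothetical stand-ins for the cache service, its config map, and the XML lookup. It only contrasts a lookup that answers solely when it wins putIfAbsent with one that falls back to the value stored by the winner.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class CacheConfigLookup {
    // Hypothetical stand-in for the configs held by the cache service.
    private final ConcurrentMap<String, String> configs = new ConcurrentHashMap<>();
    // Hypothetical stand-in for loading a cache definition from the XML config.
    private final Function<String, String> xmlLoader;

    public CacheConfigLookup(Function<String, String> xmlLoader) {
        this.xmlLoader = xmlLoader;
    }

    // Stores a config unless one is already present (what a concurrent request would do).
    public void register(String name, String config) {
        configs.putIfAbsent(name, config);
    }

    // Racy variant: only the caller that wins putIfAbsent sets the response.
    public String findConfigRacy(String name) {
        String response = configs.get(name);
        if (response == null) {
            String loaded = xmlLoader.apply(name);
            if (configs.putIfAbsent(name, loaded) == null) {
                response = loaded;
            }
            // A caller that loses the race falls through with response == null.
        }
        return response;
    }

    // Safe variant: a caller that loses the race returns the winner's config.
    public String findConfigSafe(String name) {
        String response = configs.get(name);
        if (response == null) {
            String loaded = xmlLoader.apply(name);
            String existing = configs.putIfAbsent(name, loaded);
            response = existing != null ? existing : loaded;
        }
        return response;
    }
}
```

In the racy variant, a request that finds no config but then loses putIfAbsent to a parallel request returns null, which matches the symptom of getCache returning null. Whether the actual fix takes the same shape as findConfigSafe is an assumption.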

@Danny-Hazelcast

Member Author

commented Jan 8, 2016

Well, this was a rare issue and hard to reproduce, so I am happy to close it and reopen if we hit it again after the fix is merged.

@Danny-Hazelcast

Member Author

commented May 2, 2016

Reopening for a possible regression.

Simulator run:
http://54.87.52.100/~jenkins/workspace/Hazelcast-Simulator-nightly-enterprise/3.7-SNAPSHOT/2016_04_28-01_30_11/

Failure[
   message='Worked ran into an unhandled exception'
   type=WORKER_EXCEPTION
   timestamp=1461796397362
   workerAddress=C_A1_W1
   agentAddress=54.173.197.119
   hzAddress=10.0.0.132:5701
   workerId=worker-C_A1_W1-54.173.197.119-member
   test=null
   cause=com.hazelcast.cache.CacheNotExistsException: Cache is already destroyed or not created yet, on Member [10.0.0.135]:5701 this
    at com.hazelcast.cache.EnterpriseCacheService.createNewRecordStore(EnterpriseCacheService.java:168)
    at com.hazelcast.cache.impl.CachePartitionSegment.createNew(CachePartitionSegment.java:49)
    at com.hazelcast.cache.impl.CachePartitionSegment.createNew(CachePartitionSegment.java:35)
    at com.hazelcast.util.ConcurrencyUtil.getOrPutSynchronized(ConcurrencyUtil.java:40)
    at com.hazelcast.cache.impl.CachePartitionSegment.getOrCreateRecordStore(CachePartitionSegment.java:65)
    at com.hazelcast.cache.impl.AbstractCacheService.getOrCreateRecordStore(AbstractCacheService.java:254)
    at com.hazelcast.cache.hidensity.operation.AbstractHiDensityCacheOperation.beforeRun(AbstractHiDensityCacheOperation.java:90)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:172)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:399)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:117)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:102)
    at ------ submitted from ------.(Unknown Source)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:111)
    at com.hazelcast.spi.impl.AbstractInvocationFuture$1.run(AbstractInvocationFuture.java:246)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
    at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)
]

Logs of the run:
https://s3.amazonaws.com/dannyc/issues/icache-CacheNotExistsException.zip

@Danny-Hazelcast Danny-Hazelcast modified the milestones: 3.7, 3.6 May 2, 2016

@Danny-Hazelcast

Member Author

commented May 11, 2016

@serkan-ozal

We have one more instance of this issue. Full logs: https://s3.amazonaws.com/dannyc/issues/nullCache.zip

@serkan-ozal

Contributor

commented May 11, 2016

Good

@serkan-ozal

Contributor

commented May 12, 2016

Added info-level logging for tracking config add/remove calls: 382490c

Normally, cache config creation operations are sent to each node in the cluster, and the response from each node is awaited before the cache proxy is created and returned. So there should not be any node that lacks the config.

However, as far as I can see from the logs, somehow only one of the four members has the config and the others do not. Anyway, I have added some logging to verify that the config is not created synchronously on each node (or maybe is never created at all).
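As a rough illustration of the "send to each node, wait for every response, then return the proxy" flow described above, here is a small self-contained Java sketch. It does not use any Hazelcast API: the Node interface, ConfigBroadcast, createCacheProxy, and the String "proxy" are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ConfigBroadcast {
    /** Hypothetical node abstraction; each node acknowledges once it has stored the config. */
    public interface Node {
        CompletableFuture<Void> createConfig(String cacheName);
    }

    // Sends the config creation to every node and blocks until all of them have acknowledged.
    public static String createCacheProxy(List<Node> nodes, String cacheName) {
        List<CompletableFuture<Void>> acks = new ArrayList<>();
        for (Node node : nodes) {
            acks.add(node.createConfig(cacheName));
        }
        // Only hand out the proxy once every node has the config.
        CompletableFuture.allOf(acks.toArray(new CompletableFuture[0])).join();
        return "proxy:" + cacheName;
    }
}
```

If a proxy were handed out before the join completes, an operation routed to a node that has not yet stored the config could fail, which is the kind of window a missing-config error on some members would suggest.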

@serkan-ozal

Contributor

commented May 12, 2016

I think I have found a possible edge case in which an instance (HazelcastServerCacheManager) returns a cache proxy (because it has the config) even though the config does not yet exist on all the nodes. So, within a limited time window, a put through this cache proxy might cause a CacheNotExistsException.

@serkan-ozal

Contributor

commented May 13, 2016

Let's see whether it still fails after 948f165.

@serkan-ozal

Contributor

commented May 16, 2016

This should be fixed via 948f165. Closing now; we can reopen and investigate if we see this issue again.
