Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cache] Infinite loop trying to initialize Cache configuration when deserialization fails #10728

Closed
lprimak opened this issue Jun 12, 2017 · 11 comments

Comments

@lprimak
Copy link
Contributor

@lprimak lprimak commented Jun 12, 2017

In our case, there may be an application that's joining Hazelcast cluster that doesn't have all classes available to the JCache configuration (i.e. key type and value type)
In this case, trying to deserialize CacheConfig causes ClassNotFoundException.

The expected behavior is that the cache configuration fails, but what's happening that the other node re-transmits cache configuration infinitely, resulting in an infinite loop:

(the exception keeps repeating infinitely with the same stack trace)

[2017-06-11T20:00:58.700-0500] [] [INFO] [] [com.hazelcast.system] [tid: _ThreadID=1 _ThreadName=main] [timeMillis: 1497229258700] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] Copyright (c) 2008-2017, Hazelcast, Inc. All Rights Reserved.

[2017-06-11T20:00:58.700-0500] [] [INFO] [] [com.hazelcast.system] [tid: _ThreadID=1 _ThreadName=main] [timeMillis: 1497229258700] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] Configured Hazelcast Serialization version: 1

[2017-06-11T20:00:59.109-0500] [] [INFO] [] [com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator] [tid: _ThreadID=1 _ThreadName=main] [timeMillis: 1497229259109] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] Backpressure is disabled

[2017-06-11T20:00:59.955-0500] [] [INFO] [] [com.hazelcast.instance.Node] [tid: _ThreadID=1 _ThreadName=main] [timeMillis: 1497229259955] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] Creating MulticastJoiner

[2017-06-11T20:01:00.178-0500] [] [INFO] [] [com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl] [tid: _ThreadID=1 _ThreadName=main] [timeMillis: 1497229260178] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] Starting 4 partition threads

[2017-06-11T20:01:00.179-0500] [] [INFO] [] [com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl] [tid: _ThreadID=1 _ThreadName=main] [timeMillis: 1497229260179] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] Starting 3 generic threads (1 dedicated for priority tasks)

[2017-06-11T20:01:00.190-0500] [] [INFO] [] [com.hazelcast.core.LifecycleService] [tid: _ThreadID=1 _ThreadName=main] [timeMillis: 1497229260190] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] [100.74.17.133]:5901 is STARTING

[2017-06-11T20:01:00.340-0500] [] [INFO] [] [com.hazelcast.internal.cluster.impl.MulticastJoiner] [tid: _ThreadID=1 _ThreadName=main] [timeMillis: 1497229260340] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] Trying to join to discovered node: [100.74.17.133]:5900

[2017-06-11T20:01:00.451-0500] [] [INFO] [] [com.hazelcast.nio.tcp.TcpIpConnector] [tid: _ThreadID=73 _ThreadName=hz._hzInstance_1_development.cached.thread-3] [timeMillis: 1497229260451] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] Connecting to /100.74.17.133:5900, timeout: 0, bind-any: true

[2017-06-11T20:01:00.455-0500] [] [INFO] [] [com.hazelcast.nio.tcp.TcpIpConnectionManager] [tid: _ThreadID=73 _ThreadName=hz._hzInstance_1_development.cached.thread-3] [timeMillis: 1497229260455] [levelValue: 800] [100.74.17.133]:5901 [development] [3.9-SNAPSHOT] Established socket connection between /100.74.17.133:51741 and /100.74.17.133:5900

[2017-06-11T20:01:06.414-0500] [] [SEVERE] [] [com.hazelcast.config.Config] [tid: _ThreadID=57 _ThreadName=hz._hzInstance_1_development.generic-operation.thread-0] [timeMillis: 1497229266414] [levelValue: 1000] [[
com.hazelcast.nio.serialization.HazelcastSerializationException: java.lang.ClassNotFoundException: bg.jug.guestbook.entities.User
	at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$ClassSerializer.read(JavaDefaultSerializers.java:181)
	at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$ClassSerializer.read(JavaDefaultSerializers.java:169)
	at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:48)
	at com.hazelcast.internal.serialization.impl.AbstractSerializationService.readObject(AbstractSerializationService.java:265)
	at com.hazelcast.internal.serialization.impl.ByteArrayObjectDataInput.readObject(ByteArrayObjectDataInput.java:601)
	at com.hazelcast.config.CacheConfig.readData(CacheConfig.java:600)
	at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.readInternal(DataSerializableSerializer.java:158)
	at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.read(DataSerializableSerializer.java:105)
	at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.read(DataSerializableSerializer.java:50)
	at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:48)
	at com.hazelcast.internal.serialization.impl.AbstractSerializationService.readObject(AbstractSerializationService.java:265)
	at com.hazelcast.internal.serialization.impl.ByteArrayObjectDataInput.readObject(ByteArrayObjectDataInput.java:601)
	at com.hazelcast.cache.impl.operation.PostJoinCacheOperation.readInternal(PostJoinCacheOperation.java:88)
	at com.hazelcast.spi.Operation.readData(Operation.java:594)
	at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.readInternal(DataSerializableSerializer.java:158)
	at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.read(DataSerializableSerializer.java:105)
	at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.read(DataSerializableSerializer.java:50)
	at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:48)
	at com.hazelcast.internal.serialization.impl.AbstractSerializationService.readObject(AbstractSerializationService.java:265)
	at com.hazelcast.internal.serialization.impl.ByteArrayObjectDataInput.readObject(ByteArrayObjectDataInput.java:601)
	at com.hazelcast.internal.cluster.impl.operations.PostJoinOp.readInternal(PostJoinOp.java:124)
	at com.hazelcast.spi.Operation.readData(Operation.java:594)
	at com.hazelcast.internal.cluster.impl.operations.FinalizeJoinOp.readInternalImpl(FinalizeJoinOp.java:156)
	at com.hazelcast.internal.cluster.impl.operations.VersionedClusterOperation.readInternal(VersionedClusterOperation.java:59)
	at com.hazelcast.spi.Operation.readData(Operation.java:594)
	at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.readInternal(DataSerializableSerializer.java:158)
	at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.read(DataSerializableSerializer.java:105)
	at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.read(DataSerializableSerializer.java:50)
	at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:48)
	at com.hazelcast.internal.serialization.impl.AbstractSerializationService.toObject(AbstractSerializationService.java:183)
	at com.hazelcast.spi.impl.NodeEngineImpl.toObject(NodeEngineImpl.java:351)
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:394)
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:115)
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:100)
Caused by: java.lang.ClassNotFoundException: bg.jug.guestbook.entities.User
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at com.sun.enterprise.loader.CurrentBeforeParentClassLoader.loadClass(CurrentBeforeParentClassLoader.java:83)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at com.hazelcast.nio.ClassLoaderUtil.tryLoadClass(ClassLoaderUtil.java:173)
	at com.hazelcast.nio.ClassLoaderUtil.loadClass(ClassLoaderUtil.java:147)
	at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$ClassSerializer.read(JavaDefaultSerializers.java:179)
	... 33 more
]]
@vbekiaris
Copy link
Contributor

@vbekiaris vbekiaris commented Jun 12, 2017

@lprimak thanks for the report

@lprimak
Copy link
Contributor Author

@lprimak lprimak commented Jul 7, 2017

Something that may help here is not to transmit all the cache configurations when PostJoinOperation is being conducted, but at the first access of the cache, i.e. ICache/JCache.getCache() with a name that's not found.
This way if the class is not found it's the application's fault because it would already be loaded in order to call getCache() etc.

@mmedenjak mmedenjak changed the title Infinite loop trying to initialize Cache configuration when deserialization fails [cache] Infinite loop trying to initialize Cache configuration when deserialization fails Jul 11, 2017
@vbekiaris
Copy link
Contributor

@vbekiaris vbekiaris commented Oct 2, 2017

@lprimak thanks for the excellent issue report. A problem handling this issue is that the exception occurs while Hazelcast is still deserializing the FinalizeJoinOp operation that will make the dynamically added CacheConfigs known to the joining member. Since it's a generic deserialization failure, we cannot reliably recover deserialization and just skip the problematic CacheConfig including the unknown Class reference, thus at that point the choice is to fail-fast and forcefully terminate the joining member. Finer solutions (eg deferring class resolution or changing how Cache configs are distributed as per your suggestion) require compatibility-breaking changes which we are not allowed to do in a minor version release.

@lprimak
Copy link
Contributor Author

@lprimak lprimak commented Oct 3, 2017

Ok, this issue can wait until the next major version, please re-open it.
The problem does need to be solved at some point,
thank you @mmedenjak @vbekiaris

@lprimak
Copy link
Contributor Author

@lprimak lprimak commented Oct 10, 2017

Once again, please reopen this issue. Because it's not a major version release date yet doesn't mean this issue is invalid.

This issue is very much valid and the fail-fast fix doesn't really solve the problem.
Thank you!

@vbekiaris
Copy link
Contributor

@vbekiaris vbekiaris commented Oct 10, 2017

Hi @lprimak, here is an update on this issue: the wire format across members was updated in 3.9 (in #11513) so a typed CacheConfig with a non-resolvable Class as key or value type will not stop the new member from joining the cluster. Obviously not being able to resolve the key/value type will result in runtime failures later on, should one attempt to use the specific CacheConfig later on.

@lprimak
Copy link
Contributor Author

@lprimak lprimak commented Oct 11, 2017

@vbekiaris Interesting, certainly a big step forward.
Do you think it actually will resolve this issue correctly?
I have no way to tell unless I run some tests with Payara, but I think it has a good chance

@mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Oct 11, 2017

@lprimak you may try with the latest snapshot release as I see the last one contains the fix:
https://github.com/hazelcast/hazelcast#snapshot-releases

The JAR should contain the PreJoinCacheConfig class.

@lprimak
Copy link
Contributor Author

@lprimak lprimak commented Oct 11, 2017

I will do it as soon as I can.
I would need to compile my own snapshot because of TenantControl classes and afterRun() fix / enhancement PRs need to be merged

Thank you!

@lprimak
Copy link
Contributor Author

@lprimak lprimak commented Nov 29, 2017

Ok, I had a chance to test it and it looks great!

@vbekiaris
Copy link
Contributor

@vbekiaris vbekiaris commented Nov 29, 2017

@lprimak great , thanks for the heads up !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

3 participants
You can’t perform that action at this time.