[cache] JCache fails to initialise when type of a key or value is not available on a remote node #8972

Closed
smillidge opened this Issue Sep 23, 2016 · 21 comments

smillidge commented Sep 23, 2016

In the scenario where you create a cache like this:

        cache = cacheManager.getCache("test",String.class, CustomPOJO.class);
        if (cache == null) {
            MutableConfiguration<String, CustomPOJO> cconfig = new MutableConfiguration<>();
            cache = cacheManager.createCache("test", cconfig);
        }

If a cluster member is started where CustomPOJO cannot be loaded by the context classloader of the hz._hzInstance_1_development.generic-operation.thread-3 thread, then errors like the one below are thrown repeatedly and the cluster member never starts.

[2016-09-23T22:16:23.394+0100] [Payara 4.1] [SEVERE] [] [com.hazelcast.spi.impl.operationexecutor.classic.ClassicOperationExecutor] [tid: _ThreadID=74 _ThreadName=hz._hzInstance_1_development.generic-operation.thread-3] [timeMillis: 1474665383394] [levelValue: 1000] [[
  [192.168.139.128]:5901 [development] [3.6.4] Failed to process packet: Packet{header=17, isResponse=false, isOperation=true, isEvent=false, partitionId=-1, conn=Connection [/192.168.139.128:54657 -> /192.168.139.128:5900], endpoint=Address[192.168.139.128]:5900, alive=true, type=MEMBER} on hz._hzInstance_1_development.generic-operation.thread-3
com.hazelcast.nio.serialization.HazelcastSerializationException: java.lang.ClassNotFoundException: fish.payara.testcases.payara1072.CustomPOJO
    at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$ClassSerializer.read(JavaDefaultSerializers.java:181)
    at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$ClassSerializer.read(JavaDefaultSerializers.java:169)
    at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:46)
    at com.hazelcast.internal.serialization.impl.AbstractSerializationService.readObject(AbstractSerializationService.java:214)
    at com.hazelcast.internal.serialization.impl.ByteArrayObjectDataInput.readObject(ByteArrayObjectDataInput.java:600)
    at com.hazelcast.config.CacheConfig.readData(CacheConfig.java:543)
    at com.hazelcast.internal.serialization.impl.DataSerializer.read(DataSerializer.java:121)
    at com.hazelcast.internal.serialization.impl.DataSerializer.read(DataSerializer.java:47)
    at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:46)
    at com.hazelcast.internal.serialization.impl.AbstractSerializationService.readObject(AbstractSerializationService.java:214)
    at com.hazelcast.internal.serialization.impl.ByteArrayObjectDataInput.readObject(ByteArrayObjectDataInput.java:600)
    at com.hazelcast.cache.impl.operation.PostJoinCacheOperation.readInternal(PostJoinCacheOperation.java:64)
    at com.hazelcast.spi.Operation.readData(Operation.java:557)
    at com.hazelcast.internal.serialization.impl.DataSerializer.read(DataSerializer.java:121)
    at com.hazelcast.internal.serialization.impl.DataSerializer.read(DataSerializer.java:47)
    at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:46)
    at com.hazelcast.internal.serialization.impl.AbstractSerializationService.readObject(AbstractSerializationService.java:214)
    at com.hazelcast.internal.serialization.impl.ByteArrayObjectDataInput.readObject(ByteArrayObjectDataInput.java:600)
    at com.hazelcast.cluster.impl.operations.PostJoinOperation.readInternal(PostJoinOperation.java:136)
    at com.hazelcast.spi.Operation.readData(Operation.java:557)
    at com.hazelcast.cluster.impl.operations.FinalizeJoinOperation.readInternal(FinalizeJoinOperation.java:178)
    at com.hazelcast.spi.Operation.readData(Operation.java:557)
    at com.hazelcast.internal.serialization.impl.DataSerializer.read(DataSerializer.java:121)
    at com.hazelcast.internal.serialization.impl.DataSerializer.read(DataSerializer.java:47)
    at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:46)
    at com.hazelcast.internal.serialization.impl.AbstractSerializationService.toObject(AbstractSerializationService.java:170)
    at com.hazelcast.spi.impl.NodeEngineImpl.toObject(NodeEngineImpl.java:234)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:378)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:184)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:137)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:124)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:99)
Caused by: java.lang.ClassNotFoundException: fish.payara.testcases.payara1072.CustomPOJO
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at com.sun.enterprise.loader.CurrentBeforeParentClassLoader.loadClass(CurrentBeforeParentClassLoader.java:82)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at com.hazelcast.nio.ClassLoaderUtil.tryLoadClass(ClassLoaderUtil.java:137)
    at com.hazelcast.nio.ClassLoaderUtil.loadClass(ClassLoaderUtil.java:115)
    at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$ClassSerializer.read(JavaDefaultSerializers.java:179)
    ... 31 more
]]

A workaround is to make the cache untyped.

        cache = cacheManager.getCache("test");
        if (cache == null) {
            MutableConfiguration cconfig = new MutableConfiguration<>();
            cache = cacheManager.createCache("test", cconfig);
        }

This may seem obvious. The subtlety, however, is that the class can be available to a classloader within the same JVM yet invisible to the Hazelcast threads, because those threads are booted with a different classloader. This is often the case in an application server.
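
To make the mismatch concrete, here is a minimal sketch (illustrative only; the loader choices are assumptions, not taken from the actual reproduction) of how the same class resolves through the WAR's context classloader but not through a loader captured at server boot:

    // Illustrative sketch, run from inside the deployed WAR.
    ClassLoader webAppLoader = Thread.currentThread().getContextClassLoader();
    // The WAR's classloader can see its own classes, so this succeeds.
    Class.forName("fish.payara.testcases.payara1072.CustomPOJO", false, webAppLoader);

    // A loader captured when Hazelcast was bootstrapped (before the WAR existed)
    // cannot see the class; this is what the operation threads end up using.
    ClassLoader bootLoader = com.hazelcast.core.Hazelcast.class.getClassLoader();
    Class.forName("fish.payara.testcases.payara1072.CustomPOJO", false, bootLoader); // ClassNotFoundException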

@vbekiaris vbekiaris added this to the 3.8 milestone Sep 25, 2016

vbekiaris commented Sep 25, 2016

Thanks for the report @smillidge! The issue occurs when the receiving cluster member deserializes the CacheConfig object, part of which is the key and value type definition. In doing so, an attempt is made to load the class by name using the input's classloader in ClassSerializer. In your use case the class is not available in that classloader, resulting in the ClassNotFoundException. A simple workaround would be to override the default serializer; however, ClassSerializer is one of Hazelcast's built-in "default" serializers and cannot be overridden.
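
For reference, overriding a serializer normally looks roughly like the sketch below (CustomPOJO is your class from the report, and a no-arg constructor is assumed just for the sketch); the point is that this mechanism does not reach ClassSerializer, because it serializes java.lang.Class itself and is built in:

    // Sketch only: the usual way to register a custom serializer in Hazelcast 3.x.
    // It cannot replace ClassSerializer, the built-in default serializer for java.lang.Class.
    Config config = new Config();
    config.getSerializationConfig().addSerializerConfig(
            new SerializerConfig()
                    .setTypeClass(CustomPOJO.class)
                    .setImplementation(new StreamSerializer<CustomPOJO>() {
                        public void write(ObjectDataOutput out, CustomPOJO pojo) throws IOException {
                            // write the POJO's fields here
                        }
                        public CustomPOJO read(ObjectDataInput in) throws IOException {
                            return new CustomPOJO(); // rebuild the POJO from the stream
                        }
                        public int getTypeId() {
                            return 1000; // any positive id unique within the application
                        }
                        public void destroy() {
                        }
                    }));
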
In the meantime, could you describe your use case in a bit more detail: how is Hazelcast booted, and in which classloader are the domain classes available? In what kind of setup did you encounter this issue? Are you using OSGi?

Cheers!

smillidge commented Sep 26, 2016

The use case is Payara Server, a Java application server that bootstraps Hazelcast as a core function of the server. Because Hazelcast spawns its own threads, those threads inherit the server's bootstrap classloader. A user then deploys a web application WAR file that tries to use the JCache API to create a typed cache, using the code sample above. As the WAR file has its own classloader, the classes in the WAR are only visible to the WAR, hence the problem shown.

I think as a rule Hazelcast should avoid, as much as possible, deserializing application data in the operation threads it creates, since those threads just inherit whatever classloader was on the parent thread. There is no problem if we use IMap; it is only the JCache API that causes trouble.

This is not due to OSGi, just standard application-server classloading of WAR files.
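
For reference, the only instance-level classloader hook I know of is Config.setClassLoader (a sketch below; serverOrDeploymentClassLoader is a placeholder). It is set once when Hazelcast boots, so it cannot follow the per-WAR classloaders that appear later:

    // Sketch: Hazelcast 3.x lets the embedding host supply one classloader up front,
    // but that cannot cover classes that only become visible when each WAR is deployed.
    Config config = new Config();
    config.setClassLoader(serverOrDeploymentClassLoader); // placeholder supplied by the server
    HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);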

@jerrinot jerrinot modified the milestones: 4.0, 3.8, 3.9 Dec 12, 2016

lprimak commented May 21, 2017

I think an easy way to solve this is for Hazelcast to provide a hook invoked when an OperationThread-based task starts and finishes. That way, Payara can do whatever it needs, e.g. set context classloaders, do ThreadLocal setup, etc. From what I have seen while working on a number of issues of this kind, this would solve most, if not all, of these problems at once.

Something like this:

interface OperationThreadHook {
    void onStart();
    void onFinish();
}

Prior to doing any work, Hazelcast would do something like this:

// pseudocode
OperationThread.doWork() {
    OperationThreadHook hook = ...; // get the hook, if configured
    try {
        hook.onStart();
        // do OperationThread stuff
    } finally {
        hook.onFinish();
    }
}
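
A hypothetical hook implementation on the Payara side (remember, the interface above is only a proposal, not an existing Hazelcast API) could simply swap the context classloader around each operation:

    class PayaraOperationThreadHook implements OperationThreadHook {
        private final ClassLoader applicationLoader; // supplied by the application server
        private ClassLoader previous;

        PayaraOperationThreadHook(ClassLoader applicationLoader) {
            this.applicationLoader = applicationLoader;
        }

        public void onStart() {
            previous = Thread.currentThread().getContextClassLoader();
            Thread.currentThread().setContextClassLoader(applicationLoader);
        }

        public void onFinish() {
            Thread.currentThread().setContextClassLoader(previous);
        }
    }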

lprimak commented May 22, 2017

Another thing (not 100% sure what to do about this): OperationThreadHook.onXXX() should take a parameter so Payara can determine the tenant application that created the object being worked on. Payara uses an ApplicationInfo class (it could be just an Object on the Hazelcast side) or a String to identify the tenant.
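
For example, the proposed hook could take an opaque tenant handle (shape and names are hypothetical):

    interface OperationThreadHook {
        void onStart(Object tenant);  // e.g. Payara's ApplicationInfo, or a String id
        void onFinish(Object tenant);
    }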

lprimak commented May 25, 2017

Another major issue is CacheConfig serialization/deserialization when instantiating a new Hazelcast node. Usually, Hazelcast is initialized before Payara gets to "deploy" any application, so when CacheConfig is deserialized it cannot find any application classes, even if the OperationThread hooks were there.
I believe the solution is to defer deserializing CacheConfig to a later point (where it is actually used), so that by then the applications (and thus all their classes) are guaranteed to have been loaded.
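
To sketch the idea only (this is an illustration of the deferral, not of how Hazelcast's CacheConfig actually works): carry the key/value types across the wire as class names and resolve them lazily, once the deploying application's classloader is guaranteed to exist:

    class LazyTypedCacheConfig {
        private final String keyClassName;
        private final String valueClassName;

        LazyTypedCacheConfig(String keyClassName, String valueClassName) {
            this.keyClassName = keyClassName;
            this.valueClassName = valueClassName;
        }

        // Resolved lazily, with whatever classloader the deployed application provides.
        Class<?> resolveKeyType(ClassLoader appLoader) throws ClassNotFoundException {
            return Class.forName(keyClassName, false, appLoader);
        }

        Class<?> resolveValueType(ClassLoader appLoader) throws ClassNotFoundException {
            return Class.forName(valueClassName, false, appLoader);
        }
    }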

lprimak commented Jun 12, 2017

Related issues: #9735 and #10728

lprimak commented Jun 29, 2017

I have started working on a PR for Hazelcast adding rudimentary tenant support.

vbekiaris commented Jun 30, 2017

@lprimak this is awesome. I also had some thoughts about it, ping me on https://gitter.im/hazelcast/hazelcast if you want to discuss

lprimak commented Jun 30, 2017

Let me do some preliminary work and I will do that!
Thanks @vbekiaris

vbekiaris commented Jul 3, 2017

Great. I think the key issue here is serialization/deserialization with a proper classloader. Assume a hypothetical model where each named Hazelcast data structure can be associated with a classloader. There must then be a way for the serialization service to recognize that an incoming packet (for example an entry event for a Cache) is meant to be deserialized with the specific classloader associated with that Cache, so that domain objects can be resolved. However, there is currently no way to know the data structure in advance; its type and name only become apparent as the packet is being deserialized, so it's a sort of chicken-and-egg problem.
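
In pseudocode the circularity looks roughly like this (everything below is hypothetical, purely to illustrate the ordering problem):

    // To pick the right classloader we would need the data structure's name...
    ClassLoader loader = classLoaderRegistry.lookup(serviceName, structureName); // hypothetical registry
    // ...but the name is only discovered while deserializing the packet,
    // which is exactly the step that already needs the classloader.
    Object operation = serializationService.toObject(packetData);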

lprimak commented Jul 4, 2017

@vbekiaris I just got a working version of this going. I should be able to get a PR up soon so we can start discussing it.
Thanks!

lprimak commented Jul 4, 2017

OK, I have these PRs:

#10873 and #10874 (one dependent on the other)

They also depend on a couple of issues:
#10727 and #10728

lprimak commented Jul 6, 2017

The motivation for all of this is BJug's JCache issues, as well as other customer complaints:
https://gist.github.com/ivannov/906096dcff2100ccd5b2ef11f97362dd

lprimak commented Jul 6, 2017

There are similar issues with MapLoaders:
payara/Payara#1169
They need TenantControl as well.

@mmedenjak mmedenjak changed the title from JCache fails to initialise when type of a key or value is not available on a remote node to [cache] JCache fails to initialise when type of a key or value is not available on a remote node Jul 11, 2017

@jerrinot jerrinot modified the milestones: 3.10, 3.9 Aug 16, 2017

lprimak commented Nov 30, 2017

Looks like this issue is fixed in 3.9

lprimak commented Dec 1, 2017

@smillidge you can close this issue

Andrew-Gr commented Feb 27, 2018

@smillidge is the workaround from your initial post (using an untyped cache) still the only option for Payara-4.1.2.181, or is there now a way to use a typed cache?
What about Payara 5? Which version of Hazelcast is expected to ship with Payara 5?
Thank you, and sorry if this is slightly off-topic.

smillidge commented Feb 27, 2018

I haven't tested 181, but it uses Hazelcast 3.8.x.
Payara 5 will use Hazelcast 3.9.x, but again I personally have not tested this explicitly.

Andrew-Gr commented Feb 27, 2018

Thank you for the quick response! I have just hit this issue on Payara-4.1.2.181 with Hazelcast 3.8.5 and had to use the workaround with Object.class in MutableConfiguration<String, Object> config = ...
I was therefore curious whether a better solution is available out of the box. I will also try to test on Payara 5 soon. Thank you!
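
For reference, the workaround I used looks roughly like this (a sketch):

    // Sketch of the Object.class workaround: declare the value type as Object so
    // no application class has to be resolved on remote cluster members.
    MutableConfiguration<String, Object> config = new MutableConfiguration<String, Object>()
            .setTypes(String.class, Object.class);
    Cache<String, Object> cache = cacheManager.createCache("test", config);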

lprimak commented Feb 27, 2018

We are actively working on fixing this and other related issues in Hazelcast 3.10

lprimak commented Feb 27, 2018

Hazelcast 3.9 solves this particular issue, so once Payara uses 3.9 this will be resolved.
Other related issues are being worked on for Hazelcast 3.10.
