Check of ClassLoader when putting objects into a cache #17
How does that work with the thread context ClassLoader? That is typically used in JEE containers to load classes and to obtain the correct ClassLoader based on context.
In most JEE cases, the thread's context ClassLoader would be a ClassLoader associated with a deployed application, and the expectation is that your CacheManager is associated with a particular application and its ClassLoader. So, this scheme should work by default in JEE and prevent obscure errors. It would be interesting if you can come up with a case where it would not work.
I'm not sure a CacheManager will be associated with a specific application, especially in the case of a long-lived data grid where we may have many applications accessing the same data in the grid. I suppose it could be the case that the way to get to the grid will be via an application-loaded CacheManager, but I'm not sure that will always be true. I can envisage creating a CacheManager and placing it in JNDI in the JEE case, where it can then be accessed by many deployments.

What I was trying to get at is that a Cache can just use the thread context ClassLoader to deserialize data out of the cache, but that context ClassLoader doesn't have to be the ClassLoader of the CacheManager itself.

Steve Millidge
Steve, I have been battling with these questions with respect to a well-known data grid product. For a data grid, it is very true that one of the most important aspects is the ability to share data with multiple applications. As powerful as that ability is, it can be equally error-prone and complicated in the light of multiple ClassLoaders. There is nothing stopping an implementation of JSR 107 from handing out a distinct CacheManager instance for each application and handling the sharing with other applications and a data grid as an implementation detail. And this particular proposed feature would enable a decent level of error checking.

For a single JEE server, a "shared" data grid really wants to be hosted in a ClassLoader that is a "base" for all applications sharing the data grid. Ideally, there would be a deployment module for the data grid describing the cache semantics and the classes used for keys and values, which would serve as a dependency for the JEE modules that access the data grid. I'm thinking such a concept is likely out of scope for JSR 107 and JEE 7.

In the scenario you describe, you would create a CacheManager in one JEE app and put it in JNDI such that another JEE app can get hold of it. Then the only "safe" classes you could put in that cache would be classes in a ClassLoader shared by the two JEE apps, i.e. the system ClassLoader. If you put in any other class, you are at the mercy of the applications' ClassLoaders and an arbitrary alignment of classes between different ClassLoaders. Unfortunately, this represents best practice for several data grid products out there.

I'm all for trying to create a flexible, yet consistent way of handling these issues in JSR 107. The check I proposed in this issue is all about ensuring consistency and predictability in the light of multiple ClassLoaders. Looking forward to your further comments!

Thanks,
Christer, I agree class loading in data grids is much harder, as you need to support entry processors, parallel queries, etc., all of which have to operate on the deserialized data in the data grid node, away from the application JVM. Therefore the classes in the grid need to be accessible from the grid's ClassLoader.

However, for the JSR 107 case I can't see any problem with first attempting to use the thread context ClassLoader to deserialize an object when returning it from a get, and then using the CacheManager's ClassLoader if that fails. Therefore I think this check is too restrictive.

Steve Millidge
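The fallback Steve describes can be sketched with a custom `ObjectInputStream`. This is only an illustration of the idea, not JSR 107 API: the class name, constructor, and the decision to swallow the first `ClassNotFoundException` are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// Sketch only: try the thread context ClassLoader first, then fall back
// to the CacheManager's ClassLoader. Hypothetical class, not part of JSR 107.
class ClassLoaderAwareObjectInputStream extends ObjectInputStream {
    private final ClassLoader cacheManagerLoader;

    ClassLoaderAwareObjectInputStream(InputStream in, ClassLoader cacheManagerLoader)
            throws IOException {
        super(in);
        this.cacheManagerLoader = cacheManagerLoader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        if (tccl != null) {
            try {
                // First attempt: the thread context ClassLoader
                return Class.forName(desc.getName(), false, tccl);
            } catch (ClassNotFoundException e) {
                // Fall through to the CacheManager's ClassLoader
            }
        }
        return Class.forName(desc.getName(), false, cacheManagerLoader);
    }
}
```

A get on a by-value cache would then deserialize through this stream, so an application-level class is resolved against the calling application's loader first, and only then against the CacheManager's.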
While it is reasonable for the "by value" case to use the TCCL as a best effort, in the "by reference" case allowing objects from a different ClassLoader will lead to potential ClassLoader leaks. In the current reference implementation, serialization and deserialization occur using the ClassLoader associated with the CacheManager. Making sure the object you put there belongs to the same ClassLoader is not a far stretch IMHO.

/Christer
I'm just concerned about the implications for when we look at integration with JEE. I may be wrong, but the implication for JEE is that CacheManagers must be deployed with the application that uses them rather than as a container-level resource. The use case I'm thinking about is an application using a CacheManager backed by a large memcached cluster. I could envisage configuring this as a container-level CacheManager available via JNDI like a datasource, with multiple apps pushing data into memcached and the cache using the TCCL to save and load data.
What is proposed by Christer seems sensible. We are allowing a class loader to be set when a CacheManager is created. See the Caching class. Quoting from the Javadoc:
Isn't all that Christer is proposing to check the keys and values on the way in? It seems to me that this honours the fail-fast principle and will prevent errors. I propose we add the following to the Cache methods which accept puts: "Both the key and value must be loadable by the CacheManager's class loader, otherwise a CacheException is thrown." On Steve's point about what happens when no class loader is defined: we still define one for the CacheManager, or we were planning to. We currently allow an implementation one of two options:
We probably should lock this down. Do you have a view on what it should be?
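A fail-fast put along these lines might look as follows. This is a minimal, self-contained sketch: `SimpleCache` and `loadableByManager` are hypothetical names, and `IllegalStateException` stands in for the spec's `CacheException` to keep the example dependency-free.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache that rejects keys/values at put time unless their
// classes are loadable through the CacheManager's ClassLoader.
class SimpleCache<K, V> {
    private final ClassLoader managerLoader;
    private final Map<K, V> store = new HashMap<>();

    SimpleCache(ClassLoader managerLoader) {
        this.managerLoader = managerLoader;
    }

    // Walks the manager loader's parent chain, in the spirit of the
    // checkValidClassLoader proposal in this issue.
    private boolean loadableByManager(ClassLoader cl) {
        if (cl == null) return true; // bootstrap classes (String, Integer, ...)
        for (ClassLoader c = managerLoader; c != null; c = c.getParent()) {
            if (c == cl) return true;
        }
        return false;
    }

    public void put(K key, V value) {
        if (!loadableByManager(key.getClass().getClassLoader())
                || !loadableByManager(value.getClass().getClassLoader())) {
            // Stand-in for throwing javax.cache.CacheException
            throw new IllegalStateException(
                    "key/value class not loadable by the CacheManager's ClassLoader");
        }
        store.put(key, value);
    }

    public V get(K key) {
        return store.get(key);
    }
}
```

The point of doing the check in `put` rather than in `get` is exactly the fail-fast principle above: the application that created the bad entry is the one that sees the error.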
I think the way to address this is to consider what ClassLoader we would want to use if we were specifying it explicitly. Going forward, Oracle is introducing the concept of a GAR, analogous to a WAR, to make the class loading behavior explicit.

I don't understand how placing the cache (manager) at the system level would work. If application 1 stores Class1 defined in its WAR, what would happen if application 2 did a get on that object? Particularly in an EE environment, it seems to me that all applications that want to share a cache must have all classes stored in that cache in their ClassLoader. Incidentally, if store-by-value is in use, the classes could be (consistent) copies of each other (i.e. not from the same parent ClassLoader), but if using by-reference they must come from the same parent ClassLoader.
I'm very skeptical about both the value and correctness of this check. Assuming I understand Christer's original proposal, he would like a method
Enforcing a requirement like this would seem to force all implementations to make a trade-off between the runtime cost of this check and the number of originally functional use cases they subsequently preclude. This doesn't seem to be a fair decision to place in the implementors' lap. I'm much more comfortable with not performing this check, ruling out no use cases on these grounds, and instead failing on retrieval/deserialization when necessary.
Chris, I understand your objections; do you have an alternative strategy that would help prevent leaking ClassLoaders in an application server environment? Thanks,
I don't really have a workable strategy for that. I'm sure there are many potential things that could be done to prevent these kinds of memory leaks, but it seems to me that all of them are going to constrain other people who want to do something eminently sensible with the API.
I'm closing this issue right now. If someone can come up with (in a timely manner) an implementable and useful approach (one not restrictive on reasonable use cases), then we can re-open and reconsider.
I'd suggest you list the reasonable use cases and the use cases where care should be taken (i.e. when sharing objects/caches between applications deployed in an app server).
Ok with me to close.
I propose that we limit puts so that keys and values put into a cache owned by a particular CacheManager must be loadable through the ClassLoader associated with the CacheManager in question, at a minimum for the byValue variant of caches.
Here is a rough sketch:
checkValidClassLoader(object.getClass().getClassLoader()) would be called for both the key and the value being put:

```java
// Intended to live on CacheManager: returns true when classLoader is the
// manager's own ClassLoader or one of its parents, i.e. classes defined by
// it are visible to the CacheManager.
boolean checkValidClassLoader(ClassLoader classLoader)
{
    if (classLoader == null) return true; // bootstrap classes (e.g. String) are always visible
    ClassLoader current = getClassLoader();
    if (current == classLoader) return true;
    while ((current = current.getParent()) != null)
    {
        if (current == classLoader) return true;
    }
    return false;
}
```
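To illustrate the parent-chain walk, here is a minimal, self-contained stand-in. `CacheManagerStub` is a hypothetical class for demonstration only; the null-loader handling for bootstrap classes is an assumption on top of the sketch above.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical stand-in for a CacheManager holding a ClassLoader.
class CacheManagerStub {
    private final ClassLoader loader;

    CacheManagerStub(ClassLoader loader) {
        this.loader = loader;
    }

    ClassLoader getClassLoader() {
        return loader;
    }

    // True when classLoader is this manager's loader or one of its parents.
    boolean checkValidClassLoader(ClassLoader classLoader) {
        if (classLoader == null) return true; // bootstrap classes are visible everywhere
        ClassLoader current = getClassLoader();
        if (current == classLoader) return true;
        while ((current = current.getParent()) != null) {
            if (current == classLoader) return true;
        }
        return false;
    }
}

public class CheckDemo {
    public static void main(String[] args) throws Exception {
        ClassLoader app = CheckDemo.class.getClassLoader();
        CacheManagerStub mgr = new CacheManagerStub(app);

        // A class from the manager's own loader passes the check
        System.out.println(mgr.checkValidClassLoader(app));
        // Bootstrap-loaded classes such as String pass as well
        System.out.println(mgr.checkValidClassLoader(String.class.getClassLoader()));
        // A loader that is not on the manager's parent chain is rejected
        try (URLClassLoader sibling = new URLClassLoader(new URL[0], null)) {
            System.out.println(mgr.checkValidClassLoader(sibling));
        }
    }
}
```

Under this check, an object defined by a sibling application's ClassLoader would be rejected at put time rather than failing obscurely on a later get.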