Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The querying performance drop when running inside OSGI #13785

Closed
ncherkas opened this issue Sep 18, 2018 · 14 comments
Closed

The querying performance drop when running inside OSGI #13785

ncherkas opened this issue Sep 18, 2018 · 14 comments

Comments

@ncherkas
Copy link

@ncherkas ncherkas commented Sep 18, 2018

A problem seems to be related to the Java Reflection info caching implemented in com.hazelcast.query.impl.getters.Extractors:

Getter getGetter(InternalSerializationService serializationService, Object targetObject, String attributeName) { 
    Getter getter = getterCache.getGetter(targetObject.getClass(), attributeName); 
    if (getter == null) { 
        getter = instantiateGetter(serializationService, targetObject, attributeName); 
        if (getter.isCacheable()) { 
           getterCache.putGetter(targetObject.getClass(), attributeName, getter); 
        } 
    } 
    return getter; 
} 

For OSGi environments the given method will be returning false because obviously Hazelcast and the POJOs are in separate bundles, thus having separate class loaders:

boolean isCacheable() { 
    return ReflectionHelper.THIS_CL.equals(method.getDeclaringClass().getClassLoader()); 
} 

It looks like this was introduced with this commit as a fix for: #1503

Please reach out to me for more details.

@GreenRover
Copy link

@GreenRover GreenRover commented Sep 19, 2018

The fix in #1503 makes it wrong. The correct solution is to just use throw away isCacheabke. Use a weak map for caching to avoi memleaks created by on the fly created POJOs

@dbrimley dbrimley modified the milestones: 3.12, 3.11.1 Sep 19, 2018
@dbrimley
Copy link
Member

@dbrimley dbrimley commented Sep 19, 2018

@GreenRover I'd like to get this fix to you sooner, so I've moved this forward to our next patch release after the GA of 3.11.

@jerrinot
Copy link
Contributor

@jerrinot jerrinot commented Sep 19, 2018

@GreenRover: I agree the fix is not great. I might make sense when it was written: as the GetterCache used to have just a class (String) name as a key.

This changed a few releases ago and indeed now we can just use weak references to prevents leaks. The possible issue here is a performance regression - this part will require extensive testing.

@eugeneeer
Copy link

@eugeneeer eugeneeer commented Sep 20, 2018

@dbrimley thanks for moving this forward. Do you already know the date when the mentioned patch release will be provided?

@dbrimley
Copy link
Member

@dbrimley dbrimley commented Sep 20, 2018

@eugeneeer All hands are presently working hard to get 3.11 GA.

3.11.1 should be out around November, of course usual caveats apply around this though and things can change...

https://github.com/hazelcast/hazelcast/milestone/118

@GreenRover
Copy link

@GreenRover GreenRover commented Sep 24, 2018

@dbrimley thaks for sheduling this bug. As soon as this is solved and available as nigltly build we would be glad to start testing on our side, to check if our performance problems are gone.

@eugeneeer
Copy link

@eugeneeer eugeneeer commented Oct 2, 2018

@dbrimley will it be possible to provide us a nightly build (once it is available) as mentioned by @GreenRover ?

@ncherkas
Copy link
Author

@ncherkas ncherkas commented Oct 2, 2018

@eugeneeer yes, we will let you know once it's available

@rweingart
Copy link

@rweingart rweingart commented Nov 7, 2018

Are there any news on this issue? Do you have any timeline for it?

@taburet taburet self-assigned this Nov 7, 2018
taburet added a commit to taburet/hazelcast that referenced this issue Nov 7, 2018
@taburet
Copy link
Contributor

@taburet taburet commented Nov 7, 2018

@rweingart @eugeneeer JFYI, I'm working on the fix right now. The fix itself is basically ready, but we have to test its performance, I'm going to do this within next few days.

taburet added a commit to taburet/hazelcast that referenced this issue Nov 9, 2018
@taburet
Copy link
Contributor

@taburet taburet commented Nov 9, 2018

Testing a simple query (age=10 and iq=100, scanning and returning 100k entries) revealed that soft caching is even slightly faster than strong caching:

before-noCache-xmx2g        534.828 ± 1.409  ms/op
before-strongCache-xmx2g     56.367 ± 0.304  ms/op
after-softCache-xmx2g        51.087 ± 0.206  ms/op

before-strongCache-xmx96m    65.623 ± 1.003  ms/op
before-strongCache-xmx128m   60.571 ± 0.233  ms/op
before-strongCache-xmx256m   60.372 ± 0.228  ms/op
before-strongCache-xmx512m   56.490 ± 0.191  ms/op

after-softCache-xmx96m       61.354 ± 1.095  ms/op
after-softCache-xmx128m      56.279 ± 0.142  ms/op
after-softCache-xmx256m      55.741 ± 0.163  ms/op
after-softCache-xmx512m      51.550 ± 0.329  ms/op

Also I have recreated the situation when a class loader, its classes and associated getters are being garbage collected to measure the performance impact:

  1. For a single class loader, there is no chance for garbage collection since all classes are loaded by that single class loader.

  2. For the multiple class loaders scenario, typically there is still some kind of a strong reference keeping the class loader alive even if there are no references to instances of its classes. For OSGi it's a bundle manager instance that keeps references to bundles along with their individual class loaders. If a bundle is unloaded, it basically has no impact on the getter cache performance anymore, so there is nothing to measure.

  3. In the artificially crafted situation when there are no permanent strong references to the class loader, it's still hard to achieve a garbage collection if we are actively performing queries since there are almost always strong references on stacks pointing to instances of class loader's classes. Moreover, JVMs use some kind of a heuristic to avoid garbage collection of softly-reachable objects if they were accessed recently, this also suggested by SoftReference javadoc: "virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references". If the cache is switched to the weak mode instead of the soft one, we may observe relatively frequent garbage collections which have almost no impact on the performance: creating a new getter and using it for the first time is about 10 times more costly (in terms of time) than reusing the existing cached one; therefore if the new getter is able to survive for like 1000 usages its construction cost is fully amortized.

  4. Under memory-limited conditions, if we pause query processing for some time while there are no strong references to the class loader, it may be garbage collected, as well as the associated getters. Again, there is no measurable impact on the performance because after resuming the processing we pay getters construction cost only once and then reuse the getters for a long time.

Looks like the only scenario that may suffer from the new soft cache is performing infrequent queries selecting very few entries under very limited memory conditions. But in that scenario we will have performance problems anyway because of spending too much time doing GC.

taburet added a commit to taburet/hazelcast that referenced this issue Nov 12, 2018
Previously, field and method getters were not cached for classes loaded
by a class loader different from the one by which HZ classes were
loaded. In a multiple class loaders environment (e.g. OSGi) this caused
a slowdown in the full-scan queries execution by about 10 times.

This change fixes that by utilizing soft references to simultaneously
allow the caching for such classes and their garbage collection.

Fixes: hazelcast#13785
@GreenRover
Copy link

@GreenRover GreenRover commented Nov 21, 2018

Hi,

is this issued done? Do you have any information when the release 3.11.1 will be? Or if we can get a nightly build?

@eugeneeer
Copy link

@eugeneeer eugeneeer commented Nov 22, 2018

Hi, since the release 3.11.1 is being posponed to the Mid-December we need a nightly build with the fix of this bug. When can you provide it?

@taburet
Copy link
Contributor

@taburet taburet commented Nov 22, 2018

@GreenRover @eugeneeer the PR will be merged within next few days, we will provide you with a nightly build either this week or on the beginning of the next week.

taburet added a commit that referenced this issue Nov 22, 2018
Previously, field and method getters were not cached for classes loaded
by a class loader different from the one by which HZ classes were
loaded. In a multiple class loaders environment (e.g. OSGi) this caused
a slowdown in the full-scan queries execution by about 10 times.

This change fixes that by utilizing soft references to simultaneously
allow the caching for such classes and their garbage collection.

Fixes: #13785
taburet added a commit to taburet/hazelcast that referenced this issue Nov 22, 2018
Previously, field and method getters were not cached for classes loaded
by a class loader different from the one by which HZ classes were
loaded. In a multiple class loaders environment (e.g. OSGi) this caused
a slowdown in the full-scan queries execution by about 10 times.

This change fixes that by utilizing soft references to simultaneously
allow the caching for such classes and their garbage collection.

(cherry-picked from d2f5563)

Fixes: hazelcast#13785
taburet added a commit that referenced this issue Nov 22, 2018
Previously, field and method getters were not cached for classes loaded
by a class loader different from the one by which HZ classes were
loaded. In a multiple class loaders environment (e.g. OSGi) this caused
a slowdown in the full-scan queries execution by about 10 times.

This change fixes that by utilizing soft references to simultaneously
allow the caching for such classes and their garbage collection.

(cherry-picked from d2f5563)

Fixes: #13785
blazember added a commit to blazember/hazelcast that referenced this issue Dec 11, 2018
Previously, field and method getters were not cached for classes loaded
by a class loader different from the one by which HZ classes were
loaded. In a multiple class loaders environment (e.g. OSGi) this caused
a slowdown in the full-scan queries execution by about 10 times.

This change fixes that by utilizing soft references to simultaneously
allow the caching for such classes and their garbage collection.

Fixes: hazelcast#13785
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

8 participants