
client side OOME with Async operations 3.8-SNAPSHOT #9665

Closed
Danny-Hazelcast opened this issue Jan 17, 2017 · 6 comments

Danny-Hazelcast (Member) commented Jan 17, 2017

While running an async operations test, I got a client-side OOME ("GC overhead limit exceeded"): http://54.87.52.100/~jenkins/workspace/temp/HzClient460HZ-hprof.zip

One instance of "com.hazelcast.util.executor.LoggingScheduledExecutor" loaded by "sun.misc.Launcher$AppClassLoader @ 0xe6fae438" occupies 327,143,344 (97.65%) bytes. The memory is accumulated in one instance of "java.util.concurrent.RunnableScheduledFuture[]" loaded by "".

So it looks like on current master we have a client-side OOME. The member side seems OK.

I can only reproduce this with a full run of 2000 clients against 3 members.
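The MAT finding above points at an unbounded scheduled-executor queue. A minimal standalone sketch (not Hazelcast code) of the mechanism: `ScheduledThreadPoolExecutor` keeps every pending task as a `RunnableScheduledFuture` in an unbounded internal queue, so when tasks are enqueued faster than workers drain them, the queue and heap grow without limit.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch, not Hazelcast code: a ScheduledThreadPoolExecutor
 * holds every pending task in an unbounded queue of RunnableScheduledFuture
 * entries. Schedule faster than the worker can drain and the queue grows
 * without bound, matching the MAT finding above.
 */
public class QueueGrowthSketch {
    public static int pendingAfterBurst(int tasks) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        for (int i = 0; i < tasks; i++) {
            // Each call enqueues a RunnableScheduledFuture; nothing bounds the queue.
            executor.schedule(() -> { }, 1, TimeUnit.HOURS);
        }
        int pending = executor.getQueue().size();
        executor.shutdownNow();
        return pending;
    }

    public static void main(String[] args) {
        // All 10,000 tasks are still queued: none can run for an hour.
        System.out.println(pendingAfterBurst(10_000));
    }
}
```

With no cap on submissions, the only thing limiting heap usage is how fast callers invoke `schedule` — which is exactly what client backpressure is supposed to bound.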

@tkountis tkountis assigned tkountis and unassigned tkountis Jan 17, 2017
@degerhz degerhz added this to the 3.8 milestone Jan 19, 2017
@asimarslan asimarslan self-assigned this Jan 24, 2017

Danny-Hazelcast (Member, author) commented Jan 25, 2017

An async near-cache test resulted in a client-side hprof, plus an hs_err JVM crash log and a core dump:

http://54.87.52.100/~jenkins/workspace/temp/HzClient74HZ-nearCache.hprof.zip

One instance of "com.hazelcast.util.executor.LoggingScheduledExecutor" loaded by "sun.misc.Launcher$AppClassLoader @ 0xe72530d8" occupies 244,743,832 (58.78%) bytes. The memory is accumulated in one instance of "java.util.concurrent.RunnableScheduledFuture[]" loaded by "".

200 instances of "com.hazelcast.internal.nearcache.impl.invalidation.RepairingHandler", loaded by "sun.misc.Launcher$AppClassLoader @ 0xe72530d8" occupy 54,078,400 (12.99%) bytes.

and http://54.87.52.100/~jenkins/workspace/temp/nearCacheAplStyle-HzClient60HZ-core-hsErr-hprof.zip


Danny-Hazelcast (Member, author) commented Jan 25, 2017

I see that -Dhazelcast.client.max.concurrent.invocations=100 is working: com.hazelcast.core.HazelcastOverloadException is thrown and is handled by the client test code before it makes another async call.
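A standalone sketch of the pattern described above, under stated assumptions: the class names here are illustrative models, not Hazelcast's API. A semaphore stands in for the max-concurrent-invocations cap, and a plain `RuntimeException` stands in for `HazelcastOverloadException`; the caller catches the rejection and retries, as the test code does.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

/**
 * Illustrative model of capping in-flight async invocations, in the spirit
 * of -Dhazelcast.client.max.concurrent.invocations. Not Hazelcast code:
 * a RuntimeException stands in for HazelcastOverloadException.
 */
public class MaxInvocationsSketch {
    private final Semaphore slots;

    public MaxInvocationsSketch(int maxConcurrentInvocations) {
        this.slots = new Semaphore(maxConcurrentInvocations);
    }

    /** Stand-in for an async operation; rejects when the cap is reached. */
    public CompletableFuture<Void> invokeAsync(Runnable task) {
        if (!slots.tryAcquire()) {
            throw new RuntimeException("overload: too many concurrent invocations");
        }
        return CompletableFuture.runAsync(task)
                .whenComplete((r, t) -> slots.release()); // free the slot on completion
    }

    public static void main(String[] args) {
        MaxInvocationsSketch client = new MaxInvocationsSketch(100);
        for (int i = 0; i < 1_000; i++) {
            while (true) {
                try {
                    client.invokeAsync(() -> { }).join(); // join keeps the sketch deterministic
                    break;
                } catch (RuntimeException overload) {
                    // Back off and retry, as the client test code does.
                }
            }
        }
        System.out.println("completed 1000 invocations");
    }
}
```

The point of the cap is that rejected callers pay the cost of retrying, so pending work on the client can never grow past the configured bound.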


Danny-Hazelcast (Member, author) commented Jan 25, 2017

From Asım Arslan: "My initial findings show that -Dhazelcast.client.max.concurrent.invocations=100 is not working properly. I can reproduce that backpressure is not working and is causing the OOME."


asimarslan (Member) commented Jan 26, 2017

After some analysis we can reproduce that client backpressure is broken. I'll link this issue to the original issue and continue from there.

Serialization, near cache, and JCache statistics make the problem apparent.

#8568

@asimarslan asimarslan modified the milestones: 3.8.1, 3.8 Jan 27, 2017

bwzhang2011 commented Feb 21, 2017

@asimarslan, any update on this issue?

sancar added a commit to sancar/hazelcast that referenced this issue Mar 13, 2017
CallIdSequence will be completed to accept the next invocation only when
the callbacks running on internal executors have completed.

A second change was made to achieve back pressure safely: if the response
is already available when andThen is called with an internal callback,
the internal callback runs on the calling thread instead of on the executor.
Since blocking calls are already not permitted on internal threads,
this achieves natural backpressure.

fixes hazelcast#9665
fixes hazelcast#8568
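The second change in the commit message can be sketched as follows. This is an illustrative model built on `CompletableFuture` with made-up names, not Hazelcast's actual classes: if the response is already there at registration time, the callback runs inline on the calling thread; otherwise it is handed to the internal executor. Making the caller pay for completed responses is what throttles how fast new callbacks can pile up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/**
 * Illustrative sketch of the andThen change: names are hypothetical,
 * not Hazelcast's API. Completed responses run the callback on the
 * calling thread; pending ones go to the internal executor.
 */
public class AndThenSketch {
    static <T> void andThen(CompletableFuture<T> future,
                            Consumer<T> callback,
                            ExecutorService internalExecutor) {
        if (future.isDone()) {
            // Response already available: run inline, so the caller pays
            // the cost instead of flooding the executor's queue.
            callback.accept(future.join());
        } else {
            future.thenAcceptAsync(callback, internalExecutor);
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CompletableFuture<String> done = CompletableFuture.completedFuture("response");
        StringBuilder seen = new StringBuilder();
        andThen(done, seen::append, executor); // runs inline, synchronously
        System.out.println(seen);              // prints "response"
        executor.shutdown();
    }
}
```

The inline path is safe precisely because of the invariant the commit message cites: internal threads are already forbidden from blocking, so running the callback on the caller cannot deadlock and naturally slows down an over-eager invoker.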