Ring buffer OutOfMemoryError: GC overhead limit exceeded #10189

Closed
Danny-Hazelcast opened this issue Mar 30, 2017 · 5 comments

Comments

@Danny-Hazelcast (Member) commented Mar 30, 2017

version 3.8.1

This ringbuffer test failed with an OOME for the first time. We have had successful runs of this test before, using the same version, 3.8.1.
I think the difference is that this run was 15 minutes longer than the previous ones, for a total duration of 30 minutes.

Build
https://hazelcast-l337.ci.cloudbees.com/view/stable/job/stable-All/40/console

/disk1/jenkins/workspace/stable-All/3.8.1/2017_03_29-23_49_29/stable/ring Failed

http://54.163.63.218/~jenkins/workspace/stable-All/3.8.1/2017_03_29-23_49_29/stable/ring

Failure: OOME

HzMember1HZ _ring_ring_read hzcmd.ring.ReadOne threadId=1 java.lang.Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded 

8 boxes, aws-ec2 c4.2xlarge

D4M4C8 (dedicated member box 4, members 4, client 8)

memberOps "-Xms2G -Xmx2G"
clientOps "-Xms200M -Xmx200M"

Hprof java.lang.OutOfMemoryError: GC overhead limit exceeded

http://54.163.63.218/~jenkins/workspace/stable-All/3.8.1/2017_03_29-23_49_29/stable/ring/HzMember1HZ.hprof.zip
or
http://54.163.63.218/~jenkins/workspace/stable-All/3.8.1/2017_03_29-23_49_29/stable/ring/output/HZ/HzMember1HZ/HzMember1HZ.hprof

GC charts
http://54.163.63.218/~jenkins/workspace/stable-All/3.8.1/2017_03_29-23_49_29/stable/ring/gc.html

[Image: Member1 GC chart]

Test Config
https://github.com/hazelcast/hzCmd-bench/tree/zeta/lab/hz/stable/ring

Although it's named async, we do wait for the result after the call.
https://github.com/hazelcast/hzCmd-bench/blob/zeta/src/main/java/hzcmd/ring/AddAsync.java#L26
https://github.com/hazelcast/hzCmd-bench/blob/zeta/src/main/java/hzcmd/ring/ReadOne.java
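A minimal sketch of why an "async" add that is immediately waited on behaves synchronously. `AsyncRing` is a hypothetical stand-in, not the Hazelcast `Ringbuffer` API and not the hzCmd-bench code: its `addAsync` only assigns a sequence number and returns a `CompletableFuture`. Blocking on the future right after the call means each producer thread has at most one add in flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an asynchronous ringbuffer; not the Hazelcast API.
// It only assigns sequence numbers, which is enough to show the call pattern.
class AsyncRing {
    private final AtomicLong tail = new AtomicLong(-1);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Completes later, on the executor thread, with the sequence assigned to the item.
    CompletableFuture<Long> addAsync(String item) {
        return CompletableFuture.supplyAsync(() -> tail.incrementAndGet(), executor);
    }

    void shutdown() {
        executor.shutdown();
    }
}

class AddAndWait {
    // "Async" in name only: join() blocks until the add completes,
    // so the caller never has more than one add outstanding.
    static long addAndWait(AsyncRing ring, String item) {
        return ring.addAsync(item).join();
    }

    public static void main(String[] args) {
        AsyncRing ring = new AsyncRing();
        for (int i = 0; i < 3; i++) {
            System.out.println(addAndWait(ring, "msg-" + i)); // prints 0, then 1, then 2
        }
        ring.shutdown();
    }
}
```

Because the producer waits for each add, it cannot run far ahead of the reader, which matters for when the store's `load` is ever called.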

It looks like a build-up of add messages.

[Screenshot 2017-03-30 at 18.30.37]

@tkountis (Contributor) commented Mar 30, 2017

[Image: ringbuffer-mat]

This seems to be the ringbuffer's store, not the ringbuffer itself. I checked the Store implementation and it's a plain HashMap that just accumulates items (https://github.com/hazelcast/hzCmd-bench/blob/zeta/src/main/java/hzcmd/ring/Store.java), so the OOME is expected and outside our control.
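A minimal sketch of this failure mode, assuming a write-through setup in which every add is also handed to the store. `SimpleRingbufferStore` is a local interface that mirrors the method shape of Hazelcast's `RingbufferStore` (`store`, `storeAll`, `load`, `getLargestSequence`) so the example compiles on its own; `HashMapStore` and `BoundedRing` are hypothetical names, and this is not the actual hzCmd-bench `Store`. The ring overwrites its oldest slot once it is full, but the HashMap-backed store never evicts anything.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the method shape of Hazelcast's RingbufferStore<T>;
// declared here only so the sketch compiles without Hazelcast on the classpath.
interface SimpleRingbufferStore<T> {
    void store(long sequence, T data);
    void storeAll(long firstItemSequence, T[] items);
    T load(long sequence);
    long getLargestSequence();
}

// Keeps every item ever stored in a plain HashMap; nothing is ever evicted.
class HashMapStore<T> implements SimpleRingbufferStore<T> {
    private final Map<Long, T> items = new HashMap<>();
    private long largest = -1;

    @Override
    public void store(long sequence, T data) {
        items.put(sequence, data);
        largest = Math.max(largest, sequence);
    }

    @Override
    public void storeAll(long firstItemSequence, T[] batch) {
        for (int i = 0; i < batch.length; i++) {
            store(firstItemSequence + i, batch[i]);
        }
    }

    @Override
    public T load(long sequence) {
        return items.get(sequence);
    }

    @Override
    public long getLargestSequence() {
        return largest;
    }

    int size() {
        return items.size();
    }
}

// Simplified fixed-capacity ring that writes every add through to its store.
class BoundedRing<T> {
    private final Object[] slots;
    private final SimpleRingbufferStore<T> store;
    private long tail = -1;

    BoundedRing(int capacity, SimpleRingbufferStore<T> store) {
        this.slots = new Object[capacity];
        this.store = store;
    }

    long add(T item) {
        long seq = ++tail;
        slots[(int) (seq % slots.length)] = item; // the ring overwrites its oldest slot...
        store.store(seq, item);                    // ...but the store keeps everything
        return seq;
    }

    int capacity() {
        return slots.length;
    }
}

class StoreGrowthDemo {
    public static void main(String[] args) {
        HashMapStore<String> store = new HashMapStore<>();
        BoundedRing<String> ring = new BoundedRing<>(1_000, store);
        for (int i = 0; i < 100_000; i++) {
            ring.add("msg-" + i);
        }
        // prints "ring capacity=1000, store size=100000"
        System.out.println("ring capacity=" + ring.capacity() + ", store size=" + store.size());
    }
}
```

The ring's memory is bounded by its capacity, while the store grows with the number of adds, so the longer the run, the larger the heap footprint. This is consistent with the failure showing up only on the longer 30-minute run.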

@Danny-Hazelcast please confirm and update the issue accordingly.

@tkountis self-assigned this Mar 30, 2017
@Danny-Hazelcast (Member, Author) commented Mar 31, 2017

Also, there are 500 ringbuffers used in the test.

@tkountis (Contributor) commented Mar 31, 2017

We discussed this offline. The store's load is used when a sequence is not found in the ringbuffer: the ringbuffer then attempts to load the item from the back-end store. This happens when a reader falls behind because the producer is significantly faster. In your use case the producer and consumer are pretty much in sync, so load shouldn't be called, yet the store still accumulates items. Ideally, a store should be disk based, e.g. a database. You could batch writes to avoid hitting the DB every time you add an item, but you still wouldn't have to keep everything in memory. Having said that, a store's implementation is not under our control, so users can implement it however they want.
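The read path and the batching idea described above can be sketched as follows. This is a simplified model, not Hazelcast's implementation: `RingWithStore`, `BatchingStore` and `FakeBackend` are hypothetical names, sequences are assumed to start at 0 and arrive in order, and `FakeBackend` keeps its rows in a list only so the sketch runs on its own (a real backend would be a database or disk, off the heap). Reads of sequences still in the ring are served from memory; only a reader that has fallen behind the head sequence triggers a `load`. The store holds at most `batchSize` pending items before writing them to the backend in one call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical backend standing in for a database or disk. Rows are kept in a
// list only so this sketch is self-contained; a real backend lives off the heap.
class FakeBackend {
    final List<String> rows = new ArrayList<>();
    int writeCalls = 0;

    void writeBatch(List<String> batch) {
        writeCalls++; // one backend round trip per batch
        rows.addAll(batch);
    }

    String read(long sequence) {
        return rows.get((int) sequence);
    }
}

// Buffers at most batchSize items in memory, then writes them in one call.
class BatchingStore {
    private final FakeBackend backend;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private long flushedUpTo = -1; // highest sequence already in the backend

    BatchingStore(FakeBackend backend, int batchSize) {
        this.backend = backend;
        this.batchSize = batchSize;
    }

    // Assumes sequences arrive in order, starting at 0.
    void store(long sequence, String item) {
        pending.add(item);
        if (pending.size() >= batchSize) flush();
    }

    void flush() {
        if (pending.isEmpty()) return;
        backend.writeBatch(new ArrayList<>(pending));
        flushedUpTo += pending.size();
        pending.clear();
    }

    String load(long sequence) {
        if (sequence <= flushedUpTo) return backend.read(sequence);
        return pending.get((int) (sequence - flushedUpTo - 1)); // not flushed yet
    }

    int pendingSize() {
        return pending.size();
    }
}

// Fixed-capacity ring that writes every add through to the store.
class RingWithStore {
    private final String[] slots;
    private final BatchingStore store;
    private long tail = -1;
    int loadCalls = 0;

    RingWithStore(int capacity, BatchingStore store) {
        this.slots = new String[capacity];
        this.store = store;
    }

    long add(String item) {
        long seq = ++tail;
        slots[(int) (seq % slots.length)] = item; // overwrites the oldest item
        store.store(seq, item);
        return seq;
    }

    long headSequence() {
        return Math.max(0, tail - slots.length + 1);
    }

    // In-sync readers are served from the ring; only a reader that has fallen
    // behind the head sequence triggers a load from the store.
    String readOne(long sequence) {
        if (sequence < 0 || sequence > tail) {
            throw new IllegalArgumentException("no item at sequence " + sequence);
        }
        if (sequence >= headSequence()) return slots[(int) (sequence % slots.length)];
        loadCalls++;
        return store.load(sequence);
    }
}
```

For example, with a capacity of 10, a batch size of 100 and 250 adds, `readOne(249)` is served from the ring without any `load` call, `readOne(5)` falls through to the backend, and after two batch writes the store holds only 50 pending items in memory.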

@Danny-Hazelcast (Member, Author) commented Mar 31, 2017

Next version of the store:

https://github.com/hazelcast/hzCmd-bench/blob/zeta/src/main/java/hzcmd/ring/Store.java

Better-looking GC:
http://54.163.63.218/~jenkins/workspace/stable-x/3.8.1/2017_03_31-12_06_09/ring/gc.html

[Image: Member1 GC chart]

And the charts:

[Image: _ring_add throughput]

[Image: ring add 99th percentile latency]

[Image: ring read throughput]

[Image: ring read 99th percentile latency]
