Split-brain messages still occupy memory after recovery #10325

Closed
sjursky opened this issue Apr 12, 2017 · 2 comments

@sjursky commented Apr 12, 2017

Hi,
we have a Hazelcast v3.8 cluster of around 80 nodes distributed over 4 servers (2 servers with 24 nodes each, 2 with 16).
The cluster usually splits apart during server-by-server redeployment of the applications and recovers after a while.

When we analyzed our latest heap dump (heap ~97% full), we found ~66k instances of com.hazelcast.internal.cluster.impl.SplitBrainJoinMessage occupying 616 MB. That's a lot.

Each message occupies about 9 KB, and most of that (about 8 KB) is the memberAddresses field, an ArrayList<com.hazelcast.nio.Address> with 80 items (one per node).

These messages cannot even be garbage collected, because they are referenced from the thread com.hazelcast.internal.partition.impl.MigrationThread.
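To confirm that the messages are really retained (and not just garbage waiting for the next collection), a heap dump can be taken with only live objects, which are the objects still reachable from GC roots. A minimal sketch using the JDK's HotSpotDiagnosticMXBean, assuming a HotSpot-based JVM such as the Oracle JDK 7 listed below; the class name, file prefix and output directory are hypothetical:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Hazelcast.
public class HeapDumper {
    private static final String HOTSPOT_BEAN_NAME = "com.sun.management:type=HotSpotDiagnostic";

    // Writes an .hprof heap dump of the current JVM. With live=true only objects
    // reachable from GC roots are dumped, so anything left in it is truly retained.
    static File dump(File dir, String prefix, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(), HOTSPOT_BEAN_NAME,
                HotSpotDiagnosticMXBean.class);
        // dumpHeap fails if the target file already exists, so add a timestamp.
        File out = new File(dir, prefix + "-" + System.currentTimeMillis() + ".hprof");
        bean.dumpHeap(out.getAbsolutePath(), live);
        return out;
    }

    public static void main(String[] args) throws IOException {
        File f = dump(new File(System.getProperty("java.io.tmpdir")), "hz-member", true);
        System.out.println("Heap dump written to " + f + " (" + f.length() + " bytes)");
    }
}
```

From outside the process, jmap -dump:live,format=b,file=<path> <pid> produces the same kind of dump; either file can then be opened in Memory Analyzer to inspect the paths to GC roots.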

We have also seen this without any server restart: after 26 h of uptime, with no restarts and only some heavy tests, there were 18k SplitBrainJoinMessage instances taking 167 MB of heap. There are no Hazelcast errors in the logs other than "MonitorInvocationsTask/BroadcastOperationControlTask delayed" right before the heap dump was created.
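A quick back-of-the-envelope check of the figures reported above, as a small Java sketch (class and method names are hypothetical; 1 MB is taken as 1024 KB):

```java
import java.util.Locale;

// Hypothetical helper: sanity-checks the heap-dump figures from this report.
public class JoinMessageFootprint {
    // Average retained size per message in KB, from a dump's total size and instance count.
    static double avgKbPerMessage(double totalMb, long messageCount) {
        return totalMb * 1024.0 / messageCount;
    }

    // Approximate bytes per Address entry in the memberAddresses list.
    static double bytesPerAddress(double listKb, int memberCount) {
        return listKb * 1024.0 / memberCount;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "66k dump: %.2f KB/message%n", avgKbPerMessage(616, 66_000));
        System.out.printf(Locale.ROOT, "18k dump: %.2f KB/message%n", avgKbPerMessage(167, 18_000));
        System.out.printf(Locale.ROOT, "memberAddresses: %.1f bytes/Address%n", bytesPerAddress(8, 80));
    }
}
```

Both dumps work out to roughly 9.5 KB per message, in line with the ~9 KB measured per message, and the 8 KB address list comes to about 100 bytes per Address. Because that list holds one entry per member, the per-message footprint grows with cluster size as well as with the number of retained messages.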

How can we debug this, or clean up these messages from the heap?
If you need more information, please let me know.

Thanks in advance.
Regards

Stanislav

Environment:
Hazelcast 3.8
Cluster of 80 members
0 clients; each web application acts as a cluster member.
JDK 7u75
Hazelcast-related JVM params: -Djava.net.preferIPv4Stack=true -DscMulticastAddress=231.12.21.132 -DscMulticastPort=50000 -DscIpTtl=1
OS: Oracle Linux 6.1
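For the question above about how to debug this, a lighter alternative to full heap dumps is to sample the live instance count of SplitBrainJoinMessage over time and correlate it with redeployments. A minimal sketch, assuming JDK 8 or later, where the DiagnosticCommand MBean exposes the same class histogram as jcmd <pid> GC.class_histogram (on JDK 7, jmap -histo:live <pid> prints an equivalent histogram from outside the process); the class name is hypothetical:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper, not part of Hazelcast. Requires JDK 8+.
public class ClassHistogramProbe {
    // Runs GC.class_histogram in-process (live objects only, so it triggers a full GC)
    // and returns the instance count for one class, or 0 if the class is absent.
    static long instanceCount(String className) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName dcmd = new ObjectName("com.sun.management:type=DiagnosticCommand");
        String histogram = (String) server.invoke(dcmd, "gcClassHistogram",
                new Object[] {new String[0]},
                new String[] {String[].class.getName()});
        for (String line : histogram.split("\n")) {
            // Data rows look like "  1:  12345  678900  java.lang.String";
            // newer JDKs append the module name as an extra column.
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 4 && cols[0].endsWith(":") && cols[3].equals(className)) {
                return Long.parseLong(cols[1]);
            }
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        String cls = args.length > 0 ? args[0]
                : "com.hazelcast.internal.cluster.impl.SplitBrainJoinMessage";
        System.out.println(cls + ": " + instanceCount(cls) + " live instances");
    }
}
```

Because the histogram forces a full GC, it is best run at a low rate on a production member, for example once per redeployment step.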

@mdogan (Contributor) commented Apr 12, 2017

Hi @sjursky,

Can you please attach a screenshot from the heap dump analysis showing how MigrationThread keeps references to SplitBrainJoinMessages? I'm still struggling to figure out how that happens; a concrete sample will make it easier for us.

@sjursky (Author) commented Apr 12, 2017

Hi,
here are the merged paths to GC roots from Memory Analyzer:

[Screenshot: merge_paths]

I will try to track down the MigrationThread reference. For now, the screenshots from the 18k-message heap dump show com.hazelcast.spi.impl.operationexecutor.impl.PartitionOperationThread as the referencing thread:

[Screenshot: reference]

and the ones from the 66k-message heap dump show the java.lang.Thread hz._hzInstance_1_UNIUS_APP.MulticastThread:

[Screenshot: reference_multicastthread]
