Faster Map Statistics #5905
Management Center is unstable when showing the cluster view. I started a 3-member cluster (each member on a different machine), connected it to Management Center running on another machine, and started loading the cluster with data (only imap.set calls).
Aim was to load 30 million entries, amounting to ~22GB in total.
About 15 million entries into the loading process, ManCenter becomes unstable: 2 out of 3 nodes disappear for a few seconds, and a red warning sign flashes at the top of the screen that says "Instance Connection Warning: xxxxxxxxx". Please see attached screenshot MC12.
Clicking on Map shows an entirely blank statistics table. I stayed on this screen for half a minute but nothing changed. See attached screenshot MC13.
After a few more minutes, all three nodes came back on the Home page, see MC14.
A few seconds later, another member disappears from the Map screen and the red warning starts flashing again, see MC16. After a few more seconds, everything disappears, see MC17.
And Management Centre stops showing anything from this point onwards, even on Home, see MC18.
Do note that all this while, all the members stayed connected with each other: no disconnections, a highly stable network, and nothing abnormal in the logs. The machine that runs Management Centre was also in the same network as the member machines. The important thing is that Management Centre never recovered from this situation.
I have also attached some of the Health Monitor logs from all the members, taken while the members kept dropping out of and coming back into ManCentre.
Reproduction is very easy: just start a 3-node cluster and load 30 million entries of 1KB each.
Two more issues with Management Centre:
Only started 6 server nodes with partition count set to 4099.
For the first part, this was a known issue (reported before in #4895), but it had never been tested with this many entries.
This is not about Management Center; it is about how we collect map statistics. Currently, some map stats (like cost and last access time) are calculated by traversing the map entries. When the entry count is too high, the calculation takes longer than the time interval reserved for the thread that sends state to Management Center.
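To illustrate the difference between computing a statistic by traversal and maintaining it incrementally, here is a minimal sketch in plain Java. It is not Hazelcast's implementation: the class `TrackedMap`, its methods, and the simplification that "cost" means the byte length of the value are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Hazelcast's record store.
// "Cost" is simplified here to the byte length of each value.
class TrackedMap {
    private final Map<String, byte[]> entries = new HashMap<>();
    private long runningCost;     // adjusted on every write, O(1) to read
    private long lastAccessTime;  // adjusted on every read or write

    void set(String key, byte[] value, long now) {
        byte[] old = entries.put(key, value);
        if (old != null) {
            runningCost -= old.length; // the replaced value no longer counts
        }
        runningCost += value.length;
        lastAccessTime = now;
    }

    byte[] get(String key, long now) {
        byte[] value = entries.get(key);
        if (value != null) {
            lastAccessTime = now;
        }
        return value;
    }

    void remove(String key) {
        byte[] old = entries.remove(key);
        if (old != null) {
            runningCost -= old.length;
        }
    }

    // Traversal-based statistic: visits every entry, so it is O(n).
    long costByTraversal() {
        long total = 0;
        for (byte[] value : entries.values()) {
            total += value.length;
        }
        return total;
    }

    // Incrementally maintained statistic: O(1) regardless of entry count.
    long costIncremental() {
        return runningCost;
    }

    long lastAccessTime() {
        return lastAccessTime;
    }

    public static void main(String[] args) {
        TrackedMap map = new TrackedMap();
        for (int i = 0; i < 1000; i++) {
            map.set("key-" + i, new byte[1024], i);
        }
        System.out.println("traversal:   " + map.costByTraversal());
        System.out.println("incremental: " + map.costIncremental());
    }
}
```

Both methods return the same number, but only the traversal grows with the entry count. The trade-off is a small amount of extra bookkeeping on every write.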
@bilalyasar has done some tests and saw that we supported around 10 million entries before the fix (emrahkocaman@c6d3195).
Map stats calculation mechanism needs to be changed to fix this issue completely and I think this requires a PRD and can't be fixed within the scope of this issue. So for now, we can say that ~27 million entries is the upper limit.
For the second part, @bilalyasar is trying to reproduce your case, will update here once we succeed.
Traversing that much content can cause a performance hit on the system, since one or more cores will be busy. I don't know which thread this is done on, but if it is done on a partition thread, that thread can be blocked for a long time, which can cause throughput/latency issues.
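The blocking concern can be sketched with a single-threaded executor standing in for a partition thread. This is an assumption-laden illustration in plain Java, not Hazelcast's threading model: `PartitionThreadDemo` and `run` are hypothetical names, and the "stats" task is just a loop over an array.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a partition thread: one thread, FIFO task queue.
class PartitionThreadDemo {

    // Returns the order in which the two tasks completed.
    static List<String> run(int entryCount) {
        ExecutorService partitionThread = Executors.newSingleThreadExecutor();
        List<String> completed = Collections.synchronizedList(new ArrayList<>());
        long[] data = new long[entryCount];
        try {
            // Long-running statistics traversal, queued first.
            Future<?> stats = partitionThread.submit(() -> {
                long sum = 0;
                for (long value : data) {
                    sum += value;
                }
                completed.add("stats");
            });
            // A cheap map operation queued behind it must wait for the traversal.
            Future<?> op = partitionThread.submit(() -> {
                completed.add("set");
            });
            stats.get();
            op.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            partitionThread.shutdown();
        }
        return new ArrayList<>(completed);
    }

    public static void main(String[] args) {
        System.out.println(run(10_000_000));
    }
}
```

Because the executor has only one thread, the cheap operation cannot start until the traversal finishes, so its latency grows with the number of entries traversed.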
@wildnez, for the case
I reproduced your case and discussed with @emrahkocaman.
When we try
I also tried with another application, an empty Java application. When I set