Too much cpu used when Hazelcast is idle V 3.6.2 #7943
Comments
What type of serialization do you use? Are there any complex objects involved? I guess you utilize Hazelcast Session Replication or just Shiro? |
I am using Java serialization. It's not a complex object; it's a simple object with a few fields holding user details. |
Any chance to share some testcode that reproduces the issue? If you can't share it privately you could also send it via mail. |
Ok I will send through mail. |
Hi @noctarius I have sent mail to you please verify and let me know if you need any more help. |
@noctarius This has become a blocker for us. Are there any config changes I need to make? |
@noctarius please switch to 3.7 and add the following:
This should give detailed information about what is happening; in particular, it should give some insight into operation executions. @NiranjanBS I'm not that familiar with the tool you are using. It doesn't have to mean that Hazelcast has high CPU usage; it can also be that the whole system isn't doing much, so the threads that do a little bit of work appear to consume most of the CPU time. This is all very relative. Imagine a room of super lazy programmers who sit behind their keyboards 1 minute a week and spend the rest of the time playing football. If one of these programmers sits behind his keyboard for 2 minutes, he seems overloaded compared to his colleagues, but effectively he is busy only a fraction of the time. If you look at one of the graphs, you see that the 'acceptor thread' is doing a lot of work, just like the regular IO threads. But it is very unlikely that the acceptor thread is doing work all the time, so my guess is that the system is quite idle. But perhaps I'm wrong. Can you check with e.g. htop and see if your cores are really busy? |
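One way to cross-check the profiler graphs from inside the JVM is to sample per-thread CPU time twice and compare. The following is a minimal sketch using only the standard `java.lang.management` API; the class name `ThreadCpuSnapshot` and the 5-second sampling window are illustrative choices, not something from this thread. It only sees threads of the JVM it runs in, so it would have to be called from within the Tomcat/Hazelcast process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: samples CPU time per live thread of the current JVM.
final class ThreadCpuSnapshot {

    /** Returns accumulated CPU time in nanoseconds, keyed by "threadName#id". */
    static Map<String, Long> cpuNanosByThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new LinkedHashMap<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return result; // measurement not available on this JVM
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread has died
            if (info != null && cpu >= 0) {
                result.put(info.getThreadName() + "#" + id, cpu);
            }
        }
        return result;
    }

    /** CPU nanoseconds consumed between two snapshots; new threads count from 0. */
    static Map<String, Long> cpuDelta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> delta = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long start = before.get(e.getKey());
            delta.put(e.getKey(), e.getValue() - (start == null ? 0L : start));
        }
        return delta;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> before = cpuNanosByThread();
        Thread.sleep(5000); // sampling window
        Map<String, Long> delta = cpuDelta(before, cpuNanosByThread());
        // Print the ten busiest threads over the window.
        delta.entrySet().stream()
                .sorted((x, y) -> Long.compare(y.getValue(), x.getValue()))
                .limit(10)
                .forEach(e -> System.out.printf("%-50s %d ms%n",
                        e.getKey(), e.getValue() / 1_000_000));
    }
}
```

If a thread accumulates close to 5000 ms over the 5-second window, it is genuinely occupying a core; a few milliseconds means it only looks busy relative to its idle neighbours.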
@pveentjer @noctarius Today I tested by running only the cache server. No client was connected and nothing else was running, but after 2 hours the CPU still goes to 100%. Once it reaches 100% it never comes down. |
@NiranjanBS thanks for the update, the CPU usage comes from Tomcat process so it looks like there's something wrong on the Tomcat/Shiro/Hazelcast-client side (if I remember correctly you mentioned you run Hazelcast in client-server mode, is this accurate?). Can you please provide more details on your use case so we can setup a lab to reproduce the issue?
|
We have one dedicated web application in which we initialize the cache and load the user details and Shiro session string. We use Hazelcast clients to interact with the Hazelcast server. In the above scenario we had just started the cache application without starting any other clients; basically there was no client interaction, and data was loaded into the map through the MapStore only on startup. Shiro version: 1.2.3 |
@NiranjanBS it would be great if you could also email me the code. |
@vbekiaris |
@vbekiaris @noctarius @pveentjer I tested by creating a simple cache server without connecting any client, and I am still facing the same issue. We were supposed to go live next week, but Hazelcast has become a big blocker for us. Versions are as follows. I have attached the complete source code here. |
@NiranjanBS would it be possible to switch to 3.7-SNAPSHOT and add the following parameters:
And rerun your test? This should give us a lot more detailed information about what is happening. What confuses me is why only the IO threads are doing something. The other thing that confuses me is why the acceptor thread is doing so much work according to your screenshots. Once the connections are established, it should be dormant. Dumb question: do you create a new HZ client instance all the time? |
@pveentjer We have already added the mentioned parameters (3.6.2) while running the test. PFA jvm option for your reference. |
These properties only work with 3.7. I have improved the performance logging system in 3.7 to expose a lot more info. |
@pveentjer do you make new HZ client instance all the time?
|
@NiranjanBS no; creating new clients would be disastrous for your performance :) I'm looking forward to the log file. What I find strange is that according to the following graph: is the following: Anyhow... the performance log files will provide some insights in what is happening. Especially on the networking level. |
I have started testing on 3.7 by passing the -Dhazelcast property to Tomcat. |
You should find these new files in the working directory of your application; see 'user.dir' in the System properties. The files should be called performance-xxx.log and performance-client-xxx.log. |
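For anyone unsure where that working directory is under Tomcat, a small sketch like the one below can print it and list matching files. It assumes only the file name pattern quoted above (`performance-xxx.log`, `performance-client-xxx.log`); the class name `PerformanceLogFinder` is made up for illustration, and the code has to run inside the same JVM (or be started from the same directory) to report the right `user.dir`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: locates performance log files in a directory.
final class PerformanceLogFinder {

    /** True for names such as performance-123.log or performance-client-123.log. */
    static boolean isPerformanceLog(String fileName) {
        return fileName.startsWith("performance-") && fileName.endsWith(".log");
    }

    /** Lists matching regular files in the given directory, sorted by path. */
    static List<Path> find(Path dir) throws IOException {
        List<Path> logs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && isPerformanceLog(p.getFileName().toString())) {
                    logs.add(p);
                }
            }
        }
        Collections.sort(logs);
        return logs;
    }

    public static void main(String[] args) throws IOException {
        // 'user.dir' is the JVM's working directory at startup.
        Path workingDir = Paths.get(System.getProperty("user.dir"));
        System.out.println("Working directory: " + workingDir.toAbsolutePath());
        for (Path log : find(workingDir)) {
            System.out.println(log + " (" + Files.size(log) + " bytes)");
        }
    }
}
```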
OK, it is generating the performance-xxx.log files. Give me some time; once the CPU hits 100% I will share them here. |
Keep it running for at least 10 minutes after the problem appears. |
Yes, sure. |
I have been running the sample code on Windows 7, Tomcat 8.0_30, JDK 1.8.0_77 with Hazelcast 3.6.2 for more than an hour and I don't see any CPU usage from the Tomcat process. Perhaps there is something else in your environment causing the additional CPU usage? Let's wait for the 3.7 performance logs and see if we find anything interesting there. |
@pveentjer @vbekiaris @noctarius With 3.7-SNAPSHOT I am facing the same issue as well. Please find the performance-xxx.log file and screenshot attached below. Heap memory is also increasing continually, as you can see in the screenshot. |
Ya sure I will check. |
@vbekiaris |
Great, I think that if you fire up a visualvm/mission control and check cpu usage, the acceptor thread must be the one causing the excess usage while in/out threads should now be at the bottom end. Can you please confirm? Also, once you conclude your test, can you share the performance log? |
Hi @vbekiaris, yes, you are right: the acceptor thread is now the one taking the most CPU. |
@vbekiaris @pveentjer @noctarius |
Hi @NiranjanBS , we are working on the final fix for this one, which will address excess object allocation (this was introduced as a side-effect of the temporary workaround you are now testing) and also introduce the workaround in acceptor thread as well. |
Hi @NiranjanBS , would you be so kind as to run your test again with the attached Hazelcast binary? hazelcast-3.7-SNAPSHOT.jar.zip. Don't forget to add (or just keep if you already have it) |
Yes, sure, we will check. |
@NiranjanBS, any update on this issue? |
Hi, I have been running the test for 4-5 days. There is no high CPU usage, but I am still facing a memory issue, as you can see in the attachment. @vbekiaris Has the workaround been applied to both server and client selector threads, or only to the server? I ask because I have seen client selector threads using high CPU as well. https://cloud.githubusercontent.com/assets/9413835/14551483/68765360-02f1-11e6-9ae2-cbf598824036.png |
Hi @NiranjanBS , a couple of questions:
The workaround is currently only implemented on the server side, so if you see client side IO threads with high CPU usage, we'll need to apply the workaround there as well. |
Dear all, since it will be some time before Hazelcast 3.7 is released, @vbekiaris, @pveentjer, will the next stable version, 3.6.3, solve this problem? |
Any update on this issue? |
@NiranjanBS, @bwzhang2011 the fix is now merged in master and a backport will be available in 3.x branch shortly. In order to enable the workaround, you need to supply system property @NiranjanBS thanks for your help in locating and fixing this. It would be great if you could use 3.7-SNAPSHOT to run a test with following system properties:
and attach here your performance.log after the test. I guess a 3.7-SNAPSHOT build with the fix included should be available tomorrow on Hazelcast maven snapshot repository. |
@vbekiaris @pveentjer @noctarius @bwzhang2011 @NiranjanBS Is the issue resolved? We are facing a similar issue with 3.6.2. Is there a later version of Hazelcast with the fix for this issue? |
Hi @SomannaDC , yes, it is fixed in 3.6.3 (by #8154) and 3.7 (by #7998). If you believe you are hit by the same JVM/network stack bug, then you should start hazelcast adding the system property If after upgrading & testing with this system property enabled you still see high cpu usage while idle, then please create a new issue for further investigation. |
Hi, I've got the same problem: a very simple project (from the comment on the related issue) reaches 100% CPU usage after 2 hours (+/- a few minutes). Env: Windows 10, JDK build 1.8.0_171-b11; I also tried JDK v71 and v161. In the debugger I see the following when comparing data before and after the CPU spike: Any help with the problem? Or any advice on what else to debug or try changing? |
@Linkedz did you try this advice in the comment above? This property enables the selector issue workaround. |
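For readers wondering what a "selector issue workaround" typically looks like, here is a rough sketch of one common approach to NIO selectors that start returning from `select()` immediately with no ready keys: count consecutive premature returns and, past a threshold, re-register all channels on a fresh selector. This is an assumption about the general technique, not Hazelcast's actual implementation; the class name `SelectorSpinGuard`, the threshold, and the timeout are all hypothetical.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical sketch: detects a spinning selector and rebuilds it.
final class SelectorSpinGuard {
    private final int threshold;       // premature returns tolerated in a row
    private final long timeoutMillis;  // timeout passed to select(timeout)
    private int prematureReturns;

    SelectorSpinGuard(int threshold, long timeoutMillis) {
        this.threshold = threshold;
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Records the outcome of one select(timeoutMillis) call.
     * Returns true when the selector looks broken and should be rebuilt.
     * Note: explicit wakeup() calls also cause early returns, so a real
     * loop would have to exclude those from the count.
     */
    boolean record(int selectedKeys, long elapsedMillis) {
        if (selectedKeys > 0 || elapsedMillis >= timeoutMillis) {
            prematureReturns = 0; // healthy: had work or waited the full timeout
            return false;
        }
        prematureReturns++;
        if (prematureReturns >= threshold) {
            prematureReturns = 0;
            return true;
        }
        return false;
    }

    /** Moves all valid registrations to a new selector and closes the old one. */
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue;
            }
            SelectableChannel channel = key.channel();
            channel.register(fresh, key.interestOps(), key.attachment());
            key.cancel();
        }
        old.close();
        return fresh;
    }
}
```

In an IO loop, the thread would time each `selector.select(timeoutMillis)` call, pass the result to `record(...)`, and swap in `rebuild(selector)` whenever it returns true. Rebuilding allocates new selector and key objects, which is one reason such a workaround is usually opt-in rather than always on.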
I just tried it and it fixed the problem, thanks! |
Nice, thanks for the heads up @Linkedz |
As requested by @noctarius, I am creating a new ticket.
I am using the new 3.6.2 release and it still shows high CPU usage.
Verified on Java 1.7.0_75 and 1.8.0_77.
Tomcat 7.0.68 (Java 1.7), Tomcat 8.0.30 (Java 1.8).
OS: Windows 7.
Use case: we are using Apache Shiro as the security framework, and Hazelcast is being used to store user sessions in a Hazelcast map. We observe high CPU after no user has logged in for around 3 hours.
hazelcast.txt