Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very high CPU load with v0.2.10 #25

Closed
josefzahner opened this issue Sep 7, 2022 · 24 comments
Closed

Very high CPU load with v0.2.10 #25

josefzahner opened this issue Sep 7, 2022 · 24 comments

Comments

@josefzahner
Copy link

As soon as I'm enabling the prometheus homey app, I see a drastically increased CPU load on my homey. Without prometheus I've between 50-100% load (with a few higher peaks), but as soon as I'm enabling prometheus app between 250-300% load! That's way too much and has impact to all my other apps...
Screenshot 2022-09-07 at 21 16 36

@rickardp
Copy link
Owner

rickardp commented Sep 7, 2022

Can you please send a diagnostics report?
Also, can you please install dashboard 10586 and see those graphs to see what is taking time.

I've seen before both that some drivers are spamming device updates but also that the polling loop takes time.

@josefzahner
Copy link
Author

josefzahner commented Sep 7, 2022

I can't create a diagnostic report -> "Entity too large"...
IMG_5AED348ED03A-1

Grafana Dashboard Screenshot, please guide me to the right direction, I've a lot of devices...:
Screenshot 2022-09-07 at 23 03 36

The blue line for "Call rate per device" is the "Sunny Home Manager 2" from the SMA app.

@josefzahner
Copy link
Author

Ok, I retried to create a diagnose report:
515de8e4-130b-42d1-9bbf-d8d1fb3d0e72

@rickardp
Copy link
Owner

rickardp commented Sep 8, 2022

Thanks, I got the report. In the dashboard, there is a device with a really high call rate, (Edit: I now say what you wrote underneath the picture. So the Sunny Home Manager 2 device is probably what is causing this).
I would say in general the issue is that you get up to 25.9 device updates per second, that's a lot of metrics. Also, can you check what the top 3 ones are on "Time spent"?

@josefzahner
Copy link
Author

josefzahner commented Sep 8, 2022

Correct, the highest device from "Call rate per device" is the energy meter for my house and the solar panels. It is a device from the "SMA Energy" app. It generates the following metrics:

Screenshot 2022-09-08 at 22 13 14

"Time spent per device" -> Two of them are old Open Weather Device and one is my Withing scale.
Screenshot 2022-09-08 at 22 06 37

So the question is now, how can we improve the whole thing. 200% more average load is not acceptable for my homey, everything gets slow due to the high load ;-)

EDIT: I've about 130 devices configured.

@rickardp
Copy link
Owner

rickardp commented Sep 8, 2022

I just updated the app with some optimizations to the hot path of the device update callback. I don't expect miracles, but at least there is less code to run during each update.

I think the issue here is that the device gets updated more than 25 times per second. While I could implement some kind of timer back off pattern, I think there is so little code now in the callback that it wouldn't make much of an impact. As far as I know there isn't an API in Homey to limit the update frequency on the listener side, every device update gets passed.

Is there a real need to have this device update this fast? Otherwise a solution might be to request the app developer to update the caps less frequently?

I also note that the polling loop takes a long time. I have had reports of this from other users as well. It seems some Homeys are very slow to respond to some of the APIs for system info. I do not yet know why, further investigation is needed.

@josefzahner
Copy link
Author

With "just updated" you mean the v0.3.1 version?

Which device do you mean by "update so fast", the energy meter? The reason for the updates in 5s interval there is, that you would like to see the energy consumption as fast as possible. Eg. if you use your toast you wanna see that on the energy meter and you don't want to wait 30s...

@rickardp
Copy link
Owner

rickardp commented Sep 8, 2022

Can you please report back if the numbers look the same with the new version? Also, it would be helpful if you could expand the box called "Time spent" to see what is taking the most time.

@josefzahner
Copy link
Author

I've deleted now some unneeded open weather devices and as well the withings scale. I activated again your app with v0.3.1. Let's see in the next few minutes whats happening on the cpu load.

@josefzahner
Copy link
Author

Can you please report back if the numbers look the same with the new version? Also, it would be helpful if you could expand the box called "Time spent" to see what is taking the most time.

I'll do that in a few minutes. Let's wait a bit.

@rickardp
Copy link
Owner

rickardp commented Sep 8, 2022

I've also got around 100+ devices by the way. "Worst" device I see is the Easee load balancer, which peaks at around 0.5 updates per second. After the update I could see the device update time go down by a large percentage, but it is from a very low level.

Interestingly, the system info poll seems to have increased recently, but not tied to an update. I will check it it correlates with a Homey OS update.

@josefzahner
Copy link
Author

Ok - he we go.
Screenshot 2022-09-08 at 23 01 58

Screenshot 2022-09-08 at 23 02 31

Screenshot 2022-09-08 at 23 03 26

@josefzahner
Copy link
Author

I've as well a easee home, but it wasn't connected to my car during the test. 15m load is still at 200%.

So you say one thing is the systeminfo (time spent) and the other culprit will be the "sunny home manager 2" from "call rate per device". An option would be to delete the device for the test - but at the end it doesn't help because i really need this device... to display the energy consumption on my dashboard

@josefzahner
Copy link
Author

josefzahner commented Sep 8, 2022

I've deleted the "Sunny Home Manager 2" device. The deviceupdate call rate went down from 20 to 18.6. Load average is more or less the same. So it doesn't make much difference.

@rickardp
Copy link
Owner

rickardp commented Sep 9, 2022

I think I misread your first picture, sorry about that. The deviceupdate was not 25/sec, that was the stacked total. Then it is more or less the same as mine. I think the main difference between your dashboard and mine is the polling loop (systeminfo). I think I will add some more fine grained metrics on what in that method takes time to try to isolate the problem to which Homey API call is the culprit.

@josefzahner
Copy link
Author

Perfect, thanks a lot for tacking care of it. Any idea's what could cause the long plling loop (systeminfo) time? Shouldn't that be the same for more or less every homey?

@rickardp
Copy link
Owner

rickardp commented Sep 10, 2022

Version 0.3.2 improves self metrics on the polling, and also reduces the storage polling which I suspect could be the culprit. Can be installed as a test version for now https://homey.app/sv-se/app/io.prometheus/Prometheus.io/test/

Shouldn't that be the same for more or less every homey?

It seems to not be the case. I am not sure why. Other people have also reported similar issues, but I have never seen them myself. My suspicion is that it is the call to the getStorageInfo that sometimes cause issues (based on the fact that I can see error reports on this call timing out). Maybe things such as a lot of installed apps, worn-out flash drive or different hardware revisions on the Homey could impact? Anyway, we should see if this is what causes problems for you after this.

Version 0.3.2 also halves the frequency with which getStorageInfo is called, which means if the above is correct, you should see a lower load average

@josefzahner
Copy link
Author

Ok 0.3.2 installed. Load average went down a bit, but is still high:
Screenshot 2022-09-10 at 20 51 32

@josefzahner
Copy link
Author

This is the homey insight graph. Base load is as well massively higher than before enabling prometheus.io. I'm using a real homey Pro (2019), bought end of 2019. About 35 apps installed, but not all enabled. Storage not even half filled.

Screenshot 2022-09-10 at 21 02 18

@rickardp
Copy link
Owner

Thanks for the info. It confirms the storage info call is crazy expensive for some people. Since it is not crucial to update this with high frequency, I am considering setting a higher update interval on this as a solution, at least temporarily until it is possible to figure out why it is so expensive.

@rickardp
Copy link
Owner

0.3.3 significantly lowers the frequency for storage updates. Can you please report how this works, and also please share the update times ("Time spent") for all labels?

@josefzahner
Copy link
Author

v0.3.3 Graph:
Screenshot 2022-09-10 at 21 53 53
Screenshot 2022-09-10 at 21 54 26

@rickardp
Copy link
Owner

Interesting indeed. It looks like the storage info scanning takes ~20 seconds to complete on your device, with a nontrivial amount of kernel time. The API call takes no parameters that lets me control how long this takes. When I first added this feature, it took maybe a second or two on my Homey, but that was on firmware version 2.something. I think the only reasonable thing to do is to leave this interval significantly higher (currently 10 minutes which is what I will keep).

It makes sense that, for example, they added more detailed directory scanning to calculate sizes (which would possibly explain the high kernel time %) in newer versions of the Homey, but I still don't understand why it differs between different Homeys. The only thing I can think of is if some apps are published with significantly more files which makes directory scanning more expensive, but that's just a hypothesis.

Anyway, I will now publish version 0.3.3 and consider this issue to be resolved.

@rickardp
Copy link
Owner

Version 0.3.3 is live, which resolves the high CPU usage by increasing the time interval between polling of storage stats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants