Processor use shows 0% and 100% limits after updating to 2023.11 #103298

Closed
Mariusthvdb opened this issue Nov 3, 2023 · 32 comments · Fixed by #111110

Comments

@Mariusthvdb
Contributor

Mariusthvdb commented Nov 3, 2023

The problem

Since release 2023.11, processor use is suddenly showing 0% and 100% limits. I'm still trying to figure out whether this happens only on restarts or also during runtime.

Screenshot 2023-11-03 at 12 01 53

On my system this never happened before, and the card below always showed the limits as actual percentages.

If anything, this is really annoying when trying to show a card like

Screenshot 2023-11-03 at 12 20 59

rendering that useless really.

It seems to be a bug, because how can usage ever be 0%? The 100% is also unexplained, even upon restart.

What version of Home Assistant Core has the issue?

2023.11

What was the last working version of Home Assistant Core?

2023.10

What type of installation are you running?

Home Assistant OS

Integration causing the issue

system monitor

Link to integration documentation on our website

https://www.home-assistant.io/integrations/systemmonitor/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

https://community.home-assistant.io/t/2023-11-to-do-add-release-title/634647/168?u=mariusthvdb
https://community.home-assistant.io/t/2023-11-to-do-add-release-title/634647/90

@home-assistant

home-assistant bot commented Nov 3, 2023

@Mariusthvdb changed the title from "Processor use shows 0% and 100% limits" to "Processor use shows 0% and 100% limits after updating to 2023.11" on Nov 3, 2023
@EDelsman

EDelsman commented Nov 3, 2023

I see the same on an RPi 4, mostly in the period after a restart. I see no discernible ill effects or performance gain; I'm just curious, as 0 indeed seems unlikely, especially during a reboot. The left side of the graph is before the update to 2023.11, the right side is after.
IMG_0622

@Anto79-ops

Interesting, I reported this in beta chat in discord, as I'm definitely seeing the same.

@Mariusthvdb
Contributor Author

what was the dev response on that?

@stalakerob

Just tried 2023.11.1. Still the same issue.

@zSprawl

zSprawl commented Nov 5, 2023

I’m seeing the same.

I had a random battery sensor show 100.0000000001% too, so it makes me wonder if we got a rounding error or something. Just a guess though.

@N3rdix
Contributor

N3rdix commented Nov 16, 2023

Maybe this is related to some changes in psutil or its buffer in general? I couldn't see a direct trigger, but couldn't the behaviour be improved according to the psutil docs (which recommend calling cpu_percent at least a second time to get accurate results)?

Code:

    elif type_ == "processor_use":
        state = round(psutil.cpu_percent(interval=None))
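
The psutil documentation notes that the first call to `cpu_percent` with `interval=None` returns a meaningless `0.0` that is supposed to be ignored. A minimal sketch of that idea follows. The helper `read_cpu_percent` and its `sample` parameter are hypothetical names for illustration only; they are not part of psutil or of the systemmonitor integration, and this is not the fix that was eventually merged.

```python
from typing import Callable, Optional


def read_cpu_percent(sample: Callable[[], float], attempts: int = 2) -> Optional[int]:
    """Return a rounded CPU percentage, or None if every sample was 0.0.

    `sample` is any zero-argument callable returning a percentage, for
    example `lambda: psutil.cpu_percent(interval=None)`. Assumption: the
    caller supplies it, so this sketch does not import psutil itself.
    """
    for _ in range(attempts):
        value = sample()
        # A reading of exactly 0.0 is treated as "no baseline yet" and skipped.
        if value > 0.0:
            return round(value)
    return None


# Fake sampler standing in for psutil: the first call has no baseline.
readings = iter([0.0, 7.4])
print(read_cpu_percent(lambda: next(readings)))
```

In real use the two samples need some time between them, otherwise the second reading covers almost no interval.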

@issue-triage-workflows

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@Mariusthvdb
Contributor Author

still seeing this

@gjohansson-ST
Member

@Mariusthvdb still seeing exactly what and on which version?

@gjohansson-ST self-assigned this Feb 14, 2024
@gjohansson-ST
Member

The faulty 0 return should be fixed already but not sure why there should be a fault in returning 100?

@Mariusthvdb
Contributor Author

Mariusthvdb commented Feb 14, 2024

I meant: still seeing the exact same behavior from the issue's opening post.

regular 0% (even during runtime, so that can not be correct).
the 100% also seems very unlikely, and has never been returned before, unless truly the case in a looping automation or so.....

but not on each and every startup?

Screenshot 2024-02-14 at 21 48 57

min and max....

but check this, it shows exactly what is the issue:

Screenshot 2024-02-14 at 21 53 35

btw I am running HA OS 2024.2.1

@EDelsman

EDelsman commented Feb 14, 2024

2024.2.1. Two reboots below; after the second one I took care not to do anything special with HA. The reboot was just for the sake of the graph. The period with weird peaks seems to last way longer than the reboot.
IMG_0657

@stalakerob

I'm also seeing CPU use dropping to 0 regularly. Running HA on Proxmox.

cpu_load

@gjohansson-ST
Member

The drop to 0 has been fixed, but I think that fix isn't coming until the next patch release.
However, there is nothing really indicating why it should be rising to 100, so I'm not sure what to do about that.

@Mariusthvdb
Contributor Author

Mariusthvdb commented Feb 15, 2024

could you please explain why we see this from the mentioned update on, as this was never experienced before?

Asking, because if you might believe we had these 100% before, I would have to state that was not the case.
and it is a remarkable change, breaking backend templates/automations etc.

@gjohansson-ST
Member

Regarding the 0 output: it came up as another issue, and I then looked at it briefly in the past code but mainly in the psutil documentation, which clearly says an output of 0 is faulty and should be ignored. So that was a bug that has already been solved (I didn't really look into why this was missed during the implementation of the coordinator).

However with the 100 output I mean I don't know as there is nothing so far I could see which would result in this and I don't think ignoring the 100 is the way to go either.

The frequency of the sensor is every 15s (I believe), could you try to change this to 20/30 or something to see what behavior we get?

@EDelsman

Not hindered by in depth knowledge of how this works, so I may be totally wrong. But if we have unexpected lows and unexpected highs in a time sampled measurement, then my first thought is that the 0 are points where measurements are missed, that are then counted later on at the unexpected highs. In that case, ignoring the 0's and keeping the highs would raise the average percentage, thus misrepresenting the situation.
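
The averaging concern above can be illustrated with a few lines of arithmetic. The numbers are made up purely for illustration (they are not measurements from this thread): a steady true load of 8% where every other sample is missed and its load is attributed to the following sample.

```python
from statistics import mean

# Made-up numbers for illustration only: a steady true load of 8% per
# interval, where every other sample is missed (reported as 0) and its
# load is attributed to the next sample instead.
true_load = [8, 8, 8, 8, 8, 8]
reported = [0, 16, 0, 16, 0, 16]

mean_true = mean(true_load)
mean_reported = mean(reported)
# Dropping the zeros while keeping the doubled highs inflates the average.
mean_without_zeros = mean(v for v in reported if v != 0)

print(mean_true, mean_reported, mean_without_zeros)
```

Under this assumption the average including the zeros matches the true load, while the average with the zeros removed is twice as high.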

This combined with the fact the problem wasn't there before, and that it now only occurs in the hour after a reboot, makes me wonder how that can be. I understand that a reboot causes high cpu usage, but for an hour when there's no actual change to the system?

If 0's are expected from the measurement, then why do the 0's only happen in that first hour? Also, from what I've read, 0 is expected only for the first measurement because there aren't any samples yet. But it's not something you'd ignore on a regular basis. I can imagine that cpu measurement is hard when the system is very busy, but it kind of defeats the purpose of cpu measurement if that is the case. And after an hour the 0's are gone, together with the highs.

@Mariusthvdb
Contributor Author

Mariusthvdb commented Feb 16, 2024

Not yet investigated in depth, but the latest dev build, 2024.3.0.dev20240216, makes the processor use entity go unknown at the moments where we saw 0% before.

Seems hardly an improvement tbh...
no error in log

Screenshot 2024-02-16 at 12 38 12
Screenshot 2024-02-16 at 12 38 18

@gjohansson-ST
Member

unknown is not unavailable

@Mariusthvdb
Contributor Author

Mariusthvdb commented Feb 16, 2024

sorry for that typo. edited that above

hope that is not all of the response though... You do see the issue we're facing here, not sure what else to add now

other than my instance is running for an hour now, and the unknown is still reported

Screenshot 2024-02-16 at 13 14 41

@garry0garry

garry0garry commented Feb 19, 2024

This combined with the fact the problem wasn't there before, and that it now only occurs in the hour after a reboot,

I think the problem can be checked using this script:

SELECT strftime('%Y-%m-%d %H:%M:%f', states.last_updated_ts, 'unixepoch', 'localtime') AS 'Time', states.state AS 'Value',  states_meta.entity_id AS 'Entity' 
FROM states 
JOIN states_meta ON states.metadata_id = states_meta.metadata_id 
WHERE states_meta.entity_id = 'sensor.processor_use' 
AND strftime('%Y-%m-%d %H:%M:%f', last_updated_ts, 'unixepoch', 'localtime') BETWEEN '2024-02-19 12:30:00' AND '2024-02-19 13:30:00'
AND states.state = 'unknown';

Where:
'2024-02-19 12:30:00' - Home Assistant start time
'2024-02-19 13:30:00' - Home Assistant start time + 60 min
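
The same check can be tried against a throwaway database before running it on a real one. This is a sketch under stated assumptions: the two tables contain only the columns the query above references (the real recorder schema has more), `count_unknown` is a hypothetical helper name, and the time window is compared as raw Unix timestamps instead of formatted local-time strings so the result does not depend on the time zone.

```python
import sqlite3


def count_unknown(conn, entity_id, start_ts, end_ts):
    """Count 'unknown' states for one entity between two Unix timestamps."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM states
        JOIN states_meta ON states.metadata_id = states_meta.metadata_id
        WHERE states_meta.entity_id = ?
          AND states.last_updated_ts BETWEEN ? AND ?
          AND states.state = 'unknown'
        """,
        (entity_id, start_ts, end_ts),
    ).fetchone()
    return row[0]


# Throwaway in-memory database with only the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE states_meta (metadata_id INTEGER PRIMARY KEY, entity_id TEXT);
    CREATE TABLE states (state TEXT, last_updated_ts REAL, metadata_id INTEGER);
    INSERT INTO states_meta VALUES (1, 'sensor.processor_use');
    INSERT INTO states VALUES
        ('unknown', 100.0, 1), ('5', 115.0, 1), ('unknown', 130.0, 1);
    """
)
print(count_unknown(conn, "sensor.processor_use", 0, 120))
```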

@garry0garry

Or this automation that writes to the log:

- alias: CPU load
  trigger:
    platform: time_pattern
    minutes: /1
  action:
    - service: system_log.write
      data:
        message: >-
          CPU load {{ states('sensor.processor_use') }}%.
        level: warning

@gjohansson-ST
Member

Hi.
First off I acknowledge the problem and I have the same thing on my prod (but not on dev).
We're not going to roll back but obviously a solution needs to be found somehow.

So it's work in progress to get to the root cause of the issue and get a permanent fix in order to resolve it.

No need for more posts about reproducing the issue unless there are constructive proposals on how to fix it.

Thanks

@gjohansson-ST
Member

So, long story short: psutil became thread-aware, and since cpu percent (among others) isn't implemented to only run in the main thread, it sometimes goes to 0, which is a false value; hence it's setting unknown as the state.
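
To make the effect described above concrete, here is a toy model in which each thread keeps its own previous sample, so the first call from any new thread has no baseline and reports 0.0. This is a simplified illustration, not psutil's actual implementation; `PerThreadCpuPercent` and `fake_times` are hypothetical names, and the counters are fabricated for the demo.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadCpuPercent:
    """Toy model: each thread keeps its own previous (busy, total) sample."""

    def __init__(self, read_times):
        self._read_times = read_times  # callable returning (busy, total)
        self._last = {}                # thread id -> previous sample
        self._lock = threading.Lock()

    def cpu_percent(self):
        busy, total = self._read_times()
        tid = threading.get_ident()
        with self._lock:
            previous = self._last.get(tid)
            self._last[tid] = (busy, total)
        if previous is None:
            return 0.0  # no baseline in this thread yet
        delta_total = total - previous[1]
        if delta_total <= 0:
            return 0.0
        return 100.0 * (busy - previous[0]) / delta_total


# Fake counters: each read advances total by 10 ticks, 2 of them busy.
ticks = {"busy": 0, "total": 0}


def fake_times():
    ticks["busy"] += 2
    ticks["total"] += 10
    return ticks["busy"], ticks["total"]


meter = PerThreadCpuPercent(fake_times)
first = meter.cpu_percent()   # main thread, no baseline yet
second = meter.cpu_percent()  # main thread, baseline exists
with ThreadPoolExecutor(max_workers=1) as pool:
    other = pool.submit(meter.cpu_percent).result()  # new thread, no baseline
print(first, second, other)
```

In this model, always sampling from the same thread avoids the spurious zeros, because only that thread's baseline is ever used.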

A fix is coming (PR has been linked) which I hope can be managed and implemented shortly.

@garry0garry

garry0garry commented Feb 23, 2024

Do I understand correctly that the fix was not included in the 2024.2.3 release?

2024-02-23 23:41:13.532	unknown	sensor.processor_use
2024-02-23 23:41:58.532	5	sensor.processor_use
2024-02-23 23:42:13.532	unknown	sensor.processor_use
2024-02-23 23:42:28.532	14	sensor.processor_use
2024-02-23 23:42:43.533	unknown	sensor.processor_use
2024-02-23 23:43:28.532	5	sensor.processor_use
2024-02-23 23:43:43.532	unknown	sensor.processor_use
2024-02-23 23:45:58.531	7	sensor.processor_use
2024-02-23 23:46:13.533	unknown	sensor.processor_use
2024-02-23 23:46:28.535	8	sensor.processor_use
2024-02-23 23:46:43.532	unknown	sensor.processor_use
2024-02-23 23:47:28.531	7	sensor.processor_use
2024-02-23 23:47:43.532	unknown	sensor.processor_use
2024-02-23 23:48:28.532	8	sensor.processor_use
2024-02-23 23:48:43.531	unknown	sensor.processor_use
2024-02-23 23:49:28.532	8	sensor.processor_use
2024-02-23 23:49:58.533	unknown	sensor.processor_use
2024-02-23 23:50:28.530	8	sensor.processor_use
2024-02-23 23:50:43.532	unknown	sensor.processor_use
2024-02-23 23:51:28.533	8	sensor.processor_use
2024-02-23 23:51:58.531	unknown	sensor.processor_use
2024-02-23 23:52:13.531	8	sensor.processor_use
2024-02-23 23:52:28.532	unknown	sensor.processor_use
2024-02-23 23:52:43.533	8	sensor.processor_use
2024-02-23 23:52:58.530	unknown	sensor.processor_use
2024-02-23 23:53:13.533	8	sensor.processor_use
2024-02-23 23:53:28.531	unknown	sensor.processor_use
2024-02-23 23:53:43.531	7	sensor.processor_use
2024-02-23 23:53:58.532	8	sensor.processor_use
2024-02-23 23:54:13.532	unknown	sensor.processor_use
2024-02-23 23:54:28.531	7	sensor.processor_use
2024-02-23 23:54:43.531	unknown	sensor.processor_use
2024-02-23 23:54:58.530	8	sensor.processor_use
2024-02-23 23:55:13.533	unknown	sensor.processor_use
2024-02-23 23:55:28.531	8	sensor.processor_use
2024-02-23 23:55:43.531	unknown	sensor.processor_use
2024-02-23 23:56:13.531	9	sensor.processor_use
2024-02-23 23:56:28.530	8	sensor.processor_use

@gjohansson-ST
Member

As 2024.2.3 was released yesterday and this was fixed about an hour ago, then yes, it wasn't included.
As beta starts on Wednesday, I'm not sure if there will be another patch before 2024.3 is released, so we'll see.

@erkr

erkr commented Feb 24, 2024

Thanks for fixing; now we just wait for 2024.3.

@Mariusthvdb
Contributor Author

@gjohansson-ST, I do believe the issues are gone! The 0% had already been taken out but was replaced with unknown. The latest PR is in dev, which I just installed; have a look:

Screenshot 2024-02-24 at 20 31 58

the 100% peaks at startup are gone now too.

(regular processor % also went down significantly, but that has to do with improvements elsewhere...)

nice.

@akicker

akicker commented Feb 27, 2024

Getting worse in 2024.2.4: mostly unknown!
image

@gjohansson-ST
Member

The fix isn't coming until 2024.3, so at this point either turn the sensor off or ignore it. There is nothing to do until the next release, which will have this fixed.

@Mariusthvdb
Contributor Author

proof!

Screenshot 2024-02-27 at 16 23 52

@github-actions bot locked and limited conversation to collaborators Mar 28, 2024