High CPU and memory usage under 0.108.0 #33866

Closed
McGiverGim opened this issue Apr 9, 2020 · 82 comments

@McGiverGim

The problem

Since installing HA version 0.108.0, CPU and memory usage on my system keep growing. I have restarted several times (both HA and the Raspberry Pi), but this does not fix anything.

This graph shows the difference between 0.107.7 and 0.108.0:

[Graph: CPU usage (blue) and memory usage (red) before and after upgrading to 0.108.0]

You can see the low CPU usage in blue and the memory, stable at about 30%, in red until 0.108.0 was installed. Since then, both memory and CPU usage grow over time. I have restarted several times without luck, as the graph shows.

My first idea was to go back to 0.107.7, but I prefer to report this here because there may be a bug and I can help by providing information.

Environment

  • Home Assistant Core release with the issue: 0.108.0
  • Last working Home Assistant Core release (if known): 0.107.7
  • Operating environment (Home Assistant/Supervised/Docker/venv): Hass.io
  • Integration causing this issue: Unknown
  • Link to integration documentation on our website: Unknown

Problem-relevant configuration.yaml

I don't know what to attach. My complete configuration?

Traceback/Error logs

I can't see any errors, only the Brother integration failing to find the powered-off printer, but that was the same in 0.107.7.

2020-04-09 08:39:56 ERROR (MainThread) [homeassistant.components.brother] Error fetching brother data: No SNMP response received before timeout
2020-04-09 08:39:56 WARNING (MainThread) [homeassistant.config_entries] Config entry for brother not ready yet. Retrying in 80 seconds.

Additional information

I have some custom integrations, but they were also present in earlier versions:

  • badnest
  • nodered
  • hacs
@andriej
Contributor

andriej commented Apr 9, 2020

It's highly probable that nodered is leaking.

@McGiverGim
Author

I'm testing by disabling the custom integrations.
The docker stats command shows both nodered and homeassistant growing in memory.
If I stop nodered, homeassistant stops growing too. CPU usage stays high with or without nodered.
So it seems nodered is the culprit (most likely), or some sensor produced by nodered causes this.
The strange thing is that this started with the update to 0.108.0; it did not happen in 0.107.7 or any previous version.

@gohm44

gohm44 commented Apr 9, 2020

I don't have any custom integrations like nodered. After upgrading to the new version, the container consumes more resources, eating more CPU and memory every minute. After ~12h I had to restart Home Assistant. I have just upgraded to 0.108.1, but as far as I can see it behaves the same way. My OS: Ubuntu 18.04

The new release introduced jemalloc. Maybe it is our suspect?

Docker images for Home Assistant are now using Jemalloc, to reduce memory fragmentation and speed up memory allocation. So, less memory and generally a faster Home Assistant.


@McGiverGim
Author

Happy to hear that maybe it's not Node-RED related and that I'm not alone with this problem. I think I see it with Node-RED because I have sensors and flows that react to each other, and maybe for some reason they don't behave as they did in the earlier version (maybe some sensor is out of control?).

Like you, I also tested 0.108.1 without luck. I have rolled back to 0.107.7 and now everything works as expected again.

If anyone has an idea, I can upgrade to 0.108.1 again to test it, but until then it is impossible for me to stay on that version.

@Coren4

Coren4 commented Apr 9, 2020

I have already updated to 0.108.1 and I have the same issue.
CPU and memory usage keep getting higher; around 24h after start, both are above 90%.

Before 0.108 my CPU was around 5% on idle, and memory around 45%.

@gohm44

gohm44 commented Apr 10, 2020

Rebooting the server stopped the constantly increasing memory and CPU usage, but usage is still much higher than before.

@AndrzejOlender

I have the same problem. Besides the high CPU and a large delay in HA at that time (automations ran with a few seconds of delay), disk usage jumped at the same time.
Intel NUC, HA Core 0.108.1 in Docker

@Coren4

Coren4 commented Apr 10, 2020

It is still present in 0.108.2.

I use an old 2-core/4 GB RAM/SSD laptop as the host, running Ubuntu 18 Server.

What I use with HA:

  • ESPHome
  • Zigbee2Mqtt
  • Mqtt broker
  • Node red
  • Google drive backup
  • Plex integration
  • Airly integration
  • Brother printer integration
  • UPnP integration to network router

I am using MariaDB in a container next to HA as the database.

@gohm44

gohm44 commented Apr 10, 2020

Actually, I notice that 0.108.2 almost resolves the issue for me. Memory usage is back to its previous level. CPU usage is still higher, but at least constant.

@Coren4

Coren4 commented Apr 10, 2020

> Actually I notice that for me 0.108.2 almost resolve the issue. Memory usage gets back to previous. CPU usage is still higher but at least constant.

I also thought it solved the problem for me, but after an hour everything came back.

@McGiverGim
Author

High CPU usage is a clear symptom that the problem is still there.
While I was starting and stopping nodered to see if things improved, I thought I had fixed it because at one point the memory was stable, but the CPU was high, and some time later the memory started to grow again.

@mountainsandcode
Contributor

Struggling with the same issue

@McGiverGim
Author

@mountainsandcode can you give more information about your system? Are you using unofficial integrations like Node Red?

@mountainsandcode
Contributor

I'm running HASS on a Synology using Docker. I have HACS and two self-written integrations, but they have been running for quite a while. I suspect this may be due to #33882 (comment), as I can also see two Python processes.

@Coren4

Coren4 commented Apr 11, 2020

I don't have Homekit integration, so in my case I don't think it is it.

@haseat

haseat commented Apr 12, 2020

Having the same problem in Home Assistant Core on a Pi 4 since upgrading to 0.108.x, although at a much lower level. As you can see, memory usage goes up until a restart.
[Graph: memory usage rising until each restart]

@Alessandro1981

I have the same issue after updating from 0.107.7 to 0.108.3. After rolling back to 0.107.7 the issue disappeared (memory consumption is flat).

Alessandro

@chosten

chosten commented Apr 12, 2020

Same issue. I'm using Docker on an i5 and I'm not using Node-RED.
The problem appeared with 0.108.0.
I have to restart the container often because all services start to time out.

@AndrzejOlender

I also run HA Core in Docker; maybe this change in the latest release is the cause of the problem?

Docker images for Home Assistant are now using Jemalloc, to reduce memory fragmentation and speed up memory allocation. So, less memory and generally a faster Home Assistant.

@gohm44

gohm44 commented Apr 13, 2020

Since version 0.108.3 everything is back to normal for me.

@McGiverGim
Author

McGiverGim commented Apr 13, 2020

It does not fix the problem for me. I tested 0.108.3 today, and you can again see the memory going up and the CPU going up and down:
[Graph: memory and CPU usage on 0.108.3]

EDIT:
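Editing to add more information. The memory is consumed by the homeassistant container. After the installation:
[Screenshot: homeassistant container memory after installation]

Some time later:
[Screenshot: homeassistant container memory some time later]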

The rest of the add-ons seem to be stable.

Running ps inside the homeassistant container does not reveal anything strange.

@choeflake

choeflake commented Apr 13, 2020

Same here (except for the memory). I first experienced the issue with 0.108.?, then upgraded to 0.108.3: still the same. Now I'm back on 0.107.7 and the issue is gone (no other components were downgraded).
Under 0.108.x, my log was full of events like:

ha             | 2020-04-13 22:22:17 INFO (MainThread) [homeassistant.components.mqtt] Got update for entity with hash: ('binary_sensor', '0x0017880104b53dcc occupancy') '{'payload_on': True, 'payload_off': False, 'value_template': '{{ value_json.occupancy }}', 'device_class': 'motion', 'state_topic': 'zigbee2mqtt/hal_1_sensor_1', 'json_attributes_topic': 'zigbee2mqtt/hal_1_sensor_1', 'name': 'hal_1_sensor_1_occupancy', 'unique_id': '0x0017880104b53dcc_occupancy_zigbee2mqtt', 'device': {'identifiers': ['zigbee2mqtt_0x0017880104b53dcc'], 'name': 'hal_1_sensor_1', 'sw_version': 'Zigbee2mqtt 1.12.2', 'model': 'Hue motion sensor (9290012607)', 'manufacturer': 'Philips'}, 'availability_topic': 'zigbee2mqtt/bridge/state', 'platform': 'mqtt'}'
ha             | 2020-04-13 22:22:17 INFO (MainThread) [homeassistant.components.mqtt] Updating component: binary_sensor.hal_1_sensor_1_occupancy

(not sure which of the two rows is first)

For every entity (that is, the number of devices multiplied by the number of entities per device), these lines are logged every second.

My config: Ubuntu with Docker Compose running mosquitto 1.6.9 and Zigbee2mqtt 1.12.2 (firmware 20200328). Stopping the mosquitto container reduces the CPU usage, and listening on MQTT shows that hundreds of messages per second are being processed.
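To put a number on the flood described above, a small script could count the "Updating component" lines per entity per second. This is a hypothetical helper (count_updates_per_second is not part of Home Assistant), and it assumes the log format shown in the excerpt, including the Docker Compose "ha |" prefix.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the timestamp and the entity id from lines such as:
# "ha | 2020-04-13 22:22:17 INFO (MainThread) [homeassistant.components.mqtt] Updating component: binary_sensor.x"
UPDATE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) INFO \(MainThread\) "
    r"\[homeassistant\.components\.mqtt\] Updating component: (\S+)"
)


def count_updates_per_second(lines: Iterable[str]) -> Counter:
    """Count MQTT 'Updating component' log lines per (second, entity_id) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = UPDATE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing an open log file (for example a hypothetical home-assistant.log) to count_updates_per_second and calling .most_common(10) on the result would list the noisiest second/entity pairs; lines that do not match, such as the "Got update for entity with hash" entries, are ignored.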

@haseat

haseat commented Apr 15, 2020

0.108.4 seems to have fixed the problem for me so far

@andriej
Contributor

andriej commented Apr 15, 2020

@haseat according to the changelog, nothing system-related seems to have changed.
Maybe you updated HACS in the meantime too?

@haseat

haseat commented Apr 15, 2020

@andriej you're right, I totally forgot about that, but I did it right before the 0.108.4 update

@McGiverGim
Author

My latest test with 0.108.3 was with HACS updated to the latest version, and that did not fix the problem for me :(

@McGiverGim
Author

I have noticed that some users use py-spy to analyze what is going on in the system; one example is in this issue:

#34093

Is anyone here able to run it? I don't know anything about Python, so I don't know whether it can be run against a running hass.io/homeassistant release.

@Lawrencezarb

Lawrencezarb commented Apr 15, 2020

I have the same problem, even on 0.108.4. I have reverted to 0.107.7.

@Gunth

Gunth commented Apr 16, 2020

I also have the same issue with version 0.108.5... :-(

@ayufan

ayufan commented Apr 16, 2020

@bieniu

As far as I can see, the sensors request data from the coordinator, which might enqueue multiple data requests. Since the coordinator registers listeners, it is likely that each sensor requests a new await. I have no idea whether this is a problem with Brother or with the DataUpdateCoordinator, but the symptoms are as described here: #33866 (comment). It also seems off that we consume multiple results at the same time, which means that multiple awaits are enqueued and finish at the same time.

Now, after re-testing with the Brother printer powered off and HASS restarted, I clearly see an increase in CPU usage, something that does not happen when the Brother integration is disabled:
[Graph: CPU usage with the Brother printer powered off]

Can you re-test on your side under these conditions? For reference, I run HASS on armv7l, so it might take significantly longer to see the CPU usage increase on much beefier processors (like amd64).
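The pattern suspected above (each listener triggering its own fetch, so several awaits pile up and complete together) is commonly avoided by coalescing concurrent refresh requests onto a single in-flight task. The sketch below is hypothetical: CoalescingFetcher and the fetch callable are illustrative names, and this is not how DataUpdateCoordinator is actually implemented.

```python
import asyncio
from typing import Awaitable, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CoalescingFetcher(Generic[T]):
    """Hypothetical helper: concurrent refresh() calls share one in-flight fetch."""

    def __init__(self, fetch: Callable[[], Awaitable[T]]) -> None:
        self._fetch = fetch
        self._inflight: Optional["asyncio.Future[T]"] = None

    async def refresh(self) -> T:
        # Start a new fetch only when no fetch is currently running.
        if self._inflight is None or self._inflight.done():
            self._inflight = asyncio.ensure_future(self._fetch())
        # All callers await the same future; shield() keeps one cancelled
        # caller from cancelling the fetch the other callers are waiting on.
        return await asyncio.shield(self._inflight)
```

With ten listeners calling refresh() at the same moment, fetch runs once and all ten receive the same result; a call made after that fetch has finished starts a new one.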

@bieniu
Member

bieniu commented Apr 16, 2020

OK guys, I have to admit that something is wrong.
I added _LOGGER.debug(f"update: {self}") to the _async_update_data method of the BrotherDataUpdateCoordinator class. When I start HA while the printer is offline, after some time I see this in the log:

2020-04-16 22:49:43 DEBUG (MainThread) [homeassistant.components.brother] update: <homeassistant.components.brother.BrotherDataUpdateCoordinator object at 0x7f332346adc0>
2020-04-16 22:50:04 DEBUG (MainThread) [homeassistant.components.brother] update: <homeassistant.components.brother.BrotherDataUpdateCoordinator object at 0x7f32e6e6bc70>
2020-04-16 22:50:04 DEBUG (MainThread) [homeassistant.components.brother] update: <homeassistant.components.brother.BrotherDataUpdateCoordinator object at 0x7f335794db50>
2020-04-16 22:50:56 DEBUG (MainThread) [homeassistant.components.brother] update: <homeassistant.components.brother.BrotherDataUpdateCoordinator object at 0x7f32e6ab8160>

Four different coordinator objects are trying to update the data from the printer.
I think that every attempt to set up the brother integration adds a new coordinator object. Each coordinator object tries to update the data from the printer, which takes memory and CPU time.

EDIT:
This piece of code is to blame for that: https://github.com/home-assistant/core/blob/dev/homeassistant/components/brother/__init__.py#L33-L37
Each integration that uses DataUpdateCoordinator works the same way.
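The failure mode described above can be reproduced in isolation: if every setup retry builds and starts a new polling object without stopping the previous one, the number of active pollers grows with each retry. The sketch below is a simulation under that assumption; PollingCoordinator and simulate_setup_retries are hypothetical names, not Home Assistant's DataUpdateCoordinator API, and stopping the previous instance is one possible remedy rather than the code in #34317.

```python
import asyncio
from typing import List, Optional, Set


class PollingCoordinator:
    """Hypothetical stand-in for an object that polls a device on a timer."""

    def __init__(self, name: str, interval: float, log: List[str]) -> None:
        self.name = name
        self.interval = interval
        self.log = log
        self._task: Optional["asyncio.Task[None]"] = None

    def start(self) -> None:
        self._task = asyncio.ensure_future(self._poll_forever())

    def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()

    async def _poll_forever(self) -> None:
        while True:
            # Same idea as logging f"update: {self}": record which instance polled.
            self.log.append(self.name)
            await asyncio.sleep(self.interval)


async def simulate_setup_retries(retries: int, stop_previous: bool) -> Set[str]:
    """Return the names of pollers still active after `retries` setup attempts."""
    log: List[str] = []
    created: List[PollingCoordinator] = []
    for attempt in range(retries):
        if stop_previous and created:
            created[-1].stop()  # remedy: never leave the old instance running
        poller = PollingCoordinator(f"coordinator-{attempt}", 0.01, log)
        poller.start()
        created.append(poller)
        await asyncio.sleep(0.02)  # the attempt fails; a retry follows
    log.clear()
    await asyncio.sleep(0.05)  # observation window
    still_polling = set(log)
    for poller in created:  # clean up so no task outlives the event loop
        poller.stop()
    return still_polling
```

With stop_previous=False, all four pollers from four retries are still active at the end, matching the four distinct object addresses in the log above; with stop_previous=True, only the last one is.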

@McGiverGim
Author

Good catch @bieniu ! Let's go for the fix! 😁

@bieniu
Member

bieniu commented Apr 17, 2020

Fix: #34317

@choeflake

OK, but I have high CPU and no brother integration, so I think the problem is more general. My log is full of messages about updating components (see above).
My setup uses MQTT and Zigbee2MQTT plus Node-RED.

@chosten

chosten commented Apr 17, 2020

The fix applies to all integrations using DataUpdateCoordinator. Hopefully it will fix our problem too.

@ayufan

ayufan commented Apr 17, 2020

I updated my install. Will monitor :) Thanks @bieniu :)

So far the graphs look significantly healthier.

@D43m0n666

I have the Brother integration, but I think I remember installing it after the problem appeared. I will try later without it.
@D43m0n666 waiting for your test...

With the Brother integration disabled, it seems OK on the latest release, 0.108.5, FOR NOW! I can give you better news in 24h. Thanks @ayufan for the suggestion!

Yes, I confirm the problem was the Brother integration. Now it's OK; I've also updated to the latest version!

Thanks to all!

@bieniu
Member

bieniu commented Apr 17, 2020

@D43m0n666 This was a Data Update Coordinator issue and affected all integrations using it.

@ayufan

ayufan commented Apr 17, 2020 via email

@ayufan

ayufan commented Apr 17, 2020

It is definitely an order of magnitude better, but it still seems to me that having Brother enabled has some performance impact:

As you can see, the baseline jumped from ~4.5% to ~6% CPU usage (given that this is not the only task running there, that is in fact a 33% increase in CPU usage) after upgrading from 0.108.4 to 0.108.8. The only other difference is that I enabled the Brother integration.

[Graph: baseline CPU usage before and after the upgrade]

I'm running another test now: I disabled it, so let's see the difference over the next 12 to 18 hours :)
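The 33% figure is relative to the ~4.5% baseline; in absolute terms the difference is 1.5 percentage points, which is the number andriej reacts to in the next comment:

```latex
\frac{6\% - 4.5\%}{4.5\%} = \frac{1.5}{4.5} \approx 0.33
\quad\Rightarrow\quad \text{a relative increase of about } 33\%
```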

@andriej
Contributor

andriej commented Apr 17, 2020

A 1.5% difference between having and not having SNMP queries in the background... what hardware are you on where that matters? ;-)

@choeflake

Upgraded to 0.108.6; it seems to fix a lot. My CPU is now around 5%. In my opinion this is not 'low', but it is way better than the previous 60%.
Note that I have an 'Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz' CPU. The 5-minute load average shows 0.05.

@ayufan

ayufan commented Apr 18, 2020

@andriej @bieniu

Given that this machine is my router, dual Wi-Fi with 5 SSIDs, firewall, VPN server, HASS (with 100 entities), zigbee2mqtt (with 40 entities), Prometheus scraping 15 targets, MQTT, the esphome frontend, and Pi-hole, it is doing pretty well with that average.

There is a clearly noticeable performance regression when running the brother integration with the printer unreachable. It is very likely that it was like that before, but it still does not make sense for a service to use so many resources when there is in fact no data processing.

I will look at how performant the brother module is :)

EDIT:

It takes around 140ms for me to perform a single cycle of brother.async_update for an unavailable device. This time is spent solely on CPU.

EDIT 2:

The problem is related to the cost of running snmp_engine = hlapi.SnmpEngine(). It seems this requires parsing the whole OID database, which is what takes that CPU time. I see around 130ms being spent on just this call.

EDIT 3:

This happens as the whole OID database seems to be imported each time: https://github.com/etingof/pysnmp/blob/master/pysnmp/entity/engine.py#L99.

It would likely be better to reuse one hlapi.SnmpEngine() across different printers and initialize it once, instead of each time data is requested. This is also probably why we saw such a significant performance regression: more and more hlapi.SnmpEngine() operations were queued, and each of them took a significant amount of time.

It seems that a single snmp_engine object takes around 40 kB of memory, which is likely fully acceptable given the very high cost of initialization, especially considering that a 130ms initialization on every data fetch creates random latency spikes in Home Assistant's responsiveness, due to CPython's GIL :)

EDIT 4:

Initializing snmp_engine only once makes the check instantaneous.
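The suggestion above amounts to initializing the expensive object once and sharing it. A minimal sketch of that idea, assuming the engine is safe to share across printers (the thread suggests so, but this is not verified here): ExpensiveEngine is a hypothetical stand-in for hlapi.SnmpEngine() so that the example runs without pysnmp installed, and get_shared_engine and poll_printer are illustrative names.

```python
import functools
import time


class ExpensiveEngine:
    """Hypothetical stand-in for hlapi.SnmpEngine(), whose constructor was
    measured above at roughly 130 ms because it loads the OID database."""

    instances = 0

    def __init__(self) -> None:
        ExpensiveEngine.instances += 1
        time.sleep(0.13)  # simulate the initialization cost reported above


@functools.lru_cache(maxsize=None)
def get_shared_engine() -> ExpensiveEngine:
    """Create the engine on first use and return the same object afterwards."""
    return ExpensiveEngine()


def poll_printer(host: str) -> ExpensiveEngine:
    """Hypothetical poll that reuses the shared engine instead of building one per call."""
    engine = get_shared_engine()
    # ... the SNMP request to `host` would be issued here using `engine` ...
    return engine
```

functools.lru_cache on a zero-argument function is a compact way to get a lazily created singleton; a module-level variable filled on first use would work just as well. Only the first poll pays the initialization cost.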

@Alessandro1981

Updating to 0.108.6 fixed the issue on my installation: memory and CPU are as expected.

@McGiverGim
Author

All is OK for me. My CPU usage seems maybe a little higher (about 1%), but I don't know whether this can be addressed or where it comes from.

@McGiverGim
Author

After two days of testing, everything seems stable, but my memory usage looks a little high. The homeassistant container is using more than 400 MB, and it has been taking more memory slowly. Total memory usage is about 50% (Raspberry Pi, 2 GB).

Usually I had less memory in use, but I'm always changing things, so maybe this is simply normal; I mention it here to see if more people have noticed something. As I said, it seems stable and I have no problems.

@Gunth

Gunth commented Apr 20, 2020

Yes, it seems that I have the same issue: memory and swap both grow to 100% after 2 days on my RPi 3B+.

@eddetollenaer

I have the same problem. I see a clear increase in memory usage every hour during the night, when not much is going on other than one automation that determines the maximum daily temperature every 15 minutes. I already set the recorder to record only the automations and two sensors, because the recorder tends to eat up a lot of memory.

@bullshitduckdnsaccount

I had this same issue as well, starting in 0.108.0 and the changes in 0.108.6 did fix it for me.

The issue then returned in 0.109.0 and has persisted through 0.110.x. Both CPU and RAM creep up and eventually bring HA to a standstill until it is rebooted.

@norbertomartins

I am on version 0.110.3 and have the same issue. I created a trigger to reboot if memory is below 110 for more than 2 minutes. Sometimes this fires during the night.
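The workaround above relies on a condition holding continuously for a period (memory below 110 for more than 2 minutes) before acting. The poster's actual trigger configuration was not shared, so the sketch below only illustrates the hold-timer logic in plain Python; the units (presumably MB of free memory), the sampling, and the name first_trigger_time are all assumptions.

```python
from typing import Iterable, Optional, Tuple


def first_trigger_time(
    samples: Iterable[Tuple[float, float]],
    threshold: float = 110.0,
    hold_seconds: float = 120.0,
) -> Optional[float]:
    """Return the timestamp at which the value has stayed below `threshold`
    continuously for at least `hold_seconds`, or None if that never happens.

    `samples` are (timestamp_in_seconds, value) pairs in chronological order.
    """
    below_since: Optional[float] = None
    for timestamp, value in samples:
        if value < threshold:
            if below_since is None:
                below_since = timestamp  # the condition has just become true
            if timestamp - below_since >= hold_seconds:
                return timestamp
        else:
            below_since = None  # any sample at or above the threshold resets the timer
    return None
```

Resetting the timer on any sample at or above the threshold is what prevents a brief dip from triggering a reboot; only a sustained low reading does.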


@andriej
Contributor

andriej commented May 28, 2020

Isn't it the Google backup add-on doing its job?

@bdraco
Member

bdraco commented Jul 6, 2020

Please post a py-spy recording
https://github.com/benfred/py-spy

py-spy record --pid 200 --output issue33866.svg --duration 120

You'll have to change the pid to your local instance's pid.
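The PID for --pid has to be looked up on the host or inside the container. One way to do that on Linux is to scan /proc for a command line containing "homeassistant". The helpers below are hypothetical (find_pids and py_spy_command are not part of py-spy or Home Assistant), and the match string is an assumption about how the process was started. Note that py-spy typically needs elevated privileges, such as root, or the SYS_PTRACE capability when run in Docker.

```python
import os
from typing import List


def find_pids(pattern: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose command line contains `pattern` (Linux /proc layout)."""
    pids: List[int] = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # Arguments are NUL-separated; join them with spaces for matching.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited or its cmdline is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def py_spy_command(pid: int, output: str = "issue33866.svg", duration: int = 120) -> List[str]:
    """Build the py-spy invocation suggested above for a given PID."""
    return ["py-spy", "record", "--pid", str(pid),
            "--output", output, "--duration", str(duration)]
```

The resulting argument list could then be run with subprocess.run(py_spy_command(pid), check=True) for each PID returned by find_pids("homeassistant").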

@stale

stale bot commented Oct 4, 2020

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 4, 2020
@bdraco bdraco closed this as completed Oct 4, 2020