Zigbee ZHA locks up and problems controlling #112843
Comments
Hey there @dmulcahey, @Adminiuga, @puddly, @TheJulianJES, mind taking a look at this issue as it has been labeled with an integration (zha)?
(message by CodeOwnersMention) |
According to your log, you have severe RF interference issues:
The radio is unable to transmit packets because there is too much 2.4GHz interference. Is your SkyConnect near any USB 3.0 ports, SSDs, 2.4GHz routers, etc.? |
It's on an extension cord taking it away from the main machine, but it is sitting in an area with the Tado gateway, behind a surround sound system, in a kind of "media" console (see attached image). But it's been sitting in that location for a year now, and it never had this many problems before this update. I have now moved it away and set it up on my desk as a test (see attached image). Still having problems.
|
Still having massive issues, where the system seems to get worse when it has been on for a prolonged period. It had an okay period after the move, but it seems like it degrades over time.
|
After rolling back to 2024.2.2, Zigbee has become more stable. |
I don't recall the issue thread, but I read in another issue that heavy interference could cause the ZHA integration to throw. My Yellow is near both a router and a Wi-Fi printer. The printer uses 2.4GHz. I switched the printer to Ethernet instead and things have been stable so far 🤞 |
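For readers wondering why a 2.4GHz router or printer matters here, a rough sketch of how Zigbee and Wi-Fi channels share the band may help. The center-frequency formulas are the standard channel plans; the channel widths and the overlap rule are simplifying assumptions made for illustration, and the helper names are hypothetical (nothing here comes from ZHA or zigpy).

```python
# Nominal 2.4 GHz channel plans. The widths below are rough assumptions.
ZIGBEE_WIDTH_MHZ = 2.0   # approximate width of an IEEE 802.15.4 channel
WIFI_WIDTH_MHZ = 22.0    # nominal width of a classic 2.4 GHz Wi-Fi channel


def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of Zigbee (802.15.4) channels 11-26."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of 2.4 GHz Wi-Fi channels 1-13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1 to 13")
    return 2412 + 5 * (channel - 1)


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    """True if the two nominal channel bands intersect (simplified model)."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (ZIGBEE_WIDTH_MHZ + WIFI_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels: list[int]) -> list[int]:
    """Zigbee channels that overlap none of the given Wi-Fi channels."""
    return [
        z for z in range(11, 27)
        if not any(overlaps(z, w) for w in wifi_channels)
    ]


if __name__ == "__main__":
    print(clear_zigbee_channels([1, 6, 11]))
```

Under these assumptions, Wi-Fi on channels 1, 6 and 11 leaves only Zigbee channels 15, 20, 25 and 26 untouched. Real interference also depends on transmit power, distance and non-Wi-Fi sources such as USB 3.0, which this model ignores.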
So after many hours of work I'm really just back at square one. I took the decision to reinstall the whole system. It was running on a Mac mini in VirtualBox, so I wiped the whole machine and installed Home Assistant OS directly onto it. I installed the newest version of HA, 2024.3, and after getting all my Zigbee devices into the system it seemed stable last night. But after the night it's having problems sending commands again. I would really appreciate some help with this.
more from the log
And the debug log, after enabling debug logging and trying to control a "locked" device
|
And now after a restart HA seems to have thrown all sensors. |
I think a reasonable assumption can be made that, if recreating from scratch demonstrates the same issue, your issue may be the network / airwave conditions where your HA node is placed. Would you be able to take any steps to reduce interference?
|
@mikeymop normally I would agree, and that, as I stated earlier, is also why I moved the machine running HA. But as also stated earlier, the system and network have been running more or less rock solid for a year, with nothing new introduced to the system. Most of my smart devices are Zigbee, which is also why my whole house is not working right now. I have minimal Wi-Fi devices, mostly phones and tablets; computers are hardwired. If I had experienced problems over a longer period, I would agree that it is interference, but this came at the exact moment of updating to 2024.3 and could be stabilized by downgrading. |
Same here, 80+ Zigbee devices on ZHA/Conbee2, mostly stable for a long time now. But 2024.3 hit me hard. The error log fills up with errors belonging to Zigbee devices, e.g.:

[0x5D69:1:0x0020]
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 342, in request
    return await req.result
           ^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy/zcl/__init__.py", line 377, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 341, in request
    async with asyncio_timeout(timeout):
  File "/usr/local/lib/python3.12/asyncio/timeouts.py", line 115, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/general.py", line 623, in check_in_response
    await self.checkin_response(True, self.CHECKIN_FAST_POLL_TIMEOUT, tsn=tsn)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 82, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 65, in wrap_zigpy_exceptions
    raise HomeAssistantError(
homeassistant.exceptions.HomeAssistantError:

I was able to fix broken Zigbee groups with 2024.3.1, but the mesh is still unstable. I can confirm that it seems like it degrades over time. There is increasing delay in device reaction to commands. |
Similar setup to dexheimer42: ConbeeII, ~60 devices. Stable until the update to 2024.3 (currently on 2024.3.1). Multiple restarts and reloads of ZHA / devices; still unstable. Latest ZHA log entry attached.
|
Well, after several attempts at getting it fixed, I took the plunge with Zigbee2MQTT, and things are A LOT more stable. You can talk about interference and all that, but if changing systems helps, then there seems to be something wrong with the system. |
The statistics from the radio that were pulled from your diagnostics info don't lie:
MAC_TX_UNICAST_SUCCESS = 943
MAC_TX_UNICAST_RETRY = 4294
MAC_TX_UNICAST_FAILED = 1634
PHY_CCA_FAIL_COUNT = 2784
Over 40% of your requests outright fail because the radio's firmware refused to transmit due to noise. The integration can't control that, it's your environment. Unless your network was loaded with a ton of Tuya devices, the only change that Z2M performs is moving Zigbee network to channel 11. ZHA generally avoids channel 11 when picking your network's channel. |
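As a rough way to read counters like these, the sketch below turns them into ratios. The counter names and values are the ones quoted in the comment above; the choice of denominator (success + retry + failed as "transmit attempts") is an assumption made for illustration, not something the comment or the firmware documentation states, and `tx_summary` is a hypothetical helper.

```python
# Counter values as quoted in the comment above.
counters = {
    "MAC_TX_UNICAST_SUCCESS": 943,
    "MAC_TX_UNICAST_RETRY": 4294,
    "MAC_TX_UNICAST_FAILED": 1634,
    "PHY_CCA_FAIL_COUNT": 2784,
}


def tx_summary(c: dict[str, int]) -> dict[str, float]:
    """Summarize unicast transmit health under the stated assumptions."""
    success = c["MAC_TX_UNICAST_SUCCESS"]
    retry = c["MAC_TX_UNICAST_RETRY"]
    failed = c["MAC_TX_UNICAST_FAILED"]
    cca_fail = c["PHY_CCA_FAIL_COUNT"]

    # Assumption: every success, retry and failure counts as one attempt.
    attempts = success + retry + failed
    return {
        "attempts": float(attempts),
        # Share of attempts blocked by a failed clear-channel assessment.
        "cca_fail_share": cca_fail / attempts,
        # Share of attempts that ended as an outright failure.
        "failed_share": failed / attempts,
        # Failures relative to requests that reached a final outcome.
        "failed_of_completed": failed / (success + failed),
    }


if __name__ == "__main__":
    for name, value in tx_summary(counters).items():
        print(f"{name}: {value:.3f}")
```

With this denominator the clear-channel failures come to roughly 40% of attempts, which is in the same range as the figure quoted above, though the exact calculation used there is not given. Whichever ratio is used, retries and failures far outnumber clean transmissions in this sample.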
I understand completely, and read your comment history very carefully. The reason I brought it up is that our situations are almost identical. The only difference is that I'm using an HA Yellow with a SkyConnect built in rather than a dongle. My HA Yellow has not moved in over a year, and I also have almost only Ethernet devices, aside from the cell phones and the Wi-Fi printer I mentioned. The event that triggered my crashing issues was, I believe, the firmware update to ZHA. I put it off for several months as I read the warning that it could be a breaking upgrade. Once I upgraded, I started experiencing the crashes. My belief (still reading through the commit history) is that the newer firmware enforces this failure due to noise and bubbles up the exception. I would argue the firmware could raise a more understandable exception to the user. However, it's entirely possible we had been experiencing these failures due to interference all along, and the firmware just swallowed the exception and retried over and over, producing even more interference. This reasoning aligns with my router reporting many dropped packets up until the point I remedied the interference. I still stand by the issue being the network environment and the interference within our homes, and the best remediation being to make it clear to the user that the firmware is throwing because of interference, as Python stack traces tend to be very verbose. |
Which one are you referring to?
We actually disabled (as much as possible) this firmware feature in August. If you haven't explicitly updated the firmware on your Yellow, you're running the firmware that it came with. ZHA won't flash new firmware.
Unfortunately, this isn't easily possible. The firmware just tells you You are very right, however, that channel access failures should be reported better. It's something that's being worked on. |
Where can I get this diagnostic info with these specific statistics? I pulled diag info from ZHA and from the coordinator device but couldn't find these counters (HA 2024.2.2).
|
Good question. It came up at the end of the year. Is there an easy way I can view historical changelogs that I can try to cross-reference? After some more time I noticed that I still experience this issue, just much less frequently. I'm noticing a strange correlation where this integration only crashes when an update becomes available for one of:
On any given day that I noticed ZHA crash, I always had updates available for these three. |
In the meantime HA 2024.4 came out. I hoped that this issue would have been fixed, since I had rolled back to 2024.2 before, where this strange behavior didn't show up: Zigbee devices / the mesh get slower over time until the network becomes totally unusable. Unfortunately, with 2024.4 my Zigbee mesh continues to collapse over time. So I started a new investigation and found that this might have something to do with the latest deCONZ firmware for my Conbee II coordinator: https://forum.phoscon.de/t/current-deconz-2-24-3-2-25-1-slow-in-response-after-time/4517 The timeframe stated in that topic roughly matches the time when I started to notice these problems with my HA instance. So I decided to give the SkyConnect I had on my desk a try. Et voilà, all issues gone. Explanation: different chip, different firmware, different software stack in ZHA/zigpy. Goodbye, Conbee. We had a good time. T. |
The problem
After the update to 2024.4, the Zigbee/ZHA/SkyConnect 1.0 network has "locked up" several times, with the problem of not being able to control anything. Only a restart of HA got the integration running again.
Tonight the whole network seems to have problems: commands take 10-20 seconds to be sent, or are not sent at all, with HA displaying errors.
Several errors seem to be present in the logs.
What version of Home Assistant Core has the issue?
core-2024.3.0
What was the last working version of Home Assistant Core?
core-2024.2.4
What type of installation are you running?
Home Assistant OS
Integration causing the issue
ZHA
Link to integration documentation on our website
https://www.home-assistant.io/integrations/zha/
Diagnostics information
home-assistant_zha_2024-03-09T20-55-27.032Z.log
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
No response