zigpy / zigpy_deconz Issues after 2023.3 to 2023.8.1 upgrade #97965

Closed
userdeveloper98 opened this issue Aug 7, 2023 · 14 comments

@userdeveloper98

The problem

I noticed a significant degradation in Zigbee stability immediately after the upgrade.
Some devices go unavailable and become unresponsive quite quickly.

What version of Home Assistant Core has the issue?

core-2023.8.1

What was the last working version of Home Assistant Core?

core-2023.3

What type of installation are you running?

Home Assistant Container

Integration causing the issue

zigpy_deconz

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

zha-749096ce2114b5163c3a12a5f7af92c2-GLEDOPTO GL-C-007-bf62f215c132349b45e2cf4feac944ca.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Attaching a few unusual errors:
`    [2561915752] Failed to send request: Failed to deliver packet: <TXStatus.MAC_NO_ACK: 233>
    [2551132968] Failed to send request: Failed to deliver packet: <TXStatus.MAC_NO_ACK: 233>

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 64, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 132, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 375, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 293, in request
    await self._application.request(
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 824, in request
    await self.send_packet(
  File "/usr/local/lib/python3.11/site-packages/zigpy_deconz/zigbee/application.py", line 453, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to deliver packet: <TXStatus.MAC_NO_ACK: 233>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/commands.py", line 226, in handle_call_service
    await hass.services.async_call(
  File "/usr/src/homeassistant/homeassistant/core.py", line 1974, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2011, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 235, in handle_service
    return await service.entity_service_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 870, in entity_service_call
    response_data = await _handle_entity_call(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 942, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 580, in async_handle_light_on_service
    await light.async_turn_on(**filter_turn_on_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 360, in async_turn_on
    result = await self._on_off_cluster_handler.on()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrapper
    raise HomeAssistantError(message) from exc
homeassistant.exceptions.HomeAssistantError: Failed to send request: Failed to deliver packet: <TXStatus.MAC_NO_ACK: 233>
`
`
Unexpected transmit confirm for request id 23, Status: TXStatus.MAC_NO_ACK
Unexpected transmit confirm for request id 103, Status: TXStatus.SUCCESS
Unexpected transmit confirm for request id 205, Status: TXStatus.SUCCESS
Unexpected transmit confirm for request id 208, Status: TXStatus.SUCCESS
Unexpected transmit confirm for request id 254, Status: TXStatus.SUCCESS
`
`
No response to 'Command.aps_data_confirm' command with seq id '0xed'
No response to 'Command.aps_data_request' command with seq id '0x72'
No response to 'Command.aps_data_indication' command with seq id '0xee'
No response to 'Command.aps_data_indication' command with seq id '0xef'
No response to 'Command.aps_data_indication' command with seq id '0xf0'
`
`
Failed to deserialize frame: b'040a05080001002a'
No response to 'Command.aps_data_confirm' command with seq id '0x0a'
No response to 'Command.aps_data_confirm' command with seq id '0xcc'
No response to 'Command.aps_data_indication' command with seq id '0x0b'
No response to 'Command.aps_data_confirm' command with seq id '0x0c'
`
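When comparing a working release against a failing one, it can help to tally delivery failures by status rather than reading tracebacks one at a time. The following is a minimal sketch, assuming the log lines contain the `<TXStatus.NAME: code>` form shown above. The function name `count_tx_statuses` is hypothetical and is not part of zigpy, zigpy_deconz, or Home Assistant.

```python
import re
from collections import Counter

# Matches the "<TXStatus.NAME: 233>" form seen in the log excerpts above.
# Lines that print the status without angle brackets are not counted.
TX_STATUS_RE = re.compile(r"<TXStatus\.(?P<name>[A-Z_]+): (?P<code>\d+)>")


def count_tx_statuses(log_text: str) -> Counter:
    """Count occurrences of each TXStatus name in a block of log text."""
    return Counter(m.group("name") for m in TX_STATUS_RE.finditer(log_text))


if __name__ == "__main__":
    sample = (
        "Failed to send request: Failed to deliver packet: <TXStatus.MAC_NO_ACK: 233>\n"
        "Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>\n"
        "Failed to deliver packet: <TXStatus.MAC_NO_ACK: 233>\n"
    )
    for name, count in count_tx_statuses(sample).most_common():
        print(f"{name}: {count}")
```

Running the same tally over logs captured on two different releases gives a rough, like-for-like view of whether failures actually increased.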

Additional information

Consider mains powered devices unavailable after (seconds): 1800 (increased now, previously 900 but still doesn't help)
Consider battery powered devices unavailable after (seconds): 21600

@home-assistant

home-assistant bot commented Aug 7, 2023

Hey there @dmulcahey, @Adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zha can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign zha Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


zha documentation
zha source
(message by IssueLinks)

@puddly
Contributor

puddly commented Aug 7, 2023

2023.3 to 2023.8.1 is a huge jump. Are the devices reachable if you downgrade back to 2023.3?

@ghost

ghost commented Aug 7, 2023

Mine says:

Logger: zigpy.device
Source: runner.py:179
First occurred: 18:13:36 (1 occurrences)
Last logged: 18:13:36

Failed to parse message (b'8184019c') on cluster 32799, because Data is too short to contain 1 bytes

@criticallimit

Same here.
From 2023.6 up to 2023.8, my Zigbee devices have become more and more unreliable, producing more and more faults never seen before.
Additionally, some devices do not trigger automations even though the logbook correctly shows them as switched.
The Conbee II stick's firmware is up to date, and no devices have been added or removed. They are all in the positions where they were installed two years ago. No additional Wi-Fi devices have been installed (so no noise has been added).
My last good running install was 2023.6.4.
2023.8.1 is nearly unusable due to unreachable devices; when they are reachable and working, they do not trigger automations reliably.

I can't go back to 2023.6.4 because restore has not worked since 2023.7.

@tediroca

tediroca commented Aug 7, 2023

I'm using a Conbee II, and since 2023.8 none of my IKEA Tradfri outlets get triggered, either from automations or from Zigbee2MQTT. Re-pairing them fixes it for a few minutes, and then they become unresponsive again.

@criticallimit

In my case it's the IKEA Tradfri switches and bulbs, eWelink MS01 motion sensors and xiaomi.aqara.weather sensors (lumi.weather).

@lux4rd0

lux4rd0 commented Aug 7, 2023

> Same here. From 2023.6 up to 2023.8, my Zigbee devices have become more and more unreliable, producing more and more faults never seen before. Additionally, some devices do not trigger automations even though the logbook correctly shows them as switched. The Conbee II stick's firmware is up to date, and no devices have been added or removed. They are all in the positions where they were installed two years ago. No additional Wi-Fi devices have been installed (so no noise has been added). My last good running install was 2023.6.4. 2023.8.1 is nearly unusable due to unreachable devices; when they are reachable and working, they do not trigger automations reliably.
>
> I can't go back to 2023.6.4 because restore has not worked since 2023.7.

Not sure if it's frowned upon, but I've been flipping between core versions with this command:

ha core update --version=2023.6.3

There are some slight differences in things like Zigbee database versions, etc., but my system is back to being usable with 2023.6.3 - Everything in 2023.7 and 2023.8 is unstable with automations.

@puddly
Contributor

puddly commented Aug 7, 2023

@lux4rd0 can you upload debug logs of HA startup and ~10 minutes of runtime with both 2023.6.3 and 2023.7.0? There weren't really any ZHA (or ZHA library) changes between those versions.

@criticallimit

Logger: homeassistant.components.zha.core.cluster_handlers
Source: components/zha/core/cluster_handlers/__init__.py:508 
Integration: Zigbee Home Automation (documentation, issues) 
First occurred: 21:08:50 (1 occurrences) 
Last logged: 21:08:50

[0x3D57:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
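The bracketed prefix on that line appears to encode the device's network address, the endpoint, and the cluster ID (0x0006 is the On/Off cluster in the Zigbee Cluster Library). Reading the prefix this way is an assumption based on the log format above, not a documented contract. A minimal sketch for pulling those fields out of many such lines, with the hypothetical names `ClusterRef` and `parse_prefix`:

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: [0x<nwk>:<endpoint>:0x<cluster>]
PREFIX_RE = re.compile(
    r"\[0x(?P<nwk>[0-9A-Fa-f]{4}):(?P<ep>\d+):0x(?P<cluster>[0-9A-Fa-f]{4})\]"
)


class ClusterRef(NamedTuple):
    nwk: int       # 16-bit network address of the device
    endpoint: int  # endpoint number on the device
    cluster: int   # cluster ID


def parse_prefix(line: str) -> Optional[ClusterRef]:
    """Return the fields of the first bracketed prefix in a log line, if any."""
    match = PREFIX_RE.search(line)
    if match is None:
        return None
    return ClusterRef(
        nwk=int(match.group("nwk"), 16),
        endpoint=int(match.group("ep")),
        cluster=int(match.group("cluster"), 16),
    )


if __name__ == "__main__":
    ref = parse_prefix("[0x3D57:1:0x0006]: async_initialize: all attempts have failed")
    if ref is not None:
        print(f"nwk=0x{ref.nwk:04X} endpoint={ref.endpoint} cluster=0x{ref.cluster:04X}")
```

Grouping failures by network address this way can show whether the errors come from a handful of devices or from the whole mesh.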

@userdeveloper98
Author

> 2023.3 to 2023.8 is a huge jump. Are the devices reachable if you downgrade back to 2023.3?

After downgrading, the network was still unstable (~2 hours). I upgraded back to 2023.8, and after ~10 hours the network works as usual.
Note that when I opened this issue, the network had been unstable for ~12 hours on 2023.8 (random disconnects, unavailable/unresponsive devices). That was extremely odd, and the new, huge stack traces made me think it was a compatibility issue with the devices or something similar.
Given that the behaviour is now back to normal (usual rate of failures 😄), I assume it is not related to the upgrade and the issue can be closed.
Apologies for the inconvenience.

@Kisty
Contributor

Kisty commented Aug 8, 2023

I'm having the same or a similar problem with my CC2531, also on Home Assistant Container. Perhaps it's a Linux kernel issue? What do you get from running sudo dmesg | tail? I'm getting lots of "new USB device" messages, with a device number that increments every few seconds. I'm on an ODroid N2+.
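A quick way to confirm that kind of USB re-enumeration is to scan the kernel log for repeated "new ... USB device number N" messages on the same port. The following is a rough sketch, assuming the usual kernel message layout (for example `usb 1-1.2: new full-speed USB device number 12 using xhci_hcd`). The function name `usb_reenumerations` is hypothetical, and the sample text is illustrative rather than taken from a real system.

```python
import re
from collections import defaultdict

# Kernel messages for a newly enumerated device typically look like:
#   usb 1-1.2: new full-speed USB device number 12 using xhci_hcd
NEW_DEVICE_RE = re.compile(
    r"usb (?P<port>[\d.-]+): new [\w-]+ USB device number (?P<num>\d+)"
)


def usb_reenumerations(dmesg_text: str) -> dict[str, list[int]]:
    """Map each USB port to the device numbers it was assigned, in order."""
    seen: dict[str, list[int]] = defaultdict(list)
    for match in NEW_DEVICE_RE.finditer(dmesg_text):
        seen[match.group("port")].append(int(match.group("num")))
    return dict(seen)


if __name__ == "__main__":
    # Illustrative sample; in practice, pipe in the output of `dmesg`.
    sample = (
        "[  100.0] usb 1-1.2: new full-speed USB device number 12 using xhci_hcd\n"
        "[  104.2] usb 1-1.2: new full-speed USB device number 13 using xhci_hcd\n"
        "[  108.9] usb 1-1.2: new full-speed USB device number 14 using xhci_hcd\n"
    )
    for port, numbers in usb_reenumerations(sample).items():
        if len(numbers) > 1:
            print(f"port {port} re-enumerated {len(numbers)} times: {numbers}")
```

A port that keeps receiving new device numbers suggests the adapter is dropping off the bus, which points at power, cabling, or the host rather than at the Zigbee stack.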

Maintainers, I'm not sure how to proceed with this discussion since it's not an issue with core. Where would be better?

@lux4rd0

lux4rd0 commented Aug 8, 2023

> @lux4rd0 can you upload debug logs of HA startup and ~10 minutes of runtime with both 2023.6.3 and 2023.7.0? There weren't really any ZHA (or ZHA library) changes between those versions.

@puddly

Two zip files - one for 2023.6.3 and one for 2023.7.0. I let it run for ~20 minutes.

home-assistant.log.2023.6.3.zip
home-assistant.log.2023.7.0.zip

I also have the traces for the first automation that got hung up...

trace1
trace2
trace3

Runs fine the first time, hung up on the second run... Things started stacking up after that...

A few other things I've noticed...

  • Restarting the Z-Wave JS Add-on will sometimes break the piled-up automation log jam, so I restarted it each time before the end of the log collection. However, I've noticed that doing so in either version of HA core causes Home Assistant core to recycle itself. I've noticed this over the last few weeks in any of the latest versions when I make some configuration changes, as the UI will become unresponsive, and I'll see all the startup messages in the logs. Sometimes I'll have Z-Wave nodes go dead, and a restart of the add-on will reconnect them all.

  • All versions of 2023.7 and 2023.8 exhibit this automation blockage. I only use Single mode for all of my automations.

  • I took the time to switch all of my automation from "turn on / turn off" devices to using services. Thought there might be something there gunking things up - but it didn't make any difference in terms of time to complete the automation or things not hanging up.

  • I originally had a referenced Z-Wave device in an automation that no longer existed, but that was removed, and the problem is still happening.

Also - since this is now closed - does it make sense to open a new ticket? There are several open tickets having this issue.

@puddly
Contributor

puddly commented Aug 8, 2023

> Also - since this is now closed - does it make sense to open a new ticket?

Please do.

Also, I'm not sure of the relevance of Z-Wave JS to this issue, but ZHA and Z-Wave JS are two completely independent integrations and projects. You seem to be using a HUSBZB-1 with combined Zigbee and Z-Wave radios and are likely using both integrations, but just to be sure: the issue you're having is with ZHA and not Z-Wave JS, correct? I'm seeing no tracebacks related to ZHA in your log file (other than software watchdog timeouts in the 2023.6.3 log, which indicate that the serial connection to your adapter is unstable). If you can, enable ZHA debug logging as well.

@lux4rd0

lux4rd0 commented Aug 8, 2023

@puddly Thanks - will do!! And yes - I'm using the dual Z-Wave/Zigbee HUSBZB-1 and using both integrations. I use the Z-Wave JS Add-on (54 devices) and ZHA (51 devices). Most of my automation is essentially taking Zigbee motion sensors and turning on Z-wave light switches. Not sure why nothing is showing up for those Zigbee devices.

I'll turn on ZHA debugging and send more logs to my issue #98073

Thanks for taking a look!!

@github-actions github-actions bot locked and limited conversation to collaborators Sep 7, 2023