Shelly Door/Window 2 (battery operated, gen1) goes unavailable and stops working until integration reload #116948

Open
sargue opened this issue May 6, 2024 · 24 comments · May be fixed by #118969

Comments

@sargue

sargue commented May 6, 2024

The problem

My Shelly Door Window 2 device stops reporting updates and all sensors go to unavailable.
The device seems to be connecting to the network correctly: I also have a UniFi network integrated in HA, and it reports the device (by MAC address) as home and then away on each update (that's how battery-operated Shelly devices work: they connect just to send the update, then go offline).

I can manage to get the device working fine again for a while if I reload the integration with the device "awake" (via pushing the hardware button to wake the device for a couple of minutes). Then it works for some hours, until it shows "unavailable" again.

The device has a static IP address and the CoIoT websocket correctly configured. In fact, it has been working for quite some time. The firmware is the latest available, as I updated it when I started diagnosing this problem.

I can certainly provide diagnostic logs for the integration, but please let me know when I should enable them and for how long. As the problem manifests after several hours, I'm not sure I can keep diagnostics running that long. I have several other Shelly devices, so the file could be large.

What version of Home Assistant Core has the issue?

core-2024.5.1

What was the last working version of Home Assistant Core?

Not sure. 2024.4.x for sure, but 2024.5.0 I'm not sure.

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Shelly

Link to integration documentation on our website

https://www.home-assistant.io/integrations/shelly/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

@home-assistant

home-assistant bot commented May 6, 2024

Hey there @balloob, @bieniu, @thecode, @chemelli74, @bdraco, mind taking a look at this issue as it has been labeled with an integration (shelly) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of shelly can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign shelly Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


shelly documentation
shelly source
(message by IssueLinks)

@DerAutomatiker

I'm also having issues with 2024.5.0 and 2024.5.1.
For me it's the Shelly TRV and Shelly Motion devices that become unavailable, and only a restart of the integration fixes that.
Meanwhile they remain reachable in the Shelly app and via their local IP.

@bieniu
Member

bieniu commented May 6, 2024

Please attach the diagnostics file

(image)

And please enable debug logging for the Shelly integration

(image)

Then restart HA, wait 15 minutes, close/open the door, disable debug logging, and attach the log file here.

@sargue
Author

sargue commented May 7, 2024

Done. But it doesn't fail within 15 minutes. It takes hours until the device becomes "unavailable".

config_entry-shelly-0afccd1550e7c9a04b81f0294aae0701.json
home-assistant_shelly_2024-05-07T05-44-22.136Z.log

@bieniu
Member

bieniu commented May 7, 2024

The device appears to be properly configured and data from the device is reaching the HA server.
The integration marks entities as unavailable if data from the device does not reach the HA server for 12 hours * 1.2 = 14.4 hours. We need to determine what happens after this time so you have to catch this moment in the log.
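
As a rough illustration of that arithmetic (a sketch only, not the integration's actual code; `UPDATE_MARGIN` and `unavailable_after` are hypothetical names chosen for this example), the deadline can be derived from the device's reporting interval like this:

```python
from datetime import timedelta

# Hypothetical constant: the 20% safety margin described in this thread.
UPDATE_MARGIN = 1.2


def unavailable_after(report_interval: timedelta) -> timedelta:
    """Return how long to wait for a report before treating the device as unavailable."""
    return report_interval * UPDATE_MARGIN


# A device that reports every 12 hours gets a deadline of about 14.4 hours.
deadline = unavailable_after(timedelta(hours=12))
deadline_hours = deadline.total_seconds() / 3600
```

If no report arrives before that deadline, the entities would be marked unavailable until the next report is received.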

@sargue
Author

sargue commented May 7, 2024

The device appears to be properly configured and data from the device is reaching the HA server. The integration marks entities as unavailable if data from the device does not reach the HA server for 12 hours * 1.2 = 14.4 hours. We need to determine what happens after this time so you have to catch this moment in the log.

What if the door is not opened (the state doesn't change) for 14.4 hours?
I'm guessing these Shelly devices don't send any keep-alive, right?

@bieniu
Member

bieniu commented May 7, 2024

The device should send the status to the HA server every 12 hours. Regardless of whether the door was opened or not.

(image)

The integration increases this time by 20%, and once that has elapsed it marks the device as unavailable.

@sargue
Author

sargue commented May 7, 2024

I see.
I will try to catch that with the diagnostic logs enabled.

@DerAutomatiker

Here are my logs. At 12:35 and 12:54, one TRV each changed to unavailable.
The log is on my Google Drive because it's too big to upload here:
https://drive.google.com/file/d/1YLYLKlzLmL2hWQchVjkctPynIhh5Y8oG/view?usp=drivesdk

config_entry-shelly-ad020fc9466acbc775edf007c5df012a.json
config_entry-shelly-c6550611053d9afd178192480a8cb89b.json

@bieniu
Member

bieniu commented May 7, 2024

At 12:35 and 12:54, one TRV each changed to unavailable

Yes, the devices did not send updates on time or the updates did not reach the HA server and the devices were marked as unavailable.
I don't see anything that would indicate an integration problem.

2024-05-07 12:35:32.555 ERROR (MainThread) [homeassistant.components.shelly] Error fetching shellytrv-60A423D07C9E data: Sleeping device did not update within 3600 seconds interval
2024-05-07 12:54:00.161 ERROR (MainThread) [homeassistant.components.shelly] Error fetching shellytrv-60A423D3F87C data: Sleeping device did not update within 3600 seconds interval
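
To catch the moment a device times out in a large log file, a small filter along these lines may help. This is a sketch that assumes the log lines have exactly the format shown above; `SleepTimeout` and `find_sleep_timeouts` are hypothetical names for this example, not part of Home Assistant.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple


class SleepTimeout(NamedTuple):
    when: datetime
    device: str
    interval_s: int


# Matches the "Sleeping device did not update" error lines quoted in this thread.
_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ERROR .*"
    r"\[homeassistant\.components\.shelly\] Error fetching (?P<device>.+?) data: "
    r"Sleeping device did not update within (?P<interval>\d+) seconds interval"
)


def find_sleep_timeouts(lines: Iterable[str]) -> List[SleepTimeout]:
    """Return one entry per log line that reports a sleeping-device timeout."""
    hits = []
    for line in lines:
        match = _PATTERN.match(line.strip())
        if match:
            when = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
            hits.append(SleepTimeout(when, match["device"], int(match["interval"])))
    return hits
```

Passing an open log file to `find_sleep_timeouts` yields the timestamp, device name, and interval of every timeout, which narrows down the part of the debug log worth attaching.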

@DerAutomatiker

But the strange thing is that it stays unavailable, and as soon as I reload the integration it's there again. Also, in UniFi and the Shelly app it's online all the time with no reconnects.
Screenshot_2024-05-07-14-44-52-95_c3a231c25ed346e59462e84656a70e50
Screenshot_2024-05-07-14-44-35-97_c3a231c25ed346e59462e84656a70e50

@thecode
Member

thecode commented May 7, 2024

But the strange thing is that it stays unavailable, and as soon as I reload the integration it's there again.

This is how Home Assistant (the Shelly integration) works: when you reload, we restore the previous value and start counting the 14.4 hours again.

@DerAutomatiker

But the value changes like normal. It doesn't stay at one value.
Screenshot_2024-05-07-18-12-43-88_c3a231c25ed346e59462e84656a70e50

@bieniu
Member

bieniu commented May 7, 2024

For your device, a TRV, the sleep period is 10 minutes. So the integration will mark the device as unavailable if it does not receive data from the device for 10 * 1.2 = 12 minutes. This is why your chart looks normal.
You must remember that CoIoT is UDP: packets can be lost and are not retransmitted. Often this problem is caused by network equipment. I myself once had an AP that started losing CoIoT packets after several hours of operation.
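
One way to check whether the device's UDP packets reach a given host at all is a small listener like the one below. This is a minimal sketch under assumptions: the function names are hypothetical, and the port must match whatever the device's CoIoT peer setting points to (5683, the registered CoAP port, is used here only as an assumed example). Home Assistant normally occupies its own CoIoT port, so this would have to run on a different port or machine, with the device temporarily pointed at it.

```python
import socket
import time
from typing import List, Tuple


def open_listener(host: str, port: int) -> socket.socket:
    """Bind a UDP socket on host:port (port 0 lets the OS pick a free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect_packets(sock: socket.socket, duration: float) -> List[Tuple[float, str, int]]:
    """Record (arrival time, sender IP, payload size) for packets seen within duration seconds."""
    deadline = time.monotonic() + duration
    packets = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            data, (sender_ip, _sender_port) = sock.recvfrom(2048)
        except socket.timeout:
            break
        packets.append((time.time(), sender_ip, len(data)))
    return packets
```

For example, `collect_packets(open_listener("0.0.0.0", 5683), 900)` would listen for 15 minutes. If the device wakes up during that window and nothing is recorded, the packets are being lost before they reach the host.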

@smarthomefamilyverrips

smarthomefamilyverrips commented May 9, 2024

Logger: homeassistant.components.shelly
Source: helpers/update_coordinator.py:347
integration: Shelly (documentation, issues)
First occurred: 16:08:57 (1 occurrences)
Last logged: 16:08:57

Error fetching shellymotionsensor-60A4239A65B2 data: Sleeping device did not update within 3600 seconds interval

I have had the same problem since updating from 2024.4.3 to 2024.5.2, @bieniu @thecode. It is easy to assume that this is a connection problem unrelated to the integration, but it never occurred even once before the update to 2024.5.2. The device also stays online in the router's device list, and its web interface is reachable by IP from a browser.

After a reboot of HA, the device either becomes unavailable or its entities (motion, vibration, lux) get stuck at the state/value they had at the time of the reboot. After reloading the device from the Shelly integration page, it starts reporting all values properly again.

I have no idea which change between 2024.4.3 and 2024.5.2 causes this behavior, but I am sure it is related to the integration and not to external conditions.

config_entry-shelly-dfc56339e4e44286b9091eb9415f396f.json

config_entry-shelly-dfc56339e4e44286b9091eb9415f396f (1).json

The first diagnostics file is from when the device was shown as unavailable, and the second is from after I reloaded it in the Shelly integration.

2024-05-09_17-23-06.mp4

In the video you can see that the device had already been online for over 7 hours, well before I rebooted HA and the device became unavailable. You can also see that the connection time updates every 3 seconds as it polls for the connection, and, as stated before, the device's web interface is also reachable. (This was also already noted by @DerAutomatiker: "But strange thing is that it stays unavailable and as soon as I reload the integration it's there again. Also in UniFi and Shelly App it's online all the time with no reconnect or so")

@sargue
Author

sargue commented May 9, 2024

You must remember that CoIoT is UDP, packets can be lost and are not retransmitted. Often this problem is caused by network equipment. I myself once had an AP, which after several hours of work lost the CoIoT packets.

I've changed the configuration of my UniFi APs to lock this device to its nearest AP. I saw that it sometimes connected to another AP.
For now, it's been working fine for over 24 hours.

@smarthomefamilyverrips

You must remember that CoIoT is UDP, packets can be lost and are not retransmitted. Often this problem is caused by network equipment. I myself once had an AP, which after several hours of work lost the CoIoT packets.

I've changed the configuration of my UniFi APs to lock this device to its nearest AP. I saw that it sometimes connected to another AP. For now, it's been working fine for over 24 hours.

I can't do this in my case: the device is already connected and linked to the closest router, which is also the main router and is only 1.5 meters away, so it seems very unlikely that the connection to the router is the problem. Moreover, before the update to 2024.5.2 I never had this error in the almost 2 years I have had these Shelly Motion devices, and apart from the HA update nothing changed in my setup: no firmware updates or anything else on the routers or the devices themselves. Everything is exactly the same as before the HA update.

@sodre90

sodre90 commented May 12, 2024

Same issue here, 2 motion sensors and 1 Door/Window2 are affected. The issue has occurred since I updated HA to 2024.5.x.
Error fetching Mechanical Room Motion Sensor data: Sleeping device did not update within 3600 seconds interval
If I reload the integration, it works temporarily but becomes unavailable again after some time, remaining stuck in the unavailable state.

@smarthomefamilyverrips

Screenshot_20240513_224229_Chrome

Screenshot_20240513_224237_Home Assistant

@bieniu, @thecode: as you can see in the screenshots above, the motion sensor became unavailable in HA but is reachable and works through its web interface by IP address. This is just to back up my argument that a network error seems very unlikely.
In the logs I again got the "Sleeping device did not update within 3600 seconds interval" error, as stated in my previous comment.

I just want to point out again that before the 2024.5.2 update I never had this error message in my logs in the last 2 years of using the Shelly devices/integration.

@thecode
Member

thecode commented May 13, 2024

@smarthomefamilyverrips I have looked at all the previous comments and didn't see any logs from you, yet you are 100% sure it is "the same problem" and integration related. While I am pretty sure it is integration related in your case and might be related to #116975, since I have seen the same on my setup, without logs I can't promise anything. The fact that it stopped working for someone in a specific release doesn't guarantee it is the same for you. You can either wait until #116975 is fixed and see if that fixes the problem for you, or provide logs so we can check.

@smarthomefamilyverrips

@thecode, in my comment above I shared the error and the diagnostics files. I shared the pictures because previous comments stated that this was most likely a connection problem, so I added some information about the device's connection status to show that this is unlikely in my case, in the hope that it helps. I also saw the issue you referred to, but to my non-expert eye it seemed to be a different issue and this one seemed more similar, hence I commented here. But I hope you are right and that the fix for that issue will solve and close this one as well; I don't mind waiting for that. As far as logs go, I will only have time to supply them on weekends, sorry about that. 🫣

@BOFH90

BOFH90 commented May 26, 2024

Same problem with my TRVs. After some hours, they become unavailable. The web UI is still reachable and I can control the devices through it. CoIoT is set up properly and the firewall is configured to pass all traffic (TCP and UDP) for the entire HAOS host. All my other 9 Shelly devices work perfectly (mostly PM and Dimmer).
If I reload the integration or restart Home Assistant, they are available and controllable again and work for some hours.

thecode linked a pull request on Jun 6, 2024 that will close this issue
20 tasks
@issue-triage-workflows

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@smarthomefamilyverrips

This is still happening and has also been reported in other issues, for example #119002.
