MQTT discovery should be re-sent on HA birth #920

Open
jhansche opened this issue Jul 16, 2023 · 14 comments

@jhansche
Contributor

Viewing the MQTT Device representing a Wyze camera, sometimes all the entities report "Unavailable". It took me a while to pin down the exact scenarios that cause this, but it now makes sense: it happens any time Home Assistant is restarted or the MQTT integration is reloaded.

This happens whether ON_DEMAND is set to true or false. And since I'm using Frigate to consume the video frames anyway, I believe that should mean that even with ON_DEMAND=true, it should generally always have a client, so I don't think that is what's happening.

I believe what's happening is that the MQTT integration loses the originally published MQTT discovery messages that the Docker Wyze Bridge sent upon startup, and therefore all the entities start up as unavailable because the integration doesn't remember them or doesn't have up to date information for them.

The issue is resolved if I restart the bridge add-on after restarting HA or the MQTT integration.

What I believe is missing is handling the birth/LWT messages from HA: https://www.home-assistant.io/integrations/mqtt/#birth-and-last-will-messages. The general tenet of MQTT discovery is that the bridge should send its discovery details any time it receives the birth message from HA. By default that is the payload online on the topic homeassistant/status, but it should probably be configurable, like the DTOPIC config is currently, because the MQTT integration can be configured to use a different topic or payload. If the bridge also needs to release resources while HA is offline, it can handle the payload offline as well. Thinking new configs:

MQTT_DSTATUS_TOPIC="homeassistant/status"
MQTT_DSTATUS_ONLINE="online"
# MQTT_DSTATUS_OFFLINE="offline"  ## if needed for something

Then any time it receives MQTT_DSTATUS_ONLINE over MQTT_DSTATUS_TOPIC, it should re-publish all cameras to wyze_discovery(), like it does on startup.
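
A rough sketch of that proposal, assuming the hypothetical config names above are read from the environment. The `handle_ha_status` helper, the `send_discovery` callable and the camera names are placeholders for illustration only; the bridge's real `wyze_discovery()` is not reproduced here, just stood in for by a callback.

```python
import os
from typing import Callable, Iterable

# Hypothetical settings, named after the configs proposed above.
DSTATUS_TOPIC = os.getenv("MQTT_DSTATUS_TOPIC", "homeassistant/status")
DSTATUS_ONLINE = os.getenv("MQTT_DSTATUS_ONLINE", "online")


def handle_ha_status(
    topic: str,
    payload: bytes,
    cameras: Iterable[str],
    send_discovery: Callable[[str], None],
) -> int:
    """Re-send discovery for every camera when HA announces its birth.

    Returns the number of cameras re-announced (0 if the message is ignored).
    """
    if topic != DSTATUS_TOPIC:
        return 0
    if payload.decode("utf-8", errors="replace").strip() != DSTATUS_ONLINE:
        return 0
    count = 0
    for cam in cameras:
        send_discovery(cam)  # stand-in for the bridge's own discovery call
        count += 1
    return count
```

With a real MQTT client, something like this would be called from the message callback after subscribing to `DSTATUS_TOPIC`, and it only helps if the bridge stays connected so that it actually receives the birth message.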

@jhansche
Contributor Author

jhansche commented Jul 16, 2023

I just noticed this appears to be a dupe of #907 which is marked as fixed in v2.3.10. 🤦‍♂️

However, I'm running v2.3.10, and this is not working currently. The above behavior is what I experience on v2.3.10 and HA 2023.7.2.

Now that I've pulled the latest code, I do now see the /status subscription - however the reaction to this topic receiving "online" only seems to re-publish a wyzebridge/status="online" message, which is not enough for the entities to be accessible again. I believe we need to re-send the device's discovery config when this HA status=online message is encountered, instead of just sending its own status=online message.

It may be sufficient to re-send the current values as well, over each camera's state_topic for each entity in this case? But in the case of the screenshot above, you can see that the switches are disabled, which means it is not just a problem of having incomplete state at the time - the entities are literally unusable in this state, until the discovery message is re-sent.

EDIT:
when clicking to view one of the entities listed as Unavailable, HA reports:

This entity is no longer being provided by the mqtt integration. If the entity is no longer in use, delete it in settings.

This is additional evidence that the problem is caused by the MQTT integration not receiving the discovery config after coming back online.

And to prove the theory of whether re-sending the current state would work or not, I tried publishing a message to emulate what the bridge would send to reflect state changes:

topic=wyzebridge/pan-1/night_vision
payload=2 # or 3

but the entity remains unavailable and the state is not updated in HA.
After I restart the bridge container, which re-sends the discovery message, the entities are available again. I can then send the same payload to emulate state changes from the bridge, and the switch status is reflected correctly in the HA UI (2=off, 3=on). So this shows that simply re-sending the current states will not be enough either. The only thing that appears to work is to re-send the discovery configs.
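
For context, a discovery config for a switch like the one tested above might look roughly like the sketch below. The keys (`name`, `unique_id`, `state_topic`, `command_topic`, `payload_on`, `payload_off`) are standard fields of Home Assistant's MQTT switch discovery schema, and the topic follows HA's `<discovery_prefix>/<component>/<node_id>/<object_id>/config` layout. The exact payload the bridge publishes is not shown in this thread, so the `switch_discovery` helper, the `/set` command topic and the `unique_id` format are assumptions.

```python
import json


def switch_discovery(cam: str, prop: str, base: str = "wyzebridge",
                     dtopic: str = "homeassistant") -> tuple[str, str]:
    """Build a (topic, payload) pair for a hypothetical HA MQTT switch entity."""
    topic = f"{dtopic}/switch/{cam}/{prop}/config"
    config = {
        "name": prop.replace("_", " ").title(),
        "unique_id": f"{base}_{cam}_{prop}",          # assumed format
        "state_topic": f"{base}/{cam}/{prop}",        # topic used in the test above
        "command_topic": f"{base}/{cam}/{prop}/set",  # assumed command topic
        "payload_on": "3",   # per the test above: 3 = on
        "payload_off": "2",  # per the test above: 2 = off
    }
    return topic, json.dumps(config)
```

Until HA has received a config along these lines, it has no mapping from the state topic to an entity, which is consistent with the state-only message being ignored in the test above.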

mrlt8 added a commit that referenced this issue Jul 20, 2023
* cache build

* Catch and disable MQTT on name resolution error

* Doorbell quick response

* Set camera time zone #916

* Set timezone on camera #916

* OSD toggle for logo/timestamp

* Add K10006 auth #742

* Fix /time_zone/get and return offset #916

* convert TZ offset to hours #916

* custom video filter #919

* Resend discovery message on HA online #907 #920

* Revert K10006 for WYZEDB3 #742

* Add more MQTT entities #921 #922

* Return json response/value for commands #835

* Fix threading issue on restart #902

* Fix SET cruise_points over MQTT

* SET cruise_point #835

* split into multiple jobs

* changelog
@jhansche
Contributor Author

Confirmed in v2.3.11, when reloading the MQTT HA integration, the Wyze entities come back within ~5-10 seconds, without having to restart anything else! 🎉

Thanks!

@jhansche
Contributor Author

One minor note: although the entities become available again after a reload, the current status of each entity remains unknown until either I manually toggle the entity, or restart the bridge add-on. That is not a big deal imo, because the controls still work which is what's most important.

@mrlt8
Owner

mrlt8 commented Jul 22, 2023

Hmm, would a retain flag help in this situation or would that get cleared out when HA restarts?

@jhansche
Contributor Author

Yeah, retain would resend the message to future subscribers, including HA when the integration reloads. What I'm not sure of is whether that would persist across broker restarts 🤔 The risk of retaining a message even after the broker restarts is ending up with orphaned messages, such as if you delete the camera. Also not clear if the retained messages would continue to persist even after the Bridge client disconnects.

But what I'm reading says it'll otherwise act just like a normal message, which I think means the retain flag is not persistent. So shouldn't have that problem.

The retain flag would also work for discovery messages too btw, including when the ha integration reloads, as long as the same orphan issue doesn't happen.
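
To make the retain behaviour concrete, here is a toy in-memory model of how a broker treats retained messages under the MQTT spec: the last retained payload per topic is replayed to new subscribers, and publishing an empty retained payload clears it (which is one way to clean up an orphaned config for a deleted camera). `ToyBroker` is purely illustrative, matches exact topics only, and does not model whether retained messages survive a broker restart, since that depends on the broker's persistence settings.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, bytes], None]


class ToyBroker:
    """In-memory model of MQTT retained-message delivery (exact-topic match only)."""

    def __init__(self) -> None:
        self.retained: dict[str, bytes] = {}
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)

    def publish(self, topic: str, payload: bytes, retain: bool = False) -> None:
        if retain:
            if payload:
                self.retained[topic] = payload   # keep the latest retained message
            else:
                self.retained.pop(topic, None)   # empty retained payload clears it
        for handler in self.subscribers[topic]:  # live subscribers always get it
            handler(topic, payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)
        if topic in self.retained:               # late subscribers get the retained copy
            handler(topic, self.retained[topic])
```

In this model a subscriber that joins after a retained discovery config was published still receives it, while a non-retained message published before the subscription is simply missed.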

mrlt8 added a commit that referenced this issue Jul 22, 2023
mrlt8 added a commit that referenced this issue Jul 24, 2023
* Start from index 1 for cruise_point/waypoint #835

* update_snapshot via MQTT

* fix camera status always online #907 #920

* Additional MQTT entities #921

* QSV related changes

* i965-va-drivers #736

* FIX power status #921

* Fix cruise_point type #921

Thanks @jhansche

* return index from command payload #921

* Update docker-image.yml

* Monitor and set preferred bitrate #929

* Default to `-` for cruise_point #921

* clear out stale entities #921

* changelog
@giorgi1324

Yeah same here, the issue is still present in 2.3.13

@jhansche
Contributor Author

jhansche commented Aug 5, 2023

I think the current issues may be somewhat different. I do see that sometimes the MQTT entities go unavailable periodically. But I haven't tracked down the root cause. E.g. it may be that it loses connectivity to the broker, or it may be some other state that gets mixed up in the bridge. Restarting the bridge container brings everything back, but that doesn't mean the problem is in the bridge necessarily.

@51av0sh

51av0sh commented Jan 10, 2024

To confirm the above, this issue also happens to the Govee to MQTT add-on I recently installed. Restarting the add-ons fixes the issue but need to figure out what's causing this.

@jhansche
Contributor Author

I think the issue happens, at least from what I've been able to determine without really digging into it, when restarting HA without restarting the MQTT broker (Mosquitto add-on in my case) or the Wyze bridge add-on. I recently added an automation, triggered by HA start, that checks whether one of my Wyze entities is still Unavailable after some period of time and automatically restarts the add-on. It seems like that has improved things.

Given that, I think the problem is that when the MQTT integration in HA reloads, it loses the Wyze discovery configs. Therefore all entities become unavailable. Restarting the bridge add-on causes it to reconnect to the broker and resend the discovery messages, and everything comes back up.

@mrlt8 Did you end up adding the "retain" flag on discovery, mentioned here? #920 (comment)

@jhansche jhansche reopened this Jan 10, 2024
mrlt8 added a commit that referenced this issue Jan 10, 2024
@mrlt8
Owner

mrlt8 commented Jan 10, 2024

@jhansche Could you try the latest dev image?

@jhansche
Contributor Author

I tried the dev image, and after reloading the MQTT integration, I see the Wyze entities come back after about 5-10 sec.

But then I switched back to v2.6.0, and I'm still seeing the entities come back up. So it seems like my assumptions are wrong somewhere 🤔 Either that, or the dev branch's retained discovery messages were still retained even after switching back to the release version? I could try the 2.6 image again after a restart of the broker and HA, and see if it still auto-recovers. If it does, then my assumption that it's the MQTT integration losing the discovery message doesn't hold water.

I'm still having trouble pinpointing exactly which component is the culprit (as in, which one triggers entities to become unavailable and not automatically recover):

  • MQTT broker (Mosquitto add-on)
  • MQTT integration
  • Wyze bridge add-on

@jhansche
Contributor Author

That's what it was... I restarted the Mosquitto broker after switching back to v2.6, and now when I reload the MQTT integration, my entities go unavailable and they don't recover. I have to restart the Wyze bridge to get them back.

So it does look like the retain flag in the dev image is what fixed it. I guess it just continued to be retained even after switching back, which is what I was not expecting.

The retain flag allows the MQTT integration to receive the original discovery message when it reloads; and the last-will message is what will tell HA that it's unavailable until it reconnects.

On a related note, I was looking at my Mosquitto logs, and it looks like the wyze user is opening and closing several connections every few seconds:

2024-01-10 02:31:43: New connection from 172.30.33.12:43131 on port 1883.
2024-01-10 02:31:43: New client connected from 172.30.33.12:43131 as auto-35393D12-D47B-8F37-7CA9-A9836F7979FA (p2, c1, k60, u'wyze').
2024-01-10 02:31:43: Client auto-35393D12-D47B-8F37-7CA9-A9836F7979FA disconnected.
2024-01-10 02:31:44: New connection from 172.30.33.12:42991 on port 1883.
2024-01-10 02:31:44: New client connected from 172.30.33.12:42991 as auto-5C42A771-04E9-7F7F-3EBD-589E87214C8F (p2, c1, k60, u'wyze').
2024-01-10 02:31:44: Client auto-5C42A771-04E9-7F7F-3EBD-589E87214C8F disconnected.
2024-01-10 02:31:44: New connection from 172.30.33.12:46393 on port 1883.
2024-01-10 02:31:44: New client connected from 172.30.33.12:46393 as auto-F8A1EB48-B8A7-4148-6A7A-965A541E1700 (p2, c1, k60, u'wyze').
2024-01-10 02:31:44: Client auto-F8A1EB48-B8A7-4148-6A7A-965A541E1700 disconnected.

Is that to be expected? It's not related to this issue either way, so I think this can be closed again and I can open a new issue for the connections, if you want.
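
For anyone wanting to quantify that churn before opening a separate issue, a small sketch like the following could count new connections per user from log lines in the format shown above. The `count_connections` helper and its regex are only written against this excerpt and may need adjusting for other Mosquitto log formats.

```python
import re
from collections import Counter

# Matches the "New client connected" lines in the log excerpt above.
CONNECT_RE = re.compile(
    r"New client connected from (?P<ip>[\d.]+):(?P<port>\d+) as (?P<client_id>\S+) "
    r"\(.*u'(?P<user>[^']*)'\)"
)


def count_connections(lines):
    """Count new client connections per username."""
    counts = Counter()
    for line in lines:
        match = CONNECT_RE.search(line)
        if match:
            counts[match["user"]] += 1
    return counts
```

The `auto-` prefix on the client IDs is Mosquitto's default prefix for clients that connect without supplying a client ID, so each of these lines is a fresh short-lived client rather than one long-lived connection reconnecting.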

@mrlt8
Owner

mrlt8 commented Jan 10, 2024

Hmm, seems like we might be able to set a birth message instead?

https://www.home-assistant.io/integrations/mqtt/#how-to-use-discovery-messages

@jhansche
Contributor Author

Hmmm... All of this was sounding like deja vu, and then I looked at the issue description☝️😅

Yes, it looks like that should be sufficient.

However, if the bridge doesn't stay connected to the broker, it won't see the HA birth message.

mrlt8 added a commit that referenced this issue Jan 11, 2024
* Drop late audio frames to keep sync #388

* show running architecture

* Don't log failed tutk if stream is down #990

* Use valid FPS for sleep #388

* Refactor

* Adjust sleep_interval #388

* Increase sleep time between frames #388

* Set larger buf size #388

* add SLEEP_INTERVAL_FPS #388

* Adjust sleep interval #388

* use genpts #388

* avoid conflicting names with errno module

* refactor _audio_frame_slow

* LOW_LATENCY mode

* show github SHA on dev build

* substream support and more refactoring #388

* div tag for jittery video in Firefox #1025

* Re-encoding audio for WebRTC/MTX

* reduce sleep time for audio thread #388

* Target firefox for jittery video fix in css #1025

* Use K10050GetVideoParam for FW 4.50.4.x #1070

* Use K10006 for newer doorbell #742 and refactor

* Reduce audio pipe flushing #388

* show gap when audio out of sync #388

* don't include ARCH in version

* Update iotc.py

* reset frame_ts on clock sync

* update auth api

* Restructure and cleanup

* remove unneeded files

* update path

* Forget alarm/siren state #953 #1051

* use addon_config for Home Assistant

* Additional refactoring to auth api

* don't skip keyframes #388

* Update ffmpeg.py

* delay audio when ahead #388

* format iotc logging so we know what cam is late

* drop late video frame and speed up audio #388

* Add Floodlight V2

* Set default sample rate for all cams

* Delay audio by 1 second if ahead of video #388

* tweak ffmpeg buffer #388

* Retain MQTT Discovery Message #920

* Update change log

Special thanks to @carlosnasillo!