Zeroconf.get_service_info seems broken in 0.26.0 #255
Comments
I haven't looked at this hard, but the mdns-repeater process that is running under the HASS supervisor has crashed for me before, resulting in failed service discovery. Is the setup where the problem occurs relying on mdns-repeater and is it still running? (I had to docker restart that container.)
Yes. I run the mdns repeater on the host and can stop the container with the not-working 0.26.0 setup and start the container with 0.25.1 to get a working setup back, without further touching (or rebooting) the host. Logs indicate that mDNS traffic is still being reflected as well.
@emontnemery Is this about
Now that I think about it, this should only be the case if a record changes, and I don't think you expect that here.
I've just run some local tests with zeroconf 26.1 and a Chromecast and it seems to be working, both from a Python test harness and from my HA (0.110.1), which recognises it fine, including after a restart of HA. Any suggestions on how to recreate it?
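For reference, a minimal harness along those lines could look like the sketch below (an illustration only, not the exact harness used here; the listener name, the 30-second run time and the log wording are assumptions). It browses for _googlecast._tcp.local. and resolves each discovered service the same way pychromecast does, via Zeroconf.get_service_info:

    import logging
    import time

    from zeroconf import ServiceBrowser, Zeroconf

    logging.basicConfig(level=logging.DEBUG)
    _LOGGER = logging.getLogger(__name__)

    CAST_TYPE = "_googlecast._tcp.local."  # the service type pychromecast browses for


    class CastDebugListener:
        """Log every add/update/remove and try to resolve the service."""

        def add_service(self, zconf, typ, name):
            info = zconf.get_service_info(typ, name)
            _LOGGER.debug("add_service %s -> %s", name, info)

        def update_service(self, zconf, typ, name):
            info = zconf.get_service_info(typ, name)
            _LOGGER.debug("update_service %s -> %s", name, info)

        def remove_service(self, zconf, typ, name):
            _LOGGER.debug("remove_service %s", name)


    zc = Zeroconf()
    browser = ServiceBrowser(zc, CAST_TYPE, CastDebugListener())
    try:
        time.sleep(30)  # let discovery run for a while, then inspect the debug output
    finally:
        zc.close()

If get_service_info starts returning None here with 0.26.x but not with 0.25.1, that would reproduce the issue outside of HA.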
Are you using docker? You can find logs and some initial thoughts at home-assistant/core#35922 (comment) and later comments.
No, I'm not using Docker. There's a lot of logs to go through there; can we first try to isolate whether it occurs when not using HA, with a simple test harness? The reason I mention this is there seemed to be quite a lot of changes to the way zeroconf was being used in HA 0.110, from a quick scan of the release logs. Happy to work on a harness with you, and then we can ensure that test case is in the base zeroconf test suite so we don't inadvertently reintroduce it.
The question about Docker was because the thread seems to only have Docker users reporting the issue, for now. To summarize what we did already:

- HA 109.6 with zeroconf 25.1 (default): all works fine
- 26.1 and 26.2 didn't solve the issue either, btw

"Cast broken" in this context means that not all devices seem to connect, or more specifically: any device after the first one.
OK, thanks, that's a useful summary. Just reran my test harness, adding a second (mocked) Chromecast device, and this works fine too. Can you enhance the logging in the code posted at the top of this thread? Please catch all exceptions (rather than just IOError) and log the exception in both cases so we can see what is occurring.
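A sketch of what that enhanced logging could look like (illustrative only: the helper name resolve_service and the exact log wording are assumptions; the retry loop mirrors the pychromecast code quoted later in this thread):

    import logging

    _LOGGER = logging.getLogger(__name__)


    def resolve_service(zconf, typ, name, max_tries=4):
        """Try to resolve a service, logging every failure while we debug.

        zconf is a zeroconf.Zeroconf instance; typ and name come from the
        add_service/update_service callback.
        """
        service = None
        tries = 0
        while service is None and tries < max_tries:
            try:
                service = zconf.get_service_info(typ, name)
                _LOGGER.debug("service: %s", service)
            except IOError:
                # previously this was swallowed silently; now log what happened
                _LOGGER.exception("get_service_info raised IOError for %s, %s", typ, name)
                break
            except Exception:  # deliberately broad while debugging
                _LOGGER.exception("unexpected error resolving %s, %s", typ, name)
                break
            tries += 1
        return service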
I'll try later today, but it will require some work inside the docker container so I'll need some time to set everything up. I'll report back though.
One thing to consider is that I’ve realised that I’m also implicitly using docker for my HA as I’m using HassOS. My test harness is running on a Windows machine, however, elsewhere on my network. Is it worth us trying to reproduce this outside of HA first, as it will save you fiddling with your HA? Also, if we can’t reproduce it outside of HA, that will help us with the diagnosis. Your call obviously, I’ll support you whichever way you choose to go.
The key difference between the 0.25.1 and 0.26.1 logs is that

Example from 0.25.1 log:
Example from 0.26.1 log:
Another key difference between the 0.25.1 and 0.26.1 logs is that
Note: maybe this is just a red herring though; I've seen chromecasts add suffixes before, and it shouldn't cause
It would be a good test to see what implementing update_service yields; however, this would be a bug which we would rectify. The only requirement for implementing update_service SHOULD be to get updates, such as when the IP address changes. Given my testing so far on this, it looks like we are investigating some sort of edge case, though I'm not sure what it is yet. What you’ve identified with the multiple calls to add_service looks worthy of investigation. Any idea why there are 7 suffixes? Is that 7 devices in a single house? Or is that 7 services from a single device?
It's 7 services from one device, but only one of them should be valid; the remaining ones should be stale records. IIRC, due to the nature of mDNS, the name may have been taken by the Chromecast itself before it rebooted or reconnected etc., but is still cached by another client. If @hmmbob has been "messing around" shortly before logging, that may explain the multiple services, and also why they show up in one log but not another. In any case, logs from 0.25.1 and 0.26.1 with the changes from #254 should improve the readability.
@hmmbob A bit crazy, but could you try patching pychromecast/discovery.py by adding this to
Looks like a good suggestion. Given what you’ve said about the large number of suffixes, I wonder if the issue is down to a large mDNS packet being split, so that when the first packet comes in there is not enough information to provide the data needed to service the get_service_info call. If this is the case, on a failure (i.e. receiving None), you could try waiting for, say, 1 second and then trying again. At the moment the code retries 4 times without a wait.
What testing/code changes do you want me to focus on first? I've just got limited time available tonight, but I do want to help you guys forward with debugging this...
@hmmbob My proposal is to:
+    def update_service(self, zconf, typ, name):
+        _LOGGER.debug("update_service %s, %s", typ, name)
+        self.add_service(zconf, typ, name)
+        _LOGGER.debug("update_service done for %s, %s", typ, name)
+
     def add_service(self, zconf, typ, name):
         """ Add a service to the collection. """
+        import time
         service = None
         tries = 0
         _LOGGER.debug("add_service %s, %s", typ, name)
         while service is None and tries < 4:
             try:
                 service = zconf.get_service_info(typ, name)
                 _LOGGER.debug("service: %s", service)
             except IOError:
                 # If the zeroconf fails to receive the necessary data we abort
                 # adding the service
                 break
             tries += 1
+            if service is None:
+                time.sleep(1)

@mattsaxon Does this seem reasonable? Is the
I'll let @mattsaxon speak for himself, but to me it looks like a reasonable direction.
First run is done. I started a brand new HA 110.0 container (to make sure previous debugging did not interfere). I observed none of the cast devices showing up. Within the container, I patched the files as quoted above (I seem to get handy with

Logs: 110-26.1-patched.log
And the second run is done too. I again started a brand new 110.0 container and observed all cast devices showing as unavailable. I installed zeroconf 25.1, patched all files as requested and restarted Home Assistant. I observed all cast devices to be online immediately once the HA frontend was available.

Logs: 110-25.1-patched.log
Nice! I think there's some real progress here. Adding

From 0.26.1 log:
Devices show up around 18:32:
One thing which unfortunately makes the logs inconclusive is that there are multiple zeroconf instances and multiple zeroconf-ServiceBrowser__googlecast._tcp.local. service browsers.
#239 adds the lock _handlers_lock, which is used in the engine thread when handling incoming responses, and also by the service browser when issuing the state change callbacks. Both pychromecast and Home Assistant call Zeroconf.get_service_info from the service callbacks, which means the lock may be held for several seconds, which will starve the engine thread.
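To make the starvation concrete, here is a toy sketch (not zeroconf's actual code; the thread, lock and event names only mirror the description above). The callback thread holds the lock while waiting for data that only the engine thread can produce, but the engine thread needs the same lock to process incoming packets:

    import threading
    import time

    handlers_lock = threading.Lock()     # stands in for Zeroconf._handlers_lock
    record_received = threading.Event()  # stands in for the answer get_service_info waits for


    def engine_thread():
        """Processes incoming packets; needs the lock for every response (like handle_response)."""
        while not record_received.is_set():
            with handlers_lock:          # blocked while the callback below holds the lock
                record_received.set()
            time.sleep(0.1)


    def service_callback():
        """Like add_service calling get_service_info: waits for data while holding the lock."""
        with handlers_lock:
            got_answer = record_received.wait(timeout=3)  # get_service_info's default wait is ~3 s
            print("resolved:", got_answer)  # prints False: the engine thread was starved


    threading.Thread(target=engine_thread, daemon=True).start()
    service_callback()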
I haven’t considered what you’ve written above yet, but let me explain why I added the lock. The intention was to deal with the situation of calling add_service and then update_service in short succession. The expectation was that adding it would allow add_service to have all the information, rather than calling add_service early and then calling update_service. Pretty much trying to avoid the problem we may be facing here.
OK, so the intention is to prevent the service browser from rushing away and calling the listeners before a complete message has been processed. Maybe the lock can remain, but the state change callbacks could be fired after releasing it:

     if len(self._handlers_to_call) > 0 and not self.zc.done:
         with self.zc._handlers_lock:
             handler = self._handlers_to_call.popitem(False)
-            self._service_state_changed.fire(
-                zeroconf=self.zc, service_type=self.type, name=handler[0], state_change=handler[1]
-            )
+        self._service_state_changed.fire(
+            zeroconf=self.zc, service_type=self.type, name=handler[0], state_change=handler[1]
+        )
I think this could be just deindenting the fire() call?
Yes, that's right, not sure what I was thinking @_@
Maybe another round of tests: test with zeroconf 0.26.1 and the below changes. Changes:
In zeroconf/__init__.py (the ServiceBrowser):

     if len(self._handlers_to_call) > 0 and not self.zc.done:
         with self.zc._handlers_lock:
             handler = self._handlers_to_call.popitem(False)
-            self._service_state_changed.fire(
-                zeroconf=self.zc, service_type=self.type, name=handler[0], state_change=handler[1]
-            )
+        self._service_state_changed.fire(
+            zeroconf=self.zc, service_type=self.type, name=handler[0], state_change=handler[1]
+        )

In pychromecast/discovery.py:

+    def update_service(self, zconf, typ, name):
+        _LOGGER.debug("update_service %s, %s", typ, name)
+        self.add_service(zconf, typ, name)
+        _LOGGER.debug("update_service done for %s, %s", typ, name)
+
     def add_service(self, zconf, typ, name):
         """ Add a service to the collection. """
+        import time
         service = None
         tries = 0
         _LOGGER.debug("add_service %s, %s", typ, name)
         while service is None and tries < 4:
             try:
                 service = zconf.get_service_info(typ, name)
                 _LOGGER.debug("service: %s", service)
             except IOError:
                 # If the zeroconf fails to receive the necessary data we abort
                 # adding the service
                 break
             tries += 1
+            if service is None:
+                time.sleep(1)
I'll give it a run tomorrow! Strike that - I still have that container from the previous test so I'll do it now.

Edit: @emontnemery @jstasiak First run seems to be successful (I mean: cast items show at startup with patched 26.1). I am verifying the run again.

Logs: 110-26.1-patched-2.log
@emontnemery Yes, +1 from me, although late, as @hmmbob was faster with actual testing. :)
But now I wonder why I ran into this, with just a few others - but not the other thousands of HA users...
Is the percentage of HA users who upgraded to the latest HA and use pychromecast/zeroconf known (not the precise figure, naturally, just an approximation)?
Based on the number of comments in home-assistant/core#35922, I don't think the percentage is too high. A few users also complain in the forums. @hmmbob is running a "non-recommended" HA setup where the cast devices and Home Assistant are on different networks, with Avahi in reflector mode helping to bridge; that seems to be the common factor for others as well. Maybe it would be worth digging a bit deeper in the logs from @hmmbob to understand why the deadlock/starvation was triggered in this case. Maybe Avahi does some filtering of mDNS packets which somehow changes the timing, causing the starvation.
Closes #255

Background: #239 adds the lock _handlers_lock:

python-zeroconf/zeroconf/__init__.py:

    self._handlers_lock = threading.Lock()  # ensure we process a full message in one go

Which is used in the engine thread:

    def handle_response(self, msg: DNSIncoming) -> None:
        """Deal with incoming response packets. All answers are held in the cache, and listeners are notified."""
        with self._handlers_lock:

And also by the service browser when issuing the state change callbacks:

    if len(self._handlers_to_call) > 0 and not self.zc.done:
        with self.zc._handlers_lock:
            handler = self._handlers_to_call.popitem(False)
            self._service_state_changed.fire(
                zeroconf=self.zc, service_type=self.type, name=handler[0], state_change=handler[1]
            )

Both pychromecast and Home Assistant call Zeroconf.get_service_info from the service callbacks, which means the lock may be held for several seconds, which will starve the engine thread.
Several Home Assistant users have reported that cast devices can't be discovered correctly after a recent upgrade which bumps zeroconf from 0.25.1 to 0.26.1: home-assistant/core#35922
According to some testing, the problem appeared in 0.26.0, which adds #239
It seems that the problem is that Zeroconf.get_service_info doesn't return anything useful in 0.26.0+; from the logs it seems that Zeroconf keeps sending queries but does not get or does not accept answers. Debug logs here:
home-assistant/core#35922 (comment)
For context, this is the code in pychromecast which calls Zeroconf.get_service_info; look for the add_service and service prints in the logs:

@mattsaxon, any idea?