mdns-repeater causes ChromeCast Audio devices to cease broadcasting mDNS responses when interface MTU exceeded #6

gjbadros · 2020-05-05T06:48:33Z

Because I'm not sure of the right set of fixes, I'm filing this issue with each of:
Google Chromecast/Home Team (via email),
https://github.com/jstasiak/python-zeroconf,
https://github.com/kennylevinsen/mdns-repeater,
https://github.com/home-assistant/plugin-multicast, and
https://github.com/home-assistant/core

I just spent the weekend tracking this down as I apparently started using mdns-repeater
unwittingly due to a change in HomeAssistant's hass.io and its new
multicast plugin
(https://github.com/home-assistant/plugin-multicast). That resulted in
a revival of some terrible instability in my 30+ Google ChromeCast
Audio (CCA) devices in my home -- the problem had gone away a couple
months ago and I'd attributed it to IGMP issues, but I peeled the
onion and this is what I found.

BUG #1 CHROMECAST AUDIO DEVICE PROCESS CRASHES ENDING mDNS:

The mDNS announcing process of Google Chromecast Audios (a
discontinued product, unfortunately) dies when triggered by the steps
below using mdns-repeater and python zeroconf (via Home Assistant in my case).

The CCA crash manifests as the multicast messages announcing the CCA
and its audio groups no longer appearing. E.g., if you do

tcpdump -npi eno1 port 5353 and host [CCA_IP]

you'll see a couple of PTR responses coming from the devices every 10
seconds, announcing something like:

Chromecast-Audio-dc.a.................f._googlecast._tcp.local: type TXT, class IN, cache flush

When the set of steps below happens, these announcements end until either:

the CCA reboots (power cycle); or
the CCA is forced to switch to a new WAP; e.g., I have a script
that forces a client to reconnect to the WAP in order to kick this
mDNS announcing process back on.

When those announcements cease, the Google Home app on Android stops
showing "Play Music" links under that device in the display. HOWEVER,
there is a per-physical-WAP (based off the MAC of the WAP, not SSID,
so it is not shared across multiple mesh-networked WAPs on the same
SSID) cache for Google Home, so you won't see the problem happen
immediately. Instead, you have to go to another room, ensure the phone
is connected to a new WAP, and then see that "Play Music" will no
longer show up for that device.

It's worth noting that the TCP socket interface to each ChromeCast Audio
device is still working after the mDNS announcing process has died. E.g.,
you can still play music and control the device via TCP APIs, you just
can't discover the device via mDNS.

BUG #1 SUMMARY - HIGH SEVERITY but probably NO FIX: Google ChromeCast
Audio must not crash due to bad network data. (But this probably won't
get fixed since Google Home/Mini do not have the bug and the CCA is a
discontinued product.)

BUG #2 PYTHON ZEROCONF SHOULD NOT SEND HUGE PACKETS

I have 30+ ChromeCast Audio devices and over 80 Google casting
devices. A query for _googlecast._tcp.local. results in
a response that's almost 4KB, far larger than the 1500 MTU on
most ethernet switches. E.g., if I modify examples/browser.py
to interrogate like so:

browser = ServiceBrowser(zeroconf, "_googlecast._tcp.local.", handlers=[on_service_state_change])

zeroconf will then publish those 4KB mDNS responses. They, of
course, get IP fragmented and that seems to be fine when multicasting
directly to the CCAs and other devices. However, RFC 6762
(https://tools.ietf.org/html/rfc6762) section 17 states some
requirements for Multicast DNS Message Size, and the fourth paragraph reads:

"A Multicast DNS packet larger than the interface MTU, which is sent
using fragments, MUST NOT contain more than one resource record."

Larger than the interface MTU seems to me to mean that these Responses
must limit themselves to no more than 1500 octets (except in the
special case of a long single record that's too big). That's not the
issue here -- the responses causing the crash are, e.g., 59 Resource
Records (RR) in the answer (not a single long one).
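
To make the size arithmetic concrete, here is a minimal sketch (not taken from python-zeroconf or any other project mentioned here) of how many IPv4 fragments a single mDNS response needs. It assumes plain IPv4 with a 20-byte header and no options; the function names are hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def max_unfragmented_payload(mtu: int = 1500) -> int:
    """Largest DNS message that fits in one unfragmented IPv4/UDP packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def ipv4_fragment_count(dns_payload_len: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # The UDP header is part of the data that IP has to split up.
    datagram_len = UDP_HEADER + dns_payload_len
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return max(1, math.ceil(datagram_len / per_fragment))


if __name__ == "__main__":
    print(max_unfragmented_payload())  # payload budget for a 1500-byte MTU
    print(ipv4_fragment_count(4096))   # fragments for a roughly 4KB response
```

With a 1500-byte MTU the budget is 1472 bytes of DNS message, so a response of about 4KB is split into three fragments.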

For whatever reason, that problem alone is not causing the ChromeCast
Audios to crash, but I strongly suspect that fixing this problem would
fix the stack. I believe these MUST be broken up into separate UDP
packets of length <= 1500 (the interface MTU) at the application layer
(rather than using IP fragmentation).
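
A rough sketch of what splitting at the application layer could look like follows. This is an illustration of the idea only, not python-zeroconf's actual code: `split_records` is a hypothetical helper, records are treated as already-encoded byte strings, and DNS name compression (which a real encoder would use and which changes the sizes) is ignored.

```python
from typing import List

DNS_HEADER = 12  # fixed DNS message header size in bytes


def split_records(records: List[bytes], max_payload: int = 1472) -> List[List[bytes]]:
    """Greedily group encoded records so each message fits in max_payload.

    A single record that is too big on its own still gets a message to
    itself, which is the one case where RFC 6762 permits fragmentation.
    """
    messages: List[List[bytes]] = []
    current: List[bytes] = []
    size = DNS_HEADER
    for rr in records:
        # Start a new message when adding this record would overflow.
        if current and size + len(rr) > max_payload:
            messages.append(current)
            current, size = [], DNS_HEADER
        current.append(rr)
        size += len(rr)
    if current:
        messages.append(current)
    return messages


if __name__ == "__main__":
    # 59 answers, as in the responses described above; 70 bytes each is an
    # assumed size chosen only for illustration.
    groups = split_records([b"x" * 70] * 59)
    print([len(g) for g in groups])
```

Each group would then be sent as its own UDP packet, so no packet relies on IP fragmentation unless it carries exactly one oversized record.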

You can reproduce this using avahi-publish to create lots of records in a
subdomain and then browsing that subdomain. The total length of the
DNS records should exceed 2KB (for good measure to be sure it's
big enough).
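
As a hedged sketch of that reproduction, the helper below builds (but does not run) a batch of `avahi-publish -s` command lines whose TXT data adds up to well over 2KB. The service type `_mtutest._tcp`, the instance names, the ports, and the padding size are all made-up values for illustration.

```python
from typing import List


def avahi_publish_commands(count: int = 40,
                           service_type: str = "_mtutest._tcp",
                           base_port: int = 9000,
                           txt_padding: int = 100) -> List[List[str]]:
    """Build argv lists for publishing `count` dummy services."""
    commands = []
    for i in range(count):
        name = f"mtu-test-{i:02d}"
        # One TXT string per service; a single TXT string is limited to 255 bytes.
        txt = "pad=" + "x" * txt_padding
        commands.append(
            ["avahi-publish", "-s", name, service_type, str(base_port + i), txt]
        )
    return commands


if __name__ == "__main__":
    for argv in avahi_publish_commands():
        print(" ".join(argv))
```

Each avahi-publish process keeps its service registered only while it is running, so the commands would need to be started in the background (for example with `subprocess.Popen`) before browsing the hypothetical `_mtutest._tcp` type from another host.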

BUG #2 SUMMARY - MEDIUM SEVERITY AND STRAIGHTFORWARD FIX: python
zeroconf MUST adhere to RFC 6762.

BUG #3 MDNS-REPEATER SOMEHOW TICKLES BUG #1 WHEN PRESENTED WITH MDNS IP FRAGMENTS

I've not investigated this thoroughly, but I suspect it's either due
to some kind of UDP storm due to a cycle that crashes the CCAs
because of the fragmentation, or some kind of packet rewriting.

The only other open issue on home-assistant/plugin-multicast seems
possibly relevant
(home-assistant/plugin-multicast#1) and
jesserockz's note at the end is worth understanding/trying. I don't
think the mdns-repeater code should be mirroring all the interfaces,
so if it is, that's a bug.

Note that in the configuration where I can reproduce this,
mdns-repeater is running inside Home Assistant's hass.io plugin,
home-assistant/plugin-multicast.

I work around it by running a shell inside that docker environment and
changing the run command to comment out the running of mdns-repeater
(since just docker stop-ping that container results in the hassio
supervisor restarting the container).

It may be worth noting that the machine on which hassio runs has many network interfaces:

$ ifconfig # output follows
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
enp2s0f0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
enp2s0f1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
hassio: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
veth0ac5dfe: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth0ff2059: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth1b05ec4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth3003e6f: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth347b241: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth54a968f: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth748acbe: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
vethc5fab43: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
vethedd7c47: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
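
To narrow down which of these interfaces could actually carry mDNS traffic, a small sketch like the following picks out the ones flagged both RUNNING and MULTICAST from ifconfig-style output. It is a hypothetical helper written for this listing format only, and it says nothing about which interfaces mdns-repeater is configured to use.

```python
import re
from typing import List

# Matches lines such as "eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500"
_IFACE_LINE = re.compile(r"^(?P<name>[^:\s]+): flags=\d+<(?P<flags>[^>]*)>")


def multicast_running_interfaces(ifconfig_output: str) -> List[str]:
    """Return interface names whose flags include RUNNING and MULTICAST."""
    names = []
    for line in ifconfig_output.splitlines():
        match = _IFACE_LINE.match(line)
        if not match:
            continue  # skip address and statistics lines
        flags = set(match.group("flags").split(","))
        if {"RUNNING", "MULTICAST"} <= flags:
            names.append(match.group("name"))
    return names


if __name__ == "__main__":
    sample = (
        "eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500\n"
        "eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500\n"
        "lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536\n"
    )
    print(multicast_running_interfaces(sample))
```

Applied to the listing above, this keeps docker0, eno1, hassio, and the veth interfaces, and drops lo along with the interfaces that are up but not running.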

BUG #3 SUMMARY - UNCERTAIN SEVERITY AND FIX. Confirm it's behaving
as expected when there are multiple interfaces and in the presence
of UDP packets undergoing IP fragmentation.

Summary: I propose fixing python-zeroconf as the quickest and easiest
change, and ideally someone more in tune with what homeassistant is
trying to do with mdns-repeater could figure out the right fix to
mdns-repeater and/or the way the multicast plugin is configured.

Let me know what more information you need.


kennylevinsen · 2020-05-07T22:17:47Z

This is quite the wall of text. :/

A good starting point would be using wireshark to monitor both the source and destination subnets for the mDNS broadcast that crashes your ChromeCast Audio: that is, the subnet that the broadcast originated on, and the subnet that mdns-repeater repeated it to. I don't have any devices that crash from bad mDNS packets, so you're a bit on your own with regard to finding the fault.

Note that mdns-repeater just blindly copies UDP packets targeting the mDNS address from one interface to another. It forwards to all interfaces that have been specified by name on the command line. That all interfaces are monitored is not a bug, it's just the current behavior. The easiest way to filter is using the blacklists.

gjbadros · 2020-05-07T23:12:19Z

Yes I think it's just the python-zeroconf that homeassistant embeds being exposed back onto the whole subnet. I don't actually now think mdns-repeater is mangling packets, it's just exposing the bad python-zeroconf-generated mdns responses from homeassistant in a docker container back onto the network where the Chromecast audios are dying. Thanks for reading the wall :) Greg


sbeckeriv · 2020-06-22T16:42:24Z

I have been running this in Docker on Unraid to support my segmented network so that a Chromecast Ultra can stream Plex. Often enough, the audio for a video stream does not work. I thought it was an issue between the Chromecast and the speaker, but this wall of text might also describe my issue. Unlike gjbadros, I have just been restarting my Chromecast and blaming Sonos. I look forward to anything that comes out of this issue, and I'm happy to test any potential solutions.

Thanks again!
Becker

kennylevinsen · 2020-06-22T18:12:08Z

See python-zeroconf/python-zeroconf#245 and python-zeroconf/python-zeroconf#248.

Closing this as there were no mdns-repeater issues identified.

gjbadros · 2020-06-22T18:21:30Z

The fix was isolated to python-zeroconf so Stephen you're welcome to try the latest version of that and see if it makes a difference. I suspect it doesn't affect your scenario since the problem was about Google ChromeCast Audio (CCAs not Ultras) having mdns announcement failures under certain relatively unusual situations.


kennylevinsen closed this as completed Jun 22, 2020
