mdns-repeater causes ChromeCast Audio devices to cease broadcasting mDNS responses when interface MTU exceeded #6

Closed
gjbadros opened this issue May 5, 2020 · 5 comments

@gjbadros

gjbadros commented May 5, 2020

Because I'm not sure of the right set of fixes, I'm filing this issue with each of:
Google Chromecast/Home Team (via email),
https://github.com/jstasiak/python-zeroconf,
https://github.com/kennylevinsen/mdns-repeater,
https://github.com/home-assistant/plugin-multicast, and
https://github.com/home-assistant/core

I just spent the weekend tracking this down as I apparently started using mdns-repeater
unwittingly due to a change in HomeAssistant's hass.io and its new
multicast plugin
(https://github.com/home-assistant/plugin-multicast). That resulted in
a revival of some terrible instability in my 30+ Google ChromeCast
Audio (CCA) devices in my home -- the problem had gone away a couple
months ago and I'd attributed it to IGMP issues, but I peeled the
onion and this is what I found.

BUG #1 CHROMECAST AUDIO DEVICE PROCESS CRASHES ENDING mDNS:

The mDNS announcing process of Google Chromecast Audios (a
discontinued product, unfortunately) dies when triggered by the steps
below using mdns-repeater and python zeroconf (via Home Assistant in my case).

The CCA crash manifests itself as the multicast messages announcing
the CCA and its audio groups no longer appearing. E.g., if you do

tcpdump -npi eno1 port 5353 and host [CCA_IP]

you'll see a couple of PTR responses coming from the devices every 10
seconds, announcing something like:

Chromecast-Audio-dc.a.................f._googlecast._tcp.local: type TXT, class IN, cache flush

When the steps below occur, these announcements end until either:

  1. the CCA reboots (power cycle); or

  2. the CCA is forced to switch to a new WAP; e.g., I have a script
    that forces a client to reconnect to the WAP in order to kick this
    mDNS announcing process back on.
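The tcpdump check above can also be automated. The following is a minimal sketch, not a tested tool: it assumes IPv4, the standard mDNS group 224.0.0.251 on port 5353, and a 30-second silence threshold (my own choice, three times the roughly 10-second interval observed above). It counts any mDNS packet from a host as a sign of life rather than inspecting record types, and the function names are illustrative only.

```python
import socket
import time

MDNS_GROUP = "224.0.0.251"  # IPv4 mDNS multicast group (RFC 6762)
MDNS_PORT = 5353


def silent_hosts(last_seen, now, timeout=30.0):
    """Return hosts whose most recent mDNS packet is older than `timeout` seconds."""
    return sorted(host for host, ts in last_seen.items() if now - ts > timeout)


def open_mdns_socket():
    """Open a UDP socket joined to the IPv4 mDNS group on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MDNS_PORT))
    # ip_mreq: multicast group address followed by the local interface address.
    mreq = socket.inet_aton(MDNS_GROUP) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def watch(hosts, timeout=30.0, poll=5.0):
    """Warn whenever one of `hosts` has gone quiet (runs until interrupted)."""
    sock = open_mdns_socket()
    sock.settimeout(poll)
    last_seen = {host: time.monotonic() for host in hosts}
    while True:
        try:
            _data, (src, _port) = sock.recvfrom(9000)
            if src in last_seen:
                last_seen[src] = time.monotonic()
        except socket.timeout:
            pass  # no packet within `poll` seconds; fall through to the check
        for host in silent_hosts(last_seen, time.monotonic(), timeout):
            print(f"no mDNS traffic from {host} for more than {timeout:.0f}s")
```

Calling watch(["192.168.1.50"]) with a CCA's address (a placeholder here) would print a warning once that device stops announcing. Whether the socket can share port 5353 with an already-running mDNS daemon depends on the platform.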

When those announcements cease, the Google Home app on Android stops
showing "Play Music" links under that device in the display. HOWEVER,
there is a per-physical-WAP (based off the MAC of the WAP, not SSID,
so it is not shared across multiple mesh-networked WAPs on the same
SSID) cache for Google Home, so you won't see the problem happen
immediately. Instead, you have to go to another room, ensure the phone
is connected to a new WAP, and then see that "Play Music" will no
longer show up for that device.

It's worth noting that the TCP socket interface to each ChromeCast Audio
device is still working after the mDNS announcing process has died. E.g.,
you can still play music and control the device via TCP APIs, you just
can't discover the device via mDNS.

BUG #1 SUMMARY - HIGH SEVERITY but probably NO FIX: Google ChromeCast
Audio must not crash due to bad network data. (But this probably won't
get fixed since Google Home/Mini do not have the bug and the CCA is a
discontinued product.)

BUG #2 PYTHON ZEROCONF SHOULD NOT SEND HUGE PACKETS

I have 30+ ChromeCast Audio devices and 80+ Google casting
devices. A query for _googlecast._tcp.local. results in
a response that's almost 4KB, far larger than the 1500 MTU on
most ethernet switches. E.g., if I modify examples/browser.py
to interrogate like so:

browser = ServiceBrowser(zeroconf, "_googlecast._tcp.local.", handlers=[on_service_state_change])

zeroconf will then publish those 4KB mDNS responses. They, of
course, get IP fragmented and that seems to be fine when multicasting
directly to the CCAs and other devices. However, RFC 6762
(https://tools.ietf.org/html/rfc6762) section 17 states some
requirements for Multicast DNS Message Size, and the fourth paragraph reads:

"A Multicast DNS packet larger than the interface MTU, which is sent
using fragments, MUST NOT contain more than one resource record."

Larger than the interface MTU seems to me to mean that these Responses
must limit themselves to no more than 1500 octets (except in the
special case of a long single record that's too big). That's not the
issue here -- the responses causing the crash are, e.g., 59 Resource
Records (RR) in the answer (not a single long one).

For whatever reason, that problem alone is not causing the ChromeCast
Audios to crash, but I strongly suspect that fixing this problem would
fix the stack. I believe these MUST be broken up into separate UDP
packets of length <= 1500 (the interface MTU) at the application layer
(rather than using IP fragmentation).
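To make the proposed application-layer split concrete, here is a minimal sketch of greedy packing by encoded record size. It is not python-zeroconf's code. It assumes IPv4 without options (20-byte IP header, 8-byte UDP header, 12-byte DNS header), so a 1500-byte MTU leaves 1472 bytes for the DNS message. The function names and the 68-byte record size used in the example are hypothetical, and real sizes vary with DNS name compression.

```python
DNS_HEADER = 12   # fixed DNS message header, in bytes
UDP_HEADER = 8
IPV4_HEADER = 20  # without options; use 40 for IPv6


def max_dns_payload(mtu=1500, ip_header=IPV4_HEADER):
    """Largest DNS message that fits in one unfragmented UDP datagram."""
    return mtu - ip_header - UDP_HEADER


def pack_records(record_sizes, mtu=1500, ip_header=IPV4_HEADER):
    """Greedily group encoded record sizes into messages that each fit the MTU.

    A record too large to fit on its own is placed alone in a message, the
    one case where RFC 6762 section 17 allows relying on IP fragmentation.
    """
    budget = max_dns_payload(mtu, ip_header) - DNS_HEADER
    messages, current, used = [], [], 0
    for size in record_sizes:
        # Start a new message when the next record would overflow this one.
        if current and used + size > budget:
            messages.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        messages.append(current)
    return messages


# 59 records of a hypothetical 68 bytes each is about 4KB in total.
messages = pack_records([68] * 59)
print([len(m) for m in messages])
```

With those illustrative numbers, the 59 records are spread over three messages of 21, 21 and 17 records, each small enough to be sent without IP fragmentation.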

You can reproduce this using avahi-publish to create lots of records in a
subdomain and then browsing that subdomain. The total length of the
DNS records should exceed 2KB (for good measure to be sure it's
big enough).
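A sketch of that reproduction, driving avahi-publish from Python. The service type _mtutest._tcp, the instance names, the ports and the TXT padding are all made up for illustration; only the "avahi-publish -s NAME TYPE PORT [TXT]" invocation comes from the Avahi tools. How many services are needed to pass 2KB is something to confirm with tcpdump rather than assume.

```python
import subprocess


def publish_commands(count, service_type="_mtutest._tcp", base_port=9000, txt_len=40):
    """Build one `avahi-publish -s` command line per test service."""
    txt = "pad=" + "x" * txt_len  # padding so the combined answer grows quickly
    return [
        ["avahi-publish", "-s", f"mtu-test-{i:03d}", service_type, str(base_port + i), txt]
        for i in range(count)
    ]


def start_publishers(count):
    """Launch the publishers; each avahi-publish process runs until terminated."""
    return [subprocess.Popen(cmd) for cmd in publish_commands(count)]
```

After start_publishers(40), browsing "_mtutest._tcp.local." with the ServiceBrowser line shown earlier should exercise the large-response path. Call terminate() on each returned process to withdraw the records.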

BUG #2 SUMMARY - MEDIUM SEVERITY AND STRAIGHTFORWARD FIX: python
zeroconf MUST adhere to RFC 6762.

BUG #3 MDNS-REPEATER SOMEHOW TICKLES BUG #1 WHEN PRESENTED WITH MDNS IP FRAGMENTS

I've not investigated this thoroughly, but I suspect it's either due
to some kind of UDP storm due to a cycle that crashes the CCAs
because of the fragmentation, or some kind of packet rewriting.

The only other open issue on home-assistant/plugin-multicast seems
possibly relevant
(home-assistant/plugin-multicast#1) and
jesserockz's note at the end is worth understanding/trying. I don't
think the mdns-repeater code should be mirroring all the interfaces,
so if it is, that's a bug.

Note that in the configuration where I can reproduce this,
mdns-repeater is running inside Home Assistant's hass.io plugin called
home-assistant/plugin-multicast.

I work around it by running a shell inside that docker environment and
changing the run command to comment out the running of mdns-repeater
(since just docker stop-ping that container results in the hassio
supervisor restarting the container).

It may be worth noting that the machine on which hassio runs has many network interfaces:

$ ifconfig # output follows
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
enp2s0f0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
enp2s0f1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
hassio: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
veth0ac5dfe: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth0ff2059: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth1b05ec4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth3003e6f: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth347b241: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth54a968f: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
veth748acbe: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
vethc5fab43: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
vethedd7c47: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
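For anyone narrowing down which of these interfaces could take part in a repeat loop, a small sketch that picks out the interfaces flagged both RUNNING and MULTICAST from output in the format above. The helper names are illustrative, and the sample text is an excerpt of the listing, not new data.

```python
import re


def parse_flags(ifconfig_text):
    """Map interface name to its set of flag names from `ifconfig` header lines."""
    pattern = re.compile(r"^(\S+): flags=\d+<([^>]*)>", re.MULTILINE)
    return {name: set(flags.split(",")) for name, flags in pattern.findall(ifconfig_text)}


def running_multicast(ifconfig_text):
    """Names of interfaces that are flagged both RUNNING and MULTICAST."""
    flags = parse_flags(ifconfig_text)
    return sorted(name for name, f in flags.items() if {"RUNNING", "MULTICAST"} <= f)


sample = """\
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
"""
print(running_multicast(sample))
```

Applied to the full listing, this selects 12 of the 17 interfaces: docker0, eno1, hassio and the nine veth devices.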

BUG #3 SUMMARY - UNCERTAIN SEVERITY AND FIX. Confirm it's behaving
as expected when there are multiple interfaces and in the presence
of UDP packets undergoing IP fragmentation.

Summary: I propose fixing python-zeroconf as the quickest and easiest
change, and ideally someone more in tune with what homeassistant is
trying to do with mdns-repeater could figure out the right fix to
mdns-repeater and/or the way the multicast plugin is configured.

Let me know what more information you need.

@kennylevinsen
Collaborator

kennylevinsen commented May 7, 2020

This is quite the wall of text. :/

A good starting point would be using Wireshark to monitor both the source and destination subnets for the mDNS broadcast that crashes your ChromeCast Audio, that is, the subnet that the broadcast originated on and the subnet that mdns-repeater repeated it to. I don't have any devices that crash from bad mDNS packets, so you're a bit on your own with regard to finding the fault.

Note that mdns-repeater just blindly copies UDP packets targeting the mDNS address from one interface to another. It forwards to all interfaces that have been specified by name on the command line. That all interfaces are monitored is not a bug; it's just the current behavior. The easiest way to filter is using the blacklists.
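As a rough model of that filtering, and not mdns-repeater's actual C code, the decision can be sketched as "repeat unless the sender is in a blacklisted subnet". This assumes the blacklist matches on the packet's source address. The function name and the subnets in the example are hypothetical.

```python
import ipaddress


def should_repeat(source_ip, blacklist=()):
    """Return True unless `source_ip` falls inside one of the blacklisted subnets."""
    src = ipaddress.ip_address(source_ip)
    return not any(src in ipaddress.ip_network(net) for net in blacklist)


# Example: drop anything originating from a (hypothetical) Docker bridge subnet.
print(should_repeat("192.168.1.20", ["172.17.0.0/16"]))  # True
print(should_repeat("172.17.0.3", ["172.17.0.0/16"]))    # False
```

Blacklisting the container-side subnets in this way is one possible way to keep packets from bridge and veth interfaces out of the repeat path.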

@gjbadros
Author

gjbadros commented May 7, 2020 via email

@sbeckeriv

I have been running this in Docker on Unraid to support my segmented network so that a Chromecast Ultra can stream Plex. Often enough, I have seen that the audio for a video stream will not work. I thought it was an issue between the Chromecast and the speaker, but the problem described in this wall of text might also be my issue. Unlike gjbadros, I have just been restarting my Chromecast and blaming Sonos. I look forward to anything that comes out of this issue, and I'm happy to test any potential solutions.

Thanks again!
Becker

@kennylevinsen
Collaborator

See python-zeroconf/python-zeroconf#245 and python-zeroconf/python-zeroconf#248.

Closing this as there were no mdns-repeater issues identified.

@gjbadros
Author

gjbadros commented Jun 22, 2020 via email


3 participants