Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mDNS: update domain lib; fix various issues #186

Merged
merged 1 commit into from
Jun 13, 2024

Conversation

ivmarkov
Copy link
Contributor

@ivmarkov ivmarkov commented Jun 12, 2024

I started looking at the mDNS and DNS-SD RFCs recently w.r.t. what it would take to implement the "query" portion of mDNS in our built-in pure-Rust responder. (So that we can initiate brand-new sessions for subscriptions, or update existing sessions to their new peer IP address, in case such a change is detected.)

What a can of worms I opened...

While the mDNS and DNS-SD specs look deceivingly simple (great work by the author BTW as in the specs are super easy to read), and while the domain lib does the heavy-lifting of DNS record parsing / creation for us... these specs are actually complex in that there is a lot of complexity hidden in there not in terms of DNS record parsing/creation, but how an mDNS responder/queryer should react in various circumstances (as in - when to accept unicast requests; when to generate delays before broadcasting; what additional data to put in the mDNS record; and the list goes on...)

Anyway, to cut the long story short, this PR addresses the following deficiencies we have in our current mDNS impl (in severity order):

  • We should not (as in 99% of the time) answer with unicast replies to DNS-SD queries. We do this currently all the time. PR fixes it that we always answer with multicast message (still not correct but only in 1% of the cases; answering via unicast was not correct in 99% of the cases and was the reason why we sometimes could not send a message to a remote peer)
  • To avoid wire packet collisions, we should reply to questions containing the generic "what DNS-SD services do you have?" (i.e. questions where we are not the authority and a lot of peers might reply) with a random 20 - 100 ms delay. Else our answer might be "lost in the noise"
  • We should put "extra" data we want to send to the remote peer in the additional DNS message section. We were just slapping it in the answers section, though we don't address a question with such an answer. Impact: unknown
  • We had a lot of repetition, generating large mDNS responses. Impact: unknown
  • We did not have any unit tests. I've implement a unit-test framework and a few such (we can fill those in over time) at least on the level of DNS packet "correctness". We don't have end-to-end tests yet, which cover the networking aspect

Since the PR was ending up large-ish, I also upgraded domain to the next version (0.10). This increased our Rust MSRV from 1.77 to 1.78. Upgrading got rid of a potential future large refactoring (domain lib authors renamed the notion of "dname" to just "name" everywhere) and also we could get rid of the heapless 0.7 "appendix" dependency that we no longer need. And from the "octets" crate which is now re-exported by the new "domain" lib.

So yes, I think this should go in.

Are we 100% compliant yet after this PR?

Far from it. I also think we might never get fully compliant with mDNS and DNS-SD. But... we just need to be "compliant enough" and not generate too much "network noise".
I have my doubts that other MCU-based mDNS implementations are very compliant either (due to code complexity yet limited flash size).

@ivmarkov
Copy link
Contributor Author

ivmarkov commented Jun 12, 2024

@kedars If that's not clear ^^^ does not yet implement the "query" portion. But it tries to fix the pre-existing "responder" logic to operate well-enough, so that I hopefully finally get rid of the mDNS lookup timeouts I see from time to time in my own setup over here - in the Commissioner...

Copy link
Collaborator

@kedars kedars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a can of worms I opened...

:-D

I have spent too much time debugging 'device not accessible over time' issues in the field, distill down to mDNS discovery, depending upon how a particular implementation works, and interoperates with other entities on the network.

Are you considering the built-in implementation for just quick-developer use cases or production options too? If you intend to continue to evolve the built-in implementation:

  • I wonder if we should spit it out into a separate project since it will get complicated, and also so that it has a wider applicability, and a better chance of surfacing and fixing issues.
  • There is a Bonjour Conformance Test (BCT), available for download on Apple's site (require's Apple developer account) https://developer.apple.com/bonjour/. It also has chattiness tests across a few hours to ensure we aren't creating large bursts. I doubt if many of the current implementations even pass this though.

rs-matter/src/mdns/builtin.rs Show resolved Hide resolved
@ivmarkov
Copy link
Contributor Author

ivmarkov commented Jun 13, 2024

Are you considering the built-in implementation for just quick-developer use cases or production options too?

Great question.
We must bring it to a state where it can be used in production. Reason being, "true" baremetal top-to-bottom Rust MCU stacks like Embassy, and the stuff the baremetal Rust folks at Espressif do with esp-hal (which is essentially integrating with Embassy, in the lack of any other task scheduler) do not have any suitable mDNS implementation in-place.

So if rs-matter is to be used in future on these stacks, we need to have an mDNS implementation that runs there.

If you intend to continue to evolve the built-in implementation:

  • I wonder if we should spit it out into a separate project since it will get complicated, and also so that it has a wider applicability, and a better chance of surfacing and fixing issues.

Also a great question.
What I'm doing in the meantime, is to maintain a "copy" of our "built-in" mDNS implementation as a stand-alone library here, as part of the edge-net suite of protocols.

The implementation in edge-mdns is almost a verbatim copy of the rs-matter one. In a way, if you need - on your device - a more generic mDNS responder (and queryer in future) - as you would be running also other mDNS workloads besides the Mater ones, you should be using that library or a similar one with MdnsType::UserProvided in rs-matter.

Reason why I'm not actively pushing for inclusion of this library as a dependency in rs-matter:

  • It is not popular enough, at least as of now
  • Its API is not set in stone. I would expect a lot of changes there, in the next couple of iterations
  • It is in the ivmarkov GH namespace still. Not ideal, as this is a one-man show

So for the short to mid-term - as painful as it is for me - I think we should keep our own mDNS implementation in-tree in rs-matter. Once (or if) the edge-net crates and edge-mdns becomes somewhat popular, we might switch.

  • There is a Bonjour Conformance Test (BCT), available for download on Apple's site (require's Apple developer account) https://developer.apple.com/bonjour/. It also has chattiness tests across a few hours to ensure we aren't creating large bursts. I doubt if many of the current implementations even pass this though.

Great idea! Let me look into it (but possibly after merging the BTP PR, as it is next on my list).

@ivmarkov
Copy link
Contributor Author

BTW one more reason why we need e2e tests - after my changes, the code was supposed to send to the multicast addresses. Yet - due to a single-line bug - it was still sending to the unicast address.

Fixed now (and re-tested), but yes I think we do need e2e - it would be not impossible but difficult to catch such type of errors with unit tests.

@kedars kedars merged commit 3fbc047 into project-chip:main Jun 13, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants