IPv6-specific "Host is unreachable" error that exits the matter runtime #100
Comments
The mDNS responder is broadcasting. In other words, it is not sending the UDP packet to a specific host but to the broadcast address. Not using IPv6 is less than ideal, to put it mildly. The way Matter is implemented in the field (Google Home, and I suspect others) requires IPv6 connectivity: link-local IPv6 addresses suffice, but they are necessary. Moreover, the mDNS responder also needs IPv6 support; without it, I was not able to get Google Home provisioning to complete. So where I'm going is that if IPv6 (including for broadcasting) does not work for you, a hard failure is probably OK for now. Why it fails is another story, as per above. Can you pinpoint the exact line of code where it fails?
Ack'd, I'll dig a little deeper into why this isn't working. My network should definitely support IPv6, as it's a Google WiFi mesh with no custom configuration, which makes me think the fault lies somewhere in the rs-matter code, but we'll see...
Might be... See, with link-local IPv6 there is no need for any explicit "IPv6 support" per se: there is no DHCP, and you don't need a gateway either. Which ESP-IDF version are you using with the example? It should be 4.4.x, which I know works, unless you've explicitly changed it...
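To make the link-local point concrete, here is a minimal sketch of how one might check whether an address falls in the link-local unicast range (fe80::/10). The helper name `is_link_local` is hypothetical and not part of rs-matter; it only illustrates which addresses the comment above is talking about.

```rust
use std::net::Ipv6Addr;

/// Hypothetical helper: returns true if `addr` lies in fe80::/10,
/// the link-local unicast range that needs neither DHCP nor a gateway.
fn is_link_local(addr: &Ipv6Addr) -> bool {
    // Only the top 10 bits of the first 16-bit segment identify the range.
    (addr.segments()[0] & 0xffc0) == 0xfe80
}
```

An address such as `fe80::1` passes this check, while a global address or the loopback address does not.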
Ack'd that's good context for the debugging.
I'm using 5.0.x, but I can try dropping back to 4.4.x to confirm that's the issue. Another good clue, thanks!
Confirmed that 4.4.x fixed this specific issue. I'll try to debug further why 5.x is broken in this way.
Digging deeper into why this doesn't work in release/v5.0 (and, I presume, v5.1, but that fails to compile with esp-idf-svc), the story here seems really hairy. I think Espressif might have broken something when attempting to backport fixes to v4.4. After many hours of debugging, I am fairly confident the offending code is: espressif/esp-idf/components/lwip @ release/v4.4: espressif/esp-idf/components/lwip @ release/v5.0: No idea why these are different or what actual diff introduced this inconsistency. The v4.4 branch of esp-lwip has only one commit, and it seems unrelated, as if somebody squashed a big merge into one commit (possibly by accident?). Even weirder, I can't find any evidence of upstream or esp-lwip having code like this. There's also support in the v4.4 branch for IPV6_MULTICAST_IF (which would probably also fix the issue rs-matter is seeing), but that support isn't in upstream or v5.0/v5.1, or really anywhere I can see...
My hypothesis: One test we can try to do is "manually" re-implement |
I think you're right. I was able to find a commit indicating that the behavior I identified in 4.4 is actually wrong according to the standard; they tried to fix it but seemingly regressed this other behavior we care about. I'll do some more digging and see if any workarounds exist.
I don't think IPV6_JOIN_GROUP even has the correct support in lwip to set the multicast interface, as it probably should. So there are two unknowns that we need to work out then:
I'll think on this a little more and see if I can find something...
Nope, you were right, I think it's not setting the zone flag in the ip6_addr struct which is causing the route to fail. I'll prep a patch soon to fix it.
After some digging, I have good news. I believe the issue is that in lwip you have to call ip6_addr_set_zone on an ip6_addr_t (which has an extra u8 zone field at the end) that is then used to route packets. This is achieved using the scope_id field in SocketAddrV6. I believe this should be required/important on all platforms, it's just very likely that Linux has a less fragile heuristic to figure this out for you. See the discussion on the scope_id field here: https://datatracker.ietf.org/doc/html/rfc2553#section-3.3. So, good news addressing my unknowns above:
I'll prep a PR to fix this.
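As a rough illustration of the scope_id point, the sketch below builds a destination address for the IPv6 mDNS multicast group (ff02::fb, port 5353) with the interface index placed in `scope_id`. The function name `mdns_destination` and the constants are assumptions for illustration, not the actual rs-matter code, and how the interface index is obtained is platform-specific and left to the caller.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

/// IPv6 mDNS multicast group (ff02::fb), a link-local scoped address.
const MDNS_GROUP_V6: Ipv6Addr = Ipv6Addr::new(0xff02, 0, 0, 0, 0, 0, 0, 0x00fb);
/// Standard mDNS UDP port.
const MDNS_PORT: u16 = 5353;

/// Hypothetical helper: builds the mDNS destination for one interface.
/// `interface_index` is assumed to be the index of the network interface
/// that the packet should leave through.
fn mdns_destination(interface_index: u32) -> SocketAddrV6 {
    // Arguments: address, port, flowinfo (unused here, so 0), scope_id.
    // Leaving scope_id at 0 gives the stack no zone for the link-local
    // address, which is the situation described in the comment above.
    SocketAddrV6::new(MDNS_GROUP_V6, MDNS_PORT, 0, interface_index)
}
```

The resulting `SocketAddrV6` can then be passed to `UdpSocket::send_to` as the destination, so the network stack knows which link the scoped address refers to.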
According to the RFC (https://datatracker.ietf.org/doc/html/rfc2553#section-3.3), it is necessary to disambiguate link-local addresses with the interface index (in the scope_id field). Lacking this field, newer versions of lwip that support proper IPv6 scopes will yield EHOSTUNREACH (Host unreachable). Other implementations like on Linux and OS X will likely be affected by the lack of this field for more complex networking setups. Fixes project-chip#100
Run cargo fmt again. Run cargo clippy again. Revert "Run cargo clippy again" (this reverts commit e3bba1f).
Environment
Chip: ESP32-C3-MINI-1
Hardware: ESP32-C3-DevKitM-1
Platform: esp-idf (Rust std)
Problem
I likely have something misconfigured on my network that causes IPv6 broadcasts to yield a surprising "Host is unreachable" error. The more important issue, however, is that the way the master future is structured in my example (and in onoff_light) causes the entire Matter runtime to effectively shut down and not restart automatically.
An abridged version of the log shows the issue:
The last line in particular appears to be coming from the master future in the onoff light example: https://github.com/project-chip/rs-matter/blob/main/examples/onoff_light/src/main.rs#L165
This is "fixed" for me by just disabling IPv6, but I do think this highlights some bigger issues with the robustness of error handling inside the runtime. In particular, I'd expect the IPv4 and IPv6 behaviour to be separated into separate futures that can error out independently, so that one reaching a terminal state wouldn't negatively impact the other. Further, I think some measure of error handling policy is appropriate ("Host is unreachable" seems like it should probably be retryable, for example). I could take a crack at a patch, but I do worry, based on the current state of the code, that it might be a bit intrusive. Any guidance from the maintainers would be greatly appreciated before getting started!
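To sketch what such an error handling policy could look like, here is a small synchronous retry helper that uses only the standard library. Everything in it (`retry_with`, its parameters, the predicate) is hypothetical and not taken from rs-matter; the real runtime is async and would also want a delay or backoff between attempts, which this sketch omits.

```rust
use std::io;

/// Hypothetical helper: runs `op` until it succeeds, the error is judged
/// non-retryable, or `max_attempts` attempts have been made.
/// `op` always runs at least once, even if `max_attempts` is 0.
fn retry_with<T>(
    max_attempts: u32,
    mut op: impl FnMut() -> io::Result<T>,
    is_retryable: impl Fn(&io::Error) -> bool,
) -> io::Result<T> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry only while the policy allows it and attempts remain.
            Err(err) if attempt < max_attempts && is_retryable(&err) => attempt += 1,
            // Terminal error: hand it back to the caller.
            Err(err) => return Err(err),
        }
    }
}
```

The caller decides which errors count as transient through the `is_retryable` predicate, so a "Host is unreachable" failure on one socket could be classified as retryable without changing how other errors propagate.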
Thanks again for this awesome project, it's renewed my interest big time in IoT :)