systemd 246 loses IPv6 address in a flappy way #16719
Please provide debugging logs of networkd. You can generate debugging logs by creating the following drop-in config:
This log is from a case where IPv6 was not available at first, became available briefly, and then went away again. I also followed the log while IPv6 was working (pinging an IPv6 target on the internet); when the ping stopped working, these were the lines in the log:
Is the issue that it breaks when the RA is received?
69203fb does not consider the case that multiple routers exist, and causes systemd#16719. Fixes systemd#16719.
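To illustrate the failure mode this commit message describes, here is a toy Python model. It is not networkd's actual C code: the `Link` class, its method names, and the documentation-range addresses are all hypothetical. The only assumption is that configuration learned from one router's RA should survive an RA sent by a different router.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Toy model of a link that learns SLAAC addresses from Router Advertisements."""

    # Hypothetical bookkeeping: router link-local address -> addresses learned from it.
    by_router: dict = field(default_factory=dict)

    def on_ra_single_router(self, router, addresses):
        # Assumed buggy behaviour: the latest RA is treated as the whole truth,
        # so anything learned from another router is dropped.
        self.by_router = {router: set(addresses)}

    def on_ra_per_router(self, router, addresses):
        # Assumed fixed behaviour: only the sending router's entries are replaced.
        self.by_router[router] = set(addresses)

    def configured(self):
        # Union of everything currently learned, across all routers.
        return set().union(*self.by_router.values())


naive, fixed = Link(), Link()
for router, addrs in [("fe80::1", {"2001:db8:1::a"}), ("fe80::2", {"2001:db8:2::b"})]:
    naive.on_ra_single_router(router, addrs)
    fixed.on_ra_per_router(router, addrs)

print(sorted(naive.configured()))
print(sorted(fixed.configured()))
```

In the naive model the second router's RA wipes out the first router's address, while the per-router model keeps both.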
Thanks, I will try a custom PKGBUILD with that PR applied. Nice to see that it already seems to be addressed.
@yuwata I was running into this problem on multiple machines on my home network. I tested with your PR applied and all the machines are behaving well now. Thanks for your fast work -- hopefully it gets merged soon.
I can confirm that this fixes my issue as well.
Hey folks - I don't think this is a complete fix, as I still see strange issues with SLAAC and packet loss. With the patches from this PR applied, I no longer get the extended periods without configured v6 addresses; however, I do get short connectivity drops (1-2 seconds long, about once per minute), which I did not get on systemd 245. The systemd-networkd debug logs still contain what seem like unusual messages, indicating repeated attempts to reconfigure the interface. It looks like systemd-networkd also restarts from time to time. Debug logs attached, with some minor redactions of v6 addresses. This is the Debian backports source, patched with the four commits referenced above. (The address ending :63be is the primary SLAAC address for this system.)
For comparison, here is the systemd-networkd debug log for version 245 on the same system, which does not experience any connectivity drops. (It still seems to reconfigure the interface unusually frequently; I am not sure whether this is an artifact of my RA environment.)
Aside: I'm slightly puzzled by the description of the PR as being for "the case where multiple routers exist". I have only a single router in my network; it provides a fairly standard configuration of DHCPv4 for a /24 RFC1918 subnet, RA/SLAAC for a /64 v6 net, and DHCPv6 for dns-server and domain-search options only. (All of these are provided by the common 'dnsmasq' service.) I worry that this issue will be more widespread than is anticipated.
@colmbuckley From your systemd-networkd.debug.log,
Where the |
I think the 9d0f address is an RFC4941 privacy address. int0.network on this system has |
@yuwata I think the network drops I saw might be correlated with systemd-networkd restarts; note that the daemon exited/crashed and was restarted twice during that log capture. It's possible that a conflict between your patch and the Debian sources (although the patch applied cleanly) is causing the daemon to crash; what do you think? Unfortunately I don't have any non-Debian systems available to test on.
@colmbuckley I see that
What happens if you disable |
I did not restart it manually. I shall run it again shortly and see if I can reproduce this.
I will check now.
If |
Yes, I see regular crashes of systemd-networkd with a SIGSEGV. It seems to crash immediately after logging "NDISC: Invoking callback for 'router' event." (see attached logs). This happens even with
I don't want to upload a core file to a public forum, but I am happy to send it to you privately.
@colmbuckley Is the SIGSEGV caused by the patch in #16725? What happens without the patch?
Ugh, I found a typo in the patch. Will update.
Updated. @colmbuckley Could you test the new patch? Thank you for your cooperation!
... just realized that my systemd-networkd doesn't have debugging symbols, so the core will be of limited use. I'll build another version with debugging enabled, but I don't have a lot of time left today to work on this. Yes, the SEGV only occurs with that patch. Without it, I get the original behavior of the route and address being periodically dropped and re-added, but systemd-networkd does not crash.
Building now...
With the new patch, systemd-networkd does not crash and the addresses look stable; I am not seeing any packet loss. I do see the interface state changing frequently from "configured" to "configuring" and back (every time it gets an RA packet from the router, I think); is that expected? (see logs)
Great! Thank you for testing the patch!
It is expected. I do not see any reason not to set the "configuring" state on receiving a new packet. Or do you have any issue caused by such state changes? Of course, we can reduce that if the new packet does not change anything. But let's do that in a later PR.
That's fine by me, as long as any configuration which is unchanged by the new RA is maintained (rather than being dropped and then re-added). That seems to be the case from the logs, so hopefully we're good. I'll follow up with the Debian folk to pull this change once it's integrated.
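The requirement stated here (keep what the new RA leaves unchanged, rather than dropping and re-adding it) can be sketched as a set reconciliation. This is a hypothetical Python illustration, not code from the patch; the `reconcile` helper and the example prefixes are made up for the purpose.

```python
def reconcile(current, advertised):
    """Split configuration into entries to keep, add, and remove.

    Hypothetical helper: `current` is what is configured on the link,
    `advertised` is what the newly received RA implies.
    """
    keep = current & advertised      # untouched: never dropped and re-added
    add = advertised - current       # new entries only
    remove = current - advertised    # entries no longer advertised
    return keep, add, remove


# Made-up example addresses from the documentation prefix.
current = {"2001:db8::1/64", "2001:db8::2/64"}
advertised = {"2001:db8::1/64", "2001:db8::3/64"}

keep, add, remove = reconcile(current, advertised)
print(keep, add, remove)
```

Only the entries in `add` and `remove` would need to touch the kernel; an RA identical to the previous one yields empty `add` and `remove` sets.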
Yeah, the patch should be backported to the v246-stable branch after it is merged to the master branch.
I think my preference would be for the link to change to "configuring" only if the new RA introduces a change, and for it to remain stable if the new configuration is identical to the old one. But this seems to be mostly a cosmetic issue, so I will leave it up to you to decide whether to create a new PR for it.
I think that'd be nice to do.
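A minimal sketch of the behaviour proposed in this exchange, assuming the link simply compares each incoming RA against the last one it applied. This is illustrative Python, not networkd's implementation; `RAConfig`, `LinkState`, and their fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RAConfig:
    """Hypothetical summary of what one Router Advertisement configures."""
    prefixes: frozenset
    routes: frozenset
    dns: frozenset


class LinkState:
    def __init__(self):
        self.state = "configured"
        self.current: Optional[RAConfig] = None
        self.transitions = 0  # how many times we entered "configuring"

    def on_ra(self, new: RAConfig) -> bool:
        """Apply an RA; return True only if it changed the configuration."""
        if new == self.current:
            # Identical RA: stay in "configured", touch nothing.
            return False
        self.state = "configuring"
        self.transitions += 1
        self.current = new  # stand-in for actually applying addresses and routes
        self.state = "configured"
        return True


link = LinkState()
ra = RAConfig(frozenset({"2001:db8::/64"}), frozenset({"::/0"}), frozenset())
first = link.on_ra(ra)
second = link.on_ra(ra)  # the router repeats the same RA
print(first, second, link.transitions)
```

With this gating, periodic RAs that repeat the same content would not move the link out of "configured".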
69203fb does not consider the case that multiple routers exist, and causes systemd#16719. Fixes systemd#16719. (cherry picked from commit 5055072)
systemd version the issue has been seen with
Used distribution
Expected behaviour you didn't see
Unexpected behaviour you saw
Steps to reproduce the problem
I downgraded to 245 and the problem does not appear; when I upgrade to 246, the flappy IPv6 is back. My interface config is rather short:
And this is how it should look, and does with 245:
With 246 in the faulty state:
In the flappy case, only the IPv4 and link-local IPv6 addresses are available.
I looked through the logs via journalctl and dmesg, but there is nothing to see at the times when the flapping starts.
Since this is narrowed down to the difference between systemd 245 and 246, I guess a change between those releases is the root cause of that behavior.