Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Systemd 249 network issues after waking from sleep #20244

Closed
oscarpax opened this issue Jul 19, 2021 · 16 comments · Fixed by #20374
Closed

Systemd 249 network issues after waking from sleep #20244

oscarpax opened this issue Jul 19, 2021 · 16 comments · Fixed by #20374
Labels
bug 🐛 Programming errors, that need preferential fixing network regression ⚠️ A bug in something that used to work correctly and broke through some recent commit

Comments

@oscarpax
Copy link

systemd version the issue has been seen with
systemd 249 (249-3-arch)

Used distribution
Arch Linux

Linux kernel version used (uname -a)
5.12.15-arch1-1

CPU architecture issue was seen on
amd64

Expected behaviour you didn't see
Network connection to resume and stay constant with ipv6

Unexpected behaviour you saw
Frequent network connection changes with ipv6 (global) address disappearing and then reappearing. I can reproduce this problem identically with a desktop and a laptop. Rebooting fixes the problem until next sleep. Workaround is to disable systemd-networkd and systemd-resolved, and enabling NetworkManager after which network connection resumes normally even after waking from sleep.

Steps to reproduce the problem
Put the computer to sleep, and then wake it up. This problem only happens after sleep.

Additional program output to the terminal or log subsystem illustrating the issue

@yuwata yuwata added the network label Jul 19, 2021
@yuwata
Copy link
Member

yuwata commented Jul 19, 2021

Could you enable to generate debugging logs of networkd and provide it? The logs can be generated by creating the following drop-in config:

# /etc/systemd/system/systemd-networkd.service.d/override.conf
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug

@yuwata yuwata added bug 🐛 Programming errors, that need preferential fixing needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer labels Jul 19, 2021
@oscarpax
Copy link
Author

@bluca bluca removed the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Jul 20, 2021
@garw
Copy link

garw commented Jul 24, 2021

I see the same behavior with systemd 249.1-1 and Linux 5.13.4-arch1-1 also on Arch Linux.

The SLAAC global address is continuously being removed and added to the interface. For me it isn't 100% reproducible after every suspend to RAM cycle but it always occurs after some cycles.
Restarting systemd-networkd doesn't change the behavior. Also removing /run/systemd/netif doesn't change the behavior. However, after the next complete reboot, the problem is gone.

Another log to cross reference:
https://gist.github.com/garw/4542a4787d1df964ef32589d6a5cba70

@Tisalgelijk
Copy link

Can also confirm this behaviour with 249 on Arch Linux. SLAAC addresses alternate between being added and removed with every incoming RA. This only starts happening some time after booting. The resume from suspend others have reported is not directly related, just often a way to move forward in time a few hours.

This comment in issue #20050 references this behaviour: #20050 (comment)

I would guess commit 899034b apparently did not completely fix this issue, but rather delays it until a few hours after a reboot.

@yuwata
Copy link
Member

yuwata commented Aug 4, 2021

@oscarpax and @garw Thank you for providing the debugging logs. I hope #20374 fixes this issue. If possible, please test the PR. Thank you!

(And sorry for late reply. I took PTO for a while.)

bluca pushed a commit that referenced this issue Aug 4, 2021
@foxxx0
Copy link

foxxx0 commented Aug 4, 2021

just for reference: i've been using v249 + the proposed patch since this morning including two temporary s2ram + resumes and have not had any issues yet. looks promising!

@yuwata
Copy link
Member

yuwata commented Aug 4, 2021

@foxxx0 Thank you!

@foxxx0
Copy link

foxxx0 commented Aug 6, 2021

Unfortunately, after 2 days with longer s2ram periods (8-10h) the issue now presented itself even with the patch.

It seems that on one RA the SLAAC works and the interface gets configured.
But on the next RA it will just remove the addresses again, without even configuring new ones.
So until the next RA you are left without any IPv6 address configured.
On the next RA the SLAAC works again and the IPv6 addresses are added to the interface.

Did a quick restart of systemd-networkd.service to enable debug logging and that didn't resolve the issue so it is not related to long-running instances of networkd.

networkd debug log: https://gist.github.com/foxxx0/ff83f6130a9c2b4c7c073a2a52076319
(I have obfuscated my IPv6 prefix to 3000:aaaa:bbbb::)

@yuwata
Copy link
Member

yuwata commented Aug 6, 2021

@foxxx0 From the debugging log, the first and third RA messages seem to contain autonomous prefix, but the second and fourth do not. So, the IPv6 address is once configured, but removed later. See the following:

Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: NDISC: Invoking callback for 'router' event.
Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: Requesting route: dst: n/a, src: n/a, gw: fe80::1, prefsrc: n/a, scope: global, table: main(254), proto: ra, type: unicast, nexthop: 0, priority: 1024
Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: Requesting route: dst: 3000:aaaa:bbbb::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main(254), proto: ra, type: unicast, nexthop: 0, priority: 1024
Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: Requesting address: 3000:aaaa:bbbb:0:f22f:74ff:fe16:5ade/64 (valid for 12h, preferred for 5min), flags: manage-temporary-address,no-prefixroute
Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: Setting SLAAC addresses.
Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: Setting NDisc routes.
Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: link_check_ready(): dynamic addresses or routes are not configured.
Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: Configuring route: dst: n/a, src: n/a, gw: fe80::1, prefsrc: n/a, scope: global, table: main(254), proto: ra, type: unicast, nexthop: 0, priority: 1024
Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: Configuring route: dst: 3000:aaaa:bbbb::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main(254), proto: ra, type: unicast, nexthop: 0, priority: 1024
Aug 06 08:11:31 utgard.lan systemd-networkd[2119957]: br0: Configuring address: 3000:aaaa:bbbb:0:f22f:74ff:fe16:5ade/64 (valid for 12h, preferred for 5min), flags: tentative,manage-temporary-address,no-prefixroute
(snip)
Aug 06 08:12:24 utgard.lan systemd-networkd[2119957]: br0: NDISC: Invoking callback for 'router' event.
Aug 06 08:12:24 utgard.lan systemd-networkd[2119957]: br0: Requesting route: dst: n/a, src: n/a, gw: fe80::1, prefsrc: n/a, scope: global, table: main(254), proto: ra, type: unicast, nexthop: 0, priority: 1024
Aug 06 08:12:24 utgard.lan systemd-networkd[2119957]: br0: Requesting route: dst: 3000:aaaa:bbbb::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main(254), proto: ra, type: unicast, nexthop: 0, priority: 1024
Aug 06 08:12:24 utgard.lan systemd-networkd[2119957]: br0: Setting NDisc routes.
Aug 06 08:12:24 utgard.lan systemd-networkd[2119957]: br0: Configuring route: dst: n/a, src: n/a, gw: fe80::1, prefsrc: n/a, scope: global, table: main(254), proto: ra, type: unicast, nexthop: 0, priority: 1024
Aug 06 08:12:24 utgard.lan systemd-networkd[2119957]: br0: Configuring route: dst: 3000:aaaa:bbbb::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main(254), proto: ra, type: unicast, nexthop: 0, priority: 1024
Aug 06 08:12:24 utgard.lan systemd-networkd[2119957]: br0: NDisc routes set.
Aug 06 08:12:24 utgard.lan systemd-networkd[2119957]: br0: Removing old NDisc information obtained from fe80::1.
Aug 06 08:12:24 utgard.lan systemd-networkd[2119957]: br0: Removing address: 3000:aaaa:bbbb:0:f22f:74ff:fe16:5ade/64 (valid for 11h 59min 59s, preferred for 4min 59s), flags: manage-temporary-address,no-prefixroute

(As you can see, there is no Requesting address: 3000:aaaa:bbbb:0:f22f:74ff:fe16:5ade/64 log message during processing the second RA message.)

Could you check that the router always send information about the autonomous prefix? If so, could you also check that this behavior can be reproducible with earlier releases, e.g. plain v249 or v248?

@foxxx0
Copy link

foxxx0 commented Aug 6, 2021

With systemd v248 I've had no such issues even with power-on times for over two weeks with daily suspend periods.

From my ISP I have been assigned a static /48 prefix from which I can freely request /64 prefixes through DHCPv6-PD.
On my router there is a simple radvd instance running to announce the prefixes per-VLAN and that hasn't been causing any issues since it was setup. Works perfectly fine with Windows, Android, FreeBSD and other Linux clients.

radvd.conf excerpt for one of the interfaces:

interface intern0 {
  AdvSendAdvert on;
  AdvDefaultLifetime 9000;
  MinRtrAdvInterval 10;
  MaxRtrAdvInterval 120;

  RDNSS fe80::1 {
      AdvRDNSSLifetime 240;
  };

  DNSSL lan {
      AdvDNSSLLifetime 240;
  };

  AdvRASrcAddress {
      fe80::1;
  };

  prefix ::/64 {
    AdvOnLink on;
    AdvAutonomous on;
    AdvRouterAddr on;
    AdvValidLifetime 43200;
    AdvPreferredLifetime 300;
  };
};

output from radvdump on my machine with systemd v248 now running (shouldn't change with v249 though):

[Aug 06 11:08:13] radvdump (233090): br0 recvmsg len=96
[Aug 06 11:08:13] radvdump (233090): receiver if_index: 2
#
# radvd configuration generated by radvdump 2.19
# based on Router Advertisement from fe80::1
# received by interface br0
#

interface br0
{
	AdvSendAdvert on;
	# Note: {Min,Max}RtrAdvInterval cannot be obtained with radvdump
	AdvManagedFlag off;
	AdvOtherConfigFlag off;
	AdvReachableTime 0;
	AdvRetransTimer 0;
	AdvCurHopLimit 64;
	AdvDefaultLifetime 9000;
	AdvHomeAgentFlag off;
	AdvDefaultPreference medium;
	AdvSourceLLAddress on;

	prefix 3000:aaaa:bbbb::/64
	{
		AdvValidLifetime 43200;
		AdvPreferredLifetime 300;
		AdvOnLink on;
		AdvAutonomous on;
		AdvRouterAddr on;
	}; # End of prefix definition


	RDNSS fe80::1
	{
		AdvRDNSSLifetime 240;
	}; # End of RDNSS definition


	DNSSL lan
	{
		AdvDNSSLLifetime 240;
	}; # End of DNSSL definition

}; # End of interface definition

Please let me know if you need further information.

@foxxx0
Copy link

foxxx0 commented Aug 6, 2021

Just as an update, I have now dumped 20 consecutive RAs from my router and all of them contain the same information, including the

	prefix 3000:aaaa:bbbb::/64
	{
		AdvValidLifetime 43200;
		AdvPreferredLifetime 300;
		AdvOnLink on;
		AdvAutonomous on;
		AdvRouterAddr on;
	}; # End of prefix definition

block.

So the RAs should not be the issue here but rather what systemd does with them.

@yuwata yuwata reopened this Aug 6, 2021
@yuwata yuwata added the regression ⚠️ A bug in something that used to work correctly and broke through some recent commit label Aug 6, 2021
yuwata added a commit to yuwata/systemd that referenced this issue Aug 6, 2021
…n/update timestamp

Follow-up for 25db3ae and 899034b.

Fixes another issue in systemd#20244.
@yuwata
Copy link
Member

yuwata commented Aug 6, 2021

@foxxx0 Thanks. Could you also try to apply the patch in PR #20389?

yuwata added a commit to yuwata/systemd that referenced this issue Aug 6, 2021
…n/update timestamp

Follow-up for 25db3ae and 899034b.

Fixes another issue in systemd#20244.
@foxxx0
Copy link

foxxx0 commented Aug 6, 2021

compiled, installed and rebooted.

will continue to observe and report back in 2-3 days if it keeps working as intended. (or earlier if it fails before that)

@yuwata
Copy link
Member

yuwata commented Aug 6, 2021

Thank you!

codepeon pushed a commit to codepeon/systemd that referenced this issue Aug 7, 2021
…utes configured by NDisc

Fixes systemd#20244.

(cherry picked from commit 10e417b)
poettering pushed a commit that referenced this issue Aug 9, 2021
…n/update timestamp

Follow-up for 25db3ae and 899034b.

Fixes another issue in #20244.
@foxxx0
Copy link

foxxx0 commented Aug 9, 2021

Almost reached 3d uptime with multiple s2ram periods, some of them even for ~12 hours and haven't encountered this issue yet.
So I'm cautiously optimistic that the two commits have fixed this regression.

@yuwata
Copy link
Member

yuwata commented Aug 9, 2021

Thank you for testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Programming errors, that need preferential fixing network regression ⚠️ A bug in something that used to work correctly and broke through some recent commit
6 participants