
Kernel panics with use_after_free on 2765.2.4 #427

Closed
glitchcrab opened this issue Jul 15, 2021 · 5 comments
Labels: kind/bug (Something isn't working), kind/upstream-blocked


@glitchcrab

Description

We have observed a kernel panic in the networking stack on 5.10.37-flatcar:

[  801.711320] ------------[ cut here ]------------
[  801.712125] refcount_t: underflow; use-after-free.
[  801.712800] WARNING: CPU: 10 PID: 13791 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[  801.713957] Modules linked in: ip6table_filter nfnetlink_queue xt_NFQUEUE xt_set xt_multiport ipt_rpfilter iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 veth xt_recent xt_statistic xt_nat ipt_REJECT nf_reject_ipv4 ip6table_nat ip6_tables iptable_mangle xt_comment xt_mark xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat br_netfilter bridge stp llc scsi_transport_iscsi overlay kvm_intel mousedev psmouse kvm i2c_i801 i2c_smbus i2c_core evdev irqbypass button dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse configfs squashfs loop xfs sr_mod cdrom ahci libahci aesni_intel libata glue_helper libaes crypto_simd scsi_mod virtio_net cryptd virtio_blk net_failover failover qemu_fw_cfg btrfs blake2b_generic xor zstd_compress lzo_compress raid6_pq libcrc32c crc32c_generic crc32c_intel dm_mirror dm_region_hash dm_log
[  801.714073]  dm_mod
[  801.725980] CPU: 10 PID: 13791 Comm: worker-6 Not tainted 5.10.37-flatcar #1
[  801.726948] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  801.728439] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[  801.729129] Code: 05 cc 02 40 01 01 e8 00 49 38 00 0f 0b c3 80 3d ba 02 40 01 00 75 95 48 c7 c7 a8 7e 1c 87 c6 05 aa 02 40 01 01 e8 e1 48 38 00 <0f> 0b c3 80 3d 99 02 40 01 00 0f 85 72 ff ff ff 48 c7 c7 00 7f 1c
[  801.731636] RSP: 0018:ffffb15d466e3938 EFLAGS: 00010282
[  801.732353] RAX: 0000000000000000 RBX: ffff930f1457e200 RCX: 0000000000000000
[  801.733323] RDX: ffff932651d288a0 RSI: ffff932651d18b00 RDI: ffff932651d18b00
[  801.734379] RBP: ffff930f1457e200 R08: ffff932651d18b00 R09: ffffb15d466e3750
[  801.735328] R10: 0000000000000001 R11: 0000000000000001 R12: ffff930f1457e230
[  801.736278] R13: 0000000000000002 R14: ffff930cb58773c0 R15: ffff930c6d09c800
[  801.737226] FS:  00007f6680a00b38(0000) GS:ffff932651d00000(0000) knlGS:0000000000000000
[  801.738305] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  801.739063] CR2: 00007f6682c34848 CR3: 00000001edece005 CR4: 0000000000770ee0
[  801.740024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  801.740972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  801.741931] PKRU: 55555554
[  801.742304] Call Trace:
[  801.742668]  nf_queue_entry_release_refs+0x82/0xa0
[  801.743336]  nf_reinject+0x6f/0x1a0
[  801.743825]  0xffffffffc0c33980
[  801.744262]  nfnetlink_unicast+0x1f1/0x420 [nfnetlink]
[  801.744976]  ? sched_clock+0x5/0x10
[  801.745462]  ? sched_clock_cpu+0xc/0xa0
[  801.745979]  ? cred_has_capability+0x7f/0x120
[  801.746578]  ? nfnetlink_unicast+0xa0/0x420 [nfnetlink]
[  801.747296]  netlink_rcv_skb+0x50/0x100
[  801.747832]  nfnetlink_subsys_register+0x789/0x869 [nfnetlink]
[  801.748625]  netlink_unicast+0x191/0x230
[  801.749173]  netlink_sendmsg+0x243/0x480
[  801.749706]  sock_sendmsg+0x5e/0x60
[  801.750187]  ____sys_sendmsg+0x1f3/0x260
[  801.750725]  ? copy_msghdr_from_user+0x5c/0x90
[  801.751335]  ? _cond_resched+0x15/0x30
[  801.751845]  ___sys_sendmsg+0x81/0xc0
[  801.752372]  ? do_lock_file_wait+0x6e/0xe0
[  801.752926]  ? _cond_resched+0x15/0x30
[  801.753434]  ? fcntl_setlk+0x1a5/0x2d0
[  801.754057]  __sys_sendmsg+0x59/0xa0
[  801.754607]  do_syscall_64+0x33/0x40
[  801.755095]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  801.755798] RIP: 0033:0x7f6684207352
[  801.756310] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 8a d2 ff ff 41 54 b8 02 00 00 00 49 89 f4 be 00 88 08 00 55
[  801.758802] RSP: 002b:00007f66809feae8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  801.759812] RAX: ffffffffffffffda RBX: 00007f6680a00b38 RCX: 00007f6684207352
[  801.760786] RDX: 0000000000000000 RSI: 00007f66809feb38 RDI: 0000000000000072
[  801.761741] RBP: 0000000000000072 R08: 0000000000000000 R09: 0000000000000000
[  801.762707] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000002e
[  801.763661] R13: 00007f6682c7dfd0 R14: 00007f66809ff028 R15: 0000000000ac0000
[  801.764649] ---[ end trace 773c05729b11a531 ]---
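
For context on the call trace: the warning fires in nf_queue_entry_release_refs() while nf_reinject() handles a verdict that a userspace process sent over netlink (the sendmsg at the bottom of the trace), and the nfnetlink_queue/xt_NFQUEUE modules in the list above indicate packets are being queued to userspace for a verdict. Below is a minimal sketch of that kind of verdict loop using libnetfilter_queue; the queue number and the iptables rule in the comment are illustrative assumptions, and this is meant to show the code path involved, not to be a reproducer.

/* Sketch: a minimal libnetfilter_queue consumer. Packets matched by an
 * NFQUEUE rule (e.g. `iptables -I INPUT -j NFQUEUE --queue-num 0`,
 * illustrative) are queued to this process; each nfq_set_verdict() call
 * is a netlink sendmsg that the kernel handles via
 * nfnetlink_unicast() -> nf_reinject() -> nf_queue_entry_release_refs(),
 * i.e. the path in the trace above. */
#include <stdio.h>
#include <stdint.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/netfilter.h>                       /* NF_ACCEPT */
#include <libnetfilter_queue/libnetfilter_queue.h>

static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
    uint32_t id = ph ? ntohl(ph->packet_id) : 0;

    /* The verdict travels back over netlink; on the kernel side,
     * nf_reinject() drops the queue entry's refs while handling it. */
    return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
}

int main(void)
{
    struct nfq_handle *h = nfq_open();
    if (!h) { perror("nfq_open"); return 1; }

    struct nfq_q_handle *qh = nfq_create_queue(h, 0, &cb, NULL);
    if (!qh) { perror("nfq_create_queue"); return 1; }

    nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff);

    char buf[4096];
    int fd = nfq_fd(h);
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) >= 0)
        nfq_handle_packet(h, buf, (int)n);

    nfq_destroy_queue(qh);
    nfq_close(h);
    return 0;
}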

Impact

Impact is currently limited to large clusters which have high pod counts and high levels of pod churn. The impact is relatively low: in a cluster with 40 workers, we see the issue occurring on 5 of them.

Environment and steps to reproduce

  1. Flatcar runs inside a QEMU VM which itself runs on top of Fedora 33. Full QEMU commandline args are provided below.
  2. No specific task was run; it seems to be caused by pod churn in a large cluster.
  3. As above.
  4. Networking ends up in a semi-broken state after the above kernel panic.

Expected behavior

The kernel panic should not occur.

Additional information

The symptoms seem to differ slightly from when we saw this bug on AWS; in this case the networking stack seems to end up semi-broken instead of completely dead.

The issue appears to manifest on larger clusters (both clusters where we saw it have 40+ nodes) which have a large number of pods (both clusters have 1400+ pods). Additionally, the worst-affected cluster also has high pod churn: pods often have 1000+ restarts in 24 hours.

Observed symptoms

  • CPU usage of affected nodes spikes compared to normal levels.
  • Affected nodes appear to drop inbound connection attempts (SSH connections are immediately reset, for example).
  • Nodes still respond to pings.

QEMU args

Below is the full commandline which launches the machine; however, I don't think it will be that helpful, as we experienced what appears to be the same bug previously on AWS.

/usr/local/bin/qemu-system-x86_64 -name master-1 -nographic \
  -machine type=q35,accel=kvm -cpu host,pmu=off -smp 3 -m 8G -enable-kvm \
  -device virtio-net-pci,netdev=tap-rx7m2,mac=DE:AD:BE:95:E9:6F \
  -netdev tap,id=tap-rx7m2,ifname=tap-rx7m2,script=/etc/qemu-ifup,downscript=no \
  -fw_cfg name=opt/org.flatcar-linux/config,file=/usr/code/ignition/final.json \
  -drive if=none,file=/usr/code/rootfs/rootfs.img,format=raw,discard=on,id=rootfs \
  -device virtio-blk-pci,drive=rootfs,serial=rootfs \
  -drive if=none,file=/usr/code/rootfs/dockerfs.img,format=raw,discard=on,id=dockerfs \
  -device virtio-blk-pci,drive=dockerfs,serial=dockerfs \
  -drive if=none,file=/usr/code/rootfs/kubeletfs.img,format=raw,discard=on,id=kubeletfs \
  -device virtio-blk-pci,drive=kubeletfs,serial=kubeletfs \
  -fsdev local,security_model=none,id=fsdev1,path=/etc/kubernetes/data/etcd/ \
  -device virtio-9p-pci,id=fs1,fsdev=fsdev1,mount_tag=etcdshare \
  -device sga -device virtio-rng-pci -serial stdio \
  -monitor unix:/qemu-monitor,server,nowait \
  -kernel /usr/code/images/v2/2765.2.4/flatcar_production_pxe.vmlinuz \
  -initrd /usr/code/images/v2/2765.2.4/flatcar_production_pxe_image.cpio.gz \
  -append 'console=ttyS0 root=/dev/disk/by-id/virtio-rootfs rootflags=rw flatcar.first_boot=1'
@sayanchowdhury added the kind/bug label on Jul 15, 2021
@dongsupark (Member)

Thanks for the report.

AFAIK, the upstream stable v5.10 tree has no fix for this issue yet.
The issue looks quite similar to https://www.spinics.net/lists/netfilter-devel/msg66430.html. (Note: the original post seems to be gone from public mailing list archives.)

However, the mentioned fix and its related commits (1, 2) are already included since kernel ~v5.7 (credit to @alban).
So the story is not that simple.

My theory is that the remaining refcount issue had been hidden for some time and was recently uncovered by other changes.
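
To illustrate the kind of imbalance this theory describes, here is a minimal userspace sketch; it uses plain C atomics and hypothetical names, not the kernel's actual refcount_t implementation. One put without a matching get drives the count below zero, which is exactly the condition refcount_warn_saturate() reports as "underflow; use-after-free":

/* Hypothetical sketch of a refcount imbalance -- not kernel code. */
#include <stdatomic.h>
#include <stdio.h>

struct entry {
    atomic_int refcount;
};

static void entry_get(struct entry *e)
{
    atomic_fetch_add(&e->refcount, 1);
}

static void entry_put(struct entry *e)
{
    int old = atomic_fetch_sub(&e->refcount, 1);
    if (old == 1) {
        /* Last reference dropped: the kernel would free the object here.
         * The sketch keeps it allocated so the extra put below can be
         * detected without undefined behaviour. */
    } else if (old <= 0) {
        /* The condition refcount_warn_saturate() warns about: a put
         * without a matching get, meaning the object may already be
         * freed and this access is a potential use-after-free. */
        fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
    }
}

int main(void)
{
    struct entry e = { .refcount = 1 }; /* creator holds one reference */
    entry_get(&e);  /* a second holder takes a reference: 1 -> 2 */
    entry_put(&e);  /* balanced put: 2 -> 1 */
    entry_put(&e);  /* balanced put: 1 -> 0, object would be freed */
    entry_put(&e);  /* unbalanced put: 0 -> -1, underflow reported */
    return 0;
}
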
I would like to suggest reporting the issue to https://bugzilla.kernel.org or the kernel netdev mailing list.

@glitchcrab (Author)

Thanks for the time spent looking at this, @dongsupark. I've opened a bug report for reference: https://bugzilla.kernel.org/show_bug.cgi?id=213783

@tormath1 (Contributor) commented Jul 8, 2022

Hi @glitchcrab, did you get a chance to reproduce the issue with the latest Flatcar releases? They ship kernel 5.15; it would be interesting to see if you still have these panics.

@glitchcrab (Author)

@tormath1 I'm afraid not; we were experiencing the issues on a platform of ours which is now EOL, so we aren't investing any more time into upgrade work.

@pothos (Member) commented May 23, 2023

Closing now; please tell us if this is still an issue on the latest Stable.
