
Stable 3227.2.2 randomly causes processes to hang on I/O related operations #847

Closed
Nuckal777 opened this issue Sep 7, 2022 · 59 comments · Fixed by flatcar-archive/coreos-overlay#2315
Labels
area/kernel (Issues related to kernel) · kind/bug (Something isn't working)

Comments

@Nuckal777

Nuckal777 commented Sep 7, 2022

Description

We've seen multiple nodes (across different regions and environments) stall completely (unresponsive kubelet and containerd, failing journald and timesyncd units, no login possible, ...) on the 3227.2.2 release. This happens mostly on the reboot after the update, but we have also seen it occur at random.

Impact

This causes Kubernetes nodes to become NotReady before they can be drained, which means volumes cannot be moved off the node and leads to service interruptions.

Environment and steps to reproduce

  1. Set-up: Flatcar Stable 3227.2.2, OpenStack, ESXi hypervisors
  2. Task: on node start, but also during normal operation
  3. Action(s): no clue if anything specific causes this
  4. Error:
  • we get "task blocked for more than 120 seconds" errors and related call stacks, see the screenshot
  • Screenshot 2022-09-07 at 14 49 03
  • CPU usage is very high when the issue occurs
  • the journal gets corrupted when this issue occurs
  • rebooting once more brings the node back
  • once the node is up again, partitions and filesystems appear healthy

Expected behavior

The nodes do not stall completely.

Additional information

We suspect we may be hitting a kernel bug, as we only see this on the 3227.2.2 release, where basically only the kernel was changed. Do you have any ideas how we can diagnose this further? Thanks.

cc @databus23 @defo89 @jknipper

@Nuckal777 added the kind/bug label on Sep 7, 2022
@pothos
Member

pothos commented Sep 7, 2022

Thanks for reporting. In https://www.flatcar.org/releases#release-3227.2.2 we have links to the kernel changelogs for each bugfix release since the last Flatcar Stable release. We need to find suspicious changes and check whether the problem goes away when they are reverted.
Do you have an easy reproducer with https://stable.release.flatcar-linux.net/amd64-usr/3227.2.2/flatcar_production_qemu.sh and the qemu image?
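
For reference, a local repro attempt could look roughly like this (a sketch following the Flatcar QEMU docs; exact file names and flags may differ per release):

wget https://stable.release.flatcar-linux.net/amd64-usr/3227.2.2/flatcar_production_qemu.sh
wget https://stable.release.flatcar-linux.net/amd64-usr/3227.2.2/flatcar_production_qemu_image.img
chmod +x flatcar_production_qemu.sh
# boot with your SSH key injected and a serial console in the terminal
./flatcar_production_qemu.sh -a ~/.ssh/authorized_keys -- -nographic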

@Nuckal777
Author

Could not reproduce it using QEMU so far. I tried plenty of reboots and let the VM run for a while using one of our Ignition templates. Starting with 3227.2.1 and updating did not trigger it either. :(

@nealey

nealey commented Sep 7, 2022

We think we're having this same problem. It seems to manifest most often as sshd hanging. We'll keep a test server running things and hopefully get some idea about what's going on, but probably can't help with a reproduction case if QEMU isn't working.

@databus23

databus23 commented Sep 9, 2022

We are still seeing lock-ups of nodes with 3227.2.2 on a daily basis in our fleet, but we still fail to reproduce the error somewhat consistently. Affected nodes hang; login via SSH or console just hangs as well. We always see stack traces of "hung task timeouts" that are I/O- or filesystem-related.

@nealey

nealey commented Sep 9, 2022

The main node we see hangs on has several cephfs mounts. This kernel module has been problematic in the past: does anyone else with this problem use cephfs?

@Nuckal777
Author

We experienced the issue with ext4 on VMware disks.

@schmitch

schmitch commented Sep 12, 2022

My company sees these issues as well under VMware with 3227.2.2; we will be ready to provide more info if necessary.
Btw. we also have a bare-metal cluster that does not have this problem, so it's probably related to VMware.

@procinger

The servers in our company have the same symptoms after an update to 3227.2.2.

We can reproduce it as soon as we generate a lot of IO on the hard disk.

Unfortunately I don't see any errors from systemd or in the kernel ring buffer, only
File /var/log/journal/<some-id>/system.journal corrupted or uncleanly shut down, renaming and replacing.
Shortly after that the system stops responding.

As a temporary solution we did a rollback to 3227.2.1 for now.

@Nuckal777
Author

We can reproduce it as soon as we generate a lot of IO on the hard disk.

Anything specific you executed for that? I shuffled around some 100 GB using dd and could not trigger the issue.
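
Roughly what I ran (a sketch; sizes and target paths were ad hoc):

# write ~100 GB in 1 MiB blocks, bypassing the page cache, then sync and clean up
dd if=/dev/zero of=/var/tmp/ddtest bs=1M count=102400 oflag=direct status=progress
sync
rm /var/tmp/ddtest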

@sfudeus
Contributor

sfudeus commented Sep 15, 2022

I could reproduce hung task errors on bare metal after a while by running stress -i 10 -d 5 within a pod's emptyDir, backed by the host's SSD RAID 1. I didn't crosscheck against the previous version so far.

No further negative consequences, especially after shutting down stress. No more errors AFAICS and load is back to usual.
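
For reference, the same load can be approximated directly on the host (a sketch; emptyDir volumes end up on the node's root filesystem under /var/lib/kubelet anyway):

# -i: workers spinning on sync(), -d: workers writing and unlinking temp files
mkdir -p /var/tmp/stress-test && cd /var/tmp/stress-test
stress -i 10 -d 5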

[291596.784816] INFO: task stress:3125869 blocked for more than 123 seconds.
[291596.792511]       Not tainted 5.15.63-flatcar #1
[291596.797871] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[291596.806839] task:stress          state:D stack:    0 pid:3125869 ppid:3125856 flags:0x00004000
[291596.816707] Call Trace:
[291596.819630]  <TASK>
[291596.822164]  __schedule+0x2eb/0x8d0
[291596.826258]  schedule+0x5b/0xd0
[291596.829957]  rwsem_down_write_slowpath+0x220/0x4f0
[291596.835508]  ? wb_wait_for_completion+0x25/0x80
[291596.840776]  sync_inodes_sb+0xba/0x2a0
[291596.845159]  ? __x64_sys_tee+0xd0/0xd0
[291596.849542]  iterate_supers+0x9f/0x100
[291596.853926]  ksys_sync+0x40/0xa0
[291596.857729]  __do_sys_sync+0xa/0x20
[291596.861818]  do_syscall_64+0x3b/0x90
[291596.866008]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[291596.871853] RIP: 0033:0x7f9de9d77b17
[291596.876042] RSP: 002b:00007ffe1b4ffc48 EFLAGS: 00000217 ORIG_RAX: 00000000000000a2
[291596.884722] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f9de9d77b17
[291596.892903] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[291596.901101] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f9de9c7f740
[291596.909292] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000002328
[291596.917488] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
[291596.925684]  </TASK>

@procinger

We tried to reproduce the case again today in our test cluster. We installed one node with flatcar version 3227.2.1 and the other with 3227.2.2.

Both nodes have the following hardware

  • 128 GB DDR4 ECC
  • 2 x 1.92 TB NVME
  • AMD Ryzen 9 3900 (24 cores)
  • K8S v1.24.5

We have used dd, stress as well as stress-ng, unsuccessfully! We had a high load, but the 3227.2.2 nodes did not crash.

Then we crafted the following K8s resource with a small Golang tool, and this actually managed to crash the 3227.2.2 node after about 5 minutes. SSH and other services were no longer accessible.

On the 3227.2.1 node the script ran until the filesystem ran out of inodes, but the node was still reachable.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: io-test
  labels: 
    app: io-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: io-test
  template:
    metadata:
      labels:
        app: io-test
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - <select-a-node>
      containers:
        - name: io-test
          image: golang:1.19.1
          command:
          - bash
          - -c
          - |
            apt update && apt install time -y
            echo '
              package main

              import (
                "fmt"
                "log"
                "os"
              )

              func main() {
                suffix := ""
                if len(os.Args) > 1 {
                  suffix = os.Args[1]
                }

                err := run(suffix)
                if err != nil {
                  log.Fatal(err)
                }
              }

              func run(suffix string) error {
                for i := 0; i < 100_000; i++ {
                  if i%10_000 == 0 {
                    log.Printf("file #%d", i)
                  }

                  path := fmt.Sprintf("file_%d_%s", i, suffix)
                  content := path
                  err := os.WriteFile(path, []byte(content), 0644)
                  if err != nil {
                    return err
                  }
                }

                return nil
              }
            ' >> /test/main.go
            cd /test
            go build main.go

            while true
            do
              time ./main $(date +%s)
            done
          volumeMounts:
          - name: io-test
            mountPath: "/test"
      restartPolicy: Always
      volumes:
        - name: io-test
          emptyDir: {}
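
To try this yourself, roughly (a sketch; the manifest file name and node login are just examples):

kubectl apply -f io-test.yaml
# follow the kernel ring buffer on the selected node while the writer loops
ssh core@<select-a-node> dmesg --follow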

When the 3227.2.2 node crashed, the kubelet process was marked as a zombie and in the kernel ring buffer we saw the following errors

[2091.827035] INFO: task jbd2/nvme0n1p9-:755 blocked for more than 122 seconds.
[ 2091.834299]       Not tainted 5.15.63-flatcar #1
[ 2091.839081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2091.847099] task:jbd2/nvme0n1p9- state:D stack:    0 pid:  755 ppid:     2 flags:0x00004000
[ 2091.855634] Call Trace:
[ 2091.858241]  <TASK>
[ 2091.860505]  __schedule+0x2eb/0x8d0
[ 2091.864153]  schedule+0x5b/0xd0
[ 2091.867456]  jbd2_journal_commit_transaction+0x301/0x2850 [jbd2]
[ 2091.873628]  ? sugov_update_single_freq+0xb9/0x190
[ 2091.878582]  ? sugov_exit+0xa0/0xa0
[ 2091.882229]  ? wait_woken+0x70/0x70
[ 2091.885878]  ? lock_timer_base+0x61/0x80
[ 2091.889962]  jbd2_journal_check_available_features+0x1ab/0x3f0 [jbd2]
[ 2091.896565]  ? wait_woken+0x70/0x70
[ 2091.900214]  ? jbd2_journal_check_available_features+0x100/0x3f0 [jbd2]
[ 2091.906982]  kthread+0x127/0x150
[ 2091.910379]  ? set_kthread_struct+0x50/0x50
[ 2091.914720]  ret_from_fork+0x22/0x30
[ 2091.918455]  </TASK>
[ 2091.920803] INFO: task systemd-journal:988 blocked for more than 122 seconds.
[ 2091.928104]       Not tainted 5.15.63-flatcar #1
[ 2091.932876] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2091.940891] task:systemd-journal state:D stack:    0 pid:  988 ppid:     1 flags:0x00000224
[ 2091.949428] Call Trace:
[ 2091.952038]  <TASK>
[ 2091.954301]  __schedule+0x2eb/0x8d0
[ 2091.957949]  schedule+0x5b/0xd0
[ 2091.961260]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[ 2091.966815]  ? wait_woken+0x70/0x70
[ 2091.970461]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[ 2091.976104]  ? call_rcu+0xa2/0x330
[ 2091.979667]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[ 2091.985311]  ? step_into+0x47c/0x7b0
[ 2091.989052]  ? __cond_resched+0x16/0x50
[ 2091.993047]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 2091.997998]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 2092.003302]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 2092.007902]  __mark_inode_dirty+0x147/0x320
[ 2092.012255]  touch_atime+0x13c/0x150
[ 2092.015991]  filemap_read+0x308/0x320
[ 2092.019810]  ? path_lookupat+0xa2/0x1c0
[ 2092.023815]  new_sync_read+0x119/0x1b0
[ 2092.027735]  ? 0xffffffffb6000000
[ 2092.031207]  vfs_read+0xf6/0x190
[ 2092.034596]  __x64_sys_pread64+0x91/0xc0
[ 2092.038678]  do_syscall_64+0x3b/0x90
[ 2092.042413]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2092.047622] RIP: 0033:0x7f7b5d090b17
[ 2092.051356] RSP: 002b:00007ffc6dedc4c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 2092.059113] RAX: ffffffffffffffda RBX: 00007ffc6dedc570 RCX: 00007f7b5d090b17
[ 2092.066401] RDX: 0000000000000040 RSI: 00007ffc6dedc4f0 RDI: 0000000000000018
[ 2092.073701] RBP: 0000000000f25b78 R08: 0000000000000000 R09: 00007ffc6dedc700
[ 2092.080998] R10: 0000000000f25b78 R11: 0000000000000293 R12: 000055d55b175310
[ 2092.088284] R13: 0000000000000000 R14: 00007ffc6dedc4f0 R15: 000055d55b175310
[ 2092.095576]  </TASK>
[ 2092.097944] INFO: task systemd-timesyn:1140 blocked for more than 123 seconds.
[ 2092.105360]       Not tainted 5.15.63-flatcar #1
[ 2092.110132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2092.118139] task:systemd-timesyn state:D stack:    0 pid: 1140 ppid:     1 flags:0x00000220
[ 2092.126678] Call Trace:
[ 2092.129287]  <TASK>
[ 2092.131546]  __schedule+0x2eb/0x8d0
[ 2092.135199]  schedule+0x5b/0xd0
[ 2092.138500]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[ 2092.144063]  ? wait_woken+0x70/0x70
[ 2092.147721]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[ 2092.153361]  ? __fget_files+0x79/0xb0
[ 2092.157193]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[ 2092.162846]  ? nd_jump_link+0x4d/0xc0
[ 2092.166681]  ? __cond_resched+0x16/0x50
[ 2092.170677]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 2092.175627]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 2092.180921]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 2092.185533]  __mark_inode_dirty+0x147/0x320
[ 2092.189877]  ext4_setattr+0x1fb/0x9a0 [ext4]
[ 2092.194311]  notify_change+0x3c4/0x540
[ 2092.198221]  ? vfs_utimes+0x139/0x220
[ 2092.202043]  vfs_utimes+0x139/0x220
[ 2092.205692]  do_utimes+0xb4/0x120
[ 2092.209165]  __x64_sys_utimensat+0x70/0xb0
[ 2092.213431]  ? syscall_trace_enter.constprop.0+0x143/0x1c0
[ 2092.219073]  do_syscall_64+0x3b/0x90
[ 2092.222815]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2092.228036] RIP: 0033:0x7f26ccdc801f
[ 2092.231770] RSP: 002b:00007ffc7da41a98 EFLAGS: 00000206 ORIG_RAX: 0000000000000118
[ 2092.239533] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f26ccdc801f
[ 2092.246824] RDX: 0000000000000000 RSI: 00007ffc7da41aa0 RDI: 00000000ffffff9c
[ 2092.254130] RBP: 00007ffc7da41aa0 R08: 0000000000000000 R09: 00007ffc7da41930
[ 2092.261427] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
[ 2092.268715] R13: 00000000ffffffff R14: ffffffffffffffff R15: 00000000ffffffff
[ 2092.276013]  </TASK>
[ 2092.280018] INFO: task containerd:1257 blocked for more than 123 seconds.
[ 2092.286959]       Not tainted 5.15.63-flatcar #1
[ 2092.291735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2092.299750] task:containerd      state:D stack:    0 pid: 1257 ppid:     1 flags:0x00000000
[ 2092.308287] Call Trace:
[ 2092.310903]  <TASK>
[ 2092.313175]  __schedule+0x2eb/0x8d0
[ 2092.316821]  ? newidle_balance+0x12a/0x400
[ 2092.321086]  schedule+0x5b/0xd0
[ 2092.324399]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[ 2092.329962]  ? wait_woken+0x70/0x70
[ 2092.333617]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[ 2092.339260]  ? select_task_rq_fair+0x130/0xf90
[ 2092.343871]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[ 2092.349531]  ? ttwu_queue_wakelist+0xf2/0x110
[ 2092.354056]  ? __cond_resched+0x16/0x50
[ 2092.358056]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 2092.363007]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 2092.368311]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 2092.372920]  __mark_inode_dirty+0x147/0x320
[ 2092.377274]  generic_update_time+0x6c/0xd0
[ 2092.381527]  file_update_time+0x127/0x140
[ 2092.385704]  ? generic_write_checks+0x61/0xc0
[ 2092.390229]  ext4_llseek+0x3fa/0x1cb0 [ext4]
[ 2092.394666]  new_sync_write+0x11c/0x1b0
[ 2092.398660]  ? intel_get_event_constraints+0x300/0x390
[ 2092.403964]  vfs_write+0x1de/0x270
[ 2092.407536]  ksys_write+0x5f/0xe0
[ 2092.411010]  do_syscall_64+0x3b/0x90
[ 2092.414746]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2092.419954] RIP: 0033:0x55d3b77a2edb
[ 2092.423698] RSP: 002b:000000c0005228d0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 2092.431454] RAX: ffffffffffffffda RBX: 000000c000054000 RCX: 000055d3b77a2edb
[ 2092.438751] RDX: 0000000000000117 RSI: 000000c002fe3500 RDI: 0000000000000255
[ 2092.446053] RBP: 000000c000522920 R08: 0000000000000201 R09: 000000c003776120
[ 2092.453346] R10: 0000000000000117 R11: 0000000000000202 R12: 000000c0005229b0
[ 2092.460644] R13: 0000000000000000 R14: 000000c002c9cea0 R15: 0000000000000032
[ 2092.467945]  </TASK>
[ 2092.470307] INFO: task containerd:4089 blocked for more than 123 seconds.
[ 2092.477260]       Not tainted 5.15.63-flatcar #1
[ 2092.482042] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2092.490059] task:containerd      state:D stack:    0 pid: 4089 ppid:     1 flags:0x00000000
[ 2092.498597] Call Trace:
[ 2092.501212]  <TASK>
[ 2092.503475]  __schedule+0x2eb/0x8d0
[ 2092.507131]  schedule+0x5b/0xd0
[ 2092.510437]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[ 2092.515997]  ? wait_woken+0x70/0x70
[ 2092.519647]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[ 2092.525289]  ? select_task_rq_fair+0x130/0xf90
[ 2092.529900]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[ 2092.535550]  ? ttwu_queue_wakelist+0xf2/0x110
[ 2092.540073]  ? __cond_resched+0x16/0x50
[ 2092.544068]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 2092.549019]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 2092.554321]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 2092.558932]  __mark_inode_dirty+0x147/0x320
[ 2092.563282]  generic_update_time+0x6c/0xd0
[ 2092.567537]  file_update_time+0x127/0x140
[ 2092.571708]  ? generic_write_checks+0x61/0xc0
[ 2092.576233]  ext4_llseek+0x3fa/0x1cb0 [ext4]
[ 2092.580668]  new_sync_write+0x11c/0x1b0
[ 2092.584672]  ? intel_get_event_constraints+0x300/0x390
[ 2092.589977]  vfs_write+0x1de/0x270
[ 2092.593539]  ksys_write+0x5f/0xe0
[ 2092.597015]  do_syscall_64+0x3b/0x90
[ 2092.600747]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2092.605957] RIP: 0033:0x55d3b77a2edb
[ 2092.609701] RSP: 002b:000000c0025e18d0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 2092.617456] RAX: ffffffffffffffda RBX: 000000c00004a000 RCX: 000055d3b77a2edb
[ 2092.624747] RDX: 00000000000000ac RSI: 000000c000b760c0 RDI: 00000000000000d2
[ 2092.632044] RBP: 000000c0025e1920 R08: 0000000000000001 R09: 000000c001ca3c80
[ 2092.639343] R10: 00000000000000ac R11: 0000000000000202 R12: 000000c0025e19b0
[ 2092.646639] R13: 0000000000000000 R14: 000000c001504d00 R15: 0000000000000035
[ 2092.653938]  </TASK>
[ 2092.656293] INFO: task containerd:6560 blocked for more than 123 seconds.
[ 2092.663246]       Not tainted 5.15.63-flatcar #1
[ 2092.668027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2092.676044] task:containerd      state:D stack:    0 pid: 6560 ppid:     1 flags:0x00000000
[ 2092.684581] Call Trace:
[ 2092.687188]  <TASK>
[ 2092.689450]  __schedule+0x2eb/0x8d0
[ 2092.693100]  ? newidle_balance+0x12a/0x400
[ 2092.697354]  schedule+0x5b/0xd0
[ 2092.700659]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[ 2092.706224]  ? wait_woken+0x70/0x70
[ 2092.709870]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[ 2092.715521]  ? select_task_rq_fair+0x130/0xf90
[ 2092.720124]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[ 2092.725773]  ? ttwu_queue_wakelist+0xf2/0x110
[ 2092.730288]  ? __cond_resched+0x16/0x50
[ 2092.734284]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 2092.739231]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 2092.744539]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 2092.749148]  __mark_inode_dirty+0x147/0x320
[ 2092.753500]  generic_update_time+0x6c/0xd0
[ 2092.757754]  file_update_time+0x127/0x140
[ 2092.761931]  ? generic_write_checks+0x61/0xc0
[ 2092.766456]  ext4_llseek+0x3fa/0x1cb0 [ext4]
[ 2092.770895]  new_sync_write+0x11c/0x1b0
[ 2092.774897]  ? intel_get_event_constraints+0x300/0x390
[ 2092.780199]  vfs_write+0x1de/0x270
[ 2092.783763]  ksys_write+0x5f/0xe0
[ 2092.787245]  do_syscall_64+0x3b/0x90
[ 2092.790981]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2092.796200] RIP: 0033:0x55d3b77a2edb
[ 2092.799942] RSP: 002b:000000c00237a8d0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 2092.807698] RAX: ffffffffffffffda RBX: 000000c00004f000 RCX: 000055d3b77a2edb
[ 2092.814989] RDX: 0000000000000097 RSI: 000000c001e1c540 RDI: 0000000000000227
[ 2092.822287] RBP: 000000c00237a920 R08: 0000000000000201 R09: 000000c0019f21e0
[ 2092.829581] R10: 0000000000000097 R11: 0000000000000202 R12: 000000c00237a9b0
[ 2092.836881] R13: 0000000000000000 R14: 000000c001cc8d00 R15: 0000000000000034
[ 2092.844177]  </TASK>
[ 2092.846531] INFO: task containerd:7559 blocked for more than 123 seconds.
[ 2092.853470]       Not tainted 5.15.63-flatcar #1
[ 2092.858243] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2092.866258] task:containerd      state:D stack:    0 pid: 7559 ppid:     1 flags:0x00000000
[ 2092.874794] Call Trace:
[ 2092.877406]  <TASK>
[ 2092.879666]  __schedule+0x2eb/0x8d0
[ 2092.883314]  ? newidle_balance+0x12a/0x400
[ 2092.887569]  schedule+0x5b/0xd0
[ 2092.890873]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[ 2092.896436]  ? wait_woken+0x70/0x70
[ 2092.900085]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[ 2092.905736]  ? select_task_rq_fair+0x130/0xf90
[ 2092.910347]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[ 2092.915997]  ? ttwu_queue_wakelist+0xf2/0x110
[ 2092.920521]  ? __cond_resched+0x16/0x50
[ 2092.924524]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 2092.929473]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 2092.934777]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 2092.939386]  __mark_inode_dirty+0x147/0x320
[ 2092.943741]  generic_update_time+0x6c/0xd0
[ 2092.947995]  file_update_time+0x127/0x140
[ 2092.952170]  ? generic_write_checks+0x61/0xc0
[ 2092.956693]  ext4_llseek+0x3fa/0x1cb0 [ext4]
[ 2092.961124]  new_sync_write+0x11c/0x1b0
[ 2092.965129]  ? intel_get_event_constraints+0x300/0x390
[ 2092.970424]  vfs_write+0x1de/0x270
[ 2092.973984]  ksys_write+0x5f/0xe0
[ 2092.977459]  do_syscall_64+0x3b/0x90
[ 2092.981195]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2092.986412] RIP: 0033:0x55d3b77a2edb
[ 2092.990148] RSP: 002b:000000c00285c8d0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 2092.997903] RAX: ffffffffffffffda RBX: 000000c000074800 RCX: 000055d3b77a2edb
[ 2093.005203] RDX: 0000000000000088 RSI: 000000c000a13500 RDI: 0000000000000236
[ 2093.012502] RBP: 000000c00285c920 R08: 0000000000000201 R09: 000000c002024f00
[ 2093.019796] R10: 00007ffc79f7e080 R11: 0000000000000202 R12: 000000c00285c810
[ 2093.027095] R13: 0000000000000000 R14: 000000c001d89520 R15: 000000000000002f
[ 2093.034390]  </TASK>
[ 2093.036749] INFO: task containerd:7563 blocked for more than 124 seconds.
[ 2093.043690]       Not tainted 5.15.63-flatcar #1
[ 2093.048463] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2093.056472] task:containerd      state:D stack:    0 pid: 7563 ppid:     1 flags:0x00000000
[ 2093.065011] Call Trace:
[ 2093.067619]  <TASK>
[ 2093.069881]  __schedule+0x2eb/0x8d0
[ 2093.073537]  ? newidle_balance+0x12a/0x400
[ 2093.077793]  schedule+0x5b/0xd0
[ 2093.081095]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[ 2093.086651]  ? wait_woken+0x70/0x70
[ 2093.090308]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[ 2093.095959]  ? select_task_rq_fair+0x130/0xf90
[ 2093.100569]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[ 2093.106209]  ? ttwu_queue_wakelist+0xf2/0x110
[ 2093.112364]  ? __cond_resched+0x16/0x50
[ 2093.116360]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 2093.121309]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 2093.126613]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 2093.131214]  __mark_inode_dirty+0x147/0x320
[ 2093.135557]  generic_update_time+0x6c/0xd0
[ 2093.139810]  file_update_time+0x127/0x140
[ 2093.143979]  ? generic_write_checks+0x61/0xc0
[ 2093.148494]  ext4_llseek+0x3fa/0x1cb0 [ext4]
[ 2093.152923]  new_sync_write+0x11c/0x1b0
[ 2093.156920]  ? intel_get_event_constraints+0x300/0x390
[ 2093.162222]  vfs_write+0x1de/0x270
[ 2093.165785]  ksys_write+0x5f/0xe0
[ 2093.169267]  do_syscall_64+0x3b/0x90
[ 2093.173004]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2093.178213] RIP: 0033:0x55d3b77a2edb
[ 2093.181947] RSP: 002b:000000c0027fa8d0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 2093.189704] RAX: ffffffffffffffda RBX: 000000c000074800 RCX: 000055d3b77a2edb
[ 2093.196997] RDX: 00000000000001f5 RSI: 000000c0001e22c0 RDI: 000000000000013b
[ 2093.204289] RBP: 000000c0027fa920 R08: 0000000000000101 R09: 000000c003288360
[ 2093.211592] R10: 00000000000001f5 R11: 0000000000000202 R12: 000000c0027fa9b0
[ 2093.218875] R13: 0000000000000000 R14: 000000c00117c820 R15: 000000000000002f
[ 2093.226176]  </TASK>
[ 2093.228524] INFO: task containerd:9227 blocked for more than 124 seconds.
[ 2093.235475]       Not tainted 5.15.63-flatcar #1
[ 2093.240257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2093.248273] task:containerd      state:D stack:    0 pid: 9227 ppid:     1 flags:0x00000000
[ 2093.256810] Call Trace:
[ 2093.259420]  <TASK>
[ 2093.261682]  __schedule+0x2eb/0x8d0
[ 2093.265330]  ? newidle_balance+0x12a/0x400
[ 2093.269586]  schedule+0x5b/0xd0
[ 2093.272886]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[ 2093.278451]  ? wait_woken+0x70/0x70
[ 2093.282099]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[ 2093.287740]  ? cpumask_next+0x1e/0x30
[ 2093.291562]  ? select_task_rq_fair+0x130/0xf90
[ 2093.296165]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[ 2093.301809]  ? ttwu_queue_wakelist+0xf2/0x110
[ 2093.306331]  ? __cond_resched+0x16/0x50
[ 2093.310327]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 2093.315276]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 2093.320579]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 2093.325192]  __mark_inode_dirty+0x147/0x320
[ 2093.329530]  generic_update_time+0x6c/0xd0
[ 2093.333789]  file_update_time+0x127/0x140
[ 2093.337965]  ? generic_write_checks+0x61/0xc0
[ 2093.342488]  ext4_llseek+0x3fa/0x1cb0 [ext4]
[ 2093.346917]  new_sync_write+0x11c/0x1b0
[ 2093.350921]  ? intel_get_event_constraints+0x300/0x390
[ 2093.356224]  vfs_write+0x1de/0x270
[ 2093.359788]  ksys_write+0x5f/0xe0
[ 2093.363270]  do_syscall_64+0x3b/0x90
[ 2093.367007]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2093.372215] RIP: 0033:0x55d3b77a2edb
[ 2093.375949] RSP: 002b:000000c000fa88d0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 2093.383696] RAX: ffffffffffffffda RBX: 000000c00004a000 RCX: 000055d3b77a2edb
[ 2093.390987] RDX: 0000000000000153 RSI: 000000c000e96000 RDI: 000000000000003f
[ 2093.398276] RBP: 000000c000fa8920 R08: 0000000000000001 R09: 000000c00084d3e0
[ 2093.405571] R10: 0000000000000153 R11: 0000000000000202 R12: 000000c000fa89b0
[ 2093.412860] R13: 0000000000000000 R14: 000000c001a471e0 R15: 0000000000000031
[ 2093.420160]  </TASK>
[ 2093.422514] INFO: task containerd:9229 blocked for more than 124 seconds.
[ 2093.429459]       Not tainted 5.15.63-flatcar #1
[ 2093.434232] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2093.442248] task:containerd      state:D stack:    0 pid: 9229 ppid:     1 flags:0x00000000
[ 2093.450785] Call Trace:
[ 2093.453406]  <TASK>
[ 2093.455664]  __schedule+0x2eb/0x8d0
[ 2093.459315]  schedule+0x5b/0xd0
[ 2093.462617]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[ 2093.468171]  ? wait_woken+0x70/0x70
[ 2093.471819]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[ 2093.477469]  ? select_task_rq_fair+0x130/0xf90
[ 2093.482073]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[ 2093.487726]  ? ttwu_queue_wakelist+0xf2/0x110
[ 2093.492247]  ? __cond_resched+0x16/0x50
[ 2093.496244]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 2093.501200]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 2093.506504]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 2093.511114]  __mark_inode_dirty+0x147/0x320
[ 2093.515458]  generic_update_time+0x6c/0xd0
[ 2093.519714]  file_update_time+0x127/0x140
[ 2093.523879]  ? generic_write_checks+0x61/0xc0
[ 2093.528403]  ext4_llseek+0x3fa/0x1cb0 [ext4]
[ 2093.532832]  new_sync_write+0x11c/0x1b0
[ 2093.536836]  ? intel_get_event_constraints+0x300/0x390
[ 2093.542140]  vfs_write+0x1de/0x270
[ 2093.545702]  ksys_write+0x5f/0xe0
[ 2093.549176]  do_syscall_64+0x3b/0x90
[ 2093.552913]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2093.558133] RIP: 0033:0x55d3b77a2edb
[ 2093.561864] RSP: 002b:000000c00245f8d0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 2093.569624] RAX: ffffffffffffffda RBX: 000000c000054000 RCX: 000055d3b77a2edb
[ 2093.576918] RDX: 0000000000001158 RSI: 000000c0013a9900 RDI: 00000000000002e3
[ 2093.584221] RBP: 000000c00245f920 R08: 0000000000000201 R09: 000000c002d243c0
[ 2093.591506] R10: 0000000000001158 R11: 0000000000000202 R12: 000000c00245f9b0
[ 2093.598793] R13: 0000000000000000 R14: 000000c004509860 R15: 0000000000000033
[ 2093.606086]  </TASK>

cc @damoon

@Champ-Goblem

We are also experiencing this problem, but on a 3227.2.1 image in which we have updated some packages. On top of 3227.2.1 the only updates we made were containerd 1.6.8 and kernel 5.15.62.

When this issue occurs, our CPU metrics show that containerd jumps to 90% utilisation of the node's CPU and some services such as sshd stop responding. As stated in previous replies, the node becomes entirely unresponsive and Kubernetes reports the node as unreachable. The node doesn't recover unless force-restarted or replaced by the cloud provider.

I have tried downgrading containerd to 1.6.6 (which worked fine for us on a previous Flatcar version) and upgrading the kernel to 5.15.68; neither resolved the problem. Comparing the kernel versions of upstream 3227.2.1 and our current stable Flatcar version (3139.2.3), I see 5.15.58 and 5.15.54 respectively. So potentially some kernel change after that is causing the issue?

@defo89

defo89 commented Sep 27, 2022

@Champ-Goblem since you have decent testing capabilities, would you mind also trying to reproduce this with the 5.15.70 kernel? It looks like it will be in the next stable release: flatcar-archive/coreos-overlay@ea82f18

@jepio
Member

jepio commented Sep 27, 2022

I was not able to repro with the reproducer in my own testing, but an infrastructure VM that we have on the stable channel has also hit this (14 days of uptime, no particular io load):

log:
[1282119.153912] INFO: task jbd2/sda9-8:544 blocked for more than 122 seconds.
[1282119.157088]       Not tainted 5.15.63-flatcar #1
[1282119.159281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1282119.162723] task:jbd2/sda9-8     state:D stack:    0 pid:  544 ppid:     2 flags:0x00004000
[1282119.166441] Call Trace:
[1282119.167640]  <TASK>
[1282119.168675]  __schedule+0x2eb/0x8d0
[1282119.170341]  schedule+0x5b/0xd0
[1282119.171806]  jbd2_journal_commit_transaction+0x301/0x2850 [jbd2]
[1282119.175448]  ? wait_woken+0x70/0x70
[1282119.177174]  ? lock_timer_base+0x61/0x80
[1282119.179015]  jbd2_journal_check_available_features+0x1ab/0x3f0 [jbd2]
[1282119.181922]  ? wait_woken+0x70/0x70
[1282119.183533]  ? jbd2_journal_check_available_features+0x100/0x3f0 [jbd2]
[1282119.186566]  kthread+0x127/0x150
[1282119.188087]  ? set_kthread_struct+0x50/0x50
[1282119.190346]  ret_from_fork+0x22/0x30
[1282119.192027]  </TASK>
[1282119.193081] INFO: task systemd-journal:748 blocked for more than 122 seconds.
[1282119.196255]       Not tainted 5.15.63-flatcar #1
[1282119.198321] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1282119.201776] task:systemd-journal state:D stack:    0 pid:  748 ppid:     1 flags:0x00000004
[1282119.205604] Call Trace:
[1282119.206773]  <TASK>
[1282119.207794]  __schedule+0x2eb/0x8d0
[1282119.209410]  schedule+0x5b/0xd0
[1282119.210887]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[1282119.213342]  ? wait_woken+0x70/0x70
[1282119.214946]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[1282119.217482]  ? call_rcu+0xa2/0x330
[1282119.219070]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[1282119.221672]  ? step_into+0x47c/0x7b0
[1282119.223372]  ? __cond_resched+0x16/0x50
[1282119.225175]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[1282119.227342]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[1282119.229688]  ext4_dirty_inode+0x35/0x80 [ext4]
[1282119.231735]  __mark_inode_dirty+0x147/0x320
[1282119.233712]  touch_atime+0x13c/0x150
[1282119.235429]  filemap_read+0x308/0x320
[1282119.238370]  ? may_delete+0x2a0/0x2f0
[1282119.240286]  ? do_filp_open+0xa9/0x150
[1282119.242071]  new_sync_read+0x119/0x1b0
[1282119.244052]  ? 0xffffffffad000000
[1282119.245736]  vfs_read+0xf6/0x190
[1282119.247362]  __x64_sys_pread64+0x91/0xc0
[1282119.249365]  do_syscall_64+0x3b/0x90
[1282119.251173]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[1282119.253772] RIP: 0033:0x7f59ebb32b17
[1282119.255634] RSP: 002b:00007ffefe2b6520 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[1282119.259426] RAX: ffffffffffffffda RBX: 00007ffefe2b65d0 RCX: 00007f59ebb32b17
[1282119.262876] RDX: 0000000000000040 RSI: 00007ffefe2b6550 RDI: 0000000000000020
[1282119.266269] RBP: 000000000210fdb0 R08: 0000000000000000 R09: 00007ffefe2b6760
[1282119.270866] R10: 000000000210fdb0 R11: 0000000000000293 R12: 000055705e025030
[1282119.274357] R13: 0000000000000000 R14: 00007ffefe2b6550 R15: 000055705e025030
[1282119.277808]  </TASK>
[1282119.279102] INFO: task python:1117 blocked for more than 123 seconds.
[1282119.282317]       Not tainted 5.15.63-flatcar #1
[1282119.284531] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1282119.288819] task:python          state:D stack:    0 pid: 1117 ppid:  1010 flags:0x00000000
[1282119.292823] Call Trace:
[1282119.294282]  <TASK>
[1282119.295429]  __schedule+0x2eb/0x8d0
[1282119.297126]  schedule+0x5b/0xd0
[1282119.298734]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[1282119.301713]  ? wait_woken+0x70/0x70
[1282119.303438]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[1282119.306313]  ? hrtimer_try_to_cancel.part.0+0x50/0xd0
[1282119.308836]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[1282119.311496]  ? mlx5e_select_queue+0x3c/0x2d0 [mlx5_core]
[1282119.314170]  ? __cond_resched+0x16/0x50
[1282119.315956]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[1282119.318816]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[1282119.321540]  ext4_setattr+0x3a8/0x9a0 [ext4]
[1282119.323842]  notify_change+0x3c4/0x540
[1282119.325750]  ? ext4_es_delayed_clu+0x170/0x430 [ext4]
[1282119.328375]  ? ext4_es_delayed_clu+0x1ef/0x430 [ext4]
[1282119.330894]  ? do_truncate+0x7d/0xd0
[1282119.332698]  do_truncate+0x7d/0xd0
[1282119.334571]  path_openat+0x24d/0x1280
[1282119.336461]  do_filp_open+0xa9/0x150
[1282119.338190]  ? __check_object_size+0x146/0x160
[1282119.340409]  do_sys_openat2+0x9b/0x160
[1282119.342289]  __x64_sys_openat+0x54/0xa0
[1282119.344180]  do_syscall_64+0x3b/0x90
[1282119.346050]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[1282119.348496] RIP: 0033:0x7f47adf699e4
[1282119.352234] RSP: 002b:00007ffd95f911e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[1282119.357348] RAX: ffffffffffffffda RBX: 00007f47ac34a558 RCX: 00007f47adf699e4
[1282119.360865] RDX: 0000000000080241 RSI: 00007f47acb90b48 RDI: 00000000ffffff9c
[1282119.364381] RBP: 00007f47acb90b48 R08: 0000000000000000 R09: 0000561f71986760
[1282119.369496] R10: 00000000000001b6 R11: 0000000000000293 R12: 0000000000080241
[1282119.374035] R13: 00007f47acce1a80 R14: 00007f47adc43248 R15: 0000000000000000
[1282119.377852]  </TASK>
[1282119.379101] INFO: task VM Periodic Tas:11249 blocked for more than 123 seconds.
[1282119.383214]       Not tainted 5.15.63-flatcar #1
[1282119.385459] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1282119.389306] task:VM Periodic Tas state:D stack:    0 pid:11249 ppid:  8007 flags:0x00000000
[1282119.393418] Call Trace:
[1282119.394722]  <TASK>
[1282119.395852]  __schedule+0x2eb/0x8d0
[1282119.397882]  schedule+0x5b/0xd0
[1282119.399608]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[1282119.402324]  ? wait_woken+0x70/0x70
[1282119.404155]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[1282119.406854]  ? update_load_avg+0x7a/0x5e0
[1282119.408822]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[1282119.411679]  ? psi_task_switch+0x1e0/0x200
[1282119.414387]  ? __cond_resched+0x16/0x50
[1282119.416760]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[1282119.419236]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[1282119.421878]  ext4_dirty_inode+0x35/0x80 [ext4]
[1282119.424391]  __mark_inode_dirty+0x147/0x320
[1282119.426430]  generic_update_time+0x6c/0xd0
[1282119.428570]  file_update_time+0x127/0x140
[1282119.430744]  ext4_page_mkwrite+0x86/0xc50 [ext4]
[1282119.433058]  do_page_mkwrite+0x55/0xc0
[1282119.434861]  do_wp_page+0x20a/0x300
[1282119.436643]  ? task_tick_fair+0x7c/0x370
[1282119.438526]  __handle_mm_fault+0xb7a/0x1470
[1282119.440623]  handle_mm_fault+0xcf/0x2b0
[1282119.442491]  do_user_addr_fault+0x1be/0x670
[1282119.444588]  exc_page_fault+0x68/0x140
[1282119.446694]  asm_exc_page_fault+0x21/0x30
[1282119.448720] RIP: 0033:0x7f08a7c0b29e
[1282119.450453] RSP: 002b:00007f0879ffed30 EFLAGS: 00010206
[1282119.453058] RAX: 00048c1340635bb2 RBX: 00007f08a66851b8 RCX: 0000000000000018
[1282119.456630] RDX: 0000000000000000 RSI: 0000000000138faa RDI: 0000000000000001
[1282119.460183] RBP: 00007f0879ffed40 R08: 0000000000000000 R09: 0000000000000032
[1282119.467273] R10: 00007ffe86f54080 R11: 00007ffe86f54090 R12: 0000000000000008
[1282119.470823] R13: 00007f08a0303b40 R14: 0000000000000002 R15: 0000000000000001
[1282119.474328]  </TASK>
[1282119.475620] INFO: task jenkins.util.Ti:11283 blocked for more than 123 seconds.
[1282119.479241]       Not tainted 5.15.63-flatcar #1
[1282119.481472] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1282119.485306] task:jenkins.util.Ti state:D stack:    0 pid:11283 ppid:  8007 flags:0x00000000
[1282119.489061] Call Trace:
[1282119.490271]  <TASK>
[1282119.491341]  __schedule+0x2eb/0x8d0
[1282119.492969]  schedule+0x5b/0xd0
[1282119.494583]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[1282119.497032]  ? wait_woken+0x70/0x70
[1282119.498657]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[1282119.501246]  ? pagecache_get_page+0x28b/0x470
[1282119.503242]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[1282119.505761]  ? __cond_resched+0x16/0x50
[1282119.507555]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[1282119.509870]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[1282119.512310]  __ext4_new_inode+0x73f/0x1710 [ext4]
[1282119.514499]  ext4_insert_dentry+0x1c75/0x1d30 [ext4]
[1282119.517080]  path_openat+0xf4b/0x1280
[1282119.518780]  do_filp_open+0xa9/0x150
[1282119.520453]  ? vfs_statx+0x74/0x130
[1282119.522096]  ? __check_object_size+0x146/0x160
[1282119.524139]  do_sys_openat2+0x9b/0x160
[1282119.525976]  __x64_sys_openat+0x54/0xa0
[1282119.527774]  do_syscall_64+0x3b/0x90
[1282119.529493]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[1282119.531871] RIP: 0033:0x7f08a89067e4
[1282119.533542] RSP: 002b:00007f08079fa410 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[1282119.536954] RAX: ffffffffffffffda RBX: 00007f08079fa588 RCX: 00007f08a89067e4
[1282119.540170] RDX: 00000000000000c2 RSI: 00007f084c02c500 RDI: 00000000ffffff9c
[1282119.543579] RBP: 00007f084c02c500 R08: 0000000000000000 R09: 00000007acb8f77e
[1282119.546883] R10: 00000000000001b6 R11: 0000000000000293 R12: 00000000000000c2
[1282119.550079] R13: 00000000000000c2 R14: 00007f084c02c500 R15: 00007f08240f4000
[1282119.553255]  </TASK>
[1282119.554327] INFO: task SCM polling for:13607 blocked for more than 123 seconds.
[1282119.557600]       Not tainted 5.15.63-flatcar #1
[1282119.559902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1282119.563425] task:SCM polling for state:D stack:    0 pid:13607 ppid:  8007 flags:0x00000000
[1282119.567152] Call Trace:
[1282119.568335]  <TASK>
[1282119.569377]  __schedule+0x2eb/0x8d0
[1282119.571001]  ? recalibrate_cpu_khz+0x10/0x10
[1282119.572959]  schedule+0x5b/0xd0
[1282119.575701]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[1282119.578165]  ? wait_woken+0x70/0x70
[1282119.579791]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[1282119.582297]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[1282119.584770]  ? __update_load_avg_cfs_rq+0x28c/0x310
[1282119.587014]  ? __cond_resched+0x16/0x50
[1282119.588776]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[1282119.591102]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[1282119.593532]  ext4_setattr+0x3a8/0x9a0 [ext4]
[1282119.595540]  notify_change+0x3c4/0x540
[1282119.597319]  ? ext4_es_delayed_clu+0x170/0x430 [ext4]
[1282119.599641]  ? ext4_es_delayed_clu+0x1ef/0x430 [ext4]
[1282119.601964]  ? do_truncate+0x7d/0xd0
[1282119.603622]  do_truncate+0x7d/0xd0
[1282119.605236]  path_openat+0x24d/0x1280
[1282119.606999]  do_filp_open+0xa9/0x150
[1282119.608658]  ? __fput+0xff/0x250
[1282119.610171]  ? __check_object_size+0x146/0x160
[1282119.612351]  do_sys_openat2+0x9b/0x160
[1282119.614152]  __x64_sys_openat+0x54/0xa0
[1282119.615983]  do_syscall_64+0x3b/0x90
[1282119.617649]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[1282119.619993] RIP: 0033:0x7f08a89067e4
[1282119.621775] RSP: 002b:00007f07f7faa2e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[1282119.625933] RAX: ffffffffffffffda RBX: 000000071bca3a90 RCX: 00007f08a89067e4
[1282119.629183] RDX: 0000000000000241 RSI: 00007f0840094c40 RDI: 00000000ffffff9c
[1282119.632397] RBP: 00007f0840094c40 R08: 0000000000000000 R09: 0000000000000051
[1282119.635645] R10: 00000000000001b6 R11: 0000000000000293 R12: 0000000000000241
[1282119.638964] R13: 0000000000000241 R14: 00007f0840094c40 R15: 00007f081402cb48
[1282119.642228]  </TASK>
[1282119.643335] INFO: task Handling GET /l:830503 blocked for more than 123 seconds.
[1282119.646702]       Not tainted 5.15.63-flatcar #1
[1282119.648846] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1282119.652383] task:Handling GET /l state:D stack:    0 pid:830503 ppid:  8007 flags:0x00000000
[1282119.656380] Call Trace:
[1282119.657559]  <TASK>
[1282119.658594]  __schedule+0x2eb/0x8d0
[1282119.660229]  ? trigger_load_balance+0x60/0x300
[1282119.662271]  schedule+0x5b/0xd0
[1282119.663751]  rwsem_down_read_slowpath+0x32c/0x380
[1282119.665950]  ? hw_breakpoint_exceptions_notify+0x30/0xf0
[1282119.668366]  ? ktime_get+0x38/0xa0
[1282119.670018]  do_user_addr_fault+0x42f/0x670
[1282119.672059]  ? read_hv_sched_clock_tsc+0x5/0x40
[1282119.674158]  exc_page_fault+0x68/0x140
[1282119.675939]  asm_exc_page_fault+0x21/0x30
[1282119.677791] RIP: 0033:0x7f08a77a4d3c
[1282119.679642] RSP: 002b:00007f07f521bff0 EFLAGS: 00010202
[1282119.682089] RAX: 00007f08a6683d28 RBX: 00007f08a00402d0 RCX: 000000009c700000
[1282119.685259] RDX: 000000007ce00000 RSI: 000000007ce00000 RDI: 00007f08a0195bb0
[1282119.691672] RBP: 00007f07f521bff0 R08: 00007f08a850987c R09: 00007f0835091f20
[1282119.694867] R10: ffffffffffffffff R11: 0000000000000000 R12: 00007f083c9b61d0
[1282119.698137] R13: 0000000000000000 R14: 00007f08a76e57b0 R15: 0000000000000800
[1282119.701408]  </TASK>
[1282119.702576] INFO: task org.jenkinsci.p:875804 blocked for more than 123 seconds.
[1282119.705988]       Not tainted 5.15.63-flatcar #1
[1282119.708152] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1282119.711696] task:org.jenkinsci.p state:D stack:    0 pid:875804 ppid:  8007 flags:0x00000000
[1282119.715495] Call Trace:
[1282119.716676]  <TASK>
[1282119.717757]  __schedule+0x2eb/0x8d0
[1282119.719495]  ? update_load_avg+0x7a/0x5e0
[1282119.721348]  schedule+0x5b/0xd0
[1282119.722885]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[1282119.725377]  ? wait_woken+0x70/0x70
[1282119.727048]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[1282119.729522]  ? timerqueue_del+0x2a/0x50
[1282119.731304]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[1282119.733797]  ? hrtimer_cancel+0x1d/0x40
[1282119.735798]  ? futex_wait+0x21a/0x240
[1282119.737584]  ? __cond_resched+0x16/0x50
[1282119.739363]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[1282119.741573]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[1282119.743920]  ext4_dirty_inode+0x35/0x80 [ext4]
[1282119.745987]  __mark_inode_dirty+0x147/0x320
[1282119.748024]  generic_update_time+0x6c/0xd0
[1282119.749950]  file_update_time+0x127/0x140
[1282119.751906]  ? generic_write_checks+0x61/0xc0
[1282119.753905]  ext4_llseek+0x3fa/0x1cb0 [ext4]
[1282119.755906]  new_sync_write+0x11c/0x1b0
[1282119.757693]  ? intel_get_event_constraints+0x300/0x390
[1282119.760046]  vfs_write+0x1de/0x270
[1282119.761659]  ksys_write+0x5f/0xe0
[1282119.763218]  do_syscall_64+0x3b/0x90
[1282119.764911]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[1282119.767352] RIP: 0033:0x7f08a8905fef
[1282119.769060] RSP: 002b:00007f07f1ade4c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[1282119.772473] RAX: ffffffffffffffda RBX: 00007f0814407348 RCX: 00007f08a8905fef
[1282119.775715] RDX: 000000000000005e RSI: 00007f07f1ade540 RDI: 0000000000000338
[1282119.779100] RBP: 00007f07f1ade510 R08: 0000000000000000 R09: 000000076ae08648
[1282119.782445] R10: 0000000000178e34 R11: 0000000000000293 R12: 000000000000005e
[1282119.785816] R13: 00007f07f1ade540 R14: 0000000000000338 R15: 000000000000005e
[1282119.789069]  </TASK>
[1282119.790140] INFO: task kworker/u8:2:874634 blocked for more than 123 seconds.
[1282119.793381]       Not tainted 5.15.63-flatcar #1
[1282119.795512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1282119.800485] task:kworker/u8:2    state:D stack:    0 pid:874634 ppid:     2 flags:0x00004000
[1282119.804354] Workqueue: writeback wb_workfn (flush-8:0)
[1282119.806726] Call Trace:
[1282119.807936]  <TASK>
[1282119.809026]  __schedule+0x2eb/0x8d0
[1282119.810692]  schedule+0x5b/0xd0
[1282119.812181]  jbd2_journal_free_reserved+0xca/0x9d0 [jbd2]
[1282119.814641]  ? wait_woken+0x70/0x70
[1282119.816909]  jbd2_journal_free_reserved+0x2d9/0x9d0 [jbd2]
[1282119.819409]  ? find_get_pages_range+0x197/0x200
[1282119.821507]  jbd2_journal_free_reserved+0x5ab/0x9d0 [jbd2]
[1282119.823988]  ? ext4_convert_inline_data+0xb07/0x2020 [ext4]
[1282119.826555]  ? __cond_resched+0x16/0x50
[1282119.828352]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[1282119.830542]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[1282119.833049]  __ext4_mark_inode_dirty+0x502/0x1880 [ext4]
[1282119.835474]  ? __cond_resched+0x16/0x50
[1282119.837281]  ? __getblk_gfp+0x27/0x60
[1282119.838986]  ? __ext4_handle_dirty_metadata+0x56/0x19b0 [ext4]
[1282119.841629]  ? ext4_mark_iloc_dirty+0x56a/0xaf0 [ext4]
[1282119.843979]  do_writepages+0xd1/0x200
[1282119.845682]  __writeback_single_inode+0x39/0x290
[1282119.847884]  writeback_sb_inodes+0x20d/0x490
[1282119.849861]  __writeback_inodes_wb+0x4c/0xe0
[1282119.851844]  wb_writeback+0x1c0/0x280
[1282119.853561]  wb_workfn+0x29f/0x4d0
[1282119.855195]  ? psi_task_switch+0x1e0/0x200
[1282119.857128]  process_one_work+0x226/0x3c0
[1282119.859031]  worker_thread+0x50/0x410
[1282119.860747]  ? process_one_work+0x3c0/0x3c0
[1282119.862674]  kthread+0x127/0x150
[1282119.864307]  ? set_kthread_struct+0x50/0x50
[1282119.867863]  ret_from_fork+0x22/0x30
[1282119.869538]  </TASK>

@t-lo
Member

t-lo commented Sep 28, 2022

State of our research so far:

The suspicious commit has not been reverted yet, and is still in main (therefore, also in the 5.15 LTS kernel series). We will continue to closely monitor the situation.

Kudos go to @vbatts and to @jepio for doing the leg work.

@t-lo
Member

t-lo commented Sep 28, 2022

@Champ-Goblem Since we still have trouble reproducing this on our end, would it be possible for you to apply the patch below to 5.15.63 (i.e. 3227.2.2) and check whether you can still reproduce the issue? The patch reverts the suspicious commit 51ae846cff5.

We have just started a discussion on linux-ext4 (replying to the syzbot thread) to raise awareness of potential issues with 51ae846cff5. A clear bisect with the patch reverted (and no issues caused) would help a lot in this discussion.

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 98f381f6fc18..072a9bc931fc 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3139,15 +3139,13 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
 {
 	struct inode *inode = mapping->host;
 	journal_t *journal;
-	sector_t ret = 0;
 	int err;
 
-	inode_lock_shared(inode);
 	/*
 	 * We can get here for an inline file via the FIBMAP ioctl
 	 */
 	if (ext4_has_inline_data(inode))
-		goto out;
+		return 0;
 
 	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
 			test_opt(inode->i_sb, DELALLOC)) {
@@ -3186,14 +3184,10 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
 		jbd2_journal_unlock_updates(journal);
 
 		if (err)
-			goto out;
+			return 0;
 	}
 
-	ret = iomap_bmap(mapping, block, &ext4_iomap_ops);
-
-out:
-	inode_unlock_shared(inode);
-	return ret;
+	return iomap_bmap(mapping, block, &ext4_iomap_ops);
 }
 
 static int ext4_readpage(struct file *file, struct page *page)
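
For anyone who wants to test the same revert, applying it to a plain 5.15.63 source tree would look roughly like this (a sketch; the Flatcar SDK/ebuild steps for rebuilding the image are not shown):

# save the diff above as revert-ext4-bmap-lock.patch in the kernel source root
cd linux-5.15.63
patch -p1 < revert-ext4-bmap-lock.patch
# equivalently, in a git checkout of the linux-5.15.y stable branch:
# git revert 51ae846cff5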

@Champ-Goblem

@t-lo Yep I can give this a go, although I may not be able to provide results till some time next week

@pothos
Member

pothos commented Sep 28, 2022

Should we help by building the patched image?

@Champ-Goblem

I have already started a build of the patched image on our infra, so that should be fine; it will probably take an hour and a bit to build. I'll look at getting it rolled out and start testing by the afternoon, so hopefully I'll have some results either later today or some time tomorrow.
I'll be off from Friday till Monday, so it may also be worth you doing some testing on your end in case I am unable to get any results by then.

@Champ-Goblem

Okay, I think the patch provided above fixes the problem. I've deployed two nodes, one running stock 5.15.63 and a second running with the patch.
On the regular node without the patch, I could recreate this reliably by spawning 20-30 pods, each pulling a different image. Every test run would result in SSH connections dying, and the node would become completely unresponsive with most of the pods stuck in the ContainerCreating state.
On the node with the patch I ran the pull test a couple of times, pruning the images between each run. No problems were encountered and the pods all entered the Running state successfully; SSH connections were also responsive during this time.
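
Roughly, the pull load looked like this (a sketch; the image references are placeholders and node pinning via a nodeSelector is omitted):

# spawn ~25 pods, each pulling a different image onto the node under test
for i in $(seq 1 25); do
  kubectl run "pull-test-$i" --image="registry.example.com/test-image-$i:latest" \
    --restart=Never -- sleep 3600
done
# between runs: delete the pods and prune the pulled images on the node,
# e.g. with "crictl rmi --prune"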

@t-lo
Member

t-lo commented Sep 29, 2022

Awesome, thank you very much for testing! We're discussing this issue with Jan Kara, an Ext4 maintainer, upstream. (https://www.spinics.net/lists/linux-ext4/msg85417.html ff., messages from today will become available in the archive tomorrow).
Unfortunately, there seems to be an issue with the stack traces we provided (from this very issue); we're currently looking into possible causes and will follow up with the maintainer.

@jepio
Member

jepio commented Sep 30, 2022

@Champ-Goblem would you be able to rebuild and test an image with the 5.15.63 kernel, without the revert but with this PR applied: flatcar-archive/coreos-overlay#2196? This should fix the issue with the stack traces being slightly off and help upstream come up with a proper fix.

@Champ-Goblem

I have rebuilt Flatcar with the above PR; below are the logs from the system during the failure.

[ 1599.005306] INFO: task jbd2/sda9-8:702 blocked for more than 122 seconds.
[ 1599.012290]       Not tainted 5.15.63-flatcar #1
[ 1599.017128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1599.025100] task:jbd2/sda9-8     state:D stack:    0 pid:  702 ppid:     2 flags:0x00004000
[ 1599.033579] Call Trace:
[ 1599.036144]  <TASK>
[ 1599.038354]  __schedule+0x2eb/0x8d0
[ 1599.042109]  schedule+0x5b/0xd0
[ 1599.045372]  jbd2_journal_commit_transaction+0x301/0x18e0 [jbd2]
[ 1599.051518]  ? wait_woken+0x70/0x70
[ 1599.055127]  ? lock_timer_base+0x61/0x80
[ 1599.059181]  kjournald2+0xab/0x270 [jbd2]
[ 1599.063317]  ? wait_woken+0x70/0x70
[ 1599.066923]  ? load_superblock.part.0+0xb0/0xb0 [jbd2]
[ 1599.072200]  kthread+0x124/0x150
[ 1599.075543]  ? set_kthread_struct+0x50/0x50
[ 1599.079849]  ret_from_fork+0x1f/0x30
[ 1599.083538]  </TASK>
[ 1599.085835] INFO: task kworker/u32:13:732 blocked for more than 122 seconds.
[ 1599.093010]       Not tainted 5.15.63-flatcar #1
[ 1599.097739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1599.105688] task:kworker/u32:13  state:D stack:    0 pid:  732 ppid:     2 flags:0x00004000
[ 1599.114160] Workqueue: writeback wb_workfn (flush-8:0)
[ 1599.119418] Call Trace:
[ 1599.121976]  <TASK>
[ 1599.124192]  __schedule+0x2eb/0x8d0
[ 1599.127797]  schedule+0x5b/0xd0
[ 1599.131051]  wait_transaction_locked+0x8a/0xd0 [jbd2]
[ 1599.136227]  ? wait_woken+0x70/0x70
[ 1599.139829]  add_transaction_credits+0xd9/0x2b0 [jbd2]
[ 1599.145091]  ? find_get_pages_range+0x197/0x200
[ 1599.149832]  start_this_handle+0xfb/0x520 [jbd2]
[ 1599.154591]  ? mpage_release_unused_pages+0x1c7/0x1e0 [ext4]
[ 1599.160524]  ? __cond_resched+0x16/0x50
[ 1599.164478]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 1599.169393]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 1599.174669]  ext4_writepages+0x302/0xfd0 [ext4]
[ 1599.179328]  ? __find_get_block+0xb3/0x2c0
[ 1599.183539]  ? __cond_resched+0x16/0x50
[ 1599.187518]  ? __getblk_gfp+0x27/0x60
[ 1599.191307]  ? cpumask_next_and+0x1f/0x30
[ 1599.195433]  ? update_sd_lb_stats.constprop.0+0xff/0x8a0
[ 1599.200888]  do_writepages+0xce/0x200
[ 1599.204788]  ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 1599.209796]  ? percpu_counter_add_batch+0x5b/0x70
[ 1599.214627]  ? fprop_reflect_period_percpu.isra.0+0x7b/0xc0
[ 1599.220439]  __writeback_single_inode+0x39/0x290
[ 1599.225183]  writeback_sb_inodes+0x20d/0x490
[ 1599.229569]  __writeback_inodes_wb+0x4c/0xe0
[ 1599.233951]  wb_writeback+0x1c0/0x280
[ 1599.237727]  wb_workfn+0x29f/0x4d0
[ 1599.241244]  process_one_work+0x223/0x3c0
[ 1599.245371]  worker_thread+0x50/0x410
[ 1599.249146]  ? process_one_work+0x3c0/0x3c0
[ 1599.253460]  kthread+0x124/0x150
[ 1599.256811]  ? set_kthread_struct+0x50/0x50
[ 1599.261110]  ret_from_fork+0x1f/0x30
[ 1599.264820]  </TASK>
[ 1599.267171] INFO: task systemd-journal:1098 blocked for more than 123 seconds.
[ 1599.274538]       Not tainted 5.15.63-flatcar #1
[ 1599.279282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1599.287256] task:systemd-journal state:D stack:    0 pid: 1098 ppid:     1 flags:0x00000224
[ 1599.295753] Call Trace:
[ 1599.298343]  <TASK>
[ 1599.300570]  __schedule+0x2eb/0x8d0
[ 1599.304178]  schedule+0x5b/0xd0
[ 1599.307434]  wait_transaction_locked+0x8a/0xd0 [jbd2]
[ 1599.312814]  ? wait_woken+0x70/0x70
[ 1599.316431]  add_transaction_credits+0xd9/0x2b0 [jbd2]
[ 1599.321722]  start_this_handle+0xfb/0x520 [jbd2]
[ 1599.326460]  ? __cond_resched+0x16/0x50
[ 1599.330414]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 1599.335331]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 1599.340615]  ext4_truncate+0x167/0x480 [ext4]
[ 1599.345134]  ext4_setattr+0x59a/0x9a0 [ext4]
[ 1599.349567]  ? virtnet_poll+0x31b/0x45b [virtio_net]
[ 1599.354663]  ? common_interrupt+0xf/0xa0
[ 1599.358705]  notify_change+0x3c1/0x540
[ 1599.362591]  ? do_truncate+0x7d/0xd0
[ 1599.366292]  do_truncate+0x7d/0xd0
[ 1599.369836]  do_sys_ftruncate+0xc9/0x150
[ 1599.373882]  do_syscall_64+0x38/0x90
[ 1599.377581]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1599.382757] RIP: 0033:0x7fc405986757
[ 1599.386449] RSP: 002b:00007ffdf6776af8 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
[ 1599.394126] RAX: ffffffffffffffda RBX: 00007ffdf6776b40 RCX: 00007fc405986757
[ 1599.401371] RDX: 0000557b78aa8f40 RSI: 0000000001000000 RDI: 0000000000000015
[ 1599.408615] RBP: 0000557b78aac520 R08: 0000000000000001 R09: 0000557b78aac5ac
[ 1599.415860] R10: 0000000000000000 R11: 0000000000000202 R12: 0000557b78a9c600
[ 1599.423119] R13: 00007ffdf6776b38 R14: 0000000000000003 R15: 0000000000000000
[ 1599.430406]  </TASK>
[ 1599.432754] INFO: task systemd-timesyn:1277 blocked for more than 123 seconds.
[ 1599.440097]       Not tainted 5.15.63-flatcar #1
[ 1599.444842] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1599.452799] task:systemd-timesyn state:D stack:    0 pid: 1277 ppid:     1 flags:0x00000224
[ 1599.461268] Call Trace:
[ 1599.463836]  <TASK>
[ 1599.466046]  __schedule+0x2eb/0x8d0
[ 1599.469684]  schedule+0x5b/0xd0
[ 1599.472946]  wait_transaction_locked+0x8a/0xd0 [jbd2]
[ 1599.478133]  ? wait_woken+0x70/0x70
[ 1599.481839]  add_transaction_credits+0xd9/0x2b0 [jbd2]
[ 1599.487387]  ? __fget_files+0x79/0xb0
[ 1599.491185]  start_this_handle+0xfb/0x520 [jbd2]
[ 1599.495963]  ? nd_jump_link+0x4d/0xc0
[ 1599.499752]  ? __cond_resched+0x16/0x50
[ 1599.503708]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 1599.508620]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 1599.513897]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 1599.518477]  __mark_inode_dirty+0x144/0x320
[ 1599.522795]  ext4_setattr+0x1fb/0x9a0 [ext4]
[ 1599.527200]  notify_change+0x3c1/0x540
[ 1599.531071]  ? vfs_utimes+0x139/0x220
[ 1599.534845]  vfs_utimes+0x139/0x220
[ 1599.538446]  do_utimes+0xb4/0x120
[ 1599.541874]  __x64_sys_utimensat+0x70/0xb0
[ 1599.546132]  ? syscall_trace_enter.constprop.0+0x143/0x1c0
[ 1599.551751]  do_syscall_64+0x38/0x90
[ 1599.555443]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1599.560628] RIP: 0033:0x7fce21af901f
[ 1599.564858] RSP: 002b:00007ffedd36c8c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000118
[ 1599.572559] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fce21af901f
[ 1599.579806] RDX: 0000000000000000 RSI: 00007ffedd36c8d0 RDI: 00000000ffffff9c
[ 1599.587062] RBP: 00007ffedd36c8d0 R08: 0000000000000000 R09: 00007ffedd36c760
[ 1599.594313] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 1599.601558] R13: 00000000ffffffff R14: ffffffffffffffff R15: 00000000ffffffff
[ 1599.608806]  </TASK>
[ 1599.611124] INFO: task bash:1925 blocked for more than 123 seconds.
[ 1599.617618]       Not tainted 5.15.63-flatcar #1
[ 1599.622347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1599.630283] task:bash            state:D stack:    0 pid: 1925 ppid:  1924 flags:0x00000004
[ 1599.638750] Call Trace:
[ 1599.641309]  <TASK>
[ 1599.643528]  __schedule+0x2eb/0x8d0
[ 1599.647140]  schedule+0x5b/0xd0
[ 1599.650399]  wait_transaction_locked+0x8a/0xd0 [jbd2]
[ 1599.655571]  ? wait_woken+0x70/0x70
[ 1599.659188]  add_transaction_credits+0xd9/0x2b0 [jbd2]
[ 1599.664444]  ? pagecache_get_page+0x28b/0x470
[ 1599.668914]  start_this_handle+0xfb/0x520 [jbd2]
[ 1599.673664]  ? __cond_resched+0x16/0x50
[ 1599.677615]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 1599.682525]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 1599.687801]  __ext4_new_inode+0x73f/0x1710 [ext4]
[ 1599.692639]  ext4_create+0x115/0x1d0 [ext4]
[ 1599.696968]  path_openat+0xf48/0x1280
[ 1599.700751]  ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 1599.705742]  ? __wake_up_common_lock+0x8a/0xc0
[ 1599.710299]  do_filp_open+0xa9/0x150
[ 1599.713990]  ? vfs_statx+0x74/0x130
[ 1599.717615]  ? __check_object_size+0x146/0x160
[ 1599.722179]  do_sys_openat2+0x9b/0x160
[ 1599.726057]  __x64_sys_openat+0x54/0xa0
[ 1599.730003]  do_syscall_64+0x38/0x90
[ 1599.733691]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1599.738870] RIP: 0033:0x7f138fdb5337
[ 1599.742558] RSP: 002b:00007ffdc49dcaa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 1599.750249] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f138fdb5337
[ 1599.757497] RDX: 0000000000000241 RSI: 00005583f2829d70 RDI: 00000000ffffff9c
[ 1599.764747] RBP: 00005583f2829d70 R08: 0000000000000000 R09: 0000000000000000
[ 1599.771998] R10: 0000000000000180 R11: 0000000000000246 R12: 0000000000000241
[ 1599.779258] R13: 0000000000000000 R14: 00005583f1f45534 R15: 0000000000000000
[ 1599.786510]  </TASK>
[ 1599.788906] INFO: task MVStore backgro:8970 blocked for more than 123 seconds.
[ 1599.796262]       Not tainted 5.15.63-flatcar #1
[ 1599.800992] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1599.808931] task:MVStore backgro state:D stack:    0 pid: 8970 ppid:  8062 flags:0x00000000
[ 1599.817398] Call Trace:
[ 1599.819968]  <TASK>
[ 1599.822205]  __schedule+0x2eb/0x8d0
[ 1599.825816]  schedule+0x5b/0xd0
[ 1599.829073]  wait_transaction_locked+0x8a/0xd0 [jbd2]
[ 1599.834250]  ? wait_woken+0x70/0x70
[ 1599.837852]  add_transaction_credits+0xd9/0x2b0 [jbd2]
[ 1599.843108]  start_this_handle+0xfb/0x520 [jbd2]
[ 1599.847859]  ? __cond_resched+0x16/0x50
[ 1599.851810]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 1599.856715]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 1599.861992]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 1599.866585]  __mark_inode_dirty+0x144/0x320
[ 1599.870892]  generic_update_time+0x6c/0xd0
[ 1599.875118]  file_update_time+0x127/0x140
[ 1599.879242]  ? generic_write_checks+0x61/0xc0
[ 1599.883721]  ext4_buffered_write_iter+0x5a/0x180 [ext4]
[ 1599.889103]  do_iter_readv_writev+0x14f/0x1b0
[ 1599.893582]  do_iter_write+0x80/0x1c0
[ 1599.897370]  ovl_write_iter+0x2d3/0x450 [overlay]
[ 1599.902195]  new_sync_write+0x119/0x1b0
[ 1599.906151]  ? intel_get_event_constraints+0x300/0x390
[ 1599.911403]  vfs_write+0x1de/0x270
[ 1599.914917]  __x64_sys_pwrite64+0x91/0xc0
[ 1599.919039]  do_syscall_64+0x38/0x90
[ 1599.922727]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1599.927968] RIP: 0033:0x7f2f532424a3
[ 1599.931659] RSP: 002b:00007f2f2cc8aa78 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
[ 1599.939352] RAX: ffffffffffffffda RBX: 00007f2f2cc8bb38 RCX: 00007f2f532424a3
[ 1599.946606] RDX: 0000000000001000 RSI: 00007f2f2c769d90 RDI: 0000000000000014
[ 1599.953875] RBP: 00007f2f2cc8aaf0 R08: 0000000000000000 R09: 0000000000000000
[ 1599.961127] R10: 00000000000a3000 R11: 0000000000000246 R12: 0000000000000012
[ 1599.968527] R13: 00007f2f2c769d90 R14: 00000000000a3000 R15: 00007f2f2f94b800
[ 1599.975775]  </TASK>
[ 1599.978135] INFO: task k3s-server:12051 blocked for more than 123 seconds.
[ 1599.985142]       Not tainted 5.15.63-flatcar #1
[ 1599.989873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1599.997816] task:k3s-server      state:D stack:    0 pid:12051 ppid:     1 flags:0x00000000
[ 1600.006283] Call Trace:
[ 1600.008862]  <TASK>
[ 1600.011083]  __schedule+0x2eb/0x8d0
[ 1600.014779]  schedule+0x5b/0xd0
[ 1600.018034]  rwsem_down_write_slowpath+0x220/0x4f0
[ 1600.022947]  chown_common+0x152/0x250
[ 1600.026740]  ? __do_sys_newfstat+0x57/0x60
[ 1600.030950]  ? __fget_files+0x79/0xb0
[ 1600.034749]  ksys_fchown+0x74/0xb0
[ 1600.038283]  __x64_sys_fchown+0x16/0x20
[ 1600.042230]  do_syscall_64+0x38/0x90
[ 1600.045922]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1600.051091] RIP: 0033:0x3f8045f
[ 1600.054346] RSP: 002b:00007f749fe0a670 EFLAGS: 00000202 ORIG_RAX: 000000000000005d
[ 1600.062050] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000003f8045f
[ 1600.069297] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000188
[ 1600.076557] RBP: 00007f749dfd2498 R08: 0000000000000188 R09: 000000000572a6fb
[ 1600.083808] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000188
[ 1600.091051] R13: 00007f749f5416cd R14: 0000000000080006 R15: 00000000000001a4
[ 1600.098312]  </TASK>
[ 1600.100611] INFO: task k3s-server:12052 blocked for more than 123 seconds.
[ 1600.107601]       Not tainted 5.15.63-flatcar #1
[ 1600.112344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1600.120281] task:k3s-server      state:D stack:    0 pid:12052 ppid:     1 flags:0x00000000
[ 1600.128758] Call Trace:
[ 1600.131315]  <TASK>
[ 1600.133536]  __schedule+0x2eb/0x8d0
[ 1600.137136]  schedule+0x5b/0xd0
[ 1600.140408]  wait_transaction_locked+0x8a/0xd0 [jbd2]
[ 1600.145574]  ? wait_woken+0x70/0x70
[ 1600.149190]  add_transaction_credits+0xd9/0x2b0 [jbd2]
[ 1600.154442]  ? __cond_resched+0x16/0x50
[ 1600.158395]  ? dput+0x32/0x310
[ 1600.161559]  start_this_handle+0xfb/0x520 [jbd2]
[ 1600.166290]  ? asm_sysvec_apic_timer_interrupt+0x15/0x20
[ 1600.171713]  ? __cond_resched+0x16/0x50
[ 1600.175660]  jbd2__journal_start+0xfb/0x1e0 [jbd2]
[ 1600.180568]  __ext4_journal_start_sb+0xf8/0x110 [ext4]
[ 1600.185846]  ext4_dirty_inode+0x35/0x80 [ext4]
[ 1600.190441]  __mark_inode_dirty+0x144/0x320
[ 1600.194737]  ext4_setattr+0x1fb/0x9a0 [ext4]
[ 1600.199134]  notify_change+0x3c1/0x540
[ 1600.203002]  ? chown_common+0x168/0x250
[ 1600.206957]  chown_common+0x168/0x250
[ 1600.210731]  ? __fget_files+0x79/0xb0
[ 1600.214502]  ksys_fchown+0x74/0xb0
[ 1600.218017]  __x64_sys_fchown+0x16/0x20
[ 1600.221965]  do_syscall_64+0x38/0x90
[ 1600.225674]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1600.230851] RIP: 0033:0x3f8045f
[ 1600.234114] RSP: 002b:00007f749fde7670 EFLAGS: 00000202 ORIG_RAX: 000000000000005d
[ 1600.241794] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000003f8045f
[ 1600.249050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000145
[ 1600.256388] RBP: 00007f749e2213c8 R08: 0000000000000145 R09: 000000000572a6fb
[ 1600.263636] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000145
[ 1600.270890] R13: 00007f749dfbef0d R14: 0000000000080006 R15: 00000000000001a4
[ 1600.278138]  </TASK>
[ 1600.280436] INFO: task k3s-server:12055 blocked for more than 124 seconds.
[ 1600.287449]       Not tainted 5.15.63-flatcar #1
[ 1600.292184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1600.300127] task:k3s-server      state:D stack:    0 pid:12055 ppid:     1 flags:0x00000000
[ 1600.308590] Call Trace:
[ 1600.311151]  <TASK>
[ 1600.313358]  __schedule+0x2eb/0x8d0
[ 1600.316957]  schedule+0x5b/0xd0
[ 1600.320208]  rwsem_down_write_slowpath+0x220/0x4f0
[ 1600.325113]  chown_common+0x152/0x250
[ 1600.328890]  ? __do_sys_newfstat+0x57/0x60
[ 1600.333107]  ? __fget_files+0x79/0xb0
[ 1600.336885]  ksys_fchown+0x74/0xb0
[ 1600.340400]  __x64_sys_fchown+0x16/0x20
[ 1600.344346]  do_syscall_64+0x38/0x90
[ 1600.348035]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1600.353218] RIP: 0033:0x3f8045f
[ 1600.356471] RSP: 002b:00007f749fcae670 EFLAGS: 00000202 ORIG_RAX: 000000000000005d
[ 1600.364149] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000003f8045f
[ 1600.371397] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000019c
[ 1600.378648] RBP: 00007f749f539888 R08: 000000000000019c R09: 000000000572a6fb
[ 1600.385895] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000019c
[ 1600.393155] R13: 00007f749e2e5a3d R14: 0000000000080006 R15: 00000000000001a4
[ 1600.400407]  </TASK>
[ 1600.402761] INFO: task k3s-server:12057 blocked for more than 124 seconds.
[ 1600.409779]       Not tainted 5.15.63-flatcar #1
[ 1600.414512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1600.422470] task:k3s-server      state:D stack:    0 pid:12057 ppid:     1 flags:0x00000000
[ 1600.430934] Call Trace:
[ 1600.433510]  <TASK>
[ 1600.435722]  __schedule+0x2eb/0x8d0
[ 1600.439328]  schedule+0x5b/0xd0
[ 1600.442583]  rwsem_down_write_slowpath+0x220/0x4f0
[ 1600.447495]  chown_common+0x152/0x250
[ 1600.451274]  ? __do_sys_newfstat+0x57/0x60
[ 1600.455483]  ? __fget_files+0x79/0xb0
[ 1600.459258]  ksys_fchown+0x74/0xb0
[ 1600.462773]  __x64_sys_fchown+0x16/0x20
[ 1600.466724]  do_syscall_64+0x38/0x90
[ 1600.470416]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1600.475604] RIP: 0033:0x3f8045f
[ 1600.478857] RSP: 002b:00007f749fc28670 EFLAGS: 00000202 ORIG_RAX: 000000000000005d
[ 1600.486539] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000003f8045f
[ 1600.493788] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000196
[ 1600.501050] RBP: 00007f749f64dac8 R08: 0000000000000196 R09: 000000000572a6fb
[ 1600.508318] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000196
[ 1600.515569] R13: 00007f749eadef0d R14: 0000000000080006 R15: 00000000000001a4
[ 1600.522817]  </TASK>
[ 1629.084561] systemd[1]: systemd-journald.service: Processes still around after SIGKILL. Ignoring.

@jepio
Member

jepio commented Oct 4, 2022

Thanks, @Champ-Goblem, I'll forward that to the mailing list. Any chance you can also provide the following:

So probably first try find out why stacktraces are not working right on
your kernel and fix them. And then, if the hang happens, please trigger
sysrq-w (or do echo w >/proc/sysrq-trigger if you can still get to the
machine) and send here the output. It will dump all blocked tasks and from
that we should be able to better understand what is happening.
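
For reference, triggering that dump from a root shell typically looks like the following (a minimal sketch, assuming the machine is still reachable and you have root):

# enable all sysrq functions for this boot (1 = everything allowed)
echo 1 > /proc/sys/kernel/sysrq
# 'w' dumps all tasks currently in uninterruptible (blocked) state to the kernel log
echo w > /proc/sysrq-trigger
# read the resulting traces from the kernel ring buffer or the journal
dmesg | tail -n 300
journalctl -k --no-pager | tail -n 300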

@Champ-Goblem

@jepio I think I have managed to capture some of the list of hung tasks. I can't be sure how complete the list is, but I think it should be a good starting point nonetheless.
hung-tasks.txt

@wayfrro

wayfrro commented Oct 21, 2022

We are also experiencing this. Any idea if and when this will be solved? Is it certain that this is a vanilla kernel issue? If so, other distros should have the same problem as well.
We are already considering moving back to LTS because of this... but what if it reaches LTS too?
This is pretty serious. :/

@jepio
Member

jepio commented Nov 22, 2022

There's a fix available now (https://lore.kernel.org/linux-ext4/20221122115715.kxqhsk2xs4nrofyb@quack3/T/#ma65f0113b4b2f9259b8f7d8af16b8cb351728d20) and I've tested it on 5.15 with the reproducer in this thread; I can no longer reproduce the hang.

This should now get merged upstream and subsequently backported. Then it'll be included in a Flatcar stable release.

@databus23

databus23 commented Dec 1, 2022

Thanks for all the work that went into fixing this!
Any update on the upstream merge? Is it progressing? I'm not familiar with kernel development, and so far it seems to me that nobody has responded to the ext4 developer's patch and nothing is happening.

Any advice on where I can track whether the patch has landed upstream?

We have stopped Flatcar OS upgrades for 3 months now and our PCI compliance is not looking too good... 😬

@dopafon

dopafon commented Dec 1, 2022

@databus23 We have worked around this by changing the root FS from ext4 to XFS. Our tests look good so far, but we haven't rolled it out to production yet. In case you need to upgrade Flatcar for some reason, this might be an alternative for you until the ext4 fix has made its way into a Flatcar stable release.
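
For anyone evaluating that workaround, a quick way to check which filesystem a node's root mount is actually using (a sketch, run on the node itself; on a default Flatcar install the answer should be ext4):

# filesystem type of the root mount
findmnt -n -o FSTYPE /
# or inspect the ROOT partition directly
lsblk -f /dev/disk/by-label/ROOT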

@Champ-Goblem

We have been testing the new patch [1], applied against 5.15.80 in a custom build, and it is currently looking stable again on our development clusters with Flatcar 3227.2.2.

[1] https://lore.kernel.org/linux-ext4/20221122174807.GA9658@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/2-0001-ext4-Fix-deadlock-due-to-mbcache-en.patch
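
For anyone wanting to reproduce such a custom build, applying the patch linked above to a kernel source tree is roughly the following (a sketch; the local patch file name below is hypothetical, use whatever you saved the lore attachment as):

# run inside an unpacked linux-5.15.80 source tree
patch -p1 --dry-run < ../ext4-fix-deadlock-due-to-mbcache.patch   # verify it applies cleanly first
patch -p1 < ../ext4-fix-deadlock-due-to-mbcache.patch
# then rebuild the kernel / Flatcar image as usual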

@jepio
Member

jepio commented Dec 2, 2022

Any update on the upstream merge? Is it progressing? I'm not familiar with kernel development, and so far it seems to me that nobody has responded to the ext4 developer's patch and nothing is happening.

Unfortunately there hasn't been any response from the ext4 maintainer; I've poked them this week.

In the meantime we'll likely take the backport into the next alpha/beta Flatcar release next week.

@Champ-Goblem thanks for all the help with testing and reproducing; it's good to have a data point confirming that no new issues pop up with the patch.

@till

till commented Dec 6, 2022

When is it in stable?

@jepio jepio reopened this Dec 6, 2022
@jepio
Member

jepio commented Dec 6, 2022

Didn't mean to close this. It will land in stable in 1-2 release cycles (next time beta promotes), or when the bugfix lands in the stable trees.

If you're hitting this issue and have been holding off on updates - do update to the beta release that will come this week.
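
For anyone who wants to move an existing node to the Beta channel to pick this up early, the switch looks roughly like this (a sketch based on the usual Flatcar channel-switch procedure; double-check against the official update documentation for your setup):

# point update_engine at the Beta channel (or edit the file if it already has other settings)
echo 'GROUP=beta' | sudo tee /etc/flatcar/update.conf
sudo systemctl restart update-engine
# ask for an update right away instead of waiting for the next periodic check
update_engine_client -update
# reboot into the new version once the update has been applied
sudo systemctl reboot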

@schmitch

Is this still not in stable? I thought it already was; I updated to 3374.2.1 and still get processes hanging.

@tmsdce

tmsdce commented Dec 21, 2022

Hi @schmitch
I think the kernel patch is available in the latest beta release 3432.1.0 but not promoted to stable yet.

@databus23

is this still not in stable?

No, it's added to beta in 3432.1.0. So as long as stable is on a smaller version it hasn't trickled down to stable.

@dongsupark
Member

No, it's added to beta in 3432.1.0. So as long as stable is on a smaller version it hasn't trickled down to stable.

Correct. The fix is not in Stable yet, but only in Beta & Alpha.

The good news is that the ext4 deadlock fix was recently merged into the mainline kernel, though it is not yet included in any kernel release. It looks like it needs a little more time.
Anyway, most of the Flatcar maintainers have already left for the holidays, so the next release should happen in January next year.

@zzvara

zzvara commented Dec 28, 2022

Is this still not in stable? I thought it already was; I updated to 3374.2.1 and still get processes hanging.

I can confirm that this is still an issue on the Stable track, as it is still on version 3374.2.1. It appeared immediately as I rolled the upgrade out manually to 4 of my nodes. The issue shows up on only 1 of the 4 nodes, although the nodes have exactly the same hardware, software, and Kubernetes installed with Kubespray. I'm now updating to the Beta channel.

@zou-weidong

Looking forward to the next stable channel release to solve this problem.

@zzvara

zzvara commented Dec 29, 2022

No more tears with beta 3432.1.0.

@databus23

The fix is now released with Kernel 6.1.4

@jepio
Member

jepio commented Jan 10, 2023

And I'm working on getting it into 5.15.x.

https://lore.kernel.org/stable/1672844851195248@kroah.com/
https://lore.kernel.org/stable/1672927319-22912-1-git-send-email-jpiotrowski@linux.microsoft.com/

@jepio
Member

jepio commented Jan 11, 2023

The fix is queued up for 5.15.87: https://lore.kernel.org/stable/5035a51d-2fb3-9044-7b70-1df33af37e5f@linuxfoundation.org/T/#m39683920478da269a295cc907332a5f20e6122f5

@qeqar

qeqar commented Jan 12, 2023

Yesterday's release did not include the bugfix for this! Was that intentional?

I am really waiting for this. If it is going to take longer, I would need to temporarily switch to the beta channel, but switching everything over is a lot of work :-(

@bitfisher

5.15.87 got released with the patch included!

Could you please release a new version of flatcar-linux including the new kernel release!?

@jepio
Member

jepio commented Jan 13, 2023

We'll make sure the fix (either 5.15.87 or the backport) is part of the next stable release, scheduled for 2023-01-23.

@dongsupark
Member

Flatcar Stable 3374.2.3 was released with the bugfix: Kernel 5.15.86 with the backport.
Please try it out and see if the bug is gone.
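
A quick way to confirm a node actually picked up the fixed build (a sketch; the version strings are the ones mentioned in this thread):

# should report Flatcar 3374.2.3 and kernel 5.15.86 respectively
grep -E '^(PRETTY_NAME|VERSION)=' /etc/os-release
uname -r
# update_engine should report an idle or updated-need-reboot status
update_engine_client -status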

@bitfisher

Looks good now

@mduerr3cgi

We are still facing the issue with the current stable release 3374.2.3:
[screenshot]

Beta 3432.1.0 is working fine for us.

@jepio
Member

jepio commented Jan 27, 2023

@mduerr3cgi that does not look like the same issue. Can you open a new issue and paste the full logs (and test the two more recent beta releases)?

@dongsupark
Member

I assume that this issue was resolved in Stable 3374.2.3.
Please create new issues for other bugs.
Thanks.
