kernel crashes at Oracle Linux 8 #178

ksyblast · 2024-03-28T16:02:52Z

Kubernetes v1.27.5
Bare metal nodes
LVM Thinpool
piraeus-operator v2.4.1
Oracle Linux 8
Kernel 5.15.0-204.147.6.2.el8uek.x86_64 + default drbd image drbd9-jammy
Also reproduced with kernel 4.18 + drbd image drbd9-almalinux8

How to reproduce:
Create a number of volumes, attach them, and then delete them. I tested with about 8 PVCs and pods and ran around 20 create-then-delete cycles. At random points the server reboots because of a kernel crash. Most often this happened during volume deletion, but it was also reproduced during creation of a new PVC.
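
For anyone who wants to repeat the create/delete churn described above, here is a minimal sketch that generates the PVC and pod manifests. It is not the exact setup used in this report: the `crash-test-N` names, the `1Gi` size, and the `busybox` image are assumptions made for illustration; only the StorageClass name is taken from the configuration in this issue.

```python
import json

STORAGE_CLASS = "piraeus-storage-replicated-lvm"  # StorageClass name from this report


def build_pvc(index, size="1Gi"):
    """Return a PVC manifest; the name and size are illustrative assumptions."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"crash-test-{index}"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": STORAGE_CLASS,
            "resources": {"requests": {"storage": size}},
        },
    }


def build_pod(index):
    """Return a pod that mounts the matching PVC so the volume gets attached."""
    name = f"crash-test-{index}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "sleeper",
                "image": "busybox",  # assumption: any small image will do
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": name}}
            ],
        },
    }


def build_manifest_list(count=8):
    """Bundle `count` PVC/pod pairs into one v1 List, which kubectl accepts as JSON."""
    items = []
    for i in range(count):
        items.append(build_pvc(i))
        items.append(build_pod(i))
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(build_manifest_list(), indent=2))
```

Redirect the output to a file, then alternate `kubectl apply -f` and `kubectl delete -f` on that file about 20 times, waiting for the pods to become ready in between.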

UEK kernel Makefile (/usr/src/kernels/5.15.0-204.147.6.2.el8uek.x86_64/Makefile) patched to be able to build drbd:

--- Makefile	2024-01-15 12:24:44.452296691 +0000
+++ Makefile	2024-01-15 12:25:36.325543428 +0000
@@ -853,18 +853,18 @@
 endif
 
 # Initialize all stack variables with a 0xAA pattern.
-ifdef CONFIG_INIT_STACK_ALL_PATTERN
-KBUILD_CFLAGS	+= -ftrivial-auto-var-init=pattern
-endif
+#ifdef CONFIG_INIT_STACK_ALL_PATTERN
+#KBUILD_CFLAGS	+= -ftrivial-auto-var-init=pattern
+#endif
 
 # Initialize all stack variables with a zero value.
-ifdef CONFIG_INIT_STACK_ALL_ZERO
-KBUILD_CFLAGS	+= -ftrivial-auto-var-init=zero
-ifdef CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER
+#ifdef CONFIG_INIT_STACK_ALL_ZERO
+#KBUILD_CFLAGS	+= -ftrivial-auto-var-init=zero
+#ifdef CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER
 # https://github.com/llvm/llvm-project/issues/44842
-KBUILD_CFLAGS	+= -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
-endif
-endif
+#KBUILD_CFLAGS	+= -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
+#endif
+#endif

apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: piraeus-storage-pool
spec:
  storagePools:
    - name: piraeus-storage-pool-lvmthin
      lvmThinPool:
        volumeGroup: lvmvgthin
        thinPool: thinpool_piraeus
  podTemplate:
    spec:
      hostNetwork: true
  nodeAffinity:
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
      - key: piraeus
        operator: In
        values:
         - enabled

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: piraeus-storage-replicated-lvm
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
parameters:
  # https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-kubernetes-sc-parameters
  ## CSI related parameters
  csi.storage.k8s.io/fstype: ext4
  ## LINSTOR parameters
  linstor.csi.linbit.com/storagePool: piraeus-storage-pool-lvmthin
  linstor.csi.linbit.com/placementCount: "2"
  linstor.csi.linbit.com/mountOpts: noatime,discard
  property.linstor.csi.linbit.com/DrbdOptions/Net/max-buffers: "11000"

---
apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  nodeAffinity:
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  # https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-autoplace-linstor
  properties:
    - name: DrbdOptions/Net/max-buffers # controller level
      value: "10000"
    - name: Autoplacer/Weights/MaxFreeSpace
      value: "0" # 1 default
    - name: Autoplacer/Weights/MinReservedSpace
      value: "10" # prefer nodes with minimal reserved space on thin pool
    - name: Autoplacer/Weights/MinRscCount
      value: "0"
    # - name: Autoplacer/Weights/MaxThroughput
    #   value: "0" # COOL but not today

cat /proc/drbd 
version: 9.2.8 (api:2/proto:86-122)
GIT-hash: e163b05a76254c0f51f999970e861d72bb16409a build by @srvh52.example.com, 2024-03-28 15:13:48
Transports (api:20): tcp (9.2.8) lb-tcp (9.2.8) rdma (9.2.8)

[ 4083.197349] Call Trace:
[ 4083.208990]  <TASK>
[ 4083.220334]  ? show_trace_log_lvl+0x1d6/0x2f9
[ 4083.231532]  ? show_trace_log_lvl+0x1d6/0x2f9
[ 4083.242553]  ? drbd_free_peer_req+0x99/0x210 [drbd]
[ 4083.253383]  ? __die_body.cold+0x8/0xa
[ 4083.263954]  ? page_fault_oops+0x16d/0x1ac
[ 4083.274325]  ? exc_page_fault+0x68/0x13b
[ 4083.284460]  ? asm_exc_page_fault+0x22/0x27
[ 4083.294360]  ? _raw_spin_lock_irq+0x13/0x58
[ 4083.303995]  drbd_free_peer_req+0x99/0x210 [drbd]
[ 4083.313482]  drbd_finish_peer_reqs+0xc0/0x180 [drbd]
[ 4083.322880]  drain_resync_activity+0x25b/0x43a [drbd]
[ 4083.332060]  conn_disconnect+0xf4/0x650 [drbd]
[ 4083.341017]  drbd_receiver+0x53/0x60 [drbd]
[ 4083.349787]  drbd_thread_setup+0x77/0x1df [drbd]
[ 4083.358332]  ? drbd_reclaim_path+0x90/0x90 [drbd]
[ 4083.366677]  kthread+0x127/0x144
[ 4083.374961]  ? set_kthread_struct+0x60/0x52
[ 4083.382938]  ret_from_fork+0x22/0x2d
[ 4083.390678]  </TASK>
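
To compare this trace with the one in other reports, it can help to reduce it to the frames the unwinder considers reliable (lines without the leading `?`). The following is a small sketch for that; the function name `reliable_frames` and the regular expression are my own and only cover the line format shown above.

```python
import re

# Matches lines such as "[ 4083.303995]  drbd_free_peer_req+0x99/0x210 [drbd]".
# A leading "?" marks an address found on the stack that the unwinder could not verify.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(trace_text, module=None):
    """Return symbols of non-"?" frames, optionally limited to one kernel module."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match or match.group("guess"):
            continue  # skip markers like <TASK> and unverified "?" entries
        if module and match.group("module") != module:
            continue
        frames.append(match.group("symbol"))
    return frames


# A few lines copied from the trace in this report.
SAMPLE = """\
[ 4083.294360]  ? _raw_spin_lock_irq+0x13/0x58
[ 4083.303995]  drbd_free_peer_req+0x99/0x210 [drbd]
[ 4083.313482]  drbd_finish_peer_reqs+0xc0/0x180 [drbd]
[ 4083.322880]  drain_resync_activity+0x25b/0x43a [drbd]
[ 4083.332060]  conn_disconnect+0xf4/0x650 [drbd]
[ 4083.366677]  kthread+0x127/0x144
"""

if __name__ == "__main__":
    # Prints ['drbd_free_peer_req', 'drbd_finish_peer_reqs',
    #         'drain_resync_activity', 'conn_disconnect']
    print(reliable_frames(SAMPLE, module="drbd"))
```

Read top to bottom, the result is the call chain from the faulting function outwards: the fault happens in `drbd_free_peer_req`, reached from `conn_disconnect` through `drain_resync_activity`.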

ksyblast · 2024-03-28T16:04:11Z

vmcore-dmesg.txt.tar.gz
Full log attached
Any tips or ideas are highly appreciated

ksyblast · 2024-03-29T06:37:32Z

Looks like this is similar to LINBIT/drbd#86

WanzenBug · 2024-03-29T07:15:58Z

Hello! Thanks for the report. I guess it would be a good idea to add that information to the DRBD issue, as that seems to be the root cause.

We have seen it internally, but never been able to reproduce it reliably. Adding more context seems like a good idea.

ksyblast · 2024-03-29T07:28:17Z

Thanks for the answer. Should I add more details on how I reproduced it?

ksyblast · 2024-03-29T07:32:29Z

Also, does it make sense to try an older piraeus version? The crash is also reproducible with DRBD 9.2.6 and piraeus v2.3.0.

WanzenBug · 2024-03-29T07:37:19Z

You could try DRBD 9.1.18.

That does mean you have to use host networking, but you already do use that.

duckhawk · 2024-03-29T08:15:47Z

@WanzenBug hello. Here are our reproduction steps:

We have a 5-node k8s cluster with SSD storage pools of 100 GB each (thin LVM).

All queues are processed one operation at a time (a single worker thread each):
csiAttacherWorkerThreads: 1
csiProvisionerWorkerThreads: 1
csiSnapshotterWorkerThreads: 1
csiResizerWorkerThreads: 1

30 STS are created with 3 replicas each, and each replica has a 5-gigabyte PV.
When the pods are up, we change the size of all PVCs to 5.1 gigabytes.
After all PV resizes have finished, we delete the namespace.
After that we restart the process from the beginning.
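
The resize step of this cycle can be sketched as a list of `kubectl` commands, as shown below. This is an illustration under assumptions, not the script we actually run: it assumes the 30 StatefulSets are named `flog-generator-0` to `flog-generator-29` (only the first is shown in the spec below), and it writes "5.1 gigabytes" as the quantity `5.1Gi`. PVC names follow the StatefulSet convention `<claim template>-<sts name>-<ordinal>`.

```python
import json

STS_COUNT = 30
REPLICAS = 3
CLAIM_TEMPLATE = "flog-pv"  # volumeClaimTemplates name from the STS spec
NAMESPACE = "test1"         # namespace from the STS spec
NEW_SIZE = "5.1Gi"          # assumption: "5.1 gigabytes" written as a quantity


def pvc_names(sts_count=STS_COUNT, replicas=REPLICAS):
    """List the PVC names that the StatefulSets create for their replicas."""
    names = []
    for s in range(sts_count):
        sts = f"flog-generator-{s}"  # assumption: numbered like the first STS
        for ordinal in range(replicas):
            names.append(f"{CLAIM_TEMPLATE}-{sts}-{ordinal}")
    return names


def resize_commands(new_size=NEW_SIZE):
    """Build one `kubectl patch` argv per PVC that raises the storage request."""
    patch = json.dumps({"spec": {"resources": {"requests": {"storage": new_size}}}})
    return [
        ["kubectl", "-n", NAMESPACE, "patch", "pvc", name,
         "--type", "merge", "-p", patch]
        for name in pvc_names()
    ]


def cleanup_command():
    """Deleting the namespace removes all pods and PVCs of the test run."""
    return ["kubectl", "delete", "namespace", NAMESPACE]


if __name__ == "__main__":
    for argv in resize_commands():
        print(" ".join(argv))
    print(" ".join(cleanup_command()))
```

Each argv list can be passed to `subprocess.run` as is; one loop iteration is then: create the namespace and StatefulSets, wait for the pods, run the resize commands, wait for the resizes, run the cleanup command.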

When such a scheme is launched in a continuous cycle, we almost invariably have several node reboots per day. The operating system is not essential; we have encountered a similar problem with various 5.x and 6.x kernels from different distributions. However, the issue is definitely reproducible on the current LTS Ubuntu 22.04.

STS spec:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: flog-generator-0
  namespace: test1
spec:
  podManagementPolicy: OrderedReady
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: flog-generator-0
  serviceName: ""
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: flog-generator-0
    spec:
      containers:
      - args:
        - -c
        - /srv/flog/run.sh 2>&1 | tee -a /var/log/flog/fake.log
        command:
        - /bin/sh
        env:
        - name: FLOG_BATCH_SIZE
          value: "1024000"
        - name: FLOG_TIME_INTERVAL
          value: "1"
        image: ex42zav/flog:0.4.3
        imagePullPolicy: IfNotPresent
        name: flog-generator
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/flog
          name: flog-pv
      - env:
        - name: LOGS_DIRECTORIES
          value: /var/log/flog
        - name: LOGROTATE_INTERVAL
          value: hourly
        - name: LOGROTATE_COPIES
          value: "2"
        - name: LOGROTATE_SIZE
          value: 500M
        - name: LOGROTATE_CRONSCHEDULE
          value: 0 2 * * * *
        image: blacklabelops/logrotate
        imagePullPolicy: Always
        name: logrotate
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/flog
          name: flog-pv
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: flog-pv
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: linstor-r2
      volumeMode: Filesystem

WanzenBug · 2024-03-29T08:21:04Z

Thanks! You could also try switching to DRBD 9.1.18. We suspect there is a race condition introduced in the 9.2 branch.

WanzenBug · 2024-03-29T10:39:00Z

Another idea on what might be causing the issue, with a workaround in the CSI driver: piraeusdatastore/linstor-csi#256

You might try that by using the v1.5.0-2-g16c206a tag for the CSI image. You can edit the piraeus-operator-image-config to change the image.

ksyblast · 2024-04-01T10:27:00Z

We have tested with DRBD 9.1.18. It looks like the issue does not reproduce with this version.

duckhawk · 2024-04-01T11:39:19Z

I'm also testing 9.1.18 now.
Can you please tell me whether it is safe to move an existing installation from 9.2.5 to 9.1.18?

WanzenBug · 2024-04-02T05:32:56Z

> Can you please tell me whether it is safe to move an existing installation from 9.2.5 to 9.1.18?

Yes, it is safe.

duckhawk · 2024-04-09T05:38:36Z

@WanzenBug it looks like v1.5.0-2-g16c206a solves the node restart problem. Could you please create a tagged release with it (maybe 1.5.1)?
Also, it looks like there is still a problem inside DRBD that causes the crash under some conditions. Will you fix it? If you can't reproduce it, I think I can give you SSH access to a cluster where I can reproduce it.

WanzenBug · 2024-04-09T05:58:59Z

Thank you for testing! So just to confirm, you tested with DRBD 9.2.8 and the above CSI version and did not observe the crash?

Then it must have something to do with removing a volume from a resource, as I expected. I will use that to try to reproduce the behaviour.

duckhawk · 2024-04-09T07:26:55Z

We tested this with DRBD 9.2.5 and 9.2.8 and the above CSI version. Yes, there were no more crashes.

Thank you, I'll wait for your solution.

Can you tell me whether the fix from v1.5.0-2-g16c206a will be included in 1.5.1?

WanzenBug · 2024-04-09T09:33:10Z

Yes, there will be a 1.5.1 with that. We still intend to fix the issue in DRBD, too.

ksyblast · 2024-04-11T10:09:18Z

We will also test with 1.5.1 and DRBD 9.2.8 when 1.5.1 is released.

WanzenBug · 2024-04-11T11:35:11Z

Just wanted to let you know that we think we have tracked down the issue. There is no fix yet, but we should have something ready for the next DRBD release.

Philipp-Reisner · 2024-04-15T13:24:21Z

Fixed on the DRBD side with LINBIT/drbd@857db82 and LINBIT/drbd@343e077.

ksyblast mentioned this issue Mar 29, 2024

Kernel Panic with 9.2.8 LINBIT/drbd#86

Closed

Philipp-Reisner closed this as completed Apr 15, 2024
