
Panic on NVMe multipath environment #669

Open
cospotato opened this issue Mar 3, 2022 · 2 comments

@cospotato
Contributor

What steps did you take and what happened:

node-disk-manager panics on an environment with NVMe multipath enabled.

What did you expect to happen:

Run correctly

The output of the following commands will help us better understand what's going on:
[Pasting long output into a GitHub gist or other pastebin is fine.]

  • kubectl get pods -n openebs
  • kubectl get blockdevices -n openebs -o yaml
  • kubectl get blockdeviceclaims -n openebs -o yaml
  • kubectl logs <ndm daemon pod name> -n openebs
I0302 03:02:21.186114     983 udevprobe.go:334] starting udev probe listener
I0302 03:02:21.187058     983 udevprobe.go:228] Dependents of /dev/vda : {Parent: Partitions:[/dev/vda1] Holders:[] Slaves:[]}
I0302 03:02:21.187075     983 udevprobe.go:238] Device: /dev/vda is of type: disk
I0302 03:02:21.187168     983 udevprobe.go:211] device: /dev/vda1, FileSystemUUID: c38efadd-7226-47e9-8865-94aa94af4e7c filled during udev scan
I0302 03:02:21.187282     983 udevprobe.go:228] Dependents of /dev/vda1 : {Parent:/dev/vda Partitions:[] Holders:[] Slaves:[]}
I0302 03:02:21.187291     983 udevprobe.go:238] Device: /dev/vda1 is of type: partition
I0302 03:02:21.187353     983 udevprobe.go:195] device: , WWN: eui.32303838363437393800000000000000 filled during udev scan
I0302 03:02:21.187360     983 udevprobe.go:199] device: , Serial: 1qguryhhxaf9ik89ik89 filled during udev scan
I0302 03:02:21.187538     983 udevprobe.go:228] Dependents of  : {Parent: Partitions:[/dev/.dockerenv /dev/bin /dev/boot /dev/dev /dev/etc /dev/home /dev/host /dev/lib /dev/lib64 /dev/media /dev/mnt /dev/opt /dev/proc /dev/root /dev/run /dev/sbin /dev/srv /dev/sys /dev/tmp /dev/usr /dev/var] Holders:[] Slaves:[]}
panic: runtime error: slice bounds out of range [:3] with length 0

goroutine 1 [running]:
panic(0x1686a60, 0xc0003feb20)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/runtime/panic.go:1064 +0x46d fp=0xc00020b6a8 sp=0xc00020b5f0 pc=0x444bbd
runtime.goPanicSliceAlen(0x3, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/runtime/panic.go:98 +0xa3 fp=0xc00020b6f0 sp=0xc00020b6a8 pc=0x442b73
github.com/openebs/node-disk-manager/pkg/sysfs.isDM(...)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/pkg/sysfs/syspath.go:320
github.com/openebs/node-disk-manager/pkg/sysfs.Device.GetDeviceType(0x0, 0x0, 0x0, 0x0, 0xc00048d898, 0x2, 0xc00048d874, 0x4, 0xc000566e00, 0x15, ...)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/pkg/sysfs/syspath.go:282 +0x45c fp=0xc00020b758 sp=0xc00020b6f0 pc=0x129696c
github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe.(*udevProbe).scan(0xc000427fc0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe/udevprobe.go:232 +0xbfd fp=0xc00020bb58 sp=0xc00020b758 pc=0x13ada2d
github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe.(*udevProbe).Start(0xc000427f80)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe/udevprobe.go:123 +0x76 fp=0xc00020bb80 sp=0xc00020bb58 pc=0x13acca6
github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe.(*registerProbe).register(0xc00020bc60)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe/probe.go:59 +0xb3 fp=0xc00020bba0 sp=0xc00020bb80 pc=0x13a9b43
github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe.glob..func5()
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe/udevprobe.go:67 +0x1b1 fp=0xc00020bca8 sp=0xc00020bba0 pc=0x13b2251
github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe.Start(0x2743ae0, 0x7, 0x7)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/ndm_daemonset/probe/probe.go:67 +0x96 fp=0xc00020bd00 sp=0xc00020bca8 pc=0x13a9c16
github.com/openebs/node-disk-manager/cmd/ndm_daemonset/app/command.NewCmdStart.func1(0xc000219b80, 0xc000551050, 0x0, 0x3)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/ndm_daemonset/app/command/start.go:65 +0x1d6 fp=0xc00020bd78 sp=0xc00020bd00 pc=0x13c9f46
github.com/spf13/cobra.(*Command).execute(0xc000219b80, 0xc000550f60, 0x3, 0x3, 0xc000219b80, 0xc000550f60)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/vendor/github.com/spf13/cobra/command.go:842 +0x29d fp=0xc00020be50 sp=0xc00020bd78 pc=0x12af01d
github.com/spf13/cobra.(*Command).ExecuteC(0xc000218dc0, 0x0, 0x0, 0x41722f)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/vendor/github.com/spf13/cobra/command.go:943 +0x317 fp=0xc00020bf28 sp=0xc00020be50 pc=0x12afcc7
github.com/spf13/cobra.(*Command).Execute(...)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/vendor/github.com/spf13/cobra/command.go:883
main.run(0x0, 0x0)
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/ndm_daemonset/main.go:49 +0x94 fp=0xc00020bf68 sp=0xc00020bf28 pc=0x13ca494
main.main()
	/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/ndm_daemonset/main.go:28 +0x22 fp=0xc00020bf88 sp=0xc00020bf68 pc=0x13ca3c2
runtime.main()
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/runtime/proc.go:203 +0x1fa fp=0xc00020bfe0 sp=0xc00020bf88 pc=0x44760a
runtime.goexit()
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00020bfe8 sp=0xc00020bfe0 pc=0x474e01

  • lsblk from nodes where ndm daemonset is running

[lsblk output attached as an image]

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
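
The panic seems to come from sysfs.isDM (syspath.go:320 in the trace), which apparently takes a three-character slice of the device name and fails because udev returned an empty DEVNAME for one of the devices; the earlier log line "Dependents of  :" shows the same empty name causing the container's root directory entries to be listed as partitions. A minimal sketch of a prefix check that stays safe on an empty name (the isDM signature and the "dm-" prefix are assumptions; only the function name and file appear in the trace):

package main

import (
	"fmt"
	"strings"
)

// isDM reports whether a device name belongs to a device-mapper device.
// strings.HasPrefix returns false when the string is shorter than the
// prefix, so an empty DEVNAME cannot cause a bounds panic the way an
// unconditional devName[:3] slice does.
func isDM(devName string) bool {
	return strings.HasPrefix(devName, "dm-")
}

func main() {
	fmt.Println(isDM(""))        // false, instead of a panic
	fmt.Println(isDM("dm-0"))    // true
	fmt.Println(isDM("nvme0n1")) // false
}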

I wrote a tiny program to print the udev DEVNAME of each block device.

#include <libudev.h>
#include <stdio.h>

int main(void)
{
    struct udev *udev;
    struct udev_enumerate *ue;
    struct udev_list_entry *entry;

    udev = udev_new();
    ue = udev_enumerate_new(udev);

    /* Enumerate every device in the block subsystem, as NDM does. */
    udev_enumerate_add_match_subsystem(ue, "block");
    udev_enumerate_scan_devices(ue);

    udev_list_entry_foreach(entry, udev_enumerate_get_list_entry(ue))
    {
        struct udev_device *device;
        const char *devname;

        printf("entry name: %s\n", udev_list_entry_get_name(entry));
        device = udev_device_new_from_syspath(udev,
                                              udev_list_entry_get_name(entry));

        /* DEVNAME may be unset; passing NULL to %s is undefined behavior. */
        devname = udev_device_get_property_value(device, "DEVNAME");
        printf("DEVNAME: %s\n", devname ? devname : "(none)");

        udev_device_unref(device);
    }

    udev_enumerate_unref(ue);
    udev_unref(udev);
    return 0;
}
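
With the libudev development headers installed, this should build with something like gcc test.c -o test -ludev (the file name is arbitrary) and can be run directly on a node to dump every block-subsystem entry and its DEVNAME.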

The result on the nodes is:

[program output attached as an image]

Environment:

  • OpenEBS version
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • Type of disks connected to the nodes (eg: Virtual Disks, GCE/EBS Volumes, Physical drives etc)
  • OS (e.g. from /etc/os-release):
@akhilerm
Contributor

akhilerm commented Mar 4, 2022

@cospotato We have never tested on nodes that use NVMe multipath devices, but this looks like udev is not reporting the DEVNAME for the device.

Also, as per my understanding, NVMe devices follow the naming format nvme<controller_num>n<namespace_num>, but the devices in the example above use a different format, nvme3c3n1 (see the sketch below).
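
For illustration, a hypothetical pair of patterns separating the two schemes (with native NVMe multipath the kernel also exposes per-path devices whose names carry a c<controller_num> infix, which is presumably what nvme3c3n1 is):

package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for the two naming schemes mentioned above.
var (
	namespaceDev = regexp.MustCompile(`^nvme[0-9]+n[0-9]+$`)        // e.g. nvme0n1
	perPathDev   = regexp.MustCompile(`^nvme[0-9]+c[0-9]+n[0-9]+$`) // e.g. nvme3c3n1
)

func main() {
	fmt.Println(namespaceDev.MatchString("nvme0n1"))   // true
	fmt.Println(perPathDev.MatchString("nvme3c3n1"))   // true
	fmt.Println(namespaceDev.MatchString("nvme3c3n1")) // false
}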

Can you provide details of the cluster/node and the OS running on the machines?

@cospotato
Contributor Author

@akhilerm The node OS is AliOS, which Alibaba Group derived from CentOS 7. The NVMe device on it is the "Local Disk" product of AliCloud. I googled the naming format and found this page.
