[OpenSUSE] Pool not created - wrong device path #2126

Closed
bored-enginr opened this issue Feb 9, 2020 · 8 comments · Fixed by #2194

Comments

@bored-enginr

bored-enginr commented Feb 9, 2020

I tried to create a pool on a Dell PERC/6i with three 300GB SAS drives. This is Rockstor on OpenSUSE Leap 15.1. This was intended to be just for testing until new equipment arrived. It appears that the path generated in pool.py or btrfs.py is wrong. (I can't tell where the path is created without looking deeper.)

uname:
ordvac:~ # uname -a
Linux ordvac 4.12.14-lp151.28.36-default #1 SMP Fri Dec 6 13:50:27 UTC 2019 (8f4a495) x86_64 x86_64 x86_64 GNU/Linux

lspci:
ordvac:~ # lspci | grep LSI
05:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 1078 (rev 04)

part of the drive path:
ordvac:~ # ls /dev/disk/by-id/
ata-HL-DT-ST_DVD+_-RW_GHA2N_KDXC9BK5217
scsi-3690b11c0223db60025bb4a6c79bc0d52
scsi-3690b11c0223db60025bb4ab47e0f20eb
scsi-3690b11c0223db60025bb4af281bd5edf
scsi-3690b11c0223db60025bb4b6888c47e73
scsi-3690b11c0223db60025bb4b6888c47e73-part1
scsi-3690b11c0223db60025bb4b6888c47e73-part2
scsi-3690b11c0223db60025bb4b6888c47e73-part3
scsi-3690b11c0223db60025bb4b6888c47e73-part4
scsi-SDELL_PERC_6
wwn-0x690b11c0223db60025bb4a6c79bc0d52
wwn-0x690b11c0223db60025bb4ab47e0f20eb
wwn-0x690b11c0223db60025bb4af281bd5edf
wwn-0x690b11c0223db60025bb4b6888c47e73
wwn-0x690b11c0223db60025bb4b6888c47e73-part1
wwn-0x690b11c0223db60025bb4b6888c47e73-part2
wwn-0x690b11c0223db60025bb4b6888c47e73-part3
wwn-0x690b11c0223db60025bb4b6888c47e73-part4

Where the disk IDs actually are:
ordvac:~ # ls /dev/disk/by-id/scsi-SDELL_PERC_6/
i_Adapter_00520dbc796c4abb2500b63d22c0110b
i_Adapter_00737ec488684bbb2500b63d22c0110b
i_Adapter_00737ec488684bbb2500b63d22c0110b-part1
i_Adapter_00737ec488684bbb2500b63d22c0110b-part2
i_Adapter_00737ec488684bbb2500b63d22c0110b-part3
i_Adapter_00737ec488684bbb2500b63d22c0110b-part4
i_Adapter_00df5ebd81f24abb2500b63d22c0110b
i_Adapter_00eb200f7eb44abb2500b63d22c0110b

. . .And finally, the Traceback:
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 41, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/pool.py", line 413, in post
    add_pool(p, dnames)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 86, in add_pool
    out, err, rc = run_command(cmd, log=True)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 120, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /usr/sbin/mkfs.btrfs -f -d raid5 -m raid5 -L Main /dev/disk/by-id/i_Adapter_00520dbc796c4abb2500b63d22c0110b /dev/disk/by-id/i_Adapter_00df5ebd81f24abb2500b63d22c0110b /dev/disk/by-id/i_Adapter_00eb200f7eb44abb2500b63d22c0110b. rc = 1. stdout = ['btrfs-progs v4.19.1 ', 'See http://btrfs.wiki.kernel.org for more information.', '', '']. stderr = ['ERROR: mount check: cannot open /dev/disk/by-id/i_Adapter_00520dbc796c4abb2500b63d22c0110b: No such file or directory', 'ERROR: cannot check mount status of /dev/disk/by-id/i_Adapter_00520dbc796c4abb2500b63d22c0110b: No such file or directory', '']

Sorry it's so long. As I alluded to above, I'm going to retire the disks and controller, but I'd be happy to reinstall them for testing, if needed.
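
The mismatch between the listings and the failing command can be checked with a few lines of Python. This is only a sketch under an assumption that the report does not confirm: that somewhere the device link is reduced to its last path component and then re-prefixed with /dev/disk/by-id. The helper `rebuild_from_last_component` is hypothetical and is not Rockstor code.

```python
import posixpath

BY_ID_DIR = "/dev/disk/by-id"

# Link that actually exists, taken from the scsi-SDELL_PERC_6 listing above.
actual_link = (
    "/dev/disk/by-id/scsi-SDELL_PERC_6/"
    "i_Adapter_00520dbc796c4abb2500b63d22c0110b"
)

# Path that mkfs.btrfs was given, taken from the traceback above.
failing_path = "/dev/disk/by-id/i_Adapter_00520dbc796c4abb2500b63d22c0110b"


def rebuild_from_last_component(link):
    """Hypothetical helper: keep only the final name and re-prefix it.

    This drops any intermediate directory such as scsi-SDELL_PERC_6.
    """
    return posixpath.join(BY_ID_DIR, posixpath.basename(link))


# The rebuilt path equals the one in the error message, which is
# consistent with the subdirectory having been lost along the way.
print(rebuild_from_last_component(actual_link) == failing_path)
```

If that assumption holds, any by-id link that lives in a subdirectory would be turned into a path that does not exist, which matches the "No such file or directory" errors.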

@bored-enginr bored-enginr changed the title Pool not created - wrong device path [OpenSUSE] Pool not created - wrong device path Feb 10, 2020
@phillxnet
Member

@bored-enginr Are you certain this affects only our Built on openSUSE variant? That is, does it behave similarly on a fully updated Stable channel release (currently CentOS only)? This is quite important: if it does, the problem is generic, which is what I suspect, and we need to remove the [openSUSE] stipulation here. Contact me on the forum (I have the same user name there) via PM if you are up for testing this issue in the current Stable channel and you don't currently have access.

Also can we have a full

ls -la /dev/disk/by-id

output.

Thanks for the testing and report, by the way. But I suspect this is generic and not specific to our Built on openSUSE builds, in which case it is specific to this hardware, so it would be nice to get it sorted. I can hopefully take a closer look later. But do paste that command's output if you still have access.

@bored-enginr
Author

I'm now certain that this affects only the OpenSUSE variant. A screenshot taken after the pool was created on the same hardware (one of the drives was different) is attached. (I didn't have time to update the system. If you feel it's necessary, I can revisit this in a few days.)

[Screenshot: Rockstor - CentOS]

Here is the same file listing that you asked for, from the CentOS install:
[radmin@test ~]$ ls -la /dev/disk/by-id
total 0
drwxr-xr-x 2 root root 340 Feb 10 19:06 .
drwxr-xr-x 6 root root 120 Feb 10 19:06 ..
lrwxrwxrwx 1 root root 9 Feb 10 19:13 ata-HL-DT-ST_DVD+_-RW_GHA2N_KDXC9BK5217 -> ../../sr0
lrwxrwxrwx 1 root root 9 Feb 10 19:13 scsi-3690b11c0223db60025bb4a6c79bc0d52 -> ../../sda
lrwxrwxrwx 1 root root 10 Feb 10 19:13 scsi-3690b11c0223db60025bb4a6c79bc0d52-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Feb 10 19:13 scsi-3690b11c0223db60025bb4a6c79bc0d52-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Feb 10 19:13 scsi-3690b11c0223db60025bb4a6c79bc0d52-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 Feb 10 19:13 scsi-3690b11c0223db60025bb4ab47e0f20eb -> ../../sdb
lrwxrwxrwx 1 root root 9 Feb 10 19:13 scsi-3690b11c0223db60025bb4af281bd5edf -> ../../sdc
lrwxrwxrwx 1 root root 9 Feb 10 19:13 scsi-3690b11c0223db60025d4c51e13da4d84 -> ../../sdd
lrwxrwxrwx 1 root root 9 Feb 10 19:13 wwn-0x690b11c0223db60025bb4a6c79bc0d52 -> ../../sda
lrwxrwxrwx 1 root root 10 Feb 10 19:13 wwn-0x690b11c0223db60025bb4a6c79bc0d52-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Feb 10 19:13 wwn-0x690b11c0223db60025bb4a6c79bc0d52-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Feb 10 19:13 wwn-0x690b11c0223db60025bb4a6c79bc0d52-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 Feb 10 19:13 wwn-0x690b11c0223db60025bb4ab47e0f20eb -> ../../sdb
lrwxrwxrwx 1 root root 9 Feb 10 19:13 wwn-0x690b11c0223db60025bb4af281bd5edf -> ../../sdc
lrwxrwxrwx 1 root root 9 Feb 10 19:13 wwn-0x690b11c0223db60025d4c51e13da4d84 -> ../../sdd

This is the file listing from the OpenSUSE build:

[I had to give up with formatting the snippet. It just wouldn't work. I've attached a screen shot, instead.]
[Screenshot: OpenSUSE File Listing]

Is there, perhaps, a difference in the udev rules?

@phillxnet
Member

@bored-enginr Thanks for the follow-up:

A screenshot after the pool was created on the same hardware (one of the drives was different) is attached. (I didn't have time to update system. If you feel it's necessary, I can revisit this in a few days.)

Yes, we really do need to compare the behaviour on this hardware across the same version of the code, as per our forum PM. But as you say, this will take some time, so we can circle back around to that.

Is there, perhaps, a difference in the udev rules?

This is my suspicion also, and we have seen this before. It may also be down to a driver difference between the now old kernel in our CentOS based offering and the much newer openSUSE kernels, which are potentially problematic for this hardware.

Thanks again for the list postings. Given the formatting issues I'd like to suggest we move this issue to the forum and bring it back here once we have identified the root cause. Formatting is easier there. We can link to this GitHub issue to get us started.

@phillxnet
Member

phillxnet commented Feb 11, 2020

@bored-enginr

Continuing here for now as I've got a quick command output request.

Just had another little look at this. As you have shown in your ls -la output, on an openSUSE base our later code looks to be 'finding' the wrong by-id names, i.e. not those in:

/dev/disk/by-id

which look just dandy.
But those in:

/dev/disk/by-id/scsi-SDELL_PERC_6

which are the strange i_Adapter_long_hexadecimal_here ones that it's using in error and failing with.

That definitely looks like a bug, and one I suspect is shared in our latest CentOS code also, assuming the CentOS base is also producing those names, i.e. it could be an artefact of newer drivers, for example.

Copying in the relevant 'suspect' code area:

def get_dev_byid_name() in src/rockstor/system/osi.py

So that I might re-create this failure here in a unit test could you, when you next get the chance, paste the output of the following command for me:

udevadm info --query=property --name /dev/sd-of-a-problem-drive

so that's probably going to be:

udevadm info --query=property --name /dev/sdb

Never seen that "scsi-SDELL_PERC_6" subdirectory before, or those weird i_Adapter_names either. So that, as you intimated with your supplied listings, is likely the root of where we are going wrong here. I'll have another look when I next get the time, and after you've posted the above command output. Hopefully there's a fix in here somewhere, as I've likely just not accounted for these types of names, and they have been inadvertently prioritised / picked up over our /dev/disk/by-id ones and then used in their place, with the obvious failure that they don't actually exist directly in /dev/disk/by-id. Curious one, that is.
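
One way to picture the kind of filtering described above is the following minimal Python sketch. It is not the code in get_dev_byid_name(); `direct_byid_names` is a hypothetical helper, and it assumes the candidate links arrive as a space-separated DEVLINKS string like the one udevadm reports. The sample value is the DEVLINKS line from the openSUSE output posted later in this thread.

```python
import posixpath

BY_ID_DIR = "/dev/disk/by-id"


def direct_byid_names(devlinks):
    """Hypothetical helper: return by-id names that sit directly in
    /dev/disk/by-id, skipping links that live in a subdirectory."""
    names = []
    for link in devlinks.split():
        # A link inside /dev/disk/by-id/<subdir>/ has a different parent
        # directory, so it is ignored here, as are by-path, by-uuid etc.
        if posixpath.dirname(link) == BY_ID_DIR:
            names.append(posixpath.basename(link))
    return names


# DEVLINKS value as reported for /dev/sdb on the openSUSE install.
devlinks = (
    "/dev/disk/by-id/wwn-0x690b11c0223db60025d91a4a09321e2c "
    "/dev/disk/by-id/scsi-SDELL_PERC_6/i_Adapter_002c1e32094a1ad92500b63d22c0110b "
    "/dev/disk/by-label/Main "
    "/dev/disk/by-uuid/e7e4a52b-6fb0-4361-a463-509f2d875f99 "
    "/dev/disk/by-path/pci-0000:05:00.0-scsi-0:2:1:0 "
    "/dev/disk/by-id/scsi-3690b11c0223db60025d91a4a09321e2c"
)

# Only the wwn- and scsi-3... names remain; the i_Adapter_ link is dropped.
print(direct_byid_names(devlinks))
```

Comparing the parent directory, rather than matching on a name prefix, keeps the check independent of whatever a particular controller or driver chooses to call its subdirectory.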

@phillxnet
Member

@bored-enginr Re:

So that I might re-create this failure here in a unit test could you, when you next get the chance, paste the output of the following command for me:

Any chance you can paste the requested command outputs?

Once we have them in a unit test we can fix what's going wrong here. Pretty sure it's just an oversight regarding these "scsi-SDELL_PERC_6" names, in which case the fix may be fairly straightforward. But we will need those command outputs to ensure we do the right thing and to prove the fix.

Cheers.

@bored-enginr
Author

I'm sorry for the delay. I had some troubles that necessitated putting this aside for quite some time. Here is the output of the command on the disc pool in OpenSuse. This was taken several months ago:

linux-25wx:~ # udevadm info --query=property --name /dev/sdb
COMPAT_SYMLINK_GENERATION=2
DEVLINKS=/dev/disk/by-id/wwn-0x690b11c0223db60025d91a4a09321e2c /dev/disk/by-id/scsi-SDELL_PERC_6/i_Adapter_002c1e32094a1ad92500b63d22c0110b /dev/disk/by-label/Main /dev/disk/by-uuid/e7e4a52b-6fb0-4361-a463-509f2d875f99 /dev/disk/by-path/pci-0000:05:00.0-scsi-0:2:1:0 /dev/disk/by-id/scsi-3690b11c0223db60025d91a4a09321e2c
DEVNAME=/dev/sdb
DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/target0:2:1/0:2:1:0/block/sdb
DEVTYPE=disk
DM_MULTIPATH_DEVICE_PATH=0
DONT_DEL_PART_NODES=1
FC_TARGET_LUN=0
ID_BTRFS_READY=1
ID_BUS=scsi
ID_FS_LABEL=Main
ID_FS_LABEL_ENC=Main
ID_FS_TYPE=btrfs
ID_FS_USAGE=filesystem
ID_FS_UUID=e7e4a52b-6fb0-4361-a463-509f2d875f99
ID_FS_UUID_ENC=e7e4a52b-6fb0-4361-a463-509f2d875f99
ID_FS_UUID_SUB=24f91f99-9fb0-44ee-a311-2be3a7ddecbc
ID_FS_UUID_SUB_ENC=24f91f99-9fb0-44ee-a311-2be3a7ddecbc
ID_MODEL=PERC_6/i_Adapter
ID_MODEL_ENC=PERC\x206/i\x20Adapter
ID_PATH=pci-0000:05:00.0-scsi-0:2:1:0
ID_PATH_TAG=pci-0000_05_00_0-scsi-0_2_1_0
ID_REVISION=1.22
ID_SCSI=1
ID_SCSI_INQUIRY=1
ID_SERIAL=3690b11c0223db60025d91a4a09321e2c
ID_SERIAL_SHORT=690b11c0223db60025d91a4a09321e2c
ID_TYPE=disk
ID_VENDOR=DELL
ID_VENDOR_ENC=DELL\x20\x20\x20\x20
ID_WWN=0x690b11c0223db600
ID_WWN_WITH_EXTENSION=0x690b11c0223db60025d91a4a09321e2c
MAJOR=8
MINOR=16
MPATH_SBIN_PATH=/sbin
SCSI_IDENT_LUN_NAA_REGEXT=690b11c0223db60025d91a4a09321e2c
SCSI_IDENT_SERIAL=002c1e32094a1ad92500b63d22c0110b
SCSI_MODEL=PERC_6/i_Adapter
SCSI_MODEL_ENC=PERC\x206/i\x20Adapter
SCSI_REVISION=1.22
SCSI_TPGS=0
SCSI_TYPE=disk
SCSI_VENDOR=DELL
SCSI_VENDOR_ENC=DELL\x20\x20\x20\x20
SUBSYSTEM=block
TAGS=:systemd:
USEC_INITIALIZED=11820225

Here's the same thing from a recent Rockstor (CentOS base) Stable install:

[root@rockstor ~]# udevadm info --query=property --name /dev/sdb
DEVLINKS=/dev/disk/by-id/scsi-3690b11c0223db6002667150a14b8cb34 /dev/disk/by-id/wwn-0x690b11c0223db6002667150a14b8cb34 /dev/disk/by-path/pci-0000:05:00.0-scsi-0:2:1:0
DEVNAME=/dev/sdb
DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/target0:2:1/0:2:1:0/block/sdb
DEVTYPE=disk
ID_BUS=scsi
ID_MODEL=PERC_6_i_Adapter
ID_MODEL_ENC=PERC\x206\x2fi\x20Adapter
ID_PATH=pci-0000:05:00.0-scsi-0:2:1:0
ID_PATH_TAG=pci-0000_05_00_0-scsi-0_2_1_0
ID_REVISION=1.22
ID_SCSI=1
ID_SCSI_SERIAL=0034cbb8140a15672600b63d22c0110b
ID_SERIAL=3690b11c0223db6002667150a14b8cb34
ID_SERIAL_SHORT=690b11c0223db6002667150a14b8cb34
ID_TYPE=disk
ID_VENDOR=DELL
ID_VENDOR_ENC=DELL\x20\x20\x20\x20
ID_WWN=0x690b11c0223db600
ID_WWN_VENDOR_EXTENSION=0x2667150a14b8cb34
ID_WWN_WITH_EXTENSION=0x690b11c0223db6002667150a14b8cb34
MAJOR=8
MINOR=16
SUBSYSTEM=block
TAGS=:systemd:
USEC_INITIALIZED=609981

Again, I apologize for the delay. If there's anything else you need, I'd be happy to supply it.
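
The two outputs can be compared mechanically. The Python sketch below parses a few of the posted KEY=VALUE lines and lists any by-id link that sits in a subdirectory. It also checks an observation that is inferred from the pasted values only, not from any udev rule file: on the openSUSE install the extra link matches scsi-S + SCSI_VENDOR + _ + SCSI_MODEL + _ + SCSI_IDENT_SERIAL, and the "/" inside the model string would then account for the subdirectory. The helper names are hypothetical.

```python
def parse_udev_properties(text):
    """Parse `udevadm info --query=property` style KEY=VALUE lines."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def subdir_byid_links(props):
    """Hypothetical helper: by-id links that are not direct children."""
    prefix = "/dev/disk/by-id/"
    links = props.get("DEVLINKS", "").split()
    return [l for l in links if l.startswith(prefix) and "/" in l[len(prefix):]]


# Excerpts of the outputs posted above (values copied as-is).
OPENSUSE = """\
DEVLINKS=/dev/disk/by-id/wwn-0x690b11c0223db60025d91a4a09321e2c /dev/disk/by-id/scsi-SDELL_PERC_6/i_Adapter_002c1e32094a1ad92500b63d22c0110b /dev/disk/by-label/Main /dev/disk/by-uuid/e7e4a52b-6fb0-4361-a463-509f2d875f99 /dev/disk/by-path/pci-0000:05:00.0-scsi-0:2:1:0 /dev/disk/by-id/scsi-3690b11c0223db60025d91a4a09321e2c
SCSI_IDENT_SERIAL=002c1e32094a1ad92500b63d22c0110b
SCSI_MODEL=PERC_6/i_Adapter
SCSI_VENDOR=DELL
"""
CENTOS = """\
DEVLINKS=/dev/disk/by-id/scsi-3690b11c0223db6002667150a14b8cb34 /dev/disk/by-id/wwn-0x690b11c0223db6002667150a14b8cb34 /dev/disk/by-path/pci-0000:05:00.0-scsi-0:2:1:0
ID_MODEL=PERC_6_i_Adapter
"""

suse = parse_udev_properties(OPENSUSE)
centos = parse_udev_properties(CENTOS)

print(subdir_byid_links(suse))    # one i_Adapter_ link on openSUSE
print(subdir_byid_links(centos))  # none on CentOS

# Assumed naming pattern, inferred from the values above only.
candidate = "/dev/disk/by-id/scsi-S{}_{}_{}".format(
    suse["SCSI_VENDOR"], suse["SCSI_MODEL"], suse["SCSI_IDENT_SERIAL"]
)
print(candidate in suse["DEVLINKS"].split())
```

Note that the CentOS output reports the model as PERC_6_i_Adapter, with no slash, and shows no subdirectory link at all.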

@phillxnet
Member

@bored-enginr Thanks for this. I'll hopefully now be able to reproduce this issue within our unit tests and sort it out. It may be a little while, unfortunately, but I would like to sort it as I'm pretty sure it's going to be down to a simple oversight.

I'll use this issue to report any progress.

Cheers.

@phillxnet
Member

I am currently working on this issue.

phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jul 15, 2020
Some controllers/drivers populate /dev/disk/by-id with additional device
links within a subdirectory. These are not compatible and will result in
basic failure to access these devices for Pool creation etc.

The existing associated unit tests were used to avoid known regressions
and extended to cover the observed regression case of these subdir
device links being inadvertently selected as canonical.
phillxnet added a commit that referenced this issue Jul 17, 2020
…d_-_wrong_device_path

avoid referencing incompatible subdir by-id device names. Fixes #2126