[OpenSUSE] Pool not created - wrong device path #2126
@bored-enginr Are you certain this affects only our Built on openSUSE variant? I.e., does it also behave this way on a fully updated Stable channel release (currently CentOS only)? This is quite important: if so, the issue is generic, which is what I suspect, and we would need to remove the [openSUSE] stipulation here. Contact me on the forum (I have the same user name there) via PM if you are up for testing this in the current Stable channel and don't currently have access. Also, can we have the full output? Thanks for the testing and the report, by the way. I suspect this is generic and not specific to our Built on openSUSE builds, in which case it is specific to this hardware, so it would be nice to get it sorted. I can take a closer look later, hopefully. But do paste that command output if you still have access.
@bored-enginr Thanks for the follow-up:
Yes, we really do need to compare behaviour on this hardware across the same version of the code, as per our forum PM; but, as you say, that will take some time, so we'll circle back to it.
This is my suspicion also, and we have seen this before. It may also come down to a driver difference between the now-old kernel in our CentOS-based offering and the much newer, but potentially problematic on this hardware, openSUSE kernels. Thanks again for the list postings. Given the formatting issues, I'd suggest we move this discussion to the forum and bring it back here once we have identified the root cause; formatting is easier there. We can link to this GitHub issue to get us started.
Continuing here for now, as I have a quick command-output request. I've just had another look at this. As your ls -la output on an openSUSE base shows, our later code looks to be 'finding' the wrong by-id names, i.e. not those in:
which look just dandy.
which are the strange i_Adapter_long_hexadecimal_here ones that it's using in error and failing with. That definitely looks like a bug, and one I suspect is shared in our latest CentOS code as well, assuming the CentOS base is also producing those names; it could be an artefact of newer drivers, for example. Copying in the relevant 'suspect' code area: def get_dev_byid_name() in src/rockstor/system/osi.py. So that I might re-create this failure here in a unit test, could you, when you next get the chance, paste the output of the following command for me:
so that's probably going to be:
Never seen that "scsi-SDELL_PERC_6" subdirectory before, or those weird i_Adapter_ names either. So, as you intimated with your supplied listings, that is likely the root of where we are going wrong here. I'll have another look when I next get the time, once you've posted the above command output, and hopefully there's a fix in here somewhere. I've likely just not accounted for these types of names, so they have been inadvertently prioritised / picked up over our /dev/disk/by-id ones and used in their place, with the obvious failure that they don't actually exist directly in /dev/disk/by-id. Curious one, that.
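For reference, the failure mode described above can be reproduced in miniature. This is an illustrative reconstruction, not Rockstor code: a temp directory stands in for /dev/disk/by-id, and the device names are copied from the listings in this issue.

```python
import os
import tempfile

# Build a mock /dev/disk/by-id mirroring the reported layout.
byid = tempfile.mkdtemp()
os.mkdir(os.path.join(byid, "scsi-SDELL_PERC_6"))

# A normal top-level by-id symlink (the link target is irrelevant here).
os.symlink("../../sda",
           os.path.join(byid, "scsi-3690b11c0223db60025bb4a6c79bc0d52"))
# A link that exists only inside the controller-specific subdirectory.
os.symlink("../../../sda",
           os.path.join(byid, "scsi-SDELL_PERC_6",
                        "i_Adapter_00520dbc796c4abb2500b63d22c0110b"))

# Treating the subdirectory's basenames as if they sat directly in by-id
# yields paths that do not exist -- exactly what mkfs.btrfs then reports.
bogus = os.path.join(byid, "i_Adapter_00520dbc796c4abb2500b63d22c0110b")
good = os.path.join(byid, "scsi-3690b11c0223db60025bb4a6c79bc0d52")
print(os.path.lexists(bogus), os.path.lexists(good))  # False True
```

The subdirectory entries are perfectly valid links in their own directory; the bug is only in where their names end up being used.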
@bored-enginr Re:
Any chance you can paste the requested command outputs? Once we have them in a unit test, we can fix what's going wrong here. I'm pretty sure it's just an oversight regarding these "scsi-SDELL_PERC_6" names, in which case the fix may be fairly straightforward. But we will need those command outputs to ensure we do the right thing and to prove the fix. Cheers.
I'm sorry for the delay. I had some troubles that necessitated putting this aside for quite some time. Here is the output of the command on the disk pool on openSUSE; this was taken several months ago:
Here's the same thing from a recent Rockstor (CentOS base) Stable install:
Again, I apologize for the delay. If there's anything else you need, I'd be happy to supply it.
@bored-enginr Thanks for this. I'll hopefully now be able to reproduce this issue within our unit tests and sort it out. It may be a little while, unfortunately, but I'd like to sort it, as I'm pretty sure it's going to come down to a simple oversight. I'll use this issue to report any progress. Cheers.
I am currently working on this issue. |
Some controllers/drivers populate /dev/disk/by-id with additional device links inside a subdirectory. These are not compatible and result in a basic failure to access these devices for Pool creation etc. The existing associated unit tests were used to guard against known regressions and were extended to cover the observed case of these subdirectory device links being inadvertently selected as canonical.
…d_-_wrong_device_path avoid referencing incompatible subdir by-id device names. Fixes #2126
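A minimal sketch of the fix direction the commit message describes. This is my illustration, not the actual Rockstor patch; it assumes the fix amounts to only accepting entries that are symlinks directly under /dev/disk/by-id, so subdirectories such as "scsi-SDELL_PERC_6" (and the i_Adapter_* links inside them) are never selected as canonical names.

```python
import os
import tempfile

def direct_byid_links(byid_dir):
    """Return only entries that are symlinks directly in byid_dir,
    skipping subdirectories such as 'scsi-SDELL_PERC_6'."""
    return sorted(e for e in os.listdir(byid_dir)
                  if os.path.islink(os.path.join(byid_dir, e)))

# Demo against a mocked by-id layout resembling the issue's listings.
byid = tempfile.mkdtemp()
os.mkdir(os.path.join(byid, "scsi-SDELL_PERC_6"))  # must be skipped
os.symlink("../../sda",
           os.path.join(byid, "wwn-0x690b11c0223db60025bb4b6888c47e73"))
print(direct_byid_links(byid))  # ['wwn-0x690b11c0223db60025bb4b6888c47e73']
```

Filtering on os.path.islink (rather than os.path.isdir) also keeps dangling links out of consideration only if they are not symlinks at all; the key property is that every returned name really does resolve under /dev/disk/by-id.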
I tried to create a pool on a Dell PERC 6/i with three 300 GB SAS drives. This is Rockstor on openSUSE Leap 15.1, intended just for testing until new equipment arrived. It appears that the path generated in pool.py or btrfs.py is wrong (I can't tell where the path is created without looking deeper).
uname:
```
ordvac:~ # uname -a
Linux ordvac 4.12.14-lp151.28.36-default #1 SMP Fri Dec 6 13:50:27 UTC 2019 (8f4a495) x86_64 x86_64 x86_64 GNU/Linux
```
lspci:
```
ordvac:~ # lspci | grep LSI
05:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 1078 (rev 04)
```
part of the drive path:
```
ordvac:~ # ls /dev/disk/by-id/
ata-HL-DT-ST_DVD+_-RW_GHA2N_KDXC9BK5217
scsi-3690b11c0223db60025bb4a6c79bc0d52
scsi-3690b11c0223db60025bb4ab47e0f20eb
scsi-3690b11c0223db60025bb4af281bd5edf
scsi-3690b11c0223db60025bb4b6888c47e73
scsi-3690b11c0223db60025bb4b6888c47e73-part1
scsi-3690b11c0223db60025bb4b6888c47e73-part2
scsi-3690b11c0223db60025bb4b6888c47e73-part3
scsi-3690b11c0223db60025bb4b6888c47e73-part4
scsi-SDELL_PERC_6
wwn-0x690b11c0223db60025bb4a6c79bc0d52
wwn-0x690b11c0223db60025bb4ab47e0f20eb
wwn-0x690b11c0223db60025bb4af281bd5edf
wwn-0x690b11c0223db60025bb4b6888c47e73
wwn-0x690b11c0223db60025bb4b6888c47e73-part1
wwn-0x690b11c0223db60025bb4b6888c47e73-part2
wwn-0x690b11c0223db60025bb4b6888c47e73-part3
wwn-0x690b11c0223db60025bb4b6888c47e73-part4
```
Where the disk IDs actually are:
```
ordvac:~ # ls /dev/disk/by-id/scsi-SDELL_PERC_6/
i_Adapter_00520dbc796c4abb2500b63d22c0110b
i_Adapter_00737ec488684bbb2500b63d22c0110b
i_Adapter_00737ec488684bbb2500b63d22c0110b-part1
i_Adapter_00737ec488684bbb2500b63d22c0110b-part2
i_Adapter_00737ec488684bbb2500b63d22c0110b-part3
i_Adapter_00737ec488684bbb2500b63d22c0110b-part4
i_Adapter_00df5ebd81f24abb2500b63d22c0110b
i_Adapter_00eb200f7eb44abb2500b63d22c0110b
```
...And finally, the traceback:
```
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/rest_framework_custom/generic_view.py", line 41, in _handle_exception
    yield
  File "/opt/rockstor/src/rockstor/storageadmin/views/pool.py", line 413, in post
    add_pool(p, dnames)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 86, in add_pool
    out, err, rc = run_command(cmd, log=True)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 120, in run_command
    raise CommandException(cmd, out, err, rc)
CommandException: Error running a command. cmd = /usr/sbin/mkfs.btrfs -f -d raid5 -m raid5 -L Main /dev/disk/by-id/i_Adapter_00520dbc796c4abb2500b63d22c0110b /dev/disk/by-id/i_Adapter_00df5ebd81f24abb2500b63d22c0110b /dev/disk/by-id/i_Adapter_00eb200f7eb44abb2500b63d22c0110b. rc = 1. stdout = ['btrfs-progs v4.19.1 ', 'See http://btrfs.wiki.kernel.org for more information.', '', '']. stderr = ['ERROR: mount check: cannot open /dev/disk/by-id/i_Adapter_00520dbc796c4abb2500b63d22c0110b: No such file or directory', 'ERROR: cannot check mount status of /dev/disk/by-id/i_Adapter_00520dbc796c4abb2500b63d22c0110b: No such file or directory', '']
```
Sorry it's so long. As I alluded to above, I'm going to retire the disks and controller, but I'd be happy to reinstall them for testing if needed.