cannot resolve path '/dev/disk/by-vdev/c1d14-part1': 2 #1646

Closed
ZNikke opened this issue Aug 12, 2013 · 16 comments

@ZNikke

ZNikke commented Aug 12, 2013

When doing zpool create on a new device using its vdev name, i.e. a name specified in vdev_id.conf, the zpool create command doesn't seem to wait for udev to actually create the partition symlink before trying to use it.

Running ls -l afterwards on the path from the error message shows that it does exist, and retrying the command either succeeds or complains about the next unpartitioned disk device.

That is, successive invocations can look like this:

zpool create -f hometest mirror c1d6 c1d14 c1d22 
cannot resolve path '/dev/disk/by-vdev/c1d14-part1': 2

zpool create -f hometest mirror c1d6 c1d14 c1d22 
cannot resolve path '/dev/disk/by-vdev/c1d6-part1': 2

As the workaround is merely to retry until it succeeds, this is a low-priority issue, albeit an annoying one that is probably easy to fix (either wait/retry on error, or do whatever udev magic exists to let it do its thing before trying to use the created partitions).
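
For what it's worth, a dumb retry wrapper roughly like this (same pool and devices as the example above, capped at five attempts) is enough to paper over it:

for i in 1 2 3 4 5; do
    # keep retrying until zpool create stops tripping over a missing
    # partition link, or we run out of attempts
    zpool create -f hometest mirror c1d6 c1d14 c1d22 && break
    sleep 1
done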

Environment: Ubuntu 12.04 using the zfs-native/stable PPA (zfsutils 0.6.1-1~precise).

@behlendorf
Contributor

In fact, we do wait 1 second for udev to create the partition. You could try increasing the delay here, although 1 second already sounds like an awfully long time.

https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c#L1051
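
In shell terms the current behaviour amounts to roughly this (a sketch of the idea only, not the actual C code; the c1d14 path is just the example from above):

path=/dev/disk/by-vdev/c1d14-part1
for i in $(seq 1 10); do        # ~1 second total budget
    [ -e "$path" ] && break     # the partition link showed up
    sleep 0.1                   # otherwise re-check every 100 ms
done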

@ZNikke
Author

ZNikke commented Aug 13, 2013

Yeah, but obviously not long enough to handle the added time for the udev/vdev machinery. I'm guessing this will take even longer on a machine with lots and lots of devices (this machine has 24 disk slots).

The comment block for libzfs/libzfs_pool.c:zpool_label_disk_wait() actually states:

Depending on the udev rules this may take a few seconds.

So waiting just 1 second seems to contradict that comment ;)

Anyhow, changing that timeout from 1 second to 10 seconds should solve the problem for now.

An alternative could be to call the "udevadm settle" command, but I guess you'd still have to wait a bit for the partition scan to trigger the udev event(s).

Btw, instead of doing partition-wait for each disk device it would be faster to first partition all disks and then start waiting for the partition links/devices to show up, but I guess you already knew that :-)

@ZNikke
Author

ZNikke commented Aug 13, 2013

Hmm. This might be a bigger problem than I thought...

I'm now doing some live testing, and I yanked a drive while doing heavy IO and replaced it with another drive. I then tried doing zpool replace pool drive to sync up the new one, but I constantly get the "cannot resolve path" error message.

The thing is that when the system is maxed out IO-wise, creating a partition and doing the rescan can take a LONG time:

zpool replace hometest c1d24 ; time udevadm settle
cannot resolve path '/dev/disk/by-vdev/c1d24-part1': 2

real    0m14.648s
user    0m0.004s
sys     0m0.000s

So after zpool replace had already errored out, it took another 15 seconds for udev to finish its job!

I'm leaning more and more towards the correct solution being something like:
  1. fiddle with the partitions
  2. sleep 0.1 s?
  3. run udevadm settle --exit-if-exists=the-devlink-we're-waiting-for

But of course this poses problems on older distros like CentOS/RHEL 5, which have a separate udevsettle command that doesn't grok the --exit-if-exists argument.
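
Concretely, something along these lines is what I have in mind (c1d24-part1 taken from the replace example above; the 30-second --timeout is just a guess at a sane upper bound):

# 1. fiddle with the partitions (however zpool labels the disk)
# 2. give the kernel a moment to emit the change event
sleep 0.1
# 3. block until the devlink exists, or udev's event queue drains
udevadm settle --timeout=30 --exit-if-exists=/dev/disk/by-vdev/c1d24-part1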

@behlendorf
Contributor

@ZNikke It would be useful to run down exactly why udev is taking so long to complete for you. It typically takes just a few milliseconds per device; under load it may take slightly longer, but it shouldn't be that bad. I've tested using systems with 100+ devices and 1 second has always been enough. That said, I'm happy to take a patch to do this in parallel. We could partition them all and then wait, say, 30 seconds.

@ZNikke
Author

ZNikke commented Aug 14, 2013

Oh, that's easy. You just have to do some violent testing.

  1. Saturate the machine with enough IO that file system operations take a while to perform. It helps to have a rather crappy IO controller; in our case it's an Areca ARC-1280 in JBOD mode, and I think it's still doing something funny under the covers...
  2. Apply more IO (e.g., zpool replace or whatnot) and observe that it takes much longer than on an idle system (a rough recipe is sketched below).
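
On this box it boils down to something like the following (just an illustration, assuming the pool is mounted at the default /hometest; any other way of saturating the disks will do):

# saturate the pool's disks with background writes
for i in 1 2 3 4; do
    dd if=/dev/zero of=/hometest/load.$i bs=1M count=100000 oflag=direct &
done
# then change a partition table and see how long udev needs to catch up
zpool replace hometest c1d24 ; time udevadm settle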

In my zpool replace issue above, the workaround was to stop all IO and then retry the zpool replace.

However, on a loaded disk server (think a loaded Lustre backend thrashed to the max) silencing the IO to be able to perform a zpool replace operation isn't ideal...

So yes, on an idle system with quick, nicely behaving controllers things are rather quick. Add IO load and not-so-quick IO controllers and you're in a completely different ballpark :/

I think the main issue has shifted to having zpool replace behave in a reliable manner; parallelism for zpool create would be nice, but honestly it doesn't matter much if zpool create takes a while...

Just increasing the timeout is the "easy fix", but will it be good enough? If there were a config knob so it could be tuned for ill-behaving systems, the default could be set to a couple of seconds for sane systems and we could just bump it in a local config to whatever is needed to get the job done.

@behlendorf
Contributor

@ZNikke Crappy hardware and drivers will certainly stress things more! Regarding the zpool replace command, the make_disks() function where we block waiting for udev is common to both the zpool create and zpool replace call paths, so increasing the timeout there covers both cases.

We could simply increase it, since on well-behaved systems you should never reach the timeout, and on badly behaved systems you will want to wait longer. The case where this could bite us is a system with broken udev rules, resulting in the partition links never being created. In that case it would take longer to get the failure, but that's perhaps acceptable.

So then my question for you and your spectacularly bad (and busy) hardware is: how long do we need to wait? Can you play around with this and determine what a reasonable timeout is? I'll be the first to admit the current value was picked out of thin air and just happens to work well for most situations.

@ZNikke
Author

ZNikke commented Aug 15, 2013

Just looking at my previous comment, it took about 15 seconds for udevadm settle to return, so I'd say that a 20-second timeout should cover it for this particularly crappy controller when loaded. And yes, more testing shows that it's performing even worse than we feared, so this rig probably won't be useful for more than pathetic worst-case tests...

I couldn't imagine something worse than this seeing real-life use, but then someone reminded me that there are people out there with droves of USB drives hanging off a single USB hub...

So, go for broke with a 30-second timeout? One could even add a note that the flashy solution would be to use "udevadm settle" or an equivalent API.

@arturpzol

I experienced the same problem in my environment. Sometimes creating the zpool succeeds, sometimes not, so I suspect that the 1-second timeout is too small:

root@59944118:~# zpool create appool /dev/disk/by-id/scsi-2456c6b475047334e -f
cannot resolve path '/dev/disk/by-id/scsi-2456c6b475047334e-part1': 2
root@59944118:~# udevtrigger
root@59944118:~# zpool create appool /dev/disk/by-id/scsi-2456c6b475047334e -f
cannot resolve path '/dev/disk/by-id/scsi-2456c6b475047334e-part1': 2
root@59944118:~# udevtrigger
root@59944118:~# zpool create appool /dev/disk/by-id/scsi-2456c6b475047334e -f
cannot resolve path '/dev/disk/by-id/scsi-2456c6b475047334e-part1': 2
root@59944118:~# udevtrigger
root@59944118:~# zpool create appool /dev/disk/by-id/scsi-2456c6b475047334e -f
root@59944118:~# zpool list
NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
appool   304M   480K   304M   0%  1.00x  ONLINE  -

@marcinkuk

Hi,

I have the same problem. If I use sdb, sdc, sdd, etc., everything works fine, but 'by-id' names are unusable for me with ZFSonLinux.
I had problems with device names changing when I used sdX, so I would like to use by-id.
Is there any chance of solving this issue in the next release?

Best regards,
Marcin

behlendorf added a commit to behlendorf/zfs that referenced this issue Oct 11, 2013
When creating a new pool, or adding/replacing a disk in an existing
pool, partition tables will be automatically created on the devices.
Under normal circumstances it will take less than a second for udev
to create the expected device files under /dev/.  However, it has
been observed that if the system is doing heavy IO concurrently udev
may take far longer.  If you also throw in some cheap dodgy hardware
it may take even longer.

To prevent zpool commands from failing due to this, the default wait
time for udev is being increased to 30 seconds.  This will have no
impact on normal usage; the increased timeout should only be noticed
if your udev rules are incorrectly configured.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1646
@behlendorf
Contributor

@ZNikke @arturpzol @marcinkuk Can one of you please verify that the patches proposed in #1783 resolve this issue for you?

@arturpzol

Thank you for the fix, but it seems that the issue is not in the udev timeout value:

root@44354374:~# time zpool create test1 /dev/disk/by-id/scsi-2796444756e523656 /dev/disk/by-id/scsi-27851597232454b44 /dev/disk/by-id/scsi-278386c465471514c /dev/disk/by-id/scsi-26d7a547273593576 -f
cannot resolve path '/dev/disk/by-id/scsi-2796444756e523656-part1': 2

real    0m33.180s
user    0m0.030s
sys     0m0.280s

root@44354374:~# udevtrigger
root@44354374:~# time zpool create test1 /dev/disk/by-id/scsi-2796444756e523656 /dev/disk/by-id/scsi-27851597232454b44 /dev/disk/by-id/scsi-278386c465471514c /dev/disk/by-id/scsi-26d7a547273593576 -f

real    0m2.362s
user    0m0.000s
sys     0m0.030s

Do you think that it could be an issue in the udev tool?

@arturpzol

I tested the issue again with the newest version of Debian Wheezy (the previous test was performed on an old Debian) and the issue is reproducible with a clean ZFS 0.6.2:

root@4435437:~# while true; do zpool create test1 /dev/disk/by-id/scsi-2796444756e523656 /dev/disk/by-id/scsi-27851597232454b44 /dev/disk/by-id/scsi-278386c465471514c /dev/disk/by-id/scsi-26d7a547273593576 -f && zpool destroy test1 ; done
cannot resolve path '/dev/disk/by-id/scsi-278386c465471514c-part1': 2
cannot resolve path '/dev/disk/by-id/scsi-278386c465471514c-part1': 2
cannot resolve path '/dev/disk/by-id/scsi-2796444756e523656-part1': 2
cannot resolve path '/dev/disk/by-id/scsi-2796444756e523656-part1': 2
the kernel failed to rescan the partition table: 16
cannot label 'sdae': try using parted(8) and then provide a specific slice: -1
cannot resolve path '/dev/disk/by-id/scsi-2796444756e523656-part1': 2
the kernel failed to rescan the partition table: 16
cannot label 'sdae': try using parted(8) and then provide a specific slice: -1
cannot resolve path '/dev/disk/by-id/scsi-2796444756e523656-part1': 2
cannot resolve path '/dev/disk/by-id/scsi-26d7a547273593576-part1': 2
cannot resolve path '/dev/disk/by-id/scsi-2796444756e523656-part1': 2

but with the fix for the 30-second timeout the problem is not reproducible, so I think I had two issues: the first was connected with the timeout and the second with the udev tool/old Debian system.

@behlendorf
Contributor

Thanks for the feedback. It sounds like in general this will help make things more robust.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
Closes openzfs#1646
@SangeethaBusangari

Hello all, I am trying to create an LXC server on Ubuntu 14.04. I executed
sudo zpool create -f tank /dev/vdc
and it told me to try sudo /sbin/modprobe zfs, so I executed sudo /sbin/modprobe zfs with no error. Then:
sudo zpool create -f tank /dev/vdc
cannot resolve path /dev/vdc
I am just following the steps in
http://terrarum.net/blog/building-an-lxc-server-1404.html#prerequisites-and-dependencies
and I am not familiar with zpool etc. Please help me: what went wrong, and how can I get it back?
Thanks,
Sangeetha

@RubenKelevra

Still encountering this issue in 0.6.5.6 on CentOS 7... has this patch been reverted?

@jeffsf

jeffsf commented Sep 24, 2019

Similar issue with Debian 10 where, immediately after formatting with sgdisk, a scripted zpool create of a mirror referencing /dev/disk/by-id/ fails, but the same command as output by set -x executes successfully when copied and pasted to the command line.

Adding a one-second delay prior to zpool create appears to resolve the condition.
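
For context, the scripted sequence is roughly the following (the by-id names are placeholders; the sleep is the workaround mentioned above, and a udevadm settle would probably do as well):

# wipe any old partition tables (placeholder device IDs)
sgdisk --zap-all /dev/disk/by-id/ata-DISK_A
sgdisk --zap-all /dev/disk/by-id/ata-DISK_B
# without this pause the by-id links may not have been (re)created yet
sleep 1
zpool create -f tank mirror /dev/disk/by-id/ata-DISK_A /dev/disk/by-id/ata-DISK_B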
