Error during regression tests. #233

Closed
astifter opened this issue May 7, 2011 · 8 comments
Labels
Component: Test Suite Indicates an issue with the test framework or a test case

Comments


astifter commented May 7, 2011

I compiled SPL and ZFS 0.6.0-rc4 on Ubuntu 10.10 today and experienced problems with the regression tests.

OS: Ubuntu 10.10
Kernel: Linux ubuntu10 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux
SPL/ZFS: 0.6.0-rc4

Both SPL and ZFS were compiled with './configure --prefix /usr; make; sudo make install'.

The modules load into the kernel fine, and 'splat -a' reports no errors.

The first two tests of 'zconfig.sh -c' run fine; from the third test on I get errors:

$> zconfig.sh -c
1 persistent zpool.cache Pass
2 scan disks for pools to import Pass
3 zpool import/export device /dev/tank/volume may not be immediately available
Fail (4)

$> zconfig.sh -c -v
Destroying
1 persistent zpool.cache Pass
2 scan disks for pools to import Pass
3 zpool import/export device /dev/tank/volume may not be immediately available
Fail (4)

$> sudo /usr/libexec/zfs/zconfig.sh -c -v -s 3
Destroying
1 persistent zpool.cache Pass
2 scan disks for pools to import Pass
3 zpool import/export device Skip
4 zpool insmod/rmmod device /dev/tank/volume may not be immediately available
Fail (4)

Am I doing something wrong? I tested 0.5.2 right before 0.6.0-rc4 and all regression tests ran fine...

@behlendorf (Contributor)

You're not doing anything wrong; this is actually a bit of a known issue. Unfortunately, the regression tests can no longer be run entirely in-tree. They've grown certain dependencies on the zvol udev rules, which are installed as part of the packages. If you build and install both spl and zfs and then run the installed regression tests, they should work. The tests are installed as part of the zfs-test package in /usr/libexec/zfs/.
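
For example (paths are illustrative; they depend on your configure prefix, and zfault.sh only exists in releases that ship it):

$> sudo /usr/libexec/zfs/zconfig.sh -c    # run the installed configuration tests, not the in-tree copy
$> sudo /usr/libexec/zfs/zfault.sh -c     # same for the fault-injection tests, if your version includes them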

Longer term, I'm conflicted about whether this should be fixed to run in-tree (somehow), or whether all the tests should simply be moved out of the zfs package into their own full zfs regression test suite package.


astifter commented May 9, 2011

Sorry for my inaccurate bug report; I was running all of the tests from /usr/libexec/zfs, not in-tree. Any further suggestions?

@behlendorf (Contributor)

Oh... then this is a little more troubling. The test is likely failing because the symlinks for the zvol devices aren't being created. The most common reason is a problem with the 60-zvol.rules udev rule on your system. Basically, when you load the zfs module, /dev/zdN devices are created for all the zvols; udev then creates symlinks to these devices with the help of the zvol_id helper.

The first thing I'd verify after the test failure is that the /dev/zdN devices exist. If they do, then it's definitely a problem with the udev rule. Unfortunately, exactly how to write a udev rule can differ slightly based on your exact version of udev. We've tried to provide one that's as portable as possible, but you may have hit some corner case.
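
A rough checklist (a sketch only; the helper and rule paths vary by distribution and udev version, so adjust them for your system):

$> ls -l /dev/zd*                          # raw zvol block devices created when the module loads
$> ls -l /dev/zvol/ /dev/tank/             # symlinks udev should have created from the rule
$> cat /lib/udev/rules.d/60-zvol.rules     # confirm the rule file is actually installed
$> /lib/udev/zvol_id /dev/zd0              # the helper should print the pool/dataset name for the device
$> sudo udevadm test /sys/block/zd0        # replay the rule and look for the SYMLINK assignment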

@bitloggerig

Same on Arch Linux with recent SPL and ZFS from Git:

zfs create -s -V 100g storage/thinvol

/dev/storage/thinvol may not be immediately available

/dev/zd0 exists, but there is no /dev/storage/thinvol.

@behlendorf (Contributor)

These udev rules issues are believed to have been resolved.


gregfr commented Apr 8, 2013

I'm having the same issue today:

Debian 6.0.7, spl+zfs 0.6.1 compiled from source and installed:

# uname -a
Linux ksA1 2.6.32-18-pve #1 SMP Mon Jan 21 12:09:05 CET 2013 x86_64 GNU/Linux

# /usr/local/share/zfs/zconfig.sh -c
1    persistent zpool.cache             Pass
2    scan disks for pools to import     Pass
3    zpool import/export device         /dev/tank/volume may not be immediately available
Error: Could not stat device /dev/zvol/tank/volume - No such file or directory.
Fail (4)

# ll /dev/zd*
brw-rw---- 1 root disk 230, 0 Apr  8 05:21 /dev/zd0

# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank  1000M   360K  1000M     0%  1.00x  ONLINE  -

# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
tank          104M   363M  43.3K  /tank
tank/volume   103M   466M  23.9K  -


# ll /dev/ta*
ls: cannot access /dev/ta*: No such file or directory

# /usr/local/share/zfs/zfault.sh -c
                                        raid0   raid10  raidz   raidz2  raidz3
1    soft write error                   cannot create 'tank': I/O error
cannot open 'tank': no such pool
Fail (3)


@behlendorf (Contributor)

The lack of /dev/zvol/pool/dataset links is usually caused by the udev rules not being properly installed. When configuring for your system, make sure you use the following options to ensure they're installed in the right place.

  --with-udevdir=DIR      install udev helpers [EPREFIX/lib/udev]
  --with-udevruledir=DIR  install udev rules [UDEVDIR/rules.d]
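
For example (an illustrative invocation; check where your distribution actually keeps its udev helpers and rules before copying these paths):

$> ./configure --with-udevdir=/lib/udev --with-udevruledir=/lib/udev/rules.d
$> make && sudo make install
$> sudo udevadm control --reload-rules     # have udev pick up the newly installed rules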


gregfr commented Apr 9, 2013

I recompiled with:

./configure  --with-udevdir=/lib/udev

now I have:

# /usr/local/share/zfs/zconfig.sh -c
Skipping test 10 which requires the scsi_debug  module and the /usr/bin/lsscsi utility
1    persistent zpool.cache             Pass
2    scan disks for pools to import     Pass
3    zpool import/export device         Pass
4    zpool insmod/rmmod device          Pass
5    zvol+ext2 volume                   Warning: specified blocksize 4096 is less than device physical sectorsize 8192, forced to continue
Pass
6    zvol+ext2 snapshot                 Warning: specified blocksize 4096 is less than device physical sectorsize 8192, forced to continue
Pass
7    zvol+ext2 clone                    Warning: specified blocksize 4096 is less than device physical sectorsize 8192, forced to continue
Pass
8    zfs send/receive                   Warning: specified blocksize 4096 is less than device physical sectorsize 8192, forced to continue
Pass
9    zpool events                       Pass
10   zpool add/remove vdev              Skip

however

# /usr/local/share/zfs/zfault.sh -c
                                        raid0   raid10  raidz   raidz2  raidz3
1    soft write error                   cannot create 'tank': I/O error
cannot open 'tank': no such pool
Fail (3)

and

# ll /dev/z*
crw-rw-rw- 1 root root  1,  5 Apr  9 07:35 /dev/zero
crw------- 1 root root 10, 56 Apr  9 07:38 /dev/zfs

Also, in "/lib/udev/rules.d/" I had a file "90-zfs.rules.disabled", which I renamed to "90-zfs.rules" (and rebooted).
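
For reference, re-enabling a rule like that can presumably also be done without a full reboot (a rough sketch, assuming the same /lib/udev/rules.d location):

$> sudo mv /lib/udev/rules.d/90-zfs.rules.disabled /lib/udev/rules.d/90-zfs.rules
$> sudo udevadm control --reload-rules              # reload the rule files
$> sudo udevadm trigger --subsystem-match=block     # replay block-device events so symlinks get recreated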

I'm completely confused... ZFS works fine on all my systems and I love it! But I feel I'm missing something more than just the test scripts...

sdimitro pushed a commit to sdimitro/zfs that referenced this issue Dec 11, 2020
On very fragmented pools, there may be no large free chunks in the
normal allocation class (e.g. ~64KB, the size of compressed indirect
blocks).  This will cause the allocation to fall back on the embedded
slog metaslab.  This can cause the embedded slog metaslab to become
full/fragmented enough that ZIL allocations fail, causing sync writes to
fall back on txg_wait_synced(), which is very, very slow.

To address this problem, this commit makes allocations try to gang
before allowing the embedded slog metaslab to be used for normal
allocations.  Although ganging is slow (it's roughly 25% of normal
performance, because we do ~4 writes for each block), it's much better
than sync writes whose ZIL allocation fails, which can be 0.1% of normal
performance (txg_wait_synced() could take 10 seconds compared to a 10ms
write).

Additionally, when writing a Gang Block Header (GBH), if the allocation
from the normal class fails, retry it from the embedded slog.  The GBH
is 512 bytes, so it can only fail when the class is completely full.
This change doesn't impact performance but it prevents an unnecessary
allocation failure on small, extremely full pools, where there is zero
free space in the normal class.