Commits on Feb 8, 2012
  1. Addresses code review concerns for dracutdir patch

    Comments: zfsonlinux#561 (comment)
    * No longer check for SED nor GREP in user-dracut.m4
    * Support --with-dracutdir=... as well as --without-dracutdir
    * Adjust how Dracut build is disabled when --without-dracutdir or missing dracut script:
      - Put switch in dracut/ instead of dracut/90zfs/
      - Retain dracut *.in files when doing a `make dist` so end user can include dracut if needed
    * Modify RPM specs to honor original configure script's call with regard to dracut behavior.
    * Modify DEB/alien call to exclude dracut RPM if it wasn't built.
    * Move INSTALL_DRACUT conditional out to zfs-build.m4 so that it's available for both
      kernel & user builds.
    committed Feb 8, 2012
Commits on Feb 5, 2012
  1. Add autoconf detection to find Dracut modules.d directory (different versions of Dracut place it differently).

    Add --with-dracutdir parameter to override detection.
    Don't install Dracut modules if Dracut isn't installed on host.
    Fixes #546
    committed Feb 5, 2012
Commits on Feb 3, 2012
  1. Linux 3.3 compat, sops->show_options()

    The second argument of sops->show_options() was changed from a
    'struct vfsmount *' to a 'struct dentry *'.  Add an autoconf check
    to detect the API change and then conditionally define the expected
    interface.  In either case we are only interested in the zfs_sb_t.
    Signed-off-by: Brian Behlendorf <>
    Closes #549
    behlendorf committed Feb 2, 2012
Commits on Feb 2, 2012
  1. Cleanup ZFS debug infrastructure

    Historically the internal zfs debug infrastructure has been
    scattered throughout the code.  Since we expect to start making
    more use of this code this patch performs some cleanup.
    * Consolidate the zfs debug infrastructure in the zfs_debug.[ch]
      files.  This includes moving the zfs_flags and zfs_recover
      variables, plus moving the zfs_panic_recover() function.
    * Remove the existing unused functionality in zfs_debug.c and
      replace it with code which correctly utilized the spl logging
    * Remove the __dprintf() function from zfs_ioctl.c.  This is
      dead code, the dprintf() functionality in the kernel relies
      on the spl log support.
    * Remove dprintf() from hdr_recl().  This wasn't particularly
      useful and was missing the required format specifier anyway.
    * Subsequent patches should unify the dprintf() and zfs_dbgmsg()
    Signed-off-by: Brian Behlendorf <>
    behlendorf committed Jan 20, 2012
  2. Allow multiple values per directory entry

    When using zfs to back a Lustre filesystem it's advantageous to
    store a fid with the object id in the directory zap.  The only
    technical impediment to doing this is that the zpl code expects
    a single value in the zap per directory entry.
    This change relaxes that requirement such that multiple entries
    are allowed provided the first one is the object id.  The zpl
    code will just ignore additional entries.  This allows the ZoL
    code to mount datasets which are being used as Lustre server
    Once the upstream feature flags support is merged in this change
    should be updated to a read-only feature.  Until this occurs
    other zfs implementations will not be able to read the zfs
    filesystems created by Lustre.
    Signed-off-by: Brian Behlendorf <>
    behlendorf committed Jan 27, 2012
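    As a rough illustration of the relaxed rule in the commit above, the
    following Python sketch models a directory entry as a list of integers.
    The helper name dirent_object_id and the list representation are
    hypothetical and not taken from the ZFS source; only the rule (the first
    value is the object id, extra values are ignored) comes from the commit
    message.

```python
def dirent_object_id(values):
    """Return the object id from a directory entry's value list.

    Hypothetical model: the first value is the object id; any further
    values (for example a Lustre fid) are ignored by the reader.
    """
    if not values:
        raise ValueError("directory entry has no values")
    return values[0]


# A plain entry and an entry carrying extra values resolve identically.
plain = [42]
with_extra = [42, 0x200000400, 0x1]  # extra values are made up for the example
print(dirent_object_id(plain), dirent_object_id(with_extra))
```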
Commits on Jan 27, 2012
  1. Export symbol zfs_attr_table

    Export the zfs_attr_table symbol so it may be used by non-zpl
    consumers which are still interested in writing a zpl compatible
    dataset (e.g. Lustre).
    Signed-off-by: Brian Behlendorf <>
    behlendorf committed Jan 27, 2012
Commits on Jan 19, 2012
  1. Ignore dataset if the dds_type is DMU_OST_OTHER

    Since the zpios and potentially other ZFS tests use the
    DMU_OST_OTHER type to label their datasets, the zpool and
    zfs commands should gracefully handle this type when it is
    encountered.  This patch modifies the commands' behavior
    to ignore any datasets with a dds_type of DMU_OST_OTHER.
    Signed-off-by: Prakash Surya <>
    Signed-off-by: Brian Behlendorf <>
    Closes #536
    prakashsurya committed with behlendorf Jan 18, 2012
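    The filtering behaviour described in the commit above might look roughly
    like the following Python sketch.  The enum, the function name
    visible_datasets, and the dataset names are illustrative assumptions;
    the real change lives in the C code of the zpool and zfs commands.

```python
from enum import Enum


class DatasetType(Enum):
    # Names follow the DMU_OST_* identifiers; this enum is illustrative only.
    ZFS = "DMU_OST_ZFS"
    ZVOL = "DMU_OST_ZVOL"
    OTHER = "DMU_OST_OTHER"


def visible_datasets(datasets):
    """Skip datasets labelled DMU_OST_OTHER instead of failing on them."""
    return [name for name, dds_type in datasets
            if dds_type is not DatasetType.OTHER]


datasets = [
    ("tank/home", DatasetType.ZFS),
    ("tank/vol0", DatasetType.ZVOL),
    ("tank/zpios-test", DatasetType.OTHER),  # hypothetical test dataset
]
print(visible_datasets(datasets))
```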
Commits on Jan 18, 2012
  1. Fix rpm dependencies

    This change updates the rpm spec files to have strictly correct
    package dependencies.  That means a few things:
    * The zfs-modules package is now tied to a specific build of
      the spl-modules packages based on the kernel version.  This
      ensures that the correct spl-modules packages will always get
      installed and not just the newest.
    * The zfs package now requires both the zfs-modules and spl
      packages.  Thus a 'yum install zfs' will pull in the minimal
      set of packages required for a functional system.
    * The zfs-devel packages now require the zfs package to be
      installed which is normal behavior for -devel packages.
    * Remove the redundant distribution release extension.  This
      is already added once because it is part of the kernel package
      release name.
    Signed-off-by: Brian Behlendorf <>
    behlendorf committed Jan 18, 2012
  2. Add the release component to headers

    When the original build system code was added the release
    component was accidentally omitted from the development header
    install path.  This patch adds the missing path component so
    it's always clear exactly what release you're compiling against.
    Signed-off-by: Brian Behlendorf <>
    behlendorf committed Jan 18, 2012
  3. Allow GPT+EFI vdev replacement in boot pools.

    Commit zfsonlinux/zfs@57a4edd
    allows the bootfs property to be set on any pool, but does not
    accommodate subsequent vdev changes. For example:
    	# zpool replace rpool /dev/sda /dev/sdb
    	operation not supported on this type of pool
    	property 'bootfs' is not supported on EFI labeled devices
    For non-Solaris builds, disable the check that emits this error.
    Signed-off-by: Brian Behlendorf <>
    dajhorn committed with behlendorf Jan 18, 2012
Commits on Jan 17, 2012
  1. Combine libraries: spl, avl, efi, share, unicode.

    These libraries, which are an artifact of the ZoL development
    process, conflict with packages that are already in distribution:
      * libspl: SPL Programming Language
      * libavl: AVL for Linux
      * libefi: GRUB
    And these libraries are potential conflicts:
      * libshare: the Linux Mount Manager
      * libunicode: Perl and Python
    Recompose these five ZoL components into the four libraries that are
    conventionally provided by Solaris and FreeBSD systems:
      + libnvpair
      + libuutil
      + libzpool
      + libzfs
    This change resolves the name conflict, makes ZoL more compatible
    with existing software that uses autotools to detect ZFS, and allows
    pkg-zfs to better reflect the official Debian kFreeBSD packaging.
    Signed-off-by: Brian Behlendorf <>
    Closes: #430
    dajhorn committed with behlendorf Dec 31, 2011
  2. Allow setting bootfs on any pool

    The vdev_is_bootable() restrictions are no longer necessary
    with recent GRUB2 code.  FreeBSD has implemented the same
    change, except that I moved the Solaris comment to be inside
    the #ifdef __sun__ block.
    Signed-off-by: Brian Behlendorf <>
    Issue #317
    rlaager committed with behlendorf Jan 13, 2012
  3. Reduce number of zio free threads

    As described in Issue #458 and #258, unlinking large amounts of data
    can cause the threads in the zio free wait queue to start spinning.
    Reducing the number of z_fr_iss threads from a fixed value of 100 to 1
    per cpu significantly reduces contention on the taskq spinlock and
    improves throughput.
    Instrumenting the taskq code showed that __taskq_dispatch() can spend
    a long time holding tq->tq_lock if there are a large number of threads
    in the queue.  It turns out the time spent in wake_up() scales
    linearly with the number of threads in the queue.  When a large number
    of short work items are dispatched, as seems to be the case with
    unlink, the worker threads drain the queue faster than the dispatcher
    can fill it.  They then all pile into the work wait queue to wait for
    new work items.  So if 100 threads are in the queue, wake_up() takes
    about 100 times as long, and the woken threads have to spin until the
    dispatcher releases the lock.
    Reducing the number of threads helps with the symptoms, but doesn't
    get to the root of the problem.  It would seem that wake_up()
    shouldn't scale linearly in time with queue depth, particularly if we
    are only trying to wake up one thread.  In that vein, I tried making
    all of the waiting processes exclusive to prevent the scheduler from
    iterating over the entire list, but I still saw the linear time
    scaling.  So further investigation is needed, but in the meantime
    reducing the thread count is an easy workaround.
    Signed-off-by: Brian Behlendorf <>
    Issue #258
    Issue #458
    nedbass committed with behlendorf Jan 13, 2012
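    The linear scaling reported in the commit above can be put into a toy
    cost model.  This Python sketch is only arithmetic under the commit's
    stated observation (wake_up() time grows linearly with the number of
    waiting threads); the function names and the 8-CPU machine are
    assumptions, and it does not measure a real kernel.

```python
def z_fr_iss_threads(ncpus, per_cpu=True):
    """Old behaviour: fixed 100 threads.  New behaviour: one per CPU."""
    return ncpus if per_cpu else 100


def wake_up_cost(waiting_threads, unit=1):
    """Linear model from the commit: cost grows with queue depth."""
    return waiting_threads * unit


ncpus = 8  # hypothetical machine
old = wake_up_cost(z_fr_iss_threads(ncpus, per_cpu=False))
new = wake_up_cost(z_fr_iss_threads(ncpus))
print(old, new, old / new)
```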
Commits on Jan 13, 2012
  1. Increase link count limit to 2^31-1

    Originally, the per-file link limit was set to 65536 because the
    exact Linux VFS limit was unclear.  Internally ZFS is able to
    support 64-bit link counts.  After a more careful investigation
    the limit can be safely raised to 2^31-1.
    Signed-off-by: Brian Behlendorf <>
    Closes #514
    behlendorf committed Jan 13, 2012
  2. Run ZFS_AC_PACMAN only if $VENDOR is "arch"

    Unfortunately, Arch's package manager `pacman` shares its name with a
    popular arcade video game. Thus, in order to refrain from executing the
    video game when we mean to execute the package manager, ZFS_AC_PACMAN is
    now only run when $VENDOR is determined to be "arch".
    Signed-off-by: Prakash Surya <>
    Signed-off-by: Brian Behlendorf <>
    Closes #517
    prakashsurya committed with behlendorf Jan 13, 2012
Commits on Jan 12, 2012
  1. Add overlay(-O) mount option support

    Linux supports mounting over non-empty directories by default.
    In Solaris this is not the case and -O option is required for
    zfs mount to mount a zfs filesystem over a non-empty directory.
    For compatibility, I've added support for -O option to mount
    zfs filesystems over non-empty directories if the user wants
    to, just like in Solaris.
    I've defined MS_OVERLAY to record it in the flags variable if
    the -O option is supplied.  The flags variable passes through
    a few functions and it is checked before performing the empty
    directory check in the zfs_mount function.  If -O is given, the
    check is not performed.
    Signed-off-by: Brian Behlendorf <>
    Closes #473
    Suman Chakravartula committed with behlendorf Jan 12, 2012
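    The flag check described in the commit above can be sketched as follows.
    This is a Python model, not the C code from the patch: the numeric value
    of MS_OVERLAY and the helper name can_mount are assumptions made for the
    example.

```python
import os
import tempfile

MS_OVERLAY = 0x1  # illustrative flag value, not the one used by ZoL


def can_mount(mountpoint, flags=0):
    """Return True if the mount may proceed under the rule described above."""
    if flags & MS_OVERLAY:
        return True  # -O given: skip the empty-directory check
    return not os.listdir(mountpoint)  # otherwise require an empty directory


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "file"), "w").close()  # make the directory non-empty
    print(can_mount(d), can_mount(d, MS_OVERLAY))
```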
  2. Apply the ZoL coding standard to zpl_xattr.c

    Make the indenting in the zpl_xattr.c file consistent with the Sun
    coding standard by removing soft tabs.
    Signed-off-by: Brian Behlendorf <>
    dajhorn committed with behlendorf Jan 5, 2012
  3. Linux 3.2 compat, security_inode_init_security()

    The security_inode_init_security() API has been changed to include
    a filesystem specific callback to write security extended attributes.
    This was done to support the initialization of multiple LSM xattrs
    and the EVM xattr.
    This change updates the code to use the new API when it's available.
    Otherwise it falls back to the previous implementation.
    The autoconf test has been made more rigorous by passing the expected
    types.  This is done to ensure we always properly detect the
    correct form for the security_inode_init_security() API.
    Signed-off-by: Brian Behlendorf <>
    Closes #516
    behlendorf committed Jan 12, 2012
  4. Treat /dev/vd* as whole disks

    Correctly detect /dev/vd devices as whole disks and attempt to
    create an EFI partition table.
    Signed-off-by: Brian Behlendorf <>
    rlaager committed with behlendorf Jan 9, 2012
Commits on Jan 11, 2012
  1. Avoid using awk in the zpool_id script.

    Some implementations of `awk` incorrectly parse the \< and \> regex
    symbols, so use a `while read` loop and regular globbing instead.
    Signed-off-by: Brian Behlendorf <>
    Closes: #259
    dajhorn committed with behlendorf Dec 30, 2011
  2. Linux 3.1 compat, super_block->s_shrink

    The Linux 3.1 kernel has introduced the concept of per-filesystem
    shrinkers which are directly associated with a super block.  Prior
    to this change there was one shared global shrinker.
    The zfs code relied on being able to call the global shrinker when
    the arc_meta_limit was exceeded.  This would cause the VFS to drop
    references on a fraction of the dentries in the dcache.  The ARC
    could then safely reclaim the memory used by these entries and
    honor the arc_meta_limit.  Unfortunately, when per-filesystem
    shrinkers were added the old interfaces were made unavailable.
    This change adds support to use the new per-filesystem shrinker
    interface so we can continue to honor the arc_meta_limit.  The
    major benefit of the new interface is that we can now target
    only the zfs filesystem for dentry and inode pruning.  Thus we
    can minimize any impact on the caching of other filesystems.
    In the context of making this change several other important
    issues related to managing the ARC were addressed.  They include:
    * The dnlc_reduce_cache() function which was called by the ARC
    to drop dentries for the Posix layer was replaced with a generic
    zfs_prune_t callback.  The ZPL layer now registers a callback to
    drop these dentries removing a layering violation which dates
    back to the Solaris code.  This callback can also be used by
    other ARC consumers such as Lustre.
    * The arc_reduce_dnlc_percent module option has been changed to
    arc_meta_prune for clarity.  The dnlc functions are specific to
    Solaris's VFS and have already been largely eliminated.
    The replacement tunable now represents the number of bytes the
    prune callback will request when invoked.
    * Less aggressively invoke the prune callback.  We used to call
    this whenever we exceeded the arc_meta_limit however that's not
    strictly correct since it results in overzealous reclaim of
    dentries and inodes.  It is now only called once the arc_meta_limit
    is exceeded and every effort has been made to evict other data from
    the ARC cache.
    * More promptly manage exceeding the arc_meta_limit.  When reading
    meta data into the cache, if a buffer could not be recycled,
    notify the arc_reclaim thread to invoke the required prune.
    * Added arcstat_prune kstat which is incremented when the ARC
    is forced to request that a consumer prune its cache.  Remember
    this will only occur when the ARC has no other choice.  If it
    can evict buffers safely without invoking the prune callback
    it will.
    * This change is also expected to resolve the unexpected collapses
    of the ARC cache.  These would occur because when just the
    arc_meta_limit was exceeded, reclaim pressure would be exerted on
    the arc_c value via arc_shrink().  This effectively shrunk the entire
    cache when really we just needed to reclaim meta data.
    Signed-off-by: Brian Behlendorf <>
    Closes #466
    Closes #292
    behlendorf committed Dec 22, 2011
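    The prune-callback ordering described in the commit above (evict what
    the ARC can free by itself first, and only then ask consumers to prune)
    can be modelled in a few lines.  This Python sketch is an assumption-laden
    toy: the class Arc, its fields, and adjust_meta are hypothetical names,
    and byte counts are tiny integers chosen for readability.

```python
class Arc:
    def __init__(self, meta_limit, meta_prune):
        self.meta_limit = meta_limit
        self.meta_prune = meta_prune  # bytes requested per prune call
        self.meta_used = 0
        self.evictable = 0            # metadata the ARC can evict on its own
        self.prune_count = 0          # plays the role of arcstat_prune
        self.callbacks = []

    def register_prune(self, callback):
        """Consumers (e.g. a ZPL-like layer) register a prune callback."""
        self.callbacks.append(callback)

    def adjust_meta(self):
        over = self.meta_used - self.meta_limit
        if over <= 0:
            return
        # First evict whatever the ARC can free by itself.
        evicted = min(over, self.evictable)
        self.evictable -= evicted
        self.meta_used -= evicted
        # Only if still over the limit are consumers asked to prune.
        if self.meta_used > self.meta_limit:
            self.prune_count += 1
            for callback in self.callbacks:
                callback(self.meta_prune)


requests = []
arc = Arc(meta_limit=100, meta_prune=10)
arc.register_prune(requests.append)
arc.meta_used, arc.evictable = 120, 50
arc.adjust_meta()  # eviction alone is enough; no prune request
arc.meta_used, arc.evictable = 120, 5
arc.adjust_meta()  # still over the limit; consumers are asked to prune
print(arc.prune_count, requests)
```

    In this model the callback only records the requested byte count; a real
    consumer would drop dentries or other cached references in response.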
Commits on Dec 19, 2011
  1. Move Arch Linux's VENDOR check above Ubuntu's

    If the lsb-release package is installed on an Arch Linux distribution,
    the configure step will incorrectly detect the running distribution as
    Ubuntu. This is a result of both distributions providing an
    /etc/lsb-release file, and the Ubuntu VENDOR check being performed
    Since the Arch Linux test checks for a file more specific to the Arch
    Linux distribution, moving Arch Linux's VENDOR check above Ubuntu's
    check provides a quick and easy solution.
    Signed-off-by: Prakash Surya <>
    Signed-off-by: Brian Behlendorf <>
    prakashsurya committed with behlendorf Dec 17, 2011
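    Why the order of the VENDOR checks matters in the commit above can be
    shown with a first-match-wins lookup.  This Python sketch is not the
    autoconf macro itself; detect_vendor is a hypothetical helper, and
    /etc/arch-release is an assumed Arch-specific marker file (the commit
    only names /etc/lsb-release).

```python
def detect_vendor(existing_files, checks):
    """Return the vendor of the first check whose marker file exists."""
    for vendor, marker in checks:
        if marker in existing_files:
            return vendor
    return "unknown"


# An Arch Linux host with the lsb-release package installed.
files_on_arch = {"/etc/arch-release", "/etc/lsb-release"}

old_order = [("ubuntu", "/etc/lsb-release"), ("arch", "/etc/arch-release")]
new_order = [("arch", "/etc/arch-release"), ("ubuntu", "/etc/lsb-release")]

print(detect_vendor(files_on_arch, old_order),
      detect_vendor(files_on_arch, new_order))
```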
Commits on Dec 17, 2011
  1. Add LIBSELINUX to mount_zfs_LDFLAGS.

    Regenerating the autotools configuration on Debian and Ubuntu systems
    causes compilation to fail with this error message:
        undefined reference to `is_selinux_enabled'
    In the automake template, set "mount_zfs_LDFLAGS = ... $(LIBSELINUX)"
    so that the /sbin/mount.zfs utility is linked to libselinux.
    Signed-off-by: Brian Behlendorf <>
    dajhorn committed with behlendorf Dec 17, 2011
  2. Linux 3.2 compat: set_nlink()

    Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit
      SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170
    Use the new set_nlink() kernel function instead.
    Signed-off-by: Brian Behlendorf <>
    Closes: #462
    dajhorn committed with behlendorf Dec 16, 2011
Commits on Dec 16, 2011
  1. Update the character class in the zpool man page.

    ZoL and all Solaris derivatives allow pool names to contain the colon
    and space characters. Update the man page to reflect current behavior.
    Signed-off-by: Brian Behlendorf <>
    Closes: #438
    dajhorn committed with behlendorf Dec 16, 2011
Commits on Dec 15, 2011
  1. Add make rule for building Arch Linux packages

    Added the necessary build infrastructure for building packages
    compatible with the Arch Linux distribution. As such, one can now run:
        $ ./configure
        $ make pkg     # Alternatively, one can run 'make arch' as well
    on the Arch Linux machine to create two binary packages compatible with
    the pacman package manager, one for the zfs userland utilities and
    another for the zfs kernel modules. The new packages can then be
    installed by running:
        # pacman -U $package.pkg.tar.xz
    In addition, source-only packages suitable for an Arch Linux chroot
    environment or remote builder can also be built using the 'sarch' make
    NOTE: Since the source dist tarball is created on the fly from the head
    of the build tree, its MD5 hash signature will be continually in flux.
    As a result, the md5sum variable was intentionally omitted from the
    PKGBUILD files, and the '--skipinteg' makepkg option is used. This may
    or may not have any serious security implications, as the source tarball
    is not being downloaded from an outside source.
    Signed-off-by: Prakash Surya <>
    Signed-off-by: Brian Behlendorf <>
    Closes #491
    prakashsurya committed with behlendorf Dec 8, 2011
Commits on Dec 14, 2011
  1. Illumos #734: Use taskq_dispatch_ent() interface

    It has been observed that some of the hottest locks are those
    of the zio taskqs.  Contention on these locks can limit the
    rate at which zios are dispatched which limits performance.
    This upstream change from Illumos uses a new interface to the
    taskqs which allows them to utilize a prealloc'ed taskq_ent_t.
    This removes the need to perform an allocation at dispatch
    time while holding the contended lock.  This has the effect
    of improving system performance.
    Reviewed by: Albert Lee <>
    Reviewed by: Richard Lowe <>
    Reviewed by: Alexey Zaytsev <>
    Reviewed by: Jason Brian King <>
    Reviewed by: George Wilson <>
    Reviewed by: Adam Leventhal <>
    Approved by: Gordon Ross <>
    References to Illumos issue:
    Ported-by: Prakash Surya <>
    Signed-off-by: Brian Behlendorf <>
    Closes #482
    Garrett D'Amore committed with behlendorf Nov 8, 2011
Commits on Dec 7, 2011
  1. Set zvol_major/zvol_threads permissions

    The zvol_major and zvol_threads module options were being created
    with 0 permission bits.  This prevented them from being listed in
    the /sys/module/zfs/parameters/ directory, although they were
    visible in `modinfo zfs`.  This patch fixes the issue by updating
    the permission bits to 0444.  For the moment these options must
    be read-only because they are used during module initialization.
    Signed-off-by: Brian Behlendorf <>
    Issue #392
    behlendorf committed Dec 7, 2011
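    The difference between the two permission values in the commit above
    can be checked with Python's standard stat module.  This only decodes
    the mode bits; it does not touch /sys, and is_world_readable is a
    hypothetical helper written for the example.

```python
import stat


def is_world_readable(mode):
    """True if owner, group, and other all have the read bit set."""
    return bool(mode & stat.S_IRUSR and mode & stat.S_IRGRP
                and mode & stat.S_IROTH)


for perm in (0o000, 0o444):
    # stat.filemode() renders the bits the way `ls -l` would.
    print(oct(perm), stat.filemode(stat.S_IFREG | perm),
          is_world_readable(perm))
```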
Commits on Dec 5, 2011
  1. Update default ARC memory limits

    In the upstream OpenSolaris ZFS code the maximum ARC usage is
    limited to 3/4 of memory or all but 1GB, whichever is larger.
    Because of how Linux's VM subsystem is organized these defaults
    have proven to be too large which can lead to stability issues.
    To avoid making everyone manually tune the ARC the defaults are
    being changed to 1/2 of memory or all but 4GB.  The rationale for
    this is as follows:
    * Desktop Systems (less than 8GB of memory)
      Limiting the ARC to 1/2 of memory is desirable for desktop
      systems which have highly dynamic memory requirements.  For
      example, launching your web browser can suddenly result in a
      demand for several gigabytes of memory.  This memory must be
      reclaimed from the ARC cache which can take some time.  The
      user will experience this reclaim time as a sluggish system
      with poor interactive performance.  Thus in this case it is
      preferable to leave the memory as free and available for
      immediate use.
    * Server Systems (more than 8GB of memory)
      Using all but 4GB of memory for the ARC is preferable for
      server systems.  These systems often run with minimal user
      interaction and have long running daemons with relatively
      stable memory demands.  These systems will benefit most by
      having as much data cached in memory as possible.
    These values should work well for most configurations.  However,
    if you have a desktop system with more than 8GB of memory you may
    wish to further restrict the ARC.  This can still be accomplished
    by setting the 'zfs_arc_max' module option.
    Additionally, keep in mind these aren't currently hard limits.
    The ARC is based on a slab implementation which can suffer from
    memory fragmentation.  Because this fragmentation is not visible
    from the ARC it may believe it is within the specified limits while
    actually consuming slightly more memory.  How much more memory gets
    consumed will be determined by how badly fragmented the slabs are.
    In the long term this can be mitigated by slab defragmentation code
    which was the OpenSolaris solution.  Or preferably, using the page cache
    to back the ARC under Linux would be even better.  See issue #75
    for the benefits of more tightly integrating with the page cache.
    This change also fixes an issue where the default ARC max was being
    set incorrectly for machines with less than 2GB of memory.  The
    constant in the arc_c_max comparison must be explicitly cast to
    a uint64_t type to prevent overflow and the wrong conditional
    branch being taken.  This failure was typically observed in VMs
    which are commonly created with less than 2GB of memory.
    Signed-off-by: Brian Behlendorf <>
    Issue #75
    behlendorf committed Dec 5, 2011
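    The new default from the commit above can be worked through numerically.
    This Python sketch assumes the new rule keeps the "whichever is larger"
    selection stated for the old defaults, which matches the 8GB crossover
    given in the message; default_arc_max is a hypothetical name and the
    real computation is done in C on byte counts.

```python
GIB = 1 << 30


def default_arc_max(physmem_bytes):
    """Larger of half of memory and all-but-4GB, per the commit message."""
    return max(physmem_bytes // 2, physmem_bytes - 4 * GIB)


# Below 8GB the 1/2 rule wins; above 8GB the all-but-4GB rule wins.
for gib in (1, 4, 8, 16, 64):
    print(gib, default_arc_max(gib * GIB) / GIB)
```

    Python integers do not overflow, so this sketch cannot reproduce the
    uint64_t cast problem the commit fixes for machines with less than 2GB.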
  2. Quote variables in the zfs.lsb script.

    For consistency and safety, quote all variables in the zfs.lsb script.
    This protects in the unlikely case that any of the file names contain
    Signed-off-by: Brian Behlendorf <>
    Issue #439
    dajhorn committed with behlendorf Dec 4, 2011
  3. Source /etc/default/zfs after setting defaults.

    Let the administrator override all script variables by sourcing the
    /etc/default/zfs file after the default values are set.
    The spelling mistake in the old path name makes it unlikely that this
    bug affected any users.
    Signed-off-by: Brian Behlendorf <>
    Closes: #371
    dajhorn committed with behlendorf Dec 4, 2011
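    The ordering fixed by the commit above (set defaults first, then read
    the administrator's file so its values win) can be sketched as follows.
    The script itself is shell; this Python model uses a hypothetical
    load_settings helper, made-up variable names, and a simplified
    KEY=VALUE parser.

```python
def load_settings(defaults, override_text):
    """Apply KEY=VALUE overrides on top of defaults (overrides win)."""
    settings = dict(defaults)
    for line in override_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


defaults = {"ZFS_MOUNT": "yes", "ZFS_UNMOUNT": "yes"}  # hypothetical names
print(load_settings(defaults, "# admin override\nZFS_MOUNT='no'\n"))
```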
  4. Demote the whackbang in the zpool_id script.

    The zpool_id script is posixly correct and does not use bash
    features, so change its whackbang from /bin/bash to /bin/sh.
    Debian policy also stipulates that system scripts be dash compatible.
    Signed-off-by: Brian Behlendorf <>
    dajhorn committed with behlendorf Dec 4, 2011
  5. Demote egrep to grep in the zpool_id script.

    Direct invocation of GNU egrep is deprecated by its man page, and
    its argument in the zpool_id script is not an extended expression.
    Signed-off-by: Brian Behlendorf <>
    dajhorn committed with behlendorf Dec 4, 2011
  6. Quote variables in the zpool_id script.

    For consistency and safety, quote all variables in the zpool_id
    script. This accommodates a `-c CONFIG` parameter value with
    whitespace in the path name.
    Also fix a typo in the usage synopsis for `-h`.
    Signed-off-by: Brian Behlendorf <>
    Issue #439
    dajhorn committed with behlendorf Dec 4, 2011
  7. Support path_id changes in udev 174.

    The /lib/udev/path_id helper became a builtin command in the udev 174
    release, so test whether path_id is external in the zpool_id script.
    Signed-off-by: Brian Behlendorf <>
    Closes: #429
    dajhorn committed with behlendorf Dec 4, 2011