Commits on Sep 29, 2010
  1. - updated release to beta-03

    Anand Mitra committed Sep 29, 2010
  2. - minor formatting and fixes

    Anand Mitra committed Sep 29, 2010
Commits on Sep 23, 2010
  1. - copy to and from user buffer was incorrect in the read/write, fix to
       handle page faults correctly.
    - partial fix to export filesystems
    Anand Mitra committed Sep 23, 2010
Commits on Sep 14, 2010
  1. Pushing spl changes for zpl linux support.

    Changes corresponding to tag beta-01
    Anand Mitra committed Sep 14, 2010
Commits on Nov 24, 2009
  1. @behlendorf

    spl-modules-devel package must depend on the exact version of kernel

    devel package it was built against.
    behlendorf committed Nov 24, 2009
  2. @behlendorf
  3. @behlendorf
Commits on Nov 21, 2009
  1. @behlendorf
  2. @behlendorf
Commits on Nov 15, 2009
  1. @behlendorf
  2. @behlendorf
  3. @behlendorf

    Add mutex_enter_nested() as wrapper for mutex_lock_nested().

    This symbol can be used by GPL modules which use the SPL to handle
    cases where a call path takes two different locks by the same
    name.  This is needed to avoid a false positive in the lock checker.
    behlendorf committed Nov 15, 2009
Commits on Nov 13, 2009
  1. @behlendorf

    Linux 2.6.31 kmem cache alignment fixes and cleanup.

    The big fix here is the removal of kmalloc() in kv_alloc().  It used
    to be true in previous kernels that kmallocs over PAGE_SIZE would
    always be page aligned.  This is no longer true; at least in 2.6.31
    there are no longer any alignment expectations.  Since kv_alloc()
    requires the resulting address to be page aligned we now either
    directly allocate pages in the KMC_KMEM case, or directly call
    __vmalloc() both of which will always return a page aligned address.
    Additionally, to avoid wasting memory, size is always a power of two.
    
    As for cleanup, several helper functions were introduced to calculate
    the aligned sizes of various data structures.  This helps ensure no
    case is accidentally missed where the alignment needs to be taken
    into account.  The helpers now use P2ROUNDUP_TYPE instead of P2ROUNDUP
    which is safer since the type will be explicit and we no longer count
    on the compiler to auto-promote types the way we expect.
    
    Always enforce minimum (SPL_KMEM_CACHE_ALIGN) and maximum (PAGE_SIZE)
    alignment restrictions at cache creation time.
    
    Use SPL_KMEM_CACHE_ALIGN in splat alignment test.
    behlendorf committed Nov 13, 2009
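
The size helpers described in the commit above can be illustrated with a small userspace sketch. The P2ROUNDUP_TYPE definition follows the form used in Solaris-derived sysmacros headers; the demo_* names and the constant are assumptions made for this example and are not taken from the SPL source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round x up to a multiple of align (align must be a power of two).
 * The explicit type avoids relying on implicit integer promotion.
 */
#define P2ROUNDUP_TYPE(x, align, type) (-(-(type)(x) & -(type)(align)))

/* Assumed stand-in for SPL_KMEM_CACHE_ALIGN, for illustration only. */
#define DEMO_CACHE_ALIGN 8u

/* Hypothetical helper: object size rounded up to the cache alignment. */
static inline uint32_t demo_obj_size(uint32_t size, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return P2ROUNDUP_TYPE(size, align, uint32_t);
}

/* Hypothetical helper: smallest power of two that is >= size. */
static inline size_t demo_pow2_size(size_t size)
{
    size_t p = 1;

    while (p < size)
        p <<= 1;
    return p;
}
```

With these helpers a 5 byte object occupies 8 bytes under an 8 byte alignment, and a 5000 byte slab request is sized at 8192 bytes.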
Commits on Nov 12, 2009
  1. @behlendorf

    Remove __GFP_NOFAIL in kmem and retry internally.

    As of 2.6.31 it's clear __GFP_NOFAIL should no longer be used and it
    may disappear from the kernel at any time.  To handle this I have simply
    added *_nofail wrappers in the kmem implementation which perform the
    retry for non-atomic allocations.
    
    From linux-2.6.31 mm/page_alloc.c:1166
    /*
     * __GFP_NOFAIL is not to be used in new code.
     *
     * All __GFP_NOFAIL callers should be fixed so that they
     * properly detect and handle allocation failures.
     *
     * We most definitely don't want callers attempting to
     * allocate greater than order-1 page units with
     * __GFP_NOFAIL.
     */
    WARN_ON_ONCE(order > 1);
    behlendorf committed Nov 12, 2009
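
A rough userspace sketch of the retry-wrapper idea from the commit above. The flag, the allocator hook, and all demo_* names are hypothetical; the real wrappers live in the kmem implementation and would normally back off between attempts rather than loop tightly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag for this sketch (not one of the kernel's GFP bits). */
#define DEMO_ATOMIC 0x1u   /* caller cannot sleep, so never retry */

/* Hypothetical allocator hook so that failures can be simulated. */
typedef void *(*demo_alloc_fn)(size_t size);

/*
 * Retry non-atomic allocations until they succeed instead of passing a
 * "no fail" flag down to the allocator.  Atomic callers get exactly one
 * attempt and must handle NULL themselves.
 */
static void *demo_alloc_nofail(demo_alloc_fn alloc, size_t size, unsigned flags)
{
    void *ptr;

    assert(alloc != NULL);
    do {
        ptr = alloc(size);
    } while (ptr == NULL && !(flags & DEMO_ATOMIC));

    return ptr;
}

/* Test allocator: fail the first few calls, then fall through to malloc(). */
static int demo_failures_left = 0;

static void *demo_flaky_alloc(size_t size)
{
    if (demo_failures_left > 0) {
        demo_failures_left--;
        return NULL;
    }
    return malloc(size);
}
```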
Commits on Nov 10, 2009
  1. @behlendorf

    Linux 2.6.31 Compatibility Updates

    SPL_AC_2ARGS_SET_FS_PWD macro updated to explicitly include
    linux/fs_struct.h which was dropped from linux/sched.h.
    
    min_wmark_pages, low_wmark_pages, high_wmark_pages macros
    introduced in newer kernels.  For older kernels mm_compat.h
    was introduced to define them as needed as direct mappings
    to per zone min_pages, low_pages, max_pages.
    behlendorf committed Nov 10, 2009
Commits on Nov 2, 2009
  1. @behlendorf
Commits on Oct 30, 2009
  1. @behlendorf

    Autoconf --enable-debug-* cleanup

    Clean up the --enable-debug-* configure options.  This has been pending
    for quite some time and I am glad I finally got to it.  To summarize:
    
    1) All SPL_AC_DEBUG_* macros were updated to be more autoconf
    friendly.  This mainly involved a shift to the GNU approved usage of
    AC_ARG_ENABLE and ensuring AS_IF is used rather than directly using
    an if [ test ] construct.
    
    2) --enable-debug-kmem=yes by default.  This simply enables keeping
    a running tally of total memory allocated and freed and reporting a
    memory leak if there was one at module unload.  Additionally, it
    ensures /proc/spl/kmem/slab will exist by default which is handy.
    The overhead is low for this and it should not impact performance.
    
    3) --enable-debug-kmem-tracking=no by default.  This option was added
    to provide a configure option to enable detailed memory allocation
    tracking.  This support was always there but you had to know where to
    turn it on.  By default this support is disabled because it is known
    to badly hurt performance; however, it is invaluable when chasing a
    memory leak.
    
    4) --enable-debug-kstat removed.  After further reflection I can't see
    why you would ever really want to turn this support off.  It is now
    always on which had the nice side effect of simplifying the proc handling
    code in spl-proc.c.  We can now always assume the top level directory
    will be there.
    
    5) --enable-debug-callb removed.  This never really did anything, it was
    put in provisionally because it might have been needed.  It turns out
    it was not so I am just removing it to prevent confusion.
    behlendorf committed Oct 30, 2009
  2. @behlendorf

    Add autoconf checks for atomic64_cmpxchg + atomic64_xchg

    These functions didn't exist for all archs prior to 2.6.24.  This
    patch adds an autoconf test to detect this and add them when needed.
    The autoconf check is needed instead of just an #ifndef because in
    the most modern kernels atomic64_{cmp}xchg are implemented as an
    inline function and not a #define.
    behlendorf committed Oct 30, 2009
  3. @behlendorf

    Use Linux atomic primitives by default.

    Previously Solaris style atomic primitives were implemented simply by
    wrapping the desired operation in a global spinlock.  This was easy to
    implement at the time when I wasn't 100% sure I could safely layer the
    Solaris atomic primitives on the Linux counterparts.  However, it was
    likely not good for performance.
    
    After more investigation however it does appear the Solaris primitives
    can be layered on Linux's fairly safely.  The Linux atomic_t type really
    just wraps a long so we can simply cast the Solaris unsigned value to
    either an atomic_t or atomic64_t.  The only lingering problem for both
    implementations is that Solaris provides no atomic read function.  This
    means reading a 64-bit value on a 32-bit arch can (and will) result in
    word breaking.  I was very concerned about this initially, but upon
    further reflection it is a limitation of the Solaris API.  So really
    we are just being bug-for-bug compatible here.
    
    With this change the default implementation is layered on top of Linux
    atomic types.  However, because we're assuming a lot about the internal
    implementation of those types I've made it easy to fall-back to the
    generic approach.  Simply build with --enable-atomic_spinlocks if
    issues are encountered with the new implementation.
    behlendorf committed Oct 30, 2009
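
The two approaches contrasted in the atomic primitives commit above can be sketched in userspace as follows. This is only an illustration: the demo_* functions mimic the shape of Solaris calls such as atomic_add_64_nv() and atomic_cas_64(), the global lock is a C11 atomic_flag rather than a kernel spinlock, and the GCC/Clang __atomic builtins stand in for the Linux atomic64_t operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Generic fallback: serialize every operation behind one global lock. */
static atomic_flag demo_atomic_lock = ATOMIC_FLAG_INIT;

static uint64_t demo_spin_add_64_nv(volatile uint64_t *target, int64_t delta)
{
    uint64_t nv;

    assert(target != NULL);
    while (atomic_flag_test_and_set_explicit(&demo_atomic_lock,
                                             memory_order_acquire))
        ; /* spin until the global lock is free */
    *target += (uint64_t)delta;
    nv = *target;
    atomic_flag_clear_explicit(&demo_atomic_lock, memory_order_release);

    return nv;
}

/* Layered approach: hand the plain unsigned value to a native atomic op. */
static uint64_t demo_native_add_64_nv(volatile uint64_t *target, int64_t delta)
{
    return __atomic_add_fetch(target, (uint64_t)delta, __ATOMIC_SEQ_CST);
}

/* Compare-and-swap that returns the old value, as the Solaris API does. */
static uint64_t demo_native_cas_64(volatile uint64_t *target,
                                   uint64_t cmp, uint64_t newval)
{
    /* On failure the builtin stores the observed value into cmp. */
    __atomic_compare_exchange_n(target, &cmp, newval, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return cmp;
}
```

Neither variant offers an atomic read, which matches the limitation the commit message attributes to the Solaris API.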
Commits on Oct 27, 2009
  1. @behlendorf
  2. @behlendorf

    Rebase cmn_err on vcmn_err and don't warn about missing \n

    The cmn_err/vcmn_err functions are layered on top of the debug
    system which usually expects a newline at the end.  However, there
    really doesn't need to be a newline there and there in fact should
    not be for the CE_CONT case so let's just drop the warning.
    
    Also we make a half-hearted attempt to handle a leading ! which
    means only send it to the syslog not the console.  In this case
    we just send to the debug logs and not the console.
    behlendorf committed Oct 27, 2009
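
A minimal userspace sketch of the layering described in the commit above: a variadic front end that forwards to a va_list back end, which strips a leading '!' and does not require a trailing newline. The demo_* names, the destination enum, and the buffer-based output are assumptions for illustration; the real functions write to the SPL debug system.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical destinations for this sketch. */
enum demo_dest { DEMO_CONSOLE_AND_LOG, DEMO_LOG_ONLY };

/* Back end: takes a va_list, handles the '!' prefix, formats the message. */
static enum demo_dest demo_vcmn_err(char *buf, size_t len,
                                    const char *fmt, va_list ap)
{
    enum demo_dest dest = DEMO_CONSOLE_AND_LOG;

    assert(buf != NULL && len > 0 && fmt != NULL);
    if (fmt[0] == '!') {        /* leading '!' means "log only" */
        dest = DEMO_LOG_ONLY;
        fmt++;
    }
    /* No newline is appended or expected. */
    vsnprintf(buf, len, fmt, ap);

    return dest;
}

/* Front end: thin variadic wrapper rebased on the va_list version. */
static enum demo_dest demo_cmn_err(char *buf, size_t len, const char *fmt, ...)
{
    enum demo_dest dest;
    va_list ap;

    va_start(ap, fmt);
    dest = demo_vcmn_err(buf, len, fmt, ap);
    va_end(ap);

    return dest;
}
```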
Commits on Oct 5, 2009
  1. @behlendorf

    Remove usage of the __id_u macro for portability.

    This macro was removed from the default RPM macro file.  Interestingly,
    some of the arch specific macros add it back in based on your distro
    but it should not be counted on.  However, __id still exists and its
    command line args have historically been fairly stable so we will
    directly use %{__id} -un to get the user name.
    behlendorf committed Oct 5, 2009
Commits on Oct 2, 2009
  1. @behlendorf

    Use kobject_set_name() for increased portability.

    As of 2.6.25 kobj->k_name was replaced with kobj->name.  Some distros
    such as RHEL5 (2.6.18) add a patch to prevent this from being a problem
    but other older distros such as SLES10 (2.6.16) have not.  To avoid
    the whole issue I'm updating the code to use kobject_set_name() which
    does what I want and has existed all the way back to 2.6.11.
    behlendorf committed Oct 2, 2009
Commits on Oct 1, 2009
  1. @behlendorf

    Set cwd to '/' for the process executing insmod.

    Ricardo has pointed out that under Solaris the cwd is set to '/'
    during module load, while under Linux it is set to the caller's cwd.
    To handle this cleanly I've reworked the module *_init()/_exit()
    macros so they call a *_setup()/_cleanup() function when any SPL
    dependent module is loaded or unloaded.  This gives us a chance to
    perform any needed modification of the process, in this case changing
    the cwd.  It also handily provides a way to avoid creating wrapper
    init()/exit() functions because the Solaris and Linux prototypes
    differ slightly.  All dependent modules should now call the spl
    helper macros spl_module_{init,exit}() instead of the native linux
    versions.
    
    Unfortunately, it appears that under Linux there has been no consistent
    API in the kernel to set the cwd in a module.  Because of this I have
    had to add more autoconf magic than I'd like.  However, what I have
    done is correct and has been tested on RHEL5, SLES11, FC11, and CHAOS
    kernels.
    
    In addition, I have changed the rootdir type from a 'void *' to the
    correct 'vnode_t *' type.  And I've set rootdir to a non-NULL value.
    behlendorf committed Oct 1, 2009
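
The helper-macro pattern described in the commit above might look roughly like the sketch below. It is a userspace approximation under stated assumptions: every demo_* name is hypothetical, the setup hook only counts calls where the real one would change the cwd, and the generated wrappers are called directly instead of being registered with the kernel.

```c
#include <assert.h>

/* Hypothetical setup/cleanup hooks run for every dependent module. */
static int demo_setup_calls = 0;
static int demo_cleanup_calls = 0;

static void demo_setup(void)   { demo_setup_calls++; }   /* e.g. set cwd to '/' */
static void demo_cleanup(void) { demo_cleanup_calls++; }

/* Generate an init wrapper that runs the shared setup first. */
#define demo_module_init(init_fn)                  \
    static int demo_module_init_wrapper(void)      \
    {                                              \
        demo_setup();                              \
        return init_fn();                          \
    }

/* Generate a void exit wrapper around an int-returning exit function. */
#define demo_module_exit(exit_fn)                  \
    static void demo_module_exit_wrapper(void)     \
    {                                              \
        (void)exit_fn();                           \
        demo_cleanup();                            \
    }

/* A dependent module written with int-returning init and exit functions. */
static int demo_loaded = 0;

static int demo_init(void)
{
    assert(demo_setup_calls > 0);   /* setup must already have run */
    demo_loaded = 1;
    return 0;
}

static int demo_fini(void)
{
    demo_loaded = 0;
    return 0;
}

demo_module_init(demo_init)
demo_module_exit(demo_fini)
```

Because the macros generate the wrappers, a module author does not have to write one by hand to bridge the differing prototypes.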
Commits on Sep 29, 2009
  1. @behlendorf

    Expand SEM() outside init_rwsem and directly call __init_rwsem().

    We need to directly call __init_rwsem() or the name gets expanded
    to SEM(lock-name).  This is safe and correct for the supported arches
    x86/x86_64/ppc/ppc64.
    behlendorf committed Sep 29, 2009
Commits on Sep 25, 2009
  1. @behlendorf

    Reimplement mutexes for Linux lock profiling/analysis

    For a generic explanation of why mutexes needed to be reimplemented
    to work with the kernel lock profiling see commits:
      e811949 and
      d28db80
    
    The specific changes made to the mutex implementation are as follows.
    The Linux mutex structure is now directly embedded in the kmutex_t.
    This allows a kmutex_t to be directly cast to a mutex struct and
    passed directly to the Linux primitive.
    
    Just like with the rwlocks it is critical that these functions be
    implemented as #defines to ensure the location information is
    preserved.  The preprocessor can then do a direct replacement of
    the Solaris primitive with the Linux primitive.
    
    Just as with the rwlocks we need to track the lock owner.  Here
    things get a little more interesting because, depending on your
    kernel version and how you've built your kernel, Linux may already
    do this for you.  If you're running a 2.6.29 or newer kernel on an
    SMP system the lock owner will be tracked.  This was added to Linux
    to support adaptive mutexes, more on that shortly.  Alternately, your
    kernel might track the lock owner if you've set CONFIG_DEBUG_MUTEXES
    in the kernel build.  If neither of the above things is true for
    your kernel the kmutex_t type will include and track the lock owner
    to ensure correct behavior.  This is all handled by a new autoconf
    check called SPL_AC_MUTEX_OWNER.
    
    Concerning adaptive mutexes, these are a very recent development and
    they did not make it into either the latest FC11 or SLES11 kernels.
    Ideally, I'd love to see this kernel change appear in one of these
    distros because it does help performance.  From Linux kernel commit:
      0d66bf6d3514b35eb6897629059443132992dbd7
      "Testing with Ingo's test-mutex application...
      gave a 345% boost for VFS scalability on my testbox"
    However, if you don't want to backport this change yourself you
    can still simply export the task_curr() symbol.  The kmutex_t
    implementation will use this symbol when it's available to
    provide its own adaptive mutexes.
    
    Finally, DEBUG_MUTEX support was removed including the proc handlers.
    This was done because now that we are cleanly integrated with the
    kernel profiling all this information and much much more is available
    in debug kernel builds.  This code was now redundant.
    
    Updated mutexes validated on:
        - SLES10   (ppc64)
        - SLES11   (x86_64)
        - CHAOS4.2 (x86_64)
        - RHEL5.3  (x86_64)
        - RHEL6    (x86_64)
        - FC11     (x86_64)
    behlendorf committed Sep 25, 2009
  2. @behlendorf

    Update rwlocks to track owner to ensure correct semantics

    The behavior of RW_*_HELD was updated because it was not quite right.
    It is not sufficient to return non-zero when the lock is held; we must
    only do this when the current task is the holder.
    
    This means we need to track the lock owner which is not something
    tracked in a Linux semaphore.  After some experimentation the
    solution I settled on was to embed the Linux semaphore at the start
    of a larger krwlock_t structure which includes the owner field.
    This maintains good performance and allows us to cleanly integrate
    with the kernel lock analysis tools.  My reasons:
    
    1) By placing the Linux semaphore at the start of krwlock_t we can
    then simply cast krwlock_t to a rw_semaphore and pass that on to
    the Linux kernel.  This allows us to use #defines so the preprocessor
    can do direct replacement of the Solaris primitive with the Linux
    equivalent.  This is important because it then maintains the location
    information for each rw_* call point.
    
    2) Additionally, by adding the owner to krwlock_t we can keep this
    needed extra information adjacent to the lock itself.  This removes
    the need for a fancy lookup to get the owner which is optimal for
    performance.  We can also leverage the existing spin lock in the
    semaphore to ensure owner is updated correctly.
    
    3) All helper functions which do not need to strictly be implemented
    as a define to preserve location information can be done as a static
    inline function.
    
    4) Adding the owner to krwlock_t allows us to remove all memory
    allocations done during lock initialization.  This is good for all
    the obvious reasons, though we do give up the ability to specify the lock
    name.  The Linux profiling tools will stringify the lock name used
    in the code via the preprocessor and use that.
    
    Updated rwlocks validated on:
    - SLES10   (ppc64)
    - SLES11   (x86_64)
    - CHAOS4.2 (x86_64)
    - RHEL5.3  (x86_64)
    - RHEL6    (x86_64)
    - FC11     (x86_64)
    behlendorf committed Sep 25, 2009
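
The layout described in the rwlock commit above, with the native lock as the first member and an owner field beside it, can be sketched in userspace as below. All demo_* types are stand-ins invented for this example: the fake semaphore is a bare counter, tasks are plain integers, and no real locking takes place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Linux rw_semaphore and task pointer. */
struct demo_rwsem { int count; };   /* negative means write-held here */
typedef int demo_task_t;            /* 0 means "no owner" */

typedef struct demo_krwlock {
    struct demo_rwsem rw_rwlock;    /* must stay first so the cast is valid */
    demo_task_t rw_owner;           /* writer that currently holds the lock */
} demo_krwlock_t;

_Static_assert(offsetof(demo_krwlock_t, rw_rwlock) == 0,
               "native lock must be the first member");

/* Cast the wrapper straight to the native lock type. */
#define DEMO_SEM(rwp) ((struct demo_rwsem *)(rwp))

static demo_task_t demo_current = 1;   /* stand-in for the running task */

static void demo_rw_enter_writer(demo_krwlock_t *rwp)
{
    assert(rwp->rw_owner == 0);
    DEMO_SEM(rwp)->count = -1;
    rwp->rw_owner = demo_current;
}

static void demo_rw_exit_writer(demo_krwlock_t *rwp)
{
    assert(rwp->rw_owner == demo_current);
    rwp->rw_owner = 0;
    DEMO_SEM(rwp)->count = 0;
}

/* Non-zero only when the lock is write-held by the current task. */
static int demo_rw_write_held(demo_krwlock_t *rwp)
{
    return DEMO_SEM(rwp)->count < 0 && rwp->rw_owner == demo_current;
}
```

The held check is the point of the commit: a lock held by some other task must not report as held.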
Commits on Sep 18, 2009
  1. @behlendorf

    Reimplement rwlocks for Linux lock profiling/analysis.

    It turns out that the previous rwlock implementation worked well but
    did not integrate properly with the upstream kernel lock profiling/
    analysis tools.  This is a major problem since it would be awfully
    nice to be able to use the automatic lock checker and profiler.
    
    The problem is that the upstream lock tools use the pre-processor
    to create a lock class for each uniquely named lock.  Since the
    rwsem was embedded in a wrapper structure the name was always the
    same.  The effect was that we only ended up with one lock class for
    the entire SPL which caused the lock dependency checker to flag
    nearly everything as a possible deadlock.
    
    The solution was to directly map a krwlock to a Linux rwsem using
    a typedef, thereby eliminating the wrapper structure.  This was not
    done initially because the rwsem implementation is specific to the arch.
    To fully implement the Solaris krwlock API using only the provided rwsem
    API is not possible.  It can only be done by directly accessing some of
    the internal data members of the rwsem structure.
    
    For example, the Linux API provides a different function for dropping
    a reader vs writer lock.  Whereas the Solaris API uses the same function
    and the caller does not pass in what type of lock it is.  This means to
    properly drop the lock we need to determine if the lock is currently a
    reader or writer lock.  Then we need to call the proper Linux API function.
    Unfortunately, there is no provided API for this so we must extract this
    information directly from the arch specific lock implementation.  This is
    all doable, and what I did, but it does complicate things considerably.
    
    The good news is that in addition to the profiling benefits of this
    change, we may see performance improvements due to slightly reduced
    overhead when creating rwlocks and manipulating them.
    
    The only function I was forced to sacrifice was rw_owner() because this
    information is simply not stored anywhere in the rwsem.  Luckily this
    appears not to be a commonly used function on Solaris, and it is my
    understanding it is mainly used for debugging anyway.
    
    In addition to the core rwlock changes, extensive updates were made to
    the rwlock regression tests.  Each class of test was extended to provide
    more API coverage and to be more rigorous in checking for misbehavior.
    
    This is a pretty significant change and with that in mind I have been
    careful to validate it on several platforms before committing.  The full
    SPLAT regression test suite was run numerous times on all of the following
    platforms.  This includes various kernels ranging from 2.6.16 to 2.6.29.
    
    - SLES10   (ppc64)
    - SLES11   (x86_64)
    - CHAOS4.2 (x86_64)
    - RHEL5.3  (x86_64)
    - RHEL6    (x86_64)
    - FC11     (x86_64)
    behlendorf committed Sep 18, 2009
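
The lock class naming problem described in the commit above comes down to where the preprocessor stringifies the lock argument. The userspace sketch below models only that naming step, under stated assumptions: the demo_* names are hypothetical, and a real lock class also involves a per-call-site key that is not shown here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock that only records the class name it was given. */
struct demo_lock { const char *class_name; };

static void demo_init_named(struct demo_lock *l, const char *name)
{
    assert(l != NULL && name != NULL);
    l->class_name = name;
}

/* Native-style init: the name is the stringified macro argument. */
#define demo_native_init(l) demo_init_named((l), #l)

/*
 * Wrapper-structure style: every lock is initialized from this one
 * function, so the stringified name is always "&w->inner".
 */
struct demo_wrapper { struct demo_lock inner; };

static void demo_wrapped_init(struct demo_wrapper *w)
{
    demo_native_init(&w->inner);
}

/*
 * Direct-mapping style: the macro expands at each call site, so the
 * name is whatever expression the caller wrote.
 */
#define demo_direct_init(l) demo_native_init(l)
```

Two locks set up through demo_wrapped_init() end up with the same name, while locks set up through demo_direct_init() keep the names used at their call sites.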
Commits on Aug 14, 2009
  1. @behlendorf

    Various spec file tweaks to handle rpm building of several distros.

    Supported and tested distros now include SLES10, SLES11, Chaos 4.x,
    RHEL5, and Fedora 11.  This update was mainly to address rebuildable
    kernel module rpms, and correct rpm dependencies for each distro.
    behlendorf committed Aug 14, 2009
Commits on Aug 13, 2009
  1. @behlendorf

    Explicit check for requires_* rpm defines

    Due to different distros and/or versions of rpm mishandling the shorthand
    syntax, simply use the longer version which gets interpreted correctly.
    behlendorf committed Aug 13, 2009
Commits on Aug 4, 2009
  1. @behlendorf

    Tag spl-0.4.5.

    Update the ChangeLog with a summary of the changes since the last release
    and update the META file to reflect the new version number.
    behlendorf committed Aug 4, 2009
Commits on Jul 31, 2009
  1. @behlendorf
Commits on Jul 30, 2009
  1. @behlendorf

    Disable stack overflow checking by default.

    The run time stack overflow checking is being disabled by default
    because it is not safe for use with 2.6.29 and later kernels.  These
    kernels do now have their own stack overflow checking so this support
    has become redundant anyway.  It can be re-enabled for older kernels or
    arches without stack overflow checking by redefining CHECK_STACK().
    behlendorf committed Jul 30, 2009
Commits on Jul 28, 2009
  1. @behlendorf

    Update global_page_state() support for 2.6.29 kernels.

    Basically everything we need to monitor the global memory state of
    the system is now cleanly available via global_page_state().  The
    problem is that this interface is still fairly recent, and there
    has been one change in the page state enum which we need to handle.
    These changes basically boil down to the following:
    - If global_page_state() is available we should use it.  Several
      autoconf checks have been added to detect the correct enum names.
    - If global_page_state() is not available check to see if
      get_zone_counts() symbol is available and use that.
    - If the get_zone_counts() symbol is not exported we have no choice
      but to dynamically acquire it at load time.  This is an absolute
      last resort for old kernels which we don't want to patch to
      cleanly export the symbol.
    behlendorf committed Jul 28, 2009
  2. @behlendorf

    Remove get/put_task_struct as they are not available for SLES11

    This interface is going away, and it's not as if most callers actually
    use crhold/crfree when working with credentials.  So it'll be okay
    that we're not taking a reference on the task structure; the odds of
    it going away while working with a credential are pretty small.
    behlendorf committed Jul 28, 2009