Commits on Jul 25, 2011
  1. LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

    committed
    Change-Id: Iea0556a006f8826f8597824131fb5110a848c434
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  2. LU-407 Fix lustre-modules rpm name in spec file.

    committed
    RHEL and Fedora kernels include ".%{_arch}" in the kernel's version
    string.  Since Lustre includes the kernel version in its rpm names,
    the architecture name can appear twice at the end: one copy comes
    from the kernel and one from Lustre.  Trying to change that is
    probably more trouble than it is worth.
    
    This seems to cause problems with rpm's understanding of rpm names.  To
    fix that this patch takes Andreas's suggestion of grep'ing the full
    rpm name from "rpm -q kernel-modules".
    
    We also add "|| true" to the last line of the %preun scriptlet so that
    failure does not abort installation.
    
    Finally, we also remove the redundant greps in the postun scriptlet.
    
    Change-Id: I2c71e853e28ec6e0907eb4ea7c3205ca6e5dd873
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
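
    The lookup described in the LU-407 message above can be sketched as
    follows. This is Python rather than the spec file's shell, and it is
    not the actual scriptlet: the function name, the sample package names,
    and the kernel version string are all hypothetical.

```python
from typing import Optional


def find_full_rpm_name(query_output: str, kernel_version: str) -> Optional[str]:
    """Return the first package name that contains kernel_version.

    Mimics grepping the output of an rpm query for the kernel version,
    so the full name (including any doubled architecture suffix) is
    taken from rpm itself instead of being rebuilt by hand.
    """
    for line in query_output.splitlines():
        name = line.strip()
        if name and kernel_version in name:
            return name
    return None


# Hypothetical rpm query output; real package names will differ.
sample = """\
lustre-modules-2.0.65-2.6.32_71.el6.x86_64.x86_64
lustre-modules-2.0.65-2.6.32_131.el6.x86_64.x86_64
"""

full_name = find_full_rpm_name(sample, "2.6.32_131.el6.x86_64")
print(full_name)
```

    Returning None when nothing matches corresponds to the case the
    "|| true" guard covers in the scriptlet: a failed lookup should not
    stop the rpm operation.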
Commits on Jul 23, 2011
  1. @prakashsurya

    LU-455 Replace DIST_SOURCES with EXTRA_DIST

    prakashsurya committed with
    Resolve autoconf-2.63 warnings mainly by replacing DIST_SOURCES
    with EXTRA_DIST.  Additionally, the non-portable $(shell) construct
    was removed from tests/module.mk and the sources simply enumerated.
    Finally, the incorrect EXTRA_SOURCES instance was removed from
    the top level autoMakefile.am.
    
      Running automake-1.11 -a -c -W no-portability
      libcfs/libcfs/autoMakefile.am:92:
              variable `DIST_SOURCES' is defined but no program or
      libcfs/libcfs/autoMakefile.am:92:
              library has `DIST' as canonical name (possible typo)
      lnet/klnds/mxlnd/autoMakefile.am:44:
              variable `DIST_SOURCES' is defined but no program or
      lnet/klnds/mxlnd/autoMakefile.am:44:
              library has `DIST' as canonical name (possible typo)
      ...
    
    Also, as a result of the above changes, libcfs/libcfs/autoMakefile.am had to be
    modified in order to allow 'make dist' to succeed.
    
    As it turns out, libcfsutils_a_SOURCES was incorrect. That list contained
    references to nonexistent 'util/[parser|platform].h' files. The assumption is
    it intended to reference libcfs/include/libcfs/util/[parser|platform].h.
    
    To fix the issue, both [parser|platform].h references were removed from the
    list. This produces a simple solution that maintains consistency with the rest
    of the build system. The caveat is that libcfsutil.a won't automatically be
    rebuilt if either of the intended [parser|platform].h files is modified.
    
    Change-Id: Ia81eb1e3fc219f6dac4c7da234f7e736754c5440
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    Signed-off-by: Prakash Surya <surya1@llnl.gov>
Commits on Jul 22, 2011
  1. Revert "LU-455 Replace DIST_SOURCES with EXTRA_DIST"

    committed
    This reverts commit 0749f65.
  2. @prakashsurya

    LU-455 Replace DIST_SOURCES with EXTRA_DIST

    prakashsurya committed with
    Resolve autoconf-2.63 warnings mainly by replacing DIST_SOURCES
    with EXTRA_DIST.  Additionally, the non-portable $(shell) construct
    was removed from tests/module.mk and the sources simply enumerated.
    Finally, the incorrect EXTRA_SOURCES instance was removed from
    the top level autoMakefile.am.
    
      Running automake-1.11 -a -c -W no-portability
      libcfs/libcfs/autoMakefile.am:92:
              variable `DIST_SOURCES' is defined but no program or
      libcfs/libcfs/autoMakefile.am:92:
              library has `DIST' as canonical name (possible typo)
      lnet/klnds/mxlnd/autoMakefile.am:44:
              variable `DIST_SOURCES' is defined but no program or
      lnet/klnds/mxlnd/autoMakefile.am:44:
              library has `DIST' as canonical name (possible typo)
      ...
    
    Also, as a result of the above changes, libcfs/libcfs/autoMakefile.am had to be
    modified in order to allow 'make dist' to succeed.
    
    As it turns out, libcfsutils_a_SOURCES was incorrect. That list contained
    references to nonexistent 'util/[parser|platform].h' files. The assumption is
    it intended to reference libcfs/include/libcfs/util/[parser|platform].h.
    
    To fix the issue, both [parser|platform].h references were removed from the
    list. This produces a simple solution that maintains consistency with the rest
    of the build system. The caveat is that libcfsutil.a won't automatically be
    rebuilt if either of the intended [parser|platform].h files is modified.
    
    Change-Id: Ia81eb1e3fc219f6dac4c7da234f7e736754c5440
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    Signed-off-by: Prakash Surya <surya1@llnl.gov>
    
    Conflicts:
    
    	autoMakefile.am
  3. @prakashsurya

    Revert "LU-455 Replace DIST_SOURCES with EXTRA_DIST"

    prakashsurya committed with
    This reverts commit bb60c75.
Commits on Jul 19, 2011
  1. LU-394: LND failure caused by discontiguous KIOV

    Jinshan Xiong committed with
    This issue was introduced by bug 18881, where I moved the urgent
    pages to the front of lop_pending to fix a deadlock issue.
    I reverted the bug 18881 change in this patch and came up with a new
    solution: cl_page_gang_lookup() only blocks on the first page. This
    also avoids deadlock, since we should never grab multiple pages'
    locks without a try method.
    
    Change-Id: I5dce35e3929e4f79a350e56ddc9e752269db060e
    Signed-off-by: Jinshan Xiong <jay@whamcloud.com>
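
    The locking rule in the LU-394 message above (block only on the first
    page, use a try method for the rest) can be sketched generically. This
    is not the cl_page_gang_lookup() code; it uses Python's threading.Lock
    as a stand-in for page locks, and lock_page_run is a hypothetical name.

```python
import threading


def lock_page_run(page_locks):
    """Lock a contiguous run of pages without risking deadlock.

    Blocks only on the first lock. Every later lock is attempted with a
    non-blocking try, and the run stops at the first lock that is busy.
    Returns how many locks were acquired (a prefix of page_locks).
    """
    if not page_locks:
        return 0
    page_locks[0].acquire()  # the only place a blocking wait is allowed
    taken = 1
    for lock in page_locks[1:]:
        if not lock.acquire(blocking=False):
            break  # someone else holds it: give up rather than wait
        taken += 1
    return taken


locks = [threading.Lock() for _ in range(4)]
locks[2].acquire()  # simulate another thread holding the third page
taken = lock_page_run(locks)
print(taken)  # prints 2
```

    Because the caller never waits while already holding a lock, two
    threads walking overlapping ranges cannot end up waiting on each other.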
  2. LU-481 Don't store 'transient' page in radix tree

    Niu Yawei committed with
    - We currently store both 'transient' pages and inode pages in the
      same radix tree, which causes trouble for the race handling
      of concurrent dio and buffered read. Imagine the following case:
    
      dio creates a 'transient' page for a given file offset in the
      radix tree while a concurrent buffered read on the same offset
      happens. The reader tries to find the existing cached page by
      searching the radix tree, but finds the 'transient' page instead,
      and the read fails with -EBUSY at the end.
    
      To make the situation worse, there are two levels of radix trees for
      a given file page (object and sub-object). The above race can happen
      at both levels, and we have to keep the page type consistent
      between these two levels.
    
      Actually, it doesn't make sense to store these disposable 'transient'
      pages in the radix tree, so we just remove them in this patch.
    
    - In cl_page_alloc(), if the coo_page_init() fails, we should call
      cl_page_delete0() to break the linkage between vmpage and cl_page
      before calling cl_page_free().
    
    Signed-off-by: Niu Yawei <niu@whamcloud.com>
    Change-Id: If2fa85495e6e78b330571b3348ac55a7796a4a9f
    Reviewed-on: http://review.whamcloud.com/1072
    Tested-by: Hudson
    Tested-by: Maloo <whamcloud.maloo@gmail.com>
    Reviewed-by: Jinshan Xiong <jay@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
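
    The idea in the first bullet of the LU-481 message above can be shown
    with a toy model. This is not Lustre code: a dict stands in for the
    radix tree, and the PageCache class and its methods are hypothetical.

```python
class PageCache:
    """Toy page cache: only cached (inode) pages are ever indexed."""

    def __init__(self):
        self._tree = {}  # stand-in for the radix tree: offset -> page data

    def add_cached(self, offset, data):
        """Insert a cached page so later buffered reads can find it."""
        self._tree[offset] = data

    def lookup(self, offset):
        """Buffered-read lookup; sees cached pages only."""
        return self._tree.get(offset)

    def direct_io(self, offset, data):
        """Direct I/O uses a transient page that is never put in the tree."""
        transient = {"offset": offset, "data": data, "transient": True}
        # The transient page is dropped when this call returns.
        return len(transient["data"])


cache = PageCache()
written = cache.direct_io(0, b"abc")
print(cache.lookup(0))  # the transient page is invisible to readers
```

    Since the transient page is never indexed, a concurrent buffered read
    at the same offset cannot find it by mistake.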
  3. @prakashsurya

    LU-488 ptlrpc_connection_put() LASSERT(!cfs_hlist_unhashed(&conn->c_h…

    Lai Siyao committed with prakashsurya
    …ash))
    
    The connection hash may be rehashed while ptlrpc_connection_put() is
    being called, so ASSERT &conn->c_refcount > 1 instead.
    
    Signed-off-by: Lai Siyao <laisiyao@whamcloud.com>
    Change-Id: Iec6d35419e0c4d8497bd0b84c6210abc8eb23882
Commits on Jul 14, 2011
  1. @behlendorf

    LU-455 Replace DIST_SOURCES with EXTRA_DIST

    behlendorf committed with
    Resolve autoconf-2.63 warnings mainly by replacing DIST_SOURCES
    with EXTRA_DIST.  Additionally, the non-portable $(shell) construct
    was removed from tests/module.mk and the sources simply enumerated.
    Finally, the incorrect EXTRA_SOURCES instance was removed from
    the top level autoMakefile.am.
    
      Running automake-1.11 -a -c -W no-portability
      libcfs/libcfs/autoMakefile.am:92:
              variable `DIST_SOURCES' is defined but no program or
      libcfs/libcfs/autoMakefile.am:92:
              library has `DIST' as canonical name (possible typo)
      lnet/klnds/mxlnd/autoMakefile.am:44:
              variable `DIST_SOURCES' is defined but no program or
      lnet/klnds/mxlnd/autoMakefile.am:44:
              library has `DIST' as canonical name (possible typo)
      ...
    
    Change-Id: Ia81eb1e3fc219f6dac4c7da234f7e736754c5440
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Commits on Jul 9, 2011
  1. b=18066 Tune adaptive timeouts

    committed
    Simul test 4 causes soft lockups on OSTs (llnl bug 761).  Because of this
    we need to increase at_min to 15 seconds to avoid this failure when
    we are dealing with roughly 10,000 pending locks.  Once this
    algorithm is improved we may be able to remove this.
    
    We have also increased at_early_margin to 10 seconds because we do
    see what looks like excessive network transit times and this gives
    us a bit larger hedge against that.
    
    Change-Id: I3128edcf04d44c6e1d95793ae7e72ea872280089
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Jim Garlick <garlick@llnl.gov>
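
    A minimal sketch of what raising at_min does, assuming at_min acts as
    a lower bound on the adaptively estimated timeout. The function name
    is hypothetical, and the upper bound at_max_s=600 is an assumed value
    used only for illustration; it is not taken from this commit.

```python
def clamp_timeout(estimate_s, at_min_s=15, at_max_s=600):
    """Clamp an adaptive timeout estimate to [at_min_s, at_max_s].

    at_min_s=15 matches the value this commit chooses; at_max_s is an
    assumption for the sake of the example.
    """
    return max(at_min_s, min(estimate_s, at_max_s))


for estimate in (3, 100, 10_000):
    print(estimate, "->", clamp_timeout(estimate))
```

    With the floor at 15 seconds, a very optimistic estimate taken while
    the server is idle can no longer produce a timeout that is too short
    once thousands of locks are pending.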
  2. b=18066 Tune fsfilt slow disk transaction warning

    committed
    Sun bug 18066, attachment 20976
    
    Increase threshold for 'slow' disk transaction warnings to the console.
    By default this is set to 30 seconds, which is far too low.  On hype/igs
    with 500+ service threads per OSS and a heavy load, it's not uncommon to
    see 100+ second service delays.  We only want to see these warnings on
    the console when things are moving very slowly; these errors will still
    go to the internal debug log.
    
    We should use INFO/WARN instead of WARN/ERROR for the slow messages.
    Not only is there no real error here but it fixes an annoying quirk
    of the message formatting.  With the old levels you would see the
    messages formatted differently based on the time.
    
      Lustre: lc1-OST0001: slow parent lock 289s due to heavy IO load
      LustreError: 0-0: lc1-OST0001: slow parent lock 324s due to heavy IO load
    
    With the new levels things are more consistent.
    
      Lustre: lc1-OST0001: slow parent lock 289s due to heavy IO load
      Lustre: lc1-OST0001: slow parent lock 324s due to heavy IO load
    
    Change-Id: Ieae13875f7baf32ad7d17661cedf4ec3e2f92320
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  3. Increase the watchdog factors

    committed
    Under a heavy IO load it is not uncommon to see service threads take
    200-450 seconds to complete due to slow journal starts.  This patch
    increases the watchdog factors to the point where we should only get
    watchdogs for real deadlocks.
    
    Change-Id: I24f62c763feab36852d7025584d22eb728852a6b
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  4. LU-429 Fix and quiet debug message in filter_connect

    committed
    The comment refers to mds_connect(), which no longer exists.  So I
    just removed the comment.
    
    The message says "Received MDS connection", but this function is
    used when clients connect to OSTs as well.  We don't need to see
    this message on the console all of the time, so I changed it to
    D_INFO.
    
    Change-Id: Iebfbaafe39df702862ade126b979528855173d5c
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  5. LU-407 Fix lustre-modules rpm name in spec file.

    committed
    On RHEL systems rpm names end in .%{_arch}.  Without the correct name,
    the rpm command does not find the package, and the %preun script exits
    with an error and prevents rpm package removal (without a force option).
    
    Change-Id: I2c71e853e28ec6e0907eb4ea7c3205ca6e5dd873
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  6. @nedbass

    LU-107 Add scripts for implementing heartbeat v1 failover

    nedbass committed with
    /usr/sbin/ldev - list devices, determine validity, etc.
    /usr/sbin/lhbadm - wrapper for heartbeat utils for failover/failback/status
    /etc/ha.d/resource.d/Lustre - heartbeat resource agent (wraps init script)
    /etc/init.d/lustre - lustre init script
    /etc/init.d/lnet - lnet init script
    /usr/sbin/haconfig - helper script for building heartbeat config files
    
    The scripts use two configuration files:
     /etc/ldev.conf - maps hostnames to failover partners, devices, and labels
     /etc/nids - hostnames to lustre NIDS
    
    In addition to heartbeat support, the ldev script enables parallel
    execution of commands against all luns configured on a server.  The
    lustre init script supports devices backed by Linux software RAID, ZFS,
    or traditional block devices.
    
    NOTE: these scripts presume the udev rules for persistent block device
    naming are in place, in particular that lustre labels can be mapped to
    block devices in /dev/disk/by-id.
    
    Change-Id: I8391744ce6eed989c061f131aca4a2da7b5d51b2
    Signed-off-by: Ned Bass <bass6@llnl.gov>
  7. LU-278 Improve regex for output of git describe

    committed
    Improve the regex in version_tag-git.pl to properly parse the output
    when the current commit is tagged, and when the tag names are longer
    than just the digits used in the upstream version tags.
    
    Addition from Ned:
    
    Trim trailing newline from git describe output in the
    case where the current commit is tagged.
    
    Change-Id: I0a6a1b54f9e2dbd381a6d00f9dcf47d9b8b66616
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
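
    The two cases named in the LU-278 message above (exactly tagged commit,
    and tags longer than plain digits) can be sketched as follows. This is
    not the regex from version_tag-git.pl, which is Perl and is not shown
    here; the pattern, the function name, and the sample tag names are
    assumptions. It relies only on the standard "git describe" output
    shape: the tag alone, or tag-N-gHASH.

```python
import re

# Tag, optionally followed by "-<commits since tag>-g<abbreviated hash>".
DESCRIBE_RE = re.compile(
    r"^(?P<tag>.+?)(?:-(?P<count>\d+)-g(?P<sha>[0-9a-f]+))?$"
)


def parse_describe(output):
    """Split git-describe output into (tag, commits_since_tag, sha).

    The trailing newline is stripped first. When the current commit is
    itself tagged there is no suffix, so the count is 0 and sha is None.
    Returns None for empty input.
    """
    match = DESCRIBE_RE.match(output.strip())
    if match is None:
        return None
    count = int(match.group("count")) if match.group("count") else 0
    return match.group("tag"), count, match.group("sha")


# Hypothetical downstream tag with an extension after the version digits.
print(parse_describe("2.0.65-llnl1\n"))
print(parse_describe("2.0.65-llnl1-12-gabc1234"))
```

    The lazy tag group together with the anchored optional suffix lets a
    tag contain dashes of its own without being mistaken for the
    commit-count field.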
  8. Add BuildRequires lustre-ldiskfs-devel for chaos 5

    committed
    Change-Id: I9f6e088322aee6e73e2ab7b11020f8863b7d010b
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  9. Improve lustre.spec support for RHEL6

    committed
    Add appropriate Requires and BuildRequires.
    Make spec search /usr/src/kernels for the files installed by
    RHEL's kernel-devel package, instead of looking in
    /lib/modules/`uname -r`.  This will allow lustre to build in a
    minimal mock environment.
    
    Change-Id: Ided90609c0fe5b187d1787f796bf1369f5c8016c
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  10. LU-278 Don't abort configure if tag is of unrecognized form.

    committed
    The configure process should NOT abort just because the most
    recent tag is not of the form that upstream uses to tag Lustre.
    Downstream developers may use their own tags, or just add
    extensions to upstream's version tags.
    
    Change-Id: I9a98bfd4475d3df2694f536ba0352779a62650c7
  11. @nedbass

    Fix for building ldiskfs with weak updates

    nedbass committed with
    With ldiskfs packaged for weak updates support, the file glob used to
    locate the lustre-ldiskfs-devel directory,
    /usr/src/lustre-ldiskfs-*/$LINUXRELEASE, will not work if ldiskfs was
    built against a kernel version other than $LINUXRELEASE.  Instead we
    just use the newest installed version of lustre-ldiskfs-devel.
  12. @nedbass

    LU-227 external lustre-ldiskfs package integration

    nedbass committed with
    Add a configure option --with-ldiskfs-devel to build Lustre against the
    externally maintained lustre-ldiskfs package.  This package maintains
    ldiskfs as a standalone codebase rather than copying and patching ext
    sources from the kernel.  The intent is to ease the burden of porting
    Lustre to new kernels by supporting multiple kernel APIs within ldiskfs
    with the help of autoconf. Thus we eliminate the need to maintain a
    separate patch stack for each supported kernel.
    
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Change-Id: Iaa17d0483e155439c8de206af81f19a321bbeaa1
  13. @nedbass

    LU-251 Fix gcc configure warnings

    nedbass committed with
    Newer versions of gcc are getting smart enough to detect the sloppy
    syntax used for the autoconf tests.  It is now generating warnings for
    unused or uninitialized variables.  Newer versions of gcc even have the
    -Wunused-but-set-variable option set by default.  This isn't a problem
    except when -Werror is set and they get promoted to an error.  In this
    case the autoconf test will return an incorrect result which will result
    in a build failure or runtime error later on.
    
    To handle this I'm tightening up many of the autoconf tests to
    explicitly mark variables as unused to suppress the gcc warning.  Tests
    emitting uninitialized variable errors are updated to initialize
    pointers to NULL, and some variables are converted to pointers to
    accommodate this.  'Argument makes integer from pointer without a cast'
    errors were fixed by passing 0 for the offending argument in cases where
    we are not explicitly testing the argument type.  0 is accepted as both
    an integer and a pointer.
    
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Change-Id: Idaa04b04308e3cd994b0d802a5ee1eb5c90f9be6
  14. Increase lnet hash table by 4X.

    committed
    With the number of peers we see in production, we really think this
    should be larger.
    
    Change-Id: Ia6f43caed4f3c89316fe768c2347cdccc163e7ce
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  15. b=23352 don't close connection for timedout ZC_REQ

    Liang Zhen committed with
    Change-Id: Id863c6d9e5180b3dcc8612f6be3e0da9f419c193
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  16. Enable asymmetric router failure detection by default

    committed
    Change-Id: I17be5f605e76c256fff4eebabe3a2781ecad8074
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
  17. b=21103 Make asymmetric router failure parameters tunable

    Joseph Herring committed with
    Change-Id: Ie36f79d01c35d4c11c4532187abdeb9473ea60b4
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Commits on Jul 8, 2011
  1. New version 2.0.65

    Oleg Drokin committed
    Change-Id: Iab44e6d976cf13d12d525d16c8aeaf2c4bc2d423
    Signed-off-by: Oleg Drokin <green@whamcloud.com>
  2. LU-469 Build with Lustre's own kernel config file.

    yangsheng committed with Oleg Drokin
    -- Use Lustre's own rhel6 kernel config files in lbuild.
    -- Rebase config files in latest vendor update.
    -- Correct config file diff output.
    
    Change-Id: I9ee5115009de5bcb00084dcac2e204774178a8c1
    Signed-off-by: Yang Sheng <ys@whamcloud.com>
    Reviewed-on: http://review.whamcloud.com/1034
    Tested-by: Hudson
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
  3. LU-479 sanity 124a failed

    Lai Siyao committed with Oleg Drokin
    add message to help debug.
    
    Signed-off-by: Lai Siyao <laisiyao@whamcloud.com>
    Change-Id: I4e675ed3eeb509c9ba4f1c25420596c0d5357459
    Reviewed-on: http://review.whamcloud.com/1052
    Tested-by: Hudson
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
  4. LU-136 change "force_over_16tb" mount option to "force_over_128tb"

    Yu Jian committed with Oleg Drokin
    Change the "force_over_16tb" mount option to "force_over_128tb" and
    rename the ext4-force_over_16tb-*.patch to ext4-force_over_128tb-*.patch
    after testing and validating the 128TB LUN.
    
    Signed-off-by: Yu Jian <yujian@whamcloud.com>
    Change-Id: I19c73280cf2934112aefab8976d7eac18915529a
    Reviewed-on: http://review.whamcloud.com/1073
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Tested-by: Hudson
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
  5. LU-477 allocate memory for s_group_desc and s_group_info by vmalloc()

    Yu Jian committed with Oleg Drokin
    Large kmalloc() for sbi->s_group_desc and sbi->s_group_info can fail
    for large filesystems, which will cause the "not enough memory" error
    while mounting. This patch makes it fall back to vmalloc() if
    kmalloc() fails, as was done for sbi->s_flex_groups.
    
    To avoid colliding with a valid on-disk inode number, EXT4_BAD_INO
    is used as the number of the buddy cache inode.
    
    The patch also incorporates the following upstream kernel fix:
    
    commit	32a9bb57d7c1fd04ae0f72b8f671501f000a0e9f
    ext4: fix missing iput of root inode for some mount error paths
    https://bugzilla.kernel.org/show_bug.cgi?id=26752
    
    Signed-off-by: Yu Jian <yujian@whamcloud.com>
    Change-Id: I3950425835ea7f2968ceb2edbc622e3ff3ed8545
    Reviewed-on: http://review.whamcloud.com/1071
    Tested-by: Hudson
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Tested-by: Maloo <whamcloud.maloo@gmail.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
  6. LU-168 Fix schedule race in sanityn PDO lock tests

    nasf committed with Oleg Drokin
    In sanityn PDO lock tests, even if the second operation is blocked by
    the first one on server-side, after the blocking, the second one may
    be finished earlier than the first one because of client-side schedule
    order. So sleep a sec before check_pdo_conflict() to ensure the first
    operation is finished after OBD_FAIL_MDS_PDO_LOCK barriers.
    
    Change-Id: I62412d74e17be012ee6660179ad77375c196671d
    Signed-off-by: nasf <yong.fan@whamcloud.com>
    Reviewed-on: http://review.whamcloud.com/1030
    Tested-by: Hudson
    Reviewed-by: Mikhail Pershin <tappro@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>