Commits on Jul 25, 2011
  1. LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

    Change-Id: Iea0556a006f8826f8597824131fb5110a848c434
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed Jul 20, 2011
  2. LU-407 Fix lustre-modules rpm name in spec file.

    RHEL and Fedora kernels include ".%{_arch}" in the kernel's version
    string.  Since Lustre includes the kernel version in its rpms, it
    can appear that a double architecture name appears at the end.  Really
    one is from the kernel, and one is from Lustre.  Trying to change that
    is probably more trouble than it is worth.
    
    This seems to cause problems with rpm's understanding of rpm names.  To
    fix that this patch takes Andreas's suggestion of grep'ing the full
    rpm name from "rpm -q kernel-modules".
    
    We also add "|| true" to the last line of the %preun scriptlet so that
    failure does not abort installation.
    
    Finally, we also remove the redundant greps in the postun scriptlet.
    
    Change-Id: I2c71e853e28ec6e0907eb4ea7c3205ca6e5dd873
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed Jun 9, 2011
  3. Revert "LU-407 Fix lustre-modules rpm name in spec file."

    This reverts commit 4d96a5b.
    committed Jul 25, 2011
Commits on Jul 23, 2011
  1. LU-455 Replace DIST_SOURCES with EXTRA_DIST

    Resolve autoconf-2.63 warnings mainly by replacing DIST_SOURCES
    with EXTRA_DIST.  Additionally, the non-portable $(shell) construct
    was removed from tests/module.mk and the sources simply enumerated.
    Finally, the incorrect EXTRA_SOURCES instance was removed from
    the top level autoMakefile.am.
    
      Running automake-1.11 -a -c -W no-portability
      libcfs/libcfs/autoMakefile.am:92:
              variable `DIST_SOURCES' is defined but no program or
      libcfs/libcfs/autoMakefile.am:92:
              library has `DIST' as canonical name (possible typo)
      lnet/klnds/mxlnd/autoMakefile.am:44:
              variable `DIST_SOURCES' is defined but no program or
      lnet/klnds/mxlnd/autoMakefile.am:44:
              library has `DIST' as canonical name (possible typo)
      ...
    
    Also, as a result of the above changes, libcfs/libcfs/autoMakefile.am had to be
    modified in order to allow 'make dist' to succeed.
    
    As it turns out, libcfsutils_a_SOURCES was incorrect. That list contained
    references to nonexistent 'util/[parser|platform].h' files. The assumption is
    it intended to reference libcfs/include/libcfs/util/[parser|platform].h.
    
    To fix the issue, both [parser|platform].h references were removed from the
    list. This produces a simple solution that maintains consistency with the rest
    of the build system. The caveat being that libcfsutil.a won't automatically be
    rebuilt if either of the intended [parser|platform].h files are modified.
    
    Change-Id: Ia81eb1e3fc219f6dac4c7da234f7e736754c5440
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    Signed-off-by: Prakash Surya <surya1@llnl.gov>
    prakashsurya committed with Jul 22, 2011
Commits on Jul 22, 2011
  1. Revert "LU-455 Replace DIST_SOURCES with EXTRA_DIST"

    This reverts commit 0749f65.
    committed Jul 22, 2011
  2. LU-455 Replace DIST_SOURCES with EXTRA_DIST

    Resolve autoconf-2.63 warnings mainly by replacing DIST_SOURCES
    with EXTRA_DIST.  Additionally, the non-portable $(shell) construct
    was removed from tests/module.mk and the sources simply enumerated.
    Finally, the incorrect EXTRA_SOURCES instance was removed from
    the top level autoMakefile.am.
    
      Running automake-1.11 -a -c -W no-portability
      libcfs/libcfs/autoMakefile.am:92:
              variable `DIST_SOURCES' is defined but no program or
      libcfs/libcfs/autoMakefile.am:92:
              library has `DIST' as canonical name (possible typo)
      lnet/klnds/mxlnd/autoMakefile.am:44:
              variable `DIST_SOURCES' is defined but no program or
      lnet/klnds/mxlnd/autoMakefile.am:44:
              library has `DIST' as canonical name (possible typo)
      ...
    
    Also, as a result of the above changes, libcfs/libcfs/autoMakefile.am had to be
    modified in order to allow 'make dist' to succeed.
    
    As it turns out, libcfsutils_a_SOURCES was incorrect. That list contained
    references to nonexistent 'util/[parser|platform].h' files. The assumption is
    it intended to reference libcfs/include/libcfs/util/[parser|platform].h.
    
    To fix the issue, both [parser|platform].h references were removed from the
    list. This produces a simple solution that maintains consistency with the rest
    of the build system. The caveat being that libcfsutil.a won't automatically be
    rebuilt if either of the intended [parser|platform].h files are modified.
    
    Change-Id: Ia81eb1e3fc219f6dac4c7da234f7e736754c5440
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    Signed-off-by: Prakash Surya <surya1@llnl.gov>
    
    Conflicts:
    
    	autoMakefile.am
    prakashsurya committed with Jul 22, 2011
  3. Revert "LU-455 Replace DIST_SOURCES with EXTRA_DIST"

    This reverts commit bb60c75.
    prakashsurya committed with Jul 22, 2011
Commits on Jul 19, 2011
  1. LU-394: LND failure caused by discontiguous KIOV

    This issue was introduced by bug 18881, where I moved the urgent
    pages to the front of lop_pending to fix a deadlock issue.
    I reverted bug 18881 in this patch and came up with a new solution:
    cl_page_gang_lookup() only blocks on the first page. This also
    avoids deadlock, since we should never grab multiple pages' locks
    without using a try method.
    
    Change-Id: I5dce35e3929e4f79a350e56ddc9e752269db060e
    Signed-off-by: Jinshan Xiong <jay@whamcloud.com>
    Jinshan Xiong committed with Jul 12, 2011
  2. LU-481 Don't store 'transient' page in radix tree

    - We currently store both 'transient' pages and inode pages in the
      same radix tree, which causes trouble when handling the race
      between concurrent dio and buffered read. Imagine the following case:
    
      dio creates a 'transient' page for a given file offset in the
      radix tree while a concurrent buffered read on the same offset
      happens. The reader tries to find the existing cached page by
      searching the radix tree, but it finds the 'transient' page
      instead, and the read eventually fails with -EBUSY.
    
      To make the situation worse, there are two levels of radix trees for
      a given file page (object and sub-object). The above race can happen
      at both levels, and we have to keep the page type consistent
      between these two levels.
    
      Actually, it doesn't make sense to store these disposable 'transient'
      pages in the radix tree, so we just remove them in this patch.
    
    - In cl_page_alloc(), if the coo_page_init() fails, we should call
      cl_page_delete0() to break the linkage between vmpage and cl_page
      before calling cl_page_free().
    
    Signed-off-by: Niu Yawei <niu@whamcloud.com>
    Change-Id: If2fa85495e6e78b330571b3348ac55a7796a4a9f
    Reviewed-on: http://review.whamcloud.com/1072
    Tested-by: Hudson
    Tested-by: Maloo <whamcloud.maloo@gmail.com>
    Reviewed-by: Jinshan Xiong <jay@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Niu Yawei committed with Jul 8, 2011
  3. LU-488 ptlrpc_connection_put() LASSERT(!cfs_hlist_unhashed(&conn->c_h…

    …ash))
    
    The connection hash may be rehashed while ptlrpc_connection_put() is
    called, so ASSERT &conn->c_refcount > 1 instead.
    
    Signed-off-by: Lai Siyao <laisiyao@whamcloud.com>
    Change-Id: Iec6d35419e0c4d8497bd0b84c6210abc8eb23882
    Lai Siyao committed with prakashsurya Jul 8, 2011
Commits on Jul 14, 2011
  1. LU-455 Replace DIST_SOURCES with EXTRA_DIST

    Resolve autoconf-2.63 warnings mainly by replacing DIST_SOURCES
    with EXTRA_DIST.  Additionally, the non-portable $(shell) construct
    was removed from tests/module.mk and the sources simply enumerated.
    Finally, the incorrect EXTRA_SOURCES instance was removed from
    the top level autoMakefile.am.
    
      Running automake-1.11 -a -c -W no-portability
      libcfs/libcfs/autoMakefile.am:92:
              variable `DIST_SOURCES' is defined but no program or
      libcfs/libcfs/autoMakefile.am:92:
              library has `DIST' as canonical name (possible typo)
      lnet/klnds/mxlnd/autoMakefile.am:44:
              variable `DIST_SOURCES' is defined but no program or
      lnet/klnds/mxlnd/autoMakefile.am:44:
              library has `DIST' as canonical name (possible typo)
      ...
    
    Change-Id: Ia81eb1e3fc219f6dac4c7da234f7e736754c5440
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    behlendorf committed with Jul 13, 2011
Commits on Jul 9, 2011
  1. b=18066 Tune adaptive timeouts

    Simul test 4 causes soft lockups on OSTs (llnl bug 761).  Because of this
    we need to increase at_min to 15 seconds to avoid this failure when
    we are dealing with roughly 10,000 pending locks.  Once this
    algorithm is improved we may be able to remove this.
    
    We have also increased at_early_margin to 10 seconds because we do
    see what looks like excessive network transit times and this gives
    us a bit larger hedge against that.
    
    Change-Id: I3128edcf04d44c6e1d95793ae7e72ea872280089
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Jim Garlick <garlick@llnl.gov>
    committed Dec 18, 2009
  2. b=18066 Tune fsfilt slow disk transaction warning

    Sun bug 18066, attachment 20976
    
    Increase threshold for 'slow' disk transaction warnings to the console.
    By default this is set to 30 seconds, which is far too low.  On hype/igs
    with 500+ service threads per OSS and a heavy load it's not uncommon to
    see 100+ second service delays.  We only want to see these warnings on
    the console when things are moving very slowly; the messages will still
    go to the internal debug log.
    
    We should use INFO/WARN instead of WARN/ERROR for the slow messages.
    Not only is there no real error here but it fixes an annoying quirk
    of the message formatting.  With the old levels you would see the
    messages formatted differently based on the time.
    
      Lustre: lc1-OST0001: slow parent lock 289s due to heavy IO load
      LustreError: 0-0: lc1-OST0001: slow parent lock 324s due to heavy IO load
    
    With the new levels things are more consistent.
    
      Lustre: lc1-OST0001: slow parent lock 289s due to heavy IO load
      Lustre: lc1-OST0001: slow parent lock 324s due to heavy IO load
    
    Change-Id: Ieae13875f7baf32ad7d17661cedf4ec3e2f92320
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed Dec 18, 2009
  3. Increase the watchdog factors

    Under a heavy IO load it is not uncommon to see service threads take
    200-450 seconds to complete due to slow journal starts.  This patch
    increases the watchdog factors to the point where we should only get
    watchdogs for real deadlocks.
    
    Change-Id: I24f62c763feab36852d7025584d22eb728852a6b
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed Dec 18, 2009
  4. LU-429 Fix and quiet debug message in filter_connect

    The comment refers to mds_connect(), which no longer exists.  So I
    just removed the comment.
    
    The message says "Received MDS connection", but this function is
    used when clients connect to OSTs as well.  We don't need to see
    this message on the console all of the time, so I changed it to
    D_INFO.
    
    Change-Id: Iebfbaafe39df702862ade126b979528855173d5c
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed Jun 17, 2011
  5. LU-407 Fix lustre-modules rpm name in spec file.

    On RHEL systems rpm names end in .%{_arch}.  Without the correct name,
    the rpm command does not find the package, and the %preun script exits
    with an error and prevents rpm package removal (without a force option).
    
    Change-Id: I2c71e853e28ec6e0907eb4ea7c3205ca6e5dd873
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed Jun 9, 2011
  6. LU-107 Add scripts for implementing heartbeat v1 failover

    /usr/sbin/ldev - list devices, determine validity, etc.
    /usr/sbin/lhbadm - wrapper for heartbeat utils for failover/failback/status
    /etc/ha.d/resource.d/Lustre - heartbeat resource agent (wraps init script)
    /etc/init.d/lustre - lustre init script
    /etc/init.d/lnet - lnet init script
    /usr/sbin/haconfig - helper script for building heartbeat config files
    
    The scripts use two configuration files:
     /etc/ldev.conf - maps hostnames to failover partners, devices, and labels
     /etc/nids - hostnames to lustre NIDS
    
    In addition to heartbeat support, the ldev script enables parallel
    execution of commands against all luns configured on a server.  The
    lustre init script supports devices backed by Linux software RAID, ZFS,
    or traditional block devices.
    
    NOTE: these scripts presume the udev rules for persistent block device
    naming are in place, in particular that lustre labels can be mapped to
    block devices in /dev/disk/by-id.
    
    Change-Id: I8391744ce6eed989c061f131aca4a2da7b5d51b2
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    nedbass committed with Feb 26, 2011
  7. LU-278 Improve regex for output of git describe

    Improve the regex in version_tag-git.pl to properly parse the output
    when the current commit is tagged, and when the tag names are longer
    than just the digits used in the upstream version tags.
    
    Addition from Ned:
    
    Trim trailing newline from git describe output in the
    case where the current commit is tagged.
    
    Change-Id: I0a6a1b54f9e2dbd381a6d00f9dcf47d9b8b66616
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed May 5, 2011
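
The parsing problem described in the LU-278 commit above can be illustrated with a minimal sketch. This is not the regex from version_tag-git.pl, which is Perl and is not shown in this log. It assumes the standard `git describe` output shapes: a bare tag name when the current commit is tagged, otherwise `<tag>-<count>-g<hash>`. The function name `parse_describe` and the sample tag names are hypothetical.

```python
import re

# Hypothetical pattern for illustration; not the one in version_tag-git.pl.
# The tag part is matched lazily so that tags containing extra text
# (for example a downstream suffix) are kept whole.
DESCRIBE_RE = re.compile(
    r"^(?P<tag>.+?)(?:-(?P<count>\d+)-g(?P<sha>[0-9a-f]+))?$"
)


def parse_describe(output):
    """Split `git describe` output into (tag, commits_since_tag, sha)."""
    # Trim the trailing newline that git prints after the description.
    text = output.strip()
    match = DESCRIBE_RE.match(text)
    if match is None:
        raise ValueError("unrecognized git describe output: %r" % output)
    count = match.group("count")
    # When the commit itself is tagged there is no count or hash suffix.
    return (match.group("tag"), int(count) if count else 0, match.group("sha"))


if __name__ == "__main__":
    # Sample inputs are made up for this sketch.
    print(parse_describe("2.0.65\n"))
    print(parse_describe("v2.0.65-llnl-3-gabc1234\n"))
```

One limitation of this sketch: a tag whose own name ends in something shaped like `-<digits>-g<hex>` would be misread as an untagged commit, so a production script would need a stricter rule for that case.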
  8. Add BuildRequires lustre-ldiskfs-devel for chaos 5

    Change-Id: I9f6e088322aee6e73e2ab7b11020f8863b7d010b
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed May 6, 2011
  9. Improve lustre.spec support for RHEL6

    Add appropriate Requires and BuildRequires.
    Make spec search /usr/src/kernels for the files installed by
    RHEL's kernel-devel package, instead of looking in
    /lib/modules/`uname -r`.  This will allow lustre to build in a
    minimal mock environment.
    
    Change-Id: Ided90609c0fe5b187d1787f796bf1369f5c8016c
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed May 10, 2011
  10. LU-278 Don't abort configure if tag is of unrecognized form.

    The configure process should NOT abort just because the most
    recent tag is not of the form that upstream uses to tag Lustre.
    Downstream developers may use their own tags, or just add
    extensions to upstream's version tags.
    
    Change-Id: I9a98bfd4475d3df2694f536ba0352779a62650c7
    committed May 5, 2011
  11. Fix for building ldiskfs with weak updates

    With ldiskfs packaged for weak updates support, the file glob used to
    locate the lustre-ldiskfs-devel directory,
    /usr/src/lustre-ldiskfs-*/$LINUXRELEASE, will not work if ldiskfs was
    built against a kernel version other than $LINUXRELEASE.  Instead we
    just use the newest installed version of lustre-ldiskfs-devel.
    nedbass committed with May 4, 2011
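
As a rough illustration of "use the newest installed version" from the commit above, the sketch below picks the highest-versioned directory from a list of candidates. The real change lives in the configure scripts and is not shown in this log. Treating "newest" as "highest version number" is an assumption, and the helper names `version_key` and `newest` and the sample paths are hypothetical.

```python
import re


def version_key(path):
    """Sort key that compares runs of digits numerically, the rest as text."""
    parts = re.split(r"(\d+)", path)
    # re.split with a capturing group alternates text and digit runs,
    # so the odd positions always hold the captured digits.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def newest(paths):
    """Return the path with the highest version, or None if none exist."""
    return max(paths, key=version_key) if paths else None


if __name__ == "__main__":
    # Made-up directory names standing in for /usr/src/lustre-ldiskfs-*.
    candidates = [
        "/usr/src/lustre-ldiskfs-3.2.0",
        "/usr/src/lustre-ldiskfs-3.10.0",
        "/usr/src/lustre-ldiskfs-3.9.1",
    ]
    print(newest(candidates))
```

A plain string sort would rank 3.9.1 above 3.10.0, which is why the key converts digit runs to integers before comparing.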
  12. LU-227 external lustre-ldiskfs package integration

    Add a configure option --with-ldiskfs-devel to build Lustre against the
    externally maintained lustre-ldiskfs package.  This package maintains
    ldiskfs as a standalone codebase rather than copying and patching ext
    sources from the kernel.  The intent is to ease the burden of porting
    Lustre to new kernels by supporting multiple kernel APIs within ldiskfs
    with the help of autoconf. Thus we eliminate the need to maintain a
    separate patch stack for each supported kernel.
    
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Change-Id: Iaa17d0483e155439c8de206af81f19a321bbeaa1
    nedbass committed with Apr 20, 2011
  13. LU-251 Fix gcc configure warnings

    Newer versions of gcc are getting smart enough to detect the sloppy
    syntax used for the autoconf tests.  It is now generating warnings for
    unused or uninitialized variables.  Newer versions of gcc even have the
    -Wunused-but-set-variable option set by default.  This isn't a problem
    except when -Werror is set and they get promoted to an error.  In this
    case the autoconf test will return an incorrect result which will result
    in a build failure or runtime error later on.
    
    To handle this I'm tightening up many of the autoconf tests to
    explicitly mark variables as unused to suppress the gcc warning.  Tests
    emitting uninitialized variable errors are updated to initialize
    pointers to NULL, and some variables are converted to pointers to
    accommodate this.  'Argument makes integer from pointer without a cast'
    errors were fixed by passing 0 for the offending argument in cases where
    we are not explicitly testing the argument type.  0 is accepted as both
    an integer and a pointer.
    
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Change-Id: Idaa04b04308e3cd994b0d802a5ee1eb5c90f9be6
    nedbass committed with Apr 21, 2011
  14. Increase lnet hash table by 4X.

    With the number of peers we see in production, we really think this
    should be larger.
    
    Change-Id: Ia6f43caed4f3c89316fe768c2347cdccc163e7ce
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed Sep 3, 2010
  15. b=23352 don't close connection for timedout ZC_REQ

    Change-Id: Id863c6d9e5180b3dcc8612f6be3e0da9f419c193
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    Liang Zhen committed with Aug 13, 2010
  16. Enable asymmetric router failure detection by default

    Change-Id: I17be5f605e76c256fff4eebabe3a2781ecad8074
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    committed May 12, 2010
  17. b=21103 Make asymmetric router failure parameters tunable

    Change-Id: Ie36f79d01c35d4c11c4532187abdeb9473ea60b4
    Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
    Joseph Herring committed with May 12, 2010
Commits on Jul 8, 2011
  1. New version 2.0.65

    Change-Id: Iab44e6d976cf13d12d525d16c8aeaf2c4bc2d423
    Signed-off-by: Oleg Drokin <green@whamcloud.com>
    Oleg Drokin committed Jul 8, 2011
  2. LU-469 Build with lustre own kernel config file.

    -- Use Lustre's own rhel6 kernel config files in lbuild.
    -- Rebase config files on the latest vendor update.
    -- Correct config file diff output.
    
    Change-Id: I9ee5115009de5bcb00084dcac2e204774178a8c1
    Signed-off-by: Yang Sheng <ys@whamcloud.com>
    Reviewed-on: http://review.whamcloud.com/1034
    Tested-by: Hudson
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    yangsheng committed with Oleg Drokin Jun 28, 2011
  3. LU-479 sanity 124a failed

    add message to help debug.
    
    Signed-off-by: Lai Siyao <laisiyao@whamcloud.com>
    Change-Id: I4e675ed3eeb509c9ba4f1c25420596c0d5357459
    Reviewed-on: http://review.whamcloud.com/1052
    Tested-by: Hudson
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Lai Siyao committed with Oleg Drokin Jul 5, 2011
  4. LU-136 change "force_over_16tb" mount option to "force_over_128tb"

    Change the "force_over_16tb" mount option to "force_over_128tb" and
    rename the ext4-force_over_16tb-*.patch to ext4-force_over_128tb-*.patch
    after testing and validating the 128TB LUN.
    
    Signed-off-by: Yu Jian <yujian@whamcloud.com>
    Change-Id: I19c73280cf2934112aefab8976d7eac18915529a
    Reviewed-on: http://review.whamcloud.com/1073
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Tested-by: Hudson
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Yu Jian committed with Oleg Drokin Jul 8, 2011
  5. LU-477 allocate memory for s_group_desc and s_group_info by vmalloc()

    Large kmalloc() for sbi->s_group_desc and sbi->s_group_info can fail
    for large filesystems, which will cause the "not enough memory" error
    while mounting. This patch makes it fall back to vmalloc() if the
    kmalloc() fails, as was done for sbi->s_flex_groups.
    
    To avoid colliding with a valid on-disk inode number, EXT4_BAD_INO
    is used as the number of the buddy cache inode.
    
    The patch also incorporates the following upstream kernel fix:
    
    commit	32a9bb57d7c1fd04ae0f72b8f671501f000a0e9f
    ext4: fix missing iput of root inode for some mount error paths
    https://bugzilla.kernel.org/show_bug.cgi?id=26752
    
    Signed-off-by: Yu Jian <yujian@whamcloud.com>
    Change-Id: I3950425835ea7f2968ceb2edbc622e3ff3ed8545
    Reviewed-on: http://review.whamcloud.com/1071
    Tested-by: Hudson
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Tested-by: Maloo <whamcloud.maloo@gmail.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    Yu Jian committed with Oleg Drokin Jul 7, 2011
  6. LU-168 Fix schedule race in sanityn PDO lock tests

    In sanityn PDO lock tests, even if the second operation is blocked by
    the first one on server-side, after the blocking, the second one may
    be finished earlier than the first one because of client-side schedule
    order. So sleep a sec before check_pdo_conflict() to ensure the first
    operation is finished after OBD_FAIL_MDS_PDO_LOCK barriers.
    
    Change-Id: I62412d74e17be012ee6660179ad77375c196671d
    Signed-off-by: nasf <yong.fan@whamcloud.com>
    Reviewed-on: http://review.whamcloud.com/1030
    Tested-by: Hudson
    Reviewed-by: Mikhail Pershin <tappro@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>
    nasf committed with Oleg Drokin Jul 8, 2011