Permalink
Commits on Apr 19, 2015
  1. nilfs2-kmod6 v1.0 release

    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 18, 2015
  2. README: correct spell of CentOS

    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 18, 2015
Commits on Apr 18, 2015
  1. nilfs2: improve execution time of NILFS_IOCTL_GET_CPINFO ioctl

    The older a filesystem gets, the slower lscp command becomes.  This is
    because nilfs_cpfile_do_get_cpinfo() function meets more hole blocks
    as the start offset of valid checkpoint numbers gets bigger.
    
    This reduces the overhead by skipping hole blocks efficiently with
    nilfs_mdt_find_block() helper.
    
    A measurement result of this patch is as follows:
    
    Before:
    $ time lscp
                     CNO        DATE     TIME  MODE  FLG      BLKCNT       ICNT
                 5769303  2015-02-22 19:31:33   cp    -          108          1
                 5769304  2015-02-22 19:38:54   cp    -          108          1
    
    real    0m0.182s
    user    0m0.003s
    sys     0m0.180s
    
    After:
    $ time lscp
                     CNO        DATE     TIME  MODE  FLG      BLKCNT       ICNT
                 5769303  2015-02-22 19:31:33   cp    -          108          1
                 5769304  2015-02-22 19:38:54   cp    -          108          1
    
    real    0m0.003s
    user    0m0.001s
    sys     0m0.002s
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 16, 2015
  2. nilfs2: add helper to find existent block on metadata file

    Add a new metadata file function, nilfs_mdt_find_block(), which finds
    an existent block on a metadata file in a given range of blocks.  This
    function skips continuous hole blocks efficiently by using
    nilfs_bmap_seek_key().
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 16, 2015
  3. nilfs2: add bmap function to seek a valid key

    Add a new bmap function, nilfs_bmap_seek_key(), which seeks a valid
    entry and returns its key starting from a given key.  This function
    can be used to skip hole blocks efficiently.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 16, 2015
  4. nilfs2: unify type of key arguments in bmap interface

    The type of key arguments in block mapping interface varies depending
    on function.  For instance, nilfs_bmap_lookup_at_level() takes "__u64"
    for its key argument whereas nilfs_bmap_lookup() takes "unsigned
    long".
    
    This fits them to "__u64" to eliminate the variation.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 16, 2015
  5. nilfs2: use set_mask_bits() for operations on buffer state bitmap

    nilfs_forget_buffer(), nilfs_clear_dirty_page(), and
    nilfs_segctor_complete_write() are using a bunch of atomic bit operations
    against buffer state bitmap.
    
    This reduces the number of them by utilizing set_mask_bits() macro.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    [add reference to "kern_feature.h" to page.c]
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 16, 2015
  6. Add mimic of set_mask_bits() helper for old kernels

    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 18, 2015
  7. nilfs2: do not use async write flag for segment summary buffers

    The async write flag is introduced to nilfs2 in the commit 7f42ec394156
    ("nilfs2: fix issue with race condition of competition between segments
    for dirty blocks"), but the flag only makes sense for data buffers and
    btree node buffers.  It is not needed for segment summary buffers.
    
    This gets rid of the latter uses as part of refactoring of atomic bit
    operations on buffer state bitmap.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 16, 2015
  8. nilfs2: use bgl_lock_ptr()

    Simplify nilfs_mdt_bgl_lock() by utilizing bgl_lock_ptr() helper in
    <linux/blockgroup_lock.h>.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 16, 2015
  9. nilfs2: use inode_set_flags() in nilfs_set_inode_flags()

    Use inode_set_flags() to atomically set i_flags instead of clearing out
    the S_IMMUTABLE, S_APPEND, etc.  flags and then setting them from the
    FS_IMMUTABLE_FL, FS_APPEND_FL flags to avoid a race where an immutable
    file has the immutable flag cleared for a brief window of time.
    
    This is a similar fix to commit 5f16f3225b06 ("ext4: atomically set
    inode->i_flags in ext4_set_inode_flags()").
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 17, 2015
  10. Add mimic of inode_set_flags() for old kernels

    inode_set_flags() is used to atomically set inode->i_flags.
    This adds the function as an inline to make it available
    in kernel 3.15 and earlier.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 18, 2015
  11. nilfs2: put out gfp mask manipulation from nilfs_set_inode_flags()

    nilfs_set_inode_flags() function adjusts gfp-mask of inode->i_mapping as
    well as i_flags, however, this coupling of operations is not appropriate.
    
    For instance, nilfs_ioctl_setflags(), one of three callers of
    nilfs_set_inode_flags(), doesn't need to reinitialize the gfp-mask at all.
     In addition, nilfs_new_inode(), another caller of
    nilfs_set_inode_flags(), doesn't either because it has already initialized
    the gfp-mask.
    
    Only __nilfs_read_inode(), the remaining caller, needs it.  So, this moves
    the gfp mask manipulation to __nilfs_read_inode() from
    nilfs_set_inode_flags().
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 16, 2015
  12. nilfs2: fix gcc warning at nilfs_checkpoint_is_mounted()

    Fix the following build warning:
    
     fs/nilfs2/super.c: In function 'nilfs_checkpoint_is_mounted':
     fs/nilfs2/super.c:1023:10: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
       if (cno < 0 || cno > nilfs->ns_cno)
               ^
    
    This warning indicates that the comparision "cno < 0" is useless because
    variable "cno" has an unsigned integer type "__u64".
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Reported-by: David Binderman <dcb314@hotmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 16, 2015
Commits on Mar 3, 2015
  1. nilfs2: fix potential memory overrun on inode

    Each inode of nilfs2 stores a root node of a b-tree, and it turned out to
    have a memory overrun issue:
    
    Each b-tree node of nilfs2 stores a set of key-value pairs and the number
    of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as
    a few other "bn_*" members.
    
    Since the value of "bn_nchildren" is used for operations on the key-values
    within the b-tree node, it can cause memory access overrun if a large
    number is incorrectly set to "bn_nchildren".
    
    For instance, nilfs_btree_node_lookup() function determines the range of
    binary search with it, and too large "bn_nchildren" leads
    nilfs_btree_node_get_key() in that function to overrun.
    
    As for intermediate b-tree nodes, this is prevented by a sanity check
    performed when each node is read from a drive, however, no sanity check
    has been done for root nodes stored in inodes.
    
    This patch fixes the issue by adding missing sanity check against b-tree
    root nodes so that it's called when on-memory inodes are read from ifile,
    inode metadata file.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Feb 27, 2015
Commits on Feb 7, 2015
  1. nilfs2-kmod-centos6: 0.6.3 release

    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Feb 7, 2015
Commits on Jan 17, 2015
  1. nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races

    Same story as in commit 41080b5a2401 ("nfsd race fixes: ext2") (similar
    ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
    of insert_inode_locked() and a bug of a check for dead inodes needs to
    be fixed.
    
    If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
    insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
    the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.
    
    If nilfs_iget() is called before nilfs_new_inode() calls
    insert_inode_locked4(), it will create an in-core inode and read its
    data from the on-disk inode.  But, nilfs_iget() will find i_nlink equals
    zero and fail at nilfs_read_inode_common(), which will lead it to call
    iget_failed() and cleanly fail.
    
    However, this sanity check doesn't work as expected for reused on-disk
    inodes because they leave a non-zero value in i_mode field and it
    hinders the test of i_nlink.  This patch also fixes the issue by
    removing the test on i_mode that nilfs2 doesn't need.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Dec 10, 2014
Commits on Oct 19, 2014
  1. nilfs2-kmod-centos6: 0.6.2 release

    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Oct 19, 2014
Commits on Oct 18, 2014
  1. nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

    Under normal circumstances nilfs_sync_fs() writes out the super block,
    which causes a flush of the underlying block device.  But this depends
    on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
    last segment crosses a segment boundary.  So if only a small amount of
    data is written before the call to nilfs_sync_fs(), no flush of the
    block device occurs.
    
    In the above case an additional call to blkdev_issue_flush() is needed.
    To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device
    is introduced, which is cleared whenever new logs are written and set
    whenever the block device is flushed.  For convenience the function
    nilfs_flush_device() is added, which contains the above logic.
    
    Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    zeitgeist87 committed with konis Oct 13, 2014
Commits on Sep 27, 2014
  1. nilfs2-kmod-centos6: 0.6.1 release

    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Sep 27, 2014
  2. nilfs2: fix data loss with mmap()

    This bug leads to reproducible silent data loss, despite the use of
    msync(), sync() and a clean unmount of the file system.  It is easily
    reproducible with the following script:
    
      ----------------[BEGIN SCRIPT]--------------------
      mkfs.nilfs2 -f /dev/sdb
      mount /dev/sdb /mnt
    
      dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
    
      umount /mnt
      mount /dev/sdb /mnt
      CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
    
      /root/mmaptest/mmaptest /mnt/testfile 30 10 5
    
      sync
      CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
      umount /mnt
      mount /dev/sdb /mnt
      CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
      umount /mnt
    
      echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
      echo "AFTER MMAP:\t$CHECKSUM_AFTER"
      echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
      ----------------[END SCRIPT]--------------------
    
    The mmaptest tool looks something like this (very simplified, with
    error checking removed):
    
      ----------------[BEGIN mmaptest]--------------------
      data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
                  MAP_SHARED, fd, file_offset);
    
      for (i = 0; i < write_count; ++i) {
            memcpy(data + i * 4096, buf, sizeof(buf));
            msync(data, file_size - file_offset, MS_SYNC))
      }
      ----------------[END mmaptest]--------------------
    
    The output of the script looks something like this:
    
      BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
      AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
      AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
    
    So it is clear, that the changes done using mmap() do not survive a
    remount.  This can be reproduced a 100% of the time.  The problem was
    introduced in commit 136e8770cd5d ("nilfs2: fix issue of
    nilfs_set_page_dirty() for page at EOF boundary").
    
    If the page was read with mpage_readpage() or mpage_readpages() for
    example, then it has no buffers attached to it.  In that case
    page_has_buffers(page) in nilfs_set_page_dirty() will be false.
    Therefore nilfs_set_file_dirty() is never called and the pages are never
    collected and never written to disk.
    
    This patch fixes the problem by also calling nilfs_set_file_dirty() if the
    page has no buffers attached to it.
    
    [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
    Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
    Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    zeitgeist87 committed with konis Sep 25, 2014
Commits on Apr 28, 2014
  1. nilfs2-kmod: use mnt_want_write instead of mnt_want_write_file

    This replaces mnt_{want,drop}_write_file() mistakenly backported in
    nilfs_ioctl_set_suinfo() function with mnt_{want,drop}_write().
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 20, 2014
Commits on Apr 19, 2014
  1. nilfs2-kmod: 0.6.0 release

    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 19, 2014
Commits on Apr 6, 2014
  1. nilfs2: update project's web site in nilfs2.txt

    Project's web site was moved to nilfs.sourceforge.net from
    www.nilfs.org.  This updates the site information in
    Documentation/filesystems/nilfs2.txt with the new location.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 3, 2014
  2. nilfs2: verify metadata sizes read from disk

    Add code to check sizes of on-disk data of metadata files such as inode
    size, segment usage size, DAT entry size, and checkpoint size.  Although
    these sizes are read from disk, the current implementation doesn't check
    them.
    
    If these sizes are not sane on disk, it can cause out-of-range access to
    metadata or memory access overrun on metadata block buffers due to
    overflow in sundry calculations.
    
    Both lower limit and upper limit of metadata sizes are verified to
    prevent these issues.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: Andreas Rohner <andreas.rohner@gmx.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 3, 2014
  3. nilfs2: add FITRIM ioctl support for nilfs2

    Add support for the FITRIM ioctl, which enables user space tools to
    issue TRIM/DISCARD requests to the underlying device.  Every clean
    segment within the specified range will be discarded.
    
    Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    zeitgeist87 committed with konis Apr 3, 2014
  4. nilfs2-kmod: add glue code to support FITRIM ioctl

    Declare FITRIM ioctl and add fstrim_range struct if the kernel doesn't
    have those declarations.
    
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Apr 6, 2014
  5. nilfs2: add nilfs_sufile_trim_fs to trim clean segs

    Add nilfs_sufile_trim_fs(), which takes an fstrim_range structure and
    calls blkdev_issue_discard for every clean segment in the specified
    range.  The range is truncated to file system block boundaries.
    
    Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    [support old kernels which had deprecated barrier bio]
    [add KM_USER0 argument to k{map,unmap}_atomic() to fix build]
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    zeitgeist87 committed with konis Apr 3, 2014
  6. nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl

    With this ioctl the segment usage entries in the SUFILE can be updated
    from userspace.
    
    This is useful, because it allows the userspace GC to modify and update
    segment usage entries for specific segments, which enables it to avoid
    unnecessary write operations.
    
    If a segment needs to be cleaned, but there is no or very little
    reclaimable space in it, the cleaning operation basically degrades to a
    useless moving operation.  In the end the only thing that changes is the
    location of the data and a timestamp in the segment usage information.
    With this ioctl the GC can skip the cleaning and update the segment
    usage entries directly instead.
    
    This is basically a shortcut to cleaning the segment.  It is still
    necessary to read the segment summary information, but the writing of
    the live blocks can be skipped if it's not worth it.
    
    [konishi.ryusuke@lab.ntt.co.jp: add description of NILFS_IOCTL_SET_SUINFO ioctl]
    Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    zeitgeist87 committed with konis Apr 3, 2014
  7. nilfs2: add nilfs_sufile_set_suinfo to update segment usage

    Introduce nilfs_sufile_set_suinfo(), which expects an array of
    nilfs_suinfo_update structures and updates the segment usage information
    accordingly.
    
    This is basically a helper function for the newly introduced
    NILFS_IOCTL_SET_SUINFO ioctl.
    
    [konishi.ryusuke@lab.ntt.co.jp: use put_bh() instead of brelse() because we know bh != NULL]
    Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    [add KM_USER0 argument to k{map,kunmap}_atomic() to fix build]
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    zeitgeist87 committed with konis Apr 3, 2014
  8. nilfs2: add struct nilfs_suinfo_update and flags

    Add the nilfs_suinfo_update structure, which contains the information
    needed to update one segment usage entry.  The flags specify, which
    fields need to be updated.
    
    Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    zeitgeist87 committed with konis Apr 3, 2014
Commits on Feb 28, 2014
  1. nilfs2: add comments for ioctls

    Add comments for ioctls in fs/nilfs2/ioctl.c file and describe NILFS2
    specific ioctls in Documentation/filesystems/nilfs2.txt.
    
    Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
    Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: Wenliang Fan <fanwlexca@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    dubeyko committed with konis Jan 23, 2014
Commits on Jan 18, 2014
  1. nilfs2-kmod: 0.5.1 release

    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    konis committed Jan 18, 2014
Commits on Jan 16, 2014
  1. nilfs2: fix segctor bug that causes file system corruption

    There is a bug in the function nilfs_segctor_collect, which results in
    active data being written to a segment, that is marked as clean.  It is
    possible, that this segment is selected for a later segment
    construction, whereby the old data is overwritten.
    
    The problem shows itself with the following kernel log message:
    
      nilfs_sufile_do_cancel_free: segment 6533 must be clean
    
    Usually a few hours later the file system gets corrupted:
    
      NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
      NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)
    
    The issue can be reproduced with a file system that is nearly full and
    with the cleaner running, while some IO intensive task is running.
    Although it is quite hard to reproduce.
    
    This is what happens:
    
     1. The cleaner starts the segment construction
     2. nilfs_segctor_collect is called
     3. sc_stage is on NILFS_ST_SUFILE and segments are freed
     4. sc_stage is on NILFS_ST_DAT current segment is full
     5. nilfs_segctor_extend_segments is called, which
        allocates a new segment
     6. The new segment is one of the segments freed in step 3
     7. nilfs_sufile_cancel_freev is called and produces an error message
     8. Loop around and the collection starts again
     9. sc_stage is on NILFS_ST_SUFILE and segments are freed
        including the newly allocated segment, which will contain active
        data and can be allocated at a later time
    10. A few hours later another segment construction allocates the
        segment and causes file system corruption
    
    This can be prevented by simply reordering the statements.  If
    nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
    the freed segments are marked as dirty and cannot be allocated any more.
    
    Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
    Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    zeitgeist87 committed with konis Jan 15, 2014
Commits on Oct 24, 2013
  1. nilfs2: fix issue with race condition of competition between segments…

    … for dirty blocks
    
    Many NILFS2 users were reported about strange file system corruption
    (for example):
    
       NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
       NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
    
    But such error messages are consequence of file system's issue that takes
    place more earlier.  Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
    and Anton Eliasson <devel@antoneliasson.se> were reported about another
    issue not so recently.  These reports describe the issue with segctor
    thread's crash:
    
      BUG: unable to handle kernel paging request at 0000000000004c83
      IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
    
      Call Trace:
       nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
       nilfs_segctor_construct+0x17b/0x290 [nilfs2]
       nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
       kthread+0xc0/0xd0
       ret_from_fork+0x7c/0xb0
    
    These two issues have one reason.  This reason can raise third issue
    too.  Third issue results in hanging of segctor thread with eating of
    100% CPU.
    
    REPRODUCING PATH:
    
    One of the possible way or the issue reproducing was described by
    Jermoe me Poulin <jeromepoulin@gmail.com>:
    
    1. init S to get to single user mode.
    2. sysrq+E to make sure only my shell is running
    3. start network-manager to get my wifi connection up
    4. login as root and launch "screen"
    5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
    6. lscp | xz -9e > lscp.txt.xz
    7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
    8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
    9. start a screen and launch strace -f -o find-cat.log -t find
    /mnt/nilfs -type f -exec cat {} > /dev/null \;
    10. start a screen and launch strace -f -o apt-get.log -t apt-get update
    11. launch the last command again as it did not crash the first time
    12. apt-get crashes
    13. ps aux > ps-aux-crashed.log
    13. sysrq+W
    14. sysrq+E  wait for everything to terminate
    15. sysrq+SUSB
    
    Simplified way of the issue reproducing is starting kernel compilation
    task and "apt-get update" in parallel.
    
    REPRODUCIBILITY:
    
    The issue is reproduced not stable [60% - 80%].  It is very important to
    have proper environment for the issue reproducing.  The critical
    conditions for successful reproducing:
    
    (1) It should have big modified file by mmap() way.
    
    (2) This file should have the count of dirty blocks are greater that
        several segments in size (for example, two or three) from time to time
        during processing.
    
    (3) It should be intensive background activity of files modification
        in another thread.
    
    INVESTIGATION:
    
    First of all, it is possible to see that the reason of crash is not valid
    page address:
    
      NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
      NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
    
    Moreover, value of b_page (0x1a82) is 6786.  This value looks like segment
    number.  And b_blocknr with b_size values look like block numbers.  So,
    buffer_head's pointer points on not proper address value.
    
    Detailed investigation of the issue is discovered such picture:
    
      [-----------------------------SEGMENT 6783-------------------------------]
      NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
      NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
      NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
      NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
      NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
      NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
      NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
      NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783
    
      [-----------------------------SEGMENT 6784-------------------------------]
      NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
      NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
      NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
      NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
      NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
      NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
      NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
      NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
      NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
      NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
      NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
      NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
      NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
      [----------] ditto
      NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
    
      [-----------------------------SEGMENT 6785-------------------------------]
      NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
      NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
      NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
      NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
      NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
      NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
      NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
      NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
      NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
      NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
      NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
      NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
      [----------] ditto
      NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
    
      NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
      NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
      NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
      NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
    
      NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
    
      BUG: unable to handle kernel paging request at 0000000000001a82
      IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
    
    Usually, for every segment we collect dirty files in list.  Then, dirty
    blocks are gathered for every dirty file, prepared for write and
    submitted by means of nilfs_segbuf_submit_bh() call.  Finally, it takes
    place complete write phase after calling nilfs_end_bio_write() on the
    block layer.  Buffers/pages are marked as not dirty on final phase and
    processed files removed from the list of dirty files.
    
    It is possible to see that we had three prepare_write and submit_bio
    phases before segbuf_wait and complete_write phase.  Moreover, segments
    compete between each other for dirty blocks because on every iteration
    of segments processing dirty buffer_heads are added in several lists of
    payload_buffers:
    
      [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
      [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
    
    The next pointer is the same but prev pointer has changed.  It means
    that buffer_head has next pointer from one list but prev pointer from
    another.  Such modification can be made several times.  And, finally, it
    can be resulted in various issues: (1) segctor hanging, (2) segctor
    crashing, (3) file system metadata corruption.
    
    FIX:
    This patch adds:
    
    (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
        for every proccessed dirty block;
    
    (2) checking of BH_Async_Write flag in
        nilfs_lookup_dirty_data_buffers() and
        nilfs_lookup_dirty_node_buffers();
    
    (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
        nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
    
    Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
    Reported-by: Anton Eliasson <devel@antoneliasson.se>
    Cc: Paul Fertser <fercerpav@gmail.com>
    Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
    Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
    Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
    Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
    Cc: Elmer Zhang <freeboy6716@gmail.com>
    Cc: Kenneth Langga <klangga@gmail.com>
    Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
    Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    dubeyko committed with konis Sep 30, 2013