Commits on Aug 20, 2010
  1. Linux

    gregkh committed Aug 20, 2010
  2. mm: fix up some user-visible effects of the stack guard page

    commit d782437 upstream.
    This commit makes the stack guard page somewhat less visible to user
    space. It does this by:
     - not showing the guard page in /proc/<pid>/maps
       It looks like lvm-tools will actually read /proc/self/maps to figure
       out where all its mappings are, and effectively do a specialized
       "mlockall()" in user space.  By not showing the guard page as part of
       the mapping (by just adding PAGE_SIZE to the start for grows-down
       pages), lvm-tools ends up not being aware of it.
     - by also teaching the _real_ mlock() functionality not to try to lock
       the guard page.
       That would just expand the mapping down to create a new guard page,
       so there really is no point in trying to lock it in place.
    It would perhaps be nice to show the guard page specially in
    /proc/<pid>/maps (or at least mark grow-down segments some way), but
    let's not open ourselves up to more breakage by user space from programs
    that depend on the exact details of the 'maps' file.
    Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools
    source code to see what was going on with the whole new warning.
    Reported-and-tested-by: François Valenduc <>
    Reported-by: Henrique de Moraes Holschuh <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Greg Kroah-Hartman <>
    torvalds committed with gregkh Aug 15, 2010
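
To illustrate the first point of the guard-page visibility commit above, here is a minimal Python sketch of how a mapping's reported start could be shifted by one page so that the guard page is not listed. The `Vma` type, the `PAGE_SIZE` value and the `maps_range()` helper are assumptions made for this example and are not kernel code; the sketch also assumes the guard page is the lowest page of a grow-down segment.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass
class Vma:
    """Hypothetical stand-in for a memory mapping."""
    start: int
    end: int
    grows_down: bool = False


def maps_range(vma: Vma) -> str:
    """Format the address range the way a maps-style listing might show it."""
    start = vma.start
    if vma.grows_down:
        # Hide the guard page: report the mapping as starting one page higher.
        start += PAGE_SIZE
    return f"{start:08x}-{vma.end:08x}"


if __name__ == "__main__":
    print(maps_range(Vma(0x7FFD0000, 0x7FFE0000, grows_down=True)))
    print(maps_range(Vma(0x00400000, 0x00401000)))
```

A tool that reads such a listing and locks every range it finds would, under this model, never see the guard page and so never try to lock it.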
  3. mm: fix page table unmap for stack guard page properly

    commit 11ac552 upstream.
    We do in fact need to unmap the page table _before_ doing the whole
    stack guard page logic, because if it is needed (mainly 32-bit x86 with
    PAE and CONFIG_HIGHPTE, but other architectures may use it too) then it
    will do a kmap_atomic/kunmap_atomic.
    And those kmaps will create an atomic region that we cannot do
    allocations in.  However, the whole stack expand code will need to do
    anon_vma_prepare() and vma_lock_anon_vma() and they cannot do that in an
    atomic region.
    Now, a better model might actually be to do the anon_vma_prepare() when
    _creating_ a VM_GROWSDOWN segment, and not have to worry about any of
    this at page fault time.  But in the meantime, this is the
    straightforward fix for the issue.
    See for details.
    Reported-by: Wylda <>
    Reported-by: Sedat Dilek <>
    Reported-by: Mike Pagano <>
    Reported-by: François Valenduc <>
    Tested-by: Ed Tomlinson <>
    Cc: Pekka Enberg <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Greg Kroah-Hartman <>
    torvalds committed with gregkh Aug 14, 2010
Commits on Aug 13, 2010
  1. Linux

    gregkh committed Aug 13, 2010
  2. x86: don't send SIGBUS for kernel page faults

    commit 9605456 upstream.
    It's wrong for several reasons, but the most direct one is that the
    fault may be for the stack accesses to set up a previous SIGBUS.  When
    we have a kernel exception, the kernel exception handler does all the
    fixups, not some user-level signal handler.
    Even apart from the nested SIGBUS issue, it's also wrong to give out
    kernel fault addresses in the signal handler info block, or to send a
    SIGBUS when a system call already returns EFAULT.
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Greg Kroah-Hartman <>
    torvalds committed with gregkh Aug 13, 2010
  3. mm: fix missing page table unmap for stack guard page failure case

    commit 5528f91 upstream.
    .. which didn't show up in my tests because it's a no-op on x86-64 and
    most other architectures.  But we enter the function with the last-level
    page table mapped, and should unmap it at exit.
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Greg Kroah-Hartman <>
    torvalds committed with gregkh Aug 13, 2010
  4. mm: keep a guard page below a grow-down stack segment

    commit 320b2b8 upstream.
    This is a rather minimally invasive patch to solve the problem of the
    user stack growing into a memory mapped area below it.  Whenever we fill
    the first page of the stack segment, expand the segment down by one page.
    Now, admittedly some odd application might _want_ the stack to grow down
    into the preceding memory mapping, and so we may at some point need to
    make this a process tunable (some people might also want to have more
    than a single page of guarding), but let's try the minimal approach.
    Tested with trivial application that maps a single page just below the
    stack, and then starts recursing.  Without this, we will get a SIGSEGV
    _after_ the stack has smashed the mapping.  With this patch, we'll get a
    nice SIGBUS just as the stack touches the page just above the mapping.
    Requested-by: Keith Packard <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Greg Kroah-Hartman <>
    torvalds committed with gregkh Aug 13, 2010
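
The behaviour described in the guard-page commit above can be sketched as a small Python model. This is a simplified illustration, not the kernel implementation: `touch_stack()`, the `(start, end)` tuples and the string outcomes are all hypothetical, and the model assumes every access falls inside the stack segment.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only
PAGE_MASK = ~(PAGE_SIZE - 1)


def touch_stack(stack_start, address, other_mappings):
    """Return (outcome, new_stack_start) for an access at `address`.

    `other_mappings` is a list of (start, end) ranges with `end` exclusive.
    """
    page = address & PAGE_MASK
    if page != stack_start:
        # Not the lowest page of the segment: nothing special happens.
        return "ok", stack_start
    for _start, end in other_mappings:
        if end == stack_start:
            # Another mapping sits directly below: refuse to grow into it.
            return "SIGBUS", stack_start
    # Keep one untouched page below the stack by growing the segment down.
    return "ok", stack_start - PAGE_SIZE


if __name__ == "__main__":
    below = [(0x6FFFF000, 0x70000000)]
    print(touch_stack(0x70000000, 0x70000010, below))
    print(touch_stack(0x70000000, 0x70000010, []))
```

In this model the stack never silently writes into the mapping below it: the access that would consume the last free page is the one that fails.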
  5. mm: fix corruption of hibernation caused by reusing swap during image saving

    commit 966cca0 upstream.
    Since 2.6.31, swap_map[]'s refcounting has been changed to show when a
    used swap entry is held only by the swap cache and can be reused.  As a
    result, while scanning swap_map[] for a free entry, such a swap entry
    may be reclaimed and reused.  This was introduced by commit c9e4441
    ("mm: reuse unused swap entry if necessary").
    But this caused data corruption at resume. The scenario is:
    - Assume a clean swap-cache page that is still mapped.
    - At hibernation_snapshot(), the clean swap-cache page is saved as
      clean swap cache and swap_map[] is marked as SWAP_HAS_CACHE.
    - Then save_image() is called.  It reuses the SWAP_HAS_CACHE entry to
      save the image, and breaks the contents.
    After resume:
    - Memory reclaim runs, finds the clean, not-referenced swap-cache page
      and discards it because it's marked as clean.  But here, the contents
      on disk and in the swap cache are inconsistent.
    Hence memory is corrupted.
    This patch avoids the bug by not reclaiming swap-entry during hibernation.
    This is a quick fix for backporting.
    Signed-off-by: KAMEZAWA Hiroyuki <>
    Cc: Rafael J. Wysocki <>
    Reported-by: Ondreg Zary <>
    Tested-by: Ondreg Zary <>
    Tested-by: Andrea Gelmini <>
    Signed-off-by: Hugh Dickins <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Greg Kroah-Hartman <>
    hkamezawa committed with gregkh Aug 11, 2010
  6. md/raid1: delay reads that could overtake behind-writes.

    commit e555190 upstream.
    When a raid1 array is configured to support write-behind
    on some devices, it normally only reads from other devices.
    If all devices are write-behind (because the rest have failed)
    it is possible for a read request to be serviced before a
    behind-write request, which would appear as data corruption.
    So when forced to read from a WriteMostly device, wait for any
    write-behind to complete, and don't start any more behind-writes.
    Signed-off-by: NeilBrown <>
    Signed-off-by: Greg Kroah-Hartman <>
    neilbrown committed with gregkh Mar 31, 2010
  7. ibmvfc: Reduce error recovery timeout

    commit daa142d upstream.
    If a command times out resulting in EH getting invoked, we wait for the
    aborted commands to come back after sending the abort. Shorten
    the amount of time we wait for these responses, to ensure we don't
    get stuck in EH for several minutes.
    Signed-off-by: Brian King <>
    Signed-off-by: James Bottomley <>
    Signed-off-by: Greg Kroah-Hartman <>
    bjking1 committed with gregkh Apr 20, 2010
  8. ibmvfc: Fix command completion handling

    commit f5832fa upstream.
    Commands which are completed by the VIOS are placed on a CRQ
    in kernel memory for the ibmvfc driver to process. Each CRQ
    entry is 16 bytes. The ibmvfc driver reads the first 8 bytes
    to check if the entry is valid, then reads the next 8 bytes to get
    the handle, which is a pointer to the completed command. This fixes
    an issue seen on Power 7 where the processor reordered the
    loads from memory, resulting in processing command completion
    with a stale handle. This could result in command timeouts,
    and also early completion of commands.
    Signed-off-by: Brian King <>
    Signed-off-by: James Bottomley <>
    Signed-off-by: Greg Kroah-Hartman <>
    bjking1 committed with gregkh Apr 20, 2010
  9. aic79xx: check for non-NULL scb in ahd_handle_nonpkt_busfree

    commit 534ef05 upstream.
    When removing several devices aic79xx will occasionally Oops
    in ahd_handle_nonpkt_busfree during rescan. Looking at the
    code I found that we're indeed not checking if the scb in
    question is NULL. So check for it before accessing it.
    Signed-off-by: Hannes Reinecke <>
    Signed-off-by: James Bottomley <>
    Signed-off-by: Greg Kroah-Hartman <>
    hreinecke committed with gregkh Jan 15, 2010
  10. loop: Update mtime when writing using aops

    commit 02246c4 upstream.
    Update mtime when writing to backing filesystem using the address space
    operations write_begin and write_end.
    Signed-off-by: Nikanth Karthikesan <>
    Signed-off-by: Jens Axboe <>
    Signed-off-by: Greg Kroah-Hartman <>
    Nikanth Karthikesan committed with gregkh Apr 8, 2010
  11. Skip check for mandatory locks when unlocking

    commit ee860b6 upstream.
    ocfs2_lock() will skip locks on a file whose mode is set to 02666. This
    is a problem in cases where the mode of the file is changed after a
    process has obtained a lock on the file.
    ocfs2_lock() should skip the check for mandatory locks when unlocking a
    file.
    Signed-off-by: Sachin Prabhu <>
    Signed-off-by: Joel Becker <>
    Signed-off-by: Neil Brown <>
    Signed-off-by: Greg Kroah-Hartman <>
    Sachin Prabhu committed with gregkh Mar 10, 2010
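
The check described in the ocfs2 locking commit above can be sketched in Python as follows. This is a hedged model, not the ocfs2 source: `lock_check()` is a hypothetical helper, and the numeric constants are illustrative values chosen to mirror common Linux definitions. A mode such as 02666 has the set-group-ID bit set and group execute clear, which is the conventional marker for mandatory locking.

```python
# Illustrative constants (assumed values, named after their usual C macros).
S_ISGID = 0o2000
S_IXGRP = 0o0010
F_RDLCK, F_WRLCK, F_UNLCK = 0, 1, 2
ENOLCK = 37


def mandatory_lock(mode):
    """True if the mode marks the file for mandatory locking."""
    return (mode & (S_ISGID | S_IXGRP)) == S_ISGID


def lock_check(mode, lock_type):
    """Return 0 if the request may proceed, or a negative errno-style value."""
    # Unlock requests skip the mandatory-lock check, so a lock taken before
    # the mode changed can still be released afterwards.
    if mandatory_lock(mode) and lock_type != F_UNLCK:
        return -ENOLCK
    return 0


if __name__ == "__main__":
    print(lock_check(0o2666, F_WRLCK))  # refused
    print(lock_check(0o2666, F_UNLCK))  # allowed
```

Without the `lock_type != F_UNLCK` condition, a process that locked the file while its mode was, say, 0666 could not drop that lock once the mode became 02666.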
  12. ocfs2: Set MS_POSIXACL on remount

    commit 57b09bb upstream.
    We have to set MS_POSIXACL on remount as well. Otherwise VFS
    would not know we started supporting ACLs after remount and
    thus ACLs would not work.
    Signed-off-by: Jan Kara <>
    Signed-off-by: Joel Becker <>
    Signed-off-by: Mark Fasheh <>
    Signed-off-by: Greg Kroah-Hartman <>
    jankara committed with gregkh Oct 15, 2009
  13. ocfs2: Find proper end cpos for a leaf refcount block.

    commit 38a04e4 upstream.
    ocfs2 refcount tree is stored as an extent tree while
    the leaf ocfs2_refcount_rec points to a refcount block.
    The following steps can trigger a kernel panic.
    mkfs.ocfs2 -b 512 -C 1M --fs-features=refcount $DEVICE
    mount -t ocfs2 $DEVICE $MNT_DIR
    # /mnt/1048576 is a file with 1048576 sizes.
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
    # write_f is a program which will write some bytes to a file at offset.
    # write_f -f file_name -l offset -w write_bytes.
    ./write_f -f $MNT_DIR/$FILE_REF -l $[310*1048576] -w 4096
    ./write_f -f $MNT_DIR/$FILE_REF -l $[306*1048576] -w 4096
    ./write_f -f $MNT_DIR/$FILE_REF -l $[311*1048576] -w 4096
    ./write_f -f $MNT_DIR/$FILE_NAME -l $[310*1048576] -w 4096
    ./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
    ./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
    #kernel panic here.
    The reason is that if the ocfs2_extent_rec is the last record
    in a leaf extent block, the old solution fails to find the
    suitable end cpos. So this patch tries to walk through the b-tree,
    find the next sub root and get the c_pos the next sub-tree starts from.
    btw, I have run tristan's test case against the patched kernel
    for several days and this type of kernel panic never happened again.
    Signed-off-by: Tao Ma <>
    Signed-off-by: Joel Becker <>
    Signed-off-by: Greg Kroah-Hartman <>
    Tao Ma committed with gregkh Nov 30, 2009
  14. dlm: send reply before bast

    commit cf6620a upstream.
    When the lock master processes a successful operation (request,
    convert, cancel, or unlock), it will process the effects of the
    change before sending the reply for the operation.  The "effects"
    of the operation are:
    - blocking callbacks (basts) for any newly granted locks
    - waiting or converting locks that can now be granted
    The cast is queued on the local node when the reply from the lock
    master is received.  This means that a lock holder can receive a
    bast for a lock mode that it doesn't yet know has been granted.
    Signed-off-by: David Teigland <>
    Signed-off-by: Greg Kroah-Hartman <>
    teigland committed with gregkh Feb 24, 2010
  15. dlm: fix ordering of bast and cast

    commit 7fe2b31 upstream.
    When both blocking and completion callbacks are queued for a lock,
    the dlm would always deliver the completion callback (cast) first.
    In some cases the blocking callback (bast) is queued before the
    cast, though, and should be delivered first.  This patch keeps
    track of the order in which they were queued and delivers them
    in that order.
    This patch also keeps track of the granted mode in the last cast
    and eliminates the following bast if the bast mode is compatible
    with the preceding cast mode.  This happens when a remotely mastered
    lock is demoted, e.g. EX->NL, in which case the local node queues
    a cast immediately after sending the demote message.  In this way
    a cast can be queued for a mode, e.g. NL, that makes an in-transit
    bast extraneous.
    Signed-off-by: David Teigland <>
    Signed-off-by: Greg Kroah-Hartman <>
    teigland committed with gregkh Feb 24, 2010
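
The ordering rule from the bast/cast commit above can be sketched as a small Python model. It is an illustration under stated assumptions rather than the dlm code: `deliver()` and the `(seq, kind, mode)` tuples are hypothetical, and the `COMPATIBLE` table is the classic six-mode lock compatibility matrix, included here as an assumption.

```python
# Assumed compatibility table: COMPATIBLE[held] is the set of modes that
# another holder may have at the same time.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def deliver(queued):
    """Deliver callbacks in queueing order, dropping extraneous basts.

    `queued` is a list of (seq, kind, mode) with kind 'cast' or 'bast'.
    """
    delivered = []
    last_cast_mode = None
    for _seq, kind, mode in sorted(queued):
        if kind == "cast":
            last_cast_mode = mode
        elif last_cast_mode is not None and mode in COMPATIBLE[last_cast_mode]:
            # The mode granted by the preceding cast no longer blocks the
            # requester, so this bast carries no information.
            continue
        delivered.append((kind, mode))
    return delivered


if __name__ == "__main__":
    # Demotion EX->NL: the cast for NL makes the in-transit bast extraneous.
    print(deliver([(1, "cast", "NL"), (2, "bast", "EX")]))
    # A bast queued before the cast is delivered first.
    print(deliver([(1, "bast", "EX"), (2, "cast", "PR")]))
```

Tracking a sequence number per callback is what lets the model, like the described patch, preserve queueing order instead of always delivering the cast first.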
  16. dlm: always use GFP_NOFS

    commit 573c24c upstream.
    Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
    ls_allocation would be GFP_KERNEL for userland lockspaces
    and GFP_NOFS for file system lockspaces.
    It was discovered that any lockspaces on the system can
    affect all others by triggering memory reclaim in the
    file system which could in turn call back into the dlm
    to acquire locks, deadlocking dlm threads that were
    shared by all lockspaces, like dlm_recv.
    Signed-off-by: David Teigland <>
    Signed-off-by: Greg Kroah-Hartman <>
    teigland committed with gregkh Nov 30, 2009
  17. reiserfs: fix oops while creating privroot with selinux enabled

    commit 6cb4aff upstream.
    Commit 57fe60d ("reiserfs: add atomic addition of selinux attributes
    during inode creation") contains a bug that will cause it to oops when
    mounting a file system that didn't previously contain extended attributes
    on a system using security.* xattrs.
    The issue is that while creating the privroot during mount
    reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which
    dereferences the xattr root.  The xattr root doesn't exist, so we get an
    oops.
    Signed-off-by: Jeff Mahoney <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Greg Kroah-Hartman <>
    jeffmahoney committed with gregkh Mar 23, 2010
  18. reiserfs: properly honor read-only devices

    commit 3f8b5ee upstream.
    The reiserfs journal behaves inconsistently when determining whether to
    allow a mount of a read-only device.
    This is due to the use of the continue_replay variable to short circuit
    the journal scanning.  If it's set, it's assumed that there are
    transactions to replay, but there may not be.  If it's unset, it's assumed
    that there aren't any, and that may not be the case either.
    I've observed two failure cases:
    1) Where a clean file system on a read-only device refuses to mount
    2) Where a clean file system on a read-only device passes the
       optimization and then tries writing the journal header to update
       the latest mount id.
    The former is easily observable by using a freshly created file system on
    a read-only loopback device.
    This patch moves the check into journal_read_transaction, where it can
    bail out before it's about to replay a transaction.  That way it can go
    through and skip transactions where appropriate, yet still refuse to mount
    a file system with outstanding transactions.
    Signed-off-by: Jeff Mahoney <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Greg Kroah-Hartman <>
    jeffmahoney committed with gregkh Mar 23, 2010
  19. ext4: Fix optional-arg mount options

    commit 15121c1 upstream.
    We have 2 mount options, "barrier" and "auto_da_alloc" which may or
    may not take a 1/0 argument.  This causes the ext4 superblock mount
    code to subtract uninitialized pointers and pass the result to
    kmalloc, which results in very noisy failures.
    Per Ted's suggestion, initialize the args struct so that
    we know whether match_token() found an argument for the
    option, and skip match_int() if not.
    Also, return error (0) from parse_options if we thought
    we found an argument, but match_int() fails.
    Reported-by: Michael S. Tsirkin <>
    Signed-off-by: Eric Sandeen <>
    Signed-off-by: "Theodore Ts'o" <>
    Acked-by: Jeff Mahoney <>
    Signed-off-by: Greg Kroah-Hartman <>
    Eric Sandeen committed with gregkh Feb 16, 2010
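
The parsing rule from the ext4 mount-option commit above can be sketched in Python. This is a hedged illustration, not the ext4 parser: `parse_optional_flag()` is a hypothetical helper, and treating a bare option as "enabled" is an assumption made for the example.

```python
def parse_optional_flag(token, name):
    """Parse `name` or `name=<int>`.

    Returns True or False for the resulting option state, or None when the
    token does not match or its argument is not a valid integer.
    """
    key, has_arg, arg = token.partition("=")
    if key != name:
        return None
    if not has_arg:
        # No argument was supplied, so do not try to parse one.
        return True
    try:
        return int(arg) != 0
    except ValueError:
        # An argument was present but is not an integer: report an error.
        return None


if __name__ == "__main__":
    for tok in ("barrier", "barrier=0", "barrier=1", "barrier=abc"):
        print(tok, parse_optional_flag(tok, "barrier"))
```

The key point is that the "was an argument found?" decision is made from known state (`has_arg`) before any conversion is attempted, rather than from whatever happens to be in an uninitialized structure.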
  20. ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files

    commit 1f5a81e upstream.
    Dan Rosenberg has reported a problem with the MOVE_EXT ioctl.  If the
    donor file is an append-only file, we should not allow the operation
    to proceed, lest we end up overwriting the contents of an append-only
    file.
    Signed-off-by: "Theodore Ts'o" <>
    Cc: Dan Rosenberg <>
    Signed-off-by: Greg Kroah-Hartman <>
    tytso committed with gregkh Jun 3, 2010
  21. ACPI: Fix regression where _PPC is not read at boot even when ignore_…

    commit 455c0d7 upstream.
    Earlier, Ingo Molnar posted a patch to make it so that the kernel would avoid
    reading _PPC on his broken T60.  Unfortunately, it seems that with Thomas
    Renninger's patch last July to eliminate _PPC evaluations when the processor
    driver loads, the kernel never actually reads _PPC at all!  This is problematic
    if you happen to boot your non-T60 computer in a state where the BIOS _wants_
    _PPC to be something other than zero.
    So, put the _PPC evaluation back into acpi_processor_get_performance_info if
    ignore_ppc isn't 1.
    Signed-off-by: Darrick J. Wong <>
    Signed-off-by: Len Brown <>
    Acked-by: Jeff Mahoney <>
    Signed-off-by: Greg Kroah-Hartman <>
    Darrick J. Wong committed with gregkh Feb 18, 2010
  22. powerpc/eeh: Fix a bug when pci structure is null

    commit 8d3d50b upstream.
    During an EEH recovery, the pci_dev structure can be null, mainly if an
    EEH event is detected during a PCI config operation. In this case, the
    pci_dev will not be known (and will be null), and the kernel will crash
    with the following message:
    Unable to handle kernel paging request for data at address 0x000000a0
    Faulting instruction address: 0xc00000000006b8b4
    Oops: Kernel access of bad area, sig: 11 [#1]
    NIP [c00000000006b8b4] .eeh_event_handler+0x10c/0x1a0
    LR [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0
    Call Trace:
    [c0000003a80dff00] [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0
    [c0000003a80dff90] [c000000000031f1c] .kernel_thread+0x54/0x70
    The bug occurs because pci_name() tries to access a null pointer.
    This patch just guarantees that pci_name() is not called on NULL pointers.
    Signed-off-by: Breno Leitao <>
    Signed-off-by: Linas Vepstas <>
    Signed-off-by: Benjamin Herrenschmidt <>
    Acked-by: Jeff Mahoney <>
    Signed-off-by: Greg Kroah-Hartman <>
    Breno Leitao committed with gregkh Feb 3, 2010
  23. HWPOISON: abort on failed unmap

    commit 1668bfd upstream.
    Don't try to isolate a still mapped page. Otherwise we will hit the
    BUG_ON(page_mapped(page)) in __remove_from_page_cache().
    Signed-off-by: Wu Fengguang <>
    Signed-off-by: Andi Kleen <>
    Signed-off-by: Thomas Renninger <>
    Signed-off-by: Greg Kroah-Hartman <>
    fengguang committed with gregkh Dec 16, 2009
  24. HWPOISON: remove the anonymous entry

    commit 9b9a29e upstream.
    (PG_swapbacked && !PG_lru) pages should not happen.
    Better to treat them as unknown pages.
    Signed-off-by: Wu Fengguang <>
    Signed-off-by: Andi Kleen <>
    Signed-off-by: Thomas Renninger <>
    Signed-off-by: Greg Kroah-Hartman <>
    fengguang committed with gregkh Dec 16, 2009
  25. x86: Fix out of order of gsi

    commit fad5399 upstream.
    Iranna D Ankad reported that IBM x3950 systems have boot
    problems after this commit:
     | commit b9c61b7
     |    x86/pci: update pirq_enable_irq() to setup io apic routing
    The problem is that with the patch, the machine freezes when
    console=ttyS0,... kernel serial parameter is passed.
    It seems to freeze at DVD initialization and the whole problem
    seems to be DVD/pata related, but somehow exposed through the
    serial parameter.
    Such apic problems can expose really weird behavior:
      ACPI: IOAPIC (id[0x10] address[0xfecff000] gsi_base[0])
      IOAPIC[0]: apic_id 16, version 0, address 0xfecff000, GSI 0-2
      ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[3])
      IOAPIC[1]: apic_id 15, version 0, address 0xfec00000, GSI 3-38
      ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[39])
      IOAPIC[2]: apic_id 14, version 0, address 0xfec01000, GSI 39-74
      ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 4 dfl dfl)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 5 dfl dfl)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 6 dfl dfl)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 7 dfl dfl)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 6 global_irq 9 dfl dfl)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 7 global_irq 10 dfl dfl)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 11 low edge)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 12 dfl dfl)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 12 global_irq 15 dfl dfl)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 13 global_irq 16 dfl dfl)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 17 low edge)
      ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 18 dfl dfl)
    It turns out that the system has three io apic controllers, but
    boot ioapic routing is in the second one, and that gsi_base is
    not 0 - it is using a bunch of INT_SRC_OVR...
    So these recent changes:
     1. one set routing for first io apic controller
     2. assume irq = gsi
    ... will break that system.
    So try to remap those gsis; this needs boot_ioapic_idx detection to be
    separated out of enable_IO_APIC() and called early.
    So introduce boot_ioapic_idx, and remap_ioapic_gsi()...
     -v2: shift gsi with delta instead of gsi_base of boot_ioapic_idx
     -v3: double check with find_isa_irq_apic(0, mp_INT) to get right
     -v4: nr_legacy_irqs
     -v5: add print out for boot_ioapic_idx, and also make it could be
          applied for current kernel and previous kernel
     -v6: add bus_irq, in acpi_sci_ioapic_setup, so can get override
          for sci right mapping...
     -v7: looks like pnpacpi get irq instead of gsi, so need to revert
          them back...
     -v8: split into two patches
     -v9: according to Eric, use fixed 16 for shifting instead of remap
     -v10: still need to touch rsparser.c
     -v11: just revert back to way Eric suggest...
          anyway the ioapic in first ioapic is blocked by second...
     -v12: two patches, this one will add more loop but check apic_id and irq > 16
    Reported-by: Iranna D Ankad <>
    Bisected-by: Iranna D Ankad <>
    Tested-by: Gary Hade <>
    Signed-off-by: Yinghai Lu <>
    Cc: Eric W. Biederman <>
    Cc: Thomas Renninger <>
    Cc: Eric W. Biederman <>
    Cc: Suresh Siddha <>
    LKML-Reference: <>
    Signed-off-by: Ingo Molnar <>
    Signed-off-by: Greg Kroah-Hartman <>
    ebiederm committed with gregkh Feb 28, 2010
  26. memory hotplug: fix a bug on /dev/mem for 64-bit kernels

    commit ea08541 upstream.
    Newly added memory can not be accessed via /dev/mem, because we do not
    update the variables high_memory, max_pfn and max_low_pfn.
    Add a function update_end_of_memory_vars() to update these variables for
    64-bit kernels.
    [ simplify comment]
    Signed-off-by: Shaohui Zheng <>
    Cc: Andi Kleen <>
    Cc: Li Haicheng <>
    Reviewed-by: Wu Fengguang <>
    Reviewed-by: KAMEZAWA Hiroyuki <>
    Cc: Ingo Molnar <>
    Cc: Thomas Gleixner <>
    Cc: "H. Peter Anvin" <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Greg Kroah-Hartman <>
    Shaohui Zheng committed with gregkh Feb 2, 2010
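
The bookkeeping described in the memory hotplug commit above can be sketched in Python. Only the function name update_end_of_memory_vars() and the three variable names come from the commit message; the body, the `MemoryVars` container, the page size and the `PAGE_OFFSET` base address are assumptions made for illustration.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                    # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
PAGE_OFFSET = 0xFFFF880000000000   # hypothetical base of the direct mapping


@dataclass
class MemoryVars:
    """Hypothetical container for the three end-of-memory variables."""
    max_pfn: int
    max_low_pfn: int
    high_memory: int


def pfn_up(addr):
    """Round a physical address up to the next page frame number."""
    return (addr + PAGE_SIZE - 1) >> PAGE_SHIFT


def update_end_of_memory_vars(mem, start, size):
    """Extend the end-of-memory markers if the new range ends beyond them."""
    end_pfn = pfn_up(start + size)
    if end_pfn > mem.max_pfn:
        mem.max_pfn = end_pfn
        mem.max_low_pfn = end_pfn
        mem.high_memory = PAGE_OFFSET + (end_pfn << PAGE_SHIFT)


if __name__ == "__main__":
    mem = MemoryVars(0x100, 0x100, PAGE_OFFSET + (0x100 << PAGE_SHIFT))
    update_end_of_memory_vars(mem, 0x100000, 0x80000)
    print(hex(mem.max_pfn), hex(mem.high_memory))
```

In this model, an access check that compares an address against `high_memory` starts accepting the hot-added range only after the update has run.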
  27. crypto: testmgr - Fix complain about lack test for internal used algo…

    commit 863b557 upstream.
    When loading the aesni-intel and ghash_clmulni-intel drivers, the kernel
    complains that there is no test for some internally used algorithms.
    The odd messages are as follows:
    alg: No test for __aes-aesni (__driver-aes-aesni)
    alg: No test for __ecb-aes-aesni (__driver-ecb-aes-aesni)
    alg: No test for __cbc-aes-aesni (__driver-cbc-aes-aesni)
    alg: No test for __ecb-aes-aesni (cryptd(__driver-ecb-aes-aesni)
    alg: No test for __ghash (__ghash-pclmulqdqni)
    alg: No test for __ghash (cryptd(__ghash-pclmulqdqni))
    This patch adds NULL test entries for these algorithms and drivers.
    Signed-off-by: Song Youquan <>
    Signed-off-by: Hang Ying <>
    Signed-off-by: Herbert Xu <>
    Acked-by: Jiri Kosina <>
    Signed-off-by: Greg Kroah-Hartman <>
    Song Youquan committed with gregkh Dec 23, 2009
  28. fix SBA IOMMU to handle allocation failure properly

    commit e2a4656 upstream.
    It's possible that SBA IOMMU might fail to find I/O space under heavy
    I/Os.  SBA IOMMU panics on allocation failure but it shouldn't; drivers
    can handle the failure.  The majority of other IOMMU drivers don't panic
    on allocation failure.
    This patch fixes SBA IOMMU path to handle allocation failure properly.
    Signed-off-by: FUJITA Tomonori <>
    Cc: Fenghua Yu <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Tony Luck <>
    Acked-by: Leonardo Chiquitto <>
    Acked-by: Jeff Mahoney <>
    Signed-off-by: Greg Kroah-Hartman <>
    fujita committed with gregkh Nov 17, 2009
  29. mutex: Don't spin when the owner CPU is offline or other weird cases

    commit 4b40221 upstream.
    Due to recent load-balancer changes that delay the task migration to
    the next wakeup, the adaptive mutex spinning ends up in a live lock
    when the owner's CPU gets offlined because the cpu_online() check
    lives before the owner running check.
    This patch changes mutex_spin_on_owner() to return 0 (don't spin) in
    any case where we aren't sure about the owner struct validity or CPU
    number, and if the said CPU is offline. There is no point going back &
    re-evaluate spinning in corner cases like that, let's just go to sleep.
    Signed-off-by: Benjamin Herrenschmidt <>
    Signed-off-by: Peter Zijlstra <>
    LKML-Reference: <1271212509.13059.135.camel@pasglop>
    Signed-off-by: Ingo Molnar <>
    Signed-off-by: Greg Kroah-Hartman <>
    ozbenh committed with gregkh Apr 16, 2010
  30. sched, cputime: Introduce thread_group_times()

    commit 0cf55e1 upstream.
    This is a real fix for the problem of utime/stime values decreasing
    described in the thread:
    Now cputime is accounted in the following way:
     - {u,s}time in task_struct are increased every time when the thread
       is interrupted by a tick (timer interrupt).
     - When a thread exits, its {u,s}time are added to signal->{u,s}time,
       after being adjusted by task_times().
     - When all threads in a thread_group exit, accumulated {u,s}time
       (and also c{u,s}time) in signal struct are added to c{u,s}time
       in signal struct of the group's parent.
    So {u,s}time in task struct are "raw" tick count, while
    {u,s}time and c{u,s}time in signal struct are "adjusted" values.
    And accounted values are used by:
     - task_times(), to get cputime of a thread:
       This function returns adjusted values that originate from raw
       {u,s}time and are scaled by sum_exec_runtime as accounted by CFS.
     - thread_group_cputime(), to get cputime of a thread group:
       This function returns the sum of all {u,s}time of living threads in
       the group, plus {u,s}time in the signal struct, which is the sum of
       adjusted cputimes of all exited threads that belonged to the group.
    The problem is the return value of thread_group_cputime(),
    because it is a mixed sum of "raw" values and "adjusted" values:
      group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)
    This misbehavior can break {u,s}time monotonicity.
    Assume there is a thread whose raw values are greater than its
    adjusted values (e.g. interrupted by 1000Hz ticks 50 times but
    only running for 45ms); when it exits, cputime will decrease (e.g.
    -5ms).
    To fix this, we could do:
      group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)
    But task_times() contains hard divisions, so applying it for
    every thread should be avoided.
    This patch fixes the above problem in the following way:
     - Modify thread's exit (= __exit_signal()) not to use task_times().
       It means {u,s}time in signal struct accumulates raw values instead
       of adjusted values.  As a result, thread_group_cputime() returns
       a pure sum of "raw" values.
     - Introduce a new function thread_group_times(*task, *utime, *stime)
       that converts "raw" values of thread_group_cputime() to "adjusted"
       values, in same calculation procedure as task_times().
     - Modify group's exit (= wait_task_zombie()) to use this introduced
       thread_group_times().  It makes c{u,s}time in signal struct
       have adjusted values like before this patch.
     - Replace some thread_group_cputime() by thread_group_times().
       These replacements are only applied where the "adjusted"
       cputime is conveyed to users, and where task_times() is already
       used nearby.
       (i.e. sys_times(), getrusage(), and /proc/<PID>/stat.)
    This patch has a positive side effect:
     - Before this patch, if a group contains many short-life threads
       (e.g. runs 0.9ms and not interrupted by ticks), the group's
       cputime could be invisible since each thread's cputime was
       accumulated after being adjusted: imagine adjustment function as
       adj(ticks, runtime),
         {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
       After this patch it will not happen because the adjustment is
       applied after accumulation.
     - remove if()s, put new variables into signal_struct.
    Signed-off-by: Hidetoshi Seto <>
    Acked-by: Peter Zijlstra <>
    Cc: Spencer Candland <>
    Cc: Americo Wang <>
    Cc: Oleg Nesterov <>
    Cc: Balbir Singh <>
    Cc: Stanislaw Gruszka <>
    LKML-Reference: <>
    Signed-off-by: Ingo Molnar <>
    Signed-off-by: Jiri Slaby <>
    Signed-off-by: Greg Kroah-Hartman <>
    Hidetoshi Seto committed with gregkh Dec 2, 2009
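
The arithmetic in the thread_group_times() commit above can be worked through with a small Python model. It is a sketch under stated assumptions, not the scheduler code: `adjust()`, `group_utime_old()` and `group_utime_new()` are hypothetical helpers, time is counted in whole ticks at an assumed 1000 Hz, and details of the real adjustment (such as keeping reported values from going backwards per task) are left out.

```python
TICK_NS = 1_000_000  # one tick at an assumed 1000 Hz


def adjust(utime_ticks, stime_ticks, runtime_ns):
    """Scale raw tick counts so that utime + stime matches measured runtime."""
    rtime = runtime_ns // TICK_NS  # runtime expressed in whole ticks
    total = utime_ticks + stime_ticks
    if total:
        utime = rtime * utime_ticks // total
    else:
        utime = rtime
    return utime, rtime - utime


def group_utime_old(living, exited):
    """Mixed sum: raw values of living threads plus adjusted exited ones."""
    raw = sum(u for u, _s, _r in living)
    adjusted = sum(adjust(u, s, r)[0] for u, s, r in exited)
    return raw + adjusted


def group_utime_new(living, exited):
    """Accumulate raw values for all threads, then adjust the sum once."""
    threads = living + exited
    u = sum(t[0] for t in threads)
    s = sum(t[1] for t in threads)
    r = sum(t[2] for t in threads)
    return adjust(u, s, r)[0]


if __name__ == "__main__":
    t = (50, 0, 45_000_000)  # 50 ticks charged, but only 45 ms of runtime
    print(group_utime_old([t], []), group_utime_old([], [t]))
    print(group_utime_new([t], []), group_utime_new([], [t]))
```

With the mixed sum, the group's utime differs depending on whether the thread is still alive; adjusting once after accumulation gives the same answer in both cases, and it also keeps the contribution of many short-lived threads that individually round down to zero.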
  31. sched: Fix granularity of task_u/stime()

    commit 761b1d2 upstream.
    Originally task_s/utime() were designed to return clock_t but
    later changed to return cputime_t by the following commit:
      commit efe567f
      Author: Christian Borntraeger <>
      Date:   Thu Aug 23 15:18:02 2007 +0200
    It only changed the type of return value, but not the
    implementation. As a result, the granularity of task_s/utime()
    is still that of clock_t, not that of cputime_t.
    So using task_s/utime() in __exit_signal() makes the values
    accumulated in the signal struct rounded and coarse-grained.
    This patch removes casts to clock_t in task_u/stime(), to keep
    granularity of cputime_t over the calculation.
      Use div_u64() to avoid error "undefined reference to `__udivdi3`"
      on some 32bit systems.
    Signed-off-by: Hidetoshi Seto <>
    Acked-by: Peter Zijlstra <>
    Cc: Spencer Candland <>
    Cc: Oleg Nesterov <>
    Cc: Stanislaw Gruszka <>
    LKML-Reference: <>
    Signed-off-by: Ingo Molnar <>
    Signed-off-by: Jiri Slaby <>
    Signed-off-by: Greg Kroah-Hartman <>
    Hidetoshi Seto committed with gregkh Nov 12, 2009
  32. timekeeping: Fix clock_gettime vsyscall time warp

    commit 0696b71 upstream.
    Since commit 0a54419 "timekeeping: Move NTP adjusted clock multiplier
    to struct timekeeper" the clock multiplier of vsyscall is updated with
    the unmodified clock multiplier of the clock source and not with the
    NTP adjusted multiplier of the timekeeper.
    This causes user space observable time warps:
    new CLOCK-warp maximum: 120 nsecs,  00000025c337c537 -> 00000025c337c4bf
    Add a new argument "mult" to update_vsyscall() and hand in the
    timekeeping internal NTP adjusted multiplier.
    Signed-off-by: Lin Ming <>
    Cc: "Zhang Yanmin" <>
    Cc: Martin Schwidefsky <>
    Cc: Benjamin Herrenschmidt <>
    Cc: Tony Luck <>
    LKML-Reference: <>
    Signed-off-by: Thomas Gleixner <>
    Signed-off-by: Kurt Garloff <>
    Signed-off-by: Greg Kroah-Hartman <>
    Lin Ming committed with gregkh Nov 17, 2009
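
The time warp described in the vsyscall commit above can be reproduced in a small Python model. This is a sketch with made-up numbers, not the timekeeping code: `cycles_to_ns()` and `observed_times()` are hypothetical helpers, and the shift, multipliers and tick length are illustrative assumptions. The model compares a user-space reading taken one cycle before a tick (computed with the multiplier handed to the vsyscall side) with the time the kernel records at the tick (computed with the NTP-adjusted multiplier).

```python
SHIFT = 22                      # assumed clocksource shift
RAW_MULT = 1 << SHIFT           # unmodified multiplier: 1 ns per cycle
NTP_MULT = RAW_MULT - 500       # hypothetical NTP-adjusted multiplier


def cycles_to_ns(cycles, mult, shift=SHIFT):
    """Convert a cycle delta to nanoseconds using mult/shift scaling."""
    return (cycles * mult) >> shift


def observed_times(vsyscall_mult, ntp_mult, tick_cycles):
    """Return (reading just before the tick, time recorded at the tick)."""
    before = cycles_to_ns(tick_cycles - 1, vsyscall_mult)
    after = cycles_to_ns(tick_cycles, ntp_mult)
    return before, after


if __name__ == "__main__":
    # Mismatched multipliers: the later reading is smaller (a time warp).
    print(observed_times(RAW_MULT, NTP_MULT, 1_000_000))
    # Same multiplier on both sides: time does not go backwards.
    print(observed_times(NTP_MULT, NTP_MULT, 1_000_000))
```

Handing the NTP-adjusted multiplier to the vsyscall side, as the commit describes, makes both computations use the same scale, so the later reading can no longer come out smaller in this model.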