Commits on Mar 21, 2011
  1. @gregkh

    Linux 2.6.33.8

    gregkh committed Mar 21, 2011
  2. @tilmanschmidt @gregkh

    isdn: avoid calling tty_ldisc_flush() in atomic context

    commit bc10f96 upstream.
    
    Remove the call to tty_ldisc_flush() from the RESULT_NO_CARRIER
    branch of isdn_tty_modem_result(), as already proposed in commit
    00409bb.
    This avoids a "sleeping function called from invalid context" BUG
    when the hardware driver calls the statcallb() callback with
    command==ISDN_STAT_DHUP in atomic context, which in turn calls
    isdn_tty_modem_result(RESULT_NO_CARRIER, ~), and from there,
    tty_ldisc_flush() which may sleep.
    
    Signed-off-by: Tilman Schmidt <tilman@imap.cc>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    tilmanschmidt committed with gregkh Jul 5, 2010
  3. @gregkh

    x86: Flush TLB if PGD entry is changed in i386 PAE mode

    commit 4981d01 upstream.
    
    According to the Intel CPU manual, every time a PGD entry is changed in
    i386 PAE mode, we need to do a full TLB flush. The current code follows
    this, and there is a comment about it in the code.
    
    But the current code misses the multi-threaded case. A changed page table
    might be used by several CPUs, and every such CPU should flush its TLB.
    Usually this isn't a problem, because we prepopulate all PGD entries at
    process fork. But when the process does a munmap followed by a new mmap,
    this issue is triggered.
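    The following is a toy user-space model, not kernel code, of why a
    local-only flush is not enough. It assumes four PAE PGD entries that each
    CPU caches when it reloads CR3 (modelled by flush_tlb_cpu()); NCPUS and all
    function names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <string.h>

    #define NCPUS 4   /* hypothetical CPU count for the model */
    #define NPGD  4   /* PAE has four top-level (PGD) entries */

    /* Shared PGD entries as they sit in memory. */
    static unsigned long pgd[NPGD];
    /* Each CPU's cached copy of those entries. */
    static unsigned long cached[NCPUS][NPGD];

    /* Model of a full TLB flush (CR3 reload) on one CPU. */
    static void flush_tlb_cpu(int cpu)
    {
        memcpy(cached[cpu], pgd, sizeof(pgd));
    }

    /* Buggy variant: only the CPU that changed the entry flushes. */
    static void set_pgd_local_flush(int cpu, int idx, unsigned long val)
    {
        pgd[idx] = val;
        flush_tlb_cpu(cpu);
    }

    /* Fixed variant: every CPU that may use this page table flushes. */
    static void set_pgd_flush_all(int idx, unsigned long val)
    {
        pgd[idx] = val;
        for (int cpu = 0; cpu < NCPUS; cpu++)
            flush_tlb_cpu(cpu);
    }

    /* True if the CPU's cached view matches memory. */
    static bool cpu_view_is_current(int cpu)
    {
        return memcmp(cached[cpu], pgd, sizeof(pgd)) == 0;
    }
    ```

    After set_pgd_local_flush(), the other CPUs keep using stale entries,
    which corresponds to the repeated page faults described below.
    
    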
    
    When it happens, some CPUs keep doing page faults:
    
      http://marc.info/?l=linux-kernel&m=129915020508238&w=2
    
    Reported-by: Yasunori Goto <y-goto@jp.fujitsu.com>
    Tested-by: Yasunori Goto <y-goto@jp.fujitsu.com>
    Reviewed-by: Rik van Riel <riel@redhat.com>
    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Cc: Mallick Asit K <asit.k.mallick@intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: linux-mm <linux-mm@kvack.org>
    LKML-Reference: <1300246649.2337.95.camel@sli10-conroe>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Shaohua Li committed with gregkh Mar 16, 2011
  4. @gregkh

    call_function_many: add missing ordering

    commit 45a5791 upstream.
    
    Paul McKenney's review pointed out two problems with the barriers in the
    2.6.38 update to the smp call function many code.
    
    First, a barrier that would force the func and info members of data to
    be visible before their consumption in the interrupt handler was
    missing.  This can be solved by adding a smp_wmb between setting the
    func and info members and setting the cpumask; this will pair
    with the existing and required smp_rmb ordering the cpumask read before
    the read of refs.  This placement avoids the need for a second smp_rmb
    in the interrupt handler, which would be executed on each of the N cpus
    executing the call request.  (I thought this barrier was present, but
    it was not.)
    
    Second, the previous write to refs (establishing the zero that the
    interrupt handler was testing from all cpus) was performed by a
    third-party cpu.  This would invoke transitivity which, as a recent or
    concurrent addition to memory-barriers.txt now explicitly states, would
    require a full smp_mb().
    
    However, we know the cpumask will only be set by one cpu (the data
    owner) and any previous iteration of the mask would have been cleared by
    the reading cpu.  By redundantly writing refs to 0 on the owning cpu
    before the smp_wmb, the write to refs will follow the same path as the
    writes that set the cpumask, which in turn allows us to keep the barrier
    in the interrupt handler a smp_rmb instead of promoting it to a smp_mb
    (which will be executed by N cpus for each of the possible M elements on the
    list).
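    The barrier pairing described above can be sketched in user space with
    C11 fences standing in for smp_wmb()/smp_rmb(). This is an illustration,
    not the kernel's code: struct call_data is a hypothetical, reduced stand-in
    and the cpumask is modelled as a single atomic bitmask.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stddef.h>

    /* Hypothetical, reduced stand-in for a call-function data element. */
    struct call_data {
        void (*func)(void *info);
        void *info;
        atomic_int refs;
        atomic_uint cpumask;   /* one bit per CPU (tiny model) */
    };

    /* Owner CPU: make func/info (and a redundant refs = 0) visible
     * before the cpumask store that exposes them. */
    static void publish(struct call_data *d, void (*func)(void *), void *info,
                        unsigned int mask)
    {
        d->func = func;
        d->info = info;
        atomic_store_explicit(&d->refs, 0, memory_order_relaxed);
        /* smp_wmb() analogue: order the stores above before the mask. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&d->cpumask, mask, memory_order_relaxed);
    }

    /* Handler on `cpu`: if our bit is visible, the paired fences guarantee
     * we also see the func/info written before the mask. */
    static int consume(struct call_data *d, int cpu)
    {
        unsigned int mask = atomic_load_explicit(&d->cpumask,
                                                 memory_order_relaxed);
        if (!(mask & (1u << cpu)))
            return 0;
        /* smp_rmb() analogue: order the mask read before later reads. */
        atomic_thread_fence(memory_order_acquire);
        d->func(d->info);
        return 1;
    }

    /* Example callback: count invocations. */
    static void bump(void *info)
    {
        (*(int *)info)++;
    }
    ```

    A release fence before the relaxed mask store, paired with an acquire
    fence after the relaxed mask load, is the C11 counterpart of the
    wmb/rmb pairing; putting the single "write" barrier on the owner keeps
    the per-CPU handler path cheap, as the commit argues.
    
    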
    
    I moved and expanded the comment about our (ab)use of the rcu list
    primitives for the concurrent walk earlier into this function.  I
    considered moving the first two paragraphs to the queue list head and
    lock, but felt it would have been too disconnected from the code.
    
    Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Milton Miller <miltonm@bga.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Milton Miller committed with gregkh Mar 15, 2011
  5. @gregkh

    call_function_many: fix list delete vs add race

    commit e6cd1e0 upstream.
    
    Peter pointed out there was nothing preventing the list_del_rcu in
    smp_call_function_interrupt from running before the list_add_rcu in
    smp_call_function_many.
    
    Fix this by not setting refs until we have gotten the lock for the list.
    Take advantage of the wmb in list_add_rcu to save an explicit additional
    one.
    
    I tried to force this race with a udelay before the lock & list_add and
    by mixing all 64 online cpus with just 3 random cpus in the mask, but
    was unsuccessful.  Still, inspection shows a valid race, and the fix is
    an extension of the existing protection window in the current code.
    
    Reported-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Milton Miller <miltonm@bga.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Milton Miller committed with gregkh Mar 15, 2011
  6. @gregkh

    ext3: Always set dx_node's fake_dirent explicitly.

    commit d743314 upstream.
    
    (crossport of 1f7bebb
    by Andreas Schlick <schlick@lavabit.com>)
    
    When ext3_dx_add_entry() has to split an index node, it has to ensure that
    name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
    won't recognise it as an intermediate htree node and will consider the
    htree to be corrupted.
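    A minimal user-space sketch of "set the fake dirent explicitly". The
    struct layouts are simplified, hypothetical mirrors of ext3's dirent
    header (host-endian, block size assumed to fit in 16 bits), and
    init_dx_node() is an illustrative name, not the ext3 function.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Simplified mirror of the fake dirent at the start of an index node. */
    struct fake_dirent {
        uint32_t inode;
        uint16_t rec_len;
        uint8_t  name_len;
        uint8_t  file_type;
    };

    struct dx_node {
        struct fake_dirent fake;
        /* index entries would follow here */
    };

    /* Initialise a freshly split index node: clear every field instead of
     * trusting the buffer's old contents, so name_len is guaranteed to be
     * 0, then make the fake entry span the whole block. */
    static void init_dx_node(struct dx_node *node, uint16_t blocksize)
    {
        memset(&node->fake, 0, sizeof(node->fake));
        node->fake.rec_len = blocksize;
    }
    ```
    
    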
    
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Eric Sandeen committed with gregkh Mar 4, 2011
  7. @antonblanchard @gregkh

    perf, powerpc: Handle events that raise an exception without overflowing

    commit 0837e32 upstream.
    
    Events on POWER7 can roll back if a speculative event doesn't
    eventually complete. Unfortunately in some rare cases they will
    raise a performance monitor exception. We need to catch this to
    ensure we reset the PMC. In all cases the PMC will be 256 or less
    cycles from overflow.
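    A sketch of the kind of overflow test this implies, assuming (as on
    POWER) that a 32-bit PMC signals overflow once bit 31 is set. The
    is_power7 flag is a hypothetical stand-in for a real CPU-feature check,
    and pmc_overflow() is an illustrative name.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Treat a PMC as overflowed if bit 31 is set, or (on POWER7) if a
     * rolled-back speculative event left it within 256 of overflow. */
    static bool pmc_overflow(uint32_t val, bool is_power7)
    {
        if (val & 0x80000000u)                /* genuine overflow */
            return true;
        if (is_power7 && (0x80000000u - val) <= 256)
            return true;                      /* rolled back just short */
        return false;
    }
    ```
    
    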
    
    Signed-off-by: Anton Blanchard <anton@samba.org>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <20110309143842.6c22845e@kryten>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    antonblanchard committed with gregkh Mar 9, 2011
  8. @gregkh

    SUNRPC: Ensure we always run the tk_callback before tk_action

    commit e020c68 upstream.
    
    This fixes a race in which the task->tk_callback() puts the rpc_task
    to sleep, setting a new callback. Under certain circumstances, the current
    code may end up executing the task->tk_action before it gets round to the
    callback.
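    The ordering rule can be sketched as a loop that always drains a pending
    callback before falling back to the action, so a callback that installs
    a new callback is honoured first. struct rpc_task here is a hypothetical,
    stripped-down model (two hooks plus a log), not the SUNRPC structure.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical, reduced task: two hooks and an event log. */
    struct rpc_task {
        void (*tk_callback)(struct rpc_task *);
        void (*tk_action)(struct rpc_task *);
        char log[16];
        int  nlog;
    };

    /* Run until neither a callback nor an action is pending. */
    static void run_task(struct rpc_task *task)
    {
        for (;;) {
            void (*do_action)(struct rpc_task *) = task->tk_callback;
            task->tk_callback = NULL;
            if (do_action == NULL) {
                do_action = task->tk_action;
                if (do_action == NULL)
                    break;
            }
            do_action(task);
        }
    }

    /* Helpers for the example: append one character to the log. */
    static void record(struct rpc_task *t, char c)
    {
        if (t->nlog < 15)
            t->log[t->nlog++] = c;
        t->log[t->nlog] = '\0';
    }

    static void second_cb(struct rpc_task *t) { record(t, 'B'); }

    /* A callback that "goes back to sleep" by installing a new callback. */
    static void first_cb(struct rpc_task *t)
    {
        record(t, 'A');
        t->tk_callback = second_cb;
    }

    static void final_action(struct rpc_task *t)
    {
        record(t, 'X');
        t->tk_action = NULL;
    }
    ```

    Starting with first_cb as the callback and final_action as the action,
    the log reads "ABX": both callbacks run before the action.
    
    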
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Trond Myklebust committed with gregkh Mar 15, 2011
  9. @jrgruher @gregkh

    scsi_dh_alua: fix deadlock in stpg_endio

    commit ed0f36b upstream.
    
    The use of blk_execute_rq_nowait() implies __blk_put_request() is needed
    in stpg_endio() rather than blk_put_request() -- blk_finish_request() is
    called with queue lock already held.
    
    Signed-off-by: Joseph Gruher <joseph.r.gruher@intel.com>
    Signed-off-by: Ilgu Hong <ilgu.hong@promise.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    jrgruher committed with gregkh Jan 5, 2011
  10. @gregkh

    ALSA: ctxfi - Clear input settings before initialization

    commit efed5f2 upstream.
    
    Clear input settings before initialization.
    
    Signed-off-by: Przemyslaw Bruski <pbruskispam@op.pl>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Przemyslaw Bruski committed with gregkh Mar 13, 2011
  11. @gregkh

    ALSA: ctxfi - Fix SPDIF status retrieval

    commit f164753 upstream.
    
    SPDIF status retrieval always returned the default settings instead of
    the actual ones.
    
    Signed-off-by: Przemyslaw Bruski <pbruskispam@op.pl>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Przemyslaw Bruski committed with gregkh Mar 13, 2011
  12. @gregkh

    ALSA: ctxfi - Fix incorrect SPDIF status bit mask

    commit 4c1847e upstream.
    
    SPDIF status mask creation was incorrect.
    
    Signed-off-by: Przemyslaw Bruski <pbruskispam@op.pl>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Przemyslaw Bruski committed with gregkh Mar 13, 2011
  13. @gregkh

    PCI: sysfs: Fix failure path for addition of "vpd" attribute

    commit 0f12a4e upstream.
    
    Commit 280c73d ("PCI: centralize the capabilities code in
    pci-sysfs.c") changed the initialisation of the "rom" and "vpd"
    attributes, and made the failure path for the "vpd" attribute
    incorrect.  We must free the new attribute structure (attr), but
    instead we currently free dev->vpd->attr.  That will normally be NULL,
    resulting in a memory leak, but it might be a stale pointer, resulting
    in a double-free.
    
    Found by inspection; compile-tested only.
    
    Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Ben Hutchings committed with gregkh Jan 13, 2011
  14. @gregkh

    PCI: do not create quirk I/O regions below PCIBIOS_MIN_IO for ICH

    commit 87e3dc3 upstream.
    
    Some broken BIOSes on ICH4 chipset report an ACPI region which is in
    conflict with legacy IDE ports when ACPI is disabled. Even though the
    regions overlap, IDE ports are working correctly (we cannot find out
    the decoding rules on chipsets).
    
    So the only problem is the reported region itself; if we don't reserve
    the region in the quirk, everything works as expected.
    
    This patch avoids reserving any quirk regions below PCIBIOS_MIN_IO
    which is 0x1000. Some regions might be (and, according to a quick Google
    search, are) below this border, but the only difference is that they
    won't be reserved anymore. They should still work the same as before.
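    The policy above can be sketched as a small predicate. This is a
    hypothetical helper, not the quirk code itself: it assumes the region
    size is a power of two and uses the PCIBIOS_MIN_IO value quoted above.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define PCIBIOS_MIN_IO 0x1000u   /* value quoted in the commit message */

    /* Align the BIOS-reported base down to the region size, then reserve
     * only regions that start at or above PCIBIOS_MIN_IO (this also
     * rejects an unset base of 0). */
    static bool quirk_should_reserve(unsigned int base, unsigned int size)
    {
        base &= ~(size - 1);
        return base >= PCIBIOS_MIN_IO;
    }
    ```

    With this rule, the 128-byte ACPI region at 0x0100 from the log below
    is simply not reserved, so it no longer collides with the legacy IDE
    ports.
    
    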
    
    The conflicts look like (1f.0 is bridge, 1f.1 is IDE ctrl):
    pci 0000:00:1f.1: address space collision: [io 0x0170-0x0177] conflicts with 0000:00:1f.0 [io  0x0100-0x017f]
    
    At 0x0100, a 128-byte ACPI region is reported in the quirk for
    ICH4. ata_piix then fails to find disks because the IDE legacy ports
    are zeroed:
    ata_piix 0000:00:1f.1: device not available (can't reserve [io 0x0000-0x0007])
    
    References: https://bugzilla.novell.com/show_bug.cgi?id=558740
    Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Thomas Renninger <trenn@suse.de>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Jiri Slaby committed with gregkh Feb 28, 2011
  15. @gregkh

    PCI: add more checking to ICH region quirks

    commit cdb9755 upstream.
    
    Per ICH4 and ICH6 specs, ACPI and GPIO regions are valid iff ACPI_EN
    and GPIO_EN bits are set to 1. Add checks for these bits into the
    quirks prior to the region creation.
    
    While at it, give the constants names via macros.
    
    Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Thomas Renninger <trenn@suse.de>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Jiri Slaby committed with gregkh Feb 28, 2011
  16. @jbrandeb @gregkh

    PCI: remove quirk for pre-production systems

    commit b99af4b upstream.
    
    Revert commit 7eb93b1
    Author: Yu Zhao <yu.zhao@intel.com>
    Date:   Fri Apr 3 15:18:11 2009 +0800
    
        PCI: SR-IOV quirk for Intel 82576 NIC
    
        If BIOS doesn't allocate resources for the SR-IOV BARs, zero the Flash
        BAR and program the SR-IOV BARs to use the old Flash Memory Space.
    
        Please refer to Intel 82576 Gigabit Ethernet Controller Datasheet
        section 7.9.2.14.2 for details.
        http://download.intel.com/design/network/datashts/82576_Datasheet.pdf
    
        Signed-off-by: Yu Zhao <yu.zhao@intel.com>
        Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    
    This quirk was added before SR-IOV was in production, and now all machines
    that originally had this issue already have BIOS updates to correct it. The
    quirk itself is no longer needed and in fact causes bugs if run.  Remove it.
    
    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    CC: Yu Zhao <yu.zhao@intel.com>
    CC: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    jbrandeb committed with gregkh Feb 14, 2011
  17. @gregkh

    ALSA: hda - fix digital mic selection in mixer on 92HD8X codecs

    commit 094a424 upstream.
    
    When the mux for the digital mic is different from the mux for other mics,
    the current auto-parser doesn't handle them correctly and provides
    only one mic.  This patch fixes the issue.
    
    Signed-off-by: Vitaliy Kulikov <Vitaliy.Kulikov@idt.com>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Vitaliy Kulikov committed with gregkh Mar 9, 2011
  18. @djrbliss @gregkh

    xfs: prevent reading uninitialized stack memory

    commit a122eb2 upstream.
    
    The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12
    bytes of uninitialized stack memory, because the fsxattr struct
    declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero)
    the 12-byte fsx_pad member before copying it back to the user.  This
    patch takes care of it.
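    The usual fix for this class of leak is to zero the whole stack struct
    before filling it and copying it out. The sketch below uses a simplified,
    hypothetical mirror of struct fsxattr (some fields omitted) and a memcpy()
    in place of copy_to_user().

    ```c
    #include <assert.h>
    #include <string.h>

    /* Simplified mirror of struct fsxattr: a few fields plus a 12-byte pad. */
    struct fsxattr_sketch {
        unsigned int  fsx_xflags;
        unsigned int  fsx_extsize;
        unsigned int  fsx_nextents;
        unsigned char fsx_pad[12];
    };

    /* Zero the struct first so the pad (and any compiler padding) never
     * carries leftover stack contents to the caller. */
    static void fill_fsxattr(struct fsxattr_sketch *out, unsigned int xflags,
                             unsigned int extsize, unsigned int nextents)
    {
        struct fsxattr_sketch fa;

        memset(&fa, 0, sizeof(fa));
        fa.fsx_xflags = xflags;
        fa.fsx_extsize = extsize;
        fa.fsx_nextents = nextents;
        memcpy(out, &fa, sizeof(fa));   /* stands in for copy_to_user() */
    }
    ```
    
    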
    
    Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
    Reviewed-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: Alex Elder <aelder@sgi.com>
    Cc: dann frazier <dannf@debian.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    djrbliss committed with gregkh Sep 6, 2010
  19. @lpechacek @gregkh

    USB: serial: handle Data Carrier Detect changes

    commit d14fc1a upstream.
    
    Alan's commit 335f851 introduced the
    .carrier_raised function in several drivers.  That also means
    tty_port_block_til_ready can now suspend the process trying to open the serial
    port when Carrier Detect is low and put it into tty_port.open_wait queue.  We
    need to wake up the process when Carrier Detect goes high and trigger TTY
    hangup when CD goes low.
    
    Some of the devices do not report modem status line changes, or at least we
    don't understand the status message, so for those we remove .carrier_raised
    again.
    
    Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    lpechacek committed with gregkh Jan 14, 2011
  20. @craigshelley @gregkh

    USB: CP210x Removed incorrect device ID

    commit 9926c0d upstream.
    
    Removed device ID 0x10C4/0x8149 for the West Mountain Radio Computerized
    Battery Analyzer.  This device is actually based on a SiLabs C8051Fxxx,
    see http://www.etheus.net/SiUSBXp_Linux_Driver for further info.
    
    Signed-off-by: Craig Shelley <craig@microtron.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    craigshelley committed with gregkh Jan 2, 2011
  21. @craigshelley @gregkh

    USB: CP210x Add two device IDs

    commit faea63f upstream.
    
    Added device IDs for the IRZ Automation Teleport SG-10 GSM/GPRS Modem and
    the DekTec DTA Plus VHF/UHF Booster/Attenuator.
    
    Signed-off-by: Craig Shelley <craig@microtron.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    craigshelley committed with gregkh Jan 2, 2011
  22. @gregkh

    staging: usbip: remove double giveback of URB

    commit 7571f08 upstream.
    
    In the vhci_urb_dequeue() function the TCP connection is checked twice.
    Each time, if the TCP connection is closed, the URB is unlinked and given
    back. Remove the second attempt to unlink and give back the URB completely.
    
    This patch fixes the bug described at https://bugzilla.kernel.org/show_bug.cgi?id=24872 .
    
    Signed-off-by: Márton Németh <nm127@freemail.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Márton Németh committed with gregkh Dec 13, 2010
  23. @gregkh

    sctp: Do not reset the packet during sctp_packet_config().

    commit 4bdab43 upstream.
    
    sctp_packet_config() is called when getting the packet ready
    for appending of chunks.  The function should not touch the
    current state, since it's possible to ping-pong between two
    transports when sending, and that can result in packet corruption
    followed by an skb overflow crash.
    
    Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de>
    Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Vlad Yasevich committed with gregkh Sep 15, 2010
  24. @rkuester @gregkh

    SCSI: mptsas: fix hangs caused by ATA pass-through

    commit 2a1b7e5 upstream.
    
    I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
    pass-through commands, in particular by smartctl.
    
    First, my version of the symptoms.  On an LSI SAS1068E B3 HBA running
    01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing
    occasional task, bus, and host resets, some of which lead to hard faults of
    the HBA requiring a reboot.  Abusively looping the smartctl command,
    
        # while true; do smartctl -a /dev/sdb > /dev/null; done
    
    dramatically increases the frequency of these failures to nearly one per
    minute.  A high IO load through the HBA while looping smartctl seems to
    improve the chance of a full scsi host reset or a non-recoverable hang.
    
    I reduced what smartctl was doing down to a simple test case which
    causes the hang with a single IO when pointed at the sd interface.  See
    the code at the bottom of this e-mail.  It uses an SG_IO ioctl to issue
    a single pass-through ATA identify device command.  If the buffer
    userspace gives for the read data has certain alignments, the task is
    issued to the HBA but the HBA fails to respond.  If run against the sg
    interface, neither the test code nor smartctl causes a hang.
    
    sd and sg handle the SG_IO ioctl slightly differently.  Unless you
    specifically set a flag to do direct IO, sg passes a buffer of its own,
    which is page-aligned, to the block layer and later copies the result
    into the userspace buffer regardless of its alignment.  sd, on the other
    hand, always does direct IO unless the userspace buffer fails an
    alignment test at block/blk-map.c line 57, in which case a page-aligned
    buffer is created and used for the transfer.
    
    The alignment test currently checks for word-alignment, the default
    set up by scsi_lib.c; therefore, userspace buffers of almost any
    alignment are given directly to the HBA as DMA targets.  The LSI 1068
    hardware doesn't seem to like at least a couple of the alignments which
    cross a page boundary (see the test code below).  Curiously, many
    page-boundary-crossing alignments do work just fine.
    
    So, either the hardware has a bug handling certain alignments or the
    hardware has a stricter alignment requirement than the driver is
    advertising.  If stricter alignment is required, then in no case should
    misaligned buffers from userspace be allowed through without being
    bounced or at least causing an error to be returned.
    
    It seems the mptsas driver could use blk_queue_dma_alignment() to advertise
    a stricter alignment requirement.  If it does, sd does the right thing and
    bounces misaligned buffers (see block/blk-map.c line 57).  The following
    patch to 2.6.34-rc5 makes my symptoms go away.  I'm sure this is the wrong
    place for this code, but it gets my idea across.
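    The direct-IO-versus-bounce decision can be sketched as a mask test. This
    is a simplified model in the spirit of the check in block/blk-map.c, not
    its exact code: the sketch tests both address and length, 0x3 is the
    word-alignment default mentioned above, and 511 is a hypothetical
    stricter mask a driver might advertise via blk_queue_dma_alignment().

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* A user buffer goes straight to the HBA only if its address and
     * length both satisfy the queue's DMA alignment mask; otherwise the
     * block layer would bounce it through an aligned buffer. */
    static bool can_map_directly(uintptr_t uaddr, unsigned long len,
                                 unsigned long dma_align_mask)
    {
        return ((uaddr | len) & dma_align_mask) == 0;
    }
    ```

    A buffer at 0x1004 passes the word-alignment test and is handed to the
    HBA directly, but would be bounced under the stricter 511 mask.
    
    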
    
    Acked-by: Kashyap Desai <Kashyap.Desai@lsi.com>
    Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    rkuester committed with gregkh Apr 26, 2010
  25. @sgruszka @gregkh

    sched: Fix user time incorrectly accounted as system time on 32-bit

    commit e75e863 upstream.
    
    A 32-bit variable can overflow during the multiplication in the
    task_times() and thread_group_times() functions. When the overflow
    happens, the scaled utime value becomes erroneously small and the
    scaled stime becomes erroneously big.
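    The overflow can be reproduced with plain integer arithmetic. The sketch
    below scales utime as rtime * utime / total; the function names and tick
    values are hypothetical, and the 32-bit variant assumes a platform where
    int is 32 bits wide, so the product wraps modulo 2^32.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Buggy: the product is computed in 32 bits and silently wraps. */
    static uint32_t scale_utime_32(uint32_t rtime, uint32_t utime,
                                   uint32_t total)
    {
        return (rtime * utime) / total;
    }

    /* Fixed: widen to 64 bits before multiplying. */
    static uint32_t scale_utime_64(uint32_t rtime, uint32_t utime,
                                   uint32_t total)
    {
        uint64_t temp = (uint64_t)rtime * utime;
        return (uint32_t)(temp / total);
    }
    ```

    With rtime = utime = 100000 and total = 200000, the 64-bit version
    yields 50000, while the wrapped 32-bit product yields 7050, leaving
    92950 ticks to be reported as system time.
    
    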
    
    Reported here:
    
     https://bugzilla.redhat.com/show_bug.cgi?id=633037
     https://bugzilla.kernel.org/show_bug.cgi?id=16559
    
    Reported-by: Michael Chapman <redhat-bugzilla@very.puzzling.org>
    Reported-by: Ciriaco Garcia de Celis <sysman@etherpilot.com>
    Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    LKML-Reference: <20100914143513.GB8415@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    sgruszka committed with gregkh Sep 14, 2010
  26. @gregkh

    rt2x00: add device id for windy31 usb device

    commit 9c4cf6d upstream.
    
    This patch adds the device id for the windy31 USB device to the rt73usb
    driver.
    
    Thanks to Ralf Flaxa for reporting this and providing testing and a
    sample device.
    
    Reported-by: Ralf Flaxa <rf@suse.de>
    Tested-by: Ralf Flaxa <rf@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
    gregkh committed Jan 25, 2011
  27. @paulmck @gregkh

    pid: make setpgid() system call use RCU read-side critical section

    commit 950eaac upstream.
    
    [   23.584719]
    [   23.584720] ===================================================
    [   23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
    [   23.585176] ---------------------------------------------------
    [   23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
    [   23.585176]
    [   23.585176] other info that might help us debug this:
    [   23.585176]
    [   23.585176]
    [   23.585176] rcu_scheduler_active = 1, debug_locks = 1
    [   23.585176] 1 lock held by rc.sysinit/728:
    [   23.585176]  #0:  (tasklist_lock){.+.+..}, at: [<ffffffff8104771f>] sys_setpgid+0x5f/0x193
    [   23.585176]
    [   23.585176] stack backtrace:
    [   23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
    [   23.585176] Call Trace:
    [   23.585176]  [<ffffffff8105b436>] lockdep_rcu_dereference+0x99/0xa2
    [   23.585176]  [<ffffffff8104c324>] find_task_by_pid_ns+0x50/0x6a
    [   23.585176]  [<ffffffff8104c35b>] find_task_by_vpid+0x1d/0x1f
    [   23.585176]  [<ffffffff81047727>] sys_setpgid+0x67/0x193
    [   23.585176]  [<ffffffff810029eb>] system_call_fastpath+0x16/0x1b
    [   24.959669] type=1400 audit(1282938522.956:4): avc:  denied  { module_request } for  pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas
    
    It turns out that the setpgid() system call fails to enter an RCU
    read-side critical section before doing a PID-to-task_struct translation.
    This commit therefore does rcu_read_lock() before the translation, and
    also does rcu_read_unlock() after the last use of the returned pointer.
    
    Reported-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Acked-by: David Howells <dhowells@redhat.com>
    Cc: Jiri Slaby <jslaby@suse.cz>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    paulmck committed with gregkh Aug 31, 2010
  28. @htejun @gregkh

    percpu: fix pcpu_last_unit_cpu

    commit 46b30ea upstream.
    
    pcpu_first/last_unit_cpu are used to track which cpu has the first and
    last units assigned.  This in turn is used to determine the span of a
    chunk for map/unmap cache flushes and whether an address belongs to
    the first chunk or not in per_cpu_ptr_to_phys().
    
    When the number of possible CPUs isn't a power of two, a chunk may
    contain unassigned units towards the end of a chunk.  The logic to
    determine pcpu_last_unit_cpu was incorrect when there was an unused
    unit at the end of a chunk.  It failed to ignore the unused unit and
    assigned the unused marker NR_CPUS to pcpu_last_unit_cpu.
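    The intended computation can be sketched as a scan that skips unused
    units before updating either end of the span. The unit map, NR_CPUS value
    and unit_cpu_span() are hypothetical stand-ins, not the percpu allocator's
    code.

    ```c
    #include <assert.h>

    #define NR_CPUS 64   /* hypothetical; also the "unused unit" marker */

    /* Record the first and last CPUs that own a unit. Unused units
     * (marked NR_CPUS) are skipped before either value is updated, so a
     * trailing unused unit cannot leak NR_CPUS into *last. */
    static void unit_cpu_span(const int *unit_map, int nr_units,
                              int *first, int *last)
    {
        *first = NR_CPUS;
        *last = NR_CPUS;
        for (int i = 0; i < nr_units; i++) {
            int cpu = unit_map[i];

            if (cpu == NR_CPUS)
                continue;
            if (*first == NR_CPUS)
                *first = cpu;
            *last = cpu;
        }
    }
    ```
    
    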
    
    This was discovered by CAI Qian through a kdump failure, which was caused
    by a malfunctioning per_cpu_ptr_to_phys() on a KVM setup with 50 possible
    CPUs.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-by: CAI Qian <caiqian@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    htejun committed with gregkh Sep 21, 2010
  29. @gormanm @gregkh

    mm: page allocator: update free page counters after pages are placed on the free list
    
    commit 72853e2 upstream.
    
    When allocating a page, the system uses NR_FREE_PAGES counters to
    determine if watermarks would remain intact after the allocation was made.
    This check is made without interrupts disabled or the zone lock held and
    so is race-prone by nature.  Unfortunately, when pages are being freed in
    batch, the counters are updated before the pages are added to the list.
    During this window, the counters are misleading as the pages do not exist
    yet.  When under significant pressure on systems with large numbers of
    CPUs, it's possible for processes to make progress even though they should
    have been stalled.  This is particularly problematic if a number of the
    processes are using GFP_ATOMIC as the min watermark can be accidentally
    breached and in extreme cases, the system can livelock.
    
    This patch updates the counters after the pages have been added to the
    list.  This makes the allocator more cautious with respect to preserving
    the watermarks and mitigates livelock possibilities.
    
    [akpm@linux-foundation.org: avoid modifying incoming args]
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Reviewed-by: Rik van Riel <riel@redhat.com>
    Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
    Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Reviewed-by: Christoph Lameter <cl@linux.com>
    Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    gormanm committed with gregkh Sep 9, 2010
  30. @gormanm @gregkh

    mm: page allocator: drain per-cpu lists after direct reclaim allocation fails
    
    commit 9ee493c upstream.
    
    When under significant memory pressure, a process enters direct reclaim
    and immediately afterwards tries to allocate a page.  If it fails and no
    further progress is made, it's possible the system will go OOM.  However,
    on systems with large amounts of memory, it's possible that a significant
    number of pages are on per-cpu lists and inaccessible to the calling
    process.  This leads to a process entering direct reclaim more often than
    it should, increasing the pressure on the system and compounding the
    problem.
    
    This patch notes that if direct reclaim is making progress but allocations
    are still failing, the system is already under heavy pressure.  In
    this case, it drains the per-cpu lists and tries the allocation a second
    time before continuing.
    
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
    Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Reviewed-by: Christoph Lameter <cl@linux.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Cc: David Rientjes <rientjes@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    gormanm committed with gregkh Sep 9, 2010
  31. @gregkh

    mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
    
    commit aa45484 upstream.
    
    Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
    cheaper than scanning a number of lists.  To avoid synchronization
    overhead, counter deltas are maintained on a per-cpu basis and drained
    both periodically and when the delta is above a threshold.  On large CPU
    systems, the difference between the estimated and real value of
    NR_FREE_PAGES can be very high.  If NR_FREE_PAGES is much higher than
    the real number of free pages in the buddy allocator, the VM can allocate
    pages below the min watermark, at worst reducing the real number of pages
    to zero.  Even if the OOM killer kills a victim to free memory, it may not
    free memory if the exit path requires a new page, resulting in livelock.
    
    This patch introduces a zone_page_state_snapshot() function (courtesy of
    Christoph) that takes a slightly more accurate view of an arbitrary vmstat
    counter.  It is used to read NR_FREE_PAGES while kswapd is awake to avoid
    the watermark being accidentally broken.  The estimate is not perfect and
    may result in cache line bounces but is expected to be lighter than the
    IPI calls necessary to continually drain the per-cpu counters while kswapd
    is awake.
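    The difference between the cheap read and the snapshot can be sketched
    with a split counter: a global value plus unfolded per-CPU deltas. The
    struct and function names and NCPUS are hypothetical; the snapshot only
    follows the idea of zone_page_state_snapshot() (sum the pending deltas,
    clamp at zero).

    ```c
    #include <assert.h>

    #define NCPUS 4   /* hypothetical CPU count */

    /* Global value that is only periodically folded with per-CPU deltas. */
    struct split_counter {
        long global;
        long cpu_diff[NCPUS];
    };

    /* Cheap estimate: ignores pending per-CPU deltas. */
    static long counter_read(const struct split_counter *c)
    {
        return c->global < 0 ? 0 : c->global;
    }

    /* More accurate snapshot: adds every CPU's pending delta, then clamps
     * at zero because a transiently negative total is meaningless. */
    static long counter_snapshot(const struct split_counter *c)
    {
        long x = c->global;

        for (int cpu = 0; cpu < NCPUS; cpu++)
            x += c->cpu_diff[cpu];
        return x < 0 ? 0 : x;
    }
    ```

    With a global value of 100 and pending deltas of -30, -40, 5 and -20,
    the cheap read still reports 100 free pages while the snapshot reports
    15, which is the gap that lets watermarks be breached.
    
    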
    
    Signed-off-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Christoph Lameter committed with gregkh Sep 9, 2010
  32. @gregkh

    KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyring
    
    commit 3d96406 upstream.
    
    Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
    of the parent process's session keyring whether or not the parent has a session
    keyring [CVE-2010-2960].
    
    This results in the following oops:
    
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
      IP: [<ffffffff811ae4dd>] keyctl_session_to_parent+0x251/0x443
      ...
      Call Trace:
       [<ffffffff811ae2f3>] ? keyctl_session_to_parent+0x67/0x443
       [<ffffffff8109d286>] ? __do_fault+0x24b/0x3d0
       [<ffffffff811af98c>] sys_keyctl+0xb4/0xb8
       [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b
    
    if the parent process has no session keyring.
    
    If the system is using pam_keyinit, then it is mostly protected against this,
    as all processes derived from a login will have inherited the session keyring
    created by pam_keyinit during the login procedure.
    
    To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.
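    The shape of the fix is a NULL test before the ownership comparison. The
    sketch below uses hypothetical, reduced structures (struct key with only a
    uid, struct cred_sketch with only a keyring pointer and an euid) and an
    illustrative function name; it is not the keyctl code.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical, reduced structures: just enough to show the check. */
    struct key { unsigned int uid; };
    struct cred_sketch {
        struct key *session_keyring;
        unsigned int euid;
    };

    /* Only compare ownership if the parent actually has a session keyring;
     * checking the pointer first avoids the NULL dereference. */
    static bool parent_keyring_permits(const struct cred_sketch *parent,
                                       const struct cred_sketch *me)
    {
        if (parent->session_keyring &&
            parent->session_keyring->uid != me->euid)
            return false;
        return true;
    }
    ```
    
    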
    
    Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Acked-by: Tavis Ormandy <taviso@cmpxchg8b.com>
    Cc: dann frazier <dannf@debian.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    David Howells committed with gregkh Sep 10, 2010
  33. @gregkh

    KEYS: Fix RCU no-lock warning in keyctl_session_to_parent()

    commit 9d1ac65 upstream.
    
    There's an unprotected access to the parent process's credentials in the middle
    of keyctl_session_to_parent().  This results in the following RCU warning:
    
      ===================================================
      [ INFO: suspicious rcu_dereference_check() usage. ]
      ---------------------------------------------------
      security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!
    
      other info that might help us debug this:
    
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by keyctl-session-/2137:
       #0:  (tasklist_lock){.+.+..}, at: [<ffffffff811ae2ec>] keyctl_session_to_parent+0x60/0x236
    
      stack backtrace:
      Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
      Call Trace:
       [<ffffffff8105606a>] lockdep_rcu_dereference+0xaa/0xb3
       [<ffffffff811ae379>] keyctl_session_to_parent+0xed/0x236
       [<ffffffff811af77e>] sys_keyctl+0xb4/0xb6
       [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b
    
    The code should take the RCU read lock to make sure the parent's credentials
    don't go away, even though it's holding a spinlock and has IRQs disabled.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: dann frazier <dannf@debian.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    David Howells committed with gregkh Sep 10, 2010
  34. @eparis @gregkh

    inotify: send IN_UNMOUNT events

    commit 611da04 upstream.
    
    Since the notify rewrite in 2.6.31 or so, inotify has not sent events
    about inodes that are unmounted.  This patch restores those events.
    
    Signed-off-by: Eric Paris <eparis@redhat.com>
    Cc: Ben Hutchings <ben@decadent.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    eparis committed with gregkh Jul 28, 2010
  35. @ptesarik @gregkh

    IA64: Optimize ticket spinlocks in fsys_rt_sigprocmask

    commit 2d2b690 upstream.
    
    Tony's fix (f574c84) has a small bug:
    it incorrectly uses "r3" as a scratch register in the first of the two
    unlock paths ... It is also inefficient.  Optimize the fast path again.
    
    Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    ptesarik committed with gregkh Sep 15, 2010