Commits on Oct 17, 2012
  1. Linux 3.2.32

    Ben Hutchings authored
  2. drm/i915: clear fencing tracking state when retiring requests

    danvet authored Ben Hutchings committed
    commit 15a13bb upstream.
    This fixes a resume regression introduced in
    commit 7dd4906
    Author: Chris Wilson <>
    Date:   Wed Mar 21 10:48:18 2012 +0000
        drm/i915: Mark untiled BLT commands as fenced on gen2/3
    which fixed fencing tracking for untiled blt commands.
    A side effect of that patch was that now also untiled objects have a
    non-zero obj->last_fenced_seqno to track when a fence can be set up
    after a pipelined tiling change. Unfortunately this was only cleared
    by the fence setup and teardown code, resulting in tons of untiled but
    inactive objects with non-zero last_fenced_seqno.
    Now after resume we completely reset the seqno tracking, both on the
    driver side (by setting dev_priv->next_seqno = 1) and on the hw side
    (by allocating a new hws page, which contains the seqnos). Hilarity
    and indefinite waits ensued from the stale seqnos in
    obj->last_fenced_seqno from before the suspend.
    The fix is to properly clear the fencing tracking state like we
    already do for the normal gpu rendering while moving objects off the
    active list.
    Reported-and-tested-by: "Rafael J. Wysocki" <>
    Cc: Jiri Slaby <>
    Reviewed-by: Chris Wilson <>
    Signed-Off-by: Daniel Vetter <>
    Signed-off-by: Ben Hutchings <>
  3. drm/i915: Mark untiled BLT commands as fenced on gen2/3

    ickle authored Ben Hutchings committed
    commit 7dd4906 upstream.
    The BLT commands on gen2/3 utilize the fence registers and so we cannot
    modify any fences for the object whilst those commands are in flight.
    Currently we marked tiled commands as occupying a fence, but forgot to
    restrict the untiled commands from preventing a fence being assigned
    before they were completed.
    One side-effect is that we then have to double check that a fence was
    allocated for a fenced buffer during move-to-active.
    Reported-by: Jiri Slaby <>
    Reviewed-by: Daniel Vetter <>
    Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
    Tested-by: Daniel Vetter <>
    Signed-off-by: Chris Wilson <>
    Signed-off-by: Daniel Vetter <>
    [bwh: Backported to 3.2: The nesting of if-statements in the old
     i915_gem_execbuffer_reserve() differs from pin_and_fence_object(),
     so don't move the assignment of obj->pending_fenced_gpu_access but
     adjust the boolean expression as recommended by Daniel Vetter.]
    Signed-off-by: Ben Hutchings <>
  4. drm/i915: fix swizzle detection for gen3

    danvet authored Ben Hutchings committed
    commit c9c4b6f upstream.
    It looks like the desktop variants of i915 and i945 also have the DCC
    register to control dram channel interleave and cpu side bit6 swizzling.
    Unfortunately internal Cspec/ConfigDB documentation for these ancient chips
    has already been dropped and there seem to be no archives. Also
    somebody thought the swizzling behaviour is surely a worthy secret to
    keep and redacted any mention of these fields from the published Intel
    datasheets.
    I suspect the hw engineers were really proud of the page coloring
    they've achieved in their first dual channel dram controller with
    bit17 - after all Bspec explains in great length the optimal layout of
    page frame numbers modulo 4 for the color and depth buffers, too.
    Later on when they've started to work on VT-d they shamefully
    discovered their stupidity and tried to cover the tracks ...
    Tested-by: Daniel Vetter <> (i915g)
    Tested-by: Pavel Ondračka <> (i945g)
    Tested-by: Chris Wilson <>
    Signed-Off-by: Daniel Vetter <>
    Signed-off-by: Ben Hutchings <>
  5. xHCI: handle command after aborting the command ring

    Elric Fu authored Ben Hutchings committed
    commit b63f405 upstream.
    According to xHCI spec section and section,
    after aborting a command on the command ring, xHC will
    generate a command completion event with its completion
    code set to Command Ring Stopped at least. If a command is
    currently executing at the time of aborting a command, xHC
    also generates a command completion event with its completion
    code set to Command Abort. When the command ring is stopped,
    software may remove, add, or rearrange Command Descriptors.
    To cancel a command, software will initialize a command
    descriptor for the cancel command, and add it into a
    cancel_cmd_list of xhci. When the command ring is stopped,
    software will find the command trbs described by command
    descriptors in cancel_cmd_list and modify them to No Op
    commands. If software can't find the matching trbs, we can
    assume the command had already finished.
    This patch should be backported to kernels as old as 3.0, that contain
    the commit 7ed603e "xhci: Add an
    assertion to check for virt_dev=0 bug." That commit papers over a NULL
    pointer dereference, and this patch fixes the underlying issue that
    caused the NULL pointer dereference.
    Signed-off-by: Elric Fu <>
    Signed-off-by: Sarah Sharp <>
    Tested-by: Miroslav Sabljic <>
    [bwh: Backported to 3.2: inc_deq() needs an additional 'consumer' argument;
     Jonathan Nieder worked out that this should be false]
    Signed-off-by: Ben Hutchings <>
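The cancellation scheme described in this commit can be illustrated with a small user-space sketch. It is not the xHCI driver code: the types, the `id` field used for matching, and the function name `neutralise_cancelled` are all hypothetical, chosen only to show the idea of rewriting still-queued commands as No-Ops while the ring is stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real xHCI driver types. */
enum trb_type { TRB_ENABLE_SLOT, TRB_ADDRESS_DEVICE, TRB_NOOP };

struct cmd_trb {
    enum trb_type type;
    int id;                 /* illustrative identifier used for matching */
};

/*
 * With the ring stopped, turn every queued command whose id appears in
 * cancel_ids into a No-Op. Returns how many commands were rewritten.
 * An id with no match is taken to belong to a command that already
 * finished, so nothing is done for it.
 */
size_t neutralise_cancelled(struct cmd_trb *ring, size_t ring_len,
                            const int *cancel_ids, size_t n_cancel)
{
    size_t changed = 0;

    assert(ring != NULL || ring_len == 0);
    for (size_t i = 0; i < n_cancel; i++) {
        for (size_t j = 0; j < ring_len; j++) {
            if (ring[j].type != TRB_NOOP && ring[j].id == cancel_ids[i]) {
                ring[j].type = TRB_NOOP;
                changed++;
            }
        }
    }
    return changed;
}
```

Rewriting in place keeps the ring layout intact, so the sketch never has to move later entries when an earlier one is cancelled.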
  6. e1000: fix lockdep splat in shutdown handler

    jbrandeb authored Ben Hutchings committed
    commit 3a3847e upstream.
    As reported by Steven Rostedt, e1000 has a lockdep splat added
    during the recent merge window.  The issue is that
    cancel_delayed_work is called while holding our private mutex.
    There is no reason that I can see to hold the mutex during pci
    shutdown, it was more just paranoia that I put the mutex_lock
    around the call to e1000_down.
    In a quick survey lots of drivers handle locking differently when
    being called by the pci layer.  The assumption here is that we
    don't need the mutexes' protection in this function because
    the driver could not be unloaded while in the shutdown handler
    which is only called at reboot or poweroff.
    Reported-by: Steven Rostedt <>
    Signed-off-by: Jesse Brandeburg <>
    Tested-by: Steven Rostedt <>
    Tested-by: Aaron Brown <>
    Signed-off-by: Jeff Kirsher <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Ben Hutchings <>
  7. netfilter: xt_limit: have r->cost != 0 case work

    jengelh authored Ben Hutchings committed
    commit 82e6bfe upstream.
    Commit v2.6.19-rc1~1272^2~41 tells us that r->cost != 0 can happen when
    a running state is saved to userspace and then reinstated from there.
    Make sure that the private xt_limit area is initialized with correct values.
    Otherwise, random matches occur due to use of uninitialized memory.
    Signed-off-by: Jan Engelhardt <>
    Signed-off-by: Pablo Neira Ayuso <>
    Signed-off-by: Ben Hutchings <>
  8. netfilter: limit, hashlimit: avoid duplicated inline

    Florian Westphal authored Ben Hutchings committed
    commit 7a909ac upstream.
    credit_cap can be set to credit, which avoids inlining user2credits
    twice. Also, remove inline keyword and let compiler decide.
    Before:
       text    data     bss     dec     hex filename
        684     192       0     876     36c net/netfilter/xt_limit.o
       4927     344      32    5303    14b7 net/netfilter/xt_hashlimit.o
    After:
       text    data     bss     dec     hex filename
        668     192       0     860     35c net/netfilter/xt_limit.o
       4793     344      32    5169    1431 net/netfilter/xt_hashlimit.o
    Signed-off-by: Florian Westphal <>
    Signed-off-by: Pablo Neira Ayuso <>
    Signed-off-by: Ben Hutchings <>
  9. ipvs: fix oops on NAT reply in br_nf context

    Lin Ming authored Ben Hutchings committed
    commit 9e33ce4 upstream.
    IPVS should not reset skb->nf_bridge in FORWARD hook
    by calling nf_reset for NAT replies. It triggers an oops in
    br_nf_forward_finish():
    [  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
    [  579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
    [  579.781792] PGD 218f9067 PUD 0
    [  579.781865] Oops: 0000 [#1] SMP
    [  579.781945] CPU 0
    [  579.781983] Modules linked in:
    [  579.782047]
    [  579.782080]
    [  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
    [  579.782300] RIP: 0010:[<ffffffff817b1ca5>]  [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
    [  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
    [  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
    [  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
    [  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
    [  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
    [  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
    [  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
    [  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
    [  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
    [  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
    [  579.783919] Stack:
    [  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
    [  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
    [  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
    [  579.784477] Call Trace:
    [  579.784523]  <IRQ>
    [  579.784562]
    [  579.784603]  [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
    [  579.784707]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
    [  579.784797]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
    [  579.784906]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
    [  579.784995]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
    [  579.785175]  [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
    [  579.785179]  [<ffffffff817ac417>] __br_forward+0x97/0xa2
    [  579.785179]  [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
    [  579.785179]  [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
    [  579.785179]  [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
    [  579.785179]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
    [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
    [  579.785179]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
    [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
    [  579.785179]  [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
    [  579.785179]  [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
    [  579.785179]  [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
    [  579.785179]  [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
    [  579.785179]  [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
    [  579.785179]  [<ffffffff816e6800>] net_rx_action+0xdf/0x242
    [  579.785179]  [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
    [  579.785179]  [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
    [  579.785179]  [<ffffffff8188812c>] call_softirq+0x1c/0x30
    The steps to reproduce are as follows:
    1. On Host1, set up bridge br0(
    2. Boot a kvm guest( on Host1 and start httpd
    3. Start IPVS service on Host1
       ipvsadm -A -t -s rr
       ipvsadm -a -t -r -m
    4. Run apache benchmark on Host2(
       ab -n 1000
              skb->nf_bridge = NULL;
    Actually, IPVS wants in this case just to replace nfct
    with untracked version. So replace the nf_reset(skb) call
    in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.
    Signed-off-by: Lin Ming <>
    Signed-off-by: Julian Anastasov <>
    Signed-off-by: Simon Horman <>
    Signed-off-by: Pablo Neira Ayuso <>
    Signed-off-by: Ben Hutchings <>
  10. netfilter: nf_ct_expect: fix possible access to uninitialized timer

    Pablo Neira Ayuso authored Ben Hutchings committed
    commit 2614f86 upstream.
    In __nf_ct_expect_check, the function refresh_timer returns 1
    if a matching expectation is found and its timer is successfully
    refreshed. This results in nf_ct_expect_related returning 0.
    Note that at this point:
    - the passed expectation is not inserted in the expectation table
      and its timer was not initialized, since we have refreshed one
      matching/existing expectation.
    - nf_ct_expect_alloc uses kmem_cache_alloc, so the expectation
      timer is in some undefined state just after the allocation,
      until it is appropriately initialized.
    This can be a problem for the SIP helper during the expectation
     if (nf_ct_expect_related(rtp_exp) == 0) {
             if (nf_ct_expect_related(rtcp_exp) != 0)
    Note that nf_ct_expect_related(rtp_exp) may return 0 for the timer refresh
    case that is detailed above. Then, if nf_ct_expect_related(rtcp_exp)
    returns != 0, nf_ct_unexpect_related(rtp_exp) is called, which does:
     if (del_timer(&exp->timeout)) {
    Note that del_timer always returns false if the timer has been
    initialized but is not pending.  However, the timer was not initialized
    since setup_timer was not called, therefore, the expectation timer
    remains in some undefined state. If I'm not missing anything, this may
    lead to the removal of a nonexistent expectation.
    To fix this, the optimization that allows refreshing an expectation
    is removed. Now nf_conntrack_expect_related looks more consistent
    to me since it always adds the expectation in case that it returns
    Thanks to Patrick McHardy for participating in the discussion of
    this patch.
    I think this may be the source of the problem described by:
    Reported-by: Rafal Fitt <>
    Acked-by: Patrick McHardy <>
    Signed-off-by: Pablo Neira Ayuso <>
    Signed-off-by: Ben Hutchings <>
  11. netfilter: nf_nat_sip: fix via header translation with multiple param…

    kaber authored Ben Hutchings committed
    commit f22eb25 upstream.
    Via-headers are parsed beginning at the first character after the Via-address.
    When the address is translated first and its length decreases, the offset to
    start parsing at is incorrect and header parameters might be missed.
    Update the offset after translating the Via-address to fix this.
    Signed-off-by: Patrick McHardy <>
    Signed-off-by: Pablo Neira Ayuso <>
    Signed-off-by: Ben Hutchings <>
  12. netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expec…

    Pablo Neira Ayuso authored Ben Hutchings committed
    commit 3f509c6 upstream.
    We're hitting a bug while trying to reinsert an already existing
    expectation:
    kernel BUG at kernel/timer.c:895!
    invalid opcode: 0000 [#1] SMP
    Call Trace:
     [<ffffffffa0069563>] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
     [<ffffffff812d423a>] ? in4_pton+0x72/0x131
     [<ffffffffa00ca69e>] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
     [<ffffffffa00b5b9b>] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
     [<ffffffffa00b5f15>] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
     [<ffffffff8103f1eb>] ? irq_exit+0x9a/0x9c
     [<ffffffffa00ca738>] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]
    We have to remove the RTP expectation if the RTCP expectation hits EBUSY
    since we keep trying with other ports until we succeed.
    Reported-by: Rafal Fitt <>
    Signed-off-by: Pablo Neira Ayuso <>
    Signed-off-by: Ben Hutchings <>
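The rule stated in this commit (drop the RTP expectation again when its RTCP partner cannot be added, then retry with other ports) can be sketched in user space as follows. This is a toy model, not the nf_nat_sip code: `struct expect_table`, `expect_add`, `expect_remove` and `reserve_rtp_rtcp` are hypothetical names, and a plain boolean table stands in for the conntrack expectation table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PORT_SPACE 65536u

/* Hypothetical expectation table: taken[p] is true when port p is in use. */
struct expect_table {
    bool taken[PORT_SPACE];
};

/* Returns 0 on success, -1 if the port is unavailable (think -EBUSY). */
int expect_add(struct expect_table *t, unsigned int port)
{
    if (port >= PORT_SPACE || t->taken[port])
        return -1;
    t->taken[port] = true;
    return 0;
}

void expect_remove(struct expect_table *t, unsigned int port)
{
    if (port < PORT_SPACE)
        t->taken[port] = false;
}

/*
 * Reserve a pair (port, port + 1) for RTP and RTCP, starting at first and
 * stepping by two. If the RTCP slot is busy, the RTP slot just taken is
 * released before the next pair is tried, so no stale entry is left behind
 * to be inserted a second time. Returns the RTP port, or -1 if none fits.
 */
int reserve_rtp_rtcp(struct expect_table *t, unsigned int first)
{
    assert(t != NULL);
    for (unsigned int port = first; port + 1 < PORT_SPACE; port += 2) {
        if (expect_add(t, port) != 0)
            continue;
        if (expect_add(t, port + 1) == 0)
            return (int)port;
        expect_remove(t, port);   /* RTCP busy: undo RTP before retrying */
    }
    return -1;
}
```

Without the `expect_remove()` call, a failed attempt would leave the RTP entry in the table while the loop moves on, which is the kind of leftover state the commit message describes.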
  13. netfilter: nf_ct_ipv4: packets with wrong ihl are invalid

    Jozsef Kadlecsik authored Ben Hutchings committed
    commit 07153c6 upstream.
    It was reported that the Linux kernel sometimes logs:
    klogd: [2629147.402413] kernel BUG at net/netfilter/nf_conntrack_proto_tcp.c:447!
    klogd: [1072212.887368] kernel BUG at net/netfilter/nf_conntrack_proto_tcp.c:392
    ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
    nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
    at the indicated lines - TCP options parsing - should not happen.
    However, tcp_error() relies on the "dataoff" offset to the TCP header,
    calculated by ipv4_get_l4proto().  But ipv4_get_l4proto() does not check
    bogus ihl values in IPv4 packets, which then can slip through tcp_error()
    and get caught at the TCP options parsing routines.
    The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
    ihl value.
    The patch closes netfilter bugzilla id 771.
    Signed-off-by: Jozsef Kadlecsik <>
    Signed-off-by: Pablo Neira Ayuso <>
    Signed-off-by: Ben Hutchings <>
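To make the ihl check concrete, here is a minimal user-space sketch of validating the IPv4 header length before computing the transport-header offset. It is an illustration under stated assumptions, not the patched ipv4_get_l4proto(): the function name `ipv4_l4_offset` is hypothetical, it works on a raw byte buffer rather than an skb, and it additionally rejects ihl values below the 20-byte minimum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the offset of the transport header inside pkt, or -1 when the
 * IPv4 header length field cannot be trusted. The low nibble of the first
 * byte is ihl, the header length in 32-bit words.
 */
int ipv4_l4_offset(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);
    if (len < 20)                       /* shorter than a minimal header */
        return -1;

    size_t hdr_len = (size_t)(pkt[0] & 0x0f) * 4;

    /* A header shorter than 20 bytes or longer than the packet is bogus. */
    if (hdr_len < 20 || hdr_len > len)
        return -1;
    return (int)hdr_len;
}
```

Any later parser that receives this offset, such as a TCP options walker, can then assume the offset lies inside the packet.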
  14. sched: Fix migration thread runtime bogosity

    Mike Galbraith authored Ben Hutchings committed
    commit 8f61896 upstream.
    Make the stop scheduler class do the same accounting as other classes.
    Migration threads can be caught in the act while doing exec balancing,
    leading to the below due to use of unmaintained ->se.exec_start.  The
    load that triggered this particular instance was an apparently out of
    control heavily threaded application that does system monitoring in
    what equated to an exec bomb, with one of the VERY frequently migrated
    tasks being ps.
    %CPU   PID USER     CMD
    99.3    45 root     [migration/10]
    97.7    53 root     [migration/12]
    97.0    57 root     [migration/13]
    90.1    49 root     [migration/11]
    89.6    65 root     [migration/15]
    88.7    17 root     [migration/3]
    80.4    37 root     [migration/8]
    78.1    41 root     [migration/9]
    44.2    13 root     [migration/2]
    Signed-off-by: Mike Galbraith <>
    Signed-off-by: Peter Zijlstra <>
    Signed-off-by: Thomas Gleixner <>
    [Steven Rostedt: backport for 3.2.]
    Signed-off-by: Ben Hutchings <>
  15. hpsa: dial down lockup detection during firmware flash

    smcameron authored Ben Hutchings committed
    commit e85c597 upstream.
    Dial back the aggressiveness of the controller lockup detection thread.
    Currently it will declare the controller to be locked up if it goes
    for 10 seconds with no interrupts and no change in the heartbeat
    register.  Dial back this to 30 seconds with no heartbeat change, and
    also snoop the ioctl path and if a firmware flash command is detected,
    dial it back further to 4 minutes until the firmware flash command
    completes.  The reason for this is that during the firmware flash
    operation, the controller apparently doesn't update the heartbeat
    register as frequently as it is supposed to, and we can get a false
    positive.
    Signed-off-by: Stephen M. Cameron <>
    Signed-off-by: James Bottomley <>
    [bwh: Backported to 3.2: adjust context]
    Signed-off-by: Ben Hutchings <>
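The timeout policy described in this commit can be summarised in a few lines. The sketch below only models the decision, assuming the caller already tracks how long the heartbeat register has been unchanged; the macro and function names are hypothetical and do not come from the hpsa driver, and whether the real comparison is strict is not stated in the commit text.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds taken from the commit text; the names are hypothetical. */
#define HEARTBEAT_TIMEOUT_SECS        30u
#define FLASH_HEARTBEAT_TIMEOUT_SECS  (4u * 60u)

static_assert(FLASH_HEARTBEAT_TIMEOUT_SECS > HEARTBEAT_TIMEOUT_SECS,
              "the firmware-flash window must be the more lenient one");

/*
 * Decide whether the controller should be declared locked up, given how
 * many seconds the heartbeat has gone without changing and whether a
 * firmware flash command is currently in flight.
 */
bool controller_looks_locked_up(unsigned int secs_since_heartbeat_change,
                                bool firmware_flash_in_progress)
{
    unsigned int limit = firmware_flash_in_progress
                             ? FLASH_HEARTBEAT_TIMEOUT_SECS
                             : HEARTBEAT_TIMEOUT_SECS;

    return secs_since_heartbeat_change >= limit;
}
```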
  16. r8169: 8168c and later require bit 0x20 to be set in Config2 for PME …

    Francois Romieu authored Ben Hutchings committed
    commit d387b42 upstream.
    The new 84xx stopped flying below the radars.
    Signed-off-by: Francois Romieu <>
    Cc: Hayes Wang <>
    Signed-off-by: Ben Hutchings <>
  17. r8169: Config1 is read-only on 8168c and later.

    Francois Romieu authored Ben Hutchings committed
    commit 851e602 upstream.
    Suggested by Hayes.
    Signed-off-by: Francois Romieu <>
    Cc: Hayes Wang <>
    Signed-off-by: Ben Hutchings <>
  18. mempolicy: fix a memory corruption by refcount imbalance in alloc_pag…

    Mel Gorman authored Ben Hutchings committed
    commit 00442ad upstream.
    Commit cc9a6c8 ("cpuset: mm: reduce large amounts of memory barrier
    related damage v3") introduced a potential memory corruption.
    shmem_alloc_page() uses a pseudo vma and it has one significant unique
    combination, vma->vm_ops=NULL and vma->policy->flags & MPOL_F_SHARED.
    get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
    and mpol_cond_put() DOES decrease a policy ref when a policy has
    MPOL_F_SHARED.  Therefore, when a cpuset update race occurs,
    alloc_pages_vma() falls in 'goto retry_cpuset' path, decrements the
    reference count and frees the policy prematurely.
    Signed-off-by: KOSAKI Motohiro <>
    Signed-off-by: Mel Gorman <>
    Reviewed-by: Christoph Lameter <>
    Cc: Josh Boyer <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Mel Gorman <>
    Signed-off-by: Ben Hutchings <>
  19. mempolicy: fix refcount leak in mpol_set_shared_policy()

    kosaki authored Ben Hutchings committed
    commit 63f74ca upstream.
    When shared_policy_replace() fails to allocate, new->policy is not freed
    correctly by mpol_set_shared_policy().  The problem is that the shared
    mempolicy code directly calls kmem_cache_free() in multiple places where
    it is easy to make a mistake.
    This patch creates an sp_free wrapper function and uses it. The bug was
    introduced in the pre-git age (IOW, before 2.6.12-rc2).
    [ Edited changelog]
    Signed-off-by: KOSAKI Motohiro <>
    Signed-off-by: Mel Gorman <>
    Reviewed-by: Christoph Lameter <>
    Cc: Josh Boyer <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Mel Gorman <>
    Signed-off-by: Ben Hutchings <>
  20. mempolicy: fix a race in shared_policy_replace()

    Mel Gorman authored Ben Hutchings committed
    commit b22d127 upstream.
    shared_policy_replace() use of sp_alloc() is unsafe.  1) sp_node cannot
    be dereferenced if sp->lock is not held and 2) another thread can modify
    sp_node between spin_unlock for allocating a new sp node and next
    spin_lock.  The bug was introduced before 2.6.12-rc2.
    Kosaki's original patch for this problem was to allocate an sp node and
    policy within shared_policy_replace and initialise it when the lock is
    reacquired.  I was not keen on this approach because it partially
    duplicates sp_alloc().  As the paths where sp->lock is taken are not that
    performance critical this patch converts sp->lock to sp->mutex so it can
    sleep when calling sp_alloc().
    [ Original patch]
    Signed-off-by: Mel Gorman <>
    Acked-by: KOSAKI Motohiro <>
    Reviewed-by: Christoph Lameter <>
    Cc: Josh Boyer <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Mel Gorman <>
    Signed-off-by: Ben Hutchings <>
  21. mempolicy: remove mempolicy sharing

    kosaki authored Ben Hutchings committed
    commit 869833f upstream.
    Dave Jones' system call fuzz testing tool "trinity" triggered the
    following bug with slab debugging enabled:
        BUG numa_policy (Not tainted): Poison overwritten
        INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
        INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
        INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
        INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x          (null) flags=0x20000000004080
        INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0
    The problem is that the structure is being prematurely freed due to a
    reference count imbalance. In the following case mbind(addr, len) should
    replace the memory policies of both vma1 and vma2 and thus they will
    come to share the same mempolicy and the new mempolicy will have the
    MPOL_F_SHARED flag.
      +-------------------+-------------------+
      |     vma1          |     vma2(shmem)   |
      +-------------------+-------------------+
      |                                       |
     addr                                 addr+len
    alloc_pages_vma() uses get_vma_policy() and mpol_cond_put() pair for
    maintaining the mempolicy reference count.  The current rule is that
    get_vma_policy() only increments refcount for shmem VMA and
    mpol_cond_put() only decrements refcount if the policy has
    MPOL_F_SHARED.
    In the above case, vma1 is not a shmem vma and vma->policy has MPOL_F_SHARED!
    The reference count will be decreased even though it was not increased
    whenever alloc_page_vma() is called.  This has been broken since commit
    [52cd3b0: mempolicy: rework mempolicy Reference Counting] in 2008.
    There is another serious bug with the sharing of memory policies.
    Currently, mempolicy rebind logic (it is called from cpuset rebinding)
    ignores a refcount of mempolicy and overrides it forcibly.  Thus, any
    mempolicy sharing may cause mempolicy corruption.  The bug was
    introduced by commit [68860ec: cpusets: automatic numa mempolicy
    Ideally, the shared policy handling would be rewritten to either
    properly handle COW of the policy structures or at least reference count
    MPOL_F_SHARED based exclusively on information within the policy.
    However, this patch takes the easier approach of disabling any policy
    sharing between VMAs.  Each new range allocated with sp_alloc will
    allocate a new policy, set the reference count to 1 and drop the
    reference count of the old policy.  This increases the memory footprint
    but is not expected to be a major problem as mbind() is unlikely to be
    used for fine-grained ranges.  It is also inefficient because it means
    we allocate a new policy even in cases where mbind_range() could use the
    new_policy passed to it.  However, it is more straight-forward and the
    change should be invisible to the user.
    [ Edited changelog]
    Reported-by: Dave Jones <>,
    Cc: Christoph Lameter <>,
    Reviewed-by: Christoph Lameter <>
    Signed-off-by: KOSAKI Motohiro <>
    Signed-off-by: Mel Gorman <>
    Cc: Josh Boyer <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Mel Gorman <>
    Signed-off-by: Ben Hutchings <>
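The reference count imbalance described in this commit is easy to reproduce in a toy model. The sketch below is not kernel code: `toy_policy`, `toy_vma` and the `toy_*` functions are hypothetical stand-ins that encode only the two rules quoted in the commit message (get takes a reference only for shmem VMAs, put drops one only for policies flagged as shared).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_F_SHARED 0x1u           /* stands in for MPOL_F_SHARED */

struct toy_policy {
    int refcnt;
    unsigned int flags;
};

struct toy_vma {
    bool is_shmem;
    struct toy_policy *policy;
};

/* Rule 1 from the commit text: only shmem VMAs take a reference. */
struct toy_policy *toy_get_policy(struct toy_vma *vma)
{
    assert(vma != NULL && vma->policy != NULL);
    if (vma->is_shmem)
        vma->policy->refcnt++;
    return vma->policy;
}

/* Rule 2 from the commit text: only shared policies drop a reference. */
void toy_cond_put(struct toy_policy *pol)
{
    if (pol->flags & TOY_F_SHARED)
        pol->refcnt--;
}

/* One allocation's worth of get/put; returns the refcount afterwards. */
int toy_alloc_page(struct toy_vma *vma)
{
    struct toy_policy *pol = toy_get_policy(vma);

    toy_cond_put(pol);
    return pol->refcnt;
}
```

With one shared policy attached to both a shmem and a non-shmem VMA, every allocation through the non-shmem VMA lowers the count by one while allocations through the shmem VMA leave it unchanged, which is the premature free the slab poison report points at.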
  22. efi: initialize efi.runtime_version to make query_variable_info/update_capsule workable

    Seiji Aguchi authored Ben Hutchings committed
    commit d6cf86d upstream.
    A value of efi.runtime_version is checked before calling
    update_capsule()/query_variable_info() as follows.
    But it isn't initialized anywhere.
    static efi_status_t virt_efi_query_variable_info(u32 attr,
                                                     u64 *storage_space,
                                                     u64 *remaining_space,
                                                     u64 *max_variable_size)
    {
            if (efi.runtime_version < EFI_2_00_SYSTEM_TABLE_REVISION)
                    return EFI_UNSUPPORTED;
            /* ... rest of the function elided ... */
    }
    This patch initializes a value of efi.runtime_version at boot time.
    Signed-off-by: Seiji Aguchi <>
    Acked-by: Matthew Garrett <>
    Signed-off-by: Matt Fleming <>
    Signed-off-by: Ben Hutchings <>
  23. drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)

    Alex Deucher authored Ben Hutchings committed
    commit 62444b7 upstream.
    - Stop the displays from accessing the FB
    - Block CPU access
    - Turn off MC client access
    This should fix issues some users have seen, especially
    with UEFI, when changing the MC FB location that result
    in hangs or display corruption.
    v2: fix crtc enabled check noticed by Luca Tettamanti
    Signed-off-by: Alex Deucher <>
    [bwh: Backported to 3.2:
     - Drop DCE6 cases
     - Call evergreen_mc_wait_for_idle() directly
     - Add dce4_wait_for_vblank() (commits 3ae19b7
       and 4a15903) and call it directly]
    Signed-off-by: Ben Hutchings <>
  24. eCryptfs: Call lower ->flush() from ecryptfs_flush()

    Tyler Hicks authored Ben Hutchings committed
    commit 64e6651 upstream.
    Since eCryptfs only calls fput() on the lower file in
    ecryptfs_release(), eCryptfs should call the lower filesystem's
    ->flush() from ecryptfs_flush().
    If the lower filesystem implements ->flush(), then eCryptfs should try
    to flush out any dirty pages prior to calling the lower ->flush(). If
    the lower filesystem does not implement ->flush(), then eCryptfs has no
    need to do anything in ecryptfs_flush() since dirty pages are now
    written out to the lower filesystem in ecryptfs_release().
    Signed-off-by: Tyler Hicks <>
    Signed-off-by: Ben Hutchings <>
  25. eCryptfs: Write out all dirty pages just before releasing the lower file

    Tyler Hicks authored Ben Hutchings committed
    commit 7149f25 upstream.
    Fixes a regression caused by:
    821f749 eCryptfs: Revert to a writethrough cache model
    That patch reverted some code (specifically, 32001d6) that was
    necessary to properly handle open() -> mmap() -> close() -> dirty pages
    -> munmap(), because the lower file could be closed before the dirty
    pages are written out.
    Rather than reapplying 32001d6, this approach is a better way of
    ensuring that the lower file is still open in order to handle writing
    out the dirty pages. It is called from ecryptfs_release(), while we have
    a lock on the lower file pointer, just before the lower file gets the
    final fput() and we overwrite the pointer.
    Signed-off-by: Tyler Hicks <>
    Reported-by: Artemy Tregubenko <>
    Tested-by: Artemy Tregubenko <>
    Tested-by: Colin Ian King <>
    Signed-off-by: Ben Hutchings <>
  26. eCryptfs: Revert to a writethrough cache model

    Tyler Hicks authored Ben Hutchings committed
    commit 821f749 upstream.
    A change was made about a year ago to get eCryptfs to better utilize its
    page cache during writes. The idea was to do the page encryption
    operations during page writeback, rather than doing them when initially
    writing into the page cache, to reduce the number of page encryption
    operations during sequential writes. This meant that the encrypted page
    would only be written to the lower filesystem during page writeback,
    which was a change from how eCryptfs had previously written to the lower
    filesystem in ecryptfs_write_end().
    The change caused a few eCryptfs-internal bugs that were shaken out.
    Unfortunately, more grave side effects have been identified that will
    force changes outside of eCryptfs. Because the lower filesystem isn't
    consulted until page writeback, eCryptfs has no way to pass lower write
    errors (ENOSPC, mainly) back to userspace. Additionally, it was reported
    that quotas could be bypassed because of the way eCryptfs may sometimes
    open the lower filesystem using a privileged kthread.
    It would be nice to resolve the latest issues, but it is best if the
    eCryptfs commits be reverted to the old behavior in the meantime.
    This reverts:
    32001d6 "eCryptfs: Flush file in vma close"
    5be79de "eCryptfs: Flush dirty pages in setattr"
    57db4e8 "ecryptfs: modify write path to encrypt page in writepage"
    Signed-off-by: Tyler Hicks <>
    Tested-by: Colin King <>
    Cc: Colin King <>
    Cc: Thieu Le <>
    Signed-off-by: Ben Hutchings <>
  27. eCryptfs: Initialize empty lower files when opening them

    Tyler Hicks authored Ben Hutchings committed
    commit e3ccaa9 upstream.
    Historically, eCryptfs has only initialized lower files in the
    ecryptfs_create() path. Lower file initialization is the act of writing
    the cryptographic metadata from the inode's crypt_stat to the header of
    the file. The ecryptfs_open() path already expects that metadata to be
    in the header of the file.
    A number of users have reported empty lower files beneath their
    eCryptfs mounts. Most of the causes for those empty files being left
    around have been addressed, but the presence of empty files causes
    problems due to the lack of proper cryptographic metadata.
    To transparently solve this problem, this patch initializes empty lower
    files in the ecryptfs_open() error path. If the metadata is unreadable
    due to the lower inode size being 0, plaintext passthrough support is
    not in use, and the metadata is stored in the header of the file (as
    opposed to the user.ecryptfs extended attribute), the lower file will be
    initialized.
    The number of nested conditionals in ecryptfs_open() was getting out of
    hand, so a helper function was created. To avoid the same nested
    conditional problem, the conditional logic was reversed inside of the
    helper function.
    Signed-off-by: Tyler Hicks <>
    Cc: John Johansen <>
    Cc: Colin Ian King <>
    Signed-off-by: Ben Hutchings <>
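    The "reversed" conditional logic described above can be sketched as a
    helper that returns early on each disqualifying condition instead of
    nesting the checks. This is only an illustration under stated
    assumptions: struct lower_file_state and should_initialize() are
    hypothetical names, not the identifiers used in the upstream patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the state consulted in the open path. */
struct lower_file_state {
	bool metadata_readable;      /* header metadata was read successfully */
	uint64_t lower_size;         /* size of the lower inode in bytes */
	bool plaintext_passthrough;  /* plaintext passthrough support in use */
	bool metadata_in_xattr;      /* metadata kept in an xattr, not the header */
};

/*
 * Decide whether an empty lower file should be initialized.
 * Each condition is inverted and returns early, which keeps the
 * function flat instead of nesting four levels deep.
 */
static bool should_initialize(const struct lower_file_state *s)
{
	if (s->metadata_readable)
		return false;   /* nothing to repair */
	if (s->lower_size != 0)
		return false;   /* unreadable for some other reason */
	if (s->plaintext_passthrough)
		return false;   /* file may legitimately lack metadata */
	if (s->metadata_in_xattr)
		return false;   /* the header is not where metadata lives */
	return true;
}
```

    Only a zero-length lower file with unreadable header metadata passes
    all four checks; every other combination is left untouched.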
  28. eCryptfs: Unlink lower inode when ecryptfs_create() fails

    Tyler Hicks authored Ben Hutchings committed
    commit 8bc2d3c upstream.
    ecryptfs_create() creates a lower inode, allocates an eCryptfs inode,
    initializes the eCryptfs inode and cryptographic metadata attached to
    the inode, and then writes the metadata to the header of the file.
    If an error were to occur after the lower inode was created, an empty
    lower file would be left in the lower filesystem. This is a problem
    because ecryptfs_open() refuses to open any lower files which do not
    have the appropriate metadata in the file header.
    This patch properly unlinks the lower inode when an error occurs in the
    later stages of ecryptfs_create(), reducing the chance that an empty
    lower file will be left in the lower filesystem.
    Signed-off-by: Tyler Hicks <>
    Cc: John Johansen <>
    Cc: Colin Ian King <>
    Signed-off-by: Ben Hutchings <>
  29. udf: fix retun value on error path in udf_load_logicalvol

    Nikola Pajkovsky authored Ben Hutchings committed
    commit 68766a2 upstream.
    In case we detect a problem and bail out, we fail to set "ret" to a
    nonzero value, and udf_load_logicalvol will mistakenly report success.
    Signed-off-by: Nikola Pajkovsky <>
    Signed-off-by: Jan Kara <>
    Signed-off-by: Ben Hutchings <>
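    The class of bug described above, bailing out through a shared exit
    label without setting the return code, can be sketched as follows.
    The function names, the length check, and the choice of -EIO are
    assumptions made for illustration; the text does not show the actual
    check or error code used in udf_load_logicalvol.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical loader: rejects a table longer than max_len. */
static int load_table_buggy(size_t table_len, size_t max_len)
{
	int ret = 0;

	if (table_len > max_len)
		goto out;       /* bug: ret is still 0, so the caller sees success */

	/* ... normal processing would go here ... */
out:
	return ret;
}

/* Same loader with the error path setting a nonzero return value. */
static int load_table_fixed(size_t table_len, size_t max_len)
{
	int ret = 0;

	if (table_len > max_len) {
		ret = -EIO;     /* assumed error code for this sketch */
		goto out;
	}

	/* ... normal processing would go here ... */
out:
	return ret;
}
```

    With an oversized table the buggy variant returns 0 while the fixed
    variant returns a negative errno value.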
  30. autofs4 - fix reset pending flag on mount fail

    raven-au authored Ben Hutchings committed
    commit 49999ab upstream.
    In autofs4_d_automount(), if a mount failure occurs, the
    AUTOFS_INF_PENDING mount pending flag is not cleared.
    One effect of this is when using the "browse" option, directory entry
    attributes show up with all "?"s due to the incorrect callback and
    subsequent failure return (when in fact no callback should be made).
    Signed-off-by: Ian Kent <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Ben Hutchings <>
  31. firewire: cdev: fix user memory corruption (i386 userland on amd64 ke…

    Stefan Richter authored Ben Hutchings committed
    commit 790198f upstream.
    Fix two bugs of the /dev/fw* character device concerning the
    FW_CDEV_IOC_GET_INFO ioctl with nonzero fw_cdev_get_info.bus_reset.
    (Practically all /dev/fw* clients issue this ioctl right after opening
    the device.)
    Both bugs are caused by sizeof(struct fw_cdev_event_bus_reset) being 36
    without natural alignment and 40 with natural alignment.
     1) Memory corruption, affecting i386 userland on amd64 kernel:
        Userland reserves a 36-byte buffer, kernel writes 40 bytes.
        This was first found and reported against libraw1394 when
        compiled with gcc 4.7, which happens to order libraw1394's stack such
        that the bug became visible as data corruption.
     2) Information leak, affecting all kernel architectures except i386:
        4 bytes of random kernel stack data were leaked to userspace.
    Hence limit the respective copy_to_user() to the 32-bit aligned size of
    struct fw_cdev_event_bus_reset.
    Reported-by: Simon Kirby <>
    Signed-off-by: Stefan Richter <>
    Signed-off-by: Ben Hutchings <>
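    The 36 versus 40 byte discrepancy can be reproduced in user space. The
    sketch below assumes a layout of one 64-bit field followed by seven
    32-bit fields (36 bytes of payload); the struct and function names are
    hypothetical, and the GCC/Clang packed and aligned attributes are used
    only to model the i386 ABI on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Natural alignment as on amd64: the 64-bit field forces 8-byte
 * struct alignment, so 4 bytes of tail padding are added. */
struct event_natural {
	_Alignas(8) uint64_t closure;
	uint32_t words[7];
};

/* Models the i386 ABI, where a 64-bit field only needs 4-byte
 * alignment and no tail padding is added. */
struct event_i386 {
	uint64_t closure;
	uint32_t words[7];
} __attribute__((packed, aligned(4)));

/*
 * Copy only the 32-bit aligned size of the event (the end of the last
 * real field), so the padding bytes are never written to the
 * destination buffer. Returns the number of bytes copied.
 */
static size_t copy_event(void *dst, const struct event_natural *src)
{
	size_t n = offsetof(struct event_natural, words) + sizeof(src->words);

	memcpy(dst, src, n);
	return n;
}
```

    Limiting the copy to the end of the last real field addresses both
    problems at once: a 36-byte destination is not overrun, and the
    uninitialized padding never leaves the source structure.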
  32. hugetlb: do not use vma_hugecache_offset() for vma_prio_tree_foreach

    Michal Hocko authored Ben Hutchings committed
    commit 36e4f20 upstream.
    Commit 0c176d5 ("mm: hugetlb: fix pgoff computation when unmapping
    page from vma") fixed pgoff calculation but it has replaced it by
    vma_hugecache_offset() which is not appropriate for offsets used for
    vma_prio_tree_foreach() because that one expects index in page units
    rather than in huge_page_shift.
    Johannes said:
    : The resulting index may not be too big, but it can be too small: assume
    : hpage size of 2M and the address to unmap to be 0x200000.  This is regular
    : page index 512 and hpage index 1.  If you have a VMA that maps the file
    : only starting at the second huge page, that VMA's vm_pgoff will be 512 but
    : you ask for offset 1 and miss it even though it does map the page of
    : interest.  hugetlb_cow() will try to unmap, miss the vma, and retry the
    : cow until the allocation succeeds or the skipped vma(s) go away.
    Signed-off-by: Michal Hocko <>
    Acked-by: Hillf Danton <>
    Cc: Mel Gorman <>
    Cc: KAMEZAWA Hiroyuki <>
    Cc: Andrea Arcangeli <>
    Cc: David Rientjes <>
    Acked-by: Johannes Weiner <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Ben Hutchings <>
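    The arithmetic in the quoted example can be checked with a small
    user-space sketch. It assumes 4 KiB base pages and 2 MiB huge pages;
    the helper names are hypothetical and only model the index units, not
    the kernel's prio tree lookup.

```c
#include <stdint.h>

#define PAGE_SHIFT  12u  /* 4 KiB base pages (assumed) */
#define HPAGE_SHIFT 21u  /* 2 MiB huge pages, as in the quoted example */

/* Index of an offset in base-page units, the unit vm_pgoff uses. */
static uint64_t page_index(uint64_t offset)
{
	return offset >> PAGE_SHIFT;
}

/* Index of the same offset in huge-page units. */
static uint64_t hpage_index(uint64_t offset)
{
	return offset >> HPAGE_SHIFT;
}

/* A VMA covers base-page indices [vm_pgoff, vm_pgoff + nr_pages). */
static int vma_covers(uint64_t vm_pgoff, uint64_t nr_pages, uint64_t index)
{
	return index >= vm_pgoff && index < vm_pgoff + nr_pages;
}
```

    For offset 0x200000, page_index() gives 512 and hpage_index() gives 1.
    A VMA with vm_pgoff 512 covers index 512 but not index 1, which is why
    a lookup keyed on the huge-page index misses it.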
  33. mm: hugetlb: fix pgoff computation when unmapping page from vma

    Hillf Danton authored Ben Hutchings committed
    commit 0c176d5 upstream.
    The computation for pgoff is incorrect, at least with
    	(vma->vm_pgoff >> PAGE_SHIFT)
    involved.  It is fixed with the available method if HPAGE_SIZE is
    concerned in page cache lookup.
    [ use vma_hugecache_offset() directly, per Michal]
    Signed-off-by: Hillf Danton <>
    Cc: Mel Gorman <>
    Cc: Michal Hocko <>
    Reviewed-by: KAMEZAWA Hiroyuki <>
    Cc: Andrea Arcangeli <>
    Cc: David Rientjes <>
    Reviewed-by: Michal Hocko <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    [bwh: Backported to 3.2: adjust context]
    Signed-off-by: Ben Hutchings <>
  34. mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP

    Andrea Arcangeli authored Ben Hutchings committed
    commit 027ef6c upstream.
    In many places !pmd_present has been converted to pmd_none.  For pmds
    that's equivalent and pmd_none is quicker so using pmd_none is better.
    However (unless we delete pmd_present) we should provide an accurate
    pmd_present too.  This will avoid the risk of code thinking the pmd is non
    present because it's under __split_huge_page_map, see the pmd_mknotpresent
    there and the comment above it.
    If the page has been mprotected as PROT_NONE, it would also lead to a
    pmd_present false negative in the same way as the race with
    split_huge_page.
    Because the PSE bit stays on at all times (both during split_huge_page and
    when the _PAGE_PROTNONE bit gets set), we could only check for the PSE bit,
    but checking the PROTNONE bit too is still good to remember pmd_present
    must always take PROT_NONE into account.
    This explains a not reproducible BUG_ON that was seldom reported on the
    The same issue is in pmd_large, it would go wrong with both PROT_NONE and
    if it races with split_huge_page.
    Signed-off-by: Andrea Arcangeli <>
    Acked-by: Rik van Riel <>
    Cc: Johannes Weiner <>
    Cc: Hugh Dickins <>
    Cc: Mel Gorman <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Ben Hutchings <>
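    The false negative described above can be modeled with plain flag
    words. In this sketch the bit positions and the function names are
    assumptions for illustration; it only shows why testing the present
    bit alone is not enough, and is not the kernel's pmd_present().

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions (assumptions for this sketch). */
#define FLAG_PRESENT  (1u << 0)
#define FLAG_PSE      (1u << 7)
#define FLAG_PROTNONE (1u << 8)

/* Checks only the present bit: wrong while a huge pmd is being
 * split or after it has been made PROT_NONE. */
static bool present_naive(uint32_t flags)
{
	return (flags & FLAG_PRESENT) != 0;
}

/* Treats the entry as present if any of the three bits is set. */
static bool present_fixed(uint32_t flags)
{
	return (flags & (FLAG_PRESENT | FLAG_PROTNONE | FLAG_PSE)) != 0;
}
```

    A huge pmd that is mid-split has the present bit cleared but PSE still
    set, and a PROT_NONE huge pmd has PROTNONE and PSE set; the naive check
    reports both as not present, while the wider mask reports them as
    present. An all-zero entry is reported as not present by both.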
  35. mm: fix invalidate_complete_page2() lock ordering

    Hugh Dickins authored Ben Hutchings committed
    commit ec4d9f6 upstream.
    In fuzzing with trinity, lockdep protested "possible irq lock inversion
    dependency detected" when isolate_lru_page() reenabled interrupts while
    still holding the supposedly irq-safe tree_lock:
    isolate_lru_page() is correct to enable interrupts unconditionally:
    invalidate_complete_page2() is incorrect to call clear_page_mlock() while
    holding tree_lock, which is supposed to nest inside lru_lock.
    Both truncate_complete_page() and invalidate_complete_page() call
    clear_page_mlock() before taking tree_lock to remove page from radix_tree.
    I guess invalidate_complete_page2() preferred to test PageDirty (again)
    under tree_lock before committing to the munlock; but since the page has
    already been unmapped, its state is already somewhat inconsistent, and no
    worse if clear_page_mlock() moved up.
    Reported-by: Sasha Levin <>
    Deciphered-by: Andrew Morton <>
    Signed-off-by: Hugh Dickins <>
    Acked-by: Mel Gorman <>
    Cc: Rik van Riel <>
    Cc: Johannes Weiner <>
    Cc: Michel Lespinasse <>
    Cc: Ying Han <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Ben Hutchings <>