Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

wmii colour misrepresentation #10

Closed
colons opened this Issue · 1 comment

2 participants

@colons

When running wmii-hg on Arch Linux Arm on the Raspberry Pi with my favourite wmiirc, the window decoration colours are modified in a bizarre way I have yet to identify.

Screenshots:
distortion: pi (photo)
expected: arch x86 box (photo)

What's weird about this (and what leads me to believe it's a low-level issue) is that when run in a tightvnc x server, the colours load and render fine. Additionally, everything else under the broken wmii instances works fine — images, webpages, etc. render perfectly. I assume this means it's a bug in whatever system wmii uses to initially select its colours.

If this doesn't belong here, I apologise and will be willing to resubmit elsewehere.

@popcornmix
Owner

There's an update to kernel source that might fix this. Either rebuild kernel, or wait for an Arch udpate and report back.

@bootc bootc referenced this issue from a commit in bootc/linux-rpi-orig
Mel Gorman mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGE…
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9da11af
@popcornmix popcornmix closed this
@PeterHuewe PeterHuewe referenced this issue from a commit in PeterHuewe/linux
Mel Gorman mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGE…
…S block during isolation for migration

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
0bf380b
@PeterHuewe PeterHuewe referenced this issue from a commit in PeterHuewe/linux
@yizou yizou ixgbe: do not update real num queues when netdev is going away
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 #6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 #7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 #9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
#19 [ffff88021f007e68] worker_thread at ffffffff81060513
#20 [ffff88021f007ee8] kthread at ffffffff810648b6
#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9d837ea
@popcornmix popcornmix referenced this issue from a commit in popcornmix/linux
Jeff Layton cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
commit 3cf003c upstream.

Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 #4 [eed63e50] do_writepages at c04f3e32
 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 #6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 #8 [eed63ecc] do_sync_write at c052d202
 #9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
bb761c9
@popcornmix popcornmix referenced this issue from a commit in popcornmix/linux
Jeff Layton nfs: skip commit in releasepage if we're freeing memory for fs-relate…
…d reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
1c88c58
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Ben Myers xfs: stop the sync worker before xfs_unmountfs
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 #2 [c5377d98] no_context at c06f614e
 #3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 #4 [c5377df4] bad_area_nosemaphore at c06f629b
 #5 [c5377e00] do_page_fault at c070b0cb
 #6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 #2 [cde0fe30] schedule_timeout at c0705600
 #3 [cde0fe94] __down_common at c0706098
 #4 [cde0fec8] __down at c0706122
 #5 [cde0fed0] down at c025936f
 #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 #9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a0 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
4026c9f
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Ben Myers xfs: stop the sync worker before xfs_unmountfs
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 #2 [c5377d98] no_context at c06f614e
 #3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 #4 [c5377df4] bad_area_nosemaphore at c06f629b
 #5 [c5377e00] do_page_fault at c070b0cb
 #6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 #2 [cde0fe30] schedule_timeout at c0705600
 #3 [cde0fe94] __down_common at c0706098
 #4 [cde0fec8] __down at c0706122
 #5 [cde0fed0] down at c025936f
 #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 #9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a0 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
0ba6e53
@popcornmix popcornmix referenced this issue from a commit
Luck, Tony x86: Remove some noise from boot log when starting cpus
Printing the "start_ip" for every secondary cpu is very noisy on a large
system - and doesn't add any value. Drop this message.

Console log before:
Booting Node   0, Processors  #1
smpboot cpu 1: start_ip = 96000
 #2
smpboot cpu 2: start_ip = 96000
 #3
smpboot cpu 3: start_ip = 96000
 #4
smpboot cpu 4: start_ip = 96000
       ...
 #31
smpboot cpu 31: start_ip = 96000
Brought up 32 CPUs

Console log after:
Booting Node   0, Processors  #1 #2 #3 #4 #5 #6 #7 Ok.
Booting Node   1, Processors  #8 #9 #10 #11 #12 #13 #14 #15 Ok.
Booting Node   0, Processors  #16 #17 #18 #19 #20 #21 #22 #23 Ok.
Booting Node   1, Processors  #24 #25 #26 #27 #28 #29 #30 #31
Brought up 32 CPUs

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
140f190
@popcornmix popcornmix referenced this issue from a commit
Borislav Petkov x86/smp: Fix topology checks on AMD MCM CPUs
The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
161270f
@popcornmix popcornmix referenced this issue from a commit
Jeff Layton cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 #4 [eed63e50] do_writepages at c04f3e32
 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 #6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 #8 [eed63ecc] do_sync_write at c052d202
 #9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Cc: <stable@vger.kernel.org>
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
3cf003c
@popcornmix popcornmix referenced this issue from a commit
Jeff Layton nfs: skip commit in releasepage if we're freeing memory for fs-relate…
…d reasons

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
5cf02d0
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@davem330 davem330 Merge branch 'tipc_net-next_v2' of git://git.kernel.org/pub/scm/linux…
…/kernel/git/paulg/linux

Paul Gortmaker says:

====================
Changes since v1:
	-get rid of essentially unused variable spotted by
	 Neil Horman (patch #2)

	-drop patch #3; defer it for 3.9 content, so Neil,
	 Jon and Ying can discuss its specifics at their
	 leisure while net-next is closed.  (It had no
	 direct dependencies to the rest of the series, and
	 was just an optimization)

	-fix indentation of accept() code directly in place
	 vs. forking it out to a separate function (was patch
	 #10, now patch #9).

Rebuilt and re-ran tests just to ensure nothing odd happened.

Original v1 text follows, updated pull information follows that.

           ---------

Here is another batch of TIPC changes.  The most interesting
thing is probably the non-blocking socket connect - I'm told
there were several users looking forward to seeing this.

Also there were some resource limitation changes that had
the right intent back in 2005, but were now apparently causing
needless limitations to people's real use cases; those have
been relaxed/removed.

There is a lockdep splat fix, but no need for a stable backport,
since it is virtually impossible to trigger in mainline; you
have to essentially modify code to force the probabilities
in your favour to see it.

The rest can largely be categorized as general cleanup of things
seen in the process of getting the above changes done.

Tested between 64 and 32 bit nodes with the test suite.  I've
also compile tested all the individual commits on the chain.

I'd originally figured on this queue not being ready for 3.8, but
the extended stabilization window of 3.7 has changed that.  On
the other hand, this can still be 3.9 material, if that simply
works better for folks - no problem for me to defer it to 2013.
If anyone spots any problems then I'll definitely defer it,
rather than rush a last minute respin.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
ba50166
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Jeff Layton cifs: move check for NULL socket into smb_send_rqst
Cai reported this oops:

[90701.616664] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[90701.625438] IP: [<ffffffff814a343e>] kernel_setsockopt+0x2e/0x60
[90701.632167] PGD fea319067 PUD 103fda4067 PMD 0
[90701.637255] Oops: 0000 [#1] SMP
[90701.640878] Modules linked in: des_generic md4 nls_utf8 cifs dns_resolver binfmt_misc tun sg igb iTCO_wdt iTCO_vendor_support lpc_ich pcspkr i2c_i801 i2c_core i7core_edac edac_core ioatdma dca mfd_core coretemp kvm_intel kvm crc32c_intel microcode sr_mod cdrom ata_generic sd_mod pata_acpi crc_t10dif ata_piix libata megaraid_sas dm_mirror dm_region_hash dm_log dm_mod
[90701.677655] CPU 10
[90701.679808] Pid: 9627, comm: ls Tainted: G        W    3.7.1+ #10 QCI QSSC-S4R/QSSC-S4R
[90701.688950] RIP: 0010:[<ffffffff814a343e>]  [<ffffffff814a343e>] kernel_setsockopt+0x2e/0x60
[90701.698383] RSP: 0018:ffff88177b431bb8  EFLAGS: 00010206
[90701.704309] RAX: ffff88177b431fd8 RBX: 00007ffffffff000 RCX: ffff88177b431bec
[90701.712271] RDX: 0000000000000003 RSI: 0000000000000006 RDI: 0000000000000000
[90701.720223] RBP: ffff88177b431bc8 R08: 0000000000000004 R09: 0000000000000000
[90701.728185] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[90701.736147] R13: ffff88184ef92000 R14: 0000000000000023 R15: ffff88177b431c88
[90701.744109] FS:  00007fd56a1a47c0(0000) GS:ffff88105fc40000(0000) knlGS:0000000000000000
[90701.753137] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[90701.759550] CR2: 0000000000000028 CR3: 000000104f15f000 CR4: 00000000000007e0
[90701.767512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[90701.775465] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[90701.783428] Process ls (pid: 9627, threadinfo ffff88177b430000, task ffff88185ca4cb60)
[90701.792261] Stack:
[90701.794505]  0000000000000023 ffff88177b431c50 ffff88177b431c38 ffffffffa014fcb1
[90701.802809]  ffff88184ef921bc 0000000000000000 00000001ffffffff ffff88184ef921c0
[90701.811123]  ffff88177b431c08 ffffffff815ca3d9 ffff88177b431c18 ffff880857758000
[90701.819433] Call Trace:
[90701.822183]  [<ffffffffa014fcb1>] smb_send_rqst+0x71/0x1f0 [cifs]
[90701.828991]  [<ffffffff815ca3d9>] ? schedule+0x29/0x70
[90701.834736]  [<ffffffffa014fe6d>] smb_sendv+0x3d/0x40 [cifs]
[90701.841062]  [<ffffffffa014fe96>] smb_send+0x26/0x30 [cifs]
[90701.847291]  [<ffffffffa015801f>] send_nt_cancel+0x6f/0xd0 [cifs]
[90701.854102]  [<ffffffffa015075e>] SendReceive+0x18e/0x360 [cifs]
[90701.860814]  [<ffffffffa0134a78>] CIFSFindFirst+0x1a8/0x3f0 [cifs]
[90701.867724]  [<ffffffffa013f731>] ? build_path_from_dentry+0xf1/0x260 [cifs]
[90701.875601]  [<ffffffffa013f731>] ? build_path_from_dentry+0xf1/0x260 [cifs]
[90701.883477]  [<ffffffffa01578e6>] cifs_query_dir_first+0x26/0x30 [cifs]
[90701.890869]  [<ffffffffa015480d>] initiate_cifs_search+0xed/0x250 [cifs]
[90701.898354]  [<ffffffff81195970>] ? fillonedir+0x100/0x100
[90701.904486]  [<ffffffffa01554cb>] cifs_readdir+0x45b/0x8f0 [cifs]
[90701.911288]  [<ffffffff81195970>] ? fillonedir+0x100/0x100
[90701.917410]  [<ffffffff81195970>] ? fillonedir+0x100/0x100
[90701.923533]  [<ffffffff81195970>] ? fillonedir+0x100/0x100
[90701.929657]  [<ffffffff81195848>] vfs_readdir+0xb8/0xe0
[90701.935490]  [<ffffffff81195b9f>] sys_getdents+0x8f/0x110
[90701.941521]  [<ffffffff815d3b99>] system_call_fastpath+0x16/0x1b
[90701.948222] Code: 66 90 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 53 48 83 ec 08 83 fe 01 48 8b 98 48 e0 ff ff 48 c7 80 48 e0 ff ff ff ff ff ff 74 22 <48> 8b 47 28 ff 50 68 65 48 8b 14 25 f0 c6 00 00 48 89 9a 48 e0
[90701.970313] RIP  [<ffffffff814a343e>] kernel_setsockopt+0x2e/0x60
[90701.977125]  RSP <ffff88177b431bb8>
[90701.981018] CR2: 0000000000000028
[90701.984809] ---[ end trace 24bd602971110a43 ]---

This is likely due to a race vs. a reconnection event.

The current code checks for a NULL socket in smb_send_kvec, but that's
too late. By the time that check is done, the socket will already have
been passed to kernel_setsockopt. Move the check into smb_send_rqst, so
that it's checked earlier.

In truth, this is a bit of a half-assed fix. The -ENOTSOCK error
return here looks like it could bubble back up to userspace. The locking
rules around the ssocket pointer are really unclear as well. There are
cases where the ssocket pointer is changed without holding the srv_mutex,
but I'm not clear whether there's a potential race here yet or not.

This code seems like it could benefit from some fundamental re-think of
how the socket handling should behave. Until then though, this patch
should at least fix the above oops in most cases.

Cc: <stable@vger.kernel.org> # 3.7+
Reported-and-Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
ea702b8
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Sha Zhengju memcg, oom: provide more precise dump info while memcg oom happening
Currently when a memcg oom is happening the oom dump messages is still
global state and provides few useful info for users.  This patch prints
more pointed memcg page statistics for memcg-oom and take hierarchy into
consideration:

Based on Michal's advice, we take hierarchy into consideration: supppose
we trigger an OOM on A's limit

        root_memcg
            |
            A (use_hierachy=1)
           / \
          B   C
          |
          D
then the printed info will be:

  Memory cgroup stats for /A:...
  Memory cgroup stats for /A/B:...
  Memory cgroup stats for /A/C:...
  Memory cgroup stats for /A/B/D:...

Following are samples of oom output:

(1) Before change:

    mal-80 invoked oom-killer:gfp_mask=0xd0, order=0, oom_score_adj=0
    mal-80 cpuset=/ mems_allowed=0
    Pid: 2976, comm: mal-80 Not tainted 3.7.0+ #10
    Call Trace:
     [<ffffffff8167fbfb>] dump_header+0x83/0x1ca
     ..... (call trace)
     [<ffffffff8168a818>] page_fault+0x28/0x30
                             <<<<<<<<<<<<<<<<<<<<< memcg specific information
    Task in /A/B/D killed as a result of limit of /A
    memory: usage 101376kB, limit 101376kB, failcnt 57
    memory+swap: usage 101376kB, limit 101376kB, failcnt 0
    kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
                             <<<<<<<<<<<<<<<<<<<<< print per cpu pageset stat
    Mem-Info:
    Node 0 DMA per-cpu:
    CPU    0: hi:    0, btch:   1 usd:   0
    ......
    CPU    3: hi:    0, btch:   1 usd:   0
    Node 0 DMA32 per-cpu:
    CPU    0: hi:  186, btch:  31 usd: 173
    ......
    CPU    3: hi:  186, btch:  31 usd: 130
                             <<<<<<<<<<<<<<<<<<<<< print global page state
    active_anon:92963 inactive_anon:40777 isolated_anon:0
     active_file:33027 inactive_file:51718 isolated_file:0
     unevictable:0 dirty:3 writeback:0 unstable:0
     free:729995 slab_reclaimable:6897 slab_unreclaimable:6263
     mapped:20278 shmem:35971 pagetables:5885 bounce:0
     free_cma:0
                             <<<<<<<<<<<<<<<<<<<<< print per zone page state
    Node 0 DMA free:15836kB ... all_unreclaimable? no
    lowmem_reserve[]: 0 3175 3899 3899
    Node 0 DMA32 free:2888564kB ... all_unrelaimable? no
    lowmem_reserve[]: 0 0 724 724
    lowmem_reserve[]: 0 0 0 0
    Node 0 DMA: 1*4kB (U) ... 3*4096kB (M) = 15836kB
    Node 0 DMA32: 41*4kB (UM) ... 702*4096kB (MR) = 2888316kB
    120710 total pagecache pages
    0 pages in swap cache
                             <<<<<<<<<<<<<<<<<<<<< print global swap cache stat
    Swap cache stats: add 0, delete 0, find 0/0
    Free swap  = 499708kB
    Total swap = 499708kB
    1040368 pages RAM
    58678 pages reserved
    169065 pages shared
    173632 pages non-shared
    [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
    [ 2693]     0  2693     6005     1324      17        0             0 god
    [ 2754]     0  2754     6003     1320      16        0             0 god
    [ 2811]     0  2811     5992     1304      18        0             0 god
    [ 2874]     0  2874     6005     1323      18        0             0 god
    [ 2935]     0  2935     8720     7742      21        0             0 mal-30
    [ 2976]     0  2976    21520    17577      42        0             0 mal-80
    Memory cgroup out of memory: Kill process 2976 (mal-80) score 665 or sacrifice child
    Killed process 2976 (mal-80) total-vm:86080kB, anon-rss:69964kB, file-rss:344kB

We can see that messages dumped by show_free_areas() are longsome and can
provide so limited info for memcg that just happen oom.

(2) After change
    mal-80 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
    mal-80 cpuset=/ mems_allowed=0
    Pid: 2704, comm: mal-80 Not tainted 3.7.0+ #10
    Call Trace:
     [<ffffffff8167fd0b>] dump_header+0x83/0x1d1
     .......(call trace)
     [<ffffffff8168a918>] page_fault+0x28/0x30
    Task in /A/B/D killed as a result of limit of /A
                             <<<<<<<<<<<<<<<<<<<<< memcg specific information
    memory: usage 102400kB, limit 102400kB, failcnt 140
    memory+swap: usage 102400kB, limit 102400kB, failcnt 0
    kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
    Memory cgroup stats for /A: cache:32KB rss:30984KB mapped_file:0KB swap:0KB inactive_anon:6912KB active_anon:24072KB inactive_file:32KB active_file:0KB unevictable:0KB
    Memory cgroup stats for /A/B: cache:0KB rss:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
    Memory cgroup stats for /A/C: cache:0KB rss:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
    Memory cgroup stats for /A/B/D: cache:32KB rss:71352KB mapped_file:0KB swap:0KB inactive_anon:6656KB active_anon:64696KB inactive_file:16KB active_file:16KB unevictable:0KB
    [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
    [ 2260]     0  2260     6006     1325      18        0             0 god
    [ 2383]     0  2383     6003     1319      17        0             0 god
    [ 2503]     0  2503     6004     1321      18        0             0 god
    [ 2622]     0  2622     6004     1321      16        0             0 god
    [ 2695]     0  2695     8720     7741      22        0             0 mal-30
    [ 2704]     0  2704    21520    17839      43        0             0 mal-80
    Memory cgroup out of memory: Kill process 2704 (mal-80) score 669 or sacrifice child
    Killed process 2704 (mal-80) total-vm:86080kB, anon-rss:71016kB, file-rss:340kB

This version provides more pointed info for memcg in "Memory cgroup stats
for XXX" section.

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
58cf188
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@kghost kghost vxlan: fix oops when delete netns containing vxlan
The following script will produce a kernel oops:

    sudo ip netns add v
    sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo
    sudo ip netns exec v ip link set lo up
    sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo
    sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo
    sudo ip netns exec v ip link set vxlan0 up
    sudo ip netns del v

where inspect by gdb:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 107]
    0xffffffffa0289e33 in ?? ()
    (gdb) bt
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    #1  vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
    #2  0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
    #3  0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
    #4  0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
    #5  0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
    #6  0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
    #7  0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
    #8  0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
    #9  0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
    #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
    #11 <signal handler called>
    #12 0x0000000000000000 in ?? ()
    #13 0x0000000000000000 in ?? ()
    (gdb) fr 0
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    533		struct sock *sk = vn->sock->sk;
    (gdb) l
    528	static int vxlan_leave_group(struct net_device *dev)
    529	{
    530		struct vxlan_dev *vxlan = netdev_priv(dev);
    531		struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
    532		int err = 0;
    533		struct sock *sk = vn->sock->sk;
    534		struct ip_mreqn mreq = {
    535			.imr_multiaddr.s_addr	= vxlan->gaddr,
    536			.imr_ifindex		= vxlan->link,
    537		};
    (gdb) p vn->sock
    $4 = (struct socket *) 0x0

The kernel calls `vxlan_exit_net` when deleting the netns before shutting down
vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock`
is already gone causes the oops. so we should manually shutdown all interfaces
before deleting `vn->sock` as the patch does.

Signed-off-by: Zang MingJie <zealot0630@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9cb6cb7
@popcornmix popcornmix referenced this issue from a commit
@kghost kghost vxlan: fix oops when delete netns containing vxlan
[ Upstream commit 9cb6cb7 ]

The following script will produce a kernel oops:

    sudo ip netns add v
    sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo
    sudo ip netns exec v ip link set lo up
    sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo
    sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo
    sudo ip netns exec v ip link set vxlan0 up
    sudo ip netns del v

where inspect by gdb:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 107]
    0xffffffffa0289e33 in ?? ()
    (gdb) bt
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    #1  vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
    #2  0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
    #3  0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
    #4  0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
    #5  0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
    #6  0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
    #7  0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
    #8  0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
    #9  0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
    #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
    #11 <signal handler called>
    #12 0x0000000000000000 in ?? ()
    #13 0x0000000000000000 in ?? ()
    (gdb) fr 0
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    533		struct sock *sk = vn->sock->sk;
    (gdb) l
    528	static int vxlan_leave_group(struct net_device *dev)
    529	{
    530		struct vxlan_dev *vxlan = netdev_priv(dev);
    531		struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
    532		int err = 0;
    533		struct sock *sk = vn->sock->sk;
    534		struct ip_mreqn mreq = {
    535			.imr_multiaddr.s_addr	= vxlan->gaddr,
    536			.imr_ifindex		= vxlan->link,
    537		};
    (gdb) p vn->sock
    $4 = (struct socket *) 0x0

The kernel calls `vxlan_exit_net` when deleting the netns before shutting down
vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock`
is already gone causes the oops. so we should manually shutdown all interfaces
before deleting `vn->sock` as the patch does.

Signed-off-by: Zang MingJie <zealot0630@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
d711a39
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Russ Anderson mm: zone_end_pfn is too small
Booting with 32 TBytes memory hits BUG at mm/page_alloc.c:552! (output
below).

The key hint is "page 4294967296 outside zone".
4294967296 = 0x100000000 (bit 32 is set).

The problem is in include/linux/mmzone.h:

  530 static inline unsigned zone_end_pfn(const struct zone *zone)
  531 {
  532         return zone->zone_start_pfn + zone->spanned_pages;
  533 }

zone_end_pfn is "unsigned" (32 bits).  Changing it to "unsigned long"
(64 bits) fixes the problem.

zone_end_pfn() was added recently in commit 108bcc9 ("mm: add & use
zone_end_pfn() and zone_spans_pfn()")

Output from the failure.

  No AGP bridge found
  page 4294967296 outside zone [ 4294967296 - 4327469056 ]
  ------------[ cut here ]------------
  kernel BUG at mm/page_alloc.c:552!
  invalid opcode: 0000 [#1] SMP
  Modules linked in:
  CPU 0
  Pid: 0, comm: swapper Not tainted 3.9.0-rc2.dtp+ #10
  RIP: free_one_page+0x382/0x430
  Process swapper (pid: 0, threadinfo ffffffff81942000, task ffffffff81955420)
  Call Trace:
    __free_pages_ok+0x96/0xb0
    __free_pages+0x25/0x50
    __free_pages_bootmem+0x8a/0x8c
    __free_memory_core+0xea/0x131
    free_low_memory_core_early+0x4a/0x98
    free_all_bootmem+0x45/0x47
    mem_init+0x7b/0x14c
    start_kernel+0x216/0x433
    x86_64_start_reservations+0x2a/0x2c
    x86_64_start_kernel+0x144/0x153
  Code: 89 f1 ba 01 00 00 00 31 f6 d3 e2 4c 89 ef e8 66 a4 01 00 e9 2c fe ff ff 0f 0b eb fe 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 eb f3 <0f> 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 0f 0b eb fe 49

Signed-off-by: Russ Anderson <rja@sgi.com>
Reported-by: George Beshers <gbeshers@sgi.com>
Acked-by: Hedi Berriche <hedi@sgi.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
f9228b2
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Eric Dumazet ip_tunnel: fix kernel panic with icmp_dest_unreach
Daniel Petre reported crashes in icmp_dst_unreach() with following call
graph:

#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596

Daniel found a similar problem mentioned in
 http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html

And indeed this is the root cause : skb->cb[] contains data fooling IP
stack.

We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure()
is called. Or else skb->cb[] might contain garbage from GSO segmentation
layer.

A similar fix was tested on linux-3.9, but gre code was refactored in
linux-3.10. I'll send patches for stable kernels as well.

Many thanks to Daniel for providing reports, patches and testing !

Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a622260
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Michael Wang cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()
Jiri Kosina <jkosina@suse.cz> and Borislav Petkov <bp@alien8.de>
reported the warning:

[   51.616759] ------------[ cut here ]------------
[   51.621460] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
[   51.629638] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
[   51.675581] CPU: 0 PID: 244 Comm: kworker/1:1 Tainted: G        W    3.10.0-rc1+ #10
[   51.683407] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   51.690901] Workqueue: events od_dbs_timer
[   51.695069]  0000000000000009 ffff88043a2f5b68 ffffffff8161441c ffff88043a2f5ba8
[   51.702602]  ffffffff8103e540 0000000000000033 0000000000000001 ffff88043d5f8000
[   51.710136]  00000000ffff0ce1 0000000000000001 ffff88044fc4fc08 ffff88043a2f5bb8
[   51.717691] Call Trace:
[   51.720191]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
[   51.725396]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   51.731473]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   51.737378]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
[   51.744013]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
[   51.749745]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
[   51.755214]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
[   51.761470]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
[   51.767724]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
[   51.773719]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
[   51.779271]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
[   51.784734]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   51.790634]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   51.796711]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   51.802350]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   51.808264]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   51.813200]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   51.819644]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
[   51.918165] nouveau E[     DRM] GPU lockup - switching to software fbcon
[   51.930505]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   51.936994] ---[ end trace f419538ada83b5c5 ]---

It was caused by the policy->cpus changed during the process of
__gov_queue_work(), in other word, cpu offline happened.

Use get/put_online_cpus() to prevent the offline from happening while
__gov_queue_work() is running.

[rjw: The problem has been present since recent commit 031299b
(cpufreq: governors: Avoid unnecessary per cpu timer interrupts)]

References: https://lkml.org/lkml/2013/6/5/88
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2f7021a
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Cong Wang bridge: fix some kernel warning in multicast timer
Several people reported the warning: "kernel BUG at kernel/timer.c:729!"
and the stack trace is:

	#7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
	#8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
	#9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
	#10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge]
	#11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge]
	#12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc
	#13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6
	#14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad
	#15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17
	#16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68
	#17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101
	#18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8
	#19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
	#20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
	#21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1
	#22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe
	#23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f
	#24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1
	#25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292

this is due to I forgot to check if mp->timer is armed in
br_multicast_del_pg(). This bug is introduced by
commit 9f00b2e (bridge: only expire the mdb entry
when query is received).

Same for __br_mdb_del().

Tested-by: poma <pomidorabelisima@gmail.com>
Reported-by: LiYonghua <809674045@qq.com>
Reported-by: Robert Hancock <hancockrwd@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c7e8e8a
@popcornmix popcornmix referenced this issue from a commit
Harshula Jayasuriya nfsd: nfsd_open: when dentry_open returns an error do not propagate a…
…s struct file

commit e4daf1f upstream.

The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called it finds
fi_access[O_WRONLY] being non-zero, it decrements it and calls
nfs4_file_put_fd() which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------

Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ce4eb5e
@popcornmix popcornmix referenced this issue from a commit
@jmberg jmberg iwlwifi: mvm: fix flushing not started aggregation sessions
commit b6658ff upstream.

When a not fully started aggregation session is destroyed
and flushed, we get a warning, e.g.

  WARNING: at drivers/net/wireless/iwlwifi/pcie/tx.c:1142 iwl_trans_pcie_txq_disable+0x11c/0x160
  queue 16 not used
  Modules linked in: [...]
  Pid: 5135, comm: hostapd Tainted: G        W  O 3.5.0 #10
  Call Trace:
  wlan0: driver sets block=0 for sta 00:03:7f:10:44:d3
   [<ffffffff81036492>] warn_slowpath_common+0x72/0xa0
   [<ffffffff81036577>] warn_slowpath_fmt+0x47/0x50
   [<ffffffffa0368d6c>] iwl_trans_pcie_txq_disable+0x11c/0x160 [iwlwifi]
   [<ffffffffa03a2099>] iwl_mvm_sta_tx_agg_flush+0xe9/0x150 [iwlmvm]
   [<ffffffffa0396c43>] iwl_mvm_mac_ampdu_action+0xf3/0x1e0 [iwlmvm]
   [<ffffffffa0293ad3>] ___ieee80211_stop_tx_ba_session+0x193/0x920 [mac80211]
   [<ffffffffa0294ed8>] __ieee80211_stop_tx_ba_session+0x48/0x70 [mac80211]
   [<ffffffffa029159f>] ieee80211_sta_tear_down_BA_sessions+0x4f/0x80 [mac80211]
   [<ffffffffa028a686>] __sta_info_destroy+0x66/0x370 [mac80211]
   [<ffffffffa028abb4>] sta_info_destroy_addr_bss+0x44/0x70 [mac80211]
   [<ffffffffa02a3e26>] ieee80211_del_station+0x26/0x50 [mac80211]
   [<ffffffffa01e6395>] nl80211_del_station+0x85/0x200 [cfg80211]

when a station deauthenticated from us without fully setting
up the aggregation session.

Fix this by checking the aggregation state before removing
the hardware queue.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fae60f7
@popcornmix popcornmix referenced this issue from a commit
Harshula Jayasuriya nfsd: nfsd_open: when dentry_open returns an error do not propagate a…
…s struct file

The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called it finds
fi_access[O_WRONLY] being non-zero, it decrements it and calls
nfs4_file_put_fd() which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------

Cc: stable@vger.kernel.org
Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
e4daf1f
@popcornmix popcornmix referenced this issue from a commit
@jmberg jmberg iwlwifi: mvm: fix flushing not started aggregation sessions
When a not fully started aggregation session is destroyed
and flushed, we get a warning, e.g.

  WARNING: at drivers/net/wireless/iwlwifi/pcie/tx.c:1142 iwl_trans_pcie_txq_disable+0x11c/0x160
  queue 16 not used
  Modules linked in: [...]
  Pid: 5135, comm: hostapd Tainted: G        W  O 3.5.0 #10
  Call Trace:
  wlan0: driver sets block=0 for sta 00:03:7f:10:44:d3
   [<ffffffff81036492>] warn_slowpath_common+0x72/0xa0
   [<ffffffff81036577>] warn_slowpath_fmt+0x47/0x50
   [<ffffffffa0368d6c>] iwl_trans_pcie_txq_disable+0x11c/0x160 [iwlwifi]
   [<ffffffffa03a2099>] iwl_mvm_sta_tx_agg_flush+0xe9/0x150 [iwlmvm]
   [<ffffffffa0396c43>] iwl_mvm_mac_ampdu_action+0xf3/0x1e0 [iwlmvm]
   [<ffffffffa0293ad3>] ___ieee80211_stop_tx_ba_session+0x193/0x920 [mac80211]
   [<ffffffffa0294ed8>] __ieee80211_stop_tx_ba_session+0x48/0x70 [mac80211]
   [<ffffffffa029159f>] ieee80211_sta_tear_down_BA_sessions+0x4f/0x80 [mac80211]
   [<ffffffffa028a686>] __sta_info_destroy+0x66/0x370 [mac80211]
   [<ffffffffa028abb4>] sta_info_destroy_addr_bss+0x44/0x70 [mac80211]
   [<ffffffffa02a3e26>] ieee80211_del_station+0x26/0x50 [mac80211]
   [<ffffffffa01e6395>] nl80211_del_station+0x85/0x200 [cfg80211]

when a station deauthenticated from us without fully setting
up the aggregation session.

Fix this by checking the aggregation state before removing
the hardware queue.

Cc: stable@vger.kernel.org
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
b6658ff
@popcornmix popcornmix referenced this issue from a commit
Russ Anderson drivers/base/memory.c: fix show_mem_removable() to handle missing sec…
…tions

"cat /sys/devices/system/memory/memory*/removable" crashed the system.

The problem is that show_mem_removable() is passing a
bad pfn to is_mem_section_removable(), which causes

    if (!node_online(page_to_nid(page)))

to blow up.  Why is it passing in a bad pfn?

The reason is that show_mem_removable() will loop sections_per_block
times.  sections_per_block is 16, but mem->section_count is 8,
indicating holes in this memory block.  Checking that the memory section
is present before checking to see if the memory section is removable
fixes the problem.

   harp5-sys:~ # cat /sys/devices/system/memory/memory*/removable
   0
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   BUG: unable to handle kernel paging request at ffffea00c3200000
   IP: [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
   PGD 83ffd4067 PUD 37bdfce067 PMD 0
   Oops: 0000 [#1] SMP
   Modules linked in: autofs4 binfmt_misc rdma_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core ib_mthca ib_mad ib_core fuse nls_iso8859_1 nls_cp437 vfat fat joydev loop hid_generic usbhid hid hwperf(O) numatools(O) dm_mod iTCO_wdt ipv6 iTCO_vendor_support igb i2c_i801 ioatdma i2c_algo_bit ehci_pci pcspkr lpc_ich i2c_core ehci_hcd ptp sg mfd_core dca rtc_cmos pps_core mperf button xhci_hcd sd_mod crc_t10dif usbcore usb_common scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh gru(O) xvma(O) xfs crc32c libcrc32c thermal sata_nv processor piix mptsas mptscsih scsi_transport_sas mptbase megaraid_sas fan thermal_sys hwmon ext3 jbd ata_piix ahci libahci libata scsi_mod
   CPU: 4 PID: 5991 Comm: cat Tainted: G           O 3.11.0-rc5-rja-uv+ #10
   Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
   task: ffff88081f034580 ti: ffff880820022000 task.ti: ffff880820022000
   RIP: 0010:[<ffffffff81117ed1>]  [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
   RSP: 0018:ffff880820023df8  EFLAGS: 00010287
   RAX: 0000000000040000 RBX: ffffea00c3200000 RCX: 0000000000000004
   RDX: ffffea00c30b0000 RSI: 00000000001c0000 RDI: ffffea00c3200000
   RBP: ffff880820023e38 R08: 0000000000000000 R09: 0000000000000001
   R10: 0000000000000000 R11: 0000000000000001 R12: ffffea00c33c0000
   R13: 0000160000000000 R14: 6db6db6db6db6db7 R15: 0000000000000001
   FS:  00007ffff7fb2700(0000) GS:ffff88083fc80000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: ffffea00c3200000 CR3: 000000081b954000 CR4: 00000000000407e0
   Call Trace:
     show_mem_removable+0x41/0x70
     dev_attr_show+0x2a/0x60
     sysfs_read_file+0xf7/0x1c0
     vfs_read+0xc8/0x130
     SyS_read+0x5d/0xa0
     system_call_fastpath+0x16/0x1b

Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
21ea9f5
@popcornmix popcornmix referenced this issue from a commit
Russ Anderson drivers/base/memory.c: fix show_mem_removable() to handle missing sec…
…tions

commit 21ea9f5 upstream.

"cat /sys/devices/system/memory/memory*/removable" crashed the system.

The problem is that show_mem_removable() is passing a
bad pfn to is_mem_section_removable(), which causes

    if (!node_online(page_to_nid(page)))

to blow up.  Why is it passing in a bad pfn?

The reason is that show_mem_removable() will loop sections_per_block
times.  sections_per_block is 16, but mem->section_count is 8,
indicating holes in this memory block.  Checking that the memory section
is present before checking to see if the memory section is removable
fixes the problem.

   harp5-sys:~ # cat /sys/devices/system/memory/memory*/removable
   0
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   BUG: unable to handle kernel paging request at ffffea00c3200000
   IP: [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
   PGD 83ffd4067 PUD 37bdfce067 PMD 0
   Oops: 0000 [#1] SMP
   Modules linked in: autofs4 binfmt_misc rdma_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core ib_mthca ib_mad ib_core fuse nls_iso8859_1 nls_cp437 vfat fat joydev loop hid_generic usbhid hid hwperf(O) numatools(O) dm_mod iTCO_wdt ipv6 iTCO_vendor_support igb i2c_i801 ioatdma i2c_algo_bit ehci_pci pcspkr lpc_ich i2c_core ehci_hcd ptp sg mfd_core dca rtc_cmos pps_core mperf button xhci_hcd sd_mod crc_t10dif usbcore usb_common scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh gru(O) xvma(O) xfs crc32c libcrc32c thermal sata_nv processor piix mptsas mptscsih scsi_transport_sas mptbase megaraid_sas fan thermal_sys hwmon ext3 jbd ata_piix ahci libahci libata scsi_mod
   CPU: 4 PID: 5991 Comm: cat Tainted: G           O 3.11.0-rc5-rja-uv+ #10
   Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
   task: ffff88081f034580 ti: ffff880820022000 task.ti: ffff880820022000
   RIP: 0010:[<ffffffff81117ed1>]  [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
   RSP: 0018:ffff880820023df8  EFLAGS: 00010287
   RAX: 0000000000040000 RBX: ffffea00c3200000 RCX: 0000000000000004
   RDX: ffffea00c30b0000 RSI: 00000000001c0000 RDI: ffffea00c3200000
   RBP: ffff880820023e38 R08: 0000000000000000 R09: 0000000000000001
   R10: 0000000000000000 R11: 0000000000000001 R12: ffffea00c33c0000
   R13: 0000160000000000 R14: 6db6db6db6db6db7 R15: 0000000000000001
   FS:  00007ffff7fb2700(0000) GS:ffff88083fc80000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: ffffea00c3200000 CR3: 000000081b954000 CR4: 00000000000407e0
   Call Trace:
     show_mem_removable+0x41/0x70
     dev_attr_show+0x2a/0x60
     sysfs_read_file+0xf7/0x1c0
     vfs_read+0xc8/0x130
     SyS_read+0x5d/0xa0
     system_call_fastpath+0x16/0x1b

Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ea35d7d
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Jim Foraker IPoIB: Fix race in deleting ipoib_neigh entries
In several places, this snippet is used when removing neigh entries:

	list_del(&neigh->list);
	ipoib_neigh_free(neigh);

The list_del() removes neigh from the associated struct ipoib_path, while
ipoib_neigh_free() removes neigh from the device's neigh entry lookup
table.  Both of these operations are protected by the priv->lock
spinlock.  The table however is also protected via RCU, and so naturally
the lock is not held when doing reads.

This leads to a race condition, in which a thread may successfully look
up a neigh entry that has already been deleted from neigh->list.  Since
the previous deletion will have marked the entry with poison, a second
list_del() on the object will cause a panic:

  #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
     [exception RIP: list_del+16]
     RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
     RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
     RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
     RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
     R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
     R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
  #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
  #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
  #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca

We move the list_del() into ipoib_neigh_free(), so that deletion happens
only once, after the entry has been successfully removed from the lookup
table.  This same behavior is already used in ipoib_del_neighs_by_gid()
and __ipoib_reap_neigh().

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
49b8e74
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Russ Anderson drivers/base/memory.c: fix show_mem_removable() to handle missing sec…
…tions

"cat /sys/devices/system/memory/memory*/removable" crashed the system.

The problem is that show_mem_removable() is passing a
bad pfn to is_mem_section_removable(), which causes
if (!node_online(page_to_nid(page))) to blow up.
Why is it passing in a bad pfn?

show_mem_removable() will loop sections_per_block times.
sections_per_block is 16, but mem->section_count is 8,
indicating holes in this memory block.  Checking that
the memory section is present before checking to see
if the memory section is removable fixes the problem.

harp5-sys:~ # cat /sys/devices/system/memory/memory*/removable
0
1
1
1
1
1
1
1
1
1
1
1
1
1
[  372.111178] BUG: unable to handle kernel paging request at ffffea00c3200000
[  372.119230] IP: [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
[  372.127022] PGD 83ffd4067 PUD 37bdfce067 PMD 0
[  372.132109] Oops: 0000 [#1] SMP
[  372.135730] Modules linked in: autofs4 binfmt_misc rdma_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core ib_mthca ib_mad ib_core fuse nls_iso8859_1 nls_cp437 vfat fat joydev loop hid_generic usbhid hid hwperf(O) numatools(O) dm_mod iTCO_wdt ipv6 iTCO_vendor_support igb i2c_i801 ioatdma i2c_algo_bit ehci_pci pcspkr lpc_ich i2c_core ehci_hcd ptp sg mfd_core dca rtc_cmos pps_core mperf button xhci_hcd sd_mod crc_t10dif usbcore usb_common scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh gru(O) xvma(O) xfs crc32c libcrc32c thermal sata_nv processor piix mptsas mptscsih scsi_transport_sas mptbase megaraid_sas fan thermal_sys hwmon ext3 jbd ata_piix ahci libahci libata scsi_mod
[  372.213536] CPU: 4 PID: 5991 Comm: cat Tainted: G           O 3.11.0-rc5-rja-uv+ #10
[  372.222173] Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
[  372.231391] task: ffff88081f034580 ti: ffff880820022000 task.ti: ffff880820022000
[  372.239737] RIP: 0010:[<ffffffff81117ed1>]  [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
[  372.250229] RSP: 0018:ffff880820023df8  EFLAGS: 00010287
[  372.256151] RAX: 0000000000040000 RBX: ffffea00c3200000 RCX: 0000000000000004
[  372.264111] RDX: ffffea00c30b0000 RSI: 00000000001c0000 RDI: ffffea00c3200000
[  372.272071] RBP: ffff880820023e38 R08: 0000000000000000 R09: 0000000000000001
[  372.280030] R10: 0000000000000000 R11: 0000000000000001 R12: ffffea00c33c0000
[  372.287987] R13: 0000160000000000 R14: 6db6db6db6db6db7 R15: 0000000000000001
[  372.295945] FS:  00007ffff7fb2700(0000) GS:ffff88083fc80000(0000) knlGS:0000000000000000
[  372.304970] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  372.311378] CR2: ffffea00c3200000 CR3: 000000081b954000 CR4: 00000000000407e0
[  372.319335] Stack:
[  372.321575]  ffff880820023e38 ffffffff81161e94 ffffffff81d9e940 0000000000000009
[  372.329872]  0000000000000000 ffff8817bb97b800 ffff88081e928000 ffff8817bb97b870
[  372.338167]  ffff880820023e68 ffffffff813730d1 fffffffffffffffb ffffffff81a97600
[  372.346463] Call Trace:
[  372.349201]  [<ffffffff81161e94>] ? is_mem_section_removable+0x84/0x110
[  372.356579]  [<ffffffff813730d1>] show_mem_removable+0x41/0x70
[  372.363094]  [<ffffffff8135be8a>] dev_attr_show+0x2a/0x60
[  372.369122]  [<ffffffff811e1817>] sysfs_read_file+0xf7/0x1c0
[  372.375441]  [<ffffffff8116e7e8>] vfs_read+0xc8/0x130
[  372.381076]  [<ffffffff8116ee5d>] SyS_read+0x5d/0xa0
[  372.386624]  [<ffffffff814bfa12>] system_call_fastpath+0x16/0x1b
[  372.393313] Code: 01 00 00 00 e9 3c ff ff ff 90 0f b6 4a 30 44 89 d8 d3 e0 89 c1 83 e9 01 48 63 c9 49 01 c8 eb 92 66 2e 0f 1f 84 00 00 00 00 00 55 <48> 8b 0f 49 89 f8 48 89 e5 48 89 ca 48 c1 ea 36 0f a3 15 d8 2f
[  372.415032] RIP  [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
[  372.422905]  RSP <ffff880820023df8>
[  372.426792] CR2: ffffea00c3200000

Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2c2ba38
@popcornmix popcornmix referenced this issue from a commit
Libin x86/smpboot: Fix announce_cpu() to printk() the last "OK" properly
When booting secondary CPUs, announce_cpu() is called to show which cpu has
been brought up. For example:

[    0.402751] smpboot: Booting Node   0, Processors  #1 #2 #3 #4 #5 OK
[    0.525667] smpboot: Booting Node   1, Processors  #6 #7 #8 #9 #10 #11 OK
[    0.755592] smpboot: Booting Node   0, Processors  #12 #13 #14 #15 #16 #17 OK
[    0.890495] smpboot: Booting Node   1, Processors  #18 #19 #20 #21 #22 #23

But the last "OK" is lost, because 'nr_cpu_ids-1' represents the maximum
possible cpu id. It should use the maximum present cpu id in case not all
CPUs booted up.

Signed-off-by: Libin <huawei.libin@huawei.com>
Cc: <guohanjun@huawei.com>
Cc: <wangyijing@huawei.com>
Cc: <fenghua.yu@intel.com>
Cc: <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1378378676-18276-1-git-send-email-huawei.libin@huawei.com
[ tweaked the changelog, removed unnecessary line break, tweaked the format to align the fields vertically. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5223948
@popcornmix popcornmix referenced this issue from a commit
Adrian Hunter perf annotate: Fix objdump line parsing offset validation
When parsing lines from objdump a line containing source code starting
with a numeric label is mistaken for a line of disassembly starting with
a memory address.

Current validation fails to recognise that the "memory address" is out
of range and calculates an invalid offset which later causes this
segfault:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50)
    at util/annotate.c:631
631				hits += h->addr[offset++];
(gdb) bt
 #0  0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50)
    at util/annotate.c:631
 #1  0x00000000004d65e3 in annotate_browser__calc_percent (browser=0x7fffffffd130, evsel=0xa01da0) at ui/browsers/annotate.c:364
 #2  0x00000000004d7433 in annotate_browser__run (browser=0x7fffffffd130, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:672
 #3  0x00000000004d80c9 in symbol__tui_annotate (sym=0xc989a0, map=0xa02660, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:962
 #4  0x00000000004d7aa0 in hist_entry__tui_annotate (he=0xdf73f0, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:823
 #5  0x00000000004dd648 in perf_evsel__hists_browse (evsel=0xa01da0, nr_events=1, helpline=
    0x58b768 "For a higher level overview, try: perf report --sort comm,dso", ev_name=0xa02cd0 "cycles", left_exits=false, hbt=
    0x0, min_pcnt=0, env=0xa011e0) at ui/browsers/hists.c:1659
 #6  0x00000000004de372 in perf_evlist__tui_browse_hists (evlist=0xa01520, help=
    0x58b768 "For a higher level overview, try: perf report --sort comm,dso", hbt=0x0, min_pcnt=0, env=0xa011e0)
    at ui/browsers/hists.c:1950
 #7  0x000000000042cf6b in __cmd_report (rep=0x7fffffffd6c0) at builtin-report.c:581
 #8  0x000000000042e25d in cmd_report (argc=0, argv=0x7fffffffe4b0, prefix=0x0) at builtin-report.c:965
 #9  0x000000000041a0e1 in run_builtin (p=0x801548, argc=1, argv=0x7fffffffe4b0) at perf.c:319
 #10 0x000000000041a319 in handle_internal_command (argc=1, argv=0x7fffffffe4b0) at perf.c:376
 #11 0x000000000041a465 in run_argv (argcp=0x7fffffffe38c, argv=0x7fffffffe380) at perf.c:420
 #12 0x000000000041a707 in main (argc=1, argv=0x7fffffffe4b0) at perf.c:521

After the fix is applied the symbol can be annotated showing the
problematic line "1:      rep"

copy_user_generic_string  /usr/lib/debug/lib/modules/3.9.10-100.fc17.x86_64/vmlinux
             */
            ENTRY(copy_user_generic_string)
                    CFI_STARTPROC
                    ASM_STAC
                    andl %edx,%edx
              and    %edx,%edx
                    jz 4f
              je     37
                    cmpl $8,%edx
              cmp    $0x8,%edx
                    jb 2f           /* less than 8 bytes, go to byte copy loop */
              jb     33
                    ALIGN_DESTINATION
              mov    %edi,%ecx
              and    $0x7,%ecx
              je     28
              sub    $0x8,%ecx
              neg    %ecx
              sub    %ecx,%edx
        1a:   mov    (%rsi),%al
              mov    %al,(%rdi)
              inc    %rsi
              inc    %rdi
              dec    %ecx
              jne    1a
                    movl %edx,%ecx
        28:   mov    %edx,%ecx
                    shrl $3,%ecx
              shr    $0x3,%ecx
                    andl $7,%edx
              and    $0x7,%edx
            1:      rep
100.00        rep    movsq %ds:(%rsi),%es:(%rdi)
                    movsq
            2:      movl %edx,%ecx
        33:   mov    %edx,%ecx
            3:      rep
              rep    movsb %ds:(%rsi),%es:(%rdi)
                    movsb
            4:      xorl %eax,%eax
        37:   xor    %eax,%eax
              data32 xchg %ax,%ax
                    ASM_CLAC
                    ret
              retq

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1379009721-27667-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
886b37b
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Borislav Petkov x86: Improve the printout of the SMP bootup CPU table
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
646e29a
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Borislav Petkov x86/boot: Further compress CPUs bootup message
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
a17bce4
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@jmberg jmberg iwlwifi: mvm: fix locking in iwl_mvm_bt_rssi_event()
This will deadlock due to commit 9f34783863bea806
("iwlwifi: mvm: Implement BT coex notifications"):

  =============================================
  [ INFO: possible recursive locking detected ]
  3.5.0 #10 Tainted: G        W  O
  ---------------------------------------------
  kworker/2:1/5214 is trying to acquire lock:
   (&mvm->mutex){+.+.+.}, at: [<ffffffffa03be23e>] iwl_mvm_bt_rssi_event+0x5e/0x210 [iwlmvm]

  but task is already holding lock:
   (&mvm->mutex){+.+.+.}, at: [<ffffffffa03ab2d9>] iwl_mvm_async_handlers_wk+0x49/0x120 [iwlmvm]

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&mvm->mutex);
    lock(&mvm->mutex);

   *** DEADLOCK ***

Change-Id: I9104f252b34676e2f7ffcd51166f95367e08a4d9
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-on: https://gerrit.rds.intel.com/21887
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Tested-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Conflicts:
	drivers/net/wireless/iwlwifi/mvm/bt-coex.c
3dd1cd2
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@rostedt rostedt tracing: Fix potential out-of-bounds in trace_get_user()
Andrey reported the following report:

ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
Accessed by thread T13003:
  #0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
  #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
  #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
  #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
  #4 ffffffff812a1065 (__fput+0x155/0x360)
  #5 ffffffff812a12de (____fput+0x1e/0x30)
  #6 ffffffff8111708d (task_work_run+0x10d/0x140)
  #7 ffffffff810ea043 (do_exit+0x433/0x11f0)
  #8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
  #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
  #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Allocated by thread T5167:
  #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
  #1 ffffffff8128337c (__kmalloc+0xbc/0x500)
  #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
  #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
  #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
  #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
  #6 ffffffff8129b668 (finish_open+0x68/0xa0)
  #7 ffffffff812b66ac (do_last+0xb8c/0x1710)
  #8 ffffffff812b7350 (path_openat+0x120/0xb50)
  #9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
  #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
  #11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
  #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Shadow bytes around the buggy address:
  ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
  ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap redzone:          fa
  Heap kmalloc redzone:  fb
  Freed heap region:     fd
  Shadow gap:            fe

The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'

Although the crash happened in ftrace_regex_open() the real bug
occurred in trace_get_user() where there's an incrementation to
parser->idx without a check against the size. The way it is triggered
is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there is no more characters. Then the last character is read to determine
what to do next, and the index is incremented without checking size.

Then the caller of trace_get_user() usually nulls out the last character
with a zero, but since the index is equal to the size, it writes a nul
character after the allocated space, which can corrupt memory.

Luckily, only root user has write access to this file.

Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
057db84
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@torvalds torvalds Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/lin…
…ux/kernel/git/tip/tip

Pull x86 boot changes from Ingo Molnar:
 "Two changes that prettify and compactify the SMP bootup output from:

     smpboot: Booting Node   0, Processors  #1 #2 #3 OK
     smpboot: Booting Node   1, Processors  #4 #5 #6 #7 OK
     smpboot: Booting Node   2, Processors  #8 #9 #10 #11 OK
     smpboot: Booting Node   3, Processors  #12 #13 #14 #15 OK
     Brought up 16 CPUs

  to something like:

     x86: Booting SMP configuration:
     .... node  #0, CPUs:        #1  #2  #3
     .... node  #1, CPUs:    #4  #5  #6  #7
     .... node  #2, CPUs:    #8  #9 #10 #11
     .... node  #3, CPUs:   #12 #13 #14 #15
     x86: Booted up 4 nodes, 16 CPUs"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Further compress CPUs bootup message
  x86: Improve the printout of the SMP bootup CPU table
014d595
@popcornmix popcornmix referenced this issue from a commit
@rostedt rostedt tracing: Fix potential out-of-bounds in trace_get_user()
commit 057db84 upstream.

Andrey reported the following report:

ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
Accessed by thread T13003:
  #0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
  #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
  #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
  #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
  #4 ffffffff812a1065 (__fput+0x155/0x360)
  #5 ffffffff812a12de (____fput+0x1e/0x30)
  #6 ffffffff8111708d (task_work_run+0x10d/0x140)
  #7 ffffffff810ea043 (do_exit+0x433/0x11f0)
  #8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
  #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
  #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Allocated by thread T5167:
  #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
  #1 ffffffff8128337c (__kmalloc+0xbc/0x500)
  #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
  #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
  #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
  #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
  #6 ffffffff8129b668 (finish_open+0x68/0xa0)
  #7 ffffffff812b66ac (do_last+0xb8c/0x1710)
  #8 ffffffff812b7350 (path_openat+0x120/0xb50)
  #9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
  #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
  #11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
  #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Shadow bytes around the buggy address:
  ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
  ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap redzone:          fa
  Heap kmalloc redzone:  fb
  Freed heap region:     fd
  Shadow gap:            fe

The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'

Although the crash happened in ftrace_regex_open() the real bug
occurred in trace_get_user() where there's an incrementation to
parser->idx without a check against the size. The way it is triggered
is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there is no more characters. Then the last character is read to determine
what to do next, and the index is incremented without checking size.

Then the caller of trace_get_user() usually nulls out the last character
with a zero, but since the index is equal to the size, it writes a nul
character after the allocated space, which can corrupt memory.

Luckily, only root user has write access to this file.

Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
a80d0c3
@popcornmix popcornmix referenced this issue from a commit
@rostedt rostedt tracing: Fix potential out-of-bounds in trace_get_user()
commit 057db84 upstream.

Andrey reported the following report:

ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
Accessed by thread T13003:
  #0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
  #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
  #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
  #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
  #4 ffffffff812a1065 (__fput+0x155/0x360)
  #5 ffffffff812a12de (____fput+0x1e/0x30)
  #6 ffffffff8111708d (task_work_run+0x10d/0x140)
  #7 ffffffff810ea043 (do_exit+0x433/0x11f0)
  #8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
  #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
  #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Allocated by thread T5167:
  #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
  #1 ffffffff8128337c (__kmalloc+0xbc/0x500)
  #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
  #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
  #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
  #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
  #6 ffffffff8129b668 (finish_open+0x68/0xa0)
  #7 ffffffff812b66ac (do_last+0xb8c/0x1710)
  #8 ffffffff812b7350 (path_openat+0x120/0xb50)
  #9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
  #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
  #11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
  #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Shadow bytes around the buggy address:
  ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
  ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap redzone:          fa
  Heap kmalloc redzone:  fb
  Freed heap region:     fd
  Shadow gap:            fe

The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'

Although the crash happened in ftrace_regex_open() the real bug
occurred in trace_get_user() where there's an incrementation to
parser->idx without a check against the size. The way it is triggered
is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there is no more characters. Then the last character is read to determine
what to do next, and the index is incremented without checking size.

Then the caller of trace_get_user() usually nulls out the last character
with a zero, but since the index is equal to the size, it writes a nul
character after the allocated space, which can corrupt memory.

Luckily, only root user has write access to this file.

Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
33e3df4
@davet321 davet321 referenced this issue from a commit in davet321/rpi-linux
@rostedt rostedt tracing: Fix potential out-of-bounds in trace_get_user()
commit 057db84 upstream.

Andrey reported the following report:

ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
Accessed by thread T13003:
  #0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
  #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
  #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
  #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
  #4 ffffffff812a1065 (__fput+0x155/0x360)
  #5 ffffffff812a12de (____fput+0x1e/0x30)
  #6 ffffffff8111708d (task_work_run+0x10d/0x140)
  #7 ffffffff810ea043 (do_exit+0x433/0x11f0)
  #8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
  #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
  #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Allocated by thread T5167:
  #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
  #1 ffffffff8128337c (__kmalloc+0xbc/0x500)
  #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
  #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
  #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
  #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
  #6 ffffffff8129b668 (finish_open+0x68/0xa0)
  #7 ffffffff812b66ac (do_last+0xb8c/0x1710)
  #8 ffffffff812b7350 (path_openat+0x120/0xb50)
  #9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
  #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
  #11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
  #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Shadow bytes around the buggy address:
  ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
  ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap redzone:          fa
  Heap kmalloc redzone:  fb
  Freed heap region:     fd
  Shadow gap:            fe

The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'

Although the crash happened in ftrace_regex_open() the real bug
occurred in trace_get_user() where there's an incrementation to
parser->idx without a check against the size. The way it is triggered
is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there is no more characters. Then the last character is read to determine
what to do next, and the index is incremented without checking size.

Then the caller of trace_get_user() usually nulls out the last character
with a zero, but since the index is equal to the size, it writes a nul
character after the allocated space, which can corrupt memory.

Luckily, only root user has write access to this file.

Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8430063
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@fabioestevam fabioestevam ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS cal…
…culation

Currently mx53 (CortexA8) running at 1GHz reports:
Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice. (1.5 clock/loop)

The original object code looks like this:

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr

0000003c <__loop_delay>:
  3c:	e2500001 	subs	r0, r0, #1
  40:	8afffffe 	bhi	3c <__loop_delay>
  44:	e1a0f00e 	mov	pc, lr

After adding the 'align 3' directive to __loop_delay (align to 8 bytes):

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr
  3c:	e320f000 	nop	{0}

00000040 <__loop_delay>:
  40:	e2500001 	subs	r0, r0, #1
  44:	8afffffe 	bhi	40 <__loop_delay>
  48:	e1a0f00e 	mov	pc, lr
  4c:	e320f000 	nop	{0}

, which now reports:
Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (CortexA9) and on mx27 (ARM926), which shows the same
BogoMIPS value before and after this patch.

Reported-by: Tom Evans <tom_usenet@optusnet.com.au>
Suggested-by: Tom Evans <tom_usenet@optusnet.com.au>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11d4bb1
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@davem330 davem330 Merge branch 'dev_get_by_index'
Ying Xue says:

====================
use appropriate APIs to get interfaces

Under rtnl_lock protection, we should use __dev_get_name/index()
rather than dev_get_name()/index() to find interface handlers
because the former interfaces can help us avoid to change interface
reference counter.

v2 changes:
 - Change return value of nl80211_set_wiphy() to 0 in patch #10
   by johannes's suggestion.
 - Add 'Acked-by' into several patches which were acknowledged by
   corresponding maintainers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
c4ba999
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@azat azat ext4: initialize multi-block allocator before checking block descriptors
With EXT4FS_DEBUG ext4_count_free_clusters() will call
ext4_read_block_bitmap() without s_group_info initialized, so we need to
initialize multi-block allocator before.

And we can't initialize multi-block allocator without group descriptors,
since it use them.
Also we need to install s_op before initializing multi-block allocator,
because in ext4_mb_init_backend() new inode is created.

Here is bt:
(gdb) bt
 #0  ext4_get_group_info (group=0, sb=0xffff880079a10000) at ext4.h:2430
 #1  ext4_validate_block_bitmap (sb=sb@entry=0xffff880079a10000, desc=desc@entry=0xffff880056510000, block_group=block_group@entry=0,
     bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:358
 #2  0xffffffff81232202 in ext4_wait_block_bitmap (sb=sb@entry=0xffff880079a10000, block_group=block_group@entry=0,
     bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:476
 #3  0xffffffff81232eaf in ext4_read_block_bitmap (sb=sb@entry=0xffff880079a10000, block_group=block_group@entry=0) at balloc.c:489
 #4  0xffffffff81232fc0 in ext4_count_free_clusters (sb=sb@entry=0xffff880079a10000) at balloc.c:665
 #5  0xffffffff81259ffa in ext4_check_descriptors (first_not_zeroed=<synthetic pointer>, sb=0xffff880079a10000) at super.c:2143
 #6  ext4_fill_super (sb=sb@entry=0xffff880079a10000, data=<optimized out>, data@entry=0x0 <irq_stack_union>, silent=silent@entry=0)
     at super.c:3851
 #7  0xffffffff811b8340 in mount_bdev (fs_type=<optimized out>, flags=0, dev_name=<optimized out>, data=0x0 <irq_stack_union>,
     fill_super=fill_super@entry=0xffffffff812589c0 <ext4_fill_super>) at super.c:987
 #8  0xffffffff8124ec35 in ext4_mount (fs_type=<optimized out>, flags=<optimized out>, dev_name=<optimized out>, data=<optimized out>)
     at super.c:5365
 #9  0xffffffff811b8cf9 in mount_fs (type=type@entry=0xffffffff81c71840 <ext4_fs_type>, flags=flags@entry=0,
     name=name@entry=0xffff880077a80c70 "/dev/loop4", data=data@entry=0x0 <irq_stack_union>) at super.c:1090
 #10 0xffffffff811d2ff3 in vfs_kern_mount (type=type@entry=0xffffffff81c71840 <ext4_fs_type>, flags=0,
     name=name@entry=0xffff880077a80c70 "/dev/loop4", data=data@entry=0x0 <irq_stack_union>) at namespace.c:813
 #11 0xffffffff811d55de in do_new_mount (data=0x0 <irq_stack_union>, name=0xffff880077a80c70 "/dev/loop4", mnt_flags=32,
     flags=<optimized out>, fstype=0xffff880077a80ca0 "ext4-insane", path=0xffff88007a5b1ed0) at namespace.c:2068
 #12 do_mount (dev_name=0xffff880077a80c70 "/dev/loop4", dir_name=<optimized out>, type_page=0xffff880077a80ca0 "ext4-insane",
     flags=<optimized out>, flags@entry=3236757504, data_page=0x0 <irq_stack_union>) at namespace.c:2392
 #13 0xffffffff811d6183 in SYSC_mount (data=0x0 <irq_stack_union>, flags=3236757504, type=<optimized out>, dir_name=<optimized out>,
     dev_name=0x7ffad9649c20 "/dev/loop4") at namespace.c:2586
 #14 SyS_mount (dev_name=140715365800992, dir_name=<optimized out>, type=<optimized out>, flags=3236757504, data=0) at namespace.c:2559

Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
52d249e
@popcornmix popcornmix referenced this issue from a commit
@snitm snitm block: fix q->flush_rq NULL pointer crash on dm-mpath flush
Commit 1874198 ("blk-mq: rework flush sequencing logic") switched
->flush_rq from being an embedded member of the request_queue structure
to being dynamically allocated in blk_init_queue_node().

Request-based DM multipath doesn't use blk_init_queue_node(), instead it
uses blk_alloc_queue_node() + blk_init_allocated_queue().  Because
commit 1874198 placed the dynamic allocation of ->flush_rq in
blk_init_queue_node() any flush issued to a dm-mpath device would crash
with a NULL pointer, e.g.:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8125037e>] blk_rq_init+0x1e/0xb0
PGD bb3c7067 PUD bb01d067 PMD 0
Oops: 0002 [#1] SMP
...
CPU: 5 PID: 5028 Comm: dt Tainted: G        W  O 3.14.0-rc3.snitm+ #10
...
task: ffff88032fb270e0 ti: ffff880079564000 task.ti: ffff880079564000
RIP: 0010:[<ffffffff8125037e>]  [<ffffffff8125037e>] blk_rq_init+0x1e/0xb0
RSP: 0018:ffff880079565c98  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000030
RDX: ffff880260c74048 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880079565ca8 R08: ffff880260aa1e98 R09: 0000000000000001
R10: ffff88032fa78500 R11: 0000000000000246 R12: 0000000000000000
R13: ffff880260aa1de8 R14: 0000000000000650 R15: 0000000000000000
FS:  00007f8d36a2a700(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000079b36000 CR4: 00000000000007e0
Stack:
 0000000000000000 ffff880260c74048 ffff880079565cd8 ffffffff81257a47
 ffff880260aa1de8 ffff880260c74048 0000000000000001 0000000000000000
 ffff880079565d08 ffffffff81257c2d 0000000000000000 ffff880260aa1de8
Call Trace:
 [<ffffffff81257a47>] blk_flush_complete_seq+0x2d7/0x2e0
 [<ffffffff81257c2d>] blk_insert_flush+0x1dd/0x210
 [<ffffffff8124ec59>] __elv_add_request+0x1f9/0x320
 [<ffffffff81250681>] ? blk_account_io_start+0x111/0x190
 [<ffffffff81253a4b>] blk_queue_bio+0x25b/0x330
 [<ffffffffa0020bf5>] dm_request+0x35/0x40 [dm_mod]
 [<ffffffff812530c0>] generic_make_request+0xc0/0x100
 [<ffffffff81253173>] submit_bio+0x73/0x140
 [<ffffffff811becdd>] submit_bio_wait+0x5d/0x80
 [<ffffffff81257528>] blkdev_issue_flush+0x78/0xa0
 [<ffffffff811c1f6f>] blkdev_fsync+0x3f/0x60
 [<ffffffff811b7fde>] vfs_fsync_range+0x1e/0x20
 [<ffffffff811b7ffc>] vfs_fsync+0x1c/0x20
 [<ffffffff811b81f1>] do_fsync+0x41/0x80
 [<ffffffff8118874e>] ? SyS_lseek+0x7e/0x80
 [<ffffffff811b8260>] SyS_fsync+0x10/0x20
 [<ffffffff8154c2d2>] system_call_fastpath+0x16/0x1b

Fix this by moving the ->flush_rq allocation from blk_init_queue_node()
to blk_init_allocated_queue().  blk_init_queue_node() also calls
blk_init_allocated_queue() so this change is functionality equivalent
for all blk_init_queue_node() callers.

Reported-by: Hannes Reinecke <hare@suse.de>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7982e90
@popcornmix popcornmix referenced this issue from a commit
Heiko Carstens s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
smp_stop_cpu() should stop the current cpu even for !CONFIG_SMP.
Otherwise machine_halt() will return and and the machine generates a
panic instread of simply stopping the current cpu:

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000

CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.0-01527-g2b6ef16a6bc5 #10
[...]
Call Trace:
([<0000000000110db0>] show_trace+0xf8/0x158)
 [<0000000000110e7a>] show_stack+0x6a/0xe8
 [<000000000074dba8>] panic+0xe4/0x268
 [<0000000000140570>] do_exit+0xa88/0xb2c
 [<000000000016e12c>] SyS_reboot+0x1f0/0x234
 [<000000000075da70>] sysc_nr_ok+0x22/0x28
 [<000000007d5a09b4>] 0x7d5a09b4

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
e7c46c6
@popcornmix popcornmix referenced this issue from a commit
Tejun Heo cgroup_freezer: replace freezer->lock with freezer_mutex
After 96d365e ("cgroup: make css_set_lock a rwsem and rename it
to css_set_rwsem"), css task iterators requires sleepable context as
it may block on css_set_rwsem.  I missed that cgroup_freezer was
iterating tasks under IRQ-safe spinlock freezer->lock.  This leads to
errors like the following on freezer state reads and transitions.

  BUG: sleeping function called from invalid context at /work
 /os/work/kernel/locking/rwsem.c:20
  in_atomic(): 0, irqs_disabled(): 0, pid: 462, name: bash
  5 locks held by bash/462:
   #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff811f0843>] vfs_write+0x1a3/0x1c0
   #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8126d78b>] kernfs_fop_write+0xbb/0x170
   #2:  (s_active#70){.+.+.+}, at: [<ffffffff8126d793>] kernfs_fop_write+0xc3/0x170
   #3:  (freezer_mutex){+.+...}, at: [<ffffffff81135981>] freezer_write+0x61/0x1e0
   #4:  (rcu_read_lock){......}, at: [<ffffffff81135973>] freezer_write+0x53/0x1e0
  Preemption disabled at:[<ffffffff81104404>] console_unlock+0x1e4/0x460

  CPU: 3 PID: 462 Comm: bash Not tainted 3.15.0-rc1-work+ #10
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   ffff88000916a6d0 ffff88000e0a3da0 ffffffff81cf8c96 0000000000000000
   ffff88000e0a3dc8 ffffffff810cf4f2 ffffffff82388040 ffff880013aaf740
   0000000000000002 ffff88000e0a3de8 ffffffff81d05974 0000000000000246
  Call Trace:
   [<ffffffff81cf8c96>] dump_stack+0x4e/0x7a
   [<ffffffff810cf4f2>] __might_sleep+0x162/0x260
   [<ffffffff81d05974>] down_read+0x24/0x60
   [<ffffffff81133e87>] css_task_iter_start+0x27/0x70
   [<ffffffff8113584d>] freezer_apply_state+0x5d/0x130
   [<ffffffff81135a16>] freezer_write+0xf6/0x1e0
   [<ffffffff8112eb88>] cgroup_file_write+0xd8/0x230
   [<ffffffff8126d7b7>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f0756>] vfs_write+0xb6/0x1c0
   [<ffffffff811f121d>] SyS_write+0x4d/0xc0
   [<ffffffff81d08292>] system_call_fastpath+0x16/0x1b

freezer->lock used to be used in hot paths but that time is long gone
and there's no reason for the lock to be IRQ-safe spinlock or even
per-cgroup.  In fact, given the fact that a cgroup may contain large
number of tasks, it's not a good idea to iterate over them while
holding IRQ-safe spinlock.

Let's simplify locking by replacing per-cgroup freezer->lock with
global freezer_mutex.  This also makes the comments explaining the
intricacies of policy inheritance and the locking around it as the
states are protected by a common mutex.

The conversion is mostly straight-forward.  The followings are worth
mentioning.

* freezer_css_online() no longer needs double locking.

* freezer_attach() now performs propagation simply while holding
  freezer_mutex.  update_if_frozen() race no longer exists and the
  comment is removed.

* freezer_fork() now tests whether the task is in root cgroup using
  the new task_css_is_root() without doing rcu_read_lock/unlock().  If
  not, it grabs freezer_mutex and performs the operation.

* freezer_read() and freezer_change_state() grab freezer_mutex across
  the whole operation and pin the css while iterating so that each
  descendant processing happens in sleepable context.

Fixes: 96d365e ("cgroup: make css_set_lock a rwsem and rename it to css_set_rwsem")
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
e5ced8e
@popcornmix popcornmix referenced this issue from a commit
@iovisor iovisor net: filter: additional BPF tests
All tests should pass with and without JIT.

Example output:
test_bpf: #0 TAX 35 16 16 PASS
test_bpf: #1 TXA 7 7 7 PASS
test_bpf: #2 ADD_SUB_MUL_K 10 PASS
test_bpf: #3 DIV_KX 33 PASS
test_bpf: #4 AND_OR_LSH_K 10 10 PASS
test_bpf: #5 LD_IND 8 8 8 PASS
test_bpf: #6 LD_ABS 8 8 8 PASS
test_bpf: #7 LD_ABS_LL 13 14 PASS
test_bpf: #8 LD_IND_LL 12 12 12 PASS
test_bpf: #9 LD_ABS_NET 10 12 PASS
test_bpf: #10 LD_IND_NET 11 12 12 PASS
...

Numbers are times in nsec per filter for given input data.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9def624
@popcornmix popcornmix referenced this issue from a commit
@pranith pranith torture: Remove __init from torture_init_begin/end
Loading rcutorture as a module (as opposed to building it directly into
the kernel) results in the following splat:

[Wed Apr 16 15:29:33 2014] BUG: unable to handle kernel paging request at ffffffffa0003000
[Wed Apr 16 15:29:33 2014] IP: [<ffffffffa0003000>] 0xffffffffa0003000
[Wed Apr 16 15:29:33 2014] PGD 1c0f067 PUD 1c10063 PMD 378a6067 PTE 0
[Wed Apr 16 15:29:33 2014] Oops: 0010 [#1] SMP
[Wed Apr 16 15:29:33 2014] Modules linked in: rcutorture(+) torture
[Wed Apr 16 15:29:33 2014] CPU: 0 PID: 4257 Comm: modprobe Not tainted 3.15.0-rc1 #10
[Wed Apr 16 15:29:33 2014] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[Wed Apr 16 15:29:33 2014] task: ffff8800db1e88d0 ti: ffff8800db25c000 task.ti: ffff8800db25c000
[Wed Apr 16 15:29:33 2014] RIP: 0010:[<ffffffffa0003000>]  [<ffffffffa0003000>] 0xffffffffa0003000
[Wed Apr 16 15:29:33 2014] RSP: 0018:ffff8800db25dca0  EFLAGS: 00010282
[Wed Apr 16 15:29:33 2014] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[Wed Apr 16 15:29:33 2014] RDX: ffffffffa00090a8 RSI: 0000000000000001 RDI: ffffffffa0008337
[Wed Apr 16 15:29:33 2014] RBP: ffff8800db25dd50 R08: 0000000000000000 R09: 0000000000000000
[Wed Apr 16 15:29:33 2014] R10: ffffea000357b680 R11: ffffffff8113257a R12: ffffffffa000d000
[Wed Apr 16 15:29:33 2014] R13: ffffffffa00094c0 R14: ffffffffa0009510 R15: 0000000000000001
[Wed Apr 16 15:29:33 2014] FS:  00007fee30ce5700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[Wed Apr 16 15:29:33 2014] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[Wed Apr 16 15:29:33 2014] CR2: ffffffffa0003000 CR3: 00000000d5eb1000 CR4: 00000000000006f0
[Wed Apr 16 15:29:33 2014] Stack:
[Wed Apr 16 15:29:33 2014]  ffffffffa000d02c 0000000000000000 ffff88021700d400 0000000000000000
[Wed Apr 16 15:29:33 2014]  ffff8800db25dd40 ffffffff81647951 ffff8802162bd000 ffff88021541846c
[Wed Apr 16 15:29:33 2014]  0000000000000000 ffffffff817dbe2d ffffffff817dbe2d 0000000000000001
[Wed Apr 16 15:29:33 2014] Call Trace:
[Wed Apr 16 15:29:33 2014]  [<ffffffffa000d02c>] ? rcu_torture_init+0x2c/0x8b4 [rcutorture]
[Wed Apr 16 15:29:33 2014]  [<ffffffff81647951>] ? netlink_broadcast_filtered+0x121/0x3a0
[Wed Apr 16 15:29:33 2014]  [<ffffffff817dbe2d>] ? mutex_lock+0xd/0x2a
[Wed Apr 16 15:29:33 2014]  [<ffffffff817dbe2d>] ? mutex_lock+0xd/0x2a
[Wed Apr 16 15:29:33 2014]  [<ffffffff810e7022>] ? trace_module_notify+0x62/0x1d0
[Wed Apr 16 15:29:33 2014]  [<ffffffffa000d000>] ? 0xffffffffa000cfff
[Wed Apr 16 15:29:33 2014]  [<ffffffff8100034a>] do_one_initcall+0xfa/0x140
[Wed Apr 16 15:29:33 2014]  [<ffffffff8106b4ce>] ? __blocking_notifier_call_chain+0x5e/0x80
[Wed Apr 16 15:29:33 2014]  [<ffffffff810b3481>] load_module+0x1931/0x21b0
[Wed Apr 16 15:29:33 2014]  [<ffffffff810b0330>] ? show_initstate+0x50/0x50
[Wed Apr 16 15:29:33 2014]  [<ffffffff810b3d9e>] SyS_init_module+0x9e/0xc0
[Wed Apr 16 15:29:33 2014]  [<ffffffff817e4c22>] system_call_fastpath+0x16/0x1b
[Wed Apr 16 15:29:33 2014] Code:  Bad RIP value.
[Wed Apr 16 15:29:33 2014] RIP  [<ffffffffa0003000>] 0xffffffffa0003000
[Wed Apr 16 15:29:33 2014]  RSP <ffff8800db25dca0>
[Wed Apr 16 15:29:33 2014] CR2: ffffffffa0003000
[Wed Apr 16 15:29:33 2014] ---[ end trace 3e88c173037af84b ]---

This splat is due to the fact that torture_init_begin() and
torture_init_end() are both marked with __init, despite their use
at runtime.  This commit therefore removes __init from both functions.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2b3f8ff
@popcornmix popcornmix referenced this issue from a commit
@ribalda ribalda [media] videobuf2-dma-sg: Fix NULL pointer dereference BUG
vb2_get_vma() copy the content of the vma to a new structure but set
some of its pointers to NULL.

One of this pointer is used by follow_pte() called by follow_pfn()
on io memory.

This can lead to a NULL pointer derreference.

The version of vma that has not been cleared must be used.

[  406.143320] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[  406.143427] IP: [<ffffffff8115204c>] follow_pfn+0x2c/0x70
[  406.143491] PGD 6c3f0067 PUD 6c3ef067 PMD 0
[  406.143546] Oops: 0000 [#1] SMP
[  406.143587] Modules linked in: qtec_mem qt5023_video qtec_testgen qtec_xform videobuf2_core gpio_xilinx videobuf2_vmalloc videobuf2_dma_sg qtec_cmosis videobuf2_memops qtec_pcie qtec_white fglrx(PO) qt5023 spi_xilinx spi_bitbang
[  406.143852] CPU: 0 PID: 299 Comm: tracker Tainted: P           O 3.13.0-qtec-standard #10
[  406.143927] Hardware name: QTechnology QT5022/QT5022, BIOS PM_2.1.0.309 X64 04/04/2013
[  406.144000] task: ffff880085c82d60 ti: ffff880085abe000 task.ti: ffff880085abe000
[  406.144067] RIP: 0010:[<ffffffff8115204c>]  [<ffffffff8115204c>] follow_pfn+0x2c/0x70
[  406.144145] RSP: 0018:ffff880085abf888  EFLAGS: 00010296
[  406.144195] RAX: 0000000000000000 RBX: ffff880085abf8e0 RCX: ffff880085abf888
[  406.144260] RDX: ffff880085abf890 RSI: 00007fc52e173000 RDI: ffff8800863cbe40
[  406.144325] RBP: ffff880085abf8a8 R08: 0000000000000018 R09: ffff8800863cbf00
[  406.144388] R10: ffff880086703b80 R11: 00000000000001e0 R12: 0000000000018000
[  406.144452] R13: 0000000000000000 R14: ffffea0000000000 R15: ffff88015922fea0
[  406.144517] FS:  00007fc536e7c740(0000) GS:ffff88015ec00000(0000) knlGS:0000000000000000
[  406.144591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  406.144644] CR2: 0000000000000040 CR3: 0000000066c9d000 CR4: 00000000000007f0
[  406.144708] Stack:
[  406.144731]  0000000000018000 00007fc52e18b000 0000000000000000 00007fc52e173000
[  406.144813]  ffff880085abf918 ffffffffa083b2fd ffff880085ab1ba8 0000000000000000
[  406.144894]  0000000000000000 0000000100000000 ffff880085abf928 ffff880159a20800
[  406.144976] Call Trace:
[  406.145011]  [<ffffffffa083b2fd>] vb2_dma_sg_get_userptr+0x14d/0x310 [videobuf2_dma_sg]
[  406.145089]  [<ffffffffa08507df>] __qbuf_userptr+0xbf/0x3e0 [videobuf2_core]
[  406.147229]  [<ffffffffa0041454>] ? mc_heap_lock_memory+0x1f4/0x490 [fglrx]
[  406.149234]  [<ffffffff813428f3>] ? cpumask_next_and+0x23/0x50
[  406.151223]  [<ffffffff810b2e38>] ? enqueue_task_fair+0x658/0xde0
[  406.153199]  [<ffffffff81061888>] ? native_smp_send_reschedule+0x48/0x60
[  406.155184]  [<ffffffff815836b9>] ? get_ctrl+0xa9/0xd0
[  406.157161]  [<ffffffff8116f4e4>] ? __kmalloc+0x1a4/0x1b0
[  406.159135]  [<ffffffffa0850b9c>] ? __vb2_queue_alloc+0x9c/0x4a0 [videobuf2_core]
[  406.161130]  [<ffffffffa0852d08>] __buf_prepare+0x1a8/0x210 [videobuf2_core]
[  406.163171]  [<ffffffffa0854c57>] __vb2_qbuf+0x27/0xcc [videobuf2_core]
[  406.165229]  [<ffffffffa0851dfd>] vb2_queue_or_prepare_buf+0x1ed/0x270 [videobuf2_core]
[  406.167325]  [<ffffffffa0854c30>] ? vb2_ioctl_querybuf+0x30/0x30 [videobuf2_core]
[  406.169419]  [<ffffffffa0851e9c>] vb2_qbuf+0x1c/0x20 [videobuf2_core]
[  406.171508]  [<ffffffffa0851ef8>] vb2_ioctl_qbuf+0x58/0x70 [videobuf2_core]
[  406.173604]  [<ffffffff8157d3a8>] v4l_qbuf+0x48/0x60
[  406.175681]  [<ffffffff8157b29c>] __video_do_ioctl+0x2bc/0x340
[  406.177779]  [<ffffffff8116f43c>] ? __kmalloc+0xfc/0x1b0
[  406.179883]  [<ffffffff8157cd0e>] ? video_usercopy+0x7e/0x470
[  406.181961]  [<ffffffff8157ce81>] video_usercopy+0x1f1/0x470
[  406.184021]  [<ffffffff8157afe0>] ? v4l_printk_ioctl+0xb0/0xb0
[  406.186085]  [<ffffffff810ae1ed>] ? account_system_time+0x8d/0x190
[  406.188149]  [<ffffffff8157d115>] video_ioctl2+0x15/0x20
[  406.190216]  [<ffffffff815781b3>] v4l2_ioctl+0x123/0x160
[  406.192251]  [<ffffffff810ce415>] ? rcu_eqs_enter+0x65/0xa0
[  406.194256]  [<ffffffff81186b28>] do_vfs_ioctl+0x88/0x560
[  406.196258]  [<ffffffff810ae145>] ? account_user_time+0x95/0xb0
[  406.198262]  [<ffffffff810ae6a4>] ? vtime_account_user+0x44/0x70
[  406.200215]  [<ffffffff81187091>] SyS_ioctl+0x91/0xb0
[  406.202107]  [<ffffffff817be109>] tracesys+0xd0/0xd5
[  406.203946] Code: 66 66 66 90 48 f7 47 50 00 44 00 00 b8 ea ff ff ff 74 52 55 48 89 e5 53 48 89 d3 48 8d 4d e0 48 8d 55 e8 48 83 ec 18 48 8b 47 40 <48> 8b 78 40 e8 8b fe ff ff 85 c0 75 27 48 8b 55 e8 48 b9 00 f0
[  406.208011] RIP  [<ffffffff8115204c>] follow_pfn+0x2c/0x70
[  406.209908]  RSP <ffff880085abf888>
[  406.211760] CR2: 0000000000000040
[  406.213676] ---[ end trace 996d9f64e6739a04 ]---

Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
227ae22
@popcornmix popcornmix referenced this issue from a commit
@georgecherian georgecherian usb: dwc3: dwc3-omap: Fix the crash on module removal
Following crash is seen on dwc3_omap removal
Unable to handle kernel NULL pointer dereference at virtual address 00000018
pgd = ec098000
[00000018] *pgd=ad1f9831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in: usb_f_ss_lb g_zero usb_f_acm u_serial usb_f_ecm u_ether libcomposite configfs snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_hwdep snd_soc_omap snd_pcm_dmaengine snd_soc_core snd_compress snd_pcm snd_tim]
CPU: 0 PID: 1296 Comm: rmmod Tainted: G        W     3.15.0-rc4-02716-g95c4e18-dirty #10
task: ed05a080 ti: ec368000 task.ti: ec368000
PC is at release_resource+0x14/0x7c
LR is at release_resource+0x10/0x7c
pc : [<c0044724>]    lr : [<c0044720>]    psr: 60000013
sp : ec369ec0  ip : 60000013  fp : 00021008
r10: 00000000  r9 : ec368000  r8 : c000e7a4
r7 : 00000081  r6 : bf0062c0  r5 : ed7cd000  r4 : ed7d85c0
r3 : 00000000  r2 : 00000000  r1 : 00000011  r0 : c086d08c
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: ac098059  DAC: 00000015
Process rmmod (pid: 1296, stack limit = 0xec368248)
Stack: (0xec369ec0 to 0xec36a000)
9ec0: 00000000 00000001 ed7cd000 c034de94 ed7cd010 ed7cd000 00000000 c034e194
9ee0: 00000000 bf0062cc ed7cd010 c03490b0 ed154cc0 ed4c2570 ed2b8410 ed156810
ed156810 bf006d24 c034db9c c034db84 c034c518
9f20: bf006d24 ed156810 bf006d24 c034cd2c bf006d24 bf006d68 00000800 c034c340
9f40: 00000000 c00a9e5c 00000020 00000000 bf006d68 00000800 ec369f4c 33637764
9f60: 616d6f5f 00000070 00000001 ec368000 ed05a080 c000e670 00000001 c0084010
9f80: 00021088 00000800 00021088 00000081 80000010 0000e6f4 00021088 00000800
9fa0: 00021088 c000e5e0 00021088 00000800 000210b8 00000800 e04f6d00 e04f6d00
9fc0: 00021088 00000800 00021088 00000081 00000001 00000000 be91de08 00021008
9fe0: 4d768880 be91dbb4 b6fc5984 4d76888c 80000010 000210b8 00000000 00000000
[<c0044724>] (release_resource) from [<c034de94>] (platform_device_del+0x6c/0x9c)
[<c034de94>] (platform_device_del) from [<c034e194>] (platform_device_unregister+0xc/0x18)
[<c034e194>] (platform_device_unregister) from [<bf0062cc>] (dwc3_omap_remove_core+0xc/0x14 [dwc3_omap])
[<bf0062cc>] (dwc3_omap_remove_core [dwc3_omap]) from [<c03490b0>] (device_for_each_child+0x34/0x74)
[<c03490b0>] (device_for_each_child) from [<bf0062b4>] (dwc3_omap_remove+0x6c/0x78 [dwc3_omap])
[<bf0062b4>] (dwc3_omap_remove [dwc3_omap]) from [<c034db9c>] (platform_drv_remove+0x18/0x1c)
[<c034db9c>] (platform_drv_remove) from [<c034c518>] (__device_release_driver+0x70/0xc8)
[<c034c518>] (__device_release_driver) from [<c034cd2c>] (driver_detach+0xb4/0xb8)
[<c034cd2c>] (driver_detach) from [<c034c340>] (bus_remove_driver+0x4c/0x90)
[<c034c340>] (bus_remove_driver) from [<c00a9e5c>] (SyS_delete_module+0x10c/0x198)
[<c00a9e5c>] (SyS_delete_module) from [<c000e5e0>] (ret_fast_syscall+0x0/0x48)
Code: e1a04000 e59f0068 eb14505e e5943010 (e5932018)
---[ end trace 7e2a8746ff4fc811 ]---
Segmentation fault

[ balbi@ti.com : add CONFIG_OF dependency ]

Signed-off-by: George Cherian <george.cherian@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
c5a1fbc
@popcornmix popcornmix referenced this issue from a commit
Wengang Wang ocfs2: refcount: take rw_lock in ocfs2_reflink
This patch tries to fix this crash:

 #5 [ffff88003c1cd690] do_invalid_op at ffffffff810166d5
 #6 [ffff88003c1cd730] invalid_op at ffffffff8159b2de
    [exception RIP: ocfs2_direct_IO_get_blocks+359]
    RIP: ffffffffa05dfa27  RSP: ffff88003c1cd7e8  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: ffff88003c1cdaa8  RCX: 0000000000000000
    RDX: 000000000000000c  RSI: ffff880027a95000  RDI: ffff88003c79b540
    RBP: ffff88003c1cd858   R8: 0000000000000000   R9: ffffffff815f6ba0
    R10: 00000000000001c9  R11: 00000000000001c9  R12: ffff88002d271500
    R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88003c1cd860] do_direct_IO at ffffffff811cd31b
 #8 [ffff88003c1cd950] direct_IO_iovec at ffffffff811cde9c
 #9 [ffff88003c1cd9b0] do_blockdev_direct_IO at ffffffff811ce764
#10 [ffff88003c1cdb80] __blockdev_direct_IO at ffffffff811ce7cc
#11 [ffff88003c1cdbb0] ocfs2_direct_IO at ffffffffa05df756 [ocfs2]
#12 [ffff88003c1cdbe0] generic_file_direct_write_iter at ffffffff8112f935
#13 [ffff88003c1cdc40] ocfs2_file_write_iter at ffffffffa0600ccc [ocfs2]
#14 [ffff88003c1cdd50] do_aio_write at ffffffff8119126c
#15 [ffff88003c1cddc0] aio_rw_vect_retry at ffffffff811d9bb4
#16 [ffff88003c1cddf0] aio_run_iocb at ffffffff811db880
#17 [ffff88003c1cde30] io_submit_one at ffffffff811dc238
#18 [ffff88003c1cde80] do_io_submit at ffffffff811dc437
#19 [ffff88003c1cdf70] sys_io_submit at ffffffff811dc530
#20 [ffff88003c1cdf80] system_call_fastpath at ffffffff8159a159

It crashes at
        BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED));
in ocfs2_direct_IO_get_blocks.

ocfs2_direct_IO_get_blocks is expecting the OCFS2_EXT_REFCOUNTED be removed in
ocfs2_prepare_inode_for_write() if it was there. But no cluster lock is taken
during the time before (or inside) ocfs2_prepare_inode_for_write() and after
ocfs2_direct_IO_get_blocks().

It can happen in this case:

Node A(which crashes)				Node B
------------------------                 ---------------------------
ocfs2_file_aio_write
  ocfs2_prepare_inode_for_write
    ocfs2_inode_lock
    ...
    ocfs2_inode_unlock
  #no refcount found
....					ocfs2_reflink
                                          ocfs2_inode_lock
                                          ...
                                          ocfs2_inode_unlock
                                          #now, refcount flag set on extent

                                        ...
                                        flush change to disk

ocfs2_direct_IO_get_blocks
  ocfs2_get_clusters
    #extent map miss
    #buffer_head miss
    read extents from disk
  found refcount flag on extent
  crash..

Fix:
Take rw_lock in ocfs2_reflink path

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8a8ad1c
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Dave Hansen x86, sched: Add new topology for multi-NUMA-node CPUs
I'm getting the spew below when booting with Haswell (Xeon
E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled
in the BIOS.  It seems similar to the issue that some folks from
AMD ran in to on their systems and addressed in this commit:

  161270f ("x86/smp: Fix topology checks on AMD MCM CPUs")

Both these Intel and AMD systems break an assumption which is
being enforced by topology_sane(): a socket may not contain more
than one NUMA node.

AMD special-cased their system by looking for a cpuid flag.  The
Intel mode is dependent on BIOS options and I do not know of a
way which it is enumerated other than the tables being parsed
during the CPU bringup process.  In other words, we have to trust
the ACPI tables <shudder>.

This detects the situation where a NUMA node occurs at a place in
the middle of the "CPU" sched domains.  It replaces the default
topology with one that relies on the NUMA information from the
firmware (SRAT table) for all levels of sched domains above the
hyperthreads.

This also fixes a sysfs bug.  We used to freak out when we saw
the "mc" group cross a node boundary, so we stopped building the
MC group.  MC gets exported as the 'core_siblings_list' in
/sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with
the same 'physical_package_id' to not be listed together in
'core_siblings_list'.  This violates a statement from
Documentation/ABI/testing/sysfs-devices-system-cpu:

	core_siblings: internal kernel map of cpu#'s hardware threads
	within the same physical_package_id.

	core_siblings_list: human-readable list of the logical CPU
	numbers within the same physical_package_id as cpu#.

The sysfs effects here cause an issue with the hwloc tool where
it gets confused and thinks there are more sockets than are
physically present.

Before this patch, there are two packages:

# cd /sys/devices/system/cpu/
# cat cpu*/topology/physical_package_id | sort | uniq -c
     18 0
     18 1

But 4 _sets_ of core siblings:

# cat cpu*/topology/core_siblings_list | sort | uniq -c
      9 0-8
      9 18-26
      9 27-35
      9 9-17

After this set, there are only 2 sets of core siblings, which
is what we expect for a 2-socket system.

# cat cpu*/topology/physical_package_id | sort | uniq -c
     18 0
     18 1
# cat cpu*/topology/core_siblings_list | sort | uniq -c
     18 0-17
     18 18-35

Example spew:
...
	NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
	 #2  #3  #4  #5  #6  #7  #8
	.... node  #1, CPUs:    #9
	------------[ cut here ]------------
	WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90()
	sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
	Modules linked in:
	CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty #631
	Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014
	0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48
	ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009
	ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98
	Call Trace:
	[<ffffffff8172e485>] dump_stack+0x45/0x56
	[<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0
	[<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50
	[<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90
	[<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0
	[<ffffffff8107568d>] start_secondary+0x1ad/0x240
	---[ end trace 3fe5f587a9fcde61 ]---
	#10 #11 #12 #13 #14 #15 #16 #17
	.... node  #2, CPUs:   #18 #19 #20 #21 #22 #23 #24 #25 #26
	.... node  #3, CPUs:   #27 #28 #29 #30 #31 #32 #33 #34 #35

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
[ Added LLC domain and s/match_mc/match_die/ ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brice.goglin@gmail.com
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cebf15e
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@davem330 davem330 Merge branch 'bpf-next'
Alexei Starovoitov says:

====================
eBPF syscall, verifier, testsuite

v14 -> v15:
- got rid of macros with hidden control flow (suggested by David)
  replaced macro with explicit goto or return and simplified
  where possible (affected patches #9 and #10)
- rebased, retested

v13 -> v14:
- small change to 1st patch to ease 'new userspace with old kernel'
  problem (done similar to perf_copy_attr()) (suggested by Daniel)
- the rest unchanged

v12 -> v13:
- replaced 'foo __user *' pointers with __aligned_u64 (suggested by David)
- added __attribute__((aligned(8)) to 'union bpf_attr' to keep
  constant alignment between patches
- updated manpage and syscall wrappers due to __aligned_u64
- rebased, retested on x64 with 32-bit and 64-bit userspace and on i386,
  build tested on arm32,sparc64

v11 -> v12:
- dropped patch 11 and copied few macros to libbpf.h (suggested by Daniel)
- replaced 'enum bpf_prog_type' with u32 to be safe in compat (.. Andy)
- implemented and tested compat support (not part of this set) (.. Daniel)
- changed 'void *log_buf' to 'char *' (.. Daniel)
- combined struct bpf_work_struct and bpf_prog_info (.. Daniel)
- added better return value explanation to manpage (.. Andy)
- added log_buf/log_size explanation to manpage (.. Andy & Daniel)
- added a lot more info about prog_type and map_type to manpage (.. Andy)
- rebased, tweaked test_stubs

Patches 1-4 establish BPF syscall shell for maps and programs.
Patches 5-10 add verifier step by step
Patch 11 adds test stubs for 'unspec' program type and verifier testsuite
  from user space

Note that patches 1,3,4,7 add commands and attributes to the syscall
while being backwards compatible from each other, which should demonstrate
how other commands can be added in the future.

After this set the programs can be loaded for testing only. They cannot
be attached to any events. Though manpage talks about tracing and sockets,
it will be a subject of future patches.

Please take a look at manpage:

BPF(2)                     Linux Programmer's Manual                    BPF(2)

NAME
       bpf - perform a command on eBPF map or program

SYNOPSIS
       #include <linux/bpf.h>

       int bpf(int cmd, union bpf_attr *attr, unsigned int size);

DESCRIPTION
       bpf()  syscall  is a multiplexor for a range of different operations on
       eBPF  which  can  be  characterized  as  "universal  in-kernel  virtual
       machine".  eBPF  is  similar  to  original  Berkeley  Packet Filter (or
       "classic BPF") used to filter network packets. Both statically  analyze
       the  programs  before  loading  them  into  the  kernel  to ensure that
       programs cannot harm the running system.

       eBPF extends classic BPF in multiple ways including ability to call in-
       kernel  helper  functions  and  access shared data structures like eBPF
       maps.  The programs can be written in a restricted C that  is  compiled
       into  eBPF  bytecode  and executed on the eBPF virtual machine or JITed
       into native instruction set.

   eBPF Design/Architecture
       eBPF maps is a generic storage of different types.   User  process  can
       create  multiple  maps  (with key/value being opaque bytes of data) and
       access them via file descriptor. In parallel eBPF programs  can  access
       maps  from inside the kernel.  It's up to user process and eBPF program
       to decide what they store inside maps.

       eBPF programs are similar to kernel modules. They  are  loaded  by  the
       user  process  and automatically unloaded when process exits. Each eBPF
       program is a safe run-to-completion set of instructions. eBPF  verifier
       statically  determines  that  the  program  terminates  and  is safe to
       execute. During verification the program takes a hold of maps  that  it
       intends to use, so selected maps cannot be removed until the program is
       unloaded. The program can be attached to different events. These events
       can  be packets, tracepoint events and other types in the future. A new
       event triggers execution of the program  which  may  store  information
       about the event in the maps.  Beyond storing data the programs may call
       into in-kernel helper functions which may, for example, dump stack,  do
       trace_printk  or other forms of live kernel debugging. The same program
       can be attached to multiple events. Different programs can  access  the
       same map:
         tracepoint  tracepoint  tracepoint    sk_buff    sk_buff
          event A     event B     event C      on eth0    on eth1
           |             |          |            |          |
           |             |          |            |          |
           --> tracing <--      tracing       socket      socket
                prog_1           prog_2       prog_3      prog_4
                |  |               |            |
             |---  -----|  |-------|           map_3
           map_1       map_2

   Syscall Arguments
       bpf()  syscall  operation  is determined by cmd which can be one of the
       following:

       BPF_MAP_CREATE
              Create a map with given type and attributes and return map FD

       BPF_MAP_LOOKUP_ELEM
              Lookup element by key in a given map and return its value

       BPF_MAP_UPDATE_ELEM
              Create or update element (key/value pair) in a given map

       BPF_MAP_DELETE_ELEM
              Lookup and delete element by key in a given map

       BPF_MAP_GET_NEXT_KEY
              Lookup element by key in a given map  and  return  key  of  next
              element

       BPF_PROG_LOAD
              Verify and load eBPF program

       attr   is a pointer to a union of type bpf_attr as defined below.

       size   is the size of the union.

       union bpf_attr {
           struct { /* anonymous struct used by BPF_MAP_CREATE command */
               __u32             map_type;
               __u32             key_size;    /* size of key in bytes */
               __u32             value_size;  /* size of value in bytes */
               __u32             max_entries; /* max number of entries in a map */
           };

           struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
               __u32             map_fd;
               __aligned_u64     key;
               union {
                   __aligned_u64 value;
                   __aligned_u64 next_key;
               };
           };

           struct { /* anonymous struct used by BPF_PROG_LOAD command */
               __u32         prog_type;
               __u32         insn_cnt;
               __aligned_u64 insns;     /* 'const struct bpf_insn *' */
               __aligned_u64 license;   /* 'const char *' */
               __u32         log_level; /* verbosity level of eBPF verifier */
               __u32         log_size;  /* size of user buffer */
               __aligned_u64 log_buf;   /* user supplied 'char *' buffer */
           };
       } __attribute__((aligned(8)));

   eBPF maps
       maps  is  a generic storage of different types for sharing data between
       kernel and userspace.

       Any map type has the following attributes:
         . type
         . max number of elements
         . key size in bytes
         . value size in bytes

       The following wrapper functions demonstrate how  this  syscall  can  be
       used  to  access the maps. The functions use the cmd argument to invoke
       different operations.

       BPF_MAP_CREATE
              int bpf_create_map(enum bpf_map_type map_type, int key_size,
                                 int value_size, int max_entries)
              {
                  union bpf_attr attr = {
                      .map_type = map_type,
                      .key_size = key_size,
                      .value_size = value_size,
                      .max_entries = max_entries
                  };

                  return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
              }
              bpf()  syscall  creates  a  map  of  map_type  type  and   given
              attributes  key_size,  value_size,  max_entries.   On success it
              returns process-local file descriptor. On error, -1 is  returned
              and errno is set to EINVAL or EPERM or ENOMEM.

              The  attributes key_size and value_size will be used by verifier
              during  program  loading  to  check  that  program  is   calling
              bpf_map_*_elem() helper functions with correctly initialized key
              and  that  program  doesn't  access  map  element  value  beyond
              specified  value_size.   For  example,  when map is created with
              key_size = 8 and program does:
              bpf_map_lookup_elem(map_fd, fp - 4)
              such program will be rejected, since in-kernel  helper  function
              bpf_map_lookup_elem(map_fd,  void  *key) expects to read 8 bytes
              from 'key' pointer, but 'fp - 4' starting address will cause out
              of bounds stack access.

              Similarly,  when  map is created with value_size = 1 and program
              does:
              value = bpf_map_lookup_elem(...);
              *(u32 *)value = 1;
              such program will be rejected, since it accesses  value  pointer
              beyond specified 1 byte value_size limit.

              Currently only hash table map_type is supported:
              enum bpf_map_type {
                 BPF_MAP_TYPE_UNSPEC,
                 BPF_MAP_TYPE_HASH,
              };
              map_type  selects  one  of  the available map implementations in
              kernel. For all map_types eBPF programs  access  maps  with  the
              same      bpf_map_lookup_elem()/bpf_map_update_elem()     helper
              functions.

       BPF_MAP_LOOKUP_ELEM
              int bpf_lookup_elem(int fd, void *key, void *value)
              {
                  union bpf_attr attr = {
                      .map_fd = fd,
                      .key = ptr_to_u64(key),
                      .value = ptr_to_u64(value),
                  };

                  return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
              }
              bpf() syscall looks up an element with given key in  a  map  fd.
              If  element  is found it returns zero and stores element's value
              into value.  If element is not found  it  returns  -1  and  sets
              errno to ENOENT.

       BPF_MAP_UPDATE_ELEM
              int bpf_update_elem(int fd, void *key, void *value)
              {
                  union bpf_attr attr = {
                      .map_fd = fd,
                      .key = ptr_to_u64(key),
                      .value = ptr_to_u64(value),
                  };

                  return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
              }
              The  call  creates  or updates element with given key/value in a
              map fd.  On success it returns zero.  On error, -1  is  returned
              and  errno  is set to EINVAL or EPERM or ENOMEM or E2BIG.  E2BIG
              indicates that number of elements in the map reached max_entries
              limit specified at map creation time.

       BPF_MAP_DELETE_ELEM
              int bpf_delete_elem(int fd, void *key)
              {
                  union bpf_attr attr = {
                      .map_fd = fd,
                      .key = ptr_to_u64(key),
                  };

                  return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
              }
              The call deletes an element in a map fd with given key.  Returns
              zero on success. If element is not found it returns -1 and  sets
              errno to ENOENT.

       BPF_MAP_GET_NEXT_KEY
              int bpf_get_next_key(int fd, void *key, void *next_key)
              {
                  union bpf_attr attr = {
                      .map_fd = fd,
                      .key = ptr_to_u64(key),
                      .next_key = ptr_to_u64(next_key),
                  };

                  return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
              }
              The  call  looks  up  an  element  by  key in a given map fd and
              returns key of the next element into next_key pointer. If key is
              not  found,  it return zero and returns key of the first element
              into next_key. If key is the last element,  it  returns  -1  and
              sets  errno  to  ENOENT. Other possible errno values are ENOMEM,
              EFAULT, EPERM, EINVAL.  This method can be used to iterate  over
              all elements of the map.

       close(map_fd)
              will  delete  the  map  map_fd.  Exiting process will delete all
              maps automatically.

   eBPF programs
       BPF_PROG_LOAD
              This cmd is used to load eBPF program into the kernel.

              char bpf_log_buf[LOG_BUF_SIZE];

              int bpf_prog_load(enum bpf_prog_type prog_type,
                                const struct bpf_insn *insns, int insn_cnt,
                                const char *license)
              {
                  union bpf_attr attr = {
                      .prog_type = prog_type,
                      .insns = ptr_to_u64(insns),
                      .insn_cnt = insn_cnt,
                      .license = ptr_to_u64(license),
                      .log_buf = ptr_to_u64(bpf_log_buf),
                      .log_size = LOG_BUF_SIZE,
                      .log_level = 1,
                  };

                  return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
              }
              prog_type is one of the available program types:
              enum bpf_prog_type {
                      BPF_PROG_TYPE_UNSPEC,
                      BPF_PROG_TYPE_SOCKET,
                      BPF_PROG_TYPE_TRACING,
              };
              By picking prog_type program author  selects  a  set  of  helper
              functions callable from eBPF program and corresponding format of
              struct bpf_context (which is  the  data  blob  passed  into  the
              program  as  the  first  argument).   For  example, the programs
              loaded with  prog_type  =  TYPE_TRACING  may  call  bpf_printk()
              helper,  whereas  TYPE_SOCKET  programs  may  not.   The  set of
              functions  available  to  the  programs  under  given  type  may
              increase in the future.

              Currently the set of functions for TYPE_TRACING is:
              bpf_map_lookup_elem(map_fd, void *key)              // lookup key in a map_fd
              bpf_map_update_elem(map_fd, void *key, void *value) // update key/value
              bpf_map_delete_elem(map_fd, void *key)              // delete key in a map_fd
              bpf_ktime_get_ns(void)                              // returns current ktime
              bpf_printk(char *fmt, int fmt_size, ...)            // prints into trace buffer
              bpf_memcmp(void *ptr1, void *ptr2, int size)        // non-faulting memcmp
              bpf_fetch_ptr(void *ptr)    // non-faulting load pointer from any address
              bpf_fetch_u8(void *ptr)     // non-faulting 1 byte load
              bpf_fetch_u16(void *ptr)    // other non-faulting loads
              bpf_fetch_u32(void *ptr)
              bpf_fetch_u64(void *ptr)

              and bpf_context is defined as:
              struct bpf_context {
                  /* argN fields match one to one to arguments passed to trace events */
                  u64 arg1, arg2, arg3, arg4, arg5, arg6;
                  /* return value from kretprobe event or from syscall_exit event */
                  u64 ret;
              };

              The set of helper functions for TYPE_SOCKET is TBD.

              More   program   types   may   be  added  in  the  future.  Like
              BPF_PROG_TYPE_USER_TRACING for unprivileged programs.

              BPF_PROG_TYPE_UNSPEC is used for  testing  only.  Such  programs
              cannot be attached to events.

              insns array of "struct bpf_insn" instructions

              insn_cnt number of instructions in the program

              license  license  string,  which  must be GPL compatible to call
              helper functions marked gpl_only

              log_buf user supplied buffer that in-kernel verifier is using to
              store  verification  log. Log is a multi-line string that should
              be used by program author to understand  how  verifier  came  to
              conclusion  that program is unsafe. The format of the output can
              change at any time as verifier evolves.

              log_size size of user buffer. If size of the buffer is not large
              enough  to store all verifier messages, -1 is returned and errno
              is set to ENOSPC.

              log_level verbosity level of eBPF verifier, where zero means  no
              logs provided

       close(prog_fd)
              will unload eBPF program

       The  maps  are  accesible  from  programs  and  generally  tie  the two
       together.  Programs process various events  (like  tracepoint,  kprobe,
       packets)  and  store  the  data into maps. User space fetches data from
       maps.  Either the same or a different map may be used by user space  as
       configuration space to alter program behavior on the fly.

   Events
       Once an eBPF program is loaded, it can be attached to an event. Various
       kernel subsystems have different ways to do so. For example:

       setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd));
       will attach the program prog_fd to socket sock which  was  received  by
       prior call to socket().

       ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
       will  attach  the  program  prog_fd  to  perf  event event_fd which was
       received by prior call to perf_event_open().

       Another way to attach the program to a tracing event is:
       event_fd = open("/sys/kernel/debug/tracing/events/skb/kfree_skb/filter");
       write(event_fd, "bpf-123"); /* where 123 is eBPF program FD */
       /* here program is attached and will be triggered by events */
       close(event_fd); /* to detach from event */

EXAMPLES
       /* eBPF+sockets example:
        * 1. create map with maximum of 2 elements
        * 2. set map[6] = 0 and map[17] = 0
        * 3. load eBPF program that counts number of TCP and UDP packets received
        *    via map[skb->ip->proto]++
        * 4. attach prog_fd to raw socket via setsockopt()
        * 5. print number of received TCP/UDP packets every second
        */
       int main(int ac, char **av)
       {
           int sock, map_fd, prog_fd, key;
           long long value = 0, tcp_cnt, udp_cnt;

           map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), 2);
           if (map_fd < 0) {
               printf("failed to create map '%s'\n", strerror(errno));
               /* likely not run as root */
               return 1;
           }

           key = 6; /* ip->proto == tcp */
           assert(bpf_update_elem(map_fd, &key, &value) == 0);

           key = 17; /* ip->proto == udp */
           assert(bpf_update_elem(map_fd, &key, &value) == 0);

           struct bpf_insn prog[] = {
               BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),          /* r6 = r1 */
               BPF_LD_ABS(BPF_B, 14 + 9),                    /* r0 = ip->proto */
               BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),/* *(u32 *)(fp - 4) = r0 */
               BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),         /* r2 = fp */
               BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),        /* r2 = r2 - 4 */
               BPF_LD_MAP_FD(BPF_REG_1, map_fd),             /* r1 = map_fd */
               BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),      /* r0 = map_lookup(r1, r2) */
               BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),        /* if (r0 == 0) goto pc+2 */
               BPF_MOV64_IMM(BPF_REG_1, 1),                  /* r1 = 1 */
               BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* lock *(u64 *)r0 += r1 */
               BPF_MOV64_IMM(BPF_REG_0, 0),                  /* r0 = 0 */
               BPF_EXIT_INSN(),                              /* return r0 */
           };
           prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET, prog, sizeof(prog), "GPL");
           assert(prog_fd >= 0);

           sock = open_raw_sock("lo");

           assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
                             sizeof(prog_fd)) == 0);

           for (;;) {
               key = 6;
               assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
               key = 17;
               assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
               printf("TCP %lld UDP %lld packets0, tcp_cnt, udp_cnt);
               sleep(1);
           }

           return 0;
       }

RETURN VALUE
       For a successful call, the return value depends on the operation:

       BPF_MAP_CREATE
              The new file descriptor associated with eBPF map.

       BPF_PROG_LOAD
              The new file descriptor associated with eBPF program.

       All other commands
              Zero.

       On error, -1 is returned, and errno is set appropriately.

ERRORS
       EPERM  bpf() syscall was made without sufficient privilege (without the
              CAP_SYS_ADMIN capability).

       ENOMEM Cannot allocate sufficient memory.

       EBADF  fd is not an open file descriptor

       EFAULT One  of  the  pointers  (  key or value or log_buf or insns ) is
              outside accessible address space.

       EINVAL The value specified in cmd is not recognized by this kernel.

       EINVAL For BPF_MAP_CREATE, either map_type or attributes are invalid.

       EINVAL For BPF_MAP_*_ELEM  commands,  some  of  the  fields  of  "union
              bpf_attr" unused by this command are not set to zero.

       EINVAL For BPF_PROG_LOAD, attempt to load invalid program (unrecognized
              instruction or uses reserved fields or jumps  out  of  range  or
              loop detected or calls unknown function).

       EACCES For BPF_PROG_LOAD, though program has valid instructions, it was
              rejected, since it was  deemed  unsafe  (may  access  disallowed
              memory   region  or  uninitialized  stack/register  or  function
              constraints don't match actual types or misaligned  access).  In
              such case it is recommended to call bpf() again with log_level =
              1 and examine log_buf for specific reason provided by verifier.

       ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM,  indicates  that
              element with given key was not found.

       E2BIG  program  is  too  large  or a map reached max_entries limit (max
              number of elements).

NOTES
       These commands may be used only by a privileged process (one having the
       CAP_SYS_ADMIN capability).

SEE ALSO
       eBPF    architecture    and    instruction    set   is   explained   in
       Documentation/networking/filter.txt

Linux                             2014-09-16                            BPF(2)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
b4fc1a4
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
Junxiao Bi ocfs2: fix deadlock due to wrong locking order
For commit ocfs2 journal, ocfs2 journal thread will acquire the mutex
osb->journal->j_trans_barrier and wake up jbd2 commit thread, then it
will wait until jbd2 commit thread done. In order journal mode, jbd2
needs flushing dirty data pages first, and this needs get page lock.
So osb->journal->j_trans_barrier should be got before page lock.

But ocfs2_write_zero_page() and ocfs2_write_begin_inline() obey this
locking order, and this will cause deadlock and hung the whole cluster.

One deadlock catched is the following:

PID: 13449  TASK: ffff8802e2f08180  CPU: 31  COMMAND: "oracle"
 #0 [ffff8802ee3f79b0] __schedule at ffffffff8150a524
 #1 [ffff8802ee3f7a58] schedule at ffffffff8150acbf
 #2 [ffff8802ee3f7a68] rwsem_down_failed_common at ffffffff8150cb85
 #3 [ffff8802ee3f7ad8] rwsem_down_read_failed at ffffffff8150cc55
 #4 [ffff8802ee3f7ae8] call_rwsem_down_read_failed at ffffffff812617a4
 #5 [ffff8802ee3f7b50] ocfs2_start_trans at ffffffffa0498919 [ocfs2]
 #6 [ffff8802ee3f7ba0] ocfs2_zero_start_ordered_transaction at ffffffffa048b2b8 [ocfs2]
 #7 [ffff8802ee3f7bf0] ocfs2_write_zero_page at ffffffffa048e9bd [ocfs2]
 #8 [ffff8802ee3f7c80] ocfs2_zero_extend_range at ffffffffa048ec83 [ocfs2]
 #9 [ffff8802ee3f7ce0] ocfs2_zero_extend at ffffffffa048edfd [ocfs2]
 #10 [ffff8802ee3f7d50] ocfs2_extend_file at ffffffffa049079e [ocfs2]
 #11 [ffff8802ee3f7da0] ocfs2_setattr at ffffffffa04910ed [ocfs2]
 #12 [ffff8802ee3f7e70] notify_change at ffffffff81187d29
 #13 [ffff8802ee3f7ee0] do_truncate at ffffffff8116bbc1
 #14 [ffff8802ee3f7f50] sys_ftruncate at ffffffff8116bcbd
 #15 [ffff8802ee3f7f80] system_call_fastpath at ffffffff81515142
    RIP: 00007f8de750c6f7  RSP: 00007fffe786e478  RFLAGS: 00000206
    RAX: 000000000000004d  RBX: ffffffff81515142  RCX: 0000000000000000
    RDX: 0000000000000200  RSI: 0000000000028400  RDI: 000000000000000d
    RBP: 00007fffe786e040   R8: 0000000000000000   R9: 000000000000000d
    R10: 0000000000000000  R11: 0000000000000206  R12: 000000000000000d
    R13: 00007fffe786e710  R14: 00007f8de70f8340  R15: 0000000000028400
    ORIG_RAX: 000000000000004d  CS: 0033  SS: 002b

crash64> bt
PID: 7610   TASK: ffff88100fd56140  CPU: 1   COMMAND: "ocfs2cmt"
 #0 [ffff88100f4d1c50] __schedule at ffffffff8150a524
 #1 [ffff88100f4d1cf8] schedule at ffffffff8150acbf
 #2 [ffff88100f4d1d08] jbd2_log_wait_commit at ffffffffa01274fd [jbd2]
 #3 [ffff88100f4d1d98] jbd2_journal_flush at ffffffffa01280b4 [jbd2]
 #4 [ffff88100f4d1dd8] ocfs2_commit_cache at ffffffffa0499b14 [ocfs2]
 #5 [ffff88100f4d1e38] ocfs2_commit_thread at ffffffffa0499d38 [ocfs2]
 #6 [ffff88100f4d1ee8] kthread at ffffffff81090db6
 #7 [ffff88100f4d1f48] kernel_thread_helper at ffffffff81516284

crash64> bt
PID: 7609   TASK: ffff88100f2d4480  CPU: 0   COMMAND: "jbd2/dm-20-86"
 #0 [ffff88100def3920] __schedule at ffffffff8150a524
 #1 [ffff88100def39c8] schedule at ffffffff8150acbf
 #2 [ffff88100def39d8] io_schedule at ffffffff8150ad6c
 #3 [ffff88100def39f8] sleep_on_page at ffffffff8111069e
 #4 [ffff88100def3a08] __wait_on_bit_lock at ffffffff8150b30a
 #5 [ffff88100def3a58] __lock_page at ffffffff81110687
 #6 [ffff88100def3ab8] write_cache_pages at ffffffff8111b752
 #7 [ffff88100def3be8] generic_writepages at ffffffff8111b901
 #8 [ffff88100def3c48] journal_submit_data_buffers at ffffffffa0120f67 [jbd2]
 #9 [ffff88100def3cf8] jbd2_journal_commit_transaction at ffffffffa0121372[jbd2]
 #10 [ffff88100def3e68] kjournald2 at ffffffffa0127a86 [jbd2]
 #11 [ffff88100def3ee8] kthread at ffffffff81090db6
 #12 [ffff88100def3f48] kernel_thread_helper at ffffffff81516284

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Alex Chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
f775da2
@swarren swarren referenced this issue from a commit in swarren/linux-rpi
@pranith pranith powerpc: Wire up sys_bpf() syscall
This patch wires up the new syscall sys_bpf() on powerpc.

Passes the tests in samples/bpf:

    #0 add+sub+mul OK
    #1 unreachable OK
    #2 unreachable2 OK
    #3 out of range jump OK
    #4 out of range jump2 OK
    #5 test1 ld_imm64 OK
    #6 test2 ld_imm64 OK
    #7 test3 ld_imm64 OK
    #8 test4 ld_imm64 OK
    #9 test5 ld_imm64 OK
    #10 no bpf_exit OK
    #11 loop (back-edge) OK
    #12 loop2 (back-edge) OK
    #13 conditional loop OK
    #14 read uninitialized register OK
    #15 read invalid register OK
    #16 program doesn't init R0 before exit OK
    #17 stack out of bounds OK
    #18 invalid call insn1 OK
    #19 invalid call insn2 OK
    #20 invalid function call OK
    #21 uninitialized stack1 OK
    #22 uninitialized stack2 OK
    #23 check valid spill/fill OK
    #24 check corrupted spill/fill OK
    #25 invalid src register in STX OK
    #26 invalid dst register in STX OK
    #27 invalid dst register in ST OK
    #28 invalid src register in LDX OK
    #29 invalid dst register in LDX OK
    #30 junk insn OK
    #31 junk insn2 OK
    #32 junk insn3 OK
    #33 junk insn4 OK
    #34 junk insn5 OK
    #35 misaligned read from stack OK
    #36 invalid map_fd for function call OK
    #37 don't check return value before access OK
    #38 access memory with incorrect alignment OK
    #39 sometimes access memory with incorrect alignment OK
    #40 jump test 1 OK
    #41 jump test 2 OK
    #42 jump test 3 OK
    #43 jump test 4 OK

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[mpe: test using samples/bpf]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
fcbb539
@popcornmix popcornmix referenced this issue from a commit
Johan Hovold USB: console: fix uninitialised ldisc semaphore
commit d269d44 upstream.

The USB console currently allocates a temporary fake tty which is used
to pass terminal settings to the underlying serial driver.

The tty struct is not fully initialised, something which can lead to a
lockdep warning (or worse) if a serial driver tries to acquire a
line-discipline reference:

	usbserial: USB Serial support registered for pl2303
	pl2303 1-2.1:1.0: pl2303 converter detected
	usb 1-2.1: pl2303 converter now attached to ttyUSB0
	INFO: trying to register non-static key.
	the code is fine but needs lockdep annotation.
	turning off the locking correctness validator.
	CPU: 0 PID: 68 Comm: udevd Tainted: G        W      3.18.0-rc5 #10
	[<c0016f04>] (unwind_backtrace) from [<c0013978>] (show_stack+0x20/0x24)
	[<c0013978>] (show_stack) from [<c0449794>] (dump_stack+0x24/0x28)
	[<c0449794>] (dump_stack) from [<c006f730>] (__lock_acquire+0x1e50/0x2004)
	[<c006f730>] (__lock_acquire) from [<c0070128>] (lock_acquire+0xe4/0x18c)
	[<c0070128>] (lock_acquire) from [<c027c6f8>] (ldsem_down_read_trylock+0x78/0x90)
	[<c027c6f8>] (ldsem_down_read_trylock) from [<c027a1cc>] (tty_ldisc_ref+0x24/0x58)
	[<c027a1cc>] (tty_ldisc_ref) from [<c0340760>] (usb_serial_handle_dcd_change+0x48/0xe8)
	[<c0340760>] (usb_serial_handle_dcd_change) from [<bf000484>] (pl2303_read_int_callback+0x210/0x220 [pl2303])
	[<bf000484>] (pl2303_read_int_callback [pl2303]) from [<c031624c>] (__usb_hcd_giveback_urb+0x80/0x140)
	[<c031624c>] (__usb_hcd_giveback_urb) from [<c0316fc0>] (usb_giveback_urb_bh+0x98/0xd4)
	[<c0316fc0>] (usb_giveback_urb_bh) from [<c0042e44>] (tasklet_hi_action+0x9c/0x108)
	[<c0042e44>] (tasklet_hi_action) from [<c0042380>] (__do_softirq+0x148/0x42c)
	[<c0042380>] (__do_softirq) from [<c00429cc>] (irq_exit+0xd8/0x114)
	[<c00429cc>] (irq_exit) from [<c007ae58>] (__handle_domain_irq+0x84/0xdc)
	[<c007ae58>] (__handle_domain_irq) from [<c000879c>] (omap_intc_handle_irq+0xd8/0xe0)
	[<c000879c>] (omap_intc_handle_irq) from [<c0014544>] (__irq_svc+0x44/0x7c)
	Exception stack(0xdf4e7f08 to 0xdf4e7f50)
	7f00:                   debc0b80 df4e7f5c 00000000 00000000 debc0b80 be8da96c
	7f20: 00000000 00000128 c000fc84 df4e6000 00000000 df4e7f94 00000004 df4e7f50
	7f40: c038ebc0 c038d74c 600f0013 ffffffff
	[<c0014544>] (__irq_svc) from [<c038d74c>] (___sys_sendmsg.part.29+0x0/0x2e0)
	[<c038d74c>] (___sys_sendmsg.part.29) from [<c038ec08>] (SyS_sendmsg+0x18/0x1c)
	[<c038ec08>] (SyS_sendmsg) from [<c000fa00>] (ret_fast_syscall+0x0/0x48)
	console [ttyUSB0] enabled

Fixes: 3669752 ("tty: Replace ldisc locking with ldisc_sem")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
68d91b4
@julianscheel julianscheel referenced this issue from a commit in julianscheel/linux
@storulf storulf PM / Domains: Fix initial default state of the need_restore flag
The initial state of the device's need_restore flag should'nt depend on
the current state of the PM domain. For example it should be perfectly
valid to attach an inactive device to a powered PM domain.

The pm_genpd_dev_need_restore() API allow us to update the need_restore
flag to somewhat cope with such scenarios. Typically that should have
been done from drivers/buses ->probe() since it's those that put the
requirements on the value of the need_restore flag.

Until recently, the Exynos SOCs were the only user of the
pm_genpd_dev_need_restore() API, though invoking it from a centralized
location while adding devices to their PM domains.

Due to that Exynos now have swithed to the generic OF-based PM domain
look-up, it's no longer possible to invoke the API from a centralized
location. The reason is because devices are now added to their PM
domains during the probe sequence.

Commit "ARM: exynos: Move to generic PM domain DT bindings"
did the switch for Exynos to the generic OF-based PM domain look-up,
but it also removed the call to pm_genpd_dev_need_restore(). This
caused a regression for some of the Exynos drivers.

To handle things more properly in the generic PM domain, let's change
the default initial value of the need_restore flag to reflect that the
state is unknown. As soon as some of the runtime PM callbacks gets
invoked, update the initial value accordingly.

Moreover, since the generic PM domain is verifying that all devices
are both runtime PM enabled and suspended, using pm_runtime_suspended()
while pm_genpd_poweroff() is invoked from the scheduled work, we can be
sure of that the PM domain won't be powering off while having active
devices.

Do note that, the generic PM domain can still only know about active
devices which has been activated through invoking its runtime PM resume
callback. In other words, buses/drivers using pm_runtime_set_active()
during ->probe() will still suffer from a race condition, potentially
probing a device without having its PM domain being powered. That issue
will have to be solved using a different approach.

This a log from the boot regression for Exynos5, which is being fixed in
this patch.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 308 at ../drivers/clk/clk.c:851 clk_disable+0x24/0x30()
Modules linked in:
CPU: 0 PID: 308 Comm: kworker/0:1 Not tainted 3.18.0-rc3-00569-gbd9449f-dirty #10
Workqueue: pm pm_runtime_work
[<c0013c64>] (unwind_backtrace) from [<c0010dec>] (show_stack+0x10/0x14)
[<c0010dec>] (show_stack) from [<c03ee4cc>] (dump_stack+0x70/0xbc)
[<c03ee4cc>] (dump_stack) from [<c0020d34>] (warn_slowpath_common+0x64/0x88)
[<c0020d34>] (warn_slowpath_common) from [<c0020d74>] (warn_slowpath_null+0x1c/0x24)
[<c0020d74>] (warn_slowpath_null) from [<c03107b0>] (clk_disable+0x24/0x30)
[<c03107b0>] (clk_disable) from [<c02cc834>] (gsc_runtime_suspend+0x128/0x160)
[<c02cc834>] (gsc_runtime_suspend) from [<c0249024>] (pm_generic_runtime_suspend+0x2c/0x38)
[<c0249024>] (pm_generic_runtime_suspend) from [<c024f44c>] (pm_genpd_default_save_state+0x2c/0x8c)
[<c024f44c>] (pm_genpd_default_save_state) from [<c024ff2c>] (pm_genpd_poweroff+0x224/0x3ec)
[<c024ff2c>] (pm_genpd_poweroff) from [<c02501b4>] (pm_genpd_runtime_suspend+0x9c/0xcc)
[<c02501b4>] (pm_genpd_runtime_suspend) from [<c024a4f8>] (__rpm_callback+0x2c/0x60)
[<c024a4f8>] (__rpm_callback) from [<c024a54c>] (rpm_callback+0x20/0x74)
[<c024a54c>] (rpm_callback) from [<c024a930>] (rpm_suspend+0xd4/0x43c)
[<c024a930>] (rpm_suspend) from [<c024bbcc>] (pm_runtime_work+0x80/0x90)
[<c024bbcc>] (pm_runtime_work) from [<c0032a9c>] (process_one_work+0x12c/0x314)
[<c0032a9c>] (process_one_work) from [<c0032cf4>] (worker_thread+0x3c/0x4b0)
[<c0032cf4>] (worker_thread) from [<c003747c>] (kthread+0xcc/0xe8)
[<c003747c>] (kthread) from [<c000e738>] (ret_from_fork+0x14/0x3c)
---[ end trace 40cd58bcd6988f12 ]---

Fixes: a4a8c2c (ARM: exynos: Move to generic PM domain DT bindings)
Reported-and-tested0by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
67732cd
@nereeb nereeb referenced this issue from a commit in nereeb/linux
Andrea Arcangeli ext4: avoid hangs in ext4_da_should_update_i_disksize()
commit ea51d13 upstream.

If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
d118bc3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.