Watchdog driver #5

Open
Alan-Cox opened this Issue Jan 25, 2012 · 0 comments

Projects

None yet

1 participant

@Alan-Cox
Contributor

This one actually looks fine.

We are moving to a watchdog class so a lot of the code can go eventually but nothing here beyond minor style (pr_ etc) and the already marked incomplete bits

@richo richo pushed a commit to richo/linux that referenced this issue Mar 6, 2012
@ming1 ming1 + Jiri Kosina HID: usbhid: fix dead lock between open and disconect
There is no reason to hold hiddev->existancelock before
calling usb_deregister_dev, so move it out of the lock.

The patch fixes the lockdep warning below.

[ 5733.386271] ======================================================
[ 5733.386274] [ INFO: possible circular locking dependency detected ]
[ 5733.386278] 3.2.0-custom-next-20120111+ #1 Not tainted
[ 5733.386281] -------------------------------------------------------
[ 5733.386284] khubd/186 is trying to acquire lock:
[ 5733.386288]  (minor_rwsem){++++.+}, at: [<ffffffffa0011a04>] usb_deregister_dev+0x37/0x9e [usbcore]
[ 5733.386311]
[ 5733.386312] but task is already holding lock:
[ 5733.386315]  (&hiddev->existancelock){+.+...}, at: [<ffffffffa0094d17>] hiddev_disconnect+0x26/0x87 [usbhid]
[ 5733.386328]
[ 5733.386329] which lock already depends on the new lock.
[ 5733.386330]
[ 5733.386333]
[ 5733.386334] the existing dependency chain (in reverse order) is:
[ 5733.386336]
[ 5733.386337] -> #1 (&hiddev->existancelock){+.+...}:
[ 5733.386346]        [<ffffffff81082d26>] lock_acquire+0xcb/0x10e
[ 5733.386357]        [<ffffffff813df961>] __mutex_lock_common+0x60/0x465
[ 5733.386366]        [<ffffffff813dfe4d>] mutex_lock_nested+0x36/0x3b
[ 5733.386371]        [<ffffffffa0094ad6>] hiddev_open+0x113/0x193 [usbhid]
[ 5733.386378]        [<ffffffffa0011971>] usb_open+0x66/0xc2 [usbcore]
[ 5733.386390]        [<ffffffff8111a8b5>] chrdev_open+0x12b/0x154
[ 5733.386402]        [<ffffffff811159a8>] __dentry_open.isra.16+0x20b/0x355
[ 5733.386408]        [<ffffffff811165dc>] nameidata_to_filp+0x43/0x4a
[ 5733.386413]        [<ffffffff81122ed5>] do_last+0x536/0x570
[ 5733.386419]        [<ffffffff8112300b>] path_openat+0xce/0x301
[ 5733.386423]        [<ffffffff81123327>] do_filp_open+0x33/0x81
[ 5733.386427]        [<ffffffff8111664d>] do_sys_open+0x6a/0xfc
[ 5733.386431]        [<ffffffff811166fb>] sys_open+0x1c/0x1e
[ 5733.386434]        [<ffffffff813e7c79>] system_call_fastpath+0x16/0x1b
[ 5733.386441]
[ 5733.386441] -> #0 (minor_rwsem){++++.+}:
[ 5733.386448]        [<ffffffff8108255d>] __lock_acquire+0xa80/0xd74
[ 5733.386454]        [<ffffffff81082d26>] lock_acquire+0xcb/0x10e
[ 5733.386458]        [<ffffffff813e01f5>] down_write+0x44/0x77
[ 5733.386464]        [<ffffffffa0011a04>] usb_deregister_dev+0x37/0x9e [usbcore]
[ 5733.386475]        [<ffffffffa0094d2d>] hiddev_disconnect+0x3c/0x87 [usbhid]
[ 5733.386483]        [<ffffffff8132df51>] hid_disconnect+0x3f/0x54
[ 5733.386491]        [<ffffffff8132dfb4>] hid_device_remove+0x4e/0x7a
[ 5733.386496]        [<ffffffff812c0957>] __device_release_driver+0x81/0xcd
[ 5733.386502]        [<ffffffff812c09c3>] device_release_driver+0x20/0x2d
[ 5733.386507]        [<ffffffff812c0564>] bus_remove_device+0x114/0x128
[ 5733.386512]        [<ffffffff812bdd6f>] device_del+0x131/0x183
[ 5733.386519]        [<ffffffff8132def3>] hid_destroy_device+0x1e/0x3d
[ 5733.386525]        [<ffffffffa00916b0>] usbhid_disconnect+0x36/0x42 [usbhid]
[ 5733.386530]        [<ffffffffa000fb60>] usb_unbind_interface+0x57/0x11f [usbcore]
[ 5733.386542]        [<ffffffff812c0957>] __device_release_driver+0x81/0xcd
[ 5733.386547]        [<ffffffff812c09c3>] device_release_driver+0x20/0x2d
[ 5733.386552]        [<ffffffff812c0564>] bus_remove_device+0x114/0x128
[ 5733.386557]        [<ffffffff812bdd6f>] device_del+0x131/0x183
[ 5733.386562]        [<ffffffffa000de61>] usb_disable_device+0xa8/0x1d8 [usbcore]
[ 5733.386573]        [<ffffffffa0006bd2>] usb_disconnect+0xab/0x11f [usbcore]
[ 5733.386583]        [<ffffffffa0008aa0>] hub_thread+0x73b/0x1157 [usbcore]
[ 5733.386593]        [<ffffffff8105dc0f>] kthread+0x95/0x9d
[ 5733.386601]        [<ffffffff813e90b4>] kernel_thread_helper+0x4/0x10
[ 5733.386607]
[ 5733.386608] other info that might help us debug this:
[ 5733.386609]
[ 5733.386612]  Possible unsafe locking scenario:
[ 5733.386613]
[ 5733.386615]        CPU0                    CPU1
[ 5733.386618]        ----                    ----
[ 5733.386620]   lock(&hiddev->existancelock);
[ 5733.386625]                                lock(minor_rwsem);
[ 5733.386630]                                lock(&hiddev->existancelock);
[ 5733.386635]   lock(minor_rwsem);
[ 5733.386639]
[ 5733.386640]  *** DEADLOCK ***
[ 5733.386641]
[ 5733.386644] 6 locks held by khubd/186:
[ 5733.386646]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffffa00084af>] hub_thread+0x14a/0x1157 [usbcore]
[ 5733.386661]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffffa0006b77>] usb_disconnect+0x50/0x11f [usbcore]
[ 5733.386677]  #2:  (hcd->bandwidth_mutex){+.+.+.}, at: [<ffffffffa0006bc8>] usb_disconnect+0xa1/0x11f [usbcore]
[ 5733.386693]  #3:  (&__lockdep_no_validate__){......}, at: [<ffffffff812c09bb>] device_release_driver+0x18/0x2d
[ 5733.386704]  #4:  (&__lockdep_no_validate__){......}, at: [<ffffffff812c09bb>] device_release_driver+0x18/0x2d
[ 5733.386714]  #5:  (&hiddev->existancelock){+.+...}, at: [<ffffffffa0094d17>] hiddev_disconnect+0x26/0x87 [usbhid]
[ 5733.386727]
[ 5733.386727] stack backtrace:
[ 5733.386731] Pid: 186, comm: khubd Not tainted 3.2.0-custom-next-20120111+ #1
[ 5733.386734] Call Trace:
[ 5733.386741]  [<ffffffff81062881>] ? up+0x34/0x3b
[ 5733.386747]  [<ffffffff813d9ef3>] print_circular_bug+0x1f8/0x209
[ 5733.386752]  [<ffffffff8108255d>] __lock_acquire+0xa80/0xd74
[ 5733.386756]  [<ffffffff810808b4>] ? trace_hardirqs_on_caller+0x15d/0x1a3
[ 5733.386763]  [<ffffffff81043a3f>] ? vprintk+0x3f4/0x419
[ 5733.386774]  [<ffffffffa0011a04>] ? usb_deregister_dev+0x37/0x9e [usbcore]
[ 5733.386779]  [<ffffffff81082d26>] lock_acquire+0xcb/0x10e
[ 5733.386789]  [<ffffffffa0011a04>] ? usb_deregister_dev+0x37/0x9e [usbcore]
[ 5733.386797]  [<ffffffff813e01f5>] down_write+0x44/0x77
[ 5733.386807]  [<ffffffffa0011a04>] ? usb_deregister_dev+0x37/0x9e [usbcore]
[ 5733.386818]  [<ffffffffa0011a04>] usb_deregister_dev+0x37/0x9e [usbcore]
[ 5733.386825]  [<ffffffffa0094d2d>] hiddev_disconnect+0x3c/0x87 [usbhid]
[ 5733.386830]  [<ffffffff8132df51>] hid_disconnect+0x3f/0x54
[ 5733.386834]  [<ffffffff8132dfb4>] hid_device_remove+0x4e/0x7a
[ 5733.386839]  [<ffffffff812c0957>] __device_release_driver+0x81/0xcd
[ 5733.386844]  [<ffffffff812c09c3>] device_release_driver+0x20/0x2d
[ 5733.386848]  [<ffffffff812c0564>] bus_remove_device+0x114/0x128
[ 5733.386854]  [<ffffffff812bdd6f>] device_del+0x131/0x183
[ 5733.386859]  [<ffffffff8132def3>] hid_destroy_device+0x1e/0x3d
[ 5733.386865]  [<ffffffffa00916b0>] usbhid_disconnect+0x36/0x42 [usbhid]
[ 5733.386876]  [<ffffffffa000fb60>] usb_unbind_interface+0x57/0x11f [usbcore]
[ 5733.386882]  [<ffffffff812c0957>] __device_release_driver+0x81/0xcd
[ 5733.386886]  [<ffffffff812c09c3>] device_release_driver+0x20/0x2d
[ 5733.386890]  [<ffffffff812c0564>] bus_remove_device+0x114/0x128
[ 5733.386895]  [<ffffffff812bdd6f>] device_del+0x131/0x183
[ 5733.386905]  [<ffffffffa000de61>] usb_disable_device+0xa8/0x1d8 [usbcore]
[ 5733.386916]  [<ffffffffa0006bd2>] usb_disconnect+0xab/0x11f [usbcore]
[ 5733.386921]  [<ffffffff813dff82>] ? __mutex_unlock_slowpath+0x130/0x141
[ 5733.386929]  [<ffffffffa0008aa0>] hub_thread+0x73b/0x1157 [usbcore]
[ 5733.386935]  [<ffffffff8106a51d>] ? finish_task_switch+0x78/0x150
[ 5733.386941]  [<ffffffff8105e396>] ? __init_waitqueue_head+0x4c/0x4c
[ 5733.386950]  [<ffffffffa0008365>] ? usb_remote_wakeup+0x56/0x56 [usbcore]
[ 5733.386955]  [<ffffffff8105dc0f>] kthread+0x95/0x9d
[ 5733.386961]  [<ffffffff813e90b4>] kernel_thread_helper+0x4/0x10
[ 5733.386966]  [<ffffffff813e24b8>] ? retint_restore_args+0x13/0x13
[ 5733.386970]  [<ffffffff8105db7a>] ? __init_kthread_worker+0x55/0x55
[ 5733.386974]  [<ffffffff813e90b0>] ? gs_change+0x13/0x13

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
ba18311
@richo richo pushed a commit to richo/linux that referenced this issue Mar 6, 2012
@torvalds Mel Gorman + torvalds mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGE…
…S block during isolation for migration

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff00  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
0bf380b
@richo richo pushed a commit to richo/linux that referenced this issue Mar 6, 2012
@yizou @Jkirsher yizou + Jkirsher ixgbe: do not update real num queues when netdev is going away
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 #6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 #7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 #9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
#19 [ffff88021f007e68] worker_thread at ffffffff81060513
#20 [ffff88021f007ee8] kthread at ffffffff810648b6
#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9d837ea
@bootc bootc pushed a commit to bootc/linux-rpi-orig that referenced this issue May 8, 2012
@gregkh Mel Gorman + gregkh mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGE…
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff00  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9da11af
@PeterHuewe PeterHuewe pushed a commit to PeterHuewe/linux that referenced this issue May 18, 2012
@gregkh Andreas Herrmann + gregkh x86, microcode: Fix sysfs warning during module unload on unsupported…
… CPUs

commit a956bd6 upstream.

Loading the microcode driver on an unsupported CPU and subsequently
unloading the driver causes

 WARNING: at fs/sysfs/group.c:138 mc_device_remove+0x5f/0x70 [microcode]()
 Hardware name: 01972NG
 sysfs group ffffffffa00013d0 not found for kobject 'cpu0'
 Modules linked in: snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel btusb snd_hda_codec bluetooth thinkpad_acpi rfkill microcode(-) [last unloaded: cfg80211]
 Pid: 4560, comm: modprobe Not tainted 3.4.0-rc2-00002-g258f742 #5
 Call Trace:
  [<ffffffff8103113b>] ? warn_slowpath_common+0x7b/0xc0
  [<ffffffff81031235>] ? warn_slowpath_fmt+0x45/0x50
  [<ffffffff81120e74>] ? sysfs_remove_group+0x34/0x120
  [<ffffffffa00000ef>] ? mc_device_remove+0x5f/0x70 [microcode]
  [<ffffffff81331eb9>] ? subsys_interface_unregister+0x69/0xa0
  [<ffffffff81563526>] ? mutex_lock+0x16/0x40
  [<ffffffffa0000c3e>] ? microcode_exit+0x50/0x92 [microcode]
  [<ffffffff8107051d>] ? sys_delete_module+0x16d/0x260
  [<ffffffff810a0065>] ? wait_iff_congested+0x45/0x110
  [<ffffffff815656af>] ? page_fault+0x1f/0x30
  [<ffffffff81565ba2>] ? system_call_fastpath+0x16/0x1b

on recent kernels.

This is due to commit 8a25a2f ("cpu: convert 'cpu' and
'machinecheck' sysdev_class to a regular subsystem") which renders
commit 6c53cbf ("x86, microcode: Correct sysdev_add error path")
useless.

See http://marc.info/?l=linux-kernel&m=133416246406478

Avoid above warning by restoring the old driver behaviour before
6c53cbf ("x86, microcode: Correct sysdev_add error path").

Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20120411163849.GE4794@alberich.amd.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2069bb2
@andatche andatche pushed a commit to andatche/raspi-linux that referenced this issue May 29, 2012
@bwhacks Andreas Herrmann + bwhacks x86, microcode: Fix sysfs warning during module unload on unsupported…
… CPUs

commit a956bd6 upstream.

Loading the microcode driver on an unsupported CPU and subsequently
unloading the driver causes

 WARNING: at fs/sysfs/group.c:138 mc_device_remove+0x5f/0x70 [microcode]()
 Hardware name: 01972NG
 sysfs group ffffffffa00013d0 not found for kobject 'cpu0'
 Modules linked in: snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel btusb snd_hda_codec bluetooth thinkpad_acpi rfkill microcode(-) [last unloaded: cfg80211]
 Pid: 4560, comm: modprobe Not tainted 3.4.0-rc2-00002-g258f742 #5
 Call Trace:
  [<ffffffff8103113b>] ? warn_slowpath_common+0x7b/0xc0
  [<ffffffff81031235>] ? warn_slowpath_fmt+0x45/0x50
  [<ffffffff81120e74>] ? sysfs_remove_group+0x34/0x120
  [<ffffffffa00000ef>] ? mc_device_remove+0x5f/0x70 [microcode]
  [<ffffffff81331eb9>] ? subsys_interface_unregister+0x69/0xa0
  [<ffffffff81563526>] ? mutex_lock+0x16/0x40
  [<ffffffffa0000c3e>] ? microcode_exit+0x50/0x92 [microcode]
  [<ffffffff8107051d>] ? sys_delete_module+0x16d/0x260
  [<ffffffff810a0065>] ? wait_iff_congested+0x45/0x110
  [<ffffffff815656af>] ? page_fault+0x1f/0x30
  [<ffffffff81565ba2>] ? system_call_fastpath+0x16/0x1b

on recent kernels.

This is due to commit 8a25a2f ("cpu: convert 'cpu' and
'machinecheck' sysdev_class to a regular subsystem") which renders
commit 6c53cbf ("x86, microcode: Correct sysdev_add error path")
useless.

See http://marc.info/?l=linux-kernel&m=133416246406478

Avoid above warning by restoring the old driver behaviour before
6c53cbf ("x86, microcode: Correct sysdev_add error path").

Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20120411163849.GE4794@alberich.amd.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
[bwh: Backported to 3.2: deleted line uses sys_dev, not dev]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
31114c4
@erique erique pushed a commit to erique/rpi_linux that referenced this issue Jul 16, 2012
@gregkh Andrea Arcangeli + gregkh mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race …
…condition

commit 26c1917 upstream.

When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
 #0 [f06a9dd8] crash_kexec at c049b5ec
 #1 [f06a9e2c] oops_end at c083d1c2
 #2 [f06a9e40] no_context at c0433ded
 #3 [f06a9e64] bad_area_nosemaphore at c043401a
 #4 [f06a9e6c] __do_page_fault at c0434493
 #5 [f06a9eec] do_page_fault at c083eb45
 #6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
    00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
 #7 [f06a9f38] _spin_lock at c083bc14
 #8 [f06a9f44] sys_mincore at c0507b7d
 #9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
    *pmd)
    497 {
    498         /* depend on compiler for an atomic pmd read */
    499         pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----

Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2d363e9
@popcornmix popcornmix pushed a commit to popcornmix/linux that referenced this issue Aug 16, 2012
@jtlayton @bwhacks jtlayton + bwhacks cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
commit 3cf003c upstream.

Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 #4 [eed63e50] do_writepages at c04f3e32
 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 #6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 #8 [eed63ecc] do_sync_write at c052d202
 #9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
bb761c9
@popcornmix popcornmix pushed a commit to popcornmix/linux that referenced this issue Aug 16, 2012
@jtlayton @bwhacks jtlayton + bwhacks nfs: skip commit in releasepage if we're freeing memory for fs-relate…
…d reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
1c88c58
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 20, 2012
@fengguang @gregkh fengguang + gregkh Staging: panel: fix spinlock trylock failure on UP
Use spin_lock_irq() to quiet warning:

         [    8.232324] BUG: spinlock trylock failure on UP on CPU#0, reboot/85
         [    8.234138]  lock: c161c760, .magic: dead4ead, .owner: reboot/85, .owner_cpu: 0
         [    8.236132] Pid: 85, comm: reboot Not tainted 3.4.0-rc7-00656-g82163ed #5
         [    8.237965] Call Trace:
         [    8.238648]  [<c13dfd20>] ? printk+0x18/0x1a
         [    8.239827]  [<c122a5e0>] spin_dump+0x80/0xd0
         [    8.241016]  [<c122a652>] spin_bug+0x22/0x30
         [    8.242181]  [<c122a93b>] do_raw_spin_trylock+0x5b/0x70
         [    8.243611]  [<c13e8bae>] _raw_spin_trylock+0xe/0x60
         [    8.244975]  [<c1392230>] ? keypad_send_key.constprop.9+0xe0/0xe0
 ==>     [    8.246638]  [<c13922ea>] panel_scan_timer+0xba/0x570
         [    8.248019]  [<c1392230>] ? keypad_send_key.constprop.9+0xe0/0xe0
         [    8.249689]  [<c102f6f5>] run_timer_softirq+0x1e5/0x370
         [    8.251191]  [<c102f645>] ? run_timer_softirq+0x135/0x370
         [    8.252718]  [<c1392230>] ? keypad_send_key.constprop.9+0xe0/0xe0
         [    8.254462]  [<c102a592>] __do_softirq+0xc2/0x1c0
         [    8.255758]  [<c102a4d0>] ? local_bh_enable_ip+0x130/0x130
         [    8.257228]  <IRQ>  [<c102a855>] ? irq_exit+0x65/0x70
         [    8.258647]  [<c1013ff9>] ? smp_apic_timer_interrupt+0x49/0x80
         [    8.260226]  [<c13e96c1>] ? apic_timer_interrupt+0x31/0x38
         [    8.261737]  [<c12700e0>] ? drm_vm_open_locked+0x70/0xb0
         [    8.263166]  [<c122489a>] ? delay_tsc+0x1a/0x30
         [    8.264452]  [<c12248c9>] ? __delay+0x9/0x10
         [    8.265621]  [<c12248ec>] ? __const_udelay+0x1c/0x20
 ==>     [    8.266967]  [<c139136c>] ? lcd_clear_fast_p8+0x9c/0xe0
         [    8.268386]  [<c1391a66>] ? lcd_write+0x226/0x810
         [    8.269653]  [<c1367900>] ? md_set_readonly+0xc0/0xc0
         [    8.271013]  [<c122a9ed>] ? do_raw_spin_unlock+0x9d/0xe0
         [    8.272470]  [<c1392a98>] ? panel_lcd_print+0x38/0x40
         [    8.273837]  [<c1392ace>] ? panel_notify_sys+0x2e/0x60
         [    8.275224]  [<c1046634>] ? notifier_call_chain+0x84/0xb0
         [    8.276754]  [<c10469ce>] ? __blocking_notifier_call_chain+0x3e/0x60
         [    8.278576]  [<c1046a0a>] ? blocking_notifier_call_chain+0x1a/0x20
         [    8.280267]  [<c1036a14>] ? kernel_restart_prepare+0x14/0x40
         [    8.281901]  [<c1036a8e>] ? kernel_restart+0xe/0x50
         [    8.283216]  [<c1036ce9>] ? sys_reboot+0x149/0x1e0
         [    8.284532]  [<c10b3fb3>] ? handle_pte_fault+0x93/0xd70
         [    8.285956]  [<c1019e35>] ? do_page_fault+0x215/0x5e0
         [    8.287330]  [<c101a113>] ? do_page_fault+0x4f3/0x5e0
         [    8.288704]  [<c1045ac6>] ? up_read+0x16/0x30
         [    8.289890]  [<c101a113>] ? do_page_fault+0x4f3/0x5e0
         [    8.291252]  [<c10d4486>] ? iterate_supers+0x86/0xd0
         [    8.292615]  [<c122a9ed>] ? do_raw_spin_unlock+0x9d/0xe0
         [    8.294049]  [<c13e8dcd>] ? _raw_spin_unlock+0x1d/0x20
         [    8.295449]  [<c10d44ab>] ? iterate_supers+0xab/0xd0
         [    8.296795]  [<c10fb620>] ? __sync_filesystem+0xa0/0xa0
         [    8.298199]  [<c13e9b03>] ? sysenter_do_call+0x12/0x37
         [    8.306899] Restarting system.
         [    8.307747] machine restart

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
d4d2dbc
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 20, 2012
Ben Myers xfs: stop the sync worker before xfs_unmountfs
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 #2 [c5377d98] no_context at c06f614e
 #3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 #4 [c5377df4] bad_area_nosemaphore at c06f629b
 #5 [c5377e00] do_page_fault at c070b0cb
 #6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 #2 [cde0fe30] schedule_timeout at c0705600
 #3 [cde0fe94] __down_common at c0706098
 #4 [cde0fec8] __down at c0706122
 #5 [cde0fed0] down at c025936f
 #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 #9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a0 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
4026c9f
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 29, 2012
Ben Myers xfs: stop the sync worker before xfs_unmountfs
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 #2 [c5377d98] no_context at c06f614e
 #3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 #4 [c5377df4] bad_area_nosemaphore at c06f629b
 #5 [c5377e00] do_page_fault at c070b0cb
 #6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 #2 [cde0fe30] schedule_timeout at c0705600
 #3 [cde0fe94] __down_common at c0706098
 #4 [cde0fec8] __down at c0706122
 #5 [cde0fed0] down at c025936f
 #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 #9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a0 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
0ba6e53
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 29, 2012
@bmork @davem330 bmork + davem330 net: qmi_wwan: adding Huawei E367, ZTE MF683 and Pantech P4200
One of the modes of Huawei E367 has this QMI/wwan interface:

 I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=07 Driver=(none)
 E:  Ad=83(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
 E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
 E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms

Huawei use subclass and protocol to identify vendor specific
functions, so adding a new vendor rule for this combination.

The Pantech devices UML290 (106c:3718) and P4200 (106c:3721) use
the same subclass to identify the QMI/wwan function.  Replace the
existing device specific UML290 entries with generic vendor matching,
adding support for the Pantech P4200.

The ZTE MF683 has 6 vendor specific interfaces, all using
ff/ff/ff for cls/sub/prot.  Adding a match on interface #5 which
is a QMI/wwan interface.

Cc: Fangxiaozhi (Franko) <fangxiaozhi@huawei.com>
Cc: Thomas Schäfer <tschaefer@t-online.de>
Cc: Dan Williams <dcbw@redhat.com>
Cc: Shawn J. Goff <shawn7400@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
9db273f
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 29, 2012
@bmork @gregkh bmork + gregkh USB: option: blacklist QMI interface on ZTE MF683
Interface #5 on ZTE MF683 is a QMI/wwan interface.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: stable <stable@vger.kernel.org>
Cc: Shawn J. Goff <shawn7400@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
160c942
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 29, 2012
@bmork @davem330 bmork + davem330 net: qmi_wwan: adding Huawei E367, ZTE MF683 and Pantech P4200
One of the modes of Huawei E367 has this QMI/wwan interface:

 I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=07 Driver=(none)
 E:  Ad=83(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
 E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
 E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms

Huawei use subclass and protocol to identify vendor specific
functions, so adding a new vendor rule for this combination.

The Pantech devices UML290 (106c:3718) and P4200 (106c:3721) use
the same subclass to identify the QMI/wwan function.  Replace the
existing device specific UML290 entries with generic vendor matching,
adding support for the Pantech P4200.

The ZTE MF683 has 6 vendor specific interfaces, all using
ff/ff/ff for cls/sub/prot.  Adding a match on interface #5 which
is a QMI/wwan interface.

Cc: Fangxiaozhi (Franko) <fangxiaozhi@huawei.com>
Cc: Thomas Schäfer <tschaefer@t-online.de>
Cc: Dan Williams <dcbw@redhat.com>
Cc: Shawn J. Goff <shawn7400@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
42d94dc
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Oct 3, 2012
@sameo Szymon Janc + sameo NFC: Fix sleeping in invalid context when netlink socket is closed
netlink_register_notifier requires notify functions to not sleep.
nfc_stop_poll locks device mutex and must not be called from notifier.
Create workqueue that will handle this for all devices.

BUG: sleeping function called from invalid context at kernel/mutex.c:269
in_atomic(): 0, irqs_disabled(): 0, pid: 4497, name: neard
1 lock held by neard/4497:
Pid: 4497, comm: neard Not tainted 3.5.0-999-nfc+ #5
Call Trace:
[<ffffffff810952c5>] __might_sleep+0x145/0x200
[<ffffffff81743dde>] mutex_lock_nested+0x2e/0x50
[<ffffffff816ffd19>] nfc_stop_poll+0x39/0xb0
[<ffffffff81700a17>] nfc_genl_rcv_nl_event+0x77/0xc0
[<ffffffff8174aa8c>] notifier_call_chain+0x5c/0x120
[<ffffffff8174abd6>] __atomic_notifier_call_chain+0x86/0x140
[<ffffffff8174ab50>] ? notifier_call_chain+0x120/0x120
[<ffffffff815e1347>] ? skb_dequeue+0x67/0x90
[<ffffffff8174aca6>] atomic_notifier_call_chain+0x16/0x20
[<ffffffff8162119a>] netlink_release+0x24a/0x280
[<ffffffff815d7aa8>] sock_release+0x28/0xa0
[<ffffffff815d7be7>] sock_close+0x17/0x30
[<ffffffff811b2a7c>] __fput+0xcc/0x250
[<ffffffff811b2c0e>] ____fput+0xe/0x10
[<ffffffff81085009>] task_work_run+0x69/0x90
[<ffffffff8101b951>] do_notify_resume+0x81/0xd0
[<ffffffff8174ef22>] int_signal+0x12/0x17

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
3c0cc8a
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Oct 3, 2012
@rolandd Jack Morgenstein + rolandd IB/mlx4: Propagate P_Key and guid change port management events to sl…
…aves

P_Key change and guid change events are not of interest to all slaves,
but only to those slaves which "see" the table slots whose contents
have change.

For example, if the guid at port 1, index 5 has changed in the PPF, we
wish to propagate the gid-change event only to the function which has
that guid index mapped to its port/guid table (in this case it is
slave #5). Other functions should not get the event, since the event
does not affect them.

Similarly with P_Keys -- P_Key change events are forwarded only to
slaves which have that P_Key index mapped to their virtual P_Key table.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2a4fae1
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
Luck, Tony + H. Peter Anvin x86: Remove some noise from boot log when starting cpus
Printing the "start_ip" for every secondary cpu is very noisy on a large
system - and doesn't add any value. Drop this message.

Console log before:
Booting Node   0, Processors  #1
smpboot cpu 1: start_ip = 96000
 #2
smpboot cpu 2: start_ip = 96000
 #3
smpboot cpu 3: start_ip = 96000
 #4
smpboot cpu 4: start_ip = 96000
       ...
 #31
smpboot cpu 31: start_ip = 96000
Brought up 32 CPUs

Console log after:
Booting Node   0, Processors  #1 #2 #3 #4 #5 #6 #7 Ok.
Booting Node   1, Processors  #8 #9 #10 #11 #12 #13 #14 #15 Ok.
Booting Node   0, Processors  #16 #17 #18 #19 #20 #21 #22 #23 Ok.
Booting Node   1, Processors  #24 #25 #26 #27 #28 #29 #30 #31
Brought up 32 CPUs

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
140f190
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
@sgruszka @linvjw sgruszka + linvjw iwlwifi: make tx_cmd_pool kmem cache global
Otherwise we are not able to run more than one device per driver:

[   24.743045] kmem_cache_create: duplicate cache iwl_dev_cmd
[   24.743051] Pid: 3165, comm: NetworkManager Not tainted 3.3.0-rc2-wl+ #5
[   24.743054] Call Trace:
[   24.743066]  [<ffffffff811717d5>] kmem_cache_create+0x655/0x700
[   24.743101]  [<ffffffffa03b9f8b>] iwl_alive_notify+0x1cb/0x1f0 [iwlwifi]
[   24.743111]  [<ffffffffa03ba442>] iwl_load_ucode_wait_alive+0x1b2/0x220 [iwlwifi]
[   24.743142]  [<ffffffffa03ba893>] iwl_run_init_ucode+0x73/0x100 [iwlwifi]
[   24.743152]  [<ffffffffa03b8fa1>] __iwl_up+0x81/0x220 [iwlwifi]
[   24.743161]  [<ffffffffa03b91c0>] iwlagn_mac_start+0x80/0x190 [iwlwifi]
[   24.743188]  [<ffffffffa03307b3>] ieee80211_do_open+0x293/0x770 [mac80211]

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
354928d
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
@torvalds Hugh Dickins + torvalds compact_pgdat: workaround lockdep warning in kswapd
I get this lockdep warning from swapping load on linux-next, due to
"vmscan: kswapd carefully call compaction".

=================================
[ INFO: inconsistent lock state ]
3.3.0-rc2-next-20120201 #5 Not tainted
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/28 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (pcpu_alloc_mutex){+.+.?.}, at: [<ffffffff810d6684>] pcpu_alloc+0x67/0x325
{RECLAIM_FS-ON-W} state was registered at:
  [<ffffffff81099b75>] mark_held_locks+0xd7/0x103
  [<ffffffff8109a13c>] lockdep_trace_alloc+0x85/0x9e
  [<ffffffff810f6bdc>] __kmalloc+0x6c/0x14b
  [<ffffffff810d57fd>] pcpu_mem_zalloc+0x59/0x62
  [<ffffffff810d5d16>] pcpu_extend_area_map+0x26/0xb1
  [<ffffffff810d679f>] pcpu_alloc+0x182/0x325
  [<ffffffff810d694d>] __alloc_percpu+0xb/0xd
  [<ffffffff8142ebfd>] snmp_mib_init+0x1e/0x2e
  [<ffffffff8185cd8d>] ipv4_mib_init_net+0x7a/0x184
  [<ffffffff813dc963>] ops_init.clone.0+0x6b/0x73
  [<ffffffff813dc9cc>] register_pernet_operations+0x61/0xa0
  [<ffffffff813dca8e>] register_pernet_subsys+0x29/0x42
  [<ffffffff8185d044>] inet_init+0x1ad/0x252
  [<ffffffff810002e3>] do_one_initcall+0x7a/0x12f
  [<ffffffff81832bc5>] kernel_init+0x9d/0x11e
  [<ffffffff814e51e4>] kernel_thread_helper+0x4/0x10
irq event stamp: 656613
hardirqs last  enabled at (656613): [<ffffffff814e0ddc>] __mutex_unlock_slowpath+0x104/0x128
hardirqs last disabled at (656612): [<ffffffff814e0d34>] __mutex_unlock_slowpath+0x5c/0x128
softirqs last  enabled at (655568): [<ffffffff8105b4a5>] __do_softirq+0x120/0x136
softirqs last disabled at (654757): [<ffffffff814e52dc>] call_softirq+0x1c/0x30

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(pcpu_alloc_mutex);
  <Interrupt>
    lock(pcpu_alloc_mutex);

 *** DEADLOCK ***

no locks held by kswapd0/28.

stack backtrace:
Pid: 28, comm: kswapd0 Not tainted 3.3.0-rc2-next-20120201 #5
Call Trace:
 [<ffffffff810981f4>] print_usage_bug+0x1bf/0x1d0
 [<ffffffff81096c3e>] ? print_irq_inversion_bug+0x1d9/0x1d9
 [<ffffffff810982c0>] mark_lock_irq+0xbb/0x22e
 [<ffffffff810c5399>] ? free_hot_cold_page+0x13d/0x14f
 [<ffffffff81098684>] mark_lock+0x251/0x331
 [<ffffffff81098893>] mark_irqflags+0x12f/0x141
 [<ffffffff81098e32>] __lock_acquire+0x58d/0x753
 [<ffffffff810d6684>] ? pcpu_alloc+0x67/0x325
 [<ffffffff81099433>] lock_acquire+0x54/0x6a
 [<ffffffff810d6684>] ? pcpu_alloc+0x67/0x325
 [<ffffffff8107a5b8>] ? add_preempt_count+0xa9/0xae
 [<ffffffff814e0a21>] mutex_lock_nested+0x5e/0x315
 [<ffffffff810d6684>] ? pcpu_alloc+0x67/0x325
 [<ffffffff81098f81>] ? __lock_acquire+0x6dc/0x753
 [<ffffffff810c9fb0>] ? __pagevec_release+0x2c/0x2c
 [<ffffffff810d6684>] pcpu_alloc+0x67/0x325
 [<ffffffff810c9fb0>] ? __pagevec_release+0x2c/0x2c
 [<ffffffff810d694d>] __alloc_percpu+0xb/0xd
 [<ffffffff8106c35e>] schedule_on_each_cpu+0x23/0x110
 [<ffffffff810c9fcb>] lru_add_drain_all+0x10/0x12
 [<ffffffff810f126f>] __compact_pgdat+0x20/0x182
 [<ffffffff810f15c2>] compact_pgdat+0x27/0x29
 [<ffffffff810c306b>] ? zone_watermark_ok+0x1a/0x1c
 [<ffffffff810cdf6f>] balance_pgdat+0x732/0x751
 [<ffffffff810ce0ed>] kswapd+0x15f/0x178
 [<ffffffff810cdf8e>] ? balance_pgdat+0x751/0x751
 [<ffffffff8106fd11>] kthread+0x84/0x8c
 [<ffffffff814e51e4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810787ed>] ? finish_task_switch+0x85/0xea
 [<ffffffff814e3861>] ? retint_restore_args+0xe/0xe
 [<ffffffff8106fc8d>] ? __init_kthread_worker+0x56/0x56
 [<ffffffff814e51e0>] ? gs_change+0xb/0xb

The RECLAIM_FS notations indicate that it's doing the GFP_FS checking that
Nick hacked into lockdep a while back: I think we're intended to read that
"<Interrupt>" in the DEADLOCK scenario as "<Direct reclaim>".

I'm hazy, I have not reached any conclusion as to whether it's right to
complain or not; but I believe it's uneasy about kswapd now doing the
mutex_lock(&pcpu_alloc_mutex) which lru_add_drain_all() entails.  Nor have
I reached any conclusion as to whether it's important for kswapd to do
that draining or not.

But so as not to get blocked on this, with lockdep disabled from giving
further reports, here's a patch which removes the lru_add_drain_all() from
kswapd's callpath (and calls it only once from compact_nodes(), instead of
once per node).

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8575ec2
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
Andreas Herrmann + Borislav Petkov x86, microcode: Fix sysfs warning during module unload on unsupported…
… CPUs

Loading the microcode driver on an unsupported CPU and subsequently
unloading the driver causes

 WARNING: at fs/sysfs/group.c:138 mc_device_remove+0x5f/0x70 [microcode]()
 Hardware name: 01972NG
 sysfs group ffffffffa00013d0 not found for kobject 'cpu0'
 Modules linked in: snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel btusb snd_hda_codec bluetooth thinkpad_acpi rfkill microcode(-) [last unloaded: cfg80211]
 Pid: 4560, comm: modprobe Not tainted 3.4.0-rc2-00002-g258f742 #5
 Call Trace:
  [<ffffffff8103113b>] ? warn_slowpath_common+0x7b/0xc0
  [<ffffffff81031235>] ? warn_slowpath_fmt+0x45/0x50
  [<ffffffff81120e74>] ? sysfs_remove_group+0x34/0x120
  [<ffffffffa00000ef>] ? mc_device_remove+0x5f/0x70 [microcode]
  [<ffffffff81331eb9>] ? subsys_interface_unregister+0x69/0xa0
  [<ffffffff81563526>] ? mutex_lock+0x16/0x40
  [<ffffffffa0000c3e>] ? microcode_exit+0x50/0x92 [microcode]
  [<ffffffff8107051d>] ? sys_delete_module+0x16d/0x260
  [<ffffffff810a0065>] ? wait_iff_congested+0x45/0x110
  [<ffffffff815656af>] ? page_fault+0x1f/0x30
  [<ffffffff81565ba2>] ? system_call_fastpath+0x16/0x1b

on recent kernels.

This is due to commit 8a25a2f ("cpu: convert 'cpu' and
'machinecheck' sysdev_class to a regular subsystem") which renders
commit 6c53cbf ("x86, microcode: Correct sysdev_add error path")
useless.

See http://marc.info/?l=linux-kernel&m=133416246406478

Avoid above warning by restoring the old driver behaviour before
6c53cbf ("x86, microcode: Correct sysdev_add error path").

Cc: stable@vger.kernel.org
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20120411163849.GE4794@alberich.amd.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
a956bd6
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
Ben Myers xfs: protect xfs_sync_worker with s_umount semaphore
xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing
work during mount and unmount.  This flag can be cleared by unmount
after the xfs_sync_worker checks it but before the work is completed.
The has caused crashes in the completion handler for the dummy
transaction commited by xfs_sync_worker:

PID: 27544  TASK: ffff88013544e040  CPU: 3   COMMAND: "kworker/3:0"
 #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
 #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
 #2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
 #3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
 #4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
 #5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
 #6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
 #7 [ffff88016fdffc60] page_fault at ffffffff813ac635
    [exception RIP: xlog_get_lowest_lsn+0x30]
    RIP: ffffffffa04a9910  RSP: ffff88016fdffd10  RFLAGS: 00010246
    RAX: ffffc90014e48000  RBX: ffff88014d879980  RCX: ffff88014d879980
    RDX: ffff8802214ee4c0  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88016fdffd10   R8: ffff88014d879a80   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000000  R12: ffff8802214ee400
    R13: ffff88014d879980  R14: 0000000000000000  R15: ffff88022fd96605
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
 #9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]

Protect xfs_sync_worker by using the s_umount semaphore at the read
level to provide exclusion with unmount while work is progressing.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
1307bbd
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
@linvjw Emmanuel Grumbach + linvjw iwlwifi: fix the Transmit Frame Descriptor rings
The logic that allows to have a short TFD queue was completely wrong.
We do maintain 256 Transmit Frame Descriptors, but they point to
recycled buffers. We used to attach and de-attach different TFDs for
the same buffer and it worked since they pointed to the same buffer.

Also zero the number of BDs after unmapping a TFD. This seems not
necessary since we don't reclaim the same TFD twice, but I like
housekeeping.

This patch solves this warning:

[ 6427.079855] WARNING: at lib/dma-debug.c:866 check_unmap+0x727/0x7a0()
[ 6427.079859] Hardware name: Latitude E6410
[ 6427.079865] iwlwifi 0000:02:00.0: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x00000000296d393c] [size=8 bytes]
[ 6427.079870] Modules linked in: ...
[ 6427.079950] Pid: 6613, comm: ifconfig Tainted: G           O 3.3.3 #5
[ 6427.079954] Call Trace:
[ 6427.079963]  [<c10337a2>] warn_slowpath_common+0x72/0xa0
[ 6427.079982]  [<c1033873>] warn_slowpath_fmt+0x33/0x40
[ 6427.079988]  [<c12dcb77>] check_unmap+0x727/0x7a0
[ 6427.079995]  [<c12dcdaa>] debug_dma_unmap_page+0x5a/0x80
[ 6427.080024]  [<fe2312ac>] iwlagn_unmap_tfd+0x12c/0x180 [iwlwifi]
[ 6427.080048]  [<fe231349>] iwlagn_txq_free_tfd+0x49/0xb0 [iwlwifi]
[ 6427.080071]  [<fe228e37>] iwl_tx_queue_unmap+0x67/0x90 [iwlwifi]
[ 6427.080095]  [<fe22d221>] iwl_trans_pcie_stop_device+0x341/0x7b0 [iwlwifi]
[ 6427.080113]  [<fe204b0e>] iwl_down+0x17e/0x260 [iwlwifi]
[ 6427.080132]  [<fe20efec>] iwlagn_mac_stop+0x6c/0xf0 [iwlwifi]
[ 6427.080168]  [<fd8480ce>] ieee80211_stop_device+0x5e/0x190 [mac80211]
[ 6427.080198]  [<fd833208>] ieee80211_do_stop+0x288/0x620 [mac80211]
[ 6427.080243]  [<fd8335b7>] ieee80211_stop+0x17/0x20 [mac80211]
[ 6427.080250]  [<c148dac1>] __dev_close_many+0x81/0xd0
[ 6427.080270]  [<c148db3d>] __dev_close+0x2d/0x50
[ 6427.080276]  [<c148d152>] __dev_change_flags+0x82/0x150
[ 6427.080282]  [<c148e3e3>] dev_change_flags+0x23/0x60
[ 6427.080289]  [<c14f6320>] devinet_ioctl+0x6a0/0x770
[ 6427.080296]  [<c14f8705>] inet_ioctl+0x95/0xb0
[ 6427.080304]  [<c147a0f0>] sock_ioctl+0x70/0x270

Cc: stable@vger.kernel.org
Reported-by: Antonio Quartulli <ordex@autistici.org>
Tested-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Wey-Yi W Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ebed633
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
@torvalds Andrea Arcangeli + torvalds mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race …
…condition

When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
 #0 [f06a9dd8] crash_kexec at c049b5ec
 #1 [f06a9e2c] oops_end at c083d1c2
 #2 [f06a9e40] no_context at c0433ded
 #3 [f06a9e64] bad_area_nosemaphore at c043401a
 #4 [f06a9e6c] __do_page_fault at c0434493
 #5 [f06a9eec] do_page_fault at c083eb45
 #6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
    00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
 #7 [f06a9f38] _spin_lock at c083bc14
 #8 [f06a9f44] sys_mincore at c0507b7d
 #9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
    *pmd)
    497 {
    498         /* depend on compiler for an atomic pmd read */
    499         pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----

Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
26c1917
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
Borislav Petkov + Ingo Molnar x86/smp: Fix topology checks on AMD MCM CPUs
The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
161270f
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
@jtlayton @smfrench jtlayton + smfrench cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 #4 [eed63e50] do_writepages at c04f3e32
 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 #6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 #8 [eed63ecc] do_sync_write at c052d202
 #9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Cc: <stable@vger.kernel.org>
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
3cf003c
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
@jtlayton jtlayton + Trond Myklebust nfs: skip commit in releasepage if we're freeing memory for fs-relate…
…d reasons

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
5cf02d0
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
@sgruszka sgruszka + Thomas Gleixner sched: fix divide by zero at {thread_group,task}_times
On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.

This problem can be triggered in practice on very long lived processes:

  PID: 2331   TASK: ffff880472814b00  CPU: 2   COMMAND: "oraagent.bin"
   #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
   #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
   #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
   #3 [ffff880472a51cd0] die at ffffffff8100f26b
   #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
   #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
   #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
      [exception RIP: thread_group_times+0x56]
      RIP: ffffffff81056a16  RSP: ffff880472a51eb8  RFLAGS: 00010046
      RAX: bc3572c9fe12d194  RBX: ffff880874150800  RCX: 0000000110266fad
      RDX: 0000000000000000  RSI: ffff880472a51eb8  RDI: 001038ae7d9633dc
      RBP: ffff880472a51ef8   R8: 00000000b10a3a64   R9: ffff880874150800
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: ffff880472a51f08
      R13: ffff880472a51f10  R14: 0000000000000000  R15: 0000000000000007
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
   #8 [ffff880472a51f40] sys_times at ffffffff81088524
   #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
      RIP: 0000003808caac3a  RSP: 00007fcba27ab6d8  RFLAGS: 00000202
      RAX: 0000000000000064  RBX: ffffffff8100b0f2  RCX: 0000000000000000
      RDX: 00007fcba27ab6e0  RSI: 000000000076d58e  RDI: 00007fcba27ab6e0
      RBP: 00007fcba27ab700   R8: 0000000000000020   R9: 000000000000091b
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: 00007fff9ca41940
      R13: 0000000000000000  R14: 00007fcba27ac9c0  R15: 00007fff9ca41940
      ORIG_RAX: 0000000000000064  CS: 0033  SS: 002b

Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
bea6832
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
@davem330 Eric Dumazet + davem330 l2tp: fix a lockdep splat
Fixes following lockdep splat :

[ 1614.734896] =============================================
[ 1614.734898] [ INFO: possible recursive locking detected ]
[ 1614.734901] 3.6.0-rc3+ #782 Not tainted
[ 1614.734903] ---------------------------------------------
[ 1614.734905] swapper/11/0 is trying to acquire lock:
[ 1614.734907]  (slock-AF_INET){+.-...}, at: [<ffffffffa0209d72>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.734920]
[ 1614.734920] but task is already holding lock:
[ 1614.734922]  (slock-AF_INET){+.-...}, at: [<ffffffff815fce23>] tcp_v4_err+0x163/0x6b0
[ 1614.734932]
[ 1614.734932] other info that might help us debug this:
[ 1614.734935]  Possible unsafe locking scenario:
[ 1614.734935]
[ 1614.734937]        CPU0
[ 1614.734938]        ----
[ 1614.734940]   lock(slock-AF_INET);
[ 1614.734943]   lock(slock-AF_INET);
[ 1614.734946]
[ 1614.734946]  *** DEADLOCK ***
[ 1614.734946]
[ 1614.734949]  May be due to missing lock nesting notation
[ 1614.734949]
[ 1614.734952] 7 locks held by swapper/11/0:
[ 1614.734954]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff81592801>] __netif_receive_skb+0x251/0xd00
[ 1614.734964]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff815d319c>] ip_local_deliver_finish+0x4c/0x4e0
[ 1614.734972]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff8160d116>] icmp_socket_deliver+0x46/0x230
[ 1614.734982]  #3:  (slock-AF_INET){+.-...}, at: [<ffffffff815fce23>] tcp_v4_err+0x163/0x6b0
[ 1614.734989]  #4:  (rcu_read_lock){.+.+..}, at: [<ffffffff815da240>] ip_queue_xmit+0x0/0x680
[ 1614.734997]  #5:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815d9925>] ip_finish_output+0x135/0x890
[ 1614.735004]  #6:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81595680>] dev_queue_xmit+0x0/0xe00
[ 1614.735012]
[ 1614.735012] stack backtrace:
[ 1614.735016] Pid: 0, comm: swapper/11 Not tainted 3.6.0-rc3+ #782
[ 1614.735018] Call Trace:
[ 1614.735020]  <IRQ>  [<ffffffff810a50ac>] __lock_acquire+0x144c/0x1b10
[ 1614.735033]  [<ffffffff810a334b>] ? check_usage+0x9b/0x4d0
[ 1614.735037]  [<ffffffff810a6762>] ? mark_held_locks+0x82/0x130
[ 1614.735042]  [<ffffffff810a5df0>] lock_acquire+0x90/0x200
[ 1614.735047]  [<ffffffffa0209d72>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735051]  [<ffffffff810a69ad>] ? trace_hardirqs_on+0xd/0x10
[ 1614.735060]  [<ffffffff81749b31>] _raw_spin_lock+0x41/0x50
[ 1614.735065]  [<ffffffffa0209d72>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735069]  [<ffffffffa0209d72>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735075]  [<ffffffffa014f7f2>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
[ 1614.735079]  [<ffffffff81595112>] dev_hard_start_xmit+0x502/0xa70
[ 1614.735083]  [<ffffffff81594c6e>] ? dev_hard_start_xmit+0x5e/0xa70
[ 1614.735087]  [<ffffffff815957c1>] ? dev_queue_xmit+0x141/0xe00
[ 1614.735093]  [<ffffffff815b622e>] sch_direct_xmit+0xfe/0x290
[ 1614.735098]  [<ffffffff81595865>] dev_queue_xmit+0x1e5/0xe00
[ 1614.735102]  [<ffffffff81595680>] ? dev_hard_start_xmit+0xa70/0xa70
[ 1614.735106]  [<ffffffff815b4daa>] ? eth_header+0x3a/0xf0
[ 1614.735111]  [<ffffffff8161d33e>] ? fib_get_table+0x2e/0x280
[ 1614.735117]  [<ffffffff8160a7e2>] arp_xmit+0x22/0x60
[ 1614.735121]  [<ffffffff8160a863>] arp_send+0x43/0x50
[ 1614.735125]  [<ffffffff8160b82f>] arp_solicit+0x18f/0x450
[ 1614.735132]  [<ffffffff8159d9da>] neigh_probe+0x4a/0x70
[ 1614.735137]  [<ffffffff815a191a>] __neigh_event_send+0xea/0x300
[ 1614.735141]  [<ffffffff815a1c93>] neigh_resolve_output+0x163/0x260
[ 1614.735146]  [<ffffffff815d9cf5>] ip_finish_output+0x505/0x890
[ 1614.735150]  [<ffffffff815d9925>] ? ip_finish_output+0x135/0x890
[ 1614.735154]  [<ffffffff815dae79>] ip_output+0x59/0xf0
[ 1614.735158]  [<ffffffff815da1cd>] ip_local_out+0x2d/0xa0
[ 1614.735162]  [<ffffffff815da403>] ip_queue_xmit+0x1c3/0x680
[ 1614.735165]  [<ffffffff815da240>] ? ip_local_out+0xa0/0xa0
[ 1614.735172]  [<ffffffff815f4402>] tcp_transmit_skb+0x402/0xa60
[ 1614.735177]  [<ffffffff815f5a11>] tcp_retransmit_skb+0x1a1/0x620
[ 1614.735181]  [<ffffffff815f7e93>] tcp_retransmit_timer+0x393/0x960
[ 1614.735185]  [<ffffffff815fce23>] ? tcp_v4_err+0x163/0x6b0
[ 1614.735189]  [<ffffffff815fd317>] tcp_v4_err+0x657/0x6b0
[ 1614.735194]  [<ffffffff8160d116>] ? icmp_socket_deliver+0x46/0x230
[ 1614.735199]  [<ffffffff8160d19e>] icmp_socket_deliver+0xce/0x230
[ 1614.735203]  [<ffffffff8160d116>] ? icmp_socket_deliver+0x46/0x230
[ 1614.735208]  [<ffffffff8160d464>] icmp_unreach+0xe4/0x2c0
[ 1614.735213]  [<ffffffff8160e520>] icmp_rcv+0x350/0x4a0
[ 1614.735217]  [<ffffffff815d3285>] ip_local_deliver_finish+0x135/0x4e0
[ 1614.735221]  [<ffffffff815d319c>] ? ip_local_deliver_finish+0x4c/0x4e0
[ 1614.735225]  [<ffffffff815d3ffa>] ip_local_deliver+0x4a/0x90
[ 1614.735229]  [<ffffffff815d37b7>] ip_rcv_finish+0x187/0x730
[ 1614.735233]  [<ffffffff815d425d>] ip_rcv+0x21d/0x300
[ 1614.735237]  [<ffffffff81592a1b>] __netif_receive_skb+0x46b/0xd00
[ 1614.735241]  [<ffffffff81592801>] ? __netif_receive_skb+0x251/0xd00
[ 1614.735245]  [<ffffffff81593368>] process_backlog+0xb8/0x180
[ 1614.735249]  [<ffffffff81593cf9>] net_rx_action+0x159/0x330
[ 1614.735257]  [<ffffffff810491f0>] __do_softirq+0xd0/0x3e0
[ 1614.735264]  [<ffffffff8109ed24>] ? tick_program_event+0x24/0x30
[ 1614.735270]  [<ffffffff8175419c>] call_softirq+0x1c/0x30
[ 1614.735278]  [<ffffffff8100425d>] do_softirq+0x8d/0xc0
[ 1614.735282]  [<ffffffff8104983e>] irq_exit+0xae/0xe0
[ 1614.735287]  [<ffffffff8175494e>] smp_apic_timer_interrupt+0x6e/0x99
[ 1614.735291]  [<ffffffff81753a1c>] apic_timer_interrupt+0x6c/0x80
[ 1614.735293]  <EOI>  [<ffffffff810a14ad>] ? trace_hardirqs_off+0xd/0x10
[ 1614.735306]  [<ffffffff81336f85>] ? intel_idle+0xf5/0x150
[ 1614.735310]  [<ffffffff81336f7e>] ? intel_idle+0xee/0x150
[ 1614.735317]  [<ffffffff814e6ea9>] cpuidle_enter+0x19/0x20
[ 1614.735321]  [<ffffffff814e7538>] cpuidle_idle_call+0xa8/0x630
[ 1614.735327]  [<ffffffff8100c1ba>] cpu_idle+0x8a/0xe0
[ 1614.735333]  [<ffffffff8173762e>] start_secondary+0x220/0x222

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
37159ef
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
@bmork @gregkh bmork + gregkh USB: option: blacklist QMI interface on ZTE MF683
commit 160c942 upstream.

Interface #5 on ZTE MF683 is a QMI/wwan interface.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Shawn J. Goff <shawn7400@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0eccd99
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Oct 26, 2012
@davem330 Eric Dumazet + davem330 bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
If a qdisc is installed on a bonding device, its possible to get
following lockdep splat under stress :

 =============================================
 [ INFO: possible recursive locking detected ]
 3.6.0+ #211 Not tainted
 ---------------------------------------------
 ping/4876 is trying to acquire lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830

 but task is already holding lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by ping/4876:
  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e5030>] raw_sendmsg+0x600/0xc30
  #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815ba4bd>] ip_finish_output+0x12d/0x870
  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830
  #3:  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
  #4:  (&bond->lock){++.?..}, at: [<ffffffffa02128c1>] bond_start_xmit+0x31/0x4b0 [bonding]
  #5:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830

 stack backtrace:
 Pid: 4876, comm: ping Not tainted 3.6.0+ #211
 Call Trace:
  [<ffffffff810a0145>] __lock_acquire+0x715/0x1b80
  [<ffffffff810a256b>] ? mark_held_locks+0x9b/0x100
  [<ffffffff810a1bf2>] lock_acquire+0x92/0x1d0
  [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
  [<ffffffff81726b7c>] _raw_spin_lock+0x3c/0x50
  [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
  [<ffffffff8106264d>] ? rcu_read_lock_bh_held+0x5d/0x90
  [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
  [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
  [<ffffffffa0212a6a>] bond_start_xmit+0x1da/0x4b0 [bonding]
  [<ffffffff815796d0>] dev_hard_start_xmit+0x240/0x6b0
  [<ffffffff81597c6e>] sch_direct_xmit+0xfe/0x2a0
  [<ffffffff8157a249>] dev_queue_xmit+0x199/0x830
  [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
  [<ffffffff815ba96f>] ip_finish_output+0x5df/0x870
  [<ffffffff815ba4bd>] ? ip_finish_output+0x12d/0x870
  [<ffffffff815bb964>] ip_output+0x54/0xf0
  [<ffffffff815bad48>] ip_local_out+0x28/0x90
  [<ffffffff815bc444>] ip_send_skb+0x14/0x50
  [<ffffffff815bc4b2>] ip_push_pending_frames+0x32/0x40
  [<ffffffff815e536a>] raw_sendmsg+0x93a/0xc30
  [<ffffffff8128d570>] ? selinux_file_send_sigiotask+0x1f0/0x1f0
  [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
  [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
  [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
  [<ffffffff815f6855>] inet_sendmsg+0x125/0x240
  [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
  [<ffffffff8155cddb>] sock_sendmsg+0xab/0xe0
  [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
  [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
  [<ffffffff8155d18c>] __sys_sendmsg+0x37c/0x390
  [<ffffffff81195b2a>] ? fsnotify+0x2ca/0x7e0
  [<ffffffff811958e8>] ? fsnotify+0x88/0x7e0
  [<ffffffff81361f36>] ? put_ldisc+0x56/0xd0
  [<ffffffff8116f98a>] ? fget_light+0x3da/0x510
  [<ffffffff8155f6c4>] sys_sendmsg+0x44/0x80
  [<ffffffff8172fc22>] system_call_fastpath+0x16/0x1b

Avoid this problem using a distinct lock_class_key for bonding
devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
49ee492
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Dec 5, 2012
@htejun Mike Galbraith + htejun workqueue: exit rescuer_thread() as TASK_RUNNING
A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling
off, never to be seen again.  In the case where this occurred, an exiting
thread hit reiserfs homebrew conditional resched while holding a mutex,
bringing the box to its knees.

PID: 18105  TASK: ffff8807fd412180  CPU: 5   COMMAND: "kdmflush"
 #0 [ffff8808157e7670] schedule at ffffffff8143f489
 #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs]
 #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14
 #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs]
 #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2
 #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41
 #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a
 #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88
 #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850
 #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f
    [exception RIP: kernel_thread_helper]
    RIP: ffffffff8144a5c0  RSP: ffff8808157e7f58  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffffffff8107af60  RDI: ffff8803ee491d18
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
412d32e
@popcornmix popcornmix pushed a commit that referenced this issue Dec 11, 2012
@gregkh Mike Galbraith + gregkh workqueue: exit rescuer_thread() as TASK_RUNNING
commit 412d32e upstream.

A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling
off, never to be seen again.  In the case where this occurred, an exiting
thread hit reiserfs homebrew conditional resched while holding a mutex,
bringing the box to its knees.

PID: 18105  TASK: ffff8807fd412180  CPU: 5   COMMAND: "kdmflush"
 #0 [ffff8808157e7670] schedule at ffffffff8143f489
 #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs]
 #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14
 #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs]
 #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2
 #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41
 #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a
 #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88
 #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850
 #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f
    [exception RIP: kernel_thread_helper]
    RIP: ffffffff8144a5c0  RSP: ffff8808157e7f58  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffffffff8107af60  RDI: ffff8803ee491d18
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2ccea42
@Olipro Olipro pushed a commit to Olipro/linux-RPi that referenced this issue Dec 26, 2012
Rob Herring + Russell King ARM: 7493/1: use generic unaligned.h
This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.

As Arnd Bergmann, points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianess. The current ARM version however uses the
"byteshift" implementation for both.

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal.  One may wonder why
wasting r2 with the constant 0 in the second case for example.  And all
the mov's could be folded in subsequent orr's, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, #4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.  This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:

long long bar (long long *x) {return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned cases does have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly fron a pre ARMv6 compilation.  Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
d25c881
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Jan 9, 2013
@aakoskin @tmlind aakoskin + tmlind ARM: OMAP1: fix USB configuration use-after-release
All boards, except Amstrad E3, mark USB config with __initdata.

As a result, when you compile USB into modules, they will try to refer
already released platform data and the behaviour is undefined. For example
on Nokia 770, I get the following kernel panic when modprobing ohci-hcd:

[    3.462158] Unable to handle kernel paging request at virtual address e7fddef0
[    3.477050] pgd = c3434000
[    3.487365] [e7fddef0] *pgd=00000000
[    3.498535] Internal error: Oops: 80000005 [#1] ARM
[    3.510955] Modules linked in: ohci_hcd(+)
[    3.522705] CPU: 0    Not tainted  (3.7.0-770_tiny+ #5)
[    3.535552] PC is at 0xe7fddef0
[    3.546508] LR is at ohci_omap_init+0x5c/0x144 [ohci_hcd]
[    3.560272] pc : [<e7fddef0>]    lr : [<bf003140>]    psr: a0000013
[    3.560272] sp : c344bdb0  ip : c344bce0  fp : c344bdcc
[    3.589782] r10: 00000001  r9 : 00000000  r8 : 00000000
[    3.604553] r7 : 00000026  r6 : 000000de  r5 : c0227300  r4 : c342d620
[    3.621032] r3 : e7fddef0  r2 : c048b880  r1 : 00000000  r0 : 0000000a
[    3.637786] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[    3.655822] Control: 0005317f  Table: 13434000  DAC: 00000015
[    3.672790] Process modprobe (pid: 425, stack limit = 0xc344a1b8)
[    3.690643] Stack: (0xc344bdb0 to 0xc344c000)
[    3.707031] bda0:                                     bf0030e4 c342d620 00000000 c049e62c
[    3.727905] bdc0: c344be04 c344bdd0 c0150ff0 bf0030f4 bf001b88 00000000 c048a4ac c345b020
[    3.748870] bde0: c342d620 00000000 c048a468 bf003968 00000001 bf006000 c344be34 c344be08
[    3.769836] be00: bf001bf0 c0150e48 00000000 c344be18 c00b9bfc c048a478 c048a4ac bf0037f8
[    3.790985] be20: c012ca04 c000e024 c344be44 c344be38 c012d968 bf001a84 c344be64 c344be48
[    3.812164] be40: c012c8ac c012d95c 00000000 c048a478 c048a4ac bf0037f8 c344be84 c344be68
[    3.833740] be60: c012ca74 c012c80c 20000013 00000000 c344be88 bf0037f8 c344beac c344be88
[    3.855468] be80: c012b038 c012ca14 c38093cc c383ee10 bf0037f8 c35be5a0 c049d5e8 00000000
[    3.877166] bea0: c344bebc c344beb0 c012c40c c012aff4 c344beec c344bec0 c012bfc0 c012c3fc
[    3.898834] bec0: bf00378c 00000000 c344beec bf0037f8 00067f39 00000000 00005c44 c000e024
[    3.920837] bee0: c344bf14 c344bef0 c012cd54 c012befc c04ce080 00067f39 00000000 00005c44
[    3.943023] bf00: c000e024 bf006000 c344bf24 c344bf18 c012db14 c012ccc0 c344bf3c c344bf28
[    3.965423] bf20: bf00604c c012dad8 c344a000 bf003834 c344bf7c c344bf40 c00087ac bf006010
[    3.987976] bf40: 0000000f bf003834 00067f39 00000000 00005c44 bf003834 00067f39 00000000
[    4.010711] bf60: 00005c44 c000e024 c344a000 00000000 c344bfa4 c344bf80 c004c35c c0008720
[    4.033569] bf80: c344bfac c344bf90 01422192 01427ea0 00000000 00000080 00000000 c344bfa8
[    4.056518] bfa0: c000dec0 c004c2f0 01422192 01427ea0 01427ea0 00005c44 00067f39 00000000
[    4.079406] bfc0: 01422192 01427ea0 00000000 00000080 b6e11008 014221aa be941fcc b6e1e008
[    4.102569] bfe0: b6ef6300 be941758 0000e93c b6ef6310 60000010 01427ea0 00000000 00000000
[    4.125946] Backtrace:
[    4.143463] [<bf0030e4>] (ohci_omap_init+0x0/0x144 [ohci_hcd]) from [<c0150ff0>] (usb_add_hcd+0x1b8/0x61c)
[    4.183898]  r6:c049e62c r5:00000000 r4:c342d620 r3:bf0030e4
[    4.205596] [<c0150e38>] (usb_add_hcd+0x0/0x61c) from [<bf001bf0>] (ohci_hcd_omap_drv_probe+0x17c/0x224 [ohci_hcd])
[    4.248138] [<bf001a74>] (ohci_hcd_omap_drv_probe+0x0/0x224 [ohci_hcd]) from [<c012d968>] (platform_drv_probe+0x1c/0x20)
[    4.292144]  r8:c000e024 r7:c012ca04 r6:bf0037f8 r5:c048a4ac r4:c048a478
[    4.316192] [<c012d94c>] (platform_drv_probe+0x0/0x20) from [<c012c8ac>] (driver_probe_device+0xb0/0x208)
[    4.360168] [<c012c7fc>] (driver_probe_device+0x0/0x208) from [<c012ca74>] (__driver_attach+0x70/0x94)
[    4.405548]  r6:bf0037f8 r5:c048a4ac r4:c048a478 r3:00000000
[    4.429809] [<c012ca04>] (__driver_attach+0x0/0x94) from [<c012b038>] (bus_for_each_dev+0x54/0x90)
[    4.475708]  r6:bf0037f8 r5:c344be88 r4:00000000 r3:20000013
[    4.500366] [<c012afe4>] (bus_for_each_dev+0x0/0x90) from [<c012c40c>] (driver_attach+0x20/0x28)
[    4.528442]  r7:00000000 r6:c049d5e8 r5:c35be5a0 r4:bf0037f8
[    4.553466] [<c012c3ec>] (driver_attach+0x0/0x28) from [<c012bfc0>] (bus_add_driver+0xd4/0x228)
[    4.581878] [<c012beec>] (bus_add_driver+0x0/0x228) from [<c012cd54>] (driver_register+0xa4/0x134)
[    4.629730]  r8:c000e024 r7:00005c44 r6:00000000 r5:00067f39 r4:bf0037f8
[    4.656738] [<c012ccb0>] (driver_register+0x0/0x134) from [<c012db14>] (platform_driver_register+0x4c/0x60)
[    4.706542] [<c012dac8>] (platform_driver_register+0x0/0x60) from [<bf00604c>] (ohci_hcd_mod_init+0x4c/0x8c [ohci_hcd])
[    4.757843] [<bf006000>] (ohci_hcd_mod_init+0x0/0x8c [ohci_hcd]) from [<c00087ac>] (do_one_initcall+0x9c/0x174)
[    4.808990]  r4:bf003834 r3:c344a000
[    4.832641] [<c0008710>] (do_one_initcall+0x0/0x174) from [<c004c35c>] (sys_init_module+0x7c/0x194)
[    4.881530] [<c004c2e0>] (sys_init_module+0x0/0x194) from [<c000dec0>] (ret_fast_syscall+0x0/0x2c)
[    4.930664]  r7:00000080 r6:00000000 r5:01427ea0 r4:01422192
[    4.956481] Code: bad PC value
[    4.978729] ---[ end trace 58280240f08342c4 ]---
[    5.002258] Kernel panic - not syncing: Fatal exception

Fix this by taking a copy of the data. Also mark Amstrad E3's data with
__initdata to save some memory with multi-board kernels.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
07fd296
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Jan 9, 2013
@peterhurley @gregkh peterhurley + gregkh staging/fwserial: Update TODO file per reviewer comments
Pursuant to this review https://lkml.org/lkml/2012/11/12/500
by Stefan Richter, update the TODO file.
- Clarify purpose of TODO file
- Remove firewire item #4. As discussed in this conversation
  https://lkml.org/lkml/2012/11/13/564 knowing the AR buffer size
  is not a hard requirement. The required rx buffer size can be
  determined experimentally.
- Remove firewire item #5. This was a private note for further
  experimentation.
- Change firewire item #1. Change suggested header from uapi header
  to kernel-only header.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3c8c21e
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Jan 29, 2013
Dmitry Kasatkin + James Morris evm: checking if removexattr is not a NULL
The following lines of code produce a kernel oops.

fd = socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0);
fchmod(fd, 0666);

[  139.922364] BUG: unable to handle kernel NULL pointer dereference at   (null)
[  139.924982] IP: [<  (null)>]   (null)
[  139.924982] *pde = 00000000
[  139.924982] Oops: 0000 [#5] SMP
[  139.924982] Modules linked in: fuse dm_crypt dm_mod i2c_piix4 serio_raw evdev binfmt_misc button
[  139.924982] Pid: 3070, comm: acpid Tainted: G      D      3.8.0-rc2-kds+ #465 Bochs Bochs
[  139.924982] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
[  139.924982] EIP is at 0x0
[  139.924982] EAX: cf5ef000 EBX: cf5ef000 ECX: c143d600 EDX: c15225f2
[  139.924982] ESI: cf4d2a1c EDI: cf4d2a1c EBP: cc02df10 ESP: cc02dee4
[  139.924982]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  139.924982] CR0: 80050033 CR2: 00000000 CR3: 0c059000 CR4: 000006d0
[  139.924982] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  139.924982] DR6: ffff0ff0 DR7: 00000400
[  139.924982] Process acpid (pid: 3070, ti=cc02c000 task=d7705340 task.ti=cc02c000)
[  139.924982] Stack:
[  139.924982]  c1203c88 00000000 cc02def4 cf4d2a1c ae21eefa 471b60d5 1083c1ba c26a5940
[  139.924982]  e891fb5e 00000041 00000004 cc02df1c c1203964 00000000 cc02df4c c10e20c3
[  139.924982]  00000002 00000000 00000000 22222222 c1ff2222 cf5ef000 00000000 d76efb08
[  139.924982] Call Trace:
[  139.924982]  [<c1203c88>] ? evm_update_evmxattr+0x5b/0x62
[  139.924982]  [<c1203964>] evm_inode_post_setattr+0x22/0x26
[  139.924982]  [<c10e20c3>] notify_change+0x25f/0x281
[  139.924982]  [<c10cbf56>] chmod_common+0x59/0x76
[  139.924982]  [<c10e27a1>] ? put_unused_fd+0x33/0x33
[  139.924982]  [<c10cca09>] sys_fchmod+0x39/0x5c
[  139.924982]  [<c13f4f30>] syscall_call+0x7/0xb
[  139.924982] Code:  Bad EIP value.

This happens because sockets do not define the removexattr operation.
Before removing the xattr, verify the removexattr function pointer is
not NULL.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
a67adb9
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Feb 15, 2013
Jaegeuk Kim f2fs: avoid balanc_fs during evict_inode
1. Background

Previously, if f2fs tries to move data blocks of an *evicting* inode during the
cleaning process, it stops the process incompletely and then restarts the whole
process, since it needs a locked inode to grab victim data pages in its address
space. In order to get a locked inode, iget_locked() by f2fs_iget() is normally
used, but, it waits if the inode is on freeing.

So, here is a deadlock scenario.
1. f2fs_evict_inode()       <- inode "A"
  2. f2fs_balance_fs()
    3. f2fs_gc()
      4. gc_data_segment()
        5. f2fs_iget()      <- inode "A" too!

If step #1 and #5 treat a same inode "A", step #5 would fall into deadlock since
the inode "A" is on freeing. In order to resolve this, f2fs_iget_nowait() which
skips __wait_on_freeing_inode() was introduced in step #5, and stops f2fs_gc()
to complete f2fs_evict_inode().

1. f2fs_evict_inode()           <- inode "A"
  2. f2fs_balance_fs()
    3. f2fs_gc()
      4. gc_data_segment()
        5. f2fs_iget_nowait()   <- inode "A", then stop f2fs_gc() w/ -ENOENT

2. Problem and Solution

In the above scenario, however, f2fs cannot finish f2fs_evict_inode() only if:
 o there are not enough free sections, and
 o f2fs_gc() tries to move data blocks of the *evicting* inode repeatedly.

So, the final solution is to use f2fs_iget() and remove f2fs_balance_fs() in
f2fs_evict_inode().
The f2fs_evict_inode() actually truncates all the data and node blocks, which
means that it doesn't produce any dirty node pages accordingly.
So, we don't need to do f2fs_balance_fs() in practical.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
d4686d5
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Feb 22, 2013
@vineetgarc vineetgarc ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core
For now this will suffice for all platforms, later exotic ones needs to
get this from DeviceTree

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
decae9d
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Mar 6, 2013
@torvalds torvalds Merge tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/k…
…ernel/git/vgupta/arc

Pull new ARC architecture from Vineet Gupta:
 "Initial ARC Linux port with some fixes on top for 3.9-rc1:

  I would like to introduce the Linux port to ARC Processors (from
  Synopsys) for 3.9-rc1.  The patch-set has been discussed on the public
  lists since Nov and has received a fair bit of review, specially from
  Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb...

  The arch bits are in arch/arc, some asm-generic changes (acked by
  Arnd), a minor change to PARISC (acked by Helge).

  The series is a touch bigger for a new port for 2 main reasons:

   1. It enables a basic kernel in first sub-series and adds
      ptrace/kgdb/.. later

   2. Some of the fallout of review (DeviceTree support, multi-platform-
      image support) were added on top of orig series, primarily to
      record the revision history.

  This updated pull request additionally contains

   - fixes due to our GNU tools catching up with the new syscall/ptrace
     ABI

   - some (minor) cross-arch Kconfig updates."

* tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits)
  ARC: split elf.h into uapi and export it for userspace
  ARC: Fixup the current ABI version
  ARC: gdbserver using regset interface possibly broken
  ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window
  ARC: make a copy of flat DT
  ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed"
  ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled
  ARC: Fix pt_orig_r8 access
  ARC: [3.9] Fallout of hlist iterator update
  ARC: 64bit RTSC timestamp hardware issue
  ARC: Don't fiddle with non-existent caches
  ARC: Add self to MAINTAINERS
  ARC: Provide a default serial.h for uart drivers needing BASE_BAUD
  ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux
  ARC: [Review] Multi-platform image #8: platform registers SMP callbacks
  ARC: [Review] Multi-platform image #7: SMP common code to use callbacks
  ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional
  ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core
  ARC: [Review] Multi-platform image #4: Isolate platform headers
  ARC: [Review] Multi-platform image #3: switch to board callback
  ...
e23b622
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Mar 14, 2013
@kghost @davem330 kghost + davem330 vxlan: fix oops when delete netns containing vxlan
The following script will produce a kernel oops:

    sudo ip netns add v
    sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo
    sudo ip netns exec v ip link set lo up
    sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo
    sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo
    sudo ip netns exec v ip link set vxlan0 up
    sudo ip netns del v

where inspect by gdb:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 107]
    0xffffffffa0289e33 in ?? ()
    (gdb) bt
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    #1  vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
    #2  0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
    #3  0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
    #4  0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
    #5  0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
    #6  0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
    #7  0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
    #8  0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
    #9  0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
    #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
    #11 <signal handler called>
    #12 0x0000000000000000 in ?? ()
    #13 0x0000000000000000 in ?? ()
    (gdb) fr 0
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    533		struct sock *sk = vn->sock->sk;
    (gdb) l
    528	static int vxlan_leave_group(struct net_device *dev)
    529	{
    530		struct vxlan_dev *vxlan = netdev_priv(dev);
    531		struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
    532		int err = 0;
    533		struct sock *sk = vn->sock->sk;
    534		struct ip_mreqn mreq = {
    535			.imr_multiaddr.s_addr	= vxlan->gaddr,
    536			.imr_ifindex		= vxlan->link,
    537		};
    (gdb) p vn->sock
    $4 = (struct socket *) 0x0

The kernel calls `vxlan_exit_net` when deleting the netns before shutting down
vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock`
is already gone causes the oops. so we should manually shutdown all interfaces
before deleting `vn->sock` as the patch does.

Signed-off-by: Zang MingJie <zealot0630@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9cb6cb7
@popcornmix popcornmix pushed a commit that referenced this issue Mar 28, 2013
@kghost @gregkh kghost + gregkh vxlan: fix oops when delete netns containing vxlan
[ Upstream commit 9cb6cb7 ]

The following script will produce a kernel oops:

    sudo ip netns add v
    sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo
    sudo ip netns exec v ip link set lo up
    sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo
    sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo
    sudo ip netns exec v ip link set vxlan0 up
    sudo ip netns del v

where inspect by gdb:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 107]
    0xffffffffa0289e33 in ?? ()
    (gdb) bt
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    #1  vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
    #2  0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
    #3  0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
    #4  0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
    #5  0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
    #6  0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
    #7  0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
    #8  0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
    #9  0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
    #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
    #11 <signal handler called>
    #12 0x0000000000000000 in ?? ()
    #13 0x0000000000000000 in ?? ()
    (gdb) fr 0
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    533		struct sock *sk = vn->sock->sk;
    (gdb) l
    528	static int vxlan_leave_group(struct net_device *dev)
    529	{
    530		struct vxlan_dev *vxlan = netdev_priv(dev);
    531		struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
    532		int err = 0;
    533		struct sock *sk = vn->sock->sk;
    534		struct ip_mreqn mreq = {
    535			.imr_multiaddr.s_addr	= vxlan->gaddr,
    536			.imr_ifindex		= vxlan->link,
    537		};
    (gdb) p vn->sock
    $4 = (struct socket *) 0x0

The kernel calls `vxlan_exit_net` when deleting the netns before shutting down
vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock`
is already gone causes the oops. so we should manually shutdown all interfaces
before deleting `vn->sock` as the patch does.

Signed-off-by: Zang MingJie <zealot0630@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
d711a39
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Apr 4, 2013
@torvalds Vladimir Davydov + torvalds mqueue: sys_mq_open: do not call mnt_drop_write() if read-only
mnt_drop_write() must be called only if mnt_want_write() succeeded,
otherwise the mnt_writers counter will diverge.

mnt_writers counters are used to check if remounting FS as read-only is
OK, so after an extra mnt_drop_write() call, it would be impossible to
remount mqueue FS as read-only.  Besides, on umount a warning would be
printed like this one:

  =====================================
  [ BUG: bad unlock balance detected! ]
  3.9.0-rc3 #5 Not tainted
  -------------------------------------
  a.out/12486 is trying to release lock (sb_writers) at:
  mnt_drop_write+0x1f/0x30
  but there are no more locks to release!

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
38d78e5
@popcornmix popcornmix pushed a commit that referenced this issue Apr 11, 2013
@gregkh Vladimir Davydov + gregkh mqueue: sys_mq_open: do not call mnt_drop_write() if read-only
commit 38d78e5 upstream.

mnt_drop_write() must be called only if mnt_want_write() succeeded,
otherwise the mnt_writers counter will diverge.

mnt_writers counters are used to check if remounting FS as read-only is
OK, so after an extra mnt_drop_write() call, it would be impossible to
remount mqueue FS as read-only.  Besides, on umount a warning would be
printed like this one:

  =====================================
  [ BUG: bad unlock balance detected! ]
  3.9.0-rc3 #5 Not tainted
  -------------------------------------
  a.out/12486 is trying to release lock (sb_writers) at:
  mnt_drop_write+0x1f/0x30
  but there are no more locks to release!

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6ad6c40
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue May 8, 2013
@tytso Dr. Tilmann Bubeck + tytso ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT
Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
associated attributes (like i_blocks, i_size, i_flags, ...) from the
specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
typically used to store a boot loader in a secure part of the
filesystem, where it can't be changed by a normal user by accident.
The data blocks of the previous boot loader will be associated with
the given inode.

This usercode program is a simple example of the usage:

int main(int argc, char *argv[])
{
  int fd;
  int err;

  if ( argc != 2 ) {
    printf("usage: ext4-swap-boot-inode FILE-TO-SWAP\n");
    exit(1);
  }

  fd = open(argv[1], O_WRONLY);
  if ( fd < 0 ) {
    perror("open");
    exit(1);
  }

  err = ioctl(fd, EXT4_IOC_SWAP_BOOT);
  if ( err < 0 ) {
    perror("ioctl");
    exit(1);
  }

  close(fd);
  exit(0);
}

[ Modified by Theodore Ts'o to fix a number of bugs in the original code.]

Signed-off-by: Dr. Tilmann Bubeck <t.bubeck@reinform.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
393d1d1
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue May 8, 2013
@rustyrussell rustyrussell module: don't unlink the module until we've removed all exposure.
Otherwise we get a race between unload and reload of the same module:
the new module doesn't see the old one in the list, but then fails because
it can't register over the still-extant entries in sysfs:

 [  103.981925] ------------[ cut here ]------------
 [  103.986902] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xab/0xd0()
 [  103.993606] Hardware name: CrownBay Platform
 [  103.998075] sysfs: cannot create duplicate filename '/module/pch_gbe'
 [  104.004784] Modules linked in: pch_gbe(+) [last unloaded: pch_gbe]
 [  104.011362] Pid: 3021, comm: modprobe Tainted: G        W    3.9.0-rc5+ #5
 [  104.018662] Call Trace:
 [  104.021286]  [<c103599d>] warn_slowpath_common+0x6d/0xa0
 [  104.026933]  [<c1168c8b>] ? sysfs_add_one+0xab/0xd0
 [  104.031986]  [<c1168c8b>] ? sysfs_add_one+0xab/0xd0
 [  104.037000]  [<c1035a4e>] warn_slowpath_fmt+0x2e/0x30
 [  104.042188]  [<c1168c8b>] sysfs_add_one+0xab/0xd0
 [  104.046982]  [<c1168dbe>] create_dir+0x5e/0xa0
 [  104.051633]  [<c1168e78>] sysfs_create_dir+0x78/0xd0
 [  104.056774]  [<c1262bc3>] kobject_add_internal+0x83/0x1f0
 [  104.062351]  [<c126daf6>] ? kvasprintf+0x46/0x60
 [  104.067231]  [<c1262ebd>] kobject_add_varg+0x2d/0x50
 [  104.072450]  [<c1262f07>] kobject_init_and_add+0x27/0x30
 [  104.078075]  [<c1089240>] mod_sysfs_setup+0x80/0x540
 [  104.083207]  [<c1260851>] ? module_bug_finalize+0x51/0xc0
 [  104.088720]  [<c108ab29>] load_module+0x1429/0x18b0

We can teardown sysfs first, then to be sure, put the state in
MODULE_STATE_UNFORMED so it's ignored while we deconstruct it.

Reported-by: Veaceslav Falico <vfalico@redhat.com>
Tested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
944a1fa
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue May 8, 2013
@HSchill @davem330 HSchill + davem330 ipvs: ip_vs_sip_fill_param() BUG: bad check of return value
The reason for this patch is crash in kmemdup
caused by returning from get_callid with uniialized
matchoff and matchlen.

Removing Zero check of matchlen since it's done by ct_sip_get_header()

BUG: unable to handle kernel paging request at ffff880457b5763f
IP: [<ffffffff810df7fc>] kmemdup+0x2e/0x35
PGD 27f6067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: xt_state xt_helper nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle xt_connmark xt_conntrack ip6_tables nf_conntrack_ftp ip_vs_ftp nf_nat xt_tcpudp iptable_mangle xt_mark ip_tables x_tables ip_vs_rr ip_vs_lblcr ip_vs_pe_sip ip_vs nf_conntrack_sip nf_conntrack bonding igb i2c_algo_bit i2c_core
CPU 5
Pid: 0, comm: swapper/5 Not tainted 3.9.0-rc5+ #5                  /S1200KP
RIP: 0010:[<ffffffff810df7fc>]  [<ffffffff810df7fc>] kmemdup+0x2e/0x35
RSP: 0018:ffff8803fea03648  EFLAGS: 00010282
RAX: ffff8803d61063e0 RBX: 0000000000000003 RCX: 0000000000000003
RDX: 0000000000000003 RSI: ffff880457b5763f RDI: ffff8803d61063e0
RBP: ffff8803fea03658 R08: 0000000000000008 R09: 0000000000000011
R10: 0000000000000011 R11: 00ffffffff81a8a3 R12: ffff880457b5763f
R13: ffff8803d67f786a R14: ffff8803fea03730 R15: ffffffffa0098e90
FS:  0000000000000000(0000) GS:ffff8803fea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff880457b5763f CR3: 0000000001a0c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/5 (pid: 0, threadinfo ffff8803ee18c000, task ffff8803ee18a480)
Stack:
 ffff8803d822a080 000000000000001c ffff8803fea036c8 ffffffffa000937a
 ffffffff81f0d8a0 000000038135fdd5 ffff880300000014 ffff880300110000
 ffffffff150118ac ffff8803d7e8a000 ffff88031e0118ac 0000000000000000
Call Trace:
 <IRQ>

 [<ffffffffa000937a>] ip_vs_sip_fill_param+0x13a/0x187 [ip_vs_pe_sip]
 [<ffffffffa007b209>] ip_vs_sched_persist+0x2c6/0x9c3 [ip_vs]
 [<ffffffff8107dc53>] ? __lock_acquire+0x677/0x1697
 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d
 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d
 [<ffffffff810649bc>] ? sched_clock_cpu+0x43/0xcf
 [<ffffffffa007bb1e>] ip_vs_schedule+0x181/0x4ba [ip_vs]
...

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
f7a1dd6
@popcornmix popcornmix pushed a commit that referenced this issue May 12, 2013
@HSchill @gregkh HSchill + gregkh ipvs: ip_vs_sip_fill_param() BUG: bad check of return value
commit f7a1dd6 upstream.

The reason for this patch is crash in kmemdup
caused by returning from get_callid with uniialized
matchoff and matchlen.

Removing Zero check of matchlen since it's done by ct_sip_get_header()

BUG: unable to handle kernel paging request at ffff880457b5763f
IP: [<ffffffff810df7fc>] kmemdup+0x2e/0x35
PGD 27f6067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: xt_state xt_helper nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle xt_connmark xt_conntrack ip6_tables nf_conntrack_ftp ip_vs_ftp nf_nat xt_tcpudp iptable_mangle xt_mark ip_tables x_tables ip_vs_rr ip_vs_lblcr ip_vs_pe_sip ip_vs nf_conntrack_sip nf_conntrack bonding igb i2c_algo_bit i2c_core
CPU 5
Pid: 0, comm: swapper/5 Not tainted 3.9.0-rc5+ #5                  /S1200KP
RIP: 0010:[<ffffffff810df7fc>]  [<ffffffff810df7fc>] kmemdup+0x2e/0x35
RSP: 0018:ffff8803fea03648  EFLAGS: 00010282
RAX: ffff8803d61063e0 RBX: 0000000000000003 RCX: 0000000000000003
RDX: 0000000000000003 RSI: ffff880457b5763f RDI: ffff8803d61063e0
RBP: ffff8803fea03658 R08: 0000000000000008 R09: 0000000000000011
R10: 0000000000000011 R11: 00ffffffff81a8a3 R12: ffff880457b5763f
R13: ffff8803d67f786a R14: ffff8803fea03730 R15: ffffffffa0098e90
FS:  0000000000000000(0000) GS:ffff8803fea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff880457b5763f CR3: 0000000001a0c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/5 (pid: 0, threadinfo ffff8803ee18c000, task ffff8803ee18a480)
Stack:
 ffff8803d822a080 000000000000001c ffff8803fea036c8 ffffffffa000937a
 ffffffff81f0d8a0 000000038135fdd5 ffff880300000014 ffff880300110000
 ffffffff150118ac ffff8803d7e8a000 ffff88031e0118ac 0000000000000000
Call Trace:
 <IRQ>

 [<ffffffffa000937a>] ip_vs_sip_fill_param+0x13a/0x187 [ip_vs_pe_sip]
 [<ffffffffa007b209>] ip_vs_sched_persist+0x2c6/0x9c3 [ip_vs]
 [<ffffffff8107dc53>] ? __lock_acquire+0x677/0x1697
 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d
 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d
 [<ffffffff810649bc>] ? sched_clock_cpu+0x43/0xcf
 [<ffffffffa007bb1e>] ip_vs_schedule+0x181/0x4ba [ip_vs]
...

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
023477d
@popcornmix popcornmix pushed a commit that referenced this issue May 12, 2013
@HSchill @gregkh HSchill + gregkh ipvs: ip_vs_sip_fill_param() BUG: bad check of return value
commit f7a1dd6 upstream.

The reason for this patch is crash in kmemdup
caused by returning from get_callid with uniialized
matchoff and matchlen.

Removing Zero check of matchlen since it's done by ct_sip_get_header()

BUG: unable to handle kernel paging request at ffff880457b5763f
IP: [<ffffffff810df7fc>] kmemdup+0x2e/0x35
PGD 27f6067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: xt_state xt_helper nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle xt_connmark xt_conntrack ip6_tables nf_conntrack_ftp ip_vs_ftp nf_nat xt_tcpudp iptable_mangle xt_mark ip_tables x_tables ip_vs_rr ip_vs_lblcr ip_vs_pe_sip ip_vs nf_conntrack_sip nf_conntrack bonding igb i2c_algo_bit i2c_core
CPU 5
Pid: 0, comm: swapper/5 Not tainted 3.9.0-rc5+ #5                  /S1200KP
RIP: 0010:[<ffffffff810df7fc>]  [<ffffffff810df7fc>] kmemdup+0x2e/0x35
RSP: 0018:ffff8803fea03648  EFLAGS: 00010282
RAX: ffff8803d61063e0 RBX: 0000000000000003 RCX: 0000000000000003
RDX: 0000000000000003 RSI: ffff880457b5763f RDI: ffff8803d61063e0
RBP: ffff8803fea03658 R08: 0000000000000008 R09: 0000000000000011
R10: 0000000000000011 R11: 00ffffffff81a8a3 R12: ffff880457b5763f
R13: ffff8803d67f786a R14: ffff8803fea03730 R15: ffffffffa0098e90
FS:  0000000000000000(0000) GS:ffff8803fea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff880457b5763f CR3: 0000000001a0c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/5 (pid: 0, threadinfo ffff8803ee18c000, task ffff8803ee18a480)
Stack:
 ffff8803d822a080 000000000000001c ffff8803fea036c8 ffffffffa000937a
 ffffffff81f0d8a0 000000038135fdd5 ffff880300000014 ffff880300110000
 ffffffff150118ac ffff8803d7e8a000 ffff88031e0118ac 0000000000000000
Call Trace:
 <IRQ>

 [<ffffffffa000937a>] ip_vs_sip_fill_param+0x13a/0x187 [ip_vs_pe_sip]
 [<ffffffffa007b209>] ip_vs_sched_persist+0x2c6/0x9c3 [ip_vs]
 [<ffffffff8107dc53>] ? __lock_acquire+0x677/0x1697
 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d
 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d
 [<ffffffff810649bc>] ? sched_clock_cpu+0x43/0xcf
 [<ffffffffa007bb1e>] ip_vs_schedule+0x181/0x4ba [ip_vs]
...

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
42b5b66
@popcornmix popcornmix pushed a commit that referenced this issue May 20, 2013
@gregkh Konrad Rzeszutek Wilk + gregkh x86/microcode: Add local mutex to fix physical CPU hot-add deadlock
commit 074d72f upstream.

This can easily be triggered if a new CPU is added (via
ACPI hotplug mechanism) and from user-space you do:

   echo 1 > /sys/devices/system/cpu/cpu3/online

(or wait for UDEV to do it) on a newly appeared physical CPU.

The deadlock is that the "store_online" in drivers/base/cpu.c
takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up".
"cpu_up" eventually ends up calling "save_mc_for_early"
which also takes the cpu_hotplug_driver_lock() lock.

And here is that lockdep thinks of it:

 smpboot: Stack at about ffff880075c39f44
 smpboot: CPU3: has booted.
 microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25

 =============================================
 [ INFO: possible recursive locking detected ]
 3.9.0upstream-10129-g167af0e #1 Not tainted
 ---------------------------------------------
 sh/2487 is trying to acquire lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 but task is already holding lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(x86_cpu_hotplug_driver_mutex);
   lock(x86_cpu_hotplug_driver_mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by sh/2487:
  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190
  #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160
  #2:  (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160
  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20
  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60

Suggested-and-Acked-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: fenghua.yu@intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1368029583-23337-1-git-send-email-konrad.wilk@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
903bded
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue May 30, 2013
Konrad Rzeszutek Wilk + Ingo Molnar x86/microcode: Add local mutex to fix physical CPU hot-add deadlock
This can easily be triggered if a new CPU is added (via
ACPI hotplug mechanism) and from user-space you do:

   echo 1 > /sys/devices/system/cpu/cpu3/online

(or wait for UDEV to do it) on a newly appeared physical CPU.

The deadlock is that the "store_online" in drivers/base/cpu.c
takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up".
"cpu_up" eventually ends up calling "save_mc_for_early"
which also takes the cpu_hotplug_driver_lock() lock.

And here is that lockdep thinks of it:

 smpboot: Stack at about ffff880075c39f44
 smpboot: CPU3: has booted.
 microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25

 =============================================
 [ INFO: possible recursive locking detected ]
 3.9.0upstream-10129-g167af0e #1 Not tainted
 ---------------------------------------------
 sh/2487 is trying to acquire lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 but task is already holding lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(x86_cpu_hotplug_driver_mutex);
   lock(x86_cpu_hotplug_driver_mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by sh/2487:
  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190
  #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160
  #2:  (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160
  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20
  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60

Suggested-and-Acked-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: fenghua.yu@intel.com
Cc: xen-devel@lists.xensource.com
Cc: stable@vger.kernel.org # for v3.9
Link: http://lkml.kernel.org/r/1368029583-23337-1-git-send-email-konrad.wilk@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
074d72f
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue May 30, 2013
@linvjw Sujith Manoharan + linvjw ath9k: Fix crash on module unload
Make sure that any open relayfs files are closed before
unregistering with mac80211, otherwise this crash is seen:

[ 1331.097846] BUG: unable to handle kernel paging request at 6b6b6b8b
[ 1331.098170] IP: [<c063d0d6>] debugfs_remove+0x26/0x80
[ 1331.098170] *pdpt = 000000002f9aa001 *pde = 0000000000000000
[ 1331.098170] Oops: 0000 [#1] PREEMPT SMP
[ 1331.098170] Modules linked in: iptable_raw xt_CT nf_conntrack_ipv4 nf_defrag]
[ 1331.098170] Pid: 4794, comm: rmmod Tainted: G        WC   3.9.1+ #5 To Be Fi.
[ 1331.098170] EIP: 0060:[<c063d0d6>] EFLAGS: 00010202 CPU: 0
[ 1331.098170] EIP is at debugfs_remove+0x26/0x80
[ 1331.098170] EAX: f2f3acd0 EBX: f2f3acd0 ECX: 00000006 EDX: f8622348
[ 1331.098170] ESI: 6b6b6b6b EDI: 00000001 EBP: ee251e14 ESP: ee251e0c
[ 1331.098170]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 1331.098170] CR0: 8005003b CR2: 6b6b6b8b CR3: 2e7b7000 CR4: 000007e0
[ 1331.098170] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 1331.098170] DR6: ffff0ff0 DR7: 00000400
[ 1331.098170] Process rmmod (pid: 4794, ti=ee250000 task=efaa2560 task.ti=ee25)
[ 1331.098170] Stack:
[ 1331.098170]  f241e170 0000000a ee251e1c f861394d ee251e28 c04e3088 f241e170 4
[ 1331.098170]  c04e30fe f45482b0 ee251e54 c04e3187 f25e86b0 ee251e54 f8618748 0
[ 1331.098170]  0000000a 00000001 ee251e68 f860065b f2509e20 f25085a0 f5b6e8a4 8
[ 1331.098170] Call Trace:
[ 1331.098170]  [<f861394d>] remove_buf_file_handler+0xd/0x20 [ath9k]
[ 1331.098170]  [<c04e3088>] relay_remove_buf+0x18/0x30
[ 1331.098170]  [<c04e30fe>] relay_close_buf+0x2e/0x40
[ 1331.098170]  [<c04e3187>] relay_close+0x77/0xf0
[ 1331.098170]  [<f8618748>] ? dpd_exit+0x38/0x40 [ath9k]
[ 1331.098170]  [<f860065b>] ath9k_deinit_softc+0x8b/0xa0 [ath9k]
[ 1331.098170]  [<f86006b8>] ath9k_deinit_device+0x48/0x60 [ath9k]
[ 1331.098170]  [<f86107f1>] ath_pci_remove+0x31/0x50 [ath9k]
[ 1331.098170]  [<c06dbff8>] pci_device_remove+0x38/0xc0
[ 1331.098170]  [<c079daa4>] __device_release_driver+0x64/0xc0
[ 1331.098170]  [<c079db97>] driver_detach+0x97/0xa0
[ 1331.098170]  [<c079cacc>] bus_remove_driver+0x6c/0xe0
[ 1331.098170]  [<c079c197>] ? bus_put+0x17/0x20
[ 1331.098170]  [<c079cae3>] ? bus_remove_driver+0x83/0xe0
[ 1331.098170]  [<c079e709>] driver_unregister+0x49/0x80
[ 1331.098170]  [<c06dc138>] pci_unregister_driver+0x18/0x80
[ 1331.098170]  [<f8610602>] ath_pci_exit+0x12/0x20 [ath9k]
[ 1331.098170]  [<f8619ce0>] ath9k_exit+0x17/0x337 [ath9k]
[ 1331.098170]  [<c09e537d>] ? mutex_unlock+0xd/0x10
[ 1331.098170]  [<c04bd36c>] sys_delete_module+0x17c/0x250
[ 1331.098170]  [<c0540dc4>] ? do_munmap+0x244/0x2d0
[ 1331.098170]  [<c0540e96>] ? vm_munmap+0x46/0x60
[ 1331.098170]  [<c09e8dc4>] ? restore_all+0xf/0xf
[ 1331.098170]  [<c09ebf50>] ? __do_page_fault+0x4c0/0x4c0
[ 1331.098170]  [<c04b18e4>] ? trace_hardirqs_on_caller+0xf4/0x180
[ 1331.098170]  [<c09ef28d>] sysenter_do_call+0x12/0x38
[ 1331.098170] Code: 90 8d 74 26 00 55 89 e5 83 ec 08 89 1c 24 89 74 24 04 3e 82
[ 1331.098170] EIP: [<c063d0d6>] debugfs_remove+0x26/0x80 SS:ESP 0068:ee251e0c
[ 1331.098170] CR2: 000000006b6b6b8b
[ 1331.727971] ---[ end trace b5bb9f2066cef7f9 ]---

Cc: <stable@vger.kernel.org>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
af69009
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue May 30, 2013
@davem330 Eric Dumazet + davem330 ip_tunnel: fix kernel panic with icmp_dest_unreach
Daniel Petre reported crashes in icmp_dst_unreach() with following call
graph:

#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596

Daniel found a similar problem mentioned in
 http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html

And indeed this is the root cause : skb->cb[] contains data fooling IP
stack.

We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure()
is called. Or else skb->cb[] might contain garbage from GSO segmentation
layer.

A similar fix was tested on linux-3.9, but gre code was refactored in
linux-3.10. I'll send patches for stable kernels as well.

Many thanks to Daniel for providing reports, patches and testing !

Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a622260
@popcornmix popcornmix pushed a commit that referenced this issue Jun 8, 2013
@gregkh Sujith Manoharan + gregkh ath9k: Fix crash on module unload
commit af69009 upstream.

Make sure that any open relayfs files are closed before
unregistering with mac80211, otherwise this crash is seen:

[ 1331.097846] BUG: unable to handle kernel paging request at 6b6b6b8b
[ 1331.098170] IP: [<c063d0d6>] debugfs_remove+0x26/0x80
[ 1331.098170] *pdpt = 000000002f9aa001 *pde = 0000000000000000
[ 1331.098170] Oops: 0000 [#1] PREEMPT SMP
[ 1331.098170] Modules linked in: iptable_raw xt_CT nf_conntrack_ipv4 nf_defrag]
[ 1331.098170] Pid: 4794, comm: rmmod Tainted: G        WC   3.9.1+ #5 To Be Fi.
[ 1331.098170] EIP: 0060:[<c063d0d6>] EFLAGS: 00010202 CPU: 0
[ 1331.098170] EIP is at debugfs_remove+0x26/0x80
[ 1331.098170] EAX: f2f3acd0 EBX: f2f3acd0 ECX: 00000006 EDX: f8622348
[ 1331.098170] ESI: 6b6b6b6b EDI: 00000001 EBP: ee251e14 ESP: ee251e0c
[ 1331.098170]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 1331.098170] CR0: 8005003b CR2: 6b6b6b8b CR3: 2e7b7000 CR4: 000007e0
[ 1331.098170] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 1331.098170] DR6: ffff0ff0 DR7: 00000400
[ 1331.098170] Process rmmod (pid: 4794, ti=ee250000 task=efaa2560 task.ti=ee25)
[ 1331.098170] Stack:
[ 1331.098170]  f241e170 0000000a ee251e1c f861394d ee251e28 c04e3088 f241e170 4
[ 1331.098170]  c04e30fe f45482b0 ee251e54 c04e3187 f25e86b0 ee251e54 f8618748 0
[ 1331.098170]  0000000a 00000001 ee251e68 f860065b f2509e20 f25085a0 f5b6e8a4 8
[ 1331.098170] Call Trace:
[ 1331.098170]  [<f861394d>] remove_buf_file_handler+0xd/0x20 [ath9k]
[ 1331.098170]  [<c04e3088>] relay_remove_buf+0x18/0x30
[ 1331.098170]  [<c04e30fe>] relay_close_buf+0x2e/0x40
[ 1331.098170]  [<c04e3187>] relay_close+0x77/0xf0
[ 1331.098170]  [<f8618748>] ? dpd_exit+0x38/0x40 [ath9k]
[ 1331.098170]  [<f860065b>] ath9k_deinit_softc+0x8b/0xa0 [ath9k]
[ 1331.098170]  [<f86006b8>] ath9k_deinit_device+0x48/0x60 [ath9k]
[ 1331.098170]  [<f86107f1>] ath_pci_remove+0x31/0x50 [ath9k]
[ 1331.098170]  [<c06dbff8>] pci_device_remove+0x38/0xc0
[ 1331.098170]  [<c079daa4>] __device_release_driver+0x64/0xc0
[ 1331.098170]  [<c079db97>] driver_detach+0x97/0xa0
[ 1331.098170]  [<c079cacc>] bus_remove_driver+0x6c/0xe0
[ 1331.098170]  [<c079c197>] ? bus_put+0x17/0x20
[ 1331.098170]  [<c079cae3>] ? bus_remove_driver+0x83/0xe0
[ 1331.098170]  [<c079e709>] driver_unregister+0x49/0x80
[ 1331.098170]  [<c06dc138>] pci_unregister_driver+0x18/0x80
[ 1331.098170]  [<f8610602>] ath_pci_exit+0x12/0x20 [ath9k]
[ 1331.098170]  [<f8619ce0>] ath9k_exit+0x17/0x337 [ath9k]
[ 1331.098170]  [<c09e537d>] ? mutex_unlock+0xd/0x10
[ 1331.098170]  [<c04bd36c>] sys_delete_module+0x17c/0x250
[ 1331.098170]  [<c0540dc4>] ? do_munmap+0x244/0x2d0
[ 1331.098170]  [<c0540e96>] ? vm_munmap+0x46/0x60
[ 1331.098170]  [<c09e8dc4>] ? restore_all+0xf/0xf
[ 1331.098170]  [<c09ebf50>] ? __do_page_fault+0x4c0/0x4c0
[ 1331.098170]  [<c04b18e4>] ? trace_hardirqs_on_caller+0xf4/0x180
[ 1331.098170]  [<c09ef28d>] sysenter_do_call+0x12/0x38
[ 1331.098170] Code: 90 8d 74 26 00 55 89 e5 83 ec 08 89 1c 24 89 74 24 04 3e 82
[ 1331.098170] EIP: [<c063d0d6>] debugfs_remove+0x26/0x80 SS:ESP 0068:ee251e0c
[ 1331.098170] CR2: 000000006b6b6b8b
[ 1331.727971] ---[ end trace b5bb9f2066cef7f9 ]---

Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
d39b7e1
@popcornmix popcornmix pushed a commit that referenced this issue Jun 8, 2013
@rustyrussell @gregkh rustyrussell + gregkh module: don't unlink the module until we've removed all exposure.
commit 944a1fa upstream.

Otherwise we get a race between unload and reload of the same module:
the new module doesn't see the old one in the list, but then fails because
it can't register over the still-extant entries in sysfs:

 [  103.981925] ------------[ cut here ]------------
 [  103.986902] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xab/0xd0()
 [  103.993606] Hardware name: CrownBay Platform
 [  103.998075] sysfs: cannot create duplicate filename '/module/pch_gbe'
 [  104.004784] Modules linked in: pch_gbe(+) [last unloaded: pch_gbe]
 [  104.011362] Pid: 3021, comm: modprobe Tainted: G        W    3.9.0-rc5+ #5
 [  104.018662] Call Trace:
 [  104.021286]  [<c103599d>] warn_slowpath_common+0x6d/0xa0
 [  104.026933]  [<c1168c8b>] ? sysfs_add_one+0xab/0xd0
 [  104.031986]  [<c1168c8b>] ? sysfs_add_one+0xab/0xd0
 [  104.037000]  [<c1035a4e>] warn_slowpath_fmt+0x2e/0x30
 [  104.042188]  [<c1168c8b>] sysfs_add_one+0xab/0xd0
 [  104.046982]  [<c1168dbe>] create_dir+0x5e/0xa0
 [  104.051633]  [<c1168e78>] sysfs_create_dir+0x78/0xd0
 [  104.056774]  [<c1262bc3>] kobject_add_internal+0x83/0x1f0
 [  104.062351]  [<c126daf6>] ? kvasprintf+0x46/0x60
 [  104.067231]  [<c1262ebd>] kobject_add_varg+0x2d/0x50
 [  104.072450]  [<c1262f07>] kobject_init_and_add+0x27/0x30
 [  104.078075]  [<c1089240>] mod_sysfs_setup+0x80/0x540
 [  104.083207]  [<c1260851>] ? module_bug_finalize+0x51/0xc0
 [  104.088720]  [<c108ab29>] load_module+0x1429/0x18b0

We can teardown sysfs first, then to be sure, put the state in
MODULE_STATE_UNFORMED so it's ignored while we deconstruct it.

Reported-by: Veaceslav Falico <vfalico@redhat.com>
Tested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ben Greear <greearb@candelatech.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1532d7f
@popcornmix popcornmix pushed a commit that referenced this issue Jul 3, 2013
@paulmck paulmck rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
In Steven Rostedt's words:

> I've been debugging the last couple of days why my tests have been
> locking up. One of my tracing tests, runs all available tracers. The
> lockup always happened with the mmiotrace, which is used to trace
> interactions between priority drivers and the kernel. But to do this
> easily, when the tracer gets registered, it disables all but the boot
> CPUs. The lockup always happened after it got done disabling the CPUs.
>
> Then I decided to try this:
>
> while :; do
> 	for i in 1 2 3; do
> 		echo 0 > /sys/devices/system/cpu/cpu$i/online
> 	done
> 	for i in 1 2 3; do
> 		echo 1 > /sys/devices/system/cpu/cpu$i/online
> 	done
> done
>
> Well, sure enough, that locked up too, with the same users. Doing a
> sysrq-w (showing all blocked tasks):
>
> [ 2991.344562]   task                        PC stack   pid father
> [ 2991.344562] rcu_preempt     D ffff88007986fdf8     0    10      2 0x00000000
> [ 2991.344562]  ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
> [ 2991.344562]  ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
> [ 2991.344562]  ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81541750>] schedule_timeout+0xbc/0xf9
> [ 2991.344562]  [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f
> [ 2991.344562]  [<ffffffff81049513>] ? cascade+0xa8/0xa8
> [ 2991.344562]  [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20
> [ 2991.344562]  [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b
> [ 2991.344562]  [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50
> [ 2991.344562]  [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64
> [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
> [ 2991.344562]  [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562] kworker/0:1     D ffffffff81a30680     0    47      2 0x00000000
> [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
> [ 2991.344562]  ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
> [ 2991.344562]  ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
> [ 2991.344562]  ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
> [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
> [ 2991.344562]  [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8
> [ 2991.344562]  [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a
> [ 2991.344562]  [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3
> [ 2991.344562]  [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3
> [ 2991.344562]  [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1
> [ 2991.344562]  [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1
> [ 2991.344562]  [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5
> [ 2991.344562]  [<ffffffff81059365>] ? rescuer_thread+0x332/0x332
> [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562] bash            D ffffffff81a4aa80     0  2618   2612 0x10000000
> [ 2991.344562]  ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
> [ 2991.344562]  ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
> [ 2991.344562]  ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
> [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
> [ 2991.344562]  [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53
> [ 2991.344562]  [<ffffffff81548912>] notifier_call_chain+0x6b/0x98
> [ 2991.344562]  [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10
> [ 2991.344562]  [<ffffffff8103cf64>] __cpu_notify+0x20/0x32
> [ 2991.344562]  [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36
> [ 2991.344562]  [<ffffffff815225de>] _cpu_down+0x154/0x259
> [ 2991.344562]  [<ffffffff81522710>] cpu_down+0x2d/0x3a
> [ 2991.344562]  [<ffffffff81526351>] store_online+0x4e/0xe7
> [ 2991.344562]  [<ffffffff8134d764>] dev_attr_store+0x20/0x22
> [ 2991.344562]  [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144
> [ 2991.344562]  [<ffffffff8114c5ef>] vfs_write+0xfd/0x158
> [ 2991.344562]  [<ffffffff8114c928>] SyS_write+0x5c/0x83
> [ 2991.344562]  [<ffffffff8154c494>] tracesys+0xdd/0xe2
>
> As well as held locks:
>
> [ 3034.728033] Showing all locks held in the system:
> [ 3034.728033] 1 lock held by rcu_preempt/10:
> [ 3034.728033]  #0:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b
> [ 3034.728033] 4 locks held by kworker/0:1/47:
> [ 3034.728033]  #0:  (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
> [ 3034.728033]  #1:  (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
> [ 3034.728033]  #2:  (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a
> [ 3034.728033]  #3:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
> [ 3034.728033] 1 lock held by mingetty/2563:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2565:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2569:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2572:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2575:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 7 locks held by bash/2618:
> [ 3034.728033]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c
> [ 3034.728033]  #1:  (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144
> [ 3034.728033]  #2:  (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144
> [ 3034.728033]  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19
> [ 3034.728033]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19
> [ 3034.728033]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d
> [ 3034.728033]  #6:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
> [ 3034.728033] 1 lock held by bash/2980:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
>
> Things looked a little weird. Also, this is a deadlock that lockdep did
> not catch. But what we have here does not look like a circular lock
> issue:
>
> Bash is blocked in rcu_cpu_notify():
>
> 1961		/* Exclude any attempts to start a new grace period. */
> 1962		mutex_lock(&rsp->onoff_mutex);
>
>
> kworker is blocked in get_online_cpus(), which makes sense as we are
> currently taking down a CPU.
>
> But rcu_preempt is not blocked on anything. It is simply sleeping in
> rcu_gp_kthread (really rcu_gp_init) here:
>
> 1453	#ifdef CONFIG_PROVE_RCU_DELAY
> 1454			if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
> 1455			    system_state == SYSTEM_RUNNING)
> 1456				schedule_timeout_uninterruptible(2);
> 1457	#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
>
> And it does this while holding the onoff_mutex that bash is waiting for.
>
> Doing a function trace, it showed me where it happened:
>
> [  125.940066] rcu_pree-10      3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread
> [...]
> [  125.940066] rcu_pree-10      3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
>
> The watchdog ran, and then:
>
> [  125.940066] watchdog-38      3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
>
> Not sure what modprobe was doing, but shortly after that:
>
> [  125.940066] modprobe-2848    3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
>
> Where the migration thread took down the CPU:
>
> [  125.940066] migratio-40      3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
>
> which finally did:
>
> [  125.940066]   <idle>-0       3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry
> [  125.940066]   <idle>-0       3...1 28389282548: native_play_dead <-arch_cpu_idle_dead
> [  125.940066]   <idle>-0       3...1 28389282924: play_dead_common <-native_play_dead
> [  125.940066]   <idle>-0       3...1 28389283468: idle_task_exit <-play_dead_common
> [  125.940066]   <idle>-0       3...1 28389284644: amd_e400_remove_cpu <-play_dead_common
>
>
> CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still
> doing a schedule_timeout_uninterruptible() and it registered it's
> timeout to the timer base for CPU 3. You would think that it would get
> migrated right? The issue here is that the timer migration happens at
> the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
> CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
> held by the thread that just put itself into a uninterruptible sleep,
> that wont wake up until the CPU_DEAD notifier of the timer
> infrastructure is called, which wont happen until the rcu notifier
> finishes. Here's our deadlock!

This commit breaks this deadlock cycle by substituting a shorter udelay()
for the previous schedule_timeout_uninterruptible(), while at the same
time increasing the probability of the delay.  This maintains the intensity
of the testing.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
971394f
@popcornmix popcornmix pushed a commit that referenced this issue Jul 3, 2013
Dave Chinner + Ben Myers xfs: fix implicit padding in directory and attr CRC formats
Michael L. Semon has been testing CRC patches on a 32 bit system and
been seeing assert failures in the directory code from xfs/080.
Thanks to Michael's heroic efforts with printk debugging, we found
that the problem was that the last free space being left in the
directory structure was too small to fit a unused tag structure and
it was being corrupted and attempting to log a region out of bounds.
Hence the assert failure looked something like:

.....
#5 calling xfs_dir2_data_log_unused() 36 32
#1 4092 4095 4096
#2 8182 8183 4096
XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568

Where #1 showed the first region of the dup being logged (i.e. the
last 4 bytes of a directory buffer) and #2 shows the corrupt values
being calculated from the length of the dup entry which overflowed
the size of the buffer.

It turns out that the problem was not in the logging code, nor in
the freespace handling code. It is an initial condition bug that
only shows up on 32 bit systems. When a new buffer is initialised,
where's the freespace that is set up:

[  172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname()
[  172.316346] #9 calling xfs_dir2_data_log_unused()
[  172.316351] #1 calling xfs_trans_log_buf() 60 63 4096
[  172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096

Note the offset of the first region being logged? It's 60 bytes into
the buffer. Once I saw that, I pretty much knew that the bug was
going to be caused by this.

Essentially, all direct entries are rounded to 8 bytes in length,
and all entries start with an 8 byte alignment. This means that we
can decode inplace as variables are naturally aligned. With the
directory data supposedly starting on a 8 byte boundary, and all
entries padded to 8 bytes, the minimum freespace in a directory
block is supposed to be 8 bytes, which is large enough to fit a
unused data entry structure (6 bytes in size). The fact we only have
4 bytes of free space indicates a directory data block alignment
problem.

And what do you know - there's an implicit hole in the directory
data block header for the CRC format, which means the header is 60
byte on 32 bit intel systems and 64 bytes on 64 bit systems. Needs
padding. And while looking at the structures, I found the same
problem in the attr leaf header. Fix them both.

Note that this only affects 32 bit systems with CRCs enabled.
Everything else is just fine. Note that CRC enabled filesystems created
before this fix on such systems will not be readable with this fix
applied.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Debugged-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 8a1fd29)
5170711
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Jul 3, 2013
Dave Chinner + Ben Myers xfs: fix implicit padding in directory and attr CRC formats
Michael L. Semon has been testing CRC patches on a 32 bit system and
been seeing assert failures in the directory code from xfs/080.
Thanks to Michael's heroic efforts with printk debugging, we found
that the problem was that the last free space being left in the
directory structure was too small to fit a unused tag structure and
it was being corrupted and attempting to log a region out of bounds.
Hence the assert failure looked something like:

.....
#5 calling xfs_dir2_data_log_unused() 36 32
#1 4092 4095 4096
#2 8182 8183 4096
XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568

Where #1 showed the first region of the dup being logged (i.e. the
last 4 bytes of a directory buffer) and #2 shows the corrupt values
being calculated from the length of the dup entry which overflowed
the size of the buffer.

It turns out that the problem was not in the logging code, nor in
the freespace handling code. It is an initial condition bug that
only shows up on 32 bit systems. When a new buffer is initialised,
where's the freespace that is set up:

[  172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname()
[  172.316346] #9 calling xfs_dir2_data_log_unused()
[  172.316351] #1 calling xfs_trans_log_buf() 60 63 4096
[  172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096

Note the offset of the first region being logged? It's 60 bytes into
the buffer. Once I saw that, I pretty much knew that the bug was
going to be caused by this.

Essentially, all direct entries are rounded to 8 bytes in length,
and all entries start with an 8 byte alignment. This means that we
can decode inplace as variables are naturally aligned. With the
directory data supposedly starting on a 8 byte boundary, and all
entries padded to 8 bytes, the minimum freespace in a directory
block is supposed to be 8 bytes, which is large enough to fit a
unused data entry structure (6 bytes in size). The fact we only have
4 bytes of free space indicates a directory data block alignment
problem.

And what do you know - there's an implicit hole in the directory
data block header for the CRC format, which means the header is 60
byte on 32 bit intel systems and 64 bytes on 64 bit systems. Needs
padding. And while looking at the structures, I found the same
problem in the attr leaf header. Fix them both.

Note that this only affects 32 bit systems with CRCs enabled.
Everything else is just fine. Note that CRC enabled filesystems created
before this fix on such systems will not be readable with this fix
applied.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Debugged-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
8a1fd29
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Jul 3, 2013
@matteofacchinetti @gregkh matteofacchinetti + gregkh serial/mpc52xx_uart: fix kernel panic when system reboot
This bug appear when a second PSC based driver appends an interrupt
routine to the FIFO controller shared interrupt (like spi-mpc512x-psc).
When reboot, uart_shutdown() remove the serial console interrupt handler
while spi-mpc512x-psc isr is still activate and cause the following kernel
panic:

The system is going down for reboot NOW!rpc (ttyPSC0) (Mon Jun 10 12:26:07 20
INIT: Sending processirq 40: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc4-next-20130607-00001-ga0bceb3-dirty #5
Call Trace:
[cfff9f00] [c0007910] show_stack+0x48/0x150 (unreliable)
[cfff9f40] [c005ae60] __report_bad_irq.isra.6+0x34/0xe0
[cfff9f60] [c005b194] note_interrupt+0x214/0x26c
[cfff9f90] [c00590fc] handle_irq_event_percpu+0xd0/0x1bc
[cfff9fd0] [c005921c] handle_irq_event+0x34/0x54
[cfff9fe0] [c005b8f4] handle_level_irq+0x90/0xf4
[cfff9ff0] [c000cb98] call_handle_irq+0x18/0x28
[c050dd60] [c000575c] do_IRQ+0xcc/0x124
[c050dd90] [c000eb04] ret_from_except+0x0/0x14

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8a29dfb
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Jul 3, 2013
@rostedt @rostedt rostedt + rostedt trace,x86: Do not call local_irq_save() in load_current_idt()
As load_current_idt() is now what is used to update the IDT for the
switches needed for NMI, lockdep debug, and for tracing, it must not
call local_irq_save(). This is because one of the users of this is
lockdep, which does tracing of local_irq_save() and when the debug
trap is hit, we need to update the IDT before tracing interrupts
being disabled. As load_current_idt() is used to do this, calling
local_irq_save() which lockdep traces, defeats the point of calling
load_current_idt().

As interrupts are already disabled when used by lockdep and NMI, the
only other user is tracing that can disable interrupts itself. Simply
have the tracing update disable interrupts before calling load_current_idt()
instead of breaking the other users.

Here's the dump that happened:

------------[ cut here ]------------
WARNING: at /work/autotest/nobackup/linux-test.git/kernel/fork.c:1196 copy_process+0x2c3/0x1398()
DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled)
Modules linked in:
CPU: 1 PID: 4570 Comm: gdm-simple-gree Not tainted 3.10.0-rc3-test+ #5
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 ffffffff81d2a7a5 ffff88006ed13d50 ffffffff8192822b ffff88006ed13d90
 ffffffff81035f25 ffff8800721c6000 ffff88006ed13da0 0000000001200011
 0000000000000000 ffff88006ed5e000 ffff8800721c6000 ffff88006ed13df0
Call Trace:
 [<ffffffff8192822b>] dump_stack+0x19/0x1b
 [<ffffffff81035f25>] warn_slowpath_common+0x67/0x80
 [<ffffffff81035fe1>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff812bfc5d>] ? __raw_spin_lock_init+0x31/0x52
 [<ffffffff810341f7>] copy_process+0x2c3/0x1398
 [<ffffffff8103539d>] do_fork+0xa8/0x260
 [<ffffffff810ca7b1>] ? trace_preempt_on+0x2a/0x2f
 [<ffffffff812afb3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff810355cf>] SyS_clone+0x16/0x18
 [<ffffffff81938369>] stub_clone+0x69/0x90
 [<ffffffff81937fc2>] ? system_call_fastpath+0x16/0x1b
---[ end trace 8b157a9d20ca1aa2 ]---

in fork.c:

 #ifdef CONFIG_PROVE_LOCKING
	DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); <-- bug here
	DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 #endif

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2b4bc78
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Jul 3, 2013
@vineetgarc vineetgarc ARC: pt_regs update #5: Use real ECR for pt_regs->event vs. synth values
pt_regs->event was set with artificial values to identify the low level
system event (syscall trap / breakpoint trap / exceptions / interrupts)

With r8 saving out of the way, the full word can be used to save real
ECR (Exception Cause Register) which helps idenify the event naturally,
including additional info such as cause code, param.
Only for Interrupts, where ECR is not applicable, do we resort to
synthetic non ECR values.

SAVE_ALL_TRAP/EXCEPTIONS can now be merged as they both use ECR with
different runtime values.

The ptrace helpers now use the sub-fields of ECR to distinguish the
events (e.g. vector 0x25 is trap, param 0 is syscall...)

The following benefits will follow:

(1) This centralizes the location of where ECR is saved and will allow
    the cleanup of task->thread.cause_code ECR placeholder which is set
    in non-uniform way. Then ARC VM code can safely rely on it being
    there for purpose of finer grained VM_EXEC dcache flush (based on
    exec fault: I-TLB Miss)

(2) Further, ECR being passed around from low level handlers as arg can
    be eliminated as it is part of standard reg-file in pt_regs

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
502a0c7
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Jul 14, 2013
@torvalds torvalds Merge tag 'arc-v3.11-rc1-part1' of git://git.kernel.org/pub/scm/linux…
…/kernel/git/vgupta/arc

Pull first batch of ARC changes from Vineet Gupta:
 "There's a second bunch to follow next week - which depends on commits
  on other trees (irq/net).  I'd have preferred the accompanying ARC
  change via respective trees, but it didn't workout somehow.

  Highlights of changes:

   - Continuation of ARC MM changes from 3.10 including

       zero page optimization
       Setting pagecache pages dirty by default
       Non executable stack by default
       Reducing dcache flushes for aliasing VIPT config

   - Long overdue rework of pt_regs machinery - removing the unused word
     gutters and adding ECR register to baseline (helps cleanup lot of
     low level code)

   - Support for ARC gcc 4.8

   - Few other preventive fixes, cosmetics, usage of Kconfig helper..

  The diffstat is larger than normal primarily because of arcregs.h
  header split as well as beautification of macros in entry.h"

* tag 'arc-v3.11-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (32 commits)
  ARC: warn on improper stack unwind FDE entries
  arc: delete __cpuinit usage from all arc files
  ARC: [tlb-miss] Fix bug with CONFIG_ARC_DBG_TLB_MISS_COUNT
  ARC: [tlb-miss] Extraneous PTE bit testing/setting
  ARC: Adjustments for gcc 4.8
  ARC: Setup Vector Table Base in early boot
  ARC: Remove explicit passing around of ECR
  ARC: pt_regs update #5: Use real ECR for pt_regs->event vs. synth values
  ARC: stop using pt_regs->orig_r8
  ARC: pt_regs update #4: r25 saved/restored unconditionally
  ARC: K/U SP saved from one location in stack switching macro
  ARC: Entry Handler tweaks: Simplify branch for in-kernel preemption
  ARC: Entry Handler tweaks: Avoid hardcoded LIMMS for ECR values
  ARC: Increase readability of entry handlers
  ARC: pt_regs update #3: Remove unused gutter at start of callee_regs
  ARC: pt_regs update #2: Remove unused gutter at start of pt_regs
  ARC: pt_regs update #1: Align pt_regs end with end of kernel stack page
  ARC: pt_regs update #0: remove kernel stack canary
  ARC: [mm] Remove @write argument to do_page_fault()
  ARC: [mm] Make stack/heap Non-executable by default
  ...
76d3f4c
@popcornmix popcornmix pushed a commit that referenced this issue Jul 26, 2013
@gregkh Srivatsa S. Bhat + gregkh cpufreq: Revert commit 2f7021a to fix CPU hotplug regression
commit e8d0527 upstream.

commit 2f7021a "cpufreq: protect 'policy->cpus' from offlining
during __gov_queue_work()" caused a regression in CPU hotplug,
because it lead to a deadlock between cpufreq governor worker thread
and the CPU hotplug writer task.

Lockdep splat corresponding to this deadlock is shown below:

[   60.277396] ======================================================
[   60.277400] [ INFO: possible circular locking dependency detected ]
[   60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
[   60.277411] -------------------------------------------------------
[   60.277417] bash/2225 is trying to acquire lock:
[   60.277422]  ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
[   60.277444] but task is already holding lock:
[   60.277449]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
[   60.277465] which lock already depends on the new lock.

[   60.277472] the existing dependency chain (in reverse order) is:
[   60.277477] -> #2 (cpu_hotplug.lock){+.+.+.}:
[   60.277490]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.277503]        [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410
[   60.277514]        [<ffffffff81042cbc>] get_online_cpus+0x3c/0x60
[   60.277522]        [<ffffffff814b842a>] gov_queue_work+0x2a/0xb0
[   60.277532]        [<ffffffff814b7891>] cs_dbs_timer+0xc1/0xe0
[   60.277543]        [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0
[   60.277552]        [<ffffffff81063d31>] worker_thread+0x121/0x3a0
[   60.277560]        [<ffffffff8106ae2b>] kthread+0xdb/0xe0
[   60.277569]        [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0
[   60.277580] -> #1 (&j_cdbs->timer_mutex){+.+...}:
[   60.277592]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.277600]        [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410
[   60.277608]        [<ffffffff814b785d>] cs_dbs_timer+0x8d/0xe0
[   60.277616]        [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0
[   60.277624]        [<ffffffff81063d31>] worker_thread+0x121/0x3a0
[   60.277633]        [<ffffffff8106ae2b>] kthread+0xdb/0xe0
[   60.277640]        [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0
[   60.277649] -> #0 ((&(&j_cdbs->work)->work)){+.+...}:
[   60.277661]        [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30
[   60.277669]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.277677]        [<ffffffff810621ed>] flush_work+0x3d/0x280
[   60.277685]        [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120
[   60.277693]        [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20
[   60.277701]        [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0
[   60.277709]        [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20
[   60.277719]        [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100
[   60.277728]        [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0
[   60.277737]        [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c
[   60.277747]        [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110
[   60.277759]        [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10
[   60.277768]        [<ffffffff815a0a68>] _cpu_down+0x88/0x330
[   60.277779]        [<ffffffff815a0d46>] cpu_down+0x36/0x50
[   60.277788]        [<ffffffff815a2748>] store_online+0x98/0xd0
[   60.277796]        [<ffffffff81452a28>] dev_attr_store+0x18/0x30
[   60.277806]        [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150
[   60.277818]        [<ffffffff8116806d>] vfs_write+0xbd/0x1f0
[   60.277826]        [<ffffffff811686fc>] SyS_write+0x4c/0xa0
[   60.277834]        [<ffffffff815bbbbe>] tracesys+0xd0/0xd5
[   60.277842] other info that might help us debug this:

[   60.277848] Chain exists of:
  (&(&j_cdbs->work)->work) --> &j_cdbs->timer_mutex --> cpu_hotplug.lock

[   60.277864]  Possible unsafe locking scenario:

[   60.277869]        CPU0                    CPU1
[   60.277873]        ----                    ----
[   60.277877]   lock(cpu_hotplug.lock);
[   60.277885]                                lock(&j_cdbs->timer_mutex);
[   60.277892]                                lock(cpu_hotplug.lock);
[   60.277900]   lock((&(&j_cdbs->work)->work));
[   60.277907]  *** DEADLOCK ***

[   60.277915] 6 locks held by bash/2225:
[   60.277919]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff81168173>] vfs_write+0x1c3/0x1f0
[   60.277937]  #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811d9e3c>] sysfs_write_file+0x3c/0x150
[   60.277954]  #2:  (s_active#61){.+.+.+}, at: [<ffffffff811d9ec3>] sysfs_write_file+0xc3/0x150
[   60.277972]  #3:  (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<ffffffff81024cf7>] cpu_hotplug_driver_lock+0x17/0x20
[   60.277990]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff815a0d32>] cpu_down+0x22/0x50
[   60.278007]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
[   60.278023] stack backtrace:
[   60.278031] CPU: 3 PID: 2225 Comm: bash Not tainted 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744
[   60.278037] Hardware name: Acer             Aspire 5741G    /Aspire 5741G    , BIOS V1.20 02/08/2011
[   60.278042]  ffffffff8204e110 ffff88014df6b9f8 ffffffff815b3d90 ffff88014df6ba38
[   60.278055]  ffffffff815b0a8d ffff880150ed3f60 ffff880150ed4770 3871c4002c8980b2
[   60.278068]  ffff880150ed4748 ffff880150ed4770 ffff880150ed3f60 ffff88014df6bb00
[   60.278081] Call Trace:
[   60.278091]  [<ffffffff815b3d90>] dump_stack+0x19/0x1b
[   60.278101]  [<ffffffff815b0a8d>] print_circular_bug+0x2b6/0x2c5
[   60.278111]  [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30
[   60.278123]  [<ffffffff81067e08>] ? __kernel_text_address+0x58/0x80
[   60.278134]  [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.278142]  [<ffffffff810621b5>] ? flush_work+0x5/0x280
[   60.278151]  [<ffffffff810621ed>] flush_work+0x3d/0x280
[   60.278159]  [<ffffffff810621b5>] ? flush_work+0x5/0x280
[   60.278169]  [<ffffffff810a9b14>] ? mark_held_locks+0x94/0x140
[   60.278178]  [<ffffffff81062d77>] ? __cancel_work_timer+0x77/0x120
[   60.278188]  [<ffffffff810a9cbd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[   60.278196]  [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120
[   60.278206]  [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20
[   60.278214]  [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0
[   60.278225]  [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20
[   60.278234]  [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100
[   60.278244]  [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0
[   60.278255]  [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c
[   60.278265]  [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110
[   60.278275]  [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10
[   60.278284]  [<ffffffff815a0a68>] _cpu_down+0x88/0x330
[   60.278292]  [<ffffffff81024cf7>] ? cpu_hotplug_driver_lock+0x17/0x20
[   60.278302]  [<ffffffff815a0d46>] cpu_down+0x36/0x50
[   60.278311]  [<ffffffff815a2748>] store_online+0x98/0xd0
[   60.278320]  [<ffffffff81452a28>] dev_attr_store+0x18/0x30
[   60.278329]  [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150
[   60.278337]  [<ffffffff8116806d>] vfs_write+0xbd/0x1f0
[   60.278347]  [<ffffffff81185950>] ? fget_light+0x320/0x4b0
[   60.278355]  [<ffffffff811686fc>] SyS_write+0x4c/0xa0
[   60.278364]  [<ffffffff815bbbbe>] tracesys+0xd0/0xd5
[   60.280582] smpboot: CPU 1 is now offline

The intention of that commit was to avoid warnings during CPU
hotplug, which indicated that offline CPUs were getting IPIs from the
cpufreq governor's work items.  But the real root-cause of that
problem was commit a66b2e5 (cpufreq: Preserve sysfs files across
suspend/resume) because it totally skipped all the cpufreq callbacks
during CPU hotplug in the suspend/resume path, and hence it never
actually shut down the cpufreq governor's worker threads during CPU
offline in the suspend/resume path.

Reflecting back, the reason why we never suspected that commit as the
root-cause earlier, was that the original issue was reported with
just the halt command and nobody had brought in suspend/resume to the
equation.

The reason for _that_ in turn, as it turns out, is that earlier
halt/shutdown was being done by disabling non-boot CPUs while tasks
were frozen, just like suspend/resume....  but commit cf7df37
(reboot: migrate shutdown/reboot to boot cpu) which came somewhere
along that very same time changed that logic: shutdown/halt no longer
takes CPUs offline.  Thus, the test-cases for reproducing the bug
were vastly different and thus we went totally off the trail.

Overall, it was one hell of a confusion with so many commits
affecting each other and also affecting the symptoms of the problems
in subtle ways.  Finally, now since the original problematic commit
(a66b2e5) has been completely reverted, revert this intermediate fix
too (2f7021a), to fix the CPU hotplug deadlock.  Phew!

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Peter Wu <lekensteyn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
916f4db
@popcornmix popcornmix pushed a commit that referenced this issue Aug 16, 2013
@gregkh Amit Shah + gregkh virtio: console: clean up port data immediately at time of unplug
commit ea3768b upstream.

We used to keep the port's char device structs and the /sys entries
around till the last reference to the port was dropped.  This is
actually unnecessary, and resulted in buggy behaviour:

1. Open port in guest
2. Hot-unplug port
3. Hot-plug a port with the same 'name' property as the unplugged one

This resulted in hot-plug being unsuccessful, as a port with the same
name already exists (even though it was unplugged).

This behaviour resulted in a warning message like this one:

-------------------8<---------------------------------------
WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
Hardware name: KVM
sysfs: cannot create duplicate filename
'/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1'

Call Trace:
 [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130
 [<ffffffff811f23e8>] ? create_dir+0x68/0xb0
 [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50
 [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260
 [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60
 [<ffffffff812734b4>] ? kobject_add+0x44/0x70
 [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0
 [<ffffffff8134b389>] ? device_add+0xc9/0x650

-------------------8<---------------------------------------

Instead of relying on guest applications to release all references to
the ports, we should go ahead and unregister the port from all the core
layers.  Any open/read calls on the port will then just return errors,
and an unplug/plug operation on the host will succeed as expected.

This also caused buggy behaviour in case of the device removal (not just
a port): when the device was removed (which means all ports on that
device are removed automatically as well), the ports with active
users would clean up only when the last references were dropped -- and
it would be too late then to be referencing char device pointers,
resulting in oopses:

-------------------8<---------------------------------------
PID: 6162   TASK: ffff8801147ad500  CPU: 0   COMMAND: "cat"
 #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b
 #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322
 #2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50
 #3 [ffff88011b9d5bf0] die at ffffffff8100f26b
 #4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2
 #5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5
    [exception RIP: strlen+2]
    RIP: ffffffff81272ae2  RSP: ffff88011b9d5d00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff880118901c18  RCX: 0000000000000000
    RDX: ffff88011799982c  RSI: 00000000000000d0  RDI: 3a303030302f3030
    RBP: ffff88011b9d5d38   R8: 0000000000000006   R9: ffffffffa0134500
    R10: 0000000000001000  R11: 0000000000001000  R12: ffff880117a1cc10
    R13: 00000000000000d0  R14: 0000000000000017  R15: ffffffff81aff700
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d
 #7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551
 #8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb
 #9 [ffff88011b9d5de0] device_del at ffffffff813440c7

-------------------8<---------------------------------------

So clean up when we have all the context, and all that's left to do when
the references to the port have dropped is to free up the port struct
itself.

Reported-by: chayang <chayang@redhat.com>
Reported-by: YOGANANTH SUBRAMANIAN <anantyog@in.ibm.com>
Reported-by: FuXiangChun <xfu@redhat.com>
Reported-by: Qunfang Zhang <qzhang@redhat.com>
Reported-by: Sibiao Luo <sluo@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
707ea94
@davet321 davet321 pushed a commit to davet321/rpi-linux that referenced this issue Aug 30, 2013
@gregkh Vineet Gupta + gregkh ARC: gdbserver breakage in Big-Endian configuration #1
[Based on mainline commit 502a0c7: "ARC: pt_regs update #5"]

gdbserver needs @stop_pc, served by ptrace, but fetched from pt_regs
differently, based on in_brkpt_traps(), which in turn relies on
additional machine state in pt_regs->event bitfield.

        unsigned long orig_r8:16, event:16;

For big endian config, this macro was returning false, despite being in
breakpoint Trap exception, causing wrong @stop_pc to be returned to gdb.

Issue #1: In BE, @event above is at offset 2 in word, while a STW insn
          at offset 0 was used to update it. Resort to using ST insn
	  which updates the half-word at right location.

Issue #2: The union involving bitfields causes all the members to be
	  laid out at offset 0. So with fix #1 above, ASM was now
	  updating at offset 2, "C" code was still referencing at
	  offset 0. Fixed by wrapping bitfield in a struct.

Reported-by: Noam Camus <noamc@ezchip.com>
Tested-by: Anton Kolesov <akolesov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9b2c750
@popcornmix popcornmix pushed a commit that referenced this issue Sep 4, 2013
Srivatsa S. Bhat + Rafael J. Wysocki cpufreq: Revert commit 2f7021a to fix CPU hotplug regression
commit 2f7021a "cpufreq: protect 'policy->cpus' from offlining
during __gov_queue_work()" caused a regression in CPU hotplug,
because it lead to a deadlock between cpufreq governor worker thread
and the CPU hotplug writer task.

Lockdep splat corresponding to this deadlock is shown below:

[   60.277396] ======================================================
[   60.277400] [ INFO: possible circular locking dependency detected ]
[   60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
[   60.277411] -------------------------------------------------------
[   60.277417] bash/2225 is trying to acquire lock:
[   60.277422]  ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
[   60.277444] but task is already holding lock:
[   60.277449]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
[   60.277465] which lock already depends on the new lock.

[   60.277472] the existing dependency chain (in reverse order) is:
[   60.277477] -> #2 (cpu_hotplug.lock){+.+.+.}:
[   60.277490]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.277503]        [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410
[   60.277514]        [<ffffffff81042cbc>] get_online_cpus+0x3c/0x60
[   60.277522]        [<ffffffff814b842a>] gov_queue_work+0x2a/0xb0
[   60.277532]        [<ffffffff814b7891>] cs_dbs_timer+0xc1/0xe0
[   60.277543]        [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0
[   60.277552]        [<ffffffff81063d31>] worker_thread+0x121/0x3a0
[   60.277560]        [<ffffffff8106ae2b>] kthread+0xdb/0xe0
[   60.277569]        [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0
[   60.277580] -> #1 (&j_cdbs->timer_mutex){+.+...}:
[   60.277592]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.277600]        [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410
[   60.277608]        [<ffffffff814b785d>] cs_dbs_timer+0x8d/0xe0
[   60.277616]        [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0
[   60.277624]        [<ffffffff81063d31>] worker_thread+0x121/0x3a0
[   60.277633]        [<ffffffff8106ae2b>] kthread+0xdb/0xe0
[   60.277640]        [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0
[   60.277649] -> #0 ((&(&j_cdbs->work)->work)){+.+...}:
[   60.277661]        [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30
[   60.277669]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.277677]        [<ffffffff810621ed>] flush_work+0x3d/0x280
[   60.277685]        [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120
[   60.277693]        [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20
[   60.277701]        [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0
[   60.277709]        [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20
[   60.277719]        [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100
[   60.277728]        [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0
[   60.277737]        [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c
[   60.277747]        [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110
[   60.277759]        [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10
[   60.277768]        [<ffffffff815a0a68>] _cpu_down+0x88/0x330
[   60.277779]        [<ffffffff815a0d46>] cpu_down+0x36/0x50
[   60.277788]        [<ffffffff815a2748>] store_online+0x98/0xd0
[   60.277796]        [<ffffffff81452a28>] dev_attr_store+0x18/0x30
[   60.277806]        [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150
[   60.277818]        [<ffffffff8116806d>] vfs_write+0xbd/0x1f0
[   60.277826]        [<ffffffff811686fc>] SyS_write+0x4c/0xa0
[   60.277834]        [<ffffffff815bbbbe>] tracesys+0xd0/0xd5
[   60.277842] other info that might help us debug this:

[   60.277848] Chain exists of:
  (&(&j_cdbs->work)->work) --> &j_cdbs->timer_mutex --> cpu_hotplug.lock

[   60.277864]  Possible unsafe locking scenario:

[   60.277869]        CPU0                    CPU1
[   60.277873]        ----                    ----
[   60.277877]   lock(cpu_hotplug.lock);
[   60.277885]                                lock(&j_cdbs->timer_mutex);
[   60.277892]                                lock(cpu_hotplug.lock);
[   60.277900]   lock((&(&j_cdbs->work)->work));
[   60.277907]  *** DEADLOCK ***

[   60.277915] 6 locks held by bash/2225:
[   60.277919]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff81168173>] vfs_write+0x1c3/0x1f0
[   60.277937]  #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811d9e3c>] sysfs_write_file+0x3c/0x150
[   60.277954]  #2:  (s_active#61){.+.+.+}, at: [<ffffffff811d9ec3>] sysfs_write_file+0xc3/0x150
[   60.277972]  #3:  (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<ffffffff81024cf7>] cpu_hotplug_driver_lock+0x17/0x20
[   60.277990]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff815a0d32>] cpu_down+0x22/0x50
[   60.278007]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
[   60.278023] stack backtrace:
[   60.278031] CPU: 3 PID: 2225 Comm: bash Not tainted 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744
[   60.278037] Hardware name: Acer             Aspire 5741G    /Aspire 5741G    , BIOS V1.20 02/08/2011
[   60.278042]  ffffffff8204e110 ffff88014df6b9f8 ffffffff815b3d90 ffff88014df6ba38
[   60.278055]  ffffffff815b0a8d ffff880150ed3f60 ffff880150ed4770 3871c4002c8980b2
[   60.278068]  ffff880150ed4748 ffff880150ed4770 ffff880150ed3f60 ffff88014df6bb00
[   60.278081] Call Trace:
[   60.278091]  [<ffffffff815b3d90>] dump_stack+0x19/0x1b
[   60.278101]  [<ffffffff815b0a8d>] print_circular_bug+0x2b6/0x2c5
[   60.278111]  [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30
[   60.278123]  [<ffffffff81067e08>] ? __kernel_text_address+0x58/0x80
[   60.278134]  [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.278142]  [<ffffffff810621b5>] ? flush_work+0x5/0x280
[   60.278151]  [<ffffffff810621ed>] flush_work+0x3d/0x280
[   60.278159]  [<ffffffff810621b5>] ? flush_work+0x5/0x280
[   60.278169]  [<ffffffff810a9b14>] ? mark_held_locks+0x94/0x140
[   60.278178]  [<ffffffff81062d77>] ? __cancel_work_timer+0x77/0x120
[   60.278188]  [<ffffffff810a9cbd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[   60.278196]  [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120
[   60.278206]  [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20
[   60.278214]  [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0
[   60.278225]  [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20
[   60.278234]  [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100
[   60.278244]  [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0
[   60.278255]  [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c
[   60.278265]  [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110
[   60.278275]  [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10
[   60.278284]  [<ffffffff815a0a68>] _cpu_down+0x88/0x330
[   60.278292]  [<ffffffff81024cf7>] ? cpu_hotplug_driver_lock+0x17/0x20
[   60.278302]  [<ffffffff815a0d46>] cpu_down+0x36/0x50
[   60.278311]  [<ffffffff815a2748>] store_online+0x98/0xd0
[   60.278320]  [<ffffffff81452a28>] dev_attr_store+0x18/0x30
[   60.278329]  [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150
[   60.278337]  [<ffffffff8116806d>] vfs_write+0xbd/0x1f0
[   60.278347]  [<ffffffff81185950>] ? fget_light+0x320/0x4b0
[   60.278355]  [<ffffffff811686fc>] SyS_write+0x4c/0xa0
[   60.278364]  [<ffffffff815bbbbe>] tracesys+0xd0/0xd5
[   60.280582] smpboot: CPU 1 is now offline

The intention of that commit was to avoid warnings during CPU
hotplug, which indicated that offline CPUs were getting IPIs from the
cpufreq governor's work items.  But the real root-cause of that
problem was commit a66b2e5 (cpufreq: Preserve sysfs files across
suspend/resume) because it totally skipped all the cpufreq callbacks
during CPU hotplug in the suspend/resume path, and hence it never
actually shut down the cpufreq governor's worker threads during CPU
offline in the suspend/resume path.

Reflecting back, the reason why we never suspected that commit as the
root-cause earlier, was that the original issue was reported with
just the halt command and nobody had brought in suspend/resume to the
equation.

The reason for _that_ in turn, as it turns out, is that earlier
halt/shutdown was being done by disabling non-boot CPUs while tasks
were frozen, just like suspend/resume....  but commit cf7df37
(reboot: migrate shutdown/reboot to boot cpu) which came somewhere
along that very same time changed that logic: shutdown/halt no longer
takes CPUs offline.  Thus, the test-cases for reproducing the bug
were vastly different and thus we went totally off the trail.

Overall, it was one hell of a confusion with so many commits
affecting each other and also affecting the symptoms of the problems
in subtle ways.  Finally, now since the original problematic commit
(a66b2e5) has been completely reverted, revert this intermediate fix
too (2f7021a), to fix the CPU hotplug deadlock.  Phew!

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Peter Wu <lekensteyn@gmail.com>
Cc: 3.10+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
e8d0527
@popcornmix popcornmix pushed a commit that referenced this issue Sep 4, 2013
@rustyrussell Amit Shah + rustyrussell virtio: console: clean up port data immediately at time of unplug
We used to keep the port's char device structs and the /sys entries
around till the last reference to the port was dropped.  This is
actually unnecessary, and resulted in buggy behaviour:

1. Open port in guest
2. Hot-unplug port
3. Hot-plug a port with the same 'name' property as the unplugged one

This resulted in hot-plug being unsuccessful, as a port with the same
name already exists (even though it was unplugged).

This behaviour resulted in a warning message like this one:

-------------------8<---------------------------------------
WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
Hardware name: KVM
sysfs: cannot create duplicate filename
'/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1'

Call Trace:
 [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130
 [<ffffffff811f23e8>] ? create_dir+0x68/0xb0
 [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50
 [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260
 [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60
 [<ffffffff812734b4>] ? kobject_add+0x44/0x70
 [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0
 [<ffffffff8134b389>] ? device_add+0xc9/0x650

-------------------8<---------------------------------------

Instead of relying on guest applications to release all references to
the ports, we should go ahead and unregister the port from all the core
layers.  Any open/read calls on the port will then just return errors,
and an unplug/plug operation on the host will succeed as expected.

This also caused buggy behaviour in case of the device removal (not just
a port): when the device was removed (which means all ports on that
device are removed automatically as well), the ports with active
users would clean up only when the last references were dropped -- and
it would be too late then to be referencing char device pointers,
resulting in oopses:

-------------------8<---------------------------------------
PID: 6162   TASK: ffff8801147ad500  CPU: 0   COMMAND: "cat"
 #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b
 #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322
 #2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50
 #3 [ffff88011b9d5bf0] die at ffffffff8100f26b
 #4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2
 #5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5
    [exception RIP: strlen+2]
    RIP: ffffffff81272ae2  RSP: ffff88011b9d5d00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff880118901c18  RCX: 0000000000000000
    RDX: ffff88011799982c  RSI: 00000000000000d0  RDI: 3a303030302f3030
    RBP: ffff88011b9d5d38   R8: 0000000000000006   R9: ffffffffa0134500
    R10: 0000000000001000  R11: 0000000000001000  R12: ffff880117a1cc10
    R13: 00000000000000d0  R14: 0000000000000017  R15: ffffffff81aff700
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d
 #7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551
 #8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb
 #9 [ffff88011b9d5de0] device_del at ffffffff813440c7

-------------------8<---------------------------------------

So clean up when we have all the context, and all that's left to do when
the references to the port have dropped is to free up the port struct
itself.

CC: <stable@vger.kernel.org>
Reported-by: chayang <chayang@redhat.com>
Reported-by: YOGANANTH SUBRAMANIAN <anantyog@in.ibm.com>
Reported-by: FuXiangChun <xfu@redhat.com>
Reported-by: Qunfang Zhang <qzhang@redhat.com>
Reported-by: Sibiao Luo <sluo@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
ea3768b
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 10, 2013
@peterhurley @gregkh peterhurley + gregkh tty: Fix lock order in tty_do_resize()
Commits 6a1c068 and
9356b53, respectively
  'tty: Convert termios_mutex to termios_rwsem' and
  'n_tty: Access termios values safely'
introduced a circular lock dependency with console_lock and
termios_rwsem.

The lockdep report [1] shows that n_tty_write() will attempt
to claim console_lock while holding the termios_rwsem, whereas
tty_do_resize() may already hold the console_lock while
claiming the termios_rwsem.

Since n_tty_write() and tty_do_resize() do not contend
over the same data -- the tty->winsize structure -- correct
the lock dependency by introducing a new lock which
specifically serializes access to tty->winsize only.

[1] Lockdep report

======================================================
[ INFO: possible circular locking dependency detected ]
3.10.0-0+tip-xeon+lockdep #0+tip Not tainted
-------------------------------------------------------
modprobe/277 is trying to acquire lock:
 (&tty->termios_rwsem){++++..}, at: [<ffffffff81452656>] tty_do_resize+0x36/0xe0

but task is already holding lock:
 ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 ((fb_notifier_list).rwsem){.+.+.+}:
       [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
       [<ffffffff8175b797>] down_read+0x47/0x5c
       [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0
       [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20
       [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320
       [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper]
       [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau]
       [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau]
       [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm]
       [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau]
       [<ffffffff813b13db>] local_pci_probe+0x4b/0x80
       [<ffffffff813b1701>] pci_device_probe+0x111/0x120
       [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0
       [<ffffffff81497bab>] __driver_attach+0xab/0xb0
       [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0
       [<ffffffff814971fe>] driver_attach+0x1e/0x20
       [<ffffffff81496cc1>] bus_add_driver+0x111/0x290
       [<ffffffff814982b7>] driver_register+0x77/0x170
       [<ffffffff813b0454>] __pci_register_driver+0x64/0x70
       [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm]
       [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]
       [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
       [<ffffffff810c54cb>] load_module+0x123b/0x1bf0
       [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120
       [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b

-> #1 (console_lock){+.+.+.}:
       [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
       [<ffffffff810430a7>] console_lock+0x77/0x80
       [<ffffffff8146b2a1>] con_flush_chars+0x31/0x50
       [<ffffffff8145780c>] n_tty_write+0x1ec/0x4d0
       [<ffffffff814541b9>] tty_write+0x159/0x2e0
       [<ffffffff814543f5>] redirected_tty_write+0xb5/0xc0
       [<ffffffff811ab9d5>] vfs_write+0xc5/0x1f0
       [<ffffffff811abec5>] SyS_write+0x55/0xa0
       [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b

-> #0 (&tty->termios_rwsem){++++..}:
       [<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30
       [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
       [<ffffffff8175b724>] down_write+0x44/0x70
       [<ffffffff81452656>] tty_do_resize+0x36/0xe0
       [<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0
       [<ffffffff8146c99f>] vc_resize+0x1f/0x30
       [<ffffffff813e4535>] fbcon_init+0x385/0x5a0
       [<ffffffff8146a4bc>] visual_init+0xbc/0x120
       [<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320
       [<ffffffff8146cfa1>] do_take_over_console+0x61/0x70
       [<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0
       [<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820
       [<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110
       [<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0
       [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20
       [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320
       [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper]
       [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau]
       [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau]
       [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm]
       [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau]
       [<ffffffff813b13db>] local_pci_probe+0x4b/0x80
       [<ffffffff813b1701>] pci_device_probe+0x111/0x120
       [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0
       [<ffffffff81497bab>] __driver_attach+0xab/0xb0
       [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0
       [<ffffffff814971fe>] driver_attach+0x1e/0x20
       [<ffffffff81496cc1>] bus_add_driver+0x111/0x290
       [<ffffffff814982b7>] driver_register+0x77/0x170
       [<ffffffff813b0454>] __pci_register_driver+0x64/0x70
       [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm]
       [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]
       [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
       [<ffffffff810c54cb>] load_module+0x123b/0x1bf0
       [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120
       [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

Chain exists of:
  &tty->termios_rwsem --> console_lock --> (fb_notifier_list).rwsem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((fb_notifier_list).rwsem);
                               lock(console_lock);
                               lock((fb_notifier_list).rwsem);
  lock(&tty->termios_rwsem);

 *** DEADLOCK ***

7 locks held by modprobe/277:
 #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81497b5b>] __driver_attach+0x5b/0xb0
 #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81497b69>] __driver_attach+0x69/0xb0
 #2:  (drm_global_mutex){+.+.+.}, at: [<ffffffffa008a6dd>] drm_get_pci_dev+0xbd/0x2a0 [drm]
 #3:  (registration_lock){+.+.+.}, at: [<ffffffff813d93f5>] register_framebuffer+0x25/0x320
 #4:  (&fb_info->lock){+.+.+.}, at: [<ffffffff813d8116>] lock_fb_info+0x26/0x60
 #5:  (console_lock){+.+.+.}, at: [<ffffffff813d95a4>] register_framebuffer+0x1d4/0x320
 #6:  ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0

stack backtrace:
CPU: 0 PID: 277 Comm: modprobe Not tainted 3.10.0-0+tip-xeon+lockdep #0+tip
Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
 ffffffff8213e5e0 ffff8802aa2fb298 ffffffff81755f19 ffff8802aa2fb2e8
 ffffffff8174f506 ffff8802aa2fa000 ffff8802aa2fb378 ffff8802aa2ea8e8
 ffff8802aa2ea910 ffff8802aa2ea8e8 0000000000000006 0000000000000007
Call Trace:
 [<ffffffff81755f19>] dump_stack+0x19/0x1b
 [<ffffffff8174f506>] print_circular_bug+0x1fb/0x20c
 [<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30
 [<ffffffff810b775e>] ? mark_held_locks+0xae/0x120
 [<ffffffff810b78d5>] ? trace_hardirqs_on_caller+0x105/0x1d0
 [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
 [<ffffffff81452656>] ? tty_do_resize+0x36/0xe0
 [<ffffffff8175b724>] down_write+0x44/0x70
 [<ffffffff81452656>] ? tty_do_resize+0x36/0xe0
 [<ffffffff81452656>] tty_do_resize+0x36/0xe0
 [<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0
 [<ffffffff8146c99f>] vc_resize+0x1f/0x30
 [<ffffffff813e4535>] fbcon_init+0x385/0x5a0
 [<ffffffff8146a4bc>] visual_init+0xbc/0x120
 [<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320
 [<ffffffff8146cfa1>] do_take_over_console+0x61/0x70
 [<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0
 [<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820
 [<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110
 [<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0
 [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20
 [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20
 [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320
 [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper]
 [<ffffffff8173cbcb>] ? kmemleak_alloc+0x5b/0xc0
 [<ffffffff81198874>] ? kmem_cache_alloc_trace+0x104/0x290
 [<ffffffffa01035e1>] ? drm_fb_helper_single_add_all_connectors+0x81/0xf0 [drm_kms_helper]
 [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau]
 [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau]
 [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm]
 [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau]
 [<ffffffff8175f162>] ? _raw_spin_unlock_irqrestore+0x42/0x80
 [<ffffffff813b13db>] local_pci_probe+0x4b/0x80
 [<ffffffff813b1701>] pci_device_probe+0x111/0x120
 [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0
 [<ffffffff81497bab>] __driver_attach+0xab/0xb0
 [<ffffffff81497b00>] ? driver_probe_device+0x3a0/0x3a0
 [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0
 [<ffffffff814971fe>] driver_attach+0x1e/0x20
 [<ffffffff81496cc1>] bus_add_driver+0x111/0x290
 [<ffffffffa022a000>] ? 0xffffffffa0229fff
 [<ffffffff814982b7>] driver_register+0x77/0x170
 [<ffffffffa022a000>] ? 0xffffffffa0229fff
 [<ffffffff813b0454>] __pci_register_driver+0x64/0x70
 [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm]
 [<ffffffffa022a000>] ? 0xffffffffa0229fff
 [<ffffffffa022a000>] ? 0xffffffffa0229fff
 [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]
 [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
 [<ffffffff810c54cb>] load_module+0x123b/0x1bf0
 [<ffffffff81399a50>] ? ddebug_proc_open+0xb0/0xb0
 [<ffffffff813855ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120
 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dee4a0b
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 10, 2013
@jf6b @rolandd jf6b + rolandd IPoIB: Fix race in deleting ipoib_neigh entries
In several places, this snippet is used when removing neigh entries:

	list_del(&neigh->list);
	ipoib_neigh_free(neigh);

The list_del() removes neigh from the associated struct ipoib_path, while
ipoib_neigh_free() removes neigh from the device's neigh entry lookup
table.  Both of these operations are protected by the priv->lock
spinlock.  The table however is also protected via RCU, and so naturally
the lock is not held when doing reads.

This leads to a race condition, in which a thread may successfully look
up a neigh entry that has already been deleted from neigh->list.  Since
the previous deletion will have marked the entry with poison, a second
list_del() on the object will cause a panic:

  #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
     [exception RIP: list_del+16]
     RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
     RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
     RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
     RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
     R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
     R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
  #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
  #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
  #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca

We move the list_del() into ipoib_neigh_free(), so that deletion happens
only once, after the entry has been successfully removed from the lookup
table.  This same behavior is already used in ipoib_del_neighs_by_gid()
and __ipoib_reap_neigh().

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
49b8e74
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 10, 2013
@torvalds torvalds Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "The first patch is to address a long standing issue where INQUIRY
  vendor + model response data was not correctly padded with ASCII
  spaces, causing MSFT and Falconstor multipath stacks to not function
  with our LUNs.

  The second -> forth patches are additional iscsi-target regression
  fixes for the post >= v3.10 iser-target changes.  The second and third
  are failure cases that have appeared during further testing, and the
  forth is only reproducible with malformed NOP packets.

  The fifth patch is a v3.11 specific regression caused by a recent
  optimization that showed up during WRITE I/O failure testing.

  I'll be sending Patch #1 and Patch #5 to Greg-KH separately for
  stable"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix se_cmd->state_list leak regression during WRITE failure
  iscsi-target: Fix potential NULL pointer in solicited NOPOUT reject
  iscsi-target: Fix iscsit_transport reference leak during NP thread reset
  iscsi-target: Fix ImmediateData=Yes failure regression in >= v3.10
  target: Fix trailing ASCII space usage in INQUIRY vendor+model
dcaaaea
@popcornmix popcornmix pushed a commit that referenced this issue Oct 22, 2013
@vineetgarc @gregkh vineetgarc + gregkh ARC: Workaround spinlock livelock in SMP SystemC simulation
commit 6c00350 upstream.

Some ARC SMP systems lack native atomic R-M-W (LLOCK/SCOND) insns and
can only use atomic EX insn (reg with mem) to build higher level R-M-W
primitives. This includes a SystemC based SMP simulation model.

So rwlocks need to use a protecting spinlock for atomic cmp-n-exchange
operation to update reader(s)/writer count.

The spinlock operation itself looks as follows:

	mov reg, 1		; 1=locked, 0=unlocked
retry:
	EX reg, [lock]		; load existing, store 1, atomically
	BREQ reg, 1, rety	; if already locked, retry

In single-threaded simulation, SystemC alternates between the 2 cores
with "N" insn each based scheduling. Additionally for insn with global
side effect, such as EX writing to shared mem, a core switch is
enforced too.

Given that, 2 cores doing a repeated EX on same location, Linux often
got into a livelock e.g. when both cores were fiddling with tasklist
lock (gdbserver / hackbench) for read/write respectively as the
sequence diagram below shows:

           core1                                   core2
         --------                                --------
1. spin lock [EX r=0, w=1] - LOCKED
2. rwlock(Read)            - LOCKED
3. spin unlock  [ST 0]     - UNLOCKED
                                         spin lock [EX r=0,w=1] - LOCKED
                      -- resched core 1----

5. spin lock [EX r=1] - ALREADY-LOCKED

                      -- resched core 2----
6.                                       rwlock(Write) - READER-LOCKED
7.                                       spin unlock [ST 0]
8.                                       rwlock failed, retry again

9.                                       spin lock  [EX r=0, w=1]
                      -- resched core 1----

10  spinlock locked in #9, retry #5
11. spin lock [EX gets 1]
                      -- resched core 2----
...
...

The fix was to unlock using the EX insn too (step 7), to trigger another
SystemC scheduling pass which would let core1 proceed, eliding the
livelock.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7e0e1ac
@popcornmix popcornmix pushed a commit that referenced this issue Oct 22, 2013
@vineetgarc @gregkh vineetgarc + gregkh ARC: Workaround spinlock livelock in SMP SystemC simulation
commit 6c00350 upstream.

Some ARC SMP systems lack native atomic R-M-W (LLOCK/SCOND) insns and
can only use atomic EX insn (reg with mem) to build higher level R-M-W
primitives. This includes a SystemC based SMP simulation model.

So rwlocks need to use a protecting spinlock for atomic cmp-n-exchange
operation to update reader(s)/writer count.

The spinlock operation itself looks as follows:

	mov reg, 1		; 1=locked, 0=unlocked
retry:
	EX reg, [lock]		; load existing, store 1, atomically
	BREQ reg, 1, rety	; if already locked, retry

In single-threaded simulation, SystemC alternates between the 2 cores
with "N" insn each based scheduling. Additionally for insn with global
side effect, such as EX writing to shared mem, a core switch is
enforced too.

Given that, 2 cores doing a repeated EX on same location, Linux often
got into a livelock e.g. when both cores were fiddling with tasklist
lock (gdbserver / hackbench) for read/write respectively as the
sequence diagram below shows:

           core1                                   core2
         --------                                --------
1. spin lock [EX r=0, w=1] - LOCKED
2. rwlock(Read)            - LOCKED
3. spin unlock  [ST 0]     - UNLOCKED
                                         spin lock [EX r=0,w=1] - LOCKED
                      -- resched core 1----

5. spin lock [EX r=1] - ALREADY-LOCKED

                      -- resched core 2----
6.                                       rwlock(Write) - READER-LOCKED
7.                                       spin unlock [ST 0]
8.                                       rwlock failed, retry again

9.                                       spin lock  [EX r=0, w=1]
                      -- resched core 1----

10  spinlock locked in #9, retry #5
11. spin lock [EX gets 1]
                      -- resched core 2----
...
...

The fix was to unlock using the EX insn too (step 7), to trigger another
SystemC scheduling pass which would let core1 proceed, eliding the
livelock.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0c06a0a
@popcornmix popcornmix pushed a commit that referenced this issue Nov 4, 2013
@libin2015 libin2015 + Ingo Molnar x86/smpboot: Fix announce_cpu() to printk() the last "OK" properly
When booting secondary CPUs, announce_cpu() is called to show which cpu has
been brought up. For example:

[    0.402751] smpboot: Booting Node   0, Processors  #1 #2 #3 #4 #5 OK
[    0.525667] smpboot: Booting Node   1, Processors  #6 #7 #8 #9 #10 #11 OK
[    0.755592] smpboot: Booting Node   0, Processors  #12 #13 #14 #15 #16 #17 OK
[    0.890495] smpboot: Booting Node   1, Processors  #18 #19 #20 #21 #22 #23

But the last "OK" is lost, because 'nr_cpu_ids-1' represents the maximum
possible cpu id. It should use the maximum present cpu id in case not all
CPUs booted up.

Signed-off-by: Libin <huawei.libin@huawei.com>
Cc: <guohanjun@huawei.com>
Cc: <wangyijing@huawei.com>
Cc: <fenghua.yu@intel.com>
Cc: <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1378378676-18276-1-git-send-email-huawei.libin@huawei.com
[ tweaked the changelog, removed unnecessary line break, tweaked the format to align the fields vertically. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5223948
@popcornmix popcornmix pushed a commit that referenced this issue Nov 4, 2013
Adrian Hunter + Arnaldo Carvalho de Melo perf annotate: Fix objdump line parsing offset validation
When parsing lines from objdump a line containing source code starting
with a numeric label is mistaken for a line of disassembly starting with
a memory address.

Current validation fails to recognise that the "memory address" is out
of range and calculates an invalid offset which later causes this
segfault:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50)
    at util/annotate.c:631
631				hits += h->addr[offset++];
(gdb) bt
 #0  0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50)
    at util/annotate.c:631
 #1  0x00000000004d65e3 in annotate_browser__calc_percent (browser=0x7fffffffd130, evsel=0xa01da0) at ui/browsers/annotate.c:364
 #2  0x00000000004d7433 in annotate_browser__run (browser=0x7fffffffd130, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:672
 #3  0x00000000004d80c9 in symbol__tui_annotate (sym=0xc989a0, map=0xa02660, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:962
 #4  0x00000000004d7aa0 in hist_entry__tui_annotate (he=0xdf73f0, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:823
 #5  0x00000000004dd648 in perf_evsel__hists_browse (evsel=0xa01da0, nr_events=1, helpline=
    0x58b768 "For a higher level overview, try: perf report --sort comm,dso", ev_name=0xa02cd0 "cycles", left_exits=false, hbt=
    0x0, min_pcnt=0, env=0xa011e0) at ui/browsers/hists.c:1659
 #6  0x00000000004de372 in perf_evlist__tui_browse_hists (evlist=0xa01520, help=
    0x58b768 "For a higher level overview, try: perf report --sort comm,dso", hbt=0x0, min_pcnt=0, env=0xa011e0)
    at ui/browsers/hists.c:1950
 #7  0x000000000042cf6b in __cmd_report (rep=0x7fffffffd6c0) at builtin-report.c:581
 #8  0x000000000042e25d in cmd_report (argc=0, argv=0x7fffffffe4b0, prefix=0x0) at builtin-report.c:965
 #9  0x000000000041a0e1 in run_builtin (p=0x801548, argc=1, argv=0x7fffffffe4b0) at perf.c:319
 #10 0x000000000041a319 in handle_internal_command (argc=1, argv=0x7fffffffe4b0) at perf.c:376
 #11 0x000000000041a465 in run_argv (argcp=0x7fffffffe38c, argv=0x7fffffffe380) at perf.c:420
 #12 0x000000000041a707 in main (argc=1, argv=0x7fffffffe4b0) at perf.c:521

After the fix is applied the symbol can be annotated showing the
problematic line "1:      rep"

copy_user_generic_string  /usr/lib/debug/lib/modules/3.9.10-100.fc17.x86_64/vmlinux
             */
            ENTRY(copy_user_generic_string)
                    CFI_STARTPROC
                    ASM_STAC
                    andl %edx,%edx
              and    %edx,%edx
                    jz 4f
              je     37
                    cmpl $8,%edx
              cmp    $0x8,%edx
                    jb 2f           /* less than 8 bytes, go to byte copy loop */
              jb     33
                    ALIGN_DESTINATION
              mov    %edi,%ecx
              and    $0x7,%ecx
              je     28
              sub    $0x8,%ecx
              neg    %ecx
              sub    %ecx,%edx
        1a:   mov    (%rsi),%al
              mov    %al,(%rdi)
              inc    %rsi
              inc    %rdi
              dec    %ecx
              jne    1a
                    movl %edx,%ecx
        28:   mov    %edx,%ecx
                    shrl $3,%ecx
              shr    $0x3,%ecx
                    andl $7,%edx
              and    $0x7,%edx
            1:      rep
100.00        rep    movsq %ds:(%rsi),%es:(%rdi)
                    movsq
            2:      movl %edx,%ecx
        33:   mov    %edx,%ecx
            3:      rep
              rep    movsb %ds:(%rsi),%es:(%rdi)
                    movsb
            4:      xorl %eax,%eax
        37:   xor    %eax,%eax
              data32 xchg %ax,%ax
                    ASM_CLAC
                    ret
              retq

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1379009721-27667-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
886b37b
@popcornmix popcornmix pushed a commit that referenced this issue Nov 4, 2013
@vineetgarc vineetgarc ARC: Workaround spinlock livelock in SMP SystemC simulation
Some ARC SMP systems lack native atomic R-M-W (LLOCK/SCOND) insns and
can only use atomic EX insn (reg with mem) to build higher level R-M-W
primitives. This includes a SystemC based SMP simulation model.

So rwlocks need to use a protecting spinlock for atomic cmp-n-exchange
operation to update reader(s)/writer count.

The spinlock operation itself looks as follows:

	mov reg, 1		; 1=locked, 0=unlocked
retry:
	EX reg, [lock]		; load existing, store 1, atomically
	BREQ reg, 1, rety	; if already locked, retry

In single-threaded simulation, SystemC alternates between the 2 cores
with "N" insn each based scheduling. Additionally for insn with global
side effect, such as EX writing to shared mem, a core switch is
enforced too.

Given that, 2 cores doing a repeated EX on same location, Linux often
got into a livelock e.g. when both cores were fiddling with tasklist
lock (gdbserver / hackbench) for read/write respectively as the
sequence diagram below shows:

           core1                                   core2
         --------                                --------
1. spin lock [EX r=0, w=1] - LOCKED
2. rwlock(Read)            - LOCKED
3. spin unlock  [ST 0]     - UNLOCKED
                                         spin lock [EX r=0,w=1] - LOCKED
                      -- resched core 1----

5. spin lock [EX r=1] - ALREADY-LOCKED

                      -- resched core 2----
6.                                       rwlock(Write) - READER-LOCKED
7.                                       spin unlock [ST 0]
8.                                       rwlock failed, retry again

9.                                       spin lock  [EX r=0, w=1]
                      -- resched core 1----

10  spinlock locked in #9, retry #5
11. spin lock [EX gets 1]
                      -- resched core 2----
...
...

The fix was to unlock using the EX insn too (step 7), to trigger another
SystemC scheduling pass which would let core1 proceed, eliding the
livelock.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
6c00350
@popcornmix popcornmix pushed a commit that referenced this issue Nov 4, 2013
John Johansen + James Morris apparmor: fix suspicious RCU usage warning in policy.c/policy.h
The recent 3.12 pull request for apparmor was missing a couple rcu _protected
access modifiers. Resulting in the follow suspicious RCU usage

 [   29.804534] [ INFO: suspicious RCU usage. ]
 [   29.804539] 3.11.0+ #5 Not tainted
 [   29.804541] -------------------------------
 [   29.804545] security/apparmor/include/policy.h:363 suspicious rcu_dereference_check() usage!
 [   29.804548]
 [   29.804548] other info that might help us debug this:
 [   29.804548]
 [   29.804553]
 [   29.804553] rcu_scheduler_active = 1, debug_locks = 1
 [   29.804558] 2 locks held by apparmor_parser/1268:
 [   29.804560]  #0:  (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29
 [   29.804576]  #1:  (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c
 [   29.804589]
 [   29.804589] stack backtrace:
 [   29.804595] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
 [   29.804599] Hardware name: ASUSTeK Computer Inc.         UL50VT          /UL50VT    , BIOS 217     03/01/2010
 [   29.804602]  0000000000000000 ffff8800b95a1d90 ffffffff8144eb9b ffff8800b94db540
 [   29.804611]  ffff8800b95a1dc0 ffffffff81087439 ffff880138cc3a18 ffff880138cc3a18
 [   29.804619]  ffff8800b9464a90 ffff880138cc3a38 ffff8800b95a1df0 ffffffff811f5084
 [   29.804628] Call Trace:
 [   29.804636]  [<ffffffff8144eb9b>] dump_stack+0x4e/0x82
 [   29.804642]  [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105
 [   29.804649]  [<ffffffff811f5084>] __aa_update_replacedby+0x53/0x7f
 [   29.804655]  [<ffffffff811f5408>] __replace_profile+0x11f/0x1ed
 [   29.804661]  [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c
 [   29.804668]  [<ffffffff811f16d4>] profile_replace+0x35/0x4c
 [   29.804674]  [<ffffffff81120fa3>] vfs_write+0xad/0x113
 [   29.804680]  [<ffffffff81121609>] SyS_write+0x44/0x7a
 [   29.804687]  [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b
 [   29.804691]
 [   29.804694] ===============================
 [   29.804697] [ INFO: suspicious RCU usage. ]
 [   29.804700] 3.11.0+ #5 Not tainted
 [   29.804703] -------------------------------
 [   29.804706] security/apparmor/policy.c:566 suspicious rcu_dereference_check() usage!
 [   29.804709]
 [   29.804709] other info that might help us debug this:
 [   29.804709]
 [   29.804714]
 [   29.804714] rcu_scheduler_active = 1, debug_locks = 1
 [   29.804718] 2 locks held by apparmor_parser/1268:
 [   29.804721]  #0:  (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29
 [   29.804733]  #1:  (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c
 [   29.804744]
 [   29.804744] stack backtrace:
 [   29.804750] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
 [   29.804753] Hardware name: ASUSTeK Computer Inc.         UL50VT          /UL50VT    , BIOS 217     03/01/2010
 [   29.804756]  0000000000000000 ffff8800b95a1d80 ffffffff8144eb9b ffff8800b94db540
 [   29.804764]  ffff8800b95a1db0 ffffffff81087439 ffff8800b95b02b0 0000000000000000
 [   29.804772]  ffff8800b9efba08 ffff880138cc3a38 ffff8800b95a1dd0 ffffffff811f4f94
 [   29.804779] Call Trace:
 [   29.804786]  [<ffffffff8144eb9b>] dump_stack+0x4e/0x82
 [   29.804791]  [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105
 [   29.804798]  [<ffffffff811f4f94>] aa_free_replacedby_kref+0x4d/0x62
 [   29.804804]  [<ffffffff811f4f47>] ? aa_put_namespace+0x17/0x17
 [   29.804810]  [<ffffffff811f4f0b>] kref_put+0x36/0x40
 [   29.804816]  [<ffffffff811f5423>] __replace_profile+0x13a/0x1ed
 [   29.804822]  [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c
 [   29.804829]  [<ffffffff811f16d4>] profile_replace+0x35/0x4c
 [   29.804835]  [<ffffffff81120fa3>] vfs_write+0xad/0x113
 [   29.804840]  [<ffffffff81121609>] SyS_write+0x44/0x7a
 [   29.804847]  [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b

Reported-by: miles.lane@gmail.com
CC: paulmck@linux.vnet.ibm.com
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
4cd4fc7
@popcornmix popcornmix pushed a commit that referenced this issue Nov 4, 2013
Dave Chinner + Ben Myers xfs: lockdep needs to know about 3 dquot-deep nesting
Michael Semon reported that xfs/299 generated this lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

but task is already holding lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xfs_dquot_other_class);
  lock(&xfs_dquot_other_class);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by touch/21072:
 #0:  (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
 #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
 #2:  (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
 #3:  (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
 #4:  (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
 #5:  (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
 #6:  (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit f112a04)
89c6c89
@popcornmix popcornmix pushed a commit that referenced this issue Nov 4, 2013
@ldorau @htejun ldorau + htejun libahci: fix turning on LEDs in ahci_start_port()
If EM Transmit bit is busy during init ata_msleep() is called.  It is
wrong - msleep() should be used instead of ata_msleep(), because if EM
Transmit bit is busy for one port, it will be busy for all other ports
too, so using ata_msleep() causes wasting tries for another ports.

The most common scenario looks like that now
(six ports try to transmit a LED meaasege):
- port #0 tries for the 1st time and succeeds
- ports #1-5 try for the 1st time and sleeps
- port #1 tries for the 2nd time and succeeds
- ports #2-5 try for the 2nd time and sleeps
- port #2 tries for the 3rd time and succeeds
- ports #3-5 try for the 3rd time and sleeps
- port #3 tries for the 4th time and succeeds
- ports #4-5 try for the 4th time and sleeps
- port #4 tries for the 5th time and succeeds
- port #5 tries for the 5th time and sleeps

At this moment port #5 wasted all its five tries and failed to
initialize.  Because there are only 5 (EM_MAX_RETRY) tries available
usually only five ports succeed to initialize. The sixth port and next
ones usually will fail.

If msleep() is used instead of ata_msleep() the first port succeeds to
initialize in the first try and next ones usually succeed to
initialize in the second try.

tj: updated comment

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
fa070ee
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
Borislav Petkov + Ingo Molnar x86: Improve the printout of the SMP bootup CPU table
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
646e29a
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
Dave Chinner + Ben Myers xfs: lockdep needs to know about 3 dquot-deep nesting
Michael Semon reported that xfs/299 generated this lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

but task is already holding lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xfs_dquot_other_class);
  lock(&xfs_dquot_other_class);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by touch/21072:
 #0:  (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
 #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
 #2:  (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
 #3:  (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
 #4:  (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
 #5:  (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
 #6:  (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
f112a04
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
Borislav Petkov + Ingo Molnar x86/boot: Further compress CPUs bootup message
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
a17bce4
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
@gregkh Fabio Estevam + gregkh imx-drm: imx-drm-core: Fix circular locking dependency
Booting a mx6 with CONFIG_PROVE_LOCKING we get:

======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0-rc4-next-20131009+ #34 Not tainted
-------------------------------------------------------
swapper/0/1 is trying to acquire lock:
 (&imx_drm_device->mutex){+.+.+.}, at: [<804575a8>] imx_drm_encoder_get_mux_id+0x28/0x98

but task is already holding lock:
 (&crtc->mutex){+.+...}, at: [<802fe778>] drm_modeset_lock_all+0x40/0x54

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&crtc->mutex){+.+...}:
       [<800777d0>] __lock_acquire+0x18d4/0x1c24
       [<80077fec>] lock_acquire+0x68/0x7c
       [<805ead5c>] _mutex_lock_nest_lock+0x58/0x3a8
       [<802fec50>] drm_crtc_init+0x48/0xa8
       [<80457c88>] imx_drm_add_crtc+0xd4/0x144
       [<8045e2e8>] ipu_drm_probe+0x114/0x1fc
       [<80312278>] platform_drv_probe+0x20/0x50
       [<80310c68>] driver_probe_device+0x110/0x22c
       [<80310e20>] __driver_attach+0x9c/0xa0
       [<8030f218>] bus_for_each_dev+0x5c/0x90
       [<80310750>] driver_attach+0x20/0x28
       [<8031034c>] bus_add_driver+0xdc/0x1dc
       [<803114d8>] driver_register+0x80/0xfc
       [<80312198>] __platform_driver_register+0x50/0x64
       [<808172fc>] ipu_drm_driver_init+0x18/0x20
       [<800088c0>] do_one_initcall+0xfc/0x160
       [<807e7c5c>] kernel_init_freeable+0x104/0x1d4
       [<805e2930>] kernel_init+0x10/0xec
       [<8000ea68>] ret_from_fork+0x14/0x2c

-> #1 (&dev->mode_config.mutex){+.+.+.}:
       [<800777d0>] __lock_acquire+0x18d4/0x1c24
       [<80077fec>] lock_acquire+0x68/0x7c
       [<805eb100>] mutex_lock_nested+0x54/0x3a4
       [<802fe758>] drm_modeset_lock_all+0x20/0x54
       [<802fead4>] drm_encoder_init+0x20/0x7c
       [<80457ae4>] imx_drm_add_encoder+0x88/0xec
       [<80459838>] imx_ldb_probe+0x344/0x4fc
       [<80312278>] platform_drv_probe+0x20/0x50
       [<80310c68>] driver_probe_device+0x110/0x22c
       [<80310e20>] __driver_attach+0x9c/0xa0
       [<8030f218>] bus_for_each_dev+0x5c/0x90
       [<80310750>] driver_attach+0x20/0x28
       [<8031034c>] bus_add_driver+0xdc/0x1dc
       [<803114d8>] driver_register+0x80/0xfc
       [<80312198>] __platform_driver_register+0x50/0x64
       [<8081722c>] imx_ldb_driver_init+0x18/0x20
       [<800088c0>] do_one_initcall+0xfc/0x160
       [<807e7c5c>] kernel_init_freeable+0x104/0x1d4
       [<805e2930>] kernel_init+0x10/0xec
       [<8000ea68>] ret_from_fork+0x14/0x2c

-> #0 (&imx_drm_device->mutex){+.+.+.}:
       [<805e510c>] print_circular_bug+0x74/0x2e0
       [<80077ad0>] __lock_acquire+0x1bd4/0x1c24
       [<80077fec>] lock_acquire+0x68/0x7c
       [<805eb100>] mutex_lock_nested+0x54/0x3a4
       [<804575a8>] imx_drm_encoder_get_mux_id+0x28/0x98
       [<80459a98>] imx_ldb_encoder_prepare+0x34/0x114
       [<802ef724>] drm_crtc_helper_set_mode+0x1f0/0x4c0
       [<802f0344>] drm_crtc_helper_set_config+0x828/0x99c
       [<802ff270>] drm_mode_set_config_internal+0x5c/0xdc
       [<802eebe0>] drm_fb_helper_set_par+0x50/0xb4
       [<802af580>] fbcon_init+0x490/0x500
       [<802dd104>] visual_init+0xa8/0xf8
       [<802df414>] do_bind_con_driver+0x140/0x37c
       [<802df764>] do_take_over_console+0x114/0x1c4
       [<802af65c>] do_fbcon_takeover+0x6c/0xd4
       [<802b2b30>] fbcon_event_notify+0x7c8/0x818
       [<80049954>] notifier_call_chain+0x4c/0x8c
       [<80049cd8>] __blocking_notifier_call_chain+0x50/0x68
       [<80049d10>] blocking_notifier_call_chain+0x20/0x28
       [<802a75f0>] fb_notifier_call_chain+0x1c/0x24
       [<802a9224>] register_framebuffer+0x188/0x268
       [<802ee994>] drm_fb_helper_initial_config+0x2bc/0x4b8
       [<802f118c>] drm_fbdev_cma_init+0x7c/0xec
       [<80817288>] imx_fb_helper_init+0x54/0x90
       [<800088c0>] do_one_initcall+0xfc/0x160
       [<807e7c5c>] kernel_init_freeable+0x104/0x1d4
       [<805e2930>] kernel_init+0x10/0xec
       [<8000ea68>] ret_from_fork+0x14/0x2c

other info that might help us debug this:

Chain exists of:
  &imx_drm_device->mutex --> &dev->mode_config.mutex --> &crtc->mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&crtc->mutex);
                               lock(&dev->mode_config.mutex);
                               lock(&crtc->mutex);
  lock(&imx_drm_device->mutex);

 *** DEADLOCK ***

6 locks held by swapper/0/1:
 #0:  (registration_lock){+.+.+.}, at: [<802a90bc>] register_framebuffer+0x20/0x268
 #1:  (&fb_info->lock){+.+.+.}, at: [<802a7a90>] lock_fb_info+0x20/0x44
 #2:  (console_lock){+.+.+.}, at: [<802a9218>] register_framebuffer+0x17c/0x268
 #3:  ((fb_notifier_list).rwsem){.+.+.+}, at: [<80049cbc>] __blocking_notifier_call_chain+0x34/0x68
 #4:  (&dev->mode_config.mutex){+.+.+.}, at: [<802fe758>] drm_modeset_lock_all+0x20/0x54
 #5:  (&crtc->mutex){+.+...}, at: [<802fe778>] drm_modeset_lock_all+0x40/0x54

In order to avoid this lockdep warning, remove the locking from
imx_drm_encoder_get_mux_id() and imx_drm_crtc_panel_format_pins().

Tested on a mx6sabrelite and mx53qsb.

Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
c35d6a3
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
@rostedt rostedt