bcm_dma_wait_idle not exported? #26

Closed
simonjhall opened this Issue May 18, 2012 · 3 comments

@simonjhall

Hi there,

I'm using the BCM DMA functions from inside a kernel module.
I'm getting a run-time link error because the bcm_dma_wait_idle symbol does not appear to be exported. All the other interesting DMA functions are. Is this by design, or is it just an omission?

If it is by design, what is the best way of waiting for a DMA transfer to finish? (IRQ?)

Kind regards,
Simon

@popcornmix

That's just an omission, as no-one is currently using that function.
Note the comment: this function busy-waits, so it will hog the CPU.
This really needs a down_interruptible that is triggered from the end-of-DMA interrupt.
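
A minimal sketch of the blocking wait suggested above, written as a user-space analogy rather than kernel code. A POSIX semaphore stands in for the kernel semaphore, sem_wait() for down_interruptible(), and a thread for the end-of-DMA interrupt handler. The names fake_dma_chan, fake_dma_irq, fake_dma_wait and run_fake_transfer are hypothetical and are not part of the BCM DMA driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical stand-in for one DMA channel; not the BCM driver's type. */
struct fake_dma_chan {
    sem_t done;        /* posted once by the simulated interrupt handler */
    int   transferred; /* set by the handler before it posts */
};

/* Plays the role of the end-of-DMA interrupt handler. */
static void *fake_dma_irq(void *arg)
{
    struct fake_dma_chan *ch = arg;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    nanosleep(&ts, NULL);   /* pretend the transfer takes about 10 ms */
    ch->transferred = 1;
    sem_post(&ch->done);    /* kernel analogue: up() */
    return NULL;
}

/* Sleeps until the handler posts. Returns 0 on success, -1 on error. */
int fake_dma_wait(struct fake_dma_chan *ch)
{
    assert(ch != NULL);
    while (sem_wait(&ch->done) != 0) {  /* kernel analogue: down_interruptible() */
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal: this sketch retries; a driver might return. */
    }
    return 0;
}

/* Starts a fake transfer, waits for it, and returns 1 if it completed. */
int run_fake_transfer(void)
{
    struct fake_dma_chan ch = { .transferred = 0 };
    pthread_t irq;
    int ok;

    if (sem_init(&ch.done, 0, 0) != 0)
        return 0;
    if (pthread_create(&irq, NULL, fake_dma_irq, &ch) != 0) {
        sem_destroy(&ch.done);
        return 0;
    }
    ok = (fake_dma_wait(&ch) == 0) && ch.transferred;
    pthread_join(irq, NULL);
    sem_destroy(&ch.done);
    return ok;
}
```

Calling run_fake_transfer() returns 1 once the simulated handler has posted the semaphore. The waiting thread sleeps in the meantime instead of spinning, which is the point of the suggestion.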

@simonjhall

Ok cool cheers.
I'd seen the comment and was wondering if it's worth sticking a schedule() call into the busy loop? (Although I heard that the pre-emptive kernel doesn't work.)
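
A rough user-space sketch of the "yield inside the busy loop" idea, assuming a simple status flag in place of the real DMA status register. sched_yield() stands in for the kernel's schedule(). The type fake_dma_status, the function wait_idle_yielding and the max_polls bound are all hypothetical additions of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical status word; a real driver would read a DMA register instead. */
struct fake_dma_status {
    atomic_bool active;
};

/*
 * Polls until the channel goes idle or max_polls polls have been made.
 * Returns the number of polls used, or -1 if the channel never went idle.
 */
int wait_idle_yielding(struct fake_dma_status *st, int max_polls)
{
    assert(st != NULL && max_polls > 0);
    for (int polls = 1; polls <= max_polls; polls++) {
        if (!atomic_load(&st->active))
            return polls;
        sched_yield();  /* give other runnable tasks a turn before polling again */
    }
    return -1;
}
```

This still polls, so it keeps the CPU busy whenever nothing else is runnable. It only makes the loop friendlier to other tasks; it does not replace an interrupt-driven wait.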

@popcornmix pushed a commit that referenced this issue Aug 1, 2012
@elp: cfg80211: fix potential deadlock in regulatory
commit fe20b39 upstream.

reg_timeout_work() calls restore_regulatory_settings() which
takes cfg80211_mutex.

reg_set_request_processed() already holds cfg80211_mutex
before calling cancel_delayed_work_sync(reg_timeout),
so it might deadlock.

Call the async cancel_delayed_work instead, in order
to avoid the potential deadlock.

This is the relevant lockdep warning:

cfg80211: Calling CRDA for country: XX

======================================================
[ INFO: possible circular locking dependency detected ]
3.4.0-rc5-wl+ #26 Not tainted
-------------------------------------------------------
kworker/0:2/1391 is trying to acquire lock:
 (cfg80211_mutex){+.+.+.}, at: [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211]

but task is already holding lock:
 ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 ((reg_timeout).work){+.+...}:
       [<c008fd44>] validate_chain+0xb94/0x10f0
       [<c0090b68>] __lock_acquire+0x8c8/0x9b0
       [<c0090d40>] lock_acquire+0xf0/0x114
       [<c005b600>] wait_on_work+0x4c/0x154
       [<c005c000>] __cancel_work_timer+0xd4/0x11c
       [<c005c064>] cancel_delayed_work_sync+0x1c/0x20
       [<bf28b274>] reg_set_request_processed+0x50/0x78 [cfg80211]
       [<bf28bd84>] set_regdom+0x550/0x600 [cfg80211]
       [<bf294cd8>] nl80211_set_reg+0x218/0x258 [cfg80211]
       [<c03c7738>] genl_rcv_msg+0x1a8/0x1e8
       [<c03c6a00>] netlink_rcv_skb+0x5c/0xc0
       [<c03c7584>] genl_rcv+0x28/0x34
       [<c03c6720>] netlink_unicast+0x15c/0x228
       [<c03c6c7c>] netlink_sendmsg+0x218/0x298
       [<c03933c8>] sock_sendmsg+0xa4/0xc0
       [<c039406c>] __sys_sendmsg+0x1e4/0x268
       [<c0394228>] sys_sendmsg+0x4c/0x70
       [<c0013840>] ret_fast_syscall+0x0/0x3c

-> #1 (reg_mutex){+.+.+.}:
       [<c008fd44>] validate_chain+0xb94/0x10f0
       [<c0090b68>] __lock_acquire+0x8c8/0x9b0
       [<c0090d40>] lock_acquire+0xf0/0x114
       [<c04734dc>] mutex_lock_nested+0x48/0x320
       [<bf28b2cc>] reg_todo+0x30/0x538 [cfg80211]
       [<c0059f44>] process_one_work+0x2a0/0x480
       [<c005a4b4>] worker_thread+0x1bc/0x2bc
       [<c0061148>] kthread+0x98/0xa4
       [<c0014af4>] kernel_thread_exit+0x0/0x8

-> #0 (cfg80211_mutex){+.+.+.}:
       [<c008ed58>] print_circular_bug+0x68/0x2cc
       [<c008fb28>] validate_chain+0x978/0x10f0
       [<c0090b68>] __lock_acquire+0x8c8/0x9b0
       [<c0090d40>] lock_acquire+0xf0/0x114
       [<c04734dc>] mutex_lock_nested+0x48/0x320
       [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211]
       [<bf28b200>] reg_timeout_work+0x1c/0x20 [cfg80211]
       [<c0059f44>] process_one_work+0x2a0/0x480
       [<c005a4b4>] worker_thread+0x1bc/0x2bc
       [<c0061148>] kthread+0x98/0xa4
       [<c0014af4>] kernel_thread_exit+0x0/0x8

other info that might help us debug this:

Chain exists of:
  cfg80211_mutex --> reg_mutex --> (reg_timeout).work

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((reg_timeout).work);
                               lock(reg_mutex);
                               lock((reg_timeout).work);
  lock(cfg80211_mutex);

 *** DEADLOCK ***

2 locks held by kworker/0:2/1391:
 #0:  (events){.+.+.+}, at: [<c0059e94>] process_one_work+0x1f0/0x480
 #1:  ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480

stack backtrace:
[<c001b928>] (unwind_backtrace+0x0/0x12c) from [<c0471d3c>] (dump_stack+0x20/0x24)
[<c0471d3c>] (dump_stack+0x20/0x24) from [<c008ef70>] (print_circular_bug+0x280/0x2cc)
[<c008ef70>] (print_circular_bug+0x280/0x2cc) from [<c008fb28>] (validate_chain+0x978/0x10f0)
[<c008fb28>] (validate_chain+0x978/0x10f0) from [<c0090b68>] (__lock_acquire+0x8c8/0x9b0)
[<c0090b68>] (__lock_acquire+0x8c8/0x9b0) from [<c0090d40>] (lock_acquire+0xf0/0x114)
[<c0090d40>] (lock_acquire+0xf0/0x114) from [<c04734dc>] (mutex_lock_nested+0x48/0x320)
[<c04734dc>] (mutex_lock_nested+0x48/0x320) from [<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211])
[<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211]) from [<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211])
[<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211]) from [<c0059f44>] (process_one_work+0x2a0/0x480)
[<c0059f44>] (process_one_work+0x2a0/0x480) from [<c005a4b4>] (worker_thread+0x1bc/0x2bc)
[<c005a4b4>] (worker_thread+0x1bc/0x2bc) from [<c0061148>] (kthread+0x98/0xa4)
[<c0061148>] (kthread+0x98/0xa4) from [<c0014af4>] (kernel_thread_exit+0x0/0x8)
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
d120768
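
The deadlock pattern described in the commit message above can be sketched in user space, assuming a pthread mutex in place of cfg80211_mutex and a thread in place of the reg_timeout work item. All names here (regulatory_sim, timeout_worker, cancel_without_waiting) are hypothetical. The cancelled flag is an assumption of this sketch; the commit itself only switches to a cancel call that does not wait for the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins: big_lock plays cfg80211_mutex. */
struct regulatory_sim {
    pthread_mutex_t big_lock;
    bool cancelled;  /* protected by big_lock */
    bool restored;   /* protected by big_lock; set if the worker body ran */
};

/* Plays reg_timeout_work(): it needs big_lock before it can do anything. */
static void *timeout_worker(void *arg)
{
    struct regulatory_sim *sim = arg;

    pthread_mutex_lock(&sim->big_lock);  /* blocks while the canceller holds it */
    if (!sim->cancelled)
        sim->restored = true;            /* stands in for restore_regulatory_settings() */
    pthread_mutex_unlock(&sim->big_lock);
    return NULL;
}

/*
 * Cancels the worker without waiting for it while big_lock is held.
 * Returns true if the worker's body was skipped.
 */
bool cancel_without_waiting(void)
{
    struct regulatory_sim sim = { .cancelled = false, .restored = false };
    pthread_t worker;
    bool skipped;

    assert(pthread_mutex_init(&sim.big_lock, NULL) == 0);
    pthread_mutex_lock(&sim.big_lock);
    if (pthread_create(&worker, NULL, timeout_worker, &sim) != 0) {
        pthread_mutex_unlock(&sim.big_lock);
        pthread_mutex_destroy(&sim.big_lock);
        return false;
    }

    /*
     * Joining the worker here, with big_lock still held, would hang:
     * the worker cannot finish until it gets big_lock. That is the
     * shape of the synchronous-cancel deadlock.
     */
    sim.cancelled = true;                /* asynchronous cancel: mark and move on */
    pthread_mutex_unlock(&sim.big_lock);

    pthread_join(worker, NULL);          /* safe now that the lock is released */
    skipped = !sim.restored;
    pthread_mutex_destroy(&sim.big_lock);
    return skipped;
}
```

Because the flag is set before the lock is released, the worker always observes the cancellation, so cancel_without_waiting() returns true without blocking on the worker while the lock is held.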
@popcornmix pushed a commit that referenced this issue Oct 13, 2012
Luck, Tony: x86: Remove some noise from boot log when starting cpus
Printing the "start_ip" for every secondary cpu is very noisy on a large
system - and doesn't add any value. Drop this message.

Console log before:
Booting Node   0, Processors  #1
smpboot cpu 1: start_ip = 96000
 #2
smpboot cpu 2: start_ip = 96000
 #3
smpboot cpu 3: start_ip = 96000
 #4
smpboot cpu 4: start_ip = 96000
       ...
 #31
smpboot cpu 31: start_ip = 96000
Brought up 32 CPUs

Console log after:
Booting Node   0, Processors  #1 #2 #3 #4 #5 #6 #7 Ok.
Booting Node   1, Processors  #8 #9 #10 #11 #12 #13 #14 #15 Ok.
Booting Node   0, Processors  #16 #17 #18 #19 #20 #21 #22 #23 Ok.
Booting Node   1, Processors  #24 #25 #26 #27 #28 #29 #30 #31
Brought up 32 CPUs

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
140f190
@popcornmix pushed a commit that referenced this issue Oct 13, 2012
@elp: cfg80211: fix potential deadlock in regulatory
reg_timeout_work() calls restore_regulatory_settings() which
takes cfg80211_mutex.

reg_set_request_processed() already holds cfg80211_mutex
before calling cancel_delayed_work_sync(reg_timeout),
so it might deadlock.

Call the async cancel_delayed_work instead, in order
to avoid the potential deadlock.

Cc: stable@kernel.org
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
fe20b39
@ghaskins added a commit to ghaskins/raspberrypi-rt that referenced this issue Feb 20, 2013
Thomas Gleixner: Subject: net-flip-lock-dep-thingy.patch
=======================================================
[ INFO: possible circular locking dependency detected ]
3.0.0-rc3+ #26
-------------------------------------------------------
ip/1104 is trying to acquire lock:
 (local_softirq_lock){+.+...}, at: [<ffffffff81056d12>] __local_lock+0x25/0x68

but task is already holding lock:
 (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_INET){+.+...}:
       [<ffffffff810836e5>] lock_acquire+0x103/0x12e
       [<ffffffff813e2781>] lock_sock_nested+0x82/0x92
       [<ffffffff81433308>] lock_sock+0x10/0x12
       [<ffffffff81433afa>] tcp_close+0x1b/0x355
       [<ffffffff81453c99>] inet_release+0xc3/0xcd
       [<ffffffff813dff3f>] sock_release+0x1f/0x74
       [<ffffffff813dffbb>] sock_close+0x27/0x2b
       [<ffffffff81129c63>] fput+0x11d/0x1e3
       [<ffffffff81126577>] filp_close+0x70/0x7b
       [<ffffffff8112667a>] sys_close+0xf8/0x13d
       [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

-> #0 (local_softirq_lock){+.+...}:
       [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
       [<ffffffff810836e5>] lock_acquire+0x103/0x12e
       [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
       [<ffffffff81056d12>] __local_lock+0x25/0x68
       [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
       [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
       [<ffffffff81433c38>] tcp_close+0x159/0x355
       [<ffffffff81453c99>] inet_release+0xc3/0xcd
       [<ffffffff813dff3f>] sock_release+0x1f/0x74
       [<ffffffff813dffbb>] sock_close+0x27/0x2b
       [<ffffffff81129c63>] fput+0x11d/0x1e3
       [<ffffffff81126577>] filp_close+0x70/0x7b
       [<ffffffff8112667a>] sys_close+0xf8/0x13d
       [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET);
                               lock(local_softirq_lock);
                               lock(sk_lock-AF_INET);
  lock(local_softirq_lock);

 *** DEADLOCK ***

1 lock held by ip/1104:
 #0:  (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

stack backtrace:
Pid: 1104, comm: ip Not tainted 3.0.0-rc3+ #26
Call Trace:
 [<ffffffff81081649>] print_circular_bug+0x1f8/0x209
 [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff810836e5>] lock_acquire+0x103/0x12e
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c75>] ? get_parent_ip+0x11/0x41
 [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c8c>] ? get_parent_ip+0x28/0x41
 [<ffffffff81056d12>] __local_lock+0x25/0x68
 [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
 [<ffffffff81433308>] ? lock_sock+0x10/0x12
 [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
 [<ffffffff81433c38>] tcp_close+0x159/0x355
 [<ffffffff81453c99>] inet_release+0xc3/0xcd
 [<ffffffff813dff3f>] sock_release+0x1f/0x74
 [<ffffffff813dffbb>] sock_close+0x27/0x2b
 [<ffffffff81129c63>] fput+0x11d/0x1e3
 [<ffffffff81126577>] filp_close+0x70/0x7b
 [<ffffffff8112667a>] sys_close+0xf8/0x13d
 [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b


Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
aac678f
@swarren pushed a commit to swarren/linux-rpi that referenced this issue May 8, 2013
@Kev3354: [media] media/rc/imon.c: kill urb when send_packet() is interrupted
This avoids:
Apr 12 23:52:16 homeserver kernel: imon:send_packet: task interrupted
Apr 12 23:52:16 homeserver kernel: ------------[ cut here ]------------
Apr 12 23:52:16 homeserver kernel: WARNING: at drivers/usb/core/urb.c:327 usb_submit_urb+0x353/0x370()
Apr 12 23:52:16 homeserver kernel: Hardware name: Unknow
Apr 12 23:52:16 homeserver kernel: URB f64b6f00 submitted while active
Apr 12 23:52:16 homeserver kernel: Modules linked in:
Apr 12 23:52:16 homeserver kernel: Pid: 3154, comm: LCDd Not tainted 3.8.6-htpc-00005-g9e6fc5e #26
Apr 12 23:52:16 homeserver kernel: Call Trace:
Apr 12 23:52:16 homeserver kernel: [<c012d778>] ? warn_slowpath_common+0x78/0xb0
Apr 12 23:52:16 homeserver kernel: [<c04136c3>] ? usb_submit_urb+0x353/0x370
Apr 12 23:52:16 homeserver kernel: [<c04136c3>] ? usb_submit_urb+0x353/0x370
Apr 12 23:52:16 homeserver kernel: [<c0447010>] ? imon_ir_change_protocol+0x150/0x150
Apr 12 23:52:16 homeserver kernel: [<c012d843>] ? warn_slowpath_fmt+0x33/0x40
Apr 12 23:52:16 homeserver kernel: [<c04136c3>] ? usb_submit_urb+0x353/0x370
Apr 12 23:52:16 homeserver kernel: [<c0446c67>] ? send_packet+0x97/0x270
Apr 12 23:52:16 homeserver kernel: [<c0446cfe>] ? send_packet+0x12e/0x270
Apr 12 23:52:16 homeserver kernel: [<c05c5743>] ? do_nanosleep+0xa3/0xd0
Apr 12 23:52:16 homeserver kernel: [<c044760e>] ? vfd_write+0xae/0x250
Apr 12 23:52:16 homeserver kernel: [<c0447560>] ? lcd_write+0x180/0x180
Apr 12 23:52:16 homeserver kernel: [<c01b2b19>] ? vfs_write+0x89/0x140
Apr 12 23:52:16 homeserver kernel: [<c01b2dda>] ? sys_write+0x4a/0x90
Apr 12 23:52:16 homeserver kernel: [<c05c7c45>] ? sysenter_do_call+0x12/0x26
Apr 12 23:52:16 homeserver kernel: ---[ end trace a0b6f0fcfd2f9a1d ]---
Apr 12 23:52:16 homeserver kernel: imon:send_packet: error submitting urb(-16)
Apr 12 23:52:16 homeserver kernel: imon:vfd_write: send packet #3 failed
Apr 12 23:52:16 homeserver kernel: imon:send_packet: error submitting urb(-16)
Apr 12 23:52:16 homeserver kernel: imon:vfd_write: send packet #0 failed

Signed-off-by: Kevin Baradon <kevin.baradon@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
5f3f254
@swarren pushed a commit to swarren/linux-rpi that referenced this issue May 8, 2013
Mauro Carvalho Chehab: [media] em28xx: fix oops at em28xx_dvb_bus_ctrl()
em28xx is oopsing with some DVB devices:

[10856.061884] general protection fault: 0000 [#1] SMP
[10856.067041] Modules linked in: rc_hauppauge em28xx_rc xc5000 drxk em28xx_dvb dvb_core em28xx videobuf2_vmalloc videobuf2_memops videobuf2_core rc_pixelview_new tuner_xc2028 tuner cx8800 cx88xx tveeprom btcx_risc videobuf_dma_sg videobuf_core rc_core v4l2_common videodev ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM be2iscsi iscsi_boot_sysfs iptable_mangle bnx2i cnic uio cxgb4i cxgb4 tun bridge cxgb3i cxgb3 stp ip6t_REJECT mdio libcxgbi nf_conntrack_ipv6 llc nf_defrag_ipv6 ib_iser rdma_cm ib_addr xt_conntrack iw_cm ib_cm ib_sa nf_conntrack ib_mad ib_core bnep bluetooth iscsi_tcp libiscsi_tcp ip6table_filter libiscsi ip6_tables scsi_transport_iscsi xfs libcrc32c snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm tg3 snd_page_alloc snd_timer
[10856.139176]  snd ptp iTCO_wdt soundcore pps_core iTCO_vendor_support lpc_ich mfd_core coretemp nfsd hp_wmi crc32c_intel microcode serio_raw rfkill sparse_keymap nfs_acl lockd sunrpc kvm_intel kvm uinput binfmt_misc firewire_ohci nouveau mxm_wmi i2c_algo_bit drm_kms_helper firewire_core crc_itu_t ttm drm i2c_core wmi [last unloaded: dib0070]
[10856.168969] CPU 1
[10856.170799] Pid: 13606, comm: dvbv5-zap Not tainted 3.9.0-rc5+ #26 Hewlett-Packard HP Z400 Workstation/0AE4h
[10856.181187] RIP: 0010:[<ffffffffa0459e47>]  [<ffffffffa0459e47>] em28xx_write_regs_req+0x37/0x1c0 [em28xx]
[10856.191028] RSP: 0018:ffff880118401a58  EFLAGS: 00010282
[10856.196533] RAX: 00020000012d0000 RBX: ffff88010804aec8 RCX: ffff880118401b14
[10856.203852] RDX: 0000000000000048 RSI: 0000000000000000 RDI: ffff88010804aec8
[10856.211174] RBP: ffff880118401ac8 R08: 0000000000000001 R09: 0000000000000000
[10856.218496] R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000048
[10856.226026] R13: ffff880118401b14 R14: ffff88011752b258 R15: ffff88011752b258
[10856.233352] FS:  00007f26636d2740(0000) GS:ffff88011fc20000(0000) knlGS:0000000000000000
[10856.241626] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10856.247565] CR2: 00007f2663716e20 CR3: 00000000c7eb1000 CR4: 00000000000007e0
[10856.254889] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10856.262215] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[10856.269542] Process dvbv5-zap (pid: 13606, threadinfo ffff880118400000, task ffff8800cd625d40)
[10856.278340] Stack:
[10856.280564]  ffff88011ffe8de8 0000000000000002 0000000000000000 ffff88011ffe9b00
[10856.288191]  ffff880118401b14 00ff88011ffe9b08 ffff880100000048 ffffffff8112a52a
[10856.295893]  0000000000000001 ffff88010804aec8 0000000000000048 ffff880118401b14
[10856.303521] Call Trace:
[10856.306182]  [<ffffffff8112a52a>] ? __alloc_pages_nodemask+0x15a/0x960
[10856.312912]  [<ffffffffa045a002>] em28xx_write_regs+0x32/0xa0 [em28xx]
[10856.319638]  [<ffffffffa045a221>] em28xx_write_reg+0x21/0x30 [em28xx]
[10856.326279]  [<ffffffffa045a2cc>] em28xx_gpio_set+0x9c/0x100 [em28xx]
[10856.332919]  [<ffffffffa045a3ac>] em28xx_set_mode+0x7c/0x80 [em28xx]
[10856.339472]  [<ffffffffa03ef032>] em28xx_dvb_bus_ctrl+0x32/0x40 [em28xx_dvb]

This is caused by commit c7a45e5,
that added support for two I2C buses. A partial fix was applied
at 3de09fb, but it doesn't cover
all cases, as the DVB core fills fe->dvb->priv with adapter->priv.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
a3b6020
@ghollingworth

Assuming this is now fixed? It's been a while!

@popcornmix pushed a commit that referenced this issue Nov 4, 2013
@aakoskin: ARM: OMAP2: gpmc-onenand: fix sync mode setup with DT
With DT-based boot, the GPMC OneNAND sync mode setup does not work
correctly. During the async mode setup, sync flags get incorrectly
set in the onenand_async data and the system crashes during the async
setup. Also, the sync mode never gets set in gpmc_onenand_data->flags, so
even without the crash, the actual sync mode setup would never be called.

The patch fixes this by adjusting the gpmc_onenand_data->flags when the
data is read from the DT. Also while doing this we force the onenand_async
to be always async.

The patch makes it possible to use the following DTS chunk (which should
correspond to the arch/arm/mach-omap2/board-rm680.c board file setup) with
Nokia N950, which currently crashes with 3.12-rc1. The crash output can
also be found below.

&gpmc {
	ranges = <0 0 0x04000000 0x20000000>;

	onenand@0,0 {
		#address-cells = <1>;
		#size-cells = <1>;
		reg = <0 0 0x20000000>;

		gpmc,sync-read;
		gpmc,sync-write;
		gpmc,burst-length = <16>;
		gpmc,burst-read;
		gpmc,burst-wrap;
		gpmc,burst-write;
		gpmc,device-width = <2>;
		gpmc,mux-add-data = <2>;
		gpmc,cs-on-ns = <0>;
		gpmc,cs-rd-off-ns = <87>;
		gpmc,cs-wr-off-ns = <87>;
		gpmc,adv-on-ns = <0>;
		gpmc,adv-rd-off-ns = <10>;
		gpmc,adv-wr-off-ns = <10>;
		gpmc,oe-on-ns = <15>;
		gpmc,oe-off-ns = <87>;
		gpmc,we-on-ns = <0>;
		gpmc,we-off-ns = <87>;
		gpmc,rd-cycle-ns = <112>;
		gpmc,wr-cycle-ns = <112>;
		gpmc,access-ns = <81>;
		gpmc,page-burst-access-ns = <15>;
		gpmc,bus-turnaround-ns = <0>;
		gpmc,cycle2cycle-delay-ns = <0>;
		gpmc,wait-monitoring-ns = <0>;
		gpmc,clk-activation-ns = <5>;
		gpmc,wr-data-mux-bus-ns = <30>;
		gpmc,wr-access-ns = <81>;
		gpmc,sync-clk-ps = <15000>;
	};
};

[    1.467559] GPMC CS0: cs_on     :   0 ticks,   0 ns (was   0 ticks)   0 ns
[    1.474822] GPMC CS0: cs_rd_off :   1 ticks,   5 ns (was  24 ticks)   5 ns
[    1.482116] GPMC CS0: cs_wr_off :  14 ticks,  71 ns (was  24 ticks)  71 ns
[    1.489349] GPMC CS0: adv_on    :   0 ticks,   0 ns (was   0 ticks)   0 ns
[    1.496582] GPMC CS0: adv_rd_off:   3 ticks,  15 ns (was   3 ticks)  15 ns
[    1.503845] GPMC CS0: adv_wr_off:   3 ticks,  15 ns (was   3 ticks)  15 ns
[    1.511077] GPMC CS0: oe_on     :   3 ticks,  15 ns (was   4 ticks)  15 ns
[    1.518310] GPMC CS0: oe_off    :   1 ticks,   5 ns (was  24 ticks)   5 ns
[    1.525543] GPMC CS0: we_on     :   0 ticks,   0 ns (was   0 ticks)   0 ns
[    1.532806] GPMC CS0: we_off    :   8 ticks,  40 ns (was  24 ticks)  40 ns
[    1.540039] GPMC CS0: rd_cycle  :   4 ticks,  20 ns (was  29 ticks)  20 ns
[    1.547302] GPMC CS0: wr_cycle  :   4 ticks,  20 ns (was  29 ticks)  20 ns
[    1.554504] GPMC CS0: access    :   0 ticks,   0 ns (was  23 ticks)   0 ns
[    1.561767] GPMC CS0: page_burst_access:   0 ticks,   0 ns (was   3 ticks)   0 ns
[    1.569641] GPMC CS0: bus_turnaround:   0 ticks,   0 ns (was   0 ticks)   0 ns
[    1.577270] GPMC CS0: cycle2cycle_delay:   0 ticks,   0 ns (was   0 ticks)   0 ns
[    1.585144] GPMC CS0: wait_monitoring:   0 ticks,   0 ns (was   0 ticks)   0 ns
[    1.592834] GPMC CS0: clk_activation:   0 ticks,   0 ns (was   0 ticks)   0 ns
[    1.600463] GPMC CS0: wr_data_mux_bus:   5 ticks,  25 ns (was   8 ticks)  25 ns
[    1.608154] GPMC CS0: wr_access :   0 ticks,   0 ns (was  23 ticks)   0 ns
[    1.615386] GPMC CS0 CLK period is 5 ns (div 1)
[    1.625122] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf009e442
[    1.633178] Internal error: : 1008 [#1] ARM
[    1.637573] Modules linked in:
[    1.640777] CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.0-rc1-n9xx-los.git-5318619-00006-g4baa700-dirty #26
[    1.651123] task: ef04c000 ti: ef050000 task.ti: ef050000
[    1.656799] PC is at gpmc_onenand_setup+0x98/0x1e0
[    1.661865] LR is at gpmc_cs_set_timings+0x494/0x5a4
[    1.667083] pc : [<c002e040>]    lr : [<c001f384>]    psr: 60000113
[    1.667083] sp : ef051d10  ip : ef051ce0  fp : ef051d94
[    1.679138] r10: c0caaf60  r9 : ef050000  r8 : ef18b32c
[    1.684631] r7 : f0080000  r6 : c0caaf60  r5 : 00000000  r4 : f009e400
[    1.691497] r3 : f009e442  r2 : 80050000  r1 : 00000014  r0 : 00000000
[    1.698333] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    1.706024] Control: 10c5387d  Table: af290019  DAC: 00000015
[    1.712066] Process swapper (pid: 1, stack limit = 0xef050240)
[    1.718200] Stack: (0xef051d10 to 0xef052000)
[    1.722778] 1d00:                                     00004000 00001402 00000000 00000005
[    1.731384] 1d20: 00000047 00000000 0000000f 0000000f 00000000 00000028 0000000f 00000005
[    1.739990] 1d40: 00000000 00000000 00000014 00000014 00000000 00000000 00000000 00000000
[    1.748596] 1d60: 00000000 00000019 00000000 00000000 ef18b000 ef099c50 c0c8cb30 00000000
[    1.757171] 1d80: c0488074 c048f868 ef051dcc ef051d98 c024447c c002dfb4 00000000 c048f868
[    1.765777] 1da0: 00000000 00000000 c010e4a4 c0dbbb7c c0c8cb40 00000000 c0ca2500 c0488074
[    1.774383] 1dc0: ef051ddc ef051dd0 c01fd508 c0244370 ef051dfc ef051de0 c01fc204 c01fd4f4
[    1.782989] 1de0: c0c8cb40 c0ca2500 c0c8cb74 00000000 ef051e1c ef051e00 c01fc3b0 c01fc104
[    1.791595] 1e00: ef0983bc 00000000 c0ca2500 c01fc31c ef051e44 ef051e20 c01fa794 c01fc328
[    1.800201] 1e20: ef03634c ef0983b0 ef27d534 c0ca2500 ef27d500 c0c9a2f8 ef051e54 ef051e48
[    1.808807] 1e40: c01fbcfc c01fa744 ef051e84 ef051e58 c01fb838 c01fbce4 c0411df8 c0caa040
[    1.817413] 1e60: ef051e84 c0ca2500 00000006 c0caa040 00000066 c0488074 ef051e9c ef051e88
[    1.825988] 1e80: c01fca30 c01fb768 c04975b8 00000006 ef051eac ef051ea0 c01fd728 c01fc9bc
[    1.834594] 1ea0: ef051ebc ef051eb0 c048808c c01fd6e4 ef051f4c ef051ec0 c0008888 c0488080
[    1.843200] 1ec0: 0000006f c046bae8 00000000 00000000 ef051efc ef051ee0 ef051f04 ef051ee8
[    1.851806] 1ee0: c046d400 c0181218 c046d410 c18da8d5 c036a8e4 00000066 ef051f4c ef051f08
[    1.860412] 1f00: c004b9a8 c046d41c c048f840 00000006 00000006 c046b488 00000000 c043ec08
[    1.869018] 1f20: ef051f4c c04975b8 00000006 c0caa040 00000066 c046d410 c048f85c c048f868
[    1.877593] 1f40: ef051f94 ef051f50 c046db8c c00087a0 00000006 00000006 c046d410 ffffffff
[    1.886199] 1f60: ffffffff ffffffff ffffffff 00000000 c0348fd0 00000000 00000000 00000000
[    1.894805] 1f80: 00000000 00000000 ef051fac ef051f98 c0348fe0 c046daa8 00000000 00000000
[    1.903411] 1fa0: 00000000 ef051fb0 c000e7f8 c0348fdc 00000000 00000000 00000000 00000000
[    1.912017] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.920623] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
[    1.929199] Backtrace:
[    1.931793] [<c002dfa8>] (gpmc_onenand_setup+0x0/0x1e0) from [<c024447c>] (omap2_onenand_probe+0x118/0x49c)
[    1.942047] [<c0244364>] (omap2_onenand_probe+0x0/0x49c) from [<c01fd508>] (platform_drv_probe+0x20/0x24)
[    1.952117]  r8:c0488074 r7:c0ca2500 r6:00000000 r5:c0c8cb40 r4:c0dbbb7c
[    1.959197] [<c01fd4e8>] (platform_drv_probe+0x0/0x24) from [<c01fc204>] (driver_probe_device+0x10c/0x224)
[    1.969360] [<c01fc0f8>] (driver_probe_device+0x0/0x224) from [<c01fc3b0>] (__driver_attach+0x94/0x98)
[    1.979125]  r7:00000000 r6:c0c8cb74 r5:c0ca2500 r4:c0c8cb40
[    1.985107] [<c01fc31c>] (__driver_attach+0x0/0x98) from [<c01fa794>] (bus_for_each_dev+0x5c/0x90)
[    1.994506]  r6:c01fc31c r5:c0ca2500 r4:00000000 r3:ef0983bc
[    2.000488] [<c01fa738>] (bus_for_each_dev+0x0/0x90) from [<c01fbcfc>] (driver_attach+0x24/0x28)
[    2.009735]  r6:c0c9a2f8 r5:ef27d500 r4:c0ca2500
[    2.014587] [<c01fbcd8>] (driver_attach+0x0/0x28) from [<c01fb838>] (bus_add_driver+0xdc/0x260)
[    2.023742] [<c01fb75c>] (bus_add_driver+0x0/0x260) from [<c01fca30>] (driver_register+0x80/0xfc)
[    2.033081]  r8:c0488074 r7:00000066 r6:c0caa040 r5:00000006 r4:c0ca2500
[    2.040161] [<c01fc9b0>] (driver_register+0x0/0xfc) from [<c01fd728>] (__platform_driver_register+0x50/0x64)
[    2.050476]  r5:00000006 r4:c04975b8
[    2.054260] [<c01fd6d8>] (__platform_driver_register+0x0/0x64) from [<c048808c>] (omap2_onenand_driver_init+0x18/0x20)
[    2.065490] [<c0488074>] (omap2_onenand_driver_init+0x0/0x20) from [<c0008888>] (do_one_initcall+0xf4/0x150)
[    2.075836] [<c0008794>] (do_one_initcall+0x0/0x150) from [<c046db8c>] (kernel_init_freeable+0xf0/0x1b4)
[    2.085815] [<c046da9c>] (kernel_init_freeable+0x0/0x1b4) from [<c0348fe0>] (kernel_init+0x10/0xec)
[    2.095336] [<c0348fd0>] (kernel_init+0x0/0xec) from [<c000e7f8>] (ret_from_fork+0x14/0x3c)
[    2.104125]  r4:00000000 r3:00000000
[    2.107879] Code: ebffc3ae e2505000 ba00002e e2843042 (e1d320b0)
[    2.114318] ---[ end trace b8ee3e3e5e002451 ]---

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
1dc1c33
@swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
Borislav Petkov: x86: Improve the printout of the SMP bootup CPU table
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
646e29a
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
Borislav Petkov x86/boot: Further compress CPUs bootup message
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version, which he went to great lengths and pains to submit on a
Saturday evening.
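
For readers unfamiliar with the trick, a division-avoiding digit counter can be sketched as below. This is an illustration only and is not claimed to be the exact code from the patch; `format_cpu()` is a hypothetical helper added here to show how the digit count can drive the right-aligned column width seen in the boot log above.

```c
#include <stdio.h>

/*
 * Count the characters needed to print val in decimal, including a
 * leading '-' for negative values. Only comparisons and multiplications
 * are used, so no division or modulo instruction is required.
 */
int num_digits(int val)
{
	long long v = val;      /* widen so that negating INT_MIN is safe */
	long long limit = 10;   /* smallest value that needs one more digit */
	int digits = 1;

	if (v < 0) {
		digits++;       /* room for the minus sign */
		v = -v;
	}

	while (v >= limit) {
		limit *= 10;
		digits++;
	}
	return digits;
}

/*
 * Hypothetical helper: format "#<cpu>" padded on the left so that every
 * entry is as wide as the entry for max_cpu, plus one separating space.
 * Returns the number of characters snprintf() wanted to write.
 */
int format_cpu(char *buf, size_t len, int cpu, int max_cpu)
{
	int pad = num_digits(max_cpu) - num_digits(cpu) + 1;

	return snprintf(buf, len, "%*s#%d", pad, "", cpu);
}
```

With a largest CPU id of 127, `format_cpu()` pads `#8` with three spaces and `#104` with one, so the columns line up.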

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
a17bce4
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Dec 4, 2013
@fabioestevam fabioestevam ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation

Currently mx53 (CortexA8) running at 1GHz reports:
Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice. (1.5 clock/loop)

The original object code looks like this:

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr

0000003c <__loop_delay>:
  3c:	e2500001 	subs	r0, r0, #1
  40:	8afffffe 	bhi	3c <__loop_delay>
  44:	e1a0f00e 	mov	pc, lr

After adding the 'align 3' directive to __loop_delay (align to 8 bytes):

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr
  3c:	e320f000 	nop	{0}

00000040 <__loop_delay>:
  40:	e2500001 	subs	r0, r0, #1
  44:	8afffffe 	bhi	40 <__loop_delay>
  48:	e1a0f00e 	mov	pc, lr
  4c:	e320f000 	nop	{0}

With this alignment, the mx53 now reports:
Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (CortexA9) and on mx27 (ARM926), which show the same
BogoMIPS value before and after this patch.
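
The BogoMIPS figures quoted above can be reproduced from the lpj (loops per jiffy) values. The sketch below assumes a tick rate of HZ=100, which is not stated in the commit message but is the value that makes every quoted pair agree; the function names are made up for illustration.

```c
/* Assumed tick rate; HZ=100 reproduces the log lines quoted above. */
#define ASSUMED_HZ 100UL

/* Integer part of the BogoMIPS figure for a given loops-per-jiffy value. */
unsigned long bogomips_whole(unsigned long lpj)
{
	return lpj / (500000UL / ASSUMED_HZ);
}

/* Two-digit fractional part, truncated rather than rounded. */
unsigned long bogomips_frac(unsigned long lpj)
{
	return (lpj / (5000UL / ASSUMED_HZ)) % 100UL;
}
```

Under that assumption, lpj=3317760 gives 663.55 and lpj=4980736 gives 996.14. The ratio between the two is about 1.5, which lines up with the 1.5 clocks/loop versus 1 clock/loop measurement described earlier.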

Reported-by: Tom Evans <tom_usenet@optusnet.com.au>
Suggested-by: Tom Evans <tom_usenet@optusnet.com.au>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11d4bb1
@popcornmix popcornmix pushed a commit that referenced this issue Mar 31, 2014
Marek Belisko ARM: dts: omap3-gta04: Add ti,omap36xx to compatible property to avoid problems with booting

Without that change, booting leads to a crash with warnings like the ones below:
[    0.284454] omap_hwmod: uart4: cannot clk_get main_clk uart4_fck
[    0.284484] omap_hwmod: uart4: cannot _init_clocks
[    0.284484] ------------[ cut here ]------------
[    0.284545] WARNING: CPU: 0 PID: 1 at arch/arm/mach-omap2/omap_hwmod.c:2543 _init+0x300/0x3e4()
[    0.284545] omap_hwmod: uart4: couldn't init clocks
[    0.284576] Modules linked in:
[    0.284606] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-next-20140124-00020-gd2aefec-dirty #26
[    0.284637] [<c00151c0>] (unwind_backtrace) from [<c0011e20>] (show_stack+0x10/0x14)
[    0.284667] [<c0011e20>] (show_stack) from [<c0568544>] (dump_stack+0x7c/0x94)
[    0.284729] [<c0568544>] (dump_stack) from [<c003ff94>] (warn_slowpath_common+0x6c/0x90)
[    0.284729] [<c003ff94>] (warn_slowpath_common) from [<c003ffe8>] (warn_slowpath_fmt+0x30/0x40)
[    0.284759] [<c003ffe8>] (warn_slowpath_fmt) from [<c07d1be8>] (_init+0x300/0x3e4)
[    0.284790] [<c07d1be8>] (_init) from [<c07d217c>] (__omap_hwmod_setup_all+0x40/0x8c)
[    0.284820] [<c07d217c>] (__omap_hwmod_setup_all) from [<c0008918>] (do_one_initcall+0xe8/0x14c)
[    0.284851] [<c0008918>] (do_one_initcall) from [<c07c5c18>] (kernel_init_freeable+0x104/0x1c8)
[    0.284881] [<c07c5c18>] (kernel_init_freeable) from [<c0563524>] (kernel_init+0x8/0x118)
[    0.284912] [<c0563524>] (kernel_init) from [<c000e368>] (ret_from_fork+0x14/0x2c)
[    0.285064] ---[ end trace 63de210ad43b627d ]---

Reference:
https://lkml.org/lkml/2013/10/8/553

Signed-off-by: Marek Belisko <marek@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
ae41a30
@popcornmix popcornmix pushed a commit that referenced this issue Jun 8, 2014
@htejun htejun kernfs: cache atomic_write_len in kernfs_open_file
While implementing atomic_write_len, 4d3773c ("kernfs: implement
kernfs_ops->atomic_write_len") moved data copy from userland inside
kernfs_get_active() and kernfs_open_file->mutex so that
kernfs_ops->atomic_write_len can be accessed before copying buffer
from userland; unfortunately, this could lead to locking order
inversion involving mmap_sem if copy_from_user() takes a page fault.

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 Tainted: G        W
  -------------------------------------------------------
  trinity-c236/10658 is trying to acquire lock:
   (&of->mutex#2){+.+.+.}, at: [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120

  but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

 -> #1 (&mm->mmap_sem){++++++}:
	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
	 [<mm/memory.c:4188>] might_fault+0x7e/0xb0
	 [<arch/x86/include/asm/uaccess.h:713 fs/kernfs/file.c:291>] kernfs_fop_write+0xd8/0x190
	 [<fs/read_write.c:473>] vfs_write+0xe3/0x1d0
	 [<fs/read_write.c:523 fs/read_write.c:515>] SyS_write+0x5d/0xa0
	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2

 -> #0 (&of->mutex#2){+.+.+.}:
	 [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
	 [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
	 [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
	 [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
	 [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
	 [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
	 [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
	 [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2

  other info that might help us debug this:

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(&mm->mmap_sem);
				 lock(&of->mutex#2);
				 lock(&mm->mmap_sem);
    lock(&of->mutex#2);

   *** DEADLOCK ***

  1 lock held by trinity-c236/10658:
   #0:  (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0

  stack backtrace:
  CPU: 2 PID: 10658 Comm: trinity-c236 Tainted: G        W 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26
   0000000000000000 ffff88011911fa48 ffffffff8438e945 0000000000000000
   0000000000000000 ffff88011911fa98 ffffffff811a0109 ffff88011911fab8
   ffff88011911fab8 ffff88011911fa98 ffff880119128cc0 ffff880119128cf8
  Call Trace:
   [<lib/dump_stack.c:52>] dump_stack+0x52/0x7f
   [<kernel/locking/lockdep.c:1213>] print_circular_bug+0x129/0x160
   [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
   [<include/linux/spinlock.h:343 mm/slub.c:1933>] ? deactivate_slab+0x511/0x550
   [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
   [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
   [<mm/mmap.c:1552>] ? mmap_region+0x24a/0x5c0
   [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
   [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
   [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
   [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
   [<kernel/sched/core.c:2477>] ? get_parent_ip+0x11/0x50
   [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
   [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
   [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
   [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
   [<mm/util.c:397>] ? vm_mmap_pgoff+0x6e/0xe0
   [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
   [<kernel/rcu/update.c:97>] ? __rcu_read_unlock+0x44/0xb0
   [<fs/file.c:641>] ? dup_fd+0x3c0/0x3c0
   [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
   [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
   [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2

Fix it by caching atomic_write_len in kernfs_open_file during open so
that it can be determined without accessing kernfs_ops in
kernfs_fop_write().  This restores the structure of kernfs_fop_write()
before 4d3773c with updated @len determination logic.
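
The length determination described above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the kernel code: `struct open_file_sketch`, `write_len_sketch()` and `SKETCH_PAGE_SIZE` are hypothetical names, and the real function works on `struct kernfs_open_file`. The point is that the length depends only on a value cached at open time, so the user buffer can be copied before any kernfs lock is taken.

```c
#include <stddef.h>
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size, for illustration */

/* Hypothetical stand-in for the per-open state. */
struct open_file_sketch {
	size_t atomic_write_len;   /* copied from the ops table once, at open */
};

/*
 * Decide how many bytes a write may copy, using only the cached value.
 * Returns 0 and stores the length in *len, or -E2BIG when a write that
 * must be atomic does not fit.
 */
int write_len_sketch(const struct open_file_sketch *of, size_t count,
		     size_t *len)
{
	if (of->atomic_write_len) {
		if (count > of->atomic_write_len)
			return -E2BIG;
		*len = count;
	} else {
		/* No atomic requirement: cap a single write at one page. */
		*len = count < SKETCH_PAGE_SIZE ? count : SKETCH_PAGE_SIZE;
	}
	return 0;
}
```

Because nothing here touches the ops table, a caller can compute the length, copy from userland (where a page fault may take mmap_sem), and only then acquire the active reference and the mutex.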

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
References: http://lkml.kernel.org/g/53113485.2090407@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
b7ce40c
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Oct 17, 2014
Dave Hansen x86, sched: Add new topology for multi-NUMA-node CPUs
I'm getting the spew below when booting with Haswell (Xeon
E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled
in the BIOS.  It seems similar to the issue that some folks from
AMD ran into on their systems and addressed in this commit:

  161270f ("x86/smp: Fix topology checks on AMD MCM CPUs")

Both these Intel and AMD systems break an assumption which is
being enforced by topology_sane(): a socket may not contain more
than one NUMA node.

AMD special-cased their system by looking for a cpuid flag.  The
Intel mode is dependent on BIOS options and I do not know of a
way in which it is enumerated other than the tables being parsed
during the CPU bringup process.  In other words, we have to trust
the ACPI tables <shudder>.

This detects the situation where a NUMA node occurs at a place in
the middle of the "CPU" sched domains.  It replaces the default
topology with one that relies on the NUMA information from the
firmware (SRAT table) for all levels of sched domains above the
hyperthreads.

This also fixes a sysfs bug.  We used to freak out when we saw
the "mc" group cross a node boundary, so we stopped building the
MC group.  MC gets exported as the 'core_siblings_list' in
/sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with
the same 'physical_package_id' to not be listed together in
'core_siblings_list'.  This violates a statement from
Documentation/ABI/testing/sysfs-devices-system-cpu:

	core_siblings: internal kernel map of cpu#'s hardware threads
	within the same physical_package_id.

	core_siblings_list: human-readable list of the logical CPU
	numbers within the same physical_package_id as cpu#.

The sysfs effects here cause an issue with the hwloc tool where
it gets confused and thinks there are more sockets than are
physically present.

Before this patch, there are two packages:

# cd /sys/devices/system/cpu/
# cat cpu*/topology/physical_package_id | sort | uniq -c
     18 0
     18 1

But 4 _sets_ of core siblings:

# cat cpu*/topology/core_siblings_list | sort | uniq -c
      9 0-8
      9 18-26
      9 27-35
      9 9-17

After this set, there are only 2 sets of core siblings, which
is what we expect for a 2-socket system.

# cat cpu*/topology/physical_package_id | sort | uniq -c
     18 0
     18 1
# cat cpu*/topology/core_siblings_list | sort | uniq -c
     18 0-17
     18 18-35

Example spew:
...
	NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
	 #2  #3  #4  #5  #6  #7  #8
	.... node  #1, CPUs:    #9
	------------[ cut here ]------------
	WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90()
	sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
	Modules linked in:
	CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty #631
	Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014
	0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48
	ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009
	ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98
	Call Trace:
	[<ffffffff8172e485>] dump_stack+0x45/0x56
	[<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0
	[<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50
	[<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90
	[<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0
	[<ffffffff8107568d>] start_secondary+0x1ad/0x240
	---[ end trace 3fe5f587a9fcde61 ]---
	#10 #11 #12 #13 #14 #15 #16 #17
	.... node  #2, CPUs:   #18 #19 #20 #21 #22 #23 #24 #25 #26
	.... node  #3, CPUs:   #27 #28 #29 #30 #31 #32 #33 #34 #35

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
[ Added LLC domain and s/match_mc/match_die/ ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brice.goglin@gmail.com
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cebf15e
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 5, 2014
@pranith pranith powerpc: Wire up sys_bpf() syscall
This patch wires up the new syscall sys_bpf() on powerpc.

Passes the tests in samples/bpf:

    #0 add+sub+mul OK
    #1 unreachable OK
    #2 unreachable2 OK
    #3 out of range jump OK
    #4 out of range jump2 OK
    #5 test1 ld_imm64 OK
    #6 test2 ld_imm64 OK
    #7 test3 ld_imm64 OK
    #8 test4 ld_imm64 OK
    #9 test5 ld_imm64 OK
    #10 no bpf_exit OK
    #11 loop (back-edge) OK
    #12 loop2 (back-edge) OK
    #13 conditional loop OK
    #14 read uninitialized register OK
    #15 read invalid register OK
    #16 program doesn't init R0 before exit OK
    #17 stack out of bounds OK
    #18 invalid call insn1 OK
    #19 invalid call insn2 OK
    #20 invalid function call OK
    #21 uninitialized stack1 OK
    #22 uninitialized stack2 OK
    #23 check valid spill/fill OK
    #24 check corrupted spill/fill OK
    #25 invalid src register in STX OK
    #26 invalid dst register in STX OK
    #27 invalid dst register in ST OK
    #28 invalid src register in LDX OK
    #29 invalid dst register in LDX OK
    #30 junk insn OK
    #31 junk insn2 OK
    #32 junk insn3 OK
    #33 junk insn4 OK
    #34 junk insn5 OK
    #35 misaligned read from stack OK
    #36 invalid map_fd for function call OK
    #37 don't check return value before access OK
    #38 access memory with incorrect alignment OK
    #39 sometimes access memory with incorrect alignment OK
    #40 jump test 1 OK
    #41 jump test 2 OK
    #42 jump test 3 OK
    #43 jump test 4 OK

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[mpe: test using samples/bpf]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
fcbb539
@popcornmix popcornmix pushed a commit that referenced this issue Jan 10, 2015
Borislav Petkov x86, microcode, AMD: Do not use smp_processor_id() in preemtible context
commit 2ef84b3 upstream.

Hand down the cpu number instead, otherwise lockdep screams when doing

echo 1 > /sys/devices/system/cpu/microcode/reload.

BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470
caller is debug_smp_processor_id+0x12/0x20
CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26
...

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6a76bc2
@davet321 davet321 pushed a commit to davet321/rpi-linux that referenced this issue Aug 17, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code was changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages"), which removed the
__GFP_FS restriction on the reasoning that we do not get into the fs
code.  But this is apparently not sufficient, because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing the __GFP_IO check with a may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.
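
The effect of swapping the check can be shown with a small user-space model. Everything below is a simplified sketch: the `MODEL_*` flag values are illustrative and not the kernel's, the function names are hypothetical, may_enter_fs is reduced to the __GFP_FS bit (swap-cache handling is ignored), and the page is assumed to be a memcg-reclaim page under writeback that is already marked for reclaim.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real GFP bits differ. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)                 /* IO allowed, FS not */
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)

enum wb_action {
	WB_SKIP_PAGE,   /* leave the page alone and move on */
	WB_WAIT         /* block until writeback completes */
};

/* Before the fix: waiting was gated on __GFP_IO alone. */
enum wb_action writeback_action_old(unsigned int gfp)
{
	return (gfp & MODEL_GFP_IO) ? WB_WAIT : WB_SKIP_PAGE;
}

/* After the fix: waiting is gated on being allowed to enter the fs. */
enum wb_action writeback_action_new(unsigned int gfp)
{
	bool may_enter_fs = (gfp & MODEL_GFP_FS) != 0;

	return may_enter_fs ? WB_WAIT : WB_SKIP_PAGE;
}
```

In this model a GFP_NOFS allocation, such as the one made from within ext4 in the backtrace, waits under the old check and skips the page under the new one, while a GFP_KERNEL allocation still waits in both cases.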

According to Dave Chinner, xfs has been doing a similar thing since 2.6.15,
so ext4 is not the only affected filesystem.  Moreover, he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7f488aa
@popcornmix popcornmix pushed a commit that referenced this issue Aug 20, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code was changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages"), which removed the
__GFP_FS restriction on the reasoning that we do not get into the fs
code.  But this is apparently not sufficient, because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing the __GFP_IO check with a may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

According to Dave Chinner, xfs has been doing a similar thing since 2.6.15,
so ext4 is not the only affected filesystem.  Moreover, he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ecf5fc6
@gumbit gumbit added a commit to gumbit/linux-rt-rpi that referenced this issue Dec 28, 2015
Borislav Petkov x86, microcode, AMD: Do not use smp_processor_id() in preemtible context
commit 2ef84b3 upstream.

Hand down the cpu number instead, otherwise lockdep screams when doing

echo 1 > /sys/devices/system/cpu/microcode/reload.

BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470
caller is debug_smp_processor_id+0x12/0x20
CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26
...

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
b858858
@giraldeau giraldeau pushed a commit to giraldeau/linux that referenced this issue Apr 12, 2016
Thomas Gleixner net-flip-lock-dep-thingy.patch
=======================================================
[ INFO: possible circular locking dependency detected ]
3.0.0-rc3+ #26
-------------------------------------------------------
ip/1104 is trying to acquire lock:
 (local_softirq_lock){+.+...}, at: [<ffffffff81056d12>] __local_lock+0x25/0x68

but task is already holding lock:
 (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_INET){+.+...}:
       [<ffffffff810836e5>] lock_acquire+0x103/0x12e
       [<ffffffff813e2781>] lock_sock_nested+0x82/0x92
       [<ffffffff81433308>] lock_sock+0x10/0x12
       [<ffffffff81433afa>] tcp_close+0x1b/0x355
       [<ffffffff81453c99>] inet_release+0xc3/0xcd
       [<ffffffff813dff3f>] sock_release+0x1f/0x74
       [<ffffffff813dffbb>] sock_close+0x27/0x2b
       [<ffffffff81129c63>] fput+0x11d/0x1e3
       [<ffffffff81126577>] filp_close+0x70/0x7b
       [<ffffffff8112667a>] sys_close+0xf8/0x13d
       [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

-> #0 (local_softirq_lock){+.+...}:
       [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
       [<ffffffff810836e5>] lock_acquire+0x103/0x12e
       [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
       [<ffffffff81056d12>] __local_lock+0x25/0x68
       [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
       [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
       [<ffffffff81433c38>] tcp_close+0x159/0x355
       [<ffffffff81453c99>] inet_release+0xc3/0xcd
       [<ffffffff813dff3f>] sock_release+0x1f/0x74
       [<ffffffff813dffbb>] sock_close+0x27/0x2b
       [<ffffffff81129c63>] fput+0x11d/0x1e3
       [<ffffffff81126577>] filp_close+0x70/0x7b
       [<ffffffff8112667a>] sys_close+0xf8/0x13d
       [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET);
                               lock(local_softirq_lock);
                               lock(sk_lock-AF_INET);
  lock(local_softirq_lock);

 *** DEADLOCK ***

1 lock held by ip/1104:
 #0:  (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

stack backtrace:
Pid: 1104, comm: ip Not tainted 3.0.0-rc3+ #26
Call Trace:
 [<ffffffff81081649>] print_circular_bug+0x1f8/0x209
 [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff810836e5>] lock_acquire+0x103/0x12e
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c75>] ? get_parent_ip+0x11/0x41
 [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c8c>] ? get_parent_ip+0x28/0x41
 [<ffffffff81056d12>] __local_lock+0x25/0x68
 [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
 [<ffffffff81433308>] ? lock_sock+0x10/0x12
 [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
 [<ffffffff81433c38>] tcp_close+0x159/0x355
 [<ffffffff81453c99>] inet_release+0xc3/0xcd
 [<ffffffff813dff3f>] sock_release+0x1f/0x74
 [<ffffffff813dffbb>] sock_close+0x27/0x2b
 [<ffffffff81129c63>] fput+0x11d/0x1e3
 [<ffffffff81126577>] filp_close+0x70/0x7b
 [<ffffffff8112667a>] sys_close+0xf8/0x13d
 [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b


Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c5be54a