mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw #56

kitakar5525 · 2020-09-23T17:18:36Z

Hi, I've just opened a new branch that tries to fix mwifiex suspend issues in a more simple way.

Issues in upstream mwifiex regarding suspend:

reportedly breaks S0ix on suspend
sometimes breaks AP scanning after suspend

The previous patch mwifiex_pcie: remove()/probe() card on suspend()/resume() does remove()/probe() card on suspend()/resume(). This may be too aggressive.
The new patch mwifiex: pcie: use shutdown_sw()/reinit_sw() on suspend()/resume() tries to fix by instead sw-based way (simpler/milder)

At least I confirmed that AP scanning always works after suspend on SB1.
However, I can't confirm S0ix because SB1 can't achieve S0ix by other unknown issues anyway.

So, first, I want you folks to test if S0ix still works after this PR.
Thanks.

Note that this is not yet for upstreaming (e.g. comment at top of suspend()/resume() functions, debug output, etc.)

EDIT: the second patch mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure is required for the main part of this PR (mwifiex: pcie: use shutdown_sw()/reinit_sw() on suspend()/resume())

kitakar5525

Ah, typo in commit message. "by some reasons" -> "for some reasons"
Fix it on next rebase.

EDIT: fixed.

kitakar5525

mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure

Maybe adding Fixes tag is better:

Fixes: 4c5dae5 ("mwifiex: add PCIe function level reset support")

EDIT: added.

qzed · 2020-09-24T15:34:37Z

So, first, I want you folks to test if S0ix still works after this PR.

On my SB2, S0ix still seems to work.

qzed · 2020-09-24T15:40:03Z

Otherwise, this does look good to me! Maybe add that this will disable wakeups on the device in your commit message. This might also reset some internal state.

kitakar5525 · 2020-09-24T15:55:37Z

Maybe add that this will disable wakeups on the device in your commit message. This might also reset some internal state.

Right. I'll write it clearly.

kitakar5525 · 2020-09-24T16:32:58Z

Hi, I just rebased the third (main) patch.

Changed comments at top of suspend()/resume() for upstreaming
Removed (kind of debug) outputs from suspend()/resume() that indicated using shutdown_sw()/reinit_sw() instead

Improved commit message
Especially, here:

As a side effect, this patch disables wakeups (means can't use
Wake-On-WLAN anymore, if it was working before), and might also
reset some internal states.

and here:

- Removed mwifiex_enable_wake()/mwifiex_disable_wake()
  as incompatible and redundant. Shutdowned driver can't cause a wakeup.

By the way, is this comment Shutdowned driver can't cause a wakeup. true? and how do you think if preserving mwifiex_enable_wake()/mwifiex_disable_wake() is useful? (I've currently deleted as the commit message states)

Also, currently, suspend() performs mwifiex_shutdown_sw() unconditionally. But may be better to check if it's already suspended or not, as Host Sleep checks like this:

kernel/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c

Lines 538 to 542 in d569b8e

    
           if (adapter->hs_activated) { 
        
           	mwifiex_dbg(adapter, CMD, 
        
           		    "cmd: HS Already activated\n"); 
        
           	return true; 
        
           }

I'll do this next.

kitakar5525 · 2020-09-24T16:40:12Z

Also, currently, suspend() performs mwifiex_shutdown_sw() unconditionally. But may be better to check if it's already suspended or not, as Host Sleep checks like this:

kernel/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c

Lines 538 to 542 in d569b8e

if (adapter->hs_activated) {

mwifiex_dbg(adapter, CMD,

"cmd: HS Already activated\n");

return true;

}

I'll do this next.

That said, why do we need to check if device is already suspended/resumed as upstream does?

kernel/drivers/net/wireless/marvell/mwifiex/pcie.c

Lines 206 to 210 in 25501d3

    
           if (!test_bit(MWIFIEX_IS_SUSPENDED, &adapter->work_flags)) { 
        
           	mwifiex_dbg(adapter, WARN, 
        
           		    "Device already resumed\n"); 
        
           	return 0; 
        
           }

If it's useless, I'd like to remove the check instead.

qzed · 2020-09-24T17:44:33Z

By the way, is this comment Shutdowned driver can't cause a wakeup. true?

I can only guess. If it works the way I think it works, then the WiFi controller would normally keep on working (in a suspended mode), so essentially be powered on. If it detects a wakeup package, it'll signal a wakeup via the IRQ (I think that's normally a GPIO pin or something like that). If we now shut down the card, the EC firmware might not be running any more, so it won't detect and issue wakeups. At least that's my theory. Without knowing more about either how those things usually work or how the mwifiex chip works in particular, all of this might be wrong of course.

Oh and it should be Shut down driver, I think (past tense of "shut down" is also "shut down").

and how do you think if preserving mwifiex_enable_wake()/mwifiex_disable_wake() is useful? (I've currently deleted as the commit message states)

I would keep them removed, for now at least. If the EC can't issue any wakeups when shut down, having them there could cause the wrong impression (i.e. that wakeups are still expected to work). In the worst-case scenario it could also cause spurious wakeups if the EC would be not properly configured for suspend, but I kind of doubt that.

That said, why do we need to check if device is already suspended/resumed as upstream does?

I honestly don't know. is it possible that something like runtime-suspend could enter Host Suspend while the device is still active, and suspend() then cause a "double suspend"? If not then I guess that's not needed.

kitakar5525 · 2020-09-26T16:24:09Z

By the way, is this comment Shutdowned driver can't cause a wakeup. true?

I can only guess. If it works the way I think it works, then the WiFi controller would normally keep on working (in a suspended mode), so essentially be powered on. If it detects a wakeup package, it'll signal a wakeup via the IRQ (I think that's normally a GPIO pin or something like that). If we now shut down the card, the EC firmware might not be running any more, so it won't detect and issue wakeups. At least that's my theory. Without knowing more about either how those things usually work or how the mwifiex chip works in particular, all of this might be wrong of course.

Rewrote the reason of removing mwifiex_enable_wake()/mwifiex_disable_wake() in a more unspecified way:

- Removed mwifiex_enable_wake()/mwifiex_disable_wake()
  as incompatible and redundant because the driver will be shut down
  instead of Host Sleep.

Oh and it should be Shut down driver, I think (past tense of "shut down" is also "shut down").

Oh, right. Fixed.

Corrected English verb shutdown -> shut down

and how do you think if preserving mwifiex_enable_wake()/mwifiex_disable_wake() is useful? (I've currently deleted as the commit message states)

I would keep them removed, for now at least. If the EC can't issue any wakeups when shut down, having them there could cause the wrong impression (i.e. that wakeups are still expected to work). In the worst-case scenario it could also cause spurious wakeups if the EC would be not properly configured for suspend, but I kind of doubt that.

Thanks. I kept removing mwifiex_enable_wake()/mwifiex_disable_wake().

That said, why do we need to check if device is already suspended/resumed as upstream does?

I honestly don't know. is it possible that something like runtime-suspend could enter Host Suspend while the device is still active, and suspend() then cause a "double suspend"? If not then I guess that's not needed.

Added comment for not checking suspend state on suspend()

Note that now there is no check whether it's suspended before performing
mwifiex_shutdown_sw() on suspend().
With the previous Host Sleep method, the check was done by looking at
adapter->hs_activated in mwifiex_enable_hs() [sta_ioctl.c], but not
MWIFIEX_IS_SUSPENDED. So, what the previous method checked was instead
Host Sleep state, not suspend itself. Therefore, there is no need to check
the suspend state now.

Also removed comment for suspend state check at top of suspend() 
accordingly.

kitakar5525 · 2020-09-26T16:25:16Z

Other changes:

mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure

Fixed "by some reasons" -> "for some reasons"
Added Fixes tag Fixes: 4c5dae59d2e9 ("mwifiex: add PCIe function level reset support")

mwifiex: pcie: use shutdown_sw()/reinit_sw() on suspend()/resume()

Re-added comment "If already not resumed" at top of resume()
I had removed the comment before because I intended to remove the check
of MWIFIEX_IS_SUSPENDED in resume(), but ended up preserving it.
Added new commit mwifiex: change comment for shutdown_sw()/reinit_sw() reflecting current state for upstreaming

kitakar5525 · 2020-09-26T16:26:59Z

Might need more investigation for why flush_workqueue() causes kernel oops. I'll do this next.

kitakar5525 · 2020-09-30T12:51:25Z

I tried re-adding flush_workqueue() (after mwifiex_shutdown_sw())

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index aef24b7c71c0..688bc94ba189 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -171,6 +171,8 @@ static int mwifiex_pcie_suspend(struct device *dev)
 		return -EFAULT;
 	}
 
+	flush_workqueue(adapter->workqueue);
+
 	/* Indicate device suspended */
 	set_bit(MWIFIEX_IS_SUSPENDED, &adapter->work_flags);

This resulted in SB1 not resuming from suspend.

The last journal output

[92430.637881] kernel: PM: suspend entry (s2idle)
[92430.682405] kernel: Filesystems sync: 0.044 seconds

So, no crash log recorded.

note: commit torvalds/linux@ec815dd added the flush_workqueue() for Host Sleep.

I have no idea what I should do, but upstream may know anyway. At least I haven't encountered issues with this ~~patch~~ PR.

qzed · 2020-09-30T13:03:57Z

Could be some deadlock again. AFAIK there are some suspend debugging options you could try. But yeah, maybe upstream has any idea on what could go wrong.

qzed

Looks good. Feel free to merge when you think it's ready.

kitakar5525 · 2020-09-30T13:44:01Z

Small changes to commit message/title:

mwifiex: pcie: use shutdown_sw()/reinit_sw() on suspend()/resume()

corrected some English grammar
the following change:

-Note that now there is no check whether it's suspended before performing
-mwifiex_shutdown_sw() on suspend().
+Note that suspend() no longer checks if it's already suspended.

removed the following from commit message:

I reused the contents of suspend()/resume() functions as much as possible,
and removed only the parts that are incompatible or redundant with
shutdown_sw()/reinit_sw().

- Removed wait_for_completion() as redundant
  mwifiex_shutdown_sw() does this.
- Removed flush_workqueue() as incompatible
  Causes kernel crashing.
- Removed mwifiex_enable_wake()/mwifiex_disable_wake()
  as incompatible and redundant because the driver will be shut down
  instead of entering Host Sleep.

I think sending this as extra comment is better.

change comment for shutdown_sw()/reinit_sw() reflecting current state

changed patch name to update comment for shutdown_sw()/reinit_sw() reflecting current state

kitakar5525 · 2020-09-30T13:46:13Z

Looks good. Feel free to merge when you think it's ready.

Thanks, I'll check commit message (mainly English grammar 😀) lastly for each PR and merge it.

kitakar5525 · 2020-09-30T14:20:24Z

Ah sorry, the last rebase caused a word missing

-When using the Host Sleep method, it prevents the platform to reach
+When using the Host Sleep method, it prevents the platform to reach S0ix
during suspend. Also, after suspend, sometimes AP scanning won't work,

Fixed in this rebase. I consider this PR is ready now.

This reverts commit d7df466. Reason for revert: Simplified and sw method commit incoming. Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>

When FLR is performed but without fw reset for some reasons (e.g. on Surface devices, fw reset requires another quirk), it fails to reset properly. You can trigger the issue on such devices via debugfs entry for reset: $ echo 1 | sudo tee /sys/kernel/debug/mwifiex/mlan0/reset and the resulting dmesg log: mwifiex_pcie 0000:03:00.0: Resetting per request mwifiex_pcie 0000:03:00.0: info: successfully disconnected from [BSSID]: reason code 3 mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed mwifiex_pcie 0000:03:00.0: deleting the crypto keys mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed mwifiex_pcie 0000:03:00.0: deleting the crypto keys mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed mwifiex_pcie 0000:03:00.0: deleting the crypto keys mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed mwifiex_pcie 0000:03:00.0: deleting the crypto keys mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed mwifiex_pcie 0000:03:00.0: deleting the crypto keys mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed mwifiex_pcie 0000:03:00.0: deleting the crypto keys mwifiex_pcie 0000:03:00.0: info: shutdown mwifiex... mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed mwifiex_pcie 0000:03:00.0: WLAN FW already running! Skip FW dnld mwifiex_pcie 0000:03:00.0: WLAN FW is active mwifiex_pcie 0000:03:00.0: Unknown api_id: 4 mwifiex_pcie 0000:03:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.19.p21) mwifiex_pcie 0000:03:00.0: driver_version = mwifiex 1.0 (15.68.19.p21) mwifiex_pcie 0000:03:00.0: info: trying to associate to '[SSID]' bssid [BSSID] mwifiex_pcie 0000:03:00.0: info: associated to bssid [BSSID] successfully mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110 mwifiex_pcie 0000:03:00.0: info: successfully disconnected from [BSSID]: reason code 15 mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110 mwifiex_pcie 0000:03:00.0: deleting the crypto keys mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110 mwifiex_pcie 0000:03:00.0: deleting the crypto keys mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110 mwifiex_pcie 0000:03:00.0: deleting the crypto keys [...] When comparing mwifiex_shutdown_sw() with mwifiex_pcie_remove(), it lacks mwifiex_init_shutdown_fw(). This commit fixes mwifiex_shutdown_sw() by adding the missing mwifiex_init_shutdown_fw(). Fixes: 4c5dae5 ("mwifiex: add PCIe function level reset support") Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>

There are issues with S0ix achievement and AP scanning after suspend with the current Host Sleep method. When using the Host Sleep method, it prevents the platform to reach S0ix during suspend. Also, after suspend, sometimes AP scanning won't work, resulting in non-working wifi. To fix such issues, perform shutdown_sw()/reinit_sw() instead of Host Sleep. As a side effect, this patch disables wakeups (means that Wake-On-WLAN can't be used anymore, if it was working before), and might also reset some internal states. Note that suspend() no longer checks if it's already suspended. With the previous Host Sleep method, the check was done by looking at adapter->hs_activated in mwifiex_enable_hs() [sta_ioctl.c], but not MWIFIEX_IS_SUSPENDED. So, what the previous method checked was instead Host Sleep state, not suspend itself. Therefore, there is no need to check the suspend state now. Also removed comment for suspend state check at top of suspend() accordingly. Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>

…ent state The functions mwifiex_shutdown_sw() and mwifiex_reinit_sw() can be used for more general purposes than the PCIe function level reset. Also, these are even not PCIe-specific. So, let's update the comments at the top of each function accordingly. Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>

kitakar5525 · 2020-09-30T15:04:01Z

Rebased on top of current v5.8-surface-devel for simplicity

kitakar5525 · 2020-10-01T12:44:04Z

By the way, I noticed the following changes regarding S0ix (printed /sys/kernel/debug/pmc_core contents as early as possible after resume) on SB1 (Host Sleep method vs shutdown_sw() method):

-MPHY CORE LANE 12               	State: Not power gated
-PCH IP: 6  - SPC                             	State: On
-DMIPCIE3 PLL                    	State: Active
+MPHY CORE LANE 12               	State: Power gated
+PCH IP: 6  - SPC                             	State: Off
+DMIPCIE3 PLL                    	State: Idle

So, this indicates that the shutdown_sw() suspend method is definitely working.

…ort_recv_listen() With multi-transport support, listener sockets are not bound to any transport. So, calling virtio_transport_reset(), when an error occurs, on a listener socket produces the following null-pointer dereference: BUG: kernel NULL pointer dereference, address: 00000000000000e8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 20 Comm: kworker/0:1 Not tainted 5.5.0-rc1-ste-00003-gb4be21f316ac-dirty linux-surface#56 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport] RIP: 0010:virtio_transport_send_pkt_info+0x20/0x130 [vmw_vsock_virtio_transport_common] Code: 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 10 44 8b 76 20 e8 c0 ba fe ff <48> 8b 80 e8 00 00 00 e8 64 e3 7d c1 45 8b 45 00 41 8b 8c 24 d4 02 RSP: 0018:ffffc900000b7d08 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88807bf12728 RCX: 0000000000000000 RDX: ffff88807bf12700 RSI: ffffc900000b7d50 RDI: ffff888035c84000 RBP: ffffc900000b7d40 R08: ffff888035c84000 R09: ffffc900000b7d08 R10: ffff8880781de800 R11: 0000000000000018 R12: ffff888035c84000 R13: ffffc900000b7d50 R14: 0000000000000000 R15: ffff88807bf12724 FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000e8 CR3: 00000000790f4004 CR4: 0000000000160ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: virtio_transport_reset+0x59/0x70 [vmw_vsock_virtio_transport_common] virtio_transport_recv_pkt+0x5bb/0xe50 [vmw_vsock_virtio_transport_common] ? detach_buf_split+0xf1/0x130 virtio_transport_rx_work+0xba/0x130 [vmw_vsock_virtio_transport] process_one_work+0x1c0/0x300 worker_thread+0x45/0x3c0 kthread+0xfc/0x130 ? current_work+0x40/0x40 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Modules linked in: sunrpc kvm_intel kvm vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common irqbypass vsock virtio_rng rng_core CR2: 00000000000000e8 ---[ end trace e75400e2ea2fa824 ]--- This happens because virtio_transport_reset() calls virtio_transport_send_pkt_info() that can be used only on connecting/connected sockets. This patch fixes the issue, using virtio_transport_reset_no_sock() instead of virtio_transport_reset() when we are handling a listener socket. Fixes: c0cfa2d ("vsock: add multi-transports support") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit df18fa1) BUG=b:153004946 TEST=nested vsock works Change-Id: If59ecc3babd64b5a0e9373df110ce9d00185c5f8 Signed-off-by: Allen Webb <allenwebb@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2406879 Reviewed-by: Daniel Verkamp <dverkamp@chromium.org> Reviewed-by: Sean Paul <seanpaul@chromium.org>

I was hitting the below panic continuously when attaching kprobes to scheduler functions [ 159.045212] Unexpected kernel BRK exception at EL1 [ 159.053753] Internal error: BRK handler: f2000006 [#1] PREEMPT SMP [ 159.059954] Modules linked in: [ 159.063025] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.11.0-rc4-00008-g1e2a199f6ccd #56 [rt-app] <notice> [1] Exiting.[ 159.071166] Hardware name: ARM Juno development board (r2) (DT) [ 159.079689] pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO BTYPE=--) [ 159.085723] pc : 0xffff80001624501c [ 159.089377] lr : attach_entity_load_avg+0x2ac/0x350 [ 159.094271] sp : ffff80001622b640 [rt-app] <notice> [0] Exiting.[ 159.097591] x29: ffff80001622b640 x28: 0000000000000001 [ 159.105515] x27: 0000000000000049 x26: ffff000800b79980 [ 159.110847] x25: ffff00097ef37840 x24: 0000000000000000 [ 159.116331] x23: 00000024eacec1ec x22: ffff00097ef12b90 [ 159.121663] x21: ffff00097ef37700 x20: ffff800010119170 [rt-app] <notice> [11] Exiting.[ 159.126995] x19: ffff00097ef37840 x18: 000000000000000e [ 159.135003] x17: 0000000000000001 x16: 0000000000000019 [ 159.140335] x15: 0000000000000000 x14: 0000000000000000 [ 159.145666] x13: 0000000000000002 x12: 0000000000000002 [ 159.150996] x11: ffff80001592f9f0 x10: 0000000000000060 [ 159.156327] x9 : ffff8000100f6f9c x8 : be618290de0999a1 [ 159.161659] x7 : ffff80096a4b1000 x6 : 0000000000000000 [ 159.166990] x5 : ffff00097ef37840 x4 : 0000000000000000 [ 159.172321] x3 : ffff000800328948 x2 : 0000000000000000 [ 159.177652] x1 : 0000002507d52fec x0 : ffff00097ef12b90 [ 159.182983] Call trace: [ 159.185433] 0xffff80001624501c [ 159.188581] update_load_avg+0x2d0/0x778 [ 159.192516] enqueue_task_fair+0x134/0xe20 [ 159.196625] enqueue_task+0x4c/0x2c8 [ 159.200211] ttwu_do_activate+0x70/0x138 [ 159.204147] sched_ttwu_pending+0xbc/0x160 [ 159.208253] flush_smp_call_function_queue+0x16c/0x320 [ 159.213408] generic_smp_call_function_single_interrupt+0x1c/0x28 [ 159.219521] ipi_handler+0x1e8/0x3c8 [ 159.223106] handle_percpu_devid_irq+0xd8/0x460 [ 159.227650] generic_handle_irq+0x38/0x50 [ 159.231672] __handle_domain_irq+0x6c/0xc8 [ 159.235781] gic_handle_irq+0xcc/0xf0 [ 159.239452] el1_irq+0xb4/0x180 [ 159.242600] rcu_is_watching+0x28/0x70 [ 159.246359] rcu_read_lock_held_common+0x44/0x88 [ 159.250991] rcu_read_lock_any_held+0x30/0xc0 [ 159.255360] kretprobe_dispatcher+0xc4/0xf0 [ 159.259555] __kretprobe_trampoline_handler+0xc0/0x150 [ 159.264710] trampoline_probe_handler+0x38/0x58 [ 159.269255] kretprobe_trampoline+0x70/0xc4 [ 159.273450] run_rebalance_domains+0x54/0x80 [ 159.277734] __do_softirq+0x164/0x684 [ 159.281406] irq_exit+0x198/0x1b8 [ 159.284731] __handle_domain_irq+0x70/0xc8 [ 159.288840] gic_handle_irq+0xb0/0xf0 [ 159.292510] el1_irq+0xb4/0x180 [ 159.295658] arch_cpu_idle+0x18/0x28 [ 159.299245] default_idle_call+0x9c/0x3e8 [ 159.303265] do_idle+0x25c/0x2a8 [ 159.306502] cpu_startup_entry+0x2c/0x78 [ 159.310436] secondary_start_kernel+0x160/0x198 [ 159.314984] Code: d42000c0 aa1e03e9 d42000c0 aa1e03e9 (d42000c0) After a bit of head scratching and debugging it turned out that it is due to kprobe handler being interrupted by a tick that causes us to go into (I think another) kprobe handler. The culprit was kprobe_breakpoint_ss_handler() returning DBG_HOOK_ERROR which leads to the Unexpected kernel BRK exception. Reverting commit ba090f9 ("arm64: kprobes: Remove redundant kprobe_step_ctx") seemed to fix the problem for me. Further analysis showed that kcb->kprobe_status is set to KPROBE_REENTER when the error occurs. By teaching kprobe_breakpoint_ss_handler() to handle this status I can no longer reproduce the problem. Fixes: ba090f9 ("arm64: kprobes: Remove redundant kprobe_step_ctx") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210122110909.3324607-1-qais.yousef@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

[ Upstream commit 9ca5e7e ] When a file descriptor of pppol2tp socket is passed as file descriptor of UDP socket, a recursive deadlock occurs in l2tp_tunnel_register(). This situation is reproduced by the following program: int main(void) { int sock; struct sockaddr_pppol2tp addr; sock = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); if (sock < 0) { perror("socket"); return 1; } addr.sa_family = AF_PPPOX; addr.sa_protocol = PX_PROTO_OL2TP; addr.pppol2tp.pid = 0; addr.pppol2tp.fd = sock; addr.pppol2tp.addr.sin_family = PF_INET; addr.pppol2tp.addr.sin_port = htons(0); addr.pppol2tp.addr.sin_addr.s_addr = inet_addr("192.168.0.1"); addr.pppol2tp.s_tunnel = 1; addr.pppol2tp.s_session = 0; addr.pppol2tp.d_tunnel = 0; addr.pppol2tp.d_session = 0; if (connect(sock, (const struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("connect"); return 1; } return 0; } This program causes the following lockdep warning: ============================================ WARNING: possible recursive locking detected 6.2.0-rc5-00205-gc96618275234 #56 Not tainted -------------------------------------------- repro/8607 is trying to acquire lock: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: l2tp_tunnel_register+0x2b7/0x11c0 but task is already holding lock: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sk_lock-AF_PPPOX); lock(sk_lock-AF_PPPOX); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by repro/8607: #0: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30 stack backtrace: CPU: 0 PID: 8607 Comm: repro Not tainted 6.2.0-rc5-00205-gc96618275234 #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x100/0x178 __lock_acquire.cold+0x119/0x3b9 ? lockdep_hardirqs_on_prepare+0x410/0x410 lock_acquire+0x1e0/0x610 ? l2tp_tunnel_register+0x2b7/0x11c0 ? lock_downgrade+0x710/0x710 ? __fget_files+0x283/0x3e0 lock_sock_nested+0x3a/0xf0 ? l2tp_tunnel_register+0x2b7/0x11c0 l2tp_tunnel_register+0x2b7/0x11c0 ? sprintf+0xc4/0x100 ? l2tp_tunnel_del_work+0x6b0/0x6b0 ? debug_object_deactivate+0x320/0x320 ? lockdep_init_map_type+0x16d/0x7a0 ? lockdep_init_map_type+0x16d/0x7a0 ? l2tp_tunnel_create+0x2bf/0x4b0 ? l2tp_tunnel_create+0x3c6/0x4b0 pppol2tp_connect+0x14e1/0x1a30 ? pppol2tp_put_sk+0xd0/0xd0 ? aa_sk_perm+0x2b7/0xa80 ? aa_af_perm+0x260/0x260 ? bpf_lsm_socket_connect+0x9/0x10 ? pppol2tp_put_sk+0xd0/0xd0 __sys_connect_file+0x14f/0x190 __sys_connect+0x133/0x160 ? __sys_connect_file+0x190/0x190 ? lockdep_hardirqs_on+0x7d/0x100 ? ktime_get_coarse_real_ts64+0x1b7/0x200 ? ktime_get_coarse_real_ts64+0x147/0x200 ? __audit_syscall_entry+0x396/0x500 __x64_sys_connect+0x72/0xb0 do_syscall_64+0x38/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd This patch fixes the issue by getting/creating the tunnel before locking the pppol2tp socket. Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()") Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

…ress commit 8b51a39 upstream. If we specify a valid CQ ring address but an invalid SQ ring address, we'll correctly spot this and free the allocated pages and clear them to NULL. However, we don't clear the ring page count, and hence will attempt to free the pages again. We've already cleared the address of the page array when freeing them, but we don't check for that. This causes the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Oops [#1] Modules linked in: CPU: 0 PID: 20 Comm: kworker/u2:1 Not tainted 6.6.0-rc5-dirty #56 Hardware name: ucbbar,riscvemu-bare (DT) Workqueue: events_unbound io_ring_exit_work epc : io_pages_free+0x2a/0x58 ra : io_rings_free+0x3a/0x50 epc : ffffffff808811a2 ra : ffffffff80881406 sp : ffff8f80000c3cd0 status: 0000000200000121 badaddr: 0000000000000000 cause: 000000000000000d [<ffffffff808811a2>] io_pages_free+0x2a/0x58 [<ffffffff80881406>] io_rings_free+0x3a/0x50 [<ffffffff80882176>] io_ring_exit_work+0x37e/0x424 [<ffffffff80027234>] process_one_work+0x10c/0x1f4 [<ffffffff8002756e>] worker_thread+0x252/0x31c [<ffffffff8002f5e4>] kthread+0xc4/0xe0 [<ffffffff8000332a>] ret_from_fork+0xa/0x1c Check for a NULL array in io_pages_free(), but also clear the page counts when we free them to be on the safer side. Reported-by: rtm@csail.mit.edu Fixes: 03d89a2 ("io_uring: support for user allocated memory for rings/sqes") Cc: stable@vger.kernel.org Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…ress If we specify a valid CQ ring address but an invalid SQ ring address, we'll correctly spot this and free the allocated pages and clear them to NULL. However, we don't clear the ring page count, and hence will attempt to free the pages again. We've already cleared the address of the page array when freeing them, but we don't check for that. This causes the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Oops [#1] Modules linked in: CPU: 0 PID: 20 Comm: kworker/u2:1 Not tainted 6.6.0-rc5-dirty #56 Hardware name: ucbbar,riscvemu-bare (DT) Workqueue: events_unbound io_ring_exit_work epc : io_pages_free+0x2a/0x58 ra : io_rings_free+0x3a/0x50 epc : ffffffff808811a2 ra : ffffffff80881406 sp : ffff8f80000c3cd0 status: 0000000200000121 badaddr: 0000000000000000 cause: 000000000000000d [<ffffffff808811a2>] io_pages_free+0x2a/0x58 [<ffffffff80881406>] io_rings_free+0x3a/0x50 [<ffffffff80882176>] io_ring_exit_work+0x37e/0x424 [<ffffffff80027234>] process_one_work+0x10c/0x1f4 [<ffffffff8002756e>] worker_thread+0x252/0x31c [<ffffffff8002f5e4>] kthread+0xc4/0xe0 [<ffffffff8000332a>] ret_from_fork+0xa/0x1c Check for a NULL array in io_pages_free(), but also clear the page counts when we free them to be on the safer side. Reported-by: rtm@csail.mit.edu Fixes: 03d89a2 ("io_uring: support for user allocated memory for rings/sqes") Cc: stable@vger.kernel.org Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

[ Upstream commit 26afda7 ] Christoph reported the following splat: WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0 Modules linked in: CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759 Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80 RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293 RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64 R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000 R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800 FS: 000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786 do_accept+0x435/0x620 net/socket.c:1929 __sys_accept4_file net/socket.c:1969 [inline] __sys_accept4+0x9b/0x110 net/socket.c:1999 __do_sys_accept net/socket.c:2016 [inline] __se_sys_accept net/socket.c:2013 [inline] __x64_sys_accept+0x7d/0x90 net/socket.c:2013 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x4315f9 Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300 R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055 </TASK> The reproducer invokes shutdown() before entering the listener status. After commit 9406279 ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets"), the above causes the child to reach the accept syscall in FIN_WAIT1 status. Eric noted we can relax the existing assertion in __inet_accept() Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: multipath-tcp/mptcp_net-next#490 Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 9406279 ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

kitakar5525 commented Sep 23, 2020

View reviewed changes

kitakar5525 commented Sep 24, 2020

View reviewed changes

kitakar5525 force-pushed the mwifiex/fix-suspend-sw branch 2 times, most recently from 63c2656 to cf57ef5 Compare September 24, 2020 16:18

kitakar5525 force-pushed the mwifiex/fix-suspend-sw branch from cf57ef5 to c7d491e Compare September 26, 2020 16:13

kitakar5525 marked this pull request as ready for review September 30, 2020 12:51

qzed approved these changes Sep 30, 2020

View reviewed changes

kitakar5525 force-pushed the mwifiex/fix-suspend-sw branch 2 times, most recently from 54c8340 to b9fe567 Compare September 30, 2020 13:42

kitakar5525 force-pushed the mwifiex/fix-suspend-sw branch from b9fe567 to 5fbc904 Compare September 30, 2020 14:16

kitakar5525 changed the title ~~WIP: mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw~~ mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw Sep 30, 2020

kitakar5525 added 4 commits October 1, 2020 00:02

Revert "mwifiex_pcie: remove()/probe() card on suspend()/resume()"

05243ab

This reverts commit d7df466. Reason for revert: Simplified and sw method commit incoming. Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>

kitakar5525 force-pushed the mwifiex/fix-suspend-sw branch from 5fbc904 to 53547e1 Compare September 30, 2020 15:03

kitakar5525 merged commit 467018c into v5.8-surface-devel Sep 30, 2020

kitakar5525 deleted the mwifiex/fix-suspend-sw branch September 30, 2020 15:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw #56

mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw #56

kitakar5525 commented Sep 23, 2020 •

edited

Loading

kitakar5525 left a comment •

edited

Loading

kitakar5525 left a comment •

edited

Loading

qzed commented Sep 24, 2020

qzed commented Sep 24, 2020

kitakar5525 commented Sep 24, 2020

kitakar5525 commented Sep 24, 2020

kitakar5525 commented Sep 24, 2020

qzed commented Sep 24, 2020

kitakar5525 commented Sep 26, 2020

kitakar5525 commented Sep 26, 2020

kitakar5525 commented Sep 26, 2020

kitakar5525 commented Sep 30, 2020 •

edited

Loading

qzed commented Sep 30, 2020

qzed left a comment

kitakar5525 commented Sep 30, 2020

kitakar5525 commented Sep 30, 2020

kitakar5525 commented Sep 30, 2020

kitakar5525 commented Sep 30, 2020

kitakar5525 commented Oct 1, 2020 •

edited

Loading

mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw #56

mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw #56

Conversation

kitakar5525 commented Sep 23, 2020 • edited Loading

kitakar5525 left a comment • edited Loading

Choose a reason for hiding this comment

kitakar5525 left a comment • edited Loading

Choose a reason for hiding this comment

qzed commented Sep 24, 2020

qzed commented Sep 24, 2020

kitakar5525 commented Sep 24, 2020

kitakar5525 commented Sep 24, 2020

kitakar5525 commented Sep 24, 2020

qzed commented Sep 24, 2020

kitakar5525 commented Sep 26, 2020

kitakar5525 commented Sep 26, 2020

kitakar5525 commented Sep 26, 2020

kitakar5525 commented Sep 30, 2020 • edited Loading

qzed commented Sep 30, 2020

qzed left a comment

Choose a reason for hiding this comment

kitakar5525 commented Sep 30, 2020

kitakar5525 commented Sep 30, 2020

kitakar5525 commented Sep 30, 2020

kitakar5525 commented Sep 30, 2020

kitakar5525 commented Oct 1, 2020 • edited Loading

kitakar5525 commented Sep 23, 2020 •

edited

Loading

kitakar5525 left a comment •

edited

Loading

kitakar5525 left a comment •

edited

Loading

kitakar5525 commented Sep 30, 2020 •

edited

Loading

kitakar5525 commented Oct 1, 2020 •

edited

Loading