Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw #56

Merged
merged 4 commits into from
Sep 30, 2020

Conversation

kitakar5525
Copy link
Member

@kitakar5525 kitakar5525 commented Sep 23, 2020

Hi, I've just opened a new branch that tries to fix mwifiex suspend issues in a more simple way.

Issues in upstream mwifiex regarding suspend:

  • reportedly breaks S0ix on suspend
  • sometimes breaks AP scanning after suspend

The previous patch mwifiex_pcie: remove()/probe() card on suspend()/resume() does remove()/probe() card on suspend()/resume(). This may be too aggressive.
The new patch mwifiex: pcie: use shutdown_sw()/reinit_sw() on suspend()/resume() tries to fix by instead sw-based way (simpler/milder)

At least I confirmed that AP scanning always works after suspend on SB1.
However, I can't confirm S0ix because SB1 can't achieve S0ix by other unknown issues anyway.

So, first, I want you folks to test if S0ix still works after this PR.
Thanks.

Note that this is not yet for upstreaming (e.g. comment at top of suspend()/resume() functions, debug output, etc.)

EDIT: the second patch mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure is required for the main part of this PR (mwifiex: pcie: use shutdown_sw()/reinit_sw() on suspend()/resume())

Copy link
Member Author

@kitakar5525 kitakar5525 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, typo in commit message. "by some reasons" -> "for some reasons"
Fix it on next rebase.

EDIT: fixed.

Copy link
Member Author

@kitakar5525 kitakar5525 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure

Maybe adding Fixes tag is better:

Fixes: 4c5dae5 ("mwifiex: add PCIe function level reset support")

EDIT: added.

@qzed
Copy link
Member

qzed commented Sep 24, 2020

So, first, I want you folks to test if S0ix still works after this PR.

On my SB2, S0ix still seems to work.

@qzed
Copy link
Member

qzed commented Sep 24, 2020

Otherwise, this does look good to me! Maybe add that this will disable wakeups on the device in your commit message. This might also reset some internal state.

@kitakar5525
Copy link
Member Author

Maybe add that this will disable wakeups on the device in your commit message. This might also reset some internal state.

Right. I'll write it clearly.

@kitakar5525 kitakar5525 force-pushed the mwifiex/fix-suspend-sw branch 2 times, most recently from 63c2656 to cf57ef5 Compare September 24, 2020 16:18
@kitakar5525
Copy link
Member Author

Hi, I just rebased the third (main) patch.

  • Changed comments at top of suspend()/resume() for upstreaming

  • Removed (kind of debug) outputs from suspend()/resume() that indicated using shutdown_sw()/reinit_sw() instead

  • Improved commit message
    Especially, here:

    As a side effect, this patch disables wakeups (means can't use
    Wake-On-WLAN anymore, if it was working before), and might also
    reset some internal states.
    

    and here:

    - Removed mwifiex_enable_wake()/mwifiex_disable_wake()
      as incompatible and redundant. Shutdowned driver can't cause a wakeup.
    

By the way, is this comment Shutdowned driver can't cause a wakeup. true? and how do you think if preserving mwifiex_enable_wake()/mwifiex_disable_wake() is useful? (I've currently deleted as the commit message states)

Also, currently, suspend() performs mwifiex_shutdown_sw() unconditionally. But may be better to check if it's already suspended or not, as Host Sleep checks like this:

if (adapter->hs_activated) {
mwifiex_dbg(adapter, CMD,
"cmd: HS Already activated\n");
return true;
}

I'll do this next.

@kitakar5525
Copy link
Member Author

Also, currently, suspend() performs mwifiex_shutdown_sw() unconditionally. But may be better to check if it's already suspended or not, as Host Sleep checks like this:

if (adapter->hs_activated) {
mwifiex_dbg(adapter, CMD,
"cmd: HS Already activated\n");
return true;
}

I'll do this next.

That said, why do we need to check if device is already suspended/resumed as upstream does?

if (!test_bit(MWIFIEX_IS_SUSPENDED, &adapter->work_flags)) {
mwifiex_dbg(adapter, WARN,
"Device already resumed\n");
return 0;
}

If it's useless, I'd like to remove the check instead.

@qzed
Copy link
Member

qzed commented Sep 24, 2020

By the way, is this comment Shutdowned driver can't cause a wakeup. true?

I can only guess. If it works the way I think it works, then the WiFi controller would normally keep on working (in a suspended mode), so essentially be powered on. If it detects a wakeup package, it'll signal a wakeup via the IRQ (I think that's normally a GPIO pin or something like that). If we now shut down the card, the EC firmware might not be running any more, so it won't detect and issue wakeups. At least that's my theory. Without knowing more about either how those things usually work or how the mwifiex chip works in particular, all of this might be wrong of course.

Oh and it should be Shut down driver, I think (past tense of "shut down" is also "shut down").

and how do you think if preserving mwifiex_enable_wake()/mwifiex_disable_wake() is useful? (I've currently deleted as the commit message states)

I would keep them removed, for now at least. If the EC can't issue any wakeups when shut down, having them there could cause the wrong impression (i.e. that wakeups are still expected to work). In the worst-case scenario it could also cause spurious wakeups if the EC would be not properly configured for suspend, but I kind of doubt that.

That said, why do we need to check if device is already suspended/resumed as upstream does?

I honestly don't know. is it possible that something like runtime-suspend could enter Host Suspend while the device is still active, and suspend() then cause a "double suspend"? If not then I guess that's not needed.

@kitakar5525
Copy link
Member Author

By the way, is this comment Shutdowned driver can't cause a wakeup. true?

I can only guess. If it works the way I think it works, then the WiFi controller would normally keep on working (in a suspended mode), so essentially be powered on. If it detects a wakeup package, it'll signal a wakeup via the IRQ (I think that's normally a GPIO pin or something like that). If we now shut down the card, the EC firmware might not be running any more, so it won't detect and issue wakeups. At least that's my theory. Without knowing more about either how those things usually work or how the mwifiex chip works in particular, all of this might be wrong of course.

  • Rewrote the reason of removing mwifiex_enable_wake()/mwifiex_disable_wake() in a more unspecified way:
    - Removed mwifiex_enable_wake()/mwifiex_disable_wake()
      as incompatible and redundant because the driver will be shut down
      instead of Host Sleep.
    

Oh and it should be Shut down driver, I think (past tense of "shut down" is also "shut down").

Oh, right. Fixed.

  • Corrected English verb shutdown -> shut down

and how do you think if preserving mwifiex_enable_wake()/mwifiex_disable_wake() is useful? (I've currently deleted as the commit message states)

I would keep them removed, for now at least. If the EC can't issue any wakeups when shut down, having them there could cause the wrong impression (i.e. that wakeups are still expected to work). In the worst-case scenario it could also cause spurious wakeups if the EC would be not properly configured for suspend, but I kind of doubt that.

Thanks. I kept removing mwifiex_enable_wake()/mwifiex_disable_wake().

That said, why do we need to check if device is already suspended/resumed as upstream does?

I honestly don't know. is it possible that something like runtime-suspend could enter Host Suspend while the device is still active, and suspend() then cause a "double suspend"? If not then I guess that's not needed.

  • Added comment for not checking suspend state on suspend()
    Note that now there is no check whether it's suspended before performing
    mwifiex_shutdown_sw() on suspend().
    With the previous Host Sleep method, the check was done by looking at
    adapter->hs_activated in mwifiex_enable_hs() [sta_ioctl.c], but not
    MWIFIEX_IS_SUSPENDED. So, what the previous method checked was instead
    Host Sleep state, not suspend itself. Therefore, there is no need to check
    the suspend state now.
    
    Also removed comment for suspend state check at top of suspend() 
    accordingly.
    

@kitakar5525
Copy link
Member Author

Other changes:

mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure

  • Fixed "by some reasons" -> "for some reasons"
  • Added Fixes tag Fixes: 4c5dae59d2e9 ("mwifiex: add PCIe function level reset support")

mwifiex: pcie: use shutdown_sw()/reinit_sw() on suspend()/resume()

  • Re-added comment "If already not resumed" at top of resume()
    I had removed the comment before because I intended to remove the check
    of MWIFIEX_IS_SUSPENDED in resume(), but ended up preserving it.

  • Added new commit mwifiex: change comment for shutdown_sw()/reinit_sw() reflecting current state for upstreaming

@kitakar5525
Copy link
Member Author

Might need more investigation for why flush_workqueue() causes kernel oops. I'll do this next.

@kitakar5525
Copy link
Member Author

kitakar5525 commented Sep 30, 2020

I tried re-adding flush_workqueue() (after mwifiex_shutdown_sw())

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index aef24b7c71c0..688bc94ba189 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -171,6 +171,8 @@ static int mwifiex_pcie_suspend(struct device *dev)
 		return -EFAULT;
 	}
 
+	flush_workqueue(adapter->workqueue);
+
 	/* Indicate device suspended */
 	set_bit(MWIFIEX_IS_SUSPENDED, &adapter->work_flags);
 

This resulted in SB1 not resuming from suspend.

The last journal output

[92430.637881] kernel: PM: suspend entry (s2idle)
[92430.682405] kernel: Filesystems sync: 0.044 seconds

So, no crash log recorded.

note: commit torvalds/linux@ec815dd added the flush_workqueue() for Host Sleep.

I have no idea what I should do, but upstream may know anyway. At least I haven't encountered issues with this patch PR.

@kitakar5525 kitakar5525 marked this pull request as ready for review September 30, 2020 12:51
@qzed
Copy link
Member

qzed commented Sep 30, 2020

Could be some deadlock again. AFAIK there are some suspend debugging options you could try. But yeah, maybe upstream has any idea on what could go wrong.

Copy link
Member

@qzed qzed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Feel free to merge when you think it's ready.

@kitakar5525 kitakar5525 force-pushed the mwifiex/fix-suspend-sw branch 2 times, most recently from 54c8340 to b9fe567 Compare September 30, 2020 13:42
@kitakar5525
Copy link
Member Author

Small changes to commit message/title:

mwifiex: pcie: use shutdown_sw()/reinit_sw() on suspend()/resume()

  • corrected some English grammar

  • the following change:

-Note that now there is no check whether it's suspended before performing
-mwifiex_shutdown_sw() on suspend().
+Note that suspend() no longer checks if it's already suspended.
  • removed the following from commit message:
I reused the contents of suspend()/resume() functions as much as possible,
and removed only the parts that are incompatible or redundant with
shutdown_sw()/reinit_sw().

- Removed wait_for_completion() as redundant
  mwifiex_shutdown_sw() does this.
- Removed flush_workqueue() as incompatible
  Causes kernel crashing.
- Removed mwifiex_enable_wake()/mwifiex_disable_wake()
  as incompatible and redundant because the driver will be shut down
  instead of entering Host Sleep.

I think sending this as extra comment is better.

change comment for shutdown_sw()/reinit_sw() reflecting current state

  • changed patch name to update comment for shutdown_sw()/reinit_sw() reflecting current state

@kitakar5525
Copy link
Member Author

Looks good. Feel free to merge when you think it's ready.

Thanks, I'll check commit message (mainly English grammar 😀) lastly for each PR and merge it.

@kitakar5525
Copy link
Member Author

Ah sorry, the last rebase caused a word missing

-When using the Host Sleep method, it prevents the platform to reach
+When using the Host Sleep method, it prevents the platform to reach S0ix
during suspend. Also, after suspend, sometimes AP scanning won't work,

Fixed in this rebase. I consider this PR is ready now.

@kitakar5525 kitakar5525 changed the title WIP: mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw mwifiex: fix S0ix/AP-scanning after suspend by sw method instead of hw Sep 30, 2020
This reverts commit d7df466.

Reason for revert:
  Simplified and sw method commit incoming.

Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
When FLR is performed but without fw reset for some reasons (e.g. on
Surface devices, fw reset requires another quirk), it fails to reset
properly. You can trigger the issue on such devices via debugfs entry
for reset:

    $ echo 1 | sudo tee /sys/kernel/debug/mwifiex/mlan0/reset

and the resulting dmesg log:

    mwifiex_pcie 0000:03:00.0: Resetting per request
    mwifiex_pcie 0000:03:00.0: info: successfully disconnected from [BSSID]: reason code 3
    mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    mwifiex_pcie 0000:03:00.0: deleting the crypto keys
    mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    mwifiex_pcie 0000:03:00.0: deleting the crypto keys
    mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    mwifiex_pcie 0000:03:00.0: deleting the crypto keys
    mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    mwifiex_pcie 0000:03:00.0: deleting the crypto keys
    mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    mwifiex_pcie 0000:03:00.0: deleting the crypto keys
    mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    mwifiex_pcie 0000:03:00.0: deleting the crypto keys
    mwifiex_pcie 0000:03:00.0: info: shutdown mwifiex...
    mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
    mwifiex_pcie 0000:03:00.0: WLAN FW already running! Skip FW dnld
    mwifiex_pcie 0000:03:00.0: WLAN FW is active
    mwifiex_pcie 0000:03:00.0: Unknown api_id: 4
    mwifiex_pcie 0000:03:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.19.p21)
    mwifiex_pcie 0000:03:00.0: driver_version = mwifiex 1.0 (15.68.19.p21)
    mwifiex_pcie 0000:03:00.0: info: trying to associate to '[SSID]' bssid [BSSID]
    mwifiex_pcie 0000:03:00.0: info: associated to bssid [BSSID] successfully
    mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110
    mwifiex_pcie 0000:03:00.0: info: successfully disconnected from [BSSID]: reason code 15
    mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110
    mwifiex_pcie 0000:03:00.0: deleting the crypto keys
    mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110
    mwifiex_pcie 0000:03:00.0: deleting the crypto keys
    mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110
    mwifiex_pcie 0000:03:00.0: deleting the crypto keys
    [...]

When comparing mwifiex_shutdown_sw() with mwifiex_pcie_remove(), it
lacks mwifiex_init_shutdown_fw().

This commit fixes mwifiex_shutdown_sw() by adding the missing
mwifiex_init_shutdown_fw().

Fixes: 4c5dae5 ("mwifiex: add PCIe function level reset support")
Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
There are issues with S0ix achievement and AP scanning after suspend
with the current Host Sleep method.

When using the Host Sleep method, it prevents the platform to reach S0ix
during suspend. Also, after suspend, sometimes AP scanning won't work,
resulting in non-working wifi.

To fix such issues, perform shutdown_sw()/reinit_sw() instead of Host
Sleep.

As a side effect, this patch disables wakeups (means that Wake-On-WLAN
can't be used anymore, if it was working before), and might also reset
some internal states.

Note that suspend() no longer checks if it's already suspended.

With the previous Host Sleep method, the check was done by looking at
adapter->hs_activated in mwifiex_enable_hs() [sta_ioctl.c], but not
MWIFIEX_IS_SUSPENDED. So, what the previous method checked was instead
Host Sleep state, not suspend itself. Therefore, there is no need to check
the suspend state now.

Also removed comment for suspend state check at top of suspend()
accordingly.

Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
…ent state

The functions mwifiex_shutdown_sw() and mwifiex_reinit_sw() can be used
for more general purposes than the PCIe function level reset. Also, these
are even not PCIe-specific.

So, let's update the comments at the top of each function accordingly.

Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
@kitakar5525
Copy link
Member Author

Rebased on top of current v5.8-surface-devel for simplicity

@kitakar5525 kitakar5525 merged commit 467018c into v5.8-surface-devel Sep 30, 2020
@kitakar5525 kitakar5525 deleted the mwifiex/fix-suspend-sw branch September 30, 2020 15:07
@kitakar5525
Copy link
Member Author

kitakar5525 commented Oct 1, 2020

By the way, I noticed the following changes regarding S0ix (printed /sys/kernel/debug/pmc_core contents as early as possible after resume) on SB1 (Host Sleep method vs shutdown_sw() method):

-MPHY CORE LANE 12               	State: Not power gated
-PCH IP: 6  - SPC                             	State: On
-DMIPCIE3 PLL                    	State: Active
+MPHY CORE LANE 12               	State: Power gated
+PCH IP: 6  - SPC                             	State: Off
+DMIPCIE3 PLL                    	State: Idle

So, this indicates that the shutdown_sw() suspend method is definitely working.

kitakar5525 pushed a commit to kitakar5525/linux-kernel that referenced this pull request Oct 18, 2020
…ort_recv_listen()

With multi-transport support, listener sockets are not bound to any
transport. So, calling virtio_transport_reset(), when an error
occurs, on a listener socket produces the following null-pointer
dereference:

  BUG: kernel NULL pointer dereference, address: 00000000000000e8
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 0 PID: 20 Comm: kworker/0:1 Not tainted 5.5.0-rc1-ste-00003-gb4be21f316ac-dirty linux-surface#56
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
  Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport]
  RIP: 0010:virtio_transport_send_pkt_info+0x20/0x130 [vmw_vsock_virtio_transport_common]
  Code: 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 10 44 8b 76 20 e8 c0 ba fe ff <48> 8b 80 e8 00 00 00 e8 64 e3 7d c1 45 8b 45 00 41 8b 8c 24 d4 02
  RSP: 0018:ffffc900000b7d08 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffff88807bf12728 RCX: 0000000000000000
  RDX: ffff88807bf12700 RSI: ffffc900000b7d50 RDI: ffff888035c84000
  RBP: ffffc900000b7d40 R08: ffff888035c84000 R09: ffffc900000b7d08
  R10: ffff8880781de800 R11: 0000000000000018 R12: ffff888035c84000
  R13: ffffc900000b7d50 R14: 0000000000000000 R15: ffff88807bf12724
  FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000000e8 CR3: 00000000790f4004 CR4: 0000000000160ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   virtio_transport_reset+0x59/0x70 [vmw_vsock_virtio_transport_common]
   virtio_transport_recv_pkt+0x5bb/0xe50 [vmw_vsock_virtio_transport_common]
   ? detach_buf_split+0xf1/0x130
   virtio_transport_rx_work+0xba/0x130 [vmw_vsock_virtio_transport]
   process_one_work+0x1c0/0x300
   worker_thread+0x45/0x3c0
   kthread+0xfc/0x130
   ? current_work+0x40/0x40
   ? kthread_park+0x90/0x90
   ret_from_fork+0x35/0x40
  Modules linked in: sunrpc kvm_intel kvm vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common irqbypass vsock virtio_rng rng_core
  CR2: 00000000000000e8
  ---[ end trace e75400e2ea2fa824 ]---

This happens because virtio_transport_reset() calls
virtio_transport_send_pkt_info() that can be used only on
connecting/connected sockets.

This patch fixes the issue, using virtio_transport_reset_no_sock()
instead of virtio_transport_reset() when we are handling a listener
socket.

Fixes: c0cfa2d ("vsock: add multi-transports support")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit df18fa1)

BUG=b:153004946
TEST=nested vsock works

Change-Id: If59ecc3babd64b5a0e9373df110ce9d00185c5f8
Signed-off-by: Allen Webb <allenwebb@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2406879
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
qzed pushed a commit that referenced this pull request Jan 24, 2021
I was hitting the below panic continuously when attaching kprobes to
scheduler functions

	[  159.045212] Unexpected kernel BRK exception at EL1
	[  159.053753] Internal error: BRK handler: f2000006 [#1] PREEMPT SMP
	[  159.059954] Modules linked in:
	[  159.063025] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.11.0-rc4-00008-g1e2a199f6ccd #56
	[rt-app] <notice> [1] Exiting.[  159.071166] Hardware name: ARM Juno development board (r2) (DT)
	[  159.079689] pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO BTYPE=--)

	[  159.085723] pc : 0xffff80001624501c
	[  159.089377] lr : attach_entity_load_avg+0x2ac/0x350
	[  159.094271] sp : ffff80001622b640
	[rt-app] <notice> [0] Exiting.[  159.097591] x29: ffff80001622b640 x28: 0000000000000001
	[  159.105515] x27: 0000000000000049 x26: ffff000800b79980

	[  159.110847] x25: ffff00097ef37840 x24: 0000000000000000
	[  159.116331] x23: 00000024eacec1ec x22: ffff00097ef12b90
	[  159.121663] x21: ffff00097ef37700 x20: ffff800010119170
	[rt-app] <notice> [11] Exiting.[  159.126995] x19: ffff00097ef37840 x18: 000000000000000e
	[  159.135003] x17: 0000000000000001 x16: 0000000000000019
	[  159.140335] x15: 0000000000000000 x14: 0000000000000000
	[  159.145666] x13: 0000000000000002 x12: 0000000000000002
	[  159.150996] x11: ffff80001592f9f0 x10: 0000000000000060
	[  159.156327] x9 : ffff8000100f6f9c x8 : be618290de0999a1
	[  159.161659] x7 : ffff80096a4b1000 x6 : 0000000000000000
	[  159.166990] x5 : ffff00097ef37840 x4 : 0000000000000000
	[  159.172321] x3 : ffff000800328948 x2 : 0000000000000000
	[  159.177652] x1 : 0000002507d52fec x0 : ffff00097ef12b90
	[  159.182983] Call trace:
	[  159.185433]  0xffff80001624501c
	[  159.188581]  update_load_avg+0x2d0/0x778
	[  159.192516]  enqueue_task_fair+0x134/0xe20
	[  159.196625]  enqueue_task+0x4c/0x2c8
	[  159.200211]  ttwu_do_activate+0x70/0x138
	[  159.204147]  sched_ttwu_pending+0xbc/0x160
	[  159.208253]  flush_smp_call_function_queue+0x16c/0x320
	[  159.213408]  generic_smp_call_function_single_interrupt+0x1c/0x28
	[  159.219521]  ipi_handler+0x1e8/0x3c8
	[  159.223106]  handle_percpu_devid_irq+0xd8/0x460
	[  159.227650]  generic_handle_irq+0x38/0x50
	[  159.231672]  __handle_domain_irq+0x6c/0xc8
	[  159.235781]  gic_handle_irq+0xcc/0xf0
	[  159.239452]  el1_irq+0xb4/0x180
	[  159.242600]  rcu_is_watching+0x28/0x70
	[  159.246359]  rcu_read_lock_held_common+0x44/0x88
	[  159.250991]  rcu_read_lock_any_held+0x30/0xc0
	[  159.255360]  kretprobe_dispatcher+0xc4/0xf0
	[  159.259555]  __kretprobe_trampoline_handler+0xc0/0x150
	[  159.264710]  trampoline_probe_handler+0x38/0x58
	[  159.269255]  kretprobe_trampoline+0x70/0xc4
	[  159.273450]  run_rebalance_domains+0x54/0x80
	[  159.277734]  __do_softirq+0x164/0x684
	[  159.281406]  irq_exit+0x198/0x1b8
	[  159.284731]  __handle_domain_irq+0x70/0xc8
	[  159.288840]  gic_handle_irq+0xb0/0xf0
	[  159.292510]  el1_irq+0xb4/0x180
	[  159.295658]  arch_cpu_idle+0x18/0x28
	[  159.299245]  default_idle_call+0x9c/0x3e8
	[  159.303265]  do_idle+0x25c/0x2a8
	[  159.306502]  cpu_startup_entry+0x2c/0x78
	[  159.310436]  secondary_start_kernel+0x160/0x198
	[  159.314984] Code: d42000c0 aa1e03e9 d42000c0 aa1e03e9 (d42000c0)

After a bit of head scratching and debugging it turned out that it is
due to kprobe handler being interrupted by a tick that causes us to go
into (I think another) kprobe handler.

The culprit was kprobe_breakpoint_ss_handler() returning DBG_HOOK_ERROR
which leads to the Unexpected kernel BRK exception.

Reverting commit ba090f9 ("arm64: kprobes: Remove redundant
kprobe_step_ctx") seemed to fix the problem for me.

Further analysis showed that kcb->kprobe_status is set to
KPROBE_REENTER when the error occurs. By teaching
kprobe_breakpoint_ss_handler() to handle this status I can no  longer
reproduce the problem.

Fixes: ba090f9 ("arm64: kprobes: Remove redundant kprobe_step_ctx")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210122110909.3324607-1-qais.yousef@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
qzed pushed a commit that referenced this pull request Mar 11, 2023
[ Upstream commit 9ca5e7e ]

When a file descriptor of pppol2tp socket is passed as file descriptor
of UDP socket, a recursive deadlock occurs in l2tp_tunnel_register().
This situation is reproduced by the following program:

int main(void)
{
	int sock;
	struct sockaddr_pppol2tp addr;

	sock = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
	if (sock < 0) {
		perror("socket");
		return 1;
	}

	addr.sa_family = AF_PPPOX;
	addr.sa_protocol = PX_PROTO_OL2TP;
	addr.pppol2tp.pid = 0;
	addr.pppol2tp.fd = sock;
	addr.pppol2tp.addr.sin_family = PF_INET;
	addr.pppol2tp.addr.sin_port = htons(0);
	addr.pppol2tp.addr.sin_addr.s_addr = inet_addr("192.168.0.1");
	addr.pppol2tp.s_tunnel = 1;
	addr.pppol2tp.s_session = 0;
	addr.pppol2tp.d_tunnel = 0;
	addr.pppol2tp.d_session = 0;

	if (connect(sock, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("connect");
		return 1;
	}

	return 0;
}

This program causes the following lockdep warning:

 ============================================
 WARNING: possible recursive locking detected
 6.2.0-rc5-00205-gc96618275234 #56 Not tainted
 --------------------------------------------
 repro/8607 is trying to acquire lock:
 ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: l2tp_tunnel_register+0x2b7/0x11c0

 but task is already holding lock:
 ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(sk_lock-AF_PPPOX);
   lock(sk_lock-AF_PPPOX);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by repro/8607:
  #0: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30

 stack backtrace:
 CPU: 0 PID: 8607 Comm: repro Not tainted 6.2.0-rc5-00205-gc96618275234 #56
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x100/0x178
  __lock_acquire.cold+0x119/0x3b9
  ? lockdep_hardirqs_on_prepare+0x410/0x410
  lock_acquire+0x1e0/0x610
  ? l2tp_tunnel_register+0x2b7/0x11c0
  ? lock_downgrade+0x710/0x710
  ? __fget_files+0x283/0x3e0
  lock_sock_nested+0x3a/0xf0
  ? l2tp_tunnel_register+0x2b7/0x11c0
  l2tp_tunnel_register+0x2b7/0x11c0
  ? sprintf+0xc4/0x100
  ? l2tp_tunnel_del_work+0x6b0/0x6b0
  ? debug_object_deactivate+0x320/0x320
  ? lockdep_init_map_type+0x16d/0x7a0
  ? lockdep_init_map_type+0x16d/0x7a0
  ? l2tp_tunnel_create+0x2bf/0x4b0
  ? l2tp_tunnel_create+0x3c6/0x4b0
  pppol2tp_connect+0x14e1/0x1a30
  ? pppol2tp_put_sk+0xd0/0xd0
  ? aa_sk_perm+0x2b7/0xa80
  ? aa_af_perm+0x260/0x260
  ? bpf_lsm_socket_connect+0x9/0x10
  ? pppol2tp_put_sk+0xd0/0xd0
  __sys_connect_file+0x14f/0x190
  __sys_connect+0x133/0x160
  ? __sys_connect_file+0x190/0x190
  ? lockdep_hardirqs_on+0x7d/0x100
  ? ktime_get_coarse_real_ts64+0x1b7/0x200
  ? ktime_get_coarse_real_ts64+0x147/0x200
  ? __audit_syscall_entry+0x396/0x500
  __x64_sys_connect+0x72/0xb0
  do_syscall_64+0x38/0xb0
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

This patch fixes the issue by getting/creating the tunnel before
locking the pppol2tp socket.

Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()")
Cc: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
qzed pushed a commit that referenced this pull request Mar 11, 2023
[ Upstream commit 9ca5e7e ]

When a file descriptor of pppol2tp socket is passed as file descriptor
of UDP socket, a recursive deadlock occurs in l2tp_tunnel_register().
This situation is reproduced by the following program:

int main(void)
{
	int sock;
	struct sockaddr_pppol2tp addr;

	sock = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
	if (sock < 0) {
		perror("socket");
		return 1;
	}

	addr.sa_family = AF_PPPOX;
	addr.sa_protocol = PX_PROTO_OL2TP;
	addr.pppol2tp.pid = 0;
	addr.pppol2tp.fd = sock;
	addr.pppol2tp.addr.sin_family = PF_INET;
	addr.pppol2tp.addr.sin_port = htons(0);
	addr.pppol2tp.addr.sin_addr.s_addr = inet_addr("192.168.0.1");
	addr.pppol2tp.s_tunnel = 1;
	addr.pppol2tp.s_session = 0;
	addr.pppol2tp.d_tunnel = 0;
	addr.pppol2tp.d_session = 0;

	if (connect(sock, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("connect");
		return 1;
	}

	return 0;
}

This program causes the following lockdep warning:

 ============================================
 WARNING: possible recursive locking detected
 6.2.0-rc5-00205-gc96618275234 #56 Not tainted
 --------------------------------------------
 repro/8607 is trying to acquire lock:
 ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: l2tp_tunnel_register+0x2b7/0x11c0

 but task is already holding lock:
 ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(sk_lock-AF_PPPOX);
   lock(sk_lock-AF_PPPOX);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by repro/8607:
  #0: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30

 stack backtrace:
 CPU: 0 PID: 8607 Comm: repro Not tainted 6.2.0-rc5-00205-gc96618275234 #56
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x100/0x178
  __lock_acquire.cold+0x119/0x3b9
  ? lockdep_hardirqs_on_prepare+0x410/0x410
  lock_acquire+0x1e0/0x610
  ? l2tp_tunnel_register+0x2b7/0x11c0
  ? lock_downgrade+0x710/0x710
  ? __fget_files+0x283/0x3e0
  lock_sock_nested+0x3a/0xf0
  ? l2tp_tunnel_register+0x2b7/0x11c0
  l2tp_tunnel_register+0x2b7/0x11c0
  ? sprintf+0xc4/0x100
  ? l2tp_tunnel_del_work+0x6b0/0x6b0
  ? debug_object_deactivate+0x320/0x320
  ? lockdep_init_map_type+0x16d/0x7a0
  ? lockdep_init_map_type+0x16d/0x7a0
  ? l2tp_tunnel_create+0x2bf/0x4b0
  ? l2tp_tunnel_create+0x3c6/0x4b0
  pppol2tp_connect+0x14e1/0x1a30
  ? pppol2tp_put_sk+0xd0/0xd0
  ? aa_sk_perm+0x2b7/0xa80
  ? aa_af_perm+0x260/0x260
  ? bpf_lsm_socket_connect+0x9/0x10
  ? pppol2tp_put_sk+0xd0/0xd0
  __sys_connect_file+0x14f/0x190
  __sys_connect+0x133/0x160
  ? __sys_connect_file+0x190/0x190
  ? lockdep_hardirqs_on+0x7d/0x100
  ? ktime_get_coarse_real_ts64+0x1b7/0x200
  ? ktime_get_coarse_real_ts64+0x147/0x200
  ? __audit_syscall_entry+0x396/0x500
  __x64_sys_connect+0x72/0xb0
  do_syscall_64+0x38/0xb0
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

This patch fixes the issue by getting/creating the tunnel before
locking the pppol2tp socket.

Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()")
Cc: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
qzed pushed a commit that referenced this pull request Nov 20, 2023
…ress

commit 8b51a39 upstream.

If we specify a valid CQ ring address but an invalid SQ ring address,
we'll correctly spot this and free the allocated pages and clear them
to NULL. However, we don't clear the ring page count, and hence will
attempt to free the pages again. We've already cleared the address of
the page array when freeing them, but we don't check for that. This
causes the following crash:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Oops [#1]
Modules linked in:
CPU: 0 PID: 20 Comm: kworker/u2:1 Not tainted 6.6.0-rc5-dirty #56
Hardware name: ucbbar,riscvemu-bare (DT)
Workqueue: events_unbound io_ring_exit_work
epc : io_pages_free+0x2a/0x58
 ra : io_rings_free+0x3a/0x50
 epc : ffffffff808811a2 ra : ffffffff80881406 sp : ffff8f80000c3cd0
 status: 0000000200000121 badaddr: 0000000000000000 cause: 000000000000000d
 [<ffffffff808811a2>] io_pages_free+0x2a/0x58
 [<ffffffff80881406>] io_rings_free+0x3a/0x50
 [<ffffffff80882176>] io_ring_exit_work+0x37e/0x424
 [<ffffffff80027234>] process_one_work+0x10c/0x1f4
 [<ffffffff8002756e>] worker_thread+0x252/0x31c
 [<ffffffff8002f5e4>] kthread+0xc4/0xe0
 [<ffffffff8000332a>] ret_from_fork+0xa/0x1c

Check for a NULL array in io_pages_free(), but also clear the page counts
when we free them to be on the safer side.

Reported-by: rtm@csail.mit.edu
Fixes: 03d89a2 ("io_uring: support for user allocated memory for rings/sqes")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
qzed pushed a commit that referenced this pull request Nov 20, 2023
…ress

If we specify a valid CQ ring address but an invalid SQ ring address,
we'll correctly spot this and free the allocated pages and clear them
to NULL. However, we don't clear the ring page count, and hence will
attempt to free the pages again. We've already cleared the address of
the page array when freeing them, but we don't check for that. This
causes the following crash:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Oops [#1]
Modules linked in:
CPU: 0 PID: 20 Comm: kworker/u2:1 Not tainted 6.6.0-rc5-dirty #56
Hardware name: ucbbar,riscvemu-bare (DT)
Workqueue: events_unbound io_ring_exit_work
epc : io_pages_free+0x2a/0x58
 ra : io_rings_free+0x3a/0x50
 epc : ffffffff808811a2 ra : ffffffff80881406 sp : ffff8f80000c3cd0
 status: 0000000200000121 badaddr: 0000000000000000 cause: 000000000000000d
 [<ffffffff808811a2>] io_pages_free+0x2a/0x58
 [<ffffffff80881406>] io_rings_free+0x3a/0x50
 [<ffffffff80882176>] io_ring_exit_work+0x37e/0x424
 [<ffffffff80027234>] process_one_work+0x10c/0x1f4
 [<ffffffff8002756e>] worker_thread+0x252/0x31c
 [<ffffffff8002f5e4>] kthread+0xc4/0xe0
 [<ffffffff8000332a>] ret_from_fork+0xa/0x1c

Check for a NULL array in io_pages_free(), but also clear the page counts
when we free them to be on the safer side.

Reported-by: rtm@csail.mit.edu
Fixes: 03d89a2 ("io_uring: support for user allocated memory for rings/sqes")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
qzed pushed a commit that referenced this pull request Jun 30, 2024
[ Upstream commit 26afda7 ]

Christoph reported the following splat:

WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0
Modules linked in:
CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759
Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80
RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293
RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64
R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000
R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800
FS:  000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786
 do_accept+0x435/0x620 net/socket.c:1929
 __sys_accept4_file net/socket.c:1969 [inline]
 __sys_accept4+0x9b/0x110 net/socket.c:1999
 __do_sys_accept net/socket.c:2016 [inline]
 __se_sys_accept net/socket.c:2013 [inline]
 __x64_sys_accept+0x7d/0x90 net/socket.c:2013
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x4315f9
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300
R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055
 </TASK>

The reproducer invokes shutdown() before entering the listener status.
After commit 9406279 ("tcp: defer shutdown(SEND_SHUTDOWN) for
TCP_SYN_RECV sockets"), the above causes the child to reach the accept
syscall in FIN_WAIT1 status.

Eric noted we can relax the existing assertion in __inet_accept()

Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: multipath-tcp/mptcp_net-next#490
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: 9406279 ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants