New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mwifiex: Try waking the firmware until we get an interrupt #91
Merged
qzed
merged 1 commit into
linux-surface:v5.11-surface-devel
from
jonas2515:v5.11-surface-devel-fix-firmware-wakeup-fails
Mar 28, 2021
Merged
mwifiex: Try waking the firmware until we get an interrupt #91
qzed
merged 1 commit into
linux-surface:v5.11-surface-devel
from
jonas2515:v5.11-surface-devel-fix-firmware-wakeup-fails
Mar 28, 2021
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
qzed
requested changes
Mar 28, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing where compiler optimization could screw us a bit and some formatting where you could reduce code a bit.
jonas2515
force-pushed
the
v5.11-surface-devel-fix-firmware-wakeup-fails
branch
from
March 28, 2021 22:14
7ad67ed
to
468449d
Compare
It seems that the firmware of the 88W8897 card sometimes ignores or misses when we try to wake it up by reading the firmware status register. This leads to the firmware wakeup timeout expiring and the driver resetting the card because we assume the firmware has hung up or crashed (unfortunately that's not unlikely with this card). Turns out that most of the time the firmware actually didn't hang up, but simply "missed" our wakeup request and doesn't send us an AWAKE event. Trying again to read the firmware status register after a short timeout usually makes the firmware wake we up as expected, so add a small retry loop to mwifiex_pm_wakeup_card() that looks at the interrupt status to check whether the card woke up. The number of tries and timeout lengths for this were determined experimentally: The firmware usually takes about 500 us to wake up after we attempt to read the status register. In some cases where the firmware is very busy (for example while doing a bluetooth scan) it might even miss our requests for multiple milliseconds, which is why after 15 tries the waiting time gets increased to 10 ms. The maximum number of tries it took to wake the firmware when testing this was around 20, so a maximum number of 50 tries should give us plenty of safety margin. A good reproducer for this issue is letting the firmware sleep and wake up in very short intervals, for example by pinging an device on the network every 0.1 seconds.
jonas2515
force-pushed
the
v5.11-surface-devel-fix-firmware-wakeup-fails
branch
from
March 28, 2021 22:18
468449d
to
072c04f
Compare
qzed
approved these changes
Mar 28, 2021
qzed
pushed a commit
that referenced
this pull request
Jun 29, 2021
If a task is killed during a page fault, it does not currently call sb_end_pagefault(), which means that the filesystem cannot be frozen at any time thereafter. This may be reported by lockdep like this: ==================================== WARNING: fsstress/10757 still has locks held! 5.13.0-rc4-build4+ #91 Not tainted ------------------------------------ 1 lock held by fsstress/10757: #0: ffff888104eac530 ( sb_pagefaults as filesystem freezing is modelled as a lock. Fix this by removing all the direct returns from within the function, and using 'ret' to indicate whether we were interrupted or successful. Fixes: 1cf7a15 ("afs: Implement shared-writeable mmap") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
qzed
pushed a commit
that referenced
this pull request
Sep 2, 2022
commit d957e7f upstream. storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it doesn't need to make forward progress under memory pressure. Marking this workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a non-WQ_MEM_RECLAIM workqueue. In the current state it causes the following warning: [ 14.506347] ------------[ cut here ]------------ [ 14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn [ 14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130 [ 14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu [ 14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022 [ 14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun [ 14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130 <-snip-> [ 14.506408] Call Trace: [ 14.506412] __flush_work+0xf1/0x1c0 [ 14.506414] __cancel_work_timer+0x12f/0x1b0 [ 14.506417] ? kernfs_put+0xf0/0x190 [ 14.506418] cancel_delayed_work_sync+0x13/0x20 [ 14.506420] disk_block_events+0x78/0x80 [ 14.506421] del_gendisk+0x3d/0x2f0 [ 14.506423] sr_remove+0x28/0x70 [ 14.506427] device_release_driver_internal+0xef/0x1c0 [ 14.506428] device_release_driver+0x12/0x20 [ 14.506429] bus_remove_device+0xe1/0x150 [ 14.506431] device_del+0x167/0x380 [ 14.506432] __scsi_remove_device+0x11d/0x150 [ 14.506433] scsi_remove_device+0x26/0x40 [ 14.506434] storvsc_remove_lun+0x40/0x60 [ 14.506436] process_one_work+0x209/0x400 [ 14.506437] worker_thread+0x34/0x400 [ 14.506439] kthread+0x121/0x140 [ 14.506440] ? process_one_work+0x400/0x400 [ 14.506441] ? kthread_park+0x90/0x90 [ 14.506443] ret_from_fork+0x35/0x40 [ 14.506445] ---[ end trace 2d9633159fdc6ee7 ]--- Link: https://lore.kernel.org/r/1659628534-17539-1-git-send-email-ssengar@linux.microsoft.com Fixes: 436ad94 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
It seems that the firmware of the 88W8897 card sometimes ignores or
misses when we try to wake it up by reading the firmware status
register. This leads to the firmware wakeup timeout expiring and the
driver resetting the card because we assume the firmware has hung up or
crashed (unfortunately that's not unlikely with this card).
Turns out that most of the time the firmware actually didn't hang up,
but simply "missed" our wakeup request and doesn't send us an AWAKE
event.
Trying again to read the firmware status register after a short timeout
usually makes the firmware wake we up as expected, so add a small retry
loop to mwifiex_pm_wakeup_card() that looks at the interrupt status to
check whether the card woke up.
The number of tries and timeout lengths for this were determined
experimentally: The firmware usually takes about 500 us to wake up
after we attempt to read the status register. In some cases where the
firmware is very busy (for example while doing a bluetooth scan) it
might even miss our requests for multiple milliseconds, which is why
after 15 tries the waiting time gets increased to 10 ms. The maximum
number of tries it took to wake the firmware when testing this was
around 20, so a maximum number of 50 tries should give us plenty of
safety margin.
A good reproducer for this issue is letting the firmware sleep and wake
up in very short intervals, for example by pinging an device on the
network every 0.1 seconds.