soundwire: use driver callbacks directly with proper locking
In the SoundWire probe, we store a pointer from the driver ops into the 'slave' structure. This can lead to kernel oopses when unbinding codec drivers, e.g. with the following sequence to remove the machine driver and the codec driver.

/sbin/modprobe -r snd_soc_sof_sdw
/sbin/modprobe -r snd_soc_rt711

The full details can be found in the BugLink below; for reference, the following two examples show different cases of driver ops/callbacks being invoked after the driver .remove().

kernel: BUG: kernel NULL pointer dereference, address: 0000000000000150
kernel: Workqueue: events cdns_update_slave_status_work [soundwire_cadence]
kernel: RIP: 0010:mutex_lock+0x19/0x30
kernel: Call Trace:
kernel: ? sdw_handle_slave_status+0x426/0xe00 [soundwire_bus 94ff184bf398570c3f8ff7efe9e32529f532e4ae]
kernel: ? newidle_balance+0x26a/0x400
kernel: ? cdns_update_slave_status_work+0x1e9/0x200 [soundwire_cadence 1bcf98eebe5ba9833cd433323769ac923c9c6f82]

kernel: BUG: unable to handle page fault for address: ffffffffc07654c8
kernel: Workqueue: pm pm_runtime_work
kernel: RIP: 0010:sdw_bus_prep_clk_stop+0x6f/0x160 [soundwire_bus]
kernel: Call Trace:
kernel: <TASK>
kernel: sdw_cdns_clock_stop+0xb5/0x1b0 [soundwire_cadence 1bcf98eebe5ba9833cd433323769ac923c9c6f82]
kernel: intel_suspend_runtime+0x5f/0x120 [soundwire_intel aca858f7c87048d3152a4a41bb68abb9b663a1dd]
kernel: ? dpm_sysfs_remove+0x60/0x60

This was not detected earlier in Intel tests since the tests first remove the parent PCI device and shut down the bus. The sequence above is a corner case which keeps the bus operational but without a driver bound.

This patch removes the use of the 'slave' ops and uses proper locking to make sure there are no live callbacks based on a dangling pointer executed during or after the driver unbinding sequence.

Adding device_lock()/unlock() seems fitting since it's what is already used by the device and power management core routines. However, this needs to be done carefully to avoid self-inflicted deadlocks.
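The idea of looking up the callback through the currently bound driver, under the device lock, can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is not the code of this patch, a pthread mutex stands in for device_lock()/device_unlock(), and every name in it (model_device, model_driver, model_call_update_status, ...) is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver with one callback. */
struct model_driver {
    int (*update_status)(int status);
};

/* Hypothetical stand-in for a device; 'lock' plays the role of the device lock. */
struct model_device {
    pthread_mutex_t lock;
    const struct model_driver *driver;  /* NULL while no driver is bound */
};

/* Example callback, for illustration only. */
int model_report_status(int status)
{
    return status + 1;
}

const struct model_driver model_codec_driver = {
    .update_status = model_report_status,
};

/* Bind and unbind change the driver pointer only while holding the lock. */
void model_bind(struct model_device *dev, const struct model_driver *drv)
{
    pthread_mutex_lock(&dev->lock);
    dev->driver = drv;
    pthread_mutex_unlock(&dev->lock);
}

void model_unbind(struct model_device *dev)
{
    pthread_mutex_lock(&dev->lock);
    dev->driver = NULL;
    pthread_mutex_unlock(&dev->lock);
}

/*
 * Invoke the callback through the driver pointer instead of a cached copy.
 * The pointer is checked and used under the same lock that unbind takes,
 * so the callback cannot run once unbind has completed.
 * Returns -1 when no driver (or no callback) is available.
 */
int model_call_update_status(struct model_device *dev, int status)
{
    int ret = -1;

    pthread_mutex_lock(&dev->lock);
    if (dev->driver && dev->driver->update_status)
        ret = dev->driver->update_status(status);
    pthread_mutex_unlock(&dev->lock);

    return ret;
}
```

In this model, a call made after model_unbind() returns -1 rather than jumping through a stale function pointer, which is the property the locking is meant to provide.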
There are two cases where the device core code will take a lock, and taking the same lock while going through suspend/resume routines would be problematic.

The first case is on attachment. The attachment and device resume can happen asynchronously, and in the case where the resume happens first, adding device_lock() does not work at all. Fortunately, in this case the codec driver resume routine needs to wait on the 'initialization_complete', so using device_trylock() is fine. In all other cases unrelated to attachment, a regular device_lock() needs to be taken.

The second case is the routine sdw_update_slave_status(), which can be called from two different places, and the device lock may already be taken by the device core. Using device_trylock() also helps prevent this issue. If the device is already locked by the device core, it is safe to proceed, since the call to sdw_update_slave_status() happens during the parent device PM routines. The parent-child relationship guarantees that the lock will not be released while the dev->driver pointer is used.

The issue with the ops pointer has been there since December 2017, but there were so many changes in the bus handling code that this patch will not apply cleanly all the way to this initial commit. The changes can be easily backported though.

Thanks to Dan Williams for his suggestions on an earlier version of this patch.

BugLink: thesofproject#3531
Fixes: 56d4fe3 ("soundwire: Add MIPI DisCo property helpers")
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
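The trylock reasoning above can be sketched in user space as follows. This is a minimal model, not the code of this patch: a pthread mutex stands in for the device lock, pthread_mutex_trylock() stands in for device_trylock() (note the pthread call returns 0 on success), and the names sketch_device, sketch_double and sketch_update_status are hypothetical. Proceeding without the lock is only safe under the assumption stated in the commit message, namely that the existing lock holder keeps the driver bound for the duration of the call.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with a single driver callback. */
struct sketch_device {
    pthread_mutex_t lock;               /* plays the role of the device lock */
    int (*update_status)(int status);   /* NULL while no driver is bound */
};

/* Example callback, for illustration only. */
int sketch_double(int status)
{
    return 2 * status;
}

/*
 * Try to take the lock. If it is already held (in the commit's scenario,
 * by the device core during the parent PM routines), proceed without it
 * and rely on the holder to keep the callback valid. Only release the
 * lock if this function actually acquired it.
 *
 * 'took_lock' (optional) reports whether the lock was acquired here.
 * Returns -1 when no callback is available.
 */
int sketch_update_status(struct sketch_device *dev, int status, bool *took_lock)
{
    int ret = -1;
    bool locked = (pthread_mutex_trylock(&dev->lock) == 0);

    if (dev->update_status)
        ret = dev->update_status(status);

    if (locked)
        pthread_mutex_unlock(&dev->lock);
    if (took_lock)
        *took_lock = locked;

    return ret;
}
```

A blocking lock in place of the trylock would deadlock in this model whenever the caller already holds the mutex, which mirrors the self-inflicted deadlock the commit message warns about.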