[core] Deadlock between device.tracker and device.snatchable_lock #5737

Open
Tracked by #5572
sagudev opened this issue May 24, 2024 · 4 comments

sagudev commented May 24, 2024

Description
More deadlocks between queue.submit and poll_all_devices (both threads are running device.maintain).

queue.submit thread:

  thread #107, name = 'WGPU'
    frame #0: 0x00007ffff6f2725d libc.so.6`syscall at syscall.S:38
    frame #1: 0x000055555bd3a8c2 servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers at linux.rs:112:13
    frame #2: 0x000055555bd3a8a5 servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers [inlined] <parking_lot_core::thread_parker::imp::ThreadParker as parking_lot_core::thread_parker::ThreadParkerT>::park at linux.rs:66:13
    frame #3: 0x000055555bd3a886 servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers at parking_lot.rs:635:36
    frame #4: 0x000055555bd3a70b servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers at parking_lot.rs:207:5
    frame #5: 0x000055555bd3a675 servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers at parking_lot.rs:600:5
    frame #6: 0x000055555bd3a675 servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers(self=0x00007ffeda27b5a8, timeout=Instant>{...}, prev_value=0) at raw_rwlock.rs:1017:17
    frame #7: 0x000055555bd37c8e servo`parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow(self=0x00007ffeda27b5a8, timeout=Instant>{...}) at raw_rwlock.rs:647:9
    frame #8: 0x000055555b1586d9 servo`wgpu_core::snatch::SnatchLock::write [inlined] <parking_lot::raw_rwlock::RawRwLock as lock_api::rwlock::RawRwLock>::lock_exclusive(self=0x00007ffeda27b5a8) at raw_rwlock.rs:73:26
    frame #9: 0x000055555b1586cb servo`wgpu_core::snatch::SnatchLock::write at rwlock.rs:500:9
    frame #10: 0x000055555b1586cb servo`wgpu_core::snatch::SnatchLock::write [inlined] wgpu_core::lock::vanilla::RwLock<T>::write at vanilla.rs:85:33
    frame #11: 0x000055555b1586cb servo`wgpu_core::snatch::SnatchLock::write(self=0x00007ffeda27b5a8) at snatch.rs:154:40
    frame #12: 0x000055555b0c0e89 servo`wgpu_core::resource::Texture<A>::destroy(self=<unavailable>) at resource.rs:878:32
    frame #13: 0x000055555b0131aa servo`wgpu_core::device::resource::Device<A>::maintain at resource.rs:3649:21
    frame #14: 0x000055555b012c12 servo`wgpu_core::device::resource::Device<A>::maintain(self=0x00007ffeda279010, fence_guard=wgpu_core::lock::vanilla::RwLockReadGuard<core::option::Option<wgpu_hal::vulkan::Fence>> @ 0x00007fff505f7760, maintain=<unavailable>, snatch_guard=<unavailable>) at resource.rs:476:13
    frame #15: 0x000055555b046a79 servo`wgpu_core::device::queue::<impl wgpu_core::global::Global>::queue_submit(self=<unavailable>, queue_id=<unavailable>, command_buffer_ids=<unavailable>) at queue.rs:1494:23

poll_all_devices thread:

  thread #108, name = 'WGPU poller'
    frame #0: 0x00007ffff6f2725d libc.so.6`syscall at syscall.S:38
    frame #1: 0x000055555bd3c477 servo`parking_lot::raw_mutex::RawMutex::lock_slow at linux.rs:112:13
    frame #2: 0x000055555bd3c45a servo`parking_lot::raw_mutex::RawMutex::lock_slow [inlined] <parking_lot_core::thread_parker::imp::ThreadParker as parking_lot_core::thread_parker::ThreadParkerT>::park at linux.rs:66:13
    frame #3: 0x000055555bd3c454 servo`parking_lot::raw_mutex::RawMutex::lock_slow at parking_lot.rs:635:36
    frame #4: 0x000055555bd3c3f9 servo`parking_lot::raw_mutex::RawMutex::lock_slow at parking_lot.rs:207:5
    frame #5: 0x000055555bd3c3f9 servo`parking_lot::raw_mutex::RawMutex::lock_slow at parking_lot.rs:600:5
    frame #6: 0x000055555bd3c3f9 servo`parking_lot::raw_mutex::RawMutex::lock_slow(self=0x00007ffeda27b5b0, timeout=Instant>{...}) at raw_mutex.rs:262:17
    frame #7: 0x000055555b148a84 servo`wgpu_core::device::life::LifetimeTracker<A>::triage_suspected [inlined] <parking_lot::raw_mutex::RawMutex as lock_api::mutex::RawMutex>::lock(self=0x00007ffeda27b5b0) at raw_mutex.rs:72:13
    frame #8: 0x000055555b148a76 servo`wgpu_core::device::life::LifetimeTracker<A>::triage_suspected at mutex.rs:223:9
    frame #9: 0x000055555b148a76 servo`wgpu_core::device::life::LifetimeTracker<A>::triage_suspected at vanilla.rs:29:27
    frame #10: 0x000055555b148a76 servo`wgpu_core::device::life::LifetimeTracker<A>::triage_suspected [inlined] wgpu_core::device::life::LifetimeTracker<A>::triage_suspected_render_bundles(self=<unavailable>, trackers=0x00007ffeda27b5b0) at life.rs:501:37
    frame #11: 0x000055555b148a76 servo`wgpu_core::device::life::LifetimeTracker<A>::triage_suspected(self=0x00007ffeda27b888, trackers=0x00007ffeda27b5b0) at life.rs:786:9
    frame #12: 0x000055555b16bfd7 servo`wgpu_core::device::resource::Device<A>::maintain(self=0x00007ffeda279010, fence_guard=wgpu_core::lock::vanilla::RwLockReadGuard<core::option::Option<wgpu_hal::vulkan::Fence>> @ r15, maintain=<unavailable>, snatch_guard=<unavailable>) at resource.rs:438:9
    frame #13: 0x000055555b156789 servo`wgpu_core::device::global::<impl wgpu_core::global::Global>::poll_all_devices at global.rs:2148:39
    frame #14: 0x000055555b156769 servo`wgpu_core::device::global::<impl wgpu_core::global::Global>::poll_all_devices at global.rs:2188:21
    frame #15: 0x000055555b15661f servo`wgpu_core::device::global::<impl wgpu_core::global::Global>::poll_all_devices(self=<unavailable>, force_wait=<unavailable>) at global.rs:2213:17

Repro steps
Servo at servo/servo@5ef507e, when running webgpu:api,validation,state,device_lost,destroy:createTexture,2d,uncompressed_format:*

Platform
wgpu-core d0a5e48

sagudev changed the title from "[core] Deadlock ." to "[core] Deadlock on tracker lock." on May 24, 2024

sagudev commented May 24, 2024

queue.submit thread

Acquired the tracker lock here:

let trackers = self.trackers.lock();

then tries to acquire:

let snatch_guard = device.snatchable_lock.write();

poll_all_devices thread

Tries to acquire:

let mut trackers = trackers.lock();

while it has already acquired the snatchable lock:

let snatch_guard = device.snatchable_lock.read();
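
For reference, the inverted acquisition order reduces to the classic two-lock cycle. Below is a minimal standalone sketch (using std::sync in place of wgpu's parking_lot-backed wrappers; all names are stand-ins, not wgpu code) that hangs under unlucky scheduling, for the same reason as the backtraces above:

use std::sync::{Arc, Mutex, RwLock};
use std::thread;

fn main() {
    // Stand-ins for device.trackers (a Mutex) and device.snatchable_lock (an RwLock).
    let trackers = Arc::new(Mutex::new(()));
    let snatchable = Arc::new(RwLock::new(()));

    // Models the queue.submit thread: tracker lock first, then snatchable write.
    let (t, s) = (Arc::clone(&trackers), Arc::clone(&snatchable));
    let submit = thread::spawn(move || {
        let _trackers = t.lock().unwrap();
        // ... triage work under the tracker lock ...
        let _snatch = s.write().unwrap(); // blocks while the poller holds a read guard
    });

    // Models the poll_all_devices thread: snatchable read first, then tracker lock.
    let (t, s) = (Arc::clone(&trackers), Arc::clone(&snatchable));
    let poll = thread::spawn(move || {
        let _snatch = s.read().unwrap();
        // ... maintain work under the snatch guard ...
        let _trackers = t.lock().unwrap(); // blocks while queue.submit holds the tracker lock
    });

    // With unlucky scheduling, both joins hang forever: each thread waits on
    // the lock the other one holds.
    submit.join().unwrap();
    poll.join().unwrap();
}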

sagudev changed the title from "[core] Deadlock on tracker lock." to "[core] Deadlock between device.tracker and device.snatchable_lock" on May 24, 2024

sagudev commented May 24, 2024

A similar deadlock also happens if one thread is destroying a buffer instead of a texture, in:

pub(crate) fn release_gpu_resources(&self) {

jimblandy (Member) commented

Thanks for filing this, and for the analysis.

If you look at the analysis results posted in #5586, you'll see that there's no shortage of cycles in that lock acquisition ordering graph. There are lots of ways for wgpu to deadlock right now, unfortunately. We used to have static deadlock prevention until arcanization removed it, and things went downhill fast.

I have some security-sensitive issues I need to get through first. I'm expecting to have them done by the first week in June, and then I can turn my attention back to deadlocks. I definitely encourage you or anyone else to tackle these issues themselves if you need them addressed sooner than that.
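
To make the "cycle in the lock acquisition ordering graph" concrete: record an edge A -> B whenever some code path acquires lock B while holding lock A; any cycle in that graph marks a potential deadlock. A toy sketch of that check (hypothetical, not wgpu's actual tooling or the #5586 analysis), fed the two edges from this issue:

use std::collections::{HashMap, HashSet};

// Returns true if the acquisition-order graph contains a cycle,
// using a standard DFS with "visiting" (gray) and "done" (black) sets.
fn has_cycle<'a>(edges: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    fn visit<'a>(
        node: &'a str,
        edges: &HashMap<&'a str, Vec<&'a str>>,
        visiting: &mut HashSet<&'a str>,
        done: &mut HashSet<&'a str>,
    ) -> bool {
        if done.contains(node) {
            return false; // already fully explored, no cycle through here
        }
        if !visiting.insert(node) {
            return true; // back edge: node is still on the current DFS path
        }
        for &next in edges.get(node).into_iter().flatten() {
            if visit(next, edges, visiting, done) {
                return true;
            }
        }
        visiting.remove(node);
        done.insert(node);
        false
    }

    let mut visiting = HashSet::new();
    let mut done = HashSet::new();
    edges.keys().any(|&n| visit(n, edges, &mut visiting, &mut done))
}

fn main() {
    // The two acquisition orders from this issue form a two-node cycle.
    let mut edges: HashMap<&str, Vec<&str>> = HashMap::new();
    edges.insert("device.trackers", vec!["device.snatchable_lock"]); // queue.submit path
    edges.insert("device.snatchable_lock", vec!["device.trackers"]); // poll_all_devices path
    assert!(has_cycle(&edges));
    println!("cycle detected: potential deadlock");
}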


sagudev commented Jun 12, 2024

A similar deadlock happens in webgpu:api,validation,state,device_lost,destroy:queue,writeTexture,2d,uncompressed_format:* on servo/servo@3029549, this time between queue_write_texture and the device poller:

queue_write_texture thread:

thread backtrace
  thread #51, name = 'WGPU'
    frame #0: 0x00007895a3b2725d libc.so.6`syscall at syscall.S:38
    frame #1: 0x00005d660d9ed6d5 servo`parking_lot::raw_mutex::RawMutex::lock_slow at linux.rs:112:13
    frame #2: 0x00005d660d9ed6bb servo`parking_lot::raw_mutex::RawMutex::lock_slow [inlined] <parking_lot_core::thread_parker::imp::ThreadParker as parking_lot_core::thread_parker::ThreadParkerT>::park at linux.rs:66:13
    frame #3: 0x00005d660d9ed6b5 servo`parking_lot::raw_mutex::RawMutex::lock_slow at parking_lot.rs:635:36
    frame #4: 0x00005d660d9ed657 servo`parking_lot::raw_mutex::RawMutex::lock_slow at parking_lot.rs:207:5
    frame #5: 0x00005d660d9ed657 servo`parking_lot::raw_mutex::RawMutex::lock_slow at parking_lot.rs:600:5
    frame #6: 0x00005d660d9ed657 servo`parking_lot::raw_mutex::RawMutex::lock_slow(self=0x0000789581cbe5a8, timeout=Instant>{...}) at raw_mutex.rs:262:17
    frame #7: 0x00005d660cb56f16 servo`wgpu_core::device::queue::<impl wgpu_core::global::Global>::queue_write_texture [inlined] <parking_lot::raw_mutex::RawMutex as lock_api::mutex::RawMutex>::lock(self=0x0000789581cbe5a8) at raw_mutex.rs:72:13
    frame #8: 0x00005d660cb56efc servo`wgpu_core::device::queue::<impl wgpu_core::global::Global>::queue_write_texture at mutex.rs:223:9
    frame #9: 0x00005d660cb56efc servo`wgpu_core::device::queue::<impl wgpu_core::global::Global>::queue_write_texture [inlined] wgpu_core::lock::vanilla::Mutex<T>::lock at vanilla.rs:29:27
    frame #10: 0x00005d660cb56ef3 servo`wgpu_core::device::queue::<impl wgpu_core::global::Global>::queue_write_texture(self=<unavailable>, queue_id=<unavailable>, destination=<unavailable>, data=<unavailable>, data_layout=<unavailable>, size=<unavailable>) at queue.rs:926:48

device poll thread:

thread backtrace
  thread #52, name = 'WGPU poller'
    frame #0: 0x00007895a3b2725d libc.so.6`syscall at syscall.S:38
    frame #1: 0x00005d660d9eb7a9 servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers at linux.rs:112:13
    frame #2: 0x00005d660d9eb78c servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers [inlined] <parking_lot_core::thread_parker::imp::ThreadParker as parking_lot_core::thread_parker::ThreadParkerT>::park at linux.rs:66:13
    frame #3: 0x00005d660d9eb76a servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers at parking_lot.rs:635:36
    frame #4: 0x00005d660d9eb731 servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers at parking_lot.rs:207:5
    frame #5: 0x00005d660d9eb731 servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers at parking_lot.rs:600:5
    frame #6: 0x00005d660d9eb731 servo`parking_lot::raw_rwlock::RawRwLock::wait_for_readers(self=0x0000789581cbe5a0, timeout=Instant>{...}, prev_value=0) at raw_rwlock.rs:1017:17
    frame #7: 0x00005d660d9e91f1 servo`parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow(self=0x0000789581cbe5a0, timeout=Instant>{...}) at raw_rwlock.rs:647:9
    frame #8: 0x00005d660cc15377 servo`wgpu_core::resource::Texture<A>::destroy [inlined] <parking_lot::raw_rwlock::RawRwLock as lock_api::rwlock::RawRwLock>::lock_exclusive(self=0x0000789581cbe5a0) at raw_rwlock.rs:73:26
    frame #9: 0x00005d660cc15369 servo`wgpu_core::resource::Texture<A>::destroy at rwlock.rs:500:9
    frame #10: 0x00005d660cc15369 servo`wgpu_core::resource::Texture<A>::destroy at vanilla.rs:85:33
    frame #11: 0x00005d660cc15369 servo`wgpu_core::resource::Texture<A>::destroy [inlined] wgpu_core::snatch::SnatchLock::write(self=0x0000789581cbe5a0) at snatch.rs:154:40
    frame #12: 0x00005d660cc15369 servo`wgpu_core::resource::Texture<A>::destroy(self=<unavailable>) at resource.rs:878:32
    frame #13: 0x00005d660cbd08ee servo`wgpu_core::device::resource::Device<A>::maintain at resource.rs:3649:21
    frame #14: 0x00005d660cbd066f servo`wgpu_core::device::resource::Device<A>::maintain(self=0x0000789581cbc010, fence_guard=wgpu_core::lock::vanilla::RwLockReadGuard<core::option::Option<wgpu_hal::vulkan::Fence>> @ 0x00007894fc5f9a80, maintain=<unavailable>, snatch_guard=<unavailable>) at resource.rs:476:13
    frame #15: 0x00005d660cc8ae72 servo`wgpu_core::device::global::<impl wgpu_core::global::Global>::poll_all_devices at global.rs:2148:39
    frame #16: 0x00005d660cc8ae52 servo`wgpu_core::device::global::<impl wgpu_core::global::Global>::poll_all_devices at global.rs:2188:21
    frame #17: 0x00005d660cc8acc4 servo`wgpu_core::device::global::<impl wgpu_core::global::Global>::poll_all_devices(self=0x0000789581c63010, force_wait=<unavailable>) at global.rs:2213:17
    frame #18: 0x00005d660c9bf813 servo`webgpu::poll_thread::poll_all_devices(global=<unavailable>, more_work=0x00007894fc5fa04f, force_wait=<unavailable>, lock=()) at poll_thread.rs:57:11

The queue_write_texture thread tries to acquire:

let mut trackers = device.trackers.lock();

which is held by the poller thread:

let trackers = self.trackers.lock();

while the poller thread tries to acquire:

let snatch_guard = device.snatchable_lock.write();

which the queue_write_texture thread acquired at:

let snatch_guard = device.snatchable_lock.read();
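
For context, the textbook remedy for this class of bug is a single global acquisition order followed on every path; with one order, no cycle can form between the two locks. A sketch only (hypothetical stand-in types, and not necessarily how wgpu ultimately resolved this):

use std::sync::{Mutex, RwLock};

// Hypothetical stand-ins; the single global order is the point, not the types.
struct Device {
    snatchable_lock: RwLock<()>,
    trackers: Mutex<()>,
}

impl Device {
    // Every path that needs both locks takes them in the same order:
    // snatchable_lock first, then trackers.
    fn poll(&self) {
        let _snatch = self.snatchable_lock.read().unwrap();
        let _trackers = self.trackers.lock().unwrap();
        // ... triage suspected resources ...
    }

    fn submit(&self) {
        let _snatch = self.snatchable_lock.write().unwrap();
        let _trackers = self.trackers.lock().unwrap();
        // ... destroy snatched resources, update trackers ...
    }
}

fn main() {
    let device = Device {
        snatchable_lock: RwLock::new(()),
        trackers: Mutex::new(()),
    };
    device.poll();
    device.submit();
}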
