Vulkan segfault while exiting #1439

Closed
ecton opened this issue May 14, 2021 · 8 comments · Fixed by #1792
Labels
type: bug Something isn't working

Comments

@ecton
Contributor

ecton commented May 14, 2021

Description

I have a segfault on exit that occurs while no user code of mine is interacting with wgpu (as far as I can find). It's affecting my library, kludgine.

Repro steps

Sadly, I can't reproduce any crashes or similar looking valgrind errors while using any of the examples.

  1. Check out the redux branch of kludgine. Technically the main branch exhibits this behavior too, but the redux branch has been simplified significantly compared to main.
  2. Run any of the examples, and occasionally when closing the window, you'll see:
    corrupted size vs. prev_size in fastbins # this message isn't consistently shown
    zsh: segmentation fault (core dumped)  cargo run --example simple
    
  3. Run valgrind -v ./target/debug/examples/simple

Expected vs observed behavior

I expect vulkan to shut down properly without crashing.

Extra materials

Here's a snippet from the valgrind report that I find interesting:

==344258== 2 errors in context 48 of 51:
==344258== Invalid read of size 4
==344258==    at 0x8323799: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x831BEEB: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x831BF40: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x831B49D: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x831B5D2: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x8675DCA: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x8678826: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x867946D: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x866D732: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x866D7E0: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x712661A: ??? (in /usr/lib/libGLX_nvidia.so.460.73.01)
==344258==    by 0x13ADFBF: ash::vk::extensions::KhrSwapchainFn::destroy_swapchain_khr (extensions.rs:620)
==344258==  Address 0xa0c6ca0 is 800 bytes inside a block of size 4,992 free'd
==344258==    at 0x483F9AB: free (vg_replace_malloc.c:538)
==344258==    by 0x8333D86: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x831C7B9: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x70DE647: ??? (in /usr/lib/libGLX_nvidia.so.460.73.01)
==344258==    by 0x70DEE4E: ??? (in /usr/lib/libGLX_nvidia.so.460.73.01)
==344258==    by 0x7168008: ??? (in /usr/lib/libGLX_nvidia.so.460.73.01)
==344258==    by 0x4A46696: __run_exit_handlers (in /usr/lib/libc-2.33.so)
==344258==    by 0x4A4683D: exit (in /usr/lib/libc-2.33.so)
==344258==    by 0x17095F6: std::sys::unix::os::exit (os.rs:634)
==344258==    by 0x1702F8E: std::process::exit (process.rs:1753)
==344258==    by 0x32A7AA: winit::platform_impl::platform::x11::EventLoop<T>::run (mod.rs:399)
==344258==    by 0x35B054: winit::platform_impl::platform::EventLoop<T>::run (mod.rs:652)
==344258==  Block was alloc'd at
==344258==    at 0x4840B65: calloc (vg_replace_malloc.c:760)
==344258==    by 0x8335AE3: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x70E2239: ??? (in /usr/lib/libGLX_nvidia.so.460.73.01)
==344258==    by 0x831C284: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x8676EA1: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x86773D1: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x8677737: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x86795A2: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x866E871: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x867902E: ??? (in /usr/lib/libnvidia-glcore.so.460.73.01)
==344258==    by 0x712666E: ??? (in /usr/lib/libGLX_nvidia.so.460.73.01)
==344258==    by 0x61564FC: ???

Many of the contexts it prints errors for stem from destroy_swapchain_khr. I can confirm that setting the winit ControlFlow to Exit is done after I exit the render loop in the thread that drives rendering. As far as I can tell, no code of mine is executing while the app is shutting down.

I've tried inserting a delay after closing the window and telling winit to exit, and it doesn't matter. It's not due to a race condition of in-flight rendering code from what I can tell.

I can try to narrow this down further, but I'm not sure how to dive in further at this point.

Platform

  • wgpu version: wgpu 0.8.1, but the issue has plagued me since wgpu 0.7, to the best of my memory. Sadly this project hasn't been front-and-center for me lately, so I don't have a good timeline established.
  • OS: Linux 5.9.16-1-MANJARO, Xfce using x11
  • GPU: Nvidia GeForce 2070, proprietary drivers 460.73.01
@ecton
Contributor Author

ecton commented May 14, 2021

I've tried inserting a delay after closing the window and telling winit to exit, and it doesn't matter. It's not due to a race condition of in-flight rendering code from what I can tell.

Sigh, of course as soon as I click submit and walk away from the computer I got a new idea. The area I placed that delay in still had "self" in scope, which meant that while all drawing calls were done being made, the swapchain, pipelines, device, queue, and surface hadn't been dropped yet.

So, I tried manually dropping self before I sent the notification to the other thread (using a channel under the hood), and this fixes the segfault. It seems like this crash only occurs when the main thread initiates the exit handlers and another thread is cleaning up the wgpu structures it had ownership over.

I hope this helps narrow down the issue, but the good news is that my project no longer segfaults with this workaround, so it's not impacting me anymore.
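
The workaround described above (drop everything the render thread owns, and only then notify the main thread) can be sketched roughly as follows. This is a minimal illustration, not kludgine's actual code: `GpuResources` is a hypothetical stand-in for the swapchain, pipelines, device, queue, and surface, and a shared log replaces the real Vulkan teardown so the ordering is visible.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the wgpu objects owned by the render thread.
struct GpuResources {
    log: Arc<Mutex<Vec<&'static str>>>,
}

impl Drop for GpuResources {
    fn drop(&mut self) {
        // In the real application this is where the Vulkan objects are destroyed.
        self.log.lock().unwrap().push("gpu resources dropped");
    }
}

/// Runs a render thread that drops its resources *before* signalling exit,
/// and returns the order in which the shutdown steps happened.
fn run_shutdown() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let (exit_tx, exit_rx) = mpsc::channel::<()>();

    let render_log = Arc::clone(&log);
    let render_thread = thread::spawn(move || {
        let resources = GpuResources { log: Arc::clone(&render_log) };
        // ... the render loop would run here ...

        // Explicitly drop everything first, so no teardown is still pending
        // on this thread once the main thread starts exiting.
        drop(resources);
        render_log.lock().unwrap().push("exit signal sent");
        exit_tx.send(()).unwrap();
    });

    // Main thread: wait for the signal. The real application would now let
    // winit exit the process; this sketch just records the step and joins.
    exit_rx.recv().unwrap();
    log.lock().unwrap().push("main thread exiting");
    render_thread.join().unwrap();

    let events = log.lock().unwrap().clone();
    events
}

fn main() {
    for event in run_shutdown() {
        println!("{event}");
    }
}
```

Because the signal is sent only after `drop(resources)` returns, the main thread cannot begin running exit handlers while the render thread is still tearing down its resources.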

@cwfitzgerald cwfitzgerald transferred this issue from gfx-rs/wgpu-rs Jun 3, 2021
@cwfitzgerald cwfitzgerald added the type: bug Something isn't working label Jun 3, 2021
@ecton
Contributor Author

ecton commented Aug 6, 2021

This is still affecting wgpu 0.9. I've also discovered that dropping a TextureView in similar circumstances can also trigger a similar crash. But, thankfully now, I think I've figured out everything I need to manually drop in my separate thread to prevent crashes.

My codebase has changed since I last reported this. These workarounds are now in the main branch. If anyone wants to debug this, the location to disable my workarounds is here. You can search the source base for this issue URL to find the current location. Comment out the drop calls, then run any of the examples. I used the simple example.

Upon running the app, close the window. It will sometimes break into the debugger inside of winit's signal handling code (but that doesn't cause an actual segfault when running the app). But, other times you'll see a SIGSEGV while dropping a Buffer:

___lldb_unnamed_symbol54245$$libnvidia-glcore.so.470.57.02 (@___lldb_unnamed_symbol54245$$libnvidia-glcore.so.470.57.02:48)
___lldb_unnamed_symbol54266$$libnvidia-glcore.so.470.57.02 (@___lldb_unnamed_symbol54266$$libnvidia-glcore.so.470.57.02:93)
___lldb_unnamed_symbol54267$$libnvidia-glcore.so.470.57.02 (@___lldb_unnamed_symbol54267$$libnvidia-glcore.so.470.57.02:13)
___lldb_unnamed_symbol53434$$libnvidia-glcore.so.470.57.02 (@___lldb_unnamed_symbol53434$$libnvidia-glcore.so.470.57.02:35)
___lldb_unnamed_symbol1216$$libGLX_nvidia.so.0 (@___lldb_unnamed_symbol1216$$libGLX_nvidia.so.0:16)
___lldb_unnamed_symbol368$$libvulkan.so.1 (@___lldb_unnamed_symbol368$$libvulkan.so.1:37)
vkDestroyDevice (@vkDestroyDevice:20)
destroy_device (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.32.1/src/vk/features.rs:4533)
destroy_device<ash::device::Device> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.32.1/src/device.rs:386)
drop (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-backend-vulkan-0.9.0/src/lib.rs:894)
drop_in_place<gfx_backend_vulkan::RawDevice> (@core::ptr::drop_in_place$LT$gfx_backend_vulkan..RawDevice$GT$::h7e05998bd61f7f1d:8)
drop_slow<gfx_backend_vulkan::RawDevice> (@alloc::sync::Arc$LT$T$GT$::drop_slow::h6d5c08407f4ba97d:10)
drop<gfx_backend_vulkan::RawDevice> (@_$LT$alloc..sync..Arc$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::h410a492a73774334:26)
drop_in_place<alloc::sync::Arc<gfx_backend_vulkan::RawDevice>> (@core::ptr::drop_in_place$LT$alloc..sync..Arc$LT$gfx_backend_vulkan..RawDevice$GT$$GT$::h75a25041c336ce1b:6)
drop_in_place<gfx_backend_vulkan::Queue> (@core::ptr::drop_in_place$LT$gfx_backend_vulkan..Queue$GT$::hd6a815af2be18212:12)
drop_in_place<[gfx_backend_vulkan::Queue]> (@core::ptr::drop_in_place$LT$$u5b$gfx_backend_vulkan..Queue$u5d$$GT$::hc9f0d573c186e866:29)
drop<gfx_backend_vulkan::Queue,alloc::alloc::Global> (@_$LT$alloc..vec..Vec$LT$T$C$A$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::h0e9527e065af4273:17)
drop_in_place<alloc::vec::Vec<gfx_backend_vulkan::Queue, alloc::alloc::Global>> (@core::ptr::drop_in_place$LT$alloc..vec..Vec$LT$gfx_backend_vulkan..Queue$GT$$GT$::h1128a154f0d6d701:8)
drop_in_place<gfx_hal::queue::family::QueueGroup<gfx_backend_vulkan::Backend>> (@core::ptr::drop_in_place$LT$gfx_hal..queue..family..QueueGroup$LT$gfx_backend_vulkan..Backend$GT$$GT$::hbb73dac116999426:7)
dispose<gfx_backend_vulkan::Backend> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.9.2/src/device/mod.rs:2760)
clear<gfx_backend_vulkan::Backend,wgpu_core::hub::IdentityManagerFactory> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.9.2/src/hub.rs:719)
drop<wgpu_core::hub::IdentityManagerFactory> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.9.2/src/hub.rs:793)
drop_in_place<wgpu_core::hub::Global<wgpu_core::hub::IdentityManagerFactory>> (@core::ptr::drop_in_place$LT$wgpu_core..hub..Global$LT$wgpu_core..hub..IdentityManagerFactory$GT$$GT$::h9e433763a0b2ce59:8)
drop_in_place<wgpu::backend::direct::Context> (@core::ptr::drop_in_place$LT$wgpu..backend..direct..Context$GT$::h36cbdd3624ab05ff:11)
drop_slow<wgpu::backend::direct::Context> (@alloc::sync::Arc$LT$T$GT$::drop_slow::h259a01f9f212cefe:10)
drop<wgpu::backend::direct::Context> (@_$LT$alloc..sync..Arc$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::hdf1d784bec8480dc:26)
drop_in_place<alloc::sync::Arc<wgpu::backend::direct::Context>> (@core::ptr::drop_in_place$LT$alloc..sync..Arc$LT$wgpu..backend..direct..Context$GT$$GT$::h7019f602c5999289:6)
drop_in_place<wgpu::Buffer> (@core::ptr::drop_in_place$LT$wgpu..Buffer$GT$::ha0e91d33273b7583:12)
drop_in_place<easygpu::buffers::uniform::UniformBuffer> (@core::ptr::drop_in_place$LT$easygpu..buffers..uniform..UniformBuffer$GT$::he75820c107c795ac:6)
drop_in_place<easygpu::pipeline::PipelineCore> (@core::ptr::drop_in_place$LT$easygpu..pipeline..PipelineCore$GT$::h86e16ffa1754ebc8:29)
drop_in_place<easygpu_lyon::pipeline::LyonPipeline<easygpu_lyon::pipeline::Srgb>> (@core::ptr::drop_in_place$LT$easygpu_lyon..pipeline..LyonPipeline$LT$easygpu_lyon..pipeline..Srgb$GT$$GT$::h5d2c2494645049ee:6)
drop_in_place<kludgine_core::frame_renderer::FrameRenderer<kludgine_core::sprite::Srgb>> (@core::ptr::drop_in_place$LT$kludgine_core..frame_renderer..FrameRenderer$LT$kludgine_core..sprite..Srgb$GT$$GT$::h9e23e840d676281e:68)
render_loop<kludgine_core::sprite::Srgb> (/home/ecton/repos/kludgine/core/src/frame_renderer.rs:208)
{{closure}}<kludgine_core::sprite::Srgb,closure-2> (/home/ecton/repos/kludgine/core/src/frame_renderer.rs:123)
__rust_begin_short_backtrace<closure-0,()> (@std::sys_common::backtrace::__rust_begin_short_backtrace::h3ceca0bcad7ad53a:10)
{{closure}}<closure-0,()> (@std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::hcc61cf179efa52a0:10)
call_once<(),closure-0> (@_$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h34ab37a0f1dc944e:10)
do_call<std::panic::AssertUnwindSafe<closure-0>,()> (@std::panicking::try::do_call::hae97f3c6df5b1d45:16)
__rust_try (@__rust_try:14)
try<(),std::panic::AssertUnwindSafe<closure-0>> (@std::panicking::try::ha39884f688a61a3e:25)
catch_unwind<std::panic::AssertUnwindSafe<closure-0>,()> (@std::panic::catch_unwind::h41623c6534ac9305:10)
{{closure}}<closure-0,()> (@std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::h1abb5f7c2c261633:90)
call_once<closure-0,()> (@core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hd40872b20fc13dcb:6)
_$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once (@std::sys::unix::thread::Thread::new::thread_start::h8c7c4450dba62914:16)
_$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once (@std::sys::unix::thread::Thread::new::thread_start::h8c7c4450dba62914:14)
thread_start (@std::sys::unix::thread::Thread::new::thread_start::h8c7c4450dba62914:12)
start_thread (@start_thread:51)
__clone (@__clone:26)

Or, other times, while dropping a BindGroup:

___lldb_unnamed_symbol54245$$libnvidia-glcore.so.470.57.02 (@___lldb_unnamed_symbol54245$$libnvidia-glcore.so.470.57.02:48)
___lldb_unnamed_symbol54266$$libnvidia-glcore.so.470.57.02 (@___lldb_unnamed_symbol54266$$libnvidia-glcore.so.470.57.02:93)
___lldb_unnamed_symbol54267$$libnvidia-glcore.so.470.57.02 (@___lldb_unnamed_symbol54267$$libnvidia-glcore.so.470.57.02:13)
___lldb_unnamed_symbol53332$$libnvidia-glcore.so.470.57.02 (@___lldb_unnamed_symbol53332$$libnvidia-glcore.so.470.57.02:10)
destroy_command_pool (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.32.1/src/vk/features.rs:5204)
destroy_command_pool<ash::device::Device> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.32.1/src/device.rs:545)
destroy_command_pool (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-backend-vulkan-0.9.0/src/device.rs:509)
destroy<gfx_backend_vulkan::Backend> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.9.2/src/command/allocator.rs:69)
destroy<gfx_backend_vulkan::Backend> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.9.2/src/command/allocator.rs:278)
dispose<gfx_backend_vulkan::Backend> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.9.2/src/device/mod.rs:2748)
clear<gfx_backend_vulkan::Backend,wgpu_core::hub::IdentityManagerFactory> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.9.2/src/hub.rs:719)
drop<wgpu_core::hub::IdentityManagerFactory> (/home/ecton/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.9.2/src/hub.rs:793)
drop_in_place<wgpu_core::hub::Global<wgpu_core::hub::IdentityManagerFactory>> (@core::ptr::drop_in_place$LT$wgpu_core..hub..Global$LT$wgpu_core..hub..IdentityManagerFactory$GT$$GT$::h9e433763a0b2ce59:8)
drop_in_place<wgpu::backend::direct::Context> (@core::ptr::drop_in_place$LT$wgpu..backend..direct..Context$GT$::h36cbdd3624ab05ff:11)
drop_slow<wgpu::backend::direct::Context> (@alloc::sync::Arc$LT$T$GT$::drop_slow::h259a01f9f212cefe:10)
drop<wgpu::backend::direct::Context> (@_$LT$alloc..sync..Arc$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::hdf1d784bec8480dc:26)
drop_in_place<alloc::sync::Arc<wgpu::backend::direct::Context>> (@core::ptr::drop_in_place$LT$alloc..sync..Arc$LT$wgpu..backend..direct..Context$GT$$GT$::h7019f602c5999289:6)
drop_in_place<wgpu::BindGroup> (@core::ptr::drop_in_place$LT$wgpu..BindGroup$GT$::h75816cb360134439:11)
drop_in_place<easygpu::binding::BindingGroup> (@core::ptr::drop_in_place$LT$easygpu..binding..BindingGroup$GT$::h1c0678db7777c485:6)
drop_in_place<(u64, easygpu::binding::BindingGroup)> (@core::ptr::drop_in_place$LT$$LP$u64$C$easygpu..binding..BindingGroup$RP$$GT$::h8a3c850db250cc1d:7)
core::ptr::mut_ptr::_$LT$impl$u20$$BP$mut$u20$T$GT$::drop_in_place (@hashbrown::raw::Bucket$LT$T$GT$::drop::hcaedeedffb70ef47:10)
drop<(u64, easygpu::binding::BindingGroup)> (@hashbrown::raw::Bucket$LT$T$GT$::drop::hcaedeedffb70ef47:9)
drop_elements<(u64, easygpu::binding::BindingGroup),alloc::alloc::Global> (@hashbrown::raw::RawTable$LT$T$C$A$GT$::drop_elements::hf75504682068b25f:53)
drop<(u64, easygpu::binding::BindingGroup),alloc::alloc::Global> (@_$LT$hashbrown..raw..RawTable$LT$T$C$A$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::hf689904ab0928757:15)
drop_in_place<hashbrown::raw::RawTable<(u64, easygpu::binding::BindingGroup), alloc::alloc::Global>> (@core::ptr::drop_in_place$LT$hashbrown..raw..RawTable$LT$$LP$u64$C$easygpu..binding..BindingGroup$RP$$GT$$GT$::h234d7db1471734ca:6)
drop_in_place<hashbrown::map::HashMap<u64, easygpu::binding::BindingGroup, std::collections::hash::map::RandomState, alloc::alloc::Global>> (@core::ptr::drop_in_place$LT$hashbrown..map..HashMap$LT$u64$C$easygpu..binding..BindingGroup$C$std..collections..hash..map..RandomState$GT$$GT$::h58d5ff98846ce46c:7)
drop_in_place<std::collections::hash::map::HashMap<u64, easygpu::binding::BindingGroup, std::collections::hash::map::RandomState>> (@core::ptr::drop_in_place$LT$std..collections..hash..map..HashMap$LT$u64$C$easygpu..binding..BindingGroup$GT$$GT$::h1dca429051fd2f4e:6)
drop_in_place<kludgine_core::frame_renderer::GpuState> (@core::ptr::drop_in_place$LT$kludgine_core..frame_renderer..GpuState$GT$::h42cca9e6e4ae5bb6:6)
drop_in_place<core::cell::UnsafeCell<kludgine_core::frame_renderer::GpuState>> (@core::ptr::drop_in_place$LT$core..cell..UnsafeCell$LT$kludgine_core..frame_renderer..GpuState$GT$$GT$::hb14b517e4d93ad45:6)
drop_in_place<std::sync::mutex::Mutex<kludgine_core::frame_renderer::GpuState>> (@core::ptr::drop_in_place$LT$std..sync..mutex..Mutex$LT$kludgine_core..frame_renderer..GpuState$GT$$GT$::h3ed0e0248fb54acc:12)
drop_in_place<kludgine_core::frame_renderer::FrameRenderer<kludgine_core::sprite::Srgb>> (@core::ptr::drop_in_place$LT$kludgine_core..frame_renderer..FrameRenderer$LT$kludgine_core..sprite..Srgb$GT$$GT$::h52e2267827d3e70d:81)
render_loop<kludgine_core::sprite::Srgb> (/home/ecton/repos/kludgine/core/src/frame_renderer.rs:208)
{{closure}}<kludgine_core::sprite::Srgb,closure-2> (/home/ecton/repos/kludgine/core/src/frame_renderer.rs:123)
__rust_begin_short_backtrace<closure-0,()> (@std::sys_common::backtrace::__rust_begin_short_backtrace::hfb5c3fa6467ca7e8:10)
{{closure}}<closure-0,()> (@std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::hf483f6c61b949694:10)
call_once<(),closure-0> (@_$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h63012b9cb894d02a:10)
do_call<std::panic::AssertUnwindSafe<closure-0>,()> (@std::panicking::try::do_call::h76c418a979f9ba14:16)
__rust_try (@__rust_try:14)
try<(),std::panic::AssertUnwindSafe<closure-0>> (@std::panicking::try::h9b1da831d8bb5554:25)
catch_unwind<std::panic::AssertUnwindSafe<closure-0>,()> (@std::panic::catch_unwind::h2443d5bb36ea947c:10)
{{closure}}<closure-0,()> (@std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::h83eb5bfe3554ba93:90)
call_once<closure-0,()> (@core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hb9262dbc1325f700:6)
_$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once (@std::sys::unix::thread::Thread::new::thread_start::h8c7c4450dba62914:16)
_$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once (@std::sys::unix::thread::Thread::new::thread_start::h8c7c4450dba62914:14)
thread_start (@std::sys::unix::thread::Thread::new::thread_start::h8c7c4450dba62914:12)
start_thread (@start_thread:51)
__clone (@__clone:26)

These crashes are occurring in the thread that those drop calls are written in, which is not the main winit thread. By moving the drops to happen before I tell the main thread to exit, it prevents the crashes from occurring.

@kvark
Member

kvark commented Aug 6, 2021

So in your case the resources are dropped after all the context/device are cleaned up?

@ecton
Contributor Author

ecton commented Aug 6, 2021

I don't believe so (Sorry, I have a lot of projects and this is some of my oldest code). In my setup, this thread does all the rendering. The main thread is winit event handling only. I don't believe any other thread has actual references to wgpu types. The other threads work on other types that eventually get rendered by this thread.

After digging through the code to re-familiarize myself with this, I believe what's happening is that my shutdown signal ends up destroying the window itself. Since winit controls when the window goes away, and that happens in the main thread, I believe the window is sometimes being destroyed before the wgpu resources are destroyed.

If that's true, is this actually a bug then? Can wgpu even protect me from myself for that situation?

@kvark
Member

kvark commented Aug 11, 2021

Great question! This is basically #1463 (cc @pythonesque )

@pythonesque
Contributor

It's "not a bug" in the sense that wgpu doesn't currently provide a safe interface to surface creation, so in theory all bets are off. In practice, I think it is one though, as it's very unlikely that an average wgpu user who needs windowing is going to be able to use it safely. Fortunately, I think the proposal @kvark linked (or a minor tweak of it) can solve the issue; basically, for safety, we just need to make sure the surface holds onto a reference to the window, preventing it from being destroyed until the wgpu context is destroyed (it's more complicated than that but that's the basic idea).
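
The basic idea of a surface that holds onto a reference to its window can be sketched as below. This is only an illustration under stated assumptions, not the linked proposal or wgpu's API: `Window` and `Surface` are hypothetical stand-ins, and a shared log replaces real window and surface destruction.

```rust
use std::sync::{Arc, Mutex};

type Log = Arc<Mutex<Vec<&'static str>>>;

/// Hypothetical stand-in for a native window.
struct Window {
    log: Log,
}

impl Drop for Window {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("window destroyed");
    }
}

/// Hypothetical surface that keeps its window alive through a shared reference.
struct Surface {
    window: Arc<Window>,
}

impl Drop for Surface {
    fn drop(&mut self) {
        // This body runs while `self.window` is still alive; the field (and
        // with it the last reference to the window) is released afterwards.
        self.window.log.lock().unwrap().push("surface destroyed");
    }
}

/// The caller releases its window handle first, yet the window is only
/// destroyed after the surface is gone. Returns the observed order.
fn surface_keeps_window_alive() -> Vec<&'static str> {
    let log = Log::default();
    let window = Arc::new(Window { log: Arc::clone(&log) });
    let surface = Surface { window: Arc::clone(&window) };

    // The application "closes" the window before cleaning up the surface.
    drop(window);
    log.lock().unwrap().push("caller released window");

    drop(surface);

    let events = log.lock().unwrap().clone();
    events
}

fn main() {
    for event in surface_keeps_window_alive() {
        println!("{event}");
    }
}
```

With this ownership, releasing the window early is harmless: it is destroyed only once the surface that depends on it has been dropped.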

@ecton
Contributor Author

ecton commented Aug 12, 2021

I read through the linked issue, and it sounds great to me.

While there's not a bug in wgpu's implementation, I think there's a bug in the documentation: the safety requirements aren't documented correctly. It currently only says that the raw window handle must be valid for creation. I'm a prime example of people that can't make the connection that the handle must also remain valid for the life of the surface.

Would a PR modifying the documentation's safety sentence to add that note be worth doing?
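
As a rough illustration of the kind of safety note being discussed, here is a sketch of an unsafe constructor whose documentation states both requirements. Every name in it (`RawHandle`, `Surface::from_raw`, `window_id`) is hypothetical and is not wgpu's actual API or wording.

```rust
/// Hypothetical stand-in for a raw window handle: a pointer to window state
/// that is owned elsewhere.
struct RawHandle(*const u32);

/// Hypothetical surface that reads through the raw handle while it is in use.
struct Surface {
    handle: RawHandle,
}

impl Surface {
    /// Creates a surface from a raw handle.
    ///
    /// # Safety
    ///
    /// - `handle` must refer to a valid window when this function is called.
    /// - `handle` must remain valid until the returned `Surface` is dropped.
    unsafe fn from_raw(handle: RawHandle) -> Surface {
        Surface { handle }
    }

    fn window_id(&self) -> u32 {
        // SAFETY: the caller of `from_raw` promised that the handle stays
        // valid for as long as this surface exists.
        unsafe { *self.handle.0 }
    }
}

/// Upholds both requirements: the window state outlives the surface.
fn create_and_use_surface() -> u32 {
    let window_state = Box::new(7_u32);
    let handle = RawHandle(&*window_state as *const u32);

    // SAFETY: `window_state` is valid now and is dropped only after `surface`.
    let surface = unsafe { Surface::from_raw(handle) };
    let id = surface.window_id();

    drop(surface); // the surface goes away first...
    drop(window_state); // ...and only then the window state
    id
}

fn main() {
    println!("window id: {}", create_and_use_surface());
}
```

The second bullet in the `# Safety` section is the part this comment says is missing: validity is required for the whole life of the surface, not just at creation.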

@kvark
Member

kvark commented Aug 12, 2021

Sure, we'll be happy to have the documentation corrected!

Patryk27 pushed a commit to Patryk27/wgpu that referenced this issue Nov 23, 2022
Replace uses of `call_unique` with uses of `call` and `call_or`, which becomes
public. It's not clear when `call_unique` is correct to use, and avoiding a few
numeric suffixes here and there isn't worth it.