
Buffers Leaked when Device Handle is Dropped Before Buffers #5529

Open
jonmmease opened this issue Apr 13, 2024 · 14 comments
Labels
area: correctness We're behaving incorrectly help wanted Contributions encouraged type: bug Something isn't working

Comments

@jonmmease

Description
After repeated headless rendering with the Metal backend, wgpu 0.19 crashes with a "Context leak detected, msgtracer returned -1" error, whereas version 0.18.0 works without a problem.

My workflow is to perform headless rendering to PNG images, following the approach on the "Wgpu without a window" learn-wgpu page.

Repro steps
I've created a repro repository at https://github.com/jonmmease/wgpu-memory-repro that reliably reproduces the crash on my M1 MacBook Pro. See the README for more details about the repro and full system details.

Expected vs observed behavior
With wgpu 0.18.0, everything works well. In 0.19 (I tried 0.19.0 through 0.19.3) I get this crash after ~100 iterations.

Context leak detected, msgtracer returned -1

thread 'main' panicked at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9:
assertion failed: !ptr.is_null()
stack backtrace:
   0: rust_begin_unwind
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:144:5
   3: foreign_types_shared::ForeignTypeRef::from_ptr
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
   4: <metal::commandqueue::CommandQueue as core::ops::deref::Deref>::deref
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/metal-0.27.0/src/commandqueue.rs:13:1
   5: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding::{{closure}}
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-0.19.3/src/metal/command.rs:179:17
   6: objc::rc::autorelease::autoreleasepool
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/objc-0.2.7/src/rc/autorelease.rs:29:5
   7: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-0.19.3/src/metal/command.rs:175:19
   8: wgpu_core::device::queue::PendingWrites<A>::activate
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/queue.rs:265:17
   9: wgpu_core::device::resource::Device<A>::new
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/resource.rs:230:9
  10: wgpu_core::instance::Adapter<A>::create_device_and_queue_from_hal
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:306:29
  11: wgpu_core::instance::Adapter<A>::create_device_and_queue
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:381:9
  12: wgpu_core::instance::<impl wgpu_core::global::Global<G>>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:1084:23
  13: <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/backend/wgpu_core.rs:587:44
  14: <T as wgpu::context::DynContext>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/context.rs:2019:22
  15: wgpu::Adapter::request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/lib.rs:2119:22
  16: wgpu_memory_repro::State::new::{{closure}}
             at ./src/main.rs:103:31
  17: wgpu_memory_repro::run::{{closure}}
             at ./src/main.rs:300:34
  18: pollster::block_on
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128:15
  19: wgpu_memory_repro::main
             at ./src/main.rs:309:9
  20: core::ops::function::FnOnce::call_once
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Extra materials
I went through the Metal section of https://github.com/gfx-rs/wgpu/wiki/Debugging-wgpu-Applications, creating the XML file and re-signing the executable with the codesign command. Then I ran the executable with METAL_DEVICE_WRAPPER_TYPE=1 ./target/debug/wgpu-memory-repro. The console printed wgpu-memory-repro[4512:4643030] Metal API Validation Enabled, but no additional information was displayed. Let me know if that's not the correct process, or if there's anything more I can do to gather more info.

Thanks a lot!

Platform

  • Metal backend
  • Chip: Apple M1 Pro
  • macOS: Sonoma 14.4.1
  • Memory: 32 GB
@jonmmease
Author

After reading through #3056, I tried setting OBJC_DEBUG_MISSING_POOLS=YES and I get pages and pages of messages like this:

 % OBJC_DEBUG_MISSING_POOLS=YES METAL_DEVICE_WRAPPER_TYPE=1 ./target/debug/wgpu-memory-repro

2024-04-13 10:41:33.085 wgpu-memory-repro[58773:4772502] Metal API Validation Enabled
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a28060 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624e70 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624e40 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b30000 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a281d0 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624900 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624de0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b24320 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b24370 of class MTLCommandQueueDescriptorInternal autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a34020 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001634030 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001634300 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b340a0 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24b00 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24ac0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a80 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a40 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a00 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d249c0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24940 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24600 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d246c0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24680 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24740 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x6000016260d0 of class __NSArrayM autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
...
thread 'main' panicked at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9:
assertion failed: !ptr.is_null()

But it looks like similar messages are printed with wgpu 0.18.0 (when the program doesn't crash), so I'm not sure whether this is significant.

@jonmmease
Author

According to git-bisect, and my repro above, this bug was introduced in #3626.

Friendly ping @gents83 and @nical in case anything here rings a bell based on your work/review on that PR

@jonmmease
Author

I was able to simplify the repro a lot more. It turns out that the PNG export workflow isn't a relevant factor, and the crash can be reproduced by repeatedly allocating a vertex buffer. For example:

use wgpu::util::DeviceExt;

pub async fn run() {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::METAL,
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    // // wgpu 0.18
    // let device_descriptor = wgpu::DeviceDescriptor {
    //     label: None,
    //     features: wgpu::Features::empty(),
    //     limits: wgpu::Limits::default(),
    // };

    // wgpu 0.19.3
    let device_descriptor = wgpu::DeviceDescriptor {
        label: None,
        required_features: wgpu::Features::empty(),
        required_limits: wgpu::Limits::default(),
    };

    let (device, _) = adapter
        .request_device(&device_descriptor, None)
        .await
        .unwrap();

    // Allocate a vertex buffer that should be dropped immediately.
    // Without this allocation, there is no leak/crash.
    let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(&[0; 1024]),
        usage: wgpu::BufferUsages::VERTEX,
    });
}

fn main() {
    for i in 0..1000 {
        if i % 10 == 0 {
            println!("{i}");
        }
        pollster::block_on(run());
    }
}
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
Context leak detected, msgtracer returned -1
150
160
170
180
190
Context leak detected, msgtracer returned -1
200
210
220
230
240
Context leak detected, msgtracer returned -1
250
260
270
280
290
Context leak detected, msgtracer returned -1
300
310
320
330
340
Context leak detected, msgtracer returned -1
350
360
370
380
390
Context leak detected, msgtracer returned -1
400
410
420
430
440
Context leak detected, msgtracer returned -1
450
460
470
480
490
Context leak detected, msgtracer returned -1
500
510
520
530
540
Context leak detected, msgtracer returned -1
550
560
570
580
590
Context leak detected, msgtracer returned -1
600
610
620
630
640
Context leak detected, msgtracer returned -1
650
660
670
680
690
Context leak detected, msgtracer returned -1
700
710
720
730
740
Context leak detected, msgtracer returned -1
750
760
770
780
790
thread 'main' panicked at 'assertion failed: !ptr.is_null()', /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   3: foreign_types_shared::ForeignTypeRef::from_ptr
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
   4: <metal::commandqueue::CommandQueue as core::ops::deref::Deref>::deref
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/metal-0.27.0/src/commandqueue.rs:13:1
   5: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding::{{closure}}
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-hal/src/metal/command.rs:179:17
   6: objc::rc::autorelease::autoreleasepool
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/objc-0.2.7/src/rc/autorelease.rs:29:5
   7: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-hal/src/metal/command.rs:175:19
   8: wgpu_core::device::queue::PendingWrites<A>::activate
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/device/queue.rs:280:17
   9: wgpu_core::device::resource::Device<A>::new
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/device/resource.rs:226:9
  10: wgpu_core::instance::Adapter<A>::create_device_and_queue_from_hal
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:312:29
  11: wgpu_core::instance::Adapter<A>::create_device_and_queue
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:386:9
  12: wgpu_core::instance::<impl wgpu_core::global::Global<G>>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:1200:23
  13: <wgpu::backend::direct::Context as wgpu::context::Context>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/backend/direct.rs:635:44
  14: <T as wgpu::context::DynContext>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/context.rs:2124:22
  15: wgpu::Adapter::request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/lib.rs:2263:22
  16: wgpu_memory_repro2::run::{{closure}}
             at ./src/main.rs:32:23
  17: pollster::block_on
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128:15
  18: wgpu_memory_repro2::main
             at ./src/main.rs:49:9
  19: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5

@grovesNL
Collaborator

Does the leak happen if you wrap the entire thing in an autorelease pool? I guess you might be noticing this because it's running headless and nothing provides a global autorelease pool (vs. a windowed application that would have an autorelease pool automatically added by the windowing library).

@jonmmease
Author

Thanks for the suggestion @grovesNL

I'm getting the same result when I update the above like this:

objc::rc::autoreleasepool(|| {
    pollster::block_on(run());
})

or if I wrap the create_buffer_init call like this:

    objc::rc::autoreleasepool(|| {
        let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
            label: Some("Vertex Buffer"),
            contents: bytemuck::cast_slice(&[0; 1024]),
            usage: wgpu::BufferUsages::VERTEX,
        });
    })

Let me know if that's not what you had in mind. Thanks again!

@jonmmease
Author

Debugging a bit: I see that when the buffer is created, this create_buffer function is called.

unsafe fn create_buffer(&self, desc: &crate::BufferDescriptor) -> DeviceResult<super::Buffer> {

But I haven't been able to figure out whether there is Metal-specific cleanup logic that is supposed to run when a buffer is dropped, or whether the autoreleasepool in the create_buffer function above is supposed to take care of that.

@grovesNL
Copy link
Collaborator

Yeah, that seems right. We basically need an autorelease pool to be active whenever types like Metal buffers are created, cloned (retained), dropped (released), etc., so any place that does this without an active autorelease pool could leak. I think the one you wrapped around the entire program should be enough, but maybe there's something else going on.

I haven't worked with the Metal backend for a while, but maybe something else (internally to wgpu) holds a strong reference to the buffer you're creating? The Xcode debugger might be able to show you if anything's referencing it.

@jonmmease
Author

Thanks @grovesNL, that's helpful.

I started playing with Xcode Instruments to analyze leaks. It's a little overwhelming. For the commit before #3626, no leaks are detected (though memory usage does climb a little over time).

[Screenshot 2024-04-18 at 11:30:44 AM: Xcode Instruments trace, no leaks detected]

But for #3626 and beyond, there are dozens of leaks reported and memory usage increases dramatically over time until the program crashes.

[Screenshot 2024-04-18 at 11:29:55 AM: Xcode Instruments trace, dozens of leaks reported]

When I have more time I'll try to look at some of these stack traces more carefully and see if I can make more sense of it.

@jonmmease
Author

I found something potentially suspicious. In my repro, each iteration results in a device being created via this new associated function:

pub(crate) fn new(

But the corresponding Drop implementation is never run:

impl<A: HalApi> Drop for Device<A> {
    fn drop(&mut self) {
        resource_log!("Destroy raw Device {:?}", self.info.label());
        let raw = self.raw.take().unwrap();
        let pending_writes = self.pending_writes.lock().take().unwrap();
        pending_writes.dispose(&raw);
        self.command_allocator.dispose(&raw);
        unsafe {
            raw.destroy_buffer(self.zero_buffer.take().unwrap());
            raw.destroy_fence(self.fence.write().take().unwrap());
            let queue = self.queue_to_drop.take().unwrap();
            raw.exit(queue);
        }
    }
}

I think the leak of buffers may be associated with the fact that the device's buffers aren't getting cleaned up when the top-level handle to the device is dropped. This would suggest that this might not be specifically a metal issue, and perhaps I'm hitting it more than most because I end up creating and dropping devices repeatedly.

@cwfitzgerald
Member

cwfitzgerald commented Apr 18, 2024

The ownership is actually the other way around, buffers have owning refs on the device - the buffer leak is probably causing the device to stay alive.

See #5120 for the ideal ownership diagram.
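That inverted ownership can be sketched with plain Arc types (illustrative names only, not wgpu's real structs): dropping the user's device handle doesn't free the device while any buffer still holds a strong reference to it.

```rust
use std::sync::Arc;

// Minimal sketch: a Buffer holds an owning Arc to its Device, so
// dropping the user's Device handle does not free the Device while
// any Buffer is alive.
struct Device;

struct Buffer {
    device: Arc<Device>,
}

fn strong_count_after_handle_drop() -> usize {
    let device = Arc::new(Device);
    let buffer = Buffer {
        device: Arc::clone(&device),
    };
    drop(device); // drops the handle only; the buffer keeps the Device alive
    Arc::strong_count(&buffer.device) // still 1: the Device is not freed yet
}

fn main() {
    println!("{}", strong_count_after_handle_drop());
}
```

So a leaked buffer transitively keeps the device (and everything the device owns) alive.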

@jonmmease
Author

Thanks @cwfitzgerald, that's helpful. I think I'm homing in on a circular reference.

In the repro above, the call to device.create_buffer_init unmaps the buffer after writing to it. This buffer.unmap eventually calls Buffer::unmap_inner:

fn unmap_inner(self: &Arc<Self>) -> Result<Option<BufferMapPendingClosure>, BufferAccessError> {

This performs a match on the current value of map_state, and in my case this takes the resource::BufferMapState::Init branch.

resource::BufferMapState::Init {
    ptr,
    stage_buffer,
    needs_flush,
} => {

The second-to-last line of this branch calls pending_writes.consume_temp, passing in the stage_buffer.

pending_writes.consume_temp(queue::TempResource::Buffer(stage_buffer));

The stage buffer is an instance of this Buffer struct, which has a strong reference to the Device.

#[derive(Debug)]
pub struct Buffer<A: HalApi> {
    pub(crate) raw: Snatchable<A::Buffer>,
    pub(crate) device: Arc<Device<A>>,

This pending_writes.consume_temp call adds the Buffer to the temp_resources Vec of PendingWrites.

pub(crate) struct PendingWrites<A: HalApi> {
    pub command_encoder: A::CommandEncoder,
    /// True if `command_encoder` is in the "recording" state, as
    /// described in the docs for the [`wgpu_hal::CommandEncoder`]
    /// trait.
    ///
    /// [`wgpu_hal::CommandEncoder`]: hal::CommandEncoder
    pub is_recording: bool,
    pub temp_resources: Vec<TempResource<A>>,

This Vec is never cleared because in this repro I'm not submitting anything to the queue, and the dispose function is never called...

pub fn dispose(mut self, device: &A::Device) {
    unsafe {
        if self.is_recording {
            self.command_encoder.discard_encoding();
        }
        self.command_encoder
            .reset_all(self.executing_command_buffers.into_iter());
        device.destroy_command_encoder(self.command_encoder);
    }
    self.temp_resources.clear();
}

because the Device is never dropped, since a strong reference to it is held inside pending_writes:

impl<A: HalApi> Drop for Device<A> {
    fn drop(&mut self) {
        resource_log!("Destroy raw Device {:?}", self.info.label());
        let raw = self.raw.take().unwrap();
        let pending_writes = self.pending_writes.lock().take().unwrap();
        pending_writes.dispose(&raw);
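This chain (Device → PendingWrites → Buffer → Arc<Device>) amounts to a textbook Arc reference cycle. A minimal sketch with plain std types (illustrative names, not wgpu's real structs) shows why the Device's strong count can never reach zero:

```rust
use std::sync::{Arc, Mutex, Weak};

// The Device owns a pending-writes list, and a staging Buffer parked
// in that list holds an Arc back to the Device, closing the cycle.
struct Device {
    pending_writes: Mutex<Vec<Buffer>>,
}

struct Buffer {
    #[allow(dead_code)]
    device: Arc<Device>,
}

// Returns a Weak probe to the device after the user handle is dropped.
fn leak_cycle() -> Weak<Device> {
    let device = Arc::new(Device {
        pending_writes: Mutex::new(Vec::new()),
    });
    let staging = Buffer {
        device: Arc::clone(&device),
    };
    // Parking the staging buffer inside the device's own list closes the cycle.
    device.pending_writes.lock().unwrap().push(staging);
    let probe = Arc::downgrade(&device);
    drop(device); // strong count never reaches 0: the Device leaks
    probe
}

fn main() {
    let probe = leak_cycle();
    // The device is still alive even though every user handle is gone.
    println!("leaked: {}", probe.upgrade().is_some());
}
```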

If I add

let commands: Vec<CommandBuffer> = Vec::new();
queue.submit(commands.into_iter());

to the end of the repro then the device does get dropped successfully, and there is no leak/crash.

Unfortunately, in the original repro (and my real project) I am submitting work to the queue, so this isn't quite what's going on there.

@jonmmease
Author

Ok, I've made a bit more progress. It looks like the order in which wgpu objects are dropped determines whether there is a leak or not.

Here's another example where I've flattened everything into a single render function (rather than storing the device and other resources in the State struct).

use std::iter;
use wgpu::util::DeviceExt;
use wgpu::{
    CommandBuffer, CommandEncoderDescriptor, Extent3d, ImageCopyBuffer, ImageCopyTexture,
    ImageDataLayout, Origin3d, Texture, TextureAspect, TextureDescriptor, TextureDimension,
    TextureFormat, TextureUsages, TextureView,
};

#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

impl Vertex {
    fn desc() -> wgpu::VertexBufferLayout<'static> {
        wgpu::VertexBufferLayout {
            array_stride: std::mem::size_of::<Vertex>() as wgpu::BufferAddress,
            step_mode: wgpu::VertexStepMode::Vertex,
            attributes: &[
                wgpu::VertexAttribute {
                    offset: 0,
                    shader_location: 0,
                    format: wgpu::VertexFormat::Float32x3,
                },
                wgpu::VertexAttribute {
                    offset: std::mem::size_of::<[f32; 3]>() as wgpu::BufferAddress,
                    shader_location: 1,
                    format: wgpu::VertexFormat::Float32x3,
                },
            ],
        }
    }
}

const VERTICES: &[Vertex] = &[
    Vertex {
        position: [-0.0868241, 0.49240386, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // A
    Vertex {
        position: [-0.49513406, 0.06958647, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // B
    Vertex {
        position: [-0.21918549, -0.44939706, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // C
    Vertex {
        position: [0.35966998, -0.3473291, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // D
    Vertex {
        position: [0.44147372, 0.2347359, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // E
];

const INDICES: &[u16] = &[0, 1, 4, 1, 2, 4, 2, 3, 4, /* padding */ 0];

async fn render() -> Result<(), wgpu::SurfaceError> {
    // The instance is a handle to our GPU
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::METAL,
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    let device_descriptor = wgpu::DeviceDescriptor {
        label: None,
        required_features: wgpu::Features::empty(),
        required_limits: wgpu::Limits::default(),
    };

    let (device, queue) = adapter
        .request_device(&device_descriptor, None)
        .await
        .unwrap();

    let texture_format = TextureFormat::Rgba8Unorm;
    let size = 256u32;
    let texture_desc = TextureDescriptor {
        size: Extent3d {
            width: size,
            height: size,
            depth_or_array_layers: 1,
        },
        mip_level_count: 1,
        sample_count: 1,
        dimension: TextureDimension::D2,
        format: texture_format,
        usage: TextureUsages::COPY_SRC | TextureUsages::RENDER_ATTACHMENT,
        label: None,
        view_formats: &[texture_format],
    };

    let texture = device.create_texture(&texture_desc);
    let texture_view = texture.create_view(&Default::default());

    let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: Some("Shader"),
        source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
    });

    let render_pipeline_layout = device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
        label: Some("Render Pipeline Layout"),
        bind_group_layouts: &[],
        push_constant_ranges: &[],
    });

    let render_pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
        label: Some("Render Pipeline"),
        layout: Some(&render_pipeline_layout),
        vertex: wgpu::VertexState {
            module: &shader,
            entry_point: "vs_main",
            buffers: &[Vertex::desc()],
        },
        fragment: Some(wgpu::FragmentState {
            module: &shader,
            entry_point: "fs_main",
            targets: &[Some(wgpu::ColorTargetState {
                format: texture_format,
                blend: Some(wgpu::BlendState {
                    color: wgpu::BlendComponent::REPLACE,
                    alpha: wgpu::BlendComponent::REPLACE,
                }),
                write_mask: wgpu::ColorWrites::ALL,
            })],
        }),
        primitive: wgpu::PrimitiveState {
            topology: wgpu::PrimitiveTopology::TriangleList,
            strip_index_format: None,
            front_face: wgpu::FrontFace::Ccw,
            cull_mode: Some(wgpu::Face::Back),
            polygon_mode: wgpu::PolygonMode::Fill,
            unclipped_depth: false,
            conservative: false,
        },
        depth_stencil: None,
        multisample: wgpu::MultisampleState {
            count: 1,
            mask: !0,
            alpha_to_coverage_enabled: false,
        },
        multiview: None,
    });

    let vertex_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(VERTICES),
        usage: wgpu::BufferUsages::VERTEX,
    });
    let index_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Index Buffer"),
        contents: bytemuck::cast_slice(INDICES),
        usage: wgpu::BufferUsages::INDEX,
    });
    let num_indices = INDICES.len() as u32;

    let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("Render Encoder"),
    });

    {
        let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("Render Pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: &texture_view,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color {
                        r: 0.1,
                        g: 0.2,
                        b: 0.3,
                        a: 1.0,
                    }),
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            occlusion_query_set: None,
            timestamp_writes: None,
        });

        render_pass.set_pipeline(&render_pipeline);
        render_pass.set_vertex_buffer(0, vertex_buffer.slice(..));
        render_pass.set_index_buffer(index_buffer.slice(..), wgpu::IndexFormat::Uint16);
        render_pass.draw_indexed(0..num_indices, 0, 0..1);
    }
    queue.submit(iter::once(encoder.finish()));
    
    // // Dropping device first triggers a memory leak
    // drop(device);

    Ok(())
}

pub async fn run() {
    render().await.unwrap();
}

fn main() {
    for i in 0..1000 {
        if i % 10 == 0 {
            println!("{i}");
        }
        pollster::block_on(run());
    }
}

This works without a leak/crash. But if you uncomment the drop(device); line at the bottom of the function (which causes the device to be dropped before the texture and buffers), then the leak and crash behavior is present. My original repro (and my project) triggered this because the device property was the first struct property of the State struct, so it was dropped before the Texture and TextureView. When I reorder the struct properties in the repro like this, there is no leak:

struct State {
    size: u32,
    vertex_buffer: wgpu::Buffer,
    index_buffer: wgpu::Buffer,
    num_indices: u32,
    texture_size: Extent3d,
    render_pipeline: wgpu::RenderPipeline,
    texture_view: TextureView,
    texture: Texture,
    queue: wgpu::Queue,
    device: wgpu::Device,
}

Figuring out that this is how things currently work is enough to unblock me, but I am wondering if this should be considered a bug or just something that should be documented (and apologies if it is and I didn't come across it).
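For what it's worth, the field-order behavior above is guaranteed by Rust: struct fields are dropped in declaration order. A small self-contained sketch (illustrative types, not wgpu's) demonstrates why listing device first makes it drop before the resources that depend on it:

```rust
use std::sync::Mutex;

// Records the order in which Tracked values are dropped.
static DROP_LOG: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct Tracked(&'static str);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROP_LOG.lock().unwrap().push(self.0);
    }
}

struct BadOrder {
    device: Tracked,  // declared first => dropped first
    texture: Tracked,
}

fn drop_order() -> Vec<&'static str> {
    DROP_LOG.lock().unwrap().clear();
    let s = BadOrder {
        device: Tracked("device"),
        texture: Tracked("texture"),
    };
    drop(s);
    DROP_LOG.lock().unwrap().clone()
}

fn main() {
    println!("{:?}", drop_order()); // ["device", "texture"]
}
```

Reordering the fields so device comes last reverses this, matching the working State layout above.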

@cwfitzgerald
Member

Oh yes, that's definitely a bug. Well done tracking it down!

@cwfitzgerald cwfitzgerald changed the title 0.19 metal backend regression: Crashes with "Context leak detected, msgtracer returned -1" Buffers Leaked when Device Handle is Dropped Before Buffers Apr 19, 2024
@cwfitzgerald cwfitzgerald added type: bug Something isn't working help wanted Contributions encouraged area: correctness We're behaving incorrectly labels Apr 19, 2024
@gents83
Contributor

gents83 commented Apr 19, 2024

Sorry for not being very reactive or present lately :(
Thanks a lot for your investigation @jonmmease!
It seems that dropping the device doesn't force the queue submission to finish and release its resources before destruction, does it?
It could also be beneficial to add this case to the mem_leaks.rs test, to keep track of this issue while fixing it.
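One generic way such a regression test could assert release (just a sketch of the pattern, not wgpu's actual mem_leaks.rs harness) is to hold a Weak probe to a resource and check that it can no longer be upgraded once all user handles are dropped:

```rust
use std::sync::{Arc, Weak};

// Placeholder for the resource under test.
struct Device;

// Returns true only if no strong references to the device remain
// after the last user handle is dropped.
fn device_is_released() -> bool {
    let device = Arc::new(Device);
    let probe: Weak<Device> = Arc::downgrade(&device);
    drop(device);
    probe.upgrade().is_none()
}

fn main() {
    assert!(device_is_released(), "device leaked: strong refs remain");
    println!("no leak detected");
}
```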
