
Buffers Leaked when Device Handle is Dropped Before Buffers #5529

Open
jonmmease opened this issue Apr 13, 2024 · 14 comments
Labels
area: correctness We're behaving incorrectly help wanted Contributions encouraged type: bug Something isn't working

Comments

@jonmmease

Description
After repeated headless rendering with the Metal backend, wgpu 0.19 crashes with a "Context leak detected, msgtracer returned -1" error, whereas version 0.18.0 works without a problem.

My workflow is to perform headless rendering to PNG images, following the approach on the "Wgpu without a window" learn-wgpu page.

Repro steps
I've created a repro repository at https://github.com/jonmmease/wgpu-memory-repro that reliably reproduces the crash on my M1 MacBook Pro. See the README for more details about the repro and full system details.

Expected vs observed behavior
With wgpu 0.18.0, everything works well. In 0.19 (I tried 0.19.0 through 0.19.3) I get this crash after ~100 iterations.

Context leak detected, msgtracer returned -1

thread 'main' panicked at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9:
assertion failed: !ptr.is_null()
stack backtrace:
   0: rust_begin_unwind
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:144:5
   3: foreign_types_shared::ForeignTypeRef::from_ptr
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
   4: <metal::commandqueue::CommandQueue as core::ops::deref::Deref>::deref
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/metal-0.27.0/src/commandqueue.rs:13:1
   5: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding::{{closure}}
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-0.19.3/src/metal/command.rs:179:17
   6: objc::rc::autorelease::autoreleasepool
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/objc-0.2.7/src/rc/autorelease.rs:29:5
   7: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-0.19.3/src/metal/command.rs:175:19
   8: wgpu_core::device::queue::PendingWrites<A>::activate
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/queue.rs:265:17
   9: wgpu_core::device::resource::Device<A>::new
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/resource.rs:230:9
  10: wgpu_core::instance::Adapter<A>::create_device_and_queue_from_hal
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:306:29
  11: wgpu_core::instance::Adapter<A>::create_device_and_queue
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:381:9
  12: wgpu_core::instance::<impl wgpu_core::global::Global<G>>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:1084:23
  13: <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/backend/wgpu_core.rs:587:44
  14: <T as wgpu::context::DynContext>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/context.rs:2019:22
  15: wgpu::Adapter::request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/lib.rs:2119:22
  16: wgpu_memory_repro::State::new::{{closure}}
             at ./src/main.rs:103:31
  17: wgpu_memory_repro::run::{{closure}}
             at ./src/main.rs:300:34
  18: pollster::block_on
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128:15
  19: wgpu_memory_repro::main
             at ./src/main.rs:309:9
  20: core::ops::function::FnOnce::call_once
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Extra materials
I went through the Metal section of https://github.com/gfx-rs/wgpu/wiki/Debugging-wgpu-Applications, creating the XML file and re-signing the executable with the codesign command. Then I ran the executable with METAL_DEVICE_WRAPPER_TYPE=1 ./target/debug/wgpu-memory-repro. The console printed wgpu-memory-repro[4512:4643030] Metal API Validation Enabled, but no additional information was displayed. Let me know if that's not the correct process, or if there's anything more I can do to gather more info.

Thanks a lot!

Platform

  • Metal backend
  • Chip: Apple M1 Pro
  • macOS: Sonoma 14.4.1
  • Memory: 32 GB
@jonmmease
Author

After reading through #3056, I tried setting OBJC_DEBUG_MISSING_POOLS=YES and I get pages and pages of messages like this:

 % OBJC_DEBUG_MISSING_POOLS=YES METAL_DEVICE_WRAPPER_TYPE=1 ./target/debug/wgpu-memory-repro

2024-04-13 10:41:33.085 wgpu-memory-repro[58773:4772502] Metal API Validation Enabled
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a28060 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624e70 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624e40 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b30000 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a281d0 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624900 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624de0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b24320 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b24370 of class MTLCommandQueueDescriptorInternal autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a34020 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001634030 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001634300 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b340a0 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24b00 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24ac0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a80 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a40 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a00 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d249c0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24940 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24600 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d246c0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24680 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24740 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x6000016260d0 of class __NSArrayM autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
...
thread 'main' panicked at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9:
assertion failed: !ptr.is_null()

But it looks like similar messages are printed with wgpu 0.18.0 (when the program doesn't crash), so I'm not sure whether this is significant.

@jonmmease
Author

According to git-bisect, and my repro above, this bug was introduced in #3626.

Friendly ping @gents83 and @nical in case anything here rings a bell based on your work/review on that PR

@jonmmease
Author

I was able to simplify the repro a lot more. It turns out that the PNG export workflow isn't a relevant factor, and the crash can be reproduced by repeatedly allocating a vertex buffer. For example:

use wgpu::util::DeviceExt;

pub async fn run() {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::METAL,
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    // // wgpu 0.18
    // let device_descriptor = wgpu::DeviceDescriptor {
    //     label: None,
    //     features: wgpu::Features::empty(),
    //     limits: wgpu::Limits::default(),
    // };

    // wgpu 0.19.3
    let device_descriptor = wgpu::DeviceDescriptor {
        label: None,
        required_features: wgpu::Features::empty(),
        required_limits: wgpu::Limits::default(),
    };

    let (device, _) = adapter
        .request_device(&device_descriptor, None)
        .await
        .unwrap();

    // Allocate a vertex buffer that should be dropped immediately.
    // Without this allocation, there is no leak/crash.
    let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(&[0; 1024]),
        usage: wgpu::BufferUsages::VERTEX,
    });
}

fn main() {
    for i in 0..1000 {
        if i % 10 == 0 {
            println!("{i}");
        }
        pollster::block_on(run());
    }
}
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
Context leak detected, msgtracer returned -1
150
160
170
180
190
Context leak detected, msgtracer returned -1
200
210
220
230
240
Context leak detected, msgtracer returned -1
250
260
270
280
290
Context leak detected, msgtracer returned -1
300
310
320
330
340
Context leak detected, msgtracer returned -1
350
360
370
380
390
Context leak detected, msgtracer returned -1
400
410
420
430
440
Context leak detected, msgtracer returned -1
450
460
470
480
490
Context leak detected, msgtracer returned -1
500
510
520
530
540
Context leak detected, msgtracer returned -1
550
560
570
580
590
Context leak detected, msgtracer returned -1
600
610
620
630
640
Context leak detected, msgtracer returned -1
650
660
670
680
690
Context leak detected, msgtracer returned -1
700
710
720
730
740
Context leak detected, msgtracer returned -1
750
760
770
780
790
thread 'main' panicked at 'assertion failed: !ptr.is_null()', /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   3: foreign_types_shared::ForeignTypeRef::from_ptr
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
   4: <metal::commandqueue::CommandQueue as core::ops::deref::Deref>::deref
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/metal-0.27.0/src/commandqueue.rs:13:1
   5: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding::{{closure}}
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-hal/src/metal/command.rs:179:17
   6: objc::rc::autorelease::autoreleasepool
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/objc-0.2.7/src/rc/autorelease.rs:29:5
   7: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-hal/src/metal/command.rs:175:19
   8: wgpu_core::device::queue::PendingWrites<A>::activate
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/device/queue.rs:280:17
   9: wgpu_core::device::resource::Device<A>::new
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/device/resource.rs:226:9
  10: wgpu_core::instance::Adapter<A>::create_device_and_queue_from_hal
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:312:29
  11: wgpu_core::instance::Adapter<A>::create_device_and_queue
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:386:9
  12: wgpu_core::instance::<impl wgpu_core::global::Global<G>>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:1200:23
  13: <wgpu::backend::direct::Context as wgpu::context::Context>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/backend/direct.rs:635:44
  14: <T as wgpu::context::DynContext>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/context.rs:2124:22
  15: wgpu::Adapter::request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/lib.rs:2263:22
  16: wgpu_memory_repro2::run::{{closure}}
             at ./src/main.rs:32:23
  17: pollster::block_on
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128:15
  18: wgpu_memory_repro2::main
             at ./src/main.rs:49:9
  19: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5

@grovesNL
Collaborator

Does the leak happen if you wrap the entire thing in an autorelease pool? I guess you might be noticing this because it's running headless and nothing provides a global autorelease pool (vs. a windowed application that would have an autorelease pool automatically added by the windowing library).

@jonmmease
Author

Thanks for the suggestion @grovesNL

I'm getting the same result when I update the above like this:

objc::rc::autoreleasepool(|| {
    pollster::block_on(run());
})

or if I wrap the create_buffer_init call like this:

    objc::rc::autoreleasepool(|| {
        let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
            label: Some("Vertex Buffer"),
            contents: bytemuck::cast_slice(&[0; 1024]),
            usage: wgpu::BufferUsages::VERTEX,
        });
    })

Let me know if that's not what you had in mind. Thanks again!

@jonmmease
Author

Debugging a bit: I see that when the buffer is created, this create_buffer function is called.

unsafe fn create_buffer(&self, desc: &crate::BufferDescriptor) -> DeviceResult<super::Buffer> {

But I haven't been able to figure out whether there is Metal-specific cleanup logic that is supposed to run when a buffer is dropped, or whether the autoreleasepool in the create_buffer function above is supposed to take care of that.

@grovesNL
Copy link
Collaborator

Yeah, that seems right. We basically need an autorelease pool to be active whenever types like Metal buffers are created, cloned (retained), dropped (released), etc., so any place that does this without an active autorelease pool could leak. I think the one you wrapped around the entire program should be enough, but maybe there's something else going on.

I haven't worked with the Metal backend for a while, but maybe something else (internally to wgpu) holds a strong reference to the buffer you're creating? The Xcode debugger might be able to show you if anything's referencing it.

@jonmmease
Author

Thanks @grovesNL, that's helpful.

I started playing with Xcode Instruments to analyze leaks. It's a little overwhelming. For the commit before #3626, no leaks are detected (though memory usage does climb a little over time).

[Screenshot 2024-04-18 at 11:30:44 AM: Xcode Instruments trace, no leaks detected]

But for #3626 and beyond, there are dozens of leaks reported and memory usage increases dramatically over time until the program crashes.

[Screenshot 2024-04-18 at 11:29:55 AM: Xcode Instruments trace, dozens of leaks reported]

When I have more time I'll try to look at some of these stack traces more carefully and see if I can make more sense of it.

@jonmmease
Author

I found something potentially suspicious. In my repro, each iteration results in a device being created via this new associated function:

pub(crate) fn new(

But the corresponding Drop implementation is never run:

impl<A: HalApi> Drop for Device<A> {
    fn drop(&mut self) {
        resource_log!("Destroy raw Device {:?}", self.info.label());
        let raw = self.raw.take().unwrap();
        let pending_writes = self.pending_writes.lock().take().unwrap();
        pending_writes.dispose(&raw);
        self.command_allocator.dispose(&raw);
        unsafe {
            raw.destroy_buffer(self.zero_buffer.take().unwrap());
            raw.destroy_fence(self.fence.write().take().unwrap());
            let queue = self.queue_to_drop.take().unwrap();
            raw.exit(queue);
        }
    }
}

I think the leak of buffers may be associated with the fact that the device's buffers aren't getting cleaned up when the top-level handle to the device is dropped. This would suggest that this might not be specifically a metal issue, and perhaps I'm hitting it more than most because I end up creating and dropping devices repeatedly.

@cwfitzgerald
Member

cwfitzgerald commented Apr 18, 2024

The ownership is actually the other way around, buffers have owning refs on the device - the buffer leak is probably causing the device to stay alive.

See #5120 for the ideal ownership diagram.
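That inverted ownership can be sketched with plain Arc types (illustrative names only, not wgpu's real structs): dropping the user's device handle doesn't free the device while any buffer still holds a strong reference to it.

```rust
use std::sync::Arc;

// Minimal sketch: a Buffer holds an owning Arc to its Device, so
// dropping the user's Device handle does not free the Device while
// any Buffer is alive.
struct Device;

struct Buffer {
    device: Arc<Device>,
}

fn strong_count_after_handle_drop() -> usize {
    let device = Arc::new(Device);
    let buffer = Buffer {
        device: Arc::clone(&device),
    };
    drop(device); // drops the handle only; the buffer keeps the Device alive
    Arc::strong_count(&buffer.device) // still 1: the Device is not freed yet
}

fn main() {
    println!("{}", strong_count_after_handle_drop());
}
```

So a leaked buffer transitively keeps the device (and everything the device owns) alive.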

@jonmmease
Author

Thanks @cwfitzgerald, that's helpful. I think I'm homing in on a circular reference.

In the repro above, the call to device.create_buffer_init unmaps the buffer after writing to it. This buffer.unmap eventually calls Buffer::unmap_inner:

fn unmap_inner(self: &Arc<Self>) -> Result<Option<BufferMapPendingClosure>, BufferAccessError> {

This performs a match on the current value of map_state, and in my case this takes the resource::BufferMapState::Init branch.

resource::BufferMapState::Init {
    ptr,
    stage_buffer,
    needs_flush,
} => {

The second-to-last line of this branch calls pending_writes.consume_temp, passing in the stage_buffer.

pending_writes.consume_temp(queue::TempResource::Buffer(stage_buffer));

The stage buffer is an instance of this Buffer struct, which has a strong reference to the Device.

#[derive(Debug)]
pub struct Buffer<A: HalApi> {
    pub(crate) raw: Snatchable<A::Buffer>,
    pub(crate) device: Arc<Device<A>>,

This pending_writes.consume_temp call adds the Buffer to the temp_resources Vec of PendingWrites.

pub(crate) struct PendingWrites<A: HalApi> {
    pub command_encoder: A::CommandEncoder,
    /// True if `command_encoder` is in the "recording" state, as
    /// described in the docs for the [`wgpu_hal::CommandEncoder`]
    /// trait.
    ///
    /// [`wgpu_hal::CommandEncoder`]: hal::CommandEncoder
    pub is_recording: bool,
    pub temp_resources: Vec<TempResource<A>>,

This Vec is never cleared because in this repro I'm not submitting anything to the queue, and the dispose function is never called...

pub fn dispose(mut self, device: &A::Device) {
    unsafe {
        if self.is_recording {
            self.command_encoder.discard_encoding();
        }
        self.command_encoder
            .reset_all(self.executing_command_buffers.into_iter());
        device.destroy_command_encoder(self.command_encoder);
    }
    self.temp_resources.clear();
}

because the Device is never dropped, since a strong reference to it is held inside pending_writes:

impl<A: HalApi> Drop for Device<A> {
    fn drop(&mut self) {
        resource_log!("Destroy raw Device {:?}", self.info.label());
        let raw = self.raw.take().unwrap();
        let pending_writes = self.pending_writes.lock().take().unwrap();
        pending_writes.dispose(&raw);
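This chain (Device → PendingWrites → Buffer → Arc<Device>) amounts to a textbook Arc reference cycle. A minimal sketch with plain std types (illustrative names, not wgpu's real structs) shows why the Device's strong count can never reach zero:

```rust
use std::sync::{Arc, Mutex, Weak};

// The Device owns a pending-writes list, and a staging Buffer parked
// in that list holds an Arc back to the Device, closing the cycle.
struct Device {
    pending_writes: Mutex<Vec<Buffer>>,
}

struct Buffer {
    #[allow(dead_code)]
    device: Arc<Device>,
}

// Returns a Weak probe to the device after the user handle is dropped.
fn leak_cycle() -> Weak<Device> {
    let device = Arc::new(Device {
        pending_writes: Mutex::new(Vec::new()),
    });
    let staging = Buffer {
        device: Arc::clone(&device),
    };
    // Parking the staging buffer inside the device's own list closes the cycle.
    device.pending_writes.lock().unwrap().push(staging);
    let probe = Arc::downgrade(&device);
    drop(device); // strong count never reaches 0: the Device leaks
    probe
}

fn main() {
    let probe = leak_cycle();
    // The device is still alive even though every user handle is gone.
    println!("leaked: {}", probe.upgrade().is_some());
}
```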

If I add

let commands: Vec<CommandBuffer> = Vec::new();
queue.submit(commands.into_iter());

to the end of the repro then the device does get dropped successfully, and there is no leak/crash.

Unfortunately, in the original repro (and my real project) I am submitting work to the queue, so this isn't quite what's going on there.

@jonmmease
Author

Ok, I've made a bit more progress. It looks like the order in which wgpu objects are dropped determines whether there is a leak or not.

Here's another example where I've flattened everything into a single render function (rather than storing the device and other resources in the State struct).

use std::iter;
use wgpu::util::DeviceExt;
use wgpu::{
    CommandBuffer, CommandEncoderDescriptor, Extent3d, ImageCopyBuffer, ImageCopyTexture,
    ImageDataLayout, Origin3d, Texture, TextureAspect, TextureDescriptor, TextureDimension,
    TextureFormat, TextureUsages, TextureView,
};

#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

impl Vertex {
    fn desc() -> wgpu::VertexBufferLayout<'static> {
        wgpu::VertexBufferLayout {
            array_stride: std::mem::size_of::<Vertex>() as wgpu::BufferAddress,
            step_mode: wgpu::VertexStepMode::Vertex,
            attributes: &[
                wgpu::VertexAttribute {
                    offset: 0,
                    shader_location: 0,
                    format: wgpu::VertexFormat::Float32x3,
                },
                wgpu::VertexAttribute {
                    offset: std::mem::size_of::<[f32; 3]>() as wgpu::BufferAddress,
                    shader_location: 1,
                    format: wgpu::VertexFormat::Float32x3,
                },
            ],
        }
    }
}

const VERTICES: &[Vertex] = &[
    Vertex {
        position: [-0.0868241, 0.49240386, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // A
    Vertex {
        position: [-0.49513406, 0.06958647, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // B
    Vertex {
        position: [-0.21918549, -0.44939706, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // C
    Vertex {
        position: [0.35966998, -0.3473291, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // D
    Vertex {
        position: [0.44147372, 0.2347359, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // E
];

const INDICES: &[u16] = &[0, 1, 4, 1, 2, 4, 2, 3, 4, /* padding */ 0];

async fn render() -> Result<(), wgpu::SurfaceError> {
    // The instance is a handle to our GPU
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::METAL,
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    let device_descriptor = wgpu::DeviceDescriptor {
        label: None,
        required_features: wgpu::Features::empty(),
        required_limits: wgpu::Limits::default(),
    };

    let (device, queue) = adapter
        .request_device(&device_descriptor, None)
        .await
        .unwrap();

    let texture_format = TextureFormat::Rgba8Unorm;
    let size = 256u32;
    let texture_desc = TextureDescriptor {
        size: Extent3d {
            width: size,
            height: size,
            depth_or_array_layers: 1,
        },
        mip_level_count: 1,
        sample_count: 1,
        dimension: TextureDimension::D2,
        format: texture_format,
        usage: TextureUsages::COPY_SRC | TextureUsages::RENDER_ATTACHMENT,
        label: None,
        view_formats: &[texture_format],
    };

    let texture = device.create_texture(&texture_desc);
    let texture_view = texture.create_view(&Default::default());

    let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: Some("Shader"),
        source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
    });

    let render_pipeline_layout = device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
        label: Some("Render Pipeline Layout"),
        bind_group_layouts: &[],
        push_constant_ranges: &[],
    });

    let render_pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
        label: Some("Render Pipeline"),
        layout: Some(&render_pipeline_layout),
        vertex: wgpu::VertexState {
            module: &shader,
            entry_point: "vs_main",
            buffers: &[Vertex::desc()],
        },
        fragment: Some(wgpu::FragmentState {
            module: &shader,
            entry_point: "fs_main",
            targets: &[Some(wgpu::ColorTargetState {
                format: texture_format,
                blend: Some(wgpu::BlendState {
                    color: wgpu::BlendComponent::REPLACE,
                    alpha: wgpu::BlendComponent::REPLACE,
                }),
                write_mask: wgpu::ColorWrites::ALL,
            })],
        }),
        primitive: wgpu::PrimitiveState {
            topology: wgpu::PrimitiveTopology::TriangleList,
            strip_index_format: None,
            front_face: wgpu::FrontFace::Ccw,
            cull_mode: Some(wgpu::Face::Back),
            polygon_mode: wgpu::PolygonMode::Fill,
            unclipped_depth: false,
            conservative: false,
        },
        depth_stencil: None,
        multisample: wgpu::MultisampleState {
            count: 1,
            mask: !0,
            alpha_to_coverage_enabled: false,
        },
        multiview: None,
    });

    let vertex_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(VERTICES),
        usage: wgpu::BufferUsages::VERTEX,
    });
    let index_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Index Buffer"),
        contents: bytemuck::cast_slice(INDICES),
        usage: wgpu::BufferUsages::INDEX,
    });
    let num_indices = INDICES.len() as u32;

    let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("Render Encoder"),
    });

    {
        let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("Render Pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: &texture_view,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color {
                        r: 0.1,
                        g: 0.2,
                        b: 0.3,
                        a: 1.0,
                    }),
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            occlusion_query_set: None,
            timestamp_writes: None,
        });

        render_pass.set_pipeline(&render_pipeline);
        render_pass.set_vertex_buffer(0, vertex_buffer.slice(..));
        render_pass.set_index_buffer(index_buffer.slice(..), wgpu::IndexFormat::Uint16);
        render_pass.draw_indexed(0..num_indices, 0, 0..1);
    }
    queue.submit(iter::once(encoder.finish()));
    
    // // Dropping device first triggers a memory leak
    // drop(device);

    Ok(())
}

pub async fn run() {
    render().await.unwrap();
}

fn main() {
    for i in 0..1000 {
        if i % 10 == 0 {
            println!("{i}");
        }
        pollster::block_on(run());
    }
}

This works without a leak/crash. But if you uncomment the drop(device); line at the bottom of the function (which causes the device to be dropped before the texture and buffers), then the leak and crash behavior is present. My original repro (and my project) triggered this because the device property was the first struct property of the State struct, so it was dropped before the Texture and TextureView. When I reorder the struct properties in the repro like this, there is no leak:

struct State {
    size: u32,
    vertex_buffer: wgpu::Buffer,
    index_buffer: wgpu::Buffer,
    num_indices: u32,
    texture_size: Extent3d,
    render_pipeline: wgpu::RenderPipeline,
    texture_view: TextureView,
    texture: Texture,
    queue: wgpu::Queue,
    device: wgpu::Device,
}

Figuring out that this is how things currently work is enough to unblock me, but I am wondering if this should be considered a bug or just something that should be documented (and apologies if it is and I didn't come across it).
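For what it's worth, the field-order behavior above is guaranteed by Rust: struct fields are dropped in declaration order. A small self-contained sketch (illustrative types, not wgpu's) demonstrates why listing device first makes it drop before the resources that depend on it:

```rust
use std::sync::Mutex;

// Records the order in which Tracked values are dropped.
static DROP_LOG: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct Tracked(&'static str);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROP_LOG.lock().unwrap().push(self.0);
    }
}

struct BadOrder {
    device: Tracked,  // declared first => dropped first
    texture: Tracked,
}

fn drop_order() -> Vec<&'static str> {
    DROP_LOG.lock().unwrap().clear();
    let s = BadOrder {
        device: Tracked("device"),
        texture: Tracked("texture"),
    };
    drop(s);
    DROP_LOG.lock().unwrap().clone()
}

fn main() {
    println!("{:?}", drop_order()); // ["device", "texture"]
}
```

Reordering the fields so device comes last reverses this, matching the working State layout above.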

@cwfitzgerald
Member

Oh yes, that's definitely a bug. Well done tracking it down!

@cwfitzgerald cwfitzgerald changed the title 0.19 metal backend regression: Crashes with "Context leak detected, msgtracer returned -1" Buffers Leaked when Device Handle is Dropped Before Buffers Apr 19, 2024
@cwfitzgerald cwfitzgerald added type: bug Something isn't working help wanted Contributions encouraged area: correctness We're behaving incorrectly labels Apr 19, 2024
@gents83
Contributor

gents83 commented Apr 19, 2024

Sorry for not being very reactive or present lately :(
Thanks a lot for your investigation @jonmmease!
It seems that dropping the device doesn't force the queue submission to finish and release its resources before destruction, does it?
It could also be beneficial to add this case to the mem_leaks.rs test, to keep track of this issue while fixing it.
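One generic way such a regression test could assert release (just a sketch of the pattern, not wgpu's actual mem_leaks.rs harness) is to hold a Weak probe to a resource and check that it can no longer be upgraded once all user handles are dropped:

```rust
use std::sync::{Arc, Weak};

// Placeholder for the resource under test.
struct Device;

// Returns true only if no strong references to the device remain
// after the last user handle is dropped.
fn device_is_released() -> bool {
    let device = Arc::new(Device);
    let probe: Weak<Device> = Arc::downgrade(&device);
    drop(device);
    probe.upgrade().is_none()
}

fn main() {
    assert!(device_is_released(), "device leaked: strong refs remain");
    println!("no leak detected");
}
```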
