Skip to content

Fix Metal command buffer ownership in command lists#96

Closed
cdlewis wants to merge 3 commits into
renderbag:mainfrom
cdlewis:main
Closed

Fix Metal command buffer ownership in command lists#96
cdlewis wants to merge 3 commits into
renderbag:mainfrom
cdlewis:main

Conversation

@cdlewis
Copy link
Copy Markdown

@cdlewis cdlewis commented May 22, 2026

'Clean' closures of recompiled apps (e.g. Zelda64Recomp, MarioKart64 and most importantly Snowboard Kids 2 :P) are being reported as crashes.

If we dig into what is going on here:

NSZombieEnabled=YES NSDeallocateZombies=NO OBJC_PRINT_EXCEPTIONS=YES ./build/SnowboardKids2Recompiled.app/Contents/MacOS/SnowboardKids2Recompiled

We see:

-[AGXG14GFamilyCommandBuffer release]: message sent to deallocated instance

Looking through usage of mtl, we see that MetalCommandList saves a pointer to the command buffer without retaining a reference to it. My layman's reading of the Metal docs suggest that we do need to call retain since in this case we're saving a reference to a pointer we do not own.

'Clean' closures of recompiled apps (e.g. Zelda64Recomp, MarioKart64 and
most importantly Snowboard Kids 2 :P) are being reported as crashes.

If we dig into what is going on here:

```
NSZombieEnabled=YES NSDeallocateZombies=NO OBJC_PRINT_EXCEPTIONS=YES ./build/SnowboardKids2Recompiled.app/Contents/MacOS/SnowboardKids2Recompiled
```

We see:

```
-[AGXG14GFamilyCommandBuffer release]: message sent to deallocated instance
```

Looking through usage of mtl, we see that MetalCommandList saves a
pointer to the command buffer without retaining a reference to it. My
layman's reading of the Metal docs suggest that we do need to call
retain if the refence escapes the method call. And sure enough, retaining
this instance of mtl appears to fix the crashing issue.
Comment thread plume_metal.cpp
cdlewis added 2 commits May 21, 2026 23:58
Zombie diagnostics reported -[AGXG14GFamilyBlitContext release]: message sent to deallocated instance during shutdown after the command buffer ownership fix landed.

This is the same ownership pattern as the previous command buffer fix: blitCommandEncoder returns a non-owned/autoreleased object, but MetalCommandList stores it and later releases it. Retain the encoder when storing it so the later release is balanced.
Comment thread plume_metal.cpp
MTL::TextureDescriptor *descriptor = MTL::TextureDescriptor::textureBufferDescriptor(pixelFormat, width, options, usage);
this->texture = buffer->mtl->newTexture(descriptor, 0, bytesPerRow);

descriptor->release();
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the opposite problem, we did not own TextureDescriptor and the autorelease pool (?) should take care of it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add an autoreleasepool to this function then.

Comment thread plume_metal.cpp
if (activeBlitEncoder == nullptr) {
activeBlitEncoder = mtl->blitCommandEncoder(device->sharedBlitDescriptor);
activeBlitEncoder->setLabel(MTLSTR("Copy Blit Encoder"));
activeBlitEncoder->retain();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A brief review of all "commandEncoder" command tells me not every one of them are using retain() and some are using release() despite that. CC @squidbus should all command encoders need to manually increase and decrease the refcount when created and destroyed?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I was hesitant to make more sweeping changes than were necessary to stop crashes. My read is that pointers that can escape the function should be retained/released but I'm far from an expert at this stuff.

Copy link
Copy Markdown
Contributor

@squidbus squidbus May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, retain here is right but needs an autoreleasepool around this. These were also addressed in #95, there's a missing pair for checkActiveResolveTextureComputeEncoder as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should all command encoders need to manually increase and decrease the refcount when created and destroyed?

Only if it's kept outside the scope of the function, otherwise an autoreleasepool will free it. Although these missing retains did not have an autoreleasepool so they were living longer.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth just waiting for #95 to merge then?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be best to clean up the logic of auto-release pools in that PR for sure, and then we can look at retain()/release() logic to be done where it's necessary.

Comment thread plume_metal.cpp
assert(mtl == nullptr);
startedEncoding = false;
mtl = queue->mtl->commandBufferWithUnretainedReferences();
mtl->retain();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a retain here is right but also needs an autorelease pool around this function.

Comment thread plume_metal.cpp
if (activeBlitEncoder == nullptr) {
activeBlitEncoder = mtl->blitCommandEncoder(device->sharedBlitDescriptor);
activeBlitEncoder->setLabel(MTLSTR("Copy Blit Encoder"));
activeBlitEncoder->retain();
Copy link
Copy Markdown
Contributor

@squidbus squidbus May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, retain here is right but needs an autoreleasepool around this. These were also addressed in #95, there's a missing pair for checkActiveResolveTextureComputeEncoder as well.

@DarioSamo
Copy link
Copy Markdown
Contributor

#97 should include all the relevant changes from this one and it's been merged, so we can close this one.

@DarioSamo DarioSamo closed this May 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants