Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Deadlock between Device::temp_suspected and CommandBuffer::data #5647

Closed
Tracked by #5572
jimblandy opened this issue May 2, 2024 · 0 comments · Fixed by #5653
Closed
Tracked by #5572

[core] Deadlock between Device::temp_suspected and CommandBuffer::data #5647

jimblandy opened this issue May 2, 2024 · 0 comments · Fixed by #5653
Assignees

Comments

@jimblandy
Copy link
Member

Two threads can deadlock as follows:

  • The first thread calls Global::command_encoder_drop, which locks CommandBuffer::data while calling Device::untrack, which then locks Device::temp_suspected.
  • The second thread calls Queue::submit, which locks Device::temp_suspected and then CommandBuffer::data.

The lock around temp_suspected should never need to be held for a long time. This is just a simple allocation reuse.

jimblandy added a commit to jimblandy/wgpu that referenced this issue May 3, 2024
Remove unreachable code from `Global::queue_submit` that checks
whether the resources used by the command buffer have a reference
count of one, and adds them to `Device::temp_suspected` if so.

When `queue_submit` is called, all the `Arc`s processed by this code
have a reference count of at least three, even when the user has
dropped the resource:

- `Device::trackers` holds strong references to all the device's
  resources.

- `CommandBufferMutable::trackers` holds strong references to all
  resources used by the command buffer.

- The `used_resources` methods of the various members of
  `CommandBufferMutable::trackers` all return iterators of owned
  `Arc`s.

Fortunately, since the `Global::device_drop_foo` methods all add the
`foo` being dropped to `Device::suspected_resources`, and
`LifetimeTracker::triage_suspected` does an adequate job of accounting
for the uninteresting `Arc`s and leaves the resources there until
they're actually dead, things do get cleaned up without the checks in
`Global::queue_submit`.

Fixes gfx-rs#5647.
jimblandy added a commit to jimblandy/wgpu that referenced this issue May 3, 2024
Remove unreachable code from `Global::queue_submit` that checks
whether the resources used by the command buffer have a reference
count of one, and adds them to `Device::temp_suspected` if so.

When `queue_submit` is called, all the `Arc`s processed by this code
have a reference count of at least three, even when the user has
dropped the resource:

- `Device::trackers` holds strong references to all the device's
  resources.

- `CommandBufferMutable::trackers` holds strong references to all
  resources used by the command buffer.

- The `used_resources` methods of the various members of
  `CommandBufferMutable::trackers` all return iterators of owned
  `Arc`s.

Fortunately, since the `Global::device_drop_foo` methods all add the
`foo` being dropped to `Device::suspected_resources`, and
`LifetimeTracker::triage_suspected` does an adequate job of accounting
for the uninteresting `Arc`s and leaves the resources there until
they're actually dead, things do get cleaned up without the checks in
`Global::queue_submit`.

fixes gfx-rs#5647
jimblandy added a commit to jimblandy/wgpu that referenced this issue May 3, 2024
Remove unreachable code from `Global::queue_submit` that checks
whether the resources used by the command buffer have a reference
count of one, and adds them to `Device::temp_suspected` if so.

When `queue_submit` is called, all the `Arc`s processed by this code
have a reference count of at least three, even when the user has
dropped the resource:

- `Device::trackers` holds strong references to all the device's
  resources.

- `CommandBufferMutable::trackers` holds strong references to all
  resources used by the command buffer.

- The `used_resources` methods of the various members of
  `CommandBufferMutable::trackers` all return iterators of owned
  `Arc`s.

Fortunately, since the `Global::device_drop_foo` methods all add the
`foo` being dropped to `Device::suspected_resources`, and
`LifetimeTracker::triage_suspected` does an adequate job of accounting
for the uninteresting `Arc`s and leaves the resources there until
they're actually dead, things do get cleaned up without the checks in
`Global::queue_submit`.

This allows `Device::temp_suspected` to be private to
`device::resource`, with a sole remaining use in `Device::untrack`.

Fixes gfx-rs#5647.
ErichDonGubler pushed a commit that referenced this issue May 3, 2024
Remove unreachable code from `Global::queue_submit` that checks
whether the resources used by the command buffer have a reference
count of one, and adds them to `Device::temp_suspected` if so.

When `queue_submit` is called, all the `Arc`s processed by this code
have a reference count of at least three, even when the user has
dropped the resource:

- `Device::trackers` holds strong references to all the device's
  resources.

- `CommandBufferMutable::trackers` holds strong references to all
  resources used by the command buffer.

- The `used_resources` methods of the various members of
  `CommandBufferMutable::trackers` all return iterators of owned
  `Arc`s.

Fortunately, since the `Global::device_drop_foo` methods all add the
`foo` being dropped to `Device::suspected_resources`, and
`LifetimeTracker::triage_suspected` does an adequate job of accounting
for the uninteresting `Arc`s and leaves the resources there until
they're actually dead, things do get cleaned up without the checks in
`Global::queue_submit`.

This allows `Device::temp_suspected` to be private to
`device::resource`, with a sole remaining use in `Device::untrack`.

Fixes #5647.
@jimblandy jimblandy self-assigned this May 20, 2024
jimblandy added a commit to jimblandy/wgpu that referenced this issue May 23, 2024
Remove unreachable code from `Global::queue_submit` that checks
whether the resources used by the command buffer have a reference
count of one, and adds them to `Device::temp_suspected` if so.

When `queue_submit` is called, all the `Arc`s processed by this code
have a reference count of at least three, even when the user has
dropped the resource:

- `Device::trackers` holds strong references to all the device's
  resources.

- `CommandBufferMutable::trackers` holds strong references to all
  resources used by the command buffer.

- The `used_resources` methods of the various members of
  `CommandBufferMutable::trackers` all return iterators of owned
  `Arc`s.

Fortunately, since the `Global::device_drop_foo` methods all add the
`foo` being dropped to `Device::suspected_resources`, and
`LifetimeTracker::triage_suspected` does an adequate job of accounting
for the uninteresting `Arc`s and leaves the resources there until
they're actually dead, things do get cleaned up without the checks in
`Global::queue_submit`.

This allows `Device::temp_suspected` to be private to
`device::resource`, with a sole remaining use in `Device::untrack`.

Fixes gfx-rs#5647.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant