Skip to content
This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

render/drm_dumb_allocator: doesn't work on amdgpu #2916

Closed
emersion opened this issue May 5, 2021 · 2 comments · Fixed by #3131
Closed

render/drm_dumb_allocator: doesn't work on amdgpu #2916

emersion opened this issue May 5, 2021 · 2 comments · Fixed by #3131

Comments

@emersion
Copy link
Member

emersion commented May 5, 2021

The GBM imports are failing. See #2700 (comment).

Fixing this requires side-stepping GBM.

We probably have two choices here:

  • Pass GEM handles around directly. In theory this is simpler, but also more dangerous if there's no safe-guard. In particular, on multi-GPU systems, it's pretty easy to mix up GEM handles between devices.
  • Open a separate DRM FD in the DRM backend. Manually perform the GEM handle ref'counting.
emersion added a commit to emersion/wlroots that referenced this issue Aug 23, 2021
Using GBM to import DRM dumb buffers tends to not work well. By
using GBM we're calling some driver-specific functions in Mesa.
These functions check whether Mesa can work with the buffer.
Sometimes Mesa has requirements which differ from DRM dumb buffers
and the GBM import will fail (e.g. on amdgpu).

Instead, drop GBM and use drmPrimeFDToHandle directly. But there's
a twist: BO handles are not ref'counted by the kernel and need to
be ref'counted in user-space [1]. libdrm usually performs this
bookkeeping and is used under-the-hood by Mesa.

We can't re-use libdrm for this task without using driver-specific
APIs. So let's just re-implement the ref'counting logic in wlroots.
The wlroots implementation is inspired from amdgpu's in libdrm [2].

Closes: swaywm#2916

[1]: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/110
[2]: https://gitlab.freedesktop.org/mesa/drm/-/blob/1a4c0ec9aea13211997f982715fe5ffcf19dd067/amdgpu/handle_table.c
emersion added a commit to emersion/wlroots that referenced this issue Aug 24, 2021
Using GBM to import DRM dumb buffers tends to not work well. By
using GBM we're calling some driver-specific functions in Mesa.
These functions check whether Mesa can work with the buffer.
Sometimes Mesa has requirements which differ from DRM dumb buffers
and the GBM import will fail (e.g. on amdgpu).

Instead, drop GBM and use drmPrimeFDToHandle directly. But there's
a twist: BO handles are not ref'counted by the kernel and need to
be ref'counted in user-space [1]. libdrm usually performs this
bookkeeping and is used under-the-hood by Mesa.

We can't re-use libdrm for this task without using driver-specific
APIs. So let's just re-implement the ref'counting logic in wlroots.
The wlroots implementation is inspired from amdgpu's in libdrm [2].

Closes: swaywm#2916

[1]: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/110
[2]: https://gitlab.freedesktop.org/mesa/drm/-/blob/1a4c0ec9aea13211997f982715fe5ffcf19dd067/amdgpu/handle_table.c
bl4ckb0ne pushed a commit that referenced this issue Aug 25, 2021
Using GBM to import DRM dumb buffers tends to not work well. By
using GBM we're calling some driver-specific functions in Mesa.
These functions check whether Mesa can work with the buffer.
Sometimes Mesa has requirements which differ from DRM dumb buffers
and the GBM import will fail (e.g. on amdgpu).

Instead, drop GBM and use drmPrimeFDToHandle directly. But there's
a twist: BO handles are not ref'counted by the kernel and need to
be ref'counted in user-space [1]. libdrm usually performs this
bookkeeping and is used under-the-hood by Mesa.

We can't re-use libdrm for this task without using driver-specific
APIs. So let's just re-implement the ref'counting logic in wlroots.
The wlroots implementation is inspired from amdgpu's in libdrm [2].

Closes: #2916

[1]: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/110
[2]: https://gitlab.freedesktop.org/mesa/drm/-/blob/1a4c0ec9aea13211997f982715fe5ffcf19dd067/amdgpu/handle_table.c
fengguang pushed a commit to 0day-ci/linux that referenced this issue Sep 3, 2021
This can be used to create a separate DRM file description, thus
creating a new GEM handle namespace.

My use-case is wlroots. The library splits responsibilities between
separate components: the GBM allocator creates buffers, the GLES2
renderer uses EGL to import them and render to them, the DRM
backend imports the buffers and displays them. wlroots has a
modular architecture, and any of these components can be swapped
and replaced with something else. For instance, the pipeline can
be set up so that the DRM dumb buffer allocator is used instead of
GBM and the Pixman renderer is used instead of GLES2. Library users
can also replace any of these components with their own custom one.

DMA-BUFs are used to pass buffer references across components. We
could use GEM handles instead, but this would result in pain if
multiple GPUs are in use: wlroots copies buffers across GPUs as
needed. Importing a GEM handle created on one GPU into a completely
different GPU will blow up (fail at best, mix unrelated buffers
otherwise).

Everything is fine if all components use Mesa. However, this isn't
always desirable. For instance when running with DRM dumb buffers
and the Pixman software renderer it's unfortunate to depend on GBM
in the DRM backend just to turn DMA-BUFs into FB IDs. GBM loads
Mesa drivers to perform an action which has nothing driver-specific.
Additionally, drivers will fail the import if the 3D engine can't
use the imported buffer, for instance amdgpu will refuse to import
DRM dumb buffers [1]. We might also want to be running with a Vulkan
renderer and a Vulkan allocator in the future, and GBM wouldn't be
welcome in this setup.

To address this, GBM can be side-stepped in the DRM backend, and
can be replaced with drmPrimeFDToHandle calls. However because of
GEM handle reference counting issues, care must be taken to avoid
double-closing the same GEM handle. In particular, it's not
possible to share a DRM FD with GBM or EGL and perform some
drmPrimeFDToHandle calls manually.

So wlroots needs to re-open the DRM FD to create a new GEM handle
namespace. However there's no guarantee that the file-system
permissions will be set up so that the primary FD can be opened
by the compsoitor. On modern systems seatd or logind is a privileged
process responsible for doing this, and other processes aren't
expected to do it. For historical reasons systemd still allows
physically logged in users to open primary DRM nodes, but this
doesn't work on non-systemd setups and it's desirable to lock
them down at some point.

Some might suggest to open the render node instead of re-opening
the primary node. However some systems don't have a render node
at all (e.g. no GPU, or a split render/display SoC).

Solutions to this issue have been discussed in [2]. One solution
would be to open the magic /proc/self/fd/<fd> file, but it's a
Linux-specific hack (wlroots supports BSDs too). Another solution
is to add support for re-opening a DRM primary node to seatd/logind,
but they don't support it now and really haven't been designed for
this (logind would need to grow a completely new API, because it
assumes unique dev_t IDs). Also this seems like pushing down a
kernel limitation to user-space a bit too hard.

Another solution is to allow creating empty DRM leases. The lessee
FD would have its own GEM handle namespace, so wouldn't conflict
wth GBM/EGL. It would have the master bit set, but would be able
to manage zero resources. wlroots doesn't intend to share this FD
with any other process.

All in all IMHO that seems like a pretty reasonable solution to the
issue at hand.

Note, I've discussed with Jonas Ådahl and Mutter plans to adopt a
similar design in the future.

Example usage in wlroots is available at [3]. IGT test available
at [4].

[1]: swaywm/wlroots#2916
[2]: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/110
[3]: swaywm/wlroots#3158
[4]: https://patchwork.freedesktop.org/series/94323/

Signed-off-by: Simon Ser <contact@emersion.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Dave Airlie <airlied@redhat.com>
@sjnewbury
Copy link

sjnewbury commented Sep 4, 2021

I'm getting severe corruption on fullscreen windows, for example alacritty with "super-f". I've bisected it to 5dfaf5e.

Using current git master/main libdrm/mesa/wlroots/sway, amdgpu (POLARIS10).

@emersion
Copy link
Member Author

emersion commented Sep 4, 2021

Please open a new bug report with debug logs and dmesg.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

Successfully merging a pull request may close this issue.

2 participants