render/drm_dumb_allocator: doesn't work on amdgpu #2916

emersion · 2021-05-05T08:57:24Z

The GBM imports are failing. See #2700 (comment).

Fixing this requires side-stepping GBM.

We probably have two choices here:

Pass GEM handles around directly. In theory this is simpler, but also more dangerous if there's no safe-guard. In particular, on multi-GPU systems, it's pretty easy to mix up GEM handles between devices.
Open a separate DRM FD in the DRM backend. Manually perform the GEM handle ref'counting.

Using GBM to import DRM dumb buffers tends to not work well. By using GBM we're calling some driver-specific functions in Mesa. These functions check whether Mesa can work with the buffer. Sometimes Mesa has requirements which differ from DRM dumb buffers and the GBM import will fail (e.g. on amdgpu). Instead, drop GBM and use drmPrimeFDToHandle directly. But there's a twist: BO handles are not ref'counted by the kernel and need to be ref'counted in user-space [1]. libdrm usually performs this bookkeeping and is used under-the-hood by Mesa. We can't re-use libdrm for this task without using driver-specific APIs. So let's just re-implement the ref'counting logic in wlroots. The wlroots implementation is inspired from amdgpu's in libdrm [2]. Closes: swaywm#2916 [1]: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/110 [2]: https://gitlab.freedesktop.org/mesa/drm/-/blob/1a4c0ec9aea13211997f982715fe5ffcf19dd067/amdgpu/handle_table.c

Using GBM to import DRM dumb buffers tends to not work well. By using GBM we're calling some driver-specific functions in Mesa. These functions check whether Mesa can work with the buffer. Sometimes Mesa has requirements which differ from DRM dumb buffers and the GBM import will fail (e.g. on amdgpu). Instead, drop GBM and use drmPrimeFDToHandle directly. But there's a twist: BO handles are not ref'counted by the kernel and need to be ref'counted in user-space [1]. libdrm usually performs this bookkeeping and is used under-the-hood by Mesa. We can't re-use libdrm for this task without using driver-specific APIs. So let's just re-implement the ref'counting logic in wlroots. The wlroots implementation is inspired from amdgpu's in libdrm [2]. Closes: #2916 [1]: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/110 [2]: https://gitlab.freedesktop.org/mesa/drm/-/blob/1a4c0ec9aea13211997f982715fe5ffcf19dd067/amdgpu/handle_table.c

This can be used to create a separate DRM file description, thus creating a new GEM handle namespace. My use-case is wlroots. The library splits responsibilities between separate components: the GBM allocator creates buffers, the GLES2 renderer uses EGL to import them and render to them, the DRM backend imports the buffers and displays them. wlroots has a modular architecture, and any of these components can be swapped and replaced with something else. For instance, the pipeline can be set up so that the DRM dumb buffer allocator is used instead of GBM and the Pixman renderer is used instead of GLES2. Library users can also replace any of these components with their own custom one. DMA-BUFs are used to pass buffer references across components. We could use GEM handles instead, but this would result in pain if multiple GPUs are in use: wlroots copies buffers across GPUs as needed. Importing a GEM handle created on one GPU into a completely different GPU will blow up (fail at best, mix unrelated buffers otherwise). Everything is fine if all components use Mesa. However, this isn't always desirable. For instance when running with DRM dumb buffers and the Pixman software renderer it's unfortunate to depend on GBM in the DRM backend just to turn DMA-BUFs into FB IDs. GBM loads Mesa drivers to perform an action which has nothing driver-specific. Additionally, drivers will fail the import if the 3D engine can't use the imported buffer, for instance amdgpu will refuse to import DRM dumb buffers [1]. We might also want to be running with a Vulkan renderer and a Vulkan allocator in the future, and GBM wouldn't be welcome in this setup. To address this, GBM can be side-stepped in the DRM backend, and can be replaced with drmPrimeFDToHandle calls. However because of GEM handle reference counting issues, care must be taken to avoid double-closing the same GEM handle. In particular, it's not possible to share a DRM FD with GBM or EGL and perform some drmPrimeFDToHandle calls manually. So wlroots needs to re-open the DRM FD to create a new GEM handle namespace. However there's no guarantee that the file-system permissions will be set up so that the primary FD can be opened by the compsoitor. On modern systems seatd or logind is a privileged process responsible for doing this, and other processes aren't expected to do it. For historical reasons systemd still allows physically logged in users to open primary DRM nodes, but this doesn't work on non-systemd setups and it's desirable to lock them down at some point. Some might suggest to open the render node instead of re-opening the primary node. However some systems don't have a render node at all (e.g. no GPU, or a split render/display SoC). Solutions to this issue have been discussed in [2]. One solution would be to open the magic /proc/self/fd/<fd> file, but it's a Linux-specific hack (wlroots supports BSDs too). Another solution is to add support for re-opening a DRM primary node to seatd/logind, but they don't support it now and really haven't been designed for this (logind would need to grow a completely new API, because it assumes unique dev_t IDs). Also this seems like pushing down a kernel limitation to user-space a bit too hard. Another solution is to allow creating empty DRM leases. The lessee FD would have its own GEM handle namespace, so wouldn't conflict wth GBM/EGL. It would have the master bit set, but would be able to manage zero resources. wlroots doesn't intend to share this FD with any other process. All in all IMHO that seems like a pretty reasonable solution to the issue at hand. Note, I've discussed with Jonas Ådahl and Mutter plans to adopt a similar design in the future. Example usage in wlroots is available at [3]. IGT test available at [4]. [1]: swaywm/wlroots#2916 [2]: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/110 [3]: swaywm/wlroots#3158 [4]: https://patchwork.freedesktop.org/series/94323/ Signed-off-by: Simon Ser <contact@emersion.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Daniel Stone <daniels@collabora.com> Cc: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Cc: Michel Dänzer <michel@daenzer.net> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Keith Packard <keithp@keithp.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Dave Airlie <airlied@redhat.com>

sjnewbury · 2021-09-04T15:07:28Z

I'm getting severe corruption on fullscreen windows, for example alacritty with "super-f". I've bisected it to 5dfaf5e.

Using current git master/main libdrm/mesa/wlroots/sway, amdgpu (POLARIS10).

emersion · 2021-09-04T15:15:57Z

Please open a new bug report with debug logs and dmesg.

emersion added backend/drm allocator labels May 5, 2021

emersion mentioned this issue May 5, 2021

drm dumb allocator #2700

Merged

emersion mentioned this issue Aug 23, 2021

backend/drm: introduce wlr_drm_bo_handle_table #3131

Merged

bl4ckb0ne closed this as completed in #3131 Aug 25, 2021

sjnewbury mentioned this issue Sep 4, 2021

backend/drm: get rid of BO handle table #3164

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

render/drm_dumb_allocator: doesn't work on amdgpu #2916

render/drm_dumb_allocator: doesn't work on amdgpu #2916

emersion commented May 5, 2021

sjnewbury commented Sep 4, 2021 •

edited

emersion commented Sep 4, 2021

render/drm_dumb_allocator: doesn't work on amdgpu #2916

render/drm_dumb_allocator: doesn't work on amdgpu #2916

Comments

emersion commented May 5, 2021

sjnewbury commented Sep 4, 2021 • edited

emersion commented Sep 4, 2021

sjnewbury commented Sep 4, 2021 •

edited