Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a patch to fix regression with nouveau #309

Closed
wants to merge 1 commit into from

Conversation

doraskayo
Copy link

This restores hardware acceleration on systems where the nouveau driver is loaded and NVIDIA hardware is present, including switchable graphics setups.

An alternative to #308.

This restores hardware acceleration on systems where the nouveau
driver is loaded and NVIDIA hardware is present, including
switchable graphics setups.
@flathubbot
Copy link
Contributor

Started test build 57356

@flathubbot
Copy link
Contributor

Build 57356 successful
To test this build, install it from the testing repository:

flatpak install --user https://dl.flathub.org/build-repo/39979/org.chromium.Chromium.flatpakref

@doraskayo
Copy link
Author

I tested and it seems to work well.

@fooishbar
Copy link

It's worth noting that this introduces a dependency on GBM from Mesa >= 21.1.0, which is probably too new. I wonder if the issue isn't just that the if (ret) fallback path returns a new empty ScopedFD constructor, rather than ScopedFD(fd) like the other?

@doraskayo
Copy link
Author

doraskayo commented Aug 7, 2023

It's worth noting that this introduces a dependency on GBM from Mesa >= 21.1.0, which is probably too new. I wonder if the issue isn't just that the if (ret) fallback path returns a new empty ScopedFD constructor, rather than ScopedFD(fd) like the other?

It's not related to the fallback path. Here's a minimal reproducer: https://gitlab.gnome.org/-/snippets/5989

$ ./nouveau_gbm_bug
Started iteration 0
Finished iteration 0
Started iteration 1
nouveau_gbm_bug: drmPrimeHandleToFD: No such file or directory

The issue is due to the fact that nouveau only considers GEM handles "global" (i.e., may be shared in multiple objects) in specific scenarios, by calling the nouveau_bo_make_global() function: https://gitlab.freedesktop.org/mesa/drm/-/blob/c6013245ce9ce287bb86d327f9b6420a320a08e6/nouveau/nouveau.c#L714-723

The concept of "get_handle" in the GBM API seems to be designed to be a simple getter; in the gbm_bo_get_handle() case the backend is not even informed that the handle was shared. This means that even for the more involved gbm_bo_get_handle_for_plane() API, backends treat it as such.

You can see that nouveau doesn't call the nouveau_bo_set_prime() API which marks the handle as global in the WINSYS_HANDLE_TYPE_KMS case, but does in the WINSYS_HANDLE_TYPE_FD case. The former is relevant to the "get_handle" APIs and the latter to "get_fd" GBM APIs: https://gitlab.freedesktop.org/mesa/mesa/-/blob/8088d73fd1c6c8d52975e8f7551a030242e2b256/src/gallium/drivers/nouveau/nouveau_screen.c#L147-151

Edit:
Other drivers where this works add the handles to their global list/map as part of gbm_dri_bo_create, through the __DRI_IMAGE_ATTRIB_HANDLE image query here. It's still not clear if to me if this is unnecessary on their part or an issue with nouveau which doesn't.

@fooishbar
Copy link

To be fair, that sounds like a Nouveau bug; when it's queried for a KMS handle, it should make sure that it's properly exporting it.

@doraskayo
Copy link
Author

doraskayo commented Aug 7, 2023

To be fair, that sounds like a Nouveau bug; when it's queried for a KMS handle, it should make sure that it's properly exporting it.

I couldn't find documentation that states the expected behavior of WINSYS_HANDLE_TYPE_KMS in regards to exporting, and it seemed that nouveau has had this behavior for years. If the expectation is that drivers would consider a handle resource externally shared in the WINSYS_HANDLE_TYPE_KMS case, then it's indeed a nouveau bug.

@fooishbar
Copy link

Yeah, that's definitely the expectation. I can't think of any other driver which returns a GEM handle that cannot be exported. Using the FD entrypoints is definitely better, but it's not OK to return GEM handles that ... aren't entirely GEM handles.

@doraskayo
Copy link
Author

Yeah, that's definitely the expectation. I can't think of any other driver which returns a GEM handle that cannot be exported. Using the FD entrypoints is definitely better, but it's not OK to return GEM handles that ... aren't entirely GEM handles.

Thanks for providing clarity on this. I'll try to see if I can figure out the best way to achieve this in nouveau.

@doraskayo
Copy link
Author

doraskayo commented Aug 7, 2023

@fooishbar, I found additional context for the issue:

So it seems that the conclusion was that this was an application issue in the wlroots case, and it is now well-documented that this behavior is not allowed.

Considering the above, how do you suggest we proceed?

cc: @emersion

@fooishbar
Copy link

Calling drmPrimeHandleToFD is actually completely OK, as long as you a) never close the handle that gets returned to you, and b) never call drmPrimeFDToHandle but instead import via GBM.

@danvet
Copy link

danvet commented Aug 7, 2023

@fooishbar drivers generally only keep gem handle in their drmPrimeFDToHandle that could be theoretically shared, i.e. either previously imported or one that was originally allocated on that fd and then exported. which means if you call drmPrimeHandleToFD bypassing gbm, that handle won't be in the lookup cache and no amount of using gbm to import will fix that, stuff is broken at that point.

hence why if you have gbm managing a drmfd, you cannot do anything else with these handle that does not go through gbm. there really can be only one owner of the drmfd gem bo handle space or things will break.

so your rule needs to be that calling drmPrimeHandleToFD is totally ok, as long as you guarantee that that buffer never comes back to the same fd through any convoluted path. meaning you could import it in some kms display driver, or v4l device, but that's absolutely all you can safely do. closing the fd otoh is fine, since fd are fully refcounted

@danvet
Copy link

danvet commented Aug 7, 2023

oh also this is pretty much generic issue for all gbm/mesa drivers using drm underneath, shouldn't have anything to do with nouveau. but sounds like the reason nouveau is special is because it gets one of the alloc flags wrong and hence some hacks are needed (that WINSYS_HANDLE_TYPE_KMS mentioned above, I haven't dug into the code)

@fooishbar
Copy link

Mmm, I guess I missed the memo that gbm_bo_get_handle wasn't exporting. I'm having (a lot of) trouble imagining why it would be problematic to do so? It wouldn't cause performance problems because the BO is going to be at the tail end of your rendering anyway. And it feels bizzare to have two classes of GEM objects?

@doraskayo
Copy link
Author

@fooishbar, @danvet, so looking at this minimal reproducer, is the logic in get_fd_for_plane_shim inherently flawed and bound to fail with some drivers such as nouveau? https://gitlab.gnome.org/-/snippets/5989

I did see that the drivers where this works do export the handles (add them to an internal map) as part of gbm_dri_bo_create because it queries for __DRI_IMAGE_ATTRIB_HANDLE here. It's still not clear if to me if this is unnecessary on their part or an issue with nouveau which doesn't.

@fooishbar
Copy link

Yeah, personally I think it's nouveau at fault for exporting GEM handles to the user and then claiming that those handles ... can't be used.

@doraskayo
Copy link
Author

doraskayo commented Aug 11, 2023

Thanks @fooishbar.

I'd like to share some additional context. The only documentation I could find that may give a hint as to the intended nature of handles queried using __DRI_IMAGE_ATTRIB_HANDLE is in MESA_drm_image, specifically in regards to eglExportDRMImageMESA. The spec says:

To create a process local handle or a global DRM name for a
buffer, call

 EGLBoolean eglExportDRMImageMESA(EGLDisplay dpy,
                                  EGLImageKHR image,
                                  EGLint *name,
                                  EGLint *handle,
                                  EGLint *stride);

If is non-NULL, a global name is assigned to the image and
written to , the handle (local to the DRM file descriptor,
for use with DRM kernel modesetting API) is written to <handle>
if
non-NULL and the stride (in bytes) is written to <stride>, if
non-NULL.

Mesa implements this requirement with __DRI_IMAGE_ATTRIB_HANDLE here, exactly like it does in the GBM case when it exports handles.

So the __DRI_IMAGE_ATTRIB_HANDLE image query is used in the case of EGL to meet the requirement of the spec to create a handle which is "local to the DRM file descriptor, for use with DRM kernel modesetting API".

@danvet, is this what you meant in the following quote?

so your rule needs to be that calling drmPrimeHandleToFD is totally ok, as long as you guarantee that that buffer never comes back to the same fd through any convoluted path. meaning you could import it in some kms display driver, or v4l device, but that's absolutely all you can safely do.

If so, could one theorize that the handles that gbm_bo_get_handle and gbm_bo_get_handle_for_plane return are only meant to be used with the KMS API and not any other API? (such as drmPrimeHandleToFD)

@fooishbar, I think that handles exported through GBM do indeed work with KMS APIs even in the nouveau case, so they are not entirely useless. Perhaps this was the only intention of the GBM API? To effectively provide the same promise provided by eglExportDRMImageMESA in regards to handles, but not more?

@fooishbar
Copy link

GBM definitely started as a bridge between EGL and KMS, but has been a lot more than that for a long time.

I don't think the spec text is super relevant to what's going on here. It describes the handle as 'local to the DRM file descriptor', because that's what GEM handles are: scoped to the file descriptor (technically the file description, but close enough). Even then, it invalidates its own point by suggesting a 'global DRM name' i.e. flink, which would presumably fail in the same way as dmabuf export.

It's also not clear what nouveau could gain from preventing export. I understand that you want a clear delineation between local and exported buffers, sure. But the only time you export a GEM handle is when a buffer is GBM or winsys, i.e. at the very tail end of your rendering where it's going to be used by a context-external consumer anyway. Even then, winsys (in the case of Wayland and DRI3) pulls an FD rather than a handle, so the only thing nouveau gains from this is the ability to keep gbm_bos within local space. I'd be stunned if this was a meaningful optimisation in any usecase, let alone one which justified the pain of an arbitrary division within the API.

@danvet
Copy link

danvet commented Aug 11, 2023

@doraskayo with import in a kms device I meant using your own drmfd for the display hardware, which must be not the same file descriptor as the one gbm using. otherwise you're right back to the confusion over who manages the drm handle space. So the only ok sequence is 1. open a drm kms fd to get your own fresh and private drmfd 2. get a dma-buf fd from somewhere 3. drmPrimeFD2Handle on your own drmfd that you opened in step 1

If it's some other drmfd that you're sharing with gbm for step 3, then you're in bad surprises territory again.

Wrt the specifics of __DRI_IMAGE_ATTRIB_HANDLE I have no idea (would need to check the source in mesa), I think @fooishbar is the more compenten one for that.

@doraskayo
Copy link
Author

Closing this PR, this is now fixed in nouveau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24648

Thanks for the help and advice, @fooishbar, @danvet.

@doraskayo doraskayo closed this Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants