EXT_device_enumeration: unclear how hot unplug works #120

Open
emersion opened this issue Jan 21, 2021 · 8 comments

@emersion

emersion commented Jan 21, 2021

How should a driver handle device unplug when it supports EXT_device_enumeration?

  • Should the EGLDevice handles of the unplugged device remain valid?
  • When is it safe for a driver to invalidate an EGLDevice handle (and release associated resources)?
  • What happens when trying to use the unplugged EGLDevice?

This issue stems from https://gitlab.freedesktop.org/glvnd/libglvnd/-/issues/215. We need to be careful to pick behavior that is compatible with a vendor-neutral EGL loader such as glvnd.

For instance, it's not safe for a driver to invalidate an unplugged EGLDevice after the EGL client calls eglQueryDevicesEXT, because that would allow another driver to re-use the exact same handle and prevent the EGL client from figuring out the EGLDevice is gone or has changed. Or maybe glvnd should wrap vendor EGLDevices to prevent this?

cc @cubanismo @kbrenneman

@cubanismo
Contributor

@jjulianoatnv may be interested here as well, as similar questions have been raised regarding VkPhysicalDevice objects in Vulkan, and we should probably resolve both issues in a compatible way.

@emersion changed the title from "EXT_device_enumeration: unclear hot unplug works" to "EXT_device_enumeration: unclear how hot unplug works" on Jan 21, 2021
@kbrenneman
Contributor

Another way of looking at the problem is:

  • If an application is using a device, how does it know that the device is no longer valid?
  • How does an application know if a new device is available?
  • How should an application recover from a device becoming invalid?

You could invalidate an EGLDeviceEXT handle, as long as you never re-use that handle for a different device. If you allow re-using a handle, then applications would have to deal with a valid EGLDeviceEXT handle suddenly pointing to a different device, possibly between successive EGL calls.

For functions which take an EGLDeviceEXT handle, just returning an error code (probably EGL_BAD_DEVICE_EXT) could make sense. But, I don't know what should happen with an EGLDisplay that uses that device, especially if that EGLDisplay owns the current context.

@emersion
Author

emersion commented Feb 9, 2021

> If an application is using a device, how does it know that the device is no longer valid?

After enumerating the list of devices, if an old device isn't advertised anymore, the old device is no longer valid.
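
A minimal sketch of that check, assuming the application has already filled an array from a fresh eglQueryDevicesEXT call. The `device_handle` typedef is a stand-in for EGLDeviceEXT so the snippet compiles without EGL headers, and `device_still_listed` is an illustrative name, not part of any extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for EGLDeviceEXT, which is an opaque pointer type in the
 * EGL headers. Assumption: used here only so the sketch is self-contained. */
typedef void *device_handle;

/* Returns true if `dev` appears in devices[0..count), i.e. in the list
 * most recently obtained from device enumeration. */
bool device_still_listed(device_handle dev,
                         const device_handle *devices, size_t count)
{
    assert(devices != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (devices[i] == dev)
            return true;
    }
    return false;
}
```

This comparison is only meaningful if a driver never hands out the same handle for a different device, which is the concern raised at the top of this issue.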

> How does an application know if a new device is available?

Out-of-scope. On Linux, udev can be used to monitor new devices appearing/disappearing.

> How should an application recover from a device becoming invalid?
> But, I don't know what should happen with an EGLDisplay that uses that device, especially if that EGLDisplay owns the current context.

It doesn't seem like this is related to EGLDevice.

These are relevant I think:

@kbrenneman
Contributor

> > If an application is using a device, how does it know that the device is no longer valid?
>
> After enumerating the list of devices, if an old device isn't advertised anymore, the old device is no longer valid.

We'd still need to define what happens with EGL functions if a device was removed after the last call to eglQueryDevicesEXT.

We also don't want applications to be constantly polling eglQueryDevicesEXT, and it wouldn't be sufficient even if they did, since an unplug is still asynchronous. So, there needs to be some way for an application to know that it needs to call eglQueryDevicesEXT.

> > How does an application know if a new device is available?
>
> Out-of-scope. On Linux, udev can be used to monitor new devices appearing/disappearing.

That's fair. Adding a new device doesn't affect what an application is doing with an existing device, so some asynchronous notification should be fine, and that notification is necessarily OS-specific.

> > How should an application recover from a device becoming invalid?
> > But, I don't know what should happen with an EGLDisplay that uses that device, especially if that EGLDisplay owns the current context.
>
> It doesn't seem like this is related to EGLDevice.

It's part of the same problem -- an EGLDisplay is still a top-level EGL object associated with a device. If that device becomes unusable, then the EGLDisplay also does, just like an EGLDeviceEXT.

@emersion
Author

> We'd still need to define what happens with EGL functions if a device was removed after the last call to eglQueryDevicesEXT.

EGL_BAD_DEVICE_EXT for device functions, robustness for EGLDisplay.

> there needs to be some way for an application to know that it needs to call eglQueryDevicesEXT.

Yeah, but I don't think EGL should be involved. Clients will likely want to integrate hotplug detection into their event loop, and that's too system-specific for EGL to handle. Just ask users to use platform-specific APIs such as udev.
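
As a rough sketch of what that event-loop integration could look like: assume the application already holds a file descriptor that becomes readable when a hotplug event arrives (with libudev, udev_monitor_get_fd() provides one). The helper name `hotplug_event_pending` is illustrative. When it returns true, the application would consume the platform event and then call eglQueryDevicesEXT again.

```c
#include <poll.h>
#include <stdbool.h>

/* Returns true if `hotplug_fd` has data to read within `timeout_ms`
 * milliseconds (0 = do not block, negative = wait indefinitely).
 * Assumption: `hotplug_fd` comes from a platform-specific monitor;
 * EGL itself is not involved here. */
bool hotplug_event_pending(int hotplug_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = hotplug_fd, .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);
    return ret > 0 && (pfd.revents & POLLIN) != 0;
}
```

In a real event loop the descriptor would more likely be added to the loop's existing poll set than polled on its own.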

If you really think EGL should be involved, I'd suggest working on a separate extension, and not block this issue because of it.

> It's part of the same problem -- an EGLDisplay is still a top-level EGL object associated with a device. If that device becomes unusable, then the EGLDisplay also does, just like an EGLDeviceEXT.

Yes, but it's not specific to the device platform. It may happen when the EGLDisplay was created for another platform, like X11/GBM/Wayland/surfaceless. The driver will pick a physical device under the hood, which may disappear without the client noticing.

@kbrenneman
Contributor

For adding a new device, we don't need EGL to notify the application. An application can carry on using whatever device it was using and still work just fine. If an application cares about new devices, then it can implement whatever OS-specific hotplug detection it needs, and then call eglQueryDevicesEXT again as necessary. I'll need to update libglvnd to deal with added devices, of course, but that'll be easy enough.

To deal with removing devices, I think we are going to want a new extension. If nothing else, an extension would be a good way to define the behavior that an application should expect. To provide a clean way for applications to cope with device removal, an extension could also define a new error code and/or a new query to distinguish between "that device handle is invalid" and "that device handle is valid, but the device disappeared when you weren't looking."

Anyway, this is what I was thinking for removal behavior. It's a rough sketch right now, but I can write this up as a more formal spec if it sounds reasonable.

For EGLDeviceEXT handles:

  • After a device is removed, all EGL functions which take that EGLDeviceEXT handle will fail. With an extension, we can define a new EGL_DEVICE_LOST error code to use for this.
  • An application can check if a device is removed by calling eglQueryDevicesEXT. With an extension, we can also define a new query attribute for eglQueryDeviceAttribEXT, especially since eglQueryDevicesEXT can be pretty expensive.
  • A driver can re-use the same EGLDeviceEXT handle if the same device is later reconnected. A driver may not re-use the same handle for a different device.
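
To illustrate the distinction a new error code would give applications, here is a small sketch. EGL_SUCCESS and EGL_BAD_DEVICE_EXT use the values from the Khronos headers; `SKETCH_EGL_DEVICE_LOST` is a hypothetical placeholder, since the proposed EGL_DEVICE_LOST code has no assigned value, and `classify_device_error` is an illustrative name.

```c
/* Values as defined in the Khronos EGL headers. */
#define EGL_SUCCESS        0x3000
#define EGL_BAD_DEVICE_EXT 0x322B

/* HYPOTHETICAL placeholder for the proposed EGL_DEVICE_LOST error.
 * No such enum value has been assigned; this number is made up. */
#define SKETCH_EGL_DEVICE_LOST 0x7FFF0001

enum device_status {
    DEVICE_OK,              /* call succeeded */
    DEVICE_HANDLE_INVALID,  /* the handle was never valid */
    DEVICE_REMOVED,         /* the handle is valid but the device is gone */
    DEVICE_OTHER_ERROR      /* any other EGL error */
};

/* Maps the result of eglGetError() after a device call to a status. */
enum device_status classify_device_error(int egl_error)
{
    switch (egl_error) {
    case EGL_SUCCESS:
        return DEVICE_OK;
    case EGL_BAD_DEVICE_EXT:
        return DEVICE_HANDLE_INVALID;
    case SKETCH_EGL_DEVICE_LOST:
        return DEVICE_REMOVED;
    default:
        return DEVICE_OTHER_ERROR;
    }
}
```

Without a separate code, both the invalid-handle and removed-device cases would collapse into EGL_BAD_DEVICE_EXT.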

For EGLDisplays:

  • If the device for a display is removed, then the EGLDisplay becomes invalid. The handle still exists, since you can't destroy EGLDisplay handles. All EGL functions which take that EGLDisplay will fail. Use the same error code that we'd use for device functions above.
  • With a new extension, an application could also check for a lost device using eglQueryDisplayAttribEXT.
  • If there's a current context from that display, then the context has a graphics reset, as per GL_KHR_robustness.
  • After an EGLDisplay becomes invalid, the only thing that an application can do with that display is to release any current contexts, and then to tear down the display with eglTerminate.
  • After calling eglTerminate, an application may try to reinitialize the display using eglInitialize. Depending on the display attributes, the driver may allow eglInitialize to succeed by picking a different device.
  • If eglInitialize fails, then an application could try getting a new display handle with eglGetPlatformDisplay. In practice, you'd probably just want to call eglGetPlatformDisplay unconditionally, since if the driver is capable of reinitializing the display, then it can just hand back that same EGLDisplay handle.

That last point is important for libglvnd. When an application calls eglGetPlatformDisplay, more than one vendor might be able to work with the native display, and libglvnd will just pick the first vendor that returns a non-NULL EGLDisplay handle. If the device behind that display disappears, then that first vendor might not have another device it can use, in which case eglInitialize will fail. But, when the application calls eglGetPlatformDisplay, then a different vendor could pick up the display instead, in which case it would return a new EGLDisplay handle.
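
The ordering of the display recovery steps above can be sketched as a small decision function. This is only a model of the proposal, not driver or loader code: the struct fields and names are illustrative, and the real EGL call each step stands for is noted in the comments.

```c
#include <stdbool.h>

enum recovery_step {
    RECOVER_RELEASE_CURRENT, /* eglMakeCurrent with EGL_NO_CONTEXT */
    RECOVER_TERMINATE,       /* eglTerminate on the lost display */
    RECOVER_GET_DISPLAY,     /* eglGetPlatformDisplay, called unconditionally */
    RECOVER_INITIALIZE,      /* eglInitialize on the returned handle */
    RECOVER_DONE
};

/* Application-side bookkeeping after a display reported a lost device. */
struct display_state {
    bool context_current; /* a context from the lost display is still current */
    bool terminated;      /* the lost display has been terminated */
    bool have_display;    /* a display handle was fetched after the loss */
    bool initialized;     /* initialization succeeded on that handle */
};

/* Returns the next step in the proposed recovery order. */
enum recovery_step next_recovery_step(const struct display_state *s)
{
    if (s->context_current)
        return RECOVER_RELEASE_CURRENT;
    if (!s->terminated)
        return RECOVER_TERMINATE;
    if (!s->have_display)
        return RECOVER_GET_DISPLAY;
    if (!s->initialized)
        return RECOVER_INITIALIZE;
    return RECOVER_DONE;
}
```

Fetching the display handle again before initializing matches the suggestion to call eglGetPlatformDisplay unconditionally, which gives libglvnd a chance to route the native display to a different vendor.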

@emersion
Author

> After a device is removed, all EGL functions which take that EGLDeviceEXT handle will fail. With an extension, we can define a new EGL_DEVICE_LOST error code to use for this.

Looks like a good idea.

> With an extension, we can also define a new query attribute for eglQueryDeviceAttribEXT, especially since eglQueryDevicesEXT can be pretty expensive.

Hm. Something like an EGL_DEVICE_IS_ALIVE attrib? I'm not sure this is really needed (e.g. Vulkan doesn't have it).

> A driver can re-use the same EGLDeviceEXT handle if the same device is later reconnected.

What if the same device is re-connected but on a different port?

Re-using handles between hotplugs sounds like a footgun TBH. What's the motivation?

@kbrenneman
Contributor

> > With an extension, we can also define a new query attribute for eglQueryDeviceAttribEXT, especially since eglQueryDevicesEXT can be pretty expensive.
>
> Hm. Something like an EGL_DEVICE_IS_ALIVE attrib? I'm not sure this is really needed (e.g. Vulkan doesn't have it).

Something like that, yeah.

It wouldn't strictly be needed, since you could just call eglQueryDevicesEXT and look for the handle. Using a specific device query might be easier or faster, though.

> > A driver can re-use the same EGLDeviceEXT handle if the same device is later reconnected.
>
> What if the same device is re-connected but on a different port?
>
> Re-using handles between hotplugs sounds like a footgun TBH. What's the motivation?

Emphasis on "can" -- a driver would be allowed, not required, to re-use a handle for the same device.

The inverse is the critical part: A driver must not re-use the same handle for a different device, because if it did, then an application would suddenly be working with a different device between two EGL calls without any way to realize that anything changed.

If a driver re-uses a handle for the same device, then you're fine: An application using that handle would continue to use the same device just like it expects.
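
One way a driver could satisfy both halves of that rule, sketched under assumptions: each device has some stable identity string (the `id` here is hypothetical, for example a serial number or UUID), and slots in the table are never recycled. All names are illustrative and not taken from any real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS  16
#define MAX_ID_LEN 64

struct device_slot {
    char id[MAX_ID_LEN]; /* stable device identity (hypothetical) */
    bool present;        /* false while the device is unplugged */
};

struct device_table {
    struct device_slot slots[MAX_SLOTS];
    size_t used;         /* slots handed out so far; never decreases */
};

/* Returns the handle for device `id`. A reconnecting device gets its old
 * handle back; a new device always gets a fresh slot. Returns NULL if the
 * table is full or the id is too long. */
void *device_table_plug(struct device_table *table, const char *id)
{
    assert(table != NULL && id != NULL);
    if (strlen(id) >= MAX_ID_LEN)
        return NULL;
    for (size_t i = 0; i < table->used; i++) {
        if (strcmp(table->slots[i].id, id) == 0) {
            table->slots[i].present = true;
            return &table->slots[i];
        }
    }
    if (table->used == MAX_SLOTS)
        return NULL;
    struct device_slot *slot = &table->slots[table->used++];
    strcpy(slot->id, id);
    slot->present = true;
    return slot;
}

/* Marks the device as removed; the slot stays reserved for its id. */
void device_table_unplug(void *handle)
{
    struct device_slot *slot = handle;
    assert(slot != NULL);
    slot->present = false;
}

/* True if the handle refers to a device that is currently plugged in. */
bool device_handle_present(const void *handle)
{
    const struct device_slot *slot = handle;
    return slot != NULL && slot->present;
}
```

Because the key is the device identity rather than the port, reconnecting the same device elsewhere would map to the same handle in this model. The cost is that slots for removed devices are never reclaimed.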

@stonesthrow stonesthrow self-assigned this Jul 6, 2021