Let's make surface creation safe! #1463
Comments
@pythonesque just a heads up: the issue is moved into |
This clarifies that the window that a surface is created on must be kept alive for the lifetime of the surface. This requirement and a proposal to change it are described in gfx-rs#1463
…fx-rs#1463) This has no effect, but makes the code more legible.
- Partially implements gfx-rs#1463
So, I decided to see what it would take to implement the "make Surface own the Window" part of this. The experimental patch is available in parasyte@73fbb52 I had to use The reason for using a trait object instead of generic type parameters is that the type parameters quickly started to infect almost all APIs. E.g. Line 875 in fe2b230
RequestAdapterOptions and so on.
|
It's a cool proof of concept - definitely worth pursuing with a PR. Can't you emulate trait_upcasting with your own trait? Unless I misunderstand its use. |
IIRC, the main reason not to use a trait object here is that some platforms have |
IIRC there's been a lot of discussion in This is also noted in
Tagging @rib and @MarijnS95 and linking a few related issues: |
Regarding winit integration for surface creation I also tend to think there should be a clearer definition of a "surface size" that winit could convey, instead of the "inner" size of windows which doesn't conceptually (or technically) match the size of the render target/surface across all backends. Please see: rust-windowing/winit#2308 |
Being more specific: this is only the case after returning from that callback on the Android side. Hence Not sure, but I think @rib implements a very similar thing (albeit with clearer API). I think the goal was for |
I think it would also be good to think through the details of how to support frame pacing effectively, since there can be a fiddly separation of responsibilities across the driver, window system integration and application, and they need to share/coordinate timing information with respect to surfaces and displays. (and so the questions around who owns what exactly may affect this) E.g. see frame pacing library overview for Android here: https://developer.android.com/games/sdk/frame-pacing/ |
After reflecting on this more, it really feels more intuitive for the surface to be generic over the lifetime of ... something. Apparently not the RawWindowHandle implementer, due to comments above wrt Android. The one major drawback I am aware of (and I haven't tried this anyway) is that it requires window The |
The (newly added) caveat in |
(I believe that the surface owning (a reference to) the window is still the only sane way to make this API usable without essentially giving up the ability for the user to choose when to present entirely, and instead having the trait work "the other way around", where winit asks for an application like wgpu to provide hooks to access the currently queued presentable stuff. Which could totally work--it's a bit more like the browser, arguably--but it would be a dramatic API change). (Also if it worked "the other way around" you run into a related problem where the part of the wgpu context that records queued commands to surfaces needs to be shared-owned by winit... which isn't really that bad, mostly just pointing out that the problem doesn't go away. This also creates a bit more friction for someone who wants to build their own wgpu-like layer because now they need to be the ones implementing the unsafe trait, but that could probably be resolved by having some sort of fully safe reusable library for queueing up commands to surfaces that implement such a trait, since that part of the implementation isn't really platform-specific. Sorry if this is a bit rambly). |
@pythonesque Can you clarify? |
When I was last working on this, people still weren't 100% sure whether if you explicitly incremented / decremented a reference count to the Android surface, all the objects were still made invalid on those lifecycle events. If they weren't then you wouldn't necessarily need to synchronize with the events for safety. |
@pythonesque if you increment the refcount the Indeed, |
Indeed it is probably wrong, but it sounds like it is memory safe, right? For the purposes of the current discussion, that should be sufficient to make the API safe everywhere. That seems like a worthwhile motivation in itself. Moreover, it would allow winit to pass events on to the user to notify them that they needed to refresh their surfaces. Not that I'm saying this situation is ideal or anything, just that I think anything that works better on Android will probably require breaking changes to the |
I described this more clearly in rust-windowing/raw-window-handle#84 (comment): if anything this is only a representation of the Android version that I tested this on and exempt from vendor changes. The documentation specifically mentions to not touch the hardware buffer after (returning from) that |
Right. Well, if we don't want to rely on vendor specific workarounds to support Android (but are there any vendors that actually break this?), I think the simplest thing to do would be to replace AsRawWindowHandle with a method that takes a closure and provides you the raw window handle for use only within the closure (there are a lot of ways to do that). The closure would just be responsible for making sure to acquire / release a lock on Android. Window ownership by the surface would still be needed because that's still the safety requirement needed for the window handle to be valid; Android just has a way of invalidating the window that doesn't exist on other systems. Like I alluded to earlier, the alternative would be to totally ditch AsRawWindowHandle and instead reverse control flow so that the window manager is exclusively responsible for both processing lifecycle events and drawing to the surface, where it gets communicated with asynchronously by libraries like wgpu leaving work for it to do. But that seems like a much larger / more ambitious potential change that would need a lot of evaluation, whereas "acquire/release in a closure on Android" (and do nothing special in the closure on other platforms) is very straightforward. |
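The closure-based idea from the comment above could look roughly like the sketch below. Everything here is hypothetical: `WithRawHandle`, `LifecycleWindow`, and `on_surface_destroyed` are illustrative names, not part of `raw-window-handle` or winit, and a plain `usize` stands in for the real handle. It only shows the shape of the idea: the handle is lent out for the duration of a closure, and a lock held across the call keeps a lifecycle event from invalidating it midway.

```rust
use std::sync::Mutex;

/// Hypothetical replacement for a "give me the raw handle" trait:
/// the handle is only lent out while the closure runs.
trait WithRawHandle {
    fn with_raw_handle<R>(&self, f: impl FnOnce(Option<usize>) -> R) -> R;
}

/// Toy window whose native handle can be invalidated by a lifecycle
/// event (as discussed for Android). `None` means "no surface right now".
struct LifecycleWindow {
    native: Mutex<Option<usize>>,
}

impl LifecycleWindow {
    fn new(handle: usize) -> Self {
        LifecycleWindow { native: Mutex::new(Some(handle)) }
    }

    /// Would be called by a (hypothetical) event loop when the platform
    /// tears the native surface down. Blocks while a closure is running.
    fn on_surface_destroyed(&self) {
        *self.native.lock().unwrap() = None;
    }
}

impl WithRawHandle for LifecycleWindow {
    fn with_raw_handle<R>(&self, f: impl FnOnce(Option<usize>) -> R) -> R {
        // The guard lives until this function returns, so the handle
        // cannot be invalidated while `f` is using it.
        let guard = self.native.lock().unwrap();
        f(*guard)
    }
}
```

On platforms without such lifecycle events the method could simply call the closure with no locking. One caveat of this sketch: calling `on_surface_destroyed` from inside the closure would deadlock or panic, since the lock is already held.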
There has been some discussion in the past over how to make create_surface safe:
gfx-rs/wgpu-rs#78
gfx-rs/wgpu-rs#281 (comment)
From these comments, my impression was that the only issue was not being able to trust the pointers provided by the type implementing https://docs.rs/raw-window-handle/0.3.3/raw_window_handle/trait.HasRawWindowHandle.html. However, this is an `unsafe trait` and the requirements it puts on its implementors are actually quite strong:

(emphasis mine).

We can instantly see that just by taking `&W` where `W: HasRawWindowHandle`, we do not need to worry about arbitrary pointers or integers--we can rely on the guarantees of `HasRawWindowHandle` alone, which ensures that as long as the underlying `W` instance is alive:

Since wgpu already needs to deal with handles in a backend-sensitive way, it makes perfect sense for it to just check all the individual handles. Problem solved, right? Unfortunately, it turns out that while this is the only documented source of unsafety, it is not the only source of unsafety when using a surface created by `create_surface`. @cwfitzgerald gave the following code as an example, when performing rendering using a task system off the main thread:

When trying to close a window, this event loop would segfault on some Linux systems, but moving `wait_for_frame` below the `frame = _` portion (at the end of the `MainEventsCleared` arm) would prevent the segfault. Here, `wait_for_frame` waits for all commands to be submitted from the CPU to the GPU (but does not wait for the GPU). This suggests that after exiting the event loop (dropping the window), the window surface may be freed on the CPU on some platforms while there are still in-flight commands being sent to the surface from another thread; presumably, someone (the window manager or graphics driver) is able to make sure device resources are not reused until rendering is complete, but the same is not true of host resources.

The implication here is clear: in general, we can't treat a surface as valid unless the underlying window from which it was created is valid. Here, "window" doesn't necessarily have to be a window (although in practice it usually will be), it just has to be the object that owns whatever resource is referred to by the handle it passes to wgpu. Note that in a wgpu context, we only care about "real" surfaces with valid handles.
While on some platforms (e.g. Android), it seems that a window is always kept alive as long as the surface is, on others (e.g. X11 on Linux), it seems like the surface may be deallocated together with the Window (we could not inspect the actual implementation of https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCreateXlibSurfaceKHR.html so we could not verify this for certain, but either way the Vulkan documentation doesn't indicate that there's any requirement that the surface hold onto a reference to the window). So our first conclusion is that if we want to be safe, we should make sure--within Rust--that a window always outlives the surface from which it was created.
We discussed several approaches here, but the most obvious is this one:

1. `Surface<W>`.
2. `create_surface` (in wgpu) takes `W: HasRawWindowHandle` by value rather than by reference.
3. `create_surface` (in wgpu) validates any handles it uses are non-0 / non-null (depending on platform), runs all the checks and resource creation steps it currently does, and then stores `W` within the `Surface` structure.
4. `Surface<W>` exposes a few methods to get back the window data: `fn window(&self) -> &W`, and `fn into_window(self) -> W`. Note that we do not provide `fn window_mut(&mut self) -> &mut W`, for reasons explained below.

First, safety arguments for why this makes it okay to use the window handle associated with the surface:

- `HasRawWindowHandle`, as long as the underlying `W` is alive, the window handles will not change after repeated calls, so we know our checks from (3) will continue to be valid.
- `HasRawWindowHandle`, as long as the underlying `W` is alive, the handle returned from a call will refer to a live window, so we know the window is valid as long as the instance of `W` is; which is at least as long as the `Surface` itself, provided that we don't provide any means of partially moving out of the `Window`.
- `&mut W` access or other partial window move methods, there is no way to alter the stored window without dropping the whole `Surface` (this is why we can't implement `deref_mut()`).

These may seem like uncomfortably strong restrictions on the window used for gfx-rs. We address these concerns in two ways:
`HasRawWindowHandle` upstream in the `raw-window-handle` crate (all of which we believe are justified by the requirements on `HasRawWindowHandle`):

This means that in order to share access to the `Window` with the rest of an application, one can simply pass `Arc<W>`, `Rc<W>`, or `&W` (and possibly other implementations, should they prove useful) to `create_surface`. Due to the nature of how windows work (all meaningful operations on them must work in the context of a separate device being able to query them through a duplicable handle), we know that the important, shared component of the window (the part that needs to be alive when the surface is) must have interior mutability of some sort if it can be mutated at all; for actually existing window APIs, this interior mutability is made safe dynamically rather than within the type system, so we don't lose anything by preventing `&mut` access to the window while wgpu may have references to it. In exchange, we get a powerful safety argument that relies solely on the correct implementation of `HasRawWindowHandle` and `create_surface`, rather than requiring detailed investigation of every possible window management API.

We also examined practical, large wgpu projects to see if these restrictions would be onerous. From my own work on Veloren I know that these restrictions would be no problem for us. We also looked at Bevy, Dotrix, and Iced, and manually verified that they could move to this API with minimal changes:
- `Arc<winit::Window>`. It requires `&mut` access only to its own internal `bevy::Window`, which is distinct from the one to which you have a handle, and even this is only needed during window creation right before it's inserted into the hash map. So there would be no issue with switching to `Arc<winit::Window>`.
- `winit::Window` and it could easily have `Renderer` own it: https://github.com/lowenware/dotrix/blob/fix/wgpu-pipeline-creation/src/application.rs#L67. `Arc<winit::Window>` would make it even easier, but is not required; the code that needs access to the window can always get at the underlying `Renderer` instance by accessing it through the global services structure, so if the renderer owned the `winit::Window` (indirectly through the surface), it would work fine (`Renderer::new` constructs and stores the surface).
- `&W` instead of `W`, but as that API is unsafe it will have to be changed regardless). In the actual examples I could find, the window is easily available by value, however, e.g. https://github.com/hecrj/iced/blob/ff15ebc54778dee2a7262469c8a2bcd5abecd4d1/examples/integration/src/main.rs and https://github.com/hecrj/iced/blob/1f7e8b7f3d1804c39c8e0934b25f3ef178de269c/wgpu/src/window/compositor.rs.

So we can see that our theoretical expectations translate well to practice. We can also see that in all known big examples, `winit::Window` is what's providing the `RawWindowHandle`, which makes sense since it works to abstract window creation across as many platforms as possible. As such, verifying that `winit::Window`'s implementation of `HasRawWindowHandle` satisfies the trait requirements, together with correctness of `wgpu`'s implementation of functions on `Surface`, will in practice be sufficient to verify the safety of real world code using this API (which is important because it means that we are not simply hiding an added unsafety burden by moving it somewhere else in the stack).

However, this is still not enough! This is because we still haven't addressed the thing you actually do with surfaces, which is use them with swap chains. Swap chains are currently produced from the combination of a surface and a compatible device. When `get_next_frame()` is called on a swap chain, we acquire a texture resource on the device; the resource is compatible with the target surface. If successful, we can then render to the texture resource from our device, and then finally (on dropping the swap frame) "release" the texture, and it is queued for presentation to the surface. Clearly, the safety of all of this depends crucially on both the device and surface remaining alive for the duration of the swap chain, but this is not guaranteed by the current API!

As discussed on Matrix, the swap chain API is still very much in flux, with no major players having made a firm commitment to any particular API. However, wgpu itself has been learning from how people use it (together with constraints imposed by cross-platform requirements), and the current thought is that surfaces should own the queue currently associated with swap chains; a surface will own one swap chain at a time, and (from our discussion) wgpu has no interest in surfaces that are not part of a swap chain. This takes care of one safety issue: if a surface replaces the functionality of the swap chain, then clearly the swap chain cannot outlive the surface!
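As a rough illustration of a surface that owns the presentation queue, the sketch below has frames borrow the surface and queue themselves for presentation when dropped. It is a toy model under stated assumptions: `Surface`, `Frame`, and the `presented` log are hypothetical stand-ins, frame IDs replace real textures, and none of this is wgpu's actual API.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical surface that owns what used to be swap chain state.
struct Surface {
    next_id: Cell<u32>,
    /// Stand-in for the presentation queue: IDs of presented frames.
    presented: RefCell<Vec<u32>>,
}

/// A frame borrows the surface, so the borrow checker rejects any
/// program in which the surface is dropped while a frame is alive.
struct Frame<'a> {
    surface: &'a Surface,
    id: u32,
}

impl Surface {
    fn new() -> Self {
        Surface { next_id: Cell::new(0), presented: RefCell::new(Vec::new()) }
    }

    /// Acquire the next frame (a real backend would acquire a texture).
    fn get_next_frame(&self) -> Frame<'_> {
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        Frame { surface: self, id }
    }

    /// Snapshot of the frames queued for presentation so far.
    fn presented(&self) -> Vec<u32> {
        self.presented.borrow().clone()
    }
}

impl Drop for Frame<'_> {
    /// Dropping the frame "releases" it and queues it for presentation.
    fn drop(&mut self) {
        self.surface.presented.borrow_mut().push(self.id);
    }
}
```

Because the queue lives inside the surface, there is no separate swap chain object that could outlive it; whether a borrow, an `Arc`, or some other mechanism ties frames to the surface is a design choice this sketch does not settle.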
Since the swap chain needs to allocate device memory, this also implies that the swap chain should maintain a strong reference to the backing `Device`. It is possible that given that `Window` is alive, the existing gfx-rs `Context` mechanism will be sufficient to prevent `Device` host memory from being deallocated, since unlike with `Window` we started out with ownership over the device data; however, it's possible that the current complex dependency graph around swap chains, swap chain frames, devices, surfaces, etc. could be greatly simplified by maintaining a direct `Arc<Device>`. Either of these mechanisms should be sufficient to ensure safety of creating a swap frame while being reasonably flexible.

To make sure that a swapchain frame is still alive, the most straightforward method would be to add a lifetime to `SwapChainFrame`; `get_next_frame(&'a self) -> SwapChainFrame<'a>` would be the new signature defined on `Surface`, and `SwapChainFrame` would maintain a reference to the `Surface` that instantiated it. If this is too inflexible for some purposes (although I suspect most could be made to fit into this pattern), we could define the method as `get_next_frame<S: Borrow<Self>>(surface: S) -> SwapChainFrame<S>`, which would allow using `Rc<Surface>`, `Arc<Surface>`, `Surface`, `&Surface`, etc. as desired (we would also want to provide an `into_surface()` and `surface()` method as usual); though we might not be able to get away with this safely for arbitrary `Borrow<Self>`, we could at least hardcode the stuff we know works. Alternately, we could again rely on reference counting / pointer following in the shared context, and avoid any explicit references (as `Surface` would already take care of them).

I believe this plan closes all the potential soundness holes in the current API around windows and surfaces. There may still be implementation issues, of course, but the safety argument should hold as long as `HasRawWindowHandle` is implemented properly.

So, to recap:
- `create_surface` needs to check for null and zero handles as appropriate (in a cross-platform way).
- `Surface` will own the window `W` and provide `&W` access and `into_window(self)`, but not `&mut W`.
- `Surface` will own the resources needed for the actual swap chain (queue etc.) and those resources in turn should hold onto the Device.
- `Surface` (and therefore `Device`) that produced them when they are acquired (exact mechanism TBD).
- `HasRawWindowHandle` for `Arc<W: AsRawWindowHandle>` and related instances implemented upstream (technically I think all of them can actually own the `Window`, but the `Arc` instance makes it easier).
- `AsRawWindowHandle` (needs to satisfy the requirements it already has).
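The recap can be sketched in a few dozen lines of Rust. This is a minimal model under stated assumptions, not wgpu's implementation: `HasHandle` is a hypothetical stand-in for `HasRawWindowHandle` with a bare `usize` handle, `CreateSurfaceError` and `FakeWindow` are invented for illustration, and the lifetime-based `get_next_frame` is only one of the mechanisms mentioned for tying frames to the surface.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `HasRawWindowHandle`. Assumed contract: the
/// handle stays valid and unchanged for as long as `self` is alive.
unsafe trait HasHandle {
    fn raw_handle(&self) -> usize;
}

/// Forwarding impl, so a shared window can be handed to the surface.
unsafe impl<W: HasHandle> HasHandle for Arc<W> {
    fn raw_handle(&self) -> usize {
        (**self).raw_handle()
    }
}

#[derive(Debug, PartialEq)]
enum CreateSurfaceError {
    NullHandle,
}

/// The surface owns the window, so the window cannot be dropped first.
struct Surface<W> {
    window: W,
    handle: usize,
}

impl<W: HasHandle> Surface<W> {
    /// Takes the window by value and rejects null / zero handles.
    fn new(window: W) -> Result<Self, CreateSurfaceError> {
        let handle = window.raw_handle();
        if handle == 0 {
            return Err(CreateSurfaceError::NullHandle);
        }
        Ok(Surface { window, handle })
    }

    /// Shared access only; there is deliberately no `window_mut`.
    fn window(&self) -> &W {
        &self.window
    }

    /// Gives the window back, consuming the surface.
    fn into_window(self) -> W {
        self.window
    }

    /// The returned frame borrows the surface and cannot outlive it.
    fn get_next_frame(&self) -> SwapChainFrame<'_, W> {
        SwapChainFrame { surface: self }
    }
}

struct SwapChainFrame<'a, W> {
    surface: &'a Surface<W>,
}

impl<'a, W> SwapChainFrame<'a, W> {
    /// Handle of the window this frame will be presented to.
    fn target_handle(&self) -> usize {
        self.surface.handle
    }
}

/// Toy window used only to exercise the sketch.
struct FakeWindow(usize);

unsafe impl HasHandle for FakeWindow {
    fn raw_handle(&self) -> usize {
        self.0
    }
}
```

Passing `Arc<FakeWindow>` lets the rest of an application keep its own clone of the window, while the surface's clone keeps the window alive for as long as the surface exists.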