Implement VSync source #2412

Open
kchibisov opened this issue Aug 8, 2022 · 51 comments · Fixed by #2896

@kchibisov
Member

This is a common concept, e.g. in browsers: non-blocking frame scheduling at the display refresh rate.

It seems like it's possible on all desktop platforms and the web. For the rest we may indicate that vsync source is not supported.

The way to integrate it would be to add WindowEvent::Frame, which must be sent when a new frame is ready to be drawn. It will come from the compositor directly, and users must request those events via Window::request_frame() -> Result<()>, where the result indicates whether the event could be scheduled.
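To make the proposal concrete, here is a minimal std-only mock of the request/deliver cycle. It is a sketch, not winit code: `Window`, `FrameRequestError`, `compositor_tick` and `next_event` are hypothetical stand-ins, and only the names `WindowEvent::Frame` and `request_frame` come from the proposal.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the proposed `WindowEvent::Frame`.
#[derive(Debug, PartialEq)]
pub enum WindowEvent {
    Frame,
}

/// Hypothetical error for a frame request that cannot be scheduled.
#[derive(Debug, PartialEq)]
pub struct FrameRequestError;

/// Mock window; `vsync_source` says whether the backend has a vsync source.
pub struct Window {
    vsync_source: bool,
    frame_requested: bool,
    events: VecDeque<WindowEvent>,
}

impl Window {
    pub fn new(vsync_source: bool) -> Self {
        Self { vsync_source, frame_requested: false, events: VecDeque::new() }
    }

    /// Ask for one `Frame` event; fails when no vsync source is available.
    pub fn request_frame(&mut self) -> Result<(), FrameRequestError> {
        if !self.vsync_source {
            return Err(FrameRequestError);
        }
        self.frame_requested = true;
        Ok(())
    }

    /// Simulates the compositor signalling that a new frame can be drawn.
    pub fn compositor_tick(&mut self) {
        if self.frame_requested {
            self.frame_requested = false;
            self.events.push_back(WindowEvent::Frame);
        }
    }

    /// Pops the next queued event, if any.
    pub fn next_event(&mut self) -> Option<WindowEvent> {
        self.events.pop_front()
    }
}
```

The point of the mock is that nothing blocks: the application asks once, keeps processing other events, and draws when `Frame` arrives.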

The APIs that should be used for that on each platform are the following.

X11 - Xpresent
macOS - DisplayLink
Wayland - frame callbacks
Windows - DWM get composition timings + a timer.
Web - request animation frame.

The API is essential for Wayland clients, since they must not rely on vsync from the graphics API.

@dhardy
Contributor

dhardy commented Aug 8, 2022

Sounds good to me, though it would be worth considering how failures of request_frame should be handled (e.g. requesting a wake in 16ms?).

@kchibisov
Member Author

should be handled (e.g. requesting a wake in 16ms?).

I think it's up to the application what to do here. It clearly knows beforehand that it failed, so it can just fall back to something else. I don't want to add timers to every backend just to provide some hand-holding. At least for now. In general this can only fail on X11 if you don't have libXpresent installed, so you'd really have to try to make it fail...
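A rough sketch of what such an application-side fallback could look like. Everything here is an assumption for illustration: `NextWake` and `schedule_next_frame` are hypothetical names, and the closure stands in for the proposed `Window::request_frame()`.

```rust
use std::time::{Duration, Instant};

/// What the application waits for before drawing the next frame.
#[derive(Debug, PartialEq)]
pub enum NextWake {
    /// The backend will deliver a frame event.
    FrameEvent,
    /// No vsync source: wake up at a fixed deadline instead.
    Timer(Instant),
}

/// `request_frame` stands in for the proposed `Window::request_frame()`.
/// On failure the application picks its own timer-based deadline.
pub fn schedule_next_frame<E>(
    request_frame: impl FnOnce() -> Result<(), E>,
    now: Instant,
    fallback_interval: Duration,
) -> NextWake {
    match request_frame() {
        Ok(()) => NextWake::FrameEvent,
        Err(_) => NextWake::Timer(now + fallback_interval),
    }
}
```

With a 16 ms `fallback_interval` this matches the "wake in 16ms" idea from the question, but the choice stays with the application rather than the backend.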

@rib
Contributor

rib commented Aug 11, 2022

Based on discussing this a bit on IRC I think it could be good to highlight one clear sub-problem that came up which is that it could be really helpful to have just one place where applications are allowed to render.

e.g. in the case of Wayland, if it was guaranteed that apps only rendered during RedrawRequested event callbacks then the backend could potentially register frame callbacks there to get notifications from the compositor for throttling the application.

(so potentially frame throttling might be possible on Wayland with no new API if it were documented that rendering were limited to RedrawRequested events. Similarly this should be possible for X11).

Thinking more generally about high level APIs related to rendering synchronization may open a bigger can of worms, which could be good to try and break down.

One initial concern I have about a request_frame() API that seems inspired by requestAnimationFrame in browsers is that it looks kind of back-to-front compared to typical presentation APIs that would throttle / synchronize you based on a previously submitted frame.

At a lower level, synchronization with the vblank will either be handled somewhat transparently by APIs like eglSwapBuffers, which can be configured to block while they wait for the next vblank to release a new back buffer, or be done more explicitly with a fence object API of some form that lets the application poll (or get an event) for when work on the GPU has completed. These mechanisms only really apply to non-composited applications though.

In composited environments the norm is to get asynchronous feedback after you give the compositor a frame that may tell you when the frame became visible for the user or at least when the compositor processed the frame. For basic throttling of the application then the exact semantics don't necessarily matter too much, just so long as it's a 1:1 ratio for vblank.

In both cases though the synchronization (or I'd say throttling for a composited environment) comes after you have rendered something and submitted it to be presented.

In general though it's probably good to keep in mind the differences between composited / non-composited applications here (presentation being throttled by a compositor vs synchronized with real hardware) which will mainly affect how the app gets its feedback. In the case of eglSwapBuffers the feedback may just be implicit (the app process may be blocked until a new back buffer is ready)

Off the top of my head I can think of three main points of interest and kinds of synchronization that applications are likely to be interested in for different use cases:

  1. The compositor has finished processing the frame that included your app-rendered frame (this is what wayland frame callbacks are generally about)
  2. The compositor has rendered its own frame that includes your app-rendered frame and that is now visible to the user on a screen (E.g. some X11 compositors can give info about this, and I think there was probably also wayland protocol for this too)
  3. Now is a "good" time to start rendering to have the minimum time between rendering and having your frame be visible to a user. (As in scheduling rendering to minimize latency between what's rendered and seen - important for VR, also called frame pacing, E.g. see https://developer.android.com/games/sdk/frame-pacing)

The above request_frame() API seems like it might be more suited to trying to support frame pacing, which is more about trying to use heuristics to predict the best time for an application to start rendering. I'd generally say this is a much harder, and also more subjective, nuanced problem space than the problem of throttling rendering to the VSync, which is what the title of this issue references.
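To illustrate why frame pacing is heuristic, here is the simplest possible form of the calculation. It is a sketch under stated assumptions: the function name and all three inputs are hypothetical, and a real pacer (Swappy, for example) has to estimate them from past frames.

```rust
use std::time::Duration;

/// Returns how long to wait (measured from "now") before starting to render,
/// so that rendering finishes `safety_margin` ahead of the next vblank.
/// All three inputs are estimates the application would have to supply.
pub fn delay_before_render(
    time_to_next_vblank: Duration,
    estimated_render_time: Duration,
    safety_margin: Duration,
) -> Duration {
    // If the budget is already exceeded, start immediately (zero delay).
    time_to_next_vblank.saturating_sub(estimated_render_time + safety_margin)
}
```

For example, with 16 ms until the vblank, a 4 ms render estimate and a 2 ms margin, the delay is 10 ms. The hard part is that the render estimate is a guess, which is what makes this more subjective than plain throttling.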

Libraries like Swappy on Android might help with being able to support frame pacing via Winit but it's the kind of thing that needs to use some amount of guess work and heuristics and I think it could be quite challenging to stabilize for lots of general purpose use cases across lots of window systems.

Looking just at Swappy for Android you can see here https://developer.android.com/games/sdk/frame-pacing/opengl/add-functions how the library also depends on graphics driver extensions which will probably complicate the relationship between Winit and APIs like Wgpu.

If there would be real interest in re-considering quite how Winit drives its rendering, such as to help with synchronization/throttling/pacing, it could also be a good opportunity to re-consider what would be ideal for iOS and OSX which I generally recall drive their rendering via delegate callbacks where apps are really expected to do, and finish their rendering synchronously as part of those callbacks whenever they happen (which doesn't fit super well with the winit event model as far as I recall)... just found the issue I was thinking of: #2010

On Android we also want to be very clear about where rendering happens considering that apps must not render if they don't have an associated surface, so I'd definitely be interested in seeing Winit clearly limit/define where applications are allowed to render.

@kchibisov
Member Author

I think what I really want here is a frame throttling source, so I can draw at the vsync rate without blocking and with very low chances of missing vblanks.

I've started the work here #2535 for X11 for now, since I can't add Wayland ergonomically here.

@rib
Contributor

rib commented Oct 28, 2022

Assuming that we're dealing with composited applications (i.e. ignoring the possibility of writing compositors using Winit for now) then yeah it seems good to prioritize having a way to throttle client rendering, which may not happen automatically with APIs like egl/glxSwapBuffers.

  • For X11 (as mentioned under Add frame throttling source #2535) my instinct would be to look at using the _NET_WM_FRAME_DRAWN protocol.

  • For Wayland I'd expect to use frame callbacks.

  • I'm much less sure about Windows but there is a DwmFlush() API that can sync (blocking) with the compositor (which seems excessive from the main thread). There's also DwmGetCompositionTiming. I have a feeling that on windows you can call DwmFlush() from a thread, and so combined with DwmGetCompositionTiming it may be possible to create a mechanism for waking the event loop in sync with the compositor.

  • On macOS / iOS it looks like the recommended way to throttle rendering is via a CADisplayLink / CVDisplayLink callback which abstracts away from synchronizing with any compositor, but should approximately throttle to the vsync.

  • On Android, without going down the route of using Swappy for frame pacing then it looks like we can instead get a vsync callback from the surface flinger via a "choreographer", which is also available via the NDK: https://developer.android.com/ndk/reference/group/choreographer

  • For web I suppose the closest thing would be to use requestAnimationFrame

ah, just realized that the original issue description listed most of the above, sorry
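For the Windows idea of blocking on the compositor from a helper thread and waking the event loop, the shape could be roughly the following. This is only a sketch: it does not call any Windows API, `spawn_frame_waker` and `fake_compositor_wait` are hypothetical names, and the sleep merely stands in for a blocking call such as DwmFlush().

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;
use std::time::Duration;

/// Spawns a helper thread that calls `wait_for_compositor` (a stand-in for a
/// blocking call such as `DwmFlush()`) and then sends one wake-up per frame.
pub fn spawn_frame_waker(
    mut wait_for_compositor: impl FnMut() + Send + 'static,
    frames: usize,
) -> Receiver<usize> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for frame in 0..frames {
            wait_for_compositor();
            // Stop if the event loop side has gone away.
            if tx.send(frame).is_err() {
                break;
            }
        }
    });
    rx
}

/// Placeholder for the blocking wait; a real backend would block on the
/// compositor here instead of sleeping.
pub fn fake_compositor_wait() {
    thread::sleep(Duration::from_millis(1));
}
```

The main thread never blocks: it only receives wake-ups, which a real backend would turn into events on the loop.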

For X11, Wayland and maybe Windows then it seems like there's more capability to track the progress of specific frames (or on Android with the extensions that Swappy uses) where you can get detailed stats/metrics relating to a specific frame.

For macOS/iOS/Android (choreographer)/web you essentially just get a higher-level callback mechanism that will generally throttle your rendering but without any detailed per-frame feedback.

I think it would be nice if Winit could support both of these general models:

  1. a way to get per-frame throttling and timing feedback if supported by the window system or
  2. a general frame pacing callback/event

but I guess that the easiest abstraction to support initially in a portable way would just be (2) to have a frame pacing callback/event across platforms.
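As a compact restatement of the two models, the split described above could be written down like this. The types are hypothetical and the mapping only encodes what this comment says, including its uncertainty about Windows.

```rust
/// Platforms named in the discussion.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Platform {
    X11,
    Wayland,
    Windows,
    MacOs,
    Ios,
    Android,
    Web,
}

/// The two general models from the comment above.
#[derive(Debug, PartialEq)]
pub enum FeedbackModel {
    /// Progress of specific frames can be tracked.
    PerFrame,
    /// Only a higher-level throttling callback is available.
    PacingCallback,
}

/// Richest model each platform is described as offering.
pub fn feedback_model(platform: Platform) -> FeedbackModel {
    match platform {
        // The comment says "maybe Windows", so treat that arm as tentative.
        Platform::X11 | Platform::Wayland | Platform::Windows => FeedbackModel::PerFrame,
        // Android here means the choreographer route, without Swappy's extensions.
        Platform::MacOs | Platform::Ios | Platform::Android | Platform::Web => {
            FeedbackModel::PacingCallback
        }
    }
}
```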

Based on some of the earlier discussion it might also be good to consider making such a callback/event be the only place that applications are allowed to render (good for Wayland).

It might also be possible to just define that the existing RedrawRequested event would be that event.

@rib
Contributor

rib commented Oct 28, 2022

It's maybe worth highlighting here that the #2535 discussion has been a good reminder of how much of a minefield this is for X11. This capability was something that was worked on mainly in support of Intel drivers and Gnome 3 way back around 2010-2013 but even to this day there's not really any one standard way to support throttling of x11 clients to vblank/the compositor and it's also a fairly disproportionate amount of work to try and cover all the unique cases, including:

  • windowed client, x11 compositor, open source drivers
  • full screen client, x11 compositor (potentially unredirected), open source drivers
  • windowed client, x11 compositor, nvidia drivers
  • full screen client, x11 compositor (potentially unredirected), nvidia drivers
  • windowed client, xwayland assuming open source drivers (since wayland not generally supported on nvidia)
  • full screen client, xwayland (potentially unredirected), assuming O/S drivers
  • windowed client, non-composited, open source drivers
  • full screen client, non-composited, open source drivers
  • windowed client, non-composited, nvidia drivers
  • full screen client, non-composited, nvidia drivers

It's pretty much no joke that each of those cases needs special consideration and a mixture of different technical solutions that aren't very consistent with each other. Sometimes the ideal solution depends on glx/egl extensions but with inconsistency between glx + egl support (both out of scope of Winit currently). The situation with Vulkan is likely even worse in some cases because, since Vulkan emerged, there has been comparatively little development on x11 (the main case of handling full screen games would be the priority)
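The ten cases above are just a cross product with one exclusion, which can be spelled out as a sketch (the enum and function names are made up for illustration):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Windowed,
    FullScreen,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Environment {
    X11Compositor,
    Xwayland,
    NonComposited,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Driver {
    OpenSource,
    Nvidia,
}

/// Enumerates the X11 configurations from the list above.
pub fn x11_cases() -> Vec<(Mode, Environment, Driver)> {
    let mut cases = Vec::new();
    for mode in [Mode::Windowed, Mode::FullScreen] {
        for env in [Environment::X11Compositor, Environment::Xwayland, Environment::NonComposited] {
            for driver in [Driver::OpenSource, Driver::Nvidia] {
                // The list assumes Xwayland only runs on open source drivers.
                if env == Environment::Xwayland && driver == Driver::Nvidia {
                    continue;
                }
                cases.push((mode, env, driver));
            }
        }
    }
    cases
}
```

That is 2 × 3 × 2 = 12 combinations minus the two Xwayland-on-Nvidia ones, and each of the remaining ten may need its own mechanism.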

@kchibisov
Member Author

From what I can say, the only option for now would be to not support X11 at all, and to add support once _NET_WM_FRAME_DRAWN is adopted by picom/kwin, for example. Xpresent has shown that it's not really a good solution and works only with Xwayland (which is good for games though, but there's no way to detect that we're under XWayland).

So I guess we may just go without X11 for now, it's not really a big deal given that Wayland is where all the cool work is happening and X11 support is more about providing better experience for software that no-one cares about maintaining anymore.

@rib
Contributor

rib commented Oct 28, 2022

Yeah, I'm mostly all for forgetting about X11 at this point ;-p

Unfortunately, in reality Wayland still isn't supported on Nvidia and so a large chunk of Linux desktop users are likely to still be stuck using X11 which does spoil the picture here somewhat.

Maybe one way of starting to think of things could be that classic X11 likely implies that you're running on Nvidia and everyone with open source drivers is probably (or should at least have the option of) running a Wayland compositor / xwayland. Though it's not quite that simple - I have a hybrid Razer laptop with integrated (Intel) graphics and a discrete Nvidia GPU and that's also awkward to run Wayland on.

@rib
Contributor

rib commented Oct 28, 2022

but there's no way to detect that we're under XWayland

I imagine if this would be helpful we can figure out a way of detecting this with a reasonable amount of certainty (though regarding xpresent I think we'd be playing with fire to be trying to use the protocol directly outside of the EGL/GLX driver)

@kchibisov
Member Author

Well, the recent nvidia stuff does work on Wayland from what I see and some folks run sway on nvidia binary drivers, so it's changing from what I can say.

@rib
Contributor

rib commented Oct 28, 2022

Yeah actually I just had my mind blown while looking at the mutter source code re: Nvidia support.

I was just looking through some of the mutter code out of curiosity to see if I could find a standard enough atom that would e.g. be set on the "root" window to identify xwayland and I just saw that they have been adding EGLStream support to mutter!

EGLStreams is the extension that Nvidia have always pushed for using as the basis for supporting Wayland which has historically been rejected by the open source xorg/freedesktop community (since it means having a special case for lots of things just to support a single vendor via a vendor-specific EGL extension)

It looks like that was started around December 2021 and there's still more recent work that's been happening for supporting EGLStreams so maybe we're finally going to get to the point where we really can say good riddance to classic X11!

@rib
Contributor

rib commented Oct 28, 2022

That's made my day!

Considering that I bootstrapped the Wayland support in Mutter I was always a bit disappointed that for years afterwards the maintainers took the principled stance against supporting Nvidia's EGLStream extension vs being (imho) more pragmatic about the bigger benefits that could come from being able to move the community forwards to Wayland at least (and just argue about finding a common technical solution to replace EGLStreams + gbm later)

There are actually reasonable pros and cons arguments for EGLStreams vs gbm (which is also not a perfect solution) so I think we could have been more compromising early on here.

@kchibisov
Member Author

If you look at cosmic-comp (the Pop!_OS compositor), it should support EGLStreams, since smithay supports EGLStreams, but EGLStreams aren't really required nowadays given that NVIDIA has drm/gbm support now. You just need a recent driver and luck that it'll work with your setup.

@rib
Contributor

rib commented Oct 28, 2022

Some of their work to support drm/kms/gbm looked like it was very limited, and only usable for some very narrow use cases but maybe it's improved. I wouldn't be surprised if it's still much better to just use EGLStreams for Nvidia for anything but bare bones support. At least in mutter it looks like they are actively working on EGLStreams support so I guess it's still the more practical way of supporting Nvidia vs using their gbm support.

@kchibisov
Member Author

kchibisov commented Jun 18, 2023

The idea of how it should work was outlined before and yes, other backends don't really map in the way you describe. However, winit could do similar scheduling based on DisplayLink (or some DWM API) on macOS/Windows, where it'll call back to the user once the frame fires on such interfaces. So it'll basically be a throttling hint that way.

It's true that having pre_present_notify could help, but on the other hand it's not clear that we should always request a frame that way. The main issue I have with your RedrawRequested, which I've said multiple times, is that my app generally wants to get hints, but that doesn't mean it wants to draw anything each time it gets a hint. This maps onto Web and Wayland, but not onto macOS, with its drawRect.

How would the rendering loop for macOS look? I think if you tell it that you want to present, it'll simply call drawRect and you must do the actual drawing (also, I don't see in their docs that you must do so).

It seems to me that what you suggest is more or less:

  1. Add pre_present_notify, so it can schedule frame callbacks transparently for the users. Users won't know about them, but they will throttle things, if pre_present_notify is used, everything is instant.
  2. Deliver RedrawRequested to the users if they did pre_present_notify and asked for request_redraw.
  3. The request_redraw must not batch, at all.
  4. On macOS/Windows pre_present_notify is a no-op, the request_redraw will force the macOS to call drawRect thingy.
  5. On Web, it does the thing similar to what Wayland does.

Is this what you want? So the users simply do request_redraw and throttling is done transparently for them? They'll still have to draw on RedrawRequested, because they've asked for it, and we remove arbitrary asking for such a redraw when a resize happens (unless the backend wants it for its internal state tracking, like on Wayland).
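To check my reading of that model, here is a tiny std-only simulation of a Wayland-like backend. It is purely illustrative: `MockWindow`, `frame_callback` and `poll_redraw` are made-up names, and the behaviour is my interpretation of the steps above, not winit's implementation.

```rust
/// Mock of a Wayland-like backend. The method names `request_redraw` and
/// `pre_present_notify` mirror the discussion; the rest is hypothetical.
#[derive(Default)]
pub struct MockWindow {
    redraw_requested: bool,
    waiting_for_frame_callback: bool,
}

impl MockWindow {
    /// The application asks for a `RedrawRequested` event.
    pub fn request_redraw(&mut self) {
        self.redraw_requested = true;
    }

    /// Called by the application right before presenting: registers a frame
    /// callback, so the next redraw is held back until the compositor signals.
    pub fn pre_present_notify(&mut self) {
        self.waiting_for_frame_callback = true;
    }

    /// The compositor's frame callback fired.
    pub fn frame_callback(&mut self) {
        self.waiting_for_frame_callback = false;
    }

    /// Returns true when `RedrawRequested` should be delivered now.
    pub fn poll_redraw(&mut self) -> bool {
        if self.redraw_requested && !self.waiting_for_frame_callback {
            self.redraw_requested = false;
            true
        } else {
            false
        }
    }
}
```

In this reading, a redraw requested after `pre_present_notify` is delayed until the frame callback, while a redraw requested without it is delivered right away.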

@kchibisov
Member Author

kchibisov commented Jun 18, 2023

As a separate note, we have cases where users wake up winit's event loop from another thread with a need to force a redraw operation. Given that some main events could be altered (resizes) whether we get a frame callback or not, should we state internally that all user events are handled before all winit events, so if they want to ask for a redraw, they'll get what they want?

Edit: by "get what they want", I mean: would winit's internal behaviors, like squashing resizes on Wayland so that you have only one resize per frame callback, still hold?

@kchibisov
Member Author

Hm, no, throttling resizes is a bad idea. We'd just say that you should resize the actual frame buffer once you get a RedrawRequested event.
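On the application side that could look something like the sketch below, where resize events only record the latest size and the frame buffer is reallocated once per redraw. `Renderer`, `Size` and the handler names are hypothetical, and the reallocation counter exists only to make the effect visible.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Size {
    pub width: u32,
    pub height: u32,
}

pub struct Renderer {
    framebuffer: Size,
    pending: Option<Size>,
    /// Counts how often the frame buffer was actually reallocated.
    pub reallocations: u32,
}

impl Renderer {
    pub fn new(initial: Size) -> Self {
        Self { framebuffer: initial, pending: None, reallocations: 0 }
    }

    /// Resize event: only remember the latest size, do not touch the buffer.
    pub fn on_resized(&mut self, size: Size) {
        self.pending = Some(size);
    }

    /// RedrawRequested: apply the latest size once, then draw at that size.
    pub fn on_redraw_requested(&mut self) -> Size {
        if let Some(size) = self.pending.take() {
            if size != self.framebuffer {
                self.framebuffer = size;
                self.reallocations += 1;
            }
        }
        self.framebuffer
    }
}
```

Several resizes between two redraws then cost a single reallocation, without winit having to throttle the resize events themselves.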

@kchibisov
Member Author

While that sounds more or less fine, what should we do with X11, @rib, and with backends where we fail to apply the model for some specific reason? Should we simply say that such backends will deliver the event right away and you may want to throttle everything yourself?

I'm also not sure if everything will work the way it was discussed here in practice (it'll target current monitor refresh rate), so we'll see how it'll go...

@kchibisov
Member Author

@rib I guess windows is also done now, to some point?

@kchibisov
Member Author

Windows is still not done, #2900 (comment)

@fredizzimo

Is there any time estimate for the macOS implementation?

I now have a draft PR for Neovide that switches to use the winit Wayland implementation instead of the custom one we used before and that works great.

But macOS is still problematic for us, since the opengl swap_buffers is broken, as reported for example in these issues

So, for now, the only real option for macOS users, is to use a software timer with the refresh rate set to the monitor rate, which is reported to work better than the standard opengl vsync.
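For reference, the timer approach boils down to something like this sketch. It is not Neovide's actual code: the function names are hypothetical, and taking the refresh rate in millihertz is an assumption of the example.

```rust
use std::time::Duration;

/// Frame interval for a refresh rate given in millihertz (60 Hz = 60_000 mHz).
/// Returns `None` for a zero rate, which would otherwise divide by zero.
pub fn frame_interval(refresh_rate_millihertz: u32) -> Option<Duration> {
    if refresh_rate_millihertz == 0 {
        return None;
    }
    // 1 s = 1e9 ns and 1 Hz = 1e3 mHz, so interval_ns = 1e12 / mHz.
    Some(Duration::from_nanos(
        1_000_000_000_000 / u64::from(refresh_rate_millihertz),
    ))
}

/// Next deadline on a fixed grid (offsets from the timer's start), so that
/// late wake-ups do not accumulate drift. `interval` must be non-zero, and the
/// frame count is truncated to `u32`, which is enough for a sketch.
pub fn next_deadline(elapsed_since_start: Duration, interval: Duration) -> Duration {
    let frames_elapsed = elapsed_since_start.as_nanos() / interval.as_nanos();
    // +1 selects the first grid point strictly after `elapsed_since_start`.
    interval * (frames_elapsed as u32 + 1)
}
```

A timer like this only matches the nominal rate. It has no idea where the vblank actually falls, which is why a real callback from the display would still be preferable.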

We are soonish planning to do a release with all our rendering changes, but I would prefer to not release something that either does no frame sync at all using a lot of power, or something that syncs to the completely wrong rate if possible.

I have tried to find someone to implement the displaylink callbacks in Neovide, but it seems that no macOS developers are available. So, I'm hoping for more luck with winit developers, although that's probably too much of wishful thinking as well.

The Windows implementation is not that important for us, since the DWMFlush based implementation that we have works reasonably well. It's not perfect, sometimes frames are dropped without any visible reason, but I think it has something to do with the GPU throttling down, and otherwise the system not prioritising the graphics display. And I have tried many different things, but waiting for DWMFlush, just before swapping the buffers is the best I found. I also tried IDXGIOutput::WaitForVBlank but it was not really any better. The only way I was able to get it completely smooth was when I was using Direct3D and waitable swapchains, but I decided for now at least to not use that, since it's too different from the other platforms.

@kchibisov
Member Author

Is there any time estimate for the macOS implementation?

There are no estimates; someone just has to want to do it. I don't think it's hard though, you just need to hook the callback. A timer isn't that bad as long as it's consistent.

I might allocate my time for that, but no promises. I do only Linux stuff in the end.

@fredizzimo

Thanks.

Yes, that's what I thought would be the case, we just have to wait and see when someone picks it up. I just wanted to let you know that from the perspective of Neovide it's quite high priority, but we don't expect anything from you. If someone really needs it from our side, they can make the winit implementation and send a pull request here.

It's always tricky to find people to implement something that they don't really need themselves. Especially on macOS, which has much fewer developers in total, compared to what seems to be a quite big userbase for Neovide. As you said, it looks like it's easy to do, and I was even considering doing it blindly myself and just check that it compiles, but I decided against it, since it's very hard to guarantee that it actually works, and not just improves the situation.

@MarijnS95
Member

For future reference I'm mapping out the AChoreographer bindings in the NDK: https://github.com/rust-mobile/ndk/compare/choreographer

The callbacks you get don't necessarily throttle frame timing, but tell you when the next vblank (or multiple vblanks/timelines, if using a more elaborate callback) is/are. You can also inform Android about your choice of timeline and refresh rate in general, so that it can optimize for your use-case if possible.
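As a rough illustration of choosing among several timelines, consider the sketch below. It does not use the NDK bindings: `FrameTimeline` and `pick_timeline` are hypothetical, simplified stand-ins, and times are expressed as offsets from "now" to keep the example self-contained.

```rust
use std::time::Duration;

/// Hypothetical, simplified view of one frame timeline.
/// Both fields are offsets from "now".
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FrameTimeline {
    /// Latest moment by which the frame must be submitted.
    pub deadline: Duration,
    /// When a frame submitted by `deadline` is expected to be shown.
    pub expected_present: Duration,
}

/// Picks the earliest-presenting timeline whose deadline still leaves room
/// for the estimated render time; `None` if no timeline is reachable.
pub fn pick_timeline(
    timelines: &[FrameTimeline],
    estimated_render_time: Duration,
) -> Option<FrameTimeline> {
    timelines
        .iter()
        .copied()
        .filter(|t| t.deadline >= estimated_render_time)
        .min_by_key(|t| t.expected_present)
}
```

The chosen timeline is what the application would then report back to the system, so that it can optimize for that choice.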
