Investigate performance on dual GPU (optimus) PCs #8165

Closed
ttnulll opened this issue Mar 7, 2020 · 23 comments


@ttnulll

ttnulll commented Mar 7, 2020

So, as the title says, lazer doesn't run well on setups that have 2 GPUs, no matter how powerful they are. This is not about SLI or CF; it's about laptops (NVIDIA Optimus) or setups with a discrete GPU that has no direct display output. Basically anything that uses the integrated GPU to output the image.
The original osu! has had this issue for a really long time and it still hasn't been fixed; the only ways to make it at least playable are switching the API to DirectX with compatibility mode, or playing on the integrated GPU. Playing on the iGPU worked pretty well there, but lazer can't output more than ~250 fps at all for me, no matter which GPU I use (both are not bad at all). I remember it being said somewhere that this can't be fixed for the original osu! because the low FPS comes from a PCIe bottleneck, but the original game still runs acceptably on DirectX, and lazer runs a lot worse. Since no way to make it properly playable was ever found for the original game, we 2-GPU players feel kind of doomed to replace our setups in order to play, however powerful the setup we already own, because the performance will be bad anyway.
So, as someone who really likes the new lazer features, I really want to see it playable on my setup. Even if getting more than ~500 fps with 2-GPU output is impossible, can it at least be optimized for Intel GPUs? An option to change the API would also be nice, at least to try, since it made the original game a bit better.

@peppy
Sponsor Member

peppy commented Mar 7, 2020

250-500fps is already higher than our goal. glad to hear you can run it this well already!

You shouldn't really need more than this. Systems with optimus setups are usually 60hz, so around 240fps is the highest you'd want to push things.

@TCM-dev

TCM-dev commented Mar 7, 2020

I don't think I'm the only one playing on a second monitor (144 Hz) with a laptop, because I need the portability until I can get two PCs.
If it helps, I'll add that playing on the second monitor lets me reach 2-3k fps in normal osu!, while I can only get 300 fps at best on the laptop screen.
However, while you think 240 fps is enough for 60 Hz screens, I disagree. In my experience over 4 years of playing, around 700-800 fps is the bare minimum for the game not to feel like it has input lag. Yes, technically 240 fps has 4.x ms of input lag, which is only about 3 ms more than 1000 fps, but I'm pretty sure having so few fps causes input lag elsewhere too, because it definitely feels different.
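
The frame-time figures quoted above can be checked with a little arithmetic. This is only a minimal sketch: the function name `frame_time_ms` is made up for illustration, and it converts a frame rate into the time between frames, which is a rough bound on how stale a frame can be rather than a full input-latency model.

```python
def frame_time_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    for fps in (240, 1000):
        print(f"{fps} fps -> {frame_time_ms(fps):.2f} ms per frame")
    # Gap between the two frame rates mentioned in the comment above.
    gap = frame_time_ms(240) - frame_time_ms(1000)
    print(f"difference: {gap:.2f} ms")
```

At 240 fps the frame time works out to roughly 4.17 ms and at 1000 fps to 1 ms, so the "about 3 ms" difference in the comment holds up.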

@bdach
Collaborator

bdach commented Mar 7, 2020

lazer uses a different threading model than stable (which as far as I know was single-threaded). In lazer the input thread is separate from the draw & update threads and runs at a higher rate (should be ideally about 1000fps). You can check that by enabling the fps counter.

In particular, this means that a framerate higher than 240fps should have zero impact on input. Sure, drops in the draw & update rates can sometimes be noticeable, but above 60fps they really shouldn't be. The screen can only output so many frames.
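
To illustrate why a separate input thread decouples input timing from the draw rate, here is a toy model. It is not how osu-framework actually schedules its threads: it assumes each loop ticks at perfectly regular intervals starting at t = 0, and the function name `pickup_delay_ms` and the example numbers are hypothetical.

```python
import math


def pickup_delay_ms(event_ms: float, poll_hz: float) -> float:
    """Delay between an input event and the next tick of a loop polling at poll_hz."""
    period_ms = 1000.0 / poll_hz
    # The event is picked up at the first tick at or after it happened.
    next_tick_ms = math.ceil(event_ms / period_ms) * period_ms
    return next_tick_ms - event_ms


# Hypothetical scenario: a key press 0.5 ms after both loops last ticked.
event_ms = 0.5
dedicated_input = pickup_delay_ms(event_ms, poll_hz=1000)  # separate input thread
tied_to_draw = pickup_delay_ms(event_ms, poll_hz=200)      # input read once per drawn frame
print(f"input thread at 1000 Hz: {dedicated_input:.1f} ms")
print(f"input tied to 200 fps draw: {tied_to_draw:.1f} ms")
```

In this model the pickup delay depends only on the polling rate of the loop that reads input, so lowering the draw rate does not change when the input is timestamped.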

@TCM-dev

TCM-dev commented Mar 7, 2020

Oh ok that's sweet I didn't know.
However, having to play on the laptop monitor would also cause a lot more low-fps spikes and freezes than if you were able to play fully on the GPU (like 2 freezes/min), but I'll let the OP continue on the subject if he experiences such things too. (Maybe that's already solved in lazer too; I should actually try it.)

Edit: tried it, can't experience any issues (didn't actually try to play but the input counter was always above 1000 while the fps were around 100-200 during gameplay, didn't experience freezes either)

@peppy changed the title from "Making lazer playable for 2x GPU setups" to "Investigate performance on dual GPU (optimus) PCs" on Mar 8, 2020
@Elepover

Also experiencing the same problem. After some investigation, here's what came up for me:

In short

Might be a hardware problem, there's nothing osu! can do to fix it.

To make your experience better, try:

  • Running osu! on the iGPU (yes, the iGPU) on the internal display
  • Connecting an external display and running osu! on it*

*This might not work for all Optimus-enabled laptops, see the explanations below.

Deeper Insights

This post on Tom's Guide explains things pretty well.

As it turns out, osu!lazer (like any other application that isn't specifically configured) runs on the iGPU by default:

[Image: context menu]

Sadly, even if you choose to run it on the NVIDIA dGPU, that only switches rendering to the dGPU. Whatever you choose, frames on the integrated display are always output by the iGPU.

Some special Optimus-enabled laptops are configured to have their HDMI ports directly hooked to the dGPU (which is my case), so you'll get a MUCH better experience on external displays. As shown below:

[Image: iGPU+dGPU]

^ when dGPU renders and iGPU outputs (internal display)

[Image: dGPU]

^ when dGPU does all (external display)

In my case, the rendering pipeline is like:

[Image: pipeline]

osu! is a rhythm game and is sensitive to all kinds of latency. Using the laptop's internal display means each frame has to go through an extra framebuffer before being displayed, which obviously increases latency. That explains why the overall experience is better on an external display.

This is a hardware problem that applies to basically any game, and osu! is unlikely to be able to fix it, apart from telling the driver to select the dGPU by default to render osu! frames (mentioned in #5582). That works for many performance-demanding games, like some AAA titles, since the bottleneck is no longer the iGPU, but in osu!'s case it may not bring any significant improvement.

@peppy
Sponsor Member

peppy commented May 14, 2020

Sounds about right. osu! is one of the few games that people actually care about running above their refresh rate (should be less of an issue with lazer, though).

@mirh

mirh commented May 14, 2020

Technically there isn't really an extra framebuffer.
It's just that the copy engine probably has a maximum bandwidth.

EDIT: well, screw me
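
For a sense of scale, the amount of data a per-frame copy between GPUs would move can be estimated with simple arithmetic. This is a back-of-the-envelope sketch only: the 1920x1080 resolution and 4 bytes per pixel are assumptions, the function names are made up, and the thread gives no actual bandwidth figure for any copy engine or PCIe link.

```python
def copy_rate_gb_per_s(width: int, height: int, fps: float, bytes_per_pixel: int = 4) -> float:
    """Data rate needed to copy every rendered frame once, in GB/s (10^9 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e9


def max_fps_for_budget(width: int, height: int, budget_gb_per_s: float, bytes_per_pixel: int = 4) -> float:
    """Highest frame rate a given copy budget could sustain, ignoring all other overhead."""
    return budget_gb_per_s * 1e9 / (width * height * bytes_per_pixel)


if __name__ == "__main__":
    # Assumed 1080p, 32-bit colour; frame rates taken from figures mentioned in the thread.
    for fps in (250, 1000):
        print(f"{fps} fps -> {copy_rate_gb_per_s(1920, 1080, fps):.2f} GB/s")
```

Under these assumptions, 250 fps corresponds to about 2.07 GB/s of copy traffic and 1000 fps to about 8.29 GB/s. Whether any such limit is the real cause of the cap is not established here.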

@lybxlpsv

lybxlpsv commented Nov 19, 2020

In my testing, OpenGL games have more frame latency than DirectX games on NV Optimus laptops.
In another rhythm game I've modded, rendering the OpenGL game into a framebuffer and drawing that through DirectX with WGL_NV_DX_interop fixed it completely. Windowed mode is faster than plain OpenGL, and in fullscreen the latency is near zero while still using the OpenGL renderer.
I think this is also a better approach than ANGLE, since there is little to no translation overhead.

@peppy
Sponsor Member

peppy commented Nov 19, 2020

We are definitely going to add ANGLE support back, but the method you mention sounds a bit weird. I'd hope we could actually find the reason OpenGL is slower and resolve it directly (assuming there are examples of games which achieve this).

@lybxlpsv

lybxlpsv commented Nov 19, 2020

The method is fairly simple; the Khronos website has examples of how to blit an OpenGL framebuffer into DirectX directly using WGL_NV_DX_interop (DX9) or WGL_NV_DX_interop2 (DX11).
And the issue is not about running far above the refresh rate: no matter what your FPS is, you will always have display latency with OpenGL on Windows.

We just want frames to go directly from the game to the display without any kind of buffering; we don't really care about tearing. So far, in my testing, the only way to get near-zero latency is DirectX exclusive fullscreen, without vsync and with display optimizations disabled, which is very useful for mouse-movement-based games like osu!.
Vsync or any kind of buffering will always add latency due to synchronization issues, as discussed here by an NVIDIA employee.

As much as I want the OpenGL issue to be resolved, I doubt it is fixable unless all the related vendors step up to fix the entire display synchronization issue, or Microsoft starts doing direct blits to the display for OpenGL, which is fairly unlikely since no opengl game has ever done exclusive fullscreen since Windows 8 I believe.

Also, funnily enough, just blitting the OpenGL framebuffer into DirectX already gives a quite decent FPS improvement, which doesn't make sense either. That is from my own testing and from a few others who use my DLL hook for a specific rhythm game.

@mirh

mirh commented Nov 19, 2020

> which is fairly unlikely since no opengl game has ever done exclusive fullscreen since Windows 8 I believe.

You can do exclusive fullscreen thanks to driver hacks; it just takes a very specific window style and dimensions.

@lybxlpsv

lybxlpsv commented Nov 19, 2020

> > which is fairly unlikely since no opengl game has ever done exclusive fullscreen since Windows 8 I believe.
>
> You can do exclusive fullscreen thanks to driver hacks; it just takes a very specific window style and dimensions.

Most of the time it looks like "exclusive fullscreen" but actually isn't, and it still exhibits the display latency.
It would be interesting if you shared your findings, e.g. how you style the hwnd so that NV Optimus laptops on Windows 10 actually go into direct exclusive fullscreen. I've never been successful at this.
Also, I quickly hacked up SDL2 for funsies. It uses DX11 with a few caveats: no alt-tab, no vsync, etc. (read usage.txt), because I'm too lazy to handle D3D device loss and such. Might be useful until ANGLE support drops.
Edit: but then I quickly noticed the mouse drift with raw input on SDL2, oh well...

@mirh

mirh commented Nov 19, 2020

WS_POPUP (and/or WS_EX_TOPMOST), plus your window and framebuffer dimensions should match the monitor's.
At least in a single GPU scenario, that should do it for you and opengl.
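
The conditions listed in this comment can be written down as a simple check. This is a hedged sketch: `WS_POPUP` and `WS_EX_TOPMOST` are the real Win32 constant values, but the function `meets_fullscreen_hints` and its parameters are hypothetical, and the code only tests the stated preconditions. It cannot tell whether a driver actually promotes the window to exclusive fullscreen.

```python
WS_POPUP = 0x80000000        # Win32 window style
WS_EX_TOPMOST = 0x00000008   # Win32 extended window style


def meets_fullscreen_hints(style, ex_style, window_rect, framebuffer_size, monitor_rect):
    """Check the conditions from the comment above.

    Rects are (left, top, right, bottom) tuples in screen coordinates;
    framebuffer_size is (width, height) in pixels.
    """
    left, top, right, bottom = monitor_rect
    monitor_size = (right - left, bottom - top)
    has_style = bool(style & WS_POPUP) or bool(ex_style & WS_EX_TOPMOST)
    return has_style and window_rect == monitor_rect and framebuffer_size == monitor_size


if __name__ == "__main__":
    monitor = (0, 0, 1920, 1080)  # assumed primary monitor
    print(meets_fullscreen_hints(WS_POPUP, 0, monitor, (1920, 1080), monitor))
```

A borderless popup window that covers the monitor exactly passes the check; a regular overlapped window or a mismatched framebuffer size does not.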

@lybxlpsv

> WS_POPUP (and/or WS_EX_TOPMOST), plus your window and framebuffer dimensions should match the monitor's.
> At least in a single GPU scenario, that should do it for you and opengl.

That's what 99% of libraries already do, and it has never worked on NV Optimus.
Even dragging an imgui window has latency.

@mirh

mirh commented Nov 19, 2020

People here and here seemed to report otherwise.
EDIT: duh, though we don't know if (even if they have a laptop) they were actually using optimus, or rather just the "docked" mode that more expensive systems have

@lybxlpsv

lybxlpsv commented Nov 19, 2020

> People here and here seemed to report otherwise.

This comes from my experience trying to minimize latency to near zero while modifying other OpenGL rhythm games. The only test I trust for true, "direct" exclusive fullscreen is this: with vsync off, move some kind of window or a non-hardware cursor. If it has near-zero lag, then I believe it. Not "oh, the screen goes black and I'm in the game, so it must be exclusive fullscreen", when it is "exclusive fullscreen" in name but still has DWM triple buffering enforced or some other kind of display latency.

Again, the point of this topic is reducing latency to the absolute minimum, along with related issues caused by NV Optimus.
If you just ignore those issues then yeah, exclusive fullscreen "works", but then many would rather play osu! classic with ANGLE.

@peppy
Sponsor Member

peppy commented Nov 19, 2020

I've removed a binary download link from this thread. Further discussion should focus on actual code changes and improvements to osu-framework to fix the issue in a way we can use going forward. (as mentioned, I do not foresee us doing what has recently been mentioned for lazer).

If you wish to share a change please link a branch on github, not files on your dropbox, thanks!

@lybxlpsv

It actually included the source, but SDL2 seems to have buggy raw input, which sadly makes it unplayable, so I didn't bother setting up a repo and cleaning up the code.

It does seem that ANGLE is likely the cleanest way to go.

A DX9/DX11 overlay seems hard to implement cleanly, since it needs modifications and more DX-related dependencies in OsuTK, plus an iOS-like DefaultFramebuffer setting in Osu.Framework. There is also more DX-related work when you resize the window, etc. I don't know if it is sane to have a whole other platform just for an NV Optimus DX11 overlay.

@peppy
Sponsor Member

peppy commented Nov 20, 2020

We are already using SDL2 and have a raw input implementation. What are you referring to?

@peppy
Sponsor Member

peppy commented Nov 20, 2020

Please drop the subject of direct x support unless you are talking about angle or I will lock this thread.

@lybxlpsv

> Please drop the subject of direct x support unless you are talking about angle or I will lock this thread.

OK, I deleted the comment since it likely has nothing to do with NV Optimus performance.

@mirh

mirh commented Dec 31, 2022

Sup gentlemen.
I'd really like to point you in the direction of #6075 (comment)
@lybxlpsv was truly onto something and I had my brain disconnected

@peppy
Sponsor Member

peppy commented Jul 26, 2023

We've stopped hearing complaints on optimus GPUs, so I'm guessing the new direct x implementation is working well.

Closing as should-be-resolved.

Please feel free to reply in this thread if you can still reproduce an issue.

@peppy closed this as completed on Jul 26, 2023