shader compilation very slow in Compatibility renderer #86731

Open
djrain opened this issue Jan 3, 2024 · 8 comments

Comments

@djrain

djrain commented Jan 3, 2024

Tested versions

  • Reproducible in: 4.2.1 stable
  • NOT reproducible in 3.5.2 stable

System information

macOS Ventura, Android 13 (Compatibility renderer)

Issue description

Showing/loading CanvasItem shaders for the first time is much slower when using Compatibility renderer compared to Forward+ or Mobile. It occurs on macOS (2020 Mac Mini M1) and Android (Google Pixel 4 XL), but not on my iPhone 14 Pro Max (compatibility actually seems to be faster there). Unable to test Windows.

Here it is in Mobile mode on Mac, with a short stutter:

mobile.mov

And in Compatibility, with a much longer stutter:

compat.mov

Steps to reproduce

Run the MRP on Mac or Android. After 1 second, the shaders load and display.

Minimal reproduction project (MRP)

ShaderCompilationIssue.zip

@clayjohn
Member

clayjohn commented Jan 3, 2024

I tested the MRP on my system (PopOS 22.04, Intel integrated GPU) and can reproduce the difference in performance between Mobile and Compatibility. On the Mobile backend the compilation happens instantly; in Compatibility there is a brief stutter (but not a second long). We definitely need to investigate further.

@djrain
Author

djrain commented Jan 6, 2024

Not sure if it's closely related, but we were just testing our game on iPhone, and in Compatibility mode the performance was just worse in general (around 70 FPS); switching to Mobile brought it back up to 120 (could this be from 2D lights?). Also, having any GPUParticles2D just crashes in Compatibility, which I know is expected on Mac, but I don't recall this being a problem on iOS before.

@djrain
Author

djrain commented Jan 22, 2024

@clayjohn no rush, but any leads on this? We're launching our mobile game very soon, and we'd really prefer to use Compatibility, but the load times currently are not workable...

@clayjohn
Member

Sorry, no updates from my end. This is a challenging problem as it stems from the difference in drivers on the different platforms. macOS has very good Metal drivers and bad OpenGL drivers. OpenGL shader compilation on macOS tends to be slow and there isn't much room to work around it. We have discussed some big changes to the shaders that might help, but it will be a significant amount of work and may not do much.

Android can be in a similar boat. Some devices have really good OpenGL support and others don't. So from a sample size of 1, it's tough to say whether the problem is with Godot or with drivers again. Unfortunately, even if the problem is Godot, the solution might not be the same for every device. Making a change to speed up compiling on one device may slow down compiling on another device, so the process of fixing this has to be, by nature, quite slow and careful.

In all cases, OpenGL limits shader compilation to the rendering thread, so compiling shaders necessarily stalls the renderer. With Vulkan/Metal, we compile the shaders on multiple threads and we do it before draw time.
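To make the threading difference concrete, here is a minimal Python model, not Godot code. `compile_shader` is a hypothetical stand-in for a driver compile call that only records which thread ran it. The first function compiles lazily on the calling (render) thread the first time a shader is drawn, as described for OpenGL; the second submits every compile to a worker pool before any draw happens, as described for Vulkan/Metal.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def compile_shader(source: str, log: list) -> str:
    """Hypothetical stand-in for a driver compile call; records the calling thread."""
    log.append(threading.current_thread().name)
    return f"binary<{source}>"


def draw_compiling_on_render_thread(draws: list, log: list) -> dict:
    """OpenGL-style model: compile lazily on the render thread at first draw."""
    cache = {}
    for src in draws:  # each item stands in for one draw call
        if src not in cache:
            cache[src] = compile_shader(src, log)  # render thread blocks here
    return cache


def draw_precompiled(draws: list, log: list, workers: int = 4) -> dict:
    """Vulkan/Metal-style model: compile on worker threads before any draw."""
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="compile") as pool:
        futures = {src: pool.submit(compile_shader, src, log) for src in set(draws)}
        cache = {src: fut.result() for src, fut in futures.items()}
    # Every shader is ready by draw time, so the draw loop never compiles.
    return cache


if __name__ == "__main__":
    render_log, worker_log = [], []
    draw_compiling_on_render_thread(["canvas", "sky", "canvas"], render_log)
    draw_precompiled(["canvas", "sky", "canvas"], worker_log)
    print(render_log)  # both compiles ran on the main (render) thread
    print(worker_log)  # both compiles ran on "compile_*" worker threads
```

In the first model every compile lands on the render thread, so each one shows up as a frame stall; in the second, the render thread only collects finished results.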

Finally, for CanvasItem shaders, I'm not aware of any obviously problematic sections for compile times. The scene shader has some lighting paths that are low-hanging fruit for bringing compile times down, but for canvas shaders there is nothing obvious, so again, it will take a lot of investigation to figure this one out.

@djrain
Author

djrain commented Jan 22, 2024

Well I hadn't tried until just now, but I'm not able to reproduce the issue in Godot 3.5.2 with GLES3, on this same Mac - there's just a very brief stutter, same as I get in Vulkan mobile. Wouldn't this suggest that it's Godot, not a driver issue?

Also, 2 entire seconds to compile the simplest possible shader on an M1 Mac (see my second screen recording) seems beyond slow, that's like straight up broken...

@clayjohn
Member

> Well I hadn't tried until just now, but I'm not able to reproduce the issue in Godot 3.5.2 with GLES3, on this same Mac - there's just a very brief stutter, same as I get in Vulkan mobile. Wouldn't this suggest that it's Godot, not a driver issue?

Yep. That's helpful to know.

> Also, 2 entire seconds to compile the simplest possible shader on an M1 Mac (see my second screen recording) seems beyond slow, that's like straight up broken...

M1 Macs don't support OpenGL natively. Apple dropped support for OpenGL a long time ago. We have put a lot of effort into working around the buggy OpenGL drivers on newer Macs, but it is a lot of work, and very few rendering contributors develop on M1/M2 Macs, so it's pretty hard for us to find workarounds for Apple's buggy drivers. That specific example is a driver bug that we are aware of and have flagged for Apple, but there is little chance they will fix it, so we need to find a workaround and ship a patched version of the shader ourselves.

It's good to know that 3.5.2 doesn't trigger that driver bug, as that gives us a lead on where to look for a workaround.

@djrain
Author

djrain commented Jan 23, 2024

Cool, really appreciate all your efforts, just let me know if I can help with anything.

@clayjohn
Member

clayjohn commented Jan 24, 2024

#87553 significantly improves the situation here, but I think there are more things to investigate for both cached and non-cached runs.

For cached runs:

  1. We save the entire cache every time we compile a new specialization. This is super slow and can result in paying the full cost of caching multiple times (consider multiple smaller caches, or just not caching specializations at all).
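A rough cost model of why rewriting the whole cache on every new specialization hurts, assuming each save re-serializes every entry and every compiled specialization is the same size. Both classes are hypothetical illustrations, not Godot's shader cache; `PerEntryCache` stands in for the "multiple smaller caches" idea. With n entries of s bytes each, the whole-file approach writes s·n(n+1)/2 bytes in total, while per-entry saving writes s·n.

```python
from dataclasses import dataclass, field


@dataclass
class WholeFileCache:
    """Hypothetical cache that rewrites every entry each time one is added."""
    entries: dict = field(default_factory=dict)
    bytes_written: int = 0

    def add(self, key: str, blob: bytes) -> None:
        self.entries[key] = blob
        # Saving the whole cache costs the size of everything cached so far.
        self.bytes_written += sum(len(b) for b in self.entries.values())


@dataclass
class PerEntryCache:
    """Hypothetical alternative: each specialization is saved on its own."""
    entries: dict = field(default_factory=dict)
    bytes_written: int = 0

    def add(self, key: str, blob: bytes) -> None:
        self.entries[key] = blob
        # Only the new entry is written.
        self.bytes_written += len(blob)


whole, per_entry = WholeFileCache(), PerEntryCache()
for i in range(10):
    blob = bytes(1000)  # pretend each compiled specialization is 1 KB
    whole.add(f"spec{i}", blob)
    per_entry.add(f"spec{i}", blob)
print(whole.bytes_written, per_entry.bytes_written)  # 55000 10000
```

The gap grows quadratically with the number of specializations compiled in a session, which matches the concern about paying the full caching cost multiple times.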

For non-cached runs:

  1. We have 5 modes by default; we probably don't need so many and should move more of them to specializations (instancing and ninepatch for sure). This would cut initial compile time by 40%.
  2. Investigate another method for packing light data. Right now the light loop is likely being unrolled, which makes compiling really slow (and probably doesn't help performance for us).
  3. Reduce the maximum number of lights (consider a very small limit by default). This likely helps on web more than anywhere else.
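The 40% figure follows from simple arithmetic if each mode costs roughly the same to compile (an assumption; real per-mode costs will differ): compiling 3 of the 5 modes at load time instead of all 5 removes 2/5 of the up-front work. The variants moved to specializations would then compile on first use, so their cost is deferred rather than eliminated. `upfront_compile_cost` is a hypothetical helper for illustration.

```python
def upfront_compile_cost(num_modes: int, cost_per_mode: float = 1.0) -> float:
    """Load-time compile cost if every mode is compiled up front at equal cost."""
    return num_modes * cost_per_mode


before = upfront_compile_cost(5)
# Moving instancing and ninepatch to specializations leaves 3 modes at load time.
after = upfront_compile_cost(5 - 2)
reduction = 1 - after / before
print(f"{reduction:.0%}")  # 40%
```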

Updated MRP with a simple mechanism for measuring the stall:
ShaderCompilationIssueWithMeasurements.zip
ShaderCompilationIssueWithSky.zip
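The attached projects are not reproduced here. As a generic sketch of one way to quantify a stall (not necessarily what the MRP does), the function below takes per-frame timestamps in milliseconds and reports which frame interval was longest and how far it exceeds the median interval. `worst_stall` is a hypothetical helper name.

```python
from statistics import median
from typing import Sequence, Tuple


def worst_stall(timestamps_ms: Sequence[float]) -> Tuple[int, float]:
    """Return (interval index, ms above the median interval) for the longest frame.

    Needs at least two timestamps; interval i spans timestamps i and i + 1.
    """
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two frame timestamps")
    deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    typical = median(deltas)
    worst = max(range(len(deltas)), key=lambda i: deltas[i])
    return worst, deltas[worst] - typical


# Four normal 16 ms frames plus one 1000 ms hitch while a shader compiles.
print(worst_stall([0, 16, 32, 48, 1048, 1064]))  # (3, 984)
```

Comparing against the median rather than the mean keeps a single long hitch from inflating the baseline it is measured against.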
