Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement KHR_parallel_shader_compile support #16321

Closed
fernandojsg opened this issue Apr 23, 2019 · 23 comments · Fixed by #19752
Closed

Implement KHR_parallel_shader_compile support #16321

fernandojsg opened this issue Apr 23, 2019 · 23 comments · Fixed by #19752

Comments

@fernandojsg
Copy link
Collaborator

fernandojsg commented Apr 23, 2019

This new KHR_parallel_shader_compile extension will bring shader compilation off to the main thread and even if the total compiling time could remain the same it won't be blocking the main thread which is a huge benefit.

It will require some modifications on the way three.js is handling the compilation and linking of shaders and move it from sync to async. Some ideas have been proposed on the Khronos mailing list and the Khronos repo PR

Just opening the issue so we could discuss ideas on how to address this change.

@mrdoob mrdoob added this to the rXX milestone Apr 24, 2019
@fernandojsg
Copy link
Collaborator Author

fernandojsg commented Apr 24, 2019

Some initial thoughts from my understanding after reading the thread and the spec:

Currently without parallel compiling, the pipeline looks more or less like this:
image
using the extension, even on the worst case with just one thread, it will looks the same and it will take as much time as without the extension, but it won't block the main thread, so we will still have a huge benefit using it.

With the latest debug (false by default) to the status check on the shader compilation we avoid unnecessary blockings WebGLShader.js#L23-L40, but with the extension enabled we will still stall on linkProgram WebGLProgram.js#L586-L605
In this case, without modifying anything and enabling the extension we could get benefits of parallel compiling the fs and vs shaders, while linkProgram will wait for them to complete, looking similar to this (assuming 2 threads):
image

In order to get the best of it we should group the compilation and linking, as @jdashg proposed on KhronosGroup/WebGL#2855 (comment):

for (const x of shaders)
   gl.compileShader(x);
for (const x of programs)
   gl.linkProgram(x);

Using this way we could get a timeline similar to:
image

Grouping the shaders compilation together could be easy enough to achieve as we could just compile them as they go, while grouping the linkProgram will need substantial modifications on the code, to queue them and call them after all the compile shader calls have been executed.
It could be done using also the approach proposed by Jeff:

function* linkingProgress(programs) {
   const ext = gl.getExtension('KHR_parallel_shader_compile');
   let todo = programs.slice();
   while (todo.length) {
      if (ext) {
         todo = todo.filter(x => !gl.getProgramParameter(x, ext.COMPLETION_STATUS_KHR));
      } else {
         const x = todo.pop();
         gl.getProgramParameter(x, gl.LINK_STATUS);
      }
      yield 1.0 - (todo.length / programs.length);
   }
}

Introducing this asynchrony we need to define a isReady() function to determine if the compilation and linking has been finished or not, similar to how babylon is doing currently.

We should use that isReady() function on the WebGLRenderer's AnimationLoop so we could avoid executing it until isReady() returns true.
Although it could be interesting to add that check on WebGLRenderer::render().
At the same time it should be handy to provide a callback so users could be notified when this happens in case they are not using setAnimationLoop and rely on a requestAnimationFrame/animate() loop instead.

@aardgoose
Copy link
Contributor

I've prototyped splitting the compilation phase and link phase of programs to see if you can get much benefit from using the fact the Chrome compiles shaders asynchronously in a separate GPU thread rather than the main JS thread. not very useful, but the same mechanism could be used with parallel compile.

There are less mods than you might expect, primarily initMaterial() needs splitting into two halves since the first half setups up the environment for creating a new program if required and the second half which relies on interrogating the program state so in effect serialises with the compilation and linking.

WebGLProgram again needs two parts and an isReady status that would check compilation completion. This avoids the need for global lists of shaders and programs in various states and maintains the information in WebGLProgram() objects where is belongs.

One interesting question then with a parallel compilation mechanism:

You don't have an atomic change of scene after adding several new materials. Some meshes or parts of meshes with multiple materials, would display initially in different frames when intended to display at the same time which could be ugly especially for very slow compiles. How do you handle this? Preserving the atomic nature could be difficult.

@fernandojsg
Copy link
Collaborator Author

@aardgoose cool! do you have any PR open already for that? It could be interesting to see that approach.

Regarding the atomic change, I agree that things always get weird with async in place. We will need to add some validation before using a material that is not fully ready yet, and probably we could include some helpers for the case when you want to load multiple materials like an explosion effect and you want to render it just when all of them are ready.
Maybe an attribute to the object that you are going to render, or a three-state value on ready: ready, not ready, ready but waiting for a friend program to finish :).

@takahirox
Copy link
Collaborator

Personally started trial implementation to see the behavior. If anyone is interested in

Branch
https://github.com/takahirox/three.js/tree/ParallelShaderExtension

Diffs
https://github.com/mrdoob/three.js/compare/dev...takahirox:ParallelShaderExtension?expand=1

I need to clean up, optimize, and still take care some stuffs but it seems working on Canary.

@takahirox
Copy link
Collaborator

FYI, Chrome Canary which is the only browser currently supporting KHR_parallel_shader_compile extension might have the extension performance issue now. gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR) is slow because it seems to wait for compile/link completion although it shouldn't wait for.

https://www.khronos.org/webgl/public-mailing-list/public_webgl/1904/msg00042.php
https://bugs.chromium.org/p/chromium/issues/detail?id=957001

@takahirox
Copy link
Collaborator

FYI, COMPLETION_STATUS_KHR performance issue seems to have been resolved on Canary https://www.khronos.org/webgl/public-mailing-list/public_webgl/1905/msg00007.php

@takahirox
Copy link
Collaborator

Still WIP but I want to share so far that I locally confirmed KHR_parallel_shader_compile with some optimizations improves the frame rate dropping on application start up.

https://twitter.com/superhoge/status/1128438246638333953

@aardgoose
Copy link
Contributor

FWIW I have a prototype patch for Firefox adding support for KHR_parallel_shader_compile which appears to work quite well, it certainly reduces jank although not quite as smooth as Chrome canary., with the basic Khronos demo/test:

https://www.khronos.org/registry/webgl/sdk/tests/performance/parallel_shader_compile/

The main overhead now on the main thread with FF appears to be the shader translation/validation before submitting to the GPU driver.

Performance trace alternating serial and parallel passes of the above test:

Annotation 2019-05-19 190807

@PWhiddy
Copy link
Contributor

PWhiddy commented May 21, 2019

Very excited to see progress on this! Looking forward to this feature

@aardgoose
Copy link
Contributor

An alternative take.

The Khronos WebGL group recommend not checking shader compilation status, but just doing:

  • compile shaders
  • link program
  • check link error and only check shader errors if link fails

for all cases, not just using KHR_parallel_shader_compile.

The link operation serialises automatically on the compiles in the background. So at least with Chrome you get the benefits of any built in parallel/asynchronous operation that already exists. Firefox actually checks retrieves the completion status and logs as part of the compile and link operations and caches for later use at the moment, so no real benefit there.

So I have removed error checking and display from WebGLShaders.js and reworked it in WebGLProgram.js, this could be submitted upstream without changing functionality.

https://github.com/aardgoose/three.js/tree/parallel1

Building on this to enable KHR_parallel_shader_compile, but now only needings two states rather than three.

https://github.com/aardgoose/three.js/tree/parallel2

Still some minor issues, probably something that should be enabled on a renderer or material basis as required rather than by default. Tested with Chrome canary and my hacked version of Firefox.

@takahirox

@drone1
Copy link

drone1 commented Sep 2, 2019

My site freezes on load for a second or two. I can delay compilation, but the freeze is unavoidable, and the shader in question isn't even that many lines of code (500?). Anyway would love to see this in THREE.js.

@toji
Copy link
Contributor

toji commented Jun 29, 2020

Been looking into this recently. Put up a pair of PRs (Primarily #19752, which depends on #19745) that offer one way of taking advantage of parallel shader compilation, though to get the most benefit out of it apps would need to make a fairly minor change to their loading code. If you've been following this issue feel free to let me know how/if that approach works for you!

@drone1
Copy link

drone1 commented Jun 29, 2020

Thank you so much for doing this work! I've been dying for this feature for ages.

@monfera
Copy link

monfera commented Jan 3, 2022

I'd like to understand a bit more about the current status of KHR_parallel_shader_compile across browsers:

Just like 2.5 years ago, Chrome and Firefox do not support this extension. Only Safari supports it (on OS X and iOS alike, which is nice, therefore in effect for all iPhone and iPad browsers).

Also, several folks elsewhere claimed that not much work is done by compileShader itself, pointing to linkProgram incurring all the work. Did anyone measure significant parallelism/speedup with the KHR_parallel_shader_compile extension on Safari, where the extension is available?

Is it still good advice to compile all shaders in bulk, and then link all programs in bulk, irrespective of the extension?

PS. With or without the extension, Safari seems to be pathologically slow, tested with simple/short es300 shaders. Each program, consisting of 1-2 dozen vertex shader lines and 1 dozen fragment shader lines took 9ms. On Chrome and Firefox, it took about 1/100th of that (10 shaders took about 1ms).

@toji
Copy link
Contributor

toji commented Jan 4, 2022

Just like 2.5 years ago, Chrome and Firefox do not support this extension. Only Safari supports it (on OS X and iOS alike, which is nice, therefore in effect for all iPhone and iPad browsers).

Chrome supports KHR_parallel_shader_compile on Windows, though I'm not seeing it on Mac. Presumably that's due to the extension being implementable on ANGLE, while a native OpenGL backend would have to support the extension directly and Apple's drivers likely do not. (Apple's implementation in Metal-based, IIRC, so they could implement the extension themselves.)

I don't see support in Firefox on any platform.

As for performance gains, I have some screenshots in #19752 (comment) that show the type of difference we're potentially talking about. It's not an absolute speedup in this case, but the ability to defer blocking calls until they can spend the minimal amount of time blocking. (Meaning less jank on the main JS thread.) Also, despite the method name the extension doesn't just cover compileShader but the entire program creation, including linking.

@gkjohnson
Copy link
Collaborator

For three-gpu-pathtracer this would be really nice. The path tracing shader is very complex and is taking upwards of 12 seconds to compile on my desktop machine (2070 super, i7 7700k), which seems to affect other tabs and other system responsiveness. There are some other things I think that can be done to improve that time such as removing shader code not needed for the materials being rendered but either way I think an async compile would be a significant improvement. Then it would be able to be more easily compiled alongside other tasks such as file loading, processing, and BVH generation.

It looks like this will work everywhere important, now, outside of FF:

https://developer.mozilla.org/en-US/docs/Web/API/KHR_parallel_shader_compile

@aardgoose
Copy link
Contributor

I have an old patch for FF to implement parallel shader compilation that passes the Khronos WebGL test suite. I expect it would need updating.

https://phabricator.services.mozilla.com/D83315

@aardgoose
Copy link
Contributor

@gkjohnson

What sort of API would you find useful?

@gkjohnson
Copy link
Collaborator

What sort of API would you find useful?

I think something like renderer.compileAsync( scene, camera ) similar to the compile function would be good. It might be nice to have some variant that just takes a material, too, if possible. My specific use case will involve compiling a material that's used with a FullScreenQuad so it won't necessarily be directly attached to a mesh or scene to be compiled but that's easy enough to work around.

@aardgoose
Copy link
Contributor

I'll knock something up along those lines.

@toji
Copy link
Contributor

toji commented Aug 23, 2022

Worth nothing that compileAsync is what I introduced in #19752 (comment), but it's been difficult to get traction for it or it's dependency for whatever reason.

I'm happy to rebase those and ensure they still work if there's a realistic chance of them being merged.

@gkjohnson
Copy link
Collaborator

@toji Yes I remember that PR now - I would like to see that merged. Unfortunately this touches a sensitive part of the project code that I'm unfamiliar with so I don't feel confident providing feedback.

@drone1
Copy link

drone1 commented Sep 10, 2022

Sorry to poke here, but any suggestions on next step(s) to get this blessed, majestic chunk of code merged? It would be a shame to waste what seems to some really high-value work. If this works well it may provide a much smoother user experience for many applications. Thank you.

@mrdoob mrdoob removed this from the r??? milestone Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

11 participants