Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC - support KHR_parallel_shader_compile #16662

Closed
wants to merge 32 commits into from

Conversation

aardgoose
Copy link
Contributor

Adds a mechanism to use parallel shader compilation (available for testing in Chrome canary).

To use set material.parallelCompile = true (false by default)

Changes to WebGLRenderer allow initMaterial() to bail out pending completion of shader compilation/linking, objects using materials that aren't ready are skipped during rendering passes.

@takahirox
Copy link
Collaborator

takahirox commented Jun 3, 2019

Did you confirm any gl.get*(), gl.useProgram(), and other gl calls don't block CPU with this change? And if so which platform did you test, Windows, Mac, or Linux?

Here on my Windows10 Chrome Canary, I needed further tuning involving complex change in renderer, because of ANGLE(?) problem, to accomplish non-blocking GL calls. That's the reason I didn't make PR yet.

Refer to https://www.khronos.org/webgl/public-mailing-list/public_webgl/1905/msg00017.php for the detail.

If you use Mac, you may not see this problem. If working fine on Mac, good to know.

@aardgoose
Copy link
Contributor Author

@takahirox

Tested on Chrome Canary on Windows 10 and Firefox which I have hacked to support parallel compile.

It avoids calling any of the potentially blocking calls useProgram() etc until the COMPLETION_STATUS_KHR result is true (the program.ready boolean check) and which is why it bails out half way through initMaterial().

I've hacked a basic test from an existing demo which uses this and shows the expected change in frame rate in the devtools performance.

https://github.com/aardgoose/three.js/tree/parallel3

@takahirox
Copy link
Collaborator

takahirox commented Jun 3, 2019

The performance problem I saw is a program link+compilation even blocks another program's gl.get*() and gl.useProgram().

Discussion in WebGL mailing list
gl.useProgram(program) test
gl.getProgramParameter(program, gl.ACTIVE_ATTRIBUTES) test

As you see, somehow gl.useProgram() issue seems to have been resolved but gl.getProgramParameter() seems to still have the problem. So I needed to check no program is compiled+linked to call gl.get*().

I've hacked a basic test from an existing demo which uses this and shows the expected change in frame rate in the devtools performance.

Interesting... Could you make live demo for me?

@aardgoose
Copy link
Contributor Author

I see what you mean with your test cases and get the same results with Chrome. It's not an issue I've observed, but I think that is because the compile/links are fitting into the interframe interval so it doesn't show up.

I think it is useful getting the code visible, and I would imagine this restriction will be removed, so I am not sure it is worth serialising material compilations yet.

Interestingly in my Firefox build I get 1ms times in both your test cases, but obviously that is not generally available, but is using ANGLE as a backend so it doesn't look an ANGLE problem. I wonder if the number of threads available allocated is an issue here.

@takahirox
Copy link
Collaborator

Before going forward, let me confirm one thing.

I see what you mean with your test cases and get the same results with Chrome.

I've hacked a basic test from an existing demo which uses this and shows the expected change in frame rate in the devtools performance.

On Canary you can reproduce the performance issue I reported but also you see the performance improvement with this change? If so, that sounds interesting.

I thought Canary, which is only the browser supporting KHR_parallel_shader_compile extension tho it's behind pref, requires complex tuning to get non-blocking benefit. But yes, we may not want to do such special tuning. So I thought of waiting for browser-side performance problem is resolved or another browser starts to support without any performance problem.

And that's the reason I didn't make PR yet. Instead I thought of sending feedback to chrome and working together to fix the problem (of course working example like this PR would be helpful). But if you see the performance gain now, I may need to change my mind.

@aardgoose
Copy link
Contributor Author

I am seeing some benefits on Chrome and have added a mechanism to prevent overlapping compile/link operations without further changes to WebGLRenderer to address the GetProgramParameter issue/limitation. I have also added instrumentation to track how many frames the compile/link takes.

Chrome caches shader compilations so the time to compile decreases after the first run.

Re Firefox, I have patched it to support KHR_parallel_shader_compile (and submitted the patches for consideration, after making sure there were no regressions on the Khronos webgL tests), this does not exhibit the the issues you identified with Chrome canary.

Have you raised a bug for Chrome, I couldn't see one in the Chromium bug tracker?

@takahirox
Copy link
Collaborator

Sorry I'm a bit busy. I'll comment later, hopefully this weekend.

@mrdoob mrdoob added this to the rXX milestone Jun 18, 2019
@@ -70,6 +70,7 @@ function Material() {
this.premultipliedAlpha = false;

this.visible = true;
this.parallelCompile = false;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this boolean needed?
What are the downsides of using KHR_parallel_shader_compile by default?

Copy link
Collaborator

@takahirox takahirox Sep 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A material can be ready a few animation frames away since the material is added to the scene. That means The mesh can't show up in the first few animation frames. So if it's true as default

  1. It breaks the compatibility.
  2. It can be problematic for applications which don't use animation frame, for example calling renderer.render() only when something in the application is changed.

Copy link
Owner

@mrdoob mrdoob Sep 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be per material? Or can it be per renderer?

Copy link
Collaborator

@takahirox takahirox Sep 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think moving the flag to renderer would sound simpler.

Copy link
Collaborator

@takahirox takahirox Sep 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the downsides of using KHR_parallel_shader_compile by default?

In addition to what I mentioned above, if it hasn't been updated, there is a chance that the extension may have performance issue on Chrome due to browser side problem now. (Sorry I've been delaying to evaluate and report! I'm happy if anyone evaluates.) Even if it still has performance problem, we may go forward. Chrome may resolve sooner or later.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, may be easier to start with the renderer first, and if people need more granularity we can consider moving to deeper objects.

Copy link
Contributor Author

@aardgoose aardgoose Sep 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your preferred API for a renderer swicth?

A new property of the parameters object or a simple boolean parameter of the renderer itself?

I think the later is simpler, since the renderer is passed as a parameter to WebGLProgram()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have found out an issue with a renderer wide switch.

Some passes like the PMREMGenerator carry out single pass operations which can and do fail with parallel compilation enabled when the compilation does not complete immediately, I imagine there are other cases out there.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could set a global default on Material.DefaultParallelCompile and use that to init each material.

So a user could opt in globally for simplicity but the default would be safe for all the odd corner cases out there.

@takahirox
Copy link
Collaborator

takahirox commented Sep 16, 2019

Regarding the performance problem I mentioned. Sorry I'm delaying the report. I've been a bit too busy and may be still busy in some weeks.

As you may know, I made the prototype in #16321 but realized still some gl calls unexpectedly block CPU. I asked Chrome team and it seems like there are some performance problems.

I was trying further fine tuning by adding work around to avoid the problems in that thread to provide the actual performance benefit to users, I accomplished very small blocking compilation + linking. #16321 (comment)

https://twitter.com/superhoge/status/1128438246638333953

But I realized the code became a bit messy. I don't think we should have such a complex work around and I think this performance problem should be fixed in the browser side.

So, I haven't looked into the detail implementation in this PR yet but the basic idea looks same as what I did. If we decide the flag API and unless we see any other problems, I think we can go ahead. We can keep working with Chrome team later to fix the problem (if needed. There is a chance that the problem has been already fixed.)

Anyways, I'll evaluate and report the performance here later when I'll have time. Sorry for blocking KHR_parallel_shader_compile to Three.js for a while!

@aardgoose
Copy link
Contributor Author

I worked around the chrome issue you described by serialising shader compile/links and only allowing 1 program to be compiled at a time. This avoids the blocking issues for now and could easily be removed if required, the changes are confined to WebGLProgram.

This is controlled with the maxParallel variable at the head of WebGLProgram.js. This may actually be useful to limit parallelism, although I think Chrome limited it to 1 background compile at a time.

There is a debugging console.log() which displays how many render frames a compile/link takes.

@JordyvanDortmont
Copy link
Contributor

Would be great if this was merged! In our case it takes about 3 seconds for shaders to compile (Chrome, Windows).

Can I try working on this PR or is there something that blocks this from being merged?

@takahirox
Copy link
Collaborator

It seems the performance issue on Chrome I mentioned seems being resolved https://twitter.com/gfxprogrammer/status/1301246609129222144

@HexaField
Copy link
Contributor

Are there any plans to ressurect this? It should still give a significant performance boost.

@aardgoose
Copy link
Contributor Author

Hi,

There hasn't been noticeable interest in this, the main issue is what behaviour is expected when a model/scene is using multiple materials with shaders that take various amounts of time to compile. As a result the display of the scene will be spread over a number of frames rather than as a single atomic event.

@Mugen87 Mugen87 removed this from the r??? milestone Oct 14, 2023
@Mugen87
Copy link
Collaborator

Mugen87 commented Oct 14, 2023

Closing in favor of #19752.

@Mugen87 Mugen87 closed this Oct 14, 2023
@aardgoose aardgoose deleted the parallel2 branch October 19, 2023 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants