Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebGLRenderer: Add compileAsync(). #19752

Merged
merged 7 commits into from
Oct 14, 2023
Merged

WebGLRenderer: Add compileAsync(). #19752

merged 7 commits into from
Oct 14, 2023

Conversation

toji
Copy link
Contributor

@toji toji commented Jun 29, 2020

TL;DR: Provide a function like renderer.compile() that notifies a callback when the given object can be added to the scene with minimal blocking on shader program compiling/linking.

This proposal builds on #19745, would fix #16321, and is an alternative to #16662.

The longest blocking operations that tend to happen when an object is first added to a scene are textures uploads and shader compiles. While texture uploads can be dramatically reduced by either using ImageBitmaps (see #19518) or compressed textures, shader compiles are already asynchronous and the best practice for reducing the time that they block is to give them time to finish prior to first use. If the KHR_parallel_shader_compile extension is available we can do even better by polling to discover when the compile is finished prior to it's first use.

#19745 takes a step in the right direction by deferring any potentially blocking shader calls until the first time the shader is actually used for rendering, but Three's current architecture is such that most shader programs are compiled/linked immediately before their first use, which forces the worst-case scenario and causes the first query from the program (for example, a getProgramInfoLog() or getUniformLocation() call) to block while the compile finishes. You can see an example here recorded from the GLTF Loader example. Notice that the getProgramInfoLog() call takes 49ms

no-precompile

This PR suggests adding a new method to the WebGLRenderer which functions similarly to the existing compile() method with a couple key differences: It supports passing objects that are not yet part of the scene, and takes a callback which will be called when the shaders for the passed objects have finished compiling and can be added to the scene with minimal blocking.

This would functionally look something like this:

loader.load( objectUrl, function ( loadedObject ) {
  // Notify the callback when loadedObject can be added to scene without blocking.
  // Not a fan of the name, though. :P
  renderer.compileTarget( scene, loadedObject, function() {
    scene.add( loadedObject  );
  } );
} );

Using this pattern and the code in this PR I get the same first load shown above looking like this:

with-precompile

Notice that the getProgramInfoLog() call that previously took 49ms has shrunk so much that you can't even see it without zooming in. It's now under 2ms. You just got back ~3 frames of stalled renderer! (Especially important for XR uses.)

with-precompile-zoom

(Also notice in the wider view of the timeline that the PMREM generator block, which is the first large column, is significantly faster too. That's actually a side effect of #19745, but ultimately it's following the same principle.)

To preemptively address a couple of questions that I feel will come out of this:

Why not update the existing compile() function to support this?

This is a very viable option! But in order to maintain backwards compatibility you're either grappling with some very non-intuitive argument ordering or forcing people using the call to attach the new object to the scene, call compile(), then immediately detach it again and wait for the callback to once again re-attach it. 😝 Otherwise they're doing very similar things.

Why doesn't this method take a camera like compile()?

Turns out that's not really necessary for what this method and compile() are doing, which is simply preloading the material shaders. The camera is being passed to the light setup, which IS very necessary for the shaders, but it's only being used to compute the view matrix which only really affects specular computation and doesn't have any bearing on the shader compilation. FWIW I think compile() should drop the camera argument too.

What about adding a parallelCompile property on the material, like in #16662?

This was my first inclination as well, but it suffers from several drawbacks enforced due to Three's architecture. For one, it breaks the model of objects being visible as soon as they're added to the scene. They just "show up" at some point and the developer has no idea when. That's fine for some uses, not for others. Also, crucially, unless great lengths are taken to prevent it any changes to the scene's lighting setup will cause another parallel compile, during which the objects will flicker out of existence for a few frames. This is a far more objectionable artifact. Finally, some apps (like the glTF loader example) only draw when the user is interacting with the scene (like rotating the camera). Without an active animation loop the parallelCompile approach doesn't have a way of indicating that the object is ready for drawing and thus you get stuck with an empty scene until you click and drag, which is clearly unwanted behavior.

By making this an explicit call, the developer has exact control over what objects they are willing to wait on and what should happen when they're ready. It doesn't "magically" make existing apps better, but does offer an optimization path for devs that care about it.

src/renderers/WebGLRenderer.js Outdated Show resolved Hide resolved
src/renderers/WebGLRenderer.js Outdated Show resolved Hide resolved

}

let foundTarget = scene === target;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some effort made here to allow cases where the target is already attached to the scene (or IS the scene), but I'm not sure how useful that is? Could simplify things a tiny bit if that case was deemed unnecessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I have removed the second scene parameter and exchanged it with camera so compile() and compileAsync() both have the same signature and perform layer testing for lights.

src/renderers/WebGLRenderer.js Outdated Show resolved Hide resolved

// if some materials are still not ready, wait a bit and check again

setTimeout( checkMaterialsReady, 10 );
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably ideally be a rAF, but I'm not sure what the best pattern is for syncing up with any existing render loop is and I wanted to keep this PR simple initially for the sake of discussion.


}

if ( extensions.get( 'KHR_parallel_shader_compile' ) !== null ) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check could easily go away and be replaced by an arbitrary call to checkMaterialsReady() every time. The downside to that is that systems which don't support KHR_parallel_shader_compile will immediately report that every program is ready and thus try to render with it right away, thus erasing some of the potential benefits. I figured that if someone was calling this method at all they would be OK with some delay, so we might as well give the compiler a frame's worth of breathing room before we notified the callback. Wouldn't eliminate stalls on those browsers, but it would still reduce them somewhat for developers using this pattern.

src/renderers/WebGLRenderer.js Outdated Show resolved Hide resolved
// program as ready immediately. It may cause a stall when it's first used.
let programReady = !parameters.rendererExtensionParallelShaderCompile;

this.isReady = function () {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method, however, is new.

@sciecode
Copy link
Contributor

Great PR @toji !

Unfortunate that we can't make use of parallel compiling for the regular workflow without major architectural changes, but at least having the option to pre-compiling is already very nice. I would love to have this and the related PR reviewed and merged sooner rather than later.

IMO we should also emphasize this option in as many examples as possible, the net advantage of using parallel compiling is huge!

@toji
Copy link
Contributor Author

toji commented Jul 26, 2020

Review ping? Looks like I need to rebase but I'd appreciate some feedback on the direction before putting too much more effort into it.

@takahirox
Copy link
Collaborator

The performance problem on Chrome I mentioned in #16662 seems being resolved.

https://twitter.com/gfxprogrammer/status/1301246609129222144

I think it's good time to go for KHR_parallel_shader_compile support.

@drone1
Copy link

drone1 commented Jan 9, 2021

@mrdoob Any chance we can get this PR reviewed/merged? If not, any feedback for @toji? It's been a long time and I semi-desperately need this feature for my app.

Thank you so much.

@toji
Copy link
Contributor Author

toji commented Dec 15, 2021

(@mrdoob: This is the patch we were talking about last week in the office.)

I've rebased this and made sure it works against the newest Three, and refactored the signature for the function a bit. Now it's called 'compileAsync', takes the object to compile first, the scene it will be added to as an optional second parameter, and returns a promise when the object can be added to the optional scene (or rendered as the scene, if no explicit scene was given) without waiting on compilation.

Example:

renderer.compileAsync(newObject, scene).then(() => {
  scene.add(newObject);
});

Plus I can happily confirm that testing after rebasing shows that it still offers significant benefits! And for whatever reason, it's even easier to see the difference in Chrome's dev tools now. Consider this profile of the webgl_loader_gltf_transmission example, before and after:

Screenshot 2021-12-15 152518

You can see that, in total, the time it takes to get to the first render of the object after load is almost exactly the same, with the biggest difference being that the largest blocking task gets broken up into a couple of smaller chunks with a nice gap in the middle that allows a few more frames to be drawn while the shaders compile. This is great for keeping the page interactive during loads! (The remaining big blocking calls are mostly texImage2D(), which is a different problem entirely and can be mitigated by the developer through use of compressed textures.)

@toji toji marked this pull request as ready for review December 15, 2021 23:49
@elalish
Copy link
Contributor

elalish commented Dec 16, 2021

Yes, Yes, Yes! We need this @mrdoob!

// Wait for all the materials in the new object to indicate that they're
// ready to be used before resolving the promise.

return new Promise( ( resolve ) => {
Copy link
Owner

@mrdoob mrdoob Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of minor but... is it possible to use async/await instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is in this case. The problem is that this kicks off a polling loop with setTimeout(), which of course is callback based. If we want to treat it as a promise we have to explicitly wrap it in a Promise() constructor.

This does still allow developers to call the function as await renderer.compileAsync(....); though! And I've updated the two examples this was used in to demonstrate exactly that, because it's a much nicer pattern.

@mrdoob
Copy link
Owner

mrdoob commented Jan 6, 2022

Ops, I broke webgl_loader_gltf when resolving conflicts...

@toji
Copy link
Contributor Author

toji commented Jan 6, 2022

Rebased to fix the webgl_loader_gltf breakage, and used await in the examples to set a nicer precedent.


// Only initialize materials in the new scene, not the targetScene.

scene.traverse( function ( object ) {
Copy link

@avaer avaer Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add scene overrideMaterial support? It seemed to work as expected when I did the naive thing:

const _compileMaterial = function ( material, object ) {
	if ( Array.isArray( material ) ) {

		for ( let i = 0; i < material.length; i ++ ) {

			const material2 = material[ i ];

			getProgram( material2, targetScene, object );
			compiling.add( material2 );

		}

	} else {

		getProgram( material, targetScene, object );
		compiling.add( material );

	}
};

if (scene.overrideMaterial) {
	scene.traverse( function ( object ) {
		const material = object.material;

		if ( material ) {
			_compileMaterial(scene.overrideMaterial, object);
		}
	});
} else {
	scene.traverse( function ( object ) {

		const material = object.material;

		if ( material ) {

			_compileMaterial(material, object);

		}

	} );
}

@marcofugaro
Copy link
Contributor

This would be useful to have!

@Ben-Mack
Copy link

Yes, we've been waiting for this quite some times, are there any concern or blocking problem on this PR? @mrdoob @Mugen87

@Mugen87
Copy link
Collaborator

Mugen87 commented Oct 12, 2023

I did approve #19745 which should be merged first but I guess it was overlooked in the past. It should be no problem to create a new PR based on #19745 and resolve the merge conflicts. I can do that if @mrdoob approves 👍 .

@toji
Copy link
Contributor Author

toji commented Oct 12, 2023

I'm happy to resolve the conflicts in #19745 if @mrdoob is going to be able to approve it. Should be relatively trivial. Then I can resolve this one and we can focus on landing it.

EDIT: Resolved the conflict on #19745. I'll do this one if that can get merged.

@marcofugaro
Copy link
Contributor

marcofugaro commented Oct 12, 2023

As an user, shader compilation that blocks the main thread is always a problem in real-world projects (websites with various animations usually).
I think users would appreciate this feature.

The guys from Needle implemented it already in their engine too:
https://twitter.com/marcel_wiessler/status/1612461834904141828

@Mugen87 Mugen87 added this to the r158 milestone Oct 13, 2023
The compileAsync method adds a way to allow apps to wait on shader
compilation before adding objects to a scene.
@github-actions
Copy link

github-actions bot commented Oct 13, 2023

📦 Bundle size

Full ESM build, minified and gzipped.

Filesize dev Filesize PR Diff
651.5 kB (161.4 kB) 652.3 kB (161.6 kB) +832 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

Filesize dev Filesize PR Diff
444.2 kB (107.4 kB) 445 kB (107.6 kB) +833 B

@toji
Copy link
Contributor Author

toji commented Oct 13, 2023

Rebased now that #19745 has landed! Thanks @Mugen87!

Brief testing shows that there is still the expected wait for compile to happen, but it looks like there's also been some shifting around of how compile() works since the last time I rebased this. Since I'm mostly trying to mimic what that's doing with the extra wait on the materials, but I'd love some sanity checking from anyone who's more familiar with Three's material system.

@Mugen87
Copy link
Collaborator

Mugen87 commented Oct 13, 2023

but I'd love some sanity checking from anyone who's more familiar with Three's material system.

I'll try to debug both examples using compileAsync() tomorrow and also verify the performance benefits.

@toji toji closed this Oct 13, 2023
@toji toji reopened this Oct 13, 2023
@toji
Copy link
Contributor Author

toji commented Oct 13, 2023

Sorry! Clicked the wrong button! Thanks for taking a look,

@Mugen87 Mugen87 changed the title Allow apps to wait for materials to finish compiling before adding objects to a scene WebGLRenderer: Add compileAsync(). Oct 14, 2023
@Mugen87 Mugen87 merged commit d47a43a into mrdoob:dev Oct 14, 2023
19 checks passed
@Mugen87
Copy link
Collaborator

Mugen87 commented Oct 14, 2023

During testing I feel a conceptual peculiarity with our compile*() methods might be overlooked in this PR. I think it's important to highlight it in order to avoid bad surprises^^.

Pre-compiling only works if the render state during the pre-compilation is equal to the actual render state in the scene. E.g. if you call compile() with a single mesh and later add it to a scene with lights, you will see a new shader compilation. So the call of compile() does actually not prepone the compilation, it adds an additional compilation and thus just produces overhead. That happens because the number of (shadow casting) lights in the scene determines the shader source and thus its internal cache key. You can see this here:

numDirLights: lights.directional.length,
numPointLights: lights.point.length,
numSpotLights: lights.spot.length,
numSpotLightMaps: lights.spotLightMap.length,
numRectAreaLights: lights.rectArea.length,
numHemiLights: lights.hemi.length,
numDirLightShadows: lights.directionalShadowMap.length,
numPointLightShadows: lights.pointShadowMap.length,
numSpotLightShadows: lights.spotShadowMap.length,
numSpotLightShadowsWithMaps: lights.numSpotLightShadowsWithMaps,
numLightProbes: lights.numLightProbes,

So it's best to use compile() and compileAsync() by preparing the scene first (meaning adding lights, environment and 3D objects), call compile*() and then use the scene for rendering.

@aardgoose
Copy link
Contributor

This will align well with WebGPU where compilation is automatically async, and provides promises on completion so rAF games are not needed either.

I was already looking at implementing compile() for the WebGPURenderer, and adding it to the fallback WebGL path shouldn't be complicated either.

Compilation isn't forced in the current WebGPURenderer.render() so calling render() doesn't not necessarily render what you expected or indeed anything.

As an aside, forcing the webGPURender.render() to wait for all shaders to compile is rather messy, because often the frame buffer texture has expired by the time compilation has finished and you get console errors and no output, unless you use GPURenderBundles.

@Mugen87
Copy link
Collaborator

Mugen87 commented Oct 14, 2023

Looking at the semantics of compile*() it makes me wonder if we should revisit the idea from #16662 and add a new flag for WebGLRenderer called asyncShaderCompilation (initially with default to false.) When enabled, KHR_parallel_shader_compile is always used internally.

I have the feeling more users would benefit from KHR_parallel_shader_compile since automatic internal usage isn't that error prone compared to compile() and compileAsync().

@mrdoob
Copy link
Owner

mrdoob commented Oct 14, 2023

@Mugen87 I think asyncShaderCompilation is a better solution than conpileAsync() yeah.

Edit: Specially considering that it'll match the behavior of WebGPURenderer.

@Mugen87
Copy link
Collaborator

Mugen87 commented Oct 14, 2023

I have filed a PR to explore the suggested concept: #26964

@toji
Copy link
Contributor Author

toji commented Oct 14, 2023

FYI, I have removed the second scene parameter and exchanged it with camera so compile() and compileAsync() both have the same signature and perform layer testing for lights.

I can see now how the camera is useful (that seems like a fairly recent addition to compile()? I don't recall seeing it when I first wrote the PR.) So adding the camera is fine, but I'm concerned about removing the second scene from the signature.

So it's best to use compile() and compileAsync() by preparing the scene first (meaning adding lights, environment and 3D objects), call compile*() and then use the scene for rendering.

I think this model significantly reduces the usefulness of the method. Seems like I didn't do a very good job of communicating my intent here, sorry for that.

If you only cared about the first display of the entire scene then you could omit the optional second scene and it would function as you describe.

But my hope with the second scene was that you could use it to say "I plan on adding this object tree to that one. Tell me when I can do so without blocking due to shader compiles." It used the render state from both scenes combined so that if you were loading an individual mesh you could still get all the necessary lighting and other state information.

As a concrete example, on my page https://xrdinosaurs.com I would load the entire environment but the dinosaurs were loaded by the user clicking a button. Because it is typically viewed in VR, avoiding loading jank is critical. Every time the user clicked the button to load a new dinosaur, I would load the glTF mesh, then call compileAsync(dinosaurGltfScene, environmentScene) and wait for it to resolve before attaching dinosaurGltfScene to the environmentScene (which has all the lights, environment maps, etc.)

It's not clear how I would use the API with the updated signature effectively in this scenario. I can't add the dinosaur mesh to the environment and then call compileAsync() because I'm still rendering the environment, and the added mesh will block rendering till it's ready on the next frame. I could copy the entire environment every time I intend to add a new mesh to it, and then do the compileAsync() on the copied scene, but that sounds inefficient and error prone. (Then again I don't really know how expensive it would be. Maybe it's a non-issue?)

The current signature works if the only thing I'm concerned about it initial display of the entire scene, but it seems to me like that's more of a concern for the examples than many larger apps. Similarly, an asyncShaderCompilation flag seems like it would be difficult to use effectively in a real-world app where lights or meshes are changing frequently. I brought this up in the opening comment:

What about adding a parallelCompile property on the material, like in #16662?

This was my first inclination as well, but it suffers from several drawbacks enforced due to Three's architecture. For one, it breaks the model of objects being visible as soon as they're added to the scene. They just "show up" at some point and the developer has no idea when. That's fine for some uses, not for others. Also, crucially, unless great lengths are taken to prevent it any changes to the scene's lighting setup will cause another parallel compile, during which the objects will flicker out of existence for a few frames. This is a far more objectionable artifact. Finally, some apps (like the glTF loader example) only draw when the user is interacting with the scene (like rotating the camera). Without an active animation loop the parallelCompile approach doesn't have a way of indicating that the object is ready for drawing and thus you get stuck with an empty scene until you click and drag, which is clearly unwanted behavior.

@Mugen87
Copy link
Collaborator

Mugen87 commented Oct 14, 2023

Okay, I understand it make sense to add targetScene as an optional parameter back but I would prefer to keep the signatures of compileAsync() and compile() identical. It seems compile() could benefit from targetScene as well, right?

@Mugen87
Copy link
Collaborator

Mugen87 commented Oct 14, 2023

(Then again I don't really know how expensive it would be. Maybe it's a non-issue?)

It shouldn't be expensive since no rendering happens. But yes, the copy/paste operations to reproduce the same render state would be inconvenient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement KHR_parallel_shader_compile support