Benchmarks: Update world transforms / updateMatrixWorld() - improve benchmark realism #25113

Closed


diarmidmackenzie (Contributor) commented Dec 11, 2022

Related issue: #25115

Description

See related issue for full background.

This PR addresses an issue with the existing benchmark tests for updateMatrixWorld().

The issue is that the benchmark test generates a completely homogeneous set of Object3Ds, so the code iterates over a monomorphic set of objects. That gives very flattering performance compared with real-world usage.

Evidence that this is a real issue: the benchmark added in this PR, which uses a randomized, heterogeneous set of Object3Ds, performs 50% slower than the equivalent homogeneous benchmark.
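A minimal, library-free sketch of the effect described above. The stand-in classes (`TreeNode`, `NodeA`, `NodeB`, `NodeC`) and helpers are hypothetical, not three.js's Object3D: each node stores a number in place of a matrix, and `updateWorld` only loosely mirrors the parent-then-children recursion of `updateMatrixWorld()`. The same traversal is timed over a homogeneous tree and over a tree with a random mix of four shapes; results depend on engine and machine, and the sketch makes no claim about the size of the gap.

```javascript
// Hypothetical stand-in for a scene-graph node (NOT three.js Object3D).
class TreeNode {
  constructor() {
    this.local = 1;     // stand-in for the local matrix
    this.world = 0;     // stand-in for the world matrix
    this.children = [];
  }
}

// Each subclass adds one extra own property, so its instances get a
// different hidden class (shape) from plain TreeNode instances.
class NodeA extends TreeNode { constructor() { super(); this.a = 0; } }
class NodeB extends TreeNode { constructor() { super(); this.b = 0; } }
class NodeC extends TreeNode { constructor() { super(); this.c = 0; } }

// Recursive parent-then-children update, loosely modelled on updateMatrixWorld().
function updateWorld(node, parentWorld) {
  node.world = parentWorld + node.local;
  for (let i = 0; i < node.children.length; i++) {
    updateWorld(node.children[i], node.world);
  }
}

// Build a tree of the given depth and branching factor; makeNode picks the class.
function buildTree(depth, breadth, makeNode) {
  const root = makeNode();
  if (depth > 0) {
    for (let i = 0; i < breadth; i++) {
      root.children.push(buildTree(depth - 1, breadth, makeNode));
    }
  }
  return root;
}

function countNodes(node) {
  let n = 1;
  for (const child of node.children) n += countNodes(child);
  return n;
}

// Full-tree update passes per second.
function opsPerSec(root, iterations = 200) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) updateWorld(root, 0);
  const ms = performance.now() - start;
  return iterations / (ms / 1000);
}

const kinds = [TreeNode, NodeA, NodeB, NodeC];
const randomKind = () => {
  const Kind = kinds[Math.floor(Math.random() * kinds.length)];
  return new Kind();
};

const homogeneous = buildTree(6, 4, () => new TreeNode());
const heterogeneous = buildTree(6, 4, randomKind);

// Both runs share updateWorld's type feedback, so run order can affect results.
console.log("homogeneous   ops/sec:", opsPerSec(homogeneous).toFixed(0));
console.log("heterogeneous ops/sec:", opsPerSec(heterogeneous).toFixed(0));
```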

I don't expect this PR to be merged in its current state.

It's intended to share the code for the new "real world" benchmark test, to highlight the substantial performance gap between homogeneous and heterogeneous benchmark tests, and to open a discussion about what the best parameters would be for a benchmark test that is representative of real-world use cases for Three.js.

PR #25114 is related and offers a prototype fix that significantly improves performance on the new benchmark added by this PR.

diarmidmackenzie changed the title "Perf benchmark realism" to "Benchmarks: Update world transforms / updateMatrixWorld() - improve benchmark realism" Dec 11, 2022
diarmidmackenzie (Contributor, Author) commented Dec 12, 2022

I've done a little more experimentation & exploration on this topic. A few additional points worth noting.

  • There's more to the perf benefits of monomorphism than just cache misses. This article gives a great overview. In particular, V8 uses monomorphism as a decision factor in whether or not to apply various other optimizations to hot code.

  • Playing around with different combinations of Object3D classes didn't make much difference. Even a simple 50/50 split of Groups & Meshes seemed to result in a very similar slowdown. As far as I can tell, monomorphic vs. polymorphic is the only significant factor here, so it probably doesn't matter much which specific blend is used in the "real world" test.

  • If I rearrange the tests so that D runs first, it destroys performance for all the tests (even the monomorphic ones). Given what I read in the article linked above, that makes sense: monomorphism is about more than just cache hits / misses, and V8 actively decides whether to perform additional optimizations for monomorphic functions. So it seems that when test D runs first, V8 decides not to optimize certain functions, which then hurts performance even for the subsequent monomorphic cases.

  • When I run with the fix in #25114 (updateMatrixWorld Optimization using new Object3DMatrixData class), switching the test order so that D runs first no longer has any impact on performance. That's nice to see, and provides more evidence that #25114 is making everything nicely monomorphic.

  • The article linked above suggests that with > 4 types, there might be a further slowdown as the V8 engine moves from a polymorphic cache of up to 4 entries to a "megamorphic" implementation. I tried testing with up to 6 different Object3D sub-classes, but I didn't see any significant additional slowdown.
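The 1 / 2 to 4 / more-than-4 shape thresholds mentioned above can be probed with a sketch like the following. It builds objects that differ only in which extra property they carry (the same trick as the `extraN` properties used later in this thread), so a single property-load site sees 1 to 6 distinct shapes. The helper names are hypothetical, and the 4-entry polymorphic limit is V8 behaviour as described in the linked article; other engines may differ.

```javascript
// Build `count` objects spread evenly across `shapes` distinct hidden classes.
// Every object has `x`; object i also gets one property named extra0..extra{shapes-1}.
function makeObjects(count, shapes) {
  const objs = [];
  for (let i = 0; i < count; i++) {
    const o = { x: i };
    o["extra" + (i % shapes)] = "test";
    objs.push(o);
  }
  return objs;
}

// The `objs[i].x` load is the hot, shape-sensitive site.
function sumX(objs) {
  let total = 0;
  for (let i = 0; i < objs.length; i++) total += objs[i].x;
  return total;
}

function timeSumX(objs, rounds = 100) {
  const start = performance.now();
  let sink = 0; // keep the result live so the loop is not optimized away
  for (let r = 0; r < rounds; r++) sink += sumX(objs);
  return { ms: performance.now() - start, sink };
}

// sumX keeps the feedback from earlier rounds, so shape counts run in increasing
// order: the call site has seen exactly `shapes` shapes at each step.
for (let shapes = 1; shapes <= 6; shapes++) {
  const { ms } = timeSumX(makeObjects(100000, shapes));
  const label = shapes === 1 ? "monomorphic" : shapes <= 4 ? "polymorphic" : "megamorphic";
  console.log(`${shapes} shape(s) (${label}): ${ms.toFixed(1)} ms`);
}
```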

Some other articles I found on the topic (googling "megamorphic V8") that look pretty useful / relevant:
https://erdem.pl/2019/08/v-8-function-optimization
https://marcradziwill.com/blog/mastering-javascript-high-performance/
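Because V8's optimization decisions depend on the type feedback a function has already collected, the order effect described above (test D running first) can be sidestepped by giving each benchmark case a fresh engine. A sketch, assuming Node.js: each case's source runs in its own child process via `child_process.spawnSync` and reports its result as JSON on stdout. The `runIsolated` helper and the JSON result format are hypothetical, not part of the three.js benchmark suite.

```javascript
const { spawnSync } = require("node:child_process");

// Run a self-contained benchmark case (JavaScript source text) in a brand-new
// Node.js process, so it starts with no type feedback from other cases.
// The case is expected to print exactly one JSON object on stdout.
function runIsolated(caseSource) {
  const result = spawnSync(process.execPath, ["-e", caseSource], { encoding: "utf8" });
  if (result.status !== 0) {
    throw new Error(`case failed: ${result.stderr}`);
  }
  return JSON.parse(result.stdout);
}

// Usage: a trivial case standing in for a real benchmark body.
const trivialCase = `
  const start = performance.now();
  let n = 0;
  for (let i = 0; i < 1e6; i++) n += i;
  const ms = performance.now() - start;
  console.log(JSON.stringify({ opsPerSec: 1000 / ms, checksum: n }));
`;
console.log(runIsolated(trivialCase));
```

The cost is process start-up time per case, which is why this suits a handful of long-running cases rather than many tiny ones.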

diarmidmackenzie (Contributor, Author) commented Dec 12, 2022

More experimentation, and I have found another factor that makes a big difference.

I've done a couple of types of tests:

  • monomorphic tests with a range of different Object3D types
  • polymorphic tests with very simple modifications to Object3D, e.g.:
		// Every child is a plain Object3D, but each branch adds a different
		// extra property, so the children end up with five different shapes.
		let child;
		const choice = Math.random();
		if (choice < 0.2) {
			child = new THREE.Object3D();
			child.extra1 = "test";
		}
		else if (choice < 0.4) {
			child = new THREE.Object3D();
			child.extra2 = "test";
		}
		else if (choice < 0.6) {
			child = new THREE.Object3D();
			child.extra3 = "test";
		}
		else if (choice < 0.8) {
			child = new THREE.Object3D();
			child.extra4 = "test";
		}
		else {
			child = new THREE.Object3D();
			child.extra5 = "test";
		}

What I've learned...

  • polymorphism accounts for a degradation from about 250 ops/sec to about 150 ops/sec
  • certain classes are slower even when monomorphic, in particular:
      • Mesh & InstancedMesh have monomorphic perf of about 160 ops/sec
      • SkinnedMesh has monomorphic perf of about 120 ops/sec
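When comparing figures like these, it helps to be explicit about which ratio "X% slower" means: a drop from 250 to 150 ops/sec is a 40% loss of throughput, but each operation takes about 67% longer. A small helper makes both readings explicit; the function names are illustrative only.

```javascript
// Convert a throughput (operations per second) into milliseconds per operation.
function msPerOp(opsPerSec) {
  return 1000 / opsPerSec;
}

// Compare a baseline and a slower result in both common ways.
function compareThroughput(baselineOps, slowerOps) {
  return {
    // fraction of operations per second that was lost
    throughputLossPct: (1 - slowerOps / baselineOps) * 100,
    // how much longer each operation takes
    timeIncreasePct: (msPerOp(slowerOps) / msPerOp(baselineOps) - 1) * 100,
  };
}

// Figures from the polymorphism measurement above:
// throughputLossPct is about 40, timeIncreasePct is about 66.7.
console.log(compareThroughput(250, 150));
```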

I've also tried running these tests with the fix in #25114.

As you might hope/expect:

  • In the "simple polymorphic" case, the fix brings polymorphic performance ~level with monomorphic performance
  • The fix has no impact on the performance of a monomorphic test with Mesh, InstancedMesh or SkinnedMesh.
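One plausible reading of why a fix like #25114 (titled "updateMatrixWorld Optimization using new Object3DMatrixData class") levels polymorphic and monomorphic performance: if the transform state lives in one dedicated class, the hot update loop only ever reads objects of that single shape, whatever the Object3D subclass. The sketch below illustrates only that general idea with hypothetical stand-in classes (`MatrixData`, `SceneNode`, `MeshLike`, `GroupLike`); it is not the code from #25114. A change of this kind targets shape variety, which would be consistent with it leaving the monomorphic Mesh, InstancedMesh and SkinnedMesh numbers unchanged.

```javascript
// Hypothetical: all per-node transform state lives in one class, so every
// instance has the same hidden class regardless of which node subclass owns it.
class MatrixData {
  constructor() {
    this.local = 1;     // stand-in for the local matrix
    this.world = 0;     // stand-in for the world matrix
    this.children = []; // child MatrixData objects, mirroring the scene graph
  }
}

// Node classes can differ freely in shape; the hot loop never reads their fields.
class SceneNode {
  constructor() { this.matrixData = new MatrixData(); }
  add(child) { this.matrixData.children.push(child.matrixData); return this; }
}
class MeshLike extends SceneNode {
  constructor() { super(); this.geometry = null; this.material = null; }
}
class GroupLike extends SceneNode {
  constructor() { super(); this.isGroup = true; }
}

// The recursive update only ever touches MatrixData instances (one shape).
function updateWorld(data, parentWorld) {
  data.world = parentWorld + data.local;
  for (let i = 0; i < data.children.length; i++) {
    updateWorld(data.children[i], data.world);
  }
}

const root = new GroupLike();
const mesh = new MeshLike();
root.add(mesh.add(new SceneNode())); // root -> mesh -> plain node
updateWorld(root.matrixData, 0);
console.log(mesh.matrixData.world); // 2
```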

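So overall there are 2 factors in play here:

  • polymorphism: a mix of object shapes slows the traversal, and the fix in #25114 brings this back to roughly monomorphic performance
  • per-class cost: Mesh, InstancedMesh and SkinnedMesh are slower even when monomorphic, and #25114 does not change this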

diarmidmackenzie (Contributor, Author) commented

Based on the analysis above, I now think I'm in a position to propose an improved set of benchmarks. I've updated the PR with what I'm proposing, and will convert it from Draft to "Ready for Review".

[Image: proposed benchmark results]

Rationale for including these:

  • Polymorphic: highlights the issues with polymorphic performance, without any other confounding variables.
  • Monomorphic Mesh: it's not yet understood what the perf issues are with Mesh, but it's a widely used class, so its perf is very important and worth keeping an eye on by itself.
  • Monomorphic SkinnedMesh: not so widely used, but has significantly worse perf than even Mesh. Worth keeping an eye on.
  • Realistic blend: I've kept the original mix I proposed here as well: 5% SkinnedMesh, 5% InstancedMesh, 50% Mesh, 40% Group. Hopefully any regressions would show up in one of the other, more focused tests, but I do think it's useful to have something that's a reasonable model of the perf we might see in reality, and to be able to check whether it looks in line with all the other metrics.
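The realistic blend can be generated with a cumulative-weight pick, the same pattern as the `choice < 0.2` chain earlier in the thread but driven by a table. A sketch; the `pickWeighted` helper is hypothetical, and the class names are plain strings here so the example runs without three.js (in a real benchmark each entry would map to a constructor such as `THREE.Mesh`).

```javascript
// Proposed blend: 5% SkinnedMesh, 5% InstancedMesh, 50% Mesh, 40% Group.
const BLEND = [
  ["SkinnedMesh", 0.05],
  ["InstancedMesh", 0.05],
  ["Mesh", 0.5],
  ["Group", 0.4],
];

// Pick an entry by walking the cumulative weights.
// `rand` is a number in [0, 1), e.g. from Math.random().
function pickWeighted(table, rand) {
  let cumulative = 0;
  for (const [name, weight] of table) {
    cumulative += weight;
    if (rand < cumulative) return name;
  }
  return table[table.length - 1][0]; // guard against floating-point rounding
}

// Usage: tally many picks; the proportions should approach the weights above.
const counts = {};
for (let i = 0; i < 100000; i++) {
  const name = pickWeighted(BLEND, Math.random());
  counts[name] = (counts[name] || 0) + 1;
}
console.log(counts);
```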

I'm quite deep into this topic at the moment, so it's possible I am over-egging the amount of tests needed here, and overlooking costs that arise from having too many tests.

If I had to cut back on this, I'd probably cut back on the SkinnedMesh (I don't imagine it's widely used at great scale) - all the others I think have a pretty cast-iron case for inclusion.

In the other direction, if I were to extend to include even more tests, I think I'd extend to include a monomorphic test for each individual sub-class of Object3D, to check for & monitor variations in performance between different classes.
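A monomorphic case per Object3D subclass could be table-driven. A sketch under stated assumptions: the `makeMonomorphicCases` helper and the case shape (`name`, `root`, `run`) are hypothetical; each factory is a zero-argument function that builds one node, so classes needing constructor arguments (InstancedMesh takes geometry, material and count) can supply them; the tree is flat (one root with `nodeCount` children) for simplicity; and `run` calls `updateMatrixWorld(true)` on the root. Stub classes stand in for three.js in the usage so it runs on its own.

```javascript
// Build one benchmark case per class: each case's tree contains only that class.
function makeMonomorphicCases(factories, nodeCount) {
  return Object.entries(factories).map(([name, make]) => {
    const root = make();
    for (let i = 0; i < nodeCount; i++) root.add(make());
    return {
      name: `Monomorphic ${name}`,
      root,
      run: () => root.updateMatrixWorld(true), // force a full update
    };
  });
}

// Stubs standing in for THREE.Object3D subclasses (hypothetical, for the demo only).
class StubNode {
  constructor() { this.children = []; this.updates = 0; }
  add(child) { this.children.push(child); return this; }
  updateMatrixWorld(force) {
    this.updates++;
    for (const c of this.children) c.updateMatrixWorld(force);
  }
}
class StubMesh extends StubNode {}
class StubGroup extends StubNode {}

const cases = makeMonomorphicCases(
  { Mesh: () => new StubMesh(), Group: () => new StubGroup() },
  100,
);
for (const c of cases) c.run();
console.log(cases.map((c) => c.name)); // [ 'Monomorphic Mesh', 'Monomorphic Group' ]
```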

For reference, here is a preview of what these benchmarks look like with the prototype fix from #25114:

[Image: benchmark results with the #25114 prototype fix]

diarmidmackenzie marked this pull request as ready for review December 12, 2022 11:50
Mugen87 (Collaborator) commented Feb 6, 2023

Closing since the benchmarks have been removed.

Mugen87 closed this Feb 6, 2023