Benchmarks: Update world transforms / updateMatrixWorld() - improve benchmark realism #25113

diarmidmackenzie · 2022-12-11T13:50:06Z

Related issue: #25115

Description

See related issue for full background.

This PR addresses an issue with the existing Benchmark tests for updateMatrixWorld().

The issue is that the benchmark test generates a completely homogeneous set of Object3Ds, which results in the code iterating over a monomorphic set of objects - which will result in very flattering performance vs. real world usage.

Evidence that this is a real issue is provided by observing that the benchmark added in this PR, with a randomized, heterogeneous set of Object3Ds, performs 50% slower than the equivalent homogeneous benchmark.

I don't expect this PR to be merged in its current state.

It's intended to share the code for the new "real world" benchmark test, to highlight the substantial performance gap between homogeneous & heterogeneous benchmark tests, and to open a discussion about what the best parameters would be for a Benchmark test that is representative of real-world use-cases for Three.js.

PR #25114 is related and offers a prototype fix that significantly improves performance on the new benchmark added by this PR.

diarmidmackenzie · 2022-12-12T09:05:15Z

I've done a little more experimentation & exploration on this topic. A few additional points worth noting.

There's more to the perf benefits of monomorphism than just cache misses. This article gives a great overview. In particular, V8 uses monomorphism as a deciding factor in whether or not to apply various other optimizations to hot code.
Playing around with different combinations of Object3D classes didn't make much difference. Even a simple 50/50 split of Groups & Meshes seemed to result in a very similar slowdown. As far as I can tell, monomorphic vs. polymorphic is the only significant factor here. Hence it probably doesn't matter much what specific blend is used in the "real world" test.
If I rearrange the tests so that D runs first, it destroys performance for all the tests (even the monomorphic ones). Given what I read in the article that I linked above, that makes sense. Monomorphism is about more than just cache hits / misses. The V8 engine actively decides to perform additional optimizations for monomorphic functions. So it seems that by running test D first, the V8 engine decides not to optimize certain functions, which then hurts performance even for subsequent monomorphic cases.
When I run with the fix in updateMatrixWorld Optimization using new Object3DMatrixData class #25114, switching the test order with D first does not have any impact on performance any more. That's nice to see, and provides more evidence that updateMatrixWorld Optimization using new Object3DMatrixData class #25114 is making everything nicely monomorphic.
The article linked above suggests that with > 4 types, there might be a further slowdown as the V8 engine moves from a polymorphic cache of up to 4 entries to a "megamorphic" implementation. I tried testing with up to 6 different Object3D sub-classes, but I didn't see any significant additional slowdown.
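
To make the points above more concrete, here is a minimal standalone sketch. It is not taken from this PR or from Three.js, and `makeObjects`, `sumX` and `timeIt` are hypothetical helper names. It builds one array where every object has the same property layout and another where objects carry one of several differently named extra properties, then runs the same hot function over both. Timings depend heavily on the engine, its version and the machine, so no expected numbers are given.

```javascript
// Hypothetical helpers for illustration only; not part of Three.js or this PR.

// Build `count` objects. With variants === 1 every object has the same
// property layout. With more variants, each object also gets one of
// `variants` differently named extra properties, so the engine sees
// several distinct object shapes.
function makeObjects(count, variants) {
	const objects = [];
	for (let i = 0; i < count; i++) {
		const obj = { x: i };
		if (variants > 1) obj['extra' + (i % variants)] = 'test';
		objects.push(obj);
	}
	return objects;
}

// The hot function. The `o.x` read is the property access whose inline
// cache can stay monomorphic or become polymorphic.
function sumX(objects) {
	let total = 0;
	for (const o of objects) total += o.x;
	return total;
}

// Crude timer: run `fn` `iterations` times and return elapsed milliseconds.
function timeIt(fn, iterations) {
	const start = performance.now();
	for (let i = 0; i < iterations; i++) fn();
	return performance.now() - start;
}

const mono = makeObjects(10000, 1);
const poly = makeObjects(10000, 5);

// Order may matter: once sumX has seen several shapes, later runs over
// `mono` may no longer get the monomorphic fast path.
const monoMs = timeIt(() => sumX(mono), 200);
const polyMs = timeIt(() => sumX(poly), 200);

console.log({ monoMs, polyMs });
```

Swapping the two `timeIt` calls is a rough way to look for the ordering effect described above, with the caveat that a micro-benchmark like this is much simpler than updateMatrixWorld() and may not show the same gap.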

Some other articles I found on the topic (googling "megamorphic V8") that look pretty useful / relevant:
https://erdem.pl/2019/08/v-8-function-optimization
https://marcradziwill.com/blog/mastering-javascript-high-performance/

diarmidmackenzie · 2022-12-12T09:37:02Z

More experimentation, and I have found another factor that makes a big difference.

I've done a couple of types of tests:

monomorphic tests with a range of different Object3D types
polymorphic tests with very simple modifications to Object3D, so e.g.:

		// Each branch creates a plain Object3D and adds a differently named
		// property, so the five variants differ only in their property layout.
		const choice = Math.random();
		if (choice < 0.2) {
			child = new THREE.Object3D();
			child.extra1 = "test";
		}
		else if (choice < 0.4) {
			child = new THREE.Object3D();
			child.extra2 = "test";
		}
		else if (choice < 0.6) {
			child = new THREE.Object3D();
			child.extra3 = "test";
		}
		else if (choice < 0.8) {
			child = new THREE.Object3D();
			child.extra4 = "test";
		}
		else {
			child = new THREE.Object3D();
			child.extra5 = "test";
		}

What I've learned...

polymorphism accounts for a degradation from about 250 ops/sec to about 150 ops/sec
certain classes are slower even when monomorphic, in particular:
Mesh & InstancedMesh have monomorphic perf of about 160 ops/sec
SkinnedMesh has monomorphic perf of about 120 ops/sec.

I've also tried running these tests with the fix in #25114.

As you might hope/expect:

In the "simple polymorphic" case, the fix brings polymorphic performance ~level with monomorphic performance
The fix has no impact on the performance of a monomorphic test with Mesh, InstancedMesh or SkinnedMesh.

So overall there are 2 factors in play here:

Monomorphic vs. Polymorphic, which accounts for a ~40% perf hit, and can be fixed by updateMatrixWorld Optimization using new Object3DMatrixData class #25114
A completely separate issue where updateMatrixWorld runs slowly over Mesh (and even more slowly over SkinnedMesh), which manifests even with a monomorphic set of objects of these classes, and is not improved by updateMatrixWorld Optimization using new Object3DMatrixData class #25114.
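
As a quick check on the percentages, the relative slowdown implied by the approximate ops/sec figures quoted in this comment can be computed directly. `slowdownPercent` is a hypothetical helper, and using 250 ops/sec as the common baseline for the Mesh and SkinnedMesh figures is an assumption; the inputs are rounded numbers, so the results are only indicative.

```javascript
// Hypothetical helper: percentage drop in throughput relative to a baseline.
function slowdownPercent(baselineOps, measuredOps) {
	return (baselineOps - measuredOps) * 100 / baselineOps;
}

// Approximate ops/sec figures quoted in the comment above.
// Assumption: 250 ops/sec is the monomorphic baseline for all comparisons.
const baseline = 250;

console.log(slowdownPercent(baseline, 150)); // simple polymorphic case: 40
console.log(slowdownPercent(baseline, 160)); // monomorphic Mesh / InstancedMesh: 36
console.log(slowdownPercent(baseline, 120)); // monomorphic SkinnedMesh: 52
```

So on these rounded figures, monomorphic Mesh is roughly as slow as the simple polymorphic case, and monomorphic SkinnedMesh is slower than either.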

diarmidmackenzie · 2022-12-12T11:45:56Z

Based on the analysis above, I now think I'm in a position to propose an improved set of benchmarks, and have updated the PR with what I'm proposing, and will convert from Draft to "Ready for Review".

Rationale for including these:

Polymorphic - highlights the issues with polymorphic performance, without any other confounding variables.
Monomorphic Mesh - Not yet understood what the perf issues are with Mesh, but it's a widely used class, hence perf is very important, and worth keeping an eye on by itself.
Monomorphic SkinnedMesh - Not so widely used, but has significantly worse perf than even Mesh. Worth keeping an eye on.
Realistic blend. I've kept the original mix I proposed in here as well: 5% Skinned, 5% Instanced, 50% Mesh, 40% Group. Hopefully any regressions would show up in one of the other more-focussed tests, but I do think it's useful to have something that's a reasonable model of what perf we might see in reality, and useful to be able to keep an eye on whether this looks in line with all the other metrics.
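
For illustration, the 5% / 5% / 50% / 40% mix in the "Realistic blend" item could be selected along the following lines. This is a sketch rather than the code in the PR: `BLEND` and `pickType` are hypothetical names, and the types are plain strings so that it runs without Three.js.

```javascript
// Blend quoted above; weights are fractions that sum to 1. The names are
// plain strings here so the sketch runs without Three.js.
const BLEND = [
	{ type: 'SkinnedMesh', weight: 0.05 },
	{ type: 'InstancedMesh', weight: 0.05 },
	{ type: 'Mesh', weight: 0.50 },
	{ type: 'Group', weight: 0.40 }
];

// Hypothetical helper: map a number in [0, 1) to a type name by walking
// the cumulative weights.
function pickType(choice, blend = BLEND) {
	let cumulative = 0;
	for (const entry of blend) {
		cumulative += entry.weight;
		if (choice < cumulative) return entry.type;
	}
	// Guards against floating-point rounding leaving the total just below 1.
	return blend[blend.length - 1].type;
}

// Count how often each type is chosen over many random draws.
const counts = {};
for (let i = 0; i < 10000; i++) {
	const type = pickType(Math.random());
	counts[type] = (counts[type] || 0) + 1;
}
console.log(counts);
```

In a real benchmark the returned string would be replaced by construction of the corresponding Three.js class. Keeping the weights in one table makes it easy to try other blends.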

I'm quite deep into this topic at the moment, so it's possible I am over-egging the number of tests needed here, and overlooking costs that arise from having too many tests.

If I had to cut back on this, I'd probably cut back on the SkinnedMesh (I don't imagine it's widely used at great scale) - all the others I think have a pretty cast-iron case for inclusion.

In the other direction, if I were to extend to include even more tests, I think I'd extend to include a monomorphic test for each individual sub-class of Object3D, to check for & monitor variations in performance between different classes.
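
A per-class extension like that could be generated from a list of classes rather than written out by hand. The sketch below only shows the idea: `makeMonomorphicCases` is a hypothetical helper, and the three classes are empty stand-ins so that it runs without Three.js.

```javascript
// Stand-in classes so the sketch runs without Three.js; in the real
// benchmark these would be Object3D and its sub-classes.
class Object3D {}
class Group extends Object3D {}
class Mesh extends Object3D {}

// Hypothetical helper: one test case per class, each holding `count`
// instances of that class only.
function makeMonomorphicCases(classes, count) {
	return classes.map((Class) => ({
		name: 'Monomorphic ' + Class.name,
		objects: Array.from({ length: count }, () => new Class())
	}));
}

const cases = makeMonomorphicCases([Object3D, Group, Mesh], 100);
for (const c of cases) console.log(c.name, c.objects.length);
```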

For reference, a preview of what these benchmarks look like with the prototype fix for #25114

Mugen87 · 2023-02-06T10:28:02Z

Closing since the benchmarks have been removed.

diarmidmackenzie added 2 commits December 10, 2022 14:38

set quaternion, not rotation

1ba542c

Add more varied test

338ac1c

diarmidmackenzie changed the title ~~Perf benchmark realism~~ Benchmarks: Update world transforms / updateMatrixWorld() - improve benchmark realism Dec 11, 2022

This was referenced Dec 11, 2022

updateMatrixWorld Optimization using new Object3DMatrixData class #25114

Open

Potential improvement of "real-world" performance of updateMatrixWorld() #25115

Closed

Updated set of tests based on latest analysis of what variables matter.

8c8b91c

Fix warnings

fffb64d

diarmidmackenzie marked this pull request as ready for review December 12, 2022 11:50

Revert "set quaternion, not rotation"

4b8a601

This reverts commit 1ba542c.

diarmidmackenzie mentioned this pull request Dec 12, 2022

Benchmarks: Contamination between benchmark tests, dependent on ordering #25124

Closed

Mugen87 closed this Feb 6, 2023
