Benchmarks: Contamination between benchmark tests, dependent on ordering #25124
Comments
I'm not sure this repository is the right place for discussing this issue. Every user of benchmark.js (or other benchmarking libs) eventually runs into the mentioned inconsistency, so it's not right to find solutions at the user level.
Do you have evidence that this is the case? Searching for "contamination" in the benchmark.js GitHub repo reveals just the one issue referred to above, which claims that contamination need not be an issue. I'm not sure that we yet know that the issue is with benchmark.js, rather than the implementation within this repo. Further, even if there is an issue with benchmark.js / the browser, there may be sensible mitigation that could be taken within this repo - e.g. at a minimum a big comment in the code & on the benchmarks page warning that results are order-sensitive, to avoid misinterpretation. Or perhaps some more effective workaround such as forcing a browser refresh between tests, while storing off results data in localStorage so they can all be displayed at the end of the run...?
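For illustration only, here is a minimal sketch of that refresh-based workaround; the test names and the `runBenchmark` wrapper are hypothetical, not part of the existing harness:

```js
// Hypothetical sketch: each page load runs exactly one test, stores its result,
// then reloads, the idea being that the next test starts with fresher engine state.
const TESTS = ['Test A', 'Test B', 'Test C'];          // invented names
const results = JSON.parse(localStorage.getItem('benchResults') || '[]');

if (results.length < TESTS.length) {
  const name = TESTS[results.length];
  runBenchmark(name, (stats) => {                      // assumed wrapper around Benchmark.js
    results.push({ name, hz: stats.hz });
    localStorage.setItem('benchResults', JSON.stringify(results));
    location.reload();                                 // hard reset between tests
  });
} else {
  localStorage.removeItem('benchResults');
  console.table(results);                              // display everything at the end
}
```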
That sounds too complex to me. In the meanwhile, I would rather consider removing the benchmarks and writing the few tests that are actually required with jsbench. These could be linked in the wiki so everybody can use them.
Sounds like a good solution that should solve the contamination issue.
Would you or @LeviPesin be interested in migrating the existing six tests in /test/benchmark/core to jsbench? Meaning an individual bench for each test. They could be linked in a new page at the Developer Guide here: https://github.com/mrdoob/three.js/wiki
Yes, I could look at this. I'd also want to offer an updated version of #25113 on top of this.
EDITED to reflect my updated views on whether this solves the contamination issue (it doesn't).

I have done some experimentation with jsbench.me. It has some advantages, but I'm not convinced it's adequate to *replace* the current benchmarks. Perhaps it could be a useful complement. Here's an example test, derived from the updateMatrixWorld() example I've been working on:

What's good about this:
What's not good:
Other notable differences:
Overall, I think jsbench.me is great for use cases where someone wants to compare the performance of a couple of different ways of doing things within THREE.js. Having some samples linked from the wiki for this purpose could be a valuable addition. I think it's not well-suited to a master set of THREE.js benchmarks, which can be used to assess performance of the library itself, and the likely impact of changes to THREE.js on delivered performance. For this purpose, I think it would be better to build on & refine the existing Benchmark tests.
FYI also I have been looking at the benchmark.js code, and I now understand the comment about contamination-prevention: "we do try to keep each sample as free from previous samples hot code contamination. We do this by keeping variables, properties, values unique in each compiled test sample."

This is true for the code that appears directly in the function declared to benchmark.js (see the sketch below). But it won't apply to the THREE.js library code that gets invoked. That code will be pre-compiled, with optimization based on prior code execution, and can only be reset via a browser refresh.
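To make that concrete, here is a hedged sketch, not the actual /test/benchmark code; it assumes THREE and Benchmark are page globals loaded via script tags, as on the benchmarks page:

```js
var root = new THREE.Object3D();
for (var i = 0; i < 1000; i++) root.add(new THREE.Object3D());

new Benchmark.Suite()
  .add('updateMatrixWorld', function () {
    // Benchmark.js recompiles this body with uniquely named locals for each
    // sample, so nothing declared *here* leaks hot-code state across samples...
    root.updateMatrixWorld(true);
    // ...but Object3D.prototype.updateMatrixWorld is one shared, already-JITed
    // function: the type feedback V8 gathered in earlier tests persists in it.
  })
  .on('cycle', function (event) { console.log(String(event.target)); })
  .run();
```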
I wonder whether running the benchmark inside an iframe would solve the contamination issue? Output data could be aggregated from across multiple iframes into a single top-level display.
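A rough sketch of what that could look like (bench-runner.html, the test names, and the message shape are all invented; whether a same-origin iframe truly resets the engine's type feedback is exactly the open question):

```js
// Top-level page: run each test in a fresh iframe, in the hope that library
// code is re-parsed and re-optimized from scratch, then aggregate via postMessage.
const tests = ['Test A', 'Test B', 'Test C'];
const results = [];
let frame = null;

function runNext() {
  if (frame) frame.remove();                   // tear down the previous realm
  if (results.length === tests.length) {
    console.table(results);                    // single top-level display
    return;
  }
  frame = document.createElement('iframe');
  frame.src = 'bench-runner.html?test=' + encodeURIComponent(tests[results.length]);
  document.body.appendChild(frame);
}

window.addEventListener('message', (event) => {
  results.push(event.data);                    // runner page posts { name, hz }
  runNext();
});

runNext();
```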
Tests interfering with each other happens because "var" is module-scoped. Either block-scope the code with "let" and "const" or you have these issues. Swap the tests around and "var"-declared variables don't have the same state.
I think this issue is about contamination of another sort -- that the browser's JS engine could, for example, have one function cold in its cache in one test and already warmed up in another.
It really is vars of the same name, being module-scoped and manipulated. I just spent an hour or two with the bench code. There's no cleanup after each test, so you can swap them around and see.
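If that's the mechanism, a minimal sketch shows it (invented test names; assumes THREE and Benchmark are globals, as on the benchmarks page):

```js
var scene = new THREE.Scene();            // script-scoped: every test closes over it
var suite = new Benchmark.Suite();

suite.add('test A', function () {
  scene.position.x += 1;                  // mutates the shared object...
  scene.updateMatrixWorld(true);
});

suite.add('test B', function () {
  scene.updateMatrixWorld(true);          // ...so B sees whatever state A left behind
});

// One possible cleanup: declare per-test state in `setup`. Benchmark.js
// compiles setup inline ahead of fn, so this `scene` shadows the shared
// one for this test only.
suite.add('test B (isolated)', {
  setup: function () { var scene = new THREE.Scene(); },
  fn: function () { scene.updateMatrixWorld(true); },
});

suite.on('cycle', function (e) { console.log(String(e.target)); }).run();
```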
Closing, see #25434 (comment).
Describe the bug
I've been doing some work with the perf benchmarks here: /test/benchmark/benchmarks.html
One issue I have spotted with these benchmarks is that the data output depends on the order that the tests are run in.
#25113 (comment)
Specifically, when I created a test D with a polymorphic set of Object3D sub-classes, and made it the first test run, I noticed that the performance of tests with monomorphic collections of Object3Ds were compromised (as compared to when they get to run first). Performance was down by > 60% vs. the non-compromised tests.
A plausible explanation for this is that the V8 engine will observe the polymorphic use of functions, and mark these functions up as non-optimizable (see discussion in #25115 & #25113 for background on polymorphism / monomorphism impact on performance & V8 optimizations).
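To illustrate the hypothesis (the subclass names are invented and this is not the actual test code; it assumes THREE is a page global as on the benchmarks page):

```js
class Widget extends THREE.Object3D {}
class Gadget extends THREE.Object3D {}
class Gizmo  extends THREE.Object3D {}

// "Test D": polymorphic children. Call sites inside updateMatrixWorld() now
// see several different hidden classes, so V8's inline caches go polymorphic.
var polyRoot = new THREE.Object3D();
[Widget, Gadget, Gizmo].forEach(function (C) {
  for (var i = 0; i < 300; i++) polyRoot.add(new C());
});
polyRoot.updateMatrixWorld(true);

// A later monomorphic test reuses the *same* compiled updateMatrixWorld(),
// inheriting that polymorphic type feedback -- so it benchmarks slower than
// it would have if it had run first.
var monoRoot = new THREE.Object3D();
for (var j = 0; j < 900; j++) monoRoot.add(new THREE.Object3D());
monoRoot.updateMatrixWorld(true);
```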
However, it's rather problematic if benchmark test outputs depend on the ordering of the tests - this can be very misleading.
Ideally we'd reset the V8 optimization state from one benchmark test to the next. However I can't find any API to do this.
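For completeness: outside the browser, V8's unstable, undocumented natives syntax can poke at optimization state, though I'm not aware of anything there that fully clears accumulated type feedback, and nothing like it is exposed to web pages:

```js
// Run with: node --allow-natives-syntax sketch.js
// These intrinsics exist in V8 but are for experiments only.
function hot(obj) { return obj.x; }

%NeverOptimizeFunction(hot);   // pin a function to the interpreter
// ...or, once a function has been optimized:
%DeoptimizeFunction(hot);      // discard its optimized code
```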
It looks like the tests use Benchmark.js v2.1.0. I've looked through that repo for discussion of how such an issue can be solved. I found just one relevant comment:
bestiejs/benchmark.js#47 (comment)
"we do try to keep each sample as free from previous samples hot code contamination. We do this by keeping variables, properties, values unique in each compiled test sample."
Unfortunately this mechanism doesn't seem to be working, and I don't yet understand why not. I don't have any background on Benchmark.js or how these tests were set up, so it would be great if someone who does could offer insights here.
@takahirox?
To Reproduce
We don't yet have any polymorphic tests in dev, so I need to point to a code sample on a PR that's not merged yet.
There does seem to be some variability from one run to the next, but it's usually within ±10%, whereas this is a 3-fold drop in the performance benchmark.
Note also that PR #25114 addresses the performance issues associated with polymorphism in updateMatrixWorld(). That PR eliminates the effect in this specific example (but doesn't address the generic issue of contamination between tests).
Live example
Sorry, I don't have a live example.
Expected behavior
I expect benchmark tests to deliver reliable results from one run to the next, regardless of ordering.
Screenshots
See above.
Platform: