
Benchmarks: Contamination between benchmark tests, dependent on ordering #25124

Closed

diarmidmackenzie opened this issue Dec 12, 2022 · 13 comments

diarmidmackenzie (Contributor) commented Dec 12, 2022

Describe the bug

I've been doing some work with the perf benchmarks here: /test/benchmark/benchmarks.html

One issue I have spotted with these benchmarks is that the data output depends on the order in which the tests are run.
#25113 (comment)

Specifically, when I created a test D with a polymorphic set of Object3D sub-classes, and made it the first test run, I noticed that the performance of tests with monomorphic collections of Object3Ds were compromised (as compared to when they get to run first). Performance was down by > 60% vs. the non-compromised tests.

A plausible explanation is that the V8 engine observes the polymorphic use of functions and marks those functions as non-optimizable (see discussion in #25115 & #25113 for background on the impact of polymorphism / monomorphism on performance & V8 optimizations).
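To illustrate the suspected mechanism with a standalone, hypothetical snippet (this is demo code, not the benchmark code itself) - V8 tracks the object "shapes" seen at each property access or call site, and that per-site state persists across benchmark samples:

    class Object3D { constructor() { this.x = 1; } }
    class Mesh extends Object3D { constructor() { super(); this.geometry = {}; } }
    class Line extends Object3D { constructor() { super(); this.material = {}; } }

    // The property access below is an inline-cache site. If it only ever
    // sees plain Object3D instances, it stays monomorphic and fast.
    function readX( obj ) { return obj.x; }

    // A "test D" style polymorphic run feeds three different shapes in:
    for ( let i = 0; i < 1e6; i ++ ) {
        readX( new Object3D() );
        readX( new Mesh() );
        readX( new Line() );
    }

    // A later "monomorphic" benchmark reusing readX() inherits the
    // polymorphic inline-cache state - nothing resets it between tests.
    for ( let i = 0; i < 1e6; i ++ ) readX( new Object3D() );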

However, it's rather problematic if benchmark test outputs depend on the ordering of the tests - this can be very misleading.

Ideally we'd reset the V8 optimization state from one benchmark test to the next. However, I can't find any API to do this.
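(For what it's worth, V8 does expose per-function deoptimization hooks via its "natives syntax", but only when the engine is launched with a special flag - e.g. in d8 or Node - so as far as I know they can't be used from a normally-configured browser:)

    // Only works with the engine flag --allow-natives-syntax (d8 / Node),
    // not in a regular browser session. "fn" is a placeholder function.
    %DeoptimizeFunction( fn );       // throw away fn's optimized code
    %NeverOptimizeFunction( fn );    // prevent fn from being optimized at all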

It looks like the tests use Benchmark.js v2.1.0. I've looked through that repo for discussion of how such an issue could be solved, and found just one relevant comment:
bestiejs/benchmark.js#47 (comment)

"we do try to keep each sample as free from previous samples hot code contamination. We do this by keeping variables, properties, values unique in each compiled test sample."

Unfortunately this mechanism doesn't seem to be working, and I don't yet understand why not. I don't have any background on Benchmark.js or how these tests were set up, so it would be great if someone who does can offer insights here.

@takahirox?

To Reproduce

We don't yet have any polymorphic tests in dev, so I need to point to a code sample on a PR that's not merged yet.

  1. Take the version of UpdateMatrixWorld.js from PR Benchmarks: Update world transforms / updateMatrixWorld() - improve benchmark realism #25113
  2. Run the updateWorldTransform benchmark tests - here's what I saw for the first 2 tests:

[screenshot: benchmark results with tests in the original order]

  3. Swap the order of the tests so that D is run first.
  4. Re-run the tests and observe that the benchmark results are very different - here's what I saw for the first 2 tests:

[screenshot: benchmark results with test D run first]

There does seem to be some variability from one run to the next, but it's usually +/-10% - whereas this is a 3-fold drop in the benchmark result.

Note also that PR #25114 addresses the performance issues associated with polymorphism in updateMatrixWorld(). That PR eliminates the effect in this specific example (but doesn't address the generic issue of contamination between tests).

Live example

Sorry, I don't have a live example.

Expected behavior

I expect benchmark tests to deliver reliable results from one run to the next, regardless of ordering.

Screenshots

See above.

Platform:

  • Device: Desktop
  • OS: Windows
  • Browser: Chrome
  • Three.js version: dev
Mugen87 (Collaborator) commented Dec 17, 2022

I'm not sure this repository is the right place for discussing this issue. Every user of benchmark.js (or other benchmarking libs) eventually runs into the mentioned inconsistency, so it's not right to look for solutions at the user level. If benchmark.js can't help avoid the inconsistency, the issue should potentially be discussed with the browser vendor, e.g. https://bugs.chromium.org/p/chromium/issues/list

diarmidmackenzie (Contributor, Author) commented Dec 17, 2022

Every user of benchmark.js (or other benchmarking libs) eventually runs into the mentioned inconsistency

Do you have evidence that this is the case?

Searching for "contamination" in the benchmark.js GitHub repo reveals just the one issue referred to above, which claims that contamination need not be an issue.
bestiejs/benchmark.js#47

I'm not sure that we yet know whether the issue is with benchmark.js, rather than the implementation within this repo.

Further, even if there is an issue with benchmark.js / the browser, there may be sensible mitigations that could be taken within this repo - e.g. at a minimum, a big comment in the code & on the benchmarks page warning that results are order-sensitive, to avoid misinterpretation.

Or perhaps some more effective workaround, such as forcing a browser refresh between tests, while storing off results data in localStorage so they can all be displayed at the end of the run...?
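A rough sketch of what I have in mind (runOneBenchmark and displayResults are placeholder helpers, not existing functions):

    // Each page load runs exactly one test, so V8 starts each test with a
    // clean optimization state. Results accumulate in localStorage.
    const tests = [ 'testA', 'testB', 'testC', 'testD' ]; // illustrative names
    const state = JSON.parse( localStorage.getItem( 'benchState' ) || '{ "index": 0, "results": [] }' );

    if ( state.index < tests.length ) {
        const result = runOneBenchmark( tests[ state.index ] ); // placeholder helper
        state.results.push( { test: tests[ state.index ], result: result } );
        state.index ++;
        localStorage.setItem( 'benchState', JSON.stringify( state ) );
        location.reload(); // fresh page load for the next test
    } else {
        displayResults( state.results ); // placeholder helper
        localStorage.removeItem( 'benchState' );
    }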

Mugen87 (Collaborator) commented Dec 17, 2022

Or perhaps some more effective workaround, such as forcing a browser refresh between tests, while storing off results data in localStorage so they can all be displayed at the end of the run...?

That sounds too complex to me.

In the meantime, I would rather consider removing the benchmarks and writing the few tests that are actually required with jsbench. These could be linked in the wiki so everybody can use them.

diarmidmackenzie (Contributor, Author) commented:

Sounds like a good solution that should solve the contamination issue.

Mugen87 (Collaborator) commented Dec 17, 2022

Would you or @LeviPesin be interested in migrating the existing six tests in /test/benchmark/core to jsbench? Meaning an individual bench for each test. They could be linked on a new page in the Developer Guide here: https://github.com/mrdoob/three.js/wiki

diarmidmackenzie (Contributor, Author) commented:

Yes, I could look at this. I'd also want to offer an updated version of #25113 on top of this.

diarmidmackenzie (Contributor, Author) commented Dec 20, 2022

EDITED to reflect my updated views on whether this solves the contamination issue (it doesn't).

I have done some experimentation with jsbench.me. It has some advantages, but I'm not convinced it's adequate to *replace* the current benchmarks. Perhaps it could be a useful complement.

Here's an example test, derived from the updateMatrixWorld() example I've been working on:
https://jsbench.me/velbwaakvn/1

What's good about this:

  • It does longer/slower runs, which I suspect gives more accurate results (although the existing THREE.js Benchmarks could presumably be tuned to achieve this).
  • Tests are public, so anyone can access them & tinker with them. Login needed only to save changes to a test.
  • No requirement to clone the THREE.js repo locally to run benchmark tests.

What's not good:

  • Although it re-runs the test setup steps for each test on a page, this does not solve the contamination/ordering issue. That could presumably be solved by having a separate page for every single test - but we could equally apply that solution to the existing THREE.js Benchmarks.
  • Harder to test against local unpublished code. The solution I have been using for this is ngrok, to publish my local build directory on a publicly accessible URL so that I can pull it into the test. This works, but it's fiddlier to set up than the current benchmarks.
  • Performance profiling is harder. I can't pause between test setup & test execution, which means I can't easily get performance profiling data for just the test from Chrome Dev Tools (I expect this is possible with careful setting of breakpoints, but it would take some effort to figure out their correct positioning).
  • Version control of test definitions. Tests can be saved & forked, but we won't have a rich version history like we do with code stored in GitHub.
  • Tests take longer to run than the current benchmarks. This is partly the downside of the tests being more accurate, and partly a consequence of setup being re-run for each individual test.
  • Output data is displayed less compactly. There's no easy way to copy/paste the data from a run into e.g. a GitHub issue, other than via screenshots or similar.

Other notable differences:

  • jsbench.me is based around the assumption that the various tests in a suite are intended for comparison with each other - it reports performance deltas between tests. That doesn't fit the current benchmark tests, which have fast small-population tests alongside slow large-population tests, and where the interesting comparisons are between one run (against version A of the code) and another run (against version B of the code).

Overall, I think jsbench.me is great for use cases where someone wants to compare the performance of a couple of different ways of doing things within THREE.js. Having some samples linked from the wiki for this purpose could be a valuable addition.

I think it's not well-suited to a master set of THREE.js benchmarks, which can be used to assess performance of the library itself, and the likely impact of changes to THREE.js on delivered performance.

For this purpose, I think it would be better to build on & refine the existing Benchmark tests.

diarmidmackenzie (Contributor, Author) commented Dec 20, 2022

FYI, I have also been looking at the benchmark.js code, and I now understand the comment about contamination prevention.

"we do try to keep each sample as free from previous samples hot code contamination. We do this by keeping variables, properties, values unique in each compiled test sample."

This is true for the code that appears directly in the function declared to benchmark.js, e.g.

    function () {

        rootD.updateMatrixWorld( true );

    }

But it won't apply to the THREE.js library code that gets invoked. That code is compiled once, is optimized based on prior execution, and can only be reset via a browser refresh.
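To make that concrete, here's a sketch of why the uniquifying doesn't help (rootA is an illustrative counterpart to rootD, not code from the PR):

    // Each compiled sample gets fresh local names, but both samples call the
    // SAME shared function object, Object3D.prototype.updateMatrixWorld,
    // whose V8 type feedback accumulates across tests.

    // Test D (runs first, polymorphic scene graph):
    function () {
        rootD.updateMatrixWorld( true ); // feeds mixed Object3D subclasses in
    }

    // Test A (runs second, monomorphic scene graph; "rootA" is illustrative):
    function () {
        rootA.updateMatrixWorld( true ); // same prototype method - already
                                         // marked polymorphic by test D
    }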

diarmidmackenzie (Contributor, Author) commented:

I wonder whether running the benchmark inside an iframe would solve the contamination issue?

Output data could be aggregated from across multiple iFrames into a single top-level display.
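Something along these lines, perhaps (benchmark.html, the test names, and displayResults are all illustrative, not existing files/functions):

    // Parent page: run each test in its own iframe so each gets a fresh
    // JS realm, then collect results via postMessage.
    const tests = [ 'testA', 'testB', 'testC', 'testD' ]; // illustrative names
    const results = [];
    let iframe = null;

    function runNext() {
        if ( iframe ) iframe.remove();
        if ( results.length === tests.length ) {
            displayResults( results ); // placeholder helper
            return;
        }
        iframe = document.createElement( 'iframe' );
        iframe.src = 'benchmark.html?test=' + tests[ results.length ];
        document.body.appendChild( iframe );
    }

    window.addEventListener( 'message', ( event ) => {
        results.push( event.data ); // e.g. { test: 'testA', hz: 1234 }
        runNext();
    } );

    runNext();

    // Inside benchmark.html, once its single test completes:
    // parent.postMessage( { test: testName, hz: bench.hz }, '*' );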

epreston (Contributor) commented Feb 5, 2023

Tests interfering with each other is because "var" is module scope. Either block-scope code with "let" and "const", or you have these issues. Swap tests around and "var"-declared variables don't have the same state.
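Something like this (the scene-building helper names are made up to illustrate):

    // Both test setups run in the same script scope, so the second "var root"
    // silently rebinds the first - test A ends up measuring test D's scene.

    // Test A setup:
    var root = buildMonomorphicScene(); // illustrative helper

    // Test D setup (same scope):
    var root = buildPolymorphicScene(); // illustrative helper - overwrites "root"

    // Block scoping at least surfaces the collision:
    // let root = ...;  // SyntaxError: Identifier 'root' has already been declared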

LeviPesin (Contributor) commented:

I think this issue is about contamination of another sort -- that the browser's JS engine could, for example, not have one function in its cache in one test run, and already have it there in another.

epreston (Contributor) commented Feb 5, 2023

It really is vars of the same name being module-scoped and manipulated. I just spent an hour or two with the bench code. There's no cleanup after each test, so you can swap them around and see.

Mugen87 closed this as completed Feb 6, 2023

Mugen87 (Collaborator) commented Feb 6, 2023

Closing, see #25434 (comment).
