
Order of benchmarks seems to bias results #864

Closed
sinclairzx81 opened this issue May 15, 2022 · 13 comments · Fixed by #866
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@sinclairzx81
Contributor

sinclairzx81 commented May 15, 2022

Hi, thanks for this project.

I'm currently looking at submitting a new runtime type checker for comparative performance benchmarks, but have noticed that the order in which each validator is run seems to degrade subsequent benchmark results. By moving the performance tests around, I'm able to greatly improve a test's results simply by running it first.

I'm thinking this is likely due to some of the validators incurring more memory overhead (implicating the GC), or to V8's optimization heuristics breaking down and causing it to take the slow path (which is quite likely, given that validation routines may require dynamic property access on objects to complete). To rule out the GC, I started the project with npm run compile:spectypes && node --expose-gc -r ts-node/register ./index.ts and called global.gc() to force collection prior to the tests running, but no luck. So the problem is most likely related to V8 optimization heuristics.
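
For reference, the GC forcing I tried looked roughly like the sketch below (global.gc is only defined when Node is started with --expose-gc, and the forceGc helper name is just for illustration).

// sketch: force a full collection before a benchmark starts
// (requires: node --expose-gc -r ts-node/register ./index.ts)
const gc = (globalThis as { gc?: () => void }).gc

function forceGc(): void {
  // collect garbage left over from earlier benchmarks so it
  // doesn't bias the next one
  if (gc) gc()
}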

To give all tests an even footing, I'm wondering if it would be possible to reshape things such that each test is run within its own Node process (rather than running all tests in the same process). This would give each a fresh instance in which to do its thing, and should yield more accurate results.

Open to thoughts if this is something worth looking into.

@moltar
Owner

moltar commented May 15, 2022

Hey @sinclairzx81, thank you for your feedback.

This is certainly a serious issue.

I am not an expert on Node internals or how GC works, beyond the CS 201 type of information :)

Do you have any proposals that are specific to this repo?

I had an idea to run every test in a separate GitHub Action job, which would provide the isolation, but then I was worried about the jobs being placed on different machines with different specs.

Running them in separate processes on the same machine sounds like a good middle ground.

Wondering what it would take to do so, given our current setup.

@moltar
Owner

moltar commented May 15, 2022

Is there anything, perhaps, in the settings of benny we could tweak to equalize the testing environment?

Maybe @caderek can recommend something?

@moltar added the bug and help wanted labels May 15, 2022
@sinclairzx81
Contributor Author

@moltar Hi, thanks for the quick reply.

I agree that each test should be run on the same machine (not distributed across different machines on CI). My initial thought was to just run each test in its own V8 isolate. This could be achieved a couple of ways, but perhaps the easiest is a CLI command that is able to target an individual test. This would enable something like the following in package.json.

// runs spectypes only
{
   "scripts": {
      "start": "npm run compile:spectypes && ts-node index.ts --test spectypes"
   }
}

So, to run the complete test suite, you could just chain each test onto the start script...

// runs everything
{
   "scripts": {
      "start": "npm run compile:spectypes && ts-node index.ts --test spectypes && ts-node index.ts --test zod && ... (omitted)"
   }
}

As the start command would likely get prohibitively long, Google's zx package could be used to write a JavaScript-based shell script that runs each test and awaits it in turn.

// runner.mjs

// build spectypes
await $`npm run compile:spectypes`

// load cases somehow, and execute in series (each test runs as a separate OS process);
// note that `case` is a reserved word, so the loop variable needs another name
for (const testCase of loadCases()) {
  await $`ts-node index.ts --test ${testCase}`
}

$ npx zx runner.mjs
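
For illustration, loadCases() could be as simple as reading case names from a cases directory (the helper and the ./cases layout are assumptions for the sketch, not necessarily the repo's actual structure).

// runner.mjs (continued) — hypothetical loadCases() implementation
import { readdirSync } from 'node:fs'

function loadCases() {
  // treat each file in ./cases as one test case, named after the file
  return readdirSync('./cases')
    .filter((file) => file.endsWith('.ts'))
    .map((file) => file.replace(/\.ts$/, ''))
}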

I had a quick look through the code; I guess the only complication that might arise is handling the aggregate results (writing benchmark output to node1x.json, which gets sourced in the UI later on). Not sure if there's going to be an issue with each test running in isolation.

Open to thoughts.
Cheers
S

@hoeck
Collaborator

hoeck commented May 16, 2022

run on the same machine

👍

a CLI command that is able to target an individual test

👍

I had a quick look through the code; I guess the only complication that might arise is handling the aggregate results (writing benchmark output to node1x.json, which gets sourced in the UI later on). Not sure if there's going to be an issue with each test running in isolation.

The results JSON format is pretty simple: one object in an array for each benchmark/module combination. That can easily be rewritten so that every isolated benchmark run appends its individual result to an already-existing results JSON file. Benchmarks will obviously not run in parallel, so there won't be any file transaction issues.
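
Roughly like the sketch below (the record shape and the node1x.json name are illustrative; the real format in the repo may differ).

// sketch: merge one isolated run's results into the shared results file
import { existsSync, readFileSync, writeFileSync } from 'node:fs'

// hypothetical record shape: one entry per benchmark/module combination
interface BenchmarkResult {
  benchmark: string
  name: string // module name, e.g. 'zod'
  ops: number // operations per second
  margin: number
}

function appendResults(file: string, results: BenchmarkResult[]): void {
  const existing: BenchmarkResult[] = existsSync(file)
    ? JSON.parse(readFileSync(file, 'utf8'))
    : []
  writeFileSync(file, JSON.stringify([...existing, ...results], null, 2))
}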

Two more thoughts/questions that come to my mind are:

  1. Should we run all benchmarks of a module (assertString, parseSafe etc) in the same process or isolate that too?
  2. Should only the chosen module be loaded or is it okay to load all modules (zod, io-ts, ...) even though only a single one is benchmarked?

@sinclairzx81
Contributor Author

sinclairzx81 commented May 16, 2022

@hoeck Hi!

Should we run all benchmarks of a module (assertString, parseSafe etc) in the same process or isolate that too?

This is a good question. I'm thinking the best way to orchestrate these tests might be to organize their execution by package (vs by test). In this regard, a process would start for a given validation package (e.g. myzod), execute all the tests for that package, then exit. The next package would run immediately thereafter (in its own process).

The thinking here is that, if performance degrades during the tests run for a single package, that's generally going to be a consequence of that package throttling V8 in some way. Package authors may be able to identify bottlenecks in throughput if they know the sequence in which the tests were run, and if the results show clear degradation across the tests run in sequence.
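
In terms of the runner sketch above, that would be something like the following (the package names are just examples).

// runner.mjs — one OS process per validation package,
// running all of that package's tests before the next package starts
const packages = ['zod', 'myzod', 'io-ts'] // ...and the rest

for (const pkg of packages) {
  await $`ts-node index.ts --test ${pkg}`
}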

Should only the chosen module be loaded or is it okay to load all modules (zod, io-ts, ...) even though only a single one is benchmarked?

It might be a good idea to omit packages from import if they are not used in the test. It's probably going to be fine either way, but there is the potential for packages to cache acceleration data structures on import, and such caching could bias the results.
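
If it does turn out to matter, a dynamic import keyed on the selected package would avoid loading the others. Something like the sketch below (the ./cases path and default export are guesses at the layout, not the repo's actual structure).

// sketch: load only the module under test instead of importing every validator up front
async function loadCase(name: string) {
  const mod = await import(`./cases/${name}`)
  return mod.default
}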

I should note, in the testing I've done so far, I have been importing all the tests in the suite and haven't experienced a problem with them all being there. Degradation seems more tied to the order of execution than to the importing of these packages.

Hope this helps!
S

@hoeck
Collaborator

hoeck commented May 16, 2022

@sinclairzx81 @moltar or anyone else volunteering for this? Otherwise I'm going to try to implement @sinclairzx81's suggestions.

@sinclairzx81
Contributor Author

@hoeck Hi, that's great news :) I wasn't planning on undertaking these updates myself (as that work is best left to those doing the benchmarking), but I can assist with the PR process by running some local tests and providing feedback prior to merge if that's helpful.

Keep me posted!

@moltar
Owner

moltar commented May 16, 2022

Otherwise I'm going to try to implement

That would be great! 👍🏼 😁

@hoeck
Collaborator

hoeck commented May 17, 2022

@sinclairzx81 could you please try the changes from my PR?

When you execute npm start or npm run start or ./start.sh, the benchmarks for each module now run in their own process and the results are aggregated.

You can also do npm run start run zod myzod to benchmark only the listed modules.

And most importantly, does the ordering bias go away or does it persist?

@sinclairzx81
Contributor Author

@hoeck Hi, sure. I just ran your branch and it looks like that did the trick. It seems many of the libraries are giving much better results across the board now, with some crossing the 1 million operations per second mark, which is nice to see! Hopefully the updates weren't too bad to implement. Also, +1 for adding the ability to run individual cases, very useful!

Good work mate!
S

@hoeck
Collaborator

hoeck commented May 17, 2022

Thx for the feedback @sinclairzx81 😊 and I am glad that this removed the result bias. I will fix up the linter issues with the PR tomorrow so that we can merge it.

@moltar
Owner

moltar commented May 17, 2022

@hoeck amazing work, as usual! 👍🏼

I'm wondering if we should temporarily add a notice banner at the top explaining this change, maybe with a link to this issue for details?

@hoeck
Collaborator

hoeck commented May 17, 2022

I'm wondering if we should temporarily add a notice banner at the top explaining this change, maybe with a link to this issue for details?

That makes sense, I'll add a simple info message on top of the readme.
