
benchmark: various improvements to benchmark #3604

Merged

merged 4 commits into graphql:main from pr_branch2 on May 25, 2022

Conversation

IvanGoncharov
Member

This is a stacked PR; for descriptions of the various changes, please look at the descriptions of the underlying commits.

We execute benchmarked code multiple times in a cycle to improve
accuracy, but this also triggers GC, which significantly
affects accuracy. To solve this, we now allocate 1GB of heap specifically
for new objects. We also trace GC to ensure all our benchmarks fit into
1GB.
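
For illustration, a minimal sketch of how a benchmark runner might do this, using Node's documented V8 flags `--max-semi-space-size` (sized in MB) and `--trace-gc`; the exact flag value and the `benchmark-worker.js` file name are assumptions, not necessarily the setup from this PR:

```typescript
import { spawnSync } from 'node:child_process';

// Spawn the benchmark in a child process with a large "new space" so
// short-lived benchmark allocations don't trigger a scavenge, and with
// GC tracing on so any GC that does happen shows up in the output.
const child = spawnSync(
  process.execPath,
  [
    '--max-semi-space-size=1024', // MB per semi-space; assumed value
    '--trace-gc', // print one line per GC event
    'benchmark-worker.js', // hypothetical worker script
  ],
  { encoding: 'utf-8' },
);

// If GC trace lines appear, the benchmark overflowed the reserved heap.
console.log(child.stdout);
console.error(child.stderr);
```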
If a context switch happens during a benchmark, it affects measurements in a
significant way, so we simply reject the affected measurements.
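
A sketch of the rejection idea, using `process.resourceUsage()` (available since Node 12.6); this illustrates the technique, not necessarily the PR's exact implementation:

```typescript
// Measure one sample; return null if the OS preempted us mid-run.
function measureSample(fn: () => void): number | null {
  const before = process.resourceUsage().involuntaryContextSwitches;
  const start = process.hrtime.bigint();
  fn();
  const durationNs = process.hrtime.bigint() - start;
  const after = process.resourceUsage().involuntaryContextSwitches;

  // A context switch adds unpredictable latency and evicts CPU caches,
  // so the sample is polluted; drop it rather than try to correct it.
  return after === before ? Number(durationNs) : null;
}
```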
The whole benchmark script is synchronous anyway because we can't measure
stuff in parallel, but we were dealing with async/await/Promise just because we
used IPC to communicate with the spawned subprocess.
Now we switch to 'pipe' + JSON.stringify/JSON.parse for communication
and can simplify the source code a lot.
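
A sketch of the 'pipe' + JSON approach, with a hypothetical `benchmark-worker.js` that prints a single JSON object to stdout:

```typescript
import { spawnSync } from 'node:child_process';

interface WorkerResult {
  name: string;
  samples: ReadonlyArray<number>;
}

// Fully synchronous: no IPC channel, no async/await, no Promises.
function runWorker(testName: string): WorkerResult {
  const child = spawnSync(process.execPath, ['benchmark-worker.js', testName], {
    stdio: ['inherit', 'pipe', 'inherit'], // capture only stdout
    encoding: 'utf-8',
  });
  return JSON.parse(child.stdout);
}
```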
@netlify

netlify bot commented May 24, 2022

Deploy Preview for compassionate-pike-271cb3 ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 56194c3 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/compassionate-pike-271cb3/deploys/628cf5d442f93a000abb0951 |
| 😎 Deploy Preview | https://deploy-preview-3604--compassionate-pike-271cb3.netlify.app |

@github-actions

Hi @IvanGoncharov, I'm the @github-actions bot, happy to help you with this PR 👋

Supported commands

Please post these commands in separate comments, only one command per comment:

  • @github-actions run-benchmark - Run benchmark comparing base and merge commits for this PR
  • @github-actions publish-pr-on-npm - Build package from this PR and publish it on NPM

…it into comment

Currently the benchmark report is posted directly to a comment, but because of
terminal color codes it looks unreadable. Sadly, GitHub doesn't provide
any mechanism to add colors to comment text (even in code blocks).
The best solution I can think of is to link to GitHub's own logs to show the
benchmark report.
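
For context, colored output wraps text in ANSI escape sequences, which a GitHub comment renders as literal noise. Stripping them is possible (sketch below) but loses the colors entirely, which is why linking to the logs is preferable; this helper is illustrative only and not part of the PR:

```typescript
// '\u001B[32m...\u001B[0m' is how a terminal is told to print green text;
// pasted into a comment it shows up as garbage around the real content.
const colored = '\u001B[32m✓ execute: +1.8%\u001B[0m';

// Remove color/style escape sequences, leaving plain text.
const stripAnsi = (s: string): string => s.replace(/\u001B\[[0-9;]*m/g, '');

console.log(stripAnsi(colored)); // '✓ execute: +1.8%'
```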
@IvanGoncharov IvanGoncharov merged commit 4f8864e into graphql:main May 25, 2022
@IvanGoncharov IvanGoncharov deleted the pr_branch2 branch May 25, 2022 09:08
@yaacovCR
Contributor

> Sadly, I need to figure out a way to run benchmark reliably on CI after #3604

So it seems like this PR broke benchmarking? Perhaps by making it overly restrictive in terms of discarded samples? Or, at least that's my understanding.

  1. Should we revert at least until fixed?
  2. Might we consider a 3rd party benchmarking solution as an additional dev-dependency so that we can outsource some of the benchmarking work?
  3. How do other libraries control for context-switching? Do they just increase the number of runs, perhaps randomizing the order of the runs when doing A/B tests?

@IvanGoncharov
Member Author

> Might we consider a 3rd party benchmarking solution as an additional dev-dependency so that we can outsource some of the benchmarking work?

If you find one.
Everything I tried was really bad; no lib can reliably measure a 1-2% difference.
The numbers differ a lot between runs.
It's easy to measure the performance of some microbenchmarks; it's hard to get repeatable results on functions like execute, buildSchema, etc.

AFAIK, all big projects have their own custom benchmarks.

> How do other libraries control for context-switching? Do they just increase the number of runs, perhaps randomizing the order of the runs when doing A/B tests?

There is some tech for compiled languages that allows you to get super reliable measurements.
But I don't know of anything that can do this for JS.

Moreover, involuntary context switching is an external source of noise for your measurement.
Plus, it's totally unpredictable how long it will take, and it's unpredictable how much CPU cache will be lost during the context switch.
So I don't think it's possible to account for this with purely statistical methods.
For example, if you run the benchmark from before this PR with YouTube playing in the background, you will see how severely it affects measurements.

> Should we revert at least until fixed?

Only if it will block someone.
You can still run the benchmark locally: if your computer is not doing anything else, you get a reliable benchmark.
I made this change specifically to measure the effect of AsyncIterable and other changes to the executor.
Without this change, I can't reliably measure the effects of those changes even locally.

So I would say it now gives reliable numbers when it works; previously it just gave unreliable numbers.

@IvanGoncharov
Member Author

@yaacovCR The above comment was intended to explain why benchmarking is hard, not to discourage you from looking at existing 3rd-party solutions.
It would be great if you looked into what is available in the JS ecosystem right now,
especially since I did my own research many years ago.

@yaacovCR
Contributor

I think we are not too concerned about "clean room" benchmarking, but rather about real-world benchmarking. If context switching introduces a level of noise beyond which the benchmark can't reliably distinguish, I'm not sure the measurable clean-room difference would be meaningful. It seems to me that the "solution", as it were, would be just to increase the number of runs and randomize the order, as in the sketch below.
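
A sketch of that idea, with hypothetical `runBase`/`runPr` callbacks that each return one timing sample; randomizing the per-round order means slow drift in machine load hits both variants roughly equally:

```typescript
function collectSamples(
  runBase: () => number, // one timing sample from the base commit
  runPr: () => number, // one timing sample from the PR commit
  rounds: number,
): { base: number[]; pr: number[] } {
  const base: number[] = [];
  const pr: number[] = [];
  for (let i = 0; i < rounds; i++) {
    // Flip a coin to decide which variant runs first this round.
    if (Math.random() < 0.5) {
      base.push(runBase());
      pr.push(runPr());
    } else {
      pr.push(runPr());
      base.push(runBase());
    }
  }
  return { base, pr };
}
```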

Perhaps this change could anyway be reverted for CI but kept enabled locally, so that CI would at least work even if its variability is higher than local runs.
