
benchmark: various improvements to benchmark #3604

Merged

merged 4 commits into graphql:main from pr_branch2 on May 25, 2022

Conversation

IvanGoncharov
Member

This is a stacked PR; for descriptions of the various changes, please look at the descriptions of the underlying commits.

We execute benchmarked code multiple times in a cycle to improve
accuracy, but this also triggers GC, which significantly
affects accuracy. To solve this, we now allocate 1GB of heap specifically
for new objects. We also trace GC to ensure all our benchmarks fit into
1GB.
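
For illustration, a minimal sketch of how a benchmark runner might do this, using Node's documented V8 flags `--max-semi-space-size` (sized in MB) and `--trace-gc`; the exact flag value and the `benchmark-worker.js` file name are assumptions, not necessarily the setup from this PR:

```typescript
import { spawnSync } from 'node:child_process';

// Spawn the benchmark in a child process with a large "new space" so
// short-lived benchmark allocations don't trigger a scavenge, and with
// GC tracing on so any GC that does happen shows up in the output.
const child = spawnSync(
  process.execPath,
  [
    '--max-semi-space-size=1024', // MB per semi-space; assumed value
    '--trace-gc', // print one line per GC event
    'benchmark-worker.js', // hypothetical worker script
  ],
  { encoding: 'utf-8' },
);

// If GC trace lines appear, the benchmark overflowed the reserved heap.
console.log(child.stdout);
console.error(child.stderr);
```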
If a context switch happens during a benchmark, it affects measurements in a
significant way, so we simply reject the affected measurements.
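
A sketch of the rejection idea, using `process.resourceUsage()` (available since Node 12.6); this illustrates the technique, not necessarily the PR's exact implementation:

```typescript
// Measure one sample; return null if the OS preempted us mid-run.
function measureSample(fn: () => void): number | null {
  const before = process.resourceUsage().involuntaryContextSwitches;
  const start = process.hrtime.bigint();
  fn();
  const durationNs = process.hrtime.bigint() - start;
  const after = process.resourceUsage().involuntaryContextSwitches;

  // A context switch adds unpredictable latency and evicts CPU caches,
  // so the sample is polluted; drop it rather than try to correct it.
  return after === before ? Number(durationNs) : null;
}
```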
The whole benchmark script is synchronous anyway because we can't measure
stuff in parallel, but we were dealing with async/await/Promise just because we
used IPC to communicate with the spawned subprocess.
Now we switch to 'pipe' + JSON.stringify/JSON.parse for communication
and can simplify the source code a lot.
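
A sketch of the 'pipe' + JSON approach, with a hypothetical `benchmark-worker.js` that prints a single JSON object to stdout:

```typescript
import { spawnSync } from 'node:child_process';

interface WorkerResult {
  name: string;
  samples: ReadonlyArray<number>;
}

// Fully synchronous: no IPC channel, no async/await, no Promises.
function runWorker(testName: string): WorkerResult {
  const child = spawnSync(process.execPath, ['benchmark-worker.js', testName], {
    stdio: ['inherit', 'pipe', 'inherit'], // capture only stdout
    encoding: 'utf-8',
  });
  return JSON.parse(child.stdout);
}
```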
@netlify

netlify bot commented May 24, 2022

Deploy Preview for compassionate-pike-271cb3 ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 56194c3 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/compassionate-pike-271cb3/deploys/628cf5d442f93a000abb0951 |
| 😎 Deploy Preview | https://deploy-preview-3604--compassionate-pike-271cb3.netlify.app |

@github-actions

Hi @IvanGoncharov, I'm the @github-actions bot, happy to help you with this PR 👋

Supported commands

Please post these commands in separate comments, only one command per comment:

  • @github-actions run-benchmark - Run benchmark comparing base and merge commits for this PR
  • @github-actions publish-pr-on-npm - Build package from this PR and publish it on NPM

…it into comment

Currently the benchmark report is posted directly to a comment, but because of
terminal color codes it looks unreadable. Sadly, GitHub doesn't provide
any mechanism to add colors to comment text (even in code blocks).
The best solution I can think of is to link to GitHub's own logs to show the
benchmark report.
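
For context, colored output wraps text in ANSI escape sequences, which a GitHub comment renders as literal noise. Stripping them is possible (sketch below) but loses the colors entirely, which is why linking to the logs is preferable; this helper is illustrative only and not part of the PR:

```typescript
// '\u001B[32m...\u001B[0m' is how a terminal is told to print green text;
// pasted into a comment it shows up as garbage around the real content.
const colored = '\u001B[32m✓ execute: +1.8%\u001B[0m';

// Remove color/style escape sequences, leaving plain text.
const stripAnsi = (s: string): string => s.replace(/\u001B\[[0-9;]*m/g, '');

console.log(stripAnsi(colored)); // '✓ execute: +1.8%'
```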
@IvanGoncharov IvanGoncharov merged commit 4f8864e into graphql:main May 25, 2022
@IvanGoncharov IvanGoncharov deleted the pr_branch2 branch May 25, 2022 09:08
@yaacovCR
Contributor

> Sadly, I need to figure out a way to run benchmark reliably on CI after #3604

So it seems like this PR broke benchmarking? Perhaps by making it overly restrictive in terms of discarded samples? Or, at least that's my understanding.

  1. Should we revert at least until fixed?
  2. Might we consider a 3rd party benchmarking solution as an additional dev-dependency so that we can outsource some of the benchmarking work?
  3. How do other libraries control for context-switching? Do they just increase the number of runs, perhaps randomizing the order of the runs when doing A/B tests?

@IvanGoncharov
Member Author

> Might we consider a 3rd party benchmarking solution as an additional dev-dependency so that we can outsource some of the benchmarking work?

If you find one.
Everything I tried was really bad; no lib can reliably measure a 1-2% difference.
The numbers differ a lot between runs.
It's easy to measure the performance of some microbenchmarks; it's hard to get repeatable results on functions like execute, buildSchema, etc.

AFAIK, all big projects have their own custom benchmarks.

> How do other libraries control for context-switching? Do they just increase the number of runs, perhaps randomizing the order of the runs when doing A/B tests?

There is some tech for compiled languages that allows you to get super reliable measurements.
But I don't know of anything that can do this for JS.

Moreover, involuntary context switching is an external source of noise for your measurement.
Plus, it's totally unpredictable how long it will take, and it's unpredictable how much CPU cache will be lost during the context switch.
So I don't think it's possible to account for this with purely statistical methods.
For example, if you run the benchmark from before this PR with YouTube playing in the background, you will see how severely it affects measurements.

> Should we revert at least until fixed?

Only if it will block someone.
You can still run the benchmark locally: if your computer is not doing anything else, you get a reliable benchmark.
I made this change specifically to measure the effect of AsyncIterable and other changes to the executor.
Without this change, I can't reliably measure the effects of those changes even locally.

So I would say it now gives reliable numbers when it works; previously it just gave unreliable numbers.

@IvanGoncharov
Member Author

@yaacovCR The above comment was intended to explain why benchmarking is hard, not to discourage you from looking at existing 3rd-party solutions.
It would be great if you looked into what is available in the JS ecosystem right now,
especially since I did my own research many years ago.

@yaacovCR
Contributor

I think we are not too concerned about "clean room" benchmarking, but rather about real-world benchmarking. If context switching introduces a level of noise beyond which the benchmark can't reliably distinguish, I'm not sure the measurable clean-room difference would be meaningful. It seems to me that the "solution", as it were, would be just to increase the number of runs and randomize the order, as in the sketch below.
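
A sketch of that idea, with hypothetical `runBase`/`runPr` callbacks that each return one timing sample; randomizing the per-round order means slow drift in machine load hits both variants roughly equally:

```typescript
function collectSamples(
  runBase: () => number, // one timing sample from the base commit
  runPr: () => number, // one timing sample from the PR commit
  rounds: number,
): { base: number[]; pr: number[] } {
  const base: number[] = [];
  const pr: number[] = [];
  for (let i = 0; i < rounds; i++) {
    // Flip a coin to decide which variant runs first this round.
    if (Math.random() < 0.5) {
      base.push(runBase());
      pr.push(runPr());
    } else {
      pr.push(runPr());
      base.push(runBase());
    }
  }
  return { base, pr };
}
```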

Perhaps this change could anyway be reverted for CI but kept enabled locally, so that CI would at least work even if its variability is higher than local runs.
