benchmark: various improvements to benchmark #3604
Conversation
We execute benchmarked code multiple times in a cycle to improve accuracy, but this also triggers GC runs that significantly affect accuracy. To solve this, we now allocate 1GB of heap specifically for new objects. We also trace GC to ensure all our benchmarks fit into that 1GB.
If a context switch happens during a benchmark, it affects measurements in a significant way, so we simply reject the affected measurements.
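The PR doesn't spell out the detection mechanism here, but one common approach (shown below as a hypothetical sketch, not the actual implementation) is to compare wall-clock time against CPU time for each sample: if the process was descheduled mid-sample, wall time noticeably exceeds CPU time, and the sample can be discarded.

```javascript
// Hypothetical context-switch filter: reject a sample if wall-clock time
// exceeds the CPU time the process actually consumed by more than 10%.
function measureSample(fn) {
  const cpuStart = process.cpuUsage();
  const wallStart = process.hrtime.bigint();
  fn();
  const wallNs = Number(process.hrtime.bigint() - wallStart);
  const cpu = process.cpuUsage(cpuStart); // microseconds since cpuStart
  const cpuNs = (cpu.user + cpu.system) * 1000;

  // A large wall/CPU gap suggests the OS scheduled another process in
  // the middle of the sample; return null to signal a rejected sample.
  return wallNs <= cpuNs * 1.1 ? wallNs : null;
}

const sample = measureSample(() => {
  let x = 0;
  for (let i = 0; i < 1e6; i++) x += i;
});
console.log(sample === null ? 'rejected' : `kept: ${sample}ns`);
```

The 10% threshold is an arbitrary illustration; a real filter would tune it to the platform's timer resolution.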
The whole benchmark script is synchronous anyway, because we can't measure stuff in parallel, but we were dealing with async/await/Promise just because we used IPC to communicate with the spawned subprocess. Now we switch to 'pipe' + JSON.stringify/JSON.parse for communication, which simplifies the source code a lot.
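A sketch of the idea (the function and payload names here are illustrative, not from the PR): with `spawnSync` the parent can write a JSON payload to the child's stdin pipe and read a JSON reply from its stdout, with no async IPC channel involved.

```javascript
// Fully synchronous parent<->child round trip over plain pipes.
const { spawnSync } = require('child_process');

function runInSubprocess(payload) {
  const child = spawnSync(
    process.execPath,
    [
      '-e',
      // Hypothetical worker: read all of stdin, parse it, and echo it back.
      `const input = JSON.parse(require('fs').readFileSync(0, 'utf8'));
       process.stdout.write(JSON.stringify({ echoed: input }));`,
    ],
    { input: JSON.stringify(payload), encoding: 'utf8' },
  );
  return JSON.parse(child.stdout);
}

console.log(runInSubprocess({ name: 'parse-benchmark' }));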
…it into comment Currently the benchmark report is posted directly to the comment, but because of terminal color codes it looks unreadable. Sadly, GitHub doesn't provide any mechanism to add colors to comment text (even in code blocks). The best solution I can think of is to link to GH's own logs to show the benchmark report.
So it seems like this PR broke benchmarking? Perhaps by making it overly restrictive in terms of discarded samples? Or at least that's my understanding.
If you find one. AFAIK, all big projects have their own custom benchmarks.
There is some tooling for compiled languages that allows you to get super-reliable measurements. Moreover, involuntary context switching is an external source of noise for your measurements.
If it will block someone. So I would say it now gives reliable numbers when it works; previously it just gave unreliable numbers.
@yaacovCR The above comment was intended to explain why benchmarking is hard, not to discourage you from looking at existing 3rd-party solutions.
I think we are not too concerned about "clean room" benchmarking, but rather about real-world benchmarking, and if context switching introduces a level of noise beyond which the benchmark can't reliably distinguish, I'm not sure the measurable clean-room difference would be meaningful. It seems to me that the "solution", as it were, would be to increase the number of runs and randomize their order. Perhaps this change could be reverted for CI and enabled only locally, so CI would at least work even if the variability there is higher than it is locally.
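The suggestion above can be sketched as follows (benchmark names and round count are hypothetical): instead of running each benchmark's samples back to back, build a schedule that interleaves the candidates in random order each round, so environmental noise spreads evenly across all of them rather than biasing whichever one happened to run during a noisy period.

```javascript
// Fisher-Yates shuffle: unbiased random permutation of an array.
function shuffle(items) {
  const arr = items.slice();
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

const ROUNDS = 3; // hypothetical number of interleaved rounds
const benchmarks = ['parse', 'validate', 'execute']; // hypothetical names

// Each round runs every benchmark once, in a fresh random order.
const schedule = [];
for (let round = 0; round < ROUNDS; round++) {
  schedule.push(...shuffle(benchmarks));
}
console.log(schedule);
```

Randomizing within rounds (rather than shuffling one flat list) guarantees each benchmark gets the same number of runs while still decorrelating run order from noise.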
This is a stacked PR; for descriptions of the various changes, please look at the descriptions of the underlying commits.