Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Apple silicon performance workaround as this has been fixed upstream #1337

Closed
jackryanservia opened this issue Dec 19, 2023 · 3 comments · Fixed by #1456
Closed

Remove Apple silicon performance workaround as this has been fixed upstream #1337

jackryanservia opened this issue Dec 19, 2023 · 3 comments · Fixed by #1456
Assignees
Labels
compatibility performance v1 Prerequisite for o1js v1.0

Comments

@jackryanservia
Copy link
Contributor

jackryanservia commented Dec 19, 2023

Earlier this year, we found an issue (#491) where o1js performance was much slower on Apple silicon machines than would be expected and fixed it with a workaround in #683.

The Chromium team identified the issue in 1228686. It was the result of V8 not using LSE instructions even when the underlying hardware supported them and has since been fixed.

Given the finding below, it makes sense to remove the workaround we applied in #683, as doing so would dramatically improve performance on Apple silicon machines.

@jackryanservia
Copy link
Contributor Author

jackryanservia commented Dec 19, 2023

Running src/examples/crypto/ecdsa/run.ts with the current configuration takes 108 seconds to compile and 41 seconds to prove. With getEfficientNumWorkers disabled, it takes 16 seconds to compile and 28 seconds to prove! 😮

Profiling with getEfficientNumWorkers reveals that almost no ticks are used on rdl_dealloc and rdl_alloc:

Preworkaround profile:
ticks  total  nonlib   name
...
79876  41.8%  41.8%    JS: *__rdl_dealloc
...
65241  34.1%  34.1%    JS: *__rdl_alloc
...
Node.js 20 profile with getEfficientNumWorkers disabled:
ticks  total  nonlib   name
...
1      0.0%   0.0%     JS: *__rdl_dealloc
...
1      0.0%   0.0%     JS: *__rdl_alloc
...

This means removing getEfficientNumWorkers will substantially improve performance on Apple silicon, and we should do that! 😸

@mitschabaude
Copy link
Member

Fantastic news, thanks for investigating!!

@nicc nicc changed the title Investigate if Apple silicon performance issue is fixed upstream Remove Apple silicon performance workaround as this has been fixed upstream Jan 4, 2024
@garwalsh garwalsh added the v1 Prerequisite for o1js v1.0 label Jan 30, 2024
@MartinMinkov MartinMinkov reopened this Feb 23, 2024
@dfstio
Copy link

dfstio commented Mar 6, 2024

See also #1456 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compatibility performance v1 Prerequisite for o1js v1.0
Projects
None yet
5 participants