Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize argon2 running speed (WebAssembly + C) #67

Open
polymorpher opened this issue Aug 6, 2021 · 2 comments
Open

Optimize argon2 running speed (WebAssembly + C) #67

polymorpher opened this issue Aug 6, 2021 · 2 comments

Comments

@polymorpher
Copy link
Owner

polymorpher commented Aug 6, 2021

We need to improve the speed of argon2 so that users won't need to wait for too long to create a wallet (i.e. generate 1M leaves). The wait time is ideally <30 seconds, and should definitely be no more than 1 minute. Based on experimentation with parameter settings with equivalent difficulty but without generating hashes in batch, a running time at ~10 seconds should be achievable in a single thread. See also my notes in #64 and #65 and the quoted text below. To summarize, low hanging fruit techniques are

  • parallelization (worker + memory transfer) (some helpful notes: https://web.dev/webassembly-threads/ )
  • reduce repetitive memory operations (in C)
  • align bytes with hash block size

We can also utilize https://github.com/gpujs/gpu.js if these techniques are insufficient, but it should be very unlikely. It could also be a lot of work (to maintain) and should not be implemented unless necessary.

It should also be noted that the implementation relies on a customized Argon2 that made it more efficient for computing hashes in batch: https://github.com/polymorpher/argon2-browser. Note that there is still a lot of room for improvement in the customized version. For example, right now each hash computation still triggers a full initialization and destruction sequence in the underlying C code, which involves a large number of malloc and free. They can be improved by doing malloc and free only once for the entire batch. The javascript code could also parallelize the workload and obtain at least 2-4x speed boost for today's typical hardware. A deeper dive into the C code may also allow us to remove salt and align each input to a single 32-byte word, and optimize the code further for 32-byte blocks of input. With these improvements, it can be expected that the randomness bits can be improved to ~20 with argon2, i.e. close to ~1M as originally proposed in Client Security.

Other than these low hanging fruits, we can identify further bottleneck by comprehensively profiling and benchmarking the C code.

@polymorpher
Copy link
Owner Author

Rather than doing this, an alternative is to add a replace-root function call to the smart contract. We already planned to do it anyway, to allow extending the lifespan of an expiring wallet. Rather than generating 1M leafs, we can generate a wallet with a shorter lifespan (~1-3 months), and replaces root only when it is needed. Another benefit of doing this is we need much less storage space at the client.

@polymorpher
Copy link
Owner Author

polymorpher commented Aug 8, 2021

Some notes: currently (after some optimization) the speed for computing 65535 hashes on Safari is 5-7s (without SIMD - not supported by Safari), and on Chrome is 11s (2021 iMac M1) to 15s (2014 iMac Retina) both with SIMD. Not sure why Chrome is slower than Safari (because it is on a Mac?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant