[GIP] Batch Verification & Multithread Optimization on Range Proof Verification #1321
Comments
Related - #836. We should be able to "batch verify" bulletproofs via libsecp. I'm not sure where we got to with this, but @yeastplume may have more info.
"Batch verify" bulletproofs via libsecp? 👍 That would be wonderful if it exists; I'll also take a look at it.
@apoelstra states in the referenced blog post:
Yes, #836 is highly relevant. I had started looking at it and ran into issues on an older version of libsecp-zkp; not sure where it's at now. And I do think we should do both:
Oh, and is GIP an actual thing now?! 😄
Last I left this, I couldn't get the batch verify function working from Rust; unsure whether it was being called correctly or not (most likely not). Upon asking, I was pointed at the test in the benchmark, which is indeed passing, so I need to compare it more closely with what's being passed in from Rust, though I haven't gotten to it yet. The code in the benchmark is here:
A 64-bit BP rangeproof should never take 40ms to verify, regardless of batching. On my laptop a single proof takes 18ms to generate, 2.3ms to verify, and 0.25ms to batch-verify.
The gap is most likely due to the fact that we recreate the scratch space and generators every time. A first easy step, until we figure out how to pass parameters down to batch-verify properly, could be a batch implementation that doesn't use libsecp batch-verify but at least reuses those artifacts (of course, fixing both would be best). Edit: after checking the latest code, we no longer recreate generators every time, but we do use …
Thanks, all of you, for providing this useful information! It led me to investigate the details. Nice teamwork :)
@ignopeverell what are the detailed parameters of your slow machine? It's 10 times faster than my …
The details of the batch verification bench test can be found here in my wiki. Test results on my poor MacBook Air:
That's nearly a 50x speedup! Thanks @apoelstra and all the other contributors of … And I'm still working on the … Hope to complete all of this and open a pull request to the Grin repo in one or two days.
@garyyu That's great news! What was the issue with the batch verify function? verify_single_w_scratch was from the older version; the latest T3 branch allows you to create and hold onto a scratch space on the Rust side.
@ignopeverell @yeastplume I'm confused about what ignopeverell said above. Look at these 2 lines: the current code does indeed recreate generators every time; please confirm.
I think @ignopeverell was just looking at master instead of the testnet3 branch (we really need to fix that to avoid confusion). I would be more concerned about this:
That 0x11 has been copied and copied and copied again, so I don't know where it came from, but if it's wrong it means the generator point is currently being incorrectly flipped, which might indeed lead to problems (and a hard fork). I'll check this further this evening when I have some time.
No, should be fine. This is what gets called for the generator load:
Both 0x11 & 0x01 == 1 and 0x0b & 0x01 == 1, so that change shouldn't make any difference (it should indeed be 0x0b). Interesting that our generator H needs to be flipped; we'd get a tiny optimisation from choosing a point that doesn't need to be.
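The parity reasoning above can be checked in a couple of lines. This is a standalone sketch, not libsecp code; it only demonstrates that the two flag bytes agree on the low bit that the load path inspects:

```rust
fn main() {
    // Flag byte on the serialized generator; per the discussion above,
    // only the lowest bit (the "flip"/parity bit) is inspected here,
    // so 0x11 and the intended 0x0b behave identically.
    let copied_flag: u8 = 0x11;   // the value that was copied around
    let intended_flag: u8 = 0x0b; // the value it apparently should be
    assert_eq!(copied_flag & 0x01, 1);
    assert_eq!(intended_flag & 0x01, 1);
    println!(
        "low bits agree: {}",
        (copied_flag & 0x01) == (intended_flag & 0x01)
    );
}
```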
Interesting! @yeastplume, you suggest choosing another point to replace the current …? That would be a global optimization, even if the saved computation is trivial. Good idea, but that would be a hard fork, right? Luckily we're still on Testnet. But perhaps @ignopeverell doesn't want a Testnet4 :)
We could hard-fork testnet3. But I'd rather see some numbers first, to make sure it's not a micro-optimization with no runtime impact.
OK, will investigate whether that's a micro-optimization.
And about:
I did an optimization to use shared generators, avoiding recreating them every time. Here are the bench test results on my …
This makes bulletproof creation nearly 177% faster and single bulletproof verification 667% faster.
Now I'll try to integrate these optimizations into the Grin repo :)
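One common reading of the percentages above is as time ratios: "X% faster" means the old time is (1 + X/100) times the new time. A quick conversion under that assumption:

```rust
fn main() {
    // Interpret "X% faster" as a speedup ratio old_time / new_time.
    let speedup = |pct_faster: f64| 1.0 + pct_faster / 100.0;
    // 177% faster creation -> ~2.77x; 667% faster verification -> ~7.67x.
    assert!((speedup(177.0) - 2.77).abs() < 1e-9);
    assert!((speedup(667.0) - 7.67).abs() < 1e-9);
    println!(
        "creation: {:.2}x, verification: {:.2}x",
        speedup(177.0),
        speedup(667.0)
    );
}
```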
About:
Here is the bench test for Pedersen commitments with the original …
Indeed, that's a micro-optimization and there's no need to do it. (To be checked: it's normal for a Pedersen commitment to take 0.252 ms, but why does changing …
:D That was just a throwaway comment; I was never under the impression that it was anything other than a tiny micro-optimisation. But thanks anyhow for checking!
In total we got over a 35x speedup :) Before:
After #1363:
So this optimization should already be enough. The multithread optimization proposal may be premature and can be postponed.
(mimblewimble#1321) (mimblewimble#1363)
* improve: use bullet rangeproof batch verification for txhashset validation (mimblewimble#1321)
* update rust-secp256k1-zkp to tag 'grin_integration_22'
This is a Grin Improvement Proposal.
Range proof verification is the most expensive computation in Grin, and I'm deliberately not saying "one of" :) If I'm wrong, please let me know.
According to my bench test on range proofs, verification takes on average 40ms and creation 80ms on my MacBook Air (Early 2015). For comparison, ECDSA takes on average 0.14ms per signature and 0.17ms per signature verification on the same computer. The details of this bench test can be found here.
That's a bit higher than expected! With 50 million UTXOs, it would take about 23 days to verify all the range proofs on my poor old MacBook Air; even supposing a brand-new computer is 5 times faster than mine, it would still need 4.6 days!
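For reference, the 23-day figure follows directly from 50 million proofs at 40ms each; a quick sanity check:

```rust
fn main() {
    let utxos: f64 = 50_000_000.0;
    let ms_per_verify: f64 = 40.0;
    // total verification time in days
    let days = utxos * ms_per_verify / 1000.0 / 3600.0 / 24.0;
    assert!((days - 23.1).abs() < 0.1);
    // a machine 5x faster still needs ~4.6 days
    let fast_days = days / 5.0;
    assert!((fast_days - 4.6).abs() < 0.1);
    println!("{:.1} days, or {:.1} days on a 5x faster machine", days, fast_days);
}
```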
Currently, when a new node is installed, fast sync mode downloads and validates the txhashset archive. I spent 17 minutes verifying 23,140 range proofs (about 45ms per verification on average):
That's the longest part of the whole fast sync procedure (24 minutes in this example): range proof verification took 70% of the total sync time, not even counting the BodySync stage, which also needs range proof verification. Before looking into real optimization of the range proof algorithm or its implementation itself, at a minimum we can easily add a multithreaded optimization at the application level.
I look forward to hearing your views on this. If everyone agrees to proceed, I'll try to implement this optimization.