GOST ECC optimizations #263
Clang 10, with patch: Clang 10, without patch: GCC 8.3, with patch: GCC 8.3, without patch: $ cat /proc/cpuinfo
|
This def doesn't seem right. I don't think it's hitting the right code paths. Could you double check in the debugger? Set a bp at |
OH and ofc critical info I left out ... so far it's only |
Got my mistake. id_tc26_gost_3410_2012_256_paramSetA is TCA parameters. Just a moment. |
Wow! With patch, clang 10 Without patch: GCC 8.3, with patch: GCC 8.3, without patch: |
Excellent! Thanks for the sanity check :) We'll keep at it. If you have any comments outside the |
This looks awesome. I use benchmarker from #264. Before patch is applied (time, cycles, instrs):
After patch is applied (time, cycles, instrs):
|
Brilliant result! |
Stupid question no 1: why is sign 4.5 times faster and verify only 2? |
It's not a dumb question at all :) For verify, OpenSSL right now isn't using a constant time algorithm, so it's actually pretty OK efficient. (All the changes in the OpenSSL EC module over the past few years don't affect that code path -- only keygen, sign, and derive.) The same goes for the code changes here, in the sense that
Signature verify isn't normally made constant time because there are no secrets involved. |
So. We had a constant-time implementation, that became 4 times faster after replacing the multiplication with a new one. Non-constant-time implementation has become "just" 2 times faster. So the verification now is two times slower than signing. Do I correctly understand your point? |
Yes that's the situation :) In general the verification will be slower because it takes the form The fact that it may not currently hold in OpenSSL master is because the CT point multiplication ( |
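To illustrate the constant-time distinction discussed above, here is a minimal sketch (not OpenSSL's or ECCKiila's code) that counts group operations for a textbook double-and-add and for a Montgomery ladder with a fixed iteration count. The "group" is simply the integers under addition modulo a hypothetical modulus `N`, so `k*P` can be checked directly; every name here is made up for illustration. Python gives no timing guarantees, so only the operation counts are meaningful.

```python
N = 2**61 - 1  # hypothetical toy modulus standing in for a group
BITS = 64      # fixed scalar width: the ladder always runs 64 steps


def double_and_add(k, p):
    """Variable-time: performs an addition only for the set bits of k."""
    acc, ops = 0, 0
    for i in reversed(range(k.bit_length())):
        acc = (acc + acc) % N      # "double"
        ops += 1
        if (k >> i) & 1:
            acc = (acc + p) % N    # "add", only when the bit is set
            ops += 1
    return acc, ops


def ladder(k, p):
    """Fixed schedule: one add and one double per bit, for all BITS bits."""
    r0, r1, ops = 0, p % N, 0
    for i in reversed(range(BITS)):
        bit = (k >> i) & 1
        # Real code would use a branch-free conditional swap here.
        if bit:
            r0, r1 = r1, r0
        r1 = (r0 + r1) % N
        r0 = (r0 + r0) % N
        if bit:
            r0, r1 = r1, r0
        ops += 2
    return r0, ops


for k in (0x8000000000000001, 0xFFFFFFFFFFFFFFFF):
    print(hex(k), double_and_add(k, 123456789)[1], ladder(k, 123456789)[1])
```

The double-and-add count depends on the Hamming weight of the scalar, while the ladder count does not. That fixed schedule costs extra work, which is one reason a variable-time verify path can look comparatively efficient.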
a190178 should be all the curves now ... |
@bbbrumley Another question - why, being a constant-time algorithm, does it still show a distribution of execution times in my benchmarks? |
Also a good question :) While in general, CT means "not dependent on the key", the issues for Having these full stack EC implementations will let From experience, I know a few spots to look so when I carve out some time I'll go through those and submit PRs. |
IC. Yes, |
Without optimization:
With optimization:
|
Optimized version:
Non-optimized version:
|
For me, knowing our EC layer, the I was hoping to see more along the lines of We'll investigate more next week. |
I also see some regression in 512 A/B verification. |
Huh, yea. The signature verification path was more of an afterthought -- we hacked it together in about 15min just to get more complete coverage. We'll think about how we can improve that one. (It should be easy since there are no CT requirements, even if the Fiat GF layer will anyway be CT.) Basically you're seeing OpenSSL's variable time code performing nicely for larger curve sizes. |
Many thanks anyway! |
Do I correctly understand, that some code is automatically generated and some (minor) is hand-written? And, if new curves appear, the code may be regenerated? |
Technically speaking, everything is automatically generated -- it's just about where the templates come from (we are using python mako templating):
Above also highlights the fact that formal verification only covers the GF layer -- so we focused the EC layer on simplicity and correctness for review, coupled with lots of unit testing. Adding a new curve takes less than 5 minutes:
We have about 20 curves right now in our experiments, including all the GOST ones. |
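As a rough picture of what template-driven generation means here, the sketch below fills a tiny C-like template with per-curve values. It deliberately uses `string.Template` from the Python standard library instead of mako so that it runs without extra packages. The template text, the `render_curve` helper, the `toycurve` name and the prime are all hypothetical and much simpler than the real tooling.

```python
from string import Template

# Hypothetical template; the real templates are mako files and far larger.
C_TEMPLATE = Template(
    "/* generated for $name: do not edit by hand */\n"
    "#define ${upper}_LIMBS $limbs\n"
    "static void ${name}_point_add(pt_t *r, const pt_t *a, const pt_t *b);\n"
)


def render_curve(name, prime, word_bits=64):
    """Fill the template with values derived from the field prime."""
    limbs = -(-prime.bit_length() // word_bits)  # ceiling division
    return C_TEMPLATE.substitute(name=name, upper=name.upper(), limbs=limbs)


# A toy 127-bit prime field, rendered for 64-bit and 32-bit words.
print(render_curve("toycurve", 2**127 - 1))
print(render_curve("toycurve", 2**127 - 1, word_bits=32))
```

With this shape, adding a curve amounts to supplying a new set of parameters and regenerating, and every function ends up named after its curve.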
Btw, are there many code duplications now? Hard to compare since all functions are named by curve. |
@bbbrumley Also, could you provide a script or a text description, either complete or just an example, for reproducing the generated code? This would be nice to include in the sources for future generations. |
I ... don't know. The Fiat code is, by nature, specific to the finite field. For example, we generated the
Given that the Fiat lines are the vast majority of the code (try
Absolutely! We'll have all the tooling source code up within a few weeks. It's on git already and CI is running now, but I hope to extend the CI to integrations. The rigging supports several third party projects (
|
Not many. vimdiff does not show many matching parts. |
@bbbrumley Thanks. What do you think - is it possible (and meaningful) to make a statistical test to determine whether an implementation has timing leaks? |
Some timing leaks are here. The issue is You can see the rich history of similar ECDSA fixes in OpenSSL. I can help but it needs to be part of larger restructuring of ECC stuff in In terms of this PR, yea the rigging parts are the place to look for leaks. You spotted Yes When things are restructured, potentially both the |
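One common way to approach the statistical test asked about above is a fixed-versus-random comparison using Welch's t-test, the idea behind tools such as dudect. The sketch below is not part of this PR: the timings are synthetic, the variable names are hypothetical, and the threshold of 4.5 is an assumed conventional cut-off rather than anything the thread specifies.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5


rng = random.Random(1)
# Synthetic "timings" in cycles, purely illustrative.
fixed_key = [1000 + rng.gauss(0, 5) for _ in range(2000)]
random_key_same = [1000 + rng.gauss(0, 5) for _ in range(2000)]
random_key_leaky = [1003 + rng.gauss(0, 5) for _ in range(2000)]

THRESHOLD = 4.5  # assumed cut-off; pick one deliberately in real use

print("no leak modelled:", abs(welch_t(fixed_key, random_key_same)) > THRESHOLD)
print("3-cycle leak modelled:", abs(welch_t(fixed_key, random_key_leaky)) > THRESHOLD)
```

A large |t| only says the two timing distributions differ. Locating the code responsible, for example in the rigging around the arithmetic, is a separate step.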
Btw, I was wrong to say |
Hey Folks, I think we are getting closer to merge quality here. See what your benchmarks say for this latest push. We haven't done extensive benchmarking yet, but spot testing looks pretty good. Edit: Oh right and you also get 32-bit support here, which is also the codepath it'll take for compilers / archs that don't support |
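For readers wondering why `__uint128_t` (named in the PR description) matters: arithmetic on 64-bit limbs needs the full 128-bit product of two 64-bit words, whereas 32-bit limbs only need 32x32->64 products. The sketch below shows how one 64x64 product decomposes into four such narrower products, with masks emulating fixed-width words. It is an illustration of the arithmetic only, not the generated Fiat code, and `mul64_wide` is a hypothetical name.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul64_wide(a, b):
    """Return (hi, lo) 64-bit words of a*b using only 32x32->64 products."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Middle column: high half of ll plus the low halves of the cross terms.
    mid = (ll >> 32) + (lh & MASK32) + (hl & MASK32)
    lo = ((mid & MASK32) << 32) | (ll & MASK32)
    # High word: hh plus the high halves of the cross terms plus the carry.
    hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32)
    return hi & MASK64, lo


hi, lo = mul64_wide(0x123456789ABCDEF0, 0xFEDCBA9876543210)
print(hex(hi), hex(lo))
```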
Wow!!! Great job! It may be even faster than Ed25519! |
Standalone EC implementations from ECCKiila. https://gitlab.com/nisec/ecckiila
Excellent :) Yea, they look good. The speedup over what we had before is a combination of:
I think this is pretty much ready. After merging (when you're comfortable), my team's next steps are removing the |
Yes, that sounds definitely sane, but could you please clarify the purpose of moving to GOST-specific structures instead of openssl native ones? |
There are several reasons, a few here but just my opinions:
Another way to look at it, is right now you're going gost-engine -> EC module -> gost-engine basically just for arithmetic. After this PR, you're going gost-engine -> EC module -> gost-engine -> ECCKiila -> gost-engine -> EC module -> gost-engine. When IMO the better path is gost-engine -> ECCKiila -> gost-engine. Even another, perhaps outsider, view is: If I were to integrate the ECC parts of GOST in OpenSSL now, I would have it as separate as possible from the EC module. It is similar to the 25519 and 448 situation -- yes it is ECC, but you have your own OID hierarchy and can do what you want. The disaster situation is like SM2, where the cryptosystems are totally different but they still put their OIDs in the legacy EC hierarchy: so they are linked forever. |
Got it. Many thanks for the clarification! |
Merged. Thanks! |
Could you also please provide some HOWTO on adding new curves, if any? |
Sure :) We've got an adding a new curve section, but we need to expand it better for curves with birational equivalences. The oneliner I used there for
I will have to update the |
@bbbrumley How can we be sure that these generated implementations (btw, >5 MB of generated code) do not produce incorrect results for some rare curve points? |
It's a good question! I don't think you can be sure. Here is what you can do:
Limitations that I have not mentioned yet for
These are cases that never happen in If you want, you could create your own version of Edit: FYI, if you do the |
@bbbrumley Thank you very much for the detailed answer! |
I want to and will implement |
YW! I added some limited It's currently passing on 2**16 iterations for all curves, GOST included. |
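One generic way to gain confidence about rare inputs, offered here as a sketch and not as the test suite mentioned above, is differential testing: compare the implementation under test against a slow, obviously correct reference. On a toy curve this can be exhaustive, so the exceptional cases (point at infinity, adding a point to its negative, doubling) are all exercised. The curve parameters and function names below are hypothetical.

```python
P, A, B = 97, 2, 3  # hypothetical toy curve y^2 = x^3 + 2x + 3 over GF(97)


def add(p1, p2):
    """Affine addition with None as the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p1 + (-p1), including doubling a point with y == 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def mul(k, pt):
    """Implementation under test: left-to-right double-and-add."""
    acc = None
    for bit in bin(k)[2:]:
        acc = add(acc, acc)
        if bit == "1":
            acc = add(acc, pt)
    return acc


def check_all(max_k=130):
    """Compare mul() with a running sum k*pt for every point and small k."""
    points = [(x, y) for x in range(P) for y in range(P)
              if (y * y - (x**3 + A * x + B)) % P == 0]
    mismatches = []
    for pt in points:
        ref = None  # 0 * pt
        for k in range(max_k):
            if mul(k, pt) != ref:
                mismatches.append((k, pt))
            ref = add(ref, pt)  # reference: one more plain addition
    return len(points), mismatches


count, bad = check_all()
print(count, "points checked,", len(bad), "mismatches")
```

For cryptographic-size curves an exhaustive sweep is impossible, so the same comparison is run over many random scalars plus hand-picked edge cases, which is why it raises confidence without giving certainty.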
When testing libraries, it may be worth testing public and private keys with leading or final zero bytes to catch serialisation issues, but I suspect it's out of scope of ECCKiila |
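A minimal sketch of the serialisation edge cases suggested above, assuming a 32-byte key and big-endian byte order (both arbitrary choices for illustration; the helper names are hypothetical). A fixed-width encoder preserves leading zero bytes, while a minimal-length encoder silently drops them.

```python
KEY_LEN = 32  # bytes, assuming a 256-bit scalar


def encode_be(k):
    """Fixed-width big-endian encoding; keeps leading zero bytes."""
    return k.to_bytes(KEY_LEN, "big")


def encode_minimal(k):
    """Buggy-by-design variant: drops leading zero bytes."""
    return k.to_bytes((k.bit_length() + 7) // 8 or 1, "big")


edge_keys = [
    1,                 # 31 leading zero bytes
    1 << 247,          # exactly one leading zero byte
    (1 << 256) - 256,  # final byte is zero
]

for k in edge_keys:
    fixed, short = encode_be(k), encode_minimal(k)
    assert int.from_bytes(fixed, "big") == k  # round trip must hold
    print(len(fixed), len(short))
```

Keys like these are unlikely to show up in a purely random test run, so listing them explicitly is cheap insurance.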
Standalone EC implementations from ECCKiila. https://gitlab.com/nisec/ecckiila (cherry picked from commit bc34620)
Hey @beldmit,
This is def not for merging :) I'm working on the gost-engine rigging and this is just a dry run. Out of WiP.
Could you take a look at the performance and let me know what you think?
Should see improvements on EVP keygen, sign, derive, and verify across the board.
Only tested so far on 64-bit and 32-bit linux, with clang. 32-bit support is still WiP, and also will serve as the fallback for compilers that do not support __uint128_t.
Edit: very much suggest clang no debug build with optimization flags.