Supercop AMD64 ASM for Wallet Importing [Do not Merge] #2317
SHA-256 hash of supercop-20170725.tar : 87cf6b3306fa4cb5c688774d0a8a367d74e519c9ea6733d96cfce322a228044e
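For reviewers who want to repeat the check, here is a minimal sketch of hashing a local copy of the tarball and comparing it with the value above. The function names and the file location are illustrative assumptions, not part of this PR; note that the hash is quoted for the uncompressed `.tar`.

```python
import hashlib

# Hash quoted above for supercop-20170725.tar
EXPECTED = "87cf6b3306fa4cb5c688774d0a8a367d74e519c9ea6733d96cfce322a228044e"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large tarball is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED):
    """True if the file's digest equals `expected` (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected("supercop-20170725.tar")` assumes the decompressed tarball sits in the current directory.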
@vtnerd I confirmed that the copied files are identical to the original. Just one small thing: the entire content of the folder Also, I couldn't build the accelerated version because the |
9d72ff2 to b610a47
I do not understand - the first commit is supposed to be an exact duplicate of those folders in supercop. The next commit converts the constant references to position independent references.
The |
Default behavior is to use amd64-51-30k when targeting amd64 architecture. `-DWALLET_CRYPTO` can be used to optionally disable or enable amd64-51-30k or amd64-64-24. See `src/wallet/crypto/README.md` for more info.
b610a47 to 8f38b22
Maybe I was unclear: I see in the first commit that there are two exactly identical files, e.g. |
The files are similar but not identical. EDIT: Should've said good "eye" @kenshi84! Noticing the similarity should have taken a decent amount of effort.
Here are the timings for a full blockchain scan (with restore height 0) on testnet (981463 blocks):
Computer: MacBook Pro 15-inch/2016, 2.9 GHz Intel Core i7, 16GB RAM
The speedup is quite substantial!
What is the benefit of the separate project? Would kovri or other projects use the ASM speedups in some way? Supposedly one of the variants will end up in NaCl...
I think you mean repo?
And, as referenced in #2133, paraphrasing:
Answer: to streamline the review process for the sake of accurate auditing ^
Not that I know of, but your actual work (versus what the library produces) will be much easier to review (which is why we've begun moving to submodules).
Hi @vtnerd, continuing where we left off on IRC:
What do you think?
But that doesn't mean no (even for kovri). The future is bright with potential 🌞 |
Contingent on a theory OK from luigi
@vtnerd I've created a supercop repository on monero-project |
@vtnerd I would say that it won't be too hard to make that code run on Windows if you put your mind to it.
The field unpacking functions
And making that code run in Windows would require an update to every ASM function to handle the different ABI. It's certainly not more difficult than the ASM work I already completed, but it is additional work and a burden on reviewers looking for correctness / non-malicious behavior. I was hoping to provide minimal changes to the supercop code for auditing purposes; the current changes are far more than I wanted to do. I would prefer if making it work in Windows were done in a future patch.
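To make the ABI point concrete, here is a small sketch of how the two x86-64 calling conventions differ for integer and pointer arguments. The register lists are the standard System V AMD64 and Microsoft x64 conventions; `shim_moves` and `extra_saves` are hypothetical helpers for illustration and do not appear in supercop or in this PR.

```python
# Integer/pointer argument registers, in order, for each convention.
SYSV_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")
WIN64_ARG_REGS = ("rcx", "rdx", "r8", "r9")

# General-purpose registers (other than rsp) that a callee must preserve.
# Win64 additionally treats xmm6-xmm15 as callee-saved; omitted here.
SYSV_CALLEE_SAVED = {"rbx", "rbp", "r12", "r13", "r14", "r15"}
WIN64_CALLEE_SAVED = SYSV_CALLEE_SAVED | {"rdi", "rsi"}


def shim_moves(n_args):
    """(source, destination) moves for a hypothetical entry shim that
    receives Win64 arguments and forwards them to System V style code."""
    if n_args > len(WIN64_ARG_REGS):
        raise ValueError("further Win64 arguments are passed on the stack")
    return list(zip(WIN64_ARG_REGS[:n_args], SYSV_ARG_REGS[:n_args]))


def extra_saves():
    """Registers System V code may clobber freely but Win64 callers
    expect to survive a call."""
    return sorted(WIN64_CALLEE_SAVED - SYSV_CALLEE_SAVED)
```

For a three-argument routine, `shim_moves(3)` lists the moves `rcx` to `rdi`, `rdx` to `rsi`, and `r8` to `rdx`, and `extra_saves()` shows that `rdi` and `rsi` would have to be saved and restored, which is why every routine would need attention.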
I'm using Dr. Bernstein nomenclature here:
The distinct advantage of asm for doing fe operations is that you don't need to check overflows (go ahead, read the 64-bit
Arithmetic overflow checks are a red herring, and cmov instructions to make sure we stay in the field and avoid division are not that expensive at the asm level (they cost an order of magnitude more at the C level).
You are also missing the security dimension. The 64-bit version is going to be inherently more well-behaved, as there is one and only one way of representing each fe number, whereas a representation in any partial number system is going to involve a normalisation step that will need to occur sooner or later. (Have you checked whether the bounds of all operations added together are certain not to roll off the field? Think of doing fe_add 10,000 times.)
Lastly, the 64-bit operations will probably lend themselves better to future CPU architecture optimisations, whereas with 51 bits you are stuck with what you have.
Another important point -
I know what you meant by
The overflow checks are not a red herring. According to the ed25519 whitepaper, it is the entire reason the
Both are used only for wallet scanning due to possible issues as a result of less testing/auditing and general complexity. A bug in this context means an output could be lost until re-scanned with the option disabled. I agree that the 51-bit radix representation appears to have additional complexity (just read the whitepaper). I will change the default to
I still do not understand what you mean - the 51-bit variant uses 5 64-bit numbers. The future performance of each seems hard to predict because it depends on the CPU cache size/performance/eviction and the relative cost of specific instructions.
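As a rough illustration of the two points being debated (five 51-bit limbs held in 64-bit words, and the normalisation step that representation needs), here is a sketch in Python. It models only the arithmetic, not the supercop assembly; all names are illustrative, and Python integers never overflow, so the 64-bit bound is computed rather than enforced.

```python
P = 2**255 - 19              # field prime used by ed25519
LIMB_BITS = 51
MASK = (1 << LIMB_BITS) - 1


def to_limbs51(x):
    """Split a field element into five 51-bit limbs, least significant first."""
    x %= P
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(5)]


def from_limbs(limbs):
    """Recombine limbs (carried or not) into an integer mod P."""
    return sum(l << (LIMB_BITS * i) for i, l in enumerate(limbs)) % P


def add_lazy(a, b):
    """Limb-wise addition with no carry propagation; limbs may exceed 51 bits."""
    return [x + y for x, y in zip(a, b)]


def carry(limbs):
    """Propagate carries until every limb fits in 51 bits again.
    The carry out of the top limb re-enters at the bottom multiplied
    by 19, because 2**255 is congruent to 19 mod P."""
    limbs = list(limbs)
    while any(l > MASK for l in limbs):
        c = 0
        for i in range(5):
            limbs[i] += c
            c = limbs[i] >> LIMB_BITS
            limbs[i] &= MASK
        limbs[0] += 19 * c
    return limbs


# Largest number of 51-bit-bounded limbs that can be summed in one
# unsigned 64-bit word with no intermediate carry.
MAX_LAZY_TERMS = (2**64 - 1) // MASK
```

Under these assumptions `MAX_LAZY_TERMS` is 8192, so summing 10,000 carried elements limb-wise without an intermediate carry pass could overflow a 64-bit limb. This says nothing about where the actual code performs its carries.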
I was benchmarking 10000 transactions each with varying numbers of outputs (I think I settled on 4 for testing). So both functions were being tested indirectly. It was still artificial - in real usage the CPU is more likely to evict the associated cachelines since the current processing loop is not as "tight" as my benchmark. So again, that's why I was considering making
Also, it's worth mentioning that this was designed to compile both in the same binary for benchmarking. I just need to clean up the cmake primarily before inclusion into mainline (i.e. some future patch).
We seem to be arguing the difference between an arithmetic overflow and an adc instruction. Since I don't really want to get into arguments here, I will leave you to it. Sorry if you found me annoying.
@moneromooo-monero I approve the idea, indeed. It's a pretty good way IMO to "get its foot in the door" for potential future use elsewhere (more critical functions), as it is a significant speedup. I have not looked at the code yet. |
Once the ASM code is merged into the sub-repo, I will revisit with a new PR. |
The first commit imports `amd64-51-30k` and `amd64-64-24k` from `supercop-20170725/crypto_sign/ed25519`. I encourage @iamsmooth @moneromooo-monero @hyc @fluffypony @luigi1111 @kenshi84 to verify that the code is the same from that source.

The next commit (ping @hyc @luigi1111) updates the supercop code to be compliant with `-fPIC` or `-fPIE`. Some platforms (OSX for example) compile in this mode by default, and I do not see a good way to detect whether position independent code is being used to select a non-position independent variant.

The last commit contains the changes to the monero baseline to integrate the ASM code into the wallet (@hyc and @luigi1111 might want to peek at the `*choose_tp.s` files adapted from supercop code to include the `z` field). The current ref10 implementation is not modified, and this code is optionally used for wallet scanning only. See the associated readme. @iamsmooth @luigi1111 @hyc @moneromooo-monero ... should this be enabled by default? Is it worth the extra maintenance to even use this code? The bulk of this code is upstream from supercop, and not written by me.

Performance Numbers (OSX on Mac mini mobile i5)

Note that the `monerod` system in my test setup should have slightly higher than normal latency. There was still a ~8min improvement. A much larger drop in user CPU time was seen. Improvements to the fetching code should help. The `amd64-51-30k` version always fetched more blocks.

ref10
amd64-51-30k

EDIT: Updated reference to last commit. The original last commit did not have `amd64-51-30k.cmake` and `amd64-64-24k.cmake`.