Support for supercop ASM in wallet, and benchmark for supercop [Do Not Merge] #6337
Sorry for the do not merge tag again. But this requires a PR in another repo.
This is a working example of using the amd64 ASM from supercop to speed up wallet scanning. The speedup is less now, presumably due to slowdowns in other areas of the wallet. Scanning from block 1 to 2031549 (mainnet) drops from 5.8 minutes to 4.14 minutes (~28.6% less time) on my test machine. The benchmark utility (see below) shows a crypto speedup of ~146% with two-output transactions.
The test machine is a ryzen 3900x / 32 GiB PC3200 RAM / 970pro nvme / x570 desktop. The daemon and wallet were on the same box. The first run for each crypto implementation was tossed in an effort to "normalize" the file cache. The disk is encrypted, so LMDB hitting disk has an additional hit.
This uses namespace aliases, and therefore has ZERO overhead when disabled. Not even a hit to the executable size. When this is enabled, everything using the "device" library will increase in size due to the extra crypto code. This is unfortunate.
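As a rough illustration of why a namespace alias adds no overhead, here is a minimal sketch. The namespace names, the `USE_AMD64_ASM` flag, and the `mul_add` function are all hypothetical stand-ins, not identifiers from this PR. The alias is resolved entirely at compile time, so callers bind directly to whichever implementation was selected.

```cpp
#include <cstdint>

// Hypothetical portable reference implementation.
namespace crypto_ref {
    inline std::uint64_t mul_add(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
        return a * b + c;
    }
    inline const char* name() { return "ref"; }
}

// Hypothetical optimized implementation. A real one would wrap the
// ASM routines; here it only mirrors the reference behavior.
namespace crypto_amd64 {
    inline std::uint64_t mul_add(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
        return a * b + c;
    }
    inline const char* name() { return "amd64"; }
}

// USE_AMD64_ASM is an assumed build flag for this sketch only.
// The alias is just another name for an existing namespace: no
// wrapper function, no indirection, and no extra symbols.
#if defined(USE_AMD64_ASM)
namespace wallet_crypto = crypto_amd64;
#else
namespace wallet_crypto = crypto_ref;
#endif

// Calling code is written once against the alias.
inline std::uint64_t scan_step(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    return wallet_crypto::mul_add(a, b, c);
}
```

With the flag undefined, `scan_step` compiles to a direct call into `crypto_ref`, which is why the disabled configuration costs nothing at run time.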
This is the output from the benchmark utility included with this PR. It can "sideload" multiple crypto implementations to test the relative performance.
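The general idea of timing several implementations side by side can be sketched as follows. This is not the utility from this PR: `bench_result`, `impl_fn`, and `run_benchmarks` are hypothetical names, and the workload is a placeholder function over integers in place of real crypto operations.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One row of benchmark output (hypothetical layout).
struct bench_result {
    std::string name;        // label of the implementation
    double seconds;          // total wall time over all iterations
    std::uint64_t checksum;  // sum of outputs, to compare implementations
};

// Placeholder signature for an implementation under test.
using impl_fn = std::function<std::uint64_t(std::uint64_t)>;

// Runs every implementation over the same inputs and records the time.
inline std::vector<bench_result> run_benchmarks(
    const std::vector<std::pair<std::string, impl_fn>>& impls,
    std::uint64_t iterations)
{
    std::vector<bench_result> results;
    for (const auto& entry : impls) {
        std::uint64_t checksum = 0;
        const auto start = std::chrono::steady_clock::now();
        for (std::uint64_t i = 0; i < iterations; ++i) {
            // Accumulating the output keeps the call from being optimized away.
            checksum += entry.second(i);
        }
        const auto stop = std::chrono::steady_clock::now();
        bench_result row;
        row.name = entry.first;
        row.seconds = std::chrono::duration<double>(stop - start).count();
        row.checksum = checksum;
        results.push_back(row);
    }
    return results;
}
```

Matching checksums give a cheap sanity check that two implementations compute the same thing before their timings are compared.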
On this setup, static builds of
It is possible to build
I tested scanning on two machines with less awesome CPUs. The same ryzen 3900x / 970pro is being used to run the daemon, with the wallet scanning on:
Fanless laptop with i7-7Y75 (1.3 GHz 2/4 cores 4MiB cache): 46.07 minutes to 20.52 minutes (~55.4%).
2014 Mac Mini i5 (2.5 GHz 2/4 cores 3MiB cache): 38.29 minutes to 24.00 minutes (~37%).
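The percentages quoted with these scan times are reductions in wall-clock time, which is a different measure from a throughput gain. A small sketch of both calculations, with hypothetical function names, may help when comparing figures:

```cpp
#include <cassert>

// Percentage of wall-clock time saved: (before - after) / before.
inline double time_saved_percent(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

// Percentage gain in throughput: before / after - 1.
// The same pair of timings yields a larger number here.
inline double throughput_gain_percent(double before, double after) {
    assert(after > 0.0);
    return (before / after - 1.0) * 100.0;
}
```

For the laptop, `time_saved_percent(46.07, 20.52)` gives roughly 55%, while `throughput_gain_percent(46.07, 20.52)` gives roughly 124%, so the two kinds of figure should not be compared directly.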
I was incorrect earlier about changes to the code affecting performance - these numbers are similar to what was seen previously. I need a second identical system to see how running the daemon + wallet on a single machine affects performance. Or perhaps just look at the CPU utilization more closely.
Either way, the increase is fairly big for lower-core and/or lower clock speed x86-64 systems.
EDIT: got my CPU cache numbers flipped. The system with higher cache did better with lower clock speed. This suggests the x86-64 code is a bit heavy on the cache.