use ARM crypto extensions to improve ZFS encryption performance #12171
Comments
The only thing I can think of that might match your description (since ZFS has, to my knowledge, never used the kernel's crypto primitives, and those posts don't appear to cite anything except each other) would be when Linux decided to not play nice in the sandbox and took away the ability for non-GPL code to use SIMD, the fix for that (#9346) and #9749, which tried to reduce the additional overhead introduced by having to save and restore everything on our own. To my knowledge, the platform-specific optimizations (for encryption) currently are indeed limited to using x86_64 instructions, though that's not an architectural restriction, just that nobody has written or ported acceleration for other platforms. So I would not expect the aforementioned SIMD problem to have resulted in significant performance loss on aarch64, so much as there was never a highly optimized implementation running there. If you'd like to get one, you could always try doing it yourself (and, if you're feeling nice, contributing it here) - it looks like there's some ARMv8 optimizations available in OpenSSL, which is, I believe, where the AESNI optimizations came from. (You also might want to ask @AttilaFueloep before doing that, though, in case they're already working on it in the near term, to avoid duplicated effort.) |
Yes, the ICP has no SIMD support for ARM. I'd suggest porting https://github.com/openssl/openssl/blob/master/crypto/modes/asm/aes-gcm-armv8_64.pl. Since I've no ARM hardware I can't do that, sorry, but I'm willing to help if someone's going to tackle that. |
Just to have an idea of the work involved here: is it a matter of taking the Perl scripts from openssl and have them generate actual S files for the assembly compiler? Cursory glance at ZFS's aesni-gcm-x86_64.S vs OpenSSL's aesni-gcm-x86_64.pl seems to suggest so. Poking some more around the two source trees I see a lot of pl to S file name correspondences across OpenSSL's |
Yes, that's the first step. You can also just run |
I can take a look at this and/or help out since I'm in arm64 land and was doing some testing of ZFS encryption speed there. FWIW, testing ZFS on a ramdisk:
And
|
Yep, I'd expect some major boost from bringing in the openssl asm routines. The Intel Coffee Lake I measured on is twice as fast but that shouldn't matter much. If you could have a look, that would be great. I'm willing to help and that would give me a reason to learn some ARM asm ;-) |
I have zero experience with assembler, ARM or otherwise, so I wouldn't be able to do this on my own. I'm willing to help though :) Is ZFS cross compilable? My ARM device is a single board computer with a weak CPU, so I was trying to work from my PC from Windows Subsystem for Linux/Ubuntu 20.04: I could get ZFS to compile natively (but not to run tests) but trying to cross compile with |
I've not tried it in the way you mean (whenever I've played with cross-compilation here, it's been with qemu-static and chroot), but I don't know of any secret sauce that would break in that case except maybe needing to explicitly specify the Linux kernel bits to compile against aren't the running kernel. What's the build error? |
Just tried compiling after Compiling fails with
Of interest, I'm seeing files like
being compiled, which really seem out of place in an ARM build... Using
at the config stage, which to me seems like it's just using the native x86_64 toolchain. Full outputs below:
|
Sorry, can't help with the cross compiling issues since I never did that, but I'd look at the
Those files are bracketed with |
for raspberry pi architecture, cross compiling can be done via this guide: https://medium.com/@au42/the-useful-raspberrypi-cross-compile-guide-ea56054de187 |
@bghira That guide is not really relevant to this project: to start with, zfs does not use CMake (unfortunately). I actually made some inroads in getting zfs to cross compile on my system. First of all, one needs to install the GNU compilers for ARM64 like so: Then, and this is the bit that was stumping me, one needs to install the ARM-specific versions of ZFS's dependencies. First, one needs to enable multiarch like so: Then, since Ubuntu's repositories list seems to be broken by default, one should follow these steps And finally (note the :arm64 after each library name) This at last allows me to run |
FWIW, I know that cross compilation is possible and is a thing, but I've always gotten mixed results when cross-compiling anything more complex than a couple of C source files to a static binary. (Maybe that's me and/or inexperience talking.) I've had much better luck doing native on-target compiles - yes, compiling on a Raspberry Pi is waaaay slower, but eh. @albertofustinoni I've had a pretty good experience with github/dockcross/dockcross using Docker - perhaps it might be useful here, too. Also, while on the topic of ARM crypto: quite a few ARM64 chips have SHA extensions as well (the ARM SHA extensions were introduced earlier than Intel's SHA extensions, I think?) ... see: https://github.com/openssl/openssl/blob/e59bfbaa2dbd680f77e1121e382502bd522a466c/crypto/sha/asm/sha512-armv8.pl The speedup is pretty big... On my phone with a Qualcomm Snapdragon 855 (SM8150), single-thread SHA256 performance is ~1.3GB/sec versus ~290MB/sec on a Xeon E5-2620 v4 @ 2.1GHz. Multithread performance is even crazier - using 8 cores on my phone I get 6.5-6.6GB/sec! (The phone CPU is 8 cores, but it's like "BIG_big.little": 1x 2.84GHz "BIG", 3x "big" at like 2.4GHz and then 4x "little" at maybe like 1.8GHz.) Compare that to 32 threads on the Xeon (16 cores + hyperthreading), which only gives ~4.5GB/sec. (Command: Even the ARM AES extensions are impressive: 8 threads on my phone ( Long story short: using the ARM crypto extensions is a big deal for performance. Also, the OpenSSL assembly code for sha256/512 on x86_64 includes support for the Intel SHA instructions. (Though from a benefit vs. time approach, only the newest x86_64 processors have the Intel SHA instructions, and the benefit of implementing on ARM first is probably greater...) |
The specialized implementations are a BFD for performance, yes. But as remarked in the source for it, "Reason for undertaken effort is that there is at least one popular SoC based on Cortex-A53 that doesn't have crypto extensions." (Of course, the 4 doesn't either, womp fucking womp.) So you'd almost certainly want to loot those too, not just the crypto extension implementations, where possible. |
Does that mean they are intentionally not using the extensions because there are important SOCs out there that don't support them? Why can they not check at runtime if the extensions are enabled and use them if they are? |
That particular implementation is specifically for SoCs that don't implement them. I believe it was written because they already had an implementation that used the crypto extensions, but it obviously was not useful on platforms which did not implement them. |
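The runtime check asked about above can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions, not how OpenSSL or the ZFS ICP actually does it: the names `read_hwcap`, `pick_aes_impl`, `enum aes_impl` and the `SKETCH_HWCAP_*` macros are hypothetical, the bit values mirror the arm64 Linux `HWCAP_*` definitions, and a kernel module would have to use the kernel's own CPU feature queries rather than `getauxval()`. It falls back from the crypto extensions to a NEON-only routine (the kind written for SoCs without the extensions) and finally to generic C.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h> /* getauxval(), AT_HWCAP */
#endif

/* arm64 Linux HWCAP bits, copied here so the sketch also compiles on other hosts. */
#define SKETCH_HWCAP_ASIMD (1UL << 1) /* NEON / Advanced SIMD */
#define SKETCH_HWCAP_AES   (1UL << 3) /* AES instructions */
#define SKETCH_HWCAP_PMULL (1UL << 4) /* polynomial multiply, used for GHASH */

/* Hypothetical implementation tiers, slowest first. */
enum aes_impl { AES_IMPL_GENERIC, AES_IMPL_NEON, AES_IMPL_ARMV8_CE };

/* Read the CPU capability bits; returns 0 when not on arm64 Linux. */
static unsigned long read_hwcap(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}

/* Pick the fastest tier the given capability bits allow. */
static enum aes_impl pick_aes_impl(unsigned long hwcap)
{
    const unsigned long ce = SKETCH_HWCAP_AES | SKETCH_HWCAP_PMULL;

    if ((hwcap & ce) == ce)
        return AES_IMPL_ARMV8_CE;  /* AES-GCM needs both AES and PMULL */
    if (hwcap & SKETCH_HWCAP_ASIMD)
        return AES_IMPL_NEON;      /* no crypto extensions, NEON only */
    return AES_IMPL_GENERIC;       /* plain C fallback */
}

/* Human-readable name for logging which tier was chosen. */
static const char *impl_name(enum aes_impl impl)
{
    static const char *const names[] = { "generic", "neon", "armv8-ce" };

    assert((size_t)impl < sizeof(names) / sizeof(names[0]));
    return names[impl];
}
```

A caller would evaluate `pick_aes_impl(read_hwcap())` once at start-up and cache the result, so the feature check stays off the per-block encryption path.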
What is still needed to get this working? I could help but unfortunately I don't really know where to start. |
As far as I know, all the overview from this comment is still needed - I don't know of anyone who has done anything for it already beyond trying to setup a cross-compile env. |
Hmm yep I was afraid of that. I'm totally comfortable with some regular C++ and C coding but unfortunately this is above my skill level 😕 |
This would be great to have, especially for devices such as the RockPro64, which can use SAS HBAs in its PCIe x4 port. |
Nowadays, the arm64 architecture is becoming increasingly common, and I am also using a Rockchip 3568 CPU. When I use ZFS native encryption, I have noticed that its read and write speeds are significantly slower compared to not using encryption. May I ask when we can solve this issue? |
atm you'd want to solve #14555 first. |
|
System information
Describe the problem you're observing
I am running ZFS on an Odroid HC4 and I'm noticing significantly worse performance with ZFS native encryption vs. creating an unencrypted pool over a dm-crypt block device created via LUKS.
We are talking between 40 and 70MB/s at 100% CPU usage for native crypto, depending on which cipher I choose, vs. saturating gigabit ethernet at ~70% CPU usage when copying via Samba.
The Cortex A55 supports ARM's crypto extensions, and I'm assuming dm-crypt takes advantage of them given these results.
Is this a case of ZFS not being able to do the same? Or is this related to the kernel breaking crypto acceleration for ZFS a while ago? Was the fix only for x86/AES-NI?