sha256 x86_64 optimization v2 #2351
Conversation
In file included from /var/tmp/portage/sys-fs/zfs-kmod-9999/work/zfs-kmod-9999/module/zfs/../../module/zfs/sha256.c:29:0: probably should be #include <asm-generic/sha256.h> instead of #include <asm/sha256.h>

edit: not sure why github is swallowing the text and not displaying it:
@kernelOfTruth "not sure why github is swallowing the text and not displaying it" GitHub markup: for literal block text, surround the block with 3 back-quotes (```) on separate lines.

@chrisrd thank you very much for this information 👍

@kernelOfTruth
@tuxoko sorry for the delay. Building via the Gentoo package manager: manually unpacking (ebuild zfs-kmod-9999.ebuild unpack / ebuild zfs-9999.ebuild unpack), patching it in, then ebuild zfs-kmod-9999.ebuild compile install qmerge. Will post the log later.

oops, sorry, wrong log caused by permissions, will post the correct one later - mea culpa :(
applying the patch:

full build log:

out of /var/tmp/portage/sys-fs/zfs-kmod-9999/work/zfs-kmod-9999/ :

after replacing

in include/sys/sha256.h with

it compiles fine. Is there a way to check if this optimization is in use?
@kernelOfTruth A simple way is to profile using perf and see what symbols are in use: http://wiki.gentoo.org/wiki/ZFSOnLinux_Development_Guide#Generating_a_Flame_Graph_with_Perf Another way is to attach gdb to your kernel and check the value of sha256_transform_asm against the various routines.

@tuxoko I am in a position to test the AVX2 routine, although I might not find time to do that this week. Also, this would benefit from further revision when Broadwell debuts the new sha256 instructions: https://software.intel.com/en-us/articles/intel-sha-extensions
@@ -125,3 +132,51 @@ zio_checksum_SHA256(const void *buf, uint64_t size, zio_cksum_t *zcp)
	    (uint64_t)H[4] << 32 | H[5],
	    (uint64_t)H[6] << 32 | H[7]);
}

void (*sha256_transform)(const void *, uint32_t *, uint64_t);
Using a writeable function pointer here will break PaX builds. The PaX plugin was designed to break builds when writeable function pointers like these are used because they can be written with the address of arbitrary code. It would be better to have an enum that is set based on the test results and then used to select the correct function via a switch statement.
The sha256 checksums are calculated in such a way that we generate big endian versions of them. Do the Intel routines provided also do that? If not, we will need to do byte swapping to fix that; otherwise, we risk introducing a disk format change.

Also, it might be worth considering whether we could have the compiler generate "optimized" versions of this routine against different CPUs for us. I modified our current http://dpaste.com/24NZ4K6

An alternative way of achieving what this pull request aims to do without using handwritten assembly would be to split sha256.c into two files, sha256-base.c and sha256-generic.c. The former would contain the logic for switching between implementations, while the latter would be used with different compiler invocations to obtain the same routine built for different CPUs. We would change the name of the function on each via a CPP switch (e.g. -DSHA256_NAME=sha256_transform_avx2). Then we could link it all together and get a similar effect to assembly, with the benefit that we can include custom versions for as many CPUs as we want.

It would be interesting to do benchmarks to see if the handwritten assembly is noticeably faster than the GCC output. If it is not, then we could avoid adding handwritten assembly, yet receive the benefits in a way that could be adapted to other ISAs without the need for one of us to understand the ISA.
It occurs to me that we could tell the compiler to build the existing SHA256 routine with SSE2 instructions on amd64. The kernel's build system explicitly tells the compiler not to do this because we need to use
#endif

if (sha256_transform_asm)
	sha256_transform = arch_sha256_transform;
I realize that we currently do not support realtime kernels, but that support will come as soon as someone writes patches for it. kernel_fpu_begin()/kernel_fpu_end() turn off interrupts inside the critical section. Turning off interrupts for any appreciable amount of time is undesirable on realtime systems. We likely should have a module option to allow optimizations to be disabled on such systems. It might even be better to disable them by default on realtime kernels.
@tuxoko The following documentation should be useful for implementing these routines in userspace:

Section 6.4:
Section 15.1:

I have Haswell hardware that I can use for testing, although it should also be possible to do testing with QEMU.
	0x8d5651e46d3cdb76, 0x2d02d0bf37c9e592 }},
};

static void sha256_test(void)
Introducing a self-test routine is an excellent idea. However, it is important to understand that the generic routine is restricted to a subset of instructions that operate on the normal integer registers, such that the likelihood of a defect that only affects checksums is small. This allowed us to avoid introducing a self check in the past, yet still be relatively safe.
Introducing optimized assembly changes things because we begin exercising transistors that often go unused during normal operation. This dramatically increases the risk of a CPU defect affecting the checksum routines. Having a self test occur only during debug builds means that the vast majority of ZoL installations will have nothing to guard against such defects. At the same time, using preprogrammed data to do a comparison risks passing CPUs with hypothetical defects that affect only some byte sequences and not others. We could be running on a system where all but 1 CPU core is good, so only checking 1 core like we do here would miss it. We could even begin the test on a bad CPU core, but fail to detect it because we are rescheduled to a good CPU core.
With those things in mind, I would like to see some changes:
- A self test should be done whenever we use optimized assembly, including in non-debug builds. Running it in debug builds when we use the non-optimized routine, as you do here, should also be done.
- When we detect that we can use the optimized assembly routine, we should perform a second self-check routine that initializes a buffer with random data, calculates the hash with the generic routine, calculates the hash with the optimized routine, and compares the results. This is intended to provide some protection against defects that would get past a static input.
- We need to do an Illumos-style xcall to run this check on all available CPU cores. We would want to implement the xcall infrastructure in the SPL using Linux's on_each_cpu() routine. Code operating in that context runs with interrupts disabled, so there is no risk of being rescheduled (and failing to test CPU cores) like we have here.
- There exist Linux systems that support hotpluggable CPUs, so we should detect the addition of CPU cores to a system so that we can test them. I do not know how to do this offhand, so it needs investigation.
This is a revision of openzfs#2332. Currently, the optimization only applies to kernel space, because I haven't figured out how to do it properly in user space. AVX2 is untested because I don't have such a CPU, so use it at your own discretion.
Hello, folks! Any progress on this issue?
In file included from /var/tmp/portage/sys-fs/zfs-kmod-9999-r1/work/zfs-kmod-9999/module/zfs/../../module/zfs/sha256.c:29:0:
/var/tmp/portage/sys-fs/zfs-kmod-9999-r1/work/zfs-kmod-9999/include/sys/sha256.h:5:24: fatal error: asm/sha256.h: No such file or directory
#include <asm/sha256.h>
@tuxoko: is there any chance you have a version of this which is compatible with the abd_next branch?
in order to bypass it - not sure how "allowed" that is, or why i'm seeing it on my end.

@sempervictus

zfs module import is failing due to:

@kernelOfTruth

Fixed kmod build. But in-tree still fails.
Force-pushed from fd01b7e to 8baa4a5
Fix typo in kernel_fpu_end and fix build error in linux 4.5

Rebase to master.
XTMP3 = %ymm3
XTMP4 = %ymm8
XFER = %ymm9
XTMP5 = %ymm11
This will run only in 64-bit mode (registers %ymm8-%ymm15 are used). Such code should be protected with
#include <sys/isa_defs.h>
#if defined(HAVE_AVX2) && defined(__x86_64)
It is only built on x86_64; see module/zfs/Makefile.in.
Right, I made an incomplete case for ifdefs.
Having #ifdef HAVE_AVX2 around the code will prevent old compilers and binutils (gcc older than 4.7) from going in and choking on unknown instructions.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Update: rebase to master, cleanup code, add module parameter to choose algo like in fletcher, add benchmark to select fastest during init.
Add ssse3, avx, avx2 optimized sha256. During module init, the fastest available version will be selected. Currently, we only support optimization in kernel space. User programs will use generic code. Note: The sha256-{ssse3,avx,avx2}-asm.S files are from linux-3.14. Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Hello. I know I am coming into this PR very, very late, but I just noticed it. I just wanted to make sure you guys were aware of PR #4329 for ZFS encryption. The first commit in that PR ports the crypto API from Illumos to a ZoL kernel module. This code includes a sha256 implementation with x86_64 assembly that compiles in both userspace and kernel space.

In the current PR, I have not replaced the existing sha256 code in an effort to limit the scope of the PR (which is already quite sizable). However, this would be very easy to add (probably about an hour's worth of time, 45 minutes of which would just be verifying I didn't break anything on big endian systems). I certainly don't mean to step on anybody's toes, but would it make sense to look at doing this? It might not, considering that the encryption patch might take a while to get merged (considering its size).
@tcaputi thanks for commenting, I've posted a more detailed comment in #4329 about this. The short version is: as a first step toward ZFS encryption, let's get the crypto framework merged along with a few smaller changes which leverage it. That'll help us shake out any issues. I suspect we'll want to use the vectorized sha256 version implemented here when available. @tuxoko do you have any benchmark results for this?

@behlendorf

@tuxoko now that vectorized fletcher, raidz, and crypto framework are all in master, I think this would be a good time to rebase this so we can get it finalized and merged. The straightforward thing to do is probably just to extend your existing patch to include the sha256 implementation from the icp module as an option. It would be good to fix it up so it builds in user space as well, like the similar fletcher code.

Closing for now to minimize the number of open action PRs. It can be reopened when someone has time to work on this.

@behlendorf: any chance of revisiting this, or implementing something newer than the rather old OpenSSL derived functions?

@sempervictus I'd love to see this implemented if someone has the time to work on it.