Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: small fixes for FreeBSD/aarch64 #14715

Closed
wants to merge 2 commits into from

Conversation

kevans91
Copy link
Contributor

@kevans91 kevans91 commented Apr 4, 2023

Motivation and Context

Following PR #13741, FreeBSD/aarch64 experiences a panic at boot as we try to use FPU instructions while doing the startup benchmarking.

Description

The primary fix here is to add a proper implementation of kfpu_begin()/kfpu_end() for aarch64, which is implemented much like the x86 version -- fpu_kern(9) to wrap sections that want to do FPU things. I discovered two other issues in the openzfs build while I was forward porting these to the openzfs repo, I fixed one (need to remove -mgeneral-regs-only for sha256/sha512 .S files) and punted on the other one (non-trivial conflict between sys/elf_common.h and spl/sys/vnode.h. More details on the non-trivial conflict in the "resync" commit. (Edit to note: the one I punted on has since been fixed by PR #14674; verbiage about the conflict has been removed from commit messages)

How Has This Been Tested?

Booted FreeBSD/aarch64 with changes applied. Another change was needed to stop clobbering x18 in the blake2 implementation; @mcmilk said he was working on a patch for this (I tested with a naive patch that "fixed" it but may not have done so properly).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@kevans91
Copy link
Contributor Author

kevans91 commented Apr 4, 2023

CC @mmatuska @zxombie

@@ -507,6 +507,16 @@ CFLAGS.zstd_lazy.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_ldm.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_opt.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}

sha256-armv8.o: sha256-armv8.S
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this hackish thing ;-)
I would change sha(256|512)-armv8.S to fit without this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, but I just copied the technique from the next couple of lines after it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not really sure what you mean. We have to pass -mgeneral-regs-only normally to avoid unexpected fpu use, but we really need the neon instructions here. We don't have a better way to strip it from just these four files AFAICT, and removing the offending instructions completely defeats the point of the implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a runtime test to check for fpu+armv8 crypto ... when this test is okay, the code gets executed.
When this can not be done this way, we may need some more compiler options... but I would favorite the current test + execute thing.... of cause - the register x18 thing ... I didn't know. I will fix this ... but currently I don't have the time for it. Maybe next week.

The current work arround may be some #ifdef __FreeBSD__ within the assembly + glue code ... but when this has one or two weeks, then I have a pull request for it.

Copy link
Contributor

@jrtc27 jrtc27 Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-mgeneral-regs-only forbids use of floating-point instructions in assembly. These files, by design, use floating-point. Therefore the flag must be stripped specifically for these files, otherwise they will not build. What is your exact objection? It's unclear what your two paragraphs are actually saying specifically on a technical level. This isn't OS-specific; it's just a difference between LLVM, which applies -mgeneral-regs-only to the assembly, and GCC, which doesn't forward it on to GNU as (and there may not even be a flag to do it?). Compiling the Linux module with LLVM would hit the exact same issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit odd to me that CFLAGS applies to assembly, but that is a complaint to file with LLVM.

Copy link
Contributor

@jrtc27 jrtc27 Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFLAGS is just the name for it in the build system. And nothing says -mfoo is purely for the compiler and not the assembler; there's precedent with things like -march governing which instructions you get, with -mgeneral-regs-only being implemented in LLVM as disabling the fp-armv8, crypto and neon features, i.e. the same as what you get with -march=armv8-a+nofp+nosimd+nocrypto. It's just a difference of opinion.

Copy link
Contributor

@ryao ryao Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-mgeneral-regs-only does more than just that. It permanently prevents you from turning on these instructions via pragmas, which has caused me plenty of grief late last year when looking into aarch64 optimizations on key fletcher4 functions. :/

In any case, I have far too much to do to have time to complain to LLVM about this and even if I did and they agreed, we would still need to support the older versions anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


#define kfpu_allowed() 1
#define kfpu_initialize(tsk) do {} while (0)
#define kfpu_begin() do {} while (0)
#define kfpu_end() do {} while (0)
#define kfpu_begin() do { \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been tested? Maybe we can wait a bit, until I fixed blake3 and sha2 assembly.
I think I dig deeper into this tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as noted in the x18 issue, I tested this with hacked-around blake3 and can actually boot again now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to enter the fpu state because the checksum benchmark is called from the boot thread. In this thread floating point instructions are disabled so we need to enter a fpu section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been tested? Maybe we can wait a bit, until I fixed blake3 and sha2 assembly. I think I dig deeper into this tomorrow.

Master being broken on FreeBSD is a big problem. Rather than leaving master broken while we figure out the ideal solution, we should fix it now and revise it to be more elegant later to allow development that depends on aarch64 working on FreeBSD to continue independently of this.

@mmatuska
Copy link
Contributor

mmatuska commented Apr 6, 2023

Btw. you can make a FreeBSD differential revision as well and we can have it in base "earlier" than it gets committed to OpenZFS so that we make arm64 working again.

@jrtc27
Copy link
Contributor

jrtc27 commented Apr 6, 2023

Regarding x18, Linux doesn't use it in the same way FreeBSD does, but it does use it if you enable the SHADOW_CALL_STACK config option. So you'll surely find that using ZFS with CONFIG_SHADOW_CALL_STACK on Linux also blows up. So, again, this is not a FreeBSD-specific problem.

@jrtc27
Copy link
Contributor

jrtc27 commented Apr 6, 2023

On a related note, why does OpenZFS even have 2000+ line assembly files checked in that are the result of modifying the output of a random version of a random compiler run on an unknown source file? There are no comments, no hope of verifying the assembly does what it's supposed to do beyond black-box testing, and no ability to make changes as needed. Why is this not just done like any other C file and built from the true source, whatever C produced these gargantuan opaque blobs?

@jrtc27
Copy link
Contributor

jrtc27 commented Apr 6, 2023

At least with the SHA2 files they're taken from OpenSSL's output and "modified [...] to fit into OpenZFS", though who knows what those modifications are, one would have to diff them against whatever the OpenSSL perl scripts spit out.

@jrtc27
Copy link
Contributor

jrtc27 commented Apr 6, 2023

The reference BLAKE3 implementation also has a blake3_neon.c, which makes me wonder why we're using converted-with-simde versions of the SSE2 and SSE4.1 (I don't even want to ask why we want both of those on non-x86...) source rather than the native NEON version?

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Apr 6, 2023
@kevans91 kevans91 force-pushed the freebsd_aarch64 branch 2 times, most recently from 64e2cbc to bd049f3 Compare April 6, 2023 17:53
Copy link
Contributor

@ryao ryao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

sha256-armv8.S and sha512-armv8.S need the same treatment as the sse
bits; removal of -mgeneral-regs-only from flags.

This fixes errors about requiring NEON, which is a difference in clang
vs. gcc treatment of -mgeneral-regs-only being specified on asm files.

Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU
usage, otherwise we panic promptly at boot as soon as ZFS attempts to
do checksum benchmarking.

Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
@tonyhutter
Copy link
Contributor

@kevans91 would you mind doing a force-push to re-run buildbot? I had to restart buildbot to install some updates and I think it caused some of your builds to fail.

@kevans91
Copy link
Contributor Author

kevans91 commented Apr 7, 2023

I'm not sure I understand how the FreeBSD amd64 buildbot failed... nothing in the configure output looks fatal. stdio for shell_1 indicates it ran gmake, but there's no output from that that I can find anywhere obvious... (and this PR shouldn't have affected anything there, anyways)

Copy link
Contributor

@mcmilk mcmilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kfpu_begin() / kfpu_end() works on my system also.

@mcmilk mcmilk mentioned this pull request Apr 9, 2023
13 tasks
@mcmilk
Copy link
Contributor

mcmilk commented Apr 9, 2023

The reference BLAKE3 implementation also has a blake3_neon.c, which makes me wonder why we're using converted-with-simde versions of the SSE2 and SSE4.1 (I don't even want to ask why we want both of those on non-x86...) source rather than the native NEON version?

I benchmarked and the neon impl. isn't as fast as the converted sse2/sse41 impl. The first versions of the BLAKE3 patchset used the neon c file... the huge differences are seen on e.g. Apple M1.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 10, 2023
behlendorf pushed a commit that referenced this pull request Apr 10, 2023
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU
usage, otherwise we panic promptly at boot as soon as ZFS attempts to
do checksum benchmarking.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #14715
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Apr 11, 2023
sha256-armv8.S and sha512-armv8.S need the same treatment as the sse
bits; removal of -mgeneral-regs-only from flags.

This fixes errors about requiring NEON, which is a difference in clang
vs. gcc treatment of -mgeneral-regs-only being specified on asm files.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes openzfs#14715
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Apr 11, 2023
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU
usage, otherwise we panic promptly at boot as soon as ZFS attempts to
do checksum benchmarking.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes openzfs#14715
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Apr 11, 2023
sha256-armv8.S and sha512-armv8.S need the same treatment as the sse
bits; removal of -mgeneral-regs-only from flags.

This fixes errors about requiring NEON, which is a difference in clang
vs. gcc treatment of -mgeneral-regs-only being specified on asm files.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes openzfs#14715
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Apr 11, 2023
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU
usage, otherwise we panic promptly at boot as soon as ZFS attempts to
do checksum benchmarking.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes openzfs#14715
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Apr 11, 2023
sha256-armv8.S and sha512-armv8.S need the same treatment as the sse
bits; removal of -mgeneral-regs-only from flags.

This fixes errors about requiring NEON, which is a difference in clang
vs. gcc treatment of -mgeneral-regs-only being specified on asm files.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes openzfs#14715
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Apr 11, 2023
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU
usage, otherwise we panic promptly at boot as soon as ZFS attempts to
do checksum benchmarking.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes openzfs#14715
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants