Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: small fixes for FreeBSD/aarch64 #14715

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
14 changes: 12 additions & 2 deletions include/os/freebsd/spl/sys/simd_aarch64.h
Original file line number Diff line number Diff line change
Expand Up @@ -44,13 +44,23 @@
#define _FREEBSD_SIMD_AARCH64_H

#include <sys/types.h>
#include <sys/ucontext.h>
#include <machine/elf.h>
#include <machine/fpu.h>
#include <machine/md_var.h>
#include <machine/pcb.h>

#define kfpu_allowed() 1
#define kfpu_initialize(tsk) do {} while (0)
#define kfpu_begin() do {} while (0)
#define kfpu_end() do {} while (0)
#define kfpu_begin() do { \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been tested? Maybe we can wait a bit, until I fixed blake3 and sha2 assembly.
I think I dig deeper into this tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as noted in the x18 issue, I tested this with hacked-around blake3 and can actually boot again now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to enter the fpu state because the checksum benchmark is called from the boot thread. In this thread floating point instructions are disabled so we need to enter a fpu section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been tested? Maybe we can wait a bit, until I fixed blake3 and sha2 assembly. I think I dig deeper into this tomorrow.

Master being broken on FreeBSD is a big problem. Rather than leaving master broken while we figure out the ideal solution, we should fix it now and revise it to be more elegant later to allow development that depends on aarch64 working on FreeBSD to continue independently of this.

if (__predict_false(!is_fpu_kern_thread(0))) \
fpu_kern_enter(curthread, NULL, FPU_KERN_NOCTX); \
} while (0)

#define kfpu_end() do { \
if (__predict_false(curthread->td_pcb->pcb_fpflags & PCB_FP_NOSAVE)) \
fpu_kern_leave(curthread, NULL); \
} while (0)
#define kfpu_init() (0)
#define kfpu_fini() do {} while (0)

Expand Down
10 changes: 10 additions & 0 deletions module/Makefile.bsd
Original file line number Diff line number Diff line change
Expand Up @@ -507,6 +507,16 @@ CFLAGS.zstd_lazy.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_ldm.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_opt.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}

sha256-armv8.o: sha256-armv8.S
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this hackish thing ;-)
I would change sha(256|512)-armv8.S to fit without this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, but I just copied the technique from the next couple of lines after it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not really sure what you mean. We have to pass -mgeneral-regs-only normally to avoid unexpected fpu use, but we really need the neon instructions here. We don't have a better way to strip it from just these four files AFAICT, and removing the offending instructions completely defeats the point of the implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a runtime test to check for fpu+armv8 crypto ... when this test is okay, the code gets executed.
When this can not be done this way, we may need some more compiler options... but I would favorite the current test + execute thing.... of cause - the register x18 thing ... I didn't know. I will fix this ... but currently I don't have the time for it. Maybe next week.

The current work arround may be some #ifdef __FreeBSD__ within the assembly + glue code ... but when this has one or two weeks, then I have a pull request for it.

Copy link
Contributor

@jrtc27 jrtc27 Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-mgeneral-regs-only forbids use of floating-point instructions in assembly. These files, by design, use floating-point. Therefore the flag must be stripped specifically for these files, otherwise they will not build. What is your exact objection? It's unclear what your two paragraphs are actually saying specifically on a technical level. This isn't OS-specific; it's just a difference between LLVM, which applies -mgeneral-regs-only to the assembly, and GCC, which doesn't forward it on to GNU as (and there may not even be a flag to do it?). Compiling the Linux module with LLVM would hit the exact same issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit odd to me that CFLAGS applies to assembly, but that is a complaint to file with LLVM.

Copy link
Contributor

@jrtc27 jrtc27 Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFLAGS is just the name for it in the build system. And nothing says -mfoo is purely for the compiler and not the assembler; there's precedent with things like -march governing which instructions you get, with -mgeneral-regs-only being implemented in LLVM as disabling the fp-armv8, crypto and neon features, i.e. the same as what you get with -march=armv8-a+nofp+nosimd+nocrypto. It's just a difference of opinion.

Copy link
Contributor

@ryao ryao Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-mgeneral-regs-only does more than just that. It permanently prevents you from turning on these instructions via pragmas, which has caused me plenty of grief late last year when looking into aarch64 optimizations on key fletcher4 functions. :/

In any case, I have far too much to do to have time to complain to LLVM about this and even if I did and they agreed, we would still need to support the older versions anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${CC} -c ${CFLAGS:N-mgeneral-regs-only} ${WERROR} ${.IMPSRC} \
-o ${.TARGET}
${CTFCONVERT_CMD}

sha512-armv8.o: sha512-armv8.S
${CC} -c ${CFLAGS:N-mgeneral-regs-only} ${WERROR} ${.IMPSRC} \
-o ${.TARGET}
${CTFCONVERT_CMD}

b3_aarch64_sse2.o: b3_aarch64_sse2.S
${CC} -c ${CFLAGS:N-mgeneral-regs-only} ${WERROR} ${.IMPSRC} \
-o ${.TARGET}
Expand Down