Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AArch64 blake3 assembly trashes x18 #14714

Open
zxombie opened this issue Apr 4, 2023 · 18 comments
Open

AArch64 blake3 assembly trashes x18 #14714

zxombie opened this issue Apr 4, 2023 · 18 comments
Labels
Type: Architecture Indicates an issue is specific to a single processor architecture Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@zxombie
Copy link
Contributor

zxombie commented Apr 4, 2023

System information

Type Version/Name
Distribution Name FreeBSD
Distribution Version 14.0-CURRENT
Kernel Version 14.0-CURRENT
Architecture arm64
OpenZFS Version zfs-2.1.99-FreeBSD_g57cfae4a2

Describe the problem you're observing

Kernel panics because the accelerated blake3 implementation trashes the x18 register. In b3_aarch64_sse2.S and b3_aarch64_sse41.S the x18 register is used. This is reserved on FreeBSD/arm64 for a per-CPU pointer. The kernel is built with the -ffixed-x18 compiler flag to ensure C code doesn't write to x18, however the above two files modify either it or the 32 bit alias w18.

Describe how to reproduce the problem

Boot on FreeBSD/arm64 with the zfs module

@zxombie zxombie added the Type: Defect Incorrect behavior (e.g. crash, hang) label Apr 4, 2023
@mcmilk
Copy link
Contributor

mcmilk commented Apr 4, 2023

I will fix this and provide a pull request soon.

@kevans91
Copy link
Contributor

kevans91 commented Apr 4, 2023

I will fix this and provide a pull request soon.

awesome, thanks! I have another patch for more freebsd/aarch64 fallout that I'll push through in a PR momentarily as well to complement yours. freebsd/arm is also going to be broken for similar reasons (needed kfpu_ handling), but I don't really have a good setup to test that with and I'm not sure how many people use it on armv7 anyways...

@zxombie
Copy link
Contributor Author

zxombie commented Apr 4, 2023

Is this generated from the blake3 C code? If so would it be a problem to also build with -mbranch-protection=standard? It adds support for the pointer authentication and branch target identification extensions. The new instructions it adds are nops in CPUs that don't implement these extensions.

@rincebrain
Copy link
Contributor

rincebrain commented Apr 4, 2023

I have a working FreeBSD/AArch64 with the various SIMD acceleration modes implemented that passed everything last I tried it, as I recall, I should probably push that...

I could test on ARMv7 if that's needed.

@rincebrain rincebrain added the Type: Architecture Indicates an issue is specific to a single processor architecture label Apr 4, 2023
@kevans91
Copy link
Contributor

kevans91 commented Apr 4, 2023

I pushed part of my patch in PR #14715, just to implement kfpu_begin/kfpu_end. Combined with a hackaround for blake2 to avoid x18, that was sufficient to boot again, but this system just uses checksum=on everywhere and thus isn't fully exercising much beyond the initial benchmarking that was breaking the system.

@lundman
Copy link
Contributor

lundman commented Apr 4, 2023

I'd be curious to test and see if this is related to the aarch64 on macOS issues.

@kevans91
Copy link
Contributor

kevans91 commented Apr 4, 2023

I'd be curious to test and see if this is related to the aarch64 on macOS issues.

fwiw, I noted in one of my emails that this probably is broken for the same reason on macOS. See: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms

The platforms reserve register x18. Don’t use this register.

@lundman
Copy link
Contributor

lundman commented Apr 4, 2023

I wonder if CC_AArch64Apple ever landed in llvm and I can use it for this as well.

@oconnor663
Copy link

Is this generated from the blake3 C code?

Reading the comments at the top of those assembly files, it looks like they're converted from upstream x86 assembly, using a tool called SIMD Everywhere.

@kevans91
Copy link
Contributor

kevans91 commented Apr 6, 2023

Is this generated from the blake3 C code?

Reading the comments at the top of those assembly files, it looks like they're converted from upstream x86 assembly, using a tool called SIMD Everywhere.

Yeah, we've since worked out how to generate them and from where, but unfortunately the process is incredibly undocumented and makes this a very unreproducible process. At the very least:

  1. What compiler flags should be used? I guess -O2 at a minimum + the ones we're asking for, but it's undocumented
  2. To what extent is the output massaged? I see at least symbols have been renamed, what else has happened?

Knowing what compiler version was used would also be helpful to try and avoid major changes in codegen. These are the kinds of things that would've been nice to see documented in the output files or the commit message so that others could reproduce the massive assembly blobs.

@oconnor663
Copy link

If I understand the question correctly, the answer is that @sneves wrote upstream assembly files by hand, so they aren't reproducible from the output of any compiler.

@kevans91
Copy link
Contributor

kevans91 commented Apr 7, 2023

If I understand the question correctly, the answer is that @sneves wrote upstream assembly files by hand, so they aren't reproducible from the output of any compiler.

These specific files are versions of blake3_sse*.c compiled into assembly and pushed into the openzfs tree; upstream only has x86 versions of these in assembly. (sse is notably an x86 construct)

@bsdimp
Copy link
Contributor

bsdimp commented Apr 10, 2023

Any chance this will fix the following bogus code in ./module/zstd/include/aarch64_compat.h

#ifdef _KERNEL
#undef __aarch64__
#endif

Instead, it should be #undef HAS_NEON or similar with all the #ifdefs updated from

#if defined(__aarch64__) || \
        (defined(__x86_64) && defined(HAVE_SSE2)) || \
        (defined(__PPC64__) && defined(__LITTLE_ENDIAN__))
        &blake3_sse2_impl,
#endif
#if defined(__aarch64__) || \
        (defined(__x86_64) && defined(HAVE_SSE4_1)) || \
        (defined(__PPC64__) && defined(__LITTLE_ENDIAN__))
        &blake3_sse41_impl,
#endif

to something like

#if (defined(__aarch64__) && defined(HAVE_NEON) || \
        (defined(__x86_64) && defined(HAVE_SSE2)) || \
        (defined(__PPC64__) && defined(__LITTLE_ENDIAN__))
        &blake3_sse2_impl,
#endif
#if (defined(__aarch64__) && defined(HAVE_NEON2) || \
        (defined(__x86_64) && defined(HAVE_SSE4_1)) || \
        (defined(__PPC64__) && defined(__LITTLE_ENDIAN__))
        &blake3_sse41_impl,
#endif

because right now the undef is highly bogus (and causes problems). I have to work around this in FreeBSD's boot loader because there's no blake_sse2_impl code checked in for aarch64 (or hasn't been in the past, it seems to have changed though) Besides, this should be blake3_neon_impl since sse2 is a specific x86 thing. If it's not in the repo, it shouldn't be automatically enabled like this. and it shouldn't assume that OpenZFS is only used in environments where it can be used.

@rincebrain
Copy link
Contributor

It's not called _neon_impl because there is a _neon_impl in the BLAKE3 tree and it's slow, it's called the same thing because it's compiled from the SSE intrinsics implementations with SIMDe, which implements compiler intrinsics for platform A in terms of platform B's.

@bsdimp
Copy link
Contributor

bsdimp commented Apr 10, 2023

It's not called _neon_impl because there is a _neon_impl in the BLAKE3 tree and it's slow, it's called the same thing because it's compiled from the SSE intrinsics implementations with SIMDe, which implements compiler intrinsics for platform A in terms of platform B's.

It should have comments about why this is done then. Currently it's all a mass of undocumented tribal knowledge that's confusing for people that aren't well versed in the history of why this came to be.

@rincebrain
Copy link
Contributor

rincebrain commented Apr 10, 2023

Sure, I'm almost always in favor of more documentation.

Also, as said by the PR author, though I think it was in the PR, he tried to have it just compile at runtime, but a lot of compilers result in wildly varying performance and warnings in the generated ASM, so he settled for just massaging the one that seemed best performing.

@mcmilk
Copy link
Contributor

mcmilk commented Jan 15, 2024

I think this issue is fixed and can be closed - is this correct?

@mcmilk
Copy link
Contributor

mcmilk commented Jan 16, 2024

Is this generated from the blake3 C code? If so would it be a problem to also build with -mbranch-protection=standard? It adds support for the pointer authentication and branch target identification extensions. The new instructions it adds are nops in CPUs that don't implement these extensions.

I need to remove this option, because it generates issues with older gcc version (gcc8 + gcc9) - see here: #14965

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Architecture Indicates an issue is specific to a single processor architecture Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

7 participants