AArch64 blake3 assembly trashes x18 #14714
Comments
I will fix this and provide a pull request soon. |
awesome, thanks! I have another patch for more freebsd/aarch64 fallout that I'll push through in a PR momentarily as well to complement yours. freebsd/arm is also going to be broken for similar reasons (needed kfpu_ handling), but I don't really have a good setup to test that with and I'm not sure how many people use it on armv7 anyways... |
Is this generated from the blake3 C code? If so would it be a problem to also build with |
I have a working FreeBSD/AArch64 with the various SIMD acceleration modes implemented that passed everything last I tried it, as I recall, I should probably push that... I could test on ARMv7 if that's needed. |
I pushed part of my patch in PR #14715, just to implement kfpu_begin/kfpu_end. Combined with a hackaround for blake3 to avoid x18, that was sufficient to boot again, but this system just uses |
I'd be curious to test and see if this is related to the aarch64 on macOS issues. |
fwiw, I noted in one of my emails that this probably is broken for the same reason on macOS. See: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
|
I wonder if |
Reading the comments at the top of those assembly files, it looks like they're converted from upstream x86 assembly, using a tool called SIMD Everywhere. |
Yeah, we've since worked out how to generate them and from where, but unfortunately the process is incredibly undocumented and makes this a very unreproducible process. At the very least:
Knowing what compiler version was used would also be helpful to try and avoid major changes in codegen. These are the kinds of things that would've been nice to see documented in the output files or the commit message so that others could reproduce the massive assembly blobs. |
If I understand the question correctly, the answer is that @sneves wrote upstream assembly files by hand, so they aren't reproducible from the output of any compiler. |
These specific files are versions of |
Any chance this will fix the following bogus code in ./module/zstd/include/aarch64_compat.h
Instead, it should be #undef HAS_NEON or similar with all the #ifdefs updated from
to something like
because right now the undef is highly bogus (and causes problems). I have to work around this in FreeBSD's boot loader because there's no blake_sse2_impl code checked in for aarch64 (or there hasn't been in the past; it seems to have changed, though). Besides, this should be blake3_neon_impl, since SSE2 is specific to x86. If it's not in the repo, it shouldn't be automatically enabled like this, and it shouldn't assume that OpenZFS is only used in environments where it can be used. |
It's not called _neon_impl because there is a _neon_impl in the BLAKE3 tree and it's slow, it's called the same thing because it's compiled from the SSE intrinsics implementations with SIMDe, which implements compiler intrinsics for platform A in terms of platform B's. |
It should have comments about why this is done then. Currently it's all a mass of undocumented tribal knowledge that's confusing for people that aren't well versed in the history of why this came to be. |
Sure, I'm almost always in favor of more documentation. Also, as said by the PR author, though I think it was in the PR, he tried to have it just compile at runtime, but a lot of compilers result in wildly varying performance and warnings in the generated ASM, so he settled for just massaging the one that seemed best performing. |
I think this issue is fixed and can be closed - is this correct? |
I need to remove this option, because it causes issues with older gcc versions (gcc8 + gcc9) - see here: #14965
System information
Describe the problem you're observing
Kernel panics because the accelerated blake3 implementation trashes the x18 register. In b3_aarch64_sse2.S and b3_aarch64_sse41.S the x18 register is used. This is reserved on FreeBSD/arm64 for a per-CPU pointer. The kernel is built with the -ffixed-x18 compiler flag to ensure C code doesn't write to x18, however the above two files modify either it or the 32 bit alias w18.
Describe how to reproduce the problem
Boot on FreeBSD/arm64 with the zfs module
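Since -ffixed-x18 only constrains compiler-generated code, hand-maintained or pre-generated assembly has to be checked separately. Below is a minimal sketch of such a check, not part of OpenZFS: the function name find_x18_uses and the sample text are hypothetical, and it assumes GNU-style AArch64 assembly where // starts a line comment (block comments and preprocessor macros are not handled).

```python
import re

# Match x18 or w18 as a standalone register token. The lookarounds reject
# longer identifiers such as local labels (.Lx18_done) that merely contain it.
RESERVED_X18 = re.compile(r"(?<![\w.$])[xw]18(?!\w)", re.IGNORECASE)


def find_x18_uses(asm_text):
    """Return (line_number, stripped_line) pairs whose code mentions x18/w18."""
    hits = []
    for lineno, line in enumerate(asm_text.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore trailing // comments
        if RESERVED_X18.search(code):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical sample input, not taken from the real b3_aarch64_*.S files.
SAMPLE = """\
    add x18, x0, x1      // clobbers the platform register
    mov w18, #0
    ldr x17, [sp, #16]
    add x1, x2, x3       // x18 only mentioned in a comment
    b .Lx18_done
"""

if __name__ == "__main__":
    for lineno, line in find_x18_uses(SAMPLE):
        print(f"{lineno}: {line}")
```

Running the same function over the contents of an assembly file (read with open(path).read()) would list every line that needs a different scratch register on platforms that reserve x18.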