Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLST throws illegal instruction error on AMD K10 CPUs (Windows) #200

Closed
wjblanke opened this issue Dec 11, 2023 · 27 comments
Closed

BLST throws illegal instruction error on AMD K10 CPUs (Windows) #200

wjblanke opened this issue Dec 11, 2023 · 27 comments

Comments

@wjblanke
Copy link

wjblanke commented Dec 11, 2023

Windows event viewer shows 0xc000001d (illegal instruction) when running BLST on a K10 AMD 905e CPU

Disassembly of code at offset shows pshufb instruction being executed.

According to x86 docs:

"pshufb (packed shuffle bytes) is not supported on AMD K8/K10, first included on AMD FX* (Bulldozer). It comes with SSSE3 (Supplemental SSE3) set."

But BLST uses it quite a bit for sha256 :-(

(venv) Williams-MacBook-Pro:blst bill$ grep -r pshufb *
src/asm/x86_64-xlate.pl:my $pshufb = sub {
src/asm/sha256-x86_64.pl: pshufb $TMP,@msg[0]
src/asm/sha256-x86_64.pl: pshufb $TMP,@msg[1]
src/asm/sha256-x86_64.pl: pshufb $TMP,@msg[2]
src/asm/sha256-x86_64.pl: pshufb $TMP,@msg[3]
src/asm/sha256-x86_64.pl: pshufb $t3,@x[0]
src/asm/sha256-x86_64.pl: pshufb $t3,@x[1]
src/asm/sha256-x86_64.pl: pshufb $t3,@x[2]
src/asm/sha256-x86_64.pl: pshufb $t3,@x[3]
src/asm/sha256-x86_64.pl: '&pshufb ($t3,$t4)', # sigma1(X[14..15])
src/asm/sha256-x86_64.pl: '&pshufb ($t3,$t5)',
src/asm/sha256-x86_64.pl: #&pshufb ($t3,$t4); # sigma1(X[14..15])
src/asm/sha256-x86_64.pl: #&pshufb ($t3,$t5);

Thanks for your efforts!!! Cheers

illegalinstruction
@mratsim
Copy link
Contributor

mratsim commented Dec 11, 2023

From the README:

If final application crashes with an "illegal instruction" exception [after copying to another system], pass -D__BLST_PORTABLE__ on build.sh command line. If you don't use build.sh, complement the CFLAGS environment variable with the said command line option.

@wjblanke
Copy link
Author

wjblanke commented Dec 11, 2023

We do use portable for BLST. We run on a wide range of CPUs (100,000+ users), so we know portable works very well. AMD K10s are the only problem reports we have received. Thanks!

@wjblanke
Copy link
Author

wjblanke commented Dec 11, 2023

AMD K10 doesn't have SSSE3, eg
https://www.cpu-world.com/CPUs/K10/AMD-Phenom%20II%20X4%20905e%20-%20HD905EOCK4DGI%20(HD905EOCGIBOX).html

From BLST:

sha256_block procedure for x86_64.

This module is stripped of AVX and even scalar code paths, with
rationale that

a) AVX1 is [justifiably] faster than SSSE3 code path only on one
processor, venerable Sandy Bridge;
b) AVX2 incurs costly power transitions, which would be justifiable
if AVX2 code was executing most of the time, which is not the
case in the context;
c) all contemporary processors support SSSE3, so that nobody would
actually use scalar code path anyway;

So BLST can't run on AMD K10s without throwing illegal instructions :-(

@mratsim
Copy link
Contributor

mratsim commented Dec 11, 2023

You are mentioning src/asm/sha256-x86_64.pl but the portable one is src/asm/sha256-portable-x86_64.pl

Alternatively you can compile BLST without assembly: https://github.com/supranational/blst/blob/56f9198/src/no_asm.h#L1225

@hoffmang9
Copy link

Either way, we are building __BLST_PORTABLE - https://github.com/Chia-Network/bls-signatures/blob/main/src/CMakeLists.txt#L47

@emlowe
Copy link

emlowe commented Dec 11, 2023

Ok, in the win64 directory there doesn't appear to be sha256-portable... - however, it is there in the elf directory.

So we include for windows anyway the win64/sha256-x86_64.asm assembler file.

@emlowe
Copy link

emlowe commented Dec 11, 2023

I'll note our current released version uses some custom CMake scripts to build BLST and uses it in a Python wheel

However, our upcoming release uses a rust crate for this and leverages the provided BLST rust build environment from build.rs. We are definitely using the portable setting, but this rust crate also has an illegal instruction we believe is due to the use of pshufb. However, we don't have an entire build system on this old CPU so we don't have exact Debugging tools to pinpoint the exact failing instruction

@wjblanke
Copy link
Author

wjblanke commented Dec 11, 2023

Actually it crashed in the same code and instruction (pshufb) in the rust version. The offset (which I got from event viewer on the 905e system) is different of course.

https://github.com/supranational/blst/blob/master/src/asm/sha256-x86_64.pl#L408

rustillegal

@wjblanke wjblanke changed the title BLST throws illegal instruction error on AMD K10 CPUs BLST throws illegal instruction error on AMD K10 CPUs (Windows) Dec 11, 2023
@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

Wow! How deep is it reasonable to go? The rationale behind omitting sha256-portable-x86_64.asm from x86_64-pc-windows-msvc build is as follows. It's tricky to arrange, admittedly not impossible, but the barrier was deemed to be high enough to not attempt to jump over it, because nobody should be irresponsible enough to have an unsupported Windows system on the Internet. I mean all Windows versions that supported pre-SSSE3 processors are out of support by now.

What to do? Note that there is build/coff/sha256-portable-x86_64.s, so that you can make Win64 build as portable as Linux one by using mingw toolchain. In Rust context it means using --target=x86_64-pc-windows-gnu. Is this sufficient?

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

Alternatively you can compile BLST without assembly:

As a point of clarification. Non-assembly builds are not actually supported on x86_64 and aarch64 platforms. Attempt to build it even should fail...

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

--target=x86_64-pc-windows-gnu. Is this sufficient?

On a related note. What's your VC version? It would have to be pretty old, wouldn't it? I mean newer versions don't support older Windows, so I can imagine some jumping-through-the-hoops is going on when Windows builds are produced. Question is what's more tricky, throwing together a mingw environment or putting together out-of-support VC installation?

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

--target=x86_64-pc-windows-gnu.

Since it's not exclusively about Rust, it might be appropriate to clarify that mingw option is not limited to Rust. Of course not. And in addition to that, if you control the build procedure, you might find it useful to know that you can compile build/assemble.S with clang --target=x86_64-pc-windows-msvc as alternative to compiling build/win64/*.asm files with ml64.

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

--target=x86_64-pc-windows-gnu.

Hmm, after double-checking the suggestion that this would work on pre-SSSE3 processors turned out to be wrong. Sorry! However! It's way easier to fix that by simply harmonizing it with ELF than to square the [msvc] circle in build.rs. So the question "is this sufficient" still stands. A variant of it. In other words, the suggestion is to make x86_64-pc-windows-gnu work and call it a day :-)

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

make x86_64-pc-windows-gnu work

As in #201.

@wjblanke
Copy link
Author

Wow! How deep is it reasonable to go? The rationale behind omitting sha256-portable-x86_64.asm from x86_64-pc-windows-msvc build is as follows. It's tricky to arrange, admittedly not impossible, but the barrier was deemed to be high enough to not attempt to jump over it, because nobody should be irresponsible enough to have an unsupported Windows system on the Internet. I mean all Windows versions that supported pre-SSSE3 processors are out of support by now.

The box I am using for testing with the AMD 905e CPU is running Windows 10 Home which isn't EOL until 2025. I'll admit it is pretty slow!

What to do? Note that there is build/coff/sha256-portable-x86_64.s, so that you can make Win64 build as portable as Linux one by using mingw toolchain. In Rust context it means using --target=x86_64-pc-windows-gnu. Is this sufficient?

Unfortunately we have to use the Visual C++ calling convention since Python is compiled using MSVC++ and we link in under that, so mingw doesn't work for our situation. We are compiling using the Visual Studio version featured by the github runners, which I think is MSVC++ 2022.

Thanks for looking into this. We recently switched from using Relic to BLST as our underlying BLS library and regrettably that has left some users with these AMD chips out in the cold.

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

The box I am using for testing with the AMD 905e CPU is running Windows 10 Home

Question was if it's actually supported. The fact that it might be possible to trick Windows [10] to install on unsupported hardware doesn't really mean that it makes it qualified for support by everybody :-)

Visual C++ calling convention

For reference, as far as blst itself goes, x86_64-pc-windows-gnu and x86_64-pc-windows-msvc are interchangeable. Because it's C, not C++. So that you can compile assembly.S with the mingw toolchain or clang and link it into your C++ thing [compiled with MSVC]. As for Github Actions, they have both preinstalled, ready to be called.

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

As an additional point, clang-cl is meant to be a drop-in replacement for cl. Yet at the same time you can compile assembly.S with it. Without bothering with --target.

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

clang-cl is meant to be a drop-in replacement for cl.

Just in case, the implied suggestion is to use clang-cl for everything, to compile build/assembly.S, src/server.c and your C++ thing. Unification! :-)

@wjblanke
Copy link
Author

wjblanke commented Dec 12, 2023

Could we get a new release of the rust crate that supports K10? I think that may be all we need.

Thanks!

@emlowe
Copy link

emlowe commented Dec 12, 2023

Assumingcargo build --features=portable --target=x86_64-pc-windows-gnu works, I don't think we would care about the specific toolchain that much

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

Assumingcargo build --features=portable --target=x86_64-pc-windows-gnu works

Double-check #201!

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

As for having clang-cl compile assembly.S by cargo, no, cc-rs doesn't let you pull it off, not as is. But it's possible to use "raw" clang:

  1. get clang on your %PATH%;
  2. set %CC% environment variable to clang;
  3. clone build/assembly.S: make --target=x86_64-pc-windows-gnu portable. #201 and change to its bindings/rust directory;
  4. execute cargo test --release --features=portable;
  5. cargo clean -p blst --release;
  6. cargo test --release ---features=portable -vvv;

The last two steps are meant to convince you that it does call clang -c -D__BLST_PORTABLE__ assembly.S.

As for non-Rust builds. Other build systems ought to respect CC environment variable too. And have user-defined rules. What I'm driving at is that in general you should be able to perform a Windows VS build by merely setting %CC% to clang-cl. As already mentioned, it was designed to be a drop-in replacement for cl. And you should be able to provide a suitable .S -> .obj rule...

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 12, 2023

Could we get a new release of the rust crate that supports K10?

I'm waiting out [at least] cc-rs release that will make --target=aarch64-pc-windows-msvc work. Meanwhile you can always use [patch.crates-io] to redirect blst to github.

@emlowe
Copy link

emlowe commented Dec 13, 2023

CC=clang does appear to make a correct rust crate and subsequent wheel via maturin - need to test on the ancient AMD system to be certain, but it works on a normal machine. I assume there is some minimal performance loss for everyone else in sha256, but that is the price for portable

@emlowe
Copy link

emlowe commented Dec 13, 2023

Confirmed that building with CC=clang along with #201 and using portable works on an AMD K10 machine.

@dot-asm
Copy link
Collaborator

dot-asm commented Dec 14, 2023

Thanks!

@Rigidity
Copy link

Could we get a new release of the rust crate that supports K10?

I'm waiting out [at least] cc-rs release that will make --target=aarch64-pc-windows-msvc work. Meanwhile you can always use [patch.crates-io] to redirect blst to github.

Hey, curious if there's any update on this and if you're planning on cutting a new release sometime soon? Would be ideal to eventually not have to patch in the specific commit hash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants