SIMD trap at avx2::_mm256_sad_epu8 #1412

Closed
zesterer opened this issue Nov 1, 2023 · 10 comments · Fixed by #1417
Labels
A-core-arch Area: Necessary for full core::arch support

Comments


zesterer commented Nov 1, 2023

Hello,

I made an attempt at building Veloren with the cranelift backend!

I'll preface this by saying that I did not expect this to work, and the fact that the thing even compiled at all is miraculous to me. Veloren is an enormous codebase nowadays that pulls in a terrifying number of dependencies that do all sorts of weird and unusual things, which likely represent a headache for a codegen backend: JIT, dynamic linking, horrible multi-threading things, linking to several C and C++ codebases, a lot of SIMD (both explicit and implicit), atomics all over the place, etc.

When running the executable, I get:

trap at Instance { def: Item(DefId(2:14641 ~ core[b9f2]::core_arch::x86::avx2::_mm256_sad_epu8)), args: [] } (_ZN4core9core_arch3x864avx215_mm256_sad_epu817h663d79696ba92f42E): llvm.x86.avx2.psad.bw

(This happens after wgpu has selected a graphics adapter and the internal server has booted up, so the fact that it got this far is impressive!)

That said, there was no warning about this intrinsic (or any warnings at all, for that matter) reported during the build process, despite this post implying that there should be.

Hopefully this is useful information!

Member

bjorn3 commented Nov 5, 2023

That said, there was no warning about this intrinsic (or any warnings at all, for that matter) reported during the build process, despite this post implying that there should be.

I think something somewhere is suppressing those warnings. Maybe it is cargo, maybe it is rustc when --cap-lints allow is passed? Haven't investigated yet.

When running the executable, I get:

trap at Instance { def: Item(DefId(2:14641 ~ core[b9f2]::core_arch::x86::avx2::_mm256_sad_epu8)), args: [] } (_ZN4core9core_arch3x864avx215_mm256_sad_epu817h663d79696ba92f42E): llvm.x86.avx2.psad.bw

Should be fixed on the implement_xgetbv branch now. I'm currently doing a local build of veloren to check if everything works.
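For readers unfamiliar with the intrinsic that trapped: `_mm256_sad_epu8` computes, for each of the four 8-byte groups of its 256-bit inputs, the sum of absolute differences of the unsigned bytes. Below is a minimal scalar sketch of that behavior. The function name `sad_epu8_256` is hypothetical and this is not the code from the branch mentioned above; it only models the result a backend has to produce.

```rust
/// Scalar model of `_mm256_sad_epu8` (hypothetical helper, for illustration only).
///
/// The 32 input bytes are treated as four groups of 8 unsigned bytes. For each
/// group, the absolute differences of the corresponding bytes are summed. The
/// largest possible sum is 8 * 255 = 2040, so each result fits in 16 bits and
/// is zero-extended into its 64-bit lane.
pub fn sad_epu8_256(a: &[u8; 32], b: &[u8; 32]) -> [u64; 4] {
    let mut out = [0u64; 4];
    for (lane, (ca, cb)) in a.chunks_exact(8).zip(b.chunks_exact(8)).enumerate() {
        out[lane] = ca
            .iter()
            .zip(cb)
            .map(|(&x, &y)| u64::from(x.abs_diff(y)))
            .sum();
    }
    out
}
```

A model like this can serve as a reference when testing a backend's lowering of the intrinsic against known inputs.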

Member

bjorn3 commented Nov 5, 2023

Looks like there is another image decoding issue:

trap at Instance { def: Item(DefId(2:13926 ~ core[90bc]::core_arch::x86::ssse3::_mm_mulhrs_epi16)), args: [] } (_ZN4core9core_arch3x865ssse316_mm_mulhrs_epi1617hcd9ec8a636ca4408E): llvm.x86.ssse3.pmul.hr.sw.128
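As background on this second intrinsic: `_mm_mulhrs_epi16` multiplies eight pairs of signed 16-bit integers, then rounds and scales each 32-bit product back down to 16 bits (a Q15 fixed-point multiply, which is common in image decoders). The following is a rough scalar sketch of those semantics. The name `mulhrs_epi16` is hypothetical and this is not the backend's implementation.

```rust
/// Scalar model of `_mm_mulhrs_epi16` (hypothetical helper, for illustration only).
///
/// Each lane computes `(((a * b) >> 14) + 1) >> 1` on the widened 32-bit
/// product and keeps the low 16 bits of the result.
pub fn mulhrs_epi16(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        // Widen to i32 so the product cannot overflow.
        let product = i32::from(a[i]) * i32::from(b[i]);
        // Arithmetic shift, add the rounding bit, shift once more, then truncate.
        out[i] = (((product >> 14) + 1) >> 1) as i16;
    }
    out
}
```

Note the truncating cast: `-32768 * -32768` produces 32768 before the cast, which wraps back to `-32768`, matching the wrapping behavior of the hardware instruction.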

Author

zesterer commented Nov 6, 2023

Did you want me to look further into this when I get time?

Member

bjorn3 commented Nov 6, 2023

That is not necessary. I know what the issue is (another unimplemented intrinsic), and I am working on implementing it.

Member

bjorn3 commented Nov 7, 2023

Progress update: I got to the login screen, and after I tried to log in it crashed with:

trap at Instance { def: Item(DefId(2:14310 ~ core[90bc]::core_arch::x86::avx::_mm256_lddqu_si256)), args: [] } (_ZN4core9core_arch3x863avx18_mm256_lddqu_si25617hbbd9f58f2d58f5fdE): llvm.x86.avx.ldu.dq.256

in httparse (a dependency of hyper). Going to fix that next.
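For context, `_mm256_lddqu_si256` is an unaligned 256-bit load: it reads 32 bytes from an address with no alignment requirement, and its result is the same as that of a plain unaligned load. A minimal scalar sketch of that behavior follows. The name `lddqu_256` and the slice-plus-offset interface are assumptions made for illustration, since the real intrinsic takes a raw pointer.

```rust
/// Scalar model of `_mm256_lddqu_si256` (hypothetical helper, for illustration only).
///
/// Copies 32 bytes starting at `offset`, which may be any byte offset (no
/// alignment is required). Panics if fewer than 32 bytes are available,
/// whereas the real intrinsic would simply read out of bounds.
pub fn lddqu_256(bytes: &[u8], offset: usize) -> [u8; 32] {
    bytes[offset..offset + 32]
        .try_into()
        .expect("slice has exactly 32 bytes")
}
```

A parser that scans a request buffer 32 bytes at a time from arbitrary offsets is the kind of caller that relies on such a load.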

Author

zesterer commented Nov 7, 2023

Oooh, lots of progress! Definitely keep me updated and let me know if I can lend a hand, I'd love to see this as a viable alternative to opt-level = 0 for us.

Member

bjorn3 commented Nov 7, 2023

If you could quickly get rid of shaderc and spirv_cross that would be great :) Implementing new intrinsics is reasonably quick as I can test the respective crate in isolation, but recompiling the entirety of veloren once I have implemented some intrinsics takes 15min on my 2-core + HT Intel Core i3 laptop (I can't use dev-desktop-eu-2.infra.rust-lang.org due to veloren depending on vulkan). Like half of that is spent compiling shaderc and spirv_cross.

(Yes, I'm fully aware that you can't do this quickly, but maybe for the long term switching to wgsl using naga would be possible? Naga is already a dependency of veloren through wgpu.)

I will keep you updated!

Author

zesterer commented Nov 7, 2023

IIRC you can disable the shaderc-from-source feature in voxygen; it shouldn't be necessary to recompile shaderc. It's enabled by default just because rebuilds are buggy on some specific setups, but that's unlikely to be your case.

Member

bjorn3 commented Nov 7, 2023

[Image: screenshot_1699352450832]

bjorn3 added the A-core-arch (Area: Necessary for full core::arch support) label Nov 7, 2023
Member

bjorn3 commented Nov 11, 2023

Works with the latest nightly now.


2 participants