Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-mfpmath=sse is not added in 32bits build #1133

Closed
SirLynix opened this issue Jun 11, 2024 · 10 comments · Fixed by #1141
Closed

-mfpmath=sse is not added in 32bits build #1133

SirLynix opened this issue Jun 11, 2024 · 10 comments · Fixed by #1141

Comments

@SirLynix
Copy link
Contributor

SirLynix commented Jun 11, 2024

Hi,

-mfpmath=sse is not added in 32bits build, and from a comment in Jolt.cmake this seems to be a mistake

# On 32-bit builds we need to default to using SSE instructions, the x87 FPU instructions have higher intermediate precision

This block is still part of the 64bits check, same for EMIT_X86_INSTRUCTION_SET_DEFINITIONS.

Also, shouldn't -msse2 be added too in 32bits build? -mfpmath=sse is not enough

@jrouwe
Copy link
Owner

jrouwe commented Jun 11, 2024

Thanks for the report, I'll investigate.

@jrouwe
Copy link
Owner

jrouwe commented Jun 12, 2024

Hello,

The errors in the referenced ticket don't seem to be related to enabling sse:

/opt/homebrew/Cellar/mingw-w64/12.0.0/toolchain-i686/lib/gcc/i686-w64-mingw32/14.1.0/include/xmmintrin.h: In member function 'float JPH::Vec3::GetX() const':
/opt/homebrew/Cellar/mingw-w64/12.0.0/toolchain-i686/lib/gcc/i686-w64-mingw32/14.1.0/include/xmmintrin.h:973:1: error: inlining failed in call to 'always_inline' 'float _mm_cvtss_f32(__m128)': target specific option mismatch
  973 | _mm_cvtss_f32 (__m128 __A)
      | ^~~~~~~~~~~~~

It's a warning about inlining which I already fixed in 4a08d8d (which is not in an official release yet).

The -mfpmath=sse flag seems to be sufficient for GCC to start creating SSE instructions. I've created a sample in godbolt. You can see that the generated assembly uses SSE instructions. If you remove the -mfpmath=sse flag, or replace it with -msse, it starts creating old style x87 instructions.

If this is the 64-bit section you're referring to:

elseif ("${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "x86_64" OR "${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "AMD64")

then what it is checking is CMAKE_SYSTEM_PROCESSOR, which in my experience for x86 is the processor that is compiling the code, not what architecture it is targetting. I'm hoping you're not compiling on a 32-bit OS, but if you are, or if I'm wrong about the value, we can add an extra check for that.

@SirLynix
Copy link
Contributor Author

Thanks for all the details!

I will use your inline fix to try and see if it fixes the build on MinGW on macOS.

If this is the 64-bit section you're referring to:

elseif ("${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "x86_64" OR "${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "AMD64")

then what it is checking is CMAKE_SYSTEM_PROCESSOR, which in my experience for x86 is the processor that is compiling the code, not what architecture it is targetting. I'm hoping you're not compiling on a 32-bit OS, but if you are, or if I'm wrong about the value, we can add an extra check for that.

However I find this confusing, why is the processor compiling the code any relevant? Compilers running on 32bits system can produce 64bits binaries (and compilers running on 64bits system can produce 32bits binaries as well). Only the target architecture should matter.

I find the CMake doc a bit confusing on that as well

@jrouwe
Copy link
Owner

jrouwe commented Jun 12, 2024

Yes, some of the cmake variables are unintuitive. How are you enabling the 32-bit build from cmake? If I simply add -m32 to the CMAKE_CXX_FLAGS then CMAKE_SYSTEM_PROCESSOR remains x86_64, which to me is logical since otherwise it would somehow need to parse the compiler options and change the value of CMAKE_SYSTEM_PROCESSOR on the fly based on this option.

@SirLynix
Copy link
Contributor Author

SirLynix commented Jun 12, 2024

yes from what I see xmake passes -m32 and set the compiler path.

/Users/runner/.xmake/packages/c/cmake/3.29.2/c83e9763d2e54cf3bc634700a848c33b/bin/cmake -DENABLE_ALL_WARNINGS=OFF -DINTERPROCEDURAL_OPTIMIZATION=OFF -DTARGET_UNIT_TESTS=OFF -DTARGET_HELLO_WORLD=OFF -DTARGET_PERFORMANCE_TEST=OFF -DTARGET_SAMPLES=OFF -DTARGET_VIEWER=OFF -DUSE_STATIC_MSVC_RUNTIME_LIBRARY=OFF -DOVERRIDE_CXX_FLAGS=OFF -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DCROSS_PLATFORM_DETERMINISTIC=OFF -DDOUBLE_PRECISION=OFF -DGENERATE_DEBUG_SYMBOLS=OFF -DOBJECT_LAYER_BITS=16 -DUSE_AVX=OFF -DUSE_AVX2=OFF -DUSE_AVX512=OFF -DUSE_F16C=OFF -DUSE_FMADD=OFF -DUSE_LZCNT=OFF -DUSE_SSE4_1=OFF -DUSE_SSE4_2=OFF -DUSE_TZCNT=OFF -DCMAKE_INSTALL_PREFIX=/Users/runner/.xmake/packages/j/joltphysics/v5.0.0/d18c2625a84a4ebe80128018d7fb713c -DCMAKE_INSTALL_LIBDIR:PATH=lib -G "Unix Makefiles" -DCMAKE_C_COMPILER=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-gcc -DCMAKE_OSX_SYSROOT= -DCMAKE_ASM_FLAGS=-m32 -DCMAKE_RC_COMPILER=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-windres -DCMAKE_STATIC_LINKER_FLAGS= -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=BOTH -DCMAKE_SHARED_LINKER_FLAGS=-m32 -DCMAKE_SYSTEM_NAME=Windows -DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH -DCMAKE_ASM_COMPILER=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-gcc -DCMAKE_CXX_FLAGS=-m32 -DCMAKE_RANLIB=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-ranlib -DHAVE_FLAG_SEARCH_PATHS_FIRST=0 -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER -DCMAKE_FIND_ROOT_PATH=/opt/homebrew/opt/mingw-w64 -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=BOTH -DCMAKE_CXX_COMPILER=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-g++ -DCMAKE_C_FLAGS=-m32 -DCMAKE_AR=/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-ar -DCMAKE_EXE_LINKER_FLAGS=-m32 "-DCMAKE_CXX_FLAGS_RELEASE=-m32 -O3 -DNDEBUG" "-DCMAKE_C_FLAGS_RELEASE=-m32 -O3 -DNDEBUG" -DCMAKE_EXE_LINKER_FLAGS_RELEASE=-m32 -DCMAKE_SHARED_LINKER_FLAGS_RELEASE=-m32 -DCMAKE_STATIC_LINKER_FLAGS_RELEASE= /Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Build

however, from the failed build log, there are also warnings about sse not being enabled

/Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Build/../Jolt/Core/Profiler.inl:71:50: warning: SSE vector return without SSE enabled changes the ABI [-Wpsabi]
   71 |                 __m128i val = _mm_loadu_si128(src);
      |                                                  ^

notice also that no SSE option is passed to the compiler:

/opt/homebrew/opt/mingw-w64/bin/i686-w64-mingw32-g++ -DJPH_BUILD_SHARED_LIBRARY -DJPH_DEBUG_RENDERER -DJPH_OBJECT_LAYER_BITS=16 -DJPH_PROFILE_ENABLED -DJPH_SHARED_LIBRARY -DJolt_EXPORTS -DNDEBUG @CMakeFiles/Jolt.dir/includes_CXX.rsp -m32 -Wno-stringop-overflow -ffp-contract=off -m32 -O3 -DNDEBUG -std=c++17 -Winvalid-pch -include /Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Build/build_d18c2625/CMakeFiles/Jolt.dir/cmake_pch.hxx -MD -MT CMakeFiles/Jolt.dir/Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Core/JobSystemThreadPool.cpp.obj -MF CMakeFiles/Jolt.dir/Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Core/JobSystemThreadPool.cpp.obj.d -o CMakeFiles/Jolt.dir/Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Core/JobSystemThreadPool.cpp.obj -c /Users/runner/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Core/JobSystemThreadPool.cpp

Probably because in that case, CMAKE_SYSTEM_PROCESSOR is aarch64 (Apple M1), but the target arch is x86.

it looks like detecting the target arch is not easy with cmake: https://stackoverflow.com/questions/11944060/how-to-detect-target-architecture-using-cmake

Edit : I see your fix targets only debug compilation, but here it's compiling using release

@jrouwe
Copy link
Owner

jrouwe commented Jun 12, 2024

Cross compiling is indeed difficult using cmake.

Can you place:

MESSAGE(${CMAKE_SYSTEM_PROCESSOR})

at the end of CMakeLists.txt to see what it is reporting? I'm hoping it's not aarch64, that would be really bad.

@SirLynix
Copy link
Contributor Author

SirLynix commented Jun 12, 2024

Sure.

I added

MESSAGE(CMAKE_SYSTEM_PROCESSOR="${CMAKE_SYSTEM_PROCESSOR}")

and the output is:

CMAKE_SYSTEM_PROCESSOR=""

😓

https://github.com/xmake-io/xmake-repo/actions/runs/9489680619/job/26151521672?pr=4330#step:5:478

@jrouwe
Copy link
Owner

jrouwe commented Jun 12, 2024

Ok, that was unexpected! I'm not really sure how to proceed now.

I can probably blanket disable the always_inline warning on gcc, but I can't really add -msse if I don't know what processor we're compiling for.

Do we really need to cross compile from Apple M1 to a 32-bit x86 platform? (seems like a super fringe case)

@SirLynix
Copy link
Contributor Author

SirLynix commented Jun 12, 2024

Do we really need to cross compile from Apple M1 to a 32-bit x86 platform? (seems like a super fringe case)

The problem is not 32-bit but target architecture detection, as CMAKE_SYSTEM_PROCESSOR is empty on MinGW 64bits on macOS too, and on MinGW 64bits on Windows too (I'm not sure why the error doesn't trigger).

Notice that MinGW has no SSE option even on Windows when targeting 64bits

/D/a/_temp/msys64/mingw64/bin/x86_64-w64-mingw32-g++.exe -DJPH_BUILD_SHARED_LIBRARY -DJPH_DEBUG_RENDERER -DJPH_OBJECT_LAYER_BITS=16 -DJPH_PROFILE_ENABLED -DJPH_SHARED_LIBRARY -DJolt_EXPORTS -DNDEBUG @CMakeFiles/Jolt.dir/includes_CXX.rsp -m64 -Wno-stringop-overflow -ffp-contract=off -m64 -O3 -DNDEBUG -std=c++17 -Winvalid-pch -include C:/Users/runneradmin/AppData/Local/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Build/build_07cbf58e/CMakeFiles/Jolt.dir/cmake_pch.hxx -MD -MT CMakeFiles/Jolt.dir/4560404b44facb1d4119b9286f48a459/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Physics/Body/MotionProperties.cpp.obj -MF CMakeFiles/Jolt.dir/4560404b44facb1d4119b9286f48a459/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Physics/Body/MotionProperties.cpp.obj.d -o CMakeFiles/Jolt.dir/4560404b44facb1d4119b9286f48a459/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Physics/Body/MotionProperties.cpp.obj -c /C/Users/runneradmin/AppData/Local/.xmake/cache/packages/2406/j/joltphysics/v5.0.0/source/Jolt/Physics/Body/MotionProperties.cpp

I suppose the best option would be to detect arch from compiler, or to expect the user to set CMAKE_SYSTEM_PROCESSOR from the command-line.

I can submit a pull request to xmake to set CMAKE_SYSTEM_PROCESSOR quite easily, but that won't fix 32bits compilation as sse options are only added in the x86_64 check

@SirLynix
Copy link
Contributor Author

SirLynix commented Jun 13, 2024

After some more research, it appears CMAKE_SYSTEM_PROCESSOR is empty because we set CMAKE_SYSTEM_NAME, which enables cross-compilation. It's our duty to set CMAKE_SYSTEM_PROCESSOR.
(edit: interesting read)

I will fix that on xmake side, and I think a fix is needed here:

JoltPhysics/Jolt/Jolt.cmake

Lines 619 to 655 in 4cd5205

elseif ("${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "x86_64" OR "${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "AMD64")
# x64
if (USE_AVX512)
target_compile_options(Jolt PUBLIC -mavx512f -mavx512vl -mavx512dq -mavx2 -mbmi -mpopcnt -mlzcnt -mf16c)
elseif (USE_AVX2)
target_compile_options(Jolt PUBLIC -mavx2 -mbmi -mpopcnt -mlzcnt -mf16c)
elseif (USE_AVX)
target_compile_options(Jolt PUBLIC -mavx -mpopcnt)
elseif (USE_SSE4_2)
target_compile_options(Jolt PUBLIC -msse4.2 -mpopcnt)
elseif (USE_SSE4_1)
target_compile_options(Jolt PUBLIC -msse4.1)
else()
target_compile_options(Jolt PUBLIC -msse2)
endif()
if (USE_LZCNT)
target_compile_options(Jolt PUBLIC -mlzcnt)
endif()
if (USE_TZCNT)
target_compile_options(Jolt PUBLIC -mbmi)
endif()
if (USE_F16C)
target_compile_options(Jolt PUBLIC -mf16c)
endif()
if (USE_FMADD AND NOT CROSS_PLATFORM_DETERMINISTIC)
target_compile_options(Jolt PUBLIC -mfma)
endif()
# On 32-bit builds we need to default to using SSE instructions, the x87 FPU instructions have higher intermediate precision
# which will cause problems in the collision detection code (the effect is similar to leaving FMA on, search for
# JPH_PRECISE_MATH_ON for the locations where this is a problem).
if (NOT MSVC)
target_compile_options(Jolt PUBLIC -mfpmath=sse)
endif()
EMIT_X86_INSTRUCTION_SET_DEFINITIONS()
endif()

maybe just check if CMAKE_SYSTEM_PROCESSOR equals x86 too (and update the 32bits comment), as I don't think any of this is specific for 64bits.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants