Support broader range of CPUs like Intel Core 2 series by using SSE4.1 #78
Add compile options to cmake
Hello, thanks for the change! I just tested it on MSVC and it turns off popcount because MSVC doesn't define |
I think we can use add_compile_definitions like this to solve the problem:
I fixed the MSVC build, but while playing around with the various compiler options I found out that MSVC 2022 only supports compiling for AVX and AVX2 (/arch:SSE4 doesn't even exist as a command-line option, and /arch:SSE2 gives a warning that the option is unknown). So I think the changes in Core.h and Math.h are not needed, as you will never be able to trigger them. On Clang everything goes through '__builtin_popcount'; only on MSVC is there a difference. Shall I remove them (or did you actually intend your version of popcount to be executed on Linux too)?
Do you mean that without AVX or AVX2 the library can't be compiled with MSVC? If that's the case, then your change and suggestion are OK. If not, what about my previous suggestion of using "add_compile_definitions" to enable or disable features explicitly?
- Added support for SSE4.1 in MSVC
You're right, when no /arch:XXX command-line option is supplied, MSVC falls back to SSE2 (the property sheet in MSVC allows you to set /arch:SSE2, but that option gives a warning). Unlike in Clang, you can still use the SSE4 intrinsics in that case, so I think I can indeed support SSE4.1, SSE4.2, AVX and AVX2 configurations on MSVC, and then your code additions make sense. All processors that support AVX2 also support BMI, POPCNT, LZCNT, F16C and FMA, so I now enable them all together when you enable AVX2. AVX always comes with POPCNT, so they go together too. Other than that, I kept your individual switches in case someone wants to target a processor that doesn't support AVX2 but does support some of the other instructions (e.g. some consoles).
Let me know if it works for you like this and I'll merge the changes.
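The grouping described above can be pictured as a small preprocessor cascade. This is only a sketch: the `EXAMPLE_USE_*` macro names and the `kHas*` constants are hypothetical and are not taken from the project's headers, and the top-level switch is hard-coded here where a real build would set it from the build system.

```cpp
// Hypothetical switch names, for illustration only.
// A real build would define the top-level switch from the build system;
// it is hard-coded here so the sketch is self-contained.
#define EXAMPLE_USE_AVX2

// Enabling AVX2 enables the instruction sets that, per the discussion,
// every AVX2-capable processor also provides.
#if defined(EXAMPLE_USE_AVX2)
    #define EXAMPLE_USE_AVX      // AVX2 is a superset of AVX
    #define EXAMPLE_USE_BMI
    #define EXAMPLE_USE_LZCNT
    #define EXAMPLE_USE_F16C
    #define EXAMPLE_USE_FMA
#endif

// AVX and POPCNT go together.
#if defined(EXAMPLE_USE_AVX)
    #define EXAMPLE_USE_POPCNT
#endif

// Expose two of the resolved switches as constants so code can inspect them.
#if defined(EXAMPLE_USE_POPCNT)
constexpr bool kHasPopcnt = true;
#else
constexpr bool kHasPopcnt = false;
#endif

#if defined(EXAMPLE_USE_FMA)
constexpr bool kHasFma = true;
#else
constexpr bool kHasFma = false;
#endif
```

With a layout like this, a target that lacks AVX2 but supports one of the other instruction sets would define only that individual switch and leave `EXAMPLE_USE_AVX2` undefined.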
Excellent, it seems good to me. Please merge it, we can add additional changes later if something doesn't work correctly.
Because popcnt is the only instruction from SSE4.2 that is used in the project, I added a software implementation of it and added CMake options for more flexible builds.
All options are ON by default, so the project builds as before. If all options are set to OFF, the project will be compatible with SSE4.1.
I only tested it with Clang. I couldn't test it with Visual Studio, so I didn't implement the CMake options for it.
Resources related to popcount implementation:
https://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer
https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
https://godbolt.org/z/qGdh1dvKK
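For readers who don't want to follow the links, here is a minimal sketch of the kind of software fallback discussed in this PR, using the parallel bit count from the Stanford bit hacks page. The names `CountBitsSoftware`, `CountBits` and `EXAMPLE_USE_POPCNT` are hypothetical, and the actual code in Math.h may differ.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && !defined(__clang__)
    #include <intrin.h> // declares __popcnt on MSVC
#endif

// Software population count (parallel bit count), usable on CPUs without POPCNT.
constexpr uint32_t CountBitsSoftware(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 // bit counts per 2-bit field
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // bit counts per 4-bit field
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // bit counts per byte
    return (v * 0x01010101u) >> 24;                   // sum of the four bytes
}

// Picks an implementation at compile time.
// EXAMPLE_USE_POPCNT is a hypothetical switch, not the project's real macro.
inline uint32_t CountBits(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    // The compiler emits POPCNT only if the target allows it,
    // otherwise it falls back to its own software routine.
    return static_cast<uint32_t>(__builtin_popcount(v));
#elif defined(_MSC_VER) && defined(EXAMPLE_USE_POPCNT)
    return __popcnt(v);
#else
    return CountBitsSoftware(v);
#endif
}
```

The software path is kept as a separate function so it can be checked on its own, independent of which branch the compiler selects in `CountBits`.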
Test results:
SSE4.1:
SSE4.2 (popcnt):
All options ON: