-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid popcnt on Windows when unavailable and in portable builds #9680
Conversation
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, though I'm reluctant to spend time on stuff we can't really validate (because we don't have the hardware to verify PORTABLE is actually portable)
@@ -92,18 +92,23 @@ inline int CountTrailingZeroBits(T v) { | |||
#endif | |||
} | |||
|
|||
#if defined(_MSC_VER) && !defined(_M_X64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By not having ifdef _MSC_VER, we have code lines with emitted code but not covered by unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we trying to appease the unit test LOC coverage or always conditionally compile away unused code?
For the former we can simply do:
#ifdef _MSC_VER
For the latter we'd need to do:
#if defined(_MSC_VER) && (!defined(HAVE_SSE42) || (!defined(_M_X64) && !defined(_M_IX86)))
I dropped this because the old code appeared to be leaning towards the latter and I didn't want to maintain it that way. If you are looking for the former benefit, I can add the simple ifdef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can add the simple ifdef
Done.
message(FATAL_ERROR "FORCE_SSE42=ON but unable to compile with SSE4.2 enabled") | ||
if(HAVE_SSE42) | ||
add_definitions(-DHAVE_SSE42) | ||
add_definitions(-DHAVE_PCLMUL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Pre-existing issue) According to this website I found, Nehalem processors have SSE 4.2 but not PCLMUL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there is an issue on this topic still open #3782 (comment). We knew about the possibility of such HW when introducing 3-way crc32c too but chose not to handle it then...
edit: That issue did not mention HAVE_PCLMUL should be tested separately, but yeah ideally it should.
edit 2: Somewhat interestingly, build_detect_platform does have a separate compiler test for HAVE_PCLMUL, while CMake does not.
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Fixes #9560. Only use popcnt intrinsic when HAVE_SSE42 is set. Also avoid setting it based on compiler test in portable builds because such test will pass on MSVC even without proper arch flags (ref: https://devblogs.microsoft.com/oldnewthing/20201026-00/?p=104397). Pull Request resolved: #9680 Test Plan: verified the combinations of -DPORTABLE and -DFORCE_SSE42 produce expected compiler flags on Linux. Verified MSVC build using PORTABLE=1 (in CircleCI) does not set HAVE_SSE42. Reviewed By: pdillinger Differential Revision: D34739033 Pulled By: ajkr fbshipit-source-id: d10456f3392945fc3e59430a1777840f7b60b276
Summary: Fixes #9560. Only use popcnt intrinsic when HAVE_SSE42 is set. Also avoid setting it based on compiler test in portable builds because such test will pass on MSVC even without proper arch flags (ref: https://devblogs.microsoft.com/oldnewthing/20201026-00/?p=104397). Pull Request resolved: #9680 Test Plan: verified the combinations of -DPORTABLE and -DFORCE_SSE42 produce expected compiler flags on Linux. Verified MSVC build using PORTABLE=1 (in CircleCI) does not set HAVE_SSE42. Reviewed By: pdillinger Differential Revision: D34739033 Pulled By: ajkr fbshipit-source-id: d10456f3392945fc3e59430a1777840f7b60b276
Any ideas when this is going to be released? |
Summary: Fixes facebook/rocksdb#9560. Only use popcnt intrinsic when HAVE_SSE42 is set. Also avoid setting it based on compiler test in portable builds because such test will pass on MSVC even without proper arch flags (ref: https://devblogs.microsoft.com/oldnewthing/20201026-00/?p=104397). Pull Request resolved: facebook/rocksdb#9680 Test Plan: verified the combinations of -DPORTABLE and -DFORCE_SSE42 produce expected compiler flags on Linux. Verified MSVC build using PORTABLE=1 (in CircleCI) does not set HAVE_SSE42. Reviewed By: pdillinger Differential Revision: D34739033 Pulled By: ajkr fbshipit-source-id: d10456f3392945fc3e59430a1777840f7b60b276
@@ -321,7 +321,8 @@ if(NOT MSVC) | |||
set(CMAKE_REQUIRED_FLAGS "-msse4.2 -mpclmul") | |||
endif() | |||
|
|||
CHECK_CXX_SOURCE_COMPILES(" | |||
if (NOT PORTABLE OR FORCE_SSE42) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr @pdillinger I believe this change makes PORTABLE builds on MSVC never use the CRC32c three way implementation even if the processor features are detected at runtime in crc32c.cpp's Choose_Extend
, because HAVE_SSE42
is now never defined and the runtime checks in isPCLMULQDQ
and isSSE42
are never invoked. If true, that's a regression from the previous behavior of PORTABLE builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PORTABLE builds can test for POPCNT support using the CPUID instruction in a similar fashion to isPCLMULQDQ
and isSSE42
(POPCNT is bit 23 in the 3rd integer, similarly to how SSE 4.2 is bit 20).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime checking for processor features is not a sustainable model for us to implement. If you want peak performance, compile for your hardware. Also, at least on newer hardware, XXH3 checksum is faster than CRC32c.
I'm pretty sure that the runtime detection "feature" you allude to was an illusion anyway, because if -march= supports some feature, it is free to be used in code generation, without a runtime check, to optimize various patterns and APIs. If -march= does not support some feature, the compiler will refuse to compile a call to an intrinsic that requires that feature. Only asm will bypass that check, which was not used in the x86 implementation of 3-way crc32c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scenario I'm referring to is MSVC, where -march
doesn't apply. You can build in MSVC without having compiler generated code use e.g., SSE 4 instructions, but still be able to invoke SSE 4 intrinsics - this is what MSVC PORTABLE builds did after a runtime decision in three way CRC32C as far as I can tell.
Our use case is shipping RocksDB as part of an end user Windows product that's installed on a variety of systems - unfortunately some of them with "antique" CPUs without POPCNT and some with the latest and greatest ones. We can't use non-PORTABLE builds due to the heterogeneous nature of end user devices. Building multiple RocksDB libraries with different build flags and selecting the one appropriate for a specific target system is not really viable. So, the previous state of affairs where PORTABLE builds dynamically selected optimized code paths at runtime when it had the most value (like block verification) was ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger considering the MSVC will only use explicitly invoked intrinsics, do you think we can allow compilation when there are complete runtime checks on the features used? I am not planning to implement it but wondering if we would accept such a contribution.
BTW, the previous code did not have complete runtime checks, in particular popcnt was not checked and still isn't, so I am not sure how the previous state was ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous code used POPCNT unconditionally at runtime in PORTABLE builds and that was definitely a bug. My only complaint is with the side effect of this PR that PORTABLE builds no longer attempt to use three way CRC32C on MSVC.
If you are inclined to accept such a contribution I'll go ahead and open a PR to address this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If MSVC works that way, sure send a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we should bring clarity to what things like FORCE_SSE42 and HAVE_SSE42 really mean, or rename them or add new things that mean what we want. I had assumed that HAVE_* historically meant that the feature was available without a runtime check when I added HAVE_AVX2. Perhaps it historically (unclearly) meant that a runtime check was needed.
Summary: Hello, As discussed previously in this [discussion](#9680 (comment)), the mentioned PR introduced a regression in portable versions that compile with MSVC - crc_3way optimization won't be used even in cases where it is supported. This PR aims to fix just that. Pull Request resolved: #10667 Reviewed By: akankshamahajan15 Differential Revision: D40644592 Pulled By: ajkr fbshipit-source-id: dadbeb10d57c19800e74288258ec3b96095557dd
Fixes #9560. Only use popcnt intrinsic when HAVE_SSE42 is set. Also avoid setting it based on compiler test in portable builds because such test will pass on MSVC even without proper arch flags (ref: https://devblogs.microsoft.com/oldnewthing/20201026-00/?p=104397).
Test Plan: verified the combinations of -DPORTABLE and -DFORCE_SSE42 produce expected compiler flags on Linux. Verified MSVC build using PORTABLE=1 (in CircleCI) does not set HAVE_SSE42.