Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid popcnt on Windows when unavailable and in portable builds #9680

Closed
wants to merge 7 commits into from

Conversation

ajkr
Copy link
Contributor

@ajkr ajkr commented Mar 8, 2022

Fixes #9560. Only use popcnt intrinsic when HAVE_SSE42 is set. Also avoid setting it based on compiler test in portable builds because such test will pass on MSVC even without proper arch flags (ref: https://devblogs.microsoft.com/oldnewthing/20201026-00/?p=104397).

Test Plan: verified the combinations of -DPORTABLE and -DFORCE_SSE42 produce expected compiler flags on Linux. Verified MSVC build using PORTABLE=1 (in CircleCI) does not set HAVE_SSE42.

@ajkr ajkr changed the title [CI only] Try fix win popcnt Avoid popcnt on Windows when unavailable and in portable builds Mar 9, 2022
@ajkr ajkr marked this pull request as ready for review March 9, 2022 00:16
@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@ajkr ajkr requested a review from pdillinger March 9, 2022 01:06
@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, though I'm reluctant to spend time on stuff we can't really validate (because we don't have the hardware to verify PORTABLE is actually portable)

@@ -92,18 +92,23 @@ inline int CountTrailingZeroBits(T v) {
#endif
}

#if defined(_MSC_VER) && !defined(_M_X64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By not having ifdef _MSC_VER, we have code lines with emitted code but not covered by unit tests.

Copy link
Contributor Author

@ajkr ajkr Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we trying to appease the unit test LOC coverage or always conditionally compile away unused code?

For the former we can simply do:

#ifdef _MSC_VER

For the latter we'd need to do:

#if defined(_MSC_VER) && (!defined(HAVE_SSE42) || (!defined(_M_X64) && !defined(_M_IX86)))

I dropped this because the old code appeared to be leaning towards the latter and I didn't want to maintain it that way. If you are looking for the former benefit, I can add the simple ifdef.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can add the simple ifdef

Done.

message(FATAL_ERROR "FORCE_SSE42=ON but unable to compile with SSE4.2 enabled")
if(HAVE_SSE42)
add_definitions(-DHAVE_SSE42)
add_definitions(-DHAVE_PCLMUL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Pre-existing issue) According to this website I found, Nehalem processors have SSE 4.2 but not PCLMUL

https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)

Copy link
Contributor Author

@ajkr ajkr Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there is an issue on this topic still open #3782 (comment). We knew about the possibility of such HW when introducing 3-way crc32c too but chose not to handle it then...

edit: That issue did not mention HAVE_PCLMUL should be tested separately, but yeah ideally it should.

edit 2: Somewhat interestingly, build_detect_platform does have a separate compiler test for HAVE_PCLMUL, while CMake does not.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ajkr added a commit that referenced this pull request Mar 10, 2022
Summary:
Fixes #9560. Only use popcnt intrinsic when HAVE_SSE42 is set. Also avoid setting it based on compiler test in portable builds because such test will pass on MSVC even without proper arch flags (ref: https://devblogs.microsoft.com/oldnewthing/20201026-00/?p=104397).

Pull Request resolved: #9680

Test Plan: verified the combinations of -DPORTABLE and -DFORCE_SSE42 produce expected compiler flags on Linux. Verified MSVC build using PORTABLE=1 (in CircleCI) does not set HAVE_SSE42.

Reviewed By: pdillinger

Differential Revision: D34739033

Pulled By: ajkr

fbshipit-source-id: d10456f3392945fc3e59430a1777840f7b60b276
ajkr added a commit that referenced this pull request Mar 10, 2022
Summary:
Fixes #9560. Only use popcnt intrinsic when HAVE_SSE42 is set. Also avoid setting it based on compiler test in portable builds because such test will pass on MSVC even without proper arch flags (ref: https://devblogs.microsoft.com/oldnewthing/20201026-00/?p=104397).

Pull Request resolved: #9680

Test Plan: verified the combinations of -DPORTABLE and -DFORCE_SSE42 produce expected compiler flags on Linux. Verified MSVC build using PORTABLE=1 (in CircleCI) does not set HAVE_SSE42.

Reviewed By: pdillinger

Differential Revision: D34739033

Pulled By: ajkr

fbshipit-source-id: d10456f3392945fc3e59430a1777840f7b60b276
@theolivenbaum
Copy link
Contributor

Any ideas when this is going to be released?

rajivshah3 pushed a commit to iotaledger/rocksdb that referenced this pull request Apr 14, 2022
Summary:
Fixes facebook/rocksdb#9560. Only use popcnt intrinsic when HAVE_SSE42 is set. Also avoid setting it based on compiler test in portable builds because such test will pass on MSVC even without proper arch flags (ref: https://devblogs.microsoft.com/oldnewthing/20201026-00/?p=104397).

Pull Request resolved: facebook/rocksdb#9680

Test Plan: verified the combinations of -DPORTABLE and -DFORCE_SSE42 produce expected compiler flags on Linux. Verified MSVC build using PORTABLE=1 (in CircleCI) does not set HAVE_SSE42.

Reviewed By: pdillinger

Differential Revision: D34739033

Pulled By: ajkr

fbshipit-source-id: d10456f3392945fc3e59430a1777840f7b60b276
@@ -321,7 +321,8 @@ if(NOT MSVC)
set(CMAKE_REQUIRED_FLAGS "-msse4.2 -mpclmul")
endif()

CHECK_CXX_SOURCE_COMPILES("
if (NOT PORTABLE OR FORCE_SSE42)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr @pdillinger I believe this change makes PORTABLE builds on MSVC never use the CRC32c three way implementation even if the processor features are detected at runtime in crc32c.cpp's Choose_Extend, because HAVE_SSE42 is now never defined and the runtime checks in isPCLMULQDQ and isSSE42 are never invoked. If true, that's a regression from the previous behavior of PORTABLE builds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PORTABLE builds can test for POPCNT support using the CPUID instruction in a similar fashion to isPCLMULQDQ and isSSE42 (POPCNT is bit 23 in the 3rd integer, similarly to how SSE 4.2 is bit 20).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime checking for processor features is not a sustainable model for us to implement. If you want peak performance, compile for your hardware. Also, at least on newer hardware, XXH3 checksum is faster than CRC32c.

I'm pretty sure that the runtime detection "feature" you allude to was an illusion anyway, because if -march= supports some feature, it is free to be used in code generation, without a runtime check, to optimize various patterns and APIs. If -march= does not support some feature, the compiler will refuse to compile a call to an intrinsic that requires that feature. Only asm will bypass that check, which was not used in the x86 implementation of 3-way crc32c.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario I'm referring to is MSVC, where -march doesn't apply. You can build in MSVC without having compiler generated code use e.g., SSE 4 instructions, but still be able to invoke SSE 4 intrinsics - this is what MSVC PORTABLE builds did after a runtime decision in three way CRC32C as far as I can tell.
Our use case is shipping RocksDB as part of an end user Windows product that's installed on a variety of systems - unfortunately some of them with "antique" CPUs without POPCNT and some with the latest and greatest ones. We can't use non-PORTABLE builds due to the heterogeneous nature of end user devices. Building multiple RocksDB libraries with different build flags and selecting the one appropriate for a specific target system is not really viable. So, the previous state of affairs where PORTABLE builds dynamically selected optimized code paths at runtime when it had the most value (like block verification) was ideal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger considering the MSVC will only use explicitly invoked intrinsics, do you think we can allow compilation when there are complete runtime checks on the features used? I am not planning to implement it but wondering if we would accept such a contribution.

BTW, the previous code did not have complete runtime checks, in particular popcnt was not checked and still isn't, so I am not sure how the previous state was ideal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code used POPCNT unconditionally at runtime in PORTABLE builds and that was definitely a bug. My only complaint is with the side effect of this PR that PORTABLE builds no longer attempt to use three way CRC32C on MSVC.
If you are inclined to accept such a contribution I'll go ahead and open a PR to address this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If MSVC works that way, sure send a PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we should bring clarity to what things like FORCE_SSE42 and HAVE_SSE42 really mean, or rename them or add new things that mean what we want. I had assumed that HAVE_* historically meant that the feature was available without a runtime check when I added HAVE_AVX2. Perhaps it historically (unclearly) meant that a runtime check was needed.

facebook-github-bot pushed a commit that referenced this pull request Nov 8, 2022
Summary:
Hello,
As discussed previously in this [discussion](#9680 (comment)), the mentioned PR introduced a regression in portable versions that compile with MSVC - crc_3way optimization won't be used even in cases where it is supported.

This PR aims to fix just that.

Pull Request resolved: #10667

Reviewed By: akankshamahajan15

Differential Revision: D40644592

Pulled By: ajkr

fbshipit-source-id: dadbeb10d57c19800e74288258ec3b96095557dd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rocksdb jni fails on invalid instruction when popcnt instruction is missing
5 participants