forked from intel/x86-simd-sort
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Mitigated poor compressstore performance on AMD Zen 4
Zen 4's AVX-512 compressstore implementation is highly inefficient (throughput of 50-70). Emulating it with separate `compress` and `storeu` operations is, in fact, faster than the native instruction. To choose between the native and emulated versions, a `SW_VCOMPRESS` flag can be passed to the makefile (`SW_COMPRESS=1 make`).
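To make the idea concrete, here is a minimal sketch of the two variants for 32-bit lanes. It is not the code from this commit: the function names `keep_greater_native` and `keep_greater_emulated` are hypothetical, and using a masked store for the emulated path (rather than a plain `storeu`) is an assumption made so that both versions write exactly the same bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// Native path: a single compress-store instruction.
// Copies the lanes of src[0..15] that are greater than pivot to dst, packed.
__attribute__((target("avx512f")))
std::size_t keep_greater_native(std::int32_t* dst, const std::int32_t* src,
                                std::int32_t pivot) {
    assert(dst != nullptr && src != nullptr);
    __m512i v = _mm512_loadu_si512(src);
    __mmask16 k = _mm512_cmpgt_epi32_mask(v, _mm512_set1_epi32(pivot));
    _mm512_mask_compressstoreu_epi32(dst, k, v);
    return static_cast<std::size_t>(__builtin_popcount(k));
}

// Emulated path (assumed shape): compress inside a register, then store.
__attribute__((target("avx512f")))
std::size_t keep_greater_emulated(std::int32_t* dst, const std::int32_t* src,
                                  std::int32_t pivot) {
    assert(dst != nullptr && src != nullptr);
    __m512i v = _mm512_loadu_si512(src);
    __mmask16 k = _mm512_cmpgt_epi32_mask(v, _mm512_set1_epi32(pivot));
    // Pack the selected lanes into the low lanes; the remaining lanes are zeroed.
    __m512i packed = _mm512_maskz_compress_epi32(k, v);
    // Build a mask covering only the n low lanes so nothing past them is written.
    unsigned n = static_cast<unsigned>(__builtin_popcount(k));
    __mmask16 low = static_cast<__mmask16>((1u << n) - 1u);
    _mm512_mask_storeu_epi32(dst, low, packed);
    return n;
}
```

Both functions return the number of lanes written. If the destination is known to have room for a full 16-lane vector, the masked store could be replaced by an unmasked `_mm512_storeu_si512`, at the cost of also overwriting the lanes after the packed ones; whether that is acceptable depends on the caller.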
1 parent 7d7591c, commit 41d03b2
Showing 4 changed files with 81 additions and 10 deletions.
41d03b2
@natmaurice This is interesting! Have you opened an issue with gcc?
41d03b2
@mr-c That's a good point.
So far, the current release of gcc (12.2) does not support Zen 4 (`-march=znver4`), and neither does clang. This support should arrive for gcc 13. Unfortunately, the pre-release doesn't seem to optimize `compressstoreu` into a faster emulated version. I haven't found any report about the issue either.
So yes, I'm probably going to follow your advice and will file an issue with gcc.
41d03b2
Yes, this is the best time to report bugs in GCC, before the first release of the new series is out.
By the way, I adapted the code here in a branch of SIMDe I'm currently working on: simd-everywhere/simde@13cc2be (frequently rebased in https://github.com/simd-everywhere/simde/tree/x86-simd-sort )
Do you know if clang supports Zen 4 yet? I can add a similar fix.