New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for AMD's Zen3 architecture #561
Conversation
- User can now specify zen3 configuration, currently it reuses block sizes and kernels from zen2. - Auto configuration can detect and enable if zen3 config is needed - Added support for amd64 bundle which contains all zen platforms - Moved exiting amd bundle to amd64 legacy. AMD-Internal: [CPUPL-500, CPUPL-1013]
AMD-Internal: [CPUPL-1013] Change-Id: I859152d63d1a56519c508dfa19587f25123e08b4
znver3 flag will be enabled if compiler is AOCC Clang version 3.0 and configuration is zen3 Change-Id: Ie164f4d469bf3f8df31ccf8fed9f80dfc62efb39 AMD-Internal: [CPUPL-1353]
Block sizes (MC, KC, NC) for DGEMM are determined at runtime based on following parameters - Single or multithreaded build - Processor Architecture (currently support only zen3) - Number of threads requested while running the library Change-Id: Ia793484b77adb87486e630d0d3b4c7856ae52094
The configuration is updated to - Enable EPYC architecture optimizations - Removed macros to override block sizes as there was no performance gain using them AMD-Internal : [CPUPL-1350]
# Ignore everything in this directory | ||
* | ||
# Except this file | ||
!.gitignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have local files in this directory which you're not ready to push, I recommend ignoring them in .git/info/exclude
which is purely local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we don't have any zen3 specific kernels but we do need empty folder to satisfy build scripts.
config/amd64/make_defs.mk
Outdated
endif | ||
endif | ||
# These setting should come from makefiles for individial configuration | ||
# included in this bundle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't there still need to be some setting here for compiling framework code (like COPTFLAGS
)?
As far as I can tell, there is no |
@fgvanzee , generic_legacy it is failing because there is issue with amd64_legacy configuration, interestingly amd64_legacy builds fine on its own but fails via x86_64. @devinamatthews, thanks for the review, will address your comments. |
Seems like there is some name conflict, if I rename amd64_legacy to amd_legacy it works but amd64legacy doesn't. Do we ignore characters after certain length in the config name? |
No. But maybe something is going awry with a function that searches for the presence of a string in a list of strings. |
I was able to reproduce it after all. (I had forgotten to check out the |
I've found the source of the |
Details: - Fixed a bug in configure related to the building of the so-called config list. When processing the contents of config_registry, configure creates a series of structures and list that allow for various mappings related to configuration families, subconfigs, and kernel sets. Two of those lists are built via subsitituion of umbrella families with their subconfig members, and one of those lists was improperly performing the subtitution in a way that would erroneously match on partial umbrella family names. That code was changed to match the code that was already doing the subtitution properly, via substitute_words(). - Added comments noting the importance of using substitute_words() in both instances.
Travis is building this PR now, but we have a new compilation error:
Maybe gcc 8 isn't new enough for the |
Confirmed. gcc 8 doesn't know about |
Details: - .travis.yml was previously using gcc 8, which did not support -march=znver2. The file has been updated to use gcc 9 everywhere gcc 8 was previously being used.
Uh, isn't the fix the check for a high enough version in the make defs? No need to change the Travis CI setup. |
Hmm, yeah, good point. We already have a |
@devinamatthews The fix should have worked, though. Instead, the |
Could be a transient network problem. I restarted it. |
I had already restarted |
Try downloading SDE from that link by hand. Maybe it has changed. |
This link? |
Whichever one is in do_sde.sh. |
At this point I don't know what else to do other than disable the SDE testing altogether. Related: why don't we nix the testing of |
Not that much time, but sure. |
Report it to Travis support. Something odd with the network because it works fine on my laptop. |
config/zen3/make_defs.mk
Outdated
#gcc or clang version must be atleast 4.0 | ||
# gcc 9.0 or later: | ||
ifeq ($(shell test $(GCC_VERSION) -ge 9; echo $$?),0) | ||
CKVECFLAGS += -march=znver2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that GCC 10.3 and newer support -march=znver3
but it's not selected.
Is that on purpose or an oversight?
https://gcc.gnu.org/gcc-10/changes.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good eye, @bartoldeman. I had noticed this, too, and have fixed it in my (as yet unpushed) pass of edits.
One thing that will help me get started on the Another question: can we distinguish the important AOCC versions through its clang versions (since AOCC is based on clang), or do you think we will need a separate set of |
Why not just check the compiler for support for various flags in |
I'm not sure how this is incompatible with what I've proposed. |
For example: has_znver3=no
if echo | $CC -march=znver3 -x c -c -o /dev/null -; then
has_znver3=yes
fi Then sub @HAS_ZNVER3@ or something in config.mk. |
And of course you want to add |
The conflict is that then you would not need any version flags at all. |
You wouldn't need any versions variables, true, but this solution still creates new I like your solution in principle since it narrowly identifies the key piece of information, which is: "Does the compiler support I could also imagine someone arguing that versions are still a bit more useful because in theory you could have a version of a compiler that thinks it supports a compiler flag, but that support is actually broken in practice. |
For AOCC (AMD clang) we have following thresholds: AOCC > 3.0.0 supports znver3. For clang I have limited information clang > 9.0.0 supports znver2 |
Me too like this idea, specifically when we are dealing with vanilla clang and AMD clang it will avoid many redundant checks/variables. |
Clang 12+ supports the |
Details: - Restructured clang and AOCC support for zen2/zen3 make_defs.mk files. The clang and AOCC version detection now happens in configure, not in the subconfigs' makefile fragments. That is, we've added logic to configure that detects the version of clang/AOCC, outputs an appropriate variable to config.mk (ie: CLANG_OT_*, AOCC_OT_*), and then checks for it within the makefile fragment (as is currently done for the GCC_OT_* variables). - Added configure support for a GCC_OT_10_1_0 variable (and associated substitution anchor) to communicate whether the gcc version is older than 10.1.0, and use this variable to check for recent enough versions of gcc to use -march=znver3. - Adjusted zen3 MC blocksizes for c and z datatypes to match that of zen2. - Disabled gemmsup for c and z datatypes in the zen3 subconfig by setting those sup thresholds to -1 given that we do not have any confirmed/tested sup kernels for the complex domain. - Inlined the contents of config/zen/amd_config.mk into the zen2 make_defs.mk so that the file is self-contained and immune to changes in other subconfigs. - Added indenting (with spaces) of GNU make conditionals for easier reading in zen2 and zen3 make_defs.mk files. - Comment updates. - Whitespace changes.
@dzambare Please take a look at 7872c3a -- in particular, the compiler detection logic in I also wanted to add CPUID files for Zen2 and Zen3 in |
Details: - Restructured clang and AOCC support in config/zen/make_defs.mk to bring that file into alignment with recent changes to the make_defs.mk of zen2 and zen3 subconfigs. - Added missing options (-mllvm -disable-licm-vrp) to the znver1 conditional case of aocc handling in zen2/make_defs.mk. These options were present in the amd_config.mk fragment that was being included in the previous version of zen2/make_defs.mk, but was accidentally omitted from the newer version introduced recently in 7872c3a.
@dzambare I made a minor change of note in b641cf7: I added I also rewrote the |
@dzambare Thanks Dipal. Devin provided me guidance on the CPUID |
Details: - Adjusted the range of models checked by bli_cpuid_is_zen() (which was previously 0x00 ~ 0xff and is now 0x00 ~ 0x2f) so that it is completely disjoint from the models checked by bli_cpuid_is_zen2() (0x30 ~ 0xff). This is normally necessary because Zen and Zen2 microarchitectures share the same family (23, or 0x17), and so the model code is the only way to differentiate the two. But in our case, fixing the model range for zen *wasn't* actually necessarily since we checked for zen2 first, and therefore the wide zen range acted like the 'else' of an 'if-else' statement. That said, the change helps improve clarity for the reader by encoding useful knowledge, which was obtained from https://en.wikichip.org/wiki/amd/cpuid . - Added a zen2.def file to the collection in travis/cpuid. Thanks to Devin Matthews for his guidance in hacking this file as a slight modification of zen.def. - Enabled testing of zen2 via the SDE in travis/do_sde.sh. - Comment updates to bli_cpuid.c.
Details: - Added a zen3.def file to the collection in travis/cpuid. Note that support for zen, zen2, and zen3 is now present, and while all the three microarchitectures have identical instruction sets from the perspective of BLIS, they each correspond to different subconfigs and therefore merit separate testing. - Enabled testing of zen3 via the SDE in travis/do_sde.sh. - Added comments to zen2.def to briefly explain how the file was created. (A similar comment is also present in zen3.def.)
Details: - Updated travis/do_sde.sh to grab the tarball for the latest version of the Intel SDE from the flame/ci-utils repository.
@dzambare Looks like the SDE testing for |
Hi @fgvanzee, I have tested the new makefiles for gcc 9, 10 , 11, aocc 2.0, 3.0 and 3.1 and clang 11. Everything is working as expected. Really appreciate your help in updating makefiles and SDE. |
You're welcome, @dzambare. Thank you for your help as well! Glad the new configuration logic works as intended. |
@bartoldeman This (rather significant) PR merge is done. 🙂 |
Details: - Added a new 'zen3' subconfiguration targeting support for the AMD Zen3 microarchitecture (flame#561). Thanks to AMD for this contribution. - Restructured clang and AOCC support for zen, zen2, and zen3 make_defs.mk files. The clang and AOCC version detection now happens in configure, not in the subconfigurations' makefile fragments. That is, we've added logic to configure that detects the version of clang/AOCC, outputs an appropriate variable to config.mk (ie: CLANG_OT_*, AOCC_OT_*), and then checks for it within the makefile fragment (as is currently done for the GCC_OT_* variables). - Added configure support for a GCC_OT_10_1_0 variable (and associated substitution anchor) to communicate whether the gcc version is older than 10.1.0, and use this variable to check for recent enough versions of gcc to use -march=znver3 in the zen3 subconfig. - Inlined the contents of config/zen/amd_config.mk into the zen and zen2 make_defs.mk so that the files are self-contained, harmonizing the format of all three Zen-based subconfigurations' make_defs.mk files. - Added indenting (with spaces) of GNU make conditionals for easier reading in zen, zen2, and zen3 make_defs.mk files. - Adjusted the range of models checked by bli_cpuid_is_zen() (which was previously 0x00 ~ 0xff and is now 0x00 ~ 0x2f) so that it is completely disjoint from the models checked by bli_cpuid_is_zen2() (0x30 ~ 0xff). This is normally necessary because Zen and Zen2 microarchitectures share the same family (23, or 0x17), and so the model code is the only way to differentiate the two. But in our case, fixing the model range for zen *wasn't* actually necessary since we checked for zen2 first, and therefore the wide zen range acted like the 'else' of an 'if-else' statement. That said, the change helps improve clarity for the reader by encoding useful knowledge, which was obtained from https://en.wikichip.org/wiki/amd/cpuid . - Added zen2.def and zen3.def files to the collection in travis/cpuid. Note that support for zen, zen2, and zen3 is now present, and while all the three microarchitectures have identical instruction sets from the perspective of BLIS microkernels, they each correspond to different subconfigurations and therefore merit separate testing. Thanks to Devin Matthews for his guidance in hacking these files as slight modifications of zen.def. - Enabled testing of zen2 and zen3 via the SDE in travis/do_sde.sh. Now, zen, zen2, and zen3 are tested through the SDE via Travis CI builds. - Updated travis/do_sde.sh to grab the SDE tarball from a new ci-utils repository on GitHub rather than on Intel's website. This change was made in an attempt to circumvent recent troubles with Travis CI not being able to download the SDE directly from Intel's website via curl. Thanks to Devin Matthews for suggesting the idea. - Updated travis/do_sde.sh to grab the latest version (8.69.1) of the Intel SDE from the flame/ci-utils repository. - Updated .travis.yml to use gcc 9. The file was previously using gcc 8, which did not support -march=znver2. - Created amd64_legacy umbrella family in config_registry for targeting older (bulldozer, piledriver, steamroller, and excavator) microarchitectures and moved those same subconfigs out of the amd64 umbrella family. However, x86_64 retains amd64_legacy as a constituent member. - Fixed a bug in configure related to the building of the so-called config list. When processing the contents of config_registry, configure creates a series of structures and lists that allow for various mappings related to configuration families, subconfigs, and kernel sets. Two of those lists are built via substitution of umbrella families with their subconfig members, and one of those lists was improperly performing the substitution in a way that would erroneously match on partial umbrella family names. That code was changed to match the code that was already doing the substitution properly, via substitute_words(). Also added comments noting the importance of using substitute_words() in both instances. - Comment updates.
Details: - Added a new 'zen3' subconfiguration targeting support for the AMD Zen3 microarchitecture (#561). Thanks to AMD for this contribution. - Restructured clang and AOCC support for zen, zen2, and zen3 make_defs.mk files. The clang and AOCC version detection now happens in configure, not in the subconfigurations' makefile fragments. That is, we've added logic to configure that detects the version of clang/AOCC, outputs an appropriate variable to config.mk (ie: CLANG_OT_*, AOCC_OT_*), and then checks for it within the makefile fragment (as is currently done for the GCC_OT_* variables). - Added configure support for a GCC_OT_10_1_0 variable (and associated substitution anchor) to communicate whether the gcc version is older than 10.1.0, and use this variable to check for recent enough versions of gcc to use -march=znver3 in the zen3 subconfig. - Inlined the contents of config/zen/amd_config.mk into the zen and zen2 make_defs.mk so that the files are self-contained, harmonizing the format of all three Zen-based subconfigurations' make_defs.mk files. - Added indenting (with spaces) of GNU make conditionals for easier reading in zen, zen2, and zen3 make_defs.mk files. - Adjusted the range of models checked by bli_cpuid_is_zen() (which was previously 0x00 ~ 0xff and is now 0x00 ~ 0x2f) so that it is completely disjoint from the models checked by bli_cpuid_is_zen2() (0x30 ~ 0xff). This is normally necessary because Zen and Zen2 microarchitectures share the same family (23, or 0x17), and so the model code is the only way to differentiate the two. But in our case, fixing the model range for zen *wasn't* actually necessary since we checked for zen2 first, and therefore the wide zen range acted like the 'else' of an 'if-else' statement. That said, the change helps improve clarity for the reader by encoding useful knowledge, which was obtained from https://en.wikichip.org/wiki/amd/cpuid . - Added zen2.def and zen3.def files to the collection in travis/cpuid. Note that support for zen, zen2, and zen3 is now present, and while all the three microarchitectures have identical instruction sets from the perspective of BLIS microkernels, they each correspond to different subconfigurations and therefore merit separate testing. Thanks to Devin Matthews for his guidance in hacking these files as slight modifications of zen.def. - Enabled testing of zen2 and zen3 via the SDE in travis/do_sde.sh. Now, zen, zen2, and zen3 are tested through the SDE via Travis CI builds. - Updated travis/do_sde.sh to grab the SDE tarball from a new ci-utils repository on GitHub rather than on Intel's website. This change was made in an attempt to circumvent recent troubles with Travis CI not being able to download the SDE directly from Intel's website via curl. Thanks to Devin Matthews for suggesting the idea. - Updated travis/do_sde.sh to grab the latest version (8.69.1) of the Intel SDE from the flame/ci-utils repository. - Updated .travis.yml to use gcc 9. The file was previously using gcc 8, which did not support -march=znver2. - Created amd64_legacy umbrella family in config_registry for targeting older (bulldozer, piledriver, steamroller, and excavator) microarchitectures and moved those same subconfigs out of the amd64 umbrella family. However, x86_64 retains amd64_legacy as a constituent member. - Fixed a bug in configure related to the building of the so-called config list. When processing the contents of config_registry, configure creates a series of structures and lists that allow for various mappings related to configuration families, subconfigs, and kernel sets. Two of those lists are built via substitution of umbrella families with their subconfig members, and one of those lists was improperly performing the substitution in a way that would erroneously match on partial umbrella family names. That code was changed to match the code that was already doing the substitution properly, via substitute_words(). Also added comments noting the importance of using substitute_words() in both instances. - Comment updates. - (cherry picked from commit 26e4b6b)
Added support for Zen3 Configuration.
-- Added new zen3 configuration and auto detection of zen3 architecture
-- Added configuration family amd64 for all zen architectures
-- Moved older AMD architecture in amd64_legacy family
-- Updated zen2 makefiles to pick znver2 for clang (if supported by the detected clang version)
AMD BLIS Upstream:
This PR includes following commits for AMD BLIS version 3.0.1
pick 9c7814d Added support for zen3 configuration
squash 536edc4 Added support for zen3 configuration
squash 23a2073 Added support for zen3 configuration
squash 449ee37 Added support for zen3 configuration
pick 25d23cd Zen3 support, disabled IR, JR loop parallelization
squash 9d7978e Zen3 support, disabled IR, JR loop parallelization
pick f8ab9f6 Enabled znver3 flag for zen3 architecture
squash 38a8008 Enabled znver3 flag for zen3 architecture
pick b1144a8 Added -fomit-frame-pointer option to CKOPTFLAGS.
pick ce99b1e Added dynamic block size selection logic for DGEMM.
squash f9d06c7 Added dynamic block size selection logic for DGEMM.
pick 14e2160 Update amd64 bundle configuration
squash c2f63fc Update amd64 bundle configuration