-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add openmp support to System z #66081
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Do we have build bolt set up for it?
Not yet. I'm running the other SystemZ build bots, I'll see if I can add an OpenMP bot as well. |
Let me know when this is merged, we are eager to test it on the onnx-mlir / zDLC compiler effort to optimize DNN for the Z platform using multi-threading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I had a brief look and besides some typos I haven't spotted anything. __kmp_invoke_microtask()
is the most involved piece, and it looks correct to me.
I'm not sure what went wrong with the checks after the last update. I had fetched main and rebased:
|
Hmm, as far as I can see, |
Yes, I had done |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Thanks. Fixed. Do you want me to squash my commits before I push again? |
I'm slowly going through the failures in the openmp testsuite (the llvm testsuite is now green). So far I have a couple of fixups that I plan to convert to proper patches later:
and there are more to come (e.g., we need to support backchain unwinding via |
I now have two fixups for the existing "Add openmp support to System z" commit; please consider making an update. The number of failures that I'm seeing locally is down from 45 to 24. First, apparently there is an additional possiblity for
Second:
The first part fixes unwinding (GDB could not unwind past this function). The second part fixes several tests; the problem is that |
For the
The second change looks correct to me. |
I have made the changes but will hold off squashing/pushing until the rest of the changes are ready. |
The backchain PR was merged: #69405 One more fixup, which gets the number of failures down to 20:
It's somewhat hacky, so I'm open to suggestions on how to improve this. This diff adds /proc/cpuinfo parsing support. The problem is that openmp thinks that topology has 3 levels: threads, cores and sockets (https://www.openmp.org/spec-html/5.0/openmpse53.html). On IBM Z we have two more levels: books and drawers. This change works around this mismatch by combining socket, book and drawer into one value. In theory this all can be skipped altogether, so that |
How does the GNU OpenMP library handle this? If there are choices to be made, it would be best to remain compatible ... |
It looks similar, yes. I've decided to include the assembly snippet, because from these textual descriptions it's somewhat hard to fully understand what is going on.
They seem to sidestep the problem by parsing only core sibling lists ( |
FWIW, it is fine (from OpenMP's perspective) to ignore certain features if the hardware/software don't support. For example, we don't have affinity control on macOS. |
Having authored the original proposal on OMP affinity, there were too many different layouts/options to easily coalesce all possible machine hardware into a simple enough abstraction. In using a single number, number that are close are supposed to be "nearer" than others that are far. There can be discontinuities (e.g. CPUs 0-7 are on one socket, CPU #8 is "near" CPU 7 but will be on a different socket), this is unavoidable. However, affinity has to be understood in terms of "places". If it is important to have socket affinity, then you may place threads assigned to CPU 0..7 to one OMP place, and threads assigned to CPU 8..15 to a different OMP place. Alternatively, if you want affinity by books or drawer, you may place OMP threads in Places that represent books or drawers. When initiating a new parallel workspace/region/loop, one may indicate if we want threads to be colocated in the same place, in nearby places, or spread among places. Hope this helps putting this feature in context. |
Totally agree; and implementations for a custom arch can be refined over time to provide additional functionality as it is being ready to be utilized or added. |
What are the remaining impediments in merging this PR? |
There are still 20 testsuite failures, which I am investigating. Some are caused by flaky archer tests (where running on IBM Z somehow amplifies their flakiness), but there is at least one endianness issue that I'm still looking into, caused by The idea is to clean up the testsuite and almost immediately enable the IBM Z buildbot. |
I've posted the #69995 and #69982 PRs. With these PRs and with the fixups I posted in the comments here, I get only archer failures, which we can IMO xfail. Just in case, here is my working branch: https://github.com/iii-i/llvm-project/tree/systemz-openmp. |
Just to confirm I have updated my tree with the patches in your comments and just need to squash and push. |
The PRs were merged, thanks @shiltian for the reviews! I went over my work branch, and noticed two more things that would be good to have in this PR. The following makes unwinding work in testcases; we already have similar definitions for sanitizers:
The second one is more questionable:
In order for unwinding to work, all the code involved in the call chain must store backchain on the stack. In 99% of the cases this is test code, which is covered by the first diff. However, in one particular case there is library code involved. Building the whole library with backchain support would be an overkill, hence this solution. I understand that I'm proposing to add code that is relevant only to testing to the main logic, so I'm open to different approaches here. Edit: I noticed I forgot one minor test fix, sent as #70075. |
So just to confirm, the latest LLVM tree has the fixes? |
Yes, except for the very minor one (70075). As far as I'm concerned, we can go ahead without it. There will still be a small gap before we enable the buildbot. |
This should be done as a stand-alone PR, I think there'll need to be some additional discussion. In particular, we should make sure that the atomic expand pass doesn't introduce any other changes we don't want to see - we'll have to review the default settings of the various See also this (still pending) patch about handling 128-bit atomics: https://reviews.llvm.org/D146425 |
I looked at the current llvm So it's not merged yet? Or was it merged and then pulled?
|
No, it's not merged yet. We're still working through getting all pieces in place to ensure a fully clean test suite. At this point, the remaining issues are atomic fadd/fsub support (#70398), and then collecting the various changes described in the above comments and merging them into this PR. |
#70398 is now merged; I put the fixups I propose for this PR into https://github.com/iii-i/llvm-project/tree/libomp-s390x. |
Thanks, @iii-i ! @nealef , can you merge those changes here so we can then merge this PR as a whole? Thanks! |
I'm still a little confused as to what should go in. Were any of the above in a separate PR? Here's what I have to commit at the moment. It doesn't have the frame pointer suggestions because I couldn't work out if they were going in separately.
What's missing or what should be excluded? |
What's missing, and what I would also squash here, are: |
Sigh, forgot to run the format checker. |
Not sure what the format check failure is about. |
It says |
I manually ran
where the two diff numbers are (1) my commit and (2) the next commit and it came back clean. So I am not sure (a) why the commit numbers are different to mine and (b) why it failed. I'll rebase again. |
Here is the result of |
? - there was no output? |
It removed trailing whitespace on lines 1743 and 1744. |
* openmp/README.rst - Add s390x to those platforms supported * openmp/libomptarget/plugins-nextgen/CMakeLists.txt - Add s390x subdirectory * openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt - Add s390x definitions * openmp/runtime/CMakeLists.txt - Add s390x to those platforms supported * openmp/runtime/cmake/LibompGetArchitecture.cmake - Define s390x ARCHITECTURE * openmp/runtime/cmake/LibompMicroTests.cmake - Add dependencies for System z (aka s390x) * openmp/runtime/cmake/LibompUtils.cmake - Add S390X to the mix * openmp/runtime/cmake/config-ix.cmake - Add s390x as a supported LIPOMP_ARCH * openmp/runtime/src/kmp_affinity.h - Define __NR_sched_[get|set]addinity for s390x * openmp/runtime/src/kmp_config.h.cmake - Define CACHE_LINE for s390x * openmp/runtime/src/kmp_os.h - Add KMP_ARCH_S390X to support checks * openmp/runtime/src/kmp_platform.h - Define KMP_ARCH_S390X * openmp/runtime/src/kmp_runtime.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/kmp_tasking.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h - Define ITT_ARCH_S390X * openmp/runtime/src/z_Linux_asm.S - Instantiate __kmp_invoke_microtask for s390x * openmp/runtime/src/z_Linux_util.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/test/ompt/callback.h - Define print_possible_return_addresses for s390x * openmp/runtime/tools/lib/Platform.pm - Return s390x as platform and host architecture * openmp/runtime/tools/lib/Uname.pm - Set hardware platform value for s390x * openmp/runtime/src/kmp_affinity.cpp - Implement s390x /proc/cpuinfo parsing * openmp/runtime/src/kmp_tasking.cpp - Add backchain attribute to __km_invoke_task - Style fix * openmp/runtime/src/z_Linux_asm.S - Add unwind information * openmp/runtime/test/lit.cfg - Build openmp tests with -mbackchain * openmp/runtime/test/ompt/callback.h - Additional possibility for print_possible_return_addresses()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @shiltian , does this still look good to you after the latest set of changes? From a SystemZ perspective this looks ready to be merged now. (After it is merged, I'll enable openmp testing in the SystemZ build bots as well.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. The changes look good to me, assuming they have been tested internally. For the "missing" features we can always refine them gradually.
Opened a PR to add an OpenMP builder for s390x here: llvm/llvm-zorg#67 |
##===----------------------------------------------------------------------===## | ||
|
||
if(CMAKE_SYSTEM_NAME MATCHES "Linux") | ||
build_generic_elf64("SystemZ" "S390X" "s390x" "s390x-ibm-linux-gnu" "22") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to build this on Fedora and CMAKE_SYSTEM_PROCESSOR is set to s390x and not SystemZ, so the plugin fails to build. What operating system were you testing this on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alma Linux 8 I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also see
-- LIBOMPTARGET: Not building S390X NextGen offloading plugin: machine not found in the system.
@nealef can you see what we need to do to fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does CMAKE_SYSTEM_NAME get set to anything but Linux for a z build?
if(CMAKE_SYSTEM_NAME MATCHES "Linux")
build_generic_elf64("SystemZ" "S390X" "s390x" "s390x-ibm-linux-gnu" "22")
else()
libomptarget_say("Not building s390x NextGen offloading plugin: machine not found in the system.")
endif()
That's the only way we can get this message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's a different message (note the "S390X" instead of "s390x" in the message text). This message seems to originate from the implementation of build_generic_elf64
:
macro(build_generic_elf64 tmachine tmachine_name tmachine_libname tmachine_triple elf_machine_id)
if(CMAKE_SYSTEM_PROCESSOR MATCHES "${tmachine}$")
[...]
else()
libomptarget_say("Not building ${tmachine_name} NextGen offloading plugin: machine not found in the system.")
endif()
This is testing CMAKE_SYSTEM_PROCESSOR
against SystemZ
, but the macro is set to s390x
as Tom said.
If I change the build_generic_elf64
invocation to use s390x
, an attempt to build the plugin is made, but it fails:
-- LIBOMPTARGET: Building s390x plugin linked with libffi
[...]
/home/uweigand/llvm/llvm-head/openmp/libomptarget/plugins-nextgen/generic-elf-64bit/src/rtl.cpp:409:20: error: no member named 's390x' in 'llvm::Triple'
409 | return Triple::LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE;
| ~~~~~~~~^
<command line>:2:52: note: expanded from macro 'LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE'
2 | #define LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE s390x
| ^
So there appears to be some mismatch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Uli. I was on vacation the last couple of weeks so didn't have a chance to look at it. However, judging by your changes I would've been asking you a lot of questions anyway ;-). |
I've finally merged all needed PRs, so the plugin now builds on SystemZ. |
openmp/README.rst
openmp/libomptarget/plugins-nextgen/CMakeLists.txt
openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
openmp/runtime/CMakeLists.txt
openmp/runtime/cmake/LibompGetArchitecture.cmake
openmp/runtime/cmake/LibompMicroTests.cmake
openmp/runtime/cmake/LibompUtils.cmake
openmp/runtime/cmake/config-ix.cmake
openmp/runtime/src/kmp_affinity.h
openmp/runtime/src/kmp_config.h.cmake
openmp/runtime/src/kmp_os.h
openmp/runtime/src/kmp_platform.h
openmp/runtime/src/kmp_runtime.cpp
openmp/runtime/src/kmp_tasking.cpp
openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
openmp/runtime/src/z_Linux_asm.S
openmp/runtime/src/z_Linux_util.cpp
openmp/runtime/test/ompt/callback.h
openmp/runtime/tools/lib/Platform.pm
openmp/runtime/tools/lib/Uname.pm