Add openmp support to System z #66081

Merged
merged 2 commits into from
Nov 3, 2023

nealef
Contributor

@nealef nealef commented Sep 12, 2023

  • openmp/README.rst

    • Add s390x to those platforms supported
  • openmp/libomptarget/plugins-nextgen/CMakeLists.txt

    • Add s390x subdirectory
  • openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt

    • Add s390x definitions
  • openmp/runtime/CMakeLists.txt

    • Add s390x to those platforms supported
  • openmp/runtime/cmake/LibompGetArchitecture.cmake

    • Define s390x ARCHITECTURE
  • openmp/runtime/cmake/LibompMicroTests.cmake

    • Add dependencies for System z (aka s390x)
  • openmp/runtime/cmake/LibompUtils.cmake

    • Add S390X to the mix
  • openmp/runtime/cmake/config-ix.cmake

    • Add s390x as a supported LIBOMP_ARCH
  • openmp/runtime/src/kmp_affinity.h

    • Define __NR_sched_[get|set]affinity for s390x
  • openmp/runtime/src/kmp_config.h.cmake

    • Define CACHE_LINE for s390x
  • openmp/runtime/src/kmp_os.h

    • Add KMP_ARCH_S390X to support checks
  • openmp/runtime/src/kmp_platform.h

    • Define KMP_ARCH_S390X
  • openmp/runtime/src/kmp_runtime.cpp

    • Generate code when KMP_ARCH_S390X is defined
  • openmp/runtime/src/kmp_tasking.cpp

    • Generate code when KMP_ARCH_S390X is defined
  • openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h

    • Define ITT_ARCH_S390X
  • openmp/runtime/src/z_Linux_asm.S

    • Instantiate __kmp_invoke_microtask for s390x
  • openmp/runtime/src/z_Linux_util.cpp

    • Generate code when KMP_ARCH_S390X is defined
  • openmp/runtime/test/ompt/callback.h

    • Define print_possible_return_addresses for s390x
  • openmp/runtime/tools/lib/Platform.pm

    • Return s390x as platform and host architecture
  • openmp/runtime/tools/lib/Uname.pm

    • Set hardware platform value for s390x
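For readers unfamiliar with how the runtime keys off the target, here is a minimal sketch of the general pattern. It is an assumption that it mirrors, but is not copied from, kmp_platform.h and kmp_config.h.cmake: SKETCH_ARCH_S390X and SKETCH_CACHE_LINE are hypothetical stand-ins for KMP_ARCH_S390X and CACHE_LINE, detection relies on the compiler predefining __s390x__, and the 256-byte s390x line size and 64-byte fallback are assumptions of this sketch.

```c
#include <stdalign.h>

/* Hypothetical stand-ins for KMP_ARCH_S390X and CACHE_LINE. */
#if defined(__s390x__)
#define SKETCH_ARCH_S390X 1
#define SKETCH_CACHE_LINE 256 /* assumed s390x cache line size */
#else
#define SKETCH_ARCH_S390X 0
#define SKETCH_CACHE_LINE 64 /* assumed fallback for this sketch */
#endif

/* Per-thread counter aligned (and therefore padded) to a full cache line,
   so that counters of neighbouring threads never share a line. */
struct sketch_padded_counter {
  alignas(SKETCH_CACHE_LINE) unsigned long value;
};

/* Reports whether this translation unit was compiled for s390x. */
static int sketch_is_s390x(void) { return SKETCH_ARCH_S390X; }
```

Getting the line size wrong on s390x would not break correctness, but padding to 64 bytes on a 256-byte-line machine would leave per-thread data exposed to false sharing.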

@nealef
Contributor Author

nealef commented Sep 13, 2023

@uweigand

@shiltian shiltian requested a review from a team September 13, 2023 14:15
Contributor

@shiltian shiltian left a comment

This looks good to me. Do we have a buildbot set up for it?

@uweigand
Member

This looks good to me. Do we have a buildbot set up for it?

Not yet. I'm running the other SystemZ build bots, I'll see if I can add an OpenMP bot as well.

@AlexandreEichenberger
Contributor

Let me know when this is merged; we are eager to test it in the onnx-mlir / zDLC compiler effort to optimize DNNs for the Z platform using multi-threading.

Member

@iii-i iii-i left a comment

Thanks for working on this! I had a brief look and besides some typos I haven't spotted anything. __kmp_invoke_microtask() is the most involved piece, and it looks correct to me.

openmp/runtime/CMakeLists.txt (outdated)
openmp/runtime/cmake/config-ix.cmake (outdated)
openmp/runtime/src/kmp_runtime.cpp (outdated)
openmp/runtime/test/ompt/callback.h (outdated)
openmp/runtime/src/z_Linux_asm.S (outdated)
@nealef
Contributor Author

nealef commented Oct 11, 2023

I'm not sure what went wrong with the checks after the last update. I had fetched main and rebased:

$ git fetch origin main
remote: Enumerating objects: 41698, done.
remote: Counting objects: 100% (23288/23288), done.
remote: Compressing objects: 100% (125/125), done.
remote: Total 41698 (delta 23186), reused 23209 (delta 23163), pack-reused 18410
Receiving objects: 100% (41698/41698), 71.51 MiB | 18.05 MiB/s, done.
Resolving deltas: 100% (32726/32726), completed with 6323 local objects.
From https://github.com/llvm/llvm-project
 * branch                      main       -> FETCH_HEAD
   d671126ad097..8301e485001b  main       -> origin/main
: 
$ git rebase origin main
Successfully rebased and updated refs/heads/main.

@iii-i
Member

iii-i commented Oct 11, 2023

Hmm, as far as I can see, nealef:libomp-s390x is still based on a month-old commit. I tried running tests, and there is one failure caused by #65483. Rebasing on top of the latest main makes it go away.

@nealef
Contributor Author

nealef commented Oct 11, 2023

Yes, I had run git rebase origin main, so it was rebasing my local main instead of the branch with my changes. I've rebased my branch, and the checks are underway again.

@github-actions

github-actions bot commented Oct 11, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@nealef
Contributor Author

nealef commented Oct 11, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

Thanks. Fixed. Do you want me to squash my commits before I push again?

@iii-i
Member

iii-i commented Oct 16, 2023

I'm slowly going through the failures in the openmp testsuite (the llvm testsuite is now green). So far I have a couple of fixups that I plan to convert to proper patches later:

diff --git a/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp b/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
index 7a3a2a7e9013..6abb203f0c36 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
@@ -226,6 +226,8 @@ void SystemZPassConfig::addIRPasses() {
     addPass(createLoopDataPrefetchPass());
   }
 
+  addPass(createAtomicExpandPass());
+
   TargetPassConfig::addIRPasses();
 }
 
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index 339e4ca4be6b..4397565e8f47 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2456,12 +2456,21 @@ typedef struct kmp_depend_info {
   union {
     kmp_uint8 flag; // flag as an unsigned char
     struct { // flag as a set of 8 bits
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+      unsigned all : 1;
+      unsigned unused : 3;
+      unsigned set : 1;
+      unsigned mtx : 1;
+      unsigned out : 1;
+      unsigned in : 1;
+#else
       unsigned in : 1;
       unsigned out : 1;
       unsigned mtx : 1;
       unsigned set : 1;
       unsigned unused : 3;
       unsigned all : 1;
+#endif
     } flags;
   };
 } kmp_depend_info_t;

and there are more to come (e.g., we need to support backchain unwinding via __builtin_frame_address() in order to make some of the tests happy). I'll post the updates here.
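A standalone sketch of why the bitfield order is flipped. It assumes GCC/Clang bitfield allocation (which C leaves implementation-defined) and assumes that compiler-generated code writes the dependence flags as a plain byte with in = 0x01, out = 0x02, mtx = 0x04, set = 0x08 and all = 0x80. The type and function names are hypothetical. On little-endian targets bitfields are allocated starting at the least significant bit, on big-endian targets such as s390x starting at the most significant bit, so reversing the declaration order keeps `in` at bit 0 of the flag byte in both cases.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of the flag union in kmp_depend_info. */
typedef union {
  uint8_t flag; /* byte view, as written by compiler-generated code */
  struct {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    unsigned all : 1; /* first declared = most significant bit */
    unsigned unused : 3;
    unsigned set : 1;
    unsigned mtx : 1;
    unsigned out : 1;
    unsigned in : 1;
#else
    unsigned in : 1; /* first declared = least significant bit */
    unsigned out : 1;
    unsigned mtx : 1;
    unsigned set : 1;
    unsigned unused : 3;
    unsigned all : 1;
#endif
  } flags;
} sketch_depend_flags;

/* Sets bits through the bitfield view and returns the byte view. */
static uint8_t sketch_flag_byte(int in, int out, int mtx, int set, int all) {
  sketch_depend_flags f;
  memset(&f, 0, sizeof f);
  f.flags.in = in != 0;
  f.flags.out = out != 0;
  f.flags.mtx = mtx != 0;
  f.flags.set = set != 0;
  f.flags.all = all != 0;
  return f.flag;
}
```

Without the `#if`, a big-endian build would place `in` at 0x80, so the runtime would misread every dependence kind the compiler encodes.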

@iii-i
Member

iii-i commented Oct 18, 2023

I now have two fixups for the existing "Add openmp support to System z" commit; please consider making an update. The number of failures that I'm seeing locally is down from 45 to 24.

First, there is apparently an additional possibility for print_possible_return_addresses():

--- a/openmp/runtime/test/ompt/callback.h
+++ b/openmp/runtime/test/ompt/callback.h
@@ -232,10 +232,18 @@ ompt_label_##id:
 // On s390x the NOP instruction is 2 bytes long. For non-void runtime
 // functions Clang inserts a STY instruction (but only if compiling under
 // -fno-PIC which will be the default with Clang 8.0, another 6 bytes).
+//
+// Another possibility is:
+//
+//                brasl %r14,__kmpc_end_master@plt
+//   a7 f4 00 02  j 0f
+//   47 00 00 00  0: nop
+//   a7 f4 00 02  j addr
+//                addr:
 #define print_possible_return_addresses(addr)                                  \
-  printf("%" PRIu64 ": current_address=%p or %p\n",                            \
+  printf("%" PRIu64 ": current_address=%p or %p or %p\n",                      \
          ompt_get_thread_data()->value, ((char *)addr) - 2,                    \
-         ((char *)addr) - 8)
+         ((char *)addr) - 8, ((char *)addr) - 12)
 #else
 #error Unsupported target architecture, cannot determine address offset!
 #endif
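The candidate addresses the macro prints reduce to plain pointer arithmetic. The helper below is hypothetical, not part of callback.h. Its offsets come from the comment in the diff above: 2 bytes for the NOP alone, 8 bytes for the NOP plus a 6-byte STY, and 12 bytes for the new j/nop/j sequence (three 4-byte instructions).

```c
#include <stddef.h>

/* Byte distances between the label address passed to
   print_possible_return_addresses() and each candidate return address
   on s390x (values taken from the callback.h comments). */
static const ptrdiff_t sketch_s390x_ra_offsets[3] = {
    2,  /* 2-byte NOP after the call */
    8,  /* NOP plus a 6-byte STY for non-void runtime functions */
    12, /* "j 0f", "0: nop" and "j addr", 4 bytes each */
};

/* Hypothetical helper: fills out[] with the three candidate addresses. */
static void sketch_candidate_return_addresses(const void *addr,
                                              const void *out[3]) {
  for (size_t i = 0; i < 3; i++)
    out[i] = (const char *)addr - sketch_s390x_ra_offsets[i];
}
```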

Second:

--- a/openmp/runtime/src/z_Linux_asm.S
+++ b/openmp/runtime/src/z_Linux_asm.S
@@ -2310,9 +2310,18 @@ __kmp_invoke_microtask:
        .cfi_startproc
 
        stmg    %r6,%r14,48(%r15)
+       .cfi_offset %r6, -112
+       .cfi_offset %r7, -104
+       .cfi_offset %r8, -96
+       .cfi_offset %r9, -88
+       .cfi_offset %r10, -80
+       .cfi_offset %r11, -72
+       .cfi_offset %r12, -64
+       .cfi_offset %r13, -56
+       .cfi_offset %r14, -48
+       .cfi_offset %r15, -40
        lgr     %r11,%r15
-       .cfi_def_cfa    %r15, 0
-       .cfi_offset     %r15, 0
+       .cfi_def_cfa %r11, 160
 
        // Compute the dynamic stack size:
        //
@@ -2342,7 +2351,7 @@ __kmp_invoke_microtask:
 #if OMPT_SUPPORT
        // Save frame pointer into exit_frame
        lg      %r8,160(%r11)
-       stg     %r15,0(%r8)
+       stg     %r11,0(%r8)
 #endif
 
        // Prepare arguments for the pkfn function (first 5 using r2-r6 registers)

The first part fixes unwinding (GDB could not unwind past this function). The second part fixes several tests; the problem is that OMPT_GET_FRAME_ADDRESS(0) is supposed to return the value of the stack pointer at entry, and not the current one.
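The .cfi_offset values follow mechanically from the s390x frame layout. A small sketch, assuming the standard s390x ELF ABI: on entry the CFA is %r15 + 160, and the caller-provided register save area holds %rN at 8*N(%r15), so %r6 stored at 48(%r15) sits at CFA - 112. Because %r11 holds a copy of the entry %r15, `.cfi_def_cfa %r11, 160` describes the same CFA, and storing %r11 into exit_frame records the stack pointer at entry. The names below are hypothetical.

```c
/* Assumed s390x ELF ABI layout: CFA = stack pointer at entry + 160, and
   general register rN is saved at offset 8 * N from that stack pointer. */
enum { SKETCH_S390X_CFA_BIAS = 160, SKETCH_S390X_GPR_SLOT = 8 };

/* Returns the offset from the CFA at which %r<regno> is saved. */
static int sketch_s390x_cfi_offset(int regno) {
  return SKETCH_S390X_GPR_SLOT * regno - SKETCH_S390X_CFA_BIAS;
}
```

Evaluating this for %r6 through %r15 reproduces the ten .cfi_offset directives in the diff.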

@uweigand
Member

For the print_possible_return_addresses case, is this the same issue ("additional jump") called out for LOONGARCH64 as well?

// On LoongArch64 the NOP instruction is 4 bytes long, can be followed by
// inserted jump instruction (another 4 bytes long). And an additional jump
// instruction may appear (adding 4 more bytes) when the NOP is referenced
// elsewhere (ie. another branch).

The second change looks correct to me.

@nealef
Contributor Author

nealef commented Oct 18, 2023

I have made the changes but will hold off squashing/pushing until the rest of the changes are ready.

@iii-i
Member

iii-i commented Oct 18, 2023

The backchain PR was merged: #69405

One more fixup, which gets the number of failures down to 20:

diff --git a/openmp/runtime/src/kmp_affinity.cpp b/openmp/runtime/src/kmp_affinity.cpp
index 20c1c610b915..1686de7eaafd 100644
--- a/openmp/runtime/src/kmp_affinity.cpp
+++ b/openmp/runtime/src/kmp_affinity.cpp
@@ -2990,6 +2990,9 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line,
 
   unsigned num_avail = 0;
   *line = 0;
+#if KMP_ARCH_S390X
+  bool reading_s390x_sys_info = true;
+#endif
   while (!feof(f)) {
     // Create an inner scoping level, so that all the goto targets at the end of
     // the loop appear in an outer scoping level. This avoids warnings about
@@ -3036,7 +3039,21 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line,
         continue;
 #endif
 
+#if KMP_ARCH_S390X
+      // s390x /proc/cpuinfo starts with a variable number of lines containing
+      // the overall system information. Skip them.
+      if (reading_s390x_sys_info) {
+        if (*buf == '\n')
+          reading_s390x_sys_info = false;
+        continue;
+      }
+#endif
+
+#if KMP_ARCH_S390X
+      char s1[] = "cpu number";
+#else
       char s1[] = "processor";
+#endif
       if (strncmp(buf, s1, sizeof(s1) - 1) == 0) {
         CHECK_LINE;
         char *p = strchr(buf + sizeof(s1) - 1, ':');
@@ -3062,6 +3079,23 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line,
             threadInfo[num_avail][osIdIndex]);
         __kmp_read_from_file(path, "%u", &threadInfo[num_avail][pkgIdIndex]);
 
+#if KMP_ARCH_S390X
+        // Disambiguate physical_package_id.
+        unsigned book_id;
+        KMP_SNPRINTF(path, sizeof(path),
+                     "/sys/devices/system/cpu/cpu%u/topology/book_id",
+                     threadInfo[num_avail][osIdIndex]);
+        __kmp_read_from_file(path, "%u", &book_id);
+        threadInfo[num_avail][pkgIdIndex] |= (book_id << 8);
+
+        unsigned drawer_id;
+        KMP_SNPRINTF(path, sizeof(path),
+                     "/sys/devices/system/cpu/cpu%u/topology/drawer_id",
+                     threadInfo[num_avail][osIdIndex]);
+        __kmp_read_from_file(path, "%u", &drawer_id);
+        threadInfo[num_avail][pkgIdIndex] |= (drawer_id << 16);
+#endif
+
         KMP_SNPRINTF(path, sizeof(path),
                      "/sys/devices/system/cpu/cpu%u/topology/core_id",
                      threadInfo[num_avail][osIdIndex]);

It's somewhat hacky, so I'm open to suggestions on how to improve it. This diff adds /proc/cpuinfo parsing support. The problem is that OpenMP assumes the topology has 3 levels: threads, cores and sockets (https://www.openmp.org/spec-html/5.0/openmpse53.html). On IBM Z we have two more levels: books and drawers. This change works around the mismatch by combining socket, book and drawer into one value.

In theory, all of this could be skipped, so that __kmp_affinity_create_flat_map() would create a simplistic view of the system topology based on the number of CPUs alone. The problem with that approach is that the testcases expect the topology to be consistent with /sys/devices/system/cpu, which the flat view isn't.
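A self-contained sketch of both parts of the workaround, using hypothetical helper names. The first helper skips the system-information header of s390x /proc/cpuinfo up to the first empty line, then collects the IDs from "cpu number" lines. The second folds socket, book and drawer IDs into one package ID with the same shifts as the diff. As written, the packing assumes the socket and book IDs each fit in 8 bits; larger values would collide.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: packs socket, book and drawer IDs into one package
   ID using the shifts from the diff (book << 8, drawer << 16). */
static unsigned sketch_pack_package_id(unsigned socket, unsigned book,
                                       unsigned drawer) {
  return socket | (book << 8) | (drawer << 16);
}

/* Hypothetical helper: stores up to max_ids "cpu number" values found in
   s390x-style /proc/cpuinfo text, ignoring the leading system-information
   block that ends at the first empty line. Returns the count stored. */
static int sketch_parse_cpu_numbers(const char *text, unsigned *ids,
                                    int max_ids) {
  static const char key[] = "cpu number";
  int in_sys_info = 1, count = 0;
  const char *line = text;
  while (*line != '\0') {
    const char *nl = strchr(line, '\n');
    size_t len = nl ? (size_t)(nl - line) : strlen(line);
    if (in_sys_info) {
      if (len == 0)
        in_sys_info = 0; /* empty line: per-CPU sections follow */
    } else if (len >= sizeof(key) - 1 &&
               strncmp(line, key, sizeof(key) - 1) == 0 && count < max_ids) {
      const char *colon = memchr(line, ':', len);
      if (colon != NULL)
        ids[count++] = (unsigned)strtoul(colon + 1, NULL, 10);
    }
    line = nl ? nl + 1 : line + len; /* advance to the next line */
  }
  return count;
}
```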

@uweigand
Member

On IBM Z we have two more levels: books and drawers. This change works around this mismatch by combining socket, book and drawer into one value.

How does the GNU OpenMP library handle this? If there are choices to be made, it would be best to remain compatible ...

@iii-i
Member

iii-i commented Oct 19, 2023

For the print_possible_return_addresses case, is this the same issue ("additional jump") called out for LOONGARCH64 as well?

It looks similar, yes. I've decided to include the assembly snippet, because from these textual descriptions it's somewhat hard to fully understand what is going on.

How does the GNU OpenMP library handle this? If there are choices to be made, it would be best to remain compatible ...

They seem to sidestep the problem by parsing only core sibling lists (gomp_affinity_init_level_1()). Looking a bit closer, I don't think that the spec defines a way to directly refer to the socket number - this artificial number that I'm making up for disambiguation. So it's probably okay.

@shiltian
Contributor

FWIW, it is fine (from OpenMP's perspective) to ignore certain features if the hardware/software doesn't support them. For example, we don't have affinity control on macOS.

@AlexandreEichenberger
Contributor

On IBM Z we have two more levels: books and drawers. This change works around this mismatch by combining socket, book and drawer into one value.

As the author of the original proposal on OMP affinity, I can say there were too many different layouts/options to easily coalesce all possible machine hardware into a simple enough abstraction. With a single number, numbers that are close are supposed to be "nearer" than numbers that are far apart. There can be discontinuities (e.g., CPUs 0-7 are on one socket, so CPU #8 is "near" CPU 7 but on a different socket); this is unavoidable.

However, affinity has to be understood in terms of "places". If socket affinity is important, you may group CPUs 0..7 into one OMP place and CPUs 8..15 into a different OMP place. Alternatively, if you want affinity by books or drawers, you may define OMP places that represent books or drawers.

When initiating a new parallel workspace/region/loop, one may indicate whether threads should be colocated in the same place, in nearby places, or spread among places.

Hope this helps putting this feature in context.
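To make the places idea concrete, the hypothetical helper below builds an OMP_PLACES value that groups consecutive CPU IDs into places of a given size, using the interval notation {lower-bound:length}. With 16 CPUs and 8 CPUs per socket (illustrative numbers) it produces {0:8},{8:8}. Grouping by book or drawer would only change the group size, assuming CPU IDs are numbered contiguously within each book or drawer.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes an OMP_PLACES string that groups CPU IDs
   0..num_cpus-1 into places of group_size consecutive CPUs (the last place
   may be shorter). Returns the string length, or -1 on bad input or if
   buf is too small. */
static int sketch_make_places(char *buf, size_t bufsize, int num_cpus,
                              int group_size) {
  size_t used = 0;
  if (bufsize == 0 || group_size <= 0)
    return -1;
  buf[0] = '\0';
  for (int lower = 0; lower < num_cpus; lower += group_size) {
    int len = (num_cpus - lower < group_size) ? num_cpus - lower : group_size;
    int n = snprintf(buf + used, bufsize - used, "%s{%d:%d}",
                     lower > 0 ? "," : "", lower, len);
    if (n < 0 || (size_t)n >= bufsize - used)
      return -1; /* output would be truncated */
    used += (size_t)n;
  }
  return (int)used;
}
```

The resulting string would be exported as OMP_PLACES, with OMP_PROC_BIND=close or OMP_PROC_BIND=spread selecting colocated or spread placement of threads across those places.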

@AlexandreEichenberger
Contributor

AlexandreEichenberger commented Oct 19, 2023

FWIW, it is fine (from OpenMP's perspective) to ignore certain features if the hardware/software doesn't support them. For example, we don't have affinity control on macOS.

Totally agree; implementations for a custom arch can be refined over time to provide additional functionality as it becomes ready to be used or added.

@AlexandreEichenberger
Contributor

AlexandreEichenberger commented Oct 20, 2023

What are the remaining impediments in merging this PR?

@iii-i
Member

iii-i commented Oct 22, 2023

There are still 20 testsuite failures, which I am investigating. Some are caused by flaky archer tests (running on IBM Z somehow amplifies their flakiness), but there is at least one endianness issue, caused by a mismatch between %struct.DEP and struct kmp_depend_info, that I'm still looking into, and a few more I haven't investigated yet.

The idea is to clean up the testsuite and almost immediately enable the IBM Z buildbot.

@iii-i
Member

iii-i commented Oct 24, 2023

I've posted the #69995 and #69982 PRs. With these PRs and with the fixups I posted in the comments here, I get only archer failures, which we can IMO xfail. Just in case, here is my working branch: https://github.com/iii-i/llvm-project/tree/systemz-openmp.

@nealef
Contributor Author

nealef commented Oct 24, 2023

Just to confirm: I have updated my tree with the patches from your comments and just need to squash and push.

@iii-i
Member

iii-i commented Oct 24, 2023

The PRs were merged, thanks @shiltian for the reviews!

I went over my work branch, and noticed two more things that would be good to have in this PR. The following makes unwinding work in testcases; we already have similar definitions for sanitizers:

diff --git a/openmp/runtime/test/lit.cfg b/openmp/runtime/test/lit.cfg
index 650d3853e851..27ff057c85f6 100644
--- a/openmp/runtime/test/lit.cfg
+++ b/openmp/runtime/test/lit.cfg
@@ -51,6 +51,8 @@ flags = " -I " + config.test_source_root + \
     " " + config.test_extra_flags
 if config.has_omit_frame_pointer_flag:
     flags += " -fno-omit-frame-pointer"
+if config.target_arch == "s390x":
+    flags += " -mbackchain"
 
 config.test_flags = " -I " + config.omp_header_directory + flags
 config.test_flags_use_compiler_omp_h = flags

The second one is more questionable:

--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -1737,6 +1737,9 @@ __kmpc_omp_reg_task_with_affinity(ident_t *loc_ref, kmp_int32 gtid,
 // gtid: global thread ID of caller
 // task: the task to invoke
 // current_task: the task to resume after task invocation
+#ifdef __s390x__
+__attribute__((target("backchain")))
+#endif
 static void __kmp_invoke_task(kmp_int32 gtid, kmp_task_t *task,
                               kmp_taskdata_t *current_task) {
   kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);

In order for unwinding to work, all the code involved in the call chain must store the backchain on the stack. In 99% of cases this is test code, which is covered by the first diff. However, in one particular case library code is involved. Building the whole library with backchain support would be overkill, hence this solution. I understand that I'm proposing to add test-only code to the main logic, so I'm open to different approaches here.
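One possible way to keep the test-motivated attribute out of the function definition itself is a small wrapper macro. This is a sketch with hypothetical names (SKETCH_BACKCHAIN, sketch_invoke_task, sketch_double); it reuses the target("backchain") attribute from the diff and assumes that attribute is only meaningful when compiling for s390x, so the macro expands to nothing elsewhere.

```c
/* Hypothetical macro: applies the s390x-only target("backchain") attribute
   used in the diff and expands to nothing on other targets. */
#if defined(__s390x__)
#define SKETCH_BACKCHAIN __attribute__((target("backchain")))
#else
#define SKETCH_BACKCHAIN
#endif

/* Stand-in for __kmp_invoke_task: when built for s390x, this function
   stores the backchain, so an unwinder can walk through it into the task. */
SKETCH_BACKCHAIN static int sketch_invoke_task(int (*task)(int), int arg) {
  return task(arg);
}

/* Example task used to exercise the wrapper. */
static int sketch_double(int x) { return 2 * x; }
```

This does not change the trade-off discussed above; it only centralizes the `#ifdef` so that further call-chain functions could opt in with one line.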

Edit: I noticed I forgot one minor test fix, sent as #70075.

@AlexandreEichenberger
Contributor

So just to confirm, the latest LLVM tree has the fixes?

@iii-i
Member

iii-i commented Oct 25, 2023

So just to confirm, the latest LLVM tree has the fixes?

Yes, except for the very minor one (70075). As far as I'm concerned, we can go ahead without it. There will still be a small gap before we enable the buildbot.

@jprotze
Collaborator

jprotze commented Oct 25, 2023

@iii-i I'm curious about the flaky archer tests. Is there more than what you fixed with #70075? Can you point me to some test output that shows the issue? If you still have problems, please file an issue, paste the test output, and ping me there.

@uweigand
Member

diff --git a/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp b/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
index 7a3a2a7e9013..6abb203f0c36 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
@@ -226,6 +226,8 @@ void SystemZPassConfig::addIRPasses() {
     addPass(createLoopDataPrefetchPass());
   }
 
+  addPass(createAtomicExpandPass());
+
   TargetPassConfig::addIRPasses();
 }

This should be done as a stand-alone PR, I think there'll need to be some additional discussion. In particular, we should make sure that the atomic expand pass doesn't introduce any other changes we don't want to see - we'll have to review the default settings of the various shouldExpandAtomic... and shouldCastAtomic... flags. For example, I don't think we need to cast atomic loads/stores of floating-point types to integer types - this can still happen directly via FPRs.

See also this (still pending) patch about handling 128-bit atomics: https://reviews.llvm.org/D146425
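For context on what "expanding" an atomic operation means: AtomicExpandPass can lower an IR atomicrmw such as fadd into a compare-and-swap loop on targets whose hooks request it. The C11 sketch below shows the equivalent pattern at source level. It is an illustration only, not the code the pass emits, and whether SystemZ uses this expansion depends on the target hook settings discussed above. The function name is hypothetical.

```c
#include <stdatomic.h>

/* Atomically adds v to *p via a compare-and-swap loop and returns the
   previous value, matching the result semantics of atomicrmw fadd. */
static double sketch_atomic_fadd(_Atomic double *p, double v) {
  double expected = atomic_load_explicit(p, memory_order_relaxed);
  /* On failure, expected is refreshed with the current value; retry. */
  while (!atomic_compare_exchange_weak(p, &expected, expected + v)) {
  }
  return expected;
}
```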

@AlexandreEichenberger
Contributor

I looked at the current llvm https://github.com/llvm/llvm-project.git and it does not show s390x as a supported arch.

So it's not merged yet? Or was it merged and then pulled?

LIBOMP_ARCH = aarch64|arm|i386|loongarch64|mic|mips|mips64|ppc64|ppc64le|x86_64|riscv64
The default value for this option is chosen based on probing the compiler for architecture macros (e.g., is __x86_64__ predefined by compiler?).

@uweigand
Member

I looked at the current llvm https://github.com/llvm/llvm-project.git and it does not show s390x as a supported arch.

So it's not merged yet? Or was it merged and then pulled?

No, it's not merged yet. We're still working through getting all pieces in place to ensure a fully clean test suite. At this point, the remaining issues are atomic fadd/fsub support (#70398), and then collecting the various changes described in the above comments and merging them into this PR.

@iii-i
Member

iii-i commented Oct 31, 2023

#70398 is now merged; I put the fixups I propose for this PR into https://github.com/iii-i/llvm-project/tree/libomp-s390x.

@uweigand
Member

#70398 is now merged; I put the fixups I propose for this PR into https://github.com/iii-i/llvm-project/tree/libomp-s390x.

Thanks, @iii-i! @nealef, can you merge those changes here so we can then merge this PR as a whole? Thanks!

@nealef
Contributor Author

nealef commented Oct 31, 2023

I'm still a little confused as to what should go in. Were any of the above in a separate PR? Here's what I have to commit at the moment. It doesn't have the frame pointer suggestions because I couldn't work out if they were going in separately.

diff --git a/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp b/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
index 7a3a2a7e9013..6abb203f0c36 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
@@ -226,6 +226,8 @@ void SystemZPassConfig::addIRPasses() {
     addPass(createLoopDataPrefetchPass());
   }

+  addPass(createAtomicExpandPass());
+
   TargetPassConfig::addIRPasses();
 }

diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index 339e4ca4be6b..4397565e8f47 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2456,12 +2456,21 @@ typedef struct kmp_depend_info {
   union {
     kmp_uint8 flag; // flag as an unsigned char
     struct { // flag as a set of 8 bits
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+      unsigned all : 1;
+      unsigned unused : 3;
+      unsigned set : 1;
+      unsigned mtx : 1;
+      unsigned out : 1;
+      unsigned in : 1;
+#else
       unsigned in : 1;
       unsigned out : 1;
       unsigned mtx : 1;
       unsigned set : 1;
       unsigned unused : 3;
       unsigned all : 1;
+#endif
     } flags;
   };
 } kmp_depend_info_t;
diff --git a/openmp/runtime/src/z_Linux_asm.S b/openmp/runtime/src/z_Linux_asm.S
index 3533ef2d14ec..a72705528d41 100644
--- a/openmp/runtime/src/z_Linux_asm.S
+++ b/openmp/runtime/src/z_Linux_asm.S
@@ -2310,9 +2310,18 @@ __kmp_invoke_microtask:
 	.cfi_startproc

 	stmg	%r6,%r14,48(%r15)
+        .cfi_offset %r6, -112
+        .cfi_offset %r7, -104
+        .cfi_offset %r8, -96
+        .cfi_offset %r9, -88
+        .cfi_offset %r10, -80
+        .cfi_offset %r11, -72
+        .cfi_offset %r12, -64
+        .cfi_offset %r13, -56
+        .cfi_offset %r14, -48
+        .cfi_offset %r15, -40
 	lgr	%r11,%r15
-	.cfi_def_cfa	%r15, 0
-	.cfi_offset	%r15, 0
+	.cfi_def_cfa %r11, 160

 	// Compute the dynamic stack size:
 	//
@@ -2342,7 +2351,7 @@ __kmp_invoke_microtask:
 #if OMPT_SUPPORT
 	// Save frame pointer into exit_frame
 	lg	%r8,160(%r11)
-	stg	%r15,0(%r8)
+	stg	%r11,0(%r8)
 #endif

 	// Prepare arguments for the pkfn function (first 5 using r2-r6 registers)
diff --git a/openmp/runtime/test/ompt/callback.h b/openmp/runtime/test/ompt/callback.h
index 8e9bab363b16..efbd4c716e0e 100644
--- a/openmp/runtime/test/ompt/callback.h
+++ b/openmp/runtime/test/ompt/callback.h
@@ -232,10 +232,18 @@ ompt_label_##id:
 // On s390x the NOP instruction is 2 bytes long. For non-void runtime
 // functions Clang inserts a STY instruction (but only if compiling under
 // -fno-PIC which will be the default with Clang 8.0, another 6 bytes).
+//
+// Another possibility is:
+//
+//                brasl %r14,__kmpc_end_master@plt
+//   a7 f4 00 02  j 0f
+//   47 00 00 00  0: nop
+//   a7 f4 00 02  j addr
+//                addr:
 #define print_possible_return_addresses(addr)                                  \
-  printf("%" PRIu64 ": current_address=%p or %p\n",                            \
+  printf("%" PRIu64 ": current_address=%p or %p or %p\n",                      \
          ompt_get_thread_data()->value, ((char *)addr) - 2,                    \
-         ((char *)addr) - 8)
+         ((char *)addr) - 8, ((char *)addr) - 12)
 #else
 #error Unsupported target architecture, cannot determine address offset!
 #endif

What's missing or what should be excluded?

@iii-i
Member

iii-i commented Oct 31, 2023

What's missing, and what I would also squash here, are:

  • 6eadf47 (Build openmp tests with -mbackchain)

  • a3d8559 (Compile __kmp_invoke_task with -mbackchain)

  • 13bea32 (Implement s390x /proc/cpuinfo parsing)

@llvmbot llvmbot added openmp:libomp OpenMP host runtime openmp:libomptarget OpenMP offload runtime labels Oct 31, 2023
@nealef
Contributor Author

nealef commented Oct 31, 2023

Sigh, forgot to run the format checker.

@nealef
Contributor Author

nealef commented Nov 1, 2023

Not sure what the format check failure is about.

@iii-i
Member

iii-i commented Nov 1, 2023

It says Warning: Unable to find merge base between fa6b574215377389258f019853a528c12706196d and 8504ad40a9fd7602763d54e0048958414fb2976e, where fa6b574 is a fairly recent upstream commit, and 8504ad40a9fd7602763d54e0048958414fb2976e is nealef:libomp-s390x, which is 1985 commits behind llvm:main. So I would try rebasing.

@nealef
Contributor Author

nealef commented Nov 1, 2023

I manually ran

git-clang-format --diff 88b6acc4695537f7126f643bb2feda3495bba6f6 8aaa2cb833abc840b0780c4a0a69d42eab0b6d38 -- openmp/runtime/src/kmp_affinity.cpp openmp/runtime/src/kmp_affinity.h openmp/runtime/src/kmp_os.h openmp/runtime/src/kmp_platform.h openmp/runtime/src/kmp_runtime.cpp openmp/runtime/src/kmp_tasking.cpp openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h openmp/runtime/src/z_Linux_util.cpp openmp/runtime/test/ompt/callback.h

where the two commit hashes are (1) my commit and (2) the next commit, and it came back clean. So I am not sure (a) why the commit hashes differ from mine and (b) why it failed. I'll rebase again.

@iii-i
Copy link
Member

iii-i commented Nov 2, 2023

Here is the result of git diff -U0 --no-color HEAD^ | clang-format-diff -i -p1: iii-i@7fdc6ca

@nealef
Copy link
Contributor Author

nealef commented Nov 2, 2023

Here is the result of git diff -U0 --no-color HEAD^ | clang-format-diff -i -p1: iii-i@7fdc6ca

? - there was no output?

@iii-i
Member

iii-i commented Nov 2, 2023

It removed trailing whitespace on lines 1743 and 1744.

nealef and others added 2 commits November 2, 2023 06:19
* openmp/README.rst
  - Add s390x to those platforms supported

* openmp/libomptarget/plugins-nextgen/CMakeLists.txt
  - Add s390x subdirectory

* openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
  - Add s390x definitions

* openmp/runtime/CMakeLists.txt
  - Add s390x to those platforms supported

* openmp/runtime/cmake/LibompGetArchitecture.cmake
  - Define s390x ARCHITECTURE

* openmp/runtime/cmake/LibompMicroTests.cmake
  - Add dependencies for System z (aka s390x)

* openmp/runtime/cmake/LibompUtils.cmake
  - Add S390X to the mix

* openmp/runtime/cmake/config-ix.cmake
  - Add s390x as a supported LIBOMP_ARCH

* openmp/runtime/src/kmp_affinity.h
  - Define __NR_sched_[get|set]affinity for s390x

* openmp/runtime/src/kmp_config.h.cmake
  - Define CACHE_LINE for s390x

* openmp/runtime/src/kmp_os.h
  - Add KMP_ARCH_S390X to support checks

* openmp/runtime/src/kmp_platform.h
  - Define KMP_ARCH_S390X

* openmp/runtime/src/kmp_runtime.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/src/kmp_tasking.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
  - Define ITT_ARCH_S390X

* openmp/runtime/src/z_Linux_asm.S
  - Instantiate __kmp_invoke_microtask for s390x

* openmp/runtime/src/z_Linux_util.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/test/ompt/callback.h
  - Define print_possible_return_addresses for s390x

* openmp/runtime/tools/lib/Platform.pm
  - Return s390x as platform and host architecture

* openmp/runtime/tools/lib/Uname.pm
  - Set hardware platform value for s390x

* openmp/runtime/src/kmp_affinity.cpp
  - Implement s390x /proc/cpuinfo parsing

* openmp/runtime/src/kmp_tasking.cpp
  - Add backchain attribute to __kmp_invoke_task
  - Style fix

* openmp/runtime/src/z_Linux_asm.S
  - Add unwind information

* openmp/runtime/test/lit.cfg
  - Build openmp tests with -mbackchain

* openmp/runtime/test/ompt/callback.h
  - Additional possibility for print_possible_return_addresses()
Member

@uweigand uweigand left a comment

This version LGTM. Thanks to @nealef and @iii-i for working through the remaining issues!

@uweigand
Member

uweigand commented Nov 2, 2023

Hi @shiltian, does this still look good to you after the latest set of changes? From a SystemZ perspective this looks ready to be merged now. (After it is merged, I'll enable openmp testing in the SystemZ build bots as well.)

Contributor

@shiltian shiltian left a comment

Sorry for the delay. The changes look good to me, assuming they have been tested internally. For the "missing" features we can always refine them gradually.

@uweigand uweigand merged commit 1111ef0 into llvm:main Nov 3, 2023
3 checks passed
@nealef nealef deleted the libomp-s390x branch November 3, 2023 12:29
@uweigand
Member

uweigand commented Nov 3, 2023

Opened a PR to add an OpenMP builder for s390x here: llvm/llvm-zorg#67

kwk added a commit to fedora-llvm-team/llvm-snapshots that referenced this pull request Nov 3, 2023
kwk added a commit to fedora-llvm-team/llvm-snapshots that referenced this pull request Nov 3, 2023
##===----------------------------------------------------------------------===##

if(CMAKE_SYSTEM_NAME MATCHES "Linux")
build_generic_elf64("SystemZ" "S390X" "s390x" "s390x-ibm-linux-gnu" "22")
Collaborator

@tstellar tstellar Feb 10, 2024

I'm trying to build this on Fedora and CMAKE_SYSTEM_PROCESSOR is set to s390x and not SystemZ, so the plugin fails to build. What operating system were you testing this on?

Contributor Author

AlmaLinux 8, I believe.

Member

I also see

-- LIBOMPTARGET: Not building S390X NextGen offloading plugin: machine not found in the system.

@nealef can you see what we need to do to fix this?

Contributor Author

How does CMAKE_SYSTEM_NAME get set to anything but Linux for a z build?

if(CMAKE_SYSTEM_NAME MATCHES "Linux")
 build_generic_elf64("SystemZ" "S390X" "s390x" "s390x-ibm-linux-gnu" "22")
else()
 libomptarget_say("Not building s390x NextGen offloading plugin: machine not found in the system.")
endif()

That's the only way we can get this message.

Member

No, it's a different message (note the "S390X" instead of "s390x" in the message text). This message seems to originate from the implementation of build_generic_elf64:

macro(build_generic_elf64 tmachine tmachine_name tmachine_libname tmachine_triple elf_machine_id)
if(CMAKE_SYSTEM_PROCESSOR MATCHES "${tmachine}$")
[...]
else()
  libomptarget_say("Not building ${tmachine_name} NextGen offloading plugin: machine not found in the system.")
endif()

This tests CMAKE_SYSTEM_PROCESSOR against SystemZ, but as Tom said, CMAKE_SYSTEM_PROCESSOR is set to s390x.
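The mismatch can be reproduced outside CMake. The hypothetical helper below uses POSIX extended regular expressions, which for a simple end-anchored pattern like SystemZ$ are assumed to behave like CMake's MATCHES (an unanchored search): the processor name s390x does not match SystemZ$, but it does match s390x$.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if str contains a match for the POSIX extended regex pattern,
   0 if it does not, and -1 if the pattern fails to compile. */
static int sketch_matches(const char *str, const char *pattern) {
  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec(&re, str, 0, NULL, 0);
  regfree(&re);
  return rc == 0;
}
```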

If I change the build_generic_elf64 invocation to use s390x, an attempt to build the plugin is made, but it fails:

-- LIBOMPTARGET: Building s390x plugin linked with libffi
[...]
/home/uweigand/llvm/llvm-head/openmp/libomptarget/plugins-nextgen/generic-elf-64bit/src/rtl.cpp:409:20: error: no member named 's390x' in 'llvm::Triple'
  409 |     return Triple::LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE;
      |            ~~~~~~~~^
<command line>:2:52: note: expanded from macro 'LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE'
    2 | #define LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE s390x
      |                                                    ^

So there appears to be some mismatch here.

Member

Turns out actually enabling this uncovers a whole lot of additional problems. I now seem to have it all working; PRs submitted as:
#83978
#83976
#83975

@nealef
Contributor Author

nealef commented Mar 11, 2024

Thanks Uli. I was on vacation the last couple of weeks so didn't have a chance to look at it. However, judging by your changes I would've been asking you a lot of questions anyway ;-).

@uweigand
Member

I've finally merged all needed PRs, so the plugin now builds on SystemZ.

Labels
openmp:libomp OpenMP host runtime openmp:libomptarget OpenMP offload runtime
8 participants