Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a way to disable use of -mcx16 compile flag #9022

Open
opoplawski opened this issue May 30, 2021 · 8 comments
Open

Need a way to disable use of -mcx16 compile flag #9022

opoplawski opened this issue May 30, 2021 · 8 comments

Comments

@opoplawski
Copy link
Contributor

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

4.1.1

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

This is regarding build the Fedora openmpi package

Details of the problem

From https://bugzilla.redhat.com/show_bug.cgi?id=1965692:

I recently acquired a cheap wintel toy tablet/laptop (a Kano PC), which has a
Celeron N4000 processor. I tried running HPL benchmarks on it to figure out how
fast the thing is. However, xhpl_openmpi crashed with an illegal instruction. I
then installed the MPICH version of HPL, and xhpl_mpich runs fine. The issue is
thus in OpenMPI.

Looking at the build logs of openmpi-4.1.0-5.fc34.x86_64, I see that it has
been compiled with the -mcx16 flag detected by configure. According to the GCC
man page

       This option enables GCC to generate "CMPXCHG16B" instructions in
       64-bit code to implement compare-and-exchange operations on 16-byte
       aligned 128-bit objects.  This is useful for atomic updates of data
       structures exceeding one machine word in size.  The compiler uses
       this instruction to implement __sync Builtins.  However, for
       __atomic Builtins operating on 128-bit integers, a library call is
       always used.

However, because the instruction is not available on all x86_64 processors, the
use of the flag should be disabled in OpenMPI.

However, I see no easy way to do this in configure, especially since I think we may need to fallback to using -latomic.

@ggouaillardet
Copy link
Contributor

@opoplawski Thanks for the report.

One option is to add a RUN_IF_ELSE() test (just like we do for -latomic).
That would do the trick but only if you build Open MPI on your Celeron processor.

Shall I guess you would not be happy with this, since you might want to install Open MPI from a binary package
provided by Fedora, and such binaries are unlikely built on Celeron?

An other option would be to add yet an other configure option to explicitly not test the -mcx16 flag (and hence fallback on -latomic). That would help portability at the expense of performance.

@hjelmn can you please comment on the performance impact of using -latomic even if -mcx16 would be an option?

@opoplawski
Copy link
Contributor Author

The problem for the Fedora packaging is that it is built on hardware with cx16 support, but must be able to be run on hardware without it. Ideally it would detect this at run time, but I'm not sure this is possible. It seems like reverting to what I'm assuming is the old behavior of always using -latomic via a configure flag would be acceptable.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Jun 1, 2021
Do not try the -mcx16 flag if --disable-cx16-atomics is used,
and prevent the generation of instructions that are not available
on all x86 platforms (such as Celeron N4000).

Always try to run a simple test to make sure the selected atomic
generate correct results.

Thanks Orion Poplawski for reporting this issue.

Refs. open-mpi#9022

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
@ggouaillardet
Copy link
Contributor

@opoplawski Fair enough, I issued #9023 (on master) to address this issue.

From a Fedora packaging point of view, you would now configure --disable-cx16-atomics ... to tell Open MPI not to test the -mcx16 flag.

I revamped the logic, and the new expected behavior on your Celeron is

  • try without flags : fail
  • try to compile with -mcx16 : success
  • test -mcx16 generates correct code : fail (because of a SIGILL)
  • try to compile with -latomic : success
  • test -latomic generates correct code : success

I could not test this because I do not have access to a Celeron, and I will appreciate you double check the actual behavior on your tablet.

A similar fix will have to be applied to the internal PMIx and the external one (I guess Fedora is using)

FWIW, is here a diff for the v4.1.x branch (Open MPI only, PMIx is yet to be done)
If you choose to apply it to an official tarball, you will need recent autotools, and run
autogen.pl --force in the source directory before invoking configure

diff --git a/config/opal_config_asm.m4 b/config/opal_config_asm.m4
index 5183c7e..1bee71a 100644
--- a/config/opal_config_asm.m4
+++ b/config/opal_config_asm.m4
@@ -11,8 +11,8 @@ dnl Copyright (c) 2004-2005 The Regents of the University of California.
 dnl                         All rights reserved.
 dnl Copyright (c) 2008-2018 Cisco Systems, Inc.  All rights reserved.
 dnl Copyright (c) 2010      Oracle and/or its affiliates.  All rights reserved.
-dnl Copyright (c) 2015-2017 Research Organization for Information Science
-dnl                         and Technology (RIST). All rights reserved.
+dnl Copyright (c) 2015-2021 Research Organization for Information Science
+dnl                         and Technology (RIST).  All rights reserved.
 dnl Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights
 dnl                         reserved.
 dnl Copyright (c) 2017      Amazon.com, Inc. or its affiliates.  All Rights
@@ -126,15 +126,38 @@ int main(int argc, char** argv)
 dnl ------------------------------------------------------------------
 
 dnl
-dnl Check to see if a specific function is linkable.
+dnl Helper to Check if a specific function is usable.
 dnl
-dnl Check with:
+dnl
+dnl $1: function name to print
+dnl $2: program to test
+dnl $3: action if success
+dnl #4: action if fail
+dnl
+AC_DEFUN([_OPAL_ASM_CHECK_ATOMIC_FUNC],[
+    AC_LINK_IFELSE([$2],
+        [AC_MSG_RESULT([yes])
+         dnl make sure it works
+         AC_MSG_CHECKING([if $1() gives correct results])
+         AC_RUN_IFELSE([$2],
+               [$3
+                AC_MSG_RESULT([yes])],
+               [$4
+                AC_MSG_RESULT([no])],
+               [AC_MSG_RESULT([cannot test -- assume yes (cross compiling)])])],
+        [$4
+         AC_MSG_RESULT([no])])
+])
+
+dnl
+dnl Check to see if a specific function is usable.
+dnl
+dnl Check compilation and actually try ro run the test code
+dnl (if we're not cross-compiling) in order to verify that
+dnl it actually gives us the correct result:
 dnl 1. No compiler/linker flags.
-dnl 2. CFLAGS += -mcx16
+dnl 2. CFLAGS += -mcx16 (unless --disable-cx16-atomics is used)
 dnl 3. LIBS += -latomic
-dnl 4. Finally, if it links ok with any of #1, #2, or #3, actually try
-dnl to run the test code (if we're not cross-compiling) and verify
-dnl that it actually gives us the correct result.
 dnl
 dnl Note that we unfortunately can't use AC SEARCH_LIBS because its
 dnl check incorrectly fails (because these functions are special compiler
@@ -161,47 +184,27 @@ AC_DEFUN([OPAL_ASM_CHECK_ATOMIC_FUNC],[
 
     dnl Check with no compiler/linker flags
     AC_MSG_CHECKING([for $1])
-    AC_LINK_IFELSE([$2],
-        [opal_asm_check_func_happy=1
-         AC_MSG_RESULT([yes])],
-        [opal_asm_check_func_happy=0
-         AC_MSG_RESULT([no])])
-
+    _OPAL_ASM_CHECK_ATOMIC_FUNC([$1], [$2], [opal_asm_check_func_happy=1], [opal_asm_check_func_happy=0])
     dnl If that didn't work, try again with CFLAGS+=mcx16
     AS_IF([test $opal_asm_check_func_happy -eq 0],
-        [AC_MSG_CHECKING([for $1 with -mcx16])
-         CFLAGS="$CFLAGS -mcx16"
-         AC_LINK_IFELSE([$2],
-             [opal_asm_check_func_happy=1
-              AC_MSG_RESULT([yes])],
-             [opal_asm_check_func_happy=0
-              CFLAGS=$opal_asm_check_func_CFLAGS_save
-              AC_MSG_RESULT([no])])
-         ])
+          [AC_MSG_CHECKING([for $1 with -mcx16])
+           AS_IF([test "$enable_cx16_atomics" = "no"],
+                 [AC_MSG_RESULT([skipped])],
+                 [CFLAGS="$CFLAGS -mcx16"
+                  _OPAL_ASM_CHECK_ATOMIC_FUNC([$1], [$2],
+                                              [opal_asm_check_func_happy=1],
+                                              [opal_asm_check_func_happy=0
+                                               CFLAGS=$opal_asm_check_func_CFLAGS_save])])
+          ])
 
     dnl If that didn't work, try again with LIBS+=-latomic
     AS_IF([test $opal_asm_check_func_happy -eq 0],
-        [AC_MSG_CHECKING([for $1 with -latomic])
-         LIBS="$LIBS -latomic"
-         AC_LINK_IFELSE([$2],
-             [opal_asm_check_func_happy=1
-              AC_MSG_RESULT([yes])],
-             [opal_asm_check_func_happy=0
-              LIBS=$opal_asm_check_func_LIBS_save
-              AC_MSG_RESULT([no])])
-         ])
-
-    dnl If we have it, try it and make sure it gives a correct result.
-    dnl As of Aug 2018, we know that it links but does *not* work on clang
-    dnl 6 on ARM64.
-    AS_IF([test $opal_asm_check_func_happy -eq 1],
-        [AC_MSG_CHECKING([if $1() gives correct results])
-         AC_RUN_IFELSE([$2],
-              [AC_MSG_RESULT([yes])],
-              [opal_asm_check_func_happy=0
-               AC_MSG_RESULT([no])],
-              [AC_MSG_RESULT([cannot test -- assume yes (cross compiling)])])
-         ])
+          [AC_MSG_CHECKING([for $1 with -latomic])
+           LIBS="$LIBS -latomic"
+           _OPAL_ASM_CHECK_ATOMIC_FUNC([$1], [$2],
+                                       [opal_asm_check_func_happy=1],
+                                       [opal_asm_check_func_happy=0
+                                        LIBS=$opal_asm_check_func_LIBS_save])])
 
     dnl If we were unsuccessful, restore CFLAGS/LIBS
     AS_IF([test $opal_asm_check_func_happy -eq 0],
@@ -1025,6 +1028,10 @@ AC_DEFUN([OPAL_CONFIG_ASM],[
       [AC_HELP_STRING([--enable-builtin-atomics],
          [Enable use of __atomic builtin atomics (default: enabled)])])
 
+    AC_ARG_ENABLE([cx16-atomics],
+      [AS_HELP_STRING([--enable-cx16-atomics],
+         [Try using -mcx16 flag if needed (default: autodetect)])])
+
     opal_cv_asm_builtin="BUILTIN_NO"
     AS_IF([test "$opal_cv_asm_builtin" = "BUILTIN_NO" && test "$enable_builtin_atomics" != "no"],
           [OPAL_CHECK_GCC_ATOMIC_BUILTINS([opal_cv_asm_builtin="BUILTIN_GCC"], [])])

@bosilca
Copy link
Member

bosilca commented Jun 1, 2021

Adding an option to entirely skip the mcx16 test is fine, but if you add actual runs of the generated code you will breaks cross-compilation. Another solution, which is currently used by a lot of distro builders, is to force the target CPU (march and friends) via the initial flags (compile and link), in order to prevent the compiler from generating invalid instructions for the specified architecture.

@hjelmn
Copy link
Member

hjelmn commented Jun 1, 2021

Thats rather disappointing that the distros will have to turn off this feature due to some sub-par processors. Is there no way to limit the effect to only these low-end processors?

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Jun 2, 2021
Do not try the -mcx16 flag if --disable-cx16-atomics is used,
and prevent the generation of instructions that are not available
on all x86 platforms (such as Celeron N4000).

Always try to run a simple test to make sure the selected atomic
generate correct results.

Thanks Orion Poplawski for reporting this issue.

Refs. open-mpi#9022

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
@ggouaillardet
Copy link
Contributor

@bosilca there is currently a test, and it is assumed to success in the case of cross compilation (well, as long as autotools see it as a cross-compilation).

I updated the PR to prepend -mcx16 to CFLAGS (same as op/avx) to make sure -march=... takes precedence. Do you know which -march=... should be used for Celeron? I tried -mcx16 -march=goldmont on GCC 11, but there is no conflict and the code compiles.

@bosilca
Copy link
Member

bosilca commented Jun 2, 2021

My bad, I missed the cross-compilation argument for AC_RUN_IFELSE.

@susilehtola
Copy link

@opoplawski pinged me by email; I reported the original issue.

-march cannot be set since that comes from the Fedora default optimization flags, %optflags a.k.a. $RPM_OPT_FLAGS. The Fedora default flags on x86_64 (where the problem occurred) is just

-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection

That is, there is only the -mtune=generic flag so that the compiler makes no assumptions about the processor.

However, if you turn on flags like -mcx16 that turn on use of specific instructions, then you break down the cross-platform compatibility operating system distributions require.

tldr: the only thing you need to do is not turn on any special hardware-specific features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants