Skip to content

PPC64 - may have problems with atomic make check tests #2610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hppritcha opened this issue Dec 20, 2016 · 20 comments
Closed

PPC64 - may have problems with atomic make check tests #2610

hppritcha opened this issue Dec 20, 2016 · 20 comments
Assignees
Labels
Milestone

Comments

@hppritcha
Copy link
Member

@opoplawski is reporting a problem with the 2.0.2 rc1 candidate on PPC64:

There's a suspect PR #2178 - looks like only major change with PPC atomics between 2.0.1 (where the make distcheck works) and 2.0.2rc.

https://www.mail-archive.com/devel@lists.open-mpi.org//msg19851.html

@hppritcha
Copy link
Member Author

@hjelmn if you are able to reproduce please try. Otherwise, @jjhursey could someone from IBM take a look at this?

@jjhursey jjhursey self-assigned this Dec 20, 2016
@PHHargrove
Copy link
Member

FWIW I tested the same RC on multiple PPC64 w/o failures of "make check".
On big-endian POWER7, my coverage included gcc-4.4.7 and 4.8.3, as well as xlc-13.1.
On little-endian POWER8, my coverage included gcc-4.9.2, xlc-13.1 and pgi-16.10.

Additionally, I have just now tried gcc-6.2 on both platforms and pass "make check" just fine.

v2.x defaults to using the gcc builtin atomics when present. So, changes to the inline asm in PR #2178 might be unrelated unless some non-default configure flags or a compiler was used which lacks GNU inline asm support (note xlc-13.1 for Linux does provide the gcc builtins).

@opoplawski
Copy link
Contributor

@PHHargrove
Copy link
Member

I am able to reproduce by configuring with --disable-builtin-atomics, with both gcc-4.x and gcc-6.2 on both big-endian POWER7 and little-endian POWER8.

@hjelmn my OpenPOWER VM you used for PR #2178 has gcc-5.4, but I suspect (given failures w/ 4.x and 6.2) that it will also reproduce the problem if you lack alternatives.

@opoplawski
Copy link
Contributor

I see this in the configure logs:

checking if gcc supports GCC inline assembly... yes
checking if gcc supports DEC inline assembly... no
checking if gcc supports XLC inline assembly... no
checking for assembly format... default-.text-.globl-:--.L-@-1-1-0-1-1
checking for assembly architecture... POWERPC64
checking for builtin atomics... BUILTIN_NO
checking for pre-built assembly file... no (not in asm-data)
checking whether possible to generate assembly file... yes
checking for atomic assembly filename... atomic-local.s

@PHHargrove
Copy link
Member

@opoplawski Yes, I also noticed checking for builtin atomics... BUILTIN_NO in your logs, though I do not know the cause. However, it does explain why I didn't find this in my previous testing, and why --disable-builtin-atomics reproduces it.

@opoplawski
Copy link
Contributor

It turns out that the default for --enable-builtin-atomics is disabled if not specified. Is that expected? Should I be specifying --enable-builtin-atomics in my builds?

@PHHargrove
Copy link
Member

@opoplawski

I realize now that my original (no error) builds included --enable-debug while my later ones removed that flag in addition to adding --disable-builtin-atomics. So, I am not certain any longer about what I claimed about which atomics are enabled by default and about the correlation between disabling the builtins and reproducing your error.

I am going to step away from this issue now, and leave it to Nathan and Josh (who probably have a better idea how to fix the problem).

@opoplawski
Copy link
Contributor

I'll note that it looks like if I set --enable-builtin-atomics my builds succeed: https://koji.fedoraproject.org/koji/taskinfo?taskID=17005965

@hppritcha hppritcha changed the title PPC64 - may have problems with atomic make distcheck tests PPC64 - may have problems with atomic make check tests Dec 21, 2016
@hppritcha
Copy link
Member Author

@opoplawski is it okay to require --enable-builtin-atomics for your purposes for 2.0.2?

@opoplawski
Copy link
Contributor

Seems fine to me - I know nothing about these options though.

@hppritcha hppritcha added this to the v2.0.3 milestone Jan 4, 2017
@PHHargrove
Copy link
Member

@hppritcha
This bug is still present in my testing of the rc2 tarball on multiple PPC64 systems, when using the default configure options (no --enable-debug or any atomics-related configure options).
You added this bug to the v2.0.3 milestone, not v2.0.2.
So, am I correct in thinking that you will "fix this in documentation" for 2.0.2, as suggested by the question you posed to Orion:

is it okay to require --enable-builtin-atomics for your purposes for 2.0.2?

@hppritcha
Copy link
Member Author

@PHHargrove I wasn't intending to imply we'd fix this in the documentation, but we can use this issue as such.

@hppritcha
Copy link
Member Author

hppritcha commented Jan 6, 2017

I'm making this a blocker for 2.0.2. There appear to be more issues with PPC64 and atomics than we'd thought.

@jsquyres
Copy link
Member

jsquyres commented Jan 9, 2017

@gpaulsen @jjhursey Can you guys be prepared to talk about this one tomorrow on the webex?

@hppritcha
Copy link
Member Author

removed @hjelmn he has enough to do.

@nysal
Copy link
Member

nysal commented Jan 10, 2017

This looks like a regression caused by:

commit d4be138a7b6993dc000da0f89e57d07c0db5a770
Author: Nathan Hjelm <hjelmn@lanl.gov>
Date:   Thu Sep 15 12:37:24 2016 -0600

    asm/ppc: work around apparent PGI 16.9 bug
    
    The add_64, sub_64, and cmpset_64 atomics used "+m" (*addr) to
    indicate the asm also writes the memory location. This is better than
    using a memory clobber. PGI 16.9 introduced a bug that causes a
    compiler failure on the "+m" constraint (input/output). It seems to
    work with "=m" (output) which matches the 32-bit atomics.
    
    Fixes open-mpi/ompi#2086
    
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    (cherry picked from commit 2edc77b27ba06dab65ddf261a59ff6ec1ad8ed38)

Adding the constraint to the input operands list seems to fix the problem in my limited tests. Please review/test and I'll open a PR. The issue with the build not using the builtin atomics needs investigation. However it shouldn't be a blocker. It might actually be better if we switch over to the builtin atomics on PPC only after we verify there are no performance regressions.

--- a/opal/include/opal/sys/powerpc/atomic.h
+++ b/opal/include/opal/sys/powerpc/atomic.h
@@ -223,6 +223,7 @@ static inline int32_t opal_atomic_swap_32(volatile int32_t *addr, int32_t newval
 #if (OPAL_ASSEMBLY_ARCH == OPAL_POWERPC64)
 
 #if  OPAL_GCC_INLINE_ASSEMBLY
+
 static inline int64_t opal_atomic_add_64 (volatile int64_t* v, int64_t inc)
 {
    int64_t t;
@@ -232,7 +233,7 @@ static inline int64_t opal_atomic_add_64 (volatile int64_t* v, int64_t inc)
                         "     stdcx.  %0, 0, %3    \n\t"
                         "     bne-    1b           \n\t"
                         : "=&r" (t), "=m" (*v)
-                        : "r" (OPAL_ASM_VALUE64(inc)), "r" OPAL_ASM_ADDR(v)
+                        : "r" (OPAL_ASM_VALUE64(inc)), "r" OPAL_ASM_ADDR(v), "m" (*v)
                         : "cc");
 
    return t;
@@ -249,7 +250,7 @@ static inline int64_t opal_atomic_sub_64 (volatile int64_t* v, int64_t dec)
                         "     stdcx.  %0,0,%3      \n\t"
                         "     bne-    1b           \n\t"
                         : "=&r" (t), "=m" (*v)
-                        : "r" (OPAL_ASM_VALUE64(dec)), "r" OPAL_ASM_ADDR(v)
+                        : "r" (OPAL_ASM_VALUE64(dec)), "r" OPAL_ASM_ADDR(v), "m" (*v)
                         : "cc");
 
    return t;
@@ -268,7 +269,7 @@ static inline int opal_atomic_cmpset_64(volatile int64_t *addr,
                          "   bne-    1b         \n\t"
                          "2:"
                          : "=&r" (ret), "=m" (*addr)
-                         : "r" (addr), "r" (OPAL_ASM_VALUE64(oldval)), "r" (OPAL_ASM_VALUE64(newval))
+                         : "r" (addr), "r" (OPAL_ASM_VALUE64(oldval)), "r" (OPAL_ASM_VALUE64(newval)), "m" (*addr)
                          : "cc", "memory");
 
    return (ret == oldval);

@hppritcha hppritcha added this to the v2.0.3 milestone Jan 10, 2017
@hppritcha hppritcha modified the milestones: v2.0.2, v2.0.3 Jan 10, 2017
@hppritcha
Copy link
Member Author

Do we want to get @nysal 's proposed fix in for 2.0.2 or push out to 2.0.3?

@nysal
Copy link
Member

nysal commented Jan 11, 2017

@hppritcha I'd like to get this in 2.0.2. PR coming up shortly.

hppritcha pushed a commit that referenced this issue Jan 12, 2017
Add a missing constraint to the input operand list.
This fixes a regression caused by d4be138.
Thanks to Orion Poplawski for reporting the issue.

Refs #2610

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
@hppritcha
Copy link
Member Author

closed via #2706, #2707, and #2708

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants