PPC64 builds broken on master #874

Closed
jsquyres opened this Issue Sep 8, 2015 · 27 comments

Owner

jsquyres commented Sep 8, 2015

Per http://www.open-mpi.org/community/lists/devel/2015/09/17979.php, @adrianreber reports:

For the past few days, the MTT runs on my ppc64 systems have been failing with:

[bimini:11716] *** Process received signal ***
[bimini:11716] Signal: Segmentation fault (11)
[bimini:11716] Signal code: Address not mapped (1)
[bimini:11716] Failing at address: (nil)
[bimini:11716] [ 0] [0x3fffa2bb0448]
[bimini:11716] [ 1] /lib64/libc.so.6(+0xcb074)[0x3fffa27eb074]
[bimini:11716] [ 2] /home/adrian/mtt-scratch/installs/GubX/install/lib/libpmix.so.0(opal_pmix_pmix1xx_pmix_value_xfer-0x68758)[0x3fffa2158a10]
[bimini:11716] [ 3] /home/adrian/mtt-scratch/installs/GubX/install/lib/libpmix.so.0(OPAL_PMIX_PMIX1XX_PMIx_Put-0x48338)[0x3fffa2179f70]
[bimini:11716] [ 4] /home/adrian/mtt-scratch/installs/GubX/install/lib/openmpi/mca_pmix_pmix1xx.so(pmix1_put-0x27efc)[0x3fffa21d858c]

I do not think I see this kind of error on any of the other MTT setups, so it might be ppc64 related. Just wanted to point it out.

@rhc54 I see that the failure is in PMIx...?
@gpaulsen @nysal Pinging the PPC64/IBM people...

@jsquyres jsquyres added the bug label Sep 8, 2015

Member

rhc54 commented Sep 8, 2015

@jsquyres it isn't in the PMIx library, but in the OPAL integration with that library. It looks like some kind of issue with moving data between OPAL and PMIx structures.

Contributor

ggouaillardet commented Sep 9, 2015

I found several issues related to big endian byte order (I reproduced them on a SPARC architecture).

The inlined patch fixes some of them. With the patch:

  • mpirun -np 1 a.out works
  • a.out run as a singleton suffers from an intermittent hang
  • mpirun -np 2 a.out always fails with the following error:

--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  getting local rank failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)

By the way, is PMIx supposed to work on a heterogeneous cluster (e.g. a mix of big and little endian)?

Here is the patch:

diff --git a/opal/mca/pmix/pmix1xx/pmix/src/buffer_ops/pack.c b/opal/mca/pmix/pmix1xx/pmix/src/buffer_ops/pack.c
index f1db83c..faa9a6e 100644
--- a/opal/mca/pmix/pmix1xx/pmix/src/buffer_ops/pack.c
+++ b/opal/mca/pmix/pmix1xx/pmix/src/buffer_ops/pack.c
@@ -11,6 +11,8 @@
  *                         All rights reserved.
  * Copyright (c) 2011-2013 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2014-2015 Intel, Inc. All rights reserved.
+ * Copyright (c) 2015      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -264,7 +266,7 @@ int pmix_bfrop_pack_int64(pmix_buffer_t *buffer, const void *src,
                           int32_t num_vals, pmix_data_type_t type)
 {
     int32_t i;
-    uint64_t tmp, *srctmp = (uint64_t*) src;
+    uint64_t tmp, tmp2;
     char *dst;
     size_t bytes_packed = num_vals * sizeof(tmp);

@@ -275,7 +277,8 @@ int pmix_bfrop_pack_int64(pmix_buffer_t *buffer, const void *src,
     }

     for (i = 0; i < num_vals; ++i) {
-        tmp = pmix_hton64(srctmp[i]);
+        memcpy(&tmp2, (char *)src+i*sizeof(uint64_t), sizeof(uint64_t));
+        tmp = pmix_hton64(tmp2);
         memcpy(dst, &tmp, sizeof(tmp));
         dst += sizeof(tmp);
     }
diff --git a/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c b/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
index 95e8aa5..c6c3496 100644
--- a/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
+++ b/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
@@ -230,8 +230,7 @@ int PMIx_Init(pmix_proc_t *proc)
         return PMIX_ERR_BAD_PARAM;
     }

-    ++pmix_globals.init_cntr;
-    if (1 < pmix_globals.init_cntr) {
+    if (0 < pmix_globals.init_cntr) {
         /* since we have been called before, the nspace and
          * rank should be known. So return them here if
          * requested */
@@ -339,6 +338,9 @@ int PMIx_Init(pmix_proc_t *proc)
     rc = cb.status;
     PMIX_DESTRUCT(&cb);

+    if (PMIX_SUCCESS == rc) {
+        pmix_globals.init_cntr++;
+    }
     return rc;
 }

diff --git a/opal/mca/pmix/pmix1xx/pmix_pmix1.c b/opal/mca/pmix/pmix1xx/pmix_pmix1.c
index 3abeee9..84c7774 100644
--- a/opal/mca/pmix/pmix1xx/pmix_pmix1.c
+++ b/opal/mca/pmix/pmix1xx/pmix_pmix1.c
@@ -281,7 +281,7 @@ void pmix1_value_load(pmix_value_t *v,
             break;
         case OPAL_SIZE:
             v->type = PMIX_SIZE;
-            memcpy(&(v->data.size), &kv->data.size, sizeof(size_t));
+            v->data.size = (size_t)kv->data.size;
             break;
         case OPAL_PID:
             v->type = PMIX_PID;
@@ -344,7 +344,7 @@ void pmix1_value_load(pmix_value_t *v,
             if (NULL != kv->data.bo.bytes) {
                 v->data.bo.bytes = (char*)malloc(kv->data.bo.size);
                 memcpy(v->data.bo.bytes, kv->data.bo.bytes, kv->data.bo.size);
-                memcpy(&(v->data.bo.size), &kv->data.bo.size, sizeof(size_t));
+                v->data.bo.size = (size_t)kv->data.bo.size;
             } else {
                 v->data.bo.bytes = NULL;
                 v->data.bo.size = 0;
@@ -382,7 +382,7 @@ int pmix1_value_unload(opal_value_t *kv,
         break;
     case PMIX_SIZE:
         kv->type = OPAL_SIZE;
-        memcpy(&kv->data.size, &(v->data.size), sizeof(size_t));
+        kv->data.size = (int)v->data.size;
         break;
     case PMIX_PID:
         kv->type = OPAL_PID;
@@ -444,7 +444,7 @@ int pmix1_value_unload(opal_value_t *kv,
         kv->type = OPAL_BYTE_OBJECT;
         if (NULL != v->data.bo.bytes && 0 < v->data.bo.size) {
             kv->data.bo.bytes = (uint8_t*)v->data.bo.bytes;
-            kv->data.bo.size = v->data.bo.size;
+            kv->data.bo.size = (int)v->data.bo.size;
         } else {
             kv->data.bo.bytes = NULL;
             kv->data.bo.size = 0;
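
For context, a minimal standalone sketch (not PMIx code, names are illustrative) of the kind of problem the pack_int64 hunk above guards against: loading a uint64_t through a pointer that is not 8-byte aligned can fault (SIGBUS) on strict-alignment architectures such as SPARC, while a memcpy into an aligned local is safe everywhere.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Simulate a pack buffer where a 64-bit value sits at an odd offset. */
    char *buf = malloc(16);
    uint64_t value = 42, copy;

    memcpy(buf + 1, &value, sizeof(value));   /* misaligned location */

    /* The old code did the equivalent of:
     *     uint64_t *srctmp = (uint64_t *)(buf + 1);
     *     copy = srctmp[0];
     * which can fault on strict-alignment CPUs because buf + 1 is not
     * 8-byte aligned. The patched code copies through an aligned local
     * instead, which is legal on every architecture: */
    memcpy(&copy, buf + 1, sizeof(copy));

    free(buf);
    return copy == 42 ? 0 : 1;
}
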
Member

rhc54 commented Sep 9, 2015

Ah crud - I thought I'd been careful enough about avoiding the alignment issues, but obviously not. I'll take a gander at these in the morning and bring them into PMIx - will also see if I can spot any additional problems. Thanks!

Contributor

ggouaillardet commented Sep 9, 2015

@rhc54 note there are two kinds of issues (a small illustration of the second one follows this list):

  • alignment
  • conversion (e.g. you cannot simply memcpy an int into a long on a big endian arch)
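
A minimal sketch of the conversion bullet (illustrative standalone code, not taken from the patch): on a big endian machine, memcpy'ing the 4 bytes of an int into the front of an 8-byte integer leaves the value in the most significant half, so the number comes out wrong, whereas a plain assignment converts the value itself.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int small = 42;
    uint64_t via_memcpy = 0, via_assign;

    /* Wrong on big endian: the 4 bytes of 'small' land in the high half
     * of the 64-bit value, so this prints 180388626432 instead of 42
     * (on little endian it happens to print 42, hiding the bug). */
    memcpy(&via_memcpy, &small, sizeof(small));

    /* Correct everywhere: the value, not its bytes, is widened. */
    via_assign = (uint64_t)small;

    printf("memcpy: %llu  assignment: %llu\n",
           (unsigned long long)via_memcpy,
           (unsigned long long)via_assign);
    return 0;
}
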
Member

adrianreber commented Sep 9, 2015

I can confirm that I no longer get segfaults on ppc64, but I do get the same errors as @ggouaillardet.

Member

rhc54 commented Sep 9, 2015

@ggouaillardet PMIx is supposed to work on hetero clusters - it has the equivalent of the OPAL DSS for packing/unpacking to make that work. Did you see something that would prevent it?
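
(The gist of that packing/unpacking, as a rough standalone sketch; the names here are illustrative, not the PMIx bfrop API: every value is converted to a fixed wire byte order on pack and converted back on unpack, so big and little endian peers agree on the byte stream. The alignment-safe memcpy from the patch above fits the same pattern.)

#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-to-wire for 64-bit values: the in-memory bytes of the result are
 * always most-significant-byte first, whatever the host byte order. */
uint64_t wire64(uint64_t v)
{
    uint8_t b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (uint8_t)(v >> (56 - 8 * i));
    uint64_t out;
    memcpy(&out, b, sizeof(out));
    return out;
}

/* Wire-to-host: rebuild the value from most-significant-byte-first bytes. */
uint64_t unwire64(uint64_t v)
{
    uint8_t b[8];
    memcpy(b, &v, sizeof(b));
    uint64_t out = 0;
    for (int i = 0; i < 8; i++)
        out = (out << 8) | b[i];
    return out;
}

int main(void)
{
    /* A value survives the round trip no matter which endianness packs
     * it and which unpacks it, because the wire format is fixed. */
    uint64_t on_wire = wire64(0x0102030405060708ULL);
    assert(unwire64(on_wire) == 0x0102030405060708ULL);
    return 0;
}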

Member

nysal commented Sep 9, 2015

@jsquyres @rhc54 I'm out of office for the next two days. I'll try reproducing this issue on Monday.

Member

rhc54 commented Sep 9, 2015

@nysal Thanks!

Contributor

ggouaillardet commented Sep 10, 2015

@rhc54 my question about hetero clusters was a naive one.

With the following inlined one-line patch, I get no more issues with a singleton or with mpirun -np 1
(but mpirun -np 2 still fails):

diff --git a/opal/mca/pmix/pmix1xx/pmix/src/buffer_ops/pack.c b/opal/mca/pmix/pmix1xx/pmix/src/buffer_ops/pack.c
index faa9a6e..cf453ee 100644
--- a/opal/mca/pmix/pmix1xx/pmix/src/buffer_ops/pack.c
+++ b/opal/mca/pmix/pmix1xx/pmix/src/buffer_ops/pack.c
@@ -643,7 +643,7 @@ int pmix_bfrop_pack_proc(pmix_buffer_t *buffer, const void *src,
         if (PMIX_SUCCESS != (ret = pmix_bfrop_pack_string(buffer, &ptr, 1, PMIX_STRING))) {
             return ret;
         }
-        if (PMIX_SUCCESS != (ret = pmix_bfrop_pack_sizet(buffer, &proc[i].rank, 1, PMIX_INT))) {
+        if (PMIX_SUCCESS != (ret = pmix_bfrop_pack_int(buffer, &proc[i].rank, 1, PMIX_INT))) {
             return ret;
         }
     }
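
Roughly why this typo could go unnoticed on little endian (a hypothetical standalone sketch, which assumes the receiving side reads the value back as a 64-bit quantity before storing it into the 4-byte rank): truncating that 64-bit read keeps the rank's own bytes on little endian but keeps the neighboring bytes on big endian.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* The rank is a 4-byte int; the bytes right after it belong to
     * something else (here simulated with 0xAA filler). */
    unsigned char mem[8];
    int rank = 7;
    memcpy(mem, &rank, sizeof(rank));
    memset(mem + sizeof(rank), 0xAA, sizeof(mem) - sizeof(rank));

    /* An 8-byte (size_t) pack starting at &rank effectively reads all
     * eight bytes of 'mem'. */
    uint64_t as_sizet;
    memcpy(&as_sizet, mem, sizeof(as_sizet));

    /* Storing that 64-bit value back into 32 bits keeps the low half:
     * on little endian those are the rank's own bytes (looks correct),
     * on big endian they are the 0xAA neighbor bytes (garbage). */
    uint32_t recovered = (uint32_t)as_sizet;
    printf("rank %d recovered as 0x%08x\n", rank, (unsigned int)recovered);
    return 0;
}
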
Member

rhc54 commented Sep 10, 2015

@ggouaillardet Oh wow - yeah, that would definitely be wrong! Sorry for that typo.

Contributor

ggouaillardet commented Sep 11, 2015

And here is another two-line patch that is needed to fix mpirun -np n helloworld with n > 1 on a big endian arch:

diff --git a/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server.c b/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server.c
index 2dbfb2b..3b311be 100644
--- a/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server.c
+++ b/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server.c
@@ -381,7 +381,8 @@ static void _register_nspace(int sd, short args, void *cbdata)
     pmix_setup_caddy_t *cd = (pmix_setup_caddy_t*)cbdata;
     pmix_nspace_t *nptr, *tmp;
     pmix_status_t rc;
-    size_t i, j, size, rank;
+    size_t i, j, size;
+    int rank;
     pmix_kval_t kv;
     char **nodes=NULL, **procs=NULL;
     pmix_buffer_t buf2;
Member

adrianreber commented Sep 11, 2015

It seems all the patches from @ggouaillardet are already in the master branch, and the test cases now start up correctly. The tests are, however, not running correctly:

mpirun -np 4 --mca btl tcp,self --mca coll_sm_priority 100 --  `pwd`/src/MPI_Abort_c
MPITEST info  (0): Starting MPI_Abort test
MPITEST info  (0): This test will abort after printing the results message
MPITEST info  (0): If it does not, then a f.a.i.l.u.r.e will be noted
--------------------------------------------------------------------------
Open MPI failed to bind internal memory to a specific NUMA node.  This
message will only be reported at most once per process.

  Local host: bimini.lisas.de
  PID:        10947
  File:       ../../../../opal/mca/hwloc/base/hwloc_base_maffinity.c:118
  Message:    hwloc_set_area_membind() failure
  Severity:   Warning -- your job will continue, but possibly with degraded performance
--------------------------------------------------------------------------
[bimini.lisas.de:10940] 3 more processes have sent help message help-opal-hwloc-base.txt / mbind failure
[bimini.lisas.de:10940] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

and then it seems to hang forever. If I remove the parameter --mca coll_sm_priority 100 from the intel_tests Makefile, the tests all pass. No idea what that parameter means.

Contributor

ggouaillardet commented Sep 11, 2015

This parameter raises the priority of the collective module that is optimized for shared memory on intra-node communicators.
I do not think this module is production ready, so unless you have a good reason to use it, I recommend you do not change the priority (it is zero by default).
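
(For reference, the current value and default of that priority can be checked with something like ompi_info --all | grep coll_sm_priority; the exact output format depends on the Open MPI version.)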

Member

adrianreber commented Sep 11, 2015

The coll_sm_priority parameter seems to be the default in the intel test suite:

827dcde8 (Jeff Squyres 2011-03-17 13:50:39 +0000 5) MPIRUN = mpirun -np 4 --mca btl tcp,self --mca coll_sm_priority 100

I was just using the defaults. Maybe that needs some fixing too.

Contributor

ggouaillardet commented Sep 11, 2015

@jsquyres is there any reason why we use coll/sm by default?

Owner

jsquyres commented Sep 11, 2015

Is there a reason not to use coll sm?

Contributor

ggouaillardet commented Sep 11, 2015

Not really, except for a memory leak in v1.10 when a communicator is freed.
In ompi, the default priority of coll/sm is zero, and I found it surprising that the default behavior of ompi-tests is to use coll/sm.
So let me put it this way: shouldn't we test by default what end users will run by default?

Owner

jsquyres commented Sep 11, 2015

Maybe the real question is why coll sm has a low priority by default... is there a known issue with it? (there's something in the back of my head that says that there is, but I don't recall what it is offhand...)

Member

nysal commented Sep 22, 2015

@jsquyres Should we close this one, since the original issue seems to be fixed? I can confirm that some basic tests I ran on ppc64 pass. For the sm coll component, maybe we can open another issue?
By the way, does the abort test pass on Intel, and is it just ppc64 that has an issue? Maybe we should collect some performance data for coll sm and compare it to the tuned coll component. If it is much better, it might be worthwhile to spend some effort fixing the outstanding issues in sm.

doko42 commented Feb 12, 2016

Seeing similar issues on powerpc 32-bit: https://bugs.debian.org/814183

Are these fixes applied to the 1.10.x series as well?

Contributor

ggouaillardet commented Feb 12, 2016

The fix was for PMIx, and there is no such thing in the v1.10 series.

Owner

jsquyres commented Feb 12, 2016

@doko42 From that Debian bug report, it's not clear what the error is. Is there a corefile or some other error product that shows what has failed?

Member

nysal commented Feb 14, 2016

@doko42 I haven't tried running powerpc 32-bit in a while. Could you attach gdb to the hung tasks and get a backtrace? Do the simple examples shipped with ompi work?

sbez44 commented Apr 4, 2016

This all looks toppling hot. I got thrown here from a Mac Homebrew MPI download/repair bug! Looking for NetPipe cluster trails of Cray && earlier blue IBM (pre z/OS). Don't think your comments should contain live links, y'know! Those bug block build-up queues are all under pressure, from whatever is coursing and causing this, but I don't know what. Need to step way back!

Owner

jsquyres commented Apr 19, 2016

@adrianreber Is this issue still relevant?

Member

adrianreber commented Apr 20, 2016

@jsquyres I do not see the segfault mentioned in this ticket anymore. All tests of the intel_tests directory are running on all branches on ppc64 without errors. Seems fixed, yes.

Owner

jsquyres commented Apr 20, 2016

@adrianreber Thanks.
@doko42 I didn't hear back from you, so I'm assuming this is no longer an issue for you, either.

So I'm closing this issue.

@jsquyres jsquyres closed this Apr 20, 2016

jsquyres pushed a commit to jsquyres/ompi that referenced this issue Sep 19, 2016

Merge pull request #874 from ggouaillardet/topic/v1.10/missing_include_files

Topic/v1.10/missing include files