OMPI v5.0.0RC6 with ORTE fails to compile when built against PMIx 3.2.3 #10341

@robert-mijakovic

Background information

What version of Open MPI are you using?

  • OpenMPI v5.0.0RC6 with GCC 10.3.0.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

  • The package was installed from the distribution tarball of v5.0.0RC6.

Please describe the system on which you are running

  • Operating system/version: Rocky Linux 8.5
  • Computer hardware: BullSequana XH2000, 2xAMD EPYC 7H12 64C 2.6GHz
  • Network type: Mellanox HDR200 InfiniBand/ParTec ParaStation ClusterSuite

Details of the problem

Dependencies:

  • Compiler: GCC 10.3.0
  • hwloc 2.7.1
  • libevent 2.1.12
  • UCX 1.12.1
  • libfabric 1.14.0
  • PMIx 3.2.3
  • HCOLL 4.7.3202
  • xpmem 2.6.5-36
  • knem 1.1.4

OpenMPI 5.0.0RC6 with ORTE fails to compile when built against PMIx 3.2.3.
We chose PMIx 3.2.3 because our SLURM installation is built against that version.
Since PRRTE is only available with PMIx 4.x, I had to disable it and enable ORTE instead.
However, as the build output below shows, compilation fails because definitions from PMIx 4.x seem to be required.

The build is configured using the following options:

$ ./configure --prefix=/apps/USE/easybuild/staging/2021.1/software/OpenMPI/5.0.0-GCC-10.3.0 \
    --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \
    --without-prrte \
    --with-libevent-libdir=$EBROOTLIBEVENT/lib \
    --with-pmix-libdir=$EBROOTPMIX/lib \
    --with-hwloc-libdir=$EBROOTHWLOC/lib \
    --with-ofi-libdir=$EBROOTOFI/lib \
    --with-ucx-libdir=$EBROOTUCX/lib \
    --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default \
    --enable-shared --with-cuda=no

Running make then fails with the following output:

  CC       instance/instance.lo
In file included from /apps/USE/easybuild/staging/2021.1/software/PMIx/3.2.3-GCCcore-10.3.0/include/pmix.h:52,
                 from ../opal/mca/pmix/pmix-internal.h:47,
                 from ../opal/mca/pmix/base/base.h:23,
                 from communicator/comm_cid.c:39:
communicator/comm_cid.c: In function ompi_comm_ext_cid_new_block:
communicator/comm_cid.c:339:28: error: PMIX_GROUP_ASSIGN_CONTEXT_ID undeclared (first use in this function)
  339 |     PMIX_INFO_LOAD(&pinfo, PMIX_GROUP_ASSIGN_CONTEXT_ID, NULL, PMIX_BOOL);
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/apps/USE/easybuild/staging/2021.1/software/PMIx/3.2.3-GCCcore-10.3.0/include/pmix_common.h:1566:22: note: in definition of macro PMIX_INFO_LOAD
 1566 |         if (NULL != (k)) {                                  \
      |                      ^
communicator/comm_cid.c:339:28: note: each undeclared identifier is reported only once for each function it appears in
  339 |     PMIX_INFO_LOAD(&pinfo, PMIX_GROUP_ASSIGN_CONTEXT_ID, NULL, PMIX_BOOL);
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/apps/USE/easybuild/staging/2021.1/software/PMIx/3.2.3-GCCcore-10.3.0/include/pmix_common.h:1566:22: note: in definition of macro PMIX_INFO_LOAD
 1566 |         if (NULL != (k)) {                                  \
      |                      ^
communicator/comm_cid.c:346:10: warning: implicit declaration of function PMIx_Group_construct [-Wimplicit-function-declaration]
  346 |     rc = PMIx_Group_construct(tag, procs, proc_count, &pinfo, 1, &results, &nresults);
      |          ^~~~~~~~~~~~~~~~~~~~
communicator/comm_cid.c:357:10: warning: implicit declaration of function PMIx_Group_destruct [-Wimplicit-function-declaration]
  357 |     rc = PMIx_Group_destruct (tag, NULL, 0);
      |          ^~~~~~~~~~~~~~~~~~~
make[2]: *** [Makefile:2595: communicator/comm_cid.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
instance/instance.c: In function ompi_mpi_instance_init_common:
instance/instance.c:424:55: error: PMIX_ERR_LOST_CONNECTION undeclared (first use in this function); did you mean PMIX_ERR_LOST_PEER_CONNECTION?
  424 |     pmix_status_t codes[2] = { PMIX_ERR_PROC_ABORTED, PMIX_ERR_LOST_CONNECTION };
      |                                                       ^~~~~~~~~~~~~~~~~~~~~~~~
      |                                                       PMIX_ERR_LOST_PEER_CONNECTION
instance/instance.c:424:55: note: each undeclared identifier is reported only once for each function it appears in
runtime/ompi_rte.c: In function ompi_rte_breakpoint:
runtime/ompi_rte.c:1071:20: error: PMIX_DEBUGGER_RELEASE undeclared (first use in this function); did you mean PMIX_ERR_DEBUGGER_RELEASE?
 1071 |     int rc, code = PMIX_DEBUGGER_RELEASE;
      |                    ^~~~~~~~~~~~~~~~~~~~~
      |                    PMIX_ERR_DEBUGGER_RELEASE
runtime/ompi_rte.c:1071:20: note: each undeclared identifier is reported only once for each function it appears in
runtime/ompi_rte.c:1094:10: warning: implicit declaration of function PMIX_CHECK_RANK; did you mean PMIX_PROC_RANK? [-Wimplicit-function-declaration]
 1094 |     if (!PMIX_CHECK_RANK(u32, opal_process_info.myprocid.rank)) {
      |          ^~~~~~~~~~~~~~~
      |          PMIX_PROC_RANK
instance/instance.c: In function ompi_instance_get_num_psets_complete:
instance/instance.c:949:37: error: PMIX_QUERY_NUM_PSETS undeclared (first use in this function); did you mean PMIX_QUERY_NAMESPACES?
  949 |         if (0 == strcmp(info[n].key,PMIX_QUERY_NUM_PSETS)) {
      |                                     ^~~~~~~~~~~~~~~~~~~~
      |                                     PMIX_QUERY_NAMESPACES
In file included from /apps/USE/easybuild/staging/2021.1/software/PMIx/3.2.3-GCCcore-10.3.0/include/pmix.h:52,
                 from ../opal/mca/pmix/pmix-internal.h:47,
                 from ../opal/util/proc.h:26,
                 from runtime/ompi_rte.c:47:
runtime/ompi_rte.c:1106:30: error: PMIX_BREAKPOINT undeclared (first use in this function)
 1106 |     PMIX_INFO_LOAD(&info[1], PMIX_BREAKPOINT, "mpi-init", PMIX_STRING);
      |                              ^~~~~~~~~~~~~~~
/apps/USE/easybuild/staging/2021.1/software/PMIx/3.2.3-GCCcore-10.3.0/include/pmix_common.h:1566:22: note: in definition of macro PMIX_INFO_LOAD
 1566 |         if (NULL != (k)) {                                  \
      |                      ^
instance/instance.c:964:46: error: PMIX_QUERY_PSET_NAMES undeclared (first use in this function); did you mean PMIX_QUERY_CREATE?
  964 |         } else if (0 == strcmp (info[n].key, PMIX_QUERY_PSET_NAMES)) {
      |                                              ^~~~~~~~~~~~~~~~~~~~~
      |                                              PMIX_QUERY_CREATE
runtime/ompi_rte.c:1107:23: error: PMIX_READY_FOR_DEBUG undeclared (first use in this function)
 1107 |     PMIx_Notify_event(PMIX_READY_FOR_DEBUG,
      |                       ^~~~~~~~~~~~~~~~~~~~
instance/instance.c: In function ompi_instance_get_num_psets:
instance/instance.c:1032:39: error: PMIX_QUERY_NUM_PSETS undeclared (first use in this function); did you mean PMIX_QUERY_NAMESPACES?
 1032 |     ompi_instance_refresh_pmix_psets (PMIX_QUERY_NUM_PSETS);
      |                                       ^~~~~~~~~~~~~~~~~~~~
      |                                       PMIX_QUERY_NAMESPACES
make[2]: *** [Makefile:2595: runtime/ompi_rte.lo] Error 1
instance/instance.c: In function ompi_instance_get_nth_pset:
instance/instance.c:1041:43: error: PMIX_QUERY_PSET_NAMES undeclared (first use in this function); did you mean PMIX_QUERY_CREATE?
 1041 |         ompi_instance_refresh_pmix_psets (PMIX_QUERY_PSET_NAMES);
      |                                           ^~~~~~~~~~~~~~~~~~~~~
      |                                           PMIX_QUERY_CREATE
instance/instance.c: In function ompi_instance_group_pmix_pset:
instance/instance.c:1196:27: error: PMIX_PSET_NAME undeclared (first use in this function); did you mean PMIX_RM_NAME?
 1196 |         rc = PMIx_Get(&p, PMIX_PSET_NAME, NULL, 0, &pval);
      |                           ^~~~~~~~~~~~~~
      |                           PMIX_RM_NAME
make[2]: *** [Makefile:2595: instance/instance.lo] Error 1
make[2]: Leaving directory '/dev/shm/OpenMPI/5.0.0/GCC-10.3.0/openmpi-5.0.0rc6/ompi'
make[1]: *** [Makefile:2702: all-recursive] Error 1
make[1]: Leaving directory '/dev/shm/OpenMPI/5.0.0/GCC-10.3.0/openmpi-5.0.0rc6/ompi'
make: *** [Makefile:1484: all-recursive] Error 1
