PMIX_ERROR when MPI_Comm_spawn in multiple nodes #12601

Open
dariomnz opened this issue Jun 5, 2024 · 14 comments

@dariomnz

dariomnz commented Jun 5, 2024

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

  • ompi_info --version
    Open MPI v5.0.3

https://www.open-mpi.org/community/help/

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.3.tar.gz
tar zxf openmpi-5.0.3.tar.gz
ln   -s openmpi-5.0.3  openmpi

# 4) Install openmpi (from source code)
mkdir -p /home/lab/bin
cd       ${DESTINATION_PATH}/openmpi
./configure --prefix=/home/lab/bin/openmpi
make -j $(nproc) all
make install
Output of ompi_info:
+ ompi_info
                 Package: Open MPI root@buildkitsandbox Distribution
                Open MPI: 5.0.3
  Open MPI repo revision: v5.0.3
   Open MPI release date: Apr 08, 2024
                 MPI API: 3.1.0
            Ident string: 5.0.3
                  Prefix: /home/lab/bin/openmpi
 Configured architecture: x86_64-pc-linux-gnu
           Configured by: root
           Configured on: Fri May 31 08:42:58 UTC 2024
          Configure host: buildkitsandbox
  Configure command line: '--prefix=/home/lab/bin/openmpi'
                Built by: 
                Built on: Fri May 31 08:51:40 UTC 2024
              Built host: buildkitsandbox
              C bindings: yes
             Fort mpif.h: no
            Fort use mpi: no
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: no
 Fort mpi_f08 compliance: The mpi_f08 module was not built
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /bin/gcc
  C compiler family name: GNU
      C compiler version: 11.4.0
            C++ compiler: g++
   C++ compiler absolute: /bin/g++
           Fort compiler: none
       Fort compiler abs: none
         Fort ignore TKR: no
   Fort 08 assumed shape: no
      Fort optional args: no
          Fort INTERFACE: no
    Fort ISO_FORTRAN_ENV: no
       Fort STORAGE_SIZE: no
      Fort BIND(C) (all): no
      Fort ISO_C_BINDING: no
 Fort SUBROUTINE BIND(C): no
       Fort TYPE,BIND(C): no
 Fort T,BIND(C,name="a"): no
            Fort PRIVATE: no
           Fort ABSTRACT: no
       Fort ASYNCHRONOUS: no
          Fort PROCEDURE: no
         Fort USE...ONLY: no
           Fort C_FUNLOC: no
 Fort f08 using wrappers: no
         Fort MPI_SIZEOF: no
             C profiling: yes
   Fort mpif.h profiling: no
  Fort use mpi profiling: no
   Fort use mpi_f08 prof: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, Event lib: yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
          MPI extensions: affinity, cuda, ftmpi, rocm
 Fault Tolerance support: yes
          FT MPI support: yes
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
         MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.0.3)
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.3)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.3)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.3)
                 MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.3)
                 MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.3)
                 MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.3)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.3)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.3)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.3)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.3)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.3)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.0.3)
               MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.3)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v5.0.3)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.3)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.3)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.3)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.3)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.3)
                MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.0.3)
             MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.3)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.0.3)
                 MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.3)
                MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component
                          v5.0.3)
                MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.3)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.3)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.3)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.3)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.0.3)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.3)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.3)
                MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
                          v5.0.3)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.3)
                  MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.0.3)
                  MCA op: avx (MCA v2.1.0, API v1.0.0, Component v5.0.3)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.3)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v5.0.3)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.3)
                MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.3)
                 MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.3)
                 MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component
                          v5.0.3)
                 MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.3)
                 MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.3)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.0.3)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v5.0.3)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.3)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.3)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v5.0.3)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v5.0.3)

Please describe the system on which you are running

  • Operating system/version: Ubuntu 22.04.4 LTS (in Docker containers)
  • Computer hardware: irrelevant
  • Network type: irrelevant

Details of the problem

MPI_Comm_spawn fails with an error when the job spans several nodes; if everything runs within a single node it works perfectly.
Interestingly, communication between the nodes does take place even though the error occurs.

Code spawn.c

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv); 

    MPI_Comm parentcomm, intercomm, intracomm;
    int rank, size, len;
    char proc_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(proc_name, &len);

    MPI_Comm_get_parent(&parentcomm); 
    if (parentcomm == MPI_COMM_NULL) {
        printf("Parent from %s: rank %d out of %d\n", proc_name, rank, size);
        
        for (int i = 0; i < 2; i++)
        {
            // Spawn a new process
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

            // Send a message to the child process
            int message = 42;
            MPI_Bcast(&message, 1, MPI_INT, MPI_ROOT, intercomm);  
            printf("Parent broadcasted message: %d\n", message);
            // Free the communicators
            MPI_Comm_free(&intercomm); 
        }
    }else{
        printf("Childfrom %s: rank %d out of %d\n", proc_name, rank, size);

        // Receive the broadcast message from the parent
        int message;
        MPI_Bcast(&message, 1, MPI_INT, 0, parentcomm);
        printf("Child received broadcasted message: %d\n", message);

        // Free the communicators
        MPI_Comm_free(&parentcomm);
    }

    MPI_Finalize();
    return 0;
}

Good execution on a single node:

+ mpicc -g -o spawn spawn.c
+ mpiexec -n 3 --map-by node:OVERSUBSCRIBE ./spawn
Parent from 2e7630b38c9e: rank 0 out of 3
Parent from 2e7630b38c9e: rank 1 out of 3
Parent from 2e7630b38c9e: rank 2 out of 3
Childfrom 2e7630b38c9e: rank 0 out of 1
Parent broadcasted message: 42
Parent broadcasted message: 42
Parent broadcasted message: 42
Child received broadcasted message: 42
Childfrom 2e7630b38c9e: rank 0 out of 1
Parent broadcasted message: 42
Parent broadcasted message: 42
Parent broadcasted message: 42
Child received broadcasted message: 42

Bad execution across multiple nodes:

+ mpicc -g -o spawn spawn.c
+ mpiexec -n 3 --hostfile /work/machines_mpi --map-by node:OVERSUBSCRIBE ./spawn
Parent from 2e7630b38c9e: rank 0 out of 3
Parent from 74f0e8888de4: rank 1 out of 3
Parent from c1de8f727368: rank 2 out of 3
[2e7630b38c9e:03183] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
Childfrom 2e7630b38c9e: rank 0 out of 1
Parent broadcasted message: 42
Child received broadcasted message: 42
Parent broadcasted message: 42
Parent broadcasted message: 42
[2e7630b38c9e:03183] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
Childfrom 2e7630b38c9e: rank 0 out of 1
[74f0e8888de4][[1087,1],1][btl_tcp_proc.c:400:mca_btl_tcp_proc_create] opal_modex_recv: failed with return value=-46
[c1de8f727368][[1087,1],2][btl_tcp_proc.c:400:mca_btl_tcp_proc_create] opal_modex_recv: failed with return value=-46
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[1087,1],1]) is on host: 74f0e8888de4
  Process 2 ([[1087,3],0]) is on host: unknown
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[74f0e8888de4:00000] *** An error occurred in MPI_Bcast
[74f0e8888de4:00000] *** reported by process [71237633,1]
[74f0e8888de4:00000] *** on communicator 
[74f0e8888de4:00000] *** MPI_ERR_INTERN: internal error
[74f0e8888de4:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[74f0e8888de4:00000] ***    and MPI will try to terminate your MPI job as well)
Child received broadcasted message: 42
Parent broadcasted message: 42
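
For reference, below is a diagnostic variant of the parent-side spawn loop. It is only a sketch (it assumes rank, argv, and intercomm from spawn.c above and is not part of the original reproducer): it switches MPI_COMM_WORLD to MPI_ERRORS_RETURN and checks the per-process error codes from MPI_Comm_spawn instead of passing MPI_ERRCODES_IGNORE, which can show whether the spawn call itself reports a failure.

// Diagnostic sketch: drop-in replacement for the parent-side loop in spawn.c above.
// Return errors instead of aborting so the checks below are reachable.
MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

for (int i = 0; i < 2; i++) {
    int errcodes[1];  // one entry per spawned process (maxprocs == 1 here)
    int rc = MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                            MPI_COMM_WORLD, &intercomm, errcodes);
    if (rc != MPI_SUCCESS || errcodes[0] != MPI_SUCCESS) {
        char errstr[MPI_MAX_ERROR_STRING];
        int errlen;
        MPI_Error_string(rc != MPI_SUCCESS ? rc : errcodes[0], errstr, &errlen);
        fprintf(stderr, "spawn %d failed: %s\n", i, errstr);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    // In an intercommunicator broadcast only one process in the root group
    // passes MPI_ROOT; the remaining parents pass MPI_PROC_NULL.
    int message = 42;
    MPI_Bcast(&message, 1, MPI_INT, (rank == 0) ? MPI_ROOT : MPI_PROC_NULL, intercomm);

    // MPI_Comm_disconnect is the usual teardown for spawned intercommunicators.
    MPI_Comm_disconnect(&intercomm);
}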
@hppritcha
Member

Could you set this env variable in the shell where the parent process is started?

export PMIX_MCA_gds=hash

and rerun to see if the problem persists?

@hppritcha hppritcha self-assigned this Jun 7, 2024
@rhc54
Contributor

rhc54 commented Jun 7, 2024

I assume you used the PMIx that was included in OMPI v5.0.3? If so, then the above envar is unlikely to do any good. The bug it addressed is in the PMIx v5 series, and OMPI v5.0.3 uses PMIx v4.

Looking at the error output, the problem lies in the RTE's handling of the PMIx_Connect operation that is used in the "connect/accept" portion of comm_spawn. We know there are issues in that area - probably fixed in later versions, but not in releases yet. I'm unaware of any workaround short of updating, and have no concrete advice there.

That said, I know we can successfully comm_spawn across multiple nodes because I regularly do so. However, none of my codes follow your pattern, so I cannot say why your code fails.
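
For context, the "connect/accept" machinery referred to here is the same one exposed at the MPI level by MPI_Comm_accept / MPI_Comm_connect. A minimal standalone sketch of that pattern (purely illustrative, with the port name passed to the client on the command line, and unrelated to the failing code path itself) looks like this:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Minimal connect/accept sketch: run one copy as the server (no argument),
 * then run a second copy as the client, passing the printed port name. */
int main(int argc, char *argv[]) {
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);

    if (argc < 2) {
        /* Server side: open a port and wait for one client to connect. */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("server port: %s\n", port);
        fflush(stdout);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
        MPI_Close_port(port);
    } else {
        /* Client side: connect to the port name given on the command line. */
        strncpy(port, argv[1], MPI_MAX_PORT_NAME - 1);
        port[MPI_MAX_PORT_NAME - 1] = '\0';
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
    }

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}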

@dariomnz
Author

Could you set this env variable in the shell where the parent process is started?

export PMIX_MCA_gds=hash

and rerun to see if the problem persists?

I tested it and the problem persists:

If the answer is that it does not work in current versions and will be fixed in a future release, that is also a valid answer for me.

+ export PMIX_MCA_gds=hash
+ PMIX_MCA_gds=hash
+ mpiexec -n 3 --hostfile /work/machines_mpi --map-by node:OVERSUBSCRIBE ./spawn
Warning: Permanently added '172.18.0.3' (ED25519) to the list of known hosts.

Warning: Permanently added '172.18.0.4' (ED25519) to the list of known hosts.

Parent from e6d86d2ea3a1: rank 0 out of 3
Parent from 1e80d68b9a8e: rank 2 out of 3
Parent from eba2089b8c2f: rank 1 out of 3
[e6d86d2ea3a1:00072] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
[e6d86d2ea3a1:00072] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
[e6d86d2ea3a1:00072] PMIX ERROR: PMIX_ERR_OUT_OF_RESOURCE in file base/bfrop_base_unpack.c at line 1843
Childfrom e6d86d2ea3a1: rank 0 out of 1
Parent broadcasted message: 42
Child received broadcasted message: 42
Parent broadcasted message: 42
Parent broadcasted message: 42
[e6d86d2ea3a1:00072] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
[e6d86d2ea3a1:00072] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
[e6d86d2ea3a1:00072] PMIX ERROR: PMIX_ERR_OUT_OF_RESOURCE in file base/bfrop_base_unpack.c at line 1843
Childfrom e6d86d2ea3a1: rank 0 out of 1
[1e80d68b9a8e][[41639,1],2][btl_tcp_proc.c:400:mca_btl_tcp_proc_create] opal_modex_recv: failed with return value=-46
[eba2089b8c2f][[41639,1],1][btl_tcp_proc.c:400:mca_btl_tcp_proc_create] opal_modex_recv: failed with return value=-46
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[41639,1],1]) is on host: eba2089b8c2f
  Process 2 ([[41639,3],0]) is on host: unknown
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
+ mpiexec -n 3 --hostfile /work/machines_mpi -x PMIX_MCA_gds=hash --map-by node:OVERSUBSCRIBE ./spawn
Warning: Permanently added '172.18.0.4' (ED25519) to the list of known hosts.

Warning: Permanently added '172.18.0.3' (ED25519) to the list of known hosts.

Parent from 0f2fad4c2330: rank 0 out of 3
Parent from 9116950a090e: rank 2 out of 3
Parent from cd50ec8d6901: rank 1 out of 3
[0f2fad4c2330:00072] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
[0f2fad4c2330:00072] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
[0f2fad4c2330:00072] PMIX ERROR: PMIX_ERR_OUT_OF_RESOURCE in file base/bfrop_base_unpack.c at line 1843
Childfrom 0f2fad4c2330: rank 0 out of 1
Parent broadcasted message: 42
Child received broadcasted message: 42
Parent broadcasted message: 42
Parent broadcasted message: 42
[0f2fad4c2330:00072] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
[0f2fad4c2330:00072] PMIX ERROR: PMIX_ERROR in file prted/pmix/pmix_server_dyn.c at line 1041
[0f2fad4c2330:00072] PMIX ERROR: PMIX_ERR_OUT_OF_RESOURCE in file base/bfrop_base_unpack.c at line 1843
Childfrom 0f2fad4c2330: rank 0 out of 1
[cd50ec8d6901][[11691,1],1][btl_tcp_proc.c:400:mca_btl_tcp_proc_create] [9116950a090e][[11691,1],2][btl_tcp_proc.c:400:mca_btl_tcp_proc_create] opal_modex_recv: failed with return value=-46
opal_modex_recv: failed with return value=-46
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[11691,1],2]) is on host: 9116950a090e
  Process 2 ([[11691,3],0]) is on host: unknown
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------

@dariomnz
Author

Maybe it is the same problem as in this issue: #12599

@rhc54
Contributor

rhc54 commented Jun 10, 2024

As I said, it is a known problem and has probably been fixed, but I have no advice on when that will appear in a release.

@rhc54
Contributor

rhc54 commented Jun 10, 2024

Maybe it is the same problem as in this issue: #12599

No, that is an entirely different issue.

@hppritcha
Member

Using Open MPI main with PMIx at e32e0179 and PRRTE at d02ad07c3d, I don't observe this behavior on 3 nodes of a Slurm-managed cluster. If I use the Open MPI internal pmix/prrte submodules, the test case hangs when using multiple nodes.

@hppritcha
Member

Well, I slightly amend my comment. It seems that if UCX is involved in any way, Open MPI main with the embedded openpmix/prrte hangs. If I configure Open MPI with --with-ucx=no, then using main and 3 nodes the above test works nominally. I'm adding the 5.0.x label to this, as it seems the problem is specific to that branch.

@hppritcha
Member

Could you try the 5.0.x nightly tarball? See https://www.open-mpi.org/nightly/v5.0.x/
I'm noticing that with the 5.0.3 release I get a hang with your test, but with the current head of 5.0.x I'm not seeing this hang behavior.

@dariomnz
Author

I launch the test in Docker, simulating the nodes with containers. I installed the version you suggested and I get the same error trace as above.
This is how I installed it:

wget https://download.open-mpi.org/nightly/open-mpi/v5.0.x/openmpi-v5.0.x-202406110241-2a43602.tar.gz
tar zxf openmpi-v5.0.x-202406110241-2a43602.tar.gz
ln   -s openmpi-v5.0.x-202406110241-2a43602  openmpi
mkdir -p /home/lab/bin
cd       ${DESTINATION_PATH}/openmpi
./configure --prefix=/home/lab/bin/openmpi
make -j $(nproc) all
make install
Output of ompi_info:
+ ompi_info
                 Package: Open MPI root@buildkitsandbox Distribution
                Open MPI: 5.0.4a1
  Open MPI repo revision: v5.0.3-56-g2a436023eb
   Open MPI release date: Unreleased developer copy
                 MPI API: 3.1.0
            Ident string: 5.0.4a1
                  Prefix: /home/lab/bin/openmpi
 Configured architecture: x86_64-pc-linux-gnu
           Configured by: root
           Configured on: Tue Jun 11 08:51:34 UTC 2024
          Configure host: buildkitsandbox
  Configure command line: '--prefix=/home/lab/bin/openmpi'
                Built by: 
                Built on: Tue Jun 11 09:02:45 UTC 2024
              Built host: buildkitsandbox
              C bindings: yes
             Fort mpif.h: no
            Fort use mpi: no
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: no
 Fort mpi_f08 compliance: The mpi_f08 module was not built
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /bin/gcc
  C compiler family name: GNU
      C compiler version: 11.4.0
            C++ compiler: g++
   C++ compiler absolute: /bin/g++
           Fort compiler: none
       Fort compiler abs: none
         Fort ignore TKR: no
   Fort 08 assumed shape: no
      Fort optional args: no
          Fort INTERFACE: no
    Fort ISO_FORTRAN_ENV: no
       Fort STORAGE_SIZE: no
      Fort BIND(C) (all): no
      Fort ISO_C_BINDING: no
 Fort SUBROUTINE BIND(C): no
       Fort TYPE,BIND(C): no
 Fort T,BIND(C,name="a"): no
            Fort PRIVATE: no
           Fort ABSTRACT: no
       Fort ASYNCHRONOUS: no
          Fort PROCEDURE: no
         Fort USE...ONLY: no
           Fort C_FUNLOC: no
 Fort f08 using wrappers: no
         Fort MPI_SIZEOF: no
             C profiling: yes
   Fort mpif.h profiling: no
  Fort use mpi profiling: no
   Fort use mpi_f08 prof: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, Event lib: yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
          MPI extensions: affinity, cuda, ftmpi, rocm
 Fault Tolerance support: yes
          FT MPI support: yes
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
         MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.0.4)
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.4)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.4)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.4)
                 MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.4)
                 MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.4)
                 MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.4)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.4)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.4)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.4)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.4)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.4)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.0.4)
               MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.4)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v5.0.4)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.4)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.4)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.4)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.4)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.4)
                MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.0.4)
             MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.4)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.0.4)
                 MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.4)
                MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component
                          v5.0.4)
                MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.4)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.4)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.4)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.4)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.0.4)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.4)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.4)
                MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
                          v5.0.4)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.4)
                  MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.0.4)
                  MCA op: avx (MCA v2.1.0, API v1.0.0, Component v5.0.4)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.4)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v5.0.4)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.4)
                MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.4)
                 MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.4)
                 MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component
                          v5.0.4)
                 MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.4)
                 MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.4)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.0.4)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v5.0.4)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.4)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.4)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v5.0.4)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v5.0.4)

@hppritcha
Member

Okay, let's try one more thing. Could you try our nightly main tarball? I'm beginning to think that you are hitting a different problem on your system that I'm not able to duplicate.


It looks like this issue is expecting a response, but hasn't gotten one yet. If there are no responses in the next 2 weeks, we'll assume that the issue has been abandoned and will close it.

@github-actions github-actions bot added the Stale label Jun 26, 2024
@janjust
Contributor

janjust commented Jul 3, 2024

@dariomnz Are you able to test the nightly main tarball as per @hppritcha's suggestion? I would really hate for this issue to be auto-closed.


It looks like this issue is expecting a response, but hasn't gotten one yet. If there are no responses in the next 2 weeks, we'll assume that the issue has been abandoned and will close it.

@github-actions github-actions bot added the Stale label Jul 17, 2024