
openmpi 4.1.5 won't spawn ([asustx:2550375] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY) #11749

Closed
bchareyre opened this issue Jun 9, 2023 · 13 comments


@bchareyre

Hello,
An issue was revealed by the regression tests of yade-dem on Arch Linux [1].
The tests pass with 4.1.4 and earlier; they fail with 4.1.5.
Note: the test involves Python bindings via mpi4py.

[1] https://gitlab.com/yade-dev/trunk/-/issues/309

(openmpi-4.1.4 was built with extra-x86_64-build -r /tmp/arch-chroot, using the PKGBUILD from the official Arch Linux repo.)

openmpi-4.1.5 is from the official Arch Linux repo.
python 3.11.3-1
python-mpi4py 3.1.4-3

  • Operating system/version: archlinux
  • Computer hardware: not known atm
  • Network type: no network involved, it is tested on a standalone workstation

Details of the problem

Expected behavior with the script below (obtained with openmpi-4.1.4):

➜  ~ python3 script.py
Hello, I am the MPI process with rank 0
Hello, I am the MPI process with rank 1
Hello, I am the MPI process with rank 2
Hello, I am the MPI process with rank 3

With openmpi-4.1.5 the script does not terminate, and the output is:

➜  ~ python3 script.py
[asustx:2550375] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY 
[asustx:2550375] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY 

The (Python) test script:

from mpi4py import MPI
import sys

world = MPI.COMM_WORLD

if world.Get_parent() == MPI.COMM_NULL:
    # rank 0 executing: spawn workers and make them execute this same script
    comm = world.Spawn("python3", args=[sys.argv[0]], maxprocs=3).Merge()
else:
    # a spawned process is executing the script
    comm = world.Get_parent().Merge()

rank = comm.Get_rank()
print("Hello, I am the MPI process with rank", rank)
comm.barrier()
comm.Free()
quit()
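
(Note on the reproduction: the script is launched as a singleton, plain python3 script.py with no mpirun; rank 0 then spawns three more python3 processes running this same file, so the merged communicator has the four ranks shown in the expected output above.)
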
@bchareyre bchareyre changed the title openmpi 4.1.5 wont's spawn ([asustx:2550375] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY) openmpi 4.1.5 won't spawn ([asustx:2550375] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY) Jun 9, 2023
@jsquyres
Member

jsquyres commented Jun 12, 2023

What version of OpenPMIx are you using?

If you are using OpenPMIx v4.2.3, this is possibly a dup of #11729.
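
For reference, one way to check which PMIx an Open MPI build is using, assuming ompi_info is on the PATH (and pmix_info, which an external OpenPMIx install provides); a sketch, other ways exist:

➜  ~ ompi_info | grep -i pmix   # lists the pmix MCA component(s) in this build
➜  ~ pmix_info                  # prints version/build info for an external OpenPMIx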

@sukanka

sukanka commented Jun 13, 2023

> What version of OpenPMIx are you using?
> If you are using OpenPMIx v4.2.3, this is possibly a dup of #11729.

Thanks for your comment. I'm using OpenPMIx v4.2.3. I'll try the patch from that issue and report back here.

@sukanka

sukanka commented Jun 13, 2023

> What version of OpenPMIx are you using?
> If you are using OpenPMIx v4.2.3, this is possibly a dup of #11729.

This is not a dup of #11729. I rebuilt openmpi with the patch, but the problem is still there.

I then checked the latest PKGBUILD (updated 2 weeks ago) in the official Arch Linux repository and found it already includes that patch (#11472).

@jsquyres jsquyres added this to the v4.1.6 milestone Jun 13, 2023
@jsquyres
Member

@sukanka Thanks for the confirmation. Are you working with @bchareyre, or is your report independent of theirs?

I did some git bisecting, and it looks like the fix from #11472 is what actually broke spawn (with both the bundled version of PMIx and v4.2.3). ☹️ Put differently: we fixed the 4.2.3 problem, but we broke spawn.

We need to investigate further to figure out how we broke it / how to fix it.

@sukanka

sukanka commented Jun 14, 2023

> @sukanka Thanks for the confirmation. Are you working with @bchareyre, or is your report independent of theirs?

Yeah, I work with @bchareyre. We have been discussing this for a few days. (You can have a look at the URL posted in the very first comment.)

@ggouaillardet
Contributor

Please give the inline patch below a try.

FWIW, I am not convinced this is the most elegant fix (since we convert PMIX_PERSIST to OPAL_UINT8, as before), but maybe that is good enough to cover the needs of the v4 series.

--- orig/openmpi-4.1.5/opal/mca/pmix/pmix3x/pmix3x.c	2023-06-14 11:57:42.000000000 +0900
+++ openmpi-4.1.5/opal/mca/pmix/pmix3x/pmix3x.c 2023-06-14 12:00:23.116704037 +0900
@@ -843,10 +843,6 @@
                 i->value.data.bo.size = 0;
             }
             break;
-        case OPAL_PERSIST:
-            i->value.type = PMIX_PERSIST;
-            i->value.data.persist = pmix3x_convert_opalpersist((opal_pmix_persistence_t)kv->data.uint8);
-            break;
         case OPAL_SCOPE:
             i->value.type = PMIX_SCOPE;
             i->value.data.scope = pmix3x_convert_opalscope((opal_pmix_scope_t)kv->data.uint8);

@sukanka

sukanka commented Jun 14, 2023

> Please give the inline patch below a try.
> [first inline patch quoted in full above]

Thanks for your reply, but this does not seem to work. I rebuilt openmpi with the patch from #11472 plus the one from your comment, then re-ran the demo script, and the output is:

➜  openmpi python3 script.py
[asustx:3025739] PACK-OPAL-VALUE: UNSUPPORTED TYPE 0 FOR KEY 
[asustx:3025739] [[3101,0],0] ORTE_ERROR_LOG: Error in file orted/pmix/pmix_server_pub.c at line 312
Traceback (most recent call last):
  File "/tmp/RustDesk/openmpi/script.py", line 6, in <module>
    comm = world.Spawn("python3", args=[sys.argv[0]], maxprocs=3).Merge()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "mpi4py/MPI/Comm.pyx", line 1931, in mpi4py.MPI.Intracomm.Spawn
mpi4py.MPI.Exception: MPI_ERR_OTHER: known error not in list
➜  openmpi [asustx:3025739] PACK-OPAL-VALUE: UNSUPPORTED TYPE 0 FOR KEY 
[asustx:3025739] [[3101,0],0] ORTE_ERROR_LOG: Error in file orted/pmix/pmix_server_pub.c at line 312

Note: I ran it only once, but the output looks as if the script ran twice. When I press the Enter key, I can then run new commands from the same terminal.

@ggouaillardet
Contributor

My bad, it seems I messed up my test environment.

What about this one?

It works for me if I build from source and manually apply this patch on top of the existing one.
For some reason, I cannot get it to work when I add this patch to the latest PKGBUILD, but I guess I missed a step since I am not really familiar with Arch Linux.

--- orig/openmpi-4.1.5/opal/mca/pmix/pmix3x/pmix3x.c	2023-06-14 21:45:40.159479390 +0900
+++ openmpi-4.1.5/opal/mca/pmix/pmix3x/pmix3x.c	2023-06-14 21:48:02.469473048 +0900
@@ -807,14 +807,17 @@
             PMIX_INFO_LOAD(i, kv->key, &kv->data.time, PMIX_TIME);
             break;
         case OPAL_STATUS:
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_STATUS;
             i->value.data.status = pmix3x_convert_opalrc(kv->data.status);
             break;
         case OPAL_VPID:
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_PROC_RANK;
             i->value.data.rank = pmix3x_convert_opalrank(kv->data.name.vpid);
             break;
         case OPAL_NAME:
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_PROC;
             /* have to stringify the jobid */
             PMIX_PROC_CREATE(i->value.data.proc, 1);
@@ -833,6 +836,7 @@
             i->value.data.proc->rank = pmix3x_convert_opalrank(kv->data.name.vpid);
             break;
         case OPAL_BYTE_OBJECT:
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_BYTE_OBJECT;
             if (NULL != kv->data.bo.bytes) {
                 i->value.data.bo.bytes = (char*)malloc(kv->data.bo.size);
@@ -844,18 +848,22 @@
             }
             break;
         case OPAL_PERSIST:
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_PERSIST;
             i->value.data.persist = pmix3x_convert_opalpersist((opal_pmix_persistence_t)kv->data.uint8);
             break;
         case OPAL_SCOPE:
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_SCOPE;
             i->value.data.scope = pmix3x_convert_opalscope((opal_pmix_scope_t)kv->data.uint8);
             break;
         case OPAL_DATA_RANGE:
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_DATA_RANGE;
             i->value.data.range = pmix3x_convert_opalrange((opal_pmix_data_range_t)kv->data.uint8);
             break;
         case OPAL_PROC_STATE:
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_PROC_STATE;
             /* the OPAL layer doesn't have any concept of proc state,
              * so the ORTE layer is responsible for converting it */
@@ -873,6 +881,7 @@
              * opal_value_t's that we need to convert to a pmix_data_array
              * of pmix_info_t structures */
             list = (opal_list_t*)kv->data.ptr;
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_DATA_ARRAY;
             i->value.data.darray = (pmix_data_array_t*)malloc(sizeof(pmix_data_array_t));
             i->value.data.darray->type = PMIX_INFO;
@@ -893,6 +902,7 @@
             }
             break;
         case OPAL_PROC_INFO:
+            PMIX_LOAD_KEY(i->key, kv->key);
             i->value.type = PMIX_PROC_INFO;
             PMIX_PROC_INFO_CREATE(i->value.data.pinfo, 1);
             /* see if this job is in our list of known nspaces */
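
For context on why adding PMIX_LOAD_KEY helps: a pmix_info_t pairs a fixed-size key string with a typed value, and PMIX_INFO_LOAD fills in both at once, while the case branches above assign i->value directly, so the key has to be copied explicitly or it stays empty; presumably that is the blank key in the "UNSUPPORTED TYPE 33 FOR KEY " messages. A minimal sketch of the two-step pattern the patch applies, assuming the definitions from PMIx's pmix_common.h (load_scope_info is a hypothetical helper, not part of the patch):

#include <pmix_common.h>   /* pmix_info_t, pmix_scope_t, PMIX_LOAD_KEY, PMIX_SCOPE */

/* Hypothetical helper mirroring the patched OPAL_SCOPE branch: copy the key
 * first, then fill in the typed value by hand. Without the PMIX_LOAD_KEY
 * step, i->key is left empty and the receiver cannot match the value. */
static void load_scope_info(pmix_info_t *i, const char *key, pmix_scope_t scope)
{
    PMIX_LOAD_KEY(i->key, key);   /* bounded copy into the fixed-size key field */
    i->value.type = PMIX_SCOPE;
    i->value.data.scope = scope;
}
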

@jsquyres
Member

jsquyres commented Jun 14, 2023

@ggouaillardet Works for me! Here's my test program in C:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(NULL, NULL);

    MPI_Comm comm;
    MPI_Comm parent, child;
    MPI_Comm_get_parent(&parent);
    if (parent == MPI_COMM_NULL) {
        /* original launch: spawn 3 workers running this same binary */
        int errors[3];
        MPI_Comm_spawn(argv[0], NULL, 3, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &child, errors);
        MPI_Intercomm_merge(child, 0, &comm);
        MPI_Comm_free(&child);
    } else {
        /* spawned worker: merge with the parent communicator */
        MPI_Intercomm_merge(parent, 1, &comm);
    }

    int rank;
    MPI_Comm_rank(comm, &rank);
    printf("Hello, I am rank %d in the merged comm\n", rank);
    MPI_Barrier(comm);
    MPI_Comm_free(&comm);
            
    MPI_Finalize();
    return 0;
}
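
For reference, a minimal way to build and run this reproducer, assuming it is saved as spawn_test.c (the file name is illustrative) and launched as a singleton like the Python script:

➜  ~ mpicc spawn_test.c -o spawn_test
➜  ~ ./spawn_test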

@bchareyre @sukanka If this patch works for you, I'll make a PR and get it merged into the v4.1.x branch.

@jsquyres jsquyres added the bug label Jun 14, 2023
@sukanka

sukanka commented Jun 15, 2023

> [ggouaillardet's second patch quoted in full above]

Thanks! This patch works!

@bchareyre
Author

Awesome! Thanks @ggouaillardet for the fix, and thank you @sukanka for following up.
@jsquyres, good to see the C version of the test (plus freeing the child comm). I was also expecting we would need to take mpi4py out of the picture; you translated it faster than I did.

@jsquyres
Member

Fixed in #11767. Will be included in Open MPI v4.1.6.

@sukanka

sukanka commented Jul 3, 2024

> @ggouaillardet Works for me! Here's my test program in C:
> [C test program quoted in full above]

This MWE is broken again with openmpi 5.0.3-1 (the package from the official Arch Linux repo).

The compiled program prints the expected messages, but it never terminates; it just gets stuck.

[demo recording attached]
