v2.x: MPI_Comm_spawn() fails with MPI tasks #3080

@ggouaillardet

@hppritcha @jsquyres @rhc54
This is a blocker that is likely solved by the simple patch below:

diff --git a/orte/mca/plm/base/plm_base_launch_support.c b/orte/mca/plm/base/plm_base_launch_support.c
index eeec61f..8ed2948 100644
--- a/orte/mca/plm/base/plm_base_launch_support.c
+++ b/orte/mca/plm/base/plm_base_launch_support.c
@@ -625,9 +630,9 @@ void orte_plm_base_post_launch(int fd, short args, void *cbdata)
      * it won't register and we need to send the response now.
      * Otherwise, it is an MPI job and we should wait for it
      * to register */
-    if (orte_get_attribute(&jdata->attributes, ORTE_JOB_NON_ORTE_JOB, NULL, OPAL_BOOL)) {
+    if (!orte_get_attribute(&jdata->attributes, ORTE_JOB_NON_ORTE_JOB, NULL, OPAL_BOOL)) {
         OPAL_OUTPUT_VERBOSE((5, orte_plm_base_framework.framework_output,
-                             "%s plm:base:launch job %s is not MPI",
+                             "%s plm:base:launch job %s is MPI",
                              ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
                              ORTE_JOBID_PRINT(jdata->jobid))); 
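
The rule stated in the comment inside the hunk (a job that will not register needs its response sent right away, while an MPI job should be waited on until it registers) can be sketched in isolation. This is only an illustration under stated assumptions: `spawn_action_t`, `post_launch_action()` and the `is_non_orte_job` flag are hypothetical names that are not part of ORTE, and the flag merely stands in for the result of looking up `ORTE_JOB_NON_ORTE_JOB`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the post-launch decision; this is not ORTE code. */
typedef enum {
    SPAWN_RESPOND_NOW,       /* job will never register: answer the spawner immediately */
    SPAWN_WAIT_FOR_REGISTER  /* MPI job: answer only after its processes register */
} spawn_action_t;

/*
 * is_non_orte_job is an assumed boolean mirroring what the
 * ORTE_JOB_NON_ORTE_JOB attribute lookup would report for the job.
 */
spawn_action_t post_launch_action(bool is_non_orte_job)
{
    /* Non-MPI jobs cannot register, so waiting on them would never finish. */
    return is_non_orte_job ? SPAWN_RESPOND_NOW : SPAWN_WAIT_FOR_REGISTER;
}
```

Under this model, testing the attribute with the wrong polarity would make the launcher wait on jobs that never register, or answer before an MPI job has registered. Either could plausibly show up as a hang or as an intermittent, confusing error.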

master is OK and v2.0.x is fine (though its verbose message is incorrect), but v2.x mostly hangs (and once in a while works, with a confusing error message).

The PR will come shortly. Meanwhile, I guess we do not want to release v2.1.0 or a new rc tarball.
