Skip to content

Conversation

@sam6258
Copy link
Contributor

@sam6258 sam6258 commented Feb 28, 2018

… for ORTE_APP_PREFIX_DIR

When reconstructing PATH and LD_LIBRARY_PATH, with the value from the ORTE_APP_PREFIX_DIR parameter, we are not correctly picking out the prefix dir on multiple app contexts (should only use the first app context to retrieve the prefix dir). This can be seen in an MPMD style launch. The second application will receive NULL == param and skip over the loop to appropriately set PATH and LD_LIBRARY_PATH. We can see something similar to the fix I have made: https://github.ibm.com/smpi/ibm-ompi/blob/ibm_smpi_viper_master/orte/orted/orted_submit.c#L1022-L1033

Using the app prefix from multiple app contexts:

mpirun --prefix garbage_prefix -mca btl_openib_warn_default_gid_prefix 0 -np 1 env : -np 1 env | grep "^PATH"
PATH=garbage_prefix/bin:/garbage_prefix/bin:/smpi_dev/smiller/ompi-master//bin:..........
PATH=/garbage_prefix/bin:/smpi_dev/smiller/ompi-master//bin:.........

Using only the app prefix from the first app context:

PATH=garbage_prefix/bin:/garbage_prefix/bin:/smpi_dev/smiller/ompi-master//bin:..........
PATH=garbage_prefix/bin:/garbage_prefix/bin:/smpi_dev/smiller/ompi-master//bin:..........

@sam6258 sam6258 force-pushed the prefix_dir_fix branch 3 times, most recently from aa174f9 to d8e2e6c Compare February 28, 2018 20:02
Copy link
Member

@jjhursey jjhursey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. 👍

Copy link
Contributor

@rhc54 rhc54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Urrr...hold on a minute. It isn't quite that simple. The problem is that the user can indeed have specified a prefix for each app_context - this particularly happens when running on a heterogeneous cluster. So you need to check first to see if there is a prefix for that app_context - if not, then you can use the one from the first app_context, if given.

@sam6258
Copy link
Contributor Author

sam6258 commented Feb 28, 2018

@rhc54 Does that mean this comment is stale? https://github.ibm.com/smpi/ibm-ompi/blob/ibm_smpi_viper_master/orte/orted/orted_submit.c#L1512-L1517 Or am I misinterpreting what the portion of code following line 1517 is doing?

… for ORTE_APP_PREFIX_DIR

Signed-off-by: Scott Miller <scott.miller1@ibm.com>
@rhc54
Copy link
Contributor

rhc54 commented Feb 28, 2018

Like I said, it's not that simple. mpirun can only have one prefix on the cmd line, but that's because we don't have a way of dealing with multiple prefixes when launching the daemons via something like srun (note that we could do it for ssh-based launches). However, if you look at the dynamic spawn code (orte/orted/pmix/pmix_server_dyn.c), or if you look at the DVM code, you'll quickly see that we do indeed support per-app_context prefixes.

The comment in orted_submit solely applies to mpirun - in fact, the code really was a cut/paste from there.

@sam6258
Copy link
Contributor Author

sam6258 commented Feb 28, 2018

I see what you are saying. So you are proposing something more along the lines of this?:

    orte_get_attribute(&app->attributes, ORTE_APP_PREFIX_DIR, (void**)&param, OPAL_STRING);
    /* grab the parameter from the first app context because the current context does not have a prefix assigned */
    if (NULL == param) {
        tmp_app = (orte_app_context_t*)opal_pointer_array_get_item(jdata->apps, 0);
        assert (NULL != tmp_app);
        orte_get_attribute(&tmp_app->attributes, ORTE_APP_PREFIX_DIR, (void**)&param, OPAL_STRING);
    } 

@rhc54
Copy link
Contributor

rhc54 commented Feb 28, 2018

Yeah, I think that would cover the bases.

@sam6258
Copy link
Contributor Author

sam6258 commented Feb 28, 2018

Updated with @rhc54 's requested changes.

Copy link
Contributor

@rhc54 rhc54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@jjhursey jjhursey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jjhursey jjhursey merged commit 6f58954 into open-mpi:master Mar 1, 2018
@gpaulsen
Copy link
Member

gpaulsen commented Mar 16, 2018

PRed to v3.0.x in #4873
PRed to v3.1.x in #4872

@sam6258 sam6258 deleted the prefix_dir_fix branch March 22, 2018 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants