Fix PATH and LD_LIBRARY_PATH prefixing to use first app context value… #4867
aa174f9 to
d8e2e6c
Compare
jjhursey
left a comment
Good catch. 👍
rhc54
left a comment
Urrr...hold on a minute. It isn't quite that simple. The problem is that the user can indeed have specified a prefix for each app_context - this particularly happens when running on a heterogeneous cluster. So you need to check first to see if there is a prefix for that app_context - if not, then you can use the one from the first app_context, if given.
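As a rough sketch of the lookup order described above (not the actual ORTE code): the `app_ctx_t` struct and the `select_prefix` helper below are hypothetical stand-ins, and the real code would read the prefix from the app context's attributes rather than from a plain struct field.

```c
#include <stddef.h>

/* Hypothetical stand-in for an app context; NOT the real orte_app_context_t. */
typedef struct {
    const char *prefix_dir;   /* NULL when no prefix was given for this app */
} app_ctx_t;

/*
 * Pick the prefix for apps[idx]:
 *   1. use the app context's own prefix if it has one;
 *   2. otherwise fall back to the first app context's prefix, if given;
 *   3. otherwise return NULL (no prefix at all).
 */
static const char *select_prefix(const app_ctx_t *apps, size_t napps, size_t idx)
{
    if (NULL == apps || idx >= napps) {
        return NULL;
    }
    if (NULL != apps[idx].prefix_dir) {
        return apps[idx].prefix_dir;
    }
    /* apps[0] exists here because idx < napps implies napps >= 1 */
    return apps[0].prefix_dir;
}
```

With this ordering, a heterogeneous job that sets a prefix per app context keeps each one, while an MPMD launch that only sets a prefix once still applies it to every application.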
@rhc54 Does that mean this comment is stale? https://github.ibm.com/smpi/ibm-ompi/blob/ibm_smpi_viper_master/orte/orted/orted_submit.c#L1512-L1517 Or am I misinterpreting what the portion of code following line 1517 is doing?
… for ORTE_APP_PREFIX_DIR Signed-off-by: Scott Miller <scott.miller1@ibm.com>
Like I said, it's not that simple. mpirun can only have one prefix on the cmd line, but that's because we don't have a way of dealing with multiple prefixes when launching the daemons via something like srun (note that we could do it for ssh-based launches). However, if you look at the dynamic spawn code (orte/orted/pmix/pmix_server_dyn.c) or at the DVM code, you'll quickly see that we do indeed support per-app_context prefixes. The comment in orted_submit applies solely to mpirun - in fact, that code really was a cut/paste from there.
I see what you are saying. So you are proposing something more along the lines of this?:
Yeah, I think that would cover the bases.
d8e2e6c to
d7e594f
Compare
Updated with @rhc54's requested changes.
rhc54
left a comment
Thanks!
jjhursey
left a comment
👍
… for ORTE_APP_PREFIX_DIR
When reconstructing PATH and LD_LIBRARY_PATH with the value from the ORTE_APP_PREFIX_DIR parameter, we do not pick out the prefix dir correctly when there are multiple app contexts (we should use only the first app context to retrieve the prefix dir). This shows up in an MPMD-style launch: the second application gets NULL == param and skips the loop that sets PATH and LD_LIBRARY_PATH appropriately. Something similar to the fix I have made can be seen here: https://github.ibm.com/smpi/ibm-ompi/blob/ibm_smpi_viper_master/orte/orted/orted_submit.c#L1022-L1033
Using the app prefix from multiple app contexts:
mpirun --prefix garbage_prefix -mca btl_openib_warn_default_gid_prefix 0 -np 1 env : -np 1 env | grep "^PATH"
PATH=garbage_prefix/bin:/garbage_prefix/bin:/smpi_dev/smiller/ompi-master//bin:..........
PATH=/garbage_prefix/bin:/smpi_dev/smiller/ompi-master//bin:.........
Using only the app prefix from the first app context:
PATH=garbage_prefix/bin:/garbage_prefix/bin:/smpi_dev/smiller/ompi-master//bin:..........
PATH=garbage_prefix/bin:/garbage_prefix/bin:/smpi_dev/smiller/ompi-master//bin:..........
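For illustration, the PATH reconstruction itself amounts to prepending `<prefix>/bin` to the existing value. The following is a minimal sketch only, assuming a hypothetical helper `prepend_bin_dir` (it is not an ORTE function); LD_LIBRARY_PATH would presumably be handled the same way with the prefix's library directory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of ORTE.
 * Build "<prefix>/bin:<old_path>", or just "<prefix>/bin" when old_path is
 * NULL or empty. Returns a malloc'd string the caller must free, or NULL
 * when prefix is NULL or allocation fails.
 */
static char *prepend_bin_dir(const char *prefix, const char *old_path)
{
    if (NULL == prefix) {
        /* Mirrors the "NULL == param" case: nothing to prepend. */
        return NULL;
    }

    int has_old = (NULL != old_path && '\0' != old_path[0]);
    size_t len = strlen(prefix) + strlen("/bin")
               + (has_old ? 1 + strlen(old_path) : 0)
               + 1; /* terminating NUL */

    char *out = malloc(len);
    if (NULL == out) {
        return NULL;
    }

    if (has_old) {
        snprintf(out, len, "%s/bin:%s", prefix, old_path);
    } else {
        snprintf(out, len, "%s/bin", prefix);
    }
    return out;
}
```

If the prefix passed in is NULL for the second app context, nothing is prepended, which matches the missing leading entry in the second PATH line of the "multiple app contexts" output above.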