-
Notifications
You must be signed in to change notification settings - Fork 932
orte: only set the ORTE_NODE_ALIAS attribute when required #5097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
orte: only set the ORTE_NODE_ALIAS attribute when required #5097
Conversation
When there is no alias for a given node, do not set the ORTE_NODE_ALIAS attribute to an empty string any more. Thanks Erico for reporting this issue. Thanks Ralph for the guidance. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (back-ported from commit open-mpi/ompi@a05456a)
| aptr = opal_argv_join(aliases, ','); | ||
| orte_set_attribute(&node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, aptr, OPAL_STRING); | ||
| free(aptr); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused by this logic:
- If there are aliases, then set
ORTE_NODE_ALIASto a comma-delimited list of aliases plus our local nodename. - If there are no aliases, then do not set
ORTE_NODE_ALIASat all -- not even to our local nodename.
Is the behavior w.r.t. the local nodename intended? I.e., is the local nodename only relevant if there are aliases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - that is the intended behavior. Note that this also only happens if orte_retain_aliases is set. Otherwise, we just ignore the aliases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. 👍
| ptr = opal_argv_join(alias, ','); | ||
| orte_set_attribute(&hnp_node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, ptr, OPAL_STRING); | ||
| free(ptr); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why split the original value that you just obtained from orte_get_attribute(..., ORTE_NODE_ALIAS, ...)? If you get a non-NULL/non-empty value back, why not just directly orte_set_attribute(...)?
I ask because you're not saving the value you get back from orte_get_attribute() -- you're getting the value from orte_get_attribute(), transforming it, then transforming it back, and then using the double-transformed value to call orte_set_attribute(). Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to add the new alias to the end of the existing comma-delimited list, so we have to first fetch what we may already have, split it to get an argv array, then add only if unique, recombine and re-store. If he gets nothing back from the original "get", then he still has to store the new one.
Note that the code block 151-158 should be removed - looks like a copy/paste error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhc54 at line 143, we get the alias(es) for &hnp_node->attributes but at line 151, we get the alias(es) for &node->attributes, so are you sure this block should be removed ?
note I will add a missing free(ptr) at line 153
also, what is the rationale for testing ptr != NULL at line 144 when we check the return of orte_get_attribute() everywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggouaillardet You are right about the code block - I forgot that "node" may be carrying its own list of aliases.
Technically, it would probably be better to check the return code instead of checking ptr for NULL. Ultimately, it doesn't matter as ptr will obviously remain NULL if the attribute isn't found.
| aptr = opal_argv_join(aliases, ','); | ||
| orte_set_attribute(&node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, aptr, OPAL_STRING); | ||
| free(aptr); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - that is the intended behavior. Note that this also only happens if orte_retain_aliases is set. Otherwise, we just ignore the aliases.
| ptr = opal_argv_join(alias, ','); | ||
| orte_set_attribute(&hnp_node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, ptr, OPAL_STRING); | ||
| free(ptr); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to add the new alias to the end of the existing comma-delimited list, so we have to first fetch what we may already have, split it to get an argv array, then add only if unique, recombine and re-store. If he gets nothing back from the original "get", then he still has to store the new one.
Note that the code block 151-158 should be removed - looks like a copy/paste error.
When there is no alias for a given node, do not set the
ORTE_NODE_ALIAS attribute to an empty string any more.
Thanks Erico for reporting this issue.
Thanks Ralph for the guidance.
Signed-off-by: Gilles Gouaillardet gilles@rist.or.jp
(back-ported from commit a05456a)