Skip to content

Conversation

@ggouaillardet
Copy link
Contributor

When there is no alias for a given node, do not set the
ORTE_NODE_ALIAS attribute to an empty string any more.

Thanks Erico for reporting this issue.
Thanks Ralph for the guidance.

Signed-off-by: Gilles Gouaillardet gilles@rist.or.jp

(back-ported from commit a05456a)

When there is no alias for a given node, do not set the
ORTE_NODE_ALIAS attribute to an empty string any more.

Thanks Erico for reporting this issue.
Thanks Ralph for the guidance.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@a05456a)
aptr = opal_argv_join(aliases, ',');
orte_set_attribute(&node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, aptr, OPAL_STRING);
free(aptr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this logic:

  • If there are aliases, then set ORTE_NODE_ALIAS to a comma-delimited list of aliases plus our local nodename.
  • If there are no aliases, then do not set ORTE_NODE_ALIAS at all -- not even to our local nodename.

Is the behavior w.r.t. the local nodename intended? I.e., is the local nodename only relevant if there are aliases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - that is the intended behavior. Note that this also only happens if orte_retain_aliases is set. Otherwise, we just ignore the aliases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. 👍

ptr = opal_argv_join(alias, ',');
orte_set_attribute(&hnp_node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, ptr, OPAL_STRING);
free(ptr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why split the original value that you just obtained from orte_get_attribute(..., ORTE_NODE_ALIAS, ...)? If you get a non-NULL/non-empty value back, why not just directly orte_set_attribute(...)?

I ask because you're not saving the value you get back from orte_get_attribute() -- you're getting the value from orte_get_attribute(), transforming it, then transforming it back, and then using the double-transformed value to call orte_set_attribute(). Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to add the new alias to the end of the existing comma-delimited list, so we have to first fetch what we may already have, split it to get an argv array, then add only if unique, recombine and re-store. If he gets nothing back from the original "get", then he still has to store the new one.

Note that the code block 151-158 should be removed - looks like a copy/paste error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhc54 at line 143, we get the alias(es) for &hnp_node->attributes but at line 151, we get the alias(es) for &node->attributes, so are you sure this block should be removed ?

note I will add a missing free(ptr) at line 153

also, what is the rationale for testing ptr != NULL at line 144 when we check the return of orte_get_attribute() everywhere else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggouaillardet You are right about the code block - I forgot that "node" may be carrying its own list of aliases.

Technically, it would probably be better to check the return code instead of checking ptr for NULL. Ultimately, it doesn't matter as ptr will obviously remain NULL if the attribute isn't found.

aptr = opal_argv_join(aliases, ',');
orte_set_attribute(&node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, aptr, OPAL_STRING);
free(aptr);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - that is the intended behavior. Note that this also only happens if orte_retain_aliases is set. Otherwise, we just ignore the aliases.

ptr = opal_argv_join(alias, ',');
orte_set_attribute(&hnp_node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, ptr, OPAL_STRING);
free(ptr);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to add the new alias to the end of the existing comma-delimited list, so we have to first fetch what we may already have, split it to get an argv array, then add only if unique, recombine and re-store. If he gets nothing back from the original "get", then he still has to store the new one.

Note that the code block 151-158 should be removed - looks like a copy/paste error.

@hppritcha hppritcha added the NEWS label May 10, 2018
@bwbarrett bwbarrett merged commit 7bbe78d into open-mpi:v3.0.x May 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants