Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 8 additions & 6 deletions orte/mca/ess/hnp/ess_hnp_module.c
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
* Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2013-2017 Intel, Inc. All rights reserved.
* Copyright (c) 2017 Research Organization for Information Science
* Copyright (c) 2017-2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
Expand Down Expand Up @@ -462,12 +462,14 @@ static int rte_init(void)
if (orte_retain_aliases) {
aliases = NULL;
opal_ifgetaliases(&aliases);
/* add our own local name to it */
opal_argv_append_nosize(&aliases, orte_process_info.nodename);
aptr = opal_argv_join(aliases, ',');
if (0 < opal_argv_count(aliases)) {
/* add our own local name to it */
opal_argv_append_nosize(&aliases, orte_process_info.nodename);
aptr = opal_argv_join(aliases, ',');
orte_set_attribute(&node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, aptr, OPAL_STRING);
free(aptr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this logic:

  • If there are aliases, then set ORTE_NODE_ALIAS to a comma-delimited list of aliases plus our local nodename.
  • If there are no aliases, then do not set ORTE_NODE_ALIAS at all -- not even to our local nodename.

Is the behavior w.r.t. the local nodename intended? I.e., is the local nodename only relevant if there are aliases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - that is the intended behavior. Note that this also only happens if orte_retain_aliases is set. Otherwise, we just ignore the aliases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. 👍

opal_argv_free(aliases);
orte_set_attribute(&node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, aptr, OPAL_STRING);
free(aptr);
}
/* record that the daemon job is running */
jdata->num_procs = 1;
Expand Down
8 changes: 5 additions & 3 deletions orte/mca/plm/base/plm_base_launch_support.c
Original file line number Diff line number Diff line change
Expand Up @@ -1158,10 +1158,12 @@ void orte_plm_base_daemon_callback(int status, orte_process_name_t* sender,
opal_argv_append_nosize(&atmp, alias);
free(alias);
}
alias = opal_argv_join(atmp, ',');
if (0 < naliases) {
alias = opal_argv_join(atmp, ',');
orte_set_attribute(&daemon->node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, alias, OPAL_STRING);
free(alias);
}
opal_argv_free(atmp);
orte_set_attribute(&daemon->node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, alias, OPAL_STRING);
free(alias);
}

/* unpack the topology signature for that node */
Expand Down
10 changes: 6 additions & 4 deletions orte/mca/ras/base/ras_base_node.c
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
* Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2014-2017 Intel, Inc. All rights reserved.
* Copyright (c) 2015 Research Organization for Information Science
* Copyright (c) 2015-2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
Expand Down Expand Up @@ -157,10 +157,12 @@ int orte_ras_base_node_insert(opal_list_t* nodes, orte_job_t *jdata)
opal_argv_free(nalias);
}
/* and store the result */
ptr = opal_argv_join(alias, ',');
if (0 < opal_argv_count(alias)) {
ptr = opal_argv_join(alias, ',');
orte_set_attribute(&hnp_node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, ptr, OPAL_STRING);
free(ptr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why split the original value that you just obtained from orte_get_attribute(..., ORTE_NODE_ALIAS, ...)? If you get a non-NULL/non-empty value back, why not just directly orte_set_attribute(...)?

I ask because you're not saving the value you get back from orte_get_attribute() -- you're getting the value from orte_get_attribute(), transforming it, then transforming it back, and then using the double-transformed value to call orte_set_attribute(). Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to add the new alias to the end of the existing comma-delimited list, so we have to first fetch what we may already have, split it to get an argv array, then add only if unique, recombine and re-store. If he gets nothing back from the original "get", then he still has to store the new one.

Note that the code block 151-158 should be removed - looks like a copy/paste error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhc54 at line 143, we get the alias(es) for &hnp_node->attributes but at line 151, we get the alias(es) for &node->attributes, so are you sure this block should be removed ?

note I will add a missing free(ptr) at line 153

also, what is the rationale for testing ptr != NULL at line 144 when we check the return of orte_get_attribute() everywhere else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggouaillardet You are right about the code block - I forgot that "node" may be carrying its own list of aliases.

Technically, it would probably be better to check the return code instead of checking ptr for NULL. Ultimately, it doesn't matter as ptr will obviously remain NULL if the attribute isn't found.

opal_argv_free(alias);
orte_set_attribute(&hnp_node->attributes, ORTE_NODE_ALIAS, ORTE_ATTR_LOCAL, ptr, OPAL_STRING);
free(ptr);
}
/* don't keep duplicate copy */
OBJ_RELEASE(node);
Expand Down