
Clarify where MPI_PROC_NULL is permitted #87

Open
omor1 opened this issue Mar 13, 2018 · 47 comments
Labels
mpi-5 For inclusion in the MPI 5.0 standard wg-collectives Collectives Working Group
@omor1
Member

omor1 commented Mar 13, 2018

Problem

MPI_PROC_NULL is currently specified in the Point-to-Point Communication chapter. It is unclear from the wording whether it is intended to be usable only within point-to-point communication calls or anywhere a rank parameter is otherwise permitted. Note that MPI_PROC_NULL is also referenced from within the Process Topologies chapter, where it is defined as the neighbor of a process at the border of a Cartesian topology with a non-periodic dimension.

Proposal

The standard should clarify whether MPI_PROC_NULL is allowed in other contexts, e.g. as a source or destination in a (distributed) graph topology or as the target of a one-sided communication call.

Changes to the Text

The Null Processes section of the Point-to-Point Communication chapter will need to be changed to accommodate this clarification. The Process Topologies chapter may also need to clarify that although the neighbor acts as though it were MPI_PROC_NULL, such a construct cannot be specified by the user (in the event that null processes are disallowed in contexts other than point-to-point communication).

Should null processes be allowed as the root of a collective communication, the impact on correctness may need to be described (e.g. if the root is MPI_PROC_NULL, the call returns as soon as possible without modifying the buffers, but processes must still call collectives in the same order on the same communicator).

Impact on Implementations

If implementations currently allow MPI_PROC_NULL in contexts other than point-to-point communication, they may need to add a runtime error check for it. If they do not, they may need to add special behavior (e.g. an additional branch).

Impact on Users

Users may be able to define certain behavior more cleanly (for instance, a binary tree process topology). Current code remains valid.

References

#138: Ranks are currently referred to in places as 'non-negative integers', which may conflict with this proposal on implementations that define MPI_PROC_NULL as a negative integer. It has no impact if the clarification is a narrowing one.
PR: https://github.com/mpi-forum/mpi-standard/pull/43

@hjelmn

hjelmn commented Mar 13, 2018

I don't see the benefit of this change. Can you give an example of what would be easier if MPI_PROC_NULL is allowed for dist graph?

@omor1
Member Author

omor1 commented Mar 13, 2018

A clear example is a virtual binary tree topology. Nodes have between 1 and 3 neighbors (a parent and up to two children); most nodes are inner nodes and have all three, some are leaves and have only one, and a few have two (the root and some inner nodes in a non-full tree).

Currently, this must be implemented as follows:

MPI_Comm tree_comm;
int world_rank;
int world_size;
int neighbors[3];
int n = 0;

MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

/* pack only the neighbors that actually exist */
if (world_rank > 0)
        neighbors[n++] = (world_rank-1)/2;  /* parent */
if (2*world_rank+1 < world_size)
        neighbors[n++] = 2*world_rank+1;    /* left child */
if (2*world_rank+2 < world_size)
        neighbors[n++] = 2*world_rank+2;    /* right child */

MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                               n, neighbors, MPI_UNWEIGHTED,
                               n, neighbors, MPI_UNWEIGHTED,
                               MPI_INFO_NULL, 1 /* reorder */, &tree_comm);

It then becomes difficult to tell what is in neighbors[0]; is this a parent, or a left child? This differs depending on whether or not you are the root. The same is true for neighbors[1]; this could be a left child, a right child, or a bogus value.

Suppose I later want to have every node send something to their left child; to do so, I must either store the left child separately from the neighbors array, or store what index in the neighbors array contains the left child, or recalculate either of those on demand—each time!

While this isn't difficult for a binary tree, it is easy to see how in a more general topology that is mostly homogeneous this may become a burden.

With this clarification, this instead becomes

MPI_Comm tree_comm;
int world_rank;
int world_size;
int neighbors[3];

MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

/* fixed slots: 0 = parent, 1 = left child, 2 = right child */
neighbors[0] = world_rank > 0              ? (world_rank-1)/2 : MPI_PROC_NULL;
neighbors[1] = 2*world_rank+1 < world_size ? 2*world_rank+1   : MPI_PROC_NULL;
neighbors[2] = 2*world_rank+2 < world_size ? 2*world_rank+2   : MPI_PROC_NULL;

MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                               3, neighbors, MPI_UNWEIGHTED,
                               3, neighbors, MPI_UNWEIGHTED,
                               MPI_INFO_NULL, 1 /* reorder */, &tree_comm);

Now neighbors[0] is always the parent; neighbors[1] is always the left child; and neighbors[2] is always the right child. These may be nonexistent (MPI_PROC_NULL), but sending and receiving from them is never incorrect. In short, this helps simplify operations near the boundaries—exactly what MPI_PROC_NULL is meant to be used for!

In many instances, it is convenient to specify a "dummy" source or destination for communication. This simplifies the code that is needed for dealing with boundaries, for example, in the case of a non-circular shift done with calls to send-receive. (MPI 3.1, Section 3.11, Page 80)

A further point is that currently, a distributed graph topology cannot emulate a Cartesian topology! I believe that this is not the intention of the MPI specification; a graph topology is intended to be the most general, and it is only because Cartesian topologies are so common that the Cartesian topology functions are provided (so as to simplify the process).

@omor1
Member Author

omor1 commented Mar 16, 2018

Also note that MPI states:

The special value MPI_PROC_NULL can be used instead of a rank wherever a source or a destination argument is required in a call.

A strict reading of the text should therefore require this anyway; this proposal merely seeks to clarify the fact, as some implementations do not currently support it.

(Actually, I've gone and tested—it seems that while Open MPI's behavior is erroneous, MPICH's appears to be correct. In any case, a clarification may be useful.)

@dholmes-epcc-ed-ac-uk dholmes-epcc-ed-ac-uk self-assigned this Mar 21, 2018
@hjelmn

hjelmn commented Apr 24, 2018

I would not call Open MPI's behavior erroneous. It follows an interpretation of the standard. Now, I agree that clarification is needed, but I don't yet agree with your interpretation.

@omor1
Member Author

omor1 commented Apr 29, 2018

Given that MPI_PROC_NULL is supposed to be usable wherever a rank is required as a source or destination, and coupled with the fact that the neighborhood collectives explicitly support MPI_PROC_NULL neighbors, rejecting a topology in which MPI_PROC_NULL is specified as the rank of a neighbor would appear to be erroneous—at least to me.

What is your reasoning for the validity of Open MPI's behavior? Keep in mind that the neighbors at the borders of non-periodic dimensions of Cartesian topologies are explicitly defined to be MPI_PROC_NULL by the standard.

It is interesting to note that, as far as I can tell, the distributed graph topology constructors are the only place in the standard where ranks are specified to be 'non-negative integers'; all other functions merely specify them as 'integers'. Even the older non-distributed graph topology constructor specifies the flattened edge representation of neighbors as 'integers'.

@hjelmn

hjelmn commented Apr 29, 2018

Exactly, the sources and destinations are non-negative. This requirement implicitly forbids MPI_PROC_NULL (which by definition MUST be negative to avoid conflicting with another valid rank). As such Open MPI's behavior is not erroneous. To me it is also logical.

This behavior may or may not have been the original intent and there may be need for an errata (at which point we will change the behavior of Open MPI). This will certainly be discussed at the MPI Forum meeting in June.

@jdinan

jdinan commented Apr 30, 2018

Does MPI require implementations to support INT_MAX processes? I'm not sure that non-negative in this context was intended to exclude MPI_PROC_NULL.

@tonyskjellum

tonyskjellum commented Apr 30, 2018 via email

@omor1
Member Author

omor1 commented Apr 30, 2018

As far as I can tell, there is no requirement that an MPI implementation support INT_MAX processes or that MPI_PROC_NULL be negative. It is usually implemented as a negative integer (-2 in Open MPI, -1 in MPICH), but this is not required—it could conceivably be implemented as INT_MAX in an implementation that supported fewer than INT_MAX processes.

According to @hjelmn's reasoning, that would make whether or not MPI_PROC_NULL is a valid neighbor an artifact of the implementation, which is probably not the intent.

@hjelmn

hjelmn commented Apr 30, 2018

As I said, I do not know the original intent, but it does call out non-negative integers while the other functions do not. I interpret this to mean any rank but null.

Who wrote the text for these functions?

@hjelmn

hjelmn commented Apr 30, 2018

FWIW, my interpretation is based on there being no material benefit to allowing null. Allowing null saves some minimal bookkeeping for apps. But I can see the restriction allowing for some optimization in the library. There is a cost to allowing null.

@omor1
Member Author

omor1 commented Apr 30, 2018

Distributed graph topologies were added in MPI-2.2. The author for Process Topologies for MPI-2.2 was Torsten Hoefler. He was also the editor and organizer for that chapter for MPI-3.0 and MPI-3.1 and is still the chair for the upcoming MPI-4.0.

Certainly, every feature has a cost associated with it—hence why determining what the correct behavior ought to be and ensuring that implementations function according to said behavior is important.

@wgropp

wgropp commented Apr 30, 2018 via email

@hjelmn

hjelmn commented Apr 30, 2018

@wgropp Ok, then we probably should fix this by dropping non-negative so that the argument descriptions match the other calls. Is this a ticket 0 change?

@wgropp

wgropp commented Apr 30, 2018 via email

@dholmes-epcc-ed-ac-uk
Member

I think there is a reasonable argument for choosing to make this an errata in order to increase its visibility. I have always assumed, through osmosis, that MPI_PROC_NULL is not permitted for (dist) graph topologies. I agree that there is no strong indication in the text of the MPI Standard stating that restriction, but there is a clear general statement to the contrary. This may be a common misconception and it may behove the MPI Forum to publicise this change much more loudly than is typically done for a ticket 0 change.

@wgropp so, for dist graph, the text for the sources and destinations arguments would read "array of valid rank values" instead of "array of non-negative integers"? And, for graph, the text for the edges argument would read "array of valid rank values describing graph edges (see below)"?

I think an explicit sentence akin to the RMA errata should be added for each case as well.
For dist graph, how about adding (at line 34 on page 297):

MPI_PROC_NULL is a valid rank value for \mpiarg{sources} or for \mpiarg{destinations}.

For graph, how about adding (at line 43 on page 294):

MPI_PROC_NULL is a valid rank value for \mpiarg{edges}.

@wgropp

wgropp commented May 1, 2018 via email

@schulzm schulzm added errata Errata items for the previous MPI Standard not ready wg-collectives Collectives Working Group labels May 20, 2018
@dholmes-epcc-ed-ac-uk dholmes-epcc-ed-ac-uk added scheduled reading Reading is scheduled for the next meeting and removed not ready labels May 22, 2018
@dholmes-epcc-ed-ac-uk
Member

Suggested change, based on MPI-3.x (as of a few moments ago), with change-bars and change log entry:
mpi-report-ticket87.pdf

@omor1
Member Author

omor1 commented May 22, 2018

  1. "array of non-negative integers" should be replaced by "array of valid rank values" in MPI_DIST_GRAPH_CREATE_ADJACENT as well (for both sources and destinations) on page 296 (298 in draft).
  2. The following should be added at line 34 of page 297 (page 299 in draft) [it was added at line 32 of page 299 (35 of 301 in draft) instead; it is required in both]:

MPI_PROC_NULL is a valid rank value for \mpiarg{sources} or for \mpiarg{destinations}.

  3. The Change-Log references the wrong section and pages for these changes (§ 6.4.2 on page 239 instead of § 7.5.3 on page 296 and § 7.5.4 on pages 298, 299, 300, and 301).
  4. Slight typo on page 300—should be "rank", not "rnak" :)

@omor1
Member Author

omor1 commented May 22, 2018

Similar text would probably be needed for the new functions if #78 and/or #84 are accepted.

@dholmes-epcc-ed-ac-uk
Member

@omor1 thanks for the careful review 👍

Here's the fixed version:
mpi-report-ticket87.pdf

I expect that this errata change will be passed by the MPI Forum before issues #78, #84, and #93; the authors of those issues will be responsible for propagating this change to their new functions.

@dholmes-epcc-ed-ac-uk
Member

@jdinan (21 hours ago, on the pull request, non-public - hence copied here)

As long as we're clarifying this, it might be helpful to further clarify "MPI_PROC_NULL is valid ..., +and indicates that ...+" to also ensure the meaning of MPI_PROC_NULL is unambiguous.

@dholmes-epcc-ed-ac-uk just now

When explaining, we should cover several points:

  • communication with null neighbours completes normally, like with point-to-point to/from null processes or off the edges of a cartesian topology [the easy tee shot]
  • Graph and dist graph enquiry functions will include null neighbours just like non-null neighbours [the edge of the fairway]
  • buffers/counts/datatypes/displacements/etc for null neighbours must be included for neighbourhood collectives - but not for other collectives [definitely into the rough now]

Alternatively, the graph and dist graph enquiry functions should remove all MPI_PROC_NULL values and give back only non-null neighbours (like a cartesian topology). That means getting back from a query function something different than what went into the creation function. In @omor1's example (issue #87 (comment), comment 13th Mar, binary tree), the user must keep their neighbours input array, perform MPI_GROUP_TRANSLATE_RANKS to get ranks in the new communicator (including MPI_PROC_NULL->MPI_PROC_NULL mappings), and avoid usage of the topology query functions. However, "the sequence of neighbours is defined as the sequence returned by " (MPI-3.1, section 7.6, page 314) [looks like we've found a water hazard]

None of this is trivial, so I agree that some additional explanation is required here.

@dholmes-epcc-ed-ac-uk
Member

Is there still interest in pursuing this proposal?

It is marked as an errata (a little ambitious, perhaps), which means it has one chance to get into MPI-4.0 - specifically, it must be formally read at the next face-to-face meeting.

For the associated PR, I must choose between:

  1. leave base branch as mpi-3.x, close the PR - target: unmerged
  2. leave base branch as mpi-3.x, update the PR urgently, read at next meeting - target: errata for mpi-3.x
  3. update base branch to mpi-4.x, update the PR eventually, read whenever - target: change for mpi-4.x

@omor1
Member Author

omor1 commented Jun 1, 2019

@dholmes-epcc-ed-ac-uk actually I emailed the mpi-comments mailing list a couple days ago about a related (and extended) version of this.

The gist is that 'rank' arguments (source, destination, root, target, etc) are variously described as either 'integer' or 'non-negative integer'. There's no real reason I can see for this. Additionally, it's unclear whether MPI_PROC_NULL is meant to be useable only in point-to-point communication or more generally wherever a 'rank' argument is allowed (e.g. one can imagine calling MPI_PUT with a null process as the target).

The proposal would be to add 'rank integer' as a semantic term with a possible value in the range [0, N) for a group of size N. If more expansive null process support is desired, then the definition of 'rank integer' can be extended to also include MPI_PROC_NULL; otherwise, this can be specified only for those functions in the point-to-point communication chapter. Receive calls will also need to specify that they can take MPI_ANY_SOURCE (which they already do).

I wanted to pass it through mpi-comments to get feedback before writing and submitting a proposal, but I guess no one actually uses mailing lists nowadays. Should I submit it on mpi-forum/mpi-issues, or to mpiwg-semantic-terms/semantic-terms-issues? It would touch large parts of the standard and the maintainers of the different chapters may want input. (I'd expect little opposition to clarifying 'integer' versus 'non-negative integer', but whether MPI_PROC_NULL should be allowed is evidently much more controversial.)

In any case, that proposal would supersede this one I think, since it's more general. Actually, it should probably be split in two for the two different portions...

@dholmes-epcc-ed-ac-uk
Member

@omor1 this proposal/issue could be expanded in scope to encompass the broader problems with inconsistent descriptions of rank in the standard. Closing this issue and starting another loses the history of comments, which dooms us to repeat them on the next issue.

Re-reading the comments on this issue, there is a clear need to make some clarifying text changes for this problem - so option (1) should be dropped.

The comment from @schulzm (#87 (comment)) reminds me that the Forum decided this should be handled as a full change (2 votes), rather than errata (1 vote). In which case, this issue has probably already missed the MPI-4.0 boat - so option (2) should be dropped.

I think we are left with option (3) only.

@omor1
Member Author

omor1 commented Jun 1, 2019

@dholmes-epcc-ed-ac-uk right, I agree: I think this is more complex than an errata and there's a clear need for clarification, so option (3) is left. I'll rewrite the proposal text and update the labels. Is there worth in keeping the current proposal text around for archival purposes? If so, should I copy it to a comment or place it below the updated text?

@dholmes-epcc-ed-ac-uk
Member

@omor1 I think we could update this issue to encompass the wider context that you propose to tackle, rather than creating a new issue that would be disassociated from this conversation.

Alternatively, this issue (and its PR) can be closed and referenced from the new ones when they are created (with a couple of appropriate sentences of explanation).

@omor1
Member Author

omor1 commented Jun 4, 2019

I think that I'll create a new issue for the 'integer'/'non-negative integer' issue and update this one to expand where MPI_PROC_NULL should/shouldn't be permitted. The problems are of course related, but mostly orthogonal.

@omor1 omor1 changed the title Clarification: allow MPI_PROC_NULL as neighbor in (distributed) graph topologies Clarify where MPI_PROC_NULL is permitted Jun 4, 2019
@omor1
Member Author

omor1 commented Jun 4, 2019

Alright, I've updated this proposal. It now allows for clarification in either direction; i.e. either prohibit MPI_PROC_NULL everywhere other than P2P calls, or allow it in all calls with 'rank' parameters. Or something in between, I suppose, as long as that is strictly clarified, though I'd like to see a strong rationale for that.

@omor1 omor1 added mpi-4.0 wg-rma RMA Working Group and removed errata Errata items for the previous MPI Standard labels Jun 9, 2019
@wesbland wesbland added this to To Do in MPI 4.0 Ratification Oct 21, 2020
@dholmes-epcc-ed-ac-uk
Member

@omor1 is there still interest in pushing this issue forward? We are now running out of time for the mpi-4.0 release, just as we ran out of time for the mpi-3.1 release.

@dholmes-epcc-ed-ac-uk dholmes-epcc-ed-ac-uk added mpi-4.1 For inclusion in the MPI 4.1 standard and removed mpi-4.0 labels Nov 9, 2020
@wesbland wesbland added this to To Do in MPI 4.1 Jun 9, 2021
@devreal

devreal commented Jul 6, 2022

MPI 4 in §12.3 states that

MPI_PROC_NULL is a valid target rank in all MPI RMA communication calls.

It is also explicitly mentioned in MPI_Win_shared_query as a valid rank.

I consider this solved for the RMA chapter and will remove the label.

@devreal devreal removed the wg-rma RMA Working Group label Jul 6, 2022
@Wee-Free-Scot

@omor1 (and/or others) is there still interest in pushing this issue forward? We are now running out of time for the mpi-4.1 release, just as we ran out of time for the mpi-3.1 and mpi-4.0 releases.

I see four options:

  1. We leave the text ambiguous and underspecified (this option has worked for decades, why change course now?)
  2. We write "Use of MPI_PROC_NULL in graph or dist-graph topologies is erroneous." (bad)
  3. We write "Use of MPI_PROC_NULL in graph or dist-graph topologies results in implementation-defined behavior." (ugly)
  4. We write "Use of MPI_PROC_NULL in graph or dist-graph topologies is permitted and means ..." (good, but we have no consensus on what text finishes the sentence correctly.)

@Wee-Free-Scot Wee-Free-Scot added mpi-5 For inclusion in the MPI 5.0 standard and removed mpi-4.1 For inclusion in the MPI 4.1 standard labels Jul 20, 2022
@wesbland wesbland removed this from To Do in MPI 4.1 Oct 19, 2022
@wesbland wesbland added this to To Do in MPI 5.0 via automation Oct 19, 2022