MPI_PROC_NULL in Topology Creation #4675

Open
omor1 opened this issue Dec 29, 2017 · 7 comments

@omor1
Contributor

omor1 commented Dec 29, 2017

I'm trying to create a binary tree topology using MPI_Dist_graph_create_adjacent(), simplifying the handling of the graph boundaries by using MPI_PROC_NULL. This allows every node to be specified in a consistent way.
I'm not sure this is allowed by the specification; I could find no information either way.
However, the neighborhood collectives specify that processes at the borders of a non-periodic Cartesian topology act as though they send to and receive from MPI_PROC_NULL. It could be useful to be able to obtain similar behavior in a generic graph.

Example code is given below:

int world_rank;
int world_size;
int neighbor[3];
MPI_Comm CommTree;

MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

/* Parent and children in a binary tree rooted at rank 0; neighbors that
   fall outside the tree are replaced with MPI_PROC_NULL. */
neighbor[0] = world_rank > 0 ? (world_rank-1)/2 : MPI_PROC_NULL;
neighbor[1] = 2*world_rank+1 < world_size ? 2*world_rank+1 : MPI_PROC_NULL;
neighbor[2] = 2*world_rank+2 < world_size ? 2*world_rank+2 : MPI_PROC_NULL;

MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                               3, neighbor, MPI_UNWEIGHTED,
                               3, neighbor, MPI_UNWEIGHTED,
                               MPI_INFO_NULL, 1 /* reorder */, &CommTree);

Currently this code results in the following error:

*** An error occurred in MPI_Dist_graph_create_adjacent invalid sources
*** reported by process [3896508417,2]
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_ARG: invalid argument of some other kind
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
@bosilca
Member

bosilca commented Jan 2, 2018

My understanding of the MPI standard's description of the topology creation functions is that the list of neighbors is rank-based and should only contain the meaningful neighbors. Thus, you cannot have MPI_PROC_NULL as a neighbor in any type of topology.
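For contrast, here is a minimal sketch of what listing only the existing neighbors would look like for the same binary tree (nbr, degree, and CommTree are illustrative names; world_rank and world_size are as in the report above); the degree then varies between 0 and 3 across ranks:

int degree = 0;
int nbr[3];
MPI_Comm CommTree;

/* List only the neighbors that actually exist: parent, left child, right child. */
if (world_rank > 0)                  nbr[degree++] = (world_rank - 1) / 2;
if (2 * world_rank + 1 < world_size) nbr[degree++] = 2 * world_rank + 1;
if (2 * world_rank + 2 < world_size) nbr[degree++] = 2 * world_rank + 2;

MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                               degree, nbr, MPI_UNWEIGHTED,
                               degree, nbr, MPI_UNWEIGHTED,
                               MPI_INFO_NULL, 1 /* reorder */, &CommTree);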

@omor1
Contributor Author

omor1 commented Jan 2, 2018

It is true that the distributed graph constructors specify sources and destinations as "array[s] of non-negative integers" describing "ranks of processes for which the calling process is a destination / source".

MPI_PROC_NULL is defined as -2 in Open MPI, which would make it invalid according to the specification. However, the regular graph constructor MPI_Graph_create() specifies edges as "array of integers describing graph edges", which could mean that perhaps MPI_PROC_NULL is legal there.

I'm not quite sure what you mean by meaningful neighbors; that term isn't used in the specification and is somewhat vague, since what is 'meaningful' can vary between implementations. The specification in fact explicitly allows edges to be defined multiple times for the same (source, dest) pair, but leaves the meaning up to the implementation. Similarly, at least for the non-distributed graph constructor, a process can be its own neighbor; although the specification doesn't explicitly state that this also holds for the distributed graph constructors, I can't see a reason why it wouldn't be allowed as well.

Regarding MPI_PROC_NULL, the specification states:

The special value MPI_PROC_NULL can be used instead of a rank wherever a source or a destination argument is required in a call.

This would appear to imply that MPI_PROC_NULL should be legal in topology creation functions for specifying ranks of processes, though the standard is vague on this point.
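As a short illustration of the point-to-point rule that sentence covers (buf and status are illustrative locals), a send or receive involving MPI_PROC_NULL completes immediately as a no-op; whether the same rank value may also appear in the adjacency arrays of the topology constructors is the question here:

int buf = 0;
MPI_Status status;

/* Both calls return immediately without transferring any data. */
MPI_Send(&buf, 1, MPI_INT, MPI_PROC_NULL, 0, MPI_COMM_WORLD);
MPI_Recv(&buf, 1, MPI_INT, MPI_PROC_NULL, 0, MPI_COMM_WORLD, &status);
/* After the receive, buf is unchanged and status reports
   MPI_PROC_NULL as the source and an element count of 0. */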

@bosilca
Member

bosilca commented Jan 2, 2018

A meaningful neighbor is a neighbor that defines an edge over which communication will take place. This does not prevent two ranks from being neighbors multiple times, nor a rank from being its own neighbor. Communications with MPI_PROC_NULL are meaningless, and except for adding gaps into the communication buffers I can't see why you would want to specify them.

This discussion pertains to the MPI standardization effort; I would suggest asking the question on the MPI Forum mailing list (mpi-forum@lists.mpi-forum.org).

@omor1
Contributor Author

omor1 commented Mar 8, 2018

I ended up posting this on the MPI Comments mailing list: Behavior of [Distributed] Graph Topology Constructors when a neighbor is MPI_PROC_NULL

Adding gaps in communication buffers is exactly the point here; it allows one to say "every process has n neighbors, but some of them happen to be null", analogous to how the neighborhood collectives behave with non-periodic Cartesian topologies:

For a Cartesian topology, created with MPI_Cart_create, the sequence of neighbors in the send and receive buffers at each process is defined by order of the dimensions, first the neighbor in the negative direction and then in the positive direction with displacement 1. The numbers of sources and destinations in the communication routines are 2*ndims with ndims defined in MPI_Cart_create. If a neighbor does not exist, i.e., at the border of a Cartesian topology in the case of a non-periodic virtual grid dimension (i.e., periods[...]==false), then this neighbor is defined to be MPI_PROC_NULL.

If a neighbor in any of the functions is MPI_PROC_NULL, then the neighborhood collective communication behaves like a point-to-point communication with MPI_PROC_NULL in this direction. That is, the buffer is still part of the sequence of neighbors but it is neither communicated nor updated.

This is useful for e.g. binary tree topologies (as described above).
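As a concrete sketch of the behavior being asked for, assuming MPI_PROC_NULL neighbors are accepted when CommTree is created (as in the code from the original report): each rank has three fixed neighbor slots, parent, left child, right child, and the slots that correspond to MPI_PROC_NULL are simply left untouched by the neighborhood collective:

int sendval = world_rank;
int recvval[3] = { -1, -1, -1 };   /* sentinels remain in the "gap" slots */

/* One int contributed per rank, one int received per neighbor slot. */
MPI_Neighbor_allgather(&sendval, 1, MPI_INT,
                       recvval,  1, MPI_INT, CommTree);

/* recvval[0]: parent's value      (or -1 at the root)
   recvval[1]: left child's value  (or -1 if absent)
   recvval[2]: right child's value (or -1 if absent) */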

bosilca added a commit to bosilca/ompi that referenced this issue Mar 9, 2018
Allowing MPI_PROC_NULL as a neighbor in any topology allows us to add
gaps in the send and recv buffers. This makes the traditional
neighbor collectives behave similarly to the V versions, but at the
same time it allows users to skip the step where they prepare the
counts and the displacement arrays.

For more info please take a look at issue open-mpi#4675.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@bosilca
Member

bosilca commented Mar 9, 2018

From a practical perspective I see your point. You want the same level of flexibility as the V version of the neighbor collective calls, but without having to provide an array of counts or compute the neighbor displacements locally.

I made a PR, #4898. Give it a try and let us know. Meanwhile I will try to get confirmation from the MPI Forum that this is the right approach.
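For comparison, here is a minimal sketch of the V-version workaround this refers to, assuming CommTree was created listing only the neighbors that exist, in the order parent, left child, right child: the caller has to assemble the counts and displacements by hand to keep fixed slots in the receive buffer.

int recvcounts[3], displs[3];
int k = 0;

/* One element expected from each existing neighbor, steered into a fixed slot. */
if (world_rank > 0)                  { recvcounts[k] = 1; displs[k] = 0; k++; } /* parent      -> slot 0 */
if (2 * world_rank + 1 < world_size) { recvcounts[k] = 1; displs[k] = 1; k++; } /* left child  -> slot 1 */
if (2 * world_rank + 2 < world_size) { recvcounts[k] = 1; displs[k] = 2; k++; } /* right child -> slot 2 */

int sendval = world_rank;
int recvval[3] = { -1, -1, -1 };
MPI_Neighbor_allgatherv(&sendval, 1, MPI_INT,
                        recvval, recvcounts, displs, MPI_INT, CommTree);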

@omor1
Contributor Author

omor1 commented Mar 10, 2018

I'll take a look at the PR. I'll also post on the MPI issues repository, as it seems more active than the mailing lists.

@omor1
Contributor Author

omor1 commented Mar 16, 2018

It appears that MPICH actually supports this behavior. I've submitted an issue to the MPI Issues repository asking for the wording to be clarified so that this is explicit.
