Allow user to specify logical topology for multi-GPU communications #46

Closed
rbabich opened this issue Jan 31, 2012 · 7 comments

@rbabich
Member

rbabich commented Jan 31, 2012

At present, to properly run an application built with QUDA over QMP, it's necessary to specify "-geom Px Py Pz Pt" on the command line. This is awkward when the application has built-in logic to determine the best layout, and it is also incompatible with QDP/C, as summarized by James Osborn:

One issue with interfacing multi-GPU to QDP at the moment will be that
QDP isn't currently setting the logical topology. This was changed to
support multi-lattice in QDP, since one might not want the same node
mapping on each lattice, and QMP didn't have communicator support. Now
that QMP does, I could create a new communicator for each lattice and
set each's topology, but my concern is that MPI communicators could be
expensive in memory and I don't want to rely on this. I'm planning to
add some sort of light-weight communicators to QMP to address this.

Another issue is that the QMP topology has its own fixed mapping of the
ranks to the logical topology, which may not be optimal. Right now QDP
is using a different mapping which was a little better in some cases. I
am also planning to allow the QMP mapping to be more flexible, but
haven't gotten to this yet.

Anyway, the main point is that it would be nice if QUDA didn't rely on
the QMP topology, but instead allowed the user to pass in a function (or
functions) that specified the rank->coords and coords->rank mappings.
That would allow much greater flexibility for the applications using
QUDA. Additionally, allowing a QMP communicator to be specified would be
even better. You said that some groups may want to port QMP and not use
communicators, but it should be possible for those ports to still keep
the same API (with the communicator structure) and just have it always
be the same one (basically make QMP_comm_split always fail).

At this stage, I'd suggest not going so far as to rely on QMP communicators, which are still an "alpha" feature, but allowing the user to pass in a mapping function seems like a nice solution. This would also add much-needed flexibility to the MPI code path, which currently assumes a simple lexicographical ordering when assigning logical grid coordinates to MPI ranks.

To summarize, I propose replacing this declaration:

 void initCommsQuda(int argc, char **argv, const int *X, const int nDim);

with:

 typedef int (*QudaCommsMap)(const int *x, void *fdata);
 void initCommsQuda(const int *X, const int nDim, QudaCommsMap func, void *fdata);
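
For illustration, here is a minimal sketch of what a user-supplied mapping function could look like under the proposed QudaCommsMap signature. The GridInfo struct and the function name lex_rank_from_coords are hypothetical (they are not part of QUDA); the ordering chosen here, with the first coordinate varying fastest, is just one possible application-defined layout.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type from the proposal: grid coordinates -> rank. */
typedef int (*QudaCommsMap)(const int *x, void *fdata);

/* Hypothetical auxiliary data handed to the callback through fdata. */
typedef struct {
  int ndim;         /* number of grid dimensions */
  const int *dims;  /* extent of the grid in each dimension */
} GridInfo;

/* Hypothetical mapping: lexicographic order, first coordinate fastest. */
int lex_rank_from_coords(const int *x, void *fdata)
{
  const GridInfo *g = (const GridInfo *)fdata;
  assert(x != NULL && g != NULL);

  int rank = 0;
  /* Horner-style accumulation from the slowest index down to the fastest. */
  for (int i = g->ndim - 1; i >= 0; i--) {
    rank = rank * g->dims[i] + x[i];
  }
  return rank;
}
```

An application would fill in a GridInfo once and pass its address as fdata alongside the function pointer; a mapping that needs no extra state could ignore fdata entirely.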

Here fdata points to any auxiliary data required by the user-supplied mapping function func(); passing NULL for fdata is perfectly valid. As an implementation detail, since we can no longer assume that a QMP logical topology exists, we'll have to eliminate the use of "relative" sends and receives in face_qmp.cpp. This is a minor inconvenience, but to quote James Osborn again:

The relative sends were just a cached version of the calculation of (get my coords) -> (add 1 mod length) -> (get rank). They aren't necessary (and were never used by QDP), since you can just create the neighbor table yourself and use the regular send.
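
The neighbor-table calculation described in the quote can be sketched roughly as follows. This is only an illustration under stated assumptions: a four-dimensional periodic grid, and hypothetical names (NeighborTable, build_neighbor_table, example_map) that do not come from QUDA or QMP.

```c
#include <assert.h>
#include <stddef.h>

#define NDIM 4  /* assumption: four-dimensional communication grid */

/* Callback type from the proposal: grid coordinates -> rank. */
typedef int (*QudaCommsMap)(const int *x, void *fdata);

/* Hypothetical table of neighbor ranks, one step forward and backward per dimension. */
typedef struct {
  int fwd[NDIM];
  int back[NDIM];
} NeighborTable;

/* (get my coords) -> (add or subtract 1 mod length) -> (get rank), computed once. */
void build_neighbor_table(NeighborTable *t, const int *my_coords, const int *dims,
                          QudaCommsMap func, void *fdata)
{
  assert(t != NULL && my_coords != NULL && dims != NULL && func != NULL);
  for (int mu = 0; mu < NDIM; mu++) {
    int x[NDIM];
    for (int i = 0; i < NDIM; i++) x[i] = my_coords[i];

    x[mu] = (my_coords[mu] + 1) % dims[mu];             /* forward, periodic wrap */
    t->fwd[mu] = func(x, fdata);

    x[mu] = (my_coords[mu] + dims[mu] - 1) % dims[mu];  /* backward, periodic wrap */
    t->back[mu] = func(x, fdata);
  }
}

/* Example mapping used only to exercise the sketch: lexicographic order,
   first coordinate fastest; fdata points to the NDIM grid extents. */
int example_map(const int *x, void *fdata)
{
  const int *dims = (const int *)fdata;
  int rank = 0;
  for (int i = NDIM - 1; i >= 0; i--) rank = rank * dims[i] + x[i];
  return rank;
}
```

With the table built once at start-up, each halo exchange can use an ordinary send or receive addressed to t.fwd[mu] or t.back[mu], so no logical topology has to be declared in QMP.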

Comments?

@jpfoley
Member

jpfoley commented Jan 31, 2012

I recently became aware of this issue when trying to integrate Quda with MILC.
I think the mapping function you suggest would be really helpful.
J.

@maddyscientist
Member

All sounds reasonable to me. How much work is this?

@rbabich
Member Author

rbabich commented Jan 31, 2012

This is easy, I think, but I want Balint and Guochun to sign off first, since it requires corresponding changes to Chroma and MILC.

@rbabich
Member Author

rbabich commented Jan 31, 2012

Note that the user-supplied func() can be a simple wrapper to QMP_get_node_number_from(coords) for anyone who wants to keep doing things the old way. This might be a good option for Chroma.

@maddyscientist
Member

Should this be 0.4.0? Wouldn't 0.4.1 be more appropriate?

@ghost assigned rbabich Jul 16, 2012

@maddyscientist
Member

Adding another reason for this: it makes multi-GPU support in QUDA for BQCD much less hacky (issue 73). To enable it in BQCD currently, I have to expose a comm_set_gridsize interface to the outside world so that BQCD can communicate its MPI topology to QUDA.

@rbabich
Member Author

rbabich commented Feb 27, 2013

I'm about to push a commit that implements this. From quda.h:

/**
 * initCommsGridQuda() takes an optional "rank_from_coords" argument that
 * should be a pointer to a user-defined function with this prototype.  
 *
 * @param coords  Node coordinates
 * @param fdata   Any auxiliary data needed by the function
 * @return        MPI rank or QMP node ID corresponding to the node coordinates
 *
 * @see initCommsGridQuda
 */
typedef int (*QudaCommsMap)(const int *coords, void *fdata);

/**
 * Declare the grid mapping ("logical topology" in QMP parlance)
 * used for communications in a multi-GPU grid.  This function
 * should be called prior to initQuda().  The only case in which
 * it's optional is when QMP is used for communication and the
 * logical topology has already been declared by the application.
 *
 * @param nDim   Number of grid dimensions.  "4" is the only supported
 *               value currently.
 *
 * @param dims   Array of grid dimensions.  dims[0]*dims[1]*dims[2]*dims[3]
 *               must equal the total number of MPI ranks or QMP nodes.
 *
 * @param func   Pointer to a user-supplied function that maps coordinates
 *               in the communication grid to MPI ranks (or QMP node IDs).
 *               If the pointer is NULL, the default mapping depends on
 *               whether QMP or MPI is being used for communication.  With
 *               QMP, the existing logical topology is used if it's been
 *               declared.  With MPI or as a fallback with QMP, the default
 *               ordering is lexicographical with the fourth ("t") index
 *               varying fastest.
 *
 * @param fdata  Pointer to any data required by "func" (may be NULL)               
 *
 * @see QudaCommsMap
 */
void initCommsGridQuda(int nDim, const int *dims, QudaCommsMap func, void *fdata);
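
As a rough illustration of the default ordering described in the comment above (lexicographical with the fourth, "t", index varying fastest), the sketch below writes that mapping as an explicit QudaCommsMap function. The name lex_rank_t_fastest and the convention that fdata points to the four grid extents are assumptions made for this example; this is one reading of the documented default, not QUDA's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Prototype from quda.h as quoted above. */
typedef int (*QudaCommsMap)(const int *coords, void *fdata);

/* Hypothetical explicit form of the documented default: lexicographic order
   with coords[3] ("t") varying fastest. Assumes fdata points to int dims[4]. */
int lex_rank_t_fastest(const int *coords, void *fdata)
{
  const int *dims = (const int *)fdata;
  assert(coords != NULL && dims != NULL);

  int rank = coords[0];
  for (int i = 1; i < 4; i++) {
    rank = rank * dims[i] + coords[i];  /* later indices vary faster */
  }
  return rank;
}
```

Under these assumptions, an application holding an int dims[4] array could call initCommsGridQuda(4, dims, lex_rank_t_fastest, dims) before initQuda(), and swap in a different function to get a different layout.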
