Allow user to specify logical topology for multi-GPU communications #46

rbabich · 2012-01-31T03:19:51Z

At present, to properly run an application built with QUDA over QMP, it's necessary to specify "-geom Px Py Pz Pt" on the command line. This is awkward when the application has built-in logic to determine the best layout, and it is also incompatible with QDP/C, as summarized by James Osborn:

One issue with interfacing multi-GPU to QDP at the moment will be that
QDP isn't currently setting the logical topology. This was changed to
support multi-lattice in QDP, since one might not want the same node
mapping on each lattice, and QMP didn't have communicator support. Now
that QMP does, I could create a new communicator for each lattice and
set each's topology, but my concern is that MPI communicators could be
expensive in memory and I don't want to rely on this. I'm planning to
add some sort of light-weight communicators to QMP to address this.

Another issue is that the QMP topology has its own fixed mapping of the
ranks to the logical topology, which may not be optimal. Right now QDP
is using a different mapping which was a little better in some cases. I
am also planning to allow the QMP mapping to be more flexible, but
haven't gotten to this yet.

Anyway, the main point is that it would be nice if QUDA didn't rely on
the QMP topology, but instead allowed the user to pass in a function (or
functions) that specified the rank->coords and coords->rank mappings.
That would allow much greater flexibility for the applications using
QUDA. Additionally, allowing a QMP communicator to be specified would be
even better. You said that some groups may want to port QMP and not use
communicators, but it should be possible for those ports to still keep
the same API (with the communicator structure) and just have it always
be the same one (basically make QMP_comm_split always fail).

At this stage, I'd suggest not going so far as to rely on QMP communicators, which are still an "alpha" feature, but allowing the user to pass in a mapping function seems like a nice solution. This would also add much-needed flexibility to the MPI code path, which currently assumes a simple lexicographical ordering when assigning logical grid coordinates to MPI ranks.

To summarize, I propose replacing this declaration:

 void initCommsQuda(int argc, char **argv, const int *X, const int nDim);

with:

 typedef int (*QudaCommsMap)(const int *x, void *fdata);
 void initCommsQuda(const int *X, const int nDim, QudaCommsMap func, void *fdata);

Here fdata points to any auxiliary data required by the user-supplied mapping function func(). Passing NULL for fdata is perfectly valid. As an implementation detail, note that since we'll no longer be able to assume the existence of a QMP logical topology, we'll have to eliminate the use of "relative" sends and receives in face_qmp.cpp. This is a minor inconvenience but again quoting James Osborn:

The relative sends were just a cached version of the calculation of (get my coords) -> (add 1 mod length) -> (get rank). They aren't necessary (and were never used by QDP), since you can just create the neighbor table yourself and use the regular send.
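To make the neighbor-table idea concrete, here is a minimal sketch of the (get my coords) -> (add 1 mod length) -> (get rank) calculation done once up front, using a mapping function with the proposed QudaCommsMap prototype. The names NeighborTable, build_neighbor_table, and lex_rank are hypothetical and chosen only for illustration; nothing here is taken from face_qmp.cpp.

```c
#include <assert.h>
#include <stddef.h>

#define NDIM 4  /* the communication grid is assumed to be four-dimensional */

/* Same prototype as the proposed QudaCommsMap: grid coordinates -> rank. */
typedef int (*QudaCommsMap)(const int *x, void *fdata);

/* Hypothetical table of neighbor ranks, one pair per dimension. */
typedef struct {
  int fwd[NDIM];  /* rank of the node at coords[d] + 1 (periodic) */
  int bwd[NDIM];  /* rank of the node at coords[d] - 1 (periodic) */
} NeighborTable;

/* Example mapping for illustration only: lexicographic, last index fastest.
   fdata must point to the NDIM grid dimensions. */
static int lex_rank(const int *x, void *fdata)
{
  const int *dims = (const int *)fdata;
  int rank = 0;
  for (int d = 0; d < NDIM; d++) rank = rank * dims[d] + x[d];
  return rank;
}

/* Fill the table by shifting one coordinate at a time and asking the
   user-supplied mapping for the rank at the shifted position. */
static void build_neighbor_table(const int *my_coords, const int *dims,
                                 QudaCommsMap func, void *fdata,
                                 NeighborTable *table)
{
  assert(func != NULL && table != NULL);
  for (int d = 0; d < NDIM; d++) {
    int x[NDIM];
    for (int i = 0; i < NDIM; i++) x[i] = my_coords[i];

    x[d] = (my_coords[d] + 1) % dims[d];            /* forward neighbor */
    table->fwd[d] = func(x, fdata);

    x[d] = (my_coords[d] + dims[d] - 1) % dims[d];  /* backward neighbor */
    table->bwd[d] = func(x, fdata);
  }
}
```

With a table like this, the ranks in fwd[] and bwd[] can be handed to the regular (non-relative) send and receive calls, so no QMP logical topology is needed.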

Comments?


jpfoley · 2012-01-31T03:59:13Z

I recently became aware of this issue when trying to integrate Quda with MILC.
I think the mapping function you suggest would be really helpful.
J.


maddyscientist · 2012-01-31T13:59:55Z

All sounds reasonable to me. How much work is this?


rbabich · 2012-01-31T16:09:26Z

This is easy, I think, but I want Balint and Guochun to sign off first, since it requires corresponding changes to Chroma and MILC.

rbabich · 2012-01-31T16:19:34Z

Note that the user-supplied func() can be a simple wrapper to QMP_get_node_number_from(coords) for anyone who wants to keep doing things the old way. This might be a good option for Chroma.
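A wrapper of that kind might look roughly like the sketch below. The adapter name rank_from_qmp_topology and the USE_QMP_HEADER flag are hypothetical. The stand-in for QMP_get_node_number_from() exists only so the sketch compiles without QMP installed: its four-node grid and ordering are made up and say nothing about how QMP actually maps coordinates.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_QMP_HEADER  /* hypothetical build flag, not a QUDA or QMP macro */
#include <qmp.h>
#else
/* Stand-in so the sketch builds without QMP. The real function consults
   the logical topology declared by the application. */
static int QMP_get_node_number_from(const int *coords)
{
  static const int dims[4] = {1, 1, 2, 2};  /* made-up 4-node grid */
  int node = 0;
  for (int d = 3; d >= 0; d--) node = node * dims[d] + coords[d];
  return node;
}
#endif

/* Prototype proposed in this issue. */
typedef int (*QudaCommsMap)(const int *x, void *fdata);

/* Adapter with the QudaCommsMap prototype. QMP already knows the topology,
   so fdata is unused and NULL can be passed for it. */
static int rank_from_qmp_topology(const int *coords, void *fdata)
{
  (void)fdata;
  assert(coords != NULL);
  return QMP_get_node_number_from(coords);
}
```

The application would pass rank_from_qmp_topology as func and NULL as fdata, which keeps the old QMP-driven behavior while going through the new interface.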

maddyscientist · 2012-02-01T20:57:47Z

Should this be 0.4.0? Wouldn't 0.4.1 be more appropriate?

maddyscientist · 2012-08-07T00:10:57Z

Adding another reason for this: it makes multi-GPU in QUDA for BQCD much less hacky (issue 73). To enable support for it in BQCD currently, I have to add a comm_set_gridsize interface to the outside world so that BQCD can communicate its MPI topology to QUDA.

rbabich · 2013-02-27T22:19:00Z

I'm about to push a commit that implements this. From quda.h:

/**
 * initCommsGridQuda() takes an optional "rank_from_coords" argument that
 * should be a pointer to a user-defined function with this prototype.  
 *
 * @param coords  Node coordinates
 * @param fdata   Any auxiliary data needed by the function
 * @return        MPI rank or QMP node ID corresponding to the node coordinates
 *
 * @see initCommsGridQuda
 */
typedef int (*QudaCommsMap)(const int *coords, void *fdata);

/**
 * Declare the grid mapping ("logical topology" in QMP parlance)
 * used for communications in a multi-GPU grid.  This function
 * should be called prior to initQuda().  The only case in which
 * it's optional is when QMP is used for communication and the
 * logical topology has already been declared by the application.
 *
 * @param nDim   Number of grid dimensions.  "4" is the only supported
 *               value currently.
 *
 * @param dims   Array of grid dimensions.  dims[0]*dims[1]*dims[2]*dims[3]
 *               must equal the total number of MPI ranks or QMP nodes.
 *
 * @param func   Pointer to a user-supplied function that maps coordinates
 *               in the communication grid to MPI ranks (or QMP node IDs).
 *               If the pointer is NULL, the default mapping depends on
 *               whether QMP or MPI is being used for communication.  With
 *               QMP, the existing logical topology is used if it's been
 *               declared.  With MPI or as a fallback with QMP, the default
 *               ordering is lexicographical with the fourth ("t") index
 *               varying fastest.
 *
 * @param fdata  Pointer to any data required by "func" (may be NULL)               
 *
 * @see QudaCommsMap
 */
void initCommsGridQuda(int nDim, const int *dims, QudaCommsMap func, void *fdata);
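As an illustration of how an application might use the func and fdata arguments, here is a sketch of a mapping that orders ranks lexicographically with the first ("x") index varying fastest, the reverse of the documented default. The GridInfo struct and rank_x_fastest are hypothetical names. The typedef is repeated so the sketch stands alone; real code would include quda.h instead.

```c
#include <assert.h>
#include <stddef.h>

/* Repeated from quda.h so the sketch is self-contained. */
typedef int (*QudaCommsMap)(const int *coords, void *fdata);

/* Hypothetical auxiliary data handed to the mapping through fdata. */
typedef struct {
  int ndim;
  int dims[4];
} GridInfo;

/* Lexicographic ordering with the first ("x") index varying fastest. */
static int rank_x_fastest(const int *coords, void *fdata)
{
  const GridInfo *grid = (const GridInfo *)fdata;
  assert(grid != NULL);
  int rank = 0;
  for (int d = grid->ndim - 1; d >= 0; d--) {
    assert(coords[d] >= 0 && coords[d] < grid->dims[d]);
    rank = rank * grid->dims[d] + coords[d];
  }
  return rank;
}

/* With quda.h included, the call would then look something like:
 *
 *   GridInfo grid = {4, {2, 2, 1, 1}};
 *   initCommsGridQuda(grid.ndim, grid.dims, rank_x_fastest, &grid);
 *
 * followed by initQuda() as usual.
 */
```

Carrying the grid dimensions in fdata keeps the mapping free of global state, which is the purpose of the extra pointer.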

rbabich mentioned this issue Jan 31, 2012

Chroma segfault with current quda/master #45

Closed

maddyscientist mentioned this issue Jan 31, 2012

Multi-gpu device selection #48

Closed

ghost assigned rbabich Jul 16, 2012

rbabich closed this as completed in 2a72aa8 Feb 27, 2013
