
Inconsistent & Perhaps buggy dynamic tuning behavior #7672

Open
wckzhang opened this issue Apr 29, 2020 · 6 comments

@wckzhang
Contributor

wckzhang commented Apr 29, 2020

Dynamic tuning files are organized in this format:

1             # Number of collectives
1             # Collective ID
1             # Number of com sizes
2             # Com size
2             # Number of message sizes
0 1 0 0       # Message size 0, algorithm 1, topo and segmentation at 0
1024 2 0 0    # Message size 1024, algorithm 2, topo and segmentation at 0
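
The lookup this file implies can be sketched as follows (illustrative Python, not the actual OMPI parser; the nesting of collective ID → comm size → message-size thresholds mirrors the file above):

```python
# Illustrative sketch of the dynamic-rule lookup implied by the file format
# above. The data structure and function names are hypothetical, not OMPI code.
rules = {
    1: {                        # collective ID
        2: [                    # comm size
            (0,    (1, 0, 0)),  # message size -> (algorithm, topo, segmentation)
            (1024, (2, 0, 0)),
        ],
    },
}

def lookup(coll_id, comm_size, msg_size):
    """Return the (algorithm, topo, segmentation) tuple for the largest
    message-size threshold that does not exceed msg_size."""
    best = None
    for threshold, params in rules[coll_id][comm_size]:
        if msg_size >= threshold:
            best = params
    return best
```

With the example rules, a 512-byte message selects algorithm 1 and a 2048-byte message selects algorithm 2.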

However, the actual definition of "Message size" is vague and dependent on the collective in question (https://github.com/open-mpi/ompi/blob/master/ompi/mca/coll/tuned/coll_tuned_decision_dynamic.c)

The following table represents the behavior of each collective's interpretation of "Message size"

| Collective MPI Call | Message Size in coll tuned |
| --- | --- |
| MPI_Allreduce | datatype size * number of elements in send buffer |
| MPI_Alltoall | datatype size * comm size * number of elements to send to each process |
| MPI_Alltoallv | 0 |
| MPI_Barrier | 0 |
| MPI_Bcast | datatype size * number of entries in buffer |
| MPI_Reduce | datatype size * number of elements in send buffer |
| MPI_Reduce_scatter | datatype size * sum of number of elements in result distributed to each process (sum of recvcounts) |
| MPI_Reduce_scatter_block | datatype size * comm size * element count per block |
| MPI_Allgather | datatype size * comm size * number of elements in send buffer |
| MPI_Allgatherv | datatype size * sum of number of elements to be received from each process (sum of recvcounts) |
| MPI_Gather | datatype size * comm size |
| MPI_Scatter | datatype size * comm size |
| MPI_Exscan | datatype size * comm size |
| MPI_Scan | datatype size * comm size |
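
The table can be summarized as a small dispatch function (an illustrative sketch, not OMPI code; the function name and argument shapes are assumptions made for clarity):

```python
# Sketch of how coll/tuned derives the "message size" used for the dynamic-rule
# lookup, per the table above. Hypothetical helper, not the actual OMPI code.
def message_size(collective, dtype_size, count, comm_size, recvcounts=None):
    if collective in ("allreduce", "reduce", "bcast"):
        return dtype_size * count
    if collective in ("alltoall", "allgather", "reduce_scatter_block"):
        return dtype_size * comm_size * count
    if collective in ("alltoallv", "barrier"):
        return 0  # no size available (or not applicable)
    if collective in ("reduce_scatter", "allgatherv"):
        return dtype_size * sum(recvcounts)
    if collective in ("gather", "scatter", "exscan", "scan"):
        # Element count is ignored entirely -- the inconsistency raised here.
        return dtype_size * comm_size
    raise ValueError(f"unknown collective: {collective}")
```

Note how the last group discards the element count, so two gathers of wildly different sizes map to the same rule.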

There are a couple of issues with how this is done. First, given the way the decision file is parsed, each message size already has a corresponding comm size, so there's no reason to include comm size in the calculation; it gives us no additional data. Second, I don't see how the MPI_Gather implementation's interpretation of message size as datatype size * comm size is correct in any way.

We should have a more consistent definition of what message size means for our dynamic tuning file.

@bosilca
Member

bosilca commented Apr 30, 2020

The decision is made based on two principles: use a globally known amount so that the decision is deterministic across all processes, and use an amount that reflects how much data each process is supposed to receive, without taking the algorithm into account. This explains why for allgather we use comm size (because at the end each process will need to receive that amount), but we do not use the comm size for allreduce. Scan and Exscan are outliers that were copy/pasted from another algorithm without taking the above considerations into account.

@wckzhang
Contributor Author

@bosilca Can you explain why datatype size * comm size is used for MPI_Gather? I don't really understand why datatype size matters without the number of elements.

@wckzhang
Contributor Author

It's not that important whether comm size is used for each collective or not, but MPI_Gather just seems broken: it can only ever be tuned based on communicator size, never on the number of elements being sent.

@bosilca
Member

bosilca commented May 1, 2020

The comm size is an extremely important tuning parameter, as most of the papers highlight. But you are right, the number of elements should have been part of the gather decision.

@wckzhang
Contributor Author

wckzhang commented May 1, 2020

I do think comm size is important, but that information is already contained in the file format (since every message size already belongs to a comm size), so including it in the calculation seems a little extraneous; it's not a big deal, though. The datatype size * comm size calculation we should just change; that seems like a bug we should fix.

@wckzhang
Contributor Author

#7758 resolves this issue for gather/scatter
