
v5.0.x: Fix handling of large data in MPI_Sendrecv_replace #8896


Merged

Conversation

ggouaillardet
Contributor

No description provided.

Because MPI_Sendrecv_replace() uses PMPI_Sendrecv() with MPI_PACKED
under the hood, the size of the data to be exchanged
(size = MPI_Type_size(datatype) * count) must fit in a signed integer.
Otherwise, PMPI_Sendrecv()
 - fails with an error message if (int)size < 0
 - silently truncates the data if (int)size >= 0

Refs. open-mpi#8862

Thanks Jakub Benda for reporting this issue and suggesting a fix.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 6a11873)
and fix a comment

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 0b38190)
@ggouaillardet ggouaillardet added this to the v5.0.0 milestone Apr 30, 2021
@ggouaillardet ggouaillardet changed the title Fix handling of large data in MPI_Sendrecv_replace v5.0.x: Fix handling of large data in MPI_Sendrecv_replace Apr 30, 2021
@awlauria
Contributor

bot:aws:retest

Do ignore the status returned by opal_convertor_pack(),
since it is *not* MPI_SUCCESS in case of success.

This fixes a regression introduced in open-mpi/ompi@0b38190

Refs. open-mpi#8907

Thanks Lisandro Dalcin for reporting this issue.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 34b764c)
@ggouaillardet
Contributor Author

@jsquyres @bwbarrett I issued #8908 in order to fix a regression (which I introduced, shame on me for that).
That PR has yet to be reviewed and merged.

This current PR previously contained two cherry-picks for the same issue (MPI_Sendrecv_replace()), and I pushed yet another cherry-pick to it.
The last cherry-pick is from a commit that has yet to be merged into master, so I would have expected the bot to complain about it ... but it did not.

If this is not the intended behavior, could you please have a look at this?

@gpaulsen gpaulsen requested a review from bosilca May 6, 2021 13:08
@gpaulsen
Member

gpaulsen commented May 6, 2021

Does this prevent the ability for the PMPI_ layer to intercept MPI_Sendrecv ?

@ggouaillardet
Contributor Author

The initial code called PMPI_Sendrecv() (note the leading P in PMPI), so this call was not "trappable" by the PMPI layer. (Instead, the end user could only trap MPI_Sendrecv_replace() itself.)

@markalle
Contributor

I agree this is more in line with how everything else in OMPI seems to be done. Is the reason for the MCA_PML_CALL(irecv(...)) that it allows large int arguments that the corresponding MPI calls don't? Elsewhere I've had concerns about the directness of these calls, e.g., some of the collectives have real functionality at their top level that gets skipped when the collective is called analogously to the MCA_PML_CALL above. But those concerns are fairly isolated and don't relate to this commit.

@gpaulsen gpaulsen merged commit cb26046 into open-mpi:v5.0.x May 13, 2021