Backport/datatype #8837

Merged: 6 commits, Apr 22, 2021

@bosilca (Member) commented Apr 22, 2021

Brings some datatype engine updates into the 4.1 branch. The most critical of these is the fix for the partial unpack bug from #8466.

Here are the commits from master that are covered by this PR:
e8ebe13
9901325
ef28e8d
73d64cb
fb07960

Note that this PR does not bring support for MPI_LONG and MPI_UNSIGNED_LONG in external32, because that would have required breaking the ABI (two new datatype #defines would have to be added).

Unfortunately, I had to import 2 additional commits in order to be able to build and run on an M1: 4f2dde0 and 73aae14.

Fixes #8466.

One of these commits is intentionally not a cherry pick: bot:notacherrypick

bosilca and others added 5 commits April 22, 2021 02:01
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit e8ebe13)
Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
(cherry picked from commit 9901325)
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit ef28e8d)
This commit fixes the support for heterogeneous environments, and
specifically for external32. The root cause was that, during the datatype
optimization process, types that are contiguous in memory are collapsed
together in order to decrease the number of conversion (or memcpy)
function calls. The resulting type, however, does not have the same
conversion rules as the types it replaced, leading to an incorrect (or
absent) conversion in some cases. This patch sets a flag on datatypes
whose types have been collapsed during the optimization process,
allowing the convertor to detect whether the optimized type can be used in
heterogeneous setups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 73d64cb)
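
The flagging idea in the commit message above can be illustrated with a small, self-contained sketch. This is not Open MPI's code: `desc_t`, `DESC_FLAG_COLLAPSED`, `optimize`, and `pick_description` are hypothetical names chosen for illustration, and the "optimizer" here only merges entries into one run. It assumes that merging entries of different element sizes degrades them to a plain byte run, which loses the per-element conversion rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not Open MPI's internal API. */
#define MAX_ENTRIES 4
enum { DESC_FLAG_COLLAPSED = 0x1 }; /* distinct types were merged into bytes */

typedef struct {
    size_t elem_size; /* bytes per basic element (drives byte swapping) */
    size_t count;     /* number of elements */
} desc_entry_t;

typedef struct {
    desc_entry_t entries[MAX_ENTRIES];
    size_t       used;
    unsigned     flags;
} desc_t;

/* Merge a fully contiguous description into a single entry. If the merged
 * entries do not share one element size, fall back to a byte run and set
 * the flag, because the per-element conversion rule is no longer known. */
static desc_t optimize(const desc_t *in)
{
    desc_t out = { .used = 1, .flags = 0 };
    size_t bytes = 0, elems = 0;
    int uniform = 1;

    assert(in->used > 0 && in->used <= MAX_ENTRIES);
    for (size_t i = 0; i < in->used; i++) {
        bytes += in->entries[i].elem_size * in->entries[i].count;
        elems += in->entries[i].count;
        if (in->entries[i].elem_size != in->entries[0].elem_size)
            uniform = 0;
    }
    if (uniform) {
        out.entries[0] = (desc_entry_t){ in->entries[0].elem_size, elems };
    } else {
        out.entries[0] = (desc_entry_t){ 1, bytes };
        out.flags |= DESC_FLAG_COLLAPSED;
    }
    return out;
}

/* A heterogeneous transfer must not use a flagged optimized description. */
static const desc_t *pick_description(const desc_t *full, const desc_t *opt,
                                      int heterogeneous)
{
    if (heterogeneous && (opt->flags & DESC_FLAG_COLLAPSED))
        return full;
    return opt;
}
```

In this sketch a homogeneous transfer still benefits from the single merged entry, while a heterogeneous one falls back to the full description only when the flag says the merge discarded type information.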
When unpacking a partial predefined element, check the boundaries of the
description vector type and adjust the memory pointer accordingly (to
reflect not only when a single basic type was correctly unpacked, but
also when an entire blocklen has been unpacked).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit fb07960)
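
The pointer adjustment described in the commit message above can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: `vec_desc_t`, `unpack_pos_t`, and `element_completed` are hypothetical names, and the layout is assumed to be a simple vector of blocks, each holding `blocklen` elements, with block starts `extent` bytes apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector-like description; not Open MPI's internal layout. */
typedef struct {
    size_t elem_size; /* bytes per basic element */
    size_t blocklen;  /* elements per block */
    size_t extent;    /* bytes from the start of one block to the next */
} vec_desc_t;

typedef struct {
    size_t offset;        /* byte offset of the next element in user memory */
    size_t done_in_block; /* elements already unpacked in the current block */
} unpack_pos_t;

/* Call once a partially received element has been completed and stored.
 * Advancing by elem_size alone is only right inside a block; when the
 * element closes a block, the gap to the next block must be skipped too. */
static void element_completed(const vec_desc_t *d, unpack_pos_t *p)
{
    assert(d->blocklen > 0 && d->extent >= d->blocklen * d->elem_size);

    p->offset += d->elem_size;
    if (++p->done_in_block == d->blocklen) {
        p->offset += d->extent - d->blocklen * d->elem_size;
        p->done_in_block = 0;
    }
}
```

With 4-byte elements, a blocklen of 2, and a 16-byte extent, completing the second element of a block moves the offset from 4 to 16 rather than to 8, which is the block boundary case the commit message calls out.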
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

bot:notacherrypick
@jsquyres (Member)

@open-mpi/ucx Could you guys verify that this fixes this issue for you on v4.1.x ASAP?

@hoopoepg (Contributor)

Confirmed: issue is resolved.
Tested on OMPI v4.1.0 plus the patch from this PR (release build), using the reproducer from the original issue: https://gist.github.com/jayeshkrishna/5d053d3d5bba11359ea2dc82c435c3ea

@jsquyres (Member)

@hoopoepg Could you do me a huge favor (since I don't have access to UCX/IB networks)? Could you try that reproducer on v4.0.0? I'd like to know how far back this issue goes.
