
bosilca commented Aug 5, 2019

A backport of the datatype improvements to the v4.0.x branch.

A few things to mention:

  • I had to pull in a few other commits to resolve some of the conflicts, but the changes are minor and mostly necessary anyway. The 2 commits I imported in addition to the original PR are 4211925 and d141bf7.
  • I had to manually alter one of the patches to work around the atomic types added in 000f9ee, which have not yet been backported to the stable branch.

bosilca added 11 commits August 5, 2019 09:33
Fixes open-mpi#6575.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Move toward a base type of vector (count, type, blocklen, extent, disp),
with disp and extent applying to the count repetitions, and blocklen
describing a contiguous memory region of elements of type type.
Implement 2 optimizations on this description, applied during type_commit:
- collapse: successive similar datatype descriptions are collapsed
together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the
number of resulting memcpy calls during pack/unpack.

Fixes at the OMPI datatype level, including:
 - Fix the create_hindexed and vector creation.
 - Fix the handling of [get|set]_elements and _count.
 - Correctly compute the displacement for block indexed types.
 - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
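To make the vector description and the two commit-time optimizations above more concrete, here is a minimal C sketch. It is not the Open MPI implementation: the `vec_desc_t` struct (with the predefined type reduced to its byte size) and the `optimize_descs()` helper are hypothetical. Collapse merges an element into the previous one when it simply continues the same stride; fusion merges two single-repetition blocks that are adjacent in memory, so pack/unpack needs one memcpy instead of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element: `count` repetitions of `blocklen` contiguous items
 * of one predefined type (`type_size` bytes each). The first repetition
 * starts at `disp`; consecutive repetitions start `extent` bytes apart. */
typedef struct {
    size_t    count;
    size_t    blocklen;
    size_t    type_size;
    ptrdiff_t extent;
    ptrdiff_t disp;
} vec_desc_t;

/* Collapse: b continues a's stride with identical blocks. */
static int can_collapse(const vec_desc_t *a, const vec_desc_t *b) {
    return a->type_size == b->type_size &&
           a->blocklen == b->blocklen &&
           a->extent == b->extent &&
           b->disp == a->disp + (ptrdiff_t)a->count * a->extent;
}

/* Fusion: two single blocks of the same type, b starting where a ends. */
static int can_fuse(const vec_desc_t *a, const vec_desc_t *b) {
    return a->count == 1 && b->count == 1 &&
           a->type_size == b->type_size &&
           b->disp == a->disp + (ptrdiff_t)(a->blocklen * a->type_size);
}

/* Apply both rules left to right, compacting `d` in place.
 * Returns the number of remaining elements. */
size_t optimize_descs(vec_desc_t *d, size_t n) {
    assert(d != NULL || n == 0);
    if (n == 0)
        return 0;
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        vec_desc_t *last = &d[out];
        if (can_collapse(last, &d[i])) {
            last->count += d[i].count;              /* larger count */
        } else if (can_fuse(last, &d[i])) {
            last->blocklen += d[i].blocklen;        /* larger block */
            last->extent = (ptrdiff_t)(last->blocklen * last->type_size);
        } else {
            d[++out] = d[i];                        /* keep as-is */
        }
    }
    return out + 1;
}
```

Collapse is tried before fusion here only as a simple, deterministic ordering; a real engine may weigh the two differently.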
Update the comments to better reflect what is going on.
Minor indentation fixes.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Merge contiguous iovecs in order to minimize the number of returned iovec entries.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
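A minimal sketch of merging contiguous iovec entries, assuming the convertor's output is an array of POSIX `struct iovec`. The `merge_iovecs()` helper is hypothetical: it extends an entry whenever the next one starts exactly where it ends, and compacts the array in place.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Merge adjacent entries of `iov` in place when one ends exactly where
 * the next begins. Returns the new number of entries. */
size_t merge_iovecs(struct iovec *iov, size_t n) {
    assert(iov != NULL || n == 0);
    if (n == 0)
        return 0;
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        char *end = (char *)iov[out].iov_base + iov[out].iov_len;
        if (end == (char *)iov[i].iov_base) {
            iov[out].iov_len += iov[i].iov_len;   /* extend previous entry */
        } else {
            iov[++out] = iov[i];                  /* start a new entry */
        }
    }
    return out + 1;
}
```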
Rework the to_self test so that it can also be used as a benchmark.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
- optimize handling of contiguous with gaps datatypes.
- fixes a performance issue for all datatypes with a count of 1.
- optimize the pack/unpack of contiguous with gaps datatype.
- optimize the case of blocklen == 1

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
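As an illustration of the contiguous-with-gaps and blocklen == 1 cases listed above, here is a hypothetical `pack_vector()` for a layout of `count` repetitions of `blocklen` elements (each `type_size` bytes) placed `extent` bytes apart, assuming a non-negative extent. It is a sketch, not Open MPI's pack code: a gap-free layout becomes a single memcpy, and one 8-byte element per repetition uses a constant-size memcpy that compilers can typically lower to a single load/store pair.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack `count` repetitions of `blocklen` elements of `type_size` bytes,
 * whose starts are `extent` bytes apart, into `dst`. Returns bytes packed. */
size_t pack_vector(char *dst, const char *src, size_t count,
                   size_t blocklen, size_t type_size, size_t extent) {
    size_t block = blocklen * type_size;     /* useful bytes per repetition */
    assert(extent >= block);
    if (extent == block) {                   /* no gaps: a single copy */
        memcpy(dst, src, count * block);
        return count * block;
    }
    if (blocklen == 1 && type_size == sizeof(double)) {
        /* blocklen == 1: fixed-size copy per repetition */
        for (size_t i = 0; i < count; i++)
            memcpy(dst + i * sizeof(double), src + i * extent, sizeof(double));
        return count * sizeof(double);
    }
    for (size_t i = 0; i < count; i++)      /* general contiguous-with-gaps */
        memcpy(dst + i * block, src + i * extent, block);
    return count * block;
}
```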
Upon detecting a datatype loop representation, skip the entire loop
according to the remaining space.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
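One possible reading of skipping an entire loop according to the remaining space is sketched below, assuming each loop iteration covers a fixed number of packed bytes. The `skip_result_t` type and `skip_loop()` helper are hypothetical: instead of walking the loop body element by element, the number of whole iterations that fit is computed with one division and capped at the loop count, so a loop that fits entirely is skipped in a single step.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t iter;      /* whole iterations skipped */
    size_t consumed;  /* packed bytes those iterations cover */
} skip_result_t;

/* Advance over a loop of `count` iterations of `bytes_per_iter` packed
 * bytes each, given `remaining` bytes of space. Only whole iterations
 * are skipped; any leftover is handled element by element elsewhere. */
skip_result_t skip_loop(size_t count, size_t bytes_per_iter, size_t remaining) {
    assert(bytes_per_iter > 0);
    skip_result_t r;
    r.iter = remaining / bytes_per_iter;   /* avoids computing count * size */
    if (r.iter > count)
        r.iter = count;                    /* the entire loop fits */
    r.consumed = r.iter * bytes_per_iter;
    return r;
}
```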
Optimize contiguous loops by collapsing them into a single element.
During datatype optimization collapse similar elements into larger
blocks.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
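A sketch of collapsing a contiguous loop, assuming a loop is described by a repetition count, a block length, an element size, and an extent. The `loop_desc_t` struct and `collapse_contiguous()` helper are hypothetical. When the extent equals the block size in bytes, the repetitions sit back to back, so the loop covers one contiguous region and can be rewritten as a single element.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t    count;      /* repetitions */
    size_t    blocklen;   /* contiguous elements per repetition */
    size_t    type_size;  /* bytes per element */
    ptrdiff_t extent;     /* bytes between repetition starts */
} loop_desc_t;

/* Rewrite a gap-free loop as one repetition of count * blocklen elements.
 * Returns 1 if the descriptor was collapsed, 0 otherwise. */
int collapse_contiguous(loop_desc_t *d) {
    assert(d != NULL);
    if (d->count > 1 &&
        d->extent == (ptrdiff_t)(d->blocklen * d->type_size)) {
        d->blocklen *= d->count;
        d->count = 1;
        d->extent = (ptrdiff_t)(d->blocklen * d->type_size);
        return 1;
    }
    return 0;
}
```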
It is amazing how bad instruction scheduling can have such a drastic impact
on code performance. With this change, we get a boost of at least
50% on the performance of data with a small blocklen and/or count.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Start optimizing the code.

This commit divides the operations into 2 parts: the first, outside the
critical path, deals with partial blocks of predefined elements, and the
second, inside the critical path, deals only with full blocks of
elements. This reduces the number of expensive operations in the
critical path and results in a decent performance increase.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
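The split between partial and full blocks can be sketched as follows for a strided layout with `block` useful bytes every `extent` bytes. The `pack_from()` helper and its parameters are hypothetical, not the Open MPI convertor API. The leading and trailing partial blocks are handled before and after the main loop, so the hot loop copies only full blocks and carries no partial-block bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack `len` bytes from a strided layout (`block` useful bytes every
 * `extent` bytes), starting `pos` packed bytes into the layout. */
size_t pack_from(char *dst, const char *src, size_t block, size_t extent,
                 size_t pos, size_t len) {
    assert(block > 0 && extent >= block);
    size_t idx = pos / block;       /* block containing pos */
    size_t off = pos % block;       /* offset inside that block */
    size_t done = 0;

    if (off != 0) {                 /* 1. leading partial block */
        size_t n = block - off;
        if (n > len)
            n = len;
        memcpy(dst, src + idx * extent + off, n);
        done += n;
        idx++;
    }
    while (len - done >= block) {   /* 2. critical path: full blocks only */
        memcpy(dst + done, src + idx * extent, block);
        done += block;
        idx++;
    }
    if (done < len) {               /* 3. trailing partial block */
        memcpy(dst + done, src + idx * extent, len - done);
        done = len;
    }
    return done;
}
```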

gpaulsen commented Aug 5, 2019

Thanks @bosilca!

jjhursey commented Aug 5, 2019

bot:ibm:gnu:retest

@open-mpi open-mpi deleted a comment from ibm-ompi Aug 5, 2019
@hppritcha hppritcha added the NEWS label Aug 5, 2019
@hppritcha hppritcha added this to the v4.0.2 milestone Aug 5, 2019

gpaulsen commented Aug 5, 2019

Fixes: #5540
Master PR: #6695

@gpaulsen gpaulsen changed the title Topic/backport 6695 Refresh of the datatype engine from Topic/backport 6695 Aug 5, 2019

gpaulsen commented Aug 8, 2019

@derbeyn @ggouaillardet Can either of you please review this v4.0.x backport of PR #6695?

@gpaulsen gpaulsen requested a review from ggouaillardet August 8, 2019 21:41

gpaulsen commented Aug 8, 2019

Hmm. It won't let me request a review from @derbeyn even though she reviewed the master PR.

@gpaulsen

@ggouaillardet Are you able to review this PR? Once this goes in, we may be able to create a v4.0.2 rc1.

Fix the convertor iovec description for MPI-IO, as reported by Edgar.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
No code or logic changes.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@hppritcha

bot:ompi:retest

@gpaulsen

@ggouaillardet Can you please review this? This is the only PR blocking a v4.0.2 rc1 build.
Thanks!

@gpaulsen gpaulsen merged commit 390e0bc into open-mpi:v4.0.x Aug 21, 2019

derbeyn commented Aug 26, 2019 via email
