Skip to content

Conversation

@xinzhao3
Copy link
Contributor

This PR fixes the bug in alltoall algorithm in oshmem/mca/scoll/basic. When it calculates dst PE idx, it does index permutation to achieve better distribution of traffic (get_dst_pe() function), therefore, the src block index should be re-calculated from that new dst PE index, not starting from 0.

Signed-off-by: Xin Zhao xinz@mellanox.com

@xinzhao3
Copy link
Contributor Author

@jladd-mlnx @bureddy

@jsquyres
Copy link
Member

bot:ompi:retest

@jladd-mlnx jladd-mlnx requested a review from alex-mikheev March 28, 2018 15:38
@jladd-mlnx
Copy link
Member

@alex-mikheev Please take a look.

@jladd-mlnx
Copy link
Member

@gpaulsen FYI.

@sam6258
Copy link
Contributor

sam6258 commented Mar 28, 2018

@xinzhao3 I pulled down your changes and the test case that produced the original issue is running smoothly!

@xinzhao3
Copy link
Contributor Author

@sam6258 great! Thanks.

Copy link
Contributor

@alex-mikheev alex-mikheev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Please also add a test to the oshmem verifier at https://github.com/openshmem-org/tests-mellanox


dst_pe = get_dst_pe(group, src_blk_idx, dst_blk_idx);
dst_pe = get_dst_pe(group, i, dst_blk_idx);
src_blk_idx = oshmem_proc_group_find_id(group, dst_pe);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling oshmem_proc_group_find_id should avoided as it does a linear lookup.
the index should be (dst_blk_idx + src_blk_idx) % group->proc_count and it is already computed in get_dst_pe()

@xinzhao3
Copy link
Contributor Author

@alex-mikheev I re-pushed, deleting oshmem_proc_group_find_id(). @sam6258 could you test the branch again?

for (i = 0; i < group->proc_count; i++) {

dst_pe = get_dst_pe(group, src_blk_idx, dst_blk_idx);
dst_pe = get_dst_pe(group, i, dst_blk_idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not correct now. I suggest modifying get_dst_pe so that you get back (dst_pe, dst_pe_index) where
dst_pe_index = (dst_blk_idx + src_blk_idx) % group->proc_count;
dst_pe = oshmem_proc_pe(group->proc_array[dst_pe_index])

Signed-off-by: Xin Zhao <xinz@mellanox.com>
@xinzhao3 xinzhao3 force-pushed the topic/shmem-alltoall branch from 3ce5fb0 to af32c30 Compare March 29, 2018 20:03
@xinzhao3
Copy link
Contributor Author

@alex-mikheev I modified get_dst_pe(), could you review again?

@sam6258
Copy link
Contributor

sam6258 commented Mar 30, 2018

@xinzhao3 My test still passes. Thanks!

@jladd-mlnx jladd-mlnx merged commit 0e6966f into open-mpi:master Apr 3, 2018
@jladd-mlnx
Copy link
Member

@xinzhao3 please open a PR against 3.0.x and 3.1.x branches.

@xinzhao3
Copy link
Contributor Author

xinzhao3 commented Apr 3, 2018

@jladd-mlnx will do today asap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants