@kawashima-fj
Member

@hjelmn Could you review?

Corresponding master PR: #3489.

(cherry picked from commit e453e42)

`ompi_group_t::grp_proc_pointers[i]` may hold sentinel values even
for processes that reside on the local node. The array for
`MPI_COMM_WORLD` is set up before `MPI_INIT` calls
`ompi_proc_complete_init`, which allocates the `ompi_proc_t` objects
for processes on the local node. So applying `ompi_proc_is_sentinel`
to `ompi_group_t::grp_proc_pointers[i]` is not an appropriate way to
determine whether the process resides on a remote node.

This bug sometimes causes an `MPI_ERR_RMA_SHARED` error when
`MPI_WIN_ALLOCATE_SHARED` is called, because the sm OSC component
relies on `ompi_group_have_remote_peers`.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit e453e42)
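
The reasoning in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not Open MPI's actual code: every `toy_`-prefixed name, the tagged-pointer sentinel encoding, and the `rank_is_local` lookup table are hypothetical stand-ins. It contrasts a check that asks "is this entry still a sentinel?" with one that asks "is this process on the local node?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a process object (hypothetical, not ompi_proc_t). */
typedef struct {
    int  rank;
    bool on_local_node;
} toy_proc_t;

/* Assumed encoding: a sentinel is a fake pointer with the low bit set;
 * the remaining bits carry the rank. Real objects are aligned, so their
 * low bit is always clear. */
static bool toy_is_sentinel(const toy_proc_t *p)
{
    return ((uintptr_t)p & 1u) != 0;
}

static toy_proc_t *toy_make_sentinel(int rank)
{
    return (toy_proc_t *)(((uintptr_t)rank << 1) | 1u);
}

static int toy_sentinel_rank(const toy_proc_t *p)
{
    assert(toy_is_sentinel(p));
    return (int)((uintptr_t)p >> 1);
}

/* Flawed check: treats "object not allocated yet" as "remote". */
static bool have_remote_peers_by_sentinel(toy_proc_t *const *procs, int n)
{
    for (int i = 0; i < n; i++) {
        if (toy_is_sentinel(procs[i])) {
            return true;
        }
    }
    return false;
}

/* Locality-based check: resolve each entry to a rank first, then ask
 * whether that rank is on the local node. */
static bool have_remote_peers_by_locality(toy_proc_t *const *procs, int n,
                                          const bool *rank_is_local)
{
    for (int i = 0; i < n; i++) {
        int rank = toy_is_sentinel(procs[i]) ? toy_sentinel_rank(procs[i])
                                             : procs[i]->rank;
        if (!rank_is_local[rank]) {
            return true;
        }
    }
    return false;
}
```

In this model, a two-process group on a single node whose second entry is still a sentinel makes the sentinel-based check report remote peers, while the locality-based check does not. That mismatch is the kind of false positive the commit message describes.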
@kawashima-fj kawashima-fj added this to the v2.1.2 milestone May 16, 2017
@kawashima-fj kawashima-fj requested a review from hjelmn May 16, 2017 02:27
@hjelmn
Member

hjelmn commented May 16, 2017

Ok, much better 👍

@hjelmn
Member

hjelmn commented May 16, 2017

will let it go on master for a day or so before approving branch PRs

@kawashima-fj
Member Author

Yes, master PR #3489 was merged into master 5 days ago and has had no problems in MTT.
(The original PR, #3410, had a problem, though.)

@hjelmn
Member

hjelmn commented May 16, 2017

ok, see that now. will approve these now then

@jsquyres
Member

@bwbarrett @hppritcha Merging this PR means that you also need to merge the v3.0.x equivalent (#3530).

But otherwise: @hppritcha good to go

@jsquyres jsquyres merged commit 8b40bae into open-mpi:v2.x May 18, 2017
@kawashima-fj kawashima-fj deleted the pr/v2.x/group-remote-peers branch May 22, 2017 04:00
