This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Conversation

@ggouaillardet
Contributor

No description provided.

@rhc54

rhc54 commented Jul 28, 2016

If we are going to handle the timeout up in the server, then we probably need to remove it from the pmix base function; otherwise we'll have a race to see who times out first.

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1985/ for details.

@ggouaillardet ggouaillardet force-pushed the topic/v2.x/pmix_typos_and_timeout branch from 5cc1824 to b440a37 Compare July 28, 2016 04:12
@ggouaillardet
Contributor Author

I updated the PR.
Now, if you call OPAL_PMIX_EXCHANGE(...,0), then MPI_Comm_accept() and MPI_Comm_connect() might hang forever.

Under the hood, the timeout is triggered in pmix by libevent, which then invokes the orted eviction callback.
If the timeout is 60 seconds, a timeout event is generated every 2 seconds, and orted checks back into the opal hotel each time until the 60-second timeout expires.
If there is no pmix timeout, the orted eviction callback is never called.
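
To make the timing concrete, here is a rough model of the behavior described above. It is a plain simulation, not Open MPI, pmix, or libevent code: the function name, its parameters, and the returned labels are all made up for illustration, and the 2-second tick is taken from the description in this comment.

```python
def simulate_exchange(timeout_s, tick_s=2, completes_at_s=None):
    """Toy model of the eviction/re-check-in cycle (hypothetical, not real API).

    timeout_s      -- overall timeout; 0 means no timer is armed at all
    tick_s         -- interval between timeout events (2 s per the comment above)
    completes_at_s -- simulated time at which the peer answers, or None if never

    Returns (outcome, evictions), where evictions counts how many times the
    eviction callback ran.
    """
    if timeout_s == 0:
        # No timer: the eviction callback is never invoked, so a request
        # that never completes just waits forever.
        if completes_at_s is not None:
            return ("completed", 0)
        return ("hangs", 0)

    elapsed = 0
    evictions = 0
    while True:
        next_tick = elapsed + tick_s
        # The request finishes before the next timeout event fires.
        if completes_at_s is not None and completes_at_s <= next_tick:
            return ("completed", evictions)
        # Timeout event fires and calls the eviction callback.
        elapsed = next_tick
        evictions += 1
        if elapsed >= timeout_s:
            return ("timed out", evictions)
        # Otherwise the request is checked back in and we wait for the next tick.


if __name__ == "__main__":
    print(simulate_exchange(60))                    # nobody ever answers
    print(simulate_exchange(60, completes_at_s=5))  # peer answers after 5 s
    print(simulate_exchange(0))                     # no timeout requested
```

In this model a 60-second timeout means the callback runs 30 times before giving up, while a timeout of 0 never runs it, which is the case where accept/connect can wait indefinitely.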

I do not understand what you mean by a race condition.
The pmix timeout calls the orted "timeout" directly (a simple function call, not through libevent).

@ggouaillardet
Contributor Author

Refs open-mpi/ompi#1905

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1986/ for details.

@rhc54 rhc54 added the bug label Jul 28, 2016
@rhc54 rhc54 added this to the v2.0.1 milestone Jul 28, 2016
@rhc54 rhc54 self-assigned this Jul 28, 2016
@rhc54

rhc54 commented Jul 28, 2016

You are absolutely correct - my bad. I was mistakenly thinking that we also did a timeout event down in the opal layer.

Can you please bring the relevant parts back to master as well?

Thanks!

👍

@jsquyres
Member

Please comment back here on this PR after this stuff has been committed to master and gone through a night of MTT.

@rhc54

rhc54 commented Jul 28, 2016

went into master now

@jsquyres
Member

@hppritcha Good to go.

@hppritcha hppritcha merged commit 670858a into open-mpi:v2.x Aug 1, 2016
6 participants