@tkordenbrock
Member

@tkordenbrock tkordenbrock commented Jul 25, 2017

This PR adds a timeout to each fragment of a rendezvous get. If any fragment times out or fails, the entire receive fails.

(cherry picked from master commits 06b15ce, 99453e6, 37766d7, 5ecd905)

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>

plesn and others added 4 commits July 25, 2017 09:45
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
If a frag cannot be retried because the ni_fail_type is other than
PTL_NI_DROPPED, then set the return type and jump to callback_error.
This sets MPI_ERROR and completes the receive.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
Rearrange the receive frag timeout logic to avoid calling
opal_timer_base_get_usec() in read_msg().  Instead set it at the first
retry.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
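
For readers skimming the commit messages above, here is a minimal sketch of the per-fragment decision they describe: a non-dropped failure fails the receive immediately, a dropped fragment is retried, and the timeout clock starts at the first retry rather than when the read is issued. This is not the code from this PR. Every name in it (`recv_frag_t`, `frag_on_event`, `FAIL_DROPPED`, `FRAG_ERROR`, and so on) is a hypothetical stand-in, `FAIL_DROPPED` merely plays the role of `PTL_NI_DROPPED`, and the caller is assumed to pass in the current time in microseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the Portals4 or Open MPI definitions. */
typedef enum { FAIL_NONE, FAIL_DROPPED, FAIL_OTHER } fail_type_t;
typedef enum { FRAG_DONE, FRAG_RETRY, FRAG_ERROR } frag_action_t;

typedef struct {
    bool     timer_started;     /* false until the first retry */
    uint64_t first_retry_usec;  /* valid only when timer_started is true */
} recv_frag_t;

/*
 * Decide what to do with one fragment after a completion event.
 * now_usec is supplied by the caller (assumed to come from a
 * microsecond clock); timeout_usec is the per-fragment budget.
 */
frag_action_t frag_on_event(recv_frag_t *frag, fail_type_t fail,
                            uint64_t now_usec, uint64_t timeout_usec)
{
    assert(frag != NULL);

    if (fail == FAIL_NONE) {
        return FRAG_DONE;       /* fragment arrived, nothing to retry */
    }
    if (fail != FAIL_DROPPED) {
        return FRAG_ERROR;      /* not retryable: fail the whole receive */
    }
    if (!frag->timer_started) {
        /* First retry: start the clock here, not when the read was issued. */
        frag->timer_started = true;
        frag->first_retry_usec = now_usec;
        return FRAG_RETRY;
    }
    if (now_usec - frag->first_retry_usec >= timeout_usec) {
        return FRAG_ERROR;      /* retried for too long: fail the receive */
    }
    return FRAG_RETRY;
}
```

In this sketch the common case (no failure) never reads the clock, which is the point of deferring the timestamp to the first retry. A caller would treat `FRAG_ERROR` from any fragment as failure of the entire receive.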
Contributor

@regrant regrant left a comment

This is good to go.

@bwbarrett
Member

@tkordenbrock / @regrant , are these cherry picks from master? If so, can you add a note to that effect with the master commit id?

@tkordenbrock
Member Author

@bwbarrett Sorry about that. PR updated with master commits.

@bwbarrett bwbarrett merged commit 6d4ad03 into open-mpi:v3.0.x Aug 3, 2017

4 participants