@wckzhang
Contributor

When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.

Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
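
As a rough illustration of the check described above, the following C sketch assumes the source rank is packed into a fixed number of tag bits. The helper names (`max_ranks_for_bits`, `check_rank_limit`) and the bit widths used with them are hypothetical and are not taken from the Open MPI source; the real bit layout depends on the provider and on how the MTL divides the tag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of distinct ranks that fit when
 * `source_bits` bits of the tag are reserved for the source rank. */
static uint64_t max_ranks_for_bits(unsigned source_bits)
{
    if (source_bits >= 64) {
        return UINT64_MAX;          /* avoid an undefined 64-bit shift */
    }
    return (uint64_t)1 << source_bits;
}

/* Hypothetical check: returns 0 when ranks 0..nprocs-1 can all be
 * encoded in `source_bits` bits, and -1 otherwise. */
static int check_rank_limit(size_t nprocs, unsigned source_bits)
{
    return ((uint64_t)nprocs <= max_ranks_for_bits(source_bits)) ? 0 : -1;
}
```

With an assumed 16-bit source field, `check_rank_limit(65536, 16)` returns 0 and `check_rank_limit(65537, 16)` returns -1, at which point a caller such as add_procs() would return an error instead of continuing.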

@wckzhang
Contributor Author

ompi_mtl_ofi_add_procs returning failure would cause a segfault. Cherry-picked 5cf43de to add a check that prevents the segfault.

@wckzhang
Contributor Author

Cray seems broken, can anybody check it out?

@jsquyres
Member

Cray sometimes just times out. 😦

bot:ompi:retest

Member

@bwbarrett bwbarrett left a comment

@wckzhang this looks like a cherry pick from master? if so, you should add the -x argument to the cherry-pick command line, so that it adds the “cherry-pick from” message in the commit message.

When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.

Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 5cf43de)
@wckzhang
Contributor Author

> @wckzhang this looks like a cherry pick from master? if so, you should add the -x argument to the cherry-pick command line, so that it adds the “cherry-pick from” message in the commit message.

Done, thanks.

@ibm-ompi

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6

@ibm-ompi

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6

@wckzhang
Contributor Author

IBM CI failed, can someone check that?

@jjhursey
Member

bot:ibm:gnu:retest
Somehow the prrte PR got mixed up with this PR. I think I figured out what happened. Let's see if this works.

@jjhursey
Member

bot:ibm:retest

@hppritcha hppritcha self-requested a review January 3, 2020 19:59
@gpaulsen gpaulsen dismissed bwbarrett’s stale review January 13, 2020 20:12

@wckzhang re-cherry-picked with -x

@gpaulsen gpaulsen requested a review from bwbarrett January 13, 2020 20:14
@gpaulsen
Member

Thanks.

@gpaulsen gpaulsen merged commit 3da939b into open-mpi:v4.0.x Jan 13, 2020