MTL/OFI: Check threshold number of peers allowed per rank #7248
ompi_mtl_ofi_add_procs returning failure would cause a segfault. Cherry-picked 5cf43de to add a check for the segfault.
Cray seems broken, can anybody check it out?
Cray sometimes just times out. 😦
bot:ompi:retest
bwbarrett
left a comment
@wckzhang this looks like a cherry-pick from master? If so, you should add the -x argument to the cherry-pick command line, so that it adds the “cherry picked from” message to the commit message.
When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when this limit is crossed.
Check the max allowed number of ranks during add_procs() and return if there is danger of exceeding this threshold.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 5cf43de)
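For readers following along, the check described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code from the commit: the function names, the status codes, and the idea of passing the number of rank bits as a parameter are all assumptions made for the example, and the real bit layout of the OFI tag is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; the real code would use
 * the project's own return values. */
enum { PEER_CHECK_OK = 0, PEER_CHECK_TOO_MANY = -1 };

/* Number of distinct ranks that fit in `rank_bits` bits of a tag
 * (ranks 0 .. 2^rank_bits - 1). Saturates for widths of 64 or more. */
uint64_t max_ranks_for_bits(unsigned rank_bits)
{
    if (rank_bits >= 64) {
        return UINT64_MAX;
    }
    return (uint64_t)1 << rank_bits;
}

/* Illustrative check run while adding peers.
 *
 * nprocs             - total number of ranks that must be addressable
 * has_remote_cq_data - nonzero if the provider supports remote CQ data,
 *                      in which case the rank is not packed into the tag
 * rank_bits_in_tag   - bits reserved for the rank inside the tag
 *                      (assumed value supplied by the caller)
 */
int check_peer_limit(size_t nprocs, int has_remote_cq_data,
                     unsigned rank_bits_in_tag)
{
    if (has_remote_cq_data) {
        /* Rank travels outside the tag, so the tag width is not a limit. */
        return PEER_CHECK_OK;
    }
    if ((uint64_t)nprocs > max_ranks_for_bits(rank_bits_in_tag)) {
        /* Ranks would no longer be representable in the tag: fail early
         * instead of risking unexpected behavior later. */
        return PEER_CHECK_TOO_MANY;
    }
    return PEER_CHECK_OK;
}
```

With an assumed 16-bit rank field, `check_peer_limit(65537, 0, 16)` reports failure while `check_peer_limit(65536, 0, 16)` succeeds. The caller still has to handle the failure return cleanly, which is the segfault concern raised at the top of this PR.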
Done, thanks.
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6
IBM CI failed, can someone check that?
bot:ibm:gnu:retest
bot:ibm:retest
@wckzhang, re-cherry-picked with -x
Thanks.