@hjelmn (Member) commented Jun 30, 2016

This commit fixes a long-standing bug in rdmacm. The thread that calls
mca_btl_openib_endpoint_cpc_complete must hold the endpoint lock, but
rdmacm did not hold it, which causes debug builds to abort.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

@hjelmn (Member Author) commented Jun 30, 2016

@jladd-mlnx The rdmacm fix exposed this long-standing bug. Caught by Jenkins.

@hjelmn hjelmn force-pushed the rdmacm_fix branch 2 times, most recently from bbb321c to 6dcb09e Compare June 30, 2016 19:57
@ibm-ompi

Build Failed with XL compiler! Please review the log, and get in touch if you have questions.

This commit fixes a long-standing bug in rdmacm. The thread that calls
mca_btl_openib_endpoint_cpc_complete must hold the endpoint lock, but
rdmacm did not hold it, which causes debug builds to abort. This change
also required changing mca_btl_openib_endpoint_send_cts so that its
callers must hold the endpoint lock.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
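
The locking contract described in the commit message can be illustrated with a minimal sketch. This is not the Open MPI code: `endpoint_t`, `endpoint_lock`, `endpoint_cpc_complete`, `endpoint_send_cts`, and `on_connection_established` are hypothetical stand-ins built on plain pthreads, and the sketch assumes that the caller both takes and releases the lock, which the commit message does not specify. It only shows the general pattern of a "caller must hold the lock" precondition that a debug build checks with an assertion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an endpoint; not the Open MPI structure. */
typedef struct {
    pthread_mutex_t lock;
    pthread_t owner;   /* meaningful only while locked is true */
    bool locked;
    bool connected;
    bool cts_sent;
} endpoint_t;

static void endpoint_init(endpoint_t *ep)
{
    pthread_mutex_init(&ep->lock, NULL);
    ep->locked = false;
    ep->connected = false;
    ep->cts_sent = false;
}

static void endpoint_lock(endpoint_t *ep)
{
    pthread_mutex_lock(&ep->lock);
    ep->owner = pthread_self();
    ep->locked = true;
}

static void endpoint_unlock(endpoint_t *ep)
{
    ep->locked = false;
    pthread_mutex_unlock(&ep->lock);
}

/* Debug-only ownership check. The result is reliable when the caller
 * really holds the lock; a non-owner reads these fields unsynchronized,
 * which is acceptable only for a sketch. */
static bool endpoint_lock_held_by_caller(const endpoint_t *ep)
{
    return ep->locked && pthread_equal(ep->owner, pthread_self());
}

/* Precondition: the caller holds ep->lock. */
static void endpoint_send_cts(endpoint_t *ep)
{
    assert(endpoint_lock_held_by_caller(ep));
    ep->cts_sent = true;
}

/* Precondition: the caller holds ep->lock. */
static void endpoint_cpc_complete(endpoint_t *ep)
{
    assert(endpoint_lock_held_by_caller(ep));
    ep->connected = true;
}

/* Hypothetical connection-manager event handler. Calling the two
 * functions above without taking the lock first would trip the
 * assertions in a debug build. */
static void on_connection_established(endpoint_t *ep)
{
    endpoint_lock(ep);
    endpoint_cpc_complete(ep);
    endpoint_send_cts(ep);   /* call order is assumed for illustration */
    endpoint_unlock(ep);
}
```

With assertions enabled, a handler that skips `endpoint_lock` aborts at the first precondition check, which mirrors the symptom reported for debug builds. Compiling with `NDEBUG` removes the checks, so the same mistake would go unnoticed in a release build.
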
@ibm-ompi

Build Failed with XL compiler! Please review the log, and get in touch if you have questions.

Gist: https://gist.github.com/95bdd56fe7f803ed1ecde6bc9a777222

@hjelmn (Member Author) commented Jun 30, 2016

@jjhursey Failure seems unrelated to this PR:

[p10a602:182713] listen_thread: accept() failed: Invalid argument (22).
+ RC=1
+ echo 'IBM_CI_FAIL : Run examples'
IBM_CI_FAIL : Run examples

@ibm-ompi

Build Failed with GNU compiler! Please review the log, and get in touch if you have questions.

Gist: https://gist.github.com/2d03072a40e4bff2c07b7a8053ee4365

@jjhursey (Member)

Yeah, I think something odd is going on with the machine. I'm turning off the 'run example' part of the CI test for now while I diagnose. Let's run that again and see if it passes.

bot:ibm:retest

@hjelmn (Member Author) commented Jun 30, 2016

Got past rdmacm! Lost it on the threading bug.

@hjelmn hjelmn merged commit 2cf0e5d into open-mpi:master Jun 30, 2016
@ibm-ompi

Build Failed with XL compiler! Please review the log, and get in touch if you have questions.

Gist: https://gist.github.com/ibm-ompi/2746fecf92fda4d80ff1b80b23edbfc7
