btl/ugni: actually make the endpoint lock recursive #1338
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit open-mpi/ompi@83062db)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
|
:bot:label:bug Missed this when transcribing the patch. |
|
@hjelmn What bug does this solve? |
|
A bug I introduced. If MPI_THREAD_MULTIPLE is on it deadlocks on the first connection. Couldn't test after transcribing because none of our open Cray systems were up. Hit the bug immediately once I got to test. |
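
For readers unfamiliar with this failure mode, here is a minimal sketch in plain POSIX threads, not the btl/ugni code itself. It assumes the deadlock comes from the same thread acquiring the endpoint lock a second time while already holding it, which is what the PR title suggests. The helper `relock_result` is hypothetical and exists only for illustration. It uses an error-checking mutex as a stand-in for the non-recursive case so that the nested lock reports `EDEADLK` instead of hanging the example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_RECURSIVE and friends */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not Open MPI code): lock a mutex of the given type
 * twice from the same thread and return the result of the second lock. */
static int relock_result(int mutex_type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, mutex_type);
    assert(rc == 0);
    rc = pthread_mutex_init(&lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&lock);   /* outer acquisition */
    assert(rc == 0);

    rc = pthread_mutex_lock(&lock);   /* nested acquisition by the same thread */
    if (rc == 0) {
        pthread_mutex_unlock(&lock);  /* release the nested hold */
    }
    pthread_mutex_unlock(&lock);      /* release the outer hold */
    pthread_mutex_destroy(&lock);
    return rc;
}
```

Calling `relock_result(PTHREAD_MUTEX_RECURSIVE)` returns 0 because a recursive mutex lets its owner lock it again, while `relock_result(PTHREAD_MUTEX_ERRORCHECK)` returns `EDEADLK`. With a default (`PTHREAD_MUTEX_NORMAL`) mutex the second lock would simply block forever, which matches the hang described above.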
|
Half of the fix made it. See https://github.com/open-mpi/ompi-release/blob/v2.x/opal/mca/btl/ugni/btl_ugni_endpoint.h#L37 |
|
Also, this does not require an rc2 IMO. |
|
Test FAILed. |
|
Test FAILed. |
|
@hppritcha Assuming you can review, I'm ok with this. |
|
LANL jenkins blew up. :bot:lanl:retest |
|
Test FAILed. |
|
Test FAILed. |
|
@hppritcha LANL jenkins looks like it is busted. |
|
Test FAILed. |
|
And now the mellanox Jenkins is belly-up. |
|
bot:mellanox:retest |
|
Test FAILed. |
|
Test PASSed. |
|
bot:lanl:retest |
|
Test FAILed. |
|
Test FAILed. |
|
UH is hitting network problems of some sort. |
|
Let's wait for CI, modulo the mess-up at UH, then merge. |
|
I can log in to the node without issues, but I know that there were some network issues over the last two days. |
|
bot:ibm:retest |
|
@edgargabriel says UH is messed up while school starts up so we'll ignore dlopen and distcheck tests for this PR. |
|
Not objecting to the decision, but I am still a bit surprised that it 'hangs', since both of the PRs that I filed today got through that point without major issues. |
|
bot:lanl:retest |