@hjelmn
Member

@hjelmn hjelmn commented Jun 7, 2016

The rdma_disconnect man page specifies that both the server and the client
should call rdma_disconnect. The code was not calling rdma_disconnect
on an endpoint if the event arrived before the endpoint was finalized.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
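
Editor's sketch (not the code from this PR): a minimal, self-contained model of the rule described above, in which an endpoint issues its own disconnect whether the peer's event arrives before or after local finalization. Every name here is hypothetical. `fake_rdma_disconnect` is a counting stub standing in for `rdma_disconnect`, `struct endpoint` is not the Open MPI endpoint type, and the "disconnect only once" guard is an assumption of this sketch rather than something the PR states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection id; NOT the librdmacm type. */
struct fake_cm_id {
    int disconnect_calls;   /* how many times the stub was invoked */
};

/* Hypothetical stub modeling rdma_disconnect(); returns 0 on success. */
static int fake_rdma_disconnect(struct fake_cm_id *id)
{
    id->disconnect_calls++;
    return 0;
}

/* Hypothetical endpoint state; field names are illustrative only. */
struct endpoint {
    struct fake_cm_id id;
    bool finalized;      /* local finalization has started */
    bool disconnected;   /* this side already issued its disconnect */
};

/* Issue the local disconnect at most once (an assumption of this sketch). */
static int endpoint_disconnect_once(struct endpoint *ep)
{
    assert(ep != NULL);
    if (ep->disconnected) {
        return 0;
    }
    ep->disconnected = true;
    return fake_rdma_disconnect(&ep->id);
}

/* Peer-initiated disconnect event: disconnect our side as well,
 * deliberately without checking ep->finalized first. */
int on_peer_disconnect_event(struct endpoint *ep)
{
    return endpoint_disconnect_once(ep);
}

/* Local finalization: also disconnects, unless the event handler
 * above already did so. */
int endpoint_finalize(struct endpoint *ep)
{
    assert(ep != NULL);
    ep->finalized = true;
    return endpoint_disconnect_once(ep);
}
```

In this model both orderings (event first, or finalization first) end with exactly one disconnect call from the local side.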

@hjelmn
Member Author

hjelmn commented Jun 7, 2016

This was the only error I could see in the disconnect path. Will see if this fixes jenkins.

@jladd-mlnx
Member

bot:retest

@hjelmn
Member Author

hjelmn commented Jun 8, 2016

I think this is a correct change given the wording in the man page. Probably needs to be tested by chelsio.

@larrystevenwise

@bharatpotnuri - please test this out; perhaps it also fixes the current stall we're seeing?

@bharatpotnuri
Contributor

bharatpotnuri commented Jun 9, 2016

Commit 17ae1ac works fine for iWARP.
I don't see the stall at MPI_Finalize that was seen earlier today with commit 80e362d. @larrystevenwise As for the stall in the gather test with larger message sizes, I am still testing it. I see it with the combination of OFED and the iwpm core, so I believe it is a separate issue.
Thanks Jeff.

@jsquyres
Member

jsquyres commented Jun 9, 2016

@bharatpotnuri GitHub pro tip: if you mention a non-escaped git hash from this repo in a comment, GitHub's web UI will auto-link it. E.g., 17ae1ac (vs. `17ae1ac`, which isn't auto-linked, because it's in verbatim mode).

@hppritcha
Member

bot:retest
looking for ucx unload hang again.

@hppritcha
Member

bot:retest
looks like NERSC shutdown over weekend has leftover problems.

@lanl-ompi
Contributor

Test FAILed.

@hppritcha
Member

bot:retest

@hjelmn
Member Author

hjelmn commented Jun 23, 2016

Any update on whether this breaks anything? If it doesn't I would like to merge this.

@jsquyres
Member

@larrystevenwise @bharatpotnuri Can you guys comment?

@larrystevenwise

@bharatpotnuri please verify this change is good. Thanks!

@bharatpotnuri
Contributor

Commit 17ae1ac works fine for Chelsio iWARP.

@hjelmn
Member Author

hjelmn commented Jun 25, 2016

@bharatpotnuri Thanks! I will go ahead and commit this. Not sure how rdmacm was working when it wasn't following the spec... Not sure this fixes any issue but at least it doesn't hurt.

@hjelmn hjelmn merged commit dac9201 into open-mpi:master Jun 25, 2016