mpi/start: change semantics of start #1726
Conversation
@bosilca This fixes a hang in osc/pt2pt introduced by the request rework. Not sure why the code was not hanging before, but I know why it was hanging: the start code re-allocating the request was dropping the callback pointer from the request.
Using the Mellanox Jenkins to test cm. It is passing MTT with ob1 on my Mac.
There were several problems with the implementation of start in Open MPI:

- There are no checks whatsoever on the state of the request(s) provided to MPI_Start/MPI_Start_all. It is erroneous to provide an active request to either of these calls. Since we are already looping over the provided requests there is little overhead in verifying that the request can be started.

- Both ob1 and cm were always throwing away the request on the initial call to start and start_all with a particular request. Subsequent calls would see that the request was pml_complete and reuse it. This 1) introduced a leak as the initial request was never freed, and 2) is unnecessary. I removed the code to reallocate the request.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
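The two fixes in this commit message can be sketched with a small, hypothetical model of a persistent request. This is not the actual Open MPI code: `request_t`, `start_one`, and `start_all` are illustrative stand-ins for the ob1/cm request structures and the start path, and the state fields are simplified to the two that matter here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a persistent request. */
typedef enum { REQ_INACTIVE, REQ_ACTIVE } req_state_t;

typedef struct {
    req_state_t state;        /* MPI-level state of the request */
    bool        pml_complete; /* PML-level completion flag */
} request_t;

/* Fix 1: reject an active request (erroneous per the MPI standard).
 * Fix 2: reactivate the request in place instead of throwing it away
 * and allocating a new one (which leaked the original). */
static int start_one(request_t *req)
{
    if (REQ_ACTIVE == req->state) {
        return -1; /* stand-in for MPI_ERR_REQUEST */
    }
    req->state = REQ_ACTIVE;
    req->pml_complete = false;
    return 0;
}

static int start_all(request_t **reqs, size_t n)
{
    /* We already loop over the requests, so validating every one
     * before starting any of them adds little overhead. */
    for (size_t i = 0; i < n; i++) {
        if (REQ_ACTIVE == reqs[i]->state) {
            return -1;
        }
    }
    for (size_t i = 0; i < n; i++) {
        (void) start_one(reqs[i]);
    }
    return 0;
}
```

The validate-then-start split in `start_all` mirrors the "little overhead" argument above: the error check reuses the loop that the start path needs anyway.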
Need to rebase this. Fixing now.
:bot:retest:
Looks like cm is ok with this change. The semantics of start in ob1 and cm were identical, so it's no surprise.
Starting a new receive may cause a recursive call into the pt2pt frag receive function. If this happens while the prior request is on the garbage-collection list it could cause problems. This commit moves the gc insert to after the new request has been posted. Signed-off-by: Nathan Hjelm <hjelmn@me.com>
testing mpirun --timeout param
bot:retest
@rhc54 - timeout is ON! 10:08:53 + /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/bin/oshrun -np 8 --bind-to core -x SHMEM_SYMMETRIC_HEAP_SIZE=1024M -get-stack-traces -timeout 300 --mca spml yoda -mca pml ob1 -mca btl self,vader /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/oshmem_symmetric_data
10:08:54 Target on PE 6 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 3 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 1 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 5 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 7 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 4 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 2 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
@miked-mellanox @jladd-mlnx Don't forget to use
Thanks, fixed.
This brings a major change in the way we handle persistent requests. The original code allowed a persistent request to be started even if the request was only MPI complete but not PML complete. If I understand the new code correctly, we are losing this capability, and I do not see a check to make sure that we are not restarting a request that is not yet PML complete.
Hmm, good point George. I didn't think about that case. Let me give it some thought and see if I can come up with a way to plug the leak and fix the missing callback without removing that feature. It should only affect send requests, as receive requests are always both MPI and PML complete at the same time.
I see the path that is affected. mca_pml_ob1_send_request_start_buffered is the only path in ob1 where a request is not both PML and MPI complete at the same time. It should be sufficient to add code to re-allocate buffered send requests if they are not PML complete. Speaking of buffered sends: do you think there is any interest in deprecating the feature in MPI-4.0? From an implementation perspective it does not buy the user much, if anything, as we buffer in the pml for small/medium sends.
There are two choices I see for the above case: 1) we can re-allocate the request and copy the request's attributes (callbacks, etc.) to the new one, or 2) spin until the request is PML complete. I lean towards the latter, but the former is not hard to restore. @bosilca Thoughts?
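Option 2 above can be sketched as follows. This is a hypothetical, self-contained simulation, not the Open MPI implementation: `fake_progress` stands in for driving the real progress engine (opal_progress in Open MPI), and the counter simply models a buffered send that drains after a few progress calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a persistent buffered send request that is
 * MPI complete but not yet PML complete. */
typedef struct {
    bool mpi_complete;
    bool pml_complete;
    int  progress_left; /* stub: progress calls until the PML drains it */
} request_t;

/* Stand-in for the progress engine: each call does a bit of work;
 * the request becomes PML complete once its work runs out. */
static void fake_progress(request_t *req)
{
    if (req->progress_left > 0 && 0 == --req->progress_left) {
        req->pml_complete = true;
    }
}

/* Option 2: spin, driving progress, until the request is PML
 * complete, then reuse it in place (no reallocation, no leak). */
static int start_buffered(request_t *req)
{
    while (!req->pml_complete) {
        fake_progress(req);
    }
    req->mpi_complete = false;
    req->pml_complete = false; /* request is active again */
    return 0;
}
```

The trade-off is visible in the sketch: spinning keeps the request object stable (callbacks and attributes survive untouched), at the cost of blocking inside start until the PML finishes the outstanding buffered send.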
Until I figure out how to get this cleanup working with the case @bosilca mentioned I will close this PR and open a new one to fix just the callback issue.
@bosilca I think I have it fixed. It took less work than I expected. In order to detect the difference between a new request (fresh from isend_init) and an incomplete buffered send, both the cm and ob1 isend_init functions now mark the request as pml_complete. This allows start to accurately use pml_complete to decide whether a new request is needed. I kept the remainder of the cleanup, as we know that no receive request can be in this state. Let me know if you are happy with the changes. I want to attach this to the request rework, as osc/pt2pt hangs without this.
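The fix described above can be sketched as follows. This is a hypothetical, simplified model (the names `isend_init` and `start` here are stand-ins, not the actual cm/ob1 entry points): because a fresh request is now marked pml_complete at init time, a false pml_complete flag at start time can only mean an incomplete buffered send, the one case that still needs a new request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified persistent request. */
typedef struct {
    bool pml_complete;
    bool needs_realloc; /* set when start must allocate a new request */
} request_t;

static void isend_init(request_t *req)
{
    /* Fresh from isend_init: nothing is pending in the PML, so the
     * request is marked PML complete up front (the fix). */
    req->pml_complete = true;
    req->needs_realloc = false;
}

static void start(request_t *req)
{
    /* With the fix, pml_complete == false can only mean an incomplete
     * buffered send; only that case forces a new request. */
    req->needs_realloc = !req->pml_complete;
    req->pml_complete = false; /* request is active again */
}
```

Without marking the fresh request pml_complete in init, start could not tell a brand-new request apart from an in-flight buffered send using this one flag.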