mpi/start: change semantics of start #1726
Conversation
@bosilca This fixes a hang in osc/pt2pt introduced by the request rework. Not sure why the code was not hanging before, but I know why it was hanging: the start code re-allocating the request was dropping the callback pointer from the request.
Using the Mellanox Jenkins to test cm. It is passing MTT with ob1 on my Mac.
There were several problems with the implementation of start in Open MPI:

- There are no checks whatsoever on the state of the request(s) provided to MPI_Start/MPI_Start_all. It is erroneous to provide an active request to either of these calls. Since we are already looping over the provided requests there is little overhead in verifying that the request can be started.

- Both ob1 and cm were always throwing away the request on the initial call to start and start_all with a particular request. Subsequent calls would see that the request was pml_complete and reuse it. This 1) introduced a leak as the initial request was never freed, and 2) is unnecessary. I removed the code to reallocate the request.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
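The two fixes in this commit message can be sketched with a small, hypothetical model of a persistent request. This is not the actual Open MPI code: `request_t`, `start_one`, and `start_all` are illustrative stand-ins for the ob1/cm request structures and the start path, and the state fields are simplified to the two that matter here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a persistent request. */
typedef enum { REQ_INACTIVE, REQ_ACTIVE } req_state_t;

typedef struct {
    req_state_t state;        /* MPI-level state of the request */
    bool        pml_complete; /* PML-level completion flag */
} request_t;

/* Fix 1: reject an active request (erroneous per the MPI standard).
 * Fix 2: reactivate the request in place instead of throwing it away
 * and allocating a new one (which leaked the original). */
static int start_one(request_t *req)
{
    if (REQ_ACTIVE == req->state) {
        return -1; /* stand-in for MPI_ERR_REQUEST */
    }
    req->state = REQ_ACTIVE;
    req->pml_complete = false;
    return 0;
}

static int start_all(request_t **reqs, size_t n)
{
    /* We already loop over the requests, so validating every one
     * before starting any of them adds little overhead. */
    for (size_t i = 0; i < n; i++) {
        if (REQ_ACTIVE == reqs[i]->state) {
            return -1;
        }
    }
    for (size_t i = 0; i < n; i++) {
        (void) start_one(reqs[i]);
    }
    return 0;
}
```

The validate-then-start split in `start_all` mirrors the "little overhead" argument above: the error check reuses the loop that the start path needs anyway.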
Need to rebase this. Fixing now.
:bot:retest:
Looks like cm is ok with this change. The semantics of start in ob1 and cm were identical, so it's no surprise.
Starting a new receive may cause a recursive call into the pt2pt frag receive function. If this happens while the prior request is on the garbage-collection list it could cause problems. This commit moves the gc insert to after the new request has been posted. Signed-off-by: Nathan Hjelm <hjelmn@me.com>
testing mpirun --timeout param
bot:retest
@rhc54 - timeout is ON! 10:08:53 + /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/bin/oshrun -np 8 --bind-to core -x SHMEM_SYMMETRIC_HEAP_SIZE=1024M -get-stack-traces -timeout 300 --mca spml yoda -mca pml ob1 -mca btl self,vader /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/oshmem_symmetric_data
10:08:54 Target on PE 6 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 3 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 1 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 5 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 7 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 4 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10:08:54 Target on PE 2 is 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
@miked-mellanox @jladd-mlnx Don't forget to use
Thanks, fixed.
This brings a major change in the way we handle persistent requests. The original code allowed a persistent request to be started even if the request was only MPI complete but not PML complete. If I understand the new code correctly, we are losing this capability, and I do not see a check to make sure that we are not restarting a request that is not yet PML complete.
Hmm, good point George. I didn't think about that case. Let me give it some thought and see if I can come up with a way to plug the leak and fix the missing callback without removing that feature. It should only affect send requests, as receive requests are always both MPI and PML complete at the same time.
I see the path that is affected. mca_pml_ob1_send_request_start_buffered is the only path in ob1 where a request is not both PML and MPI complete at the same time. It should be sufficient to add code to re-allocate buffered send requests if they are not PML complete. Speaking of buffered sends: do you think there is any interest in deprecating the feature in MPI-4.0? From an implementation perspective it does not buy the user much, if anything, as we buffer in the pml for small/medium sends.
There are two choices I see for the above case: 1) we can re-allocate the request and copy the request's attributes (callbacks, etc.) to the new one, or 2) spin until the request is PML complete. I lean towards the latter, but the former is not hard to restore. @bosilca Thoughts?
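Option 2 above can be sketched as follows. This is a hypothetical, self-contained simulation, not the Open MPI implementation: `fake_progress` stands in for driving the real progress engine (opal_progress in Open MPI), and the counter simply models a buffered send that drains after a few progress calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a persistent buffered send request that is
 * MPI complete but not yet PML complete. */
typedef struct {
    bool mpi_complete;
    bool pml_complete;
    int  progress_left; /* stub: progress calls until the PML drains it */
} request_t;

/* Stand-in for the progress engine: each call does a bit of work;
 * the request becomes PML complete once its work runs out. */
static void fake_progress(request_t *req)
{
    if (req->progress_left > 0 && 0 == --req->progress_left) {
        req->pml_complete = true;
    }
}

/* Option 2: spin, driving progress, until the request is PML
 * complete, then reuse it in place (no reallocation, no leak). */
static int start_buffered(request_t *req)
{
    while (!req->pml_complete) {
        fake_progress(req);
    }
    req->mpi_complete = false;
    req->pml_complete = false; /* request is active again */
    return 0;
}
```

The trade-off is visible in the sketch: spinning keeps the request object stable (callbacks and attributes survive untouched), at the cost of blocking inside start until the PML finishes the outstanding buffered send.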
Until I figure out how to get this cleanup working with the case @bosilca mentioned I will close this PR and open a new one to fix just the callback issue.
@bosilca I think I have it fixed. It took less work than I expected. In order to detect the difference between a new request (fresh from isend_init) and an incomplete buffered send, both the cm and ob1 isend_init functions now mark the request as pml_complete. This allows start to accurately use pml_complete to decide whether a new request is needed. I kept the remainder of the cleanup, as we know that no receive request can be in this state. Let me know if you are happy with the changes. I want to attach this to the request rework, as osc/pt2pt hangs without this.
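The fix described above can be sketched as follows. This is a hypothetical, simplified model (the names `isend_init` and `start` here are stand-ins, not the actual cm/ob1 entry points): because a fresh request is now marked pml_complete at init time, a false pml_complete flag at start time can only mean an incomplete buffered send, the one case that still needs a new request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified persistent request. */
typedef struct {
    bool pml_complete;
    bool needs_realloc; /* set when start must allocate a new request */
} request_t;

static void isend_init(request_t *req)
{
    /* Fresh from isend_init: nothing is pending in the PML, so the
     * request is marked PML complete up front (the fix). */
    req->pml_complete = true;
    req->needs_realloc = false;
}

static void start(request_t *req)
{
    /* With the fix, pml_complete == false can only mean an incomplete
     * buffered send; only that case forces a new request. */
    req->needs_realloc = !req->pml_complete;
    req->pml_complete = false; /* request is active again */
}
```

Without marking the fresh request pml_complete in init, start could not tell a brand-new request apart from an in-flight buffered send using this one flag.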