Skip to content

Conversation

@hjelmn
Copy link
Member

@hjelmn hjelmn commented May 27, 2016

There were several problems with the implementation of start in Open
MPI:

  • There is not check whatsoever on the state of the request provided
    to MPI_Start/MPI_Start_all. It is erroneous to provide an active
    request to either of these calls. Since we are already looping over
    the provided requests there is little overhead in verifying that
    the request can be started.
  • Both ob1 and cm were always throwing away the request on the
    initial call to start and start_all. Subsequent calls would see
    that the request was pml_complete and reuse it. This 1) introduced
    a leak as the initial request was never freed, and 2) is
    unnecessary. I removed the code to reallocate the request.

Signed-off-by: Nathan Hjelm hjelmn@lanl.gov

@hjelmn
Copy link
Member Author

hjelmn commented May 27, 2016

@bosilca This fixes a hang in osc/pt2pt introduced by the request rework. Not sure why the code was not hanging before but I know why it was hanging. The start code re-allocating the request was dropping the callback pointer from the request.

@hjelmn
Copy link
Member Author

hjelmn commented May 27, 2016

Using mellanox jenkins to test cm. Is passing MTT with ob1 on my mac.

There were several problems with the implementation of start in Open
MPI:

 - There are no checks whatsoever on the state of the request(s)
   provided to MPI_Start/MPI_Start_all. It is erroneous to provide an
   active request to either of these calls. Since we are already
   looping over the provided requests there is little overhead in
   verifying that the request can be started.

 - Both ob1 and cm were always throwing away the request on the
   initial call to start and start_all with a particular
   request. Subsequent calls would see  that the request was
   pml_complete and reuse it. This 1) introduced a leak as the initial
   request was never freed, and 2) is unnecessary. I removed the code
   to reallocate the request.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
@hjelmn
Copy link
Member Author

hjelmn commented May 28, 2016

Need to rebase this. Fixing now.

@hjelmn
Copy link
Member Author

hjelmn commented May 28, 2016

:bot:retest:

@hjelmn
Copy link
Member Author

hjelmn commented May 28, 2016

Looks like cm is ok with this change. The semantics of start in ob1 and cm were identical so its no surprise.

Starting a new receive may cause a recursive call into the pt2pt
frag receive function. If this happens and the prior request is
on the garbage collection list it could cause problems. This commit
moves the gc insert until after the new request has been posted.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
@jsquyres
Copy link
Member

@bosilca @thananon Please have a look at this PR; thanks.

@mike-dubman
Copy link
Member

testing mpirun --timeout param

@mike-dubman
Copy link
Member

bot:retest

@mike-dubman
Copy link
Member

@rhc54 - timeout it ON!

10:08:53 + /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/bin/oshrun -np 8 --bind-to core -x SHMEM_SYMMETRIC_HEAP_SIZE=1024M -get-stack-traces -timeout 300 --mca spml yoda -mca pml ob1 -mca btl self,vader /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/oshmem_symmetric_data
10:08:54 Target on PE 6 is  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  
10:08:54 Target on PE 3 is  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  
10:08:54 Target on PE 1 is  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  
10:08:54 Target on PE 5 is  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  
10:08:54 Target on PE 7 is  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  
10:08:54 Target on PE 4 is  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  
10:08:54 Target on PE 2 is  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  
1

@jsquyres
Copy link
Member

@miked-mellanox @jladd-mlnx Don't forget to use --get-stack-traces (and/or --report-state-on-timeout).

@mike-dubman
Copy link
Member

thanks, fixed.

@bosilca
Copy link
Member

bosilca commented May 30, 2016

This brings a major change in the way we handle persistent requests. The original code allowed a persistent request to be started even if the request was only MPI complete but not PML complete. If I understand correctly the new code, we are loosing this capability, and I do not see a check to make sure that we are not restarting a request that is not yet PML complete.

@hjelmn
Copy link
Member Author

hjelmn commented May 30, 2016

Hmm, good point George. I didn't think about that case. Let me give it some thought and see if I can come up with a way to plug the leak and fix the missing callback without removing that feature. It should only affect send requests as receive requests are always both mpi and pml complete at the same time.

@hjelmn
Copy link
Member Author

hjelmn commented May 30, 2016

I see the path that is affected. mca_pml_ob1_send_request_start_buffered is the only path in ob1 that is not both pml and mpi complete at the same time. It should be sufficient to add code to re-allocate buffered send requests if they are not pml complete.

Speaking of buffered send. Do you think there is any interest in deprecating the feature in MPI-4.0. From an implementation perspective they do not buy the user much if anything as we buffer in the pml for small/medium sends.

@hjelmn
Copy link
Member Author

hjelmn commented May 30, 2016

There are two choices I see for the above case. 1) we can re-allocate the request and copy the request's attributes to the new one (callbacks, etc), or 2) spin until the request is pml complete. I lean towards the later but the former is not hard to restore. @bosilca Thoughts?

@hjelmn
Copy link
Member Author

hjelmn commented May 30, 2016

Until I figure out how to get this cleanup working with the case @bosilca mentioned I will close this PR and open a new one to fix just the callback issue.

@hjelmn hjelmn closed this May 30, 2016
@hjelmn
Copy link
Member Author

hjelmn commented May 30, 2016

@bosilca I think I have it fixed. Took less work than I expected. In order to detect the difference between a new request (fresh from isend_init) and an incomplete buffered send both the cm and ob1 isend_init functions now mark the request as pml_complete. This allows start to accurately use pml_complete to decide if a new request is needed. I kept the remainder of the cleanup as we know that no receive request can be in this state. Let me know if you are happy with the changes. I want to attach this to the request rework as osc/pt2pt hangs without this.

@hjelmn hjelmn mentioned this pull request May 30, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants