pml/ob1: bug fixes #1758
Conversation
This commit fixes two bugs in pml/ob1:
- Do not call MCA_PML_OB1_PROGRESS_PENDING from mca_pml_ob1_send_request_start_copy, as this may lead to a recursive call to mca_pml_ob1_send_request_process_pending.
- In mca_pml_ob1_send_request_start_rdma, return the rdma frag object if a btl fragment cannot be allocated. This fixes a leak identified by @abouteiller and @bosilca.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
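For illustration, here is a minimal, self-contained C sketch of the control flow behind the second fix. All names (frag_pool_t, pool_get, pool_return, start_rdma) are hypothetical stand-ins for the OB1 free-list machinery, not the real API; only the pattern mirrors the commit: when the BTL cannot supply a fragment, the rdma frag that was already taken from its pool has to be handed back instead of being dropped.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical pool that tracks how many frags are checked out. */
typedef struct { int outstanding; } frag_pool_t;

static void *pool_get(frag_pool_t *pool)
{
    pool->outstanding++;
    return malloc(64);   /* stand-in for an RDMA frag descriptor */
}

static void pool_return(frag_pool_t *pool, void *frag)
{
    pool->outstanding--;
    free(frag);
}

/* Models the control flow of mca_pml_ob1_send_request_start_rdma():
 * take an RDMA frag from its pool, then try to get a BTL fragment;
 * if the BTL has none, the fix hands the RDMA frag back instead of
 * dropping the reference (the old leak). */
static int start_rdma(frag_pool_t *rdma_pool, int btl_has_fragments)
{
    void *rdma_frag = pool_get(rdma_pool);
    if (NULL == rdma_frag) {
        return -1;
    }

    if (!btl_has_fragments) {                /* BTL ran out of fragments */
        pool_return(rdma_pool, rdma_frag);   /* the fix: no leak */
        return -1;
    }

    /* In the real code the frag is kept until the RDMA operation
     * completes; here it is returned immediately to keep the demo short. */
    pool_return(rdma_pool, rdma_frag);
    return 0;
}

int main(void)
{
    frag_pool_t pool = { 0 };
    start_rdma(&pool, 0);   /* simulate the out-of-fragments path */
    printf("outstanding rdma frags: %d\n", pool.outstanding);   /* prints 0 */
    return 0;
}
```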
@bosilca @abouteiller We were leaking rdma frags in ob1 if the btl ran out of fragments. This fixes that and another issue identified by your benchmark.
    
@miked-mellanox False failure. Looks like it tried to use rdmacm without changing the receive queue pair list to use a per-peer queue for QP 0.
    
@hjelmn - why is it needed? The same command line works fine with v1.10:

$ /hpc/local/benchmarks/hpcx_install/hpcx-gcc/ompi-v1.10/bin/mpirun -np 2 --mca btl_openib_cpc_include rdmacm -mca btl_openib_warn_default_gid_prefix 0 -mca pml ob1 -mca btl self,openib -mca btl_if_include mlx4_0:2 /usr/mpi/gcc/openmpi-1.10.3rc4/tests/osu-micro-benchmarks-5.2/osu_bw

# OSU MPI Bandwidth Test v5.2
# OSU MPI Bandwidth Test v5.2
# Size      Bandwidth (MB/s)
1                       2.67
2                       5.58
4                      11.10
8                      22.17
16                     44.33
32                     86.16
64                    171.67
128                   289.57
256                   585.56
512                  1057.71
1024                 1825.01
2048                 3066.73
4096                 4491.07
8192                 5625.41
16384                6786.90
32768                7676.25
65536                7925.74
131072               8018.55
262144               8077.81
524288               8109.15
1048576              8115.67
2097152              7337.95
4194304              7220.15
    
@miked-mellanox Because using per-peer QPs by default was bad for our out-of-the-box performance. From 2.0.0 onward, SRQ is used exclusively by default.
    
thanks
    
Added this to the test:

-mca btl_openib_receive_queues P,65536,256,192,128:S,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64

Could you please explain the bug in rdmacm as you see it?
    
bot:retest
    
@miked-mellanox Yeah, for some time we have required a per-peer receive queue for any connection method that uses a clear-to-send protocol. I don't know if that can be improved, but it has been a long-standing "feature". There hasn't been any effort to investigate allowing rdmacm to be used with SRQ QPs only, since the scaling of per-peer doesn't really matter when using rdmacm (which tended to scale poorly anyway).
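As a rough illustration of that constraint, the sketch below uses made-up names (qp_type_t, cpc_config_is_usable), not the real openib BTL API, to show why an SRQ-only receive queue list is rejected by a connection method that relies on a clear-to-send message, while the per-peer override from the earlier comment passes: the CTS has to land on a queue with a buffer posted specifically for that peer.

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { QP_PER_PEER, QP_SHARED_SRQ } qp_type_t;

/* A connection method that relies on a clear-to-send (CTS) message needs
 * QP 0 to be a per-peer queue; SRQ-only lists are rejected. */
static bool cpc_config_is_usable(const qp_type_t *qps, int nqps, bool needs_cts)
{
    if (nqps < 1) {
        return false;
    }
    if (needs_cts && QP_PER_PEER != qps[0]) {
        return false;
    }
    return true;
}

int main(void)
{
    /* SRQ-only receive queues, as used by default from 2.0.0 on. */
    qp_type_t srq_only[] = { QP_SHARED_SRQ, QP_SHARED_SRQ };
    /* The override from the earlier comment: per-peer QP 0 plus SRQs. */
    qp_type_t with_pp[]  = { QP_PER_PEER, QP_SHARED_SRQ, QP_SHARED_SRQ };

    printf("SRQ-only with rdmacm:      %s\n",
           cpc_config_is_usable(srq_only, 2, true) ? "ok" : "rejected");
    printf("per-peer QP 0 with rdmacm: %s\n",
           cpc_config_is_usable(with_pp, 3, true) ? "ok" : "rejected");
    return 0;
}
```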
    
Thanks to @rhc54's timeout feature:

09:41:45    Thread 1 (Thread 0x7ffff73e2700 (LWP 32132)):
09:41:45    #0  0x0000003d6980b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
09:41:45    #1  0x00007fffeeb74bc9 in rdmacm_endpoint_finalize (endpoint=0x816be0) at connect/btl_openib_connect_rdmacm.c:1231
09:41:45    #2  0x00007fffeeb5fd2b in mca_btl_openib_endpoint_destruct (endpoint=0x816be0) at btl_openib_endpoint.c:368
09:41:45    #3  0x00007fffeeb489b7 in opal_obj_run_destructors (object=0x816be0) at ../../../../opal/class/opal_object.h:460
09:41:45    #4  0x00007fffeeb4dddb in mca_btl_openib_del_procs (btl=0x726f50, nprocs=1, procs=0x7fffffffc828, peers=0x804f80) at btl_openib.c:1316
09:41:45    #5  0x00007fffeefa5159 in mca_bml_r2_del_procs (nprocs=8, procs=0x78d5c0) at bml_r2.c:623
09:41:45    #6  0x00007fffee4af612 in mca_pml_ob1_del_procs (procs=0x78d5c0, nprocs=8) at pml_ob1.c:455
09:41:45    #7  0x00007ffff7ca50d4 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:333
09:41:45    #8  0x00007ffff7cd7525 in PMPI_Finalize () at pfinalize.c:47
09:41:45    #9  0x0000000000400890 in main (argc=1, argv=0x7fffffffcbf8) at hello_c.c:24
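For context on the trace, here is a minimal sketch of the wait-for-disconnect pattern visible in frame #1, using hypothetical names (example_endpoint_t, example_endpoint_finalize, example_on_disconnect) rather than the real rdmacm code: the finalizing thread sleeps on a condition variable until the CM event thread signals that the disconnect completed, and if that signal never arrives, pthread_cond_wait() never returns, which is consistent with the hang reported below.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical endpoint state for illustration only; the real structure
 * lives in connect/btl_openib_connect_rdmacm.c. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            disconnected;   /* set by the CM event handler */
} example_endpoint_t;

/* Finalize-side wait, as in frame #1 of the backtrace: block on the
 * condition variable until the event thread reports that the disconnect
 * completed.  Without that signal, this loop never exits. */
static void example_endpoint_finalize(example_endpoint_t *ep)
{
    pthread_mutex_lock(&ep->lock);
    while (!ep->disconnected) {
        pthread_cond_wait(&ep->cond, &ep->lock);
    }
    pthread_mutex_unlock(&ep->lock);
}

/* Event-thread side: mark the endpoint disconnected and wake any waiter. */
static void example_on_disconnect(example_endpoint_t *ep)
{
    pthread_mutex_lock(&ep->lock);
    ep->disconnected = true;
    pthread_cond_signal(&ep->cond);
    pthread_mutex_unlock(&ep->lock);
}

int main(void)
{
    example_endpoint_t ep = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
    };
    example_on_disconnect(&ep);      /* simulate the disconnect event */
    example_endpoint_finalize(&ep);  /* returns immediately; no hang */
    printf("finalize returned\n");
    return 0;
}
```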
    
@hjelmn This is hanging.
    
:bot:retest:
    
@jladd-mlnx Cannot reproduce your hang on a ConnectX-3 cluster. Anyway, it is totally unrelated to this commit, so it is being merged.
    
@Di0gen When did Mellanox add the additional hello_c tests to the Jenkins script, namely those that use rdmacm? Is this the first PR that was tested using what appears to be a new Jenkins script?
    
@hppritcha - it was added yesterday:
    
I confirm this solves the memory leak issue. 👍
    