Conversation

@hjelmn (Member) commented Jun 6, 2016

This commit fixes two bugs in pml/ob1:

  • Do not call MCA_PML_OB1_PROGRESS_PENDING from
    mca_pml_ob1_send_request_start_copy, as this may lead to a recursive
    call to mca_pml_ob1_send_request_process_pending (see the sketch
    below).
  • In mca_pml_ob1_send_request_start_rdma, return the rdma frag object
    if a btl fragment cannot be allocated. This fixes a leak identified
    by @abouteiller and @bosilca.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

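To make the first fix concrete: the sketch below is a minimal, self-contained model of the re-entrancy hazard, using hypothetical names (start_copy, process_pending, a fake resource counter) rather than the actual pml/ob1 code. It shows why a send-start path that drains the pending queue itself can recurse when a deferred request immediately re-queues, and why draining from the progress path instead is safe.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a pending-send queue; illustration only. */
#define MAX_PENDING 8
static int pending[MAX_PENDING];
static int npending = 0;
static int resources = 1;            /* stand-in for the BTL fragment budget */

static bool start_copy(int req);

static void process_pending(void) {
    /* Drain the queue from the progress loop, not from inside
     * start_copy(): a request that fails and re-queues itself while
     * we are draining would otherwise re-enter this function and
     * recurse without bound. */
    int n = npending;
    npending = 0;
    for (int i = 0; i < n; ++i) {
        (void) start_copy(pending[i]);   /* on failure it re-queues itself */
    }
}

static bool start_copy(int req) {
    if (resources <= 0) {
        pending[npending++] = req;       /* defer until resources come back */
        return false;
    }
    --resources;
    printf("sent request %d\n", req);
    /* The fix's pattern: do NOT call process_pending() here. */
    return true;
}

int main(void) {
    (void) start_copy(1);   /* consumes the only resource              */
    (void) start_copy(2);   /* gets deferred                           */
    resources = 1;          /* e.g. a send completion frees a fragment */
    process_pending();      /* drained from the progress path          */
    return 0;
}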
@hjelmn (Member, Author) commented Jun 6, 2016

@bosilca @abouteiller We were leaking rdma frags in ob1 if the btl ran out of fragments. This fixes that and another issue identified by your benchmark.
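
As an aside for readers who don't know the ob1 internals: the leak fix follows the usual cleanup pattern where an object taken from a free list must be put back when a later allocation in the same path fails. The sketch below is a self-contained illustration with hypothetical names (rdma_frag_t, btl_alloc_fragment, and so on); it is not the actual Open MPI code.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ob1 rdma frag and the BTL allocator. */
typedef struct rdma_frag { struct rdma_frag *next; } rdma_frag_t;

static rdma_frag_t *frag_free_list = NULL;

static rdma_frag_t *rdma_frag_alloc(void) {
    if (NULL != frag_free_list) {
        rdma_frag_t *frag = frag_free_list;
        frag_free_list = frag->next;
        return frag;
    }
    return malloc(sizeof(rdma_frag_t));
}

static void rdma_frag_return(rdma_frag_t *frag) {
    frag->next = frag_free_list;
    frag_free_list = frag;
}

/* Pretend the BTL has run out of send fragments. */
static bool btl_alloc_fragment(void) { return false; }

static int send_request_start_rdma(void) {
    rdma_frag_t *frag = rdma_frag_alloc();
    if (NULL == frag) {
        return -1;                        /* out of resources */
    }
    if (!btl_alloc_fragment()) {
        /* Without this call the rdma frag is lost every time the BTL
         * runs dry -- the leak.  Returning it lets the request be
         * retried later from the pending queue. */
        rdma_frag_return(frag);
        return -1;
    }
    /* ... build and post the RDMA operation ... */
    return 0;
}

int main(void) {
    if (0 != send_request_start_rdma()) {
        printf("BTL out of fragments: request deferred, rdma frag returned\n");
    }
    return 0;
}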

@hjelmn (Member, Author) commented Jun 7, 2016

@miked-mellanox False failure. It looks like the test tried to use rdmacm without changing the receive queue list so that QP 0 is a per-peer queue pair.

@mike-dubman (Member) commented:

@hjelmn - why is that needed? The same command line works fine with v1.10:

$/hpc/local/benchmarks/hpcx_install/hpcx-gcc/ompi-v1.10/bin/mpirun -np 2 --mca btl_openib_cpc_include rdmacm  -mca btl_openib_warn_default_gid_prefix 0  -mca pml ob1 -mca btl self,openib -mca btl_if_include mlx4_0:2 /usr/mpi/gcc/openmpi-1.10.3rc4/tests/osu-micro-benchmarks-5.2/osu_bw
# OSU MPI Bandwidth Test v5.2
# Size      Bandwidth (MB/s)
1                       2.67
2                       5.58
4                      11.10
8                      22.17
16                     44.33
32                     86.16
64                    171.67
128                   289.57
256                   585.56
512                  1057.71
1024                 1825.01
2048                 3066.73
4096                 4491.07
8192                 5625.41
16384                6786.90
32768                7676.25
65536                7925.74
131072               8018.55
262144               8077.81
524288               8109.15
1048576              8115.67
2097152              7337.95
4194304              7220.15

@hjelmn (Member, Author) commented Jun 7, 2016

@miked-mellanox Because using per-peer QPs by default was bad for our out-of-the-box performance. From 2.0.0 onward, SRQ is used exclusively by default.

@mike-dubman (Member) commented:

Thanks. So does this mean rdmacm fails with SRQ-type queues?

@mike-dubman (Member) commented:

Added this to the test:

-mca btl_openib_receive_queues P,65536,256,192,128:S,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64

Could you please explain the bug in rdmacm as you see it?

@mike-dubman (Member) commented:

bot:retest

@hjelmn (Member, Author) commented Jun 7, 2016

@miked-mellanox Yeah, for some time we have required a per-peer receive queue for any connection method that uses a clear-to-send protocol. I don't know if that can be improved, but it has been a long-standing "feature". There hasn't been any effort to investigate allowing rdmacm with SRQ-only QPs, since per-peer scaling doesn't really matter when using rdmacm (which tended to scale poorly anyway).
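
For context: in the btl_openib_receive_queues value quoted above, each colon-separated entry describes one receive queue, and the leading letter selects its type (P = per-peer, S = shared receive queue), so prepending a P entry makes QP 0 per-peer and satisfies this requirement. The sketch below models that kind of gating check; the names (qp_type_t, rdmacm_usable) are hypothetical, not the real openib BTL code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical queue-type model; illustration only. */
typedef enum { QP_TYPE_PER_PEER, QP_TYPE_SRQ } qp_type_t;

/* rdmacm's clear-to-send handshake relies on a guaranteed per-peer
 * receive buffer on the first queue pair, so only enable it when
 * QP 0 is a per-peer queue. */
static bool rdmacm_usable(const qp_type_t *qps, size_t nqps) {
    return nqps > 0 && QP_TYPE_PER_PEER == qps[0];
}

int main(void) {
    qp_type_t srq_only[] = { QP_TYPE_SRQ, QP_TYPE_SRQ, QP_TYPE_SRQ };
    qp_type_t with_pp[]  = { QP_TYPE_PER_PEER, QP_TYPE_SRQ, QP_TYPE_SRQ };

    printf("SRQ-only list: rdmacm %s\n",
           rdmacm_usable(srq_only, 3) ? "usable" : "skipped");
    printf("per-peer QP 0: rdmacm %s\n",
           rdmacm_usable(with_pp, 3) ? "usable" : "skipped");
    return 0;
}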

@mike-dubman (Member) commented:

Thanks to @rhc54's timeout feature, here is the backtrace:

09:41:45    Thread 1 (Thread 0x7ffff73e2700 (LWP 32132)):
09:41:45    #0  0x0000003d6980b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
09:41:45    #1  0x00007fffeeb74bc9 in rdmacm_endpoint_finalize (endpoint=0x816be0) at connect/btl_openib_connect_rdmacm.c:1231
09:41:45    #2  0x00007fffeeb5fd2b in mca_btl_openib_endpoint_destruct (endpoint=0x816be0) at btl_openib_endpoint.c:368
09:41:45    #3  0x00007fffeeb489b7 in opal_obj_run_destructors (object=0x816be0) at ../../../../opal/class/opal_object.h:460
09:41:45    #4  0x00007fffeeb4dddb in mca_btl_openib_del_procs (btl=0x726f50, nprocs=1, procs=0x7fffffffc828, peers=0x804f80) at btl_openib.c:1316
09:41:45    #5  0x00007fffeefa5159 in mca_bml_r2_del_procs (nprocs=8, procs=0x78d5c0) at bml_r2.c:623
09:41:45    #6  0x00007fffee4af612 in mca_pml_ob1_del_procs (procs=0x78d5c0, nprocs=8) at pml_ob1.c:455
09:41:45    #7  0x00007ffff7ca50d4 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:333
09:41:45    #8  0x00007ffff7cd7525 in PMPI_Finalize () at pfinalize.c:47
09:41:45    #9  0x0000000000400890 in main (argc=1, argv=0x7fffffffcbf8) at hello_c.c:24

@jladd-mlnx (Member) commented:

@hjelmn This is hanging.

@hjelmn (Member, Author) commented Jun 7, 2016

:bot:retest:

@hjelmn (Member, Author) commented Jun 7, 2016

@jladd-mlnx I cannot reproduce your hang on a ConnectX-3 cluster. In any case, it is completely unrelated to this commit, so it is being merged.

@hjelmn hjelmn merged commit 3ddf3cc into open-mpi:master Jun 7, 2016
@hppritcha (Member) commented:

@Di0gen When did Mellanox add the additional hello_c tests to the Jenkins script, namely the ones that use rdmacm?

Is this the first PR tested with what appears to be a new Jenkins script?

@mike-dubman (Member) commented:

@hppritcha - it was added yesterday:
mellanox-hpc/jenkins_scripts@e09144b

@bosilca (Member) commented Jun 8, 2016

I confirm this solves the memory leak issue. 👍
