Conversation

@hjelmn (Member) commented Jun 6, 2016

This commit fixes two bugs in pml/ob1:

  • Do not call MCA_PML_OB1_PROGRESS_PENDING from
    mca_pml_ob1_send_request_start_copy, as this may lead to a recursive
    call to mca_pml_ob1_send_request_process_pending (see the sketch
    below).
  • In mca_pml_ob1_send_request_start_rdma, return the rdma frag object
    if a btl fragment cannot be allocated. This fixes a leak identified
    by @abouteiller and @bosilca.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

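To make the first fix concrete: the sketch below is a minimal, self-contained model of the re-entrancy hazard, using hypothetical names (start_copy, process_pending, a fake resource counter) rather than the actual pml/ob1 code. It shows why a send-start path that drains the pending queue itself can recurse when a deferred request immediately re-queues, and why draining from the progress path instead is safe.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a pending-send queue; illustration only. */
#define MAX_PENDING 8
static int pending[MAX_PENDING];
static int npending = 0;
static int resources = 1;            /* stand-in for the BTL fragment budget */

static bool start_copy(int req);

static void process_pending(void) {
    /* Drain the queue from the progress loop, not from inside
     * start_copy(): a request that fails and re-queues itself while
     * we are draining would otherwise re-enter this function and
     * recurse without bound. */
    int n = npending;
    npending = 0;
    for (int i = 0; i < n; ++i) {
        (void) start_copy(pending[i]);   /* on failure it re-queues itself */
    }
}

static bool start_copy(int req) {
    if (resources <= 0) {
        pending[npending++] = req;       /* defer until resources come back */
        return false;
    }
    --resources;
    printf("sent request %d\n", req);
    /* The fix's pattern: do NOT call process_pending() here. */
    return true;
}

int main(void) {
    (void) start_copy(1);   /* consumes the only resource              */
    (void) start_copy(2);   /* gets deferred                           */
    resources = 1;          /* e.g. a send completion frees a fragment */
    process_pending();      /* drained from the progress path          */
    return 0;
}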
@hjelmn (Member, Author) commented Jun 6, 2016

@bosilca @abouteiller We were leaking rdma frags in ob1 if the btl ran out of fragments. This fixes that and another issue identified by your benchmark.
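
As an aside for readers who don't know the ob1 internals: the leak fix follows the usual cleanup pattern where an object taken from a free list must be put back when a later allocation in the same path fails. The sketch below is a self-contained illustration with hypothetical names (rdma_frag_t, btl_alloc_fragment, and so on); it is not the actual Open MPI code.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ob1 rdma frag and the BTL allocator. */
typedef struct rdma_frag { struct rdma_frag *next; } rdma_frag_t;

static rdma_frag_t *frag_free_list = NULL;

static rdma_frag_t *rdma_frag_alloc(void) {
    if (NULL != frag_free_list) {
        rdma_frag_t *frag = frag_free_list;
        frag_free_list = frag->next;
        return frag;
    }
    return malloc(sizeof(rdma_frag_t));
}

static void rdma_frag_return(rdma_frag_t *frag) {
    frag->next = frag_free_list;
    frag_free_list = frag;
}

/* Pretend the BTL has run out of send fragments. */
static bool btl_alloc_fragment(void) { return false; }

static int send_request_start_rdma(void) {
    rdma_frag_t *frag = rdma_frag_alloc();
    if (NULL == frag) {
        return -1;                        /* out of resources */
    }
    if (!btl_alloc_fragment()) {
        /* Without this call the rdma frag is lost every time the BTL
         * runs dry -- the leak.  Returning it lets the request be
         * retried later from the pending queue. */
        rdma_frag_return(frag);
        return -1;
    }
    /* ... build and post the RDMA operation ... */
    return 0;
}

int main(void) {
    if (0 != send_request_start_rdma()) {
        printf("BTL out of fragments: request deferred, rdma frag returned\n");
    }
    return 0;
}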

@hjelmn (Member, Author) commented Jun 7, 2016

@miked-mellanox False failure. It looks like the test tried to use rdmacm without changing the receive queue list so that QP 0 is a per-peer queue pair.

@mike-dubman (Member) commented:

@hjelmn - why is that needed? The same command line works fine with v1.10:

$/hpc/local/benchmarks/hpcx_install/hpcx-gcc/ompi-v1.10/bin/mpirun -np 2 --mca btl_openib_cpc_include rdmacm  -mca btl_openib_warn_default_gid_prefix 0  -mca pml ob1 -mca btl self,openib -mca btl_if_include mlx4_0:2 /usr/mpi/gcc/openmpi-1.10.3rc4/tests/osu-micro-benchmarks-5.2/osu_bw
# OSU MPI Bandwidth Test v5.2
# Size      Bandwidth (MB/s)
1                       2.67
2                       5.58
4                      11.10
8                      22.17
16                     44.33
32                     86.16
64                    171.67
128                   289.57
256                   585.56
512                  1057.71
1024                 1825.01
2048                 3066.73
4096                 4491.07
8192                 5625.41
16384                6786.90
32768                7676.25
65536                7925.74
131072               8018.55
262144               8077.81
524288               8109.15
1048576              8115.67
2097152              7337.95
4194304              7220.15

@hjelmn (Member, Author) commented Jun 7, 2016

@miked-mellanox Because using per-peer QPs by default was bad for our out-of-the-box performance. From 2.0.0 onward, SRQ is used exclusively by default.

@mike-dubman (Member) commented:

Thanks. So does this mean rdmacm fails with SRQ-type queues?

@mike-dubman (Member) commented:

Added this to the test:

-mca btl_openib_receive_queues P,65536,256,192,128:S,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64

Could you please explain the bug in rdmacm as you see it?

@mike-dubman (Member) commented:

bot:retest

@hjelmn (Member, Author) commented Jun 7, 2016

@miked-mellanox Yeah, for some time we have required a per-peer receive queue for any connection method that uses a clear-to-send protocol. I don't know if that can be improved, but it has been a long-standing "feature". There hasn't been any effort to investigate allowing rdmacm with SRQ-only QPs, since per-peer scaling doesn't really matter when using rdmacm (which tended to scale poorly anyway).
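
For context: in the btl_openib_receive_queues value quoted above, each colon-separated entry describes one receive queue, and the leading letter selects its type (P = per-peer, S = shared receive queue), so prepending a P entry makes QP 0 per-peer and satisfies this requirement. The sketch below models that kind of gating check; the names (qp_type_t, rdmacm_usable) are hypothetical, not the real openib BTL code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical queue-type model; illustration only. */
typedef enum { QP_TYPE_PER_PEER, QP_TYPE_SRQ } qp_type_t;

/* rdmacm's clear-to-send handshake relies on a guaranteed per-peer
 * receive buffer on the first queue pair, so only enable it when
 * QP 0 is a per-peer queue. */
static bool rdmacm_usable(const qp_type_t *qps, size_t nqps) {
    return nqps > 0 && QP_TYPE_PER_PEER == qps[0];
}

int main(void) {
    qp_type_t srq_only[] = { QP_TYPE_SRQ, QP_TYPE_SRQ, QP_TYPE_SRQ };
    qp_type_t with_pp[]  = { QP_TYPE_PER_PEER, QP_TYPE_SRQ, QP_TYPE_SRQ };

    printf("SRQ-only list: rdmacm %s\n",
           rdmacm_usable(srq_only, 3) ? "usable" : "skipped");
    printf("per-peer QP 0: rdmacm %s\n",
           rdmacm_usable(with_pp, 3) ? "usable" : "skipped");
    return 0;
}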

@mike-dubman (Member) commented:

Thanks to @rhc54's timeout feature, here is the backtrace:

09:41:45    Thread 1 (Thread 0x7ffff73e2700 (LWP 32132)):
09:41:45    #0  0x0000003d6980b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
09:41:45    #1  0x00007fffeeb74bc9 in rdmacm_endpoint_finalize (endpoint=0x816be0) at connect/btl_openib_connect_rdmacm.c:1231
09:41:45    #2  0x00007fffeeb5fd2b in mca_btl_openib_endpoint_destruct (endpoint=0x816be0) at btl_openib_endpoint.c:368
09:41:45    #3  0x00007fffeeb489b7 in opal_obj_run_destructors (object=0x816be0) at ../../../../opal/class/opal_object.h:460
09:41:45    #4  0x00007fffeeb4dddb in mca_btl_openib_del_procs (btl=0x726f50, nprocs=1, procs=0x7fffffffc828, peers=0x804f80) at btl_openib.c:1316
09:41:45    #5  0x00007fffeefa5159 in mca_bml_r2_del_procs (nprocs=8, procs=0x78d5c0) at bml_r2.c:623
09:41:45    #6  0x00007fffee4af612 in mca_pml_ob1_del_procs (procs=0x78d5c0, nprocs=8) at pml_ob1.c:455
09:41:45    #7  0x00007ffff7ca50d4 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:333
09:41:45    #8  0x00007ffff7cd7525 in PMPI_Finalize () at pfinalize.c:47
09:41:45    #9  0x0000000000400890 in main (argc=1, argv=0x7fffffffcbf8) at hello_c.c:24

@jladd-mlnx (Member) commented:

@hjelmn This is hanging.

@hjelmn (Member, Author) commented Jun 7, 2016

:bot:retest:

@hjelmn (Member, Author) commented Jun 7, 2016

@jladd-mlnx I cannot reproduce your hang on a ConnectX-3 cluster. In any case, it is completely unrelated to this commit, so it is being merged.

@hjelmn hjelmn merged commit 3ddf3cc into open-mpi:master Jun 7, 2016
@hppritcha (Member) commented:

@Di0gen When did Mellanox add the additional hello_c tests to the Jenkins script, namely the ones that use rdmacm?

Is this the first PR tested with what appears to be a new Jenkins script?

@mike-dubman (Member) commented:

@hppritcha - it was added yesterday:
mellanox-hpc/jenkins_scripts@e09144b

@bosilca (Member) commented Jun 8, 2016

I confirm this solves the memory leak issue. 👍
