This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Conversation

@hjelmn
Member

@hjelmn hjelmn commented Mar 21, 2016

This commit adds the data necessary to support dynamic add_procs (the
opal_process_name_t) to the rdmacm message. The endpoint lookup
function has been updated to match the code in udcm.

Closes open-mpi/ompi#1468.

:bot:assign: @jladd-mlnx
:bot:milestone:v2.0.0
:bot:label:bug

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

(cherry picked from open-mpi/ompi@645bd9d)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

This commit adds the data necessary to support dynamic add_procs (the
opal_process_name_t) to the rdmacm message. The endpoint lookup
function has been updated to match the code in udcm.

Closes open-mpi/ompi#1468.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

(cherry picked from open-mpi/ompi@645bd9d)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
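
For readers not familiar with the CPC code, here is a rough, self-contained sketch of the idea (illustrative types and names only, not the actual openib/rdmacm source): the connect message carried in the rdmacm private data now includes the sender's opal_process_name_t, and the receiving side uses that name to find the matching endpoint, the same way udcm already does.

```c
/* Illustrative sketch only -- not the actual Open MPI source.  The idea of
 * the commit: the rdmacm connect message carries the sender's process name
 * so the receiver can locate (or create) the matching endpoint by name,
 * the same way udcm already does. */
#include <stddef.h>
#include <stdint.h>

/* stand-in for opal_process_name_t (a jobid/vpid pair) */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} proc_name_t;

/* hypothetical connect message exchanged as rdmacm private data */
typedef struct {
    proc_name_t name;      /* NEW: sender's process name, needed for dynamic add_procs */
    uint8_t     qp_index;  /* which QP this connection request is for */
} connect_message_t;

typedef struct endpoint {
    proc_name_t      remote_name;
    struct endpoint *next;
} endpoint_t;

/* hypothetical lookup, analogous to the udcm code: walk the module's
 * endpoint list and match on the process name carried in the message */
static endpoint_t *lookup_endpoint(endpoint_t *endpoints,
                                   const connect_message_t *msg)
{
    for (endpoint_t *ep = endpoints; NULL != ep; ep = ep->next) {
        if (ep->remote_name.jobid == msg->name.jobid &&
            ep->remote_name.vpid  == msg->name.vpid) {
            return ep;
        }
    }
    return NULL;  /* not found: with dynamic add_procs the caller may create it */
}
```
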
@jsquyres
Member

@alinask Could you verify that this works for you?

@hppritcha hppritcha assigned alinask and unassigned jladd-mlnx Mar 21, 2016
@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1458/ for details.

@alinask
Member

alinask commented Mar 22, 2016

@hjelmn, thanks for this fix!

I tested on the master branch, where the default value of mpi_add_procs_cutoff is 0.
The following command lines work:

rdmacm, only per-peer QP:

mpirun -np 2 --bind-to core --map-by node  --display-map -mca pml ob1 -mca btl self,sm,openib --mca btl_openib_cpc_include rdmacm -mca btl_openib_receive_queues  P,65536,256,192,128  ./IMB/src/IMB-MPI1 pingpong

rdmacm, first QP is per-peer, rest are SRQ:

mpirun -np 2 --bind-to core --map-by node  --display-map -mca pml ob1 -mca btl self,sm,openib --mca btl_openib_cpc_include rdmacm -mca btl_openib_receive_queues  P,65536,256,192,128:S,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64 ./IMB-MPI1 pingpong

udcm:

mpirun -np 2 --bind-to core --map-by node  --display-map -mca pml ob1 -mca btl self,sm,openib --mca btl_openib_cpc_include udcm   ./IMB/src/IMB-MPI1 pingpong

Since the default value of btl_openib_receive_queues (on the master branch) only uses SRQ QPs, rdmacm won't work without changing this MCA parameter on the command line. I added a note about this to the FAQ, but maybe a warning should also be added to the code, as @jsquyres suggested, or the default should start with a per-peer QP as the release branch (v1.10) does?
Without adding a per-peer QP on the command line, there is a segmentation fault and the output doesn't indicate what needs to be done:

mpirun  -np 8 --bind-to core --map-by node  --display-map -mca pml ob1 -mca btl self,sm,openib --mca btl_openib_cpc_include rdmacm ./IMB/src/IMB-MPI1 pingpong

--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           vegas07
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       rdmacm
--------------------------------------------------------------------------
 benchmarks to run pingpong alltoall 
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[19697,1],0]) is on host: vegas06
  Process 2 ([[19697,1],1]) is on host: vegas07
  BTLs attempted: self

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[vegas06:19044] *** Process received signal ***
[vegas06:19044] Signal: Segmentation fault (11)
[vegas06:19044] Signal code: Address not mapped (1)
[vegas06:19044] Failing at address: 0x8
[vegas06:19044] mca_bml_base_btl_array_get_next: invalid array size
[vegas06:19044] [ 0] /lib64/libpthread.so.0[0x3d2e60f710]
[vegas06:19044] [ 1] /labhome/alinas/workspace/ompi/ompi-github/ompi/install/lib/openmpi/mca_pml_ob1.so(+0xf028)[0x7ffff1b47028]
[vegas06:19044] [ 2] /labhome/alinas/workspace/ompi/ompi-github/ompi/install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0xc0)[0x7ffff1b47303]
[vegas06:19044] [ 3] /labhome/alinas/workspace/ompi/ompi-github/ompi/install/lib/libmpi.so.0(ompi_coll_base_bcast_intra_generic+0x1e4)[0x7ffff7d51a74]
[vegas06:19044] [ 4] /labhome/alinas/workspace/ompi/ompi-github/ompi/install/lib/libmpi.so.0(ompi_coll_base_bcast_intra_binomial+0x188)[0x7ffff7d526b6]
[vegas06:19044] [ 5] /labhome/alinas/workspace/ompi/ompi-github/ompi/install/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0x162)[0x7ffff0c6b0a1]
[vegas06:19044] [ 6] /labhome/alinas/workspace/ompi/ompi-github/ompi/install/lib/libmpi.so.0(MPI_Bcast+0x2e3)[0x7ffff7cf021b]
[vegas06:19044] [ 7] /labhome/alinas/workspace/imb_test/IMB/src/IMB-MPI1[0x403de4]
[vegas06:19044] [ 8] /labhome/alinas/workspace/imb_test/IMB/src/IMB-MPI1[0x401cc7]
[vegas06:19044] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3d2de1ed5d]
[vegas06:19044] [10] /labhome/alinas/workspace/imb_test/IMB/src/IMB-MPI1[0x401b69]
[vegas06:19044] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 19044 on node vegas06 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
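
For anyone who hits this and does not want to type the long receive_queues string on every run, the per-peer requirement can also be satisfied from an MCA parameter file (the usual per-user location is $HOME/.openmpi/mca-params.conf); the exact queue sizes below are only an example:

```
# $HOME/.openmpi/mca-params.conf -- make the first QP per-peer so rdmacm can be used
btl_openib_receive_queues = P,65536,256,192,128:S,65536,1024,1008,64
```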

@alinask
Member

alinask commented Mar 24, 2016

An enhancement of the warning "rdmacm CPC only supported when the first QP is a PP QP" is here:
open-mpi/ompi#1488
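
The real change is in that PR; purely as an illustration of the kind of check involved (the function, strings, and messages below are made up for this sketch, not the #1488 code), rdmacm only has to look at the first entry of btl_openib_receive_queues and warn when it is not a per-peer (P) queue:

```c
/* Illustration only -- not the actual open-mpi/ompi#1488 patch.
 * rdmacm requires the FIRST entry of btl_openib_receive_queues to be a
 * per-peer (P) QP, so check that up front and warn clearly instead of
 * letting the CPC silently disqualify itself. */
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

static bool first_qp_is_per_peer(const char *receive_queues)
{
    /* queue specs are colon-separated; the first letter of the first spec
     * gives its type: P (per-peer), S (shared receive queue), X (XRC) */
    return NULL != receive_queues &&
           'P' == toupper((unsigned char) receive_queues[0]);
}

int main(void)
{
    const char *srq_only = "S,128,256,192,128:S,65536,1024,1008,64";
    const char *pp_first = "P,65536,256,192,128:S,65536,1024,1008,64";

    if (!first_qp_is_per_peer(srq_only)) {
        fprintf(stderr, "WARNING: rdmacm CPC only supported when the first "
                        "QP is a per-peer (PP) QP; disabling rdmacm\n");
    }
    if (first_qp_is_per_peer(pp_first)) {
        printf("first QP is per-peer: rdmacm can be used\n");
    }
    return 0;
}
```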

@jladd-mlnx
Member

👍

bot:assign: @jsquyres
bot:label:reviewed

Please merge.

@ompiteam-bot

OMPIBot error: Label "reviewed" is already set on issue 1038.

@jsquyres
Member

Am I understanding this PR correctly:

  • this PR fixes the rdmacm CPC to work with the new dynamic add procs
  • openib still segfaults if there is no PP QP, with no discernible error message
  • there is no PP QP by default

@alinask
Member

alinask commented Mar 24, 2016

There is an error message if btl_base_verbose is set (open-mpi/ompi#1488), and the current default has no PP QP (that was already the case before this PR).

@jsquyres
Member

If we're still segv'ing by default, this seems like half a solution.

@hjelmn
Member Author

hjelmn commented Mar 24, 2016

The SEGV still needs to be fixed in r2. I plan to take a look later today.

@hjelmn
Member Author

hjelmn commented Mar 24, 2016

Kind of. The SEGV requires the user to specifically ask for rdmacm, which is suboptimal on IB systems. I would prefer this get merged regardless of the warning message and the r2 fix.

@hjelmn
Member Author

hjelmn commented Mar 24, 2016

Also, there is a really clear error message about openib not working. It calls out the CPCs tried.

@jsquyres
Member

How long to get the 2nd segv fix?

I'm only trying to prevent the "I promise to commit the 2nd half of the fix!" ...but then get too busy or forget to do it kind of scenario.

This commit adds code to detect when procs are unreachable when using
the dynamic add_procs functionality.

Fixes #1501

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

(cherry picked from open-mpi/ompi@9d5eeec)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
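
In outline (a hypothetical sketch with invented types and names, not the actual open-mpi/ompi@9d5eeec change): on first use of a dynamically added proc, check whether any BTL actually reported the peer as reachable and return an "unreachable" error instead of indexing an empty BTL array, which is what produced the earlier "invalid array size" segfault.

```c
/* Hypothetical sketch -- not the actual open-mpi/ompi@9d5eeec change.
 * With dynamic add_procs, a peer's BML endpoint is created lazily on first
 * use.  Before sending, verify that at least one BTL reported the peer as
 * reachable; otherwise return an error rather than indexing an empty BTL
 * array (the source of the earlier "invalid array size" segfault). */
#include <stddef.h>

typedef struct {
    size_t num_btls;              /* BTLs that can reach this peer */
    /* per-BTL endpoint pointers elided */
} bml_endpoint_t;

typedef struct {
    bml_endpoint_t *bml_endpoint; /* NULL until the proc is added on demand */
} proc_t;

enum { ERR_SUCCESS = 0, ERR_UNREACHABLE = -12 };  /* invented error codes */

/* stub standing in for the dynamic add_procs path; pretend no BTL could
 * reach the peer so the error path is exercised */
static bml_endpoint_t *add_proc_on_demand(proc_t *proc)
{
    (void) proc;
    return NULL;
}

static int ensure_reachable(proc_t *proc)
{
    if (NULL == proc->bml_endpoint) {
        proc->bml_endpoint = add_proc_on_demand(proc);
    }
    if (NULL == proc->bml_endpoint || 0 == proc->bml_endpoint->num_btls) {
        return ERR_UNREACHABLE;   /* report it; do not segfault later */
    }
    return ERR_SUCCESS;
}

int main(void)
{
    proc_t peer = { NULL };
    return ERR_UNREACHABLE == ensure_reachable(&peer) ? 0 : 1;
}
```
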
@ompiteam-bot

OMPIBot error: Label "-pushed-back" does not exist.

@hjelmn
Member Author

hjelmn commented Mar 28, 2016

bot:nolabel:pushed-back

@hjelmn
Member Author

hjelmn commented Mar 28, 2016

@jsquyres The SEGV is now fixed.

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1482/ for details.

@jsquyres
Member

@alinask Can you confirm that this PR now fixes everything for you / review the PR? Thanks.

@alinask
Member

alinask commented Mar 29, 2016

Sure, I'll check.
@hjelmn was this fix pushed to the master branch as well or should I test it only on v2.x?

@jsquyres
Member

@hjelmn says yes, it was pushed to master as well (he said this verbally on the call today).

@jladd-mlnx
Member

@alinask, please ack ASAP. We are trying to get this release out.

@alinask
Member

alinask commented Mar 30, 2016

I checked the v2.x branch with the patch from this PR with the command lines from above.

When using rdmacm without a per-peer QP, the output is:

$mpirun -H vegas08,vegas09 -np 2 --bind-to core --map-by node  --display-map -mca pml ob1 -mca btl self,sm,openib --mca btl_openib_cpc_include rdmacm   ./IMB/src/IMB-MPI1 pingpong 

 ========================   JOB MAP   ========================

 Data for node: vegas08 Num slots: 1    Max slots: 0    Num procs: 1
    Process OMPI jobid: [42738,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0]]:[B/././././././.][./././././././.]

 Data for node: vegas09 Num slots: 1    Max slots: 0    Num procs: 1
    Process OMPI jobid: [42738,1] App: 0 Process rank: 1 Bound: socket 0[core 0[hwt 0]]:[B/././././././.][./././././././.]

 =============================================================
--------------------------------------------------------------------------
WARNING: There are more than one active ports on host 'vegas08', but the
default subnet GID prefix was detected on more than one of these
ports.  If these ports are connected to different physical IB
networks, this configuration will fail in Open MPI.  This version of
Open MPI requires that every physically separate IB subnet that is
used between connected MPI processes must have different subnet ID
values.

Please see this FAQ entry for more details:

  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_default_gid_prefix to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           vegas08
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       rdmacm
--------------------------------------------------------------------------
 benchmarks to run pingpong 
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[42738,1],0]) is on host: vegas08
  Process 2 ([[42738,1],1]) is on host: vegas09
  BTLs attempted: self

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[vegas08:14207] *** An error occurred in MPI_Bcast
[vegas08:14207] *** reported by process [2800877569,140733193388032]
[vegas08:14207] *** on communicator MPI_COMM_WORLD
[vegas08:14207] *** MPI_ERR_INTERN: internal error
[vegas08:14207] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[vegas08:14207] ***    and potentially your MPI job)
[vegas08:14197] 1 more process has sent help message help-mpi-btl-openib.txt / default subnet prefix
[vegas08:14197] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[vegas08:14197] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port

So there is no segmentation fault anymore. Thanks @hjelmn!
The other command lines (rdmacm with a PP QP, and udcm) worked with mpi_add_procs_cutoff set to 0.

@jsquyres
Member

Thanks @alinask. I'm thinking that's an implicit 👍 :-)

@jsquyres
Member

@hppritcha Good to go.

@hppritcha hppritcha merged commit a9cecde into open-mpi:v2.x Mar 30, 2016