Only short hostname is accepted in rank file in case of affinity support #1614

@fengliplatform

OpenMPI 1.10.1

When the rank file uses long (fully qualified) hostnames, as in the following, mpirun fails with an error.

li@fengli-16-120: cat ./hostrankfile0
rank 0=fengli-17.gss.platformlab.ibm.com slot=0
rank 1=fengli-17.gss.platformlab.ibm.com slot=1
rank 2=fengli-17.gss.platformlab.ibm.com slot=2
rank 3=fengli-17.gss.platformlab.ibm.com slot=3
rank 4=fengli-16.gss.platformlab.ibm.com slot=0
rank 5=fengli-16.gss.platformlab.ibm.com slot=1
rank 6=fengli-16.gss.platformlab.ibm.com slot=2
rank 7=fengli-16.gss.platformlab.ibm.com slot=3

li@fengli-16-119: mpirun -rf ./hostrankfile0 ./hello2
--------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots.  Please review your rank-slot
assignments and your host allocation to ensure a proper match.  Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").

  Host: fengli-16
--------------------------------------------------------------------------

When the rank file uses short hostnames, as in the following, the same job runs correctly.

li@fengli-16-125: cat ./hostrankfile1
rank 0=fengli-17 slot=0
rank 1=fengli-17 slot=1
rank 2=fengli-17 slot=2
rank 3=fengli-17 slot=3
rank 4=fengli-16 slot=0
rank 5=fengli-16 slot=1
rank 6=fengli-16 slot=2
rank 7=fengli-16 slot=3

li@fengli-16-124: mpirun -rf ./hostrankfile1 ./hello2
Hello world from processor fengli-16, rank 5 out of 8 processors
Hello world from processor fengli-16, rank 6 out of 8 processors
Hello world from processor fengli-16, rank 7 out of 8 processors
Hello world from processor fengli-16, rank 4 out of 8 processors
Hello world from processor fengli-17, rank 2 out of 8 processors
Hello world from processor fengli-17, rank 3 out of 8 processors
Hello world from processor fengli-17, rank 0 out of 8 processors
Hello world from processor fengli-17, rank 1 out of 8 processors

Note: the same long-hostname rank file was tested successfully with OpenMPI 1.6.5.
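
Since the short-name rank file above is accepted, one possible stopgap is to derive it from the long-name file mechanically. The following is only a sketch of that idea, not a fix for the underlying behavior: the function name `shorten_rankfile_hosts` is hypothetical, and it assumes the rank file contains only simple `rank N=HOST slot=...` lines like the ones shown in this report.

```python
import re

# Matches a rank file line such as "rank 0=host.example.com slot=0".
# Assumption: only the simple "rank N=HOST slot=..." form from this report is handled.
_RANK_LINE = re.compile(r"^(\s*rank\s+\d+\s*=\s*)(\S+)(.*)$")

# Hosts made only of digits and dots are treated as IPv4 addresses and left alone.
_IPV4_LIKE = re.compile(r"[\d.]+")


def shorten_rankfile_hosts(text):
    """Return rank file text with each hostname cut at its first dot."""
    out = []
    for line in text.splitlines():
        m = _RANK_LINE.match(line)
        if m:
            prefix, host, rest = m.groups()
            if not _IPV4_LIKE.fullmatch(host):
                # Keep only the part before the first dot (the short hostname).
                host = host.split(".", 1)[0]
            line = prefix + host + rest
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


if __name__ == "__main__":
    sample = "rank 0=fengli-17.gss.platformlab.ibm.com slot=0\n"
    print(shorten_rankfile_hosts(sample), end="")
```

The converted text can be written to a new file and passed to `mpirun -rf` in the same way as `./hostrankfile1` above. This only helps when the short names are the ones the allocation reports, as they are in this case.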

_EDIT: Reformatted verbatim text_
