@ghost ghost commented Apr 2, 2016

Hopefully a better approach than #1514. Aims at fixing #1396.

Tests:

% ompi_info --param btl all
             MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.0.0)
             MCA btl: sm (MCA v2.1.0, API v3.0.0, Component v3.0.0)
             MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.0.0)
             MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.0.0)
             MCA btl: parameter "btl_tcp_if_include" (current value: "",
                      data source: default, level: 1 user/basic, type:
                      string)
                      Comma-delimited list of devices and/or CIDR
                      notation of networks to use for MPI communication
                      (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
                      with btl_tcp_if_exclude.
             MCA btl: parameter "btl_tcp_if_exclude" (current value:
                      "127.0.0.1/8,sppp", data source: default, level: 1
                      user/basic, type: string)
                      Comma-delimited list of devices and/or CIDR
                      notation of networks to NOT use for MPI
                      communication -- all devices not matching these
                      specifications will be used (e.g.,
                      "eth0,192.168.0.0/16").  If set to a non-default
                      value, it is mutually exclusive with
                      btl_tcp_if_include.
             MCA btl: parameter "btl_tcp_progress_thread" (current value:
                      "0", data source: default, level: 1 user/basic,
                      type: int)


% ompi_info --param btl tcp
             MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.0.0)
             MCA btl: parameter "btl_tcp_if_include" (current value: "",
                      data source: default, level: 1 user/basic, type:
                      string)
                      Comma-delimited list of devices and/or CIDR
                      notation of networks to use for MPI communication
                      (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
                      with btl_tcp_if_exclude.
             MCA btl: parameter "btl_tcp_if_exclude" (current value:
                      "127.0.0.1/8,sppp", data source: default, level: 1
                      user/basic, type: string)
                      Comma-delimited list of devices and/or CIDR
                      notation of networks to NOT use for MPI
                      communication -- all devices not matching these
                      specifications will be used (e.g.,
                      "eth0,192.168.0.0/16").  If set to a non-default
                      value, it is mutually exclusive with
                      btl_tcp_if_include.
             MCA btl: parameter "btl_tcp_progress_thread" (current value:
                      "0", data source: default, level: 1 user/basic,
                      type: int)


% ompi_info -a
[...]
 MPI_MAX_DATAREP_STRING: 128
       MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.0.0)
       MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.0.0)
       MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.0.0)
             MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.0.0)
             MCA btl: sm (MCA v2.1.0, API v3.0.0, Component v3.0.0)
             MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.0.0)
             MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.0.0)
        MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.0.0)
[...]
        MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.0.0)
        MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v3.0.0)
        MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v3.0.0)
            MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v3.0.0)
            MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.0.0)
       MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v3.0.0)
             MCA mca: parameter "mca_param_files" (current value: "/home/kilo/.openmpi/mca-params.conf:/home/kilo/local/etc/openmpi-mca-params.conf", data source: default, level: 2 user/detail, type: string, deprecated, synonym of: mca_base_param_files)
                      Path for MCA configuration files containing variable values
[...]

@jsquyres / @hjelmn - What are your thoughts, gents?

@ghost ghost mentioned this pull request Apr 2, 2016
Contributor

rhc54 commented Apr 2, 2016

FWIW: I like having the dashed lines separating the output - made it a lot easier to read, for me.

Author

ghost commented Apr 2, 2016

@rhc54 😄 For me too! Once we nail down a good approach to this bug, I'll be happy to try something similar in another pull request.

Author

ghost commented Apr 2, 2016

Mellanox tests (I've no idea what they do 😄) failed:
...
18:42:02 + taskset -c 12,13 timeout -s SIGSEGV 10m /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/bin/mpirun -np 2 -bind-to core -mca btl_openib_if_include mlx5_0:1 -x MXM_RDMA_PORTS=mlx5_0:1 -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=rc,cm -mca pml ob1 -mca btl self,openib /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/thread_tests/thread-tests-1.1/message_rate_th 8
18:42:02 Note: Max supported value for env var is 4095
18:42:02 vsetenv ghprbPullLongDescription failed
18:43:43 [jenkins01][[36327,1],0][btl_openib_endpoint.c:115:mca_btl_openib_endpoint_post_send] error posting send request error 12: Cannot allocate memory. size = 14
18:43:43
18:43:43 [jenkins01][[36327,1],0][btl_openib_endpoint.c:115:mca_btl_openib_endpoint_post_send] error posting send request error 12: Cannot allocate memory. size = 14
18:43:43
18:43:43 [jenkins01:8970] *** An error occurred in MPI_Send
18:43:43 [jenkins01:8970] *** reported by process [2380726273,0]
18:43:43 [jenkins01:8970] *** on communicator MPI_COMM_WORLD
18:43:43 [jenkins01:8970] *** MPI_ERR_OTHER: known error not in list
18:43:43 [jenkins01:8970] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
18:43:43 [jenkins01:8970] *** and potentially your MPI job)
18:43:43 [jenkins01][[36327,1],0][btl_openib_endpoint.c:115:mca_btl_openib_endpoint_post_send] error posting send request error 12: Cannot allocate memory. size = 14
18:43:43
18:43:43 [jenkins01:08965] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
18:43:43 [jenkins01:08965] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
18:43:43 Build step 'Execute shell' marked build as failure
18:43:43 [htmlpublisher] Archiving HTML reports...
18:43:43 [htmlpublisher] Archiving at BUILD level /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/cov_build to /var/lib/jenkins/jobs/gh-ompi-master-pr/builds/2164/htmlreports/Coverity_Report
18:43:45 Setting commit status on GitHub for 2cd1e8c
18:43:46 [BFA] Scanning build for known causes...
18:43:46 [BFA] No failure causes found
18:43:46 [BFA] Done. 0s
18:43:46 Setting status of b09ae94 to FAILURE with url http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/2164/ and message: 'Build finished. '
18:43:46 Using context: Mellanox
18:43:46 Finished: FAILURE
...

Doesn't seem related to the ompi_info changes?

Contributor

rhc54 commented Apr 2, 2016

Completely unrelated - ignore them

Author

ghost commented Apr 2, 2016

That's what I thought. I did restart the test suite though...

Contributor

rhc54 commented Apr 2, 2016

👍 Looks good to me...

Member

hjelmn commented Apr 2, 2016

Curious how this affects --parsable? Does this cause any duplicate lines there? Not sure we care but it would be good to know.

Author

ghost commented Apr 2, 2016

@hjelmn Good call.

% ompi_info --parsable --param btl all
mca:btl:vader:version:"mca:2.1.0"
mca:btl:vader:version:"api:3.0.0"
mca:btl:vader:version:"component:3.0.0"
mca:btl:sm:version:"mca:2.1.0"
mca:btl:sm:version:"api:3.0.0"
mca:btl:sm:version:"component:3.0.0"
mca:btl:self:version:"mca:2.1.0"
mca:btl:self:version:"api:3.0.0"
mca:btl:self:version:"component:3.0.0"
mca:btl:tcp:version:"mca:2.1.0"
mca:btl:tcp:version:"api:3.0.0"
mca:btl:tcp:version:"component:3.0.0"
mca:btl:tcp:param:btl_tcp_if_include:value:
mca:btl:tcp:param:btl_tcp_if_include:source:default
mca:btl:tcp:param:btl_tcp_if_include:status:writeable
mca:btl:tcp:param:btl_tcp_if_include:level:1
mca:btl:tcp:param:btl_tcp_if_include:help:Comma-delimited list of devices and/or CIDR notation of networks to use for MPI communication (e.g., "eth0,192.168.0.0/16").  Mutually exclusive with btl_tcp_if_exclude.
mca:btl:tcp:param:btl_tcp_if_include:deprecated:no
mca:btl:tcp:param:btl_tcp_if_include:type:string
mca:btl:tcp:param:btl_tcp_if_include:disabled:false
mca:btl:tcp:param:btl_tcp_if_exclude:value:127.0.0.1/8,sppp
mca:btl:tcp:param:btl_tcp_if_exclude:source:default
mca:btl:tcp:param:btl_tcp_if_exclude:status:writeable
mca:btl:tcp:param:btl_tcp_if_exclude:level:1
mca:btl:tcp:param:btl_tcp_if_exclude:help:Comma-delimited list of devices and/or CIDR notation of networks to NOT use for MPI communication -- all devices not matching these specifications will be used (e.g., "eth0,192.168.0.0/16").  If set to a non-default value, it is mutually exclusive with btl_tcp_if_include.
mca:btl:tcp:param:btl_tcp_if_exclude:deprecated:no
mca:btl:tcp:param:btl_tcp_if_exclude:type:string
mca:btl:tcp:param:btl_tcp_if_exclude:disabled:false
mca:btl:tcp:param:btl_tcp_progress_thread:value:0
mca:btl:tcp:param:btl_tcp_progress_thread:source:default
mca:btl:tcp:param:btl_tcp_progress_thread:status:writeable
mca:btl:tcp:param:btl_tcp_progress_thread:level:1
mca:btl:tcp:param:btl_tcp_progress_thread:deprecated:no
mca:btl:tcp:param:btl_tcp_progress_thread:type:int
mca:btl:tcp:param:btl_tcp_progress_thread:disabled:false

% ompi_info --parsable --param btl tcp
mca:btl:tcp:version:"mca:2.1.0"
mca:btl:tcp:version:"api:3.0.0"
mca:btl:tcp:version:"component:3.0.0"
mca:btl:tcp:param:btl_tcp_if_include:value:
mca:btl:tcp:param:btl_tcp_if_include:source:default
mca:btl:tcp:param:btl_tcp_if_include:status:writeable
mca:btl:tcp:param:btl_tcp_if_include:level:1
[...]

% ompi_info -a --parsable
[...]
mca:backtrace:execinfo:version:"component:3.0.0"
mca:btl:vader:version:"mca:2.1.0"
mca:btl:vader:version:"api:3.0.0"
mca:btl:vader:version:"component:3.0.0"
mca:btl:sm:version:"mca:2.1.0"
mca:btl:sm:version:"api:3.0.0"
mca:btl:sm:version:"component:3.0.0"
mca:btl:self:version:"mca:2.1.0"
mca:btl:self:version:"api:3.0.0"
mca:btl:self:version:"component:3.0.0"
mca:btl:tcp:version:"mca:2.1.0"
mca:btl:tcp:version:"api:3.0.0"
mca:btl:tcp:version:"component:3.0.0"
mca:compress:bzip:version:"mca:2.1.0"
[...]

There don't appear to be any duplicate lines.
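
For anyone who wants to repeat the duplicate check mechanically instead of by eye, here is a minimal sketch in Python. The helper name `find_duplicate_lines` is made up for illustration and is not part of Open MPI. It only counts identical lines in whatever text it is given.

```python
from collections import Counter


def find_duplicate_lines(text):
    """Return {line: count} for each non-empty line that occurs more than once."""
    counts = Counter(line.rstrip() for line in text.splitlines() if line.strip())
    return {line: n for line, n in counts.items() if n > 1}


# A few lines copied from the `--parsable` output above.
sample = "\n".join([
    'mca:btl:tcp:version:"mca:2.1.0"',
    'mca:btl:tcp:version:"api:3.0.0"',
    'mca:btl:tcp:version:"component:3.0.0"',
    "mca:btl:tcp:param:btl_tcp_if_include:source:default",
])

print(find_duplicate_lines(sample) or "no duplicate lines")
```

To check a full run, capture the output of `ompi_info -a --parsable` into a string (for example by reading a saved file) and pass it to the same function.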

Pass component_map to opal_info_do_params(). It will be needed to output
component versions.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
@ghost ghost changed the title from "opal_info_support: output component versions" to "DNM opal_info_support: output component versions" Apr 2, 2016
Author

ghost commented Apr 2, 2016

Tabs... fixing now... done.

When invoking, for example, `ompi_info` with:
   -a
   --params foo all
   --params foo bar
it's useful to have the appropriate components and their versions be
displayed, regardless of whether they have registered any parameters.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
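
The behaviour described in this commit message can be spot-checked from `--parsable` output. The sketch below is an illustration only, not the C change itself. It assumes the `mca:<framework>:<component>:<kind>:...` line layout seen in the test output earlier in this thread, and `components_without_params` is a hypothetical helper name. It lists components that show version lines but no parameter lines in the given text.

```python
def components_without_params(parsable_text):
    """List (framework, component) pairs that have version lines but no param lines."""
    versioned, with_params = [], set()
    for line in parsable_text.splitlines():
        # Assumed layout: mca:<framework>:<component>:<kind>:<rest>
        fields = line.split(":", 4)
        if len(fields) < 5 or fields[0] != "mca":
            continue  # skip anything that does not match the assumed layout
        key = (fields[1], fields[2])
        if fields[3] == "version" and key not in versioned:
            versioned.append(key)
        elif fields[3] == "param":
            with_params.add(key)
    return [key for key in versioned if key not in with_params]


# Lines taken from the `--parsable --param btl all` output in this thread.
sample = "\n".join([
    'mca:btl:vader:version:"mca:2.1.0"',
    'mca:btl:sm:version:"mca:2.1.0"',
    'mca:btl:tcp:version:"mca:2.1.0"',
    "mca:btl:tcp:param:btl_tcp_if_include:source:default",
])

print(components_without_params(sample))
```

With this change in place, components such as `vader` and `sm` should appear in that list, since their versions are printed even though no parameters are shown for them.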
@ghost ghost changed the title from "DNM opal_info_support: output component versions" to "opal_info_support: output component versions" Apr 2, 2016
@rhc54 rhc54 merged commit d724d8a into open-mpi:master Apr 3, 2016
3 participants