infiniband-diags: update ibportstate port enabling example #847
infiniband-diags/ibportstate.c
@@ -402,7 +402,7 @@ int main(int argc, char **argv)
 "\tmkey, mkeylease, mkeyprot\n";
 const char *usage_examples[] = {
 "3 1 disable\t\t\t# by lid",
-"-G 0x2C9000100D051 1 enable\t# by guid",
+"3 1 -C mlx4_0 -P 1 enable\t# by lid, CA name and Port number",
Since when is enablement via GUID not working?
FWIW if a CA port is disabled it does not have a valid LID (unless running on an SM node). So I'm wondering why this is better than using a port GUID which should be unique.
[root ~]$ ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Node GUID: 0xf4521403007be0e0
System image GUID: 0xf4521403007be0e3
Port 1:
State: Down
Physical state: Disabled
Rate: 10
Base lid: 38
LMC: 0
SM lid: 13
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e1
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 1
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e2
Link layer: InfiniBand
[root ~]$ ibportstate -G 0xf4521403007be0e1 1 enable
ibwarn: [84425] ib_path_query_via: sa call path_query failed
ibportstate: iberror: failed: can't resolve destination port 0xf4521403007be0e1
> Since when is enablement via GUID not working?
I did not run git bisect, but I built ibportstate from the upstream repo. The problem is that the first port is disabled: function 'resolve_ca_port' only selects a port whose physical state is 5 (LinkUp), so it skips the disabled port 1 and picks the active port 2.
/* /root/rdma-core/libibumad/umad.c: resolve_ca_port(), loop starts at line 292 */
for (i = 0; i <= ca.numports; i++) {
	DEBUG("checking port %d", i);
	if (!ca.ports[i])
		continue;
	if (strcmp(ca.ports[i]->link_layer, "InfiniBand") &&
	    strcmp(ca.ports[i]->link_layer, "IB"))
		continue;
	if (up < 0 && ca.ports[i]->phys_state == 5)
		up = *port = i;
	/* ... rest of the loop body not shown ... */
}
(gdb) p ca.ports[0]
$11 = (umad_port_t *) 0x0
(gdb) p ca.ports[1]
$12 = (umad_port_t *) 0x6099d0
(gdb) bt
#0 resolve_ca_port (ca_name=ca_name@entry=0x611920 "mlx4_0", port=port@entry=0x7fffffffd82c) at ../libibumad/umad.c:292
#1 0x0000155554f0bea8 in resolve_ca_name (ca_in=0x0, ca_name=0x7fffffffd888, best_port=0x7fffffffd87c) at ../libibumad/umad.c:372
#2 resolve_ca_name (ca_in=<optimized out>, best_port=0x7fffffffd87c, ca_name=0x7fffffffd888) at ../libibumad/umad.c:334
#3 0x0000155554f0c22d in umad_open_port (ca_name=ca_name@entry=0x0, portnum=<optimized out>, portnum@entry=0) at ../libibumad/umad.c:701
#4 0x000015555511d718 in mad_rpc_open_port (dev_name=0x0, dev_port=0, mgmt_classes=mgmt_classes@entry=0x7fffffffdaa4, num_classes=num_classes@entry=3) at ../libibmad/rpc.c:398
#5 0x00000000004019c4 in main (argc=3, argv=0x7fffffffdeb8) at ../infiniband-diags/ibportstate.c:423
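The selection rule in the excerpt above can be modelled in isolation. This is a simplified, hypothetical sketch: `struct port_model` and `first_linkup_port()` are illustration names, not libibumad API, and only the checks visible in the excerpt are kept (skip empty slots, skip non-InfiniBand link layers, take the first port whose physical state is 5, LinkUp). Whatever the rest of the real loop does is not modelled. With port 1 Disabled (3) and port 2 LinkUp, it returns 2, matching the "found active port 2" debug line in the test log further down.

```c
#include <stddef.h>
#include <string.h>

/* PortPhysicalState values from the IBA PortInfo attribute. */
#define PHYS_STATE_DISABLED 3
#define PHYS_STATE_LINKUP   5

/* Hypothetical, simplified stand-in for libibumad's umad_port_t. */
struct port_model {
	int present;            /* 0 models a NULL ca.ports[i] slot */
	const char *link_layer; /* e.g. "InfiniBand", "IB" or "Ethernet" */
	unsigned phys_state;    /* PortPhysicalState, see defines above */
};

/*
 * Mirrors the loop in the excerpt: ports[] has numports + 1 slots
 * (slot 0 is empty on a CA). Returns the index of the first InfiniBand
 * port in LinkUp state, or -1 if no such port exists.
 */
int first_linkup_port(const struct port_model *ports, int numports)
{
	for (int i = 0; i <= numports; i++) {
		if (!ports[i].present)
			continue;
		if (strcmp(ports[i].link_layer, "InfiniBand") &&
		    strcmp(ports[i].link_layer, "IB"))
			continue;
		if (ports[i].phys_state == PHYS_STATE_LINKUP)
			return i;
	}
	return -1;
}
```

A disabled port can therefore never be chosen as the default source port, which is why the GUID enable request above went out through port 2.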
> FWIW if a CA port is disabled it does not have a valid LID (unless running on an SM node). So I'm wondering why this is better than using a port GUID which should be unique.
I think we should fix this code then. The tool should be able to enable a local port. I don't have time right now to figure out why this changed but at some point it did work and I don't see any reason it could not be made to work again.
> I think we should fix this code then.
Got it. I'm closing this PR and will open a new PR to fix the code.
Thanks.
> Since when is enablement via GUID not working?
I tested the oldest OFED packages, built from https://downloads.openfabrics.org/management/, on a RHEL-7.0 x86-64 machine. They do not work.
~]$ ls *gz
infiniband-diags-1.3.2.tar.gz libibcommon-1.0.5.tar.gz libibumad-1.1.3.tar.gz
libibmad-1.1.2.tar.gz opensm-3.1.5.tar.gz
~]$ rpm -q libibcommon libibumad opensm-libs libibmad infiniband-diags
libibcommon-1.0.5-1.el7.x86_64
libibumad-1.1.3-1.el7.x86_64
opensm-libs-3.1.5-1.el7.x86_64
libibmad-1.1.2-1.el7.x86_64
infiniband-diags-1.3.2-1.el7.x86_64
~]$ ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 0
Node GUID: 0xf4521403007be160
System image GUID: 0xf4521403007be163
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 37
LMC: 0
SM lid: 13
Capability mask: 0x02514868
Port GUID: 0xf4521403007be161
Port 2:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 2
LMC: 0
SM lid: 1
Capability mask: 0x02514868
Port GUID: 0xf4521403007be162
~]$ ibportstate 37 1 disable
ibportstate: iberror: failed: smp query nodeinfo: Node type not switch
I think the ibportstate enable function needs the HCA name when multiple HCAs are available. It also needs the port number when the target HCA has more than one port. Otherwise, libibumad uses the first active port of the first HCA it finds, and that port may not be in the same fabric as the target port that is to be enabled.
When enabling via GUID on a system with multiple HCAs or IB ports without specifying the CA name and port number, the first active port of the first HCA found by libibumad is used. The port selected by libibumad and the port specified by GUID may not be in the same fabric, and when they are not, enablement via GUID can never work.
I think we need to update the help message to suggest that users specify the CA name and port number when multiple HCAs or ports are available. In the meantime, we also need to fix enablement via GUID when the CA name and port number are specified.
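One way to act on this is for the host that owns the target port to look up the local CA name and port number itself rather than relying on libibumad's default choice. A minimal, hypothetical sketch: `struct local_port` and `find_local_port()` are illustration names, and the table is assumed to be filled from `ibstat` output (libibumad's `umad_port_t` exposes similar fields, with `port_guid` in network byte order, so it would need converting). The sample GUIDs and LIDs are taken from the ibstat output earlier in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one local HCA port, e.g. as read from ibstat. */
struct local_port {
	const char *ca_name; /* "CA 'mlx4_0'" */
	unsigned portnum;    /* "Port 1:" */
	unsigned base_lid;   /* "Base lid: 38" */
	uint64_t port_guid;  /* "Port GUID: 0x...", host byte order */
};

/*
 * Returns the local port that owns port_guid, or NULL if the GUID does
 * not belong to this host. A match supplies the values for -C and -P.
 */
const struct local_port *find_local_port(const struct local_port *ports,
					 size_t n, uint64_t port_guid)
{
	for (size_t i = 0; i < n; i++) {
		if (ports[i].port_guid == port_guid)
			return &ports[i];
	}
	return NULL;
}
```

The base LID is carried along because, in the test log that follows, the GUID form still failed even with `-C mlx4_0 -P 1` (the SA path query could not resolve the disabled port), while the LID form with the same `-C`/`-P` brought the port back up.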
I'm reopening this PR for discussion.
Test log with the latest upstream rdma-core/infiniband-diags:
~]$ cat a.sh
#!/bin/bash
set -x
rpm -q rdma-core infiniband-diags
ibstat
ibportstate 38 1 disable
sleep 10
ibstat
ibportstate -G 0xf4521403007be0e1 1 enable
sleep 10
ibstat
ibportstate -G 0xf4521403007be0e1 1 -C mlx4_0 -P 1 enable
sleep 10
ibstat
ibportstate 38 1 -C mlx4_0 -P 1 enable
sleep 10
ibstat
~]$ sh a.sh
+ rpm -q rdma-core infiniband-diags
rdma-core-32.0-1.el8.x86_64
infiniband-diags-32.0-1.el8.x86_64
+ ibstat
ibwarn: [168949] umad_init: umad_init
ibwarn: [168949] umad_get_ca_device_list: return 1 cas
ibwarn: [168949] umad_get_ca: ca_name mlx4_0
ibwarn: [168949] umad_get_ca: opened mlx4_0
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Node GUID: 0xf4521403007be0e0
System image GUID: 0xf4521403007be0e3
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 38
LMC: 0
SM lid: 13
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e1
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 1
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e2
Link layer: InfiniBand
+ ibportstate 38 1 disable
ibwarn: [168950] umad_init: umad_init
ibwarn: [168950] umad_open_port: ca (null) port 0
ibwarn: [168950] umad_get_ca_device_list: return 1 cas
ibwarn: [168950] resolve_ca_name: checking ca 'mlx4_0'
ibwarn: [168950] resolve_ca_port: checking ca 'mlx4_0'
ibwarn: [168950] umad_get_ca: ca_name mlx4_0
ibwarn: [168950] umad_get_ca: opened mlx4_0
ibwarn: [168950] resolve_ca_port: checking port 0
ibwarn: [168950] resolve_ca_port: checking port 1
ibwarn: [168950] resolve_ca_port: found active port 1
ibwarn: [168950] resolve_ca_name: found ca mlx4_0 with port 1 type 1
ibwarn: [168950] resolve_ca_name: found ca mlx4_0 with active port 1
ibwarn: [168950] umad_open_port: opening mlx4_0 port 1
ibwarn: [168950] dev_to_umad_id: mapped mlx4_0 1 to 0
ibwarn: [168950] umad_open_port: opened /dev/infiniband/umad0 fd 3 portid 0
ibwarn: [168950] umad_register: fd 3 mgmt_class 1 mgmt_version 1 rmpp_version 0 method_mask (nil)
ibwarn: [168950] umad_register: fd 3 registered to use agent 0 qp 0
ibwarn: [168950] umad_register: fd 3 mgmt_class 129 mgmt_version 1 rmpp_version 0 method_mask (nil)
ibwarn: [168950] umad_register: fd 3 registered to use agent 1 qp 0
ibwarn: [168950] umad_register: fd 3 mgmt_class 3 mgmt_version 2 rmpp_version 1 method_mask (nil)
ibwarn: [168950] umad_register: fd 3 registered to use agent 2 qp 1
ibwarn: [168950] umad_set_addr: umad 0x7fff675323a0 dlid 38 dqp 0 sl 0, qkey 0
ibwarn: [168950] umad_send: fd 3 agentid 0 umad 0x7fff675323a0 timeout 1000
ibwarn: [168950] umad_dump: agent id 0 status 0 timeout 1000
ibwarn: [168950] umad_addr_dump: qpn 0 qkey 0x0 lid 38 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [168950] umad_recv: fd 3 umad 0x7fff675327a0 timeout 1000
ibwarn: [168950] umad_recv: mad received by agent 0 length 320
Initial CA/RT PortInfo:
ibwarn: [168950] umad_set_addr: umad 0x7fff67532330 dlid 38 dqp 0 sl 0, qkey 0
ibwarn: [168950] umad_send: fd 3 agentid 0 umad 0x7fff67532330 timeout 1000
ibwarn: [168950] umad_dump: agent id 0 status 0 timeout 1000
ibwarn: [168950] umad_addr_dump: qpn 0 qkey 0x0 lid 38 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [168950] umad_recv: fd 3 umad 0x7fff67532730 timeout 1000
ibwarn: [168950] umad_recv: mad received by agent 0 length 320
# Port info: Lid 38 port 1
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................38
SMLid:...........................13
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
LinkSpeedExtSupported:...........14.0625 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps
LinkSpeedExtActive:..............14.0625 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
ibwarn: [168950] umad_set_addr: umad 0x7fff675323a0 dlid 38 dqp 0 sl 0, qkey 0
ibwarn: [168950] umad_send: fd 3 agentid 0 umad 0x7fff675323a0 timeout 1000
ibwarn: [168950] umad_dump: agent id 0 status 0 timeout 1000
ibwarn: [168950] umad_addr_dump: qpn 0 qkey 0x0 lid 38 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [168950] umad_recv: fd 3 umad 0x7fff675327a0 timeout 1000
ibwarn: [168950] umad_recv: mad received by agent 0 length 320
# MLNX ext Port info: Lid 38 port 1
StateChangeEnable:...............0x00
LinkSpeedSupported:..............0x01
LinkSpeedEnabled:................0x01
LinkSpeedActive:.................0x00
Disable may be irreversible
ibwarn: [168950] umad_set_addr: umad 0x7fff675323b0 dlid 38 dqp 0 sl 0, qkey 0
ibwarn: [168950] umad_send: fd 3 agentid 0 umad 0x7fff675323b0 timeout 1000
ibwarn: [168950] umad_dump: agent id 0 status 0 timeout 1000
ibwarn: [168950] umad_addr_dump: qpn 0 qkey 0x0 lid 38 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [168950] umad_recv: fd 3 umad 0x7fff675327b0 timeout 1000
ibwarn: [168950] umad_recv: mad received by agent 0 length 320
After PortInfo set:
# Port info: Lid 38 port 1
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................38
SMLid:...........................13
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................Extended speed
LinkSpeedExtSupported:...........14.0625 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps
LinkSpeedExtActive:..............14.0625 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
ibwarn: [168950] umad_close_port: closed fd 3
+ sleep 10
+ ibstat
ibwarn: [168988] umad_init: umad_init
ibwarn: [168988] umad_get_ca_device_list: return 1 cas
ibwarn: [168988] umad_get_ca: ca_name mlx4_0
ibwarn: [168988] umad_get_ca: opened mlx4_0
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Node GUID: 0xf4521403007be0e0
System image GUID: 0xf4521403007be0e3
Port 1:
State: Down
Physical state: Disabled
Rate: 10
Base lid: 38
LMC: 0
SM lid: 13
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e1
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 1
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e2
Link layer: InfiniBand
+ ibportstate -G 0xf4521403007be0e1 1 enable
ibwarn: [168989] umad_init: umad_init
ibwarn: [168989] umad_open_port: ca (null) port 0
ibwarn: [168989] umad_get_ca_device_list: return 1 cas
ibwarn: [168989] resolve_ca_name: checking ca 'mlx4_0'
ibwarn: [168989] resolve_ca_port: checking ca 'mlx4_0'
ibwarn: [168989] umad_get_ca: ca_name mlx4_0
ibwarn: [168989] umad_get_ca: opened mlx4_0
ibwarn: [168989] resolve_ca_port: checking port 0
ibwarn: [168989] resolve_ca_port: checking port 1
ibwarn: [168989] resolve_ca_port: checking port 2
ibwarn: [168989] resolve_ca_port: found active port 2
ibwarn: [168989] resolve_ca_name: found ca mlx4_0 with port 2 type 1
ibwarn: [168989] resolve_ca_name: found ca mlx4_0 with active port 2
ibwarn: [168989] umad_open_port: opening mlx4_0 port 2
ibwarn: [168989] dev_to_umad_id: mapped mlx4_0 2 to 1
ibwarn: [168989] umad_open_port: opened /dev/infiniband/umad1 fd 3 portid 1
ibwarn: [168989] umad_register: fd 3 mgmt_class 1 mgmt_version 1 rmpp_version 0 method_mask (nil)
ibwarn: [168989] umad_register: fd 3 registered to use agent 0 qp 0
ibwarn: [168989] umad_register: fd 3 mgmt_class 129 mgmt_version 1 rmpp_version 0 method_mask (nil)
ibwarn: [168989] umad_register: fd 3 registered to use agent 1 qp 0
ibwarn: [168989] umad_register: fd 3 mgmt_class 3 mgmt_version 2 rmpp_version 1 method_mask (nil)
ibwarn: [168989] umad_register: fd 3 registered to use agent 2 qp 1
ibwarn: [168989] umad_get_port: ca_name (null) portnum 0
ibwarn: [168989] umad_get_ca_device_list: return 1 cas
ibwarn: [168989] resolve_ca_name: checking ca 'mlx4_0'
ibwarn: [168989] resolve_ca_port: checking ca 'mlx4_0'
ibwarn: [168989] umad_get_ca: ca_name mlx4_0
ibwarn: [168989] umad_get_ca: opened mlx4_0
ibwarn: [168989] resolve_ca_port: checking port 0
ibwarn: [168989] resolve_ca_port: checking port 1
ibwarn: [168989] resolve_ca_port: checking port 2
ibwarn: [168989] resolve_ca_port: found active port 2
ibwarn: [168989] resolve_ca_name: found ca mlx4_0 with port 2 type 1
ibwarn: [168989] resolve_ca_name: found ca mlx4_0 with active port 2
ibwarn: [168989] umad_release_port: port mlx4_0:2
ibwarn: [168989] umad_release_port: releasing mlx4_0:2
ibwarn: [168989] umad_get_port: ca_name (null) portnum 0
ibwarn: [168989] umad_get_ca_device_list: return 1 cas
ibwarn: [168989] resolve_ca_name: checking ca 'mlx4_0'
ibwarn: [168989] resolve_ca_port: checking ca 'mlx4_0'
ibwarn: [168989] umad_get_ca: ca_name mlx4_0
ibwarn: [168989] umad_get_ca: opened mlx4_0
ibwarn: [168989] resolve_ca_port: checking port 0
ibwarn: [168989] resolve_ca_port: checking port 1
ibwarn: [168989] resolve_ca_port: checking port 2
ibwarn: [168989] resolve_ca_port: found active port 2
ibwarn: [168989] resolve_ca_name: found ca mlx4_0 with port 2 type 1
ibwarn: [168989] resolve_ca_name: found ca mlx4_0 with active port 2
ibwarn: [168989] umad_release_port: port mlx4_0:2
ibwarn: [168989] umad_release_port: releasing mlx4_0:2
ibwarn: [168989] umad_set_addr: umad 0x7ffed38e3b80 dlid 1 dqp 1 sl 0, qkey 80010000
ibwarn: [168989] umad_send: fd 3 agentid 2 umad 0x7ffed38e3b80 timeout 1000
ibwarn: [168989] umad_dump: agent id 2 status 0 timeout 1000
ibwarn: [168989] umad_addr_dump: qpn 1 qkey 0x80010000 lid 1 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [168989] umad_recv: fd 3 umad 0x7ffed38e3f80 timeout 1000
ibwarn: [168989] umad_recv: mad received by agent 2 length 320
ibwarn: [168989] ib_path_query_via: sa call path_query failed
ibportstate: iberror: failed: can't resolve destination port 0xf4521403007be0e1
+ sleep 10
+ ibstat
ibwarn: [168996] umad_init: umad_init
ibwarn: [168996] umad_get_ca_device_list: return 1 cas
ibwarn: [168996] umad_get_ca: ca_name mlx4_0
ibwarn: [168996] umad_get_ca: opened mlx4_0
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Node GUID: 0xf4521403007be0e0
System image GUID: 0xf4521403007be0e3
Port 1:
State: Down
Physical state: Disabled
Rate: 10
Base lid: 38
LMC: 0
SM lid: 13
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e1
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 1
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e2
Link layer: InfiniBand
+ ibportstate -G 0xf4521403007be0e1 1 -C mlx4_0 -P 1 enable
ibwarn: [168997] umad_init: umad_init
ibwarn: [168997] umad_open_port: ca mlx4_0 port 1
ibwarn: [168997] umad_open_port: opening mlx4_0 port 1
ibwarn: [168997] dev_to_umad_id: mapped mlx4_0 1 to 0
ibwarn: [168997] umad_open_port: opened /dev/infiniband/umad0 fd 3 portid 0
ibwarn: [168997] umad_register: fd 3 mgmt_class 1 mgmt_version 1 rmpp_version 0 method_mask (nil)
ibwarn: [168997] umad_register: fd 3 registered to use agent 0 qp 0
ibwarn: [168997] umad_register: fd 3 mgmt_class 129 mgmt_version 1 rmpp_version 0 method_mask (nil)
ibwarn: [168997] umad_register: fd 3 registered to use agent 1 qp 0
ibwarn: [168997] umad_register: fd 3 mgmt_class 3 mgmt_version 2 rmpp_version 1 method_mask (nil)
ibwarn: [168997] umad_register: fd 3 registered to use agent 2 qp 1
ibwarn: [168997] umad_get_port: ca_name mlx4_0 portnum 1
ibwarn: [168997] umad_release_port: port mlx4_0:1
ibwarn: [168997] umad_release_port: releasing mlx4_0:1
ibwarn: [168997] umad_get_port: ca_name mlx4_0 portnum 1
ibwarn: [168997] umad_release_port: port mlx4_0:1
ibwarn: [168997] umad_release_port: releasing mlx4_0:1
ibwarn: [168997] umad_set_addr: umad 0x7ffcf771a310 dlid 13 dqp 1 sl 0, qkey 80010000
ibwarn: [168997] umad_send: fd 3 agentid 2 umad 0x7ffcf771a310 timeout 1000
ibwarn: [168997] umad_dump: agent id 2 status 0 timeout 1000
ibwarn: [168997] umad_addr_dump: qpn 1 qkey 0x80010000 lid 13 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [168997] umad_recv: fd 3 umad 0x7ffcf771a710 timeout 1000
ibwarn: [168997] _do_madrpc: recv failed: Connection timed out
ibwarn: [168997] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 13)
ibwarn: [168997] ib_path_query_via: sa call path_query failed
ibportstate: iberror: failed: can't resolve destination port 0xf4521403007be0e1
+ sleep 10
+ ibstat
ibwarn: [169003] umad_init: umad_init
ibwarn: [169003] umad_get_ca_device_list: return 1 cas
ibwarn: [169003] umad_get_ca: ca_name mlx4_0
ibwarn: [169003] umad_get_ca: opened mlx4_0
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Node GUID: 0xf4521403007be0e0
System image GUID: 0xf4521403007be0e3
Port 1:
State: Down
Physical state: Disabled
Rate: 10
Base lid: 38
LMC: 0
SM lid: 13
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e1
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 1
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e2
Link layer: InfiniBand
+ ibportstate 38 1 -C mlx4_0 -P 1 enable
ibwarn: [169004] umad_init: umad_init
ibwarn: [169004] umad_open_port: ca mlx4_0 port 1
ibwarn: [169004] umad_open_port: opening mlx4_0 port 1
ibwarn: [169004] dev_to_umad_id: mapped mlx4_0 1 to 0
ibwarn: [169004] umad_open_port: opened /dev/infiniband/umad0 fd 3 portid 0
ibwarn: [169004] umad_register: fd 3 mgmt_class 1 mgmt_version 1 rmpp_version 0 method_mask (nil)
ibwarn: [169004] umad_register: fd 3 registered to use agent 0 qp 0
ibwarn: [169004] umad_register: fd 3 mgmt_class 129 mgmt_version 1 rmpp_version 0 method_mask (nil)
ibwarn: [169004] umad_register: fd 3 registered to use agent 1 qp 0
ibwarn: [169004] umad_register: fd 3 mgmt_class 3 mgmt_version 2 rmpp_version 1 method_mask (nil)
ibwarn: [169004] umad_register: fd 3 registered to use agent 2 qp 1
ibwarn: [169004] umad_set_addr: umad 0x7ffe9cef0950 dlid 38 dqp 0 sl 0, qkey 0
ibwarn: [169004] umad_send: fd 3 agentid 0 umad 0x7ffe9cef0950 timeout 1000
ibwarn: [169004] umad_dump: agent id 0 status 0 timeout 1000
ibwarn: [169004] umad_addr_dump: qpn 0 qkey 0x0 lid 38 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [169004] umad_recv: fd 3 umad 0x7ffe9cef0d50 timeout 1000
ibwarn: [169004] umad_recv: mad received by agent 0 length 320
Initial CA/RT PortInfo:
ibwarn: [169004] umad_set_addr: umad 0x7ffe9cef08e0 dlid 38 dqp 0 sl 0, qkey 0
ibwarn: [169004] umad_send: fd 3 agentid 0 umad 0x7ffe9cef08e0 timeout 1000
ibwarn: [169004] umad_dump: agent id 0 status 0 timeout 1000
ibwarn: [169004] umad_addr_dump: qpn 0 qkey 0x0 lid 38 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [169004] umad_recv: fd 3 umad 0x7ffe9cef0ce0 timeout 1000
ibwarn: [169004] umad_recv: mad received by agent 0 length 320
# Port info: Lid 38 port 1
LinkState:.......................Down
PhysLinkState:...................Disabled
Lid:.............................38
SMLid:...........................13
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
LinkSpeedExtSupported:...........14.0625 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps
LinkSpeedExtActive:..............No Extended Speed
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
ibwarn: [169004] umad_set_addr: umad 0x7ffe9cef0950 dlid 38 dqp 0 sl 0, qkey 0
ibwarn: [169004] umad_send: fd 3 agentid 0 umad 0x7ffe9cef0950 timeout 1000
ibwarn: [169004] umad_dump: agent id 0 status 0 timeout 1000
ibwarn: [169004] umad_addr_dump: qpn 0 qkey 0x0 lid 38 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [169004] umad_recv: fd 3 umad 0x7ffe9cef0d50 timeout 1000
ibwarn: [169004] umad_recv: mad received by agent 0 length 320
# MLNX ext Port info: Lid 38 port 1
StateChangeEnable:...............0x00
LinkSpeedSupported:..............0x01
LinkSpeedEnabled:................0x01
LinkSpeedActive:.................0x00
ibwarn: [169004] umad_set_addr: umad 0x7ffe9cef0960 dlid 38 dqp 0 sl 0, qkey 0
ibwarn: [169004] umad_send: fd 3 agentid 0 umad 0x7ffe9cef0960 timeout 1000
ibwarn: [169004] umad_dump: agent id 0 status 0 timeout 1000
ibwarn: [169004] umad_addr_dump: qpn 0 qkey 0x0 lid 38 sl 0
grh_present 0 gid_index 0 hop_limit 0 traffic_class 0 flow_label 0x0 pkey_index 0x0
Gid 0x00000000000000000000000000000000
ibwarn: [169004] umad_recv: fd 3 umad 0x7ffe9cef0d60 timeout 1000
ibwarn: [169004] umad_recv: mad received by agent 0 length 320
After PortInfo set:
# Port info: Lid 38 port 1
LinkState:.......................Down
PhysLinkState:...................Polling
Lid:.............................38
SMLid:...........................13
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
LinkSpeedExtSupported:...........14.0625 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps
LinkSpeedExtActive:..............No Extended Speed
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
ibwarn: [169004] umad_close_port: closed fd 3
+ sleep 10
+ ibstat
ibwarn: [169298] umad_init: umad_init
ibwarn: [169298] umad_get_ca_device_list: return 1 cas
ibwarn: [169298] umad_get_ca: ca_name mlx4_0
ibwarn: [169298] umad_get_ca: opened mlx4_0
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Node GUID: 0xf4521403007be0e0
System image GUID: 0xf4521403007be0e3
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 38
LMC: 0
SM lid: 13
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e1
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 1
Capability mask: 0x02594868
Port GUID: 0xf4521403007be0e2
Link layer: InfiniBand
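To read the PortInfo dumps in this log: PhysLinkState is the IBA PortInfo PortPhysicalState field, a 4-bit code (0 means "no state change" on a Set, 8 to 15 are reserved). The sketch below decodes it. The `port_op` enum and `requested_phys_state()` are hypothetical names; the values they return (3, Disabled, for disable and 2, Polling, for enable) are inferred from this log, where ibstat reports port 1 as Disabled after the disable request and the PortInfo read back after the enable request shows Polling before the port later comes up as LinkUp.

```c
/* PortPhysicalState encoding from the IBA PortInfo attribute. */
static const char *const phys_state_names[] = {
	[0] = "NoStateChange",
	[1] = "Sleep",
	[2] = "Polling",
	[3] = "Disabled",
	[4] = "PortConfigurationTraining",
	[5] = "LinkUp",
	[6] = "LinkErrorRecovery",
	[7] = "PhyTest",
};

/* Decode a PortPhysicalState value; 8..15 are reserved. */
const char *phys_state_name(unsigned v)
{
	if (v >= sizeof(phys_state_names) / sizeof(phys_state_names[0]))
		return "Reserved";
	return phys_state_names[v];
}

/* Hypothetical model of the requests issued in the log above. */
enum port_op { PORT_ENABLE, PORT_DISABLE };

/* Value inferred from the log to be written for each request. */
unsigned requested_phys_state(enum port_op op)
{
	return op == PORT_DISABLE ? 3u /* Disabled */ : 2u /* Polling */;
}
```

Writing Polling rather than LinkUp matches the observed sequence: the port only reaches LinkUp after link training completes, which is why the script waits before running ibstat again.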
@Honggang-LI A quick reading reveals:
> ibportstate 38 1 disable

To disable a port, you must do so on the switch port it is connected to. Or is this deliberate negative testing? Given Lid1 and Lid2 being the local LIDs of the ports on an HCA, you can do: ibtracert lid1 lid2
Why?
Yes. For example, while we run NVMe over IB, we flip the port state to simulate a path failure.
Because, at least for IB HCAs, it doesn't work.
I meant negative testing of ibportstate. A snippet from the ibportstate man page:
> ibportstate allows the port state and port physical state of an IB port to be queried (...), or a switch port to be disabled, enabled, or reset. It also allows the link speed/width enabled on any IB port to be adjusted.
In fact, it works for me on IB HCAs. Which type of HCA did you test? I tested mlx4, mlx5, qib and Mellanox Connect-IB, and they all work for me. I locally changed
Please see chapter 14 of InfiniBand Architecture Release 1.3 for details of PortInfo and local changes.
OK, I got it. But locally disabling an HCA port and disabling the switch port that the HCA port is connected to are different things. For example, at least the HCA port's physical state is different. When an HCA port is disabled locally, its physical port state
It seems we also need to update the man page for ibportstate. Here is an example of locally changing the HCA port state.
Branch updated from 1873812 to ddb638c.
Yep, the arguments you presented to ibportstate actually work. Just verified on an mlx4 system with IB link layer.
Is there some conclusion here?
I thought the issue was just user error. But perhaps I misread something?
The ibportstate documentation is misleading. We need to update the man page and the example in the usage message.
I can see that being a problem. Will you close this and open another issue?
…tstate

A host from which the enable/disable/reset command is executed may be connected to multiple InfiniBand fabrics. When the HCA name and port number are not specified, the libibumad library picks the first active port it finds, which may not be the one wanted. It is recommended to specify the HCA name and port number when running ibportstate.

On the other hand, an HCA port may be changed locally without the knowledge of the Subnet Manager. When locally enabling a disabled HCA port, the HCA name and port number must be specified.

Signed-off-by: Honggang Li <honli@redhat.com>
New PR:
To enable a disabled port, the CA name and port number are needed.
The old example of port enabling does not work, so replace it.
Signed-off-by: Honggang Li <honli@redhat.com>
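The command form that brought port 1 back in the test log (`ibportstate 38 1 -C mlx4_0 -P 1 enable`) can be assembled mechanically once the target LID and the local CA name and port are known. A minimal sketch; `format_local_enable()` is a hypothetical helper, not part of infiniband-diags, and it only builds the command string without running it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the LID-based enable command with an
 * explicit local CA name (-C) and port (-P), in the argument order used
 * by the test script. Returns 0 on success, or -1 if ca_name is empty
 * or buf is too small to hold the whole command.
 */
int format_local_enable(char *buf, size_t len, unsigned lid,
			unsigned target_port, const char *ca_name,
			unsigned ca_port)
{
	if (ca_name == NULL || strlen(ca_name) == 0)
		return -1;
	int n = snprintf(buf, len, "ibportstate %u %u -C %s -P %u enable",
			 lid, target_port, ca_name, ca_port);
	if (n < 0 || (size_t)n >= len)
		return -1; /* encoding error or truncated output */
	return 0;
}
```

For the host in the test log, `format_local_enable(buf, sizeof(buf), 38, 1, "mlx4_0", 1)` produces `ibportstate 38 1 -C mlx4_0 -P 1 enable`.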