Mellanox/Nvidia SN2700 - show platform failure #2771

Open
omaticon opened this issue Apr 3, 2023 · 1 comment

omaticon commented Apr 3, 2023

Description: When pulling system details on a Mellanox/Nvidia SN2700 with the "show platform" commands, the results are either outright errors or incorrect information. This appears to be because the expected device files do not exist within the sonic-mgmt docker.

Branches: master and 202211, with builds from April 2, 2023.

Steps to reproduce the issue

  1. From the OS prompt, issue any of the "show platform" commands:
  2. show platform summary
  3. show platform psustatus
  4. show platform temperature
  5. show platform firmware
  6. show platform fan
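The steps above can be scripted so that every command is run in one pass and its exit code and output are captured together. This is a minimal sketch, not part of the original report: the helper names `run_command` and `collect`, and the 60-second timeout, are assumptions made for illustration.

```python
import subprocess

# Commands taken from the reproduction steps in this report.
SHOW_PLATFORM_COMMANDS = [
    ["show", "platform", "summary"],
    ["show", "platform", "psustatus"],
    ["show", "platform", "temperature"],
    ["show", "platform", "firmware"],
    ["show", "platform", "fan"],
]


def run_command(argv, timeout=60):
    """Run one command; return (exit_code, combined stdout+stderr).

    exit_code is None if the command could not be started or timed out.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # tracebacks go to stderr; merge them in
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        return None, "command not found: " + argv[0]
    except subprocess.TimeoutExpired:
        return None, "timed out after {}s".format(timeout)
    return proc.returncode, proc.stdout


def collect(commands=SHOW_PLATFORM_COMMANDS):
    """Run every command and return {command_line: (exit_code, output)}."""
    return {" ".join(argv): run_command(argv) for argv in commands}


if __name__ == "__main__":
    for name, (code, output) in collect().items():
        print("### {} (exit code: {})".format(name, code))
        print(output)
```

Merging stderr into stdout keeps a Python traceback next to whatever the command printed before failing, which makes the captured text easier to paste into a report.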

Describe the results you received

### admin@sonic:~$ show platform summary
Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700
ASIC: mellanox
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A

### admin@sonic:~$ show platform fan
Fan Not detected

### admin@sonic:~$ show platform firmware
Traceback (most recent call last):
  File "/usr/local/bin/fwutil", line 5, in <module>
    from fwutil.main import cli
  File "/usr/local/lib/python3.9/dist-packages/fwutil/__init__.py", line 3, in <module>
    from . import main
  File "/usr/local/lib/python3.9/dist-packages/fwutil/main.py", line 40, in <module>
    pdp = PlatformDataProvider()
  File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 162, in __init__
    self.chassis_component_map = self.__get_chassis_component_map()
  File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 168, in __get_chassis_component_map
    chassis_name = self.__chassis.get_name()
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 524, in get_name
    self.initialize_eeprom()
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 504, in initialize_eeprom
    self._eeprom = Eeprom()
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform/eeprom.py", line 61, in __init__
    raise RuntimeError("No syseeprom symlink found")
RuntimeError: No syseeprom symlink found

### admin@sonic:~$ show platform psustatus
Error: Failed to get the number of PSUs
Error: Failed to get PSU status
Error: failed to get PSU status from state DB

### admin@sonic:~$ show platform ssdhealth
Error response from daemon: Container 94fb7159e219359e9d24424559ac1b225c488d60adca5f71ad28085ca09d9348 is not running
Device Model : StorFly VSF302XC016G-MLX1
Health : N/A
Temperature : 100C

### admin@sonic:~$ show platform syseeprom
Failed to read system EEPROM info from DB

### admin@sonic:~$ show platform temperature
Thermal Not detected
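The failures above take several forms: a Python traceback, "Error:" lines, "Not detected" messages, and "N/A" fields. A rough way to flag them automatically is to scan each command's output for those strings. This is only a sketch. The marker list is drawn from the outputs quoted in this report and is not exhaustive, and `find_failure_markers` is a hypothetical helper name.

```python
# Substrings seen in the failing outputs quoted in this report.
# This list is an assumption for illustration, not a complete catalogue.
FAILURE_MARKERS = (
    "Traceback (most recent call last)",
    "Error",
    "Failed to",
    "Not detected",
    "N/A",
)


def find_failure_markers(output):
    """Return the markers (in list order) that occur in `output`.

    Matching is case-sensitive and substring-based, so "RuntimeError"
    also counts as a hit for "Error".
    """
    return [marker for marker in FAILURE_MARKERS if marker in output]


if __name__ == "__main__":
    samples = {
        "show platform fan": "Fan Not detected",
        "show platform syseeprom": "Failed to read system EEPROM info from DB",
        "show platform summary": "Serial Number: N/A\nModel Number: N/A",
    }
    for command, text in samples.items():
        print(command, "->", find_failure_markers(text))
```

An empty list means none of the known markers were found. It does not prove the output is correct. The "Temperature : 100C" line from ssdhealth, for example, would pass unnoticed.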

Describe the results you expected

The expected result is for the show platform commands to function properly, without crashes, error messages, or erroneous results.

Additional information you deem important (e.g. issue happens only occasionally)

This issue occurs every time. It appears to be caused by the target files of the following symlinks not existing within the sonic-mgmt docker:

admin@sonic:~$ ls -l /var/run/hw-management/eeprom
total 0
lrwxrwxrwx 1 root root 71 Apr 2 19:41 cpu_info -> /sys/devices/platform/mlxplat/i2c_mlxcpld.1/i2c-1/i2c-16/16-0051/eeprom
lrwxrwxrwx 1 root root 71 Apr 2 19:40 fan1_info -> /sys/devices/platform/mlxplat/i2c_mlxcpld.1/i2c-1/i2c-11/11-0050/eeprom
lrwxrwxrwx 1 root root 71 Apr 2 19:40 fan2_info -> /sys/devices/platform/mlxplat/i2c_mlxcpld.1/i2c-1/i2c-12/12-0050/eeprom
lrwxrwxrwx 1 root root 71 Apr 2 19:40 fan3_info -> /sys/devices/platform/mlxplat/i2c_mlxcpld.1/i2c-1/i2c-13/13-0050/eeprom
lrwxrwxrwx 1 root root 71 Apr 2 19:41 fan4_info -> /sys/devices/platform/mlxplat/i2c_mlxcpld.1/i2c-1/i2c-14/14-0050/eeprom
lrwxrwxrwx 1 root root 69 Apr 2 19:41 vpd_info -> /sys/devices/platform/mlxplat/i2c_mlxcpld.1/i2c-1/i2c-8/8-0051/eeprom
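To confirm which of these links are dangling, a short check can list every symlink in the directory whose target does not resolve. This is a minimal sketch using only the Python standard library. `find_dangling_symlinks` is a hypothetical helper, and the directory path is the one from the listing above. It should be run in the same environment (host or container) where the failure is observed, since the result depends on what is mounted there.

```python
import os


def find_dangling_symlinks(directory):
    """Return {link_name: target} for symlinks whose target is missing."""
    dangling = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # os.path.exists() follows symlinks, so it is False for a
            # link that points at a path that does not exist.
            if entry.is_symlink() and not os.path.exists(entry.path):
                dangling[entry.name] = os.readlink(entry.path)
    return dangling


if __name__ == "__main__":
    # Path taken from the `ls -l` output in this report.
    eeprom_dir = "/var/run/hw-management/eeprom"
    try:
        broken = find_dangling_symlinks(eeprom_dir)
    except FileNotFoundError:
        print("{} does not exist".format(eeprom_dir))
    else:
        for name, target in sorted(broken.items()):
            print("{} -> {} (target missing)".format(name, target))
        if not broken:
            print("all symlink targets exist")
```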

Output of show version

admin@sonic:~$ show version

SONiC Software Version: SONiC.master.245306-66d3586fd
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Build commit: 66d3586fd
Build date: Sun Apr  2 13:33:39 UTC 2023
Built by: AzDevOps@vmss-soni000SD0

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700
ASIC: mellanox
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 08:34:11 up 12:53,  1 user,  load average: 0.89, 0.94, 0.65
Date: Mon 03 Apr 2023 08:34:11

Docker images:
REPOSITORY                    TAG                       IMAGE ID       SIZE
docker-syncd-mlnx             latest                    717ee0636857   729MB
docker-syncd-mlnx             master.245306-66d3586fd   717ee0636857   729MB
docker-orchagent              latest                    fdcce57c5c1d   327MB
docker-orchagent              master.245306-66d3586fd   fdcce57c5c1d   327MB
docker-fpm-frr                latest                    dd8752e128c7   345MB
docker-fpm-frr                master.245306-66d3586fd   dd8752e128c7   345MB
docker-teamd                  latest                    a226b158e4b5   315MB
docker-teamd                  master.245306-66d3586fd   a226b158e4b5   315MB
docker-macsec                 latest                    0c0d12d901b7   317MB
docker-platform-monitor       latest                    d620efca2d60   730MB
docker-platform-monitor       master.245306-66d3586fd   d620efca2d60   730MB
docker-dhcp-relay             latest                    1ade59a826a9   308MB
docker-eventd                 latest                    30d2c793f8ce   298MB
docker-eventd                 master.245306-66d3586fd   30d2c793f8ce   298MB
docker-sonic-p4rt             latest                    a2121bea14dd   869MB
docker-sonic-p4rt             master.245306-66d3586fd   a2121bea14dd   869MB
docker-snmp                   latest                    546a69d924fe   337MB
docker-snmp                   master.245306-66d3586fd   546a69d924fe   337MB
docker-sonic-telemetry        latest                    37cc35b0ffae   597MB
docker-sonic-telemetry        master.245306-66d3586fd   37cc35b0ffae   597MB
docker-lldp                   latest                    139e7ca03560   341MB
docker-lldp                   master.245306-66d3586fd   139e7ca03560   341MB
docker-database               latest                    7d3c357ca9c2   298MB
docker-database               master.245306-66d3586fd   7d3c357ca9c2   298MB
docker-mux                    latest                    81dfbcced72b   347MB
docker-mux                    master.245306-66d3586fd   81dfbcced72b   347MB
docker-router-advertiser      latest                    8e4a41431131   298MB
docker-router-advertiser      master.245306-66d3586fd   8e4a41431131   298MB
docker-nat                    latest                    bef8f955e112   290MB
docker-nat                    master.245306-66d3586fd   bef8f955e112   290MB
docker-sflow                  latest                    fdc2cf03d81b   288MB
docker-sflow                  master.245306-66d3586fd   fdc2cf03d81b   288MB
docker-sonic-mgmt-framework   latest                    c19e23c66b55   416MB
docker-sonic-mgmt-framework   master.245306-66d3586fd   c19e23c66b55   416MB

omaticon commented Apr 3, 2023

admin@sonic:~$ sudo generate_dump
Lock succesfully accquired and installed signal handlers
can't get debug descriptor: Resource temporarily unavailable
can't get device qualifier: Resource temporarily unavailable
can't get debug descriptor: Resource temporarily unavailable
can't get debug descriptor: Resource temporarily unavailable
can't get device qualifier: Resource temporarily unavailable
can't get debug descriptor: Resource temporarily unavailable
/usr/local/bin/generate_dump: line 535: $2: unbound variable
ERR: RC:-1 observed on line 535
conntrack v1.4.6 (conntrack-tools): 0 flow entries have been shown.
conntrack v1.4.6 (conntrack-tools): 0 flow entries have been shown.
conntrack v1.4.6 (conntrack-tools): 1521 flow entries have been shown.
conntrack v1.4.6 (conntrack-tools): 1521 flow entries have been shown.
bfdd is not running
bfdd is not running
bfdd is not running
bfdd is not running
init_buffer_resource_limits[
num_ingress_pools:8
num_egress_pools:8
num_shared_headroom_pools:1
num_total_pools:42
num_port_queue_buff:16
num_port_pg_buff:8
num_headroom_port_buff:9
unit_size:96
max_buffers_per_port:94
init_buffer_resource_limits]
mlnx_init_buffer_pool_ids[
default_ingress_pool_id:0
management_ingress_pool_id:2
base_ingress_user_sx_pool_id:4
default_egress_pool_id:12
management_egress_pool_id:14
base_egress_user_sx_pool_id:16
default_multicast_pool_id:10
user_pool_step:2
default_descriptor_ingress_pool_id:21
default_descriptor_egress_pool_id:30
mlnx_init_buffer_pool_ids]
The SAI dump is generated to /tmp/saisdkdump/sai_sdk_dump_04_03_2023_08_36_AM
/tmp/saisdkdump
Remove secret from etc files.
sed: can't read /var/dump/sonic_dump_sonic_20230403_083524/etc/pam_radius_auth.d/*.conf: No such file or directory
sed: can't read /etc/cron.d/logrotate: No such file or directory
ERR: RC:-2 observed on line 1357
sed: can't read /etc/cron.d/logrotate: No such file or directory
ERR: RC:-2 observed on line 1379
tar: sonic_dump_sonic_20230403_083524/etc/runit/runsvdir/default/ssh/log/supervise: File removed before we read it
ERR: RC:-1 observed on line 279
sonic_dump_sonic_20230403_083524/log/sai_sdk_dump_04_02_2023_05_44_PM.tar.gz
/var/dump/sonic_dump_sonic_20230403_083524/sai_failure_dump /home/admin
/home/admin
sonic_dump_sonic_20230403_083524/log/sai-dfw-1680462217.tar.gz
/var/dump/sonic_dump_sonic_20230403_083524/sai_sdk_dump /home/admin
/home/admin
sonic_dump_sonic_20230403_083524/log/sai-dfw-1680458643.tar.gz
/var/dump/sonic_dump_sonic_20230403_083524/sai_sdk_dump /home/admin
/home/admin
sonic_dump_sonic_20230403_083524/log/sai-dfw-1680457441.tar.gz
/var/dump/sonic_dump_sonic_20230403_083524/sai_sdk_dump /home/admin
/home/admin
Cleaning up working directory /var/dump/sonic_dump_sonic_20230403_083524
Removing lock. Exit: 0
/var/dump/sonic_dump_sonic_20230403_083524.tar.gz

The dump file itself is 43 MB, which is too large to upload directly. What is the preferred file-hosting service to use?
