Fix Xcvrd crash due to invalid key access in type_of_media_interface, host_electrical_interface, connector_dict #206

Merged
lguohan merged 3 commits into sonic-net:master on Jul 13, 2021

prgeor
Collaborator

@prgeor prgeor commented Jul 9, 2021

Signed-off-by: Prince George <prgeor@microsoft.com>

Description

Fixes an xcvrd crash that produces the following traceback:

Jul 7 19:18:00.910456 str-DellEmc-z9332f-032 INFO pmon#/supervisord: message repeated 3 times: [ pcied Platform PCIe Configuration file doesn't exist!]
Jul 7 19:18:00.910586 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd Traceback (most recent call last):
Jul 7 19:18:00.910656 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/bin/xcvrd", line 8, in <module>
Jul 7 19:18:00.910722 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd sys.exit(main())
Jul 7 19:18:00.910789 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/xcvrd/xcvrd.py", line 1416, in main
Jul 7 19:18:00.910858 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd xcvrd.run()
Jul 7 19:18:00.910920 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/xcvrd/xcvrd.py", line 1364, in run
Jul 7 19:18:00.910989 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd self.init()
Jul 7 19:18:00.911059 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/xcvrd/xcvrd.py", line 1327, in init
Jul 7 19:18:00.911125 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd post_port_sfp_dom_info_to_db(is_warm_start, self.stop_event)
Jul 7 19:18:00.911193 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/xcvrd/xcvrd.py", line 500, in post_port_sfp_dom_info_to_db
Jul 7 19:18:00.911258 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd post_port_sfp_info_to_db(logical_port_name, int_tbl[asic_index], transceiver_dict, stop_event)
Jul 7 19:18:00.911326 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/xcvrd/xcvrd.py", line 292, in post_port_sfp_info_to_db
Jul 7 19:18:00.911394 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd port_info_dict = _wrapper_get_transceiver_info(physical_port)
Jul 7 19:18:00.911461 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/xcvrd/xcvrd.py", line 163, in _wrapper_get_transceiver_info
Jul 7 19:18:00.911531 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd return platform_chassis.get_sfp(physical_port).get_transceiver_info()
Jul 7 19:18:00.911598 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/sonic_platform/sfp.py", line 542, in get_transceiver_info
Jul 7 19:18:00.911665 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd connector = self._get_eeprom_data('connector')
Jul 7 19:18:00.911733 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/sonic_platform/sfp.py", line 392, in _get_eeprom_data
Jul 7 19:18:00.911801 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd eeprom_data_raw, 0)
Jul 7 19:18:00.911867 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_sfp/qsfp_dd.py", line 221, in parse_connector
Jul 7 19:18:00.911933 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd return sffbase.parse(self, self.connector, connector_data, start_pos)
Jul 7 19:18:00.911998 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_sfp/sffbase.py", line 186, in parse
Jul 7 19:18:00.912065 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd outdict = self.parse_sff(eeprom_map, eeprom_data, start_pos)
Jul 7 19:18:00.912143 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_sfp/sffbase.py", line 158, in parse_sff
Jul 7 19:18:00.912233 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd meta_data, start_pos)
Jul 7 19:18:00.912324 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_sfp/sffbase.py", line 127, in parse_sff_element
Jul 7 19:18:00.912395 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd offset, size)
Jul 7 19:18:00.912465 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_sfp/qsfp_dd.py", line 49, in decode_connector
Jul 7 19:18:00.912533 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd return connector_dict[connector_id]
Jul 7 19:18:00.912603 str-DellEmc-z9332f-032 INFO pmon#/supervisord: xcvrd KeyError: 'ff'
Jul 7 19:18:00.939018 str-DellEmc-z9332f-032 INFO pmon#supervisord 2021-07-07 19:18:00,938 INFO exited: xcvrd (exit status 1; not expected)
Jul 7 19:18:01.651780 str-DellEmc-z9332f-032 NOTICE pmon#psud[44]: PSU absence warning cleared: PSU 1 is inserted back.

Motivation and Context

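Because of an incorrect page selection, the EEPROM read returned 0xff, which is not a valid key in connector_dict. The unguarded lookup raised KeyError: 'ff' and xcvrd exited. This change verifies that the key is valid before accessing the dict element, for the type_of_media_interface, host_electrical_interface, and connector_dict lookups. Why the EEPROM read returned 0xff in the first place is being investigated separately.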
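
For readers who want to see the shape of the guard, here is a minimal sketch in Python. It is not the merged diff: the function name decode_connector_safe, the 'N/A' fallback value, and the abbreviated table are assumptions made for illustration, and the real connector_dict in qsfp_dd.py has more entries.

```python
# Illustrative subset only; the real connector_dict in qsfp_dd.py is larger.
connector_dict = {
    '00': 'Unknown or unspecified',
    '01': 'SC',
    '07': 'LC',
    '23': 'No separable connector',
}

# Assumed placeholder; the merged change may return a different value.
UNKNOWN_CONNECTOR = 'N/A'


def decode_connector_safe(eeprom_data, offset):
    """Return the connector name for the byte at `offset`, or a fallback.

    `eeprom_data` is assumed to be a list of two-character lowercase hex
    strings (for example ['07'] or ['ff']), matching the 'ff' key seen in
    the traceback. This helper is hypothetical, not part of the library.
    """
    connector_id = eeprom_data[offset]
    # Check membership first so an unexpected byte such as 'ff'
    # yields a fallback string instead of raising KeyError.
    if connector_id in connector_dict:
        return connector_dict[connector_id]
    return UNKNOWN_CONNECTOR


if __name__ == '__main__':
    print(decode_connector_safe(['07'], 0))  # known key
    print(decode_connector_safe(['ff'], 0))  # invalid key, no exception
```

An equivalent one-liner would be connector_dict.get(connector_id, UNKNOWN_CONNECTOR); the explicit membership test is shown because it mirrors the "verify the key validity before accessing the dict element" wording above.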

How Has This Been Tested?

Verified no xcvrd crash across multiple reboots on str-DellEmc-z9332f-032.

Additional Information (Optional)

Signed-off-by: Prince George <prgeor@microsoft.com>
aravindmani-1
aravindmani-1 previously approved these changes Jul 9, 2021
Signed-off-by: Prince George <prgeor@microsoft.com>
aravindmani-1
aravindmani-1 previously approved these changes Jul 9, 2021
Signed-off-by: Prince George <prgeor@microsoft.com>
@prgeor prgeor changed the title from "Fix for Xcvrd crash" to "Fix Xcvrd crash due to invalid key access in type_of_media_interface, host_electrical_interface, connector_dict" Jul 13, 2021
@lguohan lguohan merged commit 87c81de into sonic-net:master Jul 13, 2021
lguohan pushed a commit that referenced this pull request Jul 14, 2021
… host_electrical_interface, connector_dict (#206)

Due to wrong page selection, the eeprom read returned a value 0xff which was an invalid key into connector_dict[]. Now we verify the key validity before accessing the dict element. The fix for why the eeprom read returned 0xff is being investigated separately.

Signed-off-by: Prince George <prgeor@microsoft.com>
lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jul 22, 2021
To include:
> e168f1d 2021-07-19 pettershao-ragilenetworks: [python coverage] fix result color bar (sonic-net/sonic-platform-common#202)
> 87c81de 2021-07-13 Prince George: Fix Xcvrd crash due to invalid key access in type_of_media_interface, host_electrical_interface, connector_dict (sonic-net/sonic-platform-common#206)
> 4533f82 2021-06-21 ngoc-do: Add a template function that returns list of asics on module (sonic-net/sonic-platform-common#185)
> 1e860c5 2021-06-18 Aravind Mani: Fix decode error when parsing EEPROM fields (sonic-net/sonic-platform-common#199)
> 93641f3 2021-06-17 Sujin Kang: Unifying the platform api for get_pcie_aer_stats with PcieBase (sonic-net/sonic-platform-common#197)
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
To include:
> e168f1d 2021-07-19 pettershao-ragilenetworks: [python coverage] fix result color bar (sonic-net/sonic-platform-common#202)
> 87c81de 2021-07-13 Prince George: Fix Xcvrd crash due to invalid key access in type_of_media_interface, host_electrical_interface, connector_dict (sonic-net/sonic-platform-common#206)
> 4533f82 2021-06-21 ngoc-do: Add a template function that returns list of asics on module (sonic-net/sonic-platform-common#185)
> 1e860c5 2021-06-18 Aravind Mani: Fix decode error when parsing EEPROM fields (sonic-net/sonic-platform-common#199)
> 93641f3 2021-06-17 Sujin Kang: Unifying the platform api for get_pcie_aer_stats with PcieBase (sonic-net/sonic-platform-common#197)
judyjoseph pushed a commit that referenced this pull request Aug 20, 2021
… host_electrical_interface, connector_dict (#206)

Due to wrong page selection, the eeprom read returned a value 0xff which was an invalid key into connector_dict[]. Now we verify the key validity before accessing the dict element. The fix for why the eeprom read returned 0xff is being investigated separately.

Signed-off-by: Prince George <prgeor@microsoft.com>

6 participants