Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve eeprom access reliability #2756

Merged
merged 2 commits into from
Apr 17, 2019

Conversation

andriymoroz-mlnx
Copy link
Collaborator

@andriymoroz-mlnx andriymoroz-mlnx commented Apr 9, 2019

Signed-off-by: Andriy Moroz c_andriym@mellanox.com

- What I did
Updated eeprom plugin to retry when failed (5 retries)

- How I did it
Check if eeprom available. If not - wait 1 sec and retry

- How to verify it

- Description for the changelog

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>
@qiluo-msft
Copy link
Collaborator

qiluo-msft commented Apr 9, 2019

How do you test in DUT?
As we discussed in email, could you help permanently disable the EEPROM driver, and start the box from clean and load_minigraph?
Another test case: follow the above one, reboot, and enable the driver, test the system recover. #Closed

Copy link
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments

Co-Authored-By: andriymoroz-mlnx <c_andriym@mellanox.com>
@andriymoroz-mlnx
Copy link
Collaborator Author

How do you test in DUT?
As we discussed in email, could you help permanently disable the EEPROM driver, and start the box from clean and load_minigraph?
Another test case: follow the above one, reboot, and enable the driver, test the system recover.

Test#1: No mac - system keeps restarting services
Test#2 System recovered successfully after reboot + load_minigraph

@qiluo-msft
Copy link
Collaborator

@lguohan vsimage test failed because of there is no platform/vs/kvm-image.mk on 201803 branch. Can we remove it from the PR check list for this branch?

@lguohan lguohan merged commit 18e8151 into sonic-net:201803 Apr 17, 2019
StormLiangMS added a commit that referenced this pull request Mar 30, 2023
Why I did it
832ef9c4 - Fix bug in GCU vlanintf_validator ([Bcm SAI] ugprade Broadcom SAI to version 3.3.5.4m-1 #2765) (5 minutes ago) [jingwenxie]
53f611b7 - Revert "Convert IPv6 addresses to lowercase in apply-patch (Add Pegatron project to branch 201807 #2299)" (Add note for running out of disk space in /var/lib/docker to README.md #2758) (20 hours ago) [jingwenxie]
79a21cef - Revert frr route check ([mlnx] fix url inconsistency in fw.mk #2761) (8 minutes ago) [StormLiangMS]
824680ed - Resolved rc!=0 problem by replacing fgrep with awk. Added ipv4 filtering to get only v4 peers in case of show ip bgp neighbors (Improve eeprom access reliability #2756) (30 hours ago) [saurabh17g]
10f31ea6 - Revert "Replace pickle by json (Add autoneg to 7170-Q59S20 #2636)" ([hostcfgd] Default value of fallthrough for authentication set to be False.  #2746) (7 days ago) [Mai Bui]
05fa7513 - Fix the show interface counters throwing exception on device with no external interfaces ([docker-platform-monitor]: Add smartmontools 6.6-1 #2703) (11 days ago) [abdosi]
f27dea0c - [route_check] remove check-frr_patch mock ([minigraph]: Mark both ERSPAN and ERSPANv6 as mirror ACL tables #2732) (11 days ago) [Stepan Blyshchak]
2d95529d - Revert "Update load minigraph to load backend acl (mlnx msn2010: default config_db.json generation with sonic-cfggen is not working #2236)" (swss stretch update broke restore_neighbors.py for neigh service #2735) (12 days ago) [Neetha John]
c869c970 - (master) Update the ref guide to reflect the vlan brief output ([teamd] update teamd docker to stretch and fix teamd_init failure #2731) (2 weeks ago) [Vivek]
76457141 - Fix fast-reboot DB migration ([teamd]: update teamd docker to stretch #2734) (2 weeks ago) [Aryeh Feigin]
f7f783bc - Enhance the logic to wait for all buffer tables to be removed in _clear_qos ([sfputil] Not able to read out values of voltage/temp/power on some cables  #2720) (2 weeks ago) [Stephen Sun]
e6179afa - Remove timer from FAST_REBOOT STATE_DB entry and use finalizer (Rollback kernel submodule update. #2621) (3 weeks ago) [Aryeh Feigin]
ff688323 - [route_check] fix IPv6 address handling ([docker pmon] install fancontrol & sensord #2722) (3 weeks ago) [Stepan Blyshchak]
7a604c51 - update fast-reboot ([201811][sairedis][swss] advance sub module head of sairedis and swss #2728) (3 weeks ago) [jhli-cisco]
9f83ace9 - [GCU] Add vlanintf-validator (Revert "[device/celestica] blacklist gpio_ich kernel module on haliburton" #2697) (3 weeks ago) [jingwenxie]
338d1c05 - Check SONiC dependencies before installation. ([sonic-slave]: Add iproute2 dependencies in stretch docker #2716) (3 weeks ago) [Liu Shilong]
64d2efd2 - Improve show acl commands ([sonic-utilities] update submodule #2667) (3 weeks ago) [bingwang-ms]
2ef5b31e - [GCU] Add PFC_WD RDMA validator ([sub module] advance sonic-utilities sub module for 201811 branch #2619) (3 weeks ago) [isabelmsft]
c7aa8416 - [show][muxcable] increase timeout for displaying HW_STATUS (Fixing get_transceiver_change_event #2712) (3 weeks ago) [vdahiya12]
2fc2b826 - YANG validation for ConfigDB Updates: MIRROR_SESSION use case ([mellanox] Update SDK to 4.3.0132 #2430) (3 weeks ago) [isabelmsft]
e16bdaae - Fix non-zero status exit on non secure boot system ([service] add warmboot finializer service #2715) (3 weeks ago) [kellyyeh]
90d70152 - [route_check] implement a check for FRR routes not marked offloaded (Feature to run an option platform specific script on the first boot #2531) (3 weeks ago) [Stepan Blyshchak]
c2bc150a - [warm/fast-reboot] Backup logs from tmpfs to disk during fast/warm shutdown ([swss]: update swss docker to stretch #2714) (3 weeks ago) [Vaibhav Hemant Dixit]
a015834d - [db_migrator] Add missing attribute 'weight' to route entries in APPL DB ([device/celestica] blacklist gpio_ich kernel module on seastone #2691) (4 weeks ago) [Vaibhav Hemant Dixit]
cd519aac - [ci] Fix pipeline issue caused by sonic-slave-* change. ([201803] Modify Debian apt repos to reflect changes made by maintainers #2709) (4 weeks ago) [Liu Shilong]
2680e6f3 - [dhcp_relay] Fix dhcp_relay restart error while add/del vlan ([thrift] add a patch to revert THRIFT-3650 #2688) (4 weeks ago) [Yaqiang Zhu]
How I did it
How to verify it
mihirpat1 pushed a commit to mihirpat1/sonic-buildimage that referenced this pull request Jun 14, 2023
)

* Sometime during config reload, EVPN NVO table arrives later than remote VNI table entries. In such scenarios, remote vni entries are ignored and this leads to traffic loss. Added fix to retry instead of ignoring.
mihirpat1 pushed a commit to mihirpat1/sonic-buildimage that referenced this pull request Jun 14, 2023
…ic-net#2756)" (sonic-net#2773)

This reverts commit 750e064.
Reverts the PR sonic-net#2756

The fix added breaks the previously added workaround sonic-net#2626. Hence requesting to revert the fix.
Once we find a proper solution for sonic-net#12361 we need to reintegrate this PR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants