Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fdborch] Fix FDB flush issues #2136

Merged
merged 7 commits into from
Aug 9, 2022

Conversation

oleksandrx-kolomeiets
Copy link
Contributor

@oleksandrx-kolomeiets oleksandrx-kolomeiets commented Feb 7, 2022

What I did
Add a flag to FDB entry to ensure that only the requested FDB entries will be deleted on flush notification.

Why I did it
In the mclag scenario, restart one of a pair of leaf switches. After completing the information exchange between leaf switches, the leaf switch will flush the fdb and then add the fdb, and then if the downlink port down, the fdb will be redirected.
At this point, a bug may appear:

ERR swss#orchagent: :- meta_sai_validate_fdb_entry: object key SAI_OBJECT_TYPE_FDB_ENTRY:{"bvid":"oid:0x26000000000943","mac":"B4:56:B9:F0:00:3E","switch_id":"oid:0x21000000000000"} already exists
ERR swss#orchagent: :- addFdbEntry: Failed to create dynamic FDB b4:56:b9:f0:00:3e on Ethernet16, rv:-6

analysis:
flush fdb, there are three steps

  1. Call ASI api after fdborch is processed; sairedis deletes the local cache; deletes fdb in the chip
  2. After deleting fdb in the chip, a flush notification will be sent to sairedis, and the fdb entry in ASIC_DB will be deleted
  3. When fdborch receives the notification, the fdb cached in orch is deleted

If between step 1 and step 3, fdborch processes the event of adding fdb, it may cause the fdb cache in sairedis to be incorrect or the entries in ASIC_DB are incorrect.
In this way, subsequent additions and deletions will cause larger e exceptions.

How I verified it
flush fdb
add fdb to APPL_DB immediately
set this fdb again after

Details if related
Similar fix can be found in this PR #1470, but it has some flaws as mentioned in the comments.

@oleksandrx-kolomeiets
Copy link
Contributor Author

@prsunny @gechiang
Hi!
Please review the changes.
Also, could you help me with CI? Is it a common issue that "Azure.sonic-swss" and "LGTM analysis: C/C++" are failing and if so then how can I fix them?

@prsunny
Copy link
Collaborator

prsunny commented Apr 7, 2022

can you please check the coverage - https://dev.azure.com/mssonic/build/_build/results?buildId=87118&view=codecoverage-tab

You may want to add the VS test to cover the missing flows.

Signed-off-by: Oleksandr Kolomeiets <oleksandrx.kolomeiets@intel.com>
@oleksandrx-kolomeiets oleksandrx-kolomeiets force-pushed the fix_fdb_flush branch 3 times, most recently from 8eeb58b to e22d21c Compare May 3, 2022 21:52
@oleksandrx-kolomeiets
Copy link
Contributor Author

@prsunny @gechiang
Hello.
It seems that all CI issues are fixed.
Please, review the changes.
Thanks!

@gechiang
Copy link
Contributor

@Praveen-Brcm, can you please help review this as well.
Thanks!

@oleksandrx-kolomeiets
Copy link
Contributor Author

@gechiang, @Praveen-Brcm, please review the changes. Thanks!

@oleksandrx-kolomeiets
Copy link
Contributor Author

@prsunny, @gechiang,
can you please review this fix?
Thank you.

Copy link
Contributor

@gechiang gechiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@gechiang
Copy link
Contributor

gechiang commented Aug 8, 2022

@oleksandrx-kolomeiets can you please take care of the EasyCLA check? There was an email sent to the community on how todo this one time registration which you need to do before the check can pass...

@Praveen-Brcm Gentle reminder for you to review this PR.

@Praveen-Brcm
Copy link
Contributor

@gechiang : Changes looks good.

@oleksandrx-kolomeiets
Copy link
Contributor Author

@msosyak, @gechiang, @Praveen-Brcm, thank you for the review!
EasyCLA check has passed. PR is ready to be merged.

@gechiang gechiang merged commit 8dae356 into sonic-net:master Aug 9, 2022
lukasstockner pushed a commit to genesiscloud/sonic-swss that referenced this pull request Nov 14, 2022
* [fdborch] Fix FDB flush issues

Signed-off-by: Oleksandr Kolomeiets <oleksandrx.kolomeiets@intel.com>

* Fix coverage

* Fix test_FdbAddedAfterMemberCreated.

* Fix flush_syncd_notif_ut.cpp.
lukasstockner pushed a commit to genesiscloud/sonic-swss that referenced this pull request Mar 14, 2023
* [fdborch] Fix FDB flush issues

Signed-off-by: Oleksandr Kolomeiets <oleksandrx.kolomeiets@intel.com>

* Fix coverage

* Fix test_FdbAddedAfterMemberCreated.

* Fix flush_syncd_notif_ut.cpp.
lukasstockner pushed a commit to genesiscloud/sonic-swss that referenced this pull request Apr 2, 2023
* [fdborch] Fix FDB flush issues

Signed-off-by: Oleksandr Kolomeiets <oleksandrx.kolomeiets@intel.com>

* Fix coverage

* Fix test_FdbAddedAfterMemberCreated.

* Fix flush_syncd_notif_ut.cpp.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants