Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[action] [PR:15583] [arp_update]: Fix IPv6 neighbor race condition #15694

Merged
merged 1 commit into from
Jul 1, 2023

Conversation

mssonicbld
Copy link
Collaborator

Why I did it

A race condition exists which makes it possible for the kernel to resolve a trapped packet's destination IP at the same time that arp_update is running the ip neigh replace command for that neighbor IP. When this occurs, the kernel's neighbor entry for this IP is in the INCOMPLETE state, and the ip neigh replace command sets it to permanently incomplete. This means no netlink message will be generated for this neighbor, since the kernel doesn't generate netlink messages for INCOMPLETE neighbors (it would only generate a message once the neighbor transitions to FAILED, which doesn't happen due to the ip neigh replace command). As a result, no APPL_DB neighbor table entry is ever created and no tunnel route for the IP is ever installed, leading to dropped traffic.

Work item tracking
  • Microsoft ADO (number only): 24152065

How I did it

  • Check each FAILED or INCOMPLETE neighbor to see if a corresponding APPL_DB neighbor table entry exists. If no entry exists, flush the neighbor from the kernel
  • After pinging the neighbor IPs, wait for any neighbor entries which might be transiently INCOMPLETE to transition to FAILED (so that the subsequent ip neigh replace command can set them to permanently incomplete)
  • Exclude any INCOMPLETE neighbors from the ip neigh replace command in case they are new neighbors for which no netlink message has been generated yet

How to verify it

  • Run arp_update with no FAILED or INCOMPLETE neighbor entries, verify no changes are made to the kernel neighbor table

  • Run arp_update with a FAILED neighbor entry with corresponding APPL_DB entry, verify the neighbor IP is pinged and set to INCOMPLETE permanently

  • Run arp_update with an INCOMPLETE neighbor entry with corresponding APPL_DB entry, verify the neighbor IP is pinged and set to INCOMPLETE permanently

  • Run arp_update with a FAILED neighbor entry without corresponding APPL_DB entry, verify the neighbor is flushed, pinged, and set to INCOMPLETE permanently

  • Run arp_update with an INCOMPLETE neighbor entry without corresponding APPL_DB entry, verify the neighbor is flushed, pinged, and set to INCOMPLETE permanently

  • Run arp_update with various combinations of above scenarios - verify that only neighbors missing APPL_DB entries are flushed from the kernel; verify that all FAILED/INCOMPLETE neighbors are pinged and set to permanently INCOMPLETE

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

* [arp_update]: Fix IPv6 neighbor race condition on dualtors
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
@mssonicbld
Copy link
Collaborator Author

Original PR: #15583

@mssonicbld mssonicbld merged commit 7952fe7 into sonic-net:202205 Jul 1, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants