Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[muxorch] Always use direct link for SoC IPs #2369

Merged
merged 3 commits into from
Jul 14, 2022

Conversation

lolyu
Copy link
Contributor

@lolyu lolyu commented Jul 5, 2022

What I did
For SoC IPs of ports in active-active cable type, if mux is toggled to standby, still use direct link instead of changing next-hop to the tunnel.

Why I did it
In an active-active dualtor setup, changing the nexthop of route to SoC IP to the tunnel will have the following problem:
If lower ToR is already standby, and somehow the upper ToR decides to toggle itself to standby, the toggle will fail:
linkmgrd decide to toggle to standby --> muxorch disables the SoC IP neighbor and change the route next-hop to the tunnel --> ycabled could not setup gRPC connection.

How I verified it
Add unittest and verify the change on active-active testbeds.

Details if related

Copy link
Collaborator

@prsunny prsunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to take a different approach for this and not make it part of the nbr_handler. We can just return from the 'update' function.

@lolyu
Copy link
Contributor Author

lolyu commented Jul 6, 2022

I prefer to take a different approach for this and not make it part of the nbr_handler. We can just return from the 'update' function.

OK, I will change it to simply skip the neighbor handler

@prsunny
Copy link
Collaborator

prsunny commented Jul 8, 2022

Propose to have it as below:

  1. Extend Muxcable class to have a list of soc_ip4/6 addresses. Name it say, 'skip_neighbor_list'
  2. In MuxOrch::updateNeighbor, check for this list and return if new neighbor is present in the 'add' case.
  3. No changes to MuxNeighbor class

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Jul 9, 2022

This pull request introduces 1 alert when merging 11a8fff into bf91a49 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

@lolyu lolyu marked this pull request as ready for review July 9, 2022 11:40
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
orchagent/muxorch.cpp Outdated Show resolved Hide resolved
orchagent/muxorch.cpp Outdated Show resolved Hide resolved
orchagent/muxorch.cpp Outdated Show resolved Hide resolved
@prsunny
Copy link
Collaborator

prsunny commented Jul 10, 2022

@lolyu , please dont force-push. I would like to review incremental changes.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@lolyu
Copy link
Contributor Author

lolyu commented Jul 11, 2022

@lolyu , please dont force-push. I would like to review incremental changes.

@prsunny Sure, updated, thanks!

Copy link
Collaborator

@prsunny prsunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, minor comments

tests/test_mux.py Show resolved Hide resolved
@@ -534,6 +534,11 @@ bool MuxCable::nbrHandler(bool enable, bool update_rt)
void MuxCable::updateNeighbor(NextHopKey nh, bool add)
{
sai_object_id_t tnh = mux_orch_->getNextHopTunnelId(MUX_TUNNEL, peer_ip4_);
if (add && skip_neighbors_.find(nh.ip_address) != skip_neighbors_.end())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check if a 'del' operation does not impact the code flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified the del operation, all works well.

@lolyu lolyu merged commit 24a0797 into sonic-net:master Jul 14, 2022
yxieca pushed a commit that referenced this pull request Jul 17, 2022
What I did
For SoC IPs of ports in active-active cable type, if mux is toggled to standby, still use direct link instead of changing next-hop to the tunnel.

Why I did it
In an active-active dualtor setup, changing the nexthop of route to SoC IP to the tunnel will have the following problem:
If lower ToR is already standby, and somehow the upper ToR decides to toggle itself to standby, the toggle will fail:
linkmgrd decide to toggle to standby --> muxorch disables the SoC IP neighbor and change the route next-hop to the tunnel --> ycabled could not setup gRPC connection.

How I verified it
Add unittest and verify the change on active-active testbeds.
preetham-singh pushed a commit to preetham-singh/sonic-swss that referenced this pull request Aug 6, 2022
What I did
For SoC IPs of ports in active-active cable type, if mux is toggled to standby, still use direct link instead of changing next-hop to the tunnel.

Why I did it
In an active-active dualtor setup, changing the nexthop of route to SoC IP to the tunnel will have the following problem:
If lower ToR is already standby, and somehow the upper ToR decides to toggle itself to standby, the toggle will fail:
linkmgrd decide to toggle to standby --> muxorch disables the SoC IP neighbor and change the route next-hop to the tunnel --> ycabled could not setup gRPC connection.

How I verified it
Add unittest and verify the change on active-active testbeds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants