Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bulk mode] Fix bulk conflict when in case there are both remove and set operations #2071

Merged
merged 3 commits into from Dec 14, 2021

Conversation

shi-su
Copy link
Contributor

@shi-su shi-su commented Dec 8, 2021

What I did
Check if there are items pending removal in bulk before calling bulk set API.

Fixes sonic-net/sonic-buildimage#9434

Why I did it
When there are items pending removal in bulk before calling set API, it means the item will be removed before the set and it should do create instead.

How I verified it

Details if related

@qiluo-msft
Copy link
Contributor

qiluo-msft commented Dec 8, 2021

auto it = consumer.m_toSync.begin();

I have another idea. For route orch, if m_toSync gets a DEL/SET, we can safely merge it into a SET. Then it will simplify the bulking logic. his is possible because the new SET has covered all the fields compared the existing fields before DEL.
And this will lead to a general solution.


In reply to: 989219096


Refers to: orchagent/routeorch.cpp:468 in 2a72407. [](commit_id = 2a72407, deletion_comment = False)

*/
if (it_route == m_syncdRoutes.at(vrf_id).end())
if (it_route == m_syncdRoutes.at(vrf_id).end() || gRouteBulker.bulk_entry_pending_removal(route_entry))
Copy link
Contributor

@qiluo-msft qiluo-msft Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a unit test? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a unit test for the bulker function

Copy link
Collaborator

@stephenxs stephenxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic works only if removing a routing entry always succeeds.
If removing a routing entry can fail and the routing entry is still in SAI table in that case, the logic will be broken. Can that happen?

@stephenxs
Copy link
Collaborator

auto it = consumer.m_toSync.begin();

I have another idea. For route orch, if m_toSync gets a DEL/SET, we can safely merge it into a SET. Then it will simplify the bulking logic. his is possible because the new SET has covered all the fields compared the existing fields before DEL. And this will lead to a general solution.

Sounds good but I don't know what happens if the routing entry switches from a normal nexthop/nhg to some special thing like a tunnel/MPLS. In that case, is removing the old entry a must or it can just override the attributes?
In addToSync there is similar logic to merge multiple operations on the same key, but it doesn't merge a removal to a setting (S, D => D). It only merges a setting to a removal.

@shi-su
Copy link
Contributor Author

shi-su commented Dec 9, 2021

This logic works only if removing a routing entry always succeeds. If removing a routing entry can fail and the routing entry is still in SAI table in that case, the logic will be broken. Can that happen?

This is a valid concern. This logic is based on the assumption that the remove operation is successful. In the current stage, the removal failure would cause orchagent exit, thus the situation should not happen. Also, the original behavior does not seem to be working. Let's assume the remove operation fails, the set operation would become a set_entry_attribute. In case that the set is successful and the remove gets retried, it might end up with the route getting removed though it is supposed to be there.

@shi-su
Copy link
Contributor Author

shi-su commented Dec 9, 2021

auto it = consumer.m_toSync.begin();

I have another idea. For route orch, if m_toSync gets a DEL/SET, we can safely merge it into a SET. Then it will simplify the bulking logic. his is possible because the new SET has covered all the fields compared the existing fields before DEL. And this will lead to a general solution.

Sounds good but I don't know what happens if the routing entry switches from a normal nexthop/nhg to some special thing like a tunnel/MPLS. In that case, is removing the old entry a must or it can just override the attributes? In addToSync there is similar logic to merge multiple operations on the same key, but it doesn't merge a removal to a setting (S, D => D). It only merges a setting to a removal.

I have some concerns about making merging DEL and SET into one SET a general solution. For some SAI APIs, some attributes may only be configurable on creation. In such cases, we should keep both operations and strictly do a DEL and then CREATE. Yet I do agree that it is a valid solution specific to route orch. And it could be an additional performance improvement on top of this fix.

@stephenxs
Copy link
Collaborator

This logic works only if removing a routing entry always succeeds. If removing a routing entry can fail and the routing entry is still in SAI table in that case, the logic will be broken. Can that happen?

This is a valid concern. This logic is based on the assumption that the remove operation is successful. In the current stage, the removal failure would cause orchagent exit, thus the situation should not happen. Also, the original behavior does not seem to be working. Let's assume the remove operation fails, the set operation would become a set_entry_attribute. In case that the set is successful and the remove gets retried, it might end up with the route getting removed though it is supposed to be there.

Ok. That means it doesn't introduce new risks or degradation.

@dprital
Copy link
Collaborator

dprital commented Dec 13, 2021

@shi-su, Can you please retrigger the checkers for this PR ?

@shi-su
Copy link
Contributor Author

shi-su commented Dec 13, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Copy link
Collaborator

vs test failed due to ##[error]Artifact sonic-gcov already exists for build 58142.

@shi-su
Copy link
Contributor Author

shi-su commented Dec 14, 2021

vs test failed due to ##[error]Artifact sonic-gcov already exists for build 58142.

I retriggered the tests. I believe the status of sonic-gcov will be refreshed after this run if it passes.

@shi-su shi-su merged commit 5d5c169 into sonic-net:master Dec 14, 2021
@shi-su shi-su deleted the bulk_conflict branch December 14, 2021 06:50
liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Dec 28, 2021
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086)
a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041)
71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051)
5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071)
8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072)
ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811)
89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064)
8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040)
b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT
ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060)
45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049)
b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054)
d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009)
24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029)
15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897)
ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951)
e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955)
bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997)
f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026)
fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910)
9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992)
3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048)
fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987)
16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907)
9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642)

Signed-off-by: Stephen Sun stephens@nvidia.com
stephenxs added a commit to stephenxs/sonic-buildimage that referenced this pull request Jan 6, 2022
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086)
a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041)
71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051)
5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071)
8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072)
ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811)
89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064)
8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040)
b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT
ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060)
45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049)
b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054)
d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009)
24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029)
15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897)
ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951)
e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955)
bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997)
f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026)
fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910)
9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992)
3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048)
fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987)
16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907)
9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642)

Signed-off-by: Stephen Sun stephens@nvidia.com
judyjoseph pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jan 6, 2022
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086)
a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041)
71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051)
5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071)
8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072)
ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811)
89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064)
8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040)
b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT
ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060)
45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049)
b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054)
d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009)
24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029)
15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897)
ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951)
e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955)
bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997)
f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026)
fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910)
9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992)
3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048)
fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987)
16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907)
9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642)

Signed-off-by: Stephen Sun stephens@nvidia.com
preetham-singh pushed a commit to preetham-singh/sonic-swss that referenced this pull request Aug 6, 2022
…set operations (sonic-net#2071)

What I did
Check if there are items pending removal in bulk before calling bulk set API.

Fixes sonic-net/sonic-buildimage#9434

Why I did it
When there are items pending removal in bulk before calling set API, it means the item will be removed before the set and it should do create instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants