[Dualtor-AA] Bug: unexpected syslog errors caused by the test behavior in test_crm_nexthop_group #20563

@congh-nvidia

Is it platform specific

generic

Importance or Severity

Medium

Description of the bug

Syslog errors are observed in test case crm.test_crm.test_crm_nexthop_group on dualtor-aa testbed:

'2025 Aug 29 12:06:28.831329 r-4700-72 ERR swss#orchagent: :- meta_generic_validation_remove: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"2.0.3.132/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn\'t exist'
'2025 Aug 29 12:06:28.831555 r-4700-72 ERR swss#orchagent: :- meta_generic_validation_remove: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"2.0.3.120/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn\'t exist'
'2025 Aug 29 12:06:28.831632 r-4700-72 ERR swss#orchagent: :- meta_generic_validation_remove: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"2.0.3.141/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn\'t exist'
...
'2025 Aug 29 12:06:28.853451 r-4700-72 ERR syncd#SDK: [SAI_ROUTE.ERR] ./src/mlnx_sai_route.c[1113]- mlnx_get_route: Failed to get 1 route entries Entry Not Found.'
'2025 Aug 29 12:06:28.929416 r-4700-72 ERR syncd#SDK: message repeated 9 times: [ [SAI_ROUTE.ERR] ./src/mlnx_sai_route.c[1113]- mlnx_get_route: Failed to get 1 route entries Entry Not Found.]'
'2025 Aug 29 12:06:28.929447 r-4700-72 ERR syncd#SDK: :- sendApiResponse: api SAI_COMMON_API_BULK_REMOVE failed in syncd mode: SAI_STATUS_FAILURE'
'2025 Aug 29 12:06:28.932570 r-4700-72 ERR swss#orchagent: :- flush_removing_entries: EntityBulker.flush remove entries failed
number of entries to remove: 336
status: SAI_STATUS_FAILURE'
'2025 Aug 29 12:06:28.979623 r-4700-72 ERR swss#orchagent: :- removeRoutePost: Failed to remove route prefix:2.0.3.117/32'
'2025 Aug 29 12:06:28.979852 r-4700-72 ERR swss#orchagent: :- removeRoutePost: Failed to remove route prefix:2.0.3.119/32'
...
'2025 Aug 29 12:06:30.894467 r-4700-72 ERR swss#orchagent: :- meta_generic_validation_remove: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"2.0.5.77/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn\'t exist'
'2025 Aug 29 12:06:30.894467 r-4700-72 ERR swss#orchagent: :- meta_generic_validation_remove: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"2.0.5.61/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn\'t exist'
'2025 Aug 29 12:06:30.894467 r-4700-72 ERR swss#orchagent: :- meta_generic_validation_remove: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"2.0.5.59/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn\'t exist'
...
'2025 Aug 29 12:06:30.909964 r-4700-72 ERR syncd#SDK: [SAI_ROUTE.ERR] ./src/mlnx_sai_route.c[1113]- mlnx_get_route: Failed to get 1 route entries Entry Not Found.'
'2025 Aug 29 12:06:31.044167 r-4700-72 ERR syncd#SDK: message repeated 106 times: [ [SAI_ROUTE.ERR] ./src/mlnx_sai_route.c[1113]- mlnx_get_route: Failed to get 1 route entries Entry Not Found.]'
'2025 Aug 29 12:06:31.044167 r-4700-72 ERR syncd#SDK: :- sendApiResponse: api SAI_COMMON_API_BULK_REMOVE failed in syncd mode: SAI_STATUS_FAILURE'
'2025 Aug 29 12:06:31.048911 r-4700-72 ERR swss#orchagent: :- flush_removing_entries: EntityBulker.flush remove entries failed
number of entries to remove: 516
status: SAI_STATUS_FAILURE'
'2025 Aug 29 12:06:31.070236 r-4700-72 ERR swss#orchagent: :- removeRoutePost: Failed to remove route prefix:2.0.3.172/32'
'2025 Aug 29 12:06:31.070582 r-4700-72 ERR swss#orchagent: :- removeRoutePost: Failed to remove route prefix:2.0.3.173/32'
...
'2025 Aug 29 12:06:33.265114 r-4700-72 ERR swss#orchagent: :- meta_generic_validation_remove: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"2.0.5.207/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn\'t exist'
'2025 Aug 29 12:06:33.265123 r-4700-72 ERR swss#orchagent: :- meta_generic_validation_remove: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"2.0.5.204/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn\'t exist'
...
'2025 Aug 29 12:06:33.338084 r-4700-72 ERR syncd#SDK: [SAI_ROUTE.ERR] ./src/mlnx_sai_route.c[1113]- mlnx_get_route: Failed to get 1 route entries Entry Not Found.'
'2025 Aug 29 12:06:33.363171 r-4700-72 ERR syncd#SDK: message repeated 8 times: [ [SAI_ROUTE.ERR] ./src/mlnx_sai_route.c[1113]- mlnx_get_route: Failed to get 1 route entries Entry Not Found.]'
'2025 Aug 29 12:06:33.363171 r-4700-72 ERR syncd#SDK: :- sendApiResponse: api SAI_COMMON_API_BULK_REMOVE failed in syncd mode: SAI_STATUS_FAILURE'
'2025 Aug 29 12:06:33.366070 r-4700-72 ERR swss#orchagent: :- flush_removing_entries: EntityBulker.flush remove entries failed
number of entries to remove: 412
status: SAI_STATUS_FAILURE'
'2025 Aug 29 12:06:33.370268 r-4700-72 ERR swss#orchagent: :- removeRoutePost: Failed to remove route prefix:2.0.5.174/32'
...

From @Ndancejic's analysis, the root cause is that some of these route entries are already removed in the cleanup step, when the corresponding neighbor entries are removed. The test currently creates a route prefix, e.g. 10.0.0.3/32, that points to a next hop group {10.0.0.1, 10.0.0.3}. This works fine for testing CRM, but it is not an expected use case for SONiC. In dualtor, when a neighbor entry is not resolved, we also create a tunnel route with prefix 10.0.0.3/32. In this test case, some of these routes are created and the test finishes before the corresponding neighbor entries are created. Then, during the cleanup step, when the neighbor entries are removed, the dualtor logic removes the prefix (which happens to correspond to the route entry as well). So when route cleanup runs, we attempt to remove that prefix, but it no longer exists.
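To make the ordering concrete, here is a minimal toy model of the sequence described above. It is only a sketch and does not use any real orchagent, SAI, or sonic-mgmt code: `ToyRouteTable` and `remove_unresolved_dualtor_neighbor` are hypothetical names, and the assumption that removing an unresolved neighbor deletes the matching /32 prefix is taken from the analysis, not from the source code.

```python
class ToyRouteTable:
    """Toy stand-in for the route-entry table (hypothetical, not orchagent/SAI)."""

    def __init__(self):
        self.entries = set()
        self.errors = []

    def add(self, prefix):
        self.entries.add(prefix)

    def remove(self, prefix):
        # Mimics the "object key ... doesn't exist" validation failure.
        if prefix not in self.entries:
            self.errors.append(f"route entry {prefix} doesn't exist")
            return False
        self.entries.remove(prefix)
        return True


def remove_unresolved_dualtor_neighbor(table, neighbor_ip):
    # Assumption from the analysis: for an unresolved neighbor, dualtor logic
    # owns a tunnel route with the neighbor's /32 prefix and removes it here.
    table.entries.discard(f"{neighbor_ip}/32")


table = ToyRouteTable()

# 1. The test creates a route whose prefix equals one of its own next hops.
table.add("10.0.0.3/32")

# 2. Cleanup removes the neighbor first; the shared /32 prefix goes with it.
remove_unresolved_dualtor_neighbor(table, "10.0.0.3")

# 3. Route cleanup then tries to remove the same prefix and fails.
removed = table.remove("10.0.0.3/32")
print(removed, table.errors)
```

In this model the error disappears if the route prefix does not collide with a neighbor address, which is consistent with the observation that the test's prefix choice is not an expected use case.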

In the future, likely 202511, we’ll be switching to no_host_route handling of dualtor neighbors: sonic-net/sonic-swss#3722. After this, the errors should not be observed anymore.

Steps to Reproduce

Run the test crm.test_crm.test_crm_nexthop_group on an SN4700 dualtor-aa-64-breakout testbed.

Actual Behavior and Expected Behavior

Expected: no syslog errors. Actual: the orchagent and syncd errors shown above are logged.

Relevant log output

Output of show version

Attach files (if any)

No response
