BGP State Change Does not Trigger BGP State Event #19591

Open
wumiaont opened this issue Jul 16, 2024 · 10 comments
Assignees
Labels
Chassis 🤖 Modular chassis support Issue for 202405 MSFT Triaged this issue has been triaged

Comments

@wumiaont
Contributor

wumiaont commented Jul 16, 2024

Description

During telemetry testing, the test case that catches the BGP state event failed. The test issues "config bgp startup all", "config bgp shutdown all", and "config bgp startup all" to the duthost, then subscribes to the gnmi server for the BGP state change event.

The subscriber is expected to receive the BGP state change event, but this does not happen.

Steps to reproduce the issue:

  1. On PTF server, issue "python /root/gnxi/gnmi_cli_py/py_gnmicli.py -g -t 10.250.6.231 -p 8080 -m subscribe -x all[heartbeat=2] -xt EVENTS -o ndastreamingservertest -n --subscribe_mode 0 --submode 1 --interval 0 --update_count 1 --filter_event_regex sonic-events-bgp:bgp-state"
    Here 10.250.6.231 is the mgmt ip of duthost.
  2. Issue "config bgp startup all", "config bgp shutdown all", "config bgp startup all" from the chassis.
  3. On the PTF server there is no response. The subscriber is still waiting for a valid response.
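
To make step 1 easier to repeat with and without the event filter, the subscribe command can be assembled programmatically. This is a minimal sketch: the helper name `build_subscribe_cmd` and its parameters are hypothetical, and it only uses the script path and flags that appear in the command from step 1.

```python
def build_subscribe_cmd(target_ip, port=8080, heartbeat=2,
                        update_count=1, event_regex=None):
    """Build the py_gnmicli.py argv used in step 1 (hypothetical helper).

    Only flags quoted in this issue are used; their values mirror step 1.
    """
    cmd = [
        "python", "/root/gnxi/gnmi_cli_py/py_gnmicli.py",
        "-g",
        "-t", target_ip,
        "-p", str(port),
        "-m", "subscribe",
        "-x", "all[heartbeat={}]".format(heartbeat),
        "-xt", "EVENTS",
        "-o", "ndastreamingservertest",
        "-n",
        "--subscribe_mode", "0",
        "--submode", "1",
        "--interval", "0",
        "--update_count", str(update_count),
    ]
    # Omit the filter to see every event, including heartbeats.
    if event_regex is not None:
        cmd += ["--filter_event_regex", event_regex]
    return cmd


if __name__ == "__main__":
    # Filtered subscription, as in step 1 (10.250.6.231 is the duthost mgmt IP).
    print(" ".join(build_subscribe_cmd(
        "10.250.6.231", event_regex="sonic-events-bgp:bgp-state")))
```

The returned list can be passed to `subprocess.run` on the PTF server. Calling it without `event_regex` gives an unfiltered subscription, which is useful for checking whether heartbeats arrive at all.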

Describe the results you received:

The test failed.

Describe the results you expected:

Repeatedly starting up and shutting down BGP should trigger a BGP state change event.

Output of show version:

202405/master

For more detail on the test, see sonic-mgmt/tests/telemetry/events/bgp-events.py::test_event

One observation: the BGP notification event does work under similar actions, when IP rules are created to drop TCP packets to/from port 179.

@vmittal-msft
Contributor

@zbud-msft Please help take a look.

@zbud-msft
Contributor

@wumiaont Can you please verify if other paths work that are not EVENTS db? Is it maybe an accessibility issue? Are you seeing heartbeats on the subscriber (ptf gnmi client) side? In syslog are you able to see the bgp state change log?

@wumiaont
Contributor Author

wumiaont commented Aug 1, 2024

I have verified that the bgp notification events are received when we create a rule to drop packets to/from port 179. BGP state change events are not received. Also, if I do not use the filter, I can see that heartbeat events are received.

@wumiaont
Contributor Author

wumiaont commented Aug 1, 2024

There are other related issues, such as #19603. Please look at the comments there. From the log I can see that swss events were published, but the gnmi client does not receive them. If I remove filter_event_regex from the CLI, I receive heartbeat events, but no swss events when we trigger actions such as port shutdown/startup.

python /root/gnxi/gnmi_cli_py/py_gnmicli.py -g -t 10.250.6.231 -p 8080 -m subscribe -x all[heartbeat=2] -xt EVENTS -o ndastreamingservertest --subscribe_mode 0 --submode 1 --interval 0 --update_count 100

2024-08-01 01:14:59.826520 response received:
update {
timestamp: 1722474899815081373
prefix {
target: "EVENTS"
}
update {
path {
elem {
name: "all"
key {
key: "heartbeat"
value: "2"
}
}
}
val {
json_ietf_val: "{"sonic-events-eventd:heartbeat":{"timestamp":"2024-08-01T01:14:59.815016Z"}}"
}
}
}
......

@wumiaont
Contributor Author

wumiaont commented Aug 1, 2024

@zbud-msft Many event tests work, such as host events and bgp notification events. Only the bgp state event and swss events are not received. I can see from the log that the corresponding events are published. Please help check what could be wrong here.

@wumiaont
Contributor Author

wumiaont commented Aug 1, 2024

This is for the bgp notification test, which works. I am using the client without the regex filtering from the ptf server.
python /root/gnxi/gnmi_cli_py/py_gnmicli.py -g -t 10.250.6.231 -p 8080 -m subscribe -x all[heartbeat=2] -xt EVENTS -o ndastreamingservertest --subscribe_mode 0 --submode 1 --interval 0 --update_count 100

2024-08-01 15:57:32.814535 response received:
update {
timestamp: 1722527852781597007
prefix {
target: "EVENTS"
}
update {
path {
elem {
name: "all"
key {
key: "heartbeat"
value: "2"
}
}
}
val {
json_ietf_val: "{"sonic-events-eventd:heartbeat":{"timestamp":"2024-08-01T15:57:32.781521Z"}}"
}
}
}

2024-08-01 15:57:33.750013 response received:
update {
timestamp: 1722527853743868153
prefix {
target: "EVENTS"
}
update {
path {
elem {
name: "all"
key {
key: "heartbeat"
value: "2"
}
}
}
val {
json_ietf_val: "{"sonic-events-bgp:notification":{"ip":"10.0.0.1","is_sent":"true","major_code":"4","minor_code":"0","timestamp":"2024-08-01T15:57:33.743775Z"}}"
}
}
}
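
When running the unfiltered subscription, it helps to classify each received `json_ietf_val` by its event tag so that heartbeats can be told apart from the events under test. The sketch below is a client-side illustration only: the function names are hypothetical, and treating `--filter_event_regex` as a regular-expression search against the event tag is an assumption, not a description of how py_gnmicli or the gnmi server applies the filter.

```python
import json
import re


def event_tag(json_ietf_val):
    """Return the top-level key of an event payload,
    e.g. 'sonic-events-bgp:notification'."""
    payload = json.loads(json_ietf_val)
    return next(iter(payload))


def matches_filter(json_ietf_val, filter_event_regex):
    # Assumption: the filter behaves like re.search on the event tag.
    return re.search(filter_event_regex, event_tag(json_ietf_val)) is not None


if __name__ == "__main__":
    # Payloads copied from the responses quoted in this comment.
    heartbeat = ('{"sonic-events-eventd:heartbeat":'
                 '{"timestamp":"2024-08-01T15:57:32.781521Z"}}')
    notification = ('{"sonic-events-bgp:notification":{"ip":"10.0.0.1",'
                    '"is_sent":"true","major_code":"4","minor_code":"0",'
                    '"timestamp":"2024-08-01T15:57:33.743775Z"}}')
    for val in (heartbeat, notification):
        print(event_tag(val),
              matches_filter(val, "sonic-events-bgp:bgp-state"))
```

Neither payload above carries the `sonic-events-bgp:bgp-state` tag, which is consistent with a filtered subscriber for that tag seeing no responses.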

@wumiaont
Contributor Author

wumiaont commented Aug 1, 2024

For the BGP state test, I can see from syslog that the event is published, but the gnmi client only received heartbeat events during the test.

2024 Aug 1 16:28:15.406947 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"down","timestamp":"2024-08-01T16:28:15.406799Z"}}
2024 Aug 1 16:28:15.407054 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::a","status":"down","timestamp":"2024-08-01T16:28:15.406876Z"}}
2024 Aug 1 16:28:16.692872 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.1","status":"up","timestamp":"2024-08-01T16:28:16.692602Z"}}
2024 Aug 1 16:28:16.694208 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.11","status":"up","timestamp":"2024-08-01T16:28:16.694020Z"}}
2024 Aug 1 16:28:16.7616

@wumiaont
Contributor Author

wumiaont commented Aug 1, 2024

@zbud-msft It looks to me that an event published by a global service works, but if it is published by a service under a certain namespace, the client does not receive it. In the log below you can see that bgp-state events are published from bgp0 or bgp1, and these fail to be received by the client. The notification events are published by rsyslog_plugin (no namespace prefix), and those work.

swss has a similar failure, as swss runs in each namespace.

It looks like the publish code has an issue handling services under a namespace.

2024 Aug 1 16:53:04.209148 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"down","timestamp":"2024-08-01T16:53:04.208989Z"}}
2024 Aug 1 16:53:04.209213 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::a","status":"down","timestamp":"2024-08-01T16:53:04.209080Z"}}
2024 Aug 1 16:53:04.218601 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.7","status":"down","timestamp":"2024-08-01T16:53:04.218323Z"}}
2024 Aug 1 16:53:04.218802 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::16","status":"down","timestamp":"2024-08-01T16:53:04.218654Z"}}
2024 Aug 1 16:53:04.218907 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::e","status":"down","timestamp":"2024-08-01T16:53:04.218753Z"}}
2024 Aug 1 16:53:05.290234 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.1","status":"up","timestamp":"2024-08-01T16:53:05.289958Z"}}
2024 Aug 1 16:53:05.293026 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.11","status":"up","timestamp":"2024-08-01T16:53:05.292833Z"}}
2024 Aug 1 16:53:05.406129 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.5","status":"up","timestamp":"2024-08-01T16:53:05.405738Z"}}
2024 Aug 1 16:53:05.406342 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"up","timestamp":"2024-08-01T16:53:05.405859Z"}}
2024 Aug 1 16:53:05.406460 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::a","status":"up","timestamp":"2024-08-01T16:53:05.406214Z"}}
2024 Aug 1 16:53:05.411090 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.7","status":"up","timestamp":"2024-08-01T16:53:05.410553Z"}}
2024 Aug 1 16:53:05.411447 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::16","status":"up","timestamp":"2024-08-01T16:53:05.411204Z"}}
2024 Aug 1 16:53:05.411546 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::e","status":"up","timestamp":"2024-08-01T16:53:05.411296Z"}}
2024 Aug 1 16:59:47.205043 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.1","is_sent":"true","major_code":"5","minor_code":"0","timestamp":"2024-08-01T16:59:47.204266Z"}}
2024 Aug 1 16:59:47.205251 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.2","is_sent":"true","major_code":"5","minor_code":"0","timestamp":"2024-08-01T16:59:47.204888Z"}}
2024 Aug 1 16:59:49.207720 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.2","is_sent":"true","major_code":"6","minor_code":"7","timestamp":"2024-08-01T16:59:49.207207Z"}}
2024 Aug 1 16:59:49.207871 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.1","is_sent":"true","major_code":"6","minor_code":"7","timestamp":"2024-08-01T16:59:49.207769Z"}}
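
The pattern in the log above can be checked mechanically by grouping the `EVENT_PUBLISHED` lines by publisher. This is a minimal sketch: the function names are hypothetical, the regular expression is written against the line format quoted in this comment, and treating a `#` in the publisher name (as in `bgp0#rsyslog_plugin`) as the marker of a per-namespace service is an assumption drawn from these logs.

```python
import json
import re
from collections import defaultdict

# Matches lines such as:
#   ... NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {...}
LINE_RE = re.compile(
    r"NOTICE (?P<publisher>\S+): :- publish: "
    r"EVENT_PUBLISHED: (?P<payload>\{.*\})\s*$"
)


def group_events_by_publisher(lines):
    """Return {publisher: {event_tag: count}} for EVENT_PUBLISHED lines."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an EVENT_PUBLISHED line, or a truncated one
        try:
            payload = json.loads(match.group("payload"))
        except json.JSONDecodeError:
            continue  # payload cut off mid-line
        for tag in payload:
            counts[match.group("publisher")][tag] += 1
    return {pub: dict(tags) for pub, tags in counts.items()}


def is_namespaced(publisher):
    # Assumption: a prefix such as "bgp0#" marks a per-namespace publisher.
    return "#" in publisher


if __name__ == "__main__":
    import sys
    # Usage: pipe a syslog excerpt into the script on stdin.
    for pub, tags in group_events_by_publisher(sys.stdin).items():
        kind = "namespace" if is_namespaced(pub) else "global"
        print(pub, kind, tags)
```

Run against the excerpt above, this separates the bgp-state events published from `bgp0#rsyslog_plugin` and `bgp1#rsyslog_plugin` from the notification events published by the un-prefixed `rsyslog_plugin`.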

@zbud-msft
Contributor

Hi @wumiaont seems like there is a common issue with multi-asic devices for swss events and bgp state event. I will look into this issue. As of right now, eventd/structured events does not claim to provide full support for multi-asic. In the meantime, I will disable test_events for multi-asic devices. Maybe we can keep one thread open since we have one for swss and one for bgp and they point to same issue.

@wumiaont
Contributor Author

wumiaont commented Aug 1, 2024

Hi @wumiaont seems like there is a common issue with multi-asic devices for swss events and bgp state event. I will look into this issue. As of right now, eventd/structured events does not claim to provide full support for multi-asic. In the meantime, I will disable test_events for multi-asic devices. Maybe we can keep one thread open since we have one for swss and one for bgp and they point to same issue.

Thanks for looking into the issue. Let me know if you need anything from me.
