fixed gnmi-1.17 issues related to gnoi power down request fails after controller card switchover#5335
fixed gnmi-1.17 issues related to gnoi power down request fails after controller card switchover#5335mnp-juniper wants to merge 4 commits intoopenconfig:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses reliability issues in the controller card redundancy tests by introducing a robust retry loop for gNOI power-down operations. It also streamlines the test flow by removing unnecessary switchover steps and optimizing wait times for component state transitions, ensuring more stable and efficient test execution. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
Pull Request Functional Test Report for #5335 / 89d0735Virtual Devices
Hardware Devices
|
There was a problem hiding this comment.
Code Review
This pull request refactors the controller card redundancy test by renaming RP variables and shifting the test logic to focus on standby RP power-down. Significant updates include adding a readiness check for the standby RP, implementing a retry loop with a timeout context for the power-down RPC to improve reliability, and reducing GNMI await timeouts from 30 to 20 minutes. Review feedback recommends enhancing log messages by including specific component names and roles to improve clarity and debuggability.
| switchoverReady := gnmi.OC().Component(rpActiveBeforeSwitch).SwitchoverReady() | ||
| switchoverReady := gnmi.OC().Component(rpActive).SwitchoverReady() | ||
| gnmi.Await(t, dut, switchoverReady.State(), 10*time.Minute, true) | ||
| t.Logf("SwitchoverReady().Get(t): %v", gnmi.Get(t, dut, switchoverReady.State())) |
There was a problem hiding this comment.
The log message is ambiguous as it does not specify which controller card is being checked. Including the component name and role improves log clarity and helps in debugging test failures.
| t.Logf("SwitchoverReady().Get(t): %v", gnmi.Get(t, dut, switchoverReady.State())) | |
| t.Logf("SwitchoverReady for active RP (%s): %v", rpActive, gnmi.Get(t, dut, switchoverReady.State())) |
| // Check if standby RP is ready for switchover | ||
| switchoverReady = gnmi.OC().Component(rpStandby).SwitchoverReady() | ||
| gnmi.Await(t, dut, switchoverReady.State(), 10*time.Minute, true) | ||
| t.Logf("SwitchoverReady().Get(t): %v", gnmi.Get(t, dut, switchoverReady.State())) |
There was a problem hiding this comment.
The log message is ambiguous as it does not specify which controller card is being checked. Including the component name and role improves log clarity and helps in debugging test failures.
| t.Logf("SwitchoverReady().Get(t): %v", gnmi.Get(t, dut, switchoverReady.State())) | |
| t.Logf("SwitchoverReady for standby RP (%s): %v", rpStandby, gnmi.Get(t, dut, switchoverReady.State())) |
Problem:
For a gNOI RPC to succeed after GRES, typically we need to wait 60 to 120 seconds.Due to this some times powerdown request to power down standby is not succeeding .
Fix:
Re-try power down standby gnoi request in a loop with a maximum of 12 attempts with a delay of 10s after each failed attemt.
Also removed the code to do switchover in test TEST_3 as the README cleary states that "Verify and power down the "previous_primary" component which should have already been switch over in the previous sub-test."