CPS gets into a state where IP addresses cannot be configured #66
Regarding the SAI_PORT error (i.e. attr index 0 attr id 76 failed), the fix below, which was merged recently, should resolve it: https://review.openswitch.net/#/c/14562/ We still need to check why opx-pas is failing. Can you please provide the output of "service opx-pas status"?
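If it helps to gather this in one go, here is a minimal sketch that collects the `service <name> status` output for the services mentioned in this thread. The helper name `collect_status` and the choice of services are assumptions for illustration, not part of OPX; it assumes Python 3 and that the `service` command is on the PATH.

```python
import subprocess

# Services named in this thread; adjust as needed (assumption, not an OPX list).
SERVICES = ("opx-pas", "opx-nas", "opx-ip")


def collect_status(services=SERVICES, run=subprocess.run):
    """Return {service: (returncode, output)} from `service <name> status`.

    `run` is injectable so the helper can be exercised without a real switch.
    If the command cannot be started at all, the return code is None and the
    output holds the error text.
    """
    report = {}
    for name in services:
        try:
            result = run(
                ["service", name, "status"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            report[name] = (result.returncode, result.stdout)
        except OSError as exc:
            report[name] = (None, str(exc))
    return report
```

Calling `collect_status()` on the switch and pasting the resulting dictionary into the issue should give the same information as running the status command by hand for each service.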
Here's
I have discovered that when CPS gets into this state, it can be recovered by However there is no indication that
Hope that helps.
I can now reliably reproduce this condition, by using ansible to install a debian package - any package, even one that does not exist! - on the switch. On some other host (from which you will run ansible):
You should now find that attempts to configure IP addresses through CPS fail. It seems that something during the ansible command breaks the registration from
I could not reproduce the issue by following the above procedure, so I need more information to troubleshoot further.
Sure - what do you need?
Can you provide the logs related to the issue (error logs, the registration break, the CPS command failure, etc.)? That may help.
I really don't have any useful logs:
What else do you need? Are there likely to be other useful logs somewhere else? Perhaps you'll want to provide debug versions of some code that produce additional logs?
Assuming you can reproduce the issue consistently, can you please check whether the changes below fix it? We have recently hit EPIPE errors at random, and the following code helps to avoid them. Diff: File attached with the above change (please rename the file to base_ip.py):
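For readers without the attachment, here is a minimal sketch of one way a write can be made to tolerate EPIPE. It assumes the error comes from writing to an output stream whose reader has gone away; the helper name `safe_write` is hypothetical and this is not the committed change to base_ip.py.

```python
import errno


def safe_write(stream, message):
    """Write `message` plus a newline to `stream`.

    Returns True on success and False if the other end of the pipe is closed
    (EPIPE). Any other I/O error is re-raised so it is not silently hidden.
    """
    try:
        stream.write(message + "\n")
        stream.flush()
        return True
    except (IOError, OSError) as exc:
        if exc.errno == errno.EPIPE:
            return False
        raise
```

A handler that calls `safe_write(sys.stdout, "...")` instead of a bare print would then keep servicing requests after stdout has been closed, rather than failing partway through. Routing the message to a logger instead is another option.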
The code that you provided tries to log the undefined variable However, if I fix that so that we only try to log I see that there are still a handful of uses of
Good to know that the change fixes the issue; we will commit it. We may not need to remove the other print statements. I had tested without restarting the opx-ip service, which is why it worked for me :-) Sorry about that. As you mentioned, this is the updated file
Fix in review
Closing this; please reopen if there are any further issues.
We sometimes see a VM get into a state where attempts to set an IP address on an interface fail.
It's currently unclear exactly what the necessary steps to reproduce this are - I am working on getting a clearer picture of this.
Once in the bad state, attempts to set an IP address via CPS fail. E.g., running this script results in us hitting the error branch.
I've seen examples where the opx-pas service has failed - but restarting it does not resolve the problem.
I've also seen examples where the opx-nas service makes a series of interesting logs: ... but restarting the opx-nas service does not resolve the issue either.
Rebooting the box does allow IP address configuration to start succeeding again.
Where should we be looking for additional diagnostics to explain what is going on when the configuration fails?
Thanks!