Bug 1886572: Calculate keepalived priority for ingress #141
Conversation
yboaron
commented
Jun 2, 2021
/retitle Bug 1886572: Calculate keepalived priority for ingress
@yboaron: This pull request references Bugzilla bug 1886572, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (vvoronko@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is there any chance we can do this in a check script instead of the monitor? Or do we not have the ability to run oc commands from inside the keepalived container? It would just be nice to keep all of the priority bits in the same place so we're not having to look at both the keepalived and monitor logs when trying to figure out what happened with the priority if/when a VIP ends up somewhere it shouldn't. Although maybe on reload keepalived would log the new priority anyway? Mostly I want to avoid making keepalived harder to debug than it already is.
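For context, keepalived's built-in mechanism for this is a vrrp_script: a periodic check whose weight is added to the instance's priority while the script exits successfully. A minimal sketch of what the check-script approach would look like (script path, block names, and the base/weight split are hypothetical, not from this PR; a real vrrp_instance also needs state, interface, and virtual_router_id):

```
vrrp_script chk_default_ingress {
    script "/etc/keepalived/chk_default_ingress.sh"  # exits 0 iff this node runs the default router pod
    interval 5
    weight 20   # success: effective priority = 20 + 20 = 40; failure: stays at 20
}

vrrp_instance ingress_vip {
    priority 20
    track_script {
        chk_default_ingress
    }
}
```

With this layout the priority change shows up in the keepalived log itself, which is the debuggability point raised above.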
Agree that having a check_script for this purpose would be ideal; for that we need to be able to run oc commands from inside the keepalived container.
I'll try to mount the host's oc binary and check that.
/retest
After some discussion elsewhere, we decided to proceed with this solution since it's the one we have implemented and code freeze is coming up.
However, I have concerns about the error handling and we need to fix the fmt errors before it can go in.
pkg/config/node.go
Outdated
func GetIngressPriority(kubeconfigPath string, nonVirtualIP string) int {
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return 40
I don't think we want to default to 40 for the error case. If we can't determine whether the node has the default ingress we should give it the lower priority so we don't take the VIP from a node that is known to have the right ingress.
This applies to all of the other error cases below too.
Also, can we log the error? Otherwise all we know is that we got a priority of 20, but that could mean we didn't have the ingress or it could mean something went wrong here.
I set the priority to 40 in the error case because I didn't want to change the current behavior (where the priority defaults to 40), though setting it to 20 in the error case makes sense.
I'll change it to 20.
The ingress VIP should be set only on a node that runs an instance of the default ingress controller pod. In the current code, if extra ingress controllers are created, the ingress VIP might wrongly be set on a node that doesn't run an instance of the default ingress controller. This PR calculates the priority for the keepalived ingress VIP based on the presence of the router pod on the node, by monitoring the contents of the router-internal-default endpoints resource.
/retest
/test e2e-metal-ipi
@yboaron: The following test failed:
Full PR test history. Your PR dashboard. I understand the commands that are listed here.
/lgtm
The metal-ipi job failure appears unrelated and the ipv6 job passed, so this should be fine.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cybertron, yboaron The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
Please review the full test history for this PR and help us cut down flakes.
@yboaron: Some pull requests linked via external trackers have merged. The following pull requests linked via external trackers have not merged. These pull requests must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with Bugzilla bug 1886572 has not been moved to the MODIFIED state. In response to this:
Revert "Merge pull request #141 from yboaron/get_endpoints"