Bug 1886572: Calculate keepalived priority for ingress #141
Conversation
yboaron
commented
Jun 2, 2021
/retitle Bug 1886572: Calculate keepalived priority for ingress
@yboaron: This pull request references Bugzilla bug 1886572, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (vvoronko@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is there any chance we can do this in a check script instead of the monitor? Or do we not have the ability to run oc commands from inside the keepalived container? It would just be nice to keep all of the priority bits in the same place so we're not having to look at both the keepalived and monitor logs when trying to figure out what happened with the priority if/when a VIP ends up somewhere it shouldn't. Although maybe on reload keepalived would log the new priority anyway? Mostly I want to avoid making keepalived harder to debug than it already is.
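For context, keepalived's built-in mechanism for this is a vrrp_script: a periodic check whose weight is added to the instance's priority while the script exits successfully. A minimal sketch of what the check-script approach would look like (script path, block names, and the base/weight split are hypothetical, not from this PR; a real vrrp_instance also needs state, interface, and virtual_router_id):

```
vrrp_script chk_default_ingress {
    script "/etc/keepalived/chk_default_ingress.sh"  # exits 0 iff this node runs the default router pod
    interval 5
    weight 20   # success: effective priority = 20 + 20 = 40; failure: stays at 20
}

vrrp_instance ingress_vip {
    priority 20
    track_script {
        chk_default_ingress
    }
}
```

With this layout the priority change shows up in the keepalived log itself, which is the debuggability point raised above.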
Agree that having a check_script for this purpose would be ideal; for that we need to be able to run oc commands from inside the keepalived container.
I'll try to mount the host's oc binary and check that.
/retest
After some discussion elsewhere, we decided to proceed with this solution since it's the one we have implemented and code freeze is coming up.
However, I have concerns about the error handling and we need to fix the fmt errors before it can go in.
pkg/config/node.go
Outdated
func GetIngressPriority(kubeconfigPath string, nonVirtualIP string) int {
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return 40
I don't think we want to default to 40 for the error case. If we can't determine whether the node has the default ingress we should give it the lower priority so we don't take the VIP from a node that is known to have the right ingress.
This applies to all of the other error cases below too.
Also, can we log the error? Otherwise all we know is that we got a priority of 20, but that could mean we didn't have the ingress or it could mean something went wrong here.
I set the priority to 40 in the error case because I didn't want to change the current behavior (where the priority defaults to 40), though setting it to 20 in the error case makes sense.
I'll change it to 20.
The ingress VIP should be set only on a node that runs an instance of the default ingress controller pod. In the current code, if extra ingress controllers are created, the ingress VIP might wrongly be set on a node that doesn't run an instance of the default ingress controller. This PR calculates the priority for the keepalived ingress VIP based on the presence of the router pod on the node, by monitoring the contents of the router-internal-default endpoints resource.
/retest
/test e2e-metal-ipi
@yboaron: The following test failed:
Full PR test history. Your PR dashboard. I understand the commands that are listed here.
/lgtm
The metal-ipi job failure appears unrelated and the ipv6 job passed, so this should be fine.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cybertron, yboaron The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
Please review the full test history for this PR and help us cut down flakes.
@yboaron: Some pull requests linked via external trackers have merged. The following pull requests linked via external trackers have not merged. These pull requests must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with Bugzilla bug 1886572 has not been moved to the MODIFIED state. In response to this:
Revert "Merge pull request #141 from yboaron/get_endpoints"