Bug 1886572: Calculate keepalived priority for ingress #141

Merged 1 commit on Jun 7, 2021


@yboaron yboaron commented Jun 2, 2021

The ingress VIP should be set only on a node that runs an instance of the default ingress controller pod.
In the current code, if extra ingress controllers are created, the ingress VIP might wrongly be set on a node that doesn't run
an instance of the default ingress controller.

This PR calculates the keepalived priority for the ingress VIP based on whether the router pod is present on the node, which it determines by monitoring the contents of the router-internal-default endpoints resource.
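
A minimal sketch of the priority decision described above, not the code from this PR. It assumes the addresses listed in the router-internal-default endpoints resource can be compared directly with the node's own (non-virtual) IP, and that 40 and 20 are the high and low priorities, as the values mentioned in the review discussion suggest. The helper name `ingressPriority` and the sample addresses are hypothetical.

```go
package main

import "fmt"

// Priority values as mentioned in the review discussion of this PR.
// Which value maps to which case is an assumption of this sketch.
const (
	priorityWithRouter    = 40 // node hosts a default ingress controller pod
	priorityWithoutRouter = 20 // node does not host one
)

// ingressPriority is a hypothetical helper: it returns the higher priority
// only when nodeIP appears among the endpoint addresses of the default router.
func ingressPriority(endpointIPs []string, nodeIP string) int {
	for _, ip := range endpointIPs {
		if ip == nodeIP {
			return priorityWithRouter
		}
	}
	return priorityWithoutRouter
}

func main() {
	// Documentation-range addresses standing in for real endpoint data.
	endpoints := []string{"192.0.2.10", "192.0.2.11"}

	fmt.Println(ingressPriority(endpoints, "192.0.2.10")) // prints 40
	fmt.Println(ingressPriority(endpoints, "192.0.2.12")) // prints 20
}
```

In a real monitor the address list would come from the cluster API rather than a literal slice; keeping the decision in a pure function makes it easy to test without a cluster.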

@openshift-ci openshift-ci bot requested review from bcrochet and cybertron June 2, 2021 18:34
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2021
@yboaron
Contributor Author

yboaron commented Jun 2, 2021

/retitle Bug 1886572: Calculate keepalived priority for ingress

@openshift-ci openshift-ci bot changed the title Calculate keepalived priority for ingress Bug 1886572: Calculate keepalived priority for ingress Jun 2, 2021
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jun 2, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 2, 2021

@yboaron: This pull request references Bugzilla bug 1886572, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (vvoronko@redhat.com), skipping review request.

In response to this:

Bug 1886572: Calculate keepalived priority for ingress

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cybertron
Member

Is there any chance we can do this in a check script instead of the monitor? Or do we not have the ability to run oc commands from inside the keepalived container? It would just be nice to keep all of the priority bits in the same place so we're not having to look at both the keepalived and monitor logs when trying to figure out what happened with the priority if/when a VIP ends up somewhere it shouldn't.

Although maybe on reload keepalived would log the new priority anyway? Mostly I want to avoid making keepalived harder to debug than it already is.

@yboaron
Contributor Author

yboaron commented Jun 3, 2021

> Is there any chance we can do this in a check script instead of the monitor? Or do we not have the ability to run oc commands from inside the keepalived container? It would just be nice to keep all of the priority bits in the same place so we're not having to look at both the keepalived and monitor logs when trying to figure out what happened with the priority if/when a VIP ends up somewhere it shouldn't.
>
> Although maybe on reload keepalived would log the new priority anyway? Mostly I want to avoid making keepalived harder to debug than it already is.

Agreed that a check_script would be ideal for this purpose. For that, we need to be able to:

  1. Run oc commands from the keepalived container, as you mentioned
  2. Retrieve the node's IP address

I'll try mounting the host's oc binary and check whether that works.

@yboaron
Contributor Author

yboaron commented Jun 3, 2021

/retest

Member

@cybertron cybertron left a comment

After some discussion elsewhere, we decided to proceed with this solution since it's the one we have implemented and code freeze is coming up.

However, I have concerns about the error handling and we need to fix the fmt errors before it can go in.

func GetIngressPriority(kubeconfigPath string,nonVirtualIP string) (int){
config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
if err != nil {
return 40
Member

I don't think we want to default to 40 for the error case. If we can't determine whether the node has the default ingress we should give it the lower priority so we don't take the VIP from a node that is known to have the right ingress.

This applies to all of the other error cases below too.

Also, can we log the error? Otherwise all we know is that we got a priority of 20, but that could mean we didn't have the ingress or it could mean something went wrong here.

Contributor Author

I set the priority to 40 in the error case because I didn't want to change the current behavior (where the priority is set to 40 by default), though setting the priority to 20 in the error case makes sense.

I'll change it to 20.

@yboaron
Contributor Author

yboaron commented Jun 7, 2021

/retest

@yboaron
Contributor Author

yboaron commented Jun 7, 2021

/test e2e-metal-ipi

@openshift-ci
Contributor

openshift-ci bot commented Jun 7, 2021

@yboaron: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-metal-ipi 65f8866 link /test e2e-metal-ipi

@cybertron
Member

/lgtm

The metal-ipi job failure appears unrelated and the ipv6 job passed so this should be fine.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 7, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 7, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, yboaron

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit c8b1456 into openshift:master Jun 7, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 7, 2021

@yboaron: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh.

Bugzilla bug 1886572 has not been moved to the MODIFIED state.

yboaron added a commit to yboaron/baremetal-runtimecfg that referenced this pull request Jun 24, 2021
openshift-merge-robot added a commit that referenced this pull request Jul 18, 2021
Revert "Merge pull request #141 from yboaron/get_endpoints"
yboaron added a commit to yboaron/baremetal-runtimecfg that referenced this pull request Oct 18, 2021