Bug 2079517: Use externalTrafficPolicy: Cluster with OVN #713

@Miciah Miciah commented Mar 3, 2022

OVN-Kubernetes in OpenShift 4.9 and earlier does not support externalTrafficPolicy: Local, and specifying it is reported to result in imbalanced traffic for some users. This change checks the cluster network type and, if it is OVN-Kubernetes (network type value "OVNKubernetes"), sets externalTrafficPolicy: Cluster on the LoadBalancer-type service that the operator creates for ingress.

  • pkg/operator/controller/ingress/controller.go (ensureIngressController): Pass the network config to ensureLoadBalancerService and syncIngressControllerStatus.
  • pkg/operator/controller/ingress/load_balancer_service.go (ensureLoadBalancerService, loadBalancerServiceIsUpgradeable): Add a parameter for the network config. Pass the network type from the network config to desiredLoadBalancerService.
    (desiredLoadBalancerService): Add a parameter for the network type. Set externalTrafficPolicy: Cluster if the network type is "OVNKubernetes".
  • pkg/operator/controller/ingress/load_balancer_service_test.go (TestDesiredLoadBalancerService): Add a test case for OVN-Kubernetes.
  • pkg/operator/controller/ingress/status.go (syncIngressControllerStatus): Add a parameter for the network config. Pass the config to computeIngressUpgradeableCondition.
    (computeIngressUpgradeableCondition): Add a parameter for the network config. Pass the network config to loadBalancerServiceIsUpgradeable.
  • pkg/operator/controller/ingress/status_test.go (TestComputeIngressUpgradeableCondition): Pass the network type to desiredLoadBalancerService and the network config to computeIngressUpgradeableCondition.
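
The decision described above can be sketched in a few lines of Go. This is a minimal, self-contained illustration and not the operator's actual code: the service struct and the desiredService helper are hypothetical stand-ins for corev1.Service and for the logic inside desiredLoadBalancerService, and the default of Local for every other network type is an assumption about the pre-existing behavior.

```go
package main

import "fmt"

// networkTypeOVNKubernetes mirrors the value of the
// operatorv1.NetworkTypeOVNKubernetes constant cited in the review.
const networkTypeOVNKubernetes = "OVNKubernetes"

// service is a hypothetical, pared-down stand-in for corev1.Service.
type service struct {
	ExternalTrafficPolicy string
}

// desiredService sketches the choice this PR adds: use "Cluster" when the
// cluster network type is OVN-Kubernetes, otherwise keep the default.
func desiredService(networkType string) service {
	// Assumption: "Local" is the policy used for all other network types.
	svc := service{ExternalTrafficPolicy: "Local"}
	if networkType == networkTypeOVNKubernetes {
		svc.ExternalTrafficPolicy = "Cluster"
	}
	return svc
}

func main() {
	for _, nt := range []string{"OpenShiftSDN", "OVNKubernetes"} {
		fmt.Printf("%s -> %s\n", nt, desiredService(nt).ExternalTrafficPolicy)
	}
}
```

Because the network type is compared as a plain string, a third-party or misspelled value simply falls through to the default policy.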

This PR includes commits from #711 because we want #711 to merge first, and it includes conflicting changes. The first two commits in this PR can be ignored by reviewers.

/hold


Surya, can you verify that this change is correct? In particular, I want to be sure we're checking the right API field for the right value and that the logic behind this PR is sound at a high level.

/assign @tssurya

@openshift-ci openshift-ci bot added the do-not-merge/hold, bugzilla/severity-medium, and bugzilla/valid-bug labels Mar 3, 2022

openshift-ci bot commented Mar 3, 2022

@Miciah: This pull request references Bugzilla bug 2060542, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.z) matches configured target release for branch (4.9.z)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1903408 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 1903408 targets the "4.10.0" release, which is one of the valid target releases: 4.10.0
  • bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

In response to this:

Bug 2060542: Use externalTrafficPolicy: Cluster with OVN

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 3, 2022
// externalTrafficPolicy: Local is unsupported by OVN in
// OpenShift 4.9 and earlier; see
// <https://bugzilla.redhat.com/show_bug.cgi?id=2060542>.
if networkType == "OVN-Kubernetes" {

It's OVNKubernetes. Should we use a named constant for the value instead of a hardcoded string, to avoid typos?
reference: https://github.com/openshift/api/blob/fb6f933bb8d5ce8454a8777c0c4782c193ef5674/operator/v1/types_network.go#L536

Contributor Author

Thanks! I didn't think to check for named constants because the Spec.NetworkType field of the Network type in config/v1/types_network.go is a string type. I wonder whether the field could be changed to use the NetworkType type; I suppose if the same values are allowed, it may be all right to change the type, although it would require changes in any Go code using the Go type definitions.
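
The mismatch between the two API packages can be illustrated with a small sketch. The types below are local stand-ins, not the real openshift/api definitions: NetworkType and NetworkTypeOVNKubernetes mimic the named type and constant in operator/v1, and networkSpec is a hypothetical reduction of the config/v1 spec whose NetworkType field is a plain string.

```go
package main

import "fmt"

// NetworkType stands in for the named string type defined in operator/v1.
type NetworkType string

// NetworkTypeOVNKubernetes mirrors the operator/v1 constant linked above.
const NetworkTypeOVNKubernetes NetworkType = "OVNKubernetes"

// networkSpec is a hypothetical stand-in for the config/v1 network spec,
// whose NetworkType field is an untyped-by-name plain string.
type networkSpec struct {
	NetworkType string
}

// isOVNKubernetes compares the plain-string field with the typed constant.
// The explicit string(...) conversion is needed because Go does not allow
// comparing a string value with a value of the distinct NetworkType type.
func isOVNKubernetes(spec networkSpec) bool {
	return spec.NetworkType == string(NetworkTypeOVNKubernetes)
}

func main() {
	fmt.Println(isOVNKubernetes(networkSpec{NetworkType: "OVNKubernetes"}))  // true
	fmt.Println(isOVNKubernetes(networkSpec{NetworkType: "OVN-Kubernetes"})) // false
}
```

Using the constant keeps the spelling in one place, so a typo such as "OVN-Kubernetes" becomes a compile-time concern rather than a silent mismatch at run time.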

@@ -524,8 +530,8 @@ func loadBalancerServiceTagsModified(current, expected *corev1.Service) (bool, *
 // return value is nil. Otherwise, if something or someone else has modified
 // the service, then the return value is a non-nil error indicating that the
 // modification must be reverted before upgrading is allowed.
-func loadBalancerServiceIsUpgradeable(ic *operatorv1.IngressController, deploymentRef metav1.OwnerReference, current *corev1.Service, platform *configv1.PlatformStatus) error {
-	want, desired, err := desiredLoadBalancerService(ic, deploymentRef, platform)
+func loadBalancerServiceIsUpgradeable(ic *operatorv1.IngressController, deploymentRef metav1.OwnerReference, current *corev1.Service, platform *configv1.PlatformStatus, networkConfig *configv1.Network) error {

@tssurya tssurya Mar 3, 2022

@danwinship or @squeed: I have always been confused about when to use the operv1 versus the configv1 API. Could you please confirm that using configv1 ^ is OK here? It seems that config supports sdn and ovn: https://github.com/openshift/api/blob/d5252bac47154049f80d49eff7fec5bb642be9af/config/v1/types_network.go#L44 while operator also supports the other ones like kuryr, etc. Sorry for the dumb question; I should probably know these things by now.

Contributor Author

It is confusing that operator/v1/types_network.go defines these named constants, but config/v1/types_network.go just uses a string type.

Contributor

@tssurya good question! In this case, it doesn't really matter, because configv1 NetworkType is inferred from configv1. Since configv1 is guaranteed to be set from cluster creation, it is slightly better to use. But it really, really, really doesn't matter.

Contributor

> It is confusing that operator/v1/types_network.go defines these named constants [but configv1 doesn't]

That's because you can run a cluster with a third-party network operator - so we need to support arbitrary network types in configv1. But operv1 defines what the openshift cluster network operator supports.

@tssurya tssurya Mar 7, 2022

> because configv1 NetworkType is inferred from configv1.

you meant from operv1?
(later)... I see you meant configv1 that users set! got it :)

> Since configv1 is guaranteed to be set from cluster creation, it is slightly better to use. But it really, really, really doesn't matter.

ack thanks casey! makes sense. @Miciah so I think we are good here.

tssurya commented Mar 3, 2022

thanks @Miciah for the quick fix, appreciate it!

@Miciah Miciah force-pushed the BZ2060542-use-externalTrafficPolicy-Cluster-with-OVN branch from a777e70 to fddf13c Compare March 3, 2022 19:51

Miciah commented Mar 4, 2022

Bootstrap failures.
/retest

// OpenShift 4.9 and earlier; see
// <https://bugzilla.redhat.com/show_bug.cgi?id=2060542>.
if networkType == string(operatorv1.NetworkTypeOVNKubernetes) {
service.Spec.ExternalTrafficPolicy = "Cluster"

Contributor Author

Fixed. Thanks!

@Miciah Miciah force-pushed the BZ2060542-use-externalTrafficPolicy-Cluster-with-OVN branch from fddf13c to 15202ad Compare March 8, 2022 17:46

Miciah commented Mar 8, 2022

Failed to launch the cluster:

level=error msg=Error: error listing tags for LB Target Group (arn:aws:elasticloadbalancing:us-east-1:460538899914:targetgroup/ci-op-fjqjgw7q-265e5-b5vk5-sint/84bf5875e5bc1488): TargetGroupNotFound: Target groups 'arn:aws:elasticloadbalancing:us-east-1:460538899914:targetgroup/ci-op-fjqjgw7q-265e5-b5vk5-sint/84bf5875e5bc1488' not found
level=error msg=	status code: 400, request id: d43be720-0c69-49e4-9094-98e9ad332ddb
level=error
level=error msg=  on ../tmp/openshift-install-cluster-733658679/vpc/master-elb.tf line 99, in resource "aws_lb_target_group" "services":
level=error msg=  99: resource "aws_lb_target_group" "services" {
level=error
level=error
level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change 

/test e2e-aws-operator

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2022
@Miciah Miciah force-pushed the BZ2060542-use-externalTrafficPolicy-Cluster-with-OVN branch from 15202ad to 7b3de00 Compare March 17, 2022 20:05
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2022

Miciah commented Mar 17, 2022

#711 merged, and I've rebased this PR; ignoring the changes from #711, the diff for this PR is exactly the same but for some line numbers (compare https://github.com/openshift/cluster-ingress-operator/commit/15202ad247d0dedaadec7c2564b60db9800f50ff.patch and https://github.com/openshift/cluster-ingress-operator/commit/7b3de002a1bf5373504d2b5dde2985cec012aecf.patch).
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2022

Miciah commented Apr 1, 2022

/hold cancel

OVN-Kubernetes in OpenShift 4.9 and earlier does not support
externalTrafficPolicy: Local, and specifying it is reported to result in
imbalanced traffic for some users.  This commit checks the cluster network
type and sets externalTrafficPolicy: Cluster on the LoadBalancer-type
service that the operator creates for ingress if the network type is
"OVN-Kubernetes".

This commit fixes bug 2079517.

https://bugzilla.redhat.com/show_bug.cgi?id=2079517

* pkg/operator/controller/ingress/controller.go (ensureIngressController):
Pass the network config to ensureLoadBalancerService and
syncIngressControllerStatus.
* pkg/operator/controller/ingress/load_balancer_service.go
(ensureLoadBalancerService, loadBalancerServiceIsUpgradeable):
Add a parameter for the network config.  Pass the network type
from the network config to desiredLoadBalancerService.
(desiredLoadBalancerService): Add a parameter for the network type.  Set
externalTrafficPolicy: Cluster if the network type is "OVN-Kubernetes".
* pkg/operator/controller/ingress/load_balancer_service_test.go
(TestDesiredLoadBalancerService): Add a test case for OVN-Kubernetes.
* pkg/operator/controller/ingress/status.go (syncIngressControllerStatus):
Add a parameter for the network config.  Pass the config to
computeIngressUpgradeableCondition.
(computeIngressUpgradeableCondition): Add a parameter for the network
config.  Pass the network config to loadBalancerServiceIsUpgradeable.
* pkg/operator/controller/ingress/status_test.go
(TestComputeIngressUpgradeableCondition): Pass the network type to
desiredLoadBalancerService and the network config to
computeIngressUpgradeableCondition.
@Miciah Miciah force-pushed the BZ2060542-use-externalTrafficPolicy-Cluster-with-OVN branch from 7b3de00 to e4aeaf7 Compare April 27, 2022 16:32
@Miciah Miciah changed the title Bug 2060542: Use externalTrafficPolicy: Cluster with OVN Bug 2079517: Use externalTrafficPolicy: Cluster with OVN Apr 27, 2022

openshift-ci bot commented Apr 27, 2022

@Miciah: An error was encountered querying GitHub for users with public email (hongli@redhat.com) for bug 2079517 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. non-200 OK status code: 403 Forbidden body: "{\n \"documentation_url\": \"https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits\",\n \"message\": \"You have exceeded a secondary rate limit. Please wait a few minutes before you try again.\"\n}\n"

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

Bug 2079517: Use externalTrafficPolicy: Cluster with OVN

tssurya commented Apr 28, 2022

/retest-required

Spec: configv1.NetworkSpec{
NetworkType: "OpenShiftSDN",
},
}

Contributor

Normally we would add a new variable to the test case itself, and add a couple of test cases to test the variant values.
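
One way to read this suggestion is as a table-driven test in which the network type is a field of each test case. The sketch below is an assumption about what that could look like, not the test in this PR: policyFor is a hypothetical stand-in for the logic under test, its Local default is assumed, and checkCases returns failure messages instead of calling t.Errorf so that the snippet runs without the testing package.

```go
package main

import "fmt"

// policyFor is a hypothetical stand-in for the logic under test.
func policyFor(networkType string) string {
	if networkType == "OVNKubernetes" {
		return "Cluster"
	}
	return "Local" // assumed default for other network types
}

// testCase carries the network type as a per-case input rather than
// relying on a single value shared by all cases.
type testCase struct {
	name        string
	networkType string
	expect      string
}

// checkCases runs every case and collects a message for each mismatch.
func checkCases(cases []testCase) []string {
	var failures []string
	for _, tc := range cases {
		if got := policyFor(tc.networkType); got != tc.expect {
			failures = append(failures, fmt.Sprintf("%s: got %q, want %q", tc.name, got, tc.expect))
		}
	}
	return failures
}

func main() {
	cases := []testCase{
		{name: "OpenShiftSDN", networkType: "OpenShiftSDN", expect: "Local"},
		{name: "OVNKubernetes", networkType: "OVNKubernetes", expect: "Cluster"},
		{name: "third-party plugin", networkType: "SomeOtherPlugin", expect: "Local"},
	}
	fmt.Println(len(checkCases(cases)), "failures")
}
```

In a real test file the loop body would typically call t.Run(tc.name, ...) and report mismatches with t.Errorf.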

candita commented Apr 28, 2022

@Miciah was there any update for e2e tests for this?

candita commented Apr 28, 2022

Error: Failed to download metadata for repo 'localdev-rhel-8-server-ose-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
...
No match for argument: skopeo
Error: Unable to find a match: skopeo

/test e2e-gcp-serial

candita commented Apr 28, 2022

fail [github.com/openshift/origin/test/extended/oauth/requestheaders.go:204]: unexpected error response status (401) while trying to reach '/oauth/authorize?client_id=openshift-challenging-client&response_type=token' endpoint: HTTP/1.1 401 Unauthorized
...
Referrer-Policy: strict-origin-when-cross-origin
Warning: 199 Origin "A non-empty X-CSRF-Token header is required to receive basic-auth challenges"
...
X-Frame-Options: DENY
X-Xss-Protection: 1; mode=block
A non-empty X-CSRF-Token header is required to receive basic-auth challenges
failed: (2m23s) 2022-04-28T22:33:08 "[Serial] [sig-auth][Feature:OAuthServer] [RequestHeaders] [IdP] test RequestHeaders IdP [Suite:openshift/conformance/serial]"

candita commented Apr 28, 2022

/retest

candita commented Apr 29, 2022

/lgtm

@Miciah Miciah added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Apr 29, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2022

openshift-ci bot commented Apr 29, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: candita, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

lihongan commented Jun 6, 2022

verified with pre-merge test process, see https://bugzilla.redhat.com/show_bug.cgi?id=2079517#c1
/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Jun 6, 2022

lihongan commented Jun 6, 2022

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jun 6, 2022

openshift-ci bot commented Jun 6, 2022

@Miciah: all tests passed!

Full PR test history. Your PR dashboard.

@openshift-merge-robot openshift-merge-robot merged commit cb65025 into openshift:release-4.9 Jun 6, 2022

openshift-ci bot commented Jun 6, 2022

@Miciah: All pull requests linked via external trackers have merged:

Bugzilla bug 2079517 has been moved to the MODIFIED state.

In response to this:

Bug 2079517: Use externalTrafficPolicy: Cluster with OVN

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • bugzilla/severity-medium: Referenced Bugzilla bug's severity is medium for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • lgtm: Indicates that a PR is ready to be merged.
  • qe-approved: Signifies that QE has signed off on this PR
9 participants