Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LoadBalancer tests to be provider-agnostic #124660

Merged
merged 5 commits into from
May 9, 2024

Conversation

danwinship
Copy link
Contributor

@danwinship danwinship commented May 1, 2024

What this PR does / why we need it:

Drops all the provider checks from the LoadBalancer tests and instead just marks them all [Feature:LoadBalancer]. We will need corresponding updates in test-infra...

At the moment it doesn't make any attempt to distinguish between what different LoadBalancer-supporting platforms support (#123714). People who want to run them will have to just manually --ginkgo.skip the tests they can't pass for now.

(Note: originally this PR was about trying to get the LB tests to pass under cloud-provider-kind, but now it is just about re-enabling them on GCE.)

Which issue(s) this PR fixes:

Fixes #124338

Does this PR introduce a user-facing change?

NONE

/kind cleanup
/sig network
/priority important-soon
/assign @aojea

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/network Categorizes an issue or PR as relevant to SIG Network. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 1, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 1, 2024
@danwinship
Copy link
Contributor Author

danwinship commented May 2, 2024

pull-kubernetes-e2e-kind-cloud-provider-loadbalancer results:

 Summarizing 7 Failures:
  [FAIL] [sig-network] LoadBalancers [Feature:LoadBalancer] [It] should be able to preserve UDP traffic when server pod cycles for a LoadBalancer service on the same nodes [sig-network, Feature:LoadBalancer]
  k8s.io/kubernetes/test/e2e/network/loadbalancer.go:866
  [FAIL] [sig-network] LoadBalancers [Feature:LoadBalancer] [It] should be able to change the type and ports of a UDP service [Slow] [sig-network, Feature:LoadBalancer, Slow]
  k8s.io/kubernetes/test/e2e/network/service.go:591
  [FAIL] [sig-network] LoadBalancers ExternalTrafficPolicy:Local [Feature:LoadBalancer] [Slow] [It] should work for type=LoadBalancer [sig-network, Feature:LoadBalancer, Slow]
  k8s.io/kubernetes/test/e2e/network/loadbalancer.go:990
  [FAIL] [sig-network] LoadBalancers [Feature:LoadBalancer] [It] should be able to preserve UDP traffic when server pod cycles for a LoadBalancer service on different nodes [sig-network, Feature:LoadBalancer]
  k8s.io/kubernetes/test/e2e/network/loadbalancer.go:736
  [FAIL] [sig-network] LoadBalancers [Feature:LoadBalancer] [It] should have session affinity work for LoadBalancer service with Cluster traffic policy [Slow] [LinuxOnly] [sig-network, Feature:LoadBalancer, Slow]
  k8s.io/kubernetes/test/e2e/network/service.go:266
  [FAIL] [sig-network] LoadBalancers [Feature:LoadBalancer] [It] should be able to switch session affinity for LoadBalancer service with Local traffic policy [Slow] [LinuxOnly] [sig-network, Feature:LoadBalancer, Slow]
  k8s.io/kubernetes/test/e2e/network/service.go:266
  [FAIL] [sig-network] LoadBalancers [Feature:LoadBalancer] [It] should be able to switch session affinity for LoadBalancer service with Cluster traffic policy [Slow] [LinuxOnly] [sig-network, Feature:LoadBalancer, Slow]
  k8s.io/kubernetes/test/e2e/network/service.go:266 

So all of the UDP LoadBalancer tests failed (which is a known bug, kubernetes-sigs/cloud-provider-kind#58), and most of the SessionAffinity tests. ("should have session affinity work for LoadBalancer service with Local traffic policy" passed, but none of the others did.)

@danwinship
Copy link
Contributor Author

/test pull-kubernetes-e2e-kind-cloud-provider-loadbalancer
seeing if failures are consistent or flakes...

@aojea
Copy link
Member

aojea commented May 4, 2024

/test pull-kubernetes-e2e-kind-cloud-provider-loadbalancer

@aojea
Copy link
Member

aojea commented May 4, 2024

great, we moved from

Kubernetes e2e suite: [It] [sig-network] LoadBalancers [Feature:LoadBalancer] should be able to preserve UDP traffic when server pod cycles for a LoadBalancer service on different nodes expand_more	3m18s
Kubernetes e2e suite: [It] [sig-network] LoadBalancers [Feature:LoadBalancer] should have session affinity work for LoadBalancer service with Cluster traffic policy [Slow] [LinuxOnly] expand_more	10m47s
Kubernetes e2e suite: [It] [sig-network] LoadBalancers ExternalTrafficPolicy:Local [Feature:LoadBalancer] [Slow] should work for type=LoadBalancer expand_more	1m20s
Kubernetes e2e suite: [It] [sig-network] LoadBalancers [Feature:LoadBalancer] should be able to preserve UDP traffic when server pod cycles for a LoadBalancer service on the same nodes expand_more	2m12s
Kubernetes e2e suite: [It] [sig-network] LoadBalancers [Feature:LoadBalancer] should be able to switch session affinity for LoadBalancer service with Local traffic policy [Slow] [LinuxOnly] expand_more	10m13s
Kubernetes e2e suite: [It] [sig-network] LoadBalancers [Feature:LoadBalancer] should be able to change the type and ports of a UDP service [Slow] expand_more	3m46s
Kubernetes e2e suite: [It] [sig-network] LoadBalancers [Feature:LoadBalancer] should be able to switch session affinity for LoadBalancer service with Cluster traffic policy [Slow] [LinuxOnly] expand_more

To only

Kubernetes e2e suite: [It] [sig-network] LoadBalancers [Feature:LoadBalancer] should be able to change the type and ports of a UDP service [Slow] expand_more	16m19s
Kubernetes e2e suite: [It] [sig-network] LoadBalancers ExternalTrafficPolicy:Local [Feature:LoadBalancer] [Slow] should work for type=LoadBalancer expand_more

This will not pass as the loadbalancer is type proxy, so it will always replace the source ip, we may try to use something like this https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/other_features/ip_transparency#original-source-listener-filter but it may not work in all environments

Kubernetes e2e suite: [It] [sig-network] LoadBalancers ExternalTrafficPolicy:Local [Feature:LoadBalancer] [Slow] should work for type=LoadBalancer expand_more

The UDP one fails because is waiting for a rejection and we have discussed the same for TCP, is not clear we can impose this behavior for loadbalancers in general)

// testRejectedUDP tests that the given host rejects a UDP request on the given port.
func testRejectedUDP(host string, port int, timeout time.Duration) {
pollfn := func() (bool, error) {
result := pokeUDP(host, port, "echo hello", &UDPPokeParams{Timeout: 3 * time.Second})
if result.Status == UDPRefused {
return true, nil
}
return false, nil // caller can retry
}
if err := wait.PollImmediate(framework.Poll, timeout, pollfn); err != nil {
framework.Failf("UDP service %v:%v not rejected: %v", host, port, err)
}
}

[FAILED] UDP service 172.18.0.6:81 not rejected: timed out waiting for the condition

@aojea
Copy link
Member

aojea commented May 4, 2024

with

diff --git a/test/e2e/network/loadbalancer.go b/test/e2e/network/loadbalancer.go
index e68e9153d11..08896bbecae 100644
--- a/test/e2e/network/loadbalancer.go
+++ b/test/e2e/network/loadbalancer.go
@@ -495,8 +495,8 @@ var _ = common.SIGDescribe("LoadBalancers", ginkgo.Serial, func() {
                err = udpJig.Scale(ctx, 0)
                framework.ExpectNoError(err)
 
-               ginkgo.By("looking for ICMP REJECT on the UDP service's LoadBalancer")
-               testRejectedUDP(udpIngressIP, svcPort, loadBalancerCreateTimeout)
+               ginkgo.By("UDP service's LoadBalancer should not be reachable")
+               testNotReachableUDP(udpIngressIP, svcPort, loadBalancerCreateTimeout)
 
                ginkgo.By("Scaling the pods to 1")
                err = udpJig.Scale(ctx, 1)

for the other test should work for type=LoadBalancer we should add a new Label as Feature:LoadbalancerL4 or Layer4 or something that indicates that the loadbalancer provisioned operates at L4

@danwinship
Copy link
Contributor Author

This will not pass as the loadbalancer is type proxy

The e2e test should check the LoadBalancerStatus and belatedly skip the test in that case. (If either ingress.IPMode == v1.LoadBalancerIPModeProxy or ingress.IP == "")

@aojea
Copy link
Member

aojea commented May 5, 2024

great , those two tests that are failing now will be fixed once we make them aware of the proxy mode as Dan says in previous comment and we implement that in cloud-provider-kind kubernetes-sigs/cloud-provider-kind#63

Kubernetes e2e suite: [It] [sig-network] LoadBalancers ExternalTrafficPolicy:Local [Feature:LoadBalancer] [Slow] should work for type=LoadBalancer expand_more 1m32s
Kubernetes e2e suite: [It] [sig-network] LoadBalancers ExternalTrafficPolicy:Local [Feature:LoadBalancer] [Slow] should handle updates to ExternalTrafficPolicy field expand_more

@dims
Copy link
Member

dims commented May 6, 2024

@danwinship please re-add the entire test/e2e/network/loadbalancer.go back in this PR once #124519 lands. thanks!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 7, 2024
(No changes from before they were deleted, and thus they don't
actually compile because of missing provider code.)
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 7, 2024
@@ -926,7 +926,7 @@ var _ = common.SIGDescribe("LoadBalancers", feature.LoadBalancer, func() {
})
})

var _ = common.SIGDescribe("LoadBalancers ESIPP", feature.LoadBalancer, framework.WithSlow(), func() {
var _ = common.SIGDescribe("LoadBalancers ExternalTrafficPolicy: Local", feature.LoadBalancer, framework.WithSlow(), func() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FTR I'm doing this now just to keep all (most?) of the renaming in one PR; I didn't want people updating their CI configs to skip all the "ESIPP" tests and then have us rename them later...)

@aojea
Copy link
Member

aojea commented May 7, 2024

@danwinship linter error

{ ScriptError 
test/e2e/network/loadbalancer.go:1453:6: ST1003: var previousHttpErrors should be previousHTTPErrors (stylecheck)
	var previousHttpErrors uint64 = 0
	    ^
test/e2e/network/loadbalancer.go:1500:3: ST1003: var currentHttpErrors should be currentHTTPErrors (stylecheck)
		currentHttpErrors := atomic.LoadUint64(&httpErrors)
		^
test/e2e/network/loadbalancer.go:1504:3: ST1003: var partialHttpErrors should be partialHTTPErrors (stylecheck)
		partialHttpErrors := currentHttpErrors - previousHttpErrors
		^

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented May 8, 2024

@danwinship: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-kind-cloud-provider-loadbalancer 67e0c51 link false /test pull-kubernetes-e2e-kind-cloud-provider-loadbalancer

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@aojea
Copy link
Member

aojea commented May 8, 2024

/test pull-kubernetes-e2e-gce-100-performance

@danwinship
Copy link
Contributor Author

/retest-required

@aojea
Copy link
Member

aojea commented May 8, 2024

/test pull-kubernetes-e2e-gce-100-performance

@danwinship
Copy link
Contributor Author

@aojea what are you running the perf test for?
anyway, I think this is ready to merge?

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2024
@aojea
Copy link
Member

aojea commented May 9, 2024

@aojea what are you running the perf test for?

wrong ones sorry, I wanted to check kubernetes/test-infra#32585

/lgtm
/approve

Let's iterate

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 9, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: a8e87178b1c683203d68ba3d524f8f3984407eea

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, danwinship

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit d36b267 into kubernetes:master May 9, 2024
21 of 23 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone May 9, 2024
@danwinship danwinship deleted the feature-loadbalancer branch May 9, 2024 11:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

update load balancer e2e tests for legacy CloudProvider removal
4 participants