Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write Service Status Life Cycle test - +4 endpoint coverage #98018

Merged
merged 1 commit into from
Mar 4, 2021

Conversation

riaankleinhans
Copy link
Contributor

@riaankleinhans riaankleinhans commented Jan 13, 2021

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
This PR adds a test to test the following untested endpoints:

  • patchCoreV1NamespacedServiceStatus
  • readCoreV1NamespacedServiceStatus
  • replaceCoreV1NamespacedServiceStatus
  • patchCoreV1NamespacedService

Which issue(s) this PR fixes:
Fixes #94867
Testgrid Link:
Link

Special notes for your reviewer:
Adds +4 endpoint test coverage (good for conformance)

Does this PR introduce a user-facing change?:

NONE

Release note:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

/sig testing
/sig architecture
/area conformance

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/conformance Issues or PRs related to kubernetes conformance tests needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 13, 2021
@k8s-ci-robot k8s-ci-robot added area/test sig/network Categorizes an issue or PR as relevant to SIG Network. labels Jan 13, 2021
@riaankleinhans
Copy link
Contributor Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 13, 2021
@riaankleinhans
Copy link
Contributor Author

/assign @smarterclayton
This is from the discussion in the 12 Jan Conformance meeting

@riaankleinhans
Copy link
Contributor Author

/retest

@heyste heyste force-pushed the service-status-life-cycle-test branch from 6bae402 to cdacda8 Compare January 13, 2021 21:06
@riaankleinhans
Copy link
Contributor Author

/test all

@heyste
Copy link
Member

heyste commented Jan 14, 2021

/test pull-kubernetes-e2e-ubuntu-gce-network-policies
flake Kubernetes e2e suite: [sig-network] Networking IPerf [Experimental] [Slow] [Feature:Networking-Performance] should transfer ~ 1GB onto the service endpoint 1 servers (maximum of 1 clients)

I0113 23:52:52.604] Jan 13 23:52:44.588: INFO: Selector matched 1 pods for map[app:iperf-e2e-cli-pod]
I0113 23:52:52.604] Jan 13 23:52:44.589: INFO: ForEach: Found 1 pods from the filter.  Now looping through them.
I0113 23:52:52.604] Jan 13 23:52:44.589: INFO: Running '/go/src/k8s.io/kubernetes/kubernetes/platforms/linux/amd64/kubectl --server=https://35.233.212.159 --kubeconfig=/workspace/.kube/config --namespace=network-perf-2835 logs iperf-e2e-cli-pod-0 iperf-client'
I0113 23:52:52.604] Jan 13 23:52:44.903: INFO: stderr: ""
I0113 23:52:52.604] Jan 13 23:52:44.903: INFO: stdout: "connect failed: Connection refused\n"
I0113 23:52:52.605] Jan 13 23:52:46.904: FAIL: Unexpected error, Failed to find "0-", last result: "connect failed: Connection refused
I0113 23:52:52.605] " when running forEach on the pods. 

@riaankleinhans
Copy link
Contributor Author

/test pull-kubernetes-e2e-ubuntu-gce-network-policies

I0114 01:20:42.031] [Fail] [sig-network] Networking IPerf [Experimental] [Slow] [Feature:Networking-Performance] [It] should transfer ~ 1GB onto the service endpoint 1 servers (maximum of 1 clients) I0114 01:20:42.031] test/e2e/framework/framework.go:809 I0114 01:20:42.031] I0114 01:20:42.031] Ran 405 of 5705 Specs in 1676.663 seconds I0114 01:20:42.031] FAIL! -- 404 Passed | 1 Failed | 0 Pending | 5300 Skipped

@riaankleinhans
Copy link
Contributor Author

/test all

test/e2e/network/service.go Show resolved Hide resolved
test/e2e/network/service.go Outdated Show resolved Hide resolved
test/e2e/network/service.go Outdated Show resolved Hide resolved
@heyste heyste force-pushed the service-status-life-cycle-test branch from cdacda8 to c00e1c0 Compare January 21, 2021 01:03
@riaankleinhans
Copy link
Contributor Author

/test all

1 similar comment
@riaankleinhans
Copy link
Contributor Author

/test all

@heyste heyste force-pushed the service-status-life-cycle-test branch from 8e227bf to 2f9840c Compare February 17, 2021 23:23
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 17, 2021
@heyste heyste force-pushed the service-status-life-cycle-test branch from 2f9840c to 0993b7b Compare February 18, 2021 01:16
@heyste
Copy link
Member

heyste commented Feb 18, 2021

/test pull-kubernetes-e2e-kind
flake

Kubernetes e2e suite: [sig-network] EndpointSlice should create Endpoints and EndpointSlices for Pods matching a Service  3m1s

test/e2e/network/endpointslice.go:167 
Feb 18 01:43:36.146: timed out waiting for Pods to have IPs assigned
Unexpected error:
    <*errors.errorString \| 0xc0001f81d0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred
test/e2e/network/endpointslice.go:289

@dims
Copy link
Member

dims commented Feb 18, 2021

@johnbelamaric all the comments have been addressed. please take a look. Do you want any other sig-network approvers to chime in too?

@riaankleinhans riaankleinhans changed the title Write Service Status Life Cycle test - +2 endpoint coverage Write Service Status Life Cycle test - +4 endpoint coverage Feb 18, 2021
@riaankleinhans
Copy link
Contributor Author

@heyste have updated the Status testing as discussed and we believe the test is good to go on the test grid. There are only 3 weeks left before test freeze 1.21, and we need 2 weeks flake free before Promoting.
Can you please review @johnbelamaric @andrewsykim

@heyste heyste force-pushed the service-status-life-cycle-test branch from 0993b7b to 13708ce Compare March 1, 2021 22:14
@aojea
Copy link
Member

aojea commented Mar 1, 2021

I don't want to block on this comment, just want to open a discussion based on this e2e test approach for following up later.

I'm far from an expert on the area, but I always see that the recommendation is to not use watchers directly because they are not able to handle disconnects, most of the e2e use the wait.PollImmediate() method for checking the resources and synchronizing the steps with that active loop.

@BenTheElder @wojtek-t @liggitt what do you think? I think that it will be useful to set some guidelines on the e2e tests so we have some consistency on the code.

e2e test validates the following service status endpoints
- patchCoreV1NamespacedServiceStatus
- readCoreV1NamespacedServiceStatus
- replaceCoreV1NamespacedServiceStatus

Also includes untested service endpoint
- patchCoreV1NamespacedService
@heyste heyste force-pushed the service-status-life-cycle-test branch from 13708ce to 3e9c44f Compare March 1, 2021 23:06
Ingress: []v1.LoadBalancerIngress{{IP: "203.0.113.1"}},
}
lbStatusJSON, err := json.Marshal(lbStatus)
framework.ExpectNoError(err, "Failed to marshal JSON. %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks funny :) err twice

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nit. no need to fix right now

@dims
Copy link
Member

dims commented Mar 1, 2021

/approve
/lgtm

let's please iterate if needed. we need to get this to soak quickly

cc @spiffxp @johnbelamaric @liggitt

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2021
@johnbelamaric
Copy link
Member

/lgtm
/approve

@aojea we have built some defenses in for watch failures, we do want to use watch in a number of these tests. in this one in particular, we need to use watch otherwise we may miss our update due to fighting with the controller that is actually managing the resource.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, johnbelamaric, Riaankl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2021
@aojea
Copy link
Member

aojea commented Mar 4, 2021

👍 cool, nice to know

@riaankleinhans
Copy link
Contributor Author

/test pull-kubernetes-e2e-kind

Unrelated flake

Kubernetes e2e suite: [sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (default fs)] volumes should store data expand_more	6m7s
Kubernetes e2e suite: [sig-cli] Kubectl client Simple pod should return command exit codes expand_more	4m38s
Kubernetes e2e suite: [sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: block] [Testpattern: Pre-provisioned PV (ext4)] volumes should store data expand_more

@riaankleinhans
Copy link
Contributor Author

/test pull-kubernetes-integration

Unrelated flake

2/3181 Tests Failed. | expand_less
k8s.io/kubernetes/test/integration/apiserver/admissionwebhook: TestWebhookTimeoutWithWatchCache/timed_out_client_requests_skip_later_mutating_webhooks_(regardless_of_failure_policy)_and_fail expand_more3s | k8s.io/kubernetes/test/integration/apiserver/admissionwebhook: TestWebhookTimeoutWithWatchCache/timed_out_client_requests_skip_later_mutating_webhooks_(regardless_of_failure_policy)_and_fail expand_more | 3s
k8s.io/kubernetes/test/integration/apiserver/admissionwebhook: TestWebhookTimeoutWithWatchCache/timed_out_client_requests_skip_later_mutating_webhooks_(regardless_of_failure_policy)_and_fail expand_more | 3s
k8s.io/kubernetes/test/integration/apiserver/admissionwebhook: TestWebhookTimeoutWithWatchCache expand_more | k8s.io/kubernetes/test/integration/apiserver/admissionwebhook: TestWebhookTimeoutWithWatchCache expand_more
k8s.io/kubernetes/test/integration/apiserver/admissionwebhook: TestWebhookTimeoutWithWatchCache expand_more

@k8s-ci-robot
Copy link
Contributor

@Riaankl: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kind 3e9c44f link /test pull-kubernetes-e2e-kind

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@riaankleinhans
Copy link
Contributor Author

/test pull-kubernetes-e2e-kind
unrelated flake

• Failure [180.934 seconds]
[sig-network] EndpointSlice
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/network/common/framework.go:23
  should create Endpoints and EndpointSlices for Pods matching a Service [It]
  /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/network/endpointslice.go:168
  Mar  4 22:19:23.352: timed out waiting for Pods to have IPs assigned
      
  Unexpected error:
      <*errors.errorString | 0xc0002b6240>: {
          s: "timed out waiting for the condition",
      }
      timed out waiting for the condition
  occurred

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/conformance Issues or PRs related to kubernetes conformance tests area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write Service Status Life Cycle test - +4 endpoint coverage
8 participants