Promote gRPC probe e2e test to Conformance #115856
@lanycrost: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hi @lanycrost. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@lanycrost: Reiterating the mentions to trigger a notification: In response to this:
i'm not sure why that was mentioned in this pull request, but that's not a compatible change to make, and would break existing clients, so we can't do that |
@liggitt I'm using GRPCAction here. For example, HTTPGetAction and TCPSocketAction also have a Port field whose type is |
That was an explicit design choice when adding GRPC actions. From https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2727-grpc-probe/README.md#design-details:
Even if it wasn't intentional and we did want to change it, all API clients built since 1.23.0 would break if the server ever started returning string values in |
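To illustrate the compatibility concern above, here is a minimal sketch in Go. It assumes a client compiled against an integer `port` field; `oldGRPCAction` and `decodeWithOldClient` are hypothetical stand-ins written for this example, not the real Kubernetes types or client code. It only shows that a standard JSON decoder with an integer field rejects a string value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldGRPCAction is a simplified, hypothetical stand-in for the shape an
// existing client would have compiled in: Port is a plain integer.
type oldGRPCAction struct {
	Port    int32   `json:"port"`
	Service *string `json:"service,omitempty"`
}

// decodeWithOldClient reports whether a client built against the integer
// field can parse the given server response.
func decodeWithOldClient(payload string) error {
	var action oldGRPCAction
	return json.Unmarshal([]byte(payload), &action)
}

func main() {
	// A numeric port decodes without error.
	fmt.Println(decodeWithOldClient(`{"port": 5000}`))

	// A string port, which an int-or-string field would allow the server
	// to return, makes the old client fail with an unmarshal error.
	fmt.Println(decodeWithOldClient(`{"port": "grpc"}`))
}
```

The second call returns a non-nil error, which is the kind of breakage described for clients built before such a change.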
OK, thanks, got it |
@lanycrost we don't mark tests as conformance from the get go. we add them as normal tests and check for flakiness over a couple of weeks and then mark them for conformance if appropriate: |
/release-note-none |
@lanycrost: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
the promoted test is flaking
the condition for a Conformance test is to show stability, this failure has to be analyzed /hold |
@lanycrost are you interested in investigating the failures? Link: https://storage.googleapis.com/k8s-triage/index.html?test=GRPC%20liveness Failures classified into "In [BeforeEach]" are not interesting. Look at the "mismatched number of restarts" section. In this section, click on occurrences of test failures and on stdout for the gRPC tests. First example I see - liveness failed, but the container doesn't seem to have been restarted (or restartCount didn't increment):
Looks like this is a Windows node. Looking at the kubelet.log (artifact link from the test failure), the kubelet log shows that the event was detected, but kubelet ignored the need for a restart. So it seems like gRPC worked fine and the issue is in the Windows kubelet itself. Looking at other types of probes: https://storage.googleapis.com/k8s-triage/index.html?test=liveness It is indeed the case |
@lanycrost please find the Linux test grid demonstrating all-green execution. I think all probe failures need to be investigated separately. No need to block this KEP release on overall problems with Windows. |
Filed: #116123 |
@SergeyKanzhelev yes, sure. I will try to understand what's going on here to get it working correctly. |
@aojea one example of non-flaking: https://testgrid.k8s.io/sig-windows-signal#capz-windows-containerd-master&include-filter-by-regex=GRPC As I pointed out before - flakes are happening on all probe types and are not related to gRPC. I'd suggest we unblock this PR. |
It is about the stability of the test on the monitored jobs, blocking and informing: https://testgrid.k8s.io/sig-release-master-informing#gce-master-scale-correctness The triage link picks up a lot of errors, and most of them are usually noise from jobs that are not stable. These tests in presubmits and in periodic release-blocking and release-informing jobs are stable and monitored daily. Do we know why this test failed? https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/115856/pull-kubernetes-conformance-kind-ga-only-parallel/1629756997509320704 The fact that the test being promoted to conformance is exactly the one that fails in presubmit at least deserves an explanation before merge; otherwise we are being inconsistent with ourselves by allowing a merge with a flake |
The failure below is:
Seems like the ordering needs to change: GA should happen before we promote to Conformance. I will submit a PR to GA the feature. I treated promotion to Conformance as a prerequisite |
this is done: #116233 I wonder if tests will just pick it up or you need to rebase. I'd suggest rebasing to be on the safe side |
prow rebases, github actions does not (at least last time I checked) /test pull-kubernetes-conformance-kind-ga-only-parallel |
All tests are passing now. Time to merge. |
@SergeyKanzhelev thanks! /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dims, lanycrost, saschagrunert, SergeyKanzhelev The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@SergeyKanzhelev seems we need to remove |
/unhold |
What type of PR is this?
/kind cleanup
/area tests
/area conformance
@kubernetes/sig-architecture-pr-reviews @kubernetes/sig-node-pr-reviews @kubernetes/cncf-conformance-wg
What this PR does / why we need it:
adds tests in test/e2e/common/node/container_probe.go:
Which issue(s) this PR fixes:
Fixes #115780
Special notes for your reviewer:
I'm thinking about changing the Port type of the `GRPCAction` struct to `intstr.IntOrString`. What do you think about it?
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: