-
Notifications
You must be signed in to change notification settings - Fork 38.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configure COS to use NPD in daemonset mode and align kubeup NPD manifests with the manifests in the NPD repo #121007
Configure COS to use NPD in daemonset mode and align kubeup NPD manifests with the manifests in the NPD repo #121007
Conversation
/test npd |
@upodroid: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest My changes are successful. https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/121007/pull-kubernetes-e2e-gce/1709933808054177792 Look for NodeProblemDetector in the passed tab |
/test pull-kubernetes-node-e2e-containerd-standalone-mode |
4b55a75
to
cfccbd0
Compare
@@ -380,7 +378,7 @@ func getNpdPodStat(ctx context.Context, f *framework.Framework, nodeName string) | |||
|
|||
hasNpdPod := false | |||
for _, pod := range summary.Pods { | |||
if !strings.HasPrefix(pod.PodRef.Name, "npd") { | |||
if !strings.HasPrefix(pod.PodRef.Name, "node-problem-detector") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
npd.yaml used a daemonset name that didn't match the values in the upstream manifests. https://github.com/kubernetes/node-problem-detector/blob/master/deployment/node-problem-detector.yaml
kops deploys npd's daemonset with the name node-problem-detector
so it is expected all kubernetes clusters are bootstrapped using the correct manifests provided by the component/project maintainer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@upodroid: GitHub didn't allow me to request PR reviews from the following users: or. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
355000a
to
841a861
Compare
/test pull-kubernetes-e2e-gce-correctness |
This PR is ready to merged. Notes:
|
de700e8
to
c263d06
Compare
/test pull-kubernetes-e2e-gce-correctness |
/retest |
c263d06
to
011c65e
Compare
/retest |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dims, upodroid The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -287,12 +287,7 @@ export ENABLE_DNS_HORIZONTAL_AUTOSCALER="${KUBE_ENABLE_DNS_HORIZONTAL_AUTOSCALER | |||
# none - Not run node problem detector. | |||
# daemonset - Run node problem detector as daemonset. | |||
# standalone - Run node problem detector as standalone system daemon. | |||
if [[ "${NODE_OS_DISTRIBUTION}" == "gci" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we are loosing the test coverage for the standalone mode? I think the hidden logic of defauling it to standalone on COS is wrong. But I worry that we are loosing test coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't because standalone tests are currently launched using the node e2e runner, which doesn't use the cluster/* scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Standalone tests
https://testgrid.k8s.io/sig-node-release-blocking#node-kubelet-containerd-standalone-mode
They are all node e2e tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For NPD, I think we lost test coverage in standalone mode due to this PR. In NPD standalone mode, we currently pull the tar files from gs://kubernetes-release/node-problem-detector/
. See https://github.com/kubernetes/kubernetes/blob/d3d06c3c7e07c7c79ff46c0fc3b9f081ce6b0226/cluster/gce/gci/configure.sh#L299C99-L299C117.
But running gsutil ls gs://kubernetes-release/node-problem-detector/
, there is no NPD v0.8.13
, which is bumped by this PR. It only has NPD versions up to v0.8.10
. But none of the release blocking tests failed.
This is ready to be merged. Can I get LGTM please? |
/lgtm |
LGTM label has been added. Git tree hash: 6f5204869bed8298c3d21764ff3bf89cb6f4d8dc
|
This is causing all these DNS jobs to fail because the NPD can not be scheduled
|
This is the problem
npd requests the following:
bumping the controlplane from n1-standard-1 to n1-standard-2 can fix it |
I prefer this https://github.com/kubernetes/test-infra/pull/31312/files This change is ok, |
it is a daemonset, so you'll need to bump all nodes, but there is no need to waste resources, this changes is ok, the DNS jobs don't need to install npd kubernetes/test-infra#31312 |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
NodeProblemDetector tests are currently failing on kops clusters because this test tries to SSH to the API Server IP.
host local exec was introduced sometime ago to address this problem.
Which issue(s) this PR fixes:
Part of #120989
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Part of kubernetes/enhancements#4224