
Configure COS to use NPD in daemonset mode and align kubeup NPD manifests with the manifests in the NPD repo #121007

Merged: 1 commit into kubernetes:master on Oct 23, 2023

Conversation

upodroid (Member) commented Oct 5, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

NodeProblemDetector tests are currently failing on kops clusters because this test tries to SSH to the API server IP.

Host local exec was introduced some time ago to address this problem.

Which issue(s) this PR fixes:

Part of #120989

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Part of kubernetes/enhancements#4224

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 5, 2023
@k8s-ci-robot k8s-ci-robot added area/test do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/testing Categorizes an issue or PR as relevant to SIG Testing. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 5, 2023
upodroid (Member Author) commented Oct 5, 2023

/test npd

k8s-ci-robot (Contributor):

@upodroid: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cadvisor-e2e-kubernetes
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-scale-performance-manual
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-ci-kubernetes-unit-windows
  • /test pull-crio-cgroupv1-node-e2e-eviction
  • /test pull-crio-cgroupv1-node-e2e-features
  • /test pull-crio-cgroupv1-node-e2e-hugepages
  • /test pull-crio-cgroupv1-node-e2e-resource-managers
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
  • /test pull-kubernetes-e2e-capz-windows-alpha-features
  • /test pull-kubernetes-e2e-capz-windows-master
  • /test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-ec2
  • /test pull-kubernetes-e2e-ec2-conformance
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-cos-alpha-features
  • /test pull-kubernetes-e2e-gce-cos-kubetest2
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-kubelet-credential-provider
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • /test pull-kubernetes-e2e-kind-alpha-features
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-e2e-storage-kind-disruptive
  • /test pull-kubernetes-e2e-ubuntu-gce-network-policies
  • /test pull-kubernetes-integration-eks
  • /test pull-kubernetes-kind-dra
  • /test pull-kubernetes-kind-json-logging
  • /test pull-kubernetes-kind-text-logging
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-linter-hints
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-arm64-e2e-containerd-ec2
  • /test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-arm64-ubuntu-serial-gce
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-1-7-dra
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-ec2
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-e2e-containerd-sidecar-containers
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
  • /test pull-kubernetes-node-e2e-crio-dra
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-pod-disruption-conditions
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-unit-experimental
  • /test pull-kubernetes-verify-strict-lint
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-conformance-kind-ipv6-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-ec2
  • pull-kubernetes-e2e-ec2-conformance
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-linter-hints
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify
  • pull-kubernetes-verify-strict-lint

In response to this:

/test npd

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 5, 2023
upodroid (Member Author) commented Oct 5, 2023

/retest

My changes are successful: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/121007/pull-kubernetes-e2e-gce/1709933808054177792 (look for NodeProblemDetector in the Passed tab).

/cc @pohly @SergeyKanzhelev

upodroid (Member Author) commented Oct 5, 2023

/test pull-kubernetes-node-e2e-containerd-standalone-mode

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/provider/gcp Issues or PRs related to gcp provider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 5, 2023
@@ -380,7 +378,7 @@ func getNpdPodStat(ctx context.Context, f *framework.Framework, nodeName string)

 	hasNpdPod := false
 	for _, pod := range summary.Pods {
-		if !strings.HasPrefix(pod.PodRef.Name, "npd") {
+		if !strings.HasPrefix(pod.PodRef.Name, "node-problem-detector") {
Member Author:

npd.yaml used a DaemonSet name that didn't match the values in the upstream manifests: https://github.com/kubernetes/node-problem-detector/blob/master/deployment/node-problem-detector.yaml

kops deploys NPD's DaemonSet with the name node-problem-detector, so it is expected that all Kubernetes clusters are bootstrapped using the correct manifests provided by the component/project maintainers.
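For illustration, one quick way to confirm the DaemonSet name and pod-name prefix a cluster actually uses (a hedged sketch; assumes kubectl access to the cluster under test):

    # The upstream manifest names the DaemonSet node-problem-detector in kube-system,
    # and its pods inherit that prefix, which is what getNpdPodStat now matches on.
    kubectl -n kube-system get daemonset node-problem-detector
    kubectl -n kube-system get pods -l app.kubernetes.io/name=node-problem-detector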

Member Author:

/cc @dims or @justinsb

k8s-ci-robot (Contributor):

@upodroid: GitHub didn't allow me to request PR reviews from the following users: or.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @dims or @justinsb

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@upodroid upodroid force-pushed the npd-host-exec-rewrite branch 2 times, most recently from 355000a to 841a861 on October 5, 2023 at 16:18
upodroid (Member Author):

/test pull-kubernetes-e2e-gce-correctness

upodroid (Member Author):

This PR is ready to be merged.

Notes:

@upodroid upodroid changed the title Rewrite NodeProblemDetector test to support host local exec Configure COS to use NPD in daemonset mode and align NPD manifests with upstream NPD Oct 15, 2023
@upodroid upodroid changed the title Configure COS to use NPD in daemonset mode and align NPD manifests with upstream NPD Configure COS to use NPD in daemonset mode and align kubeup NPD manifests with the manifests in the NPD repo Oct 15, 2023
upodroid (Member Author):

/test pull-kubernetes-e2e-gce-correctness

upodroid (Member Author):

/retest

upodroid (Member Author):

/retest

dims (Member) commented Oct 17, 2023

/approve

k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, upodroid

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 17, 2023
@@ -287,12 +287,7 @@ export ENABLE_DNS_HORIZONTAL_AUTOSCALER="${KUBE_ENABLE_DNS_HORIZONTAL_AUTOSCALER
# none - Not run node problem detector.
# daemonset - Run node problem detector as daemonset.
# standalone - Run node problem detector as standalone system daemon.
if [[ "${NODE_OS_DISTRIBUTION}" == "gci" ]]; then
Member:

So we are losing the test coverage for the standalone mode? I think the hidden logic of defaulting it to standalone on COS is wrong, but I worry that we are losing test coverage.

Member Author:

We aren't because standalone tests are currently launched using the node e2e runner, which doesn't use the cluster/* scripts.
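As a rough illustration of that split (invocation details here are assumptions, not taken from this PR), standalone-mode coverage comes from the node e2e runner rather than cluster/kube-up.sh:

    # Hedged sketch: run the node e2e suite, which exercises NPD in standalone mode
    # without going through the cluster/* scripts touched by this PR.
    # The FOCUS value is a placeholder, not an exact test name.
    make test-e2e-node REMOTE=true FOCUS="NodeProblemDetector"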

Member:

For NPD, I think we lost test coverage in standalone mode due to this PR. In NPD standalone mode, we currently pull the tar files from gs://kubernetes-release/node-problem-detector/. See https://github.com/kubernetes/kubernetes/blob/d3d06c3c7e07c7c79ff46c0fc3b9f081ce6b0226/cluster/gce/gci/configure.sh#L299C99-L299C117.

But running gsutil ls gs://kubernetes-release/node-problem-detector/ shows there is no NPD v0.8.13, the version this PR bumps to; it only has versions up to v0.8.10. Yet none of the release-blocking tests failed.
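To make the coverage gap concrete, the check described above can be reproduced with the command already quoted in the comment:

    # List the NPD tarballs published for standalone mode; at the time of this comment
    # the newest available version was v0.8.10, so the v0.8.13 bump had no tarball to pull.
    gsutil ls gs://kubernetes-release/node-problem-detector/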

upodroid (Member Author):

This is ready to be merged. Can I get an LGTM, please?

dims (Member) commented Oct 23, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 23, 2023
k8s-ci-robot (Contributor):

LGTM label has been added.

Git tree hash: 6f5204869bed8298c3d21764ff3bf89cb6f4d8dc

@k8s-ci-robot k8s-ci-robot merged commit 604e9e0 into kubernetes:master Oct 23, 2023
15 checks passed
SIG Node CI/Test Board automation moved this from PRs - Needs Approver to Done Oct 23, 2023
SIG Node PR Triage automation moved this from Needs Approver to Done Oct 23, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 23, 2023
aojea (Member) commented Nov 24, 2023

This is causing all these DNS jobs to fail because the NPD pod cannot be scheduled:
https://testgrid.k8s.io/sig-network-gce#gci-gce-kube-dns-nodecache
https://testgrid.k8s.io/sig-network-gce#gci-gce-coredns-nodecache
...

[FAILED] Error waiting for all pods to be running and ready: Timed out after 600.000s.
  Expected all pods (need at least 0) in namespace "kube-system" to be running and ready (except for 0).
  32 / 33 pods were running and ready.
  Expected 5 pod replicas, 5 are Running and Ready.
  Pods that were neither completed nor running:
      <[]v1.Pod | len:1, cap:1>: 
          - metadata:
              creationTimestamp: "2023-11-24T05:00:49Z"
              generateName: node-problem-detector-
              labels:
                app.kubernetes.io/name: node-problem-detector
                app.kubernetes.io/version: v0.8.13
                controller-revision-hash: 77d7676dcb
                pod-template-generation: "1"
              managedFields:
              - apiVersion: v1
                fieldsType: FieldsV1
                fieldsV1:
                  f:metadata:
                    f:generateName: {}
                    f:labels:
                      .: {}
                      f:app.kubernetes.io/name: {}
                      f:app.kubernetes.io/version: {}
                      f:controller-revision-hash: {}
                      f:pod-template-generation: {}
                    f:ownerReferences:
                      .: {}
                      k:{"uid":"05359bbf-d6c3-4b2a-bb96-f48f6f20aea8"}: {}
                  f:spec:
                    f:affinity:
                      .: {}
                      f:nodeAffinity:
                        .: {}
                        f:requiredDuringSchedulingIgnoredDuringExecution: {}
                    f:containers:
                      k:{"name":"node-problem-detector"}:
                        .: {}
                        f:command: {}
                        f:env:
                          .: {}
                          k:{"name":"NODE_NAME"}:
                            .: {}
                            f:name: {}
                            f:valueFrom:
                              .: {}
                              f:fieldRef: {}
                        f:image: {}
                        f:imagePullPolicy: {}
                        f:name: {}
                        f:resources:
                          .: {}
                          f:limits:
                            .: {}
                            f:cpu: {}
                            f:memory: {}
                          f:requests:
                            .: {}
                            f:cpu: {}
                            f:memory: {}
                        f:securityContext:
                          .: {}
                          f:privileged: {}
                        f:terminationMessagePath: {}
                        f:terminationMessagePolicy: {}
                        f:volumeMounts:
                          .: {}
                          k:{"mountPath":"/dev/kmsg"}:
                            .: {}
                            f:mountPath: {}
                            f:name: {}
                            f:readOnly: {}
                          k:{"mountPath":"/etc/localtime"}:
                            .: {}
                            f:mountPath: {}
                            f:name: {}
                            f:readOnly: {}
                          k:{"mountPath":"/var/log"}:
                            .: {}
                            f:mountPath: {}
                            f:name: {}
                    f:dnsPolicy: {}
                    f:enableServiceLinks: {}
                    f:restartPolicy: {}
                    f:schedulerName: {}
                    f:securityContext: {}
                    f:serviceAccount: {}
                    f:serviceAccountName: {}
                    f:terminationGracePeriodSeconds: {}
                    f:tolerations: {}
                    f:volumes:
                      .: {}
                      k:{"name":"kmsg"}:
                        .: {}
                        f:hostPath:
                          .: {}
                          f:path: {}
                          f:type: {}
                        f:name: {}
                      k:{"name":"localtime"}:
                        .: {}
                        f:hostPath:
                          .: {}
                          f:path: {}
                          f:type: {}
                        f:name: {}
                      k:{"name":"log"}:
                        .: {}
                        f:hostPath:
                          .: {}
                          f:path: {}
                          f:type: {}
                        f:name: {}
                manager: kube-controller-manager
                operation: Update
                time: "2023-11-24T05:00:49Z"
              - apiVersion: v1
                fieldsType: FieldsV1
                fieldsV1:
                  f:status:
                    f:conditions:
                      .: {}
                      k:{"type":"PodScheduled"}:
                        .: {}
                        f:lastProbeTime: {}
                        f:lastTransitionTime: {}
                        f:message: {}
                        f:reason: {}
                        f:status: {}
                        f:type: {}
                manager: kube-scheduler
                operation: Update
                subresource: status
                time: "2023-11-24T05:00:49Z"
              name: node-problem-detector-g9h6s
              namespace: kube-system
              ownerReferences:
              - apiVersion: apps/v1
                blockOwnerDeletion: true
                controller: true
                kind: DaemonSet
                name: node-problem-detector
                uid: 05359bbf-d6c3-4b2a-bb96-f48f6f20aea8
              resourceVersion: "984"
              uid: 6bf2ba29-68df-4544-8df5-a83dbf31bb7a
            spec:
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                    - matchFields:
                      - key: metadata.name
                        operator: In
                        values:
                        - gce-coredns-perf-cache-master
              containers:
              - command:
                - /bin/sh
                - -c
                - exec /node-problem-detector --logtostderr --config.system-log-monitor=/config/kernel-monitor.json,/config/systemd-monitor.json
                  --config.custom-plugin-monitor=/config/kernel-monitor-counter.json,/config/systemd-monitor-counter.json
                  --config.system-stats-monitor=/config/system-stats-monitor.json >>/var/log/node-problem-detector.log
                  2>&1
                env:
                - name: NODE_NAME
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: spec.nodeName
                image: registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.13
                imagePullPolicy: IfNotPresent
                name: node-problem-detector
                resources:
                  limits:
                    cpu: 200m
                    memory: 100Mi
                  requests:
                    cpu: 20m
                    memory: 20Mi
                securityContext:
                  privileged: true
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                volumeMounts:
                - mountPath: /var/log
                  name: log
                - mountPath: /dev/kmsg
                  name: kmsg
                  readOnly: true
                - mountPath: /etc/localtime
                  name: localtime
                  readOnly: true
                - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
                  name: kube-api-access-6xftj
                  readOnly: true
              dnsPolicy: ClusterFirst
              enableServiceLinks: true
              preemptionPolicy: PreemptLowerPriority
              priority: 0
              restartPolicy: Always
              schedulerName: default-scheduler
              securityContext: {}
              serviceAccount: node-problem-detector
              serviceAccountName: node-problem-detector
              terminationGracePeriodSeconds: 30
              tolerations:
              - effect: NoExecute
                operator: Exists
              - effect: NoSchedule
                operator: Exists
              - key: CriticalAddonsOnly
                operator: Exists
              - effect: NoExecute
                key: node.kubernetes.io/not-ready
                operator: Exists
              - effect: NoExecute
                key: node.kubernetes.io/unreachable
                operator: Exists
              - effect: NoSchedule
                key: node.kubernetes.io/disk-pressure
                operator: Exists
              - effect: NoSchedule
                key: node.kubernetes.io/memory-pressure
                operator: Exists
              - effect: NoSchedule
                key: node.kubernetes.io/pid-pressure
                operator: Exists
              - effect: NoSchedule
                key: node.kubernetes.io/unschedulable
                operator: Exists
              volumes:
              - hostPath:
                  path: /var/log/
                  type: ""
                name: log
              - hostPath:
                  path: /dev/kmsg
                  type: ""
                name: kmsg
              - hostPath:
                  path: /etc/localtime
                  type: FileOrCreate
                name: localtime
              - name: kube-api-access-6xftj
                projected:
                  defaultMode: 420
                  sources:
                  - serviceAccountToken:
                      expirationSeconds: 3607
                      path: token
                  - configMap:
                      items:
                      - key: ca.crt
                        path: ca.crt
                      name: kube-root-ca.crt
                  - downwardAPI:
                      items:
                      - fieldRef:
                          apiVersion: v1
                          fieldPath: metadata.namespace
                        path: namespace
            status:
              conditions:
              - lastProbeTime: null
                lastTransitionTime: "2023-11-24T05:00:49Z"
                message: '0/4 nodes are available: 1 Insufficient cpu. preemption: 0/4 nodes
                  are available: 4 No preemption victims found for incoming pod.'
                reason: Unschedulable
                status: "False"
                type: PodScheduled
              phase: Pending
              qosClass: Burstable

upodroid (Member Author):

This is the problem:

                message: '0/4 nodes are available: 1 Insufficient cpu. preemption: 0/4 nodes
                  are available: 4 No preemption victims found for incoming pod.'

npd requests the following:

        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"

Bumping the control plane machine type from n1-standard-1 to n1-standard-2 can fix it.
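A minimal sketch of what that bump could look like for a kube-up GCE cluster (MASTER_SIZE is assumed to be the relevant cluster/gce variable; treat this as an illustration, not the exact job change):

    # Hypothetical example: give the control-plane node a larger machine type so the
    # node-problem-detector pod's 20m CPU request can still be scheduled there.
    export MASTER_SIZE=n1-standard-2   # assumed kube-up/GCE knob; default here was n1-standard-1
    ./cluster/kube-up.sh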

aojea (Member) commented Nov 24, 2023

This is the problem

                message: '0/4 nodes are available: 1 Insufficient cpu. preemption: 0/4 nodes
                  are available: 4 No preemption victims found for incoming pod.'

npd requests the following:

        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"

Bumping the control plane machine type from n1-standard-1 to n1-standard-2 can fix it.

I prefer this: https://github.com/kubernetes/test-infra/pull/31312/files

This change is ok.

aojea (Member) commented Nov 24, 2023

Bumping the control plane machine type from n1-standard-1 to n1-standard-2 can fix it.

It is a daemonset, so you'd need to bump all nodes, but there is no need to waste resources. This change is ok; the DNS jobs don't need to install NPD: kubernetes/test-infra#31312
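A rough sketch of that alternative for the DNS jobs (the KUBE_ENABLE_NODE_PROBLEM_DETECTOR variable name is an assumption based on the cluster/gce config shown earlier; the linked test-infra PR is the authoritative change):

    # Hypothetical job env: skip installing NPD entirely instead of resizing every node.
    export KUBE_ENABLE_NODE_PROBLEM_DETECTOR=none   # assumed knob; "none" disables NPD in kube-up
    ./cluster/kube-up.sh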
