Daemonset Pod Misscheduled on Tainted Node #71086

Evesy · 2018-11-15T19:08:06Z

What happened:

A Daemonset without any node tolerations was scheduled on suitable nodes only. Specifically, it was not scheduled on nodes with the taint below:
```
 taints:
 - effect: NoSchedule
   key: key
   value: core
```
A GKE node upgrade was then triggered.
Afterwards, a pod ended up scheduled on a node with the above taint.

Daemonset status:

Desired Number of Nodes Scheduled: 2
Current Number of Nodes Scheduled: 2
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 2
Number of Nodes Misscheduled: 1
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
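For readers less familiar with these counters, here is a minimal sketch of how they relate to each other, following the field descriptions in the DaemonSet status API. The node names, the number of tainted nodes, and the `summarize` helper are all hypothetical; the real controller derives these values from live node and pod objects.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    name: str          # hypothetical node name
    should_run: bool   # True if the DaemonSet is supposed to run a pod here
    running_pods: int  # daemon pods currently running on the node


def summarize(nodes):
    """Derive DaemonSet-style status counters from per-node state."""
    desired = sum(1 for n in nodes if n.should_run)
    current = sum(1 for n in nodes if n.should_run and n.running_pods > 0)
    misscheduled = sum(1 for n in nodes if not n.should_run and n.running_pods > 0)
    pods_running = sum(n.running_pods for n in nodes)
    return {
        "desired": desired,
        "current": current,
        "misscheduled": misscheduled,
        "pods_running": pods_running,
    }


# Assumed layout: two untainted nodes plus two tainted ones,
# one of which carries a stray daemon pod.
nodes = [
    NodeState("untainted-1", should_run=True, running_pods=1),
    NodeState("untainted-2", should_run=True, running_pods=1),
    NodeState("tainted-1", should_run=False, running_pods=1),
    NodeState("tainted-2", should_run=False, running_pods=0),
]
status = summarize(nodes)
print(status)
```

Under these assumptions the sketch yields the same shape as the status above: two desired, two current, one misscheduled, three pods running in total.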

Possibly related: the node that the pod was incorrectly scheduled on was a brand new GCE instance running 1.11.2 (previous nodes were on 1.10.9; creation time: Nov 15, 2018, 3:41:59 PM). However, the node has the same ID as a previous node in the cluster, and Kubernetes recognises the node as being 13 days old, despite it having been created only hours ago.

What you expected to happen:
Daemonset pods should not have been scheduled on nodes with the NoSchedule taint, as the daemonset does not have any tolerations for it.

How to reproduce it (as minimally and precisely as possible):
Unable to replicate at this point

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): 1.11.2
Cloud provider or hardware configuration: GCP/GKE
OS (e.g. from /etc/os-release): Google COS
Kernel (e.g. uname -a):
Install tools:
Others:

/kind bug

Evesy · 2018-11-15T19:08:29Z

/sig scheduling

ravisantoshgudimetla · 2018-11-16T02:57:14Z

I assume the master is also running 1.11. Also, can you provide the DS pod spec? I wonder whether any default tolerations got applied to the pod, or whether the pod is a critical pod.

Evesy · 2018-11-16T10:05:01Z

Yeah, master is also 1.11

Pod template:

  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      creationTimestamp: null
      labels:
        k8s-app: ip-masq-agent
    spec:
      containers:
      - image: k8s.gcr.io/ip-masq-agent-amd64:v2.0.2
        imagePullPolicy: IfNotPresent
        name: ip-masq-agent
        resources:
          requests:
            cpu: 10m
            memory: 16Mi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/masq-agent-ds-ready: "true"
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ip-masq-agent
      serviceAccountName: ip-masq-agent
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: config
            path: ip-masq-agent
          name: ip-masq-agent
          optional: true
        name: config
  templateGeneration: 2
  updateStrategy:
    type: OnDelete

The pod is critical (scheduler.alpha.kubernetes.io/critical-pod: ""). I wasn't aware this would cause it to be scheduled on any node. Is that the case?
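As a reference point for the question, here is a minimal sketch of the documented taint and toleration matching rules, applied to the only toleration declared in the template and the taint from the original report. The helper names are hypothetical, and the sketch covers only what the template itself declares, not anything a controller might add at pod creation time.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if operator == "Exists":
        # An empty key combined with Exists tolerates every taint.
        return key == "" or key == taint["key"]
    # Equal: key and value must both match.
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def pod_tolerates_all(tolerations, taints):
    """A pod fits only if every NoSchedule taint is matched by some toleration."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint["effect"] == "NoSchedule"
    )


node_taint = {"key": "key", "value": "core", "effect": "NoSchedule"}
template_tolerations = [{"key": "CriticalAddonsOnly", "operator": "Exists"}]

fits = pod_tolerates_all(template_tolerations, [node_taint])
print(fits)
```

By these rules the `CriticalAddonsOnly` toleration on its own does not cover a taint whose key is `key`, so the template alone would not explain the placement.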

ravisantoshgudimetla · 2018-11-16T18:53:29Z

I think so, yes. We add tolerations to critical pods:

kubernetes/pkg/controller/daemon/util/daemonset_util.go

Line 52 in d0c3cd1

func AddOrUpdateDaemonPodTolerations(spec *v1.PodSpec, isCritical bool) {

But to be fair, this should have been the case earlier as well. That is, even before upgrading to 1.11, this DS should have landed on the node irrespective of taints. We'd check only the essential predicates for a critical pod. Isn't that the case?
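To make the "add or update" idea concrete, here is a loose Python paraphrase of merging extra tolerations into a pod spec. It does not reproduce the source of `AddOrUpdateDaemonPodTolerations`: the two keys used here (`node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable`) are assumptions picked for illustration, the `isCritical` flag is ignored, and every helper name is hypothetical.

```python
def add_or_update_toleration(tolerations, new):
    """Replace an entry with the same key and effect, otherwise append."""
    for i, tol in enumerate(tolerations):
        if tol.get("key") == new.get("key") and tol.get("effect") == new.get("effect"):
            tolerations[i] = new
            return
    tolerations.append(new)


def matches(toleration, taint):
    """Simplified match for Exists-style tolerations scoped to one key."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    return toleration.get("key", "") in ("", taint["key"])


# Toleration declared in the pod template from this thread.
pod_tolerations = [{"key": "CriticalAddonsOnly", "operator": "Exists"}]

# ASSUMPTION: illustrative keys only; the real function may add others.
assumed_keys = ("node.kubernetes.io/not-ready", "node.kubernetes.io/unreachable")
for _ in range(2):  # applying the merge twice must not create duplicates
    for key in assumed_keys:
        add_or_update_toleration(
            pod_tolerations,
            {"key": key, "operator": "Exists", "effect": "NoExecute"},
        )

custom_taint = {"key": "key", "value": "core", "effect": "NoSchedule"}
tolerated = any(matches(t, custom_taint) for t in pod_tolerations)
print(len(pod_tolerations), tolerated)
```

The point of the sketch is only that tolerations scoped to specific keys would not, by themselves, cover an unrelated custom taint; whether the controller bypasses taints for critical pods in some other way is a separate question.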

Evesy · 2018-11-16T22:38:10Z

More than one node carries this taint, but only one of them ended up with a DS pod scheduled on it.

ravisantoshgudimetla · 2018-11-19T18:15:28Z

It could be because of nodeSelector. Can you please check if that is the case?

Evesy · 2018-11-19T18:25:23Z

The Daemonset does have a nodeSelector of beta.kubernetes.io/masq-agent-ds-ready: "true", and that label exists on all nodes.

ravisantoshgudimetla · 2018-11-19T18:44:18Z

To be clear, do all the nodes have the same taint and label? I ask because I thought your initial comment was about the pod not being expected to schedule on a node with a NoSchedule taint. It does so because of #71086 (comment), and now you think the pod is being placed on the wrong node? If so, it would help if you could provide the spec of the node on which the pod is being scheduled and of the node on which you expect it to be scheduled. You can also get more information by increasing the log level and looking at kube-controller.log, since the DS controller still handles scheduling in 1.11.

Evesy · 2018-11-19T19:39:45Z

Two sets of node types, A and B

Both nodes A and B have the label beta.kubernetes.io/masq-agent-ds-ready: "true"

Nodes B also have a taint:

 taints:
 - effect: NoSchedule
   key: key
   value: core

Normal and past behaviour is that the Daemonset in question runs pods on all of nodes A, none on nodes B
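The expected placement can be sketched as a simple eligibility check that combines the nodeSelector with the taint. This is a simplified model rather than the controller's actual predicate code; the node names and helper functions are hypothetical, and the cluster size (two nodes of each type) is assumed.

```python
NODE_SELECTOR = {"beta.kubernetes.io/masq-agent-ds-ready": "true"}
TOLERATIONS = [{"key": "CriticalAddonsOnly", "operator": "Exists"}]
CORE_TAINT = {"key": "key", "value": "core", "effect": "NoSchedule"}


def selector_matches(selector, labels):
    """Every selector key must be present on the node with the same value."""
    return all(labels.get(k) == v for k, v in selector.items())


def is_tolerated(taint, tolerations):
    """Simplified match: same key, and same value unless the operator is Exists."""
    for tol in tolerations:
        if tol.get("key") != taint["key"]:
            continue
        if tol.get("effect") and tol["effect"] != taint["effect"]:
            continue
        if tol.get("operator", "Equal") == "Exists" or tol.get("value") == taint.get("value"):
            return True
    return False


def should_run(node):
    """Eligible only if labels match and no NoSchedule taint is left untolerated."""
    blocking = [t for t in node["taints"] if t["effect"] == "NoSchedule"]
    return selector_matches(NODE_SELECTOR, node["labels"]) and all(
        is_tolerated(t, TOLERATIONS) for t in blocking
    )


# Hypothetical cluster: two type A nodes and two type B nodes, all labelled.
nodes = [
    {"name": "a-1", "labels": dict(NODE_SELECTOR), "taints": []},
    {"name": "a-2", "labels": dict(NODE_SELECTOR), "taints": []},
    {"name": "b-1", "labels": dict(NODE_SELECTOR), "taints": [CORE_TAINT]},
    {"name": "b-2", "labels": dict(NODE_SELECTOR), "taints": [CORE_TAINT]},
]
eligible = [n["name"] for n in nodes if should_run(n)]
print(eligible)
```

In this model only the type A nodes are eligible, which matches the normal behaviour described here.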

In this scenario an additional pod belonging to this Daemonset has been scheduled on node B -- The Daemonset identifies this node as having been misscheduled. The DS has also correctly scheduled pods on all nodes A

I'd understand if the below added tolerations to the Daemonset, resulting in it being scheduled everywhere, however this doesn't appear to be the case. Only one node of type B has ended up with a DS pod running on it.

kubernetes/pkg/controller/daemon/util/daemonset_util.go

Line 52 in d0c3cd1

func AddOrUpdateDaemonPodTolerations(spec *v1.PodSpec, isCritical bool) {

Unfortunately I'm running on GKE so I don't have access to the controller logs. I'm fairly certain if I were to delete the misplaced pod it simply wouldn't be recreated and the issue would be resolved, but I'm curious to know if this is expected behaviour.

Thanks!

fejta-bot · 2019-02-17T20:14:10Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-03-19T20:58:49Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2019-04-18T21:30:30Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2019-04-18T21:30:37Z

@fejta-bot: Closing this issue.


k8s-ci-robot added the kind/bug and needs-sig labels Nov 15, 2018

k8s-ci-robot added the sig/scheduling label and removed the needs-sig label Nov 15, 2018

k8s-ci-robot added the lifecycle/stale label Feb 17, 2019

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 19, 2019

k8s-ci-robot closed this as completed Apr 18, 2019
