Daemonset Pod Misscheduled on Tainted Node #71086
Comments
/sig scheduling
I assume the master is also running 1.11. Also, can you provide the DS pod spec? I wonder if any default tolerations got applied to the pod, or if the pod is a critical pod.
Yeah, master is also on 1.11. Pod template:

…

The pod is critical (…)
I think yes, we add tolerations to critical pods:
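As a sketch of what those defaults look like in the 1.11 era (the exact set below is an assumption based on upstream defaults, not the code originally referenced here):

```yaml
# Sketch of 1.11-era default DaemonSet tolerations (assumed set, not
# the code originally referenced in this comment). Note that these
# only tolerate node.kubernetes.io/* taints, not user-defined taints.
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
- key: node.kubernetes.io/disk-pressure
  operator: Exists
  effect: NoSchedule
- key: node.kubernetes.io/memory-pressure
  operator: Exists
  effect: NoSchedule
- key: node.kubernetes.io/unschedulable
  operator: Exists
  effect: NoSchedule
```

Since none of these match a user-defined taint key, they alone would not explain a DS pod landing on a custom-tainted node.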
But to be fair, this should have been the case earlier as well: before upgrading to 1.11, this DS should have landed on the node irrespective of taints, since we check only essential predicates for critical pods. Isn't that the case?
There is more than one node with this taint on it too, but only one of them ended up with a scheduled DS pod on it.
It could be because of nodeSelector. Can you please check if that is the case?
The DaemonSet does have a nodeSelector of …
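As a hypothetical example of the shape (the real key and value are truncated above, so these are placeholders):

```yaml
# Hypothetical placeholder values; the real selector was truncated above.
spec:
  template:
    spec:
      nodeSelector:
        example.com/node-pool: pool-a
```

A nodeSelector narrows which labelled nodes the DS targets at all, independently of any taints on them, which could explain why only some tainted nodes received a pod.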
To be clear, do all the nodes have the same taint and label? I ask because I thought your initial comment was about the pod not being scheduled on a node with a taint.
Two sets of node types, A and B. Both nodes A and B have the label …. Nodes B also have a taint:
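Hypothetically, with placeholder names standing in for the real label and taint, a node of type B would look something like:

```yaml
# Hypothetical placeholder names; the real label and taint were not preserved.
# A node of type B: carries the shared label plus a NoSchedule taint.
apiVersion: v1
kind: Node
metadata:
  name: node-b-1
  labels:
    example.com/node-pool: pool-a   # label shared by nodes A and B
spec:
  taints:
  - key: example.com/dedicated
    value: special-workload
    effect: NoSchedule              # nodes A carry no such taint
```

With no matching toleration on the DS pods, this taint should keep them off nodes B even though the label matches.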
Normal and past behaviour is that the DaemonSet in question runs pods on all of nodes A and none on nodes B. In this scenario an additional pod belonging to this DaemonSet has been scheduled on a node B, and the DaemonSet identifies this node as having been misscheduled. The DS has also correctly scheduled pods on all nodes A. I'd understand if the critical-pod tolerations mentioned above were added to the DaemonSet, resulting in it being scheduled everywhere; however, this doesn't appear to be the case. Only one node of type B has ended up with a DS pod running on it.

Unfortunately I'm running on GKE, so I don't have access to the controller logs. I'm fairly certain that if I were to delete the misplaced pod it simply wouldn't be recreated and the issue would be resolved, but I'm curious to know whether this is expected behaviour. Thanks!
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:

A DaemonSet pod was scheduled onto a node carrying a `NO_SCHEDULE` taint that the DaemonSet does not tolerate.

Possibly related: the node that the pod was incorrectly scheduled on was a brand-new GCE instance running `1.11.2` (previous nodes were on `1.10.9`; creation time `Nov 15, 2018, 3:41:59 PM`). However, the node has the same ID as a previous node in the cluster, and Kubernetes recognises the node as being 13 days old, despite it only being created hours ago.

What you expected to happen:

DaemonSet pods should not have been scheduled on nodes with the `NO_SCHEDULE` taint, as the DaemonSet does not have any tolerations.

How to reproduce it (as minimally and precisely as possible):

Unable to replicate at this point.

Anything else we need to know?:

Environment:
- Kubernetes version (use `kubectl version`): 1.11.2
- Kernel (e.g. `uname -a`):

/kind bug