
cluster creating hundreds of pods because node is down #6303

Closed
migs35323 opened this issue Oct 20, 2022 · 4 comments

@migs35323

Not sure if this is the right place to ask this, or if it belongs in Rancher.
I have a k3s cluster with a few nodes, and a deployment of an app that I configured to schedule only on one particular node. I used Rancher for this (Deployment > Config > Node Scheduling > Run on specific node).
The thing is, that particular node went down, and when it came back up the cluster had attempted to create hundreds of pods. The cluster was overflowing with terminating pods, hundreds or thousands of them at one point.

Is there a way to make the deployment simply not try to create pods while the only eligible machine is down?

@brandond
Contributor

You didn't fill out the issue template, so I'm not sure what version of K3s you're working with or what your cluster configuration is.

The deployment controller is not expected to keep creating pods when there is no node available to schedule them on, nor to keep creating them once a node does become available again. Are you using an autoscaler that scaled up the deployment replica count in an attempt to create pods? Can you post more information on what specifically you're seeing, including `kubectl get deployment -o yaml` for your deployment and `kubectl get pod -o yaml` showing the pods in question?
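
Something along these lines would capture what I'm after (the deployment name and namespace below are placeholders):

```sh
# Placeholders: replace "gitlab-runner" and <namespace> with the real names.
kubectl get deployment gitlab-runner -n <namespace> -o yaml
kubectl get pod -n <namespace> -o yaml
# Node status at the time would also help:
kubectl get nodes -o wide
```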

@migs35323
Author

migs35323 commented Oct 21, 2022

k3s v1.24.4+k3s1, cluster managed with Rancher v2.6.8

- 1 master with a no-schedule taint
- 2 normal nodes
- 1 extra node with PreferNoSchedule

The extra node (the last one) is where the workload in question was running; that machine went down for a day.

The most recent app where I hit this is a standard GitLab Runner deployment; I had the same situation with another deployment before, where I basically followed the same steps.

Here's the deployment (I had to redact most info). Rancher applied the `spec.nodeName` field. I believe that when the cluster tries to schedule the pod it doesn't check or care whether the machine is down: it sees the pod is not up and tries again.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner-
  namespace:
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  ...
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    ...
    spec:
      affinity: {}
      containers:
        - command:
          ...
          image: gitlab/gitlab-runner
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              ...
          livenessProbe:
            exec:
              command:
              ...
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: gitlab-runner
          ...
          readinessProbe:
            ...
            failureThreshold: 3
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources: {}
          ...
      initContainers:
        - command:
            - sh
          ...
          image: gitlab/gitlab-runner
          imagePullPolicy: IfNotPresent
          name: configure
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
          ...
      nodeName: ${EXTRA_NODE}
      restartPolicy: Always
      schedulerName: default-scheduler
      ...
status:
  availableReplicas: 1
  conditions:
    - lastTransitionTime:
      lastUpdateTime:
      message: ReplicaSet "gitlab-runner" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
    - lastTransitionTime:
      lastUpdateTime:
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
```
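
For reference, a minimal sketch of pinning the workload through a `nodeSelector` on the node's default `kubernetes.io/hostname` label instead of `spec.nodeName`, so placement goes through the default scheduler and replacement pods should just sit Pending while the node is down rather than piling up. Names and the label value below are placeholders:

```yaml
# Sketch only: pin to a node by label instead of spec.nodeName.
# "extra-node" is a placeholder for the real node's hostname label value.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitlab-runner
  template:
    metadata:
      labels:
        app: gitlab-runner
    spec:
      nodeSelector:
        kubernetes.io/hostname: extra-node
      containers:
        - name: gitlab-runner
          image: gitlab/gitlab-runner
```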

@brandond
Contributor

That doesn't sound like a problem with K3s or Rancher then, but rather just the behavior of Kubernetes itself when you configure a Deployment this way.

@caroline-suse-rancher
Contributor

I'm going to convert this to a discussion as this seems like more of a question than a bug report.

@k3s-io k3s-io locked and limited conversation to collaborators Apr 19, 2023
@caroline-suse-rancher caroline-suse-rancher converted this issue into discussion #7318 Apr 19, 2023
