
Thousands of deploy-job pods in pending state #755

Closed
HighwayofLife opened this issue Jul 6, 2018 · 11 comments

@HighwayofLife
Contributor

RKE version:
rke version v0.1.8-rc11

Docker version: (docker version, docker info preferred)
1.12

Operating system and kernel: (cat /etc/os-release, uname -r preferred)
CoreOS

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Azure Private Cloud

I just ran the newest version of RKE against an existing cluster previously provisioned by RKE, and the rke-network-plugin-deploy-job failed during the run. When I checked the node, I noticed that CPU usage was at 95%, disk writes were going crazy, and the kubelet was consuming a huge amount of CPU. It turned out that 19,000 pods for the rke-network-plugin-deploy-job had been created and were in Pending state.

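For anyone hitting the same pile-up, a quick way to confirm it and clear the stuck pods is to filter `kubectl get pods` output. This is a sketch, not an RKE command; the namespace (`kube-system`) and the sample output below are assumptions, and the pod-name pattern comes from this report:

```shell
# Hypothetical sample of `kubectl -n kube-system get pods --no-headers` output:
sample='rke-network-plugin-deploy-job-abc12   0/1   Pending   0   1m
rke-network-plugin-deploy-job-def34   0/1   Pending   0   1m
kube-dns-xyz                          3/3   Running   0   2d'

# Count the Pending deploy-job pods ($3 is the STATUS column):
echo "$sample" | awk '$3 == "Pending" && $1 ~ /rke-network-plugin-deploy-job/' | wc -l
# prints 2 (the two Pending deploy-job pods in the sample)

# On a live cluster, pipe the real output instead and bulk-delete once
# the root cause is fixed:
#   kubectl -n kube-system get pods --no-headers \
#     | awk '$3 == "Pending" && $1 ~ /rke-network-plugin-deploy-job/ {print $1}' \
#     | xargs -r kubectl -n kube-system delete pod
```

Filtering on the STATUS column avoids relying on `--field-selector` support, which varies across older kubectl versions.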

@deniseschannon

What k8s version were you using? Was this k8s 1.11?

@HighwayofLife
Contributor Author

No, this one was 1.10.3

@willmao

willmao commented Jul 7, 2018

Having a look at the logs of the kubelet and the CNI-install-related containers will help solve your problem.

@willmao

willmao commented Jul 7, 2018

rke/k8s will keep scheduling the CNI plugin pod if it fails

@HighwayofLife
Contributor Author

It should clean up the old pod before starting a new one, and after a certain number of tries it should report an error rather than continuing to start new pods.

@moelsayed
Contributor

moelsayed commented Jul 11, 2018

@HighwayofLife This is a known issue with k8s 1.10.x. It's fixed in 1.10.5. Using RKE version 0.1.8 with the default k8s 1.10.5-rancher1-1 should resolve this.

@deniseschannon deniseschannon added this to the v0.1.9 milestone Jul 11, 2018
@deniseschannon deniseschannon self-assigned this Jul 11, 2018
@deniseschannon

@HighwayofLife Let me know if you start using k8s v1.10.5 and still have these issues.

@deniseschannon deniseschannon modified the milestones: v0.1.9, v0.1.10 Jul 20, 2018
@HighwayofLife
Contributor Author

HighwayofLife commented Jul 22, 2018 via email

@moelsayed
Contributor

@HighwayofLife Are you still seeing this with v0.1.9?

@HighwayofLife
Contributor Author

No, I have not seen this reappear in 0.1.9.

@moelsayed
Contributor

Cool. I will close this issue for now.
