This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Controller node not being properly tainted #199

Merged
merged 2 commits into master on Jan 11, 2017

Conversation

@artushin (Contributor) commented Jan 4, 2017

/opt/bin/taint-and-uncordon had a syntax issue with the kubectl taint command. It was failing with "error: at least one taint update is required", and the controller node was still getting pods scheduled to it. This PR fixes the syntax as per http://kubernetes.io/docs/user-guide/kubectl/kubectl_taint/
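For reference, a minimal sketch of the corrected invocation per the kubectl docs linked above (with $hostname standing in for the controller's node name):

    # The taint must be passed as a single key=value:effect argument. Without the "=",
    # kubectl presumably treats the argument as another node name, sees no taint spec,
    # and fails with "error: at least one taint update is required".
    kubectl taint node "$hostname" "node.alpha.kubernetes.io/role=master:NoSchedule"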

@codecov-io commented Jan 4, 2017

Current coverage is 68.87% (diff: 100%)

Merging #199 into master will decrease coverage by 0.12%

@@             master       #199   diff @@
==========================================
  Files             4          4          
  Lines          1132       1134     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits            781        781          
- Misses          262        263     +1   
- Partials         89         90     +1   

Powered by Codecov. Last update 2b29fc3...fb34129

@mumoshu (Contributor) commented Jan 6, 2017

@artushin Thanks for your help!
I'm not sure how a mistake like this slipped into the v0.9.2-rc.2 release, but your fix does seem to be correct.
Please give me a day or two to test before/after merging your fix by running our e2e tests.

@mumoshu added this to the v0.9.3-rc.3 milestone on Jan 6, 2017
@mumoshu (Contributor) commented Jan 6, 2017

Update: I've confirmed that @artushin's fix does work as expected:

$ kubectl describe node ip-10-0-0-193.ap-northeast-1.compute.internal
Name:  			ip-10-0-0-193.ap-northeast-1.compute.internal
Labels:			beta.kubernetes.io/arch=amd64
       			beta.kubernetes.io/instance-type=t2.medium
       			beta.kubernetes.io/os=linux
       			failure-domain.beta.kubernetes.io/region=ap-northeast-1
       			failure-domain.beta.kubernetes.io/zone=ap-northeast-1a
       			kube-aws.coreos.com/autoscalinggroup=kubeawstest1-AutoScaleController-1MXSGQDT5DGKF
       			kube-aws.coreos.com/launchconfiguration=kubeawstest1-LaunchConfigurationController-1EGZPO0LU1JJO
       			kubernetes.io/hostname=ip-10-0-0-193.ap-northeast-1.compute.internal
Taints:			node.alpha.kubernetes.io/role=master:NoSchedule
*snip*

@artushin (Contributor, Author) commented Jan 6, 2017

Taint-and-uncordon still fails:

taint-and-uncordon[25536]: [14820.578848] hyperkube[5]: error: Node 'ip-10-0-0-224.us-west-2.compute.internal' already has a taint with key (node.alpha.kubernetes.io/role) and effect (NoSchedule), and --overwrite is false

With that fixed, I'm seeing a failure with

taint-and-uncordon[26251]: rm: unable to remove pod "1696c88c-99f4-4810-bb77-d406315f1452": remove /var/lib/rkt/pods/exited-garbage/1696c88c-99f4-4810-bb77-d406315f1452/stage1/rootfs: device or resource busy

I haven't used rkt, so that'll require a bit of investigation. Also, my controller eventually becomes CPU-bound. Maybe because of this issue and a ton of un-garbage-collected rkt images? I dunno.
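In case it helps anyone debugging the same thing, a rough sketch (not from this PR) of checking whether uncollected rkt pods are piling up on the controller, assuming rkt's default data directory:

    # List pods rkt knows about; repeatedly failing taint-and-uncordon runs show up
    # here as exited pods that never get removed.
    sudo rkt list
    # Count pods stuck waiting for garbage collection (default data directory assumed).
    sudo ls /var/lib/rkt/pods/exited-garbage | wc -l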

@artushin (Contributor, Author) commented Jan 6, 2017

    sudo rkt run
    ...
     --uuid-file-save=/var/run/coreos/taint-and-uncordon.uuid \
    ...
    sudo rkt rm --uuid-file=/var/run/coreos/taint-and-uncordon.uuid

seems broken. Can you just rely on regular gc?
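Regarding relying on regular gc: a hedged sketch of what that could look like once the explicit rkt rm is dropped (the grace period here is only an example value):

    # rkt gc removes exited pods that have been stopped for longer than the grace
    # period; --grace-period=0s collects them immediately. CoreOS hosts of that era
    # typically also ran a periodic rkt gc for the same purpose.
    sudo rkt gc --grace-period=0s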

Getting rid of that seems to have fixed my CPU throttle issue too:
[screenshot: controller CPU usage, 2017-01-06 2:38 pm]
[screenshot: controller CPU usage, 2017-01-06 2:51 pm]

@reiinakano (Contributor) commented Jan 8, 2017

Any updates on this? Is the snippet

    sudo rkt run
    ...
    --uuid-file-save=/var/run/coreos/taint-and-uncordon.uuid
    ...
    sudo rkt rm --uuid-file=/var/run/coreos/taint-and-uncordon.uuid

safe to remove without affecting other parts?

I'm having problems with the CPU usage (exactly as artushin experienced). It turns out that if I reboot the instance, the CPU usage resets; however, weird things happen after rebooting, e.g. pods not being able to see each other, multiple restarts of the DNS and apiserver containers, etc.

Update:
Went ahead and removed it, then rebuilt my cluster. Running smoothly now.

@redbaron (Contributor) commented Jan 8, 2017

@reiinakano, @artushin these commands delete the container which rkt created to run the command; do you have any idea how that might be related to high CPU time?

@artushin (Contributor, Author) commented Jan 8, 2017

@redbaron When I was seeing the issue, rkt list had a ton of errors like

list: Unable to read pod fb7842e7-ab0a-49fb-845a-2399b36a8151 manifest:
  error reading pod manifest

so I assume those failed taint-and-uncordon containers were being kept around, possibly increasing CPU usage from failing GC? Someone with more rkt knowledge should look into the failure case of running the taint-and-uncordon job. In the meantime, if the script passes, the CPU usage issue shouldn't occur.

I don't understand why simply getting that script to run without error relieved so much CPU, though; unlike @reiinakano, I didn't rebuild the cluster, I just edited the script in /opt.

@whereisaaron (Contributor) commented:

Even after editing the /opt/bin script so that the taint works, the task still just fails over and over. As @artushin noted, the sudo rkt rm --uuid-file=/var/run/coreos/taint-and-uncordon.uuid always seems to fail with rm: unable to remove pod.

Jan 08 21:15:57 ip-1.1.1.1.ap-southeast-2.compute.internal taint-and-uncordon[13188]: rm: unable to remove pod "53f4b000-7782-4b60-81b5-2997$
Jan 08 21:15:57 ip-1.1.1.1.ap-southeast-2.compute.internal systemd[1]: kube-node-taint-and-uncordon.service: Main process exited, code=exite$
Jan 08 21:15:57 ip-1.1.1.1.ap-southeast-2.compute.internal systemd[1]: kube-node-taint-and-uncordon.service: Unit entered failed state.
Jan 08 21:15:57 ip-1.1.1.1.ap-southeast-2.compute.internal systemd[1]: kube-node-taint-and-uncordon.service: Failed with result 'exit-code'. 

These failed but not-cleaned-up rkt pods seem to mount up and grow a docker clean-up task that runs about every 60s. I don't know how rkt works either, but it may be this clean-up task that gradually grows and increases the load as the number of failed containers mounts.

You can see the slow but inevitable death-spiral on this controller I just made. The taint-and-uncordon script is patched to always work, but the rkt task fails each time, the list of pods in the 60s clean-up task gets longer and longer, and the CPU load climbs higher and higher.

[screenshot: taint-problem]
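A rough sketch of how to watch this loop on an affected controller (the unit name is taken from the journal output above; the rest is standard systemd/rkt tooling):

    # Current state and most recent failure of the tainting unit.
    systemctl status kube-node-taint-and-uncordon.service
    # Follow its journal to see each retry die on the rkt rm step.
    journalctl -u kube-node-taint-and-uncordon.service -f
    # Watch the backlog of exited-but-uncollected pods grow over time.
    while true; do sudo rkt list | grep -c exited; sleep 60; done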

@whereisaaron (Contributor) commented:

Hi, I remade the same cluster as above, this time fully removing the taint-and-uncordon task from the cloud-init file. Comparing the CPU profiles, it is back to normal and comparable with my other clusters, settling at 15% soon after launch. Compare that with when the taint-and-uncordon task was still there: the CPU was never below 60% and slowly climbed to 100% over time.

[screenshot: taint-removed]

@whereisaaron (Contributor) commented:

Although the controller is stable with taint-and-uncordon removed, the problem with the periodic clean-up task persists; it just doesn't get any worse. Every time it is just the same eight containers mentioned in the errors every 60 seconds, plus the error opening '/run/docker/libcontainerd/docker-containerd.pid'.

I guess there is a deeper problem with the rkt container tasks that we just don't notice until one goes awry like taint-and-uncordon does. We might be able to reduce the controller CPU load further by solving this problem:

Jan 08 23:51:33 ip-1-1-1-1.region.compute.internal kubelet-wrapper[1537]: E0108 23:51:33.217869    1537 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Jan 08 23:51:33 ip-1-1-1-1.region.compute.internal dockerd[1421]: time="2017-01-08T23:51:33.233938247Z" level=error msg="Handler for GET /containers/3da5a08fd693749466ccfd1ed1b52164619bf86c4a2a39f1c0caafdf6c450ccd/json returned error: No such container: 3da5a08fd693749466ccfd1ed1b52164619bf86c4a2a39f1c0caafdf6c450ccd"
Jan 08 23:51:33 ip-1-1-1-1.region.compute.internal dockerd[1421]: time="2017-01-08T23:51:33.267776846Z" level=error msg="Handler for GET /containers/73ddc461d2b1f0746853940eadb7951b939d0ae4e0a5257a41e17bbf48646293/json returned error: No such container: 73ddc461d2b1f0746853940eadb7951b939d0ae4e0a5257a41e17bbf48646293"
Jan 08 23:51:33 ip-1-1-1-1.region.compute.internal dockerd[1421]: time="2017-01-08T23:51:33.273305898Z" level=error msg="Handler for GET /containers/1bc2af7f3283cd951ad93ed8918761bac247cef4d6b366fa0e071addd3583003/json returned error: No such container: 1bc2af7f3283cd951ad93ed8918761bac247cef4d6b366fa0e071addd3583003"
Jan 08 23:51:33 ip-1-1-1-1.region.compute.internal dockerd[1421]: time="2017-01-08T23:51:33.282636512Z" level=error msg="Handler for GET /containers/f68bdb07afd94c87b2c55060894bef4b4e30a9a9b5e56c4e068de74ac34d7e8e/json returned error: No such container: f68bdb07afd94c87b2c55060894bef4b4e30a9a9b5e56c4e068de74ac34d7e8e"
Jan 08 23:51:33 ip-1-1-1-1.region.compute.internal dockerd[1421]: time="2017-01-08T23:51:33.283499144Z" level=error msg="Handler for GET /containers/e18bc18d22fc7cf5722ed3bf39ade10e59b78809ef5786c8fc71d77fd0869132/json returned error: No such container: e18bc18d22fc7cf5722ed3bf39ade10e59b78809ef5786c8fc71d77fd0869132"
Jan 08 23:51:33 ip-1-1-1-1.region.compute.internal dockerd[1421]: time="2017-01-08T23:51:33.284049871Z" level=error msg="Handler for GET /containers/e071cceebc9c4a2813fd3c61be60e4388b6d98bbee77092fd1a3439fb64af2ff/json returned error: No such container: e071cceebc9c4a2813fd3c61be60e4388b6d98bbee77092fd1a3439fb64af2ff"
Jan 08 23:51:33 ip-1-1-1-1.region.compute.internal dockerd[1421]: time="2017-01-08T23:51:33.284594161Z" level=error msg="Handler for GET /containers/3755d84cb13cc852423bb29635476a7a4f3d9053d254c376432add0af64b61d0/json returned error: No such container: 3755d84cb13cc852423bb29635476a7a4f3d9053d254c376432add0af64b61d0"
Jan 08 23:51:33 ip-1-1-1-1.region.compute.internal dockerd[1421]: time="2017-01-08T23:51:33.285088451Z" level=error msg="Handler for GET /containers/a30eb84ec2e8f61118b9a4715b11d640db1d1dc91d86e4a871f4657697ee2e1d/json returned error: No such container: a30eb84ec2e8f61118b9a4715b11d640db1d1dc91d86e4a871f4657697ee2e1d"

@redbaron (Contributor) commented Jan 9, 2017

Do these container IDs have anything in common? Are they from the same failed/exited container?

Update: also, the previous errors you reported were from rkt list; these ones are from dockerd.
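For what it's worth, a sketch of how one might cross-check (purely illustrative, using the first ID from the dockerd errors above):

    ID=3da5a08fd693749466ccfd1ed1b52164619bf86c4a2a39f1c0caafdf6c450ccd
    # Does docker itself still know about the container?
    docker ps -a --no-trunc | grep "$ID" || echo "not known to docker"
    # Has anything logged this ID since boot (kubelet, dockerd, rkt, ...)?
    journalctl -b | grep "$ID" | head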

@whereisaaron (Contributor) commented:

@redbaron I don't know what those container IDs are from; there was no obvious mention of them in journalctl since boot. The 60s clean-up errors were always dockerd errors, and that list grew while the rkt task was repeatedly failing. I have no clue how or if they are related. It may be a red herring; I don't actually know that the growing clean-up errors are behind the growing CPU load. I just know that taint-and-uncordon failing continuously is bad news for the stability of my controller nodes :-)

@mumoshu (Contributor) commented Jan 10, 2017

@redbaron @artushin @whereisaaron @reiinakano I'd been seeing the rkt rm errors while testing my cluster with the fix, but I had no luck figuring out their implications/effects. Thanks for all the investigation you've done 🙇

AFAIK the rkt rm issue is caused by rkt/rkt#3181, and the workarounds I'm aware of are:

Edit: The first option doesn't work in this case; it still fails cleaning pod resources:

Jan 10 00:56:56 ip-10-0-0-113.ap-northeast-1.compute.internal taint-and-uncordon[21244]: rm: unable to remove pod "f2b04f46-6b60-4967-956c-59167d377079": remove /var/lib/rkt/pods/exited-garbage/f2b04f46-6b60-4967-956c-59167d377079/stage1/rootfs: device or resource busy
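If the "device or resource busy" comes from a lingering stage1 rootfs mount, which is roughly what rkt/rkt#3181 describes, a hedged way to check on an affected node is:

    # Any stage1 rootfs of an exited-garbage pod still mounted (possibly held in
    # another mount namespace, in which case it cannot simply be unmounted from here)
    # would explain why rkt rm/gc report "device or resource busy".
    mount | grep '/var/lib/rkt/pods/exited-garbage'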

@mumoshu (Contributor) commented Jan 10, 2017

I'd rather avoid rkt here so that we can completely solve this issue, at least until it is fixed in rkt v1.22.0 via rkt/rkt#3486. Therefore, for now, I'd suggest changing /opt/bin/taint-and-uncordon to something like:

      #!/bin/bash -e

      hostname=$(hostname)

      docker run --rm --net=host \
        -v /etc/kubernetes:/etc/kubernetes \
        -v /etc/resolv.conf:/etc/resolv.conf \
        {{.HyperkubeImageRepo}}:{{.K8sVer}} /bin/bash \
          -vxc \
          'echo tainting this node; \
           hostname="'${hostname}'"; \
           kubectl="/kubectl --server=http://127.0.0.1:8080"; \
           taint="$kubectl taint node $hostname"; \
           $taint "node.alpha.kubernetes.io/role=master:NoSchedule"; \
           echo done. ;\
           echo uncordoning this node; \
           $kubectl uncordon $hostname;\
           echo done.'

@whereisaaron (Contributor) commented Jan 10, 2017

Agree, best to avoid rkt until that gnarly issue is fixed.

  • Suggest '-vxec' for better error handling and to avoid uncordoning an untainted node by mistake if the taint command throws an error (as happened with the '=' typo).
  • Suggest '--overwrite' just in case of a re-run due to the uncordon throwing an error the first time or some other error (as happened with the rkt task).
  • Suggest a bunch of pedantic and unnecessary quoting and message changes for consistency 😸
      #!/bin/bash -e

      hostname=$(hostname)

      docker run --rm --net=host \
        -v /etc/kubernetes:/etc/kubernetes \
        -v /etc/resolv.conf:/etc/resolv.conf \
        {{.HyperkubeImageRepo}}:{{.K8sVer}} /bin/bash \
          -vxec \
          'echo "tainting this node."; \
           hostname="'${hostname}'"; \
           kubectl="/kubectl --server=http://127.0.0.1:8080"; \
           taint="$kubectl taint node --overwrite"; \
           $taint "$hostname" "node.alpha.kubernetes.io/role=master:NoSchedule"; \
           echo "done."; \
           echo "uncordoning this node."; \
           $kubectl uncordon "$hostname"; \
           echo "done."'

@mumoshu (Contributor) commented Jan 10, 2017

@whereisaaron I agree on every point 👍

@mumoshu (Contributor) commented Jan 10, 2017

Hi @artushin, would you mind updating your PR to use docker instead of rkt as @whereisaaron suggested in #199 (comment) if you're also ok with it?

A second commit was added to this pull request: "using script supplied in @whereisarron's comment in PR 199."
@artushin (Contributor, Author) commented:

LGTM. @whereisaaron, switching to exactly what you provided.

whereisaaron added a commit to whereisaaron/kube-aws that referenced this pull request Jan 10, 2017
rkt has a gnarly bug (rkt/rkt#3181) that won't be fixed in a hurry (rkt/rkt#3486). It leads to continuous task failures that eventually totally wreck worker nodes (kubernetes-retired#244). In the meantime we can use docker just as easily for this simple task. This workaround was discussed in kubernetes-retired#199.
@mumoshu merged commit 18c5c03 into kubernetes-retired:master on Jan 11, 2017
@mumoshu (Contributor) commented Jan 11, 2017

LGTM. Thanks to all for your contribution! 🙇 @redbaron @artushin @whereisaaron @reiinakano

@reiinakano (Contributor) commented:

Quick question: could I pull in this new merge and do kube-aws update without having to rebuild my cluster? Sorry, I don't really understand the internals of kube-aws.

@mumoshu (Contributor) commented Jan 11, 2017

@reiinakano Yes, it is designed to work in this case. However, please be aware that it isn't strictly tested with every combination of changes. Would you mind taking a look at the "full update" section (which is your case) of the relevant kube-aws doc for more info on what kinds of updates kube-aws is intended to support, and how?

@mumoshu (Contributor) commented Jan 11, 2017

Also note that if you are going to update worker nodes and have not yet enabled experimental.nodeDrainer and set correct grace periods for your pods (with enough replicas), you may encounter some downtime.

kube-aws replaces your nodes one by one, hence you have to teach kube-aws, your Kubernetes cluster, and your pods how to tolerate a single node being replaced.
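For reference, a minimal cluster.yaml sketch of the flag mentioned above; the exact key layout is an assumption, so check it against the kube-aws version you run:

    # hedged sketch: drain a node before kube-aws replaces it (keys assumed)
    experimental:
      nodeDrainer:
        enabled: true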

@reiinakano (Contributor) commented:

I see. Thanks @mumoshu !

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018
rkt has a gnarly bug (rkt/rkt#3181) that won't be fixed in a hurry (rkt/rkt#3486). It leads to continuous task failures that eventually totally wreck worker nodes (kubernetes-retired#244). In the meantime we can use docker just as easily for this simple task. This workaround was discussed in kubernetes-retired#199.
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018
Controller node not being properly tainted