kube-proxy scheduled before node is initialised by cloud-controller #1027

Closed
NeilW opened this issue Jul 27, 2018 · 23 comments

@NeilW commented Jul 27, 2018

BUG REPORT

Versions

kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Brightbox
  • OS (e.g. from /etc/os-release):
    Ubuntu 18.04 LTS
  • Kernel (e.g. uname -a):
    Linux srv-d35vu 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

kube-proxy is scheduled on new worker nodes before the cloud-controller has initialised the node addresses. This causes kube-proxy to fail to pick up the node's IP address properly, with knock-on effects on the proxy function managing load balancers.

ubuntu@srv-d35vu:~$ kubectl -n kube-system logs kube-proxy-xvd8m 
I0727 14:10:07.854300       1 server_others.go:183] Using ipvs Proxier.
W0727 14:10:07.878777       1 server.go:610] Failed to retrieve node IP: host IP unknown; known addresses: []
W0727 14:10:07.879942       1 proxier.go:340] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
I0727 14:10:07.880198       1 server_others.go:210] Tearing down inactive rules.
I0727 14:10:07.917875       1 server.go:448] Version: v1.11.1
I0727 14:10:07.931360       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0727 14:10:07.931503       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0727 14:10:07.934345       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0727 14:10:07.945352       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0727 14:10:07.946073       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0727 14:10:07.946358       1 config.go:202] Starting service config controller
I0727 14:10:07.946479       1 controller_utils.go:1025] Waiting for caches to sync for service config controller
I0727 14:10:07.947422       1 config.go:102] Starting endpoints config controller
I0727 14:10:07.947586       1 controller_utils.go:1025] Waiting for caches to sync for endpoints config controller
I0727 14:10:08.046848       1 controller_utils.go:1032] Caches are synced for service config controller
I0727 14:10:08.047791       1 controller_utils.go:1032] Caches are synced for endpoints config controller

What you expected to happen?

kube-proxy should probably block on the uninitialised taint via its configuration (or respond to the address-update event, if that is possible).

Not sure if kubeadm uses its own DaemonSet specification or just picks up an upstream one.

How to reproduce it (as minimally and precisely as possible)?

Run 'kubeadm init' with 'cloud-provider: external' set and join a worker to the cluster. kube-proxy will be scheduled and run on all nodes even with the uninitialised taint in place.
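
On a freshly joined worker the external cloud provider taint is still present in the node spec, roughly like this (a sketch of the relevant fragment of the node object; the cloud-controller-manager removes it once it has initialised the node):

spec:
  taints:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
    effect: NoSchedule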

Anything else we need to know?

Deleting the pod and causing a reload picks up the node IP on the worker.

ubuntu@srv-d35vu:~$ kubectl -n kube-system delete pods kube-proxy-xvd8m 
pod "kube-proxy-xvd8m" deleted
ubuntu@srv-d35vu:~$ kubectl -n kube-system logs kube-proxy-
kube-proxy-b2k9z  kube-proxy-l2r6m  
ubuntu@srv-d35vu:~$ kubectl -n kube-system logs kube-proxy-l2r6m 
I0727 14:11:34.441117       1 server_others.go:183] Using ipvs Proxier.
I0727 14:11:34.456983       1 server_others.go:210] Tearing down inactive rules.
E0727 14:11:34.495629       1 proxier.go:423] Failed to execute iptables-restore for nat: exit status 1 (iptables-restore: line 7 failed
)
I0727 14:11:34.500124       1 server.go:448] Version: v1.11.1
I0727 14:11:34.512962       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0727 14:11:34.513311       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0727 14:11:34.513647       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0727 14:11:34.514939       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0727 14:11:34.515210       1 config.go:102] Starting endpoints config controller
I0727 14:11:34.515325       1 controller_utils.go:1025] Waiting for caches to sync for endpoints config controller
I0727 14:11:34.515507       1 config.go:202] Starting service config controller
I0727 14:11:34.515605       1 controller_utils.go:1025] Waiting for caches to sync for service config controller
I0727 14:11:34.615648       1 controller_utils.go:1032] Caches are synced for endpoints config controller
I0727 14:11:34.615840       1 controller_utils.go:1032] Caches are synced for service config controller

kubeadm.conf is

apiVersion: kubeadm.k8s.io/v1alpha2
apiServerExtraArgs:
  cloud-provider: external
controllerManagerExtraArgs:
  cloud-provider: external
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: external
kind: MasterConfiguration
clusterName: kubernetes
api:
  advertiseAddress: 0.0.0.0
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 172.30.0.0/16
bootstrapTokens:
  - token: n14rmi.zutbuixp6uzbmcop
kubeProxy:
  config:
    mode: ipvs
    ipvs:
      scheduler: lc

@neolit123 (Member) commented Jul 27, 2018

> Run 'kubeadm init' with 'cloud-provider: external' set and join a worker to the cluster. kube-proxy will be scheduled and run on all nodes even with the uninitialised taint in place.

if this cannot be fixed by adjusting the kubeadm kube-proxy manifest and config map:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/addons/proxy/manifests.go

or by adjusting the kube-proxy config that the kubeadm config embeds:
https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/apis/kubeproxyconfig/types.go#L100

then it's hard to qualify this as a kubeadm issue.

@NeilW (Author) commented Jul 27, 2018

Thanks. I'll have a play with the DaemonSet, and if I can't do anything there I'll kick it upstairs.

@NeilW (Author) commented Jul 31, 2018

Running kube-proxy regardless of taints via the '- operator: Exists' toleration (re-introduced in 8dcb980) means that kube-proxy runs before the node IPs are initialised by the cloud provider. That breaks kube-proxy.
kube-proxy has to respect the 'node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule' taint put in place by the kubelet.
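
For reference, the toleration block in the kubeadm kube-proxy DaemonSet looks roughly like this (a minimal sketch based on the manifest linked earlier in the thread; the bare 'operator: Exists' entry has no key, so it tolerates every taint, including the uninitialised one):

tolerations:
- key: CriticalAddonsOnly
  operator: Exists
- operator: Exists   # empty key: matches every taint on the node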

@neolit123 (Member) commented Jul 31, 2018

@NeilW
kubernetes/kubernetes#65931
It fixed one thing but broke another. We did something similar with the last changes in that file before that.

We are really back and forth on this one, and to be honest I don't know what's best here in terms of rules, except that I know for sure that we need to expose those hardcoded addon configurations to the users and let them adjust the values they want.

/assign @timothysc
/assign @luxas

@NeilW (Author) commented Jul 31, 2018

It looks like an architectural issue that's falling through the gap.

@seh commented Aug 8, 2018

It's unfortunate that TolerationOperator only offers the "Exists" and "Equal" predicates, and not the "NotIn" predicate understood by node selectors. With "NotIn," we could express "everything except this whitelist of taints"—such as the cloud provider one mentioned above.
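
A sketch of the contrast (the second form is purely hypothetical; Kubernetes tolerations only support the Exists and Equal operators):

# What a DaemonSet can express today: tolerate everything, uninitialised taint included
tolerations:
- operator: Exists

# Hypothetical "tolerate everything except these keys" form (NOT valid Kubernetes YAML)
tolerations:
- operator: NotIn        # hypothetical operator
  values:
  - node.cloudprovider.kubernetes.io/uninitialized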

@timothysc (Member) commented Oct 11, 2018

@bsalamat (Member) commented Oct 12, 2018

I believe this issue happened at the time when kube-proxy (which is a daemon) was being scheduled by the DaemonSet controller. I wonder if the same issue exists now (k8s 1.12+), when daemons are scheduled by the default scheduler.

@NeilW (Author) commented Oct 12, 2018

Same problem

$ kubectl -n kube-system logs kube-proxy-6hw29
W1012 14:36:33.306545       1 server.go:609] Failed to retrieve node IP: host IP unknown; known addresses: []
I1012 14:36:33.306605       1 server_others.go:189] Using ipvs Proxier.
W1012 14:36:33.307482       1 proxier.go:328] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
I1012 14:36:33.307639       1 server_others.go:216] Tearing down inactive rules.
I1012 14:36:33.337372       1 server.go:447] Version: v1.12.1
I1012 14:36:33.348285       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1012 14:36:33.348458       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1012 14:36:33.348861       1 conntrack.go:83] Setting conntrack hashsize to 32768
I1012 14:36:33.359809       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1012 14:36:33.360044       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1012 14:36:33.360491       1 config.go:102] Starting endpoints config controller
I1012 14:36:33.360598       1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1012 14:36:33.360724       1 config.go:202] Starting service config controller
I1012 14:36:33.361784       1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1012 14:36:33.460899       1 controller_utils.go:1034] Caches are synced for endpoints config controller
I1012 14:36:33.462227       1 controller_utils.go:1034] Caches are synced for service config controller

@NeilW (Author) commented Oct 12, 2018

The patch I use to work around the issue is:

kubectl -n kube-system patch ds kube-proxy -p='{ "spec": { "template": { "spec": { "tolerations": [ { "key": "CriticalAddonsOnly", "operator": "Exists" }, { "effect": "NoSchedule", "key": "node-role.kubernetes.io/master" } ] } } } }'
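
In DaemonSet YAML terms, the patched tolerations end up as follows (a straight translation of the JSON above; the bare 'operator: Exists' entry is gone, so the cloud provider's uninitialised taint is respected again):

spec:
  template:
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
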
@timothysc timothysc added this to the v1.13 milestone Oct 26, 2018
@timothysc timothysc removed this from the v1.13 milestone Oct 31, 2018

@seh commented Nov 21, 2018

Has anyone considered adding a "daemonTaints" field to kubeadm's ClusterConfiguration type that would allow operators to define locally-known taints that all system daemons like kube-proxy should tolerate? An example from my world: "nvidia.com/gpu" with the "NoSchedule" effect. I don't expect kubeadm to know about that, but it would be nice if there was a common, structured way to tell it about that.
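
A rough sketch of what such a field could look like in the kubeadm config, using the GPU taint from the example above (entirely hypothetical; no such field exists in ClusterConfiguration):

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
daemonTaints:                # hypothetical field: taints that system daemons should tolerate
- key: nvidia.com/gpu
  effect: NoSchedule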

@neolit123 (Member) commented Nov 21, 2018

So one of the reasons we didn't expose full control of the addon manifests in the latest iteration of the config was that we then lose some of that control when we do upgrades.

So by planning a field such as "daemonTaints" we need to evaluate upgrade scenarios too.

Overall this falls into the bucket of customization that we currently simply do not allow and that users need to patch.

@neolit123 (Member) commented Nov 21, 2018

This idea is mostly outdated (and not approved), but it spawned a bit of a discussion about the problems at hand: #1091

@seh commented Nov 21, 2018

Thanks for the background.

Instead of "daemonTaints," I should have said "daemonTolerations," but you get the idea.

@timothysc timothysc added this to the v1.14 milestone Jan 7, 2019
@timothysc timothysc self-assigned this Feb 13, 2019

@pablochacin commented Feb 13, 2019

@timothysc I'm interested in helping with this issue, or taking it over, as I understand it is related to the broader add-on issue that @neolit123 suggested.

@neolit123 neolit123 added kind/feature and removed kind/bug labels Feb 13, 2019

@neolit123 (Member) commented Feb 13, 2019

Here are some key points in this thread:
#1027 (comment)
#1027 (comment)
#1027 (comment)

This isn't really a kubeadm bug, because in kubernetes/kubernetes#65931 we arguably fixed a bigger problem. But with that PR we introduced this problem, which then sort of transitions into a feature request for the scheduler, as outlined here: #1027 (comment)

I don't think we can do much in this ticket for this cycle in terms of code.
What we can do is document the workaround here:
#1027 (comment)

If anyone wants to help, this is the place:
https://github.com/kubernetes/website/blob/master/content/en/docs/setup/independent/troubleshooting-kubeadm.md

/kind documentation

@neolit123 (Member) commented Mar 7, 2019

I will send a website PR to add a troubleshooting note.

@neolit123 (Member) commented Mar 8, 2019

Sent a docs PR to document the workaround:
kubernetes/website#13033

Moving this to the Next milestone.
We need to figure out whether this is still a problem in the latest 1.13 and 1.14.

@neolit123 neolit123 modified the milestones: v1.14, Next Mar 8, 2019

@NeilW (Author) commented Mar 8, 2019

Unfortunately it is. If I remove the toleration patch I get

$ kubectl -n 'kube-system' logs kube-proxy-m4ssh
W0308 10:27:28.506237       1 node.go:108] Failed to retrieve node IP: host IP unknown; known addresses: []
I0308 10:27:28.506517       1 server_others.go:189] Using ipvs Proxier.
W0308 10:27:28.507144       1 proxier.go:366] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
I0308 10:27:28.509332       1 server_others.go:216] Tearing down inactive rules.
I0308 10:27:28.540195       1 server.go:483] Version: v1.13.4

when building a new cluster with Terraform.

@fejta-bot commented Jun 6, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@fejta-bot commented Jul 6, 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@neolit123 (Member) commented Aug 3, 2019

Docs PR merged, but not much we can do here on the kubeadm side to conform to this use case.
See #1027 (comment)
/close

@k8s-ci-robot (Contributor) commented Aug 3, 2019

@neolit123: Closing this issue.

In response to this:

> Docs PR merged, but not much we can do here on the kubeadm side to conform to this use case.
> See #1027 (comment)
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
