
KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT weren't set in pod env #40973

Closed
hongchaodeng opened this issue Feb 5, 2017 · 29 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
sig/network Categorizes an issue or PR as relevant to SIG Network.
sig/node Categorizes an issue or PR as relevant to SIG Node.
triage/unresolved Indicates an issue that can not or will not be resolved.

Comments

@hongchaodeng
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:
    GKE

What happened:

KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT weren't set in the pod's environment.

What you expected to happen:

KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT should be set by default in the pod.

How to reproduce it (as minimally and precisely as possible):

Create an in-cluster client in a pod.

This isn't easily reproducible. We encountered this issue while running extensive e2e tests, and the logs showed that a pod crashed due to:

panic: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
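
For context, the error above is what client-go's in-cluster config loader returns when those variables are absent. A minimal sketch of an in-cluster client setup that would surface it (illustrative only, not the exact code from the failing pod):

    package main

    import (
        "log"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        // rest.InClusterConfig reads KUBERNETES_SERVICE_HOST and
        // KUBERNETES_SERVICE_PORT; it returns an error when either is unset,
        // which is what surfaces as the panic quoted above.
        cfg, err := rest.InClusterConfig()
        if err != nil {
            log.Panic(err)
        }
        clientset, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            log.Panic(err)
        }
        _ = clientset // use the clientset as usual from here
    }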
@hongchaodeng
Contributor Author

@dchen1107 @Random-Liu
Potential node issue?

@calebamiles calebamiles added kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Feb 6, 2017
@calebamiles
Contributor

Could someone from @kubernetes/sig-node-bugs please take a look. Thanks!

cc: @dchen1107

@liggitt
Member

liggitt commented Feb 7, 2017

cc @stevekuznetsov

@stevekuznetsov
Contributor

Seeing this intermittently, as you say, in Origin e2e CI.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2+43a9be4", GitCommit:"43a9be4", GitTreeState:"clean", BuildDate:"2017-02-07T16:24:34Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

@stevekuznetsov
Contributor

@pmorie was thinking this may be symptomatic of the kubernetes service not actually starting.

@stevekuznetsov
Contributor

This is occurring with pretty high incidence in our CI -- is someone from @kubernetes/sig-node-bugs assigned to triage this?

@resouer
Contributor

resouer commented Feb 10, 2017

@hongchaodeng When generating environment variables for a container, there is a known race: the kubelet may generate the env before the service has been created.

See this: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L412-L416

Maybe we should make client-go retry more times?
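
For illustration, an application-side version of that suggestion might look like the sketch below. The helper name and timings are made up, and note that kubelet injects the service env vars only when the container is created, so an in-process retry only helps if something re-creates the container (for example a crash and restart):

    import (
        "time"

        "k8s.io/client-go/rest"
    )

    // inClusterConfigWithRetry is an illustrative sketch, not a client-go API:
    // retry rest.InClusterConfig a few times before giving up. Because the env
    // vars are injected only at container creation, many workloads instead just
    // fail fast and rely on the pod's restart policy.
    func inClusterConfigWithRetry(attempts int, delay time.Duration) (*rest.Config, error) {
        var lastErr error
        for i := 0; i < attempts; i++ {
            cfg, err := rest.InClusterConfig()
            if err == nil {
                return cfg, nil
            }
            lastErr = err
            time.Sleep(delay)
        }
        return nil, lastErr
    }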

@xiang90
Contributor

xiang90 commented Feb 15, 2017

@resouer

Maybe we should make client-go retry more times?

I feel this is more like a workaround. This bug should be fixed.

@calebamiles
Contributor

Ping @kubernetes/sig-node-bugs: can we come up with a plan for fixing the race condition in the kubelet, or are we planning on changing client-go, @kubernetes/sig-api-machinery-misc?

@calebamiles calebamiles modified the milestone: v1.6 Mar 8, 2017
@pwittrock pwittrock removed this from the v1.6 milestone Mar 11, 2017
@shiywang
Contributor

shiywang commented Jun 2, 2017

Met the same problem; do we have a workaround here?

Client Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.0-alpha.4.914+b9e8d2aee6d593", GitCommit:"b9e8d2aee6d59330306cb458c1241f5e2578c40b", GitTreeState:"clean", BuildDate:"2017-06-02T04:13:59Z", GoVersion:"go1.8.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.0-alpha.4.914+b9e8d2aee6d593", GitCommit:"b9e8d2aee6d59330306cb458c1241f5e2578c40b", GitTreeState:"clean", BuildDate:"2017-06-02T04:13:59Z", GoVersion:"go1.8.1", Compiler:"gc", Platform:"linux/amd64"}
2017/06/02 04:51:34 util.go:131: Step './hack/e2e-internal/e2e-status.sh' finished in 117.680893ms
2017/06/02 04:51:34 util.go:129: Running: ./hack/ginkgo-e2e.sh --ginkgo.focus=ThirdParty
Setting up for KUBERNETES_PROVIDER="local".
Local doesn't need special preparations for e2e tests
Jun  2 04:51:34.868: INFO: >>> kubeConfig: 
Jun  2 04:51:34.868: INFO: failed to load config: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
panic: 

@qiujian16
Contributor

Met the same problem, especially when we reboot the master node.

@StevenACoffman
Contributor

This is biting us with some regularity, so I assume there is a recommended workaround. Any tips?

It is sometimes problematic to override these environment variables in pod definitions, since some consumers expect an IP address rather than a hostname:

        env:
        - name: KUBERNETES_SERVICE_HOST
          value: "kubernetes.default.svc.cluster.local"
        - name: KUBERNETES_SERVICE_PORT
          value: "443"

I'm wondering whether the IP address is available somewhere else, perhaps a Downward API volume file?
Also, it would be great if there were some way to make the env override conditional on the variable not already existing (see the sketch below).
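
One way to approximate "only if not already set" is to do the fallback in the application itself rather than in the pod spec. A hedged sketch; the helper name and DNS fallback values are assumptions and the fallback requires working cluster DNS:

    import "os"

    // apiServerHostPort is an illustrative helper: prefer the env vars kubelet
    // injects, and fall back to the in-cluster DNS name only when they are absent.
    func apiServerHostPort() (host, port string) {
        host = os.Getenv("KUBERNETES_SERVICE_HOST")
        port = os.Getenv("KUBERNETES_SERVICE_PORT")
        if host == "" {
            host = "kubernetes.default.svc" // DNS fallback, needs cluster DNS
        }
        if port == "" {
            port = "443"
        }
        return host, port
    }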

@stevekuznetsov
Contributor

We worked around this by polling until the kubernetes service in namespace default was up and running before launching our pods.
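
For reference, that kind of check can run in the harness (which already has a working kubeconfig) before creating the pods that need the injected env vars. A rough sketch assuming a recent client-go; the helper name, interval, and timeout are illustrative:

    import (
        "context"
        "fmt"
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // waitForKubernetesService polls until the "kubernetes" Service exists in the
    // "default" namespace, or gives up after two minutes.
    func waitForKubernetesService(cs *kubernetes.Clientset) error {
        deadline := time.Now().Add(2 * time.Minute)
        for time.Now().Before(deadline) {
            _, err := cs.CoreV1().Services("default").Get(context.TODO(), "kubernetes", metav1.GetOptions{})
            if err == nil {
                return nil
            }
            time.Sleep(2 * time.Second)
        }
        return fmt.Errorf("timed out waiting for the default/kubernetes service")
    }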

@smarterclayton
Contributor

smarterclayton commented Oct 21, 2017 via email

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2018
@redbaron
Contributor

redbaron commented Feb 6, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 6, 2018
jrfastab pushed a commit to cilium/cilium that referenced this issue Apr 23, 2020
[ upstream commit 604dab4 ]

When the k8s service is only created after the container has started,
kubelet is not able to set `KUBERNETES_SERVICE_HOST` or `KUBERNETES_SERVICE_PORT`
in the container, which can result in Cilium showing unexpected behaviors
such as: panicking upon initialization; using an autogenerated IPv4 IP
because Cilium won't detect which podCIDR the k8s node has set;
re-allocating the cilium_host router IP address, which can cause network
disruption; and being unable to restore endpoints since their IPs do not
belong to the autogenerated CIDR.

As all Cilium DaemonSets have the K8S_NODE_NAME environment variable set,
we can detect whether Cilium is running in k8s mode by checking that
variable instead of depending on `KUBERNETES_SERVICE_HOST` or
`KUBERNETES_SERVICE_PORT` for this detection.

More info: kubernetes/kubernetes#40973

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
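
A minimal sketch of the detection change described in that commit (not Cilium's actual code): treat the presence of K8S_NODE_NAME as the signal that we are running under Kubernetes, instead of the racy service variables.

    import "os"

    // runningInKubernetes reports whether we appear to be running under k8s,
    // using K8S_NODE_NAME (set on the Cilium DaemonSet) rather than the
    // KUBERNETES_SERVICE_* variables, which may be missing due to the race above.
    func runningInKubernetes() bool {
        return os.Getenv("K8S_NODE_NAME") != ""
    }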
jrfastab pushed a commit to cilium/cilium that referenced this issue Apr 23, 2020
…t k8s

[ upstream commit 1598f74 ]

We've seen panics where it seems k8s isn't set up correctly but CRD-related
operations occur anyway and segfault. This happens when the kubernetes
service is not ready by the time cilium starts up, so cilium misses the
KUBERNETES_SERVICE_{HOST,PORT} settings and ends up misconfigured.
See kubernetes/kubernetes#40973
See #11021

Signed-off-by: Ray Bejjani <ray@isovalent.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
@virendrasuryavanshi

virendrasuryavanshi commented Oct 30, 2020

Met the same issue; is there any workaround?
Kubernetes v1.18.10

FATA[0000] could not get config error="unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined" exit status 1
