
KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT weren't set in pod env #40973

Closed
hongchaodeng opened this issue Feb 5, 2017 · 29 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
sig/network Categorizes an issue or PR as relevant to SIG Network.
sig/node Categorizes an issue or PR as relevant to SIG Node.
triage/unresolved Indicates an issue that can not or will not be resolved.

Comments

@hongchaodeng
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:
    GKE

What happened:

KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT weren't set in the pod's environment.

What you expected to happen:

KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT should be set by default in the pod.

How to reproduce it (as minimally and precisely as possible):

Create an in-cluster client in a pod.

This isn't easily reproducible. We encountered this issue while running extensive e2e tests, and the logs showed that a pod crashed due to:

panic: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
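
For context, the error above is what client-go's in-cluster config loader returns when those variables are absent. A minimal sketch of an in-cluster client setup that would surface it (illustrative only, not the exact code from the failing pod):

    package main

    import (
        "log"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        // rest.InClusterConfig reads KUBERNETES_SERVICE_HOST and
        // KUBERNETES_SERVICE_PORT; it returns an error when either is unset,
        // which is what surfaces as the panic quoted above.
        cfg, err := rest.InClusterConfig()
        if err != nil {
            log.Panic(err)
        }
        clientset, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            log.Panic(err)
        }
        _ = clientset // use the clientset as usual from here
    }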
@hongchaodeng
Contributor Author

@dchen1107 @Random-Liu
Potential node issue?

@calebamiles calebamiles added kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Feb 6, 2017
@calebamiles
Contributor

Could someone from @kubernetes/sig-node-bugs please take a look. Thanks!

cc: @dchen1107

@liggitt
Member

liggitt commented Feb 7, 2017

cc @stevekuznetsov

@stevekuznetsov
Contributor

Seeing this intermittently, as you say, in Origin e2e CI.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2+43a9be4", GitCommit:"43a9be4", GitTreeState:"clean", BuildDate:"2017-02-07T16:24:34Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

@stevekuznetsov
Contributor

@pmorie was thinking this may be symptomatic of the kubernetes service not actually starting.

@stevekuznetsov
Contributor

This is occurring with pretty high incidence in our CI -- is someone from @kubernetes/sig-node-bugs assigned to triage this?

@resouer
Contributor

resouer commented Feb 10, 2017

@hongchaodeng When generating environment variables for a container, there is a known race: the kubelet may generate the env before the service has been created.

See this: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L412-L416

Maybe we should make client-go retry more times?
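
For illustration, an application-side version of that suggestion might look like the sketch below. The helper name and timings are made up, and note that kubelet injects the service env vars only when the container is created, so an in-process retry only helps if something re-creates the container (for example a crash and restart):

    import (
        "time"

        "k8s.io/client-go/rest"
    )

    // inClusterConfigWithRetry is an illustrative sketch, not a client-go API:
    // retry rest.InClusterConfig a few times before giving up. Because the env
    // vars are injected only at container creation, many workloads instead just
    // fail fast and rely on the pod's restart policy.
    func inClusterConfigWithRetry(attempts int, delay time.Duration) (*rest.Config, error) {
        var lastErr error
        for i := 0; i < attempts; i++ {
            cfg, err := rest.InClusterConfig()
            if err == nil {
                return cfg, nil
            }
            lastErr = err
            time.Sleep(delay)
        }
        return nil, lastErr
    }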

@xiang90
Contributor

xiang90 commented Feb 15, 2017

@resouer

Maybe we should make client-go retry more times?

I feel this is more like a workaround. This bug should be fixed.

@calebamiles
Contributor

Ping @kubernetes/sig-node-bugs: can we come up with a plan for fixing the race condition in the kubelet, or are we planning on changing client-go, @kubernetes/sig-api-machinery-misc?

@calebamiles calebamiles modified the milestone: v1.6 Mar 8, 2017
@pwittrock pwittrock removed this from the v1.6 milestone Mar 11, 2017
@shiywang
Contributor

shiywang commented Jun 2, 2017

Met the same problem; do we have a workaround here?

Client Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.0-alpha.4.914+b9e8d2aee6d593", GitCommit:"b9e8d2aee6d59330306cb458c1241f5e2578c40b", GitTreeState:"clean", BuildDate:"2017-06-02T04:13:59Z", GoVersion:"go1.8.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.0-alpha.4.914+b9e8d2aee6d593", GitCommit:"b9e8d2aee6d59330306cb458c1241f5e2578c40b", GitTreeState:"clean", BuildDate:"2017-06-02T04:13:59Z", GoVersion:"go1.8.1", Compiler:"gc", Platform:"linux/amd64"}
2017/06/02 04:51:34 util.go:131: Step './hack/e2e-internal/e2e-status.sh' finished in 117.680893ms
2017/06/02 04:51:34 util.go:129: Running: ./hack/ginkgo-e2e.sh --ginkgo.focus=ThirdParty
Setting up for KUBERNETES_PROVIDER="local".
Local doesn't need special preparations for e2e tests
Jun  2 04:51:34.868: INFO: >>> kubeConfig: 
Jun  2 04:51:34.868: INFO: failed to load config: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
panic: 

@qiujian16
Contributor

Met the same problem, especially when we reboot the master node.

@StevenACoffman
Contributor

This is biting us with some regularity, so I assume there is a recommended workaround. Any tips?

It is sometimes problematic to override these environment variables in pod definitions, since some consumers expect an IP address rather than a hostname:

        env:
        - name: KUBERNETES_SERVICE_HOST
          value: "kubernetes.default.svc.cluster.local"
        - name: KUBERNETES_SERVICE_PORT
          value: "443"

I'm wondering whether the IP address is available somewhere else, perhaps a Downward API volume file?
Also, it would be great if there were some way to make the env override conditional on the variable not already existing (see the sketch below).
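
One way to approximate "only if not already set" is to do the fallback in the application itself rather than in the pod spec. A hedged sketch; the helper name and DNS fallback values are assumptions and the fallback requires working cluster DNS:

    import "os"

    // apiServerHostPort is an illustrative helper: prefer the env vars kubelet
    // injects, and fall back to the in-cluster DNS name only when they are absent.
    func apiServerHostPort() (host, port string) {
        host = os.Getenv("KUBERNETES_SERVICE_HOST")
        port = os.Getenv("KUBERNETES_SERVICE_PORT")
        if host == "" {
            host = "kubernetes.default.svc" // DNS fallback, needs cluster DNS
        }
        if port == "" {
            port = "443"
        }
        return host, port
    }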

@stevekuznetsov
Contributor

We worked around this by polling until the kubernetes service in namespace default was up and running before launching our pods.
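
For reference, that kind of check can run in the harness (which already has a working kubeconfig) before creating the pods that need the injected env vars. A rough sketch assuming a recent client-go; the helper name, interval, and timeout are illustrative:

    import (
        "context"
        "fmt"
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // waitForKubernetesService polls until the "kubernetes" Service exists in the
    // "default" namespace, or gives up after two minutes.
    func waitForKubernetesService(cs *kubernetes.Clientset) error {
        deadline := time.Now().Add(2 * time.Minute)
        for time.Now().Before(deadline) {
            _, err := cs.CoreV1().Services("default").Get(context.TODO(), "kubernetes", metav1.GetOptions{})
            if err == nil {
                return nil
            }
            time.Sleep(2 * time.Second)
        }
        return fmt.Errorf("timed out waiting for the default/kubernetes service")
    }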

@smarterclayton
Contributor

smarterclayton commented Oct 21, 2017 via email

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2018
@redbaron
Contributor

redbaron commented Feb 6, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 6, 2018
jrfastab pushed a commit to cilium/cilium that referenced this issue Apr 23, 2020
[ upstream commit 604dab4 ]

When the k8s service is only created after the container has started,
kubelet is not able to set `KUBERNETES_SERVICE_HOST` or `KUBERNETES_SERVICE_PORT`
in the container, which can result in Cilium showing unexpected behaviors
such as: panicking upon initialization; using an autogenerated IPv4 IP
because Cilium won't detect which podCIDR the k8s node has set;
re-allocating the cilium_host router IP address, which can cause network
disruption; and being unable to restore endpoints since their IPs do not
belong to the autogenerated CIDR.

As all Cilium DaemonSets have the K8S_NODE_NAME environment variable set,
we can detect whether Cilium is running in k8s mode by checking that
variable instead of depending on `KUBERNETES_SERVICE_HOST` or
`KUBERNETES_SERVICE_PORT` for this detection.

More info: kubernetes/kubernetes#40973

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
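
A minimal sketch of the detection change described in that commit (not Cilium's actual code): treat the presence of K8S_NODE_NAME as the signal that we are running under Kubernetes, instead of the racy service variables.

    import "os"

    // runningInKubernetes reports whether we appear to be running under k8s,
    // using K8S_NODE_NAME (set on the Cilium DaemonSet) rather than the
    // KUBERNETES_SERVICE_* variables, which may be missing due to the race above.
    func runningInKubernetes() bool {
        return os.Getenv("K8S_NODE_NAME") != ""
    }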
jrfastab pushed a commit to cilium/cilium that referenced this issue Apr 23, 2020
…t k8s

[ upstream commit 1598f74 ]

We've seen panics where it seems k8s isn't set up correctly but CRD-related
operations occur anyway and segfault. This happens when the kubernetes
service is not ready by the time cilium starts up, so cilium misses the
KUBERNETES_SERVICE_{HOST,PORT} settings and ends up misconfigured.
See kubernetes/kubernetes#40973
See #11021

Signed-off-by: Ray Bejjani <ray@isovalent.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
@virendrasuryavanshi

virendrasuryavanshi commented Oct 30, 2020

Met the same issue; is there any workaround?
Kubernetes v1.18.10

FATA[0000] could not get config error="unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined" exit status 1
