Kubernetes 1.11.1 nodes occasionally do not register internal IP address #860
I just saw this in our 1.11 cluster yesterday as well. I haven't had a chance to investigate. It's causing the Key Vault flex volume to fail.
If a restart of the kubelet fixes this, can you provide the kubelet logs from an affected node?
I've added the full log file from one of the affected nodes to this gist.
At some point all the nodes in our cluster lost their internal IP (the two with IPs had had their kubelet container restarted):

    NAME                                     STATUS   ROLES               AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                        KERNEL-VERSION      CONTAINER-RUNTIME
    k8s-corp-prod-0-master-us-corp-kc-8a-0   Ready    controlplane,etcd   4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-master-us-corp-kc-8b-1   Ready    controlplane,etcd   4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-master-us-corp-kc-8c-2   Ready    controlplane,etcd   4d    v1.11.1   10.144.10.134   <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-worker-us-corp-kc-8a-0   Ready    worker              4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-worker-us-corp-kc-8a-1   Ready    worker              4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-worker-us-corp-kc-8a-2   Ready    worker              4d    v1.11.1   10.144.2.141    <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-worker-us-corp-kc-8b-0   Ready    worker              4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-worker-us-corp-kc-8b-1   Ready    worker              4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-worker-us-corp-kc-8b-2   Ready    worker              4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-worker-us-corp-kc-8c-0   Ready    worker              4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-worker-us-corp-kc-8c-1   Ready    worker              4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
    k8s-corp-prod-0-worker-us-corp-kc-8c-2   Ready    worker              4d    v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1

Here are the kubelet logs from a node that regained its internal IP after a restart of the kubelet container:
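Rather than eyeballing the `kubectl get nodes -o wide` output, affected nodes can be found programmatically from `kubectl get nodes -o json`, since each node lists its addresses under `status.addresses` with a `type` of `InternalIP`. A minimal sketch (the sample payload below is a trimmed, invented two-node example):

```python
# Sketch: given `kubectl get nodes -o json` output, list nodes whose
# status.addresses contains no InternalIP entry (the symptom shown above).
import json

def nodes_missing_internal_ip(nodes_json: str):
    """Return names of nodes with no InternalIP in status.addresses."""
    doc = json.loads(nodes_json)
    missing = []
    for item in doc.get("items", []):
        addresses = item.get("status", {}).get("addresses", [])
        if not any(a.get("type") == "InternalIP" for a in addresses):
            missing.append(item["metadata"]["name"])
    return missing

# Trimmed example payload: worker-0 has lost its InternalIP, worker-1 has not.
sample = json.dumps({"items": [
    {"metadata": {"name": "worker-0"},
     "status": {"addresses": [{"type": "Hostname", "address": "worker-0"}]}},
    {"metadata": {"name": "worker-1"},
     "status": {"addresses": [{"type": "InternalIP", "address": "10.144.2.141"}]}},
]})
print(nodes_missing_internal_ip(sample))  # ['worker-0']
```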
I am unable to reproduce this on a bare-metal setup. I will try with a cloud provider and see how it goes.
I think this is more an issue with Kubernetes and the OpenStack cloud provider, perhaps combined with an unstable OpenStack API endpoint. Please also refer to my issue, which contains some log excerpts: kubernetes/cloud-provider-openstack#280
This isn't specific to the OpenStack cloud provider. We're using Azure and encountered the lost internal IPs as well.
This problem is tracked upstream in kubernetes/kubernetes#68270: the kubelet fails to get the node's addresses from the cloud provider and then fails to update the node status. I will keep this issue open until the upstream issue is resolved.
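The failure mode described in the upstream issue can be illustrated schematically: if a transient cloud-provider error is allowed to overwrite the node's address list, the node loses its InternalIP until the kubelet is restarted. This is a sketch only, not the actual kubelet code; the function names are invented for illustration:

```python
# Schematic illustration (not actual kubelet code) of the failure mode in
# kubernetes/kubernetes#68270. The safer behavior, sketched here, is to keep
# the last-known addresses when the cloud-provider lookup fails, instead of
# replacing them with nothing.

def update_addresses(node, fetch_from_cloud):
    """Update node['addresses'], preserving them if the lookup fails."""
    try:
        node["addresses"] = fetch_from_cloud()
    except Exception:
        # Transient cloud-provider error: keep the last-known addresses.
        pass
    return node["addresses"]

node = {"addresses": [{"type": "InternalIP", "address": "10.144.2.141"}]}

def flaky_lookup():
    raise RuntimeError("cloud provider API unavailable")

# The node keeps its InternalIP despite the failed lookup.
print(update_addresses(node, flaky_lookup))
```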
Looks like the issue has been fixed in k8s 1.12, and there are currently requests to backport it to 1.11, but no confirmation yet that it will be backported.
The issue has been fixed in k8s v1.11.6; this will be resolved once we make v1.11.6 available to RKE and Rancher.
We upgraded to 1.11.5 a few days ago, and the bug where nodes lose their internal IP has not occurred again.
@stieler-it the fix is not part of 1.11.5; it is part of k8s v1.11.6. So I wonder whether the fact that the bug hasn't occurred since the upgrade to v1.11.5 on your setup is coincidental.
@alena1108 OK, good to know. However, the bug used to appear pretty often (at least every few hours), and now it hasn't appeared for about six days. Maybe something else mitigated the problem, or it is just luck so far. We'll see.
Validated #1 on master 1/4
Validated #2 on v2.1.5-rc3, which is derived from master 1/2
Is it possible to get this fixed in the 1.6 branch too? The Kube version there is
@mrmason we are going to address it there as well; here is the corresponding issue: rancher/rancher#14600 |
Rancher versions:
rancher/server or rancher/rancher: 2.0.7
rancher/agent or rancher/rancher-agent: 2.0.6

Docker version: (`docker version`, `docker info` preferred)

    Server:
     Engine:
      Version:       18.03.1-ce
      API version:   1.37 (minimum version 1.12)
      Go version:    go1.9.6
      Git commit:    9ee9f40
      Built:         Thu Apr 26 04:27:49 2018
      OS/Arch:       linux/amd64
      Experimental:  false

Operating system and kernel: (`cat /etc/os-release`, `uname -r` preferred)

    NAME="Container Linux by CoreOS"
    ID=coreos
    VERSION=1800.6.0
    VERSION_ID=1800.6.0
    BUILD_ID=2018-08-04-0323
    PRETTY_NAME="Container Linux by CoreOS 1800.6.0 (Rhyolite)"
    ANSI_COLOR="38;5;75"
    HOME_URL="https://coreos.com/"
    BUG_REPORT_URL="https://issues.coreos.com"
    COREOS_BOARD="amd64-usr"

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
OpenStack
Steps to Reproduce:
Deploy a Kubernetes 1.11.1 cluster with RKE using the rke_config.yml.

rke config:

Results:
Nodes occasionally fail to register an internal IP address. A restart of the kubelet container on the affected nodes resolves this issue.
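Since RKE runs the kubelet as a Docker container named `kubelet` (consistent with the "kubelet container" restarts reported above), the workaround can be scripted across affected nodes. A minimal dry-run sketch that only prints the commands for review rather than executing them; the node name is an example from this cluster:

```python
# Sketch of the workaround above: generate (rather than execute) a
# `docker restart kubelet` over ssh for each affected node. RKE runs the
# kubelet as a Docker container named "kubelet", which is why restarting
# that container re-registers the node's addresses.

def restart_commands(nodes):
    """Return the shell commands that would restart kubelet on each node."""
    return [f"ssh {node} 'docker restart kubelet'" for node in nodes]

for cmd in restart_commands(["k8s-corp-prod-0-worker-us-corp-kc-8b-0"]):
    print(cmd)  # review before running
```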