This repository has been archived by the owner on Jul 29, 2018. It is now read-only.

K8s setup broken after updating to 1.0.0-0.8.gitb2dafda.el7 #68

Closed
LalatenduMohanty opened this issue Aug 15, 2015 · 10 comments

@LalatenduMohanty
Contributor

K8s setup is broken after updating to the latest k8s bits on CentOS 7, i.e. kubernetes 0:1.0.0-0.8.gitb2dafda.el7.

[vagrant@localhost ~]$ kubectl get nodes
error: Failed to negotiate an api version. Server supports: map[v1beta1:{} v1beta2:{} v1beta3:{}]. Client supports: [v1].

The latest build of the atomicapp Vagrant box [1] is used to reproduce this issue.
[1] https://cbs.centos.org/koji/taskinfo?taskID=16911 or you can use https://atlas.hashicorp.com/atomicapp/boxes/dev-testing
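For reference, one way to confirm the client/server API mismatch (a minimal diagnostic sketch, not part of the original report, assuming the apiserver is still reachable on localhost:8080):

# Compare the kubectl client version with the running apiserver version.
kubectl version

# Ask the apiserver directly which API versions it advertises;
# a pre-1.0 server lists only the v1beta1/v1beta2/v1beta3 versions here.
curl http://localhost:8080/api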

Steps to reproduce:

  • vagrant init atomicapp/dev-testing
  • vagrant up
  • vagrant ssh
  • sudo kubectl get nodes
  • sudo yum update -y
  • sudo kubectl get nodes
[root@xx xxx]#  vagrant ssh

[vagrant@localhost ~]$ kubectl get nodes
NAME        LABELS                             STATUS
127.0.0.1   kubernetes.io/hostname=127.0.0.1   Ready

[vagrant@localhost ~]$  sudo yum update -y

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Dependency Installed:
  python-sqlalchemy.x86_64 0:0.9.8-1.el7                                                                                                                                                                           
Updated:
  atomic.x86_64 0:1.0-108.el7.centos                    docker.x86_64 0:1.7.1-108.el7.centos                docker-python.x86_64 0:1.4.0-108.el7.centos              docker-registry.x86_64 0:0.9.1-7.el7          
  docker-selinux.x86_64 0:1.7.1-108.el7.centos          etcd.x86_64 0:2.0.13-2.el7                          flannel.x86_64 0:0.2.0-10.el7                            kubernetes.x86_64 0:1.0.0-0.8.gitb2dafda.el7  
  kubernetes-master.x86_64 0:1.0.0-0.8.gitb2dafda.el7   kubernetes-node.x86_64 0:1.0.0-0.8.gitb2dafda.el7   python-websocket-client.noarch 0:0.14.1-108.el7.centos   tzdata.noarch 0:2015f-1.el7                   

[vagrant@localhost ~]$ kubectl get nodes
error: Failed to negotiate an api version. Server supports: map[v1beta1:{} v1beta2:{} v1beta3:{}]. Client supports: [v1].

Restarting the kube-apiserver service does not help either:

[vagrant@localhost ~]$ sudo systemctl restart kube-apiserver.service 

[vagrant@localhost ~]$ kubectl get nodes
error: couldn't read version from server: Get http://localhost:8080/api: dial tcp 127.0.0.1:8080: connection refused

After the yum update, I rebooted the machine and got the error below:

[vagrant@localhost ~]$ kubectl get nodes
error: couldn't read version from server: Get http://localhost:8080/api: dial tcp 127.0.0.1:8080: connection refused

It looks like port 8080 is not open in the Vagrant box:

[vagrant@localhost ~]$ telnet localhost 8080
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
@LalatenduMohanty
Contributor Author

K8s packages before the yum update:

[vagrant@localhost ~]$ rpm -qa | grep kube
kubernetes-master-0.17.1-4.el7.x86_64
kubernetes-node-0.17.1-4.el7.x86_64
kubernetes-0.17.1-4.el7.x86_64

Here are the RPM packages after the yum update:

[vagrant@localhost ~]$ sudo rpm -qa | grep -i kub
kubernetes-1.0.0-0.8.gitb2dafda.el7.x86_64
kubernetes-node-1.0.0-0.8.gitb2dafda.el7.x86_64
kubernetes-master-1.0.0-0.8.gitb2dafda.el7.x86_64

@LalatenduMohanty
Contributor Author

@derekwaynecarr do you have any suggestions for this issue?

This is the Vagrant box that used to live at https://github.com/LalatenduMohanty/centos7-container-app-vagrant-box

@LalatenduMohanty
Contributor Author

"kubectl get nodes" on a fresh install of k8s works fine. So it looks like the update is broken. Thanks to @navidshaikh for the finding.

* vagrant init centos/7
* sudo yum install kubernetes
* sudo yum install etcd
* sudo systemctl enable etcd kube-apiserver kube-controller-manager kube-scheduler
* sudo systemctl enable kube-proxy kubelet
* sudo systemctl enable docker
* sudo reboot

[vagrant@localhost ~]$ kubectl get nodes
NAME        LABELS                             STATUS
127.0.0.1   kubernetes.io/hostname=127.0.0.1   Ready

@LalatenduMohanty
Contributor Author

I did a fresh build [1] with the latest packages. We thought the issue might get resolved, since the new image would have freshly installed k8s packages. But in the new box we are hitting another issue: kubectl get nodes does not print any node information.

cc @navidshaikh

[vagrant@localhost ~]$ kubectl get nodes
NAME        LABELS                             STATUS

[1] https://cbs.centos.org/koji/taskinfo?taskID=18983

@LalatenduMohanty
Contributor Author

So it seems that kube-apiserver is not opening any ports with the new k8s RPMs.

Port list before the update:

[vagrant@localhost ~]$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      1967/kubelet        
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      825/kube-proxy      
tcp        0      0 127.0.0.1:10250         0.0.0.0:*               LISTEN      1967/kubelet        
tcp        0      0 127.0.0.1:10251         0.0.0.0:*               LISTEN      545/kube-scheduler  
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN      823/etcd            
tcp        0      0 127.0.0.1:10252         0.0.0.0:*               LISTEN      544/kube-controller 
tcp        0      0 127.0.0.1:10255         0.0.0.0:*               LISTEN      1967/kubelet        
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      824/kube-apiserver  
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      822/sshd            
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1618/master         
tcp        0      0 127.0.0.1:7001          0.0.0.0:*               LISTEN      823/etcd            
tcp        0      0 127.0.0.1:4001          0.0.0.0:*               LISTEN      823/etcd            
tcp6       0      0 :::7080                 :::*                    LISTEN      824/kube-apiserver  
tcp6       0      0 :::6443                 :::*                    LISTEN      824/kube-apiserver  
tcp6       0      0 :::22                   :::*                    LISTEN      822/sshd            
tcp6       0      0 ::1:25                  :::*                    LISTEN      1618/master         
tcp6       0      0 :::4194                 :::*                    LISTEN      1967/kubelet        
udp        0      0 0.0.0.0:68              0.0.0.0:*                           776/dhclient        
udp        0      0 0.0.0.0:46818           0.0.0.0:*                           776/dhclient        
udp6       0      0 :::57570                :::*                                776/dhclient        

After the update + reboot:

[vagrant@localhost ~]$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      1384/kubelet        
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      823/kube-proxy      
tcp        0      0 127.0.0.1:10250         0.0.0.0:*               LISTEN      1384/kubelet        
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      819/etcd            
tcp        0      0 127.0.0.1:10251         0.0.0.0:*               LISTEN      538/kube-scheduler  
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN      819/etcd            
tcp        0      0 127.0.0.1:10252         0.0.0.0:*               LISTEN      537/kube-controller 
tcp        0      0 127.0.0.1:10255         0.0.0.0:*               LISTEN      1384/kubelet        
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      818/sshd            
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1414/master         
tcp        0      0 127.0.0.1:7001          0.0.0.0:*               LISTEN      819/etcd            
tcp6       0      0 :::22                   :::*                    LISTEN      818/sshd            
tcp6       0      0 ::1:25                  :::*                    LISTEN      1414/master         
tcp6       0      0 :::4194                 :::*                    LISTEN      1384/kubelet        
udp        0      0 0.0.0.0:68              0.0.0.0:*                           775/dhclient        
udp        0      0 0.0.0.0:37564           0.0.0.0:*                           775/dhclient        
udp6       0      0 :::42801                :::*                                775/dhclient        
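
Not from the original report, but a quick way to check why kube-apiserver is no longer listening is to look at its unit status and journal:

# Show the unit state and the most recent apiserver log lines.
sudo systemctl status kube-apiserver -l
sudo journalctl -u kube-apiserver --no-pager | tail -n 50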

@navidshaikh
Contributor

@LalatenduMohanty: I tried reproducing the issue and found that the docker service is failing with the error msg="Error starting daemon: error initializing graphdriver: Unknown option dm.fs".

The docker service is failing because there are multiple DOCKER_STORAGE_OPTIONS entries in /etc/sysconfig/docker-storage.
Just comment out the unnecessary one and restart (in this order):

  • docker service
  • kubelet service

Once the kubelet service is running, it registers the node and you should be able to see it with kubectl get nodes.
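Summarized as commands, the fix looks roughly like this (a sketch; which DOCKER_STORAGE_OPTIONS entry to keep depends on the box, see the full transcript below):

# List the active DOCKER_STORAGE_OPTIONS entries; only one should remain.
sudo grep -n '^DOCKER_STORAGE_OPTIONS' /etc/sysconfig/docker-storage

# Comment out the unwanted entry.
sudo vi /etc/sysconfig/docker-storage

# Restart in order: docker first, then kubelet.
sudo systemctl restart docker
sudo systemctl restart kubelet

# The node should register itself again.
kubectl get nodes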

[vagrant@localhost ~]$ kubectl get nodes
NAME      LABELS    STATUS

[vagrant@localhost ~]$ systemctl status kubelet
kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled)
   Active: inactive (dead)
     Docs: https://github.com/GoogleCloudPlatform/kubernetes

[vagrant@localhost ~]$ sudo systemctl start kubelet
A dependency job for kubelet.service failed. See 'journalctl -xn' for details.

[vagrant@localhost ~]$ sudo -i

[root@localhost ~]# systemctl status kubelet -l
kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled)
   Active: inactive (dead)
     Docs: https://github.com/GoogleCloudPlatform/kubernetes

Aug 19 10:53:50 localhost.localdomain systemd[1]: Dependency failed for Kubernetes Kubelet Server.
Aug 19 10:55:32 localhost.localdomain systemd[1]: Dependency failed for Kubernetes Kubelet Server.

[root@localhost ~]# systemctl status docker -l
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─flannel.conf
   Active: failed (Result: exit-code) since Wed 2015-08-19 10:55:32 EDT; 37s ago
     Docs: http://docs.docker.com
  Process: 12979 ExecStart=/usr/bin/docker -d $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY (code=exited, status=1/FAILURE)
 Main PID: 12979 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/docker.service

Aug 19 10:55:32 localhost.localdomain systemd[1]: Starting Docker Application Container Engine...
Aug 19 10:55:32 localhost.localdomain docker[12979]: time="2015-08-19T10:55:32.594282398-04:00" level=fatal msg="Error starting daemon: error initializing graphdriver: Unknown option dm.fs"
Aug 19 10:55:32 localhost.localdomain systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Aug 19 10:55:32 localhost.localdomain systemd[1]: Failed to start Docker Application Container Engine.
Aug 19 10:55:32 localhost.localdomain systemd[1]: Unit docker.service entered failed state.


[root@localhost ~]# vi /etc/sysconfig/docker-storage

[root@localhost ~]# cat /etc/sysconfig/docker-storage
# This file may be automatically generated by an installation program.

# By default, Docker uses a loopback-mounted sparse file in
# /var/lib/docker.  The loopback makes it slower, and there are some
# restrictive defaults, such as 100GB max storage.

# If your installation did not set a custom storage for Docker, you
# may do it below.

# Example: Use a custom pair of raw logical volumes (one for metadata,
# one for data).
# DOCKER_STORAGE_OPTIONS = --storage-opt dm.metadatadev=/dev/mylogvol/my-docker-metadata --storage-opt dm.datadev=/dev/mylogvol/my-docker-data

# DOCKER_STORAGE_OPTIONS=

DOCKER_STORAGE_OPTIONS=--storage-opt dm.fs=xfs --storage-opt dm.datadev=/dev/mapper/vg001-docker--data --storage-opt dm.metadatadev=/dev/mapper/vg001-docker--meta

[root@localhost ~]# systemctl restart docker && systemctl status docker
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─flannel.conf
   Active: active (running) since Wed 2015-08-19 10:59:16 EDT; 23ms ago
     Docs: http://docs.docker.com
 Main PID: 13942 (docker)
   CGroup: /system.slice/docker.service
           └─13942 /usr/bin/docker -d --selinux-enabled

Aug 19 10:59:15 localhost.localdomain systemd[1]: Starting Docker Application Container Engine...
Aug 19 10:59:15 localhost.localdomain docker[13942]: time="2015-08-19T10:59:15.138044957-04:00" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
Aug 19 10:59:15 localhost.localdomain docker[13942]: time="2015-08-19T10:59:15.222514984-04:00" level=error msg="WARNING: No --storage-opt dm.thinpooldev specified, using loopback; this configura...oduction use"
Aug 19 10:59:16 localhost.localdomain docker[13942]: time="2015-08-19T10:59:16.558001953-04:00" level=warning msg="Running modprobe bridge nf_nat br_netfilter failed with message: insmod /lib/modules/3.10.0-2...
Aug 19 10:59:16 localhost.localdomain docker[13942]: time="2015-08-19T10:59:16.563867645-04:00" level=info msg="Firewalld running: false"
Aug 19 10:59:16 localhost.localdomain docker[13942]: time="2015-08-19T10:59:16.805179265-04:00" level=info msg="Loading containers: start."
Aug 19 10:59:16 localhost.localdomain docker[13942]: time="2015-08-19T10:59:16.806027775-04:00" level=info msg="Loading containers: done."
Aug 19 10:59:16 localhost.localdomain docker[13942]: time="2015-08-19T10:59:16.806050478-04:00" level=info msg="Daemon has completed initialization"
Aug 19 10:59:16 localhost.localdomain docker[13942]: time="2015-08-19T10:59:16.806067338-04:00" level=info msg="Docker daemon" commit="3043001/1.7.1" execdriver=native-0.2 graphdriver=devicemapper version=1.7.1
Aug 19 10:59:16 localhost.localdomain systemd[1]: Started Docker Application Container Engine.
Hint: Some lines were ellipsized, use -l to show in full.

[root@localhost ~]# systemctl restart kubelet && systemctl status kubelet
kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled)
   Active: active (running) since Wed 2015-08-19 10:59:32 EDT; 16ms ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 14134 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─14134 /usr/bin/kubelet --logtostderr=true --v=0 --api_servers=http://127.0.0.1:8080 --address=127.0.0.1 --hostname_override=127.0.0.1 --allow_privileged=false

Aug 19 10:59:32 localhost.localdomain systemd[1]: kubelet.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 19 10:59:32 localhost.localdomain systemd[1]: Unit kubelet.service entered failed state.
Aug 19 10:59:32 localhost.localdomain systemd[1]: Starting Kubernetes Kubelet Server...
Aug 19 10:59:32 localhost.localdomain systemd[1]: Started Kubernetes Kubelet Server.

[root@localhost ~]# kubectl get nodes
NAME        LABELS                             STATUS
127.0.0.1   kubernetes.io/hostname=127.0.0.1   Ready


@navidshaikh
Contributor

@LalatenduMohanty: I think the multiple DOCKER_STORAGE_OPTIONS entries were introduced while processing it in the kickstart file?
https://github.com/projectatomic/adb-atomic-developer-bundle/blob/master/build_tools/kickstarts/centos-7-kubernetes-vagrant.ks#L59
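A quick check, not from the thread, to see whether an image built from that kickstart ends up with more than one active entry:

# A count greater than one means duplicate active entries.
sudo grep -c '^DOCKER_STORAGE_OPTIONS' /etc/sysconfig/docker-storage
sudo grep -n 'DOCKER_STORAGE_OPTIONS' /etc/sysconfig/docker-storage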

@LalatenduMohanty
Contributor Author

@navidshaikh Yes, this issue is present with https://cbs.centos.org/koji/taskinfo?taskID=18983 . That is the build I did to check whether the issue also occurs with a fresh installation of the k8s packages.

Let's track that issue at #69,

and let's use this issue to track what happens after we do yum update, i.e. after updating the kubernetes packages.

@LalatenduMohanty
Contributor Author

@navidshaikh Also, with respect to the initial "yum update" issue, the steps below fix it:

sudo etcdctl rm --recursive /registry 
sudo systemctl restart etcd 

Thanks to @aveshagarwal for pointing that out.

This works because etcd most likely still has data stored using v1beta1/2 (older than v1beta3), whereas the latest packages in RHEL/CentOS/Atomic only understand v1beta3 and v1.
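Putting the fix together with a quick sanity check (the ls inspection and the final kubectl call are additions, not from the original comment; etcdctl here is the etcd 2.x client):

# Inspect the stale objects written with the old beta APIs.
sudo etcdctl ls --recursive /registry | head

# Wipe the registry and restart etcd so the 1.0 apiserver starts clean.
sudo etcdctl rm --recursive /registry
sudo systemctl restart etcd

# The node should show up again once the kubelet re-registers.
kubectl get nodes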

@LalatenduMohanty
Contributor Author

Closing the issue, as we are not seeing this in the newer builds.
