
ks-account pod in CrashLoopBackOff after fresh install of kubesphere v2.1.1 #1925

Closed
titou10titou10 opened this issue Feb 26, 2020 · 19 comments

@titou10titou10 commented Feb 26, 2020

Describe the Bug
I installed KubeSphere v2.1.1 on a fresh install of rke v1.0.4.
Everything seems OK except the "ks-account" pod, which is in "CrashLoopBackOff" mode.
The pod fails with "create client certificate failed: <nil>".

I can display the console login page but cannot log in; it fails with "unable to access backend services".
I did the procedure twice after resetting the nodes, and the rke cluster is healthy and fully operational.

Versions Used
KubeSphere: 2.1.1
Kubernetes: rancher/rke v1.0.4 fresh install

Environment
3 masters (8 GB) + 3 workers (8 GB), all with CentOS 7.7 fully updated, SELinux and firewalld disabled

How To Reproduce
Steps to reproduce the behavior:

  1. Set up 6 nodes with CentOS 7.7 and 8 GB each
  2. Install rke with 3 masters and 3 workers
  3. Install KubeSphere by following the instructions here

Expected behavior
All pods in the kubesphere-system namespace up and running, then being able to log in to the console.

Logs

kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:20:10Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

kubectl get pods -n kubesphere-system
NAME                                   READY   STATUS             RESTARTS   AGE
ks-account-789cd8bbd5-nlvg9            0/1     CrashLoopBackOff   20         79m
ks-apigateway-5664c4b76f-8vsf4         1/1     Running            0          79m
ks-apiserver-75f468d48b-9dfwb          1/1     Running            0          79m
ks-console-78bddc5bfb-zlzq9            1/1     Running            0          79m
ks-controller-manager-d4788677-6pxhd   1/1     Running            0          79m
ks-installer-75b8d89dff-rl76c          1/1     Running            0          81m
openldap-0                             1/1     Running            0          80m
redis-6fd6c6d6f9-6nfmd                 1/1     Running            0          80m

kubectl logs -n kubesphere-system ks-account-789cd8bbd5-nlvg9
W0226 00:40:43.093650       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
E0226 00:40:44.709957       1 kubeconfig.go:62] create client certificate failed: <nil>
E0226 00:40:44.710030       1 im.go:1030] create user kubeconfig failed sonarqube create client certificate failed: <nil>
E0226 00:40:44.710057       1 im.go:197] user init failed sonarqube create client certificate failed: <nil>
E0226 00:40:44.710073       1 im.go:87] create default users user sonarqube init failed: create client certificate failed: <nil>
Error: user sonarqube init failed: create client certificate failed: <nil>
Usage:
  ks-iam [flags]
Flags:
      --add-dir-header                            If true, adds the file directory to the header
      --admin-email string                        default administrator's email (default "admin@kubesphere.io")
      --admin-password string                     default administrator's password (default "passw0rd")
{...}


kubectl describe pod ks-account-789cd8bbd5-nlvg9 -n kubesphere-system
Name:         ks-account-789cd8bbd5-nlvg9
Namespace:    kubesphere-system
Priority:     0
Node:         worker3/192.168.5.47
Start Time:   Tue, 25 Feb 2020 18:22:55 -0500
Labels:       app=ks-account
              pod-template-hash=789cd8bbd5
              tier=backend
              version=v2.1.1
Annotations:  cni.projectcalico.org/podIP: 10.62.5.7/32
Status:       Running
IP:           10.62.5.7
IPs:
  IP:           10.62.5.7
Controlled By:  ReplicaSet/ks-account-789cd8bbd5
Init Containers:
  wait-redis:
    Container ID:  docker://1d63b336dac9e322155ee8cc31bc266df5ab4f734de5cf683b33d8cf6abc940b
    Image:         alpine:3.10.4
    Image ID:      docker-pullable://docker.io/alpine@sha256:7c3773f7bcc969f03f8f653910001d99a9d324b4b9caa008846ad2c3089f5a5f
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      until nc -z redis.kubesphere-system.svc 6379; do echo "waiting for redis"; sleep 2; done;
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 25 Feb 2020 18:22:56 -0500
      Finished:     Tue, 25 Feb 2020 18:22:56 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kubesphere-token-hk59s (ro)
  wait-ldap:
    Container ID:  docker://b51a105434877c6a17cd4cc14bc6ad40e9d06c5542eadf1b62855a1c12cb847c
    Image:         alpine:3.10.4
    Image ID:      docker-pullable://docker.io/alpine@sha256:7c3773f7bcc969f03f8f653910001d99a9d324b4b9caa008846ad2c3089f5a5f
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      until nc -z openldap.kubesphere-system.svc 389; do echo "waiting for ldap"; sleep 2; done;
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 25 Feb 2020 18:22:57 -0500
      Finished:     Tue, 25 Feb 2020 18:23:13 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kubesphere-token-hk59s (ro)
Containers:
  ks-account:
    Container ID:  docker://033c55c2a717e672d4abe256a9955f01d46ee47e08147a0660470ac0a9ae1055
    Image:         kubesphere/ks-account:v2.1.1
    Image ID:      docker-pullable://docker.io/kubesphere/ks-account@sha256:6fccef53ab7a269160ce7816dfe3583730ac7fe2064ea5c9e3ce5e366f3470eb
    Port:          9090/TCP
    Host Port:     0/TCP
    Command:
      ks-iam
      --logtostderr=true
      --jwt-secret=$(JWT_SECRET)
      --admin-password=$(ADMIN_PASSWORD)
      --enable-multi-login=False
      --token-idle-timeout=40m
      --redis-url=redis://redis.kubesphere-system.svc:6379
      --generate-kubeconfig=true
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 25 Feb 2020 19:55:58 -0500
      Finished:     Tue, 25 Feb 2020 19:55:59 -0500
    Ready:          False
    Restart Count:  23
    Limits:
      cpu:     1
      memory:  500Mi
    Requests:
      cpu:     20m
      memory:  100Mi
    Environment:
      KUBECTL_IMAGE:   kubesphere/kubectl:v1.0.0
      JWT_SECRET:      <set to the key 'jwt-secret' in secret 'ks-account-secret'>      Optional: false
      ADMIN_PASSWORD:  <set to the key 'admin-password' in secret 'ks-account-secret'>  Optional: false
    Mounts:
      /etc/ks-iam from user-init (rw)
      /etc/kubesphere from kubesphere-config (rw)
      /etc/kubesphere/rules from policy-rules (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kubesphere-token-hk59s (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  policy-rules:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      policy-rules
    Optional:  false
  user-init:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      user-init
    Optional:  false
  kubesphere-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kubesphere-config
    Optional:  false
  kubesphere-token-hk59s:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubesphere-token-hk59s
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 60s
                 node.kubernetes.io/unreachable:NoExecute for 60s
Events:
  Type     Reason   Age                    From              Message
  ----     ------   ----                   ----              -------
  Warning  BackOff  3m46s (x413 over 93m)  kubelet, worker3  Back-off restarting failed container
@zryfish (Member) commented Feb 26, 2020

Can you paste the yaml of kube-apiserver? We suspect it's related to the root certificate.

@titou10titou10 (Author) commented Feb 26, 2020

What do you mean by "kube-apiserver"? Do you mean the "ks-apiserver" pod?
I mean, what command do I run to get it, and from which node?

@zryfish (Member) commented Feb 26, 2020

No, the k8s component kube-apiserver. Normally you get its yaml with the following command:

kubectl -n kube-system get po [kube-apiserver-name] -o yaml

Please paste the content you get from the command above.

@titou10titou10 (Author) commented Feb 26, 2020

There is no pod with a name like kube-apiserver...
There is a docker container named "kube-apiserver" running on each master.

kubectl -n kube-system get pods -o wide
NAME                                      READY   STATUS      RESTARTS   AGE     IP             NODE      NOMINATED NODE   READINESS GATES
canal-69rm4                               2/2     Running     0          3h39m   192.168.5.42   master1   <none>           <none>
canal-cgnh7                               2/2     Running     0          3h39m   192.168.5.43   master2   <none>           <none>
canal-ckj6w                               2/2     Running     0          3h39m   192.168.5.47   worker3   <none>           <none>
canal-fpzbm                               2/2     Running     0          3h39m   192.168.5.46   worker2   <none>           <none>
canal-xb4px                               2/2     Running     0          3h39m   192.168.5.45   worker1   <none>           <none>
canal-xbqtk                               2/2     Running     0          3h39m   192.168.5.44   master3   <none>           <none>
coredns-7c5566588d-5c9sr                  1/1     Running     0          3h39m   10.62.4.2      worker2   <none>           <none>
coredns-7c5566588d-tdjvn                  1/1     Running     0          3h39m   10.62.5.2      worker3   <none>           <none>
coredns-autoscaler-65bfc8d47d-qt55j       1/1     Running     0          3h39m   10.62.3.2      worker1   <none>           <none>
metrics-server-6b55c64f86-hxvcf           1/1     Running     0          3h39m   10.62.3.3      worker1   <none>           <none>
rke-coredns-addon-deploy-job-bgb8f        0/1     Completed   0          3h39m   192.168.5.42   master1   <none>           <none>
rke-ingress-controller-deploy-job-b76mg   0/1     Completed   0          3h39m   192.168.5.42   master1   <none>           <none>
rke-metrics-addon-deploy-job-2tcpz        0/1     Completed   0          3h39m   192.168.5.42   master1   <none>           <none>
rke-network-plugin-deploy-job-sk4nq       0/1     Completed   0          3h39m   192.168.5.42   master1   <none>           <none>
rke-user-addon-deploy-job-pvp78           0/1     Completed   0          3h39m   192.168.5.42   master1   <none>           <none>
tiller-deploy-bc4f597d8-bpkdm             1/1     Running     0          3h23m   10.62.4.4      worker2   <none>           <none>

@titou10titou10 (Author) commented Feb 26, 2020

Here is the beginning of the logs of the kube-apiserver container on the first master:

docker logs kube-apiserver
+ echo kube-apiserver --client-ca-file=/etc/kubernetes/ssl/kube-ca.pem --etcd-cafile=/etc/kubernetes/ssl/kube-ca.pem --proxy-client-cert-file=/etc/kubernetes/ssl/kube-apiserver-proxy-client.pem --tls-private-key-file=/etc/kubernetes/ssl/kube-apiserver-key.pem --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --profiling=false --storage-backend=etcd3 --service-account-lookup=true --authorization-mode=Node,RBAC --kubelet-client-certificate=/etc/kubernetes/ssl/kube-apiserver.pem --kubelet-client-key=/etc/kubernetes/ssl/kube-apiserver-key.pem --proxy-client-key-file=/etc/kubernetes/ssl/kube-apiserver-proxy-client-key.pem --service-cluster-ip-range=10.63.0.0/16 --service-node-port-range=30000-32767 --tls-cert-file=/etc/kubernetes/ssl/kube-apiserver.pem --requestheader-group-headers=X-Remote-Group --runtime-config=authorization.k8s.io/v1beta1=true --insecure-port=0 --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction,Priority,TaintNodesByCondition,PersistentVolumeClaimResize --secure-port=6443 --requestheader-allowed-names=kube-apiserver-proxy-client --cloud-provider= --etcd-certfile=/etc/kubernetes/ssl/kube-node.pem --anonymous-auth=false --requestheader-extra-headers-prefix=X-Remote-Extra- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --requestheader-username-headers=X-Remote-User --etcd-prefix=/registry --requestheader-client-ca-file=/etc/kubernetes/ssl/kube-apiserver-requestheader-ca.pem --service-account-key-file=/etc/kubernetes/ssl/kube-service-account-token-key.pem --allow-privileged=true --etcd-keyfile=/etc/kubernetes/ssl/kube-node-key.pem --bind-address=0.0.0.0 --etcd-servers=https://ks-master1.xxx.yyy:2379,https://ks-master2.xxx.yyy:2379,https://ks-master3.xxx.yyy:2379

@titou10titou10 (Author):

From master1:

docker ps
IMAGE                                 COMMAND                NAMES
70eeaa7791f2                          "./kube-rbac-proxy..." k8s_kube-rbac-proxy_node-exporter-tqdx7_kubesphere-monitoring-system_ee6cafba-151d-4900-bb74-c3be02fc9d88_0
c6eb612e18e4                          "/bin/node_exporte..." k8s_node-exporter_node-exporter-tqdx7_kubesphere-monitoring-system_ee6cafba-151d-4900-bb74-c3be02fc9d88_0
rancher/pause:3.1                     "/pause"               k8s_POD_node-exporter-tqdx7_kubesphere-monitoring-system_ee6cafba-151d-4900-bb74-c3be02fc9d88_0
ff281650a721                          "/opt/bin/flanneld..." k8s_kube-flannel_canal-69rm4_kube-system_7f98e170-111d-4fdb-b19f-1c0cb3796a6d_0
387b2425e2ee                          "start_runit"          k8s_calico-node_canal-69rm4_kube-system_7f98e170-111d-4fdb-b19f-1c0cb3796a6d_0
rancher/pause:3.1                     "/pause"               k8s_POD_canal-69rm4_kube-system_7f98e170-111d-4fdb-b19f-1c0cb3796a6d_0
rancher/hyperkube:v1.17.2-rancher1    "/opt/rke-tools/en..." kube-proxy
rancher/hyperkube:v1.17.2-rancher1    "/opt/rke-tools/en..." kubelet
rancher/hyperkube:v1.17.2-rancher1    "/opt/rke-tools/en..." kube-scheduler
rancher/hyperkube:v1.17.2-rancher1    "/opt/rke-tools/en..." kube-controller-manager
rancher/hyperkube:v1.17.2-rancher1    "/opt/rke-tools/en..." kube-apiserver
rancher/rke-tools:v0.1.52             "/opt/rke-tools/rk..." etcd-rolling-snapshots
rancher/coreos-etcd:v3.4.3-rancher1   "/usr/local/bin/et..." etcd

@zryfish (Member) commented Feb 26, 2020

KubeSphere uses CSRs to issue a kubeconfig to each user, which requires extra configuration on the control plane; refer to https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/#a-note-to-cluster-administrators. A standard Kubernetes cluster has this enabled by default, but per your comment it isn't enabled on your rke cluster, so you need to add it manually.
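
For reference, a minimal sketch of what the linked documentation asks for: the CSR signer lives in kube-controller-manager, which must be pointed at a CA certificate and key. The flag names come from the Kubernetes docs; the file paths below are placeholders that vary by distribution.

    kube-controller-manager \
      --cluster-signing-cert-file=/path/to/ca.crt \
      --cluster-signing-key-file=/path/to/ca.key \
      ...other flags...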

@titou10titou10 (Author) commented Feb 26, 2020

Thanks for the diagnosis. I have found a post on how to activate the CSR signing feature in rke.
I will test it later today.
IMHO the documentation about installing KubeSphere on an existing k8s cluster should state that having the CSR signing feature activated is a prerequisite:
https://kubesphere.io/en/install
https://github.com/kubesphere/ks-installer

@FeynmanZhou (Member) commented Feb 26, 2020

@titou10titou10 Good suggestion; it's necessary to mention this for on-premise Kubernetes distributions like RKE. We will add CSR signing to the prerequisites.

Please let us know if you install KubeSphere on RKE successfully.

Regards,
Feynman

@titou10titou10 (Author) commented Feb 26, 2020

It works!
Thanks.
I uninstalled and reinstalled rke + KubeSphere.
For reference, I added the following to the rke cluster config file:

services:
  kube-controller:
    extra_args:
      cluster-signing-cert-file: /etc/kubernetes/ssl/kube-ca.pem
      cluster-signing-key-file: /etc/kubernetes/ssl/kube-ca-key.pem

As said before, the doc must be updated to include the activation of the CSR signing feature as a prerequisite.
(BTW kubesphere is fantastic, a great alternative to OpenShift IMHO)
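
As a sanity check (an assumption, not from the thread): once signing is enabled, the certificate signing requests that ks-account creates for users should end up approved and issued, which can be verified with standard kubectl:

    kubectl get csr
    # the CONDITION column should read Approved,Issued for the user certificates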

@FeynmanZhou (Member):

@titou10titou10 Awesome! You are welcome to join the KubeSphere Slack channel for more communication.

@FeynmanZhou (Member):

(quoting @titou10titou10's fix above)

@titou10titou10 BTW, did you use the default minimal installation, or did you start with a complete setup?

@titou10titou10 (Author) commented Feb 27, 2020

I started with a minimal installation as stated in the doc, then activated the DevOps and OpenPitrix features by editing the configmap (sketched below).
I created a second service plus a dedicated ingress to access the dashboard, instead of the "NodePort" service: I have a VM running nginx that acts as a load balancer in front of the 3 workers where the default RKE ingress controller is deployed.
Everything went fine and the 2 features are functional.
Do you want me to test other features?
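
For readers hitting the same step: a minimal sketch of the configmap edit mentioned above, assuming the 2.1.x installer keeps its configuration in the ks-installer ConfigMap in the kubesphere-system namespace (the exact key layout may differ between releases):

    kubectl edit cm -n kubesphere-system ks-installer
    # then, inside the embedded installer configuration, set for example:
    #   devops:
    #     enabled: True
    #   openpitrix:
    #     enabled: True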

There is one thing that does not seem to work. When I try to use the kubectl tool from the dashboard, a black window opens with this message in red:

Could not connect to the container. Do you have sufficient privileges?

UPDATE
In fact the kubectl tool works if I choose to open it "in a new window". Weird.

@FeynmanZhou (Member):

@titou10titou10
You can enable other components at will, but I recommend enabling the others for a more comprehensive user experience.

@wansir Could you please help look at the web kubectl issue above?

@zryfish (Member) commented Feb 28, 2020

It appears WebSocket proxying is not enabled in your nginx; I suggest checking your nginx proxy settings.
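
For reference, a minimal sketch of the kind of change this usually means for an nginx reverse proxy; the upstream name is a placeholder:

    location / {
        proxy_pass http://kubesphere_console;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }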

@titou10titou10 (Author):

@zryfish you were right; fixing the nginx LB configuration for wss solved the problem.

@titou10titou10 (Author):

Closing this as the original problem is fixed; opening a new issue for the documentation.

@heppytt commented Jun 26, 2020

(quoting @titou10titou10's fix above)

Hello,
I want to know: how do I install it?

@wjh-w commented Nov 10, 2020

[root@node2 ~]# kubectl logs -n kubesphere-system ks-controller-manager-6ccdbbb476-2tl89
W1110 22:06:51.034869 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E1110 22:07:41.047282 1 server.go:81] failed to connect to ldap service, please check ldap status, error: factory is not able to fill the pool: LDAP Result Code 200 "Network Error": dial tcp: lookup openldap.kubesphere-system.svc on 10.68.0.2:53: read udp 172.20.1.23:59558->10.68.0.2:53: i/o timeout

I got an error like this; please help me solve it.
