After completing the node autonomy test, the edge node status still stays Ready #106

Closed
HirazawaUi opened this issue Aug 18, 2020 · 22 comments · Fixed by #123

@HirazawaUi

HirazawaUi commented Aug 18, 2020

Situation description

  1. I installed a Kubernetes cluster (v1.16) using kubeadm. The cluster has one master and three nodes.

  2. After manually installing OpenYurt, I tried to verify that the installation was successful.

  3. I followed the "Test node autonomy" section of https://github.com/alibaba/openyurt/blob/master/docs/tutorial/yurtctl.md for the test.

  4. After completing the actions in the "Test node autonomy" section, the edge node status still stayed Ready.

Operation steps

  1. I created a sample pod
kubectl apply -f-<<EOF
apiVersion: v1
kind: Pod
metadata:
  name: bbox
spec:
  nodeName: node3       
  containers:
  - image: busybox
    command:
    - top
    name: bbox
EOF
  • node3 is the edge node. I chose the simplest way to schedule the sample pod to the edge node by setting spec.nodeName directly, although this method is not recommended in the Kubernetes documentation (see the nodeSelector sketch after these steps).
  2. I modified yurt-hub.yaml, setting the value of --server-addr= to a non-existent IP and port:
    - --server-addr=https://1.1.1.1:6448
    
  3. Then I ran curl -s http://127.0.0.1:10261 to verify whether the edge node can work normally in offline mode; the result is as expected:
    {
      "kind": "Status",
      "metadata": {
    
      },
      "status": "Failure",
      "message": "request( get : /) is not supported when cluster is unhealthy",
      "reason": "BadRequest",
      "code": 400
    }
    
  4. But node3's status still stays Ready, and the yurt-hub pod enters the Pending state:
    kubectl get nodes
    NAME     STATUS   ROLES    AGE   VERSION
    master   Ready    master   23h   v1.16.6
    node1    Ready    <none>   23h   v1.16.6
    node2    Ready    <none>   23h   v1.16.6
    node3    Ready    <none>   23h   v1.16.6
    
    # kubectl get pods -n kube-system | grep yurt
    yurt-controller-manager-59544577cc-t948z   1/1     Running   0          5h42m
    yurt-hub-node3                             0/1     Pending   0          5h32m
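
As a side note, a sketch of the scheduler-friendly alternative mentioned in the note above: instead of setting spec.nodeName directly, constrain the pod to the edge node with a nodeSelector on its kubernetes.io/hostname label (node3 carries this label, as shown in the node description later in this issue):

kubectl apply -f-<<EOF
apiVersion: v1
kind: Pod
metadata:
  name: bbox
spec:
  # let the scheduler place the pod, restricted to the edge node
  nodeSelector:
    kubernetes.io/hostname: node3
  containers:
  - image: busybox
    command:
    - top
    name: bbox
EOF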
    

Some configuration items and logs that may be useful for reference

  1. Label information of each node
    root@master:~# kubectl describe nodes master | grep Labels
    Labels:             alibabacloud.com/is-edge-worker=false
    root@master:~# kubectl describe nodes node1 | grep Labels
    Labels:             alibabacloud.com/is-edge-worker=false
    root@master:~# kubectl describe nodes node2 | grep Labels
    Labels:             alibabacloud.com/is-edge-worker=false
    root@master:~# kubectl describe nodes node3 | grep Labels
    Labels:             alibabacloud.com/is-edge-worker=true
    
  2. Configuration of kube-controller-manager
        - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
        - --controllers=*,bootstrapsigner,tokencleaner,-nodelifecycle
        - --kubeconfig=/etc/kubernetes/controller-manager.conf
    
  3. /etc/kubernetes/manifests/yurthub.yml
    # cat yurthub.yml
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        k8s-app: yurt-hub
      name: yurt-hub
      namespace: kube-system
    spec:
      volumes:
      - name: pki
        hostPath:
          path: /etc/kubernetes/pki
          type: Directory
      - name: kubernetes
        hostPath:
          path: /etc/kubernetes
          type: Directory
      - name: pem-dir
        hostPath:
          path: /var/lib/kubelet/pki
          type: Directory
      containers:
      - name: yurt-hub
        image: openyurt/yurthub:latest
        imagePullPolicy: Always
        volumeMounts:
        - name: kubernetes
          mountPath: /etc/kubernetes
        - name: pki
          mountPath: /etc/kubernetes/pki
        - name: pem-dir
          mountPath: /var/lib/kubelet/pki
        command:
        - yurthub
        - --v=2
        - --server-addr=https://1.1.1.1:6448
        - --node-name=$(NODE_NAME)
        livenessProbe:
          httpGet:
            host: 127.0.0.1
            path: /v1/healthz
            port: 10261
          initialDelaySeconds: 300
          periodSeconds: 5
          failureThreshold: 3
        resources:
          requests:
            cpu: 150m
            memory: 150Mi
          limits:
            memory: 300Mi
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
      hostNetwork: true
      priorityClassName: system-node-critical
      priority: 2000001000
    
  4. /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    # cat  /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    # Note: This dropin only works with kubeadm and kubelet v1.11+
    [Service]
    #Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/var/lib/openyurt/kubelet.conf"
    Environment="KUBELET_KUBECONFIG_ARGS=--kubeconfig=/var/lib/openyurt/kubelet.conf"
    Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
    # This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
    EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
    # This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
    # the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
    EnvironmentFile=-/etc/default/kubelet
    ExecStart=
    ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
    
  5. /var/lib/openyurt/kubelet.conf
    # cat /var/lib/openyurt/kubelet.conf
    apiVersion: v1
    clusters:
    - cluster:
        server: http://127.0.0.1:10261
      name: default-cluster
    contexts:
    - context:
        cluster: default-cluster
        namespace: default
      name: default-context
    current-context: default-context
    kind: Config
    preferences: {}
    users:
    - name: default-auth
    
  6. Use kubectl describe to view yurt-hub pod information
    # kubectl describe pods yurt-hub-node3 -n kube-system
    Name:                 yurt-hub-node3
    Namespace:            kube-system
    Priority:             2000001000
    Priority Class Name:  system-node-critical
    Node:                 node3/
    Labels:               k8s-app=yurt-hub
    Annotations:          kubernetes.io/config.hash: 7be1318d63088969eafcd2fa5887f2ef
                          kubernetes.io/config.mirror: 7be1318d63088969eafcd2fa5887f2ef
                          kubernetes.io/config.seen: 2020-08-18T08:41:27.431580091Z
                          kubernetes.io/config.source: file
    Status:               Pending
    IP:
    IPs:                  <none>
    Containers:
      yurt-hub:
        Image:      openyurt/yurthub:latest
        Port:       <none>
        Host Port:  <none>
        Command:
          yurthub
          --v=2
          --server-addr=https://10.10.13.82:6448
          --node-name=$(NODE_NAME)
        Limits:
          memory:  300Mi
        Requests:
          cpu:     150m
          memory:  150Mi
        Liveness:  http-get http://127.0.0.1:10261/v1/healthz delay=300s timeout=1s period=5s #success=1 #failure=3
        Environment:
          NODE_NAME:   (v1:spec.nodeName)
        Mounts:
          /etc/kubernetes from kubernetes (rw)
          /etc/kubernetes/pki from pki (rw)
          /var/lib/kubelet/pki from pem-dir (rw)
    Volumes:
      pki:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/kubernetes/pki
        HostPathType:  Directory
      kubernetes:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/kubernetes
        HostPathType:  Directory
      pem-dir:
        Type:          HostPath (bare host directory volume)
        Path:          /var/lib/kubelet/pki
        HostPathType:  Directory
    QoS Class:         Burstable
    Node-Selectors:    <none>
    Tolerations:       :NoExecute
    Events:            <none>
    
  7. Use docker ps on the edge node to find the yurt-hub container, then view its logs (last 20 lines):
    # docker logs 0c89efbe949b --tail 20
    I0818 13:54:13.293068       1 health_checker.go:151] ping cluster healthz with result, Get https://1.1.1.1:6448/healthz: dial tcp 1.1.1.1:6448: connect: connection refused
    I0818 13:54:13.561262       1 util.go:177] kubelet get nodes: /api/v1/nodes/node3?resourceVersion=0&timeout=10s with status code 200, spent 331.836µs, left 10 requests in flight
    I0818 13:54:15.746576       1 util.go:177] kubelet update leases: /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node3?timeout=10s with status code 200, spent 83.127µs, left 10 requests in flight
    I0818 13:54:15.828560       1 util.go:177] kubelet get pods: /api/v1/namespaces/kube-system/pods/yurt-hub-node3 with status code 200, spent 436.489µs, left 10 requests in flight
    I0818 13:54:15.829628       1 util.go:177] kubelet patch pods: /api/v1/namespaces/kube-system/pods/yurt-hub-node3/status with status code 200, spent 307.187µs, left 10 requests in flight
    I0818 13:54:17.831366       1 util.go:177] kubelet delete pods: /api/v1/namespaces/kube-system/pods/yurt-hub-node3 with status code 200, spent 147.492µs, left 10 requests in flight
    I0818 13:54:17.833762       1 util.go:177] kubelet create pods: /api/v1/namespaces/kube-system/pods with status code 201, spent 111.762µs, left 10 requests in flight
    I0818 13:54:22.273899       1 health_checker.go:151] ping cluster healthz with result, Get https://1.1.1.1:6448/healthz: dial tcp 1.1.1.1:6448: connect: connection refused
    I0818 13:54:23.486523       1 util.go:177] kubelet watch configmaps: /api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dkube-flannel-cfg&resourceVersion=2161&timeout=7m54s&timeoutSeconds=474&watch=true with status code 200, spent 7m54.000780359s, left 9 requests in flight
    I0818 13:54:23.648871       1 util.go:177] kubelet get nodes: /api/v1/nodes/node3?resourceVersion=0&timeout=10s with status code 200, spent 266.182µs, left 10 requests in flight
    I0818 13:54:25.748497       1 util.go:177] kubelet update leases: /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node3?timeout=10s with status code 200, spent 189.694µs, left 10 requests in flight
    I0818 13:54:25.830919       1 util.go:177] kubelet get pods: /api/v1/namespaces/kube-system/pods/yurt-hub-node3 with status code 200, spent 1.375535ms, left 10 requests in flight
    I0818 13:54:25.835015       1 util.go:177] kubelet patch pods: /api/v1/namespaces/kube-system/pods/yurt-hub-node3/status with status code 200, spent 1.363765ms, left 10 requests in flight
    I0818 13:54:33.733913       1 util.go:177] kubelet get nodes: /api/v1/nodes/node3?resourceVersion=0&timeout=10s with status code 200, spent 303.499µs, left 10 requests in flight
    I0818 13:54:34.261504       1 health_checker.go:151] ping cluster healthz with result, Get https://1.1.1.1:6448/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    I0818 13:54:35.751002       1 util.go:177] kubelet update leases: /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node3?timeout=10s with status code 200, spent 144.723µs, left 10 requests in flight
    I0818 13:54:35.830895       1 util.go:177] kubelet get pods: /api/v1/namespaces/kube-system/pods/yurt-hub-node3 with status code 200, spent 1.146812ms, left 10 requests in flight
    I0818 13:54:35.834366       1 util.go:177] kubelet patch pods: /api/v1/namespaces/kube-system/pods/yurt-hub-node3/status with status code 200, spent 744.857µs, left 10 requests in flight
    I0818 13:54:42.274049       1 health_checker.go:151] ping cluster healthz with result, Get https://1.1.1.1:6448/healthz: dial tcp 1.1.1.1:6448: connect: connection refused
    I0818 13:54:43.818381       1 util.go:177] kubelet get nodes: /api/v1/nodes/node3?resourceVersion=0&timeout=10s with status code 200, spent 248.672µs, left 10 requests in flight
    
  8. Use kubectl logs to view the logs of yurt-controller-manager (last 20 lines):
    # kubectl logs yurt-controller-manager-59544577cc-t948z -n kube-system --tail 20
    E0818 13:56:07.239721       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:10.560864       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:13.288544       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:16.726605       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:19.623694       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:23.572803       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:26.809117       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:29.021205       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:31.271086       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:34.083918       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:37.493386       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:40.222869       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:44.149011       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:47.699211       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:50.177053       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:52.553163       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:55.573328       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:56:58.677034       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:57:02.844152       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    E0818 13:57:05.044990       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
    

Finally

I very much hope you can help me solve this problem or point out my mistakes. If any other information is needed, please let me know.

@xujunjie-cover
Contributor

Hello, @HirazawaUi
In your steps, in the configuration of kube-controller-manager, please remove the appended -nodelifecycle from --controllers=*,bootstrapsigner,tokencleaner,-nodelifecycle.

In my opinion, a worker node sends its health status to the master; if the heartbeats time out, the master will change the node status. After installing yurt on a worker node, the edge node sends its health status through yurthub, and the master should still change the node status as described in yurtctl.md.

But if you add -nodelifecycle to the kube-controller-manager configuration, the master will not change the node status when the heartbeats time out.

I am not sure whether my opinion is right; I will look for more details in the Kubernetes documentation.

@Fei-Guo
Member

Fei-Guo commented Aug 20, 2020

Isn't this expected behavior? With the yurt-controller-manager, which re-implements the node lifecycle controller, if an edge node stops sending heartbeats, the yurt-controller-manager will not mark the node NotReady.

@HirazawaUi
Author

Hello, @HirazawaUi
In your steps, in the configuration of kube-controller-manager, please remove the appended -nodelifecycle from --controllers=*,bootstrapsigner,tokencleaner,-nodelifecycle.

In my opinion, a worker node sends its health status to the master; if the heartbeats time out, the master will change the node status. After installing yurt on a worker node, the edge node sends its health status through yurthub, and the master should still change the node status as described in yurtctl.md.

But if you add -nodelifecycle to the kube-controller-manager configuration, the master will not change the node status when the heartbeats time out.

I am not sure whether my opinion is right; I will look for more details in the Kubernetes documentation.

Thank you very much for your answer, which solved my problem; OpenYurt is now running normally. I am not sure if anyone else has encountered this problem, but I think this parameter should be described in more detail in the documentation.

@HirazawaUi
Author

Isn't this expected behavior? With the yurt-controller-manager, which re-implements the node lifecycle controller, if an edge node stops sending heartbeats, the yurt-controller-manager will not mark the node NotReady.

In the "Test node autonomy" section of https://github.com/alibaba/openyurt/blob/master/docs/tutorial/yurtctl.md, the expected result is that after modifying yurthub's --server-addr=, the edge node enters the NotReady state while the pod stays in the Running state.

@Fei-Guo
Member

Fei-Guo commented Aug 20, 2020

You are right, I take it back. The yurt-controller-manager just does not evict Pods; the node status is still updated to NotReady.

@xujunjie-cover
Contributor

Showing the node status after the "Convert a multi-nodes Kubernetes cluster" step in yurtctl.md may be better. Thanks.

@Fei-Guo
Member

Fei-Guo commented Aug 20, 2020

It is recommended that the default nodelifecycle controller not be installed, otherwise it will conflict with yurt-controller-manager. In the yurtctl tool, to ease the workflow, we delete the nodelifecycle SA from kube-system, assuming that the default node controller will not work without the SA. Is it possible that in your setup the default nodelifecycle controller still works (using other SAs)?

@HirazawaUi
Author

I observed that my sample pod entered the Terminating state about five minutes after the edge node stopped sending heartbeats. I don't know if this is normal and meets expectations; there is no similar description in the document.

  1. Output of kubectl get pods

    # kubectl get pods
    NAME   READY   STATUS        RESTARTS   AGE
    bbox   1/1     Terminating   0          59m
    
  2. Output of kubectl describe pods bbox

    # kubectl describe pods bbox
    Name:                      bbox
    Namespace:                 default
    Priority:                  0
    Node:                      node3/10.10.13.85
    Start Time:                Thu, 20 Aug 2020 06:01:18 +0000
    Labels:                    <none>
    Annotations:               Status:  Terminating (lasts 54m)
    Termination Grace Period:  30s
    IP:                        10.20.3.8
    IPs:
      IP:  10.20.3.8
    Containers:
      bbox:
        Container ID:  docker://1d7e95fe71f632363f7e811813ce1fd7778cfa53258cc66b5eb5aae39babca68
        Image:         busybox
        Image ID:      docker-pullable://busybox@sha256:4f47c01fa91355af2865ac10fef5bf6ec9c7f42ad2321377c21e844427972977
        Port:          <none>
        Host Port:     <none>
        Command:
          top
        State:          Running
          Started:      Thu, 20 Aug 2020 06:01:37 +0000
        Ready:          True
        Restart Count:  0
        Environment:    <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-ccjzj (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   True
      PodScheduled      True
    Volumes:
      default-token-ccjzj:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-ccjzj
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type    Reason   Age   From            Message
      ----    ------   ----  ----            -------
      Normal  Pulled   60m   kubelet, node3  Successfully pulled image "busybox"
      Normal  Created  60m   kubelet, node3  Created container bbox
      Normal  Started  60m   kubelet, node3  Started container bbox
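
The ~5-minute delay looks consistent with the default eviction timing: the pod carries the node.kubernetes.io/unreachable:NoExecute toleration with tolerationSeconds=300 (visible in the Tolerations section above), and the controller-manager's default pod eviction timeout is also 5m, so an active default node lifecycle controller would evict the pod about five minutes after the node becomes unreachable. A quick way to inspect those tolerations (a sketch, assuming the defaults were not changed):

# print the tolerations injected into the pod by the DefaultTolerationSeconds admission plugin
kubectl get pod bbox -o jsonpath='{.spec.tolerations}'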
    

@HirazawaUi
Author

It is recommended that the default nodelifecycle controller not be installed, otherwise it will conflict with yurt-controller-manager. In the yurtctl tool, to ease the workflow, we delete the nodelifecycle SA from kube-system, assuming that the default node controller will not work without the SA. Is it possible that in your setup the default nodelifecycle controller still works (using other SAs)?

I did not find anything in the Kubernetes documentation about the default nodelifecycle controller still working through other SAs. In fact, this test was done on a brand-new Kubernetes cluster installed with kubeadm for OpenYurt; I haven't made any changes to it.

@Fei-Guo
Member

Fei-Guo commented Aug 20, 2020

I observed that my sample pod entered the Terminating state about five minutes after the edge node stopped sending heartbeats. I don't know if this is normal and meets expectations; there is no similar description in the document.

This looks like the default node controller behavior. Can you run kubectl get sa -n kube-system | grep node and see if the node controller SA is still there?

@HirazawaUi
Author

I observed that my sample pod entered the Terminating state about five minutes after the edge node stopped sending heartbeats. I don't know if this is normal and meets expectations; there is no similar description in the document.

This looks like the default node controller behavior. Can you run kubectl get sa -n kube-system | grep node and see if the node controller SA is still there?

Yes, it exists in the cluster. Do I need to delete it?

@Fei-Guo
Member

Fei-Guo commented Aug 20, 2020

I observed that my sample pod entered the Terminating state about five minutes after the edge node stopped sending heartbeats. I don't know if this is normal and meets expectations; there is no similar description in the document.

This looks like the default node controller behavior. Can you run kubectl get sa -n kube-system | grep node and see if the node controller SA is still there?

Yes, it exists in the cluster. Do I need to delete it?

Yes, yurtctl is supposed to delete this SA (https://github.com/alibaba/openyurt/blob/ea19a211e43324f71a318a2236799b18291df4d8/pkg/yurtctl/cmd/convert/convert.go#L215). You can manually delete it for now. We should figure out why this deletion fails.
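
A sketch of the check and the manual cleanup (assuming the SA created by kubeadm for the node lifecycle controller is named node-controller; confirm with the grep output before deleting):

# list SAs related to the node controllers
kubectl -n kube-system get sa | grep node
# if the node lifecycle controller's SA is still present, remove it, e.g.:
kubectl -n kube-system delete sa node-controller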

@charleszheng44
Member

@xujunjie-cover your understanding is correct. If the nodelifecycle controller is disabled, the node status will never change. However, to my knowledge, the yurt-controller-manager is responsible for managing the node lifecycle, so the default nodelifecycle controller should still be disabled. @rambohe-ch would you verify this?

@HirazawaUi I just tried the manual setup process (disabling the nodelifecycle controller with the option --controllers=*,bootstrapsigner,tokencleaner,-nodelifecycle), and everything works as expected (the node becomes NotReady after being disconnected from the apiserver). Could you check whether the edge node is marked as autonomous? (The edge node should have the node.beta.alibabacloud.com/autonomy=true annotation.)
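
A sketch of how to check this on the edge node object, and how to set the annotation if it is missing (assuming the node is named node3):

# show the autonomy annotation and the edge-worker label
kubectl describe node node3 | grep -E 'autonomy|is-edge-worker'
# mark the node as autonomous if the annotation is absent
kubectl annotate node node3 node.beta.alibabacloud.com/autonomy=true --overwrite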

@Fei-Guo
Member

Fei-Guo commented Aug 20, 2020

Ah, I realized that you are doing the conversion manually instead of using yurtctl. Then you should use the --controllers=*,bootstrapsigner,tokencleaner,-nodelifecycle option to disable the default nodelifecycle controller. Basically, follow the step described in https://github.com/alibaba/openyurt/blob/master/docs/tutorial/manually-setup.md#disable-the-default-nodelifecycle-controller

Can you please do that and repeat the test? We will go from there and see what does not meet expectations.
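
A sketch of that step on the master, assuming the kubeadm layout where kube-controller-manager runs as a static pod:

# edit the static pod manifest; the kubelet restarts kube-controller-manager automatically
vi /etc/kubernetes/manifests/kube-controller-manager.yaml
# make sure the command list contains:
#   - --controllers=*,bootstrapsigner,tokencleaner,-nodelifecycle
# then confirm the pod has come back up
kubectl -n kube-system get pods | grep kube-controller-manager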

@HirazawaUi
Author

HirazawaUi commented Aug 21, 2020

@xujunjie-cover your understanding is correct. If the nodelifecycle controller is disabled, the node status will never change. However, to my knowledge, the yurt-controller-manager is responsible for managing the node lifecycle, so the default nodelifecycle controller should still be disabled. @rambohe-ch would you verify this?

@HirazawaUi I just tried the manual setup process (disabling the nodelifecycle controller with the option --controllers=*,bootstrapsigner,tokencleaner,-nodelifecycle), and everything works as expected (the node becomes NotReady after being disconnected from the apiserver). Could you check whether the edge node is marked as autonomous? (The edge node should have the node.beta.alibabacloud.com/autonomy=true annotation.)

This is the label and annotation information of the edge node; both the alibabacloud.com/is-edge-worker=true label and the node.beta.alibabacloud.com/autonomy: true annotation exist:

Name:               node3
Roles:              <none>
Labels:             alibabacloud.com/is-edge-worker=true
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node3
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"fe:5d:79:c2:90:e5"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.10.13.85
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    node.beta.alibabacloud.com/autonomy: true
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 17 Aug 2020 13:55:19 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  node3
  AcquireTime:     <unset>
  RenewTime:       Fri, 21 Aug 2020 02:29:27 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 18 Aug 2020 08:38:30 +0000   Tue, 18 Aug 2020 08:38:30 +0000   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Fri, 21 Aug 2020 02:28:53 +0000   Fri, 21 Aug 2020 02:24:51 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 21 Aug 2020 02:28:53 +0000   Fri, 21 Aug 2020 02:24:51 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Fri, 21 Aug 2020 02:28:53 +0000   Fri, 21 Aug 2020 02:24:51 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Fri, 21 Aug 2020 02:28:53 +0000   Fri, 21 Aug 2020 02:24:51 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.10.13.85
  Hostname:    node3
Capacity:
  cpu:                4
  ephemeral-storage:  102684600Ki
  hugepages-2Mi:      0
  memory:             8168092Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  94634127204
  hugepages-2Mi:      0
  memory:             8065692Ki
  pods:               110
System Info:
  Machine ID:                 0500c02d85d74055b90a973b0bd7a4cc
  System UUID:                6A2B2942-28F3-C25F-F60C-2D14B7B068F2
  Boot ID:                    92c16623-c1be-4064-aa23-4f372029c12b
  Kernel Version:             4.15.0-99-generic
  OS Image:                   Ubuntu 18.04.4 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.12
  Kubelet Version:            v1.16.6
  Kube-Proxy Version:         v1.16.6
PodCIDR:                      10.20.3.0/24
PodCIDRs:                     10.20.3.0/24
Non-terminated Pods:          (2 in total)
  Namespace                   Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                           ------------  ----------  ---------------  -------------  ---
  kube-system                 kube-flannel-ds-amd64-nj6q8    100m (2%)     100m (2%)   50Mi (0%)        50Mi (0%)      3d12h
  kube-system                 kube-proxy-n9nxr               0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d12h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  100m (2%)
  memory             50Mi (0%)  50Mi (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:
  Type    Reason                   Age                 From            Message
  ----    ------                   ----                ----            -------
  Normal  NodeHasSufficientMemory  12m (x21 over 20h)  kubelet, node3  Node node3 status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    12m (x21 over 20h)  kubelet, node3  Node node3 status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     12m (x21 over 20h)  kubelet, node3  Node node3 status is now: NodeHasSufficientPID
  Normal  NodeReady                12m (x3 over 20h)   kubelet, node3  Node node3 status is now: NodeReady

@HirazawaUi
Author

HirazawaUi commented Aug 21, 2020

Ah, I realized that you are doing the conversion manually instead of using yurtctl. Then you should use the --controllers=*,bootstrapsigner,tokencleaner,-nodelifecycle option to disable the default nodelifecycle controller. Basically, follow the step described in https://github.com/alibaba/openyurt/blob/master/docs/tutorial/manually-setup.md#disable-the-default-nodelifecycle-controller

Can you please do that and repeat the test? We will go from there and see what does not meet expectations.

I have run kubectl delete -f yurt-controller-manager.yaml to delete the controller Deployment and then recreated it, but the result is still the same as above: the edge node status still stays Ready. Should I try another version of Kubernetes, or follow the document's example with a cluster of only one master and one node?

@Fei-Guo
Member

Fei-Guo commented Aug 21, 2020

I have run kubectl delete -f yurt-controller-manager.yaml to delete the controller Deployment and then recreated it, but the result is still the same as above: the edge node status still stays Ready. Should I try another version of Kubernetes, or follow the document's example with a cluster of only one master and one node?

Please note that you should restart the default Kubernetes controller manager (kube-controller-manager in kube-system) with the correct option, not the yurt-controller-manager.
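
A quick sketch to verify which --controllers value the running kube-controller-manager actually has (assuming the kubeadm default labels on the static pod):

kubectl -n kube-system get pod -l component=kube-controller-manager -o yaml | grep -e '--controllers'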

@HirazawaUi
Author

I observed that my sample pod entered the Terminating state about five minutes after the edge node stopped sending heartbeats. I don't know if this is normal and meets expectations; there is no similar description in the document.

This looks like the default node controller behavior. Can you run kubectl get sa -n kube-system | grep node and see if the node controller SA is still there?

Before this, I had already modified the --controllers option of kube-controller-manager, and it restarted automatically.

@charleszheng44
Member

Ah, I realized that you are doing the conversion manually instead of using yurtctl. Then you should use the --controllers=*,bootstrapsigner,tokencleaner,-nodelifecycle option to disable the default nodelifecycle controller. Basically, follow the step described in https://github.com/alibaba/openyurt/blob/master/docs/tutorial/manually-setup.md#disable-the-default-nodelifecycle-controller

Can you please do that and repeat the test? We will go from there and see what does not meet expectations.

I have run kubectl delete -f yurt-controller-manager.yaml to delete the controller Deployment and then recreated it, but the result is still the same as above: the edge node status still stays Ready. Should I try another version of Kubernetes, or follow the document's example with a cluster of only one master and one node?

@HirazawaUi Thanks for the detailed log output. I will verify the manual setup process on a multi-node 1.16 Kubernetes cluster and let you know if the problem can be reproduced.

@HirazawaUi
Author

@charleszheng44 @Fei-Guo
I think I know where the problem is; I fell into a thinking trap. I re-checked the yurt-controller-manager log and found that the problem is simply that the default serviceaccount in kube-system has insufficient permissions.
I then created a serviceaccount from the command line and modified the yurt-controller-manager manifest to bind it to yurt-controller-manager. Now everything works as expected.

Showing the log of yurt-controller-manager again:

E0905 07:02:52.347282       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0905 07:02:55.841780       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0905 07:03:00.146093       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0905 07:03:04.198585       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0905 07:03:08.111697       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0905 07:03:12.270039       1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: leases.coordination.k8s.io "yurt-controller-manager" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"

Create the serviceaccount:

# kubectl -n kube-system create sa yurt-cm
serviceaccount/yurt-cm created
# kubectl create clusterrolebinding yurt-cm --clusterrole=cluster-admin --serviceaccount=kube-system:yurt-cm
clusterrolebinding.rbac.authorization.k8s.io/yurt-cm created

Modify the deployment of yurt-controller-manager:

      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: yurt-cm
      serviceAccountName: yurt-cm
      terminationGracePeriodSeconds: 30
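
For reference, an equivalent way to point the Deployment at the new ServiceAccount without editing the manifest by hand (a sketch, assuming the Deployment is named yurt-controller-manager in kube-system):

kubectl -n kube-system patch deployment yurt-controller-manager \
  -p '{"spec":{"template":{"spec":{"serviceAccountName":"yurt-cm"}}}}'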

@vincent-pli
Member

@charleszheng44
I hit the issue again today; it seems we did not resolve it: #52

@charleszheng44
Member

@charleszheng44
I hit the issue again today; it seems we did not resolve it: #52

Thanks for reporting the issue, I forgot to fix it 😅. #123 should resolve the problem.
