Failed to mount configmap/secret volume because of "no such file or directory" #1736

Closed
GsssC opened this issue May 27, 2020 · 28 comments · Fixed by #1809
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

GsssC (Member) commented May 27, 2020

What happened:
Mounting a configmap/secret volume fails with "no such file or directory", even though we can confirm that the related resources are present in the local SQLite store.

I0527 10:35:29.789719     660 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_kube-proxy
I0527 10:35:29.800195     660 process.go:685] get a message {Header:{ID:8b57a409-25c9-454e-a9ae-b23f0b1861a9 ParentID: Timestamp:1590546929789 ResourceVersion: Sync:true} Router:{Source:edged Group:meta Operation:query Resource:kube-system/configmap/kube-proxy} Content:<nil>}
I0527 10:35:29.800543     660 metaclient.go:121] send sync message kube-system/configmap/kube-proxy successed and response: {{ab5f3aab-11ff-48cf-8c3b-c5ded97678db 8b57a409-25c9-454e-a9ae-b23f0b1861a9 1590546929800  false} {metaManager meta response kube-system/configmap/kube-proxy} [{"data":{"config.conf":"apiVersion: kubeproxy.config.k8s.io/v1alpha1\nbindAddress: 0.0.0.0\nclientConnection:\n  acceptContentTypes: \"\"\n  burst: 0\n  contentType: \"\"\n  kubeconfig: /var/lib/kube-proxy/kubeconfig.conf\n  qps: 0\nclusterCIDR: 192.168.0.0/16\nconfigSyncPeriod: 0s\nconntrack:\n  maxPerCore: null\n  min: null\n  tcpCloseWaitTimeout: null\n  tcpEstablishedTimeout: null\nenableProfiling: false\nhealthzBindAddress: \"\"\nhostnameOverride: \"\"\niptables:\n  masqueradeAll: false\n  masqueradeBit: null\n  minSyncPeriod: 0s\n  syncPeriod: 0s\nipvs:\n  excludeCIDRs: null\n  minSyncPeriod: 0s\n  scheduler: \"\"\n  strictARP: false\n  syncPeriod: 0s\nkind: KubeProxyConfiguration\nmetricsBindAddress: \"\"\nmode: \"\"\nnodePortAddresses: null\noomScoreAdj: null\nportRange: \"\"\nudpIdleTimeout: 0s\nwinkernel:\n  enableDSR: false\n  networkName: \"\"\n  sourceVip: \"\"","kubeconfig.conf":"apiVersion: v1\nkind: Config\nclusters:\n- cluster:\n    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt\n    server: https://10.10.102.78:6443\n  name: default\ncontexts:\n- context:\n    cluster: default\n    namespace: default\n    user: default\n  name: default\ncurrent-context: default\nusers:\n- name: default\n  user:\n    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token"},"metadata":{"creationTimestamp":"2020-04-21T14:50:46Z","labels":{"app":"kube-proxy"},"name":"kube-proxy","namespace":"kube-system","resourceVersion":"193","selfLink":"/api/v1/namespaces/kube-system/configmaps/kube-proxy","uid":"5651c863-c755-4da4-8039-b251efc82470"}}]}
E0527 10:35:29.800949     660 configmap.go:249] Error creating atomic writer: stat /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory
W0527 10:35:29.801070     660 empty_dir.go:392] Warning: Unmount skipped because path does not exist: /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy
I0527 10:35:29.801109     660 record.go:24] Warning FailedMount MountVolume.SetUp failed for volume "kube-proxy" : stat /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory
E0527 10:35:29.801199     660 nestedpendingoperations.go:270] Operation for "\"kubernetes.io/configmap/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy\" (\"25e6f0ea-6364-4bcc-9937-9760b6ec956a\")" failed. No retries permitted until 2020-05-27 10:37:31.80112802 +0800 CST m=+2599.727653327 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"kube-proxy\" (UniqueName: \"kubernetes.io/configmap/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy\") pod \"kube-proxy-gbdgw\" (UID: \"25e6f0ea-6364-4bcc-9937-9760b6ec956a\") : stat /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory"

What you expected to happen:
The volume mounts successfully.
How to reproduce it (as minimally and precisely as possible):
Sorry, I cannot provide reproduction steps yet.
Anything else we need to know?:

Environment:

  • KubeEdge version(e.g. cloudcore/edgecore --version): v1.3.0
GsssC added the kind/bug label May 27, 2020
GsssC (Member Author) commented May 27, 2020

/assign

GsssC (Member Author) commented May 27, 2020

Logs that may be related to this bug:

I0527 15:53:09.966617     660 edged.go:903] consumer: [0], worker get removed pod [kube-proxy-gbdgw]
I0527 15:53:09.966629     660 edged.go:975] start to consume removed pod [kube-proxy-gbdgw]
I0527 15:53:09.966652     660 edged.go:994] consume removed pod [kube-proxy-gbdgw] successfully
I0527 15:53:10.073450     660 reconciler.go:183] operationExecutor.UnmountVolume started for volume "kube-proxy-token-qzpzl" (UniqueName: "kubernetes.io/secret/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy-token-qzpzl") pod "25e6f0ea-6364-4bcc-9937-9760b6ec956a" (UID: "25e6f0ea-6364-4bcc-9937-9760b6ec956a") 
I0527 15:53:10.073537     660 reconciler.go:183] operationExecutor.UnmountVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/25e6f0ea-6364-4bcc-9937-9760b6ec956a-xtables-lock") pod "25e6f0ea-6364-4bcc-9937-9760b6ec956a" (UID: "25e6f0ea-6364-4bcc-9937-9760b6ec956a") 
I0527 15:53:10.073613     660 reconciler.go:183] operationExecutor.UnmountVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/25e6f0ea-6364-4bcc-9937-9760b6ec956a-lib-modules") pod "25e6f0ea-6364-4bcc-9937-9760b6ec956a" (UID: "25e6f0ea-6364-4bcc-9937-9760b6ec956a") 
I0527 15:53:10.073887     660 operation_generator.go:713] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/25e6f0ea-6364-4bcc-9937-9760b6ec956a-lib-modules" (OuterVolumeSpecName: "lib-modules") pod "25e6f0ea-6364-4bcc-9937-9760b6ec956a" (UID: "25e6f0ea-6364-4bcc-9937-9760b6ec956a"). InnerVolumeSpecName "lib-modules". PluginName "kubernetes.io/host-path", VolumeGidValue ""
I0527 15:53:10.074642     660 operation_generator.go:713] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/25e6f0ea-6364-4bcc-9937-9760b6ec956a-xtables-lock" (OuterVolumeSpecName: "xtables-lock") pod "25e6f0ea-6364-4bcc-9937-9760b6ec956a" (UID: "25e6f0ea-6364-4bcc-9937-9760b6ec956a"). InnerVolumeSpecName "xtables-lock". PluginName "kubernetes.io/host-path", VolumeGidValue ""
I0527 15:53:10.102565     660 operation_generator.go:713] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy-token-qzpzl" (OuterVolumeSpecName: "kube-proxy-token-qzpzl") pod "25e6f0ea-6364-4bcc-9937-9760b6ec956a" (UID: "25e6f0ea-6364-4bcc-9937-9760b6ec956a"). InnerVolumeSpecName "kube-proxy-token-qzpzl". PluginName "kubernetes.io/secret", VolumeGidValue ""
E0521 14:21:19.328467   19341 edged.go:939] Unable to mount volumes for pod "kube-proxy-gbdgw_kube-system(25e6f0ea-6364-4bcc-9937-9760b6ec956a)": unmounted volumes=[kube-proxy], unattached volumes=[xtables-lock lib-modules kube-proxy-token-qzpzl kube-proxy]: timed out waiting for the condition; skipping pod
I0521 14:21:19.328548   19341 edged.go:858] worker [4] get pod addition item [kube-proxy-gbdgw]
E0521 14:21:19.328570   19341 edged.go:861] consume pod addition backoff: Back-off consume pod [kube-proxy-gbdgw] addition  error, backoff: [5m0s]
I0521 14:21:19.328616   19341 edged.go:863] worker [4] backoff pod addition item [kube-proxy-gbdgw] failed, re-add to queue
E0521 14:21:19.328639   19341 edged.go:877] worker [4] handle pod addition item [kube-proxy-gbdgw] failed: unmounted volumes=[kube-proxy], unattached volumes=[xtables-lock lib-modules kube-proxy-token-qzpzl kube-proxy]: timed out waiting for the condition, re-add to queue
I0521 14:21:19.808038   19341 edged_status.go:186] Sync VolumesInUse: []

zzxgzgz (Contributor) commented May 27, 2020

What happened:
I encountered the same issue while trying to run Mizar with KubeEdge.

Part of EdgeCore's log:

E0527 04:21:51.734263   29955 configmap.go:249] Error creating atomic writer: stat /var/lib/edged/pods/b9c5a329-d276-4a70-a168-c413774aa937/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory
W0527 04:21:51.734313   29955 empty_dir.go:392] Warning: Unmount skipped because path does not exist: /var/lib/edged/pods/b9c5a329-d276-4a70-a168-c413774aa937/volumes/kubernetes.io~configmap/kube-proxy
I0527 04:21:51.734341   29955 record.go:24] Warning FailedMount MountVolume.SetUp failed for volume "kube-proxy" : stat /var/lib/edged/pods/b9c5a329-d276-4a70-a168-c413774aa937/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory

Images running on the Edge Node:

root@ip-172-31-12-211:/var/lib/edged/pods/97e36ff1-7a5e-4cc0-8158-4aba1ead53f0/volumes# docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS               NAMES
b88dff023d67        fwnetworking/testpod   "/bin/sh -c /var/miz…"   9 hours ago         Up 9 hours                              k8s_pod0_pod0_default_97e36ff1-7a5e-4cc0-8158-4aba1ead53f0_0
40fe0451c55e        kubeedge/pause:3.1     "/pause"                 9 hours ago         Up 9 hours                              k8s_POD_pod0_default_97e36ff1-7a5e-4cc0-8158-4aba1ead53f0_0
c9b6dacc73ed        kubeedge/pause:3.1     "/pause"                 9 hours ago         Up 9 hours                              k8s_POD_mizar-operator-74b778447b-gwsrf_default_d8693fb8-1e49-4ba1-8daa-102692a19755_0
205636032aee        3439b7546f29           "/usr/local/bin/kube…"   11 hours ago        Up 11 hours                             k8s_kube-proxy_kube-proxy-5zbgt_kube-system_6aa19402-09ad-483d-905e-acc24a890b57_0
9248c9716362        kubeedge/pause:3.1     "/pause"                 11 hours ago        Up 11 hours                             k8s_POD_mizar-daemon-ntbkc_default_953935ee-7239-472e-993b-803761337b17_0
3e31578d543c        kubeedge/pause:3.1     "/pause"                 11 hours ago        Up 11 hours                             k8s_POD_kube-proxy-5zbgt_kube-system_6aa19402-09ad-483d-905e-acc24a890b57_0

What you expected to happen:
Mizar to be deployed successfully, which should look like this on a worker node:

140ec7b3e263        fwnetworking/endpointopr   "/bin/sh -c 'kopf ru…"   6 days ago          Up 6 days                               k8s_mizar-operator_mizar-operator-74b778447b-76z6j_default_077dfc71-e542-4b54-9c35-04891eb67356_0
4158514b1a22        fwnetworking/dropletd      "/bin/sh -c mizard"      6 days ago          Up 6 days                               k8s_mizar-daemon_mizar-daemon-6vzvj_default_a3752409-5541-4273-b6d6-67a6599af09c_0
c1a657836b01        k8s.gcr.io/pause:3.2       "/pause"                 6 days ago          Up 6 days                               k8s_POD_mizar-operator-74b778447b-76z6j_default_077dfc71-e542-4b54-9c35-04891eb67356_0
470b7468707d        k8s.gcr.io/pause:3.2       "/pause"                 6 days ago          Up 6 days                               k8s_POD_mizar-daemon-6vzvj_default_a3752409-5541-4273-b6d6-67a6599af09c_0
5c6e219c1d05        0d40868643c6               "/usr/local/bin/kube…"   6 days ago          Up 6 days                               k8s_kube-proxy_kube-proxy-rcxx6_kube-system_717c68e4-f703-4dfb-86b0-2a8d00f6047d_0
22b01d38e723        k8s.gcr.io/pause:3.2       "/pause"                 6 days ago          Up 6 days                               k8s_POD_kube-proxy-rcxx6_kube-system_717c68e4-f703-4dfb-86b0-2a8d00f6047d_0

How to reproduce it (as minimally and precisely as possible):

  1. Create a K8s cluster (without any of the CNIs listed here, since Mizar itself should act as the CNI once deployed).
  2. Run cloudcore on the master node and edgecore on the edge node.
  3. Run these commands in the /mizar folder to deploy it:
 ./install/create_service_account.sh
 ./install/create_crds.sh
 kubectl apply -f ./etc/deploy/daemon.deploy.yaml
 kubectl apply -f ./etc/deploy/operator.deploy.yaml

Anything else we need to know?
Below are some pre-steps to run Mizar successfully:

  1. Clone the Mizar repo from Here.
  2. Run all commands in https://github.com/futurewei-cloud/mizar/blob/dev-next/k8s/kind/Dockerfile on all worker nodes manually inside the mizar folder.
  3. In your worker nodes' mizar folder, find mizar/mizar/daemon/app.py and delete line 40:
    nsenter -t 1 -m -u -n -i rm /etc/cni/net.d/10-kindnet.conflist &&\
  4. Open ports 111 and 622 for RPC communication, e.g. with 'sudo ufw allow 111/udp', and open the UDP ports as well (see the sketch after this list).
  5. Make sure the security groups also allow ports 111 and 622.
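A possible sketch of steps 4 and 5, assuming ufw is the firewall in use; whether the TCP variants are also required is an assumption to verify against the Mizar documentation:

sudo ufw allow 111/udp     # RPC port from step 4
sudo ufw allow 111/tcp     # assumption: TCP may also be needed
sudo ufw allow 622/udp
sudo ufw allow 622/tcp
# and mirror ports 111 and 622 in the cloud provider's security group rules (step 5)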

Environment:

  • CloudCore: KubeEdge v1.2.0-beta.0.231+1c8bfa95ced0d3-dirty
  • EdgeCore: KubeEdge v1.2.2-1+343d12f867c86d-dirty

GsssC (Member Author) commented May 27, 2020

@zzxgzgz I run kube-proxy on purpose. Do you mean you did not intend to run it, but it still appears on the edge node?

zzxgzgz (Contributor) commented May 27, 2020

@GsssC That's right. I did not deploy kube-proxy; maybe Mizar did. But according to the Mizar team, there is no proxy in the Mizar program.

GsssC (Member Author) commented May 27, 2020

@zzxgzgz Is there any kube-proxy pod deployed to the edge node, as shown by 'kubectl get pod -nkube-system -owide'?

zzxgzgz (Contributor) commented May 27, 2020

@GsssC It looks like there is one for the cloud node and one for the edge node:

root@ip-172-31-10-89:/home/ubuntu/keadm# kubectl get pod -nkube-system -owide
NAME                                      READY   STATUS    RESTARTS   AGE   IP              NODE               NOMINATED NODE   READINESS GATES
coredns-66bff467f8-h8bbf                  1/1     Running   0          21h   10.244.0.2      ip-172-31-10-89    <none>           <none>
coredns-66bff467f8-w72zd                  1/1     Running   0          21h   10.244.0.3      ip-172-31-10-89    <none>           <none>
etcd-ip-172-31-10-89                      1/1     Running   0          21h   172.31.10.89    ip-172-31-10-89    <none>           <none>
kube-apiserver-ip-172-31-10-89            1/1     Running   0          21h   172.31.10.89    ip-172-31-10-89    <none>           <none>
kube-controller-manager-ip-172-31-10-89   1/1     Running   0          21h   172.31.10.89    ip-172-31-10-89    <none>           <none>
kube-proxy-5zbgt                          1/1     Running   0          20h   172.31.12.211   ip-172-31-12-211   <none>           <none>
kube-proxy-rfx56                          1/1     Running   0          21h   172.31.10.89    ip-172-31-10-89    <none>           <none>
kube-scheduler-ip-172-31-10-89            1/1     Running   0          21h   172.31.10.89    ip-172-31-10-89    <none>           <none>

GsssC (Member Author) commented May 27, 2020

@zzxgzgz Yes, because kube-proxy runs as a DaemonSet; you can see it with 'kubectl get all -o wide -nkube-system'. That is why kube-proxy is deployed to the edge node automatically.

But it is not related to this issue.

zzxgzgz (Contributor) commented May 27, 2020

@GsssC Thank you for explaining, that makes more sense now. Does that mean it is normal for kube-proxy to run on the edge node? Is this the reason the other containers cannot mount their volumes successfully?

Just FYI: when I deployed a test pod on the edge node it succeeded, but the Mizar-related pods are still failing.

GsssC (Member Author) commented May 28, 2020

@zzxgzgz
Does that mean it is normal for kube-proxy to run on the edge node?

No. In an edge scenario, kube-proxy should not run on the edge node; you can modify the kube-proxy DaemonSet YAML to keep it off the edge nodes (a sketch follows).
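A possible way to do that, sketched as a kubectl patch; the label key node-role.kubernetes.io/edge is an assumption about how edge nodes are labeled in your cluster, so verify it with 'kubectl get nodes --show-labels' first:

kubectl -n kube-system patch daemonset kube-proxy --patch '
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/edge   # assumed edge-node label
                operator: DoesNotExist
'

This keeps kube-proxy pods from being scheduled onto nodes carrying the edge label while leaving the cloud-side nodes untouched.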

As for Mizar, it sounds like a CNI plugin. Does it need to connect to the API server via a Service ClusterIP, like Calico does? If so, it may need kube-proxy to be running correctly.

Is this the reason the other containers cannot mount their volumes successfully?

No. I am still investigating why the mount fails.

Just FYI: when I deployed a test pod on the edge node it succeeded, but the Mizar-related pods are still failing.

There are many reasons a pod may not run well on an edge node. I suggest checking the logs of the Mizar containers.

zzxgzgz (Contributor) commented May 29, 2020

@GsssC Thank you for your reply.

Yes, one of the functionalities of Mizar is to work as a CNI, and it requires kube-proxy. I suspect that's the reason why it doesn't run well with KubeEdge.

By the way, are you on KubeEdge's Slack channel? I'd like to ask you more questions about KubeEdge and it is more convenient to communicate on Slack.

Thank you.

GsssC (Member Author) commented Jun 1, 2020

@zzxgzgz I am on the KubeEdge Slack channel, also under the name GsssC, but I do not use it often.

GsssC (Member Author) commented Jun 4, 2020

Hey guys! I have been tracking this bug for the last few days. For now, I can draw some conclusions:

  • Reason:
    I still do not know what operation causes the bug. If anyone else hits it, please share the details.

  • Phenomena:
    Every pod (not only kube-proxy) that mounts a configmap hits this bug and stays Pending because the configmap mount fails, with a log like:

nestedpendingoperations.go:270] Operation for "\"kubernetes.io/configmap/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy\" (\"25e6f0ea-6364-4bcc-9937-9760b6ec956a\")" failed. No retries permitted until 2020-05-27 10:37:31.80112802 +0800 CST m=+2599.727653327 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"kube-proxy\" (UniqueName: \"kubernetes.io/configmap/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy\") pod \"kube-proxy-gbdgw\" (UID: \"25e6f0ea-6364-4bcc-9937-9760b6ec956a\") : stat /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory"
  • Temporary solution:
    Based on reading the code, we can delete the "ready" file under /var/lib/edged/pods/<pod_uid>/plugins to trigger re-creation of the directory. After waiting a few seconds, everything works again for me (see the sketch below; screenshot omitted).
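A minimal sketch of this workaround, assuming the layout is /var/lib/edged/pods/<pod_uid>/plugins/kubernetes.io~configmap/<volume-name>/ready; check the actual paths on your node before deleting anything:

POD_UID=25e6f0ea-6364-4bcc-9937-9760b6ec956a   # UID of the affected pod, taken from the logs above
ls /var/lib/edged/pods/${POD_UID}/plugins/kubernetes.io~configmap/
# remove the "ready" marker so edged recreates the configmap volume directory
rm /var/lib/edged/pods/${POD_UID}/plugins/kubernetes.io~configmap/*/ready
# after a few seconds, the volume directory should reappear
ls /var/lib/edged/pods/${POD_UID}/volumes/kubernetes.io~configmap/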

GsssC (Member Author) commented Jun 4, 2020

/cc @fisherxu @kevin-wangzefeng

zzxgzgz (Contributor) commented Jun 4, 2020

Hey guys! I have been tracking this bug for the last few days. For now, I can draw some conclusions:

  • Reason:
    I still do not know what operation causes the bug. If anyone else hits it, please share the details.
  • Phenomena:
    Every pod (not only kube-proxy) that mounts a configmap hits this bug and stays Pending because the configmap mount fails, with a log like:
nestedpendingoperations.go:270] Operation for "\"kubernetes.io/configmap/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy\" (\"25e6f0ea-6364-4bcc-9937-9760b6ec956a\")" failed. No retries permitted until 2020-05-27 10:37:31.80112802 +0800 CST m=+2599.727653327 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"kube-proxy\" (UniqueName: \"kubernetes.io/configmap/25e6f0ea-6364-4bcc-9937-9760b6ec956a-kube-proxy\") pod \"kube-proxy-gbdgw\" (UID: \"25e6f0ea-6364-4bcc-9937-9760b6ec956a\") : stat /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/kubernetes.io~configmap/kube-proxy: no such file or directory"
  • Temporary solution:
    Based on reading the code, we can delete the "ready" file under /var/lib/edged/pods/<pod_uid>/plugins to trigger re-creation of the directory. After restarting edgecore, everything works again for me (screenshot omitted).

Thank you for your effort!

One question for the temporary solution:
Should we delete the ready files for the failing pods, or for kube-proxy? Also, should we delete only the 'ready' files in those folders, or the folders as well?

Thank you again.

GsssC (Member Author) commented Jun 4, 2020

Hey guys! I found a way to 100% reproduce the bug.

  1. Start from the normal state (screenshot omitted).

  2. Stop edgecore: systemctl stop edgecore

  3. Remove all containers: docker rm -f `docker ps -aq` (screenshot omitted)

  4. Restart edgecore (screenshot omitted).

We can see that edgecore deletes the secret and configmap directories when it restarts. The secret directory is recreated by edgecore after a few seconds, but the configmap directory is not recreated automatically. The bug may show up in kubelet as well; I will do more testing later. A condensed reproduction script follows.
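The same reproduction condensed into a sketch; it removes every container on the edge node, so only run it on a disposable test node (the pod UID is just the example from the logs above):

systemctl stop edgecore
docker rm -f $(docker ps -aq)
systemctl start edgecore
# watch whether the secret/configmap volume directories come back
ls /var/lib/edged/pods/25e6f0ea-6364-4bcc-9937-9760b6ec956a/volumes/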

GsssC (Member Author) commented Jun 4, 2020

@zzxgzgz Just delete the configmap "ready" file for the specific pod. Deleting the whole plugins directory is also a convenient way, because the secret reconcile has no problem.

zzxgzgz (Contributor) commented Jun 4, 2020

@GsssC I tried your solution by deleting the whole plugins folder. Now the kube-proxy container is giving me a different error:

E0604 16:28:25.286180       1 event.go:214] Unable to write event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ip-172-31-12-211.161564027a866f3d", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-172-31-12-211", UID:"ip-172-31-12-211", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kube-proxy.", Source:v1.EventSource{Component:"kube-proxy", Host:"ip-172-31-12-211"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbfae66114928fd3d, ext:36109497681, loc:(*time.Location)(0x28a6880)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbfae66114928fd3d, ext:36109497681, loc:(*time.Location)(0x28a6880)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}' (retry limit exceeded!)
E0604 16:28:38.254086       1 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: Get https://172.31.10.89:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
E0604 16:28:51.375479       1 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Endpoints: Get https://172.31.10.89:6443/api/v1/endpoints?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

This is what the container folder looks like currently (screenshot omitted).

Do you know what the reason for this could be, or how to fix it?

Thank you.

GsssC (Member Author) commented Jun 4, 2020

@zzxgzgz Have you reset the Kubernetes cluster at some point? It looks like your kube-proxy's kubeconfig does not match the current cluster (API server).

zzxgzgz (Contributor) commented Jun 4, 2020

@GsssC I created a new cluster (and then ran cloudcore & edgecore) before trying your solution. Does that count?

I also looked into cloudcore's log; it shows a strange error as well:

I0604 16:59:43.753441    3505 messagehandler.go:217] event received for node ip-172-31-12-211 id: 784e4d5e-648a-48eb-abd4-28ddaec04a7b, parent_id: 0aa8b807-fde8-4f29-bef4-5cb3a8ccdedb, group: resource, source: twin, resource: node/ip-172-31-12-211/membership/detail, operation: get, content: {"event_type":"group_membership_event","event_id":"123","group_id":"ip-172-31-12-211","operation":"detail","timestamp":1591287463751}
I0604 16:59:43.753465    3505 upstream.go:86] Dispatch message: 784e4d5e-648a-48eb-abd4-28ddaec04a7b
W0604 16:59:43.753487    3505 upstream.go:90] Parse message: 784e4d5e-648a-48eb-abd4-28ddaec04a7b resource type with error: unknown resource

I looked into the code, and this error is triggered because the resource is "node/ip-172-31-12-211/membership/detail". In this function:
https://github.com/kubeedge/kubeedge/blob/master/cloud/pkg/devicecontroller/controller/upstream.go#L88
passing in "node/ip-172-31-12-211/membership/detail" will always return an error, because the code simply checks whether the string contains deviceconstants.ResourceTypeTwinEdgeUpdated ("twin/edge_updated").

Do you think it might be related to this error?

GsssC (Member Author) commented Jun 4, 2020

@zzxgzgz No, I think it is not related to your problem.

I suggest checking the sync CRDs to confirm that the new cluster's kube-proxy configmap has been sent to the edge; otherwise edgecore will keep using the last kube-proxy configmap stored in the edge database:

kubectl get crds
and then kubectl get the kubeedge.io-related CRD.

GsssC (Member Author) commented Jun 4, 2020

@zzxgzgz If so, you need to delete the configmap record in /var/lib/kubeedge/edgecore.db to trigger edgecore to query for the new configmap (a sketch follows).
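A hedged sketch of that step; the table name "meta" and the key format are assumptions about the edgecore metamanager schema, so confirm them with .schema first and stop edgecore while editing the database:

systemctl stop edgecore
sqlite3 /var/lib/kubeedge/edgecore.db '.schema'   # confirm the actual table layout
sqlite3 /var/lib/kubeedge/edgecore.db \
  "DELETE FROM meta WHERE key = 'kube-system/configmap/kube-proxy';"   # assumed table and key format
systemctl start edgecore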

GsssC closed this as completed Jun 4, 2020
GsssC reopened this Jun 4, 2020
zzxgzgz (Contributor) commented Jun 4, 2020

@GsssC I am able to get the crds:

root@ip-172-31-10-89:/home/go/src/github.com/mizar# kubectl  get crds
NAME                                           CREATED AT
bouncers.mizar.com                             2020-06-04T16:19:36Z
clusterobjectsyncs.reliablesyncs.kubeedge.io   2020-06-04T16:11:48Z
devicemodels.devices.kubeedge.io               2020-06-04T16:11:48Z
devices.devices.kubeedge.io                    2020-06-04T16:11:48Z
dividers.mizar.com                             2020-06-04T16:19:36Z
droplets.mizar.com                             2020-06-04T16:19:36Z
endpoints.mizar.com                            2020-06-04T16:19:36Z
nets.mizar.com                                 2020-06-04T16:19:36Z
objectsyncs.reliablesyncs.kubeedge.io          2020-06-04T16:11:49Z
vpcs.mizar.com                                 2020-06-04T16:19:37Z

However, I'm not sure how to check whether the configmap has been sent to the edge.

Also, should I just delete edgecore.db and the CRD records? Should I also restart edgecore? How do I delete the CRD records? Is it by running:

kubectl delete crds

?

Thank you.

GsssC (Member Author) commented Jun 4, 2020

@zzxgzgz try
kubectl get objectsyncs.reliablesyncs.kubeedge.io

zzxgzgz (Contributor) commented Jun 4, 2020

Yes, I got something:

root@ip-172-31-10-89:/home/go/src/github.com/mizar# kubectl get objectsyncs.reliablesyncs.kubeedge.io
NAME                                                    AGE
ip-172-31-12-211.0c1faeb7-9a69-454a-92db-19ca3a33e106   80m
ip-172-31-12-211.6ebf7a85-81f1-4570-8411-b484e33dccd9   80m
ip-172-31-12-211.e3343f31-8284-447e-81af-0b58d8ed28dc   80m

Shall we continue this conversation on Slack? My username on Slack is Rio Zhu and I sent you a few messages before.

GsssC (Member Author) commented Jun 4, 2020

@zzxgzgz try
kubectl get objectsyncs.reliablesyncs.kubeedge.io -o yaml

If there is no entry for the kube-proxy configmap, that means it was never sent to the edge (see the one-liner below).
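A quick way to do that check, assuming the ObjectSync objects list the synced resources under spec.objectKind/spec.objectName (as in the output shown later in this thread):

kubectl get objectsyncs.reliablesyncs.kubeedge.io -o yaml | grep -E 'objectKind|objectName'
# if no configmap entry for kube-proxy appears, it was never synced to the edge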

GsssC (Member Author) commented Jun 4, 2020

@zzxgzgz Sorry, I am not used to using Slack. Also, I think keeping our conversation on GitHub can help more people fix this bug.

zzxgzgz (Contributor) commented Jun 4, 2020

You're right, I don't see anything related to the configmap:

apiVersion: v1
items:
- apiVersion: reliablesyncs.kubeedge.io/v1alpha1
  kind: ObjectSync
  metadata:
    creationTimestamp: "2020-06-04T16:19:43Z"
    generation: 1
    managedFields:
    - apiVersion: reliablesyncs.kubeedge.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:objectKind: {}
          f:objectName: {}
        f:status:
          .: {}
          f:objectResourceVersion: {}
      manager: cloudcore
      operation: Update
      time: "2020-06-04T16:59:53Z"
    name: ip-172-31-12-211.0c1faeb7-9a69-454a-92db-19ca3a33e106
    namespace: default
    resourceVersion: "9957"
    selfLink: /apis/reliablesyncs.kubeedge.io/v1alpha1/namespaces/default/objectsyncs/ip-172-31-12-211.0c1faeb7-9a69-454a-92db-19ca3a33e106
    uid: 6aa3bfe0-e751-4764-9a0e-e0537bd056fa
  spec:
    objectKind: pod
    objectName: mizar-daemon-hmsj9
  status:
    objectResourceVersion: "9956"
- apiVersion: reliablesyncs.kubeedge.io/v1alpha1
  kind: ObjectSync
  metadata:
    creationTimestamp: "2020-06-04T16:19:44Z"
    generation: 1
    managedFields:
    - apiVersion: reliablesyncs.kubeedge.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:objectKind: {}
          f:objectName: {}
        f:status:
          .: {}
          f:objectResourceVersion: {}
      manager: cloudcore
      operation: Update
      time: "2020-06-04T16:19:44Z"
    name: ip-172-31-12-211.6ebf7a85-81f1-4570-8411-b484e33dccd9
    namespace: default
    resourceVersion: "1943"
    selfLink: /apis/reliablesyncs.kubeedge.io/v1alpha1/namespaces/default/objectsyncs/ip-172-31-12-211.6ebf7a85-81f1-4570-8411-b484e33dccd9
    uid: 12b7a1f7-9abe-4a01-b2e6-735e6432cac0
  spec:
    objectKind: secret
    objectName: mizar-operator-token-q2rf2
  status:
    objectResourceVersion: "1872"
- apiVersion: reliablesyncs.kubeedge.io/v1alpha1
  kind: ObjectSync
  metadata:
    creationTimestamp: "2020-06-04T16:19:49Z"
    generation: 1
    managedFields:
    - apiVersion: reliablesyncs.kubeedge.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:objectKind: {}
          f:objectName: {}
        f:status:
          .: {}
          f:objectResourceVersion: {}
      manager: cloudcore
      operation: Update
      time: "2020-06-04T16:57:53Z"
    name: ip-172-31-12-211.e3343f31-8284-447e-81af-0b58d8ed28dc
    namespace: default
    resourceVersion: "9560"
    selfLink: /apis/reliablesyncs.kubeedge.io/v1alpha1/namespaces/default/objectsyncs/ip-172-31-12-211.e3343f31-8284-447e-81af-0b58d8ed28dc
    uid: 0b4dab0c-89bd-4c5d-a4ba-203939f7c655
  spec:
    objectKind: pod
    objectName: mizar-operator-74b778447b-l5s9f
  status:
    objectResourceVersion: "9556"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Should each pod shown in this YAML (mizar-operator/daemon in this case) include a configMap in its own YAML file?
