'token credentials are in wrong format' when edgecore tries to obtain certificate from cloudcore #2362

Closed
didier-durand opened this issue Nov 23, 2020 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@didier-durand
Contributor

What happened

Right after a fresh installation on 2 Linux machines (one for cloud, one for edge), when edgecore connects to cloudcore on port 10002 to obtain the security certificate (via an HTTP GET for ca.crt), it detects a 'token credentials are in the wrong format' error (see log below) in the received data. This happens even though both sides were installed at exactly the same version (v1.4.0) with keadm.

What you expected to happen

I would expect edgecore to connect to cloudcore, obtain the security certificate, establish the operational connection between the 2 sides, and start managing objects.

How to reproduce it (as minimally and precisely as possible)

Create a standard install on two machines with keadm v1.4.0 and connect the edge to the cloud: the issue happens at the initial connection between the 2 sides.

log excerpt on edgecore machine during issue

The log lines below keep repeating in an infinite loop (along with some others) because edgecore cannot connect to cloudcore.


 I1123 06:53:29.156915    7550 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/run/dockershim.sock  <nil> 0 <nil>}] <nil> <nil>}
 I1123 06:53:29.156931    7550 clientconn.go:948] ClientConn switching balancer to "pick_first"
 I1123 06:53:29.156982    7550 remote_image.go:50] parsed scheme: ""
 I1123 06:53:29.156994    7550 remote_image.go:50] scheme "" not registered, fallback to default scheme
 I1123 06:53:29.157009    7550 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/run/dockershim.sock  <nil> 0 <nil>}] <nil> <nil>}
 I1123 06:53:29.157017    7550 clientconn.go:948] ClientConn switching balancer to "pick_first"
 W1123 06:53:29.232134    7550 nvidia.go:61] NVIDIA GPU metrics will not be available: no NVIDIA devices found
 I1123 06:53:29.271146    7550 kuberuntime_manager.go:214] Container runtime docker initialized, version: 19.03.11, apiVersion: 1.40.0
 I1123 06:53:29.287348    7550 container_manager_linux.go:276] container manager verified user specified cgroup-root exists: []
 I1123 06:53:29.287384    7550 container_manager_linux.go:281] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName:>
 I1123 06:53:29.287517    7550 topology_manager.go:126] [topologymanager] Creating topology manager with none policy
 I1123 06:53:29.287540    7550 container_manager_linux.go:311] [topologymanager] Initializing Topology Manager with none policy
 I1123 06:53:29.287550    7550 container_manager_linux.go:316] Creating device plugin manager: false
 table `device` already exists, skip
 table `device_attr` already exists, skip
 table `device_twin` already exists, skip
 table `meta` already exists, skip
 I1123 06:53:29.290591    7550 log.go:181] DEBUG: Installed strategy plugin: [RoundRobin].
 I1123 06:53:29.290785    7550 log.go:181] DEBUG: ConfigurationFactory Initiated
 I1123 06:53:29.291001    7550 log.go:181] INFO: Configuration files: []
 I1123 06:53:29.291249    7550 log.go:181] WARN: empty configurtion from [FileSource]
 I1123 06:53:29.291460    7550 log.go:181] INFO: invoke dynamic handler:FileSource
 I1123 06:53:29.291678    7550 log.go:181] INFO: archaius init success
 I1123 06:53:29.293670    7550 log.go:181] INFO: create new watcher
 E1123 06:53:29.294410    7550 csi_plugin.go:226] kubernetes.io/csi: CSIDriverLister not found on KubeletVolumeHost
 I1123 06:53:29.294774    7550 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
 F1123 06:53:29.319248    7550 certmanager.go:91] Error: token credentials are in the wrong format
 goroutine 105 [running]:
 k8s.io/klog.stacks(0xc000963600, 0xc0009240d0, 0x62, 0xd0)
         /root/codes/src/github.com/kubeedge/kubeedge/vendor/k8s.io/klog/klog.go:883 +0xb9
 k8s.io/klog.(*loggingT).output(0x4b36bc0, 0xc000000003, 0xc0004f0070, 0x482b13a, 0xe, 0x5b, 0x0)
         /root/codes/src/github.com/kubeedge/kubeedge/vendor/k8s.io/klog/klog.go:834 +0x35f
 k8s.io/klog.(*loggingT).printf(0x4b36bc0, 0x3, 0x346c1b6, 0x9, 0xc0008f3e10, 0x1, 0x1)
         /root/codes/src/github.com/kubeedge/kubeedge/vendor/k8s.io/klog/klog.go:715 +0x153
 k8s.io/klog.Fatalf(...)
         /root/codes/src/github.com/kubeedge/kubeedge/vendor/k8s.io/klog/klog.go:1284
 github.com/kubeedge/kubeedge/edge/pkg/edgehub/certificate.(*CertManager).Start(0xc000975520)
         /root/codes/src/github.com/kubeedge/kubeedge/edge/pkg/edgehub/certificate/certmanager.go:91 +0xd5
 github.com/kubeedge/kubeedge/edge/pkg/edgehub.(*EdgeHub).Start(0xc000975520)
         /root/codes/src/github.com/kubeedge/kubeedge/edge/pkg/edgehub/edgehub.go:68 +0x425
 created by github.com/kubeedge/beehive/pkg/core.StartModules
         /root/codes/src/github.com/kubeedge/kubeedge/vendor/github.com/kubeedge/beehive/pkg/core/core.go:23 +0x15a
 edgecore.service: Main process exited, code=exited, status=255/EXCEPTION
 edgecore.service: Failed with result 'exit-code'.
 edgecore.service: Scheduled restart job, restart counter is at 12.
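
To pick the fatal entry out of a restart loop like the one above, it can help to filter the journal for klog lines whose severity letter is `F`. This is a minimal sketch: the function name `fatal_lines` is hypothetical, and the `journalctl` invocation shown in the comment assumes a systemd host with the unit named `edgecore.service`, as in the log.

```shell
# fatal_lines: read log lines on stdin and print only klog fatal entries.
# klog starts each entry with a severity letter (I, W, E, F) followed by MMDD.
fatal_lines() {
    grep -E '^[[:space:]]*F[0-9]{4} '
}

# Possible use on the edge machine (assumes systemd and the edgecore.service unit):
#   journalctl -u edgecore.service --no-pager -o cat | fatal_lines
```

Run against the excerpt above, this should leave only the `certmanager.go:91` line, which is the entry that makes edgecore exit and restart.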

Environment

keadm identical on the 2 machines:

keadm version

version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"dfcdab01d4538ebefc2284a1b82a407d649e8f94", GitTreeState:"clean", BuildDate:"2020-08-14T09:27:27Z", GoVersion:"go1.14", Compiler:"gc", Platform:"linux/amd64"}

Microk8s version on cloudcore machine: kubernetes version 1.19.4 at snap level 1810

snap list
Name              Version    Rev    Tracking         Publisher          Notes
core              16-2.47.1  10185  latest/stable    canonical✓         core
core18            20200929   1932   latest/stable    canonical✓         base
google-cloud-sdk  318.0.0    159    latest/stable/…  google-cloud-sdk✓  classic
lxd               4.0.4      18150  4.0/stable/…     canonical✓         -
microk8s          v1.19.4    1810   1.19/edge        canonical✓         classic
snapd             2.47.1     9721   latest/stable    canonical✓         snapd

Linux (identical on the 2 machines):

uname -a

Linux microk8s-ke-cloud 5.4.0-1029-gcp #31-Ubuntu SMP Wed Oct 21 19:38:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.1 LTS
Release:	20.04
Codename:	focal

@didier-durand didier-durand added the kind/bug Categorizes issue or PR as related to a bug. label Nov 23, 2020
@daixiang0
Member

I notice that you ran your test in a dirty environment. Please run keadm reset first, then repeat the test. Invalid-certificate issues are most often related to a dirty environment.

@didier-durand
Contributor Author

@daixiang0: Thanks for your quick answer! I'll do that asap. Should I do it on both cloudcore and edgecore, or just one? If just one, which one?

@daixiang0
Member

Both.

@didier-durand
Contributor Author

OK. I'll do that and report back.

@didier-durand
Contributor Author

@daixiang0: keadm reset seems to stop the component. How do I restart it? Via init for the cloud and join for the edge, or some other way? Thanks! Didier

sudo keadm reset

KubeEdge cloudcore is stopped, For logs visit:  /var/log/kubeedge/cloudcore.log

@daixiang0
Member

keadm reset works like kubeadm reset: it also uninstalls the component, so you need to run init and join again.

@didier-durand
Contributor Author

didier-durand commented Nov 23, 2020

Ok. But, I am a bit confused now : the issue that I described initially happens just after a fresh (fully automated) install of cloudcore and edgecore on the 2 distinct machines. They did not communicate with each other (nor with any other machine) yet: just keadm init on 1 side and keadm join on the other side. And then, the edgecore infinite loop described above starts.

So, I do not really see how my environment can get dirty since it is fully fresh.

So, if I uninstall, reinstall, and do init & join, I will just be in the same state as described initially, and the loop will start again.

Any other suggestion to fix my issue?

PS: I am happy to provide you with all the data needed to diagnose and fix.

Didier

@didier-durand
Contributor Author

didier-durand commented Nov 23, 2020

@daixiang0: To make things more efficient for you, I created this repo https://github.com/didier-durand/microk8s-kubeedge

The README will link you to 1 full execution log where edgecore doesn't get the proper format for the security token. My shell script doing the setup is also in this repo so that you can check how we install and find if our install is incorrect or if the issue is on KubeEdge side.

Look for the line saying 'kubeedge edgecore log (after 120s)' in the raw execution log. After it, you have a full export of the journal log for edgecore.service. You will see how the issue is reached, as well as the loop on it.

Thanks!
Didier

@daixiang0
Member

table device already exists, skip

This line shows that the environment is not totally fresh.

In this case, you need to set the token: get it by running keadm gettoken on the cloud side.
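
Before retrying the join, a quick sanity check on the edge side is to confirm that the token is non-empty and has a plausible shape. The sketch below assumes the token consists of four non-empty, dot-separated fields (a CA hash followed by a three-part JWT). That layout is an assumption about how KubeEdge tokens look, not something stated in this thread, and the function name `token_looks_valid` is hypothetical.

```shell
# token_looks_valid TOKEN: succeed if TOKEN has exactly four non-empty,
# dot-separated fields.
# ASSUMPTION: this is the layout edgecore expects; verify against your version.
token_looks_valid() {
    case "$1" in
        ''|.*|*.|*..*) return 1 ;;  # empty token, or an empty field
        *.*.*.*.*)     return 1 ;;  # more than four fields
        *.*.*.*)       return 0 ;;  # exactly four fields
        *)             return 1 ;;  # fewer than four fields
    esac
}
```

An empty token, which is what a join without a token would presumably leave behind, fails this check immediately.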

@didier-durand
Contributor Author

didier-durand commented Nov 24, 2020

@daixiang0:

  • The environment is fresh: the full raw log shows that it's installed from scratch on a totally new GCE instance. The table device is created on the first iteration of the loop triggered by my problem. If you look at iterations > 1, the log says what you quote.
  • Anyway, my problem was the absence of the token generation that you pointed out.

With keadm gettoken, I get the connection between edge and cloud that I needed. Thanks a lot for your support!
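
For readers hitting the same error, the fix amounts to fetching the token on the cloud side and passing it to keadm join on the edge side. The following is a minimal sketch, assuming the `--cloudcore-ipport` and `--token` flags of `keadm join` and cloudcore's default port 10000. Neither is stated in this thread, so check `keadm join --help` for your version. The helper name `join_command` and the address `192.0.2.10` are placeholders.

```shell
# join_command CLOUD_IP TOKEN: print the keadm join command for the edge node.
# ASSUMPTION: --cloudcore-ipport and --token are the relevant keadm join flags,
# and cloudcore accepts edge connections on port 10000.
join_command() {
    printf 'keadm join --cloudcore-ipport=%s:10000 --token=%s\n' "$1" "$2"
}

# On the cloud machine, print the token:
#   sudo keadm gettoken
# On the edge machine, review and then run the command printed by, for example:
#   join_command 192.0.2.10 "<token from keadm gettoken>"
```

Printing the command rather than executing it keeps the sketch safe to try, and makes it easy to see whether the token was actually passed.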

Closing this ticket.

Didier
