
1.16.1 join err #5529

Closed
thinkeng opened this issue Apr 10, 2024 · 14 comments
Labels
kind/question Indicates an issue that is a support question.


@thinkeng

thinkeng commented Apr 10, 2024

Cloud (k8s cluster): k8s v1.27.2 + docker v24.0.7 + cri-docker v0.3.10

Edge: kubeedge v1.16.1 + docker v26.0.0 + cri-docker v0.3.12

systemctl status cri-docker.socket
● cri-docker.socket - CRI Docker Socket for the API
     Loaded: loaded (/etc/systemd/system/cri-docker.socket; enabled; vendor preset: enabled)
     Active: active (listening) since Wed 2024-04-10 19:38:20 CST; 12min ago
   Triggers: ● cri-docker.service
     Listen: /run/cri-dockerd.sock (Stream)
      Tasks: 0 (limit: 18731)
     Memory: 0B
        CPU: 1ms
     CGroup: /system.slice/cri-docker.socket

Apr 10 19:38:20 rootk systemd[1]: Starting CRI Docker Socket for the API...
Apr 10 19:38:20 rootk systemd[1]: Listening on CRI Docker Socket for the API.

Join command:

keadm join \
--cloudcore-ipport=192.168.1.158:10000 \
--with-mqtt=false \
--kubeedge-version=1.16.1 \
--edgenode-name=edge-01 \
--token=xxxxx

Joining fails with the following error:

I0410 19:40:28.749010  826266 command.go:901] 1. Check KubeEdge edgecore process status
I0410 19:40:28.760203  826266 command.go:901] 2. Check if the management directory is clean
I0410 19:40:28.760306  826266 join.go:94] 3. Create the necessary directories
Error: edge node join failed: validate service connection: validate CRI v1 image API for endpoint "unix:///run/containerd/containerd.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService
execute keadm command failed:  edge node join failed: validate service connection: validate CRI v1 image API for endpoint "unix:///run/containerd/containerd.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService
@thinkeng thinkeng added the kind/question Indicates an issue that is a support question. label Apr 10, 2024
@Shelley-BaoYue
Collaborator

Add --remote-runtime-endpoint=unix:///var/run/cri-dockerd.sock to the keadm join command.
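A minimal sketch, assuming a Linux edge node where /var/run points at /run (the socket unit shown earlier listens on /run/cri-dockerd.sock): confirm that the path behind the endpoint is really a socket, then re-run the join command from the issue with the extra flag. The helper names endpoint_to_path and check_cri_endpoint are hypothetical; the keadm flags and values are the ones used in this thread, with the token left elided.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check that the socket behind a CRI endpoint URL exists
# before handing it to `keadm join --remote-runtime-endpoint=...`.

endpoint_to_path() {
  # Drop the unix:// scheme prefix to get the filesystem path.
  printf '%s\n' "${1#unix://}"
}

check_cri_endpoint() {
  local path
  path="$(endpoint_to_path "$1")"
  if [ -S "$path" ]; then
    echo "ok: $path is a socket"
  else
    echo "missing: $path is not a socket (is cri-dockerd running?)" >&2
    return 1
  fi
}

ENDPOINT="unix:///var/run/cri-dockerd.sock"

# Only attempt the join when the cri-dockerd socket is present.
if check_cri_endpoint "$ENDPOINT"; then
  keadm join \
    --cloudcore-ipport=192.168.1.158:10000 \
    --with-mqtt=false \
    --kubeedge-version=1.16.1 \
    --edgenode-name=edge-01 \
    --remote-runtime-endpoint="$ENDPOINT" \
    --token=xxxxx
fi
```

Checking the socket first turns a vague runtime error during join into an explicit "not a socket" message.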

@thinkeng
Author

thinkeng commented Apr 10, 2024

After adding --remote-runtime-endpoint=unix:///var/run/cri-dockerd.sock:

I0410 19:58:32.759022  875478 command.go:901] 1. Check KubeEdge edgecore process status
I0410 19:58:32.772668  875478 command.go:901] 2. Check if the management directory is clean
I0410 19:58:32.772759  875478 join.go:94] 3. Create the necessary directories
I0410 19:58:32.861311  875478 join_others.go:183] 4. Pull Images
Pulling kubeedge/installation-package:v1.16.1 ...
Successfully pulled kubeedge/installation-package:v1.16.1
I0410 19:59:22.256568  875478 join_others.go:183] 5. Copy resources from the image to the management directory
E0410 19:59:42.276078  875478 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Error: edge node join failed: copy resources failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
execute keadm command failed:  edge node join failed: copy resources failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded

@Shelley-BaoYue
Collaborator

Shelley-BaoYue commented Apr 11, 2024

I guess the cri-dockerd service itself needs to be active (running); the socket unit showing active (listening) is not enough.
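A small check along those lines, assuming systemd 230 or newer (for systemctl show --value). The socket unit earlier in this thread reports active (listening); the service should report active (running). unit_running and check_unit are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: distinguish "active (running)" from states such as
# "active (listening)" that a socket unit reports.

unit_running() {
  # $1 = ActiveState, $2 = SubState
  [ "$1" = "active" ] && [ "$2" = "running" ]
}

check_unit() {
  local unit="$1" active sub
  active="$(systemctl show --property=ActiveState --value "$unit" 2>/dev/null)"
  sub="$(systemctl show --property=SubState --value "$unit" 2>/dev/null)"
  if unit_running "$active" "$sub"; then
    echo "$unit: active (running)"
  else
    echo "$unit: ${active:-unknown} (${sub:-unknown})" >&2
    return 1
  fi
}

if ! check_unit cri-docker.service; then
  echo "hint: sudo systemctl enable --now cri-docker.service" >&2
fi
```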

@thinkeng
Author

thinkeng commented Apr 11, 2024

> I guess the status of cri-dockerd need to be active(running)

I added the following to the cri-docker.service configuration:

ExecStart=/usr/local/bin/cri-dockerd --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d   --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.7
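Instead of editing the unit file in place, the same ExecStart can be applied as a systemd drop-in, which leaves the original cri-docker.service untouched. This is a swapped-in technique, not what the author did; the function name render_override and the override.conf filename are assumptions. The empty ExecStart= line matters because systemd refuses more than one ExecStart for a service that is not Type=oneshot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a systemd drop-in that replaces cri-dockerd's
# ExecStart with the flags used in this thread.

render_override() {
  cat <<'EOF'
[Service]
# An empty ExecStart= clears the command inherited from the original unit;
# without it systemd rejects a second ExecStart for a non-oneshot service.
ExecStart=
ExecStart=/usr/local/bin/cri-dockerd --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.7
EOF
}

render_override

# To install it (requires root):
#   sudo mkdir -p /etc/systemd/system/cri-docker.service.d
#   render_override | sudo tee /etc/systemd/system/cri-docker.service.d/override.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart cri-docker.service
```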

journalctl -u edgecore.service -xe

The logs show:

"Starting to sync pod status with apiserver"
44] "Starting kubelet main sync loop"
68] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
status.go:68] "Attempting to register node" node="barry-edge-01"
66] failed to unmarshal message content to unstructured obj: Object 'Kind' is missing in '{"metadata":{"name":"barry-edge-01","creationTimestamp":null,"labels":{"beta.>
8] process remote failed, req[msgID[c1178973-2eaf-4f93-b725-52a9d6c59acb] resource[default/node/barry-edge-01]], err: not connected
el.go:164] Get bad anonName: when sendresp message, do nothing
o:214] "Starting CPU manager" policy="none"
o:215] "Reconciling" reconcilePeriod="10s"
36] "Initialized new in-memory state store"
88] "Updated default CPUSet" cpuSet=""
96] "Updated CPUSet assignments" assignments={}
o:49] "None policy: Start"
r.go:169] "Starting memorymanager" policy="None"
35] "Initializing new in-memory state store"
75] "Updated machine memory state"
1] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
r.go:118] "Starting Kubelet Plugin Manager"
ger.go:262] "Eviction manager: failed to get summary stats" err="failed to get node info: node length from meta db is 0"
:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: node length from meta db is 0
:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: node length from meta db is 0
:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: node length from meta db is 0
:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: node length from meta db is 0
 start connect to mqtt server with client id: hub-client-sub-1712802162
 client hub-client-sub-1712802162 isconnected: false
] connect error: Network Error : dial tcp 127.0.0.1:1883: connect: connection refused
:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: node length from meta db is 0
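The journal above mixes informational and error lines. A minimal sketch for keeping only klog error (E) and fatal (F) lines and stripping their headers, so repeated messages such as the CSINode failure can be counted. klog_errors and klog_message are hypothetical helper names; the usage assumes journalctl's -o cat output mode, which prints only the message field so each line starts with the klog header.

```shell
#!/usr/bin/env bash
# Hypothetical filters for klog-formatted lines such as
# "E0411 19:35:56.965075 1864043 pod_workers.go:1294] message".

klog_errors() {
  # Keep lines whose header is E (error) or F (fatal) followed by MMDD hh:mm:ss.
  grep -E '^[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
}

klog_message() {
  # Drop everything up to and including the "file.go:NNN] " part of the header.
  sed -E 's/^[^]]*\] //'
}

# Demo on two lines taken from this thread.
printf '%s\n' \
  'I0410 19:58:32.759022  875478 command.go:901] 1. Check KubeEdge edgecore process status' \
  'E0410 19:59:42.276078  875478 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"' \
  | klog_errors | klog_message
# prints: "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"

# Usage on an edge node:
#   journalctl -u edgecore.service --no-pager -o cat | klog_errors | klog_message | sort | uniq -c
```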

@Shelley-BaoYue
Collaborator

Make sure the edge node can curl cloudcore, and check whether there are any error logs in cloudcore.log.
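One way to run the connectivity part of that check from the edge node, assuming bash and coreutils timeout are available: try a plain TCP connection to the CloudCore address from the join command (192.168.1.158:10000). port_open is a hypothetical helper; add any other CloudCore ports your deployment uses to the loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port opens within 3s.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

CLOUDCORE_HOST=192.168.1.158   # from --cloudcore-ipport in this thread
for port in 10000; do
  if port_open "$CLOUDCORE_HOST" "$port"; then
    echo "reachable: $CLOUDCORE_HOST:$port"
  else
    echo "unreachable: $CLOUDCORE_HOST:$port (check firewall and cloudcore logs)" >&2
  fi
done
```

A TCP check only proves the port is open; errors above that layer still have to be read from cloudcore.log.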

@thinkeng
Author

> make sure if edge node can curl cloudcore or if there is any err log in cloudcore.log

Does port 9091 need to be exposed externally? What is this port used for?

@thinkeng
Author


It works now, thanks.

@thinkeng
Author

One more question: does port 9091 need to be exposed externally? What is this port used for?

@Shelley-BaoYue
Collaborator

> One more question: does port 9091 need to be exposed externally? What is this port used for?

https://github.com/kubeedge/kubeedge/blob/master/pkg/apis/componentconfig/cloudcore/v1alpha1/types.go#L54

@thinkeng
Author

thinkeng commented Apr 11, 2024

After the node joined successfully, pods stay in ContainerCreating with the following error:

Apr 11 19:35:56 rootk edgecore[1864043]: E0411 19:35:56.965075 1864043 pod_workers.go:1294] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"edge-node-exporter-g72hr_monitoring(ecf3a3ef-bee0-412d-8a6c-84d8cdb8c055)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"edge-node-exporter-g72hr_monitoring(ecf3a3ef-bee0-412d-8a6c-84d8cdb8c055)\\\": rpc error: code = Unknown desc = failed to create a sandbox for pod \\\"edge-node-exporter-g72hr\\\": Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as \\\"xxx.slice\\\"\"" pod="monitoring/edge-node-exporter-g72hr" podUID="ecf3a3ef-bee0-412d-8a6c-84d8cdb8c055"

Docker configuration in /etc/docker/daemon.json:

{
  "registry-mirrors":["https://bycacelf.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
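The error says Docker expects a systemd slice as the cgroup parent, and daemon.json sets native.cgroupdriver=systemd, so a likely cause is that edgecore is still using a different cgroup driver. A hedged sketch for comparing the two, assuming edgecore's config lives at /etc/kubeedge/config/edgecore.yaml and carries a cgroupDriver key (under modules.edged.tailoredKubeletConfig in recent releases); edgecore_cgroup_driver is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare Docker's cgroup driver with edgecore's.
# Assumption: the edgecore config path and the cgroupDriver key name below.

EDGECORE_CONFIG=/etc/kubeedge/config/edgecore.yaml

edgecore_cgroup_driver() {
  # Print the first "cgroupDriver:" value found in the given YAML file.
  sed -n -E 's/^[[:space:]]*cgroupDriver:[[:space:]]*//p' "$1" | head -n 1
}

docker_driver="$(docker info --format '{{.CgroupDriver}}' 2>/dev/null)"
edge_driver=""
if [ -r "$EDGECORE_CONFIG" ]; then
  edge_driver="$(edgecore_cgroup_driver "$EDGECORE_CONFIG")"
fi

if [ -n "$docker_driver" ] && [ "$docker_driver" = "$edge_driver" ]; then
  echo "cgroup drivers match: $docker_driver"
else
  echo "cgroup driver mismatch or unknown: docker=${docker_driver:-?} edgecore=${edge_driver:-?}" >&2
fi
```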

@thinkeng
Author

@cosmosiwi

> https://kubeedge.io/docs/setup/prerequisites/runtime#configure-the-runtime-for-edgecore-using-keadm-2
>
> It works now, thanks a lot!

Could you share how you configured cri-dockerd, and which platform you used to set up k8s?

@thinkeng
Author

thinkeng commented Apr 12, 2024

> https://kubeedge.io/docs/setup/prerequisites/runtime#configure-the-runtime-for-edgecore-using-keadm-2
>
> It works now, thanks a lot!
>
> can I know how you configure the cri-dockerd and the platform you boot k8s ?

The edge node was installed with keadm.
On the edge node, I changed the following setting in cri-docker.service:

ExecStart=/usr/local/bin/cri-dockerd --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d   --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.7

The k8s cluster was installed with KubeKey: https://github.com/kubesphere/kubekey
cri-docker.service on the k8s cluster:

[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com

[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --pod-infra-container-image registry.cn-beijing.aliyuncs.com/kubesphereio/pause:3.9
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target


3 participants