
Is there something wrong with etcd certs #3708

Closed
ElfenSterben opened this issue Aug 18, 2023 · 10 comments · Fixed by #3891
Labels
kind/bug Something isn't working

@ElfenSterben

Sealos Version

v4.3.0

How to reproduce the bug?

sealos run labring/kubernetes:v1.27.4 --masters 10.xx.xx.101 10.xx.xx.102

What is the expected behavior?

openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -text

X509v3 Subject Alternative Name:
DNS:localhost, DNS:k8s-master-01-101, IP Address:127.0.0.1, IP Address:10.xx.xx.101, IP Address:10.xx.xx.102, IP Address:0:0:0:0:0:0:0:1
Signature Algorithm: sha256WithRSAEncryption

What do you see instead?

openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -text

X509v3 Subject Alternative Name:
DNS:localhost, DNS:k8s-master-01-101, IP Address:127.0.0.1, IP Address:10.xx.xx.101, IP Address:0:0:0:0:0:0:0:1
Signature Algorithm: sha256WithRSAEncryption
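The expected and actual SAN lists above can also be compared programmatically instead of by eye. The following is a minimal Go sketch using only the standard library, under these assumptions: 10.0.0.101 and 10.0.0.102 are placeholder stand-ins for the elided 10.xx.xx.101 and 10.xx.xx.102, and `missingIPSANs` and `selfSigned` are hypothetical helpers. The certificate is generated in memory with the SANs from the report so the example runs on its own. Against a real node, you would instead `pem.Decode` the contents of /etc/kubernetes/pki/etcd/server.crt and pass the block's bytes to `x509.ParseCertificate`.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// missingIPSANs returns every expected IP that does not appear in the
// certificate's IP Subject Alternative Names.
func missingIPSANs(cert *x509.Certificate, expected []string) []string {
	var missing []string
	for _, s := range expected {
		want := net.ParseIP(s)
		found := false
		for _, ip := range cert.IPAddresses {
			if ip.Equal(want) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, s)
		}
	}
	return missing
}

// selfSigned builds a throwaway certificate carrying the given SANs. It stands in
// for /etc/kubernetes/pki/etcd/server.crt so the example runs without files.
func selfSigned(dns []string, ips []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: dns[0]},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dns,
	}
	for _, s := range ips {
		tmpl.IPAddresses = append(tmpl.IPAddresses, net.ParseIP(s))
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// SANs as observed in the report (placeholder IPs).
	cert, err := selfSigned(
		[]string{"localhost", "k8s-master-01-101"},
		[]string{"127.0.0.1", "10.0.0.101", "::1"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("missing:", missingIPSANs(cert, []string{"10.0.0.101", "10.0.0.102"}))
}
```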

Operating environment

- Sealos version: v4.3.0
- containerd version: v1.6.22
- Kubernetes version: v1.27.4
- Operating system: Ubuntu 20.04 LTS
- Runtime environment: 32 GB memory, 4-core CPU
- Cluster size: 3 masters
- Additional information:

Additional information

After running
sealos run labring/kubernetes:v1.27.4 --masters 10.xx.xx.101 10.xx.xx.102
I checked the etcd cert in /etc/kubernetes/pki/etcd/server.crt. Its X509v3 Subject Alternative Name contains only the first master's IP:
X509v3 Subject Alternative Name: DNS:localhost, DNS:k8s-master-01-101, IP Address:127.0.0.1, IP Address:10.xx.xx.101, IP Address:0:0:0:0:0:0:0:1
Signature Algorithm: sha256WithRSAEncryption

@ElfenSterben ElfenSterben added the kind/bug Something isn't working label Aug 18, 2023
@muicoder
Contributor

Try sealos run labring/kubernetes:v1.27.4 --masters 10.xx.xx.101,10.xx.xx.102
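The suggestion presumably matters because a single-valued flag takes only the next argument as its value. A minimal sketch with Go's standard `flag` package illustrates this. The `flag` package is a stand-in here, since sealos's own CLI parsing may behave differently, and `parseMasters` is a hypothetical helper. It shows the space-separated form leaving the second IP behind as a stray positional argument, while the comma-separated form keeps both IPs in the flag value. The image argument is omitted because the `flag` package stops parsing at the first non-flag argument.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseMasters parses args with a single string-valued --masters flag, splits
// the value on commas, and returns the master list plus leftover positional args.
func parseMasters(args []string) ([]string, []string, error) {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	masters := fs.String("masters", "", "comma-separated master IPs")
	if err := fs.Parse(args); err != nil {
		return nil, nil, err
	}
	var list []string
	for _, m := range strings.Split(*masters, ",") {
		if m = strings.TrimSpace(m); m != "" {
			list = append(list, m)
		}
	}
	return list, fs.Args(), nil
}

func main() {
	// Space-separated: the second IP is not part of the flag value.
	m, rest, _ := parseMasters([]string{"--masters", "10.0.0.101", "10.0.0.102"})
	fmt.Println(m, rest) // [10.0.0.101] [10.0.0.102]

	// Comma-separated: both IPs end up in the flag value.
	m, rest, _ = parseMasters([]string{"--masters", "10.0.0.101,10.0.0.102"})
	fmt.Println(m, rest) // [10.0.0.101 10.0.0.102] []
}
```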

@ElfenSterben
Author

ElfenSterben commented Aug 22, 2023

I tried it in two ways:
sealos run labring/kubernetes:v1.27.4 --masters 10.xx.xx.101,10.xx.xx.102
and

sealos run labring/kubernetes:v1.27.4 --masters 10.xx.xx.101
sealos add --masters 10.xx.xx.102

but both have the same problem.

I see that /etc/kubernetes/pki/apiserver.crt does have the correct X509v3 Subject Alternative Name: it contains both 10.xx.xx.101 and 10.xx.xx.102.

@muicoder
Contributor

You can use -h to see the detailed usage.

@muicoder
Contributor

Here's a correct example:

sealos run labring/kubernetes:v1.27.4 \
        --masters 192.168.64.100,192.168.64.101,192.168.64.102 --nodes 192.168.64.103,192.168.64.104
sealos add --masters 192.168.64.98,192.168.64.99 --nodes 192.168.64.105

@cuisongliu cuisongliu added this to the v4.4 milestone Aug 24, 2023
@ElfenSterben
Author

The problem is still unresolved; maybe it's a bug.
I created the etcd certs manually and restarted the etcd server, and it now works correctly.

@ghostloda
Collaborator

ghostloda commented Sep 5, 2023

> The problem is also unresolved, maybe it's a bug. I create etcd certs by manual and restart etcd server, it now works correctly

I am confused: why do we need to add the IP addresses of the other masters to the etcd server.crt? It's not the Kubernetes API server certificate.

@ElfenSterben
Author

> The problem is also unresolved, maybe it's a bug. I create etcd certs by manual and restart etcd server, it now works correctly
>
> I am confused, why do we need to add the IP addresses of other master to the etcd server.crt? It's not the k8s API server certificate.

I just saw a lot of etcd logs in which one etcd member calls another and reports a certificate error saying '10.xx.xx.102' is not in the Subject Alt Name, and these logs are growing rapidly. So I guessed that the etcd certificate needs to include the other masters' IPs. I tried that, and now it seems to work well.

@cuisongliu
Collaborator

The logic is different. If you run sealos run --masters 10.10.0.1, the certificate should contain only the first node. When you add a master, the existing masters' certificates stay unchanged; only the newly added node's certificate also includes master0.
master0: master0
master1: master0 master1
master2: master0 master1 master2

That is roughly the logic.
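The cumulative scheme described above can be modeled in a few lines. This Go sketch only restates the described behavior. `cumulativeSANs` is a hypothetical helper, not sealos code, and node names stand in for IPs.

```go
package main

import "fmt"

// cumulativeSANs models the described issuance order. The node joined at
// position i gets SANs for masters[0..i], and certificates of masters that
// are already running are never reissued.
func cumulativeSANs(masters []string) map[string][]string {
	out := make(map[string][]string, len(masters))
	for i, m := range masters {
		sans := make([]string, i+1)
		copy(sans, masters[:i+1])
		out[m] = sans
	}
	return out
}

func main() {
	masters := []string{"master0", "master1", "master2"}
	sans := cumulativeSANs(masters)
	for _, m := range masters {
		fmt.Println(m+":", sans[m])
	}
}
```

Under this model, the first master's certificate never gains the later masters' addresses. That matches the single-IP SAN list reported for server.crt on k8s-master-01-101.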

@dinoallo
Contributor

The issue is probably that sealos generates and distributes etcd certificates with the wrong subject alternative names (SANs) to each master. Since etcd v3.2, each master's own etcd certificate must list that master's hostname and IP in its SANs for secure connections. Hence, if the etcd certificates on all masters are identical, connection problems are expected.
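Under that explanation, each member needs its own certificate, built from its own hostname and IP, rather than a shared copy. The following is a minimal Go sketch of the per-node SAN set. It follows the shape of the server.crt quoted in the report: localhost, hostname, 127.0.0.1, node IP, ::1. `Master` and `etcdServerSANs` are hypothetical, and the second hostname and both IPs are placeholders.

```go
package main

import "fmt"

// Master is a hypothetical description of one control-plane node.
type Master struct {
	Hostname string
	IP       string
}

// etcdServerSANs returns the DNS and IP SANs for one member's own etcd
// certificate: loopback entries plus that member's hostname and IP.
func etcdServerSANs(m Master) (dns []string, ips []string) {
	return []string{"localhost", m.Hostname}, []string{"127.0.0.1", m.IP, "::1"}
}

func main() {
	masters := []Master{
		{Hostname: "k8s-master-01-101", IP: "10.0.0.101"},
		{Hostname: "k8s-master-02-102", IP: "10.0.0.102"}, // hypothetical hostname
	}
	// Each node gets a distinct SAN set, so the certificates cannot be identical.
	for _, m := range masters {
		dns, ips := etcdServerSANs(m)
		fmt.Println(m.Hostname, dns, ips)
	}
}
```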

If anyone is currently experiencing a similar issue, a temporary band-aid to recover the cluster is in #3887

@cuisongliu cuisongliu linked a pull request Sep 11, 2023 that will close this issue
cuisongliu pushed a commit to cuisongliu/sealos that referenced this issue Sep 29, 2023
Signed-off-by: cuisongliu <cuisongliu@qq.com>

labring#3708 labring#3887
cuisongliu added a commit that referenced this issue Sep 29, 2023
* fix: dnsDomain does not take effect in kubelet (#3834) (#3835)

Signed-off-by: yangxg <yangxggo@163.com>
Co-authored-by: yangxg <yangxggo@163.com>
(cherry picked from commit c60b2fd)

* fix: ignore http server close error (#3854) (#3857)

(cherry picked from commit 2d4d78b)

* fix: skip same path (#3898) (#3899)

Co-authored-by: 榴莲榴莲 <78798447@qq.com>
(cherry picked from commit a256283)

* fix: disable scp checksum by default (#3913) (#3919)

Co-authored-by: fengxsong <fengxsong@outlook.com>
(cherry picked from commit 96cb79d)

* feat: support timeout setting for lvscare http prober (#3901) (#3905)

Co-authored-by: fengxsong <fengxsong@outlook.com>
(cherry picked from commit 6bd5c0a)

* feature: kubefile CMD support ENV variable format (#3921) (#3942)

Co-authored-by: Zihan Li <eden.zh.li@outlook.com>
(cherry picked from commit 4b5f3fe)

* delete cr build for buildah (#3953) (#3954)

Co-authored-by: yy <56745951+lingdie@users.noreply.github.com>
(cherry picked from commit 865803c)

* delete: controller part and useless service. (#3950)

* delete controllers and useless service.

* delete buildah image cr part.

* delete ci.

* roll back

(cherry picked from commit 076c7c7)
Signed-off-by: cuisongliu <cuisongliu@qq.com>

* fix: using extra valid status codes when response status code greater than 400 (#3986) (#3988)

Co-authored-by: fengxsong <fengxsong@outlook.com>
(cherry picked from commit 7be765f)

* feature(main): add lvscare gomod (#3995)

Signed-off-by: cuisongliu <cuisongliu@qq.com>
(cherry picked from commit 050d70b)

* fix(main): sync cert for cert cmd

Signed-off-by: cuisongliu <cuisongliu@qq.com>

#3708 #3887

---------

Co-authored-by: sealos-ci-robot <109538726+sealos-ci-robot@users.noreply.github.com>
Co-authored-by: yy <56745951+lingdie@users.noreply.github.com>

6 participants