Problem when changing control_plane node (upgrade from Debian bullseye to bookworm) #10560
Comments
ADDENDUM: Same problem when trying to add the node on bullseye. For testing, I reinstalled the node with bullseye and hit the same issue when trying to add my node to the control plane with the command described below.
Hello, it is not clear to me: are the master nodes also etcd nodes or not? If they are, then maybe you should also include the etcd group in the limit, like: '--limit=kube_control_plane,etcd'
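For reference, a hedged sketch of that suggested invocation, reusing the inventory path that appears later in this thread (adapt to your own setup):
ansible-playbook -i inventory/test-l2-multi/hosts.yml --become --become-user=root -K --limit=kube_control_plane,etcd cluster.yml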
OK, I'll test the adding part with the etcd limit, but for me control_plane includes etcd. I'll report back in about 20 minutes. EDIT: Same error with etcd included in the limit parameter.
The doc also mentions that for etcd nodes you need to set -e ignore_assert_errors=yes.
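A hedged example of what that invocation might look like, assuming the same inventory and limit as above (adapt paths and groups to your setup):
ansible-playbook -i inventory/test-l2-multi/hosts.yml --become --become-user=root -K --limit=kube_control_plane,etcd -e ignore_assert_errors=yes cluster.yml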
Same result with -e ignore_assert_errors=yes. As I said earlier, it seems there is a problem with certificate generation, because I see "bad tls certificate" in the logs on my new node. EDIT: this process for changing a control_plane node worked well with the release-2.22 branch, so maybe a regression? EDIT 2: logs on the other nodes:
It seems there is a big problem with certificate generation for the new node IP 10.141.10.64 :'(
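One suggested diagnostic (not from the original report) is to check whether the regenerated etcd member certificate on the new node actually contains its IP in the SAN list. The path and file name below follow Kubespray's usual layout for host-deployed etcd (/etc/ssl/etcd/ssl/member-<hostname>.pem) and may differ on your cluster:
# Assumed certificate path; substitute the actual member certificate name.
openssl x509 -in /etc/ssl/etcd/ssl/member-lyo0-k8s-testm00.pem -noout -text | grep -A1 'Subject Alternative Name'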
Is it possible that you also ran it before changing the host order in the inventory? That could cause new certs to be generated if it runs on an empty master (because that's the first node).
Yes, I'll test it now. EDIT: same result when the new node is in first place.
Here is the syslog on the new node when trying to start etcd:
Sorry, I wasn't asking you to do it this way.
As said earlier, the "new node" is produced by: removing the node (with remove-node.yml), upgrading it from bullseye to bookworm, then adding it back to the cluster with the appropriate command. I have already done this process with Kubespray (release-2.22 branch): at that time I changed the control_plane nodes of another cluster from VMs to bare-metal servers using the process described in docs/nodes.md, and everything worked perfectly. And the node is NOT the first master, because nodes.md describes how to remove/add the first control_plane node and I followed that documentation. For me there are some problems in the release-2.23 branch; I did not try the master branch. Actually I was working on another solution for upgrading my nodes:
Any update on this? Or did you go about upgrading the OS itself differently?
Hi, sorry for the delay :) Everything is working with this method. I have also tested with all nodes on the same distro version (all on Debian bookworm), and now deleting a control_plane node and re-adding it works. So maybe the issue is due to the version mismatch between control_plane nodes, my two cents :) Regards
So what's your process now? Remove the node, upgrade the OS, then add it back?
Yes it is
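For readers following along, a minimal sketch of that sequence, reusing the inventory path, node name, and groups mentioned earlier in this thread (adapt to your environment):
# 1. Remove the control_plane node from the cluster
ansible-playbook -i inventory/test-l2-multi/hosts.yml --become --become-user=root -K remove-node.yml -e node=lyo0-k8s-testm00
# 2. Reinstall / upgrade the node's OS (bullseye -> bookworm) outside Kubespray
# 3. Add the node back to the cluster
ansible-playbook -i inventory/test-l2-multi/hosts.yml --become --become-user=root -K --limit=kube_control_plane,etcd cluster.yml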
Possibly related: #upgrade #10808
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its standard rules. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to its standard rules. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to its standard rules. Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Environment:
Bare-metal cluster, upgrading a member from bullseye to bookworm; the following versions are from the Kubespray deployment machine.
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 5.10.0-23-amd64 x86_64
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Version of Ansible (ansible --version):
ansible [core 2.14.11]
config file = /homes/totof/myk8s/kubespray-master/ansible.cfg
configured module search path = ['/homes/totof/myk8s/kubespray-master/library']
ansible python module location = /usr/local/lib/python3.9/dist-packages/ansible
ansible collection location = /homes/totof/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True
Version of Python (python --version): Python 3.9.2
Kubespray version (commit) (git rev-parse --short HEAD): 7dcc22f
Network plugin used: cilium
Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): inventory-variables.txt
Command used to invoke ansible:
ansible-playbook -i inventory/test-l2-multi/hosts.yml --become --become-user=root -K --limit=kube_control_plane cluster.yml
Output of ansible run:
Description of the problem:
I ran the following command to remove one control_plane node, which is currently on bullseye (Debian 11):
ansible-playbook -i inventory/test-l2-multi/hosts.yml --become --become-user=root -K remove-node.yml -e node=lyo0-k8s-testm00.
I had previously changed the order in the inventory file as described in docs/nodes.md. After that, I freshly installed the node with bookworm, and then tried to add my node (now on bookworm) back with the command:
ansible-playbook -i inventory/test-l2-multi/hosts.yml --become --become-user=root -K --limit=kube_control_plane cluster.yml
Here are the etcd-related logs on the new control_plane node:
The command failed at the etcd stage (deployed with the host method) with the message described below. I have already done this kind of operation before, but only to change the hardware of control plane nodes, and with the Kubespray release-2.22 branch, where everything worked well.
Any assistance in resolving this etcd problem would be appreciated.
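As a suggested diagnostic (not part of the original report), the state of the remaining etcd members can be checked from one of the existing control-plane/etcd nodes. The certificate paths below follow Kubespray's usual /etc/ssl/etcd/ssl layout for host-deployed etcd; the member name is a placeholder and must be adapted:
# Hypothetical check on an existing etcd node: list the cluster members.
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/member-<existing-node>.pem \
  --key=/etc/ssl/etcd/ssl/member-<existing-node>-key.pem \
  member list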