Upgrading a clean cluster 1.27 to 1.28 - one of the nodes stuck in emergency mode

Description
I installed the cluster on 1.27; after it was done, without doing anything else, I upgraded it to 1.28 (~3:42 UTC).
All nodes were up and running except one (see screenshot).
I used the Hetzner console to have a look at that node; the terminal was stuck in emergency mode (see screenshot 2).
I pressed 'Enter', everything started up, and the node is now online again and upgraded (this was at 6:41 UTC).
Following the OS's recommendation to look into `journalctl -xb` (output_reducted.txt attached), the root cause, as far as I can gather, is that /boot/writable could not be mounted.
Any idea why this would happen?
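For reference, a minimal diagnostic sketch for a failed mount like this, assuming /boot/writable is handled by a systemd mount unit generated from /etc/fstab (the unit name below is derived from the mount path and is an assumption, not taken from the attached log):

```sh
# From the emergency-mode prompt, or after the node is reachable again via SSH:
systemctl --failed                      # list failed units; a broken /boot/writable would appear as boot-writable.mount
journalctl -xb -u boot-writable.mount   # current-boot logs for just that mount unit
findmnt /boot/writable                  # check whether the path is actually mounted right now
grep writable /etc/fstab                # see how /boot/writable is declared (device, fs type, options)
```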
Kube.tf file
```hcl
## All values are referenced from here - https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/blob/master/kube.tf.example
module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.13.5"

  hcloud_token            = var.hcloud_token
  rancher_install_channel = "latest"
  initial_k3s_channel     = "v1.28"
  # ssh_port = 2222

  base_domain  = "${replace(var.app_name, "-", ".")}.XXX.XX"
  cluster_name = var.app_name
  # rancher_hostname = "XX.XX.XX"

  enable_cert_manager = false
  enable_rancher      = false
  enable_longhorn     = false
  # enable_traefik = false
  enable_klipper_metal_lb                   = "false"
  control_plane_lb_enable_public_interface = true
  # enable_nginx = true
  load_balancer_disable_public_network = false

  ssh_public_key = file("./ssh-key/id_rsa.pub")
  # For more details on SSH see https://github.com/kube-hetzner/kube-hetzner/blob/master/docs/ssh.md
  ssh_private_key = file("./ssh-key/id_rsa")

  network_region = "eu-central" # change to `us-east` if location is ash

  control_plane_nodepools = [
    {
      name        = "control-plane-nbg1",
      server_type = "cx21",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 2
    },
    {
      name        = "control-plane-hel1",
      server_type = "cx21",
      location    = "hel1",
      labels      = [],
      taints      = [],
      count       = 1
    }
  ]

  agent_nodepools = [
    {
      name        = "workload-agent-0",
      server_type = "cx41",
      location    = "nbg1",
      labels = [
        "node.kubernetes.io/pool=workload-agent-cx41"
      ],
      taints = [],
      count  = 3,
      # longhorn_volume_size = 50
    },
    {
      name        = "longhorn-agent-0",
      server_type = "cx41",
      location    = "nbg1",
      labels = [
        "node.kubernetes.io/server-usage=storage",
        "node.kubernetes.io/pool=longhorn-agent-0"
      ],
      taints               = [],
      count                = 3,
      longhorn_volume_size = 50
    }
  ]

  # * LB location and type, the latter will depend on how much load you want it to handle, see https://www.hetzner.com/cloud/load-balancer
  load_balancer_type     = "lb11"
  load_balancer_location = "nbg1"

  ### The following values are entirely optional (and can be removed if unused)
  # You can define a base domain name to be used in the form nodename.base_domain for setting the reverse DNS inside Hetzner.
  # To use local storage on the nodes, you can enable Longhorn; default is "false".
  # The file system type for Longhorn, if enabled (ext4 is the default, otherwise you can choose xfs)
  # longhorn_fstype = "xfs"
  # How many replica volumes Longhorn should create (default is 3)
  longhorn_replica_count = 1
  disable_hetzner_csi    = false

  kured_options = {
    "concurrency" : 3
  }

  # If you want to disable the Traefik ingress controller, to use the Nginx ingress controller for instance, you can set this to "false". Default is "true".
  # We give you the possibility to use letsencrypt directly with Traefik because it's an easy setup; however, it's not optimal,
  # as the free version of Traefik causes a little bit of downtime when the certificates get renewed. For proper SSL management,
  # we instead recommend cert-manager, which you can easily deploy with helm; see https://cert-manager.io/.
  # traefik_acme_tls = true
  ingress_controller = "none"

  automatically_upgrade_os          = true
  allow_scheduling_on_control_plane = false
  automatically_upgrade_k3s         = true

  cni_plugin          = "cilium"
  cilium_version      = "v1.15.4"
  cilium_routing_mode = "native"
}
```
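Not a confirmed fix, but while a boot issue like this is being investigated, a sketch like the following (using only variables already present in this kube.tf) pauses the automatic upgrades so kured doesn't reboot nodes mid-debugging:

```hcl
module "kube-hetzner" {
  # ... same configuration as above ...

  # Temporarily pause automatic OS and k3s upgrades while debugging;
  # re-enable once the /boot/writable mount failure is understood.
  automatically_upgrade_os  = false
  automatically_upgrade_k3s = false
}
```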
Screenshots
status after upgrade:
(screenshot attached)
stuck at emergency:
output_reducted.txt
Platform
Linux