Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update shutdown-k3s.service #707

Merged
merged 2 commits into from
Mar 2, 2023

Conversation

davidcassany
Copy link
Contributor

Some little changes to shutdown-k3s.service. Mostly to prevent it from running if no k3s-killall script is available (e.g. in ISOs or before provisioning).

However the relevant fix is at https://build.opensuse.org/request/show/1068496 to ensure this service is enabled by default.

Related to #587

@kkaempf
Copy link
Contributor

kkaempf commented Mar 1, 2023

How does shutdown-k3s interact with RKE2 nodes ?!

@frelon
Copy link
Contributor

frelon commented Mar 1, 2023

How does shutdown-k3s interact with RKE2 nodes ?!

Hmm I did not test that.. will have a look!

@davidcassany
Copy link
Contributor Author

davidcassany commented Mar 2, 2023

How does shutdown-k3s interact with RKE2 nodes ?!

Good point. According to k3s-io/k3s#2400 (comment) I bet there should be an analog process for rke2

In addition I am still noticing some weird behavior related to this service on my env. On a provisioned cluster if I am run in the shell:

k3s-killall.sh && shutdown -r now

it reboots without issues, no errors during shutdown. However if shutting down directly and relaying on shutdown-k3s.service to stop k3s I don't see #587 (it is gone), however I consistently see an error umounting /usr/local/... this is weird. Trying to figure out the difference...

Marking it as draft for now, in any case I believe this is needed. At least to prevent service execution at shutdown before provisioning. In fact I believe this should be provided as part of the provisioning process... not entirely in our hands though

@davidcassany davidcassany marked this pull request as draft March 2, 2023 08:54
Signed-off-by: David Cassany <dcassany@suse.com>
@davidcassany davidcassany marked this pull request as ready for review March 2, 2023 15:13
Signed-off-by: David Cassany <dcassany@suse.com>
@davidcassany
Copy link
Contributor Author

davidcassany commented Mar 2, 2023

This is are the logs I get with a for a shutdown call including the changes in this PR:

[  OK  ] Unmounted /var/lib/rancher.
[  OK  ] Stopped Flush Journal to Persistent Storage.
         Unmounting /etc...
         Unmounting /var/lib/kubelet...
         Unmounting /var/log...
[  OK  ] Unmounted /etc.
[  OK  ] Unmounted /var/lib/kubelet.
[  OK  ] Unmounted /var/log.
         Unmounting /usr/local...
         Unmounting /var...
[FAILED] Failed unmounting /usr/local.
[  OK  ] Unmounted /var.
         Unmounting /run/overlay...
[  OK  ] Unmounted /run/overlay.
[  OK  ] Stopped target Preparation for Local File Systems.
[  OK  ] Stopped target Swaps.
         Stopping Monitoring of LVM…meventd or progress polling...
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create System Users.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Stopped Monitoring of LVM2… dmeventd or progress polling.
[  OK  ] Finished Kill containerd-shims on shutdown.
[  OK  ] Reached target System Shutdown.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Reached target Late Shutdown Services.
[  OK  ] Finished System Reboot.
[  OK  ] Reached target System Reboot.
[  124.714236][    T1] watchdog: watchdog0: watchdog did not stop!
[  124.730204][    T1] systemd-shutdown[1]: Syncing filesystems and block devices.
[  124.732986][    T1] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[  124.740777][ T1163] systemd-journald[1163]: Received SIGTERM from PID 1 (systemd-shutdow).
[  124.756717][    T1] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[  124.763001][    T1] systemd-shutdown[1]: Using hardware watchdog 'iTCO_wdt', version 0, device /dev/watchdog
[  124.767621][    T1] systemd-shutdown[1]: Unmounting file systems.
[  124.770725][ T4434] [4434]: Remounting '/usr/local' read-only in with options '(null)'.
[  124.835080][ T4434] EXT4-fs (vda5): re-mounted. Opts: (null). Quota mode: none.
[  124.844964][ T4435] [4435]: Unmounting '/usr/local'.

/usr/local fails to unmount during the first attempt, however it is successfully remounted to ro and unmounted afterwards in later stages. I don't think this is a relevant issue, but still a sign we might not properly shutdown k3s for a reboot. I am pretty much convinced this is a race issue between umount.target and shutdown-k3s.service. See the shutdown service finishes after attempting to umount /usr/local.

Interesting enough I don't see the issue if I increase the systemd verbosity to debug shutdown:

systemctl log-level debug
systemctl log-target console
shutdown -r now

Shutsdown without issues.

I have spent whole day running trials in Rke2 and k3s trying to find a way to include killall scripts as part of the shutdown and make sure they are finalized before umount.target pulls all dependencies (or conflicts to mountpoints). I could not find a way. In any case in both cases, for RKE2 and K3S, the shutdown is quick (with errors on /opt and /usr/local respectively) and reboots without errors. I'd say we can leave it like that for the time being. In all cases running the killall script just before calling shutdown works smooth.

@davidcassany
Copy link
Contributor Author

IMHO this is an issue for rancher provisioning, we should be providing such a service file, this should be provided as part of the provisioning, so the same logic that provides the k3s.service should provide a service or a way to reboot without having unclean umounts.

Copy link
Contributor

@frelon frelon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Agree that this should be handled by the provisioning, but this seems like a good workaround!

@davidcassany davidcassany merged commit f021c2c into rancher:main Mar 2, 2023
@davidcassany davidcassany deleted the fix_k3s_shutdown branch March 2, 2023 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants