[Windows] Stop-Service rke2 does not stop all rke2 services #2204
Comments
This issue relates to how cancelation of the context created from the Since the rke2 agent will return as soon as the k3s agent returns, there is not enough time for the Ideally there would be a way to ensure that both rke2 and k3s fully process the context cancelation before rke2 exits. Currently, |
Yeah... if we did this properly we would set up wait groups and properly sequence the shutdown. I have made attempts at this in the past, but unfortunately there is a lot of code in Kubernetes, Wrangler, and K3s that calls exit or panic on error, and effectively breaks the ability to do a clean exit when the top-level context is cancelled.
For this specific issue the behavior of k3s is really what’s getting in the way. I can think of two solutions
We talked about this over slack, the better solution for this issue would be enhancing the k3s |
@HarrisonWAffel this https://www.suse.com/es-es/support/kb/doc/?id=000021254 is related, right?
@manuelbuil Yep, until this ticket is handled, that is the only way to get out of the crash loop (other than a full restart of the machine).
Validated on version:
rke2.exe version v1.29.2-rc1+rke2r1 (2bb7020162174863547a0b4773b74acf6fdab71c)
Environment Details
Infrastructure
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
Steps to validate the fix
Reproduction Issue:
Validation Results:
Extension of (#1755)
Environmental Info:
RKE2 Version:
rke2.exe version v1.22.4-rc2+rke2r1 (79dc33a)
go version go1.16.10b7
Node(s) CPU architecture, OS, and Version:
Server: Linux ip-172-31-47-190 5.4.0-1009-aws #9-Ubuntu SMP Sun Apr 12 19:46:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Agent: Windows 10 (1809)
Cluster Configuration:
1 server, 1 agent
Describe the bug:
Stop-Service rke2
does not stop all the rke2 services cleanly. One or two services randomly remain in the running state.
Some progress was made as part of ticket (#1755), but additional code changes are needed to cleanly stop all the rke2 services.
Steps To Reproduce:
Stop-Service rke2
Expected behavior:
Get-Process -Name containerd,kubelet,kube-proxy,rke2,calico-node
should list none of these processes as running.
Actual behavior:
One or two services are always left running and are not stopped cleanly.
Additional context / logs:
Ticket (#1755) has more details on the work.