Allow executors to define containerd and cridockerd behavior #9184
Force-pushed from 845aed9 to 5cb7c5d
Have you tested this in RKE2 yet? The sequencing of the agent startup is much fussier on that side; I'm curious whether the changes will have an impact there. LGTM otherwise though, I am a fan of moving more stuff into the executor interface.
I like this PR and what it tries to solve. Removing/restarting rke2 is a pain on Windows.
I was able to test out the fix for 2204 today using a custom rke2 binary compiled against this k3s branch. Once I updated the I'm looking into the CI failure today, will update once that is resolved
It looks like the e2e tests have broken; glad to approve once those are green.
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
Force-pushed from 5cb7c5d to 509dcd2
Rebased this PR on master; it looks like CI is now passing.
Any idea when we might be able to merge this? Also, should I go ahead and raise backport PRs for this change, or is that handled during the release process?
Hey @HarrisonWAffel we entered code freeze for January releases on Friday (01/12). Is this crucial to get in for the January releases, or is a February timeline alright? cc @brandond @manuelbuil
Nope, this is not critical for the January release; I'll double back on this when code freeze is over. If there are any other action items needed on my side, let me know! Thanks!
@HarrisonWAffel devs are responsible for their own backport issues and PRs; if you're up for it you might as well get those created now.
Proposed Changes
In order to properly address rancher/rke2#2204, the rke2 Windows agent needs to be able to define how it reacts to containerd exiting. Currently, rke2 uses k3s' implementation for containerd, which will call os.Exit if the process stops running. On Windows nodes, this sudden exit prevents the top-level context from fully canceling, resulting in supporting processes created with CommandContext (kubelet, kube-proxy, etc.) continuing to run. In the event that the rke2 service is restarted, but those processes are not cleaned up, the service will enter a crash loop which must be manually resolved.
k3s should provide a way for the rke2 Windows agent (or any custom executor) to define how containerd is executed, in a similar manner to other components such as the kubelet or kube-proxy.
- Extends the Executor interface so that custom executors (i.e. rke2's pebinaryexecutor) can define how containerd and cridockerd are executed.
- Exports containerd.PreloadImages and containerd.SetupContainerdConfig so that code duplication between k3s and rke2 is reduced as much as possible.
- Starts containerd/cridockerd when agent.Agent is called, as opposed to doing so in the run function.
Types of Changes
functionTypes of Changes
Enhancement: custom executors now have control over the behavior of containerd and cridockerd.
Verification
The k3s embedded executor should be tested to ensure that it properly starts the desired container runtime on startup. Beyond that, there are no functional changes that would need more significant verification.
Testing
Manual testing:
After building a custom k3s binary, I was able to confirm that a 2-node cluster (1 cp+etcd server, 1 agent) will properly start the desired container runtime (both containerd and cridockerd were tested), and that workloads can be deployed onto that cluster without issue using kubectl.
Automated testing:
The existing integration and e2e tests cover basic startup and functionality of the k3s agent and server, so no new tests have been added. Since this PR focuses on extending the executor interface, I could not think of a way to add meaningful unit tests to cover these changes.
Linked Issues
rancher/rke2#2204
User-Facing Change
Further Comments
In order to start the runtime using an executor I had to move the call from agent.run to agent.Agent. This results in the agent being started before the following code is called
Previously this call was made once the runtime was started, but before the agent was created. Looking at the code I don't think this has a negative impact on the server; however, I wanted to call this out in case there is a subtlety that I'm missing in how that channel is expected to be closed.