Add option to skip building hook systemd unit and use raw manifest #4764
Comments
What about a DaemonSet, or a hook that runs a container? You can do basically anything that way; a good example is the NVIDIA driver: https://github.com/kubernetes/kops/tree/master/hooks/nvidia-bootstrap. We'd need a hook where this has to run as part of boot, but I don't think there's any reason why this has to run first (I'm assuming the instances aren't that small!). A DaemonSet is also much easier to debug, because you can develop it without having to reboot the cluster. The static-files approach is indeed pretty limiting, but I'm not aware of anything you can't do in a DaemonSet or container hook (a container hook runs as a privileged container). FYI, kubernetes/kubernetes#53533 discusses why enabling swap might be problematic.
Reading that issue, it sounds like the swap problem rears its head when you set memory limits on pods: the presence of swap confounds the kubelet's or Docker's ability to decide when a pod is approaching its memory limit. Currently we don't use any resource limiting in Kubernetes, so I don't think we'll hit the issue. In our environments, we see the operating system's OOM killer killing our containers when the node starts running out of memory. I'll look into doing this with a DaemonSet or hook container, but I'd read in a few places that it's difficult to manipulate the host's systemd from within a container.
I don't know that this is a great fit for a DaemonSet, because it's one-time systemd setup. Job/CronJob DaemonSets are tracked in kubernetes/kubernetes#36601. There are a bunch of workarounds provided there, most of which aren't very palatable. One of the most common is to design a DaemonSet container which does the job, then sits in a busy-wait loop forever. This would mean leaving a container with access to the underlying host's systemd lying around forever, which is a security risk. Other solutions become more and more hacky: the DaemonSet waiting for all of its pods to report that they're finished, then deleting itself; or the DaemonSet running under a label selector, then updating the label on its own node when it finishes so that it excludes itself from the node. Most of them involve writing a DaemonSet with enough Kubernetes privileges to manage itself.

Running with a hook container becomes tricky for us because we use ECR as our container registry.

At this point we start looking at doing cluster-wide configuration management, or rolling our own AMI, just to get the utilities in place that would allow us to run a container hook, which is what I'm trying to avoid having to do by using hooks in the first place. 😄 We will probably end up there eventually anyway, but that's a deep rabbit hole that I don't want to go down until I absolutely have to.

If the kops hook system would just install my systemd units unmolested, that would be a really easy workaround for all of the above. Being able to install arbitrary systemd units also seems like a really powerful tool for Kubernetes operators, and a positive contribution to the hook system overall.
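For concreteness, the busy-wait workaround described above looks roughly like this sketch (the image, labels, and setup command are illustrative assumptions, not from the issue):

```yaml
# Hypothetical one-shot-setup DaemonSet: does its work once, then
# sleeps forever so the pod stays "Running" and isn't rescheduled.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: one-shot-setup
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: one-shot-setup
  template:
    metadata:
      labels:
        app: one-shot-setup
    spec:
      hostPID: true
      containers:
        - name: setup
          image: alpine:3
          securityContext:
            privileged: true  # host access persists forever -- the security risk noted above
          command: ["sh", "-c", "echo 'run one-time host setup here' && sleep infinity"]
```

The `sleep infinity` is exactly the objectionable part: the privileged container never exits, so the host access it was granted for a one-time job is retained indefinitely.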
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Background
I've just spent two frustrating workdays trying to add a swapfile to the nodes in some of our instance groups. We're using CoreOS Container Linux as our OS and are running in AWS. CoreOS's documentation for adding swap to Container Linux is here.
I started by trying to add swap with cloud-init, as described in #3251. However, the `additionalUserData` field is completely unusable with Container Linux. When `additionalUserData` is used, kops formats the AWS user data as a multipart MIME message. On Container Linux, Ignition and/or coreos-cloudinit are incapable of reading this format, and the instance bricks itself on boot. It never passes AWS initialization and never becomes reachable via SSH. After a few minutes, the ASG restarts it, and it sits in a restart loop forever.

Next I turned to hooks. I was initially pleased to find that the kops hook system allows me to construct systemd unit files, since that's what Ignition or coreos-cloudinit would have done for me anyway. However, the current implementation is extremely limiting.
First, CoreOS wants me to add a systemd swap unit, which is a unit file suffixed with `.swap`. The kops hook system takes this name and appends `.service` to it, so I end up with a unit file of type `.swap.service`, which is wrong.
Second, CoreOS wants me to use systemd parameters in the `[Unit]` section which the hook system doesn't support, like `After`, `RequiresMountsFor`, and `ConditionPathExists`. If I try to write a full-featured unit file and put it in the `manifest` section, the hook system ends up mangling it, and the resulting unit file is broken.
I experimented for a bit with writing additional hooks which would unmangle the unit files written by the previous hooks, but my solutions started to get really ugly, really fast.
Proposed Solution
I'd like to extend the hook system with the ability to slurp the `manifest` field and use it as the unit, completely unmodified. This functionality would be gated behind a hook option to avoid breaking backwards compatibility. Similarly, I'd like to add an option which prevents kops from mangling the name of the unit by appending `.service` to it, again gated behind a config option to preserve backwards compatibility.

My proposal for the new options (with documentation) is:
- `useRawManifest`: If set to `true`, the value of the `manifest` field will be used, unmodified, as the contents of the systemd unit created by the hook. The `requires` and `before` options are ignored, and the hook will not generate the `[Unit]` section of the unit file for you. This gives operators access to `[Unit]` section parameters which are otherwise unsupported. Default: `false`.
- `useRawName`: For legacy reasons, if the `name` field does not end with `.service` or `.timer`, `.service` is appended to the name of the systemd unit created by the hook. If `useRawName` is set to `true`, the `name` field will be left unmodified. This is useful for hooks which create units of systemd types other than `.service` or `.timer`, like `.swap` or `.target`. Default: `false`.

I'm willing to work on this once the design is agreed upon.