Running script on instance startup #387

Closed
yissachar opened this Issue Aug 30, 2016 · 51 comments

@yissachar
Contributor

yissachar commented Aug 30, 2016

I have a requirement that certain scripts need to be run whenever a new instance is brought up (node or master). For example, for compliance purposes let's say I have to ensure that all packages are up to date. I need to run:

sudo yum update

This has to happen whenever a new instance is started, whether as part of the initial kops turn-up or when the ASG launches a new instance.

How can this be accomplished with kops?
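For illustration only, here is a minimal sketch of the kind of startup script being asked about. It is not a kops feature; the function name `update_command` is hypothetical, and the script assumes it runs as root at boot (for example from instance user data).

```shell
#!/bin/sh
# Sketch only: choose a package-update command based on which package
# manager is present on the host. `update_command` is a hypothetical name.
update_command() {
    if command -v yum >/dev/null 2>&1; then
        echo "yum -y update"
    elif command -v apt-get >/dev/null 2>&1; then
        echo "apt-get update && apt-get -y upgrade"
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}

# A real boot script, running as root, would then execute the result:
#   sh -c "$(update_command)"
```

Keeping the decision in a function means the actual update only runs when the boot script calls it explicitly.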

@justinsb justinsb added this to the 1.3.0 milestone Aug 30, 2016

@justinsb

Member

justinsb commented Aug 30, 2016

It's an anti-pattern, but I'll do this anyway. We can make clear in the docs which alternatives users should be using instead, and ask them to open issues.

@justinsb justinsb added the P1 label Sep 1, 2016

@lattwood

Contributor

lattwood commented Sep 7, 2016

I'm looking to start dnsmasq and modify resolv.conf on boot. That could be done with an arbitrary command.

@adiri

adiri commented Sep 8, 2016

+1 :) This is important :)

@ls-yann-david

ls-yann-david commented Sep 14, 2016

That'd be nice for chef bootstrapping! +1

@chrislovecnm

Member

chrislovecnm commented Oct 27, 2016

I would recommend another approach. I would encourage users to use their own AWS node, instead of injecting a script. Thoughts?

@jkemp101

jkemp101 commented Oct 27, 2016

I kind of agree with @chrislovecnm. I build a custom AMI so I can add my puppet bootstrapping scripts and other modifications I need.

@yissachar

Contributor Author

yissachar commented Oct 27, 2016

Using our own AMI would give us the most control, but it also introduces a management headache that we don't necessarily want to incur just to run a script on instance startup.

@hubt

hubt commented Nov 8, 2016

A custom AMI also doesn't solve my personal issue of needing to install a different version of docker. Some components get overwritten by nodeup in a way that makes it difficult to customize. The reality of docker stability and bugs means that people will sometimes want to upgrade it or other components without having to upgrade all of k8s.

@chrislovecnm

Member

chrislovecnm commented Nov 8, 2016

@hubt you should be able to override the docker tag on yaml / edit. If not we need to fix that.

I am happy to have someone design and submit a PR for this support. We need to get a standard design process, as well, but I digress. I understand that we will have edge cases for pre and post scripts.

Here is the challenge ;)

Supporting non-OOB (out-of-the-box) installs can be a ton of fun. We know that not everyone can run OOB, but really, creating an ecosystem that enables OOB installs is our vision.

What are your thoughts? How do we do this well?!

@OleksandrBerezianskyi

OleksandrBerezianskyi commented Nov 14, 2016

@chrislovecnm is there a way to override the docker tag on yaml without rebuilding nodeup? We are struggling to get the docker version update reliably automated. Chef is problematic, because if a chef recipe is applied simultaneously with nodeup, they could deadlock each other. Any other hints?

@chrislovecnm

Member

chrislovecnm commented Nov 14, 2016

@OleksandrBerezianskyi at this point there is not. Now it should not install if it is already installed. Is this not the case?

@OleksandrBerezianskyi

OleksandrBerezianskyi commented Nov 14, 2016

it is not the case - if docker is already installed then the previous version will be uninstalled during the nodeup

@chrislovecnm

Member

chrislovecnm commented Nov 15, 2016

@OleksandrBerezianskyi bummer. That should not be the case. You want to file an issue? Or shall I?

@OleksandrBerezianskyi

OleksandrBerezianskyi commented Nov 16, 2016

@chrislovecnm filed an issue #908

@kris-nova

Member

kris-nova commented Nov 16, 2016

This feels strikingly similar to the Terraform provisioners discussions.. Personally I am of the mentality that kops should offer plugin capabilities - but no support.. Despite it being an anti pattern..

In other words, giving the user a clean way of hooking into Nodeup either with an interface/pattern in go or with an executable such as a bash script would be fine..

Of course our support goes out the window once users start hacking onto their clusters, and that might be a community nightmare dealing with the potential issues this could bring..

Another concern would be failures.. if we do a fire and forget the user would have no way of knowing if their provisioner failed. We could have perfectly valid kops clusters floating around - that are perfectly invalid clusters to the user because their plugin failed.

On the other hand, if we did a wait and react approach - we introduce a few other pain points as well as kops could start failing - with the cluster technically online - which sounds dangerous..

Just my 2 cents :)
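As a rough sketch of a middle ground between "fire and forget" and "wait and react": run the user's provisioner, record its exit status somewhere the user can check, and never block the boot. The function name and the status-file path are hypothetical and not part of kops or nodeup.

```shell
#!/bin/sh
# Sketch only: run a user-supplied hook and record its exit status in a
# file, always returning success so the boot sequence is not blocked.
# `run_hook_and_record` and its arguments are hypothetical.
run_hook_and_record() {
    hook="$1"          # path to the user's script or command
    status_file="$2"   # where the exit status is written
    rc=0
    "$hook" || rc=$?
    echo "$rc" > "$status_file"
    return 0
}

# Example (hypothetical paths):
#   run_hook_and_record /opt/hooks/provision.sh /var/run/user-hook.status
```

The cluster still comes up if the hook fails, but the failure is at least discoverable by inspecting the status file.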

@kris-nova

Member

kris-nova commented Nov 21, 2016

See my growing proposal on a kops plugin library.. I think creating an open ended library for the community to interface with might be worthwhile..

#958

@kris-nova kris-nova modified the milestones: 1.4.3, 1.4.2 Nov 28, 2016

@yissachar

Contributor Author

yissachar commented Dec 15, 2016

I think the simplest solution would be to just allow users to provide an arbitrary script that gets appended to the AWS User Data. Actually, we would probably want to allow them to provide different scripts for the masters vs. the nodes.

What are the problems with this approach? Would this be accepted if somebody put together a PR for it?

@chrislovecnm

Member

chrislovecnm commented Dec 15, 2016

@yissachar always my man ... start with a quick PR based design write up. And rock and roll!!

@yissachar

Contributor Author

yissachar commented Dec 15, 2016

@chrislovecnm Do we have a template or example for PR design proposal?

@chrislovecnm

Member

chrislovecnm commented Dec 15, 2016

We need to document how to do that .... Oh the fun of a growing project.

So we could steal the pattern from the main repo, or, as I like doing, update the docs, do a quick design in the PR, and start coding.

I do like velocity

@OleksandrBerezianskyi

OleksandrBerezianskyi commented Dec 16, 2016

@yissachar this will not help. We need a hook not at the end of AWS User Data but after nodeup has finished. Because nodeup will override everything that is done by your custom script including downgrading versions of packages.

@kris-nova

Member

kris-nova commented Dec 16, 2016

That custom plugin library I am dreaming up is starting to sound pretty good at this point.. #958

@chrislovecnm

Member

chrislovecnm commented Jun 24, 2017

Added some info here #2795 and we have examples in the source tree as well

@mgarren

mgarren commented Aug 7, 2017

I know this thread hasn't been commented on in a couple of months, but I still feel the hooks functionality isn't quite going to work for some use cases. Here's my example: we prebake AMIs using a bunch of LVM mount points. Looking at hooks.go, it only maps /, dbus, and systemd. If I add a script to a hook that installs software in /opt, and /opt is an LVM mount point on the host, that seems likely to cause problems. Additionally, I'm trying to use hooks to install things like Splunk forwarders, which require an --accept-license flag on first start; in this case, would that have to be done from a chroot jail?

@dcowden

dcowden commented Aug 8, 2017

For what it's worth, our use case (after a few days of testing) has proven that the docker hook is insufficient for us.

In our case, we seek to install an active security agent (Nessus, by Tenable) as part of node startup. We'd rather not bake this into our base image, for a variety of reasons.

If we had a simple startup script, as @yissachar's PR implements, we would use this.

It turns out that the docker hook, even though it has scary permissions, still doesn't work for installing software. Although you can install the files, you cannot use service or systemctl to actually start the service. If you try to use these commands from within a docker container, even with the host mounted, they will fail.

We experimentally tried adding --pid=host and --ipc=host when the docker container is run, with the idea that if that worked, it would be a trivial change to the existing container. But that doesn't work either.

TL;DR: if you are reading this because you need to run some script on your host-- think twice before trying to use the docker container hook. There are still a LOT of things that will not work unless you are really on the host.

We thought about using CloudWatch triggers, but in AWS they do not really work well here, because you need the instance ID to register them. That is, this approach would work best as a kops modification.

We use ansible for deployments. Normally, we would use ansible to install our things. But with ASGs in play, it's difficult to find all of the instances. Even then, it would be even harder to hook it into the scaling lifecycle.

That leaves us with two solutions that require quite a lot of work:

(1) bake Nessus into our base image.
(2) bake a 'phone home' call into our base image that installs our software. In this case, we'll probably query for a tag that lists what to install, and then fetch a script. If you think this sounds a lot like puppet, you're right. We'd probably do that if we used puppet instead of ansible.

I mention these mainly to illustrate that, while these problems are easily described as 'not a kops problem', solving them in kops would save users a LOT of work for a VERY small amount of effort (simply merging a PR that @yissachar has basically already written).

@qqshfox

Contributor

qqshfox commented Aug 9, 2017

In our use case, we need to point the apt sources at a mirror, since the connection is unstable and download speeds are unsatisfactory in China. nodeup runs apt-get update and installs some essential packages when it launches, so the container hook is not suitable for this kind of usage. For various reasons, we'd rather not bake this change into the base AMI.
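A minimal sketch of the kind of pre-nodeup step this would need, assuming a plain sources.list file. The function name and both URLs in the example are placeholders, not recommended mirrors.

```shell
#!/bin/sh
# Sketch only: rewrite an apt sources file to use a mirror before anything
# runs `apt-get update`. `use_apt_mirror` is a hypothetical helper.
use_apt_mirror() {
    sources_file="$1"   # e.g. /etc/apt/sources.list
    default_url="$2"    # base URL currently in the file
    mirror_url="$3"     # base URL of the mirror to use instead
    # -i.bak keeps a backup copy and works with both GNU and BSD sed.
    # Note: default_url is treated as a regular expression by sed.
    sed -i.bak "s|${default_url}|${mirror_url}|g" "$sources_file"
}

# Example (placeholder URLs):
#   use_apt_mirror /etc/apt/sources.list http://deb.debian.org http://mirror.example.com
```

The point of the sketch is ordering: this only helps if it runs before nodeup's own apt-get update, which is what a hook cannot guarantee.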

@woodlee

woodlee commented Aug 17, 2017

Since it doesn't technically run right at node startup time, this might not cover all use cases mentioned here, but for many folks the startup-script DaemonSet approach (see https://github.com/kubernetes/contrib/tree/master/startup-script) might work. It's basically just a privileged container that runs an arbitrary script you pass in as an env var. We used it to install xfsprogs on kops-provisioned k8s nodes.

@chrislovecnm

Member

chrislovecnm commented Aug 17, 2017

We will keep this open, as hooks meet some needs, but it seems that people still need a start script.

@mr-rick

mr-rick commented Sep 16, 2017

While I'd like native support for a post-startup script, there may be an easier hack than doing nodeup builds.

I noticed that the LaunchConfiguration on AWS ASGs contains a script that kops puts into place. The script is generated by the kops binary. I think I can modify the template in the kops source code and run "make" to get my own kops binary with a modified template. The template is located here:

https://github.com/kubernetes/kops/blob/master/pkg/model/resources/nodeup.go

I'll try adding my own shell commands at the end of it, recompile kops, and see if it works. In addition, I'll have it download a script from s3 and execute it so that I don't have to make a new kops build each time. I can just modify the script in s3 instead.

If it does, it will be a lot easier than making my own AMI or building my own nodeup binary.

I'll report back my findings.
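The "download a script from S3 and execute it" step described above might look roughly like the sketch below. It assumes the AWS CLI is installed on the instance and that the instance role can read the object; the function name, the `FETCH_CMD` override, and the bucket path are all hypothetical.

```shell
#!/bin/sh
# Sketch only: fetch a script and run it, so the script can be changed in
# S3 without rebuilding kops. `run_remote_script` and FETCH_CMD are
# hypothetical; FETCH_CMD exists so the fetch step can be swapped out.
run_remote_script() {
    src="$1"                          # e.g. s3://my-bucket/startup.sh (placeholder)
    fetch="${FETCH_CMD:-aws s3 cp}"   # command taking <source> <destination>
    tmp="$(mktemp)" || return 1
    status=0
    # $fetch is left unquoted on purpose so "aws s3 cp" splits into words.
    if $fetch "$src" "$tmp" >/dev/null; then
        sh "$tmp" || status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example (placeholder bucket):
#   run_remote_script s3://my-bucket/startup.sh
```

Because the downloaded script runs as root at boot, anyone who can write to that bucket can run code on every node, so the bucket permissions matter as much as the script itself.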

@notmaxx

notmaxx commented Sep 29, 2017

@mr-rick so, do you have any interesting news? The same task is relevant for us as well

@fejta-bot

fejta-bot commented Jan 6, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@Cryptophobia

Contributor

Cryptophobia commented Feb 2, 2018

/remove-lifecycle stale

@Cryptophobia

Contributor

Cryptophobia commented Feb 2, 2018

We would love this feature as well. We bake AMIs here, and we can't easily append to EC2 startup scripts when using kops.

@joelittlejohn

joelittlejohn commented Feb 15, 2018

Our use case is the same as @qqshfox's. In China we need to replace the default apt sources so that we use a China mirror; otherwise, when nodeup attempts to apt update, it is likely to fail or be painfully slow. Hooks can't help with this, as nodeup runs apt update before running hooks. I've baked these changes into the base AMI for now, but this would have been a lot simpler if some custom script could be run before nodeup.

It seems like the result of the discussion here is that hooks solve many of the issues mentioned above, but there are still a variety of things that are valid and can't be done with hooks. So the ability to inject some kind of custom script into the userdata is seen as a desirable and valid feature. There was some WIP in #1766 to achieve this, but that PR has now been closed in favour of hooks.

So I guess we're back to needing a new PR/design proposal?

@Cryptophobia

Contributor

Cryptophobia commented Feb 15, 2018

@joelittlejohn, hooks seem to solve our problem and we are using them now. apt update is expected to be the first thing run when bootstrapping new AMIs anyway. It sounds like nodeup would be a blocker for you in any case, because it will always try to run apt update.

@joelittlejohn

joelittlejohn commented Feb 15, 2018

@Cryptophobia Yes, nodeup will always run apt update, which is why a startup script is required to replace the default apt sources before nodeup runs. Hooks can't be used for this.

@Cryptophobia

Contributor

Cryptophobia commented Feb 15, 2018

Yeah, but what I said is that you should be bootstrapping your sources when building the AMIs. Bootstrapping, as an idea, means getting things like apt sources ready before deploying a virtual machine...

@joelittlejohn

joelittlejohn commented Feb 15, 2018

@Cryptophobia I think what we're talking about here is a solution that allows this kind of minimal customisation without the requirement to introduce an AMI bootstrapping step. This is the gist of this comment if I understand it correctly:

#387 (comment)

@Cryptophobia

Contributor

Cryptophobia commented Feb 15, 2018

We have a bootstrap process for the kops.io AMIs every time we deploy new ones. Do you work in an organization or a team where you don't have any bootstrapping and security tools deployed with your instances?

The tradeoff here is a lot of work to be done on something that will create little value and can mostly be done with hooks.

I don't want the kops team to work on things that are easily done by config management tools like Ansible and AMI build pipelines. That's a lot of effort that could be applied to other, more worthwhile features in kops.

@joelittlejohn

joelittlejohn commented Feb 15, 2018

@Cryptophobia Sorry, we seem to be rehashing the same discussion about which use cases for this feature are valid and which aren't. There seems to be some agreement here that kops should (and could fairly easily) support injecting a custom step into the user data. I'm not trying to suggest that the use case I have is the one that justifies implementation of this feature, only that I would find the feature useful if it existed (and there have been a bunch of other commenters here, each with their own problem to solve, who have said the same).

@fejta-bot

fejta-bot commented May 16, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@fejta-bot

fejta-bot commented Jun 15, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@fejta-bot

fejta-bot commented Jul 15, 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@KIVagant

KIVagant commented Aug 31, 2018

The issue was auto-closed, but I don't understand: was the feature declined, or is it just not implemented yet? What's the best workaround? Is creating our own AMI the only way?

@Cryptophobia

Contributor

Cryptophobia commented Aug 31, 2018

> The issue was auto-closed, but I don't understand: was the feature declined, or is it just not implemented yet? What's the best workaround? Is creating our own AMI the only way?

The creation of a custom AMI is one way. It can be done with an Ansible script or shell script and automated. @joelittlejohn was making the case that it requires too much work and providing some way to inject a startup script would be nice to have if the kops team were to implement. Not sure what the kops team decided in the end.
