New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre/Post Install Script functionality #1766

Closed
wants to merge 6 commits into
base: master
from

Conversation

Projects
None yet
8 participants
@austinmoore-
Copy link
Contributor

austinmoore- commented Feb 2, 2017

I've been using this locally for a while, and figure I'd make a PR to see if you guys would want to bring it in. It adds the ability to run a script before and/or after nodeup.

The pre/post scripts are set via environment variables, exactly like NODEUP_URL:

PRE_INSTALL_SCRIPT_URL=https://s3.amazonaws.com/my-bucket/pre-install-script POST_INSTALL_SCRIPT_URL=https://s3.amazonaws.com/my-bucket/post-install-script kops update cluster

You can also set PRE_INSTALL_SCRIPT_HASH and POST_INSTALL_SCRIPT_HASH like NODEUP_HASH.

It's a bit of a hack; I don't like the environment-variable-as-arguments pattern, but I don't have much experience with Go, and I needed something to get something working.

There is one caveat. The post-install step does not actually execute after nodeup has completed. It executes after nodeup has been started. This is because nodeup is launched in the background. I've tried to wait for it to finish, but there seems to be a weird race condition with docker and needing User Data to finish first (something like that).

In my opinion, I think if we were to continue down the pre/post install script path, a good end result would be to move these into the instance group spec. Something like PreInstallScriptUrl and PostInstallScriptUrl for each instance group. I think instance groups would be a good spot for startup stuff like this, and it's better than having it passed in via environment variables.


This change is Reviewable

austinmoore- added some commits Nov 2, 2016

Merge commit 'fed68310fa8881c122775fa169838128ebafe58a' into pre-post…
…-install-script

Conflicts:
	upup/models/cloudup/_aws/resources/nodeup.sh.template
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 2, 2017

Hi @austinmoore-. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@justinsb justinsb modified the milestone: 1.5.2 Feb 11, 2017

@justinsb

This comment has been minimized.

Copy link
Member

justinsb commented Feb 11, 2017

@k8s-bot ok to test

@justinsb

This comment has been minimized.

Copy link
Member

justinsb commented Feb 11, 2017

So this is awesome. A few thoughts:

  • Is there a specific need for a pre & post script? (The pre script, particularly)
  • I think it might be much easier to have nodeup run the script. I'm terrified of breaking that bash bootstrap :-) Also, this could allow retries etc.
  • I really like the idea of putting it in as env vars initially, and marking it experimental in some way. That could be simply documenting it as experimental, or glog.Warning("Using experimental pre/post scripts...")
  • Having this at the cluster level is probably more convenient, but then puts more work onto the scripts. We could start passing env vars with e.g. the Role for the instance group, so you can easily check if you are a master or a node. We could also just provide the full Cluster & InstanceGroup spec, in JSON format, and then you could use jq. Unsure - I suspect we do the bare minimum for what people need today and then look at what further use cases exist as they pop up.
@austinmoore-

This comment has been minimized.

Copy link
Contributor Author

austinmoore- commented Feb 14, 2017

So for the pre-install script, we're doing some dynamic Chef provisioning. One of the things provisioned is the CloudWatch Logs Agent. The reason I have this run in the pre-install is so that I can get logs like cloud-init's into AWS as the instance is coming up.

I'd like for us to use systemd for these types of dynamic startups, so our AMI is good to go without needing to run Chef on top of it. But until we get there, I need to run Chef with this pre-install script.

I currently only use the post-install for one thing: enabling RBAC. I have a simple script that adds the necessary RBAC flags in kube-apiserver.manifest. Once support for this is added to kops, I won't need it anymore.


As you can see, I've refactored the bash script that runs in User Data. I refactored it so that I could download the pre-/post-install scripts using the same code that's used to download nodeup. I understand that it's a critical piece of code though, so if you want, we can figure out a way to get this into nodeup without risk breaking the bash.


I agree that we should just stick with the bare minimum. As I've seen mentioned before, this type of thing is an anti-pattern. But it is a nice escape hatch to have when you really need it.

@justinsb justinsb modified the milestones: 1.5.2, 1.5.3 Feb 23, 2017

@nneul

This comment has been minimized.

Copy link

nneul commented Feb 27, 2017

I'd very much like to see this in the instance group settings as well. It would be cleaner than passing in environment variables externally.

@mirague

This comment has been minimized.

Copy link

mirague commented Feb 28, 2017

+1

Would love to have this!

@blakebarnett

This comment has been minimized.

Copy link
Collaborator

blakebarnett commented Mar 23, 2017

Doooooooonnnnn't dooooo iiiiiiiit.... ;)

@justinsb

This comment has been minimized.

Copy link
Member

justinsb commented Mar 29, 2017

@austinmoore- how would you feel about "actions" which run at a certain time. I'm very wary of binding things into the bootstrap script - I figure that script is very much an implementation detail, and it looks like it will vary between providers.

I'm thinking we have a few "triggers":

  • once asap (cloudwatch agents)
  • once after nodeup has finished (most tweaks)
  • after (re)boot

And the actions I'm thinking about:

  • install a package
  • run a docker image
  • run a kubernetes job
  • kubectl apply something (though we likely have a better mechanism for this)
  • run a script (I'd prefer a job or docker image)

And for how we express it, I'm inspired by the liveness checks in kubernetes itself, which have exec actions and http actions.

Obviously we don't want to do all of this right away, but I'm thinking this is a sustainable mechanism that doesn't lock us into that script, and it becomes an official part of the cluster specification.

I can code up at least the framework unless you're keen on doing it :-)

cc @kris-nova @yissachar @geojaz @chrislovecnm

@yissachar

This comment has been minimized.

Copy link
Contributor

yissachar commented Mar 29, 2017

@justinsb That sounds very interesting. Do you want to write up a more fleshed out proposal before diving into code? Or at least some examples of what the cluster spec would look like.

@justinsb

This comment has been minimized.

Copy link
Member

justinsb commented Apr 1, 2017

I put up a very simple PoC in #2260 - it can only run a docker command, but it shares most of the logic with the protokube bootstrapping.

@chrislovecnm

This comment has been minimized.

Copy link
Member

chrislovecnm commented Apr 1, 2017

One thing to consider is to attempt to make this as os independent as possible. Focus primarily around driving this via systemd. Here is a good article around this systemd capability. http://www.richardbucker.com/2015/11/systemd-run-once.html

Possibly abstracting the creation of systemd units.

@austinmoore-

This comment has been minimized.

Copy link
Contributor Author

austinmoore- commented Apr 5, 2017

I'm going to close this in favor of the hooks solution @justinsb described, and has begun implementing in #2260.

@austinmoore- austinmoore- deleted the austinmoore-:pre-post-install-script branch May 9, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment