EC2 spot provision #43

Merged
merged 13 commits into from Apr 29, 2013

Conversation

Projects
None yet
8 participants
Contributor

ejc7119 commented Apr 5, 2013

Refactored pull request for initial EC2 Spot support. Spot instances can be provisioned with a specified price.

AMIs need to be configured with start up scripts to register themselves with Jenkins master once they have been fulfilled. For information on what is required for configuring an AMI see our guide here.

runnirr and others added some commits Mar 26, 2013

@runnirr runnirr Add UI fields for Spot instances
Add in an optional block for enabling Spot slaves and specifying a max bid
price including help text.
e3afb03
@runnirr runnirr Startup Scripts for Spot Instances
Spot instances can take a while to start up. These scripts are meant as
samples of how to set up an EC2 instance to register itself as a headless
slave to the Jenkins server when it has been launched.
2c65871
@runnirr runnirr Refactor EC2Slave to EC2SpotSlave and EC2OndemandSlave
Create an abstract super class for EC2 slaves with ondemand and spot
subclasses. If spot specific fields are provided, we spawn a EC2SpotSlave
which should be configured to register themselves with Jenkins. This prevents
Jenks from having to poll EC2 to check and see if the Spot instance has
started up prior to connecting to it via ssh. This also makes transitioning
to using persistant spot bids easier.
b295be0
@runnirr runnirr Add unit test for Spot configuration
Create unit tests that test the configuration of Spot slaves.
08f8a92
@ejc7119 @runnirr ejc7119 EC2 type is now displayed on node configure page. If the node is a Sp…
…ot instance, the specified bid price will also be displayed.
13380eb
@cms5822 @runnirr cms5822 Launcher page
Computer launcher now displays proper message for spot instances
f9a61b2

Jenkins » ec2-plugin #68 SUCCESS
This pull request looks good
(what's this?)

Member

francisu commented Apr 10, 2013

Hi Adam (and all), (Kevin, please read this)

First of all, thanks for your work on this and your patience while we review and criticize it. Hopefully we can get this all done soon; I'm not going to merge anything else until we get your stuff in (hopefully this will happen in the matter of a few days) to save you the work of having to address merge conflicts.

I have started to review your latest work on the spot instances stuff, and I accept your reasoning to require that the Spot instances register themselves with Jenkins as they start; I'm sympathetic with the difficulties of having Jenkins do the polling (which could take forever). Kevin, I'm interested if you share my opinion on this.

That being said, we need to consider the poor user who is trying to use spot instances for the first time and has not read your user's guide or the Wiki (btw, when your work is committed, your user's guide needs to be put into the Wiki so that it can be uniformly accessed and maintained). We need to make sure the help in the actual configuration guides the user about what to do if they are using spot instances (it will not be obvious to them that they need to include the scripts to have their AMI notify Jenkins upon startup). So if they check the "Use Spot Instance" they get very clear instructions as to how to configure their AMIs with the appropriate links to the scripts.

In the case of manually provisioning a spot instance, you need to have a very clear message that says that it's waiting for a spot instance to start, and that it expects the special scripts you profile (with a link to the scripts) to be included such that the instance notifies Jenkins when it starts. In the case of the automatic provisioning, there needs to be a similar special message that it's waiting for a spot instance (preferably providing helpful information about the price it's looking for and the current price, if that's easy to do).

I have looked through your code changes briefly at this point, and they seem to be in a state where I (and others) can do a thorough review. I also don't see any obvious problems (other than those mentioned above) about how you have approached this. I appreciate your addressing our previous comments to make your changes reviewable.

Tomorrow (California time), I will spend some time testing your code and may have further comments about that.

Contributor

cms5822 commented Apr 10, 2013

Hey Francis,
I've actually added a message to the configuration page that will display after a user has selected Spot Instances as the slave type. Unfortunately, it does not currently link to the necessary script file because we don't have a permanent place that we can host this file. If you have any recommendations, please let us know.
Screen Shot 2013-04-10 at 10 55 22 AM

Member

francisu commented Apr 10, 2013

Having the message appear like that when you check the Use Spot Instance is exactly what I wanted; excellent!

One way to access the file would just be to provide a link to the file on GitHub, I think that's easy enough to do. And the link would have the tag that corresponds to the version of the plugin (hopefully there is a way to do that without wiring it in), so that you would always have the script that goes with the right version (in case there are changes). Failing that, you could just link to the current version. Another possibility (and this is probably a better idea) is to provide these scripts as part of the deployment of the plugin, so that when the plugin is installed they are put in some place in the Jenkins installation. That might be easier (I'm sure there are some situations where Jenkins users can't get to the internet from their machines for security reasons). You might want to see how other plugins accomplish this and if you are stuck with how to do this, ask on the main dev mailing list to see if anyone has ideas.

Contributor

kpfleming commented Apr 10, 2013

Well, there's a lot of cool stuff here. Thanks for the excellent work breaking up the changes into easily reviewable hunks, that made an enormous difference! For the longer term, I want to completely rewrite this plugin to remove a lot of its ugliness, but I won't be able to put time into that for a few weeks at best :-)

A few notes, in no particular order:

  • You've changed EC2Slave into EC2OnDemandSlave; you will need to implement data migration for existing configurations and thoroughly test that persisted slaves are properly converted.
  • Please don't define methods in classes that override their superclass' method and only call the superclass method; they are superfluous. The superclass method will be called if the child class does not override it, which is adequate.
  • Why attach an EC2UnixLauncher to spot slaves, only to have the launcher choose to do nothing when it is called? Is this needed for some reason?
  • The various interrelated spot config options should all be validated at the time of configuration. It's good to recheck them at launch time, but right now if two of them are set and one is not, the launcher will launch an on-demand instance instead, which could be a subtle-to-debug problem for users. If the user has set spotConfig, no on-demand instance should be launched from that template under any circumstances.
  • You've duplicated a bunch of the code that is used to launch instances, this should be refactored. In addition, it appears that the spot slave launching code doesn't set tags on the launched instance, and it builds a describeInstancesFilter for no purpose, as there won't ever be any stopped stop instances to reanimate.
  • On that subject... can spot instances be stopped and restarted? If so, retaining the ability to stop-on-idle and reanimate would be mandatory, as that's a very useful feature.

There are a lot of coding style issues as well, but since they don't affect functionality I'll hold off on them until after we've come to agreement that this code works and is suitable for merging; I'm fine with making those changes myself in a followup pull request to keep them separated in case they cause any regressions.

Member

francisu commented Apr 10, 2013

I agree with Kevin's comments and I will hold off testing this until his comments (and mine) are addressed and the pull request has been revised or replaced.

aji9861 commented Apr 11, 2013

Thank you for the comments. I will try to answer some of your concerns here.

Kevin,

  • Will this require keeping an EC2Slave class to handle the persisted nodes? Or are there other points in the plugin in which similar situations are handled? If not, we will have to look into this more.
  • We will correct this.
  • The EC2UnixLauncher tries to connect as soon as the node is provisioned. With Spot nodes, there is nothing to connect to so the attempt fails. This results in the error message that used to be present on the slave startup screen, so instead we don't try to connect as soon as the node is in Jenkins. We can add additional comments to the file to outline this so that the next developer can understand its purpose.
  • Good catch. We fix this.
  • We will look into refactoring some of the code into methods and removing unused sections.
  • It appears that Spot instances cannot be stopped and restarted, just terminated or rebooted. We can discuss with Amazon if there is something we are missing so that we can include that functionality either now or down the road.

In addition to these we will look into a better way of distributing the callback scripts for users (as per Francis)

How should we go about correcting these issues? Is another pull request easier, or should we just add commits to the current one so that everything is kept in a single location?

Member

francisu commented Apr 11, 2013

Just add commits to this one; when I review stuff I don't pay any attention to the commits, but just to the net of the changed files, so don't worry about having lots of commits in the pull request. If we do that, then we can keep a consistent record of why the changes were done.

Contributor

kpfleming commented Apr 11, 2013

Answering in order:

  • No, there are ways to intercept the deserialization and replace the old EC2Slave class with the new one. If you look in PluginImpl, you can see how an alias was added when hudson.plugins.ec2.EC2Cloud became AmazonEC2Cloud.
  • Thanks.
  • My point was that for spot instances, it may not be necessary to provide a Launcher at all, but to be honest I haven't looked at that part of the Jenkins core to see how it would handle that situation. It's worth investigating, since you are essentially creating the same type of node that a user would create themselves.
  • Thanks.
  • Thanks.
  • OK, if they can't be stopped, that makes sense (and I can understand why Amazon would disallow that). If that is the case, you may not need to provide a RetentionStrategy object for a spot node at all; if a spot node becomes idle, it would be logical to just terminate it immediately.

I wouldn't worry about the scripts too much right now, that can be handled later. Once your code is merged, I'll proceed with my AMI Builder patch for this plugin, which could be used to make the process of making a spot-node capable AMI much easier for users (and we might be able to teach the plugin what needs to be done, but that's a discussion for a later day).

@cms5822 cms5822 Link to ami scripts in UI
Moved the ami scripts to the webapp folder; they are now packaged in
the .hpi file and are put into the jenkins static resources directory
upon plugin install. The description message under the chechbox for
Spot Instances now includes a link to the ami script.
af4e621

Jenkins » ec2-plugin #72 SUCCESS
This pull request looks good
(what's this?)

aji9861 commented Apr 11, 2013

I tried passing in a null launcher and it launched the slave the same, but shows as a jnlp slave. (screenshot attached). I think this is not an ideal presentation for the user.
jenkins-spot-start 1

For the retention strategy there would be few situations in which it might be valid. Spot requests can be set to one-time - once the instance is terminated the Spot request is also canceled - or persistent - if the instance is terminated the Spot request is resubmitted and a new instance can be started (the persistent option isn't included yet). Also, something to consider in the future is letting the instance stay up for a full hour since that is what would be paid for. There is always the chance the instance is terminated by Amazon before the hour and then the user doesn't need to pay for it.

If you have other thoughts let us know.

Contributor

kpfleming commented Apr 12, 2013

Well, I guess it is a JNLP slave to some extent, as the slave startup script is using the third method of startup listed there. I agree this could be confusing though. Rather than putting special-case code in the launcher to check whether the instance is a spot instance though, just create another launcher class with a suitable name and use that.

I don't think the retention strategy will play into the spot instance termination (by AWS) decisions. If AWS kills the spot instance, the job will go back into the queue and will look for another node to run on, without any reference to the retention strategy. The retention strategy, as far as I know, is only used to determine whether idle nodes should be kept connected/alive or not. Since spot instances can't be stopped and restarted, it might be good if the spot instance retention strategy allowed the instance to stay idle until 59 minutes of its lifetime, and then killed it if no jobs need it (since as you say, the hour is paid for either way).

Contributor

ejc7119 commented Apr 12, 2013

We tried killing an EC2 Spot instance with a job running on it and the job failed. It was not placed back into the queue to be run on another node. To solve this issue we have been working on our own plugin, https://github.com/jenkinsci/jobrequeue-plugin, which allows for jobs to be configured to be placed back in the queue if the node it's running on goes down.

We were thinking that once the Spot functionality is merged, this plugin could be added as a dependency.

Contributor

kpfleming commented Apr 15, 2013

Thanks for following up on that; without this ability, I think the practicality of spot instances would be reduced quite a bit, as jobs would be marked as failed when the job didn't actually fail.

Member

bwall commented Apr 15, 2013

Yeah, that is why we wrote it. It is in a separate plugin because because it adds functionality which is useful in situations other than EC2/Spot as well.

@runnirr runnirr Fix some concerns with use of Spot instances
Remove unecessary super call from EC2AbstractSlave
Refactor provisioning of spot/ondemand instances to reuse code where possible
Add tags to Spot Request
Remove the connectViaSsh field from slaves
Create a new launcher for Spot slaves
Add a check for the specified Spot price before requesting an instance
Add additional comments
11dae84

Jenkins » ec2-plugin #73 SUCCESS
This pull request looks good
(what's this?)

This looks suspect to me; you're calling reconfigure() from inside the reconfigure() method itself. I'm not a Java expert, but I'd be surprised if this did not result in a recursive call.

I would rewrite this comment, because there is a more important underlying issue: you can't use the EC2 generated instance ID as the Jenkins node name because you won't know what the instance ID is until (if) the spot request is fulfilled. If you did know it at this point, you could use it, but you don't know what it will be in the future :-) Something like:

We must provide a unique node name for the slave to connect to Jenkins, since unlike on-demand instances, we won't have the EC2-generated instance ID until after the spot instance is launched, and the Jenkins node must be created before that occurs.

Contributor

kpfleming commented Apr 16, 2013

Other than some mostly minor notes, this commit improves the code quite a lot. Nice work!

Jenkins » ec2-plugin #74 SUCCESS
This pull request looks good
(what's this?)

ejc7119 and others added some commits Apr 16, 2013

@ejc7119 ejc7119 Added data migration handling. Moved ec2Type into an abstract method …
…for handling backward compatability.
8f34771
@runnirr runnirr Address inline comments from review
Fix typos in comments
Update comments with SlaveTemplate
Fix potential infinite recursion
7c5f7f7
@runnirr runnirr Update slave launcher page for EC2SpotComputerLauncher 929194b

Jenkins » ec2-plugin #75 SUCCESS
This pull request looks good
(what's this?)

Jenkins » ec2-plugin #76 SUCCESS
This pull request looks good
(what's this?)

Member

francisu commented Apr 16, 2013

Looks like you folks are getting dangerously close to being done with this and it's ready for me to test and have a more detailed look (though I don't expect to find something given the thoroughness of Kevin's comments and that you have tested it completely). Please let me know when you are finished with all of your work with that and I will have a look within a day or so. Thanks again for all of your work on this!

Contributor

kpfleming commented Apr 17, 2013

Looks like all of my comments have been addressed. Thanks.

aji9861 commented Apr 18, 2013

Thank you for all of your feedback throughout this Kevin.

Francis, if you are looking to test this, I set up an AMI that calls back to Jenkins on us-east (ami-664e2f0f). This is based off of the Ubuntu 64bit 12.10 image. It should be publicly available if you want to use that.

If you want to set up your own, we have the steps in our user guide (which will eventually need to move over to the wiki). Section 6 describes configuring your own image to run with Jenkins/Spot. Some of the information in the other sections need to be re-looked at and updated.

Please let us know if you have any other concerns.

Thanks again for working with us through this process.

@runnirr runnirr Fix an issue when reconfiguring slaves
Spot instances would cause a NPE. This should be fixed now.
86e973d

Jenkins » ec2-plugin #82 FAILURE
Looks like there's a problem with this pull request
(what's this?)

Member

francisu commented Apr 29, 2013

Thanks for all of your work on this. I'm happy with the results of my testing.

francisu merged commit f473c09 into jenkinsci:master Apr 29, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment