
Migrate apps to CloudFormation scheduler #814

Merged: ejholmes merged 1 commit into master from scheduler-migration on May 10, 2016

Conversation

@ejholmes (Contributor) commented May 3, 2016

This is my initial thought on how we might migrate to the new CloudFormation-based scheduler:

  1. All existing apps will continue to use the old scheduler.
  2. New apps will use the new scheduler.
  3. When you want to migrate an app to the new scheduler, you set an environment variable.
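For illustration, a minimal sketch of that per-app routing. The type names, the Scheduler interface, and the scheduler_migration columns here are assumptions for the sketch, not the PR's actual code; the only grounded details are the table name and the "new apps default to cloudformation" behavior discussed below.

package migration

import (
	"context"
	"database/sql"
)

// App is a stand-in for Empire's scheduler.App.
type App struct {
	ID string
}

// Scheduler is the minimal surface both backends share in this sketch.
type Scheduler interface {
	Submit(ctx context.Context, app *App) error
	Remove(ctx context.Context, appID string) error
}

// MigrationScheduler wraps the old (API-managed ECS) scheduler and the new
// CloudFormation scheduler, routing each app to whichever backend it is
// recorded against in scheduler_migration.
type MigrationScheduler struct {
	db  *sql.DB
	ecs Scheduler // old scheduler: existing apps stay here until migrated
	cf  Scheduler // new CloudFormation scheduler: all new apps land here
}

// Backend looks up which backend an app should use. Apps with no row in
// scheduler_migration are new, and default to CloudFormation.
func (s *MigrationScheduler) Backend(appID string) (string, error) {
	var backend string
	err := s.db.QueryRow(
		`SELECT backend FROM scheduler_migration WHERE app_id = $1`, appID,
	).Scan(&backend)
	if err == sql.ErrNoRows {
		// For newly created apps.
		return "cloudformation", nil
	}
	return backend, err
}

// Submit dispatches the app to the scheduler it is currently assigned to.
func (s *MigrationScheduler) Submit(ctx context.Context, app *App) error {
	backend, err := s.Backend(app.ID)
	if err != nil {
		return err
	}
	if backend == "cloudformation" {
		return s.cf.Submit(ctx, app)
	}
	return s.ecs.Submit(ctx, app)
}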

TODO

  • Delete from scheduler_migration when destroyed.
  • Docs about upgrading Empire to the CloudFormation backend.
  • Add existing apps to scheduler_migration table.

// Migrate submits the app to the CloudFormation scheduler, waits for the stack
// to successfully create, then removes the old API managed resources using the
// ECS scheduler.
func (s *MigrationScheduler) Migrate(ctx context.Context, app *scheduler.App) error {
@ejholmes (author):

This is where the magic happens. The ordering is less than ideal, since it will inevitably incur some downtime.

Ideally, this would:

  1. Create the CloudFormation stack.
  2. Wait for it to update.
  3. Swap DNS over to the new stack.
  4. Once you've verified things are working, remove the old resources.

@ejholmes (author):

Ok, a little better now. It will create the stack without any DNS changes, destroy the old resources, then update the new stack with the CNAME.
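Roughly, building on the MigrationScheduler sketch earlier in this thread; the submitWithoutDNS helper and the scheduler_migration update are illustrative assumptions about how the three steps could be expressed, not the PR's actual API.

// submitWithoutDNS stands in for however the CloudFormation scheduler is told
// to skip DNS changes on the first submit; the real mechanism may differ.
func (s *MigrationScheduler) submitWithoutDNS(ctx context.Context, app *App) error {
	return s.cf.Submit(ctx, app) // placeholder: imagine a "no DNS changes" option here
}

// Migrate sketches the ordering described above.
func (s *MigrationScheduler) Migrate(ctx context.Context, app *App) error {
	// 1. Create the CloudFormation stack with no DNS changes, so the old
	//    API-managed resources keep serving traffic while the stack comes up.
	if err := s.submitWithoutDNS(ctx, app); err != nil {
		return err
	}

	// 2. Destroy the old API-managed (ECS) resources.
	if err := s.ecs.Remove(ctx, app.ID); err != nil {
		return err
	}

	// 3. Update the new stack again, this time letting it create the CNAME so
	//    traffic moves over to the CloudFormation-managed resources.
	if err := s.cf.Submit(ctx, app); err != nil {
		return err
	}

	// Record that the app now lives on the CloudFormation backend.
	_, err := s.db.Exec(
		`UPDATE scheduler_migration SET backend = 'cloudformation' WHERE app_id = $1`,
		app.ID,
	)
	return err
}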

@mwildehahn (Contributor) commented:

The process of switching new apps over to CloudFormation looks great.

For migrating an existing app, my ideal would be:

  • bring up new stacks
  • test they're working (maybe with some temporary DNS record?)
  • swap the DNS record from old stacks to new stacks
  • check to make sure everything is working again
  • remove old stacks

My hesitation around the 10-minute wait time, as well as around removing the old app before bringing up the new one, is just from issues I've had previously where you can't delete a resource because something else depends on it, or bringing up a stack takes longer than you expect. I don't think removing something with a dependency is an issue if you assume Empire has managed the whole process, but that's my worst-case-scenario thinking.

@ejholmes (Contributor, Author) commented May 4, 2016

Yep, agreed. I think making it a 3 step manual process like that would be ideal.

@ejholmes (Contributor, Author) commented May 5, 2016

Ok, I added some tests for the different migration steps and scenarios and addressed all the comments. I'll probably hold off on merging this until I have a chance to test migrating a couple of apps in our staging environment to see if any issues pop up.


// For newly created apps.
if err == sql.ErrNoRows {
	backend = "cloudformation"
@mwildehahn (Contributor):

Curious when you decide to use named returns vs. not.

@ejholmes (author):

Actually, this is a named return only because of an existing change I had in here, where the return value was (string, string, error). Generally, I've only used named returns to remove ambiguity in the return values.
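For illustration only (the function and its values are made up), this is the kind of ambiguity named returns remove when a signature has two results of the same type:

// Unnamed: the caller can't tell from the signature which string is the
// backend and which is the status.
//     func lookup(appID string) (string, string, error)
//
// Named: the roles are explicit in the signature itself.
func lookup(appID string) (backend string, status string, err error) {
	backend, status = "cloudformation", "migrated"
	return
}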

@ejholmes (author):

And fixed.

@mwildehahn (Contributor) commented:
👍

@ejholmes (Contributor, Author) commented May 5, 2016

Also, I've been thinking about how the scheduler changes will play out in new releases, and when to deprecate the old scheduler. My initial plan is in https://github.com/remind101/empire/wiki/Scheduler-Deprecation-Plan.

@ejholmes modified the milestone: 0.11.0 on May 6, 2016
@ejholmes (Contributor, Author) commented May 7, 2016

Discovered an issue with this when moving one of our larger apps, which generates a (very) large CloudFormation template (~830 KB). That goes way over the 460 KB limit that applies when using S3 templates: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html

So, this means for large stacks, we'll need to generate multiple templates, then combine them into a single stack using nested templates. I think that should be pretty easy to do, and shouldn't have too many downsides. We can just generate a separate stack file for each process, then submit a really small template to CloudFormation.
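A rough illustration of that nested-template idea, generated from Go. The bucket layout, keys, and logical IDs here are made up for the sketch, and (as the later comments note) this isn't the approach the PR ultimately settled on.

package migration

import (
	"encoding/json"
	"fmt"
)

// parentTemplate builds a tiny parent template whose only resources are
// AWS::CloudFormation::Stack entries pointing at per-process child templates
// already uploaded to S3. The parent stays small no matter how many task
// definitions the child templates contain.
func parentTemplate(app string, processes []string, bucket string) ([]byte, error) {
	resources := map[string]interface{}{}
	for _, proc := range processes {
		resources[proc+"Stack"] = map[string]interface{}{
			"Type": "AWS::CloudFormation::Stack",
			"Properties": map[string]interface{}{
				// Each process gets its own (potentially large) child template.
				"TemplateURL": fmt.Sprintf("https://%s.s3.amazonaws.com/%s/%s.json", bucket, app, proc),
			},
		}
	}
	return json.Marshal(map[string]interface{}{
		"AWSTemplateFormatVersion": "2010-09-09",
		"Resources":                resources,
	})
}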

@ejholmes (Contributor, Author) commented May 7, 2016

Actually, nested stacks may not work well, since they create separate CloudFormation stacks, which complicates how we return the running tasks for an app. It's not impossible if we have to go down that route, but it would make it easier to hit the CloudFormation stack limit.

This is a pretty exceptional case, so maybe there's something that we can do to reduce the size of templates. Most of it is duplication in task definitions anyway.

@ejholmes (Contributor, Author) commented May 7, 2016

Alright, the vast majority of the size was just because we were using json.MarshalIndent. Getting rid of that cuts the file size roughly in half. There may be more that can be done to minify the templates further, but for the 99% case, I think this will be fine.
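For reference, the difference is just pretty-printing; tmpl here is a hypothetical stand-in for the generated template value.

package migration

import (
	"encoding/json"
	"fmt"
)

// templateSizes compares the two encodings of the same template value.
func templateSizes(tmpl interface{}) {
	pretty, _ := json.MarshalIndent(tmpl, "", "  ") // indented: whitespace on every nested line
	compact, _ := json.Marshal(tmpl)                // compact: same JSON, roughly half the bytes
	fmt.Printf("indented=%d bytes, compact=%d bytes\n", len(pretty), len(compact))
}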

@ejholmes (Contributor, Author) commented:
Ok, I used this to migrate 7 applications in our staging environment, and I'm pretty happy with the flow. For future reference, there's a Scheduler Migration Guide in the wiki that walks through the steps this implements.

@ejholmes merged commit f58051a into master on May 10, 2016
@ejholmes deleted the scheduler-migration branch on May 10, 2016 at 01:29