This repository has been archived by the owner on Jan 19, 2018. It is now read-only.

Implement atomic deployments for Nulecule application. Fixes #421 #456

Open · wants to merge 5 commits into master

Conversation

@rtnpro (Contributor) commented Dec 15, 2015

When there's an error while running a Nulecule application, roll back the changes made by stopping the application.

This pull request just takes care of invoking stop on the main Nulecule application; it does not refactor the internal implementation of stop.
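For orientation, a minimal sketch of that pattern (the class shell, the _run_application() helper, and the argument names are illustrative rather than the project's actual API; the trailing raise reflects a point brought up later in the review):

```python
import logging

logger = logging.getLogger(__name__)


class NuleculeManager(object):
    """Sketch only: the run/rollback flow, not the project's real class."""

    def run(self, cli_provider=None, **kwargs):
        try:
            self._run_application(cli_provider, **kwargs)  # hypothetical helper
        except Exception as e:
            logger.error('Application run error: %s' % e)
            logger.info('Rolling back changes')
            # Stop the whole Nulecule application, ignoring per-artifact
            # errors so the rollback runs to completion.
            self.stop(cli_provider, ignore_errors=True, **kwargs)
            # Re-raise so the failure still surfaces to the caller
            # (this point comes up later in the review).
            raise
```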

@rtnpro added this to the CDK 2 GA milestone Dec 15, 2015
@dustymabe (Contributor)

Is this approach better than doing the undeploy inside of the provider files? So inside of the deploy() function for each provider, if error, then call undeploy?

Right now you have it all the way up at the NuleculeManager level.. that is ok, just wondering where the best place to do this is.

@dustymabe (Contributor)

Also, if we go with this approach then it affects all providers. Do you mind testing it with all providers and check to make sure it works?

@rtnpro (Contributor, Author) commented Dec 15, 2015

@dustymabe

> Is this approach better than doing the undeploy inside of the provider files? So inside of the deploy() function for each provider, if error, then call undeploy?

I had thought of that as well. The reason I did not go ahead with it is that we wanted to undeploy the entire Nulecule application, not just the component that failed to deploy.

> Right now you have it all the way up at the NuleculeManager level.. that is ok, just wondering where the best place to do this is.

Because of the above explanation, this is intended.

> Also, if we go with this approach then it affects all providers. Do you mind testing it with all providers and check to make sure it works?

👍 with this. I need to test it on the other providers as well.

@dustymabe (Contributor)

Let me know when other providers have been tested. Also we need to fix unit tests.

We will postpone the merge of this until after tomorrow's release.

@dustymabe (Contributor)

So... the provider undeploy() functions really need to have "ignore_errors" set as well. Essentially, where we have it now is outside of undeploy(). Take kubernetes for example; a "component" could have multiple artifacts, which means multiple calls to the kubernetes API. If the first call fails, then subsequent artifacts won't get removed.

undeploy() needs to know to ignore errors and attempt to remove each artifact.
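A hedged sketch of what that could look like for the kubernetes provider; the kubectl command construction follows the diff below, while the loop, the ignore_errors attribute, and the class shell are illustrative:

```python
import logging

logger = logging.getLogger(__name__)


class KubernetesProviderSketch(object):
    """Sketch only: per-artifact error handling inside undeploy()."""

    def undeploy(self):
        for path in self.artifacts:
            cmd = [self.kubectl, "delete", "-f", path,
                   "--namespace=%s" % self.namespace]
            if self.config_file:
                cmd.append("--kubeconfig=%s" % self.config_file)
            try:
                self._call(cmd)
            except Exception as e:
                if not getattr(self, 'ignore_errors', False):
                    raise
                # Keep going so the remaining artifacts still get removed.
                logger.warning('Failed to delete %s: %s', path, e)
```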

@rtnpro (Contributor, Author) commented Dec 18, 2015

@dustymabe

> undeploy() needs to know to ignore errors and attempt to remove each artifact.

Good point 👍

```diff
@@ -203,4 +203,8 @@ def undeploy(self):
     cmd = [self.kubectl, "delete", "-f", path, "--namespace=%s" % self.namespace]
     if self.config_file:
         cmd.append("--kubeconfig=%s" % self.config_file)
-    self._call(cmd)
+    try:
+        self._call(cmd)
```
Contributor:

Wouldn't it be best to add ignore_error to _call() and then have _call() call run_cmd() with the checkexitcode= arg set?

In that case the try/except would be all the way down in run_cmd(), which already has this functionality built in.
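Roughly, the suggestion amounts to something like this (run_cmd() and its checkexitcode argument are as described above; the ignore_error name and the rest of the body are illustrative, and the import of run_cmd is omitted here):

```python
# Method sketch only: push the error handling into the existing helpers
# instead of wrapping every call site in try/except. run_cmd() is the
# project's existing command helper; its import path is omitted.
def _call(self, cmd, ignore_error=False):
    # Skip exit-code checking when the caller asked to ignore errors,
    # so any try/except ends up inside run_cmd() itself.
    return run_cmd(cmd, checkexitcode=not ignore_error)
```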

@dustymabe (Contributor)

@rtnpro now that stop for openshift is in, can you update undeploy() for it as well?

@dustymabe (Contributor)

ping @rtnpro ^^

@rtnpro (Contributor, Author) commented Jan 6, 2016

@dustymabe aye!

@rtnpro (Contributor, Author) commented Jan 11, 2016

@dustymabe Pushed changes for atomic deployment on the OpenShift provider as well.

@dustymabe (Contributor)

@rtnpro I believe everything LGTM. Since we do have 4 providers and there probably isn't much code, can you go ahead and do this for Marathon as well?

I should be able to test this stuff out tomorrow on docker/kubernetes. I hopefully will get an OpenShift setup working again tomorrow to test on that as well. In the meantime, can you confirm that you have tested that this works (a failure on run yields a stop)?

@dustymabe (Contributor)

OK, after running through some testing on this I'm not convinced this is the best path to take. We might be able to go with this, but I think we can do better.

The sticking point I am on now is the state of the system when we start to deploy; after a failed deployment the system should be in the same state it was in when we started the deployment. One example of how to make our code "fail" and roll back is to have a half-deployed application. Essentially, part of the application will successfully deploy until it gets to the point at which it tries to deploy an artifact that already exists. That will fail and then we will "roll back" by undeploying the application.

The problem with this is that the undeploy will remove the service that existed before we ran our code, since it removes all artifacts. Is this OK?

I think a better approach may be to, on deploy, run through all artifacts first to see if they already exist. If all artifacts pass the "exists" test, then we can run through them all and create them.

Considering everything I have written, the change I am proposing (don't start the deploy until we've checked that no artifacts already exist) could actually be done in a separate PR. That PR would take care of the failure case.
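As a sketch of that pre-flight check (the artifact_exists() and deploy_artifact() helpers are hypothetical stand-ins for provider internals):

```python
class ProviderSketch(object):
    """Sketch only: pre-flight existence check before deploying anything."""

    def deploy(self):
        # Refuse to deploy if any artifact already exists, so a failed run
        # never has to roll back resources it did not create.
        existing = [a for a in self.artifacts if self.artifact_exists(a)]
        if existing:
            raise Exception('Artifacts already exist: %s' % existing)
        for artifact in self.artifacts:
            self.deploy_artifact(artifact)  # hypothetical per-artifact deploy
```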

```python
logger.error('Application run error: %s' % e)
logger.debug('Nulecule run error: %s' % e, exc_info=True)
logger.info('Rolling back changes')
self.stop(cli_provider, ignore_errors=True, **kwargs)
```
Contributor:

So in this case we still need to error out after the "stop" has been performed. Otherwise the application returns a good exit code and the user might not realize that it failed.

Contributor Author:

Yes!

Contributor Author:

@dustymabe pushed fix!

@dustymabe (Contributor)

> I think a better approach may be to, on deploy, run through all artifacts first to see if they already exist. If all artifacts pass the "exists" test, then we can run through them all and create them.
>
> Considering everything I have written, the change I am proposing (don't start the deploy until we've checked that no artifacts already exist) could actually be done in a separate PR. That PR would take care of the failure case.

Opened #501 for this.

@dustymabe (Contributor)

@rtnpro can you address the remaining items in this PR? Don't worry about #501 for now.

@kadel (Collaborator) commented Jan 19, 2016

I tested this with a couple of my examples on Marathon and OpenShift. Everything looks fine 👍

The only thing missing is returning an error after the rollback, as @dustymabe mentioned; otherwise LGTM.

```diff
@@ -302,6 +303,7 @@ def stop(self, provider_key=None, dryrun=False):
     provider_key, provider = self.get_provider(provider_key, dryrun)
     provider.artifacts = self.rendered_artifacts.get(provider_key, [])
     provider.init()
+    provider.ignore_errors = True
```
Contributor:

Is this supposed to be `provider.ignore_errors = ignore_errors`?

Contributor Author:

Good catch 👍

Contributor Author:

However, there's no use case where ignore_errors is going to come in as False here.
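Even so, forwarding the flag keeps the signature honest; roughly (the first three body lines follow the diff above, while the trailing undeploy() call is illustrative):

```python
def stop(self, provider_key=None, dryrun=False, ignore_errors=False):
    provider_key, provider = self.get_provider(provider_key, dryrun)
    provider.artifacts = self.rendered_artifacts.get(provider_key, [])
    provider.init()
    provider.ignore_errors = ignore_errors  # forward the flag instead of hard-coding True
    provider.undeploy()  # illustrative: the real method continues from here
```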

```python
    self.oc.delete(url)
except Exception as e:
    if not self.ignore_errors:
        raise e
```
Contributor:
The self.oc.scale() call above (line 459) will fail if the rc doesn't exist on the target. We may need to consider putting the try/except deeper in the code. One extreme would be to put it at the Utils.make_rest_request() level; the other would be to simply try/except around the scale call above. Thoughts?
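One possible shape for that middle ground, as a sketch (the helper name and the exact scale() arguments are illustrative; self.oc and self.ignore_errors follow the diff above):

```python
import logging

logger = logging.getLogger(__name__)


class OpenShiftProviderSketch(object):
    """Sketch only: tolerate missing resources while undeploying."""

    def _remove_artifact(self, url):
        try:
            # Scale down before deleting; the exact scale() arguments
            # here are illustrative, not the project's real signature.
            self.oc.scale(url, 0)
            self.oc.delete(url)
        except Exception as e:
            if not self.ignore_errors:
                raise
            logger.warning('Ignoring undeploy error for %s: %s', url, e)
```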

Member:

I think this may be the only part you should have a look at, @rtnpro; the other code LGTM.

@dustymabe (Contributor)

Let's postpone merging this PR until after tomorrow's release.

@cdrage (Member) commented Feb 8, 2016

Heads up that all tests pass for this 👍

This PR needs rebasing now, though, due to the time that's passed.

@rtnpro (Contributor, Author) commented Feb 9, 2016

@cdrage @dustymabe rebased!

@cdrage (Member) commented Feb 10, 2016

Other than my one comment, LGTM and tests have passed!

@dustymabe (Contributor)

Postponing until after GA as we are going to limit our changes to bugfixes.

@dustymabe assigned rtnpro and unassigned kadel Feb 11, 2016
@dustymabe modified the milestones: CDK 2.1, CDK 2 GA Feb 11, 2016
@dustymabe mentioned this pull request Apr 4, 2016
@cdrage (Member) commented Apr 19, 2016

Just an update: we are still planning on implementing this, although the focus at the moment has been on converting the docker and k8s providers to their respective API implementations.
