New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swarm: Add update/rollback order #30261

Merged
merged 2 commits into from Apr 7, 2017

Conversation

@aaronlehmann
Contributor

aaronlehmann commented Jan 18, 2017

Update order is a new parameter to control the order of operations when rolling out an updated task. Either the old task is stopped before starting the new one, or the new task is started first, and the running tasks will briefly overlap.

Previously, stop_first was the only supported behavior. The behavior of start_first could be emulated by temporarily scaling up the service, so for awhile we hesitated to add it. However, there are some cases where it is awkward to change the number of replicas in a service just to support temporary overlap during updates. It doesn't work for some new features we have in mind like automatic preemption.

This PR adds Order to the API, and --update-order / --rollback-order flags to the CLI. It adds an integration test that performs a rolling update in the new start_first mode.

cc @aluzzardi @dongluochen

@LK4D4

This comment has been minimized.

Show comment
Hide comment
@LK4D4

LK4D4 Jan 27, 2017

Contributor

Still depends on swarmkit PR

Contributor

LK4D4 commented Jan 27, 2017

Still depends on swarmkit PR

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 27, 2017

Contributor

Rebased. This is still waiting for the swarmkit PR, but I've updated it to match a change in the swarmkit PR. --update-rollout was renamed to --update-order.

An additional option called --rollback-order will be needed. Where this is done will depend on whether this PR or #31108 is merged first.

Contributor

aaronlehmann commented Feb 27, 2017

Rebased. This is still waiting for the swarmkit PR, but I've updated it to match a change in the swarmkit PR. --update-rollout was renamed to --update-order.

An additional option called --rollback-order will be needed. Where this is done will depend on whether this PR or #31108 is merged first.

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Mar 10, 2017

Contributor

Rebased and updated to support the rollback case. Still waiting for the swarmkit PR.

Contributor

aaronlehmann commented Mar 10, 2017

Rebased and updated to support the rollback case. Still waiting for the swarmkit PR.

@aaronlehmann aaronlehmann changed the title from [WIP] swarm: Add rollout mode to swarm: Add update/rollback order Mar 21, 2017

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Mar 21, 2017

Contributor

Swarmkit PR was merged

Updated this one to bring it in line, such as Order replacing RolloutMode.

This should finally be ready to go now.

Contributor

aaronlehmann commented Mar 21, 2017

Swarmkit PR was merged

Updated this one to bring it in line, such as Order replacing RolloutMode.

This should finally be ready to go now.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Mar 21, 2017

Member

Looks like CI is failing @aaronlehmann

Member

thaJeztah commented Mar 21, 2017

Looks like CI is failing @aaronlehmann

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Mar 21, 2017

Contributor

CI will be fixed when #31714 is merged. Until then, vendoring swarmkit is blocked. Let's try to make that happen.

Contributor

aaronlehmann commented Mar 21, 2017

CI will be fixed when #31714 is merged. Until then, vendoring swarmkit is blocked. Let's try to make that happen.

@aluzzardi

This comment has been minimized.

Show comment
Hide comment
@aluzzardi

aluzzardi Mar 21, 2017

Member

nit: Underscores vs dashes. I believe in the CLI we favor dashes (e.g. on-failure, unless-stopped, ...) while this PR is using underscores (start_first).

Member

aluzzardi commented Mar 21, 2017

nit: Underscores vs dashes. I believe in the CLI we favor dashes (e.g. on-failure, unless-stopped, ...) while this PR is using underscores (start_first).

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Mar 21, 2017

Contributor

@aluzzardi: I think you're right about dashes vs. underscores. @thaJeztah can you please confirm?

Contributor

aaronlehmann commented Mar 21, 2017

@aluzzardi: I think you're right about dashes vs. underscores. @thaJeztah can you please confirm?

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah
Member

thaJeztah commented Mar 21, 2017

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Mar 22, 2017

Contributor

Updated to use start-first and stop-first in the API and CLI

Contributor

aaronlehmann commented Mar 22, 2017

Updated to use start-first and stop-first in the API and CLI

Show outdated Hide outdated cli/command/service/opts.go Outdated
@@ -74,6 +74,7 @@ Options:
--rollback-max-failure-ratio float Failure rate to tolerate during a rollback
--rollback-monitor duration Duration after each task rollback to monitor for failure
(ns|us|ms|s|m|h) (default 0s)
--rollback-order string Rollback order ("start-first"|"stop-first") (default "stop-first")

This comment has been minimized.

@dongluochen

dongluochen Mar 24, 2017

Contributor

Stating default in service update command is a little misleading. If you don't specify the flag, it keeps existing value.

@dongluochen

dongluochen Mar 24, 2017

Contributor

Stating default in service update command is a little misleading. If you don't specify the flag, it keeps existing value.

This comment has been minimized.

@aaronlehmann

aaronlehmann Mar 24, 2017

Contributor

Agreed, however the other --update-* flags have the same problem.

      --update-delay duration              Delay between updates (ns|us|ms|s|m|h) (default 0s)
      --update-failure-action string       Action on update failure ("pause"|"continue"|"rollback") (default "pause")
      --update-monitor duration            Duration after each task update to monitor for failure (ns|us|ms|s|m|h)
      --update-parallelism uint            Maximum number of tasks updated simultaneously (0 to update all at once) (default 1)

Let's deal with this in a separate PR. Would you mind filing an issue?

@aaronlehmann

aaronlehmann Mar 24, 2017

Contributor

Agreed, however the other --update-* flags have the same problem.

      --update-delay duration              Delay between updates (ns|us|ms|s|m|h) (default 0s)
      --update-failure-action string       Action on update failure ("pause"|"continue"|"rollback") (default "pause")
      --update-monitor duration            Duration after each task update to monitor for failure (ns|us|ms|s|m|h)
      --update-parallelism uint            Maximum number of tasks updated simultaneously (0 to update all at once) (default 1)

Let's deal with this in a separate PR. Would you mind filing an issue?

Show outdated Hide outdated integration-cli/docker_api_swarm_service_test.go Outdated
@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Mar 27, 2017

Contributor

Rebased. CI should pass now.

Contributor

aaronlehmann commented Mar 27, 2017

Rebased. CI should pass now.

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Mar 27, 2017

Contributor
14:32:30 FAIL: check_test.go:353: DockerSwarmSuite.TearDownTest
14:32:30 
14:32:30 check_test.go:358:
14:32:30     d.Stop(c)
14:32:30 daemon/daemon.go:392:
14:32:30     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
14:32:30 ... Error: Error while stopping the daemon daacbb8324438 : exit status 2
14:32:30 
14:32:30 
14:32:30 ----------------------------------------------------------------------
14:32:30 PANIC: docker_cli_swarm_test.go:418: DockerSwarmSuite.TestSwarmIngressNetwork
14:32:30 
14:32:30 [daacbb8324438] waiting for daemon to start
14:32:30 [daacbb8324438] daemon started
14:32:30 
14:32:30 [daacbb8324438] exiting daemon
14:32:30 ... Panic: Fixture has panicked (see related PANIC)
Contributor

cpuguy83 commented Mar 27, 2017

14:32:30 FAIL: check_test.go:353: DockerSwarmSuite.TearDownTest
14:32:30 
14:32:30 check_test.go:358:
14:32:30     d.Stop(c)
14:32:30 daemon/daemon.go:392:
14:32:30     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
14:32:30 ... Error: Error while stopping the daemon daacbb8324438 : exit status 2
14:32:30 
14:32:30 
14:32:30 ----------------------------------------------------------------------
14:32:30 PANIC: docker_cli_swarm_test.go:418: DockerSwarmSuite.TestSwarmIngressNetwork
14:32:30 
14:32:30 [daacbb8324438] waiting for daemon to start
14:32:30 [daacbb8324438] daemon started
14:32:30 
14:32:30 [daacbb8324438] exiting daemon
14:32:30 ... Panic: Fixture has panicked (see related PANIC)
@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Mar 27, 2017

Contributor

I don't think it's related.

ping @aboch

Contributor

aaronlehmann commented Mar 27, 2017

I don't think it's related.

ping @aboch

@tophj-ibm

This comment has been minimized.

Show comment
Hide comment
@tophj-ibm

tophj-ibm Mar 27, 2017

Contributor

failure isn't related, seen in #32143 janky as well

Contributor

tophj-ibm commented Mar 27, 2017

failure isn't related, seen in #32143 janky as well

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Apr 4, 2017

Contributor

I've tried to rebuild rebuild/windowsRS1 several times and it always fails. Is this a known issue?

Contributor

aaronlehmann commented Apr 4, 2017

I've tried to rebuild rebuild/windowsRS1 several times and it always fails. Is this a known issue?

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Apr 6, 2017

Contributor

ping @docker/core-swarmkit-maintainers

Any chance of getting this in before the freeze?

Contributor

aaronlehmann commented Apr 6, 2017

ping @docker/core-swarmkit-maintainers

Any chance of getting this in before the freeze?

@dperny

This comment has been minimized.

Show comment
Hide comment
@dperny

dperny Apr 6, 2017

Contributor

I am not a maintainer, but it LGTM.

Contributor

dperny commented Apr 6, 2017

I am not a maintainer, but it LGTM.

@vieux

This comment has been minimized.

Show comment
Hide comment
@vieux

vieux Apr 7, 2017

Collaborator

LGTM ping @aluzzardi

Collaborator

vieux commented Apr 7, 2017

LGTM ping @aluzzardi

@nishanttotla

LGTM (IANAM)

aaronlehmann added some commits Jan 18, 2017

Add support for update order
This parameter controls the order of operations when rolling out an
update task. Either the old task is stopped before starting the new one,
or the new task is started first, and the running tasks will briefly
overlap.

This commit adds Rollout to the API, and --update-order / --rollback-order
flags to the CLI.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Add integration test for START_FIRST update order
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@cpuguy83

LGTM

@cpuguy83 cpuguy83 merged commit 3de58eb into moby:master Apr 7, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 32655 has succeeded
Details
janky Jenkins build Docker-PRs 41265 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 1446 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 12382 has succeeded
Details
z Jenkins build Docker-PRs-s390x 1279 has succeeded
Details

@GordonTheTurtle GordonTheTurtle added this to the 17.05.0 milestone Apr 7, 2017

@aaronlehmann aaronlehmann deleted the aaronlehmann:rollout-mode branch May 3, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment