New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add `--stop-signal` for `service create` and `service update` #30754

Merged
merged 1 commit into from Mar 1, 2017

Conversation

@yongtang
Member

yongtang commented Feb 6, 2017

- What I did

This fix tries to address the issue raised in #25696 where it was not possible to specify --stop-signal for docker service create and docker service update, in order to use special signal to stop the container.

- How I did it

This fix adds --stop-signal and update the StopSignal in Config through service create and service update.

A SwarmKit PR docker/swarmkit#1924 has been opened separately.

Related docs has been updated.

- How to verify it

Additional unit test and Integration tests have been added.

- Description for the changelog

Add --stop-signal for service create and service update

- A picture of a cute animal (not mandatory but encouraged)

0ceca72e14a484c5e428f151e2aa9484

This fix fixes #25696.

Signed-off-by: Yong Tang yong.tang.github@outlook.com

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 6, 2017

Contributor

I think this needs some validation of StopSignal, probably in swarmkit's controlapi, and maybe also in the CLI.

Contributor

aaronlehmann commented Feb 6, 2017

I think this needs some validation of StopSignal, probably in swarmkit's controlapi, and maybe also in the CLI.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Feb 9, 2017

Member

Design LGTM, @aaronlehmann not sure how much validation we should do, or leave it to Linux?

Member

thaJeztah commented Feb 9, 2017

Design LGTM, @aaronlehmann not sure how much validation we should do, or leave it to Linux?

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Feb 9, 2017

Member

@yongtang I think we can do --stop-timeout as a follow up

Member

thaJeztah commented Feb 9, 2017

@yongtang I think we can do --stop-timeout as a follow up

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 10, 2017

Contributor

@thaJeztah: I think docker service create --stop-signal gordon ... should fail instead of creating a service that fails immediately.

Contributor

aaronlehmann commented Feb 10, 2017

@thaJeztah: I think docker service create --stop-signal gordon ... should fail instead of creating a service that fails immediately.

@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Feb 11, 2017

Member

Thanks @aaronlehmann @thaJeztah for the review. The PR has been updated. Now the signal is validated in containerToGRPC(). Please take a look.

For --stop-timeout, I noticed that in docker it is defined as a pointer:
https://github.com/docker/docker/blob/v1.13.0/api/types/container/config.go#L60

type Config struct {
...
	StopTimeout     *int                `json:",omitempty"` // Timeout (in seconds) to stop a container
...
}

That might be an issue to define it in protobuf. I will take a look and see how to solve it.

Member

yongtang commented Feb 11, 2017

Thanks @aaronlehmann @thaJeztah for the review. The PR has been updated. Now the signal is validated in containerToGRPC(). Please take a look.

For --stop-timeout, I noticed that in docker it is defined as a pointer:
https://github.com/docker/docker/blob/v1.13.0/api/types/container/config.go#L60

type Config struct {
...
	StopTimeout     *int                `json:",omitempty"` // Timeout (in seconds) to stop a container
...
}

That might be an issue to define it in protobuf. I will take a look and see how to solve it.

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 11, 2017

Contributor
Contributor

aaronlehmann commented Feb 11, 2017

@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Feb 11, 2017

Member

@aaronlehmann It looks like

--stop-grace-period duration       Time to wait before force killing a container (ns|us|ms|s|m|h)

is kind of --stop-timeout in docker run. I think it might have been covered?

Also, --stop-grace-period actually does not setup the value StopTimeout in container's config. Instead, it is called explicitly in:
https://github.com/docker/docker/blob/v1.13.1/daemon/cluster/executor/container/adapter.go#L338-L347

func (c *containerAdapter) shutdown(ctx context.Context) error {
	// Default stop grace period to nil (daemon will use the stopTimeout of the container)
	var stopgrace *int
	spec := c.container.spec()
	if spec.StopGracePeriod != nil {
		stopgraceValue := int(spec.StopGracePeriod.Seconds)
		stopgrace = &stopgraceValue
	}
	return c.backend.ContainerStop(c.container.name(), stopgrace)
}

I am wondering if --stop-signal could do the same thing to call it explicitly in shutdown? (instead of set the value in container's config)

Member

yongtang commented Feb 11, 2017

@aaronlehmann It looks like

--stop-grace-period duration       Time to wait before force killing a container (ns|us|ms|s|m|h)

is kind of --stop-timeout in docker run. I think it might have been covered?

Also, --stop-grace-period actually does not setup the value StopTimeout in container's config. Instead, it is called explicitly in:
https://github.com/docker/docker/blob/v1.13.1/daemon/cluster/executor/container/adapter.go#L338-L347

func (c *containerAdapter) shutdown(ctx context.Context) error {
	// Default stop grace period to nil (daemon will use the stopTimeout of the container)
	var stopgrace *int
	spec := c.container.spec()
	if spec.StopGracePeriod != nil {
		stopgraceValue := int(spec.StopGracePeriod.Seconds)
		stopgrace = &stopgraceValue
	}
	return c.backend.ContainerStop(c.container.name(), stopgrace)
}

I am wondering if --stop-signal could do the same thing to call it explicitly in shutdown? (instead of set the value in container's config)

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 13, 2017

Contributor

I think it might have been covered?

Yeah, this is covered by --stop-grace-period.

Also, --stop-grace-period actually does not setup the value StopTimeout in container's config.

I'm not sure whether this is intentional. It may make sense to also apply this value in the container config. The initial purpose of --stop-grace-period was to handle this shutdown use case, but I don't see why it shouldn't set the StopTimeout for other cases as well.

Contributor

aaronlehmann commented Feb 13, 2017

I think it might have been covered?

Yeah, this is covered by --stop-grace-period.

Also, --stop-grace-period actually does not setup the value StopTimeout in container's config.

I'm not sure whether this is intentional. It may make sense to also apply this value in the container config. The initial purpose of --stop-grace-period was to handle this shutdown use case, but I don't see why it shouldn't set the StopTimeout for other cases as well.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Feb 13, 2017

Member

I'm not completely sure how the inner workings are w.r.t. --stop-grace-period; is it currently only used when updating a service? --stop-timeout is used when shutting down the daemon; does swarmkit perform actions before the daemon shuts down?

Slightly orthogonal, but should we rename --stop-timeout on docker run to --stop-grace-period? It's a pity we're not consistent in naming here if they serve the same purpose

Member

thaJeztah commented Feb 13, 2017

I'm not completely sure how the inner workings are w.r.t. --stop-grace-period; is it currently only used when updating a service? --stop-timeout is used when shutting down the daemon; does swarmkit perform actions before the daemon shuts down?

Slightly orthogonal, but should we rename --stop-timeout on docker run to --stop-grace-period? It's a pity we're not consistent in naming here if they serve the same purpose

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 13, 2017

Contributor

My understanding is that --stop-grace-period is currently only honored when swarmkit takes explicit action to shut down a container; i.e. during a service update. However, I think that honoring it in other cases makes more sense than adding a new parameter that would be confusingly similar.

Contributor

aaronlehmann commented Feb 13, 2017

My understanding is that --stop-grace-period is currently only honored when swarmkit takes explicit action to shut down a container; i.e. during a service update. However, I think that honoring it in other cases makes more sense than adding a new parameter that would be confusingly similar.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Feb 13, 2017

Member

Agreed; if they both do the same, we should have a single flag (and think of unifying the docker run and docker service create naming)

Member

thaJeztah commented Feb 13, 2017

Agreed; if they both do the same, we should have a single flag (and think of unifying the docker run and docker service create naming)

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Feb 13, 2017

Member

@htc2u if you have a bot running on your account; would it be possible to turn it off? It's generating a lot of noise (keep in mind that each comment sends over 3000 e-mails / notifications to subscribers to this repository)

Member

thaJeztah commented Feb 13, 2017

@htc2u if you have a bot running on your account; would it be possible to turn it off? It's generating a lot of noise (keep in mind that each comment sends over 3000 e-mails / notifications to subscribers to this repository)

yongtang added a commit to yongtang/docker that referenced this pull request Feb 16, 2017

Update SwarmKit to 6bc357e9c5f0ac2cdf801898a43d08c260b4d5d0
This fix tries to update the SwarmKit from
ed384f3b3957f65e3111bd020f9815f3d4296fa2
to
6bc357e9c5f0ac2cdf801898a43d08c260b4d5d0

The following is the list of docker related changes:
1. Took long time for Docker Swarm service turn desired state from Ready to Running (Issue moby#28291)
2. Native Swarm in 1.12 - panic: runtime error: index out of range (Issue moby#25608)
3. Global mode target replicas keep increasing (Issue moby#30854)
4. Creating service with publish mode=host and without published port crashes swarm manager (Issue moby#30938)
5. Define signals used to stop containers for updates (Issue moby#25696) (PR moby#30754)

This fix fixes moby#28291, moby#25608, moby#30854, moby#30938.
This fix is required by PR moby#30754.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Feb 17, 2017

Member

The PR has been updated with #31091 (SwarmKit vendoring) merged.

Member

yongtang commented Feb 17, 2017

The PR has been updated with #31091 (SwarmKit vendoring) merged.

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 17, 2017

Contributor

I still think some validation of StopSignal before creating the service is a good idea, but I guess it's fine this way if it's not feasible.

Contributor

aaronlehmann commented Feb 17, 2017

I still think some validation of StopSignal before creating the service is a good idea, but I guess it's fine this way if it's not feasible.

@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Feb 17, 2017

Member

Thanks @aaronlehmann for the feedback. I wasn't sure previously but let me take a look and see if we could do something with validation.

Member

yongtang commented Feb 17, 2017

Thanks @aaronlehmann for the feedback. I wasn't sure previously but let me take a look and see if we could do something with validation.

@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Feb 20, 2017

Member

@aaronlehmann Before a service is created, the Spec is converted and containerToGRPC then it will be passed to swarm. As the Signal validation is done inside containerToGRPC, now the validation happens before the service is actually created.

I created an additional test case in TestSwarmStopSignal so that invalid signal is covered. Please take a look and let me know if there are any issues.

Member

yongtang commented Feb 20, 2017

@aaronlehmann Before a service is created, the Spec is converted and containerToGRPC then it will be passed to swarm. As the Signal validation is done inside containerToGRPC, now the validation happens before the service is actually created.

I created an additional test case in TestSwarmStopSignal so that invalid signal is covered. Please take a look and let me know if there are any issues.

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 20, 2017

Contributor

As the Signal validation is done inside containerToGRPC, now the validation happens before the service is actually created.

It just occurred to me that this will validate against the set of signals that are valid on the manager. In a future mixed-OS cluster, it might cause problems if one, for example, connects to a Windows manager to set up a Linux service.

Ideally we would validate against the superset of signals defined on all platforms, but this might be making things complicated. I suppose we could move the SignalMap definitions out of the platform-specific files (as SignalMapLinux, SignalMapSolaris, etc.), and have platform-specific code that chooses the right map - and also a function that checks if a signal is defined in any map. Feel free to remove the validation if this is getting too complicated.

cc @friism

Contributor

aaronlehmann commented Feb 20, 2017

As the Signal validation is done inside containerToGRPC, now the validation happens before the service is actually created.

It just occurred to me that this will validate against the set of signals that are valid on the manager. In a future mixed-OS cluster, it might cause problems if one, for example, connects to a Windows manager to set up a Linux service.

Ideally we would validate against the superset of signals defined on all platforms, but this might be making things complicated. I suppose we could move the SignalMap definitions out of the platform-specific files (as SignalMapLinux, SignalMapSolaris, etc.), and have platform-specific code that chooses the right map - and also a function that checks if a signal is defined in any map. Feel free to remove the validation if this is getting too complicated.

cc @friism

@dnephin

This comment has been minimized.

Show comment
Hide comment
@dnephin

dnephin Feb 21, 2017

Member

I think it would be better to remove the validation.

Member

dnephin commented Feb 21, 2017

I think it would be better to remove the validation.

Add `--stop-signal` for `service create` and `service update`
This fix tries to address the issue raised in 25696 where
it was not possible to specify `--stop-signal` for `docker service create`
and `docker service update`, in order to use special signal to stop
the container.

This fix adds `--stop-signal` and update the `StopSignal` in `Config`
through `service create` and `service update`.

Related docs has been updated.

Integration test has been added.

This fix fixes 25696.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Feb 21, 2017

Contributor

LGTM

Contributor

aaronlehmann commented Feb 21, 2017

LGTM

@yongtang

This comment has been minimized.

Show comment
Hide comment
@yongtang

yongtang Feb 21, 2017

Member

Thanks @aaronlehmann @dnephin for the review. I think the mixture of workers with different OS might be a legitimate concern in the future. At least the platform related elements (like signals) probably could be considered as "constraints". In this case, tasks will not be deployed to those workers that cannot meet the constraint (like specific signals).

For now I just removed the validation. I think we could revisit it in the future.

Member

yongtang commented Feb 21, 2017

Thanks @aaronlehmann @dnephin for the review. I think the mixture of workers with different OS might be a legitimate concern in the future. At least the platform related elements (like signals) probably could be considered as "constraints". In this case, tasks will not be deployed to those workers that cannot meet the constraint (like specific signals).

For now I just removed the validation. I think we could revisit it in the future.

@thaJeztah

LGTM, thanks!

@thaJeztah thaJeztah merged commit bb9f195 into moby:master Mar 1, 2017

4 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 31009 has succeeded
Details
janky Jenkins build Docker-PRs 39622 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 10685 has succeeded
Details

@GordonTheTurtle GordonTheTurtle added this to the 17.04.0 milestone Mar 1, 2017

@yongtang yongtang deleted the yongtang:25696-stop-signal branch Mar 2, 2017

dnephin pushed a commit to dnephin/docker that referenced this pull request Apr 17, 2017

Merge pull request moby#30754 from yongtang/25696-stop-signal
Add `--stop-signal` for `service create` and `service update`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment