
Close logwatcher on context cancellation #30223

Merged
merged 1 commit into moby:master from mlaventure:fix-logwatcher-deadlock on Jan 20, 2017

Conversation

@mlaventure
Contributor

mlaventure commented Jan 17, 2017

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>

--

Fixes #30135

This patch addresses 2 issues:

  1. in tailfile(), if the logWatcher.Msg channel somehow became full and the watcher was closed before space was made in it, we would get stuck there forever, because we were not checking for the watcher being closed (a simplified sketch of this fix follows right below)
  2. when servicing docker logs, if the command was cancelled we were not closing the watcher (and hence never notifying it to stop copying data)
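
For illustration only, here is a minimal, self-contained sketch of the first fix; the Message and LogWatcher types are simplified stand-ins for the ones in the logger package, and the real tailfile() differs. The key change is that the send into the watcher's channel is wrapped in a select that also watches for the watcher being closed, so a full channel can no longer wedge the tailing goroutine.

package main

import "fmt"

// Message and LogWatcher are simplified stand-ins for the logger package types.
type Message struct{ Line string }

type LogWatcher struct {
    Msg     chan *Message
    closing chan struct{}
}

func NewLogWatcher() *LogWatcher {
    return &LogWatcher{
        Msg:     make(chan *Message, 1), // tiny buffer, so it is easy to fill
        closing: make(chan struct{}),
    }
}

// Close signals that no more messages are wanted.
func (w *LogWatcher) Close() { close(w.closing) }

// WatchClose returns a channel that is closed once Close has been called.
func (w *LogWatcher) WatchClose() <-chan struct{} { return w.closing }

// tailFile mimics the fixed loop. Before the fix it did a bare `w.Msg <- msg`,
// which blocks forever once Msg is full and nobody is reading from it.
func tailFile(lines []string, w *LogWatcher) {
    for _, l := range lines {
        select {
        case w.Msg <- &Message{Line: l}:
        case <-w.WatchClose():
            return // watcher closed: stop tailing instead of blocking
        }
    }
}

func main() {
    w := NewLogWatcher()
    done := make(chan struct{})
    go func() {
        tailFile([]string{"a", "b", "c", "d"}, w)
        close(done)
    }()
    w.Close() // nobody ever drains w.Msg, yet the tailer can still exit
    <-done
    fmt.Println("tailer exited cleanly")
}
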
@thaJeztah
Member

thaJeztah commented Jan 17, 2017

Can you update the commit message so that it includes the description you provided here as the PR description?

@mlaventure
Contributor

mlaventure commented Jan 17, 2017

@thaJeztah done

@cpuguy83
Contributor

cpuguy83 commented Jan 17, 2017

daemon/logs.go: daemon/logs.go:78:3: function must be invoked in defer statement
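
For context, the failure above comes from Go's rule that a defer statement must contain an actual function call. The exact line at daemon/logs.go:78 is not shown in this thread, so the snippet below is only a generic reconstruction of that class of mistake:

package main

import "fmt"

func cleanup() { fmt.Println("cleaning up") }

func main() {
    // Broken: this defers a bare function value without invoking it, which
    // the toolchain rejects with the "function must be invoked in defer
    // statement" style error quoted above.
    //
    //     defer cleanup
    //
    // Fixed: the deferred expression must be a call.
    defer cleanup()

    fmt.Println("doing work")
}
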
Close logwatcher on context cancellation
This commit addresses 2 issues:

  1. in `tailfile()` if somehow the `logWatcher.Msg` were to become full and the watcher closed before space was made into it, we were getting stuck there forever since we were not checking for the watcher getting closed
  2. when servicing `docker logs`, if the command was cancelled we were not closing the watcher (and hence notifying it to stop copying data)

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
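
The second point in the commit message, closing the watcher when the docker logs request goes away, can be pictured with the simplified, self-contained sketch below; the types and wiring are assumed stand-ins, not the actual daemon/logs.go code:

package main

import (
    "context"
    "fmt"
    "time"
)

// logWatcher is a simplified stand-in for the logger package's watcher.
type logWatcher struct{ closing chan struct{} }

func (w *logWatcher) Close()                      { close(w.closing) }
func (w *logWatcher) WatchClose() <-chan struct{} { return w.closing }

// produce plays the role of the log reader feeding messages to the client.
func produce(w *logWatcher, msgs chan<- string) {
    defer close(msgs)
    for i := 0; ; i++ {
        select {
        case msgs <- fmt.Sprintf("line %d", i):
        case <-w.WatchClose():
            return // watcher closed: stop copying data
        }
    }
}

// serveLogs plays the role of the handler behind `docker logs`.
func serveLogs(ctx context.Context, w *logWatcher, msgs <-chan string) {
    defer w.Close() // the missing piece: always release the watcher
    for {
        select {
        case <-ctx.Done(): // client hit Ctrl-C / connection went away
            return
        case msg, ok := <-msgs:
            if !ok {
                return
            }
            _ = msg // the real daemon would write this to the HTTP response
        }
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    w := &logWatcher{closing: make(chan struct{})}
    msgs := make(chan string)

    go produce(w, msgs)
    go serveLogs(ctx, w, msgs)

    time.Sleep(10 * time.Millisecond)
    cancel() // simulate `docker logs` being interrupted
    <-w.WatchClose()
    fmt.Println("watcher closed on context cancellation; the producer stops too")
}
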
@mlaventure
Contributor

mlaventure commented Jan 17, 2017

@cpuguy83 yes, just realized myself 😓

@cpuguy83

LGTM

@tonistiigi
Member

tonistiigi commented Jan 20, 2017

LGTM

@tonistiigi tonistiigi merged commit 0566f3f into moby:master Jan 20, 2017

4 checks passed

dco-signed: All commits are signed
experimental: Jenkins build Docker-PRs-experimental 29743 has succeeded
janky: Jenkins build Docker-PRs 38341 has succeeded
windowsRS1: Jenkins build Docker-PRs-WoW-RS1 9377 has succeeded

@GordonTheTurtle GordonTheTurtle added this to the 1.14.0 milestone Jan 20, 2017

@jeinwag

jeinwag commented Jan 31, 2017

Is there any chance that this will make it into a 1.13.x release?

@mlaventure mlaventure deleted the mlaventure:fix-logwatcher-deadlock branch Jan 31, 2017

@cpuguy83
Contributor

cpuguy83 commented Jan 31, 2017

This seems like a pretty low-impact bug that's been around for some time now.

@jeinwag

jeinwag commented Jan 31, 2017

Well, we're currently experiencing the issues described in #30135, which can basically lead to the docker daemon becoming unresponsive.

@cpuguy83
Contributor

cpuguy83 commented Jan 31, 2017

@jeinwag Have you tested that this fixes your issue?

It's not immediately obvious that this would block daemon operations, but I guess it could if certain streams are never closed.
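
Purely as an illustration of that guess, and with no claim about the daemon's actual locking, the sketch below shows how a goroutine blocked forever on a stream that is never closed can stall unrelated operations if it holds a shared lock:

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var containerLock sync.Mutex
    stream := make(chan string) // unbuffered and never drained: sends block forever

    // A `docker logs`-style copy that is never told to stop.
    go func() {
        containerLock.Lock()
        defer containerLock.Unlock()
        stream <- "log line" // blocks forever; the lock is never released
    }()

    time.Sleep(10 * time.Millisecond)

    // A later `docker rm`/`docker ps`-style operation now waits on the lock.
    acquired := make(chan struct{})
    go func() {
        containerLock.Lock()
        containerLock.Unlock()
        close(acquired)
    }()

    select {
    case <-acquired:
        fmt.Println("lock acquired")
    case <-time.After(200 * time.Millisecond):
        fmt.Println("second operation is stuck behind the blocked log copier")
    }
}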

@jeinwag

jeinwag commented Feb 1, 2017

@cpuguy83 OK, I was unable to reproduce a total freeze, although it has happened before as described in #30135.
What I could reproduce is a "zombie" container which cannot be removed:

docker run -d --name test debian cat /dev/urandom
docker logs test
<Ctrl-C>
docker stop test
docker rm test

Trying to remove the container results in "You cannot remove a running container...", although the process of the container has already terminated.
The only way to get rid of the container is to restart dockerd.

I can confirm that this PR fixes that problem.

@cpuguy83
Contributor

cpuguy83 commented Feb 2, 2017
@kayrus
Contributor

kayrus commented Feb 7, 2017

I'm facing the same issue:

docker version
Client:
 Version:      1.13.0
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 09:58:26 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0
 API version:  1.25 (minimum version 1.12)
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 09:58:26 2017
 OS/Arch:      linux/amd64
 Experimental: false
@tbatchelli

tbatchelli commented Feb 8, 2017

We just upgraded and will need to revert because of this issue. It didn't impact our pre-production servers because of their much lower level of logging.

Client:
 Version:      1.13.0
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 09:55:28 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0
 API version:  1.25 (minimum version 1.12)
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 09:55:28 2017
 OS/Arch:      linux/amd64
 Experimental: false
@cpuguy83
Contributor

cpuguy83 commented Feb 8, 2017

@tbatchelli I'm sorry, I don't understand. This affects only users of the docker logs endpoint.

@tbatchelli

tbatchelli commented Feb 8, 2017

@cpuguy83 yes, that's how we trigger the bug: using docker logs and then Ctrl+C while it's still displaying logs.

It doesn't seem to be an issue when there is low logging activity on the containers, and hence not many log entries written, so this didn't break on our pre-production servers where we test docker upgrades. Once we put 1.13.0 in production it took 2 docker logs commands to bring docker to its knees, needing a full restart of the docker daemon.

I was able to reproduce the bug on our beta servers:

[root@beta01-docker containers]# docker run -d busybox sh -c "while true; do date; done"
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
4b0bc1c4050b: Pull complete
Digest: sha256:817a12c32a39bbe394944ba49de563e085f1d3c5266eb8e9723256bc4448680e
Status: Downloaded newer image for busybox:latest
f3ac5feedaeb5a3cac50996884dce52f4b07d733b22178289ab260ee22e09f12
[root@beta01-docker containers]# docker logs -f f3ac5feedaeb5a3cac50996884dce52f4b07d733b22178289ab260ee22e09f12
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
...
Wed Feb  8 17:22:11 UTC 2017
Wed Feb  8 17:22:11 UTC 2017
Wed Feb  8 17:22:11 UTC 2017
Wed Feb  8 17:22:11 UTC 2017
^C
[root@beta01-docker containers]# docker logs -f f3ac5feedaeb5a3cac50996884dce52f4b07d733b22178289ab260ee22e09f12
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
Wed Feb  8 17:21:43 UTC 2017
...
Wed Feb  8 17:21:55 UTC 2017
Wed Feb  8 17:21:55 UTC 2017
Wed Feb  8 17:21:55 UTC 2017
Wed Feb  8 17:21:55 UTC 2017
Wed Feb  8 17:21:55 UTC 2017
Wed Feb  8 17:21:55 UTC 2017^C
[root@beta01-docker containers]# docker logs -f f3ac5feedaeb5a3cac50996884dce52f4b07d733b22178289ab260ee22e09f12
^C
[root@beta01-docker containers]# docker logs -f f3ac5feedaeb5a3cac50996884dce52f4b07d733b22178289ab260ee22e09f12
^C
[root@beta01-docker containers]# ^C
[root@beta01-docker containers]# docker logs -f f3ac5feedaeb5a3cac50996884dce52f4b07d733b22178289ab260ee22e09f12
^C
[root@beta01-docker containers]# docker ps
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS                                       NAMES
f3ac5feedaeb        busybox                                 "sh -c 'while true..."   4 minutes ago       Up 4 minutes                                                    dreamy_agnesi

After the 2nd ^C the logs stopped responding. Restarting the container cleared the issue. From Datadog it seemed that the memory consumption of the container kept ballooning.

I will try to downgrade this host to docker 1.12.5 and see if I can reproduce.

@tbatchelli

tbatchelli commented Feb 8, 2017

We downgraded one of our beta servers to 1.12.6 -- I can't reproduce the issue anymore. We'll downgrade the production servers today.

When we first saw the issue in production, the issue created by running docker logs on a container cascaded to the point that other containers stopped responding to HTTP requests, so I would definitely challenge the assessment that this is low impact.

@cpuguy83
Contributor

cpuguy83 commented Feb 8, 2017

@tbatchelli Sure. This can't go into .1 (which is imminent), but it is being considered for .2.

@tbatchelli

tbatchelli commented Feb 8, 2017

+1.

You guys rock!

@thaJeztah thaJeztah added this to PRs in 17.03.2-maybe Feb 8, 2017

@hai-ld

hai-ld commented Feb 9, 2017

Is there any ETA for 1.13.2?

@thaJeztah
Member

thaJeztah commented Feb 9, 2017

No, nor has it been decided whether a 1.13.2 will actually be released; see our patch release procedures/policy here: https://github.com/docker/docker/blob/master/project/PATCH-RELEASES.md

@kayrus
Contributor

kayrus commented Feb 9, 2017

You guys rock!

@thaJeztah thaJeztah modified the milestones: 1.13.2, 1.14.0 Feb 14, 2017

thaJeztah added a commit to thaJeztah/docker that referenced this pull request Feb 18, 2017

Merge pull request moby#30223 from mlaventure/fix-logwatcher-deadlock
Close logwatcher on context cancellation
(cherry picked from commit 0566f3f)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

@thaJeztah thaJeztah referenced this pull request Feb 18, 2017

Merged

17.03 cherry picks #31140

@thaJeztah thaJeztah moved this from PRs to picked in 17.03.2-maybe Feb 18, 2017

thaJeztah added a commit to thaJeztah/docker that referenced this pull request Feb 18, 2017

Merge pull request moby#30223 from mlaventure/fix-logwatcher-deadlock
Close logwatcher on context cancellation
(cherry picked from commit 0566f3f)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

@thaJeztah thaJeztah removed this from picked in 17.03.2-maybe Feb 22, 2017

@tonistiigi tonistiigi referenced this pull request Feb 24, 2017

Closed

Docker ps hang #28889

liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017

Merge branch 'close-logwatcher-on-context-cancellation' into 'huawei-1.11.2'


Close logwatcher on context cancellation

cherry-pick from moby#30223

This commit addresses 2 issues:

  1. in `tailfile()` if somehow the `logWatcher.Msg` were to become full and the watcher closed before space was made into it, we were getting stuck there forever since we were not checking for the watcher getting closed
  2. when servicing `docker logs`, if the command was cancelled we were not closing the watcher (and hence notifying it to stop copying data)

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Signed-off-by: yangshukui <yangshukui@huawei.com>



See merge request docker/docker!535