New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return an empty stats if "container not found" #34004

Merged
merged 1 commit into from Jul 12, 2017

Conversation

Projects
None yet
5 participants
@yummypeng
Contributor

yummypeng commented Jul 7, 2017

If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

Steps to reproduce:

# docker run -tid --name tmp --restart=always ubuntu true && docker stats --no-stream
48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c


(hangs...)



^C
#

It's a duplicate of #27772 which has already been closed by #28026, but #28026 is not a complete fix because if docker stats --no-stream comes faster after docker run, the container is still running when the code hits https://github.com/moby/moby/blob/master/daemon/stats.go#L35. Then I can see a lot of error logs from daemon:

time="2017-07-07T14:54:41.210577988+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:42.210686973+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:43.210721305+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:44.210637074+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:45.210742374+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:46.210928003+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:47.210899239+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:48.210914277+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:49.210748399+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:50.210818936+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:51.210596037+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"

This pr will ignore this error and just return an empty stats. So that client will not hang forever.

after this pr:

# docker run -tid --name tmp --restart=always ubuntu true && docker stats --no-stream
0c93066033730361fb296af0e3b07e4b264169535353aecd9121159b81dfa101
CONTAINER           CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS

Signed-off-by: Yuanhong Peng pengyuanhong@huawei.com

- A picture of a cute animal (not mandatory but encouraged)
images

@yummypeng

This comment has been minimized.

Show comment
Hide comment
@yummypeng

yummypeng Jul 7, 2017

Contributor

Oops, I close this pr by mistake and CI stopped @thaJeztah

Contributor

yummypeng commented Jul 7, 2017

Oops, I close this pr by mistake and CI stopped @thaJeztah

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Jul 7, 2017

Member

Looks like CI started again, and is green now 👍

Member

thaJeztah commented Jul 7, 2017

Looks like CI started again, and is green now 👍

@thaJeztah

Left some suggestions :)

Show outdated Hide outdated daemon/stats/collector.go Outdated
@thaJeztah

left one more comment

also /cc @cpuguy83 for review

@@ -1165,6 +1165,9 @@ func (daemon *Daemon) stats(c *container.Container) (*types.StatsJSON, error) {
}
stats, err := daemon.containerd.Stats(c.ID)
if err != nil {
if strings.Contains(err.Error(), "container not found") {

This comment has been minimized.

@thaJeztah

thaJeztah Jul 10, 2017

Member

I guess this also needs to be done in the _windows version;

// Obtain the stats from HCS via libcontainerd
stats, err := daemon.containerd.Stats(c.ID)
if err != nil {
return nil, err
}

@thaJeztah

thaJeztah Jul 10, 2017

Member

I guess this also needs to be done in the _windows version;

// Obtain the stats from HCS via libcontainerd
stats, err := daemon.containerd.Stats(c.ID)
if err != nil {
return nil, err
}

This comment has been minimized.

@yummypeng

yummypeng Jul 10, 2017

Contributor

Updated 😉 @thaJeztah

@yummypeng

yummypeng Jul 10, 2017

Contributor

Updated 😉 @thaJeztah

Show outdated Hide outdated daemon/stats/collector.go Outdated
Return an empty stats if "container not found"
If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
@yummypeng

This comment has been minimized.

Show comment
Hide comment
@yummypeng

yummypeng Jul 10, 2017

Contributor

janky failed but it seems not related to my PR:

07:41:22 ----------------------------------------------------------------------
07:41:22 FAIL: docker_cli_swarm_test.go:1146: DockerSwarmSuite.TestSwarmLockUnlockCluster
07:41:22 
07:41:22 [d4ec576a08906] waiting for daemon to start
07:41:22 [d4ec576a08906] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 [d0e91bd982f1a] waiting for daemon to start
07:41:22 [d0e91bd982f1a] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [d4ec576a08906] exiting daemon
07:41:22 [d4ec576a08906] waiting for daemon to start
07:41:22 [d4ec576a08906] daemon started
07:41:22 
07:41:22 [d0e91bd982f1a] exiting daemon
07:41:22 [d0e91bd982f1a] waiting for daemon to start
07:41:22 [d0e91bd982f1a] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 docker_cli_swarm_test.go:1192:
07:41:22     // d2 is now set to lock
07:41:22     checkSwarmUnlockedToLocked(c, d2)
07:41:22 docker_utils_test.go:471:
07:41:22     c.Assert(v, checker, args...)
07:41:22 ... obtained bool = false
07:41:22 ... expected bool = true
07:41:22 
07:41:22 [d4ec576a08906] exiting daemon
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [d0e91bd982f1a] exiting daemon
07:41:35 
07:41:35 ----------------------------------------------------------------------
Contributor

yummypeng commented Jul 10, 2017

janky failed but it seems not related to my PR:

07:41:22 ----------------------------------------------------------------------
07:41:22 FAIL: docker_cli_swarm_test.go:1146: DockerSwarmSuite.TestSwarmLockUnlockCluster
07:41:22 
07:41:22 [d4ec576a08906] waiting for daemon to start
07:41:22 [d4ec576a08906] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 [d0e91bd982f1a] waiting for daemon to start
07:41:22 [d0e91bd982f1a] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [d4ec576a08906] exiting daemon
07:41:22 [d4ec576a08906] waiting for daemon to start
07:41:22 [d4ec576a08906] daemon started
07:41:22 
07:41:22 [d0e91bd982f1a] exiting daemon
07:41:22 [d0e91bd982f1a] waiting for daemon to start
07:41:22 [d0e91bd982f1a] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 docker_cli_swarm_test.go:1192:
07:41:22     // d2 is now set to lock
07:41:22     checkSwarmUnlockedToLocked(c, d2)
07:41:22 docker_utils_test.go:471:
07:41:22     c.Assert(v, checker, args...)
07:41:22 ... obtained bool = false
07:41:22 ... expected bool = true
07:41:22 
07:41:22 [d4ec576a08906] exiting daemon
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [d0e91bd982f1a] exiting daemon
07:41:35 
07:41:35 ----------------------------------------------------------------------
@yummypeng

This comment has been minimized.

Show comment
Hide comment
@yummypeng
Contributor

yummypeng commented Jul 11, 2017

@thaJeztah

LGTM

@vieux

This comment has been minimized.

Show comment
Hide comment
@vieux

vieux Jul 11, 2017

Collaborator

LGTM ping @tiborvass

Collaborator

vieux commented Jul 11, 2017

LGTM ping @tiborvass

@vieux

This comment has been minimized.

Show comment
Hide comment
@vieux

vieux Jul 11, 2017

Collaborator

ping @kolyshkin

Collaborator

vieux commented Jul 11, 2017

ping @kolyshkin

@tiborvass

This comment has been minimized.

Show comment
Hide comment
@tiborvass

tiborvass Jul 12, 2017

Collaborator

LGTM

Collaborator

tiborvass commented Jul 12, 2017

LGTM

@tiborvass tiborvass merged commit c8a2596 into moby:master Jul 12, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 35573 has succeeded
Details
janky Jenkins build Docker-PRs 44185 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 4558 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 15560 has succeeded
Details
z Jenkins build Docker-PRs-s390x 4261 has succeeded
Details

liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017

Return an empty stats if "container not found"
If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

fix DTS2017062812251
related upstream PR: moby#34004

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>

liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017

Merge branch 'fix-docker-stats-hang' into 'huawei-1.11.2'
Return an empty stats if "container not found"

If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

fix DTS2017062812251
related upstream PR: moby#34004

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>



See merge request docker/docker!666
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment