Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return an empty stats if "container not found" #34004

Merged
merged 1 commit into from Jul 12, 2017

Conversation

@yummypeng
Copy link
Contributor

commented Jul 7, 2017

If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

Steps to reproduce:

# docker run -tid --name tmp --restart=always ubuntu true && docker stats --no-stream
48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c


(hangs...)



^C
#

It's a duplicate of #27772 which has already been closed by #28026, but #28026 is not a complete fix because if docker stats --no-stream comes faster after docker run, the container is still running when the code hits https://github.com/moby/moby/blob/master/daemon/stats.go#L35. Then I can see a lot of error logs from daemon:

time="2017-07-07T14:54:41.210577988+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:42.210686973+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:43.210721305+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:44.210637074+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:45.210742374+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:46.210928003+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:47.210899239+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:48.210914277+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:49.210748399+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:50.210818936+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:51.210596037+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"

This pr will ignore this error and just return an empty stats. So that client will not hang forever.

after this pr:

# docker run -tid --name tmp --restart=always ubuntu true && docker stats --no-stream
0c93066033730361fb296af0e3b07e4b264169535353aecd9121159b81dfa101
CONTAINER           CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS

Signed-off-by: Yuanhong Peng pengyuanhong@huawei.com

- A picture of a cute animal (not mandatory but encouraged)
images

@yummypeng yummypeng closed this Jul 7, 2017

@yummypeng yummypeng reopened this Jul 7, 2017

@yummypeng yummypeng force-pushed the yummypeng:fix-docker-stats-hang branch from 4c77b79 to c33a3f4 Jul 7, 2017

@yummypeng

This comment has been minimized.

Copy link
Contributor Author

commented Jul 7, 2017

Oops, I close this pr by mistake and CI stopped @thaJeztah

@thaJeztah

This comment has been minimized.

Copy link
Member

commented Jul 7, 2017

Looks like CI started again, and is green now 👍

@thaJeztah
Copy link
Member

left a comment

Left some suggestions :)

// if we get "container not found" error from containerd, it's possibly
// because that this container has already been stopped. it will be ok to
// ignore this error and just return an empty stats.
if _, ok := err.(notRunningErr); !ok && !strings.Contains(err.Error(), "container not found") {

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Jul 7, 2017

Member

Could we make the container not found error a typed error (like we do for notRunningErr)?

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Jul 7, 2017

Member

Not introduced in this PR, but while looking at this code:
The double negative is a bit confusing here as well; would this be more readable (assuming a typed error for container not found was implemented;?

for _, pair := range pairs {
	stats, err := s.supervisor.GetContainerStats(pair.container)

	switch err.(type) {
	case nil:
		// FIXME: move to containerd on Linux (not Windows)
		stats.CPUStats.SystemUsage = systemUsage
		stats.CPUStats.OnlineCPUs = onlineCPUs

		pair.publisher.Publish(*stats)

	case notRunningErr, notFoundErr:
		// publish empty stats containing only name and ID if not running or not found
		pair.publisher.Publish(types.StatsJSON{
			Name: pair.container.Name,
			ID:   pair.container.ID,
		})

	default:
		logrus.Errorf("collecting stats for %s: %v", pair.container.ID, err)
	}
}

This comment has been minimized.

Copy link
@yummypeng

yummypeng Jul 10, 2017

Author Contributor

@thaJeztah Thanks for your suggestion ❤️ I have updated the codes.

@yummypeng yummypeng force-pushed the yummypeng:fix-docker-stats-hang branch from c33a3f4 to 911c002 Jul 10, 2017

@thaJeztah
Copy link
Member

left a comment

left one more comment

also /cc @cpuguy83 for review

@@ -1165,6 +1165,9 @@ func (daemon *Daemon) stats(c *container.Container) (*types.StatsJSON, error) {
}
stats, err := daemon.containerd.Stats(c.ID)
if err != nil {
if strings.Contains(err.Error(), "container not found") {

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Jul 10, 2017

Member

I guess this also needs to be done in the _windows version;

// Obtain the stats from HCS via libcontainerd
stats, err := daemon.containerd.Stats(c.ID)
if err != nil {
return nil, err
}

This comment has been minimized.

Copy link
@yummypeng

yummypeng Jul 10, 2017

Author Contributor

Updated 😉 @thaJeztah

@yummypeng yummypeng force-pushed the yummypeng:fix-docker-stats-hang branch from 911c002 to dc8b314 Jul 10, 2017


type notFoundErr interface {
error
}

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Jul 10, 2017

Member

Hm, looking at this again; this interface is not specific enough, as both daemon. errNotRunning, and daemon. errNotFound satisfy this interface; see https://play.golang.org/p/HtkBu1_QuM. If you make it more specific, it works though; https://play.golang.org/p/S7DzAuRITj

This comment has been minimized.

Copy link
@yummypeng

yummypeng Jul 10, 2017

Author Contributor

@thaJeztah Tons of thanks for finding this ❤️

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Jul 11, 2017

Member

@yummypeng you're welcome! I almost missed it as well ☺️

@yummypeng yummypeng force-pushed the yummypeng:fix-docker-stats-hang branch from dc8b314 to c567391 Jul 10, 2017

Return an empty stats if "container not found"
If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>

@yummypeng yummypeng force-pushed the yummypeng:fix-docker-stats-hang branch from c567391 to 4a6cbf9 Jul 10, 2017

@yummypeng

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2017

janky failed but it seems not related to my PR:

07:41:22 ----------------------------------------------------------------------
07:41:22 FAIL: docker_cli_swarm_test.go:1146: DockerSwarmSuite.TestSwarmLockUnlockCluster
07:41:22 
07:41:22 [d4ec576a08906] waiting for daemon to start
07:41:22 [d4ec576a08906] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 [d0e91bd982f1a] waiting for daemon to start
07:41:22 [d0e91bd982f1a] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [d4ec576a08906] exiting daemon
07:41:22 [d4ec576a08906] waiting for daemon to start
07:41:22 [d4ec576a08906] daemon started
07:41:22 
07:41:22 [d0e91bd982f1a] exiting daemon
07:41:22 [d0e91bd982f1a] waiting for daemon to start
07:41:22 [d0e91bd982f1a] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 docker_cli_swarm_test.go:1192:
07:41:22     // d2 is now set to lock
07:41:22     checkSwarmUnlockedToLocked(c, d2)
07:41:22 docker_utils_test.go:471:
07:41:22     c.Assert(v, checker, args...)
07:41:22 ... obtained bool = false
07:41:22 ... expected bool = true
07:41:22 
07:41:22 [d4ec576a08906] exiting daemon
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [d0e91bd982f1a] exiting daemon
07:41:35 
07:41:35 ----------------------------------------------------------------------
@yummypeng

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2017

@thaJeztah
Copy link
Member

left a comment

LGTM

@vieux

This comment has been minimized.

Copy link
Collaborator

commented Jul 11, 2017

LGTM ping @tiborvass

@vieux

This comment has been minimized.

Copy link
Collaborator

commented Jul 11, 2017

ping @kolyshkin

@tiborvass

This comment has been minimized.

Copy link
Collaborator

commented Jul 12, 2017

LGTM

@tiborvass tiborvass merged commit c8a2596 into moby:master Jul 12, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 35573 has succeeded
Details
janky Jenkins build Docker-PRs 44185 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 4558 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 15560 has succeeded
Details
z Jenkins build Docker-PRs-s390x 4261 has succeeded
Details
liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
Return an empty stats if "container not found"
If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

fix DTS2017062812251
related upstream PR: moby#34004

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
Merge branch 'fix-docker-stats-hang' into 'huawei-1.11.2'
Return an empty stats if "container not found"

If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

fix DTS2017062812251
related upstream PR: moby#34004

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>



See merge request docker/docker!666
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.