
Flaky test: TestDockerNetworkHostModeUngracefulDaemonRestart #19368

Closed
clnperez opened this issue Jan 15, 2016 · 14 comments
@clnperez
Contributor

Description of problem:
This one has failed a few times in the past couple of days with gccgo:
https://jenkins.dockerproject.org/job/Docker%20Master%20%28gccgo%29/1285/consoleFull
https://jenkins.dockerproject.org/job/Docker%20Master%20%28gccgo%29/1287/consoleFull
https://jenkins.dockerproject.org/job/Docker%20Master%20%28gccgo%29/1294/consoleFull

docker version:
1.10.10-dev (latest upstream)

docker info:
❗ This is to make Gordon happy, and should be ignored since it's from my laptop (running in a container), not one of Docker's test nodes.

./docker info

Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 1.10.0-dev
Storage Driver: devicemapper
Pool Name: docker-253:2-5914896-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file: /dev/loop2
Metadata file: /dev/loop3
Data Space Used: 11.8 MB
Data Space Total: 107.4 GB
Data Space Available: 84.95 GB
Metadata Space Used: 581.6 kB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.147 GB
Udev Sync Supported: false
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Either use --storage-opt dm.thinpooldev or use --storage-opt dm.no_warn_on_loop_devices=true to suppress this warning.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.82 (2013-10-04)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
Volume: local
Network: null host bridge
Kernel Version: 4.2.8-200.fc22.x86_64
Operating System: Ubuntu 14.04.3 LTS (containerized)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.678 GiB
Name: 703f7cbf4c89
ID: 5VYE:UHVY:NVE7:FXOH:6XFB:MT4G:2WOX:BXUT:5E7S:VFRS:EL4S:6HOX
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

uname -a:
❗ same comment as above.
4.2.8-200.fc22.x86_64

Environment details (AWS, VirtualBox, physical, etc.):
docker's jenkins (see above links)

How reproducible:
Flaky; I've only ever seen it in Docker's Jenkins builds.

Steps to Reproduce:

  1. someone commits something
  2. it gets merged
  3. the jenkins build is triggered

Actual Results:
Sometimes the test fails

Expected Results:
The test always passes

Additional info:
I'll try to look into it today. I haven't tried recreating it locally but wanted to get it into an issue so others can "me too" it and so I don't forget about it.

@thaJeztah
Member

Just had this one again with gccgo https://jenkins.dockerproject.org/job/Docker-PRs-gccgo/748/console

@thaJeztah
Member

02:11:35 
02:11:35 ----------------------------------------------------------------------
02:11:35 FAIL: docker_cli_network_unix_test.go:945: TestDockerNetworkHostModeUngracefulDaemonRestart.pN59_github_com_docker_docker_integration_cli.DockerNetworkSuite
02:11:35 
02:11:35 [d82681000] waiting for daemon to start
02:11:35 [d82681000] daemon started
02:11:35 [d82681000] exiting daemon
02:11:35 [d82681000] waiting for daemon to start
02:11:35 [d82681000] daemon started
02:11:35 docker_cli_network_unix_test.go:966:
02:11:35     c.Assert(err, checker.IsNil)
02:11:35 ... value *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc20d85cb80)} ("exit status 1")
02:11:35 
02:11:35 [d82681000] exiting daemon
02:11:37 
02:11:37 ----------------------------------------------------------------------

@clnperez
Contributor Author

Thanks @thaJeztah. I was able to get it to fail locally at least once last week, but I'll have to try again since I keep getting sidetracked and I've since lost that container.

@clnperez
Contributor Author

This also fails on ARM with golang. https://jenkins.dockerproject.org/job/Docker-PRs-arm/97/console

@tophj-ibm
Contributor

some debug info

FAIL: docker_cli_network_unix_test.go:995: TestDockerNetworkHostModeUngracefulDaemonRestart.pN59_github_com_docker_docker_integration_cli.DockerNetworkSuite

[d46248000] waiting for daemon to start
[d46248000] daemon started
[d46248000] exiting daemon
[d46248000] waiting for daemon to start
[d46248000] daemon started
docker_cli_network_unix_test.go:1017:
    c.Assert(err, checker.IsNil, check.Commentf(fmt.Sprintf("Error starting %s: %s", cName, runningOut)))
... value *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc209785360)} ("exit status 1")
... Error starting hostc-9: 
Error: No such image or container: hostc-9


[d46248000] exiting daemon

@clnperez
Contributor Author

So apparently @tophj-ibm can recreate this a lot more easily than I can, but I put in an inaccurate debug message. It should be an "inspect error," not a "start error," in case that confuses anyone.

@thaJeztah
Member

@mavenugo
Contributor

mavenugo commented Feb 1, 2016

@thaJeztah i will look into it.

@clnperez
Contributor Author

clnperez commented Feb 2, 2016

I've been looking into this, and I've seen it fail in two ways: 1) the container doesn't exist, or 2) the container isn't started. It's always the last container that's the problem. The Start() function just checks whether the daemon is responding to requests. So we can either add a sleep to the test, or rework Start() a bit. I have a feeling that reworking Start() might also require adding some logic in the daemon itself.

@mavenugo
Contributor

mavenugo commented Feb 3, 2016

@clnperez thanks. I'm trying to understand why this is seen almost consistently in the gccgo CI but not in other CI runs.

@clnperez
Contributor Author

clnperez commented Feb 3, 2016

@mavenugo Go code compiled with gccgo isn't as optimized as Go code compiled with gc (there may be some flags we could add to our builds, but I'm not sure anyone has dug into it much), so things run more slowly. I've seen that pretty consistently. That doesn't prove this is a timing issue, but it could be why we only see it on gccgo.

@clnperez
Contributor Author

clnperez commented Feb 4, 2016

Hm @tiborvass, @mavenugo, looks like this failed again on gccgo: https://jenkins.dockerproject.org/job/Docker%20Master%20%28gccgo%29/1584/consoleFull

le sigh

@icecrime
Contributor

@clnperez @mavenugo Any ideas? This is becoming a huge pain point; I'll skip the test if we can't figure out a better way.

@tophj-ibm
Contributor

Just starting to look into this issue again.

I'm getting this error when trying to load the last container that was originally started (not necessarily the last container to be restarted):

Failed to load container 7d00772d8d3242210243177bc2142f0b406008db6fc27e6b41a3ec5e9119d555: EOF

It's possible the daemon kill is happening before the container has fully started; I'll continue to investigate.

tophj-ibm added a commit to tophj-ibm/moby that referenced this issue Feb 11, 2016
Fixes moby#19368 by waiting until all container statuses are running
before killing the daemon

Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>