
TESTING do not merge; debugging delays in DockerNetworkSuite #38854

Open · thaJeztah wants to merge 11 commits into master from thaJeztah:test_network_suite_delays

Conversation

@thaJeztah (Member) commented Mar 12, 2019

Perhaps this shows where the extra time is taken: #38852

@thaJeztah (Member, Author) commented Mar 12, 2019

Ahm.. CI doesn't use the Makefile 😞

@thaJeztah thaJeztah force-pushed the thaJeztah:test_network_suite_delays branch from 0f684e9 to 41fcd9a Mar 12, 2019

@thaJeztah thaJeztah requested a review from tianon as a code owner Mar 12, 2019

@codecov bot commented Mar 12, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@8aa3262).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #38854   +/-   ##
=========================================
  Coverage          ?    36.9%           
=========================================
  Files             ?      612           
  Lines             ?    45338           
  Branches          ?        0           
=========================================
  Hits              ?    16732           
  Misses            ?    26319           
  Partials          ?     2287
@thaJeztah (Member, Author) commented Mar 12, 2019

Hah, interesting; actually getting some failures:

13:08:19 assertion failed: 
13:08:19 Command:  /usr/local/cli/docker network create --subnet=192.168.0.0/16 test0
13:08:19 ExitCode: 1
13:08:19 Error:    exit status 1
13:08:19 Stdout:   
13:08:19 Stderr:   Error response from daemon: Pool overlaps with other one on this address space
13:08:19 
13:08:19 
13:08:19 Failures:
13:08:19 ExitCode was 1 expected 0
13:08:19 Expected no error
13:08:19 START: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest
13:08:19 Daemon db0fadaf499f0 is not started
13:08:19 PASS: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest	0.009s
13:08:19 
13:08:19 FAIL: docker_cli_network_unix_test.go:744: DockerNetworkSuite.TestDockerNetworkIPAMInvalidCombinations
13:08:19 
13:08:19 START: docker_cli_network_unix_test.go:592: DockerNetworkSuite.TestDockerNetworkIPAMMultipleNetworks
13:08:19 START: docker_cli_network_unix_test.go:52: DockerNetworkSuite.SetUpTest
13:08:19 Creating a new daemon
13:08:19 PASS: docker_cli_network_unix_test.go:52: DockerNetworkSuite.SetUpTest	0.002s
13:08:19 
13:08:19 assertion failed: 
13:08:19 Command:  /usr/local/cli/docker network create --subnet=192.168.0.0/16 test1
13:08:19 ExitCode: 1
13:08:19 Error:    exit status 1
13:08:19 Stdout:   
13:08:19 Stderr:   Error response from daemon: Pool overlaps with other one on this address space
13:08:19 
13:08:19 
13:08:19 Failures:
13:08:19 ExitCode was 1 expected 0
13:08:19 Expected no error
13:08:19 START: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest
13:08:19 Daemon d73608b7c3d32 is not started
13:08:19 PASS: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest	0.013s
13:08:19 
13:08:19 FAIL: docker_cli_network_unix_test.go:592: DockerNetworkSuite.TestDockerNetworkIPAMMultipleNetworks
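
For context, the "Pool overlaps with other one on this address space" error means some existing network already claims an overlapping subnet, which points at state left behind by an earlier test. A minimal sketch (not part of this PR) reproducing the daemon error with the standard docker/docker Go client; the network names mirror the log above:

    package main

    import (
        "context"
        "fmt"

        "github.com/docker/docker/api/types"
        "github.com/docker/docker/api/types/network"
        "github.com/docker/docker/client"
    )

    func main() {
        cli, err := client.NewClientWithOpts(client.FromEnv)
        if err != nil {
            panic(err)
        }
        ctx := context.Background()
        ipam := &network.IPAM{Config: []network.IPAMConfig{{Subnet: "192.168.0.0/16"}}}

        // The first create claims the 192.168.0.0/16 address pool.
        if _, err := cli.NetworkCreate(ctx, "test0", types.NetworkCreate{IPAM: ipam}); err != nil {
            fmt.Println("test0:", err)
        }
        // The second create fails with the same "Pool overlaps" error as
        // in the failing tests above.
        if _, err := cli.NetworkCreate(ctx, "test1", types.NetworkCreate{IPAM: ipam}); err != nil {
            fmt.Println("test1:", err)
        }
    }
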
@thaJeztah (Member, Author) commented Mar 12, 2019

@jhowardmsft interesting; I don't see the delay here in CI 🤔

@jhowardmsft (Contributor) commented Mar 12, 2019

That is weird. If you look back at some of the job history, at least the ones I quickly looked at, almost all of them have that delay. Curious!

@thaJeztah (Member, Author) commented Mar 12, 2019

Curious as well; wondering if it's due to some state a machine could be in, or a networking issue causing some things to take longer.

Let me remove the TESTDIRS and the filter so it runs all the tests; perhaps some earlier tests leave something behind that affects this.

@thaJeztah thaJeztah force-pushed the thaJeztah:test_network_suite_delays branch 2 times, most recently from 7ec6f56 to de585a7 Mar 13, 2019

@thaJeztah (Member, Author) commented Mar 13, 2019

OK, interesting, so:

When selectively running some tests (using -check.f DockerNetworkSuite.*):

  • some tests fail (see above) #38854 (comment)
  • TearDownTest is fast (0.xx seconds);
    13:08:19 START: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest
    13:08:19 Daemon d73608b7c3d32 is not started
    13:08:19 PASS: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest	0.013s
    

That's highly suspicious; it looks like TearDownTest might effectively be skipped in that case?

When running all tests, TearDownTest takes a long time (15 seconds), and looks to be the culprit:

20:06:13 START: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest
20:06:13 Daemon dc7bd08653476 is not started
20:06:28 PASS: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest	15.017s

@thaJeztah thaJeztah force-pushed the thaJeztah:test_network_suite_delays branch from d3e767c to ce1d0c2 Mar 27, 2019

@thaJeztah (Member, Author) commented Mar 28, 2019

OK, looks like we have more details now: https://jenkins.dockerproject.org/job/Docker-PRs-s390x/13729/console

19:05:24 START: docker_cli_network_unix_test.go:1792: DockerNetworkSuite.TestConntrackFlowsLeak
19:05:24 START: docker_cli_network_unix_test.go:52: DockerNetworkSuite.SetUpTest
19:05:24 Creating a new daemon
19:05:24 PASS: docker_cli_network_unix_test.go:52: DockerNetworkSuite.SetUpTest	0.000s
19:05:24 
19:05:24 START: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest
19:05:24 Daemon d1bc8c31dd97d is not started
19:05:24 completed stopping daemon
19:05:24 start unpauseAllContainers
19:05:24 end unpauseAllContainers
19:05:24 start deleteAllContainers
19:05:24 end deleteAllContainers
19:05:24 start deleteAllImages
19:05:24 end deleteAllImages
19:05:24 start deleteAllVolumes
19:05:39 end deleteAllVolumes
19:05:39 start deleteAllNetworks
19:05:39 end deleteAllNetworks
19:05:39 start deleteAllPlugins
19:05:39 end deleteAllPlugins
19:05:39 completed teardown test
19:05:39 FAIL: docker_cli_network_unix_test.go:56: DockerNetworkSuite.TearDownTest

Looks like deleteAllVolumes is taking 15–16 seconds:

19:05:24 start deleteAllVolumes
19:05:39 end deleteAllVolumes
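
The start/end markers above come from the debug-logging commits in this branch. For reference, a sketch of that style of instrumentation, with the elapsed time added; timed and its logger parameter are hypothetical helpers, not the exact code in this branch:

    package cleanup

    import "time"

    // logger is the minimal interface needed; both *testing.T and
    // gocheck's *check.C provide Logf.
    type logger interface {
        Logf(format string, args ...interface{})
    }

    // timed prints start/end markers (plus the elapsed time) around one
    // teardown step, so a slow step such as deleteAllVolumes stands out
    // in the CI log.
    func timed(t logger, name string, fn func()) {
        t.Logf("start %s", name)
        start := time.Now()
        fn()
        t.Logf("end %s (took %v)", name, time.Since(start))
    }

    // Usage inside TearDownTest would look roughly like:
    //   timed(c, "deleteAllVolumes", func() { deleteAllVolumes(c) })
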

thaJeztah added some commits Mar 12, 2019

TESTING do not merge
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
remove testdirs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
remove filter
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add more logging
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
even more logging
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

@thaJeztah thaJeztah force-pushed the thaJeztah:test_network_suite_delays branch from ce1d0c2 to ed868da Apr 10, 2019

@thaJeztah thaJeztah changed the title TESTING do not merge TESTING do not merge; debugging delays in DockerNetworkSuite Apr 10, 2019

Debug deleteAllVolumes
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

@thaJeztah thaJeztah force-pushed the thaJeztah:test_network_suite_delays branch from ed868da to 5e619a4 Apr 10, 2019

@thaJeztah (Member, Author) commented Apr 10, 2019

Ah... I'm beginning to have a suspicion:

16:04:39 START: docker_cli_network_unix_test.go:57: DockerNetworkSuite.TearDownTest
16:04:39 Daemon d09f26bf3bdd1 is not started
16:04:39 completed stopping daemon
16:04:39 start deleteAllVolumes
16:04:39 deleteAllVolumes: start c.VolumeList
16:04:54 deleteAllVolumes: end c.VolumeList
16:04:54 deleteAllVolumes: start cleaning up: found 0 volumes, have 0 proteced volumes
16:04:54 deleteAllVolumes: start cleaning up: 
16:04:54 end deleteAllVolumes
16:04:54 completed teardown test

The daemon is stopped, and then we try to get the volume list; could that be it? Is the 15 seconds just a timeout?
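
If so, the client call itself is what blocks: VolumeList goes over the API socket, and with the daemon gone nothing answers until some timeout fires. A sketch (using the 2019-era docker/docker client signatures; the 2-second deadline is an arbitrary illustration, not what the suite uses) of bounding the call with a context so teardown cannot stall:

    package main

    import (
        "context"
        "fmt"
        "time"

        "github.com/docker/docker/api/types/filters"
        "github.com/docker/docker/client"
    )

    func main() {
        cli, err := client.NewClientWithOpts(client.FromEnv)
        if err != nil {
            panic(err)
        }
        // Bound the API call so a stopped or shutting-down daemon cannot
        // stall teardown for the full transport timeout.
        ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
        defer cancel()
        volumes, err := cli.VolumeList(ctx, filters.NewArgs())
        if err != nil {
            fmt.Println("VolumeList:", err) // e.g. "context deadline exceeded"
            return
        }
        fmt.Printf("found %d volumes\n", len(volumes.Volumes))
    }
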

@thaJeztah (Member, Author) commented Apr 10, 2019

oooooh;

time="2019-04-10T15:29:23.734300744Z" level=debug msg="Calling POST /swarm/init"
time="2019-04-10T15:29:23.735243307Z" level=debug msg="form data: {\"AdvertiseAddr\":\"\",\"AutoLockManagers\":false,\"Availability\":\"\",\"DataPathAddr\":\"\",\"DataPathPort\":0,\"DefaultAddrPool\":null,\"ForceNewCluster\":false,\"ListenAddr\":\"\",\"Spec\":{\"CAConfig\":{},\"Dispatcher\":{},\"EncryptionConfig\":{\"AutoLockManagers\":false},\"Labels\":null,\"Orchestration\":{},\"Raft\":{\"ElectionTick\":0,\"HeartbeatTick\":0},\"TaskDefaults\":{}},\"SubnetSize\":0}"
time="2019-04-10T15:29:23.735484919Z" level=error msg="Error initializing swarm: invalid ListenAddr \"\": invalid empty address"
time="2019-04-10T15:29:23.740543249Z" level=debug msg="Calling POST /swarm/join"
time="2019-04-10T15:29:23.740619460Z" level=debug msg="form data: {\"AdvertiseAddr\":\"\",\"Availability\":\"\",\"DataPathAddr\":\"\",\"JoinToken\":\"*****\",\"ListenAddr\":\"0.0.0.0:2377\",\"RemoteAddrs\":[\"\"]}"
time="2019-04-10T15:29:23.744312349Z" level=info msg="Processing signal 'interrupt'"
time="2019-04-10T15:29:23.744387485Z" level=debug msg="daemon configured with a 15 seconds minimum shutdown timeout"
time="2019-04-10T15:29:23.744428585Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."
time="2019-04-10T15:29:23.746174421Z" level=debug msg="Unix socket /tmp/dxr/d107265fcafa4/libnetwork/7ac4e8e29fd6e11abdac705e9522377a197c48a8937d977203d561819042e604.sock doesn't exist. cannot accept client connections"
time="2019-04-10T15:29:23.746228335Z" level=debug msg="Cleaning up old mountid : start."
time="2019-04-10T15:29:23.746300688Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
time="2019-04-10T15:29:23.747696735Z" level=debug msg="Cleaning up old mountid : done."
time="2019-04-10T15:29:23.749336360Z" level=debug msg="Clean shutdown succeeded"
daemon configured with a 15 seconds minimum shutdown timeout
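
That log line is the smoking gun: dockerd's --shutdown-timeout option defaults to 15 seconds, and every per-test daemon stop in the suite pays it. The "Set daemon shutdown timeout" commit below addresses this; a sketch of the idea using the internal/test/daemon helper that appears in the stack traces (the helper name and the 1-second value are illustrative, not necessarily what the commit uses):

    package example

    import (
        "testing"

        "github.com/docker/docker/internal/test/daemon"
    )

    // startFastStoppingDaemon starts a test daemon that will not spend
    // the default 15 seconds on clean shutdown between tests.
    // --shutdown-timeout is a real dockerd flag (default 15).
    func startFastStoppingDaemon(t *testing.T) *daemon.Daemon {
        d := daemon.New(t)
        d.Start(t, "--shutdown-timeout=1")
        return d
    }
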

thaJeztah added some commits Apr 10, 2019

Set daemon shutdown timeout
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Remove some debugging
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

@thaJeztah thaJeztah force-pushed the thaJeztah:test_network_suite_delays branch from 4d9400e to ece70a3 Apr 10, 2019

Remove failing all tests
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

@thaJeztah thaJeztah force-pushed the thaJeztah:test_network_suite_delays branch from a28c1a9 to c2754bc Apr 10, 2019

thaJeztah added some commits Apr 10, 2019

Disable "verbose"
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Remove debug logs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah (Member, Author) commented Apr 11, 2019

Looks like these tests may need a longer timeout 🤔 I need to look into the logs for those:

https://jenkins.dockerproject.org/job/Docker-PRs-experimental/44885/console

23:09:17 ----------------------------------------------------------------------
23:09:17 FAIL: docker_cli_events_unix_test.go:390: DockerDaemonSuite.TestDaemonEvents
23:09:17 
23:09:17 Creating a new daemon
23:09:17 [d8dc042f756cd] waiting for daemon to start
23:09:17 [d8dc042f756cd] daemon started
23:09:17 
23:09:17 assertion failed: expression is false: strings.Contains(out, s)
[... the same assertion failure repeated 18 times in total ...]
23:09:17 [d8dc042f756cd] exiting daemon
23:09:20 
23:09:20 ----------------------------------------------------------------------
23:12:05 FAIL: docker_cli_daemon_test.go:2629: DockerDaemonSuite.TestDaemonShutdownTimeoutWithConfigFile
23:12:05 
23:12:05 Creating a new daemon
23:12:05 [d93451be2b668] waiting for daemon to start
23:12:05 [d93451be2b668] exiting daemon
23:12:05 docker_cli_daemon_test.go:2641:
23:12:05     s.d.Start(c, fmt.Sprintf("--config-file=%s", configFilePath))
23:12:05 /go/src/github.com/docker/docker/internal/test/daemon/daemon.go:201:
23:12:05     t.Fatalf("failed to start daemon with arguments %v : %v", args, err)
23:12:05 ... Error: failed to start daemon with arguments [--config-file=test.json] : [d93451be2b668] Daemon exited during startup: exit status 1
23:12:05 
23:19:20 ----------------------------------------------------------------------
23:19:20 FAIL: docker_api_swarm_test.go:362: DockerSwarmSuite.TestAPISwarmRaftQuorum
23:19:20 
23:19:20 Creating a new daemon
23:19:20 [d1735eddf6eaf] waiting for daemon to start
23:19:20 [d1735eddf6eaf] daemon started
23:19:20 
23:19:20 Creating a new daemon
23:19:20 [d338e75742885] waiting for daemon to start
23:19:20 [d338e75742885] daemon started
23:19:20 
23:19:20 Creating a new daemon
23:19:20 [d20781a947d38] waiting for daemon to start
23:19:20 [d20781a947d38] daemon started
23:19:20 
23:19:20 [d338e75742885] exiting daemon
23:19:20 waited for 1.816213ms (out of 30s)
23:19:20 [d20781a947d38] exiting daemon
23:19:20 docker_api_swarm_test.go:394:
23:19:20     // d1 will eventually step down from leader because there is no longer an active quorum, wait for that to happen
23:19:20     waitAndAssert(c, defaultReconciliationTimeout, func(c *check.C) (interface{}, check.CommentInterface) {
23:19:20         _, err := cli.ServiceCreate(context.Background(), service.Spec, types.ServiceCreateOptions{})
23:19:20         return err.Error(), nil
23:19:20     }, checker.Contains, "Make sure more than half of the managers are online.")
23:19:20 docker_utils_test.go:435:
23:19:20     c.Assert(v, checker, args...)
23:19:20 ... obtained string = "Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
23:19:20 ... substring string = "Make sure more than half of the managers are online."
23:19:20 
23:19:20 waited for 35.040456569s (out of 30s)
23:19:20 [d1735eddf6eaf] exiting daemon

https://jenkins.dockerproject.org/job/Docker-PRs/53706/console

22:39:19 === RUN   TestServiceRemoveKeepsIngressNetwork
22:39:50 --- FAIL: TestServiceRemoveKeepsIngressNetwork (31.36s)
22:39:50     service_test.go:232: Creating a new daemon
22:39:50     daemon.go:293: [dd2006f01dbcd] waiting for daemon to start
22:39:50     daemon.go:325: [dd2006f01dbcd] daemon started
22:39:50     service_test.go:237: timeout hit after 30s: ingress not ready: 0 peers and 0 containers
22:39:50     daemon.go:280: [dd2006f01dbcd] exiting daemon
22:39:50 === RUN   TestServiceWithDataPathPortInit
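
Both failure modes above come down to a poll-until-deadline helper giving up: waitAndAssert with the 30s defaultReconciliationTimeout in the swarm test, and the 30s ingress-ready poll in TestServiceRemoveKeepsIngressNetwork. Roughly the pattern (a generic sketch, not the suite's exact helper):

    package example

    import "time"

    // waitUntil polls cond until it returns true or the timeout elapses.
    // A log line like "waited for 35.04s (out of 30s)" is this kind of
    // helper reporting that the condition never became true in time.
    func waitUntil(timeout time.Duration, cond func() bool) bool {
        deadline := time.Now().Add(timeout)
        for time.Now().Before(deadline) {
            if cond() {
                return true
            }
            time.Sleep(100 * time.Millisecond)
        }
        return false
    }
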