
Flaky test: DockerSwarmSuite.TestAPISwarmServicesStateReporting #39564

Open
arkodg opened this issue Jul 19, 2019 · 1 comment

Comments

@arkodg
Contributor

arkodg commented Jul 19, 2019

While raising a PR recently, I noticed that the test case DockerSwarmSuite.TestAPISwarmServicesStateReporting is failing in #39562.

Logs are at https://jenkins.dockerproject.org/job/Docker-PRs-experimental/46036/console

FAIL: docker_api_swarm_service_test.go:540: DockerSwarmSuite.TestAPISwarmServicesStateReporting
16:11:24 
16:11:24 Creating a new daemon
16:11:24 [d0cc352ef387b] waiting for daemon to start
16:11:24 [d0cc352ef387b] waiting for daemon to start
16:11:24 [d0cc352ef387b] daemon started
16:11:24 
16:11:24 Creating a new daemon
16:11:24 [d453b2e97f137] waiting for daemon to start
16:11:24 [d453b2e97f137] waiting for daemon to start
16:11:24 [d453b2e97f137] daemon started
16:11:24 
16:11:24 Creating a new daemon
16:11:24 [ddd7f396bf6a6] waiting for daemon to start
16:11:24 [ddd7f396bf6a6] waiting for daemon to start
16:11:24 [ddd7f396bf6a6] daemon started
16:11:24 
16:11:24 waited for 2.039413398s (out of 30s)
16:11:24 waited for 850.587076ms (out of 30s)
16:11:24 waited for 29.740902ms (out of 30s)
16:11:24 assertion failed: expression is false: containers2[i] == nil

Sleeping for 1s might not be a reliable way to make sure all daemons are up:

time.Sleep(1 * time.Second) // make sure all daemons are ready to accept

@tonistiigi
Member

That sleep should be fine; the error appears later. It seems that after stopping a container, we still get that container in the active containers list. If we were to add a sleep, one place would be before

waitAndAssert(c, defaultReconciliationTimeout, reducedCheck(sumAsIntegers, d1.CheckActiveContainerCount, d2.CheckActiveContainerCount, d3.CheckActiveContainerCount), checker.Equals, instances)

Currently, this check might not work because it is intended to verify that a new task was created, but in this case it probably just has not yet seen the stopped task leave.
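For readers unfamiliar with the suite's helpers, the reducedCheck(sumAsIntegers, ...) call above essentially sums the active-container counts reported by each daemon and compares the total to the expected number of instances. A minimal sketch of that reduction (names here are illustrative stand-ins, not the actual test-suite helpers):

```go
package main

import "fmt"

// sumCounts mimics the idea behind reducedCheck(sumAsIntegers, ...): it sums
// the active-container counts reported by several daemons so the cluster-wide
// total can be compared against the expected number of service instances.
func sumCounts(checks ...func() int) int {
	total := 0
	for _, check := range checks {
		total += check()
	}
	return total
}

func main() {
	// Simulated per-daemon active-container counts.
	d1 := func() int { return 2 }
	d2 := func() int { return 1 }
	d3 := func() int { return 1 }
	instances := 4
	fmt.Println("total active:", sumCounts(d1, d2, d3), "expected:", instances)
}
```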

For the actual issue of why we get the bad status report, I wonder if this is

moby/daemon/monitor.go

Lines 75 to 89 in b4058a6

		c.SetStopped(&exitStatus)
		defer daemon.autoRemove(c)
	}
	defer c.Unlock() // needs to be called before autoRemove
	// cancel healthcheck here, they will be automatically
	// restarted if/when the container is started again
	daemon.stopHealthchecks(c)
	attributes := map[string]string{
		"exitCode": strconv.Itoa(int(ei.ExitCode)),
	}
	daemon.LogContainerEventWithAttributes(c, "die", attributes)
	daemon.Cleanup(c)
	daemon.setStateCounter(c)
	cpErr := c.CheckpointTo(daemon.containersReplica)
where it sets the container as stopped (releasing the /stop call, AFAICS) before updating the snapshot. It also does possibly slow operations like cleanup in between. @cpuguy83 @thaJeztah
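To make the suspected race concrete: if the in-memory state is marked stopped (unblocking /stop) before the view snapshot is checkpointed, a concurrent list call served from the snapshot can still report the container as active. A simplified sketch of that ordering, with stand-in types rather than the real moby structures:

```go
package main

import "fmt"

// containerState is a stand-in for the in-memory container state.
type containerState struct{ running bool }

// snapshotView is a stand-in for the replica snapshot (containersReplica)
// that API list calls are served from.
type snapshotView struct{ running bool }

// setStopped marks the in-memory state stopped; in the real code this is
// roughly the point where waiters on /stop are released.
func setStopped(c *containerState) { c.running = false }

// checkpoint syncs the snapshot from the in-memory state, mirroring
// c.CheckpointTo(daemon.containersReplica).
func checkpoint(c *containerState, v *snapshotView) { v.running = c.running }

func main() {
	c := &containerState{running: true}
	view := &snapshotView{running: true}

	setStopped(c) // /stop returns to the client here...
	// ...but a concurrent "list active containers" call reading the
	// snapshot still sees the container as running:
	fmt.Println("snapshot says running:", view.running)

	checkpoint(c, view) // only now does the snapshot catch up
	fmt.Println("snapshot says running:", view.running)
}
```

This is why inserting cleanup and other slow work between the two steps widens the window in which the test can observe the stale count.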

@thaJeztah thaJeztah added this to To do in Improving CI via automation Jul 19, 2019
@thaJeztah thaJeztah changed the title DockerSwarmSuite.TestAPISwarmServicesStateReporting is flaky Flaky test: DockerSwarmSuite.TestAPISwarmServicesStateReporting Jul 28, 2020