Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test: TestRunInteractiveWithRestartPolicy #39352

Closed
thaJeztah opened this issue Jun 11, 2019 · 11 comments · Fixed by #39853
Closed

Flaky test: TestRunInteractiveWithRestartPolicy #39352

thaJeztah opened this issue Jun 11, 2019 · 11 comments · Fixed by #39853

Comments

@thaJeztah
Copy link
Member

This test looks to be flaky recently; e.g. #38806 (comment), and #39351 (comment)

https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS5-Process/2722/console

03:45:02 FAIL: docker_cli_run_test.go:1792: DockerSuite.TestRunInteractiveWithRestartPolicy
03:45:02 
03:45:02 assertion failed: 
03:45:02 Command:  d:\CI\CI-f4d92b94d9\binary\docker.exe run -i --name test-inter-restart --restart=always busybox sh
03:45:02 ExitCode: 0
03:45:02 Stdout:   
03:45:02 Stderr:   
03:45:02 
03:45:02 Failures:
03:45:02 ExitCode was 0 expected 11

This test was flaky once before, which was addressed by increasing the timeout; #27078

@thaJeztah thaJeztah changed the title Flaky test on Windows: TestRunInteractiveWithRestartPolicy Flaky test: TestRunInteractiveWithRestartPolicy Jun 18, 2019
@thaJeztah
Copy link
Member Author

Also seeing this occur on experimental; https://jenkins.dockerproject.org/job/Docker-PRs-experimental/45700/console (docker#194)

20:16:51 FAIL: docker_cli_run_test.go:1792: DockerSuite.TestRunInteractiveWithRestartPolicy
20:16:51 
20:16:51 assertion failed: 
20:16:51 Command:  /usr/local/cli/docker run -i --name test-inter-restart --restart=always busybox sh
20:16:51 ExitCode: 0
20:16:51 Error:    <nil>
20:16:51 Stdout:   
20:16:51 Stderr:   
20:16:51 
20:16:51 Failures:
20:16:51 ExitCode was 0 expected 11

@tao12345666333
Copy link
Contributor

Is it because of the high number of build tasks or high load?

@thaJeztah
Copy link
Member Author

No idea; haven't looked into it yet. It has become (more?) flaky recently though

@thaJeztah
Copy link
Member Author

@tao12345666333
Copy link
Contributor

Looked at it quickly. This testcase TestRunInteractiveWithRestartPolicy uses the icmd.StartCmd function.

There is a description of this function in the documentation.

StartCmd starts a command, but doesn't wait for it to finish

https://godoc.org/gotest.tools/icmd#StartCmd

I think we can try to use RunCmd instead of StartCmd.

@thaJeztah
Copy link
Member Author

Reopening this one, as it has become flaky again recently on Windows

=== RUN TestDockerSuite/TestRunInteractiveWithRestartPolicy
--- FAIL: TestDockerSuite/TestRunInteractiveWithRestartPolicy (4.47s)
docker_cli_run_test.go:1797: assertion failed:
Command: d:\CI\PR-40373\4\binary\docker.exe run -i --name test-inter-restart --restart=always busybox sh
ExitCode: 0
Stdout:
Stderr:
Failures:
ExitCode was 0 expected 11

@thaJeztah thaJeztah reopened this Jan 14, 2020
Improving CI automation moved this from Done to In progress Jan 14, 2020
@tao12345666333
Copy link
Contributor

I will continue to follow up.

@thaJeztah
Copy link
Member Author

Thanks! Not sure what changed, but I noticed it failing a couple of times 😅

@vvoland
Copy link
Contributor

vvoland commented Jun 27, 2022

It seems that the test is not flaky - on Windows (EDIT: also happened on Linux) sometimes the return value of docker run --restart=always is zero.
Reproduced with PS script:

do {
docker run --name test -i --restart=always busybox sh -c 'exit 11'
set x $LASTEXITCODE
if ( $x -eq 0 ) {
   docker inspect test
   break
}
docker rm -f test > $nul
} while ( 1 )

After some time, the loop breaks:

[
    {
        "Id": "049f968aed2a79335d66379766e78efce6abd6c44a6647f89000f406213a3d5b",
        "Created": "2022-06-27T11:18:08.2528576Z",
        "Path": "sh",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 11942,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2022-06-27T11:18:09.051621Z",
            "FinishedAt": "2022-06-27T11:18:08.5757808Z"
        },
...

Some observations:

  • Without --restart=always it didn't happen for me.
  • Result is the same regardless if we provide exit 11 via -c or stdin.
  • Changing the command to sleep 5; exit 11 doesn't change the behaviour.
  • FinishedAt is always earlier than StartedAt. This is true for all the cases I've seen so far. So the restart is already done before the docker inspect is called.

Possibly there is some race which results in exit code being obtained after the process got already restarted?

@thaJeztah
Copy link
Member Author

Possibly there is some race which results in exit code being obtained after the process got already restarted?

Yes, that's quite likely; that would definitely explain it. Looks like the test is assuming that we capture the exit status, but combined with a --restart=always, that feels a bit odd (as it would reset the exit-status on restart).

@thaJeztah
Copy link
Member Author

It's a bit odd though that the CLI doesn't exit / capture the exit-code on the first exit though; so that would mean the cli is either "re-attaching" after the container restarts, or never detaches 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Improving CI
  
Done
3 participants