Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve exit code when docker-compose exists with an error #800

Closed
sqren opened this issue Mar 24, 2020 · 11 comments
Closed

Preserve exit code when docker-compose exists with an error #800

sqren opened this issue Mar 24, 2020 · 11 comments
Labels

Comments

@sqren
Copy link
Member

@sqren sqren commented Mar 24, 2020

Invoking apm-integration testing doesn't preserve the exit code from docker-compose.

Demonstration

scripts/compose.py start master

While running the above command, I kill docker which results in the following error:

ERROR: dial unix docker.raw.sock: connect: connection refused

However, the exit code is 0 indicating that everything went fine.

image

@sqren sqren mentioned this issue Mar 24, 2020
2 of 2 tasks complete
@kuisathaverat

This comment has been minimized.

Copy link
Contributor

@kuisathaverat kuisathaverat commented Mar 24, 2020

we run the docker-compose in detached mode, so this is the expected behavior the process is launched in background, when we saw this abrupt fail?

https://github.com/elastic/apm-integration-testing/blob/master/scripts/modules/cli.py#L629-L634

@kuisathaverat

This comment has been minimized.

Copy link
Contributor

@kuisathaverat kuisathaverat commented Mar 24, 2020

I have remembered how we manage this on the test, we use docker-compose-wait to wait for all the containers up and ready if docker-compose-wait failed the make target fails.

https://github.com/elastic/apm-integration-testing/blob/master/Makefile#L116

@sqren

This comment has been minimized.

Copy link
Member Author

@sqren sqren commented Mar 24, 2020

we run the docker-compose in detached mode, so this is the expected behavior the process is launched in background, when we saw this abrupt fail?

Is this a question for me?
I don't think this is expected behaviour from a consumer point-of-view. When any subprocess of compose.py fails, that should be propagated up, so the consumer can handle it. Right now it's being swallowed.

@kuisathaverat

This comment has been minimized.

Copy link
Contributor

@kuisathaverat kuisathaverat commented Mar 25, 2020

@sqren I meant, this is the expected behavior of docker-compose when you launch docker-compose up -d you are running it in the background so you do not care about the return code. If you run it in foreground it takes the terminal to show you the logs, so you can not make anything else in the terminal. Because compose.py is used to generate the docker-compose file and launch the containers, we delegate the responsibility of check everything is OK to docker-compose-wait and then we can run the tests. Everything is orchestrated by make docker-test-agent-go for example.

What it is the use case where you would run compose.py start and you need the return code in the same terminal? remember that if you launch docker-compose up in the foreground and everything goes OK the terminal is taken by docker-comose, you cannot run anything else in that terminal after that.

@sqren

This comment has been minimized.

Copy link
Member Author

@sqren sqren commented Mar 25, 2020

this is the expected behavior of docker-compose when you launch docker-compose up -d you are running it in the background so you do not care about the return code

I don't think this is accurate. docker-compose will run in the foreground until images have been downloaded and containers built. It'll then exit as soon containers are up. See this for instance:

image

In the above case docker compose returns exit code "1" because an error happened while downloading the image. If it had succeeded it would exit with code 0 and continue running the container(s) in the background

scripts/compose.py also runs the build step in the foreground but returns a 0 exit code even though it failed to download the image:
image

So to be clear: I'm not talking about the exit code if any of the docker containers exit unexpectedly. I'm talking about the exit code for the docker-compose process itself.

What it is the use case where you would run compose.py start and you need the return code in the same terminal?

I'm running ./scripts/compose.py as part of a bigger bash program. I want to exit the bash program if ./scripts/compose.py fails. The exit code is the easiest way to achieve this.

@kuisathaverat

This comment has been minimized.

Copy link
Contributor

@kuisathaverat kuisathaverat commented Mar 25, 2020

@sqren
I think I did not be clear, we launch a subprocess from Python to avoid block the terminal
https://github.com/elastic/apm-integration-testing/blob/master/scripts/modules/cli.py#L404, this is a fork of the current process, change this behavior impacts how ITs works.

I'm running ./scripts/compose.py as part of a bigger bash program. I want to exit the bash program if ./scripts/compose.py fails. The exit code is the easiest way to achieve this.

if the do not launch a subprocess this terminal blocks for the docker-compose process, so you can not continue with the execution of the script. see the following script.

#!/usr/bin/env bash

docker-compose up
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"
echo "I'll never be executed until docker-compose is killed"
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"

On the other screen capture, you post

image

in this case compose.py is no longer running, it ends with code 0, the code that you have in the command "echo $?", the forked process docker-compose is detached of the terminal and it reports its exit code nowhere.

You should use the same approach we use for the ITs call docker-compose-wait command, it is a Python command that waits for all Docker container in the docker-compose file

@victor is working on the CI part of the execution of your script, we see that the script has a bunch of logic to orchestrate the execution of the containers, that logic is easy to manage on the docker-compose side. When we have the final solution for the CI we will review the script to see if you are interested in removing some parts and relay on the compose.py

@sqren

This comment has been minimized.

Copy link
Member Author

@sqren sqren commented Mar 25, 2020

if the do not launch a subprocess this terminal blocks for the docker-compose process, so you can not continue with the execution of the script. see the following script.

In the example you give you are not running in detached mode. If you run in detached mode docker-compose will exit with the appropriate exit code once the containers are up, and you'll then see the message "I'll never be executed until docker-compose is killed".

I think I did not be clear, we launch a subprocess from Python to avoid block the terminal
https://github.com/elastic/apm-integration-testing/blob/master/scripts/modules/cli.py#L404, this is a fork of the current process, change this behavior impacts how ITs works.

That's clear and I think it's fine to run as a subprocess.

the forked process docker-compose is detached of the terminal and it reports its exit code nowhere.

And this is what I'm arguing to improve. I'm by no means an experienced python dev so take the following with a grain of salt. From the python docs it looks like it's possible to get the return code from the subprocess like so:

    def run_docker_compose_process(self, docker_compose_cmd):
        try:
            return subprocess.call(docker_compose_cmd)
            
        except OSError as err:
            print('ERROR: Docker Compose might be missing. See below for further details.\n')
            raise OSError(err)

    def start_handler(self):
        returncode = run_docker_compose_process(...)
        sys.exit(returncode)

WDYT?

@kuisathaverat

This comment has been minimized.

Copy link
Contributor

@kuisathaverat kuisathaverat commented Mar 25, 2020

In the example you give you are not running in detached mode. If you run in detached mode docker-compose will exit when the containers are up with the appropriate exit code, and you'll see the message "I'll never be executed until docker-compose is killed".

I know this is an example of what happens when you launch compose in a regular way, or you are waiting for the exit code of a process in background, the terminal is blocked waiting for the process.

And this is what I'm arguing to improve. I'm by no means experienced in python so take below with a grain of salt, but from the python docs it looks like it's possible to get the return code from the sub process:

@sqren it was possible if compose.py garb the exit code. compose.py generates the docker-compose file, launch the docker-compose -d command and exits, compose.py does not wait for docker-compose -d exit code. So there is no return code at leats the docker-compose fails to start (for example if you do not have docker or docker-compose installed) if the process is launched correctly, and docker-compose fail pulling and image or building an image.
The following code could work as you expected.

    def run_docker_compose_process(self, docker_compose_cmd):
        try:
            returncode = subprocess.call(docker_compose_cmd)
        except OSError as err:
            print('ERROR: Docker Compose might be missing. See below for further details.\n')
            raise OSError(err)
       if returncode != 0:
         sys.exit(returncode)
@sqren

This comment has been minimized.

Copy link
Member Author

@sqren sqren commented Mar 25, 2020

compose.py does not wait for docker-compose -d exit code

Hmmm.. I might be misunderstanding you, but subprocess.call returns the returncode of the docker-compose subprocess, which is exactly what I'm interested in.

I made a proof of concept here: #804

@kuisathaverat

This comment has been minimized.

Copy link
Contributor

@kuisathaverat kuisathaverat commented Mar 25, 2020

see #803 this use check_call to wrap all docker-compose calls if any fails it will return 1 as exit code, otherwise return 0.

@sqren

This comment has been minimized.

Copy link
Member Author

@sqren sqren commented Mar 25, 2020

That's great! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

2 participants
You can’t perform that action at this time.