Skip to content
This repository has been archived by the owner on Jan 23, 2024. It is now read-only.

Preserve exit code when docker-compose exists with an error #800

Closed
sorenlouv opened this issue Mar 24, 2020 · 11 comments · Fixed by #803
Closed

Preserve exit code when docker-compose exists with an error #800

sorenlouv opened this issue Mar 24, 2020 · 11 comments · Fixed by #803

Comments

@sorenlouv
Copy link
Member

Invoking apm-integration testing doesn't preserve the exit code from docker-compose.

Demonstration

scripts/compose.py start master

While running the above command, I kill docker which results in the following error:

ERROR: dial unix docker.raw.sock: connect: connection refused

However, the exit code is 0 indicating that everything went fine.

image

@kuisathaverat
Copy link
Contributor

kuisathaverat commented Mar 24, 2020

we run the docker-compose in detached mode, so this is the expected behavior the process is launched in background, when we saw this abrupt fail?

https://github.com/elastic/apm-integration-testing/blob/master/scripts/modules/cli.py#L629-L634

@kuisathaverat
Copy link
Contributor

I have remembered how we manage this on the test, we use docker-compose-wait to wait for all the containers up and ready if docker-compose-wait failed the make target fails.

https://github.com/elastic/apm-integration-testing/blob/master/Makefile#L116

@sorenlouv
Copy link
Member Author

we run the docker-compose in detached mode, so this is the expected behavior the process is launched in background, when we saw this abrupt fail?

Is this a question for me?
I don't think this is expected behaviour from a consumer point-of-view. When any subprocess of compose.py fails, that should be propagated up, so the consumer can handle it. Right now it's being swallowed.

@kuisathaverat
Copy link
Contributor

@sqren I meant, this is the expected behavior of docker-compose when you launch docker-compose up -d you are running it in the background so you do not care about the return code. If you run it in foreground it takes the terminal to show you the logs, so you can not make anything else in the terminal. Because compose.py is used to generate the docker-compose file and launch the containers, we delegate the responsibility of check everything is OK to docker-compose-wait and then we can run the tests. Everything is orchestrated by make docker-test-agent-go for example.

What it is the use case where you would run compose.py start and you need the return code in the same terminal? remember that if you launch docker-compose up in the foreground and everything goes OK the terminal is taken by docker-comose, you cannot run anything else in that terminal after that.

@sorenlouv
Copy link
Member Author

sorenlouv commented Mar 25, 2020

this is the expected behavior of docker-compose when you launch docker-compose up -d you are running it in the background so you do not care about the return code

I don't think this is accurate. docker-compose will run in the foreground until images have been downloaded and containers built. It'll then exit as soon containers are up. See this for instance:

image

In the above case docker compose returns exit code "1" because an error happened while downloading the image. If it had succeeded it would exit with code 0 and continue running the container(s) in the background

scripts/compose.py also runs the build step in the foreground but returns a 0 exit code even though it failed to download the image:
image

So to be clear: I'm not talking about the exit code if any of the docker containers exit unexpectedly. I'm talking about the exit code for the docker-compose process itself.

What it is the use case where you would run compose.py start and you need the return code in the same terminal?

I'm running ./scripts/compose.py as part of a bigger bash program. I want to exit the bash program if ./scripts/compose.py fails. The exit code is the easiest way to achieve this.

@kuisathaverat
Copy link
Contributor

kuisathaverat commented Mar 25, 2020

@sqren
I think I did not be clear, we launch a subprocess from Python to avoid block the terminal
https://github.com/elastic/apm-integration-testing/blob/master/scripts/modules/cli.py#L404, this is a fork of the current process, change this behavior impacts how ITs works.

I'm running ./scripts/compose.py as part of a bigger bash program. I want to exit the bash program if ./scripts/compose.py fails. The exit code is the easiest way to achieve this.

if the do not launch a subprocess this terminal blocks for the docker-compose process, so you can not continue with the execution of the script. see the following script.

#!/usr/bin/env bash

docker-compose up
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"
echo "I'll never be executed until docker-compose is killed"
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"
echo "*****************************************************"

On the other screen capture, you post

image

in this case compose.py is no longer running, it ends with code 0, the code that you have in the command "echo $?", the forked process docker-compose is detached of the terminal and it reports its exit code nowhere.

You should use the same approach we use for the ITs call docker-compose-wait command, it is a Python command that waits for all Docker container in the docker-compose file

@victor is working on the CI part of the execution of your script, we see that the script has a bunch of logic to orchestrate the execution of the containers, that logic is easy to manage on the docker-compose side. When we have the final solution for the CI we will review the script to see if you are interested in removing some parts and relay on the compose.py

@sorenlouv
Copy link
Member Author

sorenlouv commented Mar 25, 2020

if the do not launch a subprocess this terminal blocks for the docker-compose process, so you can not continue with the execution of the script. see the following script.

In the example you give you are not running in detached mode. If you run in detached mode docker-compose will exit with the appropriate exit code once the containers are up, and you'll then see the message "I'll never be executed until docker-compose is killed".

I think I did not be clear, we launch a subprocess from Python to avoid block the terminal
https://github.com/elastic/apm-integration-testing/blob/master/scripts/modules/cli.py#L404, this is a fork of the current process, change this behavior impacts how ITs works.

That's clear and I think it's fine to run as a subprocess.

the forked process docker-compose is detached of the terminal and it reports its exit code nowhere.

And this is what I'm arguing to improve. I'm by no means an experienced python dev so take the following with a grain of salt. From the python docs it looks like it's possible to get the return code from the subprocess like so:

    def run_docker_compose_process(self, docker_compose_cmd):
        try:
            return subprocess.call(docker_compose_cmd)
            
        except OSError as err:
            print('ERROR: Docker Compose might be missing. See below for further details.\n')
            raise OSError(err)

    def start_handler(self):
        returncode = run_docker_compose_process(...)
        sys.exit(returncode)

WDYT?

@kuisathaverat
Copy link
Contributor

In the example you give you are not running in detached mode. If you run in detached mode docker-compose will exit when the containers are up with the appropriate exit code, and you'll see the message "I'll never be executed until docker-compose is killed".

I know this is an example of what happens when you launch compose in a regular way, or you are waiting for the exit code of a process in background, the terminal is blocked waiting for the process.

And this is what I'm arguing to improve. I'm by no means experienced in python so take below with a grain of salt, but from the python docs it looks like it's possible to get the return code from the sub process:

@sqren it was possible if compose.py garb the exit code. compose.py generates the docker-compose file, launch the docker-compose -d command and exits, compose.py does not wait for docker-compose -d exit code. So there is no return code at leats the docker-compose fails to start (for example if you do not have docker or docker-compose installed) if the process is launched correctly, and docker-compose fail pulling and image or building an image.
The following code could work as you expected.

    def run_docker_compose_process(self, docker_compose_cmd):
        try:
            returncode = subprocess.call(docker_compose_cmd)
        except OSError as err:
            print('ERROR: Docker Compose might be missing. See below for further details.\n')
            raise OSError(err)
       if returncode != 0:
         sys.exit(returncode)

@sorenlouv
Copy link
Member Author

compose.py does not wait for docker-compose -d exit code

Hmmm.. I might be misunderstanding you, but subprocess.call returns the returncode of the docker-compose subprocess, which is exactly what I'm interested in.

I made a proof of concept here: #804

@kuisathaverat
Copy link
Contributor

kuisathaverat commented Mar 25, 2020

see #803 this use check_call to wrap all docker-compose calls if any fails it will return 1 as exit code, otherwise return 0.

@sorenlouv
Copy link
Member Author

That's great! 👍

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
2 participants