Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception during conversion can make the process hang #791

Closed
apyrgio opened this issue Apr 30, 2024 · 3 comments · Fixed by #793
Closed

Exception during conversion can make the process hang #791

apyrgio opened this issue Apr 30, 2024 · 3 comments · Fixed by #793
Labels
bug Something isn't working timeout Dangerzone Times Out
Milestone

Comments

@apyrgio
Copy link
Contributor

apyrgio commented Apr 30, 2024

While performing our QA tests for 0.6.1 on Windows, we tested some success and error cases, to see how our termination logic (see #749 and #772) works for them. Below are the logs for each run:

Successful conversion
$ .\dev_scripts\dangerzone-cli .\tests\test_docs\sample_bmp.bmp
Converting document to safe PDF
[DEBUG] Marking doc 1648eq as 'converting'
[INFO ] > 'C:\Program Files\Docker\Docker\resources\bin\docker.EXE' run --network none -u dangerzone --security-opt=no-new-privileges:true --cap-drop all --rm -i --name dangerzone-doc-to-pixels-1648eq dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels
[INFO ] [doc 1648eq] 0% Converting page 1/1 to pixels
[INFO ] [doc 1648eq] 49% Converted document to pixels
[INFO ] Conversion output (doc to pixels)
----- DOC TO PIXELS LOG START -----
Converting page 1/1 to pixels
Converted document to pixels
----- DOC TO PIXELS LOG END -----
[WARNING] Failed to kill container dangerzone-doc-to-pixels-1648eq: Command '['C:\\Program Files\\Docker\\Docker\\resources\\bin\\docker.EXE', 'kill', 'dangerzone-doc-to-pixels-1648eq']' returned non-zero exit status 1.
[WARNING] Stdout from the kill command: b''
[WARNING] Stderr from the kill command: b'Error response from daemon: cannot kill container: dangerzone-doc-to-pixels-1648eq: No such container: dangerzone-doc-to-pixels-1648eq\n'
[INFO ] > 'C:\Program Files\Docker\Docker\resources\bin\docker.EXE' run --network none -u dangerzone --security-opt=no-new-privileges:true --cap-drop all --rm -i --name dangerzone-pixels-to-pdf-1648eq -v 'C:\Users\apyrg\AppData\Local\Temp\tmpc_8gg2ys:/safezone:Z' -e OCR=0 -e OCR_LANGUAGE=None dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.pixels_to_pdf
[INFO ] [doc 1648eq] 50% Converting page 1/1 from pixels to PDF
[INFO ] [doc 1648eq] 100% Safe PDF created
[INFO ] Conversion output: (pixels to PDF)
----- PIXELS TO PDF LOG START -----

----- PIXELS TO PDF LOG END -----
[DEBUG] Marking doc 1648eq as 'safe'
Failed conversion by the conversion process
$ .\dev_scripts\dangerzone-cli .\tests\test_docs\sample_bad_pdf.pdf
Converting document to safe PDF
[DEBUG] Marking doc Bd5ntr as 'converting'
[INFO ] > 'C:\Program Files\Docker\Docker\resources\bin\docker.EXE' run --network none -u dangerzone --security-opt=no-new-privileges:true --cap-drop all --rm -i --name dangerzone-doc-to-pixels-Bd5ntr dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels
Exception: The document format is not supported
[ERROR] [doc Bd5ntr] 0% The document format is not supported
[DEBUG] Marking doc Bd5ntr as 'failed'

Failed to convert document(s)
Failed conversion by the host
$ .\dev_scripts\dangerzone-cli .\tests\test_docs\sample_bad_max_width.pdf
Converting document to safe PDF
[DEBUG] Marking doc ervQpR as 'converting'
[INFO ] > 'C:\Program Files\Docker\Docker\resources\bin\docker.EXE' run --network none -u dangerzone --security-opt=no-new-privileges:true --cap-drop all --rm -i --name dangerzone-doc-to-pixels-ervQpR dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels
[INFO ] [doc ervQpR] 0% Converting page 1/1 to pixels
[WARNING] Failed to kill container dangerzone-doc-to-pixels-ervQpR: Command '['C:\\Program Files\\Docker\\Docker\\resources\\bin\\docker.EXE', 'kill', 'dangerzone-doc-to-pixels-ervQpR']' returned non-zero exit status 1.
[WARNING] Stdout from the kill command: b''
[WARNING] Stderr from the kill command: b'Error response from daemon: cannot kill container: dangerzone-doc-to-pixels-ervQpR: container 07a0b4524bc8f43f75ea1a3fca278fcc13598bff910f09216a3dffc176c24ddd is not running\n'
[WARNING] Conversion process did not terminate gracefully after 15 seconds. Killing it forcefully...
[ERROR] [doc ervQpR] 0% A page exceeded the maximum width.
[DEBUG] Marking doc ervQpR as 'failed'

Failed to convert document(s)

Two main things are of note here:

  1. In some error cases, our termination logic reports that the container does not exist, whereas the conversion process (docker run ...) still exists.
  2. Specifically for the failed conversion case, when that happens in the host, the conversion process seems to be stuck indefinitely, and we have to kill it.
@apyrgio apyrgio added bug Something isn't working timeout Dangerzone Times Out labels Apr 30, 2024
@apyrgio apyrgio added this to the 0.6.1 milestone Apr 30, 2024
@apyrgio
Copy link
Contributor Author

apyrgio commented Apr 30, 2024

Our original assumption was that if the container has exited, the conversion process should immediately exit as well, maybe with some small delay for some cleanups. So, we originally attributed this to a race condition.

We have noticed this happening in case of successful conversions, where indeed it's due to a race condition. If we wait a tiny bit before checking the conversion process' status, we will see that it has stopped.

Things are very different in the case of failed conversions due to exceptions in the host. In that case, what happens is:

  1. The process in the container calculates the number of pages and width/height of each page.
  2. The host process receives these calculations from stdout, and decides that a unit is out of bounds. In the case of the sample_bad_max_width.pdf document, it's the width.
  3. The host process raises an exception, and from its perspective, the conversion is over. At no point have we terminated the process running in the container!
  4. In the meantime, the container will continue working on the document, and write pixels to stdout.
  5. At some point, the container will exit or we will kill it with docker kill .... This doesn't matter much, but what matters here is that Docker is very careful with the container's stdout. It does not want to lose it, and thus the docker run ... process will block until we have read every byte from the container.
  6. Deadlock! Our termination logic does not read data from standard streams, nor does it terminate the docker process, so both process will remain stuck.
    • Thankfully, we force kill the conversion process after 15 seconds, so we do end up making progress.

@apyrgio
Copy link
Contributor Author

apyrgio commented Apr 30, 2024

Now that we know what's going on behind the scenes, let's answer the above questions:

  1. In some error cases, our termination logic reports that the container does not exist, whereas the conversion process (docker run ...) still exists.

That's because the conversion process may still linger, until the parent has read all the data that the container has written.

  1. Specifically for the failed conversion case, when that happens in the host, the conversion process seems to be stuck indefinitely, and we have to kill it.

The conversion process remains stuck, because the container has written lots of data to the stdout, and the host process has not read them.

@apyrgio
Copy link
Contributor Author

apyrgio commented Apr 30, 2024

How can we improve here? We can do the following:

  1. Unconditionally kill the container once the conversion is over, and don't check if kill succeeded, since it's racy.
    • Log a warning if the container still exists after kill, since this is not racy, and not expected.
  2. Gracefully terminate the conversion process after we terminate the container, since we at that point, we don't want to read its data.

This was referenced Apr 30, 2024
apyrgio added a commit that referenced this issue Apr 30, 2024
Gracefully terminate certain conversion processes that may get stuck
when writing lots of data to stdout. Also, handle a race condition when
a conversion process terminates slightly after the associated container.

Fixes #791
apyrgio added a commit that referenced this issue May 9, 2024
Gracefully terminate certain conversion processes that may get stuck
when writing lots of data to stdout. Also, handle a race condition when
a conversion process terminates slightly after the associated container.

Fixes #791
@apyrgio apyrgio closed this as completed in ff25fa3 May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working timeout Dangerzone Times Out
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant