
Conversation

@nmheim
Contributor

@nmheim nmheim commented Aug 28, 2025

Relevant issue or PR

#356

Description of changes

  • Add a threading.Lock to tesseract_core.runtime.logs.LogPipe
  • Add a threading.Lock to tesseract_core.sdk.logs.LogPipe
  • Make the apply endpoint as picklable as possible
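The locking change described above can be sketched as follows. This is an illustrative stand-in, not the actual `tesseract_core` implementation: the class name matches the source, but the internals (a buffer and a `write` method) are assumptions for the sake of the example.

```python
import threading


class LogPipe:
    """Minimal sketch of a pipe-like log sink guarded by a threading.Lock.

    Illustrative only; the real tesseract_core LogPipe has different internals.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._buffer = []

    def write(self, message: str) -> None:
        # Serialize writes so concurrent threads cannot interleave or drop lines
        with self._lock:
            self._buffer.append(message)

    def flush(self) -> None:
        with self._lock:
            self._buffer.clear()


# Several threads writing concurrently; the lock keeps each write atomic.
pipe = LogPipe()
threads = [
    threading.Thread(target=lambda: [pipe.write("x") for _ in range(100)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same idea applies to both the runtime and SDK variants of the class: any state mutated from multiple threads is touched only while holding the lock.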

Testing done

  • Add new reproducer example to tests (and rename the example?)

@codecov

codecov bot commented Aug 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.66%. Comparing base (a6ad367) to head (16e507c).

❗ There is a different number of reports uploaded between BASE (a6ad367) and HEAD (16e507c).

HEAD has 3 fewer uploads than BASE:

| Flag | BASE (a6ad367) | HEAD (16e507c) |
|------|----------------|----------------|
|      | 29             | 26             |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #357      +/-   ##
==========================================
- Coverage   76.35%   70.66%   -5.69%     
==========================================
  Files          29       29              
  Lines        3345     3348       +3     
  Branches      525      525              
==========================================
- Hits         2554     2366     -188     
- Misses        558      761     +203     
+ Partials      233      221      -12     

☔ View full report in Codecov by Sentry.


@nmheim nmheim requested a review from apaleyes September 1, 2025 11:43

docker system prune --force

tesseract run reproducer apply '{"inputs":{}}' --output-path outputs
Contributor

@johnbcoughlin johnbcoughlin Sep 2, 2025


if we're going to commit this example let's name it something more specific, like process-pool-executor-reproducer

@josiahbjorgaard josiahbjorgaard self-requested a review September 29, 2025 14:45
@josiahbjorgaard

josiahbjorgaard commented Sep 30, 2025

I dug into how we are doing this log redirect and the calling thread executes here - the LogPipe thread is executed within an ExitStack(), which I now know means that it calls each item as if it were a context manager.

This is why LogPipe does something unusual: it starts and joins the thread inside its __enter__ and __exit__ overrides. That is necessary because ExitStack() only invokes the context manager methods of LogPipe, so the thread must be started and joined there. I find this structure overly complicated, because I don't see the need to run LogPipe in a thread at all: we necessarily block the calling thread until LogPipe's run has finished executing, which is why we need .join() in __exit__. I also don't see the need for it to be a context manager; we could instead do this in the calling function:

logpipe = LogPipe()
logpipe.start()
logpipe.join()

but still, why a thread if we block on join without doing other work?

Regardless, without restructuring the LogPipe class, I think the solution here is fine and I'm able to test it successfully. I recommend adding a warning on join timeout, because at the timeout we are effectively telling the LogPipe thread to stop running its run method.
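The pattern described above can be sketched as follows. `LogPipeSketch` is a hypothetical stand-in for the real LogPipe (its run body just records that it ran); what it shows is how `contextlib.ExitStack` drives the thread purely through the context manager protocol, which is why start and join live in `__enter__`/`__exit__`:

```python
import contextlib
import threading


class LogPipeSketch(threading.Thread):
    """Illustrative stand-in for LogPipe: the thread is started in
    __enter__ and joined in __exit__, so ExitStack can manage it
    through the context manager protocol alone."""

    def __init__(self):
        super().__init__()
        self.captured = []

    def run(self):
        # The real LogPipe drains a pipe here; we just record that we ran.
        self.captured.append("drained")

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Block the calling thread until run() finishes, with a timeout
        # so a wedged logging thread cannot hang the caller indefinitely.
        self.join(timeout=5)
        return False


with contextlib.ExitStack() as stack:
    pipe = stack.enter_context(LogPipeSketch())
# When the stack unwinds, __exit__ has joined the thread.
```

This also makes the objection concrete: since `__exit__` blocks on `join`, the caller gains no concurrency from the thread unless it does other work between `__enter__` and `__exit__`.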


@josiahbjorgaard josiahbjorgaard left a comment


LGTM, suggest a warning and a question.

Curious whether you think refactoring LogPipe to not be threaded would give cleaner code with the same functionality?

# Use a timeout so something weird happening in the logging thread doesn't
# cause this to hang indefinitely
#
# FIXME: this always times out in the multiprocessing case?


I tested this with timeout=1, timeout=10, and no timeout, and the reproducer showed no error; but the right timeout is going to vary with how long run() takes to execute.

Should we add a warning here so that we can revise the timeout later if we frequently find an issue with the timeout, or set it to something even longer?

if self.is_alive():
    warnings.warn("LogPipe thread timed out while executing.")
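A self-contained sketch of the suggested guard. The `join_with_warning` helper is hypothetical (not part of the codebase); it shows the mechanics of joining with a timeout and warning via `warnings.warn` when the thread is still alive afterwards:

```python
import threading
import time
import warnings


def join_with_warning(thread: threading.Thread, timeout: float) -> None:
    """Join with a timeout; warn if the thread is still running afterwards.

    Hypothetical helper illustrating the suggested guard above.
    """
    thread.join(timeout=timeout)
    if thread.is_alive():
        warnings.warn("LogPipe thread timed out while executing.")


# Simulate a slow logging thread that outlives the join timeout.
slow = threading.Thread(target=time.sleep, args=(2,), daemon=True)
slow.start()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    join_with_warning(slow, timeout=0.1)
```

Note that `Thread.join(timeout=...)` does not stop the thread; it only stops waiting for it, which is exactly why the warning is useful for revising the timeout later.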


Do we need the FIXME? It works for me.

@PasteurBot
Contributor

CLA signatures required

Thank you for your PR, we really appreciate it! Like many open-source projects, we ask that all contributors sign our Contributor License Agreement before we can accept your contribution. This only needs to be done once per contributor. You can do so by commenting the following on this pull request:


@PasteurBot I have read the CLA Document and I hereby sign the CLA


1 out of 2 committers have signed the CLA.
✅ [nmheim](https://github.com/nmheim)
@josiahbjorgaard
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@dionhaefner
Contributor

superseded by #392

@pasteurlabs pasteurlabs locked and limited conversation to collaborators Nov 12, 2025