Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: PGF backend crashes at program exit after creating a plot #27437

Closed
startrekdude opened this issue Dec 4, 2023 · 10 comments · Fixed by #27785
Closed

[Bug]: PGF backend crashes at program exit after creating a plot #27437

startrekdude opened this issue Dec 4, 2023 · 10 comments · Fixed by #27785
Labels
backend: pgf OS: Microsoft Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Milestone

Comments

@startrekdude
Copy link

Bug summary

After successfully creating a plot using the PGF backend, Python will crash on exit while running cleanup code.

Code for reproduction

import matplotlib
matplotlib.use("pgf")

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
ax.plot(list(range(100)), [x * 2 for x in range(100)])
fig.savefig("test.pgf", format="pgf")

Actual outcome

Two exceptions are produced at exit. The first is:

Traceback (most recent call last):
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\weakref.py", line 666, in _exitfunc
    f()
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Anon\Desktop\Projects\mplbug\venv\Lib\site-packages\matplotlib\backends\backend_pgf.py", line 277, in finalize_latex
    latex.communicate()
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\subprocess.py", line 1209, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\subprocess.py", line 1610, in _communicate
    self.stdout_thread.start()
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown

The second, which will not be produced if time elapses between plot creation and program exit (read: a sleep(0.5) call), is:

Traceback (most recent call last):
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\shutil.py", line 634, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\Anon\\AppData\\Local\\Temp\\tmp4qorphqa\\texput.aux'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\tempfile.py", line 891, in onexc
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\Anon\\AppData\\Local\\Temp\\tmp4qorphqa\\texput.aux'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\weakref.py", line 666, in _exitfunc
    f()
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\tempfile.py", line 923, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\tempfile.py", line 903, in _rmtree
    _shutil.rmtree(name, onexc=onexc)
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\shutil.py", line 796, in rmtree
    return _rmtree_unsafe(path, onexc)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\shutil.py", line 636, in _rmtree_unsafe
    onexc(os.unlink, fullname, err)
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\tempfile.py", line 894, in onexc
    cls._rmtree(path, ignore_errors=ignore_errors)
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\tempfile.py", line 903, in _rmtree
    _shutil.rmtree(name, onexc=onexc)
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\shutil.py", line 796, in rmtree
    return _rmtree_unsafe(path, onexc)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\shutil.py", line 612, in _rmtree_unsafe
    onexc(os.scandir, path, err)
  File "C:\Users\Anon\AppData\Local\Programs\Python\Python312\Lib\shutil.py", line 609, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
         ^^^^^^^^^^^^^^^^
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\Anon\\AppData\\Local\\Temp\\tmp4qorphqa\\texput.aux'

Expected outcome

No crashes.

Additional information

Workaround that prevents both exceptions is running weakref.finalize._exitfunc() before program exit.
Workaround that prevents the second exception only is letting time elapse before program exit (like with a time.sleep(0.5) call).

Operating system

Windows 11 Enterprise 22H2

Matplotlib Version

3.8.2

Matplotlib Backend

pgf

Python version

3.12.0

Jupyter version

No response

Installation

pip

@anntzer
Copy link
Contributor

anntzer commented Dec 5, 2023

This appears to be a consequence of python/cpython@ce558e6 (disabling creation of new threads at shutdown) combined with the fact that on Windows, Popen._communicate is implemented by running proc.stdout.read() in a new thread with a timeout (POSIX uses selectors and thus doesn't have this problem). I suppose the "easy" way to fix that (again, only for Windows) is to effectively inline Popen._communicate implementation for finalize_latex (dropping out the unneeded parts as we don't actually care about the outputs at that point, nor about the timeouts), but making sure to start the reader threads much earlier (in _setup_latex_process) and letting them just wait until signalled (e.g. via a threading.Event) by finalize_latex to start reading.

Edit: perhaps

diff --git i/lib/matplotlib/backends/backend_pgf.py w/lib/matplotlib/backends/backend_pgf.py
index 78d15fe29b..29f04f8577 100644
--- i/lib/matplotlib/backends/backend_pgf.py
+++ w/lib/matplotlib/backends/backend_pgf.py
@@ -257,7 +257,7 @@ class LatexManager:
         # Open LaTeX process for real work; register it for deletion.  On
         # Windows, we must ensure that the subprocess has quit before being
         # able to delete the tmpdir in which it runs; in order to do so, we
-        # must first `kill()` it, and then `communicate()` with it.
+        # must first `kill()` it, and then `wait()` with it.
         try:
             self.latex = subprocess.Popen(
                 [mpl.rcParams["pgf.texsystem"], "-halt-on-error"],
@@ -274,7 +274,7 @@ class LatexManager:

         def finalize_latex(latex):
             latex.kill()
-            latex.communicate()
+            latex.wait()

         self._finalize_latex = weakref.finalize(
             self, finalize_latex, self.latex)

is enough in practice?

@anntzer anntzer added OS: Microsoft Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. labels Jan 25, 2024
@anntzer anntzer added this to the v3.8.3 milestone Jan 25, 2024
@anntzer
Copy link
Contributor

anntzer commented Jan 25, 2024

Even though the pgf file is correctly created I think the crash is annoying enough (and may leave temporary files lying around perhaps?) to mark this as RC, especially as there's a tentative patch. (A possible issue with using wait() is mentioned at https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait re: pipes filling up, but I think this may not be a possible issue here?)

@tacaswell
Copy link
Member

Is it worth opening an issue with upstream as this may not have been understand to be a sideeffect of that change?

@anntzer
Copy link
Contributor

anntzer commented Jan 25, 2024

I agree it's a bit annoying but I'm not really sure what they could do to fix that in the general case. One specific thing that could be done, though, is that right now they skip handling stdin/stdout/stderr in separate threads if at least two of them are None and there is no timeout (in that case they can just handle the remaining fd (if any) in the main thread) -- see the definition of Popen.communicate --; however, they could also do so if input is None (I think) (as in that case stdin effectively does nothing), which would happen to handle our use-case (and is probably not so rare in general).

@tacaswell
Copy link
Member

I'll try to do this next Tuesday if someone does not beat me to it.

@tacaswell
Copy link
Member

I did not get to opening an issue upstream yet.

@tacaswell
Copy link
Member

A concern with switching from communicate to wait is that the docs warn us of a possible deadlock:

https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait

Note This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.

but I think in this case we should always be "small" while tearing down a the latex process.

Can we do

diff --git a/lib/matplotlib/backends/backend_pgf.py b/lib/matplotlib/backends/backend_pgf.py
index 78d15fe29b..ba01622bd0 100644
--- a/lib/matplotlib/backends/backend_pgf.py
+++ b/lib/matplotlib/backends/backend_pgf.py
@@ -257,7 +257,8 @@ class LatexManager:
         # Open LaTeX process for real work; register it for deletion.  On
         # Windows, we must ensure that the subprocess has quit before being
         # able to delete the tmpdir in which it runs; in order to do so, we
-        # must first `kill()` it, and then `communicate()` with it.
+        # must first `kill()` it, and then `communicate()` with or `wait()` on
+        # it.
         try:
             self.latex = subprocess.Popen(
                 [mpl.rcParams["pgf.texsystem"], "-halt-on-error"],
@@ -274,7 +275,10 @@ class LatexManager:

         def finalize_latex(latex):
             latex.kill()
-            latex.communicate()
+            try:
+                latex.communicate()
+            except RuntimeError:
+                latex.wait()

         self._finalize_latex = weakref.finalize(
             self, finalize_latex, self.latex)

@anntzer
Copy link
Contributor

anntzer commented Feb 9, 2024

Indeed, the deadlock seems improbable in this case; I guess the try... except approach is fine too.

@ksunden
Copy link
Member

ksunden commented Feb 13, 2024

Does somebody with a windows dev machine have the ability to test the proposed workaround?

Or are we convinced the ultimate solution is upstream and we should not be implementing the workaround here?

This is marked as Release Critical, though unless it gets in soon will probably have to be bumped because we are looking to release 3.8.3 with a MacOS deadlock fix.

@tacaswell
Copy link
Member

I think we can (blindly) go with the try...except solution.

tacaswell added a commit to tacaswell/matplotlib that referenced this issue Feb 13, 2024
On windows on py312 we can not use `.communicate` during process shutdown
because it internally uses Python threads which can no longer be created during
interpreter shutdown.

Closes matplotlib#27437
Impaler343 pushed a commit to Impaler343/matplotlib that referenced this issue Mar 8, 2024
On windows on py312 we can not use `.communicate` during process shutdown
because it internally uses Python threads which can no longer be created during
interpreter shutdown.

Closes matplotlib#27437
Impaler343 pushed a commit to Impaler343/matplotlib that referenced this issue Mar 14, 2024
On windows on py312 we can not use `.communicate` during process shutdown
because it internally uses Python threads which can no longer be created during
interpreter shutdown.

Closes matplotlib#27437
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: pgf OS: Microsoft Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants