homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals / multiprocessing #31344

Closed
mkoeppe opened this issue Feb 5, 2021 · 37 comments

Comments

@mkoeppe
Member

mkoeppe commented Feb 5, 2021

(from #31335, reported in https://groups.google.com/g/sage-devel/c/9EMs9h2i_H4)

CC: @jhpalmieri @zlscherr @kiwifb @kliem

Component: build

Author: Matthias Koeppe, John Palmieri

Branch/Commit: b4ceee5

Reviewer: John Palmieri

Issue created by migration from https://trac.sagemath.org/ticket/31344

@mkoeppe mkoeppe added this to the sage-9.3 milestone Feb 5, 2021
@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

comment:1

Bisecting src/doc/en/reference/misc/index.rst (running ./sage -docbuild --keep-going all html) reveals that the crash is coming from sage.misc.cython

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

comment:2

For some reason, this line: cblas_pc = pkgconfig.parse(get_cblas_pc_module_name())
seems to cause the trouble.

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

New commits:

80720d7 src/sage/misc/cython.py: Do not run pkgconfig at import time
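
A minimal sketch of the idea named in this commit title: move a pkg-config query out of module import and into a cached function that runs on first use. The names `query_pkgconfig` and `get_cblas_flags` are hypothetical stand-ins, not the actual code in src/sage/misc/cython.py, and the stand-in returns fixed data instead of spawning a subprocess.

```python
from functools import lru_cache

# Records each (simulated) pkg-config invocation so the laziness is observable.
calls = []

def query_pkgconfig(module_name):
    """Hypothetical stand-in for pkgconfig.parse(); the real call spawns a subprocess."""
    calls.append(module_name)
    return {"libraries": [module_name], "include_dirs": []}

# Before: a module-level assignment such as
#     cblas_pc = query_pkgconfig("cblas")
# would run as soon as the module is imported.

@lru_cache(maxsize=None)
def get_cblas_flags(module_name="cblas"):
    """Run the query on first use only and cache the result."""
    return query_pkgconfig(module_name)

# After: nothing is queried until a caller actually needs the flags.
flags = get_cblas_flags()
flags_again = get_cblas_flags()  # served from the cache, no second query
```

With this shape, importing the module no longer forks a subprocess, which is presumably what matters when the import happens inside a docbuild worker.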

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

Author: Matthias Koeppe

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

Commit: 80720d7

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

comment:5

Easiest to test on #31335, which merges this branch

@jhpalmieri
Member

comment:6

With #31335, I still see a failure during docbuilding when using homebrew's Python with Big Sur. The failure now appears when building thematic_tutorials instead of the reference manual.

------------------------------------------------------------------------
0   signals.cpython-39-darwin.so        0x00000001047e2542 print_backtrace + 66
1   signals.cpython-39-darwin.so        0x00000001047e6167 sigdie + 39
2   signals.cpython-39-darwin.so        0x00000001047e606a cysigs_signal_handler + 282
3   libsystem_platform.dylib            0x00007fff20486d7d _sigtramp + 29
4   Python                              0x00000001029edcf1 _PyArg_ParseTuple_SizeT + 158
5   libtcl8.6.dylib                     0x000000034143972e AtForkPrepare + 38
6   libsystem_pthread.dylib             0x00007fff204421a3 _pthread_atfork_prepare_handlers + 90
7   libSystem.B.dylib                   0x00007fff2a645934 libSystem_atfork_prepare + 11
8   libsystem_c.dylib                   0x00007fff20325b1b fork + 12
9   _posixsubprocess.cpython-39-darwin. 0x00000001030d77f3 subprocess_fork_exec + 860
10  Python                              0x000000010291c2da cfunction_call + 90
11  Python                              0x00000001028d1b56 _PyObject_MakeTpCall + 129
12  Python                              0x00000001029ca625 call_function + 278
13  Python                              0x00000001029c7e86 _PyEval_EvalFrameDefault + 45416
14  Python                              0x00000001029bbbd6 _PyEval_EvalCode + 403
...
272 Python                              0x00000001028d2774 _PyFunction_Vectorcall + 376
273 Python                              0x0000000102a3ade0 pymain_run_module + 212
274 Python                              0x0000000102a3a8aa pymain_run_python + 433
275 Python                              0x0000000102a3a6bd Py_RunMain + 23
276 Python                              0x0000000102a3b9da pymain_main + 35
277 Python                              0x0000000102a3bcb0 Py_BytesMain + 42
278 libdyld.dylib                       0x00007fff2045d621 start + 1
------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

comment:7

Thanks for testing! I'll try a clean rebuild of the documentation and see if I can reproduce it on Catalina as well.

For reference, the trick for bisecting was to use

make build && ./sage -docbuild --keep-going all html ; ./sage -docbuild all html

The --keep-going in the first docbuild command was necessary so that the message "WARNING: document isn't included in any toctree" does not stop the whole process.

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

comment:8

OK, I can reproduce it

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

comment:9

reducing thematic_tutorials/index.rst to the following still reproduces the crash:

.. Sage documentation master file, created by sphinx-quickstart on Thu
.. Aug 21 20:15:55 2008. You can adapt this file completely to your
.. liking, but it should at least contain the root `toctree` directive.

.. _thematic-tutorials:

Welcome to the Sage Thematic Tutorials!
=======================================


* `Tutorial: Symbolics and Plotting (PREP) <../prep/Symbolics-and-Basic-Plotting.html>`_

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

comment:10

That's in an incremental docbuild - so something bad must have been saved in the inventory.

@jhpalmieri
Member

comment:11

When I saw the original problem, I only saw it on the second pass through the ref manual build, which is consistent with seeing problems based on something in the inventory.

@zlscherr

zlscherr commented Feb 5, 2021

comment:12

Does anyone know why

./sage --docbuild all html

fails at thematic_tutorial but

./sage --docbuild thematic_tutorial html

works?

@zlscherr

zlscherr commented Feb 5, 2021

comment:13

In fact, after

./sage -docbuild --keep-going all html

failed, I tried building thematic_tutorial by itself. That worked, and then make doc says it was successful.

@mkoeppe
Member Author

mkoeppe commented Feb 5, 2021

comment:14

I think the bug is triggered by the parallelization code in sage_setup.docbuild.AllBuilder.

@mkoeppe
Member Author

mkoeppe commented Feb 6, 2021

comment:15

We previously had trouble with this code (build_many - from #28356, #27514, #27490) in #30351, #28483, ...

@mkoeppe
Member Author

mkoeppe commented Feb 6, 2021

comment:16

see also #31289

@mkoeppe
Member Author

mkoeppe commented Feb 6, 2021

comment:17

In any case, I think this ticket is an improvement by itself, as it removes some accidental globals from the module sage.misc.cython and reduces its load time.

@mkoeppe mkoeppe changed the title from "homebrew: docbuild crashes, libtcl AtForkPrepare" to "homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals" on Feb 6, 2021
@jhpalmieri
Member

comment:19

With this change, the documentation builds for me (but of course it is missing a plot):

diff --git a/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst b/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
index 9faa9f2375..bc77d72e68 100644
--- a/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
+++ b/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
@@ -94,7 +94,6 @@ Vector fields can be plotted::
     E = EuclideanSpace(3)
     x, y, z = E.default_chart()[:]
     v = E.vector_field(-y, x, sin(x*y*z), name='v')
-    sphinx_plot(v.plot(max_range=1.5, scale=0.5))
 
 For customizing the plot, see the list of options in the documentation of
 :meth:`~sage.manifolds.differentiable.vectorfield.VectorField.plot`.

@mkoeppe
Member Author

mkoeppe commented Feb 6, 2021

comment:20

This does not seem to help on my machine

@jhpalmieri
Member

comment:21

Sorry, it turns out that it doesn't consistently help on mine, either. I think this should stop the non-reference manual docs from being built in parallel:

diff --git a/src/sage_setup/docbuild/__init__.py b/src/sage_setup/docbuild/__init__.py
index b07e9c100c..1d4139555e 100644
--- a/src/sage_setup/docbuild/__init__.py
+++ b/src/sage_setup/docbuild/__init__.py
@@ -286,13 +286,15 @@ class DocBuilder(object):
 
 from .utils import build_many as _build_many
 
-def build_many(target, args):
+def build_many(target, args, processes=None):
     """
     Thin wrapper around `sage_setup.docbuild.utils.build_many` which uses the
     docbuild settings ``NUM_THREADS`` and ``ABORT_ON_ERROR``.
     """
+    if processes is None:
+        processes = NUM_THREADS
     try:
-        _build_many(target, args, processes=NUM_THREADS)
+        _build_many(target, args, processes=processes)
     except BaseException as exc:
         if ABORT_ON_ERROR:
             raise
@@ -349,7 +351,7 @@ class AllBuilder(object):
 
         # build the other documents in parallel
         L = [(doc, name, kwds) + args for doc in others]
-        build_many(build_other_doc, L)
+        build_many(build_other_doc, L, 1)
         logger.warning("Elapsed time: %.1f seconds."%(time.time()-start))
         logger.warning("Done building the documentation!")
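
The diff above lets a caller override the worker count while keeping `NUM_THREADS` as the default. Here is a self-contained sketch of that pattern. `_build_many` is a simplified serial stand-in (an assumption, so the example runs without Sage), the `NUM_THREADS` value is made up, and the wrapper returns results only to make the behaviour easy to inspect.

```python
NUM_THREADS = 4  # assumed value; in the docbuild this comes from its settings

# Records the process count the stand-in was asked to use.
log = []

def _build_many(target, args, processes):
    """Simplified stand-in for sage_setup.docbuild.utils.build_many (runs serially)."""
    log.append(processes)
    return [target(*a) for a in args]

def build_many(target, args, processes=None):
    """Use NUM_THREADS unless the caller asks for a specific process count."""
    if processes is None:
        processes = NUM_THREADS
    return _build_many(target, args, processes=processes)

def build_other_doc(doc):
    return "built " + doc

build_many(build_other_doc, [("tutorial",)])      # default: NUM_THREADS
build_many(build_other_doc, [("tutorial",)], 1)   # forced single process
```

Using `None` as the default rather than `processes=NUM_THREADS` means the setting is read at call time, so a later change to `NUM_THREADS` is still picked up.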

@jhpalmieri
Member

comment:22

#31289 doesn't seem to help, by the way.

@mkoeppe
Member Author

mkoeppe commented Feb 7, 2021

comment:23

Perhaps conditionalize this change on macOS?

@mkoeppe
Member Author

mkoeppe commented Feb 7, 2021

comment:24

I've pushed this change to the branch of #31335, but it does not actually fix the problem for me. Next I'll try whether replacing the build_many call with a for loop helps.

@mkoeppe
Member Author

mkoeppe commented Feb 7, 2021

comment:25

(retracted)

@mkoeppe mkoeppe changed the title from "homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals" to "homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals / multiprocessing" on Feb 7, 2021
@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 7, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

515f899 sage_setup.docbuild.AllBuilder: stop the non-reference manual docs from being built in parallel
804ebd7 sage_setup.dpcbuild.AllBuilder: Restrict workaround to macOS
b4ceee5 sage_setup.docbuild: In the workaround, do not go through build_many to build serially
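
Taken together, the three commit titles above suggest a workaround of roughly this shape: on macOS, build the remaining documents one after another in a plain loop, and keep the parallel path elsewhere. The sketch below is an assumption about that logic, not the code from the branch. `build_docs`, `parallel_build`, and the `platform` parameter are hypothetical names introduced for illustration, and the parallel stand-in uses threads only so that the example stays portable.

```python
import sys
from multiprocessing.pool import ThreadPool

def parallel_build(target, args, processes):
    """Stand-in for a parallel builder; the real docbuild uses worker processes."""
    with ThreadPool(processes) as pool:
        return pool.starmap(target, args)

def build_docs(target, args, processes=2, platform=sys.platform):
    """Build serially on macOS ('darwin'), in parallel everywhere else."""
    if platform == "darwin":
        # Plain loop: the builder itself creates no worker processes.
        return [target(*a) for a in args]
    return parallel_build(target, args, processes)

def build_other_doc(doc, fmt):
    return f"{doc}.{fmt}"

jobs = [("thematic_tutorials", "html"), ("prep", "html")]
serial_result = build_docs(build_other_doc, jobs, platform="darwin")
parallel_result = build_docs(build_other_doc, jobs, platform="linux")
```

Both paths return the same results in the same order; only the way the work is scheduled differs, which is why a platform check can select between them without touching the callers.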

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 7, 2021

Changed commit from 80720d7 to b4ceee5

@mkoeppe
Member Author

mkoeppe commented Feb 7, 2021

comment:28

This fixes the problem on my machine. Please test on Big Sur

@mkoeppe
Member Author

mkoeppe commented Feb 7, 2021

Changed author from Matthias Koeppe to Matthias Koeppe, John Palmieri

@zlscherr

zlscherr commented Feb 8, 2021

comment:31

Worked for me on Big Sur

@jhpalmieri
Member

Reviewer: John Palmieri

@jhpalmieri
Member

comment:32

This works for me, too. It would be nice to know what the actual problem is beyond "some murky issue with parallel docbuilding on OS X," but it's good enough to merge. @zlscherr, feel free to add your real name to the reviewers field (and also to the wiki page, if you want).

@mkoeppe
Member Author

mkoeppe commented Feb 8, 2021

comment:33

Thanks!

@dimpase
Member

dimpase commented Feb 8, 2021

comment:34

this fixed the docbuild crash on my Big Sur box too

@slel
Member

slel commented Feb 21, 2021

comment:35

On macOS 10.14.6: dochtml builds with this, while it does not with 9.3.beta7 or #31419.

@vbraun
Member

vbraun commented Mar 1, 2021
