Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cygwin: test failure in sage.manifolds.differentiable.affine_connection #22694

Closed
embray opened this issue Mar 28, 2017 · 20 comments
Closed

Cygwin: test failure in sage.manifolds.differentiable.affine_connection #22694

embray opened this issue Mar 28, 2017 · 20 comments

Comments

@embray
Copy link
Contributor

embray commented Mar 28, 2017

I'm consistently getting these test failures in this module on my Cygwin branch (just rebased on the latest develop branch, but I've had this for a couple weeks now):

sage -t --long --warn-long 158.6 src/sage/manifolds/differentiable/affine_connection.py
**********************************************************************
File "src/sage/manifolds/differentiable/affine_connection.py", line 1749, in sage.manifolds.differentiable.affine_connection.AffineConnection.riemann
Failed example:
    for i in M.irange():
        for j in M.irange():
            for k in M.irange():
                nab.add_coef(eV)[i,j,k] = nab.coef(eVW)[i,j,k,c_uvW].expr()
Exception raised:
    Traceback (most recent call last):
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.manifolds.differentiable.affine_connection.AffineConnection.riemann[30]>", line 4, in <module>
        nab.add_coef(eV)[i,j,k] = nab.coef(eVW)[i,j,k,c_uvW].expr()
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 628, in coef
        gam[[k,i,j]] = self(ev[i])(ef[k],ev[j])
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 1348, in __call__
        return self._derive_paral(tensor_r)
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 1470, in _derive_paral
        for ii,val in make_CovDerivative(listParalInput):
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/parallel/multiprocessing_sage.py", line 75, in parallel_iter
        for res in result:
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python/multiprocessing/pool.py", line 668, in next
        raise value
    RuntimeError: Aborted
**********************************************************************
File "src/sage/manifolds/differentiable/affine_connection.py", line 1754, in sage.manifolds.differentiable.affine_connection.AffineConnection.riemann
Failed example:
    r = nab.riemann() ; r
Exception raised:
    Traceback (most recent call last):
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.manifolds.differentiable.affine_connection.AffineConnection.riemann[31]>", line 1, in <module>
        r = nab.riemann() ; r
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 1776, in riemann
        gam_gam = gam.contract(1, gam, 0)
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/tensor/modules/comp.py", line 2316, in contract
        for ii, val in make_Contraction(listParalInput):
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/parallel/multiprocessing_sage.py", line 75, in parallel_iter
        for res in result:
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python/multiprocessing/pool.py", line 668, in next
        raise value
    TypeError: Aborted
**********************************************************************
1 item had failures:
   2 of  36 in sage.manifolds.differentiable.affine_connection.AffineConnection.riemann
    [418 tests, 2 failures, 229.98 s]

I can reproduce this outside the test suite as well. The same code works fine without parallel processing, so my guess is a problem in the parallel processing itself (though the tests for sage.parallel itself all pass).

Upstream report: https://lists.opendylan.org/pipermail/bdwgc/2017-March/006266.html

Upstream: Reported upstream. No feedback yet.

CC: @man74cio @egourgoulhon @jpflori

Component: porting: Cygwin

Keywords: windows cygwin manifolds parallel

Author: Erik Bray

Branch/Commit: 877a3fd

Reviewer: Eric Gourgoulhon, Jean-Pierre Flori

Issue created by migration from https://trac.sagemath.org/ticket/22694

@embray embray added this to the sage-8.0 milestone Mar 28, 2017
@embray
Copy link
Contributor Author

embray commented Mar 28, 2017

comment:1

The RuntimeError: Aborted is from Cysignals. Something is causing a SIGABRT somewhere, though I'm not sure why.

@embray
Copy link
Contributor Author

embray commented Mar 29, 2017

comment:3

Attachment: stacktrace.log

By the way, I've made significant progress in getting to the bottom of this. The full traceback leading up to the SIGABRT is attached.

But the main issue here seems to have to do with libgc's handling of threads in the child process after a fork, on Cygwin. In principle libgc explicitly supports Cygwin, but there still seems to be a bug somewhere related to this (or possibly just a build issue though I haven't found that to be the case yet...). I'm trying to boil it down to a simpler test case, but the issue seems to be that for one reason or another the post-fork handler is either not running at all, or not working properly, because in the forked process it's leaving behind a reference to a thread from the parent process, which has a handle to a native (Windows) thread that is no longer valid in the child process. The pthread_atfork handler for the child process should have removed that thread but it doesn't.

A possible short-term workaround might be to compile libgc with threads disabled on Cygwin, but I don't know what other impacts that would have.

@embray
Copy link
Contributor Author

embray commented Mar 29, 2017

comment:4

Confirmed that building libgc with ./configure --enable-threads=none works around this particular issue.

But I'd still like to get to the bottom of this; ISTM like it might be a simple bug at the end of the day.

@egourgoulhon
Copy link
Member

comment:5

Replying to @embray:

By the way, I've made significant progress in getting to the bottom of this. The full traceback leading up to the SIGABRT is attached.

Thanks for this update.
Another place where doctests involve parallel processing is src/sage/tensor/modules/comp.py. Do you also experience any trouble with this file?

@embray
Copy link
Contributor Author

embray commented Mar 29, 2017

comment:6

Okay, it turns out the pthread_atexit handler doesn't actually get installed unless I explicitly build with ./configure --enable-handle-fork (it can also be enabled at runtime, incidentally).

That said, the fact that it outright crashes on Windows without this option is also a bug, IMO, albeit fixable I think. But for Sage's purposes it will be easy enough to add the appropriate flags when building on Cygwin \o/

@embray
Copy link
Contributor Author

embray commented Mar 29, 2017

comment:7

Replying to @egourgoulhon:

Another place where doctests involve parallel processing is src/sage/tensor/modules/comp.py. Do you also experience any trouble with this file?

Actually this doctest passes fine as is. This issue only arises in certain cases if ECL is allocating a lot of small objects, and gc needs to suspend all the threads whose stacks it's monitoring so that it can make more room in its heap for more small objects (at least, that's how I understand the issue).

@egourgoulhon
Copy link
Member

comment:8

Replying to @embray:

But for Sage's purposes it will be easy enough to add the appropriate flags when building on Cygwin \o/

Very good!

@egourgoulhon
Copy link
Member

comment:9

Replying to @embray:

Replying to @egourgoulhon:

Actually this doctest passes fine as is. This issue only arises in certain cases if ECL is allocating a lot of small objects, and gc needs to suspend all the threads whose stacks it's monitoring so that it can make more room in its heap for more small objects (at least, that's how I understand the issue).

OK I see. Indeed the doctests in src/sage/tensor/modules/comp.py do not involve ECL (no symbolic calculus, hence no Maxima simplifications).

@embray
Copy link
Contributor Author

embray commented Mar 29, 2017

Author: Erik Bray

@embray
Copy link
Contributor Author

embray commented Mar 29, 2017

Commit: 877a3fd

@embray

This comment has been minimized.

@embray
Copy link
Contributor Author

embray commented Mar 29, 2017

Branch: u/embray/cygwin/ticket-22694

@embray
Copy link
Contributor Author

embray commented Mar 29, 2017

Upstream: Reported upstream. No feedback yet.

@embray
Copy link
Contributor Author

embray commented Mar 29, 2017

comment:10

I've attached a patch to Sage to configure GC to work properly on Cygwin. I've also made an upstream report, but only to suggest that the default configuration for Cygwin should incorporate this (since it's broken otherwise). I don't think there's a great way to work around the issue in the code without using pthread_atfork that doesn't incur too much overhead.


New commits:

877a3fdSet the configure flags required to function correctly w.r.t. fork/threads on Cygwin

@egourgoulhon
Copy link
Member

comment:11

LGTM.
Just putting Jean-Pierre in Cc for a cross-check, since he knows far better than me Sage build process and Cygwin.

@egourgoulhon
Copy link
Member

Reviewer: Eric Gourgoulhon

@jpflori
Copy link

jpflori commented Mar 30, 2017

comment:12

Looks good to me.

@jpflori
Copy link

jpflori commented Mar 30, 2017

Changed reviewer from Eric Gourgoulhon to Eric Gourgoulhon, Jean-Pierre Flori

@egourgoulhon
Copy link
Member

comment:14

Replying to @jpflori:

Looks good to me.

Thanks!

@vbraun
Copy link
Member

vbraun commented Apr 3, 2017

Changed branch from u/embray/cygwin/ticket-22694 to 877a3fd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants