random segfaults in linear_tensor_element.pyx #28559

Open
jhpalmieri opened this issue Oct 5, 2019 · 15 comments

Comments

@jhpalmieri
Member

Intermittent failure with linear_tensor_element.pyx on OS X with a Python 3 build of Sage:

sage -t --warn-long 58.7 src/sage/numerical/linear_tensor_element.pyx  # Killed due to abort

In more detail:

$ ./sage -t src/sage/numerical/linear_tensor_element.pyx 
Running doctests with ID 2019-10-05-11-47-01-e1b44413.
Git branch: develop
Using --optional=build,dochtml,sage
Doctesting 1 file.
glp_free: memory allocation error
Error detected in file env/alloc.c at line 72
------------------------------------------------------------------------
0   signals.cpython-37m-darwin.so       0x000000010ed22f0a print_backtrace + 58
1   signals.cpython-37m-darwin.so       0x000000010ed271b7 sigdie + 39
2   signals.cpython-37m-darwin.so       0x000000010ed27130 sigdie_for_sig + 256
3   libsystem_platform.dylib            0x00007fff621d7b5d _sigtramp + 29
4   libglpk.40.dylib                    0x0000000152fbb020 libglpk.40.dylib + 32
5   libsystem_c.dylib                   0x00007fff620916a6 abort + 127
6   libglpk.40.dylib                    0x0000000153019bd4 errfunc + 212
7   libglpk.40.dylib                    0x00000001530193c8 dma + 184
8   libglpk.40.dylib                    0x000000015302b02f _glp_dmp_delete_pool + 47
9   libglpk.40.dylib                    0x0000000152fd6621 delete_prob + 17
10  libglpk.40.dylib                    0x0000000152fd66d2 glp_delete_prob + 66
11  glpk_backend.cpython-37m-darwin.so  0x0000000152f7740c __pyx_tp_dealloc_4sage_9numerical_8backends_12glpk_backend_GLPKBackend + 76
12  mip.cpython-37m-darwin.so           0x000000014f5bc91e __pyx_tp_clear_4sage_9numerical_3mip_MixedIntegerLinearProgram + 78
13  libpython3.7m.dylib                 0x000000010d37316c collect + 2204
14  libpython3.7m.dylib                 0x000000010d3738a2 _PyObject_GC_Alloc + 386
15  libpython3.7m.dylib                 0x000000010d373914 _PyObject_GC_New + 20
16  libpython3.7m.dylib                 0x000000010d273c78 dictiter_new + 24
17  libpython3.7m.dylib                 0x000000010d22c918 PyObject_GetIter + 24
18  libpython3.7m.dylib                 0x000000010d39aaeb defdict_reduce + 107
19  libpython3.7m.dylib                 0x000000010d242cbc _PyMethodDef_RawFastCallDict + 588
20  libpython3.7m.dylib                 0x000000010d241cde _PyObject_FastCallDict + 270
21  libpython3.7m.dylib                 0x000000010d2a0cfe object___reduce_ex__ + 174
22  libpython3.7m.dylib                 0x000000010d242cbc _PyMethodDef_RawFastCallDict + 588
23  libpython3.7m.dylib                 0x000000010d241cde _PyObject_FastCallDict + 270
24  libpython3.7m.dylib                 0x000000010d2442db object_vacall + 619
25  libpython3.7m.dylib                 0x000000010d2444e0 PyObject_CallFunctionObjArgs + 144
26  _pickle.cpython-37m-darwin.so       0x000000010ddc9b82 save + 12626
27  _pickle.cpython-37m-darwin.so       0x000000010ddccf0a batch_dict + 490
28  _pickle.cpython-37m-darwin.so       0x000000010ddcbe56 save_reduce + 1142
29  _pickle.cpython-37m-darwin.so       0x000000010ddc9b29 save + 12537
30  _pickle.cpython-37m-darwin.so       0x000000010ddc8674 save + 7236
31  _pickle.cpython-37m-darwin.so       0x000000010ddc6701 dump + 257
32  _pickle.cpython-37m-darwin.so       0x000000010ddd5390 _pickle_Pickler_dump + 96
33  libpython3.7m.dylib                 0x000000010d243197 _PyMethodDef_RawFastCallKeywords + 775
34  libpython3.7m.dylib                 0x000000010d249181 _PyMethodDescr_FastCallKeywords + 81
35  libpython3.7m.dylib                 0x000000010d317538 call_function + 888
36  libpython3.7m.dylib                 0x000000010d313efe _PyEval_EvalFrameDefault + 27230
37  libpython3.7m.dylib                 0x000000010d31824d _PyEval_EvalCodeWithName + 3005
38  libpython3.7m.dylib                 0x000000010d242499 _PyFunction_FastCallKeywords + 217
39  libpython3.7m.dylib                 0x000000010d3174cc call_function + 780
40  libpython3.7m.dylib                 0x000000010d313f1e _PyEval_EvalFrameDefault + 27262
41  libpython3.7m.dylib                 0x000000010d2429bd function_code_fastcall + 237
42  libpython3.7m.dylib                 0x000000010d314140 _PyEval_EvalFrameDefault + 27808
43  libpython3.7m.dylib                 0x000000010d2429bd function_code_fastcall + 237
44  libpython3.7m.dylib                 0x000000010d3174cc call_function + 780
45  libpython3.7m.dylib                 0x000000010d313efe _PyEval_EvalFrameDefault + 27230
46  libpython3.7m.dylib                 0x000000010d2429bd function_code_fastcall + 237
47  libpython3.7m.dylib                 0x000000010d3174cc call_function + 780
48  libpython3.7m.dylib                 0x000000010d313efe _PyEval_EvalFrameDefault + 27230
49  libpython3.7m.dylib                 0x000000010d2429bd function_code_fastcall + 237
50  libpython3.7m.dylib                 0x000000010d243463 _PyObject_Call_Prepend + 131
51  libpython3.7m.dylib                 0x000000010d2425f8 PyObject_Call + 136
52  libpython3.7m.dylib                 0x000000010d3a7e87 t_bootstrap + 71
53  libpython3.7m.dylib                 0x000000010d35b699 pythread_wrapper + 25
54  libsystem_pthread.dylib             0x00007fff621e02eb _pthread_body + 126
55  libsystem_pthread.dylib             0x00007fff621e3249 _pthread_start + 66
56  libsystem_pthread.dylib             0x00007fff621df40d thread_start + 13
------------------------------------------------------------------------
Unhandled SIGABRT: An abort() occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
sage -t --warn-long 58.7 src/sage/numerical/linear_tensor_element.pyx
    Killed due to abort
**********************************************************************
Tests run before process (pid=66448) failed:
sage: mip.<x> = MixedIntegerLinearProgram('ppl')   # base ring is QQ ## line 6 ##
sage: lt = x[0] * vector([3,4]) + 1;   lt ## line 7 ##
(1, 1) + (3, 4)*x_0
sage: type(lt) ## line 9 ##
<class 'sage.numerical.linear_tensor_element.LinearTensor'>
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 11 ##
0
sage: parent = MixedIntegerLinearProgram().linear_functions_parent().tensor(RDF^2) ## line 47 ##
sage: parent({0: [1,2], 3: [-7,-8]}) ## line 48 ##
(1.0, 2.0)*x_0 + (-7.0, -8.0)*x_3
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 50 ##
0
sage: LT = MixedIntegerLinearProgram().linear_functions_parent().tensor(RDF^2) ## line 70 ##
sage: LT({0: [1,2], 3: [-7,-8]}) ## line 71 ##
(1.0, 2.0)*x_0 + (-7.0, -8.0)*x_3
sage: TestSuite(LT).run(skip=['_test_an_element', '_test_elements_eq_reflexive',
    '_test_elements_eq_symmetric', '_test_elements_eq_transitive',
    '_test_elements_neq', '_test_additive_associativity',
    '_test_elements', '_test_pickling', '_test_zero']) ## line 74 ##
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 78 ##
0
sage: p = MixedIntegerLinearProgram().linear_functions_parent().tensor(RDF^2) ## line 95 ##
sage: lt = p({0:[1,2], 3:[4,5]});  lt ## line 96 ##
(1.0, 2.0)*x_0 + (4.0, 5.0)*x_3
sage: lt[0] ## line 98 ##
x_0 + 4*x_3
sage: lt[1] ## line 100 ##
2*x_0 + 5*x_3
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 102 ##
0
sage: p = MixedIntegerLinearProgram().linear_functions_parent().tensor(RDF^2) ## line 120 ##
sage: lt = p({0:[1,2], 3:[4,5]}) ## line 121 ##
sage: lt.dict() ## line 122 ##
{0: (1.0, 2.0), 3: (4.0, 5.0)}
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 124 ##
0
sage: mip.<b> = MixedIntegerLinearProgram() ## line 144 ##
sage: lt = vector([1,2]) * b[3] + vector([4,5]) * b[0] - 5;  lt ## line 145 ##
(-5.0, -5.0) + (1.0, 2.0)*x_0 + (4.0, 5.0)*x_1
sage: lt.coefficient(b[3]) ## line 147 ##
(1.0, 2.0)
sage: lt.coefficient(0)      # x_0 is b[3] ## line 149 ##
(1.0, 2.0)
sage: lt.coefficient(4) ## line 151 ##
(0.0, 0.0)
sage: lt.coefficient(-1) ## line 153 ##
(-5.0, -5.0)
sage: lt.coefficient(b[3] + b[4]) ## line 158 ##
sage: lt.coefficient(2*b[3]) ## line 162 ##
sage: mip.<q> = MixedIntegerLinearProgram(solver='ppl') ## line 166 ##
sage: lt.coefficient(q[0]) ## line 167 ##
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 171 ##
0
sage: from sage.numerical.linear_functions import LinearFunctionsParent ## line 197 ##
sage: R.<s,t> = RDF[] ## line 198 ##
sage: LT = LinearFunctionsParent(RDF).tensor(R) ## line 199 ##
sage: LT.an_element()  # indirect doctest ## line 200 ##
(s) + (5.0*s)*x_2 + (7.0*s)*x_5
sage: LT = LinearFunctionsParent(RDF).tensor(RDF^2) ## line 203 ##
sage: LT.an_element()  # indirect doctest ## line 204 ##
(1.0, 0.0) + (5.0, 0.0)*x_2 + (7.0, 0.0)*x_5
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 206 ##
0
sage: from sage.numerical.linear_functions import LinearFunctionsParent ## line 235 ##
sage: LT = LinearFunctionsParent(RDF).tensor(RDF^(2,2)) ## line 236 ##
sage: LT.an_element()  # indirect doctest ## line 237 ##
[1 + 5*x_2 + 7*x_5 1 + 5*x_2 + 7*x_5]
[1 + 5*x_2 + 7*x_5 1 + 5*x_2 + 7*x_5]
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 240 ##
0
sage: from sage.numerical.linear_functions import LinearFunctionsParent ## line 278 ##
sage: LT = LinearFunctionsParent(RDF).tensor(RDF^2) ## line 279 ##
sage: LT({0: [1,2], 3: [-7,-8]}) + LT({2: [5,6], 3: [2,-2]}) + 16 ## line 280 ##
(16.0, 16.0) + (1.0, 2.0)*x_0 + (5.0, 6.0)*x_2 + (-5.0, -10.0)*x_3
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 282 ##
0
sage: from sage.numerical.linear_functions import LinearFunctionsParent ## line 298 ##
sage: LT = LinearFunctionsParent(RDF).tensor(RDF^2) ## line 299 ##
sage: -LT({0: [1,2], 3: [-7,-8]}) ## line 300 ##
(-1.0, -2.0)*x_0 + (7.0, 8.0)*x_3
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 302 ##
0
sage: from sage.numerical.linear_functions import LinearFunctionsParent ## line 322 ##
sage: LT = LinearFunctionsParent(RDF).tensor(RDF^2) ## line 323 ##
sage: LT({0: [1,2], 3: [-7,-8]}) - LT({1: [1,2]}) ## line 324 ##
(1.0, 2.0)*x_0 + (-1.0, -2.0)*x_1 + (-7.0, -8.0)*x_3
sage: LT({0: [1,2], 3: [-7,-8]}) - 16 ## line 326 ##
(-16.0, -16.0) + (1.0, 2.0)*x_0 + (-7.0, -8.0)*x_3
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 328 ##
0
sage: from sage.numerical.linear_functions import LinearFunctionsParent ## line 348 ##
sage: LT = LinearFunctionsParent(RDF).tensor(RDF^2) ## line 349 ##
sage: 10 * LT({0: [1,2], 3: [-7,-8]}) ## line 350 ##
(10.0, 20.0)*x_0 + (-70.0, -80.0)*x_3
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 352 ##
0
sage: mip.<x> = MixedIntegerLinearProgram() ## line 364 ##
sage: lt0 = x[0] * vector([1,2]) ## line 365 ##
sage: lt1 = x[1] * vector([2,3]) ## line 366 ##
sage: lt0.__le__(lt1)    # indirect doctest ## line 367 ##
(1.0, 2.0)*x_0 <= (2.0, 3.0)*x_1
sage: mip.<x> = MixedIntegerLinearProgram() ## line 372 ##
sage: from sage.numerical.linear_functions import LinearFunction ## line 373 ##
sage: x[0] * vector([1,2]) <= x[1] * vector([2,3]) ## line 374 ##
(1.0, 2.0)*x_0 <= (2.0, 3.0)*x_1
sage: x[0] * vector([1,2]) >= x[1] * vector([2,3]) ## line 377 ##
(2.0, 3.0)*x_1 <= (1.0, 2.0)*x_0
sage: x[0] * vector([1,2]) == x[1] * vector([2,3]) ## line 380 ##
(1.0, 2.0)*x_0 == (2.0, 3.0)*x_1
sage: x[0] * vector([1,2]) < x[1] * vector([2,3]) ## line 383 ##
sage: x[0] * vector([1,2]) > x[1] * vector([2,3]) ## line 388 ##
sage: lt = x[0] * vector([1,2]) ## line 395 ##
sage: cm = sage.structure.element.get_coercion_model() ## line 396 ##
sage: cm.explain(10, lt, operator.le) ## line 397 ##
Coercion on left operand via
    Coercion map:
      From: Integer Ring
      To:   Tensor product of Vector space of dimension 2 over Real Double Field and Linear functions over Real Double Field
Arithmetic performed after coercions.
Result lives in Tensor product of Vector space of dimension 2 over Real Double Field and Linear functions over Real Double Field
Tensor product of Vector space of dimension 2 over Real Double Field and Linear functions over Real Double Field
sage: operator.le(10, lt) ## line 406 ##
(10.0, 10.0) <= (1.0, 2.0)*x_0
sage: lt <= 1 ## line 408 ##
(1.0, 2.0)*x_0 <= (1.0, 1.0)
sage: lt >= 1 ## line 410 ##
(1.0, 1.0) <= (1.0, 2.0)*x_0
sage: 1 <= lt ## line 412 ##
(1.0, 1.0) <= (1.0, 2.0)*x_0
sage: 1 >= lt ## line 414 ##
(1.0, 2.0)*x_0 <= (1.0, 1.0)
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 416 ##
0
sage: p = MixedIntegerLinearProgram() ## line 444 ##
sage: lt0 = p[0] * vector([1,2]) ## line 445 ##
sage: hash(lt0)   # random output ## line 446 ##
-9223372036499170180
sage: d = {} ## line 448 ##
sage: d[lt0] = 3 ## line 449 ##
sage: f = p[0] * vector([1]) ## line 455 ##
sage: g = p[0] * vector([1]) ## line 456 ##
sage: set([f, f]) ## line 457 ##
{((1.0))*x_0}
sage: set([f, g]) ## line 459 ##
{((1.0))*x_0, ((1.0))*x_0}
sage: len(set([f, f+1])) ## line 461 ##
2
sage: d = {} ## line 464 ##
sage: d[f] = 123 ## line 465 ##
sage: d[g] = 456 ## line 466 ##
sage: len(list(d)) ## line 467 ##
2
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 469 ##
0

This has been discussed at #27587, but I think it deserves its own ticket. For some reason, this change makes the failure go away:

diff --git a/src/sage/numerical/linear_tensor_element.pyx b/src/sage/numerical/linear_tensor_element.pyx
index 597f96f953..fbd1f58a45 100644
--- a/src/sage/numerical/linear_tensor_element.pyx
+++ b/src/sage/numerical/linear_tensor_element.pyx
@@ -380,16 +380,6 @@ cdef class LinearTensor(ModuleElement):
             sage: x[0] * vector([1,2]) == x[1] * vector([2,3])
             (1.0, 2.0)*x_0 == (2.0, 3.0)*x_1
 
-            sage: x[0] * vector([1,2]) < x[1] * vector([2,3])
-            Traceback (most recent call last):
-            ...
-            ValueError: strict < is not allowed, use <= instead.
-
-            sage: x[0] * vector([1,2]) > x[1] * vector([2,3])
-            Traceback (most recent call last):
-            ...
-            ValueError: strict > is not allowed, use >= instead.
-
         TESTS::
 
             sage: lt = x[0] * vector([1,2])

but I don't know why.

CC: @collares

Component: python3

Keywords: random_fail

Issue created by migration from https://trac.sagemath.org/ticket/28559

@jhpalmieri jhpalmieri added this to the sage-9.0 milestone Oct 5, 2019
@mwageringel

comment:2

This problem still exists and is not limited to OS X. It appears in the patchbot results occasionally, for example here based on 9.0beta7: CentOS, LinuxMint.

@mwageringel mwageringel changed the title py3 + OS X + linear_tensor_element.pyx py3 + linear_tensor_element.pyx Nov 30, 2019
@kwankyu
Collaborator

kwankyu commented Dec 2, 2019

comment:3

No one has proposed a solution. How about adopting John's temporary measure here, just to push Sage on Python 3 forward? We can create a regular ticket to track the issue further.

@fchapoton
Contributor

comment:4

Indeed, this is also happening on LinuxMint:

https://patchbot.sagemath.org/log/0/LinuxMint/19.2/x86_64/4.15.0-65-generic/pc72/2019-12-02%2002:01:40

There is no urgency to fix this for python3. The switch to python3 will happen very soon anyway.

@vbraun
Member

vbraun commented Dec 15, 2019

Changed keywords from none to random_fail

@embray
Contributor

embray commented Dec 16, 2019

comment:6

I'm still surprised this isn't similar or related to #28106. Memory exhaustion is the most likely culprit for random failures like this.

@vbraun
Member

vbraun commented Dec 16, 2019

comment:7

I think there is an underlying memory corruption bug here.

  • The test should be almost trivial and doesn't use a significant amount of memory.
  • The traceback is from when the glpk memory structure is freed, after the computation succeeded.
    The glpk pool allocator has some headers on allocated memory regions to check for violations, and this is being triggered here.
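
The idea of guard headers on allocated regions can be illustrated with a toy model. This is only a sketch in plain Python and not GLPK's actual allocator: `ToyPool`, `MAGIC`, and the header layout are all invented for illustration. It shows how an out-of-bounds write can go unnoticed when it happens and only be reported later, when the damaged block is freed.

```python
import struct

MAGIC = 0xDEADBEEF                 # hypothetical guard value, not GLPK's
HEADER = struct.Struct("<II")      # (magic, payload size) stored before each block


class ToyPool:
    """Toy allocator that prefixes every block with a guard header."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0               # next free offset in the buffer
        self.live = set()          # header offsets of blocks not yet freed

    def alloc(self, n):
        start = self.top
        end = start + HEADER.size + n
        if end > len(self.buf):
            raise MemoryError("pool exhausted")
        HEADER.pack_into(self.buf, start, MAGIC, n)
        self.top = end
        self.live.add(start)
        return start + HEADER.size     # offset of the payload

    def free(self, payload_off):
        start = payload_off - HEADER.size
        magic, _size = HEADER.unpack_from(self.buf, start)
        # The header is only inspected here, at free time.
        if start not in self.live or magic != MAGIC:
            raise RuntimeError("memory allocation error")
        self.live.discard(start)


pool = ToyPool(64)
first = pool.alloc(4)
second = pool.alloc(4)

# Bug: 8 bytes written into a 4-byte block. The write itself succeeds,
# but it clobbers the guard value in the header of the next block.
pool.buf[first:first + 8] = bytes(8)

pool.free(first)                   # header of `first` is intact, so this passes
try:
    pool.free(second)
    detected = False
except RuntimeError:
    detected = True                # corruption is only noticed now
print(detected)
```

In this model the faulty write and the error report are separated in time, which is consistent with an abort that shows up in `glp_delete_prob` only after all the tests themselves have passed.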

@embray
Contributor

embray commented Jan 6, 2020

comment:8

Ticket retargeted after milestone closed

@embray embray modified the milestones: sage-9.0, sage-9.1 Jan 6, 2020
@mkoeppe
Member

mkoeppe commented May 1, 2020

comment:9

Moving tickets to milestone sage-9.2 based on a review of last modification date, branch status, and severity.

@mkoeppe mkoeppe modified the milestones: sage-9.1, sage-9.2 May 1, 2020
@mkoeppe mkoeppe modified the milestones: sage-9.2, sage-9.3 Aug 13, 2020
@mkoeppe
Member

mkoeppe commented Feb 13, 2021

comment:11

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

@mkoeppe mkoeppe modified the milestones: sage-9.3, sage-9.4 Feb 13, 2021
@jhpalmieri
Member Author

comment:13

I haven't seen this problem in a long time. Has anyone else?

@collares
Contributor

comment:14

Yes, this happened on the NixOS builders yesterday: https://nix-cache.s3.amazonaws.com/log/npaj3152j2q7nq1n58i8sncx57mkf6g3-sage-tests-9.2.drv

@collares
Contributor

collares commented Apr 2, 2021

comment:15

It's easy to find examples of thread-safety issues related to GLPK in other projects, such as jyp/glpk-hs#9. I don't know how Cython's __dealloc__ works, but could it interact badly with GLPK's use of thread-local storage?
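
To make the question concrete, here is a minimal sketch in plain Python (no Sage, Cython, or GLPK involved) of how a finalizer can run on a different thread from the one that created the object. The names `_env`, `env_id`, and `Problem` are hypothetical stand-ins: `_env` plays the role of a per-thread library environment, and `Problem.__del__` plays the role of a `__dealloc__` that frees library memory. Whether this is what actually happens in the GLPK backend is an assumption, not something established in this thread, although the backtrace in the report does show `glp_delete_prob` being reached from `collect` inside a thread started by `t_bootstrap`.

```python
import gc
import threading

_env = threading.local()   # stands in for a per-thread library environment


def env_id():
    """Return this thread's environment token, creating it on first use."""
    if not hasattr(_env, "token"):
        _env.token = object()
    return _env.token


class Problem:
    """Hypothetical wrapper that remembers the environment it was created in."""

    mismatches = []        # names of threads that finalized a foreign object

    def __init__(self):
        self.owner = env_id()
        self.cycle = self  # reference cycle: only the cyclic GC can free this

    def __del__(self):
        # Runs on whichever thread happens to trigger the collection.
        if env_id() is not self.owner:
            Problem.mismatches.append(threading.current_thread().name)


def make_garbage():
    Problem()              # created on the calling thread, unreachable at once


gc.disable()               # make the timing deterministic for this demo
gc.collect()               # start from a clean slate
make_garbage()             # object belongs to the main thread's environment

worker = threading.Thread(target=gc.collect, name="worker")
worker.start()
worker.join()
gc.enable()

print(Problem.mismatches)
```

The object is created on the main thread but finalized on `worker`, so the finalizer sees a different thread-local environment from the one that was active at creation. If a C library keeps its allocator state in thread-local storage, freeing memory under these conditions could plausibly trip its consistency checks.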

@mwageringel

comment:16

Replying to @jhpalmieri:

I haven't seen this problem in a long time. Has anyone else?

A few days ago, this happened on a patchbot with Debian.

@mkoeppe mkoeppe modified the milestones: sage-9.4, sage-9.5 Aug 10, 2021
@mkoeppe mkoeppe removed this from the sage-9.5 milestone Dec 18, 2021
@mkoeppe mkoeppe added this to the sage-9.6 milestone Dec 18, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.6, sage-9.7 May 3, 2022
@mkoeppe mkoeppe modified the milestones: sage-9.7, sage-9.8 Sep 19, 2022
@mkoeppe mkoeppe removed this from the sage-9.8 milestone Jan 29, 2023
@mkoeppe mkoeppe changed the title py3 + linear_tensor_element.pyx random segfaults in linear_tensor_element.pyx Dec 23, 2023
@mkoeppe
Member

mkoeppe commented Dec 23, 2023

8 participants