segfault, if @njit is removed #9546

Open
TheTesla opened this issue Apr 25, 2024 · 8 comments

Comments

@TheTesla

I get a segmentation fault if I remove the @njit decorator from my function:

TheTesla/xyzcad@e847494

Maybe there is a problem with handling Numba data types in plain Python, or with the conversion of data types between Numba and Python.

reproduce

  1. check out the linked commit from branch tlt
  2. run python3 -m pip install -r requirements.txt
  3. run python3 demo.py
  4. segfault should happen
  5. uncomment @njit in line 550
  6. run again
  7. program should gracefully terminate without segfault

system

  • Lenovo Thinkpad p14s, Intel i5-1240p, 40 GB RAM
  • Ubuntu 23.10
  • numba 0.59.1
@esc esc added the needtriage label Apr 26, 2024
@esc
Member

esc commented Apr 26, 2024

@TheTesla thank you for submitting this. It's a bit unusual that removing the @njit decorator leads to a segfault.

@esc
Member

esc commented Apr 26, 2024

Here is the output leading up to the segfault:

findSurfacePnt time: 0.5227141380310059
getSurface time: 1.969843864440918
43780 - 86528 - 86528 - 43780
/Users/esc/miniconda3-arm64/envs/numba_9546/lib/python3.12/site-packages/numba/parfors/parfor_lowering.py:1153: NumbaParallelSafetyWarning: Variable i.1.5 used in parallel loop may be written to simultaneously by multiple workers and may result in non-deterministic or unintended results.

File "xyzcad/render.py", line 427:
def coords2relations(cubeCoordArray, ptCoordArray, ptValueArray, res):
    <source elided>
    cube2ptIdxArray = np.zeros((cubeCoordArray.shape[0],8),dtype='int')
    for i in prange(cubeCoordArray.shape[0]):
    ^

  warnings.warn(NumbaParallelSafetyWarning(msg, loc))
/Users/esc/miniconda3-arm64/envs/numba_9546/lib/python3.12/site-packages/numba/parfors/parfor_lowering.py:1153: NumbaParallelSafetyWarning: Variable i.1.4 used in parallel loop may be written to simultaneously by multiple workers and may result in non-deterministic or unintended results.

File "xyzcad/render.py", line 443:
def coords2relations(cubeCoordArray, ptCoordArray, ptValueArray, res):
    <source elided>
    cEdgeArray = np.zeros((cube2ptIdxArray.shape[0]*12,2),dtype='int')
    for i in prange(cube2ptIdxArray.shape[0]):
    ^

  warnings.warn(NumbaParallelSafetyWarning(msg, loc))
coords2relations time: 3.544940948486328
43780 - 43780 - 217572 - 86528 - 86528
cutCedgeIdx time: 0.10100579261779785
43804
precTrPnts time: 0.45185089111328125
43804
circList time: 0.024292945861816406
List(circList) time: 0.33193182945251465
[1]    6075 segmentation fault  python3 demo.py
python3 demo.py  8.01s user 1.94s system 89% cpu 11.112 total
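
The shell's "segmentation fault" message carries no Python frame information. One way to see which Python line was executing when the crash happened is the standard library's `faulthandler` module. The following is a minimal sketch; the helper name `enable_crash_traceback` is hypothetical and not part of xyzcad or Numba.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Ask CPython to dump the Python traceback on fatal signals such as SIGSEGV."""
    if not faulthandler.is_enabled():
        # all_threads=True also dumps the stacks of worker threads.
        faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Hypothetical usage: call this at the top of demo.py, before rendering starts.
enable_crash_traceback()
```

The same handler can be switched on without editing the script by running `python3 -X faulthandler demo.py`.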

@esc
Member

esc commented Apr 26, 2024

I restored (uncommented) the @njit decorator and got the following:

findSurfacePnt time: 0.3638169765472412
getSurface time: 2.0497782230377197
43780 - 86528 - 86528 - 43780
/Users/esc/miniconda3-arm64/envs/numba_9546/lib/python3.12/site-packages/numba/parfors/parfor_lowering.py:1153: NumbaParallelSafetyWarning: Variable i.1.5 used in parallel loop may be written to simultaneously by multiple workers and may result in non-deterministic or unintended results.

File "xyzcad/render.py", line 427:
def coords2relations(cubeCoordArray, ptCoordArray, ptValueArray, res):
    <source elided>
    cube2ptIdxArray = np.zeros((cubeCoordArray.shape[0],8),dtype='int')
    for i in prange(cubeCoordArray.shape[0]):
    ^

  warnings.warn(NumbaParallelSafetyWarning(msg, loc))
/Users/esc/miniconda3-arm64/envs/numba_9546/lib/python3.12/site-packages/numba/parfors/parfor_lowering.py:1153: NumbaParallelSafetyWarning: Variable i.1.4 used in parallel loop may be written to simultaneously by multiple workers and may result in non-deterministic or unintended results.

File "xyzcad/render.py", line 443:
def coords2relations(cubeCoordArray, ptCoordArray, ptValueArray, res):
    <source elided>
    cEdgeArray = np.zeros((cube2ptIdxArray.shape[0]*12,2),dtype='int')
    for i in prange(cube2ptIdxArray.shape[0]):
    ^

  warnings.warn(NumbaParallelSafetyWarning(msg, loc))
coords2relations time: 3.0353610515594482
43780 - 43780 - 217572 - 86528 - 86528
cutCedgeIdx time: 0.10198712348937988
43804
precTrPnts time: 0.40909886360168457
43804
circList time: 0.06513214111328125
List(circList) time: 0.29528284072875977
repair_surface time: 1.0242741107940674
extend time: 0.30850696563720703
43816
TrIdx2TrCoord time: 0.26154589653015137
43816
Traceback (most recent call last):
  File "/Users/esc/git/xyzcad/demo.py", line 109, in <module>
    render.renderAndSave(f, 'demo.stl', 1)
  File "/Users/esc/git/xyzcad/xyzcad/render.py", line 620, in renderAndSave
    verticesArray = calcTrianglesCor(circPtsCoordList, True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/esc/miniconda3-arm64/envs/numba_9546/lib/python3.12/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/Users/esc/miniconda3-arm64/envs/numba_9546/lib/python3.12/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Type of variable 'i.3' cannot be determined, operation: $phi306.1, location: /Users/esc/git/xyzcad/xyzcad/render.py (512)

File "xyzcad/render.py", line 512:
def calcTrianglesCor(corCircList, invertConvexness=False):
    <source elided>
            n = len(circ)
            trInCubeList = [(circ[0], circ[i+2], circ[i+1]) for i in range(n-2)]
            ^

python3 demo.py  9.31s user 2.16s system 127% cpu 8.979 total

@esc
Member

esc commented Apr 26, 2024

I also tried the following run with NUMBA_DISABLE_JIT=1, which completed fine, so this indicates that Numba is involved in the segfault. The result is the same in both cases, with the decorator active and with the decorator commented out.

 💣 zsh» NUMBA_DISABLE_JIT=1 python3 demo.py
findSurfacePnt time: 2.5987625122070312e-05
getSurface time: 4.59610390663147
43780 - 86528 - 86528 - 43780
coords2relations time: 0.8624508380889893
43780 - 43780 - 217572 - 86528 - 86528
cutCedgeIdx time: 0.05721306800842285
43804
precTrPnts time: 2.329416036605835
43804
circList time: 0.037850141525268555
List(circList) time: 0.00018286705017089844
repair_surface time: 0.07135581970214844
extend time: 5.0067901611328125e-06
43816
TrIdx2TrCoord time: 0.051640987396240234
43816
calcTriangles time: 0.22364020347595215
to mesh time: 0.009541988372802734
save time: 0.006164073944091797
8.254678964614868
NUMBA_DISABLE_JIT=1 python3 demo.py  9.26s user 1.58s system 127% cpu 8.528 total
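
The same check can be scripted, which makes it easier to compare exit codes with and without the JIT. This is a rough sketch, assuming the script is `demo.py` in the current directory; the helper names `env_with_jit_disabled` and `run_without_jit` are hypothetical.

```python
import os
import subprocess
import sys


def env_with_jit_disabled(base_env=None):
    """Return a copy of the environment with Numba's JIT switched off."""
    env = dict(os.environ if base_env is None else base_env)
    # With NUMBA_DISABLE_JIT=1, jit-decorated functions run as plain Python.
    env["NUMBA_DISABLE_JIT"] = "1"
    return env


def run_without_jit(script="demo.py"):
    """Run `script` in a child interpreter with the JIT disabled; return its exit code."""
    result = subprocess.run([sys.executable, script], env=env_with_jit_disabled())
    # On POSIX, a negative return code means the child was killed by a signal
    # (for example -11 for SIGSEGV on Linux and macOS).
    return result.returncode
```

The variable is set for a child process here because Numba reads its configuration when it is imported, so setting it after `import numba` in the same process would be too late.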

@esc
Member

esc commented Apr 30, 2024

@TheTesla I would recommend addressing the warnings and the failed compilation first, in order to work out what is wrong here.

@esc
Member

esc commented Apr 30, 2024

@TheTesla I debugged this with lldb today and got:

 💣 zsh» lldb python3 demo.py
(lldb) target create "python3"
Current executable set to '/Users/esc/miniconda3-arm64/envs/numba_9546/bin/python3' (arm64).
(lldb) settings set -- target.run-args  "demo.py"
(lldb) r
Process 67632 launched: '/Users/esc/miniconda3-arm64/envs/numba_9546/bin/python3' (arm64)
findSurfacePnt time: 0.28748393058776855
getSurface time: 1.9984591007232666
43780 - 86528 - 86528 - 43780
coords2relations time: 0.09797787666320801
43780 - 43780 - 217572 - 86528 - 86528
cutCedgeIdx time: 0.007802009582519531
43804
precTrPnts time: 0.44718384742736816
43804
circList time: 0.02591228485107422
List(circList) time: 0.32137632369995117
Process 67632 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001000a1f38 python3`list_dealloc + 40
python3`list_dealloc:
->  0x1000a1f38 <+40>: str    x8, [x10]
    0x1000a1f3c <+44>: ldr    x10, [x8, #0x8]
    0x1000a1f40 <+48>: bfxil  x9, x10, #0, #2
    0x1000a1f44 <+52>: str    x9, [x8, #0x8]
Target 0: (python3) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001000a1f38 python3`list_dealloc + 40
    frame #1: 0x000000010019c234 python3`_PyEval_EvalFrameDefault + 11416
    frame #2: 0x0000000100196e98 python3`PyEval_EvalCode + 260
    frame #3: 0x0000000100241404 python3`run_mod + 288
    frame #4: 0x0000000100241174 python3`pyrun_file + 148
    frame #5: 0x0000000100240b4c python3`_PyRun_SimpleFileObject + 288
    frame #6: 0x00000001002404e4 python3`_PyRun_AnyFileObject + 232
    frame #7: 0x000000010026be7c python3`pymain_run_file_obj + 260
    frame #8: 0x000000010026b618 python3`pymain_run_file + 72
    frame #9: 0x000000010026ae68 python3`Py_RunMain + 880
    frame #10: 0x00000001000043c4 python3`main + 56
    frame #11: 0x000000019c03bf28 dyld`start + 2236

Looks like this might be a Numba bug in the typed list.

@TheTesla
Author

Reusing the variable name causes the segfault:

https://github.com/TheTesla/xyzcad/blob/debugsegfault/debugsegfault.py

@esc
Member

esc commented May 2, 2024

That could make sense. The error:

Type of variable 'i.3' cannot be determined, operation: $phi306.1, location: /Users/esc/git/xyzcad/xyzcad/render.py (512)

does indicate some issue resolving a phi node, so perhaps this is down to variable name re-use? It seems odd, however, that this would segfault instead of failing to compile.
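
To make the variable re-use hypothesis concrete, here is a plain-Python sketch of the triangle-fan expression from the traceback, written with an index name that is used nowhere else in the function. The function name `fan_triangles` and the index name `k` are hypothetical, and whether this rewrite avoids the typing error or the segfault under `@njit` has not been verified here.

```python
def fan_triangles(circ):
    """Split a closed loop of points into a fan of triangles around circ[0]."""
    n = len(circ)
    triangles = []
    # `k` is used only in this loop, so it cannot collide with another
    # index variable of the same name elsewhere in the function.
    for k in range(n - 2):
        # Same vertex order as the comprehension in the traceback:
        # (circ[0], circ[i+2], circ[i+1])
        triangles.append((circ[0], circ[k + 2], circ[k + 1]))
    return triangles


print(fan_triangles(["a", "b", "c", "d"]))
```

If the compile error disappears with a unique index name, that would support the idea that the phi node for `i` is merging definitions from different loops.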
