Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: intermittent segfault in PyPy #3089

Merged
merged 15 commits into from
May 2, 2024

Conversation

ianna
Copy link
Collaborator

@ianna ianna commented Apr 22, 2024

awkward-cpp/tests-cpu-kernels-explicit/test_unit_cpuawkward_UnionArray32_flatten_length_64.py::test_unit_cpuawkward_UnionArray32_flatten_length_64_3 FAILED [ 56%]
Error: test_unit_cpuawkward_UnionArray32_flatten_length_64_3

assert [0] == approx([7 ± 7.0e-06])
  
  comparison failed. Mismatched elements: 1 / 1:
  Max absolute difference: 7
  Max relative difference: inf
  Index | Obtained | Expected   
  0     | 0        | 7 ± 7.0e-06
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    def test_unit_cpuawkward_UnionArray32_flatten_length_64_3():
        total_length = [123]
        total_length = (ctypes.c_int64*len(total_length))(*total_length)
        fromtags = [0, 0, 0, 0]
        fromtags = (ctypes.c_int8*len(fromtags))(*fromtags)
        fromindex = [0, 1, 2, 3]
        fromindex = (ctypes.c_int32*len(fromindex))(*fromindex)
        length = 4
        offsetsraws = [[0, 1, 3, 5, 7], [1, 3, 5, 7, 9]]
        offsetsraws_ptr = (ctypes.POINTER(ctypes.c_int64) * len(offsetsraws))()
        for i in range(len(offsetsraws)):
            offsetsraws_ptr[i] = (ctypes.c_int64 * len(offsetsraws[i]))(*offsetsraws[i])
        offsetsraws = offsetsraws_ptr
        funcC = getattr(lib, 'awkward_UnionArray32_flatten_length_64')
        ret_pass = funcC(total_length, fromtags, fromindex, length, offsetsraws)
        pytest_total_length = [7]
>       assert total_length[:len(pytest_total_length)] == pytest.approx(pytest_total_length)
E       assert [0] == approx([7 ± 7.0e-06])
E         
E         comparison failed. Mismatched elements: 1 / 1:
E         Max absolute difference: 7
E         Max relative difference: inf
E         Index | Obtained | Expected   
E         0     | 0        | 7 ± 7.0e-06
fromindex  = <_ctypes.array.c_int_Array_4 object at 0x000000000428f448>
fromtags   = <_ctypes.array.c_byte_Array_4 object at 0x000000000428f430>
funcC      = <ctypes.CDLL.__init__.<locals>._FuncPtr object at 0x00000000018b5388>
i          = 1
length     = 4
offsetsraws = <_ctypes.array.LP_c_long_Array_2 object at 0x000000000428f478>
offsetsraws_ptr = <_ctypes.array.LP_c_long_Array_2 object at 0x000000000428f478>
pytest_total_length = [7]
ret_pass   = <awkward_cpp._kernel_signatures.ERROR object at 0x0000000004dd0950>
total_length = <_ctypes.array.c_long_Array_1 object at 0x000000000428f418>
awkward-cpp/tests-cpu-kernels-explicit/test_unit_cpuawkward_UnionArray32_flatten_length_64.py:70: AssertionError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>
> /home/runner/work/awkward/awkward/awkward-cpp/tests-cpu-kernels-explicit/test_unit_cpuawkward_UnionArray32_flatten_length_64.py(70)test_unit_cpuawkward_UnionArray32_flatten_length_64_3()
-> assert total_length[:len(pytest_total_length)] == pytest.approx(pytest_total_length)
(Pdb) 
!!!!!!!!!!!!!!!!!!! _pytest.outcomes.Exit: Quitting debugger !!!!!!!!!!!!!!!!!!!
======================= 1 failed, 3261 passed in 18.02s ========================
Error: Process completed with exit code 2.

@ianna ianna linked an issue Apr 22, 2024 that may be closed by this pull request
@ianna ianna force-pushed the ianna/2971-intermittent-segfault-in-pypy branch from b682f40 to 4c40472 Compare April 22, 2024 13:09
@ianna ianna changed the title test: add debugging fix: intermittent segfault in PyPy Apr 30, 2024
Copy link
Collaborator Author

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope.

@ianna ianna force-pushed the ianna/2971-intermittent-segfault-in-pypy branch from 63243ba to e4baf54 Compare May 1, 2024 23:40
Copy link
Collaborator Author

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpivarski - removing coverage from the PyPy test job cures the segfault! I think, you can merge this PR to fix the failing jobs.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the diff in this PR removes the unnecessary length argument from awkward_unique_ranges, which can't be responsible for the PyPy segfault unless the length argument had been passed inconsistently, and now it's consistent. (It doesn't look that way—it looks to me like a neutral-refactoring.)

Another part of this diff is explicit integer casts in the C code. That also should be unrelated to PyPy segfaults, since it's not in the part of the code that's different between PyPy and CPython. Perhaps PyPy's ctypes is different, passing data as a different integer type than intended, but these are conversion-casts in the C code, not reinterpret-casts.

I don't see anything in this PR that explains why PyPy was segfaulting, or would cause PyPy to stop segfaulting. But if it's not segfaulting now, I'm definitely willing to merge this update and see what happens to the PyPy test for the next few weeks. (Everything in this PR is a net-positive, regardless of whether it fixes the segfault or not.) If the segfault is fixed, then it's likely something that I didn't catch by eye, but is a correction nonetheless.

@ianna, do you have a theory about what was causing the segfault before and not now? Some line of code in the diff that you can point to? Either way, go ahead and merge this PR and we'll see what happens. I'll even propagate it out to the Ragged library, where I initially encountered the segfault, to see if it's fixed there, too.

@jpivarski
Copy link
Member

Actually, don't merge this until #3086 is merged. There's going to be a conflict, and it will be easier to resolve the conflict in this PR (the diff is smaller). So let's get #3086 into main and then update this PR to main to deal with that.

The conflict will be in how removing the unnecessary length argument is done.

@jpivarski
Copy link
Member

#3086 has been merged, and so I updated this branch and will run through the tests again. If all the tests pass, this PR can be merged, so I've enabled auto-merge.

@jpivarski jpivarski enabled auto-merge (squash) May 2, 2024 16:46
@jpivarski
Copy link
Member

To record what I learned from @ianna during our meeting: the relevant change is that this PR runs the code-coverage test for CPython and not for PyPy. The hypothesis is that the segfault was never in Awkward, but in pytest-cov. (You have a link to an issue about that in the pytest-cov project, right, @ianna?) That hypothesis would make sense of a few things:

  • The segfault was only seen in CI; @ianna wasn't running pytest-cov locally and didn't see the segfault (unless you eventually added that to your local test?).
  • The segfault was first discovered in ragged, which has a very different test suite than Awkward, but it also runs pytest-cov.

Although it's still possible that an error in Awkward is only revealed when pytest-cov runs in PyPy (because it somehow stomps on memory that doesn't get stomped on without pytest-cov), if I remove pytest-cov from ragged's tests and get no segfault, that will be a strong indication that it's caused by pytest-cov and not Awkward, since ragged's test suite is so different from Awkward's. It wouldn't be 100% conclusive, but it would be conclusive enough for me.

@jpivarski
Copy link
Member

There's a test of our minimally supported pyarrow version that intermittently, but rarely, fails because of

AssertionError: assert [[['', 'αβγ', '', ''], ['', '', '']], [], [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', ' ']]] == [[['', 'αβγ', '', ''], ['', '', '']], [], [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', '', '']]]

I am very sure that this one is due to something in pyarrow. We could increase our minimal pyarrow, but that wouldn't be for compatibility reasons, it would be because pyarrow 7 has issues. Instead, I'll just re-run this test by hand.

@jpivarski jpivarski enabled auto-merge (squash) May 2, 2024 17:05
@jpivarski jpivarski merged commit 7f1d261 into main May 2, 2024
41 checks passed
@jpivarski jpivarski deleted the ianna/2971-intermittent-segfault-in-pypy branch May 2, 2024 17:07
@jpivarski
Copy link
Member

I'm testing it in ragged here: scikit-hep/ragged#47

Although I remember segfaults in PyPy, I'm not seeing them now—with or without codecov.

@jpivarski
Copy link
Member

FYI: #3097 was updated to main, with this fix in, and unfortunately, there's still a PyPy segfault: https://github.com/scikit-hep/awkward/actions/runs/8943876656/job/24569544958?pr=3097

I was going to make the PyPy test required, but it looks like it's not ready for that, yet.

@ianna
Copy link
Collaborator Author

ianna commented May 3, 2024

FYI: #3097 was updated to main, with this fix in, and unfortunately, there's still a PyPy segfault: https://github.com/scikit-hep/awkward/actions/runs/8943876656/job/24569544958?pr=3097

I was going to make the PyPy test required, but it looks like it's not ready for that, yet.

yes, I was just looking at it :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intermittent segfault in pypy
2 participants