Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disabling valgrind for now. #3068

Merged
merged 1 commit into from
Jul 1, 2021
Merged

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Jul 1, 2021

EDIT 2020-07-10: This PR was rolled back with PR #3090.


Some observations:

  • gcc + Python are identical between "last working" and "first broken". See below.
  • The valgrind version is pinned (3.16.1).
  • The test triggering a valgrind error was new in tested PR smart_holder trampoline shared_from_this support #3023 (test_class_sh_trampoline_shared_from_this.py). See below for the valgrind error.
  • Google-internal testing with (ASAN, MSAN, UBSAN) x (Python 3.6, 3.7) x (fastbuild, opt, dbg) did not show any errors for test_class_sh_trampoline_shared_from_this.py (to self for future reference: log_test18_2021-07-01+0746).

The root cause for the error could be:

  1. A change in pybind11 under PR smart_holder trampoline shared_from_this support #3023.
  2. An existing bug in pybind11 triggered only by PR smart_holder trampoline shared_from_this support #3023.
  3. An existing bug in Python 3.9.5 triggered only by PR smart_holder trampoline shared_from_this support #3023.
  4. An existing bug in gcc 9.3.0 triggered only by PR smart_holder trampoline shared_from_this support #3023.
  5. A valgrind false positive.

Based on the ASAN, MSAN, UBSAN testing, 1. and 2. do not seem to be the most likely root causes, although only proving what the root cause actually is can rule that out completely. Unfortunately this is likely to be very time consuming.

../../tests/test_class_sh_trampoline_shared_from_this.py ..............  [ 38%]
==7052== Invalid read of size 1
==7052==    at 0x485AC8: subtype_dealloc (typeobject.c:1343)
==7052==    by 0x46C5C4: _Py_Dealloc (object.c:2209)
==7052==    by 0x44354B: _Py_DECREF (object.h:430)
==7052==    by 0x44354B: frame_dealloc (frameobject.c:582)
==7052==    by 0x46C5C4: _Py_Dealloc (object.c:2209)
==7052==    by 0x53950B: _Py_DECREF (object.h:430)
==7052==    by 0x53950B: _Py_XDECREF (object.h:497)
==7052==    by 0x53950B: tb_dealloc (traceback.c:167)
==7052==    by 0x46C5C4: _Py_Dealloc (object.c:2209)
==7052==    by 0x4362A5: _Py_DECREF (object.h:430)
==7052==    by 0x4362A5: BaseException_clear (exceptions.c:78)
==7052==    by 0x4368E1: BaseException_dealloc (exceptions.c:88)
==7052==    by 0x46C5C4: _Py_Dealloc (object.c:2209)
==7052==    by 0x47DA04: _Py_DECREF (object.h:430)
==7052==    by 0x47DA04: _Py_XDECREF (object.h:497)
==7052==    by 0x47DA04: tupledealloc (tupleobject.c:237)
==7052==    by 0x46C5C4: _Py_Dealloc (object.c:2209)
==7052==    by 0x45B2D6: _Py_DECREF (object.h:430)
==7052==    by 0x45B2D6: _Py_XDECREF (object.h:497)
==7052==    by 0x45B2D6: dict_dealloc (dictobject.c:2018)
==7052==  Address 0x7cdd0a9 is 185 bytes inside a block of size 936 free'd
==7052==    at 0x483C9FC: free (vg_replace_malloc.c:538)
==7052==    by 0x46FA5E: _PyMem_RawFree (obmalloc.c:127)
==7052==    by 0x471162: PyObject_Free (obmalloc.c:709)
==7052==    by 0x54AB9D: PyObject_GC_Del (gcmodule.c:2329)
==7052==    by 0x483B71: type_dealloc (typeobject.c:3486)
==7052==    by 0x8544410: pybind11_meta_dealloc (class.h:231)
==7052==    by 0x46C5C4: _Py_Dealloc (object.c:2209)
==7052==    by 0x8537F7B: _Py_DECREF (object.h:430)
==7052==    by 0x8545146: pybind11_object_dealloc (class.h:438)
==7052==    by 0x485AC7: subtype_dealloc (typeobject.c:1337)
==7052==    by 0x46C5C4: _Py_Dealloc (object.c:2209)
==7052==    by 0x44354B: _Py_DECREF (object.h:430)
==7052==    by 0x44354B: frame_dealloc (frameobject.c:582)
==7052==  Block was alloc'd at
==7052==    at 0x483B7FB: malloc (vg_replace_malloc.c:307)
==7052==    by 0x46FAD1: _PyMem_RawMalloc (obmalloc.c:99)
==7052==    by 0x4710D2: PyObject_Malloc (obmalloc.c:685)
==7052==    by 0x549BAD: _PyObject_GC_Alloc (gcmodule.c:2237)
==7052==    by 0x54A974: _PyObject_GC_Malloc (gcmodule.c:2264)
==7052==    by 0x488CAA: PyType_GenericAlloc (typeobject.c:1047)
==7052==    by 0x48FD40: type_new (typeobject.c:2596)
==7052==    by 0x488209: type_call (typeobject.c:1014)
==7052==    by 0x43396B: _PyObject_MakeTpCall (call.c:191)
==7052==    by 0x433C64: _PyObject_FastCallDictTstate (call.c:113)
==7052==    by 0x433CF8: PyObject_VectorcallDict (call.c:142)
==7052==    by 0x6A0E00: builtin___build_class__ (bltinmodule.c:232)
==7052==
{
   <insert_a_suppression_name_here>
   Memcheck:Addr1
   fun:subtype_dealloc
   fun:_Py_Dealloc
   fun:_Py_DECREF
   fun:frame_dealloc
   fun:_Py_Dealloc
   fun:_Py_DECREF
   fun:_Py_XDECREF
   fun:tb_dealloc
   fun:_Py_Dealloc
   fun:_Py_DECREF
   fun:BaseException_clear
   fun:BaseException_dealloc
   fun:_Py_Dealloc
   fun:_Py_DECREF
   fun:_Py_XDECREF
   fun:tupledealloc
   fun:_Py_Dealloc
   fun:_Py_DECREF
   fun:_Py_XDECREF
   fun:dict_dealloc
}
$ grep 'The CXX compiler identification' 0_last_working.txt 1_first_broken.txt 
0_last_working.txt:2021-06-29T23:19:15.2998535Z -- The CXX compiler identification is GNU 9.3.0
1_first_broken.txt:2021-06-29T23:32:34.3368079Z -- The CXX compiler identification is GNU 9.3.0
$ grep 'Unpacking python3.9' 0_last_working.txt
2021-06-29T23:16:57.2875405Z Unpacking python3.9-minimal (3.9.5-3~20.04.1) ...
2021-06-29T23:17:00.4808704Z Unpacking python3.9 (3.9.5-3~20.04.1) ...
2021-06-29T23:17:00.5488969Z Unpacking python3.9-dbg (3.9.5-3~20.04.1) ...
2021-06-29T23:17:02.0151383Z Unpacking python3.9-dev (3.9.5-3~20.04.1) ...
2021-06-29T23:17:02.0779172Z Unpacking python3.9-venv (3.9.5-3~20.04.1) ...
$ grep 'Unpacking python3.9' 1_first_broken.txt 
2021-06-29T23:30:12.7628298Z Unpacking python3.9-minimal (3.9.5-3~20.04.1) ...
2021-06-29T23:30:15.9816223Z Unpacking python3.9 (3.9.5-3~20.04.1) ...
2021-06-29T23:30:16.0515017Z Unpacking python3.9-dbg (3.9.5-3~20.04.1) ...
2021-06-29T23:30:17.5608930Z Unpacking python3.9-dev (3.9.5-3~20.04.1) ...
2021-06-29T23:30:17.6301530Z Unpacking python3.9-venv (3.9.5-3~20.04.1) ...

@rwgk rwgk marked this pull request as ready for review July 1, 2021 06:37
@rwgk rwgk requested a review from henryiii as a code owner July 1, 2021 06:37
@rwgk rwgk merged commit ba3f167 into pybind:smart_holder Jul 1, 2021
@rwgk rwgk deleted the sh_valgrind_false branch July 1, 2021 06:37
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Jul 1, 2021
@rwgk rwgk removed the needs changelog Possibly needs a changelog entry label Jul 1, 2021
@rwgk
Copy link
Collaborator Author

rwgk commented Jul 10, 2021

Logging a fresh observation for future reference (thanks a lot to @Yhg1s for making this possible!):

Google-internal testing with (ASAN, UBSAN) x (Python 3.9) x (fastbuild, opt, dbg) is clean

BUT

MSAN x (Python 3.9) x (fastbuild, opt, dbg) fails, pointing to the exact same line number in Objects/typeobject.c as valgrind. See below.

This makes potential root causes 4. and 5. in the PR description very unlikely.

Also note that test_class_sh_trampoline_shared_from_this.py is the only failing test. I had to exclude 5 tests that need numpy/scipy/eigen, but the rest runs clean with all sanitizers.

For completeness, the 5 excluded tests are: test_buffers.py, test_eigen.py, test_numpy_array.py, test_numpy_dtypes.py, test_numpy_vectorize.py.

platform linux -- Python 3.9.5+, pytest-6.2.3, py-1.10.0, pluggy-0.9.0
configfile: pytest.ini
collected 14 items

test_class_sh_trampoline_shared_from_this.py ..............              [100%]

============================== 14 passed in 0.52s ==============================
==5483==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5639a075a73d in subtype_dealloc Objects/typeobject.c:1343:46

rwgk added a commit to rwgk/pybind11 that referenced this pull request Jul 10, 2021
* Using latest pytest main branch for 3.9 and 3.10.
* WORKAROUND_ENABLING_ROLLBACK_OF_PR3068 in test_class_sh_trampoline_shared_from_this.py

First experiment combining two potential fixes: latest pytest, workaround.
If this succeeds the next step will be to try only latest pytest without the workaround.

Note: the workaround is known to resolve the MSAN error reported under
pybind#3068 (comment)
rwgk added a commit to rwgk/pybind11 that referenced this pull request Jul 10, 2021
* Using latest pytest main branch for 3.9 and 3.10.
* WORKAROUND_ENABLING_ROLLBACK_OF_PR3068 in test_class_sh_trampoline_shared_from_this.py

First experiment combining two potential fixes: latest pytest, workaround.
If this succeeds the next step will be to try only latest pytest without the workaround.

Note: the workaround is known to resolve the MSAN error reported under
pybind#3068 (comment)
rwgk added a commit that referenced this pull request Jul 11, 2021
* * Rollback of PR #3068.
* Using latest pytest main branch for 3.9 and 3.10.
* WORKAROUND_ENABLING_ROLLBACK_OF_PR3068 in test_class_sh_trampoline_shared_from_this.py

First experiment combining two potential fixes: latest pytest, workaround.
If this succeeds the next step will be to try only latest pytest without the workaround.

Note: the workaround is known to resolve the MSAN error reported under
#3068 (comment)

* WORKAROUND_ENABLING_ROLLBACK_OF_PR3068 = False

* Narrowing down WORKAROUND_ENABLING_ROLLBACK_OF_PR3068 to Python 3.9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant