Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyPy3: tests sometimes fail with "double free detected in tcache 2" #510

Closed
sbraz opened this issue Nov 23, 2020 · 13 comments
Closed

PyPy3: tests sometimes fail with "double free detected in tcache 2" #510

sbraz opened this issue Nov 23, 2020 · 13 comments
Assignees
Labels
Milestone

Comments

@sbraz
Copy link
Contributor

sbraz commented Nov 23, 2020

Hi,
I have noticed that tests sometimes fail with PyPy3 with a double free detected in tcache 2 error.

It's not always the same test that fails but the problem is always caused by objects being freed ("/wand/wand/cdefs/wandtypes.py", line 23 in __del__).

I used this Dockerfile to reproduce it, it might take a few dozen runs to trigger it but I have seen it fail on two systems:

FROM pypy:3
RUN apt-get update
RUN apt install -y imagemagick
RUN pip install pytest
RUN git clone --depth 1 https://github.com/emcconville/wand
WORKDIR wand
RUN bash -c 'i=0; while pytest -s -vv --skip-pdf; do let i++; echo done with run $i; done; echo crashed'

Here is the log from the last iteration of the loop, when tests fail.

I have also seen this error after the tests end, which might be related. However, I cannot reproduce it with the Docker image, I only see it on Gentoo:

  File "/tmp/Wand-0.6.4/wand/resource.py", line 72, in shutdown
    deallocator = allocation_map.pop(addr)
KeyError: 94182358832672
debug: OperationError:
debug:  operror-type: KeyError
debug:  operror-value: 94182358832672
@emcconville
Copy link
Owner

Thanks for posting this.

I'm seeing malloc(): unaligned fastbin chunk detected & double free detected in tcache 2 randomly.
And they seem to only show up with recent version of PyPy

[PyPy 7.3.1 with GCC 10.2.1 20200723 (Red Hat 10.2.1-1)]

Everything works as expected before 7.3.

[PyPy 7.2.0 with GCC 8.2.0]

I wouldn't be surprised if there's a bug somewhere in Wand, BUT when pytest is invoked with --cache-clear, the issues seem to clear. Can you try append that argument to the docker test?

@sbraz
Copy link
Contributor Author

sbraz commented Nov 24, 2020

Crashed after 81 runs:

tests/sequence_test.py::test_extend_first PASSED
tests/sequence_test.py::test_extend_first_sequence PASSED
tests/sequence_test.py::test_container PASSED
tests/sequence_test.py::test_index free(): double free detected in tcache 2
Fatal Python error: Aborted

Stack (most recent call first, approximate line numbers):
  File "/opt/pypy/lib_pypy/_ctypes/function.py", line 372 in _call_funcptr
  File "/opt/pypy/lib_pypy/_ctypes/function.py", line 289 in __call__
  File "/wand/wand/cdefs/wandtypes.py", line 23 in __del__
  File "/wand/wand/sequence.py", line 295 in index
  File "/wand/tests/sequence_test.py", line 273 in test_index
  File "/opt/pypy/site-packages/_pytest/python.py", line 177 in pytest_pyfunc_call
  File "/opt/pypy/site-packages/pluggy/callers.py", line 157 in _multicall
  File "/opt/pypy/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/pypy/site-packages/pluggy/manager.py", line 90 in _hookexec
  File "/opt/pypy/site-packages/pluggy/hooks.py", line 272 in __call__
  File "/opt/pypy/site-packages/_pytest/python.py", line 1625 in runtest
  File "/opt/pypy/site-packages/_pytest/runner.py", line 154 in pytest_runtest_call
  File "/opt/pypy/site-packages/_pytest/logging.py", line 681 in _runtest_for [jit inlined]
  File "/opt/pypy/site-packages/_pytest/logging.py", line 706 in pytest_runtest_call [jit inlined]
  File "/opt/pypy/site-packages/pluggy/callers.py", line 157 in _multicall [jit inlined]
  File "/opt/pypy/site-packages/pluggy/manager.py", line 84 in <lambda> [jit inlined]
  File "/opt/pypy/site-packages/pluggy/manager.py", line 90 in _hookexec [jit inlined]
  File "/opt/pypy/site-packages/pluggy/hooks.py", line 272 in __call__ [jitted]
  File "/opt/pypy/site-packages/pluggy/hooks.py", line 272 in __call__
  File "/opt/pypy/site-packages/_pytest/runner.py", line 256 in <lambda>
  File "/opt/pypy/site-packages/_pytest/runner.py", line 299 in from_call
  File "/opt/pypy/site-packages/_pytest/runner.py", line 241 in call_runtest_hook
  File "/opt/pypy/site-packages/_pytest/runner.py", line 213 in call_and_report
  File "/opt/pypy/site-packages/_pytest/runner.py", line 115 in runtestprotocol
  File "/opt/pypy/site-packages/_pytest/runner.py", line 107 in pytest_runtest_protocol
  File "/opt/pypy/site-packages/pluggy/callers.py", line 157 in _multicall
  File "/opt/pypy/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/pypy/site-packages/pluggy/manager.py", line 90 in _hookexec
  File "/opt/pypy/site-packages/pluggy/hooks.py", line 272 in __call__
  File "/opt/pypy/site-packages/_pytest/main.py", line 326 in pytest_runtestloop
  File "/opt/pypy/site-packages/pluggy/callers.py", line 157 in _multicall
  File "/opt/pypy/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/pypy/site-packages/pluggy/manager.py", line 90 in _hookexec
  File "/opt/pypy/site-packages/pluggy/hooks.py", line 272 in __call__
  File "/opt/pypy/site-packages/_pytest/main.py", line 309 in _main
  File "/opt/pypy/site-packages/_pytest/main.py", line 244 in wrap_session
  File "/opt/pypy/site-packages/_pytest/main.py", line 305 in pytest_cmdline_main
  File "/opt/pypy/site-packages/pluggy/callers.py", line 157 in _multicall
  File "/opt/pypy/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/pypy/site-packages/pluggy/manager.py", line 90 in _hookexec
  File "/opt/pypy/site-packages/pluggy/hooks.py", line 272 in __call__
  File "/opt/pypy/site-packages/_pytest/config/__init__.py", line 130 in main
  File "/opt/pypy/site-packages/_pytest/config/__init__.py", line 180 in console_main
  File "/opt/pypy/bin/pytest", line 3 in <module>
  File "<builtin>/app_main.py", line 800 in execfile
  File "<builtin>/app_main.py", line 90 in run_toplevel
  File "<builtin>/app_main.py", line 559 in run_command_line
  File "<builtin>/app_main.py", line 871 in entry_point
bash: line 1: 22849 Aborted                 (core dumped) pytest --cache-clear -s -vv --skip-pdf

@emcconville
Copy link
Owner

emcconville commented Nov 25, 2020

Okay. I think I have the issue identified, and it seems to be a related to memory chunks sitting in the garbage collector.

@sbraz
Copy link
Contributor Author

sbraz commented Nov 25, 2020

Thanks, it's a cold morning so I'm going to attempt to heat up the room by running tests from that branch continuously :)

I'm at run 45 and no crash so far.

@sbraz
Copy link
Contributor Author

sbraz commented Nov 25, 2020

Bad news, it's been stuck at run 47 for 2 hours:

tests/image_properties_test.py::test_length_of_bytes PASSED
tests/image_properties_test.py::test_loop PASSED
tests/image_properties_test.py::test_matte_color PASSED
tests/image_properties_test.py::test_mean PASSED
tests/image_properties_test.py::test_metadata 

I'm attaching a debug log, is there anything else you'd like me to try before I cancel this job?

@emcconville
Copy link
Owner

Don't know what else I can do. I've made it as high as 155 iterations before killing the docker instance.

FYI: running the original docker script without --cache-clear on issue_510 branch.

FROM pypy:3
RUN apt-get update
RUN apt install -y imagemagick
RUN pip install pytest
RUN git clone --depth=1 --branch=issue_510 https://github.com/emcconville/wand
WORKDIR wand
RUN bash -c 'i=0; while pytest -s -vv --skip-pdf; do let i++; echo done with run $i; done; echo crashed'

If anyone else has some wisdom, or extra CPU, a little help would be nice.

@emcconville
Copy link
Owner

Hit 133 iterations in 20min, and still going strong. Memory jumps between 64MB and 404MB.

Some additional thoughts:

  • Perhaps add --skip-slow & --skip-fft arguments in pytest to reduce CPU time for non-python tasks.
  • We're not really testing ImageMagick, but more focused on validated ctypes in PyPy. Specifically fastbin's allocation & deallocation management. Perhaps a better testsuite is needed.
  • Other bugs are hiding, and we might be doing a long-winded effort where we can't identify them.

I'll keep the current test running for the better part of the day, then repeat with CPython to verify. After which I'll reboot some old memory profile testing.

@sbraz
Copy link
Contributor Author

sbraz commented Nov 25, 2020

Thanks for your work, it's still much better than the master branch. I don't really know why it hanged earlier, maybe it was something on my end. At the moment, I'm done with run 112 of my Dockerfile (which is the same as yours) and it hasn't stopped.

@sbraz
Copy link
Contributor Author

sbraz commented Nov 25, 2020

I'm now at run 628 and it's still fine. I doubt the problem will appear again. I'm going to stop the container.

@emcconville
Copy link
Owner

Great! Both PyPy & CPython ran over 600 iterations without issue, so I think the original issue was resolved. I'll keep this ticket open as I continue regression tests + memory profiling, and will close it when the issue_510 branch merges into master.

@sbraz
Copy link
Contributor Author

sbraz commented Nov 29, 2020

Thanks a lot, will you release v0.6.5 today?

@emcconville
Copy link
Owner

Probably in the next 12 hours, but yes, 0.6.5 will be released shortly.

@emcconville
Copy link
Owner

Wand 0.6.5 released. Thanks for all your testing help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants