BUG: saving a huge object array with np.savez leaks memory on PyPy #15775
Comments
@mattip I am interested in taking a look at this issue. Can you please post the link to the test suite that calls …
When you run …
@mattip Thanks for the info! Will take a look.
I tried the following:

Setting … The garbage collector's `collect()` call touches every object's `PyGC_Head`, and this triggers CoW even though the objects weren't written to in the user program. One option we have here is to push the test with high memory requirements (`test_large_zip` in this case) into its own subprocess. WDYT?
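A minimal sketch of the subprocess idea, assuming a plain `subprocess.check_call` into a fresh interpreter; the child code here is illustrative and is not the actual `test_large_zip` body:

```python
import subprocess
import sys

# Illustrative child script: it does the allocation-heavy work and exits, so
# whatever memory the interpreter (or its GC) keeps reserved dies with the child.
CHILD_CODE = """
import numpy as np
a = np.empty((200, 500), dtype=object)
a[:] = [str(i) for i in range(500)]
np.savez("/tmp/large.npz", a=a)
"""

def test_large_zip_isolated():
    # A non-zero exit status (e.g. a MemoryError in the child) fails the test.
    subprocess.check_call([sys.executable, "-c", CHILD_CODE])
```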
On CPython or PyPy? In my experience CPython will work but PyPy will crash. While running the test in its own subprocess would probably avoid the problem, I would like to get to the root cause: why is PyPy not releasing the memory and allowing the subprocess call to succeed? What objects are being held? Once `gc.collect` runs (you should run it at least 3 times on PyPy to make sure it has broken any reference cycles), any objects used during the call to `test_leak` should have been released. There should be minimal memory to be copied even if CoW is not triggered. On PyPy, I think the needed memory delta to run the subprocess call is on the order of 4 GB.
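A tiny helper illustrating the "collect more than once on PyPy" advice (the function name is made up):

```python
import gc

def collect_hard(times=3):
    # A single pass may not break all reference cycles or run all finalizers
    # on PyPy, so collect a few times.
    for _ in range(times):
        gc.collect()
```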
My testing has been with CPython. I will try with PyPy.
Here is what I did: …

Also, I tried to test whether there is a leak with the existing code, without the subprocess call. I ran the following script:
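A rough sketch of the kind of loop described, assuming it repeatedly saves a large object array and prints the peak RSS; sizes, paths, and names are illustrative, not the actual script:

```python
import gc
import os
import resource
import tempfile

import numpy as np

def save_large_object_array(path):
    # Purely illustrative sizes; the real test uses a much larger array.
    a = np.empty((1000, 1000), dtype=object)
    a[:] = 1.0
    np.savez(path, a=a)

if __name__ == "__main__":
    for i in range(50):
        with tempfile.TemporaryDirectory() as d:
            save_large_object_array(os.path.join(d, "big.npz"))
        gc.collect()
        # ru_maxrss is reported in KB on Linux; a steadily growing value would
        # mean the memory is not being returned.
        print(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```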
I wasn't able to reproduce the memory leak with the 21650000 KB memory ulimit that I mentioned above. The above script ran close to 50 times with no increase in the peak RSS of the process. I think, as mentioned before, the subprocess OOM in the numpy tests may be happening because of the following:

1. The garbage collection in one of the tests running in the subprocess touches the objects' `PyGC_Head`, triggering writes even without a write op.
2. The child process is allocating memory in shared pages, triggering CoW.

The difference between the CPython and PyPy RSS memory is huge, though: close to 9 to 10 GB. The PyPy garbage collector docs mention the following: http://doc.pypy.org/en/latest/cpython_differences.html
Probably what we are seeing here is a lot of unreturned memory that the garbage collector is holding on to and reusing for its future allocations. @mattip please let me know if I am missing something.
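For reference, the memory cap mentioned above can also be imposed from Python; a sketch, assuming the quoted ulimit refers to the address space (`ulimit -v`):

```python
import resource

# Roughly the 21650000 KB limit quoted above, expressed in bytes.
limit_bytes = 21_650_000 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
```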
Thanks for the analysis. It seems to make sense. Let's leave this open until we can re-enable PyPy in CI (waiting on a fix for OpenBLAS, gh-15796) and try some alternatives like running this test in its own subprocess.
Hi @mattip, this is probably not a blocker, but I observed something weird when I was trying to adjust the `requires_memory` parameter. Running the following:
has a very different RSS peak (a difference of around 10 to 12 GB) compared to running it inside the tests:
Do you know why we see such a big difference?
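A hedged sketch of the two ways of running being compared above: calling the test body directly from a short script versus running it through pytest. The file path is the one named in this issue; whether `test_large_zip` can be called without fixtures is an assumption.

```python
import pytest

def run_standalone():
    from numpy.lib.tests import test_io
    test_io.test_large_zip()  # assumed to be directly callable

def run_in_suite():
    # -k selects tests whose name matches the expression.
    pytest.main(["numpy/lib/tests/test_io.py", "-k", "test_large_zip"])
```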
so the stand-alone test takes 6 GB and the in-test one takes 16 GB?
Stand-alone takes close to 20 GB and the in-test one takes around 8 GB.
Closing, gh-15893 was merged.
The `test_large_zip` test in `numpy/lib/tests/test_io.py` is currently skipped on PyPy since the allocated memory does not seem to be freed, even after multiple calls to `gc.collect`. The test passes, but later in the test suite calls to `subprocess` fail: it seems the call to fork does not use copy-on-write, and the non-freed memory requires more than 50% of the available memory.

It is not clear to me where the problem is, i.e. which objects are not released. I tried to instrument with `weakref.ref` but did not find live objects after `gc.collect`, and valgrind also did not show what is holding the memory.
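For reference, a minimal sketch of the kind of `weakref.ref` instrumentation described: take a weak reference to the suspect array, drop the strong reference, collect, and check whether the referent is gone. The array size here is illustrative.

```python
import gc
import weakref

import numpy as np

def check_array_released():
    a = np.empty((100, 100), dtype=object)  # illustrative size
    a[:] = "payload"
    ref = weakref.ref(a)
    del a
    for _ in range(3):  # collect several times, as suggested above for PyPy
        gc.collect()
    # If ref() still returns the array, something is keeping it alive.
    print("still alive:", ref() is not None)
```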