
BUG: saving a huge object array with np.savez leaks memory on PyPy #15775

Closed
mattip opened this issue Mar 18, 2020 · 13 comments

@mattip (Member) commented Mar 18, 2020

The test_large_zip test in numpy/lib/tests/test_io.py is currently skipped on PyPy since the allocated memory does not seem to be freed, even after multiple calls to gc.collect. The test passes, but later in the test suite calls to subprocess fail; it seems the call to fork does not use copy-on-write, and the non-freed memory requires more than 50% of the available memory.

It is not clear to me where the problem is or which objects are not released. I tried to instrument with weakref.ref but did not find live objects after gc.collect, and valgrind also did not show what is holding the memory.
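
For reference, the weakref instrumentation was roughly along these lines (a sketch, not the exact code; the subclass is only there because a plain ndarray does not accept weak references):

import gc
import weakref

import numpy as np

class Tracked(np.ndarray):
    # a plain ndarray cannot be weak-referenced; a trivial Python-level
    # subclass can
    pass

# much smaller than the real test, just to show the idea
data = np.asarray([np.random.rand(50, 4) for _ in range(1000)],
                  dtype=object).view(Tracked)
ref = weakref.ref(data)
del data
for _ in range(3):
    gc.collect()
# the weakref clears, i.e. no live object is found, yet on PyPy the RSS
# does not come back down
print("still alive?", ref() is not None)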

@anirudh2290 (Member) commented:

@mattip I am interested in taking a look at this issue. Can you please post a link to the part of the test suite that calls subprocess?

@mattip (Member, Author) commented Mar 19, 2020

When you run python runtests.py -mfull, this test runs in the middle of a CI run, and later tests call subprocess.Popen. You can simulate this by changing the test to put out = subprocess.check_output(['ls']) immediately after it. If you use ulimit to restrict memory to the minimum needed for the first part of the test to pass, you should be able to cause a crash even on systems with "unlimited" memory.
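
If it is easier to do from Python than from the shell, capping the address space with resource.setrlimit is roughly equivalent to ulimit -v (a sketch; the limit value is only a placeholder to tune per machine):

import os
import resource
import subprocess

import numpy as np
from numpy.testing import tempdir

# rough in-process equivalent of `ulimit -v <kbytes>`: cap the virtual
# address space before the memory-hungry part runs
limit_kb = 12_600_000            # placeholder, tune per machine
resource.setrlimit(resource.RLIMIT_AS, (limit_kb * 1024, limit_kb * 1024))

test_data = np.asarray([np.random.rand(50, 4) for i in range(800000)],
                       dtype=object)
with tempdir() as tmpdir:
    np.savez(os.path.join(tmpdir, 'test.npz'), test_data=test_data)

# the call that fails when the memory used above is not really released
# before the fork
out = subprocess.check_output(['ls'])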

@anirudh2290 (Member) commented:

@mattip Thanks for the info! I will take a look.

@anirudh2290 (Member) commented:

I tried the following:

import numpy as np
from numpy.testing import tempdir
import os
import gc
import subprocess

def test_leak():
    test_data = np.asarray([np.random.rand(50, 4)
                           for i in range(800000)], dtype=object)
    with tempdir() as tmpdir:
        np.savez(os.path.join(tmpdir, 'test.npz'), test_data=test_data)

test_leak()
gc.collect()
out = subprocess.check_output(['ls'])

Setting ulimit -v 12600000 causes the program to crash even with the subprocess line commented out.
With ulimit -v 12610000 the program runs successfully, even with the subprocess line uncommented.
This shows that the additional subprocess call took at most about 10 MB extra.
Having said that, one likely theory for the failure is related to what Instagram experienced, which also led to the addition of the gc.freeze API to CPython's garbage collector:

The garbage collector's collect() call touches every object's PyGC_head, and this triggers CoW even though the objects weren't written to in the user program.
Since the default gc thresholds are low, this may be happening in some of the subprocess calls in the numpy tests.
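
For illustration, this is roughly the pattern that gc.freeze enables on CPython 3.7+ (a sketch, not something the numpy test suite does): collect, then freeze, then fork, so that collections in the child no longer write to the shared pages holding the parent's objects.

import gc
import os

# parent builds up its long-lived objects, then:
gc.collect()   # drop actual garbage first so it is not frozen
gc.freeze()    # move the survivors to a permanent generation; later
               # collect() calls ignore them and never touch their PyGC_Head

pid = os.fork()
if pid == 0:
    # child: this collection no longer dirties the frozen objects' headers,
    # so copy-on-write is not triggered for those pages
    gc.collect()
    os._exit(0)
os.waitpid(pid, 0)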

One option that we have here is to push the test with high memory requirements (test_large_zip in this case) into its own subprocess, roughly as sketched below. WDYT?
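
A sketch of what I mean (not a final patch): the heavy body runs in a fresh interpreter, so whatever the GC keeps as unreturned memory disappears when the child exits.

import subprocess
import sys
import textwrap

def test_large_zip():
    # run the memory-hungry body in a fresh interpreter so that any memory
    # the GC holds on to is returned to the OS when the child exits
    body = textwrap.dedent("""
        import os
        import numpy as np
        from numpy.testing import tempdir
        test_data = np.asarray([np.random.rand(50, 4) for i in range(800000)],
                               dtype=object)
        with tempdir() as tmpdir:
            np.savez(os.path.join(tmpdir, 'test.npz'), test_data=test_data)
    """)
    subprocess.check_call([sys.executable, '-c', body])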

@mattip (Member, Author) commented Mar 24, 2020

With ulimit -v 12610000 the program runs successfully, even with the subprocess line uncommented.

On CPython or PyPy? In my experience CPython will work but PyPy will crash. While running the test in its own subprocess would probably avoid the problem, I would like to get to the root cause: why is PyPy not releasing the memory and allowing the subprocess call to succeed? What objects are being held?

Once gc.collect runs (you should run it at least 3 times on PyPy to make sure it has broken any reference cycles), any objects used during the call to test_leak should have been released. There should be minimal memory to be copied even if CoW is not triggered. On PyPy, I think the needed memory delta to run the subprocess call is on the order of 4GB.

@anirudh2290 (Member) commented:

On CPython or PyPy?

My testing has been with CPython. I will try with PyPy.

@anirudh2290 (Member) commented:

import numpy as np
from numpy.testing import tempdir
import os
import gc
import sys
from time import sleep
import subprocess

def test_leak():
    test_data = np.asarray([np.random.rand(50, 4)
                            for i in range(800000)], dtype=object)
    with tempdir() as tmpdir:
        np.savez(os.path.join(tmpdir, 'test.npz'), test_data=test_data)
test_leak()
gc.collect()
gc.collect()
gc.collect()
print("gc collect completed")
print("sleeping")
sleep(20)
#out = subprocess.check_output(['ls'])

Here is what I did: I set ulimit -v 21650000 and ran the above script with pypy3-c. This was roughly the smallest limit at which the script runs without a memory error.
It works both with and without the subprocess call. If you reduce the ulimit to 21600000, the code crashes irrespective of the subprocess call. So on my machine the subprocess call requires at most about 50 MB extra.

Also, I tried to test whether there is a leak with the existing code, without the subprocess call. I ran the following script:

import numpy as np
from numpy.testing import tempdir
import os
import gc
import sys
from time import sleep
import subprocess

def test_leak():
    test_data = np.asarray([np.random.rand(50, 4)
                            for i in range(800000)], dtype=object)
    with tempdir() as tmpdir:
        np.savez(os.path.join(tmpdir, 'test.npz'), test_data=test_data)

def main():
    while True:
        test_leak()
        gc.collect()
        gc.collect()
        gc.collect()
        print("gc collect completed")
        print("sleeping")
        sleep(20)

if __name__ == '__main__':
    main()

I wasn't able to reproduce a memory leak with the 21650000 KB ulimit mentioned above. The script ran close to 50 times with no increase in the peak RSS of the process. As mentioned before, I think the subprocess OOM in the numpy tests may be happening because of the following:

1. The garbage collection in one of the tests running in the subprocess touches each object's PyGC_head, triggering writes even though the user program never wrote to those objects.
2. The child process is allocating memory in shared pages, triggering CoW.

The difference between the CPython and PyPy RSS is huge though: close to 9 to 10 GB. The PyPy garbage-collector documentation mentions the following: http://doc.pypy.org/en/latest/cpython_differences.html

If the program allocates a lot of memory, and then frees it all by dropping all references to it, then we might expect to see the RSS to drop. (RSS = Resident Set Size on Linux, as seen by “top”; it is an approximation of the actual memory usage from the OS’s point of view.) This might not occur: the RSS may remain at its highest value. This issue is more precisely caused by the process not returning “free” memory to the OS. We call this case “unreturned memory”.

Probably what we are seeing here is a lot of unreturned memory that the garbage collector is holding on to and reusing for its future allocations.
I used the get_stats call with memory_pressure=True as mentioned in the doc, and the allocated memory it reports is much lower (around 4 GB) than the RSS (21 to 22 GB), which probably indicates unreturned memory.
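
For reference, the query was just this (PyPy-only API as described in the linked doc; the returned object prints a human-readable summary):

import gc

# PyPy-only: summary of memory the GC manages (arenas, raw-malloced blocks,
# nursery, ...), to compare against the RSS reported by the OS
print(gc.get_stats(memory_pressure=True))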

@mattip please let me know if I am missing something.

@mattip (Member, Author) commented Mar 27, 2020

Thanks for the analysis. It seems to make sense. Let's leave this open until we can re-enable PyPy in CI (waiting on a fix for OpenBLAS, gh-15796) and try some alternatives like running this test in its own subprocess.

@anirudh2290 (Member) commented:

Thanks @mattip, I will wait for #2533 to be merged into openblas-libs and for PyPy to be re-enabled before adding the test in its own subprocess.

@anirudh2290 (Member) commented:

Hi @mattip, this is probably not a blocker, but I observed something weird while trying to adjust the requires_memory parameter.

Running the following:


import numpy as np
from numpy.testing import tempdir
import os
import gc
import sys
from time import sleep
import subprocess

def test_leak():
    test_data = np.asarray([np.random.rand(50, 4)
                           for i in range(800000)], dtype=object)
    with tempdir() as tmpdir:
        np.savez(os.path.join(tmpdir, 'test.npz'), test_data=test_data)

if __name__ == '__main__':
    test_leak()
    gc.collect()
    gc.collect()
    gc.collect()

has a very different RSS peak (difference around 10 to 12 GB) when compared to running inside tests:

      def test_large_zip(self):
          def check_large_zip():
              # The test takes at least 6GB of memory, writes a file larger than 4GB
              test_data = np.asarray([np.random.rand(np.random.randint(50,100),4)
                                     for i in range(800000)], dtype=object)
              with tempdir() as tmpdir:
                  np.savez(os.path.join(tmpdir, 'test.npz'), test_data=test_data)
          check_large_zip()
          gc.collect()
          gc.collect()
          gc.collect()

Do you know why we see such a big difference?

@mattip (Member, Author) commented Apr 1, 2020

So the stand-alone test takes 6GB and the in-test one takes 16GB?

@anirudh2290 (Member) commented:

The stand-alone one takes close to 20 GB and the in-test one takes around 8 GB.

@mattip (Member, Author) commented Apr 3, 2020

Closing, gh-15893 was merged.
