Memory leak when iterating over array #1280
Comments
I'm on a phone right now, so I can't test this directly yet, but Python will use all the memory it has available until it reaches a limit before calling the garbage collector. The above code should show a linear increase in memory consumption until it gets to that limit, then I think it plateaus (instead of a sawtooth shape, which is the other conceptual possibility). If memory use eventually stops increasing, even if that's at the limit of your computer's resources or at the process's ulimit, then that is correct behavior for a garbage-collected language.

What it comes down to, though, is that this is an antipattern: you don't want to iterate over all elements of a large array with a Python for loop, since that creates a Python object for each element in the array (here, ak.Arrays of length 1 or 2). You want to do: my_sum = ak.sum(A, axis=-1) and possibly send that through ak.to_numpy if you need the result to be NumPy, rather than Awkward. That does the sum entirely in compiled code: no Python objects for each short list, and no waiting on the garbage collector to bring memory use under control.

These should be thought of as techniques to avoid using Python for anything large (memory or time). You get the computational result, but without representing all the intermediate steps in Python objects. (Same philosophy as NumPy.)
Thanks for the detailed explanation Jim! We were seeing it use up all the memory available and then promptly crash the lxplus/SWAN node it was running on.
I didn't do this question justice at all. I missed this, for instance:
where you were pretty clear that you know how it's supposed to be done. I tested the sample code and the extra memory used after the loop doesn't seem to go away with If we weren't porting all of these "handle" objects from C++ to Python, that would be something we'd have to fix. As it is, the new Python version of this should be automatically memory-clean. I just tried it by replacing
Good news! PR #2311 apparently fixes this memory leak, too! I'm reopening this issue just so that it can be closed by the PR, for record-keeping.
Version of Awkward Array
1.7.0
Description and code to reproduce
Hi,
A student I am working with noticed what appears to be a memory leak when iterating over an array. I've replicated the issue with the following snippet.
A (much faster) solution without the leak is to do the sum on the Awkward Array directly.
But I figured I should report the issue here in case it is helpful.