Unbounded memory use when iterating over RecordArray #2275
Comments
It could be one of these: #567, #1127, #1280. In Awkward 1.x, the layout tree structures that back the arrays were implemented in C++, and after all the investigation in those three issues, I'm still not sure whether there is a memory leak in the tree node structures (the small metadata objects, not the large array buffers). To see this leak, if there is one, you have to create a large number of small arrays, which is not the intended usage pattern, as you've pointed out.
You're also right that it goes away in Awkward 2.x. In fact, this was a fundamental motivation of the 2.x project: to avoid the Python/C++ border as much as possible (ACAT talk).
One thing that I know causes a memory leak is circular references that cross the Python/C++ border; in fact, this is also a NumPy issue that has been open since 2009 (numpy/numpy#6581 (comment)). But in your case, the reference is not circular.
Running your script with Awkward 1.10.2; layout nodes are C++ objects:
Running your script with the latest
If you know that you're going to be iterating in Python (not Numba), then an up-front conversion to Python objects ( If you're using Numba, the mechanism for iteration is so different that a memory leak here would be no indication of a memory leak in Numba. An So if your final application will be in Numba, try memory-profiling that. I think the results will be very different.
Good news! A seemingly different memory leak in Coffea could be traced down to leaking strings when calling There will be a 1.10.3 bug-fix release with this in it (PR #2311).
>>> import awkward
>>> awkward.__version__
'1.10.2'
>>> import uproot
>>> uproot.__version__
'4.3.7'
(does not grow; it used to grow by about a factor of 10.)
Does seem to be fixed in 1.10.3, thanks!
Version of Awkward Array
1.10.2
Description and code to reproduce
When using iterators for a RecordArray multiple times, we see apparently unbounded growth of the RSS of our processes, which eventually causes them to be killed by the OOM-killer. We're not entirely sure whether this comes from uproot or awkward, but the fact that it depends on the way the RecordArray is accessed suggests that it's an awkward issue.
To reproduce:
The test file can be obtained from root://utatlas.its.utexas.edu//data_ceph/onyisi/larlumi/root/run_00358031.root . The first set of iterations ("Array-at-a-time") will increase and level out RSS use, while "Iteration" will grow on each loop by a similar amount. Explicitly switching off the caches in uproot.open() doesn't affect this.
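One way to check whether RSS levels out or keeps growing across repeated passes is to log it after each loop. The following is a minimal sketch, not the script from this report, using only the standard library on a Unix-like system. The helper names `peak_rss_mb`, `track_rss`, and `example_workload` are hypothetical, and the workload is a placeholder for one pass over the data. It reports peak RSS, which is enough to see per-pass growth but will not show memory being released.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_rss(workload, repeats=5):
    """Run workload() repeatedly, recording peak RSS after each pass."""
    readings = []
    for i in range(repeats):
        workload()
        readings.append(peak_rss_mb())
        print(f"pass {i}: peak RSS {readings[-1]:.1f} MiB")
    return readings


def example_workload():
    """Placeholder for one pass over the data (assumption, not real I/O)."""
    total = 0
    for record in [{"x": i} for i in range(100_000)]:
        total += record["x"]
    return total


readings = track_rss(example_workload)
```

A workload that levels out should show the readings flatten after the first pass or two, while one that leaks should show a similar increase on every pass.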
The issue appears in awkward 1.10.2 / uproot 4.3.7. I can confirm that it seems to be fixed in awkward 2.0.8 / uproot 5.0.3; however, the coffea dependencies mean we can't upgrade to these versions at this time.
(And yes we do have reasons why we can't use the array-at-a-time operations ... and numba makes them pretty fast anyway ...)