Memory leak in ArrayBuilder/GrowableBuffer #567
Comments
PR #570 fixes it:
>>> import awkward as ak
>>> import gc
>>> builder = ak.ArrayBuilder()
>>> builder.integer(123)
CPU malloc at 0x56286a4b4a50 (8192 bytes)
>>> builder.integer(456)
>>> del builder
CPU free at 0x56286a4b4a50
>>> builder = ak.ArrayBuilder()
>>> builder.integer(123)
CPU malloc at 0x56286a4b4a50 (8192 bytes)
>>> builder.integer(456)
>>> array = builder.snapshot()
>>> del builder
>>> array
<Array [123, 456] type='2 * int64'>
>>> del array
>>> gc.collect()
CPU free at 0x56286a4b4a50
The ArrayBuilder nodes weren't freeing their memory before because each node had a circular reference. I also have these snazzy new malloc/free messages that I can turn on when debugging.
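As an aside, the underlying failure mode is the classic reference-cycle problem. Here is a minimal pure-Python analogy (only an analogy: the real cycle was between C++ nodes held by std::shared_ptr, and shared_ptr has no cycle collector, so those nodes leaked outright rather than merely waiting for gc):

import gc

class Node:
    def __init__(self):
        self.partner = None

a = Node()
b = Node()
a.partner = b        # a -> b
b.partner = a        # b -> a: the two objects now form a cycle

del a, b             # reference counts never reach zero because of the cycle
print(gc.collect())  # Python's cycle collector still reclaims them; shared_ptr cannot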
Hi @jpivarski, it is amazing to see how fast you react to problems and start fixing things. Last Friday I asked myself how long it would take until this issue is fixed, and now on Monday morning there are already two new versions! So I'm sad to report that even with the new version (awkward 1.0.0) the memory still grows when I overwrite awkward arrays in a loop. Did you test the loops that you posted in Gitter with the new version?
I wonder if you're using the version you think you're using. I just ran:
% python -i -c 'import awkward as ak; import numpy as np; import gc'
>>> for i in range(1000000):
... a = ak.Array([i])
... del a
... tmp = gc.collect()
...
and your original:
% python -i -c 'import awkward as ak; import numpy as np; import gc'
>>> for i in range(1000000):
... a = ak.Array([i])
... del a
...
(The garbage collector runs outside of the loop in this one, and is therefore not relevant for scaling during the loop.) In neither case do I see any significant increase in memory over two minutes. "Significant" for me means 100 MB, since that is the level of background noise on htop. htop isn't a targeted measurement, so I turned on my debugging statements and ran the loop without the garbage collector:
% python -i -c 'import awkward as ak; import numpy as np; import gc'
>>> for i in range(10):
... a = ak.Array([i])
... del a
...
CPU malloc at 0x55615983ce70 (8192 bytes)
CPU free at 0x55615983ce70
CPU malloc at 0x55615983ce70 (8192 bytes)
CPU free at 0x55615983ce70
CPU malloc at 0x55615983ce70 (8192 bytes)
CPU free at 0x55615983ce70
CPU malloc at 0x55615983ce70 (8192 bytes)
CPU free at 0x55615983ce70
CPU malloc at 0x55615983ce50 (8192 bytes)
CPU free at 0x55615983ce50
CPU malloc at 0x55615983ce50 (8192 bytes)
CPU free at 0x55615983ce50
CPU malloc at 0x55615983ce50 (8192 bytes)
CPU free at 0x55615983ce50
CPU malloc at 0x55615983ce50 (8192 bytes)
CPU free at 0x55615983ce50
CPU malloc at 0x55615983ce50 (8192 bytes)
CPU free at 0x55615983ce50
CPU malloc at 0x55615983ce50 (8192 bytes)
CPU free at 0x55615983ce50
Each malloc is paired with a free. Here is the same loop again, without the explicit del:
% python -i -c 'import awkward as ak; import numpy as np; import gc'
>>> for i in range(10):
... a = ak.Array([i])
...
CPU malloc at 0x5620d2865bc0 (8192 bytes)
CPU malloc at 0x5620d2868440 (8192 bytes)
CPU free at 0x5620d2865bc0
CPU malloc at 0x5620d2865bc0 (8192 bytes)
CPU free at 0x5620d2868440
CPU malloc at 0x5620d2868440 (8192 bytes)
CPU free at 0x5620d2865bc0
CPU malloc at 0x5620d2865bc0 (8192 bytes)
CPU free at 0x5620d2868440
CPU malloc at 0x5620d2868440 (8192 bytes)
CPU free at 0x5620d2865bc0
CPU malloc at 0x5620d2865bc0 (8192 bytes)
CPU free at 0x5620d2868440
CPU malloc at 0x5620d2868440 (8192 bytes)
CPU free at 0x5620d2865bc0
CPU malloc at 0x5620d2865bc0 (8192 bytes)
CPU free at 0x5620d2868440
CPU malloc at 0x5620d2868440 (8192 bytes)
CPU free at 0x5620d2865bc0
>>> del a
CPU free at 0x5620d2868440
Just replacing a on each iteration (without del) still frees every buffer; the old buffer isn't freed until the new assignment replaces it, which is why the two addresses alternate.
If I run the following, I reach your limit of 100 MB in one and a half minutes. To be honest, I don't know if tracemalloc also fills the memory with something, but the three largest positions in the printout are due to the awkward package.
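(The snippet itself didn't survive in this copy of the thread; a minimal sketch of that kind of tracemalloc measurement, with the loop body being my guess, would look something like this:)

import tracemalloc
import awkward as ak

tracemalloc.start()
for i in range(100000):
    a = ak.Array([[1, 2, 3], [], [4, 5]])       # assumed loop body: rebuild a small array
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:  # the three largest allocation sites
    print(stat)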
100 MB wasn't my limit; it's the level of noise: things jump up and down by 100 MB. Focusing in on the Python process could help, but still, memory accounting is tricky because OS-level memory can be shared among processes. That's why I was looking for a much stronger signal than 100 MB, to be sure it was a reproducible thing. On the opposite end of the granularity scale, I did take a closer look at your example of repeatedly creating arrays.
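(As an aside, one way to focus on a single Python process rather than watching htop is psutil; this is a generic diagnostic, not something used in this thread:)

import os
import psutil  # third-party package

proc = psutil.Process(os.getpid())
before = proc.memory_info().rss      # resident set size of this process, in bytes

# ... run the suspect loop here ...

after = proc.memory_info().rss
print("RSS growth: %.1f MB" % ((after - before) / 1e6))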
Doing the garbage-collecting in the loop unfortunately increases the runtime so much that it becomes absolutely impossible to run the code. The event loop in my project is way too large, and for each event I have to compute quite a few variables.
I wasn't recommending running gc.collect() inside the loop in your real code; that was only part of the test above.
Hi @jpivarski, I'm wondering if you still consider this memory leakage an open issue or not? I'm sorry that I keep annoying you about it, but I have to know if I should start looking for a possible workaround.
I don't see a memory leak anymore; at least, I can't reproduce it.
I have to agree that the growth of the memory is no longer as easy to observe as it was before version 1.0.0, but I think it is still there. Try to run something like this inside the loop.
I just put a couple of variables in the loop with a structure that is not completely trivial. Running this code requires 425 MB and takes 0:02:26. As the structure gets more complicated, by adding a loop inside the loop and computing more variables, this will reach the memory limit rather quickly.
I believe you're seeing some memory issues, but if you're using the latest release, they're not in Awkward Array's own buffers: every buffer of a non-GPU array is allocated and freed through the code that produces these print-outs, and since that has 100% coverage, we can track every array allocation and deletion. (For class instances, we have to trust C++'s shared memory handling.) In your new example, turning on the print-outs for two passes of the loop confirms the same thing.
Every malloc is paired with a free for the same buffer. (Each array has two buffers because it is a ListOffsetArray: one for the offsets and one for the content.)

In your original report, you identified a real memory leak, so I made the codebase airtight for all array allocations. (The four JSON-handling nodes should be investigated more deeply at a later date, but they're not relevant here.) Before the fix, I observed linearly increasing memory use, though I had to watch it for more than a minute for the trend to be clear, the way I was measuring it. After the fix, not only do we have this demonstration that all the allocations are paired with frees, but I also don't observe the linearly increasing memory, even after two minutes. That original error was fixed.

You're still seeing memory leaks, but we know (from the above) that it's not any buffer in Awkward Array, and I trust C++'s std::shared_ptr reference counting for the class instances. Maybe you have a situation in which objects are waiting for the garbage collector, but you have a memory limit on the process that the garbage collector doesn't know about, so the garbage collector doesn't kick in before the process dies? Anyway, I don't have any evidence of a memory leak in Awkward Array (anymore), so I have nothing to follow up on.
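(A quick, generic way to check whether objects really are just waiting for the garbage collector; this is a diagnostic sketch, not something from the thread:)

import gc

pending = gc.get_count()   # objects tracked per generation since the last collection
freed = gc.collect()       # force a full collection
print("pending per generation before:", pending)
print("unreachable objects found:", freed)
print("uncollectable leftovers:", len(gc.garbage))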
I see that the awkward arrays are working flawlessly on the C++ level, but on the Python level there seems to be something odd happening. Overwriting an awkward array 2 million times (with version 1.0.1rc1) takes up 424 MB, and doing the same with a numpy array or a list does not even use 1 MB... I'm afraid that in the end I'll have to rewrite my whole analysis and abandon the awkward arrays, just because the automatic garbage collection does not work and I haven't found a way to free memory. What would you do in my position? Move on and rewrite everything?
If I were in your situation, I would clone the awkward-1.0 GitHub repo, turn on the memory print-outs, and see which allocations are not matched by frees.

Also, thinking about this in a wider context, why are you creating millions of Awkward Arrays? Even without explicit leaks, that's not an efficient way to work. There's a significant time overhead in creating an array, so we want to use a small number of large arrays, using array-at-a-time operations ("vectorization"); see the sketch after this comment. If your real code looks like this benchmark, then it should probably be refactored anyway if running time is a concern. If what you're doing can't be re-expressed as a vectorized operation and must be in for loops, then you should consider Numba. You can't create Awkward Arrays inside of a Numba-compiled function, but you can create the equivalent of Python lists for temporary work. (That's what it looks like this function is doing.)
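For instance, a per-event quantity computed in a Python loop can usually be rewritten as one vectorized call over the whole jagged array at once (a made-up illustration, not your analysis code):

import awkward as ak

# stand-in jagged array: pulses per event (values are made up)
events = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])

# element-at-a-time style: one small object per event (slow at the million-event scale)
totals_loop = [sum(pulses) for pulses in events]

# array-at-a-time style: a single call over all events
totals_vectorized = ak.sum(events, axis=1)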
Hi @jpivarski, thanks for your advice. I thought about the structure and whether it is possible to use operations on one axis at a time, but the issue is that the structure I face is really not trivial. I have a nested structure with vectors of vectors and with an "instruction" for which elements belong together. I added a little example to illustrate what I'm talking about.
So this is why I'm creating so many arrays. I have around half a million events, and per event there are about 10 pulses. I realized that accessing the elements in this fashion is really the limiting memory factor for me. Running something like the following already uses more than 200 MB and keeps growing with more accesses.
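(That snippet wasn't captured here either; judging from the reply below, which benchmarks exactly this kind of advanced slice, it was presumably along these lines:)

import awkward as ak

a = ak.Array([[1, 2, 3, 4], [5], [6, 7, 8], [9]])   # stand-in for one event's nested pulses

for i in range(2000000):
    b = a[[0, 1], [0, 0]]   # pick out paired elements with an advanced slice, once per access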
Do you also observe this behavior, and is it actually what you would expect? If you observe the same thing, I think I should switch to ROOT and do it there.
If the garbage collector stops the world and cleans up before you physically run out of memory, as it's supposed to (but maybe doesn't, if something strange is interfering), then this growth isn't a true leak, just late collection.

The reason I was arguing for array-at-a-time functions or Numba is because of time, not memory usage. Creating a million Awkward Arrays is much slower than creating a million Python lists:
>>> import awkward as ak
>>> import time
>>> a = ak.Array([[1,2,3,4],[5],[6,7,8],[9]]) # "a" is an Awkward Array
>>> def f(n):
... for i in range(n):
... b = a[[0, 1], [0, 0]] # addressed with an advanced slice
...
>>> starttime = time.time(); f(100000); time.time() - starttime
9.222622871398926
>>> a = [[1,2,3,4],[5],[6,7,8],[9]] # "a" is a Python list of lists
>>> def g(n):
... for i in range(n):
... b = [a[0][0], a[1][0]] # addressed with explicit indexes
...
>>> starttime = time.time(); g(100000); time.time() - starttime
0.04160714149475098
That was just 100,000 iterations, but already we see that the plain Python lists are 220× faster. Creating millions of Awkward Arrays shouldn't be a memory leak, but it's not often tested because it's such an antipattern for time usage. I don't have a big picture of what you're trying to do; it might be possible with array-at-a-time functions, but it seems like it's easier to think about this problem in an element-at-a-time way, so I'll show you how to get what you want, at scale, without having to rethink the problem to cast it into an array-at-a-time form. You can use Numba. Let's say you want a 2-D NumPy array of int64 values, one row per iteration:
>>> import numba as nb
>>> import numpy as np
>>> a = ak.Array([[1,2,3,4],[5],[6,7,8],[9]])
>>> @nb.njit
... def h(n):
... out = np.empty((n, 2), np.int64)
... for i in range(n):
... out[i, 0] = a[0][0] # in Numba, Awkward Arrays have to be accessed as though they were lists
... out[i, 1] = a[1][0] # that is, no a[[0, 1], [0, 0]] syntax; do things one at a time
... return out
...
>>> h(1) # the first time you call it compiles it; not representative of the eventual speed
array([[1, 5]])
>>> starttime = time.time(); h(100000); time.time() - starttime
array([[1, 5],
[1, 5],
[1, 5],
...,
[1, 5],
[1, 5],
[1, 5]])
0.001569509506225586
And that gives you an additional factor of 26× over the Python lists, which is a total of 5800× faster than what you're doing with Awkward Arrays outside of Numba. The Numba implementation does not create intermediate Awkward Arrays. So this method short-circuits any memory problems: it doesn't create any objects that would need to be deleted (whatever is going wrong in your application to not delete those objects), and it also provides a many-thousands-of-times speedup that's probably more important than the memory issue itself. (Sadly, that's probably how the original memory issue went unnoticed, but based on the arguments I gave previously, I don't think Awkward Array has a memory issue in creating arrays.)
Man, you are absolutely right! I played a bit with the lists and the Numba functions, and it instantly gave me a pretty good speed improvement. I decided to transform all the awkward arrays that I get from uproot into lists and only work with lists. Unfortunately Numba does not allow for nested lists, but even with pure Python functions the analysis is way faster. However, I still had the issue with the continuously growing memory, so I tried to only run the uproot command, and I figured out that the memory leakage I was seeing was coming from uproot.iterate and not from the awkward arrays. Luckily you actually solved that in a newer version (0.1.2). Thanks for all the help!
Yes, there was a memory leak in uproot.iterate, and it is fixed in that newer version.

If you're writing for loop-style code, lists will be faster than Awkward Arrays. And then Numba will be faster than the pure Python, but Numba doesn't deal with nested structures in lists very well, and the transformation of Python objects into Numba objects can itself be expensive. The optimal pairing is Awkward Arrays with Numba: any Awkward data structure can be viewed in Numba without conversion (i.e. it's an O(1) view, rather than an O(n) conversion). The limitation is that if you're using Numba, Awkward Arrays have to be addressed one element at a time: the array-at-a-time style outside of Numba is mutually exclusive with the element-at-a-time style inside of Numba. That was a deliberate design choice, to provide flexibility.

But if the speed of Python lists is good enough (keep in mind how your analysis will eventually scale), then stick with it! It's probably simplest. You might also be able to save an unnecessary conversion step if you set uproot's library option so that it doesn't build Awkward Arrays in the first place.
Shit, I celebrated too early... the issue is still there when I combine uproot and awkward. I tried to bypass awkward, but unfortunately I have to deal with vectors of vectors within the ROOT file, so when I read those branches without Awkward I get STLVector objects instead.
I think STLVector objects have a method for converting themselves to Python lists; see the documentation: https://uproot.readthedocs.io/en/latest/uproot.containers.STLVector.html

If your dataset has STLVectors, then the Awkward Arrays were being produced by converting the STLVectors into Awkward Arrays, so skipping that conversion will be a speedup.
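(A sketch of what that workflow might look like; the file, tree, and branch names are hypothetical, and it assumes uproot 4's library="np" option, which returns object arrays of STLVector for vector-of-vector branches:)

import uproot  # uproot 4

with uproot.open("events.root") as f:                         # hypothetical file name
    pulses = f["Events"]["pulse_times"].array(library="np")   # hypothetical tree/branch names

# STLVector behaves like a sequence, so plain iteration converts it level by level
events_as_lists = [[list(inner) for inner in outer] for outer in pulses]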
Thanks a lot for your help! I finally managed to get the memory issues under control and run the jobs on the university's cluster without crashing the machines. The solution for me was to completely avoid the awkward arrays and go from the STLVector objects straight to Python lists.
My response to @sbuse's report on Scikit-HEP/awkward-array (Gitter).
I did a dirty but conclusive-enough experiment (watching htop while running commands in a Zoom meeting). The following consistently increased memory linearly: 100 MB in 80 seconds.
and the following did not increase by even 10 MB in 80 seconds (where 10 MB is the level of noise—other applications allocating and freeing memory on my system).
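(The two snippets didn't survive in this copy of the report; from the description, they were along these lines, with the exact contents being my guess:)

import awkward as ak
import numpy as np

# first loop: arrays built from Python data, which goes through ArrayBuilder (this is the one that leaked)
for i in range(1000000):
    a = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])

# second loop: arrays that merely wrap an existing NumPy buffer (this one did not leak)
buffer = np.arange(1000)
for i in range(1000000):
    b = ak.Array(buffer)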
The first is a simplified version of yours: your example creates arrays from Python data, which internally invokes the ArrayBuilder. The second makes Awkward Arrays by wrapping NumPy arrays. Your example is pretty much a combination of the two steps: ArrayBuilder, then wrap as ak.Array. Doing the above explicitly accumulated 120 MB in 80 seconds.
So it sounds like this is a memory leak in ArrayBuilder; very likely the GrowableBuffer that gets allocated is somehow not getting freed. The memory that GrowableBuffer allocates is in a std::shared_ptr that should be kept alive only by the fact that the ArrayBuilder is held as a Python reference. In my first example, del a should have dropped the Python reference count to zero and gc.collect() should have deleted the ArrayBuilder and then the GrowableBuffer instance, which should have dropped the std::shared_ptr reference count to zero to immediately free the memory. That doesn't seem to be happening, but I know where to look.

(For future triage: memory leaks are bugs, not performance issues.)