Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

awkward1 form.parameters exhibits unbounded memory growth #2310

Closed
lgray opened this issue Mar 13, 2023 · 7 comments · Fixed by #2311
Closed

awkward1 form.parameters exhibits unbounded memory growth #2310

lgray opened this issue Mar 13, 2023 · 7 comments · Fixed by #2311
Labels
bug The problem described is something that must be fixed

Comments

@lgray
Copy link
Contributor

lgray commented Mar 13, 2023

This is an issue for all present users of coffea until they migrate to the to-be released coffea 2023 (previously 0.8), due to how nanoevents manipulates the form.

uproot version = '4.3.7'
awkward version = '1.10.2'

The following code will grow in memory continuously, this becomes a problem for very long running jobs (and often kills people's analysis runs over the course of hours):

import uproot
import awkward

i = 0
x = uproot.open({"tests/samples/nano_dy.root": "Events"})

while True:
    print("manipulating form",i)
    i += 1
    for _, branch in x.iteritems():
        form = branch.interpretation.awkward_form(None)
        form = uproot._util.awkward_form_remove_uproot(awkward, form)

Upon commenting out this last line the unbounded memory growth stops.

@lgray lgray added the bug (unverified) The problem described would be a bug, but needs to be triaged label Mar 13, 2023
@jpivarski
Copy link
Member

Yes, now I see it: RSS for this process is 80000 with the last line commented out, but grows ~linearly with it in.

@jpivarski jpivarski added bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Mar 13, 2023
@jpivarski
Copy link
Member

@jpivarski
Copy link
Member

Merely asking for the form.parameters gives us linear memory growth:

while True:
    print("manipulating form",i)
    i += 1
    for _, branch in x.iteritems():
        form = branch.interpretation.awkward_form(None)
        # form = awkward_form_remove_uproot(awkward, form)
        form.parameters

@jpivarski
Copy link
Member

It goes through this:

awkward/src/python/content.cpp

Lines 1737 to 1752 in 198cf61

py::dict
parameters2dict(const ak::util::Parameters& in) {
py::dict out;
for (auto pair : in) {
std::string cppkey = pair.first;
std::string cppvalue = pair.second;
py::str pykey(PyUnicode_DecodeUTF8(cppkey.data(),
cppkey.length(),
"surrogateescape"));
py::str pyvalue(PyUnicode_DecodeUTF8(cppvalue.data(),
cppvalue.length(),
"surrogateescape"));
out[pykey] = py::module::import("json").attr("loads")(pyvalue);
}
return out;
}

It's possible that the PyUnicode_DecodeUTF8 Python C API functions returns a new ref, which is not getting DECREF'ed by pybind11.

It does return a new ref.

@jpivarski
Copy link
Member

This (whole script) shows the unbounded memory growth:

import json
import awkward

form = awkward.forms.Form.fromjson(
    json.dumps({"class": "EmptyArray", "parameters": {"a": "b"}})
)

i = 0
while True:
    i += 1
    if i % 100000 == 0:
        print(i)
    form.parameters

@lgray
Copy link
Contributor Author

lgray commented Mar 13, 2023

That's mildly terrifying. I guess this should be moved over to an awkward issue in that case for a fix into the main-v1 branch?

@jpivarski
Copy link
Member

I'm working on that now. I can move this issue.

Actually, I'm seeing it with some relief: we've been trying to find these tiny memory leaks for over 2 years, and it really was a major part of the motivation to rewrite in Python. After this, I'm going to check the reproducers on some of the old leak reports, to see if it's finally gone.

This Python C API function is used on all of the bytes ↔ str conversions. I had thought that pybind11 would own it and delete it, but apparently not.

@jpivarski jpivarski transferred this issue from scikit-hep/uproot5 Mar 13, 2023
@jpivarski jpivarski changed the title uproot4 awkward_form_remove_uproot exhibits unbounded memory growth awkward1 form.parameters exhibits unbounded memory growth Mar 13, 2023
@jpivarski jpivarski linked a pull request Mar 13, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants