Possible resource leak in glob in non-refcount implementations #88648
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = None closed_at = None created_at = <Date 2021-06-22.05:40:18.277> labels = ['3.11', 'library', '3.9', '3.10', 'performance'] title = 'Possible resource leak in glob in non-refcount implementations' updated_at = <Date 2021-06-27.11:28:49.507> user = 'https://github.com/serhiy-storchaka'
activity = <Date 2021-06-27.11:28:49.507> actor = 'serhiy.storchaka' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = <Date 2021-06-22.05:40:18.277> creator = 'serhiy.storchaka' dependencies =  files =  hgrepos =  issue_num = 44482 keywords = ['patch'] message_count = 9.0 messages = ['396306', '396358', '396397', '396402', '396406', '396423', '396553', '396554', '396573'] nosy_count = 5.0 nosy_names = ['gvanrossum', 'terry.reedy', 'serhiy.storchaka', 'Guido.van.Rossum', 'miss-islington'] pr_nums = ['26843', '26872', '26916'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'resource usage' url = 'https://bugs.python.org/issue44482' versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']
There is a possible resource leak in glob on alternate Python implementations which do not use reference counting for immediate release of resources.
It is in the code
names = list(_iterdir(dirname, dir_fd, dironly))
where _iterdir() is a generator function which calls os.scandir() and yields entry names after some filtering. If an exception happens inside _iterdir(), the scandir iterator will be closed immediately, because of the use of the with statement. But an exception can also happen outside of _iterdir(), in the list constructor (e.g. MemoryError). In that case the generator will be closed immediately in CPython because of reference counting (the newly created generator has only one reference), but on other implementations closing can be deferred for an arbitrary time. And even in CPython we rely on the garbage collector if there are reference loops involving the generator frame. This issue has a very small chance of occurring, but still.
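To illustrate the mechanism (a toy example, not the glob code itself): explicitly closing a suspended generator runs its finally blocks right away, instead of waiting for refcounting or the garbage collector:

```python
closed = []

def gen_with_resource():
    try:
        yield 1
        yield 2
    finally:
        # Stands in for releasing a real resource (e.g. a scandir iterator).
        closed.append(True)

g = gen_with_resource()
next(g)        # suspend the generator mid-iteration
assert not closed
g.close()      # injects GeneratorExit at the yield; the finally block runs now
assert closed == [True]
```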
The right way is to close the generator explicitly:
it = _iterdir(dirname, dir_fd, dironly)
try:
    names = list(it)
finally:
    it.close()
or, equivalently:
with contextlib.closing(_iterdir(dirname, dir_fd, dironly)) as it:
    names = list(it)
We should analyze all other generator functions which acquire some resources and ensure that they are always properly closed.
So this is a problem because the generator owns a resource that it will only release once its .close() method is called, right? And we have no control over when that happens -- we can't make it the responsibility of list() to close the iterator passed into it. It is indeed a painful thing. I wonder if we should put a warning somewhere in the docs and tutorials for generators?
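As a small demonstration of this point (illustrative only, not the glob code): when the consumer of a generator raises, the generator is left merely suspended; nothing closes it except refcounting, the garbage collector, or an explicit close():

```python
import inspect

def gen():
    yield 1
    yield 2

g = gen()
mapped = map(lambda x: 1 / 0, g)   # the error occurs in the consumer, not in gen
try:
    list(mapped)
except ZeroDivisionError:
    pass

# list() did not close the generator; it is still suspended at the first yield.
assert inspect.getgeneratorstate(g) == 'GEN_SUSPENDED'
g.close()
assert inspect.getgeneratorstate(g) == 'GEN_CLOSED'
```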
The PR looks good.
Yes. I added you, Guido, to the nosy list to attract attention to the more general issue. If a generator owns resources, it should be guaranteed to be closed. Since generators do not support the context manager protocol, we need to use closing().
Using generators is common in the code at my work, so I am going to revise all uses of resource-owning generators in my work code, in the stdlib, and in common libraries like aiohttp.
Maybe we could add a decorator for generator functions which would add support for the context manager protocol. Or make generator objects support the context manager protocol by default. Or make the with statement fall back to the close() method by default. Or even add a special syntax shortcut for combining "with" and "for" if this idiom becomes common enough.
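The decorator idea could be sketched roughly like this (a hypothetical example; closing_generator is not a stdlib name):

```python
import contextlib
import functools

def closing_generator(func):
    """Hypothetical decorator: wrap a generator function so its result
    can be used directly in a with statement, which closes it on exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return contextlib.closing(func(*args, **kwargs))
    return wrapper

@closing_generator
def numbers(n):
    for i in range(n):
        yield i

with numbers(3) as it:
    result = list(it)
assert result == [0, 1, 2]
```

Note the tradeoff: the decorated function now returns a closing() wrapper, so callers must use the with statement; they can no longer iterate the return value directly.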
"Generators deliberately don't support the context management protocol. This is so that they raise an explicit TypeError or AttributeError (pointing out that __exit__ is missing) if you leave out the @contextmanager decorator when you're using a generator to write an actual context manager.
Generators supporting the context management protocol natively would turn that into a far more subtle (and confusing) error: your code would silently fail to invoke the generator body.
Ensuring this common error remains easy to detect is far more important than making it easier to invoke close() on a generator object (particularly when contextlib.closing() already makes that very easy)."
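Nick's scenario can be reproduced in a few lines (illustrative only): a generator used directly in a with statement fails loudly, precisely because it lacks __enter__/__exit__:

```python
def acquire():                # note: @contextlib.contextmanager was forgotten
    resource = object()
    try:
        yield resource
    finally:
        pass                  # real cleanup would go here

caught = False
try:
    with acquire():           # a bare generator has no __enter__/__exit__
        pass
except (TypeError, AttributeError):
    caught = True
assert caught
```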
The new entry in the Design FAQ is this:
"Why don’t generators support the with statement?
For technical reasons, a generator used directly as a context manager would not work correctly. When, as is most common, a generator is used as an iterator run to completion, no closing is needed. When it is, wrap it as “contextlib.closing(generator)” in the ‘with’ statement."
If Nick's explanation is not currently correct, the above should be removed. If it is, perhaps further doc is needed.