File descriptor leaks in os.scandir() #70182
os.scandir() opens a file descriptor and closes it only in its destructor. This can cause file descriptor leaks in Python implementations without reference counting, or if the scandir iterator becomes part of a reference cycle or of a long-living object. Since the number of open file descriptors is limited, this can lead to problems. We need to add a close() method to the scandir iterator (as in files and generators). It would also be useful to make it a context manager. In 3.5 we have to add a warning about this behavior. |
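A minimal sketch of the scenario being described, assuming the iterator is kept alive in some long-living state (registry, first_entry and the "/tmp" path are only for illustration); the descriptor opened by os.scandir() stays open until the abandoned iterator is eventually collected:

import os

registry = []               # stands in for any long-living state or reference cycle

def first_entry(path):
    it = os.scandir(path)   # opens a file descriptor (or a handle on Windows)
    entry = next(it)        # the iterator is never exhausted...
    registry.append(it)     # ...and stays reachable, so the descriptor is not released
    return entry

for _ in range(10000):
    first_entry("/tmp")     # may eventually hit the open file descriptor limit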
Hi, if I recall correctly, this issue was discussed in the long review of os.scandir(): issue bpo-22524.

"os.scandir() opens a file descriptor and closes it only in its destructor." Fortunately, that's incorrect: it's closed when the iterator is exhausted. See how ScandirIterator_close() is used.

"This can cause file descriptor leaks in Python implementations without reference counting." Destructors are part of the Python language. Are you aware of a Python implementation that *never* calls destructors? I know that PyPy can call destructors "later", not necessarily when the last reference to the object is removed. Do you think that we may reach the file descriptor limit because of that?

"We need to add a close() method to the scandir iterator (as in files and generators)." Is it part of the iterator protocol? Or do you mean calling close() explicitly?

"It would also be useful to make it a context manager." If we decide to add a close() method, I like the idea of also supporting the context manager protocol.

"In 3.5 we have to add a warning about this behavior." Yeah, maybe we can elaborate the doc to explain how the file descriptor / Windows handle is used. |
I'm not sure this is actually a leak, because (looking at the code) ScandirIterator_close() is called not just in the destructor, but also at the end of iterating, just before raising StopIteration. Is the issue that if an exception is raised or you stop iterating before the end, then it's less deterministic when the destructor/close is called? I think we could fairly easily add an explicit close() method to the iterator and make it a context manager (add __enter__ and __exit__ methods to the iterator). Is this what you're thinking of:

# manual close
it = scandir.scandir(path)
first_entry = next(it)
it.close()

# context manager
with scandir.scandir(path) as it:
    first_entry = next(it) |
Yes, that's what I mean. Add close(), __enter__ and __exit__ methods to the iterator. The scandir iterator is not just an iterator, it is like a file object. And as with file objects, we should perhaps emit a ResourceWarning in the destructor if close() or __exit__() was not called. |
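A small pure-Python sketch of that proposal, modeled on how file objects behave (this is not the actual C implementation; ScandirLike and _closed are illustrative names):

import warnings

class ScandirLike:
    def __init__(self):
        self._closed = False      # set to True by close() / __exit__()

    def close(self):
        # the real iterator would release its file descriptor / handle here
        self._closed = True

    def __del__(self):
        if not self._closed:
            # like file objects: warn when collected without an explicit close()
            warnings.warn("unclosed scandir iterator", ResourceWarning)
            self.close()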
Note there's also a nasty corner case related to generators and GC. If a generator contains a with-block or finally-clause, and the generator is not run until its end because the caller hit an exception on one of the items returned, and the generator object is somehow kept alive (either because it's stored in some longer-living state or because the Python implementation doesn't use reference counting) then the close() call in the finally-clause or in the __exit__ method doesn't run until the generator object is garbage-collected. IOW even making it a context manager doesn't completely solve this issue (though it can certainly help). |
Guido, are you saying that in the following code the “finally” message is not guaranteed to be printed out? Or just that you cannot limit a ResourceWarning to garbage collection?

def g():
    try:
        yield "item"
    finally:
        # Run at exhaustion, close(), and garbage collection
        print("finally")

gi = g()
try:
    item = next(gi)
    print(item / 2)  # Oops, TypeError
finally:
    # Should be run as the exception passes through. GeneratorExit is raised
    # inside the generator, causing the other “finally” block to execute,
    # all before the original exception is caught and a traceback is printed.
    gi.close()

But as I understand it, os.scandir() is not a generator, so none of these problems would apply. |
It was a bit more subtle. I think it was like this:

def f():
    with some_lock:
        yield 0
        yield 1

def g():
    with another_lock:
        it = f()
        for i in it:
            raise

We determined that another_lock was released *before* some_lock. This is because the local variable 'it' keeps the generator object returned by f() alive until after the with-block in g() has released another_lock. I think this does not strictly require f to be a generator -- it's any kind of closable resource that only gets closed by its destructor. The solution is to add a try/finally that calls it.close() before leaving the with-block in g(). Hope this helps. |
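A sketch of the fix described above, reusing the hypothetical f(), some_lock and another_lock (process() stands in for whatever the loop body does); the explicit try/finally makes sure the generator, and therefore some_lock, is finalized before g() releases another_lock:

def g():
    with another_lock:
        it = f()
        try:
            for i in it:
                process(i)      # may raise, as in the example above
        finally:
            it.close()          # finalizes f() here, releasing some_lock first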
Okay I understand. You have to explicitly call the close() method if you want a generator to be cleaned up properly, which parallels how Serhiy’s proposal would be used. |
I definitely support adding a close() method, and a ResourceWarning if the iterator is not exhausted when it is deallocated. Adding context manager support is probably worthwhile as well, though there is one disadvantage: it would be hard to implement scandir() as a plain generator, because generators don’t support the context manager protocol. But CPython’s implementation is in C, and even in native Python you could implement it as a custom iterator class. I suggest using this issue for adding the new API(s), presumably to 3.6 only, and bpo-26111 for adding a warning to the existing 3.5 documentation. |
If scandir() were implemented as a native Python generator (see for example bpo-25911), it could easily be converted to a context manager:

def scandir(path):
    return contextlib.closing(native_scandir(path))

def native_scandir(path):
    ...
    yield ... |
Since the close() method can be added only in 3.6, but we have leaks in 3.5 too, I suggest closing the file descriptor if an error happens during iteration (bpo-26117). This will allow writing safe code even in 3.5. |
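A pure-Python illustration of that 3.5 behavior (the real change is inside the C iterator); safe_next() and the close callback are illustrative names:

def safe_next(iterator, close):
    """Advance the iterator, but release the underlying resource immediately
    if the step fails, instead of waiting for the destructor."""
    try:
        return next(iterator)
    except StopIteration:
        close()     # exhausted: nothing left to protect
        raise
    except Exception:
        close()     # error during iteration: close the descriptor eagerly
        raise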
contextlib.closing() cannot be used in general, because it doesn’t inherit the iterator protocol from the wrapped generator. So I think you really need a class that implements __exit__(), __iter__(), etc. at the same time. |
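A minimal sketch of such a class, assuming the hypothetical native_scandir() generator from the earlier comment; unlike contextlib.closing(), the object returned here is itself both an iterator and a context manager:

class ScandirWrapper:
    def __init__(self, path):
        self._gen = native_scandir(path)   # the hypothetical generator above

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._gen)             # delegate iteration to the generator

    def close(self):
        self._gen.close()                  # runs the generator's cleanup (finally/with)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

With this shape both "for entry in ScandirWrapper(path):" and "with ScandirWrapper(path) as it:" work, which is roughly the interface the real os.scandir class ends up providing.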
The proposed patch adds a close() method and support for the context manager protocol to the os.scandir class. |
Context manager protocol, close() method: it looks more and more like a file. In this case, I suggest emitting a ResourceWarning in the destructor if it's not closed explicitly. It means that scandir() must always be used with "with", or at least that close() is always called. We probably need to update the code to be more explicit about that. |
Adding a ResourceWarning even if the generator is run to completion? That seems... dev hostile. I mean, yes, it's probably best to document using it with a with statement as best practice, but something simple like |
I would be in favour of adding a ResourceWarning in 3.6 if the iterator is garbage collected without being exhausted. But as Josh says, it might be overkill emitting a warning when we already know the iterator has been finished and cleaned up. |
I'm OK with only emitting the warning if the generator is not exhausted. The warning would be emitted in the destructor if the generator is not exhausted. |
Updated patch adds a resource warning and tests, improves the documentation, and addresses other comments. |
Review sent. |
Updated patch addresses Victor's new comments. |
About testing that list(iterator) is empty after the iterator is closed: IMO this is an implementation detail. It would be equally (or more) sensible to raise an exception, as proposed for “async def” coroutines in bpo-25887. I suppose the main purpose of those tests is to ensure there is no leakage or warning, so maybe add some more filterwarnings() checks in those tests instead? I left some other review comments. |
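A sketch of the kind of check being suggested, using only the standard warnings machinery rather than the actual test_os.py helpers; the test names and the "." path are placeholders:

import os
import unittest
import warnings

class ScandirCloseTest(unittest.TestCase):
    def test_with_block_leaves_no_resource_warning(self):
        with warnings.catch_warnings():
            warnings.simplefilter("error", ResourceWarning)  # any warning fails the test
            with os.scandir(".") as it:
                next(it, None)          # stop iterating early on purpose
            # the with-block closed the iterator, so no ResourceWarning is emitted

if __name__ == "__main__":
    unittest.main()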
Updated patch addresses Martin's comments. |
scandir_close_4.patch LGTM. I just added a comment on test_os.py; you may be able to factor out some code. |
New changeset f98ed0616d07 by Serhiy Storchaka in branch 'default': |
Committed, using the helper from bpo-26325 for testing. Thank you all for your reviews, especially for the help with the documentation. |
Great! Good job. |