On Windows, os.scandir will keep a handle on the directory until the iterator is exhausted #70299

Closed
remyroy mannequin opened this issue Jan 14, 2016 · 10 comments
Labels
docs Documentation in the Doc dir OS-windows type-bug An unexpected behavior, bug, or error

Comments

@remyroy
Mannequin

remyroy mannequin commented Jan 14, 2016

BPO 26111
Nosy @pfmoore, @tjguk, @benhoyt, @vadmium, @zware, @serhiy-storchaka, @eryksun, @zooba
Dependencies
  • bpo-25994: File descriptor leaks in os.scandir()
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-02-24.23:12:36.406>
    created_at = <Date 2016-01-14.18:45:09.967>
    labels = ['type-bug', 'OS-windows', 'docs']
    title = 'On Windows, os.scandir will keep a handle on the directory until the iterator is exhausted'
    updated_at = <Date 2021-02-25.17:59:56.545>
    user = 'https://bugs.python.org/remyroy'

    bugs.python.org fields:

    activity = <Date 2021-02-25.17:59:56.545>
    actor = 'steve.dower'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2021-02-24.23:12:36.406>
    closer = 'eryksun'
    components = ['Documentation', 'Windows']
    creation = <Date 2016-01-14.18:45:09.967>
    creator = 'remyroy'
    dependencies = ['25994']
    files = []
    hgrepos = []
    issue_num = 26111
    keywords = []
    message_count = 10.0
    messages = ['258212', '258219', '258225', '258226', '258228', '258235', '258236', '258248', '387642', '387683']
    nosy_count = 10.0
    nosy_names = ['paul.moore', 'tim.golden', 'benhoyt', 'docs@python', 'martin.panter', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'remyroy']
    pr_nums = []
    priority = 'normal'
    resolution = 'third party'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue26111'
    versions = ['Python 3.5']

    @remyroy
    Mannequin Author

    remyroy mannequin commented Jan 14, 2016

    On Windows, os.scandir will keep a handle on the directory being scanned until the iterator is exhausted. This behavior can cause various problems if you try to use filesystem calls such as os.chmod or os.remove on the directory while the handle is still open.

    There are use cases where the iterator is not going to be exhausted, such as looking for a specific entry in a directory and breaking out of the loop early.

    This behavior should at least be documented. Alternatively, it might be interesting to provide a way to end the scan early and close the handle without having to exhaust the iterator.

    As a workaround, you can force the exhaustion after you are done with the iterator with something like:

    for entry in iterator:
        pass
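
A minimal sketch of this workaround applied to the early-break case, assuming Python 3.5, where the iterator has no close() method. The helper name `find_entry` is hypothetical and not part of the os module.

```python
import os

def find_entry(path, name):
    """Return the DirEntry called `name` in `path`, or None if absent."""
    iterator = os.scandir(path)
    found = None
    for entry in iterator:
        if entry.name == name:
            found = entry
            break  # stop early; the iterator is not exhausted yet
    # Drain whatever is left so the directory handle gets released.
    for _ in iterator:
        pass
    return found
```

The cost of this approach is that the rest of the directory is still read after the match, so it trades extra I/O for a closed handle.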

    This also affects os.walk, since it uses os.scandir.

    The original GitHub issue can be found at benhoyt/scandir#58.

    @remyroy remyroy mannequin added OS-windows type-bug An unexpected behavior, bug, or error labels Jan 14, 2016
    @eryksun
    Contributor

    eryksun commented Jan 14, 2016

    If you own the only reference you can also delete the reference, which deallocates the iterator and closes the handle.
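
A sketch of the reference-deletion approach, assuming CPython, where reference counting deallocates the iterator as soon as the last reference is dropped (other implementations may delay this). On Python 3.6 and later, deallocating an iterator that was neither exhausted nor closed also emits a ResourceWarning. The helper name `first_entry_name` is hypothetical.

```python
import os

def first_entry_name(path):
    """Return the name of the first entry in `path`, or None if it is empty."""
    it = os.scandir(path)
    entry = next(it, None)
    name = entry.name if entry is not None else None
    # Dropping the only reference deallocates the iterator in CPython,
    # which closes the underlying directory handle.
    del it
    return name
```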

    Can you provide concrete examples where os.remove and os.chmod fail? At least in Windows 7 and 10 the directory handle is opened with the normal read and write sharing, but also with delete sharing. This sharing mode is fairly close to POSIX behavior (an important distinction is noted below). I get the following results in Windows 10:

        >>> import os, stat
        >>> os.mkdir('test')
        >>> f = open('test/file1', 'w'); f.close()
        >>> f = open('test/file2', 'w'); f.close()
        >>> it = os.scandir('test')
        >>> next(it)
        <DirEntry 'file1'>

    The rename, chmod, remove, and rmdir operations all succeed:

    >>> os.rename('test', 'spam')
    >>> os.chmod('spam', stat.S_IREAD)
    >>> os.chmod('spam', stat.S_IWRITE)
    >>> os.remove('spam/file1')
    >>> os.remove('spam/file2')
    >>> os.rmdir('spam')
    

    Apparently cached entries can be an issue, but this caching is up to WinAPI FindNextFile and the system call NtQueryDirectoryFile:

        >>> next(it)
        <DirEntry 'file2'>

    An important distinction is that a deleted file in Windows doesn't actually get unlinked until all handles and kernel pointer references are closed. Also, once the delete disposition is set, no *new* handles can be created for the existing file or directory (all access is denied), and a new file or directory with the same name cannot be created.

        >>> os.listdir('spam')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        PermissionError: [WinError 5] Access is denied: 'spam'
    
        >>> f = open('spam', 'w')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        PermissionError: [Errno 13] Permission denied: 'spam'

    If we had another handle we could use that to rename "spam" to get it out of the way, at least. Without that, AFAIK, all we can do is deallocate the iterator or wait for it to be exhausted, which closes the handle and thus allows Windows to finally unlink "spam":

        >>> next(it)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        StopIteration

    Creating a new file named "spam" is allowed now:

    >>> f = open('spam', 'w')
    >>> f.close()
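
The sequence in the transcript above can be wrapped in a small probe. This is only a sketch under stated assumptions: Python 3.6 or later (for the iterator's close() method) and a hypothetical helper name, `name_reusable_while_scanning`. On Windows with the behavior shown above, re-creating the directory is expected to fail with PermissionError until the iterator is closed; on POSIX systems it is expected to succeed.

```python
import os
import tempfile

def name_reusable_while_scanning(parent):
    """Report whether a removed directory's name can be reused while an
    unexhausted scandir iterator on it is still open."""
    target = os.path.join(parent, "spam")
    os.mkdir(target)
    for fname in ("file1", "file2"):
        open(os.path.join(target, fname), "w").close()

    it = os.scandir(target)
    try:
        next(it)  # the iterator now holds an open directory handle
        for fname in ("file1", "file2"):
            os.remove(os.path.join(target, fname))
        os.rmdir(target)
        try:
            os.mkdir(target)  # try to reuse the name right away
        except PermissionError:
            return False
        return True
    finally:
        it.close()  # release the handle so a pending delete can complete

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        print(name_reusable_while_scanning(parent))
```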
    

    @vadmium
    Member

    vadmium commented Jan 14, 2016

    Remy, is this the same problem described in bpo-25994? In that issue, a close() method (like on generators) and/or context manager support is proposed for the scandir() iterator. Perhaps we can keep this issue open for adding a warning to the documentation, and the other issue can be for improving the API in 3.6.

    @vadmium vadmium added the docs Documentation in the Doc dir label Jan 14, 2016
    @remyroy
    Mannequin Author

    remyroy mannequin commented Jan 14, 2016

    I believe Eryk's explanation on how a file in Windows doesn't actually get unlinked until all handles and kernel pointer references are closed is spot on about the problem I had.

    I had a complex example that could probably have been simplified to what Eryk posted.

    That behavior on Windows is quite counterintuitive. I'm not sure about what can be done to help it.

    @remyroy
    Mannequin Author

    remyroy mannequin commented Jan 14, 2016

    This issue is not the same as bpo-25994, but it is closely related. Some kind of close() method and/or context manager support could help here as well.

    @vadmium
    Member

    vadmium commented Jan 14, 2016

    Can you explain how it is different? The way I see it, both problems are about the scandir() iterator holding an open reference (file descriptor or handle) to a directory/folder, when the iterator was not exhausted, but the caller no longer needs it.

    @eryksun
    Contributor

    eryksun commented Jan 14, 2016

    > That behavior on Windows is quite counterintuitive.

    It's counter-intuitive from a POSIX point of view, in which anonymous files are allowed. In contrast, Windows allows any existing reference to unset the delete disposition, so the name cannot be unlinked until all references are closed.
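
For contrast, a small sketch of the POSIX behavior mentioned above: a file's name can be unlinked while the file is still open, leaving an anonymous file that stays readable through the open descriptor. This is meant for POSIX systems only; on Windows the os.unlink call is expected to fail, since Python's open() does not request delete sharing. The function name `read_after_unlink` is hypothetical.

```python
import os
import tempfile

def read_after_unlink():
    """Unlink an open file, then read it back through the open handle."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+") as f:
        f.write("still here")
        f.flush()
        os.unlink(path)  # POSIX: the name disappears immediately
        name_gone = not os.path.exists(path)
        f.seek(0)
        return name_gone, f.read()
```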

    @remyroy
    Mannequin Author

    remyroy mannequin commented Jan 14, 2016

    From my point of view, bpo-25994 is about the potential file descriptor/handle leaks and this issue is about being unable to perform some filesystem calls because of a hidden unclosed file descriptor/handle.

    I am not going to protest if you want to treat them as the same issue.

    @eryksun
    Contributor

    eryksun commented Feb 24, 2021

    bpo-25994 added support for the context-manager protocol and close() method in 3.6. So it's at least much easier to ensure that the handle gets closed.
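
A short sketch of both options, assuming Python 3.6 or later. The helper names `find_entry` and `has_entries` are hypothetical; only os.scandir, its context-manager support, and its close() method come from the standard library.

```python
import os

def find_entry(path, name):
    """Return the DirEntry called `name` in `path`, or None if absent."""
    # Leaving the with block closes the iterator, and with it the
    # directory handle, even when we return before exhausting it.
    with os.scandir(path) as it:
        for entry in it:
            if entry.name == name:
                return entry
    return None

def has_entries(path):
    """Return True if `path` contains at least one entry."""
    it = os.scandir(path)
    try:
        return next(it, None) is not None
    finally:
        it.close()  # explicit close, equivalent to leaving a with block
```

Either form avoids both the extra reads of the exhaustion workaround and the reliance on reference counting.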

    The documentation of scandir() links to WinAPI FindFirstFile and FindNextFile, which at least mention the "search handle". It's not made explicit that this encapsulates a handle for a kernel file object, nor is there any discussion of the operations (e.g. move, rename, delete) that are allowed directly on the directory. Similarly, the directory stream that's returned by POSIX opendir() and used by readdir() may or may not encapsulate a file descriptor.

    I don't think Python's documentation is the best place to discuss platform-specific implementation details in most cases. Exceptions should be made in some cases, but I don't think this is one of them because I can't even link to a document about the implementation details of FindNextFile. At a lower level I can link to documents about the NtQueryDirectoryFile[Ex] system call, but that's not much help in terms of officially documenting what FindNextFile does. Microsoft prefers to keep the Windows API details opaque, which gives them wiggle room.

    FYI, in Windows 10, deleting files and directories now tries a POSIX delete (if supported by the filesystem) that immediately unlinks the name as soon as the handle that's used to perform the delete is closed, such as the handle that's opened to implement DeleteFile (os.unlink) and RemoveDirectory (os.rmdir). NTFS supports this feature by moving the file/directory to a reserved "\$Extend\$Deleted" directory:

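        >>> import os, win32file  # win32file comes from the pywin32 package
        >>> os.mkdir('spam')
        >>> # 3 = OPEN_EXISTING, 0x0200_0000 = FILE_FLAG_BACKUP_SEMANTICS (required to open a directory)
        >>> h = win32file.CreateFile('spam', 0, 0, None, 3, 0x0200_0000, None)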
        >>> print(win32file.GetFinalPathNameByHandle(h, 0))
        \\?\C:\Temp\test\test\spam
    
        >>> os.rmdir('spam')
        >>> print(win32file.GetFinalPathNameByHandle(h, 0))
        \\?\C:\$Extend\$Deleted\001000000000949A5E2FE5BB

    Of course, none of the above is documented for RemoveDirectory().

    @eryksun eryksun closed this as completed Feb 24, 2021
    @zooba
    Member

    zooba commented Feb 25, 2021

    > FYI, in Windows 10, deleting files and directories now tries a POSIX delete

    Yeah, FWIW, I haven't been able to get clear guidance on what I can/cannot publicly announce we've done in this space. But since you've found it I guess I can say sorry that I couldn't announce it more loudly! :)

    A number of our other issues should be able to be closed soon once the changes get out in the open.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022