Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add pathlib.Path.walk method #90385

Closed
Ovsyanka83 mannequin opened this issue Jan 2, 2022 · 12 comments
Closed

add pathlib.Path.walk method #90385

Ovsyanka83 mannequin opened this issue Jan 2, 2022 · 12 comments
Assignees
Labels
3.11 only security fixes stdlib Python modules in the Lib dir topic-pathlib type-feature A feature request or enhancement

Comments

@Ovsyanka83
Copy link
Mannequin

Ovsyanka83 mannequin commented Jan 2, 2022

BPO 46227
Nosy @merwok, @barneygale, @ajoino, @Ovsyanka83
PRs
  • bpo-46227: Add pathlib.Path.walk method #30340
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2022-01-02.18:34:12.724>
    labels = ['type-feature', 'library', '3.11']
    title = 'add pathlib.Path.walk method'
    updated_at = <Date 2022-01-08.12:58:28.255>
    user = 'https://github.com/Ovsyanka83'

    bugs.python.org fields:

    activity = <Date 2022-01-08.12:58:28.255>
    actor = 'Ovsyanka'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2022-01-02.18:34:12.724>
    creator = 'Ovsyanka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46227
    keywords = ['patch']
    message_count = 4.0
    messages = ['409511', '409514', '409994', '410098']
    nosy_count = 4.0
    nosy_names = ['eric.araujo', 'barneygale', 'ajoino', 'Ovsyanka']
    pr_nums = ['30340']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue46227'
    versions = ['Python 3.11']

    @Ovsyanka83
    Copy link
    Mannequin Author

    Ovsyanka83 mannequin commented Jan 2, 2022

    Pathlib is great, yet every time I have to parse a bunch of files, I have to use os.walk and join paths by hand. That's not a lot of code but I feel like pathlib should have higher-level abstractions for all path-related functionality of os. I propose we add a Path.walk method that could look like this:

    def walk(self, topdown=True, onerror=None, followlinks=False):
        for root, dirs, files in self._accessor.walk(
            self,
            topdown=topdown,
            onerror=onerror,
            followlinks=followlinks
        ):
            root_path = Path(root)
            yield (
                root_path,
                [root_path._make_child_relpath(dir_) for dir_ in dirs],
                [root_path._make_child_relpath(file) for file in files],
            )

    Note: this version does not handle a situation when top does not exist (similar to os.walk that also doesn't handle it and just returns an empty generator)

    @Ovsyanka83 Ovsyanka83 mannequin added 3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 2, 2022
    @Ovsyanka83
    Copy link
    Mannequin Author

    Ovsyanka83 mannequin commented Jan 2, 2022

    Some people could suggest using Path.glob instead but I found it to be less convenient for some use cases and generally slower (~2.7 times slower).

    >>> timeit("list(Path('Lib').walk())", number=100, globals=globals())
    1.9074640140170231
    >>> timeit("list(Path('Lib').glob('**/*'))", number=100, globals=globals())
    5.14890358998673

    @merwok
    Copy link
    Member

    merwok commented Jan 7, 2022

    The idea is interesting, and I agree that glob with a maxi wildcard is not a great solution. There is discussion on the PR about adding walk vs extending iterdir; could you post a message on discuss.python.org and sum up the the discussion? (Pull requests on the CPython repo are only used to discuss implementation, not for debating ideas or proposing features.)

    @Ovsyanka83
    Copy link
    Mannequin Author

    Ovsyanka83 mannequin commented Jan 8, 2022

    Thanks for the tip! Hopefully, I created it correctly:
    https://discuss.python.org/t/add-pathlib-path-walk-method/

    It is currently on review.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @zmievsa
    Copy link
    Contributor

    zmievsa commented May 8, 2022

    The discussion ended on the following decisions:

    • The original solution was based on wrapping os.walk but we have decided that it will be both faster and better in the long term to reimplement it (though I was not able to come up with any implementation that is significantly better than that of os.walk so it's 99% the same)
    • The original solution was yielding a tuple[Path, list[Path], list[Path]] but we decided to stay true to the original implementation and yield tuple[Path, list[str], list[str]]. It is both better in terms of optimization and should cause a little less confusion to the users
    • The original solution had the same arguments as os.walk but the interface was too convoluted so: we decided to rename arguments to snake-case and I decided to remove topdown argument by splitting walk into two methods. So we get walk() and walk_bottom_up(). I.e. Now we get two methods instead of one but with two modes of operation. The naming of methods is up to discussion because I myself am not very happy with these names but those are the best I found

    I have been absent for quite some time so the pull request got stale and outdated so I'm opening a new one.

    @brettcannon
    Copy link
    Member

    The views on the change seem generally positive and the API seems like it makes sense, so I'm inclined to accept the change once the PR passes review.

    @tiran
    Copy link
    Member

    tiran commented Jul 23, 2022

    GH-92517 broke tests on wasm32-wasi platform.

    ======================================================================
    FAIL: test_walk_symlink_location (test.test_pathlib.WalkTests.test_walk_symlink_location)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Lib/test/test_pathlib.py", line 2626, in test_walk_symlink_location
        self.assertIn("link", files)
    AssertionError: 'link' not found in ['tmp3']
    ----------------------------------------------------------------------
    Ran 458 tests in 1.229s
    FAILED (failures=1, skipped=182)
    test test_pathlib failed
    

    @zmievsa
    Copy link
    Contributor

    zmievsa commented Jul 24, 2022

    @tiran I apologize for introducing this problem and I am really thankful to you for resolving it, especially so quickly!

    @merwok
    Copy link
    Member

    merwok commented Jul 24, 2022

    No worry! We all forgot to run the PR through buildbots before merging, so let’s just do that next time.

    @brettcannon
    Copy link
    Member

    It's also partially my fault by forgetting that WASI doesn't support symlinks and thus to have you add the test guard for those sorts of platforms.

    @zmievsa
    Copy link
    Contributor

    zmievsa commented Jul 26, 2022

    By the way, are we adding this PR into what's new? Or is it only gonna be a part of changelog? Seems big enough for what's new

    @brettcannon
    Copy link
    Member

    @Ovsyanka83 it should end up in What's New. If you would like to write a PR for that I will happily review it. We also go through the changelog before release to make sure everything relevant ends up in there, so we don't have to worry about it being missed.

    zmievsa added a commit to zmievsa/cpython that referenced this issue Jul 30, 2022
    zmievsa added a commit to zmievsa/cpython that referenced this issue Jul 30, 2022
    …thub.com:Ovsyanka83/cpython into pythongh-90385/add-pathlib-path-whatsnew-section
    zmievsa added a commit to zmievsa/cpython that referenced this issue Aug 5, 2022
    miss-islington pushed a commit that referenced this issue Aug 11, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes stdlib Python modules in the Lib dir topic-pathlib type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants