Support recursive globs #58176

Closed
ubershmekel mannequin opened this issue Feb 8, 2012 · 87 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@ubershmekel
Mannequin

ubershmekel mannequin commented Feb 8, 2012

BPO 13968
Nosy @loewis, @rhettinger, @ncoghlan, @pitrou, @vstinner, @giampaolo, @merwok, @bitdancer, @serhiy-storchaka, @marc-h38
PRs
  • doc: recursive glob ** follows symlinks to directories #12918
Dependencies
  • bpo-16618: Different glob() results for strings and bytes
Files
  • rglob.patch
  • glob.doublestars.patch
  • glob_recursive_2.patch
  • glob_recursive_3.patch
  • glob_recursive_4.patch
  • glob_recursive_6.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-09-11.18:44:25.323>
    created_at = <Date 2012-02-08.11:22:26.050>
    labels = ['type-feature', 'library']
    title = 'Support recursive globs'
    updated_at = <Date 2019-04-23.07:49:09.759>
    user = 'https://bugs.python.org/ubershmekel'

    bugs.python.org fields:

    activity = <Date 2019-04-23.07:49:09.759>
    actor = 'marc-h38'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-09-11.18:44:25.323>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2012-02-08.11:22:26.050>
    creator = 'ubershmekel'
    dependencies = ['16618']
    files = ['24451', '25360', '28527', '28728', '32669', '36555']
    hgrepos = []
    issue_num = 13968
    keywords = ['patch', 'needs review']
    message_count = 87.0
    messages = ['152843', '152846', '152847', '152849', '152852', '152853', '152856', '152857', '152858', '152868', '152871', '152873', '152876', '152877', '152879', '152880', '152882', '152894', '152898', '152917', '152918', '152922', '152925', '152943', '152945', '152946', '152949', '152950', '152951', '152952', '152953', '152955', '152957', '152958', '152959', '152960', '152962', '152965', '152966', '152968', '152971', '152982', '152986', '152990', '152994', '152998', '152999', '153001', '153002', '153003', '153004', '153005', '153006', '153013', '153085', '153192', '153208', '153281', '153300', '154600', '154603', '157278', '157280', '157281', '157287', '157290', '159265', '160057', '160079', '160743', '176996', '178810', '179947', '179979', '203177', '212878', '226422', '226459', '226471', '226476', '226757', '226759', '226761', '226768', '226769', '226778', '340696']
    nosy_count = 14.0
    nosy_names = ['loewis', 'rhettinger', 'ncoghlan', 'pitrou', 'vstinner', 'giampaolo.rodola', 'eric.araujo', 'r.david.murray', 'cvrebert', 'ubershmekel', 'elsdoerfer', 'python-dev', 'serhiy.storchaka', 'marc-h38']
    pr_nums = ['12918']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue13968'
    versions = ['Python 3.5']

    @ubershmekel
    Mannequin Author

    ubershmekel mannequin commented Feb 8, 2012

    This is a feature I've wanted to use too many times to remember. I've made a patch with an implementation, docs and a test. I've named the function rglob and tried to stay within the conventions of the glob package.

    @ubershmekel ubershmekel mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 8, 2012
    @ncoghlan
    Contributor

    ncoghlan commented Feb 8, 2012

    I'm inclined to close this as a functional duplicate of http://bugs.python.org/issue13229

    @ubershmekel
    Mannequin Author

    ubershmekel mannequin commented Feb 8, 2012

    I'd say it's very close to a duplicate but maybe isn't so. If walkdir is added then rglob can be implemented using it.

    I'd say "rglob" to "walkdir" is like "urlopen" to "http.client". One is the stupid and simple function (that still has a bazillion use cases) and the other is the heavy lifting swiss army knife.

    "file_paths(filtered_walk('.', included_files=['*.py']))" is a lot longer than "rglob('*.py')".

    @pitrou
    Member

    pitrou commented Feb 8, 2012

    > "file_paths(filtered_walk('.', included_files=['*.py']))" is a lot
    > longer than "rglob('*.py')".

    Agreed.

    @ncoghlan
    Contributor

    ncoghlan commented Feb 8, 2012

    A fair point indeed.

    To follow the shutil naming convention (rmtree, copytree, and likely chmodtree, chowntree), a more appropriate name might be "globtree". (Thanks to string methods, the 'r' prefix doesn't read correctly to me: what does "globbing from the right" mean?)

    @pitrou
    Member

    pitrou commented Feb 8, 2012

    > To follow the shutil naming convention (rmtree, copytree, and likely
    > chmodtree, chowntree), a more appropriate name might be "globtree".
    > (Thanks to string methods, the 'r' prefix doesn't read correctly to
    > me: what does "globbing from the right" mean?)

    Well, if you put it in the glob module, it doesn't have to follow the
    shutil naming convention :-)
    (I prefer "rglob" myself)

    @ncoghlan
    Contributor

    ncoghlan commented Feb 8, 2012

    I can live with it either way - I just wanted to point out that our current examples of this kind of recursive filesystem access use a 'tree' suffix rather than an 'r' prefix.

    @elibendersky
    Mannequin

    elibendersky mannequin commented Feb 8, 2012

    > "file_paths(filtered_walk('.', included_files=['*.py']))" is a lot longer than "rglob('*.py')".

    It is, but is that a good enough reason to have both? It can also be achieved with just a bit more code using the simple os.walk. I suppose there are a lot of instances of stdlib tools where we could add new tools that would make the code slightly shorter. However, this is not really faithful to the Python spirit, since it adds too many ways to achieve the same effect, and ultimately confuses users.

    That it adds additional maintenance burden on the coredevs goes without saying :-) Each such new burden should have a very good reason.

    To conclude, personally I'm -1 on this, especially if walkdir eventually makes it into the stdlib.
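
For comparison, the "bit more code" with os.walk could look roughly like the sketch below. `rglob_walk` is a hypothetical name chosen for this illustration; it is not the function from rglob.patch, and it only matches bare file names, not patterns that contain directory components.

```python
import fnmatch
import os


def rglob_walk(pattern, top="."):
    """Yield paths under `top` whose file name matches `pattern`.

    Hypothetical helper for illustration; not the rglob from the patch.
    """
    for dirpath, _dirnames, filenames in os.walk(top):
        # fnmatch.filter applies the shell-style pattern to bare names only,
        # so the pattern should not contain a path separator.
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)
```

A call such as `sorted(rglob_walk('*.py'))` would then list every Python file below the current directory.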

    @pitrou
    Member

    pitrou commented Feb 8, 2012

    >> "file_paths(filtered_walk('.', included_files=['*.py']))" is a lot
    >> longer than "rglob('*.py')".
    >
    > It is, but is that a good enough reason to have both?

    It is. globbing is a well-known operation that many people expect to be
    easily done.

    > However, this is not really faithful to the Python spirit, since it
    > adds too many ways to achieve the same effect, and ultimately
    > confuses users.

    Which "Python spirit" are you talking about? We have many high-level
    tools in the stdlib.

    @merwok
    Member

    merwok commented Feb 8, 2012

    There is an alternative: supporting ** syntax, e.g. '**/*.py', which would find all *.py files in the current directory and all descendants. At present glob('**/*.py') is equivalent to glob('*/*.py'), but we would say this behavior was undefined and the new behavior would be a new feature.

    @elibendersky
    Mannequin

    elibendersky mannequin commented Feb 8, 2012

    > It is. globbing is a well-known operation that many people expect to be easily done.

    According to Wikipedia (http://en.wikipedia.org/wiki/Glob_%28programming%29) - "The noun "glob" is used to refer to a particular pattern, e.g. "use the glob *.log to match all those log files"".

    IOW, globbing is usually understood as the act of expanding a pattern to the files it matches. Nothing in that implies recursive traversal of a directory tree. On the other hand, os.walk and/or walkdir suggest that in their name.

    > Which "Python spirit" are you talking about? We have many high-level
    > tools in the stdlib.

    There should be one -- and preferably only one -- obvious way to do it.

    Admittedly, we already have more than one, and a high-level tool is proposed with Nick's walkdir. Why add *yet another* high-level tool?

    @pitrou
    Member

    pitrou commented Feb 8, 2012

    > IOW, globbing is usually understood as the act of expanding a pattern
    > to the files it matches. Nothing in that implies recursive traversal
    > of a directory tree.

    Still, that's a common need. "I want all Python files in a subtree".

    > On the other hand, os.walk and/or walkdir suggest that in their name.

    I don't know why "walk" is supposedly more recursive than "glob".

    > Admittedly, we already have more than one, and a high-level tool is
    > proposed with Nick's walkdir. Why add *yet another* high-level tool?

    Because the walkdir spelling (IIUC) is longish, tedious and awkward.
    I could see myself typing "rglob('*.py')" in a short script or an
    interpreter session, without having to look up any kind of docs.
    Certainly not the walkdir alternative (I've already forgotten what it
    is).

    @merwok merwok changed the title Add a recursive function to the glob package Support recursive globs Feb 8, 2012
    @elibendersky
    Mannequin

    elibendersky mannequin commented Feb 8, 2012

    >> IOW, globbing is usually understood as the act of expanding a pattern
    >> to the files it matches. Nothing in that implies recursive traversal
    >> of a directory tree.
    >
    > Still, that's a common need. "I want all Python files in a subtree".
    >
    >> On the other hand, os.walk and/or walkdir suggest that in their name.
    >
    > I don't know why "walk" is supposedly more recursive than "glob".

    Google "walk directory". First hit is a Rosetta code page with
    *recursive* walking implemented in various languages. So I guess it
    does have this connotation. Regardless, os.walk has been in Python for
    ages, and it's always been the go-to tool for recursive traversal.
    walkdir's name suggests the same.

    >> Admittedly, we already have more than one, and a high-level tool is
    >> proposed with Nick's walkdir. Why add *yet another* high-level tool?
    >
    > Because the walkdir spelling (IIUC) is longish, tedious and awkward.
    > I could see myself typing "rglob('*.py')" in a short script or an
    > interpreter session, without having to look up any kind of docs.
    > Certainly not the walkdir alternative (I've already forgotten what it
    > is).

    walkdir is a new module proposal. If its API is tedious and awkward,
    it should probably be improved *now* while it's in development. Adding
    yet another tool that implements part of its functionality, winning a
    golf tournament along the way, isn't the solution, IMHO.

    @elibendersky elibendersky mannequin changed the title Support recursive globs Add a recursive function to the glob package Feb 8, 2012
    @merwok
    Member

    merwok commented Feb 8, 2012

    Feedback from Antoine on IRC about my syntax proposal: “The "**" meaning is not really universal like other quantifiers are. [...] (also, it would be quite harder to implement, I think)”

    That and the compat issue makes me go in favor of a new function.

    I’m not sure glob is the right place: when you use glob.glob, the search is rooted in the current directory, and you may have sub-directories in your pattern, e.g. 'Lib/*/main.py'. A function meaning “look for this file pattern recursively” would be IMO more at home in fnmatch.

    @pitrou
    Member

    pitrou commented Feb 8, 2012

    > Google "walk directory". First hit is a Rosetta code page with
    > *recursive* walking implemented in various languages. So I guess it
    > does have this connotation. Regardless, os.walk has been in Python for
    > ages, and it's always been the go-to tool for recursive traversal.
    > walkdir's name suggests the same.

    You still haven't explained what your problem is with the idea of an
    explicitly recursive glob (as both "rglob" and "globtree" suggest).

    > walkdir is a new module proposal. If its API is tedious and awkward,
    > it should probably be improved *now* while it's in development.

    walkdir is not yet a module proposal, there's not even a PEP for it, and
    it's in a very young state.

    This issue has a working patch for rglob(), which is a single, obvious,
    incremental addition to the existing glob module. If you want to discuss
    walkdir, I suggest you do it in a separate issue.

    (and, yes, rglob() can be reimplemented using walkdir later, if there is
    a point in doing so)

    @elibendersky
    Mannequin

    elibendersky mannequin commented Feb 8, 2012

    >> Google "walk directory". First hit is a Rosetta code page with
    >> *recursive* walking implemented in various languages. So I guess it
    >> does have this connotation. Regardless, os.walk has been in Python for
    >> ages, and it's always been the go-to tool for recursive traversal.
    >> walkdir's name suggests the same.
    >
    > You still haven't explained what your problem is with the idea of an
    > explicitly recursive glob (as both "rglob" and "globtree" suggest).

    The problem is that I prefer the walkdir approach, because it solves a
    more general problem and overall more useful. This is also why I don't
    see how it makes sense to stop discussing it here and focus on rglob.
    They are related, after all!

    Anyway, I'm not sure what else I can add to the discussion. I'm
    starting to repeat myself, which means that I should just shut up :)

    I've stated my preference, and I understand and respect yours. So
    let's just see what others think.

    @pitrou
    Member

    pitrou commented Feb 8, 2012

    I'm trying the patch and its behaviour is strange:

    >>> list(glob.rglob('setup.py'))
    ['setup.py']
    >>> list(glob.rglob('setu*.py'))
    []
    >>> list(glob.rglob('*/setu*.py'))
    ['./setup.py', './Mac/Tools/Doc/setup.py', './Tools/test2to3/setup.py', './Doc/includes/setup.py', './PC/example_nt/setup.py']

    I can understand the first example (although that makes the documentation slightly incorrect, since you need an explicit "*" path component for the search to be recursive), but the second one looks straight wrong.

    @merwok
    Member

    merwok commented Feb 8, 2012

    >> list(glob.rglob('*/setu*.py'))

    It looks quite strange to me that '/' should be allowed in a function that recurses down directories (see my messages above). OTOH fnmatch is not really appropriate, contrary to my earlier feeling.

    (Restoring my title change: as my messages were apparently overlooked, I assume that Eli did not revert my change on purpose but by replying to older email)

    @merwok merwok changed the title Add a recursive function to the glob package Support recursive globs Feb 8, 2012
    @elibendersky
    Mannequin

    elibendersky mannequin commented Feb 8, 2012

    Oops, Éric, sorry about the title. I didn't even notice :)

    @ncoghlan
    Contributor

    ncoghlan commented Feb 8, 2012

    I think it's important to be clear on what the walkdir API aims to be: a composable toolkit of utilities for directory tree processing. Its overall design is inspired directly by the itertools module.

    Yes, it started life as a simple proposal to add shutil.filtered_walk (http://bugs.python.org/issue13229), but I soon realised that implementing this solely as a monolithic function would be foolish, since that approach isn't composable. What if you just wanted file filtering? Or depth limiting? Having it as a filtering toolkit lets you choose the exact filters you need for a given use case. walkdir.filtered_walk() is just an API for composing filtering pipelines without needing to pass the same information to multiple pipeline stages.

    However, along with that itertools inspired iterator pipeline based design, I've also inherited Raymond's preference that particular *use cases* start life as recipes in the documentation.

    A recursive glob is just a basic walkdir pipeline composition:

    >>> import os
    >>> from walkdir import file_paths, include_files
    >>> def globtree(pattern, path='.'):
    ...     return file_paths(include_files(os.walk(path), pattern))
            
    Since filtered_walk() is just a pipeline builder, the composition can also be written:
    
    >>> from walkdir import file_paths, filtered_walk
    >>> def globtree(pattern, path='.'):
    ...     return file_paths(filtered_walk(path, included_files=[pattern]))

    That latter approach then suggests an alternative signature for globtree:

    >>> def globtree(*patterns, **kwds):
    ...     kwds.setdefault("top", ".")
    ...     return file_paths(filtered_walk(included_files=patterns, **kwds))
    ...
    >>> print '\n'.join(sorted(globtree('*.rst')))
    ./index.rst
    ./py3k_binary_protocols.rst
    ./venv_bootstrap.rst
    
    >>> print '\n'.join(sorted(globtree('*.rst', '*.py')))
    ./conf.py
    ./index.rst
    ./py3k_binary_protocols.rst
    ./venv_bootstrap.rst
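
To make the pipeline idea concrete without installing walkdir, here is a rough stdlib-only sketch. The `include_files` and `file_paths` functions below are simplified local stand-ins that borrow walkdir's names; they are assumptions written for this illustration, and walkdir's real implementations may behave differently. The keyword-only `top` argument assumes Python 3.

```python
import fnmatch
import os


def include_files(walk_iter, *patterns):
    """Filter os.walk-style tuples, keeping file names that match any pattern.

    Local stand-in for illustration, not the walkdir implementation.
    """
    for dirpath, dirnames, filenames in walk_iter:
        kept = [name for name in filenames
                if any(fnmatch.fnmatch(name, pat) for pat in patterns)]
        yield dirpath, dirnames, kept


def file_paths(walk_iter):
    """Flatten os.walk-style tuples into full file paths (stand-in)."""
    for dirpath, _dirnames, filenames in walk_iter:
        for name in filenames:
            yield os.path.join(dirpath, name)


def globtree(*patterns, top="."):
    """Recursive glob composed from the two pipeline stages above."""
    return file_paths(include_files(os.walk(top), *patterns))
```

Each stage consumes and produces an iterator, so further filters (depth limits, directory exclusion) could be slotted in between `os.walk` and `file_paths` without changing either end.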

    On a somewhat related note, I'd also like to see us start concentrating higher level shell utilities in the shutil namespace so users don't have to check multiple locations for shell-related functionality quite so often (that is, I'd prefer shutil.globtree over glob.rglob).

    @pitrou
    Member

    pitrou commented Feb 9, 2012

    > However, along with that itertools inspired iterator pipeline based
    > design, I've also inherited Raymond's preference that particular *use
    > cases* start life as recipes in the documentation.

    I think it's important to remember where we are coming from. Many people
    complain that using os.walk is too cumbersome. Proposing another
    cumbersome solution doesn't really help.

    So I'm not against walkdir *per se*, but I'm -1 on the idea that walkdir
    can eliminate the need for practical functions that anybody can use
    *easily*.

    > >>> print '\n'.join(sorted(globtree('*.rst', '*.py')))
    > ./conf.py
    > ./index.rst
    > ./py3k_binary_protocols.rst
    > ./venv_bootstrap.rst

    I think it's rather nice, but it should be available as a stdlib
    function rather than a "recipe" in the documentation.

    Recipes are really overrated: they aren't tested, they aren't
    maintained, they aren't part of a module's docstrings or
    (pydoc-generated) contents, it's not obvious what kind of quality you
    can expect from them (do they handle all cases correctly), it's not
    obvious which Python versions they support. Raymond may like the idea,
    but that doesn't make it a "good practice" Python should follow for its
    batteries.

    > On a somewhat related note, I'd also like to see us start
    > concentrating higher level shell utilities in the shutil namespace so
    > users don't have to check multiple locations for shell-related
    > functionality quite so often (that is, I'd prefer shutil.globtree over
    > glob.rglob).

    Well, if glob() already lived in shutil, this decision would be a
    no-brainer :) Having glob() in the glob module and globtree() in the
    shutil module, though, looks a bit weird.
    (I agree having a separate module for glob isn't ideal)

    @ncoghlan
    Contributor

    ncoghlan commented Feb 9, 2012

    We do have the option of aliasing glob.iglob as shutil.glob...

    @elibendersky
    Mannequin

    elibendersky mannequin commented Feb 9, 2012

    > Well, if glob() already lived in shutil, this decision would be a
    > no-brainer :) Having glob() in the glob module and globtree() in the
    > shutil module, though, looks a bit weird.
    > (I agree having a separate module for glob isn't ideal)

    Would it be feasible to deprecate the 'glob' module, moving its functionality to shutil? In some future Python version, then, the module can be removed.

    The same fate would go for fnmatch, I guess. There are too many modules lying around dealing with the same problems.

    On a related note, the doc of glob explicitly mentions that it is implemented with os.listdir and fnmatch. Similarly, *if* the recursive glob gets accepted it should be implemented with walkdir (once that's in).

    @ncoghlan
    Contributor

    ncoghlan commented Feb 9, 2012

    This discussion (particularly my final globtree recipe) made me realise that the exact same approach would greatly improve the usability of the all_paths, file_paths and dir_paths iterators in walkdir [1]. Accordingly, walkdir 0.4 will let you write a recursive grep for ReST and Python source files as:

        file_paths(top, included_files="*.py *.rst".split())

    Scanning multiple directories will be as simple as:

        file_paths(dir1, dir2, included_files="*.py *.rst".split())

    [1] https://bitbucket.org/ncoghlan/walkdir/issue/15

    @ubershmekel
    Mannequin Author

    ubershmekel mannequin commented Feb 9, 2012

    Thanks for finding the bug, Antoine. I worked surprisingly hard trying to make this right in more edge cases, and while fixing it I noticed that rglob/globtree has three options:

    • Behave like a glob for every subdirectory. Meaning that every relative path gets a '*/' prepended to it. Eg rglob('c/d') started from the directory 'a' will yield 'a/b/c/d'.

    • Behave like a glob for every subdirectory of the directory in the filter string. Meaning rglob('c/d') from dir 'a' won't yield 'a/b/c/d'. It would try to walk from 'a/c' and yield nothing if the directory 'c' doesn't exist in 'a'. Note that if the directory 'c' does exist then '/a/c/f/d' would be yielded. That seems kind of quirky to me.

    • Behave like a filtered walk. Meaning that in order to yield files nested in subdirectories a wildcard must be introduced. Eg rglob('c/d') started from the directory 'a' won't yield 'a/b/c/d'. For that to occur you would need to use rglob('*c/d') or rglob('*/c/d'). What's more unfortunate is that rglob('d') doesn't yield 'a/b/c/d' which seems wrong. So I think for this we should special case paths that don't have path separators and prepend the "*/". Though some may argue it's wrong that rglob('d') yields 'a/b/c/d' even though rglob('c/d') won't yield it, I think that's the correct choice for this route.

    Note that absolute paths with/without wildcards don't have this ambiguity. In both rel/abs wildcards should match directories and files alike.

    Which option do you guys think would be best? I already have a fixed patch for option 1 and 3 but I'd rather hear your thoughts before I introduce either.
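
Option 1 can be approximated with the standard library alone. The sketch below is an illustration rather than the code in the patch, and `rglob_every_dir` is a hypothetical name: it runs an ordinary glob in the starting directory and in every directory beneath it.

```python
import glob
import os


def rglob_every_dir(pattern, top="."):
    """Option 1 sketch: apply a plain glob in `top` and in each directory below it.

    Hypothetical helper for illustration; not taken from the patch.
    """
    for dirpath, _dirnames, _filenames in os.walk(top):
        # glob.escape keeps any wildcard characters in directory names literal,
        # so only `pattern` itself is interpreted as a glob.
        yield from glob.glob(os.path.join(glob.escape(dirpath), pattern))
```

With a file at a/b/c/d, `rglob_every_dir('c/d', top='a')` finds a/b/c/d, whereas a plain `glob.glob('a/c/d')` finds nothing.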

    P.s. another slight issue I ran into is the fact that fnmatch doesn't ignore os.curdir:

        >>> fnmatch.fnmatch('./a', 'a')
        False
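
One possible workaround, sketched here as an assumption rather than anything the patch does, is to normalize the path before matching. `fnmatch_normalized` is a hypothetical helper name.

```python
import fnmatch
import os


def fnmatch_normalized(path, pattern):
    """Match `path` against `pattern` after collapsing a leading './' and similar.

    Only the path is normalized; the pattern is passed through unchanged so
    that its wildcards keep their meaning. Hypothetical helper for illustration.
    """
    return fnmatch.fnmatch(os.path.normpath(path), pattern)
```

With this helper, `fnmatch_normalized('./a', 'a')` is true while `fnmatch.fnmatch('./a', 'a')` stays false.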

    @pitrou
    Member

    pitrou commented Apr 1, 2012

    > I found this comprehensive description of the '**' convention at
    > http://www.codeproject.com/Articles/2809/Recursive-patterned-File-Globbing that can translate directly to unittests.
    >
    > I'd like to fix the patch for these specs but should it be in a new
    > rglob function or in the existing glob.glob()? I think it should be a
    > new one to avoid any edge-case compatibility concerns even though on
    > face value there shouldn't be any.

    I think it should be the existing glob.glob(). We won't introduce a new
    function any time we add a new syntactic feature in the glob
    mini-language.

    @ubershmekel
    Mannequin Author

    ubershmekel mannequin commented Apr 1, 2012

    I don't have a strong opinion on "rglob vs glob" so whichever way the majority here thinks is fine by me.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Apr 1, 2012

    For "**" globbing see http://ant.apache.org/manual/dirtasks.html#patterns .

    If we extend pattern syntax of templates, why not implement Perl, Tcl or Bash extensions?

    @ubershmekel
    Mannequin Author

    ubershmekel mannequin commented Apr 1, 2012

    On Sun, Apr 1, 2012 at 4:42 PM, Serhiy Storchaka <report@bugs.python.org> wrote:

    > For "**" globbing see http://ant.apache.org/manual/dirtasks.html#patterns

    They mention that "mypackage/test/ is interpreted as if it were mypackage/test/**" so that's not really an option. I'm pretty sure we should only recurse if "**" appears explicitly.

    > If we extend pattern syntax of templates, why not implement Perl, Tcl or
    > Bash extensions?

    I'm not sure what you mean here but if it's that ##{} stuff then it should probably be discussed in a separate issue as it's not related to recursive globs.

    @ubershmekel
    Mannequin Author

    ubershmekel mannequin commented Apr 25, 2012

    I added the doublestar functionality to iglob and updated the docs and tests.

    Also, a few readability renames in that module were a long time coming.

    I'd love to hear your feedback.

    @ubershmekel
    Mannequin Author

    ubershmekel mannequin commented May 6, 2012

    So, anybody for or against this patch? I'd really like to see this feature make its way in...

    @pitrou
    Member

    pitrou commented May 6, 2012

    > So, anybody for or against this patch? I'd really like to see this
    > feature make its way in...

    I think the feature is useful, but someone needs to review the patch.
    Sorry if it takes some time.

    @pitrou
    Member

    pitrou commented May 15, 2012

    I'm looking at the tests and I don't understand why '**/bcd/*' should match 'a/bcd/efg/ha'. Am I missing something?

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Dec 5, 2012

    Here is a patch that implements recursive globbing conforming to Bash globbing with the "globstar" option.

    For backward compatibility, recursive globbing is off by default and works only if the new argument "recursive" is true (the default is False). I am not sure this is the better variant; possibly the default should be True, since a '**' pattern is very unlikely in old code. However, recursive globbing on an arbitrary pattern and an arbitrary tree is not safe: it can hang on recursive symlinks.

    The patch contains changes from bpo-16618.
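
The behaviour described here is what eventually shipped in Python 3.5 as the `recursive` parameter of glob.glob(). A small sketch, using a throwaway temporary tree (the file names are made up for the example, and `demo_recursive_flag` is a hypothetical helper), shows the difference the flag makes:

```python
import glob
import os
import tempfile


def demo_recursive_flag():
    """Build a small tree and compare glob results with and without recursive."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "pkg", "sub"))
        for rel in ("top.py", "pkg/mod.py", "pkg/sub/deep.py"):
            open(os.path.join(tmp, *rel.split("/")), "w").close()

        def rel_glob(pattern, **kwargs):
            # Escape the temporary directory name so only `pattern` is magic.
            matches = glob.glob(os.path.join(glob.escape(tmp), pattern), **kwargs)
            return sorted(os.path.relpath(m, tmp).replace(os.sep, "/") for m in matches)

        plain = rel_glob("**/*.py")                      # '**' behaves like '*'
        recursive = rel_glob("**/*.py", recursive=True)  # '**' spans any depth
        return plain, recursive
```

Without the flag only pkg/mod.py is found, which matches the old `'*/*.py'` behaviour; with `recursive=True` the pattern also reaches top.py and pkg/sub/deep.py.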

    @serhiy-storchaka serhiy-storchaka self-assigned this Dec 29, 2012
    @serhiy-storchaka
    Member

    serhiy-storchaka commented Jan 2, 2013

    Patch updated for current tip.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Jan 14, 2013

    I should add symlink loop detection to _rlistdir(), as Antoine advised me on IRC.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Jan 14, 2013

    In fact, glob() is already protected against endless recursion (in the same way as Bash): the level of recursion is simply limited by the maximum length of a path. So I did not change the implementation; I just added a test for a symlink loop. I also corrected the other new tests so that they do not fail on platforms without symlinks.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Nov 17, 2013

    The updated patch fixes warnings/errors when run with the -b or -bb options.

    @ncoghlan
    Contributor

    ncoghlan commented Mar 7, 2014

    Oops, Python 3.4 has ** support in pathlib, but we missed Serhiy's patch for the glob module itself. We should resolve that discrepancy for 3.5 :)
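
For reference, the pathlib support mentioned here is available through `Path.glob('**/...')` and `Path.rglob(...)`. The following is a minimal sketch on a temporary tree; the file names and the `demo_pathlib_rglob` helper are invented for the example.

```python
from pathlib import Path
import tempfile


def demo_pathlib_rglob():
    """Return the .py files found by Path.glob('**/*.py') and Path.rglob('*.py')."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pkg" / "sub").mkdir(parents=True)
        for rel in ("top.py", "pkg/mod.py", "pkg/sub/deep.py"):
            (root / rel).touch()
        # Both spellings walk the whole tree below `root`.
        via_glob = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.py"))
        via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))
        return via_glob, via_rglob
```

Both calls return the same three files here, which is the behaviour the glob module gained once this issue was resolved (there only when `recursive=True` is passed).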

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Sep 5, 2014

    Could you review this, Nick?

    @ncoghlan
    Contributor

    ncoghlan commented Sep 5, 2014

    Mostly looks good to me, just two comments:

    1. Is there a reason the helper function is "glob2" rather than either "_glob2" or else something more self-documenting?

    2. "match a files" in the docs and docstrings doesn't read correctly. "match any files", perhaps?

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Sep 6, 2014

    Thank you Nick.

    > 1. Is there a reason the helper function is "glob2" rather than either
    >    "_glob2" or else something more self-documenting?

    Only consistency with other helper functions (glob0, glob1).

    > 2. "match a files" in the docs and docstrings doesn't read correctly. "match
    >    any files", perhaps?

    Of course. This is just a typo.

    @ncoghlan
    Contributor

    ncoghlan commented Sep 6, 2014

    Looks good to me!

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 11, 2014

    New changeset ff4b9d654691 by Serhiy Storchaka in branch 'default':
    Issue bpo-13968: The glob module now supports recursive search in
    http://hg.python.org/cpython/rev/ff4b9d654691

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Sep 11, 2014

    Thank you for your review Nick.

    @vstinner
    Member

    vstinner commented Sep 11, 2014

    The test failed on a buildbot, so I am reopening the issue.

    http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%203.x/builds/10607/steps/test/logs/stdio

    ======================================================================
    FAIL: test_selflink (test.test_glob.SymlinkLoopGlobTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_glob.py", line 284, in test_selflink
        self.assertIn(path, results)
    AssertionError: '@test_23056_tmp_dir/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/file' not found in {'noodly2', '@test_23056_tmp-\udcff.py', '__pycache__'}

    @vstinner vstinner reopened this Sep 11, 2014
    @python-dev
    Mannequin

    python-dev mannequin commented Sep 11, 2014

    New changeset 180f5bf7d1b9 by Serhiy Storchaka in branch 'default':
    Issue bpo-13968: Fixed newly added recursive glob test.
    http://hg.python.org/cpython/rev/180f5bf7d1b9

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Sep 11, 2014

    Thank you, Victor. The test also failed when run directly, bypassing the
    test.regrtest module (which runs each test inside a temporary directory):

    ./python Lib/test/test_glob.py

    Now it is fixed.

    However, perhaps we should consider it a bug if a test run by regrtest doesn't
    clean up the files or directories it created ('noodly2', '@test_23056_tmp-\udcff.py', and
    '__pycache__' were created by some previous tests).

    @vstinner
    Member

    vstinner commented Sep 11, 2014

    > However perhaps we should consider as a bug if a test ran by regrtest doesn't clean created files or directories

    => yes, I opened the issue bpo-22390.

    @marc-h38
    Mannequin

    marc-h38 mannequin commented Apr 23, 2019

    Please review the one-word documentation change at #12918, which clarifies that the recursive glob ** follows symbolic links to directories.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022