New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path/filepath: Glob should support `**` for zero or more directories #11862

Open
ascarter opened this Issue Jul 24, 2015 · 31 comments

Comments

Projects
None yet
@ascarter

ascarter commented Jul 24, 2015

Go version 1.4.2
Mac OS X 10.10

Example:

package main

import "fmt"
import "path/filepath"
import "os"

func main() {
    files, err := filepath.Glob("/usr/local/go/src/**/*.go")
    if err != nil {
            fmt.Print(err)
            os.Exit(1)
    }
    fmt.Printf("files: %d\n", len(files))
    for _, f := range files {
        fmt.Println(f)
    }
}

Expected:

% ls /usr/local/go/src/**/*.go | wc -l
    1633

Actual:

files: 732

It seems that ** is equivalent to *. The extended ** pattern is common in shells and is supported in Rust and Java for example.

@ascarter

This comment has been minimized.

ascarter commented Jul 24, 2015

For reference, here is Rust's implementation. I think these rules would be a good description of how Go's filepath.Glob should work:

https://github.com/rust-lang/glob/blob/master/src/lib.rs#L369

@ianlancetaylor ianlancetaylor changed the title from filepath.Glob should support `**` for zero or more directories to filepath: Glob should support `**` for zero or more directories Jul 25, 2015

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 25, 2015

For reference, this is supported by bash when invoked as "bash -O globstar" or after running "shopt -s globstar".

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jul 25, 2015

@mikioh mikioh changed the title from filepath: Glob should support `**` for zero or more directories to path/filepath: Glob should support `**` for zero or more directories Jul 25, 2015

ascarter added a commit to ascarter/gotags that referenced this issue Aug 5, 2015

Add exclude option
Added --exclude option that mimics ctag's exclude flag. The pattern is expected to be a shell glob pattern (which is what Go's filepath.Match method expects). It can be set multiple times to set several filters. Once the file list is built, the patterns will be applied to each file to filter out any names that match the exclude patterns before processing.

Note: Glob pattern '**' does not work recursively. See open issue golang/go#11862
@nodirt

This comment has been minimized.

Member

nodirt commented Oct 22, 2015

OK to implement? @adg

@nodirt

This comment has been minimized.

Member

nodirt commented Oct 22, 2015

@bradfitz for opinion

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Oct 22, 2015

I think this is OK to implement.

@adg

This comment has been minimized.

Contributor

adg commented Oct 23, 2015

Yep, I agree.

@gopherbot

This comment has been minimized.

gopherbot commented Oct 23, 2015

CL https://golang.org/cl/16280 mentions this issue.

@dotcomputercraft

This comment has been minimized.

dotcomputercraft commented Dec 13, 2015

all, I wanted to check in to see if this issue (#11862) was fixed. Please let me know status of this ticket. Is there a work around to this problem? Talk to you soon. Thanks in advance.

@nodirt

This comment has been minimized.

Member

nodirt commented Dec 13, 2015

Status: currently I'm not working on this. Feel free to unassign.

@rsc

This comment has been minimized.

Contributor

rsc commented Jan 4, 2016

I certainly see the utility here but adding ** has various knock-on effects that may not be obvious. For example, * matches symlinks to other directories. ** cannot, at least not without extra work to avoid file system cycles. Also it's not obvious this should be considered a backwards compatible change, both for Glob and for Match. And you'd probably want to do path.Match as well. There's a lot to consider here, even though it looks trivial.

@slimsag

This comment has been minimized.

slimsag commented Feb 8, 2016

@rsc closed the most recent CL for this with the comment:

I really don't think we should do any of this. This is much less understandable than the current code. If you need a very fast, very complex filepath.Glob, it is of course possible to write one outside the standard library.

Should this issue remain open?

@rsc

This comment has been minimized.

Contributor

rsc commented Feb 9, 2016

This issue does remain open. What's confusing is that GitHub has linked just above your comment to issue Shopify/themekit#102, which is closed.

@slimsag

This comment has been minimized.

slimsag commented Feb 9, 2016

@rsc that is because someone linked to it in a Markdown link / comment: Shopify/themekit#102 (comment)

What can be done here? A better implementation? It's not clear to me.

@extemporalgenome

This comment has been minimized.

extemporalgenome commented Feb 9, 2016

I think it's useful, but I wouldn't consider it backwards compatible. ** already works and does something (which is the same something as a single star).

@ascarter

This comment has been minimized.

ascarter commented Feb 9, 2016

@extemporalgenome At best, I'd argue that ** is undefined and it just happens to do the same thing as *. I don't think this should represent an issue in regards to backwards compatibility.

@ascarter

This comment has been minimized.

ascarter commented Feb 9, 2016

@slimsag the shopify issue I think is different. They chose to walk the tree and that solved their case - I don't think that applies to the general problem. I think @rsc was saying that the solution here is non trivial (and it's not just filepath.Glob but also path.Match).

I'd like it to stay on the list. Coming from other languages, I think this is a pretty common feature for glob syntax.

Note that the downside of attempting to write your own implementation is that I think you would wind up re-implementing glob entirely. I'm trying to figure out how to solve this for http://github.com/jstemmer/gotags since it won't be solved in the standard library anytime soon.

@extemporalgenome

This comment has been minimized.

extemporalgenome commented Feb 9, 2016

@ascarter * is very well defined as matching any sequence (including an empty sequence) of non-separator characters. Thus ** is equally well defined as matching zero or more non-separator characters, followed by a match of zero characters on the tail of the same filename.

The proposed special-casing of ** will certainly not match less than it does now, but may match more, and the question is whether existing programs could become broken through what constitutes a change in behavior.

@mattn

This comment has been minimized.

Member

mattn commented Feb 9, 2016

I'm thinking this feature should be provided from third-party products not core.
for example, you can use https://github.com/mattn/go-zglob

@r03

This comment has been minimized.

r03 commented Aug 15, 2017

I would be nice if the documentation about Glob mentioned that ** isn't implemented.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Aug 15, 2017

@r03 The current docs are precise about what is implemented. I'm not sure it's necessary to say what is not implemented.

@ryantriangles

This comment has been minimized.

ryantriangles commented Oct 20, 2017

I agree with @r03 that it would be nice to see the lack of support mentioned; it is the way glob functions in other languages, like Ruby, work, so it might not even occur to users that the feature might not be implemented.

@rsc

This comment has been minimized.

Contributor

rsc commented Oct 20, 2017

Like Ian said, I'm not convinced we should enumerate the many possible glob extension syntaxes that exist but are not implemented by this function.

@vladbarosan vladbarosan referenced this issue Mar 7, 2018

Merged

Add indexer tool #1181

0 of 6 tasks complete
@mfilotto

This comment has been minimized.

mfilotto commented Mar 28, 2018

Any update ?

@FabledWeb

This comment has been minimized.

FabledWeb commented Jun 26, 2018

I would have to disagree with anyone claiming that the docs shouldn't mention this.

If you called it something other than "glob", then I would agree. But "glob" has a set of features that exist across languages/systems. Calling it the same as everything else, but not being like the other versions is worth calling out. Otherwise, don't call it glob.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 26, 2018

@FabledWeb filepath.Glob correponds to the glob function in the C standard library.

@FabledWeb

This comment has been minimized.

FabledWeb commented Jun 26, 2018

@ianlancetaylor I'm sure there is a good reason for it, but the common use of the word has a different meaning. Documentation shouldn't rely on someone knowing that the name refers to the C standard library version and that the C standard library acts differently than most things they'd come in contact with.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 26, 2018

@FabledWeb I suppose that what I'm doing is disagreeing with you about the common use of the word. To mean "glob" means the operation in the C library and the operation performed by the shell, and for neither of those does ** have a special meaning (in bash you can set an option to give ** a special meaning, but it's not the default). I understand that for you glob implies **, but for me it doesn't.

I continue to believe that the documentation should explain what the function does, not what it doesn't do.

@FabledWeb

This comment has been minimized.

FabledWeb commented Jun 27, 2018

Man, I wish there was threading here so I don't take all this screen real-estate.

The fact that we disagree on what the word means, and the fact that I and others didn't immediately realize that it didn't support the glob syntax we assumed was standard, suggests that there's something missing from the documentation that would make it clear what's not supported (by way of stating what is supported if you prefer). My real point about all this is that people are getting confused and the way to fix that is not to say "the documentation is accurate". The documentation should serve to help users understand, not just be a pure technical spec. If you don't want to say "the multi-directory wildcard ** isn't supported", then I can't say precisely what to change to make it more consumable, but apparently there is something lacking right now.

By the way @ianlancetaylor, I love that you're so active on these issues. It's refreshing to actually get responses when commenting or opening PRs.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 27, 2018

I guess it seems to me that the documentation of Glob does say what is supported, albeit by reference to Match which lists the precise syntax. I'd be happy to see specific suggestions for how to improve the docs, I just want to avoid saying "By the way, you may have heard about ** in glob, but that doesn't work." That just seems odd.

By the way @ianlancetaylor, I love that you're so active on these issues. It's refreshing to actually get responses when commenting or opening PRs.

Thanks, and, you're welcome.

@EliCDavis

This comment has been minimized.

EliCDavis commented Aug 1, 2018

Python solved this with an optional recursive=True parameter that could be passed to the function for globbing.

One potential solution could be an addition to the API that would provide recursive functionality so backward compatibility would be less of an issue, something like:

filepath.GlobRecursive("**/*")

However that would no longer use exact "Match" functionality that is described in the docs. If that was an issue you could always add an aditonal method MatchRecursive, however that does not seem very elegant of a solution.

Sidenote: I too made the assumption on how ** worked because of how it's been used in other languages and I am unfamilar with C.

@stu-b-doo

This comment has been minimized.

stu-b-doo commented Aug 2, 2018

Another 3rd party lib: I've just come across https://github.com/bmatcuk/doublestar, seems to be working okay so far...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment