Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/go: ignore directories containing `.goignore` in file scans #30058

Open
tj opened this issue Feb 1, 2019 · 36 comments

Comments

@tj
Copy link

commented Feb 1, 2019

@bradfitz mentioned go doc could special-case node_modules for exclusion.

If you have many client/server projects in $GOPATH, node_modules can quickly bloat and cause go doc to resolve slowly—this is especially noticeable in editors which use go doc to display inline documentation (see vscode issue).

I recursively removed my node_modules (15gb, crazy I know), here are the results before and after:

$ time go doc something

real    2m25.068s
user    0m17.772s
sys    1m45.039s

$ time go doc something

real    0m19.955s
user    0m2.499s
sys    0m9.343s
@tj tj changed the title cmd/doc: performance hindered greatly by node_modules cmd/doc: improve performance by excluding node_modules Feb 1, 2019
@agnivade

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

/cc @robpike

@bradfitz

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

@bradfitz mentioned go doc could special-case node_modules for exclusion.

That was done as a result of #16417, which you opened in 2016 and is still open.

@bradfitz bradfitz added this to the Go1.13 milestone Feb 1, 2019
@mvdan

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

I presume this would extend the following bit of go help packages:

Directory and file names that begin with "." or "_" are ignored by the go tool, as are directories named "testdata".

Or is it fine for go doc to skip certain directories, while other commands like go build and go test wouldn't?

@tj

This comment has been minimized.

Copy link
Author

commented Feb 2, 2019

Now that I think about it, gorename took ~5-10 minutes before as well, I'll have to try it now. It does seem a little hacky to special-case directories specific to some other language/framework, but I can't say I've ever wanted to operate against anything in node_modules either so it probably wouldn't be harmful

@agnivade

This comment has been minimized.

Copy link
Member

commented Feb 2, 2019

To generalize this behavior, should we also ignore top-level directories under .gitignore ? That should take care of node_modules as well.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2019

This can't be done in just cmd/doc. If we're going to do it, it should be globally, in cmd/go, go/build, etc. (And we can't start parsing .gitignore.)

It's unfortunate to have the definition of Go packages depending on or working around problems specific to other languages. On the other hand 15 GB is pretty big.

@rsc rsc changed the title cmd/doc: improve performance by excluding node_modules proposal: cmd/doc: improve performance by excluding node_modules Feb 6, 2019
@gopherbot gopherbot added the Proposal label Feb 6, 2019
@rsc

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2019

If we do this, we have to write:

Directory and file names that begin with "." or "_" are ignored by the go tool,
as are directories named "node_modules" or "testdata".

That's a bit strange, to call out Node explicitly like that. Are there any other giant directories to avoid too?

We could also make it user-configurable in some way, but then the meaning of things like 'go list foo/...' or 'go install foo/...' becomes configuration-specific, which we've tried hard to avoid.

@snikch

This comment has been minimized.

Copy link

commented Feb 27, 2019

I've been watching this as I've been affected heavily in the past with this via goimports and more recently go doc.

I agree that it is unfortunate to have to call out a specific language (and all derivatives of), however in the scope of web development as a practice today with node_modules being ubiquitous for front end development, it doesn't seem unreasonable.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2019

I've also had this problem with Carthage. Looking at my .goimportsignore, I also see a few packages that are Go wrappers around large C projects that have lots of unrelated code and that do in-place builds, leading to large numbers of build artifacts.

@agnivade

This comment has been minimized.

Copy link
Member

commented Feb 28, 2019

Some more discussion here - #16427. I agree that we should make it user-configurable. Hardcoding node_modules just seems odd to me.

At the expense of adding new knobs, I think having a .goignore file will help a lot (given that we already have .goimportsignore). And that file can serve as a common base which can be inspected by all other tools (go doc, gorename, goimports).

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

I agree that adding .goignore would be better than special-casing node_modules.

@robpike

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

Please don't create more dot files. If the tools need configuration, use a single file.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

The suggestion is basically to rename .goimportsignore to .goignore so there would be a single file.

It occurs to me that if we adopt #30411 then another possible approach would be to use an environment variable.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

What about

go env -w GOIGNORE=

(as Ian suggested)? This would apply to go doc and so on but not to the extraction of modules from your Git repos. If the files go into your git repos, they will go into the module zips.

We've been circling various options here for a while. This seems like the least bad.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

go env -w GOIGNORE=

Ignore lists like this tend to grow slowly over time. This is a nice fit for a text file, which is easy to edit and append to. Env vars are much less edit-friendly. Every time I type export PATH=“$PATH”: I experience a Rob-like twinge of exasperation. (Or anyway, I imagine it is Rob-like.)

@rsc

This comment has been minimized.

Copy link
Contributor

commented May 7, 2019

Ian suggests that we might also consider a special file in the directory to be ignored. For example, we could say that when the go command walks into a directory and finds a file named .goignore or .goskipdir or .gorobots.txt, it turns around and walks back out immediately. So if you needed a node_modules inside your Go module working directory, you'd just put this magic file in there too.

Thoughts? Name bikesheds?

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented May 7, 2019

That can be a bit annoying to work with for directories that sometimes get removed and then replaced with newer versions or for directories owned by a tool that then needs to be told to ignore the sentinel file.

It semi-makes sense to include the blacklist in go.mod since it would be listing directories that, though they happen to be contained in the module root, are not actually part of the module.

@agnivade

This comment has been minimized.

Copy link
Member

commented May 8, 2019

Ian suggests that we might also consider a special file in the directory to be ignored.

A major problem that I see is that it will need .gitignore'd directories to be in the filesystem. For eg. node_modules are not checked in to a repo. But in this case, we would need an empty node_modules to be present, just for that special file to reside. Does not feel very elegant.

Moreover, this will create multiple files peppered throughout the repo compared to a single file. Also, now it becomes harder to know what directories are ignored at once.

What is the advantage of this solution over having a single dot file, since both are using a special magic file anyways (which I thought was the main point of contention) ?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

Typically the directories that should be ignored don't have to live in a Go repo at all. After all, by definition they don't contain any Go code. So the advantage of a file in a directory is that only people who create that directory will need to put the file in it. If we use a single dot file, it's hard to avoid putting that file into the Go repo, but now people who create a directory that doesn't contain any Go code have to maintain a local patch to the Go repo. (That argument does not apply to .gitignore since git supports putting ignore patterns outside of the repo.)

(I'm not saying we have to do it that way, I'm just answering the question about the advantage of this approach over a single dot file.)

@mvdan

This comment has been minimized.

Copy link
Member

commented May 8, 2019

Android has a .nomedia empty file and I think it works nicely. I think it would be a cleaner solution than adding more config files or more env vars. If anyone wishes to mix Go with other tools/languages in a single repository which produce tons of files, I assume they can make sure to also add such a file as necessary.

@agnivade

This comment has been minimized.

Copy link
Member

commented May 8, 2019

Thanks for the clarification Ian.

The only issue I see is that new users cloning a node repo have to traverse inside the node_modules folder and create the special file everytime, as opposed to the folder ignore path being already there in the single-file solution. But your point on maintaining local Go patches is valid as well. I am fine with both solutions.

@josharian

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

So the advantage of a file in a directory is that only people who create that directory will need to put the file in it.

I'm concerned that this won't play well with external dependencies, either managed via git submodules or via some package manager. To get the go tool to ignore them, we'd have to add a Go-specific file to a completely non-Go repository; I can't imagine other projects being happy about that. The alternative is to add the file yourself after downloading the code, but dirtying the repo will also be an annoyance; it'll cause git submodule pain, any package manager that checks directory contents for security purposes may fail, etc.

@andybons andybons removed this from the Go1.13 milestone Jul 8, 2019
@andybons andybons added this to the Go1.14 milestone Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Proposal Jul 16, 2019
@rsc

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2019

There really shouldn't be these directories inside module trees to begin with, so hopefully this is a rare case.

In this rare case, is it unreasonable to expect people to put .goignore files in those directories? If a script is deleting and recreating that directory, it can create the file, right? Is there an established non-Go-specific convention? (.nomedia doesn't sound right, but anything else?)

@tj

This comment has been minimized.

Copy link
Author

commented Oct 2, 2019

@rsc sounds fine to me! But if the tooling doesn't fall back on GOPATH maybe this is a non-issue going forward

@rsc rsc changed the title proposal: cmd/doc: improve performance by excluding node_modules proposal: cmd/go: ignore directories containing `.goignore` in file scans Oct 2, 2019
@rsc

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2019

The discussion has meandered a bit, but the original reporter at least (@tj) is happy with the ".goignore in the directory to be ignored" solution, and this will become less of a problem as GOPATH is phased out anyway.

Does anyone object to moving forward with this .goignore behavior?

@agnivade

This comment has been minimized.

Copy link
Member

commented Oct 3, 2019

I somewhat share Josh's concern that we are pushing the responsibility of modifying Go-specific behavior to a non-Go user. But overall, it is a +1 from my side.

Happy to work on it if this gets accepted.

@robpike

This comment has been minimized.

Copy link
Contributor

commented Oct 3, 2019

Still not happy about this, and now that GOPATH is going away it seems even less suitable.

@myitcv

This comment has been minimized.

Copy link
Member

commented Oct 3, 2019

Does anyone object to moving forward with this .goignore behavior?

No objection.

this will become less of a problem as GOPATH is phased out anyway

The problem will, however, still remain for the main module (which is our use case at least).

To get the go tool to ignore them, we'd have to add a Go-specific file to a completely non-Go repository

@josharian what workflow are you thinking of where this would be necessary? External dependencies will not likely have things like node_modules committed (per @rsc in #30058 (comment)). And if an external dependency does have a large directory that contains non-Go code that should be ignored, then I think it's reasonable to ask for a .goignore to be added (by definition, the external dependency must have some Go code).

@tj

This comment has been minimized.

Copy link
Author

commented Oct 3, 2019

We could close it for now and revisit this later if it's a problem with Go modules. Now that tooling won't be scanning all of my client/server apps full of node_modules this shouldn't be a huge problem.

@mvdan

This comment has been minimized.

Copy link
Member

commented Oct 3, 2019

In an ideal world there would be a standard like a .nocode or .codeignore file that would be language+tool agnostic. I don't think that exists, so it seems to me like .goignore is reasonable.

@myitcv

This comment has been minimized.

Copy link
Member

commented Oct 3, 2019

this will become less of a problem as GOPATH is phased out anyway

The problem will, however, still remain for the main module (which is our use case at least).

I think the ordering of my response above might have given the impression that I wasn't objecting to the proposal despite the problem still remaining for the main module.

Just to clarify, the proposed .goignore will also work within the main module, hence I have no objection to the proposal.

@kaey

This comment has been minimized.

Copy link

commented Oct 3, 2019

If it's only about node_modules there is work being done to replace it with PnP https://yarnpkg.com/lang/en/docs/pnp/

@mvdan

This comment has been minimized.

Copy link
Member

commented Oct 3, 2019

Interesting, thanks for the insight @kaey. That seems to support @tj's idea that perhaps we can revisit this later if it continues to be a problem in the future.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2019

Given @tj and @mvdan's comments, it sounds like we can just do nothing for now, learn more about how modules work, and come back if we need to. So this sounds like a likely decline.

Leaving open for a week for final comments.

@myitcv

This comment has been minimized.

Copy link
Member

commented Oct 11, 2019

@rsc are you referring to @tj's comment in #30058 (comment)?

This problem remains in a Go modules world, where node_modules (or similar) exists within the main module (we have exactly that situation in our mono repo). Hence in #30058 (comment) I agreed with the proposal of using a .goignore or similar.

Therefore I would suggest it's still worth fixing.

@gocs

This comment has been minimized.

Copy link

commented Oct 14, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.