Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/doc: improve performance by excluding node_modules #30058

Open
tj opened this issue Feb 1, 2019 · 22 comments

Comments

Projects
None yet
@tj
Copy link

commented Feb 1, 2019

@bradfitz mentioned go doc could special-case node_modules for exclusion.

If you have many client/server projects in $GOPATH, node_modules can quickly bloat and cause go doc to resolve slowly—this is especially noticeable in editors which use go doc to display inline documentation (see vscode issue).

I recursively removed my node_modules (15gb, crazy I know), here are the results before and after:

$ time go doc something

real    2m25.068s
user    0m17.772s
sys    1m45.039s

$ time go doc something

real    0m19.955s
user    0m2.499s
sys    0m9.343s

@tj tj changed the title cmd/doc: performance hindered greatly by node_modules cmd/doc: improve performance by excluding node_modules Feb 1, 2019

@agnivade

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

/cc @robpike

@bradfitz

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

@bradfitz mentioned go doc could special-case node_modules for exclusion.

That was done as a result of #16417, which you opened in 2016 and is still open.

@bradfitz bradfitz added this to the Go1.13 milestone Feb 1, 2019

@mvdan

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

I presume this would extend the following bit of go help packages:

Directory and file names that begin with "." or "_" are ignored by the go tool, as are directories named "testdata".

Or is it fine for go doc to skip certain directories, while other commands like go build and go test wouldn't?

@tj

This comment has been minimized.

Copy link
Author

commented Feb 2, 2019

Now that I think about it, gorename took ~5-10 minutes before as well, I'll have to try it now. It does seem a little hacky to special-case directories specific to some other language/framework, but I can't say I've ever wanted to operate against anything in node_modules either so it probably wouldn't be harmful

@agnivade

This comment has been minimized.

Copy link
Member

commented Feb 2, 2019

To generalize this behavior, should we also ignore top-level directories under .gitignore ? That should take care of node_modules as well.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2019

This can't be done in just cmd/doc. If we're going to do it, it should be globally, in cmd/go, go/build, etc. (And we can't start parsing .gitignore.)

It's unfortunate to have the definition of Go packages depending on or working around problems specific to other languages. On the other hand 15 GB is pretty big.

@rsc rsc changed the title cmd/doc: improve performance by excluding node_modules proposal: cmd/doc: improve performance by excluding node_modules Feb 6, 2019

@gopherbot gopherbot added the Proposal label Feb 6, 2019

@rsc

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2019

If we do this, we have to write:

Directory and file names that begin with "." or "_" are ignored by the go tool,
as are directories named "node_modules" or "testdata".

That's a bit strange, to call out Node explicitly like that. Are there any other giant directories to avoid too?

We could also make it user-configurable in some way, but then the meaning of things like 'go list foo/...' or 'go install foo/...' becomes configuration-specific, which we've tried hard to avoid.

@snikch

This comment has been minimized.

Copy link

commented Feb 27, 2019

I've been watching this as I've been affected heavily in the past with this via goimports and more recently go doc.

I agree that it is unfortunate to have to call out a specific language (and all derivatives of), however in the scope of web development as a practice today with node_modules being ubiquitous for front end development, it doesn't seem unreasonable.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2019

I've also had this problem with Carthage. Looking at my .goimportsignore, I also see a few packages that are Go wrappers around large C projects that have lots of unrelated code and that do in-place builds, leading to large numbers of build artifacts.

@agnivade

This comment has been minimized.

Copy link
Member

commented Feb 28, 2019

Some more discussion here - #16427. I agree that we should make it user-configurable. Hardcoding node_modules just seems odd to me.

At the expense of adding new knobs, I think having a .goignore file will help a lot (given that we already have .goimportsignore). And that file can serve as a common base which can be inspected by all other tools (go doc, gorename, goimports).

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

I agree that adding .goignore would be better than special-casing node_modules.

@robpike

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

Please don't create more dot files. If the tools need configuration, use a single file.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

The suggestion is basically to rename .goimportsignore to .goignore so there would be a single file.

It occurs to me that if we adopt #30411 then another possible approach would be to use an environment variable.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

What about

go env -w GOIGNORE=

(as Ian suggested)? This would apply to go doc and so on but not to the extraction of modules from your Git repos. If the files go into your git repos, they will go into the module zips.

We've been circling various options here for a while. This seems like the least bad.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

go env -w GOIGNORE=

Ignore lists like this tend to grow slowly over time. This is a nice fit for a text file, which is easy to edit and append to. Env vars are much less edit-friendly. Every time I type export PATH=“$PATH”: I experience a Rob-like twinge of exasperation. (Or anyway, I imagine it is Rob-like.)

@rsc

This comment has been minimized.

Copy link
Contributor

commented May 7, 2019

Ian suggests that we might also consider a special file in the directory to be ignored. For example, we could say that when the go command walks into a directory and finds a file named .goignore or .goskipdir or .gorobots.txt, it turns around and walks back out immediately. So if you needed a node_modules inside your Go module working directory, you'd just put this magic file in there too.

Thoughts? Name bikesheds?

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented May 7, 2019

That can be a bit annoying to work with for directories that sometimes get removed and then replaced with newer versions or for directories owned by a tool that then needs to be told to ignore the sentinel file.

It semi-makes sense to include the blacklist in go.mod since it would be listing directories that, though they happen to be contained in the module root, are not actually part of the module.

@agnivade

This comment has been minimized.

Copy link
Member

commented May 8, 2019

Ian suggests that we might also consider a special file in the directory to be ignored.

A major problem that I see is that it will need .gitignore'd directories to be in the filesystem. For eg. node_modules are not checked in to a repo. But in this case, we would need an empty node_modules to be present, just for that special file to reside. Does not feel very elegant.

Moreover, this will create multiple files peppered throughout the repo compared to a single file. Also, now it becomes harder to know what directories are ignored at once.

What is the advantage of this solution over having a single dot file, since both are using a special magic file anyways (which I thought was the main point of contention) ?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

Typically the directories that should be ignored don't have to live in a Go repo at all. After all, by definition they don't contain any Go code. So the advantage of a file in a directory is that only people who create that directory will need to put the file in it. If we use a single dot file, it's hard to avoid putting that file into the Go repo, but now people who create a directory that doesn't contain any Go code have to maintain a local patch to the Go repo. (That argument does not apply to .gitignore since git supports putting ignore patterns outside of the repo.)

(I'm not saying we have to do it that way, I'm just answering the question about the advantage of this approach over a single dot file.)

@mvdan

This comment has been minimized.

Copy link
Member

commented May 8, 2019

Android has a .nomedia empty file and I think it works nicely. I think it would be a cleaner solution than adding more config files or more env vars. If anyone wishes to mix Go with other tools/languages in a single repository which produce tons of files, I assume they can make sure to also add such a file as necessary.

@agnivade

This comment has been minimized.

Copy link
Member

commented May 8, 2019

Thanks for the clarification Ian.

The only issue I see is that new users cloning a node repo have to traverse inside the node_modules folder and create the special file everytime, as opposed to the folder ignore path being already there in the single-file solution. But your point on maintaining local Go patches is valid as well. I am fine with both solutions.

@josharian

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

So the advantage of a file in a directory is that only people who create that directory will need to put the file in it.

I'm concerned that this won't play well with external dependencies, either managed via git submodules or via some package manager. To get the go tool to ignore them, we'd have to add a Go-specific file to a completely non-Go repository; I can't imagine other projects being happy about that. The alternative is to add the file yourself after downloading the code, but dirtying the repo will also be an annoyance; it'll cause git submodule pain, any package manager that checks directory contents for security purposes may fail, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.