proposal: cmd/go: avoid surprising inclusion of "hidden" files when using //go:embed #42328

Open
Merovius opened this issue Nov 1, 2020 · 114 comments


@Merovius Merovius commented Nov 1, 2020

This is forked off from #41191 to talk about a specific issue with the design as-accepted. #42325 and #42321 talk about the same problem, but they are very solution-focused and I think it is more appropriate to discuss the actual problem first.

Namely: @carlmjohnson has pointed out that the //go:embed directive includes .DS_Store files under macOS. This is certainly working-as-designed, but I think it should be discussed whether that's actually the semantic we want. I want to talk about the .DS_Store example specifically, but there are other dot-files with similar properties. .DS_Store does illustrate something significant though, because it is a file that is (AIUI, I'm not a Mac-user myself) created non-interactively by third-party software in every subdirectory. So it is not explicitly created by the user and it is permanent (and will be re-created at some point, if deleted).

There have been several suggestions made so far, which IMO are deficient in one way or another:

  • Manually clean up before go build or live with inclusion of any such files. IMO this puts unreasonable expectations on users. No matter how much we feel that in an ideal world, this should be what happens - I just don't think it is what will happen, in practice.
  • Run git clean before go build. This is certainly useful in CI/CD, but during regular development, it would also delete progress made. It's certainly not something I'd want to run before every go test or go build. In CI/CD it is useful, but it will probably still be forgotten by many people - though it's also far less of a problem, because the chance of pollution is low.
  • Add ways to exclude specific files. IMO .DS_Store is something that ~always should be excluded and as others have pointed out, there are many other patterns of files that should be excluded and would have to be listed. In effect, this would mean that ~every //go:embed directive would have to list a non-canonical, long list of exclusions, to cover any tools used by people. It also still suffers from the problem that people have to know about the problem in the first place. So IMO it would still end up with accidental inclusions of .DS_Store and similar files.
  • Make * not match dot-files. This isn't really helpful either, as even then, if a directory is named (either directly as //go:embed assets or indirectly via a glob), we would still recursively include all subdirectories, including their .DS_Store files.
  • Not include dot-files at all, unless explicitly mentioned. It can be argued that this is a "dirty" approach, because dot-files aren't actually special and I would agree with that. OTOH, it's IMO a) the approach leading to the least accidents with the lowest impact in actual practice and b) when an accident happens, it can be debugged and fixed most easily (simple testing will show the dot-file to be missing and the docs can clearly say that they have to be mentioned explicitly).

There are probably other approaches that can be discussed. It would also be a valid answer to close this as WAI. But if we release go 1.16 with embedding, we get locked into the semantics we implemented, so IMO this should be closed one way or another before releasing it into the wild.


Discussion summary (as of 2020-11-21, 10:00 UTC)

This is my best attempt at a fair and unbiased discussion-summary. It is necessarily subjective, though, so apologies if I left something out or misrepresented someone.

The resulting proposal from the discussion, which is currently marked as "likely accept", is to have globs match as they currently do, but to exclude .* and _* files (editor's remark: "as the go command" probably implies testdata as well) from recursive directory walks when a directory is given explicitly.

There are a couple of open questions:

  • Will hidden files also be excluded if a directory is matched by a glob, or just if it is mentioned by name (comment by @mpx)? Consensus seems to be that it would apply to globbed directories as well - that is, "expand globs first, then recursively walk every directory given with hidden files/dirs skipped".
  • Does "hidden" also apply to extended filesystem attributes? (comment by @andrius4669)? There was one comment in favor by @Merovius and one comment against by @inliquid.
  • Should _* and testdata really be skipped (comment by @mpx)? The case for them comes down to consistency with go build and seems significantly weaker than for .*.
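The "expand globs first, then walk every matched directory with hidden entries skipped" reading can be sketched as below. This is only an illustration of the semantics under discussion, not the go command's implementation: resolve, hidden and demoFS are hypothetical names, fstest.MapFS stands in for the package directory, and the sketch skips only .* and _* (it leaves the open testdata question aside).

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"sort"
	"strings"
	"testing/fstest"
)

// hidden reports whether a base name would be skipped during a directory
// walk under the proposed rule (assumption: names starting with "." or "_").
func hidden(name string) bool {
	return strings.HasPrefix(name, ".") || strings.HasPrefix(name, "_")
}

// resolve expands each pattern with fs.Glob, then walks every match.
// Anything matched directly by a pattern is kept, hidden or not; hidden
// entries found while walking below a matched directory are skipped.
func resolve(fsys fs.FS, patterns ...string) ([]string, error) {
	seen := map[string]bool{}
	for _, pat := range patterns {
		matches, err := fs.Glob(fsys, pat)
		if err != nil {
			return nil, err
		}
		for _, m := range matches {
			err := fs.WalkDir(fsys, m, func(p string, d fs.DirEntry, err error) error {
				if err != nil {
					return err
				}
				if p != m && hidden(path.Base(p)) {
					if d.IsDir() {
						return fs.SkipDir
					}
					return nil
				}
				if !d.IsDir() {
					seen[p] = true
				}
				return nil
			})
			if err != nil {
				return nil, err
			}
		}
	}
	files := make([]string, 0, len(seen))
	for f := range seen {
		files = append(files, f)
	}
	sort.Strings(files)
	return files, nil
}

// demoFS is a hypothetical directory tree used only for illustration.
func demoFS() fs.FS {
	return fstest.MapFS{
		"dir/a":      {},
		"dir/.b":     {},
		"dir/sub/c":  {},
		"dir/sub/.d": {},
	}
}

func main() {
	byName, _ := resolve(demoFS(), "dir")
	byGlob, _ := resolve(demoFS(), "dir/*")
	fmt.Println(byName) // [dir/a dir/sub/c]
	fmt.Println(byGlob) // [dir/.b dir/a dir/sub/c]
}
```

Note how dir/.b appears only for the glob, because * matched it directly, while dir/sub/.d is dropped in both cases because it is only reached by walking.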

There were some alternatives suggested:

  • Add a ** matching operator to filepath.Glob - don't leave out hidden files when walking directories (comment by @ianthehat). @rsc remarks that this might be unexpectedly unspecified. The presence of ** would also make it easier to specifically match all hidden files, if they are desired (comment by @Merovius), so it can be argued in favor of this proposal as well.
  • Provide a simple way to test for accidental file inclusion, leave semantics as-is (comment by @nightlyone). The main argument against that is that it requires knowledge that this should be tested (comment by @Merovius).
  • Provide a wrapper-fs that filters out undesirable files (comment by @seankhliao). In addition to the same argument as against tests, even the embedding of undesired files might be harmful, not just their use (comment by @SamWhited)
  • Remove support for recursive directory inclusion for now, leave globs as-is. A separate go-generate tool can create complete file-listings and we can experiment with syntax/semantics using such a tool during the next cycle (comment by @ianthehat).
  • Remove both recursive directory walking and globs, only allow explicit lists (comment by @mpx).
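The "test for accidental file inclusion" alternative could look roughly like the following. It is a sketch under assumptions: hiddenEntries and demoFS are hypothetical names, "hidden" is taken to mean a leading dot only, and fstest.MapFS stands in for an embed.FS variable (which also satisfies fs.FS and could be passed in the same way).

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// hiddenEntries walks the whole file system and returns every path whose
// final element starts with a dot. A project test could fail if the
// returned slice is non-empty.
func hiddenEntries(fsys fs.FS) ([]string, error) {
	var found []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		// The root is reported as ".", which is not a dot-file.
		if p != "." && strings.HasPrefix(path.Base(p), ".") {
			found = append(found, p)
		}
		return nil
	})
	return found, err
}

// demoFS is a hypothetical stand-in for embedded content polluted by macOS.
func demoFS() fs.FS {
	return fstest.MapFS{
		"assets/index.html":    {},
		"assets/.DS_Store":     {},
		"assets/css/site.css":  {},
		"assets/css/.DS_Store": {},
	}
}

func main() {
	found, err := hiddenEntries(demoFS())
	if err != nil {
		panic(err)
	}
	fmt.Println(found) // [assets/.DS_Store assets/css/.DS_Store]
}
```

As the counter-argument in the summary says, a check like this only helps people who already know they need it.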

There were some counter arguments:

  • It is inconsistent and confusing that hidden files are included when using * but not in directory walks (comment by @dcormier). The proponents' response is that we agree, but it might still be better than the current alternative. We need to make a tradeoff (comment by @Merovius). @SamWhited posted examples of where they embedded dot-files, which might be useful to inform that tradeoff.
  • We need to have an escape hatch for "everything exactly as on disk" (comment by @seankhliao). The best escape hatch available so far is to use go-generate to create a list (comment by @mvdan), there might be the need for a more convenient one, if this comes up often enough.
Member

@mvdan mvdan commented Nov 1, 2020

Thanks for filing this issue. In my comment in the original proposal I did suggest filing a new issue from the point of view of a bug report, so I agree with you that we should begin with the problem and not multiple potential solutions. I hope this doesn't mean I get more thumbs down reactions :)

I fully agree that this needs a decision before 1.16, even if the decision is that we're okay with the current semantics. I also tend to agree that excluding files by accident is less harmful than including them by accident, because the former is easy to spot but the latter could go unnoticed for a long time.

I'm still uneasy about introducing the notion of "hidden files" in the Go toolchain, but they do have a sort of precedent:

$ go help packages
[...]

Directory and file names that begin with "." or "_" are ignored
by the go tool, as are directories named "testdata".

Perhaps we could copy the same rule here. * is about files and not packages, but since builds happen inside package directories, I think it could make sense to be consistent. And the notion of "ignored filenames" by the toolchain would remain easy to remember.
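As a rough illustration, the rule quoted from go help packages fits in a few lines. The function name ignoredByGoTool is hypothetical and this is not the go command's own code; it only restates the quoted text.

```go
package main

import (
	"fmt"
	"strings"
)

// ignoredByGoTool restates the rule from "go help packages": names that
// begin with "." or "_" are ignored, as are directories named "testdata".
func ignoredByGoTool(name string, isDir bool) bool {
	if strings.HasPrefix(name, ".") || strings.HasPrefix(name, "_") {
		return true
	}
	return isDir && name == "testdata"
}

func main() {
	fmt.Println(ignoredByGoTool(".DS_Store", false)) // true
	fmt.Println(ignoredByGoTool("_drafts", true))    // true
	fmt.Println(ignoredByGoTool("testdata", true))   // true
	fmt.Println(ignoredByGoTool("testdata", false))  // false
	fmt.Println(ignoredByGoTool("index.html", false)) // false
}
```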

Author

@Merovius Merovius commented Nov 2, 2020

I think that would be a fine approach (as long as explicit listing or a .*-glob would still allow you to include them). Just to be clear though:

* is about files and not packages

I want to make sure we are in agreement that this isn't just about globs specifically. As I said, I think //go:embed assets also shouldn't embed assets/.DS_Store (for example), even though it doesn't contain a glob.

Contributor

@carlmjohnson carlmjohnson commented Nov 2, 2020

I think this issue has made a very strong case for "fail safe" instead of "fail unsafe but document/provide an escape hatch" semantics.

One additional thought: if we get the semantics "wrong" in 1.16, would it be possible to revise them in 1.17 with a go.mod directive? If so, that to me suggests doing something "conservative" in 1.16 and revising to be "liberal" in later versions of Go if the conservative thing was found to be too conservative.

@dmitshur dmitshur added this to the Go1.16 milestone Nov 2, 2020
Member

@dmitshur dmitshur commented Nov 2, 2020

CC @rsc.

Member

@mvdan mvdan commented Nov 3, 2020

@Merovius agreed. The rule would limit both recursing into directories, and the * glob. I think any other glob shouldn't be affected, so that one may use globs such as .* or _*.

Contributor

@carlmjohnson carlmjohnson commented Nov 3, 2020

Icon\r isn’t a dotfile, but it is hidden. Should Go include it?

@ianlancetaylor ianlancetaylor changed the title embed: Surprising inclusion of "hidden" files proposal: cmd/go: avoid surprising inclusion of "hidden" files when using //go:embed Nov 3, 2020
@gopherbot gopherbot added the Proposal label Nov 3, 2020
@ianlancetaylor ianlancetaylor modified the milestones: Go1.16, Proposal Nov 3, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Nov 3, 2020
Contributor

@zephyrtronium zephyrtronium commented Nov 3, 2020

In a similar vein, any file can be marked hidden on NTFS. Should //go:embed read filesystem attributes, even on a Linux host compiling code on an NTFS volume, to ignore thumbs.db?

Contributor

@andig andig commented Nov 4, 2020

For the sake of completeness: we need the ability to exclude files on purpose, think assets.go. Might also solve excluding dotfiles.

Contributor

@carlmjohnson carlmjohnson commented Nov 4, 2020

#42325 was closed to focus discussion here.

The proposal there was to add a magic comment for negative globs, like //go:embed-exclude *.go. The objection was that it would lead to a lot of OS/IDE specific exclusions being listed and still miss some things that should be filtered when encountering a new environment.

My current feeling is that a combination of approaches should be taken:

  • Dot files should be excluded unless specifically included. (Definitely do this.)
  • Hidden files should probably also be excluded to the extent that we can figure out what files are hidden on the system. Since what counts as hidden is system specific, it may not always work properly, e.g. across a network. (Maybe do this if it can be made portable.)
  • A way to specifically exclude files might be nice. (Can be left for Go 1.17 if we exclude dotfiles now.)
  • Go source files could be automatically excluded at the cost of losing a few blog posts about easy Quines. (Probably do this unless there is a compelling usecase.)
Author

@Merovius Merovius commented Nov 4, 2020

@andig I don't agree that that's a necessity for the usecase you mention. There is no need to have assets.go in the same directory, you can have it in a directory further up and use a StripPrefixFS (if it's not in the stdlib, it can always be provided by a third party).
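For what it's worth, the io/fs package that ships with Go 1.16 has fs.Sub, which covers the prefix-stripping part of this suggestion. The sketch below assumes the embedding package sits one level above an assets directory; embedded and readStripped are hypothetical names and fstest.MapFS stands in for the result of //go:embed assets.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// embedded is a hypothetical stand-in for an embed.FS created by
// "//go:embed assets" in a package one directory above the assets.
func embedded() fs.FS {
	return fstest.MapFS{
		"assets/index.html":   {Data: []byte("<h1>hi</h1>")},
		"assets/css/site.css": {Data: []byte("body{}")},
	}
}

// readStripped reads name relative to dir, using fs.Sub to drop the
// directory prefix from all paths.
func readStripped(fsys fs.FS, dir, name string) (string, error) {
	sub, err := fs.Sub(fsys, dir)
	if err != nil {
		return "", err
	}
	data, err := fs.ReadFile(sub, name)
	return string(data), err
}

func main() {
	s, err := readStripped(embedded(), "assets", "index.html")
	fmt.Println(s, err) // <h1>hi</h1> <nil>
}
```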

Contributor

@nightlyone nightlyone commented Nov 5, 2020

A well tested approach, see https://docs.docker.com/engine/reference/builder/#dockerignore-file , is providing something like a .goembedignore file. The .gitignore file is another such example.

Those are well understood semantics, easy to share via dotfile collections, easy to configure in template repositories, and easy to add to UIs for repository creation like github.com or gitlab.com. Repository compliance checkers, often found in enterprises, can also easily verify its existence, syntax and content.

The extensions star-star-match, match-exceptions as well as the ability to add comments mentioned in the .dockerignore documentation also proved critical to practical use.

Adding such information to each mention of go:embed using wildcards seems impractical to me and too easy to forget.

This problem has been solved pretty well before and I don't think Go needs yet another way to be special here.

Followup questions now are:

  • Where are those .goembedignore files allowed?
  • Should the go:embed directive allow mentioning such a file?
  • Can mentioning this file to the go:embed pragma be the only way that this information is passed?
  • Do we still need sensible defaults then or is explicit better than implicit here?
Contributor

@carlmjohnson carlmjohnson commented Nov 5, 2020

Interesting idea. Maybe the default is to exclude dot files and .go files and then tweaking the .goembedignore file for a module could change the parameters to something else. It could wait for Go 1.17 if the window for making it into 1.16 is too narrow now.

Contributor

@nightlyone nightlyone commented Nov 5, 2020

@rsc / @dmitshur For Go 1.16 I would suggest to revert the ability to add wildcard based trees to go:embed until this issue is solved so we can explore the solution space without releasing a feature with a potential security risk given such precedents in Docker and git before each had a way to ignore certain file patterns recursively.

Contributor

@carlmjohnson carlmjohnson commented Nov 5, 2020

Even without a wildcard pattern, the problem of including a directory that has an unexpected file will be there.

Member

@mvdan mvdan commented Nov 5, 2020

Personally, I'm opposed to .goembedignore and go:embed-exclude in general, for a similar reason why we rejected a .goignore in #30058. Here, you have two options (besides my proposal in #42328 (comment)):

A) To only include a subset of the files, use go:embed on a subdirectory and place them there.

B) To exclude some sub-directory if option A isn't desirable, you could always drop a go.mod in there to exclude the directory from the current module.

Contributor

@mpx mpx commented Nov 5, 2020

The files referenced by //go:embed are effectively part of the build, just like *.go, *.c, etc.. Ideally it would be similarly easy to understand which embedded files are included. The rule to determine which files are included should be simple and avoid surprising results.

Excluding all files starting with "." seems simplest and least prone to confusion. This avoids including files which typically aren't visible in the directory, and entries which typically relate to other system purposes (eg, .DS_Store). build.Context.Import also excludes _, so it might be reasonable to exclude it here as well.

Perhaps this rule could be extended to include dot-files when the pattern/filename explicitly starts with ".". Another option would be including explicitly referenced files. Eg: //go:embed data data/.mysecrets. However, I suspect few codebases would be motivated to use this option.

Contributor

@seankhliao seankhliao commented Nov 10, 2020

I think it's counterintuitive to exclude certain files, especially when I specify a directory, I expect everything inside it. Furthermore, most (all?) of the existing tools do not have such behaviour.

From the original issue description, I would argue it is fine for in-development builds to contain extra files such as .DS_Store, building from a clean repo for release is something we should promote and is more or less equivalent to what already happens for go get / go install

Contributor

@andig andig commented Nov 10, 2020

I really dislike the inclusion of hidden files. However, this is what http.Dir for example says:

Note that Dir could expose sensitive files and directories. Dir will follow symlinks pointing out of the directory tree, which can be especially dangerous if serving from a directory in which users are able to create arbitrary symlinks. Dir will also allow access to files and directories starting with a period, which could expose sensitive directories like .git or sensitive files like .htpasswd. To exclude files with a leading period, remove the files/directories from the server or create a custom FileSystem implementation.
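The "custom FileSystem implementation" that the http.Dir documentation mentions, and the wrapper-fs alternative from the summary, might be sketched for io/fs as follows. noDotFS, canOpen and demoFS are hypothetical names. The wrapper only rejects opening dot paths directly; directory listings would still show them and would need similar filtering, and, as noted in the summary, the files would still be embedded in the binary.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// noDotFS wraps an fs.FS and refuses to open any path that contains an
// element starting with a dot (other than the root ".").
type noDotFS struct{ fs.FS }

func (f noDotFS) Open(name string) (fs.File, error) {
	for _, elem := range strings.Split(name, "/") {
		if elem != "." && strings.HasPrefix(elem, ".") {
			return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrNotExist}
		}
	}
	return f.FS.Open(name)
}

// canOpen reports whether name can be opened in fsys.
func canOpen(fsys fs.FS, name string) bool {
	file, err := fsys.Open(name)
	if err != nil {
		return false
	}
	file.Close()
	return true
}

// demoFS is a hypothetical tree containing both regular and dot entries.
func demoFS() fs.FS {
	return fstest.MapFS{
		"dir/a":       {},
		"dir/.b":      {},
		".git/config": {},
	}
}

func main() {
	fsys := noDotFS{demoFS()}
	fmt.Println(canOpen(fsys, "dir/a"))       // true
	fmt.Println(canOpen(fsys, "dir/.b"))      // false
	fmt.Println(canOpen(fsys, ".git/config")) // false
}
```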


@ianthehat ianthehat commented Nov 21, 2020

Except you don't really want "whatever file I drop in there", you want "whatever file I drop in there unless it is a file I don't want" and coming up with a heuristic for that last part that we can all agree on and easily explain seems to be the sticking point.

My position is:

  • We should keep glob behaving exactly like a filepath.Glob except reserving any syntax we might want to mean something different in the future, like ** or {alternatives}, which filepath.Match currently accepts (I think it treats ** as just two stars, and {} as normal characters)
  • We should not put any kind of recursive directory in this version, and we should experiment with what that should look like during the next cycle using a tool that expands experimental syntax into the embed form already understood.
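The matching behavior described in the first point can be checked directly. The sketch below uses path.Match rather than filepath.Match so that the separator is always "/" regardless of platform; that substitution, and the helper name match, are assumptions made for the example.

```go
package main

import (
	"fmt"
	"path"
)

// match wraps path.Match and treats a malformed pattern as "no match".
func match(pattern, name string) bool {
	ok, err := path.Match(pattern, name)
	return err == nil && ok
}

func main() {
	fmt.Println(match("*", ".hidden"))   // true: * does not special-case dot-files
	fmt.Println(match("**", "abc"))      // true: ** behaves like a single *
	fmt.Println(match("**", "a/b"))      // false: it still does not cross "/"
	fmt.Println(match("{a,b}", "a"))     // false: braces are ordinary characters
	fmt.Println(match("{a,b}", "{a,b}")) // true: they only match themselves
}
```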

This gives us a useful embed feature this cycle that covers a lot of cases well, and is powerful enough to cover all cases with outside help but with less clean UX. It leaves us time and space to expand the feature to a clean syntax for recursive directories once we are more sure about the use cases and have tried it out.

Contributor

@mpx mpx commented Nov 21, 2020

The easy choice should be safe - I think this is an important design principle. Developers will start with the easy option and won't necessarily recognise the implications.

Currently, developers can be confident that go build includes what they want:

  • Only files matching valid patterns are included (eg, *.go, *.c,..)
  • Tooling is extremely unlikely to require or add unwanted source files matching those patterns.
  • All source files are visible (.unwanted.go is excluded).

It would be unacceptable if developers needed to clean their source trees before go build to obtain a binary that behaves consistently. Again, requiring a clean step isn't the easy option and the consequences aren't immediately obvious - so it won't happen.

Individual authors can't make guarantees about other developers or their machines. Even if a single developer believes they are disciplined enough not to get burnt, this won't necessarily apply to the majority of the community, or other people who may compile their code. Eg, a Windows developer may not consider excluding .DS_Store important, but maybe the Mac developer compiling their code does? Do VS Code developers realise that Vim leaves .foo.swp files in the current directory? Have they deliberately handled this in their code?

If only globs are supported (without recursion), then the "easy" option will be dir/* - which doesn't solve this issue since developers will choose the easy unsafe option.

Excluding hidden files seems like an imperfect compromise that addresses most of the problem outlined here while still being "easy" for the common cases.

However, if this is unacceptable, I think it would be much better to make file selection entirely explicit (no globs, no recursion). This way builds will only include the files that the developer intends. TBH, I would easily entertain an argument that being 100% explicit is the only acceptable choice to ensure builds match intent. This is also very simple, and leaves more options open for the next development cycle.

A separately maintained filelist could be added to simplify the process via //go:embed @filelist (or similar). Either way, some consideration should be given to handling many files in a single embed.FS (probably next cycle).
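The explicit-list route, whether via a separately maintained file list or the go-generate escape hatch from the summary, amounts to a small generator that writes out every file by name. The following is a sketch only: embedLine, notDSStore and demoFS are hypothetical, the exclusion policy is whatever the keep function says, and file names containing spaces (which would need quoting in a real directive) are not handled.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// embedLine walks root and returns a //go:embed directive that names every
// regular file accepted by keep. fs.WalkDir visits entries in lexical
// order, so the output is deterministic.
func embedLine(fsys fs.FS, root string, keep func(p string) bool) (string, error) {
	var files []string
	err := fs.WalkDir(fsys, root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && keep(p) {
			files = append(files, p)
		}
		return nil
	})
	if err != nil {
		return "", err
	}
	return "//go:embed " + strings.Join(files, " "), nil
}

// notDSStore is an example policy: keep everything except .DS_Store.
func notDSStore(p string) bool {
	return path.Base(p) != ".DS_Store"
}

// demoFS is a hypothetical source tree used only for illustration.
func demoFS() fs.FS {
	return fstest.MapFS{
		"static/app.js":       {},
		"static/.DS_Store":    {},
		"static/img/logo.png": {},
	}
}

func main() {
	line, err := embedLine(demoFS(), "static", notDSStore)
	if err != nil {
		panic(err)
	}
	fmt.Println(line) // //go:embed static/app.js static/img/logo.png
}
```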

Author

@Merovius Merovius commented Nov 21, 2020

I edited the top-post to include a summary of the discussion so far. I would kindly ask everyone to refrain from simply re-stating or re-wording things that are already mentioned there, in the interest of keeping the discussion focused :)

@ianthehat Thank you for clarifying. I agree that removing directory walking for now is a safe compromise, should the exclusion of hidden files be considered unacceptable. Personally, I still feel that the semantics as detailed by @rsc are the best ones (pending answers to some of the open questions), but I could live with settling on something people feel safer with.

@mpx I personally feel also excluding globs would go too far, though. Especially as it seems uncontroversial that a glob should mean "everything matched as with filepath.Glob".

@SamWhited Apologies, I seem to have missed your comment until I compiled the discussion summary. Thank you very much for looking at practical examples :) May I ask, in those cases, how bad (for lack of a better word) would it have been to embed them with the semantics as proposed? Could there maybe have been a single glob that still matched all interesting files? Leaving aside, for the moment, the question of how confusing it would have been to see them excluded at first - only asking whether the mechanisms as proposed would still be sufficient to at least heal the damage.

To be clear, while I think it makes sense just to include all files, I could even go either way. Ignoring dotfiles would be fine, let's just not make it even more confusing by including them sometimes and ignoring them other times and having 'dir' be different from 'dir/*'.

As I understood Russ, "globs should match exactly like filepath.Glob" should be considered a line in the sand right now (unless you can make a really strong case against it) and under that constraint, I think one sadly implies the other. i.e. dir/* has to match dir/.foo, so we can't both have some "hidden files are excluded by default" semantic and have dir and dir/* behave the same way.

Personally, FWIW, I'm with @carlmjohnson, I don't expect dir and dir/* to do the same thing. In fact, the difference isn't even subtle at all - if dir contains itself a directory, ls dir/* will also expand that subdir, while ls dir will not. So I honestly don't understand why people would expect those two to do the same and I'm personally okay with people having to learn the difference between them. I still believe the semantics as proposed are very simple to document clearly and remember, so any confusion should happen at most once.

Member

@SamWhited SamWhited commented Nov 21, 2020

in those cases, how bad (for lack of a better word) would it have been to embed them with the semantics as proposed? Could there maybe have been a single glob that still matched all interesting files?

For the templates that render to files example I don't think there's currently a way to match only the templates other than including the whole directory tree because they may also be multiple directories deep. You'd need to match **/*.tmpl and include dotfiles. With my current tool I just tell it to include the whole directory and accept the chance that if I'm on certain operating systems I'll have to be more careful. I think "embed tmpls/*" would match my current behavior though, so after I inevitably get confused by the missing templates when I did "embed tmpls/" I could add the * and be fine :)

As I understood Russ, "globs should match exactly like filepath.Glob" should be considered a line in the sand right now

I tend to agree that matching filepath.Glob makes sense and not doing so would be confusing personally. This is partially why I think "hidden files excluded by default" isn't actually a good default (though as I said, I'm more concerned with consistency than the exact behavior).

I don't expect dir and dir/* to do the same thing. In fact, the difference isn't even subtle at all - if dir contains itself a directory, ls dir/* will also expand that subdir, while ls dir will not.

In the sense that we're saying it should be the same I think those two things are doing the same thing.

ls is performing some action on all of the things in dir when you use the glob so it makes sense that ls would expand the subtree. However you would expect that ls dir and ls dir/* contain the same top level items. If one suddenly was missing dot files and the other contained them, we're in the exact same position where this would be confusing.

Thanks for the excellent summary of the discussion so far!

Author

@Merovius Merovius commented Nov 21, 2020

I think those two things are doing the same thing.

I could not disagree more. They return completely different results. Arguing "it's not like one of them returns hidden files and the other doesn't" seems like an extremely narrow definition of "doing the same thing".

FWIW, I was considering adding a clause that the behavior of ls dir vs. ls dir/* matches our proposed semantics surprisingly well (you can see it in one of the edits of the comment). i.e. you can explain the difference between them fully by saying "ls expands globs and if that matches a directory, it lists the files in there" while we propose "embed expands globs and if that matches a directory, it recursively walks it, skipping hidden files" - i.e. both can be decomposed into "globbing builds a list and then we do something for directories". I decided to remove that for exactly the reason I say - it is motivated reasoning. It is pretending that some commonality is the defining feature, while the differences are to be ignored.

The fact of the matter is, ls dir and ls dir/* provide very different output. The complaint was "//go:embed dir should do the same thing as //go:embed dir/*" and I still don't understand where that complaint is coming from, since these are never equivalent (that I can think of).

Member

@SamWhited SamWhited commented Nov 21, 2020

seems like an extremely narrow definition of "doing the same thing".

It is a narrow definition, but it's the same definition we were using for embed.

I think we're arguing similar things with slightly different wording though, and I'm unsure how to fix it so let me tweak what I'm saying and hope it makes it clearer: I am not suggesting that embed dir and embed dir/* should provide exactly the same output. The second would not have a root directory in the filesystem called "dir". I am suggesting they should have the same files in exactly the same way that ls dir and ls dir/* show different output, as you said, but list all the same files and don't hide any arbitrarily just because I used a glob. In that case the glob makes ls show more files because it was never operating recursively; in our case we're operating recursively either way, it's just what level we're starting at that's different. I don't expect ls to show me different types of files depending on whether I used a glob or not, so I don't see why I should expect embed to.

Author

@Merovius Merovius commented Nov 21, 2020

but list all the same files and don't hide any arbitrarily just because I used a glob.

But it does (in most shells). It's just that you don't notice because ls also hides them. But ls -a dir/* does not show hidden files, and ls -a dir does. The semantics of filepath.Glob and the common implementation of * in shells differ in that regard. At the end of the day, it's really one mechanism that expands the glob and it's another mechanism that decides what to do with matched directories. And what they do is orthogonal.

Again: dir and dir/* usually behave very differently. Empirically, if you use them, you should expect different results. I don't think I could ever agree to an argument predicated on calling their behavior the same. It's cherry-picking.

Author

@Merovius Merovius commented Nov 21, 2020

I don't expect ls to show me different types of files depending on whether I used a glob or not, so I don't see why I should expect embed to.

FTR, this is not the proposal. You won't see different files depending on whether you use a glob or not. //go:embed dir/* and //go:embed dir/a dir/b dir/.c will behave exactly the same.

Member

@SamWhited SamWhited commented Nov 21, 2020

But ls -a dir/* does not show hidden files, and ls -a dir does

Ooh, okay, fair enough, I am wrong to use the ls comparison because apparently it still doesn't work the way I thought it did. However I think this whole thing still illustrates why they should be the same: I've been using ls dozens of times a day for many years and apparently am still confused about when it does and does not display things or how the behavior works.

FTR, this is not the proposal. You won't see different files depending on whether you use a glob or not. //go:embed dir/* and //go:embed dir/a dir/b dir/.c will behave exactly the same.

That's not what we were comparing though, or am I still missing something? I do understand that if I manually list all the files that would be in the glob, it would be the same as just using the glob. After re-reading all these comments several times I am still under the impression that if you try to embed a directory that looks like this:

$ tree -a dir/
dir/
├── a
└── .b

using //go:embed dir/* the virtual filesystem will include "a" and ".b" but if you use //go:embed dir/ the virtual filesystem will only include "dir/a". That is what I've been comparing.

Contributor

@carlmjohnson carlmjohnson commented Nov 21, 2020

I think it is important to discuss now, because the files that are included for an existing valid embed line can never be changed, so we need to make sure that what we allow now is compatible with our long term plan and anything not compatible is a compile time error right now, no matter how much more awkward it makes the feature in the short term.

//go:embed requires the use of Go modules, so theoretically if we decide after the fact that the embedding semantics chosen are unsuitable, they could be changed by adding a go 1.17 directive with different rules. Obviously, that's kind of ugly and should be avoided if we can help it, but it does take a little pressure off.

Author

@Merovius Merovius commented Nov 21, 2020

Ooh, okay, fair enough, I am wrong to use the ls comparison because apparently it still doesn't work the way I thought it did.

FWIW, the ls comparison came from me, to illustrate why I don't understand the expectation that dir and dir/* should be the same. And ls works exactly as you think it does. It's just that there are two mechanisms - one of them is the shell-expansion of globs and one of them is what the tool does with the expanded list. [edit] I could've also used tar BTW, to illustrate a tool that by default does include hidden files when you pass a directory - and thus, in a sense, does the exact opposite of the proposed mechanic. Just to illustrate how orthogonal the mechanic of globbing is to what is actually done to the expanded list [/edit]

I've been using ls dozens of times a day for many years and apparently am still confused about when it does and does not display things or how the behavior works.

And yet, you are not constantly confused and frustrated when using it, are you? Theoretically it should be confusing, according to the argument. But in practice, it does not actually impede you in any way. I would say this shows that there is something wrong with that argument.

That's not what we were comparing though, or am I still missing something?

That is the difference between using a glob or not using a glob. What you are referring to is the difference between listing all the files in a directory and listing the directory itself. That was my point. That is the difference between dir and dir/* - one passes a directory to the tool, the other passes a list of its content to the tool. The tool has no clue whether that list was created by a glob.

@SamWhited
Member

@SamWhited SamWhited commented Nov 21, 2020

And yet, you are not constantly confused and frustrated when using it, are you?

No, it's worse than that: apparently I have been thinking things did or did not exist and silently doing things wrong for years (probably, who knows).

one passes a directory to the tool, the other passes a list of its content to the tool. The tool has no clue whether that list was created by a glob.

I understand that part, that's fine. What I'm suggesting is that, regardless of whether the list is created by a glob or not, the files that get included should be the same. The only difference is where they're included (in the dir/ directory or just straight in the filesystem).

@hherman1

@hherman1 hherman1 commented Nov 21, 2020

I read //go:embed dir as “embed the contents of this directory” and //go:embed dir/* as “embed everything within this directory”, which I expect to be the same set of things. I think there’s no getting around that this proposal makes embedding rules more complicated in an unintuitive way; after all, it takes more lines of text to explain how it functions.

I think the question is whether or not this extra complication is worthwhile. I think some good arguments have been put forward that the extra complication is worthwhile, although I’m of the opinion that it would be nice to keep this api as simple as possible. I think keeping the api simple makes it easy for programmers to predict what’s going to happen when they use it, and understand it with low cognitive overhead. I think the complexity diff is low in this instance, but complexity is like a bundle of hay: one too many straws and you’ll break the camel’s back :)

@tv42

@tv42 tv42 commented Nov 21, 2020

@Merovius Bash globbing does not include hidden files by default. The fact that ls foo and ls foo/* look similar is not due to ls; echo foo/* hides hidden files too.

From man bash:

When a pattern is used for pathname expansion, the character "." at the start of a name or immediately following a slash must be matched explicitly, unless the shell option dotglob is set. The filenames "." and ".." must always be matched explicitly, even if dotglob is set.

I think including hidden files in foo/* will hurt people. All this discussion is demonstrating to me is that filepath.Glob is not flexible enough to handle hidden files well; it seems to be based on a plan9ish design, for a world where hidden files are not used. Yet the world is not plan9, hidden files won, and users expect them to be hidden.
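To make the difference concrete, here is a minimal sketch comparing Go's own pattern matching with a shell-like rule. path.Match is the standard library function; matchAll and matchVisible are hypothetical helpers written only for this illustration, and matchVisible handles single path elements only.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchAll reports whether name matches pattern under Go's path.Match rules,
// where "*" also matches names that start with a dot.
func matchAll(pattern, name string) bool {
	ok, err := path.Match(pattern, name)
	return err == nil && ok
}

// matchVisible is a hypothetical shell-like variant: a name starting with "."
// only matches when the pattern itself starts with ".".
func matchVisible(pattern, name string) bool {
	if strings.HasPrefix(name, ".") && !strings.HasPrefix(pattern, ".") {
		return false
	}
	return matchAll(pattern, name)
}

func main() {
	for _, name := range []string{"index.html", ".DS_Store"} {
		fmt.Printf("%-12s path.Match=%v shell-like=%v\n",
			name, matchAll("*", name), matchVisible("*", name))
	}
}
```

With the pattern "*", Go's matcher accepts .DS_Store while the shell-like variant rejects it, which is the gap between the two expectations discussed here.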

@neild
Contributor

@neild neild commented Nov 22, 2020

The one thing that I am convinced of by the above discussion is that there is no universally obvious choice for handling the intersection of globbing and recursion with system files, dotfiles, and other special files. Any choice will be surprising, inconvenient, and/or dangerous to at least some users.

As such, I think @mpx has the right idea: The simplest choice is to explicitly list every file that should be included. It is trivial to write a program (possibly executed with go generate) to generate this list according to whatever rule is desired. This is not the most convenient approach in all circumstances, but it is always clear.

Explicit listing of all files, with no support for globs or recursion, is also the approach which conserves the most space for possible future expansion. Globs can be added in the future; changing globbing rules once added is more difficult.

@Merovius
Author

@Merovius Merovius commented Nov 22, 2020

I'm still unconvinced that "it's confusing that dir and dir/* behave differently" is an actual problem. I feel that they are traditionally completely unrelated and inconsistent between tools, yes, but we're all still using them merrily in other contexts with at worst minor annoyances. And I'm still unconvinced that skipping hidden files in directory walks will elicit anything but a 5m confusion the first time it happens unexpectedly. But arguing about it further does nothing but reinforce the impression that it's an enormously contentious issue - IMO everything has been said about it.

Personally, I'd rather leave the current semantics as they are than cripple //go:embed by removing globs or directory walks from it altogether. Accidental inclusion of hidden files is a negligible issue, compared to requiring go-generate for even the most basic use cases of //go:embed.

@Merovius
Author

@Merovius Merovius commented Nov 22, 2020

@tv42 It's not clear to me what concrete action you are arguing in favor of. Saying filepath.Glob isn't flexible enough might or might not be true, but it doesn't imply a concrete solution to this issue.

@mpx
Contributor

@mpx mpx commented Nov 22, 2020

There isn't that much time until the Go 1.16 beta is planned for release (1st Dec?). Rushing design and implementation makes it more likely the project (or parts of the community) will regret the outcome in future.

A partial solution now (globs only) is more or less equivalent to encouraging globs that include hidden/unwanted files, effectively passing on solving this issue.

Why enable a shortcut that causes non-deterministic builds?

Expanding on the suggestion above, I think external filelists could make generation trivial (find static -type f > webfs.list, //go:embed @webfs.list), and perhaps mitigate complexity concerns with go generate. File lists have several advantages:

  • Unwanted files would not be included (eg, .DS_Store) - solving this issue
  • It's just as easy to include hidden files as not 🙂
  • Builds would fail when required files are missing
  • Builds would always match the developer intent, and avoid unknown incidental variation
  • It is much easier than generating more complex source/pragmas
  • The "easy option" (generation) is safe for everyone, authors and other developers
  • Contents of virtual filesystems are obvious during code review
  • Developers can generate, or hand manage the file list any way they choose.

Perhaps there are other alternatives that aren't as controversial as including or not including hidden files, or improve on filelists? I suspect it's past time in the development cycle to work out and implement a good solution now. Keeping it simple keeps those opportunities alive for Go 1.17. Go 1.16 will still let people embed arbitrarily complex filesystems, even without globs/recursion.

@Merovius
Author

@Merovius Merovius commented Nov 22, 2020

I don't agree with your list of advantages. You seem to be assuming that the generator is magically perfect. In practice, it will be just as complicated, if not worse, than the current directives - with the added downside that you need to invoke it in an extra step. In more detail:

It's just as easy to include hidden files as not :)

The current semantics are captured by find dir -type f, which is the option you mention. That's already worse than what we have currently and I can't think of a super easy way to exclude hidden files. So I don't think it's "just as easy".

Unwanted files would not be included (eg, .DS_Store) - solving this issue

I don't have a Mac, but find dir -type f (which developers unaware of this issue would likely use) does include hidden files. It makes it more obvious by putting it in the file, though, I can give you that.

Builds would fail when required files are missing
Builds would always match the developer intent, and avoid unknown incidental variation

Maybe. I think in practice, "the build" would just become a two-step process of running the generator and then running go build and you get the same situation you have now - the expanded list will be derived from some pattern put into the generator and that's the list declared "required" and whether or not that matches intent is just as unclear as right now.

It is much easier than generating more complex source/pragmas

You are shifting that complexity onto developers though, who now have to hand-roll their own.

Developers can generate, or hand manage the file list any way they choose.

If you want to hand manage your file list, you can already do that (just don't use globs and don't list directories).

Perhaps there are other alternatives that aren't as controversial as including or not including hidden files, or improve on filelists?

I can't speak for other people (I'd be genuinely interested in the opinion of @carlmjohnson et al on this) but as far as I'm concerned, the idea of keeping the current semantics seems like it should be far less controversial than removing globs and lists altogether.

The way I read this discussion, the people who are against the exclusion of hidden files seem to feel far more strongly on the matter than the people who are in favor. Personally, I am of the opinion that excluding them is the better option and I'm willing to argue that opinion - but if it doesn't happen, I won't care that much. It's ultimately a very minor detail to the feature.

But if my arguing for their exclusion is taken as an indicator that any sort of globbing or recursive inclusion is far too controversial to do, that would make me unhappy. Because then my efforts are being twisted to make the feature significantly worse for everyone.

@josharian
Contributor

@josharian josharian commented Nov 22, 2020

@neild a related option is to only allow globs with concrete file extensions (lastindex(*)<lastindex(.)). Then people can write *.css *.js *.html *.png, which seems fairly usable. In practice I believe the intersection of likely extension-containing globs and likely sensitive hidden files is almost zero.
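The rule in parentheses can be sketched directly. hasConcreteExt is a hypothetical name, and the check is taken literally from the comment above: it looks only at "*" and ".", so other glob metacharacters such as "?" or "[" would need separate handling in a real implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// hasConcreteExt reports whether the last "*" in pattern comes before the
// last ".", i.e. the pattern ends in a literal extension such as ".css".
func hasConcreteExt(pattern string) bool {
	star := strings.LastIndex(pattern, "*")
	dot := strings.LastIndex(pattern, ".")
	return star < dot
}

func main() {
	for _, p := range []string{"*.css", "static/*.png", "*", ".*", "dir.d/*"} {
		fmt.Printf("%-14s allowed=%v\n", p, hasConcreteExt(p))
	}
}
```

Under this check a bare "*" or ".*" is rejected, so a pattern can never pick up .DS_Store or similar dotfiles unless their extension is spelled out.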

@tv42

@tv42 tv42 commented Nov 22, 2020

@Merovius If I had to make a concrete suggestion, it'd be this: Postpone glob support until they can be added with semantics that are not surprising to users. Start by supporting 1) explicit filenames and 2) directory recursion that ignores hidden files (and _ prefix too, if that's wanted).

@Merovius
Author

@Merovius Merovius commented Nov 22, 2020

@tv42 Thank you for clarifying.

@hherman1

@hherman1 hherman1 commented Nov 23, 2020

How many users are actually going to care about excluding .DS_Store? It seems like a niche need.

@mpx
Contributor

@mpx mpx commented Nov 23, 2020

How many users are actually going to care about excluding .DS_Store? It seems like a niche need.

How many Mac developers open directories in Finder?

There are many tools that create dotfiles, temporary or otherwise, and for purposes that haven't been imagined so far - this issue clearly indicates that people are concerned about it. I know the tooling I use will interact poorly if hidden files are embedded. I don't write all the code I compile - so I don't get to choose whether globs are used or whether the author has enough awareness to understand the issue.

"the build" would just become a two-step process of running the generator and then running go build

I don't believe so. Using globs means that the list of files is evaluated every build, but is that really necessary or useful? I haven't found that to be common or desirable with any of the existing tooling.

Personally, I've found the lists of embedded files change very rarely compared to editing code, or running go build.

Hence, evaluating globs every build typically saves very little development time in practice, but it encourages a pattern that can result in non-deterministic builds. That seems like a poor tradeoff to me; why optimise it?

I understand that some developers might find updating explicit file lists onerous, but there are options to simplify it in a later release without compromising build integrity. //go:embed is already a huge improvement over the existing tooling - it doesn't need globs or recursion to be successful.

@Merovius
Author

@Merovius Merovius commented Nov 23, 2020

@mpx

I understand that some developers might find updating explicit file lists onerous, but there are options to simplify it in a later release without compromising build integrity.

We already have two perfectly reasonable options on the table - one of which already made it through the full proposal process without issues. I see absolutely no reason why their proponents would be able to convince the opposition of that in a later release, if they can't do it now. And I see no reason to believe that there are similarly good options that are more acceptable to them.

//go:embed is already a huge improvement over the existing tooling - it doesn't need globs or recursion to be successful.

I strongly disagree. I think it will be very poorly received without them.

@mpx
Contributor

@mpx mpx commented Nov 23, 2020

We already have two perfectly reasonable options on the table - one of which already made it through the full proposal process without issues.

If the original proposal was "perfectly reasonable" this issue wouldn't exist, or be so controversial. I thought the recursion compromise would be good enough too, then I realised (comment 1, comment 2, comment 3):

  • there are more unpleasant implications than I initially understood
  • globs/recursion save less effort than I initially anticipated
  • there are probably alternatives that strike a better balance (and not enough time to consider them)

//go:embed is already a huge improvement over the existing tooling - it doesn't need globs or recursion to be successful.

I strongly disagree. I think it will be very poorly received without them.

This is why I think it would be useful seeing wider community experience before locking in a hasty decision. We're all making some assumptions. There has been no significant use of //go:embed in production/day-to-day development yet. Taking time to consider options certainly doesn't preclude any of the ideas covered so far in this issue, others yet to come, or further improvement. I'm sure a better decision can be made with the benefit of real experience.

@dcormier
Contributor

@dcormier dcormier commented Nov 23, 2020

It is inconsistent and confusing that hidden files are included when using * but not in directory walks (comment by @dcormier). The proponents' response is that we agree, but it might still be better than the current alternative.

I completely disagree that the inconsistent behavior will be better than the current alternative (* including all files). It will result in people trying to figure out what incantation they need to utter to get files included consistently, and why different files were included in one place than in another.

Either include dotfiles when they're not explicitly listed or don't. Just do it consistently.

@mvdan
Member

@mvdan mvdan commented Nov 23, 2020

We're fast approaching one hundred comments in this thread, so I think it's time to slow down again and let the original proposal authors - Russ and Brad - catch up and chime in. @Merovius has been kind enough to summarize the thread so far in the original post, too.

@ianthehat one detail I think you might have missed is this comment. From the user's perspective I completely agree with you that explicit extended globs would be better than implicit ignoring of filenames, so perhaps you could clarify what precise definition of ** you're thinking of.

@ianthehat

@ianthehat ianthehat commented Nov 23, 2020

@mvdan I didn't miss it, I was just trying to address the meta question instead.
I am not proposing we clarify any definition right now; I am proposing we disallow its existence at all in embed lines for now, so we can experiment in an external tool to see what works for people.
My preference would be to only allow /**/, which matches any depth of directories (including 0) but nothing else, as I think that covers the most common cases succinctly while avoiding many of the awkward corner cases. But again, I don't think we should be deciding that or trying to implement it right now; I think we should just make sure we have reserved the space to allow it in the future.
At this point I am mostly on the side of only allowing explicit file embeds, with another tool to maintain the (possibly very long) lines of embed statements this implies. In my experience the set of files I want to embed changes far more rarely than the contents of those files, and the ability to flip between the jailed actual file system and the embedded file system would cover the development cycle cleanly anyway, so regenerating the embed lines would be a rare and thus not overly problematic extra step. It would certainly be a huge step up from what we do right now with all the existing embed tooling.

@Merovius
Author

@Merovius Merovius commented Nov 23, 2020

FWIW, while adding some version of ** to filepath.Glob is probably feasible, with Go 1.16 the canonical globbing semantics will probably shift towards fs.Glob. And fs.Glob uses an interface, so its semantics can't be changed after release. A third-party GlobFS might get released in the meantime, only supporting the old semantics.

So, even in a later release, we can either have //go:embed support ** or have it use the canonical globbing semantics, but not both.

Projects
Proposals
Likely Accept