-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/go: avoid surprising inclusion of "hidden" files when using //go:embed #42328
Comments
Thanks for filing this issue. In my comment in the original proposal I did suggest filing a new issue from the point of view of a bug report, so I agree with you that we should begin with the problem and not multiple potential solutions. I hope this doesn't mean I get more thumbs down reactions :) I fully agree that this needs a decision before 1.16, even if the decision is that we're okay with the current semantics. I also tend to agree that excluding files by accident is less harmful than including them by accident, because the former is easy to spot but the latter could go unnoticed for a long time. I'm still uneasy about introducing the notion of "hidden files" in the Go toolchain, but they do have a sort of precedent:
Perhaps we could copy the same rule here. |
I think that would be a fine approach (as long as explicit listing or a
I want to make sure we are in agreement that this isn't just about globs specifically. As I said, I think |
I think this issue has made a very strongly case for "fail safe" instead of "fail unsafe but document/provide an escape hatch" semantics. One additional thought: if we get the semantics "wrong" in 1.16, would it be possible to revise them in 1.17 with a go.mod directive? If so, that to me suggests doing something "conservative" in 1.16 and revising to be "liberal" in later versions of Go if the conservative thing was found to be too conservative. |
CC @rsc. |
@Merovius agreed. The rule would limit both recursing into directories, and the |
|
In a similar vein, any file can be marked hidden on NTFS. Should //go:embed read filesystem attributes, even on a Linux host compiling code on an NTFS volume, to ignore |
For sake of completeness: we need the ability to exclude files on purpose, thunk assets.go. Might also solve excluding dotfiles. |
#42325 was closed to focus discussion here. The proposal there was to add a magic comment for negative globs, like My current feeling is that a combination of approaches should be taken:
|
@andig I don't agree that that's a necessity for the usecase you mention. There is no need to have |
A well tested approach, see https://docs.docker.com/engine/reference/builder/#dockerignore-file , is providing something like a .goembedignore file. The .gitignore file is another such example. Those are well understood semantics, easy to share via dotfile collection, easy to configure in template repositories, easy to add to UIs for repository creation like github.com or gitlab.com as well as repository compliance checkers often found in enterprises can easily verify its existence, syntax and content. The extensions star-star-match, match-exceptions as well as the ability to add comments mentioned in the .dockerignore documentation also proved critical to practical use. Adding such information to each mention of go:embed using wildcards seems unpractical to me and too easy to forget. This problem has been solved pretty well before and I don't think Go needs yet another way to be special here. Followup questions now are:
|
Interesting idea. Maybe the default is to exclude dot files and .go files and then tweaking the go.embedignore file for a module could change the parameters to something else. It’s an interesting idea that could wait for Go 1.17 if the window for making it into 1.16 is too narrow now. |
@rsc / @dmitshur For Go 1.16 I would suggest to revert the ability to add wildcard based trees to go:embed until this issue is solved so we can explore the solution space without releasing a feature with a potential security risk given such precedents in Docker and git before each had a way to ignore certain file patterns recursively. |
Even without a wildcard pattern, the problem of including a directory that has an unexpected file will be there. |
Personally, I'm opposed to A) To only include a subset of the files, use B) To exclulde some sub-directory if option A isn't desirable, you could always drop a |
The files referenced by Excluding all files starting with "." seems simplest and least prone to confusion. This avoids including files which typically aren't visible in the directory, and entries which typically related to other system purposes (eg, Perhaps this rule could be extended to include dot-files when the pattern/filenames explicitly starts with ".". Another option would be including explicitly referenced files. Eg: |
I think it's counterintuitive to exclude certain files, especially when I specify a directory, I expect everything inside it. Furthermore, most (all?) of the existing tools do not have such behaviour. From the original issue description, I would argue it is fine for in-development builds to contain extra files such as |
I really dislike the inclusion of hidden files. However, this is what
|
On Tue, Dec 01, 2020 at 02:22:34PM -0800, Russ Cox wrote:
Both choices are good (and bad). On balance, the choice that doesn't accidentally vacuum up potentially sensitive files seems like the better one.
Ok, thanks Russ. I certainly understand the reasoning.
|
Change https://golang.org/cl/275092 mentions this issue: |
I just hit this while working with 1.16beta1. I have a //go:embed of ~700 files, ~70 directories and where some files have a leading underscore; these are not dot files, nor go source files, and it took me some time to work out that the reason I couldn't open some of them without error was that they were excluded from the build. I've since found and read through the discussion here. Please let me know if there is a more appropriate place to provide feedback now that this issue is closed. Otherwise, please read on... The draft proposal doesn't mention this restriction (I now realize it just hasn't been updated since July), and it was counterintuitive to me that files would be arbitrarily excluded because they're considered "hidden" on some platforms. Why counterintuitive? I don't know of any other tools that automatically exclude dot or leading-underscore files, other than Sometimes we need to include everything in an embed of a directory tree. Please give us a way out! :) One suggestion is a simple flag like Right now I am debating whether to write more code and wrap my file tree in a store-only ZIP file and //go:embed that, or just use a 3rd party embed code-generator tool and eat the slower build times. Neither approach is preferred because they burden the project with an external generator. (It's been helpful during development to have the embed available as-is on the filesystem, so if using a ZIP file or generated embed.go file the content will be kept in both forms.) Another approach which I dislike is to use some encoding to escape filenames that //go:embed otherwise denies me from using naturally. That would force design changes to a larger surface area of an otherwise already complex project just to simplify the build. |
It sounds like simply including underscore files would work. I'm not aware of any situations where including underscore files would be problematic and I haven't seen any in this issue (it's highly unlikely Unfortunately providing a
@rsc Perhaps excluding underscore files could be reconsidered? |
@jpap I'm not sure, but it may help to open a separate issue to provide this feedback since this is closed and may not get attention. |
Better documentation will solve everything. |
That is easily avoided though a "--" terminator, and easily implemented using the For example, to include a file called
Using
A change that leveraged the In src/cmd/go/internal/load/pkg.go
|
This is probably the wrong place for discussion but if you just need to embed some files with underscores, a quick workaround is that you can write a wrapper fs.FS that changes file names from -whatever to _whatever automatically and then rename the files on disk to start with a dash. |
The workaround we've discussed is to write a small package main
func main() {
const base = "embed"
var dirs []string
err := filepath.Walk(base, func(path string, info os.FileInfo, err error) {
if err != nil {
return err
}
if info.IsDir() {
rel, _ := filepath.Rel(base, path)
dirs = append(dirs, filepath.Join(rel, "*"))
}
})
if err != nil {
log.Fatal(err)
}
err = ioutil.WriteFile("embed.gen.go", []byte(fmt.Sprintf(template, strings.Join(dirs, " "))), 0644)
if err != nil {
log.Fatal(err)
}
}
const template = `// Code generated by go run embed.go; DO NOT EDIT.
package pkg
import "embed"
//go:embed %s
var embedded embed.FS` That way you still get the efficiency and integration of the native embedding support (which you lack if you use a third-party tool). I agree that it's less convenient to still have to run In any case, I think the release of go 1.16 is imminent, so I think it's too late to change it for this release. I see that you've opened a new issue, I think we can discuss options there. |
That doesn't work because these files aren't included in the executable to begin with: there's nothing to filter. I've opened #43854 if you want to continue the discussion. |
This is an undesirable workaround for the reasons already discussed:
Let's continue the discussion in #43854. |
The idea is that you change a/_whatever.txt to a/-whatever.txt and then add a wrapper than interprets |
Thanks for the clarification. That's the approach I also found undesirable in my original post, quoted again below.
In my project that change isn't limited to a one-time bulk rename: it means changing the design of the all the code that uses this directory/file tree structure. The project uses many such directory trees; most are used outside of //go:generate on a live filesystem. The motivation for using //go:generate here is that there are some special trees that are "constant" and need to be distributed with the executable. Others that live on the filesystem are specific to the environment in which they run. |
Although my initial inclination was to add a One alternative may be to have a second directive with different semantics:
This could become unwieldy if many permutations were anticipated to be added in the future. One additional thought: does the module proxy respect embed directives? If not, the files may be excluded before the embed directive takes place anyway. Edit: I'll go add this comment to the discussion on #43854 as well. |
Change https://golang.org/cl/359413 mentions this issue: |
When //go:embed d matches directory d, it embeds the directory tree rooted at d, but it excludes files beginning with . and _, as well as files having problematic names that will not be packaged into modules (names such as .git and com1). After long discussions on #42328 and #43854, we decided to keep the behavior of excluding . and _ files by default, but to allow the pattern prefix 'all:' to override this default. This CL implements that change. Note that paths like .git and com1 are still excluded, as they must be, since they will never be packed into a module. Fixes #43854. Change-Id: I4f3731e14ecffd4b691fda3a0890b460027fe209 Reviewed-on: https://go-review.googlesource.com/c/go/+/359413 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Jonathan Amsterdam <jba@google.com> Reviewed-by: Bryan C. Mills <bcmills@google.com>
Edit by rsc, Dec 1 2020: Please note that the "likely accepted" answer here is #42328 (comment), specifically:
patterns passed to //go:embed are evaluated exactly with filepath.Glob, not "Glob with special cases".
So matching * keeps matching .foo, because it does in Glob.
when you name a directory explicitly, as in //go:embed dir, the code that walks that directory collecting
all the files to embed will ignore names beginning with dot or underscore, the same as the go command does
for deciding what to build.
This means that
//go:embed static/*
will embed .DS_Store if you really want to, while//go:embed static
will not.This is forked off from #41191 to talk about a specific issue with the design as-accepted. #42325 and #42321 talk about the same problem, but they are very solution-focused and I think it is more appropriate to discuss the actual problem first.
Namely: @carlmjohnson has pointed out that the
//go:embed
directive includes.DS_Store
files under MacOS. This is certainly working-as-designed, but I think it should be discussed whether that's actually the semantic we want. I want to talk about the.DS_Store
example specifically, but there are other dot-files with similar properties..DS_Store
does illustrate something significant though, because it is a directory that is (AIUI, I'm not a Mac-user myself) created non-interactively by a third-party software in every subdirectory. So it is not explicitly created by the user and it is permanent (and will be re-created at some point, if deleted).There have been several suggestions made so far, which IMO are deficient in one way or another:
go build
or live with inclusion of any such files. IMO this puts unreasonable expectations on users. No matter how much we feel that in an ideal world, this should be what happens - I just don't think it is what will happen, in practice.git clean
beforego build
. This is certainly useful in CI/CD, but during regular development, it would also delete progress made. It's certainly not something I'd want to run before everygo test
orgo build
. In CI/CD it is useful, but it will probably still be forgotten by many people - though it's also far less of a problem, because the chance of pollution is low..DS_Store
is something that ~always should be excluded and as others have pointed out, there are many other patterns of files that should be excluded and would have to be listed. In effect, this would mean that ~every//go:embed
directive would have to list a non-canonical, long list of exclusions, to cover any tools used by people. It also still suffers from the problem that people have to know about the problem in the first place. So IMO it would still end up with accidental inclusions of.DS_Store
and similar files.*
not match dot-files. This isn't really helpful either, as even then, if a directory is named (either directly as//go:embed assets
or indirectly via a glob), we would still recursively include all subdirectories, including.DS_Store
.There are probably other approaches that can be discussed. It would also be a valid answer to close this as WAI. But if we release go 1.16 with embedding, we get locked into the semantics we implemented, so IMO this should be closed one way or another before releasing it into the wild.
Discussion summary (as of 2020-11-21, 10:00 UTC)
This is my best attempt at a fair and unbiased discussion-summary. It is necessarily subjective, though, so apologies if I left something out or misrepresented someone.
The resulting proposal from the discussion which is currently marked as "likely accept" is to have globs match as it currently does, but to exclude
.*
and_*
files (editors remark: "as the go command" probably impliestestdata
as well) from recursive directory walks when a directory is given explicitly.There are a couple of open questions:
_*
andtestdata
really be skipped (comment by @mpx)? The case for them comes down to consistency withgo build
and seems significantly weaker than for.*
.There where some alternatives suggested:
**
matching operator tofilepath.Glob
- don't leave out hidden files when walking directories (comment by @ianthehat). @rsc remarks that this might be unexpectedly unspecified. Also, the presence of**
would also make it easier to specifically match all hidden files, if they are desired (comment by @Merovius) so can be argued in favor of this proposal as well.There where some counter arguments:
*
but not in directory walks (comment by @dcormier). The proponents response is that we agree, but it might still be better than the current alternative. We need to make a tradeoff (comment by @Merovius). @SamWhited posted examples of where they embedded dot-files, which might be useful to inform that tredoff.The text was updated successfully, but these errors were encountered: