cmd/go: avoid surprising inclusion of "hidden" files when using //go:embed #42328

Closed
Merovius opened this issue Nov 1, 2020 · 140 comments

Comments

@Merovius

@Merovius Merovius commented Nov 1, 2020

Edit by rsc, Dec 1 2020: Please note that the "likely accepted" answer here is #42328 (comment), specifically:

  • patterns passed to //go:embed are evaluated exactly with filepath.Glob, not "Glob with special cases".
    So matching * keeps matching .foo, because it does in Glob.

  • when you name a directory explicitly, as in //go:embed dir, the code that walks that directory collecting
    all the files to embed will ignore names beginning with dot or underscore, the same as the go command does
    for deciding what to build.

This means that //go:embed static/* will embed .DS_Store if you really want to, while //go:embed static will not.


This is forked off from #41191 to talk about a specific issue with the design as-accepted. #42325 and #42321 talk about the same problem, but they are very solution-focused and I think it is more appropriate to discuss the actual problem first.

Namely: @carlmjohnson has pointed out that the //go:embed directive includes .DS_Store files under macOS. This is certainly working-as-designed, but I think it should be discussed whether that's actually the semantic we want. I want to talk about the .DS_Store example specifically, but there are other dot-files with similar properties. .DS_Store does illustrate something significant though, because it is a file that is (AIUI, I'm not a Mac-user myself) created non-interactively by third-party software in every subdirectory. So it is not explicitly created by the user and it is permanent (and will be re-created at some point, if deleted).

There have been several suggestions made so far, which IMO are deficient in one way or another:

  • Manually clean up before go build or live with inclusion of any such files. IMO this puts unreasonable expectations on users. No matter how much we feel that in an ideal world, this should be what happens - I just don't think it is what will happen, in practice.
  • Run git clean before go build. This is certainly useful in CI/CD, but during regular development, it would also delete progress made. It's certainly not something I'd want to run before every go test or go build. In CI/CD it is useful, but it will probably still be forgotten by many people - though it's also far less of a problem, because the chance of pollution is low.
  • Add ways to exclude specific files. IMO .DS_Store is something that ~always should be excluded and as others have pointed out, there are many other patterns of files that should be excluded and would have to be listed. In effect, this would mean that ~every //go:embed directive would have to list a non-canonical, long list of exclusions, to cover any tools used by people. It also still suffers from the problem that people have to know about the problem in the first place. So IMO it would still end up with accidental inclusions of .DS_Store and similar files.
  • Make * not match dot-files. This isn't really helpful either, as even then, if a directory is named (either directly as //go:embed assets or indirectly via a glob), we would still recursively include all subdirectories, including .DS_Store.
  • Not include dot-files at all, unless explicitly mentioned. It can be argued that this is a "dirty" approach, because dot-files aren't actually special and I would agree with that. OTOH, it's IMO a) the approach leading to the least accidents with the lowest impact in actual practice and b) when an accident happens, it can be debugged and fixed most easily (simple testing will show the dot-file to be missing and the docs can clearly say that they have to be mentioned explicitly).

There are probably other approaches that can be discussed. It would also be a valid answer to close this as WAI. But if we release go 1.16 with embedding, we get locked into the semantics we implemented, so IMO this should be closed one way or another before releasing it into the wild.


Discussion summary (as of 2020-11-21, 10:00 UTC)

This is my best attempt at a fair and unbiased discussion-summary. It is necessarily subjective, though, so apologies if I left something out or misrepresented someone.

The resulting proposal from the discussion, which is currently marked as "likely accept", is to have globs match as they currently do, but to exclude .* and _* files (editor's remark: "as the go command" probably implies testdata as well) from recursive directory walks when a directory is given explicitly.

There are a couple of open questions:

  • Will hidden files also be excluded if a directory is matched by a glob, or just if it is mentioned by name (comment by @mpx)? Consensus seems to be that it would apply to globbed directories as well - that is, "expand globs first, then recursively walk every directory given with hidden files/dirs skipped".
  • Does "hidden" also apply to extended filesystem attributes? (comment by @andrius4669)? There was one comment in favor by @Merovius and one comment against by @inliquid.
  • Should _* and testdata really be skipped (comment by @mpx)? The case for them comes down to consistency with go build and seems significantly weaker than for .*.

There were some alternatives suggested:

  • Add a ** matching operator to filepath.Glob - don't leave out hidden files when walking directories (comment by @ianthehat). @rsc remarks that this might be unexpectedly unspecified. Also, the presence of ** would also make it easier to specifically match all hidden files, if they are desired (comment by @Merovius) so can be argued in favor of this proposal as well.
  • Provide a simple way to test for accidental file inclusion, leave semantics as-is (comment by @nightlyone). The main argument against that is that it requires knowledge that this should be tested (comment by @Merovius).
  • Provide a wrapper-fs that filters out undesirable files (comment by @seankhliao). In addition to the same argument as against tests, even the embedding of undesired files might be harmful, not just their use (comment by @SamWhited)
  • Remove support for recursive directory inclusion for now, leave globs as-is. A separate go-generate tool can create complete file-listings and we can experiment syntax/semantics using such a tool during the next cycle (comment by @ianthehat).
  • Remove both recursive directory walking and globs, only allow explicit lists (comment by @mpx).

There were some counter arguments:

  • It is inconsistent and confusing that hidden files are included when using * but not in directory walks (comment by @dcormier). The proponents' response is that we agree, but it might still be better than the current alternative. We need to make a tradeoff (comment by @Merovius). @SamWhited posted examples of where they embedded dot-files, which might be useful to inform that tradeoff.
  • We need to have an escape hatch for "everything exactly as on disk" (comment by @seankhliao). The best escape hatch available so far is to use go-generate to create a list (comment by @mvdan), there might be the need for a more convenient one, if this comes up often enough.
@mvdan
Member

@mvdan mvdan commented Nov 1, 2020

Thanks for filing this issue. In my comment in the original proposal I did suggest filing a new issue from the point of view of a bug report, so I agree with you that we should begin with the problem and not multiple potential solutions. I hope this doesn't mean I get more thumbs down reactions :)

I fully agree that this needs a decision before 1.16, even if the decision is that we're okay with the current semantics. I also tend to agree that excluding files by accident is less harmful than including them by accident, because the former is easy to spot but the latter could go unnoticed for a long time.

I'm still uneasy about introducing the notion of "hidden files" in the Go toolchain, but they do have a sort of precedent:

$ go help packages
[...]

Directory and file names that begin with "." or "_" are ignored
by the go tool, as are directories named "testdata".

Perhaps we could copy the same rule here. * is about files and not packages, but since builds happen inside package directories, I think it could make sense to be consistent. And the notion of "ignored filenames" by the toolchain would remain easy to remember.

@Merovius
Author

@Merovius Merovius commented Nov 2, 2020

I think that would be a fine approach (as long as explicit listing or a .*-glob would still allow you to include them). Just to be clear though:

* is about files and not packages

I want to make sure we are in agreement that this isn't just about globs specifically. As I said, I think //go:embed assets also shouldn't embed assets/.DS_Store (for example), even though it doesn't contain a glob.

@carlmjohnson
Contributor

@carlmjohnson carlmjohnson commented Nov 2, 2020

I think this issue has made a very strong case for "fail safe" instead of "fail unsafe but document/provide an escape hatch" semantics.

One additional thought: if we get the semantics "wrong" in 1.16, would it be possible to revise them in 1.17 with a go.mod directive? If so, that to me suggests doing something "conservative" in 1.16 and revising to be "liberal" in later versions of Go if the conservative thing was found to be too conservative.

@dmitshur dmitshur added this to the Go1.16 milestone Nov 2, 2020
@dmitshur
Contributor

@dmitshur dmitshur commented Nov 2, 2020

CC @rsc.

@mvdan
Member

@mvdan mvdan commented Nov 3, 2020

@Merovius agreed. The rule would limit both recursing into directories, and the * glob. I think any other glob shouldn't be affected, so that one may use globs such as .* or _*.

@carlmjohnson
Contributor

@carlmjohnson carlmjohnson commented Nov 3, 2020

Icon\r isn’t a dotfile, but it is hidden. Should Go include it?

@ianlancetaylor ianlancetaylor changed the title embed: Surprising inclusion of "hidden" files proposal: cmd/go: avoid surprising inclusion of "hidden" files when using //go:embed Nov 3, 2020
@ianlancetaylor ianlancetaylor removed this from the Go1.16 milestone Nov 3, 2020
@ianlancetaylor ianlancetaylor added this to the Proposal milestone Nov 3, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Nov 3, 2020
@zephyrtronium
Contributor

@zephyrtronium zephyrtronium commented Nov 3, 2020

In a similar vein, any file can be marked hidden on NTFS. Should //go:embed read filesystem attributes, even on a Linux host compiling code on an NTFS volume, to ignore thumbs.db?

@andig
Contributor

@andig andig commented Nov 4, 2020

For the sake of completeness: we need the ability to exclude files on purpose, think assets.go. Might also solve excluding dotfiles.

@carlmjohnson
Contributor

@carlmjohnson carlmjohnson commented Nov 4, 2020

#42325 was closed to focus discussion here.

The proposal there was to add a magic comment for negative globs, like //go:embed-exclude *.go. The objection was that it would lead to a lot of OS/IDE specific exclusions being listed and still miss some things that should be filtered when encountering a new environment.

My current feeling is that a combination of approaches should be taken:

  • Dot files should be excluded unless specifically included. (Definitely do this.)
  • Hidden files should probably also be excluded to the extent that we can figure out what files are hidden on the system. Since which files count as hidden is system specific, it may not always work properly, e.g. across a network. (Maybe do this if it can be made portable.)
  • A way to specifically exclude files might be nice. (Can be left for Go 1.17 if we exclude dotfiles now.)
  • Go source files could be automatically excluded at the cost of losing a few blog posts about easy Quines. (Probably do this unless there is a compelling usecase.)

@Merovius
Author

@Merovius Merovius commented Nov 4, 2020

@andig I don't agree that that's a necessity for the usecase you mention. There is no need to have assets.go in the same directory, you can have it in a directory further up and use a StripPrefixFS (if it's not in the stdlib, it can always be provided by a third party).

@nightlyone
Contributor

@nightlyone nightlyone commented Nov 5, 2020

A well tested approach, see https://docs.docker.com/engine/reference/builder/#dockerignore-file , is providing something like a .goembedignore file. The .gitignore file is another such example.

Those are well understood semantics, easy to share via dotfile collection, easy to configure in template repositories, easy to add to UIs for repository creation like github.com or gitlab.com as well as repository compliance checkers often found in enterprises can easily verify its existence, syntax and content.

The extensions star-star-match, match-exceptions as well as the ability to add comments mentioned in the .dockerignore documentation also proved critical to practical use.

Adding such information to each mention of go:embed using wildcards seems impractical to me and too easy to forget.

This problem has been solved pretty well before and I don't think Go needs yet another way to be special here.

Followup questions now are:

  • Where are those .goembedignore files allowed?
  • Should the go:embed directive allow mentioning such a file?
  • Can mentioning this file to go:embed pragma be the only way that this information is passed?
  • Do we still need sensible defaults then or is explicit better than implicit here?

@carlmjohnson
Contributor

@carlmjohnson carlmjohnson commented Nov 5, 2020

Interesting idea. Maybe the default is to exclude dot files and .go files and then tweaking the .goembedignore file for a module could change the parameters to something else. It’s an interesting idea that could wait for Go 1.17 if the window for making it into 1.16 is too narrow now.

@nightlyone
Contributor

@nightlyone nightlyone commented Nov 5, 2020

@rsc / @dmitshur For Go 1.16 I would suggest to revert the ability to add wildcard based trees to go:embed until this issue is solved so we can explore the solution space without releasing a feature with a potential security risk given such precedents in Docker and git before each had a way to ignore certain file patterns recursively.

@carlmjohnson
Contributor

@carlmjohnson carlmjohnson commented Nov 5, 2020

Even without a wildcard pattern, the problem of including a directory that has an unexpected file will be there.

@mvdan
Member

@mvdan mvdan commented Nov 5, 2020

Personally, I'm opposed to .goembedignore and go:embed-exclude in general, for a similar reason why we rejected a .goignore in #30058. Here, you have two options (besides my proposal in #42328 (comment)):

A) To only include a subset of the files, use go:embed on a subdirectory and place them there.

B) To exclude some sub-directory if option A isn't desirable, you could always drop a go.mod in there to exclude the directory from the current module.

@mpx
Contributor

@mpx mpx commented Nov 5, 2020

The files referenced by //go:embed are effectively part of the build, just like *.go, *.c, etc.. Ideally it would be similarly easy to understand which embedded files are included. The rule to determine which files are included should be simple and avoid surprising results.

Excluding all files starting with "." seems simplest and least prone to confusion. This avoids including files which typically aren't visible in the directory, and entries which typically relate to other system purposes (eg, .DS_Store). build.Context.Import also excludes _, so it might be reasonable to exclude it here as well.

Perhaps this rule could be extended to include dot-files when the pattern/filenames explicitly start with ".". Another option would be including explicitly referenced files. Eg: //go:embed data data/.mysecrets. However, I suspect few codebases would be motivated to use this option.

@seankhliao
Contributor

@seankhliao seankhliao commented Nov 10, 2020

I think it's counterintuitive to exclude certain files, especially when I specify a directory, I expect everything inside it. Furthermore, most (all?) of the existing tools do not have such behaviour.

From the original issue description, I would argue it is fine for in-development builds to contain extra files such as .DS_Store, building from a clean repo for release is something we should promote and is more or less equivalent to what already happens for go get / go install

@andig
Contributor

@andig andig commented Nov 10, 2020

I really dislike the inclusion of hidden files. However, this is what http.Dir for example says:

Note that Dir could expose sensitive files and directories. Dir will follow symlinks pointing out of the directory tree, which can be especially dangerous if serving from a directory in which users are able to create arbitrary symlinks. Dir will also allow access to files and directories starting with a period, which could expose sensitive directories like .git or sensitive files like .htpasswd. To exclude files with a leading period, remove the files/directories from the server or create a custom FileSystem implementation.

@magical
Contributor

@magical magical commented Nov 29, 2020

@slrz Exactly, they're configured by the admin (.snapshot) or the OS (.DS_Store) and as a user I have no control over them. If I try to build a third-party go package that embeds a directory or a glob on such a machine, it's going to end up embedding a lot of files that the author probably didn't expect.

@slrz

@slrz slrz commented Nov 30, 2020

It's the same behaviour exhibited by basically every other tool in existence, so I don't get the unexpected part. Specifying a directory includes it as a whole, whatever might be in there.

It's only shell globs and some ls variants that behave differently. Incidentally, I would have much less of an issue with this if we defined the glob star to not match dot files (although the mismatch with filepath.Glob would be unfortunate). It's specifying a directory explicitly (or matching it through a glob) and then having its contents filtered by some magic set of rules. That is unexpected and I'm not aware of anything else behaving this way. Are you?

I really don't think this is an appropriate area for Go to "innovate", especially not by introducing complexity. It wouldn't fully solve the problem anyway: the go tool might ignore those files but cp, tar, zip and friends certainly won't.

@tv42

@tv42 tv42 commented Dec 1, 2020

@slrz GNU tar has an elaborate --exclude-* flag set, for that reason. People are trying to avoid that complexity.

@rsc
Contributor

@rsc rsc commented Dec 1, 2020

Hi all. Thanks for the lively discussion and thanks to @Merovius for the summary.

Regarding the open questions in the summary, the current proposal's answers are:

Will hidden files also be excluded if a directory is matched by a glob, or just if it is mentioned by name?

Yes. A name is a glob. There is no distinction. To process //go:embed, you figure out what is matched and then also walk directories that were matched.

Does "hidden" also apply to extended filesystem attributes?

No, just to .* and _* names.

Should _* and testdata really be skipped (comment by @mpx)?

testdata definitely not. That's not in the proposal. (I understand the confusion, but it was deliberately omitted compared to the go command.) But _* is there for a reason and so I think it makes sense to preserve. In particular the go command reads a _netrc file on Windows instead of .netrc, so at least some tools do treat _ as the Windows equivalent.

Regarding the alternatives, it's important not to deviate too much from what we already discussed and accepted. It's clear that being able to say //go:embed static and embed a directory tree is a very important case. We're not going to remove directory walking entirely. It's also clear that //go:embed static/*.jpg is important. We're not going to remove globs entirely. I think the proposal would not have been accepted without both of these, and we should respect that it was.

I would be interested to hear estimates of how often "everything on disk" will come up versus "ignore dot files". Clearly lots of people don't want dot files, hence this and other issues. And I also like that Hugo is ignoring them too.

@rsc
Contributor

@rsc rsc commented Dec 1, 2020

Overall while there's lots of discussion since two weeks ago, I'm not seeing any clear change in consensus since we marked this "likely accept". It's a difficult call, but since embed is a new feature, "fall back to doing nothing" is not as strong. We are introducing a new feature, and we need to decide what is best for that new landing. There's no prior behavior to bias toward. More data would be helpful, as I mentioned in the previous comment, if anyone has any.

Overall it still sounds like we should make the directory change. If it's truly awful and wrong for the vast majority of Go users, it's easy to change in the future based on the go version in go.mod. But I doubt we're at that point. Instead, it's closer to a 50-50 call and we have to decide something. Both choices are good (and bad). On balance, the choice that doesn't accidentally vacuum up potentially sensitive files seems like the better one.

@mpx
Contributor

@mpx mpx commented Dec 2, 2020

But _* is there for a reason and so I think it makes sense to preserve. In particular the go command reads a _netrc file on Windows instead of .netrc, so at least some tools do treat _ as the Windows equivalent.

I appreciate we have accepted a solution for dot-files (thanks, it greatly reduces the risk in my environment). However, I think it's worth looking at underscore-files a little closer since there are no examples of problems with _* in this thread and it appears the same justification does not apply:

  • tools automatically create dot-files (not underscore-files?)
  • dot-files are hidden under Unix (underscore files are not hidden)

The decision for using _netrc for Go was in https://go-review.googlesource.com/c/vgo/+/103866/. os.UserHomeDir is almost certainly outside any embed data directory.

I suspect _netrc is a consequence of porting ftp to early DOS/Windows and git continues to use it today - not a wider trend. Afaik, underscore config files are not common under Windows (but I'm far from an expert :) ). There is probably some historical reluctance to use dot-files under Windows due to its FAT heritage with 8.3 filenames -- but this is no longer valid with "modern" filesystems (since Windows 95).

Can anyone suggest examples of tooling that creates underscore files under Windows or Unix that might interfere with //go:embed?

To be clear, I'm sure there are many people who would prefer to include underscore files (many comments above), no need to reply on that point - I'm just trying to find examples where excluding underscore-files might help.

@rsc
Contributor

@rsc rsc commented Dec 2, 2020

There's been a lot of discussion but I don't see a change in the consensus from when this got marked likely accept. Let's make the change for the beta (hopefully next week) and then get more feedback. For now, accepted.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Dec 2, 2020
@rsc rsc changed the title proposal: cmd/go: avoid surprising inclusion of "hidden" files when using //go:embed cmd/go: avoid surprising inclusion of "hidden" files when using //go:embed Dec 2, 2020
@rsc rsc removed this from the Proposal milestone Dec 2, 2020
@rsc rsc added this to the Backlog milestone Dec 2, 2020
@slrz

@slrz slrz commented Dec 2, 2020

@gopherbot

@gopherbot gopherbot commented Dec 3, 2020

Change https://golang.org/cl/275092 mentions this issue: cmd/go, embed: exclude .* and _* from embedded directory trees

@gopherbot gopherbot closed this in 37588ff Dec 4, 2020
@jpap
Contributor

@jpap jpap commented Jan 22, 2021

I just hit this while working with 1.16beta1. I have a //go:embed of ~700 files, ~70 directories and where some files have a leading underscore; these are not dot files, nor go source files, and it took me some time to work out that the reason I couldn't open some of them without error was that they were excluded from the build.

I've since found and read through the discussion here. Please let me know if there is a more appropriate place to provide feedback now that this issue is closed. Otherwise, please read on...

The draft proposal doesn't mention this restriction (I now realize it just hasn't been updated since July), and it was counterintuitive to me that files would be arbitrarily excluded because they're considered "hidden" on some platforms. Why counterintuitive? I don't know of any other tools that automatically exclude dot or leading-underscore files, other than ls for dot files, which provides the well-known -a escape hatch. Git, find, tar, etc., include such files by default. Even the macOS Finder allows you to toggle the display of dot files with ⌘-shift-..

Sometimes we need to include everything in an embed of a directory tree. Please give us a way out! :) One suggestion is a simple flag like //go:embed -all dir. Otherwise having some explicit filtering option in the directive (with an "otherwise include-all") would be welcomed, as was suggested in #42325. The proposed //go:embed dir vs. //go:embed dir/** is very subtle and reminds me of the arcane build constraint syntax that was thankfully recently revisited. An explicit inclusion-list, written by hand or code generated, is a big turn-off.

Right now I am debating whether to write more code and wrap my file tree in a store-only ZIP file and //go:embed that, or just use a 3rd party embed code-generator tool and eat the slower build times. Neither approach is preferred because they burden the project with an external generator. (It's been helpful during development to have the embed available as-is on the filesystem, so if using a ZIP file or generated embed.go file the content will be kept in both forms.) Another approach which I dislike is to use some encoding to escape filenames that //go:embed otherwise denies me from using naturally. That would force design changes to a larger surface area of an otherwise already complex project just to simplify the build.

@mpx
Contributor

@mpx mpx commented Jan 22, 2021

I just hit this while working with 1.16beta1. I have a //go:embed of ~700 files, ~70 directories and where some files have a leading underscore; these are not dot files,

It sounds like simply including underscore files would work. I'm not aware of any situations where including underscore files would be problematic and I haven't seen any in this issue (it's highly unlikely _netrc would inadvertently appear within a module).

Unfortunately providing a -all option would be problematic too:

  • Extra complexity: supporting the inclusion of file named -all would require escaping as well.
  • Code using -all might incorrectly include dot-files when compiled on some systems (see the many examples in this issue).
  • Any changes this late in the cycle would have to be very simple.

@rsc Perhaps excluding underscore files could be reconsidered?

@mpx
Contributor

@mpx mpx commented Jan 22, 2021

@jpap I'm not sure, but it may help to open a separate issue to provide this feedback since this is closed and may not get attention.

@inliquid

@inliquid inliquid commented Jan 22, 2021

Better documentation will solve everything.

@jpap
Contributor

@jpap jpap commented Jan 22, 2021

Unfortunately providing a -all option would be problematic too:

  • Extra complexity: supporting the inclusion of file named -all would require escaping as well.

That is easily avoided through a "--" terminator, and easily implemented using the flag package.

For example, to include a file called -all, you would simply write //go:embed -all -- -all. This pattern is likely familiar to most go developers, because we often use it on the command-line. This is also likely an edge case.

  • Code using -all might incorrectly include dot-files when compiled on some systems (see the many examples in this issue).

Using -all is opt-in, an escape hatch, so yes -- you would get all dot-files because maybe that's what you want! :)

  • Any changes this late in the cycle would have to be very simple.

A change that leveraged the flag package is likely smaller than CL 275092 above.

In src/cmd/go/internal/load/pkg.go, in (*Package).resolveEmbed, you could feed patterns into a flag.FlagSet having an "all" bool flag, then use it to short-circuit the offending condition:

if path != file && (isBadEmbedName(name) || name[0] == '.' || name[0] == '_') {
	// Ignore bad names, assuming they won't go into modules.
	// Also avoid hidden files that user may not know about.
	// See golang.org/issue/42328.
	if info.IsDir() {
		return fs.SkipDir
	}
	return nil
}

@carlmjohnson
Contributor

@carlmjohnson carlmjohnson commented Jan 22, 2021

This is probably the wrong place for discussion but if you just need to embed some files with underscores, a quick workaround is that you can write a wrapper fs.FS that changes file names from -whatever to _whatever automatically and then rename the files on disk to start with a dash.

@Merovius
Author

@Merovius Merovius commented Jan 22, 2021

The workaround we've discussed is to write a small go generate program that generates a //go:embed comment. Something along the lines of

package main

import (
    "fmt"
    "log"
    "os"
    "path/filepath"
    "strings"
)

func main() {
    const base = "embed"
    var dirs []string
    // Collect one "dir/*" pattern per directory; globs also match dot and underscore files.
    err := filepath.Walk(base, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        if info.IsDir() {
            // Patterns are relative to the package directory and use forward slashes.
            dirs = append(dirs, filepath.ToSlash(filepath.Join(path, "*")))
        }
        return nil
    })
    if err != nil {
        log.Fatal(err)
    }
    err = os.WriteFile("embed.gen.go", []byte(fmt.Sprintf(template, strings.Join(dirs, " "))), 0644)
    if err != nil {
        log.Fatal(err)
    }
}

const template = `// Code generated by go run embed.go; DO NOT EDIT.

package pkg

import "embed"

//go:embed %s
var embedded embed.FS`

That way you still get the efficiency and integration of the native embedding support (which you lack if you use a third-party tool). I agree that it's less convenient to still have to run go generate when the set of files changes, but I'm still convinced that this is enough of an edge-case that it's fine if it's less convenient.

In any case, I think the release of go 1.16 is imminent, so I think it's too late to change it for this release. I see that you've opened a new issue, I think we can discuss options there.

@jpap
Contributor

@jpap jpap commented Jan 22, 2021

This is probably the wrong place for discussion but if you just need to embed some files with underscores, a quick workaround is that you can write a wrapper fs.FS that changes file names from -whatever to _whatever automatically and then rename the files on disk to start with a dash.

That doesn't work because these files aren't included in the executable to begin with: there's nothing to filter.

I've opened #43854 if you want to continue the discussion.

@jpap
Contributor

@jpap jpap commented Jan 22, 2021

The workaround we've discussed is to write a small go generate program that generates a //go:embed comment.

This is an undesirable workaround for the reasons already discussed:

  • Use of an external generator.
  • Not automatically kept fresh.
  • Increased complexity compared to a simple -all flag, or just not excluding files with leading underscore.

Let's continue the discussion in #43854.

@carlmjohnson
Contributor

@carlmjohnson carlmjohnson commented Jan 22, 2021

That doesn't work because these files aren't included in the executable to begin with: there's nothing to filter.

The idea is that you change a/_whatever.txt to a/-whatever.txt and then add a wrapper that interprets Open("a/_whatever.txt") as meaning Open("a/-whatever.txt") and similarly changes the ReadDir calls. It's an ugly hack, but probably could be done in like 20 LoC plus a bulk rename.

@jpap
Contributor

@jpap jpap commented Jan 22, 2021

The idea is that you change a/_whatever.txt to a/-whatever.txt and then add a wrapper that interprets Open("a/_whatever.txt") as meaning Open("a/-whatever.txt") and similarly changes the ReadDir calls. It's an ugly hack, but probably could be done in like 20 LoC plus a bulk rename.

Thanks for the clarification. That's the approach I also found undesirable in my original post, quoted again below.

Another approach which I dislike is to use some encoding to escape filenames that //go:embed otherwise denies me from using naturally. That would force design changes to a larger surface area of an otherwise already complex project just to simplify the build.

In my project that change isn't limited to a one-time bulk rename: it means changing the design of all the code that uses this directory/file tree structure. The project uses many such directory trees; most are used outside of //go:embed on a live filesystem. The motivation for using //go:embed here is that there are some special trees that are "constant" and need to be distributed with the executable. Others that live on the filesystem are specific to the environment in which they run.

@miquella

@miquella miquella commented Jan 22, 2021

Although my initial inclination was to add a -all flag, or similar, I would worry about the additional complexity that incurs. Specifically, escaping or other escape hatches would have to be added to the mix.

One alternative may be to have a second directive with different semantics:

  • //go:embed — keeps the existing behavior
  • //go:embed-all — embeds all files

This could become unwieldy if many permutations were anticipated to be added in the future.

One additional thought: does the module proxy respect embed directives? If not, the files may be excluded before the embed directive takes place anyway.


Edit: I'll go add this comment to the discussion on #43854 as well.

@gopherbot

@gopherbot gopherbot commented Oct 29, 2021

Change https://golang.org/cl/359413 mentions this issue: cmd/go: add //go:embed all:pattern

gopherbot pushed a commit that referenced this issue Nov 9, 2021
When //go:embed d matches directory d, it embeds the directory
tree rooted at d, but it excludes files beginning with . and _,
as well as files having problematic names that will not be packaged
into modules (names such as .git and com1).

After long discussions on #42328 and #43854, we decided to keep
the behavior of excluding . and _ files by default, but to allow the pattern
prefix 'all:' to override this default. This CL implements that change.

Note that paths like .git and com1 are still excluded, as they must be,
since they will never be packed into a module.

Fixes #43854.

Change-Id: I4f3731e14ecffd4b691fda3a0890b460027fe209
Reviewed-on: https://go-review.googlesource.com/c/go/+/359413
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Projects
Proposals
Accepted