Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: add //go:embed all:<pattern> to allow . and _ matches #43854

Open
jpap opened this issue Jan 22, 2021 · 39 comments
Open

cmd/go: add //go:embed all:<pattern> to allow . and _ matches #43854

jpap opened this issue Jan 22, 2021 · 39 comments

Comments

@jpap
Copy link
Contributor

@jpap jpap commented Jan 22, 2021

This issue follows from #42328 which is now closed, and where @mpx suggested I create a new one.

Overview

I have a //go:embed of ~700 files, ~70 directories and where some files have a leading underscore. These are not dot files, nor go source files, and it took me some time to work out that the reason I couldn't open some of them without error was that they were excluded from the build.

The draft proposal doesn't mention this restriction. It was written in July and the exclusion of these files was made in December by a change resulting from #42328.

It was counterintuitive to me that files would be arbitrarily excluded because they're considered "hidden" on some platforms. Why counterintuitive? I don't know of any other tools that automatically exclude dot or leading-underscore files, other than ls for dot files, which provides the well-known -a escape hatch. Git, find, tar, etc., include such files by default. Even the macOS Finder allows you to toggle display dot files with ⌘-shift-..

Related to this, I sometimes use "filesystem templates" that include empty directories that are later populated by code: these are either initially extracted to the filesystem, or used with a virtual in-memory filesystem. To embed such a template using //go:embed, I'd have to add placeholder files like a readme.txt with contents "This file intentionally left blank.". (When forced to do this, it is preferable to use .empty dot files, but they're also unsupported by //go:embed!)

I've also experienced a similar chicken or the egg problem with an empty embed dir; see [1] below.

Sometimes we need to include everything in an embed of a directory tree. Please give us a way out! :)

Suggestions and feedback

A simple opt-in flag like //go:embed -all dir to include everything in dir without filtering, and without declaring an error if the directory is empty [1]. This flag would instruct //go:embed to include all files, including those having a leading underscore, and any dot files. For the edge-case of a file named "-all", we can use a simple flag terminator: //go:embed -all -- -all. The flag package makes this implementation straightforward.

Otherwise having some explicit filtering option in the directive (with an "otherwise include-all") would be welcomed, as was suggested in #42325.

The proposal in #42328 for //go:embed dir vs. //go:embed dir/** is very subtle and reminds me of the arcane build constraint syntax that was thankfully recently revisited. An explicit inclusion-list, written by hand or code generated, is a big turn-off.

[1] In one project, I have a package that embeds a directory tree that is generated by another program. The generator needs to use the same package, which leads to a chicken or the egg problem. I currently resolve this by having a Makefile add a readme.txt file into the embed directory before each build (which unfortunately remains in the embed). I am not using the Makefile for any other purpose here.

Workarounds

  1. Write more code and wrap my file tree in a store-only ZIP file and //go:embed that.
  2. Use a code-generator tool: either to explicitly generate //go:embed filenames for the entire tree, or a 3rd party embed tool (and eat the slower build times).
  3. Using some encoding to escape filenames that //go:embed otherwise denies me from using naturally.
  4. Add placeholder (non-dot) files in empty directories that are managed by Makefiles.

The first two options are not preferred because they burden the project with an external generator. The third option would force design changes to a larger surface area of an otherwise already complex project just to simplify the build.

I'd love to avoid the use of Makefiles in the last workaround, especially when they serve no other purpose.

@Merovius
Copy link

@Merovius Merovius commented Jan 22, 2021

From #42328

The workaround we've discussed is to write a small go generate program that generates a //go:embed comment.

This is an undesirable workaround for the reasons already discussed:

I acknowledged that. I'm still confused why you didn't even mention it in your list - but did include the option of a third-party embedding tool, which has largely the same drawbacks (with the exception that go:embed doesn't support empty directories, which is fair) and then some.

  • Increased complexity compared to a simple -all flag, or just not excluding files with leading underscore.

I think this really depends - the complexity of flags for go:embed is spread over all users, while the complexity of using a custom generator is localized to the projects that need it. So the total complexity really depends on how frequent one or the other is.

But, to be clear, I don't really have strong opinions, one way or another. If I where you, I'd at least consider generating the list until you get an answer. I don't think there is any real new information here that wasn't available when we discussed #42328 - we where aware, at the time, that there will be users who will want hidden files included, so getting a confirmation isn't really new. However, at the time, my main concern was to get safe default behavior - I'm not opposed to adding optional other behaviors.

@miquella
Copy link

@miquella miquella commented Jan 22, 2021

Although my initial inclination was to add a -all flag, or similar, I would worry about the additional complexities that incurs. Specifically that escaping or other escape hatches have to be added to the mix.

One alternative may be to have a second directive with different semantics:

  • //go:embed — keeps the existing behavior
  • //go:embed-all — embeds all files

This could become unwieldy if many permutations were anticipated to be added in the future.

One additional thought: does the module proxy respect embed directives? If not, the files may be excluded before the embed directive takes place anyway.

@jpap
Copy link
Contributor Author

@jpap jpap commented Jan 22, 2021

I'm still confused why you didn't even mention it in your list - but did include the option of a third-party embedding tool, which has largely the same drawbacks (with the exception that go:embed doesn't support empty directories, which is fair) and then some.

Fair comment; I've updated the list. As I think you agree, a generator for explicit per-file //go:embed directives is still a burden.

@dmitshur
Copy link
Contributor

@dmitshur dmitshur commented Jan 22, 2021

@Merovius
Copy link

@Merovius Merovius commented Jan 22, 2021

One additional thought: does the module proxy respect embed directives? If not, the files may be excluded before the embed directive takes place anyway.

File can be embedded if they are included in the module zip file. And if they're in the zip-file, the proxy serves them.
So, no, hidden files are not excluded - they can still be regularly embedded, if you mention them explicitly.

@miquella
Copy link

@miquella miquella commented Jan 22, 2021

@Merovius: Sorry, I think I may have expressed my thought poorly, let me try one more time!

It's my understanding that directories that do not contain .go files will be excluded from the module zip file. If such a directory (one without .go files) were to be specified in an embed directive, is that directory still excluded from the module zip?

If embed directives don't modify the list of files included in the module zip, that seems like it's going to severely hamper the usefulness of the feature, as you won't be able to embed files that don't exist in the module zip.

@Merovius
Copy link

@Merovius Merovius commented Jan 22, 2021

@miquella

It's my understanding that directories that do not contain .go files will be excluded from the module zip file.

I don't think this understanding is correct. For example, if I run go mod download github.com/gokrazy/gokrazy@latest, I end up with a non-empty folder in ~/pkg/mod/github.com/gokrazy/gokrazy@v0.0.0-20210121075046-2e975fb90b29/assets not containing any .go files. And I also can't find any documentation saying they would be excluded.

So, the answer is still "yes, everything works just fine".

@miquella
Copy link

@miquella miquella commented Jan 22, 2021

@Merovius: Ah, good to know! Now I wish I could remember what made me think otherwise… 🤔

Either way, I do think something like //go:embed-all has fewer edges to consider (aside from the obvious introduction of another directive).

@networkimprov
Copy link

@networkimprov networkimprov commented Feb 4, 2021

Should be titled "embed: ..." vs cmd/go. Might explain why it's been overlooked.

@mpx
Copy link
Contributor

@mpx mpx commented Feb 4, 2021

[mostly including my comment from the other issue for reference here]

#42328 ignored dot-files since including them will cause problems on many systems. I'm not sure there is any benefit to discussing the pros/cons again here, the same arguments still apply.

Providing -all has significant downsides:

  • Code using -all would be vulnerable to including dot-files when compiled on some systems. It would encourage developers to write code that misbehaves on other systems.
  • Extra complexity: supporting the inclusion of file named -all would require escaping as well.
  • Any changes this late in the cycle would have to be very simple. Including underscore is about as simple as possible.

Fortunately, it sounds like simply including underscore files would solve the the original problem highlighted in this issue. I'm not aware of any situations where including underscore files would be problematic (_netrc was suggested, but it's highly unlikely it would inadvertently appear within a module). Prior comments on this topic: 1, 2

Using code generation when dot-files are required has a significant advantage: It guarantees that the correct files are included on other systems. From what I've seen including dot-files is very uncommon, so this inconvenience should be fairly rare. If not, it would be trivial to create a tool the community can use with go:generate.

@rsc
Copy link
Contributor

@rsc rsc commented Feb 10, 2021

It's always going to be possible to come up with a use case that embed does not support.
That's true of every language feature or library we add.
We spent a long time on the exact semantics of embed and came up with the current ones.
If they are not right, we should change the current ones.
We shouldn't add more knobs. In general we avoid knobs as much as possible in Go (see GOGC being the only GC knob).

@rsc rsc changed the title cmd/go: opt-in for //go:embed to *not* ignore files and empty dirs proposal: cmd/go: opt-in for //go:embed to *not* ignore files and empty dirs Feb 10, 2021
@rsc rsc added this to Active in Proposals Feb 10, 2021
@rsc
Copy link
Contributor

@rsc rsc commented Feb 10, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@jpap
Copy link
Contributor Author

@jpap jpap commented Feb 11, 2021

On the dot/underscore file discussion, the summary at the end of the top-post of #42328 has some good suggestions to keep in mind. Some new suggestions below.

  1. I prefer the idea in #42325 of having stacked directives, where //go:embed include all files and //go:embed-exclude strips undesirables. Stacking //go:embed also makes it more pleasant to list multiple files or globs without wrapping a long list in a messy, hard-to-read way.

    I'd love to see the stacking of //go:embed directives for multiple file patterns, even if we never converge on the dot/underscore solution.

  2. I can live with the idea of keeping the existing //go:embed syntax (excluding dot and underscore files), if we also add a new glob pattern that means "everything without exclusion". While it might not be a good idea to use ** for this, perhaps there is another pattern we can define instead...

    One candidate is the 3-dot ellipsis ... which is familiar to Go developers when writing variadic functions having zero or more arguments:

    // Include all files/dirs under the assets tree, including
    // those with a leading dot or underscore.
    //go:embed assets/...
    

    Another candidate is % which is inspired by the SQL wildcard to match any number of any characters including none:

    // Include all files/dirs under the assets tree, including
    // those with a leading dot or underscore.
    //go:embed assets/%
    

    While it is convenient to just use filepath.Glob to implement //go:embed, I don't see why we can't offer a more sophisticated glob for embed patterns. (In a sense, the existing implementation does exactly that by excluding dot and underscore files.)

    This option becomes workable without using a build-time exclusion list when paired with an io/fs.FS implementation that excludes files that we don't want. Yes, excluded files ideally shouldn't be in the executable in the first place, which is what a //go:embed-exclude directive would do; but a runtime exclusion filter is a compromise over those who propose we just use a clean build environment, which is not always practical.

On the empty embed error, I hope we can just drop it. Warning the user that an embed is empty is probably better suited for go vet, rather than causing the chicken or the egg problem outlined in the top-post. I've not seen any objections to this part of the proposal.

@mpx
Copy link
Contributor

@mpx mpx commented Feb 11, 2021

Any suggestions which automatically include dot-files will misbehave on some systems (not to mention adding more complexity for an edge case). This goes against the extensively explored reasoning for accepting #42328.

The window to fix this in Go1.16 is vanishingly small (if it exists at all). The barrier to fixing it later will be higher again.

I'd recommend focusing on simply adding underscore files back in. It's a trivial change now (also less likely after release), and it would solve the motivating example behind this issue. So far, there have been no explicit examples where doing so would cause a problem.

@rsc, you mentioned earlier that it would be better to release the beta and gather feedback. Given this issue, I think it would be worth rapidly reconsidering including underscore files. I suspect underscore files will be more desirable to the wider community, despite being much less of a risk than dot-files.

@rsc
Copy link
Contributor

@rsc rsc commented Feb 17, 2021

I am still not clear about what the use case being described in the top comment is exactly.
Can you say more about the case when it would be important to include _* and .* files?
We discussed this at length at #42328 as well. Is there something new to add to that discussion?

I would also point out that we basically cannot include empty directories, since they do not exist as a concept in Git.
(So any downloading of the dependency would not find the empty directory in the file system,
so there would be nothing for embed to match, even if it did accept empty directories.)

@jpap
Copy link
Contributor Author

@jpap jpap commented Feb 18, 2021

I am still not clear about what the use case being described in the top comment is exactly.
Can you say more about the case when it would be important to include _* and .* files?
We discussed this at length at #42328 as well.

My use-case was described briefly in #42328, but only after it was already closed. It was suggested there that I create a new issue. Some further detail:

  • I have a 3rd-party program A that generates a bundle of work files that is given to another 3rd-party program B for processing: the output of B is then further processed by A. Program B is executed directly by A many times, concurrently, during a session.
  • I have a Go program C that intercepts all of these files via interpose and stores them in a persistent cache so they can be "played back" to A when B is unavailable.
  • The persistent cache uses a flat file scheme on the live filesystem, using the naming convention established by A and B. The concurrent interpose makes it cumbersome to use some kind of consolidated structure, because it would then require coordination. Executions of B can be further categorized into well-defined groups, and all work bundles for each group are stored in distinct folders. In all, the persistent cache is a tree of files.
  • There are some trees that are "universal" and not easily generated by the user. These are best pre-prepared and embedded directly into program C.
  • The embed uses the same naming convention as the filesystem persistent cache: in that way, the large amount of existing code that manages the filesystem version could transparently use the embedded version... which would've worked out perfectly if //go:embed didn't filter out those files having leading underscores.
  • To constrain the naming convention and satisfy //go:embed would mean changing a large surface area of the codebase so that the filesystem persistent cache is consistent: for example, by encoding filenames using a scheme that avoids a leading underscore or period. That is, the design of //go:embed would cascade onto the design of the rest of program C. Other undesirable workarounds are given in the top-post.

Is there something new to add to that discussion?

This is new feedback, based on a project that attempted to use the new feature and found the current design to be overly constraining. (As you know, the filename filtering wasn't in your draft design, added because of #42328, and was otherwise undocumented at the time.) Most of the posts in #42328 are users voicing their preference or opinion on how filtering should behave and not feedback based on practice. The closest thing to that are these posts:

  1. Talking about prior experience with embedding.
  2. Another by the OP talking about a "most often" wanted use-case.
  3. Another about an "ideal embed line" for a "common case" of embedding httpd assets.
  4. Yet another about wanting to embed files from a non-Go build system.

I would also point out that we basically cannot include empty directories, since they do not exist as a concept in Git.
(So any downloading of the dependency would not find the empty directory in the file system, so there would be nothing for embed to match, even if it did accept empty directories.)

The need for dropping the empty-dir error can arise without the developer ever invoking git. The chicken or the egg problem can appear when writing a new package for the first time, or updating an embed from scratch within an existing checkout, when using a Go program that imports the package containing the //go:embed to (re)write its contents. The latter can happen even when the final update is transmitted to a remote vcs with a fully populated embed dir.

Separate to that, I urge you to reconsider coupling //go:embed so tightly to git, when go supports other vcs like bzr and svn that do allow empty dirs in a repo. (Even on git, it is also common to use .gitignore or other a .gitkeep placeholder in a git repo to host an "empty" dir: because of the current dotfile exclusion on //go:embed, such an embed will be empty even on a "go get", if the error didn't stop you.)

One path forward is to drop the error, and add a go vet check to warn a user that the directory might inadvertently be empty, in case that was not the developer's intention.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 19, 2021

To constrain the naming convention and satisfy //go:embed would mean changing a large surface area of the codebase so that the filesystem persistent cache is consistent: for example, by encoding filenames using a scheme that avoids a leading underscore or period. That is, the design of //go:embed would cascade onto the design of the rest of program C.

One could imagine an io/fs.FS implementation that rewrote file names before passing them to the fs.FS from the //go:embed directives (and rewrote the results of the ReadDir method). That would constrain the rewriting to one small part of the program rather than letting them cascade everywhere.

@jpap
Copy link
Contributor Author

@jpap jpap commented Feb 19, 2021

One could imagine an io/fs.FS implementation that rewrote file names before passing them to the fs.FS from the //go:embed directives (and rewrote the results of the ReadDir method). That would constrain the rewriting to one small part of the program rather than letting them cascade everywhere.

That's only the consumption side of the story: you also need to ensure the production side matches. One would need to ensure the files on disk have been renamed, so that they can get past the great //go:embed filter and be included in the go build, otherwise there's nothing for a custom fs.FS implementation to transparently rename. That's a lot of dancing around a design constraint on //go:embed: give us an escape hatch if you would, please.

@jfesler
Copy link

@jfesler jfesler commented Mar 16, 2021

My use case for _: I have template .go files in a _template/ directory. Being that it starts with an underscore, tools ignore it. Awesome.

But, I can't include those files with //go:embed - so I had to rename that _template to template. And suddenly tools see templated go files that don't parse properly, at least not until the templates are expanded. So the IDE is unhappy, linters are unhappy, and so forth. I've had to add a throwaway extension to the end of every file, and then modify the tools using the templates to always add that throwaway extension.

It's a kludge, and not obvious to anyone looking at my repo for the first time.. but still beats the third party bundlers.

I very much would like to be able to, perhaps by way of my embed criteria, be able to intentionally include _ and . prefixed files and directories.

@rsc
Copy link
Contributor

@rsc rsc commented Mar 24, 2021

It seems like we could add //go:embed -all foo, and people using Go 1.16 would get an error that -all does not exist, which is OK. (And presumably we'd only allow it with go 1.17 in the go.mod, so after the error about -all not existing you'd get an error that the module wants Go 1.17.)

We would have to figure out how to change the go/build.Package encoding as well.

Thoughts on all of this, @jayconrod and @matloob?

@thomasf
Copy link

@thomasf thomasf commented Apr 2, 2021

What if you have a directory or file called all:? Is this just unlikely enough to ignore? Or do you have to all:all: (and thus forced to include hidden files)?

While it is possible to create a file or directory like that I believe that any : in a filename is invalid on windows so it would be really bad practice to use those names anywhere.

@jayconrod
Copy link
Contributor

@jayconrod jayconrod commented Apr 2, 2021

What if you have a directory or file called all:? Is this just unlikely enough to ignore? Or do you have to all:all: (and thus forced to include hidden files)?

The character : is not allowed in file names within modules. module.CheckFilePath has the specific restrictions.

The reason is what @thomasf getting at: when the go command downloads a module, it needs to be able to reliably extract it into the module cache. It needs to work on all supported operating systems.

Because of that restriction, it should be safe to use : as a control character in go:embed directives. Most of the other forbidden punctuation characters like * and ? already mean something to path.Match.

@antichris
Copy link

@antichris antichris commented Apr 5, 2021

The POSIX spec also precludes the use of : in directory names that might be used in PATH.

@rsc rsc changed the title proposal: cmd/go: opt-in for //go:embed to *not* ignore files and empty dirs proposal: cmd/go: add //go:embed all:<pattern> to allow . and _ matches Apr 7, 2021
@rsc
Copy link
Contributor

@rsc rsc commented Apr 7, 2021

Retitled.

@jpap:

Sounds good, but can we please clarify that this would also make it legal for a specified directory to contain zero files

Sorry, but this is not possible. The module packaging (which cannot change because of checksums) does not have a concept of including a directory, only files (directories are implicit). So there is no way to embed an empty directory, because when compiled as a dependency that directory will not even exist. But with all:dir you will at least be able to write a dir/.empty file and get that.

@rsc rsc moved this from Active to Likely Accept in Proposals Apr 7, 2021
@rsc
Copy link
Contributor

@rsc rsc commented Apr 7, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

@rsc rsc commented Apr 14, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: cmd/go: add //go:embed all:<pattern> to allow . and _ matches cmd/go: add //go:embed all:<pattern> to allow . and _ matches Apr 14, 2021
@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented May 19, 2021

It's a pity that Go doesn't have recursive glob support, such that you could write //go:embed templates/**/*.html or the like. ISTM, that would solve the underscore issue without introducing accidental hidden file inclusion, which this proposal encourages.

@thomasf
Copy link

@thomasf thomasf commented Aug 4, 2021

It's a pity that Go doesn't have recursive glob support, such that you could write //go:embed templates/**/*.html or the like. ISTM, that would solve the underscore issue without introducing accidental hidden file inclusion, which this proposal encourages.

For what it's worth I have have needed the :all feature and had to back off using embed.FS to not complicated things a couple of times so far. In both scenarios I needed both underscore and dot files to be included. I don't doubt that there are situations where only one of them are wanted but maybe it isn't so bad having a single all switch.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Aug 4, 2021

Just to get a sense of the problem to be solved, can you explain what the dot files were and why you needed them?

@thomasf
Copy link

@thomasf thomasf commented Aug 4, 2021

It was similar situations both times for me. Generating various types of project templates that includes files like .editorconfig, .gitignore, .env... and python modules which includes __init__.py files.

In every other situation I have encountered it has been easy to just avoid naming files in slightly weird ways like that.

@feedmeapples
Copy link

@feedmeapples feedmeapples commented Aug 13, 2021

@rsc any updates on this proposal?

The current limitation makes it super complicated to work with Svelte Kit sveltejs/kit#1370 (comment)

@leefernandes
Copy link

@leefernandes leefernandes commented Oct 1, 2021

The most tragic of events to unfold... paths with emojis don't load 😿

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Linked pull requests

Successfully merging a pull request may close this issue.

None yet