Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

embed, cmd/go: add support for embedded files #41191

Open
rsc opened this issue Sep 2, 2020 · 84 comments
Open

embed, cmd/go: add support for embedded files #41191

rsc opened this issue Sep 2, 2020 · 84 comments

Comments

@rsc
Copy link
Contributor

@rsc rsc commented Sep 2, 2020

In July, @bradfitz and I posted a draft design for embedded files. The doc links to a video, prototype code, and a Reddit discussion.

The feedback on that design has been overwhelmingly positive.

I propose to adopt the embedded files draft design for Go 1.16, with one addition, suggested in the discussion, to simplify the case of direct access to the bytes in a single embedded file.

As long as a file imports "embed" (import _ "embed" if necessary), it will be permitted to use //go:embed naming a single file (no glob patterns or directory matching allowed) to initialize a plain string or []byte variable:

//go:embed gopher.png
var gopherPNG []byte

The import is required to flag the file as containing //go:embed lines and needing processing. Goimports (and gopls etc) can be taught this rule and automatically add the import in any file with a //go:embed as needed.

The embedded files design depends on the file system interface draft design, which I've also proposed to adopt in #41190.

This issue is only about adopting the embedded files design, under the assumption that the file system interface design is also adopted. If this proposal is accepted before the file system interface design is, we'd simply wait for the file system interface design before starting to land changes.

@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Sep 2, 2020

Would it be an error to have a //go:embed directive without importing embed?

@rsc rsc changed the title embed, cmd/go: add support for embedded files proposal: embed, cmd/go: add support for embedded files Sep 2, 2020
@gopherbot gopherbot added the Proposal label Sep 2, 2020
@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 2, 2020

@rsc rsc added this to Active in Proposals Sep 2, 2020
@pierrec
Copy link

@pierrec pierrec commented Sep 2, 2020

@rsc Maybe I have missed it in the draft, but I don't see the ability to embed a single file that you mention in your comment.
Also, would you be able to embed a single file as a const string as well?
Thanks for this great proposal.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 2, 2020

@pierrec It's not in the draft doc (the "one addition" is the text in the comment above). Const strings can end up playing a role in deciding whether a program type checks, which would mean all type checkers would need to understand //go:embed'ed consts. In contrast, if we stick to vars, type checkers are none the wiser and can be left alone. Seems like we should probably stick to vars.

Is there a particular reason you wanted a const instead of a var? Using them should be about the same as far as efficiency. (References to const strings end up compiling to what amount to references to hidden vars anyway.)

@pierrec
Copy link

@pierrec pierrec commented Sep 2, 2020

Thanks for the explanation. I tend to embed static assets as const strings at the moment this is why I asked. I am fine with vars too!

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Sep 2, 2020

Interesting, so I could do something like:

//go:embed version.txt
var Version string

And potentially even have a //go:generate comment to generate version.txt. That would cut out a large use case for makefiles/ldflags.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Sep 2, 2020

Is it an error if the file isn't found? If so, where technically is the error considered to occur? Link time?

@docmerlin
Copy link

@docmerlin docmerlin commented Sep 2, 2020

Can we make sure that go:embed runs after go:generate so we can do things like easily generate versions, etc?

@tooolbox
Copy link

@tooolbox tooolbox commented Sep 2, 2020

Can we make sure that go:embed runs after go:generate so we can do things like easily generate versions, etc?

From my understanding,go:generate will occur with go generate while go:embed would happen with go build.

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 2, 2020

@carlmjohnson Yes, it is always an error to say //go:embed foo where foo does not exist.
The error happens when compiling the source file containing that line.
(If you were to remove foo after compiling that source file, you still wouldn't get to a link step - the go command would notice that the package needs rebuilding because foo was deleted.)

@tv42
Copy link

@tv42 tv42 commented Sep 2, 2020

I think that this proposal is not complete without saying something about ETag.
https://old.reddit.com/r/golang/comments/hv96ny/qa_goembed_draft_design/fzi0pok/

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 2, 2020

@tv42, yes, we will make ETag work. I'm not sure what the shape of that is but we will.
(Also affirmed on #35950 (comment).)

@andig
Copy link
Contributor

@andig andig commented Sep 3, 2020

TwoThree things I've noticed from working with mjibson/esc:

  • As go:embed doesn't need to generate go files for embedding as read-only filesystem, it would take away the pain of changing timestamps on the go:generateed files that defied git porcelain tests on CI- very nice
  • One thing I didn't find in the proposal but would need is the ability to live-reload the embedded files during development cycles. Using mjibson/esc I can currently do that instructing it to use the local filesystem (though it wouldn't pick up new files) and change the behaviour using build tags. I'm wondering what that could fit into the proposal?
  • Update Another thing I remember is that esc required to be able to transparently strip (parts of) the base path in order to e.g. export an assets folder as web root.

Afterthought: I guess the second point could be remedied in conjunction with the io/fs proposal where I would either use the embedded or the live filesystem for inclusion? Implement the path stripping as io/fs middleware?

@Merovius
Copy link

@Merovius Merovius commented Sep 3, 2020

@andig You can already strip prefixes when serving a filesystem over HTTP. I agree that the live-reloading can be done by a third party library wrapping an io/fs.

@andig
Copy link
Contributor

@andig andig commented Sep 3, 2020

One more thing: if I understand correctly, embed will consider files locally to the package and forbids ... My current design has /assets and /server/ where the latter contains the server‘s code and today hosts the generated files. With this proposal the embed would need to move to the root folder as assets wouldn‘t be accessible from server. This imposes different accessibility constraints from normal imports. I was wondering if this is necessary for security reasons or if module-local embeds should be generally allowed.

@thomasf
Copy link

@thomasf thomasf commented Sep 3, 2020

One more thing: if I understand correctly, embed will consider files locally to the package and forbids ... My current design has /assets and /server/ where the latter contains the server‘s code and today hosts the generated files. With this proposal the embed would need to move to the root folder as assets wouldn‘t be accessible from server. This imposes different accessibility constraints from normal imports. I was wondering if this is necessary for security reasons or if module-local embeds should be generally allowed.

You can create a emed.go file in your assets directory and make the assets available as it's own package to the rest of your program.

@tristanfisher
Copy link

@tristanfisher tristanfisher commented Sep 3, 2020

Another explicit goal is to avoid a language change. To us, embedding static assets seems like a tooling issue, not a language issue.

Agreed. In my opinion, adding syntactical sugar in the language to support this tooling change is a language change. I'm sure this is obvious to others, but this is effectively comment-as-code.

I strongly feel that magic/sugar detracts from the simplicity and readability of the language; it is very easy to miss a magical comment that embeds a file. While a response to this could easily be "okay, then don't use it", this change means that a reviewer still has to be vigilant for others using this feature and has to remember that comments around variable declarations can break builds or fail at compile-time.

I believe this is going to add confusion, detract from language usability, and will result in opaque, large binaries without clear benefit (regarding the lattermost concern, this will even lead to an anti-pattern of re-building binaries due to plain file changes). If go mod allowed for a --withNonGoCodeAssets, I believe this would solve the needs of most developers that don't want to write more complex build pipelines (I assume end-user distribution is a smaller subset of the problem for users).

@rsc
Copy link
Contributor Author

@rsc rsc commented Sep 3, 2020

@tristanfisher, I understand your point about language vs tooling change. It's certainly near the line. The reason that I consider it more a tooling change is that the language spec is unaffected - whether a program is valid does not change, the type checking process does not change. All that changes is the initial value of that variable following the comment. In this way it is a bit like the linker's -X flag, which can set the initial value of a top-level var of type string. It's fine for us to disagree; I just wanted to make my definition clear and explain the distinction I'm making.

As for bloat, I guess we'll have to see, but I don't anticipate programs getting much larger than they already are. People already run tools that turn arbitrary files into Go code, check them into their repos, and make the compiler build them. The design removes some overhead from this process but does not enable anything new. Maybe people will abuse it now that it's easier to do, but on balance I don't expect that to be much of a problem. (And if some dependency embeds something so big that it bloats your binaries, you can always choose not to use that dependency.)

As for rebuilds due to plain file changes, the only files that can trigger rebuilds are the ones in your own top-level module, since dependencies are immutable. If you found rebuilds happening more often than you'd like, the only explanation is (1) you are embedding files and (2) you are modifying those files. You would be in complete control of doing something about either cause. (It would be another thing entirely if a dependency's choice of what to use was somehow forcing extra rebuilds or other expense on you. But that's not the case here.)

@tristanfisher
Copy link

@tristanfisher tristanfisher commented Sep 3, 2020

@rsc I agree that it's okay for us to disagree and I appreciate your response. My feeling is that if it's included by default in the standard tooling and comments can lead to an implicit initialization of a variable, then it's a language change. Outside of that debate, I guess my icky feeling is around more directives as "magic" comments that need to be memorized by (human) code readers. This could be taken to the absurd conclusion of adding new features via block comments that get handled at build time.

That said, if this gets added to the ecosystem, I will be thankful that importing embed is required -- that's better than nothing as a "hey, head's up" when auditing code. I think go mod allowing for non .go would solve a majority of use cases (I imagine most people are going to glob files for webservers) and would also live entirely in tooling.

I think your point regarding the linker is a good one. It also helps explain my feelings on this: if the very end user (e.g. not someone that simply imports a package) is making the decision, there's no way to be surprised by blobs of non-code coming along for the ride. My concerns are born out of reviewing/pairing-on others' work and "tech-leady" responsibilities, which is why I felt the need to respond.

I think "we'll have to see" sums it up well (I'm more cynical about bloat/misuse).

@aykevl
Copy link

@aykevl aykevl commented Sep 3, 2020

I will read through the draft design tonight, so far it looks good from the TinyGo perspective.

I just wanted to clarify one thing:

On the other hand, projects like TinyGo and U-root target systems with more RAM than disk or flash. For those projects, compressing assets and using incremental decompression at runtime could provide significant savings.

I don't know about U-root, but for TinyGo the main targets are microcontrollers which normally have far more flash than RAM (usually a factor of 8 or 16). A quick look at the draft design seems to suggest the idea is to keep the files in read-only memory, which would work fine for these targets: the embedded files can be read directly from flash. It would most likely not be desirable for TinyGo targets to decompress files at runtime.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 3, 2020

The io/fs proposal on which this depends looks to be blocked on Readdir/FileInfo problems, under discussion in #41188 and previously #40352.

I've drafted an API to replace them in #41188 (comment)

@Merovius
Copy link

@Merovius Merovius commented Oct 31, 2020

My shell doesn't include dot-files in *, so if I want to include them, I have to use * .*. This might be a (perhaps equally surprising) way to give a level of control.

I'm not sure what to think - IMO, dot-files shouldn't really be treated differently by the pattern, but OTOH the .DS_Store example seems like something that really should be addressed.

@mvdan
Copy link
Member

@mvdan mvdan commented Oct 31, 2020

Why not just do git clean -dffx && go build? If the DS_Store files are in git, then they'll be included in the build. If they're not, they won't be. This will also work nicely with gitignore.

You should be building with a clean VCS checkout, anyway. If you add random temporary files, they might make it into the final build and you can't possibly know what files people will end up with. We might want to document this.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Oct 31, 2020

@mvdan I agree in principle, but in practice, I worry that a lot of people will get burned by dirty builds if they are not warned. I don't want to see a secret leak someone didn't realize their .env file was embedded by mistake. I have seen tons of examples of an equivalent mistake with PHP hosting, even though it could have been easily prevented by telling Apache to exclude dot files.


Re: embeding http.FileServers

If you use such a huge system of plugins as webpack it's easy to just install another webpack plugin to generate the embed.go along the assets themselves.

That's true, but it's very fiddly. The point of embed.FS is to reduce the need for crazy Makefile workarounds. I think the simple solution is to have fs.WithPrefix(string) fs.FS that locks an FS into a subdirectory. I thought there was some discussion of this in the proposal, but I can't find it now. Maybe it was just mooted on Reddit or something.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Oct 31, 2020

One way to remedy that could be to add the ability to exclude specific files/folders from embedding (@rsc ?)

It could be something like

//go:embed static
//go:embed-exclude .*
var staticFiles embed.FS

The embed-exclude directive could just do a glob filter on the accepted files and remove any matches…

@SamWhited
Copy link
Member

@SamWhited SamWhited commented Oct 31, 2020

If we want to avoid adding more to this proposal it could also be a lint rule that checks embedded filesystems for potentially unexpected dotfiles and warns you so that you can fix your build to remove them.

@andig
Copy link
Contributor

@andig andig commented Oct 31, 2020

Or exclude dot files by default unless they are specifically mentioned by adding .* or similar.

That still wouldn‘t take care of exposing an assets.go file. As for the proposal already having implemented, please note that the question about the asset generation was raised during the discussion phase. It‘s probably not dangerous to have an (otherwise empty, except for the embed directive) assets.go exposed but it would be cleaner to not have it. As usual there are all sorts of workarounds that can be applied.

skx added a commit to skx/rss2email that referenced this issue Oct 31, 2020
Rather than embedding our template into our binary using
`implant` instead use the new go-embed system:

* golang/go#41191
* https://go.googlesource.com/proposal/+/master/design/draft-embed.md

This required rebuilding go from source (took five minutes!):

* https://golang.org/doc/install/source

Once done I used the following repository for reference:

* https://github.com/mattn/go-embed-example

Seems to work, but we'll hold off merging this unless/until
we have go 1.16.x released because requiring a source build
is a pain.
@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Nov 1, 2020

It‘s probably not dangerous to have an (otherwise empty, except for the embed directive) assets.go exposed but it would be cleaner to not have it.

I agree that this is very unlikely to be a problem, but I would hate to see any closed source code accidentally leaked due to misconfiguration if we can make it easy to configure things correctly.

@mvdan
Copy link
Member

@mvdan mvdan commented Nov 1, 2020

I don't want to see a secret leak someone didn't realize their .env file was embedded by mistake.

If someone uses //go:embed static/* and there's a static/.env or static/.super-secret, wouldn't you say that the user really meant to include those files? Otherwise, why would they be in the static directory?

I get that this is up to what the user expects and what * means in most contexts, but I personally think that https://golang.org/pkg/path/filepath/#Glob semantics is our only good option. It's the simplest and what most Go users will be used to in the context of developing Go.

I think warning against the dangers of embedding * is a good idea in any case, though, because in a significant amount of cases one could reduce the chance of error by using more specific globs like *.png.

Also, I still encourage you to file a separate issue that can be tracked against the 1.16 release, written from the point of view of a bug report. This proposal is accepted and implemented, so I imagine it will be closed very soon. For example, you could phrase the bug report like: embedded file support easily leads to including unintended files (and give some examples).

@inliquid
Copy link

@inliquid inliquid commented Nov 1, 2020

If someone uses //go:embed static/* and there's a static/.env or static/.super-secret, wouldn't you say that the user really meant to include those files? Otherwise, why would they be in the static directory?

There is a huge amount of corner cases, for example you opened an edit session with vim, but didn't close it, and it created .*.swp file with some contents that you would like nobody to see.

@carlmjohnson
Copy link
Contributor

@carlmjohnson carlmjohnson commented Nov 1, 2020

Moving discussion to #42321.

@steebchen
Copy link

@steebchen steebchen commented Nov 10, 2020

This would be highly appreciated by the Prisma team for our Database Go client, since we use a query engine at runtime which is written in rust and needs to be in the built go binary somehow.

The way we're currently doing it is packing the binary into a .go file at go:generate time, but the file size is much higher than when it's a binary .gz file.

Native embeds would make this much better so we could directly embed a .gz file into the final go binary.

@tv42
Copy link

@tv42 tv42 commented Nov 10, 2020

If someone uses //go:embed static/* and there's a static/.env or static/.super-secret, wouldn't you say that the user really meant to include those files?

No, I would not.

$ mkdir z
$ touch z/.secret z/intended
$ ls z/*
z/intended
$ ls z
intended
@mvdan
Copy link
Member

@mvdan mvdan commented Nov 10, 2020

See my later comment in #42328 (comment).

@brightzheng100
Copy link

@brightzheng100 brightzheng100 commented Nov 24, 2020

I love the idea to make static files / templates embeddable which definitely can boost go dev productivity a lot.

But should we innovate another tag (e.g. @ or whatever) other than to reuse this //, which is supposed to be comments?

As of now, the // has been overused I guess, think of these:

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:webhook:verbs=create;update,path=/validate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=false,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,versions=v1,name=vcronjob.kb.io
// +optional
...
@ptman
Copy link

@ptman ptman commented Nov 24, 2020

But should we innovate another tag (e.g. @ or whatever) other than to reuse this //, which is supposed to be comments?

That's a separate discussion and while it would be nice for directives not to be comments, a lot of go1 compat code already relies on that, so it's most likely not going to change.

@Merovius
Copy link

@Merovius Merovius commented Nov 24, 2020

But should we innovate another tag (e.g. @ or whatever) other than to reuse this //, which is supposed to be comments?

I tried finding it and failed, but I remember there being a decision to standardize on //go:<world> for these kinds of directives.
It doesn't seem like a good idea to change syntax around, given the decision to expressly converge on them.
(Also, of course, they are comments specifically so that the compiler ignores them - these directives are specific to the go tool, so they shouldn't leak into the language proper)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.