proposal: cmd/go: support embedding static assets (files) in binaries #35950

bradfitz · 2019-12-03T23:39:15Z

There are many tools to embed static asset files into binaries:

https://godoc.org/perkeep.org/pkg/fileembed / perkeep.org/pkg/fileembed/genfileembed
https://godoc.org/github.com/gobuffalo/packr
https://godoc.org/github.com/knadh/stuffbin
https://github.com/rakyll/statik
Bazel go_embed_data

Actually, https://tech.townsourced.com/post/embedding-static-files-in-go/ lists more:

vfsgen - https://github.com/shurcooL/vfsgen
go.rice - https://github.com/GeertJohan/go.rice
statik - https://github.com/rakyll/statik
esc - https://github.com/mjibson/esc
go-embed - https://github.com/pyros2097/go-embed
go-resources - https://github.com/omeid/go-resources
statics - https://github.com/go-playground/statics
templify - https://github.com/wlbr/templify
gnoso/go-bindata - https://github.com/gnoso/go-bindata
shuLhan/go-bindata - https://github.com/shuLhan/go-bindata
fileb0x - https://github.com/UnnoTed/fileb0x
gobundle - https://github.com/alecthomas/gobundle
parcello - https://github.com/phogolabs/parcello

Proposal

I think it's time to do this well once & reduce duplication, adding official support for embedding file resources into the cmd/go tool.

Problems with the current situation:

There are too many tools
Using a go:generate-based solution bloats the git history with a second (and slightly larger) copy of each file.
Not using go:generate means not being go install-able or making people write their own Makefiles, etc.

Goals:

don't check in generated files
don't generate *.go files at all (at least not in user's workspace)
make go install / go build do the embedding automatically
let user choose per file/glob which type of access is needed (e.g. []byte, func() io.Reader, io.ReaderAt, etc)
Maybe store assets compressed in the binary where appropriate (e.g. if user only needs an io.Reader)? (edit: but probably not; see comments below)
No code execution at compilation time; that is a long-standing Go policy. go build or go install cannot run arbitrary code, just like go:generate doesn't run automatically at install time.

The two main implementation approaches are //go:embed Logo logo.jpg or a well-known package (var Logo = embed.File("logo.jpg")).

go:embed approach

For a go:embed approach, one might say that any go/build-selected *.go file can contain something like:

//go:embed Logo logo.jpg

Which, say, compiles to:

func Logo() *io.SectionReader

(adding a dependency to the io package)

Or:

//go:embedglob Assets assets/*.css assets/*.js

compiling to, say:

var Assets interface{
     Files() []string
     Open(name string) *io.SectionReader
} = runtime.EmbedAsset(123)

Obviously this isn't fully fleshed out. There'd need to be something for compressed files too that yield only an io.Reader.
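As a rough sketch of what the generated accessor for //go:embed Logo logo.jpg might look like, assuming the toolchain supplied the bytes: the Logo name and return type come from the proposal text, while logoData and its contents are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// logoData stands in for the bytes of logo.jpg. In the proposal the
// toolchain would supply them; here they are a hypothetical placeholder.
const logoData = "\xff\xd8\xff\xe0 fake jpeg bytes"

// Logo mirrors the signature suggested above: each call returns a fresh,
// read-only view over the embedded bytes.
func Logo() *io.SectionReader {
	r := strings.NewReader(logoData)
	return io.NewSectionReader(r, 0, int64(len(logoData)))
}

func main() {
	sr := Logo()
	head := make([]byte, 4)
	if _, err := sr.ReadAt(head, 0); err != nil {
		panic(err)
	}
	fmt.Printf("size=%d first bytes=% x\n", sr.Size(), head)
}
```

Because every caller gets its own SectionReader, reads and seeks by one caller do not disturb another.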

embed package approach

The other high-level approach is to not have a magic //go:embed syntax and instead just let users write Go code in some new "embed" or "golang.org/x/foo/embed" package:

var Static = embed.Dir("static")
var Logo = embed.File("images/logo.jpg")
var Words = embed.CompressedReader("dict/words")

Then have cmd/go recognize the calls to embed.Foo("foo/*.js") etc. and do the glob work in cmd/go, rather than at runtime. Or maybe certain build tags or flags could make it fall back to doing things at runtime instead. Perkeep (linked above) has such a mode, which is nice to speed up incremental development where you don't care about linking one big binary.

Concerns

Pick a style (//go:embed* vs a magic package).
Block certain files?
- Probably block embedding ../../../../../../../../../../etc/shadow
- Maybe block reaching into .git too
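A minimal sketch of the kind of path check these concerns imply. The function name allowedEmbedPath and its exact rules are hypothetical; it only inspects the path string and does not resolve symlinks.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// allowedEmbedPath reports whether a slash-separated path stays inside the
// package directory and avoids version-control metadata. The rule set is a
// hypothetical illustration of the concerns above.
func allowedEmbedPath(p string) bool {
	if p == "" || path.IsAbs(p) {
		return false
	}
	// Clean collapses "a/../b" so that any remaining ".." element
	// really does escape the starting directory.
	clean := path.Clean(p)
	for _, elem := range strings.Split(clean, "/") {
		if elem == ".." || elem == ".git" {
			return false
		}
	}
	return true
}

func main() {
	for _, p := range []string{
		"static/logo.jpg",
		"../../../etc/shadow",
		".git/config",
	} {
		fmt.Printf("%-22s allowed=%v\n", p, allowedEmbedPath(p))
	}
}
```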


ianlancetaylor · 2019-12-04T00:01:22Z

It's worth considering whether embedglob should support a complete file tree, perhaps using the ** syntax supported by some Unix shells.

ghost · 2019-12-04T01:39:24Z

Some people would need the ability to serve the embedded assets with HTTP using the http.FileServer.

I personally use either mjibson/esc (which does that) or in some cases my own file embedding implementation which renames files to create unique paths and adds a map from the original paths to the new ones, e.g. "/js/bootstrap.min.js": "/js/bootstrap.min.827ccb0eea8a706c4c34a16891f84e7b.js". Then you can use this map in the templates like this: href="{{ static_path "/css/bootstrap.min.css" }}".
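The path-mapping scheme described above could be wired into a template roughly as follows. This is only a sketch: the hash in the map is made up, and hashedPaths, staticPath, and render are illustrative names.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// hashedPaths maps original asset paths to content-hashed ones.
// The hash shown here is a made-up placeholder.
var hashedPaths = map[string]string{
	"/css/bootstrap.min.css": "/css/bootstrap.min.0123abcd.css",
}

// staticPath returns the hashed path for p, or p itself if unknown.
func staticPath(p string) string {
	if h, ok := hashedPaths[p]; ok {
		return h
	}
	return p
}

// page registers staticPath under the template name "static_path".
var page = template.Must(template.New("page").
	Funcs(template.FuncMap{"static_path": staticPath}).
	Parse(`<link rel="stylesheet" href="{{ static_path "/css/bootstrap.min.css" }}">`))

// render executes the template into a string.
func render() (string, error) {
	var sb strings.Builder
	err := page.Execute(&sb, nil)
	return sb.String(), err
}

func main() {
	out, err := render()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```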

cespare · 2019-12-04T02:12:26Z

I think a consequence of this would be that it would be nontrivial to figure out what files are necessary to build a program.

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

(Just musing out loud here.)

bradfitz · 2019-12-04T02:40:30Z

@opennota,

would need the ability to serve the embedded assets with HTTP using the http.FileServer.

Yes, the first link above is a package I wrote (in 2011, before Go 1) and still use, and it supports using http.FileServer: https://godoc.org/perkeep.org/pkg/fileembed#Files.Open

bradfitz · 2019-12-04T02:41:55Z

@cespare,

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

Yes, good point. That's a very strong argument for using a package. It also makes it more readable & documentable, since we can document it all with regular godoc, rather than deep in cmd/go's docs.

agnivade · 2019-12-04T03:25:45Z

@bradfitz - Do you want to close this #3035 ?

bradfitz · 2019-12-04T03:50:28Z

@agnivade, thanks for finding that! I thought I remembered that but couldn't find it. Let's leave it open for now and see what others think.

balasanjay · 2019-12-04T04:47:02Z

If we go with the magic package, we could use the unexported type trick to ensure that callers pass compile-time constants as arguments: https://play.golang.org/p/RtHlKjhXcda.
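For readers who cannot follow the playground link, here is a minimal sketch of the unexported type trick. File stands in for a hypothetical embed.File, and constPath is an illustrative name.

```go
package main

import "fmt"

// constPath is deliberately unexported. If it lived in a separate embed
// package, callers could not name it, so they could not convert a string
// variable to it.
type constPath string

// File stands in for a hypothetical embed.File. Only untyped string
// constants convert implicitly to constPath at the call site.
func File(name constPath) string {
	return "embedded:" + string(name)
}

func main() {
	fmt.Println(File("images/logo.jpg")) // untyped constant: compiles

	// name := "images/logo.jpg"
	// File(name) // would not compile: name has type string, not constPath
}
```

Within a single package a caller can still write constPath(name), so the guarantee only holds across a package boundary.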

(This is the strategy referenced here: https://groups.google.com/forum/#!topic/golang-nuts/RDA9Hag8RZw/discussion)

AlexRouSg · 2019-12-04T09:15:07Z

One concern I have is how it would handle individual assets, or all assets together, being too big to fit into memory, and whether there would be a build tag or per-file access option to choose between prioritizing access time vs. memory footprint, or some middle-ground implementation.

urandom · 2019-12-04T12:20:44Z

The way I've solved that problem (because of course I also have my own implementation :) ) is to provide an http.FileSystem implementation that serves all embedded assets. That way, you don't need to rely on magic comments in order to appease the typechecker, the assets can easily be served by HTTP, a fallback implementation can be provided for development purposes (http.Dir) without changing the code, and the final implementation is quite versatile, as http.FileSystem covers quite a bit, not only reading files but listing directories as well.

One can still use magic comments or whatever to specify what needs to be embedded, though it's probably easier to specify all the globs via a plain text file.
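A sketch of the development fallback described above. It assumes Go 1.16 or later, because http.FS and testing/fstest postdate this comment; the in-memory map is a stand-in for assets baked into the binary, and Assets and readAsset are illustrative names.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"testing/fstest"
)

// embedded stands in for assets baked into the binary. An in-memory
// map is used here purely for illustration.
var embedded = fstest.MapFS{
	"index.html": {Data: []byte("<h1>hello</h1>")},
}

// Assets returns the file system to serve from: a directory on disk during
// development, or the embedded copy otherwise. Callers see the same
// http.FileSystem interface either way.
func Assets(devDir string) http.FileSystem {
	if devDir != "" {
		return http.Dir(devDir)
	}
	return http.FS(embedded)
}

// readAsset opens name and returns its full contents.
func readAsset(fsys http.FileSystem, name string) (string, error) {
	f, err := fsys.Open(name)
	if err != nil {
		return "", err
	}
	defer f.Close()
	b, err := io.ReadAll(f)
	return string(b), err
}

func main() {
	body, err := readAsset(Assets(""), "/index.html")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Passing the result of Assets to http.FileServer would serve either source over HTTP without any change to the handler code.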

ianlancetaylor · 2019-12-04T14:44:44Z

@AlexRouSg This proposal would only be for files which are appropriate to include directly in the final executable. It would not be appropriate to use this for files that are too big to fit in memory. There's no reason to complicate this tool to handle that case; for that case, just don't use this tool.

bradfitz · 2019-12-04T15:45:46Z

@ianlancetaylor, I think the distinction @AlexRouSg was making was between having the files provided as global []bytes (unpageable, potentially writable memory) vs providing a read-only, on-demand view of an ELF section that can normally live on disk (in the executable), like via an Open call that returns an *io.SectionReader. (I don't want to bake in http.File or http.FileSystem into cmd/go or runtime... net/http can provide an adapter.)

urandom · 2019-12-04T15:56:00Z

@bradfitz http.File itself is an interface with no technical dependencies on the http package. It might be a good idea for any Open method to provide an implementation that conforms to that interface, because both the Stat and Readdir methods are quite useful for such assets.

bradfitz · 2019-12-04T16:08:09Z

@urandom, it couldn't implement http.FileSystem, though, without referring to the "http.File" name (https://play.golang.org/p/-r3KjG1Gp-8).

rsc · 2019-12-04T16:19:07Z

@robpike and I talked through a proposal for doing this years ago (before there was a proposal process) and never got back to doing anything. It's been bugging me for years that we never finished doing that. The idea as I remember it was to just have a special directory name like "static" containing the static data and automatically make them available through an API, with no annotations needed.

I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes. If users want to store compressed data in that file, great, the details are up to them and there's no API needed on Go's side at all.

jayconrod · 2019-12-04T16:34:21Z

A couple thoughts:

It should not be possible to embed any file outside the module doing the embedding. We need to make sure files are part of module zip files when we create them, so that also means no symbolic links, case conflicts, etc. We can't change the algorithm that produces zip files without breaking sums.
I think it's simpler to restrict embedding to be in the same directory (if //go:embed comments are used) or a specific subdirectory (if static is used). This makes it a lot easier to understand the relationship between packages and embedded files.

Either way, this blocks embedding /etc/shadow or .git. Neither can be included in a module zip.

In general, I'm worried about expanding the scope of the go command too much. However, the fact that there are so many solutions to this problem means there probably ought to be one official solution.

I'm familiar with go_embed_data and go-bindata (of which there are several forks), and this seems to cover those use cases. Are there any important problems the others solve that this doesn't cover?

DeedleFake · 2019-12-04T16:36:04Z

Blocking certain files shouldn't be too hard, especially if you use a static or embed directory. Symlinks might complicate that a bit, but you can just prevent it from embedding anything outside of the current module or, if you're on GOPATH, outside of the package containing the directory.

I'm not particularly a fan of a comment that compiles to code, but I also find the pseudo-package that affects compilation to be a bit strange as well. If the directory approach isn't used, it might make a bit more sense to have some kind of embed top-level declaration actually built into the language. It would work similarly to import, but would only support local paths and would require a name for it to be assigned to. For example,

embed ui "./ui/build"

func main() {
  file, err := ui.Open("version.txt")
  if err != nil {
    panic(err)
  }
  version, err := ioutil.ReadAll(file)
  if err != nil {
    panic(err)
  }
  file.Close()

  log.Printf("UI Version: %s\n", bytes.TrimSpace(version))
  http.ListenAndServe(":8080", http.EmbeddedDir(ui))
}

Edit: You beat me to it, @jayconrod.

josharian · 2019-12-04T17:28:02Z

To expand on #35950 (comment), there is a puzzle about the exposed API. The obvious ways to expose the data are []byte, string, and Read-ish interfaces.

The typical case is that you want the embedded data to be immutable. However, all interfaces exposing []byte (which includes io.Reader, io.SectionReader, etc.) must either (1) make a copy, (2) allow mutability, or (3) be immutable despite being a []byte. Exposing the data as strings solves that, but at the cost of an API that will often end up requiring copying anyway, since lots of code that consumes embedded files eventually requires byte slices one way or another.

I'd suggest route (3): be immutable despite being a []byte. You can enforce this cheaply by using a readonly symbol for the backing array. This also lets you safely expose the same data as a []byte and a string; attempts to mutate the data will fail. The compiler can't take advantage of the immutability, but that's not too great of a loss. This is something that toolchain support can bring to the table that (as far as I know) none of the existing codegen packages do.

(A third party codegen package could do this by generating a generic assembly file containing DATA symbols that are marked as readonly, and then short arch-specific assembly files exposing those symbols in the form of strings and []bytes. I wrote CL 163747 specifically with this use case in mind, but never got around to integrating it into any codegen packages.)

DeedleFake · 2019-12-04T17:35:13Z

I'm unsure what you're talking about in terms of immutability. io.Reader already enforces immutability. That's the entire point. When you call Read(buf), it copies data into the buffer that you provided. Changing buf after that has zero effect on the internals of the io.Reader.
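The copying behaviour described here can be seen in a short sketch. The asset variable and readThenScribble are illustrative names; bytes.Reader stands in for whatever reader the toolchain might hand out.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// asset stands in for embedded file contents.
var asset = []byte("immutable?")

// readThenScribble reads the asset through an io.Reader into a caller-owned
// buffer, overwrites that buffer, and returns the asset and buffer contents.
func readThenScribble() (string, string, error) {
	r := bytes.NewReader(asset)
	buf := make([]byte, len(asset))
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", "", err
	}
	for i := range buf {
		buf[i] = 'X' // mutate only the copy
	}
	return string(asset), string(buf), nil
}

func main() {
	orig, scribbled, err := readThenScribble()
	if err != nil {
		panic(err)
	}
	fmt.Println("asset: ", orig)
	fmt.Println("buffer:", scribbled)
}
```

The protection comes from the copy. Code that holds the asset slice itself, rather than a reader over it, could still write to the backing array, which is the case the previous comment is concerned with.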

bradfitz · 2019-12-04T17:36:32Z

I agree with @DeedleFake. I don't want to play games with magic []byte array backings. It's okay to copy from the binary into user-provided buffers.

gdamore · 2019-12-04T19:11:57Z

Just another wrinkle here -- I have a different project which uses DTrace source code (embedded). This is sensitive to differences between \n and \r\n. (We can argue whether this is a dumb thing in DTrace or not -- that's beside the point and it is the situation today.)

It's super useful that backticked strings treat both as \n regardless of how they appear in source, and I rely on this with a go-generate to embed the DTrace.

So if there is an embed file added to the go command, I would gently suggest that options to change the handling of CR/CRLF might come in very handy, particularly for folks who might be developing on different systems where the default line endings can be a gotcha.
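Absent such an option, one workaround would be to normalize line endings after reading the embedded bytes. A minimal sketch, with normalizeNewlines as a hypothetical helper name:

```go
package main

import (
	"bytes"
	"fmt"
)

// normalizeNewlines rewrites CRLF line endings to LF, leaving existing LF
// endings untouched. It returns a new slice and does not modify its input.
func normalizeNewlines(b []byte) []byte {
	return bytes.ReplaceAll(b, []byte("\r\n"), []byte("\n"))
}

func main() {
	script := []byte("BEGIN\r\n{\r\n}\n")
	fmt.Printf("%q\n", normalizeNewlines(script))
}
```

This costs a copy at startup, which is the trade-off against having the build step do the conversion.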

rsc · 2019-12-04T19:15:11Z

Like with compression, I'd really like to stop at "copy the file bytes into the binary". CR/CRLF normalization, Unicode normalization, gofmt'ing, all that belongs elsewhere. Check in the files containing the exact bytes you want. (If your version control can't leave them alone, maybe check in gzipped content and gunzip them at runtime.) There are many file munging knobs we could imagine adding. Let's stop at 0.
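The "check in gzipped content and gunzip at runtime" suggestion might look like the sketch below. Since no real embedded file is available here, gzipBytes simulates the checked-in .gz file; both function names are illustrative.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes simulates a checked-in .gz file by compressing plain in memory.
func gzipBytes(plain []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(plain); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzip is what a program would run at startup on the embedded bytes.
func gunzip(embedded []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(embedded))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	stored, err := gzipBytes([]byte("probe begin\r\n{ trace(1); }\r\n"))
	if err != nil {
		panic(err)
	}
	plain, err := gunzip(stored)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", plain)
}
```

The CRLF bytes survive the round trip exactly, regardless of what the version control system does to text files.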

rsc · 2019-12-04T19:16:16Z

It may be too late to introduce a new reserved directory name, as much as I'd like to.
(It wasn't too late back in 2014, but it's probably too late now.)
So some kind of opt-in comment may be necessary.

Suppose we define a type runtime.Files. Then you could imagine writing:

//go:embed *.html (or static/* etc)
var files runtime.Files

And then at runtime you just call files.Open to get back an interface { io.ReadSeeker; io.ReaderAt } with the data. Note that the var is unexported, so one package can't go around grubbing in another package's embedded files.

Names TBD but as far as the mechanism it seems like that should be enough, and I don't see how to make it simpler. (Simplifications welcome of course!)
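A hand-written approximation of the shape described above, to make the API concrete. The Files type here is a stand-in for the proposed runtime.Files, populated by hand rather than by a //go:embed comment; the File interface name and the error return are assumptions.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// File is the interface described above: seekable and randomly readable.
type File interface {
	io.ReadSeeker
	io.ReaderAt
}

// Files is a hand-written stand-in for the proposed runtime.Files.
type Files struct {
	data map[string][]byte
}

// Open returns a fresh reader over the named file's bytes.
func (f Files) Open(name string) (File, error) {
	b, ok := f.data[name]
	if !ok {
		return nil, os.ErrNotExist
	}
	return bytes.NewReader(b), nil
}

// files is unexported, so other packages cannot reach into it.
// In the proposal the toolchain would populate it from the glob.
var files = Files{data: map[string][]byte{
	"index.html": []byte("<p>hi</p>"),
}}

func main() {
	f, err := files.Open("index.html")
	if err != nil {
		panic(err)
	}
	b, err := io.ReadAll(f)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", b)
}
```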

rsc · 2019-12-04T19:16:21Z

Whatever we do, it needs to be possible to support with Bazel and Gazelle too. That would mean having Gazelle recognize the comment and write out a Bazel rule saying the globs, and then we'd need to expose a tool (go tool embedgen or whatever) to generate the extra file to include in the build (the go command would do this automatically and never actually show the extra file). That seems straightforward enough.

gdamore · 2019-12-04T19:23:57Z

If various munging won't do the trick, then that's an argument against using this new facility. It's not a stopper for me -- I can use go generate like I've been doing, but it means I cannot benefit from the new feature.

With respect to munging in general -- I can imagine a solution where someone provides an implementation of an interface (something like a Reader() on one side, and something to receive the file on the other -- maybe instantiated with an io.Reader from the file itself) -- which the go cmd would build and run to prefilter the file before embedding. Then folks can provide whatever filter they want. I imagine some folks would provide quasi-standard filters like a dos2unix implementation, compression, etc. (Maybe they should be chainable even.)

I guess there'd have to be an assumption that whatever the embedded processor is, it must be compilable on ~every build system, as go would be building a temporary native tool for this purpose.

magical · 2019-12-04T19:27:34Z

It may be too late to introduce a new reserved directory name, as much as I'd like to. [...] some kind of opt-in comment may be necessary.

If the files are only accessible through a special package, say runtime/embed, then importing that package could be the opt-in signal.

diamondburned · 2020-05-29T21:11:29Z

Has anybody looked at markbates/pkger? It's a pretty simple solution of using go.mod as the current working directory. Assuming an index.html is to be embedded, opening it would be pkger.Open("/index.html"). I think this is a better idea than hardcoding a static/ directory in the project.

It's also worth mentioning that Go doesn't have any significant structure requirements for a project as far as I could see. go.mod is just a file and not a lot of people ever use vendor/. I personally don't think that a static/ directory would be any good.

TheMightyGit · 2020-07-06T20:55:19Z

As we already have a way of injecting (albeit limited) data into a build via the existing ldflags link flag -X importpath.name=value, could that code path be adjusted to accept -X importpath.name=@filename to inject external arbitrary data?
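For context, the existing -X mechanism works on package-level string variables, as in this minimal sketch. The variable name BuildVersion and the value 1.2.3 are illustrative; the @filename form is only the suggestion made above and is not an existing linker feature.

```go
package main

import "fmt"

// BuildVersion can be overridden at link time with the existing flag:
//
//	go build -ldflags "-X main.BuildVersion=1.2.3"
//
// Without the flag it keeps the default below.
var BuildVersion = "dev"

// versionLine formats the version for display.
func versionLine() string {
	return "version " + BuildVersion
}

func main() {
	fmt.Println(versionLine())
}
```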

I realise this doesn't cover all of the stated goals of the original issue, but as an extension of the existing -X functionality does it seem a reasonable step forward?

(And if that works out then extending the go.mod syntax as a neater way of specifying ldflags -X values is a next reasonable step?)

earthboundkid · 2020-07-06T21:20:26Z

That's a very interesting idea, but I'm worried about the security implications.

It's pretty common to do -X 'pkg.BuildVersion=$(git rev-parse HEAD)', but we wouldn't want to let go.mod run arbitrary commands, would we? (I guess go generate does, but that's not something you typically run for downloaded OSS packages.) If go.mod can't handle that, it ends up missing a major use case, so ldflags would still be very common.

Then there's the other issue of making sure @filename is not a symlink to /etc/passwd or whatever.

flimzy · 2020-07-06T21:24:40Z

Using the linker precludes support for WASM, and possibly other targets that don't use a linker.

rsc · 2020-07-21T15:21:57Z

Based on the discussion here, @bradfitz and I worked out a design that sits somewhere in the middle of the two approaches considered above, taking what seems to be the best of each. I've posted a draft design doc, video, and code (links below). Instead of comments on this issue, please use the Reddit Q&A for comments on this specific draft design - Reddit threads and scales discussions better than GitHub does. Thanks!

Video: https://golang.org/s/draft-embed-video
Design: https://golang.org/s/draft-embed-design
Q&A: https://golang.org/s/draft-embed-reddit
Code: https://golang.org/s/draft-embed-code

ghost · 2020-07-22T22:11:52Z

@rsc In my opinion, the go:embed proposal is inferior to providing universal sandboxed Go code execution at compile-time which would include reading files and transforming read data into an optimal format best suitable for consumption at runtime.

diamondburned · 2020-07-22T22:26:44Z

@atomsymbol That sounds like something waaay outside the scope of this issue.

ghost · 2020-07-22T22:28:26Z

@atomsymbol That sounds like something waaay outside the scope of this issue.

I am aware of that.

kokes · 2020-07-31T08:16:48Z

I read through the proposal and scanned the code, but couldn't find an answer to this: Will this embedding scheme contain information about the file on disk (~os.Stat)? Or will these timestamps get reset to build time? Either way, these are useful pieces of information that get used in various places, e.g. we can send a 304 for unchanged assets based on this.

Thanks!

Edit: found it in the reddit thread.

The modification time for all embedded files is the zero time, for exactly the reproducibility concerns you listed. (Modules don't even record modification times, again for the same reason.)

https://old.reddit.com/r/golang/comments/hv96ny/qa_goembed_draft_design/fytj7my/

thomasf · 2020-07-31T12:44:51Z

Either way, these are useful pieces of information that get used in various places, e.g. we can send a 304 for unchanged assets based on this.

An ETag header based on the file data hash would solve that problem without having to know anything about dates. But that would have to be known by http.HandlerFS or something to be able to work and to not waste resources it would have to be done only once per file.
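A sketch of the hash-based ETag idea, computing the hash once per asset. The handler and helper names are illustrative, and the If-None-Match comparison is simplified to an exact match (a full implementation would also handle lists and weak validators).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// asset stands in for an embedded file's contents.
var asset = []byte("body { margin: 0 }")

var (
	etagOnce sync.Once
	etag     string
)

// assetETag hashes the content once and caches the quoted result.
func assetETag() string {
	etagOnce.Do(func() {
		sum := sha256.Sum256(asset)
		etag = `"` + hex.EncodeToString(sum[:8]) + `"`
	})
	return etag
}

// serveAsset answers 304 when the client already holds the current version.
func serveAsset(w http.ResponseWriter, r *http.Request) {
	tag := assetETag()
	w.Header().Set("ETag", tag)
	if r.Header.Get("If-None-Match") == tag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	w.Header().Set("Content-Type", "text/css")
	w.Write(asset)
}

// statusFor issues an in-process request and returns the status code.
func statusFor(ifNoneMatch string) int {
	req := httptest.NewRequest("GET", "/style.css", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	serveAsset(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("first request: ", statusFor(""))
	fmt.Println("revalidation:  ", statusFor(assetETag()))
}
```

Because embedded content cannot change while the process runs, caching the hash for the process lifetime is safe, and no modification time is needed.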

earthboundkid · 2020-07-31T12:48:35Z

But that would have to be known by http.HandlerFS or something to be able to work and to not waste resources it would have to be done only once per file.

How would http.HandlerFS know that the fs.FS was immutable? Should there be an IsImmutable() bool optional interface?

thomasf · 2020-07-31T12:55:21Z

How would http.HandlerFS know that the fs.FS was immutable? Should there be an IsImmutable() bool optional interface?

I don't want to get into implementation details because I'm not the designer of these things, but http.HandlerFS could check whether it's an embed.FS type and treat that as a special case; I don't think anyone wants to expand the FS API right now. There could also be an option argument to HandlerFS specifically to tell it to treat a filesystem as immutable. Also, if this is done on application startup and all ctime/mtime values are zero, HandlerFS could use that info to "know" that the file hasn't changed, but there are also file systems that might not have mtime or have it disabled, so there might be problems there as well.

rsc · 2020-09-02T16:20:07Z

I wasn't watching the comments on this issue.

@atomsymbol welcome back! It's great to see you commenting here again.
I agree in principle that if we had sandboxing many things would be easier.
On the other hand many things might be harder - builds might never finish.
In any event, we definitely don't have that kind of sandboxing today. :-)

@kokes I am not sure about the details,
but we'll make sure serving an embed.Files over HTTP gets ETags right by default.

rsc · 2020-09-02T16:20:47Z

I have filed #41191 for accepting the design draft posted back in July.
I am going to close this issue as superseded by that one.
Thanks for the great preliminary discussion here.

gopherbot added this to the Proposal milestone Dec 3, 2019

gopherbot added the Proposal label Dec 3, 2019

rsc mentioned this issue Dec 4, 2019

cmd/go: handle program or library assets #3035

Closed

ianlancetaylor mentioned this issue May 22, 2020

proposal: Go 2: Permit running code at compile-time using "comptime" keyword #39216

Closed

mreiferson mentioned this issue Jun 21, 2020

nsqadmin: move away from go-bindata nsqio/nsq#994

Merged

cirelli94 mentioned this issue Jun 22, 2020

Embed scripts and sql inside agent ercole-io/ercole-agent#157

Closed

nicks mentioned this issue Jul 29, 2020

feature request: load web assets from offline tilt-dev/tilt#2608

Closed

joe-getcouragenow mentioned this issue Aug 1, 2020

How to run the GUI ? ctessum/cityaq#5

Closed

tlimoncelli mentioned this issue Aug 18, 2020

New feature: require_glob() (similar to require() but supports globs) StackExchange/dnscontrol#804

Merged

knadh mentioned this issue Aug 30, 2020

Switch stuffbin for go.rice to embed static assets knadh/niltalk#24

Closed

rsc closed this as completed Sep 2, 2020

rsc mentioned this issue Sep 2, 2020

embed, cmd/go: add support for embedded files #41191

Closed

vrongmeal mentioned this issue Nov 9, 2020

Add ability to package static content into the binary. sdslabs/pinger#69

Closed

tv42 mentioned this issue Jan 28, 2021

proposal: io/fs, net/http: define interface for automatic ETag serving #43223

Closed

logrusorgru mentioned this issue Jan 28, 2021

Option to suppress generation date in the output logrusorgru/textFileToGoConst#2

Closed

halyph mentioned this issue Mar 14, 2021

go embed halyph/mind-flow#151

Open

golang locked and limited conversation to collaborators Sep 2, 2021

gopherbot added the FrozenDueToAge label Sep 2, 2021
