proposal: cmd/go: support embedding static assets (files) in binaries #35950

Open
bradfitz opened this issue Dec 3, 2019 · 174 comments

Comments

@bradfitz
Contributor

@bradfitz bradfitz commented Dec 3, 2019

There are many tools to embed static asset files into binaries:

Actually, https://tech.townsourced.com/post/embedding-static-files-in-go/ lists more:

Proposal

I think it's time to do this well once & reduce duplication, adding official support for embedding file resources into the cmd/go tool.

Problems with the current situation:

  • There are too many tools
  • Using a go:generate-based solution bloats the git history with a second (and slightly larger) copy of each file.
  • Not using go:generate means not being go install-able or making people write their own Makefiles, etc.

Goals:

  • don't check in generated files
  • don't generate *.go files at all (at least not in user's workspace)
  • make go install / go build do the embedding automatically
  • let user choose per file/glob which type of access is needed (e.g. []byte, func() io.Reader, io.ReaderAt, etc)
  • Maybe store assets compressed in the binary where appropriate (e.g. if user only needs an io.Reader)? (edit: but probably not; see comments below)
  • No code execution at compilation time; that is a long-standing Go policy. go build or go install cannot run arbitrary code, just like go:generate doesn't run automatically at install time.

The two main implementation approaches are //go:embed Logo logo.jpg or a well-known package (var Logo = embed.File("logo.jpg")).

go:embed approach

For a go:embed approach, one might say that any go/build-selected *.go file can contain something like:

//go:embed Logo logo.jpg

Which, say, compiles to:

func Logo() *io.SectionReader

(adding a dependency to the io package)

Or:

//go:embedglob Assets assets/*.css assets/*.js

compiling to, say:

var Assets interface{
     Files() []string
     Open(name string) *io.SectionReader
} = runtime.EmbedAsset(123)

Obviously this isn't fully fleshed out. There'd need to be something for compressed files too that yield only an io.Reader.
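The `embedglob` form can be sketched the same way. This assumes the toolchain hands over a name-to-contents map (`all` in `main`, hypothetical) and filters it with `path.Match`, whose `*` does not cross `/`, so `assets/*.css` does not reach into subdirectories. `globAssets` and `newGlobAssets` are illustrative names only:

```go
package main

import (
	"fmt"
	"io"
	"path"
	"sort"
	"strings"
)

// Assets mirrors the interface above, with Open written as a method.
type Assets interface {
	Files() []string
	Open(name string) *io.SectionReader
}

// globAssets is a hypothetical implementation backed by an in-memory map.
type globAssets map[string]string

// newGlobAssets keeps only the entries of all whose names match a pattern.
func newGlobAssets(all map[string]string, patterns ...string) (globAssets, error) {
	out := globAssets{}
	for name, data := range all {
		for _, p := range patterns {
			ok, err := path.Match(p, name)
			if err != nil {
				return nil, err // malformed pattern
			}
			if ok {
				out[name] = data
				break
			}
		}
	}
	return out, nil
}

// Files returns the embedded names in sorted order.
func (g globAssets) Files() []string {
	names := make([]string, 0, len(g))
	for name := range g {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Open returns a read-only view of the named file, or nil if it was not embedded.
func (g globAssets) Open(name string) *io.SectionReader {
	data, ok := g[name]
	if !ok {
		return nil
	}
	return io.NewSectionReader(strings.NewReader(data), 0, int64(len(data)))
}

func main() {
	all := map[string]string{
		"assets/site.css":   "body{}",
		"assets/app.js":     "main()",
		"assets/readme.txt": "not embedded",
	}
	g, err := newGlobAssets(all, "assets/*.css", "assets/*.js")
	if err != nil {
		panic(err)
	}
	var a Assets = g
	fmt.Println(a.Files())
}
```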

embed package approach

The other high-level approach is to not have a magic //go:embed syntax and instead just let users write Go code in some new "embed" or "golang.org/x/foo/embed" package:

var Static = embed.Dir("static")
var Logo = embed.File("images/logo.jpg")
var Words = embed.CompressedReader("dict/words")

Then have cmd/go recognize the calls to embed.Foo("foo/*.js") etc. and do the globbing work in cmd/go, rather than at runtime. Or maybe certain build tags or flags could make it fall back to doing things at runtime instead. Perkeep (linked above) has such a mode, which is nice to speed up incremental development where you don't care about linking one big binary.

Concerns

  • Pick a style (//go:embed* vs a magic package).
  • Block certain files?
    • Probably block embedding ../../../../../../../../../../etc/shadow
    • Maybe block reaching into .git too
@gopherbot gopherbot added this to the Proposal milestone Dec 3, 2019
@gopherbot gopherbot added the Proposal label Dec 3, 2019
@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Dec 4, 2019

It's worth considering whether embedglob should support a complete file tree, perhaps using the ** syntax supported by some Unix shells.

@opennota

@opennota opennota commented Dec 4, 2019

Some people would need the ability to serve the embedded assets with HTTP using the http.FileServer.

I personally use either mjibson/esc (which does that) or in some cases my own file embedding implementation which renames files to create unique paths and adds a map from the original paths to the new ones, e.g. "/js/bootstrap.min.js": "/js/bootstrap.min.827ccb0eea8a706c4c34a16891f84e7b.js". Then you can use this map in the templates like this: href="{{ static_path "/css/bootstrap.min.css" }}".

@cespare
Contributor

@cespare cespare commented Dec 4, 2019

I think a consequence of this would be that it would be nontrivial to figure out what files are necessary to build a program.

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

(Just musing out loud here.)

@bradfitz
Contributor Author

@bradfitz bradfitz commented Dec 4, 2019

@opennota,

would need the ability to serve the embedded assets with HTTP using the http.FileServer.

Yes, the first link above is a package I wrote (in 2011, before Go 1) and still use, and it supports using http.FileServer: https://godoc.org/perkeep.org/pkg/fileembed#Files.Open

@bradfitz
Contributor Author

@bradfitz bradfitz commented Dec 4, 2019

@cespare,

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

Yes, good point. That's a very strong argument for using a package. It also makes it more readable & documentable, since we can document it all with regular godoc, rather than deep in cmd/go's docs.

@agnivade
Contributor

@agnivade agnivade commented Dec 4, 2019

@bradfitz - Do you want to close #3035 in favor of this one?

@bradfitz
Contributor Author

@bradfitz bradfitz commented Dec 4, 2019

@agnivade, thanks for finding that! I thought I remembered that but couldn't find it. Let's leave it open for now and see what others think.

@balasanjay
Contributor

@balasanjay balasanjay commented Dec 4, 2019

If we go with the magic package, we could use the unexported type trick to ensure that callers pass compile-time constants as arguments: https://play.golang.org/p/RtHlKjhXcda.

(This is the strategy referenced here: https://groups.google.com/forum/#!topic/golang-nuts/RDA9Hag8RZw/discussion)
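The linked playground isn't reproduced here, but the trick looks roughly like this. `File` and `constPath` are hypothetical names for what an `embed` package might export. The restriction only applies to callers in other packages, since they cannot name `constPath` to convert a variable; inside the package, `File(constPath(s))` would still compile:

```go
package main

import "fmt"

// constPath is unexported: code in other packages cannot name it, so the only
// arguments they can pass to File are untyped string constants, which convert
// to constPath implicitly.
type constPath string

// File is a hypothetical embed-style entry point; a real toolchain would
// look up the embedded data for name instead of echoing it back.
func File(name constPath) string {
	return string(name)
}

func main() {
	fmt.Println(File("images/logo.jpg")) // OK: untyped constant

	// From another package, this does not compile, because a string
	// variable is not assignable to the unexported constPath type:
	//   s := "images/logo.jpg"
	//   embed.File(s)
}
```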

@AlexRouSg
Contributor

@AlexRouSg AlexRouSg commented Dec 4, 2019

One concern I have is how it would handle individual or all assets being too big to fit into memory, and whether there would be a build tag or per-file access option to choose between prioritizing access time vs. memory footprint, or some middle-ground implementation.

@urandom

@urandom urandom commented Dec 4, 2019

The way I've solved that problem (because of course I also have my own implementation :) ) is to provide an http.FileSystem implementation that serves all embedded assets. That way, you don't have to rely on magic comments in order to appease the typechecker, the assets can easily be served by http, a fallback implementation can be provided for development purposes (http.Dir) without changing the code, and the final implementation is quite versatile, as http.FileSystem covers quite a bit: not only reading files, but listing directories as well.

One can still use magic comments or whatever to specify what needs to be embedded, though it's probably easier to specify all the globs via a plain text file.

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Dec 4, 2019

@AlexRouSg This proposal would only be for files which are appropriate to include directly in the final executable. It would not be appropriate to use this for files that are too big to fit in memory. There's no reason to complicate this tool to handle that case; for that case, just don't use this tool.

@bradfitz
Contributor Author

@bradfitz bradfitz commented Dec 4, 2019

@ianlancetaylor, I think the distinction @AlexRouSg was making was between having the files provided as global []bytes (unpageable, potentially writable memory) vs providing a read-only, on-demand view of an ELF section that can normally live on disk (in the executable), like via an Open call that returns an *io.SectionReader. (I don't want to bake in http.File or http.FileSystem into cmd/go or runtime... net/http can provide an adapter.)

@urandom

@urandom urandom commented Dec 4, 2019

@bradfitz http.File itself is an interface with no technical dependencies on the http package. It might be a good idea for any Open method to provide an implementation that conforms to that interface, because both the Stat and Readdir methods are quite useful for such assets.

@bradfitz
Contributor Author

@bradfitz bradfitz commented Dec 4, 2019

@urandom, it couldn't implement http.FileSystem, though, without referring to the "http.File" name (https://play.golang.org/p/-r3KjG1Gp-8).

@rsc
Contributor

@rsc rsc commented Dec 4, 2019

@robpike and I talked through a proposal for doing this years ago (before there was a proposal process) and never got back to doing anything. It's been bugging me for years that we never finished doing that. The idea as I remember it was to just have a special directory name like "static" containing the static data and automatically make them available through an API, with no annotations needed.

I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes. If users want to store compressed data in that file, great, the details are up to them and there's no API needed on Go's side at all.

@rsc rsc added this to Incoming in Proposals Dec 4, 2019
@jayconrod
Contributor

@jayconrod jayconrod commented Dec 4, 2019

A couple thoughts:

  • It should not be possible to embed any file outside the module doing the embedding. We need to make sure files are part of module zip files when we create them, so that also means no symbolic links, case conflicts, etc. We can't change the algorithm that produces zip files without breaking sums.
  • I think it's simpler to restrict embedding to be in the same directory (if //go:embed comments are used) or a specific subdirectory (if static is used). This makes it a lot easier to understand the relationship between packages and embedded files.

Either way, this blocks embedding /etc/shadow or .git. Neither can be included in a module zip.
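One way to express those rules as a check on the path string, assuming Go 1.16's `io/fs.ValidPath`. This is a sketch only: it does not resolve symlinks or detect case conflicts, which the comment above also raises, and `allowedEmbedPath` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
)

// allowedEmbedPath reports whether p is acceptable under the rules sketched
// above: relative, no ".." escapes, and nothing inside a .git directory.
// fs.ValidPath already rejects rooted paths and ".", "..", or empty elements.
func allowedEmbedPath(p string) bool {
	if !fs.ValidPath(p) {
		return false
	}
	for _, elem := range strings.Split(p, "/") {
		if elem == ".git" {
			return false
		}
	}
	return true
}

func main() {
	for _, p := range []string{"static/logo.png", "../../etc/shadow", "/etc/shadow", ".git/config"} {
		fmt.Printf("%-20s %v\n", p, allowedEmbedPath(p))
	}
}
```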

In general, I'm worried about expanding the scope of the go command too much. However, the fact that there are so many solutions to this problem means there probably ought to be one official solution.

I'm familiar with go_embed_data and go-bindata (of which there are several forks), and this seems to cover those use cases. Are there any important problems the others solve that this doesn't cover?

@DeedleFake

@DeedleFake DeedleFake commented Dec 4, 2019

Blocking certain files shouldn't be too hard, especially if you use a static or embed directory. Symlinks might complicate that a bit, but you can just prevent it from embedding anything outside of the current module or, if you're on GOPATH, outside of the package containing the directory.

I'm not particularly a fan of a comment that compiles to code, but I also find the pseudo-package that affects compilation a bit strange. If the directory approach isn't used, it might make more sense to have some kind of embed top-level declaration built into the language. It would work similarly to import, but would only support local paths and would require a name for the result to be assigned to. For example,

embed ui "./ui/build"

func main() {
  file, err := ui.Open("version.txt")
  if err != nil {
    panic(err)
  }
  version, err := ioutil.ReadAll(file)
  if err != nil {
    panic(err)
  }
  file.Close()

  log.Printf("UI Version: %s\n", bytes.TrimSpace(version))
  http.ListenAndServe(":8080", http.EmbeddedDir(ui))
}

Edit: You beat me to it, @jayconrod.

@josharian
Contributor

@josharian josharian commented Dec 4, 2019

To expand on #35950 (comment), there is a puzzle about the exposed API. The obvious ways to expose the data are []byte, string, and Read-ish interfaces.

The typical case is that you want the embedded data to be immutable. However, all interfaces exposing []byte (which includes io.Reader, io.SectionReader, etc.) must either (1) make a copy, (2) allow mutability, or (3) be immutable despite being a []byte. Exposing the data as strings solves that, but at the cost of an API that will often end up requiring copying anyway, since lots of code that consumes embedded files eventually requires byte slices one way or another.

I'd suggest route (3): be immutable despite being a []byte. You can enforce this cheaply by using a readonly symbol for the backing array. This also lets you safely expose the same data as a []byte and a string; attempts to mutate the data will fail. The compiler can't take advantage of the immutability, but that's not too great of a loss. This is something that toolchain support can bring to the table that (as far as I know) none of the existing codegen packages do.

(A third party codegen package could do this by generating a generic assembly file containing DATA symbols that are marked as readonly, and then short arch-specific assembly files exposing those symbols in the form of strings and []bytes. I wrote CL 163747 specifically with this use case in mind, but never got around to integrating it into any codegen packages.)

@DeedleFake

@DeedleFake DeedleFake commented Dec 4, 2019

I'm unsure what you're talking about in terms of immutability. io.Reader already enforces immutability. That's the entire point. When you call Read(buf), it copies data into the buffer that you provided. Changing buf after that has zero effect on the internals of the io.Reader.
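DeedleFake's point is easy to check: a `Read` copies into the caller's buffer, so mutating that buffer leaves the source untouched, at the cost of the copy josharian mentions. `embeddedText` and `readThenMutate` are hypothetical names:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// embeddedText is a hypothetical stand-in for embedded file contents.
const embeddedText = "immutable"

// readThenMutate reads the embedded text into a caller-owned buffer, mutates
// that buffer, and then re-reads the source from offset 0.
func readThenMutate() (mutated, fresh string, err error) {
	r := strings.NewReader(embeddedText)

	buf := make([]byte, len(embeddedText))
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", "", err
	}
	buf[0] = 'X' // changes only the caller's copy

	again := make([]byte, len(embeddedText))
	if _, err := r.ReadAt(again, 0); err != nil {
		return "", "", err
	}
	return string(buf), string(again), nil
}

func main() {
	mutated, fresh, err := readThenMutate()
	if err != nil {
		panic(err)
	}
	fmt.Println(mutated, fresh)
}
```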

@bradfitz
Contributor Author

@bradfitz bradfitz commented Dec 4, 2019

I agree with @DeedleFake. I don't want to play games with magic []byte array backings. It's okay to copy from the binary into user-provided buffers.

@gdamore

@gdamore gdamore commented Dec 4, 2019

Just another wrinkle here -- I have a different project which uses DTrace source code (embedded). This is sensitive to differences between \n and \r\n. (We can argue whether this is a dumb thing in DTrace or not -- that's beside the point and it is the situation today.)

It's super useful that backticked strings treat both as \n regardless of how they appear in source, and I rely on this with a go-generate to embed the DTrace.

So if there is an embed file added to the go command, I would gently suggest that options to change the handling of CR/CRLF might come in very handy, particularly for folks who might be developing on different systems where the default line endings can be a gotcha.

@rsc
Contributor

@rsc rsc commented Dec 4, 2019

Like with compression, I'd really like to stop at "copy the file bytes into the binary". CR/CRLF normalization, Unicode normalization, gofmt'ing, all that belongs elsewhere. Check in the files containing the exact bytes you want. (If your version control can't leave them alone, maybe check in gzipped content and gunzip them at runtime.) There are many file munging knobs we could imagine adding. Let's stop at 0.

@rsc
Contributor

@rsc rsc commented Dec 4, 2019

It may be too late to introduce a new reserved directory name, as much as I'd like to.
(It wasn't too late back in 2014, but it's probably too late now.)
So some kind of opt-in comment may be necessary.

Suppose we define a type runtime.Files. Then you could imagine writing:

//go:embed *.html (or static/* etc)
var files runtime.Files

And then at runtime you just call files.Open to get back an interface { io.ReadSeeker; io.ReaderAt } with the data. Note that the var is unexported, so one package can't go around grubbing in another package's embedded files.

Names TBD but as far as the mechanism it seems like that should be enough, and I don't see how to make it simpler. (Simplifications welcome of course!)

@rsc
Contributor

@rsc rsc commented Dec 4, 2019

Whatever we do, it needs to be possible to support with Bazel and Gazelle too. That would mean having Gazelle recognize the comment and write out a Bazel rule saying the globs, and then we'd need to expose a tool (go tool embedgen or whatever) to generate the extra file to include in the build (the go command would do this automatically and never actually show the extra file). That seems straightforward enough.

@gdamore

@gdamore gdamore commented Dec 4, 2019

If various munging won't do the trick, then that's an argument against using this new facility. It's not a stopper for me -- I can use go generate like I've been doing, but it means I cannot benefit from the new feature.

With respect to munging in general, I can imagine a solution where someone provides an implementation of an interface (something like a Reader() on one side, and something to receive the file on the other, maybe instantiated with an io.Reader from the file itself), which the go command would build and run to prefilter the file before embedding. Then folks can provide whatever filter they want. I imagine some folks would provide quasi-standard filters like a dos2unix implementation, compression, etc. (Maybe they should even be chainable.)

I guess there'd have to be an assumption that whatever the embedded processor is, it must be compilable on ~every build system, as go would be building a temporary native tool for this purpose.
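A sketch of the chainable prefilters described above, using plain `func([]byte) []byte` filters instead of io.Reader plumbing for brevity. `Filter`, `chain`, and `trimTrailingNewlines` are hypothetical. This runs in-process, whereas the idea above has the go command build and run such a filter at build time, which is exactly the kind of code execution the original proposal's goals exclude:

```go
package main

import (
	"bytes"
	"fmt"
)

// Filter transforms file contents before embedding (hypothetical shape).
type Filter func([]byte) []byte

// dos2unix converts CRLF line endings to LF.
func dos2unix(b []byte) []byte {
	return bytes.ReplaceAll(b, []byte("\r\n"), []byte("\n"))
}

// trimTrailingNewlines drops any trailing LF characters.
func trimTrailingNewlines(b []byte) []byte {
	return bytes.TrimRight(b, "\n")
}

// chain returns a Filter that applies filters in order.
func chain(filters ...Filter) Filter {
	return func(b []byte) []byte {
		for _, f := range filters {
			b = f(b)
		}
		return b
	}
}

func main() {
	in := []byte("line one\r\nline two\r\n\r\n")
	out := chain(dos2unix, trimTrailingNewlines)(in)
	fmt.Printf("%q\n", out)
}
```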

@magical
Contributor

@magical magical commented Dec 4, 2019

It may be too late to introduce a new reserved directory name, as much as I'd like to. [...] some kind of opt-in comment may be necessary.

If the files are only accessible through a special package, say runtime/embed, then importing that package could be the opt-in signal.

@sascha-andres

@sascha-andres sascha-andres commented Mar 20, 2020

I like the idea of @chris.ackermanm. But I would prefer a combination:

A go.res file specifying the namespace within a directory.

This allows for

  • multiple includes as long as the namespace differs
  • not having to know the files beforehand and generate a list

The latter one should tackle the output of webpack and the likes which may change the layout due to updates, different options, whatever you can think of.

Regarding compression: I think that is more a feature for keeping binary sizes from exploding, and it should be transparent to the calling code.

Later you could allow for rewrites such as

filename => stored-as.png

Just my 2¢

@ChrisAckerman

@ChrisAckerman ChrisAckerman commented Mar 20, 2020

@sascha-andres It seems like ultra simplicity and zero magic is the tone of this thread. See the edits I made to my comment re your suggestions.

I don’t like the mapping. No need. That’s possible by exposing your own read function from a separate package anyway, and now we need a new file syntax, or something more complex than file-per-line.

@jeffguorg

@jeffguorg jeffguorg commented Mar 24, 2020

Hi

This proposal is awesome!

And I have my approach to embedding assets, with no need for any tools other than GNU binutils. It is sort of dirty, but works well for me for now. I just want to share it and see if it helps.

My approach is to just embed my assets (compressed with tar & gzip) in an ELF/PE32 section with objcopy, and read them via the debug/elf and debug/pe packages, along with zip, when needed. All I need to remember is not to touch any existing section. All the assets are immutable, and the code reads the content and processes it in memory.

I'm pretty inexperienced in language or compiler design, so I would just use the approach described above, with .goassets or something like that as the section name, and make compression optional.

@koshatul

@koshatul koshatul commented Mar 25, 2020

My approach is to just embed my assets (compressed with tar & gzip) in an ELF/PE32 section with objcopy, and read them via the debug/elf and debug/pe packages, along with zip, when needed. All I need to remember is not to touch any existing section. All the assets are immutable, and the code reads the content and processes it in memory.

That sounds like it works on elf/pe32 but what about mach-o/plan9 ?

Another issue is that it relies on opening a file handle on the executable, if the executable has been overwritten/updated/deleted then this will return different data, not sure if that's a legitimate problem or an unexpected feature.

I had a bit of a go myself (using debug/macho), but I can't see a way to get this working cross-platform. I'm building on macOS, and the installed GNU binutils just seem to corrupt the mach-o-x86-64 file (that could just be my lack of understanding of the Mach-O structure, and it's been too long since I even looked at objcopy).
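For reference, the runtime half of the objcopy approach might look like the following on Linux. It assumes the section was added beforehand with something like `objcopy --add-section .goassets=assets.tar.gz prog` (`.goassets` is the name suggested above). As noted, it reopens the executable file at runtime, and Mach-O or PE binaries would need `debug/macho` or `debug/pe` instead. `readSection` is a hypothetical helper:

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// readSection returns the raw bytes of a named ELF section in the running
// executable. It assumes a Linux/ELF binary into which the section was
// inserted after linking (for example with objcopy --add-section).
func readSection(name string) ([]byte, error) {
	exe, err := os.Executable()
	if err != nil {
		return nil, err
	}
	f, err := elf.Open(exe)
	if err != nil {
		return nil, err // e.g. not an ELF file (macOS, Windows)
	}
	defer f.Close()
	sec := f.Section(name)
	if sec == nil {
		return nil, fmt.Errorf("section %s not found in %s", name, exe)
	}
	return sec.Data()
}

func main() {
	data, err := readSection(".goassets")
	if err != nil {
		fmt.Println("no embedded assets:", err)
		return
	}
	fmt.Printf("%d bytes of embedded assets\n", len(data))
}
```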

@tie
Contributor

@tie tie commented Mar 25, 2020

Another issue is that it relies on opening a file handle on the executable

I’m pretty sure that program loader will (or could) load the resources section into memory, so there is no need to use debug packages. Though accessing the data would require much more tinkering with object files than it is worth.

@bokysan

@bokysan bokysan commented Mar 25, 2020

Why not follow what works, e.g. how Java does it? It would require things to be a bit more Go-ish, but something along the lines of:

  • create a go.res file or modify go.mod to point to the directory where the resources are
  • all files from this directory are automatically included in the final executable by the compiler, no exceptions
  • the language provides a path-like API for accessing these resources

Compression, etc. should be outside the scope of this resource bundling and left to any //go:generate scripts if needed.

@diamondburned

@diamondburned diamondburned commented May 29, 2020

Has anybody looked at markbates/pkger? It's a pretty simple solution of using go.mod as the current working directory. Assuming an index.html is to be embedded, opening it would be pkger.Open("/index.html"). I think this is a better idea than hardcoding a static/ directory in the project.

It's also worth mentioning that Go doesn't have any significant structure requirements for a project as far as I could see. go.mod is just a file and not a lot of people ever use vendor/. I personally don't think that a static/ directory would be any good.

@TheMightyGit

@TheMightyGit TheMightyGit commented Jul 6, 2020

As we already have a way of injecting (albeit limited) data into a build via the existing ldflags link flag -X importpath.name=value, could that code path be adjusted to accept -X importpath.name=@filename to inject external arbitrary data?

I realise this doesn't cover all of the stated goals of the original issue, but as an extension of the existing -X functionality does it seem a reasonable step forward?

(And if that works out then extending the go.mod syntax as a neater way of specifying ldflags -X values is a next reasonable step?)
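For comparison, the existing `-X` mechanism works like this today: it sets a package-level string variable to a literal value given on the linker command line. The proposed `@filename` form is not shown, since it is only a proposal. `BuildVersion` is an illustrative name:

```go
package main

import "fmt"

// BuildVersion keeps its default unless overridden at link time, e.g.
//
//	go build -ldflags "-X main.BuildVersion=1.2.3"
//
// -X only applies to string variables.
var BuildVersion = "dev"

func main() {
	fmt.Println("version:", BuildVersion)
}
```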

@carlmjohnson
Contributor

@carlmjohnson carlmjohnson commented Jul 6, 2020

That's a very interesting idea, but I'm worried about the security implications.

It's pretty common to do -X 'pkg.BuildVersion=$(git rev-parse HEAD)', but we wouldn't want to let go.mod run arbitrary commands, would we? (I guess go generate does, but that's not something you typically run for downloaded OSS packages.) If go.mod can't handle that, it ends up missing a major use case, so ldflags would still be very common.

Then there's the other issue of making sure @filename is not a symlink to /etc/passwd or whatever.

@flimzy

@flimzy flimzy commented Jul 6, 2020

Using the linker precludes support for WASM, and possibly other targets that don't use a linker.

@rsc
Contributor

@rsc rsc commented Jul 21, 2020

Based on the discussion here, @bradfitz and I worked out a design that sits somewhere in the middle of the two approaches considered above, taking what seems to be the best of each. I've posted a draft design doc, video, and code (links below). Instead of comments on this issue, please use the Reddit Q&A for comments on this specific draft design - Reddit threads and scales discussions better than GitHub does. Thanks!

Video: https://golang.org/s/draft-embed-video
Design: https://golang.org/s/draft-embed-design
Q&A: https://golang.org/s/draft-embed-reddit
Code: https://golang.org/s/draft-embed-code

@atomsymbol

@atomsymbol atomsymbol commented Jul 22, 2020

@rsc In my opinion, the go:embed proposal is inferior to providing universal sandboxed Go code execution at compile-time which would include reading files and transforming read data into an optimal format best suitable for consumption at runtime.

@diamondburned

@diamondburned diamondburned commented Jul 22, 2020

@atomsymbol That sounds like something waaay outside the scope of this issue.

@atomsymbol

@atomsymbol atomsymbol commented Jul 22, 2020

@atomsymbol That sounds like something waaay outside the scope of this issue.

I am aware of that.

@kokes

@kokes kokes commented Jul 31, 2020

I read through the proposal and scanned the code, but couldn't find an answer to this: will this embedding scheme contain information about the file on disk (~os.Stat)? Or will these timestamps get reset to build time? Either way, these are useful pieces of information that get used in various places, e.g. we can send a 304 for unchanged assets based on this.

Thanks!

Edit: found it in the reddit thread.

The modification time for all embedded files is the zero time, for exactly the reproducibility concerns you listed. (Modules don't even record modification times, again for the same reason.)

https://old.reddit.com/r/golang/comments/hv96ny/qa_goembed_draft_design/fytj7my/

@thomasf

@thomasf thomasf commented Jul 31, 2020

Either way, these are useful pieces of information that get used in various places, e.g. we can send a 304 for unchanged assets based on this.

An ETag header based on the file data hash would solve that problem without having to know anything about dates. But that would have to be known by http.HandlerFS or something to be able to work and to not waste resources it would have to be done only once per file.

@carlmjohnson
Contributor

@carlmjohnson carlmjohnson commented Jul 31, 2020

But that would have to be known by http.HandlerFS or something to be able to work and to not waste resources it would have to be done only once per file.

How would http.HandlerFS know that the fs.FS was immutable? Should there be an IsImmutable() bool optional interface?

@thomasf

@thomasf thomasf commented Jul 31, 2020

How would http.HandlerFS know that the fs.FS was immutable? Should there be an IsImmutable() bool optional interface?

I don't want to get into implementation details because I'm not the designer of these things, but http.HandlerFS could check whether it's an embed.FS type and act on that as a special case; I don't think anyone wants to expand the FS API right now. There could also be an option argument to HandlerFS specifically telling it to treat a filesystem as immutable. Also, if this is done on application startup and all ctime/mtime values are zero, HandlerFS could use that to "know" that the file hasn't changed, but there are also file systems that might not have mtime or have it disabled, so there might be problems there as well.
