x/build: speed up large container start-up times without pre-pulling containers into VMs (CRFS) #30829

Open
bradfitz opened this Issue Mar 14, 2019 · 20 comments

9 participants
Member

bradfitz commented Mar 14, 2019

Tracking bug for improving how we maintain & deploy our larger builder environment containers easily and quickly while also having them start up quickly.

Our current situation (building a container, pushing to gcr.io, then automating the creation of COS-like VM images that have the image pre-pulled) is pretty gross and tedious.

I propose CRFS: a Container-Registry Filesystem. See design doc at https://github.com/golang/build/tree/master/crfs#crfs-container-registry-filesystem

The gist of it is that we can read bytes from gcr.io directly with a FUSE filesystem, rather than doing huge docker pulls. It's not very hard once you tweak the tarballs into a more amenable format.
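
To illustrate the "read bytes directly" idea, here is a minimal sketch of fetching only a byte range of a blob over HTTP instead of downloading it whole. The helper names `readRange` and `newBlobServer` are hypothetical, and a local `httptest` server stands in for the registry; a real registry would also need authentication and may redirect to backing storage, which this sketch ignores.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// readRange fetches bytes [off, off+n) of the resource at url using an
// HTTP Range request, so the full blob is never downloaded.
func readRange(url string, off, n int64) ([]byte, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+n-1))
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("server did not honor Range: %s", res.Status)
	}
	return io.ReadAll(res.Body)
}

// newBlobServer starts a local stand-in for a registry blob endpoint
// (an assumption for this sketch, not the real gcr.io API).
func newBlobServer(blob []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ServeContent implements Range handling for any io.ReadSeeker.
		http.ServeContent(w, r, "layer", time.Time{}, bytes.NewReader(blob))
	}))
}

func main() {
	srv := newBlobServer([]byte("0123456789abcdefghij"))
	defer srv.Close()

	b, err := readRange(srv.URL, 10, 5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", b) // bytes 10..14 of the blob
}
```

A FUSE read at a given file offset could then map to one such range request, provided the layer format lets you find where each file's bytes live.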

@bradfitz bradfitz self-assigned this Mar 14, 2019

@gopherbot gopherbot added this to the Unreleased milestone Mar 14, 2019

@gopherbot gopherbot added the Builders label Mar 14, 2019


gopherbot commented Mar 14, 2019

Change https://golang.org/cl/167392 mentions this issue: crfs: start of a README / design doc of sorts

gopherbot pushed a commit to golang/build that referenced this issue Mar 14, 2019

crfs: start of a README / design doc of sorts
Updates golang/go#30829

Change-Id: I8790dfcd30e3fb4d68b6e4cb9f8baf44c45d2cd6
Reviewed-on: https://go-review.googlesource.com/c/build/+/167392
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

ktock commented Mar 14, 2019

Interesting idea.

As you may know, there are related efforts in the container world that aim to make images lightweight and boot containers faster using lazy pulling and de-duplication.

Do you also aim to minimize image size by making each chunk much smaller? For example:

GZIP(TAR(file1_small_chunk1)) + GZIP(TAR(file1_small_chunk2)) + GZIP(TAR(file1_small_chunk3)) + GZIP(TAR(file2_small_chunk1)) + ... + GZIP(TAR(index of earlier files in magic file))

If you make the chunks smaller, you can achieve inter-image de-duplication at the chunk level, as casync and desync do (not only partial pulling).
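
A rough sketch of chunk-level de-duplication, assuming fixed-size chunks keyed by SHA-256. This is a simplification: casync uses content-defined chunk boundaries rather than fixed sizes, and the `chunkStore` type here is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkStore keeps one copy of each distinct chunk, keyed by its SHA-256.
type chunkStore map[[sha256.Size]byte][]byte

// add splits data into fixed-size chunks, stores those not seen before,
// and returns how many chunks were new.
func (s chunkStore) add(data []byte, chunkSize int) (newChunks int) {
	for len(data) > 0 {
		n := chunkSize
		if n > len(data) {
			n = len(data)
		}
		sum := sha256.Sum256(data[:n])
		if _, ok := s[sum]; !ok {
			s[sum] = append([]byte(nil), data[:n]...) // store a private copy
			newChunks++
		}
		data = data[n:]
	}
	return newChunks
}

func main() {
	store := chunkStore{}
	imageA := []byte("AAAABBBBCCCCDDDD")
	imageB := []byte("AAAABBBBXXXXDDDD") // differs from imageA in one chunk

	fmt.Println(store.add(imageA, 4)) // 4: every chunk is new
	fmt.Println(store.add(imageB, 4)) // 1: only the XXXX chunk is new
}
```

The second image costs one extra chunk of storage instead of four, which is the inter-image saving being described.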

Recently, I've been implementing a rough PoC that tackles a similar issue (booting containers faster and minimizing image size).
Additionally, I aim to achieve it without any modification to the runtime or registry, using an init-like program inside the container and a FUSE-in-container technique.

Thanks.


dprotaso commented Mar 14, 2019

Heyo, don't know if you've seen this: containerd/containerd#2968

Once that settles it should enable creating a crfs 'snapshotter' that skips pulling images and would just perform a FUSE mount.

Member Author

bradfitz commented Mar 14, 2019

@dprotaso, I hadn't seen that. Excellent. Thanks for the link!

Member Author

bradfitz commented Mar 14, 2019

@ktock, while I'm a big fan of content-addressable storage & deduplication (my https://perkeep.org/ project is all about it), it's not my goal with this project to address that. I just want fast boot times here. Storage as far as I'm concerned is free.


dprotaso commented Mar 14, 2019

Also, you might not need to reinvent the wheel with stargz. See BGZF:

https://github.com/samtools/htslib/blob/develop/bgzf.c
https://github.com/biogo/hts/tree/master/bgzf

Another interesting thing from: http://samtools.github.io/hts-specs/SAMv1.pdf

> It is worth noting that there is a known bug in the Java GZIPInputStream class that concatenated gzip archives cannot be successfully decompressed by this class. BGZF files can be created and manipulated using the built-in Java util.zip package, but naive use of GZIPInputStream on a BGZF file will not work due to this bug.


glyn commented Mar 14, 2019

I just wanted to check that, if this feature goes ahead, it won't be bundled into the standard library as that seems inappropriate to me.

Member Author

bradfitz commented Mar 14, 2019

@glyn, no, that won't happen. That would be entirely bizarre. The Go team writes a lot of code but very little of it goes into the standard library. I even added the FAQ entry that says we don't want most code in the standard library: https://golang.org/doc/faq#x_in_std


lukasheinrich commented Mar 14, 2019

Hi -- just commenting here to link this to an issue within containerd which seems to tackle a problem similar to the one described here: containerd/containerd#2943 (comment)


stevvooe commented Mar 14, 2019

@bradfitz This is a very cool hack.

It might be worth just turning off layer compression (easier said than done, but works with standard docker once you push that way), then just using transport compression when fetching the individual file chunks. That might complicate backend storage a bit, which might have to use a different compression technique, but the images would be runnable by an unmodified docker daemon.

It's at least worth a look. ;)

Member Author

bradfitz commented Mar 15, 2019

@stevvooe, you'd still need an index somewhere. If you already need to push modified or additional layers to hold the index, might as well also compress it all?

@bradfitz bradfitz changed the title x/build: speed up large container start-up times without pre-pulling containers into VMs x/build: speed up large container start-up times without pre-pulling containers into VMs (CRFS) Mar 15, 2019


gopherbot commented Mar 15, 2019

Change https://golang.org/cl/167769 mentions this issue: crfs/stargz: add start of package

gopherbot pushed a commit to golang/build that referenced this issue Mar 15, 2019

crfs/stargz: add start of package
Basic API, format, tests.

Good enough checkpoint.

Updates golang/go#30829

Change-Id: Iaec5b205314d64fca5056f6b19a7bae52e5cef94
Reviewed-on: https://go-review.googlesource.com/c/build/+/167769
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

gopherbot commented Mar 16, 2019

Change https://golang.org/cl/167920 mentions this issue: crfs/stargz: add basic file reading, chunking big files, more tests, docs

Member Author

bradfitz commented Mar 16, 2019

@stevvooe, my index comment was slightly unrelated in retrospect. You're probably more concerned about runtime CPU usage for decoding gzip for reads, eh? Turning off layer compression should indeed solve that, but would increase the $$$ cost for image storage. And I'm unsure both a) whether gcr.io supports transport compression (probably), and b) whether it's even worth it inside a very fast network.

Member

dmitshur commented Mar 16, 2019

@bradfitz I've read the original issue and the linked design doc in full, which helped me understand this better, but I still have an unanswered question about this part:

> The gist of it is that we can read bytes from gcr.io directly with a FUSE filesystem, rather than doing huge docker pulls.

I understand one of the benefits is the ability to stream the container image, so parts of it can start being accessed sooner, instead of waiting for the entire container image to be downloaded before the first byte can be read.

But is there also an advantage that a typical workload would read fewer bytes than the entire container image contains? I.e., only a small subset is typically needed, so the savings are also that fewer bytes need to be downloaded in total?

gopherbot pushed a commit to golang/build that referenced this issue Mar 21, 2019

crfs/stargz: add file reading, chunking big files, more tests, docs
Updates golang/go#30829

Change-Id: I1ce8c1cbfa580c372341af63ed161e421103fad4
Reviewed-on: https://go-review.googlesource.com/c/build/+/167920
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

gopherbot commented Mar 22, 2019

Change https://golang.org/cl/168737 mentions this issue: crfs/stargz/stargzify: add tool to convert a tar.gz to stargz


gopherbot commented Mar 22, 2019

Change https://golang.org/cl/168799 mentions this issue: crfs, stargz: basics of read-only FUSE filesystem, directory support

gopherbot pushed a commit to golang/build that referenced this issue Mar 22, 2019

crfs/stargz/stargzify: add tool to convert a tar.gz to stargz
And in testing converting the Debian base layer I found a hard link,
so add enough hardlink support (mostly in TODO form) for the tool to
run for now. Proper hardlink support later.

Size stats:

-rw-r--r-- 1 bradfitz bradfitz 51354364 Mar  3 03:32 debian.tar.gz
-rw-r--r-- 1 bradfitz bradfitz 55061714 Mar 21 20:37 debian.stargz

About 7.2% bigger. (Acceptable)

Updates golang/go#30829

Change-Id: I4d76850be68d32ea6e8c2bd81c4233df1b5fc7af
Reviewed-on: https://go-review.googlesource.com/c/build/+/168737
Reviewed-by: Jon Johnson <jonjohnson@google.com>

gopherbot pushed a commit to golang/build that referenced this issue Mar 22, 2019

crfs, stargz: basics of read-only FUSE filesystem, directory support
No network support yet. But this implements the basic FUSE support
reading from a local stargz file.

Updates golang/go#30829

Change-Id: I342e957b3b36cded5aec8b1cdca65c3f5e788db3
Reviewed-on: https://go-review.googlesource.com/c/build/+/168799
Reviewed-by: Maisem Ali <maisem@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

dw commented Mar 22, 2019

Hi Brad,

I came via HN :) Cool project, just a few thoughts:

It's possible to do 'solid' compression while retaining the same level of compatibility as done here; the benefit is not resetting the compressor for small files. Looks like regular chunk size also makes it possible to drop at least one TOCEntry field

Regarding TOCEntry, some kind of sorted array that does not require full decoding rather than a recursive structure would make the format far more appealing for reuse, and also reduces the runtime requirements for any parser

One place to look for design inspiration might be squashfs, it's solving a similar problem although its constraints are a little looser. For example squashfs does not store a single large index, subdirectories have their own separate representation

Member Author

bradfitz commented Mar 22, 2019

@dw, thanks. I was meaning to explore grouping small files together into one gzip stream but first I want to get all the pieces working before I optimize too much. For now a 7% bloat is acceptable.
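
To make the trade-off concrete, here is a small sketch comparing the total size when every small file gets its own gzip stream against the size when all files share one stream. The file contents are made up for illustration, and the helper names are hypothetical; actual overhead depends on the layer's contents.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// gzipLen returns the compressed size of the given pieces written into
// one gzip stream.
func gzipLen(pieces ...[]byte) int {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, p := range pieces {
		zw.Write(p) // errors ignored: the underlying bytes.Buffer does not fail
	}
	zw.Close()
	return buf.Len()
}

// compare returns the total size when each file gets its own gzip stream
// and the size when all files share a single stream.
func compare(files [][]byte) (perFile, solid int) {
	for _, f := range files {
		perFile += gzipLen(f)
	}
	return perFile, gzipLen(files...)
}

func main() {
	var files [][]byte
	for i := 0; i < 100; i++ {
		files = append(files, []byte(fmt.Sprintf("# config %d\nenabled=true\nretries=3\n", i)))
	}
	perFile, solid := compare(files)
	fmt.Printf("per-file streams: %d bytes, one shared stream: %d bytes\n", perFile, solid)
}
```

Per-file streams pay a header and trailer for each file and cannot reuse history across files, but they keep every file independently seekable, which is the property the format relies on.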

> Looks like regular chunk size also makes it possible to drop at least one TOCEntry field

Yeah, there's a lot of redundant info in there (including the name, which stores its full path), but I liked the flexibility to perhaps do file-specific chunk sizes in the future based on known access patterns for different types of files.

> Regarding TOCEntry, some kind of sorted array that does not require full decoding rather than a recursive structure would make the format far more appealing for reuse, and also reduces the runtime requirements for any parser

Yeah, the JSON is slightly inefficient, but I figured it's okay to just slurp the whole thing in at start-up (for all layers) and keep it all in memory. It's not big (at least for the layers I've seen or work with), so I didn't want to prematurely optimize. But people with millions of files in their layers might not find it as acceptable.
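
A minimal sketch of the "slurp it all in at start-up" approach: decode a JSON index once and keep it in a map keyed by full path. The `tocEntry` fields and the JSON shape are simplified assumptions for illustration, not the actual stargz schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tocEntry is a simplified, hypothetical index record; the field names
// are illustrative only.
type tocEntry struct {
	Name   string `json:"name"`   // full path within the layer
	Size   int64  `json:"size"`   // uncompressed size in bytes
	Offset int64  `json:"offset"` // where the file's gzip stream starts
}

// loadTOC decodes the whole index at once and keys it by full path.
func loadTOC(data []byte) (map[string]tocEntry, error) {
	var doc struct {
		Entries []tocEntry `json:"entries"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	byName := make(map[string]tocEntry, len(doc.Entries))
	for _, e := range doc.Entries {
		byName[e.Name] = e
	}
	return byName, nil
}

func main() {
	raw := []byte(`{"entries":[
		{"name":"etc/hostname","size":7,"offset":1024},
		{"name":"usr/bin/env","size":35000,"offset":2048}]}`)

	toc, err := loadTOC(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(toc), toc["usr/bin/env"].Offset) // 2 2048
}
```

Memory use grows linearly with the number of entries, which is the concern raised for layers with millions of files.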

Member Author

bradfitz commented Mar 22, 2019

CRFS is now at https://github.com/google/crfs
