use zstd to compress cache entries #784

Merged Jun 12, 2020 (1 commit)

froydnj (Contributor) commented Jun 5, 2020

zstd is faster and gives slightly smaller (~5%) compressed blobs than
deflate does, as measured on a Firefox build. Rather than inventing our
own compressed archive format, we piggyback on top of zip's "stored"
files to stuff zstd-compressed blobs in the zip archive.
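
A minimal sketch of the piggybacking idea described above, written in Python so that it runs on the standard library alone. Each payload is compressed before it reaches the archive, and the zip layer uses the "stored" method so it adds no compression of its own. Two assumptions to keep in mind: `zlib` stands in for zstd purely so the sketch has no third-party dependency, and `pack_entry` / `unpack_entry` are hypothetical names, not functions from this patch.

```python
import io
import zipfile
import zlib
from typing import Dict


def compress_blob(raw: bytes) -> bytes:
    # ASSUMPTION: zlib is a stand-in for a zstd encoder so the sketch stays stdlib-only.
    return zlib.compress(raw, 6)


def decompress_blob(blob: bytes) -> bytes:
    # Counterpart of compress_blob; a real implementation would call a zstd decoder.
    return zlib.decompress(blob)


def pack_entry(outputs: Dict[str, bytes]) -> bytes:
    """Build one cache entry: a zip archive whose members are pre-compressed blobs."""
    buf = io.BytesIO()
    # ZIP_STORED means the zip container copies the bytes through unchanged.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, raw in outputs.items():
            archive.writestr(name, compress_blob(raw))
    return buf.getvalue()


def unpack_entry(data: bytes) -> Dict[str, bytes]:
    """Read a cache entry back, undoing the per-blob compression."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: decompress_blob(archive.read(name)) for name in archive.namelist()}


outputs = {"obj": b"\x00" * 4096, "stderr": b"warning: unused\n"}
packed = pack_entry(outputs)
print(len(packed), "bytes in the archive")
```

Any zip tool can still list the members of such an archive, but it sees opaque stored bytes; only a reader that knows about the inner compression can recover the original payloads.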

froydnj requested a review from glandium June 5, 2020 17:13

luser (Contributor) commented Jun 5, 2020

I feel like this has been discussed before, but you might look at using .tar.zstd if you're going to make this change, since that would allow compression across entries where zip compresses one entry at a time.

froydnj (Contributor, Author) commented Jun 5, 2020

> I feel like this has been discussed before, but you might look at using .tar.zstd if you're going to make this change, since that would allow compression across entries where zip compresses one entry at a time.

I thought about doing this, but this change was much simpler; a quick run over my sccache cache corpus (a bunch of Firefox builds, ~8GB of zip files) says that .tar.zstd gains about half a percent more than just stuffing zstd blobs into zip files. I'm not sure that's worth it.
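
One way to compare the two layouts, sketched in Python under stated assumptions: `zlib` again stands in for zstd, the helper names are hypothetical, and the sample corpus is artificial (a shared, hard-to-compress prefix repeated in every entry), so it exaggerates the benefit of compressing across entries. On the real corpus mentioned above the reported gain was about half a percent.

```python
import hashlib
import io
import tarfile
import zlib


def per_entry_size(entries):
    """Total size when every entry is compressed on its own (the zip-style layout)."""
    return sum(len(zlib.compress(data)) for data in entries.values())


def solid_size(entries):
    """Size when all entries go into one tar stream that is compressed as a whole."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as archive:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return len(zlib.compress(raw.getvalue()))


def sample_entries(count=8):
    # ASSUMPTION: artificial data. Every entry shares a 2 KiB pseudo-random prefix
    # and differs only in a short tail, which favours cross-entry compression.
    shared = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
    return {f"entry{n}.o": shared + f"unit {n}".encode() for n in range(count)}


entries = sample_entries()
print("per-entry:", per_entry_size(entries), "solid:", solid_size(entries))
```

Running the same two functions over a real cache directory, with a zstd encoder swapped in, would be the measurement that matters; the sketch only shows the shape of the comparison.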

jwatt (Contributor) commented Jun 5, 2020

> I feel like this has been discussed before

In both #160 and #552.

glandium (Collaborator) commented Jun 5, 2020

I don't think it would be worth it for compression size, but it could be worth it for latency: with the zip format, you have to have downloaded the entire thing before you can start decompressing. With tar, you can stream the decompression. With things ordered in the right way, you wouldn't even need to keep everything in memory.
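
The streaming property can be sketched in Python with the standard `tarfile` module, whose `"r|gz"` mode reads strictly front to back from any object that has a `read()` method. Assumptions: gzip stands in for zstd, and `ForwardOnlyStream`, `build_archive`, and `stream_entries` are hypothetical names used only for this illustration.

```python
import io
import random
import tarfile


class ForwardOnlyStream:
    """Stand-in for a network response body: it offers read() but no seek()."""

    def __init__(self, data):
        self._source = io.BytesIO(data)
        self.bytes_read = 0  # how much of the "download" has been consumed so far

    def read(self, size=-1):
        chunk = self._source.read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_archive(entries):
    """Pack entries into an in-memory .tar.gz (gzip is a stand-in for zstd)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def stream_entries(stream):
    """Yield (name, bytes) pairs in archive order without ever seeking."""
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        for member in archive:
            yield member.name, archive.extractfile(member).read()


# Two members of incompressible data, so the archive is far larger than one read buffer.
rng = random.Random(0)
entries = {
    name: rng.getrandbits(8 * 100_000).to_bytes(100_000, "little")
    for name in ("a.o", "b.o")
}
packed = build_archive(entries)

stream = ForwardOnlyStream(packed)
members = stream_entries(stream)
name, data = next(members)
first = stream.bytes_read
print(name, "decoded after", first, "of", len(packed), "bytes")
```

The first member is fully decoded while a large part of the archive is still unread, which is the latency argument made above. Python's `zipfile`, by contrast, needs a seekable file because it locates members through the directory at the end of the archive.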

glandium (Collaborator) commented Jun 5, 2020

(OTOH, the overhead of decompressing is so low...)

froydnj (Contributor, Author) commented Jun 5, 2020

> I don't think it would be worth it for compression size, but it could be worth it for latency: with the zip format, you have to have downloaded the entire thing before you can start decompressing. With tar, you can stream the decompression. With things ordered in the right way, you wouldn't even need to keep everything in memory.

I can totally get behind this. At the same time, making cache entries tar files would be somewhat more complicated than this change, and taking advantage of streaming decompression would then require significant rewriting of the storage layer (though I see from poking around that our GCS/S3/Azure backends have basically copy-pasted code for converting network requests into in-memory Vec<u8>s, so maybe it wouldn't be that bad?).

Experiments have also suggested that hiding the network latency here is not worth it...yet. I do like the idea of .tar.zstd; I just don't think its time has come. (And per the suggestion from @mostynb in #759, maybe we should be adopting Bazel's cache format -- though I'm not totally sure what that buys us at the current point beyond, I think, somewhat more code than even .tar.zstd.)

mostynb (Contributor) commented Jun 5, 2020

The potential benefits of using bazel's cache format (ActionResult protobuf) are orthogonal to whatever compression is applied to the serialized blobs:

  • Adding an REAPI cache backend would be easy, and give us access to several existing implementations with some nice qualities.
  • Supporting compile jobs that output additional files would be easy: they're simply stored by file name in the output_files field, which can't be confused with the stdout and stderr fields.
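
The second point can be illustrated with a simplified, in-memory mirror of the message. This is a sketch, not generated protobuf code: the dataclasses below keep only a few fields of the real REAPI `ActionResult`, `OutputFile`, and `Digest` messages, and `find_output` is a hypothetical helper.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Digest:
    hash: str
    size_bytes: int


@dataclass
class OutputFile:
    path: str
    digest: Digest


@dataclass
class ActionResult:
    # ASSUMPTION: simplified stand-in; the real protobuf message has more fields.
    output_files: List[OutputFile] = field(default_factory=list)
    exit_code: int = 0
    stdout_raw: bytes = b""
    stderr_raw: bytes = b""


def digest_of(data: bytes) -> Digest:
    return Digest(hash=hashlib.sha256(data).hexdigest(), size_bytes=len(data))


def find_output(result: ActionResult, path: str) -> Optional[OutputFile]:
    """Look up an output file by name; captured stdout/stderr are never searched."""
    return next((f for f in result.output_files if f.path == path), None)


# A job that writes a file literally named "stdout" next to its captured stdout.
file_payload = b"contents of a file that happens to be called stdout"
result = ActionResult(
    output_files=[OutputFile("stdout", digest_of(file_payload))],
    stdout_raw=b"warning: unused variable\n",
)
print(find_output(result, "stdout").digest.size_bytes, len(result.stdout_raw))
```

Because output files live in their own list, a file named "stdout" and the captured standard output occupy different fields and cannot overwrite each other.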

glandium merged commit be9da57 into mozilla:master on Jun 12, 2020