Cache git repo, draft references in Circle #3009
I was concerned that this wouldn't do much good.
The {{ .Revision }} key means that that cache entry is only going to be good for rebuilding that revision, something that we don't do, except when we are tagging a release. The tag for {{ .Branch }} is probably good, but not for master, which will be baking in as soon as this lands (if it hasn't already).
The {{ epoch }} tag is only good if we run builds within a second of each other.
Here's an idea that I just had: we can run arbitrary code to generate a file. If we were to write something to a file and then use {{ checksum ".reference-cache-key" }}, we could ensure that the value rolls over daily so that we have a hot reference cache. We could do the same for the git cache so that it is at most a certain number of days old. You can even cascade caches by reading from {{ checksum ".cache-today" }} and {{ checksum ".cache-yesterday" }}.
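To make the daily-rollover idea concrete, here is a minimal sketch of the "arbitrary code" step, assuming it runs before the cache is restored. The helper name write_cache_key_files and the choice of an ISO date as the file contents are assumptions for illustration; only the file names come from the comment above.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Dict


def write_cache_key_files(directory: Path, today: date) -> Dict[str, str]:
    """Write one date stamp per key file and return what was written.

    Hypothetical helper: the file names follow the comment above, but
    using an ISO date as the content is an assumption of this sketch.
    """
    stamps = {
        ".cache-today": today.isoformat(),
        ".cache-yesterday": (today - timedelta(days=1)).isoformat(),
    }
    for name, stamp in stamps.items():
        # The checksum of each file changes only when the date changes,
        # so a key built from that checksum rolls over once a day.
        (directory / name).write_text(stamp + "\n")
    return stamps


if __name__ == "__main__":
    print(write_cache_key_files(Path("."), date.today()))
```

Because today's .cache-yesterday has the same contents as yesterday's .cache-today, the two files hash to the same value across the day boundary, which is what would let a build fall back to the previous day's cache.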
The important pieces there are the multi-tiered keys in restore:
The revision will never match (except for tag builds, as you note), so it picks up the most recent git repo cached for that branch, or the most recent overall if it's a new branch. The epoch will almost never match, so it picks up the most recent reference cache generated without regard for branches. We could drop the first line of each restore_cache keys directive and get the same behavior. Look at the run logs on this branch -- it is picking up the cache from previous runs, because the first restore key strikes out (as expected) and it rolls over to the more general key and finds the previous run.
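The fallback described above can be modelled in a few lines. This is a sketch of the lookup rule only, not Circle's implementation: keys are tried in order, each one is treated as a prefix, and the most recently created match wins. The key strings and timestamps below are hypothetical and are not the ones used in this PR.

```python
from typing import Dict, List, Optional


def restore_cache(stored: Dict[str, int], keys: List[str]) -> Optional[str]:
    """Return the stored cache key that would be restored, or None.

    `stored` maps a cache key to its creation time in epoch seconds.
    """
    for prefix in keys:
        matches = [key for key in stored if key.startswith(prefix)]
        if matches:
            # Several caches can share a prefix; take the newest one.
            return max(matches, key=lambda key: stored[key])
    return None


# Hypothetical caches left behind by three earlier builds.
stored = {
    "git-master-aaa111": 100,
    "git-master-bbb222": 200,
    "git-circle_caching-ccc333": 300,
}

# A new revision on an existing branch: the revision-specific key misses,
# so the branch prefix supplies the latest cache for that branch.
existing_branch = restore_cache(
    stored, ["git-circle_caching-ddd444", "git-circle_caching-", "git-"]
)

# A brand-new branch: only the most general prefix matches.
new_branch = restore_cache(
    stored, ["git-new_branch-eee555", "git-new_branch-", "git-"]
)
```

In this model, dropping the first, most specific key from each list gives the same result whenever that key would have missed anyway, which is the point made above.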
And too, the proof is in the pudding -- the first build on this branch took 1:48; git caching brought it to 0:56, and reference caching brought it to 0:37.
There are a few seconds of variation around each of those -- the last build on the branch went back up to 0:43. But still, I stand by my claim that we improve by nearly a minute per build. (That one with 2:55 runtime spent 2:04 of that downloading issues.)
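For reference, the arithmetic behind those figures, using only the run times quoted above (the variable names are just labels for this sketch):

```python
def to_seconds(mmss: str) -> int:
    """Convert an m:ss run time such as '1:48' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


baseline = to_seconds("1:48")      # first build, no caching
git_cached = to_seconds("0:56")    # with the git cache
both_cached = to_seconds("0:37")   # with git and reference caches
latest = to_seconds("0:43")        # last build on the branch

saved_by_git_cache = baseline - git_cached    # 52 seconds
saved_by_both = baseline - both_cached        # 71 seconds
saved_on_latest = baseline - latest           # 65 seconds

# The 2:55 outlier, with the 2:04 spent downloading issues removed.
outlier_without_download = to_seconds("2:55") - to_seconds("2:04")  # 51 seconds
```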
Newly-created branches are most likely to have been based on master; if no cache exists for a repo, take master as the starting point
That's great for now, while there are relatively few new commits to add and the references are fresh. However, we are creating caches for things that won't ever be used again ( More seriously, the
That would be true if we created a cache once and never wrote an updated one. But that's not what we're doing -- we're creating a fresher one each time we run. We'll be picking up the cache from the last time master built, not the first. Because each key is immutable once written, the keys have to be unique (epoch and revision). But searching by prefix doesn't mean it's going to use the oldest possible match. From https://circleci.com/docs/2.0/caching/#restoring-cache:
Again, look at the actual behavior for builds on this branch:
Each build creates a new cache instance, and each run uses:
Sure, odds are that each stored cache only gets used once and then sits there collecting dust until Circle expires it 30 days later, which means we're not amortizing the time to create the cache over multiple future runs. But the combined time to store a new cache instance and retrieve the latest cache instance is massively outweighed by the time saved by having that cache.
How did I completely miss the prefix-matching thing? This is good.
This appears to shave almost a minute off of build times. See https://circleci.com/gh/quicwg/base-drafts/tree/circle_caching for the progression.