Explore caching strategies for CI builds #847

Open
bocon13 opened this issue Nov 3, 2021 · 3 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed), Infra (Things related to CI/CD, build, and tests)

Comments

Member

bocon13 commented Nov 3, 2021

For CircleCI, we currently download a 4GB cache (everything from the Bazel run), which takes about 2 minutes. Then every job also uploads a new copy of the Bazel cache, which takes about 6 minutes. So every job, even an incremental rebuild, takes at least 8 minutes.

# Cache Bazel output root (see bazelrc) to speed up job execution time.
# The idea is to use the last cache available for the same branch, or the one
# from main if this is the first build for this branch.
# TODO: consider using Bazel remote cache (e.g. local HTTP proxy cache backed by S3)
restore_bazel_cache: &restore_bazel_cache
  restore_cache:
    keys:
      - v4-bazel-cache-{{ .Environment.CIRCLE_JOB }}-{{ .Branch }}-{{ .Revision }}
      - v4-bazel-cache-{{ .Environment.CIRCLE_JOB }}-{{ .Branch }}
      - v4-bazel-cache-{{ .Environment.CIRCLE_JOB }}-main
save_bazel_cache: &save_bazel_cache
  save_cache:
    # Always saving the cache, even in case of failures, helps with completing
    # jobs where the bazel process was killed because it took too long or OOM.
    # Restart the job if you see the bazel server being terminated abruptly.
    when: always
    key: v4-bazel-cache-{{ .Environment.CIRCLE_JOB }}-{{ .Branch }}-{{ .Revision }}
    paths:
      - /tmp/bazel-cache
      - /tmp/bazel-disk-cache
clean_bazel_cache: &clean_bazel_cache
  run:
    name: Clean Bazel disk cache of files that have not been modified in 30 days
    # mtime is the only time preserved after untaring the cache.
    command: /usr/bin/find /tmp/bazel-disk-cache -mtime +30 -exec rm -v {} \;
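
For context, here is a hypothetical sketch of how a job might splice these anchors into its steps via YAML aliases. The job name and the build step are illustrative assumptions, not copied from our config, and the executor is omitted. Because the save key includes `{{ .Revision }}`, every new commit produces a key that does not exist yet. So `save_cache` uploads the full cache again on every job, which is where the ~6 minutes goes.

```yaml
# Hypothetical job: name and build step are illustrative; executor omitted.
jobs:
  build:
    steps:
      - checkout
      - *restore_bazel_cache   # exact revision, then same branch, then main
      - run:
          name: Build all targets
          command: xargs -a .circleci/build-targets.txt bazel build
      - *clean_bazel_cache     # drop disk-cache files older than 30 days
      - *save_bazel_cache      # new revision => new key => full cache upload
```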

We can likely do better...

One proposal is to use the CircleCI cache for remote repositories and a new remote Bazel cache for build artifacts. With two fully populated caches, here are some preliminary results:

brian@7c90d4cb0e3c:/stratum$ bazel clean --expunge
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
brian@7c90d4cb0e3c:/stratum$ xargs -a .circleci/build-targets.txt bazel build --repository_cache=/tmp/cache --remote_cache=http://${BAZEL_CACHE_USER}:${BAZEL_CACHE_PASSWORD}@bazel-cache.stratumproject.org:8080 --jobs=2 --remote_download_minimal
...
INFO: Elapsed time: 164.956s, Critical Path: 4.61s
INFO: 6206 processes: 4939 remote cache hit, 1267 internal.
INFO: Build completed successfully, 6206 total actions

Build stats with --repository_cache and --remote_cache:
- Under 3 minutes, 950MB downloaded, 24MB uploaded
- /tmp/cache is 1.2GB and only changes when a dependency is updated

We can also simplify the caching strategy, because all dependencies are specified in one file and the cache can be keyed on its checksum:
- bazel-repo-cache-v1-{{ checksum "bazel/deps.bzl" }}

Further optimization is likely possible but risks cache bloat. Paying the one-time cost of building a new cache on the (infrequent) occasions when dependencies are updated seems like a fair tradeoff. One goal is to avoid uploading a Bazel cache at the end of each job, which is a costly operation.
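
A rough sketch of what the proposed steps could look like in CircleCI config, reusing the flags from the run above. The step names are made up. The sketch assumes the remote cache credentials are provided as the `BAZEL_CACHE_USER`/`BAZEL_CACHE_PASSWORD` project environment variables. CircleCI never overwrites an existing cache key, so the `save_cache` step is skipped until `bazel/deps.bzl` changes, which is what avoids the per-job upload.

```yaml
# Sketch only: the CircleCI cache holds just Bazel's repository cache
# (downloaded external deps); build outputs come from the remote HTTP cache.
- restore_cache:
    keys:
      - bazel-repo-cache-v1-{{ checksum "bazel/deps.bzl" }}
- run:
    name: Build with repository cache and remote cache
    command: |
      # xargs appends the targets after the flags below.
      xargs -a .circleci/build-targets.txt bazel build \
        --repository_cache=/tmp/cache \
        --remote_cache=http://${BAZEL_CACHE_USER}:${BAZEL_CACHE_PASSWORD}@bazel-cache.stratumproject.org:8080 \
        --jobs=2 \
        --remote_download_minimal
# Caches are immutable: if this key already exists, the save is skipped,
# so the ~1.2GB upload only happens after bazel/deps.bzl changes.
- save_cache:
    key: bazel-repo-cache-v1-{{ checksum "bazel/deps.bzl" }}
    paths:
      - /tmp/cache
```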

bocon13 added the enhancement, help wanted, good first issue, and Infra labels on Nov 3, 2021
Member

pudelkoM commented Nov 3, 2021

One additional point for remote caching: today, CI runs from forks start without a cache and take about 2 hours. Even if we don't share the remote Bazel cache credentials with forks, we can allow anonymous reads, which would speed up these builds considerably.
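
One way this could look (an untested sketch): pick the cache flags at runtime based on whether the credentials are present. This assumes the credentials are CircleCI project environment variables, which are not passed to builds from forked PRs by default. It also assumes the cache server is configured to accept unauthenticated reads. `--remote_upload_local_results=false` keeps Bazel from trying to write results to the cache.

```yaml
- run:
    name: Build (read-only remote cache when credentials are unavailable)
    command: |
      CACHE_HOST=bazel-cache.stratumproject.org:8080
      if [ -n "${BAZEL_CACHE_USER:-}" ] && [ -n "${BAZEL_CACHE_PASSWORD:-}" ]; then
        # Trusted builds: authenticated read/write access.
        CACHE_FLAGS="--remote_cache=http://${BAZEL_CACHE_USER}:${BAZEL_CACHE_PASSWORD}@${CACHE_HOST}"
      else
        # Forks: anonymous reads only; never upload locally built results.
        CACHE_FLAGS="--remote_cache=http://${CACHE_HOST} --remote_upload_local_results=false"
      fi
      # CACHE_FLAGS is intentionally unquoted so it splits into separate flags.
      xargs -a .circleci/build-targets.txt bazel build \
        --repository_cache=/tmp/cache ${CACHE_FLAGS} \
        --jobs=2 --remote_download_minimal
```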

Member Author

bocon13 commented Nov 3, 2021

There are also some dependencies here: https://github.com/stratum/stratum/blob/020e33231439518ba8406e60827e8f46e66e39a0/bazel/rules/build_tools.bzl

We should figure out how to work those into our cache key name as well.
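
CircleCI's `checksum` template takes a single file, but a key can combine several templates. So one option is to fold `build_tools.bzl` into the same key. This is a sketch; the key prefix is just the one proposed above. Changing either file produces a new key, and therefore a one-time rebuild of the repository cache.

```yaml
- restore_cache:
    keys:
      - bazel-repo-cache-v1-{{ checksum "bazel/deps.bzl" }}-{{ checksum "bazel/rules/build_tools.bzl" }}
# ... build step as before ...
- save_cache:
    key: bazel-repo-cache-v1-{{ checksum "bazel/deps.bzl" }}-{{ checksum "bazel/rules/build_tools.bzl" }}
    paths:
      - /tmp/cache
```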

Member Author

bocon13 commented Nov 3, 2021

> One additional point for remote caching: Today, forked CI runs start without a cache and take ~2 hours. Even if we don't share the remote bazel cache creds with forks, we can allow anonymous reads, speeding up these builds considerably.

It turns out that running the build on a dual-core machine in Docker also takes over an hour, so we might consider giving read-only access to the cache to developers with low-powered machines, too. :)
