This tool is a GOCACHEPROG implementation that uses S3/Minio-compatible storage as a remote cache backend. It helps maintain stable and fast CI job run times by hooking the Go compiler cache directly up to external storage.
If your CI pipeline is configured to preserve the Go compiler cache, it most likely works in the following way:
- Initial run:
  - Run the Go compiler
  - Archive the `GOCACHE` directory
  - Upload the archive to external storage
- Following runs:
  - Download the archive from external storage
  - Unpack the archive into the `GOCACHE` directory
  - Run the Go compiler
  - Archive the `GOCACHE` directory
  - Upload the archive to external storage
This is how GitLab CI caching, GitHub Actions Cache, and probably other systems work. Docker distributed caching (`--cache` args in `RUN`) also works in a similar way.
This technique works without extra tools but has a major downside: pushing and pulling orphan objects to/from remote storage. Such objects are the result of changes in source code, compiler version, etc. They are still present in the cache but will probably never be used in the future. It also prevents possible deduplication on the external storage side: the Go standard library is included in every project, so compiling it generates a lot of objects that are identical across all projects.
To be fair, the Go compiler performs periodic cache trimming to avoid bloat, but this problem hit us hard when we tried to preserve the cache for golangci-lint. We experienced severe delays while uploading/downloading and (un)packing the golangci-lint cache. Those delays eventually became so large that clean runs (without cache) were faster than runs with cache.
However, Go 1.24 introduced GOCACHEPROG plugin support. It became possible for the Go compiler to interact directly with external storage and query individual objects in it.
So we decided to develop such a plugin to interact with S3 in our CI environment. Integrating this plugin gave us improved and stable job run times.
We are aware of the existing implementation from Tailscale, but it has features we don't need (HTTP and module/sumdb proxy) and lacks features we do need: object compression, Minio compatibility for testing purposes, metrics pushing, and pluggable external storages like in Athens.
We also tried to maintain a balance between resulting binary size and versatility by choosing dependencies wisely, e.g. https://github.com/VictoriaMetrics/metrics/ instead of the standard Prometheus metrics library, and by keeping track of dead-code-elimination usage.
To reduce storage cost and upload/download times, objects are transparently compressed/decompressed using the zstd algorithm with default settings.
To improve run times, this tool uploads objects to remote storage asynchronously and stores object metadata in S3 metadata instead of an additional metadata file.
This tool is intended for use in CI environments and is distributed as a Docker image with a single binary, because that is the most versatile way to distribute tools for CI.
We do not recommend installing from source using go get/go install inside a CI job, because the tool would then be compiled on every single job run, defeating the whole idea of caching in external storage to reduce job run time.
To see all possible settings and their descriptions, run `cacheprog direct --help` or `cacheprog proxy --help`.
You will need an S3 bucket with read-write access from CI jobs and, optionally, a metrics collector such as Prometheus Pushgateway or VictoriaMetrics configured to accept metric pushes.
Note. The S3 bucket should be as close as possible to the CI job runners because the Go compiler cache is latency-sensitive. Self-compilation with a prefilled cache on a local machine showed that adding 50ms of latency with 10ms jitter to each request increases compile time from 1s to 8s.
To use a custom external storage incompatible with AWS S3, you can run an HTTP server using ./pkg/httpstorage/server.go.
Switching between different storages is done via CACHEPROG_REMOTE_STORAGE_TYPE environment variable. Available values: s3, http, disabled. Default: disabled.
Environment variables for S3-compatible storage are:
- `CACHEPROG_S3_ENDPOINT` - Storage endpoint, not needed for AWS S3.
- `CACHEPROG_S3_BUCKET` - S3 bucket name, required.
- `CACHEPROG_S3_REGION` - S3 region name. If not provided, it is detected automatically via the GetBucketLocation API.
- `CACHEPROG_S3_FORCE_PATH_STYLE` - Enable path-style routing instead of virtual-hosted-style routing. Default: `false`.
- `CACHEPROG_S3_ACCESS_KEY_ID` - S3 access key id, not needed for AWS S3.
- `CACHEPROG_S3_ACCESS_KEY_SECRET` - S3 access key secret, not needed for AWS S3.
- `CACHEPROG_S3_SESSION_TOKEN` - S3 session token, not needed for AWS S3.
- `CACHEPROG_S3_CREDENTIALS_ENDPOINT` - Credentials endpoint for S3-compatible storages, not needed for AWS S3.
- `CACHEPROG_S3_PREFIX` - Prefix for S3 keys, useful for running multiple apps on the same bucket, e.g. for separation between multiple projects in one bucket or coexistence with sccache. Templated; `GOOS`, `GOARCH`, and `env.<env var>` are available. Template format: `{% GOOS %}`.
- `CACHEPROG_S3_EXPIRATION` - Sets expiration for each S3 object during Put; 0 means no expiration. Accepts a duration string, e.g. `1h`, `10m`, `10s`.
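As a concrete sketch, a minimal S3 configuration for a CI job could look like the following; the bucket name, region, prefix, and expiration values are placeholders, not recommendations:

```shell
# Minimal S3 configuration sketch; bucket, region, prefix, and expiration are placeholders
export CACHEPROG_REMOTE_STORAGE_TYPE=s3
export CACHEPROG_S3_BUCKET=my-ci-go-cache
export CACHEPROG_S3_REGION=eu-west-1
# Separate projects and platforms inside one bucket using the template syntax
export CACHEPROG_S3_PREFIX='myproject/{% GOOS %}/{% GOARCH %}'
# Let unused objects expire instead of trimming them manually
export CACHEPROG_S3_EXPIRATION=720h
```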
Configuration notes for some S3-compatible storages:
- AWS S3: nothing special needed.
- Minio: use the `minio+http://` or `minio+https://` scheme for the endpoint because Minio uses slightly different path resolution logic for buckets. Login and password are provided via `CACHEPROG_S3_ACCESS_KEY_ID` and `CACHEPROG_S3_ACCESS_KEY_SECRET`.
- Google Cloud Storage: https://cloud.google.com/storage/docs/aws-simple-migration; extra variables needed: `CACHEPROG_S3_ENDPOINT` set to `https://storage.googleapis.com`, `CACHEPROG_S3_ACCESS_KEY_ID`, `CACHEPROG_S3_ACCESS_KEY_SECRET`. `CACHEPROG_S3_REGION` may be set to `auto`.
- DigitalOcean Spaces: https://docs.digitalocean.com/products/spaces/how-to/use-aws-sdks/; extra variables needed: `CACHEPROG_S3_ENDPOINT` set to `https://<region>.digitaloceanspaces.com`, `CACHEPROG_S3_ACCESS_KEY_ID` is `SPACES_KEY`, `CACHEPROG_S3_ACCESS_KEY_SECRET` is `SPACES_SECRET`. `CACHEPROG_S3_REGION` may be set to `us-east-1`.
- Alibaba OSS: https://www.alibabacloud.com/help/en/oss/developer-reference/use-amazon-s3-sdks-to-access-oss; extra variables needed: `CACHEPROG_S3_ENDPOINT` set to `https://oss-<region>.aliyuncs.com`, `CACHEPROG_S3_ACCESS_KEY_ID` is `OSS_ACCESS_KEY_ID`, `CACHEPROG_S3_ACCESS_KEY_SECRET` is `OSS_ACCESS_KEY_SECRET`. `CACHEPROG_S3_REGION` may be set to `oss-<region>`.
- Ceph via Rados Gateway: https://docs.ceph.com/en/latest/radosgw/s3/; extra variables needed: `CACHEPROG_S3_ENDPOINT` set to the Rados Gateway URL, `CACHEPROG_S3_ACCESS_KEY_ID`, `CACHEPROG_S3_ACCESS_KEY_SECRET`, `CACHEPROG_S3_FORCE_PATH_STYLE` set to `true`.
To simplify setup, we provide an adapter that wraps a Go interface implementation into an HTTP server. See ./pkg/httpstorage/server.go for details.
Environment variables for HTTP storage are:
- `CACHEPROG_HTTP_STORAGE_BASE_URL` - Base URL, required.
- `CACHEPROG_HTTP_STORAGE_EXTRA_HEADERS` - Extra headers added to each request. Comma-separated list of `key:value` pairs.
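For example, pointing cacheprog at an HTTP storage with an extra auth header might look like this sketch; the URL, header names, and token are hypothetical placeholders:

```shell
export CACHEPROG_REMOTE_STORAGE_TYPE=http
# Hypothetical internal storage URL
export CACHEPROG_HTTP_STORAGE_BASE_URL=http://cache-storage.internal:8080/
# Comma-separated key:value pairs; header names and token are placeholders
export CACHEPROG_HTTP_STORAGE_EXTRA_HEADERS="Authorization:Bearer mytoken,X-Project:myproject"
```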
A Prometheus Pushgateway or a VictoriaMetrics instance configured to accept metric pushes is required in this case.
Environment variables for metrics pushing are:
- `CACHEPROG_USE_VM_HISTOGRAMS` - Use VictoriaMetrics-style histograms with dynamic buckets instead of Prometheus-style ones if set to `1`. Default: `0`. If you use VictoriaMetrics as your metrics storage, you should set this to `1`.
- `CACHEPROG_METRICS_PUSH_ENDPOINT` - Metrics endpoint; metrics are pushed if provided.
- `CACHEPROG_METRICS_PUSH_METHOD` - HTTP method to use for sending metrics. Default: `GET`.
- `CACHEPROG_METRICS_PUSH_EXTRA_LABELS` - Extra labels added to each metric, format: `key=value`.
- `CACHEPROG_METRICS_PUSH_EXTRA_HEADERS` - Extra headers added to each request. Comma-separated list of `key:value` pairs.
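For instance, pushing to a VictoriaMetrics instance could be configured roughly like this; the host is a placeholder, and the import path shown is VictoriaMetrics' Prometheus import endpoint:

```shell
# Use VictoriaMetrics-style histograms since the target is VictoriaMetrics
export CACHEPROG_USE_VM_HISTOGRAMS=1
# Placeholder host; path is VictoriaMetrics' Prometheus text-format import endpoint
export CACHEPROG_METRICS_PUSH_ENDPOINT=http://victoriametrics.internal:8428/api/v1/import/prometheus
export CACHEPROG_METRICS_PUSH_METHOD=POST
# Label all pushed metrics with the job name
export CACHEPROG_METRICS_PUSH_EXTRA_LABELS=ci_job=build
```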
All application-specific metrics carry the `cacheprog_` prefix.
The GOCACHEPROG protocol requires storing files on disk, and these files must be accessible to the Go compiler. You can specify the root directory for disk storage via the CACHEPROG_DISK_STORAGE_ROOT environment variable. If it is not provided, a temporary directory is used.
NOTE. Cacheprog does not perform any kind of garbage collection on the disk storage, so this directory should not be preserved between CI job runs.
A more detailed installation and usage process is described in the use cases below.
To deliver the cacheprog binary to a CI job, we recommend creating a custom base image with the Go compiler, the cacheprog binary, and other dependencies.
The cacheprog binary may be added via a `COPY --from=<cacheprog-image> /cacheprog /bin/cacheprog` instruction.
This is the recommended approach because such base images may be cached on runner nodes (given an appropriate runner configuration, of course), so jobs can start much faster.
The CI job must have the following environment variables set:
- `GOCACHEPROG` - path to the `cacheprog` binary.
- storage configuration variables.
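Put together, a minimal direct-mode job environment might look like this sketch; the binary path, bucket name, and disk root are placeholders:

```shell
# Path where the base image placed the binary (placeholder)
export GOCACHEPROG=/bin/cacheprog
export CACHEPROG_REMOTE_STORAGE_TYPE=s3
export CACHEPROG_S3_BUCKET=my-ci-go-cache
# Keep the on-disk object directory out of CI-preserved paths (no GC is performed on it)
export CACHEPROG_DISK_STORAGE_ROOT=/tmp/cacheprog
```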
Other flags or environment variables may be set depending on your needs. To see them, run `cacheprog direct --help`.
This is the case when your project is built by running a docker build command in CI, so you have something like this in your Dockerfile:
```dockerfile
ARG GO_VERSION=1.25
FROM golang:${GO_VERSION}-alpine AS builder
RUN go build -o /bin/project ./cmd/project
```

To make caching work in this setup, we provide the cacheprog proxy mode.
In this mode cacheprog runs as a sidecar that translates requests from the http storage type into any other storage type. This allows us to avoid dealing with authentication inside the docker build context.
NOTE. This mode is not intended to be used as a dedicated service because it does not include any authentication mechanism for incoming requests.
Setup is a bit more complex in this case. The following things are required:
- a custom base image with the Go compiler, the cacheprog binary, and other dependencies
- all project Dockerfiles should inherit from this base image
- cacheprog started in proxy mode before running the `docker build` command; the cacheprog proxy must be network-accessible from the `docker build` context
The base image also must have the following settings in ONBUILD instructions:
```dockerfile
ONBUILD ARG CACHEPROG_SERVER=cacheprog:8080
# CACHEPROG_METRICS_PUSH_ENDPOINT is only needed if you want to push metrics
ONBUILD ENV \
    CACHEPROG_REMOTE_STORAGE_TYPE=http \
    CACHEPROG_HTTP_STORAGE_BASE_URL=http://${CACHEPROG_SERVER}/ \
    CACHEPROG_METRICS_PUSH_ENDPOINT=http://${CACHEPROG_SERVER}/metricsproxy \
    GOCACHEPROG="/bin/cacheprog"
```

All Dockerfiles in the project should then inherit from this base image:
```dockerfile
ARG GO_IMAGE
FROM ${GO_IMAGE} AS builder
RUN go build -o /bin/project ./cmd/project
```

The CI job must be configured in the following way (assuming all the necessary preparations are done beforehand):
- set basic environment variables:
  - `CACHEPROG_PROXY_PORT` - port for the cacheprog proxy to listen on.
  - `CACHEPROG_SERVER` - address for connecting to the cacheprog proxy, e.g. `"127.0.0.1:${CACHEPROG_PROXY_PORT}"`.
  - remote storage configuration variables
- run the cacheprog proxy with the credentials necessary to access the remote storage, e.g. for AWS S3:
```shell
if [[ -f "$AWS_WEB_IDENTITY_TOKEN_FILE" ]]; then
  export DOCKER_VOLUME_BINDINGS="$DOCKER_VOLUME_BINDINGS --volume ${AWS_WEB_IDENTITY_TOKEN_FILE}:${AWS_WEB_IDENTITY_TOKEN_FILE}:ro"
fi

# --network host may be replaced with a custom network, but we found host networking the most convenient
# CACHEPROG_METRICS_PROXY_* variables need to be set only if you want to push metrics
docker run \
  --network host \
  --detach \
  --name cacheprog \
  --entrypoint /bin/cacheprog \
  --env CACHEPROG_PROXY_LISTEN_ADDRESS=0.0.0.0:${CACHEPROG_PROXY_PORT} \
  --env CACHEPROG_METRICS_PROXY_ENDPOINT=${CACHEPROG_METRICS_PUSH_ENDPOINT} \
  --env CACHEPROG_METRICS_PROXY_EXTRA_LABELS=${CACHEPROG_METRICS_PUSH_EXTRA_LABELS} \
  --env CACHEPROG_METRICS_PROXY_EXTRA_HEADERS=${CACHEPROG_METRICS_PUSH_EXTRA_HEADERS} \
  --env-file <(env | grep -E '^AWS_|^CACHEPROG_') \
  ${DOCKER_VOLUME_BINDINGS} \
  ${GO_IMAGE} proxy || true
```
`GO_IMAGE` is the base image mentioned before.
- provide extra build arguments to the docker build command:
```shell
export BUILDER_BUILD_ARGS="$BUILDER_BUILD_ARGS \
  --build-arg GO_IMAGE=${GO_IMAGE} \
  --build-arg CACHEPROG_SERVER=${CACHEPROG_SERVER}"

docker build \
  --network host \
  ${BUILDER_BUILD_ARGS} \
  -f ${BUILDER_DOCKERFILE_LOCATION}/Dockerfile \
  -t ${BUILDER_REGISTRY_IMAGE} \
  ${BUILDER_BUILD_CONTEXT}
```
- collect logs from the cacheprog proxy:

```shell
docker logs cacheprog || true
```
Note that we used `|| true` to avoid failing the job if the cacheprog proxy fails to start; cacheprog running inside the docker build context can tolerate unavailable external storage. Also note that we are not using things like GitLab services to run the cacheprog proxy, because that would make it hard to reach from the docker build context.
This case is very similar to running the Go compiler directly in a CI job. Here we also recommend creating a custom base image from the golangci/golangci-lint image.
Settings are almost the same as for the Go compiler, with small differences:
- `GOLANGCI_LINT_CACHEPROG` must be set. We recommend setting it to something like `<cacheprog-path> --s3-prefix="{% env.CACHEPROG_S3_PREFIX %}/golangci-lint-cache"`. This is needed because golangci-lint has its own caching mechanism, and separating it from the Go compiler cache prevents possible conflicts.
- `GOCACHEPROG` is recommended to be set to `<cacheprog-path> --log-output=gocacheprog.log` to collect logs, because golangci-lint runs the Go compiler in a subprocess under the hood and its logs are not visible in the main process.
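The resulting environment could be sketched as follows, assuming the binary lives at /bin/cacheprog (a placeholder path):

```shell
# Separate the golangci-lint cache from the Go compiler cache via a distinct S3 prefix
export GOLANGCI_LINT_CACHEPROG='/bin/cacheprog --s3-prefix="{% env.CACHEPROG_S3_PREFIX %}/golangci-lint-cache"'
# Redirect logs to a file since the compiler runs in a subprocess of golangci-lint
export GOCACHEPROG='/bin/cacheprog --log-output=gocacheprog.log'
```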
If you plan to support Go versions older than 1.24 in your CI environment, you can simply combine old-style caching via GOCACHE with caching via GOCACHEPROG.
GOCACHEPROG takes priority if specified, so after updating Go to 1.24 it will be used automatically. Just make sure that GOCACHE and CACHEPROG_ROOT_DIRECTORY do not point to the same directory.
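A quick guard against that misconfiguration could be added to the job setup, as a sketch:

```shell
# Refuse to proceed if both caches point at the same directory
if [[ -n "${GOCACHE:-}" && "${GOCACHE}" == "${CACHEPROG_ROOT_DIRECTORY:-}" ]]; then
  echo "error: GOCACHE and CACHEPROG_ROOT_DIRECTORY must point to different directories" >&2
  exit 1
fi
```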
However, we recommend adding a script that automatically cleans up GOCACHE when GOCACHEPROG is in use, to clear the CI-managed cache. Something like this:
```shell
if [[ -d "$CACHEPROG_ROOT_DIRECTORY" ]]; then
  # GOCACHEPROG was used, cleanup CI cache directory to prevent further pulling
  rm -rf "$GOCACHE"
  rm -rf "$GOLANGCI_LINT_CACHE"
  # Force CI cache cleanup
  mkdir -p "$GOCACHE"
  touch "$GOCACHE/.empty"
fi
```

In this case, CACHEPROG_ROOT_DIRECTORY must be explicitly provided in the job environment variables.
To enable debugging, profiling, and tracing you can use the following environment variables:
- `CACHEPROG_LOG_LEVEL` - logging level. Available: `DEBUG`, `INFO`, `WARN`, `ERROR`. Default: `INFO`.
- `CACHEPROG_CPU_PROFILE_PATH` - path to write the CPU profile to.
- `CACHEPROG_MEM_PROFILE_PATH` - path to write the memory profile to.
- `CACHEPROG_TRACE_PROFILE_PATH` - path to write the trace profile to.
- `CACHEPROG_FGPROF_PATH` - path to write the fgprof (wall-clock profiling) profile to.
All these variables are optional; if not provided, nothing is written. Artifacts produced by cacheprog may be collected using your CI artifact collection mechanism.
NOTE. The DEBUG logging level generates a lot of logs, so it is not recommended for production use.
If you encounter weird compiler errors after, e.g., upgrading the Go compiler version, it is most likely because the Go compiler picked up a wrong cache object due to changed object key computation logic.
To deal with this, you may set the CACHEPROG_DISABLE_GET=true environment variable to disable getting objects from any storage. This forces the Go compiler to recompile the project and populate the cache again. This variable may be wired to CI labels to make the process a bit more convenient.
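As a sketch, wiring this to a GitLab merge-request label might look like the following; the label name is arbitrary and chosen here for illustration:

```shell
# Hypothetical wiring: bypass cache gets when the MR carries a "no-cache-get" label.
# CI_MERGE_REQUEST_LABELS is GitLab's comma-separated list of MR labels.
if [[ "${CI_MERGE_REQUEST_LABELS:-}" == *no-cache-get* ]]; then
  export CACHEPROG_DISABLE_GET=true
fi
```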
To run e2e tests, see functests.
This project includes a docker-compose.yml file that runs a local environment for testing. It contains:
- a Minio server to simulate external storage;
- Toxiproxy to simulate network latency and failures.
To run it use:

```shell
docker compose -f deployments/compose/docker-compose.yml up
```

By default the Minio web interface is available at http://localhost:9001 and the S3 interface at http://localhost:9000. Default credentials are minioadmin for both login and password.
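To point cacheprog at this local environment, one can use the Minio endpoint scheme described earlier together with the default credentials; the bucket name is a placeholder and must be created first, e.g. via the web interface:

```shell
export CACHEPROG_REMOTE_STORAGE_TYPE=s3
# The minio+http:// scheme enables Minio's bucket path resolution logic
export CACHEPROG_S3_ENDPOINT=minio+http://localhost:9000
# Placeholder bucket; create it in Minio before running
export CACHEPROG_S3_BUCKET=test-cache
export CACHEPROG_S3_ACCESS_KEY_ID=minioadmin
export CACHEPROG_S3_ACCESS_KEY_SECRET=minioadmin
```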
Toxiproxy may be used to simulate network latency and failures. It is configured in the deployments/compose/toxiproxy.json file. E.g. to simulate 50ms latency with 10ms jitter, run:

```shell
docker compose -f deployments/compose/docker-compose.yml exec toxiproxy /toxiproxy-cli toxic add -t latency -a latency=50 -a jitter=10 minio_master
```

This is the way we measured the impact of network latency on compile time.


