A tiny, operator-curated artifact cache for a small lab, for the big vendor
downloads you re-pull constantly (CUDA, ROCm, DOCA, firmware, drivers), fronted
by transparent curl/wget shims so existing scripts use it with no changes.
Think of it as "ccache for HTTP artifacts, without a proxy."
curl -fsSL https://the/origin/cuda.tar.gz -o cuda.tar.gz # your script, unchanged
└─ curlwithcache shim ─ WITHCACHE_SERVER set?
├─ cached → served from the cache-host (fast, local)
└─ miss/unset/unreachable → runs the real curl, exactly as written
Artifacts are cached by their origin URL as a key; the shim opts in by re-pointing the URL at the cache. No transparent proxy, no TLS interception, no client CA. The URL is a lookup key, not a connection target.
By default a miss is auto-fetched: the request falls through to origin (so
the caller gets its file straight away), and the cache-host pulls the same
artifact in the background, so the next request hits. Run with --curate to
require a human instead, who reviews the miss list in a small web UI and presses
Download (or pre-seeds via Add from URI). Either way the cache-host is the
only box that needs internet egress (and any vendor credentials), and clients
never write to it.
For https:// (i.e. every vendor download) a forward proxy can't cache without
SSL-bump / MITM: curl tunnels TLS end-to-end via CONNECT, so the proxy
only sees ciphertext. The shim sidesteps that entirely by re-pointing the URL
to the cache instead of intercepting the connection. And no proxy offers the
optional operator-curated model (--curate: a miss queue a human approves).
| Path | What it is |
|---|---|
src/withcache/server.py |
The cache-host: blob store + miss table + background download manager + operator UI (Pico.css + HTMX) |
src/withcache/_shim.py |
Shared shim core (find URL → probe → rewrite → exec) |
src/withcache/curlwithcache.py / wgetwithcache.py |
The Python curl / wget shims |
shim/shim.zig |
The native shim: one static binary, both tools via argv[0] |
deploy/Containerfile, deploy/compose.yml |
Single Podman/Docker host deploy |
The cache-host and the Python shims are stdlib-only (no third-party runtime deps); the native shim is a dependency-free static binary.
The cache-host and Python shims (works on any box with Python):
pipx install withcache # or: uv tool install withcache / pip install withcache
# provides: curlwithcache wgetwithcache withcache-serverThe native shim (no Python needed, for minimal/distroless boxes; ~200 KB static musl binary). Grab it from the Releases page; one binary serves both tools by the name it's invoked as:
curl -L .../releases/.../withcache-shim-x86_64-linux-musl -o /usr/local/bin/curlwithcache
chmod +x /usr/local/bin/curlwithcacheThe Python shim is also the tested oracle and install-time fallback for
platforms without a prebuilt binary; a differential test
asserts the binary and the Python plan() rewrite argv identically.
export WITHCACHE_ADMIN_PASSWORD=change-me # protects the operator UI
podman compose -f deploy/compose.yml up -d # or: docker compose -f ...
# operator UI: http://withcache-server:3000/Or without containers:
WITHCACHE_ADMIN_PASSWORD=change-me withcache-server --data-dir ./data --port 3000Data (blobs + cache.db + session-secret) lives in the /data volume (or
--data-dir). Artifacts are immutable per version, so there's no cache
invalidation. --workers N sets the number of concurrent download workers,
--curate switches from auto-fetch to operator-approved pulls, and --max-bytes
(e.g. 50G) caps the cache: when full it refuses new fills (no auto-eviction),
and you free space by deleting artifacts in the UI.
Every approach is the same two ingredients: (1) point at the cache with
WITHCACHE_SERVER, and (2) make curl/wget resolve to the shim. They differ
only in how widely the system curl/wget is shadowed. Pick the least
invasive one that fits.
Safety: with
WITHCACHE_SERVERunset the shim is a pure pass-through (it justexecs the real tool, zero network/parsing), so even the system-wide setup is harmless wherever the cache isn't configured. Worst case is always "no caching,curlstill works."
These all use command -v curlwithcache, so they work whether you installed the
native binary or the Python launcher (both land under that name).
Nothing is renamed; you opt in per command. Good for trying it out or a script you can edit.
export WITHCACHE_SERVER=http://withcache-server:3000
curlwithcache -fsSL https://the/origin/cuda.tar.gz -o cuda.tar.gz
wgetwithcache https://the/origin/rocm.tar.gzPut curl/wget symlinks in a dir and prepend it to PATH in the current
shell. Reversible by just closing the shell.
mkdir -p ~/.withcache/bin
ln -sf "$(command -v curlwithcache)" ~/.withcache/bin/curl
ln -sf "$(command -v wgetwithcache)" ~/.withcache/bin/wget
export WITHCACHE_SERVER=http://withcache-server:3000
export PATH="$HOME/.withcache/bin:$PATH"
hash -r # forget any cached curl/wget location
command -v curl # -> ~/.withcache/bin/curl (verify it's the shim)
curl -fsSL https://the/origin/cuda.tar.gz -o cuda.tar.gz # existing scripts, unchanged
wget https://the/origin/rocm.tar.gz # still saved as rocm.tar.gzCreate the symlinks once, then add the two exports to your shell rc. Affects all your future interactive shells; undo by deleting the block.
mkdir -p ~/.withcache/bin
ln -sf "$(command -v curlwithcache)" ~/.withcache/bin/curl
ln -sf "$(command -v wgetwithcache)" ~/.withcache/bin/wget
cat >> ~/.bashrc <<'EOF'
# withcache: transparent curl/wget caching
export WITHCACHE_SERVER=http://withcache-server:3000
export PATH="$HOME/.withcache/bin:$PATH"
EOFDrop an .envrc in a project tree (requires direnv); caching applies only
inside that directory.
# .envrc
export WITHCACHE_SERVER=http://withcache-server:3000
PATH_add ~/.withcache/bin # assumes the symlinks from approach 2/3 existThen direnv allow.
Install the shim as curl/wget in /usr/local/bin (ahead of /usr/bin on
the default PATH) and set the server globally. This also catches build tools
and package managers that shell out to curl/wget.
sudo ln -sf "$(command -v curlwithcache)" /usr/local/bin/curl
sudo ln -sf "$(command -v wgetwithcache)" /usr/local/bin/wget
# A login-shell env file (covers interactive logins; daemons started outside a
# login shell won't see it; set WITHCACHE_SERVER in their unit if you need it).
echo 'export WITHCACHE_SERVER=http://withcache-server:3000' \
| sudo tee /etc/profile.d/withcache.sh >/dev/nullOn minimal/distroless hosts use the native shim binary here: same symlink, no Python required.
command -v curl # which curl is in effect (the shim, or the real one)
export REAL_CURL=/usr/bin/curl # optional: pin the wrapped tool (also $REAL_WGET)
unset WITHCACHE_SERVER # instantly back to plain curl (pass-through)
rm ~/.withcache/bin/curl ~/.withcache/bin/wget # remove shadowing entirelyHow it works: the shim scans for the URL, asks the cache, and execs the real tool:
- Find the real
curl/wgeton$PATH(skipping itself;$REAL_CURL/$REAL_WGEToverride). - With
WITHCACHE_SERVERset, find the URL (thescheme://arg, or--url). - Probe the cache with that same tool (
curl -I/wget --spider).- Hit → re-point only the URL at
http://server/b/<base64(origin)>/<basename>andexecthe real tool (so-o,-O,-L,--retry, … all still apply, and the file is named after the artifact). - Miss / unreachable →
execthe real tool with your arguments untouched (origin); the miss is recorded for the operator.
- Hit → re-point only the URL at
- With no
WITHCACHE_SERVER, it does zero network/parsing, justexecs the real tool.
Notes & limits (all degrade gracefully; worst case is "no caching, curl still works"):
- Needs the wrapped tool present (it shims it). Adds ~Python-startup latency per call.
- URLs hidden in a
-K/-iconfig file or piped via stdin aren't seen → those calls pass through uncached. - Per-tool env override:
CURLWITHCACHE_SERVER/WGETWITHCACHE_SERVERbeatWITHCACHE_SERVER.
http://withcache-server:3000/ (Pico.css + HTMX, bundled offline) shows:
- Misses: auto-fetched by default, or (under
--curate) each with Download (queues a background pull) and Dismiss. - Downloads: live progress bars,
queued/running/completed/cancelled/failed, Cancel, and Clear finished. Downloads run in a background worker pool, not in the request, so large pulls never block, modelled on bty's job managers. - Cached artifacts: URL, size, hits (times served) and misses (times requested before it was cached), SHA-256, fetched-at, each with Delete to free space.
- Add from URI: pre-seed an artifact before anyone misses it.
Single-tenant session-cookie auth (modelled on bty's approach, env password
instead of PAM). The read path (/blob, /b/…, /healthz) is open so shims
never log in; the operator surface (/, /admin/*) is gated.
| Env var | Purpose |
|---|---|
WITHCACHE_SERVER |
Cache-host URL the shims use |
CURLWITHCACHE_SERVER / WGETWITHCACHE_SERVER |
Per-tool override of the above |
WITHCACHE_ADMIN_PASSWORD |
Operator login password (unset ⇒ UI open, with a warning) |
WITHCACHE_SESSION_SECRET |
Override the persisted cookie-signing key (optional) |
The key is scheme://host/path with the query string dropped by default, so
CDN/presigned URLs (whose tokens change every request) still match by path. Pass
--keep-query to the server for query-sensitive keys. Package-manager repos
(.deb/.rpm) are GPG-signed and verified by the client regardless of
transport, so caching them this way is safe.
A tool that already knows its download URLs (e.g. an installer or a provisioner)
can prefer the cache without shelling out to a shim or re-implementing the /b/
scheme. withcache.client is stdlib-only, so importing it adds no dependencies:
from withcache import client
# "use the cache when it's warm, the origin otherwise"
url = client.serve_url("http://cache:3000", origin) or originis_cached() is a graceful HEAD (a miss, timeout, or unreachable cache all
return False, so you fall back to the origin), and it doubles as a warm-up:
the probe records the miss and, in auto-fetch mode, enqueues the fill, so the
next call flips to the cache. The encoding is shared with the shims and server,
so consumers stay in lockstep with the cache-host.
python -m unittest discover -s tests # stdlib only, no test deps