Performance benchmarks for leanSig and leanMultisig across hardware.
Run one command on a target machine → get a JSON file with sign / verify / aggregate timings plus CPU and memory usage, committed back to this repo → GitHub Pages renders it with charts grouped by machine.
Flow: workloads/ (Rust binary) → runner/ (Python harness) →
results/*.json → scripts/build_index.py → site/ (static, deployed
to Pages).
- leanSig (variant `SIGTargetSumLifetime20W4NoOff`, lifetime 2^20) — sign / verify
- xmss (the XMSS inside leanEthereum/leanMultisig, which is what leanSpec actually consumes) — sign / verify at crate defaults
- leanMultisig aggregation at `LOG_INV_RATE_PROD=2` — flat aggregation over 125 / 250 / 500 / 1000 sigs, plus tree aggregation with fan-in 2 / 4 / 8 over 125 / 250 / 500-sig leaves. That's 4 flat + 9 tree = 13 aggregate variants total, all run by default.

Key generation is opt-in (`--include-keygen`) because lifetime-2^20 keygen adds significant wall time.
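Each workload above is a subcommand of the Rust binary in workloads/, and the Python harness parses the JSON it prints. A rough sketch of that handshake — the invocation and subcommand name here are hypothetical; runner/bench.py is the real driver:

```python
# Hypothetical invocation shape; see runner/bench.py and workloads/src/main.rs
# for the actual binary path, subcommand names, and flags.
import json
import subprocess

def run_workload(subcommand: str) -> dict:
    proc = subprocess.run(
        ["cargo", "run", "--release", "--manifest-path", "workloads/Cargo.toml", "--", subcommand],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)  # each workload emits its timings as JSON on stdout
```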
- Python ≥ 3.10 and uv — uv manages the Python side (psutil).
- rustc ≥ 1.87 via rustup — release builds use `target-cpu=native`. Cargo fetches everything else.
- On Linux, you may need `build-essential` and `pkg-config` if they aren't already installed.
- For `uv run remote-bench` only: the gcloud CLI. See One-time GCP setup.
```bash
uv run bench
```

| Option | Description |
|---|---|
| `--label <name>` | Override the auto-detected label (defaults to hostname, falling back to a CPU-derived slug if the hostname is generic) |
| `--include-keygen` | Also run the `leansig.keygen` and `xmss.keygen` workloads — excluded by default because lifetime-2^20 keygen is slow |
| `--only <workload>` | Run only the named workload(s); pass the flag multiple times to include more than one workload |
| `--samples <N>` | Timed samples per workload (default 30); more = tighter stats at the cost of wall time |
| `--warmup <N>` | Warm-up runs before timing (default 3); more reduces cold-start noise |
| `--notes <text>` | Free-form note attached to the record |
Output lands at results/<YYYY-MM-DDTHH-MM-SSZ>__<fingerprint>.json. The
fingerprint is a 10-char hash of (CPU model, physical cores, memory GB, OS
family) — stable across runs on the same machine so the site groups them.
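For reference, a minimal sketch of how such a fingerprint can be derived — the actual hashing and field normalization live in runner/sysinfo.py and may differ:

```python
# Illustrative only: mirrors the described inputs (CPU model, physical cores,
# memory GB, OS family); runner/sysinfo.py is the real implementation.
import hashlib

def fingerprint(cpu_model: str, physical_cores: int, memory_gb: int, os_family: str) -> str:
    key = f"{cpu_model}|{physical_cores}|{memory_gb}|{os_family}"
    return hashlib.sha256(key.encode()).hexdigest()[:10]  # 10 chars, stable per machine

print(fingerprint("AMD Ryzen 9 7950X", 16, 64, "Linux"))
```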
GCP is the only provider currently supported — contributions to add others are welcome. First time? See One-time GCP setup (least-privilege) for the service-account, role, and credentials JSON setup.
The command below spins up a fresh VM, installs everything, runs the benchmark, pulls the result JSON back locally, and destroys the VM:
```bash
# Run the default machine matrix (asks y/N before kicking off):
uv run remote-bench --credentials gcp-credentials.json

# Or pin a single machine type:
uv run remote-bench \
  --credentials gcp-credentials.json \
  --machine-type n2-standard-8 \
  --image-family ubuntu-2404-lts-amd64
```

The default matrix is intentionally small and EIP-7870-anchored — defined in scripts/remote_bench.py as `DEFAULT_MACHINE_TYPES`.
| Option | Description |
|---|---|
| `--credentials <path>` | Path to GCP service-account JSON key (required) |
| `--machine-type <name>` | Pin to one machine type; default runs the full matrix |
| `--no-parallel` | Run VMs sequentially (parallel by default) |
| `--yes` / `-y` | Skip the y/N prompt in unattended runs |
| `--image-family <name>` | `ubuntu-2404-lts-amd64` for x86_64, `ubuntu-2404-lts-arm64` for ARM |
| `--project <id>` | Defaults to `project_id` from the credentials JSON |
The result lands in ./results/<timestamp>__<fingerprint>.json. Commit
and push to this repo to share your results publicly. The VM is destroyed
in a try/finally, including on Ctrl-C; orphans are tagged
lean-bench=true so they're easy to spot.
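That control flow is simple enough to sketch; the helpers below are placeholders showing the shape of the cleanup guarantee, not the real API in scripts/remote_bench.py:

```python
# Sketch only: the real provisioning lives in scripts/remote_bench.py and
# scripts/provisioners/; these placeholders just show the control flow.
def create_vm(machine_type: str, labels: dict) -> str:
    return f"lean-bench-{machine_type}"     # placeholder for `gcloud compute instances create ...`

def run_benchmark_and_fetch(vm: str) -> None:
    print(f"benchmarking on {vm}")          # placeholder for provisioning, `uv run bench`, copy-back

def delete_vm(vm: str) -> None:
    print(f"deleting {vm}")                 # placeholder for `gcloud compute instances delete ...`

def bench_on_fresh_vm(machine_type: str) -> None:
    vm = create_vm(machine_type, labels={"lean-bench": "true"})  # label marks orphans
    try:
        run_benchmark_and_fetch(vm)         # result JSON ends up in ./results/
    finally:
        delete_vm(vm)                       # runs on success, exceptions, and Ctrl-C alike
```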
```bash
uv run serve               # http://localhost:8000
uv run serve --port 4000
```

Serves site/ and results/ live from their source locations — edit site/app.js, drop a new run JSON into results/, just refresh the browser. /results/index.json is regenerated on every request so it always reflects what's on disk.
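For orientation, a minimal sketch of what that index rebuild involves, assuming the index is essentially a listing of the run files — scripts/build_index.py is the authoritative version and may record more metadata:

```python
# Sketch only; see scripts/build_index.py for the real index schema.
import json
from pathlib import Path

def build_index(results_dir: Path = Path("results")) -> dict:
    runs = sorted(p.name for p in results_dir.glob("*.json") if p.name != "index.json")
    return {"runs": runs}

if __name__ == "__main__":
    (Path("results") / "index.json").write_text(json.dumps(build_index(), indent=2))
```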
The site is the union of every committed run file. If your machine isn't already represented — different CPU generation, ARM, unusual core count, atypical OS — your numbers fill a gap.
- Run the bench (`uv run bench` locally, or `uv run remote-bench` for cloud).
- The result lands in `results/<timestamp>__<fingerprint>.json`. Don't edit it.
- Commit and open a PR against `main`.
- On merge, the site rebuilds and your machine appears grouped by fingerprint.
- Quiet machine. Close browsers, kill background tasks. Sign / verify are single-threaded; aggregation uses Rayon — co-tenants skew both.
- Use `--notes` for context. Cooling, ambient temp, shared-host status, anything a future reader needs to interpret the numbers.
- Use `--label` for cloud VMs. `--label "gcp-c4-standard-8"` reads better than the auto-detected hostname.
- Multiple runs on the same machine are fine — they group by hardware fingerprint and the site shows variance.
leanBench/
├─ pyproject.toml primary project; uv commands live here
├─ uv.lock
├─ runner/ Python orchestrator (entry: `uv run bench`)
│ ├─ bench.py invokes binary, samples resources, writes JSON
│ ├─ sysinfo.py CPU / RAM / OS detection + fingerprint
│ └─ sampler.py psutil-based CPU/memory polling
├─ workloads/ Rust binary, one subcommand per workload → JSON stdout
│ ├─ Cargo.toml pins leanSig + leanMultisig SHAs
│ ├─ build.rs bakes the pinned SHAs into the binary
│ └─ src/
│ ├─ main.rs workload dispatch + per-sample timing
│ └─ workloads.rs per-workload setup and measurement
├─ scripts/
│ ├─ build_index.py scans results/*.json → results/index.json (CI)
│ ├─ dev_server.py live-reload preview (uv run serve)
│ ├─ remote_bench.py spin up + tear down cloud VMs (uv run remote-bench)
│ └─ provisioners/ cloud-provider drivers (currently GCP only)
├─ results/ committed JSON, one file per run
├─ site/ static site (vanilla JS + Chart.js, vendored)
│ ├─ index.html list of machines; cross-machine comparison
│ ├─ run.html per-run detail with per-workload charts
│ ├─ topology.html aggregation-topology feasibility explorer
│ ├─ trend.html trends across runs and SHAs
│ ├─ app.js / topology.js / trend.js
│ ├─ style.css
│ └─ vendor/ chart.umd.min.js (no CDN dependency)
└─ .github/workflows/
└─ deploy-pages.yml on push: rebuild index + deploy site to Pages
Avoid using your personal gcloud session for this — create a dedicated
service account with a custom role limited to exactly what the script
needs.
1. Create a service account:

   ```bash
   gcloud iam service-accounts create lean-bench \
     --display-name="leanBench remote runner" \
     --project=$PROJECT
   ```

2. Create a custom role with only these permissions:

   `compute.instances.create`, `compute.instances.delete`, `compute.instances.get`,
   `compute.instances.setMetadata`, `compute.instances.setLabels`, `compute.disks.create`,
   `compute.subnetworks.use`, `compute.subnetworks.useExternalIp`, `compute.zones.get`,
   `compute.projects.get`, `iap.tunnelInstances.accessViaIAP`

   ```bash
   gcloud iam roles create leanBenchRunner --project=$PROJECT \
     --title="leanBench runner" \
     --permissions=compute.instances.create,compute.instances.delete,compute.instances.get,compute.instances.setMetadata,compute.instances.setLabels,compute.disks.create,compute.subnetworks.use,compute.subnetworks.useExternalIp,compute.zones.get,compute.projects.get,iap.tunnelInstances.accessViaIAP
   ```

   To add a permission to an existing role:

   ```bash
   gcloud iam roles update leanBenchRunner --project=$PROJECT \
     --add-permissions=<permission.name>
   ```

3. Grant the role to the SA (optionally with an IAM Condition scoping it to the `lean-bench=true` label so even a leaked key can only manage VMs the script itself created):

   ```bash
   gcloud projects add-iam-policy-binding $PROJECT \
     --member="serviceAccount:lean-bench@$PROJECT.iam.gserviceaccount.com" \
     --role="projects/$PROJECT/roles/leanBenchRunner"
   ```

4. Create a JSON key and stash it locally (gitignored):

   ```bash
   gcloud iam service-accounts keys create gcp-credentials.json \
     --iam-account="lean-bench@$PROJECT.iam.gserviceaccount.com"
   ```

5. Enable the required APIs once:

   ```bash
   gcloud services enable compute.googleapis.com iap.googleapis.com --project=$PROJECT
   ```
The script activates the SA in a temp CLOUDSDK_CONFIG directory per
invocation — your normal gcloud session is never touched, and no auth
material persists after the script exits.
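A rough sketch of that isolation trick (the real code in scripts/remote_bench.py may differ in detail):

```python
# Sketch only: run gcloud against a temporary CLOUDSDK_CONFIG so the service
# account never touches your default gcloud profile.
import os
import subprocess
import tempfile

def gcloud_isolated(credentials_path: str, *args: str) -> None:
    with tempfile.TemporaryDirectory() as cfg:
        env = {**os.environ, "CLOUDSDK_CONFIG": cfg}
        subprocess.run(
            ["gcloud", "auth", "activate-service-account", "--key-file", credentials_path],
            env=env, check=True,
        )
        subprocess.run(["gcloud", *args], env=env, check=True)
    # the temp directory (and the activated credentials) are gone once the block exits
```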
For extra safety, run benchmarks in their own GCP project so a compromise stays contained, billing is isolated, and you can set a low budget alert specifically for benchmark VMs.
- SHAs of leanSig and leanMultisig are pinned in `Cargo.toml` and baked into the runner binary by `build.rs` — the output JSON records them.
- The runner uses a deterministic seed (`--seed`, default `0xC0FFEE`) for all RNG draws — keygen entropy, message generation, and per-iteration key variation. Two machines running the same SHA with the same seed operate on byte-identical inputs, so timing deltas reflect hardware and build, not RNG luck.
- The fingerprint is coarse by design — OS release and kernel changes don't break machine grouping. The full OS/kernel string is still in the record for diagnosis.
- The aggregation benchmark returns total wall-clock time, including upstream one-time setup (DFT twiddles, bytecode init) on the first call. We use a per-workload warmup to amortize it; the first process launch on a cold signer cache still regenerates 10k XMSS keys (~minutes on the very first run, cached thereafter).
- CPU percentage is summed across logical cores — a fully utilized 16-core machine reports 1600%, not 100%. See the sketch after this list.
- No turbo / governor pinning is baked in. Pinning the governor to `performance` and disabling turbo gives a flat baseline (same clock all the time, no thermal-driven drift), so results don't depend on what the machine was doing 5 seconds ago (governor state) or ambient room temperature (turbo headroom).
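A quick illustration of the CPU-percentage summing convention using psutil (the library the sampler is built on); runner/sampler.py is the authoritative implementation:

```python
# Per-core utilization is summed, so a saturated N-core machine reads N * 100%.
import psutil

def total_cpu_percent(interval: float = 0.5) -> float:
    return sum(psutil.cpu_percent(interval=interval, percpu=True))

print(f"{total_cpu_percent():.0f}% across {psutil.cpu_count()} logical cores")
```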