Skip to content

Benchmarks

Holden Salomon edited this page Jun 13, 2026 · 3 revisions

Benchmarks

Inference latency and throughput for winnow's two embedding models, measured on real hardware. Run with the scripts/benchmark.py script (included in the repo).


Hardware

CPU AMD / Intel (see per-run notes)
GPU NVIDIA GeForce RTX 2070 SUPER
Driver 570.172.08
Container CUDA 12.8.1
Host OS Debian 13 (TrueNAS LXC)

InsightFace Buffalo_L — face mode

Detection (RetinaFace det_10g) + ArcFace embedding (w600k_r50) on a 640×640 image. This is the pipeline winnow runs for every face crop it evaluates.

RTX 2070 SUPER

Mode Model load Median latency Throughput
GPU (FORCE_CPU=false) 2.5 s 12.8 ms 78.3 img/s
CPU (FORCE_CPU=true) 4.3 s 102 ms 9.8 img/s

Notes

  • Latency measured over 30 runs after 5 warmup iterations.
  • Input: 640×640 synthetic image. The detection network processes the full input regardless of whether faces are found; timing is representative of real-world single-image throughput.
  • 320×320 input: CPU 101 ms, GPU 13.4 ms — detection runtime is dominated by fixed model overhead, not image size at these resolutions.
  • GPU is 8× faster than CPU for InsightFace (12.8 ms vs 102 ms).

SigLIP ViT-B/16 — object mode

google/siglip-base-patch16-224 — 224×224 Vision Transformer used for object-mode diversity selection. Supports batched inference; GPU benefit scales with batch size.

RTX 2070 SUPER

GPU (FORCE_CPU=false)

Batch ms/batch ms/img img/s p95/img
1 13.1 13.13 76.1 14.2
4 24.4 6.10 163.9 6.2
8 45.4 5.67 176.3 5.7
16 87.7 5.48 182.4 5.5
32 171.5 5.36 186.6 5.4

CPU (FORCE_CPU=true)

Batch ms/batch ms/img img/s p95/img
1 216 216 4.6 243
4 757 189 5.3 192
8 1450 181 5.5 202
16 2846 178 5.6 188
32 5683 178 5.6 188

Model load: 16.1 s (CPU; first load, no cache)

CPU batching saturates quickly — throughput barely improves past batch 4 (~5.5 img/s ceiling). GPU shows 33× speedup at batch 32 (186 img/s vs 5.6 img/s).


Running the benchmark

# Inside the container — GPU mode:
docker exec winnow python /app/scripts/benchmark.py

# CPU-only mode:
docker exec -e FORCE_CPU=true winnow python /app/scripts/benchmark.py

# Or directly with docker run:
docker run --rm --gpus all \
  --entrypoint /app/.venv/bin/python \
  -v /your/models:/insightface \
  -e INSIGHTFACE_HOME=/insightface \
  ghcr.io/sudolulo/winnow:latest \
  /app/scripts/benchmark.py

Clone this wiki locally