Skip to content

Benchmarks

Holden Salomon edited this page Jun 13, 2026 · 3 revisions

Benchmarks

Inference latency and throughput for winnow's two embedding models, measured on real hardware. Run with the scripts/benchmark.py script (included in the repo).


Hardware

CPU AMD / Intel (see per-run notes)
GPU NVIDIA GeForce RTX 2070 SUPER
Driver 570.172.08
Container CUDA 12.8.1
Host OS Debian 13 (TrueNAS LXC)

InsightFace Buffalo_L — face mode

Detection (RetinaFace det_10g) + ArcFace embedding (w600k_r50) on a 640×640 image. This is the pipeline winnow runs for every face crop it evaluates.

RTX 2070 SUPER

Mode Model load Median latency Throughput
GPU (FORCE_CPU=false)
CPU (FORCE_CPU=true) 4.3 s 102 ms 9.8 img/s

GPU results pending CUDA 12.8.1 image build.

Notes

  • Latency measured over 30 runs after 5 warmup iterations.
  • Input: 640×640 synthetic image. The detection network processes the full input regardless of whether faces are found; timing is representative of real-world single-image throughput.
  • 320×320 input: CPU median 101 ms — detection runtime is dominated by the fixed model overhead, not image size at these resolutions.

SigLIP ViT-B/16 — object mode

google/siglip-base-patch16-224 — 224×224 Vision Transformer used for object-mode diversity selection. Supports batched inference; GPU benefit scales with batch size.

RTX 2070 SUPER

GPU (FORCE_CPU=false)

Batch ms/batch ms/img img/s p95/img
1
4
8
16
32

GPU results pending.

CPU (FORCE_CPU=true)

Batch ms/batch ms/img img/s p95/img
1 253 253 4.0 315
4 821 205 4.9 214
8 1566 196 5.1 214
16 3181 199 5.0 205
32 6175 193 5.2 199

Model load: 18.8 s (CPU; first load, no cache)

CPU batching saturates quickly — throughput barely improves past batch 4 (~5 img/s ceiling). On GPU, large batches are expected to see significant throughput gains.


Running the benchmark

# Inside the container — GPU mode:
docker exec winnow python /app/scripts/benchmark.py

# CPU-only mode:
docker exec -e FORCE_CPU=true winnow python /app/scripts/benchmark.py

# Or directly with docker run:
docker run --rm --gpus all \
  --entrypoint /app/.venv/bin/python \
  -v /your/models:/insightface \
  -e INSIGHTFACE_HOME=/insightface \
  ghcr.io/sudolulo/winnow:latest \
  /app/scripts/benchmark.py

Clone this wiki locally