-
Notifications
You must be signed in to change notification settings - Fork 0
Benchmarks
Holden Salomon edited this page Jun 13, 2026
·
3 revisions
Inference latency and throughput for winnow's two embedding models, measured on real hardware. Run with the scripts/benchmark.py script (included in the repo).
| CPU | AMD / Intel (see per-run notes) |
| GPU | NVIDIA GeForce RTX 2070 SUPER |
| Driver | 570.172.08 |
| Container CUDA | 12.8.1 |
| Host OS | Debian 13 (TrueNAS LXC) |
Detection (RetinaFace det_10g) + ArcFace embedding (w600k_r50) on a 640×640 image. This is the pipeline winnow runs for every face crop it evaluates.
| Mode | Model load | Median latency | Throughput |
|---|---|---|---|
GPU (FORCE_CPU=false) |
— | — | — |
CPU (FORCE_CPU=true) |
4.3 s | 102 ms | 9.8 img/s |
GPU results pending CUDA 12.8.1 image build.
- Latency measured over 30 runs after 5 warmup iterations.
- Input: 640×640 synthetic image. The detection network processes the full input regardless of whether faces are found; timing is representative of real-world single-image throughput.
- 320×320 input: CPU median 101 ms — detection runtime is dominated by the fixed model overhead, not image size at these resolutions.
google/siglip-base-patch16-224 — 224×224 Vision Transformer used for object-mode diversity selection. Supports batched inference; GPU benefit scales with batch size.
| Batch | ms/batch | ms/img | img/s | p95/img |
|---|---|---|---|---|
| 1 | — | — | — | — |
| 4 | — | — | — | — |
| 8 | — | — | — | — |
| 16 | — | — | — | — |
| 32 | — | — | — | — |
GPU results pending.
| Batch | ms/batch | ms/img | img/s | p95/img |
|---|---|---|---|---|
| 1 | 253 | 253 | 4.0 | 315 |
| 4 | 821 | 205 | 4.9 | 214 |
| 8 | 1566 | 196 | 5.1 | 214 |
| 16 | 3181 | 199 | 5.0 | 205 |
| 32 | 6175 | 193 | 5.2 | 199 |
Model load: 18.8 s (CPU; first load, no cache)
CPU batching saturates quickly — throughput barely improves past batch 4 (~5 img/s ceiling). On GPU, large batches are expected to see significant throughput gains.
# Inside the container — GPU mode:
docker exec winnow python /app/scripts/benchmark.py
# CPU-only mode:
docker exec -e FORCE_CPU=true winnow python /app/scripts/benchmark.py
# Or directly with docker run:
docker run --rm --gpus all \
--entrypoint /app/.venv/bin/python \
-v /your/models:/insightface \
-e INSIGHTFACE_HOME=/insightface \
ghcr.io/sudolulo/winnow:latest \
/app/scripts/benchmark.py