Benchmarks

Inference latency and throughput for winnow's two embedding models, measured on the hardware listed below. All numbers come from `scripts/benchmark.py`, which is included in the repo.

Hardware


CPU	AMD / Intel (see per-run notes)
GPU	NVIDIA GeForce RTX 2070 SUPER
Driver	570.172.08
Container CUDA	12.8.1
Host OS	Debian 13 (TrueNAS LXC)

InsightFace Buffalo_L — face mode

Detection (RetinaFace det_10g) + ArcFace embedding (w600k_r50) on a 640×640 image. This is the pipeline winnow runs for every face crop it evaluates.

RTX 2070 SUPER

Mode	Model load	Median latency	Throughput
GPU (`FORCE_CPU=false`)	2.5 s	12.8 ms	78.3 img/s
CPU (`FORCE_CPU=true`)	4.3 s	102 ms	9.8 img/s

Notes

Latency measured over 30 runs after 5 warmup iterations.
Input: 640×640 synthetic image. The detection network processes the full input regardless of whether faces are found; timing is representative of real-world single-image throughput.
320×320 input: CPU 101 ms, GPU 13.4 ms — detection runtime is dominated by fixed model overhead, not image size at these resolutions.
GPU is roughly 8× faster than CPU for InsightFace (12.8 ms vs 102 ms median latency).
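The measurement procedure in the notes above (5 warmup iterations, 30 timed runs, median latency) can be sketched as follows. This is a minimal illustration, not the code in `scripts/benchmark.py`: the `benchmark` helper and the `fake_inference` workload are hypothetical stand-ins, and throughput is derived here as 1000 / median latency, which may differ slightly from how the table's throughput column was computed.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=30):
    """Time fn() and return (median latency in ms, images per second)."""
    # Warmup calls are executed but not timed, so one-off costs
    # (lazy initialisation, cache fills) do not skew the samples.
    for _ in range(warmup):
        fn()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    median_ms = statistics.median(samples_ms)
    # One image per call, so throughput is the reciprocal of latency.
    return median_ms, 1000.0 / median_ms


def fake_inference():
    # Hypothetical stand-in for one detection + embedding pass.
    sum(i * i for i in range(20_000))


median_ms, img_per_s = benchmark(fake_inference)
print(f"median {median_ms:.2f} ms, {img_per_s:.1f} img/s")
```

When timing a real GPU model, the timed call has to block until the device has finished (for example by copying the result back to host memory); otherwise only the kernel launch is measured.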

SigLIP ViT-B/16 — object mode

google/siglip-base-patch16-224 — 224×224 Vision Transformer used for object-mode diversity selection. Supports batched inference; GPU benefit scales with batch size.

RTX 2070 SUPER

GPU (`FORCE_CPU=false`)

Batch	ms/batch	ms/img	img/s	p95 ms/img
1	13.1	13.13	76.1	14.2
4	24.4	6.10	163.9	6.2
8	45.4	5.67	176.3	5.7
16	87.7	5.48	182.4	5.5
32	171.5	5.36	186.6	5.4

CPU (`FORCE_CPU=true`)

Batch	ms/batch	ms/img	img/s	p95 ms/img
1	216	216	4.6	243
4	757	189	5.3	192
8	1450	181	5.5	202
16	2846	178	5.6	188
32	5683	178	5.6	188

Model load: 16.1 s (CPU; first load, no cache)

CPU batching saturates quickly: throughput barely improves past batch 4 and levels off at about 5.5 to 5.6 img/s. At batch 32 the GPU is roughly 33× faster than the CPU (186.6 img/s vs 5.6 img/s).
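The per-image columns and the speedup figure follow directly from the ms/batch column. The short sketch below recomputes them from the values in the two tables; the `per_image` helper is a name introduced here for illustration only, and small differences from the tables come from rounding in the published ms/batch values.

```python
def per_image(batch_size, ms_per_batch):
    """Return (ms per image, images per second) for one batch timing."""
    ms_img = ms_per_batch / batch_size
    return ms_img, 1000.0 / ms_img


# ms/batch values copied from the GPU and CPU tables above.
gpu_ms_per_batch = {1: 13.1, 4: 24.4, 8: 45.4, 16: 87.7, 32: 171.5}
cpu_ms_per_batch = {1: 216, 4: 757, 8: 1450, 16: 2846, 32: 5683}

for batch in gpu_ms_per_batch:
    _, gpu_ips = per_image(batch, gpu_ms_per_batch[batch])
    _, cpu_ips = per_image(batch, cpu_ms_per_batch[batch])
    print(
        f"batch {batch:>2}: GPU {gpu_ips:6.1f} img/s, "
        f"CPU {cpu_ips:4.1f} img/s, speedup {gpu_ips / cpu_ips:4.1f}x"
    )
```

The speedup grows with batch size because GPU throughput keeps rising up to batch 32 while CPU throughput stays nearly flat.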

Running the benchmark

# Inside the container — GPU mode:
docker exec winnow python /app/scripts/benchmark.py

# CPU-only mode:
docker exec -e FORCE_CPU=true winnow python /app/scripts/benchmark.py

# Or directly with docker run:
docker run --rm --gpus all \
  --entrypoint /app/.venv/bin/python \
  -v /your/models:/insightface \
  -e INSIGHTFACE_HOME=/insightface \
  ghcr.io/sudolulo/winnow:latest \
  /app/scripts/benchmark.py
