Benchmarks

Inference latency and throughput for winnow's two embedding models, measured on real hardware. All results come from `scripts/benchmark.py`, which is included in the repo.

Hardware


CPU	AMD / Intel (see per-run notes)
GPU	NVIDIA GeForce RTX 2070 SUPER
Driver	570.172.08
Container CUDA	12.8.1
Host OS	Debian 13 (TrueNAS LXC)

InsightFace Buffalo_L — face mode

Detection (RetinaFace det_10g) + ArcFace embedding (w600k_r50) on a 640×640 image. This is the pipeline winnow runs for every face crop it evaluates.

RTX 2070 SUPER

Mode	Model load	Median latency	Throughput
GPU (`FORCE_CPU=false`)	—	—	—
CPU (`FORCE_CPU=true`)	4.3 s	102 ms	9.8 img/s

GPU results pending CUDA 12.8.1 image build.

Notes

Latency measured over 30 runs after 5 warmup iterations.
Input: 640×640 synthetic image. The detection network processes the full input regardless of whether faces are found; timing is representative of real-world single-image throughput.
320×320 input: CPU median 101 ms, nearly identical to the 102 ms measured at 640×640. At these resolutions, detection runtime is dominated by fixed model overhead rather than image size.
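The measurement procedure described in these notes (untimed warmup iterations, then timed runs summarized by their median) can be sketched as follows. This is a minimal illustration, not the code in `scripts/benchmark.py`: the `benchmark` helper and the `fake_inference` stand-in are hypothetical names, and deriving throughput as the reciprocal of the median latency is an assumption, although it is consistent with the CPU row above (1000 / 102 ms ≈ 9.8 img/s).

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=30):
    """Time fn() and return (median latency in ms, throughput in calls/s)."""
    # Warmup calls are executed but not timed, so one-off costs
    # (lazy initialization, cache fills) do not skew the samples.
    for _ in range(warmup):
        fn()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    median_ms = statistics.median(samples_ms)
    # Assumption: throughput is taken as the reciprocal of the median latency.
    return median_ms, 1000.0 / median_ms


def fake_inference():
    # Hypothetical stand-in for one detection + embedding call.
    time.sleep(0.002)


median_ms, per_second = benchmark(fake_inference)
print(f"median {median_ms:.1f} ms, {per_second:.1f} calls/s")
```

The median is less sensitive than the mean to occasional slow runs, which is why a single stalled iteration does not move the reported latency much.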

SigLIP ViT-B/16 — object mode

google/siglip-base-patch16-224 — 224×224 Vision Transformer used for object-mode diversity selection. Supports batched inference; GPU benefit scales with batch size.

RTX 2070 SUPER

GPU (`FORCE_CPU=false`)

Batch	ms/batch	ms/img	img/s	p95 ms/img
1	—	—	—	—
4	—	—	—	—
8	—	—	—	—
16	—	—	—	—
32	—	—	—	—

GPU results pending.

CPU (`FORCE_CPU=true`)

Batch	ms/batch	ms/img	img/s	p95 ms/img
1	253	253	4.0	315
4	821	205	4.9	214
8	1566	196	5.1	214
16	3181	199	5.0	205
32	6175	193	5.2	199

Model load: 18.8 s (CPU; first load, no cache)

CPU batching saturates quickly: throughput rises from 4.0 img/s at batch 1 to 4.9 img/s at batch 4, then plateaus near 5 img/s. On GPU, larger batches are expected to yield significant throughput gains, but those measurements are still pending.
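The per-image columns in the CPU table can be re-derived from the ms/batch column. The sketch below assumes the columns are related as ms/img = ms/batch ÷ batch size and img/s = 1000 ÷ ms/img, which matches the published values after rounding; `per_image_metrics` is a hypothetical helper, not part of the benchmark script.

```python
# ms/batch values copied from the CPU table above.
cpu_ms_per_batch = {1: 253, 4: 821, 8: 1566, 16: 3181, 32: 6175}


def per_image_metrics(batch_size, ms_per_batch):
    """Return (ms per image, images per second) for one batch measurement."""
    ms_per_img = ms_per_batch / batch_size
    return ms_per_img, 1000.0 / ms_per_img


# Batch 1 is the baseline against which batching gains are compared.
_, baseline_ips = per_image_metrics(1, cpu_ms_per_batch[1])

for batch_size, ms_per_batch in cpu_ms_per_batch.items():
    ms_per_img, img_per_s = per_image_metrics(batch_size, ms_per_batch)
    speedup = img_per_s / baseline_ips
    print(
        f"batch {batch_size:>2}: {ms_per_img:6.1f} ms/img, "
        f"{img_per_s:.1f} img/s, {speedup:.2f}x vs batch 1"
    )
```

Expressing each row as a speedup over batch 1 makes the plateau easy to see: most of the gain arrives by batch 4, and larger batches add very little on CPU.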

Running the benchmark

# Inside the container — GPU mode:
docker exec winnow python /app/scripts/benchmark.py

# CPU-only mode:
docker exec -e FORCE_CPU=true winnow python /app/scripts/benchmark.py

# Or directly with docker run:
docker run --rm --gpus all \
  --entrypoint /app/.venv/bin/python \
  -v /your/models:/insightface \
  -e INSIGHTFACE_HOME=/insightface \
  ghcr.io/sudolulo/winnow:latest \
  /app/scripts/benchmark.py
