Benchmarks

Holden Salomon edited this page Jun 14, 2026 · 3 revisions

Inference latency and throughput for the InsightFace Buffalo_L model pack, measured on the hardware listed below. Results were produced with scripts/benchmark.py (included in the repo).


Hardware

GPU NVIDIA GeForce RTX 2070 SUPER
Driver 570.172.08
Container CUDA 12.8.1
Host OS Debian 13 (TrueNAS LXC)

InsightFace Buffalo_L

Detection (SCRFD det_10g) + ArcFace embedding (w600k_r50) on a 640×640 image. winnow runs this pipeline on every face crop it evaluates during diversity selection, and for crop alignment during upload.

RTX 2070 SUPER

| Mode | Model load | Median latency | Throughput |
| --- | --- | --- | --- |
| GPU (FORCE_CPU=false) | 2.5 s | 12.8 ms | 78.3 img/s |
| CPU (FORCE_CPU=true) | 4.3 s | 102 ms | 9.8 img/s |

The GPU is about 8× faster than the CPU for InsightFace (12.8 ms vs 102 ms median latency).

Notes

  • Latency measured over 30 runs after 5 warmup iterations.
  • Input: 640×640 synthetic image. The detection network processes the full input regardless of whether faces are found, so the timing is representative of real-world single-image latency.
  • 320×320 input: CPU 101 ms, GPU 13.4 ms. At these resolutions, runtime is dominated by fixed model overhead rather than image size.
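
The measurement procedure described in the notes (5 untimed warmup iterations, then 30 timed runs summarized by their median) can be sketched as follows. This is a minimal illustration, not the contents of scripts/benchmark.py: the `benchmark` function, its parameters, and the choice to compute throughput as runs divided by total timed seconds are all assumptions made for the example.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=30):
    """Time fn() and return (median latency in ms, throughput in calls/s).

    Hypothetical helper: the real script may structure this differently.
    """
    for _ in range(warmup):
        fn()  # warmup calls are executed but not timed

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    median_ms = statistics.median(samples_ms)
    # Assumption: throughput is derived from the total timed duration.
    throughput = 1000.0 * runs / sum(samples_ms)
    return median_ms, throughput


# Stand-in workload; a real run would call the face pipeline here instead.
median_ms, per_second = benchmark(lambda: sum(range(100_000)))
print(f"median {median_ms:.2f} ms, {per_second:.1f} calls/s")
```

Reporting the median rather than the mean keeps a single slow run (for example, one delayed by a background task) from skewing the headline latency.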

Practical impact

For a diversity pool of 500 candidates (typical for a well-tagged person):

| Mode | Embedding phase | Notes |
| --- | --- | --- |
| GPU | ~6 s | 500 images × 12.8 ms |
| CPU | ~51 s | 500 images × 102 ms |

For large libraries with 3000-candidate pools (the internal cap), the GPU cuts the embedding work in the diversity phase from ~5 min (3000 × 102 ms) to ~38 s (3000 × 12.8 ms).
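
The estimates above are simply pool size multiplied by median per-image latency. A short sketch of that arithmetic, assuming candidates are processed one at a time with no batching or overlap (the function name `embedding_phase_seconds` is made up for illustration):

```python
def embedding_phase_seconds(pool_size, median_latency_ms):
    """Estimate total embedding time, assuming strictly sequential images."""
    return pool_size * median_latency_ms / 1000.0


# Median latencies taken from the RTX 2070 SUPER results on this page.
MEDIAN_MS = {"GPU": 12.8, "CPU": 102.0}

for pool_size in (500, 3000):
    for mode, latency_ms in MEDIAN_MS.items():
        seconds = embedding_phase_seconds(pool_size, latency_ms)
        print(f"{pool_size} candidates on {mode}: {seconds:.1f} s")
```

To estimate for other hardware, substitute the median latency reported by a benchmark run on that machine.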


Running the benchmark

# Inside a running container — GPU mode:
docker exec winnow python /app/scripts/benchmark.py

# CPU-only mode:
docker exec -e FORCE_CPU=true winnow python /app/scripts/benchmark.py
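
This page does not show how winnow interprets FORCE_CPU. As a rough sketch of one way such a flag could map to ONNX Runtime execution providers, assuming the variable is read as a case-insensitive boolean string: the `select_providers` helper and the accepted truthy values are hypothetical, while the provider names are the standard ONNX Runtime identifiers.

```python
import os


def select_providers(environ=os.environ):
    """Return an ONNX Runtime provider list based on FORCE_CPU.

    Hypothetical helper; winnow's actual parsing may differ.
    """
    raw = environ.get("FORCE_CPU", "false")
    force_cpu = raw.strip().lower() in ("1", "true", "yes")
    if force_cpu:
        return ["CPUExecutionProvider"]
    # Prefer CUDA, falling back to CPU if the GPU provider is unavailable.
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]


print(select_providers({"FORCE_CPU": "true"}))
print(select_providers({}))
```

A list like this is the kind of value InsightFace's `FaceAnalysis` accepts through its `providers` argument when loading a model pack such as buffalo_l.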
