Benchmarks

Inference latency and throughput for InsightFace Buffalo_L, measured on the hardware listed below. All numbers were produced with `scripts/benchmark.py` (included in the repo).

Hardware


GPU	NVIDIA GeForce RTX 2070 SUPER
Driver	570.172.08
Container CUDA	12.8.1
Host OS	Debian 13 (TrueNAS LXC)

InsightFace Buffalo_L

Detection (SCRFD det_10g) + ArcFace embedding (w600k_r50) on a 640×640 image. This is the pipeline winnow runs for every face crop it evaluates during diversity selection and for crop alignment during upload.

RTX 2070 SUPER

Mode	Model load	Median latency	Throughput
GPU (`FORCE_CPU=false`)	2.5 s	12.8 ms	78.3 img/s
CPU (`FORCE_CPU=true`)	4.3 s	102 ms	9.8 img/s

GPU is about 8× faster than CPU for InsightFace (12.8 ms vs 102 ms median latency).

Notes

Latency measured over 30 runs after 5 warmup iterations.
Input: 640×640 synthetic image. The detection network processes the full input regardless of whether faces are found, so the timing should be representative of real-world single-image latency.
320×320 input: CPU 101 ms, GPU 13.4 ms. Halving the input size barely changes latency, so at these resolutions runtime is dominated by fixed model overhead rather than image size.
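
The measurement procedure described above (untimed warmup iterations, then a median over timed runs) can be sketched roughly as follows. This is not the code in `scripts/benchmark.py`; `measure_latency` and `fake_inference` are hypothetical names, and deriving throughput from the median is an assumption made for this illustration.

```python
import statistics
import time


def measure_latency(fn, warmup=5, runs=30):
    """Return (median_ms, images_per_second) for a zero-argument callable."""
    for _ in range(warmup):
        fn()  # warmup iterations run the workload but are not timed

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    median_ms = statistics.median(samples_ms)
    # Assumption: throughput is the reciprocal of the median latency.
    throughput = 1000.0 / median_ms if median_ms > 0 else float("inf")
    return median_ms, throughput


def fake_inference():
    # Hypothetical stand-in for one detection + embedding pass.
    time.sleep(0.002)


median_ms, throughput = measure_latency(fake_inference)
print(f"median {median_ms:.1f} ms, {throughput:.1f} img/s")
```

Using the median rather than the mean keeps a single slow run (for example, one interrupted by another process) from skewing the reported latency.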

Practical impact

For a diversity pool of 500 candidates (typical for a well-tagged person):

Mode	Embedding phase	Notes
GPU	~6 s	500 images × 12.8 ms
CPU	~51 s	500 images × 102 ms

For large libraries with 3000-candidate pools (the internal cap), the GPU reduces the embedding phase from ~5 min (3000 × 102 ms) to ~38 s (3000 × 12.8 ms).
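
The estimates above are simply pool size multiplied by median latency. A short sketch of that arithmetic, using the measured medians from the table (the function name `embedding_phase_seconds` is made up for this example):

```python
GPU_MEDIAN_MS = 12.8   # measured median latency, GPU mode
CPU_MEDIAN_MS = 102.0  # measured median latency, CPU mode


def embedding_phase_seconds(pool_size, median_latency_ms):
    """Estimate the embedding phase duration for a candidate pool, in seconds."""
    return pool_size * median_latency_ms / 1000.0


for pool_size in (500, 3000):
    gpu_s = embedding_phase_seconds(pool_size, GPU_MEDIAN_MS)
    cpu_s = embedding_phase_seconds(pool_size, CPU_MEDIAN_MS)
    print(f"{pool_size} candidates: GPU {gpu_s:.1f} s, CPU {cpu_s:.1f} s")
```

This assumes images are processed one at a time with no batching or other overhead, which is also what the table assumes.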

Running the benchmark

```bash
# Inside a running container, GPU mode:
docker exec winnow python /app/scripts/benchmark.py

# CPU-only mode:
docker exec -e FORCE_CPU=true winnow python /app/scripts/benchmark.py
```
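
For readers wondering how a flag like `FORCE_CPU` could switch modes: InsightFace runs its models through ONNX Runtime, which accepts an ordered list of execution providers. The sketch below shows one plausible mapping from the environment variable to that list. It is an illustration only; `select_providers` is a hypothetical helper, the accepted truthy values are an assumption, and winnow's actual handling of `FORCE_CPU` may differ.

```python
import os


def select_providers(environ=None):
    """Map FORCE_CPU to an ordered list of ONNX Runtime execution provider names."""
    env = os.environ if environ is None else environ
    # Assumption: "true", "1" and "yes" (case-insensitive) all force CPU mode.
    force_cpu = env.get("FORCE_CPU", "false").strip().lower() in {"1", "true", "yes"}
    if force_cpu:
        return ["CPUExecutionProvider"]
    # Providers are listed in priority order; CPU is kept as a fallback.
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]


print(select_providers({"FORCE_CPU": "true"}))
print(select_providers({}))
```

The returned list is the kind of value that could be passed as the `providers` argument when creating an ONNX Runtime session.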
