Benchmarks
Holden Salomon edited this page Jun 14, 2026 · 3 revisions
Inference latency and throughput for InsightFace Buffalo_L, measured on real hardware with `scripts/benchmark.py` (included in the repo).
| Component | Value |
|---|---|
| GPU | NVIDIA GeForce RTX 2070 SUPER |
| Driver | 570.172.08 |
| Container CUDA | 12.8.1 |
| Host OS | Debian 13 (TrueNAS LXC) |
Each measured iteration runs face detection (RetinaFace `det_10g`) followed by ArcFace embedding (`w600k_r50`) on a 640×640 image. This is the same pipeline winnow runs for every face crop it evaluates during diversity selection, and for crop alignment during upload.
| Mode | Model load | Median latency | Throughput |
|---|---|---|---|
| GPU (`FORCE_CPU=false`) | 2.5 s | 12.8 ms | 78.3 img/s |
| CPU (`FORCE_CPU=true`) | 4.3 s | 102 ms | 9.8 img/s |
On this hardware, InsightFace runs about 8× faster on the GPU than on the CPU (12.8 ms vs 102 ms median latency).
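As a sanity check, the throughput column and the 8× figure follow directly from the medians. The sketch below assumes strictly sequential, single-image calls; the measured throughput column was recorded separately by the benchmark, so the value derived from the median (about 78.1 img/s for GPU) will not match the table's 78.3 img/s exactly. The helper names are illustrative, not taken from `scripts/benchmark.py`.

```python
# Hypothetical helpers: derive throughput and speedup from the median latencies
# in the table. Assumes one image per call, processed back to back.
GPU_MEDIAN_MS = 12.8
CPU_MEDIAN_MS = 102.0


def throughput_img_per_s(median_ms: float) -> float:
    """Images per second implied by a per-image latency in milliseconds."""
    return 1000.0 / median_ms


def speedup(slow_ms: float, fast_ms: float) -> float:
    """How many times faster the fast configuration is than the slow one."""
    return slow_ms / fast_ms


if __name__ == "__main__":
    print(f"GPU ~{throughput_img_per_s(GPU_MEDIAN_MS):.1f} img/s")  # ~78.1
    print(f"CPU ~{throughput_img_per_s(CPU_MEDIAN_MS):.1f} img/s")  # ~9.8
    print(f"speedup ~{speedup(CPU_MEDIAN_MS, GPU_MEDIAN_MS):.1f}x")  # ~8.0
```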
- Latency measured over 30 runs after 5 warmup iterations.
- Input: 640×640 synthetic image. The detection network processes the full input regardless of whether faces are found; timing is representative of real-world single-image throughput.
- 320×320 input: CPU 101 ms, GPU 13.4 ms. Latency barely changes, so at these resolutions runtime is dominated by fixed model overhead rather than image size.
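The measurement procedure described in these notes (5 untimed warmup calls, then the median of 30 timed runs) can be sketched as follows. This is a minimal illustration, not the code in `scripts/benchmark.py`: `median_latency_ms` is a hypothetical name, and it assumes each call is synchronous, i.e. the wrapped function has finished all GPU work by the time it returns.

```python
import statistics
import time
from typing import Callable, List


def median_latency_ms(fn: Callable[[], object], warmup: int = 5, runs: int = 30) -> float:
    """Hypothetical harness: median wall-clock latency of fn() in milliseconds.

    Warmup calls run but are not timed, so one-off costs (lazy initialisation,
    kernel selection, cache fills) do not skew the result.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to occasional scheduler hiccups.
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; in practice this would wrap the detection + embedding call.
    print(f"{median_latency_ms(lambda: sum(range(100_000))):.2f} ms")
```

Reporting the median rather than the mean keeps a single slow outlier (for example a GC pause or a background process) from inflating the headline number.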
For a diversity pool of 500 candidates (typical for a well-tagged person):
| Mode | Embedding phase | Notes |
|---|---|---|
| GPU | ~6 s | 500 images × 12.8 ms |
| CPU | ~51 s | 500 images × 102 ms |
For large libraries with 3000-candidate pools (the internal cap), the GPU cuts the embedding phase of diversity selection from ~5 min (3000 × 102 ms) to ~38 s (3000 × 12.8 ms).
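The pool estimates are pool size × median latency. A sketch of that arithmetic, assuming embeddings are computed one image at a time and ignoring model load time (2.5 s / 4.3 s) and image decoding; `embedding_phase_seconds` is a hypothetical helper:

```python
# Median per-image latencies from the InsightFace table, in milliseconds.
MEDIAN_MS = {"GPU": 12.8, "CPU": 102.0}


def embedding_phase_seconds(pool_size: int, median_ms: float) -> float:
    """Hypothetical estimate: pool_size sequential embeddings at median_ms each.

    Ignores model load, image decoding, and any batching.
    """
    return pool_size * median_ms / 1000.0


if __name__ == "__main__":
    for pool in (500, 3000):  # typical pool and the internal cap
        for mode, ms in MEDIAN_MS.items():
            s = embedding_phase_seconds(pool, ms)
            print(f"{pool:>5} candidates, {mode}: ~{s:.1f} s (~{s / 60:.1f} min)")
```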
```bash
# Inside a running container, GPU mode:
docker exec winnow python /app/scripts/benchmark.py
# CPU-only mode:
docker exec -e FORCE_CPU=true winnow python /app/scripts/benchmark.py
```
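`FORCE_CPU` presumably decides which ONNX Runtime execution providers the InsightFace models are loaded with. The sketch below is an assumption about how such a flag could be mapped, not winnow's actual code: `onnx_providers` is hypothetical and the set of accepted truthy values is a guess. `CUDAExecutionProvider` and `CPUExecutionProvider` are the standard ONNX Runtime provider names.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical: strings treated as "true" for FORCE_CPU (an assumption, not winnow's rule).
_TRUTHY = {"1", "true", "yes", "on"}


def onnx_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical mapping from FORCE_CPU to an ONNX Runtime provider list."""
    env = os.environ if env is None else env
    force_cpu = env.get("FORCE_CPU", "false").strip().lower() in _TRUTHY
    if force_cpu:
        return ["CPUExecutionProvider"]
    # Prefer CUDA; ONNX Runtime falls back to the next provider if CUDA is unavailable.
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]


if __name__ == "__main__":
    print(onnx_providers())
```

A list like this is the form InsightFace accepts through the `providers` argument, e.g. `insightface.app.FaceAnalysis(name="buffalo_l", providers=onnx_providers())`.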