Hardware Performance
Each VSX card has 42 GB of storage for holding reference vectors.
Each VSX card searches its memory at a rate of 20 ms / GB.
This leads to a very simple equation for calculating the expected latency for a single query: Latency = (dataset size in GB / number of VSX cards) x 20 ms/GB.
Note that datasets are always distributed evenly across all available VSX cards.
Remember when applying this calculation that:
- After integer mapping, your components are 8-bit integers and not 32-bit floats.
- 1 GB = 2^30 Bytes (not 1E9 bytes!). Similarly, 1 GB = 2^10 MB.
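The rule above can be sketched as a small helper function (the function name and defaults are illustrative, not part of any VSX API):

```python
def query_latency_ms(num_vectors, components_per_vector, num_cards,
                     search_rate_ms_per_gb=20.0):
    """Expected single-query latency in milliseconds.

    Assumes 1 byte per component (8-bit integers after integer mapping)
    and 1 GB = 2**30 bytes.
    """
    dataset_gb = num_vectors * components_per_vector / 2**30
    # The dataset is split evenly across cards; each card scans its
    # share of memory at the fixed search rate (20 ms/GB by default).
    return dataset_gb / num_cards * search_rate_ms_per_gb
```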
Each VSX card has 16 query "slots," which allow it to process up to 16 queries simultaneously. This means that when submitting queries as a batch, you can divide the total latency by 16.
The MNIST training dataset consists of 55,000 reference vectors with 1,024 components each. Therefore, using two VSX cards, the expected latency for a single query is given by:
Latency = (55,000 vectors x 1,024 bytes / 2^30 bytes/GB) / 2 VSX cards x 20 ms/GB ≈ 0.52 ms
If we instead present a batch of 1,024 queries, the latency becomes:
Batch Latency = 1,024 queries x 0.52 ms/query / 16 slots ≈ 33 ms
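Both worked examples can be checked numerically; this self-contained sketch just restates the arithmetic from the two equations above:

```python
# Single-query latency for MNIST: 55,000 vectors x 1,024 one-byte
# components, split evenly across 2 VSX cards scanning at 20 ms/GB
# (using 1 GB = 2**30 bytes).
dataset_gb = 55_000 * 1_024 / 2**30
single_ms = dataset_gb / 2 * 20.0      # about 0.52 ms

# A batch of 1,024 queries, with 16 slots processing in parallel.
batch_ms = 1_024 * single_ms / 16      # about 33-34 ms
print(f"single: {single_ms:.2f} ms, batch: {batch_ms:.1f} ms")
```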
Brute Force Benchmarks
ANN Benchmarks