This repository contains a C++23, header-only port of PrefixSieve1024, a deterministic prefix router for 16-bit keys. The implementation lives entirely in include/PrefixSieve1024.h and exposes a single lookup API:
int server = prefixsieve1024::PrefixSieve1024::GetServerId(prefix, serverCount);The design follows the C# implementation closely:
- 65,536 possible 16-bit prefixes.
- Up to 1,024 servers.
- Deterministic integer partitioning rather than probabilistic balancing.
- Precomputed lookup structures for fast steady-state routing.
- Header-only initialization through function-local statics.
The repository also includes include/JumpHash.h as a comparison baseline and benchmarks/benchmark.cpp to measure both algorithms under the same call shape.
Requirements:
- A C++23 compiler.
- CMake 3.10 or newer.
Build and run on Linux or macOS:
cmake -B build
cmake --build build
./build/benchmarkThe benchmark target pulls Daniel Lemire's counters library through CPM as configured in CMakeLists.txt.
Minimal example:
#include "PrefixSieve1024.h"
#include <cstdint>
int main() {
uint16_t prefix = 12345;
int serverCount = 64;
int server = prefixsieve1024::PrefixSieve1024::GetServerId(prefix, serverCount);
return server;
}Useful inspection helpers:
PrefixSieve1024::LowerRunCount()returns the number of compressed lower-run rows.PrefixSieve1024::TailRowCount()returns the number of ranks that require tail decoding.PrefixSieve1024::TailStreamByteCount()returns the size of the encoded tail stream.PrefixSieve1024::ApproximateTableBytes()returns the approximate memory used by the precomputed tables.
The tables are constructed lazily on first use, then reused for all later lookups in the process.
The data structure in include/PrefixSieve1024.h is split into a fast lower path and a compact tail path:
RunRowstores a lower-owner bit mask, the last rank covered by the row, and a residual bias.- Bucket hints every 16 ranks narrow the search to a tiny run interval.
- For
serverCount <= 64, lookup stays in the mask path and resolves the owner with bit operations. - For larger server counts, a sparse tail window identifies ranks that need encoded upper-owner data.
- The remaining cases continue from a checkpoint using the precomputed div/mod table.
On x86 targets, the implementation can use BMI2 bzhi for low-bit masking. On non-x86 targets it falls back to portable C++ bit logic.
The benchmark in benchmarks/benchmark.cpp:
- Generates 65,536 uniformly random 16-bit prefixes with a fixed seed.
- Verifies that both PrefixSieve1024 and Jump Hash always return ids in
[0, serverCount)for representative server counts. - Reports per-lookup latency and throughput for both algorithms at server counts 2, 8, 64, 256, and 1,024.
This is a microbenchmark of lookup cost only. It does not measure construction time, concurrent access patterns, cache interference from a larger application, or key distributions other than uniform random 16-bit prefixes.
On the current machine, ./build/benchmark prints:
Approx PrefixSieve1024 table bytes: 63312
JumpHash extra memory: 0 bytes
Keys per benchmark run: 65536
--- serverCount = 2 ---
PrefixSieve1024::GetServerId : 0.920 ns 1.09 Gv/s
JumpHash::GetServerId : 2.672 ns 0.37 Gv/s
--- serverCount = 8 ---
PrefixSieve1024::GetServerId : 0.905 ns 1.10 Gv/s
JumpHash::GetServerId : 8.619 ns 0.12 Gv/s
--- serverCount = 64 ---
PrefixSieve1024::GetServerId : 0.914 ns 1.09 Gv/s
JumpHash::GetServerId : 20.507 ns 0.05 Gv/s
--- serverCount = 256 ---
PrefixSieve1024::GetServerId : 7.589 ns 0.13 Gv/s
JumpHash::GetServerId : 27.095 ns 0.04 Gv/s
--- serverCount = 1024 ---
PrefixSieve1024::GetServerId : 16.940 ns 0.06 Gv/s
JumpHash::GetServerId : 36.800 ns 0.03 Gv/s
The most important pattern in these numbers is that PrefixSieve1024 stays near 1 ns while the lookup remains in the lower 64-owner mask path, then slows once larger server counts require the tail or checkpoint logic. Even in the 1,024-server case from this benchmark, it remains materially faster than the Jump Hash baseline.
- include/PrefixSieve1024.h: header-only PrefixSieve1024 implementation.
- include/JumpHash.h: baseline Jump Hash implementation used for comparison.
- benchmarks/benchmark.cpp: benchmark driver and invariant checks.
- CMakeLists.txt: build configuration.