PrefixSieve1024

This repository contains a C++23, header-only port of PrefixSieve1024, a deterministic prefix router for 16-bit keys. The implementation lives entirely in include/PrefixSieve1024.h and exposes a single lookup API:

int server = prefixsieve1024::PrefixSieve1024::GetServerId(prefix, serverCount);

The design follows the C# implementation closely:

65,536 possible 16-bit prefixes.
Up to 1,024 servers.
Deterministic integer partitioning rather than probabilistic balancing.
Precomputed lookup structures for fast steady-state routing.
Header-only initialization through function-local statics.

The repository also includes include/JumpHash.h as a comparison baseline and benchmarks/benchmark.cpp to measure both algorithms under the same call shape.

Build

Requirements:

A C++23 compiler.
CMake 3.10 or newer.

Build and run on Linux or macOS:

cmake -B build
cmake --build build
./build/benchmark

The benchmark target pulls Daniel Lemire's counters library through CPM as configured in CMakeLists.txt.

Using The Header

Minimal example:

#include "PrefixSieve1024.h"
#include <cstdint>

int main() {
	uint16_t prefix = 12345;
	int serverCount = 64;
	int server = prefixsieve1024::PrefixSieve1024::GetServerId(prefix, serverCount);
	return server;
}

Useful inspection helpers:

PrefixSieve1024::LowerRunCount() returns the number of compressed lower-run rows.
PrefixSieve1024::TailRowCount() returns the number of ranks that require tail decoding.
PrefixSieve1024::TailStreamByteCount() returns the size of the encoded tail stream.
PrefixSieve1024::ApproximateTableBytes() returns the approximate memory used by the precomputed tables.

The tables are constructed lazily on first use, then reused for all later lookups in the process.

Implementation Notes

The data structure in include/PrefixSieve1024.h is split into a fast lower path and a compact tail path:

RunRow stores a lower-owner bit mask, the last rank covered by the row, and a residual bias.
Bucket hints every 16 ranks narrow the search to a tiny run interval.
For serverCount <= 64, lookup stays in the mask path and resolves the owner with bit operations.
For larger server counts, a sparse tail window identifies ranks that need encoded upper-owner data.
The remaining cases continue from a checkpoint using the precomputed div/mod table.

On x86 targets, the implementation can use BMI2 bzhi for low-bit masking. On non-x86 targets it falls back to portable C++ bit logic.

What The Benchmark Measures

The benchmark in benchmarks/benchmark.cpp:

Generates 65,536 uniformly random 16-bit prefixes with a fixed seed.
Verifies that both PrefixSieve1024 and Jump Hash always return ids in [0, serverCount) for representative server counts.
Reports per-lookup latency and throughput for both algorithms at server counts 2, 8, 64, 256, and 1,024.

This is a microbenchmark of lookup cost only. It does not measure construction time, concurrent access patterns, cache interference from a larger application, or key distributions other than uniform random 16-bit prefixes.

Current Benchmark Output

On the current machine, ./build/benchmark prints:

Approx PrefixSieve1024 table bytes: 63312
JumpHash extra memory: 0 bytes
Keys per benchmark run: 65536

--- serverCount = 2 ---
PrefixSieve1024::GetServerId             :   0.920 ns   1.09 Gv/s
JumpHash::GetServerId                    :   2.672 ns   0.37 Gv/s
--- serverCount = 8 ---
PrefixSieve1024::GetServerId             :   0.905 ns   1.10 Gv/s
JumpHash::GetServerId                    :   8.619 ns   0.12 Gv/s
--- serverCount = 64 ---
PrefixSieve1024::GetServerId             :   0.914 ns   1.09 Gv/s
JumpHash::GetServerId                    :  20.507 ns   0.05 Gv/s
--- serverCount = 256 ---
PrefixSieve1024::GetServerId             :   7.589 ns   0.13 Gv/s
JumpHash::GetServerId                    :  27.095 ns   0.04 Gv/s
--- serverCount = 1024 ---
PrefixSieve1024::GetServerId             :  16.940 ns   0.06 Gv/s
JumpHash::GetServerId                    :  36.800 ns   0.03 Gv/s

The most important pattern in these numbers is that PrefixSieve1024 stays near 1 ns while the lookup remains in the lower 64-owner mask path, then slows once larger server counts require the tail or checkpoint logic. Even in the 1,024-server case from this benchmark, it remains materially faster than the Jump Hash baseline.

Repository Layout

include/PrefixSieve1024.h: header-only PrefixSieve1024 implementation.
include/JumpHash.h: baseline Jump Hash implementation used for comparison.
benchmarks/benchmark.cpp: benchmark driver and invariant checks.
CMakeLists.txt: build configuration.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PrefixSieve1024

Build

Using The Header

Implementation Notes

What The Benchmark Measures

Current Benchmark Output

Repository Layout

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
benchmarks		benchmarks
cmake		cmake
include		include
CMakeLists.txt		CMakeLists.txt
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

PrefixSieve1024

Build

Using The Header

Implementation Notes

What The Benchmark Measures

Current Benchmark Output

Repository Layout

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages