Fast, immutable, compact map from strings to 64-bit integers — for Python.
fastconstmap is a C implementation of the binary fuse filter
construction (a static perfect-hash-like structure), exposed to Python.
Given a dict[str, int] at build time, you get back a lookup object that:
- uses ~9 bytes per key (or ~18 with missing-key detection),
- answers a lookup in one xxhash call plus three array reads,
- is immutable and serializable to bytes / a file,
- exposes both a single-key API (
m[key]) and a batched API (m.get_many([...])) that amortises Python-C call overhead.
This package is a C port of the Go library
github.com/lemire/constmap. It vendors
xxHash (BSD-2) for string hashing.
pip install fastconstmap
To build from source you need a C compiler. There are no Python runtime dependencies.
from fastconstmap import ConstMap, VerifiedConstMap
d = {"apple": 100, "banana": 200, "cherry": 300}
# Variant 1: minimal memory, no missing-key detection.
m = ConstMap(d)
m["apple"] # -> 100
m.get_many(["banana", "cherry"]) # -> [200, 300]
m["grape"] # undefined value! use VerifiedConstMap if you care.
# Variant 2: dict-like, detects keys not in the original mapping.
vm = VerifiedConstMap(d)
vm["apple"] # -> 100
vm.get("grape") # -> None
vm.get("grape", -1) # -> -1
"grape" in vm # -> False
vm["grape"] # raises KeyError
vm.get_many(["banana", "grape"], default=-1) # -> [200, -1]
# Either kind can be saved and loaded.
m.save("mymap.cmap")
m2 = ConstMap.load("mymap.cmap")
# ... or as raw bytes.
blob = m.to_bytes()
m3 = ConstMap.from_bytes(blob)ConstMap |
VerifiedConstMap |
|
|---|---|---|
| Bytes per key | ~9 | ~18 |
| Lookup of present key | value | value |
| Lookup of missing key | undefined garbage | KeyError / default / None |
| Best if | you always look up known keys | you need dict-like semantics |
The false-positive rate of VerifiedConstMap (a missing key wrongly
reported as present) is roughly 2⁻⁶⁴, which is negligible in practice.
- Keys may be
strorbytes.stris encoded as UTF-8 internally; lookups must use the same encoding to match. - Values are 64-bit integers. We accept anything in
[-2**63, 2**64 - 1]; negatives are stored via two's complement, som[k]returns2**64 - 1for a value of-1. (To recover the signed reading, reinterpret bits yourself.)
Keys must be unique (Python dict semantics already guarantee this).
Construction raises ValueError in the extremely unlikely event of an
xxhash collision (~2⁻⁶⁴ per key pair).
A map's serialized form is its in-memory lookup array (plus a small
header). That means it can live in a
multiprocessing.shared_memory
block and be opened by any number of processes without copying — every
process reads the same physical pages.
Three methods make this work:
| Method | Purpose |
|---|---|
m.serialized_size() |
bytes needed to hold the serialized map |
m.write_into(buffer) |
serialize straight into a writable buffer (no intermediate bytes) |
ConstMap.from_buffer(buffer) |
open a zero-copy map that reads directly from buffer |
Producer — build once, publish into a named shared-memory block:
from multiprocessing.shared_memory import SharedMemory
from fastconstmap import ConstMap
SHM_NAME = "fastconstmap_demo"
cm = ConstMap({f"key-{i}": i for i in range(1_000_000)})
shm = SharedMemory(create=True, size=cm.serialized_size(), name=SHM_NAME)
cm.write_into(shm.buf)
# keep `shm` alive (do not close/unlink) while consumers are runningConsumer — attach to the same name with no copy:
from multiprocessing.shared_memory import SharedMemory
from fastconstmap import ConstMap
SHM_NAME = "fastconstmap_demo"
shm = SharedMemory(name=SHM_NAME) # attach by name, no `create=`
cm = ConstMap.from_buffer(shm.buf) # zero-copy: no per-process copy
cm["key-42"] # reads straight from shared memoryChoosing the name yourself (rather than letting SharedMemory generate
one) means consumers can hard-code it or read it from config — no need to
pass the auto-generated name around. Pick a unique name; creating a block
whose name already exists raises FileExistsError.
Notes and constraints:
from_bufferdoes not copy. The returned map holds a reference to the buffer; the buffer (and, for shared memory, theSharedMemoryobject) must stay alive and must not be closed or mutated while the map is in use. Drop the map (del cm) before callingshm.close().- The map is immutable — the intended pattern is write once in the producer, then only read in every process. Concurrent readers need no locking.
from_bufferverifies the magic bytes and the FNV-1a checksum, so a truncated or corrupt block raisesValueErrorrather than returning garbage.- Requirements: a little-endian host (x86-64, ARM64, …) and an
8-byte-aligned buffer.
SharedMemory.buf,bytes, andbytearrayall satisfy the alignment requirement; an offset slice of a buffer may not, in which casefrom_bufferraisesValueError— usefrom_bytes()(which copies) instead. VerifiedConstMapsupports the same three methods.
from_bytes() remains available when you want an owned copy (or need to
load on a big-endian host): it copies the data and the resulting map owns
its memory independently of the source buffer.
On an Apple M-series CPU, with 1,000,000 string keys
(key-{i}-{hex} shaped strings):
=== fastconstmap benchmark — n = 1,000,000 keys, python 3.14.3 ===
Construction:
ConstMap.__init__ 0.145 s
VerifiedConstMap.__init__ 0.129 s
dict(d) 0.005 s
Memory:
ConstMap.nbytes 9,043,968 bytes (9.04 bytes/key)
VerifiedConstMap.nbytes 18,087,936 bytes (18.09 bytes/key)
dict (table+keys+values) 118,380,958 bytes (118.38 bytes/key)
ratio dict / ConstMap 13.1x
Single lookup, 2,000,000 ops:
dict[k] 397.7 ns/op (0.795 s total)
ConstMap[k] 179.3 ns/op (0.359 s total)
VerifiedConstMap[k] 213.2 ns/op (0.426 s total)
Batched lookup, 2000 × 1024:
dict comprehension 31.9 ns/op (0.03 ms/batch of 1024)
ConstMap.get_many 14.6 ns/op (0.01 ms/batch of 1024)
VerifiedConstMap.get_many 16.5 ns/op (0.02 ms/batch of 1024)
Serialization:
ConstMap.to_bytes 0.009 s (9,044,004 bytes)
ConstMap.from_bytes 0.009 s
For better performance use get_many when you have an array of keys to look
up at once.
To reproduce:
python benchmarks/benchmark.py 1000000
Given n (key, value) pairs the algorithm:
- Hashes each key with xxhash3 to a 64-bit value.
- Maps each hashed key to three positions
h0, h1, h2in an array of size ~1.125·n, using overlapping segments. - Finds, via peeling, an ordering in which each key has an
exclusive cell among its three; walks that ordering in reverse,
setting each cell so
array[h0] ^ array[h1] ^ array[h2] == value.
Lookup is one xxhash, three array reads, and two XORs.
References:
Thomas Mueller Graf and Daniel Lemire, Binary Fuse Filters: Fast and Smaller Than Xor Filters, ACM Journal of Experimental Algorithmics, Vol. 27, 2022. DOI: 10.1145/3510449
Apache License 2.0. See LICENSE.
fastconstmap vendors xxHash, which is licensed under the BSD-2-clause
license; see src/third_party/xxhash/LICENSE.