All changes preserve the public API exactly. Happy to submit a PR if any or all of these look worthwhile — raising as an issue first since some involve trade-offs worth discussing.
1. POLYGLOT_RANDOM_ARRAY — list → tuple
The 781-element constant array is changed from a list literal to a tuple literal and annotated as Tuple[int, ...].
Why it matters:
CPython list indexing requires a bounds check and a pointer indirection through an overallocated backing array. Tuple indexing uses a fixed-size C array with no overallocation, so reads are slightly faster and the object is ~64 bytes smaller. hash_board() indexes into this array once per occupied square (up to 32 times per call), so the saving compounds across every position evaluation.
The ZobristHasher.__init__ signature is widened from List[int] to Union[List[int], Tuple[int, ...]] so existing callers passing a list are unaffected.
2. ZobristHasher.hash_board — typed bitboard iteration
Original approach:
for pivot, squares in enumerate(board.occupied_co):
for square in chess.scan_reversed(squares):
piece_index = (typing.cast(chess.PieceType, board.piece_type_at(square)) - 1) * 2 + pivot
zobrist_hash ^= self.array[64 * piece_index + square]
Each square requires a BB_SQUARES[square] lookup plus up to 6 bitboard membership tests inside piece_type_at() (expected ~3.5 on an average board).
New approach:
Iterate the six typed piece bitboards (board.pawns, board.knights, etc.) directly, masking with occupied_co[WHITE] and occupied_co[BLACK]. Use the LSB trick (lsb = bb & -bb; sq = lsb.bit_length() - 1) instead of scan_reversed to avoid Python generator overhead. Pre-compute array base offsets once per piece type.
What is eliminated per occupied square:
- One
BB_SQUARES[square] lookup
- One
piece_type_at() call (up to 6 bitboard membership tests)
- The color branch is hoisted outside the inner loop
A module-level _PIECE_BB_OFFSET tuple pre-computes the six (piece_type - 1) * 2 values to eliminate constant multiplication from the inner loop.
3. MemoryMappedReader.iter — struct.iter_unpack over memoryview
Original:
for i in range(len(self)):
yield self[i]
Each self[i] call computes index * ENTRY_STRUCT.size, calls unpack_from(), and decodes move fields. Index arithmetic and __getitem__'s negative-index handling and IndexError guard are all unnecessary during full sequential iteration.
New:
mv = memoryview(self.mmap)
for key, raw_move, weight, learn in ENTRY_STRUCT.iter_unpack(mv):
...
struct.iter_unpack() advances through the buffer internally in C with no Python-level index arithmetic. The memoryview avoids copying bytes out of the mmap. Move decoding is inlined, eliminating the __getitem__ call frame per entry.
Note: this duplicates the move-decoding logic from __getitem__. A shared _decode_entry() helper could consolidate them at a small per-entry call cost if preferred.
4. MemoryMappedReader.choice — single collect + rng.choice
Original reservoir sampling:
chosen_entry = None
for i, entry in enumerate(self.find_all(...)):
if chosen_entry is None or _randint(rng, 0, i) == i:
chosen_entry = entry
_randint() is called once per entry after the first, each time allocating a Python frame. Uniform selection over N entries requires N−1 random draws.
New:
entries = list(self.find_all(...))
return rng.choice(entries)
random.Random.choice() is implemented in C and performs a single uniform draw. For the common case of a small number of book entries (typically 1–20) the list allocation is negligible and the code is simpler.
Trade-off: reservoir sampling preserves O(1) memory if find_all ever returns a very large number of entries. For opening books this is practically irrelevant, but worth noting.
The module-level _randint() helper is removed. A module-level _random_module instance (seeded from OS entropy) is added so callers passing random=None get a properly seeded RNG rather than relying on the implicit global state of the random module.
5. MemoryMappedReader.weighted_choice — single-pass collect
Original:
total_weights = sum(entry.weight for entry in self.find_all(board, ...))
...
for entry in self.find_all(board, ...): # second full scan
...
find_all() scans the sorted key range twice — one bisect_key_left binary search and sequential read per scan.
New:
entries = list(self.find_all(board, ...))
total_weights = sum(entry.weight for entry in entries)
...
for entry in entries:
...
One scan, one bisect. Memory cost is the same small list of Entry namedtuples that was being iterated twice anyway.
A patch against current master is available if useful.
All changes preserve the public API exactly. Happy to submit a PR if any or all of these look worthwhile — raising as an issue first since some involve trade-offs worth discussing.
1. POLYGLOT_RANDOM_ARRAY — list → tuple
The 781-element constant array is changed from a list literal to a tuple literal and annotated as
Tuple[int, ...].Why it matters:
CPython list indexing requires a bounds check and a pointer indirection through an overallocated backing array. Tuple indexing uses a fixed-size C array with no overallocation, so reads are slightly faster and the object is ~64 bytes smaller.
hash_board()indexes into this array once per occupied square (up to 32 times per call), so the saving compounds across every position evaluation.The
ZobristHasher.__init__signature is widened fromList[int]toUnion[List[int], Tuple[int, ...]]so existing callers passing a list are unaffected.2. ZobristHasher.hash_board — typed bitboard iteration
Original approach:
Each square requires a
BB_SQUARES[square]lookup plus up to 6 bitboard membership tests insidepiece_type_at()(expected ~3.5 on an average board).New approach:
Iterate the six typed piece bitboards (
board.pawns,board.knights, etc.) directly, masking withoccupied_co[WHITE]andoccupied_co[BLACK]. Use the LSB trick (lsb = bb & -bb; sq = lsb.bit_length() - 1) instead ofscan_reversedto avoid Python generator overhead. Pre-compute array base offsets once per piece type.What is eliminated per occupied square:
BB_SQUARES[square]lookuppiece_type_at()call (up to 6 bitboard membership tests)A module-level
_PIECE_BB_OFFSETtuple pre-computes the six(piece_type - 1) * 2values to eliminate constant multiplication from the inner loop.3. MemoryMappedReader.iter — struct.iter_unpack over memoryview
Original:
Each
self[i]call computesindex * ENTRY_STRUCT.size, callsunpack_from(), and decodes move fields. Index arithmetic and__getitem__'s negative-index handling andIndexErrorguard are all unnecessary during full sequential iteration.New:
struct.iter_unpack()advances through the buffer internally in C with no Python-level index arithmetic. Thememoryviewavoids copying bytes out of the mmap. Move decoding is inlined, eliminating the__getitem__call frame per entry.Note: this duplicates the move-decoding logic from
__getitem__. A shared_decode_entry()helper could consolidate them at a small per-entry call cost if preferred.4. MemoryMappedReader.choice — single collect + rng.choice
Original reservoir sampling:
_randint()is called once per entry after the first, each time allocating a Python frame. Uniform selection over N entries requires N−1 random draws.New:
random.Random.choice()is implemented in C and performs a single uniform draw. For the common case of a small number of book entries (typically 1–20) the list allocation is negligible and the code is simpler.Trade-off: reservoir sampling preserves O(1) memory if
find_allever returns a very large number of entries. For opening books this is practically irrelevant, but worth noting.The module-level
_randint()helper is removed. A module-level_random_moduleinstance (seeded from OS entropy) is added so callers passingrandom=Noneget a properly seeded RNG rather than relying on the implicit global state of therandommodule.5. MemoryMappedReader.weighted_choice — single-pass collect
Original:
find_all()scans the sorted key range twice — onebisect_key_leftbinary search and sequential read per scan.New:
One scan, one bisect. Memory cost is the same small list of
Entrynamedtuples that was being iterated twice anyway.A patch against current master is available if useful.