Performance improvements to chess/polyglot.py (5 micro-optimisations)

All changes preserve the public API exactly. Happy to submit a PR if any or all of these look worthwhile — raising as an issue first since some involve trade-offs worth discussing.

---

## 1. POLYGLOT_RANDOM_ARRAY — list → tuple

The 781-element constant array is changed from a list literal to a tuple literal and annotated as `Tuple[int, ...]`.

**Why it matters:**
CPython list indexing requires a bounds check and a pointer indirection through an overallocated backing array. Tuple indexing uses a fixed-size C array with no overallocation, so reads are slightly faster and the object is ~64 bytes smaller. `hash_board()` indexes into this array once per occupied square (up to 32 times per call), so the saving compounds across every position evaluation.

The `ZobristHasher.__init__` signature is widened from `List[int]` to `Union[List[int], Tuple[int, ...]]` so existing callers passing a list are unaffected.

---

## 2. ZobristHasher.hash_board — typed bitboard iteration

**Original approach:**
```python
for pivot, squares in enumerate(board.occupied_co):
    for square in chess.scan_reversed(squares):
        piece_index = (typing.cast(chess.PieceType, board.piece_type_at(square)) - 1) * 2 + pivot
        zobrist_hash ^= self.array[64 * piece_index + square]
```
Each square requires a `BB_SQUARES[square]` lookup plus up to 6 bitboard membership tests inside `piece_type_at()` (expected ~3.5 on an average board).

**New approach:**
Iterate the six typed piece bitboards (`board.pawns`, `board.knights`, etc.) directly, masking with `occupied_co[WHITE]` and `occupied_co[BLACK]`. Use the LSB trick (`lsb = bb & -bb; sq = lsb.bit_length() - 1`) instead of `scan_reversed` to avoid Python generator overhead. Pre-compute array base offsets once per piece type.

**What is eliminated per occupied square:**
- One `BB_SQUARES[square]` lookup
- One `piece_type_at()` call (up to 6 bitboard membership tests)
- The color branch is hoisted outside the inner loop

A module-level `_PIECE_BB_OFFSET` tuple pre-computes the six `(piece_type - 1) * 2` values to eliminate constant multiplication from the inner loop.

---

## 3. MemoryMappedReader.__iter__ — struct.iter_unpack over memoryview

**Original:**
```python
for i in range(len(self)):
    yield self[i]
```
Each `self[i]` call computes `index * ENTRY_STRUCT.size`, calls `unpack_from()`, and decodes move fields. Index arithmetic and `__getitem__`'s negative-index handling and `IndexError` guard are all unnecessary during full sequential iteration.

**New:**
```python
mv = memoryview(self.mmap)
for key, raw_move, weight, learn in ENTRY_STRUCT.iter_unpack(mv):
    ...
```
`struct.iter_unpack()` advances through the buffer internally in C with no Python-level index arithmetic. The `memoryview` avoids copying bytes out of the mmap. Move decoding is inlined, eliminating the `__getitem__` call frame per entry.

Note: this duplicates the move-decoding logic from `__getitem__`. A shared `_decode_entry()` helper could consolidate them at a small per-entry call cost if preferred.

---

## 4. MemoryMappedReader.choice — single collect + rng.choice

**Original reservoir sampling:**
```python
chosen_entry = None
for i, entry in enumerate(self.find_all(...)):
    if chosen_entry is None or _randint(rng, 0, i) == i:
        chosen_entry = entry
```
`_randint()` is called once per entry after the first, each time allocating a Python frame. Uniform selection over N entries requires N−1 random draws.

**New:**
```python
entries = list(self.find_all(...))
return rng.choice(entries)
```
`random.Random.choice()` is implemented in C and performs a single uniform draw. For the common case of a small number of book entries (typically 1–20) the list allocation is negligible and the code is simpler.

**Trade-off:** reservoir sampling preserves O(1) memory if `find_all` ever returns a very large number of entries. For opening books this is practically irrelevant, but worth noting.

The module-level `_randint()` helper is removed. A module-level `_random_module` instance (seeded from OS entropy) is added so callers passing `random=None` get a properly seeded RNG rather than relying on the implicit global state of the `random` module.

---

## 5. MemoryMappedReader.weighted_choice — single-pass collect

**Original:**
```python
total_weights = sum(entry.weight for entry in self.find_all(board, ...))
...
for entry in self.find_all(board, ...):   # second full scan
    ...
```
`find_all()` scans the sorted key range twice — one `bisect_key_left` binary search and sequential read per scan.

**New:**
```python
entries = list(self.find_all(board, ...))
total_weights = sum(entry.weight for entry in entries)
...
for entry in entries:
    ...
```
One scan, one bisect. Memory cost is the same small list of `Entry` namedtuples that was being iterated twice anyway.

---

A patch against current master is available if useful.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Performance improvements to chess/polyglot.py (5 micro-optimisations) #1191

1. POLYGLOT_RANDOM_ARRAY — list → tuple

2. ZobristHasher.hash_board — typed bitboard iteration

3. MemoryMappedReader.iter — struct.iter_unpack over memoryview

4. MemoryMappedReader.choice — single collect + rng.choice

5. MemoryMappedReader.weighted_choice — single-pass collect

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Performance improvements to chess/polyglot.py (5 micro-optimisations) #1191

Description

1. POLYGLOT_RANDOM_ARRAY — list → tuple

2. ZobristHasher.hash_board — typed bitboard iteration

3. MemoryMappedReader.iter — struct.iter_unpack over memoryview

4. MemoryMappedReader.choice — single collect + rng.choice

5. MemoryMappedReader.weighted_choice — single-pass collect

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions