To feed data to a model, the features must be in numeric form. Fortunately, chess games are commonly stored in PGN (Portable Game Notation), and our dataset uses this format too. PGN records the moves of a game in <a href='https://en.wikipedia.org/wiki/Algebraic_notation_(chess)'>algebraic notation</a>, which makes it easy to recover every position and move of the game. To extract features from the data, we first need to parse the PGN files. For parsing, move generation, rule checks and other utilities, we will use the <a href='https://python-chess.readthedocs.io/en/latest/index.html'>python-chess</a> library.

First, let's import the required modules and load a game from the dataset.

In [2]:
import chess.pgn
import chess
import pandas as pd
import numpy as np
from pathlib import Path

root = Path.cwd().parent
data_dir = root / 'data/raw'
first_file = next(data_dir.iterdir())

pgn = open(first_file, encoding='utf-8')

game_1 = chess.pgn.read_game(pgn)

Using the chess library's Game and Board objects, all kinds of operations can be done. For example, we can load the first game from the source file and look at the board after the first 10 half-moves (plies), or print the bit representation of any piece type on the board.

In [23]:
board_1 = game_1.board()

# Replay the first 10 half-moves (plies) of the main line
for move_nr, move in enumerate(game_1.mainline_moves(), start=1):
    board_1.push(move)
    if move_nr == 10:
        break

print(board_1, '\n')
print(bin(board_1.pieces(chess.PAWN, chess.WHITE)))

r n . q k b . r
p b p p . p . p
. p . . p . . p
. . . . . . . .
. . . P P . . .
P . N . . . . .
. P P . . P P P
R . . Q K B N R 

0b11000000000011110011000000000
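
To make the printed integer easier to read, the following minimal sketch decodes such a mask back into square names. It assumes python-chess's square numbering, where bit 0 is a1, bit 7 is h1 and bit 63 is h8; `squares_from_mask` is a hypothetical helper written for illustration and is not part of the library.

```python
def squares_from_mask(mask: int) -> list[str]:
    """Hypothetical helper: list the squares whose bits are set in a 64-bit mask."""
    names = []
    for k in range(64):
        if (mask >> k) & 1:
            file_letter = "abcdefgh"[k % 8]  # bit index modulo 8 gives the file
            rank_number = k // 8 + 1         # integer division by 8 gives the rank
            names.append(f"{file_letter}{rank_number}")
    return names


# The white-pawn mask printed above
white_pawns = 0b11000000000011110011000000000
print(squares_from_mask(white_pawns))
# ['b2', 'c2', 'f2', 'g2', 'h2', 'a3', 'd4', 'e4']
```

The decoded squares match the white pawns (`P`) on the printed board.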


To turn game states into features, we need a numeric representation of them. Piece placement can be encoded as a 12x8x8 tensor: one 8x8 bitboard per combination of piece type and color, marking the squares occupied by those pieces. Piece placement alone is not enough, however, since we also need the castling rights, the en passant target square, the state of the fifty-move rule and the side to move.
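
The non-placement state is exposed by python-chess as plain attributes on the board (`turn`, `ep_square`, `halfmove_clock`). As a minimal sketch, and not necessarily how the rest of this notebook encodes it, the hypothetical helper `get_state_vector` below packs the side to move, the en passant file and the halfmove clock into one vector. The one-hot file encoding and the scaling by 100 plies are assumptions made for illustration.

```python
import numpy as np


def get_state_vector(board) -> np.ndarray:
    """Hypothetical helper: pack non-placement state into a flat float vector.

    `board` is expected to expose the python-chess Board attributes
    `turn`, `ep_square` and `halfmove_clock`.
    """
    # Side to move: python-chess uses True for White and False for Black.
    turn = 1.0 if board.turn else 0.0

    # En passant target as a one-hot vector over the 8 files (all zeros if none).
    ep_file = np.zeros(8, dtype=np.float32)
    if board.ep_square is not None:
        ep_file[board.ep_square % 8] = 1.0

    # Halfmove clock, clipped and scaled by 100 plies (assumption: the
    # fifty-move rule corresponds to 100 half-moves).
    halfmove = min(board.halfmove_clock, 100) / 100.0

    # Layout: [turn, ep_file_a..ep_file_h, halfmove] -> 10 values
    return np.concatenate(([turn], ep_file, [halfmove])).astype(np.float32)
```

Calling `get_state_vector(board_1)` on the board built earlier should return a vector of length 10, since `chess.Board` provides all three attributes.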

Let's implement functions to convert the board into these features:

In [4]:
def get_bitboard_tensor(board: chess.Board) -> np.ndarray:
    """
    Converts a chess board representation into a bitboard representation.

    The function converts a given chess board into a 3D NumPy array where each layer
    corresponds to a specific combination of piece type and color. Each layer is an 8x8
    matrix representing the board, with `1` at positions where the specific piece type and
    color exist, and `0` elsewhere. The 3D array has 12 layers representing:
    - Index 0-5: White pieces (Pawn, Knight, Bishop, Rook, Queen, King)
    - Index 6-11: Black pieces (Pawn, Knight, Bishop, Rook, Queen, King)

    Parameters:
    board: chess.Board
        The current chess board state to be converted into bitboards.

    Returns:
    np.ndarray
        A 12x8x8 NumPy array representing the bitboard equivalent of the input chess board.
    """
    colors = [chess.WHITE, chess.BLACK]
    pieces = [chess.PAWN, chess.KNIGHT, chess.BISHOP, chess.ROOK, chess.QUEEN, chess.KING]

    bitmask = np.zeros((12, 8, 8), dtype=np.uint8)

    for i, color in enumerate(colors):
        for j, piece in enumerate(pieces):
            # 64-bit mask of the squares holding this piece; bit k is square k (a1 = 0)
            piece_int = int(board.pieces(piece, color))
            for k in range(64):
                if piece_int & 1:
                    rank = k // 8
                    bitmask[i * 6 + j][rank][k - rank * 8] = 1
                piece_int >>= 1
    return bitmask


def get_castling_tensor(board: chess.Board) -> np.ndarray:
    """
    Get castling tensors for the given chess board.

    This function generates a 2x2 numpy array representing the castling rights for
    both players in the given chess board. Each row corresponds to a player (0 for
    white, 1 for black), and the two columns represent kingside and queenside
    castling rights respectively. Values are 1 if castling is allowed and 0
    otherwise.

    Args:
        board (chess.Board): The current state of the chess game to analyze.

    Returns:
        np.ndarray: A 2x2 array where rows represent players and columns represent
        kingside and queenside castling rights.
    """
    castling_tensors = np.zeros((2, 2), dtype=np.uint8)
    for i, color in enumerate([chess.WHITE, chess.BLACK]):
        castling_tensors[i][0] = board.has_kingside_castling_rights(color)
        castling_tensors[i][1] = board.has_queenside_castling_rights(color)
    return castling_tensors

print(get_castling_tensor(board_1))