A complete implementation of the AlphaZero algorithm for chess using PyTorch, featuring:
- Monte Carlo Tree Search (MCTS) with PUCT
- Deep neural network with residual architecture
- Self-play training pipeline
- Model evaluation arena
chesshacks/
├── config.py # Configuration and hyperparameters
├── chess_board.py # Chess game rules and move generation
├── encoder_decoder.py # Board encoding/decoding for neural network
├── alpha_net.py # AlphaGoZero neural network (19 residual blocks)
├── MCTS_chess.py # Monte Carlo Tree Search with PUCT
├── train.py # Neural network training
├── train_multiprocessing.py # Parallel self-play training
├── evaluator.py # Model evaluation arena
├── pipeline.py # Full training pipeline orchestrator
├── requirements.txt # Python dependencies
└── README.md # This file
- Complete chess rules implementation using the `python-chess` library
- Move generation and validation
- Game state management
- Board encoding utilities
- Encodes board state into an 18-plane tensor representation (a minimal sketch follows this list):
- 12 planes for pieces (6 types × 2 colors)
- 2 planes for castling rights
- 2 planes for en passant
- 1 plane for turn color
- 1 plane for move count
- Converts between chess moves and neural network outputs
- Policy vector creation and decoding
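
A minimal sketch of the 18-plane encoding using `python-chess` and NumPy. The plane ordering and the exact contents of the castling, en passant, and move-count planes here are illustrative assumptions, not the exact layout in `encoder_decoder.py`:

```python
# Sketch of the 18-plane encoding; plane order and the castling/en-passant/move-count
# conventions below are assumptions for illustration (see encoder_decoder.py).
import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    planes = np.zeros((18, 8, 8), dtype=np.float32)

    # Planes 0-11: piece occupancy, 6 piece types x 2 colors
    for square, piece in board.piece_map().items():
        row, col = divmod(square, 8)
        color_offset = 0 if piece.color == chess.WHITE else 6
        planes[color_offset + piece.piece_type - 1, row, col] = 1.0

    # Planes 12-13: castling rights, one plane per color (filled uniformly)
    planes[12, :, :] = float(board.has_kingside_castling_rights(chess.WHITE)
                             or board.has_queenside_castling_rights(chess.WHITE))
    planes[13, :, :] = float(board.has_kingside_castling_rights(chess.BLACK)
                             or board.has_queenside_castling_rights(chess.BLACK))

    # Planes 14-15: en passant target square, one plane per side to move
    if board.ep_square is not None:
        row, col = divmod(board.ep_square, 8)
        planes[14 if board.turn == chess.WHITE else 15, row, col] = 1.0

    # Plane 16: side to move (all ones when it is White's turn)
    planes[16, :, :] = float(board.turn == chess.WHITE)

    # Plane 17: move count, scaled to keep values small
    planes[17, :, :] = board.fullmove_number / 100.0

    return planes
```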
AlphaGo Zero-style architecture, sketched in PyTorch after this list:
- Input: 18×8×8 board representation
- Architecture:
- Initial convolutional block (batch norm + ReLU)
- 19 residual blocks (each with 2 conv layers)
- 256 convolutional filters throughout
- Output Heads:
- Policy head: 4096-dimensional move probabilities (log softmax)
- Value head: Position evaluation in range [-1, 1] (tanh)
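
A minimal PyTorch sketch of this tower. Class names, head widths, and layer details are assumptions for illustration, not the exact `alpha_net.py` implementation:

```python
# Sketch of the residual tower described above; details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, filters=256):
        super().__init__()
        self.conv1 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(filters)
        self.conv2 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(filters)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class AlphaZeroNet(nn.Module):
    def __init__(self, blocks=19, filters=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(18, filters, 3, padding=1, bias=False),
            nn.BatchNorm2d(filters),
            nn.ReLU(),
        )
        self.tower = nn.Sequential(*[ResidualBlock(filters) for _ in range(blocks)])
        # Policy head: 4096 move logits (a 64x64 from/to-square encoding is an assumption)
        self.policy_head = nn.Sequential(
            nn.Conv2d(filters, 2, 1), nn.BatchNorm2d(2), nn.ReLU(),
            nn.Flatten(), nn.Linear(2 * 8 * 8, 4096),
        )
        # Value head: scalar position evaluation in [-1, 1]
        self.value_head = nn.Sequential(
            nn.Conv2d(filters, 1, 1), nn.BatchNorm2d(1), nn.ReLU(),
            nn.Flatten(), nn.Linear(8 * 8, 256), nn.ReLU(), nn.Linear(256, 1),
        )

    def forward(self, x):  # x: (batch, 18, 8, 8)
        h = self.tower(self.stem(x))
        policy = F.log_softmax(self.policy_head(h), dim=1)
        value = torch.tanh(self.value_head(h))
        return policy, value
```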
Monte Carlo Tree Search implementation featuring:
- PUCT (Polynomial Upper Confidence Trees) for node selection (sketched after this list)
- Dirichlet noise for root exploration
- Neural network guided search
- Self-play game generation
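
Child selection can be sketched with the standard PUCT rule. The node/edge attributes used here (`children`, `visit_count`, `total_value`, `prior`) are illustrative assumptions, not the exact structures in `MCTS_chess.py`:

```python
# Sketch of PUCT child selection; the node structure is an illustrative assumption.
import math

def select_child(node, c_puct=1.0):
    """Pick the child maximizing Q + U, where U is the PUCT exploration bonus."""
    total_visits = sum(child.visit_count for child in node.children.values())
    best_score, best_move, best_child = -float("inf"), None, None
    for move, child in node.children.items():
        q = child.total_value / child.visit_count if child.visit_count > 0 else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        if q + u > best_score:
            best_score, best_move, best_child = q + u, move, child
    return best_move, best_child
```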
- PyTorch training loop with mixed loss (policy + value); see the loss sketch after this list
- KL divergence loss for policy
- MSE loss for value
- Adam optimizer with weight decay
- Batch training with data loader
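
A sketch of the combined loss, assuming the policy head outputs log-probabilities (as described above) and the targets are MCTS visit distributions and final game outcomes; tensor names and the weight-decay value are illustrative:

```python
# Sketch of the mixed policy + value loss; variable names are illustrative.
import torch
import torch.nn.functional as F

def alphazero_loss(log_policy, value, target_policy, target_value):
    # Policy: KL divergence between the MCTS visit distribution and predicted log-probabilities
    policy_loss = F.kl_div(log_policy, target_policy, reduction="batchmean")
    # Value: mean squared error against the final game outcome in [-1, 1]
    value_loss = F.mse_loss(value.squeeze(-1), target_value)
    return policy_loss + value_loss

# Adam with weight decay, as described above (weight_decay value is an assumption):
# optimizer = torch.optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-4)
```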
- Parallel self-play using multiple CPU workers
- Efficient data generation across cores
- Queue-based result collection (sketched after this list)
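
The queue-based pattern can be sketched as follows; `self_play_game` is a hypothetical worker function standing in for the project's self-play code:

```python
# Sketch of queue-based parallel self-play; self_play_game() is a hypothetical
# worker that plays one game and returns its (state, policy, value) examples.
import torch.multiprocessing as mp

def worker(game_queue, result_queue):
    while True:
        game_id = game_queue.get()
        if game_id is None:          # poison pill: no more work
            break
        result_queue.put(self_play_game(game_id))

def generate_games(num_games, num_workers):
    game_queue, result_queue = mp.Queue(), mp.Queue()
    for i in range(num_games):
        game_queue.put(i)
    for _ in range(num_workers):
        game_queue.put(None)
    procs = [mp.Process(target=worker, args=(game_queue, result_queue))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    examples = [result_queue.get() for _ in range(num_games)]  # drain before join
    for p in procs:
        p.join()
    return examples
```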
- Arena for comparing model versions
- Head-to-head game evaluation
- Win rate calculation
- Model replacement logic (sketched after this list)
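
The replacement rule can be sketched as follows; `play_match` is a hypothetical helper standing in for the project's game-playing code, and the defaults mirror `EVAL_GAMES` and `WIN_THRESHOLD` from `config.py`:

```python
# Sketch of the model-replacement rule; play_match() is a hypothetical helper that
# plays one game and returns 1 (new model wins), 0.5 (draw), or 0 (new model loses).
def evaluate(new_model, best_model, num_games=40, win_threshold=0.55):
    score = 0.0
    for game in range(num_games):
        # Alternate colors so neither model always plays White
        new_plays_white = (game % 2 == 0)
        score += play_match(new_model, best_model, new_plays_white)
    win_rate = score / num_games
    return win_rate >= win_threshold   # True: new model replaces the previous best
```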
Full training iteration pipeline (outlined in the sketch after this list):
- Self-play using MCTS to generate training data
- Train neural network on generated data
- Evaluate new model against previous best
- Keep best performing model
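
The iteration can be summarized as a short loop; every helper function below is a hypothetical stand-in for the project's self-play, training, evaluation, and checkpointing code:

```python
# High-level shape of one training iteration; all helpers are hypothetical stand-ins.
def run_pipeline(num_iterations, games_per_iteration):
    best_model = load_or_initialize_model()
    for iteration in range(num_iterations):
        examples = self_play(best_model, games_per_iteration)   # 1. generate data with MCTS
        candidate = train(best_model, examples)                 # 2. fit policy/value targets
        if evaluate(candidate, best_model):                     # 3. arena comparison
            best_model = candidate                              # 4. keep the stronger model
        save_checkpoint(best_model, iteration)
```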
```bash
# Install dependencies
pip install -r requirements.txt
```

Requirements:

- Python 3.8+
- PyTorch 2.0+
- python-chess
- NumPy
- tqdm
Run the full training pipeline:

```bash
python pipeline.py --iterations 10 --games 50 --evaluate
```

```
python pipeline.py [OPTIONS]

Options:
  --iterations N       Number of training iterations (default: 10)
  --games N            Self-play games per iteration (default: 100)
  --workers N          Number of parallel workers (default: CPU count)
  --evaluate           Enable model evaluation
  --eval-frequency N   Evaluate every N iterations (default: 5)
  --resume PATH        Resume from checkpoint
  --device DEVICE      Device to use: cpu or cuda (default: cuda)
  --verbose            Enable verbose output for detailed progress
```

Basic training (10 iterations, 50 games each):

```bash
python pipeline.py --iterations 10 --games 50
```

Training with evaluation (compare models every 5 iterations):

```bash
python pipeline.py --iterations 20 --games 100 --evaluate --eval-frequency 5
```

Resume from a checkpoint:

```bash
python pipeline.py --iterations 10 --resume checkpoints/checkpoint_iter_0010.pth
```

CPU-only training with 4 workers:

```bash
python pipeline.py --iterations 5 --games 20 --workers 4 --device cpu
```

Edit `config.py` to adjust hyperparameters:

```python
NUM_RESIDUAL_BLOCKS = 19    # Number of residual blocks
NUM_FILTERS = 256           # Convolutional filters

NUM_MCTS_SIMS = 800         # Simulations per move
CPUCT = 1.0                 # Exploration constant
DIRICHLET_ALPHA = 0.3       # Dirichlet noise alpha

BATCH_SIZE = 256
LEARNING_RATE = 0.001
NUM_EPOCHS = 10
NUM_SELF_PLAY_GAMES = 100   # Games per iteration

EVAL_GAMES = 40             # Games for evaluation
WIN_THRESHOLD = 0.55        # Win rate to replace model
```
Each training iteration proceeds in three stages:

1. Self-Play Generation
   - Neural network plays against itself using MCTS
   - Generates (state, policy, value) training examples
   - Explores with Dirichlet noise at the root node
   - Temperature-based move selection (sketched after this list)
2. Neural Network Training
   - Trains on the generated examples
   - Mixed loss: policy (KL divergence) + value (MSE)
   - Adam optimizer with weight decay
   - Batch normalization throughout
3. Model Evaluation (Optional)
   - Pits the new model against the previous best
   - Head-to-head games with alternating colors
   - Keeps the model with the higher win rate
   - Configurable win threshold
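
The temperature-based selection mentioned in stage 1 can be sketched as follows; the exact temperature schedule the project uses is not shown here:

```python
# Sketch of temperature-based move selection from MCTS visit counts;
# the project's actual temperature schedule is an assumption not shown here.
import numpy as np

def select_move(moves, visit_counts, temperature=1.0):
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature == 0:
        return moves[int(np.argmax(counts))]        # greedy: most-visited move
    probs = counts ** (1.0 / temperature)           # flatten or sharpen the distribution
    probs /= probs.sum()
    return moves[int(np.random.choice(len(moves), p=probs))]
```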
chesshacks/
├── checkpoints/ # Model checkpoints
│ ├── checkpoint_iter_0000.pth
│ ├── checkpoint_iter_0001.pth
│ └── ...
├── training_data/ # Self-play game data
│ ├── iteration_0000.npy
│ ├── iteration_0001.npy
│ └── ...
└── logs/ # Training history
└── training_history.json
Individual components can also be run on their own:

```bash
python alpha_net.py
python MCTS_chess.py
python train.py
python evaluator.py
python train_multiprocessing.py
```

- GPU Training: Significantly faster with CUDA-enabled PyTorch
- Self-Play: CPU-bound, benefits from multiprocessing
- Memory: Each game generates ~100-200 training examples
- Time: ~1-2 hours per iteration with 100 games (depends on hardware)
Suggested training schedule:

- Early iterations:
  - Disable evaluation to accumulate diverse training data
  - Higher temperature for more exploration
  - Focus on generating a large dataset
- Middle iterations:
  - Enable evaluation every 5 iterations
  - Gradually reduce temperature
  - Start comparing model improvements
- Later iterations:
  - Frequent evaluation
  - Lower temperature for stronger play
  - Fine-tune on high-quality games
AlphaZero combines three key techniques:
- Self-Play: The system learns by playing against itself
- MCTS: Guided tree search using neural network predictions
- Deep Learning: CNN with residual blocks learns patterns
The neural network provides:
- Policy: Move probabilities (which moves are promising)
- Value: Position evaluation (who is winning)
MCTS uses these predictions to efficiently explore the game tree and select strong moves during self-play.
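
Concretely, standard PUCT picks the child move $a$ of state $s$ that maximizes $Q(s,a) + U(s,a)$, where

$$U(s,a) = c_{\text{puct}} \, P(s,a) \, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)}$$

Here $P(s,a)$ is the network's policy prior, $N(s,a)$ the edge's visit count, and $Q(s,a)$ the mean value of simulations through it; the network's value output supplies the leaf evaluations that back up into $Q$.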
Based on the AlphaZero algorithm:
Silver, D., et al. (2017). Mastering Chess and Shogi by Self-Play with a
General Reinforcement Learning Algorithm. arXiv:1712.01815
MIT License - See LICENSE file for details
Contributions welcome! Areas for improvement:
- Distributed training across multiple machines
- Advanced data augmentation (board symmetries)
- Opening book integration
- Endgame tablebase integration
- TensorBoard visualization
- Model compression for faster inference
Out of Memory: Reduce `BATCH_SIZE` or `NUM_RESIDUAL_BLOCKS` in `config.py`
Slow Training:
- Ensure PyTorch has CUDA support: `torch.cuda.is_available()`
- Increase `--workers` for parallel self-play
- Reduce `NUM_MCTS_SIMS` for faster (but weaker) play
Checkpoint Not Found: Check that the `checkpoints/` directory exists and the path is correct
Multiprocessing Errors: Set the proper start method in your script:

```python
import torch.multiprocessing as mp
mp.set_start_method('spawn', force=True)
```

For issues and questions, please open an issue on the project repository.