A complete implementation of the AlphaZero algorithm for chess using PyTorch, featuring:
- Monte Carlo Tree Search (MCTS) with PUCT
- Deep neural network with residual architecture
- Self-play training pipeline
- Model evaluation arena
chesshacks/
├── config.py # Configuration and hyperparameters
├── chess_board.py # Chess game rules and move generation
├── encoder_decoder.py # Board encoding/decoding for neural network
├── alpha_net.py # AlphaGoZero neural network (19 residual blocks)
├── MCTS_chess.py # Monte Carlo Tree Search with PUCT
├── train.py # Neural network training
├── train_multiprocessing.py # Parallel self-play training
├── evaluator.py # Model evaluation arena
├── pipeline.py # Full training pipeline orchestrator
├── requirements.txt # Python dependencies
└── README.md # This file
- Complete chess rules implementation using the `python-chess` library
- Move generation and validation
- Game state management
- Board encoding utilities
- Encodes board state into an 18-plane tensor representation (a minimal sketch follows this list):
- 12 planes for pieces (6 types × 2 colors)
- 2 planes for castling rights
- 2 planes for en passant
- 1 plane for turn color
- 1 plane for move count
- Converts between chess moves and neural network outputs
- Policy vector creation and decoding
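
A minimal sketch of the 18-plane encoding using `python-chess` and NumPy. The plane ordering and the exact contents of the castling, en passant, and move-count planes here are illustrative assumptions, not the exact layout in `encoder_decoder.py`:

```python
# Sketch of the 18-plane encoding; plane order and the castling/en-passant/move-count
# conventions below are assumptions for illustration (see encoder_decoder.py).
import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    planes = np.zeros((18, 8, 8), dtype=np.float32)

    # Planes 0-11: piece occupancy, 6 piece types x 2 colors
    for square, piece in board.piece_map().items():
        row, col = divmod(square, 8)
        color_offset = 0 if piece.color == chess.WHITE else 6
        planes[color_offset + piece.piece_type - 1, row, col] = 1.0

    # Planes 12-13: castling rights, one plane per color (filled uniformly)
    planes[12, :, :] = float(board.has_kingside_castling_rights(chess.WHITE)
                             or board.has_queenside_castling_rights(chess.WHITE))
    planes[13, :, :] = float(board.has_kingside_castling_rights(chess.BLACK)
                             or board.has_queenside_castling_rights(chess.BLACK))

    # Planes 14-15: en passant target square, one plane per side to move
    if board.ep_square is not None:
        row, col = divmod(board.ep_square, 8)
        planes[14 if board.turn == chess.WHITE else 15, row, col] = 1.0

    # Plane 16: side to move (all ones when it is White's turn)
    planes[16, :, :] = float(board.turn == chess.WHITE)

    # Plane 17: move count, scaled to keep values small
    planes[17, :, :] = board.fullmove_number / 100.0

    return planes
```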
AlphaGo Zero-style architecture, sketched in PyTorch after this list:
- Input: 18×8×8 board representation
- Architecture:
- Initial convolutional block (batch norm + ReLU)
- 19 residual blocks (each with 2 conv layers)
- 256 convolutional filters throughout
- Output Heads:
- Policy head: 4096-dimensional move probabilities (log softmax)
- Value head: Position evaluation in range [-1, 1] (tanh)
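
A minimal PyTorch sketch of this tower. Class names, head widths, and layer details are assumptions for illustration, not the exact `alpha_net.py` implementation:

```python
# Sketch of the residual tower described above; details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, filters=256):
        super().__init__()
        self.conv1 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(filters)
        self.conv2 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(filters)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class AlphaZeroNet(nn.Module):
    def __init__(self, blocks=19, filters=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(18, filters, 3, padding=1, bias=False),
            nn.BatchNorm2d(filters),
            nn.ReLU(),
        )
        self.tower = nn.Sequential(*[ResidualBlock(filters) for _ in range(blocks)])
        # Policy head: 4096 move logits (a 64x64 from/to-square encoding is an assumption)
        self.policy_head = nn.Sequential(
            nn.Conv2d(filters, 2, 1), nn.BatchNorm2d(2), nn.ReLU(),
            nn.Flatten(), nn.Linear(2 * 8 * 8, 4096),
        )
        # Value head: scalar position evaluation in [-1, 1]
        self.value_head = nn.Sequential(
            nn.Conv2d(filters, 1, 1), nn.BatchNorm2d(1), nn.ReLU(),
            nn.Flatten(), nn.Linear(8 * 8, 256), nn.ReLU(), nn.Linear(256, 1),
        )

    def forward(self, x):  # x: (batch, 18, 8, 8)
        h = self.tower(self.stem(x))
        policy = F.log_softmax(self.policy_head(h), dim=1)
        value = torch.tanh(self.value_head(h))
        return policy, value
```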
Monte Carlo Tree Search implementation featuring:
- PUCT (Polynomial Upper Confidence Trees) for node selection (sketched after this list)
- Dirichlet noise for root exploration
- Neural network guided search
- Self-play game generation
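
Child selection can be sketched with the standard PUCT rule. The node/edge attributes used here (`children`, `visit_count`, `total_value`, `prior`) are illustrative assumptions, not the exact structures in `MCTS_chess.py`:

```python
# Sketch of PUCT child selection; the node structure is an illustrative assumption.
import math

def select_child(node, c_puct=1.0):
    """Pick the child maximizing Q + U, where U is the PUCT exploration bonus."""
    total_visits = sum(child.visit_count for child in node.children.values())
    best_score, best_move, best_child = -float("inf"), None, None
    for move, child in node.children.items():
        q = child.total_value / child.visit_count if child.visit_count > 0 else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        if q + u > best_score:
            best_score, best_move, best_child = q + u, move, child
    return best_move, best_child
```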
- PyTorch training loop with mixed loss (policy + value); see the loss sketch after this list
- KL divergence loss for policy
- MSE loss for value
- Adam optimizer with weight decay
- Batch training with data loader
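
A sketch of the combined loss, assuming the policy head outputs log-probabilities (as described above) and the targets are MCTS visit distributions and final game outcomes; tensor names and the weight-decay value are illustrative:

```python
# Sketch of the mixed policy + value loss; variable names are illustrative.
import torch
import torch.nn.functional as F

def alphazero_loss(log_policy, value, target_policy, target_value):
    # Policy: KL divergence between the MCTS visit distribution and predicted log-probabilities
    policy_loss = F.kl_div(log_policy, target_policy, reduction="batchmean")
    # Value: mean squared error against the final game outcome in [-1, 1]
    value_loss = F.mse_loss(value.squeeze(-1), target_value)
    return policy_loss + value_loss

# Adam with weight decay, as described above (weight_decay value is an assumption):
# optimizer = torch.optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-4)
```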
- Parallel self-play using multiple CPU workers
- Efficient data generation across cores
- Queue-based result collection (sketched after this list)
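
The queue-based pattern can be sketched as follows; `self_play_game` is a hypothetical worker function standing in for the project's self-play code:

```python
# Sketch of queue-based parallel self-play; self_play_game() is a hypothetical
# worker that plays one game and returns its (state, policy, value) examples.
import torch.multiprocessing as mp

def worker(game_queue, result_queue):
    while True:
        game_id = game_queue.get()
        if game_id is None:          # poison pill: no more work
            break
        result_queue.put(self_play_game(game_id))

def generate_games(num_games, num_workers):
    game_queue, result_queue = mp.Queue(), mp.Queue()
    for i in range(num_games):
        game_queue.put(i)
    for _ in range(num_workers):
        game_queue.put(None)
    procs = [mp.Process(target=worker, args=(game_queue, result_queue))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    examples = [result_queue.get() for _ in range(num_games)]  # drain before join
    for p in procs:
        p.join()
    return examples
```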
- Arena for comparing model versions
- Head-to-head game evaluation
- Win rate calculation
- Model replacement logic (sketched after this list)
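
The replacement rule can be sketched as follows; `play_match` is a hypothetical helper standing in for the project's game-playing code, and the defaults mirror `EVAL_GAMES` and `WIN_THRESHOLD` from `config.py`:

```python
# Sketch of the model-replacement rule; play_match() is a hypothetical helper that
# plays one game and returns 1 (new model wins), 0.5 (draw), or 0 (new model loses).
def evaluate(new_model, best_model, num_games=40, win_threshold=0.55):
    score = 0.0
    for game in range(num_games):
        # Alternate colors so neither model always plays White
        new_plays_white = (game % 2 == 0)
        score += play_match(new_model, best_model, new_plays_white)
    win_rate = score / num_games
    return win_rate >= win_threshold   # True: new model replaces the previous best
```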
Full training iteration pipeline (outlined in the sketch after this list):
- Self-play using MCTS to generate training data
- Train neural network on generated data
- Evaluate new model against previous best
- Keep best performing model
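
The iteration can be summarized as a short loop; every helper function below is a hypothetical stand-in for the project's self-play, training, evaluation, and checkpointing code:

```python
# High-level shape of one training iteration; all helpers are hypothetical stand-ins.
def run_pipeline(num_iterations, games_per_iteration):
    best_model = load_or_initialize_model()
    for iteration in range(num_iterations):
        examples = self_play(best_model, games_per_iteration)   # 1. generate data with MCTS
        candidate = train(best_model, examples)                 # 2. fit policy/value targets
        if evaluate(candidate, best_model):                     # 3. arena comparison
            best_model = candidate                              # 4. keep the stronger model
        save_checkpoint(best_model, iteration)
```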
```bash
# Install dependencies
pip install -r requirements.txt
```

Requirements:

- Python 3.8+
- PyTorch 2.0+
- python-chess
- NumPy
- tqdm
Run the full training pipeline:

```bash
python pipeline.py --iterations 10 --games 50 --evaluate
```

```
python pipeline.py [OPTIONS]

Options:
  --iterations N       Number of training iterations (default: 10)
  --games N            Self-play games per iteration (default: 100)
  --workers N          Number of parallel workers (default: CPU count)
  --evaluate           Enable model evaluation
  --eval-frequency N   Evaluate every N iterations (default: 5)
  --resume PATH        Resume from checkpoint
  --device DEVICE      Device to use: cpu or cuda (default: cuda)
  --verbose            Enable verbose output for detailed progress
```

Basic training (10 iterations, 50 games each):

```bash
python pipeline.py --iterations 10 --games 50
```

Training with evaluation (compare models every 5 iterations):

```bash
python pipeline.py --iterations 20 --games 100 --evaluate --eval-frequency 5
```

Resume from a checkpoint:

```bash
python pipeline.py --iterations 10 --resume checkpoints/checkpoint_iter_0010.pth
```

CPU-only training with 4 workers:

```bash
python pipeline.py --iterations 5 --games 20 --workers 4 --device cpu
```

Edit `config.py` to adjust hyperparameters:

```python
NUM_RESIDUAL_BLOCKS = 19    # Number of residual blocks
NUM_FILTERS = 256           # Convolutional filters

NUM_MCTS_SIMS = 800         # Simulations per move
CPUCT = 1.0                 # Exploration constant
DIRICHLET_ALPHA = 0.3       # Dirichlet noise alpha

BATCH_SIZE = 256
LEARNING_RATE = 0.001
NUM_EPOCHS = 10
NUM_SELF_PLAY_GAMES = 100   # Games per iteration

EVAL_GAMES = 40             # Games for evaluation
WIN_THRESHOLD = 0.55        # Win rate to replace model
```
Each training iteration proceeds in three stages:

1. Self-Play Generation
   - Neural network plays against itself using MCTS
   - Generates (state, policy, value) training examples
   - Explores with Dirichlet noise at the root node
   - Temperature-based move selection (sketched after this list)
2. Neural Network Training
   - Trains on the generated examples
   - Mixed loss: policy (KL divergence) + value (MSE)
   - Adam optimizer with weight decay
   - Batch normalization throughout
3. Model Evaluation (Optional)
   - Pits the new model against the previous best
   - Head-to-head games with alternating colors
   - Keeps the model with the higher win rate
   - Configurable win threshold
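
The temperature-based selection mentioned in stage 1 can be sketched as follows; the exact temperature schedule the project uses is not shown here:

```python
# Sketch of temperature-based move selection from MCTS visit counts;
# the project's actual temperature schedule is an assumption not shown here.
import numpy as np

def select_move(moves, visit_counts, temperature=1.0):
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature == 0:
        return moves[int(np.argmax(counts))]        # greedy: most-visited move
    probs = counts ** (1.0 / temperature)           # flatten or sharpen the distribution
    probs /= probs.sum()
    return moves[int(np.random.choice(len(moves), p=probs))]
```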
chesshacks/
├── checkpoints/ # Model checkpoints
│ ├── checkpoint_iter_0000.pth
│ ├── checkpoint_iter_0001.pth
│ └── ...
├── training_data/ # Self-play game data
│ ├── iteration_0000.npy
│ ├── iteration_0001.npy
│ └── ...
└── logs/ # Training history
└── training_history.json
Individual components can also be run on their own:

```bash
python alpha_net.py
python MCTS_chess.py
python train.py
python evaluator.py
python train_multiprocessing.py
```

- GPU Training: Significantly faster with CUDA-enabled PyTorch
- Self-Play: CPU-bound, benefits from multiprocessing
- Memory: Each game generates ~100-200 training examples
- Time: ~1-2 hours per iteration with 100 games (depends on hardware)
Suggested training schedule:

- Early iterations:
  - Disable evaluation to accumulate diverse training data
  - Higher temperature for more exploration
  - Focus on generating a large dataset
- Middle iterations:
  - Enable evaluation every 5 iterations
  - Gradually reduce temperature
  - Start comparing model improvements
- Later iterations:
  - Frequent evaluation
  - Lower temperature for stronger play
  - Fine-tune on high-quality games
AlphaZero combines three key techniques:
- Self-Play: The system learns by playing against itself
- MCTS: Guided tree search using neural network predictions
- Deep Learning: CNN with residual blocks learns patterns
The neural network provides:
- Policy: Move probabilities (which moves are promising)
- Value: Position evaluation (who is winning)
MCTS uses these predictions to efficiently explore the game tree and select strong moves during self-play.
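
Concretely, standard PUCT picks the child move $a$ of state $s$ that maximizes $Q(s,a) + U(s,a)$, where

$$U(s,a) = c_{\text{puct}} \, P(s,a) \, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)}$$

Here $P(s,a)$ is the network's policy prior, $N(s,a)$ the edge's visit count, and $Q(s,a)$ the mean value of simulations through it; the network's value output supplies the leaf evaluations that back up into $Q$.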
Based on the AlphaZero algorithm:
Silver, D., et al. (2017). Mastering Chess and Shogi by Self-Play with a
General Reinforcement Learning Algorithm. arXiv:1712.01815
MIT License - See LICENSE file for details
Contributions welcome! Areas for improvement:
- Distributed training across multiple machines
- Advanced data augmentation (board symmetries)
- Opening book integration
- Endgame tablebase integration
- TensorBoard visualization
- Model compression for faster inference
Out of Memory: Reduce `BATCH_SIZE` or `NUM_RESIDUAL_BLOCKS` in `config.py`
Slow Training:
- Ensure PyTorch has CUDA support: `torch.cuda.is_available()`
- Increase `--workers` for parallel self-play
- Reduce `NUM_MCTS_SIMS` for faster (but weaker) play
Checkpoint Not Found: Check that the `checkpoints/` directory exists and the path is correct
Multiprocessing Errors: Set the proper start method in your script:

```python
import torch.multiprocessing as mp
mp.set_start_method('spawn', force=True)
```

For issues and questions, please open an issue on the project repository.