A chess engine implementation inspired by AlphaZero, designed for Apple Silicon with MLX neural network acceleration.
- Apple Silicon Support: Uses MLX for neural network acceleration on Apple Silicon
- Neural Network: ~22M-parameter ResNet + multi-head attention stack with squeeze-and-excitation
- MCTS Search: Monte Carlo Tree Search with PUCT algorithm
- Self-Play Training: Generates training data through self-play games
- UCI Protocol: Compatible with chess GUIs
- Multi-Threading: Parallel MCTS search and self-play workers
- Expanded Features: 22 AlphaZero-style planes including clock, material, and check indicators
- Training Pipeline: Cosine LR schedule, gradient accumulation, EMA evaluation, and checkpoint management
BetaOne/
├── network/ # MLX neural network (Python)
├── engine/ # C++ MCTS + UCI engine
├── orchestrator/ # Swift orchestrator
├── data/ # Training data storage
├── checkpoints/ # Model weights
└── logs/ # Logging output
- Neural Network (MLX): Residual tower with stacked attention blocks and gated channel mixing for policy/value heads
- MCTS Engine (C++): Tree search with virtual loss parallelization
- UCI Interface (C++): Standard chess engine protocol
- Orchestrator (Swift): Coordinates self-play, training, and evaluation
The system consists of three main components:
- Swift Orchestrator: Manages the training loop, coordinates self-play games, training, and evaluation
- C++ Engine: Implements MCTS search, UCI protocol, and position handling
- Python Network: MLX-based neural network for policy and value prediction
- Self-Play: Generate games using current model with visit-count-aware policy exports
- Training: Train with AlphaZero loss, cosine warmup/decay, gradient accumulation, and mixed precision
- Evaluation: Compare both raw and EMA checkpoints against the previous best on held-out shards
- Promotion: Update best model if validation/evaluation metrics improve beyond thresholds
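The four phases above can be sketched as a single iteration function. This is a hedged illustration only (the real loop lives in the Swift orchestrator); `run_selfplay`, `train`, and `evaluate` are injected stand-ins, and the 0.55 threshold mirrors `evaluation.promotion_threshold` in `config.json`.

```python
# Minimal sketch of one orchestrator iteration; helper callables are
# illustrative stand-ins, not the repo's actual API.
PROMOTION_THRESHOLD = 0.55  # mirrors evaluation.promotion_threshold

def run_iteration(best_model, candidate, run_selfplay, train, evaluate):
    shards = run_selfplay(best_model)           # Self-Play: generate training shards
    candidate = train(candidate, shards)        # Training: fit on the new corpus
    win_rate = evaluate(candidate, best_model)  # Evaluation: vs previous best
    # Promotion: only replace the best model when the candidate clearly wins
    return candidate if win_rate > PROMOTION_THRESHOLD else best_model
```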
- macOS with Apple Silicon (M1/M2/M3)
- Xcode Command Line Tools
- Python 3.8+
Install dependencies:
brew install cmake
Build the system:
./build.sh
Run the orchestrator (example with three workers and 24 total games):
swift run --package-path orchestrator Orchestrator "$(pwd)/config.json" --workers 3 --games 24 --iterations 1 --verbose
cd network && source venv/bin/activate && python -c "from model import ensure_device, create_model; from config import get_input_planes; import mlx.core as mx; ensure_device(); planes=get_input_planes(); m=create_model(); x=mx.random.normal((1,planes,8,8)); p,v=m(x); mx.eval(p,v); print('policy',p.shape,'value',v.shape)"

Expected output: `policy (1, 4672) value (1, 1)`
cd .. && PYTHONPATH=network network/venv/bin/python -m network.inference --batch-size 2

Expected output: a JSON response with a policy array and a value
cd engine/build/bin && printf "uci\nisready\nquit\n" | ./betaone

Expected output: UCI protocol responses
cd orchestrator && swift run Orchestrator

This runs the complete self-play, training, and evaluation loop.
./engine/build/bin/betaone

This starts the UCI engine for use with chess GUIs.
cd network && source venv/bin/activate && python train.py

This runs training on existing self-play data. If MLX reports a Metal compiler failure (e.g. `Unable to load kernel rbitsc`), the process now exits cleanly with guidance: restart the machine, or rerun the training step with `MLX_DISABLE_METAL=1` to force CPU execution.
The Swift orchestrator accepts a number of knobs so you can tailor short smoke tests or longer training cycles without editing config.json:
| Flag | Description |
|---|---|
| `--workers N` | Override the number of concurrent self-play workers (defaults to `self_play.num_workers`). |
| `--games N` | Total self-play games per iteration (work is split evenly among workers). |
| `--simulations N` | Override MCTS simulations per move. |
| `--batch-size N`, `--learning-rate X`, `--steps-per-checkpoint N` | Adjust training hyper-parameters on the fly. |
| `--eval-games N`, `--promotion-threshold X` | Tune evaluation workload and the promotion rule. |
| `--max-cpu-cores N` | Limit Python/MLX CPU threads for predictable resource usage. |
| `--config-overrides '{...}'` | Apply ad-hoc JSON overrides for advanced scenarios. |
When you launch the orchestrator, the CLI writes a persistent JSONL event log under logs/ (e.g. logs/orchestrator_run_2025-10-21T19-15-43.995Z.jsonl). Tail it while a run is in progress to monitor phase transitions, per-iteration stats, and anomalous events:
tail -f "$(ls -t logs/orchestrator_run_*.jsonl | head -n1)"

Each iteration also updates logs/selfplay_manifest.json with a catalog of shards (size, timestamp) so you can inspect what training will consume.
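A small reader for the manifest might look like the sketch below. The exact schema is an assumption here (a `"shards"` list with `"path"` and `"size"` fields); adapt it to the actual manifest contents.

```python
# Illustrative summary of logs/selfplay_manifest.json; the field names
# used here ("shards", "path", "size") are assumptions about the schema.
import json

def summarize_manifest(manifest_text: str):
    manifest = json.loads(manifest_text)
    shards = manifest.get("shards", [])
    total_size = sum(shard.get("size", 0) for shard in shards)
    return len(shards), total_size

example = ('{"shards": [{"path": "shard_0001.npz", "size": 2048},'
           ' {"path": "shard_0002.npz", "size": 1024}]}')
count, size = summarize_manifest(example)
```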
Tip: The training loop now validates the corpus size, applies cosine LR scheduling, and evaluates both raw and EMA weights each epoch. Ensure you regenerate self-play shards after upgrading—existing 18-plane datasets are rejected because the feature encoder now emits 22 planes.
Once the UCI engine is running, use these commands:
uci
isready
position startpos
go depth 10
quit
The orchestrator automatically generates self-play data, trains the model, and evaluates performance in a continuous loop. Each move now captures both the normalized policy distribution and the raw visit counts from MCTS, which are encoded into the training shards to preserve search strength.
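For intuition, the standard AlphaZero recipe turns raw visit counts into a policy target by exponentiating with a selection temperature and normalizing; this sketch shows the recipe, not necessarily the repo's exact encoding.

```python
# Sketch: convert raw MCTS visit counts into a normalized policy target.
def visits_to_policy(visits, temperature=1.0):
    if temperature == 0:  # greedy: all probability mass on the most-visited move
        best = max(range(len(visits)), key=lambda i: visits[i])
        return [1.0 if i == best else 0.0 for i in range(len(visits))]
    powered = [v ** (1.0 / temperature) for v in visits]
    total = sum(powered)
    return [p / total for p in powered]
```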
Device selection can be controlled through:
- Command-line `--device` flag (`auto`, `gpu`, or `cpu`) on the training and inference tools
- Environment variable `BETAONE_DEVICE`
- `runtime.device` in `config.json` (defaults to `auto`)
When auto is selected, the system attempts to use GPU acceleration first and falls back to CPU if unavailable.
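One plausible resolution order for the controls listed above is CLI flag, then `BETAONE_DEVICE`, then `runtime.device`, then `auto`; the real precedence may differ, so treat this as a sketch.

```python
# Hypothetical device resolution: flag > env var > config > "auto",
# with "auto" preferring GPU and falling back to CPU if unavailable.
import os

def resolve_device(cli_flag=None, config_value=None, gpu_available=True):
    choice = cli_flag or os.environ.get("BETAONE_DEVICE") or config_value or "auto"
    if choice == "auto":
        return "gpu" if gpu_available else "cpu"
    return choice
```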
Edit config.json to adjust parameters:
{
"self_play": {
"num_games": 100,
"num_workers": 2,
"mcts_simulations": 800,
"temperature_moves": 30
},
"training": {
"batch_size": 256,
"learning_rate": 0.001,
"steps_per_checkpoint": 1000,
"l2_reg": 0.0001
},
"runtime": {
"device": "auto",
"network_batch_size": 32,
"network_timeout_ms": 5000,
"max_concurrent_engines": 2
},
"evaluation": {
"num_games": 100,
"promotion_threshold": 0.55
}
}

- `runtime.network_batch_size` controls how many FENs the C++ engine batches per inference RPC
- `runtime.network_timeout_ms` is the client-side deadline (in milliseconds) for inference responses
- `runtime.max_concurrent_engines` bounds how many self-play/evaluation engines (and therefore MLX inference servers) may run at once; lower this to avoid exhausting Metal resources on smaller GPUs
- `training.min_positions` defines the minimum number of positions required before a training epoch will run (the orchestrator now skips training gracefully when the corpus is too small)
- `training.max_cpu_cores`, `training.ema_decay`, and `training.grad_clip` expose additional safeguards for MLX training stability
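Two of these knobs can be sketched in a few lines: chunking FEN requests so each inference RPC carries at most `runtime.network_batch_size` positions, and the `training.min_positions` gate. Function names here are illustrative, not the repo's API.

```python
# Sketch of batching and the corpus-size gate described above.
def chunk_fens(fens, network_batch_size):
    # Split positions into RPC-sized batches.
    return [fens[i:i + network_batch_size]
            for i in range(0, len(fens), network_batch_size)]

def should_train(num_positions, config):
    # Skip training when the corpus is smaller than training.min_positions.
    min_positions = config.get("training", {}).get("min_positions", 0)
    return num_positions >= min_positions
```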
- Event log: every orchestrator run creates a JSON Lines file in `logs/` with phase boundaries, per-iteration summaries, and warnings (engine restarts, policy fallbacks, training skips, etc.).
- Self-play manifest: `logs/selfplay_manifest.json` tracks all generated shards, their sizes, and timestamps, which is useful for auditing what training will consume.
- Console stream: `--verbose` keeps the classic human-readable log (still the best way to watch MCTS statistics and evaluation outcomes in real time).
- Model Size: Approximately 22M parameters
- Inference: Varies by device and batch size
- Self-Play: Depends on MCTS simulations and hardware
- Training: Depends on batch size and hardware
- Memory Usage: Varies during training
- `tools/benchmark/latency_tracker.py` runs scripted matches at fixed simulation counts and reports per-move latency, nodes per second, and basic engine telemetry.
- `tools/benchmark/resource_monitor.py` wraps the latency tracker, samples CPU/GPU/memory usage while varying worker counts, and writes JSON reports under `logs/benchmark/` for later analysis.
C++ Engine:
cd engine
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(sysctl -n hw.ncpu)

Swift Orchestrator:
cd orchestrator
swift build -c release

Python Environment:
cd network
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Checkpoint artifacts are not committed. After MLX is available, create fresh base weights with:
cd ..
./network/venv/bin/python network/make_base_checkpoint.py --name base --config config.json

The training pipeline now auto-initialises `latest.safetensors` and EMA snapshots from the regenerated base checkpoint, so no manual copies are required.
Run the environment doctor followed by the integration smoke test after the first build:
python tools/doctor.py
./tools/smoke_test.sh

If you prefer manual spot checks:
# Test neural network
cd network && source venv/bin/activate
python -c "from model import create_model; model = create_model()"
# Test UCI engine
printf "uci\nisready\nquit\n" | ./engine/build/bin/betaone
# Test orchestrator
cd orchestrator && swift run Orchestrator --help

- Input: 8x8x22 planes (piece positions, side to move, castling rights, en-passant target, clocks, material, and check indicators)
- Backbone: 10 ResNet blocks with 256 channels
- Attention: Multi-head self-attention (8 heads)
- Policy Head: 4672 output classes (AlphaZero-style move encoding)
- Value Head: Scalar output (-1 to 1)
- Parameters: Approximately 22M total
- PUCT Formula: `Q + c_puct * P * sqrt(N_parent) / (1 + N_child)`
- Virtual Loss: Parallelization with virtual loss penalty
- Dirichlet Noise: Root node exploration
- Temperature: Move selection temperature scheduling
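The PUCT rule can be sketched directly from the formula above: each child is scored by `Q + c_puct * P * sqrt(N_parent) / (1 + N_child)` and the search descends into the highest-scoring child. Children are `(Q, P, N)` tuples here, and the `c_puct` default of 1.5 is an assumed example value, not the engine's setting.

```python
# Sketch of PUCT child selection; returns the index of the best child.
import math

def select_child(children, c_puct=1.5):
    """children: list of (Q, P, N_child) tuples."""
    n_parent = sum(n for _, _, n in children)
    def puct(child):
        q, p, n = child
        # Exploitation (Q) plus prior-weighted exploration bonus.
        return q + c_puct * p * math.sqrt(n_parent) / (1 + n)
    return max(range(len(children)), key=lambda i: puct(children[i]))
```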
- Self-Play: Generate games using current model
- Data Collection: Store positions, policies, and values
- Training: AlphaZero loss (policy + value + L2)
- Evaluation: Play against previous best model
- Promotion: Update best model if win rate > 55%
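The AlphaZero loss from the pipeline above combines policy cross-entropy, value MSE, and an L2 weight penalty. This sketch uses plain Python lists for clarity; the real trainer operates on MLX tensors, with `l2_reg` matching the configured 0.0001.

```python
# Sketch of the AlphaZero training loss: policy + value + L2.
import math

def alphazero_loss(policy_target, policy_log_probs,
                   value_target, value_pred, weights, l2_reg=1e-4):
    # Cross-entropy between the MCTS policy target and predicted log-probs.
    policy_loss = -sum(t * lp for t, lp in zip(policy_target, policy_log_probs))
    # Squared error between the game outcome and the value head's prediction.
    value_loss = (value_target - value_pred) ** 2
    # L2 regularization over model weights.
    l2_penalty = l2_reg * sum(w * w for w in weights)
    return policy_loss + value_loss + l2_penalty
```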
- Build failures: Ensure all dependencies are installed
- Python errors: Activate virtual environment before running
- UCI connection: Check that the engine binary is built correctly
- Memory issues: Reduce batch size or number of workers
- Ensure Xcode Command Line Tools are installed: `xcode-select --install`
- Check CMake version: `cmake --version` (requires 3.20+)
- Verify Swift installation: `swift --version`
- Check if jsoncpp is installed: `brew list jsoncpp`
- Check the Python virtual environment: `cd network && source venv/bin/activate`
- Verify MLX installation: `python -c "import mlx.core as mx; print('MLX OK')"`
- Check the engine binary: `./engine/build/bin/betaone --help`
- Verify the inference server: `cd network && source venv/bin/activate && echo '{"type": "single", "fen": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"}' | python inference.py`
- Monitor memory usage during training: `top -pid $(pgrep -f "python.*train.py")`
- Adjust `num_workers` in config for your CPU cores
- Reduce `mcts_simulations` for faster games
- Use smaller batch sizes if memory constrained
- Check Metal GPU usage: `system_profiler SPDisplaysDataType`
Check the logs/ directory for detailed execution logs:
- `training.log`: Training progress and loss curves
- `orchestrator.log`: Self-play and evaluation metrics
- `engine.log`: UCI engine debug output
# Check system resources
top -l 1 | grep -E "(CPU|PhysMem)"
# Monitor Metal GPU usage
system_profiler SPDisplaysDataType | grep -A 5 "Metal"
# Check Python environment
cd network && source venv/bin/activate && python -c "import mlx.core as mx; print(f'MLX version: {mx.__version__}')"
# Test UCI engine connectivity
printf "uci\nisready\nquit\n" | ./engine/build/bin/betaone
# Check build artifacts
ls -la engine/build/bin/
ls -la orchestrator/.build/release/

- `tools/chess_utils.py`: Shared helpers for applying moves, querying board status, and parsing policy strings
- `tools/fen_tools.py`: CLI wrapper that calls the shared helpers and packages self-play shards
- `logs/selfplay_manifest.json`: Automatically maintained summary of generated training shards
This is a personal project, but suggestions and improvements are welcome.
MIT License - see LICENSE file for details.
- Inspired by AlphaZero and LC0
- Built with MLX for Apple Silicon optimization
- Uses standard UCI protocol for compatibility