
BetaOne Chess Engine

A chess engine implementation inspired by AlphaZero, designed for Apple Silicon with MLX neural network acceleration.

Features

  • Apple Silicon Support: Uses MLX for neural network acceleration on Apple Silicon
  • Neural Network: 22M-parameter ResNet + multi-head attention stack with squeeze-and-excitation
  • MCTS Search: Monte Carlo Tree Search with PUCT algorithm
  • Self-Play Training: Generates training data through self-play games
  • UCI Protocol: Compatible with chess GUIs
  • Multi-Threading: Parallel MCTS search and self-play workers
  • Expanded Features: 22 AlphaZero-style planes including clock, material, and check indicators
  • Training Pipeline: Cosine LR schedule, gradient accumulation, EMA evaluation, and checkpoint management

Architecture

BetaOne/
├── network/          # MLX neural network (Python)
├── engine/           # C++ MCTS + UCI engine
├── orchestrator/     # Swift orchestrator
├── data/             # Training data storage
├── checkpoints/      # Model weights
└── logs/             # Logging output

Components

  1. Neural Network (MLX): Residual tower with stacked attention blocks and gated channel mixing for policy/value heads
  2. MCTS Engine (C++): Tree search with virtual loss parallelization
  3. UCI Interface (C++): Standard chess engine protocol
  4. Orchestrator (Swift): Coordinates self-play, training, and evaluation

System Architecture

The system consists of three main components:

  1. Swift Orchestrator: Manages the training loop, coordinates self-play games, training, and evaluation
  2. C++ Engine: Implements MCTS search, UCI protocol, and position handling
  3. Python Network: MLX-based neural network for policy and value prediction

Training Pipeline

  1. Self-Play: Generate games using current model with visit-count-aware policy exports
  2. Training: Train with AlphaZero loss, cosine warmup/decay, gradient accumulation, and mixed precision
  3. Evaluation: Compare both raw and EMA checkpoints against the previous best on held-out shards
  4. Promotion: Update best model if validation/evaluation metrics improve beyond thresholds
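The cosine warmup/decay schedule in step 2 can be sketched as follows (a minimal illustration; the hyper-parameter names here are assumptions, not the actual names used in train.py):

```python
import math

def cosine_lr(step, base_lr=1e-3, warmup_steps=500, total_steps=10_000, min_lr=1e-5):
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The learning rate climbs linearly during warmup, peaks at base_lr, and follows a half-cosine down to min_lr by the final step.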

Quick Start

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3)
  • Xcode Command Line Tools
  • Python 3.8+

Installation

  1. Install dependencies:

    brew install cmake
  2. Build the system:

    ./build.sh
  3. Run the orchestrator: (example with three workers and 24 total games)

    swift run --package-path orchestrator Orchestrator "$(pwd)/config.json" --workers 3 --games 24 --iterations 1 --verbose

Usage

Testing Individual Components

1. Test Neural Network

cd network && source venv/bin/activate && python -c "from model import ensure_device, create_model; from config import get_input_planes; import mlx.core as mx; ensure_device(); planes=get_input_planes(); m=create_model(); x=mx.random.normal((1,planes,8,8)); p,v=m(x); mx.eval(p,v); print('policy',p.shape,'value',v.shape)"

Expected output: policy (1, 4672) value (1, 1)

2. Test MLX Inference Server

cd .. && PYTHONPATH=network network/venv/bin/python -m network.inference --batch-size 2

Expected output: JSON response with policy array and value

3. Test UCI Engine

cd engine/build/bin && printf "uci\nisready\nquit\n" | ./betaone

Expected output: UCI protocol responses

Running the Full System

Option 1: Swift Orchestrator (Recommended)

cd orchestrator && swift run Orchestrator

This runs the complete self-play, training, and evaluation loop.

Option 2: Direct UCI Engine

./engine/build/bin/betaone

This starts the UCI engine for use with chess GUIs.

Option 3: Direct Training

cd network && source venv/bin/activate && python train.py

This runs training on existing self-play data. If MLX reports a Metal compiler failure (e.g. Unable to load kernel rbitsc), the process exits cleanly with guidance: restart the machine, or rerun the training step with MLX_DISABLE_METAL=1 to force CPU execution.

Orchestrator CLI Highlights

The Swift orchestrator accepts a number of knobs so you can tailor short smoke tests or longer training cycles without editing config.json:

  • --workers N: Override the number of concurrent self-play workers (defaults to self_play.num_workers).
  • --games N: Total self-play games per iteration (work is split evenly among workers).
  • --simulations N: Override MCTS simulations per move.
  • --batch-size N, --learning-rate X, --steps-per-checkpoint N: Adjust training hyper-parameters on the fly.
  • --eval-games N, --promotion-threshold X: Tune evaluation workload and promotion rule.
  • --max-cpu-cores N: Limit Python/MLX CPU threads if you want predictable resource usage.
  • --config-overrides '{...}': Apply ad-hoc JSON overrides for advanced scenarios.

When you launch the orchestrator the CLI writes a persistent JSONL event log under logs/ (e.g. logs/orchestrator_run_2025-10-21T19-15-43.995Z.jsonl). Tail it while a run is in progress to monitor phase transitions, per-iteration stats, and anomalous events:

tail -f "$(ls -t logs/orchestrator_run_*.jsonl | head -n1)"

Each iteration also updates logs/selfplay_manifest.json with a catalog of shards (size, timestamp) so you can inspect what training will consume.
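A small script can summarize the manifest instead of reading it by hand. The schema below (a top-level "shards" list with "path" and "size_bytes" fields) is an assumption for illustration; check the actual file for the real field names:

```python
import json

def summarize_manifest(path="logs/selfplay_manifest.json"):
    """Return (shard count, total bytes) from the self-play manifest.

    Assumes a {"shards": [{"path": ..., "size_bytes": ...}, ...]} layout,
    which is illustrative and may differ from the orchestrator's output.
    """
    with open(path) as f:
        manifest = json.load(f)
    shards = manifest.get("shards", [])
    total_bytes = sum(s.get("size_bytes", 0) for s in shards)
    return len(shards), total_bytes
```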

Tip: The training loop now validates the corpus size, applies cosine LR scheduling, and evaluates both raw and EMA weights each epoch. Ensure you regenerate self-play shards after upgrading—existing 18-plane datasets are rejected because the feature encoder now emits 22 planes.

UCI Engine Commands

Once the UCI engine is running, use these commands:

uci
isready
position startpos
go depth 10
quit

Self-Play Data Generation

The orchestrator automatically generates self-play data, trains the model, and evaluates performance in a continuous loop. Each move now captures both normalized policy logits and raw visit counts from MCTS, which are encoded into the training shards to preserve search strength.
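Visit counts are typically turned into policy targets by temperature-scaled normalization; a minimal sketch (the exact scheme used by the shard encoder is an assumption):

```python
def visits_to_policy(visits, temperature=1.0):
    """Normalize MCTS visit counts into a probability distribution.

    temperature < 1 sharpens toward the most-visited move; temperature -> 0
    approaches greedy argmax selection (typical after the opening phase).
    """
    if temperature <= 1e-6:  # greedy: all probability mass on the top move
        best = max(range(len(visits)), key=lambda i: visits[i])
        return [1.0 if i == best else 0.0 for i in range(len(visits))]
    scaled = [v ** (1.0 / temperature) for v in visits]
    total = sum(scaled)
    return [s / total for s in scaled]
```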

Device Selection

Device selection can be controlled through:

  1. Command-line --device flag (auto, gpu, or cpu) on training and inference tools
  2. Environment variable BETAONE_DEVICE
  3. runtime.device in config.json (defaults to auto)

When auto is selected, the system attempts to use GPU acceleration first and falls back to CPU if unavailable.
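The precedence above (CLI flag, then BETAONE_DEVICE, then runtime.device) can be sketched as a small resolver. This is a hypothetical helper, not the project's actual code:

```python
import os

def resolve_device(cli_flag=None, config=None):
    """Resolve the device with precedence: --device flag > BETAONE_DEVICE > config."""
    choice = (
        cli_flag
        or os.environ.get("BETAONE_DEVICE")
        or (config or {}).get("runtime", {}).get("device", "auto")
    )
    if choice not in ("auto", "gpu", "cpu"):
        raise ValueError(f"unknown device: {choice}")
    return choice
```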

Configuration

Edit config.json to adjust parameters:

{
  "self_play": {
    "num_games": 100,
    "num_workers": 2,
    "mcts_simulations": 800,
    "temperature_moves": 30
  },
  "training": {
    "batch_size": 256,
    "learning_rate": 0.001,
    "steps_per_checkpoint": 1000,
    "l2_reg": 0.0001
  },
  "runtime": {
    "device": "auto",
    "network_batch_size": 32,
    "network_timeout_ms": 5000,
    "max_concurrent_engines": 2
  },
  "evaluation": {
    "num_games": 100,
    "promotion_threshold": 0.55
  }
}
  • runtime.network_batch_size controls how many FENs the C++ engine batches per inference RPC
  • runtime.network_timeout_ms is the client-side deadline (in milliseconds) for inference responses
  • runtime.max_concurrent_engines bounds how many self-play/evaluation engines (and therefore MLX inference servers) may run at once; lower this to avoid exhausting Metal resources on smaller GPUs
  • training.min_positions defines the minimum number of positions required before a training epoch will run (the orchestrator now skips training gracefully when the corpus is too small)
  • training.max_cpu_cores, training.ema_decay, and training.grad_clip expose additional safeguards for MLX training stability
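The --config-overrides flag applies ad-hoc JSON on top of config.json. A recursive merge of that kind can be sketched as follows (illustrative, not the orchestrator's actual implementation):

```python
import json

def merge_config(base, overrides):
    """Recursively merge overrides into a copy of base: dicts merge, scalars replace."""
    out = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge_config(out[key], value)
        else:
            out[key] = value
    return out

base = json.loads('{"training": {"batch_size": 256, "learning_rate": 0.001}}')
merged = merge_config(base, json.loads('{"training": {"batch_size": 64}}'))
```

Untouched keys (learning_rate above) survive the merge, so an override only needs to name what it changes.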

Logs & Diagnostics

  • Event log: every orchestrator run creates a JSON Lines file in logs/ with phase boundaries, per-iteration summaries, and warnings (engine restarts, policy fallbacks, training skips, etc.).
  • Self-play manifest: logs/selfplay_manifest.json tracks all generated shards, their size, and timestamps—useful for auditing what training will consume.
  • Console stream: --verbose keeps the classic human-readable log (still the best way to watch MCTS statistics and evaluation outcomes in real time).

Performance

  • Model Size: Approximately 22M parameters
  • Inference: Varies by device and batch size
  • Self-Play: Depends on MCTS simulations and hardware
  • Training: Depends on batch size and hardware
  • Memory Usage: Varies during training

Benchmarking Tools

  • tools/benchmark/latency_tracker.py runs scripted matches at fixed simulation counts and reports per-move latency, nodes per second, and basic engine telemetry.
  • tools/benchmark/resource_monitor.py wraps the latency tracker, samples CPU/GPU/memory usage while varying worker counts, and writes JSON reports under logs/benchmark/ for later analysis.

Development

Building Individual Components

C++ Engine:

cd engine
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(sysctl -n hw.ncpu)

Swift Orchestrator:

cd orchestrator
swift build -c release

Python Environment:

cd network
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Checkpoint artifacts are not committed. Once MLX is installed, create fresh base weights with:

cd ..
./network/venv/bin/python network/make_base_checkpoint.py --name base --config config.json

The training pipeline now auto-initializes latest.safetensors / EMA snapshots from the regenerated base checkpoint, so no manual copies are required.

Testing

Run the environment doctor followed by the integration smoke test after the first build:

python tools/doctor.py
./tools/smoke_test.sh

If you prefer manual spot checks:

# Test neural network
cd network && source venv/bin/activate
python -c "from model import create_model; model = create_model()"

# Test UCI engine
printf "uci\nisready\nquit\n" | ./engine/build/bin/betaone

# Test orchestrator
cd orchestrator && swift run Orchestrator --help

Technical Details

Neural Network Architecture

  • Input: 8x8x22 planes (piece positions, side to move, castling rights, en-passant target, plus clock, material, and check indicators)
  • Backbone: 10 ResNet blocks with 256 channels
  • Attention: Multi-head self-attention (8 heads)
  • Policy Head: 4672 output classes (AlphaZero-style move encoding)
  • Value Head: Scalar output (-1 to 1)
  • Parameters: Approximately 22M total

MCTS Algorithm

  • PUCT Formula: Q + c_puct * P * sqrt(N_parent) / (1 + N_child)
  • Virtual Loss: Parallelization with virtual loss penalty
  • Dirichlet Noise: Root node exploration
  • Temperature: Move selection temperature scheduling
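The PUCT rule above, applied at each node to choose which child to descend into, can be sketched as (the dict-based node structure is illustrative, not the engine's C++ layout):

```python
import math

def select_child(children, c_puct=1.5):
    """Pick the child maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child).

    Each child is a dict with mean value "q", prior probability "p", and
    visit count "n" (illustrative fields only).
    """
    n_parent = sum(c["n"] for c in children)

    def puct(c):
        u = c_puct * c["p"] * math.sqrt(n_parent) / (1 + c["n"])
        return c["q"] + u

    return max(children, key=puct)
```

Note how the exploration term shrinks as a child accumulates visits, so under-explored moves with decent priors eventually win the argmax.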

Training Process

  1. Self-Play: Generate games using current model
  2. Data Collection: Store positions, policies, and values
  3. Training: AlphaZero loss (policy + value + L2)
  4. Evaluation: Play against previous best model
  5. Promotion: Update best model if win rate > 55%
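The AlphaZero loss in step 3 combines policy cross-entropy, value MSE, and L2 regularization. A scalar sketch in pure Python (for clarity only; the real training code operates on MLX tensors):

```python
import math

def alphazero_loss(policy_logits, policy_target, value_pred, value_target,
                   weights, l2_reg=1e-4):
    """L = (z - v)^2 - pi . log softmax(logits) + c * ||theta||^2."""
    # Numerically stable log-softmax over the policy logits.
    m = max(policy_logits)
    exps = [math.exp(x - m) for x in policy_logits]
    log_total = math.log(sum(exps))
    log_probs = [(x - m) - log_total for x in policy_logits]
    policy_loss = -sum(p * lp for p, lp in zip(policy_target, log_probs))
    value_loss = (value_target - value_pred) ** 2
    l2 = l2_reg * sum(w * w for w in weights)
    return policy_loss + value_loss + l2
```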

Troubleshooting

Common Issues

  1. Build failures: Ensure all dependencies are installed
  2. Python errors: Activate virtual environment before running
  3. UCI connection: Check that the engine binary is built correctly
  4. Memory issues: Reduce batch size or number of workers

Detailed Troubleshooting

Build Issues

  • Ensure Xcode Command Line Tools are installed: xcode-select --install
  • Check CMake version: cmake --version (requires 3.20+)
  • Verify Swift installation: swift --version
  • Check if jsoncpp is installed: brew list jsoncpp

Runtime Issues

  • Check Python virtual environment: cd network && source venv/bin/activate
  • Verify MLX installation: python -c "import mlx.core as mx; print('MLX OK')"
  • Check engine binary: ./engine/build/bin/betaone --help
  • Verify inference server: cd network && source venv/bin/activate && echo '{"type": "single", "fen": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"}' | python inference.py

Performance Issues

  • Monitor memory usage during training: top -pid $(pgrep -f "python.*train.py")
  • Adjust num_workers in config for your CPU cores
  • Reduce mcts_simulations for faster games
  • Use smaller batch sizes if memory constrained
  • Check Metal GPU usage: system_profiler SPDisplaysDataType

Logs

Check the logs/ directory for detailed execution logs:

  • training.log: Training progress and loss curves
  • orchestrator.log: Self-play and evaluation metrics
  • engine.log: UCI engine debug output

Debug Commands

# Check system resources
top -l 1 | grep -E "(CPU|PhysMem)"

# Monitor Metal GPU usage
system_profiler SPDisplaysDataType | grep -A 5 "Metal"

# Check Python environment
cd network && source venv/bin/activate && python -c "import mlx.core as mx; print(f'MLX version: {mx.__version__}')"

# Test UCI engine connectivity
printf "uci\nisready\nquit\n" | ./engine/build/bin/betaone

# Check build artifacts
ls -la engine/build/bin/
ls -la orchestrator/.build/release/

Utilities

  • tools/chess_utils.py: Shared helpers for applying moves, querying board status, and parsing policy strings
  • tools/fen_tools.py: CLI wrapper that calls the shared helpers and packages self-play shards
  • logs/selfplay_manifest.json: Automatically maintained summary of generated training shards

Contributing

This is a personal project, but suggestions and improvements are welcome.

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Inspired by AlphaZero and LC0
  • Built with MLX for Apple Silicon optimization
  • Uses standard UCI protocol for compatibility
