A reinforcement learning laboratory project for training AI agents to play Trigo, a 3D variant of the board game Go.
TrigoRL is an experimental platform for exploring reinforcement learning techniques in the context of Trigo, a strategic board game that extends the rules of Go into three-dimensional space. While traditional Go is played on a 2D 19×19 board, Trigo is played on a cubic grid, introducing new strategic dimensions and complexity.
Trigo is a modern reimplementation of a 3D Go variant with the following characteristics:
- Board: 3D cubic grid (default: 5×5×5, configurable to other dimensions including 2D boards)
- Rules: Based on Go mechanics adapted for 3D space
  - Stone placement with capture detection
  - Ko rule enforcement
  - Territory calculation in 3D
  - Pass, undo/redo, and resignation support
- Notation: TGN (Trigo Game Notation) - a PGN-inspired text format for recording games
- Coordinate System: Center-symmetric notation (e.g., 000 = center, aaa = corner)
TRY IT YOURSELF ONLINE: here is a Trigo demo page.
View and validate the TGNDataset:
# View dataset statistics
python tools/view_dataset.py configs/training/trigo-gpt2.yaml --stats
# Validate dataset implementation
python tools/view_dataset.py configs/training/trigo-gpt2.yaml --validate
# View a specific sample
python tools/view_dataset.py configs/training/trigo-gpt2.yaml --sample 0 --tokens
See tools/README.md for comprehensive CLI documentation.
Train language models from scratch or resume from checkpoints:
# Start new training from scratch
python train_lm.py configs/training/trigo-gpt2.yaml
# Start with config overrides
python train_lm.py configs/training/trigo-gpt2.yaml training.epochs=50 training.learning_rate=5e-5
# Resume from checkpoint by specifying resume_from in config
python train_lm.py configs/training/trigo-gpt2.yaml training.resume_from=outputs/trigor/20251113-trigo-gpt2/checkpoints/best.chkpt
# Resume from experiment directory (automatically loads latest checkpoint)
python train_lm.py outputs/trigor/20251113-trigo-gpt2
# Resume with config overrides (useful for fine-tuning)
python train_lm.py outputs/trigor/20251113-trigo-gpt2 training.learning_rate=1e-5 training.epochs=100
Available training configs:
- trigo-gpt2.yaml - GPT-2 with standard multi-head attention
- trigo-llama.yaml - LLaMA with grouped query attention (GQA)
- trigo-rwkv.yaml - RWKV with linear attention
- trigo-gpt2-invsqrt.yaml - GPT-2 with inverse square root scheduler
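For readers unfamiliar with the inverse square root scheduler named in the last config, the sketch below shows the common definition: a linear warmup followed by decay proportional to 1/sqrt(step). The project's actual scheduler is not shown here, so the function name `inv_sqrt_lr` and the values of `base_lr` and `warmup_steps` are assumptions for illustration only.

```python
import math


def inv_sqrt_lr(step, base_lr=5e-4, warmup_steps=1000):
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)  # avoid division by zero at step 0
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * math.sqrt(warmup_steps / step)


for s in (1, 500, 1000, 4000):
    print(s, inv_sqrt_lr(s))
```

With this definition the learning rate peaks at `base_lr` when `step == warmup_steps` and halves each time the step count quadruples after that.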
Resume training options:
- From experiment directory: python train_lm.py outputs/trigor/[experiment-dir]
  - Automatically loads checkpoints/latest.chkpt
  - Preserves all previous config settings
  - Continues wandb logging to the same run (if wandb enabled)
- From specific checkpoint: Set training.resume_from in config or override:
    training:
      resume_from: path/to/checkpoint.chkpt  # null = train from scratch
  - Can use best.chkpt, latest.chkpt, or any epoch checkpoint
  - Restores model weights, optimizer state, and training progress
  - Useful for transfer learning or fine-tuning
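The `key.path=value` overrides used in the commands above map dotted keys onto the nested YAML structure. The project relies on OmegaConf/Hydra for this; the standard-library sketch below only illustrates the idea. The helper names `parse_value` and `apply_overrides` and the starting config values are hypothetical.

```python
import ast


def parse_value(text):
    """Interpret an override value as a Python literal where possible."""
    if text.lower() in ("null", "none"):
        return None
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text  # keep paths and other free-form strings as-is


def apply_overrides(config, overrides):
    """Apply key.path=value strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return config


# Hypothetical starting values; the real defaults live in the YAML configs.
config = {"training": {"epochs": 20, "learning_rate": 1e-4, "resume_from": None}}
apply_overrides(config, [
    "training.epochs=50",
    "training.learning_rate=5e-5",
    "training.resume_from=outputs/trigor/20251113-trigo-gpt2/checkpoints/best.chkpt",
])
print(config["training"])
```

Numeric values are parsed into numbers, `null` becomes `None` (train from scratch), and anything that is not a literal, such as a checkpoint path, is kept as a string.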
Training outputs:
- outputs/trigor/[experiment-id]/config.yaml - Saved configuration
- outputs/trigor/[experiment-id]/train.log - Training logs
- outputs/trigor/[experiment-id]/checkpoints/ - Model checkpoints
  - best.chkpt - Best model (based on validation metric)
  - latest.chkpt - Latest model (for resuming)
  - epoch_N.chkpt - Periodic checkpoints
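Given this layout, the two resume paths described earlier reduce to a small lookup: an experiment directory resolves to its checkpoints/latest.chkpt, while an explicit .chkpt path is used directly. How train_lm.py actually tells these cases apart is not shown in this document, so the function below (`resolve_checkpoint`, a hypothetical name) is a sketch under that assumption.

```python
import tempfile
from pathlib import Path


def resolve_checkpoint(target):
    """Map a train_lm.py-style argument to a checkpoint path, or None for a fresh run."""
    path = Path(target)
    if path.is_dir():
        # Experiment directory: resume from its latest checkpoint.
        return path / "checkpoints" / "latest.chkpt"
    if path.suffix == ".chkpt":
        # Explicit checkpoint such as best.chkpt or an epoch checkpoint.
        return path
    # Anything else (for example a YAML config) means training starts from scratch.
    return None


with tempfile.TemporaryDirectory() as tmp:
    experiment = Path(tmp) / "20251113-trigo-gpt2"
    (experiment / "checkpoints").mkdir(parents=True)
    print(resolve_checkpoint(experiment))

print(resolve_checkpoint("configs/training/trigo-gpt2.yaml"))
```

The sketch does not check that the resolved file exists; a real implementation would likely report a clear error when latest.chkpt is missing.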
Run the model test suite:
python tests/test_models.py
This validates:
- Model registry with 4 CausalLM models
- Configuration loading (dict and OmegaConf)
- Forward passes for GPT-2, LLaMA, and RWKV
- Parameter counting and memory estimation
Test all training configs:
python examples/verify_training_configs.py
Export trained models for cross-platform deployment:
# Export best checkpoint (default - standard inference mode)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt
# Export in evaluation mode with fixed dimensions
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
--evaluation-mode --prefix-len 10 --seq-len 15
# Export evaluation mode with dynamic dimensions (prefix-len/seq-len only for dummy input)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
--evaluation-mode --dynamic-seq
# Export with INT8 quantization (recommended for deployment)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
--quantize --quant-type int8
# Export with dynamic batch/sequence sizes
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
--dynamic-batch --dynamic-seq
# Export with static quantization (best accuracy)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
--quantize --quant-method static --calibration-samples 200
Export Modes:
- Standard mode: Single input input_ids, returns logits for all positions
- Evaluation mode: Three inputs (prefix_ids, evaluated_ids, evaluated_mask), returns logits for the last prefix position and the evaluated positions. Supports custom attention patterns like tree attention for computing sequence probabilities.
Quantization benefits:
- INT8 dynamic: ~3-4x smaller model, minimal accuracy loss
- INT4: ~8x smaller, more aggressive compression
- Static quantization: Better accuracy than dynamic, requires calibration
See docs/onnx_quantization_guide.md for comprehensive quantization documentation.
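The approximate size reductions quoted above follow from the bit width of each weight. The sketch below works through that arithmetic, assuming the unquantized model stores weights as 32-bit floats (the baseline precision is not stated in this document) and using a hypothetical parameter count.

```python
BITS_PER_WEIGHT = {"fp32": 32, "int8": 8, "int4": 4}


def weight_bytes(num_params, dtype):
    """Storage needed for the weights alone, ignoring scales and graph metadata."""
    return num_params * BITS_PER_WEIGHT[dtype] // 8


def compression_ratio(dtype, baseline="fp32"):
    """Ideal size reduction relative to the baseline precision."""
    return BITS_PER_WEIGHT[baseline] / BITS_PER_WEIGHT[dtype]


num_params = 1_000_000  # hypothetical parameter count, not a measured model size
for dtype in BITS_PER_WEIGHT:
    print(dtype, weight_bytes(num_params, dtype), compression_ratio(dtype))
```

These are upper bounds: exported files also carry quantization scales, zero points, and any operators left unquantized, which is consistent with INT8 landing at roughly 3-4x rather than a full 4x.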
- PyTorch: Deep learning framework for model implementation
- Transformers: Architecture foundation for the RL agent (GPT-2, LLaMA, RWKV, xLSTM)
- Weights & Biases (wandb): Training metrics and experiment tracking
- ONNX: Model weight export format for cross-platform deployment
- OmegaConf/Hydra: Hierarchical configuration management
✅ Data Pipeline
- TGNDataset: PyTorch dataset for TGN files with byte-level tokenization
- TGNByteTokenizer: 259-token vocab (256 bytes + PAD/START/END)
- Configuration-driven dataset loading
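The 259-token vocabulary can be pictured as the 256 possible byte values plus three special tokens. The sketch below is not the project's TGNByteTokenizer: the special-token ids (256, 257, 258), the function names, and the sample string are assumptions chosen for illustration.

```python
PAD_ID, START_ID, END_ID = 256, 257, 258  # assumed ids; the real assignment may differ
VOCAB_SIZE = 259


def encode(text, max_len=None):
    """Encode text as START + UTF-8 bytes + END, optionally right-padded with PAD."""
    ids = [START_ID, *text.encode("utf-8"), END_ID]
    if max_len is not None:
        # Truncate if too long (this may drop END), otherwise pad on the right.
        ids = ids[:max_len] + [PAD_ID] * (max_len - len(ids))
    return ids


def decode(ids):
    """Drop special tokens and decode the remaining bytes back to text."""
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")


# Illustrative string only; it is not claimed to be valid TGN.
tokens = encode("1. 000 aaa", max_len=16)
print(tokens)
print(decode(tokens))
```

Because every byte is its own token, no vocabulary file is needed and any TGN text can be encoded, at the cost of longer sequences than a subword tokenizer would produce.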
✅ Model Architecture
- 4 CausalLM models: GPT2, LLaMA (with GQA), RWKV (linear attention), xLSTM
- Model registry with factory pattern
- OmegaConf integration for flexible configuration
- Parameter counting and memory footprint estimation
✅ Training Configuration
- Complete YAML configs for all 4 models
- Hyperparameters tuned for each architecture
- WandB integration (optional)
- Checkpointing and learning rate scheduling
✅ Development Tools
- CLI tool for dataset inspection and validation
- Model testing suite (109 tests passing)
- Configuration verification scripts
✅ Model Export
- ONNX export script with checkpoint loading
- INT8/INT4 quantization (dynamic and static)
- 3-4x model compression with minimal accuracy loss
- Node.js inference validation and testing
The RL framework is organized into the following components; those not marked complete still need to be implemented:
- Data Pipeline ✅ COMPLETE
  - TGNDataset implementation with byte tokenization
  - Dataset configuration and loading
  - Validation and inspection tools
- Model Architecture ✅ COMPLETE
  - Transformer-based CausalLM implementations
  - Model registry and factory pattern
  - Configuration management
- Training Pipeline 🚧 IN PROGRESS
  - Training loop implementation
  - Self-play game generation
  - Experience replay buffer
  - Policy gradient or actor-critic implementation
  - Integration with Weights & Biases for experiment tracking
- Environment Wrapper 📋 PLANNED
  - Python interface to the Trigo game engine
  - OpenAI Gym-compatible environment
  - State representation for 3D board positions
  - Action space definition
- Model Export ✅ COMPLETE
  - ONNX conversion utilities
  - INT8/INT4 quantization support
  - Static and dynamic quantization
  - Node.js inference validation
- Evaluation & Analysis 📋 PLANNED
  - Agent performance metrics
  - Game quality assessment
  - Visualization tools
The Trigo game engine provides:
- 3D Visualization: Interactive Three.js-based board rendering
- Multiplayer Support: Real-time gameplay via WebSocket
- Game Notation: TGN format for saving and loading games
- REST API: Programmatic game control
- Comprehensive Testing: 10 test suites covering core functionality
For detailed API documentation, see:
- Based on the Trigo game engine by k-l-lambda
- Inspired by AlphaGo and other game-playing RL systems