TrigoRL

A reinforcement learning laboratory project for training AI agents to play Trigo, a 3D variant of the board game Go.

Overview

TrigoRL is an experimental platform for exploring reinforcement learning techniques in the context of Trigo, a strategic board game that extends the rules of Go into three-dimensional space. While traditional Go is played on a 2D 19×19 board, Trigo is played on a cubic grid, which adds new strategic dimensions and complexity.
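
To make the extra dimension concrete, the sketch below counts the liberties (empty adjacent points) of a group on a grid of arbitrary shape. It is an illustration only, not code from the Trigo engine: the names `neighbors` and `liberties` and the dict-based board representation are assumptions made for this example.

```python
from typing import Dict, Iterator, Set, Tuple

Point = Tuple[int, int, int]


def neighbors(p: Point, shape: Point) -> Iterator[Point]:
    """Yield the on-board points that differ from p by one step along one axis."""
    for axis in range(3):
        for step in (-1, 1):
            q = list(p)
            q[axis] += step
            if 0 <= q[axis] < shape[axis]:
                yield (q[0], q[1], q[2])


def liberties(stones: Dict[Point, str], start: Point, shape: Point) -> Set[Point]:
    """Flood-fill the group containing `start` and collect its empty neighbors."""
    color = stones[start]
    group = {start}
    frontier = [start]
    libs: Set[Point] = set()
    while frontier:
        p = frontier.pop()
        for q in neighbors(p, shape):
            if q not in stones:
                libs.add(q)
            elif stones[q] == color and q not in group:
                group.add(q)
                frontier.append(q)
    return libs
```

With this definition a lone stone at the center of a 5×5×5 grid has six liberties and a corner stone has three, compared with four and two on a flat 5×5×1 board.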

About Trigo

Trigo is a modern reimplementation of a 3D Go variant with the following characteristics:

Board: 3D cubic grid (default: 5×5×5, configurable to other dimensions including 2D boards)
Rules: Based on Go mechanics adapted for 3D space
- Stone placement with capture detection
- Ko rule enforcement
- Territory calculation in 3D
- Pass, undo/redo, and resignation support
Notation: TGN (Trigo Game Notation) - a PGN-inspired text format for recording games
Coordinate System: Center-symmetric notation (e.g., 000 = center, aaa = corner)
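
A possible reading of the center-symmetric notation is sketched below. Only two facts come from the text above: `000` is the center and `aaa` is a corner. Everything else is an assumption made for illustration: that letters count up from `a` at the low edge, count down from `z` at the high edge, and that `0` marks the middle of an odd-sized axis. The function names are hypothetical.

```python
from typing import Tuple


def encode_axis(index: int, size: int) -> str:
    """Encode one axis index; assumed scheme: a, b, ... 0 ... y, z."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} is outside an axis of size {size}")
    if size % 2 == 1 and index == size // 2:
        return "0"  # middle of an odd-sized axis
    if 2 * index < size - 1:
        return chr(ord("a") + index)  # low half counts up from 'a'
    return chr(ord("z") - (size - 1 - index))  # high half counts down from 'z'


def encode_position(point: Tuple[int, int, int], shape: Tuple[int, int, int]) -> str:
    """Encode an (x, y, z) point on a board of the given shape."""
    return "".join(encode_axis(i, s) for i, s in zip(point, shape))
```

Under these assumptions, a 5-point axis reads `a b 0 y z`, so the center of the default board encodes to `000` and the origin corner to `aaa`, matching the two examples given.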

Try it yourself online on the Trigo demo page.

Quick Start

Inspect Dataset

View and validate the TGNDataset:

# View dataset statistics
python tools/view_dataset.py configs/training/trigo-gpt2.yaml --stats

# Validate dataset implementation
python tools/view_dataset.py configs/training/trigo-gpt2.yaml --validate

# View a specific sample
python tools/view_dataset.py configs/training/trigo-gpt2.yaml --sample 0 --tokens

See tools/README.md for comprehensive CLI documentation.

Training Models

Train language models from scratch or resume from checkpoints:

# Start new training from scratch
python train_lm.py configs/training/trigo-gpt2.yaml

# Start with config overrides
python train_lm.py configs/training/trigo-gpt2.yaml training.epochs=50 training.learning_rate=5e-5

# Resume from checkpoint by specifying resume_from in config
python train_lm.py configs/training/trigo-gpt2.yaml training.resume_from=outputs/trigor/20251113-trigo-gpt2/checkpoints/best.chkpt

# Resume from experiment directory (automatically loads latest checkpoint)
python train_lm.py outputs/trigor/20251113-trigo-gpt2

# Resume with config overrides (useful for fine-tuning)
python train_lm.py outputs/trigor/20251113-trigo-gpt2 training.learning_rate=1e-5 training.epochs=100
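
The `key.subkey=value` overrides in the commands above map onto nested config entries. The standard-library sketch below shows the idea only. The project lists OmegaConf/Hydra for configuration, and this is not how `train_lm.py` parses its arguments; `parse_value` and `apply_overrides` are hypothetical names.

```python
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Interpret an override value as int, float, None, or plain string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    if text.lower() in ("null", "none"):
        return None
    return text


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply dotted `a.b=value` overrides to a nested dict, in place."""
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections
        node[leaf] = parse_value(raw)
    return config
```

For example, `apply_overrides(cfg, ["training.epochs=50", "training.learning_rate=5e-5"])` sets `cfg["training"]["epochs"]` to the integer 50 and `cfg["training"]["learning_rate"]` to the float 5e-5.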

Available training configs:

trigo-gpt2.yaml - GPT-2 with standard multi-head attention
trigo-llama.yaml - LLaMA with grouped query attention (GQA)
trigo-rwkv.yaml - RWKV with linear attention
trigo-gpt2-invsqrt.yaml - GPT-2 with inverse square root scheduler
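
For orientation, one common form of an inverse square root schedule is sketched below: linear warmup followed by decay proportional to one over the square root of the step. Whether `trigo-gpt2-invsqrt.yaml` uses exactly this form is not stated here, so treat the function and its parameter names (`base_lr`, `warmup_steps`) as assumptions.

```python
import math


def inverse_sqrt_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup to base_lr, then decay as sqrt(warmup_steps / step)."""
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * math.sqrt(warmup_steps / step)
```

The two branches meet at `step == warmup_steps`, where the rate equals `base_lr`; after that, quadrupling the step count halves the learning rate.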

Resume training options:

From experiment directory: python train_lm.py outputs/trigor/[experiment-dir]
- Automatically loads checkpoints/latest.chkpt
- Preserves all previous config settings
- Continues wandb logging to the same run (if wandb enabled)
From specific checkpoint: Set training.resume_from in config or override:
```
training:
  resume_from: path/to/checkpoint.chkpt  # null = train from scratch
```
- Can use best.chkpt, latest.chkpt, or any epoch checkpoint
- Restores model weights, optimizer state, and training progress
- Useful for transfer learning or fine-tuning
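
The two resume paths can be summarized as a small dispatch on the first argument. The sketch below is an assumption about the behavior described above, not the actual logic of `train_lm.py`; `resolve_run` is a hypothetical helper, and only the `config.yaml` and `checkpoints/latest.chkpt` locations come from this README.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_run(target: str) -> Tuple[Path, Optional[Path]]:
    """Return (config_path, checkpoint_path) for a config file or experiment directory."""
    path = Path(target)
    if path.is_dir():
        # Experiment directory: reuse its saved config and latest checkpoint.
        return path / "config.yaml", path / "checkpoints" / "latest.chkpt"
    # Plain config file: no checkpoint unless training.resume_from is set.
    return path, None
```

A caller would still apply any `training.resume_from` override on top of the returned checkpoint path.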

Training outputs:

outputs/trigor/[experiment-id]/config.yaml - Saved configuration
outputs/trigor/[experiment-id]/train.log - Training logs
outputs/trigor/[experiment-id]/checkpoints/ - Model checkpoints
- best.chkpt - Best model (based on validation metric)
- latest.chkpt - Latest model (for resuming)
- epoch_N.chkpt - Periodic checkpoints

Test Models

Run the model test suite:

python tests/test_models.py

This validates:

Model registry with 4 CausalLM models
Configuration loading (dict and OmegaConf)
Forward passes for GPT-2, LLaMA, and RWKV
Parameter counting and memory estimation
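
As a rough cross-check on parameter counting, the arithmetic for a GPT-2-style decoder can be written out by hand. This sketch is independent of the project's own counting code and assumes a standard block layout (fused QKV with biases, a 4× MLP, two LayerNorms per block, learned position embeddings, tied output head). For a live PyTorch model, summing `p.numel()` over `model.parameters()` gives the exact figure.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int,
                     n_positions: int, tie_lm_head: bool = True) -> int:
    """Parameter count for a standard GPT-2-style stack (assumed layout)."""
    attn = 4 * d_model * d_model + 4 * d_model   # QKV + output projection, with biases
    mlp = 8 * d_model * d_model + 5 * d_model    # d -> 4d -> d, with biases
    norms = 4 * d_model                          # two LayerNorms (weight + bias)
    per_layer = attn + mlp + norms
    embeddings = (vocab_size + n_positions) * d_model
    final_norm = 2 * d_model
    head = 0 if tie_lm_head else vocab_size * d_model
    return n_layer * per_layer + embeddings + final_norm + head


def fp32_megabytes(n_params: int) -> float:
    """Memory for the weights alone at 4 bytes per parameter."""
    return n_params * 4 / 1e6
```

For instance, `gpt2_param_count(6, 64, 259, 512)` models a 6-layer, 64-dimensional network with the 259-token byte vocabulary; the context length of 512 is a placeholder, not a value taken from the configs.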

Verify Configurations

Test all training configs:

python examples/verify_training_configs.py

Export Models to ONNX

Export trained models for cross-platform deployment:

# Export best checkpoint (default - standard inference mode)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt

# Export in evaluation mode with fixed dimensions
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --evaluation-mode --prefix-len 10 --seq-len 15

# Export evaluation mode with dynamic dimensions (prefix-len/seq-len only for dummy input)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --evaluation-mode --dynamic-seq

# Export with INT8 quantization (recommended for deployment)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --quantize --quant-type int8

# Export with dynamic batch/sequence sizes
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --dynamic-batch --dynamic-seq

# Export with static quantization (best accuracy)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --quantize --quant-method static --calibration-samples 200

Export Modes:

Standard mode: Single input input_ids, returns logits for all positions
Evaluation mode: Three inputs (prefix_ids, evaluated_ids, evaluated_mask), returns logits for last prefix + evaluated positions. Supports custom attention patterns like tree attention for computing sequence probabilities.
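
To illustrate what a tree attention pattern looks like, the sketch below builds a mask in which each evaluated token attends only to itself and its ancestors in a tree of candidate continuations. This is the generic technique, not the documented layout of `evaluated_mask` in `exportOnnx.py`; the `parents` encoding and the function name are assumptions.

```python
from typing import List


def tree_attention_mask(parents: List[int]) -> List[List[int]]:
    """Build an ancestor mask for a token tree.

    parents[i] is the index of token i's parent among the evaluated tokens,
    or -1 if token i directly follows the prefix. Parents must precede their
    children. mask[i][j] == 1 means token i may attend to token j.
    """
    n = len(parents)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:       # walk from the token up to the root
            mask[i][j] = 1
            j = parents[j]
    return mask
```

A simple chain (`parents = [-1, 0, 1]`) reduces to the usual lower-triangular causal mask, while a branching tree such as `[-1, 0, 0, 1]` keeps sibling branches from seeing each other, which is what lets several candidate sequences share one forward pass.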

Quantization benefits:

INT8 dynamic: ~3-4x smaller model, minimal accuracy loss
INT4: ~8x smaller, more aggressive compression
Static quantization: Better accuracy than dynamic, requires calibration
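
The size figures follow from bytes per weight. The back-of-the-envelope sketch below ignores quantization scales, zero points, and graph metadata, and its `fp32_fraction` parameter (the share of weights left unquantized) is a hypothetical knob for illustration rather than an exporter option.

```python
BYTES_PER_WEIGHT = {"fp32": 4.0, "int8": 1.0, "int4": 0.5}


def compression_ratio(n_params: int, quant_type: str, fp32_fraction: float = 0.0) -> float:
    """Ratio of fp32 size to quantized size, counting weight storage only."""
    baseline = n_params * BYTES_PER_WEIGHT["fp32"]
    quantized = n_params * (
        fp32_fraction * BYTES_PER_WEIGHT["fp32"]
        + (1.0 - fp32_fraction) * BYTES_PER_WEIGHT[quant_type]
    )
    return baseline / quantized
```

Fully quantized weights give 4× for INT8 and 8× for INT4. If some tensors stay in fp32, the INT8 ratio drops below 4×, which is consistent with the "~3-4x" range quoted above.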

See docs/onnx_quantization_guide.md for comprehensive quantization documentation.

Technical Stack

Reinforcement Learning Framework

PyTorch: Deep learning framework for model implementation
Transformers: Architecture foundation for the RL agent (GPT-2, LLaMA, RWKV, xLSTM)
Weights & Biases (wandb): Training metrics and experiment tracking
ONNX: Model weight export format for cross-platform deployment
OmegaConf/Hydra: Hierarchical configuration management

Current Implementation Status

✅ Data Pipeline

TGNDataset: PyTorch dataset for TGN files with byte-level tokenization
TGNByteTokenizer: 259-token vocab (256 bytes + PAD/START/END)
Configuration-driven dataset loading
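
A byte-level tokenizer with a 259-token vocabulary can be sketched in a few lines. The vocabulary size and the three special tokens come from the description above; the specific ID assignment (256, 257, 258), the function names, and the padding behavior are assumptions and may differ from `TGNByteTokenizer`.

```python
from typing import List, Optional

PAD_ID, START_ID, END_ID = 256, 257, 258  # assumed IDs, placed after the 256 byte values
VOCAB_SIZE = 259


def encode(text: str, max_len: Optional[int] = None) -> List[int]:
    """Wrap the UTF-8 bytes of `text` in START/END; pad or truncate to max_len."""
    ids = [START_ID] + list(text.encode("utf-8")) + [END_ID]
    if max_len is not None:
        # Truncation may cut off END; padding fills the remainder with PAD.
        ids = ids[:max_len] + [PAD_ID] * (max_len - len(ids))
    return ids


def decode(ids: List[int]) -> str:
    """Drop special tokens and decode the remaining bytes."""
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")
```

Because every byte is its own token, no vocabulary file is needed and any TGN text round-trips through `decode(encode(text))`.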

✅ Model Architecture

4 CausalLM models: GPT2, LLaMA (with GQA), RWKV (linear attention), xLSTM
Model registry with factory pattern
OmegaConf integration for flexible configuration
Parameter counting and memory footprint estimation
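
The registry-plus-factory idea can be sketched as follows. This shows the general pattern only: `MODEL_REGISTRY`, `register_model`, `build_model`, and `ToyModel` are hypothetical names, not the project's API.

```python
from typing import Any, Callable, Dict

MODEL_REGISTRY: Dict[str, type] = {}


def register_model(name: str) -> Callable[[type], type]:
    """Class decorator that records a model class under `name`."""
    def decorator(cls: type) -> type:
        if name in MODEL_REGISTRY:
            raise KeyError(f"model '{name}' is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("toy")
class ToyModel:
    """Stand-in for a real CausalLM class."""
    def __init__(self, n_layer: int = 2, d_model: int = 16) -> None:
        self.n_layer = n_layer
        self.d_model = d_model


def build_model(config: Dict[str, Any]) -> Any:
    """Factory: look up config['name'] and pass the other keys to the constructor."""
    kwargs = dict(config)
    name = kwargs.pop("name")
    if name not in MODEL_REGISTRY:
        raise ValueError(f"unknown model '{name}'; known: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](**kwargs)
```

With this shape, adding an architecture means defining one decorated class, and a config such as `{"name": "toy", "d_model": 32}` selects and builds it without changes to the training script.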

✅ Training Configuration

Complete YAML configs for all 4 models
Hyperparameters tuned for each architecture
WandB integration (optional)
Checkpointing and learning rate scheduling

✅ Development Tools

CLI tool for dataset inspection and validation
Model testing suite (109 tests passing)
Configuration verification scripts

✅ Model Export

ONNX export script with checkpoint loading
INT8/INT4 quantization (dynamic and static)
3-4x model compression with minimal accuracy loss
Node.js inference validation and testing

Development Roadmap

The RL framework is being built in the stages below; completed components are struck through:

~~Data Pipeline~~ ✅ COMPLETE
- ~~TGNDataset implementation with byte tokenization~~
- ~~Dataset configuration and loading~~
- ~~Validation and inspection tools~~
~~Model Architecture~~ ✅ COMPLETE
- ~~Transformer-based CausalLM implementations~~
- ~~Model registry and factory pattern~~
- ~~Configuration management~~
Training Pipeline 🚧 IN PROGRESS
- Training loop implementation
- Self-play game generation
- Experience replay buffer
- Policy gradient or actor-critic implementation
- Integration with Weights & Biases for experiment tracking
Environment Wrapper 📋 PLANNED
- Python interface to the Trigo game engine
- OpenAI Gym-compatible environment
- State representation for 3D board positions
- Action space definition
~~Model Export~~ ✅ COMPLETE
- ~~ONNX conversion utilities~~
- ~~INT8/INT4 quantization support~~
- ~~Static and dynamic quantization~~
- ~~Node.js inference validation~~
Evaluation & Analysis 📋 PLANNED
- Agent performance metrics
- Game quality assessment
- Visualization tools
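
For the planned environment wrapper, one possible action space is a flat index over every board point plus a final pass action. The project has not settled on a design, so the sketch below is purely illustrative and all names in it are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[int, int, int]


def num_actions(shape: Point) -> int:
    """One action per board point, plus one for pass."""
    sx, sy, sz = shape
    return sx * sy * sz + 1


def encode_action(move: Optional[Point], shape: Point) -> int:
    """Map an (x, y, z) move, or None for pass, to a flat action index."""
    sx, sy, sz = shape
    if move is None:
        return sx * sy * sz  # pass is the last index
    x, y, z = move
    if not (0 <= x < sx and 0 <= y < sy and 0 <= z < sz):
        raise ValueError(f"move {move} is outside board {shape}")
    return (x * sy + y) * sz + z


def decode_action(index: int, shape: Point) -> Optional[Point]:
    """Inverse of encode_action."""
    sx, sy, sz = shape
    if not 0 <= index <= sx * sy * sz:
        raise ValueError(f"action {index} is out of range")
    if index == sx * sy * sz:
        return None
    xy, z = divmod(index, sz)
    x, y = divmod(xy, sy)
    return (x, y, z)
```

On the default 5×5×5 board this gives 126 actions; legality (occupied points, ko) would be handled by masking at step time rather than by the encoding.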

Game Engine Features

The Trigo game engine provides:

3D Visualization: Interactive Three.js-based board rendering
Multiplayer Support: Real-time gameplay via WebSocket
Game Notation: TGN format for saving and loading games
REST API: Programmatic game control
Comprehensive Testing: 10 test suites covering core functionality

For detailed API documentation, see:

Acknowledgments

Based on the Trigo game engine by k-l-lambda
Inspired by AlphaGo and other game-playing RL systems
