# AlphaZero Arcade - Training

Train strong board-game AI with AlphaZero: perfect play at Tic-Tac-Toe and a very strong Connect 4 player.

**Setup:** Runtime > Change runtime type > **A100 GPU**

| Game | Quick training | Strong training | Resulting strength |
|------|-------|--------|-------|
| Tic-Tac-Toe | ~2 min | - | Perfect play |
| Connect 4 | ~30 min | ~3 hrs | Very strong |

In [None]:
# 1. Setup (run this first)
%cd /content
!rm -rf /content/alphazero-arcade
!git clone https://github.com/mindswim/connect4-zero.git /content/alphazero-arcade
%cd /content/alphazero-arcade
!pip install -e . -q

import torch
gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'
print(f"Device: {gpu_name}")
print(f"CUDA: {torch.version.cuda}")

In [None]:
# 2. List available games
!python -m alphazero.cli list-games

## Train Tic-Tac-Toe (Quick Test)
A good way to verify that the setup works. Training should reach perfect play in about 2 minutes.
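To make "perfect play" concrete: Tic-Tac-Toe is a draw when both sides play optimally, so a perfectly trained agent should never lose. The sketch below checks this with a plain exhaustive minimax search. It is independent of the repository, and the function names are illustrative rather than part of the `alphazero` package.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value from X's point of view: +1 X wins, 0 draw, -1 O wins."""
    won = winner(board)
    if won is not None:
        return 1 if won == 'X' else -1
    if ' ' not in board:
        return 0  # board full, nobody won
    other = 'O' if player == 'X' else 'X'
    values = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == ' ']
    return max(values) if player == 'X' else min(values)

print(minimax(' ' * 9, 'X'))  # prints 0: optimal play from both sides is a draw
```

If the benchmark below shows the trained agent losing games, training has most likely not converged yet.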

In [None]:
# Train Tic-Tac-Toe (fast - ~2 min)
!python -m alphazero.cli train tictactoe \
    --iterations 30 \
    --games 20 \
    --sims 50 \
    --batch-size 16

In [None]:
# Test Tic-Tac-Toe
!python -m alphazero.cli benchmark tictactoe --sims 50 --games 10

## Train Connect 4 (Quick - 30 min)
Good for testing. Produces a decent player.

In [None]:
# Train Connect 4 - Quick (~30 min)
!python -m alphazero.cli train connect4 \
    --iterations 150 \
    --games 30 \
    --sims 100 \
    --batch-size 16 \
    --lr 0.001

## Train Connect 4 (Strong - 3 hrs)
Aims for the most strength that fits a Colab Pro budget (about 3 hours on an A100). The resulting model is significantly stronger than the quick one.

| Parameter | Quick | Strong |
|-----------|-------|--------|
| Iterations | 150 | 500 |
| Games/iter | 30 | 50 |
| MCTS sims | 100 | 250 |
| Batch size | 16 | 32 |
| Learning rate | 0.001 | 0.001 |
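As a rough guide to how much more work the Strong run does, the arithmetic below multiplies out the table values. It assumes `--games` is self-play games per iteration (as the table's "Games/iter" row suggests), that `--sims` is MCTS simulations per move, and that games are of similar length in both runs. The helper names are illustrative.

```python
# Values copied from the Quick/Strong table above.
configs = {
    "quick":  {"iterations": 150, "games": 30, "sims": 100},
    "strong": {"iterations": 500, "games": 50, "sims": 250},
}

def total_games(cfg):
    """Self-play games generated over the whole run."""
    return cfg["iterations"] * cfg["games"]

def relative_search_work(cfg, baseline):
    """Ratio of (total games x sims per move) between two configs."""
    return (total_games(cfg) * cfg["sims"]) / (total_games(baseline) * baseline["sims"])

for name, cfg in configs.items():
    print(name, total_games(cfg))  # quick 4500, strong 25000
print(round(relative_search_work(configs["strong"], configs["quick"]), 1))  # 13.9
```

The time estimates in this notebook differ by only about 6x (~30 min vs ~3 hrs), so treat the ratio as a measure of search effort, not a wall-clock prediction.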

In [None]:
# Train Connect 4 - Strong (~3 hrs on A100)
# Maximum strength config - significantly better than quick training
!python -m alphazero.cli train connect4 \
    --iterations 500 \
    --games 50 \
    --sims 250 \
    --batch-size 32 \
    --lr 0.001

In [None]:
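# Resume training (if interrupted). Uncomment the command that matches your run.
# Quick: resume to 300 iterations, keeping the quick-run settings
# !python -m alphazero.cli train connect4 --iterations 300 --games 30 --sims 100 --batch-size 16 --lr 0.001 --resume checkpoints/connect4_best.pt

# Strong: resume to 600 iterations, keeping the strong-run settings
# !python -m alphazero.cli train connect4 --iterations 600 --games 50 --sims 250 --batch-size 32 --lr 0.001 --resume checkpoints/connect4_best.pt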

In [None]:
# Benchmark Connect 4 (settings below match the strong model; for the quick model, --sims 100 --batch-size 16 matches its training settings)
!python -m alphazero.cli benchmark connect4 --sims 250 --games 10 --batch-size 32

## Play Against Your Model

In [None]:
# Play Connect 4 at different difficulties
# (Uncomment one command at a time and run each in its own cell; a game blocks the cell until it finishes)

# Easy
# !python -m alphazero.cli play connect4 --model checkpoints/connect4_best.pt --difficulty easy

# Hard
# !python -m alphazero.cli play connect4 --model checkpoints/connect4_best.pt --difficulty hard

## Export for Web Deployment

In [None]:
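# Export to ONNX for browser
!pip install onnx onnxscript -q
# Assumes the export command lives in alphazero.cli like the other commands;
# if the module is not found, try connect4zero.cli instead.
!python -m alphazero.cli export --model checkpoints/connect4_best.pt --output exports/

import os
onnx_path = "exports/model.onnx"
print(f"Exported to {onnx_path}" if os.path.exists(onnx_path) else f"Export failed: {onnx_path} not found")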

## Save to Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

!mkdir -p /content/drive/MyDrive/alphazero-arcade/checkpoints
!cp -r /content/alphazero-arcade/checkpoints/* /content/drive/MyDrive/alphazero-arcade/checkpoints/
print("Saved checkpoints to Google Drive!")

In [None]:
# Download to computer
from google.colab import files
files.download('/content/alphazero-arcade/checkpoints/connect4_best.pt')

---
## Training Configs Reference

### Quick Test (~5 min)
```python
!python -m alphazero.cli train connect4 --iterations 20 --games 15 --sims 50
```

### Standard (~30 min, same as the Connect 4 quick run above)
```python
!python -m alphazero.cli train connect4 --iterations 150 --games 30 --sims 100 --batch-size 16
```

### Strong (~3 hrs)
```python
!python -m alphazero.cli train connect4 --iterations 500 --games 50 --sims 250 --batch-size 32
```

### Maximum (overnight)
```python
!python -m alphazero.cli train connect4 --iterations 1000 --games 75 --sims 300 --batch-size 32
```
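
The four configs above differ only in their flag values, so they can be kept in one table and turned into commands on demand. The sketch below is one way to do that. The preset names and the `build_train_command` helper are hypothetical, and only flags that already appear in this notebook are used.

```python
import shlex

# Flag values copied from the reference configs above; preset names are illustrative.
PRESETS = {
    "quick_test": {"iterations": 20, "games": 15, "sims": 50},
    "standard": {"iterations": 150, "games": 30, "sims": 100, "batch-size": 16},
    "strong": {"iterations": 500, "games": 50, "sims": 250, "batch-size": 32},
    "maximum": {"iterations": 1000, "games": 75, "sims": 300, "batch-size": 32},
}

def build_train_command(game, preset):
    """Return the training command for a preset as an argument list."""
    args = ["python", "-m", "alphazero.cli", "train", game]
    for flag, value in PRESETS[preset].items():
        args += [f"--{flag}", str(value)]
    return args

# Show the command as a single shell-safe string.
print(shlex.join(build_train_command("connect4", "strong")))
```

In a notebook cell, the returned list can be passed to `subprocess.run(..., check=True)` in place of the `!python` line.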