## Step 1: Check GPU

In [None]:
!nvidia-smi

## Step 2: Clone Repository

In [None]:
# Replace YOUR_GITHUB_TOKEN and YOUR_USERNAME with your own values
!git clone https://YOUR_GITHUB_TOKEN@github.com/YOUR_USERNAME/cuda-mc-chess-engine.git
%cd cuda-mc-chess-engine

## Step 3: Checkout Latest Branch

In [None]:
!git fetch
!git checkout monte-carlo-v3-sejtanluci
!git pull origin monte-carlo-v3-sejtanluci
%cd gpu
!ls -la

## Step 4: Verify Files Exist

In [None]:
print("=== Checking source files ===")
!ls -la src/
print("\n=== Checking headers ===")
!ls -la include/
print("\n=== Checking tests ===")
!ls -la tests/

## Step 5: Compile Test Suite

In [None]:
# Compile PUCT test suite
# Note: Adjust -arch flag based on your GPU (sm_75 for T4, sm_70 for V100, sm_80 for A100)
!nvcc -std=c++17 -arch=sm_75 -O3 -Iinclude \
  tests/test_puct_mcts.cpp \
  src/gpu_kernels.cu \
  src/init_tables.cu \
  src/mcts.cpp \
  src/puct_mcts.cpp \
  -o test_puct_mcts \
  -lcudart -lcurand
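
The `-arch` note in the compile cell can be sketched as a small lookup, so the flag does not have to be edited by hand. This is a minimal sketch under assumptions: the helper name `arch_flag_for` is hypothetical, the table holds only the three GPUs named in the comment, and the GPU name is assumed to be the string `nvidia-smi` reports (for example `Tesla T4`).

```python
import re

# Only the three GPUs named in the compile note; extend as needed.
ARCH_BY_GPU = {"T4": "sm_75", "V100": "sm_70", "A100": "sm_80"}

def arch_flag_for(gpu_name):
    """Return the nvcc -arch value for a GPU name, or None if it is not in the table.

    Hypothetical helper: it matches whole alphanumeric tokens, so
    "Tesla V100-SXM2-16GB" matches "V100" but never a stray substring.
    """
    tokens = re.split(r"[^A-Za-z0-9]+", gpu_name.upper())
    for token in tokens:
        if token in ARCH_BY_GPU:
            return ARCH_BY_GPU[token]
    return None

print(arch_flag_for("Tesla T4"))
```

A `None` result means the GPU is not in the table and the flag should be chosen manually.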

## Step 6: Run Comprehensive Test Suite

In [None]:
!./test_puct_mcts

## Step 7: Build Main Engine

In [None]:
!make clean
!make

## Step 8: PUCT Benchmark (1600 simulations)

In [None]:
!./puct_chess --puct --benchmark --sims 1600

## Step 9: Compare PUCT vs Original MCTS

In [None]:
print("=" * 50)
print("PUCT MCTS (AlphaZero-style heuristic)")
print("=" * 50)
!./puct_chess --puct --benchmark --sims 800

print("\n" + "=" * 50)
print("Original UCB1 MCTS")
print("=" * 50)
!./puct_chess --original --benchmark --sims 800

## Step 10: Fast Test (400 simulations)

In [None]:
!./puct_chess --puct --benchmark --sims 400

## Step 11: Self-Play Game

In [None]:
!./puct_chess --puct --play --sims 800 --moves 30

## Step 12: Performance Profiling

In [None]:
# Test different batch sizes
for batch_size in [64, 128, 256, 512]:
    print(f"\n=== Batch Size: {batch_size} ===")
    !./puct_chess --puct --benchmark --sims 800 --batch {batch_size}
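
To compare batch sizes by wall-clock time rather than by reading the output, the loop can be wrapped in a small timing helper. The sketch below is an illustration, not part of the repository: `time_command` is a hypothetical name, and a stand-in command is used so the cell runs anywhere.

```python
import subprocess
import sys
import time

def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True)
    return time.perf_counter() - start, result.returncode

# Stand-in command so the sketch runs without the engine binary. In the notebook,
# replace it with:
# ["./puct_chess", "--puct", "--benchmark", "--sims", "800", "--batch", str(batch_size)]
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"exit code {code}, {elapsed:.3f} s")
```

Wall-clock time measured this way includes process startup and CUDA initialization, so it will be larger than any per-search timing the benchmark itself reports.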

## Step 13: Check GPU Memory Usage

In [None]:
!nvidia-smi
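
For a number that can be compared before and after a run, `nvidia-smi` can also be queried for memory use alone. The sketch below assumes the output of `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` is one integer (MiB) per GPU per line; the function names are hypothetical.

```python
import shutil
import subprocess

def parse_memory_used(csv_text):
    """Parse one integer (MiB) per non-empty line, one line per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

def gpu_memory_used_mib():
    """Return used memory per GPU in MiB, or [] when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_used(out)

print(gpu_memory_used_mib())
```

Calling `gpu_memory_used_mib()` once before Step 8 and once here gives two lists to compare; a value close to the starting one suggests the engine released its allocations.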

## 📊 Expected Results

### Test Suite (10 tests):
- ✅ All tests should pass
- Total time: 5-15 seconds
- Pass rate: 100%

### Performance (T4 GPU):
- Simulations/sec: 8,000-12,000
- 400 sims: ~40-50ms
- 1600 sims: ~150-200ms
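
The latency figures follow from the throughput range: time per search is simulations divided by simulations per second. A quick check of that arithmetic, using only the numbers listed above (the function name `latency_ms` is illustrative):

```python
def latency_ms(sims, sims_per_sec):
    """Time in milliseconds to run `sims` simulations at a given throughput."""
    return sims / sims_per_sec * 1000.0

# Throughput range quoted for a T4: 8,000 to 12,000 simulations/sec.
for sims in (400, 1600):
    fastest = latency_ms(sims, 12_000)
    slowest = latency_ms(sims, 8_000)
    print(f"{sims} sims: {fastest:.1f} to {slowest:.1f} ms")
```

The throughput range alone implies roughly 33-50 ms for 400 simulations and 133-200 ms for 1600, so the quoted figures sit in the upper part of each range.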

### Comparison:
- PUCT should be 2-3x faster than original MCTS
- PUCT should find better moves with fewer simulations
- Both should find reasonable opening moves (e2e4, d2d4, g1f3, etc.)

## 🎯 Success Criteria

1. ✅ All 10 tests pass
2. ✅ PUCT finds valid moves
3. ✅ Sims/sec > 5000
4. ✅ No CUDA errors
5. ✅ Memory properly freed