# OpenScope RL: Educational Demo

**Training AI to Control Air Traffic**

This notebook is **nearly self-contained**: apart from `OpenScopeEnv` (imported from the project's `environment` module) and `config/training_config.yaml`, all code is defined inline.

## What You'll See

1. **Section 1**: Environment setup (config and OpenScope Gym environment)
2. **Section 2**: Component demo (interact with the game)
3. **Section 3**: Training with PPO (Stable-Baselines3)
4. **Section 4**: Summary and next steps

**Key Features**:
- ✅ **Simple & Educational** - Focus on RL workflow, not implementation complexity
- ✅ **Playwright** browser automation for real game interaction
- ✅ **Stable-Baselines3** for easy PPO training
- ✅ **Dict observation/action spaces** for structured control

**Requirements**: Gymnasium, Stable-Baselines3, Playwright, PyYAML

---

# Section 1: Environment Setup

Imports, configuration, and the environment class are set up below:

## 1.1 Imports and Configuration

Simple setup: Gymnasium for RL, Stable-Baselines3 for PPO, Playwright for browser automation, and YAML for configuration.

In [1]:
# Core imports only
import time
import warnings
from typing import Any, Optional
import os

import gymnasium as gym
import numpy as np
import requests
import yaml
from gymnasium import spaces

# Stable-Baselines3
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

warnings.filterwarnings('ignore')

# Set up virtual display for WSL/headless environments
# This is needed when running notebooks remotely (e.g., through Cursor)
if os.environ.get('DISPLAY') is None:
    os.environ['DISPLAY'] = ':99'
    print("🖥️  Set DISPLAY=:99 for virtual display (xvfb)")
    print("   Make sure xvfb is running: Xvfb :99 -screen 0 1024x768x24 &")

# Check environment
if not os.path.exists('config'):
    raise RuntimeError("Run from rl_training directory!")
print("✅ Directory OK")

# Check game server
try:
    requests.get('http://localhost:3003', timeout=2)
    print("✅ Game server running")
except requests.RequestException:
    print("⚠️  Start game server: cd ../openscope && npm start")

# Load config as simple dict
with open('config/training_config.yaml') as f:
    config = yaml.safe_load(f)
print(f"✅ Config loaded: {config['env']['airport']}")

🖥️  Set DISPLAY=:99 for virtual display (xvfb)
   Make sure xvfb is running: Xvfb :99 -screen 0 1024x768x24 &
✅ Directory OK
✅ Game server running
✅ Config loaded: KLAS
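
The cells in this notebook read a fixed set of keys from `config/training_config.yaml`, so a missing key only surfaces as a `KeyError` when the relevant cell runs. A minimal sketch of a fail-fast check follows. The key names are taken from the cells in this notebook; the helper `missing_keys` and all sample values are hypothetical and not taken from the project's actual config file.

```python
# Keys that the notebook cells read from the config (sections "env" and "ppo").
REQUIRED_KEYS = {
    "env": ["game_url", "airport", "timewarp", "max_aircraft",
            "episode_length", "action_interval", "headless"],
    "ppo": ["learning_rate", "gamma", "gae_lambda", "clip_epsilon",
            "n_steps", "batch_size", "n_epochs"],
}


def missing_keys(config: dict) -> list[str]:
    """Return dotted paths of required keys that are absent from config."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        section_values = config.get(section) or {}
        for key in keys:
            if key not in section_values:
                missing.append(f"{section}.{key}")
    return missing


# Illustrative values only; the real file may differ.
sample_config = {
    "env": {"game_url": "http://localhost:3003", "airport": "KLAS",
            "timewarp": 15, "max_aircraft": 20, "episode_length": 1800,
            "action_interval": 10, "headless": False},
    "ppo": {"learning_rate": 3e-4, "gamma": 0.99, "gae_lambda": 0.95,
            "clip_epsilon": 0.2, "n_steps": 2048, "batch_size": 64},
}

print(missing_keys(sample_config))  # n_epochs was left out on purpose
```

Calling `missing_keys(config)` right after `yaml.safe_load` would report every absent key at once instead of one `KeyError` at a time.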


## 1.2 OpenScope Environment

Import the Gymnasium environment (uses threaded Playwright for Jupyter compatibility):

In [2]:
# Import OpenScopeEnv from the environment module
# Note: Uses a dedicated thread for Playwright to avoid Jupyter asyncio conflicts
from environment.openscope_env import OpenScopeEnv

print('✅ OpenScopeEnv imported from environment module')
print('   (Playwright runs in a dedicated thread for Jupyter compatibility)')

✅ OpenScopeEnv imported from environment module
   (Playwright runs in a dedicated thread for Jupyter compatibility)
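
The internals of `OpenScopeEnv` are not shown in this notebook. One common way to keep a synchronous browser API away from Jupyter's running asyncio event loop is to funnel every call through a single long-lived worker thread. The sketch below illustrates that pattern with the standard library only; the class name `DedicatedThread` is hypothetical, and this is not necessarily how the project implements it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DedicatedThread:
    """Run every submitted call on one long-lived worker thread."""

    def __init__(self):
        # max_workers=1 means all calls share the same thread, which matters
        # for objects that must be used from the thread that created them.
        self._executor = ThreadPoolExecutor(max_workers=1,
                                            thread_name_prefix="browser")

    def call(self, fn, *args, **kwargs):
        """Execute fn on the worker thread and block until it returns."""
        return self._executor.submit(fn, *args, **kwargs).result()

    def close(self):
        self._executor.shutdown(wait=True)


worker = DedicatedThread()
worker_ids = {worker.call(threading.get_ident) for _ in range(3)}
main_id = threading.get_ident()
worker.close()

print(len(worker_ids) == 1, main_id not in worker_ids)
```

In a real environment, the browser would be launched and every page interaction issued via `worker.call(...)`, so the notebook's main thread never touches the browser objects directly.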


# Section 2: Component Demo

Let's interact with the OpenScope environment to see how it works:

## 2.1 Create Environment

In [3]:
# Create fresh environment (will now have the diagnostic code)
env = OpenScopeEnv(
    game_url=config['env']['game_url'],
    airport=config['env']['airport'],
    timewarp=config['env']['timewarp'],
    max_aircraft=config['env']['max_aircraft'],
    episode_length=config['env']['episode_length'],
    action_interval=config['env']['action_interval'],
    headless=config['env']['headless'],
    config=config,
)

obs, info = env.reset()

# Try steps
for i in range(5):
    action = {'aircraft_id': 20, 'command_type': 0, 'heading': 0, 'altitude': 0, 'speed': 0}
    obs, reward, done, truncated, info = env.step(action)
    print(f"  Step {i+1}: Time={info['raw_state']['time']:.1f}s, Aircraft={info['aircraft_count']}")

🔍 Browser init: headless=False, DISPLAY=:99
✅ Using DISPLAY=:99, headless=False
✅ Page loaded
⏳ Waiting 10 seconds for airport to load...
✅ Airport load wait complete
  Step 1: Time=0.0s, Aircraft=22
  Step 2: Time=0.0s, Aircraft=22
  Step 3: Time=0.0s, Aircraft=22
  Step 4: Time=0.0s, Aircraft=22
  Step 5: Time=0.0s, Aircraft=22


In [3]:
env = OpenScopeEnv(
    game_url=config['env']['game_url'],
    airport=config['env']['airport'],
    timewarp=config['env']['timewarp'],
    max_aircraft=config['env']['max_aircraft'],
    episode_length=config['env']['episode_length'],
    action_interval=config['env']['action_interval'],
    headless=config['env']['headless'],
    config=config,
)

obs, info = env.reset()

for i in range(5):
    action = {'aircraft_id': 20, 'command_type': 0, 'heading': 0, 'altitude': 0, 'speed': 0}
    obs, reward, done, truncated, info = env.step(action)
    print(f"  Step {i+1}: Time={info['raw_state']['time']:.1f}s, Aircraft={info['aircraft_count']}")

  Step 1: Time=10.0s, Aircraft=23
  Step 2: Time=20.0s, Aircraft=23
  Step 3: Time=30.0s, Aircraft=23
  Step 4: Time=40.0s, Aircraft=23
  Step 5: Time=50.0s, Aircraft=23
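
The loops above ignore the `done` and `truncated` flags, which is fine for five steps but not for longer runs. The sketch below shows a stepping loop that resets when an episode ends. `StubEnv` is a hypothetical stand-in that only mimics the Gymnasium `reset`/`step` signatures, and treating `aircraft_id == max_aircraft` as the no-op slot is an assumption based on the "No action" comment used later in this notebook.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signatures."""

    def __init__(self, episode_steps: int = 3, action_interval: int = 10):
        self.episode_steps = episode_steps
        self.action_interval = action_interval
        self._step_count = 0

    def reset(self):
        self._step_count = 0
        return {}, {"raw_state": {"time": 0.0}, "aircraft_count": 0}

    def step(self, action):
        self._step_count += 1
        game_time = float(self._step_count * self.action_interval)
        info = {"raw_state": {"time": game_time}, "aircraft_count": 0}
        terminated = False
        truncated = self._step_count >= self.episode_steps  # time limit reached
        return {}, 0.0, terminated, truncated, info


def noop_action(max_aircraft: int) -> dict:
    """Assumed no-op: aircraft_id equal to max_aircraft selects no aircraft."""
    return {"aircraft_id": max_aircraft, "command_type": 0,
            "heading": 0, "altitude": 0, "speed": 0}


def run_noop_steps(env, n_steps: int, max_aircraft: int) -> list[float]:
    """Step with no-op actions, resetting whenever an episode ends."""
    times = []
    obs, info = env.reset()
    for _ in range(n_steps):
        obs, reward, terminated, truncated, info = env.step(noop_action(max_aircraft))
        times.append(info["raw_state"]["time"])
        if terminated or truncated:
            obs, info = env.reset()
    return times


times = run_noop_steps(StubEnv(), n_steps=5, max_aircraft=20)
print(times)  # the clock restarts after the third step ends the episode
```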


**⚠️ CRITICAL: WSL Display Issue**

**If you get `TargetClosedError` about "no XServer running":**

You're on WSL without a display server. OpenScope needs `headless: false` for the game loop to work, but WSL has no display.

**Quick Fix (Recommended):**
```bash
# Install xvfb (virtual display)
sudo apt-get update && sudo apt-get install -y xvfb

# Restart Jupyter with xvfb
cd ~/Projects/atc-ai/rl_training
xvfb-run -a jupyter lab
```

**Alternative:** See `WSL_DISPLAY_FIX.md` for X11 forwarding setup to actually see the browser.

**Why we need `headless: false`:**
- In `headless: true` mode, the browser's `requestAnimationFrame` loop does not drive the game the way it does in a headed window
- Game time stays stuck at 0.0s
- Aircraft spawn, but the simulation does not progress
- **Solution:** Run in headed mode under xvfb (a virtual display)

**Performance Optimizations:**
- `timewarp: 15` (was 5) → game time runs 3x faster
- `action_interval: 10` (was 5) → half as many agent steps per game second
- `episode_length: 1800` (was 3600) → episodes half as long
- **Combined: ~12x faster training** (3 × 2 × 2) 🚀
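
The factors above can be checked with a little arithmetic. The sketch below assumes that `episode_length` and `action_interval` are both measured in game seconds and that per-step browser overhead is negligible. Neither assumption is confirmed by the config shown here, and `episode_cost` is a hypothetical helper.

```python
def episode_cost(timewarp: int, action_interval: int, episode_length: int) -> dict:
    """Estimate per-episode cost, ignoring browser and network overhead.

    Assumes episode_length and action_interval are both in game seconds.
    """
    steps = episode_length // action_interval   # agent decisions per episode
    wall_seconds = episode_length / timewarp    # real time per episode
    return {"steps": steps, "wall_seconds": wall_seconds}


old = episode_cost(timewarp=5, action_interval=5, episode_length=3600)
new = episode_cost(timewarp=15, action_interval=10, episode_length=1800)

print(old)
print(new)
print("fewer steps per episode:", old["steps"] / new["steps"])
print("less wall time per episode:", old["wall_seconds"] / new["wall_seconds"])
print("product of the three listed factors:", (15 / 5) * (10 / 5) * (3600 / 1800))
```

Under these assumptions an episode needs 4x fewer agent steps and 6x less wall-clock time, while the ~12x figure is the product of the three individual factors. The speedup observed in practice will depend on which of these quantities dominates training time.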


## 2.2 Extract Game State

In [4]:
obs, info = env.reset()

# Check initial state
state = env._get_game_state()
print("Initial state after reset:")
print(f"  Time: {state.get('time', 0):.1f}s")
print(f"  Score: {state.get('score', 0)}")
print(f"  Aircraft: {len(state.get('aircraft', []))}")

# IMPORTANT: If time is stuck at 0.0s, set headless=False in config!
# The game's animation loop (requestAnimationFrame) doesn't run properly in headless mode

if state.get('time', 0) == 0 and len(state.get('aircraft', [])) > 0:
    print("\n⚠️  WARNING: Time is 0 but aircraft exist!")
    print("   This means headless mode is preventing the game loop from running")
    print("   Solution: Set headless: false in config/training_config.yaml")
    print("   The browser window will be visible but it's necessary for the game to work")
else:
    # Increase timewarp for faster testing
    print("\nSetting timewarp to 15 for faster testing...")
    env._execute_command("timewarp 15")
    time.sleep(1.0)
    
    # Step through to advance game time and spawn aircraft
    print("\nAdvancing game time to spawn aircraft...")
    print("(Each step = 10 game seconds with action_interval=10)")
    
    prev_time = state.get('time', 0)
    for i in range(90):  # 90 steps × 10 sec = 900 seconds = 15 minutes game time
        action = {
            'aircraft_id': 20,  # No action
            'command_type': 0,
            'heading': 0,
            'altitude': 0,
            'speed': 0
        }
        obs, reward, done, truncated, info = env.step(action)
        
        # Print progress every 18 steps (3 min of game time)
        if (i + 1) % 18 == 0:
            current_time = info['raw_state']['time']
            time_delta = current_time - prev_time
            print(f"  Step {i+1:3d}: Time={current_time:6.1f}s (+{time_delta:5.1f}s), Aircraft={info['aircraft_count']}")
            prev_time = current_time
    
    state = env._get_game_state()
    print("\nFinal State:")
    print(f"  Time: {state.get('time', 0):.1f}s")
    print(f"  Score: {state.get('score', 0)}")
    print(f"  Aircraft: {len(state.get('aircraft', []))}")

Initial state after reset:
  Time: 0.0s
  Score: 0
  Aircraft: 23

⚠️  WARNING: Time is 0 but aircraft exist!
   This means headless mode is preventing the game loop from running
   Solution: Set headless: false in config/training_config.yaml
   The browser window will be visible but it's necessary for the game to work
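
The check above compares a single time reading to 0, which cannot tell a freshly reset game apart from a stalled one. A more robust variant samples the clock across several steps and verifies that it moves. The helper below is a hypothetical sketch; the two sample lists mirror the stalled and healthy runs printed in Section 2.1.

```python
def clock_is_advancing(time_samples: list[float], min_delta: float = 1e-6) -> bool:
    """Return True if game time strictly increases across consecutive samples."""
    if len(time_samples) < 2:
        raise ValueError("need at least two samples to compare")
    return all(later - earlier > min_delta
               for earlier, later in zip(time_samples, time_samples[1:]))


# Samples like those in the two runs of the Section 2.1 cell.
print(clock_is_advancing([0.0, 0.0, 0.0, 0.0, 0.0]))       # stalled run
print(clock_is_advancing([10.0, 20.0, 30.0, 40.0, 50.0]))  # healthy run
```

Feeding it `info['raw_state']['time']` from a handful of no-op steps would flag the headless problem before a long training run starts.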


## 2.3 Visualize Observations

In [5]:
obs = env._state_to_observation(state)
aircraft_mask = obs['aircraft_mask']
num_active = aircraft_mask.sum()

print(f"Active aircraft: {num_active}/{env.max_aircraft}")
print(f"Global state: {obs['global_state']}")
print(f"Conflicts: {(obs['conflict_matrix'] > 0).sum() // 2}")

Active aircraft: 20/20
Global state: [0.   1.25 0.   0.  ]
Conflicts: 0
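
The `// 2` in the conflict count makes sense if `conflict_matrix` is symmetric with a zero diagonal, so that each conflicting pair appears twice (once as `[i][j]` and once as `[j][i]`). That symmetry is an assumption here, since the matrix construction is not shown. A small pure-Python sketch with a hypothetical 4-aircraft matrix:

```python
def count_conflict_pairs(conflict_matrix: list[list[float]]) -> int:
    """Count unordered aircraft pairs with a positive conflict entry."""
    n = len(conflict_matrix)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if conflict_matrix[i][j] > 0)


# Hypothetical 4-aircraft example: pairs (0, 1) and (2, 3) are in conflict.
matrix = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

nonzero_entries = sum(1 for row in matrix for value in row if value > 0)
print(nonzero_entries // 2, count_conflict_pairs(matrix))
```

Both counts agree for a symmetric matrix; counting only the upper triangle stays correct even if the matrix turns out not to be symmetric.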


# Section 3: Training with PPO

Simple PPO training using Stable-Baselines3:


## 3.1 Create PPO Model


In [None]:
# Vectorize environment for SB3
vec_env = DummyVecEnv([lambda: env])

# Create PPO model with built-in MultiInputPolicy
model = PPO(
    "MultiInputPolicy",  # SB3's built-in policy for Dict spaces
    vec_env,
    learning_rate=config['ppo']['learning_rate'],
    gamma=config['ppo']['gamma'],
    gae_lambda=config['ppo']['gae_lambda'],
    clip_range=config['ppo']['clip_epsilon'],
    n_steps=config['ppo']['n_steps'],
    batch_size=config['ppo']['batch_size'],
    n_epochs=config['ppo']['n_epochs'],
    verbose=1,
)

print('✅ PPO model created')
print(f'   Learning rate: {config["ppo"]["learning_rate"]}')
print(f'   Gamma: {config["ppo"]["gamma"]}')
print(f'   Steps per update: {config["ppo"]["n_steps"]}')
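
PPO in Stable-Baselines3 collects `n_steps` transitions per environment, then makes `n_epochs` passes over that rollout in minibatches of `batch_size`. The sketch below works out what a given setting implies for a training run. The helper `ppo_schedule` and the hyperparameter values are illustrative and are not read from the project's config file.

```python
import math


def ppo_schedule(total_timesteps: int, n_steps: int, batch_size: int,
                 n_epochs: int, n_envs: int = 1) -> dict:
    """Summarize how many rollouts and gradient steps a PPO run performs."""
    rollout_size = n_steps * n_envs
    rollouts = math.ceil(total_timesteps / rollout_size)
    minibatches = math.ceil(rollout_size / batch_size)
    return {
        "rollout_size": rollout_size,
        "rollouts": rollouts,
        "timesteps_collected": rollouts * rollout_size,
        "gradient_steps": rollouts * n_epochs * minibatches,
        "batch_divides_rollout": rollout_size % batch_size == 0,
    }


# Illustrative hyperparameters, not read from the project's config file.
print(ppo_schedule(total_timesteps=10_000, n_steps=2048, batch_size=64, n_epochs=10))
```

With these example values, a 10k-timestep run collects five full rollouts (10,240 timesteps), because training proceeds in whole rollouts. Choosing a `batch_size` that divides `n_steps * n_envs` avoids a truncated final minibatch.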


## 3.2 Train the Model

In [None]:
# Train the model (uncomment to run)
# Note: progress_bar=True needs the optional tqdm and rich packages
# model.learn(total_timesteps=10_000, progress_bar=True)
# model.save("checkpoints/openscope_ppo")

print("⏭️  Training skipped (uncomment to run)")
print("   For real training, use 100k+ timesteps")
print("   Current config: 10k timesteps for quick demo")


# Section 4: Summary

## What We Built

✅ **OpenScope Environment** - Playwright browser automation for real ATC simulation  
✅ **Gymnasium interface** - Standard RL environment with Dict obs/action spaces  
✅ **PPO Training** - Simple Stable-Baselines3 setup ready to train  
✅ **Compact** - One notebook plus a single environment module for easy learning

## Key Simplifications

This notebook focuses on **RL workflow over implementation details**:

- **Simple config**: Direct YAML dict loading (no Pydantic overhead)
- **Built-in policy**: SB3's MultiInputPolicy handles Dict spaces automatically  
- **No custom networks**: Focus on environment interaction, not architecture
- **Clear structure**: 4 sections from setup to training

## Next Steps

**To actually train**:
1. Uncomment training code in Section 3.2
2. Increase timesteps to 100k+ for meaningful results
3. Monitor with TensorBoard: pass `tensorboard_log="logs"` to `PPO(...)` in Section 3.1, then run `tensorboard --logdir logs`

**To improve performance**:
- Add custom Transformer network for variable aircraft (see original version)
- Implement curriculum learning (start with fewer aircraft)
- Try reward shaping (add conflict penalties, progress bonuses)
- Use parallel environments for faster training
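
As one illustration of the curriculum idea above, the aircraft cap could be raised in stages as training progresses. The function below is a hypothetical sketch: the stage count, the starting cap of 2, and the mapping from progress to `max_aircraft` are all assumptions, and wiring the result into `OpenScopeEnv` is left open.

```python
def curriculum_max_aircraft(timesteps_done: int, total_timesteps: int,
                            start: int = 2, end: int = 20, stages: int = 4) -> int:
    """Raise the aircraft cap in equal stages as training progresses."""
    if total_timesteps <= 0:
        raise ValueError("total_timesteps must be positive")
    if stages < 2:
        raise ValueError("stages must be at least 2")
    # Clamp progress to [0, 1] so overshooting timesteps stays at the last stage.
    progress = min(max(timesteps_done / total_timesteps, 0.0), 1.0)
    stage = min(int(progress * stages), stages - 1)   # 0 .. stages - 1
    return start + round(stage * (end - start) / (stages - 1))


for done in (0, 25_000, 50_000, 75_000, 100_000):
    print(done, curriculum_max_aircraft(done, total_timesteps=100_000))
```

With the defaults, the cap steps through 2, 8, 14, and 20 aircraft over four equal stages of training.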

**To extend**:
- Test on multiple airports (KJFK, EGLL, KSEA)
- Add evaluation metrics (conflicts, efficiency, score)
- Implement hierarchical policies (conditional actions)
- Export trained model for deployment

**Resources**: See `CLAUDE.md` for full project documentation

In [None]:
# Cleanup
env.close()
print('✅ Environment closed!')
