# OpenScope Data Extraction

This notebook demonstrates **what data can be extracted from OpenScope** using the project's built-in extraction functions.

**What this notebook shows:**
1. **Basic Extraction** - Core properties needed for basic RL training
2. **Enhanced Extraction** - Additional properties including wind, waypoints, and more
3. **Optimal Extraction** - Production-ready extraction with all ATC-critical data
4. **Data Comparison** - Side-by-side comparison of what each function extracts

**Purpose**: To see exactly what data is available from OpenScope and how each extraction function accesses it.

**Prerequisites**:
- OpenScope server running at http://localhost:3003
- Browser automation set up (the notebook drives OpenScope through `PlaywrightEnv`)


## Setup and Imports


In [1]:
import nest_asyncio
nest_asyncio.apply()

import json
from pathlib import Path
from typing import Dict, Any, List
from environment import PlaywrightEnv
from environment.utils import (
    extract_game_state,
    extract_enhanced_game_state,
    extract_optimal_game_state
)

print("✅ Imports complete")
print("   - PlaywrightEnv for environment setup")
print("   - extract_game_state() - Basic game state extraction")
print("   - extract_enhanced_game_state() - Enhanced with more properties")
print("   - extract_optimal_game_state() - Optimal for training data collection")


✅ Imports complete
   - PlaywrightEnv for environment setup
   - extract_game_state() - Basic game state extraction
   - extract_enhanced_game_state() - Enhanced with more properties
   - extract_optimal_game_state() - Optimal for training data collection


## Initialize Environment


In [2]:
# Create the environment directly (PlaywrightEnv accepts these parameters)
env = PlaywrightEnv(
    airport="KLAS",
    max_aircraft=10,
    episode_length=300,
    headless=False  # Keep visible to see what's happening
)

# Reset to get initial state
obs, info = env.reset()

print("✅ Environment initialized")
print(f"   Aircraft count: {info.get('aircraft_count', 0)}")


✅ Environment initialized
   Aircraft count: 0


## Using Project Extraction Functions

The project provides three extraction functions with different levels of detail. Let's demonstrate each one:


### 1. Basic Game State Extraction (`extract_game_state`)

The simplest extraction function - extracts core properties needed for basic RL training.


In [3]:
# Use the basic extraction function
page = env.browser_manager.page
basic_state = extract_game_state(page)

print("✅ Extracted basic game state")
print(f"\n📊 State Summary:")
print(f"   Aircraft Count: {basic_state.get('numAircraft', 0)}")
print(f"   Score: {basic_state.get('score', 'N/A')}")
print(f"   Game Time: {basic_state.get('time', 'N/A')}")
print(f"   Conflicts: {len(basic_state.get('conflicts', []))}")

if basic_state.get('aircraft'):
    sample_ac = basic_state['aircraft'][0]
    print(f"\n✈️  Sample Aircraft Data ({len(sample_ac)} properties):")
    print(f"   Callsign: {sample_ac.get('callsign', 'N/A')}")
    
    # Show ALL properties with their values
    print(f"\n   All Extracted Properties:")
    for key in sorted(sample_ac.keys()):
        val = sample_ac[key]
        if isinstance(val, list) and len(val) == 2:
            # Two-element lists (e.g. position) are printed as fixed-width pairs
            print(f"      {key:20s}: [{val[0]:7.2f}, {val[1]:7.2f}]")
        else:
            print(f"      {key:20s}: {val}")


✅ Extracted basic game state

📊 State Summary:
   Aircraft Count: 20
   Score: 0
   Game Time: 10.066635999999999
   Conflicts: 0

✈️  Sample Aircraft Data (14 properties):
   Callsign: QXE3919

   All Extracted Properties:
      altitude            : 15000
      assignedAltitude    : 12000
      assignedHeading     : 5.028252360551294
      assignedSpeed       : 320
      callsign            : QXE3919
      category            : arrival
      groundSpeed         : 405.3626043793639
      heading             : 5.007181246119508
      isEstablished       : False
      isOnGround          : False
      isTaxiing           : False
      position            : [  59.34,  -20.44]
      speed               : 320
      targetRunway        : 16R


### 2. Enhanced Game State Extraction (`extract_enhanced_game_state`)

Extracts additional properties including wind data, flight phase, waypoints, and more.


In [4]:
# Use the enhanced extraction function
enhanced_state = extract_enhanced_game_state(page)

print("✅ Extracted enhanced game state")
print(f"\n📊 State Summary:")
print(f"   Aircraft Count: {enhanced_state.get('numAircraft', 0)}")
print(f"   Score: {enhanced_state.get('score', 'N/A')}")
print(f"   Game Time: {enhanced_state.get('time', 'N/A')}")
print(f"   Conflicts: {len(enhanced_state.get('conflicts', []))}")

if enhanced_state.get('aircraft'):
    sample_ac = enhanced_state['aircraft'][0]
    print(f"\n✈️  Sample Aircraft Data ({len(sample_ac)} properties):")
    print(f"   Callsign: {sample_ac.get('callsign', 'N/A')}")
    
    # Group properties by category for better readability
    core = ['callsign', 'position', 'altitude', 'heading', 'speed', 'groundSpeed', 
            'assignedAltitude', 'assignedHeading', 'assignedSpeed', 'category',
            'isOnGround', 'isTaxiing', 'isEstablished', 'targetRunway']
    
    flight_dynamics = ['verticalSpeed', 'flightPhase', 'groundTrack', 'course', 'track', 'bearing',
                       'pitch', 'roll', 'yaw', 'mach', 'trueAirspeed', 'calibratedAirspeed']
    
    position_data = ['absolutePosition', 'lat', 'lon', 'latitude', 'longitude']
    
    wind_data = ['windSpeed', 'windDirection', 'headwind', 'tailwind', 'crosswind']
    
    aircraft_info = ['squawk', 'distanceToRunway', 'timeToRunway', 'eta', 
                     'fuel', 'weight', 'mass']
    
    flight_plan = ['flightPlan']
    
    # Display grouped properties
    def print_group(name, props):
        found = [(p, sample_ac[p]) for p in props if p in sample_ac]
        if found:
            print(f"\n   {name}:")
            for prop, val in found:
                if isinstance(val, list) and len(val) == 2:
                    print(f"      {prop:25s}: [{val[0]:7.2f}, {val[1]:7.2f}]")
                elif isinstance(val, dict):
                    if prop == 'flightPlan':
                        fp = val
                        print(f"      {prop:25s}: waypoints={len(fp.get('waypoints') or [])}, "
                              f"current={fp.get('currentWaypoint', 'N/A')}, next={fp.get('nextWaypoint', 'N/A')}")
                    else:
                        print(f"      {prop:25s}: {list(val.keys()) if val else '{}'}")
                elif val is not None:
                    # Scalars (numbers, booleans, strings) print as-is; None values are skipped
                    print(f"      {prop:25s}: {val}")
    
    print_group("Core Properties", core)
    print_group("Flight Dynamics", flight_dynamics)
    print_group("Position Data", position_data)
    print_group("Wind Data", wind_data)
    print_group("Aircraft Info", aircraft_info)
    print_group("Flight Plan", flight_plan)
    
    # Show any remaining properties not in groups
    all_shown = set(core + flight_dynamics + position_data + wind_data + aircraft_info + flight_plan)
    remaining = [(k, v) for k, v in sorted(sample_ac.items()) if k not in all_shown and v is not None]
    if remaining:
        print(f"\n   Additional Properties:")
        for prop, val in remaining:
            if isinstance(val, list) and len(val) == 2:
                print(f"      {prop:25s}: [{val[0]:7.2f}, {val[1]:7.2f}]")
            else:
                print(f"      {prop:25s}: {val}")


✅ Extracted enhanced game state

📊 State Summary:
   Aircraft Count: 20
   Score: 0
   Game Time: 10.091636
   Conflicts: 0

✈️  Sample Aircraft Data (44 properties):
   Callsign: QXE3919

   Core Properties:
      callsign                 : QXE3919
      position                 : [  59.33,  -20.44]
      altitude                 : 15000
      heading                  : 5.007180380146209
      speed                    : 320
      groundSpeed              : 405.3625976465687
      assignedAltitude         : 12000
      assignedHeading          : 5.028252360551294
      assignedSpeed            : 320
      category                 : arrival
      isOnGround               : False
      isTaxiing                : False
      isEstablished            : False
      targetRunway             : 16R

   Flight Dynamics:
      flightPhase              : CRUISE
      groundTrack              : -1.2564098379795459
      trueAirspeed             : 396.8

   Position Data:

   Wind Data:

   Aircraf
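
The angular values in the output above (`heading` around 5.007, `groundTrack` around -1.256) look like radians rather than compass degrees; the negative ground track in particular would not make sense as a degree heading. Assuming that reading is correct, a small sketch for converting them to the 0-360 range looks like this. The helper name `to_compass_degrees` is hypothetical and not part of the project.

```python
import math


def to_compass_degrees(angle_rad: float) -> float:
    """Convert an angle in radians to a compass-style value in [0, 360)."""
    # Python's % returns a non-negative result for a positive divisor,
    # so negative angles wrap around correctly
    return math.degrees(angle_rad) % 360.0


# Values copied from the enhanced output above (assumed to be radians)
heading_deg = to_compass_degrees(5.007180380146209)
track_deg = to_compass_degrees(-1.2564098379795459)
print(f"heading = {heading_deg:.1f} deg, groundTrack = {track_deg:.1f} deg")
```

Under this assumption the two values come out at roughly 287 and 288 degrees, which is at least consistent with a heading and a ground track that differ slightly because of wind.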

### 3. Optimal Game State Extraction (`extract_optimal_game_state`)

Extracts the 14 core features per aircraft plus additional ATC-critical data. This is optimized for production training data collection.


In [5]:
# Use the optimal extraction function (for production training data)
optimal_state = extract_optimal_game_state(page)

print("✅ Extracted optimal game state")
print(f"\n📊 State Summary:")
print(f"   Aircraft Count: {optimal_state.get('numAircraft', 0)}")
print(f"   Score: {optimal_state.get('score', 'N/A')}")
print(f"   Game Time: {optimal_state.get('time', 'N/A')}")
print(f"   Conflicts: {len(optimal_state.get('conflicts', []))}")

if optimal_state.get('aircraft'):
    sample_ac = optimal_state['aircraft'][0]
    print(f"\n✈️  Sample Aircraft Data ({len(sample_ac)} properties):")
    print(f"   Callsign: {sample_ac.get('callsign', 'N/A')}")
    
    # Group properties (callsign, the 14th core feature, is printed separately above)
    core_14 = ['position', 'altitude', 'heading', 'speed', 'groundSpeed',
               'assignedAltitude', 'assignedHeading', 'assignedSpeed',
               'category', 'isOnGround', 'isTaxiing', 'isEstablished', 'targetRunway']
    
    atc_critical = ['windComponents', 'flightPhase', 'nextWaypoint', 'currentWaypoint',
                    'flightPlanAltitude', 'flightPlanRoute', 'hasApproachClearance',
                    'isOnFinal', 'isEstablishedOnGlidepath']
    
    operational = ['isControllable', 'transponderCode', 'groundTrack', 
                   'trueAirspeed', 'climbRate', 'distance']
    
    # Display grouped properties
    def print_group(name, props):
        found = [(p, sample_ac[p]) for p in props if p in sample_ac]
        if found:
            print(f"\n   {name}:")
            for prop, val in found:
                if isinstance(val, list) and len(val) == 2:
                    print(f"      {prop:25s}: [{val[0]:7.2f}, {val[1]:7.2f}]")
                elif isinstance(val, dict):
                    if prop == 'windComponents':
                        # Default to NaN so the :6.2f format cannot fail on a missing key
                        head = val.get('head', float('nan'))
                        cross = val.get('cross', float('nan'))
                        print(f"      {prop:25s}: head={head:6.2f} kts, cross={cross:6.2f} kts")
                    else:
                        print(f"      {prop:25s}: {list(val.keys()) if val else '{}'}")
                elif val is not None:
                    # Scalars (numbers, booleans, strings) print as-is; None values are skipped
                    print(f"      {prop:25s}: {val}")
    
    print_group("Core 14 Features", core_14)
    print_group("ATC-Critical Data", atc_critical)
    print_group("Operational State", operational)
    
    # Show any remaining properties
    all_shown = set(core_14 + atc_critical + operational + ['callsign'])
    remaining = [(k, v) for k, v in sorted(sample_ac.items()) if k not in all_shown and v is not None]
    if remaining:
        print(f"\n   Additional Properties:")
        for prop, val in remaining:
            if isinstance(val, list) and len(val) == 2:
                print(f"      {prop:25s}: [{val[0]:7.2f}, {val[1]:7.2f}]")
            else:
                print(f"      {prop:25s}: {val}")


✅ Extracted optimal game state

📊 State Summary:
   Aircraft Count: 20
   Score: 0
   Game Time: 10.107336
   Conflicts: 0

✈️  Sample Aircraft Data (29 properties):
   Callsign: QXE3919

   Core 14 Features:
      position                 : [  59.33,  -20.44]
      altitude                 : 15000
      heading                  : 5.007179872508006
      speed                    : 320
      groundSpeed              : 405.36259369976494
      assignedAltitude         : 12000
      assignedHeading          : 5.028252360551294
      assignedSpeed            : 320
      category                 : arrival
      isOnGround               : False
      isTaxiing                : False
      isEstablished            : False
      targetRunway             : 16R

   ATC-Critical Data:
      windComponents           : head= -6.57 kts, cross=  6.15 kts
      flightPhase              : CRUISE
      nextWaypoint             : AUBRN
      currentWaypoint          : HUMPP
      flightPlanAltitude      
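
Because the optimal extraction is meant for collecting training data, the extracted states need to be persisted somewhere. One simple option is a JSON Lines file with one state per line. The sketch below assumes the state dictionary contains only JSON-serializable values (numbers, strings, booleans, lists, and dicts, as in the output above); `append_state` and `load_states` are hypothetical helpers, and `demo_state` is a stand-in for the dictionary returned by `extract_optimal_game_state(page)`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def append_state(path: Path, state: Dict[str, Any]) -> None:
    """Append one extracted state as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(state) + "\n")


def load_states(path: Path) -> List[Dict[str, Any]]:
    """Read every non-empty line back into a list of state dicts."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Stand-in for a dict returned by extract_optimal_game_state(page)
demo_state = {
    "time": 10.107336,
    "score": 0,
    "numAircraft": 1,
    "aircraft": [{"callsign": "QXE3919", "altitude": 15000}],
    "conflicts": [],
}

# Write to a temporary directory so the example leaves the project untouched
log_path = Path(tempfile.mkdtemp()) / "states.jsonl"
append_state(log_path, demo_state)
append_state(log_path, demo_state)
loaded = load_states(log_path)
print(f"{len(loaded)} states written to {log_path}")
```

Appending line by line keeps each write independent, so a crash partway through an episode loses at most the state being written.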

### Comparison: Basic vs Enhanced vs Optimal

Side-by-side comparison of what each function extracts.

Note: All three functions extract the same conflict and global state data (score, time).


In [6]:
# Compare the three extraction functions
if basic_state.get('aircraft') and enhanced_state.get('aircraft') and optimal_state.get('aircraft'):
    basic_props = set(basic_state['aircraft'][0].keys())
    enhanced_props = set(enhanced_state['aircraft'][0].keys())
    optimal_props = set(optimal_state['aircraft'][0].keys())
    
    print("📊 Extraction Function Comparison:")
    print(f"\n   Property Counts:")
    print(f"      Basic:     {len(basic_props):3d} properties")
    print(f"      Enhanced:  {len(enhanced_props):3d} properties")
    print(f"      Optimal:   {len(optimal_props):3d} properties")
    
    # Calculate overlaps
    common_all = basic_props & enhanced_props & optimal_props
    only_enhanced = enhanced_props - basic_props - optimal_props
    only_optimal = optimal_props - basic_props - enhanced_props
    
    print(f"\n   Overlap:")
    print(f"      Common to all:    {len(common_all):3d} properties")
    print(f"      Only in Enhanced: {len(only_enhanced):3d} properties")
    print(f"      Only in Optimal:  {len(only_optimal):3d} properties")
    
    # Show examples of unique properties
    if only_enhanced:
        print(f"\n   Enhanced-Only Examples:")
        for prop in sorted(list(only_enhanced))[:8]:
            print(f"      - {prop}")
        if len(only_enhanced) > 8:
            print(f"      ... and {len(only_enhanced) - 8} more")
    
    if only_optimal:
        print(f"\n   Optimal-Only Examples:")
        for prop in sorted(list(only_optimal))[:8]:
            print(f"      - {prop}")
        if len(only_optimal) > 8:
            print(f"      ... and {len(only_optimal) - 8} more")
    
    # Show conflict data structure (same for all)
    if optimal_state.get('conflicts'):
        print(f"\n   🔍 Conflict Data Structure:")
        sample_conflict = optimal_state['conflicts'][0]
        print(f"      Properties: {list(sample_conflict.keys())}")
        print(f"      Example: {sample_conflict}")
    else:
        print(f"\n   🔍 Conflicts: {len(optimal_state.get('conflicts', []))} (structure available when conflicts exist)")
    
    print(f"\n   💡 Usage:")
    print(f"      Basic:     Simple RL - core 14 properties only")
    print(f"      Enhanced:  Exploration - all available properties (44+)")
    print(f"      Optimal:   Production - 14 core + ATC-critical data (29 total)")


📊 Extraction Function Comparison:

   Property Counts:
      Basic:      14 properties
      Enhanced:   44 properties
      Optimal:    29 properties

   Overlap:
      Common to all:     14 properties
      Only in Enhanced:  27 properties
      Only in Optimal:   12 properties

   Enhanced-Only Examples:
      - absolutePosition
      - bearing
      - calibratedAirspeed
      - course
      - crosswind
      - distanceToRunway
      - eta
      - flightPlan
      ... and 19 more

   Optimal-Only Examples:
      - climbRate
      - currentWaypoint
      - distance
      - flightPlanAltitude
      - flightPlanRoute
      - hasApproachClearance
      - isControllable
      - isEstablishedOnGlidepath
      ... and 4 more

   🔍 Conflicts: 0 (structure available when conflicts exist)

   💡 Usage:
      Basic:     Simple RL - core 14 properties only
      Enhanced:  Exploration - all available properties (44+)
      Optimal:   Production - 14 core + ATC-critical data (29 total)
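
The counts above imply a region the comparison does not list: Enhanced has 44 properties, of which 14 are common to all and 27 are Enhanced-only, leaving 3 that Enhanced shares with Optimal but not with Basic (and 29 - 14 - 12 = 3 confirms it from the Optimal side). A sketch of a more general comparison that reports every such region is shown below. `partition_keys` is a hypothetical helper, and the key sets are toy data, not the real extraction output.

```python
from typing import Dict, FrozenSet, Set


def partition_keys(named_sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group each key by the exact set of sources that contain it."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for key in set().union(*named_sets.values()):
        owners = frozenset(name for name, keys in named_sets.items() if key in keys)
        regions.setdefault(owners, set()).add(key)
    return regions


# Toy key sets for illustration only
basic = {"callsign", "altitude"}
enhanced = basic | {"flightPhase", "squawk"}
optimal = basic | {"flightPhase", "climbRate"}

regions = partition_keys({"basic": basic, "enhanced": enhanced, "optimal": optimal})
for owners, keys in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(f"{' & '.join(sorted(owners)):30s}: {sorted(keys)}")
```

With the real states, the sets would be built the same way as in the cell above, e.g. `set(basic_state['aircraft'][0].keys())`.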


## Cleanup


In [7]:
env.close()
print("✅ Environment closed")


✅ Environment closed
