# 🗺️ OSM Tile Extractor for Intersection Detection

This notebook systematically downloads OpenStreetMap raster tiles for a chosen bounding box and zoom level, producing image data for training machine learning models that detect road intersections.

## Features
- 📊 **Visual Progress Tracking** - Real-time charts and progress bars
- 🗺️ **Interactive Preview Maps** - See your extraction area before downloading
- ⚡ **Fast Concurrent Downloads** - Multiple servers with rate limiting
- 💾 **Resume Capability** - Pick up where you left off
- 📈 **Performance Analytics** - Speed, success rates, and statistics

## 1. Setup & Dependencies

In [None]:
# Install required packages (run this first)
%pip install aiohttp folium tqdm pandas matplotlib seaborn requests ipywidgets

print("✅ Dependencies installed!")

In [None]:
# Import all necessary libraries
import asyncio
import aiohttp
import time
import json
import math
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from datetime import datetime
from dataclasses import dataclass, asdict
from typing import List, Dict, Optional, Tuple
from tqdm.notebook import tqdm
import folium
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets
import random
import numpy as np

# Configure matplotlib for inline plots
plt.style.use('default')
sns.set_palette("husl")
%matplotlib inline

print("📦 All imports successful!")
print(f"📅 Session started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 2. Configuration & Settings

In [None]:
# Configuration settings
CONFIG = {
    'output_dir': 'osm_training_data',
    'zoom_level': 16,
    'max_workers': 8,
    'batch_size': 100,
    'tile_size': 256,
    'progress_update_interval': 50  # Update progress every N tiles
}

# US regions for quick selection
REGIONS = {
    "test_sf": (37.7849, 37.7949, -122.4094, -122.3994),  # Small SF test area
    "downtown_sf": (37.7749, 37.8049, -122.4194, -122.3894),  # Downtown SF
    "california": (32.534156, 42.009518, -124.409591, -114.131211),
    "texas": (25.837377, 36.500704, -106.645646, -93.508292),
    "florida": (24.523096, 31.000888, -87.634938, -80.031362),
    "new_york": (40.477399, 45.015865, -79.762152, -71.777491),
    "chicago": (41.644335, 42.023135, -87.940101, -87.523661),
    "los_angeles": (33.704538, 34.337306, -118.668176, -118.155289),
    "seattle": (47.481002, 47.734138, -122.459696, -122.224433)
}

print("⚙️ Configuration loaded:")
for key, value in CONFIG.items():
    print(f"   {key}: {value}")

print(f"\n📍 Available regions: {list(REGIONS.keys())}")
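
The region tuples above use the order `(lat_min, lat_max, lon_min, lon_max)`, which differs from the `[west, south, east, north]` order used by GeoJSON and is easy to mix up. A minimal sketch of a sanity check follows. `validate_bbox` and `MAX_MERCATOR_LAT` are hypothetical names that are not part of this notebook.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (lat_min, lat_max, lon_min, lon_max)

# Web Mercator tiles only cover latitudes up to roughly +/-85.0511 degrees.
MAX_MERCATOR_LAT = 85.0511


def validate_bbox(bbox: BBox) -> BBox:
    """Hypothetical helper: raise ValueError if the bbox is malformed."""
    lat_min, lat_max, lon_min, lon_max = bbox
    if not (-MAX_MERCATOR_LAT <= lat_min < lat_max <= MAX_MERCATOR_LAT):
        raise ValueError(f"bad latitude range: {lat_min}..{lat_max}")
    if not (-180.0 <= lon_min < lon_max <= 180.0):
        raise ValueError(f"bad longitude range: {lon_min}..{lon_max}")
    return bbox


# Example region copied from REGIONS above.
test_sf = validate_bbox((37.7849, 37.7949, -122.4094, -122.3994))
```

Running a check like this on custom coordinates before generating a tile grid would catch swapped latitude and longitude values early.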

## 3. Core Classes & Functions

In [None]:
@dataclass
class TileInfo:
    """Information about a map tile"""
    x: int
    y: int
    zoom: int
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    filename: str
    downloaded: bool = False
    timestamp: Optional[str] = None
    server_used: Optional[str] = None

@dataclass
class TileServer:
    """Configuration for a tile server"""
    name: str
    url_template: str
    subdomains: List[str]
    max_requests_per_second: float
    last_request_time: float = 0

@dataclass
class DownloadStats:
    """Statistics for download session"""
    total_tiles: int = 0
    downloaded: int = 0
    failed: int = 0
    skipped: int = 0
    start_time: float = 0
    end_time: float = 0
    speeds: Optional[List[float]] = None
    
    def __post_init__(self):
        if self.speeds is None:
            self.speeds = []
    
    @property
    def success_rate(self) -> float:
        if self.total_tiles == 0:
            return 0.0
        return (self.downloaded / self.total_tiles) * 100
    
    @property
    def average_speed(self) -> float:
        return np.mean(self.speeds) if self.speeds else 0.0
    
    @property
    def total_time(self) -> float:
        return self.end_time - self.start_time if self.end_time > self.start_time else 0

print("📋 Core classes defined successfully!")
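
`DownloadStats` uses a `None` default plus `__post_init__` so that instances never share one list. The same effect can be sketched with `dataclasses.field(default_factory=list)`. `StatsSketch` below is a hypothetical, trimmed-down stand-in for illustration, not the notebook's class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StatsSketch:
    """Trimmed-down, hypothetical variant of DownloadStats."""
    total_tiles: int = 0
    downloaded: int = 0
    # default_factory builds a fresh list for every instance.
    speeds: List[float] = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        if self.total_tiles == 0:
            return 0.0
        return self.downloaded / self.total_tiles * 100


a = StatsSketch(total_tiles=4, downloaded=3)
b = StatsSketch()
a.speeds.append(2.5)  # must not leak into b.speeds
print(a.success_rate, b.speeds)
```

Either style works. The `default_factory` form removes the need for `__post_init__` and keeps the type annotation accurate.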

In [None]:
class OSMTileExtractor:
    """Jupyter-optimized OSM tile extractor with visual progress tracking"""
    
    def __init__(self, output_dir: str = "osm_training_data", max_workers: int = 8):
        self.output_dir = Path(output_dir)
        self.max_workers = max_workers
        self.stats = DownloadStats()
        
        # Setup tile servers
        self.tile_servers = self._setup_tile_servers()
        self.server_rotation_index = 0
        
        # Progress tracking
        self.progress_bar = None
        self.live_stats = None
        
        self.setup_directories()
        self.init_database()
        
        print(f"🚀 OSM Extractor initialized:")
        print(f"   📁 Output directory: {self.output_dir}")
        print(f"   👥 Max workers: {self.max_workers}")
        print(f"   🌐 Tile servers: {len(self.tile_servers)}")
    
    def _setup_tile_servers(self) -> List[TileServer]:
        """Setup multiple tile servers for load balancing"""
        return [
            # OpenStreetMap servers (the OSM tile usage policy discourages bulk downloading; check it before large runs)
            TileServer("osm_main", "https://tile.openstreetmap.org/{z}/{x}/{y}.png", [""], 1.0),
            TileServer("osm_a", "https://a.tile.openstreetmap.org/{z}/{x}/{y}.png", [""], 1.0),
            TileServer("osm_b", "https://b.tile.openstreetmap.org/{z}/{x}/{y}.png", [""], 1.0),
            TileServer("osm_c", "https://c.tile.openstreetmap.org/{z}/{x}/{y}.png", [""], 1.0),
            
            # CartoDB (configured below with a higher request rate than the OSM servers)
            TileServer("cartodb_light", "https://cartodb-basemaps-{s}.global.ssl.fastly.net/light_all/{z}/{x}/{y}.png", 
                      ["a", "b", "c", "d"], 3.0),
            
            # Stamen (Stamen's hosted tiles moved to Stadia Maps in 2023, so this endpoint may no longer respond)
            TileServer("stamen_toner", "https://stamen-tiles-{s}.a.ssl.fastly.net/toner/{z}/{x}/{y}.png", 
                      ["a", "b", "c", "d"], 2.0),
        ]
    
    def setup_directories(self):
        """Create organized directory structure"""
        directories = ["tiles", "metadata", "progress", "annotations", "processed"]
        for dir_name in directories:
            (self.output_dir / dir_name).mkdir(parents=True, exist_ok=True)
    
    def init_database(self):
        """Initialize SQLite database for progress tracking"""
        db_path = self.output_dir / "progress.db"
        with sqlite3.connect(db_path) as conn:
            conn.execute("""
                CREATE TABLE IF NOT EXISTS tiles (
                    id INTEGER PRIMARY KEY,
                    x INTEGER, y INTEGER, zoom INTEGER,
                    lat_min REAL, lat_max REAL, lon_min REAL, lon_max REAL,
                    filename TEXT, downloaded BOOLEAN DEFAULT FALSE,
                    timestamp TEXT, server_used TEXT,
                    UNIQUE(x, y, zoom)
                )
            """)
            conn.commit()
    
    def deg2num(self, lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
        """Convert lat/lon coordinates to tile numbers"""
        lat_rad = math.radians(lat_deg)
        n = 2.0 ** zoom
        x = int((lon_deg + 180.0) / 360.0 * n)
        y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
        return x, y
    
    def num2deg(self, x: int, y: int, zoom: int) -> Tuple[float, float, float, float]:
        """Convert tile numbers to lat/lon bounding box"""
        n = 2.0 ** zoom
        lon_min = x / n * 360.0 - 180.0
        lat_max = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
        lon_max = (x + 1) / n * 360.0 - 180.0
        lat_min = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
        return lat_min, lat_max, lon_min, lon_max
    
    def generate_tile_grid(self, bbox: Tuple[float, float, float, float], zoom: int) -> List[TileInfo]:
        """Generate systematic grid of tiles for a bounding box"""
        lat_min, lat_max, lon_min, lon_max = bbox
        
        x_min, y_max = self.deg2num(lat_min, lon_min, zoom)
        x_max, y_min = self.deg2num(lat_max, lon_max, zoom)
        
        tiles = []
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                tile_lat_min, tile_lat_max, tile_lon_min, tile_lon_max = self.num2deg(x, y, zoom)
                filename = f"tile_{zoom}_{x}_{y}.png"
                
                tile = TileInfo(
                    x=x, y=y, zoom=zoom,
                    lat_min=tile_lat_min, lat_max=tile_lat_max,
                    lon_min=tile_lon_min, lon_max=tile_lon_max,
                    filename=filename
                )
                tiles.append(tile)
        
        return tiles
    
    def get_next_server(self) -> TileServer:
        """Get next available server with load balancing"""
        current_time = time.time()
        
        for _ in range(len(self.tile_servers)):
            server = self.tile_servers[self.server_rotation_index]
            self.server_rotation_index = (self.server_rotation_index + 1) % len(self.tile_servers)
            
            min_interval = 1.0 / server.max_requests_per_second
            if current_time - server.last_request_time >= min_interval:
                server.last_request_time = current_time
                return server
        
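        # If all servers are rate limited, fall back to the least recently used one
        # and record the request so the same server is not picked repeatedly
        server = min(self.tile_servers, key=lambda s: s.last_request_time)
        server.last_request_time = current_time
        return server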
    
    def build_tile_url(self, server: TileServer, x: int, y: int, z: int) -> str:
        """Build tile URL with subdomain rotation"""
        subdomain = random.choice(server.subdomains) if server.subdomains else ""
        
        if "{s}" in server.url_template:
            return server.url_template.format(s=subdomain, x=x, y=y, z=z)
        else:
            return server.url_template.format(x=x, y=y, z=z)

print("🔧 OSMTileExtractor class defined successfully!")
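
The `deg2num` and `num2deg` methods implement the standard slippy-map tile arithmetic. A standalone sketch of the same formulas, written as plain functions so they can be checked outside the class, is shown here with a round-trip test: the tile found for a point should enclose that point. The sample coordinates are an arbitrary point inside the `test_sf` region.

```python
import math
from typing import Tuple


def deg2num(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Tile (x, y) containing a WGS84 point, slippy-map convention."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return x, y


def num2deg(x: int, y: int, zoom: int) -> Tuple[float, float, float, float]:
    """Bounding box (lat_min, lat_max, lon_min, lon_max) of a tile."""
    n = 2 ** zoom

    def lat_of(row: float) -> float:
        # Latitude of the northern edge of the given tile row.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lat_of(y + 1), lat_of(y), x / n * 360.0 - 180.0, (x + 1) / n * 360.0 - 180.0


# Round trip: the tile found for a point must enclose that point.
lat, lon, zoom = 37.7899, -122.4044, 16
x, y = deg2num(lat, lon, zoom)
lat_min, lat_max, lon_min, lon_max = num2deg(x, y, zoom)
assert lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
print(f"tile {zoom}/{x}/{y} covers lat {lat_min:.5f}..{lat_max:.5f}")
```

Note that row numbers grow southwards, which is why `generate_tile_grid` pairs `lat_max` with `y_min`.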

## 4. Visualization & Preview Functions

In [None]:
def create_preview_map(bbox: Tuple[float, float, float, float], zoom: int = 10) -> folium.Map:
    """Create an interactive map showing the extraction area"""
    lat_min, lat_max, lon_min, lon_max = bbox
    
    # Center point
    center_lat = (lat_min + lat_max) / 2
    center_lon = (lon_min + lon_max) / 2
    
    # Create map
    m = folium.Map(
        location=[center_lat, center_lon],
        zoom_start=zoom,
        tiles='OpenStreetMap'
    )
    
    # Add bounding box rectangle
    folium.Rectangle(
        bounds=[[lat_min, lon_min], [lat_max, lon_max]],
        popup=f"Extraction Area<br>Lat: {lat_min:.4f} to {lat_max:.4f}<br>Lon: {lon_min:.4f} to {lon_max:.4f}",
        tooltip="Click for area details",
        color='red',
        weight=2,
        fillOpacity=0.1
    ).add_to(m)
    
    # Add corner markers
    corners = [
        ([lat_min, lon_min], "SW Corner"),
        ([lat_min, lon_max], "SE Corner"),
        ([lat_max, lon_min], "NW Corner"),
        ([lat_max, lon_max], "NE Corner"),
        ([center_lat, center_lon], "Center")
    ]
    
    for (lat, lon), label in corners:
        folium.Marker(
            [lat, lon],
            popup=f"{label}<br>{lat:.4f}, {lon:.4f}",
            tooltip=label
        ).add_to(m)
    
    return m

def calculate_area_stats(bbox: Tuple[float, float, float, float], zoom: int) -> Dict:
    """Calculate statistics for the extraction area"""
    lat_min, lat_max, lon_min, lon_max = bbox
    
    # Approximate area in km²
    lat_diff = lat_max - lat_min
    lon_diff = lon_max - lon_min
    
    # Rough conversion (varies with latitude)
    avg_lat = (lat_min + lat_max) / 2
    km_per_degree_lat = 111.0
    km_per_degree_lon = 111.0 * math.cos(math.radians(avg_lat))
    
    area_km2 = lat_diff * lon_diff * km_per_degree_lat * km_per_degree_lon
    
    # Calculate number of tiles
    n = 2.0 ** zoom
    x_min = int((lon_min + 180.0) / 360.0 * n)
    x_max = int((lon_max + 180.0) / 360.0 * n)
    y_min = int((1.0 - math.asinh(math.tan(math.radians(lat_max))) / math.pi) / 2.0 * n)
    y_max = int((1.0 - math.asinh(math.tan(math.radians(lat_min))) / math.pi) / 2.0 * n)
    
    total_tiles = (x_max - x_min + 1) * (y_max - y_min + 1)
    
    # Estimate download time and storage
    avg_tile_size_kb = 15  # Typical OSM tile size
    estimated_storage_mb = total_tiles * avg_tile_size_kb / 1024
    estimated_storage_gb = estimated_storage_mb / 1024
    
    # Time estimates for different speeds
    slow_speed = 1.0  # tiles per second
    fast_speed = 10.0  # tiles per second with optimization
    
    slow_time_hours = total_tiles / slow_speed / 3600
    fast_time_hours = total_tiles / fast_speed / 3600
    
    return {
        'bbox': bbox,
        'zoom': zoom,
        'area_km2': area_km2,
        'total_tiles': total_tiles,
        'estimated_storage_mb': estimated_storage_mb,
        'estimated_storage_gb': estimated_storage_gb,
        'slow_download_hours': slow_time_hours,
        'fast_download_hours': fast_time_hours,
        'tile_dimensions': f"{x_max-x_min+1} x {y_max-y_min+1}"
    }

def display_area_stats(stats: Dict):
    """Display area statistics in a nice format"""
    print("📊 EXTRACTION AREA STATISTICS")
    print("=" * 40)
    print(f"📍 Bounding Box: {stats['bbox']}")
    print(f"🔍 Zoom Level: {stats['zoom']}")
    print(f"🗺️  Area: {stats['area_km2']:.1f} km²")
    print(f"🔢 Total Tiles: {stats['total_tiles']:,}")
    print(f"📐 Tile Grid: {stats['tile_dimensions']}")
    print(f"💾 Storage (Est.): {stats['estimated_storage_mb']:.1f} MB ({stats['estimated_storage_gb']:.2f} GB)")
    print("\n⏱️  DOWNLOAD TIME ESTIMATES:")
    print(f"   🐌 Slow (1 tile/sec): {stats['slow_download_hours']:.1f} hours")
    print(f"   🚀 Fast (10 tiles/sec): {stats['fast_download_hours']:.1f} hours")
    
    # Visual representation
    if stats['total_tiles'] < 1000:
        size_category = "🟢 SMALL - Good for testing"
    elif stats['total_tiles'] < 10000:
        size_category = "🟡 MEDIUM - Good for development"
    elif stats['total_tiles'] < 100000:
        size_category = "🟠 LARGE - Production dataset"
    else:
        size_category = "🔴 VERY LARGE - Consider smaller areas first"
    
    print(f"\n📈 Size Category: {size_category}")

print("📊 Visualization functions defined successfully!")
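
Tile counts grow quickly with zoom: each extra level doubles the grid in both directions, so the count roughly quadruples. The sketch below repeats the tile-index arithmetic from `calculate_area_stats` in a hypothetical helper, `count_tiles`, and prints the count and a storage estimate per zoom level for the `downtown_sf` bounding box. The 15 KB average tile size is the notebook's own rough assumption, not a measured value.

```python
import math
from typing import Tuple


def count_tiles(bbox: Tuple[float, float, float, float], zoom: int) -> int:
    """Hypothetical helper: tiles touching (lat_min, lat_max, lon_min, lon_max)."""
    lat_min, lat_max, lon_min, lon_max = bbox
    n = 2 ** zoom

    def col(lon: float) -> int:
        return int((lon + 180.0) / 360.0 * n)

    def row(lat: float) -> int:
        return int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)

    # Row numbers grow southwards, so lat_max gives the smallest row.
    return (col(lon_max) - col(lon_min) + 1) * (row(lat_min) - row(lat_max) + 1)


downtown_sf = (37.7749, 37.8049, -122.4194, -122.3894)
for zoom in range(14, 19):
    tiles = count_tiles(downtown_sf, zoom)
    # 15 KB per tile is the same rough average the notebook assumes.
    print(f"zoom {zoom}: {tiles:>6,} tiles, ~{tiles * 15 / 1024:.1f} MB")
```

Checking the count this way before a download makes it easier to pick a zoom level that fits the available time and storage.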

## 5. Progress Tracking & Analytics

In [None]:
def create_progress_dashboard(stats: DownloadStats):
    """Create a live progress dashboard"""
    
    # Create subplots
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('📊 OSM Tile Download Dashboard', fontsize=16, fontweight='bold')
    
    # 1. Progress pie chart
    if stats.total_tiles > 0:
        sizes = [stats.downloaded, stats.failed, stats.total_tiles - stats.downloaded - stats.failed]
        labels = ['Downloaded', 'Failed', 'Remaining']
        colors = ['#2ecc71', '#e74c3c', '#95a5a6']
        
        ax1.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90)
        ax1.set_title('Overall Progress')
    
    # 2. Download speed per batch
    if len(stats.speeds) > 0:
        ax2.plot(stats.speeds, color='#3498db', linewidth=2)
        ax2.set_title('Download Speed (tiles/sec)')
        ax2.set_xlabel('Batch Number')
        ax2.set_ylabel('Speed')
        ax2.grid(True, alpha=0.3)
    
    # 3. Statistics bar chart
    metrics = ['Downloaded', 'Failed', 'Success Rate %']
    values = [stats.downloaded, stats.failed, stats.success_rate]
    colors_bar = ['#2ecc71', '#e74c3c', '#f39c12']
    
    bars = ax3.bar(metrics, values, color=colors_bar)
    ax3.set_title('Download Statistics')
    ax3.set_ylabel('Count / Percentage')
    
    # Add value labels on bars
    for bar, value in zip(bars, values):
        ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1, 
                f'{value:.0f}', ha='center', va='bottom')
    
    # 4. Time remaining estimate
    if stats.average_speed > 0 and stats.total_tiles > stats.downloaded + stats.failed:
        remaining_tiles = stats.total_tiles - stats.downloaded - stats.failed
        eta_seconds = remaining_tiles / stats.average_speed
        eta_hours = eta_seconds / 3600
        
        ax4.text(0.5, 0.7, f'⏱️ ETA: {eta_hours:.1f} hours', 
                ha='center', va='center', fontsize=14, fontweight='bold')
        ax4.text(0.5, 0.5, f'🚀 Avg Speed: {stats.average_speed:.1f} tiles/sec', 
                ha='center', va='center', fontsize=12)
        ax4.text(0.5, 0.3, f'📦 Remaining: {remaining_tiles:,} tiles', 
                ha='center', va='center', fontsize=12)
    
    ax4.set_xlim(0, 1)
    ax4.set_ylim(0, 1)
    ax4.axis('off')
    ax4.set_title('Time Estimates')
    
    plt.tight_layout()
    plt.show()

def save_session_report(stats: DownloadStats, output_dir: Path, bbox: Tuple):
    """Save a detailed session report"""
    report = {
        'session_info': {
            'timestamp': datetime.now().isoformat(),
            'bbox': bbox,
            'total_tiles': stats.total_tiles,
            'downloaded': stats.downloaded,
            'failed': stats.failed,
            'success_rate': stats.success_rate,
            'total_time_seconds': stats.total_time,
            'average_speed': stats.average_speed
        },
        'performance_data': {
            'speeds': stats.speeds,
            'max_speed': max(stats.speeds) if stats.speeds else 0,
            'min_speed': min(stats.speeds) if stats.speeds else 0
        }
    }
    
    report_path = output_dir / "progress" / f"session_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
    with open(report_path, 'w') as f:
        json.dump(report, f, indent=2)
    
    print(f"📄 Session report saved: {report_path}")
    return report_path

print("📈 Progress tracking functions defined successfully!")
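
The dashboard's ETA divides the remaining tiles by `average_speed`, which is the mean of the per-batch speeds. That mean can differ noticeably from overall throughput (total tiles divided by total time) when batches vary in size or duration. A small sketch with made-up batch timings illustrates the gap. The `batches` data and the `eta_hours` helper are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical batches: (tiles processed, seconds taken).
batches: List[Tuple[int, float]] = [(100, 10.0), (100, 20.0), (50, 1.0)]

# What average_speed reports: the mean of per-batch speeds.
per_batch_speeds = [tiles / seconds for tiles, seconds in batches]
mean_of_speeds = mean(per_batch_speeds)

# Overall throughput: total tiles divided by total time.
total_tiles = sum(tiles for tiles, _ in batches)
total_seconds = sum(seconds for _, seconds in batches)
overall_speed = total_tiles / total_seconds


def eta_hours(remaining_tiles: int, tiles_per_second: float) -> float:
    """ETA in hours; returns infinity when no speed has been measured yet."""
    if tiles_per_second <= 0:
        return float("inf")
    return remaining_tiles / tiles_per_second / 3600


print(f"mean of batch speeds: {mean_of_speeds:.1f} t/s")
print(f"overall throughput:   {overall_speed:.1f} t/s")
print(f"ETA for 10,000 tiles: {eta_hours(10_000, overall_speed):.2f} h")
```

In this made-up case a single fast batch pulls the mean well above the overall throughput, so an ETA based on the mean would be optimistic.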

## 6. Interactive Area Selection

In [None]:
# Interactive widgets for area selection
region_dropdown = widgets.Dropdown(
    options=list(REGIONS.keys()),
    value='test_sf',
    description='Region:',
    style={'description_width': 'initial'}
)

zoom_slider = widgets.IntSlider(
    value=16,
    min=14,
    max=18,
    step=1,
    description='Zoom Level:',
    style={'description_width': 'initial'}
)

workers_slider = widgets.IntSlider(
    value=8,
    min=1,
    max=16,
    step=1,
    description='Max Workers:',
    style={'description_width': 'initial'}
)

# Custom bbox inputs
lat_min_input = widgets.FloatText(value=37.7849, description='Lat Min:')
lat_max_input = widgets.FloatText(value=37.7949, description='Lat Max:')
lon_min_input = widgets.FloatText(value=-122.4094, description='Lon Min:')
lon_max_input = widgets.FloatText(value=-122.3994, description='Lon Max:')

# Buttons
preview_button = widgets.Button(description="🗺️ Preview Area", button_style='info')
calculate_button = widgets.Button(description="📊 Calculate Stats", button_style='warning')
download_button = widgets.Button(description="🚀 Start Download", button_style='success')

# Output areas
output_area = widgets.Output()
map_area = widgets.Output()
stats_area = widgets.Output()

# Global variables to store current selection
current_bbox = None
current_stats = None

def update_bbox_from_region(change):
    """Update bbox inputs when region changes"""
    global current_bbox
    if change['new'] in REGIONS:
        bbox = REGIONS[change['new']]
        lat_min_input.value = bbox[0]
        lat_max_input.value = bbox[1] 
        lon_min_input.value = bbox[2]
        lon_max_input.value = bbox[3]
        current_bbox = bbox

def on_preview_click(b):
    """Handle preview button click"""
    global current_bbox
    current_bbox = (lat_min_input.value, lat_max_input.value, lon_min_input.value, lon_max_input.value)
    
    with map_area:
        clear_output(wait=True)
        print("🗺️ Generating preview map...")
        preview_map = create_preview_map(current_bbox, zoom=10)
        display(preview_map)

def on_calculate_click(b):
    """Handle calculate stats button click"""
    global current_bbox, current_stats
    current_bbox = (lat_min_input.value, lat_max_input.value, lon_min_input.value, lon_max_input.value)
    
    with stats_area:
        clear_output(wait=True)
        print("📊 Calculating statistics...")
        current_stats = calculate_area_stats(current_bbox, zoom_slider.value)
        display_area_stats(current_stats)

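def on_download_click(b):
    """Handle download button click"""
    global current_bbox
    # Read the inputs directly so custom coordinates are honored without a preview
    current_bbox = (lat_min_input.value, lat_max_input.value, lon_min_input.value, lon_max_input.value)
    
    with output_area:
        clear_output(wait=True)
        print("🚀 Starting download process...")
        # handle_download() is defined in section 9; run that cell before clicking.
        # The coroutine is scheduled on the kernel's already-running event loop.
        asyncio.ensure_future(handle_download())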

# Connect event handlers
region_dropdown.observe(update_bbox_from_region, names='value')
preview_button.on_click(on_preview_click)
calculate_button.on_click(on_calculate_click)
download_button.on_click(on_download_click)

# Initialize with default region
update_bbox_from_region({'new': region_dropdown.value})

print("🎛️ Interactive controls ready!")

## 7. Control Panel

Use this interactive control panel to select your extraction area and settings:

In [None]:
# Display the control panel
control_panel = widgets.VBox([
    widgets.HTML("<h3>🎛️ OSM Tile Extraction Control Panel</h3>"),
    
    widgets.HBox([region_dropdown, zoom_slider, workers_slider]),
    
    widgets.HTML("<h4>📍 Custom Bounding Box (optional):</h4>"),
    widgets.HBox([lat_min_input, lat_max_input]),
    widgets.HBox([lon_min_input, lon_max_input]),
    
    widgets.HTML("<h4>🔧 Actions:</h4>"),
    widgets.HBox([preview_button, calculate_button, download_button]),
    
    output_area
])

display(control_panel)

In [None]:
# Display areas for map and stats
display(widgets.HTML("<h4>🗺️ Preview Map:</h4>"))
display(map_area)

display(widgets.HTML("<h4>📊 Area Statistics:</h4>"))
display(stats_area)

## 8. Download Implementation with Progress Tracking

In [None]:
async def download_tiles_with_progress(extractor: OSMTileExtractor, tiles: List[TileInfo], 
                                     batch_size: int = 100) -> DownloadStats:
    """Download tiles with visual progress tracking"""
    
    stats = DownloadStats()
    stats.total_tiles = len(tiles)
    stats.start_time = time.time()
    
    # Create progress bar
    progress_bar = tqdm(total=len(tiles), desc="Downloading tiles")
    
    # Process in batches
    for i in range(0, len(tiles), batch_size):
        batch = tiles[i:i + batch_size]
        batch_start = time.time()
        
        # Download batch
        try:
            results = await download_batch_async(extractor, batch)
            
            # Update statistics
            batch_downloaded = sum(results)
            batch_failed = len(results) - batch_downloaded
            
            stats.downloaded += batch_downloaded
            stats.failed += batch_failed
            
            # Calculate speed
            batch_time = time.time() - batch_start
            batch_speed = len(batch) / batch_time if batch_time > 0 else 0
            stats.speeds.append(batch_speed)
            
            # Update progress bar
            progress_bar.update(len(batch))
            progress_bar.set_postfix({
                'Success': f'{stats.success_rate:.1f}%',
                'Speed': f'{batch_speed:.1f} t/s',
                'Avg': f'{stats.average_speed:.1f} t/s'
            })
            
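            # Show dashboard every few batches
            if (i // batch_size + 1) % 5 == 0:
                progress_bar.close()  # release the old bar before the output is cleared
                clear_output(wait=True)
                create_progress_dashboard(stats)
                progress_bar = tqdm(total=len(tiles), initial=i+len(batch), desc="Downloading tiles")
                
        except Exception as e:
            print(f"❌ Batch error: {e}")
            stats.failed += len(batch)
            progress_bar.update(len(batch))  # keep the bar in step with processed tiles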
    
    progress_bar.close()
    stats.end_time = time.time()
    
    return stats

async def download_batch_async(extractor: OSMTileExtractor, tiles: List[TileInfo]) -> List[bool]:
    """Download a batch of tiles asynchronously"""
    
    connector = aiohttp.TCPConnector(limit=extractor.max_workers)
    timeout = aiohttp.ClientTimeout(total=30)
    
    async with aiohttp.ClientSession(
        connector=connector,
        timeout=timeout,
        headers={'User-Agent': 'Jupyter-OSM-Intersection-Training-Extractor/1.0'}
    ) as session:
        
        semaphore = asyncio.Semaphore(extractor.max_workers)
        
        async def download_single_tile(tile: TileInfo) -> bool:
            async with semaphore:
                return await download_tile_async(extractor, session, tile)
        
        # Execute all downloads concurrently
        tasks = [download_single_tile(tile) for tile in tiles]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        
        # Handle exceptions
        success_results = []
        for result in results:
            if isinstance(result, Exception):
                success_results.append(False)
            else:
                success_results.append(result)
        
        return success_results

async def download_tile_async(extractor: OSMTileExtractor, session: aiohttp.ClientSession, 
                            tile: TileInfo) -> bool:
    """Download a single tile asynchronously"""
    
    tile_path = extractor.output_dir / "tiles" / tile.filename
    
    # Skip if already exists
    if tile_path.exists():
        return True
    
    server = extractor.get_next_server()
    url = extractor.build_tile_url(server, tile.x, tile.y, tile.zoom)
    
    try:
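        # Crude throttle: a fixed pause before each request. Workers pause concurrently,
        # so this spaces out each worker's requests rather than enforcing a global rate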
        await asyncio.sleep(1.0 / server.max_requests_per_second)
        
        async with session.get(url) as response:
            if response.status == 200:
                content = await response.read()
                
                # Save tile
                with open(tile_path, 'wb') as f:
                    f.write(content)
                
                # Save metadata
                tile.downloaded = True
                tile.timestamp = datetime.now().isoformat()
                tile.server_used = server.name
                
                metadata_path = extractor.output_dir / "metadata" / f"tile_{tile.zoom}_{tile.x}_{tile.y}.json"
                with open(metadata_path, 'w') as f:
                    json.dump(asdict(tile), f, indent=2)
                
                return True
            else:
                return False
    
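    except (aiohttp.ClientError, asyncio.TimeoutError, OSError):
        # Network, timeout, or file-system failure: report the tile as failed
        return False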

print("⚡ Download implementation ready!")
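
The download functions above combine two ideas: a semaphore that bounds how many requests are in flight, and a pause that spaces requests out. The sketch below shows one way these could fit together with a shared limiter that hands out request slots at a fixed interval. `IntervalLimiter`, `fake_fetch`, and `demo` are hypothetical names, and `fake_fetch` stands in for a real HTTP request so that the example runs without network access.

```python
import asyncio
import time
from typing import List


class IntervalLimiter:
    """Hypothetical helper: allow at most one request slot per interval."""

    def __init__(self, requests_per_second: float) -> None:
        self.interval = 1.0 / requests_per_second
        self._lock = asyncio.Lock()
        self._next_slot = 0.0

    async def wait(self) -> None:
        # The lock serialises callers, so slots are handed out one at a time.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_slot - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_slot = max(now, self._next_slot) + self.interval


async def fake_fetch(i: int, limiter: IntervalLimiter, sem: asyncio.Semaphore) -> bool:
    """Stand-in for a tile download; every third call fails on purpose."""
    async with sem:
        await limiter.wait()
        if i % 3 == 2:
            raise RuntimeError(f"simulated failure for item {i}")
        return True


async def demo(count: int = 6, rps: float = 100.0, workers: int = 3) -> List[bool]:
    limiter = IntervalLimiter(rps)
    sem = asyncio.Semaphore(workers)
    results = await asyncio.gather(
        *(fake_fetch(i, limiter, sem) for i in range(count)),
        return_exceptions=True,
    )
    # Exceptions come back as values; map them to False as the notebook does.
    return [r is True for r in results]
```

In a notebook cell this can be tried with `await demo()`; in a plain script, `asyncio.run(demo())`. A per-server limiter of this kind would be one option for enforcing `max_requests_per_second` across all workers.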

## 9. Main Download Function

In [None]:
async def run_extraction(bbox: Tuple[float, float, float, float], zoom: int = 16, 
                        max_workers: int = 8, batch_size: int = 100):
    """Main function to run the tile extraction with full progress tracking"""
    
    print(f"🚀 Starting OSM Tile Extraction")
    print(f"📍 Area: {bbox}")
    print(f"🔍 Zoom: {zoom}")
    print(f"👥 Workers: {max_workers}")
    print(f"📦 Batch size: {batch_size}")
    print("=" * 50)
    
    # Initialize extractor
    extractor = OSMTileExtractor(CONFIG['output_dir'], max_workers)
    
    # Generate tile grid
    print("📐 Generating tile grid...")
    tiles = extractor.generate_tile_grid(bbox, zoom)
    print(f"✅ Generated {len(tiles):,} tiles")
    
    # Check existing tiles (one pass over the grid)
    remaining_tiles = [t for t in tiles if not (extractor.output_dir / "tiles" / t.filename).exists()]
    existing_count = len(tiles) - len(remaining_tiles)
    
    print(f"📊 Progress: {existing_count:,} / {len(tiles):,} tiles already downloaded")
    print(f"⏳ Remaining: {len(remaining_tiles):,} tiles to download")
    
    if not remaining_tiles:
        print("✅ All tiles already downloaded!")
        return None, extractor  # same (stats, extractor) shape as the normal return
    
    # Start download
    print("\n🚀 Starting download...")
    stats = await download_tiles_with_progress(extractor, remaining_tiles, batch_size)
    
    # Final report
    print("\n" + "=" * 50)
    print("🎉 EXTRACTION COMPLETE!")
    print("=" * 50)
    print(f"📊 Total tiles: {stats.total_tiles:,}")
    print(f"✅ Downloaded: {stats.downloaded:,}")
    print(f"❌ Failed: {stats.failed:,}")
    print(f"📈 Success rate: {stats.success_rate:.1f}%")
    print(f"⏱️  Total time: {stats.total_time/60:.1f} minutes")
    print(f"⚡ Average speed: {stats.average_speed:.1f} tiles/second")
    
    # Save report
    report_path = save_session_report(stats, extractor.output_dir, bbox)
    
    # Final dashboard
    create_progress_dashboard(stats)
    
    return stats, extractor

# Coroutine that runs an extraction for the current control-panel selection
async def handle_download():
    """Handle the download process"""
    global current_bbox
    
    if current_bbox is None:
        print("❌ Please preview area and calculate stats first!")
        return None
    
    # Run extraction and pass its result through unchanged, so an early
    # exit from run_extraction cannot cause an unpacking error here
    return await run_extraction(
        current_bbox,
        zoom_slider.value,
        workers_slider.value,
        CONFIG['batch_size']
    )

print("🎯 Main extraction function ready!")
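
`run_extraction` resumes an interrupted session by skipping tiles whose files already exist. For large grids, a single directory listing can replace one `exists()` call per tile. The sketch below shows that idea with a hypothetical helper, `split_done_and_pending`, and a temporary directory holding a placeholder file. It checks only for file presence, so a partially written file would still count as done.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def split_done_and_pending(tiles_dir: Path, filenames: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: partition planned filenames by what is on disk."""
    # One directory listing instead of one exists() call per tile.
    on_disk = {p.name for p in tiles_dir.glob("*.png")}
    done = [name for name in filenames if name in on_disk]
    pending = [name for name in filenames if name not in on_disk]
    return done, pending


with tempfile.TemporaryDirectory() as tmp:
    tiles_dir = Path(tmp)
    # Pretend one tile from an earlier session is already present.
    (tiles_dir / "tile_16_10484_25326.png").write_bytes(b"placeholder")
    planned = [f"tile_16_10484_{y}.png" for y in (25326, 25327, 25328)]
    done, pending = split_done_and_pending(tiles_dir, planned)
    print(f"{len(done)} already downloaded, {len(pending)} remaining")
```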

## 10. Quick Start Examples

Here are some ready-to-run examples for different use cases:

In [None]:
# Example 1: Small Test Area (Great for first try)
async def run_small_test():
    """Download a small test area - perfect for getting started"""
    print("🧪 Running small test extraction...")
    
    # Small SF area - only a handful of tiles at zoom 16 (fewer than 10)
    test_bbox = (37.7849, 37.7949, -122.4094, -122.3994)
    
    stats = await run_extraction(test_bbox, zoom=16, max_workers=4, batch_size=50)
    print("\n✅ Small test complete! Check the 'osm_training_data' folder.")
    return stats

# Example 2: Medium Development Dataset  
async def run_medium_dataset():
    """Download a medium-sized dataset for model development"""
    print("🏗️ Running medium dataset extraction...")
    
    # Downtown SF - roughly 40-50 tiles at zoom 16
    medium_bbox = (37.7749, 37.8049, -122.4194, -122.3894)
    
    stats = await run_extraction(medium_bbox, zoom=16, max_workers=6, batch_size=100)
    print("\n✅ Medium dataset complete!")
    return stats

# Example 3: Large Production Dataset
async def run_large_dataset():
    """Download a large dataset for production model training"""
    print("🏭 Running large dataset extraction...")
    print("⚠️  This will take several hours and use significant storage!")
    
    # Los Angeles area - roughly 13,000 tiles at zoom 16
    large_bbox = (33.704538, 34.337306, -118.668176, -118.155289)
    
    stats = await run_extraction(large_bbox, zoom=16, max_workers=8, batch_size=200)
    print("\n✅ Large dataset complete!")
    return stats

print("📚 Example functions ready!")
print("\n🎯 To get started, try: await run_small_test()")

## 11. Run Your Extraction

Choose one of the methods below to start your tile extraction:

In [None]:
# Method 1: Use the interactive control panel above
# 1. Select a region or enter custom coordinates
# 2. Click "Preview Area" to see the map
# 3. Click "Calculate Stats" to see size estimates
# 4. Click "Start Download" to begin extraction

# Method 2: Run a quick test
# Uncomment the line below to run a small test:
# await run_small_test()

# Method 3: Manual execution
# Uncomment and modify the lines below:
# custom_bbox = (37.7849, 37.7949, -122.4094, -122.3994)  # Your coordinates
# stats, extractor = await run_extraction(custom_bbox, zoom=16, max_workers=8)

print("🎛️ Use the control panel above or uncomment one of the methods in this cell to start!")
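
The `await ...` lines above rely on the notebook's support for top-level `await`. Outside a notebook, a coroutine needs an event loop to run it. The sketch below shows one way to do that from a plain script. `run_sync` and `fake_extraction` are hypothetical names, and `fake_extraction` stands in for `run_extraction(...)` so that the example is self-contained.

```python
import asyncio
from typing import Any, Coroutine


def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Hypothetical helper: run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain script): asyncio.run is safe here.
        return asyncio.run(coro)
    # A loop is already running (for example inside Jupyter): use `await` instead.
    coro.close()
    raise RuntimeError("event loop already running; use 'await' instead")


async def fake_extraction() -> str:
    """Stand-in for run_extraction(...) so the sketch runs on its own."""
    await asyncio.sleep(0)
    return "done"
```

In a script, a call such as `run_sync(fake_extraction())` would take the place of the `await` expressions used in the notebook cells.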

## 12. Post-Download Analysis

In [None]:
def analyze_downloaded_tiles(output_dir: str = "osm_training_data"):
    """Analyze the downloaded tiles and show statistics"""
    
    output_path = Path(output_dir)
    tiles_dir = output_path / "tiles"
    metadata_dir = output_path / "metadata"
    
    if not tiles_dir.exists():
        print(f"❌ No tiles directory found at {tiles_dir}")
        return
    
    # Count files
    tile_files = list(tiles_dir.glob("*.png"))
    metadata_files = list(metadata_dir.glob("*.json"))
    
    # Calculate storage
    total_size_bytes = sum(f.stat().st_size for f in tile_files)
    total_size_mb = total_size_bytes / 1024 / 1024
    total_size_gb = total_size_mb / 1024
    
    # Average file size
    avg_size_kb = (total_size_bytes / len(tile_files) / 1024) if tile_files else 0
    
    print("📊 DOWNLOADED DATASET ANALYSIS")
    print("=" * 40)
    print(f"📁 Output directory: {output_path}")
    print(f"🖼️  Total tiles: {len(tile_files):,}")
    print(f"📄 Metadata files: {len(metadata_files):,}")
    print(f"💾 Total storage: {total_size_mb:.1f} MB ({total_size_gb:.2f} GB)")
    print(f"📏 Average tile size: {avg_size_kb:.1f} KB")
    
    # Load and analyze metadata if available
    if metadata_files:
        print("\n🔍 Analyzing tile metadata...")
        
        # Load sample metadata
        with open(metadata_files[0]) as f:
            sample_metadata = json.load(f)
        
        zoom_level = sample_metadata.get('zoom', 'Unknown')
        print(f"🔍 Zoom level: {zoom_level}")
        
        # Estimate coverage area from a sample of per-tile bounds
        if len(metadata_files) >= 4:
            lats, lons = [], []
            for metadata_file in metadata_files[:100]:  # Sample first 100
                try:
                    with open(metadata_file) as f:
                        data = json.load(f)
                    lats.extend([data['lat_min'], data['lat_max']])
                    lons.extend([data['lon_min'], data['lon_max']])
                except (OSError, json.JSONDecodeError, KeyError):
                    continue  # Skip unreadable or incomplete metadata
            
            if lats and lons:
                lat_range = max(lats) - min(lats)
                lon_range = max(lons) - min(lons)
                # A degree of longitude shrinks with cos(latitude)
                mid_lat = (max(lats) + min(lats)) / 2
                km_per_deg_lon = 111 * math.cos(math.radians(mid_lat))
                
                print("🌍 Coverage area (from sampled metadata):")
                print(f"   Latitude range: {lat_range:.4f}° ({lat_range * 111:.1f} km)")
                print(f"   Longitude range: {lon_range:.4f}° ({lon_range * km_per_deg_lon:.1f} km)")
    
    # Create visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    
    # File size distribution
    if tile_files:
        sizes_kb = [f.stat().st_size / 1024 for f in tile_files[:1000]]  # Sample first 1000
        ax1.hist(sizes_kb, bins=30, alpha=0.7, color='skyblue')
        ax1.set_title('Tile Size Distribution (KB)')
        ax1.set_xlabel('Size (KB)')
        ax1.set_ylabel('Count')
    
    # Storage breakdown pie chart (tiles vs. metadata)
    if total_size_mb > 0:
        labels = ['Tiles', 'Metadata']
        metadata_size_mb = sum(f.stat().st_size for f in metadata_files) / 1024 / 1024
        
        # total_size_mb counts tile files only, so nothing is subtracted from it
        sizes = [total_size_mb, metadata_size_mb]
        colors = ['#3498db', '#2ecc71']
        
        ax2.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%')
        ax2.set_title('Storage Breakdown (MB)')
    
    plt.tight_layout()
    plt.show()
    
    # Next steps suggestions
    print("\n🎯 NEXT STEPS FOR ML TRAINING:")
    print("1. 🏷️  Start annotating intersections in the downloaded tiles")
    print("2. 🔄 Use tools like LabelImg, CVAT, or VGG Image Annotator")
    print("3. 🧠 Train your intersection detection model")
    print("4. 📈 Expand dataset based on model performance")
    
    return {
        'total_tiles': len(tile_files),
        'total_size_mb': total_size_mb,
        'avg_size_kb': avg_size_kb,
        'tiles_dir': tiles_dir,
        'metadata_dir': metadata_dir
    }

# Run analysis on current dataset
# Uncomment the line below after downloading tiles:
# analysis_results = analyze_downloaded_tiles()

print("📊 Analysis functions ready! Run analyze_downloaded_tiles() after downloading.")
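
The kilometre figures in the coverage report come from converting degrees to distance. One degree of latitude is about 111 km everywhere, while one degree of longitude shrinks with the cosine of the latitude. A minimal sketch of that conversion follows. `bbox_extent_km` is a hypothetical helper, and the 111 km constant is the same approximation the analysis cell uses.

```python
import math

KM_PER_DEGREE = 111.0  # Approximate length of one degree of latitude

def bbox_extent_km(lat_min, lat_max, lon_min, lon_max):
    """Return the approximate (width_km, height_km) of a bounding box."""
    mid_lat = (lat_min + lat_max) / 2.0
    height_km = (lat_max - lat_min) * KM_PER_DEGREE
    # Scale longitude by cos(latitude) at the middle of the box
    width_km = (lon_max - lon_min) * KM_PER_DEGREE * math.cos(math.radians(mid_lat))
    return width_km, height_km
```

At San Francisco's latitude the correction is roughly 0.79, so treating longitude degrees as 111 km would overstate east-west extent by about a quarter.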

## 13. Save & Export Functions

In [None]:
def create_training_manifest(output_dir: str = "osm_training_data"):
    """Create a comprehensive manifest for ML training"""
    
    output_path = Path(output_dir)
    tiles_dir = output_path / "tiles"
    metadata_dir = output_path / "metadata"
    
    if not tiles_dir.exists():
        print(f"❌ No tiles found at {tiles_dir}")
        return
    
    # Collect all tile information
    tiles_info = []
    tile_files = list(tiles_dir.glob("*.png"))
    
    for tile_file in tqdm(tile_files, desc="Processing tiles"):
        # Extract tile info from filename
        parts = tile_file.stem.split('_')  # tile_16_1234_5678
        if len(parts) >= 4:
            zoom, x, y = int(parts[1]), int(parts[2]), int(parts[3])
            
            # Load metadata if available
            metadata_file = metadata_dir / f"tile_{zoom}_{x}_{y}.json"
            metadata = {}
            if metadata_file.exists():
                with open(metadata_file) as f:
                    metadata = json.load(f)
            
            # File info
            file_size = tile_file.stat().st_size
            
            tile_info = {
                'filename': tile_file.name,
                'x': x, 'y': y, 'zoom': zoom,
                'file_size_bytes': file_size,
                'coordinates': {
                    'lat_min': metadata.get('lat_min'),
                    'lat_max': metadata.get('lat_max'),
                    'lon_min': metadata.get('lon_min'),
                    'lon_max': metadata.get('lon_max')
                },
                'downloaded_at': metadata.get('timestamp'),
                'server_used': metadata.get('server_used')
            }
            tiles_info.append(tile_info)
    
    # Create manifest
    manifest = {
        'dataset_info': {
            'name': 'OSM Intersection Detection Training Dataset',
            'created': datetime.now().isoformat(),
            'total_tiles': len(tiles_info),
            'zoom_levels': list(set(t['zoom'] for t in tiles_info)),
            'tile_format': 'PNG',
            'tile_size': '256x256',
            'coordinate_system': 'WGS84',
            'source': 'OpenStreetMap'
        },
        'coverage_stats': {
            'total_size_mb': sum(t['file_size_bytes'] for t in tiles_info) / 1024 / 1024,
            'avg_file_size_kb': float(np.mean([t['file_size_bytes'] for t in tiles_info])) / 1024 if tiles_info else 0,
            # Compare against None so that a legitimate 0.0 coordinate is not dropped
            'bounding_box': {
                'lat_min': min(t['coordinates']['lat_min'] for t in tiles_info if t['coordinates']['lat_min'] is not None),
                'lat_max': max(t['coordinates']['lat_max'] for t in tiles_info if t['coordinates']['lat_max'] is not None),
                'lon_min': min(t['coordinates']['lon_min'] for t in tiles_info if t['coordinates']['lon_min'] is not None),
                'lon_max': max(t['coordinates']['lon_max'] for t in tiles_info if t['coordinates']['lon_max'] is not None)
            } if any(t['coordinates']['lat_min'] is not None for t in tiles_info) else None
        },
        'tiles': tiles_info
    }
    
    # Save manifest
    manifest_path = output_path / "training_manifest.json"
    with open(manifest_path, 'w') as f:
        json.dump(manifest, f, indent=2)
    
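    # Also save a CSV version for easy analysis. json_normalize flattens the
    # nested 'coordinates' dict into coordinates_lat_min, coordinates_lat_max, ...
    df = pd.json_normalize(tiles_info, sep='_')
    csv_path = output_path / "tiles_catalog.csv"
    df.to_csv(csv_path, index=False)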
    
    print(f"📋 Training manifest saved: {manifest_path}")
    print(f"📊 CSV catalog saved: {csv_path}")
    print(f"✅ Dataset ready with {len(tiles_info):,} tiles!")
    
    return manifest_path, csv_path

def export_sample_for_annotation(output_dir: str = "osm_training_data", sample_size: int = 100):
    """Export a random sample of tiles for manual annotation"""
    
    output_path = Path(output_dir)
    tiles_dir = output_path / "tiles"
    sample_dir = output_path / "annotation_sample"
    
    sample_dir.mkdir(parents=True, exist_ok=True)
    
    tile_files = list(tiles_dir.glob("*.png"))
    
    if len(tile_files) < sample_size:
        sample_size = len(tile_files)
        print(f"⚠️  Only {len(tile_files)} tiles available, sampling all of them.")
    
    # Random sample
    import shutil
    sample_files = random.sample(tile_files, sample_size)
    
    for i, tile_file in enumerate(tqdm(sample_files, desc="Copying sample tiles")):
        new_name = f"sample_{i+1:03d}_{tile_file.name}"
        shutil.copy2(tile_file, sample_dir / new_name)
    
    print(f"📦 Exported {sample_size} tiles to {sample_dir}")
    print(f"🏷️  Ready for annotation with your preferred tool!")
    
    return sample_dir

print("💾 Export functions ready!")
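
`export_sample_for_annotation()` draws an unseeded random sample, so each run selects different tiles. If annotators need to work from the same set across sessions, a seeded variant is one option. This is a minimal sketch, where `pick_sample` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random
from pathlib import Path

def pick_sample(tile_paths, sample_size, seed=42):
    """Return a reproducible random sample of tile paths."""
    # Sort first: directory listing order is not guaranteed to be stable
    ordered = sorted(tile_paths)
    k = min(sample_size, len(ordered))
    # A private Random instance leaves the global random state untouched
    return random.Random(seed).sample(ordered, k)
```

The same seed and the same set of files produce the same sample. Changing the seed draws a fresh batch for a second annotation round.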

## 🎉 Congratulations!

You now have a complete, interactive OSM tile extraction system! Here's what you can do:

### 🚀 **Getting Started:**
1. Use the **Control Panel** above to select your area
2. **Preview** the area on an interactive map
3. **Calculate statistics** to see download time and storage estimates
4. **Start the download** with visual progress tracking

### 📊 **After Downloading:**
- Run `analyze_downloaded_tiles()` to see dataset statistics
- Run `create_training_manifest()` to prepare for ML training
- Run `export_sample_for_annotation()` to get tiles ready for labeling

### 🎯 **Recommended Workflow:**
1. **Start small** - Try the `test_sf` region first (about 1 km across, only a handful of tiles at zoom 16)
2. **Annotate sample** - Label intersections in 50-100 tiles
3. **Train initial model** - Build your first intersection detector
4. **Scale up** - Download larger areas as needed
5. **Iterate** - Improve model and expand dataset

### 💡 **Pro Tips:**
- **Save this notebook** - It contains all your progress and settings
- **Monitor disk space** - Large extractions can use significant storage
- **Use version control** - Track your annotation and model development
- **Start with urban areas** - More intersections for training
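
To act on the disk-space tip before a large extraction, the free space on the output drive can be checked with the standard library. This is a minimal sketch. `enough_disk_space` and its `safety_factor` margin are hypothetical, and the required size has to come from your own estimate.

```python
import shutil

def enough_disk_space(path, required_mb, safety_factor=1.2):
    """Return (ok, free_mb) for the filesystem that contains path."""
    free_mb = shutil.disk_usage(path).free / 1024 / 1024
    # Require some headroom beyond the raw estimate
    return free_mb >= required_mb * safety_factor, free_mb

ok, free_mb = enough_disk_space(".", required_mb=500)
print(f"Free space: {free_mb:,.0f} MB, enough for 500 MB: {ok}")
```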

**Happy intersection detecting! 🚗🛣️**