# Geospatial Data Processing Pipeline

## Key Features
- **Overture Maps download** via DuckDB with bounding box filtering
- **Multi-format conversion** (Shapefile, GeoPackage, etc.) to newline-delimited GeoJSON (GeoJSONSeq)
- **Automated PMTiles generation** with Tippecanoe settings chosen per geometry type and/or theme

## Processing Steps
1. **Download** - Fetch Overture Maps data for the specified extent
2. **Convert** - Transform custom spatial data to newline-delimited GeoJSON (GeoJSONSeq)
3. **Tile** - Generate PMTiles using Tippecanoe with custom settings
4. **Metadata** - Write TileJSON metadata for web map clients

## Prerequisites
- Python 3 with `duckdb`, `tqdm`, and `pandas` installed (`pathlib` is part of the standard library)
- Tippecanoe installed and available in PATH
- GDAL/OGR for geospatial format conversion
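You can check these prerequisites from Python before running the notebook. The sketch below assumes GDAL/OGR is available through its `ogr2ogr` command-line tool. `convertCustomData.py` might use GDAL's Python bindings instead, in which case also check for the `osgeo` package.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which pipeline prerequisites are available in this environment."""
    return {
        # Python packages: find_spec returns None when a package is not importable
        "duckdb": importlib.util.find_spec("duckdb") is not None,
        "tqdm": importlib.util.find_spec("tqdm") is not None,
        "pandas": importlib.util.find_spec("pandas") is not None,
        # Command-line tools: shutil.which returns None when the tool is not on PATH
        "tippecanoe": shutil.which("tippecanoe") is not None,
        "ogr2ogr": shutil.which("ogr2ogr") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```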

In [11]:
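```python
# Import the three modular processing scripts
import sys
import os
from pathlib import Path
import json
import time

# Add the processing directory to the Python path. The notebook may be run from
# the project root (scripts in ./processing) or from inside processing/ itself,
# so use whichever location actually contains the scripts.
for candidate in (Path("./processing"), Path(".")):
    if (candidate / "downloadOverture.py").exists():
        candidate_str = str(candidate.resolve())
        if candidate_str not in sys.path:
            sys.path.append(candidate_str)
        break

# Import modular processing scripts
try:
    from downloadOverture import download_overture_data
    from convertCustomData import convert_file
    from runCreateTiles import process_to_tiles, create_tilejson
    print("Successfully imported all processing modules")
except ImportError as e:
    print(f"Error importing modules: {e}")
    print("Make sure the processing scripts are in ./processing or the current directory")
    raise  # later cells depend on these functions, so stop here

# Import additional libraries for visualization and analysis
import pandas as pd
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')  # note: this hides all warnings, including deprecations
```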

```text
Successfully imported all processing modules
```


## 1. Project Configuration and Paths

Configure the project directories and processing parameters for the pipeline.

In [12]:
```python
# Configuration - All paths and parameters centralized
from pathlib import Path  # re-imported so this cell can run on its own

# Define all project paths. In a notebook __file__ is undefined, so the project
# root is taken to be the parent of the current working directory.
PROJECT_ROOT = Path(__file__).resolve().parent.parent if '__file__' in globals() else Path.cwd().parent
PROCESSING_DIR = PROJECT_ROOT / "processing"
DATA_DIR = PROCESSING_DIR / "data"
OVERTURE_DATA_DIR = DATA_DIR / "raw" / "overture"
CUSTOM_DATA_DIR = DATA_DIR / "raw" / "grid3"
OUTPUT_DIR = DATA_DIR / "processed"
TILE_DIR = DATA_DIR / "tiles"
PUBLIC_TILES_DIR = PROJECT_ROOT / "public" / "tiles"

CONFIG = {
    "paths": {
        "project_root": PROJECT_ROOT,
        "processing_dir": PROCESSING_DIR,
        "data_dir": DATA_DIR,
        "overture_data_dir": OVERTURE_DATA_DIR,
        "custom_data_dir": CUSTOM_DATA_DIR,
        "tile_dir": TILE_DIR,
        "output_dir": OUTPUT_DIR,
        "public_tiles_dir": PUBLIC_TILES_DIR,
        "template_path": PROCESSING_DIR / "tileQueries.template"
    },
    "extent": {
        "coordinates": (22.0, -6.0, 24.0, -4.0),  # (min lon, min lat, max lon, max lat), Kasai-Oriental
        "buffer_degrees": 0.2
    },
    "download": {
        "verbose": True,
        "output_formats": ["*.geojson", "*.geojsonseq"]  # glob patterns for downloaded files
    },
    "conversion": {
        "input_patterns": ["*.shp", "*.gpkg", "*.gdb", "*.sqlite", "*.db", "*.geojson", "*.json"],
        "output_suffix": ".geojsonseq",
        "reproject_crs": "EPSG:4326",
        "overwrite": True,  # not currently passed to convert_file() below
        "verbose": True
    },
    "tiling": {
        # Directories searched for inputs. OUTPUT_DIR (converted .geojsonseq files) is
        # not included, so converted-only sources such as the Settlement Extents
        # GeoPackage are never tiled. To tile converted outputs, use OUTPUT_DIR in
        # place of CUSTOM_DATA_DIR; keeping both would tile the GRID3 layers twice.
        "input_dirs": [CUSTOM_DATA_DIR, OVERTURE_DATA_DIR],
        "output_dir": TILE_DIR,  # Explicit output directory for PMTiles
        "parallel": True,
        "overwrite": True,  # not currently passed to process_to_tiles() below
        "verbose": True,
        "create_tilejson": True,  # section 5 calls create_tilejson() separately
        "filter_pattern": None  # Optional: filter files by pattern
    }
}

# Create necessary directories (every path key ending in "_dir")
for path_key, path_value in CONFIG["paths"].items():
    if path_key.endswith("_dir") and path_value:
        path_value.mkdir(parents=True, exist_ok=True)

# Display configuration summary
print("PROJECT CONFIGURATION INITIALIZED")
print("=" * 50)
print(f"Project root: {CONFIG['paths']['project_root']}")
print(f"Processing directory: {CONFIG['paths']['processing_dir']}")
print(f"Data directory: {CONFIG['paths']['data_dir']}")
print(f"Output directory: {CONFIG['paths']['output_dir']}")
print(f"Overture data directory: {CONFIG['paths']['overture_data_dir']}")
print(f"Custom data directory: {CONFIG['paths']['custom_data_dir']}")
print(f"Tile output directory: {CONFIG['paths']['tile_dir']}")
print(f"Public tiles directory: {CONFIG['paths']['public_tiles_dir']}")
print()
minx, miny, maxx, maxy = CONFIG["extent"]["coordinates"]
print(f"Processing extent: {CONFIG['extent']['coordinates']}")
print(f"Buffer degrees: {CONFIG['extent']['buffer_degrees']}")
print(f"Area: {(maxx - minx) * (maxy - miny):.2f} degree²")
print()
print("All directories created and configuration loaded")
print("All modular functions will use CONFIG parameters instead of hardcoded defaults")
```

```text
PROJECT CONFIGURATION INITIALIZED
==================================================
Project root: /Users/matthewheaton/GitHub/basemap
Processing directory: /Users/matthewheaton/GitHub/basemap/processing
Data directory: /Users/matthewheaton/GitHub/basemap/processing/data
Output directory: /Users/matthewheaton/GitHub/basemap/processing/data/processed
Overture data directory: /Users/matthewheaton/GitHub/basemap/processing/data/raw/overture
Custom data directory: /Users/matthewheaton/GitHub/basemap/processing/data/raw/grid3
Tile output directory: /Users/matthewheaton/GitHub/basemap/processing/data/tiles
Public tiles directory: /Users/matthewheaton/GitHub/basemap/public/tiles

Processing extent: (22.0, -6.0, 24.0, -4.0)
Buffer degrees: 0.2
Area: 4.00 degree²

All directories created and configuration loaded
All modular functions will use CONFIG parameters instead of hardcoded defaults
```
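The "degree²" figure above is not a fixed unit of area, because a degree of longitude shrinks with the cosine of latitude. As a rough cross-check, the sketch below converts the extent to km² on a spherical Earth. The spherical model and mean radius are assumptions chosen for illustration; the pipeline itself does not compute this.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def bbox_area_km2(minx, miny, maxx, maxy):
    """Approximate area of a lon/lat bounding box on a sphere, in km²."""
    # The region between two meridians and two parallels has area
    # R² * Δλ * (sin φ2 - sin φ1), with angles in radians.
    d_lon = math.radians(maxx - minx)
    d_sin_lat = math.sin(math.radians(maxy)) - math.sin(math.radians(miny))
    return EARTH_RADIUS_KM ** 2 * d_lon * d_sin_lat


print(f"{bbox_area_km2(22.0, -6.0, 24.0, -4.0):,.0f} km²")  # roughly 49,300 km²
```

Near the equator, 4 degree² corresponds to roughly 49,000 km². The same 4 degree² covers noticeably less ground at higher latitudes.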


## 2. Download Overture Data with DuckDB

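Use the `downloadOverture.py` module to fetch geospatial data from Overture Maps. The module uses DuckDB to query Overture's cloud-hosted releases (the log below shows both S3 and Azure Blob paths) and filters each layer to the requested extent, which it first snaps outward and then buffers. In the log, Section 7 (`admins/locality`) still reads from the `2024-04-16-beta.0` release, while every other section uses `2025-06-25.0`. Later Overture releases replaced the `admins` theme with `divisions`, so the corresponding query (presumably defined in `tileQueries.template`) likely needs updating.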
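The kind of query involved can be sketched as follows. Overture's GeoParquet files carry a `bbox` struct column (`xmin`, `ymin`, `xmax`, `ymax`), so a bounding-box filter does not need to parse geometries. This is an illustrative query builder only. The real module renders its queries from `tileQueries.template`, which may select different columns, and this sketch reads only the S3 mirror.

```python
def build_overture_query(theme, feature_type, bbox, release="2025-06-25.0",
                         out_path="out.geojsonseq"):
    """Build a DuckDB SQL script exporting one Overture type within a bbox (sketch)."""
    xmin, ymin, xmax, ymax = bbox
    src = f"s3://overturemaps-us-west-2/release/{release}/theme={theme}/type={feature_type}/*"
    return f"""
INSTALL spatial; LOAD spatial;
INSTALL httpfs; LOAD httpfs;
SET s3_region = 'us-west-2';
COPY (
    -- older DuckDB releases may need ST_GeomFromWKB(geometry) here
    SELECT id, geometry
    FROM read_parquet('{src}', hive_partitioning = 1)
    -- keep features whose bounding box intersects the requested extent
    WHERE bbox.xmin <= {xmax} AND bbox.xmax >= {xmin}
      AND bbox.ymin <= {ymax} AND bbox.ymax >= {ymin}
) TO '{out_path}' WITH (FORMAT GDAL, DRIVER 'GeoJSONSeq');
""".strip()


print(build_overture_query("base", "water", (20.89375, -7.2137, 25.5125, -2.6114)))
```

You could save the generated script to a file and run it with the DuckDB CLI (`duckdb < query.sql`). The intersection test keeps features that cross the extent boundary, not only those fully inside it.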

In [3]:
```python
# Download Overture Maps data
print("=== STEP 1: DOWNLOADING OVERTURE DATA ===")
download_results = download_overture_data(
    extent=CONFIG["extent"]["coordinates"],
    buffer_degrees=CONFIG["extent"]["buffer_degrees"],
    template_path=str(CONFIG["paths"]["template_path"]),
    verbose=CONFIG["download"]["verbose"],
    project_root=str(CONFIG["paths"]["project_root"]),
    overture_data_dir=str(CONFIG["paths"]["overture_data_dir"])
)

print(f"Download completed: {download_results['success']}")
print(f"Sections processed: {download_results['processed_sections']}")
if download_results["errors"]:
    print(f"Errors encountered: {len(download_results['errors'])}")
    for error in download_results["errors"]:
        print(f"  - {error}")
print()
```

```text
=== STEP 1: DOWNLOADING OVERTURE DATA ===
=== DOWNLOADING SOURCE DATA ===
Raw extent: (22.0, -6.0, 24.0, -4.0)
Snapped extent: (21.09375, -7.0136679275666305, 25.3125, -2.8113711933311296)
Map extent: 21.09375, -7.0136679275666305 to 25.3125, -2.8113711933311296
Download extent (buffered): 20.89375, -7.213667927566631 to 25.5125, -2.6113711933311294
Buffer: 0.2 degrees (~22.2km)

Overall progress:   0%|          | 0/10 [00:00<?, ?section/s]

Executing Section 1: base/land
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=base/type=land/*
  -> Output: land.geojsonseq

Overall progress:  10%|█         | 1/10 [00:43<06:28, 43.15s/section]

Executing Section 2: base/land_use
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=base/type=land_use/*
  -> Output: land_use.geojsonseq

Overall progress:  20%|██        | 2/10 [01:16<05:00, 37.56s/section]

Executing Section 3: base/land_use
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=base/type=land_use/*
  -> Output: land_residential.geojsonseq

Overall progress:  30%|███       | 3/10 [01:21<02:36, 22.35s/section]

Executing Section 4: base/water
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=base/type=water/*
  -> Output: water.geojsonseq

Overall progress:  40%|████      | 4/10 [02:19<03:40, 36.69s/section]

Executing Section 5: transportation/segment
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=transportation/type=segment/*
  -> Output: roads.geojsonseq

Overall progress:  50%|█████     | 5/10 [04:24<05:42, 68.60s/section]

Executing Section 6: buildings/building
  -> Querying: az://overturemapswestus2.blob.core.windows.net/release/2025-06-25.0/theme=buildings/type=building/*
  -> Output: buildings.geojsonseq

Overall progress:  60%|██████    | 6/10 [09:47<10:20, 155.07s/section]

Executing Section 7: admins/locality
  -> Querying: az://overturemapswestus2.blob.core.windows.net/release/2024-04-16-beta.0/theme=admins/type=locality/*
  -> Output: placenames.geojson

Overall progress:  70%|███████   | 7/10 [09:53<05:18, 106.22s/section]

Executing Section 8: unknown
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=places/*/*
  -> Output: places.geojson

Overall progress:  80%|████████  | 8/10 [10:04<02:31, 75.97s/section]

Executing Section 9: base/land_cover
  -> Querying: az://overturemapswestus2.blob.core.windows.net/release/2025-06-25.0/theme=base/type=land_cover/*
  -> Output: land_cover.geojsonseq

Overall progress:  90%|█████████ | 9/10 [12:48<01:43, 103.57s/section]

Executing Section 10: base/infrastructure
  -> Querying: az://overturemapswestus2.blob.core.windows.net/release/2025-06-25.0/theme=base/type=infrastructure/*
  -> Output: infrastructure.geojsonseq

Overall progress: 100%|██████████| 10/10 [13:02<00:00, 78.23s/section]

=== SOURCE DATA DOWNLOAD COMPLETE ===

Download completed: True
Sections processed: 10
```
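The "Snapped extent" in the log matches the raw extent grown outward to whole tiles of the zoom-8 Web Mercator tile grid (tile columns 143 to 145, rows 130 to 132). The buffer is then added on every side. The sketch below reproduces those logged numbers. Zoom 8 is inferred from the log, and `downloadOverture.py` may compute the snapping differently.

```python
import math


def lon_to_tile_x(lon, zoom):
    return (lon + 180.0) / 360.0 * 2 ** zoom


def lat_to_tile_y(lat, zoom):
    # Web Mercator tile rows start at 0 in the north and grow southward
    return (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * 2 ** zoom


def tile_x_to_lon(x, zoom):
    return x / 2 ** zoom * 360.0 - 180.0


def tile_y_to_lat(y, zoom):
    return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / 2 ** zoom))))


def snap_extent(extent, zoom=8):
    """Grow a (minx, miny, maxx, maxy) extent outward to whole Web Mercator tiles."""
    minx, miny, maxx, maxy = extent
    x0 = math.floor(lon_to_tile_x(minx, zoom))
    x1 = math.ceil(lon_to_tile_x(maxx, zoom))
    y0 = math.floor(lat_to_tile_y(maxy, zoom))  # northern edge has the smaller row
    y1 = math.ceil(lat_to_tile_y(miny, zoom))   # southern edge has the larger row
    return (tile_x_to_lon(x0, zoom), tile_y_to_lat(y1, zoom),
            tile_x_to_lon(x1, zoom), tile_y_to_lat(y0, zoom))


snapped = snap_extent((22.0, -6.0, 24.0, -4.0))
buffer_deg = 0.2
buffered = (snapped[0] - buffer_deg, snapped[1] - buffer_deg,
            snapped[2] + buffer_deg, snapped[3] + buffer_deg)
print(snapped)
print(buffered)
```

Snapping to tile boundaries means the downloaded data fully covers every tile that touches the area of interest.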






In [None]:
```python
# Check what files were created during download
print("=== CHECKING DOWNLOADED FILES ===")

overture_files = []
search_dirs = [CONFIG["paths"]["data_dir"], CONFIG["paths"]["overture_data_dir"]]

for data_dir in search_dirs:
    if data_dir.exists():
        for pattern in CONFIG["download"]["output_formats"]:
            files = list(data_dir.glob(pattern))
            overture_files.extend(files)

print(f"Found {len(overture_files)} downloaded files:")
for file in sorted(overture_files):
    file_size = file.stat().st_size / 1024 / 1024  # Size in MB
    print(f"  {file.name} ({file_size:.1f} MB)")

# Display file statistics
if overture_files:
    total_size_mb = sum(f.stat().st_size for f in overture_files) / 1024 / 1024
    print(f"\nTotal size: {total_size_mb:.1f} MB")
    print(f"Search directories: {[str(d) for d in search_dirs]}")
else:
    print("No files found. Check download results above.")
    print(f"Searched in: {[str(d) for d in search_dirs]}")
```

## 3. Convert Custom Spatial Data for Tippecanoe

Use the `convertCustomData.py` module to convert various geospatial formats to newline-delimited GeoJSON (GeoJSONSeq) files. In this one-feature-per-line format, Tippecanoe can read the input in parallel.

### Supported Input Formats
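- Shapefile (.shp)
- GeoPackage (.gpkg)
- FileGDB (.gdb)
- SQLite/SpatiaLite (.sqlite, .db)
- GeoJSON (.geojson, .json)
- PostGIS (connection string)
- CSV with geometry columns

The conversion cell in this section only discovers files that match `CONFIG["conversion"]["input_patterns"]`. It therefore does not pick up PostGIS or CSV sources automatically.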
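For comparison, GDAL's `ogr2ogr` command-line tool (listed under Prerequisites) can perform the same conversion and reprojection. This is a sketch of an alternative route, not a description of how `convertCustomData.py` works internally. The layer-selection argument is an assumption for multi-layer sources such as GeoPackage or FileGDB.

```python
import subprocess
from pathlib import Path


def ogr2ogr_command(src, dst, t_srs="EPSG:4326", layer=None):
    """Build an ogr2ogr argv that writes newline-delimited GeoJSON in the target CRS."""
    # ogr2ogr argument order: [options] dst_datasource src_datasource [layer ...]
    cmd = ["ogr2ogr", "-f", "GeoJSONSeq", "-t_srs", t_srs, str(dst), str(src)]
    if layer is not None:
        cmd.append(layer)  # convert only one layer of a multi-layer source
    return cmd


def convert_with_ogr2ogr(src, dst, **kwargs):
    """Run ogr2ogr, replacing any existing output (like CONFIG overwrite=True)."""
    dst = Path(dst)
    if dst.exists():
        dst.unlink()
    subprocess.run(ogr2ogr_command(src, dst, **kwargs), check=True)
    return dst
```

For example, `convert_with_ogr2ogr("GRID3_COD_Settlement_Extents_v3_1.gpkg", "settlements.geojsonseq")` would reproject to EPSG:4326 while converting.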

In [7]:
```python
# Look for custom data files to convert
print("=== STEP 3: CONVERTING CUSTOM SPATIAL DATA ===")

custom_input_dir = CONFIG["paths"]["custom_data_dir"]
custom_files = []

# Search for various spatial data formats using CONFIG patterns
for pattern in CONFIG["conversion"]["input_patterns"]:
    custom_files.extend(custom_input_dir.glob(pattern))

print(f"Found {len(custom_files)} custom data files to convert:")
print(f"Search directory: {custom_input_dir}")
for file in custom_files:
    print(f"  {file.name}")

# Convert custom data files (if any exist)
converted_files = []

for input_file in custom_files:
    output_file = CONFIG["paths"]["output_dir"] / f"{input_file.stem}{CONFIG['conversion']['output_suffix']}"

    print(f"Converting {input_file.name}...")

    try:
        # Convert using the modular function with CONFIG settings
        processed, skipped, output_path = convert_file(
            input_path=str(input_file),
            output_path=str(output_file),
            reproject=CONFIG["conversion"]["reproject_crs"],
            verbose=CONFIG["conversion"]["verbose"]
        )

        converted_files.append(output_file)
        print(f"✓ Converted: {processed} features, {skipped} skipped")
        print(f"  Output: {output_file.name}")

    except Exception as e:
        print(f"✗ Error converting {input_file.name}: {e}")

if converted_files:
    print(f"\n✓ Successfully converted {len(converted_files)} files")
    print(f"  Output directory: {CONFIG['paths']['output_dir']}")
else:
    print(f"\nNo custom files to convert. Add data files to: {custom_input_dir}")
    print(f"Supported formats: {', '.join(CONFIG['conversion']['input_patterns'])}")
```

```text
=== STEP 3: CONVERTING CUSTOM SPATIAL DATA ===
Found 5 custom data files to convert:
Search directory: /Users/matthewheaton/GitHub/basemap/processing/data/raw/grid3
  GRID3_COD_Settlement_Extents_v3_1.gpkg
  GRID3_COD_health_zones_v5_0.geojson
  GRID3_COD_health_facilities_v5_0.geojson
  GRID3_COD_health_areas_v5_0.geojson
  GRID3_COD_settlement_names_v5_0.geojson
Converting GRID3_COD_Settlement_Extents_v3_1.gpkg...
Processing 572537 features
Converting: 100%|██████████| 572537/572537 [02:45<00:00, 3456.07features/s]

Conversion complete: 572537 features processed, 0 features skipped
Output written to: /Users/matthewheaton/GitHub/basemap/processing/data/processed/GRID3_COD_Settlement_Extents_v3_1.geojsonseq
✓ Converted: 572537 features, 0 skipped
  Output: GRID3_COD_Settlement_Extents_v3_1.geojsonseq
Converting GRID3_COD_health_zones_v5_0.geojson...
Processing 329 features
Converting: 100%|██████████| 329/329 [00:06<00:00, 52.44features/s]

Conversion complete: 329 features processed, 0 features skipped
Output written to: /Users/matthewheaton/GitHub/basemap/processing/data/processed/GRID3_COD_health_zones_v5_0.geojsonseq
✓ Converted: 329 features, 0 skipped
  Output: GRID3_COD_health_zones_v5_0.geojsonseq
Converting GRID3_COD_health_facilities_v5_0.geojson...
Processing 27213 features
Converting: 100%|██████████| 27213/27213 [00:02<00:00, 11678.06features/s]

Conversion complete: 27213 features processed, 0 features skipped
Output written to: /Users/matthewheaton/GitHub/basemap/processing/data/processed/GRID3_COD_health_facilities_v5_0.geojsonseq
✓ Converted: 27213 features, 0 skipped
  Output: GRID3_COD_health_facilities_v5_0.geojsonseq
Converting GRID3_COD_health_areas_v5_0.geojson...
Processing 5978 features
Converting: 100%|██████████| 5978/5978 [00:23<00:00, 249.66features/s]

Conversion complete: 5978 features processed, 0 features skipped
Output written to: /Users/matthewheaton/GitHub/basemap/processing/data/processed/GRID3_COD_health_areas_v5_0.geojsonseq
✓ Converted: 5978 features, 0 skipped
  Output: GRID3_COD_health_areas_v5_0.geojsonseq
Converting GRID3_COD_settlement_names_v5_0.geojson...
Processing 85680 features
Converting: 100%|██████████| 85680/85680 [00:06<00:00, 14138.25features/s]

Conversion complete: 85680 features processed, 0 features skipped
Output written to: /Users/matthewheaton/GitHub/basemap/processing/data/processed/GRID3_COD_settlement_names_v5_0.geojsonseq
✓ Converted: 85680 features, 0 skipped
  Output: GRID3_COD_settlement_names_v5_0.geojsonseq

✓ Successfully converted 5 files
  Output directory: /Users/matthewheaton/GitHub/basemap/processing/data/processed
```





## 4. Process GeoJSON/GeoJSONSeq to PMTiles

Use the `runCreateTiles.py` module to convert GeoJSON and GeoJSONSeq files to PMTiles with Tippecanoe, using settings chosen per layer and geometry type.

### Automatic Optimization Features
- **Geometry Detection**: Detects whether a file contains Point, LineString, or Polygon features, or a mix (reported as `Mixed` in the log below)
- **Layer-Specific Settings**: Separate settings for water, roads, places, land use, and other themes
- **Parallel Processing**: Processes several files concurrently when `parallel=True`
- **Size Control**: Simplification and feature-dropping settings that keep tile sizes manageable
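The features above can be sketched as follows: sample the geometry types in a file, then pick Tippecanoe flags for that type. The presets here are hypothetical examples built from real Tippecanoe options (`-zg`, `--drop-densest-as-needed`, `--coalesce-densest-as-needed`, `--extend-zooms-if-still-dropping`, `-P`). They are not the settings `runCreateTiles.py` actually uses. The sketch assumes GeoJSONSeq input, since `-P` (`--read-parallel`) only works when each feature sits on its own line.

```python
import json
from pathlib import Path

# Hypothetical presets per geometry type; runCreateTiles.py may use different flags.
PRESETS = {
    "Point": ["-zg", "--drop-densest-as-needed"],
    "LineString": ["-zg", "--drop-densest-as-needed", "--extend-zooms-if-still-dropping"],
    "Polygon": ["-zg", "--coalesce-densest-as-needed", "--extend-zooms-if-still-dropping"],
    "Mixed": ["-zg", "--drop-densest-as-needed"],
}

# Collapse Multi* types onto their single-part equivalents
BASE_TYPES = {
    "Point": "Point", "MultiPoint": "Point",
    "LineString": "LineString", "MultiLineString": "LineString",
    "Polygon": "Polygon", "MultiPolygon": "Polygon",
}


def detect_geometry_type(path, sample_size=1000):
    """Classify a GeoJSONSeq file as Point, LineString, Polygon, or Mixed from its first lines."""
    seen = set()
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= sample_size:
                break
            line = line.strip().strip("\x1e")  # tolerate RFC 8142 record separators
            if not line:
                continue
            geometry = json.loads(line).get("geometry") or {}
            base = BASE_TYPES.get(geometry.get("type"))
            if base:
                seen.add(base)
    # Empty samples, GeometryCollections only, or several types all map to "Mixed"
    return seen.pop() if len(seen) == 1 else "Mixed"


def build_tippecanoe_cmd(input_path, output_dir):
    """Return (geometry_type, argv) for tiling one GeoJSONSeq file into a .pmtiles archive."""
    src = Path(input_path)
    geom = detect_geometry_type(src)
    out = Path(output_dir) / f"{src.stem}.pmtiles"
    cmd = ["tippecanoe", "-o", str(out), "-l", src.stem, "--force", *PRESETS[geom]]
    cmd += ["-P", str(src)]  # parallel read of line-delimited input
    return geom, cmd
```

You can pass the returned argv to `subprocess.run(cmd, check=True)`. Sampling only the first lines keeps detection fast on large files such as `buildings.geojsonseq`. The trade-off is that it can misclassify a file whose geometry types change later in the file.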

In [16]:
```python
# Step 4: Process all GeoJSON/GeoJSONSeq files to PMTiles
print("=== STEP 4: PROCESSING TO PMTILES ===")

# Tile every GeoJSON/GeoJSONSeq file found in CONFIG["tiling"]["input_dirs"]
tiling_results = process_to_tiles(
    extent=CONFIG["extent"]["coordinates"],
    input_dirs=[str(d) for d in CONFIG["tiling"]["input_dirs"]],  # Convert Path objects to strings
    filter_pattern=CONFIG["tiling"]["filter_pattern"],  # Pass filter pattern from CONFIG
    output_dir=str(CONFIG["tiling"]["output_dir"]),  # Use explicit output directory from CONFIG
    parallel=CONFIG["tiling"]["parallel"],
    verbose=CONFIG["tiling"]["verbose"]
)

print(f"Tiling completed: {tiling_results['success']}")
print(f"Files processed: {len(tiling_results['processed_files'])}/{tiling_results['total_files']}")

if tiling_results["errors"]:
    print(f"Errors encountered: {len(tiling_results['errors'])}")
    for error in tiling_results["errors"]:
        print(f"  - {error}")

# Display generated PMTiles files
if tiling_results["processed_files"]:
    print(f"\n✓ Successfully generated {len(tiling_results['processed_files'])} PMTiles:")

    # Lists every .pmtiles in the tile directory, including any left over from earlier runs
    pmtiles_files = list(CONFIG["paths"]["tile_dir"].glob("*.pmtiles"))

    total_size_mb = 0
    for pmtile in sorted(pmtiles_files):
        size_mb = pmtile.stat().st_size / 1024 / 1024
        total_size_mb += size_mb
        print(f"  {pmtile.name} ({size_mb:.1f} MB)")

    print(f"\nTotal PMTiles size: {total_size_mb:.1f} MB")
    print(f"Files location: {CONFIG['paths']['tile_dir']}")

else:
    print("\nNo PMTiles files were generated. Check the errors above.")
    print(f"Make sure you have GeoJSON/GeoJSONSeq files in: {[str(d) for d in CONFIG['tiling']['input_dirs']]}")
```

```text
=== STEP 4: PROCESSING TO PMTILES ===
=== PROCESSING TO TILES ===
Found 14 files to process:
  GRID3_COD_health_zones_v5_0.geojson
  GRID3_COD_health_facilities_v5_0.geojson
  GRID3_COD_health_areas_v5_0.geojson
  GRID3_COD_settlement_names_v5_0.geojson
  placenames.geojson
  places.geojson
  land_use.geojsonseq
  land_residential.geojsonseq
  land_cover.geojsonseq
  water.geojsonseq
  land.geojsonseq
  buildings.geojsonseq
  infrastructure.geojsonseq
  roads.geojsonseq

  Detected geometry type: Point for GRID3_COD_health_facilities_v5_0.geojson (0.378s)
✓ GRID3_COD_health_facilities_v5_0.geojson -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/GRID3_COD_health_facilities_v5_0.pmtiles
✓ placenames.geojson -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/placenames.pmtiles
✓ places.geojson -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/places.pmtiles
✓ land_use.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/land_use.pmtiles
✓ land_residential.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/land_residential.pmtiles
✓ GRID3_COD_settlement_names_v5_0.geojson -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/GRID3_COD_settlement_names_v5_0.pmtiles
  Detected geometry type: Mixed for GRID3_COD_health_zones_v5_0.geojson (1.544s)
✓ water.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/water.pmtiles
✓ land.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/land.pmtiles
✓ GRID3_COD_health_zones_v5_0.geojson -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/GRID3_COD_health_zones_v5_0.pmtiles
✓ infrastructure.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/infrastructure.pmtiles
✓ roads.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/roads.pmtiles
✓ land_cover.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/land_cover.pmtiles
  Detected geometry type: Polygon for GRID3_COD_health_areas_v5_0.geojson (5.849s)
✓ GRID3_COD_health_areas_v5_0.geojson -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/GRID3_COD_health_areas_v5_0.pmtiles
Processing files: 100%|██████████| 14/14 [00:57<00:00,  4.14s/file]
✓ buildings.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/data/tiles/buildings.pmtiles

=== TILE PROCESSING COMPLETE ===
Processed: 14/14 files
Tiling completed: True
Files processed: 14/14

✓ Successfully generated 14 PMTiles:
  GRID3_COD_health_areas_v5_0.pmtiles (8.7 MB)
  GRID3_COD_health_facilities_v5_0.pmtiles (1.9 MB)
  GRID3_COD_health_zones_v5_0.pmtiles (2.6 MB)
  GRID3_COD_settlement_names_v5_0.pmtiles (2.1 MB)
  buildings.pmtiles (34.8 MB)
  infrastructure.pmtiles (0.1 MB)
  land.pmtiles (0.3 MB)
  land_cover.pmtiles (29.4 MB)
  land_residential.pmtiles (1.3 MB)
  land_use.pmtiles (0.0 MB)
  placenames.pmtiles (0.3 MB)
  places.pmtiles (0.2 MB)
  roads.pmtiles (6.5 MB)
  water.pmtiles (2.0 MB)

Total PMTiles size: 90.2 MB
Files location: /Users/matthewheaton/GitHub/basemap/processing/data/tiles
```





## 5. Create TileJSON Metadata

Generate TileJSON metadata for the PMTiles outputs so that web mapping libraries such as MapLibre GL JS can discover each layer, its bounds, and its zoom range.

### TileJSON Features
- **Bounds** taken from the `extent` argument (the output below matches `CONFIG["extent"]["coordinates"]`) and a **zoom range** for the tileset
- **Vector layer definitions** for each data layer
- **MapLibre GL JS compatibility** for web integration
- **PMTiles URL references** for tile serving
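For reference, the sketch below shows the shape of a minimal TileJSON 3.0.0 document for a vector tileset. The tile URL template and the center zoom are hypothetical. `create_tilejson` may fill in more fields, such as per-layer attribute `fields` read from the tiles.

```python
import json


def make_tilejson(layer_ids, bounds, minzoom, maxzoom, tiles_url, center_zoom=8):
    """Assemble a minimal TileJSON 3.0.0 document for a vector tileset."""
    minx, miny, maxx, maxy = bounds
    return {
        "tilejson": "3.0.0",
        "tiles": [tiles_url],  # URL template(s) with {z}/{x}/{y} placeholders
        "bounds": [minx, miny, maxx, maxy],
        # center is [lon, lat, zoom]; here the bbox midpoint at an arbitrary zoom
        "center": [(minx + maxx) / 2, (miny + maxy) / 2, center_zoom],
        "minzoom": minzoom,
        "maxzoom": maxzoom,
        # TileJSON 3.0.0 requires vector_layers; "fields" maps attribute names to descriptions
        "vector_layers": [{"id": layer_id, "fields": {}} for layer_id in layer_ids],
    }


doc = make_tilejson(
    ["roads", "water"],
    (22.0, -6.0, 24.0, -4.0),
    0,
    16,
    "https://tiles.example.com/basemap/{z}/{x}/{y}.mvt",  # hypothetical tile server
)
print(json.dumps(doc, indent=2))
```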

In [17]:
```python
# Step 5: Create TileJSON metadata for MapLibre integration
print("=== STEP 5: CREATING TILEJSON METADATA ===")

# Check if PMTiles files exist in the configured tile directory
pmtiles_files = list(CONFIG["paths"]["tile_dir"].glob("*.pmtiles"))

if pmtiles_files:
    print(f"Found {len(pmtiles_files)} PMTiles files, creating TileJSON...")

    try:
        tilejson = create_tilejson(
            tile_dir=str(CONFIG["paths"]["tile_dir"]),  # Explicitly pass tile directory
            extent=CONFIG["extent"]["coordinates"],  # Pass extent from CONFIG
            output_file=str(CONFIG["paths"]["tile_dir"] / "tilejson.json")  # Explicitly pass output file path
        )

        print("✓ TileJSON created successfully")
        print(f"  Bounds: {tilejson['bounds']}")
        print(f"  Zoom range: {tilejson['minzoom']} - {tilejson['maxzoom']}")
        print(f"  Vector layers: {len(tilejson['vector_layers'])}")
        print(f"  Output file: {CONFIG['paths']['tile_dir'] / 'tilejson.json'}")

        # Show a summary of all output files
        print("\nComplete output summary:")
        total_size_mb = 0
        for pmtile in sorted(pmtiles_files):
            size_mb = pmtile.stat().st_size / 1024 / 1024
            total_size_mb += size_mb
            print(f"  {pmtile.name} ({size_mb:.1f} MB)")

        print("  tilejson.json")
        print(f"\nTotal PMTiles size: {total_size_mb:.1f} MB")
        print(f"All files location: {CONFIG['paths']['tile_dir']}")

    except Exception as e:
        print(f"✗ TileJSON creation failed: {e}")

else:
    print("No PMTiles files found in output directory.")
    print(f"Expected location: {CONFIG['paths']['tile_dir']}")
    print("Run Step 4 first to generate PMTiles files.")
```

=== STEP 5: CREATING TILEJSON METADATA ===
Found 14 PMTiles files, creating TileJSON...
TileJSON created: /Users/matthewheaton/GitHub/basemap/processing/data/tiles/tilejson.json
Found 14 PMTiles files
✓ TileJSON created successfully
  Bounds: [22.0, -6.0, 24.0, -4.0]
  Zoom range: 0 - 16
  Vector layers: 14
  Output file: /Users/matthewheaton/GitHub/basemap/processing/data/tiles/tilejson.json

Complete output summary:
  GRID3_COD_health_areas_v5_0.pmtiles (8.7 MB)
  GRID3_COD_health_facilities_v5_0.pmtiles (1.9 MB)
  GRID3_COD_health_zones_v5_0.pmtiles (2.6 MB)
  GRID3_COD_settlement_names_v5_0.pmtiles (2.1 MB)
  buildings.pmtiles (34.8 MB)
  infrastructure.pmtiles (0.1 MB)
  land.pmtiles (0.3 MB)
  land_cover.pmtiles (29.4 MB)
  land_residential.pmtiles (1.3 MB)
  land_use.pmtiles (0.0 MB)
  placenames.pmtiles (0.3 MB)
  places.pmtiles (0.2 MB)
  roads.pmtiles (6.5 MB)
  water.pmtiles (2.0 MB)
  tilejson.json

Total PMTiles size: 90.2 MB
All files location: /Users/matthewheaton/GitHub
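For reference, TileJSON 3.0.0 requires `tilejson`, `tiles`, and, for vector tiles, a `vector_layers` array whose entries each carry an `id` and a `fields` map. The sketch below assembles such a document from per-layer bounds and zoom ranges, merging bounds by union and zoom levels by min/max. The function `build_tilejson`, the shape of its `layers` argument, and the single `tiles_url` template are hypothetical simplifications; the document the project's `create_tilejson` actually writes for 14 separate archives may be structured differently. It also assumes no layer crosses the antimeridian.

```python
import json


def build_tilejson(layers: dict, tiles_url: str, name: str = "basemap") -> dict:
    """Build a minimal TileJSON 3.0.0 document (sketch).

    `layers` maps a layer id to {"bounds": [w, s, e, n], "minzoom": int,
    "maxzoom": int, "fields": {attribute: description}} (hypothetical shape).
    """
    if not layers:
        raise ValueError("no layers supplied")
    all_bounds = [info["bounds"] for info in layers.values()]
    # Union of all layer bounding boxes: [west, south, east, north]
    bounds = [
        min(b[0] for b in all_bounds),
        min(b[1] for b in all_bounds),
        max(b[2] for b in all_bounds),
        max(b[3] for b in all_bounds),
    ]
    minzoom = min(info["minzoom"] for info in layers.values())
    maxzoom = max(info["maxzoom"] for info in layers.values())
    return {
        "tilejson": "3.0.0",
        "name": name,
        "tiles": [tiles_url],
        "bounds": bounds,
        # Center is [lon, lat, zoom]; here the bbox midpoint at the minimum zoom
        "center": [(bounds[0] + bounds[2]) / 2, (bounds[1] + bounds[3]) / 2, minzoom],
        "minzoom": minzoom,
        "maxzoom": maxzoom,
        "vector_layers": [
            {"id": layer_id, "fields": info.get("fields", {}),
             "minzoom": info["minzoom"], "maxzoom": info["maxzoom"]}
            for layer_id, info in sorted(layers.items())
        ],
    }


if __name__ == "__main__":
    example = build_tilejson(
        {"roads": {"bounds": [22.0, -6.0, 23.0, -5.0], "minzoom": 4, "maxzoom": 14},
         "water": {"bounds": [22.5, -5.5, 24.0, -4.0], "minzoom": 0, "maxzoom": 16}},
        tiles_url="https://example.com/tiles/{z}/{x}/{y}.pbf",  # hypothetical URL
    )
    print(json.dumps(example, indent=2))
```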

## 6. Validate and Test Individual Steps

Test each processing step individually and validate the generated outputs.
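A lightweight output check could confirm that each `.pmtiles` file begins with the `PMTiles` magic bytes and version 3, and that `tilejson.json` parses as a JSON object containing the keys TileJSON 3.0.0 requires. The sketch below assumes the flat layout used in this notebook (all archives and `tilejson.json` in one tile directory); `validate_tile_dir` is a hypothetical helper, not part of the processing scripts.

```python
import json
from pathlib import Path

REQUIRED_TILEJSON_KEYS = {"tilejson", "tiles", "vector_layers"}


def validate_tile_dir(tile_dir) -> list:
    """Return a list of problems found in tile_dir (an empty list means OK)."""
    tile_dir = Path(tile_dir)
    problems = []
    archives = sorted(tile_dir.glob("*.pmtiles"))
    if not archives:
        problems.append(f"no .pmtiles files in {tile_dir}")
    for archive in archives:
        with open(archive, "rb") as f:
            head = f.read(8)
        # PMTiles v3 archives begin with b"PMTiles" followed by a version byte of 3
        if len(head) < 8 or head[:7] != b"PMTiles" or head[7] != 3:
            problems.append(f"{archive.name}: missing PMTiles v3 header")
    tilejson_path = tile_dir / "tilejson.json"
    if not tilejson_path.exists():
        problems.append("tilejson.json not found")
        return problems
    try:
        doc = json.loads(tilejson_path.read_text())
    except json.JSONDecodeError as e:
        problems.append(f"tilejson.json is not valid JSON: {e}")
        return problems
    if not isinstance(doc, dict):
        problems.append("tilejson.json is not a JSON object")
    else:
        missing = REQUIRED_TILEJSON_KEYS.difference(doc)
        if missing:
            problems.append(f"tilejson.json missing keys: {sorted(missing)}")
    return problems
```

In the notebook this would be called as `validate_tile_dir(CONFIG['paths']['tile_dir'])`, printing each returned problem.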

In [None]:
# Individual Step Testing and Validation

print("INDIVIDUAL STEP TESTING")
print("=" * 50)

print("\n1. Test downloadOverture.py standalone:")
print("python processing/downloadOverture.py --extent='23.4,-6.2,23.8,-5.8' --buffer=0.1")

print("\n2. Test convertCustomData.py standalone:")
print("python processing/convertCustomData.py input.shp output.geojsonseq --reproject=EPSG:4326")

print("\n3. Test runCreateTiles.py standalone:")
print("python processing/runCreateTiles.py --extent='23.4,-6.2,23.8,-5.8' --create-tilejson")

print("\n4. Test individual steps in this notebook:")
print("   - Step 1: Download section (cell 6)")
print("   - Step 2: Check downloaded files (cell 7)")
print("   - Step 3: Convert custom data (cell 9)")
print("   - Step 4: Process to PMTiles (cell 11)")
print("   - Step 5: Create TileJSON (cell 13)")

print("\n5. Validate outputs using CONFIG paths:")
print(f"   - Check {CONFIG['paths']['data_dir']} for GeoJSON files")
print(f"   - Check {CONFIG['paths']['tile_dir']} for PMTiles files")
print(f"   - Verify TileJSON metadata file")

# Configuration validation using centralized CONFIG
print("\nCURRENT CONFIGURATION VALIDATION")
print("=" * 50)
print(f"Extent: {CONFIG['extent']['coordinates']}")
print(f"Buffer: {CONFIG['extent']['buffer_degrees']} degrees")
print(f"Tile output directory: {CONFIG['paths']['tile_dir']}")
print(f"Custom data directory: {CONFIG['paths']['custom_data_dir']}")
print(f"Input directories for tiling: {[str(d) for d in CONFIG['tiling']['input_dirs']]}")

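# Area calculation using CONFIG (rough estimate: treats 1 degree as 111 km in both
# directions, ignoring the cos(latitude) shrinkage of longitude degrees, so it
# overstates area away from the equator)
extent = CONFIG['extent']['coordinates']
area = (extent[2] - extent[0]) * (extent[3] - extent[1])
print(f"Processing area: {area:.2f} degree² (~{area * 111**2:.0f} km²)")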

# Check directory status (count regular files only, not subdirectories)
print("\nDIRECTORY STATUS")
print("=" * 30)
for path_name, path_obj in CONFIG['paths'].items():
    if path_name.endswith('_dir'):
        status = "exists" if path_obj.exists() else "missing"
        file_count = sum(1 for p in path_obj.iterdir() if p.is_file()) if path_obj.is_dir() else 0
        print(f"{path_name}: {status} ({file_count} files)")

print("\nPERFORMANCE OPTIMIZATION TIPS")
print("=" * 50)

print(f"\n1. For large areas (current: {area:.2f} degree²):")
print(f"   - Current buffer: {CONFIG['extent']['buffer_degrees']} degrees")
print(f"   - Parallel processing: {CONFIG['tiling']['parallel']}")
print("   - Consider smaller chunks if memory issues occur")

print("\n2. File management:")
print(f"   - Monitor {CONFIG['paths']['data_dir']} size during processing")
print("   - Clean intermediate files between steps if needed")
print("   - Use filter patterns to process specific layers only")

print("\n3. Output optimization:")
print(f"   - PMTiles output: {CONFIG['paths']['tile_dir']}")
print(f"   - Public tiles: {CONFIG['paths']['public_tiles_dir']}")
print("   - Copy final tiles to public directory for web serving")
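The area estimate in the validation cell multiplies square degrees by 111², which treats a degree of longitude as 111 km everywhere; in reality a longitude degree shrinks with cos(latitude). On a sphere of radius R, a latitude/longitude box has the closed-form area R² · Δλ · (sin φ_max - sin φ_min), with Δλ in radians. The sketch below compares the two, assuming a spherical Earth with mean radius 6371.0088 km (an ellipsoidal calculation would differ slightly). Near the equator, as with the test extent `23.4,-6.2,23.8,-5.8`, the estimates agree to well within 1%; at high latitudes the flat estimate overstates the area considerably.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def bbox_area_km2(extent) -> float:
    """Area of a [min_lon, min_lat, max_lon, max_lat] box on a sphere, in km²."""
    min_lon, min_lat, max_lon, max_lat = extent
    d_lon = math.radians(max_lon - min_lon)
    # Integrating cos(lat) over the latitude range gives sin(max_lat) - sin(min_lat)
    d_sin_lat = math.sin(math.radians(max_lat)) - math.sin(math.radians(min_lat))
    return EARTH_RADIUS_KM ** 2 * d_lon * d_sin_lat


def flat_area_km2(extent) -> float:
    """The notebook's approximation: square degrees times 111 km squared."""
    min_lon, min_lat, max_lon, max_lat = extent
    return (max_lon - min_lon) * (max_lat - min_lat) * 111 ** 2


if __name__ == "__main__":
    for ext in ([23.4, -6.2, 23.8, -5.8], [23.4, 59.8, 23.8, 60.2]):
        print(ext, f"spherical={bbox_area_km2(ext):.0f} km², flat={flat_area_km2(ext):.0f} km²")
```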

# Modular Processing Summary

This notebook provides a complete, step-by-step approach for geospatial data processing with the following capabilities:

## Core Steps
1. **Download Overture Maps data** with spatial filtering using DuckDB
2. **Check and validate** downloaded files 
3. **Convert custom spatial data** to GeoJSON format
4. **Generate PMTiles** using tippecanoe settings tuned per geometry type and theme
5. **Create TileJSON metadata** for web mapping integration
6. **Validate and test** individual processing steps

## Key Features
- **Modular design** - Each step can be run independently
- **Flexible configuration** - Extent, buffer, paths, and tiling options are centralized in `CONFIG`
- **Interactive development** - Run steps individually for debugging
- **Geometry-aware tiling** - tippecanoe settings are chosen per geometry type
- **Error handling and validation** - Module imports and TileJSON creation catch and report errors; Step 6 checks the configuration and directory status

## Output Files
Each step generates specific outputs that can be directly used:
- **GeoJSON/GeoJSONSeq files** for further processing or analysis
- **PMTiles files** for efficient web mapping
- **TileJSON metadata** for MapLibre GL JS integration
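Step 6 suggests copying the final tiles into the public directory for web serving. A minimal sketch using `shutil.copy2`, assuming the flat layout above (PMTiles archives plus `tilejson.json` in one directory); `publish_tiles` is a hypothetical helper, and in this notebook its arguments would be `CONFIG['paths']['tile_dir']` and `CONFIG['paths']['public_tiles_dir']`.

```python
import shutil
from pathlib import Path


def publish_tiles(tile_dir, public_dir, patterns=("*.pmtiles", "tilejson.json")) -> list:
    """Copy tile outputs into public_dir, skipping files that look unchanged."""
    tile_dir, public_dir = Path(tile_dir), Path(public_dir)
    public_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in sorted(tile_dir.glob(pattern)):
            dst = public_dir / src.name
            # Skip when the destination has the same size and is at least as new
            if dst.exists():
                s, d = src.stat(), dst.stat()
                if s.st_size == d.st_size and d.st_mtime >= s.st_mtime:
                    continue
            # copy2 also copies the modification time, so a re-run skips this file
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```

Comparing size and modification time is a cheap heuristic; a content hash would be stricter but requires reading every archive in full.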

## Usage Patterns
- **Development**: Run steps individually for testing and debugging
- **Production**: Execute all steps in sequence for automated processing
- **Customization**: Modify CONFIG settings and re-run specific steps
- **Integration**: Use generated files with web mapping applications
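For the production pattern of running every step in sequence, a small runner can time each step, stop at the first failure, and optionally run only a chosen subset. This is a hypothetical sketch: the step names are illustrative, and because the arguments of `download_overture_data`, `convert_file`, `process_to_tiles`, and `create_tilejson` are not shown here, each step is passed as a zero-argument callable (for example a `lambda` that calls one of those functions with values from `CONFIG`).

```python
import time
from typing import Callable, Iterable, Optional, Set, Tuple


def run_pipeline(
    steps: Iterable[Tuple[str, Callable[[], object]]],
    only: Optional[Set[str]] = None,
) -> dict:
    """Run (name, fn) steps in order and stop at the first failure.

    Returns {name: (status, seconds)}, where status is "ok", "skipped",
    or "failed: <message>". Steps after a failure are not recorded.
    """
    results = {}
    for name, fn in steps:
        if only is not None and name not in only:
            results[name] = ("skipped", 0.0)
            continue
        start = time.perf_counter()
        try:
            fn()
        except Exception as exc:  # report and stop instead of raising
            results[name] = (f"failed: {exc}", time.perf_counter() - start)
            break
        results[name] = ("ok", time.perf_counter() - start)
    return results
```

Passing `only={"tilejson"}` re-runs a single step after changing `CONFIG`, while omitting `only` runs the full sequence.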