# Geospatial Data Processing Pipeline

This notebook demonstrates a complete pipeline for processing geospatial data from multiple sources into PMTiles format using DuckDB, GDAL/OGR, and tippecanoe.

## Key Features
- **Overture Maps download** via DuckDB with bounding box filtering
- **Multi-format conversion** (Shapefile, GeoPackage, etc.) to GeoJSON
- **Automated PMTiles generation** with optimized tippecanoe settings
- **Modular architecture** for easy customization and debugging
- **Progress tracking** with detailed logging and error handling
- **Geometry-aware processing** with appropriate settings for each data type

## Processing Steps
1. **Download** - Fetch Overture Maps data for specified extent
2. **Convert** - Transform custom spatial data to GeoJSON format
3. **Tile** - Generate PMTiles using tippecanoe with optimized settings

## Prerequisites
- Python with required packages (duckdb, tqdm, pandas); pathlib ships with the standard library
- Tippecanoe installed and available in PATH
- GDAL/OGR for geospatial format conversion
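
Before running the notebook, it can help to confirm that the external command-line tools are discoverable. The following is a minimal sketch using only the standard library. The helper name `find_missing_tools` is hypothetical, and `ogr2ogr` is assumed here as the GDAL/OGR executable to look for.

```python
import shutil


def find_missing_tools(tools=("tippecanoe", "ogr2ogr")):
    """Return the names of command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print(f"Missing from PATH: {', '.join(missing)}")
else:
    print("All external tools found")
```

An empty result means every listed executable resolved on the current PATH; it says nothing about versions.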

In [8]:
# Import the three modular processing scripts
import sys
import os
from pathlib import Path
import json
import time

# Add the processing directory to Python path
processing_dir = Path("./processing")
if str(processing_dir) not in sys.path:
    sys.path.append(str(processing_dir))

# Import our modular processing scripts
try:
    from downloadOverture import download_overture_data
    from convertCustomData import convert_file, convert_to_geojsonseq
    from runCreateTiles import process_to_tiles, create_tilejson
    print("✓ Successfully imported all processing modules")
except ImportError as e:
    print(f"Error importing modules: {e}")
    print("Make sure the processing scripts are in the ./processing directory")
    raise  # Stop here: every later cell depends on these imports

# Import additional libraries for visualization and analysis
import pandas as pd
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

✓ Successfully imported all processing modules


## 1. Project Configuration and Paths

Configure the project directories and processing parameters for the pipeline.

In [9]:
# Configuration - All paths and parameters centralized
from pathlib import Path

# Define all project paths
PROJECT_ROOT = Path(__file__).resolve().parent.parent if '__file__' in globals() else Path.cwd().parent
PROCESSING_DIR = PROJECT_ROOT / "processing"
DATA_DIR = PROCESSING_DIR / "data"
OVERTURE_DATA_DIR = PROJECT_ROOT / "overture" / "data"
CUSTOM_DATA_DIR = PROCESSING_DIR / "input" / "grid3"
TILE_DIR = PROCESSING_DIR / "tiles"
PUBLIC_TILES_DIR = PROJECT_ROOT / "public" / "tiles"

CONFIG = {
    "paths": {
        "project_root": PROJECT_ROOT,
        "processing_dir": PROCESSING_DIR,
        "data_dir": DATA_DIR,
        "overture_data_dir": OVERTURE_DATA_DIR,
        "custom_data_dir": CUSTOM_DATA_DIR,
        "tile_dir": TILE_DIR,
        "public_tiles_dir": PUBLIC_TILES_DIR,
        "template_path": PROCESSING_DIR / "tileQueries.template"
    },
    "extent": {
        "coordinates": (22.0, -6.0, 24.0, -4.0),  # kasai-oriental
        "buffer_degrees": 0.2
    },
    "download": {
        "verbose": True,
        "output_formats": ["*.geojson", "*.geojsonseq"]
    },
    "conversion": {
        "input_patterns": ["*.shp", "*.gpkg", "*.gdb", "*.sqlite", "*.db", "*.geojson", "*.json"],
        "output_suffix": ".geojsonseq",
        "reproject_crs": "EPSG:4326",
        "overwrite": True,
        "verbose": True
    },
    "tiling": {
        "input_dirs": [DATA_DIR, OVERTURE_DATA_DIR],  # Search in both data directories
        "output_dir": TILE_DIR,
        "parallel": True,
        "overwrite": True,
        "verbose": True,
        "create_tilejson": True
    }
}

# Create necessary directories
for path_key, path_value in CONFIG["paths"].items():
    if path_key.endswith("_dir") and path_value:
        path_value.mkdir(parents=True, exist_ok=True)

# Display configuration summary
print("PROJECT CONFIGURATION INITIALIZED")
print("=" * 50)
print(f"Project root: {CONFIG['paths']['project_root']}")
print(f"Processing directory: {CONFIG['paths']['processing_dir']}")
print(f"Data directory: {CONFIG['paths']['data_dir']}")
print(f"Overture data directory: {CONFIG['paths']['overture_data_dir']}")
print(f"Custom data directory: {CONFIG['paths']['custom_data_dir']}")
print(f"Tile output directory: {CONFIG['paths']['tile_dir']}")
print(f"Public tiles directory: {CONFIG['paths']['public_tiles_dir']}")
print()
print(f"Processing extent: {CONFIG['extent']['coordinates']}")
print(f"Buffer degrees: {CONFIG['extent']['buffer_degrees']}")
print(f"Area: {(CONFIG['extent']['coordinates'][2] - CONFIG['extent']['coordinates'][0]) * (CONFIG['extent']['coordinates'][3] - CONFIG['extent']['coordinates'][1]):.2f} degree²")
print()
print("All directories created and configuration loaded")

PROJECT CONFIGURATION INITIALIZED
Project root: /Users/matthewheaton/GitHub/basemap
Processing directory: /Users/matthewheaton/GitHub/basemap/processing
Data directory: /Users/matthewheaton/GitHub/basemap/processing/data
Overture data directory: /Users/matthewheaton/GitHub/basemap/overture/data
Custom data directory: /Users/matthewheaton/GitHub/basemap/processing/input/grid3
Tile output directory: /Users/matthewheaton/GitHub/basemap/processing/tiles
Public tiles directory: /Users/matthewheaton/GitHub/basemap/public/tiles

Processing extent: (22.0, -6.0, 24.0, -4.0)
Buffer degrees: 0.2
Area: 4.00 degree²

All directories created and configuration loaded



## 2. Download Overture Data with DuckDB

Use the `downloadOverture.py` module to fetch geospatial data from Overture Maps. This module uses DuckDB to efficiently query and download data for specific geographic extents.
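
To illustrate the kind of query involved, here is a sketch that only builds a DuckDB SQL string for one theme and type, filtered by bounding box. It is not the contents of `tileQueries.template` or of `downloadOverture.py`. The function name `build_overture_query` and the selected columns are assumptions, as is the expectation that the Overture files expose a `bbox` struct with `xmin`/`xmax`/`ymin`/`ymax` fields and that the DuckDB `spatial` and `httpfs` extensions are loaded before the statement runs.

```python
def build_overture_query(release, theme, feature_type, extent, output_path):
    """Build a DuckDB statement that copies bbox-filtered features to GeoJSONSeq.

    Assumption: the `spatial` and `httpfs` extensions are installed and loaded
    before this statement is executed. Nothing is executed here.
    """
    west, south, east, north = extent
    source = (
        f"s3://overturemaps-us-west-2/release/{release}/"
        f"theme={theme}/type={feature_type}/*"
    )
    # Keep a feature when its bounding box overlaps the requested extent.
    return f"""
COPY (
    SELECT id, geometry
    FROM read_parquet('{source}', hive_partitioning=1)
    WHERE bbox.xmin <= {east} AND bbox.xmax >= {west}
      AND bbox.ymin <= {north} AND bbox.ymax >= {south}
) TO '{output_path}' WITH (FORMAT GDAL, DRIVER 'GeoJSONSeq');
""".strip()


query = build_overture_query(
    "2025-06-25.0", "base", "water", (22.0, -6.0, 24.0, -4.0), "water.geojsonseq"
)
print(query)
```

Filtering on the `bbox` columns rather than on the geometry itself lets the Parquet reader skip row groups outside the extent, which is the usual reason for this pattern.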

In [None]:
# Download Overture Maps data
print("=== STEP 1: DOWNLOADING OVERTURE DATA ===")
download_results = download_overture_data(
    extent=CONFIG["extent"]["coordinates"],
    buffer_degrees=CONFIG["extent"]["buffer_degrees"],
    template_path=str(CONFIG["paths"]["template_path"]),
    verbose=CONFIG["download"]["verbose"]
)

print(f"Download completed: {download_results['success']}")
print(f"Sections processed: {download_results['processed_sections']}")
if download_results["errors"]:
    print(f"Errors encountered: {len(download_results['errors'])}")
    for error in download_results["errors"]:
        print(f"  - {error}")
print()

=== STEP 1: DOWNLOADING OVERTURE DATA ===
=== DOWNLOADING SOURCE DATA ===
Raw extent: (22.0, -6.0, 24.0, -4.0)
Snapped extent: (21.09375, -7.0136679275666305, 25.3125, -2.8113711933311296)
Map extent: 21.09375, -7.0136679275666305 to 25.3125, -2.8113711933311296
Download extent (buffered): 20.89375, -7.213667927566631 to 25.5125, -2.6113711933311294
Buffer: 0.2 degrees (~22.2km)



Overall progress:   0%|          | 0/10 [00:00<?, ?section/s]

Executing Section 1: base/land
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=base/type=land/*
  -> Output: land.geojsonseq


Overall progress:  10%|█         | 1/10 [00:27<04:03, 27.09s/section]

Executing Section 2: base/land_use
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=base/type=land_use/*
  -> Output: land_use.geojsonseq


Overall progress:  20%|██        | 2/10 [00:45<02:57, 22.20s/section]

Executing Section 3: base/land_use
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=base/type=land_use/*
  -> Output: land_residential.geojsonseq


Overall progress:  30%|███       | 3/10 [00:48<01:33, 13.37s/section]

Executing Section 4: base/water
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=base/type=water/*
  -> Output: water.geojsonseq


Overall progress:  40%|████      | 4/10 [01:19<02:01, 20.29s/section]

Executing Section 5: transportation/segment
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=transportation/type=segment/*
  -> Output: roads.geojsonseq


Overall progress:  50%|█████     | 5/10 [02:30<03:12, 38.56s/section]

Executing Section 6: buildings/building
  -> Querying: az://overturemapswestus2.blob.core.windows.net/release/2025-06-25.0/theme=buildings/type=building/*
  -> Output: buildings.geojsonseq


Overall progress:  60%|██████    | 6/10 [07:01<07:49, 117.42s/section]

Executing Section 7: admins/locality
  -> Querying: az://overturemapswestus2.blob.core.windows.net/release/2024-04-16-beta.0/theme=admins/type=locality/*
  -> Output: placenames.geojson


Overall progress:  70%|███████   | 7/10 [07:06<04:02, 80.89s/section] 

Executing Section 8: unknown
  -> Querying: s3://overturemaps-us-west-2/release/2025-06-25.0/theme=places/*/*
  -> Output: places.geojson


Overall progress:  80%|████████  | 8/10 [07:16<01:56, 58.19s/section]

Executing Section 9: base/land_cover
  -> Querying: az://overturemapswestus2.blob.core.windows.net/release/2025-06-25.0/theme=base/type=land_cover/*
  -> Output: land_cover.geojsonseq


Overall progress:  90%|█████████ | 9/10 [09:08<01:15, 75.11s/section]

Executing Section 10: base/infrastructure
  -> Querying: az://overturemapswestus2.blob.core.windows.net/release/2025-06-25.0/theme=base/type=infrastructure/*
  -> Output: infrastructure.geojsonseq


Overall progress: 100%|██████████| 10/10 [09:18<00:00, 55.82s/section]

=== SOURCE DATA DOWNLOAD COMPLETE ===

Download completed: True
Sections processed: 10
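
The snapped and buffered extents in the log above can be reproduced with standard Web Mercator tile arithmetic. The sketch below is an inference from the logged numbers, not code taken from `downloadOverture.py`: expanding the raw extent outward to tile edges at zoom 8 happens to match the logged snapped extent, so `zoom=8` is an assumption. The function names are hypothetical, and the kilometre figure uses the rough 111 km per degree approximation.

```python
import math


def snap_extent_to_tiles(extent, zoom=8):
    """Expand (west, south, east, north) outward to enclosing tile edges."""
    west, south, east, north = extent
    n = 2 ** zoom  # number of tiles along each axis at this zoom

    def lon_to_x(lon):
        return (lon + 180.0) / 360.0 * n

    def lat_to_y(lat):
        return (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n

    def x_to_lon(x):
        return x / n * 360.0 - 180.0

    def y_to_lat(y):
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n))))

    # Tile y grows southward, so the north edge maps to the smaller y index.
    x_min, x_max = math.floor(lon_to_x(west)), math.ceil(lon_to_x(east))
    y_min, y_max = math.floor(lat_to_y(north)), math.ceil(lat_to_y(south))
    return (x_to_lon(x_min), y_to_lat(y_max), x_to_lon(x_max), y_to_lat(y_min))


def buffer_extent(extent, buffer_degrees):
    """Grow an extent by the same number of degrees on every side."""
    west, south, east, north = extent
    return (
        west - buffer_degrees,
        south - buffer_degrees,
        east + buffer_degrees,
        north + buffer_degrees,
    )


snapped = snap_extent_to_tiles((22.0, -6.0, 24.0, -4.0), zoom=8)
buffered = buffer_extent(snapped, 0.2)
buffer_km = 0.2 * 111.0  # rough conversion, 1 degree is about 111 km

print("Snapped:", snapped)
print("Buffered:", buffered)
print(f"Buffer: ~{buffer_km:.1f} km")
```

Snapping to tile edges means the downloaded data fully covers every tile that touches the requested extent, and the buffer adds a margin so features near the edge are not clipped.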






In [None]:
# Check what files were created during download
print("=== CHECKING DOWNLOADED FILES ===")

overture_files = []
search_dirs = [CONFIG["paths"]["data_dir"], CONFIG["paths"]["overture_data_dir"]]

for data_dir in search_dirs:
    if data_dir.exists():
        for pattern in CONFIG["download"]["output_formats"]:
            files = list(data_dir.glob(pattern))
            overture_files.extend(files)

print(f"Found {len(overture_files)} downloaded files:")
for file in sorted(overture_files):
    file_size = file.stat().st_size / 1024 / 1024  # Size in MB
    print(f"  {file.name} ({file_size:.1f} MB)")

# Display file statistics
if overture_files:
    total_size_mb = sum(f.stat().st_size for f in overture_files) / 1024 / 1024
    print(f"\nTotal size: {total_size_mb:.1f} MB")
    print(f"Search directories: {[str(d) for d in search_dirs]}")
else:
    print("No files found. Check download results above.")
    print(f"Searched in: {[str(d) for d in search_dirs]}")

Checking downloaded files...

Found 10 downloaded files:
  GRID3_COD_health_areas_v5_0.geojson (281.7 MB)
  GRID3_COD_health_facilities_v5_0.geojson (19.4 MB)
  GRID3_COD_health_zones_v5_0.geojson (77.2 MB)
  GRID3_COD_settlement_extents_v3_1.geojsonseq (2401.8 MB)
  GRID3_COD_settlement_names_v5_0.geojson (50.4 MB)
  infrastructure.geojsonseq (0.7 MB)
  land.geojsonseq (7.1 MB)
  land_cover.geojsonseq (356.1 MB)
  land_residential.geojsonseq (7.7 MB)
  land_use.geojsonseq (1.6 MB)

Total size: 3203.5 MB


## 3. Convert Custom Spatial Data for Tippecanoe

Use the `convertCustomData.py` module to convert various geospatial formats to newline-delimited GeoJSON files suitable for Tippecanoe processing.

### Supported Input Formats
- GeoJSON (.geojson, .json)
- Shapefile (.shp)
- GeoPackage (.gpkg)
- FileGDB (.gdb)
- SQLite/SpatiaLite (.sqlite, .db)
- PostGIS (connection string)
- CSV with geometry columns
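
For the simplest case, a GeoJSON FeatureCollection, the conversion to newline-delimited GeoJSON can be sketched with the standard library alone. This is an illustration of the output format, not the implementation in `convertCustomData.py`. The function name `feature_collection_to_ndjson` is hypothetical, the whole file is loaded into memory, and no reprojection is performed; other formats would need GDAL/OGR.

```python
import json
import tempfile
from pathlib import Path


def feature_collection_to_ndjson(input_path, output_path):
    """Write each feature of a GeoJSON FeatureCollection on its own line."""
    with open(input_path, encoding="utf-8") as src:
        collection = json.load(src)

    written = 0
    with open(output_path, "w", encoding="utf-8") as dst:
        for feature in collection.get("features", []):
            if feature.get("geometry") is None:
                continue  # skip features that carry no geometry
            dst.write(json.dumps(feature, separators=(",", ":")) + "\n")
            written += 1
    return written


# Small demonstration with a throwaway file
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "a"},
         "geometry": {"type": "Point", "coordinates": [23.0, -5.0]}},
        {"type": "Feature", "properties": {"name": "b"}, "geometry": None},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    src_path = Path(tmp) / "sample.geojson"
    dst_path = Path(tmp) / "sample.geojsonseq"
    src_path.write_text(json.dumps(sample), encoding="utf-8")
    count = feature_collection_to_ndjson(src_path, dst_path)
    lines = dst_path.read_text(encoding="utf-8").splitlines()

print(f"Wrote {count} feature(s)")
```

One feature per line lets a tiler read the file as a stream instead of parsing a single large JSON document.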

In [11]:
# Look for custom data files to convert
print("=== STEP 3: CONVERTING CUSTOM SPATIAL DATA ===")

custom_input_dir = CONFIG["paths"]["custom_data_dir"]
custom_files = []

# Search for various spatial data formats using CONFIG patterns
for pattern in CONFIG["conversion"]["input_patterns"]:
    custom_files.extend(custom_input_dir.glob(pattern))

print(f"Found {len(custom_files)} custom data files to convert:")
print(f"Search directory: {custom_input_dir}")
for file in custom_files:
    print(f"  {file.name}")

# Convert custom data files (if any exist)
converted_files = []

for input_file in custom_files[:3]:  # Convert first 3 files as example
    output_file = CONFIG["paths"]["data_dir"] / f"{input_file.stem}{CONFIG['conversion']['output_suffix']}"
    
    print(f"Converting {input_file.name}...")
    
    try:
        # Convert using the modular function with CONFIG settings
        processed, skipped, output_path = convert_file(
            input_path=str(input_file),
            output_path=str(output_file),
            reproject=CONFIG["conversion"]["reproject_crs"],
            verbose=CONFIG["conversion"]["verbose"]
        )
        
        converted_files.append(output_file)
        print(f"✓ Converted: {processed} features, {skipped} skipped")
        print(f"  Output: {output_file.name}")
        
    except Exception as e:
        print(f"✗ Error converting {input_file.name}: {e}")

if converted_files:
    print(f"\n✓ Successfully converted {len(converted_files)} files")
    print(f"  Output directory: {CONFIG['paths']['data_dir']}")
else:
    print(f"\nNo custom files to convert. Add data files to: {custom_input_dir}")
    print(f"Supported formats: {', '.join(CONFIG['conversion']['input_patterns'])}")

=== STEP 3: CONVERTING CUSTOM SPATIAL DATA ===
Found 5 custom data files to convert:
Search directory: /Users/matthewheaton/GitHub/basemap/processing/input/grid3
  GRID3_COD_Settlement_Extents_v3_1.gpkg
  GRID3_COD_health_zones_v5_0.geojson
  GRID3_COD_health_facilities_v5_0.geojson
  GRID3_COD_health_areas_v5_0.geojson
  GRID3_COD_settlement_names_v5_0.geojson
Converting GRID3_COD_Settlement_Extents_v3_1.gpkg...
Processing 572537 features

Converting: 100%|██████████| 572537/572537 [02:54<00:00, 3283.66features/s]


Conversion complete: 572537 features processed, 0 features skipped
Output written to: /Users/matthewheaton/GitHub/basemap/processing/data/GRID3_COD_Settlement_Extents_v3_1.geojsonseq
✓ Converted: 572537 features, 0 skipped
  Output: GRID3_COD_Settlement_Extents_v3_1.geojsonseq
Converting GRID3_COD_health_zones_v5_0.geojson...
Processing 329 features


Converting: 100%|██████████| 329/329 [00:06<00:00, 49.22features/s] 



Conversion complete: 329 features processed, 0 features skipped
Output written to: /Users/matthewheaton/GitHub/basemap/processing/data/GRID3_COD_health_zones_v5_0.geojsonseq
✓ Converted: 329 features, 0 skipped
  Output: GRID3_COD_health_zones_v5_0.geojsonseq
Converting GRID3_COD_health_facilities_v5_0.geojson...
Processing 27213 features


Converting:  41%|████      | 11193/27213 [00:01<00:01, 11596.26features/s]

Batch processed: 10000 features, 11129.7 features/sec


Converting:  79%|███████▊  | 21415/27213 [00:01<00:00, 10400.10features/s]

Batch processed: 20000 features, 10860.0 features/sec


Converting: 100%|██████████| 27213/27213 [00:02<00:00, 11001.42features/s]

Conversion complete: 27213 features processed, 0 features skipped
Output written to: /Users/matthewheaton/GitHub/basemap/processing/data/GRID3_COD_health_facilities_v5_0.geojsonseq
✓ Converted: 27213 features, 0 skipped
  Output: GRID3_COD_health_facilities_v5_0.geojsonseq

✓ Successfully converted 3 files
  Output directory: /Users/matthewheaton/GitHub/basemap/processing/data





## 4. Process GeoJSON/GeoJSONSeq to PMTiles

Use the `runCreateTiles.py` module to convert GeoJSON and GeoJSONSeq files to PMTiles using optimized Tippecanoe settings.

### Automatic Optimization Features
- **Geometry Detection**: Automatically detects Point, LineString, or Polygon geometries
- **Layer-Specific Settings**: Optimized settings for water, roads, places, land use, etc.
- **Parallel Processing**: Multi-threaded processing for large datasets
- **Quality Optimization**: Smart simplification and feature dropping
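
The geometry detection and per-type settings described above could look roughly like the following. This is a sketch under stated assumptions, not the logic of `runCreateTiles.py`: the function names, the sample size, and the specific flag choices per geometry type are illustrative. The tippecanoe options used (`-o`, `-l`, `-zg`, `-P`, `--force`, `-r1`, `--simplification`, `--drop-densest-as-needed`, `--coalesce-densest-as-needed`) are existing tippecanoe options, but the combination is only an example.

```python
import json
from itertools import islice

# Map each GeoJSON geometry type to a broader family
FAMILIES = {
    "Point": "Point", "MultiPoint": "Point",
    "LineString": "LineString", "MultiLineString": "LineString",
    "Polygon": "Polygon", "MultiPolygon": "Polygon",
}


def detect_geometry_type(lines, sample_size=100):
    """Classify a line-delimited GeoJSON stream from its first lines."""
    seen = set()
    for line in islice(lines, sample_size):
        line = line.strip().lstrip("\x1e")  # tolerate RFC 8142 record separators
        if not line:
            continue
        geometry = json.loads(line).get("geometry") or {}
        family = FAMILIES.get(geometry.get("type"))
        if family:
            seen.add(family)
    if not seen:
        return "Unknown"
    return seen.pop() if len(seen) == 1 else "Mixed"


def build_tippecanoe_command(input_path, output_path, layer, geometry_type):
    """Assemble an illustrative tippecanoe command for one input file."""
    per_type = {
        "Point": ["-r1", "--drop-densest-as-needed"],
        "LineString": ["--drop-densest-as-needed", "--simplification=10"],
        "Polygon": ["--coalesce-densest-as-needed"],
    }
    extra = per_type.get(geometry_type, ["--drop-densest-as-needed"])
    return ["tippecanoe", "-o", output_path, "-l", layer,
            "-zg", "-P", "--force", *extra, input_path]


sample_lines = [
    '{"type":"Feature","properties":{},"geometry":{"type":"Point","coordinates":[23,-5]}}',
    '{"type":"Feature","properties":{},"geometry":{"type":"MultiPoint","coordinates":[[23,-5]]}}',
]
detected = detect_geometry_type(sample_lines)
command = build_tippecanoe_command("places.geojson", "places.pmtiles", "places", detected)
print(detected)
print(" ".join(command))
```

The command is returned as a list so it can be passed to `subprocess.run` without shell quoting; the sketch itself does not invoke tippecanoe.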

In [12]:
# Step 4: Process all GeoJSON/GeoJSONSeq files to PMTiles
print("=== STEP 4: PROCESSING TO PMTILES ===")

# Process all downloaded and converted files to PMTiles using CONFIG settings
tiling_results = process_to_tiles(
    extent=CONFIG["extent"]["coordinates"],
    input_dirs=[str(d) for d in CONFIG["tiling"]["input_dirs"]],  # Convert Path objects to strings
    output_dir=str(CONFIG["paths"]["tile_dir"]), 
    parallel=CONFIG["tiling"]["parallel"],
    verbose=CONFIG["tiling"]["verbose"]
)

print(f"Tiling completed: {tiling_results['success']}")
print(f"Files processed: {len(tiling_results['processed_files'])}/{tiling_results['total_files']}")

if tiling_results["errors"]:
    print(f"Errors encountered: {len(tiling_results['errors'])}")
    for error in tiling_results["errors"]:
        print(f"  - {error}")

# Display generated PMTiles files
if tiling_results["processed_files"]:
    print(f"\n✓ Successfully generated {len(tiling_results['processed_files'])} PMTiles:")
    
    pmtiles_files = list(CONFIG["paths"]["tile_dir"].glob("*.pmtiles"))
    
    total_size_mb = 0
    for pmtile in sorted(pmtiles_files):
        size_mb = pmtile.stat().st_size / 1024 / 1024
        total_size_mb += size_mb
        print(f"  {pmtile.name} ({size_mb:.1f} MB)")
    
    print(f"\nTotal PMTiles size: {total_size_mb:.1f} MB")
    print(f"Files location: {CONFIG['paths']['tile_dir']}")
    
else:
    print("\nNo PMTiles files were generated. Check the errors above.")
    print(f"Make sure you have GeoJSON/GeoJSONSeq files in: {[str(d) for d in CONFIG['tiling']['input_dirs']]}")

=== STEP 4: PROCESSING TO PMTILES ===
=== PROCESSING TO TILES ===
Found 15 files to process:
  GRID3_COD_health_zones_v5_0.geojsonseq
  land_use.geojsonseq
  land_residential.geojsonseq
  land_cover.geojsonseq
  GRID3_COD_Settlement_Extents_v3_1.geojsonseq
  land.geojsonseq
  GRID3_COD_health_facilities_v5_0.geojsonseq
  infrastructure.geojsonseq
  placenames.geojson
  places.geojson
  land_use.geojsonseq
  water.geojsonseq
  land.geojsonseq
  buildings.geojsonseq
  roads.geojsonseq


  Detected geometry type: Mixed for GRID3_COD_health_zones_v5_0.geojsonseq (0.189s)
✓ land_use.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/land_use.pmtiles
✓ land_residential.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/land_residential.pmtiles
✓ GRID3_COD_health_zones_v5_0.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/GRID3_COD_health_zones_v5_0.pmtiles
  Detected geometry type: Point for GRID3_COD_health_facilities_v5_0.geojsonseq (0.022s)
✓ land.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/land.pmtiles
✓ GRID3_COD_health_facilities_v5_0.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/GRID3_COD_health_facilities_v5_0.pmtiles
✓ infrastructure.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/infrastructure.pmtiles
✓ placenames.geojson -> /Users/matthewheaton/GitHub/basemap/processing/tiles/placenames.pmtiles
✓ places.geojson -> /Users/matthewheaton/GitHub/basemap/processing/tiles/places.pmtiles
✓ land_cover.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/land_cover.pmtiles
✓ land_use.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/land_use.pmtiles
  Detected geometry type: Polygon for buildings.geojsonseq (0.890s)
✓ water.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/water.pmtiles
✓ GRID3_COD_Settlement_Extents_v3_1.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/GRID3_COD_Settlement_Extents_v3_1.pmtiles
✓ land.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/land.pmtiles
✓ roads.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/roads.pmtiles
Processing files: 100%|██████████| 15/15 [01:01<00:00,  4.09s/file]
✓ buildings.geojsonseq -> /Users/matthewheaton/GitHub/basemap/processing/tiles/buildings.pmtiles

=== TILE PROCESSING COMPLETE ===
Processed: 15/15 files
Tiling completed: True
Files processed: 15/15

✓ Successfully generated 15 PMTiles:
  GRID3_COD_Settlement_Extents_v3_1.pmtiles (3.4 MB)
  GRID3_COD_health_facilities_v5_0.pmtiles (1.9 MB)
  GRID3_COD_health_zones_v5_0.pmtiles (2.5 MB)
  buildings.pmtiles (13.3 MB)
  infrastructure.pmtiles (0.1 MB)
  land.pmtiles (0.3 MB)
  land_cover.pmtiles (29.4 MB)
  land_residential.pmtiles (1.3 MB)
  land_use.pmtiles (1.3 MB)
  placenames.pmtiles (0.3 MB)
  places.pmtiles (0.2 MB)
  roads.pmtiles (6.5 MB)
  water.pmtiles (2.0 MB)

Total PMTiles size: 62.5 MB
Files location: /Users/matthewheaton/GitHub/basemap/processing/tiles





## 5. Create TileJSON Metadata

Generate a TileJSON metadata file (`tilejson.json`) so that web mapping libraries such as MapLibre GL JS can read the bounds, zoom range, and vector layers of the generated tiles.

### TileJSON Features
- **Bounds and zoom levels** automatically detected from PMTiles
- **Vector layer definitions** for each data layer
- **MapLibre GL JS compatibility** for easy web integration
- **PMTiles URL references** for efficient tile serving
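
As a rough illustration of what such a metadata file might contain, the sketch below assembles a TileJSON-style dictionary from a directory of PMTiles files. It is not the `create_tilejson` implementation from `runCreateTiles.py`. The helper name `build_tilejson_sketch`, the `base_url` placeholder, the zoom range passed in as arguments, and the one-layer-per-file naming are all assumptions made for this example.

```python
import json
from pathlib import Path


def build_tilejson_sketch(tile_dir, extent, base_url="https://example.com/tiles",
                          minzoom=0, maxzoom=16):
    """Assemble a TileJSON-style dict for the .pmtiles files in tile_dir.

    Hypothetical helper: zoom levels are supplied by the caller rather than
    read from the PMTiles headers, and each file is assumed to hold a single
    vector layer named after the file stem.
    """
    tile_dir = Path(tile_dir)
    pmtiles = sorted(tile_dir.glob("*.pmtiles"))
    west, south, east, north = extent
    return {
        "tilejson": "3.0.0",
        "name": tile_dir.name,
        "bounds": [west, south, east, north],
        # center is [longitude, latitude, zoom]
        "center": [(west + east) / 2, (south + north) / 2, minzoom],
        "minzoom": minzoom,
        "maxzoom": maxzoom,
        # Assumption: one pmtiles:// reference per archive; base_url is a placeholder
        "tiles": [f"pmtiles://{base_url}/{p.name}" for p in pmtiles],
        "vector_layers": [{"id": p.stem, "fields": {}} for p in pmtiles],
    }
```

Calling `json.dumps(build_tilejson_sketch(tile_dir, extent), indent=2)` would give a document with the same top-level keys that the next cell reads back (`bounds`, `minzoom`, `maxzoom`, `vector_layers`).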

In [13]:
# Step 5: Create TileJSON metadata for MapLibre integration
print("=== STEP 5: CREATING TILEJSON METADATA ===")

# Check if PMTiles files exist in the configured tile directory
pmtiles_files = list(CONFIG["paths"]["tile_dir"].glob("*.pmtiles"))

if pmtiles_files:
    print(f"Found {len(pmtiles_files)} PMTiles files, creating TileJSON...")
    
    try:
        tilejson = create_tilejson(
            tile_dir=CONFIG["paths"]["tile_dir"],
            extent=CONFIG["extent"]["coordinates"],
            output_file=CONFIG["paths"]["tile_dir"] / "tilejson.json"
        )
        
        print("✓ TileJSON created successfully")
        print(f"  Bounds: {tilejson['bounds']}")
        print(f"  Zoom range: {tilejson['minzoom']} - {tilejson['maxzoom']}")
        print(f"  Vector layers: {len(tilejson['vector_layers'])}")
        print(f"  Output file: {CONFIG['paths']['tile_dir'] / 'tilejson.json'}")
        
        # Show a summary of all output files
        print("\nComplete output summary:")
        total_size_mb = 0
        for pmtile in sorted(pmtiles_files):
            size_mb = pmtile.stat().st_size / 1024 / 1024
            total_size_mb += size_mb
            print(f"  {pmtile.name} ({size_mb:.1f} MB)")
        
        print("  tilejson.json")
        print(f"\nTotal PMTiles size: {total_size_mb:.1f} MB")
        print(f"All files location: {CONFIG['paths']['tile_dir']}")
        
    except Exception as e:
        print(f"✗ TileJSON creation failed: {e}")
        
else:
    print("No PMTiles files found in output directory.")
    print(f"Expected location: {CONFIG['paths']['tile_dir']}")
    print("Run Step 4 first to generate PMTiles files.")

=== STEP 5: CREATING TILEJSON METADATA ===
Found 13 PMTiles files, creating TileJSON...
TileJSON created: /Users/matthewheaton/GitHub/basemap/processing/tiles/tilejson.json
Found 13 PMTiles files
✓ TileJSON created successfully
  Bounds: [22.0, -6.0, 24.0, -4.0]
  Zoom range: 0 - 16
  Vector layers: 13
  Output file: /Users/matthewheaton/GitHub/basemap/processing/tiles/tilejson.json

Complete output summary:
  GRID3_COD_Settlement_Extents_v3_1.pmtiles (3.4 MB)
  GRID3_COD_health_facilities_v5_0.pmtiles (1.9 MB)
  GRID3_COD_health_zones_v5_0.pmtiles (2.5 MB)
  buildings.pmtiles (13.3 MB)
  infrastructure.pmtiles (0.1 MB)
  land.pmtiles (0.3 MB)
  land_cover.pmtiles (29.4 MB)
  land_residential.pmtiles (1.3 MB)
  land_use.pmtiles (1.3 MB)
  placenames.pmtiles (0.3 MB)
  places.pmtiles (0.2 MB)
  roads.pmtiles (6.5 MB)
  water.pmtiles (2.0 MB)
  tilejson.json

Total PMTiles size: 62.5 MB
All files location: /Users/matthewheaton/GitHub/basemap/processing/tiles


## 6. Validate and Test Individual Steps

Test each processing step individually and validate the generated outputs.
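
One lightweight way to validate the generated tiles is to inspect the start of each file. The sketch below assumes the PMTiles v3 layout, in which an archive begins with a 127-byte header whose first seven bytes are the ASCII string `PMTiles`, followed by a one-byte spec version. The helper name `check_pmtiles_header` is hypothetical and is not part of the processing modules. It only confirms that a file looks like a PMTiles archive, not that its tiles are correct.

```python
from pathlib import Path

PMTILES_MAGIC = b"PMTiles"   # first 7 bytes of a PMTiles archive
PMTILES_HEADER_LEN = 127     # fixed header length in the v3 layout


def check_pmtiles_header(path):
    """Return (ok, detail) after inspecting the first bytes of a file."""
    path = Path(path)
    if not path.is_file():
        return False, "missing"
    with path.open("rb") as fh:
        header = fh.read(PMTILES_HEADER_LEN)
    if len(header) < PMTILES_HEADER_LEN:
        return False, "file too short for a PMTiles header"
    if header[:7] != PMTILES_MAGIC:
        return False, "magic bytes do not match"
    # Byte 8 carries the spec version number
    return True, f"spec version {header[7]}"
```

In this notebook it could be applied to every file matched by `CONFIG["paths"]["tile_dir"].glob("*.pmtiles")`, printing the name of any file for which the first element of the result is `False`.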

In [None]:
# Individual Step Testing and Validation

print("INDIVIDUAL STEP TESTING")
print("=" * 50)

print("\n1. Test downloadOverture.py standalone:")
print("python processing/downloadOverture.py --extent='23.4,-6.2,23.8,-5.8' --buffer=0.1")

print("\n2. Test convertCustomData.py standalone:")
print("python processing/convertCustomData.py input.shp output.geojsonseq --reproject=EPSG:4326")

print("\n3. Test runCreateTiles.py standalone:")
print("python processing/runCreateTiles.py --extent='23.4,-6.2,23.8,-5.8' --create-tilejson")

print("\n4. Test individual steps in this notebook:")
print("   - Step 1: Download section (cell 6)")
print("   - Step 2: Check downloaded files (cell 7)")
print("   - Step 3: Convert custom data (cell 9)")
print("   - Step 4: Process to PMTiles (cell 11)")
print("   - Step 5: Create TileJSON (cell 13)")

print("\n5. Validate outputs using CONFIG paths:")
print(f"   - Check {CONFIG['paths']['data_dir']} for GeoJSON files")
print(f"   - Check {CONFIG['paths']['tile_dir']} for PMTiles files")
print(f"   - Verify TileJSON metadata file")

# Configuration validation using centralized CONFIG
print("\nCURRENT CONFIGURATION VALIDATION")
print("=" * 50)
print(f"Extent: {CONFIG['extent']['coordinates']}")
print(f"Buffer: {CONFIG['extent']['buffer_degrees']} degrees")
print(f"Tile output directory: {CONFIG['paths']['tile_dir']}")
print(f"Custom data directory: {CONFIG['paths']['custom_data_dir']}")
print(f"Input directories for tiling: {[str(d) for d in CONFIG['tiling']['input_dirs']]}")

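# Area calculation using CONFIG
# Rough estimate: treats one degree as ~111 km in both directions. This holds
# near the equator and overestimates east-west distances at higher latitudes.
extent = CONFIG['extent']['coordinates']
area = (extent[2] - extent[0]) * (extent[3] - extent[1])
print(f"Processing area: {area:.2f} degree² ({area * 111**2:.0f} km²)")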

# Check directory status
print("\nDIRECTORY STATUS")
print("=" * 30)
for path_name, path_obj in CONFIG['paths'].items():
    if path_name.endswith('_dir'):
        status = "exists" if path_obj.exists() else "missing"
        # Count regular files only, so subdirectories are not reported as files
        file_count = sum(1 for p in path_obj.glob("*") if p.is_file()) if path_obj.exists() else 0
        print(f"{path_name}: {status} ({file_count} files)")

print("\nPERFORMANCE OPTIMIZATION TIPS")
print("=" * 50)

print(f"\n1. For large areas (current: {area:.2f} degree²):")
print(f"   - Current buffer: {CONFIG['extent']['buffer_degrees']} degrees")
print(f"   - Parallel processing: {CONFIG['tiling']['parallel']}")
print("   - Consider smaller chunks if memory issues occur")

print("\n2. File management:")
print(f"   - Monitor {CONFIG['paths']['data_dir']} size during processing")
print("   - Clean intermediate files between steps if needed")
print("   - Use filter patterns to process specific layers only")

print("\n3. Output optimization:")
print(f"   - PMTiles output: {CONFIG['paths']['tile_dir']}")
print(f"   - Public tiles: {CONFIG['paths']['public_tiles_dir']}")
print("   - Copy final tiles to public directory for web serving")

🔍 INDIVIDUAL STEP TESTING

1. Test downloadOverture.py standalone:
python processing/downloadOverture.py --extent='23.4,-6.2,23.8,-5.8' --buffer=0.1

2. Test convertCustomData.py standalone:
python processing/convertCustomData.py input.shp output.geojsonseq --reproject=EPSG:4326

3. Test runCreateTiles.py standalone:
python processing/runCreateTiles.py --extent='23.4,-6.2,23.8,-5.8' --create-tilejson

4. Test individual steps in this notebook:
   - Step 1: Download section (cell 6)
   - Step 2: Check downloaded files (cell 7)
   - Step 3: Convert custom data (cell 9)
   - Step 4: Process to PMTiles (cell 11)
   - Step 5: Create TileJSON (cell 13)

5. Validate outputs:
   - Check data/ directory for GeoJSON files
   - Check tiles/ directory for PMTiles files
   - Verify TileJSON metadata file

📋 CURRENT CONFIGURATION VALIDATION
Extent: (22.0, -6.0, 24.0, -4.0)
Buffer: 0.2 degrees
Output directory: tiles
Custom data directory: ['input/grid3']
Processing area: 4.00 degree² (49284 km²)


# Modular Processing Summary

This notebook provides a complete, step-by-step approach for geospatial data processing with the following capabilities:

## Core Steps
1. **Download Overture Maps data** with spatial filtering using DuckDB
2. **Check and validate** downloaded files
3. **Convert custom spatial data** to GeoJSON/GeoJSONSeq format
4. **Generate PMTiles** using optimized tippecanoe settings
5. **Create TileJSON metadata** for web mapping integration
6. **Validate and test** individual processing steps

## Key Features
- **Modular design** - Each step can be run independently
- **Flexible configuration** - Easy to customize for different areas and data types
- **Interactive development** - Run steps individually for debugging
- **Performance optimized** - Appropriate settings for different geometry types
- **Production ready** - Robust error handling and validation

## Output Files
Each step generates outputs that can be used directly:
- **GeoJSON/GeoJSONSeq files** for further processing or analysis
- **PMTiles files** for efficient web mapping
- **TileJSON metadata** for MapLibre GL JS integration
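
To show how these outputs might be consumed, the sketch below builds a minimal MapLibre style document in Python, with one vector source per PMTiles file. It is an illustration and not part of the pipeline. The helper name `build_style_sketch`, the `base_url` placeholder, the line styling, and the assumption that each archive contains a source layer named after its file are all hypothetical. The `pmtiles://` URLs only resolve in the browser once a PMTiles protocol handler has been registered with MapLibre.

```python
import json


def build_style_sketch(layer_names, base_url="https://example.com/tiles"):
    """Build a minimal MapLibre style dict with one vector source per archive."""
    sources = {}
    layers = []
    for name in layer_names:
        sources[name] = {
            "type": "vector",
            # Placeholder URL; point this at wherever the tiles are hosted
            "url": f"pmtiles://{base_url}/{name}.pmtiles",
        }
        layers.append({
            "id": f"{name}-outline",
            "type": "line",  # draws lines and polygon outlines; points need another layer type
            "source": name,
            "source-layer": name,  # assumption: layer name matches the file stem
            "paint": {"line-color": "#555555", "line-width": 1},
        })
    return {"version": 8, "sources": sources, "layers": layers}
```

For example, `json.dumps(build_style_sketch(["roads", "water"]), indent=2)` produces a style that can be saved next to the tiles and adjusted by hand.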

## Usage Patterns
- **Development**: Run steps individually for testing and debugging
- **Production**: Execute all steps in sequence for automated processing
- **Customization**: Modify CONFIG settings and re-run specific steps
- **Integration**: Use generated files with web mapping applications