# Processing Pipeline

A note on hotspot thresholding:
A key limitation of our approach is that hotspot classification is derived from the LST distribution itself (mean + 1σ within the administrative boundary), so there is no independent thermal validation of what constitutes a "hotspot." Ground-truth temperature measurements would be ideal, but Ouagadougou has extremely limited in-situ monitoring infrastructure: essentially a single WMO station at the airport, with no open-access urban temperature network available for the study period. This constraint is common in Sahelian cities and not unique to our analysis.
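
As a rough illustration of the mean + 1σ rule described above, the following sketch applies it to a plain list of values. The sample values are synthetic, `hotspot_mask` is a hypothetical helper (not a function from `src.pipeline`), and the use of the population standard deviation is an assumption; the actual pipeline may compute σ differently.

```python
from statistics import mean, pstdev


def hotspot_mask(lst_values, k=1.0):
    """Flag values above mean + k * sigma of the same sample.

    Hypothetical helper for illustration only; sigma is the population
    standard deviation here, which is an assumption.
    """
    threshold = mean(lst_values) + k * pstdev(lst_values)
    return threshold, [v > threshold for v in lst_values]


# Synthetic LST values in °C (not taken from the study data)
lst_sample = [31.0, 38.5, 42.0, 44.5, 46.0, 47.5, 49.0, 52.5]

threshold, mask = hotspot_mask(lst_sample, k=1.0)
print(f"threshold = {threshold:.2f} °C, hotspots = {sum(mask)} of {len(mask)}")
```

Because the threshold comes from the same distribution it classifies, the share of flagged pixels depends mostly on the shape of the LST distribution and not on any absolute temperature, which is the limitation noted above.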

The predictor variables (NDVI, NDBI, land cover, elevation, distance to water/roads) are all sourced from spectrally and temporally independent datasets (Sentinel-2, ESA WorldCover, Copernicus DEM, JRC Global Surface Water). LST is *not* being used to predict itself; rather, it defines the classification target, while entirely independent remotely sensed variables are used as predictors. This follows a standard supervised classification framework widely used in UHI research (high-quality citations still need to be added here).

To further test robustness, we could:

- Run the analysis at multiple thresholds (mean + 0.5σ, mean + 1σ, mean + 1.5σ), and possibly with a percentile-based approach (top 10%, top 20%). If the same predictors emerge as dominant drivers regardless of threshold, that would strengthen our conclusions considerably. If predictor importance changes drastically with threshold, that is important to report and discuss.
- Pull the airport METAR data from IEM or OGIMET for the study period and at least confirm that the magnitude of our LST composite is physically plausible relative to observed air temperatures (acknowledging that LST ≠ air temperature, although the two should be correlated). A 46°C mean LST during March-May, with air temperatures routinely reaching 40-42°C at the airport, is internally consistent and worth stating explicitly.
- Cross-check the hotspot map against the actual terrain or ESA WorldCover classes. Our hotspots overwhelmingly coincide with bare soil and the urban outskirts, and our coldspots with the urban center and vegetation/water, which is convergent evidence that we are capturing a real physical signal rather than noise.
- Do a spatial holdout validation, as planned: train the model on a subset of the city, predict on the rest, and check whether it generalizes. This does not solve the ground-truth problem, but it does test whether the relationships are spatially robust.
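
The threshold comparison in the first bullet could start from something like the sketch below. It only builds the alternative hotspot masks and measures how much they overlap; it does not re-fit any model or compute predictor importance. The LST values are synthetic draws, and `sigma_mask`, `top_fraction_mask`, and `jaccard` are hypothetical helpers introduced here for illustration.

```python
import random
from statistics import mean, pstdev


def sigma_mask(values, k):
    """Hotspot mask for values above mean + k * (population) sigma."""
    threshold = mean(values) + k * pstdev(values)
    return [v > threshold for v in values]


def top_fraction_mask(values, fraction):
    """Hotspot mask for the hottest `fraction` of values (e.g. 0.10)."""
    n_top = max(1, round(len(values) * fraction))
    cutoff = sorted(values, reverse=True)[n_top - 1]
    return [v >= cutoff for v in values]


def jaccard(mask_a, mask_b):
    """Overlap of two boolean masks: |A and B| / |A or B|."""
    both = sum(a and b for a, b in zip(mask_a, mask_b))
    either = sum(a or b for a, b in zip(mask_a, mask_b))
    return both / either if either else 1.0


# Synthetic stand-in for per-pixel LST values in °C (assumption)
rng = random.Random(0)
lst_values = [rng.gauss(46.0, 4.0) for _ in range(1000)]

masks = {
    "mean+0.5sd": sigma_mask(lst_values, 0.5),
    "mean+1sd": sigma_mask(lst_values, 1.0),
    "mean+1.5sd": sigma_mask(lst_values, 1.5),
    "top 10%": top_fraction_mask(lst_values, 0.10),
    "top 20%": top_fraction_mask(lst_values, 0.20),
}

reference = masks["mean+1sd"]
for name, mask in masks.items():
    share = sum(mask) / len(mask)
    overlap = jaccard(mask, reference)
    print(f"{name:<11} hotspot share = {share:.3f}  Jaccard vs mean+1sd = {overlap:.3f}")
```

Each mask would then be used as the classification target in turn, and the resulting predictor rankings compared across thresholds.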


In [1]:
import sys
import warnings

import ee
import geemap
import geemap.colormaps as cm

from IPython.display import Image, display

sys.path.insert(0, "..")
from src.data import load_config
from src.pipeline import (
    load_aoi,
    compute_all_features,
    stack_layers,
    download_stack,
)

warnings.filterwarnings("ignore")

In [2]:
# =============================================================================
# LOAD CONFIG AND INITIALIZE EARTH ENGINE
# =============================================================================
config = load_config("../config/processing.yaml")

try:
    ee.Initialize(project=config["ee_project"])
    print("Earth Engine initialized successfully")
except Exception:
    print("  Earth Engine not authenticated, attempting authentication...")
    ee.Authenticate()
    ee.Initialize(project=config["ee_project"])
    print("Earth Engine initialized successfully")

Earth Engine initialized successfully


# Process data

In [3]:
# =============================================================================
# LOAD AOI AND COMPUTE ALL FEATURES
# =============================================================================
aoi = load_aoi(config)
layers = compute_all_features(aoi, config)

print(f"\nComputed {len(layers)} feature layers: {list(layers.keys())}")

AOI loaded from Earth Engine asset

Computed 10 feature layers: ['LST', 'hotspot', 'NDVI', 'NDBI', 'BSI', 'distance_to_water', 'distance_to_roads', 'DEM', 'built_density', 'green_density']


# Visualize before exporting

In [4]:
print("\n" + "="*80)
print("COMPUTING VISUALIZATION PARAMETERS")
print("="*80)


def get_band_stats(image, band_name, scale):
    """Get min/max statistics for a band at its native scale."""
    stats = image.select(band_name).clip(aoi).reduceRegion(
        reducer=ee.Reducer.minMax(),
        geometry=aoi,
        scale=scale,
        maxPixels=1e13
    ).getInfo()
    return stats[f'{band_name}_min'], stats[f'{band_name}_max']


# Band name, image, native scale
band_stats_config = [
    ('NDVI', layers['NDVI'], 10),
    ('NDBI', layers['NDBI'], 10),
    ('BSI', layers['BSI'], 10),
    ('DEM', layers['DEM'], 30),
    ('distance_to_water', layers['distance_to_water'], 30),
    ('distance_to_roads', layers['distance_to_roads'], 30),
    ('built_density', layers['built_density'], 30),
    ('green_density', layers['green_density'], 30),
    ('LST', layers['LST'], 30),
]

# Compute and store stats for each band
band_stats = {}
print("\nValue ranges:")
for band_name, image, scale in band_stats_config:
    vmin, vmax = get_band_stats(image, band_name, scale)
    band_stats[band_name] = {'min': vmin, 'max': vmax}
    print(f"  {band_name:.<25} [{vmin:.3f}, {vmax:.3f}]")


COMPUTING VISUALIZATION PARAMETERS

Value ranges:
  NDVI..................... [-0.173, 0.759]
  NDBI..................... [-0.382, 0.420]
  BSI...................... [-0.304, 0.422]
  DEM...................... [273.918, 346.675]
  distance_to_water........ [0.000, 13604.959]
  distance_to_roads........ [0.000, 657.951]
  built_density............ [0.000, 1.000]
  green_density............ [0.000, 1.000]
  LST...................... [30.953, 52.497]


In [5]:
# Set visualization parameters (using constant ranges because it's faster)
vis_ndvi = {
    'min': -0.2, 
    'max': 0.8, 
    # 'palette': ['red', 'gold', 'green']
    'palette': cm.get_palette('RdYlGn')
}

vis_ndbi = {
    'min': -0.4, 
    'max': 0.5, 
    'palette': ['blue', 'darkcyan', 'gold', 'red', 'darkred']
}

vis_bsi = {
    'min': -0.4, 
    'max': 0.5, 
    'palette': ['blue', 'darkcyan', 'gold', 'red', 'darkred']
}

vis_elevation = {
    'min': 250,
    'max': 350,
    'palette': cm.get_palette('terrain')
}

vis_distance_water = {
    'min': 0,
    'max': 14000,
    'palette': ['blue', 'cyan', 'limegreen']
}

vis_distance_roads = {
    'min': 0,
    'max': 500,
    'palette': cm.get_palette('turbo')
}

vis_builtup_density = {
    'min': 0,
    'max': 1,
    'palette': ['royalblue', 'khaki', 'red']
}

vis_green_density = {
    'min': 0,
    'max': 1,
    'palette': ['navy', 'turquoise', 'greenyellow']
}

vis_lst = {
    'min': 25,
    'max': 55,
    'palette': cm.get_palette('turbo')
}

vis_hotspot = {
    'min': 0,
    'max': 1,
    'palette': ['cyan', 'red']
}
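
The constant ranges above are hand-picked. As a possible alternative, they could be derived from the printed value ranges by rounding outward to a convenient step, as in this sketch. `padded_range` and the step sizes are hypothetical choices for illustration and are not part of the notebook's pipeline; the input numbers are copied from the "Value ranges" output above.

```python
import math


def padded_range(vmin, vmax, step):
    """Round a (min, max) pair outward to multiples of `step`.

    Hypothetical helper; returns a dict with the 'min' and 'max' keys
    used by the visualization parameter dicts.
    """
    lo = round(math.floor(vmin / step) * step, 10)
    hi = round(math.ceil(vmax / step) * step, 10)
    return {"min": lo, "max": hi}


# Observed ranges copied from the printed band statistics
observed = {
    "NDVI": (-0.173, 0.759, 0.1),
    "DEM": (273.918, 346.675, 50),
    "LST": (30.953, 52.497, 5),
}

for band, (vmin, vmax, step) in observed.items():
    print(band, padded_range(vmin, vmax, step))
```

With these step sizes the NDVI and DEM ranges match the constants used above, while the LST lower bound comes out at 30 instead of the hand-picked 25.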

In [6]:
# =============================================================================
# Interactive Map (geemap)
# =============================================================================

Map1 = geemap.Map()
Map1.setCenter(-1.45, 12.345, 11)
Map1.add_basemap('Esri.WorldImagery')

# Map1.addLayer(layers['DEM'], vis_elevation, 'Elevation (DEM)', shown=False)
Map1.addLayer(layers['distance_to_water'], vis_distance_water, 'Distance to Water', shown=False)
Map1.addLayer(layers['distance_to_roads'], vis_distance_roads, 'Distance to Roads', shown=False)
# Map1.addLayer(layers['NDVI'], vis_ndvi, 'NDVI', shown=False)
# Map1.addLayer(layers['BSI'], vis_bsi, 'BSI', shown=False)
# Map1.addLayer(layers['NDBI'], vis_ndbi, 'NDBI', shown=False)
# Map1.addLayer(layers['green_density'], vis_green_density, 'Green Space Density', shown=False)
# Map1.addLayer(layers['built_density'], vis_builtup_density, 'Built-Up Density', shown=False)
Map1.addLayer(layers['LST'], vis_lst, 'LST (°C)', shown=True)
Map1.addLayer(layers['hotspot'], vis_hotspot, 'Hot spots', shown=False)
Map1.addLayer(aoi)

Map1

Map(center=[12.345, -1.45], controls=(WidgetControl(options=['position', 'transparent_bg'], position='topright…

In [7]:
# =============================================================================
# Visual Comparison (renders on GitHub)
# =============================================================================

viz_layers = {
    'NDVI': (layers['NDVI'], vis_ndvi),
    'NDBI': (layers['NDBI'], vis_ndbi),
    'BSI': (layers['BSI'], vis_bsi),
    'Elevation (DEM)': (layers['DEM'], vis_elevation),
    'Distance to Water': (layers['distance_to_water'], vis_distance_water),
    'Distance to Roads': (layers['distance_to_roads'], vis_distance_roads),
    'Built-Up Density': (layers['built_density'], vis_builtup_density),
    'Green Space Density': (layers['green_density'], vis_green_density),
    'LST (°C)': (layers['LST'], vis_lst),
    'Hot spots': (layers['hotspot'], vis_hotspot),
}

for name, (image, vis) in viz_layers.items():
    print(f"--- {name} ---")
    thumb_url = image.clip(aoi).getThumbUrl({
        'min': vis['min'],
        'max': vis['max'],
        'palette': vis['palette'],
        'dimensions': 400,
    })
    display(Image(url=thumb_url))

--- NDVI ---


--- NDBI ---


--- BSI ---


--- Elevation (DEM) ---


--- Distance to Water ---


--- Distance to Roads ---


--- Built-Up Density ---


--- Green Space Density ---


--- LST (°C) ---


--- Hot spots ---


# Align and export

In [8]:
# =============================================================================
# STACK & VERIFY BAND TYPES
# =============================================================================
predictor_stack = stack_layers(layers, config["band_names"], aoi)

# Verify band types
band_types = predictor_stack.bandTypes().getInfo()
print("Band types after casting:")
for band, dtype in band_types.items():
    print(f"  {band}: {dtype}")

Band types after casting:
  BSI: {'type': 'PixelType', 'precision': 'float'}
  DEM: {'type': 'PixelType', 'precision': 'float'}
  LST: {'type': 'PixelType', 'precision': 'float'}
  NDBI: {'type': 'PixelType', 'precision': 'float'}
  NDVI: {'type': 'PixelType', 'precision': 'float'}
  built_density: {'type': 'PixelType', 'precision': 'float'}
  distance_to_roads: {'type': 'PixelType', 'precision': 'float'}
  distance_to_water: {'type': 'PixelType', 'precision': 'float'}
  green_density: {'type': 'PixelType', 'precision': 'float'}
  hotspot: {'type': 'PixelType', 'precision': 'float'}


In [9]:
# =============================================================================
# DOWNLOAD STACKED IMAGE
# =============================================================================
download_stack(predictor_stack, aoi, config)

Downloading to /Users/helyne/code/climatematch/ouaga-urban-heat-drivers/data/processed/ouaga_aligned_stack.tif...

Done. Saved to /Users/helyne/code/climatematch/ouaga-urban-heat-drivers/data/processed/ouaga_aligned_stack.tif
