# Quick Start: Loading the Ouagadougou Dataset

This notebook shows how to load and work with the processed data.

The pipeline produces a single multi-band GeoTIFF (`data/processed/ouaga_aligned_stack.tif`) containing all 10 variables at 30 m resolution. The `src.data` module provides convenience functions that load this file into a pandas DataFrame or expose the raw raster array.

## Option 1: One-call DataFrame (recommended)

This is the simplest way to get started. `load_dataset` loads the config, reads the raster, filters out no-data pixels, and returns a modeling-ready DataFrame.

In [None]:
import sys
sys.path.insert(0, "..")

from src.data import load_dataset

df, config = load_dataset("../config/processing.yaml")
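
For intuition, the "read the raster, filter no-data pixels, return a DataFrame" step can be sketched on a tiny synthetic array. This is not the actual `load_dataset` implementation: the array, the `NDVI` band name, the `row`/`col` column names, and the assumption that no-data is stored as NaN are all illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real raster: 3 bands on a 2 x 3 grid.
band_names = ["LST", "NDVI", "hotspot"]  # illustrative subset, not the real band list
stack = np.array([
    [[30.0, 31.0, np.nan], [32.0, 33.0, 34.0]],  # LST
    [[0.2, 0.3, 0.4], [np.nan, 0.5, 0.6]],       # NDVI
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],          # hotspot
])
n_bands, n_rows, n_cols = stack.shape

# One row per pixel: (bands, rows, cols) -> (rows * cols, bands).
table = stack.reshape(n_bands, -1).T
df_sketch = pd.DataFrame(table, columns=band_names)

# Keep each pixel's grid position so rows can be mapped back onto the raster.
rows, cols = np.indices((n_rows, n_cols))
df_sketch.insert(0, "row", rows.ravel())
df_sketch.insert(1, "col", cols.ravel())

# Drop pixels where any band is no-data (assumed to be NaN in this sketch).
df_sketch = df_sketch.dropna(subset=band_names).reset_index(drop=True)
print(df_sketch)
```

Two of the six pixels have a NaN in at least one band, so the resulting table has four rows.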

In [None]:
# One row per valid pixel, with coordinates and all 10 bands
print(f"{len(df):,} pixels")
print(f"Columns: {list(df.columns)}")
df.describe()

In [None]:
# Predictor columns (everything except the target variables)
predictor_cols = [c for c in config['band_names'] if c not in ['hotspot', 'LST']]
print(f"Predictors: {predictor_cols}")
print("Target: LST (continuous), hotspot (binary)")
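
One way the predictor/target split might feed into modeling is sketched below on synthetic data. The miniature `config_sketch`, the `NDVI` and `albedo` band names, the random values, and the `random_state` of 42 are assumptions for illustration only; with the real objects, `df` and `config` would take their place.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of `config` and `df`; the real band list has 10 entries.
config_sketch = {
    "band_names": ["LST", "NDVI", "albedo", "hotspot"],
    "random_state": 42,  # assumed value
}
rng = np.random.default_rng(config_sketch["random_state"])
df_sketch = pd.DataFrame(rng.random((10, 4)), columns=config_sketch["band_names"])
df_sketch["hotspot"] = rng.integers(0, 2, size=10)  # arbitrary binary labels

# Same selection logic as the cell above.
targets = ["hotspot", "LST"]
predictor_cols = [c for c in config_sketch["band_names"] if c not in targets]

X = df_sketch[predictor_cols]   # feature matrix
y_reg = df_sketch["LST"]        # continuous target
y_clf = df_sketch["hotspot"]    # binary target

# Reproducible 80/20 split driven by the configured random state.
train = df_sketch.sample(frac=0.8, random_state=config_sketch["random_state"])
test = df_sketch.drop(train.index)
print(f"train: {len(train)} rows, test: {len(test)} rows")
```

Reading the seed from the config rather than hardcoding it keeps splits consistent across notebooks.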

## What's in `config`?

The config dict contains all study parameters (from `config/processing.yaml`) plus derived fields. Use it instead of hardcoding values.

In [None]:
# Study design
print(f"CRS:              {config['target_crs']}")
print(f"Resolution:       {config['target_scale']}m")
print(f"Study years:      {config['study_years']}")
print(f"Hot season:       months {config['hot_season_months']}")
print(f"Random state:     {config['random_state']}")

# Paths (derived automatically from config)
print(f"\nRaster path:      {config['raster_path']}")
print(f"Data dir:         {config['data_dir']}")
print(f"Figures dir:      {config['figures_dir']}")

# Band ordering (1-indexed, for use with rasterio)
print(f"\nBand index:       {config['band_index']}")
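
The 1-indexed convention matters because rasterio numbers bands from 1, while NumPy arrays are 0-indexed. A minimal sketch of how such a mapping could be derived from the band order is shown below; the band list here is an illustrative subset, and the real `band_index` may be built differently.

```python
# Illustrative subset of band names; the real list comes from the config.
band_names = ["LST", "NDVI", "hotspot"]

# 1-indexed lookup, matching rasterio's band numbering.
band_index = {name: i for i, name in enumerate(band_names, start=1)}

# Subtract 1 when indexing a NumPy array of shape (bands, rows, cols).
lst_band = band_index["LST"] - 1
print(band_index, lst_band)
```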

## Option 2: Raw raster access

The 3D NumPy array and raster metadata are available via `config['raster_info']`. Use this when you need spatial operations (e.g. plotting maps, rasterio I/O).

In [None]:
info = config['raster_info']

print(f"Raster shape:     {info['shape']}  (bands, rows, cols)")
print(f"CRS:              {info['crs']}")
print(f"Resolution:       {info['resolution'][0]:.0f}m x {info['resolution'][1]:.0f}m")
print(f"Coverage:         {info['n_valid']:,} / {info['n_total']:,} pixels ({info['coverage_pct']:.1f}%)")
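
The coverage figures can be reproduced from a `(bands, rows, cols)` array along the following lines. This sketch assumes a pixel counts as valid only when every band is finite; the actual definition used to populate `raster_info` may differ.

```python
import numpy as np

# Hypothetical 2-band, 2 x 2 raster with one NaN in each band.
stack = np.array([
    [[1.0, np.nan], [3.0, 4.0]],
    [[1.0, 2.0], [np.nan, 4.0]],
])

# A pixel is valid only if all bands are finite at that location.
valid = np.isfinite(stack).all(axis=0)

n_valid = int(valid.sum())
n_total = valid.size
coverage_pct = 100 * n_valid / n_total
print(f"Coverage: {n_valid:,} / {n_total:,} pixels ({coverage_pct:.1f}%)")
```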

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Plot a single band using the raw raster array
lst_band = config['band_index']['LST'] - 1  # convert to 0-indexed
lst_data = np.ma.masked_invalid(info['data_3d'][lst_band])

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(lst_data, cmap='turbo')
fig.colorbar(im, label='LST (°C)')
ax.set_title('Land Surface Temperature')
plt.show()

## Option 3: Config only

If you just need the config (e.g. for GEE work in `01_processing_pipeline.ipynb`), use `load_config` directly.

In [None]:
from src.data import load_config

config = load_config("../config/processing.yaml")
print(f"Band names: {config['band_names']}")