# 1.3: WEPP Climate (.cli) File Preparation

**Objective:** To process the raw daily PRISM climate rasters (precipitation, min/max temperature) downloaded in the previous notebook and generate a WEPP-compatible climate file (`.cli`) for each study area.

**Gold-Standard Practices Implemented:**
- **Configuration-Driven:** All paths and parameters are loaded from the central `config.yml`.
- **Spatial Averaging:** For each polygon in our study areas, we will extract the climate data and calculate a single, representative daily time series.
- **Custom Formatting:** A dedicated function will format the time series data into the precise, column-based ASCII format required by the WEPP model.
- **Idempotent:** The notebook is designed to be run multiple times without re-generating existing `.cli` files.
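
As a minimal sketch of what the configuration-driven setup relies on, the snippet below checks that a loaded config dictionary contains every key this notebook reads. The key names are taken from the setup cell. The `require_keys` helper, the CRS string, and the date range values are hypothetical placeholders and should be replaced with the project's real `config.yml` contents.

```python
# Key paths read by this notebook's setup cell (dot-separated).
REQUIRED_KEYS = [
    "paths.study_areas",
    "paths.climate_dir",
    "paths.processed_dir",
    "parameters.wgs84_crs",
    "data_sources.prism.date_range",
]


def require_keys(config: dict, dotted_keys: list) -> list:
    """Return the dotted keys that are missing from a nested config dict."""
    missing = []
    for dotted in dotted_keys:
        node = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


# Hypothetical example values; only the key names come from the notebook.
example_config = {
    "paths": {
        "study_areas": "data/processed/study_areas.gpkg",
        "climate_dir": "data/raw/climate_prism",
        "processed_dir": "data/processed",
    },
    "parameters": {"wgs84_crs": "EPSG:4326"},  # assumed value
    "data_sources": {"prism": {"date_range": ["2020-01-01", "2020-12-31"]}},  # assumed format
}

missing_keys = require_keys(example_config, REQUIRED_KEYS)
if missing_keys:
    raise KeyError(f"config.yml is missing required keys: {missing_keys}")
```

Failing early with the full list of missing keys is usually easier to debug than a bare `KeyError` raised halfway through the setup cell.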


In [1]:
# === 1. Configuration & Setup ===

# --- Core Libraries ---
from __future__ import annotations
import os
import sys
import yaml
import logging
from pathlib import Path

# --- Project-Specific Modules ---
# Add project's src directory to path to allow imports
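def find_project_root(marker='config.yml'):
    """Walk upward from the current directory until a folder containing `marker` is found."""
    start = Path.cwd().resolve()
    # Check the starting directory and every ancestor, including the filesystem root
    for path in (start, *start.parents):
        if (path / marker).exists():
            return path
    raise FileNotFoundError(f"Project root with marker '{marker}' not found.")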

PROJECT_ROOT = find_project_root()
sys.path.insert(0, str(PROJECT_ROOT / 'src'))

from utils import setup_colored_logging

# --- Geospatial & Data Libraries ---
import geopandas as gpd
import pandas as pd
import rioxarray
from tqdm.auto import tqdm

# --- Gold-Standard Logging Setup ---
setup_colored_logging()
log = logging.getLogger("1.3_wepp_climate_preparation")

# --- Configuration Loading ---
CONFIG_PATH = PROJECT_ROOT / "config.yml"
with open(CONFIG_PATH, 'r') as f:
    config = yaml.safe_load(f)

# --- Path Configuration (from config) ---
STUDY_AREAS_GPKG = PROJECT_ROOT / config['paths']['study_areas']
RAW_CLIMATE_DIR = PROJECT_ROOT / config['paths']['climate_dir'] / 'daily'
PROCESSED_CLIMATE_DIR = PROJECT_ROOT / config['paths']['processed_dir'] / 'climate_cli'

# Create output directory if it doesn't exist
PROCESSED_CLIMATE_DIR.mkdir(parents=True, exist_ok=True)

# --- Parameter Configuration ---
WGS84_CRS = config['parameters']['wgs84_crs']
PRISM_DATE_RANGE = config['data_sources']['prism']['date_range']

log.info("--- Configuration Summary ---")
log.info(f"Project Root:          {PROJECT_ROOT}")
log.info(f"Input Study Areas:     {STUDY_AREAS_GPKG}")
log.info(f"Input Raw Climate:     {RAW_CLIMATE_DIR}")
log.info(f"Output .cli Files:     {PROCESSED_CLIMATE_DIR}")
log.info("Setup complete.")


2025-10-03 22:46:38 - 1.3_wepp_climate_preparation - INFO - --- Configuration Summary ---
2025-10-03 22:46:38 - 1.3_wepp_climate_preparation - INFO - Project Root:          /workspace
2025-10-03 22:46:38 - 1.3_wepp_climate_preparation - INFO - Input Study Areas:     /workspace/data/processed/study_areas.gpkg
2025-10-03 22:46:38 - 1.3_wepp_climate_preparation - INFO - Input Raw Climate:     /workspace/data/raw/climate_prism/daily
2025-10-03 22:46:38 - 1.3_wepp_climate_preparation - INFO - Output .cli Files:     /workspace/data/processed/climate_cli
2025-10-03 22:46:38 - 1.3_wepp_climate_preparation - INFO - Setup complete.


In [None]:
# === 2. Load Study Area Polygons ===

def load_study_areas(gpkg_path: Path) -> gpd.GeoDataFrame:
    """Loads the study area polygons that will be used to generate climate files."""
    log.info(f"Loading study area polygons from {gpkg_path}")
    if not gpkg_path.exists():
        raise FileNotFoundError(f"Study areas file not found at {gpkg_path}. Please run notebook 1.1 first.")
    
    # Assuming the relevant layer is 'cv_provinces' for this example
    # This might need to be adjusted to point to specific watershed/hillslope polygons later
    gdf = gpd.read_file(gpkg_path, layer='cv_provinces')
    log.info(f"Loaded {len(gdf)} study area polygons.")
    return gdf

# --- Execute ---
study_areas_gdf = load_study_areas(STUDY_AREAS_GPKG)
display(study_areas_gdf.head())


### Climate Data Extraction Strategy

For each study area polygon, we need to generate a single representative time series. The process is as follows:

1.  **Identify Relevant Files:** For each day in our period of record, we locate the corresponding raw PRISM raster for precipitation, minimum temperature, and maximum temperature.
2.  **Clip and Aggregate:** For a given study area polygon, we will use `rioxarray` to clip the daily PRISM raster to the polygon's geometry, after making sure the polygon and the raster share the same CRS.
3.  **Calculate Spatial Mean:** After clipping, we will calculate the mean value of all valid (non-nodata) pixels within the polygon. This gives us a single value (e.g., mean precipitation) for that specific day and that specific study area.
4.  **Build Time Series:** We repeat this for every day in the date range, building a complete `pandas` DataFrame that represents the daily climate for the study area.

This method tailors the climate data to the geographic boundaries of each analysis unit, at the cost of averaging out any climate variation within the polygon.


In [None]:
# === 3. Core Climate Processing Logic ===

# This is a placeholder for the core data processing logic.
# A full implementation would involve:
# 1. Iterating through each polygon in `study_areas_gdf`.
# 2. For each polygon, creating a date range from the config.
# 3. For each date, opening the corresponding ppt, tmin, and tmax files.
# 4. Clipping the raster with the polygon's geometry.
# 5. Calculating the mean of the clipped raster.
# 6. Storing the results in a pandas DataFrame.
# 7. Passing the completed DataFrame to a formatting function (defined in the next cell).

log.warning("Placeholder cell: Core data extraction logic needs to be implemented.")

# Example structure:
# for index, area in study_areas_gdf.iterrows():
#     area_id = area['some_unique_id'] # e.g., province name or watershed ID
#     log.info(f"Processing climate data for {area_id}...")
#     
#     # Check if final .cli file already exists
#     cli_output_path = PROCESSED_CLIMATE_DIR / f"{area_id}.cli"
#     if cli_output_path.exists():
#         log.info(f"  -> Skipping, {cli_output_path.name} already exists.")
#         continue
#
#     # ... implementation of steps 1-6 ...
#     daily_climate_df = ...
#
#     # ... call formatter ...
#     write_wepp_cli_file(daily_climate_df, cli_output_path, area)



### WEPP `.cli` File Format

The WEPP model requires a very specific ASCII text file format for its climate input. The file is space-delimited and has a header providing metadata about the station. The main body contains daily records with the following columns:

-   `Day`
-   `Month`
-   `Year`
-   `Precipitation` (mm)
-   `Duration` (hours) - *Often estimated*
-   `Peak Intensity` (mm/hr) - *Often estimated*
-   `Max Temperature` (°C)
-   `Min Temperature` (°C)
-   `Solar Radiation` (Langleys/day) - *May need to be estimated*
-   `Wind Velocity` (m/s) - *May need to be estimated*
-   `Wind Direction` (degrees) - *Often set to 0 if not available*

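Our initial implementation will focus on formatting the data we have (Precip, Tmax, Tmin) and using reasonable defaults or estimation methods for the other required parameters. Note that the column list above is a simplified working layout: the CLIGEN-format files that WEPP reads also carry a time-to-peak column (`tp`) and a dew point temperature column (`tdew`), and store `ip` as the ratio of peak to average intensity rather than in mm/hr, so the layout should be checked against the WEPP documentation before the files are used in a model run.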
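
One common empirical option for the missing solar radiation column is the Hargreaves-Samani estimate, which derives daily radiation from the daily temperature range and the extraterrestrial radiation for the site's latitude and day of year. The sketch below follows the FAO-56 formulation. The function names, the coefficient `k_rs = 0.16` (a value often quoted for inland sites), and the choice of this method are illustrative assumptions, not part of the original workflow.

```python
import math

GSC = 0.0820          # Solar constant, MJ m^-2 min^-1 (FAO-56)
MJ_TO_LANGLEY = 23.9  # 1 MJ m^-2 is about 23.9 Langleys (1 Ly = 41.84 kJ m^-2)


def extraterrestrial_radiation(lat_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial radiation Ra in MJ m^-2 day^-1 (FAO-56 formulation)."""
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)      # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)  # solar declination (rad)
    # Clamp so polar day/night does not push acos outside its domain
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    omega_s = math.acos(cos_ws)             # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )


def hargreaves_radiation_ly(tmax_c: float, tmin_c: float, lat_deg: float,
                            day_of_year: int, k_rs: float = 0.16) -> float:
    """Estimate daily solar radiation in Langleys/day from the temperature range."""
    ra = extraterrestrial_radiation(lat_deg, day_of_year)
    # Guard against tmin > tmax, which can occur after independent spatial averaging
    temp_range = max(tmax_c - tmin_c, 0.0)
    return k_rs * math.sqrt(temp_range) * ra * MJ_TO_LANGLEY


# Example call with made-up inputs for a mid-latitude summer day
rad_ly = hargreaves_radiation_ly(tmax_c=28.0, tmin_c=14.0, lat_deg=37.0, day_of_year=196)
```

Applied row by row (using the polygon centroid's latitude and each date's day of year), this would replace the constant radiation placeholder with a value that at least follows the seasonal cycle.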


In [None]:
# === 4. WEPP .cli Formatter ===

def write_wepp_cli_file(daily_data: pd.DataFrame, out_path: Path, area_info: pd.Series):
    """
    Formats a DataFrame of daily climate data into a WEPP .cli file.

    Args:
        daily_data (pd.DataFrame): Must have columns 'date', 'ppt_mm', 'tmax_c', 'tmin_c'.
        out_path (Path): The full path for the output .cli file.
        area_info (pd.Series): GeoDataFrame row containing metadata (lat/lon, name).
    """
    log.info(f"Generating WEPP .cli file: {out_path.name}")

    # --- Placeholder for Estimating Missing Parameters ---
    # WEPP requires radiation, wind, duration, etc.
    # For this example, we will use placeholder values.
    # A robust implementation would use an estimation library or empirical formulas.
    daily_data = daily_data.copy()      # Work on a copy so the caller's DataFrame is not modified
    daily_data['rad_ly'] = 250.0        # Placeholder for Solar Radiation (Langleys/day)
    daily_data['wind_vel_ms'] = 2.0     # Placeholder for Wind Velocity (m/s)
    daily_data['wind_dir_deg'] = 0.0    # Placeholder for Wind Direction (degrees)
    # Simple duration: assume a 1-hour storm on any wet day, 0 hours otherwise
    daily_data['dur_hr'] = (daily_data['ppt_mm'] > 0).astype(float)
    # Simple peak intensity: precipitation depth divided by duration (0 on dry days)
    daily_data['ip_mm_hr'] = daily_data['ppt_mm'] / daily_data['dur_hr'].replace(0, 1)

    # --- Get metadata for the header ---
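    # Calculate representative lat/lon from the polygon's centroid.
    # Assumes the geometry is already in geographic coordinates (WGS84_CRS);
    # reproject the GeoDataFrame with .to_crs(WGS84_CRS) beforehand if it is not.
    centroid = area_info.geometry.centroid
    lat, lon = centroid.y, centroid.x
    station_name = area_info.get('Name', 'Unknown')  # Falls back to 'Unknown' if there is no 'Name' column
    num_years = daily_data['date'].dt.year.nunique()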

    # --- Write the file ---
    with open(out_path, 'w') as f:
        # --- Write Header ---
        f.write("WEPPCLI v1.0\n")
        f.write("#\n")
        f.write(f"# Climate file generated by pyWEPP on {pd.Timestamp.now().strftime('%Y-%m-%d')}\n")
        f.write(f"# Station: {station_name}\n")
        f.write("#\n")
        f.write("1 1\n") # Agent number, version
        f.write(f"{lat:.4f} {lon:.4f} 100.0\n") # Lat, Lon, Elevation (placeholder)
        f.write(f"{num_years} 0\n") # Number of years, 0

        # --- Write Daily Data ---
        for row in daily_data.itertuples():
            line = (
                f"{row.date.day:>4} {row.date.month:>4} {row.date.year:>6} "
                f"{row.ppt_mm:8.2f} {row.dur_hr:8.2f} {row.ip_mm_hr:8.2f} "
                f"{row.tmax_c:8.2f} {row.tmin_c:8.2f} {row.rad_ly:8.2f} "
                f"{row.wind_vel_ms:8.2f} {row.wind_dir_deg:8.2f}\n"
            )
            f.write(line)
            
    log.info(f"✅ Successfully wrote {out_path.name}")

log.warning("Placeholder cell: Formatting function is defined but not yet called.")