# Preprocessing Guide: RainfallProcessor

This notebook describes the `RainfallProcessor`, a component designed for spatial interpolation of rainfall data. 

**Note:** As of the current version, this feature is incomplete: the interpolation strategy classes it depends on are missing from the repository, so it cannot be run. This document serves as a record of its intended design and a guide for future development.

## 1. Intended Purpose

Hydrological simulations often require rainfall data for specific points (like the center of a sub-basin), but real-world data usually comes from a sparse network of rain gauges.

The `RainfallProcessor` was designed to bridge this gap. Its purpose is to take data from multiple rain gauges and use a mathematical interpolation strategy (like Inverse Distance Weighting or Thiessen Polygons) to estimate the rainfall at any number of target locations. 

This process is intended to be run once, before the main simulation loop, to generate a complete rainfall time series for all necessary locations.
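
As a rough illustration of what such a strategy computes, here is a minimal sketch of Inverse Distance Weighting for one target point and one time step. The function name `idw_estimate`, its signature, and the rainfall values are assumptions made for this example; none of them come from `chs_sdk`. The gauge coordinates match those used in the hypothetical configuration later in this guide.

```python
import math


def idw_estimate(gauges, target, power=2.0):
    """Estimate rainfall at `target` from ((x, y), value) gauge pairs.

    Each gauge is weighted by 1 / distance**power, so nearer gauges
    contribute more to the estimate.
    """
    if not gauges:
        raise ValueError("at least one gauge is required")

    weighted_sum = 0.0
    weight_total = 0.0
    for (gx, gy), value in gauges:
        distance = math.hypot(target[0] - gx, target[1] - gy)
        if distance == 0.0:
            # The target sits exactly on a gauge: use its reading directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Illustrative readings (made up) for the two gauges at one time step.
gauges = [((10, 20), 4.0), ((50, 60), 8.0)]
midpoint_estimate = idw_estimate(gauges, (30, 40))
```

Because the point `(30, 40)` is equidistant from both gauges, the two weights are equal and the estimate is the plain average of the two readings. A higher `power` makes the result follow the nearest gauge more closely.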

## 2. Current Status & Missing Components

The `RainfallProcessor` class exists in `chs_sdk.preprocessing.rainfall_processor`. However, it requires an **interpolation strategy** object to be passed to its constructor. 

The `ComponentRegistry` in `simulation_manager.py` references classes such as `InverseDistanceWeightingInterpolator` in the module `chs_sdk.preprocessing.interpolators`, which corresponds to a file named `interpolators.py` in the `chs_sdk/preprocessing/` package.

**This file is currently missing from the repository.**
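
Until the module is added, code that depends on it could guard against its absence instead of failing at import time. The helper below is a small sketch of such a check; the function name `interpolators_available` is an assumption for this example and is not part of `chs_sdk`.

```python
import importlib


def interpolators_available(module_name="chs_sdk.preprocessing.interpolators"):
    """Return True if the interpolators module can be imported."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Also covers ModuleNotFoundError, which subclasses ImportError.
        return False
    return True


if not interpolators_available():
    print("Interpolation strategies are missing; RainfallProcessor cannot be used.")
```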

Without these interpolation strategy classes, the `RainfallProcessor` cannot be instantiated or used. The configuration in Section 3 is a hypothetical example of how it *would* be used if the missing components were implemented.

## 3. Hypothetical Code Example

Below is an example of how one might configure and run the `RainfallProcessor` in a YAML file for the `SimulationManager`. **This code will not run.**

```yaml
# HYPOTHETICAL YAML CONFIG - WILL NOT RUN

datasets:
  rain_gauges:
    - id: 'gauge_A'
      coords: [10, 20]
      time_series_path: 'data/rain_gauge_A.csv'
    - id: 'gauge_B'
      coords: [50, 60]
      time_series_path: 'data/rain_gauge_B.csv'

components:
  rainfall_interpolator:
    type: RainfallProcessor
    params:
      source_dataset: 'rain_gauges' # Key from the datasets section
      strategy:
        type: InverseDistanceWeightingInterpolator # This class is missing
        params:
          power: 2

  my_hydrology_model:
    type: SemiDistributedHydrologyModel
    # ... other params
    # This model would then be able to access the interpolated rainfall
    # from the 'rainfall_interpolator' component.

# The processor would be run before the main simulation
preprocessing:
  - rainfall_interpolator

execution_order:
  - my_hydrology_model
```


## 4. Path Forward

To make this feature functional, the following steps are required:
1. Create the `water_system_sdk/src/chs_sdk/preprocessing/interpolators.py` file.
2. Implement the `BaseSpatialInterpolator` abstract base class.
3. Implement concrete strategies like `InverseDistanceWeightingInterpolator` and `ThiessenPolygonInterpolator`.
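
One possible shape for steps 2 and 3 is sketched below. The class names come from this guide, but the `interpolate` method, its parameters, and the per-time-step calling convention are assumptions: the interface that `RainfallProcessor` actually expects from a strategy object should be checked against `rainfall_processor.py` before implementing. Only the Thiessen strategy is shown; an Inverse Distance Weighting strategy would subclass the same base.

```python
import math
from abc import ABC, abstractmethod


class BaseSpatialInterpolator(ABC):
    """Hypothetical strategy interface (method name and signature assumed)."""

    @abstractmethod
    def interpolate(self, gauge_coords, gauge_values, target):
        """Return the estimated rainfall at `target` for one time step."""


class ThiessenPolygonInterpolator(BaseSpatialInterpolator):
    """Assigns each target the reading of its nearest gauge.

    A point lies inside the Thiessen polygon of the gauge closest to it,
    so for point targets a nearest-gauge lookup gives the same result
    without building the polygons explicitly.
    """

    def interpolate(self, gauge_coords, gauge_values, target):
        if not gauge_coords or len(gauge_coords) != len(gauge_values):
            raise ValueError("need one value per gauge and at least one gauge")
        nearest = min(
            range(len(gauge_coords)),
            key=lambda i: math.hypot(
                target[0] - gauge_coords[i][0], target[1] - gauge_coords[i][1]
            ),
        )
        return gauge_values[nearest]


# Usage with the gauge coordinates from the hypothetical configuration
# and made-up readings for a single time step.
coords = [(10, 20), (50, 60)]
readings = [4.0, 8.0]
strategy = ThiessenPolygonInterpolator()
near_gauge_a = strategy.interpolate(coords, readings, (12, 25))
```

Keeping the strategies behind a common abstract base class means the processor can accept any of them through its constructor without knowing which interpolation method is in use.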