# Product Requirements Document (PRD): CHF-Automator

## 1. Executive Summary
**Project Name:** `CHF-Automator`  
**Objective:** Automate the "Crop Health Factor (CHF)" workflow (Murthy et al., 2022) for paddy crop insurance using Google Earth Engine (GEE).  
**Core Logic:** The tool processes multi-year satellite data to "learn" an importance weight for each crop health indicator using the Entropy method, then applies those weights to assess the current season's risk.

## 2. Technical Stack
* **Platform:** Google Earth Engine (Python API).
* **Environment:** Python 3.9+ (Local execution or Jupyter Notebook).
* **Libraries:** `earthengine-api`, `pandas`, `geopandas`, `numpy`.
* **Authentication:** GEE Project `nrscworks`.
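
A minimal sketch of how a session might be initialized against the `nrscworks` project. `ee.Initialize(project=...)` and `ee.Authenticate()` are part of the `earthengine-api` package; the `init_gee` wrapper and its `ee_module` parameter (a seam for testing without credentials) are hypothetical names introduced for illustration.

```python
def init_gee(project: str = "nrscworks", ee_module=None) -> None:
    """Initialize Earth Engine for `project`, authenticating first if needed."""
    if ee_module is None:
        import ee as ee_module  # requires the earthengine-api package

    try:
        ee_module.Initialize(project=project)
    except Exception:
        # Assumption: any failure here means credentials are missing or stale,
        # so run the interactive authentication flow once and retry.
        ee_module.Authenticate()
        ee_module.Initialize(project=project)
```

Calling `init_gee()` once at the top of `main.py` or in the first notebook cell should be enough; the broad `except` is a simplification that a production version would likely narrow.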

## 3. Input Specifications
The user must provide the following inputs in the configuration:

1.  **Insurance Units (Vector):**
    * A Shapefile or FeatureCollection containing the boundaries.
    * **Required Attributes:**
        * `Unit_ID`: Unique identifier for the village/unit.
        * `Strata_ID`: Identifier for the homogeneous group the unit belongs to.
2.  **Crop Mask (Raster):**
    * A GEE Asset (GeoTIFF) where Pixel Value 1 = Crop, 0 = Non-Crop.
3.  **Training Years (List):**
    * A list of historical years used to calculate the Entropy Weights (e.g., `[2018, 2019, 2020, 2021, 2022]`).
    * *Note: Include both good and bad years so that the training data captures the full range of variance.*
4.  **Assessment Year (Integer):**
    * The specific target year for which CHF is to be calculated (e.g., `2023`).
5.  **Season Dates:**
    * `season_start`: Start of the crop season as `MM-DD` (e.g., `'06-15'` for June 15).
    * `season_end`: End of the crop season as `MM-DD` (e.g., `'11-30'` for Nov 30).
    * `peak_start`: Start of the active vegetative phase, used for FAPAR (e.g., `'09-01'`).
    * `peak_end`: End of the active vegetative phase, used for FAPAR (e.g., `'11-15'`).
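
The inputs above could be gathered in a single configuration object. The sketch below is one possible shape, not a prescribed one: the class name `ChfConfig`, its field names, and the example paths are assumptions. It also shows how the `MM-DD` strings might be expanded into full dates for a given year. Note that GEE's `filterDate` treats the end date as exclusive, so a caller may want to add one day to the end of the window.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class ChfConfig:
    """Hypothetical container for the user inputs listed above."""
    units_path: str          # Shapefile path or FeatureCollection asset ID
    crop_mask_asset: str     # GEE asset ID of the 1/0 crop mask
    training_years: List[int]
    assessment_year: int
    season_start: str = "06-15"  # all dates are MM-DD
    season_end: str = "11-30"
    peak_start: str = "09-01"
    peak_end: str = "11-15"

    def window(self, year: int, peak: bool = False) -> Tuple[str, str]:
        """Return (start, end) ISO dates for the season or the FAPAR peak window."""
        if peak:
            start_mmdd, end_mmdd = self.peak_start, self.peak_end
        else:
            start_mmdd, end_mmdd = self.season_start, self.season_end
        start = date.fromisoformat(f"{year}-{start_mmdd}")
        end = date.fromisoformat(f"{year}-{end_mmdd}")
        if end <= start:
            raise ValueError(f"window end {end} is not after start {start}")
        return start.isoformat(), end.isoformat()


cfg = ChfConfig(
    units_path="inputs/units.shp",                      # hypothetical path
    crop_mask_asset="projects/example/assets/crop_mask",  # hypothetical asset ID
    training_years=[2018, 2019, 2020, 2021, 2022],
    assessment_year=2023,
)
print(cfg.window(2023))             # ('2023-06-15', '2023-11-30')
print(cfg.window(2023, peak=True))  # ('2023-09-01', '2023-11-15')
```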

## 4. The 8 Indicators (GEE Processing Logic)
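The tool must compute these 8 indicators for each `Unit_ID`, averaging over the **crop pixels only**. Two exceptions: the rainfall indicators are not crop-masked, and Condition Variability is a spatial coefficient of variation rather than a mean.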

| Indicator | Data Source | GEE Processing Logic |
| :--- | :--- | :--- |
| **1. Max NDVI** | Sentinel-2 (`COPERNICUS/S2_SR_HARMONIZED`) | **Masking:** Use band `MSK_CLDPRB`. Mask out pixels where `MSK_CLDPRB > 20`. <br>**Calc:** Max NDVI over `season_start` to `season_end`. |
| **2. Max LSWI** | Sentinel-2 (`COPERNICUS/S2_SR_HARMONIZED`) | **Masking:** Same as above (`MSK_CLDPRB > 20`). <br>**Calc:** Max LSWI `(NIR-SWIR1)/(NIR+SWIR1)` over season. |
| **3. Max Backscatter** | Sentinel-1 (`COPERNICUS/S1_GRD`) | **Filter:** `instrumentMode = IW`, `transmitterReceiverPolarisation` contains `VH`, `orbitProperties_pass = DESCENDING` (property names as they appear in the GEE catalog). <br>**Process:** Apply Refined Lee Speckle Filter (5x5). <br>**Calc:** Max $\sigma^0$ (VH) over season. |
| **4. Integrated Backscatter** | Sentinel-1 (`COPERNICUS/S1_GRD`) | **Calc:** `.sum()` of VH backscatter over season (Area under the curve). |
| **5. Integrated FAPAR** | MODIS (`MODIS/061/MCD15A3H`) | **Band:** `Fpar`. <br>**Window:** Apply `.sum()` over the **`peak_start` to `peak_end`** window only, not the full season. <br>*Note: The tool must handle the resolution mismatch between MODIS (500 m) and Sentinel-2 automatically.* |
| **6. Condition Variability** | Sentinel-2 | **Spatial CV:** Calculate the Coefficient of Variation spatially within the IU polygon. <br>**Formula:** `(Standard Deviation of Max_NDVI_Image / Mean of Max_NDVI_Image)` within the geometry. |
| **7. Rainy Days** | CHIRPS Daily (`UCSB-CHG/CHIRPS/DAILY`) | **Calc:** Count days where `precipitation > 2.5mm` between `season_start` and `season_end`. |
| **8. Adjusted Rainfall** | CHIRPS Daily | **Step A:** Calc Current Year Total Rain. <br>**Step B (Normal):** Calc 10-year average total rain for the same season dates. <br>**Step C (Capping):** If `Current` > (1.5 * `Normal`), set value to (1.5 * `Normal`). Else use `Current`. |
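
The arithmetic behind the two rainfall indicators (rows 7 and 8) can be illustrated in plain Python on a list of daily totals. This is only a sketch of the rule itself: in the tool the same logic would run server-side on CHIRPS images, and the function and constant names here are hypothetical.

```python
from typing import Sequence

RAINY_DAY_THRESHOLD_MM = 2.5  # a day counts as rainy only above this value
CAP_FACTOR = 1.5              # current rainfall is capped at 1.5 x Normal


def rainy_days(daily_mm: Sequence[float]) -> int:
    """Count days whose precipitation is strictly greater than 2.5 mm."""
    return sum(1 for mm in daily_mm if mm > RAINY_DAY_THRESHOLD_MM)


def adjusted_rainfall(current_total: float, historical_totals: Sequence[float]) -> float:
    """Cap the current seasonal total at 1.5 x the historical mean (Steps A to C)."""
    if not historical_totals:
        raise ValueError("at least one historical seasonal total is required")
    normal = sum(historical_totals) / len(historical_totals)
    return min(current_total, CAP_FACTOR * normal)


daily = [0.0, 2.5, 3.1, 12.0, 0.4]
print(rainy_days(daily))                          # 2 (exactly 2.5 mm does not count)
print(adjusted_rainfall(1800.0, [1000.0] * 10))   # 1500.0 (capped)
print(adjusted_rainfall(900.0, [1000.0] * 10))    # 900.0 (below the cap, unchanged)
```

The cap keeps an unusually wet season from registering as proportionally healthier than a normal one.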

## 5. Algorithmic Workflow

### Phase 1: GEE Data Extraction (`src/data_fetcher.py`)
This module handles all communication with Earth Engine.
1.  **Function `fetch_metrics(year, roi, crop_mask_asset)`:**
    * Connect to GEE.
    * Load Collections (S1, S2, MODIS, CHIRPS).
    * Apply the `MSK_CLDPRB` cloud mask to the optical (Sentinel-2) data, the speckle filter to the SAR (Sentinel-1) data, and the user crop mask to every indicator except the rainfall ones.
    * Reduce images to the 8 bands described above.
    * Run `reduceRegions` on the ROI to get tabular data.
    * **Return:** A Pandas DataFrame containing columns: `[Unit_ID, Strata_ID, NDVI, LSWI, ...]`
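
As an illustration of the extraction steps, the sketch below covers a single indicator (Max NDVI) rather than all eight. The function names `max_ndvi_by_unit` and `features_to_rows` are hypothetical, and the Sentinel-2 band choice (`B8` for NIR, `B4` for red) is the usual NDVI pairing rather than something this document specifies. `ee` is imported lazily so that the pure-Python helper can be used without Earth Engine credentials.

```python
from typing import Any, Dict, List


def max_ndvi_by_unit(start: str, end: str, roi, crop_mask_asset: str):
    """Server-side sketch: cloud-masked seasonal max NDVI, averaged per unit.

    `roi` is assumed to be an ee.FeatureCollection of insurance units.
    """
    import ee  # requires earthengine-api and a prior ee.Initialize()

    def mask_and_ndvi(img):
        clear = img.select("MSK_CLDPRB").lte(20)  # drop pixels with MSK_CLDPRB > 20
        return img.updateMask(clear).normalizedDifference(["B8", "B4"]).rename("NDVI")

    crop = ee.Image(crop_mask_asset).eq(1)  # 1 = crop, 0 = non-crop
    max_ndvi = (
        ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
        .filterBounds(roi)
        .filterDate(start, end)  # end date is exclusive
        .map(mask_and_ndvi)
        .max()
        .updateMask(crop)
    )
    # Name the output property "NDVI" instead of the default "mean".
    reducer = ee.Reducer.mean().setOutputs(["NDVI"])
    return max_ndvi.reduceRegions(collection=roi, reducer=reducer, scale=10)


def features_to_rows(fc_info: Dict[str, Any], columns: List[str]) -> List[Dict[str, Any]]:
    """Flatten the dict returned by FeatureCollection.getInfo() into table rows."""
    return [
        {col: feat["properties"].get(col) for col in columns}
        for feat in fc_info["features"]
    ]
```

The rows returned by `features_to_rows(result.getInfo(), ["Unit_ID", "Strata_ID", "NDVI"])` can be passed straight to `pandas.DataFrame`. For large numbers of units, a table export may be a better fit than `getInfo()`.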

### Phase 2: Weight Training (`src/chf_engine.py`)
1.  Iterate through the **Training Years** list.
2.  Call `fetch_metrics` for each year.
3.  Concatenate all years into a single **Master Training DataFrame**.
4.  **Group by `Strata_ID`** and perform the following **independently for each group**:
    * **Normalization:**
        * Positive Indicators: $$ New = \frac{Val - Min}{Max - Min} $$
        * Negative Indicators (Variability): $$ New = \frac{Max - Val}{Max - Min} $$
    * **Entropy Calculation:**
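        * Probability: $$ P_{ij} = \frac{X_{ij}}{\sum_{i=1}^{n} X_{ij}} $$ where $X_{ij}$ is the normalized value of indicator $j$ for observation $i$ (one unit in one training year) and $n$ is the number of observations in the stratum.
        * Entropy: $$ E_j = -k \sum_{i=1}^{n} P_{ij} \ln(P_{ij}) $$ where $$ k = \frac{1}{\ln(n)} $$ and any term with $P_{ij} = 0$ is taken as 0.
        * Divergence: $$ D_j = 1 - E_j $$
        * Final Weight: $$ w_j = \frac{D_j}{\sum_{m=1}^{8} D_m} $$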
    * **Edge Case Handling:** If an indicator has zero variance (Max == Min), the normalization would divide by zero; skip it and force that indicator's Weight = 0.
5.  **Output:** A dictionary or JSON file mapping each `Strata_ID` to its calculated weights, together with the per-indicator Min/Max values that Phase 3 needs for normalization.
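
The per-stratum calculation can be sketched in plain Python as follows. The input is one stratum's training data as a mapping from indicator name to a list of values; the function names and the `"CV"` column name for Condition Variability are assumptions. In the tool itself the same arithmetic would presumably run on the grouped DataFrame.

```python
import math
from typing import Dict, List, Sequence

NEGATIVE_INDICATORS = {"CV"}  # hypothetical column name for Condition Variability


def min_max(values: Sequence[float], negative: bool = False) -> List[float]:
    """Min-max normalize; negative indicators are inverted so higher is better."""
    lo, hi = min(values), max(values)
    if hi == lo:  # zero variance: nothing to scale
        return [0.0] * len(values)
    if negative:
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


def entropy_weights(table: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Entropy weights for one stratum; zero-variance indicators get weight 0."""
    divergence: Dict[str, float] = {}
    for name, values in table.items():
        norm = min_max(values, negative=name in NEGATIVE_INDICATORS)
        total, n = sum(norm), len(norm)
        if total == 0 or n < 2:
            divergence[name] = 0.0  # edge case: Max == Min
            continue
        probs = [x / total for x in norm]
        # Terms with p == 0 contribute 0 (the limit of p * ln p as p -> 0).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergence[name] = 1.0 - entropy
    d_sum = sum(divergence.values())
    if d_sum == 0:
        return {name: 0.0 for name in divergence}
    return {name: d / d_sum for name, d in divergence.items()}


stratum = {
    "NDVI": [0.2, 0.4, 0.6, 0.8],
    "LSWI": [0.3, 0.3, 0.3, 0.3],  # zero variance
    "CV": [0.1, 0.1, 0.1, 0.5],
}
weights = entropy_weights(stratum)
print(weights["LSWI"])  # 0.0
```

Applying `entropy_weights` once per `Strata_ID` and serializing the result with `json.dump` would produce the output described in step 5.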

### Phase 3: Assessment & CHF Computation (`main.py`)
1.  Call `fetch_metrics` for the **Assessment Year** (e.g., 2023).
2.  **Normalization:** Normalize the assessment-year data using the **Min/Max values derived from the Training Phase** for the unit's stratum. (Do not re-calculate Min/Max on the assessment year alone, or the scale will be mismatched.)
3.  **Apply Weights:** Multiply each normalized value by the corresponding weight saved in Phase 2 for the unit's `Strata_ID`.
4.  **Final Sum:** $$ CHF = \sum_{j=1}^{8} (NormVal_j \times Weight_j) $$
5.  **Export:** Save `CHF_Results_<assessment_year>.csv` (e.g., `CHF_Results_2023.csv`) containing `Unit_ID`, `Strata_ID`, Raw Metrics, and Final CHF.
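
The assessment step for a single unit might look like the sketch below. All names are hypothetical. One point the steps above leave open is what to do when an assessment-year value falls outside the training Min/Max; the sketch clips the normalized value to [0, 1], which is an assumption and not a stated requirement.

```python
from typing import Dict, FrozenSet, Tuple


def normalize_with_training_range(
    value: float, lo: float, hi: float, negative: bool = False
) -> float:
    """Scale `value` using the training-phase Min (`lo`) and Max (`hi`)."""
    if hi == lo:
        return 0.0  # zero-variance indicator; its weight is 0 anyway
    scaled = (hi - value) / (hi - lo) if negative else (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))  # assumption: clip out-of-range values


def compute_chf(
    metrics: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
    weights: Dict[str, float],
    negative: FrozenSet[str] = frozenset({"CV"}),
) -> float:
    """CHF = sum over indicators of (normalized value x weight)."""
    return sum(
        weight
        * normalize_with_training_range(
            metrics[name], *ranges[name], negative=name in negative
        )
        for name, weight in weights.items()
    )


# Worked example with two indicators for one unit.
weights = {"NDVI": 0.6, "CV": 0.4}                     # from Phase 2, for this stratum
ranges = {"NDVI": (0.2, 0.8), "CV": (0.1, 0.5)}        # training Min/Max
metrics = {"NDVI": 0.65, "CV": 0.2}                    # assessment-year values
print(round(compute_chf(metrics, ranges, weights), 3))  # 0.75
```

Here NDVI normalizes to (0.65 - 0.2) / 0.6 = 0.75 and CV to (0.5 - 0.2) / 0.4 = 0.75, so CHF = 0.6 × 0.75 + 0.4 × 0.75 = 0.75.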

## 6. Project Structure

```text
CHF-Automator/
├── inputs/
│   └── (User shapefiles go here)
├── outputs/
│   └── (Results CSVs go here)
├── src/
│   ├── __init__.py
│   ├── gee_utils.py       # Helper functions (Cloud masking, Filtering, Band Math)
│   ├── data_fetcher.py    # Class to handle the GEE extraction loop
│   └── chf_engine.py      # Pandas logic for Entropy and Weights
├── main.py                # Main execution script
└── requirements.txt
```