# Product Requirements Document (PRD): CHF-Automator

## 1. Executive Summary
**Project Name:** `CHF-Automator`  
**Objective:** Automate the "Crop Health Factor (CHF)" workflow (Murthy et al., 2022) for paddy crop insurance using Google Earth Engine (GEE).  
**Core Philosophy:** 
1. **Dynamic Masking:** Every year of analysis (Historic or Current) uses its own specific Crop Map to ensure spatial accuracy.
2. **Training vs. Application:** Weights are learned from the variance in **Historic Years only**, but CHF scores are calculated for **All Years (Historic + Current)**.

## 2. Technical Stack
* **Platform:** Google Earth Engine (Python API).
* **Environment:** Python 3.9+ (Local execution or Jupyter Notebook).
* **Libraries:** `earthengine-api`, `pandas`, `geopandas`, `numpy`.
* **Authentication:** GEE Project `nrscworks`.

## 3. Input Specifications
The tool will require the following inputs from the user during execution:

1.  **Insurance Units (Vector):** Path to Shapefile containing `Unit_ID` and `Strata_ID`. (Constant across all years).
2.  **Season Dates:** `season_start`, `season_end`, `peak_start`, `peak_end`. (Constant across all years).
3.  **Analysis Configuration:**
    * The user will provide a list of **All Years** to analyze (e.g., `[2018, 2019, 2020, 2021, 2022, 2023]`).
    * **Crucial:** For *each* year in this list, the user must map it to a specific **Crop Map Asset ID**.
    * **Training Subset:** The user must specify which of these years are "Historic/Training" years (e.g., 2018-2022) and which is the "Current/Assessment" year (e.g., 2023).
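
One possible shape for the analysis configuration is sketched below. The function name `split_years` and the asset ID pattern are hypothetical placeholders, not part of the specification; only the project name `nrscworks` and the example years come from this document.

```python
def split_years(crop_maps, training_years):
    """Validate the year configuration and split it into year groups.

    crop_maps: dict mapping each analysis year to its Crop Map Asset ID.
    training_years: the years the user marked as Historic/Training.
    """
    if not training_years:
        raise ValueError("At least one Historic/Training year is required.")
    missing = sorted(set(training_years) - set(crop_maps))
    if missing:
        raise ValueError(f"No Crop Map Asset ID for training years: {missing}")
    training = set(training_years)
    all_years = sorted(crop_maps)
    assessment_years = [y for y in all_years if y not in training]
    return all_years, sorted(training), assessment_years


# Hypothetical asset IDs, one crop map per year (illustration only).
CROP_MAPS = {
    year: f"projects/nrscworks/assets/paddy_map_{year}"
    for year in range(2018, 2024)
}

ALL_YEARS, TRAINING_YEARS, ASSESSMENT_YEARS = split_years(
    CROP_MAPS, list(range(2018, 2023))
)
```

Keying the crop maps by year makes the "one map per year" rule structural: a year without a map cannot be listed at all, and a training year without a map fails before any GEE call is made.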

## 4. The 8 Indicators (GEE Processing Logic)
For any given year `Y` and Crop Map `M`, calculate these 8 indicators for the **crop pixels only** within each `Unit_ID`.

| Indicator | Data Source | GEE Processing Logic |
| :--- | :--- | :--- |
| **1. Max NDVI** | Sentinel-2 (`COPERNICUS/S2_SR_HARMONIZED`) | **Masking:** Use band `MSK_CLDPRB` (cloud probability). Mask pixels where `MSK_CLDPRB > 20`. <br>**Calc:** Max NDVI `(NIR-Red)/(NIR+Red)` over `season_start` to `season_end`. |
| **2. Max LSWI** | Sentinel-2 (`COPERNICUS/S2_SR_HARMONIZED`) | **Masking:** Same as above (`MSK_CLDPRB > 20`). <br>**Calc:** Max LSWI `(NIR-SWIR1)/(NIR+SWIR1)` over `season_start` to `season_end`. |
| **3. Max Backscatter** | Sentinel-1 (`COPERNICUS/S1_GRD`) | **Filter:** `instrumentMode = IW`, `transmitterReceiverPolarisation` contains `VH`, `orbitProperties_pass = DESCENDING`. <br>**Process:** Apply Refined Lee Speckle Filter (5x5). <br>**Calc:** Max $\sigma^0$ (VH) over season. |
| **4. Integrated Backscatter** | Sentinel-1 (`COPERNICUS/S1_GRD`) | **Calc:** `.sum()` of VH backscatter over season (Area under the curve). |
| **5. Integrated FAPAR** | MODIS (`MODIS/061/MCD15A3H`) | **Band:** `Fpar`. <br>**Calc:** `.sum()` of `Fpar` over the **`peak_start` to `peak_end`** window only. <br>*Note: Handle the resolution mismatch (500 m MODIS vs. 10 m Sentinel-2) automatically.* |
| **6. Condition Variability** | Sentinel-2 | **Spatial CV:** Calculate the Coefficient of Variation spatially within the IU polygon. <br>**Formula:** `(Standard Deviation of Max_NDVI_Image / Mean of Max_NDVI_Image)` within the geometry. |
| **7. Rainy Days** | CHIRPS Daily (`UCSB-CHG/CHIRPS/DAILY`) | **Calc:** Count days where `precipitation > 2.5mm` between `season_start` and `season_end`. |
| **8. Adjusted Rainfall** | CHIRPS Daily | **Step A:** Calc Current Year Total Rain. <br>**Step B (Normal):** Calc 10-year average total rain for the same season dates (Pre-calculated constant). <br>**Step C (Capping):** If `Current` > (1.5 * `Normal`), set value to (1.5 * `Normal`). Else use `Current`. |
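
Once the per-unit values are extracted, indicators 6 to 8 reduce to simple arithmetic. The sketch below shows that arithmetic in plain Python on already-extracted numbers, not in GEE. The function names are illustrative, and using the population standard deviation for the CV is an assumption, since the table does not say which variant is intended.

```python
from statistics import mean, pstdev


def spatial_cv(max_ndvi_pixels):
    """Indicator 6: std / mean of the Max NDVI pixels inside one unit.

    Assumes the population standard deviation (not specified in the PRD).
    """
    avg = mean(max_ndvi_pixels)
    if avg == 0:
        return float("nan")  # CV is undefined for a zero mean
    return pstdev(max_ndvi_pixels) / avg


def rainy_days(daily_precip_mm, threshold_mm=2.5):
    """Indicator 7: number of days with precipitation strictly above 2.5 mm."""
    return sum(1 for p in daily_precip_mm if p > threshold_mm)


def adjusted_rainfall(current_total_mm, normal_total_mm, cap_factor=1.5):
    """Indicator 8: seasonal total, capped at 1.5 x the 10-year normal."""
    return min(current_total_mm, cap_factor * normal_total_mm)
```

For example, with a normal of 600 mm a season total of 1000 mm is capped to 900 mm, while a total of 500 mm is kept unchanged.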

## 5. Algorithmic Workflow

### Phase 1: Dynamic Data Extraction
1.  **Loop:** Iterate through *every* year (Historic + Current) provided by the user.
2.  **Fetch:** For each year, call GEE with that year's specific **Crop Map Asset**.
3.  **Compile:** Create a single Master DataFrame containing data for ALL years.
4.  **Output 1:** Export `raw_indicators_all_years.csv` (Columns: `Year`, `Unit_ID`, `Strata_ID`, + 8 Indicators).
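
The loop in Phase 1 might be organized as in the following sketch. The GEE extraction is represented by a caller-supplied function, here called `fetch_year`, which is a hypothetical stand-in that returns one record per `Unit_ID`. The standard `csv` module is used only to keep the example self-contained; the actual tool is expected to use `pandas`.

```python
import csv
import io


def build_master_rows(years, crop_maps, fetch_year):
    """Collect indicator records for every year into one flat list of rows."""
    rows = []
    for year in years:
        asset_id = crop_maps[year]  # each year uses its own crop map
        for record in fetch_year(year, asset_id):
            rows.append({"Year": year, **record})
    return rows


def rows_to_csv(rows, columns):
    """Serialize the master rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Passing the fetch function in as an argument lets the loop be tested with a fake fetcher, without authenticating against GEE.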

### Phase 2: Weight Generation (Training)
1.  **Filter:** Subset the Master DataFrame to include **Historic Years Only**.
2.  **Group:** Group by `Strata_ID`.
3.  **Entropy Calculation (Per Stratum):**
    * **Normalize** each indicator of the Historic Data (Min-Max).
    * Calculate **Entropy ($E$)**, **Divergence ($D$)**, and **Weight ($w$)** for each indicator.
    * *Edge Case:* If `Max == Min` (zero variance) for an indicator, set its Weight to 0.
4.  **Output 2:** Export `strata_weights.csv` (Columns: `Strata_ID`, `Weight_NDVI`, `Weight_LSWI`, ...).
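
The weight calculation for one stratum can be sketched as follows, assuming the standard entropy weight method: $p_i = r_i / \sum_i r_i$, $E = -\frac{1}{\ln n} \sum_i p_i \ln p_i$, $D = 1 - E$, and $w = D / \sum D$. Whether Murthy et al. (2022) use exactly this variant should be confirmed against the paper. The function names are illustrative.

```python
import math


def min_max(values):
    """Min-max normalize a list; return None when all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return None
    return [(v - lo) / (hi - lo) for v in values]


def entropy_weights(columns):
    """Compute entropy weights for one stratum.

    columns: dict mapping indicator name -> list of historic values.
    """
    divergence = {}
    for name, values in columns.items():
        norm = min_max(values)
        if norm is None:
            divergence[name] = 0.0  # edge case: zero variance -> weight 0
            continue
        total = sum(norm)
        k = 1.0 / math.log(len(values))
        probs = [v / total for v in norm]
        # Terms with p == 0 contribute nothing (0 * ln 0 is taken as 0).
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        divergence[name] = 1.0 - entropy
    d_sum = sum(divergence.values())
    if d_sum == 0:
        return {name: 0.0 for name in divergence}
    return {name: d / d_sum for name, d in divergence.items()}
```

An indicator whose historic values are all identical receives weight 0, and the remaining weights sum to 1.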

### Phase 3: CHF Calculation (All Years)
1.  **Input:** Use the Master DataFrame (All Years) from Phase 1.
2.  **Normalize:** Normalize the entire dataset using the per-stratum **Min/Max limits derived from the Historic Data** (Phase 2). This keeps the scale consistent across years.
3.  **Apply Weights:** For every row (Historic & Current), calculate: $$ CHF = \sum_{j=1}^{8} w_j \times NormVal_j $$ where $w_j$ is the stratum weight of indicator $j$ from Phase 2.
4.  **Output 3:** Export `chf_scores_all_years.csv` (Columns: `Year`, `Unit_ID`, `Strata_ID`, `CHF_Score`).
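
A minimal sketch of the Phase 3 scoring for one stratum is shown below, with illustrative function names. It applies the historic Min/Max limits to any row, historic or current. It does not clip the normalized values: the steps above do not mention clipping, so a current-year value outside the historic range produces a normalized value outside [0, 1]. Whether that is acceptable is an open design question.

```python
def historic_limits(historic_rows, indicators):
    """Return {indicator: (min, max)} computed from historic rows only."""
    return {
        name: (
            min(row[name] for row in historic_rows),
            max(row[name] for row in historic_rows),
        )
        for name in indicators
    }


def chf_score(row, limits, weights):
    """Weighted sum of indicators normalized with the historic limits."""
    score = 0.0
    for name, weight in weights.items():
        lo, hi = limits[name]
        if hi == lo:
            continue  # zero-variance indicator carries weight 0
        score += weight * (row[name] - lo) / (hi - lo)
    return score
```

For instance, with historic NDVI in [0.4, 0.8], historic rainfall in [100, 300], and weights 0.6 and 0.4, a current row with NDVI 0.6 and rainfall 350 scores 0.6 × 0.5 + 0.4 × 1.25 = 0.8.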

## 6. Project Structure

```text
CHF-software/
├── inputs/
│   └── (User shapefiles go here)
├── outputs/
│   ├── raw_indicators_all_years.csv
│   ├── strata_weights.csv
│   └── chf_scores_all_years.csv
├── src/
│   ├── __init__.py
│   ├── gee_utils.py       # Helper functions (Cloud masking, Filtering, Band Math)
│   ├── data_fetcher.py    # Class to handle the Multi-Year Loop
│   └── chf_engine.py      # Pandas logic for Entropy and Weights
├── main.py                # Main execution script
└── requirements.txt
```