# Treatment and control selection threshold

Defining control grids by the share of each grid cell that overlaps treatment and control areas allows a more nuanced classification. The sections below outline several ways to refine the control grid definition:

In [1]:
import os

In [2]:
country = 'KHM'
os.chdir('/Users/Daniel/Library/CloudStorage/GoogleDrive-dwiesner@sig-gis.com/My Drive/DISES/batched-predictions-branch')

In [3]:
import geopandas as gpd
import matplotlib.patches as mpatches
import contextily as ctx
import matplotlib.pyplot as plt
import pandas as pd
from shapely.validation import make_valid
import functions as fn

In [4]:
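# Load the shapefile with geographic covariates.
# NOTE: `data_folder` must be defined before this cell runs; it is not set anywhere above.
initial_shape_path = os.path.join(data_folder, country, 'panel/panel-khm-02082024-v2.shp')
gdf = gpd.read_file(initial_shape_path)
project_crs = 'EPSG:32648'  # WGS 84 / UTM zone 48N: a metric CRS suited to Cambodia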

NameError: name 'data_folder' is not defined

In [None]:
#Filter for only the necessary covariates
gdf = gdf[['grid_id', 'cf_buffer5', 'geometry']]

In [None]:
gdf = gdf.to_crs(project_crs)

In [None]:
#Load treatment and control from Roberto
tc = gpd.read_file('/Users/Daniel/Library/CloudStorage/GoogleDrive-dwiesner@sig-gis.com/.shortcut-targets-by-id/1Y83sGckPnURtqsg-y0FRgK1eOjNe7TSz/DISES shared/Data/Postmatching/treat_controls/CountryPSM_relevant.shp')

# Reproject to the project CRS
tc = tc.to_crs(project_crs)

### **1. Using Overlap Percentages**
Calculate the proportion of each grid cell covered by treatment areas and by control areas, then classify each cell against a threshold:

1. **Calculate Area Overlap**:
   - Use `geopandas.overlay()` or `shapely` to calculate the intersection areas between grids and treatment/control areas.
   - Divide each intersection area by the grid cell's total area to get its overlap share.

2. **Define Rules**:
   - Classify a grid cell as control if a large share of it (e.g., >50%) overlaps a control area and its overlap with treatment areas stays below a chosen threshold.
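
The two steps above can be sketched on a toy case with `shapely` alone. The rectangles, the helper names `overlap_fraction` and `is_control`, and the 0.5 and 0.1 cut-offs are illustrative assumptions, not values taken from the project data.

```python
from shapely.geometry import box
from shapely.ops import unary_union


def overlap_fraction(cell, areas):
    """Share of `cell` covered by the union of `areas`, between 0 and 1."""
    if not areas:
        return 0.0
    # Union first so overlapping polygons are not counted twice
    covered = cell.intersection(unary_union(areas))
    return covered.area / cell.area


def is_control(c_frac, t_frac, min_control=0.5, max_treatment=0.1):
    """Hypothetical rule: mostly control and almost no treatment."""
    return c_frac > min_control and t_frac < max_treatment


# Hypothetical 10 x 10 grid cell in a metric CRS
cell = box(0, 0, 10, 10)
control = [box(0, 0, 5, 10), box(3, 0, 6, 10)]  # union spans x from 0 to 6
treatment = [box(8, 0, 12, 10)]                 # covers x from 8 to 10 of the cell

c_frac = overlap_fraction(cell, control)    # 60 of 100 area units
t_frac = overlap_fraction(cell, treatment)  # 20 of 100 area units
print(c_frac, t_frac, is_control(c_frac, t_frac))
```

Here the cell is 60% control but also 20% treatment, so it fails the hypothetical 10% treatment cap and is not classified as control.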


In [None]:
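# Work on the grid and the treatment/control polygons under clearer names
# (these are references to gdf and tc, not copies)
grid_gdf = gdf
areas_gdf = tc

# Repair invalid geometries so the intersections below do not fail
grid_gdf['geometry'] = grid_gdf['geometry'].apply(make_valid)
areas_gdf['geometry'] = areas_gdf['geometry'].apply(make_valid)

# Split treatment and control areas
treatment_areas = areas_gdf[areas_gdf['treat'] == 1]
control_areas = areas_gdf[areas_gdf['treat'] == 0]

def safe_intersection(row, other_geometries):
    """Return the area of row.geometry covered by other_geometries (0 if none).

    The intersections are unioned before measuring so that overlapping
    polygons are not counted twice. Any geometry error counts as zero overlap.
    """
    try:
        result = other_geometries.intersection(row.geometry).union_all()
        return result.area if not result.is_empty else 0
    except Exception:
        return 0

# Intersection area of each grid cell with control and treatment areas
grid_gdf['C_Inter_Area'] = grid_gdf.apply(
    lambda row: safe_intersection(row, control_areas), axis=1
)
grid_gdf['T_Inter_Area'] = grid_gdf.apply(
    lambda row: safe_intersection(row, treatment_areas), axis=1
)

# Share of each cell covered, stored as a fraction between 0 and 1
grid_gdf['C_Percent'] = grid_gdf['C_Inter_Area'] / grid_gdf.geometry.area
grid_gdf['T_Percent'] = grid_gdf['T_Inter_Area'] / grid_gdf.geometry.area

# Save the gdf to a temporary location. Shapefile field names are capped at
# 10 characters, so the longer column names are truncated on write.
os.makedirs('temp_files', exist_ok=True)
grid_gdf.to_file('temp_files/grid_gdf.shp')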

In [None]:
grid_gdf = grid_gdf.to_crs('WGS84')

In [None]:
thresholds = [0.1, 0.15, 0.2, 0.25, 0.5, 0.75, 1] 

In [None]:
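for threshold in thresholds:
    # Codes: 3 = default (unclassified), 0 = control, 1 = treatment.
    # Treatment is assigned last, so it wins when both shares exceed the threshold.
    # The comparison is strict, so a fully covered cell (share of 1) does not pass a threshold of 1.
    grid_gdf['Treatment'] = 3  # Default
    grid_gdf.loc[grid_gdf['C_Percent'] > threshold, 'Treatment'] = 0
    grid_gdf.loc[grid_gdf['T_Percent'] > threshold, 'Treatment'] = 1

    # Format the threshold as a whole percentage (e.g. 0.15 -> 15%)
    title = f'Treatment and control grids with {threshold:.0%} overlap criterion'
    fn.plot_treatment_control_grids(grid_gdf, 'Treatment', 'WGS84', title)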
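
To see how the loop above behaves as the threshold rises, the same assignment order can be traced in plain Python on a few made-up cells. The function `label_cell` and the overlap shares are hypothetical and only mirror the control-then-treatment order of the loop.

```python
def label_cell(c_frac, t_frac, threshold):
    """Mirror the loop: default 3, then control (0), then treatment (1)."""
    label = 3
    if c_frac > threshold:
        label = 0
    if t_frac > threshold:
        label = 1  # assigned last, so treatment overrides control
    return label


# Hypothetical (control share, treatment share) pairs
cells = {
    'mostly_control': (0.80, 0.05),
    'mixed': (0.45, 0.40),
    'fully_treated': (0.00, 1.00),
}
thresholds = [0.1, 0.25, 0.5, 1]

labels = {
    name: [label_cell(c, t, th) for th in thresholds]
    for name, (c, t) in cells.items()
}
for name, row in labels.items():
    print(name, row)
# mostly_control [0, 0, 0, 3]
# mixed [1, 1, 3, 3]
# fully_treated [1, 1, 1, 3]
```

Two things stand out in this toy case: the `mixed` cell is labelled treatment at low thresholds even though its control share is larger, and no cell is classified at a threshold of 1 because the comparison is strict.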

### **2. Weighted Overlap-Based Classification**
Instead of applying a fixed threshold, score each grid cell by its overlap with control areas and with treatment areas, and let the larger score decide:

- Compute `Control_Score` and `Treatment_Score` (the `C_Percent` and `T_Percent` columns), each as
   \[
   \text{Score} = \frac{\text{Overlap Area}}{\text{Grid Area}}
   \]

- Assign each cell to whichever score dominates; cells with no overlap with either are left unclassified.
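
A minimal sketch of the dominance rule in plain Python, using hypothetical scores. The function name `assign_by_dominance` is an assumption, and ties are sent to control here, which is one possible convention.

```python
def assign_by_dominance(control_score, treatment_score):
    """Return 0 (control), 1 (treatment) or 3 (no overlap with either)."""
    if control_score == 0 and treatment_score == 0:
        return 3
    # A strict comparison sends ties to control (an assumed convention)
    return 1 if treatment_score > control_score else 0


# Hypothetical (Control_Score, Treatment_Score) pairs
examples = [(0.45, 0.40), (0.10, 0.30), (0.25, 0.25), (0.0, 0.0)]
assigned = [assign_by_dominance(c, t) for c, t in examples]
print(assigned)  # [0, 1, 0, 3]
```

Unlike a fixed threshold, this rule classifies every cell that touches any treatment or control area, however small the overlap.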

In [None]:
import numpy as np

In [None]:
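# Assign each cell to the class with the larger overlap share; cells with no overlap get 3.
# Ties go to control because idxmax returns the first column holding the maximum.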
grid_gdf['Treatment'] = np.where(
    (grid_gdf['C_Percent'] == 0) & (grid_gdf['T_Percent'] == 0),
    3,
    grid_gdf[['C_Percent', 'T_Percent']].idxmax(axis=1).map({'C_Percent': 0, 'T_Percent': 1})
)

In [None]:
title = 'Treatment and control grids with max. % criterion'
fn.plot_treatment_control_grids(grid_gdf, 'Treatment', 'WGS84', title)

### **3. Classification of Villages Based on Proximity**
