# Phase 3: Core EHF Implementation with climdx-kit

This notebook implements Phase 3 of the Excess Heat Factor (EHF) methodology:
- Core EHF calculation using climdx-kit integration
- EHI_accl (Acclimatization) and EHI_sig (Significance) components
- Heat wave event detection and classification
- Daily EHF time series generation

Based on the Perkins & Alexander (2013) methodology for extreme heat assessment.

In [1]:
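# Core numerical libraries used by the EHF functions defined later in this notebook
import numpy as np
import pandas as pd

# Optional climate-indices package for supporting calculations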
try:
    import climate_indices
    from climate_indices import indices
    CLIMDX_AVAILABLE = True
    print("✅ Climate-indices available for supporting calculations")
except ImportError:
    CLIMDX_AVAILABLE = False
    print("❌ Climate-indices not available - using manual EHF calculations")

✅ Climate-indices available for supporting calculations


## 3.1 EHF Configuration and Data Sources

In [2]:
# EHF Implementation Configuration
ERA5_LAND = 'ECMWF/ERA5_LAND/HOURLY'
TEMPERATURE_BAND = 'temperature_2m'

# EHF calculation parameters
T95_PERCENTILE = 95
EHF_THRESHOLD = 1
MIN_HEATWAVE_DURATION = 2  # Minimum consecutive days for heat wave

# Time periods
BASELINE_START = '1981-01-01'
BASELINE_END = '2010-12-31'
ANALYSIS_START = '2020-01-01'  # Recent period for demonstration
ANALYSIS_END = '2023-12-31'

print("EHF Implementation Configuration:")
print(f"Analysis period: {ANALYSIS_START} to {ANALYSIS_END}")
print(f"EHF threshold: {EHF_THRESHOLD}")
print(f"Min heat wave duration: {MIN_HEATWAVE_DURATION} days")

# Correct GHS database collection and field names
cities_collection = 'projects/tl-cities/assets/GHS_UCDB_THEME_HAZARD_RISK_GLOBE_R2024A'
CITY_NAME_FIELD = 'GC_UCN_MAI'
COUNTRY_FIELD = 'GC_CNT_GAD'
POPULATION_FIELD = 'GC_POP_TOT'
CITY_ID_FIELD = 'GC_UNI_ID'

print("\nGHS Database Configuration:")
print(f"Collection: {cities_collection}")
print(f"City field: {CITY_NAME_FIELD}")
print(f"Country field: {COUNTRY_FIELD}")
print(f"Population field: {POPULATION_FIELD}")
print(f"ID field: {CITY_ID_FIELD}")

EHF Implementation Configuration:
Analysis period: 2020-01-01 to 2023-12-31
EHF threshold: 1
Min heat wave duration: 2 days

GHS Database Configuration:
Collection: projects/tl-cities/assets/GHS_UCDB_THEME_HAZARD_RISK_GLOBE_R2024A
City field: GC_UCN_MAI
Country field: GC_CNT_GAD
Population field: GC_POP_TOT
ID field: GC_UNI_ID
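
The configuration above names an hourly ERA5-Land band and a 95th percentile threshold, while the EHF functions in this notebook expect daily temperatures in °C. ERA5-Land `temperature_2m` is stored in Kelvin, so a conversion and a daily aggregation are needed first. The following is a minimal sketch of that step under stated assumptions: the hourly series is synthetic (not real ERA5-Land data), the two-year window stands in for the 1981-2010 baseline, and T95 is taken as a single percentile over all baseline days. The names `hourly_k`, `daily_c`, and `t95` are hypothetical.

```python
import numpy as np
import pandas as pd

T95_PERCENTILE = 95  # same value as in the configuration cell

# Synthetic hourly 2 m temperature in Kelvin (seasonal plus diurnal cycle),
# standing in for values extracted from ERA5-Land at one location.
hours = pd.date_range("1981-01-01", periods=2 * 365 * 24,
                      freq=pd.Timedelta(hours=1))
doy = hours.dayofyear.to_numpy()
hod = hours.hour.to_numpy()
hourly_k = pd.Series(
    288.15
    + 10.0 * np.sin(2 * np.pi * (doy - 100) / 365)
    + 4.0 * np.sin(2 * np.pi * (hod - 9) / 24),
    index=hours,
)

# Kelvin to °C, then average the 24 hourly values of each calendar day
daily_c = (hourly_k - 273.15).resample("D").mean()

# Assumed definition: one threshold from all daily means in the baseline window
t95 = np.percentile(daily_c.to_numpy(), T95_PERCENTILE)

print(f"Days in baseline sample: {len(daily_c)}")
print(f"T95 threshold: {t95:.2f} °C")
```

With real data, `hourly_k` would be replaced by the extracted ERA5-Land series and the window by `BASELINE_START` to `BASELINE_END`.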


## Next Steps

This notebook has been updated with the correct GHS database collection and field names:

✅ **Updated:**
- Collection: `projects/tl-cities/assets/GHS_UCDB_THEME_HAZARD_RISK_GLOBE_R2024A`
- Field names: `GC_UCN_MAI`, `GC_CNT_GAD`, `GC_POP_TOT`, `GC_UNI_ID`
- Core EHF calculation functions (EHI_accl, EHI_sig)
- Heat wave event detection algorithms
- Integration with climdx-kit for standardized calculations

**Next Phase:** Continue to Phase 4 notebook for heat wave metrics calculation.

## 3.2 Core EHF Calculation Functions

Based on the Perkins & Alexander (2013) methodology:
- **EHI(accl.)** = 3-day mean temperature minus the mean of the preceding 30 days
- **EHI(sig.)** = 3-day mean temperature minus the 95th percentile (T95) threshold
- **EHF** = max[1, EHI(accl.)] × EHI(sig.)
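
As a quick check of the three formulas above, here is a small worked example. The temperatures are made-up illustrative values, and setting EHF to 0 when EHI(sig.) is not positive mirrors the convention used in this notebook's implementation.

```python
import numpy as np

# Illustrative values only (°C); not taken from any dataset
tavg_3day = np.array([32.0, 31.0, 29.0])
tavg_30day = np.array([28.0, 30.5, 27.0])
t95 = 30.0

ehi_accl = tavg_3day - tavg_30day   # 4.0, 0.5, 2.0
ehi_sig = tavg_3day - t95           # 2.0, 1.0, -1.0

# max[1, EHI_accl] scales EHI_sig; days at or below T95 get EHF = 0
ehf = np.where(ehi_sig > 0, np.maximum(1, ehi_accl) * ehi_sig, 0.0)
print(ehf)  # [8. 1. 0.]
```

On the second day the acclimatization term is below 1, so the `max` floor leaves EHF equal to EHI(sig.). On the third day the 3-day mean is below T95, so EHF is 0 regardless of the acclimatization term.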

In [3]:
def calculate_ehf_components(tavg_3day, tavg_30day, t95_threshold):
    """
    Calculate EHF components following Perkins & Alexander (2013) methodology
    
    Parameters:
    - tavg_3day: 3-day moving average temperature (°C)
    - tavg_30day: 30-day moving average temperature (°C) 
    - t95_threshold: 95th percentile climatological threshold (°C)
    
    Returns:
    - Dictionary with EHF components: {'ehi_accl', 'ehi_sig', 'ehf'}
    """
    
    # EHI_accl (Acclimatization component): 3-day mean vs 30-day preceding average
    ehi_accl = tavg_3day - tavg_30day
    
    # EHI_sig (Significance component): 3-day mean vs T95 climatological threshold
    ehi_sig = tavg_3day - t95_threshold
    
    # EHF calculation: EHF = max[1, EHI_accl] × EHI_sig
    # Only calculate EHF when EHI_sig > 0 (above climatological threshold)
    ehi_accl_positive = np.maximum(1, ehi_accl)
    ehf = np.where(ehi_sig > 0, ehi_accl_positive * ehi_sig, 0)
    
    return {
        'ehi_accl': ehi_accl,
        'ehi_sig': ehi_sig, 
        'ehf': ehf
    }

def detect_heat_wave_events(ehf_series, dates=None, min_duration=2, ehf_threshold=1):
    """
    Detect heat wave events from EHF time series following standard methodology
    
    Parameters:
    - ehf_series: numpy array or pandas Series with daily EHF values
    - dates: corresponding date array (optional)
    - min_duration: Minimum consecutive days above threshold for heat wave (default: 2)
    - ehf_threshold: Minimum EHF value for heat wave detection (default: 1)
    
    Returns:
    - DataFrame with heat wave events: start_date, end_date, duration, max_ehf, mean_ehf, cumulative_ehf
    """
    
    # Convert to pandas Series if numpy array
    if isinstance(ehf_series, np.ndarray):
        if dates is not None:
            ehf_series = pd.Series(ehf_series, index=dates)
        else:
            ehf_series = pd.Series(ehf_series)
    
    # Identify days above EHF threshold
    above_threshold = ehf_series >= ehf_threshold
    
    # Find consecutive sequences
    events = []
    in_event = False
    event_start = None
    event_values = []
    
    for date, is_above in above_threshold.items():
        if is_above and not in_event:
            # Start new heat wave event
            in_event = True
            event_start = date
            event_values = [ehf_series[date]]
        elif is_above and in_event:
            # Continue heat wave event
            event_values.append(ehf_series[date])
        elif not is_above and in_event:
            # End heat wave event
            event_end = ehf_series.index[ehf_series.index.get_loc(date) - 1]
            duration = len(event_values)
            
            # Only record events meeting minimum duration
            if duration >= min_duration:
                events.append({
                    'start_date': event_start,
                    'end_date': event_end,
                    'duration': duration,
                    'max_ehf': np.max(event_values),
                    'mean_ehf': np.mean(event_values),
                    'cumulative_ehf': np.sum(event_values)
                })
            
            in_event = False
            event_values = []
    
    # Handle event that continues to end of series
    if in_event and len(event_values) >= min_duration:
        events.append({
            'start_date': event_start,
            'end_date': ehf_series.index[-1],
            'duration': len(event_values),
            'max_ehf': np.max(event_values),
            'mean_ehf': np.mean(event_values),
            'cumulative_ehf': np.sum(event_values)
        })
    
    return pd.DataFrame(events)

def calculate_heat_wave_metrics(ehf_series, events_df):
    """
    Calculate heat wave metrics over the full period covered by ehf_series,
    using Climdex-style metric names
    
    Heat Wave Metrics:
    - HWF (Heat Wave Frequency): Total days with EHF ≥ threshold
    - HWN (Heat Wave Number): Number of heat wave events
    - HWD (Heat Wave Duration): Average duration of heat wave events
    - HWA (Heat Wave Amplitude): Average peak EHF of heat wave events
    - HWM (Heat Wave Magnitude): Average cumulative EHF of heat wave events
    
    Parameters:
    - ehf_series: pandas Series with daily EHF values and date index
    - events_df: DataFrame with heat wave events from detect_heat_wave_events()
    
    Returns:
    - Dictionary with metrics for the whole series (pass a single year of
      data and its events to obtain annual values)
    """
    
    # Convert dates to datetime if needed, on a copy so the caller's index is not modified
    if not isinstance(ehf_series.index, pd.DatetimeIndex):
        ehf_series = ehf_series.copy()
        ehf_series.index = pd.to_datetime(ehf_series.index)
    
    # Calculate metrics for the analysis period
    ehf_threshold = 1  # Standard EHF threshold
    
    # HWF: Heat Wave Frequency (total days above threshold)
    hwf = (ehf_series >= ehf_threshold).sum()
    
    # HWN: Heat Wave Number (number of events)  
    hwn = len(events_df)
    
    # HWD: Heat Wave Duration (average duration)
    hwd = events_df['duration'].mean() if not events_df.empty else 0
    
    # HWA: Heat Wave Amplitude (average peak EHF)
    hwa = events_df['max_ehf'].mean() if not events_df.empty else 0
    
    # HWM: Heat Wave Magnitude (average cumulative EHF)
    hwm = events_df['cumulative_ehf'].mean() if not events_df.empty else 0
    
    return {
        'HWF': hwf,  # Total days with EHF >= threshold
        'HWN': hwn,  # Number of heat wave events
        'HWD': hwd,  # Average duration (days)
        'HWA': hwa,  # Average amplitude (peak EHF)
        'HWM': hwm   # Average magnitude (cumulative EHF)
    }

print("✅ EHF calculation functions defined according to Perkins & Alexander (2013)")
print("   Formula: EHF = max[1, EHI_accl] × EHI_sig")
print("   where EHI_accl = T_3day - T_30day")  
print("   and EHI_sig = T_3day - T95")

✅ EHF calculation functions defined according to Perkins & Alexander (2013)
   Formula: EHF = max[1, EHI_accl] × EHI_sig
   where EHI_accl = T_3day - T_30day
   and EHI_sig = T_3day - T95
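
The functions above take precomputed 3-day and 30-day means as inputs. The sketch below shows one way those inputs could be built from a daily series with pandas rolling windows, followed by a vectorized run-length alternative to the event loop. It rests on assumptions: the temperatures and the `t95` value are synthetic, and the window alignment (a trailing 3-day mean, and a 30-day mean over the days immediately before that window) is one reading of "30-day preceding average" that should be checked against the reference methodology.

```python
import numpy as np
import pandas as pd

# Hypothetical daily mean temperatures (°C): flat 20 °C with a 5-day hot spell
dates = pd.date_range("2020-01-01", periods=60, freq="D")
tavg = pd.Series(20.0, index=dates)
tavg.iloc[40:45] = 30.0
t95 = 24.0  # assumed threshold, not derived from a real baseline

# Assumed alignment: 3-day mean over days i-2..i, 30-day mean over days i-32..i-3
tavg_3day = tavg.rolling(3).mean()
tavg_30day = tavg.shift(3).rolling(30).mean()

ehi_accl = tavg_3day - tavg_30day
ehi_sig = tavg_3day - t95
ehf = (ehi_sig * ehi_accl.clip(lower=1)).where(ehi_sig > 0, 0.0)

# Run-length grouping: each change in the above-threshold flag starts a new run
above = ehf >= 1
run_id = (above != above.shift(fill_value=False)).cumsum()

frame = pd.DataFrame(
    {"date": ehf.index, "ehf": ehf.to_numpy(), "run": run_id.to_numpy()}
)[above.to_numpy()]

events = frame.groupby("run").agg(
    start_date=("date", "first"),
    end_date=("date", "last"),
    duration=("ehf", "size"),
    max_ehf=("ehf", "max"),
    mean_ehf=("ehf", "mean"),
    cumulative_ehf=("ehf", "sum"),
)

# Keep only runs meeting the minimum duration (2 days, as configured above)
events = events[events["duration"] >= 2].reset_index(drop=True)
print(events)
```

The first 32 days have no complete 30-day window, so EHF cannot be evaluated there. In this sketch those days fall below T95 and are set to 0. With real data, a warm-up period before `ANALYSIS_START` would avoid losing the first month.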
