# Phase 4: Heat Wave Metrics Calculation (HWF, HWN, HWD, HWA, HWM)

This notebook implements Phase 4 of the Excess Heat Factor (EHF) methodology:
- Annual heat wave metrics calculation
- Heat Wave Frequency (HWF), Number (HWN), Duration (HWD), Amplitude (HWA), Magnitude (HWM)
- Trend analysis and comparative assessment
- Multi-year heat wave pattern analysis

Based on the Perkins & Alexander (2013) methodology for extreme heat assessment.

In [1]:
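# Core libraries used by the metric functions in Section 4.2
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

# Climate indices package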
try:
    import climate_indices
    from climate_indices import indices
    CLIMDX_AVAILABLE = True
    print("✅ Climate-indices available for supporting calculations")
except ImportError:
    CLIMDX_AVAILABLE = False
    print("❌ Climate-indices not available - using manual calculations")

✅ Climate-indices available for supporting calculations


## 4.1 Heat Wave Metrics Configuration

In [2]:
# Heat Wave Metrics Configuration
ERA5_LAND = 'ECMWF/ERA5_LAND/HOURLY'
TEMPERATURE_BAND = 'temperature_2m'

# Metrics calculation parameters
EHF_THRESHOLD = 1
MIN_HEATWAVE_DURATION = 2
METRICS_START_YEAR = 1980
METRICS_END_YEAR = 2023

# Analysis periods
BASELINE_PERIOD = '1981-2010'
RECENT_PERIOD = '1994-2023'
ANALYSIS_PERIOD = f'{METRICS_START_YEAR}-{METRICS_END_YEAR}'

print("Heat Wave Metrics Configuration:")
print(f"Analysis period: {ANALYSIS_PERIOD}")
print(f"EHF threshold: {EHF_THRESHOLD}")
print(f"Min duration: {MIN_HEATWAVE_DURATION} days")
print(f"Baseline period: {BASELINE_PERIOD}")
print(f"Recent period: {RECENT_PERIOD}")

# Correct GHS database collection and field names
cities_collection = 'projects/tl-cities/assets/GHS_UCDB_THEME_HAZARD_RISK_GLOBE_R2024A'
CITY_NAME_FIELD = 'GC_UCN_MAI'
COUNTRY_FIELD = 'GC_CNT_GAD'
POPULATION_FIELD = 'GC_POP_TOT'
CITY_ID_FIELD = 'GC_UNI_ID'

print("\nGHS Database Configuration:")
print(f"Collection: {cities_collection}")
print(f"City field: {CITY_NAME_FIELD}")
print(f"Country field: {COUNTRY_FIELD}")
print(f"Population field: {POPULATION_FIELD}")
print(f"ID field: {CITY_ID_FIELD}")

# Heat wave metrics definitions
print(f"\n{'='*50}")
print("HEAT WAVE METRICS DEFINITIONS")
print(f"{'='*50}")
print("HWF - Heat Wave Frequency: Days per year above EHF threshold")
print("HWN - Heat Wave Number: Number of heat wave events per year")
print("HWD - Heat Wave Duration: Average duration of heat wave events")
print("HWA - Heat Wave Amplitude: Average peak EHF of heat wave events")
print("HWM - Heat Wave Magnitude: Average cumulative EHF of heat wave events")

Heat Wave Metrics Configuration:
Analysis period: 1980-2023
EHF threshold: 1
Min duration: 2 days
Baseline period: 1981-2010
Recent period: 1994-2023

GHS Database Configuration:
Collection: projects/tl-cities/assets/GHS_UCDB_THEME_HAZARD_RISK_GLOBE_R2024A
City field: GC_UCN_MAI
Country field: GC_CNT_GAD
Population field: GC_POP_TOT
ID field: GC_UNI_ID

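==================================================
HEAT WAVE METRICS DEFINITIONS
==================================================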
HWF - Heat Wave Frequency: Days per year above EHF threshold
HWN - Heat Wave Number: Number of heat wave events per year
HWD - Heat Wave Duration: Average duration of heat wave events
HWA - Heat Wave Amplitude: Average peak EHF of heat wave events
HWM - Heat Wave Magnitude: Average cumulative EHF of heat wave events
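
To make the five definitions concrete, the sketch below computes them for a single hypothetical year. The event table, its values, and the `toy_*` names are illustrative assumptions rather than notebook output; the column names mirror those consumed by the functions in Section 4.2. As defined above, HWD, HWA, and HWM are averages over the events of the year.

```python
import pandas as pd

# Hypothetical events for one year (values invented for illustration).
toy_events = pd.DataFrame({
    "start_date": pd.to_datetime(["2020-01-05", "2020-02-10", "2020-12-20"]),
    "duration": [3, 5, 4],                  # days per event
    "max_ehf": [12.0, 30.0, 18.0],          # peak daily EHF per event
    "cumulative_ehf": [25.0, 90.0, 50.0],   # summed daily EHF per event
})

# Hypothetical count of days at or above the EHF threshold in that year.
toy_days_above_threshold = 14

toy_metrics = {
    "hwf": toy_days_above_threshold,             # days per year
    "hwn": len(toy_events),                      # events per year
    "hwd": toy_events["duration"].mean(),        # mean event duration
    "hwa": toy_events["max_ehf"].mean(),         # mean event peak
    "hwm": toy_events["cumulative_ehf"].mean(),  # mean event cumulative EHF
}
print(toy_metrics)
```

With these invented numbers the year has three events, with a mean duration of 4 days, a mean peak EHF of 20, and a mean cumulative EHF of 55.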


## 4.2 Heat Wave Metrics Calculation Functions

In [3]:
def calculate_annual_heat_wave_metrics(ehf_series, events_df):
    """
    Calculate annual heat wave metrics from EHF time series and events
    
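    Parameters:
    - ehf_series: pandas Series of daily EHF values with a DatetimeIndex
    - events_df: DataFrame with one row per heat wave event and the columns
      'start_date', 'duration', 'max_ehf', and 'cumulative_ehf'
    
    Returns:
    - DataFrame with one row per year holding the annual metrics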
    """
    
    # Convert to DataFrame with date index
    ehf_df = ehf_series.to_frame('ehf')
    ehf_df['year'] = ehf_df.index.year
    ehf_df['above_threshold'] = ehf_df['ehf'] >= EHF_THRESHOLD
    
    # Add year to a copy of the events so the caller's DataFrame is not mutated
    if not events_df.empty:
        events_df = events_df.copy()
        events_df['year'] = pd.to_datetime(events_df['start_date']).dt.year
    
    # Calculate metrics by year
    annual_metrics = []
    
    for year in range(ehf_df['year'].min(), ehf_df['year'].max() + 1):
        year_data = ehf_df[ehf_df['year'] == year]
        year_events = events_df[events_df['year'] == year] if not events_df.empty else pd.DataFrame()
        
        # HWF: Heat Wave Frequency (days per year at or above threshold,
        # whether or not they belong to a detected event)
        hwf = year_data['above_threshold'].sum()
        
        # HWN: Heat Wave Number (number of events per year)
        hwn = len(year_events)
        
        # HWD: Heat Wave Duration (average duration of events)
        hwd = year_events['duration'].mean() if not year_events.empty else 0
        
        # HWA: Heat Wave Amplitude (average peak EHF of events)
        hwa = year_events['max_ehf'].mean() if not year_events.empty else 0
        
        # HWM: Heat Wave Magnitude (average cumulative EHF of events)
        hwm = year_events['cumulative_ehf'].mean() if not year_events.empty else 0
        
        annual_metrics.append({
            'year': year,
            'hwf': hwf,
            'hwn': hwn,
            'hwd': hwd,
            'hwa': hwa,
            'hwm': hwm,
            'total_ehf_days': hwf,
            'max_annual_ehf': year_data['ehf'].max(),
            'mean_annual_ehf': year_data[year_data['ehf'] > 0]['ehf'].mean() if (year_data['ehf'] > 0).any() else 0
        })
    
    return pd.DataFrame(annual_metrics)

def calculate_trend_analysis(metrics_df, metric_name):
    """
    Calculate trend analysis for a heat wave metric
    
    Parameters:
    - metrics_df: DataFrame with annual metrics
    - metric_name: Name of metric column to analyze
    
    Returns:
    - Dictionary with trend statistics
    """
    
    # Remove years with NaN values
    clean_data = metrics_df.dropna(subset=[metric_name])
    
    if len(clean_data) < 3:
        return {
            'slope': np.nan,
            'intercept': np.nan,
            'r_squared': np.nan,
            'p_value': np.nan,
            'trend_per_decade': np.nan,
            'correlation': np.nan  # keep keys consistent with the normal return
        }
    
    # Linear regression
    X = clean_data['year'].values.reshape(-1, 1)
    y = clean_data[metric_name].values
    
    model = LinearRegression().fit(X, y)
    slope = model.coef_[0]
    intercept = model.intercept_
    r_squared = model.score(X, y)
    
    # Statistical significance test
    correlation, p_value = stats.pearsonr(clean_data['year'], clean_data[metric_name])
    
    # Trend per decade
    trend_per_decade = slope * 10
    
    return {
        'slope': slope,
        'intercept': intercept,
        'r_squared': r_squared,
        'p_value': p_value,
        'trend_per_decade': trend_per_decade,
        'correlation': correlation
    }

def calculate_period_comparison(metrics_df, baseline_years, recent_years):
    """
    Compare heat wave metrics between baseline and recent periods
    
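    Parameters:
    - metrics_df: DataFrame with annual metrics
    - baseline_years: tuple (start_year, end_year), inclusive, e.g. (1981, 2010)
    - recent_years: tuple (start_year, end_year), inclusive, e.g. (1994, 2023)
      (the 'YYYY-YYYY' strings in the configuration must be split into
      integer tuples before being passed here)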
    
    Returns:
    - Dictionary with period comparison results
    """
    
    baseline_data = metrics_df[
        (metrics_df['year'] >= baseline_years[0]) & 
        (metrics_df['year'] <= baseline_years[1])
    ]
    
    recent_data = metrics_df[
        (metrics_df['year'] >= recent_years[0]) & 
        (metrics_df['year'] <= recent_years[1])
    ]
    
    comparison = {}
    metrics = ['hwf', 'hwn', 'hwd', 'hwa', 'hwm']
    
    for metric in metrics:
        baseline_mean = baseline_data[metric].mean()
        recent_mean = recent_data[metric].mean()
        
        # Calculate change
        absolute_change = recent_mean - baseline_mean
        relative_change = (absolute_change / baseline_mean * 100) if baseline_mean != 0 else np.nan
        
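        # Statistical test: two-sample t-test (a negative t means the recent
        # mean is higher). The test assumes independent samples, so overlapping
        # periods (for example 1981-2010 and 1994-2023) make the p-value
        # indicative only.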
        if len(baseline_data) > 1 and len(recent_data) > 1:
            t_stat, p_value = stats.ttest_ind(baseline_data[metric], recent_data[metric])
        else:
            t_stat, p_value = np.nan, np.nan
        
        comparison[metric] = {
            'baseline_mean': baseline_mean,
            'recent_mean': recent_mean,
            'absolute_change': absolute_change,
            'relative_change': relative_change,
            't_statistic': t_stat,
            'p_value': p_value
        }
    
    return comparison

print("✅ Heat wave metrics calculation functions defined")

✅ Heat wave metrics calculation functions defined
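
`calculate_annual_heat_wave_metrics` expects an `events_df` with `start_date`, `duration`, `max_ehf`, and `cumulative_ehf` columns, but this notebook does not show how that table is built. The following is a minimal sketch of one way to derive it from a daily EHF series, assuming an event is a run of at least `MIN_HEATWAVE_DURATION` consecutive days with EHF at or above `EHF_THRESHOLD`. The helper name `detect_heat_wave_events`, the `end_date` column, and the synthetic series are hypothetical; earlier phases of this workflow may define events differently.

```python
import pandas as pd

EHF_THRESHOLD = 1            # same values as the configuration cell
MIN_HEATWAVE_DURATION = 2

def detect_heat_wave_events(ehf_series, threshold=EHF_THRESHOLD,
                            min_duration=MIN_HEATWAVE_DURATION):
    """Hypothetical helper: group consecutive hot days into events."""
    columns = ["start_date", "end_date", "duration", "max_ehf", "cumulative_ehf"]
    hot = ehf_series >= threshold
    # A new run starts whenever the hot / not-hot flag changes value.
    run_id = hot.astype(int).diff().ne(0).cumsum()
    events = []
    for _, run in ehf_series[hot].groupby(run_id[hot]):
        if len(run) >= min_duration:  # drop runs shorter than the minimum
            events.append({
                "start_date": run.index[0],
                "end_date": run.index[-1],
                "duration": len(run),
                "max_ehf": run.max(),
                "cumulative_ehf": run.sum(),
            })
    return pd.DataFrame(events, columns=columns)

# Synthetic daily EHF values, invented for illustration only.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
ehf = pd.Series([0, 2, 5, 3, 0, 4, 0, 1, 1, 0], index=dates, dtype=float)
events_df = detect_heat_wave_events(ehf)
print(events_df)
```

In this synthetic series the single hot day on 2020-01-06 is discarded because it is shorter than the minimum duration, leaving two events. Because the metrics function assigns each event to the year of its `start_date`, an event that spans a year boundary counts toward the year in which it began.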


## Next Steps

This notebook has been updated with the correct GHS database collection and field names:

✅ **Updated:**
- Collection: `projects/tl-cities/assets/GHS_UCDB_THEME_HAZARD_RISK_GLOBE_R2024A`
- Field names: `GC_UCN_MAI`, `GC_CNT_GAD`, `GC_POP_TOT`, `GC_UNI_ID`
- Heat wave metrics calculation functions (HWF, HWN, HWD, HWA, HWM)
- Trend analysis and period comparison methods
- Multi-year heat wave pattern analysis framework

**Next Phase:** Continue to Phase 5 notebook for validation and comprehensive output generation.