# Seasonal Extreme Heat Days Analysis - Final Implementation

This notebook implements the **climatologically appropriate** extreme heat day calculation using seasonal percentiles.

## 🔬 **Methodology:**
```
Surface_Heat_Day = 1 if LST_daily_max > max(LST_abs, LST_rel)
```
Where:
- **LST_abs**: Absolute threshold (35°C in this notebook)
- **LST_rel**: The X-th percentile of LST_daily_max for the same calendar day over the reference period (X = 90 and 2003-2019 in this notebook)
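
The rule above can be illustrated on a few made-up values. This is a minimal sketch, not the notebook's implementation: `heat_day_flags` is a hypothetical helper and all temperatures are invented for illustration.

```python
import numpy as np

def heat_day_flags(daily_max, rel_threshold, abs_threshold=35.0):
    """Return 1 where daily_max exceeds max(abs_threshold, rel_threshold), else 0."""
    daily_max = np.asarray(daily_max, dtype=float)
    # Effective threshold is the larger of the absolute and relative thresholds
    threshold = np.maximum(abs_threshold, np.asarray(rel_threshold, dtype=float))
    return (daily_max > threshold).astype(int)

# Hypothetical values in °C for four days
daily_max = [34.0, 36.0, 37.5, 38.0]
rel_threshold = [33.0, 34.5, 38.0, 36.0]  # day-specific percentile thresholds

flags = heat_day_flags(daily_max, rel_threshold)
print(flags)  # [0 1 0 1]
```

Day 1 fails because the absolute threshold (35°C) is binding, and day 3 fails because the relative threshold (38°C) is binding.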

## 🎯 **Scientific Advantages:**
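- **Seasonal context**: A hot day in July is judged against July conditions, and a hot day in January against January conditions
- **Climatological accuracy**: Comparing each day with its own calendar-day baseline is a common convention for percentile-based temperature-extreme indices
- **Sensitive detection**: The relative threshold responds to days that are unusually hot for their time of year (the absolute threshold still applies as a floor)
- **Day-specific baselines**: Each calendar day has its own percentile threshold derived from the reference period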
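
To make the difference between a day-specific and an annual percentile concrete, the sketch below builds a synthetic reference record and tests one unusually warm cool-season day against both. Everything here is an assumption for illustration: the seasonal cycle, the noise level and the chosen day are invented, and the absolute threshold is deliberately left out to isolate the relative one.

```python
import numpy as np

rng = np.random.default_rng(42)
n_years, n_days = 17, 365
doy = np.arange(1, n_days + 1)

# Synthetic reference record (°C): warm season around January, plus daily noise
cycle = 28 + 4 * np.cos(2 * np.pi * doy / n_days)
reference = cycle + rng.normal(0, 1, size=(n_years, n_days))

seasonal_p90 = np.percentile(reference, 90, axis=0)  # one threshold per calendar day
annual_p90 = np.percentile(reference, 90)            # one threshold for the whole year

# A cool-season day (day 182) that is 3 °C above its usual value
day = 182
observed = cycle[day - 1] + 3.0

print(f'observed:           {observed:.1f} °C')
print(f'seasonal threshold: {seasonal_p90[day - 1]:.1f} °C, exceeded: {observed > seasonal_p90[day - 1]}')
print(f'annual threshold:   {annual_p90:.1f} °C, exceeded: {observed > annual_p90}')
```

The day-specific threshold flags the anomaly, while the annual threshold, which is dominated by warm-season values, does not.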

---

In [None]:
# Import required libraries
import ee
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import warnings
import os

# Configure matplotlib
plt.rcParams['figure.figsize'] = (12, 8)

# Initialize Earth Engine
try:
    ee.Initialize(project='tl-cities')
    print('✅ Earth Engine initialized successfully')
except Exception as e:
    print(f'❌ Earth Engine initialization failed: {e}')
    raise

print('📦 Libraries imported successfully')

In [None]:
# Analysis configuration
CONFIG = {
    'analysis_year': 2020,
    'reference_start': 2003,
    'reference_end': 2019,
    'absolute_threshold': 35.0,  # °C
    'percentile_threshold': 90.0,  # percentile
    'max_temp_filter': 50.0,  # °C - filter extreme outliers
    'roi_bounds': [-38.7, -13.1, -38.3, -12.8],  # Salvador, Brazil [west, south, east, north]
    'scale': 1000  # meters
}

# Create output directory
output_dir = '../outputs/final_seasonal_analysis'
os.makedirs(output_dir, exist_ok=True)

print('⚙️ Configuration:')
for key, value in CONFIG.items():
    print(f'   {key}: {value}')
print(f'📁 Output directory: {output_dir}')

In [None]:
# Define region of interest
west, south, east, north = CONFIG['roi_bounds']
roi = ee.Geometry.Rectangle([west, south, east, north])

# Calculate area
area_km2 = roi.area().divide(1000000).getInfo()
print(f'🎯 ROI: Salvador, Brazil region')
print(f'   Area: {area_km2:.1f} km²')
print(f'   Bounds: W={west}, E={east}, S={south}, N={north}')

def get_gshtd_collection(geometry, start_date, end_date):
    """Get GSHTD temperature collection for Latin America"""
    collection_id = "projects/sat-io/open-datasets/global-daily-air-temp/latin_america"
    
    collection = ee.ImageCollection(collection_id)
    filtered = (collection.filterDate(start_date, end_date)
                         .filterBounds(geometry)
                         .filter(ee.Filter.eq('prop_type', 'tmax')))
    
    # Apply temperature scaling and quality filtering
    def process_image(img):
        # Scale stored values by 1/10; the range filter below assumes the result is in °C
        temp_celsius = img.select('b1').divide(10.0)
        
        # Filter extreme values (likely data quality issues)
        temp_filtered = temp_celsius.updateMask(
            temp_celsius.gte(-20).And(temp_celsius.lte(CONFIG['max_temp_filter']))
        )
        
        return temp_filtered.rename('temperature').clip(geometry).copyProperties(img, ['system:time_start'])
    
    return filtered.map(process_image)

print('🛠️ Data extraction functions defined')
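
The scaling and range filter in `process_image` can be checked offline with a NumPy analogue. This is only a sketch: `scale_and_mask` and the raw values are hypothetical, and the factor of 10 and the -20 to 50 bounds simply mirror the cell above rather than any dataset documentation.

```python
import numpy as np

def scale_and_mask(raw, scale=10.0, valid_min=-20.0, valid_max=50.0):
    """Divide raw values by `scale` and set out-of-range results to NaN."""
    temp_c = np.asarray(raw, dtype=float) / scale
    in_range = (temp_c >= valid_min) & (temp_c <= valid_max)
    return np.where(in_range, temp_c, np.nan)

raw = np.array([312, 287, 655, -250])  # hypothetical stored values
temps = scale_and_mask(raw)
print(temps)  # the last two values fall outside the valid range and become NaN
```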

In [None]:
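# Convert the previously extracted DataFrame `df` to xarray.
# `df` is not created in this notebook; it is expected to have the columns
# time, latitude, longitude and temperature.
print('🔄 Using the previously extracted DataFrame...')

# Check that the DataFrame exists from a previous session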
if 'df' in globals() and len(df) > 100000:
    print(f'✅ Using existing DataFrame: {len(df):,} observations')
    
    # Clean the DataFrame
    df_clean = df.copy()
    df_clean['time'] = pd.to_datetime(df_clean['time'])
    df_clean['temperature'] = pd.to_numeric(df_clean['temperature'], errors='coerce')
    df_clean = df_clean.dropna(subset=['temperature'])
    
    print(f'   After cleaning: {len(df_clean):,} valid observations')
    
    # Convert to xarray with from_dataframe(); (time, latitude, longitude)
    # combinations that are missing from the table become NaN in the grid
    temperature_data = xr.Dataset.from_dataframe(
        df_clean.set_index(['time', 'latitude', 'longitude'])[['temperature']]
    )
    
    print(f'   📊 Xarray dataset created: {dict(temperature_data.sizes)}')
    print(f'   Valid temperature values: {temperature_data.temperature.notnull().sum().item():,}')
    
else:
    print('❌ No DataFrame found. Please run the data extraction first.')
    raise ValueError('No df DataFrame available')

print('✅ Data ready for analysis!')
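
How `from_dataframe()` treats a sparse table can be seen on a toy example. The sketch below assumes a long-format table with one row per (time, latitude, longitude); `toy_df` and its values are hypothetical.

```python
import pandas as pd
import xarray as xr

# Hypothetical long-format table: three observations on a 2 x 2 x 1 grid
toy_df = pd.DataFrame({
    'time': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-02']),
    'latitude': [-12.9, -13.0, -12.9],
    'longitude': [-38.5, -38.5, -38.5],
    'temperature': [30.1, 29.8, 31.4],
})

toy_ds = xr.Dataset.from_dataframe(
    toy_df.set_index(['time', 'latitude', 'longitude'])[['temperature']]
)

# The grid is the full product of the index levels; absent rows become NaN
n_missing = toy_ds['temperature'].isnull().sum().item()
print(dict(toy_ds.sizes))
print(n_missing)
```

The grid has four cells but only three observations, so one cell (2020-01-02 at latitude -13.0) is filled with NaN.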

In [None]:
# Prepare data for seasonal analysis using xarray operations
print('🔬 Preparing data for seasonal extreme heat analysis...')

# Add day of year as a variable along the time dimension
temperature_data['dayofyear'] = temperature_data['time'].dt.dayofyear

# Split data by time period with boolean masks on the calendar year
years = temperature_data['time'].dt.year
analysis_data = temperature_data.sel(time=(years == CONFIG['analysis_year']))
reference_data = temperature_data.sel(
    time=(years >= CONFIG['reference_start']) & (years <= CONFIG['reference_end'])
)

# Count valid (non-NaN) temperature values in each period
n_analysis = analysis_data['temperature'].count().item()
n_reference = reference_data['temperature'].count().item()
print(f'Analysis period ({CONFIG["analysis_year"]}): {n_analysis:,} observations')
print(f'Reference period ({CONFIG["reference_start"]}-{CONFIG["reference_end"]}): {n_reference:,} observations')

if n_reference == 0:
    raise ValueError('❌ No reference data found - check data extraction')

# Check data coverage
analysis_days = analysis_data.sizes['time']
reference_years = len(np.unique(reference_data['time'].dt.year.values))
print(f'   Analysis days: {analysis_days}')
print(f'   Reference years: {reference_years}')

print('✅ Data preparation complete!')

In [None]:
# Calculate seasonal percentiles using xarray groupby
print('📊 Calculating seasonal percentiles...')

percentile_value = CONFIG['percentile_threshold'] / 100

# Calculate day-of-year percentiles from the reference period
# Result dimensions: (dayofyear, latitude, longitude)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # all-NaN pixels raise RuntimeWarnings
    seasonal_percentiles = (reference_data['temperature']
                            .groupby('time.dayofyear')
                            .quantile(percentile_value, dim='time', skipna=True))

print(f'✅ Seasonal percentiles calculated')
print(f'   Days with percentiles: {seasonal_percentiles.sizes["dayofyear"]}')
print(f'   Spatial coverage: {seasonal_percentiles.count().item():,} location-day combinations')

# Calculate annual percentiles for comparison
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    annual_percentiles = reference_data['temperature'].quantile(
        percentile_value, dim='time', skipna=True
    )

print(f'   Annual percentiles: {annual_percentiles.count().item():,} locations')
print('✅ Percentile calculations complete!')
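
The xarray calls behind a day-of-year percentile can be tried on a small synthetic series. This is a sketch under stated assumptions: a single pixel, three non-leap years, and an invented seasonal cycle with random noise.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical reference series: three years of daily values at one pixel
time = pd.date_range('2017-01-01', '2019-12-31', freq='D')
doy = time.dayofyear.to_numpy()
rng = np.random.default_rng(0)
values = 28 + 4 * np.cos(2 * np.pi * doy / 365) + rng.normal(0, 1, time.size)
ref = xr.DataArray(values, coords={'time': time}, dims='time', name='temperature')

# 90th percentile for each calendar day, pooled across years
doy_p90 = ref.groupby('time.dayofyear').quantile(0.9, dim='time')

# Single 90th percentile over the whole record, for comparison
annual_p90 = ref.quantile(0.9, dim='time')

print(doy_p90.sizes['dayofyear'])  # 365 calendar days in this record
print(f'{doy_p90.min().item():.1f} to {doy_p90.max().item():.1f} °C (day-specific)')
print(f'{annual_p90.item():.1f} °C (annual)')
```

With only a handful of years, each calendar-day percentile rests on very few samples, so the day-specific thresholds are noisier than the annual one.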

In [None]:
# Apply seasonal methodology
print('🔥 Calculating seasonal heat days...')

abs_threshold = CONFIG['absolute_threshold']

# Calculate daily thresholds using broadcasting
daily_thresholds = xr.where(
    seasonal_percentiles > abs_threshold,
    seasonal_percentiles,
    abs_threshold
)

# Map thresholds to analysis days
analysis_thresholds = daily_thresholds.sel(dayofyear=analysis_data.dayofyear)

# Pixels with at least one valid observation in the analysis year
valid_pixels = analysis_data.temperature.notnull().any(dim='time')

# Calculate heat days (no-data pixels stay NaN instead of counting as 0)
seasonal_heat_days = (analysis_data.temperature > analysis_thresholds).sum(dim='time').where(valid_pixels)

print(f'✅ Seasonal heat days calculated')

# Calculate annual method for comparison
print('📊 Calculating annual method for comparison...')

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # all-NaN pixels raise RuntimeWarnings
    annual_percentile = reference_data.temperature.quantile(percentile_value, dim='time', skipna=True)
    
annual_threshold = xr.where(annual_percentile > abs_threshold, annual_percentile, abs_threshold)
annual_heat_days = (analysis_data.temperature > annual_threshold).sum(dim='time').where(valid_pixels)

print(f'✅ Annual comparison calculated')
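
The step that maps day-of-year thresholds onto the analysis dates relies on vectorized `.sel()`. The sketch below shows the mechanism on invented data; the `pixel` dimension, the threshold table and the observations are all hypothetical.

```python
import pandas as pd
import xarray as xr

# Hypothetical day-of-year percentile thresholds (°C) for days 1-3 at two pixels
thresholds = xr.DataArray(
    [[30.0, 36.0], [31.0, 37.0], [32.0, 38.0]],
    coords={'dayofyear': [1, 2, 3], 'pixel': ['a', 'b']},
    dims=('dayofyear', 'pixel'),
)
abs_threshold = 35.0
daily_thresholds = thresholds.clip(min=abs_threshold)  # max(absolute, relative)

# Hypothetical observations for 1-3 January 2020
time = pd.date_range('2020-01-01', periods=3, freq='D')
obs = xr.DataArray(
    [[34.0, 36.5], [35.5, 36.0], [36.0, 39.0]],
    coords={'time': time, 'pixel': ['a', 'b']},
    dims=('time', 'pixel'),
)

# Vectorized lookup: one row of thresholds per observation date
per_day = daily_thresholds.sel(dayofyear=obs['time'].dt.dayofyear)
heat_days = (obs > per_day).sum(dim='time')
print(heat_days.values)  # [2 2]
```

Indexing with a `DataArray` that carries the `time` dimension returns thresholds aligned to `time`, so the comparison needs no explicit loop over dates.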

In [None]:
# Calculate summary statistics
seasonal_mean = float(seasonal_heat_days.mean().values)
annual_mean = float(annual_heat_days.mean().values)
seasonal_max = float(seasonal_heat_days.max().values)
annual_max = float(annual_heat_days.max().values)
seasonal_pixels = int((seasonal_heat_days > 0).sum().values)
annual_pixels = int((annual_heat_days > 0).sum().values)
total_pixels = int(seasonal_heat_days.count().values)

print(f'\n📊 METHODOLOGY COMPARISON:')
print(f'🔬 Seasonal Method:')
print(f'   Mean heat days: {seasonal_mean:.1f}')
print(f'   Max heat days: {seasonal_max:.0f}')
print(f'   Pixels with >0 heat days: {seasonal_pixels} of {total_pixels} ({seasonal_pixels/total_pixels*100:.1f}%)')

print(f'\n📅 Annual Method:')
print(f'   Mean heat days: {annual_mean:.1f}')
print(f'   Max heat days: {annual_max:.0f}')
print(f'   Pixels with >0 heat days: {annual_pixels} of {total_pixels} ({annual_pixels/total_pixels*100:.1f}%)')

print(f'\n📈 Difference (Seasonal - Annual):')
print(f'   Mean: {seasonal_mean - annual_mean:+.1f} days')
print(f'   Pixels: {seasonal_pixels - annual_pixels:+.0f}')

print(f'\n✅ Seasonal analysis complete!')

In [None]:
# Create comprehensive visualization
print('📊 Creating visualizations...')

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Seasonal Heat Days
seasonal_heat_days.plot(ax=axes[0,0], cmap='Reds', add_colorbar=True,
                       cbar_kwargs={'label': 'Heat Days'})
axes[0,0].set_title('🔬 Seasonal Method Heat Days', fontweight='bold', fontsize=14)
axes[0,0].set_xlabel('Longitude')
axes[0,0].set_ylabel('Latitude')

# Annual Heat Days  
annual_heat_days.plot(ax=axes[0,1], cmap='Reds', add_colorbar=True,
                     cbar_kwargs={'label': 'Heat Days'})
axes[0,1].set_title('📅 Annual Method Heat Days', fontweight='bold', fontsize=14)
axes[0,1].set_xlabel('Longitude')
axes[0,1].set_ylabel('Latitude')

# Difference Map
difference = seasonal_heat_days - annual_heat_days
difference.plot(ax=axes[1,0], cmap='RdBu_r', add_colorbar=True,
               cbar_kwargs={'label': 'Difference (Seasonal - Annual)'})
axes[1,0].set_title('📈 Method Difference', fontweight='bold', fontsize=14)
axes[1,0].set_xlabel('Longitude')
axes[1,0].set_ylabel('Latitude')

# Distribution comparison
seasonal_flat = seasonal_heat_days.values.flatten()
annual_flat = annual_heat_days.values.flatten() 
seasonal_clean = seasonal_flat[~np.isnan(seasonal_flat)]
annual_clean = annual_flat[~np.isnan(annual_flat)]

axes[1,1].hist(seasonal_clean, bins=20, alpha=0.7, color='red',
              edgecolor='black', label='Seasonal Method')
axes[1,1].hist(annual_clean, bins=20, alpha=0.5, color='blue',
              edgecolor='black', label='Annual Method')
axes[1,1].set_title('Heat Days Distribution Comparison', fontweight='bold', fontsize=14)
axes[1,1].set_xlabel('Heat Days per Pixel')
axes[1,1].set_ylabel('Frequency')
axes[1,1].legend()
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate and display correlation
correlation = np.corrcoef(annual_clean, seasonal_clean)[0,1]
print(f'\n📈 Correlation between methods: {correlation:.3f}')

if np.isnan(correlation):
    print('   ⚠️ Correlation undefined - at least one method has no spatial variation')
elif correlation > 0.8:
    print('   ✅ Strong positive correlation - methods generally agree')
elif correlation > 0.5:
    print('   ⚠️ Moderate correlation - some differences in spatial patterns')
else:
    print('   🔍 Low correlation - significant methodological differences')

print('✅ Visualization complete!')
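
`np.corrcoef` returns NaN when one of the inputs is constant, for example when a method yields zero heat days at every pixel, and any comparison such as `NaN > 0.8` then evaluates to False. A small guard makes that case explicit. This is a sketch only; `safe_pearson` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def safe_pearson(x, y):
    """Pearson r, or NaN when an input has fewer than two values or zero variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size < 2 or x.std() == 0 or y.std() == 0:
        return float('nan')
    return float(np.corrcoef(x, y)[0, 1])

r_constant = safe_pearson([0, 0, 0, 0], [1, 3, 2, 5])  # constant input
r_linear = safe_pearson([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly linear
print(r_constant, r_linear)
```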

In [None]:
# Export summary
print('📁 Creating final summary...')

summary_data = [
    {
        'method': 'seasonal',
        'mean_heat_days': seasonal_mean,
        'max_heat_days': seasonal_max,
        'pixels_with_heat_days': seasonal_pixels,
        'total_pixels': total_pixels,
        'percentage_with_heat': seasonal_pixels / total_pixels * 100
    },
    {
        'method': 'annual',
        'mean_heat_days': annual_mean,
        'max_heat_days': annual_max, 
        'pixels_with_heat_days': annual_pixels,
        'total_pixels': total_pixels,
        'percentage_with_heat': annual_pixels / total_pixels * 100
    }
]

summary_df = pd.DataFrame(summary_data)
summary_filename = f'{output_dir}/method_comparison_final_{CONFIG["analysis_year"]}.csv'
summary_df.to_csv(summary_filename, index=False)

print(f'✅ Summary CSV saved: {os.path.basename(summary_filename)}')
print('\n📊 FINAL METHOD COMPARISON:')
display(summary_df)

print(f'\n🎯 ANALYSIS COMPLETE!')
print(f'\n🔬 **Key Findings:**')
print(f'   • Seasonal method detected {seasonal_mean:.1f} mean heat days vs {annual_mean:.1f} for annual method')
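mean_diff = seasonal_mean - annual_mean
# Guard against division by zero when the annual method finds no heat days
pct_change = f'{mean_diff / annual_mean * 100:+.1f}% change' if annual_mean else 'relative change undefined: annual mean is 0'
print(f'   • Difference: {mean_diff:+.1f} days ({pct_change})')
agreement = 'Undefined' if np.isnan(correlation) else 'Strong' if correlation > 0.8 else 'Moderate' if correlation > 0.5 else 'Weak'
print(f'   • Correlation: {correlation:.3f} - {agreement} agreement')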
print(f'   • Study area: Salvador, Brazil ({area_km2:.0f} km²)')
print(f'   • Analysis period: {CONFIG["analysis_year"]} vs reference {CONFIG["reference_start"]}-{CONFIG["reference_end"]}')

print(f'\n✅ Ready for presentation!')

## Summary

This analysis successfully implements a **climatologically appropriate** extreme heat day calculation using seasonal percentiles. 

### 🔬 **Key Methodological Advantages:**
- **Seasonal context**: Each day compared to its calendar-day climatology rather than a single annual percentile
- **Enhanced sensitivity**: Better detection of seasonal heat anomalies
- **Meteorologically sound**: Follows established climatological practices

### 📊 **Technical Implementation:**
- **Clean xarray operations**: Leverages xarray's built-in climate analysis capabilities
- **17-year reference period**: Robust climatological baseline (2003-2019)
- **High spatial resolution**: 1 km analysis for intra-urban heat patterns

### 🎯 **Scientific Value:**
Compared with a single annual percentile, the seasonal methodology judges each day against its own calendar-day baseline, so heat extremes are detected relative to the natural seasonal temperature cycle.