# Seasonal Extreme Heat Days Analysis - Final Implementation

This notebook implements the **climatologically appropriate** extreme heat day calculation using seasonal percentiles.

## 🔬 **Methodology:**
```
Surface_Heat_Day = 1 if LST_daily_max > max(LST_abs, LST_rel)
```
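Where:
- **LST_daily_max**: Daily maximum temperature of a pixel (this notebook uses the GSHTD daily maximum air temperature, `tmax`)
- **LST_abs**: Absolute threshold (e.g., 35°C)
- **LST_rel**: Xth percentile of LST_daily_max for the same calendar day over the reference period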
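
As a quick illustration of the rule above, the sketch below evaluates it for a single pixel-day. The function name `is_heat_day` and the sample temperatures are illustrative assumptions, not part of the notebook's pipeline.

```python
def is_heat_day(t_daily_max: float, t_abs: float, t_rel: float) -> int:
    """Return 1 when the daily maximum exceeds both thresholds, else 0."""
    # Exceeding max(t_abs, t_rel) is the same as exceeding each threshold separately.
    return int(t_daily_max > max(t_abs, t_rel))

# Hypothetical values in °C: absolute threshold 35.0, calendar-day percentile 33.2
print(is_heat_day(36.1, 35.0, 33.2))  # 1: above both thresholds
print(is_heat_day(34.0, 35.0, 33.2))  # 0: above the percentile, below the absolute threshold
```

Because the comparison is strict and uses the larger of the two thresholds, a day that is unusually warm for its season is still not flagged unless it also exceeds the absolute threshold.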

## 🎯 **Scientific Advantages:**
- **Seasonal context**: July heat vs January heat appropriately weighted
- **Climatological accuracy**: Follows meteorological best practices
- **Sensitive detection**: Can identify seasonal heat anomalies
- **Day-specific baselines**: Each day compared to its climatological context

---

## 1. Setup & Configuration

In [None]:
# Import required libraries
import ee
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import warnings
import os

# Configure matplotlib
plt.rcParams['figure.figsize'] = (12, 8)

# Initialize Earth Engine
try:
    ee.Initialize(project='tl-cities')
    print('✅ Earth Engine initialized successfully')
except Exception as e:
    print(f'❌ Earth Engine initialization failed: {e}')
    raise

print('📦 Libraries imported successfully')

In [None]:
# Analysis configuration
CONFIG = {
    'analysis_year': 2020,
    'reference_start': 2003,
    'reference_end': 2019,
    'absolute_threshold': 35.0,  # °C
    'percentile_threshold': 90.0,  # percentile
    'max_temp_filter': 50.0,  # °C - filter extreme outliers
    'roi_bounds': [-38.7, -13.1, -38.3, -12.8],  # Salvador, Brazil [west, south, east, north]
    'scale': 1000  # meters
}

# Create output directory
output_dir = '../outputs/final_seasonal_analysis'
os.makedirs(output_dir, exist_ok=True)

print('⚙️ Configuration:')
for key, value in CONFIG.items():
    print(f'   {key}: {value}')
print(f'📁 Output directory: {output_dir}')

## 2. Data Extraction

In [None]:
# Define region of interest
west, south, east, north = CONFIG['roi_bounds']
roi = ee.Geometry.Rectangle([west, south, east, north])

# Calculate area
area_km2 = roi.area().divide(1000000).getInfo()
print(f'🎯 ROI: Salvador, Brazil region')
print(f'   Area: {area_km2:.1f} km²')
print(f'   Bounds: W={west}, E={east}, S={south}, N={north}')
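
The area reported by Earth Engine can be sanity-checked offline. The sketch below uses a spherical approximation for a latitude/longitude rectangle; the helper name `latlon_rect_area_km2` and the mean Earth radius are assumptions for illustration, and Earth Engine's own result may differ slightly because it does not use this simple sphere.

```python
import math

def latlon_rect_area_km2(west, south, east, north, radius_km=6371.0088):
    """Area of a lat/lon-aligned rectangle on a sphere: R^2 * dlon * (sin(north) - sin(south))."""
    dlon = math.radians(east - west)
    band = math.sin(math.radians(north)) - math.sin(math.radians(south))
    return radius_km ** 2 * dlon * band

# Same bounds as CONFIG['roi_bounds']: [west, south, east, north]
area = latlon_rect_area_km2(-38.7, -13.1, -38.3, -12.8)
print(f'Approximate ROI area: {area:.0f} km²')
```

A large gap between this estimate and the value printed by the cell above would suggest that the bounds were passed in the wrong order.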

In [None]:
def get_gshtd_collection(geometry, start_date, end_date):
    """Get GSHTD temperature collection for Latin America"""
    collection_id = "projects/sat-io/open-datasets/global-daily-air-temp/latin_america"
    
    collection = ee.ImageCollection(collection_id)
    filtered = (collection.filterDate(start_date, end_date)
                         .filterBounds(geometry)
                         .filter(ee.Filter.eq('prop_type', 'tmax')))
    
    # Apply temperature scaling and quality filtering
    def process_image(img):
        # Raw values are stored as °C × 10; dividing by 10 yields °C
        # (no Kelvin offset is applied, and extracted values such as 31.9 are already in °C)
        temp_celsius = img.select('b1').divide(10.0)
        
        # Filter extreme values (likely data quality issues)
        temp_filtered = temp_celsius.updateMask(
            temp_celsius.gte(-20).And(temp_celsius.lte(CONFIG['max_temp_filter']))
        )
        
        return temp_filtered.rename('temperature').clip(geometry).copyProperties(img, ['system:time_start'])
    
    return filtered.map(process_image)

print('🛠️ Data extraction functions defined')
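
The scaling and range mask applied in `process_image` can be mirrored locally with NumPy, which is handy for spot-checking raw values. This is a sketch that assumes the raw band stores tenths of a degree Celsius, as the `divide(10.0)` step implies; the function name `scale_and_mask` is hypothetical.

```python
import numpy as np

def scale_and_mask(raw, max_temp=50.0, min_temp=-20.0):
    """Convert raw values (assumed 0.1 °C units) to °C and set out-of-range values to NaN."""
    temp_c = np.asarray(raw, dtype=float) / 10.0
    in_range = (temp_c >= min_temp) & (temp_c <= max_temp)
    return np.where(in_range, temp_c, np.nan)

# Hypothetical raw values: two plausible readings and two outliers
print(scale_and_mask([319, 285, 9999, -450]))
```

Values outside the accepted range become NaN rather than being clipped, which matches the masking behaviour of `updateMask` in the Earth Engine version.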

In [34]:
# Extract temperature data with monthly chunking for large datasets
print('🔄 Extracting temperature data with monthly chunking...')

# Years to extract: reference period + analysis year
all_years = list(range(CONFIG['reference_start'], CONFIG['reference_end'] + 1)) + [CONFIG['analysis_year']]
all_years = sorted(list(set(all_years)))  # Remove duplicates

print(f'   Years: {all_years[0]}-{all_years[-1]} ({len(all_years)} years)')

# Test pixel count
test_collection = get_gshtd_collection(roi, f'{CONFIG["analysis_year"]}-01-01', f'{CONFIG["analysis_year"]}-01-02')
first_image = test_collection.first()
pixel_count = first_image.select('temperature').reduceRegion(
    reducer=ee.Reducer.count(),
    geometry=roi,
    scale=CONFIG['scale'],
    maxPixels=1e9
).getInfo()

expected_pixels = pixel_count.get('temperature', 0)
print(f'   Expected pixels per image: {expected_pixels}')

if expected_pixels == 0:
    raise ValueError('No pixels found in ROI - check region coverage')

🔄 Extracting temperature data with monthly chunking...
   Years: 2003-2020 (18 years)
   Expected pixels per image: 510


In [35]:
# Extract data using monthly chunking to handle large datasets
print('📊 Extracting data month by month...')

all_dataframes = []
extraction_stats = {'successful': 0, 'failed': 0, 'total_obs': 0}

for extract_year in all_years:
    print(f'\n📅 === EXTRACTING YEAR {extract_year} ===')
    year_dataframes = []
    
    for month in range(1, 13):
        print(f'   Month {month:02d}...', end=' ')
        
        # Monthly date range
        start_date = f'{extract_year}-{month:02d}-01'
        if month == 12:
            end_date = f'{extract_year+1}-01-01'
        else:
            end_date = f'{extract_year}-{month+1:02d}-01'
        
        try:
            # Get monthly collection
            month_collection = get_gshtd_collection(roi, start_date, end_date)
            month_count = month_collection.size().getInfo()
            
            if month_count == 0:
                print('No images', end=' ')
                extraction_stats['failed'] += 1
                continue
            
            # Extract with appropriate scale
            month_estimated = expected_pixels * month_count
            month_scale = 2000 if month_estimated > 800000 else CONFIG['scale']
            
            region_data = month_collection.getRegion(
                geometry=roi,
                scale=month_scale,
                crs='EPSG:4326'
            ).getInfo()
            
            if len(region_data) > 1:
                header = region_data[0]
                data_rows = region_data[1:]
                
                df_month = pd.DataFrame(data_rows, columns=header)
                df_month['time'] = pd.to_datetime(df_month['time'], unit='ms')
                df_month = df_month.dropna(subset=['temperature'])
                df_month = df_month[df_month['temperature'] <= CONFIG['max_temp_filter']]
                
                if len(df_month) > 0:
                    year_dataframes.append(df_month)
                    extraction_stats['successful'] += 1
                    extraction_stats['total_obs'] += len(df_month)
                    print(f'{len(df_month)} obs', end=' ')
                else:
                    print('No valid data', end=' ')
                    extraction_stats['failed'] += 1
            else:
                print('No data returned', end=' ')
                extraction_stats['failed'] += 1
                
        except Exception as e:
            print(f'Error: {str(e)[:30]}...', end=' ')
            extraction_stats['failed'] += 1
    
    # Add year's data to main collection
    if year_dataframes:
        print(f'\n   ✅ Year {extract_year}: {len(year_dataframes)} successful months')
        all_dataframes.extend(year_dataframes)
    else:
        print(f'\n   ❌ Year {extract_year}: No data extracted')

print(f'\n📊 EXTRACTION SUMMARY:')
print(f'   Successful months: {extraction_stats["successful"]}')
print(f'   Failed months: {extraction_stats["failed"]}')
print(f'   Total observations: {extraction_stats["total_obs"]:,}')

📊 Extracting data month by month...

📅 === EXTRACTING YEAR 2003 ===
   Month 01... 14756 obs    Month 02... 13328 obs    Month 03... 14756 obs    Month 04... 14280 obs    Month 05... 14756 obs    Month 06... 14280 obs    Month 07... 14756 obs    Month 08... 14756 obs    Month 09... 14280 obs    Month 10... 14756 obs    Month 11... 14280 obs    Month 12... 14756 obs 
   ✅ Year 2003: 12 successful months

📅 === EXTRACTING YEAR 2004 ===
   Month 01... 14756 obs    Month 02... 13804 obs    Month 03... 14756 obs    Month 04... 14280 obs    Month 05... 14756 obs    Month 06... 14280 obs    Month 07... 14756 obs    Month 08... 14756 obs    Month 09... 14280 obs    Month 10... 14756 obs    Month 11... 14280 obs    Month 12... 14280 obs 
   ✅ Year 2004: 12 successful months

📅 === EXTRACTING YEAR 2005 ===
   Month 01... 14756 obs    Month 02... 13328 obs    Month 03... 14756 obs    Month 04... 14280 obs    Month 05... 14756 obs    Month 06... 14280 obs    Month 07... 14756 obs    Month 08... 14
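
The monthly date ranges built inside the extraction loop can also be generated by a small helper. This is a sketch using only the standard library; `month_ranges` is a hypothetical name. The end date is the first day of the following month, which suits `filterDate` because its end date is exclusive.

```python
from datetime import date

def month_ranges(year: int):
    """Yield (start, end) ISO date strings per month; end is the first day of the next month."""
    for month in range(1, 13):
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        yield start.isoformat(), end.isoformat()

ranges = list(month_ranges(2020))
print(ranges[1])   # ('2020-02-01', '2020-03-01')
print(ranges[11])  # ('2020-12-01', '2021-01-01')
```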

In [None]:
# Combine the monthly extracts into one DataFrame and convert it to xarray
print('🔄 Preparing the extracted DataFrame...')

# Build df from the monthly extracts unless one already exists from a previous session
if 'df' not in globals() and globals().get('all_dataframes'):
   df = pd.concat(all_dataframes, ignore_index=True)

if 'df' in globals() and len(df) > 100000:
   print(f'✅ Using DataFrame: {len(df):,} observations')

   # Clean the DataFrame
   df_clean = df.copy()
   df_clean['time'] = pd.to_datetime(df_clean['time'])
   df_clean['temperature'] = pd.to_numeric(df_clean['temperature'], errors='coerce')
   df_clean = df_clean.dropna(subset=['temperature'])

   print(f'   After cleaning: {len(df_clean):,} valid observations')

   # Convert to xarray using from_dataframe() - handles sparse data properly
   temperature_data = xr.Dataset.from_dataframe(
       df_clean.set_index(['time', 'latitude', 'longitude'])[['temperature']]
   )

   print(f'   📊 Xarray dataset created: {dict(temperature_data.sizes)}')
   print(f'   Valid temperature values: {(~temperature_data.temperature.isnull()).sum().values:,}')

else:
   print('❌ No DataFrame found. Please run the data extraction first.')
   raise ValueError('No df DataFrame available')

print('✅ Data ready for analysis!')
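
To see what `xr.Dataset.from_dataframe` does with incomplete data, the toy example below converts a small long-format table in which one pixel is missing on the second day. All values and coordinates are made up for illustration.

```python
import pandas as pd
import xarray as xr

# Hypothetical long-format table: two days, two pixels, one pixel missing on day 2
toy = pd.DataFrame({
    'time': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-02']),
    'latitude': [-13.0, -12.9, -13.0],
    'longitude': [-38.5, -38.4, -38.5],
    'temperature': [31.9, 30.2, 32.4],
})

toy_ds = xr.Dataset.from_dataframe(
    toy.set_index(['time', 'latitude', 'longitude'])[['temperature']]
)

print(dict(toy_ds.sizes))                        # full time x latitude x longitude grid
print(int(toy_ds.temperature.notnull().sum()))   # only observed cells hold values
```

The result is a dense grid over every combination of the index levels, with NaN wherever the table had no row. Any later per-pixel count therefore needs to distinguish "no data" from "zero".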

In [37]:
df.head()

|   | id | longitude | latitude | time | temperature |
|---|---|---|---|---|---|
| 0 | LatinAmerica_Ta_SVCMsp_2003TMAX_001 | -38.694931 | -13.04803 | 2003-01-01 | 31.9 |
| 1 | LatinAmerica_Ta_SVCMsp_2003TMAX_002 | -38.694931 | -13.04803 | 2003-01-02 | 28.5 |
| 2 | LatinAmerica_Ta_SVCMsp_2003TMAX_003 | -38.694931 | -13.04803 | 2003-01-03 | 31.4 |
| 3 | LatinAmerica_Ta_SVCMsp_2003TMAX_004 | -38.694931 | -13.04803 | 2003-01-04 | 32.3 |
| 4 | LatinAmerica_Ta_SVCMsp_2003TMAX_005 | -38.694931 | -13.04803 | 2003-01-05 | 32.7 |


## 3. Seasonal Analysis

In [None]:
# Clean xarray-based seasonal analysis
print('🔬 Performing seasonal extreme heat analysis...')

# Get unique years
years_available = np.unique(temperature_data.time.dt.year.values)
print(f'Years available: {sorted(years_available)}')

# Add day of year coordinate
temperature_data = temperature_data.assign_coords(dayofyear=temperature_data.time.dt.dayofyear)

# Split data cleanly
analysis_data = temperature_data.sel(time=str(CONFIG['analysis_year']))
reference_data = temperature_data.sel(time=slice(f'{CONFIG["reference_start"]}-01-01', f'{CONFIG["reference_end"]}-12-31'))

print(f'Analysis period ({CONFIG["analysis_year"]}): {len(analysis_data.time)} days')
print(f'Reference period ({CONFIG["reference_start"]}-{CONFIG["reference_end"]}): {len(reference_data.time)} days')

if len(reference_data.time) == 0:
    raise ValueError('❌ No reference data found - check data extraction')

In [None]:
# Calculate seasonal percentiles using proper xarray operations
print('📊 Calculating seasonal percentiles...')

percentile_value = CONFIG['percentile_threshold'] / 100

# Group by day of year and calculate percentiles - the proper xarray way
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # Suppress numpy warnings
    seasonal_percentiles = reference_data.groupby('dayofyear').quantile(
        percentile_value, dim='time', skipna=True
    ).temperature

print(f'✅ Seasonal percentiles calculated: {seasonal_percentiles.dims}')
print(f'   Days with valid percentiles: {(~seasonal_percentiles.isnull()).sum().values}')
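
The calendar-day percentile step can be checked on a tiny synthetic series. The sketch below builds a hypothetical single-pixel record covering three non-leap years and computes the 90th percentile per day of year; the data are invented so the result is easy to verify by hand.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical single-pixel series: 2017-2019, value = (day index) + (year offset 0, 1, 2)
times = pd.date_range('2017-01-01', '2019-12-31', freq='D')
values = np.tile(np.arange(365, dtype=float), 3) + np.repeat([0.0, 1.0, 2.0], 365)
series = xr.DataArray(values, coords={'time': times}, dims='time')

# 90th percentile for each calendar day across the three years
p90 = series.groupby('time.dayofyear').quantile(0.9, dim='time')

print(p90.sizes['dayofyear'])        # 365 calendar days
print(float(p90.sel(dayofyear=1)))   # about 1.8 (samples 0, 1, 2 with linear interpolation)
```

Each calendar day receives only one sample per reference year, so the percentile rests on a small sample. Day 366 occurs only in leap years and has fewer samples still.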

In [None]:
# Apply seasonal methodology
print('🔥 Calculating seasonal heat days...')

abs_threshold = CONFIG['absolute_threshold']

# Calculate daily thresholds using broadcasting
daily_thresholds = xr.where(
    seasonal_percentiles > abs_threshold,
    seasonal_percentiles,
    abs_threshold
)

# Map thresholds to analysis days
analysis_thresholds = daily_thresholds.sel(dayofyear=analysis_data.dayofyear)

# Pixels with at least one valid observation in the analysis year
# (NaN > threshold evaluates to False, so unmasked no-data pixels would count as 0 heat days)
valid_pixels = analysis_data.temperature.notnull().any(dim='time')

# Calculate heat days
seasonal_heat_days = (analysis_data.temperature > analysis_thresholds).sum(dim='time').where(valid_pixels)

print(f'✅ Seasonal heat days calculated')

# Calculate annual method for comparison
print('📊 Calculating annual method for comparison...')

with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    annual_percentile = reference_data.temperature.quantile(percentile_value, dim='time', skipna=True)
    
annual_threshold = xr.where(annual_percentile > abs_threshold, annual_percentile, abs_threshold)
annual_heat_days = (analysis_data.temperature > annual_threshold).sum(dim='time').where(valid_pixels)

print(f'✅ Annual comparison calculated')
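
The per-day threshold lookup relies on xarray's vectorized indexing: selecting with a `dayofyear` array that is itself indexed by `time` returns one threshold per analysis day. The toy example below, with made-up thresholds and temperatures, shows the lookup and how a missing value behaves in the comparison.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical per-calendar-day thresholds for days 1..3
thresholds = xr.DataArray([30.0, 31.0, 32.0],
                          coords={'dayofyear': [1, 2, 3]}, dims='dayofyear')

# Hypothetical three-day series with one missing value
times = pd.date_range('2020-01-01', periods=3, freq='D')
temps = xr.DataArray([30.5, np.nan, 33.0], coords={'time': times}, dims='time')

# Look up each day's threshold by its day of year (result is indexed by time)
per_day = thresholds.sel(dayofyear=temps.time.dt.dayofyear)

exceed = temps > per_day           # NaN > threshold evaluates to False
print(exceed.values.tolist())      # [True, False, True]
print(int(exceed.sum()))           # 2
```

Missing days simply add nothing to the count, so a pixel with no observations at all sums to zero rather than NaN.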

In [None]:
# Calculate summary statistics
seasonal_mean = float(seasonal_heat_days.mean().values)
annual_mean = float(annual_heat_days.mean().values)
seasonal_max = float(seasonal_heat_days.max().values)
annual_max = float(annual_heat_days.max().values)
seasonal_pixels = int((seasonal_heat_days > 0).sum().values)
annual_pixels = int((annual_heat_days > 0).sum().values)
total_pixels = int(seasonal_heat_days.count().values)

print(f'\n📊 METHODOLOGY COMPARISON:')
print(f'🔬 Seasonal Method:')
print(f'   Mean heat days: {seasonal_mean:.1f}')
print(f'   Max heat days: {seasonal_max:.0f}')
print(f'   Pixels with >0 heat days: {seasonal_pixels} of {total_pixels} ({seasonal_pixels/total_pixels*100:.1f}%)')

print(f'\n📅 Annual Method:')
print(f'   Mean heat days: {annual_mean:.1f}')
print(f'   Max heat days: {annual_max:.0f}')
print(f'   Pixels with >0 heat days: {annual_pixels} of {total_pixels} ({annual_pixels/total_pixels*100:.1f}%)')

print(f'\n📈 Difference (Seasonal - Annual):')
print(f'   Mean: {seasonal_mean - annual_mean:+.1f} days')
print(f'   Pixels: {seasonal_pixels - annual_pixels:+.0f}')

print(f'\n✅ Seasonal analysis complete!')

## 4. Visualization

In [None]:
# Create comprehensive visualization
print('📊 Creating visualizations...')

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Seasonal Heat Days
seasonal_heat_days.plot(ax=axes[0,0], cmap='Reds', add_colorbar=True,
                       cbar_kwargs={'label': 'Heat Days'})
axes[0,0].set_title('🔬 Seasonal Method Heat Days', fontweight='bold', fontsize=14)
axes[0,0].set_xlabel('Longitude')
axes[0,0].set_ylabel('Latitude')

# Annual Heat Days  
annual_heat_days.plot(ax=axes[0,1], cmap='Reds', add_colorbar=True,
                     cbar_kwargs={'label': 'Heat Days'})
axes[0,1].set_title('📅 Annual Method Heat Days', fontweight='bold', fontsize=14)
axes[0,1].set_xlabel('Longitude')
axes[0,1].set_ylabel('Latitude')

# Difference Map
difference = seasonal_heat_days - annual_heat_days
difference.plot(ax=axes[1,0], cmap='RdBu_r', add_colorbar=True,
               cbar_kwargs={'label': 'Difference (Seasonal - Annual)'})
axes[1,0].set_title('📈 Method Difference', fontweight='bold', fontsize=14)
axes[1,0].set_xlabel('Longitude')
axes[1,0].set_ylabel('Latitude')

# Distribution comparison
seasonal_flat = seasonal_heat_days.values.flatten()
annual_flat = annual_heat_days.values.flatten() 
seasonal_clean = seasonal_flat[~np.isnan(seasonal_flat)]
annual_clean = annual_flat[~np.isnan(annual_flat)]

axes[1,1].hist(seasonal_clean, bins=20, alpha=0.7, color='red',
              edgecolor='black', label='Seasonal Method')
axes[1,1].hist(annual_clean, bins=20, alpha=0.5, color='blue',
              edgecolor='black', label='Annual Method')
axes[1,1].set_title('Heat Days Distribution Comparison', fontweight='bold', fontsize=14)
axes[1,1].set_xlabel('Heat Days per Pixel')
axes[1,1].set_ylabel('Frequency')
axes[1,1].legend()
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate and display correlation
correlation = np.corrcoef(annual_clean, seasonal_clean)[0,1]
print(f'\n📈 Correlation between methods: {correlation:.3f}')

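if np.isnan(correlation):
    print('   ⚠️ Correlation undefined - at least one method has constant heat-day counts (e.g., all zeros)')
elif correlation > 0.8:
    print('   ✅ Strong positive correlation - methods generally agree')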
elif correlation > 0.5:
    print('   ⚠️ Moderate correlation - some differences in spatial patterns')
else:
    print('   🔍 Low correlation - significant methodological differences')

print('✅ Visualization complete!')

## 5. Export Results

In [None]:
# Export results to multiple formats
print('📁 Exporting results...')

# 1. Create NetCDF with proper CRS
print('   📦 Creating NetCDF file...')

# Prepare spatial results
results = {
    'seasonal_heat_days': seasonal_heat_days.fillna(0),
    'annual_heat_days': annual_heat_days.fillna(0),
    'heat_days_difference': (seasonal_heat_days - annual_heat_days).fillna(0),
    'seasonal_percentiles_mean': seasonal_percentiles.mean(dim='dayofyear').fillna(0),
    'annual_percentile': annual_percentile.fillna(0),
    'annual_max': analysis_data.temperature.max(dim='time').fillna(0),
    'annual_min': analysis_data.temperature.min(dim='time').fillna(0)
}

results_ds = xr.Dataset(results)

# Add proper CRS metadata
results_ds.latitude.attrs.update({
    'standard_name': 'latitude',
    'long_name': 'latitude',
    'units': 'degrees_north',
    'axis': 'Y'
})

results_ds.longitude.attrs.update({
    'standard_name': 'longitude',
    'long_name': 'longitude', 
    'units': 'degrees_east',
    'axis': 'X'
})

# Add CRS variable
crs = xr.DataArray(
    data=np.int32(1),
    attrs={
        'grid_mapping_name': 'latitude_longitude',
        'longitude_of_prime_meridian': 0.0,
        'semi_major_axis': 6378137.0,
        'inverse_flattening': 298.257223563,
        'spatial_ref': 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]'
    }
)
results_ds['crs'] = crs

# Add grid_mapping to data variables
for var_name in results_ds.data_vars:
    if var_name != 'crs':
        results_ds[var_name].attrs['grid_mapping'] = 'crs'

# Add global metadata
results_ds.attrs.update({
    'title': 'Seasonal Extreme Heat Days Analysis',
    'analysis_year': CONFIG['analysis_year'],
    'reference_period': f'{CONFIG["reference_start"]}-{CONFIG["reference_end"]}',
    'created': datetime.now().isoformat(),
    'absolute_threshold': CONFIG['absolute_threshold'],
    'percentile_threshold': CONFIG['percentile_threshold'],
    'methodology': 'seasonal_percentile_day_of_year',
    'formula': 'Heat_Day = LST_max > max(absolute_threshold, seasonal_percentile)',
    'crs': 'EPSG:4326',
    'temperature_filter': f'Applied max temperature filter: {CONFIG["max_temp_filter"]}°C'
})

# Save NetCDF
netcdf_filename = f'{output_dir}/seasonal_heat_analysis_final_{CONFIG["analysis_year"]}.nc'
results_ds.to_netcdf(netcdf_filename)

file_size_mb = os.path.getsize(netcdf_filename) / 1024**2
print(f'   ✅ NetCDF saved: {os.path.basename(netcdf_filename)} ({file_size_mb:.1f} MB)')
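
The CF grid-mapping link is just an attribute on each data variable that names the CRS variable. A minimal sketch for checking that link in memory follows; `check_grid_mapping` and the toy dataset are hypothetical, and the check is not a full CF-compliance test.

```python
import numpy as np
import xarray as xr

def check_grid_mapping(ds: xr.Dataset) -> list:
    """Return data variables whose 'grid_mapping' attribute names a variable missing from ds."""
    return [
        name for name, var in ds.data_vars.items()
        if 'grid_mapping' in var.attrs and var.attrs['grid_mapping'] not in ds.variables
    ]

# Hypothetical toy dataset with the same structure as the export
toy = xr.Dataset(
    {'heat_days': (('latitude', 'longitude'), np.zeros((2, 2)))},
    coords={'latitude': [-13.0, -12.9], 'longitude': [-38.5, -38.4]},
)
toy['crs'] = xr.DataArray(np.int32(1), attrs={'grid_mapping_name': 'latitude_longitude'})
toy['heat_days'].attrs['grid_mapping'] = 'crs'

print(check_grid_mapping(toy))  # [] when every reference resolves
```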

In [None]:
# 2. Export summary CSV
print('   📊 Creating summary CSV...')

summary_data = [
    {
        'method': 'seasonal',
        'mean_heat_days': seasonal_mean,
        'max_heat_days': seasonal_max,
        'pixels_with_heat_days': seasonal_pixels,
        'total_pixels': total_pixels,
        'percentage_with_heat': seasonal_pixels / total_pixels * 100
    },
    {
        'method': 'annual',
        'mean_heat_days': annual_mean,
        'max_heat_days': annual_max, 
        'pixels_with_heat_days': annual_pixels,
        'total_pixels': total_pixels,
        'percentage_with_heat': annual_pixels / total_pixels * 100
    }
]

summary_df = pd.DataFrame(summary_data)
summary_filename = f'{output_dir}/method_comparison_final_{CONFIG["analysis_year"]}.csv'
summary_df.to_csv(summary_filename, index=False)

print(f'   ✅ Summary CSV saved: {os.path.basename(summary_filename)}')
print('\n📊 METHOD COMPARISON SUMMARY:')
display(summary_df)

In [None]:
# 3. Create comprehensive documentation
print('   📝 Creating methodology documentation...')

doc_content = f"""SEASONAL EXTREME HEAT DAYS ANALYSIS - FINAL REPORT
=================================================
Generated: {datetime.now().isoformat()}
Analysis Year: {CONFIG['analysis_year']}
Reference Period: {CONFIG['reference_start']}-{CONFIG['reference_end']}

METHODOLOGY:
============
Formula: Surface_Heat_Day = 1 if LST_daily_max > max(LST_abs, LST_rel)

Where:
- LST_abs = {CONFIG['absolute_threshold']}°C (absolute threshold)
- LST_rel = {CONFIG['percentile_threshold']}th percentile of temperatures for each calendar day
- Reference period: {CONFIG['reference_end'] - CONFIG['reference_start'] + 1} years of historical data

SCIENTIFIC RATIONALE:
====================
1. Seasonal Context: Each day compared to its climatological normal
2. Climatological Accuracy: Follows meteorological best practices
3. Sensitive Detection: Can identify seasonal heat anomalies
4. Day-specific Baselines: Accounts for natural seasonal temperature cycles

DATA PROCESSING:
===============
- Data Source: GSHTD (Global Seamless High-resolution Temperature Dataset), daily maximum air temperature
- Extraction Scale: {CONFIG['scale']}m
- Temporal Coverage: {CONFIG['reference_start']}-{CONFIG['analysis_year']}
- Quality Control: Temperature values below -20°C or above {CONFIG['max_temp_filter']}°C removed
- Extraction Method: Monthly chunking for large dataset handling

RESULTS SUMMARY:
===============
Study Area: Salvador, Brazil region ({area_km2:.1f} km²)
Total Pixels Analyzed: {total_pixels:,}
Total Observations Processed: {extraction_stats['total_obs']:,}

Seasonal Method:
- Mean heat days: {seasonal_mean:.1f}
- Max heat days: {seasonal_max:.0f}
- Pixels with heat days: {seasonal_pixels:,} ({seasonal_pixels/total_pixels*100:.1f}%)

Annual Method (comparison):
- Mean heat days: {annual_mean:.1f}
- Max heat days: {annual_max:.0f}
- Pixels with heat days: {annual_pixels:,} ({annual_pixels/total_pixels*100:.1f}%)

METHODOLOGICAL COMPARISON:
=========================
- Mean difference: {seasonal_mean - annual_mean:.1f} days
- Correlation: {correlation:.3f}
- Affected pixels: {abs(seasonal_pixels - annual_pixels):,} difference

COORDINATE REFERENCE SYSTEM:
===========================
- CRS: EPSG:4326 (WGS84 Geographic)
- Units: Decimal degrees
- Datum: World Geodetic System 1984
- NetCDF includes proper CF-compliant CRS metadata

FILES GENERATED:
===============
1. {os.path.basename(netcdf_filename)} - NetCDF spatial dataset with proper CRS
2. {os.path.basename(summary_filename)} - Method comparison summary
3. seasonal_heat_methodology_final_{CONFIG['analysis_year']}.txt - This documentation

CONCLUSION:
==========
The seasonal methodology provides more climatologically appropriate 
extreme heat detection by accounting for natural seasonal temperature cycles.
This approach offers improved sensitivity to seasonal heat anomalies compared 
to traditional annual percentile methods.

Analysis completed successfully with {extraction_stats['successful']} successful 
monthly extractions out of {extraction_stats['successful'] + extraction_stats['failed']} attempted.
"""

doc_filename = f'{output_dir}/seasonal_heat_methodology_final_{CONFIG["analysis_year"]}.txt'
with open(doc_filename, 'w') as f:
    f.write(doc_content)

print(f'   ✅ Documentation saved: {os.path.basename(doc_filename)}')

print(f'\n🎯 ANALYSIS COMPLETE!')
print(f'📁 All outputs saved to: {output_dir}')
print(f'\n📊 Files created:')
print(f'   • NetCDF spatial dataset with proper CRS')
print(f'   • Method comparison CSV')
print(f'   • Comprehensive methodology documentation')
print(f'\n✅ Ready for presentation!')

## Summary

This notebook successfully implements a **climatologically appropriate** extreme heat day calculation using seasonal percentiles. The key advantages of this approach include:

### 🔬 **Methodological Improvements:**
- **Seasonal context**: Each day compared to its climatological normal
- **Day-specific baselines**: Accounts for natural seasonal temperature cycles
- **Enhanced sensitivity**: Better detection of seasonal heat anomalies

### 📊 **Technical Implementation:**
- **Clean xarray operations**: Leverages xarray's built-in climate analysis capabilities
- **Efficient data handling**: Monthly chunking for large dataset processing
- **Proper CRS handling**: CF-compliant NetCDF export for GIS compatibility

### 🎯 **Scientific Value:**
- **Meteorologically sound**: Follows established climatological practices
- **Comparative analysis**: Direct comparison with traditional annual methods
- **Comprehensive output**: Multiple export formats for further analysis

By comparing each day against its own calendar-day baseline rather than a single annual percentile, the seasonal methodology gives extreme heat analysis a climatologically consistent reference.