# 🌡️ Seasonal Extreme Heat Days Analysis - Enhanced Methodology

This notebook implements the **climatologically appropriate** extreme heat day calculation using seasonal percentiles:

## 📐 **Formula:**
```
Surface_Heat_Day = 1 if LST_daily_max > max(LST_abs, LST_rel)
```
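Where:
- **LST_daily_max**: Daily maximum temperature; in this notebook it is read from the GSHTD daily air temperature product (`tmax`)
- **LST_abs**: Absolute threshold (e.g., 35°C)
- **LST_rel**: The X-th percentile of LST_daily_max for calendar_day ± 5 days, pooled over the reference period (nominally 30 years; 2003-2019 by default here, since GSHTD covers 2003-2020)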
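
A minimal sketch of the decision rule for a single pixel and day, in plain Python. The function name `is_heat_day` and the sample temperatures are illustrative assumptions, not part of the notebook's code or data.

```python
def is_heat_day(daily_max_c: float, abs_threshold_c: float, rel_threshold_c: float) -> int:
    """Return 1 if the daily maximum exceeds the stricter of the two thresholds, else 0."""
    effective_threshold = max(abs_threshold_c, rel_threshold_c)
    return 1 if daily_max_c > effective_threshold else 0


# Illustrative values only (not taken from the dataset).
print(is_heat_day(36.2, 35.0, 33.8))  # absolute threshold (35.0) governs
print(is_heat_day(36.2, 35.0, 37.1))  # seasonal percentile (37.1) governs
```

Because the rule takes the maximum of the two thresholds, a day counts only when it exceeds both of them.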

## 🔬 **Scientific Advantages:**
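- ⚡ **Seasonal context**: Each day is judged against temperatures typical of its own time of year, so July and January heat are not held to a single annual percentile
- 📈 **Climatological accuracy**: Follows meteorological best practices
- 🎯 **Sensitive detection**: Can identify winter/spring heat anomalies, provided the absolute threshold is low enough for the seasonal percentile to govern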
- 🔄 **Day-specific baselines**: Each day compared to its historical context

## 🏗️ **Architecture:**
- Same robust data processing as notebook 17
- Enhanced seasonal percentile calculation
- Improved visualization and export capabilities

In [1]:
# Import required libraries
import ee
import geemap
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import ipywidgets as widgets
from IPython.display import display, clear_output
import os
import rasterio
from tkinter import filedialog
import tkinter as tk

# Set matplotlib to display inline
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)

# Initialize Earth Engine with your project
try:
    ee.Initialize(project='tl-cities')
    print('✅ Earth Engine initialized successfully')
except Exception as e:
    print(f'❌ Earth Engine initialization failed: {e}')

print('📦 Available packages:')
print(f'   - xarray: {xr.__version__}')
print(f'   - pandas: {pd.__version__}')
print(f'   - numpy: {np.__version__}')

# Create outputs directory
os.makedirs('../outputs', exist_ok=True)
print('📁 Created outputs directory: ../outputs')
print('🌡️ Seasonal Extreme Heat Analysis - Ready!')


✅ Earth Engine initialized successfully
📦 Available packages:
   - xarray: 2024.7.0
   - pandas: 2.3.1
   - numpy: 1.26.4
📁 Created outputs directory: ../outputs
🌡️ Seasonal Extreme Heat Analysis - Ready!


## 🎯 Step 1: ROI Selection (Same as Notebook 17)

In [2]:
# Global variables
analysis_geom = None
temperature_data = None

# Create map for ROI selection
m = geemap.Map(center=[-12.9714, -38.5014], zoom=10)  # Salvador, Brazil
m.add_basemap('SATELLITE')
m.add('draw_control')

def set_roi_from_drawing():
    '''Extract ROI from map drawing'''
    global analysis_geom
    
    try:
        if hasattr(m, 'draw_control') and len(m.draw_control.data) > 0:
            # Get the last drawn feature
            feature = m.draw_control.data[-1]
            coords = feature['geometry']['coordinates']
            
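            geom_type = feature['geometry']['type']
            if geom_type == 'Polygon':
                # GeoJSON has no 'Rectangle' type; drawn rectangles arrive as Polygons
                analysis_geom = ee.Geometry.Polygon(coords)
            else:
                print(f'❌ Unsupported geometry type: {geom_type}. Please draw a polygon or rectangle.')
                return False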
            
            area_km2 = analysis_geom.area().divide(1000000).getInfo()
            bounds_info = analysis_geom.bounds().getInfo()['coordinates'][0]
            west, south = bounds_info[0]
            east, north = bounds_info[2]
            
            print(f'✅ ROI set from drawing: {area_km2:.1f} km²')
            print(f'   Bounds: W={west:.3f}, E={east:.3f}, S={south:.3f}, N={north:.3f}')
            return True
        else:
            print('❌ No drawing found. Please draw a polygon or rectangle on the map.')
            return False
    except Exception as e:
        print(f'❌ Error setting ROI from drawing: {e}')
        return False

def set_roi_from_coordinates():
    '''Set ROI from coordinate inputs'''
    global analysis_geom
    
    try:
        west = float(west_input.value) if west_input.value else -38.7
        east = float(east_input.value) if east_input.value else -38.3
        south = float(south_input.value) if south_input.value else -13.1
        north = float(north_input.value) if north_input.value else -12.8
        
        analysis_geom = ee.Geometry.Rectangle([west, south, east, north])
        area_km2 = analysis_geom.area().divide(1000000).getInfo()
        
        # Add rectangle to map with proper visualization
        roi_image = ee.Image().paint(analysis_geom, 1, 2)
        m.addLayer(roi_image, {'palette': ['red'], 'max': 1}, 'ROI')
        m.centerObject(analysis_geom, 11)
        
        print(f'✅ ROI set from coordinates: {area_km2:.1f} km²')
        print(f'   Bounds: W={west:.3f}, E={east:.3f}, S={south:.3f}, N={north:.3f}')
        return True
    except Exception as e:
        print(f'❌ Error setting ROI from coordinates: {e}')
        return False

def browse_raster_file():
    '''Open file browser to select raster file'''
    try:
        # Create a temporary tkinter root window
        root = tk.Tk()
        root.withdraw()  # Hide the root window
        
        # Open file dialog
        file_path = filedialog.askopenfilename(
            title='Select Reference Raster File',
            filetypes=[
                ('Raster files', '*.tif *.tiff *.img *.nc *.hdf *.jp2'),
                ('GeoTIFF', '*.tif *.tiff'),
                ('NetCDF', '*.nc'),
                ('All files', '*.*')
            ]
        )
        
        root.destroy()  # Clean up
        
        if file_path:
            raster_path_display.value = file_path
            print(f'📁 Selected file: {os.path.basename(file_path)}')
            print(f'    Full path: {file_path}')
            return file_path
        else:
            print('❌ No file selected')
            return None
            
    except Exception as e:
        print(f'❌ Error opening file browser: {e}')
        print('   Note: File browser requires GUI environment')
        return None

def set_roi_from_raster():
    '''Set ROI from selected raster extent with proper CRS handling'''
    global analysis_geom
    
    try:
        raster_path = raster_path_display.value.strip()
        
        if not raster_path or not os.path.exists(raster_path):
            print('❌ Please select a valid raster file first')
            return False
        
        print(f'📖 Reading raster: {os.path.basename(raster_path)}')
        
        # Read raster bounds and CRS information
        with rasterio.open(raster_path) as src:
            bounds = src.bounds
            crs = src.crs
            shape = src.shape
            transform = src.transform
            
            # Get bounds in original CRS
            west, south, east, north = bounds.left, bounds.bottom, bounds.right, bounds.top
            
            print(f'   📊 Raster info:')
            print(f'      CRS: {crs}')
            print(f'      Shape: {shape}')
            print(f'      Original bounds: W={west:.3f}, E={east:.3f}, S={south:.3f}, N={north:.3f}')
            
            # Transform to WGS84 if needed
            if crs.to_epsg() != 4326:
                from rasterio.warp import transform_bounds
                west_wgs84, south_wgs84, east_wgs84, north_wgs84 = transform_bounds(
                    crs, 'EPSG:4326', west, south, east, north
                )
                print(f'      Transformed to WGS84:')
                print(f'      WGS84 bounds: W={west_wgs84:.6f}, E={east_wgs84:.6f}, S={south_wgs84:.6f}, N={north_wgs84:.6f}')
                west, south, east, north = west_wgs84, south_wgs84, east_wgs84, north_wgs84
            else:
                print(f'      Already in WGS84')
        
        # Validate bounds are reasonable
        if abs(west) > 180 or abs(east) > 180 or abs(south) > 90 or abs(north) > 90:
            print(f'❌ Invalid bounds detected - coordinates out of valid range')
            print(f'   This suggests a CRS projection issue')
            return False
        
        if west >= east or south >= north:
            print(f'❌ Invalid bounds - west >= east or south >= north')
            return False
        
        # Create geometry in WGS84
        analysis_geom = ee.Geometry.Rectangle([west, south, east, north], 'EPSG:4326')
        area_km2 = analysis_geom.area().divide(1000000).getInfo()
        
        # Add to map with proper visualization
        roi_image = ee.Image().paint(analysis_geom, 1, 2)
        m.addLayer(roi_image, {'palette': ['blue'], 'max': 1}, 'Raster ROI')
        m.centerObject(analysis_geom, 11)
        
        print(f'   ✅ ROI set from raster extent: {area_km2:.1f} km²')
        print(f'   Final WGS84 bounds: W={west:.6f}, E={east:.6f}, S={south:.6f}, N={north:.6f}')
        return True
        
    except Exception as e:
        print(f'❌ Error setting ROI from raster: {e}')
        import traceback
        print(f'   Details: {traceback.format_exc()}')
        return False

# ROI input widgets
west_input = widgets.FloatText(value=-38.7, description='West:')
east_input = widgets.FloatText(value=-38.3, description='East:')
south_input = widgets.FloatText(value=-13.1, description='South:')
north_input = widgets.FloatText(value=-12.8, description='North:')

# File browser widgets
raster_path_display = widgets.Text(
    value='',
    placeholder='No file selected...',
    description='Selected File:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='500px'),
    disabled=True  # Read-only display
)

browse_button = widgets.Button(
    description='📂 Browse Files',
    button_style='info',
    tooltip='Click to select a raster file'
)
browse_button.on_click(lambda b: browse_raster_file())

# Action buttons
set_drawing_button = widgets.Button(description='📍 Use Drawing', button_style='success')
set_coords_button = widgets.Button(description='📍 Use Coordinates', button_style='info')
set_raster_button = widgets.Button(description='📍 Use Raster Extent', button_style='warning')

set_drawing_button.on_click(lambda b: set_roi_from_drawing())
set_coords_button.on_click(lambda b: set_roi_from_coordinates())
set_raster_button.on_click(lambda b: set_roi_from_raster())

roi_interface = widgets.VBox([
    widgets.HTML('<h3>🎯 ROI Selection for Seasonal Analysis</h3>'),
    
    widgets.HTML('<b>Method 1: Draw on Map</b>'),
    widgets.HTML('Draw a polygon or rectangle on the map below, then click:'),
    set_drawing_button,
    
    widgets.HTML('<b>Method 2: Enter Coordinates</b>'),
    widgets.HBox([west_input, east_input]),
    widgets.HBox([south_input, north_input]),
    set_coords_button,
    
    widgets.HTML('<b>Method 3: Use Raster File Extent</b>'),
    widgets.HTML('Click Browse to select a reference raster file:'),
    widgets.HBox([browse_button, raster_path_display]),
    set_raster_button
])

display(roi_interface)
display(m)

print('🎯 Enhanced ROI Selection Ready for Seasonal Analysis')
print('🔧 Complete functionality from notebook 17: drawing, coordinates, and file browser')
print('Choose any method to define your region of interest')

VBox(children=(HTML(value='<h3>🎯 ROI Selection for Seasonal Analysis</h3>'), HTML(value='<b>Method 1: Draw on …

Map(center=[-12.9714, -38.5014], controls=(WidgetControl(options=['position', 'transparent_bg'], position='top…

🎯 Enhanced ROI Selection Ready for Seasonal Analysis
🔧 Complete functionality from notebook 17: drawing, coordinates, and file browser
Choose any method to define your region of interest
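
Drawn shapes reach the notebook as GeoJSON features, and GeoJSON has no `Rectangle` geometry type, so a drawn rectangle is a `Polygon` with a closed five-point ring. The sketch below shows one way to recover and validate the bounding box from such a feature in plain Python. The function name `bounds_from_feature` and the sample feature are assumptions for illustration; the checks mirror the ones the raster-extent method applies.

```python
def bounds_from_feature(feature: dict) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) for a GeoJSON Polygon feature."""
    geometry = feature["geometry"]
    if geometry["type"] != "Polygon":
        raise ValueError(f"Unsupported geometry type: {geometry['type']}")

    outer_ring = geometry["coordinates"][0]  # the first ring is the exterior
    lons = [point[0] for point in outer_ring]
    lats = [point[1] for point in outer_ring]
    west, east = min(lons), max(lons)
    south, north = min(lats), max(lats)

    # Same sanity checks as the raster-extent method
    if abs(west) > 180 or abs(east) > 180 or abs(south) > 90 or abs(north) > 90:
        raise ValueError("Coordinates outside the valid WGS84 range")
    if west >= east or south >= north:
        raise ValueError("Degenerate bounds: west >= east or south >= north")
    return west, south, east, north


# Hypothetical feature using the notebook's default coordinate values.
sample_feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-38.7, -13.1], [-38.3, -13.1], [-38.3, -12.8], [-38.7, -12.8], [-38.7, -13.1],
        ]],
    },
}
print(bounds_from_feature(sample_feature))
```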


## 📊 Step 2: Analysis Configuration - Enhanced for Seasonal Method
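
The slider ranges in this step allow combinations that cannot be analysed, for example a reference start year later than the reference end year. A small check along the following lines could be run before extraction. This is a sketch: `validate_config` is a hypothetical helper, and the 2003-2020 limits are taken from the data availability note in this step.

```python
def validate_config(analysis_year: int, ref_start: int, ref_end: int,
                    data_start: int = 2003, data_end: int = 2020) -> list[str]:
    """Return a list of human-readable problems; an empty list means the settings are usable."""
    problems = []
    if ref_start > ref_end:
        problems.append(f"Reference start {ref_start} is after reference end {ref_end}")
    for label, value in (("Analysis year", analysis_year),
                         ("Reference start", ref_start),
                         ("Reference end", ref_end)):
        if not data_start <= value <= data_end:
            problems.append(f"{label} {value} is outside {data_start}-{data_end}")
    return problems


print(validate_config(2020, 2003, 2019))  # default slider values
print(validate_config(2020, 2019, 2004))  # start after end
```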

In [3]:
# Analysis configuration with seasonal parameters
analysis_year = widgets.IntSlider(value=2020, min=2003, max=2020, description='Analysis Year:')
reference_start = widgets.IntSlider(value=2003, min=2003, max=2019, description='Reference Start:')
reference_end = widgets.IntSlider(value=2019, min=2004, max=2020, description='Reference End:')
absolute_threshold = widgets.FloatSlider(value=35.0, min=20.0, max=45.0, step=0.5, description='Threshold (°C):')
percentile_threshold = widgets.FloatSlider(value=90.0, min=50.0, max=99.0, step=1.0, description='Percentile:')
day_window = widgets.IntSlider(value=5, min=1, max=15, step=1, description='Day Window (±):')

config_interface = widgets.VBox([
    widgets.HTML('<h3>📊 Seasonal Analysis Configuration</h3>'),
    widgets.HTML('<div style="background-color: #e7f3ff; padding: 10px; border-radius: 5px;">' +
                '<b>🔬 Seasonal Method:</b> Each day compared to historical temperatures ' +
                'for the same calendar day ± window over reference period.</div>'),
    widgets.HTML('<div style="background-color: #fff3cd; padding: 10px; border-radius: 5px;">' +
                '<b>⚠️ Data Availability:</b> GSHTD data available 2003-2020. ' +
                'Reference period adjusted to available years (17 years still provides robust percentiles).</div>'),
    analysis_year,
    widgets.HTML('<b>Reference Period (GSHTD available 2003-2020):</b>'),
    widgets.HBox([reference_start, reference_end]),
    widgets.HTML('<b>Threshold Parameters:</b>'),
    widgets.HBox([absolute_threshold, percentile_threshold]),
    widgets.HTML('<b>Seasonal Window:</b>'),
    day_window,
    widgets.HTML('<small><i>Day Window: ±5 means July 15 is compared with July 10-20 in the reference data</i></small>')
])

display(config_interface)
print('📊 Seasonal Configuration Ready')
print('🌡️ Formula: Heat_Day = LST_max > max(absolute_threshold, seasonal_percentile)')
print('⚠️ GSHTD data: 2003-2020 (17 years provides robust seasonal percentiles)')

VBox(children=(HTML(value='<h3>📊 Seasonal Analysis Configuration</h3>'), HTML(value='<div style="background-co…

📊 Seasonal Configuration Ready
🌡️ Formula: Heat_Day = LST_max > max(absolute_threshold, seasonal_percentile)
⚠️ GSHTD data: 2003-2020 (17 years provides robust seasonal percentiles)
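
To make the day window concrete, the sketch below lists the calendar days that fall inside a ± window around a given date, including the wrap across the year boundary. It uses only the standard library; `window_calendar_days` is a hypothetical helper, not a function from the notebook.

```python
from datetime import date, timedelta


def window_calendar_days(center: date, window_days: int) -> list[tuple[int, int]]:
    """Return (month, day) pairs for center ± window_days, crossing year ends if needed."""
    days = []
    for offset in range(-window_days, window_days + 1):
        current = center + timedelta(days=offset)
        days.append((current.month, current.day))
    return days


july_window = window_calendar_days(date(2019, 7, 15), 5)
print(july_window[0], july_window[-1])  # (7, 10) (7, 20)

january_window = window_calendar_days(date(2019, 1, 1), 5)
print(january_window[0], january_window[-1])  # (12, 27) (1, 6)
```

With the default ±5 window and the 2003-2019 reference period, each pixel's percentile is therefore drawn from at most 11 × 17 = 187 values, fewer where data are missing.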


## 🔄 Step 3: Data Extraction (Same Architecture as Notebook 17)
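
The extraction cell in this step reads Earth Engine `getRegion` results, which arrive as a list of rows whose first row is a header (`id`, `longitude`, `latitude`, `time`, then band names). The sketch below shows the same parsing steps in plain Python: split off the header, skip masked pixels, convert milliseconds to a timestamp, and divide the `b1` value by 10 to get °C as the notebook does. The helper name `parse_region_table` and the sample rows are made up for illustration.

```python
from datetime import datetime, timezone


def parse_region_table(region_data: list[list], band: str = "b1") -> list[dict]:
    """Convert a getRegion-style table (header row + data rows) into temperature records."""
    header, rows = region_data[0], region_data[1:]
    records = []
    for row in rows:
        raw = dict(zip(header, row))
        if raw[band] is None:  # masked pixel: no observation
            continue
        records.append({
            "time": datetime.fromtimestamp(raw["time"] / 1000, tz=timezone.utc),
            "longitude": raw["longitude"],
            "latitude": raw["latitude"],
            "temperature": raw[band] / 10.0,  # stored as tenths of a degree Celsius
        })
    return records


# Made-up rows in the getRegion layout; 1577836800000 ms is 2020-01-01T00:00:00Z.
sample_table = [
    ["id", "longitude", "latitude", "time", "b1"],
    ["img_a", -38.5, -12.9, 1577836800000, 312],
    ["img_a", -38.4, -12.9, 1577836800000, None],
]
records = parse_region_table(sample_table)
print(records)
```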

In [4]:
def extract_temperature_data():
    '''Extract temperature data from GSHTD using monthly chunking for large datasets at 1km resolution'''
    global temperature_data, analysis_geom
    
    if analysis_geom is None:
        print('❌ Please set an ROI first!')
        return False
    
    try:
        print('🔄 Extracting temperature data for seasonal analysis...')
        
        year = analysis_year.value
        ref_start = reference_start.value
        ref_end = reference_end.value
        
        # Debug ROI information
        area_km2 = analysis_geom.area().divide(1000000).getInfo()
        bounds = analysis_geom.bounds().getInfo()
        print(f'   📏 ROI area: {area_km2:.2f} km²')
        print(f'   🗺️ ROI bounds: {bounds}')
        
        # Function to get regional collection based on location
        def get_region_collection(geom):
            """Determine which regional GSHTD collection to use based on geometry location"""
            centroid = geom.centroid().coordinates().getInfo()
            lon, lat = centroid[0], centroid[1]
            
            if lat > 15 and lon > -140 and lon < -40:  # North America
                return "projects/sat-io/open-datasets/global-daily-air-temp/north_america"
            elif lat < 35 and lon > -120 and lon < -30:  # Latin America  
                return "projects/sat-io/open-datasets/global-daily-air-temp/latin_america"
            elif lat > 30 and lon > -15 and lon < 180:  # Europe & Asia
                return "projects/sat-io/open-datasets/global-daily-air-temp/europe_asia"
            elif lat < 40 and lon > -20 and lon < 55:  # Africa
                return "projects/sat-io/open-datasets/global-daily-air-temp/africa"
            elif lat < -5 and lon > 110 and lon < 180:  # Australia
                return "projects/sat-io/open-datasets/global-daily-air-temp/australia"
            else:
                return "projects/sat-io/open-datasets/global-daily-air-temp/north_america"  # Default
        
        # Function to get temperature collection
        def get_temperature_collection(region_geom, start_date, end_date, temp_type='tmax'):
            """Get daily air temperature collection for the specified region and period"""
            collection_id = get_region_collection(region_geom)
            print(f'   📡 Using collection: {collection_id.split("/")[-1]}')
            
            collection = ee.ImageCollection(collection_id)
            
            # Filter by date, bounds, and temperature type using prop_type metadata
            filtered_collection = (collection.filterDate(start_date, end_date)
                                 .filterBounds(region_geom)
                                 .filter(ee.Filter.eq('prop_type', temp_type)))
            
            # Just select and clip, scaling handled in pandas processing
            temp_collection = filtered_collection.map(lambda img: 
                img.select('b1')
                  .clip(region_geom)
                  .copyProperties(img, ['system:time_start'])
            )
            
            return temp_collection
        
        # Test pixel count at 1km resolution
        test_collection = get_temperature_collection(analysis_geom, f'{year}-01-01', f'{year}-01-02', 'tmax')
        
        if test_collection.size().getInfo() == 0:
            print('❌ No images found for test date - check ROI coverage')
            return False
        
        first_image = test_collection.first()
        pixel_count = first_image.select('b1').reduceRegion(
            reducer=ee.Reducer.count(),
            geometry=analysis_geom,
            scale=1000,  # 1km resolution
            maxPixels=1e9
        ).getInfo()
        
        expected_pixels = pixel_count.get('b1', 0)
        print(f'   🔍 Expected pixels per image at 1km: {expected_pixels}')
        
        if expected_pixels == 0:
            print('❌ No pixels found in ROI - check if ROI overlaps with data coverage')
            return False
        
        # Calculate years to extract (need full reference period for seasonal analysis)
        years_to_extract = list(range(ref_start, ref_end + 1)) + [year]
        years_to_extract = sorted(list(set(years_to_extract)))  # Remove duplicates and sort
        
        print(f'   📅 Will extract {len(years_to_extract)} years: {years_to_extract}')
        print(f'   🔬 Extended period needed for seasonal percentile calculation')
        
        # Check if we need monthly chunking for very large datasets
        total_estimated = expected_pixels * 365 * len(years_to_extract)
        print(f'   🔍 Total estimated values for all years: {total_estimated:,}')
        
        if total_estimated > 2000000 or expected_pixels > 300:  # Very large dataset
            print(f'   ⚠️ Dataset too large - using monthly chunking extraction')
            use_monthly_chunking = True
        else:
            print(f'   ✅ Using yearly extraction')
            use_monthly_chunking = False
        
        # Extract data with appropriate chunking strategy
        all_dataframes = []
        
        if use_monthly_chunking:
            print(f'\n   📅 Extracting data month by month for {len(years_to_extract)} years...')
            total_months = len(years_to_extract) * 12
            processed_months = 0
            
            for extract_year in years_to_extract:
                print(f'\n   📅 Processing year {extract_year}...')
                
                for month in range(1, 13):
                    print(f'      Month {month:02d} ({processed_months+1}/{total_months})...', end=' ')
                    
                    # Monthly date range
                    start_date = f'{extract_year}-{month:02d}-01'
                    if month == 12:
                        end_date = f'{extract_year+1}-01-01'
                    else:
                        end_date = f'{extract_year}-{month+1:02d}-01'
                    
                    month_collection = get_temperature_collection(
                        analysis_geom, start_date, end_date, 'tmax'
                    )
                    
                    month_count = month_collection.size().getInfo()
                    if month_count == 0:
                        print('No data')
                        processed_months += 1
                        continue
                    
                    month_estimated = expected_pixels * month_count
                    if month_estimated > 800000:  # Even monthly is too large
                        scale = 2000
                        print(f'Using 2km scale ({month_estimated:,} values)')
                    else:
                        scale = 1000
                        print(f'Using 1km scale ({month_estimated:,} values)')
                    
                    try:
                        region_data = month_collection.getRegion(
                            geometry=analysis_geom,
                            scale=scale,
                            crs='EPSG:4326'
                        ).getInfo()
                        
                        if len(region_data) > 1:  # More than just header
                            header = region_data[0]
                            data = region_data[1:]
                            
                            df_month = pd.DataFrame(data, columns=header)
                            df_month['time'] = pd.to_datetime(df_month['time'], unit='ms')
                            
                            # Apply temperature scaling and rename column
                            if 'b1' in df_month.columns:
                                df_month['temperature'] = df_month['b1'] / 10.0  # Scale to Celsius
                                df_month = df_month.drop(columns=['b1'])
                            
                            # Drop nulls and filter realistic temperatures
                            df_month = df_month.dropna(subset=['temperature'])
                            df_month = df_month[df_month['temperature'] <= 50.0]  # Remove outliers
                            
                            if len(df_month) > 0:
                                all_dataframes.append(df_month)
                        
                        processed_months += 1
                        
                    except Exception as e:
                        print(f'Failed: {e}')
                        processed_months += 1
                        continue
        
        else:  # Yearly extraction for smaller datasets
            for extract_year in years_to_extract:
                print(f'\n   📅 Extracting year {extract_year}...')
                
                # filterDate treats the end date as exclusive, so end on Jan 1 of the next year
                year_collection = get_temperature_collection(
                    analysis_geom, f'{extract_year}-01-01', f'{extract_year + 1}-01-01', 'tmax'
                )
                
                year_size = year_collection.size().getInfo()
                estimated_values = expected_pixels * year_size
                
                print(f'      Images: {year_size}, Estimated values: {estimated_values:,}')
                
                if estimated_values > 900000:  # Still too large
                    print(f'      ⚠️ Still too large for single year, using 2km scale')
                    scale = 2000
                else:
                    print(f'      ✅ Using 1km scale')
                    scale = 1000
                
                try:
                    region_data = year_collection.getRegion(
                        geometry=analysis_geom,
                        scale=scale,
                        crs='EPSG:4326'
                    ).getInfo()
                    
                    print(f'      ✅ Extracted {len(region_data)} rows')
                    
                    if len(region_data) > 1:  # More than just header
                        header = region_data[0]
                        data = region_data[1:]
                        
                        df_year = pd.DataFrame(data, columns=header)
                        df_year['time'] = pd.to_datetime(df_year['time'], unit='ms')
                        
                        # Apply temperature scaling and rename column BEFORE dropping nulls
                        if 'b1' in df_year.columns:
                            df_year['temperature'] = df_year['b1'] / 10.0  # Scale to Celsius
                            df_year = df_year.drop(columns=['b1'])  # Remove original column
                        
                        # Drop nulls and filter unrealistic temperatures (same rule as the monthly path)
                        df_year = df_year.dropna(subset=['temperature'])
                        df_year = df_year[df_year['temperature'] <= 50.0]
                        
                        print(f'      📊 Valid observations: {len(df_year)}')
                        
                        if len(df_year) > 0:
                            all_dataframes.append(df_year)
                    
                except Exception as e:
                    print(f'      ❌ Failed to extract {extract_year}: {e}')
                    continue
        
        if not all_dataframes:
            print('❌ No data extracted for any period')
            return False
        
        # Combine all periods
        print(f'\n   🔗 Combining {len(all_dataframes)} data chunks...')
        df = pd.concat(all_dataframes, ignore_index=True)
        
        print(f'   📊 Total combined data: {len(df)} observations')
        
        unique_pixels = df[['longitude', 'latitude']].drop_duplicates()
        print(f'   📍 Unique spatial pixels: {len(unique_pixels)}')
        print(f'   🌡️ Temperature range: {df["temperature"].min():.1f}°C to {df["temperature"].max():.1f}°C')
        
        # Report the scale of the last extracted chunk (individual chunks may fall back to 2 km)
        if 'scale' in locals():
            print(f'   📐 Resolution of last chunk: {scale}m')
        else:
            print(f'   📐 Resolution: Mixed (adaptive scaling)')
        
        # Show sample of the data
        print('\n📋 Sample of extracted data:')
        print(df[['time', 'latitude', 'longitude', 'temperature']].head())
        
        # Convert to xarray
        try:
            temperature_data = df.set_index(['time', 'latitude', 'longitude']).to_xarray()
            
            print(f'\n✅ Xarray dataset created successfully!')
            print(f'   📅 Time range: {temperature_data.time.min().values} to {temperature_data.time.max().values}')
            print(f'   🌍 Spatial dimensions: {temperature_data.sizes["latitude"]} × {temperature_data.sizes["longitude"]} pixels')
            print(f'   📊 Total observations: {temperature_data.temperature.count().values}')
            print(f'   🎯 Dataset: GSHTD Daily Air Temperature for Seasonal Analysis')
            print(f'   🔧 Extraction method: {"Monthly chunking" if use_monthly_chunking else "Yearly chunking"}')
            
            return True
            
        except Exception as e:
            print(f'❌ Error converting to xarray: {e}')
            print('   Raw DataFrame saved as backup for debugging')
            globals()['debug_df'] = df
            return False
        
    except Exception as e:
        print(f'❌ Error extracting data: {e}')
        import traceback
        print(f'   Details: {traceback.format_exc()}')
        return False

extract_button = widgets.Button(description='🔄 Extract Data', button_style='primary')
extract_button.on_click(lambda b: extract_temperature_data())

display(extract_button)
print('🔄 Ready to extract temperature data for seasonal analysis')
print('🎯 Will extract extended time period for robust seasonal percentiles')
print('⚡ NEW: Adaptive extraction with monthly chunking for large datasets')
print('🔧 Automatically handles Earth Engine 1M value limit at 1000m resolution')

Button(button_style='primary', description='🔄 Extract Data', style=ButtonStyle())

🔄 Ready to extract temperature data for seasonal analysis
🎯 Will extract extended time period for robust seasonal percentiles
⚡ NEW: Adaptive extraction with monthly chunking for large datasets
🔧 Automatically handles Earth Engine 1M value limit at 1000m resolution
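
The chunking logic can be sketched without Earth Engine. The example below builds the monthly date ranges with an exclusive end date (Earth Engine's `filterDate` includes the start and excludes the end) and reproduces the size test that switches between yearly and monthly extraction. The helper names are hypothetical; the thresholds of 2,000,000 estimated values and 300 pixels per image are the ones used in the extraction cell.

```python
def month_ranges(year: int) -> list[tuple[str, str]]:
    """Return (start, end) date strings for each month, with the end date exclusive."""
    ranges = []
    for month in range(1, 13):
        start = f"{year}-{month:02d}-01"
        # December rolls over to January 1 of the following year
        end = f"{year + 1}-01-01" if month == 12 else f"{year}-{month + 1:02d}-01"
        ranges.append((start, end))
    return ranges


def needs_monthly_chunking(pixels_per_image: int, n_years: int,
                           max_total: int = 2_000_000, max_pixels: int = 300) -> bool:
    """Apply the notebook's size test: too many total values or too many pixels per image."""
    total_estimated = pixels_per_image * 365 * n_years
    return total_estimated > max_total or pixels_per_image > max_pixels


print(month_ranges(2019)[0])   # ('2019-01-01', '2019-02-01')
print(month_ranges(2019)[11])  # ('2019-12-01', '2020-01-01')
print(needs_monthly_chunking(pixels_per_image=100, n_years=18))  # False
print(needs_monthly_chunking(pixels_per_image=400, n_years=1))   # True
```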


## 🔬 Step 4: Seasonal Heat Days Calculation - NEW METHODOLOGY
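
For each day of year, this step pools every reference-period value whose day of year falls inside the ± window and takes a percentile of that pool. The sketch below shows the two building blocks for a single pixel in plain Python: a circular day-of-year window over 1..366, and a linear-interpolation percentile (the default method of `numpy.quantile`, which is assumed here to match what the xarray call uses). The helper names and the sample values are illustrative assumptions.

```python
import math


def window_doys(doy: int, window_days: int, year_length: int = 366) -> list[int]:
    """Day-of-year values in doy ± window_days, wrapping circularly within 1..year_length."""
    return [((doy - 1 + offset) % year_length) + 1
            for offset in range(-window_days, window_days + 1)]


def percentile(values: list[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


print(window_doys(1, 5))    # [362, 363, 364, 365, 366, 1, 2, 3, 4, 5, 6]
print(window_doys(366, 5))  # [361, 362, 363, 364, 365, 366, 1, 2, 3, 4, 5]

# Made-up pooled temperatures for one pixel and one window.
pooled = [29.5, 30.1, 30.4, 31.0, 31.2, 31.8, 32.5, 33.0, 33.4, 34.1]
print(percentile(pooled, 90.0))
```

In non-leap reference years day 366 does not exist, so windows that touch the year boundary pool slightly fewer values in those years.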

In [5]:
def calculate_seasonal_heat_days():
    '''Calculate heat days using seasonal percentile methodology'''
    global temperature_data, analysis_results
    
    if temperature_data is None:
        print('❌ Please extract temperature data first!')
        return None
    
    try:
        print('🔬 Calculating seasonal extreme heat days...')
        
        year = analysis_year.value
        ref_start = reference_start.value
        ref_end = reference_end.value
        abs_threshold = absolute_threshold.value
        pct_threshold = percentile_threshold.value
        window_days = day_window.value
        
        results = {}
        
        # Filter data
        analysis_data = temperature_data.sel(time=str(year))
        reference_data = temperature_data.sel(time=slice(f'{ref_start}-01-01', f'{ref_end}-12-31'))
        
        print(f'   📅 Analysis year: {len(analysis_data.time)} days')
        print(f'   📅 Reference period: {len(reference_data.time)} days')
        print(f'   🪟 Day window: ±{window_days} days')
        
        # Add day of year to both datasets
        analysis_data = analysis_data.assign_coords(dayofyear=analysis_data.time.dt.dayofyear)
        reference_data = reference_data.assign_coords(dayofyear=reference_data.time.dt.dayofyear)
        
        print('\n🔬 Calculating seasonal percentiles for each day of year...')
        
        # Create seasonal percentile array
        seasonal_percentiles = xr.DataArray(
            np.full((366, len(analysis_data.latitude), len(analysis_data.longitude)), np.nan),
            dims=['dayofyear', 'latitude', 'longitude'],
            coords={
                'dayofyear': np.arange(1, 367),
                'latitude': analysis_data.latitude,
                'longitude': analysis_data.longitude
            }
        )
        
        # Calculate percentile for each day of year
        for doy in range(1, 367):  # 1-366 (including leap day)
            if doy % 50 == 0:  # Progress indicator
                print(f'   Processing day {doy}/366...')
            
            # Create window around day of year (circular for year boundaries)
            window_start = doy - window_days
            window_end = doy + window_days
            
            # Handle year boundaries (e.g., Jan 1 ± 5 includes Dec 27-31)
            if window_start <= 0:
                # Wrap around to end of year
                window_doys = list(range(366 + window_start, 367)) + list(range(1, window_end + 1))
            elif window_end > 366:
                # Wrap around to beginning of year
                window_doys = list(range(window_start, 367)) + list(range(1, window_end - 366 + 1))
            else:
                # Normal case
                window_doys = list(range(window_start, window_end + 1))
            
            # Select data for this window from all reference years
            window_data = reference_data.where(
                reference_data.dayofyear.isin(window_doys), drop=True
            )
            
            if len(window_data.time) > 0:
                # Calculate percentile for this day across all reference years
                day_percentile = window_data.temperature.quantile(
                    pct_threshold/100, dim='time', skipna=True
                )
                seasonal_percentiles[doy-1, :, :] = day_percentile
        
        print(f'\n✅ Seasonal percentiles calculated for all 366 days')
        
        # Apply seasonal thresholds to analysis year
        print('\n🔥 Calculating seasonal heat days...')
        
        # Create threshold array for each day in analysis year
        daily_thresholds = xr.full_like(analysis_data.temperature, np.nan)
        
        for i, day_time in enumerate(analysis_data.time):
            doy = int(day_time.dt.dayofyear.values)
            day_seasonal_pct = seasonal_percentiles[doy-1, :, :]
            
            # Apply formula: max(absolute_threshold, seasonal_percentile)
            day_threshold = xr.where(
                day_seasonal_pct > abs_threshold,
                day_seasonal_pct,
                abs_threshold
            )
            
            daily_thresholds[i, :, :] = day_threshold
        
        # Calculate heat days: LST_daily_max > threshold for that day
        heat_days_boolean = analysis_data.temperature > daily_thresholds
        seasonal_heat_days = heat_days_boolean.sum(dim='time')
        
        # Store results
        results['seasonal_heat_days'] = seasonal_heat_days
        results['seasonal_percentiles_mean'] = seasonal_percentiles.mean(dim='dayofyear')
        results['daily_thresholds_mean'] = daily_thresholds.mean(dim='time')
        
        # Also calculate traditional metrics for comparison
        print('\n📊 Calculating comparison metrics...')
        
        # Traditional annual percentile method
        annual_percentile = reference_data.temperature.quantile(pct_threshold/100, dim='time')
        annual_threshold = xr.where(
            annual_percentile > abs_threshold,
            annual_percentile,
            abs_threshold
        )
        annual_heat_days = (analysis_data.temperature > annual_threshold).sum(dim='time')
        
        results['annual_heat_days'] = annual_heat_days
        results['annual_percentile'] = annual_percentile
        
        # Annual extremes
        results['annual_max'] = analysis_data.temperature.max(dim='time')
        results['annual_min'] = analysis_data.temperature.min(dim='time')
        results['annual_range'] = results['annual_max'] - results['annual_min']
        
        print('\n✅ Seasonal heat days calculation complete!')
        
        # Print comparison summary
        seasonal_mean = seasonal_heat_days.mean().values
        annual_mean = annual_heat_days.mean().values
        seasonal_max = seasonal_heat_days.max().values
        annual_max = annual_heat_days.max().values
        seasonal_pixels = (seasonal_heat_days > 0).sum().values
        annual_pixels = (annual_heat_days > 0).sum().values
        
        print(f'\n📊 METHODOLOGY COMPARISON:')
        print(f'   🔬 Seasonal Method:')
        print(f'      Mean heat days: {seasonal_mean:.1f}')
        print(f'      Max heat days: {seasonal_max:.0f}')
        print(f'      Pixels with >0 heat days: {seasonal_pixels}')
        print(f'   📅 Annual Method:')
        print(f'      Mean heat days: {annual_mean:.1f}')
        print(f'      Max heat days: {annual_max:.0f}')
        print(f'      Pixels with >0 heat days: {annual_pixels}')
        print(f'   📈 Difference:')
        print(f'      Mean: {seasonal_mean - annual_mean:+.1f} days')
        print(f'      Pixels: {seasonal_pixels - annual_pixels:+.0f}')
        
        return results
        
    except Exception as e:
        print(f'❌ Error calculating seasonal heat days: {e}')
        import traceback
        print(f'   Details: {traceback.format_exc()}')
        return None

# Analysis button
analyze_button = widgets.Button(description='🔬 Calculate Seasonal Heat Days', button_style='success')
analysis_results = None

def run_seasonal_analysis(button):
    global analysis_results
    print('🔄 Starting seasonal analysis...')
    analysis_results = calculate_seasonal_heat_days()
    if analysis_results is not None:
        print('✅ Seasonal analysis complete!')
        print('📁 Ready for visualization and export')
    else:
        print('❌ Seasonal analysis failed')

analyze_button.on_click(run_seasonal_analysis)

display(analyze_button)
print('🔬 Ready to calculate seasonal extreme heat days')
print('📐 Formula: Heat_Day = LST_max > max(absolute_threshold, seasonal_percentile)')
print('🪟 Each day compared to historical ±window days for climatological context')

Button(button_style='success', description='🔬 Calculate Seasonal Heat Days', style=ButtonStyle())

🔬 Ready to calculate seasonal extreme heat days
📐 Formula: Heat_Day = LST_max > max(absolute_threshold, seasonal_percentile)
🪟 Each day compared to historical ±window days for climatological context
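
The ±window selection and the `max(absolute, seasonal)` rule used in the analysis cell can be illustrated on a single pixel. The following is a minimal NumPy-only sketch, not the notebook's implementation: `circular_doy_window` and `seasonal_threshold` are hypothetical helper names, the 1-D `temps`/`doys` arrays stand in for one pixel's reference series, and falling back to the absolute threshold for an empty window is an assumption.

```python
import numpy as np

def circular_doy_window(doy, window_days, n_days=366):
    """Return the days of year within ±window_days of doy, wrapping at year ends."""
    return [((doy - 1 + off) % n_days) + 1
            for off in range(-window_days, window_days + 1)]

def seasonal_threshold(temps, doys, doy, window_days, pct, abs_threshold):
    """Heat-day threshold for one pixel and one calendar day.

    temps, doys: 1-D arrays over the reference period (hypothetical inputs).
    Returns max(abs_threshold, pct-th percentile of temps inside the window).
    """
    in_window = np.isin(doys, circular_doy_window(doy, window_days))
    if not in_window.any():
        # Assumption: with no reference data, use the absolute threshold alone
        return float(abs_threshold)
    rel = np.nanpercentile(temps[in_window], pct)
    return float(max(abs_threshold, rel))

print(circular_doy_window(1, 5))
# → [362, 363, 364, 365, 366, 1, 2, 3, 4, 5, 6]
```

Indexing days as `((d - 1) % 366) + 1` keeps the window contiguous across December/January without separate branches for each boundary.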


## 📊 Step 5: Visualization - Seasonal vs Annual Comparison

In [6]:
def create_seasonal_visualizations():
    '''Create comprehensive visualizations comparing seasonal vs annual methods'''
    global analysis_results, temperature_data
    
    if analysis_results is None:
        print('❌ Please run seasonal analysis first!')
        return
    
    try:
        print('📊 Creating seasonal analysis visualizations...')
        
        # Create comprehensive comparison plot with time series analysis
        fig, axes = plt.subplots(4, 3, figsize=(24, 24))
        
        # Seasonal Heat Days Map
        seasonal_heat_days = analysis_results['seasonal_heat_days']
        seasonal_heat_days.plot(ax=axes[0,0], cmap='Reds', add_colorbar=True, 
                                cbar_kwargs={'label': 'Heat Days'})
        axes[0,0].set_title('🔬 Seasonal Method Heat Days', fontweight='bold')
        
        # Annual Heat Days Map
        annual_heat_days = analysis_results['annual_heat_days']
        annual_heat_days.plot(ax=axes[0,1], cmap='Reds', add_colorbar=True,
                             cbar_kwargs={'label': 'Heat Days'})
        axes[0,1].set_title('📅 Annual Method Heat Days', fontweight='bold')
        
        # Difference Map
        difference = seasonal_heat_days - annual_heat_days
        difference.plot(ax=axes[0,2], cmap='RdBu_r', add_colorbar=True,
                       cbar_kwargs={'label': 'Difference (Seasonal - Annual)'})
        axes[0,2].set_title('📈 Method Difference', fontweight='bold')
        
        # Heat Days Distribution Comparison
        seasonal_flat = seasonal_heat_days.values.flatten()
        annual_flat = annual_heat_days.values.flatten()
        seasonal_clean = seasonal_flat[~np.isnan(seasonal_flat)]
        annual_clean = annual_flat[~np.isnan(annual_flat)]
        
        axes[1,0].hist(seasonal_clean, bins=20, alpha=0.7, color='red', 
                      edgecolor='black', label='Seasonal Method')
        axes[1,0].hist(annual_clean, bins=20, alpha=0.5, color='blue',
                      edgecolor='black', label='Annual Method')
        axes[1,0].set_title('Heat Days Distribution Comparison', fontweight='bold')
        axes[1,0].set_xlabel('Heat Days per Pixel')
        axes[1,0].set_ylabel('Frequency')
        axes[1,0].legend()
        axes[1,0].grid(True, alpha=0.3)
        
        # Seasonal Thresholds Map
        seasonal_thresholds = analysis_results['seasonal_percentiles_mean']
        seasonal_thresholds.plot(ax=axes[1,1], cmap='viridis', add_colorbar=True,
                               cbar_kwargs={'label': 'Temperature (°C)'})
        axes[1,1].set_title('Mean Seasonal Thresholds', fontweight='bold')
        
        # Scatter plot comparison
        axes[1,2].scatter(annual_clean, seasonal_clean, alpha=0.6, s=20, color='purple')
        max_val = max(annual_clean.max(), seasonal_clean.max())
        axes[1,2].plot([0, max_val], [0, max_val], 'k--', alpha=0.5, label='1:1 line')
        axes[1,2].set_xlabel('Annual Method Heat Days')
        axes[1,2].set_ylabel('Seasonal Method Heat Days')
        axes[1,2].set_title('Method Correlation', fontweight='bold')
        axes[1,2].legend()
        axes[1,2].grid(True, alpha=0.3)
        
        # Monthly Threshold and Temperature Comparison (for analysis year)
        print('📊 Creating monthly threshold comparison...')
        
        try:
            # Get analysis year data
            analysis_year_val = analysis_year.value
            analysis_data = temperature_data.sel(time=str(analysis_year_val))
            
            # Calculate monthly statistics
            monthly_temps = analysis_data.temperature.groupby('time.month')
            monthly_max_temps = monthly_temps.max().mean(dim=['latitude', 'longitude'])
            
            # Get reference data for percentile calculations
            ref_start = reference_start.value
            ref_end = reference_end.value
            reference_data = temperature_data.sel(time=slice(f'{ref_start}-01-01', f'{ref_end}-12-31'))
            
            # Calculate monthly seasonal percentiles (average across all pixels)
            seasonal_pct_monthly = []
            for month in range(1, 13):
                # Get all days in this month across all reference years
                month_data = reference_data.where(reference_data.time.dt.month == month, drop=True)
                if len(month_data.time) > 0:
                    month_pct = month_data.temperature.quantile(
                        percentile_threshold.value/100, dim='time', skipna=True
                    ).mean(dim=['latitude', 'longitude'])
                    seasonal_pct_monthly.append(float(month_pct.values))
                else:
                    seasonal_pct_monthly.append(np.nan)
            
            # Calculate annual percentile (constant across months)
            annual_percentile_val = float(analysis_results['annual_percentile'].mean().values)
            
            # User-defined threshold (constant)
            user_threshold = absolute_threshold.value
            
            # Create monthly comparison plot
            months = range(1, 13)
            month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
            
            axes[2,0].plot(months, seasonal_pct_monthly, 'o-', linewidth=2, markersize=6, 
                          color='red', label=f'Seasonal {percentile_threshold.value:.0f}th Percentile')
            axes[2,0].axhline(annual_percentile_val, color='blue', linestyle='--', linewidth=2, 
                             label=f'Annual {percentile_threshold.value:.0f}th Percentile')
            axes[2,0].axhline(user_threshold, color='green', linestyle='-', linewidth=2, 
                             label=f'User Threshold ({user_threshold}°C)')
            axes[2,0].plot(months, monthly_max_temps, 's-', linewidth=2, markersize=6, 
                          color='orange', label=f'Avg Max Temp {analysis_year_val}')
            
            axes[2,0].set_xlabel('Month')
            axes[2,0].set_ylabel('Temperature (°C)')
            axes[2,0].set_title(f'Monthly Threshold Comparison ({analysis_year_val})', fontweight='bold')
            axes[2,0].set_xticks(months)
            axes[2,0].set_xticklabels(month_names)
            axes[2,0].legend()
            axes[2,0].grid(True, alpha=0.3)
            
            # Show which thresholds are used (max of absolute or percentile)
            effective_seasonal = [max(user_threshold, pct) for pct in seasonal_pct_monthly]
            effective_annual = max(user_threshold, annual_percentile_val)
            
            axes[2,1].plot(months, effective_seasonal, 'o-', linewidth=3, markersize=8, 
                          color='red', label='Seasonal Effective Threshold')
            axes[2,1].axhline(effective_annual, color='blue', linestyle='--', linewidth=3,
                             label='Annual Effective Threshold')
            axes[2,1].plot(months, monthly_max_temps, 's-', linewidth=2, markersize=6, 
                          color='orange', label=f'Avg Max Temp {analysis_year_val}')
            
            axes[2,1].set_xlabel('Month')
            axes[2,1].set_ylabel('Temperature (°C)')
            axes[2,1].set_title(f'Effective Thresholds ({analysis_year_val})', fontweight='bold')
            axes[2,1].set_xticks(months)
            axes[2,1].set_xticklabels(month_names)
            axes[2,1].legend()
            axes[2,1].grid(True, alpha=0.3)
            
            # Temperature exceedance analysis
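            # Work on plain floats so max() and plotting do not depend on 0-d DataArrays
            monthly_max_vals = monthly_max_temps.values
            seasonal_exceed = [max(0.0, temp - thresh) for temp, thresh in zip(monthly_max_vals, effective_seasonal)]
            annual_exceed = [max(0.0, temp - effective_annual) for temp in monthly_max_vals]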
            
            axes[2,2].plot(months, seasonal_exceed, 'o-', linewidth=2, markersize=6, 
                          color='red', label='Seasonal Exceedance')
            axes[2,2].plot(months, annual_exceed, 's-', linewidth=2, markersize=6, 
                          color='blue', label='Annual Exceedance')
            axes[2,2].axhline(0, color='black', linestyle='-', alpha=0.3)
            
            axes[2,2].set_xlabel('Month')
            axes[2,2].set_ylabel('Temperature Exceedance (°C)')
            axes[2,2].set_title(f'Monthly Temperature Exceedance ({analysis_year_val})', fontweight='bold')
            axes[2,2].set_xticks(months)
            axes[2,2].set_xticklabels(month_names)
            axes[2,2].legend()
            axes[2,2].grid(True, alpha=0.3)
            
        except Exception as monthly_error:
            print(f'⚠️ Error creating monthly comparison: {monthly_error}')
            # Fill with placeholder text
            for ax in [axes[2,0], axes[2,1], axes[2,2]]:
                ax.text(0.5, 0.5, 'Monthly comparison\nunavailable', 
                       ha='center', va='center', transform=ax.transAxes, fontsize=12)
                ax.set_title('Monthly Analysis - Error', fontweight='bold')
        
        # Full time series analysis (reference start through analysis year)
        print(f'📊 Creating full time series analysis ({reference_start.value}-{analysis_year.value})...')
        
        try:
            # Get full time series data
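            ref_start = reference_start.value
            ref_end = reference_end.value
            analysis_year_val = analysis_year.value
            # Defined here as well so this block does not depend on the monthly block succeeding
            user_threshold = absolute_threshold.value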
            
            # Get all data from reference start to analysis year
            full_data = temperature_data.sel(time=slice(f'{ref_start}-01-01', f'{analysis_year_val}-12-31'))
            
            # Calculate annual statistics for each year
            years = range(ref_start, analysis_year_val + 1)
            annual_max_temps_ts = []
            annual_mean_temps_ts = []
            annual_percentile_ts = []
            
            # Get reference data for consistent percentile calculation
            reference_data = temperature_data.sel(time=slice(f'{ref_start}-01-01', f'{ref_end}-12-31'))
            ref_percentile = float(reference_data.temperature.quantile(
                percentile_threshold.value/100, dim='time', skipna=True
            ).mean(dim=['latitude', 'longitude']).values)
            
            for year in years:
                year_data = full_data.sel(time=str(year))
                if len(year_data.time) > 0:
                    # Annual maximum temperature (spatial average)
                    year_max = float(year_data.temperature.max(dim='time').mean(dim=['latitude', 'longitude']).values)
                    annual_max_temps_ts.append(year_max)
                    
                    # Annual mean temperature (spatial average)
                    year_mean = float(year_data.temperature.mean(dim=['time', 'latitude', 'longitude']).values)
                    annual_mean_temps_ts.append(year_mean)
                    
                    # Annual percentile for this specific year
                    year_pct = float(year_data.temperature.quantile(
                        percentile_threshold.value/100, dim='time', skipna=True
                    ).mean(dim=['latitude', 'longitude']).values)
                    annual_percentile_ts.append(year_pct)
                else:
                    annual_max_temps_ts.append(np.nan)
                    annual_mean_temps_ts.append(np.nan)
                    annual_percentile_ts.append(np.nan)
            
            # Time series plot 1: Annual temperatures and thresholds
            axes[3,0].plot(years, annual_max_temps_ts, 'o-', linewidth=2, markersize=6, 
                          color='red', label='Annual Max Temperature')
            axes[3,0].plot(years, annual_mean_temps_ts, 's-', linewidth=2, markersize=4, 
                          color='blue', label='Annual Mean Temperature')
            axes[3,0].axhline(ref_percentile, color='green', linestyle='--', linewidth=2, 
                             label=f'Reference {percentile_threshold.value:.0f}th Percentile')
            axes[3,0].axhline(user_threshold, color='orange', linestyle='-', linewidth=2, 
                             label=f'User Threshold ({user_threshold}°C)')
            
            axes[3,0].set_xlabel('Year')
            axes[3,0].set_ylabel('Temperature (°C)')
            axes[3,0].set_title(f'Temperature Time Series ({ref_start}-{analysis_year_val})', fontweight='bold')
            axes[3,0].legend()
            axes[3,0].grid(True, alpha=0.3)
            axes[3,0].set_xlim(ref_start-0.5, analysis_year_val+0.5)
            
            # Time series plot 2: Annual percentiles vs reference
            axes[3,1].plot(years, annual_percentile_ts, 'o-', linewidth=2, markersize=6, 
                          color='purple', label=f'Annual {percentile_threshold.value:.0f}th Percentile')
            axes[3,1].axhline(ref_percentile, color='green', linestyle='--', linewidth=2, 
                             label=f'Reference {percentile_threshold.value:.0f}th Percentile')
            axes[3,1].axhline(user_threshold, color='orange', linestyle='-', linewidth=2, 
                             label=f'User Threshold ({user_threshold}°C)')
            
            axes[3,1].set_xlabel('Year')
            axes[3,1].set_ylabel('Temperature (°C)')
            axes[3,1].set_title('Annual Percentiles vs Reference', fontweight='bold')
            axes[3,1].legend()
            axes[3,1].grid(True, alpha=0.3)
            axes[3,1].set_xlim(ref_start-0.5, analysis_year_val+0.5)
            
            # Time series plot 3: Temperature anomalies
            # Baselines exclude the analysis year; nanmean skips years without data
            mean_temp_baseline = np.nanmean(annual_mean_temps_ts[:-1])
            mean_anomalies = [temp - mean_temp_baseline for temp in annual_mean_temps_ts]
            max_temp_baseline = np.nanmean(annual_max_temps_ts[:-1])
            max_anomalies = [temp - max_temp_baseline for temp in annual_max_temps_ts]
            
            axes[3,2].plot(years, mean_anomalies, 'o-', linewidth=2, markersize=6, 
                          color='blue', label='Mean Temp Anomaly')
            axes[3,2].plot(years, max_anomalies, 's-', linewidth=2, markersize=6, 
                          color='red', label='Max Temp Anomaly')
            axes[3,2].axhline(0, color='black', linestyle='-', alpha=0.5)
            axes[3,2].axvline(analysis_year_val, color='gray', linestyle=':', alpha=0.7, 
                             label=f'Analysis Year ({analysis_year_val})')
            
            axes[3,2].set_xlabel('Year')
            axes[3,2].set_ylabel('Temperature Anomaly (°C)')
            axes[3,2].set_title(f'Temperature Anomalies (relative to {ref_start}-{analysis_year_val - 1})', fontweight='bold')
            axes[3,2].legend()
            axes[3,2].grid(True, alpha=0.3)
            axes[3,2].set_xlim(ref_start-0.5, analysis_year_val+0.5)
            
        except Exception as timeseries_error:
            print(f'⚠️ Error creating time series analysis: {timeseries_error}')
            import traceback
            print(f'   Details: {traceback.format_exc()}')
            # Fill with placeholder text
            for ax in [axes[3,0], axes[3,1], axes[3,2]]:
                ax.text(0.5, 0.5, 'Time series analysis\nunavailable', 
                       ha='center', va='center', transform=ax.transAxes, fontsize=12)
                ax.set_title('Time Series Analysis - Error', fontweight='bold')
        
        plt.tight_layout()
        plt.show()
        
        # Print detailed statistics
        print('\n📊 DETAILED COMPARISON STATISTICS:')
        print(f'🔬 Seasonal Method:')
        print(f'   Range: {seasonal_clean.min():.0f} to {seasonal_clean.max():.0f} heat days')
        print(f'   Mean: {seasonal_clean.mean():.1f} ± {seasonal_clean.std():.1f}')
        print(f'   Median: {np.median(seasonal_clean):.1f}')
        print(f'   Pixels with >0: {(seasonal_clean > 0).sum()} ({(seasonal_clean > 0).mean()*100:.1f}%)')
        
        print(f'\n📅 Annual Method:')
        print(f'   Range: {annual_clean.min():.0f} to {annual_clean.max():.0f} heat days')
        print(f'   Mean: {annual_clean.mean():.1f} ± {annual_clean.std():.1f}')
        print(f'   Median: {np.median(annual_clean):.1f}')
        print(f'   Pixels with >0: {(annual_clean > 0).sum()} ({(annual_clean > 0).mean()*100:.1f}%)')
        
        # Correlation analysis
        correlation = np.corrcoef(annual_clean, seasonal_clean)[0,1]
        print(f'\n📈 Correlation between methods: {correlation:.3f}')
        
        if correlation > 0.8:
            print('   ✅ Strong positive correlation - methods generally agree')
        elif correlation > 0.5:
            print('   ⚠️ Moderate correlation - some differences in spatial patterns')
        else:
            print('   🔍 Low correlation - significant methodological differences')
        
        # Monthly threshold insights
        if 'seasonal_pct_monthly' in locals():
            print(f'\n📅 MONTHLY THRESHOLD INSIGHTS:')
            print(f'   Seasonal threshold range: {np.nanmin(seasonal_pct_monthly):.1f} to {np.nanmax(seasonal_pct_monthly):.1f}°C')
            print(f'   Annual threshold (constant): {annual_percentile_val:.1f}°C')
            print(f'   User threshold: {user_threshold}°C')
            
            # NaN-aware variance of the 12 monthly percentiles
            seasonal_var = np.nanvar(seasonal_pct_monthly)
            print(f'   Seasonal variation: {seasonal_var:.2f}°C² (variance across months; higher = more seasonal contrast)')
        
        # Time series insights
        if 'annual_max_temps_ts' in locals():
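            print(f'\n📈 TIME SERIES INSIGHTS ({ref_start}-{analysis_year_val}):')
            print(f'   Average annual max: {np.nanmean(annual_max_temps_ts):.1f}°C')
            print(f'   Average annual mean: {np.nanmean(annual_mean_temps_ts):.1f}°C')
            # Fit the trend only on years with data; np.polyfit cannot handle NaN
            years_arr = np.array(years, dtype=float)
            max_arr = np.array(annual_max_temps_ts, dtype=float)
            valid = np.isfinite(max_arr)
            if valid.sum() >= 2:
                print(f'   Max temp trend: {np.polyfit(years_arr[valid], max_arr[valid], 1)[0]:.3f}°C/year')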
            print(f'   {analysis_year_val} max temp anomaly: {max_anomalies[-1]:+.1f}°C')
            print(f'   {analysis_year_val} mean temp anomaly: {mean_anomalies[-1]:+.1f}°C')
            
        print('\n✅ Visualization complete!')
        
    except Exception as e:
        print(f'❌ Error creating visualizations: {e}')
        import traceback
        print(f'   Details: {traceback.format_exc()}')

viz_button = widgets.Button(description='📊 Create Visualizations', button_style='info')
viz_button.on_click(lambda b: create_seasonal_visualizations())

display(viz_button)
print('📊 Ready to create seasonal analysis visualizations')
print('🔬 Will compare seasonal vs annual methodologies')
print('📈 Includes spatial maps, distributions, correlation analysis, monthly and time series comparisons')
print('📅 NEW: Monthly threshold comparison + Full time series analysis (2003-2020)')
print('⏳ Enhanced visualization shows temperature trends, anomalies, and temporal context')

Button(button_style='info', description='📊 Create Visualizations', style=ButtonStyle())

📊 Ready to create seasonal analysis visualizations
🔬 Will compare seasonal vs annual methodologies
📈 Includes spatial maps, distributions, correlation analysis, monthly and time series comparisons
📅 NEW: Monthly threshold comparison + Full time series analysis (2003-2020)
⏳ Enhanced visualization shows temperature trends, anomalies, and temporal context
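
The anomaly and trend panels in the bottom row reduce to a few array operations. The following is a minimal sketch, assuming one spatially averaged value per year with `np.nan` marking years without data; `anomalies_and_trend` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def anomalies_and_trend(years, values):
    """Anomalies relative to all years but the last, plus a linear trend.

    years, values: equal-length sequences; values may contain NaN.
    Returns (anomalies, slope_per_year); slope is NaN with fewer than 2 valid points.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    baseline = np.nanmean(values[:-1])   # exclude the final (analysis) year
    anomalies = values - baseline
    valid = np.isfinite(values)          # np.polyfit cannot handle NaN
    if valid.sum() < 2:
        return anomalies, float('nan')
    slope = np.polyfit(years[valid], values[valid], 1)[0]
    return anomalies, float(slope)
```

Masking before the fit means a single missing year leaves a gap in the plotted series without turning the trend estimate into NaN.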


## 📁 Step 6: Export Results
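
The pixel-level CSV built in the export cell visits every pixel with a nested loop and label lookups. On larger grids, flattening the 2-D arrays is usually faster. The sketch below assumes each field is a plain `(n_lat, n_lon)` NumPy array (for example `DataArray.values`); `flatten_pixels` is a hypothetical helper, and it only reproduces the `pixel_id`, coordinate, and value columns.

```python
import numpy as np

def flatten_pixels(lats, lons, fields):
    """Flatten 2-D (lat, lon) grids into per-pixel columns.

    lats, lons: 1-D coordinate arrays.
    fields: dict mapping column name -> 2-D array of shape (len(lats), len(lons)).
    Returns a dict of equal-length columns in row-major (lat outer, lon inner) order.
    """
    lat_grid, lon_grid = np.meshgrid(lats, lons, indexing='ij')
    i_idx, j_idx = np.indices(lat_grid.shape)
    columns = {
        'pixel_id': [f'{i}_{j}' for i, j in zip(i_idx.ravel(), j_idx.ravel())],
        'latitude': lat_grid.ravel().astype(float),
        'longitude': lon_grid.ravel().astype(float),
    }
    for name, grid in fields.items():
        grid = np.asarray(grid, dtype=float)
        if grid.shape != lat_grid.shape:
            raise ValueError(f'{name} has shape {grid.shape}, expected {lat_grid.shape}')
        columns[name] = grid.ravel()
    return columns
```

The returned dictionary can be passed to `pd.DataFrame(...)` and written with `to_csv`, giving the same row order as the nested loop.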

In [7]:
def export_seasonal_results():
    '''Export seasonal analysis results to ../outputs directory'''
    global analysis_results, temperature_data
    
    if analysis_results is None:
        print('❌ Please run seasonal analysis first!')
        return
    
    try:
        print('📁 Exporting seasonal analysis results...')
        year = analysis_year.value
        
        # Create outputs directory
        os.makedirs('../outputs', exist_ok=True)
        
        # 1. Summary Table with Both Methods
        print('   📄 Creating comparison summary...')
        
        seasonal_heat_days = analysis_results['seasonal_heat_days']
        annual_heat_days = analysis_results['annual_heat_days']
        
        summary_data = [
            {
                'method': 'seasonal',
                'mean_heat_days': float(seasonal_heat_days.mean().values),
                'max_heat_days': float(seasonal_heat_days.max().values),
                'min_heat_days': float(seasonal_heat_days.min().values),
                'std_heat_days': float(seasonal_heat_days.std().values),
                'pixels_with_heat_days': int((seasonal_heat_days > 0).sum().values),
                'total_pixels': int(seasonal_heat_days.count().values)
            },
            {
                'method': 'annual',
                'mean_heat_days': float(annual_heat_days.mean().values),
                'max_heat_days': float(annual_heat_days.max().values),
                'min_heat_days': float(annual_heat_days.min().values),
                'std_heat_days': float(annual_heat_days.std().values),
                'pixels_with_heat_days': int((annual_heat_days > 0).sum().values),
                'total_pixels': int(annual_heat_days.count().values)
            }
        ]
        
        summary_df = pd.DataFrame(summary_data)
        summary_filename = f'../outputs/seasonal_comparison_{year}.csv'
        summary_df.to_csv(summary_filename, index=False)
        print(f'   ✅ Comparison summary saved: {summary_filename}')
        
        # Display summary
        print('\n📊 METHODOLOGY COMPARISON:')
        display(summary_df)
        
        # 2. Detailed Pixel Export
        print('\n   📊 Creating detailed pixel export...')
        
        lats = seasonal_heat_days.latitude.values
        lons = seasonal_heat_days.longitude.values
        
        pixel_data = []
        for i, lat in enumerate(lats):
            for j, lon in enumerate(lons):
                row = {
                    'pixel_id': f'{i}_{j}',
                    'latitude': float(lat),
                    'longitude': float(lon),
                    'seasonal_heat_days': float(seasonal_heat_days.loc[lat, lon].values),
                    'annual_heat_days': float(annual_heat_days.loc[lat, lon].values),
                    'heat_days_difference': float((seasonal_heat_days - annual_heat_days).loc[lat, lon].values),
                    'seasonal_threshold_mean': float(analysis_results['seasonal_percentiles_mean'].loc[lat, lon].values),
                    'annual_percentile': float(analysis_results['annual_percentile'].loc[lat, lon].values),
                    'annual_max_temp': float(analysis_results['annual_max'].loc[lat, lon].values)
                }
                pixel_data.append(row)
        
        pixel_df = pd.DataFrame(pixel_data)
        pixel_filename = f'../outputs/seasonal_pixels_{year}.csv'
        pixel_df.to_csv(pixel_filename, index=False)
        
        print(f'   ✅ Detailed pixel data saved: {pixel_filename}')
        print(f'      Rows: {len(pixel_df):,}')
        
        # Show sample
        print('\n📊 SAMPLE PIXEL DATA:')
        display(pixel_df.head(10))
        
        # 3. NetCDF Export with PROPER CRS (Fixed from notebook 17)
        print('\n   📦 Creating NetCDF file with proper CRS...')
        
        # Prepare spatial results
        spatial_results = {
            'seasonal_heat_days': seasonal_heat_days,
            'annual_heat_days': annual_heat_days,
            'heat_days_difference': seasonal_heat_days - annual_heat_days,
            'seasonal_percentiles_mean': analysis_results['seasonal_percentiles_mean'],
            'annual_percentile': analysis_results['annual_percentile'],
            'annual_max': analysis_results['annual_max'],
            'annual_min': analysis_results['annual_min'],
            'annual_range': analysis_results['annual_range']
        }
        
        # Fill NaN with 0 for consistent export (note: missing temperature pixels also become 0°C)
        for k, v in spatial_results.items():
            spatial_results[k] = v.fillna(0)
        
        results_ds = xr.Dataset(spatial_results)
        
        # Add proper CRS information (WGS84) - FIXED VERSION FROM NOTEBOOK 17
        results_ds.latitude.attrs['standard_name'] = 'latitude'
        results_ds.latitude.attrs['long_name'] = 'latitude'
        results_ds.latitude.attrs['units'] = 'degrees_north'
        results_ds.latitude.attrs['axis'] = 'Y'
        
        results_ds.longitude.attrs['standard_name'] = 'longitude'
        results_ds.longitude.attrs['long_name'] = 'longitude'
        results_ds.longitude.attrs['units'] = 'degrees_east'
        results_ds.longitude.attrs['axis'] = 'X'
        
        # Add CRS variable following CF conventions
        crs = xr.DataArray(
            data=np.int32(1),
            attrs={
                'grid_mapping_name': 'latitude_longitude',
                'longitude_of_prime_meridian': 0.0,
                'semi_major_axis': 6378137.0,
                'inverse_flattening': 298.257223563,
                'spatial_ref': 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]',
                'crs_wkt': 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]'
            }
        )
        results_ds['crs'] = crs
        
        # Add grid_mapping attribute to all data variables
        for var_name in results_ds.data_vars:
            if var_name != 'crs':
                results_ds[var_name].attrs['grid_mapping'] = 'crs'
        
        # Add metadata
        results_ds.attrs.update({
            'analysis_year': year,
            'reference_period': f'{reference_start.value}-{reference_end.value}',
            'created': datetime.now().isoformat(),
            'absolute_threshold': absolute_threshold.value,
            'percentile_threshold': percentile_threshold.value,
            'day_window': day_window.value,
            'methodology': 'seasonal_percentile_±window_days',
            'formula': 'Heat_Day = LST_max > max(absolute_threshold, seasonal_percentile)',
            'crs': 'EPSG:4326'
        })
        
        netcdf_filename = f'../outputs/seasonal_analysis_{year}.nc'
        results_ds.to_netcdf(netcdf_filename)
        
        print(f'   ✅ NetCDF saved with proper CRS: {netcdf_filename}')
        print(f'      Variables: {list(results_ds.data_vars)}')
        print(f'      Dimensions: {dict(results_ds.sizes)}')
        print(f'      CRS: EPSG:4326 (WGS84)')
        print(f'      File size: {os.path.getsize(netcdf_filename) / 1024**2:.1f} MB')
        
        # 4. Create methodology documentation
        print('\n   📝 Creating methodology documentation...')
        
        doc_content = f"""SEASONAL EXTREME HEAT DAYS ANALYSIS - METHODOLOGY REPORT
===========================================================
Generated: {datetime.now().isoformat()}
Analysis Year: {year}
Reference Period: {reference_start.value}-{reference_end.value}

METHODOLOGY:
============
Formula: Surface_Heat_Day = 1 if LST_daily_max > max(LST_abs, LST_rel)

Where:
- LST_abs = {absolute_threshold.value}°C (absolute threshold)
- LST_rel = {percentile_threshold.value}th percentile of temperatures for calendar day ± {day_window.value} days
- Calendar day window: Each day compared to historical temperatures for same day ± {day_window.value} days
- Reference period: {reference_end.value - reference_start.value + 1} years of historical data

SCIENTIFIC RATIONALE:
====================
1. Seasonal Context: July heat vs January heat appropriately weighted
2. Climatological Accuracy: Follows meteorological best practices
3. Sensitive Detection: Can identify winter/spring heat anomalies
4. Day-specific Baselines: Each day compared to its historical context

RESULTS SUMMARY:
===============
Seasonal Method:
- Mean heat days: {seasonal_heat_days.mean().values:.1f}
- Max heat days: {seasonal_heat_days.max().values:.0f}
- Pixels with heat days: {(seasonal_heat_days > 0).sum().values} of {seasonal_heat_days.count().values}

Annual Method (comparison):
- Mean heat days: {annual_heat_days.mean().values:.1f}
- Max heat days: {annual_heat_days.max().values:.0f}
- Pixels with heat days: {(annual_heat_days > 0).sum().values} of {annual_heat_days.count().values}

DIFFERENCE (Seasonal - Annual):
- Mean difference: {(seasonal_heat_days - annual_heat_days).mean().values:.1f} days
- Correlation: {xr.corr(seasonal_heat_days, annual_heat_days).item():.3f}

COORDINATE REFERENCE SYSTEM:
===========================
- CRS: EPSG:4326 (WGS84 Geographic)
- Units: Decimal degrees
- Datum: World Geodetic System 1984
- NetCDF includes proper CF-compliant CRS metadata

FILES GENERATED:
===============
1. {summary_filename} - Method comparison summary
2. {pixel_filename} - Detailed pixel-level results
3. {netcdf_filename} - NetCDF spatial dataset WITH PROPER CRS
4. seasonal_methodology_{year}.txt - This documentation

The seasonal methodology provides more climatologically appropriate 
extreme heat detection by accounting for natural seasonal temperature cycles.
NetCDF file includes proper CRS metadata for GIS compatibility.
"""
        
        doc_filename = f'../outputs/seasonal_methodology_{year}.txt'
        with open(doc_filename, 'w', encoding='utf-8') as f:
            f.write(doc_content)
        
        print(f'   ✅ Methodology documentation saved: {doc_filename}')
        
        print('\n✅ Export complete with PROPER CRS!')
        print('\n📊 Files created:')
        print(f'   • {summary_filename} - Method comparison')
        print(f'   • {pixel_filename} - Detailed pixel results')
        print(f'   • {netcdf_filename} - NetCDF with WGS84 CRS metadata') 
        print(f'   • {doc_filename} - Methodology documentation')
        
    except Exception as e:
        print(f'❌ Error exporting: {e}')
        import traceback
        print(f'   Details: {traceback.format_exc()}')

export_button = widgets.Button(description='📁 Export Results', button_style='warning')
export_button.on_click(lambda b: export_seasonal_results())

display(export_button)
print('📁 Ready to export seasonal analysis results')
print('🔬 Will include both seasonal and annual method results for comparison')
print('🌍 FIXED: NetCDF export now includes proper WGS84 CRS metadata from notebook 17')



📁 Ready to export seasonal analysis results
🔬 Will include both seasonal and annual method results for comparison
🌍 FIXED: NetCDF export now includes proper WGS84 CRS metadata from notebook 17


## 🎯 Summary

This notebook implements the **climatologically appropriate** extreme heat day calculation using seasonal percentiles:

### 🔬 **Enhanced Methodology:**
- **Day-specific thresholds**: Each day compared to historical temperatures for same calendar day ± window
- **Seasonal context**: July heat vs January heat appropriately weighted  
- **Formula**: `Heat_Day = LST_max > max(absolute_threshold, seasonal_percentile)`
- **Robust baselines**: Uses 30+ years of reference data
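
As a rough illustration of the day-specific threshold described above, the following sketch computes the seasonal percentile for one pixel from a synthetic reference series and applies the heat-day formula. It is not the notebook's implementation: the helper names (`window_mask`, `seasonal_threshold`, `is_heat_day`), the 90th percentile, the 365-day year (leap days ignored), and the synthetic temperatures are all assumptions made for the example.

```python
import numpy as np

def window_mask(ref_doy, target_doy, window=5, year_length=365):
    """True where a reference day lies within +/- window days of target_doy."""
    diff = np.abs(np.asarray(ref_doy) - target_doy)
    # Circular distance, so late December and early January count as neighbors.
    circular = np.minimum(diff, year_length - diff)
    return circular <= window

def seasonal_threshold(ref_temps, ref_doy, target_doy, window=5, percentile=90.0):
    """Percentile of reference temperatures inside the calendar-day window (LST_rel)."""
    sample = np.asarray(ref_temps, dtype=float)[window_mask(ref_doy, target_doy, window)]
    return float(np.nanpercentile(sample, percentile))  # NaN gaps are ignored

def is_heat_day(lst_max, lst_abs, lst_rel):
    """Heat_Day = LST_max > max(absolute_threshold, seasonal_percentile)."""
    return bool(lst_max > max(lst_abs, lst_rel))

# Synthetic 30-year reference series for one pixel (assumed data, not real LST).
rng = np.random.default_rng(0)
n_years = 30
doy = np.tile(np.arange(1, 366), n_years)
temps = 25.0 + 12.0 * np.sin(2 * np.pi * (doy - 105) / 365) + rng.normal(0.0, 3.0, doy.size)

summer_thr = seasonal_threshold(temps, doy, target_doy=196)
winter_thr = seasonal_threshold(temps, doy, target_doy=15)
print(f"summer threshold: {summer_thr:.1f} degC, winter threshold: {winter_thr:.1f} degC")

# A 30 degC winter day stays below a 35 degC absolute threshold, so it is not
# flagged, however unusual it is for the season.
print(is_heat_day(30.0, 35.0, winter_thr))
```

Because the formula takes the maximum of the two thresholds, a day must exceed both. The absolute threshold therefore controls how far into the cooler seasons the method can reach.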

### 📊 **Complete Workflow:**
1. **ROI Selection** - Same interface as notebook 17
2. **Configuration** - Enhanced with seasonal parameters
3. **Data Extraction** - Extended time period for robust percentiles
4. **Seasonal Analysis** - New day-of-year percentile calculation
5. **Comparison Visualization** - Seasonal vs annual methods
6. **Export** - Comprehensive results with methodology documentation
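
For the export step, one widely used way to make a NetCDF file's CRS machine-readable is the CF `grid_mapping` convention: a scalar variable carries the CRS parameters, and each data variable points to it. The fix from notebook 17 is not shown here, so the following is only a sketch of that convention. The helper `cf_crs_attrs`, the coordinate names `lat`/`lon`, and the data variable names are assumptions for illustration.

```python
# CF-conventions attributes describing WGS84 geographic coordinates.
WGS84_GRID_MAPPING = {
    "grid_mapping_name": "latitude_longitude",
    "longitude_of_prime_meridian": 0.0,
    "semi_major_axis": 6378137.0,
    "inverse_flattening": 298.257223563,
}

def cf_crs_attrs(data_var_names, lat_name="lat", lon_name="lon", crs_var_name="crs"):
    """Return {variable_name: attributes} following the CF grid_mapping convention."""
    attrs = {
        # Scalar container variable that holds the CRS definition.
        crs_var_name: dict(WGS84_GRID_MAPPING),
        # Coordinate variables identified by standard name and units.
        lat_name: {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"},
        lon_name: {"standard_name": "longitude", "units": "degrees_east", "axis": "X"},
    }
    # Every data variable references the CRS variable by name.
    for name in data_var_names:
        attrs[name] = {"grid_mapping": crs_var_name}
    return attrs

# Hypothetical data variable names; substitute those in the exported dataset.
plan = cf_crs_attrs(["seasonal_heat_days", "annual_heat_days"])
for var_name, var_attrs in plan.items():
    print(var_name, var_attrs)
```

With xarray, these attributes could be applied before `to_netcdf` by creating the scalar variable, for example `ds["crs"] = xr.DataArray(0, attrs=plan["crs"])`, and then calling `ds[name].attrs.update(plan[name])` for each remaining entry.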

### 🎯 **Scientific Advantages:**
- More sensitive to seasonal heat anomalies
- Follows established meteorological practices
- Can detect winter/spring heat anomalies, provided they also exceed the absolute threshold (the formula requires both)
- Provides climatologically meaningful baselines

Compared with a single annual percentile, which is exceeded almost only in the warmest months, day-specific thresholds keep the relative criterion meaningful in every season.
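
The export report compares the two methods through a mean difference and a correlation across pixels. Heat-day maps often contain NaN pixels (masked or cloud-affected cells), and a plain `np.corrcoef` call returns NaN as soon as one is present. The sketch below shows one way to restrict the comparison to pixels valid in both maps. The function name `compare_heat_day_maps` and the small example arrays are hypothetical.

```python
import numpy as np

def compare_heat_day_maps(seasonal, annual):
    """Mean difference and Pearson correlation over pixels valid in both maps."""
    s = np.asarray(seasonal, dtype=float).ravel()
    a = np.asarray(annual, dtype=float).ravel()
    valid = np.isfinite(s) & np.isfinite(a)  # drop pixels missing in either map
    s, a = s[valid], a[valid]

    mean_diff = float(np.mean(s - a)) if s.size else float("nan")
    # Pearson r is undefined with fewer than two pixels or zero variance.
    if s.size < 2 or s.std() == 0 or a.std() == 0:
        corr = float("nan")
    else:
        corr = float(np.corrcoef(s, a)[0, 1])
    return {"n_valid": int(valid.sum()), "mean_difference": mean_diff, "correlation": corr}

# Tiny made-up heat-day counts per pixel, with one missing value in each map.
seasonal_map = np.array([[10.0, 12.0, np.nan], [8.0, 15.0, 20.0]])
annual_map = np.array([[9.0, 10.0, 5.0], [np.nan, 14.0, 18.0]])

stats = compare_heat_day_maps(seasonal_map, annual_map)
print(stats)
```

Only the four pixels present in both maps enter the statistics, so the result is not affected by where each method happens to have gaps.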

In [None]:
# TODO: the final exports are written with the wrong CRS; apply the CRS fix from notebook 17.
