# 🚀 Efficient GEE → xarray Climate Analysis

This notebook implements an array-based workflow for large-scale climate analysis:
1. **Minimal GEE preprocessing** - Only filtering and scaling run on the server
2. **Direct array extraction** - No file exports; pixel values are pulled with `getRegion()`
3. **Fast xarray processing** - Vectorized operations on the extracted arrays
4. **Preserve coordinates** - Keep latitude/longitude so results convert back to rasters

## Key Advantages:
- ⚡ **Fast** - No intermediate export files to write and download
- 🎯 **No export tasks** - Values go straight into memory; `getRegion()` request-size limits are handled by chunking
- 🚀 **Efficient processing** - Metrics are computed locally with xarray vectorization instead of GEE server ops
- 📍 **Coordinates preserved** - Easy conversion back to NetCDF/TIFF
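
The extraction step boils down to turning a `getRegion()`-style table (one header row, then one row per pixel per image) into a gridded xarray object. The following is a minimal sketch of that conversion; the sample rows and image ids are made up for illustration and do not come from the real collection.

```python
import pandas as pd

# Hypothetical getRegion()-style payload: header row + one row per pixel per image.
region_data = [
    ["id", "longitude", "latitude", "time", "temperature"],
    ["img_a", -38.50, -12.95, 1577836800000, 30.1],
    ["img_a", -38.49, -12.95, 1577836800000, 30.4],
    ["img_b", -38.50, -12.95, 1577923200000, 31.0],
    ["img_b", -38.49, -12.95, 1577923200000, None],  # masked pixel
]

header, rows = region_data[0], region_data[1:]
df = pd.DataFrame(rows, columns=header)
df["time"] = pd.to_datetime(df["time"], unit="ms")  # epoch milliseconds -> timestamps
df = df.dropna(subset=["temperature"])

# Indexing by (time, latitude, longitude) produces a 3-D grid; cells with
# no observation are filled with NaN.
ds = df.set_index(["time", "latitude", "longitude"])[["temperature"]].to_xarray()
print(ds.temperature.shape)
```

Because latitude and longitude become coordinates of the dataset, every metric derived from it keeps its spatial reference.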

In [1]:
# Import required libraries
import ee
import geemap
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import ipywidgets as widgets
from IPython.display import display, clear_output
import os
import rasterio
from tkinter import filedialog
import tkinter as tk

# Set matplotlib to display inline
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)

# Initialize Earth Engine with your project
try:
    ee.Initialize(project='tl-cities')
    print('✅ Earth Engine initialized successfully')
except Exception as e:
    print(f'❌ Earth Engine initialization failed: {e}')

print('📦 Available packages:')
print(f'   - xarray: {xr.__version__}')
print(f'   - pandas: {pd.__version__}')
print(f'   - numpy: {np.__version__}')

# Create outputs directory
os.makedirs('../outputs', exist_ok=True)
print('📁 Created outputs directory: ../outputs')

✅ Earth Engine initialized successfully
📦 Available packages:
   - xarray: 2024.7.0
   - pandas: 2.3.1
   - numpy: 1.26.4
📁 Created outputs directory: ../outputs


## 🎯 Step 1: ROI Selection (Same as Notebook 17)

In [2]:
# Global variables
analysis_geom = None
temperature_data = None
climate_results = {}

# Create map for ROI selection (same as notebooks 16/17)
m = geemap.Map(center=[-12.9714, -38.5014], zoom=10)  # Salvador, Brazil
m.add_basemap('SATELLITE')
m.add('draw_control')

def set_roi_from_drawing():
    '''Extract ROI from map drawing'''
    global analysis_geom
    
    try:
        if hasattr(m, 'draw_control') and len(m.draw_control.data) > 0:
            feature = m.draw_control.data[-1]
            coords = feature['geometry']['coordinates']
            
            if feature['geometry']['type'] == 'Polygon':
                # GeoJSON has no 'Rectangle' type; drawn rectangles arrive as Polygons
                analysis_geom = ee.Geometry.Polygon(coords)
            else:
                print(f"❌ Unsupported geometry type: {feature['geometry']['type']}")
                return False
            
            area_km2 = analysis_geom.area().divide(1000000).getInfo()
            bounds_info = analysis_geom.bounds().getInfo()['coordinates'][0]
            west, south = bounds_info[0]
            east, north = bounds_info[2]
            
            print(f'✅ ROI set from drawing: {area_km2:.1f} km²')
            print(f'   Bounds: W={west:.3f}, E={east:.3f}, S={south:.3f}, N={north:.3f}')
            return True
        else:
            print('❌ No drawing found. Please draw a polygon or rectangle on the map.')
            return False
    except Exception as e:
        print(f'❌ Error setting ROI from drawing: {e}')
        return False

def set_roi_from_coordinates():
    '''Set ROI from coordinate inputs'''
    global analysis_geom
    
    try:
        # FloatText always holds a float; read it directly so that a valid
        # value of 0.0 is not mistaken for "empty" and replaced by a default
        west = float(west_input.value)
        east = float(east_input.value)
        south = float(south_input.value)
        north = float(north_input.value)
        
        analysis_geom = ee.Geometry.Rectangle([west, south, east, north])
        area_km2 = analysis_geom.area().divide(1000000).getInfo()
        
        roi_image = ee.Image().paint(analysis_geom, 1, 2)
        m.addLayer(roi_image, {'palette': ['red'], 'max': 1}, 'ROI')
        m.centerObject(analysis_geom, 11)
        
        print(f'✅ ROI set from coordinates: {area_km2:.1f} km²')
        print(f'   Bounds: W={west:.3f}, E={east:.3f}, S={south:.3f}, N={north:.3f}')
        return True
    except Exception as e:
        print(f'❌ Error setting ROI from coordinates: {e}')
        return False

def browse_raster_file():
    '''Open file browser to select raster file'''
    try:
        root = tk.Tk()
        root.withdraw()
        
        file_path = filedialog.askopenfilename(
            title='Select Reference Raster File',
            filetypes=[
                ('Raster files', '*.tif *.tiff *.img *.nc *.hdf *.jp2'),
                ('GeoTIFF', '*.tif *.tiff'),
                ('NetCDF', '*.nc'),
                ('All files', '*.*')
            ]
        )
        
        root.destroy()
        
        if file_path:
            raster_path_display.value = file_path
            print(f'📁 Selected file: {os.path.basename(file_path)}')
            return file_path
        else:
            print('❌ No file selected')
            return None
            
    except Exception as e:
        print(f'❌ Error opening file browser: {e}')
        return None

def set_roi_from_raster():
    '''Set ROI from selected raster extent with proper CRS handling'''
    global analysis_geom
    
    try:
        raster_path = raster_path_display.value.strip()
        
        if not raster_path or not os.path.exists(raster_path):
            print('❌ Please select a valid raster file first')
            return False
        
        print(f'📖 Reading raster: {os.path.basename(raster_path)}')
        
        with rasterio.open(raster_path) as src:
            bounds = src.bounds
            crs = src.crs
            
            west, south, east, north = bounds.left, bounds.bottom, bounds.right, bounds.top
            
            print(f'   📊 Original CRS: {crs}')
            print(f'   📊 Original bounds: W={west:.3f}, E={east:.3f}, S={south:.3f}, N={north:.3f}')
            
            if crs is None:
                print('❌ Raster has no CRS; cannot derive a geographic extent')
                return False
            
            # Transform to WGS84 if needed
            if crs.to_epsg() != 4326:
                from rasterio.warp import transform_bounds
                west, south, east, north = transform_bounds(
                    crs, 'EPSG:4326', west, south, east, north
                )
                print(f'   🔄 Transformed to WGS84: W={west:.6f}, E={east:.6f}, S={south:.6f}, N={north:.6f}')
        
        # Create geometry
        analysis_geom = ee.Geometry.Rectangle([west, south, east, north], 'EPSG:4326')
        area_km2 = analysis_geom.area().divide(1000000).getInfo()
        
        roi_image = ee.Image().paint(analysis_geom, 1, 2)
        m.addLayer(roi_image, {'palette': ['blue'], 'max': 1}, 'Raster ROI')
        m.centerObject(analysis_geom, 11)
        
        print(f'   ✅ ROI set from raster extent: {area_km2:.1f} km²')
        return True
        
    except Exception as e:
        print(f'❌ Error setting ROI from raster: {e}')
        return False

# ROI input widgets
west_input = widgets.FloatText(value=-38.7, description='West:')
east_input = widgets.FloatText(value=-38.3, description='East:')
south_input = widgets.FloatText(value=-13.1, description='South:')
north_input = widgets.FloatText(value=-12.8, description='North:')

raster_path_display = widgets.Text(
    value='',
    placeholder='No file selected...',
    description='Selected File:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='500px'),
    disabled=True
)

browse_button = widgets.Button(description='📂 Browse Files', button_style='info')
browse_button.on_click(lambda b: browse_raster_file())

# Action buttons
set_drawing_button = widgets.Button(description='📍 Use Drawing', button_style='success')
set_coords_button = widgets.Button(description='📍 Use Coordinates', button_style='info')
set_raster_button = widgets.Button(description='📍 Use Raster Extent', button_style='warning')

set_drawing_button.on_click(lambda b: set_roi_from_drawing())
set_coords_button.on_click(lambda b: set_roi_from_coordinates())
set_raster_button.on_click(lambda b: set_roi_from_raster())

roi_interface = widgets.VBox([
    widgets.HTML('<h3>🎯 ROI Selection</h3>'),
    widgets.HTML('<b>Method 1: Draw on Map</b>'),
    set_drawing_button,
    widgets.HTML('<b>Method 2: Enter Coordinates</b>'),
    widgets.HBox([west_input, east_input]),
    widgets.HBox([south_input, north_input]),
    set_coords_button,
    widgets.HTML('<b>Method 3: Use Raster File Extent</b>'),
    widgets.HBox([browse_button, raster_path_display]),
    set_raster_button
])

display(roi_interface)
display(m)

print('🎯 ROI Selection Ready')

VBox(children=(HTML(value='<h3>🎯 ROI Selection</h3>'), HTML(value='<b>Method 1: Draw on Map</b>'), Button(butt…

Map(center=[-12.9714, -38.5014], controls=(WidgetControl(options=['position', 'transparent_bg'], position='top…

🎯 ROI Selection Ready
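
The drawing method reads a GeoJSON feature and reports its bounds. As a minimal sketch of that step without any Earth Engine call, the bounding box can be taken directly from the polygon's outer ring. The helper name `polygon_bounds` and the sample geometry are hypothetical; GeoJSON itself defines no `Rectangle` type, so a drawn rectangle is assumed to arrive as a `Polygon`.

```python
def polygon_bounds(geometry):
    """Return (west, south, east, north) for a GeoJSON Polygon geometry."""
    if geometry.get("type") != "Polygon":
        raise ValueError(f"Unsupported geometry type: {geometry.get('type')}")
    exterior = geometry["coordinates"][0]  # first ring is the outer boundary
    lons = [point[0] for point in exterior]
    lats = [point[1] for point in exterior]
    return min(lons), min(lats), max(lons), max(lats)


# Hypothetical drawn rectangle around Salvador, expressed as a closed Polygon ring.
drawn = {
    "type": "Polygon",
    "coordinates": [[
        [-38.7, -13.1], [-38.3, -13.1], [-38.3, -12.8], [-38.7, -12.8], [-38.7, -13.1],
    ]],
}

west, south, east, north = polygon_bounds(drawn)
print(f"W={west}, E={east}, S={south}, N={north}")
```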


## 📊 Step 2: Analysis Configuration

In [3]:
# Analysis configuration
analysis_year = widgets.IntSlider(value=2020, min=2003, max=2020, description='Analysis Year:')
reference_start = widgets.IntSlider(value=2010, min=2003, max=2019, description='Reference Start:')
reference_end = widgets.IntSlider(value=2019, min=2004, max=2020, description='Reference End:')
absolute_threshold = widgets.FloatSlider(value=35.0, min=20.0, max=45.0, step=0.5, description='Threshold (°C):')
percentile_threshold = widgets.FloatSlider(value=95.0, min=50.0, max=99.0, step=1.0, description='Percentile:')

# Resolution selection for array extraction
resolution_selector = widgets.Dropdown(
    options=[('1km (recommended)', 1000), ('2km', 2000), ('5km', 5000)],
    value=1000,
    description='Resolution:'
)

config_interface = widgets.VBox([
    widgets.HTML('<h3>📊 Analysis Configuration</h3>'),
    widgets.HTML('<div style="background-color: #e8f4fd; padding: 10px; border-radius: 5px;">' +
                '<b>Efficient Array Approach:</b><br>' +
                '• Minimal GEE preprocessing (filtering + scaling only)<br>' +
                '• Direct array extraction using getRegion()<br>' +
                '• Fast xarray vectorized analysis locally<br>' +
                '• Coordinates preserved for spatial export</div>'),
    analysis_year,
    widgets.HBox([reference_start, reference_end]),
    widgets.HBox([absolute_threshold, percentile_threshold]),
    resolution_selector
])

display(config_interface)
print('📊 Configuration Ready for Efficient Array Extraction')

VBox(children=(HTML(value='<h3>📊 Analysis Configuration</h3>'), HTML(value='<div style="background-color: #e8f…

📊 Configuration Ready for Efficient Array Extraction
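
The slider ranges above allow combinations that make no sense, such as a reference start later than the reference end. A small check like the following could be run before extraction. This is a sketch only: `validate_periods` is a hypothetical helper, and the 2003 to 2020 limits are simply the slider bounds used in this notebook.

```python
def validate_periods(analysis_year, ref_start, ref_end, first_year=2003, last_year=2020):
    """Return a list of human-readable problems; an empty list means the setup is usable."""
    problems = []
    if ref_start > ref_end:
        problems.append("reference start is later than reference end")
    for label, value in (("analysis year", analysis_year),
                         ("reference start", ref_start),
                         ("reference end", ref_end)):
        if not first_year <= value <= last_year:
            problems.append(f"{label} {value} is outside {first_year}-{last_year}")
    return problems


print(validate_periods(2020, 2010, 2019))  # default slider values
print(validate_periods(2020, 2015, 2012))  # reversed reference period
```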


## ⚡ Step 3: Direct Array Extraction & Fast xarray Processing
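
Chunking exists because a single `getRegion()` request can only return a limited number of values (pixels × images × bands). The sketch below shows the two ingredients: estimating the size of a request, and building monthly date windows whose end date is exclusive, which matches how `filterDate` treats its second argument. The cap of 1,048,576 values is an assumption taken from the error message Earth Engine reports for oversized requests and should be checked against current documentation; the helper names are hypothetical.

```python
from datetime import date

GET_REGION_MAX_VALUES = 1_048_576  # assumed per-request cap, verify before relying on it


def fits_in_one_request(pixels_per_image, images, bands=1):
    """True if pixels x images x bands stays within the assumed getRegion cap."""
    return pixels_per_image * images * bands <= GET_REGION_MAX_VALUES


def month_windows(year):
    """Yield (start, end) ISO dates for each month of `year`; `end` is exclusive."""
    for month in range(1, 13):
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        yield start.isoformat(), end.isoformat()


windows = list(month_windows(2020))
print(windows[0], windows[-1])
print(fits_in_one_request(2_000, 366))     # one year of daily images
print(fits_in_one_request(2_000, 4_018))   # eleven years of daily images
```

With 2,000 pixels, one year fits in a single request while eleven years do not, which is why the extraction falls back to annual or monthly chunks.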

In [4]:
def get_region_collection(geom):
    """Determine which regional GSHTD collection to use"""
    centroid = geom.centroid().coordinates().getInfo()
    lon, lat = centroid[0], centroid[1]
    
    if lat > 15 and lon > -140 and lon < -40:
        return "projects/sat-io/open-datasets/global-daily-air-temp/north_america"
    elif lat < 35 and lon > -120 and lon < -30:
        return "projects/sat-io/open-datasets/global-daily-air-temp/latin_america"
    elif lat > 30 and lon > -15 and lon < 180:
        return "projects/sat-io/open-datasets/global-daily-air-temp/europe_asia"
    elif lat < 40 and lon > -20 and lon < 55:
        return "projects/sat-io/open-datasets/global-daily-air-temp/africa"
    elif lat < -5 and lon > 110 and lon < 180:
        return "projects/sat-io/open-datasets/global-daily-air-temp/australia"
    else:
        return "projects/sat-io/open-datasets/global-daily-air-temp/north_america"

def get_temperature_collection(region_geom, start_date, end_date, temp_type='tmax'):
    """Get GSHTD temperature collection with minimal preprocessing"""
    collection_id = get_region_collection(region_geom)
    collection = ee.ImageCollection(collection_id)
    
    # Minimal preprocessing - just filter and scale
    filtered_collection = (collection.filterDate(start_date, end_date)
                         .filterBounds(region_geom)
                         .filter(ee.Filter.eq('prop_type', temp_type)))
    
    # Scale to Celsius and clip to ROI
    temp_collection = filtered_collection.map(lambda img: 
        img.select('b1')
          .divide(10)  # Scale to Celsius
          .rename('temperature')
          .clip(region_geom)
          .copyProperties(img, ['system:time_start'])
    )
    
    return temp_collection

def extract_arrays_efficiently():
    '''Extract temperature arrays from GEE using chunking strategies'''
    global analysis_geom, temperature_data
    
    if analysis_geom is None:
        print('❌ Please set an ROI first!')
        return False
    
    try:
        print('⚡ Starting chunked array extraction from GEE...')
        
        year = analysis_year.value
        ref_start = reference_start.value
        ref_end = reference_end.value
        scale = resolution_selector.value
        
        area_km2 = analysis_geom.area().divide(1000000).getInfo()
        print(f'   📏 ROI area: {area_km2:.2f} km² at {scale}m resolution')
        
        # Get collections
        print('   📡 Loading GSHTD collections...')
        collection_id = get_region_collection(analysis_geom)
        print(f'   📡 Using: {collection_id.split("/")[-1]}')
        
        # Analysis year collection (filterDate's end date is exclusive,
        # so use Jan 1 of the following year to keep Dec 31)
        analysis_collection = get_temperature_collection(
            analysis_geom, f'{year}-01-01', f'{year + 1}-01-01', 'tmax'
        )
        
        # Reference period collection  
        reference_collection = get_temperature_collection(
            analysis_geom, f'{ref_start}-01-01', f'{ref_end + 1}-01-01', 'tmax'
        )
        
        analysis_count = analysis_collection.size().getInfo()
        reference_count = reference_collection.size().getInfo()
        
        print(f'   📊 Analysis year images: {analysis_count}')
        print(f'   📊 Reference period images: {reference_count}')
        
        if analysis_count == 0 or reference_count == 0:
            print('❌ No images found - check ROI coverage')
            return False
        
        # Estimate data size and choose strategy
        test_image = analysis_collection.first()
        pixel_count = test_image.select('temperature').reduceRegion(
            reducer=ee.Reducer.count(),
            geometry=analysis_geom,
            scale=scale,
            maxPixels=1e9
        ).getInfo()
        
        # reduceRegion can return None for the band when no pixel is unmasked
        expected_pixels = pixel_count.get('temperature') or 0
        total_images = analysis_count + reference_count
        estimated_values = expected_pixels * total_images
        
        print(f'   📊 Estimated pixels per image: {expected_pixels:,}')
        print(f'   📊 Total images: {total_images}')
        print(f'   📊 Estimated total values: {estimated_values:,}')
        
        # Choose extraction strategy based on size
        if estimated_values > 1000000:  # Too large for getRegion
            print('   🔄 Using temporal chunking strategy...')
            return extract_with_temporal_chunking()
        elif estimated_values > 500000:  # Moderate size
            print('   🔄 Using annual extraction strategy...')
            return extract_with_annual_chunks()
        else:
            print('   🔄 Using direct extraction (small dataset)...')
            return extract_direct()
        
    except Exception as e:
        print(f'❌ Error in array extraction setup: {e}')
        return False

def extract_direct():
    '''Direct extraction for small datasets'''
    try:
        year = analysis_year.value
        ref_start = reference_start.value
        ref_end = reference_end.value
        scale = resolution_selector.value
        
        # Get collections (filterDate's end date is exclusive, so use
        # Jan 1 of the following year to keep Dec 31)
        analysis_collection = get_temperature_collection(
            analysis_geom, f'{year}-01-01', f'{year + 1}-01-01', 'tmax'
        )
        reference_collection = get_temperature_collection(
            analysis_geom, f'{ref_start}-01-01', f'{ref_end + 1}-01-01', 'tmax'
        )
        
        # Combine for single extraction
        all_collection = analysis_collection.merge(reference_collection)
        
        print('   📥 Extracting all data in single call...')
        region_data = all_collection.getRegion(
            geometry=analysis_geom,
            scale=scale,
            crs='EPSG:4326'
        ).getInfo()
        
        return process_region_data(region_data)
        
    except Exception as e:
        print(f'❌ Direct extraction failed: {e}')
        return False

def extract_with_annual_chunks():
    '''Extract data year by year'''
    try:
        year = analysis_year.value
        ref_start = reference_start.value
        ref_end = reference_end.value
        scale = resolution_selector.value
        
        all_dataframes = []
        
        # Extract each year separately
        years_to_extract = list(range(ref_start, ref_end + 1)) + [year]
        years_to_extract = sorted(list(set(years_to_extract)))
        
        for extract_year in years_to_extract:
            print(f'   📅 Extracting year {extract_year}...')
            
            # filterDate's end date is exclusive; Jan 1 of the next year keeps Dec 31
            year_collection = get_temperature_collection(
                analysis_geom, f'{extract_year}-01-01', f'{extract_year + 1}-01-01', 'tmax'
            )
            
            year_count = year_collection.size().getInfo()
            if year_count == 0:
                print(f'      ⚠️ No data for {extract_year}')
                continue
            
            try:
                region_data = year_collection.getRegion(
                    geometry=analysis_geom,
                    scale=scale,
                    crs='EPSG:4326'
                ).getInfo()
                
                if len(region_data) > 1:
                    header = region_data[0]
                    data = region_data[1:]
                    
                    # Time parsing and float casts are done once in
                    # finalize_temperature_data(); only drop masked pixels here
                    df_year = pd.DataFrame(data, columns=header)
                    df_year = df_year.dropna(subset=['temperature'])
                    
                    all_dataframes.append(df_year)
                    print(f'      ✅ Extracted {len(df_year):,} observations')
                
            except Exception as e:
                print(f'      ❌ Failed to extract {extract_year}: {e}')
                continue
        
        if not all_dataframes:
            print('❌ No data extracted')
            return False
        
        # Combine all years
        print('   🔗 Combining all years...')
        df = pd.concat(all_dataframes, ignore_index=True)
        
        return finalize_temperature_data(df)
        
    except Exception as e:
        print(f'❌ Annual chunking failed: {e}')
        return False

def extract_with_temporal_chunking():
    '''Extract data with monthly chunks for very large datasets'''
    try:
        year = analysis_year.value
        ref_start = reference_start.value
        ref_end = reference_end.value
        scale = resolution_selector.value
        
        all_dataframes = []
        
        # Create monthly chunks
        years_to_extract = list(range(ref_start, ref_end + 1)) + [year]
        years_to_extract = sorted(list(set(years_to_extract)))
        
        total_months = len(years_to_extract) * 12
        processed_months = 0
        
        for extract_year in years_to_extract:
            for month in range(1, 13):
                print(f'   📅 Extracting {extract_year}-{month:02d} ({processed_months+1}/{total_months})...')
                
                # Monthly date range
                start_date = f'{extract_year}-{month:02d}-01'
                if month == 12:
                    end_date = f'{extract_year+1}-01-01'
                else:
                    end_date = f'{extract_year}-{month+1:02d}-01'
                
                month_collection = get_temperature_collection(
                    analysis_geom, start_date, end_date, 'tmax'
                )
                
                month_count = month_collection.size().getInfo()
                if month_count == 0:
                    processed_months += 1
                    continue
                
                try:
                    region_data = month_collection.getRegion(
                        geometry=analysis_geom,
                        scale=scale,
                        crs='EPSG:4326'
                    ).getInfo()
                    
                    if len(region_data) > 1:
                        header = region_data[0]
                        data = region_data[1:]
                        
                        # Time parsing and float casts are done once in
                        # finalize_temperature_data(); only drop masked pixels here
                        df_month = pd.DataFrame(data, columns=header)
                        df_month = df_month.dropna(subset=['temperature'])
                        
                        all_dataframes.append(df_month)
                        print(f'      ✅ {len(df_month):,} observations')
                    
                except Exception as e:
                    print(f'      ❌ Failed: {e}')
                
                processed_months += 1
                
                # Progress update
                if processed_months % 12 == 0:
                    print(f'   📊 Completed {processed_months//12} years...')
        
        if not all_dataframes:
            print('❌ No data extracted')
            return False
        
        # Combine all chunks
        print('   🔗 Combining all temporal chunks...')
        df = pd.concat(all_dataframes, ignore_index=True)
        
        return finalize_temperature_data(df)
        
    except Exception as e:
        print(f'❌ Temporal chunking failed: {e}')
        return False

def process_region_data(region_data):
    '''Process raw region data from getRegion call'''
    try:
        print(f'   🔄 Processing {len(region_data)} rows...')
        
        if len(region_data) <= 1:
            print('❌ No data in region extraction')
            return False
        
        header = region_data[0]
        data = region_data[1:]
        
        df = pd.DataFrame(data, columns=header)
        
        return finalize_temperature_data(df)
        
    except Exception as e:
        print(f'❌ Error processing region data: {e}')
        return False

def finalize_temperature_data(df):
    '''Convert DataFrame to xarray and finalize'''
    global temperature_data
    
    try:
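        # Process data (copy after filtering so the column assignments below
        # do not trigger pandas' SettingWithCopyWarning)
        df = df.dropna(subset=['temperature']).copy()
        df['time'] = pd.to_datetime(df['time'], unit='ms')
        df['latitude'] = df['latitude'].astype(float)
        df['longitude'] = df['longitude'].astype(float)
        df['temperature'] = df['temperature'].astype(float)
        
        # Overlapping analysis/reference periods would produce duplicate
        # (time, lat, lon) rows, which to_xarray() cannot index
        df = df.drop_duplicates(subset=['time', 'latitude', 'longitude'])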
        
        print(f'   📊 Valid observations: {len(df):,}')
        print(f'   📍 Unique pixels: {df[["latitude", "longitude"]].drop_duplicates().shape[0]:,}')
        print(f'   🌡️ Temperature range: {df["temperature"].min():.1f}°C to {df["temperature"].max():.1f}°C')
        
        # Convert to xarray with preserved coordinates
        print('   🔄 Converting to xarray dataset...')
        temperature_data = df.set_index(['time', 'latitude', 'longitude']).to_xarray()
        
        print(f'\n✅ Array extraction complete!')
        print(f'   📅 Time range: {temperature_data.time.min().values} to {temperature_data.time.max().values}')
        print(f'   🌍 Spatial dimensions: lat={temperature_data.sizes["latitude"]}, lon={temperature_data.sizes["longitude"]}')
        print(f'   📊 Total observations: {temperature_data.temperature.count().values}')
        print(f'   💾 Memory usage: {temperature_data.nbytes / 1024**2:.1f} MB')
        print(f'   ⚡ Ready for fast xarray processing!')
        
        return True
        
    except Exception as e:
        print(f'❌ Error finalizing data: {e}')
        import traceback
        print(f'   Details: {traceback.format_exc()}')
        return False

def process_climate_metrics_fast():
    '''Process climate metrics using fast xarray operations'''
    global temperature_data, climate_results
    
    if temperature_data is None:
        print('❌ Please extract arrays first!')
        return False
    
    try:
        print('🚀 Processing climate metrics with fast xarray operations...')
        
        year = analysis_year.value
        ref_start = reference_start.value
        ref_end = reference_end.value
        abs_threshold = absolute_threshold.value
        pct_threshold = percentile_threshold.value
        
        # Filter data efficiently  
        analysis_data = temperature_data.sel(time=str(year))
        reference_data = temperature_data.sel(time=slice(f'{ref_start}-01-01', f'{ref_end}-12-31'))
        
        print(f'   📅 Analysis year: {len(analysis_data.time)} days')
        print(f'   📅 Reference period: {len(reference_data.time)} days')
        
        # Lightning-fast vectorized calculations
        print('   ⚡ Calculating reference percentile...')
        reference_percentile = reference_data.temperature.quantile(pct_threshold/100, dim='time')
        
        print('   ⚡ Calculating heat days...')
        threshold = xr.where(reference_percentile > abs_threshold, reference_percentile, abs_threshold)
        # Pixels without any observation stay NaN instead of being reported as 0 heat days
        has_data = analysis_data.temperature.notnull().any(dim='time')
        heat_days = (analysis_data.temperature > threshold).sum(dim='time').where(has_data)
        
        print('   ⚡ Calculating temperature trends...')
        trends = reference_data.temperature.polyfit(dim='time', deg=1)
        trend_slope = trends.polyfit_coefficients.sel(degree=1)
        # polyfit on a datetime axis returns the slope per nanosecond
        ns_per_year = 365.25 * 24 * 60 * 60 * 1e9
        trend_per_year = trend_slope * ns_per_year
        
        print('   ⚡ Calculating annual statistics...')
        # Leave missing pixels as NaN; filling with 0 would inject fake 0 °C values
        annual_max = analysis_data.temperature.max(dim='time')
        annual_min = analysis_data.temperature.min(dim='time')
        annual_mean = analysis_data.temperature.mean(dim='time')
        annual_range = annual_max - annual_min
        
        print('   ⚡ Calculating seasonal means...')
        seasonal_means = analysis_data.temperature.groupby('time.season').mean(dim='time')
        
        # Store results with coordinates preserved
        climate_results = {
            'heat_days': heat_days,
            'reference_percentile': reference_percentile,
            'threshold_used': threshold,
            'temp_trend': trend_per_year,
            'annual_max': annual_max,
            'annual_min': annual_min,
            'annual_mean': annual_mean,
            'annual_range': annual_range,
            'seasonal_means': seasonal_means
        }
        
        print(f'\n✅ Climate metrics calculated in seconds!')
        
        # Print summary
        print(f'\n📊 RESULTS SUMMARY:')
        print(f'   Mean heat days: {heat_days.mean().values:.1f}')
        print(f'   Max heat days: {heat_days.max().values:.0f}')
        print(f'   Pixels with >0 heat days: {(heat_days > 0).sum().values} of {heat_days.count().values}')
        print(f'   Mean temperature trend: {trend_per_year.mean().values:.3f} °C/year')
        print(f'   📍 All results preserve lat/lon coordinates for spatial export')
        
        return True
        
    except Exception as e:
        print(f'❌ Error in climate processing: {e}')
        import traceback
        print(f'   Details: {traceback.format_exc()}')
        return False

# Create processing buttons
extract_button = widgets.Button(description='📥 Extract Arrays', button_style='primary')
process_button = widgets.Button(description='⚡ Process Metrics', button_style='success')

extract_button.on_click(lambda b: extract_arrays_efficiently())
process_button.on_click(lambda b: process_climate_metrics_fast())

processing_interface = widgets.VBox([
    widgets.HTML('<h3>⚡ Efficient Array Processing</h3>'),
    widgets.HTML('<div style="background-color: #fff3cd; padding: 10px; border-radius: 5px;">' +
                '<b>Two-Step Efficient Process:</b><br>' +
                '1. <b>Extract Arrays:</b> Direct getRegion() extraction with automatic chunking<br>' +
                '2. <b>Process Metrics:</b> Lightning-fast xarray vectorization<br>' +
                '✨ <b>Result:</b> Complete analysis in minutes, not hours!</div>'),
    widgets.HBox([extract_button, process_button])
])

display(processing_interface)
print('⚡ Ready for efficient array extraction and processing')
print('🎯 Automatic chunking handles getRegion() limits')
print('🚀 This approach should be MUCH faster than your previous hour-long GEE processing!')

VBox(children=(HTML(value='<h3>⚡ Efficient Array Processing</h3>'), HTML(value='<div style="background-color: …

⚡ Ready for efficient array extraction and processing
🎯 Automatic chunking handles getRegion() limits
🚀 This approach should be MUCH faster than your previous hour-long GEE processing!
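
The heat-day metric uses, per pixel, whichever is higher: the reference-period percentile or the absolute threshold. It then counts the analysis-year days above that value. The sketch below reproduces the same arithmetic in plain NumPy on made-up temperatures for two pixels, and converts a fitted slope to °C per year. It works in days rather than the nanoseconds xarray uses for a datetime axis, so the conversion factor differs from the notebook's.

```python
import numpy as np

# Made-up reference period: 10 days x 2 pixels (rows are days, columns are pixels).
reference = np.stack([np.arange(30, 40), np.arange(20, 30)], axis=1).astype(float)
# Made-up analysis period: 5 days x 2 pixels.
analysis = np.array([[36, 36], [38, 34], [39, 35], [40, 33], [34, 30]], dtype=float)

abs_threshold = 35.0
p95 = np.nanpercentile(reference, 95, axis=0)   # per-pixel 95th percentile
threshold = np.maximum(p95, abs_threshold)      # stricter of the two criteria
heat_days = (analysis > threshold).sum(axis=0)  # days above threshold per pixel
print(threshold, heat_days)

# Linear trend: fit temperature against time in days, then scale to years.
t_days = np.arange(10, dtype=float)
series = 30.0 + 0.01 * t_days                   # synthetic warming of 0.01 °C/day
slope_per_day = np.polyfit(t_days, series, 1)[0]
trend_per_year = slope_per_day * 365.25
print(round(trend_per_year, 4))
```

For the hot pixel the percentile (38.55 °C) sets the threshold; for the cool pixel the absolute 35 °C floor does.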


## 📊 Step 4: Visualization & Spatial Export
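
The export step summarizes each metric over valid (non-NaN) pixels only. A short sketch with a made-up 2×2 grid shows why that matters: treating a missing pixel as 0 pulls the mean down noticeably.

```python
import numpy as np

# Made-up annual-mean grid in °C with one missing pixel.
grid = np.array([[30.0, np.nan],
                 [32.0, 34.0]])

valid = grid[~np.isnan(grid)]            # keep only observed pixels
mean_valid = float(valid.mean())         # mean over the 3 observed pixels
mean_zero_filled = float(np.nan_to_num(grid, nan=0.0).mean())  # missing pixel counted as 0 °C

print(len(valid), mean_valid, mean_zero_filled)
```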

In [5]:
def visualize_climate_results():
    '''Create comprehensive visualization of climate results'''
    if not climate_results:
        print('❌ No climate results available. Please process metrics first!')
        return
    
    print('📊 Creating climate results visualization...')
    
    # Create subplot grid
    metrics_to_plot = ['heat_days', 'temp_trend', 'annual_max', 'annual_range']
    available_metrics = [m for m in metrics_to_plot if m in climate_results]
    
    if not available_metrics:
        print('❌ No spatial metrics available for plotting')
        return
    
    n_metrics = len(available_metrics)
    cols = min(2, n_metrics)
    rows = (n_metrics + cols - 1) // cols
    
    fig, axes = plt.subplots(rows, cols, figsize=(6*cols, 5*rows))
    if n_metrics == 1:
        axes = [axes]
    elif rows == 1:
        axes = axes if isinstance(axes, np.ndarray) else [axes]
    else:
        axes = axes.flatten()
    
    # Plot each metric
    for i, metric_name in enumerate(available_metrics):
        data = climate_results[metric_name]
        
        # Choose appropriate colormap
        if 'heat' in metric_name:
            cmap = 'Reds'
            title = 'Heat Days per Pixel'
        elif 'trend' in metric_name:
            cmap = 'RdBu_r'
            title = 'Temperature Trend (°C/year)'
        elif 'max' in metric_name:
            cmap = 'RdYlBu_r'
            title = 'Annual Maximum Temperature (°C)'
        elif 'range' in metric_name:
            cmap = 'viridis'
            title = 'Annual Temperature Range (°C)'
        else:
            cmap = 'viridis'
            title = metric_name.replace('_', ' ').title()
        
        im = data.plot(ax=axes[i], cmap=cmap, add_colorbar=True)
        axes[i].set_title(title, fontweight='bold')
        axes[i].set_xlabel('Longitude')
        axes[i].set_ylabel('Latitude')
    
    # Hide empty subplots
    for i in range(len(available_metrics), len(axes)):
        axes[i].set_visible(False)
    
    plt.tight_layout()
    plt.show()
    
    print('✅ Visualization complete!')

def export_spatial_results():
    '''Export climate results as spatial files with proper CRS'''
    if not climate_results:
        print('❌ No climate results available. Please process metrics first!')
        return
    
    try:
        print('📁 Exporting climate results to ../outputs with proper CRS...')
        
        year = analysis_year.value
        
        # Create summary statistics
        summary_data = []
        
        for metric_name, dataset in climate_results.items():
            if hasattr(dataset, 'mean') and len(dataset.dims) <= 2:
                try:
                    if 'seasonal' not in metric_name:  # Skip seasonal means for summary
                        valid_data = dataset.values[~np.isnan(dataset.values)]
                        
                        if len(valid_data) > 0:
                            summary_data.append({
                                'metric': metric_name,
                                'valid_pixels': len(valid_data),
                                'mean': float(np.mean(valid_data)),
                                'min': float(np.min(valid_data)),
                                'max': float(np.max(valid_data)),
                                'std': float(np.std(valid_data)),
                                'median': float(np.median(valid_data))
                            })
                except Exception as e:
                    print(f'     ⚠️ Skipping {metric_name}: {e}')
        
        # Save summary table
        if summary_data:
            summary_df = pd.DataFrame(summary_data)
            summary_file = f'../outputs/efficient_climate_summary_{year}.csv'
            summary_df.to_csv(summary_file, index=False)
            print(f'   ✅ Summary saved: {summary_file}')
            
            # Display summary
            print('\n📊 CLIMATE ANALYSIS SUMMARY:')
            display(summary_df)
        
        # Export as NetCDF with proper CRS
        print('\n   📦 Creating NetCDF with proper CRS...')
        
        # Create xarray Dataset from results
        spatial_results = {}
        for k, v in climate_results.items():
            if hasattr(v, 'dims') and 'latitude' in v.dims and 'longitude' in v.dims:
                if len(v.dims) == 2:  # Spatial data only
                    spatial_results[k] = v.fillna(0)
        
        if spatial_results:
            results_ds = xr.Dataset(spatial_results)
            
            # Add proper CRS information (same as notebook 17)
            results_ds.latitude.attrs['standard_name'] = 'latitude'
            results_ds.latitude.attrs['long_name'] = 'latitude'
            results_ds.latitude.attrs['units'] = 'degrees_north'
            results_ds.latitude.attrs['axis'] = 'Y'
            
            results_ds.longitude.attrs['standard_name'] = 'longitude'
            results_ds.longitude.attrs['long_name'] = 'longitude'
            results_ds.longitude.attrs['units'] = 'degrees_east'
            results_ds.longitude.attrs['axis'] = 'X'
            
            # Add CRS variable
            crs = xr.DataArray(
                data=np.int32(1),
                attrs={
                    'grid_mapping_name': 'latitude_longitude',
                    'longitude_of_prime_meridian': 0.0,
                    'semi_major_axis': 6378137.0,
                    'inverse_flattening': 298.257223563,
                    'spatial_ref': 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]',
                    'crs_wkt': 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]'
                }
            )
            results_ds['crs'] = crs
            
            # Add grid_mapping to all data variables
            for var_name in results_ds.data_vars:
                if var_name != 'crs':
                    results_ds[var_name].attrs['grid_mapping'] = 'crs'
            
            # Add metadata
            results_ds.attrs['analysis_year'] = year
            results_ds.attrs['reference_period'] = f'{reference_start.value}-{reference_end.value}'
            results_ds.attrs['created'] = datetime.now().isoformat()
            results_ds.attrs['method'] = 'efficient_array_extraction'
            results_ds.attrs['resolution'] = f'{resolution_selector.value}m'
            results_ds.attrs['absolute_threshold'] = absolute_threshold.value
            results_ds.attrs['percentile_threshold'] = percentile_threshold.value
            results_ds.attrs['crs'] = 'EPSG:4326'
            
            netcdf_file = f'../outputs/efficient_climate_{year}.nc'
            results_ds.to_netcdf(netcdf_file)
            
            print(f'   ✅ NetCDF saved with proper CRS: {netcdf_file}')
            print(f'      Variables: {list(results_ds.data_vars)}')
            print(f'      Dimensions: {dict(results_ds.sizes)}')
            print(f'      CRS: EPSG:4326 (WGS84)')
            print(f'      File size: {os.path.getsize(netcdf_file) / 1024**2:.1f} MB')
        
        # Export individual GeoTIFF files
        print('\n   🗺️ Exporting individual GeoTIFF files...')
        
        for metric_name, dataset in climate_results.items():
            if hasattr(dataset, 'dims') and 'latitude' in dataset.dims and 'longitude' in dataset.dims:
                if len(dataset.dims) == 2:  # Spatial data
                    try:
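                        # Replace NaN with 0 for export (note: missing pixels then
                        # become indistinguishable from true zeros)
                        data_array = dataset.fillna(0)
                        
                        # Set spatial reference; the .rio accessor needs `import rioxarray`
                        data_array = data_array.rio.write_crs("EPSG:4326")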
                        
                        # Export as GeoTIFF
                        output_file = f'../outputs/efficient_{metric_name}_{year}.tif'
                        data_array.rio.to_raster(output_file)
                        
                        print(f'      ✅ {metric_name} → {os.path.basename(output_file)}')
                        
                    except Exception as e:
                        print(f'      ⚠️ Failed to export {metric_name}: {e}')
        
        print('\n✅ Export complete! Files saved to ../outputs/')
        print(f'   📊 Summary: efficient_climate_summary_{year}.csv')
        print(f'   📦 NetCDF: efficient_climate_{year}.nc (with proper CRS)')
        print(f'   🗺️ GeoTIFFs: efficient_[metric]_{year}.tif')
        print(f'   ✨ All files have proper WGS84 projection information!')
        
    except Exception as e:
        print(f'❌ Error exporting: {e}')
        import traceback
        print(f'   Details: {traceback.format_exc()}')

# Create visualization and export buttons
visualize_button = widgets.Button(description='📊 Visualize Results', button_style='info')
export_button = widgets.Button(description='📁 Export Spatial Files', button_style='warning')

visualize_button.on_click(lambda b: visualize_climate_results())
export_button.on_click(lambda b: export_spatial_results())

export_interface = widgets.VBox([
    widgets.HTML('<h3>📊 Visualization & Spatial Export</h3>'),
    widgets.HTML('<div style="background-color: #d4edda; padding: 10px; border-radius: 5px;">' +
                '<b>Final Step:</b> Visualize your efficiently-processed climate metrics ' +
                'and export as spatial files (NetCDF + GeoTIFF) with proper WGS84 CRS.</div>'),
    widgets.HBox([visualize_button, export_button])
])

display(export_interface)
print('📊 Ready for visualization and spatial export')
print('🌍 Exports include proper CRS for GIS compatibility')

📊 Ready for visualization and spatial export
🌍 Exports include proper CRS for GIS compatibility


## 🎯 Summary

This notebook implements an efficient **direct array extraction → xarray** workflow for large-scale climate analysis:

### ⚡ **Why This Approach is Superior:**

**1. Direct Array Extraction**
- No file exports or downloads, the main bottleneck of the previous export-based approach
- Direct memory-to-memory transfer from GEE
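- `getRegion()` returns the pixel values for the ROI in a single call (very large requests may need to be split to stay within Earth Engine's per-request limits)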
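
A minimal sketch of the step that follows extraction, assuming the response of `getRegion(...).getInfo()` is a header row followed by one row per pixel and timestamp. The helper name `region_table_to_dataarray`, the band name `LST`, and the sample rows are hypothetical; the sample stands in for a live Earth Engine call.

```python
import numpy as np
import pandas as pd
import xarray as xr

def region_table_to_dataarray(table, band):
    """Convert a getRegion-style table (header row + data rows) to a DataArray."""
    header, rows = table[0], table[1:]
    df = pd.DataFrame(rows, columns=header)
    # Earth Engine reports time as milliseconds since the Unix epoch
    df['time'] = pd.to_datetime(df['time'], unit='ms')
    # Masked pixels arrive as None; coerce them to NaN
    df[band] = pd.to_numeric(df[band], errors='coerce')
    return (
        df.set_index(['time', 'latitude', 'longitude'])[band]
        .to_xarray()
        .sortby('latitude', ascending=False)  # north-up, like a raster
    )

# Hypothetical sample: 2 timestamps x 2 latitudes x 2 longitudes
sample = [
    ['id', 'longitude', 'latitude', 'time', 'LST'],
    ['img_a', 10.0, 50.0, 0, 21.5],
    ['img_a', 10.1, 50.0, 0, 22.0],
    ['img_a', 10.0, 50.1, 0, None],
    ['img_a', 10.1, 50.1, 0, 20.0],
    ['img_b', 10.0, 50.0, 86_400_000, 23.5],
    ['img_b', 10.1, 50.0, 86_400_000, 24.0],
    ['img_b', 10.0, 50.1, 86_400_000, 22.5],
    ['img_b', 10.1, 50.1, 86_400_000, 21.0],
]

lst = region_table_to_dataarray(sample, 'LST')
print(lst.dims, lst.shape)
```

Indexing by `time`, `latitude`, and `longitude` keeps the coordinates attached to every value, which is what later allows the results to be written back out as rasters.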

**2. Minimal GEE Processing**
- Only filtering and scaling on GEE servers
- No complex server-side statistics (which were slow)
- Leverages GEE's strength (data access) not weakness (complex analysis)

**3. Lightning-Fast xarray Processing**
- Vectorized operations using NumPy/Dask
- Complex analyses run in minutes rather than hours
- Full Python scientific stack available
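
To illustrate what "vectorized" means here, the sketch below computes heat days and a per-pixel linear trend on synthetic data. The array shapes, the 30 °C threshold, and the variable names are assumptions made for the example, not values taken from the notebook.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for extracted daily temperatures: (time, latitude, longitude)
rng = np.random.default_rng(0)
temps = xr.DataArray(
    25 + 5 * rng.standard_normal((365, 4, 5)),
    dims=('time', 'latitude', 'longitude'),
    coords={
        'latitude': np.linspace(50.3, 50.0, 4),
        'longitude': np.linspace(10.0, 10.4, 5),
    },
)

threshold = 30.0  # hypothetical absolute threshold in °C
# One comparison and one reduction cover every pixel; no Python loop over pixels.
# NaN > threshold is False, so missing days are simply not counted.
heat_days = (temps > threshold).sum('time')

# Per-pixel linear trend from (synthetic) annual means rising 0.05 °C per year
years = np.arange(2000, 2010)
annual = xr.DataArray(
    0.05 * (years - 2000)[:, None, None] + np.zeros((10, 4, 5)),
    dims=('year', 'latitude', 'longitude'),
    coords={'year': years},
)
# polyfit fits all pixels at once; degree=1 selects the slope (°C per year)
trend = annual.polyfit(dim='year', deg=1).polyfit_coefficients.sel(degree=1)

print(heat_days.shape, float(trend.mean()))
```

Both results keep the `latitude` and `longitude` dimensions, so they can be plotted or exported without any reshaping.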

**4. Coordinates Preserved**
- All lat/lon coordinates maintained
- Easy conversion back to NetCDF/GeoTIFF
- Proper CRS information included
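
The link between preserved coordinates and a raster file is the geotransform. The sketch below derives one from pixel-centre coordinates, assuming a regular, north-up grid; the function name `geotransform_from_centers` and the sample coordinates are hypothetical.

```python
import numpy as np

def geotransform_from_centers(lons, lats):
    """Return (x_origin, x_res, y_origin, y_res) for a regular grid of pixel centres."""
    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)
    x_res = (lons[-1] - lons[0]) / (len(lons) - 1)
    y_res = (lats[-1] - lats[0]) / (len(lats) - 1)  # negative when lats descend
    # Coordinates mark pixel centres; a raster origin is the outer corner,
    # half a pixel before the first centre along each axis
    x_origin = lons[0] - x_res / 2
    y_origin = lats[0] - y_res / 2
    return x_origin, x_res, y_origin, y_res

lons = [10.00, 10.01, 10.02]
lats = [50.02, 50.01, 50.00]  # descending: first row is the northern edge
transform = geotransform_from_centers(lons, lats)
print(transform)
```

These four numbers correspond to `Affine(x_res, 0, x_origin, 0, y_res, y_origin)` in the `affine` package used by rasterio. Libraries such as rioxarray derive the same transform from the coordinate arrays.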

### 🚀 **Complete Workflow:**
1. **Set ROI** - Same interface as notebooks 16/17
2. **Configure** - Choose resolution and analysis parameters  
3. **Extract Arrays** - Direct getRegion() from GEE (no files!)
4. **Process Metrics** - Fast vectorized xarray calculations
5. **Visualize & Export** - Spatial files with proper projections

### 💡 **Perfect for:**
- Large ROIs at true 1km resolution
- Complex climate analysis requiring speed
- Intra-urban heat analysis
- Converting results back to spatial formats

**This approach combines:**
- ✅ **GEE's data access power** (massive datasets)
- ✅ **xarray's analysis speed** (vectorized operations)  
- ✅ **No file bottlenecks** (direct array transfer)
- ✅ **Spatial export capability** (for GIS workflows)

**Result: analysis that previously took about an hour now completes in minutes!** 🎉

In [13]:
def explore_climate_datasets():
    '''Explore the loaded climate datasets'''
    if not climate_datasets:
        print('❌ No climate datasets loaded. Please load rasters first.')
        return
    
    print('🔍 EXPLORING CLIMATE DATASETS')
    print('='*50)
    
    # Dataset overview
    print(f'📊 Available datasets: {len(climate_datasets)}')
    for name in climate_datasets.keys():
        print(f'   - {name}')
    
    # Spatial information
    sample_ds = list(climate_datasets.values())[0]
    print(f'\n🌍 Spatial Information:')
    print(f'   Shape: {sample_ds.shape}')
    print(f'   Size: {sample_ds.size:,} pixels')
    print(f'   Memory: {sample_ds.nbytes / 1024**2:.1f} MB per dataset')
    
    # Statistics for key metrics
    if 'heat_days' in climate_datasets:
        heat_days = climate_datasets['heat_days']
        valid_pixels = (~np.isnan(heat_days.values)).sum()
        
        print(f'\n🔥 Heat Days Analysis:')
        print(f'   Valid pixels: {valid_pixels:,}')
        print(f'   Range: {heat_days.min().values:.0f} to {heat_days.max().values:.0f} days')
        print(f'   Mean: {heat_days.mean().values:.1f} days')
        print(f'   Pixels with >0 heat days: {(heat_days > 0).sum().values}')
    
    if 'temp_trend' in climate_datasets:
        trend = climate_datasets['temp_trend']
        print(f'\n📈 Temperature Trend Analysis:')
        print(f'   Range: {trend.min().values:.4f} to {trend.max().values:.4f} °C/year')
        print(f'   Mean: {trend.mean().values:.4f} °C/year')
    
    # Create visualization
    print('\n📊 Creating visualization...')
    
    # Determine number of subplots needed
    available_metrics = list(climate_datasets.keys())
    n_metrics = len(available_metrics)
    
    if n_metrics == 0:
        print('❌ No metrics to visualize')
        return
    
    # Create subplot grid
    cols = min(3, n_metrics)
    rows = (n_metrics + cols - 1) // cols
    
    # squeeze=False always yields a 2-D array, so one ravel() covers every layout
    fig, axes = plt.subplots(rows, cols, figsize=(5*cols, 4*rows), squeeze=False)
    axes = axes.ravel()
    
    # Plot each metric
    for i, metric_name in enumerate(available_metrics[:len(axes)]):
        data = climate_datasets[metric_name]
        
        # Choose appropriate colormap
        if 'heat' in metric_name:
            cmap = 'Reds'
        elif 'trend' in metric_name:
            cmap = 'RdBu_r'
        elif 'temp' in metric_name or 'mean' in metric_name:
            cmap = 'RdYlBu_r'
        else:
            cmap = 'viridis'
        
        im = data.plot(ax=axes[i], cmap=cmap, add_colorbar=True)
        axes[i].set_title(metric_name.replace('_', ' ').title(), fontweight='bold')
        axes[i].set_xlabel('Longitude')
        axes[i].set_ylabel('Latitude')
    
    # Hide empty subplots
    for i in range(len(available_metrics), len(axes)):
        axes[i].set_visible(False)
    
    plt.tight_layout()
    plt.show()
    
    print('\n✅ Dataset exploration complete!')

def export_climate_analysis():
    '''Export climate analysis results'''
    if not climate_datasets:
        print('❌ No climate datasets loaded. Please load rasters first.')
        return
    
    try:
        print('📁 Exporting climate analysis to ../outputs...')
        
        year = analysis_year.value
        
        # Create summary statistics
        summary_data = []
        
        for metric_name, dataset in climate_datasets.items():
            valid_data = dataset.values[~np.isnan(dataset.values)]
            
            if len(valid_data) > 0:
                summary_data.append({
                    'metric': metric_name,
                    'valid_pixels': len(valid_data),
                    'mean': float(np.mean(valid_data)),
                    'min': float(np.min(valid_data)),
                    'max': float(np.max(valid_data)),
                    'std': float(np.std(valid_data)),
                    'median': float(np.median(valid_data))
                })
        
        # Save summary table
        if summary_data:
            summary_df = pd.DataFrame(summary_data)
            summary_file = f'../outputs/gee_server_climate_summary_{year}.csv'
            summary_df.to_csv(summary_file, index=False)
            print(f'   ✅ Summary saved: {summary_file}')
            
            # Display summary
            print('\n📊 CLIMATE ANALYSIS SUMMARY:')
            display(summary_df)
        
        # Export individual rasters to outputs
        for metric_name, dataset in climate_datasets.items():
            output_file = f'../outputs/gee_server_{metric_name}_{year}.tif'
            dataset.rio.to_raster(output_file)
            print(f'   ✅ Exported: {output_file}')
        
        # Create combined NetCDF
        combined_ds = xr.Dataset(climate_datasets)
        combined_ds.attrs['analysis_year'] = year
        combined_ds.attrs['created'] = datetime.now().isoformat()
        combined_ds.attrs['method'] = 'GEE_server_side_processing'
        combined_ds.attrs['resolution'] = f'{resolution_selector.value}m'
        
        netcdf_file = f'../outputs/gee_server_climate_{year}.nc'
        combined_ds.to_netcdf(netcdf_file)
        print(f'   ✅ Combined NetCDF: {netcdf_file}')
        
        print(f'\n✅ Export complete! Files saved to ../outputs/')
        print(f'   📊 Summary: gee_server_climate_summary_{year}.csv')
        print(f'   📦 NetCDF: gee_server_climate_{year}.nc')
        print(f'   🗺️ Individual rasters: gee_server_[metric]_{year}.tif')
        
    except Exception as e:
        print(f'❌ Error exporting: {e}')
        import traceback
        print(f'   Details: {traceback.format_exc()}')

# Analysis buttons
explore_button = widgets.Button(description='🔍 Explore Datasets', button_style='info')
export_analysis_button = widgets.Button(description='📁 Export Analysis', button_style='warning')

explore_button.on_click(lambda b: explore_climate_datasets())
export_analysis_button.on_click(lambda b: export_climate_analysis())

analysis_interface = widgets.VBox([
    widgets.HTML('<h3>📊 xarray Analysis & Visualization</h3>'),
    widgets.HTML('<div style="background-color: #e8f4fd; padding: 10px; border-radius: 5px;">' +
                '<b>Final Step:</b> Analyze the climate datasets loaded from GEE-processed rasters. ' +
                'All heavy computation was done server-side - now enjoy fast spatial analysis!</div>'),
    widgets.HBox([explore_button, export_analysis_button])
])

display(analysis_interface)
print('📊 Ready for xarray-based climate analysis')
print('🚀 All heavy computation done server-side!')

📊 Ready for xarray-based climate analysis
🚀 All heavy computation done server-side!


## 🎯 Summary

This notebook implements an efficient **server-side processing → raster export → xarray analysis** workflow:

### ✅ **Key Advantages:**

**1. No Data Transfer Limits**
- All heavy computation on GEE servers
- Only download final processed results
- Perfect for large-scale intra-urban analysis

**2. Full 1km Resolution**
- No compromise on spatial detail
- Ideal for intra-urban heat analysis
- Captures local temperature variations

**3. Efficient Workflow**
- Server-side: Calculate all climate metrics
- Export: Multi-band rasters to Google Drive
- Client-side: Fast xarray spatial analysis

**4. Complete Climate Metrics**
- Heat days calculation
- Temperature trends
- Annual extremes (max, min, mean, range)
- Seasonal means
- Reference percentiles
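
For reference, the arithmetic behind these metrics can be sketched client-side with NumPy. This is an illustration on synthetic data only: the workflow described here computes the metrics on GEE servers, and the array names, shapes, and the 90th-percentile choice are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic daily temperatures in °C, shaped (day, row, col)
reference = 20 + 8 * rng.random((300, 3, 3))  # reference period
analysis = 20 + 8 * rng.random((365, 3, 3))   # analysis year

# Reference percentile: one threshold per pixel
p90 = np.nanpercentile(reference, 90, axis=0)

# Heat days: days in the analysis year above each pixel's own threshold
heat_days = np.sum(analysis > p90, axis=0)

# Annual extremes, ignoring missing days
annual_max = np.nanmax(analysis, axis=0)
annual_min = np.nanmin(analysis, axis=0)
annual_mean = np.nanmean(analysis, axis=0)
annual_range = annual_max - annual_min

print(p90.shape, heat_days.shape, annual_range.shape)
```

Every metric reduces the time axis and leaves one value per pixel, which is why each one maps naturally onto a band of a multi-band raster.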

### 🚀 **Workflow:**
1. **Set ROI** - Same interface as notebook 17
2. **Configure analysis** - Choose resolution and parameters
3. **Process & Export** - GEE does all heavy lifting
4. **Download rasters** - From Google Drive to local directory
5. **Load & Analyze** - Fast xarray-based spatial analysis

### 💡 **Perfect for:**
- Intra-urban heat island analysis
- High-resolution climate mapping
- Large ROIs without memory constraints
- Persistent, reusable results

This approach combines the **computational power of Google Earth Engine** with the **spatial analysis capabilities of xarray** for optimal intra-urban climate analysis!