# NEON AOP to Satellite Data Crosswalk

This notebook demonstrates how to create a crosswalk between NEON's Airborne Observation Platform (AOP) hyperspectral and LiDAR data and satellite multispectral imagery.

## Learning Objectives
- Understand NEON AOP data products (spectrometer and LiDAR)
- Match hyperspectral bands to satellite multispectral bands
- Validate satellite-derived metrics using high-resolution AOP data
- Scale fire risk indicators from point to landscape scale

In [None]:
# Import required libraries
import sys
sys.path.append('..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import rasterio
from rasterio.plot import show
import geopandas as gpd
from scipy import stats

from src.data_collection.neon_client import NEONDataCollector

# Set up plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

## 1. NEON AOP Data Overview

NEON's Airborne Observation Platform collects:
- **Hyperspectral imagery**: 426 bands from 380-2510 nm at 1m resolution
- **LiDAR data**: Vegetation structure, canopy height, terrain models
- **High-resolution RGB**: 10cm resolution camera imagery

In [None]:
# Initialize NEON collector
collector = NEONDataCollector()

# Display available AOP products
aop_products_df = pd.DataFrame(
    [(code, desc) for code, desc in collector.AOP_PRODUCTS.items()],
    columns=['Product Code', 'Description']
)
print("NEON AOP Products for Satellite Crosswalk:")
aop_products_df

## 2. Spectral Band Matching

Create a mapping between NEON's hyperspectral bands and common satellite sensors.
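
As a minimal sketch of what such a mapping can look like, the example below collapses a hyperspectral spectrum into one broad satellite band by taking a weighted mean of the narrow bands. It assumes an evenly spaced 426-band wavelength grid and a Gaussian spectral response whose full width at half maximum equals the band width; both are simplifying assumptions, and `gaussian_srf` and `simulate_band` are hypothetical helper names. Real sensors publish measured response functions that should be used where available.

```python
import numpy as np

# Assumed NEON-like wavelength grid: 426 band centers spanning 380-2510 nm
wavelengths = np.linspace(380, 2510, 426)

def gaussian_srf(wavelengths, center, fwhm):
    """Gaussian approximation of a spectral response function (an assumption)."""
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    return np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)

def simulate_band(spectrum, wavelengths, center, fwhm):
    """Response-weighted mean of a hyperspectral spectrum for one broad band."""
    weights = gaussian_srf(wavelengths, center, fwhm)
    return np.sum(weights * spectrum) / np.sum(weights)

# Synthetic vegetation-like spectrum: low in the visible, high in the NIR
spectrum = 0.05 + 0.4 / (1 + np.exp(-(wavelengths - 720) / 15))

# Band centers and widths taken from the Sentinel-2 entries in this section
red = simulate_band(spectrum, wavelengths, center=665, fwhm=30)   # B04
nir = simulate_band(spectrum, wavelengths, center=842, fwhm=115)  # B08
ndvi = (nir - red) / (nir + red)
print(f"red={red:.3f}, nir={nir:.3f}, NDVI={ndvi:.3f}")
```

The same function can be applied per pixel to an AOP reflectance cube, giving simulated satellite bands that are directly comparable to the real ones.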

In [None]:
# Define spectral response functions for satellites
satellite_bands = {
    'Sentinel-2': {
        'B02_Blue': {'center': 490, 'width': 65, 'range': (458, 523)},
        'B03_Green': {'center': 560, 'width': 35, 'range': (542, 578)},
        'B04_Red': {'center': 665, 'width': 30, 'range': (649, 681)},
        'B05_RedEdge1': {'center': 705, 'width': 15, 'range': (697, 713)},
        'B06_RedEdge2': {'center': 740, 'width': 15, 'range': (732, 748)},
        'B07_RedEdge3': {'center': 783, 'width': 20, 'range': (773, 793)},
        'B08_NIR': {'center': 842, 'width': 115, 'range': (784, 900)},
        'B11_SWIR1': {'center': 1610, 'width': 90, 'range': (1568, 1660)},
        'B12_SWIR2': {'center': 2190, 'width': 180, 'range': (2115, 2290)}
    },
    'Landsat-8': {
        'B2_Blue': {'center': 482, 'width': 65, 'range': (450, 515)},
        'B3_Green': {'center': 562, 'width': 75, 'range': (525, 600)},
        'B4_Red': {'center': 655, 'width': 50, 'range': (630, 680)},
        'B5_NIR': {'center': 865, 'width': 40, 'range': (845, 885)},
        'B6_SWIR1': {'center': 1610, 'width': 100, 'range': (1560, 1660)},
        'B7_SWIR2': {'center': 2200, 'width': 200, 'range': (2100, 2300)}
    },
    'MODIS': {
        'B3_Blue': {'center': 469, 'width': 20, 'range': (459, 479)},
        'B4_Green': {'center': 555, 'width': 20, 'range': (545, 565)},
        'B1_Red': {'center': 645, 'width': 50, 'range': (620, 670)},
        'B2_NIR': {'center': 858, 'width': 35, 'range': (841, 876)},
        'B6_SWIR1': {'center': 1640, 'width': 24, 'range': (1628, 1652)},
        'B7_SWIR2': {'center': 2130, 'width': 50, 'range': (2105, 2155)}
    }
}

# Visualize spectral band positions
fig, ax = plt.subplots(figsize=(14, 8))

y_positions = {'Sentinel-2': 0, 'Landsat-8': 1, 'MODIS': 2}
colors = {'Blue': 'blue', 'Green': 'green', 'Red': 'red', 
          'RedEdge': 'orange', 'NIR': 'darkred', 'SWIR': 'brown'}

for satellite, bands in satellite_bands.items():
    y_pos = y_positions[satellite]
    for band_name, band_info in bands.items():
        # Determine color based on band type
        band_type = band_name.split('_')[1]
        color = colors.get(band_type.replace('1','').replace('2','').replace('3',''), 'gray')
        
        # Plot band as rectangle
        ax.barh(y_pos, band_info['width'], 
                left=band_info['center'] - band_info['width']/2,
                height=0.3, alpha=0.6, color=color,
                label=f"{satellite} {band_name.split('_')[1]}")

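# Add NEON hyperspectral range (380-2510 nm) in data coordinates;
# axhspan's xmin/xmax are axes fractions, so a bar is used instead
ax.barh(-0.35, 2510 - 380, left=380, height=0.3, alpha=0.2, color='gray',
        label='NEON Hyperspectral Range')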

ax.set_yticks(list(y_positions.values()) + [-0.35])
ax.set_yticklabels(list(y_positions.keys()) + ['NEON AOP'])
ax.set_xlabel('Wavelength (nm)')
ax.set_title('Spectral Band Comparison: NEON AOP vs Satellite Sensors')
ax.set_xlim(400, 2400)
ax.grid(True, alpha=0.3)

# Clean up legend
handles, labels = ax.get_legend_handles_labels()
by_label = dict(zip(labels, handles))
ax.legend(by_label.values(), by_label.keys(), bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()

## 3. Fire-Relevant Vegetation Indices

Calculate vegetation indices from both NEON and satellite data for comparison.

In [None]:
def calculate_vegetation_indices(red, nir, swir1=None, swir2=None):
    """
    Calculate common vegetation indices used in fire risk assessment.
    """
    indices = {}
    
    # Normalized Difference Vegetation Index
    indices['NDVI'] = (nir - red) / (nir + red + 1e-8)
    
    # Two-band Enhanced Vegetation Index (EVI2), which needs no blue band
    indices['EVI'] = 2.5 * (nir - red) / (nir + 2.4 * red + 1)
    
    if swir1 is not None:
        # Normalized Difference Water Index (NIR/SWIR1 form, also called NDMI)
        indices['NDWI'] = (nir - swir1) / (nir + swir1 + 1e-8)
    
    if swir2 is not None:
        # Normalized Burn Ratio (depends on SWIR2 only, not SWIR1)
        indices['NBR'] = (nir - swir2) / (nir + swir2 + 1e-8)
    
    return indices

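# Simulate calculation with example data
# In practice, these would come from co-located NEON and satellite pixels.
# The two sensors are drawn independently here, so the scatter plots below
# show r close to 0; real matched pixels would be correlated.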
n_pixels = 1000
neon_bands = {
    'red': np.random.normal(0.15, 0.05, n_pixels),
    'nir': np.random.normal(0.45, 0.1, n_pixels),
    'swir1': np.random.normal(0.25, 0.05, n_pixels),
    'swir2': np.random.normal(0.20, 0.05, n_pixels)
}

sentinel_bands = {
    'red': np.random.normal(0.14, 0.06, n_pixels),
    'nir': np.random.normal(0.43, 0.12, n_pixels),
    'swir1': np.random.normal(0.24, 0.06, n_pixels),
    'swir2': np.random.normal(0.19, 0.06, n_pixels)
}

# Calculate indices
neon_indices = calculate_vegetation_indices(**neon_bands)
sentinel_indices = calculate_vegetation_indices(**sentinel_bands)

# Compare indices
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
indices_to_plot = ['NDVI', 'EVI', 'NDWI', 'NBR']

for idx, index_name in enumerate(indices_to_plot):
    ax = axes[idx // 2, idx % 2]
    
    if neon_indices.get(index_name) is not None:
        # Scatter plot
        ax.scatter(neon_indices[index_name], sentinel_indices[index_name], 
                  alpha=0.5, s=10)
        
        # Add 1:1 line
        min_val = min(neon_indices[index_name].min(), sentinel_indices[index_name].min())
        max_val = max(neon_indices[index_name].max(), sentinel_indices[index_name].max())
        ax.plot([min_val, max_val], [min_val, max_val], 'r--', lw=2)
        
        # Calculate correlation
        corr = np.corrcoef(neon_indices[index_name], sentinel_indices[index_name])[0, 1]
        
        ax.set_xlabel(f'NEON AOP {index_name}')
        ax.set_ylabel(f'Sentinel-2 {index_name}')
        ax.set_title(f'{index_name} Comparison (r = {corr:.3f})')
        ax.grid(True, alpha=0.3)

plt.suptitle('NEON AOP vs Sentinel-2 Vegetation Indices Comparison', fontsize=14)
plt.tight_layout()
plt.show()

## 4. LiDAR-Derived Fire Risk Metrics

NEON's LiDAR data provides unique structural information critical for fire behavior modeling.
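
A rough sketch of how two of these structural metrics can be summarized from a canopy height model (CHM) raster is shown below. The CHM here is synthetic, `chm_plot_metrics` is a hypothetical helper, and the 2 m height threshold for counting a pixel as canopy is an assumed convention rather than a NEON specification. Metrics such as canopy base height or ladder fuel density need the vertical point distribution and cannot be derived from a CHM alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1 m canopy height model for a 30 m x 30 m plot, in meters
chm = np.clip(rng.normal(12, 6, size=(30, 30)), 0, None)

def chm_plot_metrics(chm, cover_threshold=2.0):
    """Summarize a CHM tile; the 2 m cover threshold is an assumption."""
    valid = chm[np.isfinite(chm)]  # ignore no-data pixels stored as NaN
    return {
        'canopy_height_p95': float(np.percentile(valid, 95)),
        'canopy_height_mean': float(valid.mean()),
        'canopy_cover_pct': float(100 * np.mean(valid > cover_threshold)),
    }

metrics = chm_plot_metrics(chm)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```

Computing these summaries over windows the size of a satellite pixel gives the kind of targets a crosswalk model can be trained against.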

In [None]:
# Fire-relevant LiDAR metrics
lidar_fire_metrics = {
    'Canopy Height': 'Maximum vegetation height - affects fire intensity',
    'Canopy Cover': 'Percentage of ground covered by vegetation',
    'Canopy Base Height': 'Height to live crown - critical for crown fire',
    'Canopy Bulk Density': 'Biomass per unit volume - fuel load indicator',
    'Ladder Fuel Density': 'Vertical fuel continuity - fire spread to crown',
    'Surface Fuel Height': 'Height of surface fuels - surface fire behavior'
}

# Create visualization of LiDAR-derived metrics
fig, ax = plt.subplots(figsize=(10, 6))

# Simulate a forest profile
x = np.linspace(0, 100, 1000)
ground = np.sin(x/20) * 2 + 100
canopy_top = ground + 15 + np.sin(x/5) * 5 + np.random.normal(0, 1, 1000)
canopy_base = ground + 5 + np.sin(x/7) * 2
surface_fuel = ground + np.random.uniform(0, 0.5, 1000)

# Plot forest profile
ax.fill_between(x, 0, ground, color='brown', alpha=0.3, label='Terrain')
ax.fill_between(x, ground, surface_fuel, color='orange', alpha=0.5, label='Surface Fuels')
ax.fill_between(x, canopy_base, canopy_top, color='green', alpha=0.5, label='Canopy')

# Add annotations for key metrics
idx = 500
ax.annotate('', xy=(x[idx]+5, ground[idx]), xytext=(x[idx]+5, canopy_base[idx]),
            arrowprops=dict(arrowstyle='<->', color='red', lw=2))
ax.text(x[idx]+7, (ground[idx] + canopy_base[idx])/2, 'Canopy Base Height', 
        rotation=90, va='center', color='red')

# Canopy height is measured from the ground to the canopy top
ax.annotate('', xy=(x[idx]+15, ground[idx]), xytext=(x[idx]+15, canopy_top[idx]),
            arrowprops=dict(arrowstyle='<->', color='blue', lw=2))
ax.text(x[idx]+17, (ground[idx] + canopy_top[idx])/2, 'Canopy Height', 
        rotation=90, va='center', color='blue')

ax.set_xlabel('Distance (m)')
ax.set_ylabel('Elevation (m)')
ax.set_title('LiDAR-Derived Forest Structure for Fire Risk Assessment')
ax.legend(loc='upper right')
ax.set_ylim(0, 130)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Display metrics table
metrics_df = pd.DataFrame([
    {'Metric': metric, 'Description': desc, 'Satellite Capability': capability}
    for metric, desc, capability in [
        ('Canopy Height', lidar_fire_metrics['Canopy Height'], 'Limited - only top of canopy'),
        ('Canopy Cover', lidar_fire_metrics['Canopy Cover'], 'Good - NDVI proxy'),
        ('Canopy Base Height', lidar_fire_metrics['Canopy Base Height'], 'Not possible'),
        ('Canopy Bulk Density', lidar_fire_metrics['Canopy Bulk Density'], 'Not possible'),
        ('Ladder Fuel Density', lidar_fire_metrics['Ladder Fuel Density'], 'Not possible'),
        ('Surface Fuel Height', lidar_fire_metrics['Surface Fuel Height'], 'Not possible')
    ]
])

print("\nLiDAR Metrics vs Satellite Capabilities:")
metrics_df

## 5. Creating the Crosswalk Model

Develop a machine learning model to predict LiDAR-derived metrics from satellite data.

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

# Simulate crosswalk dataset
# In practice, this would be actual matched NEON AOP and satellite data
n_samples = 5000

# Satellite features (what we have at scale)
satellite_features = pd.DataFrame({
    'NDVI': np.random.normal(0.6, 0.2, n_samples),
    'EVI': np.random.normal(0.4, 0.15, n_samples),
    'NDWI': np.random.normal(0.3, 0.1, n_samples),
    'NBR': np.random.normal(0.5, 0.15, n_samples),
    'B4_Red': np.random.normal(0.15, 0.05, n_samples),
    'B8_NIR': np.random.normal(0.45, 0.1, n_samples),
    'B11_SWIR1': np.random.normal(0.25, 0.05, n_samples),
    'B12_SWIR2': np.random.normal(0.20, 0.05, n_samples),
})

# LiDAR-derived targets (what we want to predict)
# Create correlated targets based on satellite features
canopy_height = 20 * satellite_features['NDVI'] + 10 + np.random.normal(0, 2, n_samples)
canopy_cover = 100 * satellite_features['NDVI']**2 + np.random.normal(0, 5, n_samples)
canopy_base_height = 0.3 * canopy_height + np.random.normal(2, 1, n_samples)

lidar_targets = pd.DataFrame({
    'canopy_height': np.clip(canopy_height, 0, 50),
    'canopy_cover': np.clip(canopy_cover, 0, 100),
    'canopy_base_height': np.clip(canopy_base_height, 0, 20)
})

# Train crosswalk models
results = {}

for target in lidar_targets.columns:
    print(f"\nTraining model for {target}...")
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        satellite_features, lidar_targets[target], 
        test_size=0.2, random_state=42
    )
    
    # Train model
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    
    # Evaluate
    y_pred = model.predict(X_test)
    r2 = r2_score(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    
    results[target] = {
        'model': model,
        'r2': r2,
        'mae': mae,
        'feature_importance': pd.DataFrame({
            'feature': satellite_features.columns,
            'importance': model.feature_importances_
        }).sort_values('importance', ascending=False)
    }
    
    print(f"R² Score: {r2:.3f}")
    print(f"MAE: {mae:.2f}")

# Visualize results
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

for idx, (target, result) in enumerate(results.items()):
    # Prediction scatter
    ax1 = axes[0, idx]
    # Recreate the held-out split (same test_size and random_state as in training)
    # so each prediction is paired with its own LiDAR value
    _, X_test, _, y_test = train_test_split(
        satellite_features, lidar_targets[target],
        test_size=0.2, random_state=42
    )
    y_pred = result['model'].predict(X_test)
    
    ax1.scatter(y_test, y_pred, alpha=0.5, s=10)
    ax1.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
    ax1.set_xlabel(f'LiDAR {target}')
    ax1.set_ylabel(f'Predicted {target}')
    ax1.set_title(f'{target}\nR² = {result["r2"]:.3f}, MAE = {result["mae"]:.2f}')
    ax1.grid(True, alpha=0.3)
    
    # Feature importance
    ax2 = axes[1, idx]
    top_features = result['feature_importance'].head(6)
    ax2.barh(top_features['feature'], top_features['importance'])
    ax2.set_xlabel('Feature Importance')
    ax2.set_title(f'Top Features for {target}')
    ax2.grid(True, alpha=0.3)

plt.suptitle('Satellite to LiDAR Crosswalk Model Performance', fontsize=14)
plt.tight_layout()
plt.show()

## 6. Temporal Alignment Strategy

NEON AOP flights occur annually, while satellites revisit every few days to weeks. The strategies below handle the resulting temporal mismatch.
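
The simplest case, picking the clear satellite scene nearest in time to the flight, might be sketched as follows. The flight date, scene list, and the `closest_clear_scene` helper are all hypothetical, and the 20% cloud and 30-day limits are illustrative defaults rather than recommended values.

```python
import pandas as pd

# Hypothetical AOP flight date and satellite scene list (illustrative values only)
aop_flight_date = pd.Timestamp('2021-07-12')
scenes = pd.DataFrame({
    'scene_id': ['S2_a', 'S2_b', 'S2_c', 'S2_d'],
    'date': pd.to_datetime(['2021-06-20', '2021-07-05', '2021-07-15', '2021-08-09']),
    'cloud_pct': [5.0, 12.0, 80.0, 2.0],
})

def closest_clear_scene(scenes, flight_date, max_cloud_pct=20.0, max_days=30):
    """Return the clear scene nearest in time to the flight, or None if none qualify."""
    offset_days = (scenes['date'] - flight_date).dt.days.abs()
    candidates = scenes.assign(offset_days=offset_days)
    candidates = candidates[(candidates['cloud_pct'] <= max_cloud_pct) &
                            (candidates['offset_days'] <= max_days)]
    if candidates.empty:
        return None
    return candidates.sort_values('offset_days').iloc[0]

best = closest_clear_scene(scenes, aop_flight_date)
print(best['scene_id'], best['offset_days'])
```

Here the scene three days after the flight is skipped because of its cloud cover, which illustrates the main weakness of closest-date matching.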

In [None]:
# Temporal alignment strategies
temporal_strategies = pd.DataFrame([
    {
        'Strategy': 'Closest Match',
        'Description': 'Use satellite image closest to AOP flight date',
        'Pros': 'Simple, minimal phenological differences',
        'Cons': 'May have clouds, limited samples',
        'Use Case': 'Stable vegetation, clear weather regions'
    },
    {
        'Strategy': 'Seasonal Composite',
        'Description': 'Average satellite data over the season of AOP flight',
        'Pros': 'Reduces noise, handles clouds',
        'Cons': 'Smooths temporal variations',
        'Use Case': 'General vegetation mapping'
    },
    {
        'Strategy': 'Phenology Matching',
        'Description': 'Match based on vegetation phenology stage',
        'Pros': 'Accounts for inter-annual variations',
        'Cons': 'Complex, requires phenology model',
        'Use Case': 'Deciduous forests, agricultural areas'
    },
    {
        'Strategy': 'Multi-temporal Stack',
        'Description': 'Use time series features (median, variance, trends)',
        'Pros': 'Captures temporal dynamics',
        'Cons': 'Increased data volume',
        'Use Case': 'Dynamic fire risk assessment'
    }
])

print("Temporal Alignment Strategies for AOP-Satellite Crosswalk:")
temporal_strategies

## 7. Pre/Post Fire Analysis

NEON sites with fire events provide unique opportunities to validate burn severity assessments.
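
Burn severity is commonly summarized as dNBR, the pre-fire NBR minus the post-fire NBR. The sketch below classifies dNBR values per pixel with `np.digitize`, using the same thresholds and labels as the classification table in this section. `classify_dnbr` is a hypothetical helper and the input values are made up for illustration.

```python
import numpy as np

# Thresholds and labels follow the classification table used in this notebook
DNBR_THRESHOLDS = [0.1, 0.27, 0.44, 0.66]
DNBR_LABELS = np.array(['Unburned/Low', 'Low', 'Moderate-Low', 'Moderate-High', 'High'])

def classify_dnbr(pre_fire_nbr, post_fire_nbr):
    """Return dNBR and a severity label for each pixel."""
    dnbr = np.asarray(pre_fire_nbr) - np.asarray(post_fire_nbr)
    # digitize returns, for each value, the index of the class it falls into
    classes = np.digitize(dnbr, DNBR_THRESHOLDS)
    return dnbr, DNBR_LABELS[classes]

# Illustrative values for three pixels
pre = np.array([0.60, 0.60, 0.60])
post = np.array([0.58, 0.30, -0.10])
dnbr_values, severity = classify_dnbr(pre, post)
print(list(zip(dnbr_values.round(2), severity)))
```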

In [None]:
# Simulate pre/post fire data
dates = pd.date_range('2019-01-01', '2024-01-01', freq='MS')
fire_date = pd.Timestamp('2021-08-15')

# Create time series with fire impact
ndvi_prefire = 0.7 + 0.1 * np.sin(2 * np.pi * np.arange(len(dates)) / 12)
fire_impact = np.where(dates > fire_date, -0.4, 0)
recovery = np.where(dates > fire_date, 
                   0.3 * (1 - np.exp(-0.1 * (dates - fire_date).days / 30)), 
                   0)
ndvi = ndvi_prefire + fire_impact + recovery + np.random.normal(0, 0.05, len(dates))

# NBR for burn severity
nbr = 0.6 + 0.05 * np.sin(2 * np.pi * np.arange(len(dates)) / 12)
nbr_impact = np.where(dates > fire_date, -0.5, 0)
nbr_recovery = np.where(dates > fire_date, 
                       0.2 * (1 - np.exp(-0.05 * (dates - fire_date).days / 30)), 
                       0)
nbr_series = nbr + nbr_impact + nbr_recovery + np.random.normal(0, 0.03, len(dates))

# Calculate dNBR (difference NBR)
pre_fire_nbr = nbr_series[dates < fire_date].mean()
post_fire_nbr = nbr_series[(dates > fire_date) & (dates < fire_date + pd.Timedelta(days=60))].mean()
dnbr = pre_fire_nbr - post_fire_nbr

# Plotting
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# NDVI time series
ax1.plot(dates, ndvi, 'g-', linewidth=2, label='NDVI')
ax1.axvline(fire_date, color='red', linestyle='--', alpha=0.7, label='Fire Event')
ax1.fill_between(dates, 0, 1, where=(dates > fire_date - pd.Timedelta(days=30)) & 
                                   (dates < fire_date + pd.Timedelta(days=30)),
                alpha=0.2, color='red')
ax1.set_ylabel('NDVI')
ax1.set_title('Vegetation Response to Fire Event')
ax1.legend()
ax1.grid(True, alpha=0.3)

# NBR time series
ax2.plot(dates, nbr_series, 'b-', linewidth=2, label='NBR')
ax2.axvline(fire_date, color='red', linestyle='--', alpha=0.7)
ax2.axhline(pre_fire_nbr, color='green', linestyle=':', label=f'Pre-fire NBR = {pre_fire_nbr:.3f}')
ax2.axhline(post_fire_nbr, color='orange', linestyle=':', label=f'Post-fire NBR = {post_fire_nbr:.3f}')
ax2.set_ylabel('NBR')
ax2.set_xlabel('Date')
ax2.legend()
ax2.grid(True, alpha=0.3)

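# Add dNBR annotation; the severity label follows the dNBR classes tabulated below
severity_label = ('High' if dnbr > 0.66 else 'Moderate-High' if dnbr > 0.44 else
                  'Moderate-Low' if dnbr > 0.27 else 'Low' if dnbr > 0.1 else 'Unburned/Low')
ax2.text(fire_date + pd.Timedelta(days=100), post_fire_nbr + 0.05, 
         f'dNBR = {dnbr:.3f}\n({severity_label} Severity)', 
         bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.7))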

plt.tight_layout()
plt.show()

# Burn severity classification
burn_severity = pd.DataFrame([
    {'dNBR Range': '< 0.1', 'Severity': 'Unburned/Low', 'Description': 'Little to no change'},
    {'dNBR Range': '0.1 - 0.27', 'Severity': 'Low', 'Description': 'Low severity burn'},
    {'dNBR Range': '0.27 - 0.44', 'Severity': 'Moderate-Low', 'Description': 'Moderate-low severity'},
    {'dNBR Range': '0.44 - 0.66', 'Severity': 'Moderate-High', 'Description': 'Moderate-high severity'},
    {'dNBR Range': '> 0.66', 'Severity': 'High', 'Description': 'High severity burn'}
])

print("\nBurn Severity Classification using dNBR:")
burn_severity

## Key Insights and Recommendations

### 1. **NEON AOP Data Advantages**
- **Hyperspectral precision**: 426 narrow bands vs roughly a dozen broad bands on Sentinel-2 or Landsat-8
- **Structural information**: LiDAR provides critical fire behavior parameters
- **Calibration quality**: Research-grade instruments with known uncertainty

### 2. **Crosswalk Benefits**
- **Scale up point measurements**: Use AOP to validate satellite-derived products
- **Fill structural gaps**: Predict canopy base height, fuel loads from satellite
- **Improve fire models**: Better fuel characterization across landscapes

### 3. **Implementation Strategy**
1. **Training Phase**: Use overlapping AOP-satellite data to train crosswalk models
2. **Validation**: Test on fire-affected NEON sites with pre/post data
3. **Deployment**: Apply models to satellite data for wall-to-wall coverage
4. **Continuous Improvement**: Update models with new AOP flights

### 4. **Technical Considerations**
- **Spatial resolution**: Aggregate 1m AOP to match 10-30m satellite pixels
- **Temporal alignment**: Use phenology-aware matching strategies
- **Uncertainty propagation**: Track error from AOP through satellite predictions
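
The spatial aggregation step can be sketched as a block mean, shown here for a 1 m raster averaged to 10 m cells. This assumes the AOP grid is already aligned with the satellite grid and that the cell size ratio is an integer; a real workflow would typically reproject and resample with a geospatial library instead. `block_mean` is a hypothetical helper name.

```python
import numpy as np

def block_mean(raster, factor):
    """Average non-overlapping factor x factor blocks of a 2-D raster."""
    rows, cols = raster.shape
    # Trim edge rows/columns that do not fill a complete block
    rows_trim, cols_trim = rows - rows % factor, cols - cols % factor
    trimmed = raster[:rows_trim, :cols_trim]
    blocks = trimmed.reshape(rows_trim // factor, factor, cols_trim // factor, factor)
    # nanmean skips no-data pixels stored as NaN within each block
    return np.nanmean(blocks, axis=(1, 3))

# Synthetic 20 m x 20 m tile at 1 m resolution
aop_1m = np.arange(400, dtype=float).reshape(20, 20)
aop_10m = block_mean(aop_1m, factor=10)
print(aop_10m.shape)
```

A plain mean suits reflectance and continuous indices; for metrics like canopy cover, computing the metric within each block from the 1 m data is usually more meaningful than averaging a precomputed value.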

This crosswalk approach combines NEON's data quality with satellites' spatial coverage for comprehensive wildfire risk assessment.