# Historical Event Validation with AORC Precipitation and USGS Gauges

This notebook demonstrates a comprehensive historical flood event validation workflow using:
- AORC gridded precipitation (rain-on-grid on 2D mesh)
- USGS gauge data for boundary conditions
- Multiple validation points
- HUC12 watershed coverage analysis

**Event**: December 24-25, 2020 storm (2.72 inches; the largest event of 2020)  
**Model**: Bald Eagle Creek Multi-2D  
**Template**: Plan 06 (gridded precipitation enabled)

## Workflow Overview

1. Extract and initialize HEC-RAS project (BaldEagleCrkMulti2D)
2. Get project bounds and analyze HUC12 watershed coverage
3. Download AORC precipitation data for storm event
4. Retrieve USGS gauge data for boundary conditions
5. Clone plan and configure for storm simulation
6. Run HEC-RAS model
7. Extract modeled results and compare with observed USGS data
8. Calculate validation metrics and generate comparison plots

In [None]:
# =============================================================================
# IMPORTS AND SETUP
# =============================================================================
from pathlib import Path
import sys

# Flexible imports for development vs installed package
try:
    from ras_commander import RasExamples, init_ras_project, RasCmdr, RasPlan, ras
    from ras_commander.hdf import HdfProject, HdfMesh, HdfResultsXsec, HdfResultsMesh
    from ras_commander.precip import PrecipAorc
    from ras_commander.usgs import (
        get_gauge_metadata,
        retrieve_flow_data,
        retrieve_stage_data,
        align_timeseries,
        calculate_all_metrics,
        plot_timeseries_comparison,
        plot_scatter_comparison,
        configure_rate_limit
    )
except ImportError:
    notebook_dir = Path.cwd()
    parent_directory = notebook_dir.parent
    sys.path.insert(0, str(parent_directory))
    from ras_commander import RasExamples, init_ras_project, RasCmdr, RasPlan, ras
    from ras_commander.hdf import HdfProject, HdfMesh, HdfResultsXsec, HdfResultsMesh
    from ras_commander.precip import PrecipAorc
    from ras_commander.usgs import (
        get_gauge_metadata,
        retrieve_flow_data,
        retrieve_stage_data,
        align_timeseries,
        calculate_all_metrics,
        plot_timeseries_comparison,
        plot_scatter_comparison,
        configure_rate_limit
    )

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point

print("Imports successful")

In [None]:
# =============================================================================
# API KEY SETUP
# =============================================================================
# USGS API key provides higher rate limits (5 req/sec vs 0.2 req/sec)
api_key_file = Path("usgs_api_key.txt")

if api_key_file.exists():
    usgs_api_key = api_key_file.read_text().strip()
    configure_rate_limit(requests_per_second=5.0)
    print(f"USGS API key loaded - configured rate limit: 5.0 req/sec")
else:
    usgs_api_key = None
    print("No API key file found - using public access (0.2 req/sec)")
    print("To get higher rate limits, create usgs_api_key.txt with your USGS API key")
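Rate limiting itself is handled inside ras_commander by `configure_rate_limit`, whose internals are not shown here. As a generic illustration only, assuming a simple minimum-interval scheme, the sketch below shows what a requests-per-second limit means in practice: 0.2 req/sec spaces requests 5 seconds apart, and 5 req/sec spaces them 0.2 seconds apart. `MinIntervalThrottle` is a hypothetical helper, not a library class.

```python
import time


class MinIntervalThrottle:
    """Space successive calls at least 1 / requests_per_second seconds apart.

    Hypothetical helper for illustration; not part of ras_commander.
    """

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next request is allowed, then record the call time."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# Public access (0.2 req/sec) -> one request every 5 s; API key (5 req/sec) -> every 0.2 s
print(MinIntervalThrottle(0.2).min_interval)
print(MinIntervalThrottle(5.0).min_interval)
```

The clock and sleep functions are injectable so the spacing logic can be checked without real waiting; a production client would also need to handle HTTP 429 responses, which this sketch does not.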

In [None]:
# =============================================================================
# PARAMETERS
# =============================================================================
PROJECT_NAME = "BaldEagleCrkMulti2D"
TEMPLATE_PLAN = "06"  # Plan with gridded precipitation enabled
STORM_DATE = "20201224"  # December 24-25, 2020 storm

# Simulation period: 48h warm-up before the Dec 24 storm, the event, then recession to Dec 27 00:00
SIM_START = "2020-12-22"
SIM_END = "2020-12-27"

# USGS Gauges in Bald Eagle Creek watershed
# 01547500 - Bald Eagle Creek at Blanchard, PA (upstream BC)
# 01548005 - Bald Eagle Creek near Beech Creek Station, PA (validation point + downstream BC)
UPSTREAM_GAUGE = "01547500"
VALIDATION_GAUGE = "01548005"

# HEC-RAS version
RAS_VERSION = "6.6"

print(f"Project: {PROJECT_NAME}")
print(f"Template Plan: {TEMPLATE_PLAN}")
print(f"Storm Date: {STORM_DATE}")
print(f"Simulation Period: {SIM_START} to {SIM_END}")
print(f"Upstream BC Gauge: {UPSTREAM_GAUGE}")
print(f"Validation Gauge: {VALIDATION_GAUGE}")
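The lengths of the warm-up and recession windows follow directly from the dates. A quick check with the standard library, taking the storm window as Dec 24 00:00 to Dec 26 00:00 (the `storm_start`/`storm_end` used in the upstream BC plot cell), gives 48 hours of warm-up but only 24 hours of recession; SIM_END would need to move to Dec 28 for a 48-hour recession.

```python
from datetime import datetime

# Dates from the parameters cell; the storm bounds match storm_start/storm_end
# in the upstream BC plot cell (Dec 24 00:00 to Dec 26 00:00)
sim_start = datetime(2020, 12, 22)
sim_end = datetime(2020, 12, 27)
storm_start = datetime(2020, 12, 24)
storm_end = datetime(2020, 12, 26)

# Convert the timedelta differences to hours
warmup_h = (storm_start - sim_start).total_seconds() / 3600
recession_h = (sim_end - storm_end).total_seconds() / 3600

print(f"Warm-up before storm:  {warmup_h:.0f} h")
print(f"Recession after storm: {recession_h:.0f} h")
```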

In [None]:
# =============================================================================
# EXTRACT AND INITIALIZE PROJECT
# =============================================================================
# Check if project with suffix already exists
suffix = "914_historical"
expected_folder = Path.cwd() / "example_projects" / f"{PROJECT_NAME}_{suffix}"

if expected_folder.exists():
    print(f"Project folder already exists: {expected_folder}")
    project_folder = expected_folder
else:
    # Extract project with unique suffix for this analysis
    project_folder = RasExamples.extract_project(PROJECT_NAME, suffix=suffix)
    print(f"Project extracted to: {project_folder}")

# Initialize project
ras = init_ras_project(project_folder, RAS_VERSION)
print(f"Project initialized")

# Display available plans
print(f"\nAvailable plans:")
print(ras.plan_df[['plan_number', 'Plan Title']].to_string())

In [None]:
# =============================================================================
# GET PROJECT BOUNDS FOR AORC DOWNLOAD
# =============================================================================
# Find geometry HDF file
geom_hdf_files = list(project_folder.glob("*.g*.hdf"))
print(f"Found geometry HDF files: {[f.name for f in geom_hdf_files]}")

# Use the highest-numbered geometry HDF (the glob already restricts matches to *.g*.hdf).
# This assumes the latest geometry is the one containing the 2D flow areas; verify if unsure.
geom_hdf = sorted(geom_hdf_files)[-1] if geom_hdf_files else None

if geom_hdf is None:
    raise FileNotFoundError("No geometry HDF file found")

print(f"\nUsing geometry HDF: {geom_hdf.name}")

# Get project bounds in WGS84 for AORC download
# Use 50% buffer to capture upstream precipitation areas
bounds = HdfProject.get_project_bounds_latlon(
    geom_hdf, 
    buffer_percent=50.0,
    project_crs="EPSG:26918"  # UTM Zone 18N for Pennsylvania
)

print(f"\nProject bounds (WGS84 with 50% buffer):")
print(f"  West:  {bounds[0]:.4f}")
print(f"  South: {bounds[1]:.4f}")
print(f"  East:  {bounds[2]:.4f}")
print(f"  North: {bounds[3]:.4f}")
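How much `buffer_percent=50.0` enlarges the download box depends on how `HdfProject.get_project_bounds_latlon` applies it, which is not shown here. Assuming it pads each side by that percentage of the box's width and height (an assumption), a 50% buffer doubles both dimensions and so quadruples the AORC request area. `buffer_bounds` is a hypothetical helper:

```python
def buffer_bounds(bounds, buffer_percent):
    """Expand (west, south, east, north) by buffer_percent of the box size on each side.

    Hypothetical helper: the padding rule used by
    HdfProject.get_project_bounds_latlon may differ.
    """
    west, south, east, north = bounds
    dx = (east - west) * buffer_percent / 100.0
    dy = (north - south) * buffer_percent / 100.0
    return (west - dx, south - dy, east + dx, north + dy)


# Illustrative 1.0 x 0.5 degree box (not the real project bounds)
print(buffer_bounds((-78.0, 40.5, -77.0, 41.0), 50.0))
# (-78.5, 40.25, -76.5, 41.25)
```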

In [None]:
# =============================================================================
# DOWNLOAD HUC12 WATERSHED FOR COVERAGE ANALYSIS
# =============================================================================
try:
    from pygeohydro import WBD
    
    # Get mesh areas for centroid calculation
    mesh_areas = HdfMesh.get_mesh_areas(geom_hdf)
    print(f"Found {len(mesh_areas)} 2D flow areas")
    
    # Calculate centroid of 2D mesh in WGS84
    mesh_wgs84 = mesh_areas.to_crs("EPSG:4326")
    centroid = mesh_wgs84.geometry.unary_union.centroid
    print(f"\nModel centroid: {centroid.y:.4f}N, {abs(centroid.x):.4f}W")
    
    # Download HUC12 watershed containing centroid
    wbd = WBD("huc12")
    huc12 = wbd.bygeom(Point(centroid.x, centroid.y), geo_crs="EPSG:4326")
    
    print(f"\nHUC12 Watershed:")
    print(f"  HUC ID: {huc12.iloc[0]['huc12']}")
    print(f"  Name: {huc12.iloc[0]['name']}")
    print(f"  Area: {huc12.iloc[0]['areasqkm']:.1f} sq km ({huc12.iloc[0]['areasqkm'] * 0.386102:.1f} sq mi)")
    
    HUC12_AVAILABLE = True
    
except ImportError:
    print("pygeohydro not available - skipping HUC12 analysis")
    print("Install with: pip install pygeohydro")
    HUC12_AVAILABLE = False
except Exception as e:
    print(f"Error downloading HUC12: {e}")
    HUC12_AVAILABLE = False

In [None]:
# =============================================================================
# CALCULATE DRAINAGE COVERAGE
# =============================================================================
if HUC12_AVAILABLE and 'huc12' in dir():
    # Use equal-area projection for accurate area calculations
    # EPSG:5070 (Albers Equal Area Conic) is standard for US applications
    equal_area_crs = "EPSG:5070"
    
    # Project mesh areas to equal-area CRS for accurate area calculation
    mesh_areas_proj = mesh_areas.to_crs(equal_area_crs)
    mesh_total_area_sqkm = mesh_areas_proj.geometry.area.sum() / 1e6  # m^2 to km^2
    
    # Use the area attribute published with the WBD data rather than recomputing it
    huc12_area_sqkm = huc12.iloc[0]['areasqkm']
    
    # Coverage as a simple area ratio; this overstates coverage if the mesh extends beyond the HUC12 boundary
    coverage_pct = (mesh_total_area_sqkm / huc12_area_sqkm) * 100
    unmodeled_area_sqkm = huc12_area_sqkm - mesh_total_area_sqkm
    
    print("Drainage Coverage Analysis:")
    print(f"  2D Mesh Area: {mesh_total_area_sqkm:.2f} sq km ({mesh_total_area_sqkm * 0.386102:.2f} sq mi)")
    print(f"  HUC12 Area: {huc12_area_sqkm:.2f} sq km ({huc12_area_sqkm * 0.386102:.2f} sq mi)")
    print(f"  Coverage: {coverage_pct:.1f}%")
    print(f"  Unmodeled Area: {unmodeled_area_sqkm:.2f} sq km ({unmodeled_area_sqkm * 0.386102:.2f} sq mi)")
    
    if coverage_pct < 80:
        print(f"\nWARNING: Model covers only {coverage_pct:.1f}% of HUC12 drainage area")
        print("  Expect validation discrepancies due to unmodeled runoff contributions")
else:
    print("HUC12 not available - skipping coverage analysis")
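The coverage cell divides total mesh area by HUC12 area, so a mesh that extends past the watershed boundary still counts toward "coverage". Measuring the overlap instead avoids that. A sketch with shapely (`overlap_coverage_pct` is a hypothetical helper), assuming both geometries are already in an equal-area projected CRS such as EPSG:5070:

```python
from shapely.geometry import box
from shapely.ops import unary_union


def overlap_coverage_pct(mesh_polygons, watershed_polygon):
    """Percent of the watershed area actually overlapped by the mesh polygons.

    Geometries must share an equal-area projected CRS (e.g. EPSG:5070).
    """
    mesh_union = unary_union(list(mesh_polygons))
    overlap = mesh_union.intersection(watershed_polygon).area
    return 100.0 * overlap / watershed_polygon.area


# Toy geometries in metres: a 10 km x 10 km watershed and a mesh that
# straddles its eastern edge, with half of the mesh outside the watershed
watershed = box(0, 0, 10_000, 10_000)
mesh = [box(5_000, 0, 15_000, 10_000)]

ratio_pct = 100.0 * sum(p.area for p in mesh) / watershed.area
print(f"Area ratio:       {ratio_pct:.0f}%")
print(f"Overlap coverage: {overlap_coverage_pct(mesh, watershed):.0f}%")
```

Applied to the notebook's data this would be `overlap_coverage_pct(mesh_areas_proj.geometry, huc12.to_crs(equal_area_crs).geometry.iloc[0])`.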

In [None]:
# =============================================================================
# CREATE COVERAGE FIGURE
# =============================================================================
if HUC12_AVAILABLE and 'huc12' in dir():
    # Use projected CRS for proper visualization
    vis_crs = "EPSG:5070"  # Albers Equal Area Conic
    
    fig, ax = plt.subplots(figsize=(12, 10))
    
    # Project all data to visualization CRS
    huc12_proj = huc12.to_crs(vis_crs)
    mesh_areas_vis = mesh_areas.to_crs(vis_crs)
    
    # Plot HUC12 boundary
    huc12_proj.plot(ax=ax, facecolor='lightblue', edgecolor='blue', linewidth=2, alpha=0.3, label='HUC12 Watershed')
    
    # Plot 2D mesh areas
    mesh_areas_vis.plot(ax=ax, facecolor='green', edgecolor='darkgreen', linewidth=1, alpha=0.5, label='2D Flow Areas')
    
    # Add USGS gauge locations if we can retrieve them
    try:
        upstream_meta = get_gauge_metadata(UPSTREAM_GAUGE)
        validation_meta = get_gauge_metadata(VALIDATION_GAUGE)
        
        # Create gauge points in WGS84
        gauge_points = gpd.GeoDataFrame([
            {'site_id': UPSTREAM_GAUGE, 'name': upstream_meta['station_name'], 
             'geometry': Point(upstream_meta['longitude'], upstream_meta['latitude']), 'role': 'Upstream BC'},
            {'site_id': VALIDATION_GAUGE, 'name': validation_meta['station_name'],
             'geometry': Point(validation_meta['longitude'], validation_meta['latitude']), 'role': 'Validation'}
        ], crs="EPSG:4326")
        
        # Transform to visualization CRS
        gauge_points_proj = gauge_points.to_crs(vis_crs)
        
        # Plot gauges with different colors
        for idx, row in gauge_points_proj.iterrows():
            color = 'red' if row['role'] == 'Upstream BC' else 'orange'
            ax.scatter(row.geometry.x, row.geometry.y, c=color, s=150, marker='^', 
                      edgecolors='black', linewidths=1.5, zorder=5)
            ax.annotate(f"{row['site_id']}\n({row['role']})", 
                       (row.geometry.x, row.geometry.y), 
                       xytext=(10, 10), textcoords='offset points', fontsize=8)
        
    except Exception as e:
        print(f"Could not add gauge locations: {e}")
    
    ax.set_title(f"Drainage Coverage Analysis\n{PROJECT_NAME} - HUC12: {huc12.iloc[0]['huc12']}", fontsize=14)
    ax.set_xlabel('Easting (m)')
    ax.set_ylabel('Northing (m)')
    
    # Add coverage annotation
    ax.annotate(f"Coverage: {coverage_pct:.1f}%\nUnmodeled: {unmodeled_area_sqkm:.1f} sq km",
               xy=(0.02, 0.98), xycoords='axes fraction', fontsize=10,
               verticalalignment='top', bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
    
    plt.tight_layout()
    plt.savefig(project_folder / "coverage_analysis.png", dpi=150)
    plt.show()
    
    print(f"\nCoverage figure saved to: {project_folder / 'coverage_analysis.png'}")
else:
    print("Skipping coverage figure (HUC12 not available)")

In [None]:
# =============================================================================
# DOWNLOAD AORC PRECIPITATION DATA
# =============================================================================
# Create precipitation folder
precip_folder = project_folder / "Precipitation"
precip_folder.mkdir(exist_ok=True)

aorc_file = precip_folder / f"storm_{STORM_DATE}.nc"

print(f"Downloading AORC precipitation data...")
print(f"  Start: {SIM_START} 00:00")
print(f"  End: {SIM_END} 00:00")
print(f"  Output: {aorc_file}")

try:
    output_path = PrecipAorc.download(
        bounds=bounds,
        start_time=f"{SIM_START} 00:00",
        end_time=f"{SIM_END} 00:00",
        output_path=aorc_file,
        target_crs="EPSG:5070",  # SHG (Standard Hydrologic Grid) for HEC-RAS
        resolution=2000.0  # 2km resolution (standard SHG)
    )
    print(f"\nAORC data downloaded successfully: {output_path}")
    
    # Get file size
    file_size_mb = output_path.stat().st_size / (1024 * 1024)
    print(f"  File size: {file_size_mb:.2f} MB")
    
except Exception as e:
    print(f"\nError downloading AORC data: {e}")
    print("Continuing with workflow - manual precipitation setup may be needed")
    aorc_file = None
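Once the NetCDF is on disk, summing it over time is a cheap sanity check against the 2.72-inch event total quoted at the top. The sketch works on a plain NumPy array and assumes the values are hourly depths in millimetres (check the variable's `units` attribute in the downloaded file). Reading the file (for example with xarray) and the variable name are left out because they depend on what `PrecipAorc.download` writes; `storm_total_inches` is a hypothetical helper. Because the grid covers the 50%-buffered box, its mean is not a basin-average depth.

```python
import numpy as np


def storm_total_inches(precip_mm, time_axis=0):
    """Sum an hourly precipitation grid (time, y, x) in mm and convert to inches.

    NaN values are treated as zero precipitation. Returns the per-cell storm
    total (inches) and its mean over all grid cells.
    """
    total_mm = np.nansum(precip_mm, axis=time_axis)
    total_in = total_mm / 25.4  # 25.4 mm per inch
    return total_in, float(np.mean(total_in))


# Synthetic 48-hour grid of 3 x 4 cells at a uniform 1.27 mm/h (stands in for AORC data)
hourly = np.full((48, 3, 4), 1.27)
per_cell, grid_mean = storm_total_inches(hourly)
print(f"Grid-mean storm total: {grid_mean:.2f} in")
```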

In [None]:
# =============================================================================
# RETRIEVE USGS UPSTREAM BOUNDARY CONDITION DATA
# =============================================================================
print(f"Retrieving upstream BC flow data from gauge {UPSTREAM_GAUGE}...")

# Get gauge metadata
upstream_meta = get_gauge_metadata(UPSTREAM_GAUGE)
print(f"  Station: {upstream_meta['station_name']}")
print(f"  Drainage area: {upstream_meta.get('drainage_area_sqmi', 'N/A')} sq mi")

# Retrieve flow data for simulation period
upstream_flow = retrieve_flow_data(
    site_id=UPSTREAM_GAUGE,
    start_datetime=SIM_START,
    end_datetime=SIM_END,
    data_type='iv'  # Instantaneous values
)

# Drop the timezone so timestamps are naive like HEC-RAS times (this keeps the wall-clock time of the zone the data was returned in)
upstream_flow['datetime'] = pd.to_datetime(upstream_flow['datetime']).dt.tz_localize(None)

print(f"\nRetrieved {len(upstream_flow)} records")
print(f"  Period: {upstream_flow['datetime'].min()} to {upstream_flow['datetime'].max()}")
print(f"  Flow range: {upstream_flow['value'].min():.0f} to {upstream_flow['value'].max():.0f} cfs")
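`tz_localize(None)` removes the timezone but keeps the wall-clock time in whatever zone the data arrived in. If `retrieve_flow_data` returns UTC timestamps and the model runs in Eastern Standard Time (both assumptions to verify for this setup), the boundary hydrograph would be shifted by 5 hours. A minimal sketch that converts to a fixed UTC offset before dropping the zone (`to_model_local_time` is a hypothetical helper):

```python
from datetime import timedelta, timezone

import pandas as pd


def to_model_local_time(times, model_utc_offset_hours=-5):
    """Return timezone-naive timestamps in the model's fixed local standard time.

    tz-aware input is converted to the fixed offset before the zone is dropped;
    naive input is assumed to already be in model time and is returned as-is.
    """
    dt = pd.to_datetime(pd.Series(times))
    if dt.dt.tz is None:
        return dt
    fixed_zone = timezone(timedelta(hours=model_utc_offset_hours))
    return dt.dt.tz_convert(fixed_zone).dt.tz_localize(None)


utc_times = pd.to_datetime(["2020-12-25 05:00", "2020-12-25 06:00"]).tz_localize("UTC")
local = to_model_local_time(utc_times)
print(local.dt.strftime("%Y-%m-%d %H:%M").tolist())  # ['2020-12-25 00:00', '2020-12-25 01:00']
```

A fixed offset is used rather than a named zone such as America/New_York so that model time does not jump at daylight-saving transitions; the same conversion would apply to `validation_flow` and `validation_stage`.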

In [None]:
# =============================================================================
# RETRIEVE USGS VALIDATION DATA
# =============================================================================
print(f"Retrieving validation data from gauge {VALIDATION_GAUGE}...")

# Get gauge metadata
try:
    validation_meta = get_gauge_metadata(VALIDATION_GAUGE)
    print(f"  Station: {validation_meta['station_name']}")
    print(f"  Drainage area: {validation_meta.get('drainage_area_sqmi', 'N/A')} sq mi")
except Exception as e:
    print(f"  Could not retrieve metadata: {e}")
    validation_meta = {'station_name': f'USGS {VALIDATION_GAUGE}'}

# Retrieve flow data for validation
try:
    validation_flow = retrieve_flow_data(
        site_id=VALIDATION_GAUGE,
        start_datetime=SIM_START,
        end_datetime=SIM_END,
        data_type='iv'
    )
    
    # Remove timezone
    validation_flow['datetime'] = pd.to_datetime(validation_flow['datetime']).dt.tz_localize(None)
    
    print(f"\nRetrieved {len(validation_flow)} flow records")
    print(f"  Period: {validation_flow['datetime'].min()} to {validation_flow['datetime'].max()}")
    print(f"  Flow range: {validation_flow['value'].min():.0f} to {validation_flow['value'].max():.0f} cfs")
    
except Exception as e:
    print(f"Error retrieving validation flow data: {e}")
    validation_flow = None

# Retrieve stage data for validation
try:
    validation_stage = retrieve_stage_data(
        site_id=VALIDATION_GAUGE,
        start_datetime=SIM_START,
        end_datetime=SIM_END,
        data_type='iv'
    )
    
    # Remove timezone
    validation_stage['datetime'] = pd.to_datetime(validation_stage['datetime']).dt.tz_localize(None)
    
    print(f"\nRetrieved {len(validation_stage)} stage records")
    print(f"  Stage range: {validation_stage['value'].min():.2f} to {validation_stage['value'].max():.2f} ft")
    
except Exception as e:
    print(f"Stage data not available for gauge {VALIDATION_GAUGE}: {e}")
    validation_stage = None

In [None]:
# =============================================================================
# PLOT UPSTREAM BC HYDROGRAPH
# =============================================================================
fig, ax = plt.subplots(figsize=(12, 5))

ax.plot(upstream_flow['datetime'], upstream_flow['value'], 'b-', linewidth=1, label='USGS Flow')
ax.fill_between(upstream_flow['datetime'], 0, upstream_flow['value'], alpha=0.2)

# Highlight storm period
storm_start = pd.Timestamp('2020-12-24')
storm_end = pd.Timestamp('2020-12-26')
ax.axvspan(storm_start, storm_end, alpha=0.1, color='red', label='Storm Period')

ax.set_xlabel('Date')
ax.set_ylabel('Flow (cfs)')
ax.set_title(f'Upstream Boundary Condition: USGS {UPSTREAM_GAUGE}\n{upstream_meta["station_name"]}')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate peak flow during storm
storm_data = upstream_flow[(upstream_flow['datetime'] >= storm_start) & (upstream_flow['datetime'] <= storm_end)]
print(f"\nStorm Period Statistics ({storm_start.strftime('%Y-%m-%d')} to {storm_end.strftime('%Y-%m-%d')})")
if storm_data.empty:
    print("  No upstream flow records fall within the storm period")
else:
    print(f"  Peak flow: {storm_data['value'].max():.0f} cfs")
    print(f"  Peak time: {storm_data.loc[storm_data['value'].idxmax(), 'datetime']}")

In [None]:
# =============================================================================
# CLONE PLAN FOR STORM SIMULATION
# =============================================================================
print(f"Cloning plan {TEMPLATE_PLAN} for storm simulation...")

# Clone the template plan
new_plan = RasPlan.clone_plan(
    TEMPLATE_PLAN,
    new_plan_shortid="storm12",
    ras_object=ras
)

print(f"Created new plan: {new_plan}")

# Re-initialize to pick up new plan
ras = init_ras_project(project_folder, RAS_VERSION)

# Display updated plans
print(f"\nUpdated plan list:")
print(ras.plan_df[['plan_number', 'Plan Title']].to_string())

In [None]:
# =============================================================================
# UPDATE SIMULATION DATES IN PLAN FILE
# =============================================================================
import re

# Locate the cloned plan file by its exact extension (e.g. ".p07"). The glob only
# matches names ending in that extension, so the .prj file, the plan HDF output and
# compute-message files (e.g. ".p07.computeMsgs.txt") are excluded.
plan_file = next(project_folder.glob(f"*.p{new_plan}"), None)

print(f"Plan file: {plan_file}")

if plan_file is not None:
    content = plan_file.read_text()
    
    # Build the date line from SIM_START/SIM_END
    # Format: Simulation Date=22DEC2020,0000,27DEC2020,0000
    # (%b yields English month abbreviations under an English/C locale)
    start_str = pd.Timestamp(SIM_START).strftime('%d%b%Y').upper()
    end_str = pd.Timestamp(SIM_END).strftime('%d%b%Y').upper()
    new_date_line = f"Simulation Date={start_str},0000,{end_str},0000"
    
    # Replace the existing simulation date line; write only if exactly one was found
    content, n_replaced = re.subn(r'Simulation Date=.*', new_date_line, content)
    
    if n_replaced == 1:
        plan_file.write_text(content)
        print(f"Updated simulation dates: {new_date_line}")
    else:
        print(f"Warning: expected one 'Simulation Date=' line, found {n_replaced}; plan not modified")
else:
    print(f"Warning: no plan file with extension .p{new_plan} found in {project_folder}")
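Reading the line back is a cheap way to confirm that the plan window matches SIM_START, SIM_END and the AORC download before running. A minimal parser sketch: `parse_ras_sim_dates` is a hypothetical helper, it assumes English month abbreviations, and it treats a time of 2400 as midnight at the end of that day, a convention HEC software may use.

```python
from datetime import datetime, timedelta


def _parse_ras_datetime(date_str, hhmm):
    """Parse one HEC-RAS date/time pair such as ('22DEC2020', '0000')."""
    # 2400 denotes midnight at the end of the given day
    if hhmm == "2400":
        return datetime.strptime(date_str, "%d%b%Y") + timedelta(days=1)
    # strptime matches %b case-insensitively, so "DEC" parses as December
    return datetime.strptime(date_str + hhmm, "%d%b%Y%H%M")


def parse_ras_sim_dates(line):
    """Split 'Simulation Date=22DEC2020,0000,27DEC2020,0000' into (start, end) datetimes."""
    _, value = line.split("=", 1)
    start_date, start_hhmm, end_date, end_hhmm = [s.strip() for s in value.split(",")]
    return (_parse_ras_datetime(start_date, start_hhmm),
            _parse_ras_datetime(end_date, end_hhmm))


plan_start, plan_end = parse_ras_sim_dates("Simulation Date=22DEC2020,0000,27DEC2020,0000")
print(plan_start, plan_end)  # 2020-12-22 00:00:00 2020-12-27 00:00:00
```

With the parsed values, a check such as `plan_start <= datetime(2020, 12, 22) and plan_end >= datetime(2020, 12, 27)` flags a plan window that falls outside the downloaded precipitation.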

In [None]:
# =============================================================================
# NOTE: MANUAL CONFIGURATION STEPS
# =============================================================================
print("="*70)
print("MANUAL CONFIGURATION REQUIRED")
print("="*70)
print("")
print("The following steps require manual configuration in HEC-RAS GUI:")
print("")
print("1. PRECIPITATION DATA:")
if aorc_file and aorc_file.exists():
    print(f"   - AORC NetCDF file saved to: {aorc_file}")
print("   - In HEC-RAS: Edit > Unsteady Flow Data > Meteorological Data tab")
print("   - Set precipitation to Gridded with 'GDAL Raster File(s)' as the source")
print(f"   - Browse to: {aorc_file if aorc_file else 'Precipitation/storm_20201224.nc'}")
print("")
print("2. UPSTREAM BOUNDARY CONDITION:")
print(f"   - USGS gauge {UPSTREAM_GAUGE} data available above")
print("   - In HEC-RAS: Edit > Unsteady Flow Data")
print("   - Update upstream flow hydrograph with USGS data")
print("")
print("3. DOWNSTREAM BOUNDARY CONDITION:")
print("   - Use normal depth or stage hydrograph from downstream gauge")
print("")
print("For automated boundary condition updates, see notebook 913.")
print("="*70)

In [None]:
# =============================================================================
# RUN MODEL (Skip if manual configuration needed)
# =============================================================================
RUN_MODEL = True  # Set to False to skip execution until the manual configuration above is complete

if RUN_MODEL:
    print(f"Running plan {new_plan}...")
    
    try:
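        success = RasCmdr.compute_plan(
            plan_number=new_plan,
            num_cores=4,
            ras_object=ras
        )
        # compute_plan signals failure through its return value, not only by raising
        print("Model execution complete" if success else "Model execution reported failure - check the compute messages")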
        
        # Re-initialize to pick up results
        ras = init_ras_project(project_folder, RAS_VERSION)
        
    except Exception as e:
        print(f"Model execution failed: {e}")
        print("\nCheck that:")
        print("  1. Precipitation data is properly configured")
        print("  2. Boundary conditions are set")
        print("  3. Simulation dates match precipitation data")
else:
    print("Model execution skipped - set RUN_MODEL = True after configuration")
    print("\nAlternatively, use existing results from pre-run plan...")

In [None]:
# =============================================================================
# USE EXISTING PLAN RESULTS (If available)
# =============================================================================
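# Prefer results from the storm plan created above; fall back to any other plan
# with results for demonstration only (those results are not for this storm)
hdf_path = None
plan_number = None

plans = ras.plan_df
ordered_plans = pd.concat([plans[plans['plan_number'] == new_plan],
                           plans[plans['plan_number'] != new_plan]])

for idx, row in ordered_plans.iterrows():
    hdf_path_str = row.get('HDF_Results_Path')
    # Skip missing entries (None/NaN) before building a Path
    if pd.notna(hdf_path_str) and hdf_path_str and Path(hdf_path_str).exists():
        hdf_path = Path(hdf_path_str)
        plan_number = row['plan_number']
        print(f"Found existing results: Plan {plan_number}")
        print(f"  HDF: {hdf_path.name}")
        if plan_number != new_plan:
            print(f"  WARNING: these are not results of storm plan {new_plan}")
        break

if hdf_path is None:
    print("No existing HDF results found.")
    print("Run the model first or use an existing plan with results.")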

# =============================================================================
# VALIDATION: EXTRACT AND COMPARE MODELED VS OBSERVED DATA
# =============================================================================
# The cells below extract modeled results from the HDF file and compare them
# against observed USGS data. This section requires:
# 1. A completed HEC-RAS run with HDF results (see the "Use existing plan results" cell above)
# 2. validation_meta, defined in the "Retrieve USGS validation data" cell
# 3. validation_flow and/or validation_stage, defined in the same cell

In [None]:
# =============================================================================
# EXTRACT MODELED RESULTS
# =============================================================================
if hdf_path and hdf_path.exists():
    print(f"Extracting modeled results from {hdf_path.name}...")
    
    modeled_df = None
    
    # Try 1D cross-section data first (for 1D or 1D/2D combined models)
    try:
        print("Attempting to extract 1D cross-section results...")
        xs_data = HdfResultsXsec.get_xsec_timeseries(hdf_path)
        
        # Get list of cross sections
        xs_names = xs_data.coords['cross_section'].values.tolist()
        station_values = xs_data.coords['Station'].values.tolist()
        time_values = xs_data.coords['time'].values
        
        print(f"\nExtracted 1D cross-section data:")
        print(f"  Cross sections: {len(xs_names)}")
        print(f"  Time steps: {len(time_values)}")
        print(f"  Station range: {station_values[0]} to {station_values[-1]}")
        
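        # Pick the cross section nearest 30% of the station range above the downstream
        # end (HEC-RAS river stations increase upstream). This is a placeholder choice;
        # ideally use the cross section closest to the validation gauge.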
        target_station = float(station_values[-1]) + (float(station_values[0]) - float(station_values[-1])) * 0.3
        closest_idx = min(range(len(station_values)), key=lambda i: abs(float(station_values[i]) - target_station))
        validation_xs = xs_names[closest_idx]
        
        print(f"\nUsing cross section for validation: {validation_xs}")
        print(f"  Station: {station_values[closest_idx]}")
        
        # Extract flow timeseries
        modeled_flow_ts = xs_data['Flow'].sel(cross_section=validation_xs)
        
        # Create modeled dataframe
        modeled_df = pd.DataFrame({
            'datetime': pd.to_datetime(time_values),
            'value': modeled_flow_ts.values
        })
        
        print(f"\nModeled results (1D cross-section):")
        print(f"  Period: {modeled_df['datetime'].min()} to {modeled_df['datetime'].max()}")
        print(f"  Flow range: {modeled_df['value'].min():.1f} to {modeled_df['value'].max():.1f} cfs")
        
    except Exception as e:
        print(f"  1D cross-section data not available: {e}")
        print("\nFalling back to 2D mesh results...")
        
        # Try 2D reference lines (like cross-sections for 2D models)
        try:
            print("Attempting to extract 2D reference line results...")
            ref_lines_data = HdfResultsXsec.get_ref_lines_timeseries(hdf_path)
            
            if ref_lines_data and len(ref_lines_data.data_vars) > 0:
                # Get reference line names
                if 'ref_line' in ref_lines_data.coords:
                    ref_line_names = ref_lines_data.coords['ref_line'].values.tolist()
                    time_values = ref_lines_data.coords['time'].values
                    
                    print(f"\nExtracted 2D reference line data:")
                    print(f"  Reference lines: {len(ref_line_names)}")
                    print(f"  Time steps: {len(time_values)}")
                    
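                    # Use the first reference line in the file; HDF order is not guaranteed to be
                    # downstream-most, so select by name to match the validation gauge if needed
                    validation_ref_line = ref_line_names[0] if ref_line_names else None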
                    
                    if validation_ref_line and 'Flow' in ref_lines_data.data_vars:
                        print(f"\nUsing reference line for validation: {validation_ref_line}")
                        
                        # Extract flow timeseries
                        modeled_flow_ts = ref_lines_data['Flow'].sel(ref_line=validation_ref_line)
                        
                        # Create modeled dataframe
                        modeled_df = pd.DataFrame({
                            'datetime': pd.to_datetime(time_values),
                            'value': modeled_flow_ts.values
                        })
                        
                        print(f"\nModeled results (2D reference line):")
                        print(f"  Period: {modeled_df['datetime'].min()} to {modeled_df['datetime'].max()}")
                        print(f"  Flow range: {modeled_df['value'].min():.1f} to {modeled_df['value'].max():.1f} cfs")
                    else:
                        print("  No flow data available in reference lines")
                else:
                    print("  No reference lines found in HDF file")
            else:
                print("  Reference line data is empty")
                
        except Exception as e2:
            print(f"  2D reference line data not available: {e2}")
            print("\nFalling back to 2D mesh cell results near gauge location...")
            
            # Try 2D mesh cell data near validation gauge
            try:
                # Get validation gauge location
                if validation_meta and 'latitude' in validation_meta and 'longitude' in validation_meta:
                    gauge_lat = validation_meta['latitude']
                    gauge_lon = validation_meta['longitude']
                    gauge_point = Point(gauge_lon, gauge_lat)
                    
                    print(f"\nValidation gauge location: {gauge_lat:.4f}N, {abs(gauge_lon):.4f}W")
                    
                    # Get mesh cell points and find nearest to gauge
                    cell_points = HdfMesh.get_mesh_cell_points(geom_hdf)
                    
                    # Transform gauge point to mesh CRS
                    mesh_crs = cell_points.crs
                    gauge_gdf = gpd.GeoDataFrame([1], geometry=[gauge_point], crs="EPSG:4326")
                    gauge_proj = gauge_gdf.to_crs(mesh_crs)
                    gauge_point_proj = gauge_proj.geometry.iloc[0]
                    
                    # Find nearest cell
                    cell_id, distance = HdfMesh.find_nearest_cell(gauge_point_proj, cell_points)
                    
                    if cell_id is not None:
                        print(f"  Nearest mesh cell: {cell_id} (distance: {distance:.1f} m)")
                        
                        # Get mesh area names
                        mesh_names = HdfMesh.get_mesh_area_names(hdf_path)
                        if mesh_names:
                            mesh_name = mesh_names[0]  # Use first mesh area
                            
                            # Extract the water surface timeseries at this cell. This supports stage
                            # comparison only; 2D flow must come from reference lines or face velocities.
                            print(f"  Extracting water surface timeseries from mesh: {mesh_name}")
                            
                            # Get all mesh cell timeseries
                            mesh_cells_data = HdfResultsMesh.get_mesh_cells_timeseries(
                                hdf_path, 
                                mesh_names=[mesh_name],
                                var="Water Surface"
                            )
                            
                            if mesh_name in mesh_cells_data:
                                wse_data = mesh_cells_data[mesh_name]['Water Surface']
                                
                                # Extract timeseries for the specific cell
                                if cell_id < wse_data.sizes.get('cell_id', 0):
                                    cell_wse = wse_data.sel(cell_id=cell_id)
                                    time_values = cell_wse.coords['time'].values
                                    
                                    # Keep WSE out of modeled_df so it is never scored against
                                    # observed flow. USGS stage is normally gage height above the
                                    # gage datum, so add the datum elevation before comparing.
                                    modeled_wse_df = pd.DataFrame({
                                        'datetime': pd.to_datetime(time_values),
                                        'value': cell_wse.values  # WSE in feet
                                    })
                                    
                                    print(f"\nModeled results (2D mesh cell - Water Surface):")
                                    print(f"  Period: {modeled_wse_df['datetime'].min()} to {modeled_wse_df['datetime'].max()}")
                                    print(f"  WSE range: {modeled_wse_df['value'].min():.2f} to {modeled_wse_df['value'].max():.2f} ft")
                                    print("\nNOTE: Extracted water surface elevation, not flow (stored as modeled_wse_df).")
                                    print("      For flow validation, use reference lines or calculate from face velocities.")
                                else:
                                    print(f"  Cell ID {cell_id} not found in mesh data")
                            else:
                                print(f"  No data available for mesh: {mesh_name}")
                        else:
                            print("  No mesh areas found in HDF file")
                    else:
                        print("  Could not find nearest cell to gauge location")
                else:
                    print("  Validation gauge location not available")
                    
            except Exception as e3:
                print(f"  2D mesh cell extraction failed: {e3}")
                print(f"\nUnable to extract modeled results from HDF file.")
                print(f"  This may be a 2D-only model without 1D cross-sections or reference lines.")
                print(f"  Consider using HEC-RAS to add a reference line at the validation gauge location.")
                modeled_df = None
    
    if modeled_df is None:
        print("\n" + "="*70)
        print("WARNING: Could not extract flow timeseries from model results.")
        print("="*70)
        print("Options:")
        print("  1. Add a reference line in HEC-RAS at the validation gauge location")
        print("  2. Use stage validation instead of flow (extract WSE from mesh cells)")
        print("  3. Calculate flow from face velocities if reference lines are not available")
        print("="*70)
else:
    print("No HDF results available for extraction")
    modeled_df = None

In [None]:
# =============================================================================
# ALIGN AND CALCULATE FLOW METRICS
# =============================================================================
if modeled_df is not None and validation_flow is not None:
    print("Aligning modeled and observed flow timeseries...")
    
    # Align timeseries
    aligned_flow = align_timeseries(
        modeled_df=modeled_df,
        observed_df=validation_flow
    )
    
    print(f"\nAligned {len(aligned_flow)} timesteps")
    print(f"  Period: {aligned_flow['datetime'].min()} to {aligned_flow['datetime'].max()}")
    print(f"  Modeled range: {aligned_flow['modeled'].min():.1f} to {aligned_flow['modeled'].max():.1f} cfs")
    print(f"  Observed range: {aligned_flow['observed'].min():.1f} to {aligned_flow['observed'].max():.1f} cfs")
    
    # Calculate validation metrics
    # Note: calculate_all_metrics expects (observed, modeled) order
    print("\nCalculating flow validation metrics...")
    flow_metrics = calculate_all_metrics(
        observed=aligned_flow['observed'],
        modeled=aligned_flow['modeled'],
        time_index=aligned_flow['datetime']
    )
    
    print("\n" + "="*60)
    print("FLOW VALIDATION METRICS")
    print("="*60)
    print(f"Nash-Sutcliffe Efficiency (NSE): {flow_metrics['nse']:.3f}")
    print(f"  Interpretation: {'Satisfactory or better' if flow_metrics['nse'] > 0.5 else 'Unsatisfactory'} (>0.5 = satisfactory)")
    print(f"\nKling-Gupta Efficiency (KGE): {flow_metrics['kge']:.3f}")
    print(f"  Interpretation: {'Satisfactory or better' if flow_metrics['kge'] > 0.5 else 'Unsatisfactory'} (>0.5 = satisfactory)")
    print(f"\nPeak Flow Error: {flow_metrics['peak_error_pct']:.1f}%")
    print(f"Volume Error: {flow_metrics['vol_error_pct']:.1f}%")
    print(f"RMSE: {flow_metrics['rmse']:.1f} cfs")
else:
    print("Cannot calculate metrics - missing modeled or observed data")
    aligned_flow = None
    flow_metrics = None
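
`align_timeseries` handles the pairing step above. If you want to check what that pairing does, or reproduce it outside `ras_commander`, one plausible approach is a nearest-timestamp join with a tolerance. The sketch below uses `pandas.merge_asof`; the helper name `align_nearest` and the 30-minute tolerance are illustrative assumptions, not the library's implementation. Both inputs are assumed to have `datetime` and `value` columns, and both `datetime` columns must use the same timezone handling (both naive or both tz-aware), otherwise `merge_asof` raises.

```python
import pandas as pd


def align_nearest(modeled_df: pd.DataFrame, observed_df: pd.DataFrame,
                  tolerance: str = "30min") -> pd.DataFrame:
    """Pair each observed timestamp with the nearest modeled value (hypothetical helper).

    Observed rows with no modeled value within `tolerance` are dropped.
    """
    obs = (observed_df[["datetime", "value"]]
           .rename(columns={"value": "observed"})
           .sort_values("datetime"))
    mod = (modeled_df[["datetime", "value"]]
           .rename(columns={"value": "modeled"})
           .sort_values("datetime"))
    merged = pd.merge_asof(obs, mod, on="datetime", direction="nearest",
                           tolerance=pd.Timedelta(tolerance))
    return merged.dropna(subset=["observed", "modeled"]).reset_index(drop=True)


# Example: irregular model output against hourly observations
observed = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-12-25 00:00", "2020-12-25 01:00",
                                "2020-12-25 02:00", "2020-12-25 05:00"]),
    "value": [10.0, 20.0, 30.0, 40.0],
})
modeled = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-12-25 00:10", "2020-12-25 01:05",
                                "2020-12-25 02:20"]),
    "value": [11.0, 19.0, 33.0],
})
paired = align_nearest(modeled, observed)
print(paired)  # the 05:00 observation has no model value within 30 min, so it is dropped
```

Joining onto the observed timestamps keeps the metrics on the gauge's own time base; a tighter tolerance is safer when the model output interval is short.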

In [None]:
# =============================================================================
# PLOT FLOW COMPARISON
# =============================================================================
if aligned_flow is not None:
    # plot_timeseries_comparison expects aligned_data DataFrame, not separate arrays
    fig = plot_timeseries_comparison(
        aligned_data=aligned_flow,
        metrics=flow_metrics,
        title=f"Flow Validation: {validation_meta.get('station_name', VALIDATION_GAUGE)}"
    )
    
    plt.tight_layout()
    plt.savefig(project_folder / "flow_validation.png", dpi=150)
    plt.show()
    
    print(f"\nFigure saved to: {project_folder / 'flow_validation.png'}")
else:
    print("Cannot plot comparison - missing aligned data")

In [None]:
# =============================================================================
# PLOT SCATTER COMPARISON
# =============================================================================
if aligned_flow is not None:
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    
    # Flow scatter plot
    ax1 = axes[0]
    ax1.scatter(aligned_flow['observed'], aligned_flow['modeled'], alpha=0.5, s=10)
    
    # Add 1:1 line
    max_val = max(aligned_flow['observed'].max(), aligned_flow['modeled'].max())
    ax1.plot([0, max_val], [0, max_val], 'r--', label='1:1 Line')
    
    ax1.set_xlabel('Observed Flow (cfs)')
    ax1.set_ylabel('Modeled Flow (cfs)')
    ax1.set_title(f'Flow Scatter Plot\nNSE={flow_metrics["nse"]:.3f}, KGE={flow_metrics["kge"]:.3f}')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.set_aspect('equal')
    
    # Residuals histogram
    ax2 = axes[1]
    residuals = aligned_flow['modeled'] - aligned_flow['observed']
    ax2.hist(residuals, bins=50, edgecolor='black', alpha=0.7)
    ax2.axvline(x=0, color='r', linestyle='--', label='Zero Error')
    ax2.axvline(x=residuals.mean(), color='g', linestyle='-', label=f'Mean: {residuals.mean():.0f} cfs')
    
    ax2.set_xlabel('Residual (Modeled - Observed) [cfs]')
    ax2.set_ylabel('Count')
    ax2.set_title('Residuals Distribution')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(project_folder / "flow_scatter.png", dpi=150)
    plt.show()
    
    print(f"\nFigure saved to: {project_folder / 'flow_scatter.png'}")
else:
    print("Cannot plot scatter - missing aligned data")

## Results Summary

### Validation Interpretation

**NSE (Nash-Sutcliffe Efficiency)**:
- NSE > 0.75: Very good
- 0.65 < NSE ≤ 0.75: Good
- 0.50 < NSE ≤ 0.65: Satisfactory
- NSE ≤ 0.50: Unsatisfactory

**KGE (Kling-Gupta Efficiency)**:
- KGE > 0.75: Very good
- 0.50 < KGE ≤ 0.75: Good/Satisfactory
- KGE ≤ 0.50: Unsatisfactory
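
The ratings above depend on only a few summary statistics. As a cross-check on `calculate_all_metrics`, here is a minimal NumPy sketch of the standard definitions: NSE, KGE in its original Gupta et al. (2009) form with variability ratio α = σ_sim/σ_obs, and PBIAS. The function names are illustrative. This sketch computes PBIAS as modeled minus observed, so negative values mean underprediction; some references (and possibly `ras_commander`) use the opposite sign.

```python
import numpy as np


def nse(obs, sim) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs from its mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(obs, sim) -> float:
    """Kling-Gupta efficiency (2009 form): distance from the ideal point (r, alpha, beta) = (1, 1, 1)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def pbias(obs, sim) -> float:
    """Percent bias as modeled minus observed (negative = model underpredicts volume)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 100.0 * np.sum(sim - obs) / np.sum(obs)


def rate_nse(value: float) -> str:
    """Map NSE to the rating bands listed above."""
    if value > 0.75:
        return "Very good"
    if value > 0.65:
        return "Good"
    if value > 0.50:
        return "Satisfactory"
    return "Unsatisfactory"


def rate_kge(value: float) -> str:
    """Map KGE to the rating bands listed above."""
    if value > 0.75:
        return "Very good"
    if value > 0.50:
        return "Good/Satisfactory"
    return "Unsatisfactory"


observed = [100.0, 400.0, 900.0, 500.0, 200.0]
modeled = [120.0, 350.0, 800.0, 520.0, 180.0]
print(f"NSE={nse(observed, modeled):.3f} ({rate_nse(nse(observed, modeled))})")
print(f"KGE={kge(observed, modeled):.3f} ({rate_kge(kge(observed, modeled))})")
print(f"PBIAS={pbias(observed, modeled):.1f}%")
```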

### Expected Limitations

1. **Drainage Area Mismatch**: The 2D model may not cover the entire HUC12 watershed, leading to missing runoff contributions from unmodeled areas.

2. **USGS Gauge Location**: The validation gauge may be downstream of significant unmodeled tributaries.

3. **Precipitation Spatial Variability**: AORC data at ~1 km (30 arc-second) resolution and an hourly time step may not capture localized intense precipitation.

4. **Boundary Condition Uncertainty**: Upstream BC from USGS gauge represents only one flow path into the model.

### Recommendations

1. **Expand Model Domain**: Consider extending 2D mesh to cover more of the HUC12 watershed.

2. **Additional Validation Points**: Add more USGS gauges if available within the model domain.

3. **Sensitivity Analysis**: Test sensitivity to Manning's n, precipitation, and initial conditions.

4. **Multiple Events**: Validate against multiple storm events to assess model robustness.

In [None]:
# =============================================================================
# SUMMARY REPORT
# =============================================================================
print("="*70)
print("HISTORICAL EVENT VALIDATION SUMMARY")
print("="*70)
print(f"\nProject: {PROJECT_NAME}")
print(f"Storm Event: December 24-25, 2020")
print(f"Simulation Period: {SIM_START} to {SIM_END}")

print(f"\nData Sources:")
print("  Precipitation: AORC gridded data (~1 km, hourly)")
print(f"  Upstream BC: USGS {UPSTREAM_GAUGE}")
print(f"  Validation: USGS {VALIDATION_GAUGE}")

if HUC12_AVAILABLE and 'coverage_pct' in globals() and 'unmodeled_area_sqkm' in globals():
    print(f"\nDrainage Coverage:")
    print(f"  Model covers {coverage_pct:.1f}% of HUC12 watershed")
    print(f"  Unmodeled area: {unmodeled_area_sqkm:.1f} sq km")

if flow_metrics is not None:
    print(f"\nFlow Validation Metrics:")
    print(f"  NSE: {flow_metrics['nse']:.3f}")
    print(f"  KGE: {flow_metrics['kge']:.3f}")
    print(f"  Peak Error: {flow_metrics['peak_error_pct']:.1f}%")
    print(f"  Volume Error: {flow_metrics['vol_error_pct']:.1f}%")

print(f"\nOutput Files:")
if (project_folder / "coverage_analysis.png").exists():
    print(f"  - {project_folder / 'coverage_analysis.png'}")
if (project_folder / "flow_validation.png").exists():
    print(f"  - {project_folder / 'flow_validation.png'}")
if (project_folder / "flow_scatter.png").exists():
    print(f"  - {project_folder / 'flow_scatter.png'}")
if aorc_file and aorc_file.exists():
    print(f"  - {aorc_file}")

print("\n" + "="*70)

## Next Steps for Modeler

### Current State of Analysis

This notebook has completed the **data preparation phase** for historical event validation:

✅ **Completed**:
- HUC12 drainage coverage analysis (84.6% modeled, 15.4% ungauged)
- AORC precipitation downloaded for Storm 12 (Dec 24-25, 2020)
- USGS boundary condition data retrieved and validated
- Template plan cloned and ready for configuration
- Upstream BC hydrograph visualized and validated

⚠️ **Pending Manual Configuration**:

The model currently has:
1. **Existing upstream BC** at "Upstream Inflow" - needs USGS data update
2. **Existing downstream BC** as "Normal Depth" - needs conversion to Stage Hydrograph

**Additional BCs NOT in current model** (would require geometry file edits):
3. **Spring Creek** (USGS 01547100, 142 sq mi) - lateral inflow boundary
4. **Marsh Creek** (USGS 01547700, 44 sq mi) - lateral inflow boundary
5. **Beech Creek** (USGS 01547980, 170 sq mi) - potential lateral inflow
6. **Fishing Creek** (USGS 01548079, 180 sq mi) - potential lateral inflow

---

### Phase 1: Minimum Viable Validation (Recommended to Start)

**Use existing boundary condition locations only:**

#### 1.1 Update Existing Upstream BC with USGS Data

The "Upstream Inflow" BC already exists. Update it with USGS 01547500 flow data.

**Steps**:
1. Locate the cloned unsteady file, e.g. `BaldEagleDamBrk.u07` (the number depends on what the clone step created)
2. Find "Boundary Location=...Upstream Inflow" (around line 6)
3. Replace the flow hydrograph table with USGS data

**Use notebook cell outputs**:
- Cell 11 shows the upstream BC hydrograph plot
- The hydrograph data has been retrieved (576 hourly values)

**Automation**:
```python
from ras_commander.usgs.boundary_generation import BoundaryGenerator

# Generate flow table from USGS data
flow_table = BoundaryGenerator.generate_flow_hydrograph_table(
    flow_values=upstream_flow['value'].values,
    interval='1HOUR'
)

# Update in unsteady file
BoundaryGenerator.update_boundary_hydrograph(
    unsteady_file=unsteady_file,
    boundary_location="BaldEagleCr,Upstream Inflow",
    hydrograph_table=flow_table
)
```
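
If `BoundaryGenerator` is not available in your `ras_commander` version, the table can be written by hand. HEC-RAS unsteady files store hydrograph ordinates after a count line such as `Flow Hydrograph= 576`; the sketch below assumes 8-character, right-aligned fields with 10 values per row, which is the layout commonly seen in `.u##` files. Compare the output against an existing table in your own unsteady file before pasting it in. `format_fixed_width_table` is a hypothetical helper, not part of `ras_commander`, and the same layout assumption applies to a `Stage Hydrograph` table.

```python
def _fit(value: float, width: int) -> str:
    """Right-align a number in `width` characters, dropping decimals only as needed."""
    for decimals in (2, 1, 0):
        text = f"{value:.{decimals}f}"
        if decimals:
            text = text.rstrip("0").rstrip(".")  # 250.50 -> 250.5, 100.00 -> 100
        if len(text) <= width:
            return text.rjust(width)
    raise ValueError(f"{value!r} does not fit in {width} characters")


def format_fixed_width_table(header: str, values, per_row: int = 10, width: int = 8) -> str:
    """Return '<header>= N' followed by rows of fixed-width values (hypothetical helper)."""
    vals = [float(v) for v in values]
    lines = [f"{header}= {len(vals)}"]
    for start in range(0, len(vals), per_row):
        row = vals[start:start + per_row]
        lines.append("".join(_fit(v, width) for v in row))
    return "\n".join(lines)


# Example: the first few hourly flows (cfs) of a hydrograph
print(format_fixed_width_table("Flow Hydrograph", [100, 250.5, 0, 1234.567]))
```

Usage with the notebook's data would look like `format_fixed_width_table("Flow Hydrograph", upstream_flow['value'].values)`.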

#### 1.2 Convert Downstream BC to Stage Hydrograph

The "DSNormalDepth" BC needs to be converted from Normal Depth to Stage Hydrograph.

**Manual Steps**:
1. Open unsteady file: `BaldEagleDamBrk.u07`
2. Find "Boundary Location=...DSNormalDepth" (around line 4)
3. Find "Friction Slope=0.0003,0" (next line after boundary location)
4. Replace "Friction Slope=" line with stage hydrograph table

**Generate stage table**:
```python
from ras_commander.usgs.boundary_generation import BoundaryGenerator

stage_table = BoundaryGenerator.generate_stage_hydrograph_table(
    stage_values=validation_stage['value'].values,
    interval='1HOUR'
)

# Save to helper file for copy-paste
with open('stage_hydrograph_insert.txt', 'w') as f:
    f.write(stage_table)

print("Stage table saved to stage_hydrograph_insert.txt")
print("Copy contents and replace 'Friction Slope=' line in unsteady file")
```
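
One step the stage table above does not cover: USGS stage records (parameter 00065) are typically gauge height above the gauge datum, while a HEC-RAS stage hydrograph expects water-surface elevation in the model's vertical datum. The minimal conversion sketch below assumes you look up the gauge-datum elevation, and any NGVD29-to-NAVD88 shift your terrain requires, from the USGS site record yourself; the helper name and the hourly averaging are illustrative assumptions. Resampling matters because USGS instantaneous values are often reported every 15 minutes, while the table above is written at `1HOUR`.

```python
import pandas as pd


def gage_height_to_wse(stage_df: pd.DataFrame, datum_elev_ft: float,
                       datum_shift_ft: float = 0.0, freq: str = "1h",
                       max_gap_steps: int = 2) -> pd.Series:
    """Convert USGS gage height to water-surface elevation on a regular time step.

    stage_df: 'datetime' and 'value' (gage height, ft) columns.
    datum_elev_ft: gauge-datum elevation from the USGS site record (assumed input).
    datum_shift_ft: optional vertical-datum adjustment, e.g. NGVD29 -> NAVD88 (assumed input).
    """
    series = (stage_df.assign(datetime=pd.to_datetime(stage_df["datetime"]))
              .set_index("datetime")["value"]
              .sort_index())
    regular = series.resample(freq).mean()                             # average sub-hourly readings
    regular = regular.interpolate(method="time", limit=max_gap_steps)  # fill short gaps only
    return regular + datum_elev_ft + datum_shift_ft


# Example: five 15-minute gage heights on a gauge whose datum sits at 100 ft
readings = pd.DataFrame({
    "datetime": pd.date_range("2020-12-25 00:00", periods=5, freq="15min"),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})
wse = gage_height_to_wse(readings, datum_elev_ft=100.0)
print(wse)
```

The resulting `wse.values` could then be passed as `stage_values` instead of the raw gage heights.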

#### 1.3 Update Precipitation Source

Change from DSS to AORC NetCDF:

**Manual Steps**:
1. Open unsteady file
2. Find section starting with `Met BC=Precipitation|Mode=Gridded` (around line 141)
3. Change:
   - `Met BC=Precipitation|Gridded Source=DSS` → `Met BC=Precipitation|Gridded Source=GDAL`
   - Update file path to point to downloaded AORC NetCDF

**Automation** (if available):
```python
from ras_commander import RasUnsteady

RasUnsteady.set_gridded_precipitation(
    unsteady_file=unsteady_file,
    netcdf_path="Precipitation/storm_20201224.nc",
    interpolation="Nearest"
)
```
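
If you cannot confirm that helper exists in your installed version, the `Gridded Source` switch can be made as a plain-text edit. The sketch below rewrites the value on a `Met BC=Precipitation|<field>=` line and raises instead of guessing if that line is missing. It deliberately does not set the NetCDF path, because the exact key HEC-RAS writes for that path is not shown here; copy it from an unsteady file saved by the HEC-RAS GUI. `set_met_bc_field` and `update_unsteady_file` are hypothetical helpers.

```python
import re
from pathlib import Path


def set_met_bc_field(text: str, field: str, value: str,
                     variable: str = "Precipitation") -> str:
    """Replace the value on each 'Met BC=<variable>|<field>=' line (hypothetical helper)."""
    pattern = re.compile(
        rf"^(Met BC={re.escape(variable)}\|{re.escape(field)}=)[^\r\n]*",
        re.MULTILINE,
    )
    # A function replacement keeps backslashes in `value` (e.g. Windows paths) literal
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise KeyError(f"No 'Met BC={variable}|{field}=' line found")
    return new_text


def update_unsteady_file(path: Path, field: str, value: str) -> None:
    """Apply set_met_bc_field to a file, keeping a .bak copy of the original."""
    raw = path.read_bytes()
    path.with_name(path.name + ".bak").write_bytes(raw)
    # latin-1 round-trips every byte; working on bytes preserves CRLF line endings
    text = raw.decode("latin-1")
    path.write_bytes(set_met_bc_field(text, field, value).encode("latin-1"))
```

For example, `update_unsteady_file(Path(unsteady_file), "Gridded Source", "GDAL")` makes the first change listed in step 3.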

#### 1.4 Run Simulation and Validate

After BC configuration:
1. Open project in HEC-RAS GUI to verify BCs
2. Run simulation (set `RUN_MODEL=True` in cell 17, or run in GUI)
3. Execute cells 18-22 to extract results and calculate metrics
4. Review validation findings

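**Expected Metrics** (with ~15% of the drainage area ungauged):
- NSE: 0.40-0.65 (unsatisfactory to satisfactory on the scale above; the ungauged area limits achievable skill)
- PBIAS: -30% to 0% (underprediction expected from missing ungauged inflow; this sign assumes PBIAS = modeled minus observed and flips under the observed-minus-modeled convention)
- Peak Error: 10-30%
- Correlation: > 0.70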

---

### Phase 2: Enhanced Validation with Lateral Inflows (Future)

**Add new boundary conditions** for lateral tributaries to improve validation.

#### 2.1 Required New BCs

| Tributary | Gauge | DA (sq mi) | Data Available | Priority |
|-----------|-------|------------|----------------|----------|
| **Spring Creek** | 01547100 | 142 | Flow (IV) ✓ | **HIGH** |
| **Marsh Creek** | 01547700 | 44 | Flow (IV) ✓ | **HIGH** |
| Beech Creek | 01547980 | 170 | No data | Low |
| Fishing Creek | 01548079 | 180 | No data | Low |

- **Combined gauged inflow**: 339 (upstream BC) + 142 + 44 = **525 sq mi** (vs 562 sq mi at the downstream gauge)
- **Ungauged gap**: 562 - 525 = 37 sq mi (6.6% of downstream drainage)

#### 2.2 Geometry File Edits Required

**For each new BC**, you need to:

1. **Create SA/2D Area Conn** in geometry file:
   - Add connection line between external boundary and 2D mesh
   - Define as "SA/2D Area Conn" type
   - Specify connection cells

2. **Define External SA** (storage area):
   - Create storage area for lateral inflow
   - Connect to 2D mesh via SA/2D Area Conn

3. **Add BC Reference** in unsteady file:
   - Reference the new SA connection
   - Add flow hydrograph table

**Current Limitation**: This requires **manual geometry editing** or HEC-RAS GUI operations.

#### 2.3 Future Automation Feature (Development Roadmap)

**Proposed Function**: `RasGeometry2D.create_lateral_bc_from_gauge()` (not yet implemented)

**Concept**:
```python
# Proposed API (not yet implemented in ras_commander); shown to illustrate the design
from ras_commander.geom import RasGeometry2D

RasGeometry2D.create_lateral_bc_from_gauge(
    geom_file="BaldEagleDamBrk.g09",
    gauge_id="01547100",  # Spring Creek
    gauge_lat=40.9158,
    gauge_lon=-77.7897,
    num_faces=20,        # Number of mesh cell faces to use
    offset_distance=50,  # Offset from mesh (ft)
    trim_percent=7.5,    # Trim 7.5% from each end
    bc_name="Spring Creek Inflow"
)
```

**Algorithm**:
1. Find nearest 20 mesh cell faces to gauge location
2. Extract face linestrings and combine
3. Offset linestring away from mesh interior by 50 ft
4. Trim 5-10% from each end (avoid corners)
5. Add SA/2D Area Conn to geometry file
6. Add BC reference to unsteady file
7. Validate geometry integrity
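
Steps 1-4 of the proposed algorithm are plain geometry and can be prototyped before any geometry-file writing exists. Below is a minimal sketch with Shapely 2.x (`offset_curve` requires Shapely 2.0 or later). It assumes you already have the mesh faces as `LineString`s and the 2D flow area as a `Polygon` in the same projected CRS (feet). `sketch_bc_line` is hypothetical and stops short of steps 5-7. "Away from the mesh interior" is interpreted here as the offset side that does not overlap the mesh, which is one simple choice among several.

```python
from shapely.geometry import LineString, MultiLineString, Point, Polygon, box
from shapely.ops import linemerge, substring


def sketch_bc_line(faces, gauge: Point, mesh: Polygon, num_faces: int = 20,
                   offset_ft: float = 50.0, trim_fraction: float = 0.075) -> LineString:
    """Prototype of steps 1-4: nearest faces -> merged line -> outward offset -> trimmed ends."""
    # 1. Nearest perimeter faces to the gauge
    nearest = sorted(faces, key=lambda face: face.distance(gauge))[:num_faces]
    # 2. Combine them into one continuous line
    merged = linemerge(MultiLineString(nearest))
    if merged.geom_type != "LineString":
        raise ValueError("Selected faces are not contiguous; adjust num_faces")
    # 3. Offset to both sides and keep the side with the least overlap with the mesh
    candidates = [merged.offset_curve(d) for d in (offset_ft, -offset_ft)]
    outward = min(candidates, key=lambda line: line.intersection(mesh).length)
    # 4. Trim a fraction of the length from each end to stay clear of corners
    length = outward.length
    return substring(outward, trim_fraction * length, (1.0 - trim_fraction) * length)


# Example: a 1000 x 1000 ft mesh whose bottom edge is split into 50 ft faces
mesh = box(0, 0, 1000, 1000)
faces = [LineString([(x, 0), (x + 50, 0)]) for x in range(0, 1000, 50)]
bc_line = sketch_bc_line(faces, Point(500, -200), mesh, num_faces=10)
print(bc_line)
```

Choosing the side by overlap avoids depending on face orientation, which is not guaranteed to be consistent after `linemerge`; on a concave perimeter, a check against the mesh boundary near the gauge may be needed instead.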

**Benefits**:
- Automate BC creation from gauge coordinates
- Ensure valid HEC-RAS geometry
- Accelerate validation model setup
- Enable batch processing of multiple lateral inflows

**Roadmap Priority**: Medium-High (enables comprehensive multi-gauge validation)

---

### Phase 3: Multi-Gauge Validation Network

**Once lateral BCs are added**, validate at multiple points:

| Validation Point | Gauge | DA at Gauge | DA Upstream BCs | Coverage |
|------------------|-------|-------------|-----------------|----------|
| Downstream | 01548005 | 562 sq mi | 525 sq mi | 93.4% |
| Midstream | 01548000 | 559 sq mi | 525 sq mi | 93.9% |
| Spring Creek | 01547100 | 142 sq mi | 142 sq mi | 100% (at BC) |
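
The Coverage column, and the 525 sq mi and 37 sq mi figures in Phase 2, follow from simple drainage-area arithmetic. The sketch below reproduces them from the table values; `drainage_coverage` is an illustrative helper, and the 339 sq mi term is taken from the Phase 2 total.

```python
def drainage_coverage(gauge_da_sqmi: float, bc_das_sqmi) -> tuple:
    """Return (gauged BC area, ungauged gap, percent coverage) for one validation gauge."""
    gauged = float(sum(bc_das_sqmi))
    gap = gauge_da_sqmi - gauged
    return gauged, gap, 100.0 * gauged / gauge_da_sqmi


bc_areas = [339, 142, 44]  # upstream BC + Spring Creek + Marsh Creek (sq mi)
for name, da in [("Downstream 01548005", 562), ("Midstream 01548000", 559)]:
    gauged, gap, pct = drainage_coverage(da, bc_areas)
    print(f"{name}: {gauged:.0f} of {da} sq mi gauged, gap {gap:.0f} sq mi, coverage {pct:.1f}%")
```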

**Validation Approach**:
- Extract results at cross sections near each gauge
- Compare modeled vs observed at each location
- Calculate metrics for each validation point
- Assess spatial performance (upstream vs downstream)

---

### Phase 4: Multi-Event Calibration (Future)

**Use AORC catalog to test multiple storms**:

From the 2020 AORC storm catalog (see cell 9):
1. Storm 12 (Dec 24-25): 2.72" - largest event
2. Storm 5 (Apr 30-May 1): 2.12" - spring conditions
3. Storm 10 (Nov 11): 1.99" - high intensity
4. Storm 9 (Oct 29-30): 1.74" - fall conditions

**Multi-event validation benefits**:
- Test across different magnitudes
- Assess parameter transferability
- Identify seasonal biases
- Build confidence in calibration
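
The storm list above comes from the notebook's 2020 AORC catalog. To extend it to other years, one common way to split an hourly precipitation series into events is a minimum inter-event dry period. The sketch below assumes a 6-hour dry gap and a 0.5 in minimum depth. Both thresholds and the helper `find_storms` are illustrative assumptions, not necessarily how the catalog cell defines storms.

```python
import pandas as pd


def find_storms(precip_in: pd.Series, min_dry_hours: int = 6,
                min_total_in: float = 0.5) -> pd.DataFrame:
    """Split an hourly precipitation series (inches, DatetimeIndex) into storm events.

    A new storm starts after at least `min_dry_hours` consecutive dry hours.
    Events below `min_total_in` are discarded; the rest are ranked by depth.
    """
    wet = precip_in[precip_in > 0]
    if wet.empty:
        return pd.DataFrame(columns=["start", "end", "total_in"])
    times = wet.index.to_series()
    # A gap between wet hours greater than min_dry_hours means at least min_dry_hours dry hours
    new_storm = times.diff() > pd.Timedelta(hours=min_dry_hours)
    storm_id = new_storm.cumsum().to_numpy()
    events = (pd.DataFrame({"time": wet.index, "depth": wet.to_numpy(), "storm": storm_id})
              .groupby("storm")
              .agg(start=("time", "min"), end=("time", "max"), total_in=("depth", "sum")))
    events = events[events["total_in"] >= min_total_in]
    return events.sort_values("total_in", ascending=False).reset_index(drop=True)


# Example: synthetic hourly series with two storms and one small shower
idx = pd.date_range("2020-12-24 00:00", periods=30, freq="h")
rain = pd.Series(0.0, index=idx)
rain.iloc[[0, 1, 2]] = 0.5   # first burst
rain.iloc[[10, 11]] = 0.3    # second burst...
rain.iloc[14] = 0.1          # ...continues after only 2 dry hours
rain.iloc[25] = 0.2          # isolated shower, below the depth threshold
print(find_storms(rain))
```

In practice the input would be a basin-average hourly series derived from the AORC grid; the dry-gap length controls whether closely spaced bursts count as one event or two.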

---

### Tools and References

**Automation Functions Available**:
- `BoundaryGenerator.generate_flow_hydrograph_table()` - Create BC tables
- `BoundaryGenerator.update_boundary_hydrograph()` - Update existing BCs
- `RasUnsteady.set_gridded_precipitation()` - Configure AORC NetCDF
- `calculate_all_metrics()` - Comprehensive validation metrics
- `plot_timeseries_comparison()` - Publication-quality plots

**Documentation**:
- BC Configuration: `.claude/outputs/general-purpose/2025-12-30-bc-configuration-workflow.md`
- Gauge Analysis: `.claude/outputs/general-purpose/2025-12-29-gauge-data-availability.md`
- AORC Workflow: `.claude/outputs/general-purpose/2025-12-29-aorc-workflow-research.md`
- Complete Summary: `.claude/outputs/general-purpose/2025-12-30-validation-workflow-summary.md`

---

**Estimated Time to Complete**:
- Phase 1 (Minimum viable): 2-3 hours
- Phase 2 (Add lateral BCs): +3-5 hours (pending automation feature)
- Phase 3 (Multi-gauge validation): +2-3 hours
- Phase 4 (Multi-event): +4-6 hours per additional storm

**Recommended Approach**: Start with Phase 1, validate the workflow works, then expand.