# Example 33: USGS Gauge Catalog Generation

This notebook demonstrates how to generate and use a standardized USGS gauge data catalog for your HEC-RAS project.

## Purpose

The gauge catalog generation function creates a standardized "USGS Gauge Data" folder (similar to the precipitation module's storm catalog). The function:
- Discovers all active USGS gauges within the buffered project extent
- Downloads historical data for each gauge
- Creates a master catalog for easy gauge discovery
- Provides a standard location for engineering review and downstream functions

## Key Functions

- `generate_gauge_catalog()`: Create a complete gauge catalog with metadata and data
- `load_gauge_catalog()`: Load the gauge catalog from the standard location
- `load_gauge_data()`: Load historical data for a specific gauge
- `get_gauge_folder()`: Get the path to a gauge's folder
- `update_gauge_catalog()`: Refresh the catalog with the latest data

## Example Project

We'll use the **Bald Eagle Creek** example project (named `Balde Eagle Creek` in `RasExamples`), which has two active USGS gauges:
- USGS-01547200: Upstream gauge (265 sq mi drainage)
- USGS-01548005: Downstream gauge (562 sq mi drainage)

## Dependencies

Requires: `pip install dataretrieval geopandas tqdm`

## 1. Setup and Initialization

In [1]:
# Standard library imports
import json
import sys
from pathlib import Path

# Third-party imports
import pandas as pd

# Import ras_commander; if it is not installed, add the parent directory to the path (development setup)
try:
    from ras_commander import init_ras_project, ras, RasExamples
    from ras_commander.usgs import (
        generate_gauge_catalog,
        load_gauge_catalog,
        load_gauge_data,
        get_gauge_folder,
        update_gauge_catalog
    )
except ImportError:
    current_file = Path().resolve()
    parent_directory = current_file.parent
    sys.path.append(str(parent_directory))
    from ras_commander import init_ras_project, ras, RasExamples
    from ras_commander.usgs import (
        generate_gauge_catalog,
        load_gauge_catalog,
        load_gauge_data,
        get_gauge_folder,
        update_gauge_catalog
    )

print("✓ Imports successful")

✓ Imports successful


## 2. Extract Example Project

In [2]:
# Extract Bald Eagle Creek project
project_path = RasExamples.extract_project("Balde Eagle Creek", output_path="example_projects_420_usgs_gauge_catalog")

print(f"Project extracted to: {project_path}")

# Initialize project
init_ras_project(project_path, "6.6")

print(f"\nProject: {ras.project_name}")
print(f"Path: {ras.project_folder}")
print(f"\nProject initialized successfully")

2025-12-13 23:14:33 - ras_commander.RasExamples - INFO - Found zip file: C:\GH\ras-commander\examples\Example_Projects_6_6.zip
2025-12-13 23:14:33 - ras_commander.RasExamples - INFO - Loading project data from CSV...
2025-12-13 23:14:33 - ras_commander.RasExamples - INFO - Loaded 68 projects from CSV.
2025-12-13 23:14:33 - ras_commander.RasExamples - INFO - ----- RasExamples Extracting Project -----
2025-12-13 23:14:33 - ras_commander.RasExamples - INFO - Extracting project 'Balde Eagle Creek'
2025-12-13 23:14:33 - ras_commander.RasExamples - INFO - Project 'Balde Eagle Creek' already exists. Deleting existing folder...
2025-12-13 23:14:33 - ras_commander.RasExamples - INFO - Existing folder for project 'Balde Eagle Creek' has been deleted.
2025-12-13 23:14:33 - ras_commander.RasExamples - INFO - Successfully extracted project 'Balde Eagle Creek' to c:\GH\ras-commander\examples\example_projects_420_usgs_gauge_catalog\Balde Eagle Creek
2025-12-13 23:14:33 - ras_commander.RasMap - INFO -

Project extracted to: c:\GH\ras-commander\examples\example_projects_420_usgs_gauge_catalog\Balde Eagle Creek

Project: BaldEagle
Path: C:\GH\ras-commander\examples\example_projects_420_usgs_gauge_catalog\Balde Eagle Creek

Project initialized successfully


## 3. Generate Gauge Catalog

This will:
1. Find all USGS gauges within a 50% buffer of the project extent
2. Download 10 years of historical data (flow and stage) for each gauge
3. Create standardized folder structure
4. Generate master catalog and documentation
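
To make the buffer in step 1 concrete, here is a minimal sketch of one plausible interpretation: each side of the project bounding box is pushed outward by `buffer_percent` of the box's width or height. The helper `expand_bounds` is hypothetical, and this notebook does not confirm how `generate_gauge_catalog()` actually applies its buffer.

```python
def expand_bounds(bounds, buffer_percent):
    """Expand (minx, miny, maxx, maxy) outward by a percentage of its size.

    Hypothetical helper: assumes the buffer is applied per side, as a
    fraction of the box width (x direction) and height (y direction).
    """
    minx, miny, maxx, maxy = bounds
    dx = (maxx - minx) * buffer_percent / 100.0
    dy = (maxy - miny) * buffer_percent / 100.0
    return (minx - dx, miny - dy, maxx + dx, maxy + dy)


# Illustrative box only, not the Bald Eagle Creek extent
expanded = expand_bounds((0.0, 0.0, 10.0, 20.0), 50.0)
print(expanded)
```

Under this assumption, a 50% buffer doubles the width and height of the search box, so the searched area is four times the original extent.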

In [3]:
# Note: Bald Eagle Creek project doesn't have embedded CRS, so we specify it manually
# The project uses PA State Plane North (US feet) - EPSG:2271
summary = generate_gauge_catalog(
    buffer_percent=50.0,         # Search within 50% buffer of project extent
    include_historical=True,     # Download historical data
    historical_years=10,         # Last 10 years of data
    parameters=['flow', 'stage'], # Retrieve flow and stage data
    project_crs="EPSG:2271"      # PA State Plane North (US feet) - required for Bald Eagle
)

# Display summary
print("\n" + "="*60)
print("GAUGE CATALOG GENERATION SUMMARY")
print("="*60)
print(f"Gauges found: {summary['gauge_count']}")
print(f"Successfully processed: {summary['gauges_processed']}")
print(f"Failed: {summary['gauges_failed']}")
print(f"Output folder: {summary['output_folder']}")
print(f"Data size: {summary['data_size_mb']:.2f} MB")
print(f"Processing time: {summary['processing_time_sec']:.1f} seconds")
print("="*60)

2025-12-13 23:14:33 - ras_commander.usgs.catalog - INFO - Generating USGS gauge catalog for project: C:\GH\ras-commander\examples\example_projects_420_usgs_gauge_catalog\Balde Eagle Creek
2025-12-13 23:14:33 - ras_commander.usgs.catalog - INFO - Output folder: C:\GH\ras-commander\examples\example_projects_420_usgs_gauge_catalog\Balde Eagle Creek\USGS Gauge Data
2025-12-13 23:14:33 - ras_commander.usgs.catalog - INFO - Buffer: 50.0%, Historical years: 10
2025-12-13 23:14:33 - ras_commander.usgs.catalog - INFO - Step 1/7: Finding gauges in project extent...
2025-12-13 23:14:33 - ras_commander.usgs.spatial - INFO - Retrieving project bounds from: C:\GH\ras-commander\examples\example_projects_420_usgs_gauge_catalog\Balde Eagle Creek\BaldEagle.g01.hdf
2025-12-13 23:14:33 - ras_commander.hdf.HdfProject - INFO - Using existing Path object HDF file: C:\GH\ras-commander\examples\example_projects_420_usgs_gauge_catalog\Balde Eagle Creek\BaldEagle.g01.hdf
2025-12-13 23:14:33 - ras_commander.hdf.H

ValueError: No USGS gauges found within 50.0% buffer of project extent. Try increasing buffer_percent parameter.
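
The error message suggests widening the search. A minimal sketch of that retry pattern follows. `try_buffers` is a hypothetical helper, and `fake_generate` stands in for `generate_gauge_catalog()` so the block runs without a project or network access. It assumes the real function raises `ValueError` when no gauges are found, as in the traceback above.

```python
def try_buffers(generate, buffers, **kwargs):
    """Call generate(buffer_percent=b, **kwargs) for each b until one succeeds.

    Returns (buffer_used, result). Re-raises ValueError if every buffer fails.
    """
    last_error = None
    for b in buffers:
        try:
            return b, generate(buffer_percent=b, **kwargs)
        except ValueError as exc:
            last_error = exc  # Remember the failure and try the next buffer
    raise ValueError(f"No gauges found for buffers {list(buffers)}") from last_error


def fake_generate(buffer_percent, **kwargs):
    # Stand-in for the real function: gauges only appear at 150% or wider
    if buffer_percent < 150.0:
        raise ValueError("No USGS gauges found")
    return {"gauge_count": 2}


used, result = try_buffers(fake_generate, [50.0, 100.0, 150.0, 200.0])
print(used, result["gauge_count"])
```

With the real function, the call would pass `generate_gauge_catalog` in place of `fake_generate`, along with the same keyword arguments used in the cell above (for example `project_crs="EPSG:2271"`).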

## 4. Explore Catalog Structure

The catalog creates a standardized folder structure:

```
project_folder/
├── USGS Gauge Data/
│   ├── gauge_catalog.csv          # Master catalog
│   ├── gauge_locations.geojson    # Spatial data
│   ├── README.md                  # Documentation
│   ├── USGS-01547200/             # Individual gauge folders
│   │   ├── metadata.json
│   │   ├── historical_flow.csv
│   │   ├── historical_stage.csv
│   │   └── data_availability.json
│   └── USGS-01548005/
│       └── ...
```
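
Before relying on this layout in automated workflows, it can help to confirm that the expected top-level files exist. The sketch below does that with `pathlib`. `missing_catalog_files` is a hypothetical helper, not part of `ras_commander`, and the demo runs against a temporary folder rather than a real project.

```python
import tempfile
from pathlib import Path

# Top-level file names taken from the folder tree above
EXPECTED_TOP_LEVEL = ("gauge_catalog.csv", "gauge_locations.geojson", "README.md")


def missing_catalog_files(catalog_folder):
    """Return the expected top-level files that are absent from catalog_folder."""
    folder = Path(catalog_folder)
    return [name for name in EXPECTED_TOP_LEVEL if not (folder / name).is_file()]


# Demonstrate on a throwaway folder that contains only the master catalog
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "USGS Gauge Data"
    demo.mkdir()
    (demo / "gauge_catalog.csv").write_text("site_id\n")
    missing = missing_catalog_files(demo)
    print(missing)
```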

In [None]:
# List files in catalog folder
catalog_folder = Path(ras.project_folder) / "USGS Gauge Data"

print("Catalog folder contents:")
print("\nTop-level files:")
for file in sorted(catalog_folder.glob('*')):
    if file.is_file():
        size_kb = file.stat().st_size / 1024
        print(f"  {file.name:30s} ({size_kb:8.1f} KB)")

print("\nGauge folders:")
for folder in sorted(catalog_folder.glob('USGS-*')):
    if folder.is_dir():
        file_count = len(list(folder.glob('*')))
        print(f"  {folder.name:30s} ({file_count} files)")

## 5. Load and Explore Master Catalog

In [None]:
# Load catalog using helper function
catalog = load_gauge_catalog()

print(f"Loaded catalog with {len(catalog)} gauges\n")

# Display key information
print("Gauge Catalog:")
print("-" * 120)
display_cols = ['site_id', 'station_name', 'drainage_area_sqmi', 'upstream_downstream', 
                'distance_to_project_km', 'parameters_available']
print(catalog[display_cols].to_string(index=False))
print("-" * 120)

## 6. Load Gauge Metadata

In [None]:
# Load metadata for first gauge
site_id = catalog.iloc[0]['site_id']
gauge_folder = get_gauge_folder(site_id)
metadata_file = gauge_folder / "metadata.json"

with open(metadata_file, 'r') as f:
    metadata = json.load(f)

print(f"Metadata for USGS-{site_id}:")
print("="*60)
print(f"Station: {metadata['station_name']}")
print(f"Location: {metadata['location']['latitude']:.4f}, {metadata['location']['longitude']:.4f}")
print(f"State: {metadata['location']['state']}")
print(f"County: {metadata['location']['county']}")
print(f"Drainage Area: {metadata['drainage_area_sqmi']} sq mi")
print(f"Gage Datum: {metadata['gage_datum_ft']} ft")
print(f"Active: {metadata['active']}")
print(f"\nAvailable Parameters: {', '.join(metadata['available_parameters'])}")
print(f"\nPeriod of Record:")
print(f"  Start: {metadata['period_of_record']['start']}")
print(f"  End: {metadata['period_of_record']['end']}")
print(f"  Years: {metadata['period_of_record']['years']}")
print(f"\nLast Updated: {metadata['last_updated']}")
print("="*60)

## 7. Load Data Availability Information

In [None]:
# Load data availability for first gauge
availability_file = gauge_folder / "data_availability.json"

with open(availability_file, 'r') as f:
    availability = json.load(f)

print(f"Data Availability for USGS-{site_id}:")
print("="*60)

for param, info in availability.items():
    print(f"\n{param.upper()}:")
    print(f"  Available: {info['available']}")
    if info['available']:
        print(f"  Date Range: {info['start_date']} to {info['end_date']}")
        print(f"  Record Count: {info['record_count']:,}")
        print(f"  Completeness: {info['completeness']*100:.1f}%")
        if info['gaps']:
            print(f"  Gaps Found: {len(info['gaps'])}")
            for gap in info['gaps'][:3]:  # Show first 3 gaps
                print(f"    - {gap['start']} to {gap['end']} ({gap['days']} days)")
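
To illustrate what the `completeness` and `gaps` fields represent, here is a minimal sketch that derives both from a list of daily observation dates. It assumes one expected record per calendar day between the first and last observation. The catalog may compute these values differently, and `summarize_daily_coverage` is a hypothetical name.

```python
from datetime import date, timedelta


def summarize_daily_coverage(dates):
    """Return (completeness, gaps) for a collection of daily observation dates.

    A gap is any run of consecutive missing days between two observations.
    """
    observed = sorted(set(dates))
    expected_days = (observed[-1] - observed[0]).days + 1
    completeness = len(observed) / expected_days
    gaps = []
    for prev, curr in zip(observed, observed[1:]):
        missing = (curr - prev).days - 1
        if missing > 0:
            gaps.append({
                "start": prev + timedelta(days=1),
                "end": curr - timedelta(days=1),
                "days": missing,
            })
    return completeness, gaps


# Ten-day window with 2024-01-04 through 2024-01-06 missing
days = [date(2024, 1, d) for d in (1, 2, 3, 7, 8, 9, 10)]
completeness, gaps = summarize_daily_coverage(days)
print(f"{completeness:.0%}", gaps[0]["days"])
```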

## 8. Load Historical Data Using Helper Function

In [None]:
# Load flow data for first gauge
flow_data = load_gauge_data(site_id, parameter='flow')

print(f"Flow Data for USGS-{site_id}:")
print("="*60)
print(f"Records: {len(flow_data):,}")
print(f"Date Range: {flow_data['datetime'].min()} to {flow_data['datetime'].max()}")
print(f"\nFlow Statistics:")
print(f"  Mean: {flow_data['value'].mean():.1f} cfs")
print(f"  Median: {flow_data['value'].median():.1f} cfs")
print(f"  Min: {flow_data['value'].min():.1f} cfs")
print(f"  Max: {flow_data['value'].max():.1f} cfs")
print(f"\nFirst 5 records:")
print(flow_data.head())
print("="*60)

In [None]:
# Load stage data
stage_data = load_gauge_data(site_id, parameter='stage')

print(f"Stage Data for USGS-{site_id}:")
print("="*60)
print(f"Records: {len(stage_data):,}")
print(f"Date Range: {stage_data['datetime'].min()} to {stage_data['datetime'].max()}")
print(f"\nStage Statistics:")
print(f"  Mean: {stage_data['value'].mean():.2f} ft")
print(f"  Median: {stage_data['value'].median():.2f} ft")
print(f"  Min: {stage_data['value'].min():.2f} ft")
print(f"  Max: {stage_data['value'].max():.2f} ft")
print(f"\nFirst 5 records:")
print(stage_data.head())
print("="*60)

## 9. Plot Historical Data

In [None]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Create figure with 2 subplots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# Plot flow
ax1.plot(flow_data['datetime'], flow_data['value'], 'b-', linewidth=0.8, alpha=0.7)
ax1.set_ylabel('Flow (cfs)', fontsize=12, fontweight='bold')
ax1.set_title(f"USGS-{site_id}: {metadata['station_name']}", fontsize=14, fontweight='bold')
ax1.grid(True, alpha=0.3)
ax1.set_ylim(bottom=0)

# Plot stage
ax2.plot(stage_data['datetime'], stage_data['value'], 'g-', linewidth=0.8, alpha=0.7)
ax2.set_ylabel('Stage (ft)', fontsize=12, fontweight='bold')
ax2.set_xlabel('Date', fontsize=12, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.set_ylim(bottom=0)

# Format x-axis
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax2.xaxis.set_major_locator(mdates.YearLocator())
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

print(f"\nPlot shows 10 years of flow and stage data for USGS-{site_id}")

## 10. Process All Gauges in Catalog

In [None]:
# Summary statistics for all gauges
print("Summary for all gauges:")
print("="*80)

for idx, gauge in catalog.iterrows():
    site_id = gauge['site_id']
    name = gauge['station_name']
    drainage = gauge['drainage_area_sqmi']
    position = gauge['upstream_downstream']
    distance = gauge['distance_to_project_km']
    
    print(f"\nUSGS-{site_id}: {name}")
    print(f"  Position: {position.title()} ({distance:.1f} km from project)")
    print(f"  Drainage: {drainage} sq mi")
    
    # Check if flow data file exists before loading
    gauge_folder = get_gauge_folder(site_id)
    flow_file = gauge_folder / "historical_flow.csv"
    
    if flow_file.exists():
        flow = load_gauge_data(site_id, parameter='flow')
        print(f"  Flow: {len(flow):,} records, mean={flow['value'].mean():.0f} cfs, max={flow['value'].max():.0f} cfs")
    else:
        print(f"  Flow: No data available")
    
    # Check if stage data file exists before loading
    stage_file = gauge_folder / "historical_stage.csv"
    
    if stage_file.exists():
        stage = load_gauge_data(site_id, parameter='stage')
        print(f"  Stage: {len(stage):,} records, mean={stage['value'].mean():.2f} ft, max={stage['value'].max():.2f} ft")
    else:
        print(f"  Stage: No data available")

print("\n" + "="*80)

## 11. Use Catalog Data for Boundary Conditions

The catalog data can be easily used with other USGS functions:

In [None]:
from ras_commander.usgs import generate_flow_hydrograph_table

# Select upstream gauge for BC generation
upstream_gauge = catalog[catalog['upstream_downstream'] == 'upstream'].iloc[0]
site_id = upstream_gauge['site_id']

print(f"Using USGS-{site_id} for boundary condition:")
print(f"  Station: {upstream_gauge['station_name']}")
print(f"  Drainage: {upstream_gauge['drainage_area_sqmi']} sq mi\n")

# Load flow data
flow = load_gauge_data(site_id, parameter='flow')

# Take the last 168 records (7 days, assuming the flow data is hourly)
recent_flow = flow.tail(168).copy()

print(f"Using last {len(recent_flow)} hourly values for BC")
print(f"Date range: {recent_flow['datetime'].min()} to {recent_flow['datetime'].max()}")
print(f"Flow range: {recent_flow['value'].min():.0f} to {recent_flow['value'].max():.0f} cfs\n")

# Generate HEC-RAS format hydrograph table
bc_table = generate_flow_hydrograph_table(
    flow_values=recent_flow['value'],
    interval='1HOUR'
)

print("Generated boundary condition table:")
print(bc_table[:500])  # Show first 500 characters
print(f"\n... ({len(bc_table)} characters total)")
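
The cell above takes the last 168 rows, which spans seven days only when the series has exactly one record per hour. If the stored data uses another interval (USGS instantaneous values are often reported every 15 minutes), selecting by timestamp is safer. The sketch below shows the idea with the standard library on synthetic data. `last_n_days` is a hypothetical helper, and with a pandas DataFrame the equivalent would be a boolean filter on the `datetime` column.

```python
from datetime import datetime, timedelta


def last_n_days(records, days):
    """Keep (timestamp, value) records within `days` of the latest timestamp."""
    latest = max(ts for ts, _ in records)
    cutoff = latest - timedelta(days=days)
    return [(ts, value) for ts, value in records if ts > cutoff]


# Synthetic 15-minute series covering 10 days (960 records)
start = datetime(2024, 1, 1)
records = [(start + timedelta(minutes=15 * i), 100.0 + i) for i in range(960)]

recent = last_n_days(records, 7)
print(len(recent), recent[0][0])
```

A series selected this way would still need resampling to a fixed interval before being passed to a function that expects hourly values.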

## 12. Update Catalog (Add New Data)

The `update_gauge_catalog()` function refreshes existing gauges with new data:

In [None]:
# Update catalog with latest data (last 30 days)
update_summary = update_gauge_catalog()

print("\nCatalog Update Summary:")
print("="*60)
print(f"Gauges updated: {update_summary['gauges_updated']}")
print(f"Gauges failed: {update_summary['gauges_failed']}")
print(f"Processing time: {update_summary['processing_time_sec']:.1f} seconds")
print("="*60)

## 13. Custom Catalog Configuration

You can customize the catalog generation:

In [None]:
# Example: Generate catalog with custom settings
# (Don't run this cell if you want to keep the existing catalog)

if False:  # Set to True to run
    custom_summary = generate_gauge_catalog(
        buffer_percent=100.0,        # Wider search area (2x project extent)
        include_historical=True,
        historical_years=20,         # More historical data
        parameters=['flow', 'stage', 'temperature'],  # Additional parameters
        output_folder=None,          # Use default location
        project_crs="EPSG:2271"      # Required for Bald Eagle (no embedded CRS)
    )
    
    print("Custom catalog generated:")
    print(f"  Gauges: {custom_summary['gauge_count']}")
    print(f"  Data size: {custom_summary['data_size_mb']:.2f} MB")

## Summary

This notebook demonstrated:

1. ✅ **Catalog Generation**: One-command gauge discovery and data download
2. ✅ **Standard Structure**: Consistent folder organization across projects
3. ✅ **Metadata Management**: Complete gauge information in JSON format
4. ✅ **Data Loading**: Easy access to historical gauge data
5. ✅ **Integration**: Seamless use with boundary condition generation
6. ✅ **Updates**: Refresh catalog with latest data

## Key Takeaways

- **Standard Location**: `project_folder/USGS Gauge Data/` (like the precipitation module)
- **One Command**: `generate_gauge_catalog()` handles gauge discovery, data download, and catalog creation
- **Engineering Review**: Master catalog CSV for easy gauge assessment
- **Downstream Functions**: Standard location for automated workflows
- **Documentation**: Auto-generated README with usage instructions

## Related Examples

- Example 29: USGS Gauge Data Integration (basic retrieval)
- Example 30: Real-Time Monitoring (live gauge data)
- Example 31: BC Generation from Live Gauge (boundary conditions)
- Example 32: Model Validation with USGS (calibration metrics)

## Next Steps

With the catalog generated, you can:
- Review all available gauges in one location
- Use gauge data for boundary condition generation
- Perform model validation with observed data
- Include catalog in project deliverables for documentation