# Hydraulic Geometry Calculator

The following code takes the standard RivMapper reach polygons and clips and trims the Global River BankFull Discharge dataset (GQBF) to each reach. Using GQBF, the Open-Elevation API, and standard Python geospatial libraries, the code extracts the median wetted channel width, median bankfull discharge, channel length, and channel slope for each reach, and writes all metrics to a .csv file.
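
As an illustration of the trimming step, the sketch below follows downstream links from a starting segment and, at each junction, keeps the neighbour with the largest bankfull discharge. It is a simplified stand-in on a plain dictionary, not the notebook's GeoDataFrame logic. The segment IDs, the discharges, the `trace_mainstem` name, and the `visited` guard against loops are all assumptions made for the example.

```python
def trace_mainstem(segments, start_id):
    """Follow downstream links from start_id, preferring the largest-qbf branch."""
    mainstem = []
    visited = set()  # assumption: guard against accidental cycles in the links
    current = start_id
    while current is not None and current not in visited:
        visited.add(current)
        mainstem.append(current)
        # Keep only downstream IDs that exist in the clipped network
        candidates = [d for d in segments[current]["downstream"] if d in segments]
        if candidates:
            current = max(candidates, key=lambda d: segments[d]["qbf"])
        else:
            current = None
    return mainstem


# Toy network: hypothetical segment IDs and bankfull discharges (m^3/s)
segments = {
    1: {"qbf": 120.0, "downstream": [2, 3]},
    2: {"qbf": 150.0, "downstream": [4]},
    3: {"qbf": 40.0, "downstream": [4]},
    4: {"qbf": 210.0, "downstream": []},
}

print(trace_mainstem(segments, 1))  # [1, 2, 4]
```

Segment 3 is dropped because its discharge is lower than that of segment 2 at the junction below segment 1.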

When the BASED model file is present in the working directory, the notebook also predicts bankfull depth for each reach with the BASED XGBoost model. Otherwise, the user must use the BASED depth estimator to calculate depth for each reach manually.
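
Depth estimation needs three inputs per reach: width, slope, and bankfull discharge. The sketch below shows one way those inputs could be screened before they are passed to a depth estimator. The `based_inputs` name and the explicit NaN check are assumptions added for illustration and are not part of the BASED tool.

```python
import math


def based_inputs(width, slope, discharge):
    """Return (width, |slope|, discharge) if usable for depth estimation, else None."""
    values = (width, slope, discharge)
    if any(v is None for v in values):
        return None
    if any(math.isnan(v) for v in values):
        return None  # assumption: treat NaN medians as missing data
    if width <= 0 or discharge <= 0 or slope == 0:
        return None
    # Only the magnitude of the slope is used
    return (width, abs(slope), discharge)


print(based_inputs(85.0, -0.0004, 310.0))  # (85.0, 0.0004, 310.0)
print(based_inputs(85.0, 0.0, 310.0))      # None
```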

Global River BankFull Discharge (GQBF): 
Liu, Y., Wortmann, M., & Slater, L. (2024). Global River BankFull Discharge (GQBF) (0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.13855371

Open-Elevation-API: https://github.com/Jorl17/open-elevation

Boost-Assisted Stream Estimator for Depth (BASED): https://based-estimator.streamlit.app/ (Use manually after running this code to estimate depth)

Author: James (Huck) Rees; PhD Student, UCSB Geography

Date: March 30, 2025

## Import packages

In [1]:
import requests
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pyproj
from pyproj import CRS, Transformer
import os
from shapely.ops import unary_union, split, linemerge
from shapely.geometry import LineString, Point
import xgboost as xgb

## Initialize functions

Define working directory
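
The working directory is read per river from an input datasheet and is not hard-coded. A minimal sketch of that datasheet, assuming the three column names the main function reads (`river_name`, `working_directory`, `hydroatlas_zone`), is shown below. The river name, path, and zone value are placeholders, and `read_datasheet` is a hypothetical helper.

```python
import csv
import io

# Hypothetical datasheet contents; the river name, path, and zone are placeholders
datasheet = io.StringIO(
    "river_name,working_directory,hydroatlas_zone\n"
    "Example_River,/path/to/Data,na\n"
)

REQUIRED = ("river_name", "working_directory", "hydroatlas_zone")


def read_datasheet(handle):
    """Read the datasheet rows and fail early if a required column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Datasheet is missing columns: {missing}")
    return list(reader)


rows = read_datasheet(datasheet)
print(rows[0]["river_name"], rows[0]["working_directory"])
```

Checking the columns up front gives a clearer error than a `KeyError` raised midway through a batch of rivers.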

In [4]:
# Get GQBF. Checks if it is already extracted. If not, it extracts from the GQBF database
def get_GQBF(river_name, reach_gdf, continent_abr, working_directory):
    """
    Checks if a shapefile exists for the given river name in the working directory.
    If it exists, loads it as a GeoDataFrame.
    If it does not exist, calls extract_GQBF to generate the data.
    
    Parameters:
    - river_name (str): The name of the river.
    - reach_gdf (GeoDataFrame): The river reach geometry.
    - continent_abr (str): Two-letter continent abbreviation.
    - working_directory (str): The directory where the shapefile is expected to be found.
    
    Returns:
    - gdf (GeoDataFrame): The loaded, extracted, and trimmed GeoDataFrame.
    """

    # Construct the expected file path for the shapefile
    shapefile_path = os.path.join(working_directory, 'GQBF', 'Extracted_rivers', river_name, f"{river_name}.shp")
    
    # Check if the shapefile exists and is a valid file
    if os.path.isfile(shapefile_path):
        print(f"Shapefile found: {shapefile_path}. Loading data...")
        try:
            gdf = gpd.read_file(shapefile_path)
        except Exception as e:
            print(f"Error loading shapefile: {e}")
            print(f"Attempting to extract data for {river_name} in {continent_abr}...")
            gdf = extract_GQBF(river_name, reach_gdf, continent_abr, working_directory)
    else:
        print(f"Shapefile not found. Extracting data for {river_name} in {continent_abr}...")
        gdf = extract_GQBF(river_name, reach_gdf, continent_abr, working_directory)

    # Ensure the dataset is not empty before trimming
    if gdf is not None and not gdf.empty:
        # Call trim_GQBF() to refine the extracted data
        trimmed_gdf = trim_GQBF(reach_gdf, gdf)
        return trimmed_gdf
    else:
        print(f"Warning: Extracted GQBF data for {river_name} is empty.")
        return None

# This function extracts the GQBF data from one of the main geopackages. 
# This function is called as part of the previous function if the GQBF for the specified river does not yet exist.
def extract_GQBF(river_name, reach_gdf, continent_abr, working_directory):
    """
    Extracts GQBF data for a given river by importing the corresponding GeoPackage.
    The function filters data for the specified river using its reach geometry.

    Parameters:
    - river_name (str): The name of the river.
    - reach_gdf (GeoDataFrame): The river reach geometry.
    - continent_abr (str): Two-letter continent abbreviation.
    - working_directory (str): The directory where extracted data should be stored.

    Returns:
    - gdf (GeoDataFrame): The filtered and processed GeoDataFrame.
    """
    
    # Step 1: Construct the path to the GeoPackage file
    gpkg_filename = f"GQBFv0.1_reaches_{continent_abr}_EPSG4326.gpkg"
    gpkg_path = os.path.join(working_directory, 'GQBF', gpkg_filename)

    # Step 2: Check if the GeoPackage file exists
    if not os.path.isfile(gpkg_path):
        raise FileNotFoundError(f"GeoPackage file not found: {gpkg_path}")

    print(f"Loading GQBF data from: {gpkg_path}...")

    # Step 3: Load the entire GeoPackage
    try:
        gdf = gpd.read_file(gpkg_path)
        print("GeoPackage successfully loaded.")
    except Exception as e:
        raise RuntimeError(f"Error loading GeoPackage: {e}")
    
    # Step 4: Ensure both layers are in the same CRS (EPSG:4326)
    if gdf.crs.to_epsg() != 4326:
        print("Reprojecting GQBF dataset to EPSG:4326...")
        gdf = gdf.to_crs(epsg=4326)

    # Step 5: Spatially filter GQBF data to the river reach
    filtered_gdf = gdf[gdf.intersects(reach_gdf.unary_union)]

    if filtered_gdf.empty:
        print(f"Warning: No matching features found for {river_name}.")
    
    # Step 6: Save the new filtered shapefile (Optional)
    output_path = os.path.join(working_directory, 'GQBF', 'Extracted_rivers', river_name)
    os.makedirs(output_path, exist_ok=True)  # Ensure directory exists
    shapefile_path = os.path.join(output_path, f"{river_name}.shp")

    if not filtered_gdf.empty:
        filtered_gdf.to_file(shapefile_path)
        print(f"Extracted shapefile saved at: {shapefile_path}")
        print(f"GQBF for {river_name} loaded.")
        print("Columns in filtered GQBF:", filtered_gdf.columns.tolist())

    return filtered_gdf  # Return the filtered GeoDataFrame

# This function gets the RivMapper reach polygons
def get_reach(river_name, working_directory):
    """
    Imports the reach shapefile for the given river, reprojects it to EPSG:4326, and returns it as a GeoDataFrame.

    Parameters:
    - river_name (str): The name of the river.
    - working_directory (str): The directory where the reach shapefile is stored.

    Returns:
    - reach_gdf (GeoDataFrame): The reprojected reach GeoDataFrame.
    """

    # Step 1: Construct the file path
    reach_shapefile_path = os.path.join(working_directory, 'RiverMapping', 'Reaches', river_name, f"{river_name}.shp")

    # Step 2: Check if the file exists
    if not os.path.isfile(reach_shapefile_path):
        raise FileNotFoundError(f"Reach shapefile not found: {reach_shapefile_path}")

    print(f"Loading reach shapefile from: {reach_shapefile_path}...")

    # Step 3: Load the reach shapefile
    try:
        reach_gdf = gpd.read_file(reach_shapefile_path)
        print("Reach shapefile successfully loaded.")
    except Exception as e:
        raise RuntimeError(f"Error loading reach shapefile: {e}")

    # Step 4: Reproject to EPSG:4326 if necessary
    if reach_gdf.crs is None:
        raise ValueError(f"Reach shapefile does not have a CRS: {reach_shapefile_path}")
    
    if reach_gdf.crs.to_epsg() != 4326:
        print(f"Reprojecting reach shapefile to EPSG:4326...")
        reach_gdf = reach_gdf.to_crs(epsg=4326)

    print("Reach shapefile successfully reprojected to EPSG:4326.")

    return reach_gdf

# Trim the GQBF to just the main channel
def trim_GQBF(reach_gdf, filtered_gdf):
    """
    Further refines the filtered GQBF dataset to extract the river's mainstem.

    Parameters:
    - reach_gdf (GeoDataFrame): The river reach geometry.
    - filtered_gdf (GeoDataFrame): The initially filtered GQBF dataset.

    Returns:
    - mainstem_gqbf_gdf (GeoDataFrame): The extracted mainstem of the river.
    """

    print("Trimming GQBF data to retain only ds_order = 1 segments...")

    # Select reaches where ds_order = 1
    ds_order_1_reaches = reach_gdf[reach_gdf['ds_order'] == 1]

    # Filter the GQBF dataset to include only segments that intersect with ds_order = 1 reaches
    trimmed_gqbf_gdf = filtered_gdf[filtered_gdf.intersects(ds_order_1_reaches.unary_union)].copy()
    
    print("Identifying the most upstream segment...")
    
    # Convert upstream_l to list of integers
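    def parse_upstream_l(value):
        if isinstance(value, str):
            # Skip empty tokens so a blank value or trailing comma does not raise
            return [int(v) for v in value.split(',') if v.strip()]
        elif isinstance(value, (int, np.integer)):
            # Values read from a GeoDataFrame are often NumPy integers, not Python ints
            return [int(value)]
        else:
            return []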
    
    trimmed_gqbf_gdf.loc[:, 'parsed_upstream_l'] = trimmed_gqbf_gdf['upstream_l'].apply(parse_upstream_l)
    
    # Find the most upstream segment
    upstream_end_gqbf_gdf = trimmed_gqbf_gdf[trimmed_gqbf_gdf.apply(lambda row: all(up not in trimmed_gqbf_gdf['reach_id'].values for up in row['parsed_upstream_l']), axis=1)].copy()
    
    if not upstream_end_gqbf_gdf.empty:
        current_segment = upstream_end_gqbf_gdf.loc[upstream_end_gqbf_gdf['qbf'].idxmax()].copy()
    else:
        return None  # No upstream segment found
    
    print("Upstream segment identified.")
    # Extract mainstem by following downstream connections
    mainstem_segments = []
    print("Mapping out mainstem.")
    while current_segment is not None:
        mainstem_segments.append(current_segment)
        
        # Get downstream segment(s)
        downstre_values = current_segment['downstre_1']
        if isinstance(downstre_values, str):
            downstream_ids = [int(v) for v in downstre_values.split(',') if v.strip()]
        elif isinstance(downstre_values, (int, np.integer)):
            # Row values are often NumPy integers, which are not instances of int
            downstream_ids = [int(downstre_values)]
        else:
            downstream_ids = []
        
        # Select the next segment
        downstream_segments = filtered_gdf[filtered_gdf['reach_id'].isin(downstream_ids)]
        if not downstream_segments.empty:
            current_segment = downstream_segments.loc[downstream_segments['qbf'].idxmax()].copy()
        else:
            current_segment = None
    
    # Create final mainstem GeoDataFrame
    mainstem_gqbf_gdf = filtered_gdf[filtered_gdf['reach_id'].isin([seg['reach_id'] for seg in mainstem_segments])].copy()
    print("Mainstem mapped.")
    
    return mainstem_gqbf_gdf

def get_elevation(lat, lon):
    """Get elevation (m) from the ASTER 30 m dataset via the Open Topo Data API."""
    url = "https://api.opentopodata.org/v1/aster30m"
    params = {"locations": f"{lat},{lon}"}
    try:
        response = requests.get(url, params=params, timeout=10)
        data = response.json()
        if data['status'] == 'OK':
            return data['results'][0]['elevation']
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # Network failure, non-JSON body, or unexpected payload: treat as missing
        pass
    return None

# Get channel slope at each reach
def get_slope(reach_gdf, gqbf_gdf):
    """
    Computes the channel slope for each reach in the reach_gdf using elevation data.

    Parameters:
    - reach_gdf (GeoDataFrame): River reach geometries.
    - gqbf_gdf (GeoDataFrame): GQBF mainstem geometries with flow direction.

    Returns:
    - slope_dict (dict): Mapping from ds_order to estimated channel slope.
    """
    
    def get_point_elevation(point):
        return get_elevation(point.y, point.x)

    slope_dict = {}

    for _, reach in reach_gdf.iterrows():
        ds_order = reach['ds_order']

        # Get intersecting GQBF segments
        reach_segments = gqbf_gdf[gqbf_gdf.intersects(reach.geometry)]

        if reach_segments.empty:
            slope_dict[ds_order] = None
            continue

        # Merge all lines into one to form a continuous path if possible
        merged_line = linemerge(reach_segments.geometry.values)

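        if merged_line.geom_type == 'MultiLineString':
            # Segments are not contiguous: keep the longest part.
            # Shapely 2.x requires .geoms to iterate over the parts.
            merged_line = max(merged_line.geoms, key=lambda l: l.length)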

        line = LineString(merged_line)
        total_length = reach_segments['length'].sum()  # Use actual length from attribute (in meters)

        if total_length == 0:
            slope_dict[ds_order] = None
            continue

        distances = np.linspace(0, line.length, 10)
        points = [line.interpolate(d) for d in distances]

        elevations = []
        for pt in points:
            try:
                elev = get_point_elevation(pt)
                elevations.append(elev)
            except:
                elevations.append(None)

        valid = [(d, e) for d, e in zip(distances, elevations) if e is not None]
        if len(valid) < 2:
            slope_dict[ds_order] = None
            continue

        dists, elevs = zip(*valid)
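        # Distances come from line.length in EPSG:4326, so this fit is in metres per degree
        slope, _ = np.polyfit(dists, elevs, 1)
        # Rescale to metres per metre using the segment lengths stored in metres
        slope = slope * (line.length / total_length)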

        slope_dict[ds_order] = slope

    return slope_dict

def load_based_model(working_directory):
    """
    Load the BASED XGBoost model for depth prediction.
    
    Parameters:
    - working_directory (str): The base working directory.
    
    Returns:
    - model (xgb.Booster): Loaded XGBoost model.
    """
    model_path = os.path.join(working_directory, 'Gearon_etal_2024', 'based-api', 'based_us_sans_trampush_early_stopping_combat_overfitting.ubj')
    
    if not os.path.isfile(model_path):
        raise FileNotFoundError(f"BASED model file not found: {model_path}")
    
    print(f"Loading BASED model from: {model_path}")
    model = xgb.Booster()
    model.load_model(model_path)
    print("BASED model loaded successfully.")
    
    return model

def predict_depth_based(model, width, slope, discharge):
    """
    Predict channel depth using the BASED model.
    
    Parameters:
    - model (xgb.Booster): The loaded BASED XGBoost model.
    - width (float): Channel width in meters.
    - slope (float): Channel slope (dimensionless, e.g., -0.001 for 0.1% gradient).
    - discharge (float): Bankfull discharge in m³/s.
    
    Returns:
    - depth (float): Predicted bankfull depth in meters, or None if inputs invalid.
    """
    
    # Validate inputs
    if width is None or slope is None or discharge is None:
        return None
    
    if width <= 0 or discharge <= 0:
        return None
    
    # BASED uses absolute value of slope (magnitude only)
    slope_abs = abs(slope)
    
    if slope_abs == 0:
        return None
    
    # Create DataFrame with correct feature order: width, slope, discharge
    input_data = pd.DataFrame({
        'width': [width],
        'slope': [slope_abs],
        'discharge': [discharge]
    })
    
    # Convert to DMatrix
    dmatrix = xgb.DMatrix(input_data)
    
    # Predict
    try:
        prediction = model.predict(dmatrix)
        depth = float(prediction[0])
        
        # Basic sanity check
        if depth <= 0:
            return None
            
        return depth
    except Exception as e:
        print(f"Error predicting depth: {e}")
        return None

# Calculate hydraulic geometry by reach
def calculate_hydraulic_geom(river_name, continent_abr, working_directory):
    """
    Calculates hydraulic geometry parameters for each reach within the river segment.

    Parameters:
    - river_name (str): The name of the river.
    - continent_abr (str): Two-letter continent abbreviation.
    - working_directory (str): The directory where data is stored.

    Outputs:
    - A CSV file named "river_name_hydraulic_geometry.csv" containing the hydraulic geometry calculations for each reach.
    """
    
    # Call get_reach() to retrieve reach geometry
    reach_gdf = get_reach(river_name, working_directory)
    
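    # Step 1: Retrieve GQBF data
    gqbf_gdf = get_GQBF(river_name, reach_gdf, continent_abr, working_directory)

    # get_GQBF() returns None when no usable GQBF data is found; skip this river
    if gqbf_gdf is None:
        print(f"Warning: No GQBF data available for {river_name}. Skipping.")
        return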

    # Step 1.5: Compute slope for each reach
    slope_dict = get_slope(reach_gdf, gqbf_gdf)
    
    # Step 1.75: Load BASED model once for this river
    try:
        based_model = load_based_model(working_directory)
    except FileNotFoundError as e:
        print(f"Warning: {e}")
        print("Continuing without BASED depth predictions.")
        based_model = None

    print(f"Calculating hydraulic geometry for {river_name}...")
    
    results = []
    
    # Step 2: Iterate through each reach in reach_gdf
    for _, reach in reach_gdf.iterrows():
        ds_order = reach["ds_order"]

        # Select segments from gqbf_gdf that intersect with the current reach
        reach_segments = gqbf_gdf[gqbf_gdf.intersects(reach.geometry)]

        if not reach_segments.empty:
            median_width = reach_segments["grwl_width"].median()
            median_qbf = reach_segments["qbf"].median()
            length = reach_segments["length"].sum()
        else:
            median_width = median_qbf = length = None

        slope = slope_dict.get(ds_order, None)
        
        # Predict depth using BASED model
        depth = None
        if based_model is not None:
            depth = predict_depth_based(based_model, median_width, slope, median_qbf)

        results.append({
            "ds_order": ds_order,
            "median_width_m": median_width,
            "median_qbf_m3s": median_qbf,
            "length_m": length,
            "slope": slope,
            "BASED_depth_m": depth
        })
    
    # Step 3: Convert results to a DataFrame
    df = pd.DataFrame(results)
    
    # Step 3.5: Print summary statistics
    print(f"\n=== Hydraulic Geometry Summary for {river_name} ===")
    print(f"Total reaches: {len(df)}")
    
    valid_widths = df['median_width_m'].notna().sum()
    valid_discharges = df['median_qbf_m3s'].notna().sum()
    valid_slopes = df['slope'].notna().sum()
    valid_depths = df['BASED_depth_m'].notna().sum()
    
    print(f"Reaches with valid width: {valid_widths}/{len(df)}")
    print(f"Reaches with valid discharge: {valid_discharges}/{len(df)}")
    print(f"Reaches with valid slope: {valid_slopes}/{len(df)}")
    print(f"Reaches with valid BASED depth: {valid_depths}/{len(df)}")
    
    if valid_depths > 0:
        print(f"\nBASED Depth Statistics:")
        print(f"  Mean: {df['BASED_depth_m'].mean():.2f} m")
        print(f"  Median: {df['BASED_depth_m'].median():.2f} m")
        print(f"  Min: {df['BASED_depth_m'].min():.2f} m")
        print(f"  Max: {df['BASED_depth_m'].max():.2f} m")
    else:
        print("\nWarning: No valid BASED depth predictions generated.")
    
    print("="*50 + "\n")
    
    # Step 4: Ensure output directory exists
    output_dir = os.path.join(working_directory, "RiverMapping", "HydraulicGeometry", river_name)
    os.makedirs(output_dir, exist_ok=True)
    
    # Step 5: Save to CSV
    output_csv_path = os.path.join(output_dir, f"{river_name}_hydraulic_geometry.csv")
    df.to_csv(output_csv_path, index=False)
    
    print(f"Hydraulic geometry data saved to: {output_csv_path}")

# Main function to calculate hydraulic geometry for all rivers in the .csv
def process_hydraulic_geom_calculator(csv_file_path):
    """
    Process multiple rivers by calculating hydraulic geometry for each entry in the input CSV file.

    Parameters:
    csv_file_path (str): The file path to the CSV containing input parameters for multiple rivers.
                    Each row should specify:
                    - River name (column 'river_name')
                    - Working directory (column 'working_directory')
                    - HydroAtlas zone (column 'hydroatlas_zone'), used as the continent abbreviation
    Workflow:
    1. Read the CSV file to get parameters for each river.
    2. Loop through each river in the CSV to extract hydraulic geometry metrics for every reach.

    Returns:
    None: This function writes a hydraulic geometry CSV for each river specified in the input CSV file.
    """
    
    # Step 1: Read the CSV file containing input variables for multiple rivers
    river_data = pd.read_csv(csv_file_path)

    # Step 2: Loop through each row (each river) in the CSV
    for index, row in river_data.iterrows():
        # Extract necessary input values from the current CSV row
        river_name = row['river_name']  # Name of the river
        working_directory = row['working_directory']  # Directory for processing
        continent_abbr = row['hydroatlas_zone']  # Continent abbreviation to select GQBF dataset in case extraction needed
        
        # Step 3: Calculate hydraulic geometry for the current river
        calculate_hydraulic_geom(river_name, continent_abbr, working_directory)
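
The slope step in `get_slope` fits elevation against distance along a line measured in degrees (EPSG:4326), then rescales the result with the segment lengths stored in metres. The sketch below reproduces that arithmetic in plain Python on made-up numbers, assuming a reach that is 0.05 degrees long along the line, 4000 m long by attribute, and drops 2 m. The `least_squares_slope` helper is a hypothetical stand-in for `np.polyfit(..., 1)`.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical reach geometry and elevations
length_deg = 0.05   # line length in degrees (EPSG:4326)
length_m = 4000.0   # summed 'length' attribute in metres

# Ten evenly spaced sample points, as in get_slope()
distances_deg = [length_deg * i / 9 for i in range(10)]
elevations_m = [100.0 - 2.0 * i / 9 for i in range(10)]  # uniform 2 m drop

slope_per_degree = least_squares_slope(distances_deg, elevations_m)  # m per degree
slope = slope_per_degree * (length_deg / length_m)                   # m per m

print(round(slope, 6))  # approximately -0.0005
```

The sign is negative when the sample points run from upstream to downstream, which is why the depth prediction takes the absolute value.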

In [6]:
csv_file_path = r"E:\Dissertation\Data\Geyman_river_datasheet.csv"
process_hydraulic_geom_calculator(csv_file_path)

Loading reach shapefile from: E:\Dissertation\Data\RiverMapping\Reaches\Yukon_Beaver\Yukon_Beaver.shp...
Reach shapefile successfully loaded.
Reprojecting reach shapefile to EPSG:4326...
Reach shapefile successfully reprojected to EPSG:4326.
Shapefile found: E:\Dissertation\Data\GQBF\Extracted_rivers\Yukon_Beaver\Yukon_Beaver.shp. Loading data...
Trimming GQBF data to retain only ds_order = 1 segments...
Identifying the most upstream segment...
Upstream segment identified.
Mapping out mainstem.
Mainstem mapped.
Loading BASED model from: E:\Dissertation\Data\Gearon_etal_2024\based-api\based_us_sans_trampush_early_stopping_combat_overfitting.ubj
BASED model loaded successfully.
Calculating hydraulic geometry for Yukon_Beaver...

=== Hydraulic Geometry Summary for Yukon_Beaver ===
Total reaches: 1
Reaches with valid width: 1/1
Reaches with valid discharge: 1/1
Reaches with valid slope: 1/1
Reaches with valid BASED depth: 1/1

BASED Depth Statistics:
  Mean: 4.27 m
  Median: 4.27 m
  Min: