# Extract river segments from HydroRIVERS

The following code extracts a single river from a continental-scale rivers shapefile in the HydroATLAS "HydroRIVERS" database, a large polyline dataset containing segments of all large rivers in the world. The user should first open the shapefile in a GIS and find the river to extract. The user should then choose the most upstream segment to include in the extraction and record its 'HYRIV_ID', a unique numeric identifier for each segment in the dataset. The user then enters the path to the shapefile, the 'HYRIV_ID', a river name, and an output folder (optional). Executing the code extracts the segment with the given 'HYRIV_ID' and all segments downstream of it by following each segment's 'NEXT_DOWN' field, terminating at the ocean or a depositional basin (where 'NEXT_DOWN' is 0). The result is written as a shapefile named after the river, in a subfolder of the output folder that is also named after the river.
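To illustrate the traversal described above, here is a minimal sketch on a toy network. It assumes only that each segment stores the ID of its downstream neighbour in 'NEXT_DOWN' and that 0 means there is no downstream segment, as the code in this notebook does. The segment IDs and the `trace_downstream` helper are made up for illustration; they are not part of HydroRIVERS or of this notebook's functions.

```python
# Toy network: keys are HYRIV_ID values, values are NEXT_DOWN values.
# All IDs are hypothetical; 0 means "no downstream segment".
toy_network = {
    101: 103,  # headwater segment
    102: 103,  # tributary that joins at segment 103
    103: 104,
    104: 0,    # outlet
}


def trace_downstream(next_down, start_id):
    """Return start_id followed by the ID of every segment downstream of it."""
    if start_id not in next_down:
        raise ValueError(f"HYRIV_ID {start_id} not found")
    path = []
    current_id = start_id
    # Follow the NEXT_DOWN pointers until the outlet (0) or an unknown ID is reached
    while current_id != 0 and current_id in next_down:
        path.append(current_id)
        current_id = next_down[current_id]
    return path


print(trace_downstream(toy_network, 101))  # [101, 103, 104]
print(trace_downstream(toy_network, 102))  # [102, 103, 104]
```

Because each segment points to a single downstream neighbour, starting from segment 101 does not pick up the tributary 102; only the path from the chosen segment to the outlet is returned.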

Author: James (Huck) Rees; PhD Student, UCSB Geography

Date: June 25th, 2024

## Import libraries

In [1]:
import geopandas as gpd
import pandas as pd
import os

## Initialize functions

In [2]:
def find_downstream_segments(gdf, hyriv_id):
    """
    Find all downstream segments for a given river segment.

    Parameters:
    gdf (gpd.GeoDataFrame): The GeoDataFrame containing river segments.
    hyriv_id (int): The HYRIV_ID of the starting river segment.

    Returns:
    gpd.GeoDataFrame: A GeoDataFrame containing all downstream segments.
    """
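    downstream_segments = []
    current_segment = gdf[gdf['HYRIV_ID'] == hyriv_id]

    # Fail early with a clear message; pd.concat below raises on an empty list
    if current_segment.empty:
        raise ValueError(f"HYRIV_ID {hyriv_id} was not found in the input data.")

    while not current_segment.empty:
        downstream_segments.append(current_segment)
        next_down_id = current_segment.iloc[0]['NEXT_DOWN']

        # A NEXT_DOWN of 0 means there is no further downstream segment
        if next_down_id == 0:
            break

        # Move to the next segment downstream
        current_segment = gdf[gdf['HYRIV_ID'] == next_down_id]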

    return gpd.GeoDataFrame(pd.concat(downstream_segments, ignore_index=True))

def extract_downstream_rivers(shapefile_path: str, hyriv_id: int, output_folder: str, river_name: str) -> None:
    """
    Extract and save all downstream river segments starting from a given segment.

    Parameters:
    shapefile_path (str): The path to the input shapefile containing river segments.
    hyriv_id (int): The HYRIV_ID of the starting river segment.
    output_folder (str): The folder path where the output shapefile will be stored.
    river_name (str): The name of the river to be used as the shapefile name and subfolder.

    Returns:
    None
    """
    # Construct the full path to the output shapefile, including a subfolder named river_name
    output_directory = os.path.join(output_folder, river_name)
    output_shapefile_path = os.path.join(output_directory, river_name + '.shp')

    # Create the directory if it does not exist
    os.makedirs(output_directory, exist_ok=True)

    # Load the shapefile into a GeoDataFrame
    gdf = gpd.read_file(shapefile_path)

    # Find all downstream segments starting from the specified HYRIV_ID
    downstream_gdf = find_downstream_segments(gdf, hyriv_id)

    # Save the downstream segments to the constructed output shapefile path
    downstream_gdf.to_file(output_shapefile_path)

## Input variables and run

In [3]:
# Required inputs
hyriv_id = 40784746  # Replace with your starting HYRIV_ID
river_name = 'Brahmaputra'   # Name of the river; names the output subfolder and shapefile and identifies the river in all later analysis
shapefile_path = r'C:\Users\huckr\Desktop\UCSB\Dissertation\Data\RiverMapping\HydroATLAS\HydroRIVERS\Asia\HydroRIVERS_Asia.shp'

# Optional inputs
output_folder = r'C:\Users\huckr\Desktop\UCSB\Dissertation\Data\RiverMapping\HydroATLAS\HydroRIVERS\Extracted_Rivers'

extract_downstream_rivers(shapefile_path, hyriv_id, 
                          output_folder, 
                          river_name)
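
The description lists the output path as optional, but `extract_downstream_rivers` as written requires `output_folder`. One possible way to make it optional is sketched below, assuming a hypothetical helper `resolve_output_folder` that falls back to an 'Extracted_Rivers' folder next to the input shapefile. Both the helper and the default folder name are assumptions for illustration, not part of the original workflow.

```python
import os


def resolve_output_folder(shapefile_path, output_folder=None):
    """Return output_folder if given, otherwise a default folder beside the input shapefile."""
    if output_folder:
        return output_folder
    # Assumed default: an 'Extracted_Rivers' folder in the same directory as the shapefile
    return os.path.join(os.path.dirname(shapefile_path), 'Extracted_Rivers')


# Example with hypothetical relative paths
example_shapefile = os.path.join('data', 'HydroRIVERS_Asia.shp')
default_folder = resolve_output_folder(example_shapefile)
explicit_folder = resolve_output_folder(example_shapefile, 'my_outputs')
print(default_folder)
print(explicit_folder)
```

With such a helper, the call above could pass `resolve_output_folder(shapefile_path)` in place of `output_folder` when no output location has been chosen.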