# Assigning River Names to a Global River Dataset

This notebook documents the workflow for harvesting river locations and names from [OpenStreetMap](https://www.openstreetmap.org/) (OSM) and using them to assign names to the [HydroRivers](https://www.hydrosheds.org/page/hydrorivers) dataset.

## Harvesting global river data from OSM

All elements tagged `waterway=river` were downloaded from OSM through the [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) and saved to a shapefile using [a Python script](https://github.com/rachelthoms/wri-projects/blob/main/ocean-watch/processing-scripts/global-rivers/harvest-rivers.py).

The script failed to download data for 3 bounding boxes in Europe because of the high density of rivers there. A modified version of the script, which uses smaller bounding boxes and retries with longer timeouts, was run to download and save this data. See code below:

In [1]:
import osm2geojson 
import requests
import json
import fiona
from fiona.crs import from_epsg
import time
import geopandas as gdp
import pandas as pd
import geojson

In [2]:
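# Define list of missing bounding boxes (note these are 1/2 the size of the bounding boxes in the original script)
# Each tuple is (south, west, north, east) in decimal degrees, the order the Overpass bbox filter expects
missing_bbox_list = [(40,0,45,10),(45,0,50,10),(40,10,45,20),(45,10,50,20)]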

# Define the progression of timeouts - increasing by 3 minutes, every 3 attempts
time_outs=[180,180,180,360,360,360,540,540,540]

# Define a list to store the river elements
rivers=[]
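
The half-size boxes listed above can be produced by splitting a larger box along its latitude range. The following is a minimal sketch of that idea, not the method used by the original script: `split_bbox_by_latitude` is a hypothetical helper, and the input box is illustrative only, since the original script's boxes are not shown in this notebook.

```python
def split_bbox_by_latitude(bbox):
    """Split a (south, west, north, east) box into a southern and a northern half."""
    south, west, north, east = bbox
    mid = (south + north) / 2  # latitude at which the box is cut
    return [(south, west, mid, east), (mid, west, north, east)]


# Illustrative input only (an assumption, not taken from the original script)
halves = split_bbox_by_latitude((40, 0, 50, 10))
print(halves)
```

Note that the midpoint is computed with true division, so it comes back as a float even when the inputs are integers.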

In [3]:
# Define the url to the hosted Overpass API to use
overpass_url = "https://overpass.openstreetmap.fr/api/interpreter"

In [4]:
# Create a template for the Overpass QL query
template = """
[out:json]
[timeout:{time_out}];
way{bbox}[waterway=river]["name"];
(._;>;);out;
"""

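To make the template above concrete, the short sketch below fills it in for a single bounding box and timeout and prints the resulting query. It relies on Python's string form of a tuple, which yields the parenthesised, comma-separated `(south, west, north, east)` filter placed after `way`. The values come from the lists defined earlier; nothing is sent to the server here.

```python
# Same template as in the cell above, repeated so this sketch runs on its own
template = """
[out:json]
[timeout:{time_out}];
way{bbox}[waterway=river]["name"];
(._;>;);out;
"""

# Fill in the first timeout and the first missing bounding box
query = template.format(time_out=180, bbox=(40, 0, 45, 10))
print(query)
```

Printing the query before sending it is a cheap way to check the bounding-box order and the timeout value.
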
In [5]:
# Iterate through the list of missing bounding boxes, download the tagged river elements, 
# and save as a geojson object within a list: 

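for bbox in missing_bbox_list:
    print("Getting rivers for: " + str(bbox))
    # One attempt per entry in time_outs, so the index never runs past the end of the list
    for attempt in range(len(time_outs)):
        response = None
        try:
            overpass_query = template.format(time_out=time_outs[attempt], bbox=bbox)
            response = requests.get(overpass_url, params={'data': overpass_query}, headers={'User-agent': 'wriuser'})
            data = response.json()
            # Named river_features so that the imported geojson module is not shadowed
            river_features = osm2geojson.json2geojson(data)
            break
        except Exception:
            # response is still None if the request itself failed
            status = response.status_code if response is not None else "no response"
            print("error " + str(status) + " - retrying in 3 minutes")
            time.sleep(180)
    else:
        # Every attempt failed: stop processing the remaining bounding boxes
        print("unable to fetch data for " + str(bbox))
        break
    for way in river_features['features']:
        rivers.append(way)
    print(str(len(river_features['features'])) + " rivers added. Total rivers: " + str(len(rivers)))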

Getting rivers for: (40, 0, 45, 10)
8445 rivers added. Total rivers: 8445
Getting rivers for: (45, 0, 50, 10)
27298 rivers added. Total rivers: 35743
Getting rivers for: (40, 10, 45, 20)
3042 rivers added. Total rivers: 38785
Getting rivers for: (45, 10, 50, 20)
11924 rivers added. Total rivers: 50709


In [6]:
# For each river, convert the GeoJSON feature to a record that fiona can read and write it to a shapefile
schema = {'geometry': 'LineString', 'properties': {'Name':'str:80'}}
shapeout = "missing_osm_river_names.shp"
with fiona.open(shapeout, 'w',crs=from_epsg(4326),driver='ESRI Shapefile', encoding='utf-8', schema=schema) as output:
    for way in rivers:
        # the coordinates are already in (lon, lat) order, so they are passed through unchanged
        line = {'type': 'LineString', 'coordinates': way['geometry']['coordinates']}
        prop = {'Name': way['properties']['tags']['name']}
        try:
            output.write({'geometry': line, 'properties':prop})
        except Exception:
            # report the river that could not be written and carry on with the rest
            print("could not add " + prop["Name"])

could not add Rio Mare Foghe


In [7]:
# Write the same rivers to a GeoJSON file, using the same schema as the shapefile
schema = {'geometry': 'LineString', 'properties': {'Name':'str:80'}}
shapeout = "missing_osm_river_names.shp.geojson"
with fiona.open(shapeout, 'w',crs=from_epsg(4326),driver='GeoJSON', encoding='utf-8', schema=schema) as output:
    for way in rivers:
        # the coordinates are already in (lon, lat) order, so they are passed through unchanged
        line = {'type': 'LineString', 'coordinates': way['geometry']['coordinates']}
        prop = {'Name': way['properties']['tags']['name']}
        try:
            output.write({'geometry': line, 'properties':prop})
        except Exception:
            # report the river that could not be written and carry on with the rest
            print("could not add " + prop["Name"])

could not add Rio Mare Foghe
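
The output does not say why this one river could not be written. One possible cause, which is an assumption and has not been verified against the data, is that its geometry is not a `LineString`: the cells above label every geometry as `LineString` regardless of its actual type. The sketch below shows one way to separate features by geometry type before writing. `split_by_geometry_type` is a hypothetical helper, and the two sample features are made up for illustration.

```python
def split_by_geometry_type(features, wanted="LineString"):
    """Return (matching, other): features whose geometry type is `wanted`, and the rest."""
    matching, other = [], []
    for feature in features:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == wanted:
            matching.append(feature)
        else:
            other.append(feature)
    return matching, other


# Made-up sample features in the same shape as the items stored in `rivers`
sample = [
    {"type": "Feature",
     "geometry": {"type": "LineString", "coordinates": [[0.0, 40.0], [0.1, 40.1]]},
     "properties": {"tags": {"name": "Example River A"}}},
    {"type": "Feature",
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0.0, 40.0], [0.1, 40.0], [0.1, 40.1], [0.0, 40.0]]]},
     "properties": {"tags": {"name": "Example River B"}}},
]

lines, skipped = split_by_geometry_type(sample)
for feature in skipped:
    print("not a LineString: " + feature["properties"]["tags"]["name"])
```

Running a check like this on `rivers` would show whether the skipped river has a different geometry type or fails for some other reason.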


The original OSM river dataset and the smaller European dataset were combined into one shapefile. See code below:

In [9]:
# load in the two OSM river datasets
to_add_gdf = gdp.read_file("missing_osm_river_names.shp")
rivers_gdf = gdp.read_file("merged_osm_river_names.shp")

In [10]:
# combine the datasets
rivers_gdf = pd.concat([to_add_gdf, rivers_gdf], ignore_index=True)

In [12]:
# save as a shapefile
rivers_gdf.to_file(driver = 'ESRI Shapefile', encoding = 'utf-8', filename = 'osm_rivers_final.shp')