## OSM Network Download
Start here. OSM data are downloaded from Geofabrik in `.pbf` format so that all tags are retrieved. The data are then filtered with `osmium` to keep only transportation ways and nodes, and processed into a routable network with `OSMnx`. Lastly, OSM cycling infrastructure is extracted.

In [1]:
import geopandas as gpd
import osmnx as ox
from shapely.geometry import LineString, Point
import pandas as pd
import shutil
from pathlib import Path

#custom
from bikewaysim import general_utils
from bikewaysim.paths import config
from bikewaysim.network import osm_download_functions

## Study Area from Bounding Box (if not already done)
- Draw the study area at https://boundingbox.klokantech.com
- Copy and paste its GeoJSON coordinates into the function below

In [2]:
studyarea = general_utils.bbox_to_gdf([[[-84.4101863402,33.7560977321],[-84.3469908431,33.7560977321],[-84.3469908431,33.8011839015],[-84.4101863402,33.8011839015],[-84.4101863402,33.7560977321]]])
studyarea.to_file(config['project_fp']/'studyarea.geojson')
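`general_utils.bbox_to_gdf` is a project helper whose internals are not shown here. As a rough sketch of what turning the pasted GeoJSON ring into a bounding box involves, the plain-Python function below (hypothetical name `ring_bounds`) reduces the coordinate ring to `(minx, miny, maxx, maxy)` in longitude/latitude. Those four values could then be passed to `shapely.geometry.box` to build the polygon.

```python
def ring_bounds(coords):
    """Return (minx, miny, maxx, maxy) for a GeoJSON-style polygon.

    `coords` is the nested list copied from boundingbox.klokantech.com:
    [[[lon, lat], [lon, lat], ...]] with one outer ring whose first point
    is repeated as the last point.
    """
    ring = coords[0]  # only the outer ring matters for a bounding box
    if ring[0] != ring[-1]:
        raise ValueError("GeoJSON rings must be closed (first point == last point)")
    lons = [pt[0] for pt in ring]
    lats = [pt[1] for pt in ring]
    return min(lons), min(lats), max(lons), max(lats)


bbox = ring_bounds([[[-84.4101863402, 33.7560977321], [-84.3469908431, 33.7560977321],
                     [-84.3469908431, 33.8011839015], [-84.4101863402, 33.8011839015],
                     [-84.4101863402, 33.7560977321]]])
print(bbox)
```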

## Download Geofabrik Extract(s)
Use the cell below to download Geofabrik `.pbf` extract(s) for the desired US state(s) and year, with the year given in `YY` format (both should already be specified in your `config.json` file). Enter `current` instead of a year to get the most recent extract. Alternatively, provide your own extract.

**NOTE:**
- If you change the year or download a more recent extract, delete the old extracts first
- If downloading multiple states, enter the full state names in lowercase, separated by commas
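The internals of `osm_download_functions.download_geofabrik` are not shown. The sketch below (hypothetical `geofabrik_urls`) only illustrates how the `geofabrik_state` / `geofabrik_year` settings could map to download URLs. It assumes that Geofabrik serves current US state extracts as `<state>-latest.osm.pbf` under `https://download.geofabrik.de/north-america/us`, that older snapshots carry a `YYMMDD` suffix, and that the January 1 snapshot (`YY0101`) is the one wanted.

```python
BASE_URL = "https://download.geofabrik.de/north-america/us"


def geofabrik_urls(states, year):
    """Build Geofabrik .pbf URLs for one or more US states.

    states: comma-separated full state names, e.g. "georgia, south carolina"
    year:   two-digit year string such as "23", or "current"
    """
    urls = []
    for state in states.split(","):
        # Geofabrik uses hyphens for multi-word names, e.g. "south-carolina"
        slug = "-".join(state.strip().lower().split())
        # assumption: historical extracts are the January 1 snapshot (YY0101)
        suffix = "latest" if year == "current" else f"{year}0101"
        urls.append(f"{BASE_URL}/{slug}-{suffix}.osm.pbf")
    return urls


print(geofabrik_urls("georgia, south carolina", "23"))
```

Lowercasing and hyphenating inside the helper makes the comma-separated input more forgiving, but the notebook's own function may expect exactly the lowercase format described in the note.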

In [3]:
# osm_download_functions.download_geofabrik(config['geofabrik_state'],config['geofabrik_year'],config['geofabrik_fp'])

## Import / export the study area for the osmium script

In [4]:
studyarea_geo = gpd.read_file(config['studyarea_fp'])

# turn into a bounding box
studyarea_geo = gpd.GeoDataFrame(data={'geometry':general_utils.bounds_to_polygon(studyarea_geo)},index=[0],geometry='geometry',crs=studyarea_geo.crs)
studyarea_geo.to_crs('epsg:4326',inplace=True)

#export to geojson for the script steps
studyarea_geo.to_file(config['geofabrik_fp']/'studyarea.geojson')

## Osmium Scripts
In the next steps, you will need to leave the notebook and run a few commands in Command Prompt or Terminal. Example `.bat` (Windows) and `.sh` (macOS/Linux) scripts are provided that use `osmium-tool` to process the OSM data; the cell below copies them into the Geofabrik folder.

- Change directory to the folder containing the Geofabrik extracts:
    - On Windows: `cd path\to\geofabrik_folder`
    - On macOS/Linux: `cd path/to/geofabrik_folder`
- On Windows, run the `osmium_script.bat` script:
    - type `osmium_script.bat`
- On macOS/Linux, run the `osmium_script.sh` script:
    - type `bash osmium_script.sh` (or `chmod +x osmium_script.sh` followed by `./osmium_script.sh`)

**NOTE: If repeating this step, delete the existing `merged.osm` and `merged.geojson` files first, or you will get an overwrite error.**
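The provided `.bat` / `.sh` scripts are not reproduced here. As a hedged sketch of the kind of pipeline they run, the function below builds `osmium-tool` command lines: merge state extracts, clip to the study area, keep highway-tagged features, and export to GeoJSON with `export-config.json`. The intermediate file names and the `nw/highway` filter are assumptions; only `merged.osm` and `merged.geojson` come from this notebook. osmium's `--overwrite` flag is one way to avoid the overwrite error mentioned in the note above.

```python
import shlex


def osmium_commands(state_pbfs, studyarea="studyarea.geojson",
                    export_config="export-config.json", overwrite=True):
    """Build osmium-tool command lines (as argument lists) for the OSM pipeline.

    Intermediate file names are hypothetical; only merged.osm and
    merged.geojson come from the notebook.
    """
    ow = ["--overwrite"] if overwrite else []  # lets reruns replace old outputs
    cmds = []
    # 1. combine several state extracts into one file (skipped for a single state)
    if len(state_pbfs) > 1:
        cmds.append(["osmium", "merge", *state_pbfs, "-o", "states.osm.pbf", *ow])
        source = "states.osm.pbf"
    else:
        source = state_pbfs[0]
    # 2. clip to the study area polygon exported above
    cmds.append(["osmium", "extract", "-p", studyarea, source,
                 "-o", "studyarea.osm.pbf", *ow])
    # 3. keep nodes and ways tagged highway=* (assumed filter); nodes referenced
    #    by the kept ways are retained by default
    cmds.append(["osmium", "tags-filter", "studyarea.osm.pbf", "nw/highway",
                 "-o", "merged.osm", *ow])
    # 4. convert to GeoJSON; export-config.json controls exported attributes/tags
    cmds.append(["osmium", "export", "merged.osm", "-c", export_config,
                 "-o", "merged.geojson", *ow])
    return cmds


for cmd in osmium_commands(["georgia-latest.osm.pbf"]):
    print(shlex.join(cmd))
```

Each argument list could also be passed to `subprocess.run(cmd, check=True, cwd=config['geofabrik_fp'])` to run the steps without leaving the notebook, provided `osmium` is on the PATH.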

In [5]:
# copy the osmium scripts and export config into the geofabrik folder
shutil.copy(Path.cwd() / 'osmium_scripts/export-config.json',config['geofabrik_fp'])
shutil.copy(Path.cwd() / 'osmium_scripts/osmium_script.bat',config['geofabrik_fp'])
shutil.copy(Path.cwd() / 'osmium_scripts/osmium_script.sh',config['geofabrik_fp'])


## Use OSMnx to process the raw '.osm' version into a network graph

In [6]:
G = ox.graph.graph_from_xml(config['geofabrik_fp']/'merged.osm',simplify=False,retain_all=False)

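# simplify the graph, treating a node as an endpoint (so edges are not merged
# through it) when its incident edges have different OSM way ids (osmid);
# add other edge attributes (e.g., 'highway') to edge_attrs_differ to also split on those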
G = ox.simplification.simplify_graph(G, edge_attrs_differ=['osmid'])

## Convert graph to links
Note that OSMnx creates three columns to identify the new links: u, v, key.
- u: start node id
- v: end node id
- key: distinguishes parallel links that share the same u and v

In [7]:
links = ox.convert.graph_to_gdfs(G,nodes=False)
links.reset_index(inplace=True)

#project links
links.to_crs(config['projected_crs_epsg'],inplace=True)

# drop the reverse copies that OSMnx adds for two-way streets
links = links[links['reversed']==False].copy()

# re-calculate the length of the links using the new geometry
links['length_ft'] = links.length

# remove self-loops (u == v): they can't be used for routing unless split in half
print((links['u'] == links['v']).sum(),'self-loops in the network')
links = links[links['u'] != links['v']]

70 self-loops in the network


## Create a raw gpkg version of the OSM data.

In [8]:
osm_extract = config['geofabrik_fp']/'merged.geojson'

#include these in the main dataframe columns
include_tags = ['@id','@timestamp','@version','@type',
                'highway','oneway','name',
                'bridge','tunnel',
                'cycleway','service',
                'footway','sidewalk',
                'bicycle','foot','access','area','surface']
#remove these from the all tags dict
remove_tags = ['@id','@timestamp','@version','@type']
raw_links, raw_nodes = osm_download_functions.import_raw_osm_from_geojson(osm_extract,include_tags,remove_tags)

# returns a dict of the node sequence for each way (used for elevation)
line_node_ids = osm_download_functions.get_way_node_seq(raw_links)

# deletes the node sequence from the all tags field
raw_links['all_tags'] = raw_links['all_tags'].apply(lambda x: {key:item for key,item in x.items() if key != '@way_nodes'})

## Filter the raw links to remove disconnected features and self-loops

In [9]:
raw_links = raw_links[raw_links['osmid'].isin(set(links['osmid'].tolist()))]

## Get start and end node distances
For each new OSM edge, get the distance from the start of its parent OSM way to the edge's start node and to its end node. In a few cases the end node comes before the start node along the way, because the edge loops back on itself.
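`osm_download_functions.add_start_end_dists` is not shown here. A minimal sketch of the idea, assuming each way's ordered node ids (the `@way_nodes` attribute exported by osmium and collected in `line_node_ids`) and projected node coordinates are available: accumulate segment lengths along the way, then look up the edge's `u` and `v`. The function names are hypothetical, and keeping the first occurrence of a repeated node is an assumption about how closed ways are handled.

```python
from itertools import accumulate
from math import dist


def node_distances(node_seq, coords):
    """Distance from the start of a way to each node in its sequence.

    node_seq: ordered node ids of one OSM way
    coords:   {node_id: (x, y)} in a projected CRS (units carry through, e.g. feet)
    """
    steps = [dist(coords[a], coords[b]) for a, b in zip(node_seq, node_seq[1:])]
    cumulative = [0.0, *accumulate(steps)]
    result = {}
    for node, d in zip(node_seq, cumulative):
        result.setdefault(node, d)  # keep first occurrence; closed ways repeat a node
    return result


def start_end_dist(u, v, node_seq, coords):
    """Return (start_dist, end_dist) of edge u -> v along its parent way."""
    d = node_distances(node_seq, coords)
    return d[u], d[v]


way = [1, 2, 3, 4]
xy = {1: (0, 0), 2: (3, 4), 3: (3, 10), 4: (0, 10)}
print(start_end_dist(2, 4, way, xy))

loop = [1, 2, 3, 1]  # closed way: first node repeated at the end
loop_xy = {1: (0, 0), 2: (3, 4), 3: (3, 0)}
print(start_end_dist(3, 1, loop, loop_xy))
```

In the closed-way example, the edge from node 3 back to node 1 gets an end distance of 0, smaller than its start distance, which is one way the "end before start" case described above can arise.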

In [10]:
links = osm_download_functions.add_start_end_dists(links,raw_links,line_node_ids)

## Add attributes from raw links to osmnx links

In [11]:
# add attributes from the raw links
links = pd.merge(links[['u','v','osmid','length_ft','start_dist','end_dist','geometry']],raw_links.drop(columns=['geometry']),on="osmid")

# sort values so the order is the same every time the data are imported
links.sort_values(['u','v','osmid','length_ft'],inplace=True)

# assign a unique linkid in sequential order
links['linkid'] = range(0,len(links))

## Export

In [12]:
links.to_file(config['network_fp']/'osm.gpkg',layer='edges')
raw_links.to_file(config['network_fp']/'osm.gpkg',layer='raw')
raw_nodes.to_file(config['network_fp']/'osm.gpkg',layer='highway_nodes')