## Step 2 Network Reconciliation
---
### This is the second of five steps to running BikewaySim
1. Process network spatial data into a routable network graph format
2. __Reconcile networks into one through node and link overlap conflation__
3. Create final network graph and calculate link costs
4. Create OD tables
5. Run BikewaySim

In this step, the networks are conflated to each other by utilizing functions in the network_conflation.py module. 

There are five main functions in the conflation tools module:
1. Match nearest points between base and join networks
1. Split base links by joining network node
1. Add network attributes by link overlap
1. Add join links/nodes that aren't in base network
1. Resolve reference IDs

The final step combines the different types of networks.

## Import Packages

In [1]:
import os
from pathlib import Path
import time
import pandas as pd  # needed for pd.concat and pd.merge below
import geopandas as gpd
import pickle

#from conflation_tools import *
from network_reconcile import *

### Quick Method (just use overlap)
For a quick reconcile, just run the add_attributes function. For this example, the OSM network will serve as the base network, and the HERE road network will be used to add road attributes such as speed and the number of lanes. The HERE attributes will only be added to the OSM road layer to minimize incorrect matches. (NOTE: attribute matches still need to be QA/QC'd; this function only populates a base network with the most likely match.)

In addition, a non-network GeoJSON file of the Atlanta Regional Commission's Regional Bikeway Inventory 2022 will be used to add additional info to the network.

In [2]:
studyarea_name = 'bikewaysim'
working_dir = Path.home() / Path(f'Documents/NewBikewaySimData')

#import
osm = gpd.read_file(working_dir / Path(f'{studyarea_name}/filtered.gpkg'),layer='osm_links_road')
osm_bike = gpd.read_file(working_dir / Path(f'{studyarea_name}/filtered.gpkg'),layer='osm_links_bike')
here = gpd.read_file(working_dir / Path(f'{studyarea_name}/filtered.gpkg'),layer='here_links_road')
arc_bike = gpd.read_file(working_dir / Path('Data/ARC/Regional_Bikeway_Inventory_2022.geojson')).to_crs('epsg:2240')

ValueError: Null layer: 'here_links_road'

## Merge attribute data
These are custom functions made for each network that sort and refine the columns to avoid adding excess columns.

In [3]:
osm = add_osm_attr(osm,working_dir / Path(f'{studyarea_name}/osm.pkl'))
osm_bike = add_osm_attr(osm_bike,working_dir / Path(f'{studyarea_name}/osm.pkl'))
here = add_here_attr(here,working_dir / Path(f'{studyarea_name}/here.pkl'))
arc_bike = add_arc_bike(arc_bike)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


## Transfer attribute data

In [14]:
#add here road attributes to osm road
outputs = add_attributes(osm, here, 'here', 25, .9)

#add back in osm bike
combined = pd.concat([osm_bike,outputs])

#add arc bike attributes to osm
final = add_attributes(combined, arc_bike, 'arc_bike', 50, .9, dissolve=False)

Dissolving by 5 columns


## Export

In [15]:
final.to_file(Path.home() / 'Downloads/test.gpkg',layer='comb_test')

# Lab 9 Conflation

In [160]:
# working_dir = Path.home() / Path("Downloads/Shortest Path Lab")
# sidewalk_links = gpd.read_file(working_dir / Path('lab_files/sidewalk_network/sidewalks.shp')).to_crs('epsg:2240')
# crosswalks = gpd.read_file(working_dir / Path('lab_files/sidewalk_network/crosswalks.shp')).to_crs('epsg:2240')
# sidewalk_connectors = gpd.read_file(working_dir / Path('Lab Files/lab_9.gpkg'),layer='sidewalk_connectors')
# #sidewalk_nodes = gpd.read_file(working_dir / Path('lab_files/sidewalk_network/sidewalk_nodes.shp')).to_crs('epsg:2240')

# osm_links = gpd.read_file(working_dir / Path('Lab Files/networks.gpkg'),layer='osm links')
# #osm_nodes = gpd.read_file(working_dir / Path('Lab Files/networks.gpkg'),layer='osm nodes')
# #osm_nodes = osm_nodes[osm_nodes['osm_A'].append(osm_nodes['osm_B'].dropna().drop_duplicates().to_list())]

In [161]:
#sidewalk_nodes['id'] = 'SD' + sidewalk_nodes['ID'].astype(str)
#sidewalk_nodes[['id','geometry']].to_file(working_dir / Path('lab9.gpkg'),layer='sidewalk_nodes')

In [162]:
#osm_nodes = osm_nodes[osm_nodes['osm_N'].isin(osm_links['osm_A'].append(osm_links['osm_B']).dropna().drop_duplicates().to_list())]
#osm_links.loc[osm_links['type'].isna(),'type'] = "connector"

In [163]:
#osm_nodes['id'] = 'OSM' + osm_nodes['osm_N'].astype(str)
#osm_nodes[['id','geometry']].to_file(working_dir / Path('lab9.gpkg'),layer='osm_nodes')

In [168]:
# #import
# sidewalk_nodes = gpd.read_file(working_dir / Path('lab9.gpkg'),layer='sidewalk_nodes')
groceries = gpd.read_file(working_dir / Path('lab9.gpkg'),layer='groceries')
origins = gpd.read_file(working_dir / Path('lab9.gpkg'),layer='origins')
# osm_nodes = gpd.read_file(working_dir / Path('lab9.gpkg'),layer='osm_nodes')

# all_nodes = pd.concat([sidewalk_nodes,osm_nodes,origins,groceries])
# all_nodes['lab9_N'] = all_nodes['id']

In [166]:

# #add ref ids to sidewalk connectors from sidewalk nodes
# from network_filter import add_ref_ids
# sidewalk_links = add_ref_ids(sidewalk_links,all_nodes,'lab9')
# crosswalks = add_ref_ids(crosswalks,all_nodes,'lab9')
# sidewalk_connectors = add_ref_ids(sidewalk_connectors,all_nodes,'lab9')
# osm_links = add_ref_ids(osm_links,all_nodes,'lab9')

In [167]:
# from network_filter import add_ref_ids
# dfs = {
#     'sidewalk_links':sidewalk_links,
#     'crosswalks':crosswalks,
#     'sidewalk_connectors':sidewalk_connectors,
#     'osm_links':osm_links
# }

# for key in dfs.keys():
#     df = dfs[key]
#     df = add_ref_ids(df,all_nodes,'lab9')
#     df['A'] = df['lab9_A']
#     df['B'] = df['lab9_B']
#     df['A_B'] = df['A'] + '_' + df['B']
#     df.to_file(working_dir / Path('lab9.gpkg'),layer=key)


Reference IDs successfully added to links.
Reference IDs successfully added to links.
Reference IDs successfully added to links.
Reference IDs successfully added to links.


In [186]:
import itertools

combs = list(itertools.product(origins['id'],groceries['id']))
df = pd.DataFrame.from_records(combs,columns=['ori','dest'])
df = pd.merge(df,origins,left_on='ori',right_on='id')
df = gpd.GeoDataFrame(df,geometry='geometry',crs='epsg:2240').to_crs('epsg:4326')
df['ori_lat'] = df.geometry.y
df['ori_lon'] = df.geometry.x
df.drop(columns=['id','geometry'],inplace=True)

df = pd.merge(df,groceries,left_on='dest',right_on='id')
df = gpd.GeoDataFrame(df,geometry='geometry',crs='epsg:2240').to_crs('epsg:4326')
df['dest_lat'] = df.geometry.y
df['dest_lon'] = df.geometry.x
df.drop(columns=['id','geometry'],inplace=True)

df['trip_id'] = df['ori'] + '_' + df['dest']

df.to_csv(working_dir / Path('trips.csv'),index=False)

### Links and Nodes to Conflate
Determine which network will be the base and which will be the join. All of the links and nodes from the base network will be present in the final network.

For this project, OSM served as the base network, with HERE and ABM as the join networks. Only the road and bike networks created in the first step were used for conflation.

In [11]:
base_name = 'osm'
join_name = 'here'
study_area = 'bikewaysim'

### Road Link Conflation

I like to remove all of the columns that aren't related to node_id or geometry for this step. To make sure we preserve link information, I also make an A_B column.

In [None]:
base_links, base_nodes = import_network(base_name,'road',study_area)
join_links, join_nodes = import_network(join_name,'road',study_area)

#initialize the base network
base_links, base_nodes = initialize_base(base_links, base_nodes, join_name)

### Node Matching
This function matches nodes within a set tolerance (in CRS units) that are likely to be the same node. It is intended for matching road intersections or road termini, since these are likely to be in both networks. If you're not sure what a good tolerance is, the function can be applied iteratively with an increasing tolerance. At some point, the number of matched nodes will stop increasing by much.

The match results will get printed out.

#### NOTE: This function handles duplicate matches (i.e. when two or more nodes share a nearest node in the other network) by selecting the one with the shorter match distance. The duplicates won't be rematched unless you run the matching process again.

#### When looping the match function, feed in the outputs from the previous run

### Function Inputs
- base_nodes, base_name, join_nodes, join_name: self-explanatory
- tolerance_ft: the match tolerance in feet
- prev_matched_nodes: GeoDataFrame of the currently matched nodes; set to None for the first run
- remove_duplicates: if set to True (default), duplicate matches are removed. If set to False, duplicate matches will be returned in the matched_nodes gdf.
- export_error_lines: if set to True, a GeoJSON of linestrings visualizing the matches will be written.
- export_unmatched: set this to True if you want a GeoJSON of the nodes that didn't match in each network (False by default).

### Function Outputs
- matched_nodes: a df of matched nodes, just the node ids.
- unmatched_base_nodes: a gdf of the base nodes that weren't matched.
- unmatched_join_nodes: a gdf of the join nodes that weren't matched.
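The matching and duplicate handling described above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not the code in the module: `match_nodes_sketch` is a hypothetical name, nodes are plain dicts of id to (x, y) in feet, and the nearest-neighbor search is brute force.

```python
import math

def match_nodes_sketch(base_nodes, join_nodes, tolerance_ft):
    """Match each base node to its nearest join node within tolerance_ft.

    base_nodes / join_nodes: dict of node id -> (x, y) in a projected CRS (feet).
    Returns a dict of base id -> (join id, match distance).
    """
    # find the nearest join node for every base node, if within tolerance
    candidates = []
    for base_id, (bx, by) in base_nodes.items():
        best = None
        for join_id, (jx, jy) in join_nodes.items():
            dist = math.hypot(bx - jx, by - jy)
            if dist <= tolerance_ft and (best is None or dist < best[1]):
                best = (join_id, dist)
        if best is not None:
            candidates.append((base_id, best[0], best[1]))

    # resolve duplicates: when several base nodes share a join node,
    # keep only the pair with the shortest match distance
    matched = {}
    claimed = set()  # join ids that already have a match
    for base_id, join_id, dist in sorted(candidates, key=lambda c: c[2]):
        if join_id not in claimed:
            claimed.add(join_id)
            matched[base_id] = (join_id, dist)
    return matched

# made-up coordinates for illustration
base = {'b1': (0.0, 0.0), 'b2': (10.0, 0.0), 'b3': (500.0, 500.0)}
join = {'j1': (3.0, 0.0), 'j2': (520.0, 500.0)}

# increase the tolerance and watch the match count level off
for tol in (10, 25, 50):
    print(tol, len(match_nodes_sketch(base, join, tol)))
```

In this toy data, b1 and b2 both have j1 as their nearest node, so only the closer one (b1) keeps the match, and b2 stays unmatched at every tolerance. That mirrors the note above that duplicates are not rematched within a single run.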

In [None]:
#first match the nodes, can repeat this by adding in previously matched_nodes
tolerance_ft = 25
base_nodes = match_nodes(base_nodes,base_name,join_nodes,join_name,tolerance_ft)

In [None]:
#second iteration example with same tolerance
base_nodes = match_nodes(base_nodes,base_name,join_nodes,join_name,tolerance_ft)

#third iteration example with larger tolerance
tolerance_ft = 30
base_nodes = match_nodes(base_nodes,base_name,join_nodes,join_name,tolerance_ft)

### Link Splitting and Add New Links and Nodes
This function will split links in the base network if there's a node in the join network that is within a certain tolerance. This creates new nodes and links on the base network. The original base links are then replaced with these new links/nodes.

#### NOTE: This may create way more links/nodes than necessary.

It may be wise to limit the kind of join nodes that can split a base link. For instance, OSM has lots of additional links and nodes because it includes sidewalks. The nodes used to access these sidewalks will split the base link, which creates additional links where there otherwise would be none. These added links/nodes can slow down shortest path calculation. However, they could be advantageous in the attribute transfer process.

#### Looping
This function can be looped if unsure what tolerance or nodes to use. 

### Function Inputs
- unmatched_join_nodes: These are the join nodes that weren't matched to base nodes in the previous step
- join_name, base_links, base_name: self-explanatory
- tolerance_ft: the matching tolerance in feet
- export: set to 'True' to get a GeoJSON of new links and nodes that were created

### Function Outputs
- split_lines: a gdf of just the new base links
- split_nodes: a gdf of just the new base nodes
- unmatched_join_nodes: a gdf of the join nodes that didn't match
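To make the splitting idea concrete, here is a minimal sketch for a single straight link. It is an assumption-laden illustration rather than the module's implementation: `split_link_at_node` is a hypothetical helper, links are two-vertex segments, and coordinates are in feet.

```python
import math

def split_link_at_node(link, node, tolerance_ft):
    """Split a straight link ((x1, y1), (x2, y2)) at the point nearest to node.

    Returns (new_node, [link_a, link_b]) if the node lies within tolerance_ft
    of the link's interior, otherwise None.
    """
    (x1, y1), (x2, y2) = link
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return None  # degenerate link, nothing to split

    # fraction along the link of the node's perpendicular projection
    t = ((node[0] - x1) * dx + (node[1] - y1) * dy) / length_sq
    if t <= 0 or t >= 1:
        return None  # projects onto or past an endpoint; node matching covers that case

    new_node = (x1 + t * dx, y1 + t * dy)
    if math.hypot(node[0] - new_node[0], node[1] - new_node[1]) > tolerance_ft:
        return None  # join node is too far from the base link

    return new_node, [((x1, y1), new_node), (new_node, (x2, y2))]

base_link = ((0.0, 0.0), (100.0, 0.0))
print(split_link_at_node(base_link, (40.0, 10.0), 25))  # split near x = 40
print(split_link_at_node(base_link, (40.0, 60.0), 25))  # None, outside tolerance
```

Every join node that passes the tolerance check adds one node and one extra link, which is why filtering the join nodes first (for example, dropping sidewalk access nodes) keeps the network smaller.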

In [None]:
unmatched_join_nodes.head()

In [None]:
#create new node and lines from the base links by splitting lines can repeat after the add_new_links_nodes function
tolerance_ft = 25
split_lines, split_nodes, unmatched_join_nodes = split_lines_create_points(unmatched_join_nodes,
                                                                           join_name,
                                                                           base_links,
                                                                           base_name,
                                                                           tolerance_ft,
                                                                           export = False)
split_lines.head()

In [None]:
#add new links and nodes to the base links and nodes created from split_lines_create_points function
new_links, new_nodes = add_new_links_nodes(base_links, matched_nodes_final, split_lines, split_nodes, base_name)
new_links.head()

### Attribute Transfer
In the previous steps, we found geometric commonalities between the networks. In this step, we want to transfer attribute information from the join network into the base network. Link attributes are based on a link's reference ids, but the current set of links may not have reference ids that correspond to a join network link.

To address this, we buffer the base links and intersect them with the join links. We then measure the length of the resulting linestrings. The attribute information from the join link with the maximum length (i.e. the maximum amount of overlap with the base link) is transferred. This ensures that each base link is associated with only one join link's attributes.

### NOTE: The buffer here needs to be smaller
If it's larger, then a longer join link could be selected as the join link with the most overlap.

This process will likely change in the future. A different approach might be to look at all the base links with at least one join node in the reference id column, and then look up all the links in the join network associated with that node (there should only be a few). For the other reference node, which doesn't have a join node id, the nearest node in that lookup table could then be found.
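The maximum-overlap rule, and why the buffer size matters, can be illustrated with a rough pure-Python sketch. It approximates the buffer-and-intersect step by sampling points along each join link instead of using real geometry operations; all function names and coordinates here are hypothetical and links are straight two-vertex segments.

```python
import math

def point_to_segment_dist(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def overlap_length(base_link, join_link, buffer_ft, samples=100):
    """Approximate length of join_link lying within buffer_ft of base_link."""
    (ax, ay), (bx, by) = join_link
    join_len = math.hypot(bx - ax, by - ay)
    inside = 0
    for i in range(samples):
        t = (i + 0.5) / samples  # midpoint of each sample interval
        p = (ax + t * (bx - ax), ay + t * (by - ay))
        if point_to_segment_dist(p, base_link[0], base_link[1]) <= buffer_ft:
            inside += 1
    return join_len * inside / samples

def best_join_link(base_link, join_links, buffer_ft):
    """Return the id of the join link with the most overlap, or None."""
    best_id, best_len = None, 0.0
    for join_id, join_link in join_links.items():
        length = overlap_length(base_link, join_link, buffer_ft)
        if length > best_len:
            best_id, best_len = join_id, length
    return best_id

base_link = ((0.0, 0.0), (100.0, 0.0))
join_links = {
    'parallel': ((0.0, 5.0), (100.0, 5.0)),    # same street, slightly offset
    'cross': ((50.0, -200.0), (50.0, 200.0)),  # longer cross street
}
print(best_join_link(base_link, join_links, buffer_ft=30))   # parallel
print(best_join_link(base_link, join_links, buffer_ft=150))  # cross
```

With the 30 ft buffer the offset parallel link wins, but with a 150 ft buffer the longer cross street accumulates more length inside the buffer and is selected instead, which is the failure mode the note above warns about.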

In [None]:
#match attribute information with greatest overlap from joining links
buffer_ft = 30
new_base_links_w_attr = add_attributes(new_links, base_name, join_links, join_name, buffer_ft)
new_base_links_w_attr.head()

### Add rest of features
Now that we've settled the geometric and attribute commonalities between the base and join networks, we can add in the join network features that aren't represented in the base network. This is done using a buffer. If a join link is covered at least 95% by a base link, then it is left out.
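The 95% coverage rule can be written as a small filter. This sketch assumes the covered length of each join link (the part inside the base-link buffer) has already been measured; `links_to_add` and the sample numbers are hypothetical.

```python
def links_to_add(join_lengths, covered_lengths, threshold=0.95):
    """Return ids of join links whose covered share is below threshold.

    join_lengths: dict of join link id -> total link length
    covered_lengths: dict of join link id -> length inside the base-link buffer
    """
    keep = []
    for join_id, total in join_lengths.items():
        covered = covered_lengths.get(join_id, 0.0)
        # assumption: a zero-length link is treated as already covered
        share = covered / total if total > 0 else 1.0
        if share < threshold:
            keep.append(join_id)
    return keep

join_lengths = {'j1': 200.0, 'j2': 150.0, 'j3': 80.0}
covered_lengths = {'j1': 198.0, 'j2': 60.0}  # j3 never touches the buffer
print(links_to_add(join_lengths, covered_lengths))  # ['j2', 'j3']
```

Here j1 is 99% covered by a base link and is left out, while j2 (40% covered) and j3 (not covered at all) would be added to the network.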

In [None]:
#add unrepresented features from joining by looking at the attributes added in previous step for links and the list of matched nodes
added_base_links, added_base_nodes = add_rest_of_features(new_base_links_w_attr,new_nodes,base_name,join_links,join_nodes,join_name)

#create new abmhere column with id and geo
final_links, final_nodes = fin_subnetwork(added_base_links,added_base_nodes,base_name,join_name)

final_links.to_file(rf'processed_shapefiles/conflation/{base_name+join_name}_links.geojson')
final_nodes.to_file(rf'processed_shapefiles/conflation/{base_name+join_name}_nodes.geojson')

### Save as pickle, this is more of a progress save

In [None]:
#use separate files so the nodes do not overwrite the links
#pickle.dump(added_base_links, open("processed_shapefiles/conflation/inter/abm_here_road_links.p","wb"))
#pickle.dump(added_base_nodes, open("processed_shapefiles/conflation/inter/abm_here_road_nodes.p","wb"))

### Repeat for OSM
Now that we've resolved ABM and HERE, we can add the second join network.

In [None]:
base_name = "abmhere"
base_links = final_links
base_nodes = final_nodes

join_name = "osm"
join_links = gpd.read_file(r"processed_shapefiles/osm/osm_bikewaysim_road_links.geojson")
join_nodes = gpd.read_file(r"processed_shapefiles/osm/osm_bikewaysim_road_nodes.geojson")

In [None]:
#clean join links (no need to clean base links)
join_links, join_nodes = cleaning_process(join_links,join_nodes,join_name)

In [None]:
#first match the nodes, can repeat this by adding in previously matched_nodes
tolerance_ft = 25
matched_nodes, unmatched_base_nodes, unmatched_join_nodes = match_nodes(base_nodes, base_name, join_nodes, join_name, tolerance_ft, prev_matched_nodes=None)

#join the matched nodes to the base nodes once done with matching
matched_nodes_final = pd.merge(base_nodes, matched_nodes, on = f'{base_name}_ID', how = "left")

In [None]:
#create new node and lines from the base links by splitting lines can repeat after the add_new_links_nodes function
tolerance_ft = 25
split_lines, split_nodes, unmatched_join_nodes = split_lines_create_points(unmatched_join_nodes,
                                                                           join_name,
                                                                           base_links,
                                                                           base_name,
                                                                           tolerance_ft,
                                                                           export = False)
split_lines.head()
split_lines.to_file('processed_shapefiles/conflation/split_lines.geojson')

In [None]:
#add new links and nodes to the base links and nodes created from split_lines_create_points function
new_links, new_nodes = add_new_links_nodes(base_links, matched_nodes_final, split_lines, split_nodes, base_name)
new_links.head()

In [None]:
#match attribute information with greatest overlap from joining links
buffer_ft = 30
new_base_links_w_attr = add_attributes(new_links, base_name, join_links, join_name, buffer_ft)
new_base_links_w_attr.head()

In [None]:
#add unrepresented features from joining by looking at the attributes added in previous step for links and the list of matched nodes
added_base_links, added_base_nodes = add_rest_of_features(new_base_links_w_attr,new_nodes,base_name,join_links,join_nodes,join_name)

#create new abmhere column with id and geo
final_links, final_nodes = fin_subnetwork(added_base_links,added_base_nodes,base_name,join_name)

In [None]:
final_links.to_file(rf'processed_shapefiles/conflation/{base_name+join_name}_links.geojson')
final_nodes.to_file(rf'processed_shapefiles/conflation/{base_name+join_name}_nodes.geojson')

# Bike Subnetworks

In [None]:
#bike layers
bike_links = gpd.read_file(r'processed_shapefiles/here/here_bikewaysim_bike_links.geojson')
bike_nodes = gpd.read_file(r'processed_shapefiles/here/here_bikewaysim_bike_nodes.geojson')
bike_name = 'here'

In [None]:
#clean excess columns
bike_links, bike_nodes = cleaning_process(bike_links,bike_nodes,bike_name)

### Merge with other networks

In [None]:
tolerance_ft = 25
merged_links, merged_nodes = merge_diff_networks(added_base_links, added_base_nodes, 'road', bike_links, bike_nodes, 'bike', tolerance_ft)

### Add reference IDs

In [None]:
# match reference IDs based on all the id in the nodes
refid_base_links = add_reference_ids(merged_links, merged_nodes)

In [None]:
refid_base_links.head()

### Export

In [None]:
refid_base_links.to_file(r'processed_shapefiles\conflation\final_links.geojson', driver = 'GeoJSON')
merged_nodes.to_file(r'processed_shapefiles\conflation\final_nodes.geojson', driver = 'GeoJSON')

## Convert for use in BikewaySim

This last section focuses on making sure that the conflated network is readable by BikewaySim. After this is completed, you can run the Running BikewaySim notebook.

In [None]:
import os
from pathlib import Path
import time
import pandas as pd
import geopandas as gpd
import pickle

#make directory/pathing more intuitive later
file_dir = r"C:\Users\tpassmore6\Documents\BikewaySimData" #directory of bikewaysim network processing code

#change this to where you stored this folder
os.chdir(file_dir)

### Specify filepaths

In [None]:
#filepath for just OSM network (left unset here; assign both paths before using this option)
#conflated_linksfp
#conflated_nodesfp

#filepath for conflated network (the files written in the Export step above)
conflated_linksfp = r'processed_shapefiles\conflation\final_links.geojson'
conflated_nodesfp = r'processed_shapefiles\conflation\final_nodes.geojson'

#filepaths for network attribute data (doesn't have to be a shapefile)
abm_linksfp = r'processed_shapefiles\abm\abm_bikewaysim_base_links.geojson'
here_linksfp = r'processed_shapefiles\here\here_bikewaysim_base_links.geojson'
osm_linksfp = r'base_shapefiles\osm\osm_links_attr.p'

### Node cleaning and export

In [None]:
#import conflated nodes
conflated_nodes = gpd.read_file(conflated_nodesfp)

#drop the num links columns
conflated_nodes = conflated_nodes.drop(columns=['abm_num_links','here_num_links'])

#create an N column that takes the abm_ID if available, otherwise the here_ID
#fillna handles both None and NaN, which an `== None` comparison would miss
conflated_nodes['N'] = conflated_nodes['abm_ID'].fillna(conflated_nodes['here_ID'])

#create UTM coords columns
conflated_nodes['X'] = conflated_nodes.geometry.x
conflated_nodes['Y'] = conflated_nodes.geometry.y

#reproject and find latlon
conflated_nodes = conflated_nodes.to_crs(epsg=4326)
conflated_nodes['lon'] = conflated_nodes.geometry.x
conflated_nodes['lat'] = conflated_nodes.geometry.y

#filter
conflated_nodes = conflated_nodes[['N','X','Y','lon','lat','geometry']]

#export
conflated_nodes.to_file(r'processed_shapefiles\prepared_network\nodes\nodes.geojson',driver='GeoJSON')
#drop geometry before writing the CSV
conflated_nodes.drop(columns=['geometry']).to_csv(r'processed_shapefiles\prepared_network\nodes\nodes.csv')

### Link cleaning and export

In [None]:
#import conflated network
conflated_links = gpd.read_file(conflated_linksfp)

#### Merging function

In [None]:
def merge_network_and_attributes(conflated_links,attr_network,cols_to_keep):
    #find the shared columns between conflated network and attribute network
    shared_cols = list(conflated_links.columns[conflated_links.columns.isin(attr_network.columns)])

    if len(shared_cols) > 2:
        #merge based on shared columns
        conflated_links = pd.merge(conflated_links,attr_network[cols_to_keep + shared_cols],on=shared_cols,how='left')
        print(conflated_links.head(20))
    else:
        print(f'Attr_network columns not in conflated network')
    return conflated_links

In [None]:
#import data with attributes, don't bring in geometry
abm_links = gpd.read_file(abm_linksfp,ignore_geometry=True)

#specify which columns you need
cols_to_keep = ['NAME','SPEEDLIMIT','two_way']

#perform the merge
conflated_links = merge_network_and_attributes(conflated_links,abm_links,cols_to_keep)

#delete data with attributes to free up memory
del(abm_links)

In [None]:
here_links = gpd.read_file(here_linksfp,ignore_geometry=True)

cols_to_keep = ['ST_NAME','DIR_TRAVEL']

conflated_links = merge_network_and_attributes(conflated_links,here_links,cols_to_keep)
del(here_links)

In [None]:
#use a context manager so the file handle is closed after loading
with open(osm_linksfp,"rb") as f:
    osm_links = pickle.load(f)

cols_to_keep = ['name']

conflated_links = merge_network_and_attributes(conflated_links,osm_links,cols_to_keep)
del(osm_links)

In [None]:
conflated_links.head()