<center>
    <h1>Verbal Explanation of Spatial Temporal GNNs for Traffic Forecasting</h1>
    <h2>Fetching the Nodes Locations on the PeMS-Bay Dataset</h2>
</center>

---

In this notebook, the street and the kilometrage of each node (sensor) in the dataset are fetched.

Note: the extraction of the node geolocations is not deterministic, since it relies on an external geocoding service whose results may change over time. Refer to [node_locations.pkl](../data/pems-bay/structured/node_locations.pkl) for the streets and kilometrages extracted for the experiment.

In [1]:
import sys
import os

# Set the main path in the root folder of the project.
sys.path.append(os.path.join('..'))

In [2]:
# Settings for autoreloading.
%load_ext autoreload
%autoreload 2

In [3]:
from src.utils.seed import set_random_seed

# Set the random seed for deterministic operations.
SEED = 42
set_random_seed(SEED)

# 1 Loading the Data
In this section, the adjacency matrix and the sensor locations are loaded.

In [5]:
import os

BASE_DATA_DIR = os.path.join('..', 'data', 'pems-bay')

In [7]:
from src.data.data_extraction import get_adjacency_matrix

# Get the adjacency matrix
adj_matrix_structure = get_adjacency_matrix(
    os.path.join(BASE_DATA_DIR, 'raw', 'adj_mx_pems_bay.pkl'))

# Get the header of the adjacency matrix, the node indices and the
# matrix itself.
header, node_ids_dict, adj_matrix = adj_matrix_structure

In [8]:
from src.data.data_extraction import get_locations_dataframe

# Get the dataframe containing the latitude and longitude of each sensor.
locations_df = get_locations_dataframe(
    os.path.join(BASE_DATA_DIR, 'raw', 'graph_sensor_locations_pems_bay.csv'),
    has_header=False)

In [9]:
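# Get the node positions dictionary, mapping each node index to its ID.
# Descriptive names avoid shadowing the built-in `id`.
node_pos_dict = {index: node_id for node_id, index in node_ids_dict.items()}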

# 2 Fetch the Nodes Locations and Kilometrage
In this section, the node locations and kilometrages are fetched.

## 2.1 Fetch the Nodes Roads
The road of each node is fetched with geopy by reverse geocoding its latitude and longitude.

In [12]:
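from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

geocoder = Nominatim(user_agent='pems-bay')

# Rate-limit the reverse geocoding calls; the public Nominatim service
# allows at most one request per second.
reverse = RateLimiter(
    geocoder.reverse,
    min_delay_seconds=1,
    return_value_on_exception=None)

In [13]:
from typing import Optional

import pandas as pd

def get_road(row: pd.Series) -> Optional[str]:
    # Reverse geocode the sensor coordinates; None is returned on failure.
    location = reverse((row['latitude'], row['longitude']))
    if location is None:
        return None
    return location.raw.get('address', {}).get('road')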

In [14]:
locations_df['road'] = locations_df.apply(get_road, axis=1)

In [15]:
set(road for road in locations_df['road'].values)

{'Almaden Expressway',
 'Bayshore Freeway',
 'CA 17',
 'CA 85',
 'Curtner Avenue',
 'Guadalupe Freeway',
 'Guadalupe Parkway',
 'Highway 87 Bikeway',
 'Hillsdale Avenue',
 'I 280',
 'I 880',
 'Junipero Serra Freeway',
 'Mill Pond Drive',
 'Nimitz Freeway',
 'North Bascom Avenue',
 'North Shoreline Boulevard',
 'Santa Teresa Boulevard',
 'Sinclair Freeway',
 'Southbay Freeway',
 'Story Road',
 'West Julian Street',
 'West Valley Freeway'}

Some coordinates are assigned to roads that are not highways. The affected nodes and their coordinates are listed below.

In [21]:
for row in locations_df.itertuples():
    if row.road in [
        'Story Road',
        'Almaden Expressway',
        'North Shoreline Boulevard',
        'West Julian Street',
        'Highway 87 Bikeway',
        'Santa Teresa Boulevard',
        'Curtner Avenue',
        'North Bascom Avenue',
        'Hillsdale Avenue',
        'Mill Pond Drive']:
        print(f'{row.road}:',
              'coordinates', 
              f'({row.latitude}, {row.longitude})')

Curtner Avenue: coordinates (37.294949, -121.873109)
Almaden Expressway: coordinates (37.255791, -121.875479)
North Shoreline Boulevard: coordinates (37.4118, -122.077964)
Mill Pond Drive: coordinates (37.291715, -121.871799)
Highway 87 Bikeway: coordinates (37.26198, -121.858479)
Mill Pond Drive: coordinates (37.291828, -121.871836)
North Bascom Avenue: coordinates (37.334602, -121.936288)
Story Road: coordinates (37.336089, -121.848437)
Hillsdale Avenue: coordinates (37.279515, -121.864784)
Santa Teresa Boulevard: coordinates (37.255657, -121.858984)
Hillsdale Avenue: coordinates (37.279628, -121.864838)
West Julian Street: coordinates (37.337137, -121.897741)
Highway 87 Bikeway: coordinates (37.262704, -121.858533)
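
The roads in the cell above are selected by hand. As a minimal sketch, such candidates could also be flagged automatically by comparing each fetched road against a set of expected highway names. The `HIGHWAYS` set and the `sample_roads` list are assumptions made for illustration (the names are taken from the outputs of this notebook) and are not part of the project code.

```python
from collections import Counter

# Assumed set of valid highway names, including alternative names.
HIGHWAYS = {
    'Bayshore Freeway', 'CA 17', 'CA 85', 'Guadalupe Freeway',
    'Guadalupe Parkway', 'I 280', 'I 880', 'Junipero Serra Freeway',
    'Nimitz Freeway', 'Sinclair Freeway', 'Southbay Freeway',
    'West Valley Freeway',
    }

# Illustrative sample of fetched roads, one entry per node.
sample_roads = [
    'Curtner Avenue', 'Mill Pond Drive', 'I 880', 'Mill Pond Drive', 'CA 85']

# Count the nodes assigned to a road that is not an expected highway.
suspicious = Counter(road for road in sample_roads if road not in HIGHWAYS)

for road, count in sorted(suspicious.items()):
    print(f'{road}: {count} node(s) to check manually')
```

The flagged nodes would still need a manual check of their coordinates, as done in this notebook.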


By searching these coordinates, it can be observed that they actually refer to specific highways; hence, these wrongly assigned roads are changed manually. Alternative names of the same highway are also merged under a single name.

In [38]:
import pandas as pd


# Map each wrongly assigned or alternative road name to its highway.
ROAD_NAME_CORRECTIONS = {
    'Story Road': 'Bayshore Freeway',
    'Almaden Expressway': 'West Valley Freeway',
    'North Shoreline Boulevard': 'Bayshore Freeway',
    'West Julian Street': 'Guadalupe Parkway',
    'Highway 87 Bikeway': 'Guadalupe Parkway',
    'Santa Teresa Boulevard': 'West Valley Freeway',
    'Curtner Avenue': 'Guadalupe Parkway',
    'North Bascom Avenue': 'I 880',
    'Hillsdale Avenue': 'Guadalupe Parkway',
    'Mill Pond Drive': 'Guadalupe Parkway',
    'Guadalupe Freeway': 'Guadalupe Parkway',
    'Nimitz Freeway': 'I 880',
    'Junipero Serra Freeway': 'I 280',
    }


def change_road_name(row: pd.Series) -> str:
    # Roads without a correction keep their original name.
    return ROAD_NAME_CORRECTIONS.get(row['road'], row['road'])

In [39]:
locations_df['road'] = locations_df.apply(change_road_name, axis=1)

In [40]:
set(road for road in locations_df['road'].values)

{'Bayshore Freeway',
 'CA 17',
 'CA 85',
 'Guadalupe Parkway',
 'I 280',
 'I 880',
 'Sinclair Freeway',
 'Southbay Freeway',
 'West Valley Freeway'}

## 2.2 Get the Road Beginnings

In order to compute the kilometrages of the nodes, the coordinates of the road beginnings are taken from the [ArcGIS official website](https://www.arcgis.com/home/webmap/viewer.html?featurecollection=https%3A%2F%2Fgeo.dot.gov%2Fserver%2Frest%2Fservices%2FHosted%2FCalifornia_2018_PR%2FFeatureServer%3Ff%3Djson%26option%3Dfootprints&supportsProjection=true&supportsJSONP=true).

In [41]:
road_beginnings = {
    'Bayshore Freeway': (37.767128809088646, -122.40547364738181),
    'CA 17': (37.32030129169095, -121.9404749990039),
    'CA 85': (37.40459857894696, -122.06998937731801),
    'Guadalupe Parkway': (37.36982274309332, -121.92638659077168),
    'I 280': (37.7721327422648, -122.3980151051885),
    'I 880': (37.814745629742546, -122.3008290731509),
    'Sinclair Freeway': (37.463262229915564, -121.90336058919574),
    'Southbay Freeway': (37.38038205127383, -122.07299188639912),
    'West Valley Freeway': (37.40516281598811, -122.06996550808908),
    }
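
The kilometrage step looks up the road of each node in this dictionary, so a road name without an entry would raise a `KeyError`. A minimal sketch of a check that could be run beforehand is given here; `missing_beginnings` and the sample data are hypothetical and only illustrate the idea.

```python
def missing_beginnings(roads, beginnings):
    """Return the sorted road names that have no beginning coordinates."""
    return sorted(set(roads) - set(beginnings))


# Illustrative subset of the dictionary above.
sample_beginnings = {
    'CA 17': (37.32030129169095, -121.9404749990039),
    'I 880': (37.814745629742546, -122.3008290731509),
    }

# 'Nimitz Freeway' stands for a road name that was left uncorrected.
sample_roads = ['I 880', 'CA 17', 'Nimitz Freeway', 'I 880']

print(missing_beginnings(sample_roads, sample_beginnings))
```

An empty list means that every road has a beginning and the kilometrages can be computed safely.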

## 2.3 Get Nodes Kilometrages and Save Results
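Finally, the kilometrage of each node is computed as its distance from the coordinates of the beginning of its road. The distance is the geodesic distance between the two points, so it is only an approximation of the distance travelled along the road.

A dictionary is then built that maps each node to a tuple containing the name of the road it is part of and its kilometrage on that road.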

In [42]:
import pandas as pd
from geopy.distance import distance


def get_kilometrage(
    row: pd.Series,
    ) -> float:
    road = row['road']
    road_beginning_coordinates = road_beginnings[road]
    coordinates = (row['latitude'], row['longitude'])
    return distance(road_beginning_coordinates, coordinates).km

In [43]:
locations_df['kilometrage'] = locations_df.apply(get_kilometrage, axis=1)
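
To make the distance computation more concrete, the sketch below uses the haversine formula, which assumes a spherical Earth. This is a simplification and not what geopy does: `geopy.distance.distance` uses an ellipsoidal model, so the two results differ slightly. The function name `haversine_km` and the radius constant are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # Mean Earth radius, an approximation.


def haversine_km(origin, destination):
    """Great-circle distance in km between two (latitude, longitude) pairs."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Beginning of 'Guadalupe Parkway' and the coordinates of sensor 400040.
road_beginning = (37.36982274309332, -121.92638659077168)
sensor = (37.294949, -121.873109)

print(f'{haversine_km(road_beginning, sensor):.2f} km')
```

For this pair, the result should be close to the kilometrage that the notebook computes for sensor 400040, with a small difference due to the spherical model.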

In [44]:
locations_df.head()

| index | sensor_id | latitude | longitude | road | kilometrage |
|---|---|---|---|---|---|
| 0 | 400001 | 37.364085 | -121.901149 | Bayshore Freeway | 63.13692 |
| 1 | 400017 | 37.253303 | -121.94544 | West Valley Freeway | 20.145852 |
| 2 | 400030 | 37.359087 | -121.906538 | I 880 | 61.403403 |
| 3 | 400040 | 37.294949 | -121.873109 | Guadalupe Parkway | 9.557508 |
| 4 | 400045 | 37.363402 | -121.902233 | I 880 | 61.226829 |


In [45]:
road_locations = {}

for row in locations_df.itertuples():
    node_id = row.sensor_id
    road = row.road
    kilometrage = row.kilometrage
    road_locations[node_id] = (road, kilometrage)

In [47]:
import os
import pickle

os.makedirs(os.path.join(BASE_DATA_DIR, 'structured'), exist_ok=True)

with open(
      os.path.join(BASE_DATA_DIR, 'structured', 'node_locations.pkl'), 
      'wb') as f:
    pickle.dump(road_locations, f)
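
The saved file can later be read back to recover the road and kilometrage of each node. The sketch below shows a round trip under stated assumptions: a temporary directory stands in for the `structured` folder, and `sample_locations` contains only two entries copied from the dataframe preview above.

```python
import os
import pickle
import tempfile

# Two entries copied from the dataframe preview above.
sample_locations = {
    400001: ('Bayshore Freeway', 63.13692),
    400017: ('West Valley Freeway', 20.145852),
    }

# A temporary directory stands in for data/pems-bay/structured.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'node_locations.pkl')

    with open(path, 'wb') as f:
        pickle.dump(sample_locations, f)

    with open(path, 'rb') as f:
        loaded_locations = pickle.load(f)

# Each value is a (road, kilometrage) tuple keyed by the sensor ID.
road, kilometrage = loaded_locations[400001]
print(f'Node 400001 lies on {road} at {kilometrage:.2f} km')
```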