<center>
    <h1>Verbal Explanation of Spatial Temporal GNNs for Traffic Forecasting</h1>
    <h2>Fetching the Nodes Locations on the Metr-LA Dataset</h2>
</center>

---

This notebook fetches the street and the kilometrage (the distance from the beginning of the road) of each node in the dataset.

Note: the extraction of the node geolocations is not deterministic. Refer to [node_locations.pkl](../data/metr-la/structured/node_locations.pkl) for the streets and kilometrages extracted for the experiment.

In [1]:
import sys
import os

# Set the main path in the root folder of the project.
sys.path.append(os.path.join('..'))

In [2]:
# Settings for autoreloading.
%load_ext autoreload
%autoreload 2

In [3]:
from src.utils.seed import set_random_seed

# Set the random seed for deterministic operations.
SEED = 42
set_random_seed(SEED)

# 1 Loading the Data
In this section the adjacency matrix and the sensor coordinates are loaded.

In [4]:
import os

BASE_DATA_DIR = os.path.join('..', 'data', 'metr-la')

In [5]:
from src.data.data_extraction import get_adjacency_matrix


# Get the adjacency matrix
adj_matrix_structure = get_adjacency_matrix(
    os.path.join(BASE_DATA_DIR, 'raw', 'adj_mx_metr_la.pkl'))

# Get the header of the adjacency matrix, the node indices and the
# matrix itself.
header, node_ids_dict, adj_matrix = adj_matrix_structure

In [6]:
from src.data.data_extraction import get_locations_dataframe

# Get the dataframe containing the latitude and longitude of each sensor.
locations_df = get_locations_dataframe(
    os.path.join(BASE_DATA_DIR, 'raw', 'graph_sensor_locations_metr_la.csv'),
    has_header=True)

In [7]:
# Get the node positions dictionary by inverting node_ids_dict
# (avoid shadowing the built-in `id`).
node_pos_dict = {i: node_id for node_id, i in node_ids_dict.items()}

# 2 Fetch the Nodes Locations and Kilometrage
In this section the roads and the kilometrages of the nodes are fetched.

## 2.1 Fetch the Nodes Roads
The road of each node is fetched by reverse geocoding its latitude and longitude with geopy.

In [8]:
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

geocoder = Nominatim(user_agent='metr-la')

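# Rate-limit the reverse geocoding calls, which are the ones used below.
# The public Nominatim service allows at most one request per second.
reverse = RateLimiter(
    geocoder.reverse,
    min_delay_seconds=1,
    return_value_on_exception=None)

In [9]:
import pandas as pd

def get_road(row: pd.Series) -> str:
    # Reverse geocode the sensor coordinates and read the road name
    # from the structured address of the response.
    data = reverse((row['latitude'], row['longitude'])).raw
    road = data['address']['road']
    return road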

In [10]:
locations_df['road'] = locations_df.apply(get_road, axis=1)
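Reverse geocoding can fail or return an address without a `road` entry, in which case indexing the response directly raises an error. The following is a minimal sketch of a more defensive extraction; `extract_road` is a hypothetical helper that is not part of the notebook, and it only assumes the `address` / `road` layout already used by `get_road`.

```python
from typing import Optional


def extract_road(raw: Optional[dict]) -> Optional[str]:
    """Return the road name from a reverse-geocoding response, if present.

    Hypothetical helper: it returns None instead of raising when the
    response is missing or has no 'road' entry in its address.
    """
    if not raw:
        return None
    return raw.get('address', {}).get('road')


# Hand-written sample responses, for illustration only.
with_road = {'address': {'road': 'Ventura Freeway', 'city': 'Los Angeles'}}
without_road = {'address': {'city': 'Los Angeles'}}

print(extract_road(with_road))
print(extract_road(without_road))
print(extract_road(None))
```

Nodes for which the helper returns `None` could then be inspected and assigned by hand, in the same way as the wrongly assigned roads.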

In [11]:
set(locations_df['road'])

{'Arroyo Seco Parkway',
 'East 4th Street',
 'Foothill Freeway',
 'Franklin Avenue',
 'Glendale Freeway',
 'Golden State Freeway',
 'Hollywood Freeway',
 'Metro G Line Bikeway',
 'North Brand Boulevard',
 'North Western Avenue',
 'North Wilton Place',
 'San Diego Freeway',
 'Santa Ana Freeway',
 'Sherman Way',
 'Silver Lake Boulevard',
 'US 101',
 'Ventura Freeway'}

Fixing the wrongly assigned roads

In [12]:
for row in locations_df.itertuples():
    if row.road in ['East 4th Street', 'Sherman Way', 'US 101']:
        print(f'{row.road}:',
              'coordinates', 
              f'({row.latitude}, {row.longitude})')

US 101: coordinates (34.06491, -118.25126)
East 4th Street: coordinates (34.04301, -118.21724)
Sherman Way: coordinates (34.20112, -118.47361)
US 101: coordinates (34.1527, -118.3754)
US 101: coordinates (34.15133, -118.37456)


Searching for these coordinates shows that they actually lie on specific highways. The wrongly assigned road names are therefore corrected manually.

In [13]:
import pandas as pd


def change_road_name(row: pd.Series) -> str:
    if row['road'] == 'East 4th Street':
        return 'Golden State Freeway'

    elif row['road'] == 'Sherman Way':
        return 'San Diego Freeway'

    elif row['road'] == 'US 101':
        return 'Hollywood Freeway'

    return row['road']
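The same correction can also be written as a lookup table, which keeps the manual fixes in one place. This is only a sketch: `ROAD_CORRECTIONS` and `correct_road_name` are illustrative names, and the mapping simply restates the three cases handled by `change_road_name`.

```python
# Manual corrections, restating the cases handled by change_road_name.
ROAD_CORRECTIONS = {
    'East 4th Street': 'Golden State Freeway',
    'Sherman Way': 'San Diego Freeway',
    'US 101': 'Hollywood Freeway',
}


def correct_road_name(road: str) -> str:
    """Return the corrected road name, or the input if no fix is listed."""
    return ROAD_CORRECTIONS.get(road, road)


roads = ['US 101', 'Ventura Freeway', 'Sherman Way']
corrected = [correct_road_name(road) for road in roads]
print(corrected)
```

With pandas, the mapping could presumably be applied in one step with `locations_df['road'].replace(ROAD_CORRECTIONS)`.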

In [14]:
locations_df['road'] = locations_df.apply(change_road_name, axis=1)

In [15]:
set(locations_df['road'])

{'Arroyo Seco Parkway',
 'Foothill Freeway',
 'Franklin Avenue',
 'Glendale Freeway',
 'Golden State Freeway',
 'Hollywood Freeway',
 'Metro G Line Bikeway',
 'North Brand Boulevard',
 'North Western Avenue',
 'North Wilton Place',
 'San Diego Freeway',
 'Santa Ana Freeway',
 'Silver Lake Boulevard',
 'Ventura Freeway'}

## 2.2 Get the Road Beginnings

In order to get the kilometrages of the nodes, the coordinates of the road beginnings are taken from the [official ArcGIS website](https://www.arcgis.com/home/webmap/viewer.html?featurecollection=https%3A%2F%2Fgeo.dot.gov%2Fserver%2Frest%2Fservices%2FHosted%2FCalifornia_2018_PR%2FFeatureServer%3Ff%3Djson%26option%3Dfootprints&supportsProjection=true&supportsJSONP=true).

In [16]:
road_beginnings = {
    'Arroyo Seco Parkway': (34.06261, -118.24863),
    'Foothill Freeway': (34.317596, -118.481173),
    'Glendale Freeway': (34.207526, -118.215157),
    'Golden State Freeway': (34.060422, -118.213057),
    'Hollywood Freeway': (34.151159, -118.373795),
    'San Diego Freeway': (34.294368, -118.469921),
    'Santa Ana Freeway': (34.06312, -118.247073),
    'Ventura Freeway': (34.147306, -118.160817)}

In [19]:
road_beginnings.update({
    'Silver Lake Boulevard': (34.0954, -118.2745),
    'Fletcher Drive': (34.1121, -118.2526),
    'Vine Street': (34.1016, -118.3269),
    'North Western Avenue': (34.1042, -118.3085),
    'North Brand Boulevard': (34.1512, -118.2544)
})

road_beginnings.update({
    'Avenue 43': (34.0909, -118.2116),  # Near intersection with Griffin Ave
    'Fern Lane': (34.1122, -118.2140)   # Small residential road near Mt Washington
})

road_beginnings.update({
    'Franklin Avenue': (34.1032, -118.3276),         # Near Hollywood Blvd intersection
    'Metro G Line Bikeway': (34.1886, -118.4494),    # Near Balboa Blvd in the Valley
    'North Wilton Place': (34.0989, -118.3105)       # Near Melrose Ave in Hollywood
})
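Because the geocoding step is not deterministic, a run may return a road that has no entry in `road_beginnings`, which would make the kilometrage lookup fail with a `KeyError`. The sketch below shows one way to check the coverage beforehand; `find_missing_beginnings` is a hypothetical helper and the sample values are for illustration only.

```python
def find_missing_beginnings(roads, beginnings):
    """Return the sorted road names that have no entry in `beginnings`."""
    return sorted(set(roads) - set(beginnings))


# Sample values for illustration; 'Some Unlisted Road' is made up.
sample_roads = ['Ventura Freeway', 'Glendale Freeway', 'Some Unlisted Road']
sample_beginnings = {
    'Ventura Freeway': (34.147306, -118.160817),
    'Glendale Freeway': (34.207526, -118.215157),
}

missing = find_missing_beginnings(sample_roads, sample_beginnings)
print(missing)
```

In the notebook, the same check could be run as `find_missing_beginnings(locations_df['road'], road_beginnings)` before computing the kilometrages.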

## 2.3 Get Nodes Kilometrages and Save Results
Finally, the kilometrage of each node is computed as the geodesic distance between the node coordinates and the coordinates of the beginning of its road.

The results are stored in a dictionary that maps each node to a tuple containing the name of its road and its kilometrage on that road.
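To make the computation concrete, the sketch below approximates the kilometrage of sensor 773869 using only the standard library. It deliberately swaps in the haversine formula on a spherical Earth, which is not the ellipsoidal geodesic computed by `geopy.distance.distance`, so the values differ slightly. `haversine_km` and `EARTH_RADIUS_KM` are illustrative names.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # Mean Earth radius (spherical assumption).


def haversine_km(start, end):
    """Great-circle distance in km between two (latitude, longitude) pairs."""
    lat1, lon1 = map(math.radians, start)
    lat2, lon2 = map(math.radians, end)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Beginning of the Ventura Freeway and coordinates of sensor 773869.
ventura_beginning = (34.147306, -118.160817)
sensor_773869 = (34.15497, -118.31829)

approx_km = haversine_km(ventura_beginning, sensor_773869)
print(f'{approx_km:.2f} km')
```

The result should be within roughly a tenth of a kilometre of the 14.547154 km obtained with geopy for this sensor. In both cases the value is a point-to-point distance, not a distance measured along the road.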

In [20]:
import pandas as pd
from geopy.distance import distance


def get_kilometrage(row: pd.Series) -> float:
    # Geodesic distance in km between the beginning of the node's road
    # and the node itself.
    road = row['road']
    road_beginning_coordinates = road_beginnings[road]
    coordinates = (row['latitude'], row['longitude'])
    return distance(road_beginning_coordinates, coordinates).km

In [21]:
locations_df['kilometrage'] = locations_df.apply(get_kilometrage, axis=1)

In [22]:
locations_df.head()

| index | sensor_id | latitude | longitude | road | kilometrage |
|---|---|---|---|---|---|
| 0 | 773869 | 34.15497 | -118.31829 | Ventura Freeway | 14.547154 |
| 1 | 767541 | 34.11621 | -118.23799 | Glendale Freeway | 10.345752 |
| 2 | 767542 | 34.11641 | -118.23819 | Glendale Freeway | 10.327808 |
| 3 | 717447 | 34.07248 | -118.26772 | Hollywood Freeway | 13.112986 |
| 4 | 717446 | 34.07142 | -118.26572 | Hollywood Freeway | 13.329056 |


In [23]:
road_locations = {}

for row in locations_df.itertuples():
    node_id = row.sensor_id
    road = row.road
    kilometrage = row.kilometrage
    road_locations[node_id] = (road, kilometrage)

In [24]:
import os
import pickle

os.makedirs(os.path.join(BASE_DATA_DIR, 'structured'), exist_ok=True)

with open(
      os.path.join(BASE_DATA_DIR, 'structured', 'node_locations.pkl'), 
      'wb') as f:
    pickle.dump(road_locations, f)
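As a sketch of how the saved dictionary could be consumed, the example below reads a pickle back and orders the sensors of each road by kilometrage. It writes a small sample to a temporary file so that it runs on its own. `sensors_by_road` is a hypothetical helper, and the integer sensor IDs are an assumption about the key type.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def sensors_by_road(road_locations):
    """Map each road to its sensor IDs, ordered by increasing kilometrage."""
    grouped = defaultdict(list)
    for node_id, (road, kilometrage) in road_locations.items():
        grouped[road].append((kilometrage, node_id))
    return {road: [node_id for _, node_id in sorted(entries)]
            for road, entries in grouped.items()}


# Sample entries copied from the first rows of the table above.
sample = {
    773869: ('Ventura Freeway', 14.547154),
    767541: ('Glendale Freeway', 10.345752),
    767542: ('Glendale Freeway', 10.327808),
}

# Round trip through a temporary pickle file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'node_locations.pkl')
    with open(path, 'wb') as f:
        pickle.dump(sample, f)
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

ordered = sensors_by_road(loaded)
print(ordered['Glendale Freeway'])
```

For the real data, `path` would instead point to `node_locations.pkl` in the `structured` folder written by the cell above.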