# Integration
Where can I go from here? Can a passenger efficiently transfer from one line to another?

Stops are considered integrated if they are within a walkable radius of other desirable stops.
In the original study, every stop was scored on its mean distance to all other stops. Here, only stops within a 400 m radius are considered. The quality of those stops also counts: how many lines are accessible within that 400 m distance?

In [2]:
%load_ext autoreload
%autoreload 2

from scripts.preprocessing import stop_data, gtfs
import pandas as pd
import geopandas as gpd
from scipy.spatial import distance_matrix, distance
from Transit_Quality_Study.transit_quality_study.config import EPSG, processed_data_dir

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Initializing GTFS feed...
Processing GTFS files...
Parquet files found. Loading parquet files.
Reading census files...
Success.
Census files merged.


In [3]:
# Gather relevant information
stop_integration = stop_data[['route_id', 'geometry']] # We don't use lon and lat on purpose
stop_integration

stop_code,route_id,geometry
10118,[1],POINT (-73.60312 45.44647)
10120,[1],POINT (-73.59324 45.45116)
10122,[1],POINT (-73.58169 45.45701)
10124,[1],POINT (-73.57202 45.45944)
10126,[1],POINT (-73.56707 45.46189)
...,...,...
62373,"[380, 171, 164, 135]",POINT (-73.67293 45.54732)
62374,"[380, 171, 164, 135, 180]",POINT (-73.67909 45.53755)
62375,[69],POINT (-73.67161 45.55063)
62376,[100],POINT (-73.69387 45.47644)


First, we have to project longitudes and latitudes onto flat x and y coordinates, so that distances come out in metres rather than degrees. Thankfully, this is built into geopandas. The EPSG code we use is 2950 (NAD83(CSRS) / MTM zone 8), used for mapping surveys in Quebec and Ontario.
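To see why raw degrees are not usable as distances, here is a rough sketch using a spherical Earth (an assumption made for illustration; the actual projection is handled by geopandas). The function name `metres_per_degree` and the latitude of 45.5 are illustrative, the latter chosen to roughly match the stops listed above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)

def metres_per_degree(lat_deg):
    """Return (metres per degree of latitude, metres per degree of longitude)."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon

# 45.5 is roughly the latitude of the stops in this dataset
per_lat, per_lon = metres_per_degree(45.5)

# Ratio of east-west to north-south metres per degree at this latitude
ratio = per_lon / per_lat
print(f"1 degree of latitude  ~ {per_lat:,.0f} m")
print(f"1 degree of longitude ~ {per_lon:,.0f} m (ratio {ratio:.2f})")
```

At this latitude a degree of longitude covers noticeably less ground than a degree of latitude, so a "radius" measured in degrees would be an ellipse on the ground. Projecting first avoids that.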

In [4]:
stop_integration = stop_integration.geometry.to_crs(epsg=EPSG)

The last iteration of this study used two nested loops to compare the distance from every stop to every other stop on the network. With 8,791 stops in this dataframe, that is roughly 77 million distance computations!
Thankfully, I have decided to use matrices this time around, so the whole comparison happens in a single vectorized call.
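As a small illustration of the difference, the sketch below computes the same pairwise distances both ways on 50 made-up points. It uses plain NumPy broadcasting rather than `scipy.spatial.distance_matrix`, and the function names are hypothetical; it is meant to show the idea, not to reproduce the original study's loop.

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0, 5_000, size=(50, 2))  # 50 made-up stops, x/y in metres

def pairwise_loops(points):
    """Nested-loop version: one Python-level distance computation per pair."""
    n = len(points)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.hypot(points[i, 0] - points[j, 0],
                                 points[i, 1] - points[j, 1])
    return out

def pairwise_vectorized(points):
    """Broadcasting version: all pairwise differences in one array operation."""
    diff = points[:, None, :] - points[None, :, :]  # shape (n, n, 2)
    return np.sqrt((diff ** 2).sum(axis=-1))

loops = pairwise_loops(coords)
vectorized = pairwise_vectorized(coords)
same = np.allclose(loops, vectorized)  # both approaches agree
```

The broadcasting version builds an intermediate array of shape `(n, n, 2)`, so it trades memory for speed; that is fine for a toy example but worth keeping in mind for thousands of stops.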

In [5]:
# First, we have to turn the points to coordinates:
coords = stop_integration.get_coordinates()

# Then we can create the distance matrix
dist_matrix = distance_matrix(coords, coords)
dist_matrix

array([[    0.        ,   932.01509465,  2044.95154727, ...,
        12753.40122359,  7839.79604963, 16478.13967749],
       [  932.01509465,     0.        ,  1113.16960098, ...,
        12637.4274882 ,  8355.5630757 , 17160.48038891],
       [ 2044.95154727,  1113.16960098,     0.        , ...,
        12554.51147519,  9033.77920419, 17982.96592508],
       ...,
       [12753.40122359, 12637.4274882 , 12554.51147519, ...,
            0.        ,  8426.85758038, 13939.70965239],
       [ 7839.79604963,  8355.5630757 ,  9033.77920419, ...,
         8426.85758038,     0.        ,  9144.24369784],
       [16478.13967749, 17160.48038891, 17982.96592508, ...,
        13939.70965239,  9144.24369784,     0.        ]],
      shape=(8791, 8791))

For the record, the original code took five minutes to run.

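Next, we only keep distances less than or equal to 400 m, the standard distance considered walkable.
We don't need the distances themselves, only which stops fall within range. Note that every stop is within range of itself, so the diagonal is always True.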

In [6]:
walkable_dist = 400
walkable_stops = dist_matrix <= walkable_dist
walkable_stops

array([[ True, False, False, ..., False, False, False],
       [False,  True, False, ..., False, False, False],
       [False, False,  True, ..., False, False, False],
       ...,
       [False, False, False, ...,  True, False, False],
       [False, False, False, ..., False,  True, False],
       [False, False, False, ..., False, False,  True]],
      shape=(8791, 8791))
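A toy version of the same thresholding step, with three made-up stops, shows what the boolean matrix encodes. The distances and variable names here are invented for illustration.

```python
import numpy as np

# Made-up distances in metres between three stops A, B, C
toy_dist = np.array([
    [  0.0, 250.0, 900.0],
    [250.0,   0.0, 380.0],
    [900.0, 380.0,   0.0],
])

# Boolean mask: True where two stops are within 400 m (always True on the diagonal)
toy_walkable = toy_dist <= 400

# Number of other walkable stops per stop, excluding the stop itself
neighbours = toy_walkable.sum(axis=1) - 1
print(neighbours)  # [1 2 1]
```

Stop B can reach both A and C on foot, while A and C can only reach B.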

We also want a binary matrix between stops and routes, with one row per stop and one column per route. An entry is True when that route serves that stop:

In [7]:
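# Get exploded list of route_id served at each stop
stop_routes = stop_data.explode('route_id')['route_id']
# Cross-tabulate stops against routes and convert the counts to a binary table
route_table = pd.crosstab(stop_routes.index, stop_routes, dropna=False) > 0
# crosstab sorts its rows by stop_code, so realign them with the row order of dist_matrix
route_matrix = route_table.reindex(stop_data.index, fill_value=False).to_numpy()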
route_matrix

array([[ True, False, False, ..., False, False, False],
       [ True, False, False, ..., False, False, False],
       [ True, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False,  True, ..., False, False, False],
       [False, False, False, ..., False, False, False]], shape=(8791, 213))
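To make the layout of this matrix concrete, here is a plain NumPy sketch that builds the same kind of stop-by-route matrix for three hypothetical stops. It is not the `pd.crosstab` call used above, only an equivalent construction on invented data.

```python
import numpy as np

# Hypothetical stops and the routes serving each one
toy_routes_by_stop = {
    'A': [1],
    'B': [1, 24],
    'C': [24, 80],
}

stops = list(toy_routes_by_stop)  # row order: A, B, C
routes = sorted({r for rs in toy_routes_by_stop.values() for r in rs})  # column order
col = {route: j for j, route in enumerate(routes)}

# One row per stop, one column per route; True where the route serves the stop
toy_route_matrix = np.zeros((len(stops), len(routes)), dtype=bool)
for i, stop in enumerate(stops):
    for route in toy_routes_by_stop[stop]:
        toy_route_matrix[i, col[route]] = True

print(routes)            # [1, 24, 80]
print(toy_route_matrix)
```

The row order matters: it has to match the row order of the walkable-stop matrix for the later matrix product to line up stop by stop.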

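Now, by multiplying the two matrices together and keeping the nonzero entries, we get, for each stop, every route served by at least one stop within walking distance. Summing across routes then gives the number of walkable routes per stop!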

In [8]:
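# Flag every route served by at least one stop within walking distance of each stop
stop_walkable_routes = (walkable_stops @ route_matrix) > 0
# Count the flagged routes per stop, indexed by stop_code
stop_walkable_routes = pd.DataFrame(stop_walkable_routes.sum(axis=1), index=stop_integration.index)
stop_walkable_routes.columns = ['walkable_routes']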
stop_walkable_routes

stop_code,walkable_routes
10118,12
10120,4
10122,4
10124,6
10126,8
...,...
62373,7
62374,7
62375,7
62376,1
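To see why the matrix product yields this count, here is a worked toy example with three stops and three routes. All values are made up, and the integer cast is only there to make the intermediate counts visible.

```python
import numpy as np

# Toy network: 3 stops (A, B, C) and 3 routes (1, 24, 80); all values are made up
W = np.array([[1, 1, 0],    # A is within 400 m of itself and B
              [1, 1, 1],    # B is within 400 m of A, itself and C
              [0, 1, 1]],   # C is within 400 m of B and itself
             dtype=bool)
R = np.array([[1, 0, 0],    # A is served by route 1
              [1, 1, 0],    # B is served by routes 1 and 24
              [0, 1, 1]],   # C is served by routes 24 and 80
             dtype=bool)

# Entry (i, j) counts the walkable stops around stop i that serve route j
counts = W.astype(int) @ R.astype(int)
print(counts)
# [[2 1 0]
#  [2 2 1]
#  [1 2 1]]

# Any count above zero means route j is reachable on foot from stop i
reachable = counts > 0
toy_walkable_routes = reachable.sum(axis=1)
print(toy_walkable_routes)  # [2 3 3]
```

Stop A only serves route 1 itself, but scores 2 because route 24 stops at B, which is within walking distance.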


In [9]:
# Shipping it off
stop_walkable_routes.to_parquet(processed_data_dir / 'stop_walkable_routes.parquet')

Note on integration:

It cannot be overstated that this can be taken even further:
$$
T = R^T \times W \times R
$$

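where $R$ is the route matrix and $W$ is the walkable stop matrix.

$T$ is a route-by-route matrix that represents the integration of the *whole route*: entry $T_{ij}$ counts the pairs of walkable stops that link route $i$ to route $j$, such that summing a row can tell us the number of possible transfers on a line.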


Unfortunately, it is beyond the scope of this project... for now.
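For reference, the formula can be tried on a toy network of three stops and three routes. This is only a sketch on invented data. Reading "possible transfers" as the off-diagonal row sum of $T$ is an assumption made here, not something established in the study.

```python
import numpy as np

# Toy network: 3 stops (A, B, C) and 3 routes (1, 24, 80); all values are made up
W = np.array([[1, 1, 0],    # walkable-stop matrix (stops x stops)
              [1, 1, 1],
              [0, 1, 1]])
R = np.array([[1, 0, 0],    # route matrix (stops x routes)
              [1, 1, 0],
              [0, 1, 1]])

# T[i, j] counts pairs (stop on route i, walkable stop on route j)
T = R.T @ W @ R
print(T)
# [[4 3 1]
#  [3 4 2]
#  [1 2 1]]

# Assumption: transfers from a route = connections to *other* routes,
# i.e. the row sum without the diagonal (same-route pairs)
transfers = T.sum(axis=1) - np.diag(T)
print(transfers)  # [4 5 3]
```

Because $W$ is symmetric, $T$ is symmetric as well: a connection from route $i$ to route $j$ is also a connection from $j$ to $i$.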