# Getting distances between stations
A short intro to `src/distanceutils.py`

Import `distanceutils` to get some tools for... distances! You can calculate the _geospatial_ distance between stations as well as the average _car driving time_ between them.

## Getting started
Apart from the imports, we need a specially prepared dataframe with station information, which will later be used to store distances and durations.

In [1]:
import src.distanceutils as du

# for a quick start, you can load an example station csv file
# (130 stations from Duesseldorf and surroundings) into a dataframe

my_stations = du.load_station_file()
my_stations.head()

Unnamed: 0.1,Unnamed: 0,uuid,name,brand,street,house_number,post_code,city,latitude,longitude,first_active,openingtimes_json,file_date
0,115,64ec825f-740c-477a-b7c2-2121154ee8a7,TotalEnergies Erkrath,TotalEnergies,Kirchstr.,23.0,40699,Erkrath,51.223767,6.916481,2014-03-18 16:45:31+01,"{""openingTimes"":[{""applicable_days"":63,""period...",2023-05-06
1,260,fd99c048-3b6b-4943-8b93-838daefba76b,Shell Duesseldorf Karlsruher Str. 45,Shell,Karlsruher Str.,45.0,40229,Duesseldorf,51.197512,6.841084,2014-03-18 16:45:31+01,{},2023-05-06
2,393,1241bc5a-5571-4cee-bce0-d0ab82000d8c,Aral Tankstelle,ARAL,Südring,115.0,40221,Düsseldorf,51.20101,6.763259,2014-03-18 16:45:31+01,{},2023-05-06
3,451,127035c1-a7c7-41db-9976-ab4cd14b7271,Aral Tankstelle,ARAL,Engelbertstraße,,41462,Neuss,51.207104,6.671111,2014-03-18 16:45:31+01,"{""openingTimes"":[{""applicable_days"":31,""period...",2023-05-06
4,462,d7376c09-b449-4948-8706-15777971f03c,Aral Tankstelle,ARAL,Frankfurter Straße,323.0,40595,Düsseldorf,51.138004,6.904963,2014-03-18 16:45:31+01,{},2023-05-06


---

**Important**: The station list must have `uuid`, `latitude` and `longitude` columns. You will run into exceptions if your station dataframe fails to meet this requirement.

---
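If you bring your own station list, it can help to check for the required columns before building a matrix. The following is a minimal sketch of such a check; `missing_station_columns` is a hypothetical helper, not part of `distanceutils`, and it only looks at column names.

```python
# Columns that the station list must provide (taken from the note above).
REQUIRED_COLUMNS = ("uuid", "latitude", "longitude")


def missing_station_columns(columns):
    """Return the required column names that are absent from `columns`.

    Hypothetical helper for illustration; not part of distanceutils.
    """
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# With a pandas dataframe you would pass `my_stations.columns` here.
print(missing_station_columns(["uuid", "name", "latitude"]))
```

An empty list means the station list meets the column requirement.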

In [2]:
# we shorten the list - for now it's just about showing the principles
my_stations = my_stations.loc[:2]
my_stations.shape

(3, 13)

### Getting a _station matrix_
The first step is always to create an n×n matrix with all station uuids as both index and columns. This is done with the `create_station_matrix` function. It creates the required structure as long as all required columns (`uuid`, `latitude` and `longitude`) are present. Otherwise, it raises an exception.

In [3]:
#
# create a station matrix to fill up later
#
my_station_matrix = du.create_station_matrix(my_stations)
my_station_matrix.head()


Adding station columns to matrix: 100%|██████████| 3/3 [00:00<00:00, 2227.06it/s]


uuid,longitude,latitude,64ec825f-740c-477a-b7c2-2121154ee8a7,fd99c048-3b6b-4943-8b93-838daefba76b,1241bc5a-5571-4cee-bce0-d0ab82000d8c
64ec825f-740c-477a-b7c2-2121154ee8a7,6.916481,51.223767,0,0,0
fd99c048-3b6b-4943-8b93-838daefba76b,6.841084,51.197512,0,0,0
1241bc5a-5571-4cee-bce0-d0ab82000d8c,6.763259,51.20101,0,0,0


Note that `uuid` became a dataframe index - this makes distance/duration lookups way easier later on.

## Geospatial distances
With a proper station matrix at hand, we can fill it with geospatial distances (in km). Just call `create_distance_matrix` on it.

In [4]:
#
# calculating distances and filling up the matrix
#
my_distance_matrix = du.create_distance_matrix(my_station_matrix)
my_distance_matrix.head()

Calculating distances for stations: 100%|██████████| 3/3 [00:00<00:00, 410.00it/s]


uuid,longitude,latitude,64ec825f-740c-477a-b7c2-2121154ee8a7,fd99c048-3b6b-4943-8b93-838daefba76b,1241bc5a-5571-4cee-bce0-d0ab82000d8c
64ec825f-740c-477a-b7c2-2121154ee8a7,6.916481,51.223767,0.0,6.008945,10.96877
fd99c048-3b6b-4943-8b93-838daefba76b,6.841084,51.197512,6.008945,0.0,5.436489
1241bc5a-5571-4cee-bce0-d0ab82000d8c,6.763259,51.20101,10.96877,5.436489,0.0
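
To give a feeling for what a geospatial distance is, here is a minimal, dependency-free sketch that computes great-circle distances with the haversine formula for the three example stations. This is an assumption for illustration: the text does not say which formula `create_distance_matrix` uses, and the Earth radius and the nested-dict "matrix" are choices of this sketch, not of `distanceutils`.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# (latitude, longitude) of the three example stations
stations = {
    "64ec825f-740c-477a-b7c2-2121154ee8a7": (51.223767, 6.916481),
    "fd99c048-3b6b-4943-8b93-838daefba76b": (51.197512, 6.841084),
    "1241bc5a-5571-4cee-bce0-d0ab82000d8c": (51.20101, 6.763259),
}

# n*n structure: outer key is the origin uuid, inner key the destination uuid
distance_matrix = {
    origin: {
        destination: haversine_km(*stations[origin], *stations[destination])
        for destination in stations
    }
    for origin in stations
}

origin = "64ec825f-740c-477a-b7c2-2121154ee8a7"
destination = "fd99c048-3b6b-4943-8b93-838daefba76b"
print(distance_matrix[origin][destination])
```

Unlike driving times, these distances are symmetric, so the matrix mirrors itself along the zero diagonal.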


You can now look up the distance between two stations using their uuids:


In [6]:
#
# example distance lookup
#
origin = "64ec825f-740c-477a-b7c2-2121154ee8a7"
destination = "fd99c048-3b6b-4943-8b93-838daefba76b"
distance = my_distance_matrix.loc[origin, destination]

print(distance)

6.008945120260507


## Driving time durations
Durations are pulled via the Matrix API of [openrouteservice.org](https://openrouteservice.org/dev/#/api-docs/matrix). The free service is limited to 500 requests per day and a maximum of 40 requests per second.
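
For orientation, the following sketch shows roughly what a Matrix API request for driving durations looks like. It only builds the request and sends nothing. The endpoint and the `driving-car` profile follow the public openrouteservice v2 documentation as far as I know; `build_matrix_request` is a hypothetical helper, the API key is a placeholder, and none of this is necessarily how `distanceutils` talks to the API internally.

```python
import json

# Assumed endpoint of the ORS v2 Matrix API for the car profile.
ORS_MATRIX_URL = "https://api.openrouteservice.org/v2/matrix/driving-car"


def build_matrix_request(stations, api_key):
    """Build URL, headers and JSON body for a duration matrix request.

    `stations` is an iterable of (latitude, longitude) pairs.
    Hypothetical helper for illustration; no request is sent.
    """
    body = {
        # ORS expects coordinates as [longitude, latitude], not the other way round
        "locations": [[lon, lat] for lat, lon in stations],
        "metrics": ["duration"],
    }
    headers = {"Authorization": api_key, "Content-Type": "application/json"}
    return ORS_MATRIX_URL, headers, json.dumps(body)


url, headers, payload = build_matrix_request(
    [(51.223767, 6.916481), (51.197512, 6.841084)],
    api_key="YOUR_ORS_API_KEY",  # placeholder, replace with your own key
)
print(url)
print(payload)
```

Note the coordinate order: the station list stores latitude first, while the API expects longitude first.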

Getting driving time durations works the same way as getting the distances. The method to use is called `create_duration_matrix`.

For up to 50 stations, the method pulls a complete duration matrix from openrouteservice.org in a single (and very fast) API call. For more than 50 stations, it falls back to one call per station. Currently, the method has a **hard-coded limit of 250 stations** to avoid consuming all available API calls in one go.
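
To estimate how much of your daily quota a run will cost, the rule above can be written down as a small function. This is a sketch of the described behaviour only: `planned_api_calls` is a hypothetical helper, and raising a `ValueError` above the limit is an assumption, since the text does not say how `create_duration_matrix` reacts in that case.

```python
SINGLE_CALL_MAX = 50  # up to this many stations: one matrix call
HARD_LIMIT = 250      # hard-coded station limit mentioned above


def planned_api_calls(n_stations):
    """Estimate the number of ORS calls for `n_stations` stations.

    Hypothetical helper for illustration; not part of distanceutils.
    """
    if n_stations < 0:
        raise ValueError("n_stations must not be negative")
    if n_stations > HARD_LIMIT:
        # Assumption: the real method may handle this case differently.
        raise ValueError(f"more than {HARD_LIMIT} stations are not supported")
    if n_stations == 0:
        return 0
    if n_stations <= SINGLE_CALL_MAX:
        return 1
    return n_stations  # one call per station


for n in (3, 50, 51, 130):
    print(n, planned_api_calls(n))
```

With the full example file of 130 stations, a single run would therefore use 130 of the 500 daily requests.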

In [4]:
#
# calculating durations and filling up the matrix
#
my_duration_matrix = du.create_duration_matrix(my_station_matrix)
my_duration_matrix.head()

The ORS API reported the error: 403 ({'error': 'Access to this API has been disallowed'})
No durations have been filled into your matrix.


uuid,longitude,latitude,64ec825f-740c-477a-b7c2-2121154ee8a7,fd99c048-3b6b-4943-8b93-838daefba76b,1241bc5a-5571-4cee-bce0-d0ab82000d8c
64ec825f-740c-477a-b7c2-2121154ee8a7,6.916481,51.223767,0,0,0
fd99c048-3b6b-4943-8b93-838daefba76b,6.841084,51.197512,0,0,0
1241bc5a-5571-4cee-bce0-d0ab82000d8c,6.763259,51.20101,0,0,0


Note that the API really calculates driving durations: getting from A to B will not result in the same duration as getting from B to A.
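
In practice this means you have to look up each direction separately. The sketch below illustrates that with a tiny nested dict standing in for a filled duration matrix. The station names and the durations are made-up placeholders, not results from openrouteservice.org, and `round_trip_seconds` is a hypothetical helper.

```python
# Made-up placeholder durations in seconds (NOT real ORS results).
# Outer key: origin, inner key: destination.
durations = {
    "station_a": {"station_a": 0.0, "station_b": 540.0},
    "station_b": {"station_a": 610.0, "station_b": 0.0},
}


def round_trip_seconds(matrix, first, second):
    """Sum both directions, since A -> B and B -> A may differ."""
    return matrix[first][second] + matrix[second][first]


there = durations["station_a"]["station_b"]
back = durations["station_b"]["station_a"]
print(there, back, round_trip_seconds(durations, "station_a", "station_b"))
```

With a real duration dataframe, the equivalent lookups would be `my_duration_matrix.loc[origin, destination]` and `my_duration_matrix.loc[destination, origin]`.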

## That's it.