Final Project

SBBG - Serbian Bundesbahnen Belgrade

Executive Summary

This project is a robust journey planner for the Swiss Federal Railways (SBB). The planner calculates the fastest routes between stops, providing a list of routes with confidence levels. The final deliverable includes a video presentation and the project code.

Problem Description

Given a departure stop, an arrival stop, an arrival time, an integer k, and a confidence level, our algorithm returns the k best routes from stop A to stop B that arrive before the given arrival time. The returned routes all have a probability of arriving on time greater than the confidence level given as input.

Data Model

Graph

We use a directed graph with multiple edges between node pairs.

Nodes: Stops
Edges: walk connections (provided by SBB), walk connections that we computed ourselves (in the preprocessing step), and public transport connections stored with their attributes
Edge attributes: duration, type, arrival time, departure time, and mean predicted delay (obtained using our prediction model).

Distance Matrix

Stores the distance between every pair of stops. This is useful because stops sometimes have different names or IDs and sometimes appear as different stops with nearly the same coordinates.

Routing Algorithm

A detailed description of the algorithm's working principle is available in the route_planning.py file. This includes implementation details and assumption justifications.

k-Shortest Paths Algorithm:

We use Backward Dijkstra to find the k shortest paths according to a cost metric.

Backward: We have a hard constraint on the arrival time, so we start from the arrival node and progressively add route sections until we reach the departure node.
K-shortest: We find the k best routes.
Heap Queue: We use a heap queue to store the routes and to pop the ones with the smallest cost first.
We ignore the routes that lead to long waiting times to speed up the algorithm.

Cost:

Our cost is the sum of the waiting time, the moving time, and a penalty for each change. This penalty is equivalent to the burden of 2 additional minutes of travel time. We add it to penalize routes that have a large number of changes, even if they are slightly quicker.

Delay Model

We use the actual arrival and departure times given by SBB.
The delay is modeled using an exponential distribution that we fit for each different transport type.

Limitations and Improvements

We assumed independence between trips regarding their delay.
We restricted our model to the working day and to the working times (6 AM to 8:30 PM).
Our formula to compute the walking time based on the distance "as the crow flies" is not realistic and could be further improved.

Visuals

Presentation video

Link: https://drive.google.com/file/d/1KZbzHCfG1i3dpD718mqquEZrd-bBVuBo/view?usp=sharing

How to use

Open the visualization.ipynb file, and follow the instructions.
The section Human Interface of the visualization.ipynb notebook provides the GUI that can be used to run our application.

Files

All of the code is organized into several Jupyter notebooks and Python files.

data_preprocessing.ipynb: The notebook that creates direct_connection_df that stores stop pairs that are directly connected to each other and walk_transfers that stores stop pairs that can be reachable by foot.
preprocessing_stops.ipynb: Restricting the stops-to-stops and walking transfers tables only to the region of interest, given by the objectid.
delays.ipynb: Notebook for parsing SBB actual arrival/departure times and adding route information with mean arrival delays.
route_planning.py: Python file, includes functions for building a graph, finding the optimal route, delay modeling using exponential distribution and computing the distance matrix between the stops.
visualization.ipynb: The Jupyter notebook, responsible for the postprocessing of the routing algorithm, presents the route with attractive text and graphics. Additionally, it features an intuitive GUI for inputting data into the algorithm.
eval.ipynb: The Jupyter notebook that does the evaluation of the algorithm.
save_spark.ipynb: Helper notebook that allows saving the tables locally.

Data Source

Actual data

For this project we used the data published on the Open Data Platform Mobility Switzerland.

The full description of the data is available in the opentransportdata.swiss data istdaten cookbooks. We have access to the following data:

BETRIEBSTAG: date of the trip
FAHRT_BEZEICHNER: identifies the trip
BETREIBER_ABK, BETREIBER_NAME: operator (name will contain the full name, e.g. Schweizerische Bundesbahnen for SBB)
PRODUCT_ID: type of transport, e.g. train, bus
LINIEN_ID: for trains, this is the train number
LINIEN_TEXT,VERKEHRSMITTEL_TEXT: for trains, the service type (IC, IR, RE, etc.)
ZUSATZFAHRT_TF: boolean, true if this is an additional trip (not part of the regular schedule)
FAELLT_AUS_TF: boolean, true if this trip failed (cancelled or not completed)
HALTESTELLEN_NAME: name of the stop
ANKUNFTSZEIT: arrival time at the stop according to schedule
AN_PROGNOSE: actual arrival time (see AN_PROGNOSE_STATUS)
AN_PROGNOSE_STATUS: method used to measure AN_PROGNOSE, the time of arrival.
ABFAHRTSZEIT: departure time at the stop according to schedule
AB_PROGNOSE: actual departure time (see AN_PROGNOSE_STATUS)
AB_PROGNOSE_STATUS: method used to measure AB_PROGNOSE, the time of departure.
DURCHFAHRT_TF: boolean, true if the transport does not stop there

Each line of the file represents a stop and contains arrival and departure times. When the stop is the start or end of a journey, the corresponding columns will be empty (ANKUNFTSZEIT/ABFAHRTSZEIT). In some cases, the actual times were not measured so the AN_PROGNOSE_STATUS/AB_PROGNOSE_STATUS will be empty or set to PROGNOSE and AN_PROGNOSE/AB_PROGNOSE will be empty.

Timetible data

Timetable data are available from timetable data set.

The timetables are updated weekly. It is ok to assume that the weekly changes are small, and a timetable for a given week is thus the same for the full year - use the schedule of the most recent week for the day of the trip. However, note that public transport services may run at different times or not at all depending on the day of the week.

The full description of the GTFS format is available in the opentransportdata.swiss data timetable cookbooks.

We provide a summary description of the data below:

stops:

STOP_ID: unique identifier (PK) of the stop
STOP_NAME: long name of the stop
STOP_LAT: stop latitude (WGS84)
STOP_LON: stop longitude
LOCATION_TYPE:
PARENT_STATION: if the stop is one of many collocated at a same location, such as platforms at a train station

stop_times:

TRIP_ID: identifier (FK) of the trip, unique for the day - e.g. 1.TA.1-100-j19-1.1.H
ARRIVAL_TIME: scheduled (local) time of arrival at the stop (same as DEPARTURE_TIME if this is the start of the journey)
DEPARTURE_TIME: scheduled (local) time of departure at the stop
STOP_ID: stop (station) identifier (FK), from stops.txt
STOP_SEQUENCE: sequence number of the stop on this trip id, starting at 1.
PICKUP_TYPE:
DROP_OFF_TYPE:

trips:

ROUTE_ID: identifier (FK) for the route. A route is a sequence of stops. It is time independent.
SERVICE_ID: identifier (FK) of a group of trips in the calendar, and for managing exceptions (e.g. holidays, etc).
TRIP_ID: is one instance (PK) of a vehicle journey on a given route - the same route can have many trips at regular intervals; a trip may skip some of the route stops.
TRIP_HEADSIGN: displayed to passengers, most of the time this is the (short) name of the last stop.
TRIP_SHORT_NAME: internal identifier for the trip_headsign (note TRIP_HEADSIGN and TRIP_SHORT_NAME are only unique for an agency)
DIRECTION_ID: if the route is bidirectional, this field indicates the direction of the trip on the route.

calendar:

SERVICE_ID: identifier (PK) of a group of trips sharing a same calendar and calendar exception pattern. MONDAY..SUNDAY: FALSE (0) or TRUE (1) for each day of the week, indicating occurence of the service on that day.
START_DATE: start date when weekly service id pattern is valid
END_DATE: end date after which weekly service id pattern is no longer valid

routes:

ROUTE_ID: identifier for the route (PK)
AGENCY_ID: identifier of the operator (FK)
ROUTE_SHORT_NAME: the short name of the route, usually a line number
ROUTE_LONG_NAME: (empty)
ROUTE_DESC: Bus, Zub, Tram, etc.
ROUTE_TYPE:

Geo spatial data

The useful geospatial shapes are located in the Hive table com490.geo_shapes.

com490.geo_shapes:

OBJECTID: numerical identifier of the shape
NAME: a human readable name of the shape
GEOMETRY: binary representation of the shape that can be used in ESRI UDF functions.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md
data_preprocessing.ipynb		data_preprocessing.ipynb
delays.ipynb		delays.ipynb
eval.ipynb		eval.ipynb
preprocessing_stops.ipynb		preprocessing_stops.ipynb
route_planning.py		route_planning.py
save_spark_df.ipynb		save_spark_df.ipynb
stats.pkl		stats.pkl
visualization.ipynb		visualization.ipynb
visualization_fct.py		visualization_fct.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Final Project

SBBG - Serbian Bundesbahnen Belgrade

Executive Summary

Problem Description

Data Model

Graph

Distance Matrix

Routing Algorithm

k-Shortest Paths Algorithm:

Cost:

Delay Model

Limitations and Improvements

Visuals

Presentation video

How to use

Files

Data Source

Actual data

Timetible data

Geo spatial data

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Final Project

SBBG - Serbian Bundesbahnen Belgrade

Executive Summary

Problem Description

Data Model

Graph

Distance Matrix

Routing Algorithm

k-Shortest Paths Algorithm:

Cost:

Delay Model

Limitations and Improvements

Visuals

Presentation video

How to use

Files

Data Source

Actual data

Timetible data

Geo spatial data

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages