Skip to content

masazelic/SBBG

Repository files navigation

Final Project

SBBG - Serbian Bundesbahnen Belgrade

Executive Summary

This project is a robust journey planner for the Swiss Federal Railways (SBB). The planner calculates the fastest routes between stops, providing a list of routes with confidence levels. The final deliverable includes a video presentation and the project code.

Problem Description

Given a departure stop, an arrival stop, an arrival time, an integer k, and a confidence level, our algorithm returns the k best routes from stop A to stop B that arrive before the given arrival time. The returned routes all have a probability of arriving on time greater than the confidence level given as input.

Data Model

Graph

We use a directed graph with multiple edges between node pairs.

  • Nodes: Stops
  • Edges: walk connections (provided by SBB), walk connections that we computed ourselves (in the preprocessing step), and public transport connections stored with their attributes
  • Edge attributes: duration, type, arrival time, departure time, and mean predicted delay (obtained using our prediction model).

Distance Matrix

Stores the distance between every pair of stops. This is useful because stops sometimes have different names or IDs and sometimes appear as different stops with nearly the same coordinates.

Routing Algorithm

A detailed description of the algorithm's working principle is available in the route_planning.py file. This includes implementation details and assumption justifications.

k-Shortest Paths Algorithm:

We use Backward Dijkstra to find the k shortest paths according to a cost metric.

  • Backward: We have a hard constraint on the arrival time, so we start from the arrival node and progressively add route sections until we reach the departure node.
  • K-shortest: We find the k best routes.
  • Heap Queue: We use a heap queue to store the routes and to pop the ones with the smallest cost first.
  • We ignore the routes that lead to long waiting times to speed up the algorithm.

Cost:

  • Our cost is the sum of the waiting time, the moving time, and a penalty for each change. This penalty is equivalent to the burden of 2 additional minutes of travel time. We add it to penalize routes that have a large number of changes, even if they are slightly quicker.

Delay Model

  • We use the actual arrival and departure times given by SBB.
  • The delay is modeled using an exponential distribution that we fit for each different transport type.

Limitations and Improvements

  • We assumed independence between trips regarding their delay.
  • We restricted our model to the working day and to the working times (6 AM to 8:30 PM).
  • Our formula to compute the walking time based on the distance "as the crow flies" is not realistic and could be further improved.

Visuals

Presentation video

How to use

  • Open the visualization.ipynb file, and follow the instructions.
  • The section Human Interface of the visualization.ipynb notebook provides the GUI that can be used to run our application.

Files

All of the code is organized into several Jupyter notebooks and Python files.

  • data_preprocessing.ipynb: The notebook that creates direct_connection_df that stores stop pairs that are directly connected to each other and walk_transfers that stores stop pairs that can be reachable by foot.
  • preprocessing_stops.ipynb: Restricting the stops-to-stops and walking transfers tables only to the region of interest, given by the objectid.
  • delays.ipynb: Notebook for parsing SBB actual arrival/departure times and adding route information with mean arrival delays.
  • route_planning.py: Python file, includes functions for building a graph, finding the optimal route, delay modeling using exponential distribution and computing the distance matrix between the stops.
  • visualization.ipynb: The Jupyter notebook, responsible for the postprocessing of the routing algorithm, presents the route with attractive text and graphics. Additionally, it features an intuitive GUI for inputting data into the algorithm.
  • eval.ipynb: The Jupyter notebook that does the evaluation of the algorithm.
  • save_spark.ipynb: Helper notebook that allows saving the tables locally.

Data Source

Actual data

For this project we used the data published on the Open Data Platform Mobility Switzerland.

The full description of the data is available in the opentransportdata.swiss data istdaten cookbooks. We have access to the following data:

  • BETRIEBSTAG: date of the trip
  • FAHRT_BEZEICHNER: identifies the trip
  • BETREIBER_ABK, BETREIBER_NAME: operator (name will contain the full name, e.g. Schweizerische Bundesbahnen for SBB)
  • PRODUCT_ID: type of transport, e.g. train, bus
  • LINIEN_ID: for trains, this is the train number
  • LINIEN_TEXT,VERKEHRSMITTEL_TEXT: for trains, the service type (IC, IR, RE, etc.)
  • ZUSATZFAHRT_TF: boolean, true if this is an additional trip (not part of the regular schedule)
  • FAELLT_AUS_TF: boolean, true if this trip failed (cancelled or not completed)
  • HALTESTELLEN_NAME: name of the stop
  • ANKUNFTSZEIT: arrival time at the stop according to schedule
  • AN_PROGNOSE: actual arrival time (see AN_PROGNOSE_STATUS)
  • AN_PROGNOSE_STATUS: method used to measure AN_PROGNOSE, the time of arrival.
  • ABFAHRTSZEIT: departure time at the stop according to schedule
  • AB_PROGNOSE: actual departure time (see AN_PROGNOSE_STATUS)
  • AB_PROGNOSE_STATUS: method used to measure AB_PROGNOSE, the time of departure.
  • DURCHFAHRT_TF: boolean, true if the transport does not stop there

Each line of the file represents a stop and contains arrival and departure times. When the stop is the start or end of a journey, the corresponding columns will be empty (ANKUNFTSZEIT/ABFAHRTSZEIT). In some cases, the actual times were not measured so the AN_PROGNOSE_STATUS/AB_PROGNOSE_STATUS will be empty or set to PROGNOSE and AN_PROGNOSE/AB_PROGNOSE will be empty.

Timetible data

Timetable data are available from timetable data set.

The timetables are updated weekly. It is ok to assume that the weekly changes are small, and a timetable for a given week is thus the same for the full year - use the schedule of the most recent week for the day of the trip. However, note that public transport services may run at different times or not at all depending on the day of the week.

The full description of the GTFS format is available in the opentransportdata.swiss data timetable cookbooks.

We provide a summary description of the data below:

stops:

  • STOP_ID: unique identifier (PK) of the stop
  • STOP_NAME: long name of the stop
  • STOP_LAT: stop latitude (WGS84)
  • STOP_LON: stop longitude
  • LOCATION_TYPE:
  • PARENT_STATION: if the stop is one of many collocated at a same location, such as platforms at a train station

stop_times:

  • TRIP_ID: identifier (FK) of the trip, unique for the day - e.g. 1.TA.1-100-j19-1.1.H
  • ARRIVAL_TIME: scheduled (local) time of arrival at the stop (same as DEPARTURE_TIME if this is the start of the journey)
  • DEPARTURE_TIME: scheduled (local) time of departure at the stop
  • STOP_ID: stop (station) identifier (FK), from stops.txt
  • STOP_SEQUENCE: sequence number of the stop on this trip id, starting at 1.
  • PICKUP_TYPE:
  • DROP_OFF_TYPE:

trips:

  • ROUTE_ID: identifier (FK) for the route. A route is a sequence of stops. It is time independent.
  • SERVICE_ID: identifier (FK) of a group of trips in the calendar, and for managing exceptions (e.g. holidays, etc).
  • TRIP_ID: is one instance (PK) of a vehicle journey on a given route - the same route can have many trips at regular intervals; a trip may skip some of the route stops.
  • TRIP_HEADSIGN: displayed to passengers, most of the time this is the (short) name of the last stop.
  • TRIP_SHORT_NAME: internal identifier for the trip_headsign (note TRIP_HEADSIGN and TRIP_SHORT_NAME are only unique for an agency)
  • DIRECTION_ID: if the route is bidirectional, this field indicates the direction of the trip on the route.

calendar:

  • SERVICE_ID: identifier (PK) of a group of trips sharing a same calendar and calendar exception pattern. MONDAY..SUNDAY: FALSE (0) or TRUE (1) for each day of the week, indicating occurence of the service on that day.
  • START_DATE: start date when weekly service id pattern is valid
  • END_DATE: end date after which weekly service id pattern is no longer valid

routes:

  • ROUTE_ID: identifier for the route (PK)
  • AGENCY_ID: identifier of the operator (FK)
  • ROUTE_SHORT_NAME: the short name of the route, usually a line number
  • ROUTE_LONG_NAME: (empty)
  • ROUTE_DESC: Bus, Zub, Tram, etc.
  • ROUTE_TYPE:

Geo spatial data

The useful geospatial shapes are located in the Hive table com490.geo_shapes.

com490.geo_shapes:

  • OBJECTID: numerical identifier of the shape
  • NAME: a human readable name of the shape
  • GEOMETRY: binary representation of the shape that can be used in ESRI UDF functions.

About

COM490 Project: Serbian Bundesbahnen Belgrade (SBBG)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors