
Bus Real Time (SIRI) Data Documentation

Sagi Sarussi edited this page Oct 5, 2021 · 11 revisions

Work-in-progress, part of issue #184

This doc will include information about the SIRI data. More details and relevant discussions can be found in the linked issue.

============

Introduction

Real-time bus data is routinely downloaded from the Israel Ministry of Transport (MoT) using the SIRI-SM protocol. MoT exposes real-time bus data through a web service. Our siri_retriever service queries MoT's web service in real time for every vehicle and logs the responses. This documentation covers the SIRI-SM protocol, our retriever service, and the retrieved data.

Data Source

SIRI-SM protocol

SIRI stands for Service Interface for Real Time Information. The protocol is used to exchange real-time public transportation information. SIRI-SM is the Stop Monitoring service, which means that data is retrieved at the stop level: we query a specific bus stop in real time and get a response with information about the public transportation vehicles that are on their way to this stop.

Israel MoT SIRI-SM web-service

In Israel, the Ministry of Transport maintains a web service that publishes real-time bus data using the SIRI-SM protocol. MoT collects the data from the bus agencies, and the data is available for public queries only in real time. To date (June 2019), MoT does not provide an option for retrospective queries. Full documentation (in Hebrew) of MoT's web service can be found on their website: https://www.gov.il/he/Departments/General/real_time_information_siri (archived version: Apr. 2019).

SIRI Retriever

Our SIRI data retriever is an open source Java process that queries the MoT service around the clock and logs the SIRI results after minimal processing. Our current querying policy yields a GPS location for every bus in Israel approximately every minute. For more information about this operation, refer to the README of the SIRI retriever code.

Raw Data

Missing info:
1. What is the format of the raw results that the SIRI retriever gets? What are the fields in the raw response?
Response from Evyatar: the response is an XML file, which undergoes basic parsing to CSV. We save the CSV as log files, which are the input for Splunk.
2. What are the stages between the raw responses and the Splunk table?
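As a rough illustration of the "XML to CSV" step described in the response above, here is a minimal Python sketch. It is not the retriever's actual code (which is Java). The element names and the namespace come from the SIRI standard; which elements MoT actually returns, and which ones the retriever keeps, are still open questions, so the sample XML and the chosen column order are assumptions made for illustration only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# XML namespace defined by the SIRI standard.
NS = {"siri": "http://www.siri.org.uk/siri"}

# Made-up, heavily trimmed response for illustration; not real MoT output.
SAMPLE_XML = """\
<Siri xmlns="http://www.siri.org.uk/siri">
  <ServiceDelivery>
    <StopMonitoringDelivery>
      <MonitoredStopVisit>
        <RecordedAtTime>2019-06-01T10:00:00</RecordedAtTime>
        <MonitoredVehicleJourney>
          <LineRef>7020</LineRef>
          <OperatorRef>3</OperatorRef>
          <VehicleRef>1234567</VehicleRef>
          <VehicleLocation>
            <Longitude>34.78</Longitude>
            <Latitude>32.08</Latitude>
          </VehicleLocation>
        </MonitoredVehicleJourney>
      </MonitoredStopVisit>
    </StopMonitoringDelivery>
  </ServiceDelivery>
</Siri>
"""


def visits_to_csv(xml_text: str) -> str:
    """Flatten every MonitoredStopVisit in a response into one CSV row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for visit in root.iterfind(".//siri:MonitoredStopVisit", NS):
        journey = visit.find("siri:MonitoredVehicleJourney", NS)
        if journey is None:
            continue  # nothing to log for this visit

        def text(path: str) -> str:
            # Missing elements become empty CSV cells.
            return journey.findtext(path, default="", namespaces=NS)

        writer.writerow([
            visit.findtext("siri:RecordedAtTime", default="", namespaces=NS),
            text("siri:OperatorRef"),
            text("siri:LineRef"),
            text("siri:VehicleRef"),
            text("siri:VehicleLocation/siri:Longitude"),
            text("siri:VehicleLocation/siri:Latitude"),
        ])
    return out.getvalue()


csv_text = visits_to_csv(SAMPLE_XML)
print(csv_text, end="")
```

Missing elements are written as empty cells rather than raising, on the assumption that a partial position record is still worth logging.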

Data fields

** missing - specification of where these fields are coming from - the raw data or our preprocessing? **
** missing - do we want to fill in the types? do we have type specifications in Splunk? **
** TODO - Provide 1-1 mappings for agencies **

Basics:

  • Every row in this database represents a single GPS position of a single bus at a specific time.
  • Many items have two fields: an ID field and a name field, the latter usually a string with an informative name. There should be a 1-1 mapping between these fields.
  • ID types are mostly represented by integers, but it is recommended to treat them as strings.

| Field Name | Expected Type | Description |
| --- | --- | --- |
| agency_id | ID | The agency/brand that operates the service. |
| bus_id | ID | Identifier of the vehicle. Different agencies have different conventions for bus identifiers, and they can change these conventions without prior notice. Relying on such conventions for general solutions is strongly discouraged; however, they can be useful when investigating specific cases. For example, Kavim and Egged use the actual bus licence plate, while Dan uses an internal ID. bus_id can be used to track a vehicle over time and across routes. |
| date | YYYY-MM-DD | "date recorded". The date on which we got the GPS position response. This is not always the date of the bus trip (think about trips around midnight), and we are working on adding the planned trip date to the logs (issue #183). |
| lat | float | GPS latitude coordinate in the WGS84 standard |
| lon | float | GPS longitude coordinate in the WGS84 standard |
| planned_start_time | HH:MM:SS | Planned time of departure from the first stop |
| predicted_end_time | HH:MM:SS | Predicted time of arrival at the last stop. This prediction can change during the trip. We assume that it is currently based on a rough estimate of the remaining distance to the final destination and does not include traffic or irregular events. |
| route_id | ID | Route identifier. Unofficially, a route represents a specific path that the bus travels. There can be multiple routes per |
| route_short_name | ? | ? |
| service_id | ? | ? |
| time_recorded | ? | ? |
| timeendpos | ? | ? |
| timestamp | ? | ? |
| timestartpos | ? | ? |
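
The expectation of a 1-1 mapping between ID and name fields can be checked on any set of rows. The following Python sketch is one possible way to do it. The `agency_name` field and the sample rows are hypothetical and used only for illustration; the function works with any pair of ID/name field names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def find_mapping_violations(
    rows: Iterable[Mapping[str, str]], id_field: str, name_field: str
) -> Dict[str, Dict[str, Set[str]]]:
    """Return IDs that map to several names and names that map to several IDs."""
    names_by_id = defaultdict(set)
    ids_by_name = defaultdict(set)
    for row in rows:
        # IDs are compared as strings, following the recommendation in the basics.
        row_id, name = str(row[id_field]), str(row[name_field])
        names_by_id[row_id].add(name)
        ids_by_name[name].add(row_id)
    return {
        "ids_with_many_names": {k: v for k, v in names_by_id.items() if len(v) > 1},
        "names_with_many_ids": {k: v for k, v in ids_by_name.items() if len(v) > 1},
    }


# Made-up rows; "agency_name" is a hypothetical name field.
rows = [
    {"agency_id": "3", "agency_name": "Egged"},
    {"agency_id": "5", "agency_name": "Dan"},
    {"agency_id": "5", "agency_name": "Dan Bus"},  # same ID, different name
]
violations = find_mapping_violations(rows, "agency_id", "agency_name")
print(violations["ids_with_many_names"])
```

An empty result in both entries means the mapping is 1-1 for the rows that were checked; it says nothing about rows outside that sample.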

SIRI log v2

Each line in SIRI log v2 can be parsed using the following format:

{responseTimestamp},[line {lineName} v {licensePlate} oad {departureTime} ea {expectedArrivalTime}],{operatorRef},{lineRef},{lineName},{journeyRef},{departureTime},{licensePlate},{expectedArrivalTime},{recordedAt},{lon},{lan},{dataFrameRef},{stopPointRef},{vehicleAtStopStr},v2
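
A minimal Python sketch of a parser for this format is shown below. It assumes that no field, including the bracketed summary, contains a comma, so a plain split is enough. The field names are copied from the format line as written (including `lan`); `description` and `version` are labels introduced here for the bracketed summary and the trailing `v2` marker. The sample line is made up and is not real log data.

```python
from typing import Dict

# Field names in the order given by the format line.
SIRI_V2_FIELDS = [
    "responseTimestamp", "description", "operatorRef", "lineRef", "lineName",
    "journeyRef", "departureTime", "licensePlate", "expectedArrivalTime",
    "recordedAt", "lon", "lan", "dataFrameRef", "stopPointRef",
    "vehicleAtStopStr", "version",
]


def parse_siri_v2_line(line: str) -> Dict[str, str]:
    """Split one v2 log line into a dict keyed by the format's field names."""
    parts = line.rstrip("\n").split(",")
    if len(parts) != len(SIRI_V2_FIELDS):
        raise ValueError(f"expected {len(SIRI_V2_FIELDS)} fields, got {len(parts)}")
    if parts[-1] != "v2":
        raise ValueError(f"not a v2 line: version marker is {parts[-1]!r}")
    return dict(zip(SIRI_V2_FIELDS, parts))


# Made-up sample line for illustration only.
sample = (
    "2019-06-01T10:00:05,[line 5 v 1234567 oad 09:45 ea 10:20],"
    "3,7020,5,12345678,2019-06-01T09:45:00,1234567,2019-06-01T10:20:00,"
    "2019-06-01T10:00:00,34.78,32.08,2019-06-01,21000,false,v2"
)
record = parse_siri_v2_line(sample)
print(record["lineName"], record["lon"], record["lan"])
```

All values are kept as strings; converting coordinates or timestamps is left to the caller, since the exact value formats are not specified here.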

Complementary data: GTFS

GTFS is the format MoT uses to publish planned bus schedules in Israel. It contains the list of route IDs, agencies, trip times, stops, and geographical route shapes. We download this data daily and use it for two main purposes:

  1. Directing the SIRI retriever to the up-to-date stops to query.
  2. Serving as a reference point for comparing planned and actual bus trips.
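
To illustrate the first purpose, the sketch below reads the standard GTFS `stops.txt` file and collects the stops that could be queried. It is written in Python for brevity and is not the retriever's actual logic. The column names (`stop_id`, `stop_code`, `stop_name`, `stop_lat`, `stop_lon`) are defined by the GTFS specification; the assumption that stops are identified to the SIRI service by `stop_code` is ours, and the sample rows are made up.

```python
import csv
import io
from typing import List

# Made-up excerpt of a GTFS stops.txt file, for illustration only.
SAMPLE_STOPS_TXT = """\
stop_id,stop_code,stop_name,stop_lat,stop_lon
1,21000,Example Stop A,32.08,34.78
2,21001,Example Stop B,32.09,34.79
3,,Stop Without Code,32.10,34.80
"""


def stop_codes_from_gtfs(stops_txt: str) -> List[str]:
    """Return the non-empty stop_code values from the text of stops.txt."""
    # Strip a leading byte order mark, which some published GTFS files include.
    reader = csv.DictReader(io.StringIO(stops_txt.lstrip("\ufeff")))
    # Codes are kept as strings, in line with how IDs are treated in this doc.
    return [row["stop_code"] for row in reader if row.get("stop_code")]


codes = stop_codes_from_gtfs(SAMPLE_STOPS_TXT)
print(codes)
```

Re-running this on each daily download would keep the list of stops to query in sync with the published schedule.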