# Merge Delays
In this notebook, the delay information from 23/01/2019 is merged into a single object.

## Real-time updates
The archive contains every response returned by the Transport for NSW Open Data [realtime API](https://opendata.transport.nsw.gov.au/dataset/public-transport-realtime-trip-update). The repository includes a Python task that makes these requests every two minutes, 24 hours a day.

In [1]:
import glob
data_path = 'home/pi/sydney-transport-tracker/data/raw/20190123/'
delay_files = sorted(glob.glob(data_path + '*.pickle'))
print(f'We have {len(delay_files)} delay files')

We have 720 delay files
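
The count of 720 matches what one request every two minutes over a full day should produce. As a quick sanity check, here is a minimal sketch that enumerates the expected request slots. It assumes polling starts exactly at midnight and never misses a slot, which the real task may not guarantee; `day_start` and `request_times` are illustrative names, not part of the repository.

```python
from datetime import datetime, timedelta

# Assumption: the first request of the day is made exactly at midnight
day_start = datetime(2019, 1, 23)
interval = timedelta(minutes=2)

# Enumerate every two-minute slot that falls within the day
request_times = []
slot = day_start
while slot < day_start + timedelta(days=1):
    request_times.append(slot)
    slot += interval

print(f'Expected {len(request_times)} responses')
```

A lower file count on another day would suggest that some requests failed or were skipped.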


### The response
The responses follow the realtime extension of the [General Transit Feed Specification](https://developers.google.com/transit/) (GTFS) and can be parsed with the [GTFS Python library](https://developers.google.com/transit/gtfs-realtime/examples/python-sample) to return entities containing the delay information.
Let's look at the entity at index 20 of the first file as an example.

In [2]:
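import pickle

from google.transit import gtfs_realtime_pb2

# Load the raw protobuf bytes saved from the first response file
with open(delay_files[0], "rb") as f:
    delay_response = pickle.load(f)

# Parse the bytes into a GTFS Realtime feed message
feed = gtfs_realtime_pb2.FeedMessage()
feed.ParseFromString(delay_response)
feed.entity[20]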

id: "4--Y.1260.122.60.M.8.55186691"
trip_update {
  trip {
    trip_id: "4--Y.1260.122.60.M.8.55186691"
    schedule_relationship: SCHEDULED
    route_id: "BNK_2a"
  }
  stop_time_update {
    arrival {
      delay: 0
    }
    departure {
      delay: 0
    }
    stop_id: "214381"
    schedule_relationship: SCHEDULED
  }
  stop_time_update {
    arrival {
      delay: 0
    }
    departure {
      delay: 0
    }
    stop_id: "2199171"
    schedule_relationship: SCHEDULED
  }
  stop_time_update {
    arrival {
      delay: 0
    }
    departure {
      delay: 0
    }
    stop_id: "2200501"
    schedule_relationship: SCHEDULED
  }
  timestamp: 1548161874
}
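
In GTFS Realtime, `delay` values are in seconds (positive means late) and `timestamp` is POSIX time, so the trip update above can be placed on a wall clock. The sketch below converts the timestamp shown in the output. It assumes a fixed UTC+11 offset for Sydney daylight saving time in January; `SYDNEY_SUMMER` is an illustrative name, and a real pipeline would more likely use a named time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Sydney observes daylight saving in January (UTC+11)
SYDNEY_SUMMER = timezone(timedelta(hours=11))

# Timestamp copied from the trip_update output above (seconds since epoch)
trip_timestamp = 1548161874

utc_time = datetime.fromtimestamp(trip_timestamp, tz=timezone.utc)
local_time = utc_time.astimezone(SYDNEY_SUMMER)

print(utc_time.isoformat())    # 2019-01-22T12:57:54+00:00
print(local_time.isoformat())  # 2019-01-22T23:57:54+11:00
```

Under that assumption, this update was produced a couple of minutes before midnight local time, even though it appears in the first file for 23/01/2019.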

As mentioned, responses were saved every two minutes. This means there will be overlaps, where a trip and its delay information appear across multiple response files. We keep all of this information because, as trips finish throughout the day, they no longer appear in the real-time response.

Delay objects are merged by taking the latest delay information for each stop. This is done with the help of the classes in the `trip_objects` module and the `merge_trips` function.

In the end, we are left with the latest available delay information for each trip.
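
To make the "latest information wins" rule concrete, here is a minimal, self-contained sketch. It is not the repository's `merge_trips` implementation, whose details are not shown in this notebook. `StopDelay`, `TripDelays` and `merge_latest` are hypothetical stand-ins, and the sketch assumes that the update with the larger timestamp is the more recent one.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StopDelay:
    """Hypothetical stand-in for a per-stop delay record (delays in seconds)."""
    stop_id: str
    arrival_delay: int
    departure_delay: int


@dataclass
class TripDelays:
    """Hypothetical stand-in for a trip update keyed by stop_id."""
    trip_id: str
    timestamp: int
    stops: Dict[str, StopDelay] = field(default_factory=dict)


def merge_latest(first: TripDelays, second: TripDelays) -> TripDelays:
    """Merge two updates for one trip, preferring the more recent per stop."""
    # Assumption: a larger timestamp means a more recent update
    older, newer = sorted((first, second), key=lambda t: t.timestamp)
    merged_stops = dict(older.stops)   # keep stops that have since dropped out
    merged_stops.update(newer.stops)   # newer values overwrite older ones
    return TripDelays(newer.trip_id, newer.timestamp, merged_stops)


earlier = TripDelays('trip-1', 100, {
    'A': StopDelay('A', 30, 30),
    'B': StopDelay('B', 60, 60),
})
later = TripDelays('trip-1', 220, {
    'B': StopDelay('B', 120, 120),
    'C': StopDelay('C', 90, 90),
})

merged = merge_latest(earlier, later)
print(sorted(merged.stops))           # ['A', 'B', 'C']
print(merged.stops['B'].arrival_delay)  # 120
```

Stop `A` survives even though it is missing from the later update, which is why every response file is kept rather than only the last one.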

In [None]:
import sys
sys.path.append('../')
from src.features.trip_objects import *
from src.features.trip_helper import *

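# Maps trip_id -> TripUpdate holding the latest known delays for that trip
merged_delays = dict()

# Response files are processed in sorted filename order
for delay_data_file in delay_files:
    with open(delay_data_file, "rb") as f:
        current_delay_response = pickle.load(f)

    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(current_delay_response)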
    for entity in feed.entity:
        if entity.HasField('trip_update') and len(entity.trip_update.stop_time_update) > 0:
            trip_update = TripUpdate(entity.trip_update.trip.trip_id,
                                     entity.trip_update.trip.route_id,
                                     entity.trip_update.trip.schedule_relationship,
                                     entity.trip_update.timestamp)

            for stop_time_update in entity.trip_update.stop_time_update:
                trip_update.stop_time_updates[stop_time_update.stop_id] \
                    = StopTimeUpdate(stop_time_update.stop_id,
                                     stop_time_update.arrival.delay,
                                     stop_time_update.departure.delay,
                                     stop_time_update.schedule_relationship)

            if trip_update.trip_id in merged_delays:
                merged_delays[trip_update.trip_id] = \
                    merge_trips(merged_delays[trip_update.trip_id], trip_update)
            else:
                merged_delays[trip_update.trip_id] = trip_update

print(f"Found {len(merged_delays)} trips")

In [None]:
with open('merged_delays.pickle', 'wb') as f:
    pickle.dump(merged_delays, f)