RPC to get hours of service for a station/route #14

Closed
elliottwilliams opened this issue Mar 22, 2017 · 5 comments

@elliottwilliams
Member

elliottwilliams commented Mar 22, 2017

It'd be useful to have an RPC to determine when a particular route makes stops at a station. This would allow clients to say something like:

"Route 13 stops here Mon-Fri, 7:00am-6:00pm, every 5 min"

So, the call would look something like

timetable.service_times(station, route)

and would return

  • a range of weekday-times (e.g. Mon 07:00...Mon 18:00 or Tue 19:00...Wed 02:00) for a particular level of service
  • the time interval between stop times
timetable.service_times(station, route) -> [ (Date Range, Visit Interval) ]
@faultyserver
Member

I really like this idea, but I'm not entirely sure it can be implemented in this way without losing some information in the process. Obviously, this is a solved problem, as other services are able to provide this information, though their implementations may not be as automatic as we would like ours to be.

I'll try to build up my understanding of why I don't think this will work exactly as we want (or at least won't be trivial).

(Simplified) Implementation Overview

First, let's assume that a route only has one trip.

GTFS directly contains the information for part of the "range of weekday-times" field through the mapping of stop -> stop_time -> trip (route) -> service -> calendar. However, the information is stored as a boolean for each day of the week indicating whether the service is active, so the direct result would end up looking more like [[Monday, Tuesday, Thursday], Time Range] (note how Wednesday is not part of the date range).
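As a rough illustration of that mapping's last step, here is a minimal Python sketch that turns one `calendar.txt` row into a list of days with service. The column names come from the GTFS spec; the `active_weekdays` helper and the sample row are made up for this example and are not part of Timetable.

```python
import csv
import io

# calendar.txt stores one "0"/"1" flag column per weekday.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def active_weekdays(calendar_row):
    """Return the weekday names whose flag is "1" in a calendar.txt row."""
    return [day for day in WEEKDAYS if calendar_row.get(day) == "1"]

# Hypothetical feed content, for illustration only.
SAMPLE_CALENDAR = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
WKDY,1,1,0,1,0,0,0,20170101,20170601
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CALENDAR)))
print(active_weekdays(rows[0]))  # ['monday', 'tuesday', 'thursday']
```

The result is a set of days rather than a contiguous range, which is why Wednesday can simply be absent.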

The time range portion is fairly simple. Just iterating all stop_times at the station for the trip and finding the min/max would give the time bounds for each day. In fact, since the visit_list is sorted in part by departure time, just querying for the lower and upper bound would give the same result without requiring iteration.
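A minimal sketch of the min/max approach, assuming the stop times arrive as GTFS `HH:MM:SS` strings. The `to_seconds` and `time_range` helpers are hypothetical names, not Timetable functions. Times are compared numerically because GTFS allows hours of 24 or more for trips that run past midnight.

```python
def to_seconds(gtfs_time):
    """Convert a GTFS "HH:MM:SS" string to seconds after the service day starts."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def time_range(departure_times):
    """Return the (earliest, latest) departure from a non-empty list."""
    ordered = sorted(departure_times, key=to_seconds)
    return ordered[0], ordered[-1]

# Hypothetical departures for one trip at one station.
sample = ["18:00:00", "06:45:00", "25:10:00", "12:30:00"]
first, last = time_range(sample)
print(first, last)  # 06:45:00 25:10:00
```

If the list is already sorted by departure time, as described above, taking the first and last elements directly would avoid the sort.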

The last portion is potentially the most difficult. It might suffice to iterate all of the stop_times for a route at a station and hope they all occur at a consistent interval (in most cases this should be true). However, what happens if there is a 30 minute break in the middle of the day where a station is not serviced? More generally, how would we handle instances where the interval is not consistent between stop_times? I don't have any real-world data for this, but it is a possibility within the GTFS format.
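To make the concern concrete, here is a small sketch that computes the gaps between consecutive visits and checks whether they are uniform. It assumes visit times have already been converted to minutes after midnight and sorted; `gaps_in_minutes`, `is_regular`, and the sample data are illustrative only.

```python
def gaps_in_minutes(times):
    """Minutes between consecutive visits in a sorted list of times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]

def is_regular(times, tolerance=0):
    """True when every gap is within `tolerance` minutes of every other gap."""
    gaps = gaps_in_minutes(times)
    return not gaps or max(gaps) - min(gaps) <= tolerance

regular = [420, 425, 430, 435]          # every 5 minutes from 7:00
with_break = [420, 425, 430, 460, 465]  # a 30-minute break after 7:10

print(gaps_in_minutes(with_break))  # [5, 5, 30, 5]
print(is_regular(regular))          # True
print(is_regular(with_break))       # False
```

A single visit interval can only describe the first schedule; the second needs either several intervals or some way to signal irregularity.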

With that, I would propose to modify the response of this RPC to something along the lines of the following:

timetable.service_times(station, route) -> (days-with-service, time-range, visit-interval)
# Example:
timetable.service_times('BUS123', '1B') -> ([Mon,Tue,Thu,Sat], [06:45:00, 18:00:00], 00:05:00)

Multiple Trips per Route

It is almost always the case that a Route will consist of multiple trips, each potentially with their own service (and thus date-range) and time-range.

The easiest way I can think to support this is to simply create a response for each trip and join them all in an array. The format would then be adjusted to the following:

timetable.service_times(station, route) -> [(days-with-service, time-range, visit-interval)]
# Example:
timetable.service_times('BUS123', '1B') -> [([Mon], [06:45:00, 18:00:00], 00:05:00), ([Thu,Fri], [07:45:00, 20:00:00], 00:10:00)]

Attempting to collate this information into a single response would (to me) lose information: the intervals may differ between trips, so collating would require dropping all but one interval as the representative, or the time ranges may differ, and so on.

Collating by Service

In my experience with CityBus's data, trips are (mostly) organized into services such that all of the vehicles that are traveling a route on a given day belong to the same service. Additionally, the time interval between vehicles on different trips for the same service seems to stay consistent. With that, it may be possible to collate trips that are on the same service to the same entry in the response above (where the response is an array).

With these assumptions, it would be possible to collate the interval by merging the two sets of stop_times and calculating the difference between two of them. It would also be possible to merge the time-range by doing a min/max on all of the range beginnings and endings to get a complete timebox. As all of the trips are on the same service, the date-range would not need any merging, as that information comes from the service itself.
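A minimal sketch of that collation, under the same assumptions (all trips on one service, consistent headway across vehicles). Times are minutes after midnight, and `collate_service` plus the two sample trips are hypothetical.

```python
def collate_service(trips):
    """Merge per-trip visit times for trips that share one service.

    `trips` is a list of lists of visit times in minutes after midnight.
    Returns ((first, last), interval), or None with fewer than two visits.
    """
    merged = sorted(time for trip in trips for time in trip)
    if len(merged) < 2:
        return None
    # Assumes a consistent headway, so any adjacent pair represents it.
    interval = merged[1] - merged[0]
    return (merged[0], merged[-1]), interval

trip_a = [420, 450, 480]  # hypothetical vehicle 1: 7:00, 7:30, 8:00
trip_b = [435, 465, 495]  # hypothetical vehicle 2: 7:15, 7:45, 8:15

print(collate_service([trip_a, trip_b]))  # ((420, 495), 15)
```

Each vehicle alone visits every 30 minutes, but the merged list shows the 15-minute headway a rider would actually see. If the headway assumption does not hold, picking one adjacent pair gives a misleading answer.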

My only concern with this is that I don't know if we can make those assumptions. If we do, what happens when our assumptions aren't met? Do we simply say that the information is unavailable? Make a best attempt? I'm not sure.

Additionally, services are not necessarily active for the entire date range that a GTFS archive covers. The calendar_dates.txt file can be used to state that specific dates are inactive, or that additional dates are active for a given service. So simply saying "Monday-Thursday, 6am-8pm" may not always be true if a date that would normally be active for that range is marked as inactive.
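A sketch of how those exceptions could be layered over the weekly pattern. In the GTFS spec, `exception_type` 1 in `calendar_dates.txt` adds service on a date and 2 removes it. The `service_runs_on` helper, the dict shapes, and the sample data are assumptions made for this example.

```python
from datetime import date

# Ordered to match date.weekday(), where Monday is 0.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def service_runs_on(day, calendar_row, exceptions):
    """Decide whether a service is active on `day`.

    `calendar_row` is one calendar.txt row as a dict of strings.
    `exceptions` maps "YYYYMMDD" to exception_type for this service.
    """
    key = day.strftime("%Y%m%d")
    if exceptions.get(key) == "1":  # service explicitly added
        return True
    if exceptions.get(key) == "2":  # service explicitly removed
        return False
    in_range = calendar_row["start_date"] <= key <= calendar_row["end_date"]
    return in_range and calendar_row[WEEKDAYS[day.weekday()]] == "1"

# Hypothetical Monday-Thursday service with two exceptions.
calendar_row = {"monday": "1", "tuesday": "1", "wednesday": "1",
                "thursday": "1", "friday": "0", "saturday": "0",
                "sunday": "0", "start_date": "20170101",
                "end_date": "20170601"}
exceptions = {"20170529": "2", "20170603": "1"}

print(service_runs_on(date(2017, 5, 22), calendar_row, exceptions))  # True
print(service_runs_on(date(2017, 5, 29), calendar_row, exceptions))  # False
```

Both dates are Mondays, but the second one is removed by an exception, which is exactly the case where "Monday-Thursday" would be wrong.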

Conclusion

I don't know what the best format for the response is, but I'm inclined to follow Google Maps' implementation of Store Hours for this, where the days are separated and the hours are shown for each day, and only the services for a given week are shown. calendar_dates.txt is limited to a precision of 1 day, so we would then be able to mark those particular dates as inactive, and we wouldn't have to worry about other discrepancies across weeks.

The issue of collating information for each day across services/trips is still present with this solution, but I believe it will be easier than attempting to do a general collation for the feed's entire duration, and will avoid having to indicate exceptions that may be defined within that range.

I don't know what the implementation of this format would look like to Timetable (or to callers, for that matter). I'll be thinking about it tonight and tomorrow and will hopefully have a better idea then.

tl;dr:
I'd like to see the root concept of this RPC implemented, but I don't think the response can be as simple as has been outlined here. I think that the response will either need to be made into an array of responses, one for each component that makes up the service at the station, or a different approach (akin to Google Maps' Store Hours display) will be required.

@elliottwilliams
Member Author

However, the information is stored as a boolean for each day of the week indicating whether the service is active, so the direct result would end up looking more like [[Monday, Tuesday, Thursday], Time Range] (note how Wednesday is not part of the date range).

I think this is fine—building some sort of disjoint range (if needed) should be an exercise for the client.

I'm not sure I can speak much to the feasibility of coaxing this information out of GTFS. One thing I was thinking, though, was that I could write a call like this on top of timetable.visits_between by exploring stop times across an entire day, and calculating stop intervals (by taking an average or some other heuristic). An implementation like this wouldn't even need access to Timetable's internals to function. Is there something I'm overlooking?

The purpose of visit intervals

One thing to consider is the UI goal. A user story for this feature would read something like:

As a transit rider, I need to know the schedule of the bus I take throughout the week. This way I can plan ahead and know when I can expect to catch it. Knowing when it isn't running, like on weekends or at night, is necessary, but also knowing how frequently it stops is important: it allows me to determine how long I'd have to wait (should I catch this bus or just wait for the next one?), and to know if service slows down at certain times, like in the early mornings and late evenings.

With this in mind, I think it's important to be able to report inconsistencies in the visit interval. For CityBus, some routes (1B and 5A, at least), run at 2x service during daytime hours and go from arriving every 30 min to every 15 min.

Normalization algorithm

tl;dr Take an average of stop times, but break into multiple visit intervals after passing a certain differential.

What if we chose a small time window (window = 3min), and iterated through stop times following this algorithm:

interval = none
for each stop:
  gap = stop.time - previous stop.time
  if interval exists and (interval.avg is unset or gap within interval.avg ± window):
    fold gap into the computed average, interval.avg
    interval.end_time = stop.time
  else:
    interval = new interval: { start_time = stop.time, avg = unset, end_time = stop.time }
return all intervals

This would allow discrepancies in the stop time, but significant changes would report as a separate interval. The RPC would return time ranges associated with time intervals, like this:

timetable.service_times(station, route) -> [(days-with-service, [time-range: visit-interval])]
# Example:
timetable.service_times('BUS123', '1B') -> [([Mon], [[06:45:00, 07:50:00]: 00:10:00,
                                                     [08:00:00, 16:55:00]: 00:05:00,
                                                     [17:00:00, 22:50:00]: 00:10:00])]
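One way the windowed-average pass could look in Python, as a sketch rather than a settled design. It assumes sorted visit times in minutes after midnight, compares each gap against the running average of the current run, and lets a run that holds only one stop accept the next stop unconditionally (there is no average to compare against yet). The `group_intervals` name and the sample data are made up.

```python
def group_intervals(times, window=3):
    """Split sorted visit times into runs with a similar gap between visits.

    Returns a list of (start, end, average_gap) tuples.
    """
    runs = []
    for index, current in enumerate(times):
        if runs:
            run = runs[-1]
            gap = current - times[index - 1]
            # A run with no gaps yet has no average, so it accepts any gap.
            if run["gaps"] == 0 or abs(gap - run["avg"]) <= window:
                total = run["avg"] * run["gaps"] + gap
                run["gaps"] += 1
                run["avg"] = total / run["gaps"]
                run["end"] = current
                continue
        # First stop, or the gap changed too much: start a new run here.
        runs.append({"start": current, "end": current, "avg": 0.0, "gaps": 0})
    return [(run["start"], run["end"], run["avg"]) for run in runs]

# Hypothetical visits: every 10 minutes, then every 5 minutes.
visits = [405, 415, 425, 435, 440, 445, 450]
print(group_intervals(visits))  # [(405, 435, 10.0), (440, 450, 5.0)]
```

Small jitter inside the window is absorbed into the average, while the switch from 10 to 5 minutes opens a second run.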

For routes with irregular schedules, this algorithm would produce a bunch of insignificant (range, interval) pairs. Either in Timetable or client-side, we could "give up" after a certain number of visit-interval changes and report the route as irregular. A client would then show something like:

1B. 6:45 AM - 7:50 AM : every 10 min
    8:00 AM - 4:55 PM : every 5 min
    5:00 PM - 10:50 PM: every 10 min

23. 8:00 AM - 5:00 PM : periodically
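A client-side sketch of that "give up" rule. It assumes each route's runs arrive as (start, end, average gap) tuples in minutes after midnight; `format_clock`, `describe`, and the `max_changes` threshold of 3 are all hypothetical choices for illustration.

```python
def format_clock(minutes):
    """Format minutes after midnight as a 12-hour clock string."""
    hour, minute = divmod(minutes % (24 * 60), 60)
    suffix = "AM" if hour < 12 else "PM"
    return f"{(hour % 12) or 12}:{minute:02d} {suffix}"

def describe(runs, max_changes=3):
    """Render runs as display lines, collapsing overly irregular schedules."""
    if len(runs) > max_changes:
        first, last = format_clock(runs[0][0]), format_clock(runs[-1][1])
        return [f"{first} - {last} : periodically"]
    return [f"{format_clock(start)} - {format_clock(end)} : every {round(gap)} min"
            for start, end, gap in runs]

route_1b = [(405, 470, 10), (480, 1015, 5), (1020, 1370, 10)]
route_23 = [(480, 500, 20), (530, 560, 30), (600, 610, 10), (700, 1020, 40)]

for line in describe(route_1b):
    print(line)
print(describe(route_23)[0])  # 8:00 AM - 5:00 PM : periodically
```

Whether this threshold lives in Timetable or in the client is the open question raised above; the sketch only shows that the fallback itself is cheap.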

@faultyserver
Member

I like this solution. I think it provides the most accurate view of the frequency information while avoiding being overly verbose. In fact, I had forgotten about one of the optional GTFS files, frequencies.txt, that specifies visit intervals exactly in the format that you are responding with. Sadly, though, this is an optional table that most agencies do not seem to include.

However, even with this solution, there is still an issue with how to collate multiple trips together. Remember that different trips for the same route can belong to different services, and those services can be activated on arbitrary dates, so I feel like there still needs to be a time bound for the period in which the response is valid. If we just want it to be the day that the request is made, that should be fine, but I feel like that makes the information somewhat less useful.

I'd like to somehow merge your solution with the Store Hours solution I mentioned before. Maybe we can bound the response to a week (always Sunday-Saturday, no matter the current date), and then have the response be a 7-element list following the format you specified for each day.
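A sketch of the week-bounded response, assuming some per-day lookup already exists. `week_of` and `weekly_response` are hypothetical, and `intervals_for` stands in for whatever would compute the (time-range, visit-interval) pairs for a single date.

```python
from datetime import date, timedelta

# Indexed by date.weekday(), where Monday is 0 and Sunday is 6.
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def week_of(day):
    """Return the seven dates, Sunday through Saturday, containing `day`."""
    days_since_sunday = (day.weekday() + 1) % 7
    sunday = day - timedelta(days=days_since_sunday)
    return [sunday + timedelta(days=offset) for offset in range(7)]

def weekly_response(day, intervals_for):
    """Build a 7-element list of (day name, intervals) for the week of `day`."""
    return [(DAY_NAMES[d.weekday()], intervals_for(d)) for d in week_of(day)]

week = week_of(date(2017, 3, 22))
print(week[0], week[-1])  # 2017-03-19 2017-03-25

# Placeholder lookup that reports no service on any day.
response = weekly_response(date(2017, 3, 22), lambda d: [])
print([name for name, _ in response])
```

Because each element is tied to a concrete date, calendar_dates.txt exceptions for that week can be applied per day before the response is built.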

I think it may be worth discussing over a voice call or similar to better pass ideas back and forth on this. I think there is a little too much detail to really be able to write out ideas.

@faultyserver
Member

FWIW, if we implement that combination of solutions, I believe clients would be able to say things like every 5 minutes until 5:00pm, which I think is a really nice, brief, inline-able presentation of that information that's easier to grasp than the table of intervals that would be generated otherwise.

@faultyserver
Member

Closing in favor of propershark/proto#1.
