Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPC calls for visits within a time interval #1

Closed
elliottwilliams opened this issue Jan 15, 2017 · 7 comments
Closed

RPC calls for visits within a time interval #1

elliottwilliams opened this issue Jan 15, 2017 · 7 comments

Comments

@elliottwilliams
Copy link
Member

elliottwilliams commented Jan 15, 2017

As Proper starts to consume Timetable data, I'm realizing that a more useful way to query arrival data is by asking for all the arrivals within a time interval. Proper's station views (by default) show all arrivals for the next hour; this RPC would suit that use case nicely. Here's how I envision the RPC looking:

timetable.visits_between(station, route, start_time, end_time, n) -> [(ETA, ETD)]

start_time and end_time are standard (%Y%m%d %H:%M:%S) timestamps. n allows a count limit to be specified, but it can be omitted.

@faultyserver
Copy link
Member

This is definitely possible (and fairly easy) with the current system. In fact I've already implemented it.

Right now, there exists a map with a complex key of (Route, Trip, Station, StopTime). The keys are ordered lexicographically by each of the key parts successively. That is, all Stops on a Route are contiguous in memory, etc. Asking for all StopTimes in any timespan is just making two searches into the tree and iterating between them. In fact, visits_after and visits_before are just special cases of visits_between where one of the bounding points is the first/last StopTime at a Station.

The question is: does this hierarchy make sense? Or should it be rearranged in a way that's more efficient for the queries that will be made? For example, having the Station be the top-level key-part and the Route be the last-level key-part would make queries for all arrivals at a Station (regardless of Route) much more efficient.

@elliottwilliams
Copy link
Member Author

The two views in 1.0 that feed off of Timetable data will be station views, so inverting your map to have the top-level key be Station would make sense to me.

Perhaps "the right thing" to do here is to have a map keyed by route-trip-station, and another by station-route-trip, and pick the more efficient for a given RPC call, but station-route-trip is the most useful to Proper for now.

@elliottwilliams
Copy link
Member Author

And if you're cool with the visits_between call I specified, that's perfect for me. I'll add it to the wiki and start using it in Proper.

@elliottwilliams
Copy link
Member Author

I made two minor changes to the call I added to the wiki:

  • Clarified that visits returned must have a departure time t such that begin_time <= t < end_time. This is so that the client can get the next interval of visits by searching starting with end_time.
  • Removed the limit n parameter. This makes doing interval-based "pagination" the way I specified above safer by guaranteeing that visits won't be missed.

@faultyserver
Copy link
Member

All that makes sense to me.

Having two maps wouldn't be too bad, since each one can just store pointers (or rather, references). The only changes will be adding a new key class for the different order. Will try this out tonight.

@faultyserver
Copy link
Member

faultyserver commented Jan 29, 2017

Update on implementation: there's still only one map at this point, but it's key is (Station, StopTime, Route, Trip). This means that the default order of results for a query will be time-ordered, then route-ordered, and finally trip-ordered (though trip ordering is mostly negligible).

Update on performance: I'm currently working with Citybus' most recent archive, which has more than 250,000 StopTimes shared across 800 Stops and just under 9,000 Trips. The total memory usage with this single map is 213MB. My best guess at the implications of adding a second map will increase that memory usage to ~300MB. Processing time (parsing, interpolating stop times, and creating the map) before the system is available is ~4 seconds.

Overall, it's actually a lot better than I was anticipating based on the dataset, but could probably be improved farther with better usage of references and move-constructors, but that's a future concern.

faultyserver added a commit that referenced this issue Jan 29, 2017
…on, getting ready for filter_iterator.

CSV Parsing can now be done all at once (as before, but now called `all`) or as a stream (via `initialize` and `next`).

Also, `visit_list_key` and its usages have also been re-arranged to be `(Station, StopTime, Route, Trip)` to better accomodate its primary use case (See #1). For uses better managed by a different arrangement, another key/map will likely be made.

In general, Timetable is now ready to have a `filter_iterator` that takes a custom predicate and only returns results that pass the predicate condition. The primary use of this will be to filter out visits whose service is not being offered in a given timeframe (a la `Timetable::Calendar`), or to return only visits from a given set of routes.
@faultyserver
Copy link
Member

This has been referenced in a few other issues as well (#3 in particular), but this RPC has both been added to the wiki and implemented in Timetable, so I'm going to close this issue as completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants