-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MTA buses config #97
Conversation
Quick follow-up - I believe I now understand the trip loop issue. Apparently trip ids are not unique for MTA bus trips. This causes transiter to try and reconcile the stop sequences of multiple trips running on the same route, which creates the cycles. I believe creating an id from a combination of |
Thanks for this PR! I’ve been wanting to add the NYC bus data for the longest time but the old Python version of Transiter was too inefficient so I didn’t even try. With the current version of Transiter I think it’s feasible and the background update process shouldn’t be too CPU intensive. Thanks!
I’m currently away from my computer while abroad, and would like to give this a spin before merging if that’s okay? Will be back next week. In the meantime two comments:
1. Should we rename the file us-ny-nycbus.yaml or something like that? My initial idea was that the naming convention would be country-state-system.yaml, in which case “buses” is potentially not specific enough for all of NY state. (I think “subway” is specific enough, though in retrospect nycsubway would also be reasonable.)
2. Thanks for looking into the service map failures! I’ve actually seen these with the subway feed, too! You’re right that Transiter doesn’t fail in this case: it updates all of the trip data, but keeps the old service maps. For the subway the error usually goes away after a few minutes, and because it doesn’t block the update, I never bothered to dig deeper. Do you know if this is a transient issue with the bus feed too?
Trip IDs being reused does sound like a more serious problem though which would be interesting to solve… I think the subway feed also has this problem occasionally. My recollection is that Transiter will silently ignore the second trip in this case when updating (which is not great tbh but it’s hard to think of a better way.) Maybe the service maps code doesn’t ignore subsequent trips which sounds like a bug.
By the way speaking of service maps, Transiter should be resilient to loops in trips because some systems do have such trips (the Circle Line in London, all of the Chicago “L” routes). The service maps feature probably should be more clever that is currently is…but that’s a huge scope problem for another time!
…On Wed, Mar 22, 2023, at 8:02 PM, Sam Cedarbaum wrote:
Quick follow-up - I believe I now understand the trip loop issue. Apparently trip ids are not unique for MTA bus trips. This causes transiter to try and reconcile the stop sequences of multiple trips running on the same route, which creates the cycles.
I believe creating an id from a combination of `TripId` and `VehicleId` will be unique, so this could be done as an extension in the `gtfs` library. I'll look into this next.
—
Reply to this email directly, view it on GitHub <#97 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC4T7SVY5R35J7XUDE6T2O3W5OVLXANCNFSM6AAAAAAWDD6QZM>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
|
Thanks for the initial feedback, and it would be great to have you try it on your end when you are back and have time! In the meantime, I've done some more experimentation and have some additional notes below:
Update 1
Update 2
This causes some issues when transiter tries to merge the trip timeliness across updates, most noticeably the loop issue mentioned previously. I was able to fix this by simply deleting all previous stop times on trip updates (i.e. MTA bus trips will only return current and futures stops now). Something more clever could probably be done to merge the timelines. |
I just tried this out and I hit the GTFS parsing error you fixed in jamespfennell/gtfs#1! I also noticed that when this parsing error is hit the client command ( Also trying to fix the CI failure right now, which is unrelated to your change. |
Wow this system has over 12 thousand stops! |
Using your fix in the GTFS package I was able to play around with this. It looks awesome! I sanity checked that the data from one of my local bus stops matches the MTA website and it does! To merge this in, I think we're blocked on the GTFS Go package PR you created? And also, if we want to use that, I think we'll need to change the Transiter Go code to wire in that extension as a parser option? Just like how the subway does it. Unfortunately this is a big tedious... Regarding service maps, I just realized that they're more complex than the NYC subway in a way that may need changes to the service maps feature itself (not in this PR, to be clear). For the subway, a northbound trip calls at the exact same stations as a southbound trip, just in the reverse order. For the buses this is not true at all. For example, my local bus the B57 goes down Court St on the way south and up Smith St on the way north. The Transiter generated service map isn't useful because it's basically all the northbound stops followed separately by all the southbound stops. I wonder if Transiter should support creating northbound-only and southbound-only service maps? This could be specified in the service maps config. So for the buses there wouldn't just be a "realtime" service map; there would be a "realtime_northbound" map and a "realtime_southbound" map. |
Hmmm just as an FYI, I'm getting lots of "service map cannot be topologically sorted" errors but no duplicate trip IDs. So perhaps there is another reason these maps can't be sorted, in addition to the trip ID duplication issue? I'm going to look further. |
Hey @jamespfennell - did you see my theory about unstable stop sequence numbers in my last comment? It may explain why you're still seeing that error. When I completely replace all stop sequences with each new update (i.e. make no attempt to preserve trip history), I no longer get the error. I also validated this theory with a python script that compares stop sequences between updates for the same trip id. Let me know if that makes sense based on what you're seeing on your end. EDIT: For reference, this is the hack I use to clear previous trip data in var stopTimePksToDelete []int64
for _, stopTime := range stopSequenceToStopTimePk {
stopTimePksToDelete = append(stopTimePksToDelete, stopTime.Pk)
}
if err := updateCtx.Querier.DeleteTripStopTimes(ctx, stopTimePksToDelete); err != nil {
return false, err
}
stopSequenceToStopTimePk = map[int32]db.ListTripStopTimesForUpdateRow{} |
Ah yes, sorry I did read that comment last week but had forgotten it. Do you have any opinion on the best way to fix this? Should Transiter ignore the stop sequences numbers and generate its own? This is what it does for the subway. Though I do wonder - is the MTA using stop sequences because some trips call at the same stop twice? In this case the GTFS realtime spec does require stop sequences be provided to disambiguate the stop times for the same stop. |
By the way, I'm happy to merge in any version of this work that you're happy with. Things like improving the stop sequences situation can always be done in subsequent PRs (or not at all!). From my perspective, even just the new config file (along with the |
I think there is a quick fix for the service maps issue alone. When calculating the service map, we could skip all stops times that are in the past. It's a one line fix to this one SQL query - it just needs an additional This wouldn't fix the trip data itself, and as in your comment above, we would still see duplicate past stops. |
Realtime service maps should only use future stop times. Using past stop times can result in inaccurate service maps when (for example) a route is finishing for the night. After the last train passes a station, the station should disappear from the map because there is no longer service there. With the current code, the station remains in the map until the trip is finished entirely. Previously this was the behavior of Transiter. However in the large refactor in be23046 I introduced this bug. Discovered by @cedarbaum in #97
Thanks for the service maps fix! I agree that this can be merged safely with just that (after bumping 1.) So going forward, I can cleanup this PR and potentially also work on (1) or (2) if you think this an OK approach. Let me know your thoughts! |
Awesome, that sounds great! I personally think (2) is the way to go because I think it will end up making the MTA buses work correctly with trip history. The implementation shouldn't be too bad either? After adding the If you wanted to implement that that would be awesome! Could do it in this PR, or a follow up PR. By the way, I think the downside of the new configuration option only really applies if you're trying to link the realtime data with the static data. My understanding is that the stop sequences give you a way of linking a stop time prediction in the realtime data to the associated scheduled time in GTFS static. For the MTA buses this realtime-static linking will presumably always be broken anyway because of the bug you discovered. But it's definitely a limitation of the new option that we should document. (To be clear Transiter currently doesn't attempt to do this realtime-static linking, but it's something I want to do in the future because it's needed to get some transit systems working, like the SF BART.) Another random interesting thing (sorry for the brain dump) is that the MTA bus transit system is really big, and when I play with it locally I'm seeing trip feed updates take 7 or 8 seconds. Having this system in the repo will be great for performance benchmarking. I just took a pprof profile out of curiosity, and am finding that a lot of the slowness is because Transiter updates trip stop times with individual SQL queries, one at a time. This is slow just because for each SQL query, a network call has to be made to Postgres. Maybe in the future there is some optimization that could be done based on this kind of profiling. |
Interesting! I was sort of thinking about this a bit when I was looking at that code, but figured Postgres would do some magic to optimize these operations within the same transaction. Do you think a multi-row |
Yeah I was thinking a multi-row upsert (or even just update because inserts are relatively rare) would befaster. The main motivation would be to avoid the overhead of making a SQL call to Postgres for each stop time update update. Not sure how hard it is though! I think Postgres supports it, but not sure about |
I created a new issue for the optimizing the feed updater, just so that I don't derail your PR :) #100 |
04b6c67
to
7ad7276
Compare
Awesome, thank you! |
About
This change adds static and realtime trip data for MTA buses.
Issue with trip loops
In my testing so far, this appears to work mostly out of the box except for one error that sometime occurs for routes:
I looked into this a bit, and it seems there are loops being created in some GTFS trips. For example, on the M96 route, I observed this:
My guess would be this is a bug with the feed, but it's hard to say for sure without a repro outside of transiter. I did try writing a small Python script to poll the feed and look for loops within trips, but I haven't yet seen the same behavior there (this could just be an issue of timing though).
It does seem transiter is resilient to this sort of error, though I am curious if you think it's something we need to understand better before using this config. Also, I am curious if you have any thoughts about where the bug most likely is and what could be done to further investigate.
Other notes:
As always, I appreciate any time you're able to spend looking over this and please let me know if you have any feedback when you have a chance.