adding my own shitty gtfs csv parser + schedule files #10
Conversation
- parser reduces 50MB or so of CSVs down to 600kB worth of train-related json files
- this was as painful as I thought it'd be
- I'll need to re-run (and probably fix) this task as the gtfs data changes
// of course they couldn't use the same station names
// as in the realtime API
var station_name_mapping = {
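For a sense of how a mapping like that might get used while the parser writes its JSON — the entries and the `normalizeStationName` helper below are hypothetical, not code from this branch:

```js
// hypothetical example only — the real mapping entries live in the file above
var station_name_mapping = {
  'FIVE POINTS STATION': 'FIVE POINTS', // made-up GTFS-name -> realtime-API-name pairs
  'AIRPORT STATION': 'AIRPORT',
};

// fall back to the GTFS name when there's no override
function normalizeStationName(gtfsName) {
  var key = gtfsName.trim().toUpperCase();
  return station_name_mapping[key] || key;
}

console.log(normalizeStationName('Airport Station')); // "AIRPORT"
```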
😏
In my testing over the past couple of days, it turns out the schedule data is kind of useless in predicting when a train will depart from one of the terminal stations. It might still be better than nothing, but I'm leaning towards only using it on terminal stations like Doraville. Maybe 1 out of every 4 or 5 predictions is accurate enough, by my guesstimation, to be useful.

When things go well, you see these gray numbers that indicate schedule data. Then, as the countdown nears zero for Doraville or Airport, you'd see a train suddenly appear:

More often than not, though, either the scheduled time will pass with no train leaving (and we just roll over to predicting the next scheduled arrival, which is depressing), or a train departs too early and we end up in a confusing state. Below, the schedule still had 6min and 3min to go for the terminal stations, but a train left early, so we're stuck in some kind of useless state that I'm sure will confuse users:
@itsmarta definitely has the realtime data for those end stations somewhere, since it's visible on the boards (at least at Chamblee). Have you tried contacting them to see if they can get that stuff reflected in the API? http://www.itsmarta.com/developers/contact-us.aspx

I see you started this thread - you may want to push harder on it:

Also, the page at http://itsmarta.com/developers/default.aspx gives the email address martadevreq@itsmarta.com
Back when I was gathering feedback from reddit, someone did PM me with an undocumented API route that is station-specific (think …). However, I'd have to hit that endpoint once for each station we care about, and then repeat that every 10s or so whenever someone is looking at the website. Not ideal... but I could start doing it every 10s for the terminating stations, at least...

P.S. after examining the undocumented endpoint's …
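If that undocumented route ever got used, the polling could look roughly like this — the endpoint URL, the station list, and the response shape are all placeholders (and it assumes a runtime with a global `fetch`), since the real route isn't public:

```js
// rough sketch only: the endpoint and response shape are made up,
// since the real station-specific route is undocumented
const TERMINAL_STATIONS = ['DORAVILLE', 'AIRPORT']; // plus the other end-of-line stations
const STATION_ENDPOINT = 'https://example.invalid/arrivals?station='; // placeholder URL

async function pollTerminalStations() {
  for (const station of TERMINAL_STATIONS) {
    try {
      const res = await fetch(STATION_ENDPOINT + encodeURIComponent(station));
      const arrivals = await res.json();
      // merge these into whatever structure the realtime API data uses today
      console.log(station, arrivals.length, 'arrivals');
    } catch (err) {
      console.error('poll failed for', station, err.message);
    }
  }
}

// only worth running while someone actually has the site open
setInterval(pollTerminalStations, 10 * 1000);
```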
- mixing in schedule data alongside realtime data, whenever a station only has realtime predictions for a single direction (hopefully only near-terminating stations)
- this should sidestep the cases close to after-hours when there are many stations with no predictions
- marked as `scheduled: true` to differentiate them
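Roughly the merge I have in mind, as a sketch (the prediction shape and field names here are illustrative, not the exact code in this branch):

```js
// sketch: when a station is missing realtime predictions for a direction,
// backfill from the schedule and flag it so the UI can render it differently
function mergeSchedule(realtime, scheduled, station) {
  const directions = ['N', 'S', 'E', 'W'];
  const merged = realtime.slice();

  for (const dir of directions) {
    const hasRealtime = realtime.some(
      (p) => p.station === station && p.direction === dir
    );
    if (hasRealtime) continue;

    const next = scheduled.find(
      (s) => s.station === station && s.direction === dir
    );
    if (next) {
      merged.push(Object.assign({}, next, { scheduled: true }));
    }
  }
  return merged;
}
```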
I'll deploy this branch to marta.io when I get a free moment later today.
- also solve case where now = 11:55 and arrival = past midnight
- first time wrapping my head around timestamps that go past 24:00:00 ಠ_ಠ
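For the >24:00:00 weirdness, a sketch of the kind of conversion involved (assuming schedule times stay as `HH:MM:SS` strings, as in stop_times.txt):

```js
// GTFS stop_times can say "24:15:00" or "25:03:00", meaning "after midnight,
// still part of the previous service day". Convert to seconds and compare
// against "now" measured the same way, then fix up the wraparound.
function gtfsTimeToSeconds(hms) {
  const [h, m, s] = hms.split(':').map(Number);
  return h * 3600 + m * 60 + s; // h can be 24+, and that's fine
}

function secondsUntilArrival(now, arrivalHms) {
  const nowSecs = now.getHours() * 3600 + now.getMinutes() * 60 + now.getSeconds();
  let diff = gtfsTimeToSeconds(arrivalHms) - nowSecs;
  // e.g. now = 23:55 and arrival = "00:10:00" on the next service day
  if (diff < -12 * 3600) diff += 24 * 3600;
  // e.g. now = 00:05 and arrival = "24:10:00" on yesterday's service day
  if (diff > 12 * 3600) diff -= 24 * 3600;
  return diff;
}

console.log(secondsUntilArrival(new Date(2015, 0, 1, 23, 55), '24:10:00')); // 900
```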
- task time jumped from ~0.8s to 5+ seconds. still doable.
- due to parsing every line of the CSV now, instead of only looking at the start of the vast majority of lines
- could add stream listeners on both, and send only valid lines to the CSV parser... maybe later
- verified this by running the task again and seeing if my JSON changed
- also deleted my json and made sure the task added it back without anything showing up in `git diff`
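The "stream listener" idea, sketched with Node built-ins only — the file path and the trip-id prefixes are stand-ins for whatever the real filter would key on:

```js
// sketch: read stop_times.txt line by line and only hand lines that look
// train-related to the (slower) CSV parser, instead of parsing every row
const fs = require('fs');
const readline = require('readline');

const RAIL_TRIP_PREFIXES = ['7142', '7143']; // placeholder: whatever identifies rail trip_ids

async function filterRailLines(path) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity,
  });

  const keep = [];
  let header = null;
  for await (const line of rl) {
    if (header === null) {
      header = line; // always keep the CSV header
      keep.push(line);
    } else if (RAIL_TRIP_PREFIXES.some((p) => line.startsWith(p))) {
      keep.push(line); // only these go on to the real CSV parser
    }
  }
  return keep;
}

filterRailLines('gtfs/stop_times.txt').then((lines) =>
  console.log(`kept ${lines.length} of the original lines`)
);
```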
- might need a changelog if these features keep up
Well, it's been a year and marta.io still suffers from the same API edge cases for terminating stations (and the stations near them). Here we have a calendar file that lays out which service # to use on each day of the week, and then a file for each service. This schedule data is intended for use whenever a train isn't in the API for a given direction. I plan to visually highlight schedule data vs. realtime data, so people know when a prediction isn't based on realtime data.
The GTFS spec also allows exceptions for holidays/etc., but none exist in MARTA's data right now; I'm going to add those (and the logic for them) when they appear.
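To make the calendar/service-file split concrete, a minimal sketch of the lookup — the file names, JSON shapes, and weekday keys here are assumptions about the generated files, not their actual format:

```js
// sketch: pick today's service id from the generated calendar file,
// then load that service's schedule file
const fs = require('fs');

// assumed shape: { "monday": "19", ..., "sunday": "21" }
const calendar = JSON.parse(fs.readFileSync('schedules/calendar.json', 'utf8'));

function scheduleForDate(date) {
  const days = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'];
  // holiday exceptions (GTFS calendar_dates-style) would be checked here
  // once MARTA's feed actually includes any
  const serviceId = calendar[days[date.getDay()]];
  return JSON.parse(fs.readFileSync(`schedules/service-${serviceId}.json`, 'utf8'));
}

const todaysSchedule = scheduleForDate(new Date());
console.log(Object.keys(todaysSchedule).length, "stations in today's schedule");
```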
An alternative I considered was scraping MARTA's website for each station's schedule. I chose GTFS because it's a standard spec, and because I got scared while looking at the source code on MARTA's website.