Add comprehensive date checking in setters of regional_task.py
#266
Comments
Haven’t yet tested this myself, but this can definitely be the case. I’ll check this out now, should be an easy fix
Hm, this is a bit tricky: supplying multiple GTFS files can serve to cover more than one transport system, but it can also be used to cover a longer period of time, with some times covered by one file and some by others. So there should be different warnings for (1) one of the files does not cover the |
Since we're opening this up again: let's also add (1) a check whether the origin and destination geometry is within the geographic extent of the GTFS dataset, and (2) a warning if a transit mode is selected, but no GTFS dataset is loaded (we might have this check already, but I'm typing this on the phone). Finally, (3), let's coordinate this with a possible super-class that combines |
FYI I was referring to this line of code. |
Yes, I figured, and just this fix is literally 2 minutes of work. Edit: Thinking about it, also the other things are not such a big effort, let’s keep them together |
I assume we would use |
regional_task.py
regional_task.py
Also given that #265 is a bit different (and has a PR open to fix it) I think we should keep this issue focused on date validation only. |
Also, since you’re the GTFS expert: does stops.txt have geometries from which we can create our own convex hull, or should we simply use the feed_info.txt extent? I’ll link these comments into #265 - definitely a better fit there |
From my experience Suppose a GTFS covers a single line service or just does subways for example. We wouldn't get much of a convex hull to work with. I'm doing a similar thing for another project and I'm wondering if it's better to just leave that type of check to the user input. In part because GTFS doesn't need to span the analysis area (people can walk/bike/drive long distances to stops). What is probably more useful is checking the extent of the PBF file, but perhaps R5 does that already (haven't tried it). |
That’s something worth checking |
True, I did not consider that cities with just one line even exist. If the different lines are split into several GTFS feeds, then a convex hull of all stops of all feeds would make sense, though |
Even so, I can see use cases where the area created by the convex hull does not span the entire analysis area. If it's a reasonably quick check I think it's fine to do it (are we getting into "verbosity" setting territory?) but it should definitely only be a warning with a clear explanation that it might not be an unintended thing. |
Absolutely, a warning should suffice, and maybe even only when running in verbose mode. |
Opened a new issue #271 concerning the geometry/extent checking |
From our call:
|
How do we feel about this being in release 0.0.6? |
Now that I'm working with batches of regional GTFS feeds, I have noticed that the warning is thrown even when my own validation using But that got me thinking further: It is possible that an analysis using a set of GTFS feeds might run outside of one of them (for example, a commuter rail GTFS feed that runs only at peak period, while the analysis time is set for 1400). This might be totally fine (especially for automated tasks). It is also possible that an analysis window is half-covered by a GTFS feed. In any case, I'm not sure this input date coverage check should be baked-in to the Suggested behaviour here is that if the mode is set to
|
I would not introduce an additional Should the exception be thrown independently of whether or not we’re in I remember you mentioning that the information in |
Philosophical, indeed! I think warning if verbose feedback is requested is alright, and failing if no file claims (!) to cover the |
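For illustration, the behaviour discussed here (warn only when verbose feedback is requested, fail when no feed claims coverage) could be sketched roughly as follows. The names `check_date_coverage`, `feed_ranges` and `verbose` are made up for this sketch and are not r5py API; the date ranges are assumed to have been extracted from the feeds beforehand.

```python
import datetime
import warnings


def check_date_coverage(departure, feed_ranges, verbose=False):
    """Return the names of feeds whose claimed range excludes the departure date.

    feed_ranges is a hypothetical mapping: feed name -> (start_date, end_date).
    """
    day = departure.date()
    outside = sorted(
        name
        for name, (start, end) in feed_ranges.items()
        if not (start <= day <= end)
    )
    # Fail only if not a single feed claims to cover the date
    if feed_ranges and len(outside) == len(feed_ranges):
        raise ValueError(f"No GTFS feed claims to cover {day}")
    # Partial coverage can be intentional, so only warn when asked to be verbose
    if outside and verbose:
        warnings.warn(
            f"Departure date {day} is outside the range of: {', '.join(outside)}",
            RuntimeWarning,
        )
    return outside
```

Naming the feeds in the message also addresses the request to say which feed in the list triggers the warning.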
Reminding myself here that a Perhaps the wording of the warning here should be "One or more of the GTFS data sets is outside the time range covered by currently loaded data sets" |
@christophfink do we still want to provide a check for the dates? If so, we need to consider the following (as I do for GTFS-lite):
Note: this does not check if there are any trips on that date. For example, a Also note: I don't know how extensive the checking of this data is on the R5 end. This is a pretty serious can of worms that I've tackled over at GTFS-lite with As I've expressed before, my opinion is to leave the data wrangling and validation up to the user. We can provide guides and how-tos if we are interested, but ultimately there are so many permutations of what might be going wrong we can't cover them all in R5py without basically recreating things like the MobilityData Validator or GTFS-Lite package. |
I see two viable options really:
|
Loading in a file can be quite slow (5-6 seconds) if the feed is large - but perhaps the tradeoff is worth it. Could also be something we allow the user to specify? My vote is to put those few lines of code outside of the regional task and have people check their dates ahead of time, much like other data preparation steps. Happy to make a PR to include a validation example and remove the warning if this is the way we go. |
|
TODO: Remove date checking warning and add date checking code to examples. |
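A minimal sketch of what such a date checking example could look like, using only the standard library rather than GTFS-lite or r5py. It assumes the feed is a zip archive with `calendar.txt` and/or `calendar_dates.txt` at its root and dates in the GTFS `YYYYMMDD` format; the function names `gtfs_service_range` and `covers` are invented for this sketch. As noted above, it only looks at the declared service dates and does not check whether any trip actually runs on the day in question.

```python
import csv
import datetime
import io
import zipfile


def _parse(yyyymmdd):
    """Parse a GTFS date string (YYYYMMDD) into a datetime.date."""
    return datetime.datetime.strptime(yyyymmdd.strip(), "%Y%m%d").date()


def _rows(feed, member):
    """Yield the rows of one CSV table inside an open GTFS zip archive."""
    with feed.open(member) as f:
        # utf-8-sig tolerates the byte order mark some feeds start with
        yield from csv.DictReader(io.TextIOWrapper(f, encoding="utf-8-sig"))


def gtfs_service_range(gtfs_path):
    """Return (first, last) declared service date of a feed, or None."""
    dates = []
    with zipfile.ZipFile(gtfs_path) as feed:
        members = set(feed.namelist())
        if "calendar.txt" in members:
            for row in _rows(feed, "calendar.txt"):
                dates += [_parse(row["start_date"]), _parse(row["end_date"])]
        if "calendar_dates.txt" in members:
            for row in _rows(feed, "calendar_dates.txt"):
                if row["exception_type"].strip() == "1":  # 1 = service added
                    dates.append(_parse(row["date"]))
    return (min(dates), max(dates)) if dates else None


def covers(gtfs_path, departure):
    """True if the departure datetime falls within the declared range."""
    span = gtfs_service_range(gtfs_path)
    return span is not None and span[0] <= departure.date() <= span[1]
```

Users could run `covers()` over their list of feeds before constructing the regional task and decide for themselves whether a feed outside the range is a problem.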
At the moment, it looks like the "date coverage" check only checks if TransitMode.TRANSIT is in the departure modes, but not whether any subset of these modes is present. Not sure if we have a test set up for this (but I suspect it would fail if the user passed a set of all modes individually).
Further, the RuntimeWarning that gets thrown if a date is off should ideally specify which feed in the GTFS list is causing the issue (I'm currently trying to debug whether this is throwing a warning unnecessarily).
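To illustrate the first point, one way to catch individually passed modes would be a set intersection instead of a membership test for a single member. This is only a sketch: the `Mode` enum and its members are a stand-in invented here, not the actual enum in regional_task.py, and `needs_date_check` is a hypothetical helper name.

```python
import enum


class Mode(enum.Enum):
    """Hypothetical stand-in for the transport mode enum."""

    WALK = "WALK"
    BICYCLE = "BICYCLE"
    TRANSIT = "TRANSIT"
    BUS = "BUS"
    RAIL = "RAIL"
    SUBWAY = "SUBWAY"
    TRAM = "TRAM"


# Every mode that relies on GTFS schedules, including the catch-all TRANSIT
PUBLIC_TRANSPORT_MODES = frozenset(
    {Mode.TRANSIT, Mode.BUS, Mode.RAIL, Mode.SUBWAY, Mode.TRAM}
)


def needs_date_check(modes):
    """True if at least one of the requested modes depends on GTFS data."""
    return not PUBLIC_TRANSPORT_MODES.isdisjoint(modes)
```

With this, `needs_date_check([Mode.BUS, Mode.WALK])` triggers the check even though `Mode.TRANSIT` itself was never passed.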