New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forbid subfolders in GTFS files #379
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
I have accepted the CLA. |
We've seen files within folders being a problem several times at RideAmigos, such that we have unilaterally pre-filter all zip files to flatten them before continuing to process then. I'm not sure it would be appropriate to add to the spec, but some zip tools like the
|
@thbar According to the contribution process, you need to send a message to the mailing list announcing this PR: https://github.com/google/transit/blob/master/gtfs/CHANGES.md#specification-amendment-process (I support the proposal, btw). |
Co-authored-by: Leonard Ehrenfried <mail@leonard.io>
I'm in no position to make decisions about this, but adding references to specific software tools and command line invocations feels a little "un-spec-ish". |
True. |
@leonardehrenfried thank you, I have posted over there (probably pending moderation). |
It's great to see a lot of support for this change! I have one question about the must language in this PR. One of the principles of GTFS is that changes should be backwards compatible and we want to avoid changes that would make existing feeds invalid. Any thoughts on changing this proposed must to should? |
In this case, it feels more like a clarification. The first line of the GTFS website now says: "A GTFS feed, which contains static transit information, is composed of a number of text (.txt) files that are contained in a single ZIP file." Elsewhere it already says "All dataset files must be zipped together". It's already reasonable to interpret those as being at the top level. In terms of backcompat, I expect a survey of existing feeds would find that something well above 95% or compliant, and I would not be surprised if that number is > 99%. Adding the term /should/ doesn't help much, because it applies that consumers must still handle the sub-folder case. Because the vast-majority of feeds are already compatible and because there's an easy fix (zip --skip-junk), I believe in this case the term "must" still fulfills the spirit of backwards compatibility . |
I agree with @markstos. If we use "should" then we don't need this PR since you "should" already do that. We want to make those feeds invalid and have some unambiguous language in the spec to point producers to. |
According to what we see ourselves on the field, less than 1% of the GTFS are in that situation (with subfolders). As a side note, we (Mecatran) requires explicitly to enable a "lenient" mode on our tools to load zip with subfolders, we do not handle them by default. Also, interesting questions can arise when you have multiple GTFS inside a single zip in different subfolders (saw that once). |
@thbar I think we have had enough time to discuss this. Do you think that we can start the voting process? |
To @laurentg's point: If we allow subfolders and expect consumers to "flatten" the files, the subfolders could contain It's also important to note that while the headline says "forbid subfolders", that's not what the spec wording actually says-- it says that the data set files must be at the top level. So for example, macOS has been known to leave hidden metadata files in "dot" files or folders that users may not even realize exist. According to the spec proposal, sub-folders will be tolerated, but ignored as data file locations. This is a good approach. |
I don't think that @laurentg was arguing that allowing subfolders is a good idea. He just pointed out the can of worms when you do allow them and you've re-enforced this point. |
@leonardehrenfried sure! I'm not well aware of who does that and how, but definitely yes. |
Hi @thbar! You as the PR advocate can open a vote based the governance process outlined in CHANGES.md. The next steps would be:
Let us know if you have any other questions about how to proceed. |
Thanks @emmambd! Following the template: This pull request has been open for more than one week, so I am calling for a vote. This is to clarify the layout of files in a GTFS archive and to ensure files will be at the top level, to simplify data ingestion. Please vote with a +1 (for) or -1 (against) in the comments. Voting ends on 2023-07-13 at 23:59:59 UTC. Thanks! |
+1 Transit |
+1 Opentripplanner |
+1 OpenGeo |
+1 from @interline-io Our team has spent a good deal of time adding support to www.transit.land for nested subfolders, nested zips, and strange combinations of both nested subfolders and nested zips. Not expecting this situation to improve overnight, but it will still be useful to have the spec explicitly forbid these situations. |
+1 from Trillium, though we recommend simplifying this text:
to: "The files must reside at the root level directly, not in a subfolder." |
+1 Arcadis IBI Group |
+1 Trafiklab/Samtrafiken |
+1 RideAmigos |
Since I got an email from MobilityData indicating the vote is over and I can adapt slightly based on feedback, I'll follow the simplification proposed by @westontrillium above, and just keep "The files must reside at the root level directly, not in a subfolder." |
PR wording has been simplified for brevity. I'll reply to the MobilityData email I got for merging. |
The vote has passed! PR has been merged! Congratulations. |
* Forbid subfolders in GTFS files See google#377 * Update gtfs/spec/en/reference.md Co-authored-by: Leonard Ehrenfried <mail@leonard.io> * Simplify wording based on feedback --------- Co-authored-by: Leonard Ehrenfried <mail@leonard.io>
See #377