Add feed_id to feed_info.txt #62
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
I signed it |
CLAs look good, thanks! |
I'm conceptually a +1 for this, given that it's already being used by multiple producers/consumers. However, to repeat here what I mentioned on Slack, I'd be interested in looking at a possible Then, assuming a critical mass wants to adopt |
@trilliumtransit currently publishes feed_info.feed_id (see https://transitfeeds.com/p/arcata-mad-river-transit-system/148/20170228/file/feed_info.txt). We would also be ready/happy to add onestop_id. |
I am definitely in favor of putting feed_id in the GTFS specification. Along with Trillium, Conveyal has been encouraging the adoption of feed_id by the GTFS producers we work with. For example, prominent GTFS producers TriMet and OpenOV (full Netherlands) have been including feed_ids in their feed_info.txt for quite a while.
@barbeau I appreciate Mapzen's work on Transitland, and their effort on creating a system that uniquely identifies transit objects worldwide. However, I think the design of Onestop is contingent enough on their needs and applications that it should not replace the existing feed_id field. The feed_id is intended to be short and human-readable. It does not seek to have a geographic component, it just expresses which feed this is, giving the context necessary to situate it in a sequence of feed_versions.
We've discussed feed_ids a few times with people at Transit.land, and a link already exists between the two concepts: the Onestop ID for feeds that declare a feed_id is f-GEOHASH-feed_id.
If we look at all the GTFS feeds produced around the world as a large decentralized database, the inclusion of the OneStop ID (which is definitively assigned by Transit.land) in the feed itself is a form of denormalization. It allows referential integrity to be broken, in the sense that a feed can declare a OneStop ID different than the one it is assigned by Transit.land and known to Transit.land users.
This is not a theoretical concern. As schemes for mapping two-dimensional vectors (geographic coordinates) onto a one-dimensional line (the space of character strings), geohashes naturally and necessarily have discontinuities. That is to say, while points close to each other are likely to have the same geohash, there are also many points close to one another that have completely different geohashes. The geohash of a feed is the geohash of a point derived from the feed contents (the centroid of a rectangle containing all service).
This means that the addition of a single new stop or trip could change the geohash of a feed. Of course the people at Transit.land have thought about this and must have policies for ensuring that the IDs remain consistent over time, but that still means that these IDs must be assigned definitively by a single authority. To me, this seems more fragile and subject to obsolescence than a distributed, cooperative model. The world of GTFS producers is small enough that it should not be difficult to avoid feed_id collisions, especially since multiple global indexes of GTFS data will allow consulting lists of existing IDs. Of course feed producers are free to coordinate with Transit.land, Google, or anyone else that maintains a unique feed ID system to decide on an ID. But at least in the case of Onestop, that seems to lead to IDs that are defined recursively. The feed's feed_info.txt supplies the alphabetical identifier (the feed_id) that serves as the potentially human-readable core of the OneStop ID. I'm not at all discouraging use of Onestop IDs, I just don't think they should be considered a replacement for feed_ids. I'd welcome comments from people at Transit.land or who have used/implemented either system. |
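To make the geohash concern concrete, here is a minimal Python sketch of the effect described above. It is not Transit.land's implementation: the `geohash` function is the standard public geohash encoding, while `bbox_centroid`, `onestop_feed_id`, the stop coordinates, the precision of 5, and the feed_id `examplefeed` are all hypothetical and chosen only for illustration. The ID is assembled following the f-GEOHASH-feed_id pattern mentioned in the comment.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lat, lon, precision=5):
    """Standard geohash: interleave longitude and latitude bisection bits."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, value, bits = [], 0, 0
    use_lon = True  # the first bit always comes from the longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid
        else:
            value = value * 2
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def bbox_centroid(stops):
    """Center of the rectangle containing all (lat, lon) stops."""
    lats = [lat for lat, _ in stops]
    lons = [lon for _, lon in stops]
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


def onestop_feed_id(stops, feed_id):
    """Hypothetical helper: build an ID shaped like f-GEOHASH-feed_id."""
    lat, lon = bbox_centroid(stops)
    return f"f-{geohash(lat, lon)}-{feed_id}"


# Made-up stops just west of the prime meridian, then one extra stop to the east.
stops_before = [(0.01, -0.03), (0.02, -0.01)]
stops_after = stops_before + [(0.02, 0.06)]

id_before = onestop_feed_id(stops_before, "examplefeed")
id_after = onestop_feed_id(stops_after, "examplefeed")
print(id_before)
print(id_after)
```

In this made-up case the single added stop moves the bounding-box centroid across the prime meridian, so the geohash part of the derived ID changes from its first character onward while the feed_id part stays the same.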
I agree with @abyrd re the decentralized nature of GTFS and the need for simply a locally unique feed ID. I'd push towards merging the |
- Changes feed_id to a String. Matches proposed GTFS extension: google/transit#62 - Add strategy to create stop_headsign from a spreadsheet - Add strategy to create new entrances and pathways from MTA-provided spreadsheets - Additional fields in Pathway to match proposal here: https://docs.google.com/document/d/1qJOTe4m_a4dcJnvXYt4smYj4QQ1ejZ8CvLBYzDM5IyM/edit?ts=5a39452a
- Changes feed_id to a String. Matches proposed GTFS extension: google/transit#62 - Add strategy to create stop_headsign from a spreadsheet - Add strategy to create new entrances and pathways from MTA-provided spreadsheets
Was this idea dropped in the end? Is there a replacement? |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This pull request has been closed due to inactivity. Pull requests can always be reopened after they have been closed. See the Specification Amendment Process. |
This PR adds an optional feed_id to feed_info.txt.
See a long discussion of feed_ids here: https://groups.google.com/forum/#!msg/gtfs-changes/zVjEoNIPr_Y/IZMDGUW0DAAJ
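As a rough illustration of what consuming the proposed field could look like, the following Python sketch reads feed_info.txt and returns feed_id when present. The sample file contents, the publisher values, the feed_id `examplefeed`, and the helper name `read_feed_id` are all hypothetical; only the column name feed_id comes from this PR, and the other columns are existing feed_info.txt fields. Because the field is optional, the sketch treats a missing column or an empty value as "no feed_id".

```python
import csv
import io
from typing import Optional

# Hypothetical feed_info.txt contents; every value here is made up.
WITH_FEED_ID = (
    "feed_id,feed_publisher_name,feed_publisher_url,feed_lang,feed_version\n"
    "examplefeed,Example Transit,https://example.com,en,2017-02-28\n"
)
WITHOUT_FEED_ID = (
    "feed_publisher_name,feed_publisher_url,feed_lang\n"
    "Example Transit,https://example.com,en\n"
)


def read_feed_id(feed_info_text: str) -> Optional[str]:
    """Return feed_id from feed_info.txt text, or None if absent or blank."""
    reader = csv.DictReader(io.StringIO(feed_info_text))
    for row in reader:
        value = (row.get("feed_id") or "").strip()
        return value or None  # only the first data row is inspected
    return None  # header only, or empty file


print(read_feed_id(WITH_FEED_ID))
print(read_feed_id(WITHOUT_FEED_ID))
```

A consumer written this way keeps working with feeds that never adopt the field, which fits the optional status proposed here.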
This is specified as vaguely as possible by design. Other discussions have recommended using NTD IDs or Transitland OneStop IDs (example). The downside of NTD IDs is that feeds can contain multiple agencies. I recommend that the spec remain silent on the subject of some kind of global GTFS registry, both because GTFS is decentralized by nature and because some feeds already supply feed_id. More important than global uniqueness is local uniqueness. Quoting Andrew Byrd from the above thread: