Add feed_id to feed_info.txt #62

Closed
wants to merge 1 commit into from

Conversation

sdjacobs

This PR adds an optional feed_id to feed_info.txt.
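For consumers, reading the proposed column could look roughly like the sketch below. It assumes feed_info.txt is an ordinary GTFS CSV file with a header row; the helper name `read_feed_id` and the sample values are hypothetical and not part of this PR.

```python
import csv
import io
from typing import Optional


def read_feed_id(feed_info_text: str) -> Optional[str]:
    """Return feed_id from feed_info.txt contents, or None if absent or empty.

    Hypothetical helper: the column is optional in this proposal, so a
    missing column and an empty value are both treated as "no feed_id".
    """
    reader = csv.DictReader(io.StringIO(feed_info_text))
    for row in reader:
        value = (row.get("feed_id") or "").strip()
        return value or None  # only the first data row is consulted
    return None


# Sample contents are made up for illustration.
with_id = (
    "feed_publisher_name,feed_publisher_url,feed_lang,feed_id\n"
    "Example Transit,https://example.com,en,example-transit\n"
)
without_id = (
    "feed_publisher_name,feed_publisher_url,feed_lang\n"
    "Example Transit,https://example.com,en\n"
)
print(read_feed_id(with_id))
print(read_feed_id(without_id))
```

Because the field is optional, a consumer would still need its own fallback identifier when `None` comes back.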

See a long discussion of feed_ids here: https://groups.google.com/forum/#!msg/gtfs-changes/zVjEoNIPr_Y/IZMDGUW0DAAJ

This is specified as vaguely as possible by design. Other discussions have recommended using NTD IDs or Transitland OneStop IDs (example). The downside of NTD IDs is that feeds can contain multiple agencies. I recommend that the spec remain silent on the subject of some kind of global GTFS registry, both because GTFS is decentralized by nature and because some feeds already supply feed_id. More important than global uniqueness is local uniqueness. Quoting from Andrew Byrd from the above thread:

But if we look at this pragmatically, what really matters is that within a single coverage area (city, regional, or continental) there is some widely recognized unique name for each feed that interacts with other outside data sources. Basically feeds with realtime data in regions with multiple feeds would truly need to use feed IDs for this to function.

@googlebot
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

@sdjacobs
Author

I signed it

@googlebot
Collaborator

CLAs look good, thanks!

@barbeau
Collaborator

barbeau commented Jun 15, 2017

I'm conceptually a +1 for this, given that it's already being used by multiple producers/consumers.

However, to repeat here what I mentioned on Slack, I'd be interested in looking at a possible onestop_id in addition to (or in place of?) a feed_id. I agree with Andrew that, historically and pragmatically, it's local knowledge that has been most important, but I think we're seeing more and more apps cover more than one region, and global uniqueness is becoming more important. Given that Transitland has already "solved" this problem (to my knowledge - someone speak up if they disagree), it seems logical to start including a globally unique ID from a global GTFS registry that provides a lot of useful APIs for querying other transit info. There are OneStop IDs for stops and routes too that persist across feed iterations (another common problem consumers all have to solve independently), and I'm looking forward to the day of OneStop IDs for trips ;). This also takes the guesswork out of what the ID field should be for producers - it becomes a copy/paste from Transitland.

Then, assuming a critical mass wants to adopt feed_info.onestop_id, does officially adopting feed_id still make sense? Or do we leapfrog feed_id?

@antrim
Contributor

antrim commented Jun 22, 2017

@trilliumtransit currently publishes feed_info.feed_id (see https://transitfeeds.com/p/arcata-mad-river-transit-system/148/20170228/file/feed_info.txt). We would also be ready/happy to add onestop_id.

@abyrd

abyrd commented Jul 12, 2017

I am definitely in favor of putting feed_id in the GTFS specification. Along with Trillium, Conveyal has been encouraging the adoption of feed_id by the GTFS producers we work with. For example, prominent GTFS producers TriMet and OpenOV (full Netherlands) have been including feed_ids in their feed_info.txt for quite a while.

@barbeau I appreciate Mapzen's work on Transitland, and their effort on creating a system that uniquely identifies transit objects worldwide. However, I think the design of Onestop is contingent enough on their needs and applications that it should not replace the existing feed_id field.

The feed_id is intended to be short and human-readable. It does not seek to have a geographic component, it just expresses which feed this is, giving the context necessary to situate it in a sequence of feed_versions. We've discussed feed_ids a few times with people at Transit.land, and a link already exists between the two concepts: the Onestop ID for feeds that declare a feed_id is f-GEOHASH-feed_id.
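The f-GEOHASH-feed_id relationship described above can be sketched as plain string handling. This is only an illustration: the function names, the `OnestopFeedId` type, and the sample values are hypothetical, and it assumes the feed_id is used verbatim with no normalization, which may not match what Transit.land actually does.

```python
from typing import NamedTuple


class OnestopFeedId(NamedTuple):
    geohash: str
    name: str


def build_onestop_feed_id(geohash: str, feed_id: str) -> str:
    """Compose an ID of the form f-GEOHASH-feed_id (assumed: no normalization)."""
    return f"f-{geohash}-{feed_id}"


def parse_onestop_feed_id(onestop_id: str) -> OnestopFeedId:
    """Split f-GEOHASH-feed_id back into its geohash and feed_id parts."""
    parts = onestop_id.split("-", 2)  # maxsplit keeps hyphens inside feed_id
    if len(parts) != 3 or parts[0] != "f":
        raise ValueError(f"not a feed Onestop ID: {onestop_id!r}")
    return OnestopFeedId(geohash=parts[1], name=parts[2])


# Made-up values, for illustration only.
composed = build_onestop_feed_id("9q9", "example-transit")
print(composed)
print(parse_onestop_feed_id(composed))
```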

If we look at all the GTFS feeds produced around the world as a large decentralized database, the inclusion of the OneStop ID (which is definitively assigned by Transit.land) in the feed itself is a form of denormalization. It allows referential integrity to be broken, in the sense that a feed can declare a OneStop ID different than the one it is assigned by Transit.land and known to Transit.land users.

This is not a theoretical concern. As schemes for mapping two-dimensional vectors (geographic coordinates) onto a one-dimensional line (the space of character strings), geohashes naturally and necessarily have discontinuities. That is to say, while points close to each other are likely to have the same geohash, there are also many points close to one another that have completely different geohashes. The geohash of a feed is the geohash of a point derived from the feed contents (the centroid of a rectangle containing all service). This means that the addition of a single new stop or trip could change the geohash of a feed. Of course the people at Transit.land have thought about this and must have policies for ensuring that the IDs remain consistent over time, but that still means that these IDs must be assigned definitively by a single authority.
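To make the discontinuity concrete, here is a small sketch using the standard public geohash encoding (longitude and latitude bits interleaved, base-32 alphabet). The helper names, the stop coordinates, and the use of a plain bounding-box centroid are illustrative assumptions; this is not Transit.land's actual procedure.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a point as a geohash by repeatedly halving the lon/lat ranges."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def bbox_centroid(stops):
    """Centre of the rectangle containing all (lat, lon) stops."""
    lats = [lat for lat, _ in stops]
    lons = [lon for _, lon in stops]
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


# Hypothetical feed whose service area straddles the equator.
stops = [(-0.5, 10.0), (0.4, 10.2)]
before = geohash_encode(*bbox_centroid(stops))
after = geohash_encode(*bbox_centroid(stops + [(0.8, 10.1)]))
print(before, after)  # the two hashes differ from the first character on
```

Adding one stop moves the centroid across a cell boundary, so an ID derived mechanically from the contents would change even though the feed is the same.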

To me, this seems more fragile and subject to obsolescence than a distributed, cooperative model. The world of GTFS producers is small enough that it should not be difficult to avoid feed_id collisions, especially since multiple global indexes of GTFS data make it possible to consult lists of existing IDs.

Of course feed producers are free to coordinate with Transit.land, Google, or anyone else that maintains a unique feed ID system to decide on an ID. But at least in the case of Onestop, that seems to lead to IDs that are defined recursively. The feed's feed_info.txt supplies the alphabetical identifier (the feed_id) that serves as the potentially human-readable core of the OneStop ID.

I'm not at all discouraging use of Onestop IDs, I just don't think they should be considered a replacement for feed_ids. I'd welcome comments from people at Transit.land or who have used/implemented either system.

@sdjacobs
Author

I agree with @abyrd re the decentralized nature of GTFS and the need for simply a locally unique feed ID. I'd push towards merging the feed_id column as-is, and letting feed producers decide on their own convention, so they certainly are welcome to use a OneStop ID if they want to. Down the line, if we do see a lot of use of OneStop, we could add an additional optional onestop_id field.
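As a rough illustration of what "locally unique" could mean for a consumer that loads several feeds for one region, the sketch below flags feed_ids declared by more than one feed. The function name and the sample feed labels are hypothetical; nothing here is prescribed by the proposal.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def find_feed_id_collisions(
    feed_ids_by_source: Dict[str, Optional[str]]
) -> Dict[str, List[str]]:
    """Map each feed_id used by more than one source to those sources.

    Sources that declare no feed_id (None) are ignored.
    """
    sources_by_id = defaultdict(list)
    for source, feed_id in feed_ids_by_source.items():
        if feed_id is not None:
            sources_by_id[feed_id].append(source)
    return {
        feed_id: sorted(sources)
        for feed_id, sources in sources_by_id.items()
        if len(sources) > 1
    }


# Made-up regional bundle: two feeds accidentally share an ID.
region = {
    "city-bus.zip": "metro",
    "city-rail.zip": "metro",
    "ferry.zip": "harbor-ferry",
    "shuttle.zip": None,
}
print(find_feed_id_collisions(region))
```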

sdjacobs added a commit to sdjacobs/onebusaway-gtfs-modules that referenced this pull request Jan 25, 2018
- Changes feed_id to a String. Matches proposed GTFS extension: google/transit#62
- Add strategy to create stop_headsign from a spreadsheet
- Add stragegy to create new entrances and pathways from MTA-provided spreadsheets
- Additional fields in Pathway to match proposal here: https://docs.google.com/document/d/1qJOTe4m_a4dcJnvXYt4smYj4QQ1ejZ8CvLBYzDM5IyM/edit?ts=5a39452a
sdjacobs added a commit to sdjacobs/onebusaway-gtfs-modules that referenced this pull request Jan 25, 2018
- Changes feed_id to a String. Matches proposed GTFS extension: google/transit#62
- Add strategy to create stop_headsign from a spreadsheet
- Add stragegy to create new entrances and pathways from MTA-provided spreadsheets
@barbeau added the "GTFS Schedule" label (Issues and Pull Requests that focus on GTFS Schedule) Aug 27, 2018
@leonardehrenfried
Contributor

Was this idea dropped in the end? Is there a replacement?

@stale

stale bot commented Aug 21, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the "Status: Stale" label (Issues and Pull Requests that have remained inactive for 30 calendar days or more) Aug 21, 2021
@stale

stale bot commented Aug 28, 2021

This pull request has been closed due to inactivity. Pull requests can always be reopened after they have been closed. See the Specification Amendment Process.

6 participants