New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feed id to gtfs feeds, resolves #1990 #1999

Merged
merged 12 commits into from Dec 22, 2015

Conversation

Projects
None yet
4 participants
@johannilsson
Contributor

johannilsson commented Jun 19, 2015

This adds improved support for multiple GTFS feeds and multi agency feeds

There is currently an issue with the gtfs-rt updater for alerts with feeds that contains multiple agencies where we can't match stop id, route id or trip ids because lookups could not be done with the agency id within the entity selector.

There's currently an issue with the gtfs-rt stop time updater because it assumes stop id, route id and trip ids is unique within OTP, this is not always the case and in these situations the update would not be possible.

This PR resolves these issues by introducing a feed id as per @abyrd suggestion in #1990.

  • Change to use a feed id instead of selecting a default agency id for feeds
  • Change to re-index gtfs feeds using the feed id
  • Change indexes that use agency id as key to group agencies with a feed id instead
  • Change to use feedId instead of defaultAgency in gtfs-rt updaters, this makes entity selector for gtfs-rt alerts to work as expected
  • Add support for multi agency feeds to the gtfs-rt updaters
  • Change index api for agencies, these api now requires a feed id
  • Add support for matching gtfs alert with a trip id
Add feed id to gtfs feeds, resolves #1990
This adds improved support for multiple GTFS feeds and multi agency feeds

* Change to use a feed id instead of selecting a default agency id for feeds
* Change to re-index gtfs feeds using the feed id
* Change indexes that use agency id as key to group agencies with a feed id instead
* Change to use feedId instead of defaultAgency in gtfs-rt updaters, this makes entity selector for gtfs-rt alerts to work as expected
* Add support for multi agency feeds to the gtfs-rt updaters
* Change index api for agencies, these api now requires a feed id
* Add support for matching gtfs alert with a trip id
Show outdated Hide outdated src/main/java/org/opentripplanner/routing/graph/GraphIndex.java
@@ -65,7 +60,7 @@
public final Map<String, TripPattern> patternForId = Maps.newHashMap();
public final Map<Stop, TransitStop> stopVertexForStop = Maps.newHashMap();
public final Map<Trip, TripPattern> patternForTrip = Maps.newHashMap();
public final Multimap<Agency, TripPattern> patternsForAgency = ArrayListMultimap.create();
public final Map<String, Multimap<Agency, TripPattern>> agencyPatternsForFeedId = Maps.newHashMap();

This comment has been minimized.

@abyrd

abyrd Jun 22, 2015

Member

It looks like the TripPatterns are still organized by agency. Because they are based on TripIds which are feed-unique I think this could just be a Multimap<String, TripPattern>.

@abyrd

abyrd Jun 22, 2015

Member

It looks like the TripPatterns are still organized by agency. Because they are based on TripIds which are feed-unique I think this could just be a Multimap<String, TripPattern>.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 22, 2015

Member

Thanks @johannilsson, this looks like the right idea. @jkoelewijn let us know if you have any comments on the real-time handling. One important thing about real-time updates is that when they are received as a continuous stream (rather than polling), the stream begins with a "full dataset" update that will erase any existing updates so it can start with a blank slate. When you have two real-time feeds and two schedule feeds, re-connecting one of the real-time feeds must only wipe out the real-time data for one of the schedule feeds (leaving the updates for the other one intact). This happens in org.opentripplanner.updater.stoptime.TimetableSnapshotSource#applyTripUpdates, where if the boolean fullDataset is true, the buffer is cleared. I believe we'd need a feed-specific clear() function for everything to work properly.

So we'd need to do:

if (fullDataset) {
    // Remove all updates from the buffer
    buffer.clear(feedId);
}
Member

abyrd commented Jun 22, 2015

Thanks @johannilsson, this looks like the right idea. @jkoelewijn let us know if you have any comments on the real-time handling. One important thing about real-time updates is that when they are received as a continuous stream (rather than polling), the stream begins with a "full dataset" update that will erase any existing updates so it can start with a blank slate. When you have two real-time feeds and two schedule feeds, re-connecting one of the real-time feeds must only wipe out the real-time data for one of the schedule feeds (leaving the updates for the other one intact). This happens in org.opentripplanner.updater.stoptime.TimetableSnapshotSource#applyTripUpdates, where if the boolean fullDataset is true, the buffer is cleared. I believe we'd need a feed-specific clear() function for everything to work properly.

So we'd need to do:

if (fullDataset) {
    // Remove all updates from the buffer
    buffer.clear(feedId);
}
@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 22, 2015

Member

It's also unfortunate that we still have to call everything AgencyAndId instead of FeedScopedId because we're importing that class from the OBA library. But having the variable named feedId everywhere certainly helps understand what's going on.

Member

abyrd commented Jun 22, 2015

It's also unfortunate that we still have to call everything AgencyAndId instead of FeedScopedId because we're importing that class from the OBA library. But having the variable named feedId everywhere certainly helps understand what's going on.

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 22, 2015

Contributor

Thanks @abyrd, good points! I'll go over and push the necessary changes.

Contributor

johannilsson commented Jun 22, 2015

Thanks @abyrd, good points! I'll go over and push the necessary changes.

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 23, 2015

Contributor

This should (hopefully) fix the issues found from the review.

I haven't addressed the feed id assignment from the comment in #1990 yet. If one feed that used to have just one agency would update and contain more than one agency this would also affect configuration for that feed and might be tricky for a user to understand why they doesn't match any longer, this might be an edge case and not something that usually happens though. I do agree that it would be more logic and readable, so I'm a bit puzzled and just wanted to share my thoughts on it before moving forward with the change.

Contributor

johannilsson commented Jun 23, 2015

This should (hopefully) fix the issues found from the review.

I haven't addressed the feed id assignment from the comment in #1990 yet. If one feed that used to have just one agency would update and contain more than one agency this would also affect configuration for that feed and might be tricky for a user to understand why they doesn't match any longer, this might be an edge case and not something that usually happens though. I do agree that it would be more logic and readable, so I'm a bit puzzled and just wanted to share my thoughts on it before moving forward with the change.

Show outdated Hide outdated src/main/java/org/opentripplanner/routing/edgetype/TimetableSnapshot.java
@@ -73,6 +72,7 @@ public ServiceDate getServiceDate() {
return serviceDate;
}
@Override
public int hashCode() {
int result = Objects.hash(tripId, serviceDate);

This comment has been minimized.

@jkoelewijn

jkoelewijn Jun 29, 2015

Contributor

The hashCode-method should include the feedId

@jkoelewijn

jkoelewijn Jun 29, 2015

Contributor

The hashCode-method should include the feedId

Show outdated Hide outdated src/main/java/org/opentripplanner/routing/edgetype/TimetableSnapshot.java
* @return true if the timetable changed as a result of the call
*/
protected boolean clearTimetable(String feedId) {
return timetables.keySet().removeAll(timetables.keySet().stream()

This comment has been minimized.

@jkoelewijn

jkoelewijn Jun 29, 2015

Contributor

Wouldn't a removeIf be more efficient?

@jkoelewijn

jkoelewijn Jun 29, 2015

Contributor

Wouldn't a removeIf be more efficient?

This comment has been minimized.

@johannilsson

johannilsson Jun 30, 2015

Contributor

Switched to use removeIf, it's also a bit more readable I think. I haven't measured if one is faster than the other though. Thanks!

@johannilsson

johannilsson Jun 30, 2015

Contributor

Switched to use removeIf, it's also a bit more readable I think. I haven't measured if one is faster than the other though. Thanks!

Show outdated Hide outdated src/main/java/org/opentripplanner/routing/edgetype/TimetableSnapshot.java
* @return true if the lastAddedTripPattern changed as a result of the call
*/
protected boolean clearLastAddedTripPattern(String feedId) {
return lastAddedTripPattern.keySet().removeAll(lastAddedTripPattern.keySet().stream()

This comment has been minimized.

@jkoelewijn

jkoelewijn Jun 29, 2015

Contributor

Wouldn't a removeIf be more efficient?

@jkoelewijn

jkoelewijn Jun 29, 2015

Contributor

Wouldn't a removeIf be more efficient?

Show outdated Hide outdated src/main/java/org/opentripplanner/routing/edgetype/TripPattern.java
if (code != null && code.length() > 0) {
// The key stored in patternForId is the pattern code that is constructed as.
// Agency:RouteId:DirectionId:PatternNumber, the first part is the feed id.
return code.substring(0, code.indexOf(':'));

This comment has been minimized.

@jkoelewijn

jkoelewijn Jun 29, 2015

Contributor

What if code is non-empty, but doesn't have a ':' character?

@jkoelewijn

jkoelewijn Jun 29, 2015

Contributor

What if code is non-empty, but doesn't have a ':' character?

@jkoelewijn

This comment has been minimized.

Show comment
Hide comment
@jkoelewijn

jkoelewijn Jun 29, 2015

Contributor

The real-time code looks good. I just added a few remarks as line notes. It's good to see those lazy-initialized indices removed!

Contributor

jkoelewijn commented Jun 29, 2015

The real-time code looks good. I just added a few remarks as line notes. It's good to see those lazy-initialized indices removed!

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 30, 2015

Contributor

Thanks @jkoelewijn

Just pushed the suggested changes. For fetching the feedId from the TripPattern I changed to just take it from the route which always have the feed id now. Guess this will be less confusing if it's always obtained the same way.

Contributor

johannilsson commented Jun 30, 2015

Thanks @jkoelewijn

Just pushed the suggested changes. For fetching the feedId from the TripPattern I changed to just take it from the route which always have the feed id now. Guess this will be less confusing if it's always obtained the same way.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 30, 2015

Member

@johannilsson I also talked about fetching the feed ID with @jkoelewijn and @clinct today, and our conclusion is that we should just grab the feed ID from the experimental feed_id field in feed-info.txt (see http://developer.trimet.org/gtfs_ext.shtml). GTFS producers in regions that expect to have multiple overlapping static and real-time feeds should be supplying a unique feed_id anyway.

Of course our current GTFS loader library does not support this field. Our new library does, but is not yet integrated. So in the mean time we will need a hack that loads the feed-info.txt file and grabs the feed_id field. Fortunately using the CSV library in dictionary mode this should only be a couple of lines of code.

@johannilsson would you be interested in implementing this? If not no problem, I can do it.

Member

abyrd commented Jun 30, 2015

@johannilsson I also talked about fetching the feed ID with @jkoelewijn and @clinct today, and our conclusion is that we should just grab the feed ID from the experimental feed_id field in feed-info.txt (see http://developer.trimet.org/gtfs_ext.shtml). GTFS producers in regions that expect to have multiple overlapping static and real-time feeds should be supplying a unique feed_id anyway.

Of course our current GTFS loader library does not support this field. Our new library does, but is not yet integrated. So in the mean time we will need a hack that loads the feed-info.txt file and grabs the feed_id field. Fortunately using the CSV library in dictionary mode this should only be a couple of lines of code.

@johannilsson would you be interested in implementing this? If not no problem, I can do it.

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 30, 2015

Contributor

@abyrd et al, I guess there needs to be some sort of fallback if the feed_id is not available as well? feed_info.txt it self is also marked as optional.

Contributor

johannilsson commented Jun 30, 2015

@abyrd et al, I guess there needs to be some sort of fallback if the feed_id is not available as well? feed_info.txt it self is also marked as optional.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 30, 2015

Member

Yes there does need to be a fallback, but I think the fallback should just be sequential numbering: like FEED1, FEED2 or just numbers 0,1,2,3...

Any feed that is intended to be matched with realtime information should really declare its own unique feed ID, otherwise it's always a matter of guessing. The feed-info.txt and feed_id fields are optional, but important when working with realtime. Despite my earlier comments after further reflection I don't think we can fall back on the filename because it may be quite long, and the majority of people who don't want to set up realtime feeds will end up with really long identifiers, such a stop called my-region-2015-06-23-gtfs:1234

Member

abyrd commented Jun 30, 2015

Yes there does need to be a fallback, but I think the fallback should just be sequential numbering: like FEED1, FEED2 or just numbers 0,1,2,3...

Any feed that is intended to be matched with realtime information should really declare its own unique feed ID, otherwise it's always a matter of guessing. The feed-info.txt and feed_id fields are optional, but important when working with realtime. Despite my earlier comments after further reflection I don't think we can fall back on the filename because it may be quite long, and the majority of people who don't want to set up realtime feeds will end up with really long identifiers, such a stop called my-region-2015-06-23-gtfs:1234

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 30, 2015

Contributor

Makes sense, how feeds are named and managed can vary quite a lot so makes sense to not expose that through the apis too.

I can implement the change for fetching or generating the feed id this way.

Contributor

johannilsson commented Jun 30, 2015

Makes sense, how feeds are named and managed can vary quite a lot so makes sense to not expose that through the apis too.

I can implement the change for fetching or generating the feed id this way.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 30, 2015

Member

Thanks @johannilsson, in that case we're not far from merging your PR into a testing branch and trying it out with the Dutch national non-rail feed + separate NS rail feed. Once everything is checked out this should be going into the master branch of OTP. By the way, we were curious where/how you are using these features. It seems like there must be a big real-time enabled OTP server somewhere :)

Member

abyrd commented Jun 30, 2015

Thanks @johannilsson, in that case we're not far from merging your PR into a testing branch and trying it out with the Dutch national non-rail feed + separate NS rail feed. Once everything is checked out this should be going into the master branch of OTP. By the way, we were curious where/how you are using these features. It seems like there must be a big real-time enabled OTP server somewhere :)

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jul 1, 2015

Contributor

Looking forward to see how it will work with these feeds!

This is planned for an app that has not been released yet. I'm happy to share the details once it's out :)

Contributor

johannilsson commented Jul 1, 2015

Looking forward to see how it will work with these feeds!

This is planned for an app that has not been released yet. I'm happy to share the details once it's out :)

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jul 1, 2015

Contributor

Changed the strategy for getting feed ids now. This is how it's intended to work now;

  • If a feed_info.txt is available and contains a feed_id that will be used.
  • If we couldn't find a feed_id in the feed a numeric id will be used.

Each time a GtfsFeedId is created a feed id counter will increase, this approach makes sure that if a feed will be updated with a feed_id the other feeds will still use their previously assigned feed_id (unless the order of reading the files has been changed).

I also did a test with two feeds that use the same agency ids and discovered that the second feed had their agency ids renamed because they was already present in the first feed. We don't need to do that anymore since they should be unique within each feed so changed that as well.

Contributor

johannilsson commented Jul 1, 2015

Changed the strategy for getting feed ids now. This is how it's intended to work now;

  • If a feed_info.txt is available and contains a feed_id that will be used.
  • If we couldn't find a feed_id in the feed a numeric id will be used.

Each time a GtfsFeedId is created a feed id counter will increase, this approach makes sure that if a feed will be updated with a feed_id the other feeds will still use their previously assigned feed_id (unless the order of reading the files has been changed).

I also did a test with two feeds that use the same agency ids and discovered that the second feed had their agency ids renamed because they was already present in the first feed. We don't need to do that anymore since they should be unique within each feed so changed that as well.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jul 8, 2015

Member

The merged code is in branch testFeedIds. Our next step is to test it out with the new NS and non-NS Netherlands feeds, and their accompanying realtime streams.

Member

abyrd commented Jul 8, 2015

The merged code is in branch testFeedIds. Our next step is to test it out with the new NS and non-NS Netherlands feeds, and their accompanying realtime streams.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Aug 3, 2015

Member

@jkoelewijn and @clinct have you guys done much testing of this code? How does it look?

Member

abyrd commented Aug 3, 2015

@jkoelewijn and @clinct have you guys done much testing of this code? How does it look?

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Aug 5, 2015

Contributor

An update from me, we have had these changes deployed for a while now and haven't seen any oddities on our end so far. It's a limited setup though sense we're mainly pushing alerts, but it is on a fairly big multi agency feed. Just let me know if I can help in any way.

Contributor

johannilsson commented Aug 5, 2015

An update from me, we have had these changes deployed for a while now and haven't seen any oddities on our end so far. It's a limited setup though sense we're mainly pushing alerts, but it is on a fairly big multi agency feed. Just let me know if I can help in any way.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Dec 22, 2015

Member

Hi @johannilsson, as far as I know no one's done any additional formal testing of this yet, but I'd like to move forward with getting outstanding PRs merged. If you guys have continued using this code with no problems, I will proceed and merge it, along with your #2001 #2058.

Member

abyrd commented Dec 22, 2015

Hi @johannilsson, as far as I know no one's done any additional formal testing of this yet, but I'd like to move forward with getting outstanding PRs merged. If you guys have continued using this code with no problems, I will proceed and merge it, along with your #2001 #2058.

@abyrd abyrd merged commit 10bfc9c into opentripplanner:master Dec 22, 2015

@andreyz

This comment has been minimized.

Show comment
Hide comment
@andreyz

andreyz Dec 22, 2015

Contributor

Thanks @abyrd. We've been running this for couple of months now. @johannilsson needs to lay concrete, that's how good his code is!

Contributor

andreyz commented Dec 22, 2015

Thanks @abyrd. We've been running this for couple of months now. @johannilsson needs to lay concrete, that's how good his code is!

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Dec 22, 2015

Contributor

@abyrd Thanks! This is super! Yes as @andreyz said we've been running this for quite a while now. We're running our own RT server so we're in control of what it produces though. It's tested on other publicly available RT feeds as well but there might be cases that isn't covered. In those cases we are still active within the community and can help out as much as we can of course.

I think the docs needs to be updated with the migration from agency id to feed id though, just realised it only briefly mentioned when adding a new RT source. Or at least cover this in a changelog since I guess this change might surprise some that previously relied on the default agency id being present. Where do you think is good place to mention this?

Contributor

johannilsson commented Dec 22, 2015

@abyrd Thanks! This is super! Yes as @andreyz said we've been running this for quite a while now. We're running our own RT server so we're in control of what it produces though. It's tested on other publicly available RT feeds as well but there might be cases that isn't covered. In those cases we are still active within the community and can help out as much as we can of course.

I think the docs needs to be updated with the migration from agency id to feed id though, just realised it only briefly mentioned when adding a new RT source. Or at least cover this in a changelog since I guess this change might surprise some that previously relied on the default agency id being present. Where do you think is good place to mention this?

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this pull request Dec 23, 2015

Fix issue with AgencyId comparsion when applying AlertPatch
This makes the changes in #1999 and #2001 compatible with each other
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment