New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agency id is not correctly mapped, causes problem with realtime alerts #1990

Closed
johannilsson opened this Issue Jun 14, 2015 · 14 comments

Comments

Projects
None yet
3 participants
@johannilsson
Contributor

johannilsson commented Jun 14, 2015

I have one feed (Sweden) that contains multiple agencies. The assigning of a default agency id when building the graph causes some problems with GTFS-RT updates for alerts because OTP assumes the provided agency id (or the default agency id configured) is also mapped and stored in the index with this pattern for trip ids, route ids and stop ids.

This has also been discussed in #1756

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jun 14, 2015

@JordenVerwer

This comment has been minimized.

Show comment
Hide comment
@JordenVerwer

JordenVerwer Jun 14, 2015

Member

That's odd, because the changes referred to in #1756 were made specifically to get GTFS-RT working correctly with multiple-agency feeds. However, there have been changes to the code since that time and it's possible they were made without due regard to GTFS-RT compatibility. I know we at GoAbout don't use OTP master because it's not sufficiently stable (stable as in "not changing", not as in "not crashing"), but I don't know if anyone else uses OTP master with the full Dutch GTFS-RT feed.

Member

JordenVerwer commented Jun 14, 2015

That's odd, because the changes referred to in #1756 were made specifically to get GTFS-RT working correctly with multiple-agency feeds. However, there have been changes to the code since that time and it's possible they were made without due regard to GTFS-RT compatibility. I know we at GoAbout don't use OTP master because it's not sufficiently stable (stable as in "not changing", not as in "not crashing"), but I don't know if anyone else uses OTP master with the full Dutch GTFS-RT feed.

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 14, 2015

Contributor

Thanks @JordenVerwer,

Trip or stop time updates works as expected, this is because the lookup towards the index is done without an agency id. I'm currently testing with the patch above that uses the same approach for alerts.

The problem seems to be that OTP assigns all stop id, route id and trip id with the same agency id when graph is built. This breaks the lookup towards the index because it expects a agency id that is not related to the referenced id. If the assigned agency id (from graph build) is added to the GTFS-RT feed for alerts matching seems to work. This is a bit tricky to work around though sense this agency id changes for each build.

Contributor

johannilsson commented Jun 14, 2015

Thanks @JordenVerwer,

Trip or stop time updates works as expected, this is because the lookup towards the index is done without an agency id. I'm currently testing with the patch above that uses the same approach for alerts.

The problem seems to be that OTP assigns all stop id, route id and trip id with the same agency id when graph is built. This breaks the lookup towards the index because it expects a agency id that is not related to the referenced id. If the assigned agency id (from graph build) is added to the GTFS-RT feed for alerts matching seems to work. This is a bit tricky to work around though sense this agency id changes for each build.

@JordenVerwer

This comment has been minimized.

Show comment
Hide comment
@JordenVerwer

JordenVerwer Jun 14, 2015

Member

Correct, you need to specify the agency ID for the GTFS-RT feed you use and this needs to match the default agency ID of the GTFS feed. This is our workaround for OBA's lack of support for feed IDs. Agency ID are basically used as feed IDs in OTP, but this approach is problematic. Other approaches are at least equally problematic, unfortunately.

The idea was to replace OBA with our own GTFS loader code, but for a variety of reasons (which all come down to money in the end, as with everything) that hasn't happened yet.

Member

JordenVerwer commented Jun 14, 2015

Correct, you need to specify the agency ID for the GTFS-RT feed you use and this needs to match the default agency ID of the GTFS feed. This is our workaround for OBA's lack of support for feed IDs. Agency ID are basically used as feed IDs in OTP, but this approach is problematic. Other approaches are at least equally problematic, unfortunately.

The idea was to replace OBA with our own GTFS loader code, but for a variety of reasons (which all come down to money in the end, as with everything) that hasn't happened yet.

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 15, 2015

Contributor

@JordenVerwer I can see the problem with OBA and why agency id is used as an feed id. The problem with the current GTFS-RT implementation for alerts is that this limitations isn't considered from what I can see and experiencing.

If I specify a default agency id in the configuration for the feed this will only work if the feed doesn't include an agency id. If the feed includes an agency id that id will be used and matching to stops, trips and routes will fail.

The updater for stop time updates handles the passed agency id as a feed id. All lookups for stops, routes and trips is there done without the agency id, the GraphIndex contains the following maps for storing these references routeForIdWithoutAgency, tripForIdWithoutAgency and stopForIdWithoutAgency and is lazily populated when requested from the stop time updater.

The problem with this approach is that there might be conflicts with ids, but that problem already exists and would at least be the same for both the stop time updater and the alerts updater if the implementation for the alerts updater would be modified.

Contributor

johannilsson commented Jun 15, 2015

@JordenVerwer I can see the problem with OBA and why agency id is used as an feed id. The problem with the current GTFS-RT implementation for alerts is that this limitations isn't considered from what I can see and experiencing.

If I specify a default agency id in the configuration for the feed this will only work if the feed doesn't include an agency id. If the feed includes an agency id that id will be used and matching to stops, trips and routes will fail.

The updater for stop time updates handles the passed agency id as a feed id. All lookups for stops, routes and trips is there done without the agency id, the GraphIndex contains the following maps for storing these references routeForIdWithoutAgency, tripForIdWithoutAgency and stopForIdWithoutAgency and is lazily populated when requested from the stop time updater.

The problem with this approach is that there might be conflicts with ids, but that problem already exists and would at least be the same for both the stop time updater and the alerts updater if the implementation for the alerts updater would be modified.

@JordenVerwer

This comment has been minimized.

Show comment
Hide comment
@JordenVerwer

JordenVerwer Jun 15, 2015

Member

Oh, that's interesting. It means that somebody broke the stop time updater for multiple feeds (with conflicting IDs). In that case it's no surprise that the alerts updater is broken too.

As for the actual issue with GTFS-RT Alerts as you describe it here, that seems to stem from the possibility (left open by the GTFS-RT Reference) to overspecify routes by also including an agency ID. I believe the right approach would be to look up routes by ID using the default agency ID as a surrogate feed ID, then check if the route's agency ID (if any) matches the agency ID given in the alert's EntitySelector. However, I'm not as well versed in alerts as I am in stop time updates, so I could be wrong here.

Member

JordenVerwer commented Jun 15, 2015

Oh, that's interesting. It means that somebody broke the stop time updater for multiple feeds (with conflicting IDs). In that case it's no surprise that the alerts updater is broken too.

As for the actual issue with GTFS-RT Alerts as you describe it here, that seems to stem from the possibility (left open by the GTFS-RT Reference) to overspecify routes by also including an agency ID. I believe the right approach would be to look up routes by ID using the default agency ID as a surrogate feed ID, then check if the route's agency ID (if any) matches the agency ID given in the alert's EntitySelector. However, I'm not as well versed in alerts as I am in stop time updates, so I could be wrong here.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 15, 2015

Member

On 14 Jun 2015, at 23:10, JordenVerwer notifications@github.com wrote:
The idea was to replace OBA with our own GTFS loader code, but for a variety of reasons (which all come down to money in the end, as with everything) that hasn't happened yet.

The good news is that the replacement GTFS loader library you mention is already written and used in other projects, it “just" needs to be integrated with OTP.

The problem is in fact less about money than availability of developers. This replacement would have to be done by someone who really knows how OTP works, and those people are often busy with other things.

-Andrew

Member

abyrd commented Jun 15, 2015

On 14 Jun 2015, at 23:10, JordenVerwer notifications@github.com wrote:
The idea was to replace OBA with our own GTFS loader code, but for a variety of reasons (which all come down to money in the end, as with everything) that hasn't happened yet.

The good news is that the replacement GTFS loader library you mention is already written and used in other projects, it “just" needs to be integrated with OTP.

The problem is in fact less about money than availability of developers. This replacement would have to be done by someone who really knows how OTP works, and those people are often busy with other things.

-Andrew

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 15, 2015

Member

@JordenVerwer thanks for pointing out that there is a problem with the stoptime updater. The problem is that the current behavior is so counterintuitive that it's possible to break and re-break it as a part of well-meaning repairs. This might have happened as part of @jkoelewijn's recent work on realtime updates. If you have a moment, it would be helpful if you could spot what commit caused this regression.

Really, we need to do the minimum viable thing that will make the updaters behave in the expected way. I believe this would be manually assigning feed IDs to GTFS feeds when more than one is loaded, then also manually associating real-time feeds with those feed IDs. Then rather than indexes of routes and trips without agency IDs, we'd have an index of routes and trips per GTFS schedule feed ID. This also solves the problem where one of several streaming update feeds gets disconnected, then wipes out the updates for not only its scheduled feed but all other feeds, because it can't tell which routes and trips are from its own scheduled feed.

Member

abyrd commented Jun 15, 2015

@JordenVerwer thanks for pointing out that there is a problem with the stoptime updater. The problem is that the current behavior is so counterintuitive that it's possible to break and re-break it as a part of well-meaning repairs. This might have happened as part of @jkoelewijn's recent work on realtime updates. If you have a moment, it would be helpful if you could spot what commit caused this regression.

Really, we need to do the minimum viable thing that will make the updaters behave in the expected way. I believe this would be manually assigning feed IDs to GTFS feeds when more than one is loaded, then also manually associating real-time feeds with those feed IDs. Then rather than indexes of routes and trips without agency IDs, we'd have an index of routes and trips per GTFS schedule feed ID. This also solves the problem where one of several streaming update feeds gets disconnected, then wipes out the updates for not only its scheduled feed but all other feeds, because it can't tell which routes and trips are from its own scheduled feed.

@JordenVerwer

This comment has been minimized.

Show comment
Hide comment
@JordenVerwer

JordenVerwer Jun 15, 2015

Member

I agree that the current approach is extremly counterintuitive and prone to breakage. It wasn't really meant to last this long, but things happen sometimes.

I'll see if I can find some time to look at this in more detail. We want to move to OTP master at some point, but right now the barrier is so high (because a lot of things changed) that we're not sure we want to spend time on it without clear benefits. Working ARNU support is probably what will end up being the most convincing reason to switch over, but right now we're relatively comfortable running a somewhat older version.

Anyway, I'm going to try to make this more of a priority for us.

Member

JordenVerwer commented Jun 15, 2015

I agree that the current approach is extremly counterintuitive and prone to breakage. It wasn't really meant to last this long, but things happen sometimes.

I'll see if I can find some time to look at this in more detail. We want to move to OTP master at some point, but right now the barrier is so high (because a lot of things changed) that we're not sure we want to spend time on it without clear benefits. Working ARNU support is probably what will end up being the most convincing reason to switch over, but right now we're relatively comfortable running a somewhat older version.

Anyway, I'm going to try to make this more of a priority for us.

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 15, 2015

Contributor

This sounds like a good approach that @abyrd propose, it would make sense if we could map a feed to a default agency id (feed id) in the build configuration. This would make it easier to configure the updaters as well sense the default agency id would be stable.

Contributor

johannilsson commented Jun 15, 2015

This sounds like a good approach that @abyrd propose, it would make sense if we could map a feed to a default agency id (feed id) in the build configuration. This would make it easier to configure the updaters as well sense the default agency id would be stable.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 15, 2015

Member

@johannilsson and it matches the way things will work when we replace the GTFS loader, which will facilitate the transition.

Member

abyrd commented Jun 15, 2015

@johannilsson and it matches the way things will work when we replace the GTFS loader, which will facilitate the transition.

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 15, 2015

Contributor

@abyrd I could give it a try to add a default agency id if it would be of any help.

Contributor

johannilsson commented Jun 15, 2015

@abyrd I could give it a try to add a default agency id if it would be of any help.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 15, 2015

Member

If we're going to do this, we should entirely stop using the term "agency ID" to mean "feed ID". So the idea would be to add a feed ID to GTFS feeds at graph build time. Ideally since GTFS feed files are auto-detected this would be read from the feed-info.txt in the GTFS feed itself (there's an extension just waiting for implemetation). But... that means replacing the OBA loader :) So maybe in the interim we could just use the filename of the GTFS feed up to the first '.' (without the extensions) as its ID.

Then in the runtime router config (rather than the build config) you'd have to be able to say "this updater is for the scheduled feed with ID X". And of course this will require some new indexes to work right.

If you're interested in working on this of course we always welcome contributions!

Member

abyrd commented Jun 15, 2015

If we're going to do this, we should entirely stop using the term "agency ID" to mean "feed ID". So the idea would be to add a feed ID to GTFS feeds at graph build time. Ideally since GTFS feed files are auto-detected this would be read from the feed-info.txt in the GTFS feed itself (there's an extension just waiting for implemetation). But... that means replacing the OBA loader :) So maybe in the interim we could just use the filename of the GTFS feed up to the first '.' (without the extensions) as its ID.

Then in the runtime router config (rather than the build config) you'd have to be able to say "this updater is for the scheduled feed with ID X". And of course this will require some new indexes to work right.

If you're interested in working on this of course we always welcome contributions!

@johannilsson

This comment has been minimized.

Show comment
Hide comment
@johannilsson

johannilsson Jun 17, 2015

Contributor

I've started with this in this branch, https://github.com/johannilsson/OpenTripPlanner/tree/add-feed-id-to-feeds.

Not sure if this is how you thought about it, I'm currently removing all indexes that use the agency as the key to remove collisions in those places. The feed also have the default agency id replaced with the feed id that is picked up from the filename of the gtfs feed. No work has been done on the gtfs-rt updaters yet.

Before I put more work into it, it would be good to know if this in the "right" direction though.

Edit: Updated reference to branch.

Contributor

johannilsson commented Jun 17, 2015

I've started with this in this branch, https://github.com/johannilsson/OpenTripPlanner/tree/add-feed-id-to-feeds.

Not sure if this is how you thought about it, I'm currently removing all indexes that use the agency as the key to remove collisions in those places. The feed also have the default agency id replaced with the feed id that is picked up from the filename of the gtfs feed. No work has been done on the gtfs-rt updaters yet.

Before I put more work into it, it would be good to know if this in the "right" direction though.

Edit: Updated reference to branch.

@abyrd

This comment has been minimized.

Show comment
Hide comment
@abyrd

abyrd Jun 22, 2015

Member

Hi @johannilsson, sorry for the lag. I'm currently reviewing your work. So far it seems like the right approach to me, I'll give you more feedback later on your pull requests. It is correct that we don't need to use agency IDs in index keys at all, i.e. rather than agencyId:entityId we would refer to routes, trips etc. as feedId:entityId, so those agencyIds would disappear entirely from the indexes.

In fact according to the spec (https://developers.google.com/transit/gtfs/reference) all IDs are unique at the feed level, so agencies are not structurally important, they are just an extra table of information attached to routes so we can tell the riders who is operating their bus.

On further reflection though, to adhere to the "principle of least surprise", we should probably have somewhat more nuanced rules for automatically assigning feed IDs to feeds (until we use our own GTFS loader which can read the feedId field). If a feed contains only one agency, we would usually get more human-readable feedId:entityId names if we just reused that agency's ID.

Member

abyrd commented Jun 22, 2015

Hi @johannilsson, sorry for the lag. I'm currently reviewing your work. So far it seems like the right approach to me, I'll give you more feedback later on your pull requests. It is correct that we don't need to use agency IDs in index keys at all, i.e. rather than agencyId:entityId we would refer to routes, trips etc. as feedId:entityId, so those agencyIds would disappear entirely from the indexes.

In fact according to the spec (https://developers.google.com/transit/gtfs/reference) all IDs are unique at the feed level, so agencies are not structurally important, they are just an extra table of information attached to routes so we can tell the riders who is operating their bus.

On further reflection though, to adhere to the "principle of least surprise", we should probably have somewhat more nuanced rules for automatically assigning feed IDs to feeds (until we use our own GTFS loader which can read the feedId field). If a feed contains only one agency, we would usually get more human-readable feedId:entityId names if we just reused that agency's ID.

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jun 23, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jun 23, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jun 23, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jun 23, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jun 30, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jun 30, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jun 30, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jun 30, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jul 1, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jul 1, 2015

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jul 1, 2015

Change to get feed_id from feed_info.txt, for #1990
* Use feed_id if available in feed_info.txt
* If feed_id is not available a generated feed id will be used

johannilsson added a commit to johannilsson/OpenTripPlanner that referenced this issue Jul 6, 2015

@abyrd abyrd closed this in b5ae935 Dec 22, 2015

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

Add feed id to gtfs feeds, resolves #1990
This adds improved support for multiple GTFS feeds and multi agency feeds

* Change to use a feed id instead of selecting a default agency id for feeds
* Change to re-index gtfs feeds using the feed id
* Change indexes that use agency id as key to group agencies with a feed id instead
* Change to use feedId instead of defaultAgency in gtfs-rt updaters, this makes entity selector for gtfs-rt alerts to work as expected
* Add support for multi agency feeds to the gtfs-rt updaters
* Change index api for agencies, these api now requires a feed id
* Add support for matching gtfs alert with a trip id

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

flibbertigibbet pushed a commit to flibbertigibbet/OpenTripPlanner that referenced this issue Jan 25, 2016

Change to get feed_id from feed_info.txt, for #1990
* Use feed_id if available in feed_info.txt
* If feed_id is not available a generated feed id will be used
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment