Proposal: Add TripUpdate.is_trip_finished #241

Closed

barbeau
Collaborator

@barbeau barbeau commented Aug 14, 2020

There is processing overhead for consuming each TripUpdate and applying the predicted arrival and departure times to the internal data model.

The goal of this proposal is to provide a data attribute, TripUpdate.is_trip_finished, for publishers to inform consumers that a trip has ended and no additional TripUpdates with different values will be provided in future feed messages. It allows consumers to avoid continuously processing these updates if they have not changed.

Note that this proposal does not change the expectations of how producers should publish TripUpdates and when they are allowed to be removed from the feed. More specifically, even if this field is set to true, producers should not drop a TripUpdate from a feed if it is prior to the scheduled arrival time for the last stop on the trip, as otherwise it will be concluded that there is no update for this trip (which is the current GTFS-realtime spec behavior).
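As a rough illustration of the optimization described above, here is a minimal consumer-side sketch. It assumes plain Python dataclasses rather than the real protobuf bindings; the `Consumer` class, `stop_time_delays`, and the `applied` counter are hypothetical names, and `is_trip_finished` is the field proposed here, not part of the published spec.

```python
from dataclasses import dataclass, field


@dataclass
class TripUpdate:
    """Simplified stand-in for a GTFS-realtime TripUpdate (not the protobuf class)."""
    trip_id: str
    is_trip_finished: bool = False  # proposed field, hypothetical
    stop_time_delays: dict = field(default_factory=dict)  # stop_id -> delay in seconds


class Consumer:
    """Hypothetical consumer that stops re-applying updates for finished trips."""

    def __init__(self):
        self.finished = set()  # trip_ids the producer has declared finished
        self.model = {}        # internal data model: trip_id -> delays
        self.applied = 0       # how many updates were actually applied

    def consume(self, feed):
        for update in feed:
            if update.trip_id in self.finished:
                # The producer promised no further changes, so skip the work.
                continue
            self.model[update.trip_id] = dict(update.stop_time_delays)
            self.applied += 1
            if update.is_trip_finished:
                self.finished.add(update.trip_id)
```

The TripUpdate stays in the feed, as required above; the consumer simply stops paying the cost of re-applying it.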

@barbeau barbeau added the GTFS Realtime Issues and Pull Requests that focus on GTFS Realtime label Aug 14, 2020
@google-cla google-cla bot added the cla: yes label Aug 14, 2020
@skinkie
Contributor

skinkie commented Aug 14, 2020

@barbeau who requests this kind of addition? If the trip is finished, it should be removed. If it is not finished, I would expect that stop cancellations will be sent, not that I have to infer them.

@tleboulenge

@skinkie I requested this :)
It's really just a minor optimization-level feature. This comes from a new need we have to be able to display the status of a trip, including a "running" and an "arrived" status.

The problem is that a trip being removed from a feed is not a signal of anything, it's just missing information (maybe on purpose, e.g. in the case the trip finished, but maybe not). And as you know, it does happen a lot that a trip just disappears from a feed, even for just a few messages.

Take this example:

  • At 10:42, trip has ETA to last stop of 10:44

It's 10:45 now, do you assume:

  1. The train has arrived at 10:44, and was removed from the feed? [ARRIVED]
  2. The train is stuck at a signal in the arrival station, but is somehow not reported? [RUNNING]

This proposal would make the producer issue one clear "is_trip_finished" message in case (1). Consumers could then rely on this stronger signal rather than inferring from the last few messages whether the train has arrived or not.
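To make the 10:45 ambiguity concrete, here is a small sketch of how a consumer might derive a display status. The `TripStatus` enum and the `trip_status` function are hypothetical, and the rule that a trip missing from the feed without an explicit flag is "unknown" is an assumption for illustration, not something the spec defines.

```python
from enum import Enum


class TripStatus(Enum):
    RUNNING = "running"
    ARRIVED = "arrived"
    UNKNOWN = "unknown"


def trip_status(last_seen_finished: bool, present_in_current_feed: bool) -> TripStatus:
    """Derive a display status from the proposed explicit flag.

    last_seen_finished: value of the (proposed) is_trip_finished flag in the
        most recent TripUpdate received for this trip.
    present_in_current_feed: whether the trip appears in the latest feed message.
    """
    if last_seen_finished:
        # Case (1): the producer said so explicitly.
        return TripStatus.ARRIVED
    if present_in_current_feed:
        return TripStatus.RUNNING
    # Case (2) cannot be ruled out: the trip vanished without an explicit signal.
    return TripStatus.UNKNOWN
```

In the example above, the trip is gone from the feed at 10:45 and no flag was ever seen, so this sketch reports `UNKNOWN` instead of guessing `ARRIVED`.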

@jxeeno
Contributor

jxeeno commented Aug 18, 2020

Hmm... I'm still not too sure why this is needed.

If the primary reason for this is to reduce processing overhead, I think this would be better solved by working with producers to make sure they properly communicate the time when the TripUpdate was generated and populate the TripUpdate entity's timestamp with that value. If the value is unchanged for that trip between messages, then don't reprocess the TripUpdate. Many producers already do this and it solves the question of "is this a new TripUpdate or is it the same one as before" more generally... rather than just for the last stop.
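A minimal sketch of that timestamp-based skip, assuming each TripUpdate is represented as a plain dict with a `trip_id` key and an optional `timestamp` key (the function name and the `last_seen` cache are hypothetical):

```python
def updates_to_process(feed, last_seen):
    """Return only the TripUpdates whose timestamp changed since the last message.

    feed: list of dicts with "trip_id" and, optionally, "timestamp" (POSIX seconds).
    last_seen: dict mapping trip_id -> last processed timestamp; mutated in place.
    """
    changed = []
    for update in feed:
        timestamp = update.get("timestamp")
        if timestamp is not None and last_seen.get(update["trip_id"]) == timestamp:
            # Same generation time as before: nothing new to apply.
            continue
        # No timestamp means we cannot tell, so the update is always reprocessed.
        changed.append(update)
        if timestamp is not None:
            last_seen[update["trip_id"]] = timestamp
    return changed
```

This only helps when producers actually populate the timestamp; updates without one fall back to being processed on every message.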

If the reason is to convey to end users that the trip has finished, then imo that should be expressed through vehicle positions.

@skinkie
Contributor

skinkie commented Aug 18, 2020

I am with @jxeeno on this. If you want arrivals and departures, a websocket-based vehicle position feed is exactly what you want here. OpenTripPlanner has supported this for 7 years now. The pull-based infrastructure is just legacy.

In your example, if the data is removed from the feed, I assume that the last reported ETA applies, and even if it did not, the data at 10:45 suggests so. I do not see the need to retain finished trips, especially if it is not specified when a finished trip should be discarded.

@tleboulenge

tleboulenge commented Aug 19, 2020

> If the reason is to convey to end users that the trip has finished, then imo that should be expressed through vehicle positions.

Yes that's generally the idea, and indeed might be a better fit for Vehicle Position than Trip Updates. It might be hard to detect right now in a VP feed that the trip is over (since the train might keep moving, or GPS might not get a reading exactly at the position of the last stop), so having that extra explicit bit there makes sense to me.

> I am with @jxeeno on this. If you want arrivals and departures, a websocket-based vehicle position feed is exactly what you want here. OpenTripPlanner has supported this for 7 years now. The pull-based infrastructure is just legacy.

Haha yes, hardly the "small optimisation" I was looking for... but I don't disagree that the streaming model for instant real-time information seems like a good fit.

@jxeeno
Contributor

jxeeno commented Aug 19, 2020

> It might be hard to detect right now in a VP feed that the trip is over (since the train might keep moving, or GPS might not get a reading exactly at the position of the last stop), so having that extra explicit bit there makes sense to me.

So maybe a vehicle status enum in the VP feed is the way to go? Something that's able to describe the typical lifecycle of a vehicle's journey on a trip - comparable to what some producers provide in SIRI-VM feeds: e.g. assigned, atOrigin, completed, inProgress, offRoute.

Are there any producers who you're working with to provide this more explicit flag? I'd love to get a producer POV of this because I tend to find that producer AVLs are generally no better at determining ambiguous trip ends than what we are able to do as a consumer by tracking vehicle positions over time.
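As a sketch of what such a lifecycle enum could look like on the consumer side: the enum name, the member names, and the mapping to a rider-facing status below are all hypothetical. Only the five state names come from the SIRI-VM-style list above, and none of this exists in GTFS-realtime today.

```python
from enum import Enum


class VehicleTripStatus(Enum):
    """Hypothetical lifecycle states, named after the SIRI-VM-style list above."""
    ASSIGNED = "assigned"
    AT_ORIGIN = "atOrigin"
    IN_PROGRESS = "inProgress"
    OFF_ROUTE = "offRoute"
    COMPLETED = "completed"


def display_status(status: VehicleTripStatus) -> str:
    """Map a lifecycle state to a rider-facing label (the mapping is an assumption)."""
    if status is VehicleTripStatus.COMPLETED:
        return "arrived"
    if status in (VehicleTripStatus.IN_PROGRESS, VehicleTripStatus.OFF_ROUTE):
        # Treating an off-route vehicle as still running is a judgment call.
        return "running"
    return "not started"
```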

@barbeau
Collaborator Author

barbeau commented Aug 19, 2020

> If the primary reason for this is to reduce processing overhead, I think this would be better solved by working with producers to make sure they properly communicate the time when the TripUpdate was generated and populate the TripUpdate entity's timestamp with that value.

@jxeeno I agree with this! @paulswartz (MBTA) actually asked a question a month or two back on MobilityData Slack if anyone was using TripUpdate.timestamp, and it sounds like the answer is yes from your response.

The current definition of TripUpdate.timestamp has always bothered me:

> Moment at which the vehicle's real-time progress was measured. In POSIX time (i.e., the number of seconds since January 1st 1970 00:00:00 UTC).

IMHO this isn't a good definition for a prediction timestamp, as a prediction could change even without new vehicle information. Maybe something like this would be better:

> Moment at which the arrival and departure prediction(s) in this TripUpdate were generated. This value should only change when the prediction(s) (delay or time) within this TripUpdate were refreshed. In POSIX time (i.e., the number of seconds since January 1st 1970 00:00:00 UTC).

Improvements are welcome, and I'd be curious to know if anyone considers this a breaking change based on their current use of the field.

> So maybe a vehicle status enum in the VP feed is the way to go? Something that's able to describe the typical lifecycle of a vehicle's journey on a trip - comparable to what some producers provide in SIRI-VM feeds: e.g. assigned, atOrigin, completed, inProgress, offRoute.
>
> Are there any producers who you're working with to provide this more explicit flag? I'd love to get a producer POV of this because I tend to find that producer AVLs are generally no better at determining ambiguous trip ends than what we are able to do as a consumer by tracking vehicle positions over time.

@jxeeno @tleboulenge I agree that factual observations (vs. predicted values) about the vehicle state are better suited in VehiclePositions.

Note that we do currently have VehiclePosition.VehicleStopStatus, with the values:

| Value | Comment |
| --- | --- |
| INCOMING_AT | The vehicle is just about to arrive at the stop (on a stop display, the vehicle symbol typically flashes). |
| STOPPED_AT | The vehicle is standing at the stop. |
| IN_TRANSIT_TO | The vehicle has departed the previous stop and is in transit. |

In theory, if the vehicle is STOPPED_AT the last stop of the trip, it should signal that the trip is complete.

However, to my knowledge very few producers publish VehicleStopStatus, possibly because of the reason @jxeeno cites above.
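The STOPPED_AT inference could be sketched as follows. This assumes the consumer already knows the trip's final `stop_sequence` from the static GTFS `stop_times.txt`; the function name and the `last_stop_sequence` parameter are hypothetical, and the status is passed as a plain string rather than the protobuf enum.

```python
def trip_complete(current_status: str,
                  current_stop_sequence: int,
                  last_stop_sequence: int) -> bool:
    """Infer trip completion from a VehiclePosition.

    current_status: name of the VehicleStopStatus value, e.g. "STOPPED_AT".
    current_stop_sequence: stop_sequence the vehicle is at or heading to.
    last_stop_sequence: final stop_sequence of the trip, taken from static GTFS.
    """
    return (current_status == "STOPPED_AT"
            and current_stop_sequence == last_stop_sequence)
```

This only works for producers that publish the status and stop sequence, which is the limitation noted above.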

I'd also love to hear from more producers on this topic.

> If you want arrivals and departures, a websocket-based vehicle position feed is exactly what you want here.

Yes, although this is certainly a more complex solution, both from a spec perspective (given it's currently "undefined") and producer implementations. Let's continue that discussion on the DIFFERENTIAL issue at #84.

@jxeeno
Contributor

jxeeno commented Aug 21, 2020

> Improvements are welcome, and I'd be curious to know if anyone considers this a breaking change based on their current use of the field.

Yeah, I'd like to see it changed to something similar as well. We probably want to include changes to StopTimeUpdate.schedule_relationship as well so we don't miss stops being skipped, etc...

There are producers out there (e.g. Sydney Trains) that are following the spec -- i.e. populating the timestamp field with the time of the last reported vehicle position rather than the last prediction. As @tleboulenge alluded to, that's a problem when the AVL source is a transponder-based or track-side signalling-based system that only reports when a train begins to occupy a section of track. If a train is unexpectedly held at a danger signal, delay predictions are still being updated in the background because the train has not occupied the next section of track within the timetabled period. However, the timestamp isn't being incremented because no new location is provided.

@barbeau
Collaborator Author

barbeau commented Aug 21, 2020

> We probably want to include changes to StopTimeUpdate.schedule_relationship as well so we don't miss stops being skipped, etc...

Good point - here's some updated language that should cover any future field additions too, and attempts to better define "refreshed":

Moment at which the arrival and departure prediction(s), schedule_relationship, etc. in this TripUpdate were generated. This value should only change when the data within this TripUpdate is refreshed. If data is updated but the value remains the same, the timestamp should still be updated to the most recent data generation time. In POSIX time (i.e., the number of seconds since January 1st 1970 00:00:00 UTC).
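One way a producer might implement this wording is sketched below. It assumes a prediction engine that calls `regenerate` each time it recomputes a trip, while feed publication only reads the cached result; `PredictionCache`, `regenerate`, `build_feed`, and the `delays` payload are hypothetical names chosen for illustration.

```python
class PredictionCache:
    """Hypothetical producer-side cache separating data generation from publication."""

    def __init__(self):
        self._entries = {}  # trip_id -> latest generated TripUpdate (as a dict)

    def regenerate(self, trip_id, delays, now):
        """Record freshly generated data for a trip.

        The timestamp advances on every regeneration, even if the values are
        identical to the previous ones, as the proposed wording requires.
        """
        self._entries[trip_id] = {
            "trip_id": trip_id,
            "delays": dict(delays),
            "timestamp": now,  # POSIX seconds at which the data was generated
        }

    def build_feed(self):
        """Publish the cached updates without touching their timestamps."""
        return [dict(entry) for entry in self._entries.values()]
```

Under this sketch, a train held at a signal with no new position report still gets a newer timestamp whenever its predictions are regenerated, while republishing unchanged cached data in later feed messages leaves the timestamp alone.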

@jxeeno
Contributor

jxeeno commented Sep 25, 2020

@barbeau is this one on your radar any time soon? if not, happy to raise a PR for it.

@barbeau
Collaborator Author

barbeau commented Sep 25, 2020

@jxeeno I probably won't get to this until early next week, so if you'd like to beat me to it with a new PR, feel free! And we can close this one out.

@barbeau
Collaborator Author

barbeau commented Sep 28, 2020

Per the above discussion, I'm closing this proposal in favor of a new proposal at #250 that updates the definition of TripUpdate.timestamp.
