timepoint=empty (with no times specified) #61

Open
antrim opened this issue May 30, 2017 · 8 comments · May be fixed by #474
Labels
GTFS Schedule: Issues and Pull Requests that focus on GTFS Schedule
Status: Ready for Pull Request: Issues that are ready to be transferred to the Pull Request stage.

Comments

@antrim
Contributor

antrim commented May 30, 2017

The intended meaning of an empty stop_times.timepoint value may be unclear. The field definition is quoted below from stop_times.txt. See, in particular, the statements that are somewhat at odds with each other (shown in bold italic in the original comment).

Field Name Required Details
timepoint Optional The timepoint field can be used to indicate if the specified arrival and departure times for a stop are strictly adhered to by the transit vehicle or if they are instead approximate and/or interpolated times. The field allows a GTFS producer to provide interpolated stop times that potentially incorporate local knowledge, but still indicate if the times are approximate. For stop-time entries with specified arrival and departure times, valid values for this field are:
* empty - Times are considered exact.
* 0 - Times are considered approximate.
* 1 - Times are considered exact.
For stop-time entries without specified arrival and departure times, feed consumers must interpolate arrival and departure times. Feed producers may optionally indicate that such an entry is not a timepoint (value=0) but it is an error to mark a entry as a timepoint (value=1) without specifying arrival and departure times.

If there are no times provided and stop_times.timepoint is empty, then should feed consumers therefore assume that times should be interpolated? Is there any way of indicating times should not be interpolated, and would that ever be necessary?
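To make the ambiguity concrete, here is a minimal sketch of one literal reading of the definition quoted above. The function name `interpret_stop_time` and its return labels are hypothetical and not part of GTFS or any library. It also assumes that "specified arrival and departure times" means both fields are non-empty, which the quoted text does not spell out.

```python
from typing import Optional


def interpret_stop_time(arrival: Optional[str],
                        departure: Optional[str],
                        timepoint: Optional[str]) -> str:
    """Classify one stop_times.txt row under a literal reading of the quoted text.

    Hypothetical helper for illustration only.
    """
    # Assumption: a row "has times" only when both fields are filled in.
    has_times = bool(arrival) and bool(departure)
    tp = (timepoint or "").strip()

    if has_times:
        if tp == "0":
            return "approximate"
        if tp in ("", "1"):
            return "exact"
        return "invalid"

    # No times specified: consumers must interpolate, and timepoint=1 is an error.
    if tp == "1":
        return "error"
    if tp in ("", "0"):
        return "interpolate"
    return "invalid"
```

Under this reading, an empty timepoint means "exact" when times are present but "interpolate" when they are absent, which is the tension raised in the question above.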

@antrim
Contributor Author

antrim commented May 30, 2017

Relevant thread from GTFS-changes here: https://groups.google.com/d/msg/transit-developers/dwd96EwJqIc/N5DJxP8VCQAJ

@barbeau
Collaborator

barbeau commented May 30, 2017

My expectation for this feature when proposed/accepted was that there would be two types of feeds:

  1. Legacy feeds before the timepoint field existed, where the CSV header timepoint does not exist in stop_times.txt
  2. New feeds going forward where timepoint field is added to stop_times.txt header, and all records in stop_times.txt have either a 0 or a 1

As currently written, empty means that stop_times.txt is missing the timepoint CSV header (at least that's how I always read it), and therefore there are no timepoint values for any records in stop_times.txt. In other words, it's a legacy feed as described in 1) above. IMHO you shouldn't have a stop_times.txt file with a timepoint CSV header where some records have empty timepoint values; this should be an error. I think some of the wording is a bit awkward because it was edited from the original text.

To clarify this, I suggest we remove * empty - Times are considered exact as a valid value and update the text:

Field Name Required Details
timepoint Optional The timepoint field can be used to indicate if the specified arrival and departure times for a stop are strictly adhered to by the transit vehicle or if they are instead approximate and/or interpolated times. The field allows a GTFS producer to provide interpolated stop times that potentially incorporate local knowledge, but still indicate if the times are approximate. If a GTFS feed stop_times.txt file does not contain the timepoint field, feed producers should only provide arrival and departure times for stops that are timepoints. In this case, consumers must interpolate all entries that do not have specified arrival and departure times. If a GTFS feed stop_times.txt file contains the timepoint field, every row in stop_times.txt should have a value for timepoint. Valid values for this field are:
* 0 - Times are considered approximate.
* 1 - Times are considered exact.
If timepoint has a value of 0, arrival and departure times do not need to be provided for that stop. If timepoint has a value of 1, arrival and departure times must be provided for that stop.

@barbeau barbeau added the GTFS Schedule Issues and Pull Requests that focus on GTFS Schedule label Aug 27, 2018
@antrim
Contributor Author

antrim commented Oct 1, 2020

The spec currently reads:

0 - Times are considered approximate.
1 or empty - Times are considered exact.

Other notes are provided in the arrival_time and departure_time field definitions.

Disallowing an empty value for timepoint would not technically be a backwards-compatible change (though it's a very minor breaking change).

Should we:

  • Discuss a change to the spec?
  • Discuss a change to the Best Practices and/or validation software?
  • Do nothing and/or close this issue?
  • Something else?

@barbeau
Collaborator

barbeau commented Oct 1, 2020

The definition of "empty" within the GTFS spec hasn't been standardized, and generally speaking it should be - although this is a broader issue than just about the timepoint field.

I'd still like to see the spec clarified to define what is and is not allowed for this field. In retrospect, IMHO the way the definition is written makes it hard to decipher the original intent.

This also needs some historical context of GTFS. Originally, prior to the timepoint field, GTFS spec said you should only provide arrival and departure times for stop_times.txt records that are timepoints. So, if stops 1 and 4 were timepoints, but 2 and 3 were not, you'd have a valid GTFS that looks like this:

stop_sequence arrival_time departure_time
1 00:00:00 00:00:00
2
3
4 00:10:00 00:10:00
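With a feed like this, each consumer has to fill in times for stops 2 and 3 on its own. The following is a minimal sketch of one possible approach, assuming times are spread evenly by stop count between the surrounding timepoints. That is an assumption made for illustration: the spec does not prescribe an interpolation method, and a consumer could instead weight by distance (for example using shape_dist_traveled). The helper names are hypothetical.

```python
from typing import List, Optional


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS or HH:MM:SS string to seconds (hours may exceed 24)."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: int) -> str:
    """Convert seconds back to an HH:MM:SS string."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def interpolate_times(times: List[Optional[str]]) -> List[str]:
    """Fill missing entries evenly between the nearest timepoints on either side.

    Assumes the first and last stops of the trip have times.
    """
    if not times:
        return []
    secs = [to_seconds(t) if t else None for t in times]
    if secs[0] is None or secs[-1] is None:
        raise ValueError("first and last stop must have a time")

    result = list(secs)
    prev = 0  # index of the most recent stop that has a time
    for i in range(1, len(secs)):
        if secs[i] is None:
            continue
        gap = i - prev
        for k in range(1, gap):
            # Linear interpolation by position in the stop sequence.
            result[prev + k] = secs[prev] + (secs[i] - secs[prev]) * k // gap
        prev = i
    return [to_hms(s) for s in result]


print(interpolate_times(["00:00:00", None, None, "00:10:00"]))
```

Because the method is left to the consumer, two consumers using different weighting would show different times for the same stop.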

However, producers realized that for multiple consumers to show consistent scheduled arrival and departure times at each stop (i.e., so consumers didn't interpolate them and come up with their own values), they would need to share arrival/departure times for each stop in the trip. A large number of GTFS producers started doing the following, even though technically it was against the GTFS spec:

stop_sequence arrival_time departure_time
1 00:00:00 00:00:00
2 00:02:00 00:02:00
3 00:08:00 00:08:00
4 00:10:00 00:10:00

Now, to consumers, all the stops looked like timepoints, even though that wasn't the producer's intent.

The timepoint field was added to give producers a legitimate way to share times for each stop in the trip, while still correctly indicating which stops are timepoints.

So if producers want to provide times for every stop, they shouldn't be doing the above, and instead should provide the timepoint field:

stop_sequence arrival_time departure_time timepoint
1 00:00:00 00:00:00 1
2 00:02:00 00:02:00 0
3 00:08:00 00:08:00 0
4 00:10:00 00:10:00 1

This means as of today, IMHO, there are two valid ways to share arrival and departure times in GTFS. The first is the original GTFS spec without the timepoint field, where times are omitted for stops that are not timepoints:

stop_sequence arrival_time departure_time
1 00:00:00 00:00:00
2
3
4 00:10:00 00:10:00

Or, if they want to provide times for every stop, they should provide timepoint values for the entire stop_times.txt.

stop_sequence arrival_time departure_time timepoint
1 00:00:00 00:00:00 1
2 00:02:00 00:02:00 0
3 00:08:00 00:08:00 0
4 00:10:00 00:10:00 1

Here's a suggestion for clarifying the spec to try and capture the above intent:

Field Name Required Details
timepoint Optional The timepoint field can be used to indicate if the specified arrival and departure times for a stop are strictly adhered to by the transit vehicle or if they are instead approximate and/or interpolated times. The field allows a GTFS producer to provide interpolated stop times that potentially incorporate local knowledge, but still indicate if the times are approximate. If a GTFS feed stop_times.txt file does not contain the timepoint field, feed producers must only provide arrival and departure times for stops that are timepoints - arrival and departure times should be left blank for stops that are not timepoints. In this case, consumers must interpolate all entries that do not have specified arrival and departure times. If a GTFS feed stop_times.txt file contains the timepoint field, every row in stop_times.txt should have a value for timepoint. Valid values for this field are:
* 0 - Times are considered approximate.
* 1 - Times are considered exact.
If timepoint has a value of 0, arrival and departure times do not need to be provided for that stop. If timepoint has a value of 1, arrival and departure times must be provided for that stop.

Alternate suggestions, or alternate interpretations of the spec, are welcome.

@isabelle-dr
Collaborator

There has been a discussion in the canonical GTFS schedule validator around this issue, and we noticed that existing data had implemented timepoint in two opposite ways:

  1. Considering that stop_times.timepoint = "" means times are exact. This is based on stop_times.timepoint description.

1 or empty - Times are considered exact.

Sample from the Greater Glens Falls Transit (Mobility Database link), which doesn't contain any 1's in the timepoint column.

stop_sequence arrival_time departure_time timepoint
0 13:15:00 13:15:00
1 13:30:00 13:30:00 0
2 13:40:00 13:40:00 0
3 13:50:00 13:50:00
4 14:00:00 14:00:00 0

  2. Considering that stop_times.timepoint = "" means times are approximate. This is based on stop_times.departure_time and stop_times.arrival_time description:

Conditionally Required: Required for timepoint=1. Optional otherwise.

Sample from the Squaxin Island Transit (Mobility Database link), which doesn't contain any 0's in the timepoint column.

stop_sequence arrival_time departure_time timepoint
1 8:30:00 8:30:00 1
2 8:31:01 8:31:01 1
3
4
5
6
7
8
9
10 8:45:00 8:45:00 1
11 8:55:00 8:55:00 1

The GTFS validator won't interpret empty timepoint values as equivalent to 1 in order to avoid false positives, but it will flag:

  1. a record with timepoint = 1 and (departure_time or arrival_time) missing (referencing the specification), as an error.
  2. a record with a missing value in timepoint (only if the column exists, referencing the specification although not said explicitly), as a warning.
  3. a stop_times.txt file with no timepoint column (referencing the best practices), as a warning.
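
The three checks listed above can be sketched roughly as follows. This is not the canonical validator's code, only an illustration using Python's standard `csv` module. The function name `check_timepoints`, the notice tuple shape, and the message strings are all hypothetical.

```python
import csv
import io
from typing import List, Tuple


def check_timepoints(stop_times_csv: str) -> List[Tuple[str, int, str]]:
    """Return (severity, row_number, message) notices for a stop_times.txt body.

    row_number is 1-based over data rows; 0 refers to the header.
    """
    reader = csv.DictReader(io.StringIO(stop_times_csv))
    notices: List[Tuple[str, int, str]] = []

    # Check 3: no timepoint column at all.
    if "timepoint" not in (reader.fieldnames or []):
        notices.append(("warning", 0, "stop_times.txt has no timepoint column"))
        return notices

    for n, row in enumerate(reader, start=1):
        tp = (row.get("timepoint") or "").strip()
        arrival = (row.get("arrival_time") or "").strip()
        departure = (row.get("departure_time") or "").strip()
        if tp == "":
            # Check 2: column exists but this row has no value.
            notices.append(("warning", n, "missing timepoint value"))
        elif tp == "1" and not (arrival and departure):
            # Check 1: timepoint=1 requires both times.
            notices.append(("error", n, "timepoint=1 without arrival_time or departure_time"))
    return notices
```

Note that an empty timepoint is reported as a warning and is never treated as equivalent to 1, matching the behaviour described above.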

I think @barbeau's suggestion above would help a lot in getting timepoint implemented consistently.

@barbeau
Collaborator

barbeau commented Dec 13, 2021

As discussed in this thread:
MobilityData/gtfs-validator#887 (comment)

...how timepoint works in conjunction with frequencies.txt exact_times=0 trips should also be clarified.

MobilityData/gtfs-validator#887 (comment) says:

A trip can be frequency-based, so the exact start of the trip is unknown. However, once the vehicle starts moving, it visits each stop in predictable time since the time difference between each stop is fixed (hence timepoint=1). This is a usual case for metro/subway/underground. So, frequency-based trips may have timepoint=1.

...but this isn't documented anywhere in the spec.

My comment here:
MobilityData/gtfs-validator#887 (comment)

...said:

I can't think of another way in the validator (based on the current spec at least) to differentiate the valid use case from agencies mistakenly assigning wall-clock timepoints to exact_times=0 trips. One approach that would require a spec change is for consumers to start exact_times=0 trips at midnight to clearly differentiate the use of stop_times.txt time records for time offsets rather than wall-clock times.
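
As a rough illustration of the "start at midnight" idea, the sketch below rebases a trip's stop times so the first stop is 00:00:00 and later stops become offsets from it. This is not current GTFS behaviour: as noted above it would require a spec change, and the helper names are hypothetical.

```python
from typing import List


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS or HH:MM:SS string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: int) -> str:
    """Convert seconds back to an HH:MM:SS string."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def as_offsets(times: List[str]) -> List[str]:
    """Rebase stop times so the first stop is at 00:00:00.

    Assumes every entry has a time and times are non-decreasing along the trip.
    """
    secs = [to_seconds(t) for t in times]
    base = secs[0]
    return [to_hms(s - base) for s in secs]


print(as_offsets(["8:30:00", "8:31:01", "8:45:00"]))
```

With times expressed this way, a trip whose first stop is not at 00:00:00 would stand out as using wall-clock times rather than offsets.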

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Dec 14, 2022
@isabelle-dr
Collaborator

keep open

@github-actions github-actions bot removed the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Dec 16, 2022
@emmambd emmambd added the Status: Ready for Pull Request Issues that are ready to be transferred to the Pull Request stage. label May 10, 2023
@isabelle-dr isabelle-dr linked a pull request Jun 7, 2024 that will close this issue