Prevent ambiguous duplicate trips generated from frequency-based service #604

@PatrickSteil

Describe the problem

GTFS frequencies.txt allows defining multiple frequency blocks for the same trip_id.
When these frequency blocks overlap in time, they can expand into duplicate trips that share the same trip_id and the same departure time.

This creates ambiguity once frequency-based service is materialized into individual trip instances, which is required by downstream consumers such as GTFS Realtime.

GTFS Realtime identifies trips defined by a frequency using the tuple:

(trip_id, start_date, start_time)

and assumes this tuple uniquely identifies a single vehicle run. However, the GTFS Schedule specification currently allows frequency definitions that violate this assumption after expansion.

As a result:

  • GTFS Schedule allows ambiguous trip instances
  • GTFS Realtime cannot disambiguate them
  • Consumers must either guess, drop data, or rewrite feeds

This represents a mismatch between GTFS Schedule and GTFS Realtime semantics.

Use cases

Assume a feed defines the following frequencies:

trip_id  start_time  end_time  headway_secs
A        08:00       10:00     300
A        08:10       10:00     600

Two frequency blocks are defined for trip A: one starting at 08:00 with a 5-minute headway, the other starting at 08:10 with a 10-minute headway.
The primary key (trip_id, start_time) is not violated, so this feed is valid under the current GTFS Schedule specification.
After expanding these frequency blocks into concrete trips, multiple departures occur at the same times, for example:

  • 08:10
  • 08:20
  • 08:30

These departures are generated by both frequency blocks.
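
The expansion above can be reproduced with a short script. This is a minimal sketch, not a reference implementation: the helper names (`to_secs`, `to_hhmmss`, `expand`, `find_collisions`) are made up for illustration, and it assumes that end_time is exclusive, i.e. departures are generated in [start_time, end_time).

```python
from collections import defaultdict


def to_secs(value):
    """Convert 'HH:MM' or 'HH:MM:SS' to seconds since midnight."""
    parts = [int(p) for p in value.split(":")]
    while len(parts) < 3:
        parts.append(0)
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


def to_hhmmss(secs):
    """Format seconds since midnight as 'HH:MM:SS'."""
    return f"{secs // 3600:02d}:{secs % 3600 // 60:02d}:{secs % 60:02d}"


def expand(block):
    """Yield (trip_id, start_time) for every departure of one frequency block.

    Assumption of this sketch: end_time is treated as exclusive.
    """
    trip_id, start, end, headway_secs = block
    for t in range(to_secs(start), to_secs(end), headway_secs):
        yield trip_id, to_hhmmss(t)


def find_collisions(frequencies):
    """Map each (trip_id, start_time) produced by more than one block
    to the start times of the blocks that produced it."""
    sources = defaultdict(list)
    for block in frequencies:
        for key in expand(block):
            sources[key].append(block[1])
    return {key: blocks for key, blocks in sources.items() if len(blocks) > 1}


# The two blocks from the example feed.
frequencies = [
    ("A", "08:00", "10:00", 300),
    ("A", "08:10", "10:00", 600),
]

collisions = find_collisions(frequencies)
for (trip_id, start_time), blocks in sorted(collisions.items()):
    print(trip_id, start_time, "generated by blocks starting at", blocks)
```

Under these assumptions, every departure of the 10-minute block (08:10 through 09:50) coincides with a departure of the 5-minute block.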

A GTFS Realtime TripDescriptor identifying a trip as:

trip_id = "A"
start_date = "YYYYMMDD"
start_time = "08:10:00"

cannot be unambiguously matched to a single expanded trip instance, because multiple frequency blocks generate the same (trip_id, start_date, start_time) tuple.

Since GTFS Realtime does not include any reference to the originating frequency block, consumers cannot reliably apply realtime updates in this scenario.

Proposed solution

Ensure that after expanding frequency-based service, each concrete trip instance is uniquely identifiable by:
(trip_id, start_date, start_time)

This would align GTFS Schedule semantics with GTFS Realtime, which relies on this tuple to identify individual vehicle runs.
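
One way such a rule could be checked is sketched below: for each trip_id, compare every pair of frequency blocks and report the pairs that share at least one departure time. This is only an illustration of the idea, not an existing validator rule. The function names and the sample rows for trip B are hypothetical, and end_time is again assumed to be exclusive. start_date is left out because a frequency block applies identically on every service day of the trip.

```python
import csv
import io
from itertools import combinations

# Sample frequencies.txt content. Trip A is the example from this issue;
# trip B is a made-up, non-overlapping case added for contrast.
FREQUENCIES_TXT = """trip_id,start_time,end_time,headway_secs
A,08:00:00,10:00:00,300
A,08:10:00,10:00:00,600
B,06:00:00,08:00:00,600
B,08:00:00,10:00:00,300
"""


def secs(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(p) for p in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def departures(row):
    """Departure times of one block, with end_time treated as exclusive
    (an assumption of this sketch)."""
    return range(secs(row["start_time"]), secs(row["end_time"]),
                 int(row["headway_secs"]))


def ambiguous_block_pairs(rows):
    """Return (trip_id, block_a_start, block_b_start, shared_count) for
    every pair of blocks of the same trip with a common departure time."""
    by_trip = {}
    for row in rows:
        by_trip.setdefault(row["trip_id"], []).append(row)

    violations = []
    for trip_id, blocks in by_trip.items():
        for a, b in combinations(blocks, 2):
            shared = set(departures(a)) & set(departures(b))
            if shared:
                violations.append(
                    (trip_id, a["start_time"], b["start_time"], len(shared))
                )
    return violations


rows = list(csv.DictReader(io.StringIO(FREQUENCIES_TXT)))
violations = ambiguous_block_pairs(rows)
for violation in violations:
    print(violation)
```

With the sample data, only trip A is reported. The two blocks of trip B meet at 08:00 but do not share a departure, because the first block ends where the second begins.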

Alternatively, extend GTFS Realtime to include the start time of the frequency block that generated the expanded trip, for example by identifying trips using:

(trip_id, start_date, start_time, frequency_start_time)

This would allow GTFS Realtime to disambiguate trips originating from different frequency blocks with overlapping departure times.
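
The effect of the longer key can be illustrated by counting how often each key occurs after expansion. In this sketch, frequency_start_time is the hypothetical field proposed above and is not part of GTFS Realtime today. The service date is an arbitrary placeholder, and end_time is assumed to be exclusive.

```python
from collections import Counter


def secs(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(p) for p in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def hms(total):
    """Format seconds since midnight as 'HH:MM:SS'."""
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


# (trip_id, start_time, end_time, headway_secs) from the example feed.
blocks = [
    ("A", "08:00:00", "10:00:00", 300),
    ("A", "08:10:00", "10:00:00", 600),
]

START_DATE = "20250101"  # arbitrary placeholder service date

# One tuple per expanded trip instance:
# (trip_id, start_date, start_time, frequency_start_time)
instances = [
    (trip_id, START_DATE, hms(t), block_start)
    for trip_id, block_start, block_end, headway_secs in blocks
    for t in range(secs(block_start), secs(block_end), headway_secs)
]

# Current GTFS Realtime key: (trip_id, start_date, start_time)
short_keys = Counter(instance[:3] for instance in instances)
# Proposed key with the hypothetical frequency_start_time appended
long_keys = Counter(instances)

ambiguous_short = [key for key, count in short_keys.items() if count > 1]
ambiguous_long = [key for key, count in long_keys.items() if count > 1]

print(len(instances), "instances")
print(len(ambiguous_short), "ambiguous under the 3-field key")
print(len(ambiguous_long), "ambiguous under the 4-field key")
```

The 4-field key is unique here because frequencies.txt already requires (trip_id, start_time) to be unique per block, and departures within a single block never repeat.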

Additional information

Even if this pattern is uncommon in production GTFS feeds, the current specification allows frequency-based schedules that expand into ambiguous trip instances.
