common GTFS time bug "on those days that the DST <-> standard time switch occurs on" also affects this package #175

Open
derhuerst opened this issue Nov 25, 2021 · 10 comments

Comments

@derhuerst

derhuerst commented Nov 25, 2021

GTFS Time is not defined relative to midnight, but relative to noon - 12h. While that makes "writing" GTFS feeds easier, it makes processing a lot harder.

Expected functionality

As explained in my note about GTFS Time values, with the Europe/Berlin time zone (the shift from +01:00 standard time to +02:00 DST occurs at 2021-03-28T02:00+01:00), I expect

  • the departure_time of 00:30 of a trip running on 2021-03-28 to happen at 1616884200/2021-03-28T00:30+02:00, not at 1616887800/2021-03-28T00:30+01:00;
  • the departure_time of 06:30 of a trip running on 2021-03-28 to happen at 1616905800/2021-03-28T06:30+02:00, not at 1616909400/2021-03-28T06:30+01:00.
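The two expectations above can be reproduced with a small sketch. It is only an illustration in Python, not tidytransit code: the function names `gtfs_time_to_unix` and `naive_midnight_to_unix` are made up for this example, and it assumes Python 3.9+ with time zone data available to `zoneinfo`.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo  # needs system tz data (or the tzdata package)


def _seconds(gtfs_time: str) -> int:
    """Parse 'HH:MM' or 'HH:MM:SS' (hours may exceed 24) into seconds."""
    parts = [int(p) for p in gtfs_time.split(":")]
    hours, minutes = parts[0], parts[1]
    secs = parts[2] if len(parts) > 2 else 0
    return hours * 3600 + minutes * 60 + secs


def gtfs_time_to_unix(service_date: date, gtfs_time: str, tz_name: str) -> int:
    """Hypothetical helper: offset from 'noon minus 12h' of the service day."""
    noon = datetime.combine(service_date, time(12, 0), tzinfo=ZoneInfo(tz_name))
    # Subtract the 12h on the absolute (UTC) timeline, not in local wall-clock time.
    start_of_service_day = int(noon.timestamp()) - 12 * 3600
    return start_of_service_day + _seconds(gtfs_time)


def naive_midnight_to_unix(service_date: date, gtfs_time: str, tz_name: str) -> int:
    """Hypothetical helper: the intuitive but wrong 'local midnight + offset'."""
    midnight = datetime.combine(service_date, time(0, 0), tzinfo=ZoneInfo(tz_name))
    return int(midnight.timestamp()) + _seconds(gtfs_time)


day = date(2021, 3, 28)  # DST starts in Europe/Berlin on this day
for t in ("00:30", "06:30"):
    print(t, gtfs_time_to_unix(day, t, "Europe/Berlin"),
          naive_midnight_to_unix(day, t, "Europe/Berlin"))
# 00:30 1616884200 1616887800
# 06:30 1616905800 1616909400
```

On any day without a DST switch, both functions return the same value, which is why the difference is easy to miss.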

Describe the bug

I'm very inexperienced with R and not that familiar with this code base, but it seems that tidytransit is affected by this problem on those days that the DST <-> standard time switch occurs on.

I'm not sure how that actually manifests in tidytransit's output, but I assume that wrong delays will be calculated, or that realtime data can't be matched against static data.

I tried to find some places in the code base:


related: google/transit#15

@derhuerst
Author

If this is not the case, please excuse me for the noise, and close this Issue!

@mpadge
Contributor

mpadge commented Nov 25, 2021

Thanks @derhuerst, that's all very concerning! I don't have time right now to look further, but will definitely be keeping an eye on here to see where this takes us. It must also affect gtfsrouter, because I don't explicitly take care of that kind of situation. A general solution would be really helpful for all.

@polettif
Contributor

Thanks, I have some trouble wrapping my head around it at the moment. However, as an initial reaction I'd say this is more of an issue for feed providers than consumers. tidytransit does not use UNIX datetime values (like 2021-03-28T00:30+01:00) only hms values (00:30:00 without date information). This might be a problem in filter_feed_by_date where we don't include trips running past midnight from previous dates.

Generally, I haven't really run into issues with the feeds I work with (well, we rarely focus on Saturday nights). Also, I'd really like to avoid adding timezones in tidytransit...

@derhuerst
Author

Thanks, I have some trouble wrapping my head around it at the moment.

I can relate. 😄 I keep forgetting the details of this weird date/time handling after a while, and have to get familiar with it again.

However, as an initial reaction I'd say this is more of an issue for feed providers than consumers.

Both, right? Providers/authors need to know about this when transferring their timetables into GTFS, and consumers need to know about it when processing GTFS.

tidytransit does not use UNIX datetime values (like 2021-03-28T00:30+01:00) only hms values (00:30:00 without date information).

I'm not sure about frequencies.R, but raptor.R works with absolute times, right? How would you otherwise handle routing across date "boundaries" (a.k.a. routing across trips defined with different service days)?

This might be a problem in filter_feed_by_date where we don't include trips running past midnight from previous dates.

Sounds like another issue.

Generally, I haven't really run into issues with the feeds I work with (well, we rarely focus on Saturday nights).

In my experience, this kind of bug is very hard to come across in practice, but at least in Germany, many trips run across the DST <-> standard time boundaries.

Another problem is that so much public-transport-related software has buggy date/time handling, even widely used packages, that it's hard to identify the intended behaviour.

Also, I'd really like to avoid adding timezones in tidytransit...

In this case, I'd propose to clearly document this shortcoming somewhere, so that users of tidytransit are aware of the consequences.

@mpadge
Contributor

mpadge commented Nov 25, 2021

Totally agree on your concluding statement @derhuerst!

@polettif
Contributor

Both, right? Providers/authors need to know about this when transferring their timetables into GTFS, and consumers need to know about it when processing GTFS.

Yes, of course. I guess my point is that tidytransit should just handle feeds "as is" and not try to fix incorrect data. I'm still not sure if that point really applies though 😅

I'm not sure about frequencies.R, but raptor.R works with absolute times, right? How would you otherwise handle routing across date "boundaries" (a.k.a. routing across trips defined with different service days)?

Well, it doesn't handle routing across date boundaries... However, that's not explicitly documented. The date used in filter_feed_by_date isn't the actual date but the date a trip belongs to ("Betriebstag"). It filters stop_times.txt by service_ids running on that day, there's no padding with trips running on service dates before or after. A new function like filter_feed_by_timeframe (with datetimes as limits) could cover that. But as you said, that's another issue.
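A rough sketch of what filtering by a datetime window (instead of by service day) could look like, in Python for illustration only. `filter_by_timeframe` and `to_unix` are hypothetical names, not part of tidytransit, and the departure records are invented. Each record carries its service day ("Betriebstag") and a GTFS time, which is resolved to an absolute time before comparing it against the window.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo  # needs system tz data (or the tzdata package)


def to_unix(service_date, gtfs_time, tz):
    """GTFS time is measured from 'noon minus 12h' of the service day."""
    h, m, s = (int(p) for p in gtfs_time.split(":"))
    noon = datetime.combine(service_date, time(12, 0), tzinfo=tz)
    return int(noon.timestamp()) - 12 * 3600 + h * 3600 + m * 60 + s


def filter_by_timeframe(departures, start, end):
    """Keep (service_date, gtfs_time, trip_id) records inside [start, end).

    start and end are timezone-aware datetimes in the feed's time zone.
    """
    lo, hi = start.timestamp(), end.timestamp()
    return [d for d in departures if lo <= to_unix(d[0], d[1], start.tzinfo) < hi]


tz = ZoneInfo("Europe/Berlin")
departures = [  # invented sample data
    (date(2021, 10, 30), "22:00:00", "evening"),
    (date(2021, 10, 30), "25:30:00", "night"),    # previous service day, past 24:00
    (date(2021, 10, 31), "00:30:00", "early"),
    (date(2021, 10, 31), "05:00:00", "morning"),
]
window = (datetime(2021, 10, 31, 0, 0, tzinfo=tz), datetime(2021, 10, 31, 3, 0, tzinfo=tz))
print([d[2] for d in filter_by_timeframe(departures, *window)])
# ['night', 'early']
```

The window picks up the trip from the previous service day that runs past 24:00, which a filter on service day alone would miss.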

In my experience, this kind of bug is very hard to come across in practice, but at least in Germany, many trips run across the DST <-> standard time boundaries.

They exist in Switzerland as well, there are actually more than 6000 trips running between 00:00 and 03:00 on October 31st this year (assuming my calculations in this gist are correct).

In this case, I'd propose to clearly document this shortcoming somewhere, so that users of tidytransit are aware of the consequences.

Definitely. A lot of this issue comes down to properly documenting assumptions and intended behavior.

@derhuerst
Author

Both, right? Providers/authors need to know about this when transferring their timetables into GTFS, and consumers need to know about it when processing GTFS.

Yes, of course. I guess my point is that tidytransit should just handle feeds "as is" and not try to fix incorrect data. I'm still not sure if that point really applies though 😅

Just to prevent a misunderstanding:
A GTFS provider can author perfectly valid GTFS data that follows GTFS's definition of day-relative time, and AFAICT tidytransit wouldn't parse it correctly. This is not about handling the feed "as is", as there is no other reasonable assumption for tidytransit to make than that a GTFS feed follows the GTFS spec.

If you're talking about programmatically "guessing" that the provider/author of the GTFS feed violated the spec by using a more intuitive definition of day-relative time: Let's not do that. 😄

@tbuckl changed the title from 'potentially wrong handling of GTFS Time values' to 'common GTFS time bug "on those days that the DST <-> standard time switch occurs on"' on Mar 29, 2024
@tbuckl
Member

tbuckl commented Mar 29, 2024

It would be great if we could link to a concise and clear explanation of this bug, with demonstrated impact for users.

@tbuckl changed the title from 'common GTFS time bug "on those days that the DST <-> standard time switch occurs on"' to 'common GTFS time bug "on those days that the DST <-> standard time switch occurs on" also affects this package' on Mar 29, 2024
@derhuerst
Author

We could add this to https://gist.github.com/derhuerst/574edc94981a21ef0ce90713f1cff7f6. You're very welcome to propose in a comment what to add to it!

4 participants