Missing arrival/departure_time entries are filled with 00:00:00 #195

Closed
dhersz opened this issue Feb 9, 2023 · 1 comment · Fixed by #196

dhersz commented Feb 9, 2023

Hi. I've noticed that read_gtfs() fills missing arrival_time/departure_time entries (which the spec allows to be empty) with 00:00:00. Take Google's example feed (https://developers.google.com/transit/gtfs/examples/gtfs-feed), which is included in gtfstools for testing purposes:

path <- system.file("extdata/ggl_gtfs.zip", package = "gtfstools")

gtfs <- tidytransit::read_gtfs(path)

# tidytransit fills empty arrival/departure times with 00'00"
gtfs$stop_times[, c("trip_id", "arrival_time", "departure_time", "stop_id")]
#> # A tibble: 11 × 4
#>    trip_id arrival_time departure_time stop_id
#>    <chr>   <time>       <time>         <chr>  
#>  1 AWE1    06'10"       06'10"         S1     
#>  2 AWE1    00'00"       00'00"         S2     
#>  3 AWE1    06'20"       06'30"         S3     
#>  4 AWE1    00'00"       00'00"         S5     
#>  5 AWE1    06'45"       06'45"         S6     
#>  6 AWD1    06'10"       06'10"         S1     
#>  7 AWD1    00'00"       00'00"         S2     
#>  8 AWD1    06'20"       06'20"         S3     
#>  9 AWD1    00'00"       00'00"         S4     
#> 10 AWD1    00'00"       00'00"         S5     
#> 11 AWD1    06'45"       06'45"         S6

The issue persists when saving the GTFS object to disk with tidytransit::write_gtfs(), in which case the 00'00" entries are written as 00:00:00. MobilityData's GTFS Validator considers this an error, as the code below shows. You'll need the dev version of gtfstools for that (see footnote 1). If you run the code interactively, the function opens a validation report in HTML format, but I'll include the error in the snippet below anyway:

# results in validation error using MobilityData gtfs validator

validator_path <- gtfstools::download_validator(tempdir())
output_dir <- tempfile()
gtfstools::validate_gtfs(gtfs, output_dir, validator_path)

result_json <- file.path(output_dir, "report.json")
result_json <- jsonlite::read_json(result_json)

error <- result_json$notices[[4]]

error$code
#> [1] "stop_time_with_arrival_before_previous_departure_time"
data.table::rbindlist(error$sampleNotices)
#>    csvRowNumber prevCsvRowNumber tripId arrivalTime departureTime
#> 1:            3                2   AWE1    00:00:00      00:06:10
#> 2:            5                4   AWE1    00:00:00      00:06:30
#> 3:            8                7   AWD1    00:00:00      00:06:10
#> 4:           10                9   AWD1    00:00:00      00:06:20

I don't know how to handle this case with {hms}, but I'd say that representing missing times with NA seems to be the best strategy. gtfstools represents times as strings, so missing times are represented as "", but I'm not sure if something like that is possible with hms objects.
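For what it's worth, here is a minimal sketch of what NA-based handling could look like. It is not how tidytransit actually parses times: `to_seconds()` is a hypothetical helper written only for this illustration, and the input vector is made up to mimic a stop_times column with one empty entry. It only relies on `hms::hms()` accepting NA seconds and on `is.na()` working on the result.

```r
# Raw time strings as they might appear in stop_times.txt;
# "" marks a missing arrival/departure time (made-up sample, not the real feed).
raw <- c("06:10:00", "", "06:20:00")

# Hypothetical helper: convert "HH:MM:SS" to seconds, returning NA for
# anything that does not split into exactly three parts (e.g. "").
# Doing the arithmetic by hand also keeps hours >= 24 intact.
to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    if (length(parts) != 3) return(NA_real_)
    sum(as.numeric(parts) * c(3600, 60, 1))
  }, numeric(1))
}

# Build the hms vector; the empty entry is kept as NA instead of 00:00:00
times <- hms::hms(seconds = to_seconds(raw))
is.na(times)
#> [1] FALSE  TRUE FALSE

# When writing back to disk, NA could be turned into "" again
out <- ifelse(is.na(times), "", format(times))
```

With something along these lines the missing entries would survive a read/write round trip as empty fields, which is what the validator expects.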

Footnotes

  1. Testing if validate_gtfs() worked with tidygtfs objects was exactly how I found this behavior :)

@polettif
Contributor

Thanks for pointing this out @dhersz

This is indeed unexpected behavior and I think tidytransit did use NA for missing times at some point in the past... Apparently that's not the case anymore so time to fix it :)
