---
title: "Headway/Frequency Estimation"
author: "Tom Buckley"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{tidytransit-headways}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(tidytransit)
```
# Introduction
This is a brief introduction to the functions in tidytransit that can be used to describe the frequency with which vehicles are scheduled to pass through routes and stops.
# Using read_gtfs()
For convenience, when you pass `frequency=TRUE` to `read_gtfs()`, `routes_frequency` and `stops_frequency` data frames are added to the list of calculated data frames in the gtfs object that `read_gtfs()` returns.
## Key Assumptions
By default `read_gtfs` assumes:
- the user wants frequencies between 6 AM and 10 PM for all schedules that run on all weekdays.
- the user wants frequencies that correspond to the frequency with which vehicles pass through stops.
- [that we can use a heuristic to guess which service_id is representative of a standard weekday](https://github.com/r-transit/tidytransit/blob/master/R/frequencies.R#L34-L59).
See the reference for the `get_route_frequency()` function for other options (e.g. weekends, other times of day).
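To make the "runs on all weekdays" assumption concrete, here is a minimal sketch that picks weekday service IDs out of a toy table shaped like GTFS `calendar.txt`. The table values and the `weekday_service_ids` name are made up for illustration, and the heuristic the package actually uses (linked above) may differ from this simple filter.

```r
library(dplyr)

# Toy calendar table in the shape of GTFS calendar.txt (values are made up).
calendar <- tibble(
  service_id = c("WKD", "SAT", "SUN", "MON_ONLY"),
  monday     = c(1, 0, 0, 1),
  tuesday    = c(1, 0, 0, 0),
  wednesday  = c(1, 0, 0, 0),
  thursday   = c(1, 0, 0, 0),
  friday     = c(1, 0, 0, 0),
  saturday   = c(0, 1, 0, 0),
  sunday     = c(0, 0, 1, 0)
)

# Keep only services that operate on every weekday, Monday through Friday.
weekday_service_ids <- calendar %>%
  filter(monday == 1, tuesday == 1, wednesday == 1,
         thursday == 1, friday == 1) %>%
  pull(service_id)

weekday_service_ids
```

A service that runs only on Mondays is excluded here, which is the behaviour the first assumption describes.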
```{r}
local_gtfs_path <- system.file("extdata",
                               "google_transit_nyc_subway.zip",
                               package = "tidytransit")
nyc <- read_gtfs(local_gtfs_path,
                 geometry = TRUE,
                 frequency = TRUE)
```
# By Route
View the headways along routes as a dataframe.
```{r}
head(nyc$.$routes_frequency)
```
# By Stop
View the headways at stops. The `stops_frequency` data frame is also added to the list of gtfs data frames read in by `read_gtfs()`. Again, by default, frequency is calculated for service that runs every weekday from 6 AM to 10 PM. See the reference for the `get_stop_frequency()` function for other options (e.g. weekends, other times of day).
```{r}
head(nyc$.$stops_frequency)
```
# Mapping Route Frequencies
You can now map subway routes and color-code each route by how often trains come.
```{r}
plot(nyc)
```
# Mapping Stop Frequencies
Before we plot headways at stops, we must join the frequency table to the geometries for the stops.
```{r}
some_stops_freq_sf <- nyc$.$stops_sf %>%
  left_join(nyc$.$stops_frequency, by = "stop_id") %>%
  select(headway)
```
Then we can plot them.
```{r}
plot(some_stops_freq_sf)
```
This plot shows some outliers in the headway calculations.
In the NYC MTA schedule, a few stops are served by a train only a few times a day. Since headways are calculated, by default, over the period from 6 AM to 10 PM, the average headway for these stops can reach hundreds of minutes.
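A quick back-of-the-envelope calculation shows where such large values come from. The sketch below assumes the average headway is simply the length of the service window divided by the number of departures. `average_headway()` is a hypothetical helper written for this illustration, not a tidytransit function, and the package's exact formula may differ.

```r
# Hypothetical helper: average headway in minutes over a service window,
# assuming headway = window length / number of departures.
average_headway <- function(departures, start_hour = 6, end_hour = 22) {
  window_minutes <- (end_hour - start_hour) * 60  # 16 hours = 960 minutes
  window_minutes / departures
}

average_headway(192)  # 192 departures: a train every 5 minutes
average_headway(3)    # only 3 departures: 320 minutes between trains
```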
One quick fix for the outlier stops in the plot above is to drop stops whose headways exceed a reasonable threshold. For example, we can keep only stops with headways under 60 minutes.
```{r}
some_stops_freq_sf <- some_stops_freq_sf %>%
  filter(headway < 60)
plot(some_stops_freq_sf)
```
If you're interested in how to work with schedules and outlier stops like this, the `timetable` vignette, included in this package, is a great introduction.
# Route Frequency Assumptions
Headways along routes, in the `routes_frequency` data frame, are based on summary statistics of the frequency with which vehicles pass through the stops in the `stops_frequency` data frame.
```{r}
head(nyc$.$routes_frequency)
```
The median value for a route will more closely match what a rider might experience along that route. The median works better than the mean because the outlier stops discussed above pull the mean upward.
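The effect of a single outlier stop can be seen on a toy example. The table below is made up for illustration (it only mimics the `route_id`, `stop_id` and `headway` columns), and the summary is a sketch of the general idea rather than the package's own code.

```r
library(dplyr)

# Toy stop-level headways for one route; the last stop is an outlier
# that is served only a few times a day (values are made up).
stops_frequency_toy <- tibble(
  route_id = c("A", "A", "A", "A", "A"),
  stop_id  = c("s1", "s2", "s3", "s4", "s5"),
  headway  = c(5, 5, 6, 6, 320)
)

# Summarise stop headways per route with both statistics.
route_summary <- stops_frequency_toy %>%
  group_by(route_id) %>%
  summarise(
    median_headway = median(headway),
    mean_headway   = mean(headway)
  )

route_summary
```

Here the median headway is 6 minutes, close to what riders see at most stops, while the mean is 68.4 minutes because of the one outlier.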
One way we can verify these estimates is by checking against reported headways.
For example, we see that our estimated median headway for the 1 train from 6 AM to 10 PM is 5 minutes. When we compare this estimate with the [Wikipedia entry](https://en.wikipedia.org/wiki/List_of_New_York_City_Subway_services#Train_intervals) for this train, we have a rough match. Headways reported there are 3 minutes at rush hour, 6 minutes at midday and 10 minutes at night.
# Specific Days and Times
You might be interested in calculating headways for more specific times of day.
For example, what are rush hour headways like on a specific weekday (2018-08-23)? The `set_hms_times` and `set_date_service_table` functions will alter the feed for us, allowing us to filter by date.
```{r}
nyc <- nyc %>%
  set_hms_times() %>%
  set_date_service_table()
```
Below we pull a service ID for a specific weekday (2018-08-23).
```{r}
# The feed was already prepared with set_hms_times() and
# set_date_service_table() in the previous chunk.
services_on_180823 <- nyc$.$date_service_table %>%
  filter(date == "2018-08-23") %>%
  select(service_id)
```
See the `servicepatterns` and `timetable` vignettes for more advice on schedule filtering.
Then we calculate the route frequency in the afternoon rush hour.
```{r}
nyc <- get_route_frequency(nyc,
                           service_id = services_on_180823,
                           start_hour = 16,
                           end_hour = 19)
```
```{r}
head(nyc$.$routes_frequency)
```
Again, the median headways for the 1 train roughly correspond (1 minute off) to those published in the [Wikipedia entry](https://en.wikipedia.org/wiki/List_of_New_York_City_Subway_services#Train_intervals).
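To illustrate what restricting the calculation to a time window means, the sketch below counts departures at one toy stop between `start_hour` and `end_hour` and derives a headway from that count. The stop times are made up, `to_seconds()` is a hypothetical helper rather than a tidytransit function, and the package may treat window boundaries and rounding differently.

```r
library(dplyr)

# Hypothetical helper: convert "HH:MM:SS" to seconds since midnight.
# GTFS allows hours of 24 or more for trips running past midnight,
# so the string is parsed manually instead of as a clock time.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Toy departures at a single stop (values are made up).
stop_times_toy <- tibble(
  stop_id = "s1",
  departure_time = c("15:55:00", "16:00:00", "16:30:00", "17:00:00",
                     "17:30:00", "18:00:00", "18:30:00", "19:05:00",
                     "25:10:00")
)

start_hour <- 16
end_hour <- 19

# Keep departures inside the window, then divide the window length
# by the number of departures to get an average headway in minutes.
rush_hour <- stop_times_toy %>%
  mutate(departure_secs = to_seconds(departure_time)) %>%
  filter(departure_secs >= start_hour * 3600,
         departure_secs < end_hour * 3600) %>%
  group_by(stop_id) %>%
  summarise(departures = n(),
            headway = (end_hour - start_hour) * 60 / n())

rush_hour
```

Six of the nine toy departures fall inside the 16:00 to 19:00 window, giving an average headway of 30 minutes for this stop.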