Analysis of RTD (Denver-area public transit) reliability
- The A Line opened in April to great fanfare
- It has since seen frequent delays and closures
- It has gained a reputation for unreliability
- The Aurora line hasn't opened yet because it's operated by the same company, which wants to get the issues worked out first
- Collect all tweets from the RTD Twitter account @RideRTD (`twitter-collect.py`)
- Filter out irrelevant tweets (`post-process.py`)
- Use the tweets to generate data on delays by line
  - includes both delays and closures
  - if a delay is expressed as a range (e.g. "15-30 minutes"), the upper bound was used
  - when URLs were given in a tweet, they were referenced for additional details
  - if a scheduled delay did not actually occur (according to a subsequent tweet), it was not recorded
- Analyze the frequency and duration of delays (`analysis.py`)
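The range rule above (take the upper bound of "15-30 minutes") could be implemented with a small helper like this sketch; the function name and regex are hypothetical, not the actual `post-process.py` code:

```python
import re

def parse_delay_minutes(text):
    """Extract a delay duration in minutes from tweet text.

    If the delay is expressed as a range (e.g. "15-30 minutes"),
    return the upper bound, matching the rule described above.
    Returns None when no delay duration is found.
    """
    match = re.search(r"(\d+)(?:\s*-\s*(\d+))?\s*min", text, re.IGNORECASE)
    if not match:
        return None
    low, high = match.groups()
    return int(high) if high else int(low)
```

For example, `parse_delay_minutes("Delays of 15-30 minutes on the A Line")` returns `30`, while a tweet with no duration returns `None`.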
- Future questions:
  - assume every mile of track has a failure rate, where any delay or closure on a given day counts as a failure, and estimate this rate for each line
  - are certain lines correlated, both in occurrence and in duration of delays?
    - hypothesis: geographically colocated lines will have correlated delays
    - correlation could also be indicated by an outage reported in the same tweet
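The correlation question could be explored along these lines; the per-day indicator lists below are made-up illustrative data, not actual results (the real inputs would come from the output of `post-process.py`):

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-day indicators (1 = any delay or closure that day).
a_line = [1, 0, 1, 1, 0, 0, 1]
b_line = [1, 0, 1, 0, 0, 0, 1]

print(pearson(a_line, b_line))  # prints 0.75: delays tend to co-occur
```

The same function applied to delay durations (rather than 0/1 indicators) would address the duration half of the question.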
- Go to https://apps.twitter.com/
- Sign in
- Create new app (if necessary)
- Put the following four values in a dotfile in your project:
  - Consumer Key (`export TWITTER_CONSUMER_KEY=foo`)
  - Consumer Secret (`export TWITTER_CONSUMER_SECRET=foo`)
  - Access Token Key (`export TWITTER_ACCESS_KEY=foo`)
  - Access Token Secret (`export TWITTER_ACCESS_SECRET=foo`)
- Source the dotfile (and make sure it's in your .gitignore)
- Install the TwitterAPI module: `pip install TwitterAPI`
- Modify `twitter-collect.py` for the username whose tweets you wish to collect
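The collection step might look roughly like the sketch below, assuming the TwitterAPI module's `TwitterAPI(...)` constructor and `request()` method and the env vars from the dotfile above; `timeline_params` and `collect_tweets` are hypothetical names, not the actual contents of `twitter-collect.py`:

```python
import os

def timeline_params(screen_name, max_id=None):
    """Build parameters for a statuses/user_timeline request.

    Passing max_id lets the caller page backwards through older tweets.
    """
    params = {"screen_name": screen_name, "count": 200}
    if max_id is not None:
        params["max_id"] = max_id
    return params

def collect_tweets(screen_name="RideRTD"):
    """Fetch recent tweets for the account using the TwitterAPI module."""
    from TwitterAPI import TwitterAPI  # installed via pip above
    api = TwitterAPI(
        os.environ["TWITTER_CONSUMER_KEY"],
        os.environ["TWITTER_CONSUMER_SECRET"],
        os.environ["TWITTER_ACCESS_KEY"],
        os.environ["TWITTER_ACCESS_SECRET"],
    )
    response = api.request("statuses/user_timeline",
                           timeline_params(screen_name))
    return [tweet["text"] for tweet in response]
```

Usage would be `tweets = collect_tweets("RideRTD")` after sourcing the dotfile; collecting the full history requires repeating the request with `max_id` set below the oldest tweet seen so far.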
- Twitter code adapted from code by Charley Frazier and yanofsky.