- Download the 2013 taxi data using this shell script.
- [This R script](https://github.com/msr-ds3/nyctaxi/blob/master/exploratory_analysis/load_one_week.R) loads the CSVs, adds necessary and convenient columns (e.g. neighborhood names), and saves them as `one_week_taxi.Rdata`. To use the dataframe, simply call
- This R script uses `taxi_clean` to create a dataframe called `shifts_clean` of drivers (`hack_licenses`) and their shifts (as measured by the cutoff analysis here), and a dataframe called `taxi_clean_shifts` with a shift number for each ride, and stores it in an Rdata file called
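
The idea of numbering shifts with a downtime cutoff can be sketched in base R as follows. This is only an illustration under stated assumptions, not the repository's script: the helper `assign_shifts`, the `shift_num` column, the 6-hour cutoff, and the toy data are all hypothetical; the column names `hack_license`, `pickup_datetime`, and `dropoff_datetime` are assumed to match the trip data.

```r
# Toy rides for two drivers (made-up data; column names are assumptions).
rides <- data.frame(
  hack_license = c("A", "A", "A", "B"),
  pickup_datetime = as.POSIXct(c("2013-07-08 08:00:00", "2013-07-08 08:30:00",
                                 "2013-07-08 20:00:00", "2013-07-08 09:00:00"),
                               tz = "UTC"),
  dropoff_datetime = as.POSIXct(c("2013-07-08 08:20:00", "2013-07-08 08:50:00",
                                  "2013-07-08 20:30:00", "2013-07-08 09:15:00"),
                                tz = "UTC")
)

# Hypothetical helper: a ride starts a new shift when the driver changes or
# when the downtime since the previous dropoff exceeds `cutoff_hours`.
# Assumes `df` has at least one row.
assign_shifts <- function(df, cutoff_hours = 6) {
  df <- df[order(df$hack_license, df$pickup_datetime), ]
  n <- nrow(df)
  pickup <- as.numeric(df$pickup_datetime)   # seconds since epoch
  dropoff <- as.numeric(df$dropoff_datetime)
  gap_hours <- c(Inf, (pickup[-1] - dropoff[-n]) / 3600)
  new_driver <- c(TRUE, df$hack_license[-1] != df$hack_license[-n])
  new_shift <- new_driver | gap_hours > cutoff_hours
  # Running count of shift starts, restarted for each driver.
  df$shift_num <- ave(as.integer(new_shift), df$hack_license, FUN = cumsum)
  df
}

rides_with_shifts <- assign_shifts(rides)
print(rides_with_shifts[, c("hack_license", "pickup_datetime", "shift_num")])
```

The cutoff value matters: the 6 hours used here is a placeholder, and the value the project actually uses comes from its cutoff analysis.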
#### NOTE: AS OF 7/26 YOU SHOULD MOVE ALL .RDATA FILES INTO THE RDATA FOLDER, AND SAVE ALL FUTURE RDATA FILES TO THAT FOLDER
- Cool figures, plots, and maps (output of some of the scripts below) are in this dir
- This script creates a function (`visualize_trips_by_shift`) that can plot the route of a random taxicab driver over the course of a shift or a day of the week. Its signature is `visualize_trips_by_shift(df, hacklicense, shift = NULL)`:
  - `df` is the dataframe (usually `taxi_clean`, but sometimes a subset of it).
  - `hacklicense` is the `hack_license` of the driver (usually randomly chosen from
  - `shift` is optional. It takes a shift number; when omitted, all shifts are shown as a faceted plot.
  - `visualize_trips_by_day(df, hacklicense, day = NULL)` works in a similar manner, except that it takes a particular day in the format "Mon", "Tue", etc.
- Stats for one week of taxi rides by day of week, hour of day, pickup location, and dropoff location are computed by this R script.
- Trip-based descriptive plots (distributions of distance, time, fare, etc.) can be found here
- Neighborhood popularity plots (in R) are here
- Interactive popularity heatmaps by neighborhood can be created using this script
- ggmap (non-interactive) popularity heatmaps can be created using the functions here
- Driver-based descriptive plots (distributions of distance, time, fare, etc., by number of drivers) are here
- Visualize shifts, and rides within them, for n random drivers by calling the `visualize_rides_and_shifts()` function created by this R script.
- Some plots using shift intervals are [here](https://github.com/msr-ds3/nyctaxi/blob/master/exploratory_analysis/plots_with_shift_interval.R)
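
As a rough illustration of the per-day and per-hour statistics mentioned in the list above, ride counts can be tabulated in base R like this. The toy data, the `day_of_week`, `hour`, and `n_rides` column names, and the time zone are assumptions made for the sketch; the project's own script may compute these differently.

```r
# Toy pickups (made-up data); 2013-07-08 was a Monday.
rides <- data.frame(
  pickup_datetime = as.POSIXct(c("2013-07-08 08:15:00", "2013-07-08 08:40:00",
                                 "2013-07-08 17:05:00", "2013-07-09 08:30:00"),
                               tz = "America/New_York")
)

# weekdays() depends on the locale, so build "Mon", "Tue", ... labels
# from POSIXlt$wday instead (0 = Sunday).
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
lt <- as.POSIXlt(rides$pickup_datetime)
rides$day_of_week <- factor(day_labels[lt$wday + 1], levels = day_labels)
rides$hour <- lt$hour

# Count rides for each (day of week, hour of day) combination that occurs.
counts <- aggregate(n_rides ~ day_of_week + hour,
                    data = transform(rides, n_rides = 1L),
                    FUN = sum)
print(counts)
```

The same pattern extends to pickup and dropoff locations by adding those columns to the right-hand side of the formula.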
Predicting shift efficiency
- Features to be included in the design matrix for the shifts prediction task are listed in this markdown file.
- The design matrix can be created and saved as an Rdata file using the script here
- Descriptive plots of each individual feature, for both regression and classification, are here
- Some models and efficiency predictions are here
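
For readers unfamiliar with the term, building a design matrix and saving it as an Rdata file might look roughly like the sketch below. Everything in it is hypothetical: the feature names (`start_hour`, `is_weekend`, `first_pickup_borough`), the `efficiency` values, and the file name are made up for illustration, and the real feature list is the one in the markdown file mentioned above.

```r
# Made-up shift-level features and a made-up efficiency target.
shifts <- data.frame(
  start_hour = c(6, 7, 16, 17, 22, 5),
  is_weekend = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  first_pickup_borough = factor(c("Manhattan", "Queens", "Manhattan",
                                  "Brooklyn", "Manhattan", "Queens")),
  efficiency = c(31.5, 27.0, 35.2, 29.8, 33.1, 24.6)
)

# model.matrix() expands logicals and factors into indicator columns
# and adds an intercept column.
design_matrix <- model.matrix(
  efficiency ~ start_hour + is_weekend + first_pickup_borough,
  data = shifts
)

# Save to a temporary location and load it back, to show the round trip.
rdata_path <- file.path(tempdir(), "design_matrix_example.Rdata")
save(design_matrix, file = rdata_path)
rm(design_matrix)
load(rdata_path)  # restores the object under its original name
print(colnames(design_matrix))
```

Note that `load()` restores objects under the names they were saved with, so the variable name chosen at `save()` time is the one that reappears in the workspace.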
Predicting driver efficiency
- Future work: features to be included in the design matrix
- Visualizing flow over the day.
- Analysis of carpooling possibilities is here
- Plots on carpooling analysis.
- Probabilities of lat/lng destinations given a source neighborhood and an hour of day.
- Diving into carpool savings in more depth, at this link.
- A shiny app to visualize NYC taxi flow as a heatmap can be found here
- A shiny app (inspired by Todd Schneider's post) to visualize average trip times from neighborhood to neighborhood.
- An app to see popular neighborhood destinations, and unusual neighborhoods.
- Java code that can de-anonymize medallions and hack licenses.
- Play the "predict the driver's efficiency" guessing game using this script.
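
The destination probabilities mentioned in the list above are conditional frequencies: among trips that start in a given neighborhood at a given hour, what share ends at each destination. A minimal base R sketch follows. It simplifies destinations to neighborhoods rather than lat/lng points, and the helper `dest_probs`, the column names, and the toy data are hypothetical.

```r
# Made-up trips; column names are assumptions for illustration.
trips <- data.frame(
  pickup_neighborhood = c("Chelsea", "Chelsea", "Chelsea", "Harlem"),
  pickup_hour = c(8, 8, 8, 8),
  dropoff_neighborhood = c("Midtown", "Midtown", "SoHo", "Harlem")
)

# Hypothetical helper: empirical P(destination | source neighborhood, hour).
dest_probs <- function(df, source, hour) {
  sub <- df[df$pickup_neighborhood == source & df$pickup_hour == hour, ]
  prop.table(table(sub$dropoff_neighborhood))
}

p <- dest_probs(trips, "Chelsea", 8)
print(p)
```

With lat/lng destinations, the same approach would apply after binning coordinates into grid cells, which is one possible design and not necessarily the one used in the project.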