linbarbell/BigDataProject

A Look at NYC Taxi Data and How We Tip

Computing w/ Large Data Sets - NYU Courant Institute - Fall 2014

Project By:

  • Lin, Emily
  • Liu, Thomas
  • Shaw Susanto, Billy

Main Tools

  • Python (main language)
  • IPython Notebook (main platform)
  • GeoPy (for reverse geocoding)
  • Pandas (for data analysis)
  • Tableau (for data visualization)

Data

This project uses data obtained by Chris Whong, who requested all 2013 NYC taxi ride data from the NYC Taxi & Limousine Commission. Bless his heart.

We used both the trip records and the fare records.


We sampled about 75 million rides (>= 30 GB) for the following analysis.
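As a rough illustration of how trip and fare records could be combined with Pandas, the sketch below reads the trip data in chunks and joins each chunk to the fare data. The column names and the join keys (`medallion`, `hack_license`, `pickup_datetime`) are assumptions about the file layout, the rows are made up, and `load_joined` is a hypothetical helper, not code from this project.

```python
import io
import pandas as pd

# Made-up stand-ins for one trip file and one fare file (assumed column names).
TRIP_CSV = io.StringIO(
    "medallion,hack_license,pickup_datetime,trip_distance\n"
    "A1,H1,2013-01-01 08:05:00,2.0\n"
    "A2,H2,2013-01-01 08:40:00,3.0\n"
    "A3,H3,2013-01-01 17:10:00,3.0\n"
)
FARE_CSV = io.StringIO(
    "medallion,hack_license,pickup_datetime,payment_type,fare_amount,tip_amount\n"
    "A1,H1,2013-01-01 08:05:00,CRD,10.0,2.0\n"
    "A2,H2,2013-01-01 08:40:00,CSH,12.0,0.0\n"
    "A3,H3,2013-01-01 17:10:00,CRD,15.0,3.0\n"
)

# Assumed join keys identifying the same ride in both files.
KEYS = ["medallion", "hack_license", "pickup_datetime"]

def load_joined(trip_src, fare_src, chunksize=2):
    """Join trip rows to fare rows, reading the trip file chunk by chunk."""
    fares = pd.read_csv(fare_src)
    parts = []
    for chunk in pd.read_csv(trip_src, chunksize=chunksize):
        # Inner join keeps only rides present in both files.
        parts.append(chunk.merge(fares, on=KEYS, how="inner"))
    return pd.concat(parts, ignore_index=True)

rides = load_joined(TRIP_CSV, FARE_CSV)
print(rides[["medallion", "payment_type", "tip_amount"]])
```

At full scale the fare file would probably not fit in memory all at once, so a join like this would more likely be run one pair of files at a time.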

Data Analysis

Basic Information (1 slide per point)

  • Which neighborhoods have the most cab pickups? 1) List of top 10 2) Pie chart - "top neighborhood.csv"
  • Which neighborhoods are the most popular cab destinations (dropoff)? 1) List of top 10 2) Pie chart - "top_drop_neighborhood.csv"
  • At which time of day do New Yorkers take cabs the most? 1) List of percentages 2) Bar chart - "count_by_hour.csv"
  • At which time of day do New Yorkers take cabs the least? - "count_by_hour.csv"
  • Which time of day has the lowest average speed (distance / time travelled)? - "time_speed.csv"
  • Which time of day has the highest average speed (distance / time travelled)? - "time_speed.csv"
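The hourly questions above can be sketched with a Pandas group-by. This is a minimal example on made-up rides, not the project's notebook code: the column names (`pickup_datetime`, `trip_time_in_secs`, `trip_distance`) are assumed, `hourly_summary` is a hypothetical helper, and average speed is taken here as the mean of per-ride speeds, which is only one possible definition.

```python
import pandas as pd

# Tiny made-up sample; column names are an assumption about the trip files.
trips = pd.DataFrame({
    "pickup_datetime": pd.to_datetime([
        "2013-01-01 08:05:00", "2013-01-01 08:40:00",
        "2013-01-01 17:10:00", "2013-01-02 03:15:00",
    ]),
    "trip_time_in_secs": [600, 1200, 1800, 300],
    "trip_distance": [2.0, 3.0, 3.0, 2.5],  # assumed to be miles
})

def hourly_summary(df):
    """Ride count, share of rides, and mean speed (mph) per pickup hour."""
    df = df[df["trip_time_in_secs"] > 0].copy()  # avoid division by zero
    df["hour"] = df["pickup_datetime"].dt.hour
    df["speed_mph"] = df["trip_distance"] / (df["trip_time_in_secs"] / 3600.0)
    out = df.groupby("hour").agg(
        rides=("speed_mph", "size"),
        avg_speed_mph=("speed_mph", "mean"),
    )
    out["share"] = out["rides"] / out["rides"].sum()
    return out

summary = hourly_summary(trips)
busiest_hour = summary["rides"].idxmax()
slowest_hour = summary["avg_speed_mph"].idxmin()
fastest_hour = summary["avg_speed_mph"].idxmax()
print(summary)
```

Dividing total distance by total time per hour would weight long rides more heavily and could give a different ranking.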

Payments & Tip

  • Average tip fraction in New York for CRD (credit card) payments - 0.20200465239853405 (about 20%)

Specific areas of interest:

  • Tip percentage (avg, top)
  • Credit card vs. cash
  • Total fare

Factors to analyze:

  • Average
  • Neighborhood: does the pickup or dropoff area affect the tip or the payment method? (For the top lists, only neighborhoods with more than 10,000 rides are included.)
  • Time of the day
  • Distance travelled
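One way to look at tip percentage by neighborhood with a minimum ride count is sketched below. The data is made up, the column names and the `tip_by_neighborhood` helper are hypothetical, and the tip percentage is assumed to be tip divided by fare. Only CRD rides are used, on the assumption that cash tips are generally not recorded. The threshold is lowered from 10,000 to 2 so the toy data shows the filter at work.

```python
import pandas as pd

# Made-up rides with an assumed, already reverse-geocoded neighborhood column.
rides = pd.DataFrame({
    "pickup_neighborhood": ["Chelsea", "Chelsea", "Chelsea", "SoHo", "SoHo", "Harlem"],
    "payment_type": ["CRD", "CRD", "CSH", "CRD", "CRD", "CRD"],
    "fare_amount": [10.0, 20.0, 8.0, 10.0, 10.0, 10.0],
    "tip_amount": [2.0, 5.0, 0.0, 1.0, 2.0, 5.0],
})

def tip_by_neighborhood(df, min_rides=10000):
    """Average card tip fraction per pickup neighborhood, busiest areas only."""
    card = df[(df["payment_type"] == "CRD") & (df["fare_amount"] > 0)].copy()
    card["tip_pct"] = card["tip_amount"] / card["fare_amount"]
    stats = card.groupby("pickup_neighborhood")["tip_pct"].agg(["count", "mean"])
    stats = stats.rename(columns={"count": "rides", "mean": "avg_tip_pct"})
    # Drop neighborhoods with too few rides to give a stable average.
    stats = stats[stats["rides"] >= min_rides]
    return stats.sort_values("avg_tip_pct", ascending=False)

top = tip_by_neighborhood(rides, min_rides=2)
print(top)
```

The same grouping could be keyed on pickup hour or a distance bucket to cover the other factors in the list.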

Analytics

  • Prediction: predict the tip from neighborhood and time using a linear model (lm) and a simplified lasso, then compare the two methods by mean squared error (MSE).
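The comparison could look roughly like the sketch below, which fits ordinary least squares and a lasso (solved by plain coordinate descent) on one-hot neighborhood and hour features, then compares held-out MSE. The README does not say which implementation was used, so everything here is an assumption: the synthetic data, the `alpha` value, and the helper names are all hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x toward zero by t (the lasso proximal step)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def fit_lasso(X, y, alpha=0.001, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - Xb||^2 + alpha*||b||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering handles the intercept
    beta = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            partial_resid = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ partial_resid / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    return y_mean - x_mean @ beta, beta

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic data: 3 neighborhoods, 24 hours, tip fraction as the target.
rng = np.random.default_rng(0)
n = 400
hood = rng.integers(0, 3, size=n)
hour = rng.integers(0, 24, size=n)
X = np.column_stack([np.eye(3)[hood], np.eye(24)[hour]])  # one-hot features
y = np.array([0.18, 0.20, 0.22])[hood] + 0.01 * (hour >= 18) + rng.normal(0, 0.02, size=n)

train, test = slice(0, 300), slice(300, n)
b0_ols, b_ols = fit_ols(X[train], y[train])
b0_las, b_las = fit_lasso(X[train], y[train])
mse_ols = mse(y[test], b0_ols + X[test] @ b_ols)
mse_lasso = mse(y[test], b0_las + X[test] @ b_las)
print("OLS MSE:", mse_ols, "lasso MSE:", mse_lasso)
```

Comparing MSE on rides held out from fitting, rather than on the training rides, keeps the more flexible model from looking better only because it overfits.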

About

Final project for big data
