# The Sharing Economy: Impact of Ride Hailing

Principal Investigator: Liza Sakhaie

Email: liza.sakhaie@stern.nyu.edu


In this project, I will study how ride-hailing apps have grown in recent years. I will use data from the past five years, since this is when ride sharing became particularly popular.

The data holds the potential to portray a few different trends:

- The decline of the NYC taxi industry over the past five years
- The rise of ride-hailing apps (measured by number of rides per year)
- The year-over-year (YoY) growth rate of each ride-hailing app, to show the increasing popularity of Juno and Lyft in the past few years


### The Data

The data used in this project comes from NYC Open Data, which publishes aggregated ride information for NYC yellow cabs as well as for the three most popular ride-hailing apps that I plan to include in this project: Uber, Lyft, and Juno.

The yellow cab data is split into separate data sets for each year. I will be using the data sets from 2015 to 2018. However, these files have so far been too large to load on my computer, so I am looking into other ways to work with the data. I will either:
1. Aggregate the monthly data from the NYC Taxi and Limousine Commission by writing a `for` loop that appends each month's data set, building a table with a full year's worth of data, or
2. Focus specifically on the ride-hailing apps and the growing competition between them. This could include further studies such as:
    - the impact of seasonality on rides ordered
    - the growth of ride sharing within ride-hailing apps
    - the recent rise of apps like Lyft and Juno relative to the original app, Uber
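Option 1 above could look roughly like the following minimal sketch. It assumes the monthly files are CSVs; `build_yearly_table` is a hypothetical helper, and the file names and `tpep_pickup_datetime` column in the commented usage are placeholders rather than confirmed details of the TLC downloads. Reading only the needed columns with `usecols` is one way to keep memory down; if a full year still does not fit, each month could be reduced to a summary (such as a trip count) before it is appended.

```python
import pandas as pd


def build_yearly_table(monthly_sources, usecols=None):
    """Stack one year's monthly trip files into a single DataFrame.

    monthly_sources: paths (or file-like objects) for each month's CSV.
    usecols: optional list of column names to keep; loading fewer columns
             reduces memory use, which matters for large trip-record files.
    """
    frames = []
    for source in monthly_sources:
        # Load one month at a time, keeping only the requested columns
        frames.append(pd.read_csv(source, usecols=usecols))
    # A single concat at the end avoids re-copying the growing table each month
    return pd.concat(frames, ignore_index=True)


# Hypothetical usage: file and column names are placeholders, not confirmed
# details of the TLC's published files
# paths = [f"yellow_tripdata_2018-{m:02d}.csv" for m in range(1, 13)]
# yellow_2018 = build_yearly_table(paths, usecols=["tpep_pickup_datetime"])
```

Collecting the monthly frames in a list and concatenating once is generally cheaper than growing a DataFrame inside the loop.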

The ride-hailing data, by contrast, is contained in a single data set that covers 2015 through January 2019.

### Importing all potentially necessary packages

```python
from IPython.display import display, Image  # Displays things nicely
import pandas as pd  # Key tool
import matplotlib.pyplot as plt  # Helps plot
import numpy as np  # Numerical operations
import os
import descartes

from census import Census  # US Census API wrapper
#from us import states

#import fiona  # Needed for geopandas to run
import geopandas as gpd  # This is the main geopandas
from shapely.geometry import Point, Polygon  # Also needed

##########################
# The imports below let us make a nice inset

from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset
```

### Bringing in my data

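```python
# Path to the FHV base aggregate report (adjust for your own machine)
ride_hailing_path = "/Users/rksaks/Desktop/Taxi idea/FHV_Base_Aggregate_report-2.xls"

# Reading a legacy .xls file requires the xlrd package to be installed
ride_hailing = pd.read_excel(ride_hailing_path)
```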

```python
ride_hailing.head(10)
```

| | Base License Number | Base Name | DBA | Year | Month | Month Name | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|---|---|---|---|---|
| 0 | B02849 | BROOKLYN RIDES CORP | | 2018 | 11 | November | 16493 | 192 | 122 |
| 1 | B02686 | STANDARD LIMOUSINE & CAR SERVICE GROUP LLC. | | 2018 | 7 | July | 61 | 0 | 15 |
| 2 | B01741 | MONACO LIMO & CAR SERVICES INC. | | 2016 | 9 | September | 1790 | 0 | 12 |
| 3 | b00965 | BEN JEVO MGT. INC. | NEWPORT CAR SERVICE | 2016 | 1 | January | 7623 | 0 | 49 |
| 4 | B02509 | NY MINUTE CAR SERVICE INC. | | 2018 | 9 | September | 9284 | 0 | 54 |
| 5 | B02790 | LAN TIAN CAR SERVICE, INC | | 2018 | 11 | November | 6 | 0 | 2 |
| 6 | B01381 | CLASSIC CAR SERVICE CORP. | PAISA CLASSIC | 2016 | 6 | June | 4173 | 0 | 45 |
| 7 | B01876 | QUALITY EXECUTIVE LIMOUSINE L.L.C | | 2018 | 5 | May | 35 | 0 | 5 |
| 8 | B00202 | EXCELSIOR CAR & LIMO, INC. | | 2018 | 9 | September | 228 | 0 | 8 |
| 9 | B03105 | MAZNA TRANSPORTATION CORPORATION | | 2018 | 10 | October | 142 | 0 | 9 |


### Defining a function to pull info for each ride-hailing app

```python
def taxi_app(df, company):
    """Return the monthly rows for one ride-hailing base, e.g. "UBER"."""
    # Keep only the rows whose base name matches the company exactly
    df_taxi_app = df[df["Base Name"] == company]

    # Return just the columns needed for the analysis
    return df_taxi_app[["Base Name", "Year", "Month", "Month Name",
                        "Total Dispatched Trips", "Total Dispatched Shared Trips",
                        "Unique Dispatched Vehicles"]]
```
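Because `taxi_app` matches `Base Name` exactly, here is a sketch of an alternative that builds all of the app tables in one pass after normalizing the names (stripping whitespace and upper-casing). The preview above shows at least one lowercase license number (`b00965`), so inconsistent formatting in the file seems possible, though whether it affects base names is an assumption. `split_by_app` and `COLUMNS` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Columns kept for every app, matching the selection in taxi_app
COLUMNS = ["Base Name", "Year", "Month", "Month Name",
           "Total Dispatched Trips", "Total Dispatched Shared Trips",
           "Unique Dispatched Vehicles"]


def split_by_app(df, companies):
    """Return a dict mapping each company name to its monthly rows.

    Base names are stripped and upper-cased before matching, so stray
    whitespace or mixed case in the source file does not silently drop rows.
    """
    names = df["Base Name"].str.strip().str.upper()
    return {company: df.loc[names == company.upper(), COLUMNS]
            for company in companies}
```

For example, `apps = split_by_app(ride_hailing, ["UBER", "LYFT", "JUNO", "VIA"])` followed by `apps["UBER"].head()` would give a table like the Uber preview.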

### Building DataFrames for each app

```python
uber = taxi_app(ride_hailing, "UBER")
uber.head()
```

| | Base Name | Year | Month | Month Name | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|---|---|---|
| 584 | UBER | 2018 | 10 | October | 14663999 | 4045741 | 75606 |
| 723 | UBER | 2015 | 10 | October | 4359759 | 0 | 26875 |
| 1428 | UBER | 2016 | 5 | May | 5391879 | 0 | 32505 |
| 1833 | UBER | 2017 | 5 | May | 8794695 | 0 | 54465 |
| 2103 | UBER | 2015 | 1 | January | 1871075 | 0 | 12544 |


```python
lyft = taxi_app(ride_hailing, "LYFT")
lyft.head()
```

| | Base Name | Year | Month | Month Name | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|---|---|---|
| 401 | LYFT | 2016 | 3 | March | 691455 | 0 | 9266 |
| 1148 | LYFT | 2018 | 6 | June | 3664808 | 602496 | 39519 |
| 1637 | LYFT | 2018 | 5 | May | 3400356 | 540029 | 38628 |
| 2228 | LYFT | 2016 | 10 | October | 1112748 | 0 | 15926 |
| 2991 | LYFT | 2016 | 8 | August | 1095428 | 0 | 15033 |


```python
juno = taxi_app(ride_hailing, "JUNO")
juno.head()
```

| | Base Name | Year | Month | Month Name | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|---|---|---|
| 1149 | JUNO | 2017 | 1 | January | 950588 | 0 | 17808 |
| 1444 | JUNO | 2018 | 7 | July | 1018602 | 0 | 21380 |
| 1528 | JUNO | 2017 | 11 | November | 1163709 | 0 | 20714 |
| 1879 | JUNO | 2018 | 11 | November | 1048668 | 0 | 20750 |
| 2207 | JUNO | 2017 | 9 | September | 1069736 | 0 | 19059 |


```python
via = taxi_app(ride_hailing, "VIA")
via.head()
```

| | Base Name | Year | Month | Month Name | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|---|---|---|
| 554 | VIA | 2018 | 1 | January | 1016794 | 662745 | 5697 |
| 799 | VIA | 2017 | 8 | August | 984296 | 649083 | 4693 |
| 882 | VIA | 2018 | 6 | June | 744501 | 496676 | 5752 |
| 1251 | VIA | 2016 | 10 | October | 655551 | 0 | 2851 |
| 1865 | VIA | 2015 | 6 | June | 155658 | 0 | 665 |


Because the data is currently divided by month, as the previews above show, I will need to group it by year.
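One possible way to do this grouping for all apps at once is sketched below, assuming the full `ride_hailing` table is available. It differs from the per-app cells in two deliberate ways: the monthly `Unique Dispatched Vehicles` counts are averaged rather than summed (adding them counts a vehicle once for every month it was active), and the number of months reported per year is counted directly, since a summed `Month` column only flags a complete year indirectly (1 + 2 + ... + 12 = 78). `yearly_summary` and the `Months Reported` column are hypothetical names.

```python
import pandas as pd

TRIP_COLUMNS = ["Total Dispatched Trips", "Total Dispatched Shared Trips"]


def yearly_summary(df, companies):
    """Aggregate monthly rows to one row per (company, year).

    Trip counts are summed over the months in each year. Monthly
    "Unique Dispatched Vehicles" counts are averaged instead of summed,
    because summing would count a vehicle once per month it was active.
    """
    subset = df[df["Base Name"].isin(companies)]
    agg = {col: "sum" for col in TRIP_COLUMNS}
    agg["Unique Dispatched Vehicles"] = "mean"
    agg["Month"] = "count"  # number of months present, to flag partial years
    return (subset.groupby(["Base Name", "Year"])
            .agg(agg)
            .rename(columns={"Month": "Months Reported"}))
```

For example, `yearly_summary(ride_hailing, ["UBER", "LYFT", "JUNO", "VIA"])` returns one row per app and year, and years with fewer than 12 months reported (such as 2019, which contains only January) can be filtered out before comparing years.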

### Grouping the data by year

#### JUNO

```python
# Group Juno's monthly rows by calendar year and preview them
juno1 = juno.groupby("Year")
juno1.head()
```

| | Base Name | Year | Month | Month Name | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|---|---|---|
| 1149 | JUNO | 2017 | 1 | January | 950588 | 0 | 17808 |
| 1444 | JUNO | 2018 | 7 | July | 1018602 | 0 | 21380 |
| 1528 | JUNO | 2017 | 11 | November | 1163709 | 0 | 20714 |
| 1879 | JUNO | 2018 | 11 | November | 1048668 | 0 | 20750 |
| 2207 | JUNO | 2017 | 9 | September | 1069736 | 0 | 19059 |
| 3668 | JUNO | 2017 | 7 | July | 914051 | 0 | 20377 |
| 3915 | JUNO | 2017 | 3 | March | 1195637 | 0 | 19816 |
| 3982 | JUNO | 2016 | 10 | October | 857206 | 0 | 15446 |
| 5274 | JUNO | 2018 | 2 | February | 1329836 | 0 | 21431 |
| 5330 | JUNO | 2019 | 1 | January | 1235565 | 0 | 21768 |


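```python
# Add up the monthly values within each year. numeric_only=True skips the
# text columns (Base Name, Month Name), which newer pandas versions would
# otherwise try to concatenate. Note that "Month" becomes a sum of month
# numbers (78 for a full year), not a count of months.
juno_sum = juno1.sum(numeric_only=True)
juno_sum
```

| Year | Month | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|
| 2016 | 75 | 5083708 | 0 | 106748 |
| 2017 | 78 | 12537031 | 0 | 237889 |
| 2018 | 78 | 13813009 | 0 | 255375 |
| 2019 | 1 | 1235565 | 0 | 21768 |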


I may add a column in `datetime` format so that months and years are easier to work with.
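A minimal sketch of such a column, assuming the `Year` and `Month` columns shown above: `pd.to_datetime` can assemble dates from a DataFrame whose columns are named `year`, `month`, and `day`, so each row is mapped to the first day of its month. The helper `add_period_column` and the new `Date` column are hypothetical names.

```python
import pandas as pd


def add_period_column(df):
    """Return a copy of df with a 'Date' column set to the first day of each month."""
    out = df.copy()
    # pd.to_datetime assembles dates from columns named year/month/day
    out["Date"] = pd.to_datetime(pd.DataFrame({"year": out["Year"],
                                               "month": out["Month"],
                                               "day": 1}))
    # Sort chronologically, since the source rows are not in date order
    return out.sort_values("Date")
```

For example, `juno = add_period_column(juno)` followed by `juno.set_index("Date")["Total Dispatched Trips"]` gives a chronologically ordered monthly series, which would also help with the seasonality question raised earlier.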

#### Lyft

```python
# Group Lyft's monthly rows by year and add them up (numeric columns only)
lyft1 = lyft.groupby("Year")
lyft_sum = lyft1.sum(numeric_only=True)
lyft_sum
```

| Year | Month | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|
| 2015 | 72 | 2615481 | 0 | 43196 |
| 2016 | 78 | 11415958 | 0 | 162843 |
| 2017 | 78 | 26361098 | 3261655 | 332211 |
| 2018 | 78 | 44823801 | 7776283 | 494968 |
| 2019 | 1 | 4623412 | 709716 | 50099 |


#### Uber

```python
# Group Uber's monthly rows by year and add them up (numeric columns only)
uber1 = uber.groupby("Year")
uber_sum = uber1.sum(numeric_only=True)
uber_sum
```

| Year | Month | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|
| 2015 | 78 | 36275937 | 0 | 239798 |
| 2016 | 78 | 70067854 | 0 | 441948 |
| 2017 | 78 | 109642713 | 11105091 | 675809 |
| 2018 | 78 | 163103265 | 42375713 | 854781 |
| 2019 | 1 | 14325492 | 3616477 | 78022 |
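The shared-trip column bears on the question of ride sharing within ride-hailing apps raised earlier. Here is a minimal sketch of the share of Uber trips that were shared, using yearly totals copied from the table above (2019 is omitted because it contains only January). In the notebook itself, the same ratio could be computed directly from `uber_sum`.

```python
import pandas as pd

# Yearly Uber totals copied from the uber_sum table (2019 omitted: January only)
uber_totals = pd.DataFrame(
    {"Total Dispatched Trips": [36275937, 70067854, 109642713, 163103265],
     "Total Dispatched Shared Trips": [0, 0, 11105091, 42375713]},
    index=pd.Index([2015, 2016, 2017, 2018], name="Year"),
)

# Fraction of all dispatched trips in each year that were shared rides
shared_share = (uber_totals["Total Dispatched Shared Trips"]
                / uber_totals["Total Dispatched Trips"])
```

With these totals the shared share is 0 for 2015 and 2016, about 10% for 2017, and about 26% for 2018.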


#### VIA

```python
# Group Via's monthly rows by year and add them up (numeric columns only)
via1 = via.groupby("Year")
via_sum = via1.sum(numeric_only=True)
via_sum
```

| Year | Month | Total Dispatched Trips | Total Dispatched Shared Trips | Unique Dispatched Vehicles |
|---|---|---|---|---|
| 2015 | 72 | 1767709 | 0 | 6932 |
| 2016 | 78 | 6085133 | 0 | 21580 |
| 2017 | 78 | 10324271 | 4340253 | 50269 |
| 2018 | 78 | 11345582 | 7684873 | 69460 |
| 2019 | 1 | 1006908 | 558233 | 6784 |
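For the year-over-year growth rates listed in the project goals, here is a sketch using `DataFrame.pct_change`, which computes (trips this year - trips last year) / trips last year for each column. The totals are copied from the `uber_sum`, `lyft_sum`, and `juno_sum` tables. 2019 is excluded because only January is available, and Juno has no 2015 row, so it is entered as missing. `fill_method=None` keeps a missing year as a gap instead of filling it forward.

```python
import pandas as pd

# Yearly trip totals copied from the uber_sum, lyft_sum, and juno_sum tables.
# 2019 is left out because only January 2019 is in the data set.
yearly_trips = pd.DataFrame(
    {"UBER": [36275937, 70067854, 109642713, 163103265],
     "LYFT": [2615481, 11415958, 26361098, 44823801],
     "JUNO": [float("nan"), 5083708, 12537031, 13813009]},  # no 2015 data
    index=pd.Index([2015, 2016, 2017, 2018], name="Year"),
)

# Year-over-year growth as a fraction (0.5 means 50% more trips)
yoy_growth = yearly_trips.pct_change(fill_method=None)
```

Some rates should be read with care. Lyft's 2015 `Month` total is 72 and Juno's 2016 total is 75, rather than the 78 of a full year, so those base years cover fewer than twelve months and the following year's growth rate (Lyft 2016, Juno 2017) is likely overstated.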
