Data: NYC Rideshare Data

Questions:
1. How often do riders tip?
2. When riders tip, how much do they tip?
3. What are the busiest tipped routes?

In [1]:
# imports
from pathlib import Path
import pandas as pd

In [2]:
# set pandas to display all rows by default for manual review
pd.set_option("display.max_rows", None)

# ignore warnings
import warnings
warnings.filterwarnings('ignore')

In [3]:
# create dataframe from previously cleaned CSV file
data_df = pd.read_csv("cleaned_data.csv")

# preview data head sample from df
print("data_df")
display(data_df.head(3))

# list columns
display(data_df.columns)

data_df


Unnamed: 0,rideshare_company,pickup_zone,dropoff_zone,trip_length_miles,trip_time_seconds,base_passenger_fare_dollars,tolls_dollars,black_car_fund_dollars,sales_tax_dollars,congestion_surcharge_dollars,...,good_fare,request_month,request_date,request_hour,request_minute,request_day_of_week,time_of_day,pickup_borough,dropoff_borough,tip_or_no_tip
0,Uber,University Heights/Morris Heights,Bedford Park,2.06,660,23.87,0.0,0.72,2.12,0.0,...,False,Jan,1,1,23,Sun,night,Bronx,Bronx,False
1,Uber,Bushwick South,Brooklyn Heights,4.38,1379,41.66,0.0,1.25,3.7,0.0,...,False,Jan,1,1,58,Sun,night,Brooklyn,Brooklyn,False
2,Uber,Two Bridges/Seward Park,Lower East Side,1.39,590,34.03,0.0,1.02,3.02,2.75,...,False,Jan,1,1,59,Sun,night,Manhattan,Manhattan,False


Index(['rideshare_company', 'pickup_zone', 'dropoff_zone', 'trip_length_miles',
       'trip_time_seconds', 'base_passenger_fare_dollars', 'tolls_dollars',
       'black_car_fund_dollars', 'sales_tax_dollars',
       'congestion_surcharge_dollars', 'airport_fee_dollars', 'tip_dollars',
       'driver_pay_dollars', 'driver_total_pay_dollars', 'good_fare',
       'request_month', 'request_date', 'request_hour', 'request_minute',
       'request_day_of_week', 'time_of_day', 'pickup_borough',
       'dropoff_borough', 'tip_or_no_tip'],
      dtype='object')

## **1. How often do riders tip?** On about 20% of rides

In [4]:
# establish total number of rides in df
ride_count = len(data_df)

# count rides where the driver received a tip (tip_or_no_tip is boolean, so True == 1)
tipped_ride_count = (data_df["tip_or_no_tip"] == 1).sum()

# divide number of tipped rides by total rides
tipped_ride_pct = tipped_ride_count / ride_count

# convert to a whole percentage
(tipped_ride_pct * 100).round(0)

20.0
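Because `tip_or_no_tip` is a boolean column, the same share can also be read as the column's mean. The following is a minimal sketch on made-up data, not the notebook's actual result; `toy_df` and `tipped_share_pct` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for data_df (values are illustrative only)
toy_df = pd.DataFrame({"tip_or_no_tip": [True, False, False, False, False]})

def tipped_share_pct(df, flag_col="tip_or_no_tip"):
    """Return the percentage of rows where the boolean flag column is True."""
    # The mean of a boolean column is the fraction of True values
    return float(df[flag_col].mean() * 100)

toy_pct = tipped_share_pct(toy_df)
print(round(toy_pct))
```

On the real data, `tipped_share_pct(data_df)` should agree with the count-and-divide approach above, provided the column contains no missing values.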

## **2. When riders tip, how much do they tip?** On average, 21% of the base passenger fare, or 17% of the total pre-tip fare

In [18]:
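# build a boolean mask that selects tipped rides
tipped_filter = data_df["tip_or_no_tip"] == 1

# create an independent dataframe of tipped rides; .copy() avoids
# SettingWithCopyWarning when new columns are added to it later
tipped_rides_df = data_df[tipped_filter].copy()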

In [19]:
# BASE FARE
# add a column: tip as a percentage of the base passenger fare
tipped_rides_df["pct_tip_of_base"] = tipped_rides_df["tip_dollars"] / tipped_rides_df["base_passenger_fare_dollars"] * 100

# mean of the per-ride percentages, rounded to a whole percent
tipped_rides_df["pct_tip_of_base"].mean().round(0)

21.0

In [20]:
# TOTAL FARE BEFORE TIP
# add a column: total fare before tip (base fare plus tolls, fees, taxes, and surcharges)
tipped_rides_df["total_fare_before_tip"] = tipped_rides_df[["base_passenger_fare_dollars",
    "tolls_dollars",
    "black_car_fund_dollars",
    "sales_tax_dollars",
    "congestion_surcharge_dollars",
    "airport_fee_dollars"]].sum(axis=1)

# add a column: tip as a percentage of the total fare before tip
tipped_rides_df["pct_tip_of_total"] = tipped_rides_df["tip_dollars"] / tipped_rides_df["total_fare_before_tip"] * 100

# mean of the per-ride percentages, rounded to a whole percent
tipped_rides_df["pct_tip_of_total"].mean().round(0)

17.0
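The figures above are means of per-ride percentages, so every ride counts equally regardless of its fare. A related but different measure is total tips divided by total fares, which weights expensive rides more heavily. The sketch below shows how the two can diverge, using invented numbers rather than the notebook's data; `toy_tipped_df`, `mean_of_ride_pcts`, and `pct_of_totals` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data: one short ride with a generous tip, one long ride with a modest tip
toy_tipped_df = pd.DataFrame({
    "tip_dollars": [2.0, 5.0],
    "base_passenger_fare_dollars": [10.0, 50.0],
})

def mean_of_ride_pcts(df, tip_col, fare_col):
    """Average the per-ride tip percentages (the approach used in the cells above)."""
    return float((df[tip_col] / df[fare_col]).mean() * 100)

def pct_of_totals(df, tip_col, fare_col):
    """Divide total tips by total fares (a fare-weighted alternative)."""
    return float(df[tip_col].sum() / df[fare_col].sum() * 100)

per_ride = mean_of_ride_pcts(toy_tipped_df, "tip_dollars", "base_passenger_fare_dollars")
aggregate = pct_of_totals(toy_tipped_df, "tip_dollars", "base_passenger_fare_dollars")
print(round(per_ride, 1), round(aggregate, 1))
```

In this toy case the per-ride mean is higher than the aggregate figure because the small fare carries the larger percentage tip. Which measure to report depends on whether the question is about the typical ride or about overall tip revenue.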

## **3. What are the busiest tipped routes?**

1. JFK Airport to Outside of NYC
2. LaGuardia Airport to Outside of NYC
3. Times Sq/Theatre District to LaGuardia Airport
4. LaGuardia Airport to Midtown South
5. LaGuardia Airport to Times Sq/Theatre District

In [24]:
# create a route column
tipped_rides_df["route"] = tipped_rides_df["pickup_zone"] + " to " + tipped_rides_df["dropoff_zone"]

# isolate most frequent tipped routes
tipped_rides_df["route"].value_counts().head(5)

route
JFK Airport to Outside of NYC                     15
LaGuardia Airport to Outside of NYC                8
Times Sq/Theatre District to LaGuardia Airport     7
LaGuardia Airport to Midtown South                 6
LaGuardia Airport to Times Sq/Theatre District     5
Name: count, dtype: int64
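As an alternative to concatenating zone names into a single string, routes can be counted by grouping on the two zone columns directly, which keeps pickup and dropoff available as separate index levels. This is a sketch on made-up rides, assuming the same `pickup_zone` and `dropoff_zone` column names; `toy_rides_df` and `top_routes` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy rides; zone names are borrowed from the output above, counts are invented
toy_rides_df = pd.DataFrame({
    "pickup_zone": ["JFK Airport", "JFK Airport", "LaGuardia Airport", "Midtown South"],
    "dropoff_zone": ["Outside of NYC", "Outside of NYC", "Midtown South", "JFK Airport"],
})

def top_routes(df, n=5):
    """Count rides per (pickup_zone, dropoff_zone) pair and return the n busiest."""
    route_counts = df.groupby(["pickup_zone", "dropoff_zone"]).size()
    return route_counts.sort_values(ascending=False).head(n)

toy_top = top_routes(toy_rides_df, n=1)
print(toy_top)
```

Applied to the real data, `top_routes(tipped_rides_df)` should rank routes the same way as `value_counts()` on the concatenated column, though routes with tied counts may appear in a different order.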