# Driver Pay
- This notebook studies how rides differ by platform
- Uber has the largest market share at 69%, compared with Lyft's 23%
- I show that Lyft rides are more likely to include a tip for the driver
- Uber drivers are paid more per mile or per minute, even after accounting for the fact that Lyft riders are more likely to tip


In [16]:
from sqlalchemy import create_engine
engine = create_engine('postgresql://root:root@localhost:5432/uber')
engine.connect()

%load_ext sql
%sql postgresql://root:root@localhost:5432/uber

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [17]:
%%sql
SELECT * FROM main 
LIMIT 10;

 * postgresql://root:***@localhost:5432/uber
10 rows affected.


index,hvfhs_license_num,dispatching_base_num,originating_base_num,request_datetime,on_scene_datetime,pickup_datetime,dropoff_datetime,PULocationID,DOLocationID,trip_miles,trip_time,base_passenger_fare,tolls,bcf,sales_tax,congestion_surcharge,airport_fee,tips,driver_pay,shared_request_flag,shared_match_flag,access_a_ride_flag,wav_request_flag,wav_match_flag,pickup_hour,pickup_dayofweek,platform,has_tips,driver_pay_per_mile,driver_pay_per_minute
10840404,HV0005,B02510,,2019-08-17 15:59:10,,2019-08-17 16:02:35,2019-08-17 16:32:28,143,145,4.239,1793,19.41,0.16,0.49,1.74,2.75,,0.0,19.44,N,N,N,N,N,16,Saturday,Lyft,0,4.585987,0.65052986
2252549,HV0003,B02870,B02870,2019-08-04 03:15:51,2019-08-04 03:24:27,2019-08-04 03:24:27,2019-08-04 03:31:54,80,198,1.54,447,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,N,N,,N,N,3,Sunday,Uber,0,0.0,0.0
13745783,HV0003,B02879,B02879,2019-08-22 07:42:04,2019-08-22 07:43:54,2019-08-22 07:45:32,2019-08-22 07:50:41,69,119,1.32,308,3.85,0.0,0.0,0.33,0.0,,0.0,5.39,N,N,,N,N,7,Thursday,Uber,0,4.0833335,1.05
2620430,HV0005,B02510,,2019-08-04 17:25:30,,2019-08-04 17:29:54,2019-08-04 17:43:10,144,162,2.85,796,19.74,0.0,0.49,1.75,2.75,,3.71,9.7,N,N,N,N,N,17,Sunday,Lyft,1,3.4035087,0.73115575
3601407,HV0003,B02875,B02875,2019-08-06 12:08:15,2019-08-06 12:08:25,2019-08-06 12:10:25,2019-08-06 12:21:24,244,119,3.1,636,9.34,0.0,0.0,0.81,0.0,,0.0,8.83,N,N,,N,N,12,Tuesday,Uber,0,2.848387,0.83301884
2863573,HV0003,B02869,B02869,2019-08-05 05:00:06,2019-08-05 05:00:12,2019-08-05 05:02:47,2019-08-05 05:13:18,167,213,3.5,632,17.76,0.0,0.0,1.54,0.0,,0.0,13.57,N,N,,N,N,5,Monday,Uber,0,3.877143,1.2882911
7773871,HV0003,B02883,B02883,2019-08-12 19:49:50,2019-08-12 19:50:01,2019-08-12 19:52:26,2019-08-12 20:03:52,125,148,1.64,675,11.5,0.0,0.0,1.0,2.75,,0.0,7.45,N,N,,N,N,19,Monday,Uber,0,4.542683,0.6622222
13497465,HV0005,B02510,,2019-08-21 18:57:26,,2019-08-21 19:02:31,2019-08-21 19:36:55,186,229,2.796,2064,11.35,0.0,0.28,1.01,0.75,,0.0,0.0,Y,Y,N,N,N,19,Wednesday,Lyft,0,0.0,0.0
12036795,HV0003,B02877,B02877,2019-08-19 09:28:33,2019-08-19 09:30:20,2019-08-19 09:31:26,2019-08-19 09:34:50,119,247,0.84,203,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,N,N,,N,N,9,Monday,Uber,0,0.0,0.0
13943106,HV0005,B02510,,2019-08-22 14:00:58,,2019-08-22 14:05:32,2019-08-22 15:06:48,36,48,9.696,3676,31.48,0.0,0.86,3.06,0.75,,0.0,0.0,Y,Y,N,N,N,14,Thursday,Lyft,0,0.0,0.0
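
The last two columns appear to be derived from the raw fields. As a quick check, the sketch below recomputes them for the first sample row, assuming `trip_time` is recorded in seconds (that row's 1793 matches the 29 minutes 53 seconds between its pickup and dropoff timestamps). The function names are illustrative and not part of the notebook.

```python
def pay_per_mile(driver_pay, trip_miles):
    """Driver pay divided by trip distance in miles."""
    return driver_pay / trip_miles


def pay_per_minute(driver_pay, trip_time_seconds):
    """Driver pay divided by trip duration, converting seconds to minutes."""
    return driver_pay / (trip_time_seconds / 60.0)


# Values copied from the first sample row (index 10840404).
driver_pay, trip_miles, trip_time = 19.44, 4.239, 1793

per_mile = pay_per_mile(driver_pay, trip_miles)
per_minute = pay_per_minute(driver_pay, trip_time)

# Compare against the stored driver_pay_per_mile (4.585987)
# and driver_pay_per_minute (0.65052986) for the same row.
print(per_mile, per_minute)
```

Both recomputed values agree with the stored columns to about five decimal places, which supports reading `trip_time` as seconds.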


### Tips and Ride Platform
- How does the likelihood of tips vary with the platform?
- To study this, I group by the `platform` column, which holds the brand name for each TLC license number (`hvfhs_license_num`)
- For each platform, I cast `tips > 0` to an integer and average it, which gives the share of rides that include a tip
- I sort the platforms in descending order of that share

- Lyft rides are the most likely to include a tip (18.11%), ahead of Uber (11.17%), Via (9.11%), and Juno (8.66%)


In [18]:
%%sql
SELECT 
platform,
ROUND(100*AVG(
    (tips > 0)::int
    ), 2) "Percent with Tips"
FROM main
GROUP BY platform
ORDER BY AVG((tips > 0)::int) DESC

 * postgresql://root:***@localhost:5432/uber
4 rows affected.


platform,Percent with Tips
Lyft,18.11
Uber,11.17
Via,9.11
Juno,8.66
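
The `(tips > 0)::int` cast turns each ride into a 0 or 1, so its average is the fraction of rides with a tip. Here is a minimal Python sketch of the same calculation, using only the `platform` and `tips` values from the ten sample rows shown earlier. Ten rows are far too few to reproduce the percentages above, and `percent_with_tips` is an illustrative helper, not part of the notebook.

```python
from collections import defaultdict

# (platform, tips) pairs copied from the ten sample rows of `main`.
sample_rides = [
    ("Lyft", 0.0), ("Uber", 0.0), ("Uber", 0.0), ("Lyft", 3.71), ("Uber", 0.0),
    ("Uber", 0.0), ("Uber", 0.0), ("Lyft", 0.0), ("Uber", 0.0), ("Lyft", 0.0),
]


def percent_with_tips(rides):
    """Return {platform: percent of rides with tips > 0}, highest first."""
    flags = defaultdict(list)
    for platform, tips in rides:
        flags[platform].append(int(tips > 0))  # same idea as (tips > 0)::int
    shares = {p: round(100 * sum(f) / len(f), 2) for p, f in flags.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


print(percent_with_tips(sample_rides))
```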


### Market Share

In [19]:
%%sql
SELECT platform, 
ROUND(
    (100 * COUNT(*) / (SELECT COUNT(*) FROM main))::numeric
, 2) market_share
FROM main 
GROUP BY platform
ORDER BY COUNT(*) DESC

 * postgresql://root:***@localhost:5432/uber
4 rows affected.


platform,market_share
Uber,69.0
Lyft,23.0
Via,4.0
Juno,3.0
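
The shares above are whole numbers and sum to 99 rather than 100, which is consistent with integer division: `100 * COUNT(*)` and the subquery count are both integers, so the quotient is truncated before the cast to `numeric`. The sketch below reproduces the effect in an in-memory SQLite database, used here only as a stand-in that shares this integer-division behavior with PostgreSQL. The three-row table is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (platform TEXT)")
# Hypothetical toy data: two Uber rides and one Lyft ride.
conn.executemany("INSERT INTO main VALUES (?)", [("Uber",), ("Uber",), ("Lyft",)])

query = """
SELECT platform,
       ROUND({hundred} * COUNT(*) / (SELECT COUNT(*) FROM main), 2) AS market_share
FROM main
GROUP BY platform
ORDER BY COUNT(*) DESC
"""

# Integer numerator: the division truncates before ROUND is applied.
truncated = conn.execute(query.format(hundred="100")).fetchall()
# Decimal numerator: the division keeps the fractional part.
exact = conn.execute(query.format(hundred="100.0")).fetchall()
conn.close()

print(truncated)
print(exact)
```

With these toy rows the first query reports 66 and 33, while the second reports 66.67 and 33.33. Writing `100.0 * COUNT(*)` in the market share query would keep the two decimal places that `ROUND(..., 2)` asks for.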


### Pay per Mile
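- Using the view, I compute the average pay per mile for each platform, excluding trips with zero recorded miles
- Juno (4.91) and Uber (4.08) pay the most per mile, ahead of Lyft (3.18)
- Interestingly, Via pays far less, at only 0.26 per mile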

In [26]:
%%sql
SELECT 
platform, 
ROUND(
    AVG(driver_pay / trip_miles)::numeric
    ,2) pay_per_mile
FROM main 
WHERE trip_miles > 0.00
GROUP BY platform
ORDER BY AVG(driver_pay / trip_miles) DESC

 * postgresql://root:***@localhost:5432/uber
4 rows affected.


platform,pay_per_mile
Juno,4.91
Uber,4.08
Lyft,3.18
Via,0.26


### Pay per Mile - With Tips 
- It's possible that the lower pay of Lyft drivers is offset by their higher tip rate
- To examine this, I now add tips to driver pay before dividing by trip miles
- Naturally, including tips increases pay per mile on every platform, but it doesn't remove Uber's advantage over Lyft (4.22 versus 3.41)

In [21]:
%%sql
SELECT 
platform, 
ROUND(
    AVG((driver_pay + tips) / trip_miles)::numeric
    ,2) pay_per_mile
FROM main 
WHERE trip_miles > 0.00
GROUP BY platform
ORDER BY AVG((driver_pay + tips) / trip_miles) DESC

 * postgresql://root:***@localhost:5432/uber
4 rows affected.


platform,pay_per_mile
Juno,5.02
Uber,4.22
Lyft,3.41
Via,0.32
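
To make the comparison concrete, the sketch below computes the per-mile average with and without tips for the four Lyft rides among the ten sample rows shown earlier. It illustrates the calculation on a tiny sample and does not reproduce the figures above. `avg_pay_per_mile` is a hypothetical helper name.

```python
# (driver_pay, tips, trip_miles) for the four Lyft rides in the sample rows.
lyft_rides = [
    (19.44, 0.0, 4.239),
    (9.7, 3.71, 2.85),
    (0.0, 0.0, 2.796),
    (0.0, 0.0, 9.696),
]


def avg_pay_per_mile(rides, include_tips=False):
    """Average of per-trip pay per mile, skipping trips with zero miles."""
    ratios = [
        (pay + (tips if include_tips else 0.0)) / miles
        for pay, tips, miles in rides
        if miles > 0  # mirrors WHERE trip_miles > 0.00
    ]
    return sum(ratios) / len(ratios)


without_tips = avg_pay_per_mile(lyft_rides)
with_tips = avg_pay_per_mile(lyft_rides, include_tips=True)
print(round(without_tips, 2), round(with_tips, 2))
```

Note that both queries average the per-trip ratios, so short trips count as much as long ones. Dividing total pay by total miles would weight trips by distance and could give different numbers.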


### Pay per Minute
- Using the view, I compute average pay per minute for each platform
- Juno and Uber again pay the highest rate, with Lyft paying a lower rate

In [22]:
%%sql
SELECT 
platform, 
ROUND(
    AVG(driver_pay / trip_minutes)::numeric
    ,2) pay_per_minute
FROM main 
WHERE trip_minutes > 0.00
GROUP BY platform
ORDER BY AVG(driver_pay / trip_minutes) DESC;

 * postgresql://root:***@localhost:5432/uber
(psycopg2.errors.UndefinedColumn) column "trip_minutes" does not exist
LINE 4:     AVG(driver_pay / trip_minutes)::numeric
                             ^
HINT:  Perhaps you meant to reference the column "main.trip_miles".

[SQL: SELECT 
platform, 
ROUND(
    AVG(driver_pay / trip_minutes)::numeric
    ,2) pay_per_minute
FROM main 
WHERE trip_minutes > 0.00
GROUP BY platform
ORDER BY AVG(driver_pay / trip_minutes) DESC;]
(Background on this error at: https://sqlalche.me/e/20/f405)
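
The query fails because `main` has no `trip_minutes` column. The table does have `trip_time`, which appears to be in seconds (the first sample row's 1793 equals the 29 minutes 53 seconds between its pickup and dropoff), so one possible fix is to divide `trip_time` by 60. The sketch below runs that version of the query against an in-memory SQLite table holding the ten sample rows. SQLite stands in for PostgreSQL here, so the `::numeric` cast is dropped, and the results reflect only those ten rows rather than the full dataset.

```python
import sqlite3

# (platform, driver_pay, trip_time in seconds) copied from the ten sample rows.
sample_rows = [
    ("Lyft", 19.44, 1793), ("Uber", 0.0, 447), ("Uber", 5.39, 308),
    ("Lyft", 9.7, 796), ("Uber", 8.83, 636), ("Uber", 13.57, 632),
    ("Uber", 7.45, 675), ("Lyft", 0.0, 2064), ("Uber", 0.0, 203),
    ("Lyft", 0.0, 3676),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (platform TEXT, driver_pay REAL, trip_time INTEGER)")
conn.executemany("INSERT INTO main VALUES (?, ?, ?)", sample_rows)

# trip_time / 60.0 converts seconds to minutes and avoids integer division.
pay_per_minute = conn.execute("""
    SELECT platform,
           ROUND(AVG(driver_pay / (trip_time / 60.0)), 2) AS pay_per_minute
    FROM main
    WHERE trip_time > 0
    GROUP BY platform
    ORDER BY AVG(driver_pay / (trip_time / 60.0)) DESC
""").fetchall()
conn.close()

print(pay_per_minute)
```

Alternatively, `main` already carries a `driver_pay_per_minute` column, which the Time of Day queries use, so averaging that column by platform should give the same comparison.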


## Time of Day
- Pay per mile is lowest late at night and in the early morning (2.79 at hour 5 among the hours shown)
- In contrast, those hours pay the most *per minute* (0.90 at hour 5 among the hours shown)
- Tips are most likely at midday and least likely late at night

In [23]:
%%sql
SELECT 
pickup_hour, 
ROUND(
    AVG(driver_pay_per_mile)::numeric
    , 2) "Average Pay per Mile"
FROM main 
GROUP BY pickup_hour
ORDER BY pickup_hour ASC;

 * postgresql://root:***@localhost:5432/uber
24 rows affected.


pickup_hour,Average Pay per Mile
0,3.24
1,3.2
2,3.14
3,3.09
4,3.11
5,2.79
6,2.94
7,3.29
8,3.75
9,4.02


In [24]:
%%sql
SELECT 
pickup_hour, 
ROUND(
    AVG(driver_pay_per_minute)::numeric
    , 2) "Average Pay per Minute"
FROM main 
GROUP BY pickup_hour
ORDER BY pickup_hour ASC;

 * postgresql://root:***@localhost:5432/uber
24 rows affected.


pickup_hour,Average Pay per Minute
0,0.77
1,0.78
2,0.79
3,0.82
4,0.89
5,0.9
6,0.81
7,0.74
8,0.72
9,0.72


In [25]:
%%sql
SELECT 
pickup_hour, 
ROUND(
    100*AVG(has_tips)::numeric
    , 2) "Percent that Tip"
FROM main 
GROUP BY pickup_hour
ORDER BY pickup_hour ASC;

 * postgresql://root:***@localhost:5432/uber
24 rows affected.


pickup_hour,Percent that Tip
0,10.54
1,10.51
2,9.21
3,8.23
4,11.88
5,12.87
6,11.69
7,12.18
8,12.6
9,12.88
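
The hourly breakdowns above rely on the `pickup_hour` column. In the sample rows it matches the hour of `pickup_datetime`, and `pickup_dayofweek` matches the weekday name, so the two columns were presumably derived along the lines sketched below. This is an assumption, since the notebook does not show that step, and `pickup_features` is a hypothetical helper.

```python
from datetime import datetime

WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def pickup_features(pickup_datetime):
    """Return (hour of day, weekday name) for a 'YYYY-MM-DD HH:MM:SS' string."""
    dt = datetime.strptime(pickup_datetime, "%Y-%m-%d %H:%M:%S")
    # datetime.weekday() numbers Monday as 0, matching the list above.
    return dt.hour, WEEKDAYS[dt.weekday()]


# pickup_datetime of the first sample row; stored values are 16 and Saturday.
print(pickup_features("2019-08-17 16:02:35"))
```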
