# Uber Stats

I've driven over 4000 passenger trips for Uber, and I'm curious which questions the data from those trips can answer.

<img src="./images/uber_profile.jpg" alt="Uber Trips Count" />

Although the data is not directly made available to drivers, I peeked under the hood of the Uber Drivers site, found the data request, and wrote a scraper to download all trip data to JSON:

<img src="./images/uber_scraping.png" alt="Google Developer Tools - Network Console - API Request for Uber Driver Activity" />

### Questions for Uber Passengers/Riders:

1. **Ride Availability and Popular Routes**:
   - What are the most common routes (ZIP code to ZIP code) that I drive? What does this say about ride availability in certain areas?
   - Are there specific ZIP codes with higher frequency of pickups or dropoffs during certain times? This could indicate busy areas and times for riders.

2. **Cost and Distance Analysis**:
   - What is the average cost per mile or per minute for different ride types? This could help riders understand pricing better.
   - How does the cost of a trip vary by distance or duration? Are there patterns that might suggest more cost-effective options for riders?

3. **Ride Type Preferences**:
   - Which ride types are most popular in my area? Do certain ZIP codes favor specific ride types, like UberX Share or Comfort?
   - How does the popularity of ride types change over time, and what might this suggest about rider preferences?

4. **Timing and Convenience**:
   - What are the most and least busy times of day for trips? This could help riders decide when to request a ride for the shortest wait times.
   - Is there a noticeable difference in trip duration during peak hours versus off-peak hours? How does this affect the rider experience?

5. **Tip Behavior Insights**:
   - How often do riders tip, and how does the tip amount vary by ride type and trip characteristics (like distance or time of day)?
   - Are there trends in tipping behavior that could inform riders about what others are tipping for similar rides?
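
As a starting point for the first question, here is a minimal sketch that counts ZIP-to-ZIP routes. It assumes each trip is a dict with `pickup_zipcode` and `dropoff_zipcode` keys (the field names that appear in the scraped data) and that a missing ZIP code is stored as `None`. The `top_routes` helper and the sample trips are hypothetical, for illustration only.

```python
from collections import Counter

def top_routes(rides, n=3):
    """Return the n most common (pickup ZIP, dropoff ZIP) pairs with their trip counts."""
    routes = Counter()
    for ride in rides:
        pickup = ride.get("pickup_zipcode")
        dropoff = ride.get("dropoff_zipcode")
        # Skip trips where either ZIP code is missing (assumed to be None).
        if pickup is None or dropoff is None:
            continue
        # ZIP codes may arrive as floats (e.g. 60612.0), so normalize to int.
        routes[(int(pickup), int(dropoff))] += 1
    return routes.most_common(n)

# Hypothetical sample trips, not real data.
sample_rides = [
    {"pickup_zipcode": 60614, "dropoff_zipcode": 60612.0},
    {"pickup_zipcode": 60614, "dropoff_zipcode": 60612.0},
    {"pickup_zipcode": 60607, "dropoff_zipcode": 60614.0},
    {"pickup_zipcode": 60607, "dropoff_zipcode": None},
]
print(top_routes(sample_rides))
# prints [((60614, 60612), 2), ((60607, 60614), 1)]
```

Counting ordered pairs keeps the direction of travel, so 60614 to 60612 and 60612 to 60614 are treated as different routes.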

In [2]:
import pandas as pd

rides_df = pd.read_json('enriched_rides.json')

print(rides_df.head(1))

print(rides_df.shape)

                                   uuid       date      time  \
0  d3096d6c-02bd-4f8e-855b-117588b27910 2023-01-15  13:05:57   

            timestamp     day  day_of_week sortable_day_of_week  season  \
0 2023-01-15 19:05:57  Sunday            6           6 - Sunday  Winter   

      type  earnings  ...  surge  duration  distance  \
0  Comfort     10.72  ...    0.0       956       3.9   

                              pickup_address  \
0  N Ashland Ave, Chicago, IL 60614-1101, US   

                       dropoff_address earnings-surge  earnings/second  \
0  W Madison St, Chicago, IL 60612, US          10.72         0.011213   

   earnings/mile  pickup_zipcode  dropoff_zipcode  
0       2.748718           60614          60612.0  

[1 rows x 21 columns]
(3657, 21)


In [3]:
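import datetime
import json

# `rides` is the raw list of trip dicts.
# Assumption: enriched_rides.json is a JSON array with one object per trip.
with open('enriched_rides.json') as f:
    rides = json.load(f)

total_duration = sum(r['duration'] for r in rides)  # seconds
total_distance = sum(r['distance'] for r in rides)  # miles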

time_delta = datetime.timedelta(seconds=total_duration)
days = time_delta.days
hours, remainder = divmod(time_delta.seconds, 3600)
minutes, seconds = divmod(remainder, 60)

# How many hours per day spent in the car, on average?
days_worked = len(set([ride['date'] for ride in rides]))
hours_worked_per_day = (total_duration / 3600) / days_worked


print(f"Time spent driving other people: {days} days, {hours} hours, {minutes} minutes, {seconds} seconds")
print(f"Number of days actually worked: {days_worked}")
print(f"Average hours worked per day: {hours_worked_per_day}")
print(f"Average mph: {total_distance / (total_duration / 3600) }")
print(f"Average miles traveled per day: {total_distance / days_worked}")

Time spent driving other people: 43 days, 17 hours, 59 minutes, 5 seconds
Number of days actually worked: 321
Average hours worked per day: 3.2709804430598823
Average mph: 15.84023047954399
Average miles traveled per day: 51.813084112149454
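
The same kind of aggregate can be split by ride type, which touches on the cost-per-mile question. The sketch below assumes each trip dict has `type`, `earnings`, and `distance` keys. Note that `earnings` is what the driver received, so this is only a proxy for what riders paid. The `earnings_per_mile_by_type` helper and the sample trips are hypothetical.

```python
from collections import defaultdict

def earnings_per_mile_by_type(rides):
    """Return {ride type: total earnings / total miles} across all trips of that type."""
    earnings = defaultdict(float)
    miles = defaultdict(float)
    for ride in rides:
        earnings[ride["type"]] += ride["earnings"]
        miles[ride["type"]] += ride["distance"]
    # Skip types with zero recorded miles to avoid dividing by zero.
    return {t: earnings[t] / miles[t] for t in earnings if miles[t] > 0}

# Hypothetical sample trips, not real data.
sample_rides = [
    {"type": "Comfort", "earnings": 10.0, "distance": 4.0},
    {"type": "Comfort", "earnings": 6.0, "distance": 4.0},
    {"type": "UberX Share", "earnings": 9.0, "distance": 6.0},
]
print(earnings_per_mile_by_type(sample_rides))
# prints {'Comfort': 2.0, 'UberX Share': 1.5}
```

Dividing the totals weights each type by miles driven, so a handful of very short trips cannot dominate the figure the way they would in an average of per-trip ratios.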


In [6]:
print("Total earned in 2024:")
print(f"{sum([ride['earnings'] for ride in rides if '2024' in ride['date']])}")
print("Total tips earned in 2024:")
print(f"{sum([ride['tip'] for ride in rides if '2024' in ride['date']])}")
print("Last trip date recorded:")
print(f"{rides[-1]['date']}")

Total earned in 2024:
24085.180000000066
Total tips earned in 2024:
3282.549999999999
Last trip date recorded:
2024-08-19
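
Beyond the tip total, the question of how often riders tip can be sketched as follows. This assumes the `tip` field is a number, with `0` or a missing value meaning no tip. The `tip_summary` helper and the sample trips are hypothetical.

```python
def tip_summary(rides):
    """Return (share of trips with a tip, average tip among tipped trips)."""
    tips = [ride.get("tip") or 0 for ride in rides]  # treat missing tips as 0
    if not tips:
        return 0.0, 0.0
    tipped = [t for t in tips if t > 0]
    share = len(tipped) / len(tips)
    average = sum(tipped) / len(tipped) if tipped else 0.0
    return share, average

# Hypothetical sample trips, not real data.
sample_rides = [{"tip": 3.0}, {"tip": 0.0}, {"tip": 5.0}, {"tip": None}]
share, average = tip_summary(sample_rides)
print(f"{share:.0%} of trips tipped, average tip ${average:.2f}")
# prints 50% of trips tipped, average tip $4.00
```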


In [13]:
from collections import defaultdict
earnings_per_month = defaultdict(lambda: 0)
for ride in rides:
    if '2024' in ride['date']:
        month = int(ride['date'][5:7])
        earnings_per_month[month] += ride['earnings']
earnings_per_month

defaultdict(<function __main__.<lambda>()>,
            {1: 3631.58,
             2: 2507.3799999999987,
             3: 3903.1400000000012,
             4: 2933.590000000001,
             5: 3723.7200000000034,
             6: 2871.8700000000003,
             7: 1263.3599999999994,
             8: 3250.539999999999})
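
The long decimals above are ordinary floating-point rounding noise. A variant that rounds each total to cents and keys on year and month, so it is not limited to 2024, might look like the sketch below. It assumes `date` is an ISO `YYYY-MM-DD` string, as the slicing in the cell above implies. The `earnings_by_month` helper and the sample trips are hypothetical.

```python
from collections import defaultdict

def earnings_by_month(rides):
    """Return {"YYYY-MM": total earnings}, rounded to cents and sorted by month."""
    totals = defaultdict(float)
    for ride in rides:
        totals[ride["date"][:7]] += ride["earnings"]  # "2024-03-15" -> "2024-03"
    return {month: round(total, 2) for month, total in sorted(totals.items())}

# Hypothetical sample trips, not real data.
sample_rides = [
    {"date": "2024-02-01", "earnings": 10.10},
    {"date": "2024-01-15", "earnings": 7.25},
    {"date": "2024-02-20", "earnings": 5.20},
    {"date": "2023-12-31", "earnings": 12.00},
]
print(earnings_by_month(sample_rides))
# prints {'2023-12': 12.0, '2024-01': 7.25, '2024-02': 15.3}
```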