# Get Uber Ride Data

As a full-time Uber driver, I want to analyze my past rides so I can make informed choices about future driving and optimize for higher earnings. However, when I log in to the [Uber Drivers](https://drivers.uber.com/earnings/activities) page, I can only view my past rides a week at a time, and that weekly view is frustratingly paginated and lacks relevant details. Being a programmer, I naturally thought of downloading the data through an API. As of August 24, 2024, access to the [Uber Drivers API](https://developer.uber.com/docs/drivers/introduction) is "limited", and the info page carries only a vague message about applying for access, so that's not a solution for my personal data needs.

**Caveat**: This is likely against the Uber Drivers TOS, and I'm doing it at my own risk of having my account limited or banned. I really want this data, though, and I'm requesting it in a reasonable manner. **Proceed at your own risk.**

## Reverse Engineering the Uber Drivers page

By going into the Google Chrome Developer Console Network tab when loading the Uber Drivers page, I can see that the weekly paginated rides data is initially retrieved through an HTTP POST request to `https://drivers.uber.com/earnings/api/getWebActivityFeed?localeCode=en` with the request payload:

```
{"startDateIso":"2024-05-13","endDateIso":"2024-05-20","paginationOption":{}}
```

This retrieves a JSON object including my rides data that's far richer than what's actually displayed on the page. Jackpot! 

If there are more pages of data, the response JSON (parsed into a Python object `data`) has `data['data']['pagination']['hasMoreData']` set to `True`. A pagination cursor is then available in `data['data']['pagination']['nextCursor']`, and the next page can be requested with a similar HTTP POST request to `https://drivers.uber.com/earnings/api/getWebActivityFeed?localeCode=en`, substituting the cursor value into the request payload:

```
{"startDateIso":"2024-05-13","endDateIso":"2024-05-20","paginationOption":{"cursor": data['data']['pagination']['nextCursor']}}
```

Repeating this request until `data['data']['pagination']['hasMoreData']` is `False` will retrieve all data for that time period. 
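
The cursor-following loop described above can be sketched independently of the network call. This is a minimal illustration, not the code I actually ran: `fetch_all_pages`, `fake_fetch_page`, and the `FAKE_PAGES` data are hypothetical stand-ins, and only the `activities`, `pagination`, `hasMoreData`, and `nextCursor` field names come from the real response.

```python
from typing import Callable, Optional


def fetch_all_pages(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Collect activities by following nextCursor until hasMoreData is false."""
    activities = []
    cursor = None  # The first request sends an empty paginationOption
    while True:
        body = fetch_page(cursor)["data"]
        # A page may carry no activities at all, so fall back to an empty list
        activities.extend(body.get("activities") or [])
        pagination = body.get("pagination") or {}
        if not pagination.get("hasMoreData"):
            break
        cursor = pagination["nextCursor"]
    return activities


# Made-up pages standing in for the real POST responses
FAKE_PAGES = {
    None: {"data": {"activities": [{"uuid": "a"}, {"uuid": "b"}],
                    "pagination": {"hasMoreData": True, "nextCursor": "page-2"}}},
    "page-2": {"data": {"activities": [{"uuid": "c"}],
                        "pagination": {"hasMoreData": False}}},
}


def fake_fetch_page(cursor):
    """Hypothetical replacement for the HTTP request, keyed on the cursor."""
    return FAKE_PAGES[cursor]


all_activities = fetch_all_pages(fake_fetch_page)
print(len(all_activities))
```

Passing the fetch step in as a function keeps the pagination logic testable without touching Uber's servers.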

Performing this entire series of requests for all weekly date ranges from the date I started driving Uber (2023-01-09) to now will get me all of my ride data. 
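
As a rough sketch of that weekly walk, the date ranges can be generated separately from the requests. The helper names `weekly_ranges` and `dedupe_by_uuid` are hypothetical. Note that consecutive ranges share a boundary date (one week's end date is the next week's start date), mirroring the 2024-05-13 to 2024-05-20 payload above. Whether `endDateIso` is inclusive isn't established here, so deduplicating on each ride's `uuid` is included as a cheap safeguard under that uncertainty.

```python
from datetime import date, timedelta


def weekly_ranges(start, end):
    """Yield (start_iso, end_iso) pairs covering start..end in 7-day steps."""
    current = start
    while current <= end:
        next_date = current + timedelta(days=7)
        yield current.isoformat(), next_date.isoformat()
        current = next_date


def dedupe_by_uuid(rides):
    """Keep the first occurrence of each ride, keyed on its 'uuid' field."""
    seen = set()
    unique = []
    for ride in rides:
        if ride["uuid"] not in seen:
            seen.add(ride["uuid"])
            unique.append(ride)
    return unique


for start_iso, end_iso in weekly_ranges(date(2023, 1, 9), date(2023, 1, 24)):
    print(start_iso, end_iso)
```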

Importantly, this is all occurring inside an authenticated HTTPS session. In my initial discovery and testing, I used Postman to perform the requests. Through the Network tab in the Google Chrome Developer Console, I right-clicked the POST request, chose `Copy > Copy as cURL`, and then imported that into Postman using `File > Import...`. 

After confirming it worked, I migrated to Python to request everything programmatically. To generate starter code for Python requests, I opened this request in Postman, chose the `Code` option in the pane on the right side of the window, then chose "Python - Requests" from the drop-down menu. The generated code includes the full authentication tokens in plaintext under the `cookie` key of the headers dictionary.
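
Because that cookie string is effectively a login credential, one option is to keep it out of the notebook and read it from the environment at runtime. The sketch below is an assumption-laden illustration: the `UBER_COOKIE` variable name and the `with_cookie_from_env` helper are hypothetical, and the trimmed `base_headers` is only an example. I have not established which of the generated headers the endpoint actually requires, so the helper just merges the cookie into whatever headers it is given.

```python
import os


def with_cookie_from_env(headers, env_var="UBER_COOKIE"):
    """Return a copy of headers whose 'cookie' value comes from an environment variable."""
    cookie = os.environ.get(env_var)
    if not cookie:
        raise RuntimeError(
            f"Set the {env_var} environment variable to the cookie string copied from the browser"
        )
    merged = dict(headers)  # Copy so the caller's dictionary is not mutated
    merged["cookie"] = cookie
    return merged


# Example headers without the secret; the cookie is filled in at runtime
base_headers = {
    "content-type": "application/json",
    "origin": "https://drivers.uber.com",
    "referer": "https://drivers.uber.com/earnings/activities",
    "x-csrf-token": "x",
}
```

With the variable exported in the shell before starting Jupyter, `with_cookie_from_env(base_headers)` yields a headers dictionary that can be passed to `requests` without the tokens ever being saved in the notebook.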

With all of the data downloaded, I extract the rides data specifically and save this to a local JSON file so I don't have to re-request the data. I then perform some data cleaning, parsing, and enrichment to ultimately produce CSV files that I can easily analyze with other programs and import into a spreadsheet. 

In [1]:
import csv
from datetime import datetime, timedelta
import json
import re
import pandas as pd
import requests

In [4]:
class UberDriver:
    def __init__(self):
        self.headers = {
          'accept': '*/*',
          'accept-language': 'en-US,en;q=0.9',
          'content-type': 'application/json',
          'cookie': 'marketing_vistor_id=e6179b1a-9343-4ccf-94f3-49c0a0872691; udi-id=96R0deYd5NKtgpGIU0qFlFAcBzBQdTIRqsIlYJbvKhZuroYcx6SkceuzEon93iJ1/4V6+aiHnY4FEB1asER6hueGDGiVa+rD521twej17+B6ZIhp+seAhGU26SYRIVM90DgeCyb6L5jVWA7X2q7g4WwpAdBUzfnV4+M2TqnoBJ/bUQiS5a9rOHzFhVo6lCk84g8Aa3PHc6xcKn/3Hl64DQ==fvhyJitF/W+NoxdxMgTOVQ==Kt6etXM12v3JFy3hyJfCG/nDTSP3v9qMot79dk8GMWw=; _ga=GA1.2.1479445038.1725558410; _ua={"session_id":"35d416d9-038a-4ef5-8c4b-5a3cb64ba693","session_time_ms":1726067606130}; x-uber-analytics-session-id=2d9df78f-8237-4399-9dad-ea95b85d3bcf; udi-fingerprint=IsfVqvFQ6U7zXjFSex9f78kOt9zUKBxnLMvggArLxEMtiS7QjdX8UH4ALuWvgQqbkA3xO0RLlZxwvhhS7ys/gw==I85WUL/fsmgUX2zxrmk6W7+y3np06/p4fHTM3m2d55Q=; isWebLogin=true; sid=QA.CAESELaVBpfVeUpBmXnJnLbcKf0Y3ImLuQYiATEqJDYwOTJiYWFkLTAxYjMtNDRiYi1iZTE2LTAxZDZlMWIzYWM5YzJAFL-KaHDZrJxSY84kGACPtifIhJITxWJMt9EFAkz1y0Uc-Ob9oDNei1ZWE14aT_t2gdlVYeu8gSlVvyA0rVmvUDoBMUIIdWJlci5jb20.NQ0u6aQmFYoFvoj2ZhI6Fg1xup7yaAx4xlPYfcljnss; csid=1.1730331868279.JDd4PJtx+7GeNJ6ML2hoPisPYOjY1Q6OD15otAUshok=; jwt-session=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE3Mjc3Mzk4NzQsImV4cCI6MTcyNzgyNjI3NH0.lYcWnzcy07MFT13LWx9JmkOJCYIePPjh4unR8_LRz54; mp_adec770be288b16d9008c964acfba5c2_mixpanel=%7B%22distinct_id%22%3A%20%226092baad-01b3-44bb-be16-01d6e1b3ac9c%22%2C%22%24device_id%22%3A%20%2219182de2167581-04c7044d865e5-18525637-13c680-19182de21681438%22%2C%22%24initial_referrer%22%3A%20%22https%3A%2F%2Fauth.uber.com%2F%22%2C%22%24initial_referring_domain%22%3A%20%22auth.uber.com%22%2C%22%24user_id%22%3A%20%226092baad-01b3-44bb-be16-01d6e1b3ac9c%22%2C%22%24search_engine%22%3A%20%22google%22%7D',
          'origin': 'https://drivers.uber.com',
          'priority': 'u=1, i',
          'referer': 'https://drivers.uber.com/earnings/activities',
          'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
          'sec-ch-ua-mobile': '?0',
          'sec-ch-ua-platform': '"macOS"',
          'sec-fetch-dest': 'empty',
          'sec-fetch-mode': 'cors',
          'sec-fetch-site': 'same-origin',
          'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
          'x-csrf-token': 'x',
          'x-uber-earnings-seed': '70946254c2d18c0633a83cf845ab98f2'
        }

    
    def get_rides(self, start_date_iso, end_date_iso):        
        url = "https://drivers.uber.com/earnings/api/getWebActivityFeed?localeCode=en"
    
        payload = json.dumps({
          "startDateIso": start_date_iso,
          "endDateIso": end_date_iso,
          "paginationOption": {}
        })
        
        response = requests.request("POST", url, headers=self.headers, data=payload)
        # Fail loudly on HTTP errors (for example, an expired session cookie)
        response.raise_for_status()
        data = response.json()

        # 'activities' can be empty or None, so always fall back to a list
        rides = data['data']['activities'] or []
        while data['data']['pagination']['hasMoreData']:
            payload = json.dumps({
              "startDateIso": start_date_iso,
              "endDateIso": end_date_iso,
              "paginationOption": {"cursor": data['data']['pagination']['nextCursor']}
            })
            response = requests.request("POST", url, headers=self.headers, data=payload)
            response.raise_for_status()
            data = response.json()
            rides += data['data']['activities'] or []
        return rides

    def get_ride_detail(self, ride_uuid):
        # This is only helpful to get additional fare breakdown from Uber, if we wanted to analyze how much
        # Uber is taking from each fare.
        url = f"https://drivers.uber.com/earnings/trips/{ride_uuid}"
        response = requests.request("GET", url, headers=self.headers)
        return response.text

In [None]:
uber = UberDriver()

# This is the date I started working as an Uber driver; modify for your start date
start_date = datetime.strptime("2023-01-09", "%Y-%m-%d")
end_date = datetime.today()
current_date = start_date

rides = []
while current_date <= end_date:
    next_date = current_date + timedelta(days = 7)
    start_date_iso = current_date.strftime('%Y-%m-%d')
    end_date_iso = next_date.strftime('%Y-%m-%d')
    print(f"Getting rides for {start_date_iso} - {end_date_iso}...")
    new_rides = uber.get_rides(start_date_iso, end_date_iso)
    print(f"Retrieved {len(new_rides)} rides.")
    rides += new_rides
    current_date = next_date

# Let's dump all of these rides to a JSON file so we can reference this data outside of the script if need be, 
# or simply not have to retrieve from Uber again.
with open("../data/rides_raw.json", "w") as file:
    json.dump(rides, file)

print(f"Retrieved {len(rides)} rides total.")

Getting rides for 2023-01-09 - 2023-01-16...
Retrieved 42 rides.
Getting rides for 2023-01-16 - 2023-01-23...
Retrieved 9 rides.
Getting rides for 2023-01-23 - 2023-01-30...
Retrieved 13 rides.
Getting rides for 2023-01-30 - 2023-02-06...
Retrieved 0 rides.
Getting rides for 2023-02-06 - 2023-02-13...
Retrieved 26 rides.
Getting rides for 2023-02-13 - 2023-02-20...
Retrieved 26 rides.
Getting rides for 2023-02-20 - 2023-02-27...
Retrieved 7 rides.
Getting rides for 2023-02-27 - 2023-03-06...
Retrieved 29 rides.
Getting rides for 2023-03-06 - 2023-03-13...
Retrieved 9 rides.
Getting rides for 2023-03-13 - 2023-03-20...
Retrieved 51 rides.
Getting rides for 2023-03-20 - 2023-03-27...
Retrieved 59 rides.
Getting rides for 2023-03-27 - 2023-04-03...
Retrieved 26 rides.
Getting rides for 2023-04-03 - 2023-04-10...
Retrieved 0 rides.
Getting rides for 2023-04-10 - 2023-04-17...
Retrieved 0 rides.
Getting rides for 2023-04-17 - 2023-04-24...
Retrieved 0 rides.
Getting rides for 2023-04-24 - 2

In [6]:
# Let's re-load rides using what's in the JSON file to confirm that it's what we need.
# We can also re-run from this point onward without having to hit the Uber site again.
rides_df = pd.read_json('../data/rides_raw.json')

# Let's see what the ride data looks like
print(rides_df.head())
print("\nUnique values for ride type:\n")
print(rides_df['type'].unique())
print("\nUnique values for ride activityTitle:\n")
print(rides_df['activityTitle'].unique())

# There's some deeply nested JSON data in these fields, so a dataframe isn't the right form yet. I'll convert to 
# a list of dicts and clean it up there. 
rides = rides_df.to_dict(orient='records')

                                   uuid  recognizedAt activityTitle  \
0  d3096d6c-02bd-4f8e-855b-117588b27910    1673809557       Comfort   
1  89a3e777-2000-44af-8509-765b157dfe9e    1673808331         UberX   
2  78616fc5-6214-4f31-89a0-4229e35f0c5c    1673807621         UberX   
3  b6457299-8e46-44ec-b151-398649205336    1673806661         UberX   
4  ae817bfc-76a2-4a47-8622-d85cc9b5dd8e    1673802749         UberX   

  formattedTotal                                            routing  \
0         $10.72  {'webviewUrl': 'https://drivers.uber.com/earni...   
1          $4.13  {'webviewUrl': 'https://drivers.uber.com/earni...   
2          $7.56  {'webviewUrl': 'https://drivers.uber.com/earni...   
3         $11.19  {'webviewUrl': 'https://drivers.uber.com/earni...   
4         $10.48  {'webviewUrl': 'https://drivers.uber.com/earni...   

                                    breakdownDetails  \
0  {'formattedTip': '$1.00', 'formattedSurge': None}   
1                                 

In [13]:
def parse_time_to_seconds(time_str):
    matches = re.findall(r'(\d+)\s*(hr|min|sec)', time_str)
    unit_to_seconds = {'hr': 3600, 'min': 60, 'sec': 1}
    return sum(int(value) * unit_to_seconds[unit] for value, unit in matches)

def parse_miles(miles_str):
    match = re.search(r'(\d+\.?\d*)\s*mi', miles_str)
    # Return None instead of raising when the string has no "<number> mi" part
    return float(match.group(1)) if match else None

def parse_currency_to_float(currency_str):
    clean_str = currency_str.replace('$', '').strip()
    return float(clean_str)

def parse_season(date):
    """Return the season for a given datetime object."""
    seasons = {
        'Spring': (3, 21, 6, 20),
        'Summer': (6, 21, 9, 20),
        'Fall': (9, 21, 12, 20),
        'Winter': (12, 21, 3, 20)
    }
    month = date.month
    day = date.day
    for season, (start_month, start_day, end_month, end_day) in seasons.items():
        if start_month <= end_month:
            if start_month <= month <= end_month:
                if (month == start_month and day >= start_day) or (month == end_month and day <= end_day) or (start_month < month < end_month):
                    return season
        else:
            if month > start_month or month < end_month or (month == start_month and day >= start_day) or (month == end_month and day <= end_day):
                return season

def extract_zipcode(address):
    zip_code_pattern = re.compile(r'\b\d{5}\b')
    match = zip_code_pattern.search(address)
    if match:
        return match.group()

# Filter out any rides that don't have breakdownDetails or tripMetaData, which are present for completed passenger rides
rides = [ride for ride in rides if ride.get('breakdownDetails') and ride.get('tripMetaData')]

cleaned_rides = []
for ride in rides:
    # What: the kind of activity and its status
    ride_type = ride['activityTitle']
    status = ride['status']
    note = ride['type']
    # Money: tip and surge can be None, so default them to $0.00
    tip = parse_currency_to_float(ride['breakdownDetails']['formattedTip'] or '$0.00')
    surge = parse_currency_to_float(ride['breakdownDetails']['formattedSurge'] or '$0.00')
    earnings = parse_currency_to_float(ride['formattedTotal'])
    # Where: trip distance plus pickup and dropoff ZIP codes
    distance = parse_miles(ride['tripMetaData']['formattedDistance'])
    pickup_zip = extract_zipcode(ride['tripMetaData']['pickupAddress'])
    dropoff_zip = extract_zipcode(ride['tripMetaData']['dropOffAddress'])
    # When: recognizedAt is a Unix timestamp in seconds; fromtimestamp() converts it to local time
    ride_start = datetime.fromtimestamp(ride['recognizedAt'])
    duration = parse_time_to_seconds(ride['tripMetaData']['formattedDuration'])
    ride_end = ride_start + timedelta(seconds=duration)
    if ride_type in ['Comfort', 'UberX', 'UberXL', 'UberX Share', 'UberX Priority', 'Uber Pet', 'Business Comfort'] \
            and status == 'COMPLETED' \
            and note == 'TRIP' \
            and pickup_zip \
            and dropoff_zip:
        cleaned_rides.append({
            'type': ride_type,
            'tip': tip,
            'surge': surge,
            'earnings': earnings,
            'distance': distance,
            'pickup_zip': pickup_zip,
            'dropoff_zip': dropoff_zip,
            'ride_start': ride_start,
            'duration': duration,
            'ride_end': ride_end
        })

rides_df = pd.DataFrame(cleaned_rides)
# to_csv takes index=False (there is no include_index argument) to omit the row index
rides_df.to_csv('../data/rides.csv', index=False)
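
One limitation of `extract_zipcode` is worth spelling out: it returns the first standalone five-digit number in the address, so a five-digit house number would be mistaken for the ZIP code. The sketch below reproduces that behavior and shows a hypothetical variant, `extract_last_zipcode`, that takes the last five-digit match instead. It assumes the ZIP code appears after the street number in the address string, and the sample addresses are made up for illustration.

```python
import re

ZIP_CODE_PATTERN = re.compile(r'\b\d{5}\b')


def extract_zipcode(address):
    """Same logic as the helper above: return the first 5-digit token, or None."""
    match = ZIP_CODE_PATTERN.search(address)
    if match:
        return match.group()


def extract_last_zipcode(address):
    """Hypothetical variant: return the last 5-digit token, or None if there is none."""
    matches = ZIP_CODE_PATTERN.findall(address)
    return matches[-1] if matches else None


# Made-up address with a five-digit house number
address = "12345 Main St, Springfield, IL 62701"
print(extract_zipcode(address))       # Picks up the house number
print(extract_last_zipcode(address))  # Picks up the trailing ZIP code
```

If addresses in the real data never start with a five-digit house number, the two functions agree and the original helper is fine as-is.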