Transportation Network Providers - Trips (2023-)

Chicago (Jan 2023 - July 11 2023)

[Website](https://data.cityofchicago.org/Transportation/Transportation-Network-Providers-Trips-2023-/n26f-ihde) Accessed July 24 2023

- 33 million reported trips

- Size: 9 GB

- Census Tracts are suppressed in some cases, and times are rounded to the nearest 15 minutes. Fares are rounded to the nearest $2.50 and tips are rounded to the nearest $1.00.

- 77 community areas (numbered 1-77 in the [official map](https://www.chicago.gov/content/dam/city/depts/doit/general/GIS/Chicago_Maps/Citywide_Maps/Community_Areas_W_Numbers.pdf)); the code below uses zero-based indices, so index = map number - 1

- Index 75 (76 on map) is the Chicago O'Hare International Airport (to and from approx 10 minutes)
- Index 31 (32 on map) is the Chicago Loop, where the vertiport takeoff is located
 
- Considering 42057# of orders requested between 04/13/2023 07:00:00 AM and 04/13/2023 09:00:00 AM
portion of total orders authorized for sharing:  0.028936918943338802

- Considering 42057# of orders requested between 05/12/2023 07:00:00 PM and 05/12/2023 11:00:00 PM
portion of total orders authorized for sharing:  0.028936918943338802
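
The rounding noted in the dataset description above limits how precisely fares, tips, and times can be recovered. As a minimal sketch, assuming values are rounded to the nearest multiple with halves rounding up (the portal does not state the tie-breaking rule, and `round_to_step` is a hypothetical helper, not part of the dataset tooling):

```python
import math

def round_to_step(value, step):
    """Round value to the nearest multiple of step (ties round up; an assumption)."""
    return math.floor(value / step + 0.5) * step

fare = round_to_step(7.30, 2.50)   # dollars, reported in $2.50 steps
tip = round_to_step(3.40, 1.00)    # dollars, reported in $1.00 steps
minute = round_to_step(22, 15)     # minutes past the hour, reported in 15 minute steps

print(fare, tip, minute)
```

Under nearest-multiple rounding the reported value can differ from the true one by at most half a step: $1.25 for fares, $0.50 for tips, and 7.5 minutes for timestamps.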



In [1]:
import pickle
import os
import sys
root = '/home/mark/Documents/code/mdrp'
sys.path.append(root)

import numpy as np
import pandas as pd

data_path = '/home/mark/Documents/code/Chicago.csv'


In [2]:
# df = pd.read_csv(data_path,nrows=100000)
df = pd.read_csv(data_path)
cols_to_keep = ['Trip Seconds',
                'Trip Miles',
                'Pickup Community Area',
                'Dropoff Community Area',
                'Fare',
                'Additional Charges',
                'Shared Trip Authorized',
                'Trip Start Timestamp'
                ]

# trim the data for a specific date
# time_start = '05/12/2023 07:00:00 PM'
# time_end = '05/12/2023 11:00:00 PM'

# lb_idx = df['Trip Start Timestamp']>=time_start
# ub_idx = df['Trip Start Timestamp']<=time_end
# idx = lb_idx&ub_idx
# df = df[idx]

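# Keep only the columns used below and drop rows with missing values
df = df[cols_to_keep]
df = df.dropna()

# Total price paid: fare plus additional charges (tips are not included)
df['Price'] = df['Fare']+df['Additional Charges']

# Remove invalid or degenerate trips
df = df.query('Price>1')
df = df.query('`Trip Miles`>0.1')
df = df.query('`Trip Seconds`>60')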

df['Trip Hours'] = df['Trip Seconds']/3600
# Value of a trip in $/hour
df['Value'] = df['Price']/df['Trip Hours']

df = df.drop(columns=['Fare', 'Additional Charges','Trip Seconds','Trip Start Timestamp'])
# Cast community area numbers to integers so they can be used as array indices
df = df.astype({'Pickup Community Area':'int', 'Dropoff Community Area':'int'})

len_df = len(df)
# print('Considering %d# of orders requested between %s and %s'%(len(df),time_start,time_end))
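
The commented-out date trim above compares the raw timestamp column against string bounds. If the column is read as text (the default for `pd.read_csv` without `parse_dates`), that comparison is lexicographic, so a 10:00 AM trip would fall "between" 07:00 PM and 11:00 PM. A minimal sketch of the same filter on parsed timestamps, assuming the `MM/DD/YYYY hh:mm:ss AM/PM` layout seen in the printed bounds; `filter_time_window` and the toy rows are hypothetical:

```python
import pandas as pd

FMT = '%m/%d/%Y %I:%M:%S %p'  # assumed layout of 'Trip Start Timestamp'

def filter_time_window(df, time_start, time_end, col='Trip Start Timestamp'):
    """Return the rows whose parsed timestamp lies in [time_start, time_end]."""
    stamps = pd.to_datetime(df[col], format=FMT)
    start = pd.to_datetime(time_start, format=FMT)
    end = pd.to_datetime(time_end, format=FMT)
    return df[(stamps >= start) & (stamps <= end)]

# Toy rows for illustration only (not taken from the dataset)
toy = pd.DataFrame({'Trip Start Timestamp': ['05/12/2023 07:15:00 PM',
                                             '05/12/2023 11:30:00 PM',
                                             '05/12/2023 10:00:00 AM']})
window = filter_time_window(toy, '05/12/2023 07:00:00 PM', '05/12/2023 11:00:00 PM')
print(window)
```

Only the 07:15 PM row survives; a plain string comparison would also have kept the 10:00 AM row.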

In [3]:
for col in df.columns:
    print(col)

# Check some overall stats
print('portion of total orders authorized for sharing: ',len(df.query('`Shared Trip Authorized` == True'))/len(df))

Trip Miles
Pickup Community Area
Dropoff Community Area
Shared Trip Authorized
Price
Trip Hours
Value
portion of total orders authorized for sharing:  0.040022028052027356


In [4]:
# Initialize arrays indexed as [pickup][dropoff][type][stat]
# type = 0, 1, 2 corresponds to total, single, shared
# stat = 0, 1 corresponds to mean, std

# Shared is the portion [0,1] of orders in this group with share authorization
# Rate is the portion [0,1] of all orders that are in this group
data = {'Latency':np.zeros((77,77,3,2)),
        'Distance':np.zeros((77,77,3,2)),
        'Value':np.zeros((77,77,3,2)),
        'Price':np.zeros((77,77,3,2)),
        'Shared':np.zeros((77,77)),
        'Rate':np.zeros((77,77)),
}

time = []
distance = []
price = []
shared = []


for group_name, df_group in df.groupby(['Pickup Community Area','Dropoff Community Area']):
    # get indices; areas are numbered from 1, so shift them to start at zero
    pickup, dropoff = group_name
    pickup -= 1
    dropoff -= 1

    # split dataframe into three subparts
    df_single = df_group.query('`Shared Trip Authorized` == False')
    df_shared = df_group.query('`Shared Trip Authorized` == True')
    
    # Travel time in hours
    data['Latency'][pickup][dropoff][0] = [df_group['Trip Hours'].mean(), df_group['Trip Hours'].std()]
    data['Latency'][pickup][dropoff][1] = [df_single['Trip Hours'].mean(), df_single['Trip Hours'].std()]
    data['Latency'][pickup][dropoff][2] = [df_shared['Trip Hours'].mean(), df_shared['Trip Hours'].std()]

    # Distance in miles
    data['Distance'][pickup][dropoff][0] = [df_group['Trip Miles'].mean(), df_group['Trip Miles'].std()]
    data['Distance'][pickup][dropoff][1] = [df_single['Trip Miles'].mean(), df_single['Trip Miles'].std()]
    data['Distance'][pickup][dropoff][2] = [df_shared['Trip Miles'].mean(), df_shared['Trip Miles'].std()]

    # Value in $/hour
    data['Value'][pickup][dropoff][0] = [df_group['Value'].mean(), df_group['Value'].std()]
    data['Value'][pickup][dropoff][1] = [df_single['Value'].mean(), df_single['Value'].std()]
    data['Value'][pickup][dropoff][2] = [df_shared['Value'].mean(), df_shared['Value'].std()]

    # Price in $
    data['Price'][pickup][dropoff][0] = [df_group['Price'].mean(), df_group['Price'].std()]
    data['Price'][pickup][dropoff][1] = [df_single['Price'].mean(), df_single['Price'].std()]
    data['Price'][pickup][dropoff][2] = [df_shared['Price'].mean(), df_shared['Price'].std()]

    # Portion Shared
    data['Shared'][pickup][dropoff] = len(df_shared)/len(df_group)

    # Portion of Total
    data['Rate'][pickup][dropoff] = len(df_group)/len_df
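
The per-pair mean and standard deviation computed in the loop above can also be obtained with a single `groupby(...).agg(...)` call. The sketch below does this for one column on toy data; the numbers are illustrative only, and initializing with NaN instead of zeros is an assumption made here so that unobserved pairs can be told apart from true zeros:

```python
import numpy as np
import pandas as pd

# Toy trips for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    'Pickup Community Area': [1, 1, 1, 2],
    'Dropoff Community Area': [2, 2, 2, 1],
    'Price': [10.0, 6.0, 14.0, 8.0],
})

n_areas = 2
# price[pickup][dropoff] = [mean, std]; NaN marks pairs with no trips
price = np.full((n_areas, n_areas, 2), np.nan)

stats = (toy.groupby(['Pickup Community Area', 'Dropoff Community Area'])['Price']
            .agg(['mean', 'std']))
for (pickup, dropoff), row in stats.iterrows():
    # areas are numbered from 1, array indices from 0
    price[pickup - 1, dropoff - 1] = [row['mean'], row['std']]

print(price)
```

Note that pandas uses the sample standard deviation (`ddof=1`), so a pair with a single trip has a NaN standard deviation; the same applies to the `.std()` calls in the loop.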

In [9]:
# Pickup areas (zero-based index) sorted from smallest to largest share of all orders
print(np.argsort(data['Rate'].sum(axis=1)))

[54 51 73 46  8 36 53 17 11 63 71 61 35 44 47 49 56 50 12 64 58 16 74 62
 19 69 10  9 52 25 72 39 45 57 59 66 33 60 38 26 65 67 29 37 48 34 13 14
 18  1 70 68 28 43 22 41 15  3 20 42  4 55  0 30 24 76 32  2 40 75 21  6
 23  5 31 27  7]
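
The output above lists zero-based pickup indices in ascending order of order share, so the busiest areas appear last. A minimal sketch of turning such a ranking into map numbers with the busiest area first, using a hypothetical 3x3 `rate` matrix in place of `data['Rate']`:

```python
import numpy as np

# Hypothetical share of all orders per [pickup][dropoff] pair (sums to 1)
rate = np.array([[0.10, 0.15, 0.05],
                 [0.02, 0.03, 0.05],
                 [0.20, 0.25, 0.15]])

pickup_share = rate.sum(axis=1)                    # share of orders by pickup area
busiest_first = np.argsort(pickup_share)[::-1] + 1 # descending, as map numbers

print(busiest_first)
```

Applied to the printed ranking, the last entries 31, 27, and 7 correspond to areas 32, 28, and 8 on the map.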


In [6]:
# save_path = '/home/mark/Documents/code/mdrp/results/all.p'
# pickle.dump(data, open(save_path,'wb'))

In [7]:
# import pickle 
# save_path = '/home/mark/Documents/code/mdrp/chicago/all.p'
# data = pickle.load(open(save_path,'rb'))

# pickup  = 7
# dropoff = 7

# print('Latency (hours)')
# print(data['Latency'][pickup][dropoff])

# print('Distance (miles)')
# print(data['Distance'][pickup][dropoff])

# print('Value ($/hour)')
# print(data['Value'][pickup][dropoff])

# print('Price ($)')
# print(data['Price'][pickup][dropoff])

# print('Portion Shared')
# print(data['Shared'][pickup][dropoff])

# print('Rate')
# print(data['Rate'][pickup][dropoff])

In [8]:
# import pickle 
# save_path = '/home/mark/Documents/code/mdrp/chicago/morning.p'
# data = pickle.load(open(save_path,'rb'))

# pickup  = 7
# dropoff = 7

# print('Latency (hours)')
# print(data['Latency'][pickup][dropoff])

# print('Distance (miles)')
# print(data['Distance'][pickup][dropoff])

# print('Value ($/hour)')
# print(data['Value'][pickup][dropoff])

# print('Price ($)')
# print(data['Price'][pickup][dropoff])

# print('Portion Shared')
# print(data['Shared'][pickup][dropoff])

# print('Rate')
# print(data['Rate'][pickup][dropoff])

Requests consist of some pickup/dropoff combination.

The pickup neighborhood determines how long it will take for a vehicle to arrive. We can use the same server utilization formula.

The travel time is determined by the individual requests.


Each 

Parameters for the case study with urban transport.

- $s_{i,j}$: service time
- $t_{i,j}$: travel time, can be computed as average between  
- $u_{i,j}$: wait time for vehicle to arrive, requires parameters below  
    - $N$: number of couriers
    - $\mu$: rate of order completion 
    - $\rho$: service utilization, solved for
    - $\bar{\rho}$: max service utilization, can be approximated using an M/M/c calculator
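
The parameters above match the inputs of a standard M/M/c queue. The notes do not show the utilization formula they refer to, so the sketch below assumes the textbook Erlang C model, with $N$ couriers as servers, $\mu$ as the per-courier completion rate, and a hypothetical `arrival_rate` for incoming orders in a pickup area:

```python
from math import factorial

def erlang_c_wait(arrival_rate, service_rate, n_servers):
    """Return (utilization, probability of waiting, mean wait) for an M/M/c queue.

    arrival_rate: orders per hour (hypothetical input)
    service_rate: orders completed per hour by one courier (mu)
    n_servers:    number of couriers (N)
    """
    offered_load = arrival_rate / service_rate
    rho = offered_load / n_servers
    if rho >= 1:
        raise ValueError('queue is unstable: utilization must be below 1')
    # Erlang C: probability that an arriving order finds all couriers busy
    below = sum(offered_load**k / factorial(k) for k in range(n_servers))
    tail = offered_load**n_servers / (factorial(n_servers) * (1 - rho))
    p_wait = tail / (below + tail)
    # Mean time spent waiting for a courier, in the time unit of the rates
    mean_wait = p_wait / (n_servers * service_rate - arrival_rate)
    return rho, p_wait, mean_wait

rho, p_wait, mean_wait = erlang_c_wait(arrival_rate=2.0, service_rate=2.0, n_servers=2)
print(rho, p_wait, mean_wait)
```

Sweeping `arrival_rate` upward until `mean_wait` exceeds an acceptable limit is one way to approximate a maximum utilization $\bar{\rho}$ without an external calculator.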