# Analysis completed by Stephen Stark
# Jumpman23 - Exploratory Data Analysis

Jumpman23 is an on-demand delivery platform connecting “Jumpmen” and customers purchasing a variety of goods. Jumpman23 sends Jumpmen to merchants to purchase and pick up any items requested by the customer. Whenever possible, Jumpman23 orders the requested items ahead to save the Jumpmen time. Each time a Jumpman23 delivery is completed, a record containing information about that delivery is saved to the Jumpman23 database. Jumpman23 is growing fast and has just launched in its newest market: New York City.

# Objective
The objective of this notebook is to answer the following questions:
- How are things going in New York?
- Are there data integrity issues? 
    - If so, where are they and how do they impact the analysis?


I will use the dataset provided by Postmates.

# Summary
1. [Understand the data](#understand)\
    1.1 [N/A's?](#na)
2. [Scratch](#scratch)

### Understanding the Delivery Process


### Understanding Delivery Attributes

- **delivery_id:** a unique identifier of a delivery
- **customer_id:** a unique identifier for the Jumpman23 customer
- **jumpman_id:** a unique identifier for the Jumpman who completed the delivery
- **vehicle_type:** the method of transport the Jumpman used to complete the delivery
- **pickup_place:** the name of the pickup location
- **place_category:** a categorization of the pickup location
- **item_name:** the name of the item requested
- **item_quantity:** how many of that item were requested
- **item_category_name:** categorization provided by the merchant, e.g. “appetizers”, “soups”
- **how_long_it_took_to_order:** how long it took to place the order [interval]
- **pickup_lat:** the latitude of the pickup location
- **pickup_lon:** the longitude of the pickup location
- **dropoff_lat:** the latitude of the dropoff location
- **dropoff_lon:** the longitude of the dropoff location
- **when_the_delivery_started:** localized timestamp representing when the delivery began
- **when_the_Jumpman_arrived_at_pickup:** localized timestamp representing when the Jumpman arrived at the pickup location
- **when_the_Jumpman_left_pickup:** localized timestamp representing when the Jumpman left the pickup location
- **when_the_Jumpman_arrived_at_dropoff:** localized timestamp representing when the Jumpman reached the customer

## Import Necessary Dependencies

In [48]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import seaborn as sns
from collections import Counter
import haversine as hs
import os
from datetime import datetime, timedelta
%matplotlib inline

In [49]:
#ignore warnings
import warnings
warnings.filterwarnings('ignore')

In [50]:
os.listdir()

['README.md',
 '.ipynb_checkpoints',
 'Stephen_Stark_Jumpman23Analysis.ipynb',
 '.git']

In [51]:
df = pd.read_csv('../Jumpman23/analyze_me.csv')

<a id='understand'></a>
## Understand the data

To answer the question of how things are going in NYC, we first need to spot-check the data. We know there are potential data integrity issues, so let's look column by column to determine what sort of analysis we can do.

In [52]:
print(df.shape)
print(df.info())

(5983, 18)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5983 entries, 0 to 5982
Data columns (total 18 columns):
delivery_id                            5983 non-null int64
customer_id                            5983 non-null int64
jumpman_id                             5983 non-null int64
vehicle_type                           5983 non-null object
pickup_place                           5983 non-null object
place_category                         5100 non-null object
item_name                              4753 non-null object
item_quantity                          4753 non-null float64
item_category_name                     4753 non-null object
how_long_it_took_to_order              3038 non-null object
pickup_lat                             5983 non-null float64
pickup_lon                             5983 non-null float64
dropoff_lat                            5983 non-null float64
dropoff_lon                            5983 non-null float64
when_the_delivery_started              5

In [53]:
print('Count of records:', len(df))
print('---')
print('Count of unique elements in each id column:')

id_cols = ['delivery_id','customer_id','jumpman_id']
for i in id_cols:
    print(i,':', df[i].nunique())

Count of records: 5983
---
Count of unique elements in each id column:
delivery_id : 5214
customer_id : 3192
jumpman_id : 578


Note that the total number of records (5983) is greater than the number of unique values in each id column. I would expect customer_id and jumpman_id to appear multiple times. However, I want to look into why the same delivery_id appears on more than one record.

In [54]:
df[df['delivery_id'].duplicated()].head(3)

Unnamed: 0,delivery_id,customer_id,jumpman_id,vehicle_type,pickup_place,place_category,item_name,item_quantity,item_category_name,how_long_it_took_to_order,pickup_lat,pickup_lon,dropoff_lat,dropoff_lon,when_the_delivery_started,when_the_Jumpman_arrived_at_pickup,when_the_Jumpman_left_pickup,when_the_Jumpman_arrived_at_dropoff
82,1314550,348787,119813,bicycle,Otto Enoteca Pizzeria,Italian,Prosciutto Arugula,1.0,Pizzas,00:22:42.942105,40.732064,-73.996155,40.767582,-73.983704,2014-10-07 18:40:38.769589,2014-10-07 18:49:49.978276,2014-10-07 19:27:58.470009,2014-10-07 19:55:56.804909
207,1332526,48677,152676,bicycle,Shake Shack,Burger,Smoke Shack,1.0,Burgers,,40.715279,-74.01486,40.72452,-73.99342,2014-10-10 18:41:05.90546,2014-10-10 19:04:31.649579,2014-10-10 19:46:22.211936,2014-10-10 20:15:31.476676
244,1319971,94027,119255,walker,Trader Joe's,Grocery Store,Organic Autumn Wheat,1.0,Breakfast & Cereal,,40.74174,-73.99365,40.736971,-73.985844,2014-10-08 19:33:52.549234,2014-10-08 19:45:18.293971,2014-10-08 20:38:07.37508,2014-10-08 20:49:29.420191


Based on the query below, it looks like multiple items from the same order are broken out into separate records. I would expect all the other attributes to be the same across those records. One way to handle this would be to merge the items into a list for each delivery. For the purposes of this analysis, I'll drop the duplicate records and keep the first row per delivery_id, as I don't see a meaningful reason for including the rest. Note that after this step the item columns describe only the first listed item of each delivery.
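
The alternative mentioned above, merging the items into a list per delivery, could look roughly like the following. This is a minimal sketch on a toy frame, not the approach used in this notebook. The names `toy`, `merged`, `item_names` and `total_quantity` are illustrative, and the sketch assumes the non-item columns are identical across rows that share a delivery_id.

```python
import pandas as pd

# Toy data mimicking two item rows for one delivery plus a single-item delivery
toy = pd.DataFrame({
    'delivery_id': [1272701, 1272701, 1274372],
    'pickup_place': ["Mighty Quinn's BBQ", "Mighty Quinn's BBQ", 'Parm'],
    'item_name': ['Brisket', 'Housemade Iced Tea', 'Chicken Parm'],
    'item_quantity': [1.0, 1.0, 1.0],
})

# Keep the first value of delivery-level columns and collect item-level columns into lists
merged = (toy.groupby('delivery_id', as_index=False)
             .agg(pickup_place=('pickup_place', 'first'),
                  item_names=('item_name', list),
                  total_quantity=('item_quantity', 'sum')))

print(merged)
```

This keeps one row per delivery without discarding the item information, at the cost of a list-valued column that is less convenient to filter on.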

In [55]:
#sample duplicate row
df[df['delivery_id']==1272701]

Unnamed: 0,delivery_id,customer_id,jumpman_id,vehicle_type,pickup_place,place_category,item_name,item_quantity,item_category_name,how_long_it_took_to_order,pickup_lat,pickup_lon,dropoff_lat,dropoff_lon,when_the_delivery_started,when_the_Jumpman_arrived_at_pickup,when_the_Jumpman_left_pickup,when_the_Jumpman_arrived_at_dropoff
1008,1272701,81085,112646,bicycle,Mighty Quinn's BBQ,BBQ,Brisket,1.0,Meats,,40.727519,-73.988671,40.723962,-73.993393,2014-10-01 12:12:24.393054,2014-10-01 12:17:22.929789,2014-10-01 12:27:42.369732,2014-10-01 12:34:27.142996
5080,1272701,81085,112646,bicycle,Mighty Quinn's BBQ,BBQ,Housemade Iced Tea,1.0,Beverages,,40.727519,-73.988671,40.723962,-73.993393,2014-10-01 12:12:24.393054,2014-10-01 12:17:22.929789,2014-10-01 12:27:42.369732,2014-10-01 12:34:27.142996


In [56]:
#full duplicate row dataset
df[df['delivery_id'].duplicated()].sort_values(by='delivery_id')

Unnamed: 0,delivery_id,customer_id,jumpman_id,vehicle_type,pickup_place,place_category,item_name,item_quantity,item_category_name,how_long_it_took_to_order,pickup_lat,pickup_lon,dropoff_lat,dropoff_lon,when_the_delivery_started,when_the_Jumpman_arrived_at_pickup,when_the_Jumpman_left_pickup,when_the_Jumpman_arrived_at_dropoff
5080,1272701,81085,112646,bicycle,Mighty Quinn's BBQ,BBQ,Housemade Iced Tea,1.0,Beverages,,40.727519,-73.988671,40.723962,-73.993393,2014-10-01 12:12:24.393054,2014-10-01 12:17:22.929789,2014-10-01 12:27:42.369732,2014-10-01 12:34:27.142996
2299,1274248,208020,60149,car,Murray's Falafel,Middle Eastern,Moroccan Cigars (5 pc),1.0,Appetizers,00:07:08.767432,40.732166,-73.981904,40.747019,-73.990922,2014-10-01 17:25:48.54633,2014-10-01 17:40:32.886964,2014-10-01 17:53:54.166799,2014-10-01 18:09:37.353403
2986,1274248,208020,60149,car,Murray's Falafel,Middle Eastern,Watermelon,1.0,Desserts,00:07:08.767432,40.732166,-73.981904,40.747019,-73.990922,2014-10-01 17:25:48.54633,2014-10-01 17:40:32.886964,2014-10-01 17:53:54.166799,2014-10-01 18:09:37.353403
5386,1274328,255435,23359,bicycle,Lure Fishbar,Seafood,King Salmon,3.0,Sushi & Sashimi,00:11:23.081868,40.724635,-73.998402,40.743568,-73.972405,2014-10-01 17:47:16.707187,2014-10-01 17:44:49.255589,2014-10-01 18:21:08.892224,2014-10-01 18:41:16.203243
4578,1274372,82041,133293,bicycle,Parm,Italian,Chicken Parm,1.0,Sandwiches,00:02:31.470754,40.723020,-73.995854,40.720479,-74.001549,2014-10-01 17:57:34.871703,2014-10-01 17:57:34.041223,2014-10-01 18:09:53.957556,2014-10-01 18:20:05.578047
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3614,1490188,166368,174143,motorcycle,Prosperity Dumpling,Chinese,Chives and Pork Dumplings in Soup,1.0,Dumplings,,40.716001,-73.993210,40.775973,-73.947845,2014-10-30 21:00:30.703848,2014-10-30 21:08:00.931773,2014-10-30 21:44:21.145721,2014-10-30 22:07:09.83358
4119,1490188,166368,174143,motorcycle,Prosperity Dumpling,Chinese,Vegetable and Pork Dumplings in Soup,1.0,Dumplings,,40.716001,-73.993210,40.775973,-73.947845,2014-10-30 21:00:30.703848,2014-10-30 21:08:00.931773,2014-10-30 21:44:21.145721,2014-10-30 22:07:09.83358
4983,1490744,52256,38597,bicycle,Han Dynasty,Chinese,Dan Dan Noodle,1.0,Noodles,00:09:51.159698,40.732213,-73.988072,40.732288,-73.987752,2014-10-30 21:44:05.205404,2014-10-30 21:51:58.394867,2014-10-30 22:06:52.148926,2014-10-30 22:08:06.563304
4074,1490744,52256,38597,bicycle,Han Dynasty,Chinese,Bok Choy with Black Mushrooms,1.0,Vegetables,00:09:51.159698,40.732213,-73.988072,40.732288,-73.987752,2014-10-30 21:44:05.205404,2014-10-30 21:51:58.394867,2014-10-30 22:06:52.148926,2014-10-30 22:08:06.563304


In [57]:
#drop duplicate rows based on delivery_id column
df = df.drop_duplicates(subset=['delivery_id'])

In [58]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5214 entries, 0 to 5981
Data columns (total 18 columns):
delivery_id                            5214 non-null int64
customer_id                            5214 non-null int64
jumpman_id                             5214 non-null int64
vehicle_type                           5214 non-null object
pickup_place                           5214 non-null object
place_category                         4373 non-null object
item_name                              3984 non-null object
item_quantity                          3984 non-null float64
item_category_name                     3984 non-null object
how_long_it_took_to_order              2579 non-null object
pickup_lat                             5214 non-null float64
pickup_lon                             5214 non-null float64
dropoff_lat                            5214 non-null float64
dropoff_lon                            5214 non-null float64
when_the_delivery_started              5214 non-nul

The integrity of the following columns can be assessed by simply looking at the counts of the unique values in each set, as well as the size of the set itself.

In [59]:
select_cols = ['vehicle_type',
               'pickup_place',
               'place_category',
               'item_name',
               'item_category_name']

for col in select_cols:
    print(pd.DataFrame(df[col].value_counts()))

            vehicle_type
bicycle             3740
car                 1050
walker               234
van                   69
scooter               64
truck                 38
motorcycle            19
                           pickup_place
Shake Shack                         266
Momofuku Milk Bar                   162
The Meatball Shop                   153
sweetgreen                          138
Blue Ribbon Fried Chicken           115
...                                 ...
Kalustyan's                           1
Xe May Sandwich Shop                  1
67 Burger                             1
Souvlaki GR                           1
Pio Pio Riko                          1

[898 rows x 1 columns]
                       place_category
Italian                           437
Burger                            395
American                          357
Japanese                          335
Dessert                           277
Chinese                           265
Sushi                         

The integrity of the following columns can be assessed by looking at several measures of the statistical distribution of each set.

In [60]:
df['item_quantity'].describe()

count    3984.000000
mean        1.245231
std         0.781632
min         1.000000
25%         1.000000
50%         1.000000
75%         1.000000
max        16.000000
Name: item_quantity, dtype: float64

In [61]:
df['how_long_it_took_to_order'].describe()

count               2579
unique              2579
top       00:02:47.98465
freq                   1
Name: how_long_it_took_to_order, dtype: object

Assess the pickup and dropoff locations visually, using saved plots of the coordinates (full view and zoomed in).

In [62]:
import ipyplot

images_list = ['../Pickup.png','../Dropoff.png',
               '../Pickup_zoom.png','../Dropoff_zoom.png']

ipyplot.plot_images(images_list, img_width=500)

The pickup and dropoff locations all appear to be valid.

Check the min and max values for the relevant timestamp columns. The data covers the month of October 2014 (October 1 through October 30).

In [68]:
date_cols = ['when_the_delivery_started',
             'when_the_Jumpman_arrived_at_pickup',
             'when_the_Jumpman_left_pickup',
             'when_the_Jumpman_arrived_at_dropoff']

time_cols = ['how_long_it_took_to_order']
for i in date_cols:
    df[i] = pd.to_datetime(df[i])
    
for i in time_cols:
    df[i] = pd.to_timedelta(df[i])

for col in date_cols:
    print(col+":",
          df[col].min(),
          ",",
          df[col].max())
    
for col in time_cols:
    print(col+":",
          df[col].min(),
          ",",
          df[col].max())



when_the_delivery_started: 2014-10-01 00:07:58.632482 , 2014-10-30 23:08:43.481900
when_the_Jumpman_arrived_at_pickup: 2014-10-01 00:39:31.086322 , 2014-10-30 23:10:31.062088
when_the_Jumpman_left_pickup: 2014-10-01 00:59:57.522402 , 2014-10-30 23:23:51.143279
when_the_Jumpman_arrived_at_dropoff: 2014-10-01 00:30:21.109149 , 2014-10-30 23:29:44.866438
how_long_it_took_to_order: 0 days 00:01:22.997519 , 0 days 01:13:13.266118
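
Beyond the overall min and max, the four timestamps can also be checked row by row for chronological order. For example, row 5386 in the duplicate listing above shows an arrival at pickup (17:44:49) earlier than the delivery start (17:47:16). The following is a minimal sketch on toy data. The names `toy`, `stages` and `out_of_order` are illustrative, and whether an out-of-order row is a data error or expected behaviour is not established here.

```python
import pandas as pd

# Toy rows: the second one arrives at pickup before the delivery starts
toy = pd.DataFrame({
    'when_the_delivery_started': ['2014-10-01 12:12:24', '2014-10-01 17:47:16'],
    'when_the_Jumpman_arrived_at_pickup': ['2014-10-01 12:17:22', '2014-10-01 17:44:49'],
    'when_the_Jumpman_left_pickup': ['2014-10-01 12:27:42', '2014-10-01 18:21:08'],
    'when_the_Jumpman_arrived_at_dropoff': ['2014-10-01 12:34:27', '2014-10-01 18:41:16'],
}).apply(pd.to_datetime)

stages = list(toy.columns)

# Flag a row when any stage is earlier than the stage before it.
# Comparisons against NaT evaluate to False, so missing timestamps are not flagged.
out_of_order = pd.Series(False, index=toy.index)
for earlier, later in zip(stages[:-1], stages[1:]):
    out_of_order |= toy[later] < toy[earlier]

print(toy[out_of_order])
```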


Calculate the distance between the pickup and dropoff locations using the Haversine formula, which gives the great-circle distance between two points on the surface of a sphere from their latitudes and longitudes. It is important to note that this distance is "as the crow flies", not the distance traveled along streets.

In [69]:
def haversine_distance(lat1, lon1, lat2, lon2):
    """Return the great-circle distance in km between two lat/lon points given in degrees."""
    r = 6371  #mean Earth radius in km
    phi1 = np.radians(lat1)
    phi2 = np.radians(lat2)
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2)**2
    res = r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)))
    return np.round(res, 2)
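
A quick sanity check of the formula can help before applying it to the whole dataframe. The sketch below restates the same formula without the rounding step (under the illustrative name `haversine_km`) and checks two cases with known answers: identical points, and two points one degree of latitude apart on the same meridian.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, r=6371):
    """Great-circle distance in km on a sphere of radius r (same formula as the cell above)."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2)**2
    return r * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

# Identical points should be 0 km apart
same_point = haversine_km(40.73, -73.99, 40.73, -73.99)

# One degree of latitude along a meridian is r * pi / 180, about 111.19 km
one_degree = haversine_km(40.0, -74.0, 41.0, -74.0)

print(same_point, round(one_degree, 2))
```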

In [70]:
df['haversine_distance_km'] = haversine_distance(df['pickup_lat'],
                                              df['pickup_lon'],
                                              df['dropoff_lat'],
                                              df['dropoff_lon'])

There are quite a few missing values in the 'when_the_Jumpman_left_pickup' column. I can estimate this value using several other columns. First, I know what time the Jumpman arrived at the dropoff. I also have sufficient data to determine the average rate of travel per vehicle. I also know the distance traveled from the pickup to the dropoff. Using this information, I will back into the missing values for the 'when_the_Jumpman_left_pickup' column.
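
The estimation described above can be sketched in vectorized form on toy data. This is only an illustration under simplifying assumptions: the column names (`distance_km`, `left_pickup`, `arrived_at_dropoff`) are shortened stand-ins for the real ones, and the average rate is taken per vehicle type from rows where both timestamps are present.

```python
import pandas as pd

# Toy deliveries; the last row is missing its pickup departure time
toy = pd.DataFrame({
    'vehicle_type': ['bicycle', 'bicycle', 'bicycle'],
    'distance_km': [1.0, 3.0, 2.0],
    'left_pickup': pd.to_datetime(['2014-10-01 12:00:00', '2014-10-01 13:00:00', None]),
    'arrived_at_dropoff': pd.to_datetime(['2014-10-01 12:05:00', '2014-10-01 13:15:00',
                                          '2014-10-01 14:30:00']),
})

# Observed speed (km/s) on rows where both timestamps are known
travel_s = (toy['arrived_at_dropoff'] - toy['left_pickup']).dt.total_seconds()
toy['rate_kms'] = toy['distance_km'] / travel_s

# Average observed speed per vehicle type, broadcast back to every row
avg_rate = toy.groupby('vehicle_type')['rate_kms'].transform('mean')

# Estimated departure = arrival at dropoff minus (distance / average speed)
est_travel = pd.to_timedelta(toy['distance_km'] / avg_rate, unit='s')
missing = toy['left_pickup'].isna()
toy.loc[missing, 'left_pickup'] = toy.loc[missing, 'arrived_at_dropoff'] - est_travel[missing]

print(toy[['left_pickup', 'arrived_at_dropoff']])
```

Using a boolean mask with `.loc` avoids looping over rows and does not depend on the index being contiguous.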

In [71]:
#separate dataframe
df_sub = df.copy()
df_sub = df_sub.dropna(subset=['when_the_Jumpman_arrived_at_dropoff','when_the_Jumpman_left_pickup'])

select_cols = ['delivery_id','vehicle_type','when_the_Jumpman_left_pickup',
               'when_the_Jumpman_arrived_at_dropoff','haversine_distance_km']
df_sub = df_sub[select_cols]


#calculate travel time in seconds
#.dt.total_seconds() returns float seconds regardless of pandas version
travel_time_seconds = (df_sub['when_the_Jumpman_arrived_at_dropoff'] - df_sub['when_the_Jumpman_left_pickup']).dt.total_seconds()
df_sub['travel_time_seconds'] = travel_time_seconds

#calculate travel rate in km/s
travel_rate_kms = df_sub['haversine_distance_km'] / df_sub['travel_time_seconds']
df_sub['travel_rate_kms'] = travel_rate_kms

#create avg rate dictionary that I can call later on
avg_rate_dic = {}
unique_vehicles = df_sub['vehicle_type'].unique()

for i in unique_vehicles:
    avg_rate_dic[i] = df_sub[df_sub.vehicle_type==i]['travel_rate_kms'].mean()



In [72]:
#add average vehicle rate to the dataframe
#map each vehicle type to its average rate; unknown types become NaN
df['avg_vehicle_rate_kms'] = df['vehicle_type'].map(avg_rate_dic)
    

In [74]:
df.dtypes

delivery_id                                      int64
customer_id                                      int64
jumpman_id                                       int64
vehicle_type                                    object
pickup_place                                    object
place_category                                  object
item_name                                       object
item_quantity                                  float64
item_category_name                              object
how_long_it_took_to_order              timedelta64[ns]
pickup_lat                                     float64
pickup_lon                                     float64
dropoff_lat                                    float64
dropoff_lon                                    float64
when_the_delivery_started               datetime64[ns]
when_the_Jumpman_arrived_at_pickup      datetime64[ns]
when_the_Jumpman_left_pickup            datetime64[ns]
when_the_Jumpman_arrived_at_dropoff     datetime64[ns]
haversine_

In [82]:
rate = df.iloc[0]['avg_vehicle_rate_kms']
distance = df.iloc[0]['haversine_distance_km']
time = distance/rate
time = time.astype('timedelta64[s]')
time


numpy.timedelta64(399,'s')

In [84]:
print(df.iloc[0]['when_the_Jumpman_arrived_at_dropoff'])
print(df.iloc[0]['when_the_Jumpman_arrived_at_dropoff'] - time)

2014-10-26 14:52:06.313088
2014-10-26 14:45:27.313088


In [103]:
#estimated travel time = distance / average rate, converted from float seconds to a timedelta
df['time_pickup_to_dropoff'] = pd.to_timedelta(df['haversine_distance_km']/df['avg_vehicle_rate_kms'], unit='s')

In [101]:
df.head(2)

Unnamed: 0,delivery_id,customer_id,jumpman_id,vehicle_type,pickup_place,place_category,item_name,item_quantity,item_category_name,how_long_it_took_to_order,pickup_lat,pickup_lon,dropoff_lat,dropoff_lon,when_the_delivery_started,when_the_Jumpman_arrived_at_pickup,when_the_Jumpman_left_pickup,when_the_Jumpman_arrived_at_dropoff,haversine_distance_km,avg_vehicle_rate_kms
0,1457973,327168,162381,van,Melt Shop,American,Lemonade,1.0,Beverages,00:19:58.582052,40.744607,-73.990742,40.752073,-73.98537,2014-10-26 13:51:59.898924,NaT,NaT,2014-10-26 14:52:06.313088,0.95,0.002379
1,1377056,64452,104533,bicycle,Prince Street Pizza,Pizza,Neapolitan Rice Balls,3.0,Munchables,00:25:09.107093,40.72308,-73.994615,40.719722,-73.991858,2014-10-16 21:58:58.654910,2014-10-16 22:26:02.120931,2014-10-16 22:48:23.091253,2014-10-16 22:59:22.948873,0.44,0.002127


In [100]:
#use a boolean mask with label-based .loc; the index has gaps after dropping duplicates, so positional .iloc would pick the wrong rows
missing = df['when_the_Jumpman_left_pickup'].isna()
#estimated travel time = distance / average rate for the vehicle type
est_travel = pd.to_timedelta(df['haversine_distance_km']/df['avg_vehicle_rate_kms'], unit='s')
#estimated departure from pickup = arrival at dropoff - estimated travel time
df.loc[missing, 'when_the_Jumpman_left_pickup'] = df.loc[missing, 'when_the_Jumpman_arrived_at_dropoff'] - est_travel[missing]
print('Filled', missing.sum(), 'missing values')

In [None]:
df.head(2)

In [None]:
for i in df[(df.vehicle_type=='van') & (df.when_the_Jumpman_left_pickup.isna())].index:
    #i is an index label, so use .loc rather than positional .iloc
    time_to_dropoff = df.loc[i, 'haversine_distance_km']/avg_rate_dic[df.loc[i, 'vehicle_type']]
    #convert float seconds to a timedelta before subtracting from the timestamp
    print(df.loc[i, 'when_the_Jumpman_arrived_at_dropoff'] - pd.to_timedelta(time_to_dropoff, unit='s'))


In [None]:
#df.iloc[11]['when_the_Jumpman_arrived_at_dropoff'] - df.iloc[11]['when_the_Jumpman_left_pickup']
print(df.iloc[11]['when_the_Jumpman_arrived_at_dropoff'])
print(df.iloc[11]['when_the_Jumpman_left_pickup'])

In [None]:
#the column was converted to a timedelta above; to_timedelta leaves it unchanged in that case
order_time = pd.to_timedelta(df['how_long_it_took_to_order'])

#convert to minutes, including the hours component; missing values stay NaN
df['min_to_order'] = (order_time.dt.total_seconds()/60).round(2)



In [None]:
df['min_to_order'].hist(bins=50)

In [None]:
df['min_to_order'].describe()

In [None]:
#pass the variable as x to draw a horizontal box plot
sns.boxplot(x="min_to_order", data=df)

In [None]:
sns.violinplot(x="vehicle_type", y="min_to_order", data=df, split=False, inner="quart", linewidth=1.3)

# Exploratory Data Analysis (EDA)

## Descriptive Statistics

In [None]:
subset_attributes = ['item_quantity','how_long_it_took_to_order']
data1 = round(df[subset_attributes].describe(),2)

pd.DataFrame(data1)

In [None]:
df['how_long_it_took_to_order'].describe()

In [None]:
df['item_quantity'].describe()

In [None]:
pd.DataFrame({'count':df.isnull().sum(), 
              'percent':(df.isnull().sum())/df.shape[0]}).sort_values(by='count', ascending=False)

In [None]:
#what timeframe of data?

print(df.when_the_Jumpman_arrived_at_dropoff.min())
print(df.when_the_Jumpman_arrived_at_dropoff.max())

In [None]:
#what is the most popular method of delivery?
df['vehicle_type'].value_counts()

In [None]:
#distribution of jumpman delivery people
df.jumpman_id.value_counts().hist(bins=100)

In [None]:
df.jumpman_id.value_counts().sort_values(ascending=False)

In [None]:
#jumpman_id is an int64 column, so the ids must be integers for isin to match
top_3 = [99219, 104533, 142394]
df[df.jumpman_id.isin(top_3)]

In [None]:
df[df.how_long_it_took_to_order.isna()]

In [None]:
df.groupby(by='pickup_place').count().sort_values(by='delivery_id', ascending=False)

In [None]:
len(df['pickup_place'].unique())

In [None]:
df[df['pickup_place']=='Shake Shack']

<a id='scratch'></a>
## Scratch

In [None]:
loc1=(df.iloc[2]['pickup_lat'],df.iloc[2]['pickup_lon'])
loc2=(df.iloc[2]['dropoff_lat'],df.iloc[2]['dropoff_lon'])
hs.haversine(loc1,loc2)