### What is the business impact of imposing cancel fee?
If drivers are penalized for cancelling even when they have legitimate reasons (e.g., unsafe delivery conditions, traffic accidents), it may lead to riskier situations or suboptimal deliveries. 
A conflict between cannot affording to pay the fee when the situation is risky
Strict penalties can reduce flexibility and discourage some drivers from picking up orders they see as "high-risk," such as those in difficult-to-reach locations or areas with heavy traffic. This could lead to longer delivery times or more unfilled orders in certain areas, impacting the overall efficiency.

In [1]:
import numpy as np
from scipy import stats
import pandas as pd
from scipy.sparse import csr_matrix
import scipy.sparse as sp

In [3]:
df1 = pd.read_csv("penalty_file (1).csv")
df2 = pd.read_csv("order_file.csv")
df = pd.merge(df1, df2, on='driver.id')

In [475]:
nan_values = df[df['expected.profit'].isna() == True]

1651, 1709, 1617
These are the total amount of nan values found in expected profit for each penalty variant, from 0, 10, and 20, respectively. Since they are relatively equal, I will drop them.

In [5]:
df.dropna(subset=['expected.profit'], inplace=True)

In [7]:
df

Unnamed: 0,driver.id,penalty.variant,order.id,business.type,expected.profit,order.placed.time,delivery.completed.time,cancel.dummy
0,+++3990cLNPGgaPm+ripGg==,20,I9+8q9h67J5XL7x+X2flrA==,grocery,33.318685,2019-05-24 22:14:29.000 America/Los_Angeles,2019-05-24 22:51:37.000 America/Los_Angeles,0
1,+++stf7DqWcT8LMTYbXrwA==,20,FcDOAnNo5aTeYKG1iTyxSA==,grocery,8.532685,2019-05-18 01:23:50.000 America/Los_Angeles,,1
2,++07zTPYFhvA5Ug72kRd0w==,10,JwACfbilNJG1WIH5meqY/A==,restaurant,8.730685,2019-04-23 17:44:47.000 America/Los_Angeles,2019-04-23 18:13:37.000 America/Los_Angeles,0
3,++07zTPYFhvA5Ug72kRd0w==,10,MXUBhCnXvV4tiimvuSuP1g==,restaurant,8.460685,2019-05-20 15:04:52.000 America/Los_Angeles,2019-05-20 15:24:23.000 America/Los_Angeles,0
4,++08hetKFBNoOO5XHAH/5A==,0,LyL81zlkrtWgRju0jZgFpQ==,restaurant,12.600685,2019-04-16 17:45:01.000 America/Los_Angeles,2019-04-16 18:26:29.000 America/Los_Angeles,0
...,...,...,...,...,...,...,...,...
1402307,zzxOWVM+N+xVyaf4hMynwg==,0,GrrMgvBnOaA1ETEGPCCT0A==,restaurant,9.171685,2019-05-21 17:14:24.000 America/Los_Angeles,2019-05-21 17:45:50.000 America/Los_Angeles,0
1402308,zzxZU/e6hKC1DFRoPJFlKw==,0,ETSGQpVt/jalso7iakCvQg==,grocery,8.883685,2019-04-26 07:37:36.000 America/Los_Angeles,2019-04-26 07:49:25.000 America/Los_Angeles,0
1402309,zzxfsWIdNc6mohXYAJ6lig==,20,MJfq87CZqZ+C2PV7CU0ScQ==,grocery,10.701685,2019-05-18 14:12:26.000 America/Los_Angeles,2019-05-18 14:26:14.000 America/Los_Angeles,0
1402310,zzzhpNF1l7HdwyBef3huRw==,10,GeDHHrjniyBcEc1swW+yAw==,grocery,8.694685,2019-04-26 07:54:49.000 America/Los_Angeles,2019-04-26 08:05:38.000 America/Los_Angeles,0


In [9]:
df["order.placed.time"] = df["order.placed.time"].str.replace(r" America/Los_Angeles", "", regex=True)
df["order.placed.time"] = pd.to_datetime(df["order.placed.time"])
df["delivery.completed.time"] = df["delivery.completed.time"].where(df["delivery.completed.time"].isna(), df["delivery.completed.time"].str.replace(r" America/Los_Angeles", "", regex=True))
df["delivery.completed.time"] = pd.to_datetime(df["delivery.completed.time"])

In [11]:
df_20 = df[df['penalty.variant'] == 20].reset_index()
df_10 = df[df['penalty.variant'] == 10].reset_index()
df_0 = df[df['penalty.variant'] == 0].reset_index()

### different experiments begin from here

In [666]:
### run code below for changed profits

In [13]:
df_0['expected.profit'] = np.where(df_0['cancel.dummy'] == 1, 0, df_0['expected.profit'])
df_10['expected.profit'] = np.where(df_10['cancel.dummy'] == 1, 5, df_10['expected.profit'])
df_20['expected.profit'] = np.where(df_20['cancel.dummy'] == 1, 10, df_20['expected.profit'])

### Calculate the cancellation ratio for grocery and restaurant groups

In [19]:
df_10_grocery = df_10[df_10['business.type'] == 'grocery']
df_10_restaurant = df_10[df_10['business.type'] == 'restaurant']
df_0_grocery = df_0[df_0['business.type'] == 'grocery']
df_0_restaurant = df_0[df_0['business.type'] == 'restaurant']
df_20_grocery = df_20[df_20['business.type'] == 'grocery']
df_20_restaurant = df_20[df_20['business.type'] == 'restaurant']

In [21]:
total_orders_0_grocery = df_0_grocery.groupby('driver.id')['cancel.dummy'].count()
total_orders_0_restaurant = df_0_restaurant.groupby('driver.id')['cancel.dummy'].count()

In [23]:
total_orders_20_grocery = df_20_grocery.groupby('driver.id')['cancel.dummy'].count()
total_orders_20_restaurant = df_20_restaurant.groupby('driver.id')['cancel.dummy'].count()

In [24]:
total_orders_10_grocery = df_10_grocery.groupby('driver.id')['cancel.dummy'].count()
total_orders_10_restaurant = df_10_restaurant.groupby('driver.id')['cancel.dummy'].count()

In [27]:
total_cancelled_10_grocery = df_10_grocery.groupby('driver.id').agg(total_cancelled=('cancel.dummy', lambda x: (x == 1).sum()))
total_cancelled_10_restaurant = df_10_restaurant.groupby('driver.id').agg(total_cancelled=('cancel.dummy', lambda x: (x == 1).sum()))

In [28]:
total_cancelled_0_grocery = df_0_grocery.groupby('driver.id').agg(total_cancelled=('cancel.dummy', lambda x: (x == 1).sum()))
total_cancelled_0_restaurant = df_0_restaurant.groupby('driver.id').agg(total_cancelled=('cancel.dummy', lambda x: (x == 1).sum()))

In [29]:
total_cancelled_20_restaurant = df_20_restaurant.groupby('driver.id').agg(total_cancelled=('cancel.dummy', lambda x: (x == 1).sum()))
total_cancelled_20_grocery = df_20_grocery.groupby('driver.id').agg(total_cancelled=('cancel.dummy', lambda x: (x == 1).sum()))

In [30]:
# calculate the ratio for 20$ group for the groceries
total_cancelled_sparse = csr_matrix(total_cancelled_20_grocery).reshape(-1, 1)
total_orders_sparse = csr_matrix(total_orders_20_grocery).reshape(-1, 1)
ratio = total_cancelled_sparse/total_orders_sparse
ratio = np.array(ratio).reshape(1,-1)
series = pd.Series(ratio.flatten())
series.index = total_orders_20_grocery.index
df_20_ratio_grocery = series

In [31]:
# calculate the ratio for 20$ group for the restaurants
total_cancelled_sparse1 = csr_matrix(total_cancelled_20_restaurant).reshape(-1, 1)
total_orders_sparse1 = csr_matrix(total_orders_20_restaurant).reshape(-1, 1)
ratio1 = total_cancelled_sparse1/total_orders_sparse1
ratio1 = np.array(ratio1).reshape(1,-1)
series1 = pd.Series(ratio1.flatten())
series1.index = total_orders_20_restaurant.index
df_20_ratio_restaurant = series1

In [32]:
# Calculate the cancel ratio for 10 group for grocery
total_cancelled_sparse10 = csr_matrix(total_cancelled_10_grocery).reshape(-1, 1)
total_orders_sparse10 = csr_matrix(total_orders_10_grocery).reshape(-1, 1)
ratio10 = total_cancelled_sparse10/total_orders_sparse10
ratio10 = np.array(ratio10).reshape(1,-1)
series10 = pd.Series(ratio10.flatten())
series10.index = total_orders_10_grocery.index
df_10_ratio_grocery = series10

In [33]:
# Calculate the cancel ratio for 10 group for restaurant
total_cancelled_sparse11 = csr_matrix(total_cancelled_10_restaurant).reshape(-1, 1)
total_orders_sparse11 = csr_matrix(total_orders_10_restaurant).reshape(-1, 1)
ratio11 = total_cancelled_sparse11/total_orders_sparse11
ratio11 = np.array(ratio11).reshape(1,-1)
series11 = pd.Series(ratio11.flatten())
series11.index = total_orders_10_restaurant.index
df_10_ratio_restaurant = series11

In [34]:
# Calculate cancel rate for 0 group for grocery
total_cancelled_sparse0 = csr_matrix(total_cancelled_0_grocery).reshape(-1, 1)
total_orders_sparse0 = csr_matrix(total_orders_0_grocery).reshape(-1, 1)
ratio0 = total_cancelled_sparse0/total_orders_sparse0
ratio0 = np.array(ratio0).reshape(1,-1)
series0 = pd.Series(ratio0.flatten())
series0.index = total_orders_0_grocery.index
df_0_ratio_grocery = series0

In [35]:
# Calculate the cancel ratio for 0 group for restaurant
total_cancelled_sparse01 = csr_matrix(total_cancelled_0_restaurant).reshape(-1, 1)
total_orders_sparse01 = csr_matrix(total_orders_0_restaurant).reshape(-1, 1)
ratio01 = total_cancelled_sparse01/total_orders_sparse01
ratio01 = np.array(ratio01).reshape(1,-1)
series01 = pd.Series(ratio01.flatten())
series01.index = total_orders_0_restaurant.index
df_0_ratio_restaurant = series01

In [113]:
u1 = df_0_ratio_grocery
u2 = df_0_ratio_restaurant
u3 = df_10_ratio_grocery
u4 = df_10_ratio_restaurant
u5 = df_20_ratio_grocery
u6 = df_20_ratio_restaurant

In [115]:
### grocery vs restaurant cancel rate ratio within groups
# 0 penalty
t_test_1_2, p_value_1_2 = stats.ttest_ind(u1, u2, equal_var = False)
# 10 penalty
t_test_3_4, p_value_3_4 = stats.ttest_ind(u3, u4, equal_var = False) 
# 20 penalty
t_test_5_6, p_value_5_6 = stats.ttest_ind(u5, u6, equal_var = False) 

### grocery vs grocery cancel rate ratio between groups
# between 0 and 10 penalty for grocery
t_test_1_3, p_value_1_3 = stats.ttest_ind(u1, u3, equal_var = False) 
# between 0 and 20 penalty for grocery
t_test_1_5, p_value_1_5 = stats.ttest_ind(u1, u5, equal_var = False) 
# between 10 and 20 penalty for grocery
t_test_3_5, p_value_3_5 = stats.ttest_ind(u3, u5, equal_var = False)

### restaurant vs restaurant cancel rate ratio between groups
# between 0 and 10 penalty for restaurant
t_test_2_4, p_value_2_4 = stats.ttest_ind(u2, u4, equal_var = False) 
# between 0 and 20 penalty for restaurant
t_test_2_6, p_value_2_6 = stats.ttest_ind(u2, u6, equal_var = False) 
# between 10 and 20 penalty for restaurant
t_test_4_6, p_value_4_6 = stats.ttest_ind(u4, u6, equal_var = False)

In [117]:
results = {
    "Comparison": [
        "Grocery vs Restaurant (0 penalty)",
        "Grocery vs Restaurant (10 penalty)",
        "Grocery vs Restaurant (20 penalty)",
        "Grocery (0 vs 10 penalty)",
        "Grocery (0 vs 20 penalty)",
        "Grocery (10 vs 20 penalty)",
        "Restaurant (0 vs 10 penalty)",
        "Restaurant (0 vs 20 penalty)",
        "Restaurant (10 vs 20 penalty)"
    ],
    "t-test Value": [
        t_test_1_2, t_test_3_4, t_test_5_6,
        t_test_1_3, t_test_1_5, t_test_3_5,
        t_test_2_4, t_test_2_6, t_test_4_6
    ],
    "p-value": [
        p_value_1_2, p_value_3_4, p_value_5_6,
        p_value_1_3, p_value_1_5, p_value_3_5,
        p_value_2_4, p_value_2_6, p_value_4_6
    ]
}

results_df = pd.DataFrame(results)

### Filter the total order count per driver by them having at least 2 orders. This will reduce the impact of 0 and 1 cancel ratios for the final analysis.

In [161]:
total_orders_0_grocery_greater1 = df_0_grocery.groupby('driver.id').filter(lambda x: len(x) > 1).groupby('driver.id')['cancel.dummy'].count()
total_orders_0_restaurant_greater1 = df_0_restaurant.groupby('driver.id').filter(lambda x: len(x) > 1).groupby('driver.id')['cancel.dummy'].count()
total_orders_10_grocery_greater1 = df_10_grocery.groupby('driver.id').filter(lambda x: len(x) > 1).groupby('driver.id')['cancel.dummy'].count()
total_orders_10_restaurant_greater1 = df_10_restaurant.groupby('driver.id').filter(lambda x: len(x) > 1).groupby('driver.id')['cancel.dummy'].count()
total_orders_20_grocery_greater1 = df_20_grocery.groupby('driver.id').filter(lambda x: len(x) > 1).groupby('driver.id')['cancel.dummy'].count()
total_orders_20_restaurant_greater1 = df_20_restaurant.groupby('driver.id').filter(lambda x: len(x) > 1).groupby('driver.id')['cancel.dummy'].count()

In [162]:
total_cancelled_0_grocery_greater1 = total_cancelled_0_grocery[total_cancelled_0_grocery.index.isin(total_orders_0_grocery_greater1.index)]
total_cancelled_0_restaurant_greater1 = total_cancelled_0_restaurant[total_cancelled_0_restaurant.index.isin(total_orders_0_restaurant_greater1.index)]
total_cancelled_10_grocery_greater1 = total_cancelled_10_grocery[total_cancelled_10_grocery.index.isin(total_orders_10_grocery_greater1.index)]
total_cancelled_10_restaurant_greater1 = total_cancelled_10_restaurant[total_cancelled_10_restaurant.index.isin(total_orders_10_restaurant_greater1.index)]
total_cancelled_20_grocery_greater1 = total_cancelled_20_grocery[total_cancelled_20_grocery.index.isin(total_orders_20_grocery_greater1.index)]
total_cancelled_20_restaurant_greater1 = total_cancelled_20_restaurant[total_cancelled_20_restaurant.index.isin(total_orders_20_restaurant_greater1.index)]

In [163]:
# Calculate the cancel ratio for 0 group for restaurant, for count greater than 1
total_cancelled_sparse111 = csr_matrix(total_cancelled_0_restaurant_greater1).reshape(-1, 1)
total_orders_sparse111 = csr_matrix(total_orders_0_restaurant_greater1).reshape(-1, 1)
ratio111 = total_cancelled_sparse111/total_orders_sparse111
ratio111 = np.array(ratio111).reshape(1,-1)
series111 = pd.Series(ratio111.flatten())
series111.index = total_orders_0_restaurant_greater1.index
df_0_ratio_restaurant_greater1 = series111

In [164]:
# Calculate the cancel ratio for 0 group for grocery, for count greater than 1
total_cancelled_sparse112 = csr_matrix(total_cancelled_0_grocery_greater1).reshape(-1, 1)
total_orders_sparse112 = csr_matrix(total_orders_0_grocery_greater1).reshape(-1, 1)
ratio112 = total_cancelled_sparse112/total_orders_sparse112
ratio112 = np.array(ratio112).reshape(1,-1)
series112 = pd.Series(ratio112.flatten())
series112.index = total_orders_0_grocery_greater1.index
df_0_ratio_grocery_greater1 = series112

In [165]:
# Calculate the cancel ratio for 10 group for restaurant, for count greater than 1
total_cancelled_sparse113 = csr_matrix(total_cancelled_10_restaurant_greater1).reshape(-1, 1)
total_orders_sparse113 = csr_matrix(total_orders_10_restaurant_greater1).reshape(-1, 1)
ratio113 = total_cancelled_sparse113/total_orders_sparse113
ratio113 = np.array(ratio113).reshape(1,-1)
series113 = pd.Series(ratio113.flatten())
series113.index = total_orders_10_restaurant_greater1.index
df_10_ratio_restaurant_greater1 = series113

In [166]:
# Calculate the cancel ratio for 0 group for grocery, for count greater than 1
total_cancelled_sparse114 = csr_matrix(total_cancelled_10_grocery_greater1).reshape(-1, 1)
total_orders_sparse114 = csr_matrix(total_orders_10_grocery_greater1).reshape(-1, 1)
ratio114 = total_cancelled_sparse114/total_orders_sparse114
ratio114 = np.array(ratio114).reshape(1,-1)
series114 = pd.Series(ratio114.flatten())
series114.index = total_orders_10_grocery_greater1.index
df_10_ratio_grocery_greater1 = series114

In [167]:
# Calculate the cancel ratio for 20 group for restaurant, for count greater than 1
total_cancelled_sparse115 = csr_matrix(total_cancelled_20_restaurant_greater1).reshape(-1, 1)
total_orders_sparse115 = csr_matrix(total_orders_20_restaurant_greater1).reshape(-1, 1)
ratio115 = total_cancelled_sparse115/total_orders_sparse115
ratio115 = np.array(ratio115).reshape(1,-1)
series115 = pd.Series(ratio115.flatten())
series115.index = total_orders_20_restaurant_greater1.index
df_20_ratio_restaurant_greater1 = series115

In [168]:
# Calculate the cancel ratio for 20 group for grocery, for count greater than 1
total_cancelled_sparse116 = csr_matrix(total_cancelled_20_grocery_greater1).reshape(-1, 1)
total_orders_sparse116 = csr_matrix(total_orders_20_grocery_greater1).reshape(-1, 1)
ratio116 = total_cancelled_sparse116/total_orders_sparse116
ratio116 = np.array(ratio116).reshape(1,-1)
series116 = pd.Series(ratio116.flatten())
series116.index = total_orders_20_grocery_greater1.index
df_20_ratio_grocery_greater1 = series116

In [169]:
### grocery vs restaurant cancel rate ratio within groups, for count greater than 1
# 0 penalty
t_test_1_2_1, p_value_1_2_1 = stats.ttest_ind(df_0_ratio_grocery_greater1, df_0_ratio_restaurant_greater1, equal_var = False)
# 10 penalty
t_test_3_4_1, p_value_3_4_1 = stats.ttest_ind(df_10_ratio_grocery_greater1, df_10_ratio_restaurant_greater1, equal_var = False) 
# 20 penalty
t_test_5_6_1, p_value_5_6_1 = stats.ttest_ind(df_20_ratio_grocery_greater1, df_20_ratio_restaurant_greater1, equal_var = False) 

### grocery vs grocery cancel rate ratio between groups
# between 0 and 10 penalty for grocery
t_test_1_3_1, p_value_1_3_1 = stats.ttest_ind(df_0_ratio_grocery_greater1, df_10_ratio_grocery_greater1, equal_var = False) 
# between 0 and 20 penalty for grocery
t_test_1_5_1, p_value_1_5_1 = stats.ttest_ind(df_0_ratio_grocery_greater1, df_20_ratio_grocery_greater1, equal_var = False) 
# between 10 and 20 penalty for grocery
t_test_3_5_1, p_value_3_5_1 = stats.ttest_ind(df_10_ratio_grocery_greater1, df_20_ratio_grocery_greater1, equal_var = False)

### restaurant vs restaurant cancel rate ratio between groups
# between 0 and 10 penalty for restaurant
t_test_2_4_1, p_value_2_4_1 = stats.ttest_ind(df_0_ratio_restaurant_greater1, df_10_ratio_restaurant_greater1, equal_var = False) 
# between 0 and 20 penalty for restaurant
t_test_2_6_1, p_value_2_6_1 = stats.ttest_ind(df_0_ratio_restaurant_greater1, df_20_ratio_restaurant_greater1, equal_var = False) 
# between 10 and 20 penalty for restaurant
t_test_4_6_1, p_value_4_6_1 = stats.ttest_ind(df_10_ratio_restaurant_greater1, df_20_ratio_restaurant_greater1, equal_var = False)

In [170]:
results_greater1 = {
    "Comparison": [
        "Grocery vs Restaurant (0 penalty)",
        "Grocery vs Restaurant (10 penalty)",
        "Grocery vs Restaurant (20 penalty)",
        "Grocery (0 vs 10 penalty)",
        "Grocery (0 vs 20 penalty)",
        "Grocery (10 vs 20 penalty)",
        "Restaurant (0 vs 10 penalty)",
        "Restaurant (0 vs 20 penalty)",
        "Restaurant (10 vs 20 penalty)"
    ],
    "t-test Value": [
        t_test_1_2_1, t_test_3_4_1, t_test_5_6_1,
        t_test_1_3_1, t_test_1_5_1, t_test_3_5_1,
        t_test_2_4_1, t_test_2_6_1, t_test_4_6_1
    ],
    "p-value": [
        p_value_1_2_1, p_value_3_4_1, p_value_5_6_1,
        p_value_1_3_1, p_value_1_5_1, p_value_3_5_1,
        p_value_2_4_1, p_value_2_6_1, p_value_4_6_1
    ]
}

results_greater1_df = pd.DataFrame(results_greater1)

In [97]:
results_greater1_df

Unnamed: 0,Comparison,t-test Value,p-value
0,Grocery vs Restaurant (0 penalty),-14.713792,6.42796e-49
1,Grocery vs Restaurant (10 penalty),-17.213189,3.0789990000000003e-66
2,Grocery vs Restaurant (20 penalty),-25.445869,5.636907e-142
3,Grocery (0 vs 10 penalty),18.273679,1.695988e-74
4,Grocery (0 vs 20 penalty),54.669459,0.0
5,Grocery (10 vs 20 penalty),36.904524,2.843863e-296
6,Restaurant (0 vs 10 penalty),12.108359,1.04582e-33
7,Restaurant (0 vs 20 penalty),34.55343,1.013703e-258
8,Restaurant (10 vs 20 penalty),22.896523,1.714566e-115


### This result below is from filtering for the total order count if there are 3 or more orders per driver to just see if there is a large difference between the filtering for 2. As you can see when you compare the df above and the df below, it isn't as different as when you just compare no filter to filtering for greater than 1.

In [157]:
results_greater1_df

Unnamed: 0,Comparison,t-test Value,p-value
0,Grocery vs Restaurant (0 penalty),-11.857261,2.2550310000000002e-32
1,Grocery vs Restaurant (10 penalty),-13.488639,2.2733749999999998e-41
2,Grocery vs Restaurant (20 penalty),-22.439942,1.049942e-110
3,Grocery (0 vs 10 penalty),14.237206,6.315696e-46
4,Grocery (0 vs 20 penalty),43.258375,0.0
5,Grocery (10 vs 20 penalty),29.605611,3.6709600000000004e-191
6,Restaurant (0 vs 10 penalty),10.657443,1.763382e-26
7,Restaurant (0 vs 20 penalty),28.907537,2.4450330000000003e-181
8,Restaurant (10 vs 20 penalty),18.718835,8.628687e-78


In [159]:
results_df

Unnamed: 0,Comparison,t-test Value,p-value
0,Grocery vs Restaurant (0 penalty),-19.610049,1.753949e-85
1,Grocery vs Restaurant (10 penalty),-23.651375,2.298111e-123
2,Grocery vs Restaurant (20 penalty),-31.79133,1.008488e-220
3,Grocery (0 vs 10 penalty),25.476606,5.218444e-143
4,Grocery (0 vs 20 penalty),78.195678,0.0
5,Grocery (10 vs 20 penalty),53.235833,0.0
6,Restaurant (0 vs 10 penalty),14.182067,1.284723e-45
7,Restaurant (0 vs 20 penalty),44.271996,0.0
8,Restaurant (10 vs 20 penalty),30.304016,5.898191e-201


In [366]:
len(df_0_ratio_grocery), len(df_10_ratio_grocery)

(141521, 141770)

In [368]:
len(df_0_ratio_restaurant), len(df_10_ratio_restaurant)

(62345, 62202)

In [231]:
df_0_ratio_grocery.mean(), df_10_ratio_grocery.mean(), df_0_ratio_grocery.std(), df_10_ratio_grocery.std()

(0.16697103472147412,
 0.1349827669265178,
 0.35074448305646044,
 0.3166456851348647)

In [233]:
df_10_ratio_grocery.mean()/df_0_ratio_grocery.mean()

0.8084202577511949

In [242]:
df_10_ratio_grocery.std()/df_0_ratio_grocery.std()

0.9027816556815043

In [229]:
df_0_ratio_restaurant.mean(), df_10_ratio_restaurant.mean(), df_0_ratio_restaurant.std(), df_10_ratio_restaurant.std()

(0.20106813400147266,
 0.17272698719539534,
 0.3664568365773622,
 0.33826361304527)

In [235]:
df_10_ratio_restaurant.mean()/df_0_ratio_restaurant.mean()

0.8590470491665788

In [244]:
df_10_ratio_restaurant.std()/df_0_ratio_restaurant.std()

0.9230653634534106

The t statistics indicates that introducing a penalty fee impacts grocery deliveries more than restaurant deliveries, e.g., Grocery (0 vs 10 penalty) has a t statistic of 18.273679 and Restaurant (0 vs 10 penalty) has a t statistic of 12.108359. Why is this the case?

When looking at the avg cancellation rate and standard deviation for these groups, it shows that the grocery deliveries are MORE impacted by the penalty fee, e.g., the ratio for the difference of average means between the 0 and 10 grocery groups is about 0.808, while the ratio for the 0 and 10 restaurant groups is about 0.859. Having a ratio further away from 1 means that there is a larger difference between the two average means. The grocery ratio is further away, so having the 10 penalty for cancellation reduces the average rate of cancellation for grocery deliveries more than the restaurant group. This ratio difference is also for the standard deviations of both the grocery and restaurant groups, i.e., by imposing the penalty, the grocery group gets a greater reduction in the variance than the restaurant group (e.g., a standard deviation ratio of 0.902 for the grocery group versus a ratio of 0.923 for the restaurant group). 

The average cancellation rate for grocery deliveries already remains lower than the restaurant group (e.g., 0.166 vs 0.201), so the restaurant originally begins with a higher cancellation rate before the penalty is imposed. So why would groceries be not as cancelled as the restaurants?

Furthermore, it is interesting to note that the cancel ratio t tests are more similar to the grocery than restaurant group comparisons. For instance, the regular cancellation rate between each groups is given below, e.g., between the 0 and 10 penalty groups, there is a 27.449 t statistic. This value is much closer to the Grocery (0 vs 10 penalty)	group with a t statistic of 25.476606 than the Restaurant (0 vs 10 penalty) group with a t statistic of 14.182067. This is consistent with the rest of the groups. So, the grocery t statistic are generally larger and more closely aligned with the cancellation rate t-tests, which suggests that grocery cancellations might be more sensitive to changes in penalty levels than restaurant cancellations. Hence, as the penalty increases, the reduction in cancellations for groceries becomes greater.

So, the major question remains:
### Why does imposing a penalty fee reduce the cancellation rate for grocery deliveries MORE than the restaurant deliveries?

In [210]:
# cancel ratio without dividing between grocery and delivery
print(f"between 0 and 10 penalty {t_test_1, p_value_1}")
print(f"between 0 and 20 penalty {t_test_2, p_value_2}")
print(f"between 10 and 20 penalty {t_test_3, p_value_3}")

between 0 and 10 penalty (27.44918743790098, 1.0656216309572848e-165)
between 0 and 20 penalty (85.85966923311122, 0.0)
between 10 and 20 penalty (58.961269253935605, 0.0)


### Another question: What is the impact of expected profit between the two segments?

### The cancellation rate without segments

In [189]:
total_cancelled_20 = df_20.groupby('driver.id').agg(total_cancelled=('cancel.dummy', lambda x: (x == 1).sum()))

In [190]:
total_cancelled_10 = df_10.groupby('driver.id').agg(total_cancelled=('cancel.dummy', lambda x: (x == 1).sum()))

In [191]:
total_cancelled_0 = df_0.groupby('driver.id').agg(total_cancelled=('cancel.dummy', lambda x: (x == 1).sum()))

In [192]:
total_orders_20 = df_20.groupby('driver.id')['cancel.dummy'].count()
total_orders_10 = df_10.groupby('driver.id')['cancel.dummy'].count()
total_orders_0 = df_0.groupby('driver.id')['cancel.dummy'].count()

In [193]:
total_cancelled_sparse_20 = csr_matrix(total_cancelled_20).reshape(-1, 1)
total_orders_sparse_20 = csr_matrix(total_orders_20).reshape(-1, 1)
ratio_20 = total_cancelled_sparse_20/total_orders_sparse_20
ratio_20 = np.array(ratio_20).reshape(1,-1)
series_20 = pd.Series(ratio_20.flatten())
series_20.index = total_orders_20.index
df_20_ratio = series_20

In [194]:
total_cancelled_sparse_10 = csr_matrix(total_cancelled_10).reshape(-1, 1)
total_orders_sparse_10 = csr_matrix(total_orders_10).reshape(-1, 1)
ratio_10 = total_cancelled_sparse_10/total_orders_sparse_10
ratio_10 = np.array(ratio_10).reshape(1,-1)
series_10 = pd.Series(ratio_10.flatten())
series_10.index = total_orders_10.index
df_10_ratio = series_10

In [195]:
total_cancelled_sparse_0 = csr_matrix(total_cancelled_0).reshape(-1, 1)
total_orders_sparse_0 = csr_matrix(total_orders_0).reshape(-1, 1)
ratio_0 = total_cancelled_sparse_0/total_orders_sparse_0
ratio_0 = np.array(ratio_0).reshape(1,-1)
series_0 = pd.Series(ratio_0.flatten())
series_0.index = total_orders_0.index
df_0_ratio = series_0

In [196]:
### cancel rate ratio between groups
# between 0 and 10 penalty
t_test_1, p_value_1 = stats.ttest_ind(df_0_ratio, df_10_ratio, equal_var = False)
# between 0 and 20 penalty
t_test_2, p_value_2 = stats.ttest_ind(df_0_ratio, df_20_ratio, equal_var = False)
# between 10 and 20 penalty
t_test_3, p_value_3 = stats.ttest_ind(df_10_ratio, df_20_ratio, equal_var = False)

In [197]:
print(t_test_1, p_value_1)
print(t_test_2, p_value_2)
print(t_test_3, p_value_3)

27.44918743790098 1.0656216309572848e-165
85.85966923311122 0.0
58.961269253935605 0.0


### Testing for average profit between grocery and restaurant segments, and within each segment.

In [45]:
df_0_grocery_profit_mean = df_0_grocery.groupby('driver.id')['expected.profit'].mean()
df_10_grocery_profit_mean = df_10_grocery.groupby('driver.id')['expected.profit'].mean()
df_20_grocery_profit_mean = df_20_grocery.groupby('driver.id')['expected.profit'].mean()
df_0_restaurant_profit_mean = df_0_restaurant.groupby('driver.id')['expected.profit'].mean()
df_10_restaurant_profit_mean = df_10_restaurant.groupby('driver.id')['expected.profit'].mean()
df_20_restaurant_profit_mean = df_20_restaurant.groupby('driver.id')['expected.profit'].mean()

In [509]:
### grocery vs restaurant mean profit
# 0 penalty
t_test_0_1, p_value_0_1 = stats.ttest_ind(df_0_grocery_profit_mean, df_0_restaurant_profit_mean, equal_var = False)
# 10 penalty
t_test_10_1, p_value_10_1 = stats.ttest_ind(df_10_grocery_profit_mean, df_10_restaurant_profit_mean, equal_var = False) 
# 20 penalty
t_test_20_1, p_value_20_1 = stats.ttest_ind(df_20_grocery_profit_mean, df_20_restaurant_profit_mean, equal_var = False) 

### grocery vs grocery mean profit
# between 0 and 10 penalty for grocery
t_test_0_1_g, p_value_0_1_g = stats.ttest_ind(df_0_grocery_profit_mean, df_10_grocery_profit_mean, equal_var = False) 
# between 0 and 20 penalty for grocery
t_test_10_1_g, p_value_10_1_g = stats.ttest_ind(df_0_grocery_profit_mean, df_20_grocery_profit_mean, equal_var = False) 
# between 10 and 20 penalty for grocery
t_test_20_1_g, p_value_20_1_g = stats.ttest_ind(df_10_grocery_profit_mean, df_20_grocery_profit_mean, equal_var = False)

### restaurant vs restaurant mean profit
# between 0 and 10 penalty for restaurant
t_test_0_1_r, p_value_0_1_r = stats.ttest_ind(df_0_restaurant_profit_mean, df_10_restaurant_profit_mean, equal_var = False) 
# between 0 and 20 penalty for restaurant
t_test_10_1_r, p_value_10_1_r = stats.ttest_ind(df_0_restaurant_profit_mean, df_20_restaurant_profit_mean, equal_var = False) 
# between 10 and 20 penalty for restaurant
t_test_20_1_r, p_value_20_1_r = stats.ttest_ind(df_10_restaurant_profit_mean, df_20_restaurant_profit_mean, equal_var = False)

In [513]:
avg_profit_between_groc_rest = {
    "Comparison": [
        "Grocery vs Restaurant (0 penalty)",
        "Grocery vs Restaurant (10 penalty)",
        "Grocery vs Restaurant (20 penalty)",
        "Grocery vs Grocery (0 vs 10 penalty)",
        "Grocery vs Grocery (0 vs 20 penalty)",
        "Grocery vs Grocery (10 vs 20 penalty)",
        "Restaurant vs Restaurant (0 vs 10 penalty)",
        "Restaurant vs Restaurant (0 vs 20 penalty)",
        "Restaurant vs Restaurant (10 vs 20 penalty)"
    ],
    "t-value": [
        t_test_0_1, t_test_10_1, t_test_20_1,
        t_test_0_1_g, t_test_10_1_g, t_test_20_1_g,
        t_test_0_1_r, t_test_10_1_r, t_test_20_1_r
    ],
    "p-value": [
        p_value_0_1, p_value_10_1, p_value_20_1,
        p_value_0_1_g, p_value_10_1_g, p_value_20_1_g,
        p_value_0_1_r, p_value_10_1_r, p_value_20_1_r
    ]
}

results_table = pd.DataFrame(avg_profit_between_groc_rest)

In [515]:
results_table

Unnamed: 0,Comparison,t-value,p-value
0,Grocery vs Restaurant (0 penalty),152.563394,0.0
1,Grocery vs Restaurant (10 penalty),171.000264,0.0
2,Grocery vs Restaurant (20 penalty),178.495276,0.0
3,Grocery vs Grocery (0 vs 10 penalty),-29.281028,3.4813e-188
4,Grocery vs Grocery (0 vs 20 penalty),-55.753416,0.0
5,Grocery vs Grocery (10 vs 20 penalty),-27.026704,1.152017e-160
6,Restaurant vs Restaurant (0 vs 10 penalty),-32.525802,4.510416e-231
7,Restaurant vs Restaurant (0 vs 20 penalty),-58.352788,0.0
8,Restaurant vs Restaurant (10 vs 20 penalty),-27.978545,1.017035e-171


In [51]:
mean_0_g = df_0_grocery_profit_mean.mean()
mean_10_g = df_10_grocery_profit_mean.mean()
mean_20_g = df_20_grocery_profit_mean.mean()
mean_0_r = df_0_restaurant_profit_mean.mean()
mean_10_r = df_10_restaurant_profit_mean.mean()
mean_20_r = df_20_restaurant_profit_mean.mean()
std_0_g = df_0_grocery_profit_mean.std()
std_10_g = df_10_grocery_profit_mean.std()
std_20_g = df_20_grocery_profit_mean.std()
std_0_r = df_0_restaurant_profit_mean.std()
std_10_r = df_10_restaurant_profit_mean.std()
std_20_r = df_20_restaurant_profit_mean.std()

In [55]:
mean_0_g, mean_10_g, mean_20_g

(15.226250413782463, 16.53396399943229, 17.68544789454814)

In [61]:
std_0_g, std_10_g, std_20_g

(12.235265280117646, 11.470494547517514, 11.166649996175947)

In [59]:
mean_0_r, mean_10_r, mean_20_r

(8.707097361421582, 9.896782036371459, 10.84897035875083)

In [557]:
mean_10_g/mean_0_g, std_10_g/std_0_g

(1.0858854642549498, 0.9374945524194814)

In [559]:
mean_20_g/mean_10_g, std_20_g/std_10_g

(1.0696435467716867, 0.973510771476952)

In [561]:
mean_10_r/mean_0_r, std_10_r/std_0_r

(1.1366339005488784, 0.8663878727359996)

In [567]:
mean_20_r/mean_10_r, std_20_r/std_10_r

(1.0962119120012952, 1.0055217503242817)

### Compare against profit without segments of grocery and restaurant

In [63]:
profit_mean_0 = df_0.groupby('driver.id')['expected.profit'].sum()
profit_mean_10 = df_10.groupby('driver.id')['expected.profit'].sum()
profit_mean_20 = df_20.groupby('driver.id')['expected.profit'].sum()

In [67]:
profit_mean_0.mean(), profit_mean_10.mean(), profit_mean_20.mean()

(13.664747022733689, 14.942234063647941, 16.061971061017825)

In [69]:
profit_mean_0.std(), profit_mean_10.std(), profit_mean_20.std()

(11.507145327284867, 10.813149196575065, 10.609135258842917)

In [576]:
# 0 penalty
t_test_0_2, p_value_0_2 = stats.ttest_ind(profit_mean_0, profit_mean_10, equal_var = False)
# 10 penalty
t_test_10_2, p_value_10_2 = stats.ttest_ind(profit_mean_0, profit_mean_20, equal_var = False) 
# 20 penalty
t_test_20_2, p_value_20_2 = stats.ttest_ind(profit_mean_10, profit_mean_20, equal_var = False) 

In [578]:
t_test_0_2, p_value_0_2

(-33.97829876476105, 1.2025316026422955e-252)

In [580]:
t_test_10_2, p_value_10_2

(-64.32316600443902, 0.0)

In [582]:
t_test_20_2, p_value_20_2 

(-31.038930625443125, 3.1085795144043285e-211)

### Looking at the time placed for orders and comparing against those cancelled

In [248]:
df_time = df.copy()

In [250]:
df_time['time_of_day'] = df['order.placed.time'].dt.strftime('%H:%M')

In [85]:
bins = pd.date_range('00:00', '23:59', freq='h').time
df_time['time_interval'] = pd.cut(pd.to_datetime(df_time['time_of_day'], format='%H:%M').dt.hour, bins=range(24), right=False, labels=[f'{i}:00-{i+1}:00' for i in range(23)])

In [89]:
df_time.drop(columns='order.id', inplace=True)             

In [93]:
df_time['time_interval'].value_counts().sort_index()

time_interval
0:00-1:00      34929
1:00-2:00      24098
2:00-3:00      13565
3:00-4:00       8775
4:00-5:00      12687
5:00-6:00      18608
6:00-7:00      32934
7:00-8:00      63061
8:00-9:00      79085
9:00-10:00     77032
10:00-11:00    63217
11:00-12:00    61341
12:00-13:00    60352
13:00-14:00    60203
14:00-15:00    63040
15:00-16:00    72834
16:00-17:00    82327
17:00-18:00    95915
18:00-19:00    98796
19:00-20:00    89730
20:00-21:00    82446
21:00-22:00    79190
22:00-23:00    71457
Name: count, dtype: int64

In [688]:
df_incompleted_time_orders = df_time[df_time['delivery.completed.time'].isna()]

In [696]:
df_incompleted_time_orders['time_interval'].value_counts().sort_index() / df_time['time_interval'].value_counts().sort_index()

time_interval
0:00-1:00      0.148215
1:00-2:00      0.165864
2:00-3:00      0.177147
3:00-4:00      0.170598
4:00-5:00      0.309529
5:00-6:00      0.183093
6:00-7:00      0.155341
7:00-8:00      0.237770
8:00-9:00      0.195410
9:00-10:00     0.136164
10:00-11:00    0.118623
11:00-12:00    0.121517
12:00-13:00    0.123459
13:00-14:00    0.120243
14:00-15:00    0.125523
15:00-16:00    0.150246
16:00-17:00    0.140379
17:00-18:00    0.165542
18:00-19:00    0.140016
19:00-20:00    0.124384
20:00-21:00    0.131310
21:00-22:00    0.140384
22:00-23:00    0.154694
Name: count, dtype: float64

### checking the times when the orders were cancelled for each segment

In [99]:
df_time_0 = df_time[df_time['penalty.variant'] == 0]
df_time_10 = df_time[df_time['penalty.variant'] == 10]
df_time_20 = df_time[df_time['penalty.variant'] == 20]

In [101]:
df_time_0_grocery = df_time_0[df_time_0['business.type'] == 'grocery']
df_time_10_grocery = df_time_10[df_time_10['business.type'] == 'grocery']
df_time_20_grocery = df_time_20[df_time_20['business.type'] == 'grocery']
df_time_0_restaurant = df_time_0[df_time_0['business.type'] == 'restaurant']
df_time_10_restaurant = df_time_10[df_time_10['business.type'] == 'restaurant']
df_time_20_restaurant = df_time_20[df_time_20['business.type'] == 'restaurant']

In [107]:
df_incompleted_time_0_g = df_time_0_grocery[df_time_0_grocery['delivery.completed.time'].isna()]
df_incompleted_time_10_g = df_time_10_grocery[df_time_10_grocery['delivery.completed.time'].isna()]
df_incompleted_time_20_g = df_time_20_grocery[df_time_20_grocery['delivery.completed.time'].isna()]
df_incompleted_time_0_r = df_time_0_restaurant[df_time_0_restaurant['delivery.completed.time'].isna()]
df_incompleted_time_10_r = df_time_10_restaurant[df_time_10_restaurant['delivery.completed.time'].isna()]
df_incompleted_time_20_r = df_time_20_restaurant[df_time_20_restaurant['delivery.completed.time'].isna()]

In [131]:
cancelled_orders_per_hour_20_r = df_incompleted_time_20_r['time_interval'].value_counts().sort_index()
cancelled_orders_per_hour_10_r = df_incompleted_time_10_r['time_interval'].value_counts().sort_index()
cancelled_orders_per_hour_0_r = df_incompleted_time_0_r['time_interval'].value_counts().sort_index()
total_orders_per_hour_20_r = df_time_20_restaurant['time_interval'].value_counts().sort_index()
total_orders_per_hour_10_r = df_time_10_restaurant['time_interval'].value_counts().sort_index()
total_orders_per_hour_0_r = df_time_0_restaurant['time_interval'].value_counts().sort_index()

In [133]:
cancelled_orders_per_hour_20_g = df_incompleted_time_20_g['time_interval'].value_counts().sort_index()
cancelled_orders_per_hour_10_g = df_incompleted_time_10_g['time_interval'].value_counts().sort_index()
cancelled_orders_per_hour_0_g = df_incompleted_time_0_g['time_interval'].value_counts().sort_index()
total_orders_per_hour_20_g = df_time_20_grocery['time_interval'].value_counts().sort_index()
total_orders_per_hour_10_g = df_time_10_grocery['time_interval'].value_counts().sort_index()
total_orders_per_hour_0_g = df_time_0_grocery['time_interval'].value_counts().sort_index()

In [146]:
ratio_cancellation_per_hour_0_g = cancelled_orders_per_hour_0_g / total_orders_per_hour_0_g
ratio_cancellation_per_hour_10_g = cancelled_orders_per_hour_10_g / total_orders_per_hour_10_g
ratio_cancellation_per_hour_20_g = cancelled_orders_per_hour_20_g / total_orders_per_hour_20_g
ratio_cancellation_per_hour_0_r = cancelled_orders_per_hour_0_r / total_orders_per_hour_0_r
ratio_cancellation_per_hour_10_r = cancelled_orders_per_hour_10_r / total_orders_per_hour_10_r
ratio_cancellation_per_hour_20_r = cancelled_orders_per_hour_20_r / total_orders_per_hour_20_r

In [262]:
ratio_cancellation_per_hour_20_g

time_interval
0:00-1:00      0.093172
1:00-2:00      0.112991
2:00-3:00      0.118618
3:00-4:00      0.108808
4:00-5:00      0.285372
5:00-6:00      0.146079
6:00-7:00      0.091966
7:00-8:00      0.193977
8:00-9:00      0.146958
9:00-10:00     0.067975
10:00-11:00    0.057688
11:00-12:00    0.058330
12:00-13:00    0.059735
13:00-14:00    0.056315
14:00-15:00    0.063365
15:00-16:00    0.087191
16:00-17:00    0.075124
17:00-18:00    0.097128
18:00-19:00    0.073542
19:00-20:00    0.059281
20:00-21:00    0.067077
21:00-22:00    0.080322
22:00-23:00    0.099674
Name: count, dtype: float64

All ratios calculated above give the percentage of orders cancelled out of all orders.

In [264]:
# The treatment effect tells me what the difference is when imposing the 20 dollar cancellation fee. 
# For the restaurant segment, 4-5 am has the least reduction in cancellations. 
treatment_effect_r_0_20 = ratio_cancellation_per_hour_0_r - ratio_cancellation_per_hour_20_r
treatment_effect_r_0_20

time_interval
0:00-1:00      0.089010
1:00-2:00      0.070866
2:00-3:00      0.099594
3:00-4:00      0.112405
4:00-5:00      0.066020
5:00-6:00      0.078905
6:00-7:00      0.087985
7:00-8:00      0.101042
8:00-9:00      0.085145
9:00-10:00     0.080800
10:00-11:00    0.092557
11:00-12:00    0.078155
12:00-13:00    0.089659
13:00-14:00    0.086267
14:00-15:00    0.079317
15:00-16:00    0.074109
16:00-17:00    0.081379
17:00-18:00    0.082086
18:00-19:00    0.084848
19:00-20:00    0.088387
20:00-21:00    0.094547
21:00-22:00    0.076034
22:00-23:00    0.087890
Name: count, dtype: float64

In [217]:
# But how much is this impact in relation to the baseline?
percentage_change_from_treatment_0_20_r = treatment_effect_r_0_20 / ratio_cancellation_per_hour_0_r

In [219]:
percentage_change_from_treatment_0_20_r

time_interval
0:00-1:00      0.428482
1:00-2:00      0.346190
2:00-3:00      0.427858
3:00-4:00      0.464750
4:00-5:00      0.217036
5:00-6:00      0.339692
6:00-7:00      0.389412
7:00-8:00      0.352698
8:00-9:00      0.351515
9:00-10:00     0.399334
10:00-11:00    0.486205
11:00-12:00    0.414540
12:00-13:00    0.456035
13:00-14:00    0.441220
14:00-15:00    0.399409
15:00-16:00    0.347498
16:00-17:00    0.392357
17:00-18:00    0.352050
18:00-19:00    0.397335
19:00-20:00    0.448954
20:00-21:00    0.459033
21:00-22:00    0.385706
22:00-23:00    0.417891
Name: count, dtype: float64

This indicates that the time interval between 4-5 am is least impacted by the cancellation fee between the 0 and 20 dollar penalty groups for the restaurant segment

In [214]:
percentage_change_from_treatment_0_20_r.idxmin(), percentage_change_from_treatment_0_20_r.min()

('4:00-5:00', 0.2124943723182373)

In [222]:
treatment_effect_g_0_20 = ratio_cancellation_per_hour_0_g - ratio_cancellation_per_hour_20_g

In [224]:
percentage_change_from_treatment_0_20_g = treatment_effect_g_0_20 / ratio_cancellation_per_hour_0_g

In [226]:
percentage_change_from_treatment_0_20_g

time_interval
0:00-1:00      0.479256
1:00-2:00      0.428470
2:00-3:00      0.431150
3:00-4:00      0.458605
4:00-5:00      0.212494
5:00-6:00      0.293624
6:00-7:00      0.488312
7:00-8:00      0.291696
8:00-9:00      0.365163
9:00-10:00     0.576627
10:00-11:00    0.614011
11:00-12:00    0.617210
12:00-13:00    0.614847
13:00-14:00    0.628448
14:00-15:00    0.580984
15:00-16:00    0.510457
16:00-17:00    0.538803
17:00-18:00    0.489822
18:00-19:00    0.537933
19:00-20:00    0.589475
20:00-21:00    0.576534
21:00-22:00    0.532475
22:00-23:00    0.463844
Name: count, dtype: float64

In [228]:
percentage_change_from_treatment_0_20_g.idxmin(), percentage_change_from_treatment_0_20_g.min()

('4:00-5:00', 0.2124943723182373)

The penalty fee of 20 dollars has the lowest impact on those grocery deliveries being cancelled between 4:00-5:00.

In [244]:
# Do it for 0 and 10 group in grocery
treatment_effect_g_0_10 = ratio_cancellation_per_hour_0_g - ratio_cancellation_per_hour_10_g
percentage_change_from_treatment_0_10_g = treatment_effect_g_0_10 / ratio_cancellation_per_hour_10_g
# Do it for 10 and 20 group in grocery
treatment_effect_g_10_20 = ratio_cancellation_per_hour_10_g - ratio_cancellation_per_hour_20_g
percentage_change_from_treatment_10_20_g = treatment_effect_g_10_20 / ratio_cancellation_per_hour_10_g
# Do it for both 0 and 10 group and 10 and 20 group in restaurant
treatment_effect_r_0_10 = ratio_cancellation_per_hour_0_r - ratio_cancellation_per_hour_10_r
percentage_change_from_treatment_0_10_r = treatment_effect_r_0_10 / ratio_cancellation_per_hour_0_r
treatment_effect_r_10_20 = ratio_cancellation_per_hour_10_r - ratio_cancellation_per_hour_20_r
percentage_change_from_treatment_10_20_r = treatment_effect_r_10_20 / ratio_cancellation_per_hour_10_r

For cancellation rates in an hourly time interval from 0:00 to 23:59, separated by each segment, the times between 4-5 am are impacted the LEAST by imposing the penalty fee, EXCEPT for the restaurant segment between the 0 and 10 dollar penalty groups. 

Is there something that can be done in this specific time interval?