# A/B Test Simulation

- In this notebook, we simulate an A/B test to study how reducing the busy ratio (and, later, the order-to-dasher ratio) affects delivery lateness.

- We randomly assign orders to a control group (no change) and a treatment group (ratio artificially reduced), then test whether the intervention improves the on-time delivery rate.

# Import Libraries

In [7]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
from scipy.stats import chi2_contingency, mannwhitneyu, ttest_rel
from statsmodels.stats.proportion import proportions_ztest
import statsmodels.formula.api as smf


In [8]:
df = pd.read_csv("stattest_dataset.csv")

# Busy Ratio

We tested whether reducing the busy ratio of dashers in Market 1 could decrease the chance of late deliveries.

Control group: Current conditions.

Treatment group: Busy ratio reduced by 30%.

In [9]:
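# Restrict the analysis to Market 1
df_m1 = df[df["market_id"] == 1].copy()

# Randomly assign each order to control or treatment (equal probability; unseeded, so results vary between runs)
df_m1["group"] = np.random.choice(["control", "treatment"], size=len(df_m1))

# Simulate the intervention: cut busy_ratio by 30% for treatment orders
df_m1.loc[df_m1["group"] == "treatment", "busy_ratio"] *= 0.70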
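
Because the assignment above is unseeded, the group labels (and therefore the test statistic) change on every run. A minimal sketch of a reproducible assignment follows; the helper name `assign_groups`, the seed value, and the 50/50 split are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def assign_groups(n_rows, seed=42, p_treatment=0.5):
    """Return 'control'/'treatment' labels that are identical for a given seed.

    Hypothetical helper: the name, seed, and split are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)  # local generator, does not touch global state
    return rng.choice(
        ["control", "treatment"],
        size=n_rows,
        p=[1 - p_treatment, p_treatment],
    )

labels = assign_groups(1000)
share_treatment = (labels == "treatment").mean()
print(f"treatment share: {share_treatment:.3f}")
```

In the notebook this could be used as `df_m1["group"] = assign_groups(len(df_m1))`, which would make the reported Z-statistic and p-value repeatable across runs.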


Null hypothesis (H₀): The proportion of on-time deliveries is the same in the treatment group (reduced busy ratio) and the control group.


Alternative hypothesis (H₁): The treatment group has a higher on-time rate than the control group.


In [10]:
control = df_m1[df_m1["group"] == "control"]["is_on_time"]
treatment = df_m1[df_m1["group"] == "treatment"]["is_on_time"]

count = [treatment.sum(), control.sum()]  # treatment first
nobs = [len(treatment), len(control)]

stat, pval = proportions_ztest(count, nobs, alternative='larger')  # test if treatment > control
print(f"Z-statistic: {stat}, p-value: {pval}")


Z-statistic: -0.0022877610330355688, p-value: 0.500912683807391
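
To make the test above less of a black box, here is a minimal sketch of the pooled two-proportion z-test with a one-sided (upper-tail) p-value, which is the calculation `proportions_ztest` performs with its default settings and `alternative='larger'`. The function name and the counts used below are hypothetical and are not taken from the dataset.

```python
from math import sqrt, erf

def two_proportion_ztest_larger(success_a, n_a, success_b, n_b):
    """One-sided z-test of H1: rate_a > rate_b, using the pooled proportion."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under H0 (both groups share one rate)
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability P(Z > z) for a standard normal
    p_value = 0.5 * (1 - erf(z / sqrt(2)))
    return z, p_value

# Hypothetical counts: identical rates give z = 0 and p = 0.5
z_equal, p_equal = two_proportion_ztest_larger(800, 1000, 800, 1000)

# Hypothetical counts: treatment slightly ahead of control
z_ahead, p_ahead = two_proportion_ztest_larger(820, 1000, 800, 1000)
print(f"equal rates: z={z_equal:.3f}, p={p_equal:.3f}")
print(f"treatment ahead: z={z_ahead:.3f}, p={p_ahead:.3f}")
```

A Z-statistic near zero, as in the output above, therefore corresponds to a one-sided p-value near 0.5.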


Interpretation:

- The p-value (0.501) is much larger than 0.05, so we fail to reject the null hypothesis.

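- This means there is no evidence that the treatment group (reduced busy_ratio) has a higher on-time rate than the control group.

- Note that group assignment is random and only busy_ratio is rescaled; is_on_time is left unchanged, so the two groups share the same underlying on-time rate and a null result is expected.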
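
One way to let a simulated intervention affect the outcome is to regenerate is_on_time from the modified busy_ratio through an outcome model. The sketch below uses entirely synthetic data and a made-up logistic relationship (the intercept, slope, and value range are assumptions); in practice such a model would have to be estimated from the data, with the usual caveats about observational estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic stand-ins for the real columns (assumed range, not from the dataset)
busy_ratio = rng.uniform(0.2, 1.0, size=n)
group = rng.choice(["control", "treatment"], size=n)

def on_time_probability(ratio, intercept=2.0, slope=-2.5):
    """Hypothetical logistic link: a higher busy ratio lowers P(on time)."""
    return 1.0 / (1.0 + np.exp(-(intercept + slope * ratio)))

# Apply the 30% reduction to treatment orders only
effective_ratio = np.where(group == "treatment", busy_ratio * 0.70, busy_ratio)

# Draw the outcome from the *modified* feature, so the treatment can matter
is_on_time = rng.random(n) < on_time_probability(effective_ratio)

control_rate = is_on_time[group == "control"].mean()
treatment_rate = is_on_time[group == "treatment"].mean()
print(f"control on-time rate:   {control_rate:.3f}")
print(f"treatment on-time rate: {treatment_rate:.3f}")
```

Under this assumed model the treatment group ends up with a higher on-time rate, and the size of the gap depends entirely on the assumed slope.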

# Order to Dasher Ratio

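We repeat the same test for the order-to-dasher ratio in Market 1.

Control group: Current conditions.

Treatment group: Order-to-dasher ratio reduced by 50%.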

In [11]:
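# Start again from an unmodified copy of the Market 1 orders
df_m1 = df[df["market_id"] == 1].copy()

# Draw a fresh random assignment for this experiment
df_m1["group"] = np.random.choice(["control", "treatment"], size=len(df_m1))

# Simulate the intervention: halve order_to_dasher_ratio for treatment orders
df_m1.loc[df_m1["group"] == "treatment", "order_to_dasher_ratio"] *= 0.5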


Null hypothesis (H₀): The proportion of on-time deliveries is the same in the treatment group (reduced order-to-dasher ratio) and the control group.


Alternative hypothesis (H₁): The treatment group has a higher on-time rate than the control group.


In [12]:

control = df_m1[df_m1["group"] == "control"]["is_on_time"]
treatment = df_m1[df_m1["group"] == "treatment"]["is_on_time"]

count = [treatment.sum(), control.sum()]  # treatment first
nobs = [len(treatment), len(control)]

stat, pval = proportions_ztest(count, nobs, alternative='larger')  # test if treatment > control
print(f"Z-statistic: {stat}, p-value: {pval}")


Z-statistic: 0.5361880744792235, p-value: 0.29591428732371794


Interpretation:

- The p-value (0.296) is much larger than 0.05, so we fail to reject the null hypothesis.

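- This means there is no evidence that the treatment group (reduced order_to_dasher_ratio) has a higher on-time rate than the control group. Because is_on_time is not recomputed after order_to_dasher_ratio is halved, the groups differ only by random assignment.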
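
A p-value alone does not show how large a difference the data are compatible with. As a complement, here is a minimal sketch of a 95% Wald confidence interval for the difference in on-time rates (treatment minus control). It is two-sided, unlike the one-sided test above, and the function name and counts are hypothetical rather than taken from the dataset.

```python
from math import sqrt

def diff_in_proportions_ci(success_t, n_t, success_c, n_c, z_crit=1.959963984540054):
    """Wald interval for p_t - p_c; z_crit defaults to the 97.5% normal quantile."""
    p_t = success_t / n_t
    p_c = success_c / n_c
    diff = p_t - p_c
    # Unpooled standard error, since no equality of rates is assumed here
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return diff, diff - z_crit * se, diff + z_crit * se

# Hypothetical counts for illustration only
diff, lower, upper = diff_in_proportions_ci(820, 1000, 800, 1000)
print(f"difference: {diff:.3f}, 95% CI: [{lower:.3f}, {upper:.3f}]")
```

An interval that contains zero is consistent with failing to reject the null hypothesis, while its width indicates how small an effect the sample could have detected.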

# Next Steps

Final recommendations are in the README.