# Power analysis for conversion

**Hypothesis**: raising (lowering) the recommended booked budget will lower (raise) conversion.

If this is true, we can change the default recommendation to maximise revenue:

```math
\textup{Revenue}(\textup{budget recommendation}) = \textup{conversion}(\textup{budget recommendation}) \times \textup{budget recommendation}
```

We can then choose the recommendation that maximises $\textup{Revenue}$ (DH revenue). This is not our long-term goal, which is vendor value, but learning how to measure the response to our policies will be necessary to optimise for the more complicated problem of vendor value.

We want to estimate the $\textup{conversion}$ function, which maps the size of the budget recommendation to the conversion rate.

We can do this by randomly varying the budget recommendation and observing conversion rates.
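To make the objective concrete, here is a minimal sketch of picking the revenue-maximising recommendation once a conversion curve is known. The logistic-in-log-budget curve and the values `alpha=10`, `beta=-3` are illustrative assumptions (they mirror one of the simulated scenarios later in this notebook), not estimates from data. The helper names `conversion` and `revenue` and the candidate grid are hypothetical. It uses only the standard library.

```python
import math


def conversion(budget: float, alpha: float = 10.0, beta: float = -3.0) -> float:
    """Assumed conversion curve: logistic in log(budget). Illustrative only."""
    logit = alpha + beta * math.log(budget)
    return 1.0 / (1.0 + math.exp(-logit))


def revenue(budget: float) -> float:
    """Expected revenue per user = conversion(budget) * budget."""
    return conversion(budget) * budget


# Grid search over candidate recommendations (integer budgets 1..200, an assumed range)
candidates = range(1, 201)
best_budget = max(candidates, key=revenue)

print(f"Revenue-maximising budget on the grid: {best_budget}")
print(f"Conversion there: {conversion(best_budget):.3f}")
print(f"Revenue per user there: {revenue(best_budget):.2f}")
```

With a real estimate of the curve, the grid would be the set of recommendations we are actually willing to show.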

**Findings**

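1. In a basic month-long A/B test (5% significance level, 80% power), we could detect a 10% relative uplift in conversion using about 1/3 of our monthly user base. A 5% relative uplift would need about 1.3 months of users.
2. Our key parameter is the _elasticity_ of conversion with respect to the budget recommendation, and we want to test whether it is significantly different from -1 (unit elasticity).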
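
Unit elasticity is the natural threshold because of how revenue is defined above. A short derivation, writing $b$ for the budget recommendation and $\varepsilon$ for the elasticity of conversion with respect to $b$ (both symbols are shorthand introduced here):

```math
\ln \textup{Revenue}(b) = \ln \textup{conversion}(b) + \ln b
\quad\Rightarrow\quad
\frac{d \ln \textup{Revenue}}{d \ln b} = \underbrace{\frac{d \ln \textup{conversion}}{d \ln b}}_{\varepsilon} + 1
```

So revenue rises with the recommendation when $\varepsilon > -1$ (inelastic), falls when $\varepsilon < -1$ (elastic), and is locally flat at $\varepsilon = -1$.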

*Note*
- We can also do this for retention and vendor outcomes. Both are likely to be harder to measure because they are noisier: they are only observed for vendors who make a self-service booking, and the self-service booking rate is low, [around 1.5% over 2025](https://tableau.deliveryhero.net/#/site/GlobalStandardReporting/views/SelfBookingAnalytics/MBRproduct?=null&:iid=1), so only a small fraction of users contribute observations.

In [13]:
import numpy as np
import pandas as pd

from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import zt_ind_solve_power
from scipy import stats
from scipy.stats import norm
import matplotlib.pyplot as plt
from typing import Dict, Tuple
import warnings
warnings.filterwarnings('ignore')

- Based on the [Self Booking dashboard](https://tableau.deliveryhero.net/#/site/GlobalStandardReporting/views/SelfBookingAnalytics/MBRproduct?=null&:iid=1), we have a Vendor Portal Conversion Rate around 1.5%.
- We use the standard power of 0.8 (type II error rate of 0.2) and a type I error rate of 0.05
- We have around 650k users per month

Let's find the sample size needed to detect a given relative effect, and compare it with our monthly user base.

In [19]:
MONTHLY_USERS = 650000
ALPHA=0.05  # type I error rate
BETA=0.2    # type II error rate
BASELINE_CVR = 0.015

In [20]:
RELATIVE_EFFECT = 0.05

effect_size = proportion_effectsize(BASELINE_CVR, BASELINE_CVR * (1 + RELATIVE_EFFECT))

sample_size = zt_ind_solve_power(
    effect_size=effect_size,
    power=1-BETA,
    alpha=ALPHA,
    alternative='two-sided' 
)

print(f"Sample size per group: {sample_size:,.0f}")
print(f"This is {2*sample_size/MONTHLY_USERS:.2f} times our monthly user base")

Sample size per group: 422,412
This is 1.30 times our monthly user base


In [22]:
RELATIVE_EFFECT = 0.1
print(BASELINE_CVR * (1 + RELATIVE_EFFECT))

effect_size = proportion_effectsize(BASELINE_CVR, BASELINE_CVR * (1 + RELATIVE_EFFECT))

sample_size = zt_ind_solve_power(
    effect_size=effect_size,
    power=1-BETA,
    alpha=ALPHA,
    alternative='two-sided' 
)

print(f"Sample size per group: {sample_size:,.0f}")
print(f"This is {2*sample_size/MONTHLY_USERS:.2f} times our monthly user base")

0.0165
Sample size per group: 108,093
This is 0.33 times our monthly user base
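
As a cross-check on the solver, the same sample sizes can be approximated with the usual closed form for a two-sided z-test with equal arms, $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 / h^2$, where $h$ is Cohen's effect size for proportions. The sketch below uses only the standard library. The function names are hypothetical, and the last call assumes a full month of 650,000 users split evenly into two arms. Inverting the formula gives the smallest relative uplift such a month could detect.

```python
import math
from statistics import NormalDist


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's effect size h for two proportions (arcsine transform)."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))


def n_per_group(p1: float, relative_effect: float,
                alpha: float = 0.05, power: float = 0.8) -> float:
    """Normal-approximation sample size per arm (two-sided test, equal arms)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    h = cohens_h(p1, p1 * (1 + relative_effect))
    return 2 * (z_alpha + z_power) ** 2 / h ** 2


def relative_mde(p1: float, n_group: float,
                 alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest relative uplift detectable with n_group users per arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    h = (z_alpha + z_power) * math.sqrt(2 / n_group)
    # Invert the arcsine transform to recover the treated conversion rate
    p2 = math.sin(math.asin(math.sqrt(p1)) + h / 2) ** 2
    return p2 / p1 - 1


print(f"n per group for a 10% uplift: {n_per_group(0.015, 0.10):,.0f}")
print(f"n per group for a 5% uplift:  {n_per_group(0.015, 0.05):,.0f}")
print(f"Relative MDE with one month:  {relative_mde(0.015, 650_000 / 2):.1%}")
```

The closed form ignores the negligible far-tail term that the solver includes, so the two should agree closely rather than exactly.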


# Let's simulate some data and recover parameters



In [171]:
import numpy as np
import pandas as pd
import scipy as sp
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import Logit

rng = np.random.default_rng(42)

In [None]:
def get_conversion_probability(alpha: float, beta: float, budget: float) -> float:
    # conversion probability (make it decreasing in price, arbitrary functional form)
    logit = alpha + beta * np.log(budget)
    conv_prob = 1 / (1 + np.exp(-logit))
    return conv_prob


def get_simulation(N: int, budget_change: float, alpha: float, beta: float) -> pd.DataFrame:
    assert budget_change < 1, "budget_change needs to be less than one"
    assert budget_change > 0, "budget_change needs to be greater than zero"

    # baseline mean budget
    baseline_budget_mean = 100
    baseline_budgets = rng.poisson(lam=baseline_budget_mean, size=N)
    
    # randomly assign up, same, down (use the seeded generator so runs are reproducible)
    assignments = rng.choice(['down', 'same', 'up'], size=N)
    budgets = np.where(assignments == 'down', baseline_budgets*(1-budget_change),
                       np.where(assignments == 'up', baseline_budgets*(1+budget_change), baseline_budgets))
    conversion_probabilities = [
        get_conversion_probability(alpha, beta, b) for b in budgets
    ]
    # conversion draws
    conversion = rng.binomial(1, conversion_probabilities)

    df = pd.DataFrame({
        "user_id": np.arange(N),
        "assignment": assignments,
        "budget": budgets,
        "conv_prob": conversion_probabilities,
        "conversion": conversion
    })

    df['base_budget'] = df['budget'].copy()
    
    # Reverse the randomization
    df.loc[df['assignment'] == 'up', 'base_budget'] = df.loc[df['assignment'] == 'up', 'budget'] / (1+budget_change)
    df.loc[df['assignment'] == 'down', 'base_budget'] = df.loc[df['assignment'] == 'down', 'budget'] / (1-budget_change)
    return df

def simulate_outcomes(df: pd.DataFrame, alpha: float, elasticity: float, estimated_elasticity: float, sim_change: float) -> Dict[str, float]:
    # NOTE: the true `elasticity` is accepted but not used; both revenue figures are
    # expected revenue under `estimated_elasticity`. Columns are added to `df` in place.
    df['base_conversion_prob'] = [get_conversion_probability(alpha, estimated_elasticity, b) for b in df.base_budget.values]
    df['base_conversion'] = rng.binomial(1, df['base_conversion_prob'].to_numpy())
    df['expected_revenue'] = df['base_conversion_prob'] * df['base_budget']

    # apply the proposed relative change to every baseline budget
    df['sim_budget'] = df.base_budget * (1 + sim_change)
    df['sim_conversion_prob'] = [get_conversion_probability(alpha, estimated_elasticity, b) for b in df.sim_budget.values]

    df['sim_conversion'] = rng.binomial(1, df['sim_conversion_prob'].to_numpy())
    df['sim_expected_revenue'] = df['sim_conversion_prob'] * df['sim_budget']

    original_exp_revenue = np.sum(df.expected_revenue)
    simulated_exp_revenue = np.sum(df.sim_expected_revenue)
    return {'original': original_exp_revenue, 'simulated': simulated_exp_revenue}


In [371]:
alpha = 10
elasticity = -3
print(f"True elasticity is {elasticity}")
df = get_simulation(10000, 0.2, alpha, elasticity)
print(f"Baseline conversion rate {df.conversion.mean()}")

results = simulate_outcomes(df, alpha, elasticity, elasticity, 0.1)

True elasticity is -3
Baseline conversion rate 0.0258
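
The simulation treats the coefficient on log budget as the elasticity. A quick numerical check, assuming the same logistic form as `get_conversion_probability`, suggests this is a good approximation when conversion is low: the point elasticity is `beta * (1 - p)`, which approaches `beta` as `p` goes to zero. The helper names below are hypothetical re-implementations using only the standard library.

```python
import math


def conversion_probability(alpha: float, beta: float, budget: float) -> float:
    """Same logistic-in-log-budget form as the simulation."""
    logit = alpha + beta * math.log(budget)
    return 1 / (1 + math.exp(-logit))


def point_elasticity(alpha: float, beta: float, budget: float, eps: float = 1e-6) -> float:
    """Numerical d ln(p) / d ln(budget) by central finite difference."""
    up = math.log(conversion_probability(alpha, beta, budget * math.exp(eps)))
    down = math.log(conversion_probability(alpha, beta, budget * math.exp(-eps)))
    return (up - down) / (2 * eps)


alpha, beta, budget = 10, -3, 100  # parameters of the simulated scenario above
p = conversion_probability(alpha, beta, budget)
numeric = point_elasticity(alpha, beta, budget)
analytic = beta * (1 - p)

print(f"Conversion probability at budget {budget}: {p:.4f}")
print(f"Numerical elasticity: {numeric:.4f}")
print(f"beta * (1 - p):       {analytic:.4f}")
```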


First we use a Wald estimator:

$$
\theta = \frac{\ln E[\textup{Conversion} | \textup{Higher budget}] - \ln E[\textup{Conversion} | \textup{Lower budget}]}
{E[\ln(\textup{Budget}) | \textup{Higher budget}] - E[\ln(\textup{Budget}) | \textup{Lower budget}]}
$$

The numerator takes the log of each group's conversion rate, because the log of an individual 0/1 conversion outcome is undefined.

Note that this is a linear estimator, but our model is nonlinear, so there will be some approximation error.

In [372]:
def get_wald_estimate_simple(df: pd.DataFrame, Z1_name: str, Z0_name: str) -> float:
    # Delta log conversion rate / Delta log budget
    p1 = df.loc[df['assignment']==Z1_name]['conversion'].mean()
    p0 = df.loc[df['assignment']==Z0_name]['conversion'].mean()
    
    delta_ln_conversion = np.log(p1) - np.log(p0)
    delta_ln_budget = np.mean(np.log(df.loc[df.assignment==Z1_name].budget)) - \
                      np.mean(np.log(df.loc[df.assignment==Z0_name].budget))
    
    return delta_ln_conversion / delta_ln_budget


Z1 = ['up', 'same', 'up']
Z0 = ['same', 'down', 'down']
for z1, z0 in zip(Z1, Z0):
    wald_bs = [get_wald_estimate_simple(df.sample(n=10000, replace=True), z1, z0) for i in range(1000)]
    print(f"Estimated elasticity ({z1} to {z0}) is {np.mean(wald_bs):.2f}, ({np.quantile(wald_bs, 0.05):.2f}, {np.quantile(wald_bs, 0.95):.2f})")

Estimated elasticity (up to same) is -2.46, (-4.20, -0.72)
Estimated elasticity (same to down) is -3.41, (-4.56, -2.35)
Estimated elasticity (up to down) is -3.01, (-3.73, -2.36)
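
The bootstrap intervals can be sanity-checked analytically. The sketch below applies the delta method, $\textup{Var}[\ln \hat{p}] \approx (1 - p) / (n p)$, and treats the log-budget gap between arms as fixed. The function name and the conversion counts are hypothetical, chosen to be roughly in line with the simulated `up` and `down` arms, and are not taken from the notebook's data.

```python
import math
from statistics import NormalDist


def wald_elasticity_with_se(conv1: int, n1: int, conv0: int, n0: int,
                            delta_ln_budget: float) -> tuple:
    """Wald elasticity estimate and delta-method standard error."""
    p1, p0 = conv1 / n1, conv0 / n0
    estimate = (math.log(p1) - math.log(p0)) / delta_ln_budget
    # Delta method on each arm's log conversion rate; arms are independent
    var_numerator = (1 - p1) / (n1 * p1) + (1 - p0) / (n0 * p0)
    se = math.sqrt(var_numerator) / abs(delta_ln_budget)
    return estimate, se


# Hypothetical counts: 'up' arm (+20% budget) vs 'down' arm (-20% budget)
est, se = wald_elasticity_with_se(
    conv1=42, n1=3333, conv0=138, n0=3333,
    delta_ln_budget=math.log(1.2 / 0.8),
)
z = NormalDist().inv_cdf(0.95)  # 90% interval, matching the 5% and 95% quantiles
print(f"Elasticity: {est:.2f}, 90% interval ({est - z * se:.2f}, {est + z * se:.2f})")
```

The standard error scales with one over the log-budget gap, which is consistent with the `up` vs `down` comparison giving the tightest interval.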


In [None]:
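def simple_iv_logit(df):
    # Control-function (two-stage residual inclusion) logit, using the random
    # assignment as the instrument for log budget.
    df = df.copy()  # avoid adding helper columns to the caller's frame
    df['log_budget'] = np.log(df['budget'])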
    
    # Create instrument dummies
    df['up_dummy'] = (df['assignment'] == 'up').astype(int)
    df['down_dummy'] = (df['assignment'] == 'down').astype(int)
    
    # First stage: log_budget ~ up_dummy + down_dummy
    X_first = sm.add_constant(df[['up_dummy', 'down_dummy']])
    first_stage = sm.OLS(df['log_budget'], X_first).fit()
    df['residual_v1'] = first_stage.resid
    
    # Second stage logit
    X_second = sm.add_constant(df[['log_budget', 'residual_v1']])
    logit_model = Logit(df['conversion'], X_second).fit(disp=False)
    
    elasticity = logit_model.params['log_budget']
    ci_elasticity = logit_model.conf_int(alpha=0.05).loc['log_budget'].values
    return elasticity, logit_model, ci_elasticity

# Usage
print(f"Elasticity: {elasticity}")
est_elasticity, model, ci = simple_iv_logit(df.sample(n=1000, replace=True))
print(f"IV Logit elasticity: {est_elasticity:.3f}, ({model.conf_int(alpha=0.05).loc['log_budget'].values})")
print(f"Test of exogeneity (residual coef): {model.params['residual_v1']:.3f}")
print(f"P-value: {model.pvalues['residual_v1']:.3f}")

iv_ests = [(e, True if m.pvalues['residual_v1']> 0.05 else False) for e, m, ci in [simple_iv_logit(df.sample(n=10000, replace=True)) for i in range(100)]]

print(f"Estimated elasticity is {np.mean([x[0] for x in iv_ests]):.2f}, ({np.quantile([x[0] for x in iv_ests], 0.05):.2f}, {np.quantile([x[0] for x in iv_ests], 0.95):.2f})")

Elasticity: -3
IV Logit elasticity: -3.747, ([-6.19969018 -1.29505007])
Test of exogeneity (residual coef): 0.618
P-value: 0.782
Estimated elasticity is -2.99, (-3.59, -2.47)


## Exploiting estimates


In [350]:
simulated_results = simulate_outcomes(df, alpha, elasticity, est_elasticity, -0.1) 
simulated_results

{'original': np.float64(74290.18362859893),
 'simulated': np.float64(99188.77750591206)}

By lowering the budget recommendation by 10%, we can raise expected revenue.
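
A single-vendor version of this comparison makes the mechanism visible. The sketch below assumes the simulated parameters `alpha=10`, `beta=-3` and a vendor at the baseline mean budget of 100, and compares the exact change in expected revenue with the first-order approximation $(1 + \varepsilon) \times \%\Delta\textup{budget}$. The function name is hypothetical.

```python
import math


def expected_revenue(alpha: float, beta: float, budget: float) -> float:
    """Conversion probability (logistic in log budget) times budget."""
    p = 1 / (1 + math.exp(-(alpha + beta * math.log(budget))))
    return p * budget


alpha, beta = 10, -3             # simulated scenario, elastic demand
base_budget, change = 100, -0.10  # lower the recommendation by 10%

exact = (expected_revenue(alpha, beta, base_budget * (1 + change))
         / expected_revenue(alpha, beta, base_budget) - 1)
first_order = (1 + beta) * change  # uses beta as the elasticity

print(f"Exact relative change in expected revenue: {exact:.1%}")
print(f"First-order approximation:                 {first_order:.1%}")
```

The approximation understates the gain here because a 10% change is not infinitesimal and the conversion curve is convex in this region.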

In [360]:
alpha = -4
elasticity = -0.00
df = get_simulation(10000, 0.2, alpha, elasticity)
print(f"Elasticity: {elasticity}")
est_elasticity, model, ci = simple_iv_logit(df)
print(f"IV Logit elasticity: {est_elasticity:.3f}, ({model.conf_int(alpha=0.05).loc['log_budget'].values})")
print(f"Test of exogeneity (residual coef): {model.params['residual_v1']:.3f}")
print(f"P-value: {model.pvalues['residual_v1']:.3f}")

iv_ests = [(e, True if m.pvalues['residual_v1']> 0.05 else False) for e, m, ci in [simple_iv_logit(df.sample(n=10000, replace=True)) for i in range(100)]]

print(f"Estimated elasticity is {np.mean([x[0] for x in iv_ests]):.2f}, ({np.quantile([x[0] for x in iv_ests], 0.05):.2f}, {np.quantile([x[0] for x in iv_ests], 0.95):.2f})")


simulated_results = simulate_outcomes(df, alpha, elasticity, est_elasticity, 0.1) 
simulated_results

Elasticity: -0.0
IV Logit elasticity: -0.188, ([-1.0931741  0.7164683])
Test of exogeneity (residual coef): -1.129
P-value: 0.203
Estimated elasticity is -0.21, (-0.97, 0.50)


{'original': np.float64(7635.877650070457),
 'simulated': np.float64(8251.144128944452)}

In [361]:
alpha = -3
elasticity = -1/10
print(f"True elasticity is {elasticity}")
df = get_simulation(10000, 0.2, alpha, elasticity)
print(f"Baseline conversion rate {df.conversion.mean()}")
if abs(elasticity) < 1:
    print("Raise prices to increase revenue")
else:
    print("Lower prices to increase revenue")

True elasticity is -0.1
Baseline conversion rate 0.034
Raise prices to increase revenue


In [362]:
gt_params = [
    (20, -5),
    (15, -4),
    (10, -3),
    (5, -2),
    (0, -1),
    (-2, -1/2),
    (-3, -1/10)
]

In [364]:
bs_results = []

n_sims = 100
for alpha, elasticity in gt_params:
    print(f"Elasticity is {elasticity}")
    raise_counter = 0
    lower_counter = 0
    for i in range(n_sims):
        df = get_simulation(10000, 0.2, alpha, elasticity)
        iv_est, model, ci = simple_iv_logit(df)
        # sanity check
        # iv_est = elasticity
        # ci = [elasticity, elasticity] 
        if ci[1] < -1:
            # upper bound of estimate is less than -1
            # elastic, lower prices
            sim_change = -0.1
            lower_counter += 1
        elif ci[0] > -1:
            # lower bound of estimate is greater than -1
            # inelastic, raise prices
            sim_change = 0.1
            raise_counter += 1
        else:
            # inconclusive
            sim_change = 0
        simulated_results = simulate_outcomes(df, alpha, elasticity, iv_est, sim_change) 
        simulated_results['iv_est'] = iv_est
        simulated_results['elasticity'] = elasticity
        simulated_results['budget_change'] = sim_change

        bs_results.append(simulated_results)    

    print(f"Lower prices in {lower_counter / n_sims:.2f} share of simulations")
    print(f"Raise prices in {raise_counter / n_sims:.2f} share of simulations")

df_bootstrap = pd.DataFrame(bs_results)
df_bootstrap['uplift'] = df_bootstrap['simulated'] - df_bootstrap['original']
df_bootstrap.groupby("elasticity")['uplift'].mean()

Elasticity is -5
Lower prices in 1.00 share of simulations
Raise prices in 0.00 share of simulations
Elasticity is -4
Lower prices in 1.00 share of simulations
Raise prices in 0.00 share of simulations
Elasticity is -3
Lower prices in 1.00 share of simulations
Raise prices in 0.00 share of simulations
Elasticity is -2
Lower prices in 0.52 share of simulations
Raise prices in 0.00 share of simulations
Elasticity is -1
Lower prices in 0.02 share of simulations
Raise prices in 0.03 share of simulations
Elasticity is -0.5
Lower prices in 0.00 share of simulations
Raise prices in 0.15 share of simulations
Elasticity is -0.1
Lower prices in 0.00 share of simulations
Raise prices in 0.69 share of simulations


elasticity
-5.0    28089.289308
-4.0    15582.050573
-3.0     6768.329844
-2.0      364.243687
-1.0     2584.977648
-0.5     4597.322005
-0.1     7221.189765
Name: uplift, dtype: float64

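Mean uplift is positive for every elasticity, so on expectation we increase revenue in every case. Note that both revenue figures are computed with the estimated elasticity (`iv_est`), not the true one.

**Concerns**

- It is very hard to get usable estimates when conversion is relatively inelastic.
- With 20% changes up and down, we reach a conclusive decision in only about half of simulations at elasticity -2 (0.52), and in only 0.15 of simulations at elasticity -1/2.
- For elasticity ≤ -3, we always lower the budget, correctly.
- For elasticity = -0.1, we raise the budget in only about 0.7 of simulations.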
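
The decision rule behind these shares can be isolated as a small function, which makes it easier to see why wide intervals lead to no action. This is a sketch of the rule used in the simulation loop. The function name and the `step` and `threshold` parameters are hypothetical, and the example intervals are the two IV logit confidence intervals printed earlier.

```python
def choose_budget_change(ci_low: float, ci_high: float,
                         step: float = 0.1, threshold: float = -1.0) -> float:
    """Return the relative budget change implied by an elasticity interval."""
    if ci_high < threshold:
        # whole interval below -1: elastic, lower the recommendation
        return -step
    if ci_low > threshold:
        # whole interval above -1: inelastic, raise the recommendation
        return step
    # interval contains -1: inconclusive, leave the recommendation unchanged
    return 0.0


print(choose_budget_change(-6.19969018, -1.29505007))  # -0.1
print(choose_budget_change(-1.0931741, 0.7164683))     # 0.0
```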