<a href="https://colab.research.google.com/github/milieureka/MKT-science/blob/main/Random_Experiment/Sample_size_estimation(AB).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import math

# Mathematical formula to estimate the sample size required under the stated assumptions

Identify the baseline conversion rate for users (control) and the minimum conversion rate that we want to detect for treated users.

The sample size formula is designed to ensure the experiment has enough data to reliably detect a specified difference in conversion rates (e.g., from 30% to 35%) while controlling for statistical errors.

- **Purpose**: The formula gives the minimum total number of users $ N $ needed to detect a difference between two proportions (the control and treatment conversion rates) with 80% statistical power (an 80% chance of detecting a true effect) at a 5% significance level (a 5% Type I error, or false-positive, rate).
  
- **Formula Breakdown**:

$$ N = \left\lceil 2 \times \frac{(p_1(1-p_1) + p_2(1-p_2)) \times (Z_{\text{power}} + Z_{\alpha/2})^2}{(p_2 - p_1)^2} \right\rceil $$

  - $ p_1(1-p_1) + p_2(1-p_2) $: The sum of the Bernoulli variances of the conversion outcome in the control group ($ p_1 $) and the treatment group ($ p_2 $).
  - $ (p_2 - p_1)^2 $: The squared difference in conversion rates (effect size), which determines how small a difference we want to detect. Smaller differences require larger samples.
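  - $ Z_{\text{power}} + Z_{\alpha/2} $: The sum of the standard normal quantiles for the desired power (80%, $ Z_{\text{power}} \approx 0.84 $) and for the significance level of a two-tailed test (5%, $ Z_{\alpha/2} \approx 1.96 $). The code rounds this sum to 2.8.
  - **Factor of 2**: The fraction on its own is the sample size per group. With equal-sized control and treatment groups, multiplying by 2 gives the total $ N $.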
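
The formula above can be sketched as a small function. This is a minimal illustration, not the notebook's own code: the function name `required_sample_size` and its `alpha` and `power` parameters are hypothetical, and it takes exact normal quantiles from the standard-library `statistics.NormalDist` instead of the rounded constant 2.8, so its result can differ by a few users from a calculation based on 2.8.

```python
import math
from statistics import NormalDist


def required_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Total sample size (both groups) for a two-sided test of two proportions.

    Hypothetical helper: assumes equal-sized groups and the normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 0.80
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)   # sum of Bernoulli variances
    per_group = variance_sum * (z_alpha + z_power) ** 2 / (p2 - p1) ** 2
    return math.ceil(2 * per_group)                # factor of 2: two equal groups


n_total = required_sample_size(0.30, 0.35)
print("Total sample size:", n_total)
```

Passing a smaller gap between `p1` and `p2` shows how quickly the requirement grows, since the squared difference sits in the denominator.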

In [2]:
np.random.seed(1)
iterations = 1000

control_success = .30
treatment_success = .35

N = math.ceil(2*(control_success*(1-control_success)+treatment_success*(1-treatment_success))*
              ((2.8/(treatment_success-control_success))**2))

print("Required Sample Size:", N)

Required Sample Size: 2745


The code first computes the two variance terms:

*   0.30 × (1 − 0.30) = 0.21 for the control group
*   0.35 × (1 − 0.35) = 0.2275 for the treatment group

Their sum is 0.21 + 0.2275 = 0.4375.

The effect size is 0.35−0.30=0.05.

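The term $ (2.8/0.05)^2 = 3136 $ scales the sample size to achieve 80% power at a 5% two-sided significance level.

The total sample size N is the product 2 × 0.4375 × 3136, rounded up with `math.ceil`, and is split equally between the control and treatment groups. The exact product is 2744; the cell prints 2745 because floating-point rounding in the intermediate steps leaves the computed value marginally above 2744 before the ceiling is taken.

If the true conversion rates are 30% and 35% (a difference of 5 percentage points), this sample size gives an 80% chance of detecting the difference at the 5% significance level. Larger true differences are detected more often.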
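
The hand calculation can be reproduced step by step with exact rational arithmetic. The sketch below is an illustration only; the variable names are assumptions chosen for readability, and `fractions.Fraction` is used so that each intermediate value matches the arithmetic in the text without floating-point rounding.

```python
import math
from fractions import Fraction

# Conversion rates and the rounded combined Z-score, written as exact fractions
p1 = Fraction(30, 100)       # control conversion rate
p2 = Fraction(35, 100)       # treatment conversion rate
z_combined = Fraction(28, 10)

variance_sum = p1 * (1 - p1) + p2 * (1 - p2)   # 0.21 + 0.2275
effect_size = p2 - p1                          # 0.05
scale = (z_combined / effect_size) ** 2        # (2.8 / 0.05)^2
exact_n = 2 * variance_sum * scale             # total across both groups

print(float(variance_sum), float(effect_size), int(scale), math.ceil(exact_n))
```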

# Simulate Random Samples

To test whether the calculated sample size works, the code simulates A/B testing experiments. The function create_sample generates a dataset representing one experiment:

In [3]:
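def create_sample(N,control_success,treatment_success):
    # The first round(N/2) rows are control (0); the remaining rows are treatment (1)
    split_n = round(N/2)
    # Both columns start as uniform draws; 'treatment' is overwritten below, 'randnum' drives conversion
    df = pd.DataFrame(np.random.uniform(0,1,size=(N,2)),columns=['treatment','randnum'])
    df.iloc[0:split_n,0]=0
    df.iloc[split_n:,0]=1
    # A user converts when their random draw falls below their group's conversion rate
    df['outcome']=0
    df['outcome']=np.where((df['treatment']==0) & (df['randnum']<control_success),1,df['outcome'])
    df['outcome']=np.where((df['treatment']==1) & (df['randnum']<treatment_success),1,df['outcome'])
    return df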

Split the sample: The first `round(N/2)` rows form the control group and the remaining rows form the treatment group, so the two groups are equal in size, or differ by one row when N is odd.

Create a dataset: A DataFrame is created with N rows and two columns:
*   treatment: Set to 0 for the first half (control group) and 1 for the second half (treatment group).
*   randnum: Random numbers between 0 and 1, used to simulate whether a user converts.

Simulate outcomes: The outcome column is initially set to 0 (no conversion).
*   For the control group (treatment=0), if a user’s randnum is less than 0.30 (control_success), their outcome is set to 1 (conversion). This simulates a 30% conversion rate.
*   For the treatment group (treatment=1), if a user’s randnum is less than 0.35 (treatment_success), their outcome is set to 1, simulating a 35% conversion rate.
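
The thresholding idea described above (a uniform draw below the group's rate counts as a conversion) can be checked in isolation. The following is a minimal NumPy-only sketch, not the notebook's `create_sample`: the function name `simulate_outcomes`, the `seed` parameter, and the large sample of 200,000 users are assumptions made so that the empirical rates are easy to compare with 30% and 35%.

```python
import numpy as np


def simulate_outcomes(n_total, control_rate, treatment_rate, seed=0):
    """Return (treatment, outcome) arrays for one simulated experiment.

    Hypothetical helper illustrating the uniform-draw threshold technique.
    """
    rng = np.random.default_rng(seed)
    split_n = n_total // 2
    treatment = np.zeros(n_total, dtype=int)
    treatment[split_n:] = 1                      # second half is the treatment group
    # Each user's conversion probability depends on their group
    rate = np.where(treatment == 1, treatment_rate, control_rate)
    # A uniform draw below the rate counts as a conversion
    outcome = (rng.random(n_total) < rate).astype(int)
    return treatment, outcome


treatment, outcome = simulate_outcomes(200_000, 0.30, 0.35)
control_mean = outcome[treatment == 0].mean()
treatment_mean = outcome[treatment == 1].mean()
print("Control rate:", control_mean, "Treatment rate:", treatment_mean)
```

With a sample this large, both empirical rates should sit close to the configured 30% and 35%; at the notebook's much smaller N, individual runs vary more, which is why power is below 100%.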

# Run Simulations and Perform Statistical Tests

In [None]:
def get_t_list(iterations):
    tlist = list()
    for i in range(iterations):
        df = create_sample(N,control_success,treatment_success)
        formula = 'outcome~treatment'
        model = smf.ols(formula,data=df).fit()
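        # Select the treatment t-statistic by label; positional [1] on a labeled Series is deprecated in pandas
        t=model.tvalues['treatment']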
        tlist.append(t)
    return tlist

The next cell runs a single example simulation and prints the full regression summary.

In [None]:
df = create_sample(N,control_success,treatment_success)
formula = 'outcome~treatment'
model = smf.ols(formula,data=df).fit()
print(model.summary())

With an appropriately chosen sample size and a sufficiently large number of iterations, the share of simulations that are statistically significant should be very close to 80% by design: this is the power level embedded in the constant 2.8.

In [None]:
t_list = get_t_list(iterations)
t_stats = pd.DataFrame(t_list,columns=['tstat'])
t_stats['sig'] = np.where(np.abs(t_stats['tstat'])>=1.96,1,0)
print(t_stats['sig'].mean())
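
As a cross-check on the simulated share, the expected power can also be approximated analytically. This is a sketch under the normal approximation, not part of the original notebook: the function name `approx_power` is hypothetical, the sample size 2745 and the 1000 iterations are taken from the cells above, and the tiny probability of rejecting in the wrong direction is ignored.

```python
import math
from statistics import NormalDist


def approx_power(n_total, p1, p2, alpha=0.05):
    """Approximate power of a two-sided test of two proportions (equal groups).

    Hypothetical helper: normal approximation, wrong-tail rejections ignored.
    """
    n_per_group = n_total / 2
    # Standard error of the difference in sample proportions
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


power = approx_power(2745, 0.30, 0.35)
iterations = 1000
# Standard error of the simulated share of significant runs
mc_se = math.sqrt(power * (1 - power) / iterations)
print("Approximate power:", round(power, 3))
print("Monte Carlo standard error:", round(mc_se, 4))
```

The Monte Carlo standard error indicates how far the simulated share can plausibly drift from the analytic value with 1000 iterations, so a printed share a percentage point or two away from 0.80 is consistent with the design.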