## Power and Sample Size


The statsmodels package contains several methods for power calculation (see e.g. https://machinelearningmastery.com/statistical-power-and-power-analysis-in-python/).

Here, we use proportion_effectsize to calculate the effect size and TTestIndPower to solve for the sample size:

In [8]:
from pathlib import Path
import random

import pandas as pd
import numpy as np

from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats import power

import matplotlib.pyplot as plt

For example, suppose current click-through rates are about 1.1%, and you are seeking a 10% boost to 1.21%. 

So we have two boxes: box A with 1.1% ones (say, 110 ones and 9,890 zeros), and box B with 1.21% ones (say, 121 ones and 9,879 zeros).
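The two boxes can be made concrete with a short sketch. The counts come from the text above; the helper name draw_rate, the draw size of 2,000, and the random seed are illustrative assumptions, not part of the original analysis.

```python
import random

# Box A: 110 ones (clicks) and 9,890 zeros; box B: 121 ones and 9,879 zeros.
box_a = [1] * 110 + [0] * 9890
box_b = [1] * 121 + [0] * 9879

# True click-through rates of the two boxes.
rate_a = sum(box_a) / len(box_a)
rate_b = sum(box_b) / len(box_b)

def draw_rate(box, n, rng):
    """Draw n values with replacement and return the observed rate (hypothetical helper)."""
    return sum(rng.choices(box, k=n)) / n

rng = random.Random(0)  # arbitrary seed, assumed here only for reproducibility
sample_a = draw_rate(box_a, 2000, rng)  # draw size of 2,000 is an assumption
sample_b = draw_rate(box_b, 2000, rng)

print(rate_a, rate_b)
print(sample_a, sample_b)
```

With draws this small, the observed rates can easily overlap or even reverse order, which is why the power calculation that follows asks for a much larger sample.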

In [24]:
# Initialize the parameters:
current_click_through = 0.011
sought_click_through = 0.0121
alpha = 0.05
target_power = 0.8  # named target_power so it does not shadow the imported power module

# Calculate the effect size:
# proportion_effectsize() returns Cohen's h, the difference between the two
# proportions after an arcsine square-root transformation.
# A 10% relative increase (1.1% to 1.21%) gives an effect size of about 0.0103.
effect_size = sm.stats.proportion_effectsize(sought_click_through, current_click_through)

# Solve for the sample size per group (nobs1; both groups are the same size by default):
analysis = sm.stats.TTestIndPower()
result = analysis.solve_power(effect_size=effect_size,
                              alpha=alpha, power=target_power, alternative='larger')
print('Sample Size: %.3f' % result)
effect_size

Sample Size: 116602.393


0.01029785095103608
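As a rough cross-check of the effect size printed above, Cohen's h can be computed by hand with the standard library. This sketch assumes the usual definition, h = 2·arcsin(√p1) − 2·arcsin(√p2); the helper name cohens_h is hypothetical and is not a statsmodels function.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h for two proportions (hypothetical helper, not part of statsmodels)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Sought rate first, current rate second, matching the argument order used above.
h = cohens_h(0.0121, 0.011)
print(round(h, 6))
```

Because the arcsine transformation is nearly linear in √p for small p, h here is close to 2·(√0.0121 − √0.011).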

In [32]:
# Initialize a different set of parameters:
current_click_through = 0.011
sought_click_through = 0.0165
alpha = 0.05
target_power = 0.8  # named target_power so it does not shadow the imported power module

# Calculate the effect size:
# The effect size quantifies the difference between the two proportions in standardized units.
# Here the sought rate is a 50% relative increase (1.1% to 1.65%), which gives an effect size of about 0.0475.
# sm.stats.proportion_effectsize() computes it with Cohen's h formula for proportions:
#     h = 2 * arcsin(sqrt(p1)) - 2 * arcsin(sqrt(p2))
effect_size = sm.stats.proportion_effectsize(sought_click_through, current_click_through)

# Initialize a power analysis object:
# TTestIndPower performs power analysis for a two-sample t-test with independent samples.
# Although it is named for the t-test, at sample sizes this large the t distribution is nearly normal,
# so the result is close to that of a z-test on the transformed proportions.
analysis = sm.stats.TTestIndPower()

# Solve for the required sample size:
# solve_power() solves for whichever parameter is left unspecified. Here that is nobs1, the number of
# observations in the first group; the second group is the same size by default (ratio=1).
# Parameters:
# - effect_size: the effect size calculated above.
# - alpha: the significance level (0.05, i.e. a 5% chance of a Type I error, a false positive).
# - power: the statistical power (0.8, i.e. an 80% chance of detecting a true difference of this size, so a 20% Type II error rate).
# - alternative: the alternative hypothesis; 'larger' is a one-sided test that the sought CTR exceeds the current CTR.
result = analysis.solve_power(effect_size=effect_size,
                              alpha=alpha, power=target_power, alternative='larger')
print('Sample Size: %.3f' % result)

effect_size

Sample Size: 5488.408


0.047468188118164584
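The two sample sizes above can be approximated in closed form. The sketch below assumes a one-sided test, equal group sizes, and a normal approximation in place of the t distribution, giving n ≈ 2·((z_{1−α} + z_{power}) / h)² per group. The function names are hypothetical, and the effect sizes are copied from the outputs above.

```python
from statistics import NormalDist

def approx_n_per_group(h, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a one-sided two-sample test.

    Normal approximation (hypothetical helper, not a statsmodels function).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_power) / h) ** 2

def approx_power(h, n_per_group, alpha=0.05):
    """Approximate one-sided power for a given per-group sample size."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(h * (n_per_group / 2) ** 0.5 - z_alpha)

# Effect sizes copied from the two outputs above.
n_small_effect = approx_n_per_group(0.01029785095103608)
n_large_effect = approx_n_per_group(0.047468188118164584)

print(round(n_small_effect), round(n_large_effect))
```

Under these assumptions the results land within about one observation of the statsmodels figures. Since n scales with 1/h², the second scenario's effect size, about 4.6 times larger, cuts the required sample by a factor of roughly 21.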