🌟 Exercise 1: Calculating Required Sample Size
You are planning an A/B test to evaluate the impact of a new email subject line on the open rate. Based on past data, you expect a small effect: an absolute lift of 0.03 (an increase from 20% to 23% in the open rate), which corresponds to a standardized effect size (Cohen's h) of about 0.07. You aim for an 80% chance (power = 0.8) of detecting this effect if it exists, with a 5% significance level (α = 0.05).

Calculate the required sample size per group using Python’s statsmodels library.
What sample size is needed for each group to ensure your test is properly powered?



In [7]:
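import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p1 = 0.20      # baseline open rate
p2 = 0.23      # expected open rate with the new subject line
alpha = 0.05   # significance level (two-sided)
power = 0.80   # desired probability of detecting the effect
ratio = 1      # equal group sizes

# Cohen's h for two proportions; passing p2 first keeps the value positive
effect_size = proportion_effectsize(p2, p1)

analysis = NormalIndPower()

# Solve for the number of observations needed in each group
sample_size = analysis.solve_power(effect_size=effect_size, power=power, alpha=alpha, ratio=ratio)

# Round up to the next whole participant
sample_size = math.ceil(sample_size)

print(f"Required sample size per group: {sample_size}")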

Required sample size per group: 2941


🌟 Exercise 2: Understanding the Relationship Between Effect Size and Sample Size
Using the same A/B test setup as in Exercise 1, you want to explore how changing the expected effect size impacts the required sample size.

Calculate the required sample size for the following effect sizes: 0.2, 0.4, and 0.5, keeping the significance level and power the same.
How does the sample size change as the effect size increases? Explain why this happens.

In [8]:
from statsmodels.stats.power import NormalIndPower
import numpy as np

alpha = 0.05
power = 0.80
ratio = 1

effect_sizes = [0.2, 0.4, 0.5]

analysis = NormalIndPower()

print("Calculating required sample sizes for different effect sizes:\n")
print("Effect Size\tRequired Sample Size per Group")
print("-----------\t------------------------------")

for es in effect_sizes:
    sample_size = analysis.solve_power(effect_size=es, power=power, alpha=alpha, ratio=ratio)
    sample_size = int(np.ceil(sample_size))
    print(f"{es}\t\t{sample_size}")

Calculating required sample sizes for different effect sizes:

Effect Size	Required Sample Size per Group
-----------	------------------------------
0.2		393
0.4		99
0.5		63
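
The sample size falls sharply as the effect size grows because a larger effect stands out more clearly from sampling noise. Under the usual normal approximation for a two-sided test with equal groups, the per-group sample size is roughly 2 * ((z_alpha + z_power) / effect_size)^2, so it scales with 1 / effect_size^2: doubling the effect size cuts the requirement to about a quarter. The sketch below is a hand-rolled approximation (the helper name n_per_group is hypothetical, not part of statsmodels) that uses scipy.stats.norm to illustrate this.

```python
import math

from scipy.stats import norm


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided z-test, equal groups.

    Hypothetical helper for illustration; it ignores the negligible
    contribution of the opposite rejection tail.
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_power = norm.ppf(power)          # quantile for the desired power
    return 2 * ((z_alpha + z_power) / effect_size) ** 2


# Round up, as in the exercise
sizes = {es: math.ceil(n_per_group(es)) for es in (0.2, 0.4, 0.5)}
for es, n in sizes.items():
    print(f"{es}\t\t{n}")

# Doubling the effect size divides the requirement by four
print(n_per_group(0.2) / n_per_group(0.4))
```

The rounded values agree with the table above, which suggests the approximation is adequate for building intuition here.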


🌟 Exercise 3: Exploring the Impact of Statistical Power
Imagine you are conducting an A/B test where you expect a small effect size of 0.2. You initially plan for a power of 0.8 but wonder how increasing or decreasing the desired power level impacts the required sample size.

Calculate the required sample size for power levels of 0.7, 0.8, and 0.9, keeping the effect size at 0.2 and significance level at 0.05.
Question: How does the required sample size change with different levels of statistical power? Why is this understanding important when designing A/B tests?


In [9]:
from statsmodels.stats.power import NormalIndPower
import numpy as np

effect_size = 0.2
alpha = 0.05
power_levels = [0.7, 0.8, 0.9]
ratio = 1

analysis = NormalIndPower()

print("Calculating required sample sizes for different power levels:\n")
print("Power Level\tRequired Sample Size per Group")
print("-----------\t------------------------------")

for power in power_levels:
    sample_size = analysis.solve_power(effect_size=effect_size, power=power, alpha=alpha, ratio=ratio)
    sample_size = int(np.ceil(sample_size))
    print(f"{power}\t\t{sample_size}")

Calculating required sample sizes for different power levels:

Power Level	Required Sample Size per Group
-----------	------------------------------
0.7		309
0.8		393
0.9		526


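As the desired statistical power increases, the required sample size per group also increases: raising power from 0.7 to 0.8 lifts the requirement from 309 to 393, and raising it to 0.9 lifts it to 526.
Understanding this relationship helps in designing experiments that are adequately powered to detect meaningful effects without collecting more data than the budget or timeline allows. Keep in mind that these figures are per group, so the total sample is twice as large when the groups are equal.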


🌟 Exercise 4: Implementing Sequential Testing
You are running an A/B test on two versions of a product page to increase the purchase rate. You plan to monitor the results weekly and stop the test early if one version shows a significant improvement.

Define your stopping criteria.
Decide how you would implement sequential testing in this scenario.
At the end of week three, Version B has a p-value of 0.02. What would you do next?

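By defining clear stopping criteria in advance and implementing sequential testing with appropriate statistical adjustments, it is possible to balance the risks of Type I and Type II errors while still allowing early detection of significant effects. Because the results are examined every week, each interim look should be compared against an adjusted significance level rather than the nominal 0.05; the threshold used here is 0.0221, which matches the Pocock boundary for three planned looks at an overall α of 0.05. In this scenario, Version B shows a p-value of 0.02 at the end of week three, which is below the adjusted significance level of 0.0221, so you should stop the test early and proceed with implementing Version B, as it has demonstrated a statistically significant improvement.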
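
The stopping rule described above can be sketched as a small function. This is a minimal illustration, not a full group-sequential design: the function name, the maximum of three looks, and the p-values for weeks one and two are all hypothetical, and only the 0.0221 threshold and the week-three p-value of 0.02 come from the scenario.

```python
ADJUSTED_ALPHA = 0.0221  # adjusted per-look threshold quoted in the scenario
MAX_LOOKS = 3            # assumption: three planned weekly looks


def sequential_decision(weekly_p_values, threshold=ADJUSTED_ALPHA, max_looks=MAX_LOOKS):
    """Return (week, decision) for a fixed-threshold sequential test.

    The test stops at the first look whose p-value falls below the
    adjusted threshold, or at the last planned look if none does.
    """
    for week, p_value in enumerate(weekly_p_values[:max_looks], start=1):
        if p_value < threshold:
            return week, "stop: significant difference"
    looks_done = min(len(weekly_p_values), max_looks)
    if looks_done == max_looks:
        return looks_done, "stop: no significant difference"
    return looks_done, "continue collecting data"


# Weeks 1 and 2 are made-up values; week 3 is the p-value from the exercise
week, decision = sequential_decision([0.20, 0.06, 0.02])
print(week, decision)
```

Comparing every look against the unadjusted 0.05 instead would stop the test more often by chance alone, which is the inflation of the Type I error rate that the adjusted threshold is meant to prevent.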

🌟 Exercise 5: Applying Bayesian A/B Testing
You’re testing a new feature in your app, and you want to use a Bayesian approach. Initially, you believe the new feature has a 50% chance of improving user engagement. After collecting data, your analysis suggests a 65% probability that the new feature is better.

Describe how you would set up your prior belief.
After collecting data, how does the updated belief (posterior distribution) influence your decision?
What would you do if the posterior probability was only 55%?

1. Describe How You Would Set Up Your Prior Belief

In Bayesian A/B testing, setting up your prior belief involves specifying your initial assumptions about the parameters before observing the data. Here’s how you can do it for this scenario:

a. Define the Metric of Interest
•	User Engagement Rate: The proportion of users who engage with the app in a meaningful way (e.g., clicks, time spent, purchases).

b. Choose Appropriate Probability Distributions

Since we’re dealing with proportions (rates between 0 and 1), the Beta distribution is suitable for modeling our beliefs about the engagement rates of both versions of the app.
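
One way to encode the initial 50% belief is to give both versions the same prior, for example Beta(1, 1): by symmetry, the prior probability that the new feature has the higher engagement rate is then 0.5. The sketch below is illustrative only. The engagement counts are invented, so the posterior it prints is not the 65% from the exercise, and the helper name prob_b_beats_a is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable
N_DRAWS = 200_000


def prob_b_beats_a(prior_a, prior_b, data_a=(0, 0), data_b=(0, 0)):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta distributions.

    Priors are (alpha, beta) pairs; data are (engaged, not_engaged) counts.
    A Beta prior updated with binomial data yields a Beta posterior.
    """
    draws_a = rng.beta(prior_a[0] + data_a[0], prior_a[1] + data_a[1], N_DRAWS)
    draws_b = rng.beta(prior_b[0] + data_b[0], prior_b[1] + data_b[1], N_DRAWS)
    return float(np.mean(draws_b > draws_a))


uniform_prior = (1, 1)  # assumption: uninformative prior for both versions

# Before any data: identical priors imply roughly a 50% chance that B is better
prior_prob = prob_b_beats_a(uniform_prior, uniform_prior)

# After hypothetical data: 200/1000 engaged on A, 210/1000 engaged on B
posterior_prob = prob_b_beats_a(
    uniform_prior, uniform_prior, data_a=(200, 800), data_b=(210, 790)
)

print(f"Prior P(B > A):     {prior_prob:.3f}")
print(f"Posterior P(B > A): {posterior_prob:.3f}")
```

The posterior probability is then compared with a decision threshold chosen before the test, which is what separates a result worth acting on from one that calls for more data.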
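
2. What Would You Do If the Posterior Probability Was Only 55%?

A posterior probability of 55% indicates very weak evidence that the new feature is better: it has barely moved from the 50% prior. In that case I would not roll out the feature yet. Instead, keep collecting data until the posterior crosses a decision threshold agreed in advance (for example, 95%), or keep the current version if further testing would cost more than the likely gain.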





🌟 Exercise 6: Implementing Adaptive Experimentation
You’re running a test with three different website layouts to increase user engagement. Initially, each layout gets 33% of the traffic. After the first week, Layout C shows higher engagement.

Explain how you would adjust the traffic allocation after the first week.
Describe how you would continue to adapt the experiment in the following weeks.
What challenges might you face with adaptive experimentation, and how would you address them?

1. Adjusting Traffic Allocation After the First Week:

Since Layout C shows higher engagement, allocate more traffic to it while still testing Layouts A and B. For example, assign a higher percentage (e.g., 60-70%) to Layout C and distribute the remaining traffic between Layouts A and B to continue gathering data.

2. Adapting the Experiment in the Following Weeks:

Continue to monitor engagement metrics and adjust traffic allocation accordingly. Use adaptive algorithms like Multi-Armed Bandits (e.g., Thompson Sampling) to balance exploration (testing all layouts) and exploitation (favoring the best performer). Gradually increase traffic to the top-performing layout while ensuring others receive enough exposure for accurate assessment.

3. Challenges and How to Address Them:

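•	Statistical Biases: Adaptive allocation can introduce bias because each layout's traffic share depends on earlier results, which makes naive comparisons between layouts unreliable. Mitigate this by using methods designed for adaptive experiments, such as inverse-probability-weighted estimates or Bayesian posterior comparisons.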

•	Insufficient Data for Some Layouts: Less traffic to underperforming layouts may lead to unreliable metrics. Set minimum traffic thresholds to ensure sufficient data.

•	User Experience Consistency: Users might see different layouts on return visits. Address this by assigning users to a specific layout throughout the experiment (user consistency).
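
The Thompson Sampling idea from step 2, combined with the minimum traffic threshold from the challenges above, can be sketched as follows. This is a simplified illustration under stated assumptions: the engagement counts, the 10% floor, the Beta(1, 1) prior, and the function name thompson_allocation are all hypothetical, and a production system would typically sample per request rather than recompute shares in weekly batches.

```python
import numpy as np


def thompson_allocation(successes, trials, min_share=0.10, n_draws=20_000, seed=0):
    """Traffic shares based on each layout's probability of being the best.

    Each layout gets a Beta(1, 1) prior updated with its observed data.
    Every layout keeps at least `min_share` of traffic so that
    underperformers still accumulate data.
    """
    rng = np.random.default_rng(seed)
    successes = np.asarray(successes)
    failures = np.asarray(trials) - successes
    n_arms = len(successes)

    # Draw engagement rates from each layout's posterior
    draws = rng.beta(1 + successes, 1 + failures, size=(n_draws, n_arms))

    # Fraction of draws in which each layout has the highest rate
    win_prob = np.bincount(np.argmax(draws, axis=1), minlength=n_arms) / n_draws

    # Reserve the floor for every layout, then split the rest by win probability
    return min_share + (1 - n_arms * min_share) * win_prob


# Hypothetical week-one results for layouts A, B, and C
engaged = [120, 125, 150]
visitors = [1000, 1000, 1000]

shares = thompson_allocation(engaged, visitors)
for name, share in zip("ABC", shares):
    print(f"Layout {name}: {share:.1%} of traffic")
```

Rerunning the function each week with cumulative counts shifts traffic gradually toward the leading layout while the floor keeps the others in the experiment.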