# <font color='red'>Chapter: Sample Size Calculation</font>

## <font color='green'>Introduction to Sample Size Calculation</font>

Determining the correct sample size is one of the most critical steps in research and experimentation. A properly calculated sample size ensures that:
- Results are statistically reliable.
- Resources are utilized efficiently.
- Over-sampling or under-sampling errors are avoided.

### **Key Parameters in Sample Size Calculation**:
1. **Confidence Level ($1 - \alpha$)**: The long-run proportion of intervals, built this way from repeated samples, that contain the true parameter (e.g., 95%, which corresponds to $\alpha = 0.05$).
2. **Margin of Error ($E$)**: The maximum acceptable difference between the sample estimate and the true population parameter.
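3. **Variability ($\sigma$ or $p$)**: How spread out the population is: the standard deviation $\sigma$ when estimating a mean, or the proportion $p$ (through $p(1 - p)$) when estimating a proportion.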
4. **Population Size ($N$)**: Total number of individuals in the target group.

### **Why It Matters**:
- **Under-sampling**: Leads to imprecise estimates and low statistical power, so real effects can go undetected.
- **Over-sampling**: Wastes resources without significant gains in precision.


## <font color='green'>Formulas for Sample Size Calculation</font>

### **1. Sample Size for Proportions**:
$$ n = \frac{z^2 \cdot p \cdot (1 - p)}{E^2} $$

Where:
- $z$: Z-value for the desired confidence level (e.g., 1.96 for 95%).
- $p$: Estimated proportion in the population.
- $1 - p$: Complement of $p$.
- $E$: Margin of error.
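As a rough sketch of the proportion formula, the snippet below wraps it in a small helper. The function name `sample_size_proportion` and the example inputs ($p = 0.5$, $E = 0.05$) are illustrative assumptions rather than values from this chapter. $p = 0.5$ is used because it maximizes $p(1 - p)$ and so gives the most conservative estimate when no prior guess is available.

```python
from math import ceil


def sample_size_proportion(p, margin_of_error, z=1.96):
    """Sample size needed to estimate a proportion (hypothetical helper)."""
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return ceil(n)  # round up, since a fractional respondent is not possible


# Assumed inputs: worst-case p = 0.5, margin of error 0.05, 95% confidence
n = sample_size_proportion(p=0.5, margin_of_error=0.05)
print(f"Required sample size: {n}")  # 385
```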

### **2. Sample Size for Means**:
$$ n = \frac{z^2 \cdot \sigma^2}{E^2} $$

Where:
- $\sigma$: Standard deviation of the population.
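The same idea applies to means. The sketch below assumes a hypothetical measurement with $\sigma = 15$ and a target margin of error of 2 units; both numbers and the helper name `sample_size_mean` are made up for illustration.

```python
from math import ceil


def sample_size_mean(sigma, margin_of_error, z=1.96):
    """Sample size needed to estimate a mean (hypothetical helper)."""
    n = (z ** 2) * sigma ** 2 / margin_of_error ** 2
    return ceil(n)


# Assumed inputs: sigma = 15, margin of error = 2, 95% confidence
n = sample_size_mean(sigma=15, margin_of_error=2)
print(f"Required sample size: {n}")  # 217
```

Halving the margin of error quadruples the required sample, because $E$ enters the formula squared.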

### **3. Finite Population Adjustment**:
If the population size is finite:
$$ n_{adj} = \frac{n}{1 + \frac{n - 1}{N}} $$

Where:
- $N$: Total population size.
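A minimal sketch of the adjustment, assuming an unadjusted sample size of 385 and two hypothetical population sizes. The helper name `finite_population_adjustment` is an assumption for illustration.

```python
from math import ceil


def finite_population_adjustment(n, population_size):
    """Apply the finite population adjustment to an unadjusted sample size n."""
    return ceil(n / (1 + (n - 1) / population_size))


# Assumed inputs: n = 385 from the proportion formula, two population sizes
print(finite_population_adjustment(385, 2_000))      # 323
print(finite_population_adjustment(385, 1_000_000))  # 385
```

The adjustment only matters when the sample is a noticeable fraction of the population; for very large $N$ it leaves $n$ essentially unchanged.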

### **4. When Standard Deviation is Unknown**:
Approximate $\sigma$ using pilot studies or historical data.
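One way this could look in practice is sketched below. The pilot measurements and the target margin of error are invented for illustration, and the sample standard deviation from `statistics.stdev` simply stands in for $\sigma$ in the formula for means. With a pilot this small the estimate is noisy, so the result is best treated as a starting point.

```python
from math import ceil
from statistics import stdev

# Hypothetical pilot measurements (made up for illustration)
pilot = [48, 52, 55, 47, 50, 53, 49, 51]

sigma_hat = stdev(pilot)  # sample standard deviation (n - 1 denominator)
z = 1.96                  # 95% confidence
margin_of_error = 1.0     # assumed target precision, same units as the data

# Plug the pilot estimate into the sample size formula for means
n = ceil((z ** 2) * sigma_hat ** 2 / margin_of_error ** 2)
print(f"Estimated sigma: {sigma_hat:.2f}, required sample size: {n}")
```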


## <font color='red'>Cohen's d and Effect Size</font>

### <font color='green'>Definition:</font>
Cohen's $d$ is a standardized measure that quantifies the magnitude of the difference between two means. It is widely used in hypothesis testing, particularly for determining effect sizes in A/B testing and other experimental designs. It provides a scale-free way to interpret the strength of a difference, making results comparable across studies.

---

### <font color='green'>Formula:</font>
$$ d = \frac{\overline{X}_1 - \overline{X}_2}{\sigma} $$

#### Where:
- $ \overline{X}_1, \overline{X}_2 $: Mean values of the two groups being compared.
- $ \sigma $: Pooled standard deviation of the population, which represents the variability across both groups.

This formula assumes the variances between the groups are similar and the samples are drawn independently.
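The chapter does not spell out how the pooled standard deviation is obtained. A common choice, assumed in the sketch below, weights each group's sample variance by its degrees of freedom. The summary statistics are hypothetical.

```python
from math import sqrt


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1, mean2, s1, n1, s2, n2):
    """Cohen's d computed from group summary statistics."""
    return (mean1 - mean2) / pooled_sd(s1, n1, s2, n2)


# Assumed summary statistics for two groups (illustrative only)
d = cohens_d(mean1=12.0, mean2=10.5, s1=3.0, n1=40, s2=3.4, n2=45)
print(f"Cohen's d: {d:.2f}")
```

When both groups have the same standard deviation, the pooled value reduces to that shared standard deviation.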

---

### <font color='green'>Effect Size Interpretation:</font>
Effect sizes are categorized into levels based on the $d$ value:

- **Small Effect:** $d = 0.2$  
  Indicates a subtle difference between the groups.
- **Medium Effect:** $d = 0.5$  
  Suggests a moderate difference.
- **Large Effect:** $d = 0.8$  
  Highlights a substantial difference.

These benchmarks help judge whether an observed difference is practically meaningful. They say nothing on their own about statistical significance, which also depends on the sample size.

---

### <font color='green'>Role in Sample Size Calculation:</font>
Cohen's $d$ directly influences the required sample size for experiments. Larger effect sizes generally require smaller sample sizes to detect statistically significant differences, while smaller effect sizes demand larger samples to achieve sufficient power.

#### Why Cohen's $d$ Matters in A/B Testing:
- **Context of the Test:** Cohen's $d$ helps define the minimum detectable effect (MDE), the smallest effect size that the experiment is designed to detect.  
- **Practical Application:** In real-world A/B testing, choosing $d$ up front helps ensure that the experiment is neither underpowered (leading to inconclusive results) nor overpowered (wasting resources).

For example, if an experiment compares two website designs, a large $d$ would indicate a substantial difference in the measured metric, while a small $d$ would suggest only marginal gains.
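To make the link between $d$ and sample size concrete, the sketch below uses the usual normal approximation for comparing two means with equal group sizes, $n \approx 2(z_{\alpha/2} + z_{\beta})^2 / d^2$ per group. This formula is not stated in the chapter and is assumed here, along with 95% confidence and 80% power.

```python
from math import ceil


def n_per_group(d, z_alpha=1.96, z_beta=0.84):
    """Approximate per-group sample size for a two-sample comparison of means."""
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Cohen's conventional benchmarks for small, medium, and large effects
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:>6} effect (d = {d}): about {n_per_group(d)} per group")
```

Because $n$ scales with $1/d^2$, halving the effect size roughly quadruples the required sample.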

---

### <font color='green'>Worked Example:</font>
Suppose you’re comparing the test scores of two classes after introducing a new teaching method. The mean scores are:
- Class A (control): $ \overline{X}_1 = 70 $
- Class B (treatment): $ \overline{X}_2 = 75 $

The pooled standard deviation is $ \sigma = 10 $.

Using Cohen’s $d$, with treatment minus control so that a positive value means the treatment group scored higher:  
$$ d = \frac{\overline{X}_2 - \overline{X}_1}{\sigma} = \frac{75 - 70}{10} = 0.5 $$

This $d$-value represents a medium effect, suggesting that the teaching method has a moderate impact on test scores.

---


## <font color='green'>Relative vs. Absolute Lift</font>

### **Relative Lift**:
- **Formula**:  
  $$ p_2 = p_1 \cdot (1 + \text{Relative Lift}) $$

- **When to Use**:  
  - Proportional improvements matter.
  - Comparisons across varied baselines.

### **Absolute Lift**:
- **Formula**:  
  $$ p_2 = p_1 + \text{Absolute Lift} $$

- **When to Use**:  
  - Focused on fixed numerical goals.
  - Practical significance is crucial.

### **Key Considerations**:
- Baseline proportion ($p_1$).
- Stakeholder goals.
- Purpose of the experiment.
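The two definitions are easy to convert between once the baseline is known. The sketch below uses hypothetical helper names and assumed baselines to show that the same absolute lift corresponds to very different relative lifts.

```python
def absolute_to_relative(p1, absolute_lift):
    """Relative lift implied by an absolute lift at baseline p1."""
    return absolute_lift / p1


def relative_to_absolute(p1, relative_lift):
    """Absolute lift implied by a relative lift at baseline p1."""
    return p1 * relative_lift


# Assumed scenario: the same +0.02 absolute lift at two different baselines
for baseline in (0.10, 0.40):
    rel = absolute_to_relative(baseline, 0.02)
    print(f"baseline {baseline:.2f}: +0.02 absolute = {rel:.0%} relative")
```

This is why a lift figure is only interpretable once it is clear whether it is relative or absolute.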


## <font color='green'>Sample Size Calculation for A/B Tests</font>

### **Formula**:
$$
n = \frac{(z_{\alpha/2} + z_{\beta})^2 \cdot [p_1 \cdot (1 - p_1) + p_2 \cdot (1 - p_2)]}{(p_2 - p_1)^2}
$$

Where:
- $n$: Required sample size per group.
- $z_{\alpha/2}$: Z-value for the chosen confidence level.
- $z_{\beta}$: Z-value for the chosen power.
- $p_1$: Baseline proportion.
- $p_2$: Expected proportion (based on lift or absolute difference).
- $p_2 - p_1$: Minimum detectable effect (MDE).
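A minimal sketch of this formula as a reusable function. The name `ab_sample_size`, the default z-values (95% confidence, 80% power), and the example proportions are assumptions chosen for illustration.

```python
from math import ceil


def ab_sample_size(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for comparing two proportions."""
    # Sum of the two groups' Bernoulli variances
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


# Assumed scenario: 10% baseline, aiming to detect a rise to 12%
print(ab_sample_size(0.10, 0.12))  # 3834
```

The result is the number of users needed in each variant, so the total traffic required is twice that figure.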

### **Applications**:
- Website optimization.
- Marketing experiments.
- Product feature testing.


### <font color='red'>Cohen's d in the Context of A/B Testing</font>

#### <font color='green'>Definition</font>
Cohen's $d$ measures the magnitude of the difference between two means relative to the standard deviation of the population. It plays a key role in understanding the effect size, which directly influences the required sample size in A/B testing.

#### <font color='green'>Formula</font>
$$
d = \frac{\bar{X}_1 - \bar{X}_2}{\sigma}
$$

Where:
- $ \bar{X}_1, \bar{X}_2 $: Means of the two groups (e.g., test and control groups).
- $ \sigma $: Pooled standard deviation across both groups.

#### <font color='green'>Effect Size and Sample Size</font>
Cohen's $d$ expresses the effect size, which drives the sample size needed to reach a given confidence level and statistical power in A/B testing:
- **Small Effect Size ($d = 0.2$)**: Small difference between groups, requiring a larger sample size to detect the effect.
- **Medium Effect Size ($d = 0.5$)**: Moderate difference, requiring a smaller sample than a small effect.
- **Large Effect Size ($d = 0.8$)**: Substantial difference, allowing for the smallest sample sizes.

#### <font color='green'>Connection to A/B Tests</font>
In A/B testing, Cohen's $d$ is typically used for comparing two means (e.g., average purchase amounts). For proportions (e.g., conversion rates), the **z-test formula for proportions** is preferred, and the effect is expressed as the Minimum Detectable Effect (MDE): the gap between the baseline proportion ($p_1$) and the target proportion ($p_2$).

#### <font color='green'>Relationship Between Cohen's $d$ and Sample Size</font>
- **Effect Size $d$** helps define the **Minimum Detectable Effect (MDE)**.
- The MDE, along with baseline ($p_1$) and target ($p_2$) proportions, determines the sample size required for a test with specified confidence and power levels.

#### <font color='green'>Applications in A/B Tests</font>
- Conversion rate optimization (e.g., clicks, sign-ups).
- Comparison of average metrics (e.g., revenue per user).
- Estimating the minimum number of samples required for meaningful results.

This understanding of Cohen's $d$ ensures that tests are neither underpowered (risking missed effects) nor overpowered (wasting resources).


## <font color='green'>Real-World Applications</font>

1. **Election Polling**: Estimating proportions for a population with a known size.
2. **Clinical Trials**: Determining sample size for testing drug efficacy.
3. **Marketing Campaigns**: Calculating sample size for A/B testing in email campaigns.


# <font color='red'>Exercises for Practice</font>

## <font color='green'>Calculate Cohen's $d$</font>

**Two groups of users interacted with different versions of a website. Their average time spent on the website was recorded:** 

- Group 1: $\bar{X}_1 = 5.2$ minutes
- Group 2: $\bar{X}_2 = 4.3$ minutes
- Standard deviation across both groups: $\sigma = 1.8$

Using the formula for Cohen's $d$, calculate the effect size and interpret it.

In [20]:
# Parameters
mean_1 = 5.2
mean_2 = 4.3
std_dev = 1.8

# Calculate Cohen's d
cohens_d = (mean_1 - mean_2) / std_dev
print(f"Cohen's d: {cohens_d:.2f}")
# Interpretation (use the magnitude so the result does not depend on group order)
magnitude = abs(cohens_d)
if magnitude < 0.2:
    print("Effect size is negligible.")
elif magnitude < 0.5:
    print("Effect size is small.")
elif magnitude < 0.8:
    print("Effect size is medium.")
else:
    print("Effect size is large.")


Cohen's d: 0.50
Effect size is medium.


## <font color='green'>Determine Sample Size for Proportion Comparison</font>

**You want to detect a 5-percentage-point (absolute) improvement in email sign-ups:** 

- Baseline proportion: $p_1 = 0.20$
- Target proportion: $p_2 = 0.25$
- Confidence level: 95% ($z_{\alpha/2} = 1.96$)
- Power: 80% ($z_\beta = 0.84$)

Calculate the sample size required for this A/B test.

In [36]:
from math import ceil

# Parameters
p1 = 0.20
p2 = 0.25
z_alpha = 1.96
z_beta = 0.84

# Minimum detectable effect (MDE)
mde = p2 - p1

# Calculate sample size per group with the A/B test formula given earlier:
# the variance term is the sum of both groups' variances, p1(1-p1) + p2(1-p2)
variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
n = ((z_alpha + z_beta)**2 * variance_sum) / (mde**2)
sample_size = ceil(n)

print(f"Required sample size per group: {sample_size}")


Required sample size per group: 1090


## <font color='green'>Calculate Sample Size for Relative Lift</font>

**An e-commerce platform wants to test a new feature that they estimate will improve the conversion rate by 15%. Given:** 

- Baseline proportion: $p_1 = 0.10$
- Relative lift: 15% (i.e., $p_2 = p_1 \cdot (1 + 0.15)$)
- Confidence level: 95% ($z_{\alpha/2} = 1.96$)
- Power: 90% ($z_\beta = 1.28$)

Determine the sample size required to run this test.

In [38]:
from math import ceil

# Parameters
p1 = 0.10
relative_lift = 0.15
p2 = p1 * (1 + relative_lift)
z_alpha = 1.96
z_beta = 1.28

# Minimum detectable effect (MDE)
mde = p2 - p1

# Calculate sample size per group with the A/B test formula given earlier:
# the variance term is the sum of both groups' variances, p1(1-p1) + p2(1-p2)
variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
n = ((z_alpha + z_beta)**2 * variance_sum) / (mde**2)
sample_size = ceil(n)

print(f"Required sample size per group: {sample_size}")


Required sample size per group: 8948


## <font color='green'>Effect Size Interpretation</font>

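**Given the following effect sizes for Cohen's $d$, categorize them as negligible, small, medium, or large:** 

- $d = 0.15$
- $d = 0.45$
- $d = 0.90$

Explain how the sample size required would differ for each.

In [40]:
# Effect sizes
effect_sizes = [0.15, 0.45, 0.90]
categories = []

# Same cut-offs as the first exercise: 0.2 (small), 0.5 (medium), 0.8 (large)
for d in effect_sizes:
    if d < 0.2:
        categories.append("Negligible effect")
    elif d < 0.5:
        categories.append("Small effect")
    elif d < 0.8:
        categories.append("Medium effect")
    else:
        categories.append("Large effect")

# Display results
for d, category in zip(effect_sizes, categories):
    print(f"Cohen's d: {d} - {category}")


Cohen's d: 0.15 - Negligible effect
Cohen's d: 0.45 - Small effect
Cohen's d: 0.9 - Large effect

The required sample size grows roughly with $1/d^2$, so detecting $d = 0.15$ takes about 36 times as many observations as detecting $d = 0.90$, while $d = 0.45$ takes about 4 times as many.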


## <font color='green'>Calculate Confidence Interval for Conversion Rates</font>

**A/B test results for a sign-up page show:** 

- Sample size: $n = 800$
- Conversion rate: $\hat{p} = 0.18$
- Confidence level: 95% ($z_{\alpha/2} = 1.96$)

Calculate the confidence interval for the conversion rate.

In [42]:
# Parameters
n = 800
p_hat = 0.18
z_alpha = 1.96

# Calculate margin of error
margin_error = z_alpha * ((p_hat * (1 - p_hat)) / n)**0.5

# Confidence interval
lower_bound = p_hat - margin_error
upper_bound = p_hat + margin_error

print(f"Confidence Interval: [{lower_bound:.3f}, {upper_bound:.3f}]")


Confidence Interval: [0.153, 0.207]
