[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/lindsayalexandra14/lindsayalexandra14/blob/main/templates/ab_testing/fisher_exact_test_python_template.ipynb)

*Note: the Python version appears to give more conservative sample size and CI estimates than the R version.*

#Top

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/AB%20Testing%20Landing%20Page%20Small%20Sample%20Size.png)

**Summary**
*   This hypothetical experiment tests two landing pages (control vs. treatment)
*   The planned sample size was 30,000 users, but the test was cut short and the sample shrank to 305 users. Because one cell has fewer than 10 users, I use Fisher's Exact Test, which is suited to small samples
*   The landing page needs to be switched over sooner due to changed timelines, so these results are all the information I will get about how the treatment performed
*   I am trying to show that the treatment performed better than the control, because the team is interested in moving forward with the treatment
*   The test established that the treatment performed better with statistical significance (at alpha = 0.05). The reported practical significance is low (Cohen's h = 0.02, computed from the design-stage rates of 14% vs. 14.7%). The test did not reach the desired statistical power (about 75% vs. the 80% target)
*   Given the constraints, I am comfortable enough that the treatment performs at some level better than the control (and not the reverse) to recommend implementing the treatment

**tl;dr for results**

*   Skip to "Results Summary" at the end

#Setup

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Setup.png)

##Install packages

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Install%20Packages.png)

##Import Libraries

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Import%20Libraries.png)

In [112]:
import numpy as np
import pandas as pd
import scipy.stats as stats
from math import asin, sqrt
import math
import textwrap
from scipy.stats import fisher_exact
from statsmodels.stats.contingency_tables import Table2x2

#Test Design

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Test%20Design%20Dark.png)

##Parameters

In [113]:
alpha = 0.05             # Significance level
power = 0.80             # Desired statistical power
                         #(Probability of detecting an effect when it exists; 0.8 is standard)
control = 0.14           # Baseline conversion rate
effect = 0.05            # Relative effect size (5% lift over baseline)
mde = control * effect   # Minimum Detectable Effect (absolute difference)
treatment = control + mde  # Treatment conversion rate with lift

print(f"Control: {control:.4f}")
print(f"Treatment: {treatment:.4f}")

Control: 0.1400
Treatment: 0.1470


In [114]:
# Assign group labels
p_1 = treatment
p_2 = control
p1_label = "Treatment"
p2_label = "Control"
alternative = "greater"  # 'greater', 'less', or 'two-sided'

In [115]:
# Hypothesis Statement
if alternative == "greater":
    hypothesis = f"{p1_label} ({p_1:.4f}) is greater than {p2_label} ({p_2:.4f})"
elif alternative == "less":
    hypothesis = f"{p1_label} ({p_1:.4f}) is less than {p2_label} ({p_2:.4f})"
else:
    hypothesis = f"{p1_label} ({p_1:.4f}) is different from {p2_label} ({p_2:.4f})"

print("Hypothesis:", hypothesis)

Hypothesis: Treatment (0.1470) is greater than Control (0.1400)


##Effect size

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Effect%20Size.png)

In [116]:
# Cohen's h - Standardized Effect Size for Proportions
def cohens_h(p1, p2):
    return 2 * (asin(sqrt(p1)) - asin(sqrt(p2)))

effect_size = cohens_h(treatment, control)

print(f"Minimum Detectable Effect (MDE): {mde:.3f}")
print(f"Effect Size (Cohen's h): {effect_size:.3f}")

Minimum Detectable Effect (MDE): 0.007
Effect Size (Cohen's h): 0.020


Cohen's h benchmarks:

0.2 = small effect

0.5 = medium effect

0.8 = large effect

If the effect is tiny, it will require a very large sample size to detect.

*   The effect is translated into Cohen's h
*   Cohen's h quantifies how big the difference between two proportions is, on a standardized scale
*   The same absolute difference (e.g., +2 percentage points) means something different on a baseline of 5% than on a baseline of 50%
*   Cohen's h puts differences on a common scale, so effect sizes can be compared fairly across experiments
*   It indicates practical meaning (vs. just statistical significance)
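
To make the baseline point concrete, here is a small sketch that applies the same +2 percentage point lift at two baselines. The 5% and 50% baselines are illustrative values, and `cohens_h_example` is a hypothetical helper name that uses the same arcsine formula as `cohens_h` above.

```python
from math import asin, sqrt

def cohens_h_example(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * (asin(sqrt(p1)) - asin(sqrt(p2)))

# Same absolute lift (+2 percentage points) at two illustrative baselines
h_low_baseline = cohens_h_example(0.07, 0.05)
h_mid_baseline = cohens_h_example(0.52, 0.50)

print(f"5% -> 7%:   h = {h_low_baseline:.3f}")
print(f"50% -> 52%: h = {h_mid_baseline:.3f}")
```

The lift on the 5% baseline should produce the larger h, which is why the standardized scale is more comparable across experiments than the raw difference.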

##Sample Size

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Sample%20Size.png)

Calculate minimum sample size for each group (cell) for one-sided and two-sided tests:
*   A one-sided test is used when you want to test if one group performs specifically better or worse than the other (a directional hypothesis).
*   A two-sided test is used when you want to test if there is any difference between the groups, regardless of direction — whether one is better or worse.
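
As a rough illustration of how the choice of alternative changes the p-value, the sketch below runs `scipy.stats.fisher_exact` on one table with each option. The counts in `example_table` are illustrative, and row 1 is treated as the group hypothesized to be greater or less.

```python
from scipy.stats import fisher_exact

# Illustrative 2x2 counts: rows are groups, columns are [converted, not converted]
example_table = [[18, 130], [7, 150]]

p_values = {}
for alt in ("greater", "less", "two-sided"):
    # fisher_exact returns (odds ratio, p-value); only the p-value is kept here
    _, p_values[alt] = fisher_exact(example_table, alternative=alt)
    print(f"{alt:>9}: p = {p_values[alt]:.4f}")
```

When the observed difference points in the hypothesized direction, the one-sided p-value is no larger than the two-sided one, which is why a directional hypothesis needs fewer users for the same power.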

In [117]:
# Simulate Power Using Fisher's Exact Test
def simulate_fisher_power(p1, p2, n1, n2, alpha, reps=10000, alternative=alternative, seed=100):
    np.random.seed(seed)
    count = 0
    for _ in range(reps):
        x1 = np.random.binomial(n1, p1)
        x2 = np.random.binomial(n2, p2)
        table = [[x1, n1 - x1], [x2, n2 - x2]]
        p = stats.fisher_exact(table, alternative=alternative)[1]
        if p < alpha:
            count += 1
    return count / reps

In [118]:
# Find Minimum Sample Size to Achieve Desired Power
def find_min_sample_size(p1, p2, alpha, power, max_n=50000, reps=10000, alternative=alternative, step=1000, seed=100):
    np.random.seed(seed)
    for n in range(1000, max_n + 1, step):
        sim_power = simulate_fisher_power(p1, p2, n1=n, n2=n, alpha=alpha, reps=reps, alternative=alternative, seed=seed)
        print(f"n = {n} → power = {sim_power:.3f}")
        if sim_power >= power:
            return n
    return None

In [119]:
# Run Sample Size Estimation
print("\nSearching for minimum required sample size...")
min_n = find_min_sample_size(p1=p_1, p2=p_2, alpha=alpha, power=power, alternative=alternative)

if min_n:
    print(f"\n Minimum sample size per group to achieve {power*100:.0f}% power: {min_n}")

else:
    print("\n Could not achieve desired power within sample size limits.")



Searching for minimum required sample size...
n = 1000 → power = 0.105
n = 2000 → power = 0.144
n = 3000 → power = 0.182
n = 4000 → power = 0.217
n = 5000 → power = 0.240
n = 6000 → power = 0.278
n = 7000 → power = 0.314
n = 8000 → power = 0.336
n = 9000 → power = 0.365
n = 10000 → power = 0.401
n = 11000 → power = 0.425
n = 12000 → power = 0.459
n = 13000 → power = 0.469
n = 14000 → power = 0.506
n = 15000 → power = 0.530
n = 16000 → power = 0.548
n = 17000 → power = 0.573
n = 18000 → power = 0.593
n = 19000 → power = 0.617
n = 20000 → power = 0.637
n = 21000 → power = 0.646
n = 22000 → power = 0.664
n = 23000 → power = 0.681
n = 24000 → power = 0.702
n = 25000 → power = 0.716
n = 26000 → power = 0.741
n = 27000 → power = 0.752
n = 28000 → power = 0.759
n = 29000 → power = 0.773
n = 30000 → power = 0.788
n = 31000 → power = 0.798
n = 32000 → power = 0.806

 Minimum sample size per group to achieve 80% power: 32000
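
As a sanity check on the simulation, the required sample size can also be approximated analytically. The sketch below assumes a one-sided two-sample z-test on Cohen's h, not Fisher's Exact Test, so it is only a cross-check. The names `alpha_ex`, `power_ex`, `p_control`, and `p_treatment` are local stand-ins for the design parameters above.

```python
from math import asin, sqrt, ceil
from scipy.stats import norm

alpha_ex, power_ex = 0.05, 0.80          # same design values as above
p_control, p_treatment = 0.14, 0.147     # baseline and baseline + 5% relative lift

h_design = 2 * (asin(sqrt(p_treatment)) - asin(sqrt(p_control)))

# Normal approximation for two equal groups:
# n per group = 2 * ((z_alpha + z_power) / h)^2
z_alpha = norm.ppf(1 - alpha_ex)   # one-sided critical value
z_power = norm.ppf(power_ex)
n_per_group = ceil(2 * ((z_alpha + z_power) / h_design) ** 2)

print(f"Approximate n per group: {n_per_group}")
```

The approximation should land close to the 31,000 to 32,000 range where the simulated power crosses 80%. The simulation only checks multiples of 1,000, so some gap is expected.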


In [120]:
# Fixed sample sizes
n_1 = 32000  # Sample size for group 1
n_2 = 32000  # Sample size for group 2

# Estimate power at fixed sample size
estimated_power = simulate_fisher_power(p1=p_1, p2=p_2, n1=n_1, n2=n_2,
                                        alpha=alpha, alternative=alternative)

print(f"Power for manual estimate: {estimated_power:.3f}")

Power for manual estimate: 0.806


#Results

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Results%20Dark.png)

##Data

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Data.png)

###Import Data

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Import%20Data.png)

From a dataset:

In [121]:
#df_data=

Manually input:

In [122]:
control_conversions=7
treatment_conversions=18
control_no_conversions=150
treatment_no_conversions=130

In [123]:
print(p1_label) # set above in test design
print(p2_label)

Treatment
Control


##Contingency Table

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Contingency%20Table.png)

In [124]:
# Create contingency table
data = np.array([
    [control_conversions, control_no_conversions],
    [treatment_conversions, treatment_no_conversions]
])

table = pd.DataFrame(data, columns=["Converted", "Not_Converted"], index=["Control", "Treatment"])

print(table)

           Converted  Not_Converted
Control            7            150
Treatment         18            130


In [125]:
if table.index[0] != p1_label:
    table_to_use = table.iloc[[1, 0], :]  # flip row order
else:
    table_to_use = table

print(table_to_use)

           Converted  Not_Converted
Treatment         18            130
Control            7            150


##Conversion Rates:

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Conversion%20Rates.png)

In [126]:
n1 = table_to_use.iloc[0].sum()  # Reference group (row 1)
n2 = table_to_use.iloc[1].sum()  # Comparison group (row 2)

p1 = table_to_use.iloc[0]["Converted"] / n1  # Conversion rate for row 1
p2 = table_to_use.iloc[1]["Converted"] / n2  # Conversion rate for row 2

groups = {'p1': p1_label, 'p2': p2_label}

print(f"p1: {groups['p1']} Conversion Rate: {round(p1 * 100, 2)}%")
print(f"p2: {groups['p2']} Conversion Rate: {round(p2 * 100, 2)}%")

p1: Treatment Conversion Rate: 12.16%
p2: Control Conversion Rate: 4.46%


In [127]:
if alternative == "two-sided":
    alt_text = f"{p1_label} ({p1:.4f}) is different from {p2_label} ({p2:.4f})"
elif alternative == "greater":
    alt_text = f"{p1_label} ({p1:.4f}) is greater than {p2_label} ({p2:.4f})"
elif alternative == "less":
    alt_text = f"{p1_label} ({p1:.4f}) is less than {p2_label} ({p2:.4f})"
else:
    alt_text = "Unknown alternative hypothesis"

print("Result Hypothesis:", alt_text)

Result Hypothesis: Treatment (0.1216) is greater than Control (0.0446)


##Effect Size:

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Effect%20Size.png)

In [128]:
# Absolute difference
abs_diff = abs(p1 - p2)

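# Cohen's h function
def proportion_effectsize(control, treatment):
    return 2 * math.asin(math.sqrt(treatment)) - 2 * math.asin(math.sqrt(control))

# Note: `control` and `treatment` are the design-stage rates from the Parameters cell
# (0.14 and 0.147), so h below is the planned effect size, not the observed one.
h = proportion_effectsize(control, treatment)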

print(f"Absolute difference: {abs_diff:.3f} ({abs_diff * 100:.1f}%)")
print(f"Cohen's h: {h:.3f}")

# Interpret effect size
def interpret_h(h):
    abs_h = abs(h)
    if abs_h < 0.2:
        return "negligible"
    elif abs_h < 0.5:
        return "small"
    elif abs_h < 0.8:
        return "medium"
    else:
        return "large"

print(f"Effect size interpretation: {interpret_h(h)}")

Absolute difference: 0.077 (7.7%)
Cohen's h: 0.020
Effect size interpretation: negligible
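
The Cohen's h printed above comes from the design-stage rates (`control`, `treatment`), which is why it matches the 0.020 from the Test Design section. A minimal sketch of the same formula applied to the observed rates is shown below. The variable names are hypothetical, and the counts are copied from the contingency table.

```python
from math import asin, sqrt

# Observed counts from the contingency table above
treatment_conv, treatment_n = 18, 18 + 130
control_conv, control_n = 7, 7 + 150

p_treatment_obs = treatment_conv / treatment_n
p_control_obs = control_conv / control_n

# Same arcsine formula, applied to observed rather than planned rates
h_observed = 2 * (asin(sqrt(p_treatment_obs)) - asin(sqrt(p_control_obs)))
print(f"Observed Cohen's h: {h_observed:.3f}")
```

On these counts the observed h is noticeably larger than the planned 0.02 and falls in the "small" band (0.2 to 0.5) of the benchmarks listed earlier.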


##Fisher's Exact Test

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Fishers%20Exact%20Test.png)

Use Fisher's Exact Test since a cell (control converted) has < 10 users:

In [129]:
print(table_to_use)

           Converted  Not_Converted
Treatment         18            130
Control            7            150


Check parameters and change if needed:

In [130]:
print(alternative)
print(alpha)
print(power)
# alternative="greater" # Row 1 (reference) of Contingency Table is greater than Row 2
# alpha=0.05
# power=0.8

greater
0.05
0.8


Run test:

In [131]:
odds_ratio, p_value = fisher_exact(table_to_use.values, alternative=alternative)

print(f"Fisher's Exact Test:")
print(f"Odds Ratio: {odds_ratio:.4f}")
print(f"p-value ({alternative}): {p_value:.4f}")

Fisher's Exact Test:
Odds Ratio: 2.9670
p-value (greater): 0.0119
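
For intuition about where this p-value comes from: with the row and column totals held fixed, the count in the top-left cell follows a hypergeometric distribution, and the one-sided "greater" p-value is its upper tail. The sketch below reproduces that tail with `scipy.stats.hypergeom`. The variable names are illustrative.

```python
from scipy.stats import fisher_exact, hypergeom

# Rows: [Treatment, Control]; columns: [Converted, Not_Converted]
observed = [[18, 130], [7, 150]]

total_users = 18 + 130 + 7 + 150     # population size
total_conversions = 18 + 7           # "successes" in the population
treatment_users = 18 + 130           # number of draws (row 1 total)

# P(X >= 18): at least 18 of the conversions fall in the treatment row
p_manual = hypergeom.sf(18 - 1, total_users, total_conversions, treatment_users)

_, p_scipy = fisher_exact(observed, alternative="greater")
print(f"hypergeometric tail: {p_manual:.4f}")
print(f"fisher_exact:        {p_scipy:.4f}")
```

The two values should agree up to floating-point error.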


##P-value

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Pvalue.png)

In [132]:
print(f"p-value: {round(p_value, 3)}")

# Significance message
if p_value < alpha:
    pvalue_message = (
        f"Because the p-value ({p_value:.3f}) is less than alpha ({alpha:.3f}), "
        f"this result is statistically significant at the {(1 - alpha) * 100:.0f}% confidence level."
    )
else:
    pvalue_message = (
        f"Because the p-value ({p_value:.3f}) is greater than or equal to alpha ({alpha:.3f}), "
        f"this result is not statistically significant at the {(1 - alpha) * 100:.0f}% confidence level."
    )

print("\n".join(textwrap.wrap(pvalue_message, width=80)))

p-value: 0.012
Because the p-value (0.012) is less than alpha (0.050), this result is
statistically significant at the 95% confidence level.


##Confidence Interval

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Confidence%20Interval.png)

In [133]:
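# Table2x2 takes the 2x2 counts; its odds ratio compares row 1 (reference) to row 2
tbl = Table2x2(table_to_use.values)
# oddsratio_confint returns both bounds of a two-sided interval at level 1 - alpha;
# for a one-sided test, report only the bound that matters for the chosen direction
lower_ci, upper_ci = tbl.oddsratio_confint(alpha=alpha)
if alternative == "less":
    print("Upper CI:", f"{upper_ci:.2f}")
elif alternative == "greater":
    print("Lower CI:", f"{lower_ci:.2f}")
else:
    print("CI:", f"{lower_ci:.2f}, {upper_ci:.2f}")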

Lower CI: 1.20
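
The bound above can be reproduced by hand, which helps show what kind of interval it is. The sketch assumes `oddsratio_confint` uses a normal (Wald) interval on the log odds ratio, which is my reading of the statsmodels default. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

a, b = 18, 130   # Treatment: converted, not converted
c, d = 7, 150    # Control: converted, not converted

odds_ratio_hat = (a * d) / (b * c)
se_log_or = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Two-sided 95% interval on the log scale, then exponentiated
z_two_sided = norm.ppf(1 - 0.05 / 2)
lower = np.exp(np.log(odds_ratio_hat) - z_two_sided * se_log_or)
upper = np.exp(np.log(odds_ratio_hat) + z_two_sided * se_log_or)

# One-sided 95% lower bound, the natural companion to a "greater" test at alpha = 0.05
z_one_sided = norm.ppf(1 - 0.05)
lower_one_sided = np.exp(np.log(odds_ratio_hat) - z_one_sided * se_log_or)

print(f"Two-sided 95% CI: [{lower:.3f}, {upper:.3f}]")
print(f"One-sided 95% lower bound: {lower_one_sided:.3f}")
```

The lower bound reported above is the lower end of the two-sided interval, so it is a slightly conservative bound for a one-sided test. The one-sided bound sits a little higher.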


###"Less" Hypothesis:

For a "less" alternative (testing whether row 1 of the contingency table has lower odds of conversion than row 2), look at the upper bound of the CI for the odds ratio. It should be < 1 for significance.

*   If the upper bound < 1: the odds in row 1 are lower than the odds in row 2 at the chosen confidence level. This supports the "less" hypothesis, so the result is significant.
*   If the upper bound ≥ 1: the odds in row 1 may be equal to or higher than the odds in row 2. This does not support the "less" hypothesis, so the result is not significant.

###"Greater" Hypothesis:

For a "greater" alternative (testing whether row 1 of the contingency table has higher odds of conversion than row 2), look at the lower bound of the CI for the odds ratio. It should be > 1 for significance.

*   If the lower bound > 1: at the chosen confidence level, the odds of conversion in row 1 are at least this much higher than in row 2. This supports the "greater" hypothesis, so the result is significant.
*   If the lower bound ≤ 1: you cannot confidently say that the odds in row 1 are higher than the odds in row 2. This does not support the "greater" hypothesis, so the result is not significant.
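
The two rules above can be condensed into one small function. This is a sketch only. `ci_supports_alternative` is a hypothetical helper, not part of scipy or statsmodels, and the example bounds are the two-sided interval this notebook reports for the odds ratio.

```python
def ci_supports_alternative(lower_bound, upper_bound, alternative):
    """Hypothetical helper: does the odds-ratio CI exclude 1 in the tested direction?"""
    if alternative == "greater":
        return lower_bound > 1
    if alternative == "less":
        return upper_bound < 1
    if alternative == "two-sided":
        return lower_bound > 1 or upper_bound < 1
    raise ValueError(f"Unknown alternative: {alternative!r}")

# Example with the interval reported in this notebook's results
supports_greater = ci_supports_alternative(1.201, 7.328, "greater")
supports_less = ci_supports_alternative(1.201, 7.328, "less")
print(supports_greater, supports_less)
```

Checking only the bound that matches the hypothesis direction avoids calling a one-sided test significant when the whole interval sits on the wrong side of 1.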

###Interpretation:

In [134]:
conf_level = (1 - alpha) * 100

# The odds ratio is odds(row 1) / odds(row 2), so p₁ is p1_label and p₂ is p2_label
g1, g2 = p1_label.lower(), p2_label.lower()

if alternative == "less":
    if upper_ci < 1:
        percent_diff = (1 - upper_ci) * 100
        sentence = (
            f"{conf_level:.0f}% CI Upper Bound for Odds Ratio: {upper_ci:.3f}.\n\n"
            f"With {conf_level:.0f}% confidence, the {g1} group (p₁) has lower odds of conversion than the {g2} group (p₂).\n"
            f"The odds of conversion in the {g1} group are at least {percent_diff:.1f}% lower than in the {g2} group.\n"
            f"This supports the hypothesis that {g1} performs worse than {g2}."
        )
    else:
        sentence = (
            f"{conf_level:.0f}% CI Upper Bound for Odds Ratio: {upper_ci:.3f}.\n\n"
            f"With {conf_level:.0f}% confidence, we cannot rule out that the {g1} group is not worse than the {g2} group.\n"
            f"This does not support the hypothesis that {g1} performs worse."
        )

elif alternative == "greater":
    if lower_ci > 1:
        percent_diff = (lower_ci - 1) * 100
        sentence = (
            f"{conf_level:.0f}% CI Lower Bound for Odds Ratio: {lower_ci:.3f}.\n\n"
            f"With {conf_level:.0f}% confidence, the {g1} group (p₁) has higher odds of conversion than the {g2} group (p₂).\n"
            f"The odds of conversion in the {g1} group are at least {percent_diff:.1f}% higher than in the {g2} group.\n"
            f"This supports the hypothesis that {g1} is better than {g2}."
        )
    else:
        sentence = (
            f"{conf_level:.0f}% CI Lower Bound for Odds Ratio: {lower_ci:.3f}.\n\n"
            f"With {conf_level:.0f}% confidence, we cannot rule out that the {g1} group is not better than the {g2} group.\n"
            f"This does not support the hypothesis that {g1} is better."
        )

else:  # "two-sided"
    if lower_ci > 1:
        percent_diff = (lower_ci - 1) * 100
        sentence = (
            f"{conf_level:.0f}% CI: [{lower_ci:.3f}, {upper_ci:.3f}].\n\n"
            f"With {conf_level:.0f}% confidence, the {g1} group (p₁) has higher odds of conversion than the {g2} group (p₂).\n"
            f"The odds of conversion in the {g1} group are at least {percent_diff:.1f}% higher than in the {g2} group.\n"
            f"This supports a significant difference favoring {g1}."
        )
    elif upper_ci < 1:
        percent_diff = (1 - upper_ci) * 100
        sentence = (
            f"{conf_level:.0f}% CI: [{lower_ci:.3f}, {upper_ci:.3f}].\n\n"
            f"With {conf_level:.0f}% confidence, the {g1} group (p₁) has lower odds of conversion than the {g2} group (p₂).\n"
            f"The odds of conversion in the {g1} group are at least {percent_diff:.1f}% lower than in the {g2} group.\n"
            f"This supports a significant difference favoring {g2}."
        )
    else:
        sentence = (
            f"{conf_level:.0f}% CI: [{lower_ci:.3f}, {upper_ci:.3f}].\n\n"
            f"With {conf_level:.0f}% confidence, we cannot rule out no difference in odds between {g1} and {g2}.\n"
            "This does not support a statistically significant difference."
        )

print(sentence)
print()

# Significance check: the bound relevant to the alternative must exclude 1
if alternative == "greater":
    significant = lower_ci > 1
elif alternative == "less":
    significant = upper_ci < 1
else:
    significant = (lower_ci > 1) or (upper_ci < 1)

if significant:
    message = (
        f"Because the interval does not include 1, this result is statistically significant "
        f"at the {conf_level:.0f}% confidence level."
    )
else:
    message = (
        f"Because the interval includes 1, this result is not statistically significant "
        f"at the {conf_level:.0f}% confidence level."
    )

print(message)


95% CI Lower Bound for Odds Ratio: 1.201.

With 95% confidence, the treatment group (p₁) has higher odds of conversion than the control group (p₂).
The odds of conversion in the treatment group are at least 20.1% higher than in the control group.
This supports the hypothesis that treatment is better than control.

Because the interval does not include 1, this result is statistically significant at the 95% confidence level.


##Statistical Power

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Statistical%20Power.png)

In [135]:
print(table_to_use)

           Converted  Not_Converted
Treatment         18            130
Control            7            150


In [136]:
import numpy as np
from scipy.stats import fisher_exact

def power_fisher_test(n1, n2, p1, p2, alpha=0.05, alternative='greater', nsim=10000, seed=100):
    np.random.seed(seed)
    rejections = 0

    for _ in range(nsim):
        # Simulate successes/failures for each group
        group1 = np.random.binomial(n1, p1)
        group2 = np.random.binomial(n2, p2)

        # [[success_group1, failure_group1], [success_group2, failure_group2]]
        table = np.array([
            [group1, n1 - group1],
            [group2, n2 - group2]
        ])

        # Fisher's exact test
        _, p_value = fisher_exact(table, alternative=alternative)

        if p_value < alpha:
            rejections += 1

    # Estimated power = proportion of rejections
    power = rejections / nsim
    return power

result_power = power_fisher_test(n1, n2, p1, p2, alpha=alpha, alternative=alternative, nsim=10000)
print(f"Estimated power (simulation): {result_power:.4f}")


Estimated power (simulation): 0.7443
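
For comparison, the same post hoc power can be approximated without simulation. This sketch assumes a one-sided z-test on Cohen's h computed from the observed rates, which is a different test from Fisher's Exact Test. The variable names are illustrative.

```python
from math import asin, sqrt
from scipy.stats import norm

n_treat, n_ctrl = 148, 157                 # observed group sizes
p_treat, p_ctrl = 18 / 148, 7 / 157        # observed conversion rates
alpha_one_sided = 0.05

h_obs = 2 * (asin(sqrt(p_treat)) - asin(sqrt(p_ctrl)))

# For unequal groups, the z statistic scales with sqrt(n1 * n2 / (n1 + n2))
noncentrality = h_obs * sqrt(n_treat * n_ctrl / (n_treat + n_ctrl))
approx_power = norm.cdf(noncentrality - norm.ppf(1 - alpha_one_sided))

print(f"Normal-approximation power: {approx_power:.3f}")
```

The approximation tends to come out somewhat higher than the simulated Fisher power. That is consistent with Fisher's Exact Test being conservative on small samples, so the simulated value is the safer one to report.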


In [137]:
power_pct = f"{result_power * 100:.1f}"
print(f"Result Power: {power_pct}%\n")

if result_power < power:  # compare against the target power set in Test Design
    power_sentence = (
        f"Our test was underpowered (e.g., only ~{power_pct}% power), meaning there was \n"
        "a higher chance we failed to detect a true difference due to limited sample size. \n"
        "As a result, while the effect appears meaningful, we cannot be statistically \n"
        "confident in it without further data and cannot give a confident \n"
        "estimate in incremental revenue from the test."
    )
else:
    power_sentence = (
        f"Our test was adequately powered (e.g., ~{power_pct}% power), meaning we had a \n"
        "strong chance of detecting a true difference if one existed."
    )

print(power_sentence)


Result Power: 74.4%

Our test was underpowered (e.g., only ~74.4% power), meaning there was 
a higher chance we failed to detect a true difference due to limited sample size. 
As a result, while the effect appears meaningful, we cannot be statistically 
confident in it without further data and cannot give a confident 
estimate in incremental revenue from the test.


#Results Summary

![Alt text](https://github.com/lindsayalexandra14/ds_portfolio/raw/main/2_images/templates/notebook/headers/lilac/Results%20Summary.png)

In [138]:
print(table_to_use)
print()
print(f"p1: {groups['p1']} Conversion Rate: {round(p1 * 100, 2)}%")
print(f"p2: {groups['p2']} Conversion Rate: {round(p2 * 100, 2)}%")
print()
print("Result Hypothesis:", alt_text)
print()
print(f"Absolute difference: {abs_diff:.3f} ({abs_diff * 100:.1f}%)")
print(f"Cohen's h: {h:.3f}")
print(f"Effect size interpretation: {interpret_h(h)}")
print()
print(f"Fisher's Exact Test:")
print(f"Odds Ratio: {odds_ratio:.4f}")
print()
print(f"p-value: {round(p_value, 3)}")
print("\n".join(textwrap.wrap(pvalue_message, width=80)))
print()
print(sentence)
print()
print(f"95% CI: [{lower_ci:.3f}, {upper_ci:.3f}]")
print()
print(message)
print()
print(power_sentence)

           Converted  Not_Converted
Treatment         18            130
Control            7            150

p1: Treatment Conversion Rate: 12.16%
p2: Control Conversion Rate: 4.46%

Result Hypothesis: Treatment (0.1216) is greater than Control (0.0446)

Absolute difference: 0.077 (7.7%)
Cohen's h: 0.020
Effect size interpretation: negligible

Fisher's Exact Test:
Odds Ratio: 2.9670

p-value: 0.012
Because the p-value (0.012) is less than alpha (0.050), this result is
statistically significant at the 95% confidence level.

95% CI Lower Bound for Odds Ratio: 1.201.

With 95% confidence, the treatment group (p₁) has higher odds of conversion than the control group (p₂).
The odds of conversion in the treatment group are at least 20.1% higher than in the control group.
This supports the hypothesis that treatment is better than control.

95% CI: [1.201, 7.328]

Because the interval does not include 1, this result is statistically significant at the 95% confidence level.

Our test was underp