[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leaguilar/QuantUX/blob/main/notebooks/sample_size_calculation.ipynb)

[Survey Monkey Sample Size Calculator](https://www.surveymonkey.com/mp/sample-size-calculator/)

$$n = \frac{\frac{Z^2 \cdot P(1-P)}{e^2}}{1 + \frac{Z^2 \cdot P(1-P)}{e^2 \cdot N}}$$

Where:
- $n$: The Sample Size (what you are calculating).
- $N$: The Population Size (the total number of people you are trying to study).
- $Z$: The Z-score associated with your desired Confidence Level.
- $P$: The Population Proportion (the expected response distribution). It is typical to assume $P = 0.5$ (or 50%) when it is unknown, as this maximizes $P(1-P)$ and so yields the largest, most conservative sample size.
- $e$: The Margin of Error (expressed as a decimal, e.g., 5% becomes 0.05).
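
One way to read the formula is as a two-step calculation: first the sample size for an effectively infinite population, then a finite population correction. The intermediate symbol $n_0$ and the numbers used here ($Z = 1.96$, $P = 0.5$, $e = 0.05$, $N = 10{,}000$) are illustrative assumptions, not values taken from the calculator.

$$n_0 = \frac{Z^2 \cdot P(1-P)}{e^2} = \frac{1.96^2 \cdot 0.5 \cdot 0.5}{0.05^2} = 384.16$$

$$n = \frac{n_0}{1 + \frac{n_0}{N}} = \frac{384.16}{1 + \frac{384.16}{10000}} \approx 369.95$$

Rounding up gives 370 respondents under these assumptions.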

In [2]:
import math


In [8]:
# N: The total number of people in the target group.
population_size = 500000 
# MoE: The desired margin of error as a percentage (e.g., 5.0).
margin_of_error_percent = 5.0
# CL: The desired confidence level (e.g., 95).
confidence_level_percent = 99 
# P: The assumed proportion of the population. Default to 0.5 for maximum sample size.
population_proportion = 0.5 

In [9]:
z_score_map = {
    80: 1.28,
    85: 1.44,
    90: 1.65,
    95: 1.96,
    99: 2.58,
}
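
The table above hard-codes rounded Z-scores. As a rough alternative, the two-sided critical value can be computed from the standard normal distribution with Python's `statistics.NormalDist`. The helper name `z_from_confidence` is hypothetical and is not part of the original notebook; this is a sketch, not a replacement for the lookup used in the cells that follow.

```python
from statistics import NormalDist


def z_from_confidence(confidence_level_percent: float) -> float:
    """Two-sided critical value of the standard normal distribution.

    Hypothetical helper: for a 95% confidence level this returns the
    97.5th percentile of N(0, 1), roughly 1.96.
    """
    if not 0 < confidence_level_percent < 100:
        raise ValueError("confidence level must be strictly between 0 and 100")
    # alpha is the total probability left in both tails
    alpha = 1 - confidence_level_percent / 100.0
    return NormalDist().inv_cdf(1 - alpha / 2)


# Compare against the rounded values in the lookup table
for level in (80, 85, 90, 95, 99):
    print(f"{level}%: Z = {z_from_confidence(level):.3f}")
```

The computed values are slightly more precise than the two-decimal table entries (for example, the 90% value is closer to 1.645 than to 1.65), so results based on them may differ from the notebook's output by a respondent or so.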

In [10]:
if confidence_level_percent not in z_score_map:
    print(f"Error: Confidence level {confidence_level_percent}% is not supported in the Z-score map.")
    Z = 0
else:
    Z = z_score_map[confidence_level_percent]
    
P = population_proportion
# Convert margin of error from percentage to decimal (e.g., 5% -> 0.05)
e = margin_of_error_percent / 100.0

In [11]:
# Basic input validation (P must lie strictly between 0 and 1)
if population_size <= 0 or e <= 0 or e >= 1 or Z == 0 or not 0 < P < 1:
    required_sample_size = 0
    print("Cannot calculate sample size due to invalid inputs.")
else:
    # A. Calculate the sample size for an infinite population (n_0)
    # Formula: n_0 = (Z^2 * P * (1-P)) / e^2
    n_0_numerator = Z**2 * P * (1 - P)
    n_0_denominator = e**2
    n_0 = n_0_numerator / n_0_denominator

    # B. Apply the Finite Population Correction (FPC)
    # Formula: n = n_0 / (1 + (n_0 / N))
    fpc_denominator = 1 + (n_0 / population_size)
    sample_size = n_0 / fpc_denominator

    # C. The sample size must always be a whole number, so we round up.
    required_sample_size = math.ceil(sample_size)
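
The steps A to C can also be packaged as a single reusable function. This is a minimal sketch under the same formula; the function name `compute_sample_size` and its parameter names are hypothetical, and it raises `ValueError` on bad inputs instead of returning 0 as the cell above does.

```python
import math


def compute_sample_size(population_size, margin_of_error, z_score, proportion=0.5):
    """Minimum sample size with finite population correction (hypothetical helper).

    margin_of_error is a decimal (0.05 for 5%); proportion is the assumed P.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a decimal between 0 and 1")
    if z_score <= 0:
        raise ValueError("z_score must be positive")
    if not 0 < proportion < 1:
        raise ValueError("proportion must be between 0 and 1")

    # A. Sample size for an infinite population
    n_0 = (z_score ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # B. Finite population correction
    n = n_0 / (1 + n_0 / population_size)
    # C. Round up to a whole number of respondents
    return math.ceil(n)


# Same inputs as the notebook: N = 500000, e = 5%, Z = 2.58, P = 0.5
print(compute_sample_size(500000, 0.05, 2.58))
```

With the notebook's inputs this should reproduce the minimum required sample size printed in the output further down.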


In [12]:
print("--- Sample Size Calculation Inputs ---")
print(f"Population Size (N): {population_size}")
print(f"Confidence Level: {confidence_level_percent}% (Z-score: {Z})")
print(f"Margin of Error (e): {margin_of_error_percent}% ({e})")
print(f"Assumed Population Proportion (P): {P}")
print("--------------------------------------")
if required_sample_size > 0:
    print(f"Sample size for infinite population (n0): {n_0:.2f}")
    print(f"Final calculated sample size (n): {sample_size:.2f}")
    print(f"\nMinimum Required Sample Size: {required_sample_size}")

--- Sample Size Calculation Inputs ---
Population Size (N): 500000
Confidence Level: 99% (Z-score: 2.58)
Margin of Error (e): 5.0% (0.05)
Assumed Population Proportion (P): 0.5
--------------------------------------
Sample size for infinite population (n0): 665.64
Final calculated sample size (n): 664.76

Minimum Required Sample Size: 665
