# Estimating Minimum Sample Size
These are my notes on estimating minimum sample sizes. While researching the topic, I found many articles that use different equations and definitions for the minimum sample size, which was frustrating. The two references below were the easiest for me to follow.
___

## References
* https://www.surveymonkey.com/mp/sample-size-calculator/
* http://www.calculator.net/sample-size-calculator.html?type=1&cl=95&ci=5&pp=50&ps=&x=96&y=25

## Sample Size for Unlimited Populations
Let's start with unlimited populations. Large datasets are ubiquitous today, and in many applications the population will exceed a million. The equation is also simpler, and it gives a conservative estimate: the sample size it returns is never smaller than what a finite population would require.

$$ n = \frac{z^2 \times \hat{p}(1-\hat{p})}{\epsilon^2} $$

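* $n$: minimum sample size
* $z$: z-score for the chosen confidence level (see the table in the Appendix below)
* $\hat{p}$: estimated proportion of the population that has the attribute being measured
* $\epsilon$: margin of error, expressed as a proportion (for example, 0.05 for 5%)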

You will typically be asked for the minimum sample size for a test given a margin of error of 5% and a confidence level of 95%. You also need a value for the population proportion, $\hat{p}$. The term $\hat{p}(1-\hat{p})$ is largest at $\hat{p} = 0.5$, so this value gives the most conservative (largest) sample size and is an easy choice to start with.
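To see why $\hat{p} = 0.5$ is the conservative choice, the short sketch below evaluates $\hat{p}(1-\hat{p})$ on a coarse grid. The grid and the variable names are illustrative assumptions, not part of the original notes.

```python
# Evaluate p * (1 - p) on a coarse grid to see where the term peaks.
# The grid (0.1 to 0.9 in steps of 0.1) is an arbitrary illustrative choice.
candidates = [i / 10 for i in range(1, 10)]
variance_terms = {p: p * (1 - p) for p in candidates}

# The proportion with the largest p * (1 - p) needs the largest sample.
worst_case_p = max(variance_terms, key=variance_terms.get)

for p, term in variance_terms.items():
    print(f"p = {p:.1f} -> p(1 - p) = {term:.2f}")
print("largest at p =", worst_case_p)
```

Any other proportion shrinks the numerator of the equation, so using 0.5 when the true proportion is unknown can only overestimate the required sample size.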


In [1]:
z_score = {0.70: 1.04, 
           0.75: 1.15, 
           0.80: 1.28,
           0.85: 1.44,
           0.92: 1.75,
           0.95: 1.96,
           0.96: 2.05,
           0.98: 2.33,
           0.99: 2.58,
           0.999: 3.29,
           0.9999: 3.89,
           0.99999: 4.42}

In [5]:
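def min_sample_size(CI=0.95, MOE=0.05, p=0.5):
    # CI: confidence level, MOE: margin of error, p: estimated proportion
    # (p = 0.5 maximizes p * (1 - p), giving the most conservative size)
    z = z_score[CI]
    return z**2 * p * (1 - p) / MOE**2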

In [7]:
min_sample_size()

384.1599999999999
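
A sample size has to be a whole number, so the result above would be rounded up to 385. The sketch below does that with `math.ceil` and shows how quickly the requirement grows as the margin of error tightens. The helper name `required_respondents` is hypothetical, and the z-score of 1.96 assumes a 95% confidence level.

```python
import math


def required_respondents(z, moe, p=0.5):
    """Hypothetical helper: unlimited-population sample size, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / moe**2)


# Compare a few margins of error at a 95% confidence level (z = 1.96).
for moe in (0.10, 0.05, 0.03):
    print(f"MOE = {moe:.0%}: n >= {required_respondents(1.96, moe)}")
```

Because $\epsilon$ is squared in the denominator, halving the margin of error roughly quadruples the required sample size.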

## Sample Size for Finite Populations
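The notes leave this section empty. As a hedged sketch, one commonly used approach is the finite population correction, which takes the unlimited-population estimate $n_0$ and shrinks it for a population of size $N$:

$$ n = \frac{n_0}{1 + \frac{n_0 - 1}{N}} $$

This formula is an assumption brought in for illustration, not something taken from the notes, and some calculators use slightly different variants. The function name and defaults below are hypothetical.

```python
import math


def finite_population_sample_size(population, z=1.96, moe=0.05, p=0.5):
    """Hypothetical helper applying the finite population correction."""
    # Unlimited-population estimate, as in the previous section.
    n0 = z**2 * p * (1 - p) / moe**2
    # Shrink the estimate for a population of the given size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for population in (100, 1_000, 1_000_000):
    print(population, finite_population_sample_size(population))
```

For very large populations the correction is negligible and the result matches the unlimited-population estimate, which is why that simpler equation is a reasonable default.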

## Appendix

### Confidence Level and Z-score

Confidence Level | z-score (±)
--- | ---
0.70 | 1.04
0.75 | 1.15
0.80 | 1.28
0.85 | 1.44
0.92 | 1.75
0.95 | 1.96
0.96 | 2.05
0.98 | 2.33
0.99 | 2.58
0.999 | 3.29
0.9999 | 3.89
0.99999 | 4.42
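
If a confidence level is missing from the table, the z-score can be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library (available since Python 3.8). The function name `z_for_confidence` is hypothetical, and it assumes a two-sided interval, which is what the ± in the table header implies.

```python
from statistics import NormalDist


def z_for_confidence(confidence_level):
    """Hypothetical helper: two-sided z-score for a confidence level."""
    # Split the excluded probability evenly between the two tails.
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.80, 0.95, 0.98, 0.99):
    print(f"{level}: {z_for_confidence(level):.2f}")
```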