Optimal Data Collection
What's the least amount of data you need to collect to estimate the population mean with a particular standard error? For the simplest case---estimating the mean of a binomial variable using simple random sampling, a conservative estimate of the variance (
Why?
In a realistic example, optimal allocation reaches the same standard error with about 6.5% fewer observations than simple random sampling (1,024 vs. 1,095; see the code block below).
Assuming two groups:
## Benefit of Using Optimal Allocation Rules
## wa = .8
## vara = .25; pa = .5
## varb = .16; pb = .8
## SRS: pop_mean of .8*.5 + .2*.8 = .56
# sqrt(p*(1 - p)/n) = .015
# n = p*(1 - p)/.015^2 = 1095
# optimal_n_plus_allocation(.8, .25, .16, .015)
#    n   na   nb
# 1024  853  171
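The arithmetic in the comments above can be reproduced with a short sketch. It is written in Python for illustration and is not the authors' script: the function names `srs_n` and `neyman_n_and_allocation` are hypothetical. It assumes no finite population correction and uses the standard Neyman allocation result, $n = (\sum_h w_h \sigma_h)^2 / \sigma_{\bar{x}}^2$ with $n_h \propto w_h \sigma_h$.

```python
import math


def srs_n(p, se):
    """Sample size to estimate a proportion p with standard error se under SRS."""
    return p * (1 - p) / se ** 2


def neyman_n_and_allocation(weights, variances, se):
    """Total n and per-stratum sizes that reach standard error se (no FPC).

    Hypothetical helper, assuming Neyman allocation: n_h proportional to w_h * sd_h.
    """
    w_sd = [w * math.sqrt(v) for w, v in zip(weights, variances)]
    total = sum(w_sd)
    n = total ** 2 / se ** 2
    return n, [n * x / total for x in w_sd]


# Values taken from the example above: wa = .8, vara = .25, varb = .16, s.e. = .015
weights = [0.8, 0.2]
variances = [0.25, 0.16]
pop_mean = 0.8 * 0.5 + 0.2 * 0.8  # .56

n_srs = srs_n(pop_mean, 0.015)
n_opt, alloc = neyman_n_and_allocation(weights, variances, 0.015)

print(round(n_srs), round(n_opt), [round(a) for a in alloc])  # 1095 1024 [853, 171]
print(f"{1 - n_opt / n_srs:.1%}")  # 6.5%
```

The saving comes entirely from sampling the higher-variance group more heavily than its population share; if both groups had the same variance, the allocation would reduce to proportional sampling.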
Manuscript and Scripts
- What's the least amount of data we need to collect (and how) to estimate the mean with a particular s.e. when we know the strata and strata variances?
  - Script has three functions for the 2-group case:
    - What is the optimal size of $n_a$ and $n_b$ when the variances, $w_a$, and $n$ are known?
    - What is the optimal size of $n$, $n_a$, and $n_b$ when $\sigma_{\bar{x}}, \sigma_a^2, \sigma_b^2, w_a$ are known (using constrained optimization)?
    - What is the optimal size of $n$, $n_a$, and $n_b$ when $\sigma_{\bar{x}}, \sigma_a^2, \sigma_b^2, w_a$ are known (using the analytical formula)?
- What's the next best data point to collect when you know the strata and strata variances?
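Two of the questions above (allocation for a fixed $n$, and the next best data point) can be sketched as follows. This is an illustrative Python sketch, not the authors' scripts: the names `stratified_variance`, `allocate_fixed_n`, and `next_best_stratum` are hypothetical. It assumes the variance of the stratified mean is $\sum_h w_h^2 \sigma_h^2 / n_h$ (no finite population correction), and it assumes a greedy rule for the next point: pick the stratum where one more observation lowers that variance the most.

```python
import math


def stratified_variance(weights, variances, sizes):
    """Variance of the stratified mean: sum of w_h^2 * var_h / n_h (no FPC)."""
    return sum(w ** 2 * v / n for w, v, n in zip(weights, variances, sizes))


def allocate_fixed_n(weights, variances, n):
    """Split a fixed n across strata in proportion to w_h * sd_h (Neyman allocation)."""
    w_sd = [w * math.sqrt(v) for w, v in zip(weights, variances)]
    total = sum(w_sd)
    return [n * x / total for x in w_sd]


def next_best_stratum(weights, variances, sizes):
    """Index of the stratum where one extra observation cuts the variance most.

    Assumes every current stratum size is at least 1.
    """
    def gain(h):
        w, v, n = weights[h], variances[h], sizes[h]
        return w ** 2 * v * (1 / n - 1 / (n + 1))

    return max(range(len(sizes)), key=gain)


weights = [0.8, 0.2]
variances = [0.25, 0.16]

alloc = allocate_fixed_n(weights, variances, 1024)
se = math.sqrt(stratified_variance(weights, variances, alloc))
print([round(a) for a in alloc], round(se, 3))  # [853, 171] 0.015

# With group b under-sampled relative to the optimal split, the next point goes to b.
print(next_best_stratum(weights, variances, [900, 100]))  # 1
```

The greedy rule is only one way to read "next best data point"; it ignores differences in per-observation cost across strata.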
Authors
Ken Cor and Gaurav Sood