# Sample Variance and Variance of Sample Means

Two expressions come up when inferring the population variance $\sigma_p^2$ from samples drawn from it. In one the denominator is $n-1$; in the other it is $n$, as in $\sigma_p^2/n$. That alone can be confusing.

Both are correct, both have to do with drawing samples, and both are frequently written as $\sigma_s^2$. The difference is that they come from two very different calculations.

## Sample Means
A collection of $N$ samples is drawn from a population, each of size $n$. When the mean of each of the $N$ samples is calculated, the $N$ sample means turn out to have a distribution whose mean equals the population mean and whose variance equals $\sigma_p^2/n$.

In other words, if you have many samples of the same size $n$, you can infer the population variance from the spread of their means: multiply the variance of the sample means by $n$.
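Where the factor $1/n$ comes from: a short derivation, assuming the $n$ members of each sample are drawn independently from a population with variance $\sigma_p^2$ (so the variances of the individual draws simply add):

$$
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i)
  = \frac{1}{n^2}\, n\,\sigma_p^2
  = \frac{\sigma_p^2}{n}
$$

Rearranged, $\sigma_p^2 = n \operatorname{Var}(\bar{x})$, which is the quantity `var_2` in the code below estimates.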

## Sample Variance
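A single sample of size $n$ is drawn from a population. Summing the squared deviations of its $n$ members from the sample mean $\bar{x}$ and dividing by $n-1$ (rather than $n$) gives an unbiased estimate of the population variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \mathbb{E}[s^2] = \sigma_p^2.$$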
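Why $n-1$? Because deviations are measured from $\bar{x}$ rather than from the true mean, $\mathbb{E}\left[\sum_i (x_i - \bar{x})^2\right] = (n-1)\sigma_p^2$, so dividing by $n$ underestimates $\sigma_p^2$ by a factor of $(n-1)/n$ on average. A quick simulation sketch of that bias, assuming a standard normal population; the seed, sample size, and number of trials are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 5            # size of each sample
trials = 200000  # number of independent samples drawn
sigma2 = 1.0     # true population variance (standard normal population)

# One row per sample, n members per row
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

# Divide the sum of squared deviations by n (ddof=0) or by n - 1 (ddof=1),
# then average each estimator over all samples
biased = samples.var(axis=1, ddof=0).mean()
unbiased = samples.var(axis=1, ddof=1).mean()

print("divide by n:     %.3f (expected about %.3f)" % (biased, sigma2 * (n - 1) / n))
print("divide by n - 1: %.3f (expected about %.3f)" % (unbiased, sigma2))
```

With $n = 5$ the divide-by-$n$ average should sit near $0.8$, while the $n-1$ version sits near the true value of $1$.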

```python
import numpy as np

def var_1(sample):
    """
    Estimate the population variance from the spread *within* a single
    sample, using the unbiased (n - 1) denominator.
    """
    mean = np.mean(sample)
    n = len(sample)
    population_variance = np.sum((np.asarray(sample) - mean)**2) / (n - 1)

    return population_variance

def var_2(samples):
    """
    Estimate the population variance from the spread of the means of a
    collection of samples of equal size n, using Var(means) = sigma_p^2 / n.
    """
    n = len(samples[0])  # every sample is assumed to have the same size
    means = [np.mean(sample) for sample in samples]
    # ddof=1: unbiased variance of the N sample means, scaled back up by n
    population_variance = n * np.var(means, ddof=1)

    return population_variance


n = 5        # size of each sample
N = 10000    # number of samples
var = 1      # true population variance
samples = [np.sqrt(var) * np.random.randn(n) for i in range(N)]

# via intra-sample variance: one estimate per sample, then average them
estimates = [var_1(sample) for sample in samples]
pvar1 = np.mean(estimates)
pvar1_std = np.std(estimates)  # spread of the individual estimates

# via sample means
pvar2 = var_2(samples)

print("True Population Variance:\n%1.3f\n" % var)

print("Estimated via intra-sample variance:\n %1.3f (+/- %1.3f)\n" % (pvar1, pvar1_std))

print("Estimated via variance of sample means:\n %1.3f\n" % pvar2)
```

```
True Population Variance:
1.000

Estimated via intra-sample variance:
 1.012 (+/- 0.714)

Estimated via variance of sample means:
 0.979
```
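The $\pm 0.714$ spread of the individual intra-sample estimates is close to what theory predicts. For a normal population (as in the simulation above), $(n-1)s^2/\sigma_p^2$ follows a $\chi^2_{n-1}$ distribution, so $\operatorname{Var}(s^2) = 2\sigma_p^4/(n-1)$. The same formula, applied to the variance of the $N$ sample means and scaled by $n$, gives the spread of the second estimate: $2\sigma_p^4/(N-1)$. A small sketch of both checks; `sd_of_sample_variance` is a hypothetical helper, and the result only holds under the normality assumption:

```python
import math

def sd_of_sample_variance(sigma2, m):
    """
    Hypothetical helper: standard deviation of the unbiased sample variance
    of m independent draws from a NORMAL distribution with variance sigma2.
    Uses Var(s^2) = 2 * sigma2**2 / (m - 1); not valid for non-normal data.
    """
    return math.sqrt(2.0 / (m - 1)) * sigma2

# Intra-sample estimate: m = n = 5 draws with variance sigma_p^2 = 1
print("%.3f" % sd_of_sample_variance(1.0, 5))      # -> 0.707

# Sample-means estimate: m = N = 10000 means with variance sigma_p^2 / n;
# multiplying by n cancels the 1/n, leaving sqrt(2 / (N - 1)) * sigma_p^2
print("%.3f" % sd_of_sample_variance(1.0, 10000))  # -> 0.014
```

So a single intra-sample estimate is very noisy at $n = 5$, while the estimate built from $10000$ sample means typically lands within a few hundredths of the true value.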

