## Because the population mean (a parameter) is almost always unknown to us, we draw a sample and use its mean (a statistic) to estimate the population parameter.

## We are usually told to divide by n-1 instead of n when computing the sample variance. Dividing by n gives a biased estimate of the population variance, and dividing by n-1 removes that bias. This is called Bessel's correction.

## To illustrate this, I have written a small program which shows that, in 13 out of 20 samples, dividing by n-1 instead of n gives a variance estimate closer to the population variance.

### The reason behind dividing by n-1 

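### The divide-by-n variance of a sample equals half the average squared difference between two items drawn (with replacement) from that sample. When we draw two items with replacement from a sample of size n, there is a 1/n probability that the same item is drawn twice. This means that, regardless of the population's distribution, there is a 1/n chance of observing a squared difference of 0.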

### Intuitively, this 1/n chance of a zero squared difference makes the naive formula too small on average, so we correct it by dividing by (1 - 1/n), or equivalently, multiplying by n/(n-1). When we multiply the naive formula by this factor, the infamous n-1 appears in the denominator.
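
The pairwise argument above can be checked numerically. The sketch below is illustrative only: the seed, the sample values and the variable names are assumptions, not part of the original notebook. It builds every ordered pair of items from one sample and compares the results with `np.var`.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed for illustration
sample = rng.integers(0, 200, size=20).astype(float)
n = len(sample)

# Squared difference for every ordered pair (i, j), including i == j.
diffs_sq = (sample[:, None] - sample[None, :]) ** 2

# Half the average over all n*n pairs is the divide-by-n variance.
naive = diffs_sq.sum() / (2 * n * n)

# The n pairs with i == j contribute 0. Averaging over the remaining
# n*(n-1) pairs instead gives the divide-by-(n-1) variance.
corrected = diffs_sq.sum() / (2 * n * (n - 1))

print(np.isclose(naive, np.var(sample)))              # matches ddof=0
print(np.isclose(corrected, np.var(sample, ddof=1)))  # matches ddof=1
print(np.isclose(corrected, naive * n / (n - 1)))     # the n/(n-1) factor
```

All three checks should print `True`, since the identities hold for any sample, not just this one.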

In [1]:
import numpy as np

In [65]:
np.random.seed(42)
population = np.random.randint(200, size=300)

In [75]:
population_mean = np.mean(population)
population_variance = np.var(population)
print(population_variance,population_mean)

3309.6045666666664 102.89


In [67]:
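def mean(data):
    # Arithmetic mean of a NumPy array
    return data.sum() / len(data)

def variance_n(data):
    # Biased estimator: sum of squared deviations divided by n
    m = mean(data)  # compute the mean once instead of on every iteration
    total = 0       # avoid shadowing the built-in sum()
    for x in data:
        total += (x - m) ** 2
    return total / len(data)

def variance_n_1(data):
    # Bessel-corrected estimator: sum of squared deviations divided by n-1
    m = mean(data)
    total = 0
    for x in data:
        total += (x - m) ** 2
    return total / (len(data) - 1)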
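
As a cross-check, NumPy exposes both divisors through the `ddof` argument of `np.var`: `ddof=0` (the default) divides by n and `ddof=1` divides by n-1. The sketch below uses a small made-up array (an assumption for illustration, not data from this notebook) to show that the two hand-written formulas agree with it.

```python
import numpy as np

# Illustrative values only, chosen so the arithmetic is easy to follow.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
squared_deviations = (data - data.mean()) ** 2

divide_by_n = squared_deviations.sum() / len(data)
divide_by_n_minus_1 = squared_deviations.sum() / (len(data) - 1)

print(divide_by_n, np.var(data))                  # divisor n
print(divide_by_n_minus_1, np.var(data, ddof=1))  # divisor n-1
```

Each line should print the same value twice.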

In [100]:
np.random.seed(42)

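for i in range(20):
    # Draw a sample of 20 items from the population (with replacement)
    sample = np.random.choice(population, 20)
    # Estimation error with the n divisor, then with the n-1 divisor
    print(population_variance - variance_n(sample), population_variance - variance_n_1(sample))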

1237.6770666666666 1128.6282508771928
-638.8429333333329 -846.655959649122
-143.9954333333335 -325.76385438596526
-1497.3954333333336 -1750.3954333333336
584.3570666666665 440.92298771929836
1403.257066666666 1302.9229877192977
243.39456666666638 82.01509298245583
1123.2645666666667 1008.194040350877
1526.3770666666662 1432.5229877192978
93.35706666666692 -75.91911754385956
54.06456666666645 -117.27964385964924
945.9145666666668 821.5098298245612
-725.7954333333337 -938.1849070175444
-689.2829333333334 -899.7506964912286
-583.0829333333336 -787.9612228070177
128.70456666666632 -38.71122280701775
354.25706666666747 198.71246140350922
569.8445666666662 425.64667192982415
151.5570666666663 -14.65595964912336
449.7770666666661 299.25982982456117


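### Each row of the output above shows the population variance minus the estimate, first with the n divisor and then with the n-1 divisor. In 13 of the 20 samples (65%), dividing by n-1 instead of n gives the error with the smaller absolute value, i.e. the better estimate.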
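
Bias is a statement about the average over many samples, so a complementary check is to average both estimators over a large number of samples rather than count wins over 20. The following is a minimal sketch under assumed settings: the `default_rng` generator, the 20,000 trials and the variable names are illustrative choices and do not reproduce the seeded population used earlier.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed, for repeatability only
population = rng.integers(0, 200, size=300)
population_variance = np.var(population)

n_trials, sample_size = 20_000, 20
# Each row is one sample of 20 items drawn with replacement.
samples = rng.choice(population, size=(n_trials, sample_size))

# Average each estimator over all trials.
mean_var_n = np.var(samples, axis=1).mean()            # divisor n
mean_var_n_1 = np.var(samples, axis=1, ddof=1).mean()  # divisor n-1

print("population variance:  ", population_variance)
print("average with n:       ", mean_var_n)
print("average with n-1:     ", mean_var_n_1)
```

Under these assumptions the n-1 average should land close to the population variance, while the n average should sit about 5% below it (a factor of 19/20 for samples of size 20).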