
# Monte Carlo simulation for calculating bias and variance of an estimator

In [2]:
import numpy as np

In [16]:
#DGP (data-generating process): each column of x is one experiment of n i.i.d. Uniform(0, 2) draws
n = 20 #sample size
s = 10000 #number of experiments
x = np.random.uniform(0,2,(n,s))

#ground truth, population mean parameter
theta = 1

The mean squared error (MSE) of the sample mean $\hat{\theta}$ as an estimator of the population mean $\theta$ is
$E[(\hat{\theta} - \theta)^2]$

In [17]:
#mean-squared error of sample mean:
np.mean((np.mean(x,axis=0)-theta)**2)

0.016596684629832423
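Because the sample mean is unbiased, its MSE equals its variance, $\sigma^2/n$. For Uniform(0, 2) the population variance is $(2-0)^2/12 = 1/3$, so with $n = 20$ the theoretical MSE is $1/60 \approx 0.0167$, in line with the simulated value above. A minimal sketch of that arithmetic (the names `pop_var` and `theory_mse` are illustrative, not from the original):

```python
n = 20                     # sample size, as in the simulation above
a, b = 0.0, 2.0            # support of the Uniform(a, b) DGP
pop_var = (b - a)**2 / 12  # population variance of Uniform(a, b): 1/3 here
theory_mse = pop_var / n   # unbiased estimator, so MSE = Var(sample mean) = sigma^2 / n
print(theory_mse)
```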

The mean squared error decomposes as $E[(\hat{\theta} - \theta)^2] = \text{bias}^2 + Var(\hat{\theta})$, where

$\text{bias}^2 = (E[\hat{\theta}] - \theta)^2$
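The decomposition follows by adding and subtracting $E[\hat{\theta}]$ inside the square. Since $E[\hat{\theta}] - \theta$ is a constant, the cross term is proportional to $E[\hat{\theta} - E[\hat{\theta}]] = 0$:

$$
\begin{aligned}
E[(\hat{\theta} - \theta)^2]
&= E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big] \\
&= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big] + 2\,(E[\hat{\theta}] - \theta)\,E\big[\hat{\theta} - E[\hat{\theta}]\big] + (E[\hat{\theta}] - \theta)^2 \\
&= Var(\hat{\theta}) + 0 + \text{bias}^2 .
\end{aligned}
$$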

In [18]:
#taking the sample mean for each experiment
sample_mean = np.mean(x,axis=0)

#square of bias
(np.mean(sample_mean) - theta)**2

2.929771287691587e-06

The variance of the estimator, $Var(\hat{\theta})$:

In [19]:
np.var(sample_mean)

0.016593754858544734

In [20]:
#bias^2 + variance reproduces the MSE computed directly above
(np.mean(sample_mean) - theta)**2 + np.var(sample_mean)

0.016596684629832426
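The sum matches the direct MSE to floating-point precision because `np.var` defaults to `ddof=0`: for any finite set of estimates, the average squared error equals the squared bias of their average plus their divisor-$s$ variance exactly, not just in expectation. A minimal sketch checking this identity on arbitrary numbers (the array `estimates` and the value `truth` are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)                # seeded generator for reproducibility
estimates = rng.normal(1.2, 0.5, size=1000)   # hypothetical estimates from 1000 experiments
truth = 1.0                                   # hypothetical true parameter

mse = np.mean((estimates - truth)**2)         # direct Monte Carlo MSE
bias_sq = (np.mean(estimates) - truth)**2     # squared bias of the average estimate
variance = np.var(estimates)                  # ddof=0 (divisor s), NumPy's default

print(mse, bias_sq + variance)
print(np.isclose(mse, bias_sq + variance))    # identity holds up to rounding
```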

# Comparing the MSE of the unbiased sample variance vs. the divisor-$n$ variance estimator (the MLE under normality) when the DGP is uniform

In [29]:
n = 20 #sample size
s = 10000 #number of experiments
x = np.random.uniform(0,2,(n,s))
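#population variance (ground truth) of Uniform(0, 2): (2 - 0)**2 / 12 = 1/3; note that sigma holds the variance, not the standard deviation
sigma = 1/3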

In [30]:
#unbiased sample variance (divisor n - 1) for each experiment
sample_var = np.var(x,axis=0,ddof=1)

In [31]:
#biased sample variance (divisor n, the MLE under normality) for each experiment
mle_var = np.var(x,axis=0,ddof=0)
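The two estimators differ only in the divisor, so the divisor-$n$ estimate is always exactly $\frac{n-1}{n}$ times the unbiased one; this fixed shrinkage is what introduces its bias of $-\sigma^2/n$ while lowering its variance. A quick sketch confirming the relationship on a fresh, seeded Uniform(0, 2) draw (the seed is an assumption, so the arrays differ from those above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 20, 10000
x = rng.uniform(0, 2, (n, s))            # same DGP as above: n draws per experiment, s experiments

sample_var = np.var(x, axis=0, ddof=1)   # divisor n - 1
mle_var = np.var(x, axis=0, ddof=0)      # divisor n

# the divisor-n estimate is a fixed shrinkage of the unbiased one
print(np.allclose(mle_var, (n - 1) / n * sample_var))
```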

In [40]:
#mean squared error of the unbiased sample variance
np.mean((sample_var - sigma)**2)

0.004903860949262039

In [41]:
#mean squared error of the biased sample variance
np.mean((mle_var - sigma)**2)

0.004718160674179678

In [42]:
#bias^2 + variance reproduces the MSE of the unbiased sample variance
(np.mean(sample_var) - sigma)**2  + np.var(sample_var)

0.004903860949262039

In [43]:
#bias^2 + variance reproduces the MSE of the biased sample variance
(np.mean(mle_var) - sigma)**2  + np.var(mle_var)

0.004718160674179676
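The per-estimator cells above can be wrapped in one helper that reports squared bias, variance, and MSE side by side, which shows the divisor-$n$ estimator trading a small negative bias for a lower variance. A sketch; `mse_decomposition` is a hypothetical name, and the seeded draw will not reproduce the numbers above exactly:

```python
import numpy as np

def mse_decomposition(estimates, truth):
    """Return (squared bias, variance, MSE) of Monte Carlo estimates of `truth`."""
    bias = np.mean(estimates) - truth
    variance = np.var(estimates)           # ddof=0, so bias**2 + variance equals the MSE exactly
    mse = np.mean((estimates - truth)**2)
    return bias**2, variance, mse

rng = np.random.default_rng(2)
n, s = 20, 10000
x = rng.uniform(0, 2, (n, s))
sigma = 1/3                                # population variance of Uniform(0, 2)

for name, ddof in [("unbiased (n-1)", 1), ("divisor n", 0)]:
    b2, v, m = mse_decomposition(np.var(x, axis=0, ddof=ddof), sigma)
    print(f"{name:>15}: bias^2={b2:.6f}  var={v:.6f}  mse={m:.6f}")
```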

For a normal distribution, the MSEs are known in closed form: $\frac{2\sigma^4}{n-1}$ for the unbiased sample variance and $\frac{2n-1}{n^2}\sigma^4$ for the biased sample variance, where $\sigma^2$ is the population variance.
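These follow from the fact that, under normality, $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ (variance $2(n-1)$), where $S^2$ denotes the unbiased sample variance; the divisor-$n$ estimator is $\hat{\sigma}^2 = \frac{n-1}{n}S^2$:

$$
\begin{aligned}
MSE(S^2) &= Var(S^2) = \frac{\sigma^4}{(n-1)^2}\cdot 2(n-1) = \frac{2\sigma^4}{n-1}, \\
\text{bias}(\hat{\sigma}^2) &= \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}, \qquad
Var(\hat{\sigma}^2) = \left(\frac{n-1}{n}\right)^2 \frac{2\sigma^4}{n-1} = \frac{2(n-1)\sigma^4}{n^2}, \\
MSE(\hat{\sigma}^2) &= \frac{\sigma^4}{n^2} + \frac{2(n-1)\sigma^4}{n^2} = \frac{2n-1}{n^2}\sigma^4 .
\end{aligned}
$$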

In [51]:
n = 20 #sample size
s = 10000 #number of experiments
x = np.random.normal(0,1,(n,s))
sigma = 1 #population variance of N(0, 1)

In [52]:
sample_var = np.var(x,axis=0,ddof=1)
mle_var = np.var(x,axis=0,ddof=0)
print((np.mean(sample_var) - sigma)**2  + np.var(sample_var))
print((np.mean(mle_var) - sigma)**2  + np.var(mle_var))

0.10472998527323919
0.09746729096826037


In [53]:
#theoretical MSEs from the closed-form formulas, with sigma^4 = 1
print(2/(n-1))
print((2*n-1)/(n**2))

0.10526315789473684
0.0975
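The simulated MSEs are close to these theoretical values. Subtracting the two formulas gives $\frac{2\sigma^4}{n-1} - \frac{2n-1}{n^2}\sigma^4 = \frac{3n-1}{n^2(n-1)}\sigma^4 > 0$ for every $n \ge 2$, so under normality the divisor-$n$ estimator always has the smaller MSE. A sketch checking this algebra with exact rational arithmetic (taking $\sigma^4 = 1$, as in the cell above; `mse_gap` is a hypothetical helper):

```python
from fractions import Fraction

def mse_gap(n):
    """Exact MSE(unbiased) - MSE(divisor n) under normality, with sigma^4 = 1."""
    mse_unbiased = Fraction(2, n - 1)
    mse_mle = Fraction(2 * n - 1, n**2)
    return mse_unbiased - mse_mle

for n in (2, 5, 20, 100):
    gap = mse_gap(n)
    assert gap == Fraction(3 * n - 1, n**2 * (n - 1))  # closed form of the gap
    print(n, gap, float(gap))
```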


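When the DGP is exponential with scale 1 (population variance 1). Note the execution counts: the cell that computes the estimates is In [57], so it ran before In [58] drew the exponential sample, and its printed output was computed on an earlier `x`. Run the two cells in order to obtain results for this DGP.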

In [58]:
n = 20 #sample size
s = 10000 #number of experiments
x = np.random.exponential(1,(n,s))
sigma = 1 #population variance of Exponential(scale=1)

In [57]:
sample_var = np.var(x,axis=0,ddof=1)
mle_var = np.var(x,axis=0,ddof=0)
print((np.mean(sample_var) - sigma)**2  + np.var(sample_var))
print((np.mean(mle_var) - sigma)**2  + np.var(mle_var))

0.003987796224853146
0.0039831748706362574
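For a non-normal DGP the chi-squared result no longer applies, but the variance of the unbiased sample variance still has a closed form in terms of the fourth central moment $\mu_4$: $Var(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right)$. A sketch turning this into both MSEs; `theoretical_mses` is a hypothetical helper, and the moments plugged in are those of N(0, 1) ($\sigma^2 = 1$, $\mu_4 = 3$), Uniform(0, 2) ($\sigma^2 = 1/3$, $\mu_4 = 1/5$), and Exponential(scale 1) ($\sigma^2 = 1$, $\mu_4 = 9$). It reproduces $2/19$ and $39/400$ for the normal case, gives roughly 0.0050 and 0.0048 for the uniform case, and roughly 0.41 and 0.37 for the exponential case, far larger than the output printed above.

```python
def theoretical_mses(sigma2, mu4, n):
    """MSE of the unbiased (n-1) and divisor-n variance estimators for i.i.d. data.

    sigma2: population variance; mu4: fourth central moment; n: sample size.
    """
    var_s2 = (mu4 - (n - 3) / (n - 1) * sigma2**2) / n  # Var of the unbiased sample variance
    mse_unbiased = var_s2                                # unbiased, so MSE = variance
    shrink = (n - 1) / n                                 # divisor-n estimator = shrink * S^2
    mse_mle = (sigma2 / n)**2 + shrink**2 * var_s2       # bias^2 + variance
    return mse_unbiased, mse_mle

n = 20
for name, sigma2, mu4 in [("normal", 1.0, 3.0),
                          ("uniform(0,2)", 1/3, 1/5),
                          ("exponential", 1.0, 9.0)]:
    print(name, theoretical_mses(sigma2, mu4, n))
```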


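As $n$ grows, the two estimators become asymptotically equivalent: the divisor-$n$ estimator equals $\frac{n-1}{n}$ times the unbiased sample variance, and $\frac{n-1}{n} \to 1$, so for large $n$ the two estimates, and their MSEs, are nearly the same.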
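Under normality the closed-form MSEs make this explicit: their ratio is $\frac{MSE(\hat{\sigma}^2)}{MSE(S^2)} = \frac{(2n-1)(n-1)}{2n^2}$, which is below 1 for every finite $n$ and tends to 1 as $n \to \infty$. A short sketch tabulating it (`mse_ratio` is a hypothetical helper; $\sigma^4$ cancels):

```python
def mse_ratio(n):
    """MSE(divisor n) / MSE(unbiased) for normal data; sigma^4 cancels."""
    return ((2 * n - 1) / n**2) / (2 / (n - 1))

for n in (5, 20, 100, 1000, 10000):
    print(n, mse_ratio(n))
```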