# Estimation

**Estimation**: The process of inferring the best-fit model and its parameters for a distribution from a sample of data.

**Estimator**: A rule or algorithm that computes an *estimate* from a sample.

There are many possible estimators for any given problem. The best estimator depends on what errors we're trying to minimize.

**Maximum Likelihood Estimators** (MLE) have the highest probability of being the "right guess." Downey uses the example of rolling three dice and guessing the sum: the mean (10.5) minimizes the *mean squared error*, but can never be the "correct" guess, since 10.5 is not a possible value. The guess most likely to be exactly right is 10 or 11, the two most probable sums.
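The dice example can be checked by brute force. The sketch below enumerates all 216 outcomes and compares guessing the mean with guessing the most likely sum; the helper names `mean_squared_error` and `prob_exactly_right` are made up for this illustration.

```python
from collections import Counter
from itertools import product

# Enumerate all 6**3 = 216 equally likely outcomes of rolling three dice.
sums = [sum(roll) for roll in product(range(1, 7), repeat=3)]
counts = Counter(sums)

def mean_squared_error(guess):
    """Average squared distance between a fixed guess and the actual sum."""
    return sum((s - guess) ** 2 for s in sums) / len(sums)

def prob_exactly_right(guess):
    """Probability that a fixed guess equals the actual sum."""
    return counts.get(guess, 0) / len(sums)

mean_sum = sum(sums) / len(sums)
top = max(counts.values())
most_likely = sorted(s for s, c in counts.items() if c == top)

print("mean of the sum:", mean_sum)
print("most likely sums:", most_likely)
for guess in (mean_sum, 10, 11):
    print(guess, mean_squared_error(guess), prob_exactly_right(guess))
```

Guessing 10.5 gives the smallest mean squared error but is never exactly right, while guessing 10 or 11 is exactly right one time in eight at the cost of a slightly larger mean squared error.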

**Bias** describes the way in which an estimator is consistently "off" from the true value of the parameter it estimates. Bias can measure either the difference between the expected value of the estimator and the actual parameter (mean-bias, the standard definition) or the difference between the median value of the estimator and the actual parameter (median-bias, less common).
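To make the two definitions concrete, here is a small simulation sketch. It assumes the estimator L = 1 / (sample mean) for the rate of an exponential distribution, the same estimator used in Exercise 2; the seed, sample size, and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
lam, n, m = 2.0, 10, 100_000    # true rate, sample size, number of trials

# Each row is one sample of size n from an exponential distribution.
samples = rng.exponential(scale=1 / lam, size=(m, n))
estimates = 1 / samples.mean(axis=1)  # L = 1 / sample mean, one per trial

mean_bias = estimates.mean() - lam        # expected value minus parameter
median_bias = np.median(estimates) - lam  # median value minus parameter

print("mean-bias:", mean_bias)
print("median-bias:", median_bias)
```

For this estimator the theoretical mean-bias is λ/(n−1), about 0.22 with these settings. The median-bias should come out positive but noticeably smaller, which shows that the two definitions can give different answers for the same estimator.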

**Sampling error**: variation in an estimate caused by *random* selection of the sample (largest at small sample sizes)

**Sampling distribution**: the distribution of an estimate, such as the sample mean, over many repeated trials

**Sampling bias**: A predictable way in which a given sampling *method* produces a "wrong" estimate
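Sampling error and the sampling distribution can be seen in a short simulation. This is a sketch under assumed settings (normal data with mean 0 and standard deviation 1, 2000 trials per sample size); `sampling_distribution` is a hypothetical helper, not something from the book.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
mu, sigma, trials = 0.0, 1.0, 2000

def sampling_distribution(n):
    """Return the sample means of `trials` independent samples of size n."""
    return rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)

# The spread of the sampling distribution is the typical sampling error.
spread = {n: sampling_distribution(n).std() for n in (10, 100, 1000)}

for n, s in spread.items():
    print(n, "observed:", round(s, 4), "sigma/sqrt(n):", round(sigma / np.sqrt(n), 4))
```

The observed spread should track σ/√n, so sampling error shrinks as the sample grows. Sampling bias is different: it comes from the sampling method and does not go away with a larger n.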

## Exercises

**1.**

In this chapter we used sample mean and sample median [as estimators] to estimate µ, and found that sample mean yields lower MSE. Also, we used $S^2$ and $S_{n-1}^2$ to estimate σ, and found that $S^2$ is biased and $S_{n-1}^2$ unbiased.

Run similar experiments to see if sample mean and sample median are biased estimates of µ. Also check whether $S^2$ or $S_{n-1}^2$ yields a lower MSE.

In [10]:
import random
import numpy as np
from math import sqrt

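def MSE(estimates, actual):
    # Mean squared error of the estimates around the true value.
    return np.mean([(e - actual)**2 for e in estimates])

def bias(estimates, actual):
    # Mean-bias: average signed error of the estimates.
    return np.mean([e - actual for e in estimates])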

def experiment(mu=0, sigma=1, n=10, m=100000):

    sample_means, sample_medians = [], []
    s2_biased, s2_unbiased = [], []
    
    for x in range(m):
        samples = [random.gauss(mu, sigma) for y in range(n)]
        sample_means.append(np.mean(samples))
        sample_medians.append(np.median(samples))
        s2_biased.append(np.var(samples))
        s2_unbiased.append(np.var(samples, ddof=1))
    
    print("Bias of the sample mean:", bias(sample_means, mu))
    print("Bias of the sample median:", bias(sample_medians, mu))
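    # Report root mean squared error (RMSE), the square root of the MSE.
    # The square root is monotonic, so the estimator with the lower RMSE
    # also has the lower MSE.
    print("RMSE of S^2:", sqrt(MSE(s2_biased, sigma**2)))
    print("RMSE of S(n-1)^2:", sqrt(MSE(s2_unbiased, sigma**2)))
    
experiment()

Bias of the sample mean: -4.97303609867e-05
Bias of the sample median: -0.000201764198987
RMSE of S^2: 0.43561906087686575
RMSE of S(n-1)^2: 0.47096633175223307

Both biases are within sampling noise of zero, so neither the sample mean nor the sample median looks biased. S^2 has the lower error even though it is the biased estimator.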
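As a cross-check on the simulation, the error of both variance estimators has a closed form when the data are normal. The derivation below is a sketch that assumes normally distributed samples, as in `experiment()`, and uses the standard result that $(n-1)S_{n-1}^2/\sigma^2$ follows a chi-squared distribution with $n-1$ degrees of freedom.

$$
\begin{aligned}
\mathrm{MSE}(S_{n-1}^2) &= \mathrm{Var}(S_{n-1}^2) = \frac{2\sigma^4}{n-1} \\
\mathrm{MSE}(S^2) &= \mathrm{Var}(S^2) + \mathrm{Bias}(S^2)^2
  = \frac{2(n-1)\sigma^4}{n^2} + \frac{\sigma^4}{n^2}
  = \frac{(2n-1)\sigma^4}{n^2}
\end{aligned}
$$

With $n = 10$ and $\sigma = 1$ this gives an MSE of about 0.222 for $S_{n-1}^2$ and 0.19 for $S^2$. Taking square roots gives RMSE values of about 0.471 and 0.436.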


**2.**

Suppose you draw a sample with size n=10 from an exponential distribution with λ=2. Simulate this experiment 1000 times and plot the sampling distribution of the estimate L. Compute the standard error of the estimate and the 90% confidence interval.

Repeat the experiment with a few different values of `n` and make a plot of standard error versus `n`.

In [16]:
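%matplotlib inline
import matplotlib.pyplot as plt

def sample_estimates(n, lam, m):
    # Run m experiments; each draws n exponential values and estimates
    # lambda as L = 1 / (sample mean).
    Ls = []
    for _ in range(m):
        L = 1 / np.mean([random.expovariate(lam) for _ in range(n)])
        Ls.append(L)
    return Ls

def standard_error(estimates, actual):
    # Standard error computed as the root mean squared error (RMSE).
    return sqrt(MSE(estimates, actual))

def conf_interval(estimates):
    # 90% confidence interval: 5th to 95th percentile of the estimates.
    low = np.percentile(estimates, 5)
    high = np.percentile(estimates, 95)
    return (low, high)

def plot_sampling_distribution(estimates):
    plt.hist(estimates, bins=50)
    plt.xlabel("estimate L")
    plt.ylabel("count")
    plt.show()

def plot_standard_error(lam, m, ns=(100, 1000, 10000)):
    errors = [standard_error(sample_estimates(n, lam, m), lam) for n in ns]
    # Bar chart of standard error for each sample size.
    plt.bar([str(n) for n in ns], errors)
    plt.xlabel("n")
    plt.ylabel("standard error")
    plt.show()

lam = 2
m = 1000
estimates = sample_estimates(n=10, lam=lam, m=m)
plot_sampling_distribution(estimates)
print("SE of the estimate:", standard_error(estimates, lam))
print("90% confidence interval:", conf_interval(estimates))
plot_standard_error(lam=lam, m=m)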