# Parameter estimation: a Bayesian approach

In the previous notebook we examined how to decide whether the mean of a sample was different from a given population mean using a hypothesis-testing approach. Now we'll look at a different way to think about parameter estimation, one that lets us use data to make probability statements about the parameter. First, you need to be familiar with Bayes' rule.

### Bayes' Rule

Bayes' theorem is an equation that describes the probability of an event A, given that some other event B has occurred:

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
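
As a quick numerical illustration, the sketch below plugs made-up numbers into the theorem. The events ($A$ = "an item is defective", $B$ = "an inspection flags the item") and all three input probabilities are assumptions invented for this example; only the formula itself comes from the text above.

```python
# Hypothetical inputs (assumptions for illustration only)
p_a = 0.01              # P(A): prior probability that an item is defective
p_b_given_a = 0.95      # P(B|A): probability the inspection flags a defective item
p_b_given_not_a = 0.05  # P(B|not A): probability the inspection flags a good item

# P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 3))  # about 0.161
```

Even with a fairly accurate inspection, the posterior probability stays modest here because the assumed prior $P(A)$ is small.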

Fundamentally, Bayes' theorem is an equation to update our knowledge of the probability of an event based on new evidence. To make this more explicit, Bayes' theorem is often rearranged to produce Bayes' rule.

To understand Bayes' rule, consider:

- $A_1$ and $A_2$ as two competing hypotheses (e.g., the mean size of a population of items is equal to 10 cm or is not equal to 10 cm)

- $B$ as an (independent and identically distributed) sample of observations from the population

In this case, the prior odds of $A_1$ against $A_2$ are simply the ratio of the prior probabilities of the two hypotheses:


$O(A_1:A_2) = \frac{P(A_1)}{P(A_2)}$

When we observe $B$, the new data, we can compute the **likelihood** of these data under each of the alternative hypotheses and take their ratio:

$\Lambda(A_1:A_2|B) = \frac{P(B|A_1)}{P(B|A_2)}$
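
Writing Bayes' theorem once for $A_1$ and once for $A_2$ and dividing the two expressions cancels $P(B)$, which gives the odds form of Bayes' rule: the posterior odds are the likelihood ratio multiplied by the prior odds.

$O(A_1:A_2|B) = \frac{P(A_1|B)}{P(A_2|B)} = \Lambda(A_1:A_2|B) \cdot O(A_1:A_2)$

A minimal sketch of this calculation follows. The prior probabilities and likelihood values are hypothetical numbers chosen for illustration, and the final conversion back to a probability assumes $A_1$ and $A_2$ are the only two hypotheses.

```python
# Hypothetical inputs (assumptions for illustration only)
prior_a1, prior_a2 = 0.5, 0.5   # P(A1), P(A2)
lik_a1, lik_a2 = 0.02, 0.08     # P(B|A1), P(B|A2)

prior_odds = prior_a1 / prior_a2          # O(A1:A2)
likelihood_ratio = lik_a1 / lik_a2        # Lambda(A1:A2|B)
posterior_odds = likelihood_ratio * prior_odds

# If A1 and A2 are the only two hypotheses, odds convert back to a probability
posterior_a1 = posterior_odds / (1 + posterior_odds)

print(posterior_odds, posterior_a1)
```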


When many candidate events $A$ are involved (for example, every possible value of a parameter), $P(B)$ is the same for all of them and can be treated as a constant, so the rule can be phrased as:

$P(A|B) \propto P(A)P(B|A)$


To sum it up, the basic concept of Bayesian parameter estimation is:

Posterior distribution $\propto$ Likelihood $\times$ Prior distribution
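
To turn the proportionality into actual probabilities, the products of prior and likelihood are divided by their sum over all candidates, which plays the role of $P(B)$. The sketch below does this for three candidate parameter values. The coin-flip setting, the candidate values, and the data (3 heads in 4 flips) are all assumptions made up for this example.

```python
# Hypothetical candidate values for a coin's probability of heads
candidates = [0.25, 0.50, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]   # P(A): equal prior weight on each candidate

# Hypothetical data B: 3 heads in 4 flips. The likelihood P(B|A) is
# proportional to p^3 * (1 - p) for each candidate p.
likelihood = [p**3 * (1 - p) for p in candidates]

# Unnormalized posterior: P(A) * P(B|A)
unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]

# Dividing by the total normalizes the posterior so it sums to one
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

print(posterior)
```

Constants that are shared by every candidate (such as the binomial coefficient here) cancel in the normalization, which is why working with proportionality is enough.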

### Priors

The crux of Bayesian parameter estimation is the concept of the **prior distribution.**

A **conjugate prior** is a prior for which, given a particular likelihood, the posterior distribution belongs to the same parametric family as the prior. This leads to simpler mathematics and is extremely useful for sequential estimation, because each posterior can serve as the prior for the next batch of data.

Some examples:

- if the data are Poisson random variables and the prior on the rate is a gamma distribution, the posterior is also a gamma distribution

- if the data are Bernoulli random variables and the prior on the success probability is a beta distribution, the posterior is also a beta distribution

- if the data are normally distributed with known variance and the prior on the mean is normally distributed, the posterior will be normally distributed.
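
The Bernoulli/beta case can be sketched in a few lines: with a Beta($a$, $b$) prior, observing $s$ successes and $f$ failures gives a Beta($a + s$, $b + f$) posterior. The helper name `update_beta` and the data below are hypothetical, chosen only to show that updating batch by batch gives the same result as updating with all of the data at once.

```python
def update_beta(a, b, observations):
    """Return the Beta posterior parameters after observing 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return a + successes, b + failures

# Hypothetical data, split into two batches
batch_1 = [1, 0, 1, 1]
batch_2 = [0, 1, 1, 0, 1]

# Beta(1, 1) prior, which is uniform on [0, 1]
a0, b0 = 1, 1

# Sequential updating: the posterior from batch 1 becomes the prior for batch 2
a1, b1 = update_beta(a0, b0, batch_1)
a2, b2 = update_beta(a1, b1, batch_2)

# Updating with all of the data at once gives the same posterior
a_all, b_all = update_beta(a0, b0, batch_1 + batch_2)

print((a2, b2), (a_all, b_all))  # both are (7, 4)
print(a2 / (a2 + b2))            # posterior mean of the success probability
```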

An **improper prior** is a prior that is not a proper probability distribution (i.e., it does not sum or integrate to one); this is the case, for example, when the prior is a uniform distribution over an unbounded range.

Priors contain varying amounts of information about the parameter, which affects how much the observed data are "allowed" to influence the posterior. For example, for a normally distributed prior, a high variance means a diffuse or uninformative prior that will allow the new data to influence the posterior distribution more than an informative prior with low variance.
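
The effect of prior variance can be seen in the normal case with known data variance, where the posterior mean is a precision-weighted average of the prior mean and the sample mean. The following is a minimal sketch under that assumption; the function name `normal_posterior`, the measurements, and the variances are hypothetical values chosen for illustration.

```python
def normal_posterior(prior_mean, prior_var, data, data_var):
    """Posterior mean and variance for a normal mean with known data variance."""
    n = len(data)
    sample_mean = sum(data) / n
    # Precisions (inverse variances) add; the means are weighted by precision
    post_precision = 1 / prior_var + n / data_var
    post_mean = (prior_mean / prior_var + n * sample_mean / data_var) / post_precision
    return post_mean, 1 / post_precision

data = [11.2, 10.8, 11.5, 11.0, 11.4]  # hypothetical measurements in cm
data_var = 1.0                          # assumed known sampling variance

# Same prior mean (10 cm), very different prior variances
diffuse = normal_posterior(10.0, 100.0, data, data_var)
informative = normal_posterior(10.0, 0.01, data, data_var)

print(diffuse)      # posterior mean close to the sample mean (11.18)
print(informative)  # posterior mean close to the prior mean (10)
```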

In [32]:
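### Posterior sampling
import numpy as np

# define grid of possible parameter values on [0, 1]
grid = np.linspace(0, 1, 1001)

# define prior probability for each possible parameter (flat prior)
prior = np.ones_like(grid)

# compute likelihood at each value in grid
# ASSUMPTION: the original left this step blank. A binomial likelihood with
# placeholder data (6 successes in 9 trials) is used for illustration only;
# the binomial coefficient is omitted because it cancels when standardizing.
likelihood = grid**6 * (1 - grid)**3

# compute unnormalized posterior as the product of likelihood and prior
post = likelihood * prior

# standardize so the posterior sums to one
post = post / post.sum()

# sample the posterior: draw grid values in proportion to their posterior probability
rng = np.random.default_rng(0)
samples = rng.choice(grid, size=10000, replace=True, p=post)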

In [None]:
### Assumptions

### Key differences with the hypothesis testing approach:
- Fundamentally, Bayesians treat parameters as random variables described by probability distributions, rather than as fixed "true" values.

Some additional notes:
