# Introduction
Parameter estimation is a common data analysis problem. We can think of many *real life* examples of experiments where we are interested in knowing the value of a set of parameters, e.g., the charge of the electron, the mass of the Moon, etc. In this notebook we focus on examples where we are interested in estimating the value of a single parameter. These examples will be used to discuss the use of Bayes' Theorem, error-bars, and confidence intervals.

# Example 1: Fair coin?
Assume that we encounter a very *strange* coin. That is, we observe that out of 11 flips only 4 landed heads. Naturally we can ask: is this a fair coin?
By fair we mean a coin that has probability $1/2$ of landing heads (and $1/2$ of landing tails) on a flip.

Moreover, let's say we determine the coin is fair. How sure are we of our assertion about the coin? If the coin were not fair, how unfair do we think the coin is?

To answer these questions we need to formulate the problem more precisely. Instead of considering a pair of hypotheses (the coin is fair or not), we can see the problem through a large number of contiguous propositions, or hypotheses, about the range in which the *bias-weighting* of the coin might lie. Let $H$ denote this *bias-weighting*. If $H=0$ the coin will always land tails after a flip. Conversely, if $H=1$ the coin will always land heads after a flip. Notice that $H=1/2$ indicates a **fair coin**.

The propositions then could be stated in the form: 
* $0.00 < H < 0.01$;
* $0.01 < H < 0.02$;
* $0.02 < H < 0.03$;
* and so on...

In this way, the *state of knowledge* about the (degree of) fairness or unfairness of the coin is specified by how much we believe the statement to be true, given by the probability assigned to each of the propositions or to groups of them. 
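
As a minimal sketch of this idea, the propositions can be represented as 100 bins of width 0.01, each carrying a probability. The equal weights used here are only an illustrative assumption (they are not derived from any data), and the names `edges` and `belief` are hypothetical.

```python
import numpy as np

# Edges of the contiguous propositions: 0.00 < H < 0.01, 0.01 < H < 0.02, ...
edges = np.linspace(0.0, 1.0, 101)
n_bins = len(edges) - 1

# Illustrative assumption: every proposition is believed equally.
belief = np.full(n_bins, 1.0 / n_bins)

# The probabilities over all propositions must add up to one.
total = belief.sum()

# Probability assigned to a group of propositions, here 0.40 < H < 0.60
# (bins 40 to 59 inclusive).
prob_near_fair = belief[40:60].sum()

print(f"total probability = {total:.3f}")
print(f"prob(0.40 < H < 0.60) = {prob_near_fair:.3f}")
```

A different state of knowledge would simply correspond to a different set of weights in `belief`, as long as they still sum to one.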

In the presence of data, our inference about the fairness of the coin is summarized by the conditional pdf: $$\text{prob}(H| \text{\{data\}}, I),$$ where the probability that $H$ lies within an infinitesimal interval $dH$ is given by:
$$\text{prob}(H| \text{\{data\}}, I)dH.$$
To estimate this posterior pdf we need to use Bayes' Theorem:
$$\text{prob}(X | Y,I) = \dfrac{\text{prob}(Y | X, I)\times\text{prob}(X | I)}{\text{prob}(Y | I)}.$$

Which in our case reads:
$$\text{prob}(H | \text{\{data\}},I) \propto \text{prob}(\text{\{data\}}| H,I)\times \text{prob}(H | I),$$

Note that we have omitted the denominator $\text{prob}(\text{\{data\}} | I)$ since it does not involve $H$. If needed, we can calculate the normalization factor from:
$$\int_{0}^{1}\text{prob}(H | \text{\{data\}}, I)dH = 1.$$

The prior pdf, $\text{prob}(H | I)$ represents our knowledge about the coin given the information we have about the problem: *a strange coin*. We can assign a simple probability that reflects this situation, for instance:
$$\text{prob}(H|I) = \left\{ \begin{array}{ll} 1, & 0 \leq H \leq 1 \\
                                0, & \text{otherwise}\end{array}\right.$$
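
The flat prior above can be written as a small function. This is only a sketch; the name `uniform_prior` is a hypothetical choice for illustration.

```python
import numpy as np

def uniform_prior(H):
    """Flat prior: 1 for 0 <= H <= 1, and 0 otherwise."""
    H = np.asarray(H, dtype=float)
    return np.where((H >= 0.0) & (H <= 1.0), 1.0, 0.0)

# Evaluate the prior at a few values, including some outside [0, 1].
H_values = np.array([-0.1, 0.0, 0.5, 1.0, 1.2])
print(uniform_prior(H_values))
```

Because the prior is constant over the allowed range, it expresses no preference for any particular value of $H$ before the data are seen.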
                                
This prior state of knowledge will be updated by data via the likelihood function: $\text{prob}(\text{\{data\}}| H, I)$. If we additionally assume that the coin tosses are independent events, then the probability of getting **R heads in N tosses** is given by the binomial distribution which, up to a constant factor that does not depend on $H$, reads:
$$\text{prob}(\text{\{data\}}| H, I) \propto H^R(1-H)^{N-R}.$$

Don't worry about the origin of this expression for the moment; we will come back to it in future notebooks.
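
Putting the pieces together for the coin in this example (4 heads in 11 flips), the posterior can be evaluated on a grid of $H$ values and normalized numerically. This is a minimal sketch under stated assumptions: the grid size, the trapezoidal rule used for the normalization integral, and the variable names are illustrative choices, not part of the derivation.

```python
import numpy as np

N, R = 11, 4  # observed data: 4 heads in 11 flips

# Grid of candidate values for the bias-weighting H (grid size is an assumption).
H = np.linspace(0.0, 1.0, 1001)
dH = H[1] - H[0]

prior = np.ones_like(H)                      # flat prior on 0 <= H <= 1
likelihood = H**R * (1.0 - H)**(N - R)       # binomial likelihood, up to a constant
unnormalized = likelihood * prior

# Normalization: enforce that the posterior integrates to 1 over [0, 1]
# (trapezoidal rule written out by hand).
norm = np.sum(0.5 * (unnormalized[1:] + unnormalized[:-1])) * dH
posterior = unnormalized / norm

# Grid value of H at which the posterior is largest.
H_best = H[np.argmax(posterior)]
print(f"posterior peaks near H = {H_best:.3f} (R/N = {R / N:.3f})")
```

With a flat prior the posterior has the same shape as the likelihood, so its maximum sits close to $R/N$, the observed fraction of heads.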

In [None]:
Let