## TOC:
* [1 - 3 Steps of Bayesian Data Analysis](#3steps-BDA)
* [2 - Bayes Formula - An Intuition](#bayes-formula)
* [3 - Most Used Distributions](#distributions)
* [4 - Posterior Approximation - A glimpse into MCMC](#mcmc)
* [5 - Quick Example in PyMC3](#example)
* [6 - Sources](#sources)

## 1. The three steps of Bayesian data analysis (Gelman) <a class="anchor" id="3steps-BDA"></a>

#### Frequentist Approach

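- Needs robust calibration: since the true model configuration is unknown, an estimator's performance has to be guaranteed across every possible model configuration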


#### Bayesian Inference
- Models not only frequency variation but also epistemological uncertainty, expressed as weights over the model configuration space  
- A probability distribution encodes much more information than a step function!

- **Prior** = the prior distribution encodes domain expertise about the model configurations in the observational model, and possibly even the context of the observational model relative to the true data generating process and latent observational process  
  - prior distributions do not need to encode all of our domain expertise, just enough to ensure useful inferences  
- **Likelihood** = the likelihood function maps each model configuration to a numerical quantification that increases for model configurations that are more consistent with the specific observation and decreases for those that are less consistent. In other words, the likelihood function quantifies the relative consistency of each model configuration with the observed data.  
- **Posterior** = Bayes’ Theorem can be thought of as switching from one conditional probability density function, the one specifying the observational model, to another, the one specifying the posterior distribution.


**Contraction** $\rightarrow$ likelihood function is more informative than, but also consistent with, the prior distribution:

 <img src="imgs_prez/img7.png" width="480" height="480" align="center"/>
 **Source**: betanalpha.github.io

**Containment** $\rightarrow$ prior distribution is more informative than, but also consistent with, the likelihood function:

 <img src="imgs_prez/img8.png" width="480" height="480" align="center"/>
 **Source**: betanalpha.github.io

**Compromise** $\rightarrow$ when there is tension between the information encoded in the likelihood function and the prior:

 <img src="imgs_prez/img9.png" width="480" height="480" align="center"/>
 **Source**: betanalpha.github.io
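The three regimes can be reproduced with the simplest conjugate model. The sketch below assumes a normal prior on an unknown mean and normally distributed observations with a *known* noise standard deviation; the helper `normal_posterior` and all numbers are hypothetical, chosen only for illustration. In this model the posterior precision is the sum of the prior and data precisions, and the posterior mean is their precision-weighted average.

```python
import numpy as np

def normal_posterior(prior_mean, prior_sd, data, noise_sd):
    """Hypothetical helper: posterior of a normal mean, known noise sd, normal prior."""
    data = np.asarray(data, dtype=float)
    prior_prec = 1.0 / prior_sd**2            # precision contributed by the prior
    data_prec = len(data) / noise_sd**2       # precision contributed by the likelihood
    post_prec = prior_prec + data_prec        # precisions add
    # Posterior mean is the precision-weighted average of prior mean and sample mean
    post_mean = (prior_prec * prior_mean + data_prec * data.mean()) / post_prec
    return post_mean, np.sqrt(1.0 / post_prec)

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=1.0, size=20)   # simulated observations (assumption)

# Contraction: vague prior, the likelihood (sd ~ 1/sqrt(20) ~ 0.22) dominates
contraction = normal_posterior(0.0, 10.0, data, noise_sd=1.0)
# Containment: prior narrower than the likelihood and centred consistently with it
containment = normal_posterior(1.0, 0.05, data, noise_sd=1.0)
# Compromise: both informative but in tension, so the posterior lands in between
compromise = normal_posterior(-2.0, 0.2, data, noise_sd=1.0)

for name, (m, s) in [("contraction", contraction),
                     ("containment", containment),
                     ("compromise", compromise)]:
    print(f"{name:12s} posterior mean = {m:.2f}, posterior sd = {s:.3f}")
```

In the compromise case the posterior mean sits between the two conflicting sources of information, and neither the prior nor the likelihood alone describes it well.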

 <img src="imgs_prez/img1.png" width="640" height="640" align="center"/>
 **Source**: betanalpha.github.io

 <img src="imgs_prez/img2.png" width="640" height="640" align="center"/>
 
 **Source**: betanalpha.github.io

<img src="imgs_prez/img3.png" width="640" height="640" align="center"/>

 **Source**: betanalpha.github.io

<img src="imgs_prez/img4.png" width="640" height="640" align="center"/>

 **Source**: betanalpha.github.io

<img src="imgs_prez/img5.png" width="640" height="640" align="center"/>

 **Source**: betanalpha.github.io

<img src="imgs_prez/img6.png" width="640" height="640" align="center"/>

 **Source**: betanalpha.github.io

## 2. Bayes Formula - An Intuition <a class="anchor" id="bayes-formula"></a>

From the product rule we have

$$
p(\theta,  y)= p(\theta \mid y) \: p(y) 
$$

It can also be written as:
$$
p(\theta,  y)= p(y \mid \theta ) \: p(\theta)
$$

Equating the two expressions:
$$
p(\theta \mid y) \: p(y) = p(y \mid \theta ) \: p(\theta)
$$

Finally, dividing both sides by $p(y)$ (assuming $p(y) > 0$) 😍
$$
p(\theta \mid y)  = \frac{p(y \mid \theta ) \: p(\theta)}{p(y)}
$$

Where:
    
$p(y \mid \theta ) \rightarrow$ **Likelihood** = "plausibility" of the data given the parameters  

$p(\theta)  \rightarrow$ **Prior Distribution** = what we know about the parameters before seeing the data 

$p(y) \rightarrow $  **Marginal Likelihood or Evidence** = normalizing constant $p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta$, which does not depend on $\theta$

$p(\theta \mid y) \rightarrow $  **Posterior Distribution** = compromise between prior and likelihood, updating prior beliefs in light of new data $\rightarrow$ suitable for **sequential** data analysis, since today's posterior can serve as tomorrow's prior  
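A minimal sketch of sequential updating, assuming coin-flip data with a Beta prior on the coin bias (the Beta prior is conjugate to the Bernoulli likelihood, so each update only adds counts). The helper `update_beta`, the simulated flips and the true bias of 0.7 are hypothetical. Updating once with all the data and updating chunk by chunk, feeding each posterior back in as the next prior, give the same posterior.

```python
import numpy as np

def update_beta(a, b, flips):
    """Hypothetical helper: conjugate Beta(a, b) update with Bernoulli (0/1) data."""
    flips = np.asarray(flips)
    heads = int(flips.sum())
    return a + heads, b + len(flips) - heads   # heads add to a, tails add to b

rng = np.random.default_rng(1)
flips = rng.binomial(1, 0.7, size=100)          # simulated flips, true bias 0.7 (assumption)

# Batch: update a uniform Beta(1, 1) prior with all 100 flips at once
batch = update_beta(1, 1, flips)

# Sequential: the posterior after each chunk becomes the prior for the next chunk
a, b = 1, 1
for chunk in np.array_split(flips, 10):
    a, b = update_beta(a, b, chunk)
sequential = (a, b)

print(batch, sequential, batch == sequential)
print("posterior mean of the bias:", batch[0] / (batch[0] + batch[1]))
```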

Writing it differently ($\theta = \text{hypothesis}$, $y = \text{data}$):   
$$
p(\text{hypothesis} \mid \text{data})  = \frac{p(\text{data} \mid \text{hypothesis} ) \: p(\text{hypothesis})}{p(\text{data})}
$$
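Each term of the formula can be computed explicitly when the hypotheses form a finite grid. The sketch below assumes hypothetical data (7 heads in 10 coin flips), treats each candidate coin bias on a grid as a hypothesis, and uses a uniform prior; the evidence is then a plain sum over hypotheses instead of an integral.

```python
import numpy as np
from scipy import stats

theta = np.linspace(0, 1, 101)                 # hypotheses: candidate coin biases
prior = np.ones_like(theta) / len(theta)       # p(hypothesis): uniform prior on the grid
heads, n = 7, 10                               # data: 7 heads in 10 flips (made-up)

likelihood = stats.binom.pmf(heads, n, theta)  # p(data | hypothesis)
evidence = np.sum(likelihood * prior)          # p(data): sum over all hypotheses
posterior = likelihood * prior / evidence      # p(hypothesis | data)

print("posterior sums to", posterior.sum())
print("most probable bias:", theta[np.argmax(posterior)])
```

With a uniform prior the posterior is just the normalized likelihood, so the most probable bias matches the observed frequency 7/10; a non-uniform prior would shift it.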

### _"A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule."_

## 3. Most Used Distributions <a class="anchor" id="distributions"></a>


1. **Continuous Distributions** <a class="anchor" id="continuous"></a>  
1.1 - Normal (Gaussian) <a class="anchor" id="normal"></a>  
1.2 - Student's t (Robust Inference) <a class="anchor" id="tstudent"></a>  
1.3 - Beta (univariate)  <a class="anchor" id="beta"></a>  
1.4 - Dirichlet (multivariate)  <a class="anchor" id="dirichlet"></a>  
1.5 - Gamma  <a class="anchor" id="gamma"></a>  
1.6 - Uniform  

2. **Discrete Distributions**  
2.1 - Bernoulli  
2.2 - Binomial  
2.3 - Poisson
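All of the distributions listed above are available as frozen distribution objects in `scipy.stats`. The sketch below uses arbitrary, hypothetical parameter values only to show how each one is created and queried; it also checks the heavier tails of Student's t relative to the Normal, which is what makes it useful for robust inference.

```python
from scipy import stats

# Parameter values are arbitrary, for illustration only
continuous = {
    "Normal": stats.norm(loc=0, scale=1),
    "Student's t": stats.t(df=3),
    "Beta": stats.beta(a=2, b=5),
    "Gamma": stats.gamma(a=2, scale=1),
    "Uniform": stats.uniform(loc=0, scale=1),
}
discrete = {
    "Bernoulli": stats.bernoulli(p=0.3),
    "Binomial": stats.binom(n=10, p=0.3),
    "Poisson": stats.poisson(mu=4),
}
# Dirichlet is multivariate: its mean is a vector whose components sum to 1
dirichlet = stats.dirichlet(alpha=[1, 2, 3])

for name, dist in {**continuous, **discrete}.items():
    print(f"{name:12s} mean = {dist.mean():.3f}, variance = {dist.var():.3f}")
print("Dirichlet mean:", dirichlet.mean())

# Heavier tails: P(X > 3) is much larger under Student's t than under the Normal
print(continuous["Student's t"].sf(3.0), continuous["Normal"].sf(3.0))
```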

## 4. Posterior Approximation - A glimpse into MCMC <a class="anchor" id="mcmc"></a>

- While convalescing from an illness in 1946, Stan Ulam was playing solitaire. It, then, occurred to him to try to compute the chances that a particular solitaire laid out with 52 cards would come out successfully (Eckhard, 1987). After attempting exhaustive combinatorial calculations, he decided to go for the more practical approach of laying out several solitaires at random and then observing and counting the number of successful plays. This idea of selecting a statistical sample to approximate a hard combinatorial problem by a much simpler problem is at the heart of modern Monte Carlo simulation.
- MCMC algorithms typically require the design of proposal mechanisms to generate candidate hypotheses
- MCMC techniques are often applied to solve integration and optimisation problems in high-dimensional spaces.


<img src="imgs_prez/stan_ulam.jpeg" width="240" height="240" align="center"/>
<img src="imgs_prez/von_neumann.jpeg" width="240" height="240" align="center"/>
<img src="imgs_prez/enrico_fermi.jpeg" width="240" height="240" align="center"/>
 **Source**: An Introduction to MCMC for Machine Learning

### The Monte Carlo principle

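Draw $N$ independent and identically distributed (i.i.d.) samples $\{x^{(i)}\}_{i=1}^{N}$ from a target density $p(x)$. These samples define the empirical point-mass approximation

$$
p_{N}(x)=\frac{1}{N}\sum_{i=1}^{N} \delta_{x^{(i)}}(x)
$$

where $\delta_{x^{(i)}}(x)$ denotes the Dirac delta mass located at $x^{(i)}$. This lets us approximate integrals (or very large sums) $I(f)$ with tractable sums $I_{N}(f)$:

$$
I_{N}(f)=\frac{1}{N}\sum_{i=1}^{N} f(x^{(i)}) \xrightarrow[N \to \infty]{\text{a.s.}} I(f) = \int_{x} f(x)\,p(x)\,dx
$$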
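A minimal sketch of the Monte Carlo principle, assuming a target $p(x) = \mathcal{N}(0, 1)$ and $f(x) = x^2$, so that the exact value is $I(f) = 1$. The helper `mc_estimate` is hypothetical; it simply averages $f$ over $N$ i.i.d. draws, which is $I_N(f)$ above.

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_estimate(f, sampler, n):
    """Hypothetical helper: I_N(f), the average of f over n i.i.d. draws x^(i) ~ p(x)."""
    x = sampler(n)
    return np.mean(f(x))

# Target p(x) = N(0, 1); for f(x) = x^2 the exact integral I(f) = E[x^2] = 1
sampler = lambda n: rng.standard_normal(n)

for n in (10, 1_000, 100_000):
    print(n, mc_estimate(np.square, sampler, n))
```

The estimates get closer to 1 as $N$ grows; the Monte Carlo error shrinks like $1/\sqrt{N}$ regardless of the dimension of $x$, which is why the approach scales to high-dimensional integrals.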

### Metropolis-Hastings
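A minimal sketch of random-walk Metropolis, the special case of Metropolis-Hastings with a symmetric proposal, so the proposal densities cancel in the acceptance ratio. The target (an unnormalized standard normal), the step size, the starting point and the warm-up length are all assumptions chosen for illustration. Note that the sampler only ever needs the target up to a constant, which is exactly why MCMC sidesteps the evidence $p(y)$ in Bayes' formula.

```python
import numpy as np

def log_target(x):
    # Unnormalized log density of N(0, 1); the normalizing constant is never needed
    return -0.5 * x**2

def metropolis(log_p, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, log_px = x0, log_p(x0)
    accepted = 0
    for i in range(n_samples):
        x_prop = x + step * rng.standard_normal()   # candidate x* ~ q(x* | x)
        log_p_prop = log_p(x_prop)
        # Accept with probability min(1, p(x*) / p(x)); q cancels since it is symmetric
        if np.log(rng.uniform()) < log_p_prop - log_px:
            x, log_px = x_prop, log_p_prop
            accepted += 1
        samples[i] = x                               # on rejection the chain repeats x
    return samples, accepted / n_samples

samples, acc_rate = metropolis(log_target, x0=3.0, n_samples=20_000, step=2.4)
kept = samples[2_000:]                               # discard warm-up (assumption)
print("mean:", kept.mean(), "sd:", kept.std(), "acceptance rate:", acc_rate)
```

With an asymmetric proposal, the full Metropolis-Hastings ratio $\frac{p(x^*)\,q(x \mid x^*)}{p(x)\,q(x^* \mid x)}$ would replace the simple density ratio used here.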


### Nando MCMC Videos
- Video 1: https://youtu.be/TNZk8lo4e-Q
- Video 2: https://youtu.be/sK3cg15g8FI

## 5. Quick Example in PyMC3  <a class="anchor" id="example"></a>

## 6. Sources <a class="anchor" id="sources"></a>

1. [Michael Betancourt - Probabilistic Modeling and Statistical Inference](https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html)  
2. [Michael Betancourt - Probabilistic Building Blocks](https://betanalpha.github.io/assets/case_studies/probability_densities.html)  
3. [Michael Betancourt - Markov Chain Monte Carlo in Practice](https://betanalpha.github.io/assets/case_studies/markov_chain_monte_carlo.html)  
4. [Aerin Kim - Bayesian Inference — Intuition and Example](https://towardsdatascience.com/bayesian-inference-intuition-and-example-148fd8fb95d6)
5. [Allen Downey - ThinkBayes2](https://github.com/AllenDowney/ThinkBayes2)
6. [Andrew Gelman - Bayesian Data Analysis](http://www.stat.columbia.edu/~gelman/book/)  
7. [Colin Carroll - imcmc](https://github.com/ColCarroll/imcmc)
8. [Andrieu, de Freitas, Doucet, Jordan - An Introduction to MCMC for Machine Learning](https://www.cs.ubc.ca/~arnaud/andrieu_defreitas_doucet_jordan_intromontecarlomachinelearning.pdf)  
9. [Eric Ma - An Introduction to Probability and Computational Bayesian Statistics](https://ericmjl.github.io/essays-on-data-science/machine-learning/computational-bayesian-stats/)