# Bayesian Inference

Bayesian statistics uses probability distributions to quantify degrees of belief. Bayesian inference provides the mathematical formalism and machinery to incorporate prior beliefs into our models and update them based on new evidence. The prior distribution encodes our degree of belief about a parameter (or set of parameters) before any data are observed. When new data are observed, the prior is updated to form the posterior distribution.

This is akin to how humans think and process information. Imagine you are estimating how long it takes you to travel from home to work. The first time, you make a guess based on some assumptions, for instance the distance and the amount of traffic. After driving to work the first time, you have evidence of how long it actually takes. As you repeat this day in and day out, you become more *certain*. This is essentially Bayesian inference. Initially, we make an estimate (our prior belief) that encodes how certain we are about some phenomenon (i.e., the time it takes to drive to work). As we see more data, we update our belief (the posterior belief) and become more certain.

Bayes' formula is central to Bayesian inference (hence the not-so-obvious name!). It is given by

$p(\theta | D) = \frac{p(D|\theta) p(\theta)}{p(D)}$

which in English terms is

$\text{posterior = } \frac{\text{prior } \times \text{ likelihood}}{\text{evidence}}$

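The evidence $p(D) = \int p(D|\theta) p(\theta) d\theta$ is a normalizing constant: it makes the posterior integrate to one, so that it is a proper probability distribution. Because it does not depend on $\theta$, we can often ignore it and write the above as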

$p(\theta | D) \propto p(D|\theta) p(\theta)$

$\text{posterior} \propto \text{prior } \times \text{ likelihood}$
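To see the formula at work, here is a minimal Python sketch on a hypothetical toy problem that is not part of the original example: $\theta$ is a coin's probability of heads, restricted to three candidate values with a uniform prior, and the data $D$ are 7 heads in 10 flips. Because there are only three hypotheses, the evidence is simply the sum of prior × likelihood over them.

```python
# Toy illustration of Bayes' formula on a discrete set of hypotheses.
# Hypothetical setup: theta is a coin's probability of heads, restricted
# to three candidate values, and the data D are 7 heads in 10 flips.
from math import comb

thetas = [0.3, 0.5, 0.7]          # candidate parameter values
prior = [1 / 3, 1 / 3, 1 / 3]     # p(theta): uniform prior belief
heads, flips = 7, 10              # observed data D

# Likelihood p(D | theta): binomial probability of the observed data
likelihood = [comb(flips, heads) * t**heads * (1 - t) ** (flips - heads) for t in thetas]

# Unnormalized posterior: prior x likelihood
unnormalized = [p * l for p, l in zip(prior, likelihood)]

# Evidence p(D): sum over all hypotheses, i.e. the normalizing constant
evidence = sum(unnormalized)

# Posterior p(theta | D) = prior x likelihood / evidence
posterior = [u / evidence for u in unnormalized]
for t, post in zip(thetas, posterior):
    print(f"theta={t}: posterior={post:.3f}")
```

Dividing by `evidence` only rescales the unnormalized values, so their ratios (and hence the ranking of the hypotheses) are already fixed by prior × likelihood. This is why the proportional form is often all we need.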

We now walk through a concrete example of Bayesian inference to solidify the concept.

## Example

This example is inspired by this [blog post](https://medium.com/paper-club/analytical-bayesian-inference-with-conjugate-priors-4a1d75ca799b).

You are an up-and-coming musician. You are releasing a new album tomorrow, and you want to estimate how many streams you'll get on the first day of release.

The relevant distribution in this case is the Poisson distribution, a discrete distribution that gives the probability of an event occurring a given number of times in a fixed interval of time. It assumes that events occur independently of one another and at a constant average rate within the interval. Its probability mass function is given by

$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
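As a minimal sketch, the probability mass function translates directly into Python. The rate `lam = 3.0` is an arbitrary illustrative value, not a figure from the example. Summing the probabilities over enough values of $k$ should give (numerically) one.

```python
# Minimal sketch of the Poisson probability mass function
# P(X = k) = lambda^k e^{-lambda} / k!
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the average rate is lam."""
    if k < 0:
        return 0.0
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical rate: an average of 3 events per interval
lam = 3.0
probs = [poisson_pmf(k, lam) for k in range(50)]
print(f"P(X = 3) = {poisson_pmf(3, lam):.4f}")
print(f"Sum of P(X = k) for k < 50: {sum(probs):.6f}")  # very close to 1
```

`factorial(k)` grows very quickly, so for large $k$ a log-space version (using `math.lgamma(k + 1)` in place of `log(k!)`) is more robust.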

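The only parameter of this distribution is the rate $\lambda$, the expected number of events per interval (here, the expected number of first-day streams). We place a prior over this parameter. To keep the posterior analytically tractable, we use the conjugate prior of the Poisson distribution, the gamma distribution: with a gamma prior, the posterior over $\lambda$ is again a gamma distribution. With shape $\alpha$ and rate $\beta$, its probability density function is given by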

$p(\lambda) = \frac{\beta^\alpha \lambda^{\alpha-1}e^{-\beta \lambda}}{\Gamma(\alpha)}, \quad \lambda > 0$

where $\Gamma(\alpha)$ is the gamma function given by

$\Gamma(\alpha) = \int^\infty_0 x^{\alpha-1}e^{-x} dx$
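The Python sketch below ties these formulas together under illustrative assumptions: the hyperparameters `alpha = 2.0`, `beta = 0.5`, the observed counts `[3, 5, 4]`, and the helper names are all hypothetical. It first approximates $\Gamma(\alpha)$ from its integral definition and compares the result with `math.gamma`. It then checks conjugacy numerically: for $n$ observed Poisson counts $k_1, \dots, k_n$ and a gamma prior with shape $\alpha$ and rate $\beta$, prior × likelihood should be proportional to a gamma density with shape $\alpha + \sum_i k_i$ and rate $\beta + n$. By Bayes' formula, the constant of proportionality is the evidence $p(D)$, so the printed ratios should all be equal.

```python
# Sketch: the gamma prior over the Poisson rate lambda, plus a numerical
# check of gamma-Poisson conjugacy. Hyperparameters, observed counts, and
# helper names are illustrative assumptions.
from math import exp, factorial, gamma, lgamma, log

def gamma_function_numeric(alpha: float, upper: float = 60.0, steps: int = 200_000) -> float:
    """Approximate Gamma(alpha) = int_0^inf x^(alpha-1) e^(-x) dx with the midpoint rule,
    truncating at `upper` (assumes alpha >= 1 so the integrand is bounded)."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** (alpha - 1) * exp(-(i + 0.5) * h) for i in range(steps))

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and rate beta, evaluated at x > 0."""
    # Computed in log space so beta**alpha and Gamma(alpha) cannot overflow
    return exp(alpha * log(beta) + (alpha - 1) * log(x) - beta * x - lgamma(alpha))

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events at rate lam."""
    return lam**k * exp(-lam) / factorial(k)

alpha, beta = 2.0, 0.5   # hypothetical prior hyperparameters
counts = [3, 5, 4]       # hypothetical observed counts, one per interval

# Conjugate update: shape grows by the total count, rate by the number of intervals
post_alpha = alpha + sum(counts)
post_beta = beta + len(counts)

# prior x likelihood divided by the candidate posterior density, at several lambdas
ratios = []
for lam in [0.5, 1.0, 2.0, 4.0, 8.0]:
    unnormalized = gamma_pdf(lam, alpha, beta)
    for k in counts:
        unnormalized *= poisson_pmf(k, lam)
    ratios.append(unnormalized / gamma_pdf(lam, post_alpha, post_beta))

print(f"Gamma(4.5): numeric {gamma_function_numeric(4.5):.6f}, math.gamma {gamma(4.5):.6f}")
print("prior x likelihood / posterior pdf:", [f"{r:.6g}" for r in ratios])  # all equal
```

If the ratio changed with $\lambda$, the candidate gamma density would not be the posterior; a constant ratio is exactly what conjugacy predicts.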