%matplotlib inline
from __future__ import division
import matplotlib as mpl
from matplotlib import gridspec
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm
from ipywidgets import interactive
from bayeshelper import linear_model

# Bayesian inference in Python
Ryan Dwyer *February 17, 2016*


# Motivation

![phase kick data](figs/phase-kick-151119.pdf)

- Phase kick experimental data has a complicated noise profile (see shaded regions)
- Few data points are available, especially for organic samples with long recovery times, so it is worthwhile to extract as much information as possible from the data


*Bayesian inference allows inferring,*

- **experimental noise**
- **sample parameters**
- **distribution of possible sample parameters and experimental noise**

*simultaneously, in a motivated way.*

# Statistics notation

<img src="linear-fit.png" width=640px>

- Consider linear regression
- $N$ data points $(x_i, y_i)$
- Model the experimental data as a line with normally distributed errors,
    $$\begin{align}
    \mu_i& = m x_i + b&\\
    y_i& \sim \mathcal{N}(\mu_i, \sigma)&
    \end{align}
    $$
    - $\sim$ means "is distributed as"
    - $\mathcal{N}(\mu, \sigma)$ means a normal distribution with mean $\mu$, standard deviation $\sigma$
    - This shows explicitly how to simulate data from the model.
- Because $y_i$ is normally distributed, each data point has a likelihood $L_i$ given by the normal distribution's probability density (see above, right),
    $$L_i = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( \frac{-(y_i -\mu_i)^2}{2\sigma^2} \right )$$
- Independent data points, so the likelihood of the entire dataset is the product of the likelihood of individual data points,
    $$
    L = \prod_{i=1}^{N} L_i
    $$
- Easier to work with the log likelihood,
    $$\log L = \sum_{i=1}^{N} \log L_i = -\sum_{i=1}^{N} \left[ \frac{(y_i -\mu_i)^2}{2\sigma^2} + \log(\sigma \sqrt{2\pi}) \right]$$
- Ordinary least squares gives the *maximum likelihood estimate (MLE)* for this model (green).
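
The steps above can be sketched in a few lines of NumPy: simulate data from the model, evaluate the log likelihood, and check that the least-squares fit maximizes it. The "true" parameter values, the seed, and the helper name `log_likelihood` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" parameters, chosen only for illustration
m_true, b_true, sigma_true = 2.0, 1.0, 0.5

# Simulate data from the model: mu_i = m x_i + b, y_i ~ N(mu_i, sigma)
x = np.linspace(0, 5, 20)
y = rng.normal(m_true * x + b_true, sigma_true)


def log_likelihood(m, b, sigma, x, y):
    """Log likelihood of the whole dataset under the linear model."""
    mu = m * x + b
    return -np.sum((y - mu) ** 2 / (2 * sigma ** 2)
                   + np.log(sigma * np.sqrt(2 * np.pi)))


# Ordinary least squares estimate of slope and intercept
m_ols, b_ols = np.polyfit(x, y, 1)

# MLE of sigma: root-mean-square residual (divides by N, not N - 2)
sigma_mle = np.sqrt(np.mean((y - (m_ols * x + b_ols)) ** 2))

# Moving the slope away from the OLS value lowers the log likelihood
logL_ols = log_likelihood(m_ols, b_ols, sigma_mle, x, y)
logL_shifted = log_likelihood(m_ols + 0.1, b_ols, sigma_mle, x, y)
print(logL_ols > logL_shifted)
```

For fixed $\sigma$, maximizing $\log L$ over $m$ and $b$ is the same as minimizing the sum of squared residuals, which is why the two estimates coincide.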

# Bayesian

<img src="linear-fit.png" width=640px>

- **Prior** representing initial knowledge / belief about distribution of $m$, $b$, $\sigma$
- **Likelihood** derived from model (same as above)
- **Posterior**: rather than a point estimate (MLE), compute a posterior distribution of plausible parameter values, given the **likelihood** and **prior**

<img src="figs/likelihood.png" alt="likelihood" style="width:640px">

- $\mathrm{Posterior} \propto \mathrm{Prior} \times \mathrm{Likelihood}$

**Why?** Observed data may not provide much information about a given parameter.
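
As a rough illustration of $\mathrm{Posterior} \propto \mathrm{Prior} \times \mathrm{Likelihood}$, the sketch below evaluates the posterior for the slope $m$ on a grid, using only three simulated data points. To keep the grid one-dimensional it assumes that $b$ and $\sigma$ are known; the $\mathcal{N}(0, 1)$ prior on $m$, the data values, and the grid limits are all illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Simulated data; intercept and noise are treated as known (an assumption)
b, sigma = 1.0, 0.5
x = np.array([0.0, 1.0, 2.0])
y = rng.normal(2.0 * x + b, sigma)

# Grid of candidate slopes
m_grid = np.linspace(-2, 6, 801)
dm = m_grid[1] - m_grid[0]

# Log prior: m ~ N(0, 1), an illustrative choice
log_prior = norm.logpdf(m_grid, loc=0.0, scale=1.0)

# Log likelihood of the dataset at each candidate slope
mu = m_grid[:, np.newaxis] * x + b              # shape (801, 3)
log_like = norm.logpdf(y, loc=mu, scale=sigma).sum(axis=1)

# Posterior is proportional to prior times likelihood; normalize on the grid
log_post = log_prior + log_like
post = np.exp(log_post - log_post.max())
post /= post.sum() * dm

m_mle = m_grid[np.argmax(log_like)]             # maximum likelihood slope
m_map = m_grid[np.argmax(log_post)]             # posterior mode
post_mean = np.sum(m_grid * post) * dm          # posterior mean
print(m_mle, m_map, post_mean)
```

With so few points, the posterior mode sits slightly closer to the prior mean than the maximum likelihood estimate does, and the width of `post` gives the range of plausible slopes rather than a single number.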

# Example: Brownian motion

[[notebook](Brownian Motion Example.ipynb), [html](Brownian Motion Example.html)]

# Python resources

## `emcee`

- Affine-invariant ensemble sampler; needs only evaluations of the log probability, no derivatives

## `PyStan`

- Hamiltonian Monte Carlo (HMC) with the No-U-Turn Sampler (NUTS)

## `PyMC2/3`

- Metropolis-Hastings (PyMC2/PyMC3), HMC with NUTS (PyMC3)
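
To make the Metropolis-Hastings idea concrete, here is a minimal hand-written random-walk Metropolis sampler for the linear model, using NumPy only. It is a sketch of the general algorithm, not the API or implementation of PyMC, emcee, or Stan. The priors, step size, chain length, and the names `log_posterior` and `metropolis` are assumptions made for this example. Note that the sampler only evaluates the log posterior, whereas HMC also needs its gradient.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data from the linear model (illustrative values)
x = np.linspace(0, 5, 20)
y = rng.normal(2.0 * x + 1.0, 0.5)


def log_posterior(theta):
    """Unnormalized log posterior for theta = (m, b, log_sigma)."""
    m, b, log_sigma = theta
    sigma = np.exp(log_sigma)
    # Weak N(0, 10) priors on all three parameters (an assumption)
    log_prior = -0.5 * np.sum((theta / 10.0) ** 2)
    resid = y - (m * x + b)
    log_like = -np.sum(resid ** 2 / (2 * sigma ** 2) + np.log(sigma))
    return log_prior + log_like


def metropolis(log_prob, theta0, n_steps, step_size, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_prob(theta)
    chain = np.empty((n_steps, theta.size))
    n_accept = 0
    for i in range(n_steps):
        proposal = theta + step_size * rng.normal(size=theta.size)
        lp_new = log_prob(proposal)
        # Accept with probability min(1, p_new / p_old)
        if np.log(rng.uniform()) < lp_new - lp:
            theta, lp = proposal, lp_new
            n_accept += 1
        chain[i] = theta
    return chain, n_accept / n_steps


chain, accept_rate = metropolis(log_posterior, [0.0, 0.0, 0.0],
                                n_steps=20000, step_size=0.1, rng=rng)
samples = chain[5000:]                          # discard burn-in
m_mean, b_mean, log_sigma_mean = samples.mean(axis=0)
print(m_mean, b_mean, np.exp(log_sigma_mean), accept_rate)
```

Sampling $\log \sigma$ rather than $\sigma$ keeps the noise parameter positive without rejecting proposals at a boundary. The libraries listed here handle tuning, diagnostics, and far more efficient proposals than this sketch does.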


## Resources


- [Frequentism and Bayesianism: A Python-driven Primer.](http://arxiv.org/pdf/1411.5018v1.pdf)
    - Based on a series of blog posts at [Pythonic Perambulations](https://jakevdp.github.io/blog/2015/08/07/frequentism-and-bayesianism-5-model-selection/). The most relevant post is:
- [Frequentism and Bayesianism IV: How to be a Bayesian in Python](http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/)

### Books

- [*Stan Modeling Language User's Guide and Reference Manual*](http://mc-stan.org/documentation/)
    - Really great resource; implements basic versions of many different types of models in Stan (linear regression, time series analysis, multilevel modeling, ARMA processes, and more).
- *Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan.* Kruschke, J.
    - Starts from a fairly basic level. The author also has a variety of articles on his [website](http://www.indiana.edu/~kruschke/).
- *Statistical Rethinking: A Bayesian Course with Examples in R and Stan.* McElreath, R. ([website](http://xcelab.net/rm/statistical-rethinking/))
    - Emphasis on information theory for motivating choice of distributions, errors, priors.
- *Bayesian Data Analysis.* Gelman, A., *et al.*
    - Comprehensive overview, somewhat more advanced. Available online from the [Cornell library](https://newcatalog.library.cornell.edu/catalog/9204986).

## Outline

- [x] Motivation
- [x] Statistics notation / terminology (with pictures)
- [x] Brownian motion example [[notebook](LogNormalBrownian.ipynb), html]
- [ ] Different samplers?
- [ ] Scientific use cases