## Bernoulli Distribution
---

Let us define a random variable $X_i$ $(i=1,\dots,n)$ corresponding to tossing a coin such that
$$
X_i = 
	\begin{cases}
	1, & \text{Head is obtained}; \\
	0, & \text{Tail is obtained},
	\end{cases}
$$
and 
$$
	P(X_i=1) = \theta,\quad P(X_i=0) = 1-\theta.
$$
Then $X_i$ follows the <font color=red>Bernoulli distribution</font> and its p.m.f. is given by
$$
	p(x_i|\theta) = \theta^{x_i}(1-\theta)^{1-x_i},\quad x_i=0,1.
$$
Assuming that $X_1,\dots,X_n$ are independent, the joint p.m.f. of $D=(x_1,\dots,x_n)$ is
\begin{align*}
	p(D|\theta) &= \prod_{i=1}^n p(x_i|\theta) = \prod_{i=1}^n \theta^{x_i}(1-\theta)^{1-x_i} \\
	&= \theta^{y}(1-\theta)^{n-y},\quad y = \sum_{i=1}^n x_i.
\end{align*}
When we regard $p(D|\theta)$ as a function of $\theta$, it is called the <font color=red>likelihood</font> or <font color=red>likelihood function</font>.
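
As a quick numerical check of the factorization above, the following sketch multiplies the individual Bernoulli probabilities and compares the product with the closed form $\theta^{y}(1-\theta)^{n-y}$. The data vector `x_demo` and the value of `theta` are illustrative assumptions, not part of the original example.

```python
import numpy as np
import scipy.stats as st

# Hypothetical data and parameter value, chosen only for illustration
x_demo = np.array([1, 0, 1, 1, 1])
theta = 0.3

# Product of the individual p.m.f. values p(x_i | theta)
joint_product = np.prod(st.bernoulli.pmf(x_demo, theta))

# Closed form theta^y (1 - theta)^(n - y) with y = sum of x_i
n_demo = x_demo.size
y_demo = x_demo.sum()
joint_closed = theta ** y_demo * (1.0 - theta) ** (n_demo - y_demo)

print(joint_product, joint_closed)
```

Both numbers should agree up to floating-point rounding, since the second expression is just the first one with the exponents collected.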

## Likelihood of Bernoulli Probability
---

`import` literally imports a package named NumPy in Python. NumPy enables us to use vectors and matrices in Python. It also comes with numerous functions for mathematical computation. `as np` means that we use `np` as an abbreviation of `numpy`.

In [1]:
import numpy as np

`scipy.stats` is a module in `SciPy`, a Python package for scientific computing. `scipy.stats` includes many functions for statistical analysis.

In [2]:
import scipy.stats as st

We use `bokeh` (BTW, the name of this package originates from a Japanese word "ボケ" which means "blurred") to create interactive graphs.

In [3]:
from bokeh.io import show, output_notebook
from bokeh.layouts import column, row
from bokeh.models import ColumnDataSource, HoverTool, Slider, Span
from bokeh.plotting import figure

You need to run `output_notebook()` before you use `bokeh` in a Jupyter Notebook.

In [4]:
output_notebook()

`q` is a 101 $\times$ 1 vector that contains a grid, {0.0, 0.01, $\ldots$, 1.0}. The first number in `linspace(0.0, 1.0, 101)` is the starting point, the second is the end point, and the third is the number of grid points.

In [5]:
q = np.linspace(0.0, 1.0, 101)

`print` shows the content of `q`.

In [6]:
print(q)

[0.   0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.11 0.12 0.13
 0.14 0.15 0.16 0.17 0.18 0.19 0.2  0.21 0.22 0.23 0.24 0.25 0.26 0.27
 0.28 0.29 0.3  0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4  0.41
 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5  0.51 0.52 0.53 0.54 0.55
 0.56 0.57 0.58 0.59 0.6  0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69
 0.7  0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8  0.81 0.82 0.83
 0.84 0.85 0.86 0.87 0.88 0.89 0.9  0.91 0.92 0.93 0.94 0.95 0.96 0.97
 0.98 0.99 1.  ]


`x` is the data set.

In [7]:
x = np.array([1, 0, 1, 1, 1])

`def` is used to define a new function. In the following cell, the likelihood function of a Bernoulli distribution is defined.

In [8]:
def bernoulli_likelihood(x, q):
    n = len(x)
    y = np.sum(x)
    return q ** y * (1.0 - q) ** (n - y)

The following cell creates a graph of the likelihood.

In [9]:
source = ColumnDataSource(
    data=dict(
        q = q,
        l = bernoulli_likelihood(x, q)
    )
)
hover = HoverTool(
    tooltips=[
        ('\u03B8', '@q{0.0000}'), 
        ('likelihood', '@l{0.0000}')
    ]
)
p = figure(plot_width=400, plot_height=300, x_range=(0, 1), y_range=(0, 0.1),
           tools=[hover], toolbar_location=None, title='Likelihood (Bernoulli Distribution)')
p.line('q', 'l', source=source, line_color='navy', line_width=2)
p.xaxis.axis_label = '\u03B8'
p.yaxis.axis_label = 'Likelihood'
p.xgrid.grid_line_color = p.ygrid.grid_line_color = p.outline_line_color = None
show(p)
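
Setting the derivative of the log-likelihood to zero gives the maximizer $\theta = y/n$ when $0 < y < n$. The sketch below locates the peak of the plotted likelihood on the same grid and compares it with $y/n$. The names `grid`, `x_obs`, `q_hat` and `q_mle` are introduced here only for illustration.

```python
import numpy as np

def bernoulli_likelihood(x, q):
    n = len(x)
    y = np.sum(x)
    return q ** y * (1.0 - q) ** (n - y)

grid = np.linspace(0.0, 1.0, 101)
x_obs = np.array([1, 0, 1, 1, 1])

# Grid point with the largest likelihood value
q_hat = grid[np.argmax(bernoulli_likelihood(x_obs, grid))]

# Analytical maximizer y / n
q_mle = x_obs.mean()

print(q_hat, q_mle)
```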

The following cell draws a plot to illustrate two kinds of prior distribution for the probability of success in the Bernoulli distribution.

In [10]:
source = ColumnDataSource(
    data=dict(
        q = q,
        uniform_pdf = st.uniform.pdf(q),
        beta_pdf = st.beta.pdf(q, 6, 6)
    )
)
hover = HoverTool(
    tooltips=[
        ('\u03B8', '@q{0.0000}'), 
        ('uniform', '@uniform_pdf{0.0000}'),
        ('beta', '@beta_pdf{0.0000}')
    ]
)
p = figure(plot_width=400, plot_height=300, x_range=(0, 1), y_range=(0, 2.8),
           tools=[hover], toolbar_location=None, title='Prior Distribution')
p.line('q', 'uniform_pdf', source=source, line_color='navy', line_width=2,
       legend_label='Uniform distribution')
p.line('q', 'beta_pdf', source=source, line_color='firebrick', line_width=2, line_dash='dashed',
       legend_label='Beta distribution')
p.xaxis.axis_label = '\u03B8'
p.yaxis.axis_label = 'Probability density'
p.legend.location = 'bottom_center'
p.legend.click_policy = 'hide'
p.legend.border_line_color = p.xgrid.grid_line_color = p.ygrid.grid_line_color = p.outline_line_color = None
show(p)

The <font color=red>uniform distribution</font> $\text{Uniform}(a, b)$ is 

$$
 p(x|a,b) = 
 \begin{cases}
 \frac1{b-a}, & (a\leqq x \leqq b); \\
 0, & (\text{otherwise}).
 \end{cases}
$$

In the above figure, we set $a=0$ and $b=1$. `st.uniform.pdf(x, loc=a, scale=b-a)` in `scipy.stats` computes the pdf of the uniform distribution $\text{Uniform}(a, b)$.

The <font color=red>beta distribution</font> $\text{Beta}(\alpha, \beta)$ is 

$$
 p(x|\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},\ 0\leqq x\leqq 1,
$$

where $B(\alpha,\beta)$ is the beta function:

$$
 B(\alpha,\beta) = \int_0^1x^{\alpha-1}(1-x)^{\beta-1}dx.
$$

`st.beta.pdf(x, a, b)` in `scipy.stats` computes the pdf of the beta distribution $\text{Beta}(a, b)$.
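
To connect the two density formulas with the `scipy.stats` calls, the following sketch evaluates each formula directly and compares it with the library result. The parameter values for the uniform distribution ($a=2$, $b=5$) and the evaluation grids are assumptions made for illustration. `scipy.special.beta` computes the beta function $B(\alpha,\beta)$.

```python
import numpy as np
import scipy.special as sp
import scipy.stats as st

# Uniform(a, b) with a = 2 and b = 5 (illustrative values)
a_u, b_u = 2.0, 5.0
u_grid = np.linspace(2.5, 4.5, 5)
uniform_by_formula = np.full(u_grid.shape, 1.0 / (b_u - a_u))
uniform_by_scipy = st.uniform.pdf(u_grid, loc=a_u, scale=b_u - a_u)

# Beta(6, 6), the prior plotted above
alpha, beta = 6.0, 6.0
x_grid = np.linspace(0.05, 0.95, 19)
beta_by_formula = (x_grid ** (alpha - 1.0) * (1.0 - x_grid) ** (beta - 1.0)
                   / sp.beta(alpha, beta))
beta_by_scipy = st.beta.pdf(x_grid, alpha, beta)

print(np.allclose(uniform_by_formula, uniform_by_scipy))
print(np.allclose(beta_by_formula, beta_by_scipy))
```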

## Beta Distribution
---

The shape of a beta distribution depends on two parameters: $\alpha$ and $\beta$. The following cell creates a Bokeh interactive plot in which you may play around with various parameter settings. Use two sliders labeled "$\alpha$" and "$\beta$" to see how these parameters affect the shape of the distribution.

In [11]:
def beta_pdf_plot(doc):
    slider_a = Slider(value=1.0, start=0.1, end=10.0, step=0.1, title='\u03B1')
    slider_b = Slider(value=1.0, start=0.1, end=10.0, step=0.1, title='\u03B2')
    
    x = np.linspace(0.0, 1.0, 1001)
    source = ColumnDataSource(data=dict(x=x, y=st.beta.pdf(x, 1.0, 1.0)))
    hover = HoverTool(tooltips=[('x', '@x{0.0000}'), ('density', '@y{0.0000}')])
    p = figure(plot_width=400, plot_height=300, x_range=(0, 1), y_range=(0, 5),
               tools=[hover], toolbar_location=None)
    p.line('x', 'y', source=source, line_color='navy', line_width=2)
    p.xaxis.axis_label = 'Beta distribution'
    p.yaxis.axis_label = 'Probability density'
    p.xgrid.grid_line_color = p.ygrid.grid_line_color = p.outline_line_color = None   

    def update_pdf(attr, old, new):
        a = slider_a.value
        b = slider_b.value
        source.data['y'] = st.beta.pdf(x, a, b)

    for params in [slider_a, slider_b]:
        params.on_change('value', update_pdf)
    
    doc.add_root(column(row(slider_a, slider_b, width=400), p))

# Your local jupyter server must be 'http://localhost:8888/'
show(beta_pdf_plot)

## Derivation of the Posterior Distribution of $\theta$
---

Suppose the prior distribution is $\text{Beta}(\alpha_0,\beta_0)$.

The posterior distribution of $\theta$ is given by

\begin{align*}
 p(\theta|D) &\propto p(D|\theta)p(\theta) \\
 &\propto \theta^{y}(1-\theta)^{n-y} \times \theta^{\alpha_0-1}(1-\theta)^{\beta_0-1} \\
 &\propto \theta^{y+\alpha_0-1}(1-\theta)^{n-y+\beta_0-1} \\
 &\propto \theta^{\alpha_\star-1}(1-\theta)^{\beta_\star-1},\\
 \alpha_\star &=y+\alpha_0,\quad \beta_\star=n-y+\beta_0.
\end{align*}

This is the beta distribution $\text{Beta}(\alpha_\star,\beta_\star)$.

As you may notice, the posterior distribution belongs to the same family of distributions as the prior. This type of prior distribution is called the <font color=red>natural conjugate prior distribution</font>.
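
The derivation can be checked numerically: multiply the likelihood by the prior density on a grid, normalize the product so that it integrates to one, and compare it with the pdf of $\text{Beta}(\alpha_\star,\beta_\star)$. This is a minimal sketch; the data summary ($n=10$, $y=5$) and the hyper-parameters ($\alpha_0=2$, $\beta_0=3$) are assumed values.

```python
import numpy as np
import scipy.stats as st

# Hypothetical data summary and hyper-parameters
n_obs, y_obs = 10, 5
alpha_0, beta_0 = 2.0, 3.0

theta = np.linspace(0.0, 1.0, 1001)

# Unnormalized posterior: likelihood times prior density
unnormalized = (theta ** y_obs * (1.0 - theta) ** (n_obs - y_obs)
                * st.beta.pdf(theta, alpha_0, beta_0))

# Normalize with the trapezoidal rule so that the curve integrates to one
area = np.sum(0.5 * (unnormalized[1:] + unnormalized[:-1]) * np.diff(theta))
grid_posterior = unnormalized / area

# Closed-form posterior Beta(alpha_star, beta_star)
alpha_star = y_obs + alpha_0
beta_star = n_obs - y_obs + beta_0
exact_posterior = st.beta.pdf(theta, alpha_star, beta_star)

# Largest absolute gap between the two curves
print(np.max(np.abs(grid_posterior - exact_posterior)))
```

The printed gap should be negligible, which confirms that the normalizing constant dropped by $\propto$ is exactly $1/B(\alpha_\star,\beta_\star)$.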

## A Numerical Example of the Posterior Distribution
---

Here we generate pseudo-random numbers from the Bernoulli distribution with $\theta=\frac12$. `st.bernoulli.rvs(prob0, size=n)` generates `n` pseudo-random numbers from the Bernoulli distribution with probability of success `prob0`.

In [12]:
prob0 = 0.5
n = 10
np.random.seed(99)
data = st.bernoulli.rvs(prob0, size=n)
print(data)

[1 0 1 0 1 1 0 0 1 0]


Next, let us see how the hyper-parameters $(\alpha_0, \beta_0)$ and the sample size $n$ affect the shape of the posterior distribution. `bernoulli_posterior_plot` creates an interactive plot with three sliders: two for tuning the hyper-parameters and one for the sample size. `a0` and `b0` in `bernoulli_posterior_plot` are $\alpha_0$ and $\beta_0$, respectively.

In [13]:
def bernoulli_posterior_plot(doc):
    slider_a = Slider(value=1.0, start=0.1, end=10.0, step=0.1, title='\u03B1_0')
    slider_b = Slider(value=1.0, start=0.1, end=10.0, step=0.1, title='\u03B2_0')
    slider_n = Slider(value=10, start=1, end=250, step=1, title='Sample size')
    
    prob0 = 0.5
    a0 = slider_a.value
    b0 = slider_b.value
    n = slider_n.value
    x = np.linspace(0.0, 1.0, 1001)
    np.random.seed(99)
    data = st.bernoulli.rvs(prob0, size=250)
    y = data[:n].sum()
    a_star = y + a0
    b_star = n - y + b0
    source = ColumnDataSource(
        data = dict(
            x = x,
            prior = st.beta.pdf(x, a0, b0),
            posterior = st.beta.pdf(x, a_star, b_star)
        )
    )
    
    hover = HoverTool(
        tooltips = [
            ('\u03B8', '@x{0.0000}'), 
            ('prior', '@prior{0.0000}'),
            ('posterior', '@posterior{0.0000}')
        ]
    )
    p = figure(plot_width=600, plot_height=300, x_range=(0, 1),
               tools=[hover], toolbar_location=None)
    p.line('x', 'posterior', source=source, line_color='navy', line_width=2,
          legend_label='Posterior distribution')
    p.line('x', 'prior', source=source, line_color='firebrick', line_width=2, line_dash='dashed',
          legend_label='Prior distribution')
    p.xaxis.axis_label = '\u03B8'
    p.yaxis.axis_label = 'Probability density'
    p.add_layout(p.legend[0], 'right')
    p.legend.click_policy = 'hide'
    p.legend.border_line_color = p.xgrid.grid_line_color = p.ygrid.grid_line_color = p.outline_line_color = None

    def update_pdf(attr, old, new):
        a0 = slider_a.value
        b0 = slider_b.value
        n = slider_n.value
        y = data[:n].sum()
        a_star = y + a0
        b_star = n - y + b0
        source.data['prior'] = st.beta.pdf(x, a0, b0)
        source.data['posterior'] = st.beta.pdf(x, a_star, b_star)

    for params in [slider_a, slider_b, slider_n]:
        params.on_change('value', update_pdf)
    
    doc.add_root(column(row(slider_a, slider_b, slider_n, width=500), p))

# Your local jupyter server must be 'http://localhost:8888/'
show(bernoulli_posterior_plot)
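
What the sliders show can also be summarized without interaction. The posterior mean $\alpha_\star/(\alpha_\star+\beta_\star)$ is a weighted average of the prior mean and the sample mean, and the weight on the prior shrinks as $n$ grows. The sketch below checks this identity and prints the posterior standard deviation for three sample sizes. The hyper-parameters and the artificial alternating 0/1 sequence are assumptions for illustration, not the simulated data used above.

```python
import numpy as np
import scipy.stats as st

# Hypothetical hyper-parameters and an artificial 0/1 sequence
alpha_0, beta_0 = 2.0, 5.0
sequence = np.tile([1, 0], 125)  # 250 observations, half of them ones

posterior_sds = []
for n_obs in [10, 50, 250]:
    y_obs = sequence[:n_obs].sum()
    alpha_star = y_obs + alpha_0
    beta_star = n_obs - y_obs + beta_0

    # Posterior mean as a weighted average of the prior mean and the sample mean
    weight = (alpha_0 + beta_0) / (alpha_0 + beta_0 + n_obs)
    prior_mean = alpha_0 / (alpha_0 + beta_0)
    sample_mean = y_obs / n_obs
    weighted = weight * prior_mean + (1.0 - weight) * sample_mean

    assert np.isclose(weighted, st.beta.mean(alpha_star, beta_star))
    posterior_sds.append(st.beta.std(alpha_star, beta_star))
    print(n_obs, weighted, posterior_sds[-1])
```

For this sequence the posterior standard deviation decreases as $n$ increases, so the posterior concentrates and the influence of the prior fades.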

Bayes' theorem is rearranged as

\begin{equation*}
 \frac{p(\theta|D)}{p(\theta)} = \frac{p(D|\theta)}{p(D)}.
\end{equation*}

Therefore

\begin{equation*}
\frac{p(\theta|D)}{p(\theta)} \gtreqqless 1\quad \text{if and only if}\quad \frac{p(D|\theta)}{p(D)} \gtreqqless 1.
\end{equation*}

Since

\begin{equation*}
\begin{cases}
 \displaystyle
 \frac{p(\theta|D)}{p(\theta)} > 1, & \text{plausibility of $\theta$ is increased}; \\
 \\
 \displaystyle
 \frac{p(\theta|D)}{p(\theta)} < 1, & \text{plausibility of $\theta$ is decreased},
\end{cases}
\end{equation*}

the plausibility of a specific value of $\theta$ increases or decreases according to whether its likelihood $p(D|\theta)$ is higher or lower than the marginal likelihood $p(D)$.
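
For the beta prior, the marginal likelihood has the closed form $p(D) = B(\alpha_\star,\beta_\star)/B(\alpha_0,\beta_0)$, so both sides of the rearranged Bayes' theorem can be evaluated directly. The following sketch does so at three values of $\theta$. The data summary ($n=10$, $y=5$) and the $\text{Beta}(1,1)$ prior are assumed for illustration.

```python
import numpy as np
import scipy.special as sp
import scipy.stats as st

# Hypothetical data summary and Beta(1, 1) prior
n_obs, y_obs = 10, 5
alpha_0, beta_0 = 1.0, 1.0
alpha_star, beta_star = y_obs + alpha_0, n_obs - y_obs + beta_0

# Marginal likelihood p(D) = B(alpha_star, beta_star) / B(alpha_0, beta_0)
marginal = sp.beta(alpha_star, beta_star) / sp.beta(alpha_0, beta_0)

theta = np.array([0.2, 0.5, 0.8])
likelihood = theta ** y_obs * (1.0 - theta) ** (n_obs - y_obs)

# Left-hand side: posterior density over prior density
lhs = st.beta.pdf(theta, alpha_star, beta_star) / st.beta.pdf(theta, alpha_0, beta_0)
# Right-hand side: likelihood over marginal likelihood
rhs = likelihood / marginal

print(lhs)
print(rhs)
```

With these values the ratio exceeds 1 at $\theta=0.5$ and falls below 1 at $\theta=0.2$ and $\theta=0.8$, so the data raise the plausibility of the former and lower that of the latter two.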

## Bayesian Inference with the Posterior Distribution
----

The posterior distribution $p(\theta|D)$ embodies all available information about unknown parameter(s), $\theta$. When the number of parameters to be analyzed is relatively small, displaying graphs of all (marginal) posterior distributions may be sufficient to convey useful insights on the parameters to readers.

However, when we need to analyze many parameters, it is impractical and pointless to show all graphs on the parameters in an article or report. In practice, we calculate and report several "summary statistics" that show us key characteristics of the posterior distribution. We call them the <font color=red>posterior statistics</font>.

## Point Estimation
---

On many occasions, we need to report one particular value of the parameter we regard as the most plausible guess. This type of value is called an  <font color=red>estimate</font> and a procedure to obtain an estimate is called  <font color=red>point estimation</font>.

In Bayesian statistics, an estimate of the parameter is defined as a value that minimizes the expected loss:

\begin{align*}
 \delta_\star
 &= \arg\min_{\delta}\mathrm{E}_{\theta}[L(\theta,\delta)|D] \\
 &= \arg\min_{\delta}\int_{\Theta}L(\theta,\delta)p(\theta|D)d\theta,
\end{align*}

where $L$ is the <font color=red>loss function</font> and $\Theta$ is a set of all possible values of $\theta$ (<font color=red>parameter space</font>).

| loss function  | $L(\theta,\delta)$            | point estimate   |
|:---------------|:-----------------------------:|:-----------------|
| quadratic loss | $(\theta-\delta)^2$           | posterior mean   |
| absolute loss  | $\lvert\theta-\delta\rvert$   | posterior median |
| 0-1 loss       | $1-\mathbf{1}_\theta(\delta)$ | posterior mode   |

Let us make a figure of the three loss functions where the true value of $\theta$ is $\frac12$.

In [14]:
q = np.linspace(0, 1, 250)
source = ColumnDataSource(
    data = dict(
        q = q,
        loss_L2 = (q - 0.5)**2,
        loss_L1 = np.abs(q - 0.5),
        loss_01 = np.ones(q.shape),
    )
)
p = figure(plot_width=400, plot_height=300, x_range=(0, 1), title='Loss Function',
           toolbar_location=None)
p.line('q', 'loss_L2', source=source, line_color='navy', line_width=2,
       legend_label='Quadratic loss')
p.line('q', 'loss_L1', source=source, line_color='firebrick', line_width=2, line_dash='dashed',
       legend_label='Absolute loss')
p.line('q', 'loss_01', source=source, line_color='green', line_width=2, line_dash='dashdot',
       legend_label='0-1 loss')
p.line([0.5, 0.5], [0.0, 1.0], line_color='green', line_dash='dotted')
p.circle(0.5, 1.0, size=8, line_color='green', fill_color='white')
p.circle(0.5, 0.0, size=8, line_color='green', fill_color='green')
p.xaxis.axis_label = 'Point estimate \u03B4'
p.yaxis.axis_label = 'Loss'
p.legend.location = (20, 130)
p.legend.click_policy = 'hide'
p.legend.border_line_color = p.xgrid.grid_line_color = p.ygrid.grid_line_color = p.outline_line_color = None
show(p)
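
The correspondence between loss functions and point estimates in the table above can be checked numerically. The sketch below approximates the expected loss on a grid for an illustrative $\text{Beta}(2,5)$ posterior (an assumed example) and picks the candidate $\delta$ with the smallest expected loss. For the 0-1 loss it simply takes the grid point with the highest posterior density.

```python
import numpy as np
import scipy.stats as st

a_demo, b_demo = 2.0, 5.0  # illustrative Beta(2, 5) posterior

# Grid over theta with trapezoidal integration weights
theta = np.linspace(0.0, 1.0, 2001)
weights = np.full(theta.shape, theta[1] - theta[0])
weights[[0, -1]] *= 0.5
density = st.beta.pdf(theta, a_demo, b_demo) * weights

# Candidate point estimates delta (one row per candidate)
delta = np.linspace(0.0, 1.0, 1001)[:, np.newaxis]
expected_quadratic = np.sum((theta - delta) ** 2 * density, axis=1)
expected_absolute = np.sum(np.abs(theta - delta) * density, axis=1)

delta_quadratic = delta[np.argmin(expected_quadratic), 0]
delta_absolute = delta[np.argmin(expected_absolute), 0]
# Under 0-1 loss the minimizer is the point of highest posterior density
delta_01 = theta[np.argmax(st.beta.pdf(theta, a_demo, b_demo))]

print(delta_quadratic, st.beta.mean(a_demo, b_demo))
print(delta_absolute, st.beta.median(a_demo, b_demo))
print(delta_01, (a_demo - 1.0) / (a_demo + b_demo - 2.0))
```

Each pair of printed values should agree up to the grid resolution.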

## Interval Estimation
---

1. <font color=red>Credible Interval (CI)</font> <br>
The credible interval of $\theta$ is an interval $[a_c, b_c]$ such that

  * $P(a_c \leqq \theta \leqq b_c|D) = 1-c$,
  * $P(\theta < a_c|D) = \frac{c}2$ and $P(\theta > b_c|D) = \frac{c}2$.

2. <font color=red>Highest Posterior Density interval (HPDI)</font>

   The highest posterior density interval of $\theta$ is an interval $[a_c, b_c]$ such that

  * $P(a_c \leqq \theta \leqq b_c|D) = 1-c$,
  * for any pair $(\theta,\theta^{\prime})$ such that $\theta\in[a_c, b_c]$ and $\theta^{\prime}\notin [a_c, b_c]$, $p(\theta|D) > p(\theta^{\prime}|D)$ must hold.

   In particular, if the distribution is unimodal (it has a unique mode), the HPDI must satisfy

\begin{align*}
 P(a_c \leqq \theta \leqq b_c|D) &= 1-c, \\
 p(a_c|D) &= p(b_c|D).
\end{align*}

Here we import a module called `scipy.optimize`. It includes functions for numerical optimization.

In [15]:
import scipy.optimize as opt

`pandas` is a Python package for data analysis. `display` is used to show a Pandas dataframe in the Jupyter Notebook.

In [16]:
import pandas as pd
from IPython.display import display

The function `beta_hpdi` returns the HPD interval of the beta distribution. The nested function `hpdi_conditions` returns the vector

$$
 \begin{bmatrix}
 P(\theta\leqq b|D) - P(\theta \leqq a|D) - p \\
 p(b|D) - p(a|D)
 \end{bmatrix},
$$

where $p=1-c$ and $P(\theta\leqq b|D) - P(\theta \leqq a|D) = P(a \leqq \theta \leqq b|D)$.

So if we find a pair $(a, b)$ that makes the above vector equal to zero, the interval $[a, b]$ is regarded as the $100p\%$ HPDI. This is done by the function `root` in `scipy.optimize`.

The general syntax of `root` is as follows.
```Python
root(f, initial_value, args=(arguments_of_f))
```
`root` finds a root of `f`, i.e., a vector $x$ such that $f(x)=0$, by using a search algorithm. `initial_value` is the starting point of the search algorithm. `args` is a tuple containing extra arguments to be passed to `f`.
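
As a small illustration of this calling convention, the following sketch solves the linear system $v_0 + v_1 = 3$, $v_0 - v_1 = 1$ with `root`. The function `linear_system` and its parameters `c0` and `c1` are hypothetical names introduced only for this example.

```python
import numpy as np
import scipy.optimize as opt

def linear_system(v, c0, c1):
    # Residuals of v[0] + v[1] = c0 and v[0] - v[1] = c1
    return np.array([v[0] + v[1] - c0, v[0] - v[1] - c1])

# Start the search at (0, 0); c0 = 3 and c1 = 1 are passed through `args`
solution = opt.root(linear_system, [0.0, 0.0], args=(3.0, 1.0))

print(solution.success, solution.x)
```

`root` returns a result object. Its attribute `x` holds the solution, here approximately $(2, 1)$, and `success` reports whether the search converged.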

In [17]:
def beta_hpdi(ci0, alpha, beta, prob):
    def hpdi_conditions(v, a, b, p):
        eq1 = st.beta.cdf(v[1], a, b) - st.beta.cdf(v[0], a, b) - p
        eq2 = st.beta.pdf(v[1], a, b) - st.beta.pdf(v[0], a, b)
        return np.hstack((eq1, eq2))
    return opt.root(hpdi_conditions, ci0, args=(alpha, beta, prob)).x

The following cell computes the 90% credible interval `ci` and the 90% highest posterior density interval `hpdi` of $\text{Beta}(2,5)$ as an illustration. `interval` is a method to compute the equal-tailed credible interval, which is already included in `scipy.stats`.

In [18]:
a = 2.0
b = 5.0
prob = 0.9
ci = st.beta.interval(prob, a, b)
hpdi = beta_hpdi(ci, a, b, prob)
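
It is worth confirming that the interval returned by `beta_hpdi` really satisfies the two HPDI conditions. The sketch below repeats the definition of `beta_hpdi` so that it runs on its own, then checks the coverage, the density at both endpoints, and the interval lengths. The variable names ending in `_demo` are introduced here for illustration.

```python
import numpy as np
import scipy.optimize as opt
import scipy.stats as st

def beta_hpdi(ci0, alpha, beta, prob):
    def hpdi_conditions(v, a, b, p):
        eq1 = st.beta.cdf(v[1], a, b) - st.beta.cdf(v[0], a, b) - p
        eq2 = st.beta.pdf(v[1], a, b) - st.beta.pdf(v[0], a, b)
        return np.hstack((eq1, eq2))
    return opt.root(hpdi_conditions, ci0, args=(alpha, beta, prob)).x

a_demo, b_demo, prob_demo = 2.0, 5.0, 0.9
ci_demo = st.beta.interval(prob_demo, a_demo, b_demo)
hpdi_demo = beta_hpdi(ci_demo, a_demo, b_demo, prob_demo)

# Posterior probability covered by each interval
ci_coverage = st.beta.cdf(ci_demo[1], a_demo, b_demo) - st.beta.cdf(ci_demo[0], a_demo, b_demo)
hpdi_coverage = st.beta.cdf(hpdi_demo[1], a_demo, b_demo) - st.beta.cdf(hpdi_demo[0], a_demo, b_demo)

# Density at the two HPDI endpoints and the length of each interval
hpdi_densities = st.beta.pdf(hpdi_demo, a_demo, b_demo)
ci_length = ci_demo[1] - ci_demo[0]
hpdi_length = hpdi_demo[1] - hpdi_demo[0]

print(ci_coverage, hpdi_coverage)
print(hpdi_densities)
print(ci_length, hpdi_length)
```

Both intervals cover 90% of the posterior probability, but for this skewed distribution the HPDI is shorter because it shifts toward the region of higher density.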

Then we create a figure for comparison between `ci` and `hpdi`.

In [19]:
x = np.linspace(0.0, 1.0, 250)
ci_x =  np.linspace(ci[0], ci[1], 250)
hpdi_x = np.linspace(hpdi[0], hpdi[1], 250)
hpdi_height = st.beta.pdf(hpdi[0], a, b)
source = ColumnDataSource(
    data = dict(
        x = x,
        posterior = st.beta.pdf(x, a, b),
        ci_x = ci_x,
        ci_y1 = np.zeros(ci_x.shape),
        ci_y2 = st.beta.pdf(ci_x, a, b),
        hpdi_x = hpdi_x,
        hpdi_y1 = np.zeros(hpdi_x.shape),
        hpdi_y2 = st.beta.pdf(hpdi_x, a, b)
    )
)
hover = HoverTool(
    tooltips = [
        ('\u03B8', '@x{0.0000}'),
        ('posterior', '@posterior{0.0000}')
    ]
)

# Credible Interval (CI)
ci_plot = figure(plot_width=400, plot_height=300, x_range=(0, 1), y_range=(0, 2.8),
                 title='Credible Interval', tools=[hover], toolbar_location=None)
ci_plot.line('x', 'posterior', source=source, line_color='navy', line_width=2)
ci_plot.varea('ci_x', 'ci_y1', 'ci_y2', source=source, fill_color='#9ecae1', fill_alpha=0.5)
ci_plot.add_layout(Span(location=hpdi_height, dimension='width', line_color='firebrick'))
ci_plot.xaxis.axis_label = '\u03B8'
ci_plot.yaxis.axis_label = 'Probability density'
ci_plot.xgrid.grid_line_color = ci_plot.ygrid.grid_line_color = ci_plot.outline_line_color = None

# Highest Posterior Density Interval (HPDI)
hpdi_plot = figure(plot_width=400, plot_height=300, x_range=(0, 1), y_range=(0, 2.8),
                   title='Highest Posterior Density Interval', tools=[hover], toolbar_location=None)
hpdi_plot.line('x', 'posterior', source=source, line_color='navy', line_width=2)
hpdi_plot.varea('hpdi_x', 'hpdi_y1', 'hpdi_y2', source=source, fill_color='#9ecae1', fill_alpha=0.5)
hpdi_plot.add_layout(Span(location=hpdi_height, dimension='width', line_color='firebrick'))
hpdi_plot.xaxis.axis_label = '\u03B8'
hpdi_plot.yaxis.axis_label = 'Probability density'
hpdi_plot.xgrid.grid_line_color = hpdi_plot.ygrid.grid_line_color = hpdi_plot.outline_line_color = None

show(row(ci_plot, hpdi_plot))

`bernoulli_stats` computes posterior statistics (mean, median, mode, standard deviation, CI, HPDI). The following are standard methods for descriptive statistics:

+ `interval` - interval
+ `mean` - mean
+ `median` - median
+ `std` - standard deviation (the square root of the variance)

`DataFrame` converts a matrix (NumPy 2D array) into a Pandas dataframe.

In [20]:
def bernoulli_stats(data, a0, b0, prob):
    n = data.size
    sum_data = data.sum()
    a = sum_data + a0
    b = n - sum_data + b0
    mean_pi = st.beta.mean(a, b)
    median_pi = st.beta.median(a, b)
    mode_pi = (a - 1.0) / (a + b - 2.0)
    sd_pi = st.beta.std(a, b)
    ci_pi = st.beta.interval(prob, a, b)
    hpdi_pi = beta_hpdi(ci_pi, a, b, prob)
    stats = np.hstack((mean_pi, median_pi, mode_pi, sd_pi, ci_pi, hpdi_pi))
    stats = stats.reshape((1, 8))
    stats_string = ['mean', 'median', 'mode', 'sd', 'ci (lower)', 'ci (upper)', 'hpdi (lower)', 'hpdi (upper)']
    param_string = ['$\\theta$']
    results = pd.DataFrame(stats, index=param_string, columns=stats_string)
    return results, a, b

In the following cell, we generate 10 pseudo-random numbers from the Bernoulli distribution with $\theta=\frac12$ and compute the posterior statistics with `bernoulli_stats`.

In [21]:
prob0 = 0.5
n = 10
a0 = 1.0
b0 = 1.0
np.random.seed(99)
data = st.bernoulli.rvs(prob0, size=n)
prob = 0.95
results, a, b = bernoulli_stats(data, a0, b0, prob)
display(results)

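|          | mean | median | mode | sd       | ci (lower) | ci (upper) | hpdi (lower) | hpdi (upper) |
|:---------|:----:|:------:|:----:|:--------:|:----------:|:----------:|:------------:|:------------:|
| $\theta$ | 0.5  | 0.5    | 0.5  | 0.138675 | 0.233794   | 0.766206   | 0.233794     | 0.766206     |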


Note that the three point estimates (mean, median, mode) are all identical. Furthermore, the CI is also identical to the HPDI. This is because the posterior distribution is unimodal and symmetric around the mode.
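
For contrast, the point estimates no longer coincide when the posterior is skewed. A minimal sketch, assuming hypothetical data with 2 successes in 10 trials and the same $\text{Beta}(1,1)$ prior, which gives the posterior $\text{Beta}(3,9)$:

```python
import scipy.stats as st

# Hypothetical data summary: 2 successes in 10 trials, Beta(1, 1) prior
n_demo, y_demo = 10, 2
a_demo = y_demo + 1.0
b_demo = n_demo - y_demo + 1.0

mean_demo = st.beta.mean(a_demo, b_demo)
median_demo = st.beta.median(a_demo, b_demo)
mode_demo = (a_demo - 1.0) / (a_demo + b_demo - 2.0)

print(mean_demo, median_demo, mode_demo)
```

Here the mode (0.2) lies below the median, which in turn lies below the mean (0.25), reflecting the longer right tail of the distribution.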

## Bayes Factor
---

In statistics, either Bayesian or non-Bayesian, a hypothesis on the parameter(s) is a region or interval where the true value of the parameter is supposed to be located. In general, a hypothesis $H_i$ under which the true value of  $\theta$ is located in a region $S_i\subset\Theta$ is expressed as

\begin{equation*}
 H_i:\; \theta\in S_i, \quad i=0,1,2,\dots
\end{equation*}

In Bayesian statistics, plausibility of a hypothesis is measured by the posterior probability that the true value of $\theta$ is located in $S_i$, that is,

$$
 P(H_i|D) = P(\theta\in S_i|D) = \int_{S_i}p(\theta|D)d\theta.
$$

Competing hypotheses can be compared by the <font color=red>Bayes factor</font>, which is defined as

$$
 \text{Bayes factor} = \mathrm{B}_{ij}
 = \frac{P(H_i|D)}{P(H_j|D)}\div\frac{P(H_i)}{P(H_j)}
 = \frac{\text{Posterior odds ratio}}{\text{Prior odds ratio}}.
$$

| Rank | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Bayes factor $\mathrm{B}_{ij}$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Support for $H_j$ |
|:----:|:---------------------------------------------:|:-----------------------:|
|    0 | $0 < \log_{10}(\mathrm{B}_{ij})$              | Rejected                |
|    1 | $-\frac12 < \log_{10}(\mathrm{B}_{ij}) < 0$   | Barely worth mentioning |
|    2 | $ -1 < \log_{10}(\mathrm{B}_{ij}) < -\frac12$ | Substantial             |
|    3 | $-\frac32< \log_{10}(\mathrm{B}_{ij}) < -1$   | Strong                  |
|    4 | $-2 < \log_{10}(\mathrm{B}_{ij}) < -\frac32$  | Very strong             |
|    5 | $\log_{10}(\mathrm{B}_{ij}) < -2$             | Decisive                |

We want to check whether the true value of $\theta$ is more than 50% or not, i.e.,

$$
 \begin{cases}
 H_0:\ & \theta \leqq 0.5; \\
 H_1:\ & \theta > 0.5.
 \end{cases}
$$

When $\theta$ is the true vote share of a candidate, supporting $H_1$ implies that this candidate won the election.

Suppose the true $\theta$ is 60% and we generate 500 observations from the Bernoulli distribution with $\theta=60\%$.

In [22]:
p = 0.6
n = 500
a0 = 1.0
b0 = 1.0
np.random.seed(99)
data = st.bernoulli.rvs(p, size=n)
y = np.sum(data)
a_star = y + a0
b_star = n - y + b0
Prior_odds = st.beta.cdf(0.5, a0, b0) / (1.0 - st.beta.cdf(0.5, a0, b0))
Posterior_odds = st.beta.cdf(0.5, a_star, b_star) / (1.0 - st.beta.cdf(0.5, a_star, b_star))
Bayes_factor = Posterior_odds / Prior_odds
print([np.log10(Prior_odds), np.log10(Posterior_odds), np.log10(Bayes_factor)])

[0.0, -3.474580539092309, -3.474580539092309]


In this example, the prior odds ratio is 1, so the Bayes factor is identical to the posterior odds ratio. Since $\log_{10}(\mathrm{B}_{01}) < -2$, the Bayes factor suggests that $H_1$ is decisively supported by the evidence. Thus we can safely call the election for this candidate.
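
The scale in the table above can be encoded as a small helper that maps $\log_{10}(\mathrm{B}_{ij})$ to the support category for $H_j$. The function name `support_for_hj` is hypothetical, and the treatment of the boundary values (which the table leaves open) is a convention assumed here.

```python
def support_for_hj(log10_bf):
    """Map log10(B_ij) to the support category for H_j in the table above."""
    # Boundary values are assigned to the weaker category (an assumed convention)
    if log10_bf > 0.0:
        return 'Rejected'
    if log10_bf >= -0.5:
        return 'Barely worth mentioning'
    if log10_bf >= -1.0:
        return 'Substantial'
    if log10_bf >= -1.5:
        return 'Strong'
    if log10_bf >= -2.0:
        return 'Very strong'
    return 'Decisive'

# log10 of the Bayes factor B_01 reported in the output above
print(support_for_hj(-3.474580539092309))
```

With $i=0$ and $j=1$, the returned category describes the support for $H_1$.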

## Savage-Dickey Density Ratio
---

The <font color=red>Savage-Dickey density ratio</font> (<font color=red>SDDR</font>) is the Bayes factor specifically designed for 

$$
 \begin{cases}
 H_0:\ & \theta = \theta_0; \\
 H_1:\ & \theta \ne \theta_0.
 \end{cases}
$$

The SDDR is based on the spike-and-slab prior:

$$
 p(\theta) = p_0 \delta(\theta-\theta_0) + (1-p_0)f(\theta),\quad 0 < p_0 < 1,
$$

where the choice of $p_0$ is arbitrary. The SDDR is given by

$$
 \mathrm{B}_{01} = \frac{f(\theta_0|D)}{f(\theta_0)},
$$

where $f(\theta|D)$ is the posterior distribution when $\theta\ne\theta_0$, i.e.,

$$
 f(\theta|D)= \frac{p(D|\theta)f(\theta)}{\int_{\Theta}p(D|\theta)f(\theta)d\theta}.
$$

As an illustration, we test the following hypotheses 

$$
 \begin{cases}
 H_0:\ & \theta = 0.5; \\
 H_1:\ & \theta \ne 0.5.
 \end{cases}
$$

with the SDDR. We use $\text{Beta}(1,1)$ (the uniform distribution between 0 and 1) as $f$ in the spike-and-slab prior.

In [23]:
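# B_01 = f(theta_0|D) / f(theta_0); the Beta(1, 1) prior density at 0.5 equals 1
SDDR = st.beta.pdf(0.5, a_star, b_star) / st.beta.pdf(0.5, a0, b0)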
print(np.log10(SDDR))

-1.261090030594163


The SDDR suggests that $H_1$ is strongly supported. So we may conclude that the true value of $\theta$ is not equal to 50%.
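
With a beta slab, the SDDR can also be read as a ratio of marginal likelihoods, $p(D|H_0)/p(D|H_1)$, where $p(D|H_0)=\theta_0^{y}(1-\theta_0)^{n-y}$ and $p(D|H_1)=B(\alpha_\star,\beta_\star)/B(\alpha_0,\beta_0)$. The following sketch checks that both expressions give the same value on the log scale. The counts ($n=500$, $y=290$) are hypothetical and are not the simulated data used above.

```python
import numpy as np
import scipy.special as sp
import scipy.stats as st

# Hypothetical counts and a Beta(1, 1) slab
n_obs, y_obs = 500, 290
alpha_0, beta_0 = 1.0, 1.0
theta_0 = 0.5
alpha_star, beta_star = y_obs + alpha_0, n_obs - y_obs + beta_0

# log10 of the SDDR: posterior density over prior density at theta_0
log10_sddr = (st.beta.logpdf(theta_0, alpha_star, beta_star)
              - st.beta.logpdf(theta_0, alpha_0, beta_0)) / np.log(10.0)

# log10 of p(D|H_0) / p(D|H_1), using log beta functions for numerical stability
log_m0 = y_obs * np.log(theta_0) + (n_obs - y_obs) * np.log(1.0 - theta_0)
log_m1 = sp.betaln(alpha_star, beta_star) - sp.betaln(alpha_0, beta_0)
log10_ratio = (log_m0 - log_m1) / np.log(10.0)

print(log10_sddr, log10_ratio)
```

Working with `logpdf` and `betaln` avoids the very small intermediate numbers that appear when $n$ is large.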