Inference for binomial proportion (Matlab/Python)

Algae status is monitored at 274 sites in Finnish lakes and rivers. The observations of the 2008 algae status at each site are given in the file algae.txt ('0': no algae, '1': algae present). Let $\pi$ be the probability that a monitoring site has detectable blue-green algae levels.

Use a binomial model for the observations and a $\mathrm{Beta}(2, 10)$ prior for $\pi$ in Bayesian inference. Formulate the Bayesian model (likelihood $p(y|\pi)$ and prior $p(\pi)$) and use it to answer the following questions:

Hint: With a conjugate prior, the posterior has a closed form: it is a Beta distribution (see the equations in the book). You can then use the betapdf, betacdf and betainv functions in Matlab, or the methods of scipy.stats.beta in Python.
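As a quick orientation to the scipy.stats.beta interface, the sketch below summarizes the $\mathrm{Beta}(2, 10)$ prior. The comments note which Matlab function from the hint each call corresponds to (`pdf` for betapdf, `cdf` for betacdf, `ppf` for betainv); the evaluation point 0.2 is just an illustrative choice.

```python
from scipy import stats

# The Beta(2, 10) prior from the exercise, as a frozen scipy distribution
prior = stats.beta(2, 10)

dens = prior.pdf(0.2)               # density at pi = 0.2 (Matlab: betapdf(0.2, 2, 10))
below = prior.cdf(0.2)              # P(pi <= 0.2) under the prior (Matlab: betacdf(0.2, 2, 10))
lo, hi = prior.ppf([0.025, 0.975])  # 2.5% and 97.5% quantiles (Matlab: betainv)

print(f"prior mean: {prior.mean():.4f}")
print(f"prior density at 0.2: {dens:.4f}")
print(f"prior P(pi <= 0.2): {below:.4f}")
print(f"prior 95% central interval: ({lo:.4f}, {hi:.4f})")
```

The prior mean is $2/(2+10) = 1/6$, so this prior already expresses a belief that algae are present at a minority of sites.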

a) What can you say about the value of the unknown $\pi$ according to the observations and your prior knowledge? Summarize your results using point and interval estimates.

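Let $y$ be the number of sites with algae among the $n$ observed sites. The likelihood and prior are

$$p(y|\pi) = \mathrm{Bin}(y|n, \pi) = {n \choose y} \pi^y(1-\pi)^{n-y}$$

$$p(\pi) = \mathrm{Beta}(\pi|2, 10) = \frac{\pi^{2-1}(1-\pi)^{10-1}}{B(2,10)}$$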

Bayes' rule gives the posterior:

$$p(\pi|y) = \frac{p(y|\pi)p(\pi)}{p(y)}$$
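Bayes' rule can also be applied numerically, without using conjugacy, by evaluating likelihood × prior on a grid of $\pi$ values and normalizing by a numerical estimate of $p(y)$. The sketch below assumes $y = 44$ and $n = 274$: the value 0.160584 printed by the notebook code is $y/n$, and $44/274 \approx 0.160584$. The grid of 1000 points is an arbitrary choice.

```python
import numpy as np
from scipy import stats

# Assumed data: y = 44 sites with algae out of n = 274 (inferred from y/n = 0.160584)
y, n = 44, 274

# Evaluate prior and likelihood on an evenly spaced grid over (0, 1)
grid = np.linspace(0.0005, 0.9995, 1000)
prior = stats.beta.pdf(grid, 2, 10)
likelihood = stats.binom.pmf(y, n, grid)

# Bayes' rule: p(pi|y) = p(y|pi) p(pi) / p(y), with p(y) approximated by a Riemann sum
dx = grid[1] - grid[0]
unnormalized = likelihood * prior
evidence = unnormalized.sum() * dx
posterior = unnormalized / evidence

# Posterior mean computed from the grid approximation
grid_mean = (grid * posterior).sum() * dx
print(f"grid posterior mean: {grid_mean:.4f}")
```

If the conjugate derivation is right, the grid mean should agree with the mean of $\mathrm{Beta}(y+2, n-y+10)$, i.e. $46/286 \approx 0.1608$.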

Substituting the definitions of $p(y|\pi)$ and $p(\pi)$:

$$p(\pi|y) = \frac{{n \choose y}  \pi^y(1-\pi)^{n-y}\frac{\pi^{2-1}(1-\pi)^{10-1}}{B(2,10)}}{p(y)}$$

Collecting the constant factors (those that do not depend on $\pi$) in front:

$$p(\pi|y) = \frac{{n \choose y}}{p(y)B(2,10)} \pi^y(1-\pi)^{n-y}\pi^{2-1}(1-\pi)^{10-1}$$

Since this constant does not depend on $\pi$, it can be dropped:

$$p(\pi|y) \propto \pi^y(1-\pi)^{n-y}\pi^{2-1}(1-\pi)^{10-1}$$

Combining the powers of $\pi$ and $(1-\pi)$:

$$p(\pi|y) \propto \pi^{y+2-1}(1-\pi)^{n-y+10-1}$$

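This is the kernel of a $\mathrm{Beta}(y+2, n-y+10)$ distribution, so the posterior is $p(\pi|y) = \mathrm{Beta}(\pi|y+2, n-y+10)$, which we can summarize numerically: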
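As a worked check on the dropped constant: a $\mathrm{Beta}(\alpha, \beta)$ density has normalizer $B(\alpha, \beta)$, so the constant factor collected in front of the kernel must equal $1/B(y+2, n-y+10)$. This yields the marginal likelihood $p(y)$ in closed form, as well as the posterior mean and mode that serve as point estimates:

$$\frac{{n \choose y}}{p(y)B(2,10)} = \frac{1}{B(y+2,\, n-y+10)} \quad\Longrightarrow\quad p(y) = {n \choose y}\frac{B(y+2,\, n-y+10)}{B(2,10)}$$

$$E[\pi|y] = \frac{y+2}{n+12}, \qquad \mathrm{mode}(\pi|y) = \frac{y+1}{n+10}$$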

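```python
import numpy as np
import scipy.stats as stats

# Each entry of algae.txt is 0 (no algae) or 1 (algae present)
data = np.loadtxt('algae.txt')
y = data.sum()   # number of sites with algae
n = data.size    # number of monitored sites

# Posterior parameters: Beta(y + 2, n - y + 10)
a = y + 2
b = n - y + 10

# y/n is the sample proportion; the posterior mode is (a - 1) / (a + b - 2)
print(f"sample proportion y/n: {y / n:f}")
print(f"posterior mode: {(a - 1) / (a + b - 2):f}")
print(f"posterior mean: {a / (a + b):f}")
lo, hi = stats.beta.interval(0.95, a, b)
print(f"95% central interval: ({lo:f}, {hi:f})")
```

```
sample proportion y/n: 0.160584
posterior mode: 0.158451
posterior mean: 0.160839
95% central interval: (0.120656, 0.205512)
```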
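The central interval puts 2.5% of the posterior mass in each tail. A common alternative, not used in the original notebook, is the highest posterior density (HPD) interval: the shortest interval containing 95% of the mass. The sketch below finds it by a bounded one-dimensional search over the lower-tail probability. `hpd_interval` is a hypothetical helper name, and $y = 44$, $n = 274$ are inferred from the reported sample proportion $44/274 \approx 0.160584$.

```python
from scipy import stats
from scipy.optimize import minimize_scalar


def hpd_interval(dist, mass=0.95):
    """Shortest interval holding `mass` probability, for a unimodal distribution.

    Searches over the lower-tail probability p in (0, 1 - mass) and returns
    (ppf(p), ppf(p + mass)) with the smallest width.
    """
    def width(p):
        return dist.ppf(p + mass) - dist.ppf(p)

    res = minimize_scalar(width, bounds=(0.0, 1.0 - mass), method="bounded")
    return dist.ppf(res.x), dist.ppf(res.x + mass)


# Posterior Beta(y + 2, n - y + 10), assuming y = 44 and n = 274
post = stats.beta(44 + 2, 274 - 44 + 10)
lo, hi = hpd_interval(post)
c_lo, c_hi = post.interval(0.95)
print(f"95% HPD interval:     ({lo:.4f}, {hi:.4f})")
print(f"95% central interval: ({c_lo:.4f}, {c_hi:.4f})")
```

Because this Beta posterior is mildly right-skewed, the HPD interval should be slightly shorter than the central one and shifted a little toward smaller $\pi$; with this much data the two are expected to be close.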


b) Is the proportion $\pi$ of monitoring sites with detectable algae levels smaller than $\pi_0 = 0.2$, the value known from historical records?
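One way to approach this question is to compute the posterior probability $P(\pi < \pi_0 | y)$, which is the posterior CDF evaluated at $\pi_0$ (betacdf in Matlab, `cdf` in scipy). A minimal sketch, assuming $y = 44$ and $n = 274$ as inferred from the reported $44/274 \approx 0.160584$; `prob_below` is a hypothetical helper name.

```python
from scipy import stats


def prob_below(pi0, y, n, prior_a=2.0, prior_b=10.0):
    """Posterior probability P(pi < pi0 | y) under a Beta(prior_a, prior_b) prior."""
    post = stats.beta(prior_a + y, prior_b + n - y)
    return post.cdf(pi0)


# Assumed data: y = 44 sites with algae out of n = 274
p = prob_below(0.2, y=44, n=274)
print(f"P(pi < 0.2 | y) = {p:.4f}")
```

A value close to 1 would indicate that, under this model and prior, the current proportion is very likely below the historical level.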

c) What assumptions are required in order to use this kind of model with this type of data?

d) Perform a prior sensitivity analysis by testing different reasonable priors. Summarize the results in one or two sentences.
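A sensitivity analysis can be sketched by recomputing the posterior mean and 95% central interval under several conjugate Beta priors. Apart from the original $\mathrm{Beta}(2, 10)$, the priors below are illustrative choices, not prescribed by the exercise; $y = 44$ and $n = 274$ are inferred from the reported sample proportion $44/274 \approx 0.160584$.

```python
from scipy import stats

# Assumed data: y = 44 sites with algae out of n = 274
y, n = 44, 274

# Illustrative priors (hypothetical choices except the original Beta(2, 10))
priors = {
    "Beta(2, 10) original": (2, 10),
    "Beta(1, 1) uniform": (1, 1),
    "Beta(0.5, 0.5) Jeffreys": (0.5, 0.5),
    "Beta(4, 20)": (4, 20),
    "Beta(20, 100)": (20, 100),
}

results = {}
for name, (a0, b0) in priors.items():
    # Conjugate update: Beta(a0 + y, b0 + n - y)
    post = stats.beta(a0 + y, b0 + n - y)
    lo, hi = post.interval(0.95)
    results[name] = (post.mean(), lo, hi)
    print(f"{name:24s} mean={post.mean():.4f}  95% CI=({lo:.4f}, {hi:.4f})")
```

The prior's $\alpha + \beta$ acts roughly as a number of pseudo-observations, so with $n = 274$ the weak priors should change the results little, while $\mathrm{Beta}(20, 100)$ ($\alpha + \beta = 120$) should have a more visible effect, mainly by narrowing the interval.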